Free-form natural language is the holy grail of any interaction between humans and technology, and many products claim to have solved it. The truth is that building a great natural language experience is hard, and it is even harder when the range of possible questions is unlimited.
We have been exploring different ways to create a great natural language experience for our users for a while now, and we came up with a two-fold strategy. The first part is to build a robust natural language engine for a very specific domain: data stored in BI platforms and used by business people. Focusing on analytical questions from business professionals allows us to train a model that is more likely to understand our users’ needs. The complementary part is a guiding auto-complete engine that helps users explore the database and use the different NLP features correctly, which reduces errors. A great auto-complete also takes into account the number of fields and distinct values, so the user never has to scroll through a list of dozens or hundreds of suggestions, which makes for a bad user experience.
Because we are not solving the all-purpose NLP problem, we can predict fairly accurately which questions people will ask: on the one hand, the questions are analytical and business-oriented, and on the other, we know the schema of the table the user is exploring. In our case, therefore, a suggestion engine (an auto-complete, in our terms) is both necessary and feasible to build with good accuracy.
In this article, we discuss how we built Nibi’s auto-complete engine and compare it to other solutions on the market.
Our auto-complete engine is built on two layers:
Words auto-complete, and
Sentence auto-complete.
Contrary to what some might think, these two mechanisms are not mutually exclusive.
Before we dive into how we solved this challenge, here is a quick explanation of the difference between words auto-complete and sentence auto-complete.
Briefly, words auto-complete suggests words from a corpus made up of the table’s field names and distinct values, matching them against the letters the user types. For example, when the user types the letter “O”, the words auto-complete mechanism suggests “Orders”, “Order_id” and “Ofer”. Words auto-complete has two issues. First, it suggests words regardless of their position in the sentence, which can produce odd sentences, far from the great natural language experience everyone wants to build. Second, and more important, it cannot suggest words that are not in the database, such as “average” or “compare”. These operators are important for users who want to understand the data in depth.
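To make this limitation concrete, here is a minimal Python sketch of a words auto-complete, assuming the corpus is simply a flat list of field names and distinct values. The class name `WordsAutoComplete` and the sample corpus are hypothetical illustrations, not Nibi’s code. It does a case-insensitive prefix match over a sorted list, so it knows nothing about where the word will land in the sentence, and it can never propose a word like “average” that is not in the corpus.

```python
from bisect import bisect_left


class WordsAutoComplete:
    """Hypothetical sketch: suggest corpus entries (field names and distinct
    values) whose lowercase form starts with what the user typed.

    Position in the sentence is ignored entirely, which is exactly the
    weakness described in the text.
    """

    def __init__(self, corpus):
        # Store (lowercase key, original word) pairs sorted by key, so all
        # words sharing a prefix form one contiguous run in the list.
        self._entries = sorted((word.lower(), word) for word in set(corpus))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, typed, limit=10):
        prefix = typed.lower()
        # Binary search for the first key that could start with the prefix.
        start = bisect_left(self._keys, prefix)
        results = []
        for key, word in self._entries[start:]:
            # Stop at the first non-matching key or once the limit is reached.
            if not key.startswith(prefix) or len(results) >= limit:
                break
            results.append(word)
        return results


# Hypothetical corpus drawn from the example in the text.
corpus = ["Orders", "Order_id", "Ofer", "Revenue", "Country"]
engine = WordsAutoComplete(corpus)
print(engine.suggest("O"))   # ['Ofer', 'Order_id', 'Orders']
print(engine.suggest("av"))  # [] because "average" is not in the corpus
```

Keeping the corpus sorted lets `bisect_left` jump straight to the first candidate, so each lookup costs a binary search plus the matches it returns; a trie is a common alternative for larger corpora.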
This is where sentence auto-complete comes in. As mentioned above, a robust NLP-to-SQL engine is at the heart of our product, but we learned that most people don’t know what they can ask or how to ask it. If we offered only words auto-complete, our users wouldn’t know that they can use words such as “compare” to run a comparison, or that they can ask for “names that start with E”.
Without getting into the technical details, our sentence auto-complete refines the words that the words auto-complete recommends: we suggest column names and distinct values based on their types, not just their names. This matters, for example, when the user wants to average a field, and we narrow the options to the relevant numeric fields. Another example is showing location fields in the right place in the sentence. In addition, the sentence auto-complete mirrors our NLP engine. Every new NLP feature we develop, such as sort and compare, is reconstructed in various forms in the sentence auto-complete, so the user is not only shown distinct values and field names but is also guided to take advantage of operators such as average, compare, total and sort by.
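The type-aware refinement described above can be sketched in a few lines. This is a simplified illustration, not Nibi’s implementation: it assumes a hypothetical `SCHEMA` that maps each column to a semantic type and a hypothetical `OPERATOR_ARG_TYPES` table listing which column types may follow each operator keyword. When the typed text ends with an operator, only compatible columns are offered; otherwise the operator keywords are offered alongside the columns, so the user can discover operators that do not appear in the data.

```python
# Hypothetical schema: column name -> semantic type.
SCHEMA = {
    "Revenue": "number",
    "Quantity": "number",
    "Country": "location",
    "Customer_name": "text",
    "Order_date": "date",
}

# Hypothetical rules: which column types may follow each operator keyword.
OPERATOR_ARG_TYPES = {
    "average": {"number"},
    "total": {"number"},
    "sort by": {"number", "date", "text"},
    "compare": {"number"},
}


def suggest_next(typed, schema=SCHEMA, rules=OPERATOR_ARG_TYPES):
    """Suggest what may come next, based on the last word(s) typed."""
    tokens = typed.lower().split()
    for op, allowed in rules.items():
        op_tokens = op.split()
        # Compare whole tokens so that e.g. "subtotal" does not match "total".
        if tokens[-len(op_tokens):] == op_tokens:
            # After an operator, only columns of a compatible type fit.
            return [col for col, col_type in schema.items() if col_type in allowed]
    # No operator context: offer the operators themselves plus every column,
    # so the user learns about keywords that are not in the data.
    return list(rules) + list(schema)


print(suggest_next("what is the average"))     # ['Revenue', 'Quantity']
print(suggest_next("list customers sort by"))  # ['Revenue', 'Quantity', 'Customer_name', 'Order_date']
print(suggest_next("what is the"))             # operators first, then all columns
```

A fuller version would likely parse the whole sentence rather than only its last tokens, but the principle is the same: the position in the sentence decides which types of suggestions are valid.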
This is how it looks in practice. As you can see, the user starts by typing “what is the” (words that were themselves suggested earlier) and then gets a list of suggestions that includes both keywords and relevant data fields. The suggestions are also split into groups, each labeled with the type of recommendation, so the user can navigate them easily.
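Grouping suggestions by type also helps with the list-length problem raised at the start: if each group is capped, a table with hundreds of distinct values cannot flood the dropdown. Below is a minimal sketch, assuming candidates arrive already ranked best-first as (text, category) pairs; the function name `group_suggestions`, the category labels and the cap of two are hypothetical choices for illustration.

```python
def group_suggestions(candidates, max_per_group=3):
    """Split ranked (text, category) candidates into capped groups.

    Returns (shown, hidden): shown maps each category to at most
    max_per_group suggestions in ranked order; hidden counts the overflow
    per category (e.g. for a "+N more" hint in the UI).
    """
    shown = {}
    hidden = {}
    for text, category in candidates:
        bucket = shown.setdefault(category, [])
        if len(bucket) < max_per_group:
            bucket.append(text)
        else:
            hidden[category] = hidden.get(category, 0) + 1
    return shown, hidden


# Hypothetical ranked candidates mixing keywords, fields and values.
ranked = [
    ("average", "Keyword"),
    ("total", "Keyword"),
    ("Revenue", "Field"),
    ("Quantity", "Field"),
    ("Ofer", "Value"),
    ("Orders", "Field"),
    ("Order_id", "Field"),
]
shown, hidden = group_suggestions(ranked, max_per_group=2)
print(shown)   # {'Keyword': ['average', 'total'], 'Field': ['Revenue', 'Quantity'], 'Value': ['Ofer']}
print(hidden)  # {'Field': 2}
```

Because Python dictionaries preserve insertion order, groups appear in the order their best candidate was ranked, which keeps the most relevant group at the top.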
Looking at a well-known platform in the NLP BI space, Power BI Q&A, which we have used first-hand, it appears to offer words auto-complete only. That is, Power BI shows field names or distinct values based only on their names and regardless of their position in the sentence. As explained above, without sentence auto-complete the user is left on their own to figure out which operators are available.
Also, while UX is subjective, Power BI seems to bombard the user with suggestions in a rather messy way. See Power BI’s auto-complete below and feel free to compare it to the way we present our suggestions.