What makes a great Natural Language Understanding (NLU) engine? For starters, the ability to reliably identify the correct user intent in a given input. At the same time, the machine should make very few “false positive” mistakes, that is, the error of incorrectly finding a given intent when possibly no intent was expressed at all.
But reliability and accuracy are not the only things that make a good NLU engine. Training an NLU can take a lot of time and effort, so the fewer examples needed to train the machine, the better. The few-shot learning abilities of the NLU should therefore be considered as well.
One way to assess and compare NLU engines is to test a trained model on new inputs it has not seen before. A suitable approach is to construct a hold-out test set by randomly selecting utterances whose correct intent classification is part of the dataset.
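As a rough illustration of the hold-out idea, the following Python sketch randomly splits labeled utterances into a training part and a test part. The function name `split_hold_out`, the 20% test fraction and the toy data are assumptions made for this example, not the setup used in the benchmark.

```python
import random


def split_hold_out(examples, test_fraction=0.2, seed=0):
    """Randomly split (utterance, intent) pairs into train and test lists.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(examples)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 20 made-up utterances spread over 4 made-up intents.
examples = [(f"utterance {i}", f"intent_{i % 4}") for i in range(20)]
train, test = split_hold_out(examples)
```

Because each test utterance still carries its correct intent, the predictions of a trained model can later be compared against those labels.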
To evaluate few-shot learning ability, an NLU can be trained on only a handful of example sentences. The fewer sentences the machine has to train on, the worse one can expect it to perform. The question is whether the NLU maintains a standard that is still useful in practice, or whether its performance drops off like a cliff.
To conduct a benchmark test without human biases in the data set, we at Cognigy use an independent data set compiled by researchers at Heriot-Watt University. It contains more than 10,000 utterances around home automation. Details on the research are published in the paper “Benchmarking Natural Language Understanding Services for Building Conversational Agents” (2019). Their data is available on GitHub.
In our test, we used the Heriot-Watt data on the NLU platforms Microsoft LUIS, Google Dialogflow and IBM Watson to compare them against Cognigy NLU.
In detail, for each of 64 different intents we randomly picked 10 example sentences and used them to train the NLU. We then tested on 1,076 examples not in the training set. The process is visualized in the graphic.
To compare results for different numbers of training sentences, we constructed a second scenario with 30 training sentences per intent.
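The sampling step for both scenarios could be sketched in Python as follows. This is only an illustration under assumptions: the helper `sample_few_shot`, the intent names other than “weather_query” and the toy utterances are invented here, and the actual benchmark tooling is not described in this article.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, per_intent, seed=0):
    """Pick `per_intent` random training utterances for every intent.

    Returns (train, remaining); test sentences would be drawn from
    `remaining`, so they never overlap with the training set.
    """
    rng = random.Random(seed)
    by_intent = defaultdict(list)
    for utterance, intent in examples:
        by_intent[intent].append(utterance)

    train, remaining = [], []
    for intent in sorted(by_intent):   # sorted for a reproducible order
        utterances = by_intent[intent][:]
        rng.shuffle(utterances)
        train += [(u, intent) for u in utterances[:per_intent]]
        remaining += [(u, intent) for u in utterances[per_intent:]]
    return train, remaining


# Toy data: 3 intents with 5 made-up utterances each.
intents = ("weather_query", "music_query", "calendar_query")
examples = [(f"{name} example {i}", name) for name in intents for i in range(5)]
train, remaining = sample_few_shot(examples, per_intent=2)
```

Calling the same function with `per_intent=10` and `per_intent=30` on the full data set would correspond to the two scenarios described above.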
Here are the results (all tests performed August 2020):
An accuracy score of .751 means that roughly 75% of test sentences were matched to the correct intent. As the data set is purposely designed to be a great challenge to state-of-the-art NLU engines, 75% is a pretty good result. There are many overlapping and challenging intents, and most of the time the NLU correctly understands the topic, such as music, but fails to distinguish whether the user wants to turn it off or on, skip a song, etc. This is one of the reasons Cognigy introduced Intent Hierarchies, where intents can be ordered by semantic topic to resolve such hierarchical recognition challenges.
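To make the accuracy score and the topic-versus-intent distinction concrete, here is a small Python sketch. The intent names, the predictions and the convention that the topic is the part of the name before the first underscore are all assumptions for illustration; they are not taken from the benchmark results or from how Intent Hierarchies are implemented.

```python
def accuracy(true_labels, predicted_labels):
    """Share of predictions that exactly match the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def topic(intent):
    """Assumed naming convention: 'music_off' belongs to topic 'music'."""
    return intent.split("_", 1)[0]


# Made-up example: the topic is always right, one exact intent is wrong.
true = ["music_on", "music_off", "music_skip", "weather_query"]
pred = ["music_on", "music_on", "music_skip", "weather_query"]

intent_accuracy = accuracy(true, pred)
topic_accuracy = accuracy([topic(t) for t in true], [topic(p) for p in pred])
```

In this toy case three of four intents match exactly, while all four predictions land in the right topic, which is the kind of gap described above.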
We repeated the process with about 30 example sentences per intent and a total of 5518 test sentences.
Unsurprisingly, intent recognition improves with more training data. However, in a real-life scenario, one would have to write almost three times as many example sentences to surpass the accuracy that was already achieved with 10 training sentences.
Let’s look at the details and focus on one specific example from the data set.
The example sentence is: “are there any tornado warnings today”. The true intent that all engines should recognize is “weather_query”: a user asking for the weather forecast.
Here are individual results for the NLU engines mentioned above:
The table depicts the recognized intent and score for the example sentence. As in the previous results, a higher score reflects better accuracy. Dialogflow, LUIS and Watson predict the wrong intent, and they do so with relatively modest confidence, correctly indicating the uncertainty of the model. In this particular example, Cognigy's recognition of the intent is correct and the confidence is relatively high.
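One common way to make use of such confidence scores is to accept a prediction only above a threshold and otherwise treat the input as having no recognized intent, which also limits false positives. The Python sketch below illustrates this; the threshold of 0.5, the scores and the function name `accept_prediction` are assumptions for illustration and do not reflect the benchmark figures or any vendor's API.

```python
def accept_prediction(intent, score, threshold=0.5):
    """Return the intent if the score reaches the threshold, else None.

    Hypothetical helper: the threshold is an illustrative value that
    would need tuning per engine, since scores are not comparable
    across vendors.
    """
    return intent if score >= threshold else None


# Made-up scores: a confident correct guess and a hesitant wrong one.
confident = accept_prediction("weather_query", 0.8)
hesitant = accept_prediction("calendar_query", 0.3)
```

With these made-up values the confident prediction is kept, while the low-confidence one is rejected instead of being reported as a wrong intent.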
Although the nature of machine learning algorithms and the randomness ingrained in them should caution us, we might venture an interpretation of the results in this case. Clearly, LUIS does not have much to go on and likely associates “today” with a calendar query. Watson and Dialogflow, in contrast, interpret the phrase along the lines of “any traffic jam warnings today for me?”, which seems sensible enough but ignores the reference to “tornado”, a word that is not in the test set and would somehow have to be familiar to the NLU.
The Cognigy NLU, in contrast, does pick up on “tornado”. It does more than get the weather association (a capability without which the results from other NLU vendors could not be explained): it also weighs the importance of this word and competing signals for other intents in context to give the correct outcome in this instance.
Note the emphasis on “in this instance”: the technology is still far away from a deep concept of a tornado, weather and the like. Its non-linear workings are, however, the result of a neural network, and such networks are becoming more and more able to capture elements of the rich meaning encoded in our language.
Curious about seeing Cognigy NLU in action? Start a free trial and follow our onboarding tutorials to explore Cognigy's leading-edge technology yourself.