When we saw you starting to explore the HumanFirst platform, we couldn't have been more thrilled to have someone with such in-depth understanding of the market provide feedback and product insights.
So it’s a pleasure to spend some time with you, to talk about all things Conversational AI-related; I hope this is the first of many conversations we have together!
Cobus: Thanks for having me, Greg. I have been really looking forward to this chat…
[Greg] So let’s dive right in…what are the new trends and roles that you see emerging?
Cobus: Thinking of roles specifically, there has been an explosion of new job titles and descriptions that did not exist until recently. And there seems to be a significant focus on conversational UI designers, and rightly so. There is this proliferation of conversational AI-related job opportunities across industries. And with this, there is good work being done by a few organizations in standardizing the industry and creating training opportunities; a few that come to mind are the CDI, Cognigy’s academy, Kore.ai’s Online Training and Certification Process, and Rasa’s certifications on Udemy. Obviously, there are also larger players like AWS and Microsoft in cognitive certifications. But I venture to say the focus is perhaps not the same.
Getting to the second part of your question, new trends: the recent Gartner and CDI reports saw the minnows upset the behemoths of chatbots (laughs). It reminds me of Malcolm Gladwell’s analysis of the David and Goliath story. The newer entrants in the form of Kore.ai, Cognigy, and Avaamo saw unmet needs in the market. These needs were a SaaS environment, an end-to-end solution, a no-code to low-code approach, and a low barrier to entry.
They also to some degree fused design and development of conversational flows with their design canvas approach for dialog development. I guess this played into the demise of an exceptional product like BotSociety.
Nuance Mix is really following suit, fueled by their prowess in voice enablement of the conversational experiences and being able to leverage Azure Cloud since the Microsoft acquisition. As Naval Ravikant would say, the world is an efficient place. So the behemoths can still adjust and leapfrog the competition with increased market relevance.
In a sense, the secret to mass adoption is out…but that is not to say that more technical, disparate product sets do not have a place for complex implementations, such as edge installations. Here Microsoft and NVIDIA Riva come to mind.
[Greg] What do you think of the ecosystem? Who’s doing a really good job at evangelizing?
Cobus: One thing I personally find refreshing is the demise of the stealth-mode and secret sauce era (laughs). The ecosystem has matured at a rapid pace. Open access allowing users to prototype and experiment plays a big role in the development of ecosystems. Cognigy has had full free access for a while now. Kore.ai has followed suit. These accessible systems do a lot for building the ecosystem and imparting knowledge, not only of the platforms but general Conversational AI principles. Nuance Mix needs a mention in this area: Mix was not rated by Gartner if I recall correctly but was rated in the CDI report. Mix is a product of the ilk of Cognigy, Kore.ai, etc. It also offers free full access, which bodes well for product exploration.
In terms of evangelists, the content and dedication from the likes of Kane Simms from VUX World, Bret Kinsella from Voicebot.ai, and others like Hans van Dam (founder of the CDI) are noteworthy enablers. Really, for anyone wanting to learn about the products and the environment, resources are readily available.
Rasa has a strong advocacy movement, and their community is really growing. I don’t know how well this translates into commercial success, but the information, tutorials, and best practices they put out are exceptional.
[Greg] What are your thoughts on the vertical versus modularized approach to building solutions?
Cobus: I would say, on a horizontal level there are really four groupings of products.
The first is NLP related; these products can really act as an aid to a Conversational Agent, or even be used very effectively for an initial high pass on user input, for checks like language determination, named entities, sentence boundary detection, and the like. I would say products falling into this category are spaCy, HuggingFace, Rasa NLU API, etc. [I realize a product like HuggingFace has a vast scope of implementation options, so I don’t want to detract from their capabilities.]
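To make the idea of an initial high pass on user input a little more concrete, here is a minimal Python sketch of one such check, sentence boundary detection. It is only an illustration: the regular expression is a naive stand-in for what a library like spaCy would do properly, and the names `split_sentences` and `high_pass` are hypothetical, not part of any product mentioned here.

```python
import re
from typing import Dict, List

# Naive rule (an assumption for illustration): a sentence ends at ".", "!" or "?"
# followed by whitespace. Real NLP libraries handle abbreviations and similar cases.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split raw user input into sentences using the naive rule above."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def high_pass(text: str) -> Dict[str, object]:
    """Run a first, cheap check before the utterance reaches the NLU model."""
    sentences = split_sentences(text)
    return {
        "sentences": sentences,
        # A multi-sentence utterance may carry more than one intent.
        "multi_sentence": len(sentences) > 1,
    }


result = high_pass("I lost my card. Can you block it? Thanks!")
```

A multi-sentence flag like this could, for example, be used to route each sentence to intent detection separately.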
The second category I would say is more pro-code to low-code orientated. For dispersed, more complex implementations, Microsoft Cognitive Services and NVIDIA Riva come to mind. One can include Cisco MindMeld, Rasa, and DeepPavlov here.
The third category is where the excitement is currently being generated: these are the SaaS end-to-end solutions that are no-code to low-code, that are democratizing Conversational AI, fusing conversation design and development, and providing an end-to-end implementation framework. Easy access…the list goes on.
Then I would say, lastly there is a very exciting and emerging category. These are tools to optimize conversational experiences; products like QBox and TalkMap come to mind. HumanFirst is one of the leaders of this category in my opinion, which is why I really took notice of your solution and started exploring it and providing feedback :)
Finally, OpenAI’s language API, also known as GPT-3, even though using that term interchangeably is not accurate (laughs) needs to be mentioned. Everyone is hugely excited about OpenAI’s language API and Codex (as well as other large language model APIs like Cohere).
[Greg] Speaking of GPT-3, OpenAI, and large language models, what are your thoughts on the applications of these in business conversational AI contexts?
Cobus: I have really enjoyed prototyping with the OpenAI language API and Codex. It is impressive for sure.
Obviously, it is not a chatbot framework. The out-of-the-box chat capacity is impressive, in terms of maintaining context and the level of Natural Language Generation. A chatbot can be created with a single descriptive line.
The challenge I would say is fine-tuning the chatbot. I always think of seven elements when speaking of fine-tuning. These I would say are forms, or slot filling, managing intents and entities, NLG, dialog state management, digression and lastly disambiguation.
And I tend to assess chatbot frameworks based on these elements. Fine-tuning becomes paramount when the conversational agent scales in functionality, mediums, and languages, and when the user experience really has to improve at a steady cadence.
OpenAI’s Language API does make provision for fine-tuning, and it is a good start, but the granularity is lacking. Large data and language models work well for Speech Synthesis, Automatic Speech Recognition, and things like Named Entities I guess. But a real advantage of the successful conversational AI frameworks has been accuracy with little training data.
There are niche applications for the Language API: functionality like summarization, grammar correction, keyword extraction, classification, address extraction, and the like.
[Greg] Intents or no intents? (i.e., the old tabs vs. spaces argument)
Cobus: (laughs) well, this has actually been one of my favorite discussion points for the last few months.
I have always said that, traditionally, chatbots have four pillars: intents, entities, dialog state management, and bot responses or messages.
And there has been movement in the market to deprecate intents, for instance. Cases in point are IBM Watson Assistant’s Action Skills, Microsoft Power Virtual Agents, and Alexa Conversations. These attempt to deprecate the normal approach of intents fronting the conversation and segmenting each conversation according to a defined intent.
Then there is the idea of deprecating the dialog state management system; Rasa is really the avant-garde here, along with Alexa Conversations. And then there is Natural Language Generation wanting to deprecate fixed and pre-defined response scripts; OpenAI’s NLG has been astounding.
But of late we are actually seeing the opposite: there is a movement amongst the Gartner leaders not to deprecate one or more of these pillars but to merge them.
Let me give you a few examples. Structure is being introduced to intents, hence quite the opposite of deprecating it - as you would know with HumanFirst (laughs).
To name a few other examples, Cognigy has hierarchical intents and Kore.ai has sub-intents and follow-up intents. Microsoft LUIS has a feature for nested entities they call machine learning entities. Then there is a whole host of functionality and settings added to the NLU portion to manage the conversation.
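As a rough illustration of what structure in intents can look like, here is a minimal Python sketch of a hierarchical intent tree. It is a generic data structure under my own assumptions, not the API of Cognigy, Kore.ai, HumanFirst, or any other platform; the `Intent` class, its fields, and the example intent names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """A hypothetical intent node that can hold sub-intents."""
    name: str
    examples: List[str] = field(default_factory=list)
    children: Dict[str, "Intent"] = field(default_factory=dict)

    def add_child(self, child: "Intent") -> "Intent":
        """Attach a sub-intent and return it so calls can be chained."""
        self.children[child.name] = child
        return child

    def paths(self, prefix: str = "") -> List[str]:
        """List this intent and all nested sub-intents as slash-separated paths."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        result = [path]
        for child in self.children.values():
            result.extend(child.paths(path))
        return result


# Example data is invented purely for illustration.
billing = Intent("billing")
refund = billing.add_child(Intent("refund", ["I want my money back"]))
refund.add_child(Intent("refund_status", ["Where is my refund?"]))
billing.add_child(Intent("invoice", ["Send me my invoice"]))

all_paths = billing.paths()
```

The point of the nesting is that a broad parent intent can catch an utterance first, and the more specific sub-intents can then refine it, rather than every intent sitting in one flat list.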
The same goes for the dialog state management portion: dialogs are more closely coupled with a portion of the NLU; for example, specific intents and entities are linked to specific portions of the flow. Nuance Mix, for instance, has a fully integrated web console to test NLU separately from the flow or together with it, and a nice extraction of the dialog from where the dialog wording and bot behavior can be programmed.
[Greg] What are the next big opportunities?
Cobus: I really like this question; it’s something I have been giving quite a bit of thought to…
With the launch of more complete end-to-end cloud-based Conversational AI solutions, the table stakes are increasing considerably. With low table stakes, it was easy for companies to go to market and differentiate themselves with really a lower level of functionality and complexity. With the current state of play, with advanced and augmented environments, the table stakes are ever-increasing, and considerable product development is required just to reach market parity. And then product differentiation still lies ahead. Not even to mention market share…I truly believe the horizontal growth and the entry of new platforms will slow down.
And as the market matures, vertical vectors will emerge, one could say: new products servicing the Conversational AI market. These products should find themselves solving for one or more (not too many) of the vertical vectors.
The vertical vectors currently I would say are Data Preparation and Structuring, Dialog State Management, NLU, Personalization, Automated Bot Testing and QA, Voice Enablement, and lastly Conversation and Data Analyses.
These vertical-focused products in all likelihood will make for the next round of acquisitions, as a convenient way for larger players to differentiate their products and grow niche functionality.
I feel like applying NLU to ASR data is a very clear and significant market with quite a few problems that need to be solved.
[Greg] Speaking of which, how do we solve for ASR quality problems?
Cobus: (Laughs), we might be out of time for a deep dive on this one... I have some ideas I would love to share, but suffice to say that voice channels offer a massive source of information and an allure of their own. There are areas I’m excited to see develop. Companies are struggling to build value from this data due to the inherent challenges of voice…
[Greg] Let’s have that catchup sooner rather than later (laughs), thank you for your time and hopefully we chat soon.
Cobus: Thanks again Greg, chat soon.
HumanFirst is like Excel, for Natural Language Data. A complete productivity suite to transform natural language into business insights and AI training data.