Continuous AI Improvement

February 3, 2022 · 4 MIN READ


Let your data drive.

With the emergence of platforms democratizing conversational AI, there’s a smaller hurdle to get a basic NLU model up and running. The real work starts when your conversational AI project is in production and conversations you would have never expected are thrown at you.

Deploying your chatbot is like being thrown into the pool without floaties for the first time. It happens suddenly, you’re no longer in a controlled environment, and you’ll find out the hard way whether your training was adequate to stay afloat.

The only way to deliver good conversational experiences in an unforeseen environment is to adapt to it.

Why It’s Important to Have Continuous Improvement

  • It’s impossible to predict every eventuality. NLU models need to be continuously learning.
  • There’s a need to respond to real-world changes, such as a new product or new company policies.
  • Humans are dynamic, and there will always be some spontaneity in customer queries.
  • Conversational AI projects learn by doing. Incoming data should be reflected in your model.

You’ll be amazed at the unanticipated utterances you’ll receive in production. However, using state-of-the-art tools like HumanFirst can ease your anxiety about the unanticipated nature of conversations, and will make the process of continuous improvement consequential, data-driven, and streamlined.

How to Approach Continuous Improvement


Real-Life Conversation Data

Using your unlabeled data to discover new intents and optimize existing ones is the path to 100% accuracy and coverage. As mentioned above, chatbots learn by doing. There’s no better way to learn how to react in real-world scenarios than by using real-life conversation data generated in production.

That’s why HumanFirst is so powerful; your post-deployment data can be automatically piped in from your Conversational AI platform at the cadence of your choice. This allows you to leverage your unlabeled data to improve your labeled data. You can sort and explore your unlabeled data based on uncertainty, margin score, and entropy of your trained model, identifying utterances that are most likely to represent new or related intents.
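To make the three sorting criteria concrete, here is a minimal sketch of how uncertainty, margin, and entropy scores are commonly computed from a trained model’s intent probabilities. This is a generic active-learning illustration, not HumanFirst’s actual implementation; the function name and data are hypothetical.

```python
import numpy as np

def rank_unlabeled(probs):
    """Score unlabeled utterances with three common active-learning metrics.

    probs: (n_utterances, n_intents) array of intent probabilities from a
    trained model. Higher uncertainty/entropy and lower margin all indicate
    utterances the model is least sure about, and thus most worth reviewing.
    """
    probs = np.asarray(probs, dtype=float)
    sorted_p = np.sort(probs, axis=1)[:, ::-1]           # descending per row
    uncertainty = 1.0 - sorted_p[:, 0]                   # least-confident score
    margin = sorted_p[:, 0] - sorted_p[:, 1]             # small margin = ambiguous
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return uncertainty, margin, entropy

# Example: three utterances scored against two intents. The third row is
# the most ambiguous: lowest margin, highest entropy.
u, m, e = rank_unlabeled([[0.95, 0.05], [0.55, 0.45], [0.50, 0.50]])
```

Sorting your unlabeled queue by any of these scores surfaces the utterances most likely to represent new or overlapping intents.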

Data-Driven Improvement

Your model is bound to underperform when improvements aren’t driven by data. For example, when annotators come across unlabeled utterances, their first hunch might be to add them to the intents they think they belong in. But if every decision were based on a hunch (whether adding training phrases, disambiguating, splitting, merging intents, or anything else), it would take copious amounts of trial and error to achieve a well-performing conversational experience.

This specific problem is mitigated by HumanFirst’s real-time recommendations: when adding new utterances from your unlabeled data to existing intents, users are prompted to add them to the intent with the highest matching confidence, provided in real time. More generally, all of HumanFirst’s continuous-improvement workflows are similarly model-driven with real-time feedback.
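The idea of replacing hunches with model confidence can be sketched in a few lines. The function below is a hypothetical illustration (not HumanFirst’s API): it suggests the top-scoring intent when the model is confident enough, and defers to a human otherwise. The threshold value is an assumption.

```python
def suggest_intent(probs_by_intent, threshold=0.7):
    """Suggest an intent for a new utterance based on model confidence.

    probs_by_intent: dict mapping intent name -> model confidence.
    Returns (intent, confidence) when confidence clears the threshold,
    or (None, confidence) to flag the utterance for human review.
    """
    intent, confidence = max(probs_by_intent.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return intent, confidence   # confident: recommend this intent
    return None, confidence         # hunch territory: ask a human

print(suggest_intent({"refund": 0.85, "cancel": 0.10, "track_order": 0.05}))
# → ('refund', 0.85)
print(suggest_intent({"refund": 0.40, "cancel": 0.35, "track_order": 0.25}))
# → (None, 0.4)
```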

Streamlined Continuous Improvement

With so much incoming data, so many problematic intents, and so little organization, how does one know where to begin? Having streamlined, consistent improvement workflows is important if you want to see significant enhancements to your models.

HumanFirst generates on-demand 5-fold cross-validation analysis against its NLU (or your own) to provide intent-level metrics (F1, precision, recall, accuracy) that can be used to understand and tune your model. This will help you:

  • Identify which intents need additional training examples
  • Identify which intents have high confusion (typically intents with a mix of very different training examples)
  • Identify which training examples belong to another intent in your corpus
  • Re-label problematic training examples
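For readers who want to try this outside the platform, here is a minimal sketch of per-intent cross-validation metrics, using a generic scikit-learn text classifier as a stand-in for the NLU engine. The intents and utterances are hypothetical toy data.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

# Hypothetical toy training data: (utterance, intent) pairs
data = [
    ("where is my package", "track_order"),
    ("track my order please", "track_order"),
    ("has my shipment arrived", "track_order"),
    ("when will my order get here", "track_order"),
    ("order status update", "track_order"),
    ("i want my money back", "refund"),
    ("please refund my purchase", "refund"),
    ("how do i return this for a refund", "refund"),
    ("refund the last charge", "refund"),
    ("can i get reimbursed", "refund"),
    ("cancel my subscription", "cancel"),
    ("i want to cancel my plan", "cancel"),
    ("stop my membership", "cancel"),
    ("please cancel the service", "cancel"),
    ("end my subscription today", "cancel"),
]
texts = [t for t, _ in data]
intents = [i for _, i in data]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Out-of-fold predictions from 5-fold cross-validation: every utterance is
# scored by a model that never saw it during training.
predicted = cross_val_predict(model, texts, intents, cv=5)

# Per-intent precision, recall, and F1 reveal which intents need more
# examples or are being confused with one another.
print(classification_report(intents, predicted, zero_division=0))
```

Intents with low recall typically need more training examples; intents whose misclassifications cluster into one other label are candidates for merging or re-scoping.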

HumanFirst provides machine-learning-assisted workflows that help fix your model’s data. It’s easy to merge conflicting intents and their training phrases with a single click, or to quickly move problematic utterances from one intent to another.

HumanFirst’s built-in disambiguation feature (both real-time and based on your trained model) lets you quickly view conflicting intents, providing actionable workflows to ensure each intent’s scope is as clean and specific as possible.

Disambiguation Workflow

At the end of the day, we’re more likely to make consistent changes when there is less friction between us and the problem. Finding the easiest way to incrementally improve the model on a continuous basis is key. Think stock market, not cryptocurrency: you’re in it for steady, long-term gains, not volatile swings and crashes.

As Alex Halper says, deploying Conversational AI is a journey, not a milestone. That statement can seem overwhelming and daunting without streamlined processes. Luckily, state-of-the-art approaches like HumanFirst are there to help you in this process.

HumanFirst is like Excel for natural language data: a complete productivity suite to transform natural language into business insights and AI training data.
