May 5, 2021
·
2 MIN READ

Labeling isn't hard if you know what to label.



GREGORY WHITESIDE

Companies like Scale.ai (recently valued at $7B+) were able to build massive businesses around data cleaning & labeling for AI: this is a human-intensive task, and companies flocked to services that let them get this value quickly.

Some of the other startups in this space include Clarifai, CloudFactory, Labelbox and Sama (now with offices in Montreal!), and among the majors we have Amazon's SageMaker Ground Truth, as well as Google's and Azure's labeling tooling.

While some of these providers let you bring your own team of labelers, the real advantage (and business model) is in their workforce: they help teams scale the work by parallelizing the labeling tasks across a pool of humans. A lot of the peripheral value is in the data governance and collaboration features (which makes sense, given projects are often outsourced).

To my knowledge, most of these platforms started with image labeling, and gradually added workflows to support other types of content (videos, text). In any case, most of them now provide labeling capabilities across text, video and image (with varying degrees of AI-assistance to accelerate the process).

It's undeniable that a lot of innovation in areas like self-driving cars & drones would not have happened so quickly without platforms like Scale.

However, I don't see any evidence that these labeling solutions have been widely adopted for conversational AI development, or used to accelerate it.

The key limitation of these platforms (and a critical one) is that they all expect a pre-defined set of labels to be provided ahead of time. This makes sense for image-labeling use-cases where the set of things you want to label is specific and finite (e.g. "cars", "buildings"), and for simple text-classification problems (when you want to bucket text into "negative" vs. "positive" sentiment, for example). These also happen to be use-cases where domain expertise isn't necessarily needed, and where the labeling can therefore be easily outsourced.

On the other hand, the next generation of conversational AI use-cases will need to capture hundreds (or even thousands) of different domain-specific customer intents: it’s impossible to know what those are ahead of time.

The hardest part of conversational AI isn’t the labeling, but discovering and organizing the intents that can be trained from the data in the first place.

Figuring out the right "labels" (or intents) that will constitute the core of your conversational AI's NLU is as much an art as a science: it requires a combination of domain expertise, linguistics, and insight into the way these intents will be applied within the AI's business rules.

In my experience, this intent discovery can't be done "top-down" if you're looking to achieve NLU with real depth and accuracy: rather, it requires an iterative and continuous process that will gradually uncover the information model "bottom-up" from the raw data, and improve over time.
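To make "bottom-up" a little more concrete, here is a minimal sketch of grouping raw utterances without any pre-defined labels. It is purely illustrative: the function names, the tiny stop-word list, the word-overlap (Jaccard) similarity and the 0.3 threshold are all assumptions made for this example, not how any particular product works. Real intent discovery would lean on much richer language models plus human review.

```python
import re

# Hypothetical filler words to ignore; a real project would tune this list.
STOP_WORDS = {"i", "my", "to", "the", "a", "is", "can", "you", "me", "do", "how", "want"}

def tokens(utterance):
    """Return the set of lowercase words in an utterance, minus filler words."""
    return set(re.findall(r"[a-z']+", utterance.lower())) - STOP_WORDS

def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def discover_groups(utterances, threshold=0.3):
    """Greedy single pass: join the most similar existing group, else start a new one."""
    groups = []  # each group holds its accumulated vocabulary and its member utterances
    for text in utterances:
        toks = tokens(text)
        best, best_score = None, 0.0
        for group in groups:
            score = jaccard(toks, group["vocab"])
            if score > best_score:
                best, best_score = group, score
        if best is not None and best_score >= threshold:
            best["members"].append(text)
            best["vocab"] |= toks
        else:
            groups.append({"vocab": set(toks), "members": [text]})
    return [group["members"] for group in groups]

utterances = [
    "I want to reset my password",
    "how do I reset the password",
    "cancel my subscription",
    "please cancel the subscription now",
    "where is my order",
]
groups = discover_groups(utterances)
for members in groups:
    print(members)
```

The groups that come out are only candidates: someone with domain expertise still has to name them, merge or split them, and re-run the process as new data arrives, which is the iterative part.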

Most teams use Excel to sort, organize and label their raw data today.

That tells me the types of labeling tools listed above are not providing the necessary AI-assisted data engineering and modelling capabilities required to train the next level of natural language understanding...

... otherwise all big brands investing in conversational AI experiences would have simply outsourced the NLU to Scale.ai, and chatbots wouldn't be saying "I didn't understand that" quite as much :)

HumanFirst is like Excel, for Natural Language Data. A complete productivity suite to transform natural language into business insights and AI training data.
