May 5, 2021
·
2 MIN READ

Labeling isn't hard if you know what to label.

GREGORY WHITESIDE
May 5, 2021
.
2 MIN READ

Companies like Scale.ai (recently valued at over $7B) have built massive businesses around data cleaning and labeling for AI. This is a human-intensive task, and companies flocked to services that let them get this value quickly.

Other startups in this space include Clarifai, CloudFactory, Labelbox and Sama (now with offices in Montreal!), and among the major cloud providers we have Amazon's SageMaker Ground Truth, as well as Google's and Azure's labeling tooling.

While some of these providers let you bring your own team of labelers, the real advantage (and business model) is in their workforce: they help teams scale the work by parallelizing the labeling tasks across a pool of humans. A lot of the peripheral value is in the data governance and collaboration features (which makes sense, given projects are often outsourced).

To my knowledge, most of these platforms started with image labeling and gradually added workflows for other types of content (video, text). In any case, most of them now provide labeling capabilities across text, video and image, with varying degrees of AI assistance to accelerate the process.

It's undeniable that a lot of innovation in areas like self-driving cars & drones would not have happened so quickly without platforms like Scale.

However, I don't see any evidence that these labeling solutions have achieved adoption or been leveraged to accelerate conversational AI development.

The key limitation of these platforms (and a critical one) is that they all expect a pre-defined set of labels to be provided ahead of time. This makes sense for image-labeling use-cases where the things you want to label are either very specific or finite in number (e.g. "cars", "buildings"), and for simple text-classification problems (for example, bucketing text into "negative" vs. "positive" sentiment). These also happen to be use-cases where domain expertise isn't necessarily needed, and where the labeling can therefore be easily outsourced.

On the other hand, the next generation of conversational AI use-cases will need to capture hundreds (or even thousands) of different domain-specific customer intents: it’s impossible to know what those are ahead of time.

The hardest part of conversational AI isn’t the labeling, but discovering and organizing the intents that can be trained from the data in the first place.

Figuring out the right "labels" (or intents) that will constitute the core of your conversational AI's NLU is as much an art as a science: it requires a combination of domain expertise, linguistics, and insight into the way these intents will be applied within the AI's business rules.

In my experience, this intent discovery can't be done "top-down" if you're looking to achieve NLU with real depth and accuracy: rather, it requires an iterative and continuous process that will gradually uncover the information model "bottom-up" from the raw data, and improve over time.
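To make the "bottom-up" idea concrete, here is a minimal sketch of grouping raw utterances by word overlap so that candidate intents emerge from the data instead of from a pre-defined label list. It is an illustration only, not how HumanFirst or any of the tools above work: the function names, the sample utterances, and the 0.3 similarity threshold are all assumptions, and a real workflow would likely use sentence embeddings and a human in the loop to name, merge and split the groups.

```python
import re


def tokens(text):
    """Return the set of lowercase words in an utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def discover_groups(utterances, threshold=0.3):
    """Greedy single pass: attach each utterance to the first group whose
    seed utterance is similar enough, otherwise start a new group.

    The threshold is an arbitrary illustrative value (assumption).
    """
    groups = []  # list of (seed_tokens, member_utterances)
    for utterance in utterances:
        current = tokens(utterance)
        for seed, members in groups:
            if jaccard(current, seed) >= threshold:
                members.append(utterance)
                break
        else:
            # No existing group is close enough: a new candidate intent.
            groups.append((current, [utterance]))
    return [members for _, members in groups]


# Hypothetical raw data: no labels are supplied up front.
utterances = [
    "I want to reset my password",
    "please reset my password",
    "where is my order",
    "where is my package order",
    "cancel my subscription",
]

groups = discover_groups(utterances)
for i, members in enumerate(groups, start=1):
    print(f"candidate intent {i}: {members}")
```

Each resulting group is only a candidate: a domain expert would still decide what to call it, whether it should be merged or split, and how it maps to business rules, and would then re-run the process as new data arrives.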

Most teams use Excel to sort, organize and label their raw data today.

That tells me the types of labeling tools listed above are not providing the necessary AI-assisted data engineering and modelling capabilities required to train the next level of natural language understanding...

... otherwise all big brands investing in conversational AI experiences would have simply outsourced the NLU to Scale.ai, and chatbots wouldn't be saying "I didn't understand that" quite as much :)

HumanFirst is like Excel, for Natural Language Data. A complete productivity suite to transform natural language into business insights and AI training data.
