Generative AI has introduced a new way to work with data. Large language models (LLMs) have advanced contextual understanding and can reason across vast amounts of information, reducing the need for manual data pre-processing and making qualitative analyses newly manageable.
Directing an LLM to extract an insight or perform a transformation is called prompt engineering. With a well-engineered prompt, you can put LLMs to work on a wide range of tasks, whether you’re making sense of a handful of onboarding calls or modeling customer sentiment across thousands of support conversations. Applying the prompt at scale is where the results become valuable.
With these field-tested prompts, you can:
Summarize key issues from product and service reviews
Analyze customer sentiment across support calls
Generate knowledge articles from successful issue resolutions
Auto-respond to customer reviews in your specific brand voice
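To give a flavor of what a well-engineered prompt looks like, here’s a simple illustrative example. It’s a hypothetical sketch, not one of the field-tested prompts from the guide, written as a Python template string so it can be applied to many transcripts programmatically:

```python
# A hypothetical sentiment-analysis prompt, written as a Python template
# string; illustrative only, not one of the field-tested prompts.
SENTIMENT_PROMPT = """\
You are analyzing customer support calls.
For the transcript below, return:
1. The overall sentiment: positive, neutral, or negative.
2. One sentence explaining the main driver of that sentiment.

Transcript:
{transcript}
"""

print(SENTIMENT_PROMPT.format(
    transcript="The agent was great, but the refund took three weeks."))
```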
For most companies, business takes the shape of emails, contracts, support tickets, meeting notes, and survey responses. An estimated 347 billion emails are sent around the world every day.
Classifying those documents has always been a valuable initiative, and a herculean task. Say we wanted to understand those 347 billion emails, hard-to-parse send fields, footers, and all. Normally, we’d have two options: ask a data science team, or ask an LLM.
Data Science-Led Text Classification
The tried-and-tested (and tedious) method of complex text classification involves handing an email corpus to an in-house or outsourced data science team. Natural language processing (NLP) techniques make automated document classification possible, if laborious. The data team sets up a Python project, imports the emails, and builds out an arduous pipeline of text cleaning, vectorization, and model training.
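For a sense of what that pipeline involves, here’s a minimal sketch of the traditional approach, assuming scikit-learn; the emails and labels are toy placeholders, and a real project would add extensive cleaning and evaluation steps:

```python
# A minimal sketch of the traditional NLP pipeline, assuming scikit-learn.
# The emails and labels are toy placeholders, not real training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny labeled corpus standing in for a cleaned email dataset.
emails = [
    "My invoice shows the wrong amount, please correct it.",
    "How do I reset my password?",
    "Great service, the agent resolved my issue quickly.",
    "The app crashes every time I open the settings page.",
]
labels = ["billing", "account", "feedback", "bug"]

# Vectorize the text (TF-IDF) and fit a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, labels)

# Classify a new, unseen email.
print(model.predict(["Please correct the amount on my invoice."]))  # likely ['billing']
```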
The result is a shortlist of labels that the emails map to, in theory. In practice, data science-led labeling can fall short of the targeted clarity. Without a business expert in the loop who can add contextual understanding (how granular the labels should be, which segment of an email chain should drive the classification), the resulting taxonomy can miss the mark.
LLM-Led Text Classification
The efficiencies of LLMs afford a second option. Trained on vast amounts of data, LLMs can manage a multi-turn analysis and make complex classification decisions with some level of accuracy. Analytics plugins in the latest wave of GPTs make it easy to drag and drop a file into a ChatGPT-like interface and write a prompt to extract key topics, assess frequency, or perform basic classification.
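Scripted rather than dragged and dropped, the same idea looks roughly like this; a minimal sketch assuming the OpenAI Python SDK, where the model name, label set, and email are placeholders:

```python
# A minimal sketch of LLM-led classification, assuming the OpenAI Python SDK.
# The model name, label set, and email are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["billing", "account", "feedback", "bug"]
label_list = ", ".join(LABELS)
email = "The app crashes every time I open the settings page."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": f"Classify the email into exactly one of: {label_list}. "
                    "Reply with the label only."},
        {"role": "user", "content": email},
    ],
)
print(response.choices[0].message.content)  # e.g. "bug"
```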
But one-dimensional prompt interfaces come with limitations. Namely, the labeling and analysis happen without a human in the loop, which prevents aligning human understanding with the LLM’s predictive decision-making. The classification project quickly becomes impossible to audit. Even different models disagree on simple definitions; if business experts can’t inspect and correct the model’s understanding, trusting the resulting taxonomy becomes impossible.
Prompt and Data Engineering for Automated Document Classification
This workflow offers teams a third option. By combining the efficiency of LLM-led document classification with the fidelity of inspectable data engineering methods, and removing the barrier of code, teams can quickly arrive at an automated classification model they can inspect, adjust, and trust.
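As a rough illustration of that third option (again assuming the OpenAI Python SDK; the taxonomy and emails are placeholders), the sketch below separates what the business expert owns, a label set with plain-language definitions they can inspect and adjust, from what the LLM automates:

```python
# A rough sketch of the human-in-the-loop workflow, assuming the OpenAI
# Python SDK; the taxonomy and emails are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# The business expert owns this block: labels plus plain-language
# definitions they can inspect and adjust at any time.
TAXONOMY = {
    "billing": "charges, invoices, refunds, payment amounts",
    "account": "logins, passwords, profile and access changes",
    "bug": "crashes, errors, or broken product behavior",
}

def classify(email: str) -> str:
    """Ask the model for exactly one label from the approved taxonomy."""
    rubric = "\n".join(f"- {label}: {desc}" for label, desc in TAXONOMY.items())
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Classify the email into exactly one label below. "
                        "Reply with the label only.\n" + rubric},
            {"role": "user", "content": email},
        ],
    )
    return response.choices[0].message.content.strip()

# Annotate a batch; the (email, label) pairs can feed dashboards or audits.
emails = ["I was charged twice this month.", "The app crashes on startup."]
annotated = [(email, classify(email)) for email in emails]
print(annotated)
```

Because the taxonomy lives in one human-readable block, a business expert can tighten a definition or split a label and rerun the batch, keeping the model’s decisions auditable.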
After following this how-to guide, you’ll have:
Annotated data you can use for dashboards and visualizations
Organized datasets you can analyze further
An operative taxonomy you can use to categorize new documents as they’re created
Download the full guide + video walkthrough to learn more.