February 29, 2024 · 6 min read

Generating Chatbot Flow Logic from Real Conversations

Generative AI is fundamentally changing conversation design, but deploying an AI-enabled chatbot still requires a human-led process of hardcoding flows for different conversational scenarios. We’re not yet at the point of leaving the logic of complex interactions to the LLM; it’s unclear if or when we’ll arrive.

A roadblock to fast and performant chatbot design is the number of side roads and strange turns inherent to human conversations. Humans can drive those interactions without thinking, but charting the path in the detail a chatbot requires is time-consuming and arduous.

Luckily, companies already have large libraries of successful customer conversations from call transcripts, bot logs, email support tickets, and other sources. With the right human-in-the-loop workflow, those examples should give the LLM what it needs to deduce the logic and design the chatbot flow. This article outlines that process, beginning with specific subsets of data and ending with a flow that could feed a generative playbook, a text-to-flow converter, or a chatbot platform that can build flows from natural language.

Validating the Concept: Single-Use Case Testing

To begin, we need to validate whether our goal is feasible for a single kind of customer request. In this example, we’ll test the workflow on conversations that flag a missing package.

We can search through the sea of transcripts for missing packages in one of two ways. First, we can employ semantic search, a data engineering tactic underlying RAG solutions, which will search our transcripts for similar phrases. If we add a custom example to the stash, such as “I didn’t receive my package yet,” we can automatically surface semantically similar requests. We’ll select a handful of conversations deemed similar and move them to the stash.
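To make the idea of ranking transcripts against a seed phrase concrete, here is a minimal sketch. It is an illustration under loud assumptions: the word-count vectors stand in for real sentence embeddings (which is what semantic search would actually use), and the function names and sample transcripts are hypothetical, not part of any platform's API.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, transcripts: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank transcripts by similarity to the query, highest first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(t)), t) for t in transcripts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Hypothetical sample data for illustration only.
transcripts = [
    "Hi, my package never arrived and tracking says delivered.",
    "I want to change the email on my account.",
    "I still didn't receive my order, where is my package?",
]
results = most_similar("I didn't receive my package yet", transcripts, top_k=2)
```

Swapping `vectorize` for an embedding model would keep the ranking logic unchanged while matching on meaning rather than shared words.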

This test data determines the quality of our experiment, so we want to make sure it’s accurate and specific. Searching by semantic similarity alone, we risk including conversations that mention a missing package but concern a different issue entirely. Rather than read through every transcript individually, we can build a custom prompt to find the true key issues of the conversations we’ve selected.

We can run this prompt on the ten conversations in the stash to see the summarized key issue for each one. In this example, only a subset of the ten conversations has ‘package missing’ as the key issue. We can move those conversations to a new stash and proceed to the next step: engineering a chatbot flow prompt.
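The filtering step might look roughly like the sketch below. The prompt wording, the `filter_by_key_issue` helper, and the `ask_llm` callable are all assumptions made for illustration; the stub model exists only so the example runs without a real LLM.

```python
from typing import Callable

# Hypothetical prompt wording; a real key-issue prompt would be tuned iteratively.
KEY_ISSUE_PROMPT = (
    "Read the conversation below and state the customer's key issue "
    "in five words or fewer.\n\nConversation:\n{conversation}\n\nKey issue:"
)


def filter_by_key_issue(
    conversations: list[str],
    target: str,
    ask_llm: Callable[[str], str],
) -> list[str]:
    """Keep only conversations whose summarized key issue mentions the target."""
    kept = []
    for conversation in conversations:
        summary = ask_llm(KEY_ISSUE_PROMPT.format(conversation=conversation))
        if target.lower() in summary.lower():
            kept.append(conversation)
    return kept


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "Package missing" if "never arrived" in prompt else "Billing dispute"


stash = [
    "Customer: My package never arrived.\nAgent: Let me check the tracking.",
    "Customer: I was charged twice. Also, is my package on its way?\nAgent: I can refund that.",
]
missing_package_stash = filter_by_key_issue(stash, "package missing", fake_llm)
```

The second conversation mentions a package but is really about billing, which is the kind of false positive this step is meant to catch.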

Generating Chatbot Flows from a Small Data Segment

Having filtered our data through semantic search and a custom summarization prompt, we can start working on the prompt that will generate the chatbot logic. This is where iterative, human-in-the-loop prompt engineering takes over. After many revisions, we’ve engineered a prompt with sufficient detail to create the flow we need, complete with conditional logic, subnodes, specific omissions, and instructions a chatbot could follow.

Running this prompt on a merged stash will take all of the selected conversations into account, compiling their cues to create a single flow. This is the beginning of many revisions: we’ll want to ensure the output is properly formatted and that there are no oversights across these six examples.
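One way to picture a merged run is as a single prompt that contains every selected conversation. The sketch below assumes that interpretation; the header text and the `build_merged_prompt` helper are hypothetical and are not the prompt engineered for this article.

```python
# Hypothetical instruction header; the real flow prompt took many revisions.
FLOW_PROMPT_HEADER = (
    "Using all of the conversations below, write one chatbot flow for "
    "handling a missing package. Use numbered steps, and express branches "
    "as 'If ... then ...' conditions."
)


def build_merged_prompt(conversations: list[str]) -> str:
    """Combine every conversation into one prompt so the model sees them together."""
    blocks = [
        f"### Conversation {index}\n{text.strip()}"
        for index, text in enumerate(conversations, start=1)
    ]
    return FLOW_PROMPT_HEADER + "\n\n" + "\n\n".join(blocks)


merged_prompt = build_merged_prompt([
    "Customer: My package never arrived.\nAgent: Can you share the order number?",
    "Customer: Tracking says delivered but nothing is here.\nAgent: Let me open a claim.",
])
```

Labeling each conversation makes it easier to trace a step in the generated flow back to the example that motivated it.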

Expanding the Logic

We’re now confident that we’ve generated chatbot flow logic that could sufficiently manage these example conversations. But no doubt there are other conversations regarding missing packages that take different turns.

To find more conversations and test this flow against edge cases, we can return to our ‘key issue’ prompt. Using the pipeline feature, we can summarize the key issue across all raw transcripts simultaneously. We can cluster the output by similarity, search for the keyword ‘not delivered,’ curate the results, and move them to a new stash. Working with simplified summaries, we can now effectively search by semantic similarity to find more conversations on the same topic. In this example, we found 15 additional conversations about a missing package.
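As a rough illustration of grouping and searching short summaries, consider the sketch below. It makes two assumptions worth stating plainly: character-level similarity from Python's `difflib` stands in for semantic clustering, and the summaries, IDs, and helper names are invented for the example.

```python
from difflib import SequenceMatcher


def search_keyword(summaries: dict[str, str], keyword: str) -> list[str]:
    """Return the IDs of conversations whose summary contains the keyword."""
    return [cid for cid, text in summaries.items() if keyword.lower() in text.lower()]


def cluster_summaries(summaries: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for cid, text in summaries.items():
        for cluster in clusters:
            representative = summaries[cluster[0]]
            ratio = SequenceMatcher(None, text.lower(), representative.lower()).ratio()
            if ratio >= threshold:
                cluster.append(cid)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([cid])
    return clusters


# Hypothetical key-issue summaries keyed by conversation ID.
summaries = {
    "c1": "Package not delivered",
    "c2": "Package not delivered yet",
    "c3": "Wants to update billing address",
    "c4": "Order marked delivered but not delivered",
}
candidates = search_keyword(summaries, "not delivered")
clusters = cluster_summaries(summaries)
```

Here the keyword search surfaces `c4` even though it lands in its own cluster, which is why the curation step still needs a human to review the results.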

Let’s create a custom prompt to test our generated flow against these new scenarios. We can copy and paste the flow from the previous step and ask the model whether the given instructions would sufficiently handle these new conversations. If the answer is no, we’ll ask it to list the missing steps. We can run this on each individual conversation in the stash to see a yes or no classification for all 15 examples.

Following is that prompt, slightly simplified:

Our output highlights important gaps in the flow: a step for validating the user’s account information, and a provision for cases where order details are unknown. These are just a few examples of overlooked steps we’ll want to account for.
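For readers who want to script this kind of check, here is a hypothetical sketch of a per-conversation validation pass. It is not the prompt used in this article: the prompt wording, the reply format, and the `ask_llm` callable are assumptions, and the stub model exists only so the example runs.

```python
from typing import Callable, NamedTuple

# Hypothetical wording; it asks for a reply format that is easy to parse.
VALIDATION_PROMPT = (
    "Here is a chatbot flow:\n{flow}\n\n"
    "Here is a customer conversation:\n{conversation}\n\n"
    "Would the flow handle this conversation? Answer 'yes' or 'no' on the "
    "first line. If 'no', list each missing step on its own line starting with '- '."
)


class Verdict(NamedTuple):
    handled: bool
    missing_steps: list[str]


def parse_verdict(reply: str) -> Verdict:
    """Turn the model's reply into a yes/no flag plus any listed missing steps."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    handled = bool(lines) and lines[0].lower().startswith("yes")
    missing = [] if handled else [line[2:].strip() for line in lines[1:] if line.startswith("- ")]
    return Verdict(handled, missing)


def validate_flow(flow: str, conversations: list[str], ask_llm: Callable[[str], str]) -> list[Verdict]:
    """Run the validation prompt once per conversation."""
    return [
        parse_verdict(ask_llm(VALIDATION_PROMPT.format(flow=flow, conversation=c)))
        for c in conversations
    ]


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    if "tracking number" in prompt:
        return "Yes"
    return "No\n- Validate the user's account information\n- Handle unknown order details"


flow = "1. Ask for the order ID. 2. Look up the delivery status."
verdicts = validate_flow(
    flow,
    ["Customer: Here is my tracking number.", "Customer: I don't know my order ID."],
    fake_llm,
)
```

Collecting `missing_steps` across all conversations gives a worklist for the next revision of the flow.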

We can continue to refine this flow until we get nothing but ‘yes’ from the above validation prompt. Then we can transfer the output to a generative playbook, a text-to-flow converter, or a chatbot platform that allows you to express flows in natural language.

Finding and accounting for edge cases is easier, faster, and more effective when we work from real customer conversations. Synthetic data and imagined scenarios are limited by our conscious understanding of customer interactions, and often lack the nuances of natural language in action. With this workflow, conversation designers can speed up the backend build of flexible and intuitive chatbots, improving agility and shortening time to deployment.

GREGORY WHITESIDE

How to build flexible, intuitive Conversational AI from unstructured customer data.

HumanFirst is a data-centric productivity platform designed to help companies find and solve problems with AI-powered workflows that combine prompt and data engineering. Experiment with raw data, surface insights, and build reliable solutions with speed, accuracy, and trust.
