Plan-And-Solve Prompting

August 7, 2023 · 5 min read

The notion of fine-tuning a Large Language Model (LLM) for very specific generative use-cases is in most instances not feasible. However, due to the flexibility of LLMs, variations in Prompt Engineering can yield astounding results. This article covers a new prompting method which improves the accuracy and completeness of LLM results.

Chain-Of-Thought (CoT) prompting is one of the most successful ways to query an LLM via a single zero-shot or few-shot prompt. CoT prompting performs particularly well on multi-step reasoning tasks.

As I have shown in the past, multi-step reasoning can be elicited from an LLM via a few-shot chain-of-thought (CoT) prompt which includes a few manually crafted step-by-step reasoning demonstrations, followed by the request or problem statement and the words: Let us think step by step.
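
To make the CoT setup concrete, here is a minimal sketch of a zero-shot CoT call using the legacy OpenAI completions endpoint (the pre-1.0 `openai` Python package). The word problem and parameter values are my own illustration, and text-davinci-003 has since been deprecated.

```python
import openai  # legacy SDK (< 1.0), which exposed the Completion endpoint

openai.api_key = "YOUR_API_KEY"

question = (
    "A bakery sold 24 muffins in the morning and twice as many in the "
    "afternoon. How many muffins were sold in total?"
)

# Zero-shot CoT: append the trigger phrase to elicit step-by-step reasoning.
prompt = f"Q: {question}\nA: Let's think step by step."

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=0,
)

print(response["choices"][0]["text"])
```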

But a recent study found that CoT prompting fails in three areas:

  1. Calculations (7% failure rate in test examples)
  2. Missing steps in a sequence of events (12% failure rate in test examples)
  3. Semantic misunderstanding (27% failure rate in test examples)

These vulnerabilities are addressed by Plan-And-Solve (PS) prompting and Plan-And-Solve prompting with more detailed instructions (PS+ prompting).

PS consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan.

In the image below, (a) shows a Zero-Shot-CoT prompt and (b) shows the Plan-And-Solve (PS) approach to prompting and answer extraction.

While Zero-shot-CoT encourages LLMs to generate multi-step reasoning with “Let’s think step by step”, it may still generate wrong reasoning steps when the problem is complex.

PS prompting, by contrast, first asks the LLM to devise a step-by-step plan for solving the problem and then to carry out that plan to find the answer.

Source
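
In practice, the difference between the two approaches comes down to the trigger sentence appended after the problem statement. The sketch below contrasts the Zero-shot-CoT trigger with a PS-style trigger; the PS wording is paraphrased from the paper rather than quoted verbatim.

```python
# Zero-shot-CoT: a single trigger phrase.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# Plan-and-Solve (PS): ask the model to devise a plan first,
# then carry it out (wording paraphrased from the paper).
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def build_prompt(question: str, trigger: str) -> str:
    """Append the reasoning trigger to the question."""
    return f"Q: {question}\nA: {trigger}"

print(build_prompt(
    "A bakery sold 24 muffins in the morning and twice as many in the "
    "afternoon. How many muffins were sold in total?",
    PS_TRIGGER,
))
```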

Below, I submit the question to text-davinci-003 and get the correct answer. Across multiple requests I might also get an incorrect answer, but in either case no explanation or reasoning is supplied by the LLM.

In the image below, the CoT method is employed and there is an improvement in the quality of the answer and the surfaced reasoning. However, the PS example at the bottom is far superior in detail, segmenting the answer into a plan and a solution, and then executing on that solution.

The example below is a comparison between Plan-And-Solve Prompting (PS) and Plan-And-Solve Prompting accompanied by more detailed instructions (PS+).

PS+ prompting greatly improves the quality of the generated reasoning process.

Source
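
PS+ extends the PS trigger with explicit instructions to extract the relevant variables and to pay attention to calculation. A sketch of such a trigger is shown below; again, the wording is paraphrased rather than quoted from the paper.

```python
# Plan-and-Solve with more detailed instructions (PS+);
# wording paraphrased from the paper rather than quoted verbatim.
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and "
    "their corresponding numerals, and devise a complete plan. Then, let's "
    "carry out the plan, calculate intermediate variables (paying attention "
    "to correct numerical calculation and commonsense), solve the problem "
    "step by step, and show the answer."
)

question = (
    "A bakery sold 24 muffins in the morning and twice as many in the "
    "afternoon. How many muffins were sold in total?"
)

# The trigger slots into the same Q/A template used for Zero-shot-CoT and PS.
prompt = f"Q: {question}\nA: {PS_PLUS_TRIGGER}"
print(prompt)
```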

In the OpenAI playground example below, the question is asked via a very simple prompt with no instruction or guidance for the LLM, and text-davinci-003 returns an incorrect answer.

Below, the PS methodology is followed, yielding the correct result and showing the plan, the solution, and the final conclusion.

In the image below, the PS+ prompting methodology is followed, producing an augmented and more detailed response.
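
Putting the pieces together, the figures above imply a two-stage flow: one call to generate the plan and the step-by-step solution, and a second call to extract the final answer. Below is a minimal sketch, assuming the legacy OpenAI completions endpoint and an answer-extraction phrase along the lines used in the paper.

```python
import openai  # legacy SDK (< 1.0)

openai.api_key = "YOUR_API_KEY"

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def complete(prompt: str) -> str:
    """Single call to the legacy completions endpoint."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=512,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

def plan_and_solve(question: str) -> str:
    # Stage 1: elicit the plan and the step-by-step solution.
    reasoning_prompt = f"Q: {question}\nA: {PS_TRIGGER}"
    reasoning = complete(reasoning_prompt)

    # Stage 2: append the reasoning and ask for just the final answer.
    extraction_prompt = (
        f"{reasoning_prompt}\n{reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return complete(extraction_prompt)

print(plan_and_solve(
    "A bakery sold 24 muffins in the morning and twice as many in the "
    "afternoon. How many muffins were sold in total?"
))
```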

Final Considerations

The number of tokens used for these detailed queries increases significantly, so there is a cost consideration.

Another consideration for PS and especially PS+ is the additional overhead and effort required to design the prompt. From the tests it is clear how sensitive LLMs are to prompt wording and composition.

Lastly, PS and PS+ do address the calculation and missing-step vulnerabilities, but semantic misunderstanding remains. I believe it is possible to address this by supplying a contextual reference within the prompt.
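
As a rough sketch of that idea, one could prepend a grounding passage so the model interprets the question against supplied context; the structure below is my own illustration, not something from the paper.

```python
def build_grounded_prompt(context: str, question: str, trigger: str) -> str:
    """Prepend a contextual reference so the model grounds its
    interpretation of the question in the supplied text."""
    return (
        "Use the following context to interpret the question.\n"
        f"Context: {context}\n\n"
        f"Q: {question}\n"
        f"A: {trigger}"
    )
```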

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
