September 14, 2023 · 4 min read

How Do Large Language Models Use Long Contexts?




COBUS GREYLING

And how to manage the performance and cost of large context input to LLMs.

TL;DR

  1. There is a degradation in LLM performance when large context windows are leveraged.
  2. Offloading complexity to the LLM provider turns the application into a black box, without the ability to granularly manage cost, input and output token use, model performance and context.
  3. A simplistic approach incurs technical debt that will have to be addressed later in the application lifecycle.
  4. Offloading complexity and data management to the LLM also ties the Generative App closely to a specific LLM. Generative Apps can be LLM-agnostic by following a RAG approach.
  5. The ideal scenario is one where the LLM is a utility that does not manage data or hold application complexity.
  6. Via a RAG implementation, use cases demanding large context windows can be managed outside the ambit of the LLM.
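To make point 6 more concrete, here is a minimal sketch of selecting context outside the LLM before a call is made. It assumes that whitespace-separated words approximate tokens, and it uses simple keyword overlap as a stand-in for the embedding similarity a production retriever would typically use. `Chunk`, `select_context` and `build_prompt` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of a larger document, stored outside the LLM (hypothetical structure)."""
    doc_id: str
    text: str


def approx_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word ~ one token.
    # A real system would count with the target model's own tokenizer.
    return len(text.split())


def overlap_score(query: str, chunk: Chunk) -> float:
    # Stand-in for embedding similarity: fraction of query words present in the chunk.
    q = set(query.lower().split())
    c = set(chunk.text.lower().split())
    return len(q & c) / len(q) if q else 0.0


def select_context(query: str, chunks: list[Chunk], token_budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in token_budget."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)
    selected: list[Chunk] = []
    used = 0
    for ch in ranked:
        cost = approx_tokens(ch.text)
        if used + cost <= token_budget:
            selected.append(ch)
            used += cost
    return selected


def build_prompt(query: str, selected: list[Chunk]) -> str:
    """Assemble a prompt containing only the selected chunks."""
    context = "\n\n".join(f"[{ch.doc_id}] {ch.text}" for ch in selected)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


chunks = [
    Chunk("refunds", "Refunds are issued within 14 days of a return"),
    Chunk("shipping", "Standard shipping takes 3 to 5 business days"),
    Chunk("warranty", "The warranty covers manufacturing defects for one year"),
]
picked = select_context("how many days for refunds", chunks, token_budget=10)
prompt = build_prompt("how many days for refunds", picked)
print([ch.doc_id for ch in picked])
```

Because the application decides what enters the prompt, it keeps control over token use and can swap the underlying LLM without changing how its data is managed.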

As seen in the chart below, the context windows of Large Language Models (LLMs) are growing and currently range from 4,000 to 100,000 tokens. This creates a temptation to oversimplify enterprise LLM implementations by leveraging the large context window of the LLM directly and natively.

Source

This avenue is very attractive in the short term, offering favourable time-to-market, cost and solution complexity.

The disadvantages include the LLM becoming a black box, with no operational insight beyond the LLM's input and output.

Model performance substantially decreases as input contexts grow longer. — Source

Cost is also important, since token use is billed for both text input and output. As the token use/cost breakdown in the graph below shows, output token use can be exorbitant.
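A rough per-request estimate shows why token counts matter. The prices below are hypothetical placeholders, not any provider's actual rates; substitute the published per-token pricing of the model you use. The token counts and daily traffic are likewise illustrative assumptions.

```python
# Hypothetical prices in dollars per 1,000 tokens; real rates vary by provider and model.
PRICE_IN_PER_1K = 0.01
PRICE_OUT_PER_1K = 0.03


def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_1k: float = PRICE_IN_PER_1K,
                 price_out_per_1k: float = PRICE_OUT_PER_1K) -> float:
    """Dollar cost of one call, with input and output tokens billed separately."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# Stuffing a whole long document into the prompt versus sending a retrieved subset.
full_context = request_cost(input_tokens=90_000, output_tokens=1_000)
retrieved = request_cost(input_tokens=3_000, output_tokens=500)

calls_per_day = 10_000  # hypothetical traffic
print(f"full context: ${full_context:.3f}/call, ${full_context * calls_per_day:,.2f}/day")
print(f"retrieved:    ${retrieved:.3f}/call, ${retrieved * calls_per_day:,.2f}/day")
```

Under these assumed prices, the full-context call costs roughly twenty times as much as the retrieved one, and the gap scales linearly with request volume.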

These costs create an incentive to truncate the input text and shorten the LLM output. Truncation, however, requires adding complexity to the application, which implementers must accept if they do not want to be completely at the mercy of LLM suppliers.
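Truncation in practice means budgeting tokens explicitly. The sketch below assumes a fixed context window shared by the instructions, the input text and a reserved allowance for the reply, and again uses whitespace-separated words as a rough token proxy. `input_budget` and `truncate_to_budget` are hypothetical helpers; capping the output itself is normally done through the provider's own request settings, which are not shown here.

```python
def input_budget(context_window: int, reserved_output: int, instruction_tokens: int) -> int:
    """Tokens left for document text after reserving room for instructions and the reply."""
    budget = context_window - reserved_output - instruction_tokens
    if budget <= 0:
        raise ValueError("context window too small for the reserved output and instructions")
    return budget


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep the first max_tokens words (whitespace words as a rough token proxy)."""
    words = text.split()
    return " ".join(words[:max_tokens])


budget = input_budget(context_window=4_000, reserved_output=500, instruction_tokens=200)
document = "lorem " * 5_000  # stand-in for a long source document
trimmed = truncate_to_budget(document, budget)
print(budget, len(trimmed.split()))
```

Keeping only the head of a document is the simplest policy, but it can discard exactly the passage a user asks about, which is one reason to select content by relevance rather than truncate blindly.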

Added to these considerations, long-context accuracy has come under scrutiny.

A recent study has found that LLM performance is best when the relevant information is present at the start or end of the input context.

In contrast, performance degrades when the data relevant to the user query sits in the middle of a long context.

Source

The graph below illustrates how accuracy is higher when the relevant information is placed at the beginning or end of the input.

The drop in performance when the relevant data sits in the middle is also visible.

Source
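Given this position effect, one mitigation sometimes applied in retrieval pipelines is to reorder the retrieved documents so that the most relevant ones sit at the start and end of the context. The sketch below illustrates that idea; it is not the study's method or the author's. It assumes the documents already arrive ranked from most to least relevant, for example by a retriever, and `reorder_for_position_bias` is a hypothetical name.

```python
def reorder_for_position_bias(docs_by_relevance: list[str]) -> list[str]:
    """Move the most relevant documents to the edges of the context.

    Expects documents sorted from most to least relevant. Even-indexed items fill
    the front in order; odd-indexed items fill the back in reverse, so the least
    relevant documents end up in the middle.
    """
    front = docs_by_relevance[0::2]
    back = docs_by_relevance[1::2]
    return front + back[::-1]


ranked = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the most relevant
print(reorder_for_position_bias(ranked))
```

With five documents ranked d1 to d5, the result is d1, d3, d5, d4, d2: the two most relevant documents occupy the first and last positions, and the least relevant one lands in the middle.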

Added to this, models with extended context windows do not generally perform better than models with smaller context windows.

Source

The graphs above show different scenarios, contrasting accuracy against the number of documents and the position of the document holding the answer. Again, performance is generally highest when relevant information is positioned at the very start or very end of the context, and it degrades rapidly when models must reason over information in the middle of their input context.
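The kind of experiment behind these graphs can be approximated as follows: place the document containing the answer at each position among distractor documents, then measure accuracy per position. This is a sketch, not the study's code. `ask_model` is a hypothetical callable wrapping whatever LLM is being tested, the example tuple format is assumed, and a reply is assumed to be correct if it contains the expected answer string. The toy `fake_model` only "reads" the first and last documents, so it reproduces the U-shaped pattern by construction rather than by measurement.

```python
from typing import Callable

# Hypothetical example format: (question, gold_document, distractor_documents, answer)
Example = tuple[str, str, list[str], str]


def build_position_prompt(question: str, gold: str, distractors: list[str], gold_position: int) -> str:
    """Insert the answer-bearing document at gold_position (0-based) among the distractors."""
    if not 0 <= gold_position <= len(distractors):
        raise ValueError("gold_position out of range")
    docs = distractors[:gold_position] + [gold] + distractors[gold_position:]
    numbered = "\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"


def accuracy_by_position(examples: list[Example], ask_model: Callable[[str], str]) -> dict[int, float]:
    """Accuracy for each gold-document position; assumes all examples have equally many distractors."""
    n_docs = len(examples[0][2]) + 1
    results: dict[int, float] = {}
    for pos in range(n_docs):
        correct = 0
        for question, gold, distractors, answer in examples:
            reply = ask_model(build_position_prompt(question, gold, distractors, pos))
            # Assumed scoring rule: correct if the reply contains the answer string.
            if answer.lower() in reply.lower():
                correct += 1
        results[pos] = correct / len(examples)
    return results


def fake_model(prompt: str) -> str:
    # Toy stand-in for an LLM that only "reads" the first and last documents.
    docs = [line for line in prompt.splitlines() if line.startswith("Document [")]
    visible = docs[0] + " " + docs[-1]
    return "Paris" if "Paris" in visible else "I don't know"


examples = [(
    "What is the capital of France?",
    "The capital of France is Paris.",
    ["Berlin is in Germany.", "Rome is in Italy.", "Madrid is in Spain."],
    "Paris",
)]
scores = accuracy_by_position(examples, fake_model)
print(scores)
```

Replacing `fake_model` with a call to a real model, and using many examples, turns this into a simple probe of how a given LLM handles information in the middle of its context.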

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
