Context Compression
Also known as: Prompt Compression, Context Condensation
Techniques for reducing the token count of context provided to language models while preserving the most essential information, enabling more efficient use of limited context windows.
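One simple family of such techniques is extractive, query-aware filtering. It keeps only the sentences most relevant to the question and stays within a token budget. The sketch below is illustrative and is not the method of any particular product. It makes three assumptions. Relevance is scored by naive word overlap with the query. Sentences are split on terminal punctuation. Tokens are approximated by whitespace-separated words, whereas real models use subword tokenizers, so their counts are usually higher. The function names `approx_token_count` and `compress_context` are hypothetical.

```python
import re


def approx_token_count(text: str) -> int:
    # Rough proxy for tokens: whitespace-separated words (an assumption;
    # real models use subword tokenizers).
    return len(text.split())


def compress_context(context: str, query: str, token_budget: int) -> str:
    """Keep the sentences most relevant to `query` within `token_budget`."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def relevance(sentence: str) -> int:
        # Score = number of distinct query words that appear in the sentence.
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # Rank sentence indices by relevance; sorted() is stable, so ties
    # keep their original order even with reverse=True.
    ranked = sorted(range(len(sentences)), key=lambda i: relevance(sentences[i]), reverse=True)

    kept, used = [], 0
    for i in ranked:
        if relevance(sentences[i]) == 0:
            break  # all remaining sentences share no words with the query
        cost = approx_token_count(sentences[i])
        if used + cost <= token_budget:
            kept.append(i)
            used += cost

    # Restore the original order so the compressed context still reads coherently.
    return " ".join(sentences[i] for i in sorted(kept))


context = (
    "The Eiffel Tower is in Paris. It was completed in 1889. "
    "Paris is the capital of France. The tower is 330 metres tall."
)
print(compress_context(context, "How tall is the Eiffel Tower?", token_budget=12))
# → The Eiffel Tower is in Paris. The tower is 330 metres tall.
```

Here the 23-word context shrinks to 12 words, and the sentence that answers the question survives. Production systems typically replace the overlap score with embedding similarity or a learned model. They may also summarize instead of extracting, but the budget-constrained selection step is the same idea.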
Sources & References
Google Cloud
Related Terms
Context Window
The maximum amount of text (measured in tokens) that a large language model can process in a single interaction, encompassing both the input prompt and the generated output. Managing context windows effectively is critical for enterprise AI deployments where complex queries require extensive background information.
Prompt Engineering
The practice of designing, optimizing, and structuring inputs (prompts) to AI language models to elicit desired outputs, including techniques for instruction formatting, context provision, and output specification.
Retrieval-Augmented Generation
A technique that enhances AI model outputs by retrieving relevant information from external knowledge sources and incorporating it into the model's context before generating a response.
Tokens
The basic units of text that language models process, typically representing words, subwords, or characters. Token counts determine context window usage and API costs.
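Taken together, the Context Window and Tokens entries imply a simple budget. The window must hold the instructions, any supplied context, and the generated output. So the room left for context is what remains after reserving space for the other two, and compression is needed only when the raw context exceeds that room. The sketch below shows this arithmetic. The function names and the window, instruction, and output sizes in the usage line are hypothetical. Token counts are assumed to come from the target model's own tokenizer.

```python
def context_budget(window_tokens: int, instruction_tokens: int, max_output_tokens: int) -> int:
    """Tokens left for supplied context (e.g. retrieved documents).

    The window covers input and output, so the output allowance is
    reserved up front alongside the instructions.
    """
    remaining = window_tokens - instruction_tokens - max_output_tokens
    if remaining < 0:
        raise ValueError("instructions plus reserved output already exceed the window")
    return remaining


def needs_compression(context_tokens: int, budget: int) -> bool:
    # Compression is only required when the raw context overflows the budget.
    return context_tokens > budget


def keep_fraction(context_tokens: int, budget: int) -> float:
    # Fraction of the original context that can be kept (1.0 means no cut needed).
    return min(1.0, budget / context_tokens)


# Hypothetical sizes: 8,192-token window, 500-token instructions, 1,024 reserved for output.
budget = context_budget(8192, 500, 1024)        # 6668 tokens available for context
print(budget, needs_compression(20000, budget), keep_fraction(20000, budget))
```

With 20,000 tokens of retrieved material, only about a third fits, so roughly two thirds must be removed or condensed before the request is sent.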