AI Safety 2 min read

Hallucination

Also known as: AI Hallucination, Confabulation, Model Hallucination

When an AI model generates information that sounds plausible but is factually incorrect, fabricated, or not supported by its training data or provided context.

Overview

Hallucination is one of the most significant challenges in deploying AI systems. Language models generate text by predicting probable next tokens based on patterns learned during training. When the model lacks sufficient context or encounters ambiguous queries, it may generate responses that appear authoritative and well-structured but contain fabricated facts, false citations, or invented details.
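To see why fluent output need not be grounded output, consider a toy next-token predictor. This is a minimal sketch, not a real language model: the tokens and probabilities are invented for illustration, and greedy decoding stands in for the sampling used in practice.

```python
# Toy "language model": maps a token to candidate next tokens with
# probabilities. All entries here are invented for illustration.
BIGRAMS = {
    "the":      [("capital", 0.6), ("study", 0.4)],
    "capital":  [("of", 1.0)],
    "of":       [("atlantis", 0.6), ("france", 0.4)],
    "atlantis": [("is", 1.0)],
    "france":   [("is", 1.0)],
    "is":       [("paris", 0.7), ("unknown", 0.3)],
}

def generate(start: str, max_tokens: int = 6) -> list[str]:
    """Greedily pick the most probable next token at each step."""
    tokens = [start]
    for _ in range(max_tokens):
        candidates = BIGRAMS.get(tokens[-1])
        if not candidates:
            break
        # Greedy decoding: the highest-probability continuation wins,
        # whether or not it is factually grounded.
        next_token = max(candidates, key=lambda pair: pair[1])[0]
        tokens.append(next_token)
    return tokens

print(" ".join(generate("the")))  # fluent, well-formed, and fabricated
```

The output reads as a confident factual sentence, yet every step was chosen purely by probability; nothing in the generation loop checks the claim against reality.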

Types of Hallucinations

Intrinsic Hallucination

The model generates output that contradicts the source material provided in its context. For example, when summarizing a document, the model might include claims not present in the original text.

Extrinsic Hallucination

The model generates information that cannot be verified or contradicted from the source material — it introduces new, unverifiable claims.
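A crude way to surface both kinds of hallucination is to check whether a summary sentence's content words actually appear in the source. The sketch below is a deliberately naive heuristic, not a production fact-checker: real systems use entailment models or embedding similarity, and the word-length cutoff and overlap threshold here are arbitrary assumptions.

```python
import re

def content_words(text: str) -> set[str]:
    """Lowercased words of length >= 4, a crude proxy for content words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}

def flag_unsupported(summary_sentences: list[str], source: str,
                     min_overlap: float = 0.5) -> list[str]:
    """Return summary sentences whose content words mostly do not
    appear in the source, i.e. candidate hallucinations."""
    source_words = content_words(source)
    flagged = []
    for sentence in summary_sentences:
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

source = "The report covers quarterly revenue growth in Europe."
summary = [
    "The report covers quarterly revenue growth.",    # supported by source
    "The company plans massive layoffs next month.",  # appears nowhere in source
]
print(flag_unsupported(summary, source))
```

A flagged sentence is not automatically a hallucination, only a claim the source does not obviously support; the point is that unsupported claims can be detected mechanically rather than by trusting the model's fluency.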

Why Models Hallucinate

  • Insufficient Context: The model lacks the specific information needed to answer accurately
  • Pattern Completion: The model fills in gaps with statistically likely but factually wrong information
  • Training Data Contradictions: Conflicting information in training data creates uncertainty
  • Overconfidence: Models are trained to always produce an answer, even when they should express uncertainty
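The overconfidence point above can be addressed by letting a system abstain when probability mass is spread thinly across candidates. The following is a simplified sketch with hand-picked logits; real calibration works on the model's actual output distribution, and the 0.6 threshold is an assumption for illustration.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_abstain(candidates: list[str], logits: list[float],
                      threshold: float = 0.6) -> str:
    """Return the top candidate only when probability mass concentrates
    on it; otherwise abstain instead of guessing."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "I'm not sure."
    return candidates[best]

# One logit dominates: answer confidently.
print(answer_or_abstain(["Paris", "Lyon"], [4.0, 0.0]))
# Near-uniform logits: abstain rather than fabricate.
print(answer_or_abstain(["1912", "1913"], [0.1, 0.0]))
```

Without the threshold check, the second call would still return "1912" with near coin-flip confidence, which is exactly the always-produce-an-answer failure mode described above.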

Context Management as a Mitigation Strategy

Effective context management is the primary defense against hallucination. By providing accurate, relevant, and comprehensive context through techniques like retrieval-augmented generation (RAG), organizations can ground model responses in verified information.
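The grounding idea can be sketched in a few lines: retrieve the documents most relevant to the query, then build a prompt that restricts the model to that context. This is a minimal illustration, assuming a word-overlap retriever as a stand-in for a real embedding-based one; the document texts and prompt wording are invented.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for a
    real embedding-based retriever)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(query: str, documents: list[str]) -> str:
    """Build a prompt that confines the model to retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Canada takes 5-7 business days.",
]
print(grounded_prompt("what is the refund window", docs))
```

The prompt does two things at once: it supplies the verified facts the model lacks, and it gives the model explicit permission to say the answer is absent, attacking both the insufficient-context and overconfidence causes listed earlier.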

  • Source Grounding: Always providing verified source documents for the model to reference
  • Citation Requirements: Instructing models to cite sources and only state claims supported by provided context
  • Context Verification: Cross-referencing model outputs against the provided context to detect unsupported claims
  • Confidence Calibration: Training models to express uncertainty when context is insufficient
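The citation and verification points above can be automated in their simplest form: if the model is instructed to quote its sources, each quoted span can be checked verbatim against the provided context. This sketch assumes exact-match quoting, which is stricter than real systems need; the example answer and context are invented.

```python
import re

def verify_citations(answer: str, context: str) -> dict[str, bool]:
    """Check that every double-quoted span in the answer appears
    verbatim in the provided context."""
    quotes = re.findall(r'"([^"]+)"', answer)
    return {q: q in context for q in quotes}

context = "The policy allows a 30-day refund. Items must be unused."
answer = ('Refunds are possible: "The policy allows a 30-day refund". '
          'Also, "gift cards are refundable".')
print(verify_citations(answer, context))
```

The second quote fails verification because it appears nowhere in the context, so it can be flagged or stripped before the response reaches a user; fabricated citations are among the most common and most checkable hallucinations.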