Knowledge Base
Also known as: KB, Knowledge Repository, Knowledge Graph
A structured repository of information, facts, and relationships used by AI systems as a source of context and ground truth for answering queries and making decisions.
Overview
A knowledge base in the context of AI systems is an organized collection of information that serves as the ground truth for AI-powered applications. Unlike the parametric knowledge encoded in a model's weights during training, a knowledge base holds explicit, updatable, and verifiable information that can be retrieved and inserted into the model's context at runtime.
Types of Knowledge Bases
Document Stores
Collections of unstructured or semi-structured documents (PDFs, web pages, manuals) that are indexed for retrieval. This is the most common form of knowledge base for retrieval-augmented generation (RAG) systems.
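To make "indexed for retrieval" concrete, here is a minimal sketch of a document store that ranks documents by term overlap with a query. The DocumentStore class, its methods, and the sample documents are hypothetical and exist only for illustration; production systems typically rank with BM25 or embedding similarity rather than raw term counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentStore:
    """Hypothetical in-memory store: one term-frequency index per document."""

    def __init__(self) -> None:
        self.term_counts: dict[str, Counter] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.term_counts[doc_id] = Counter(tokenize(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return up to k document ids, best match first."""
        terms = tokenize(query)
        scores = {
            doc_id: sum(counts[t] for t in terms)
            for doc_id, counts in self.term_counts.items()
        }
        # Drop documents with no overlap; break ties by id for stable output.
        hits = [d for d in scores if scores[d] > 0]
        return sorted(hits, key=lambda d: (-scores[d], d))[:k]


store = DocumentStore()
store.add("manual", "Reset the router by holding the reset button for ten seconds.")
store.add("policy", "Refunds are issued within thirty days of purchase.")
store.add("faq", "The router supports dual band wireless.")

results = store.search("how do I reset the router")
print(results)
```

Common words such as "the" contribute to the score in this sketch, which is one reason real retrievers weight terms or use embeddings instead.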
Knowledge Graphs
Structured representations of entities and their relationships, stored as nodes and edges in a graph database. Knowledge graphs excel at representing complex relationships and enabling multi-hop reasoning.
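As an illustration of multi-hop reasoning, the sketch below stores a graph as subject-relation-object triples in memory and follows a chain of relations from a starting entity. The KnowledgeGraph class, the relation names, and the sample facts are all invented for this example; a real deployment would use a graph database and its query language.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical in-memory graph: subject -> list of (relation, object)."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def follow(self, start: str, relations: list[str]) -> set[str]:
        """Follow the given relations in order, one hop per relation."""
        frontier = {start}
        for relation in relations:
            frontier = {
                obj
                for node in frontier
                for rel, obj in self.edges.get(node, [])
                if rel == relation
            }
        return frontier


kg = KnowledgeGraph()
kg.add("Ada", "works_in", "Platform Team")
kg.add("Platform Team", "part_of", "Engineering")
kg.add("Engineering", "led_by", "Grace")

# Three hops: who leads the department that Ada's team belongs to?
leaders = kg.follow("Ada", ["works_in", "part_of", "led_by"])
print(leaders)
```

No single stored fact links Ada to Grace; the answer emerges only by chaining three edges, which is the kind of query a flat document index handles poorly.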
FAQ and Curated Databases
Hand-crafted collections of question-answer pairs or structured data entries, often used for customer support and domain-specific applications.
Building Effective Knowledge Bases
- Content Curation: Selecting high-quality, authoritative, and up-to-date information
- Chunking Strategy: Breaking documents into appropriately sized pieces for retrieval
- Metadata Enrichment: Adding metadata (source, date, topic, access level) to enable filtering
- Version Control: Tracking changes to knowledge base content over time
- Quality Assurance: Regular review and updating of stored information
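The chunking and metadata steps above can be sketched together: split a document into overlapping word windows, attach metadata to each chunk, then filter on that metadata before retrieval. The Chunk class, the chunk_words function, the window sizes, and the metadata keys are illustrative assumptions; suitable chunk sizes depend on the content and the retriever.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_words(text: str, size: int = 50, overlap: int = 10, **metadata) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start:start + size])
        chunks.append(Chunk(piece, {**metadata, "position": len(chunks)}))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks


internal_text = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(internal_text, size=5, overlap=2,
                     source="handbook.pdf", access_level="internal")
chunks += chunk_words("Public pricing overview", size=5, overlap=2,
                      source="pricing.html", access_level="public")

# Metadata filtering: keep only chunks the current user may see.
public_chunks = [c for c in chunks if c.metadata["access_level"] == "public"]
print(len(chunks), [c.text for c in public_chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some words twice.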
Context Management Role
The knowledge base is the primary source of external context for enterprise AI systems. Its quality, organization, and accessibility largely determine the quality of AI responses, because a model cannot ground an answer in information it was never given. Context management systems must efficiently select, retrieve, and format knowledge base content to make the best use of the AI system's limited context window.
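The select-and-format step can be sketched as a greedy packer that walks retrieved chunks in rank order and keeps those that fit a token budget. The build_context function, the whitespace-based token estimate, and the sample chunks are assumptions for illustration; a real system would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def build_context(ranked_chunks: list[tuple[str, str]], budget: int) -> str:
    """Pack (source, text) pairs, best first, into at most `budget` tokens."""
    selected: list[str] = []
    used = 0
    for source, text in ranked_chunks:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # too large; a smaller, lower-ranked chunk may still fit
        selected.append(f"[{source}] {text}")  # label each chunk with its source
        used += cost
    return "\n".join(selected)


ranked = [
    ("manual", "Hold the reset button for ten seconds."),
    ("faq", "The router supports dual band wireless and guest networks."),
    ("policy", "Refunds take thirty days."),
]
context = build_context(ranked, budget=12)
print(context)
```

Skipping an oversized chunk rather than stopping at it trades strict rank order for fuller use of the window; which behavior is preferable depends on the application. The source labels are not counted against the budget in this sketch.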
Related Terms
Embeddings
Dense numerical vector representations of data (text, images, audio) that capture semantic meaning, enabling similarity comparisons and machine learning operations in a continuous vector space.
Retrieval-Augmented Generation
A technique that enhances AI model outputs by retrieving relevant information from external knowledge sources and incorporating it into the model's context before generating a response.
Semantic Search
A search methodology that understands the contextual meaning and intent behind a query rather than matching exact keywords, using embeddings and vector similarity to find semantically relevant results.
Vector Database
A specialized database designed to store, index, and query high-dimensional vector embeddings, enabling efficient similarity search used in RAG systems and AI applications.