Build Your First Context Management System: Step-by-Step

Overview

Context management is the backbone of any AI application that needs to maintain awareness across interactions. Whether you are building a customer support chatbot, a personalized recommendation engine, or an internal knowledge assistant, your system needs a reliable way to store, retrieve, and serve relevant context to AI models. This guide walks you through building a complete context management system from scratch, covering database design, API architecture, storage patterns, and AI integration. By the end, you will have a production-ready foundation that you can extend as your requirements grow.

A well-designed context management system can reduce AI hallucinations by grounding responses in relevant, verified information rather than relying solely on a model's training data.

Prerequisites

Before you begin, ensure you have the following in place:

Basic understanding of REST APIs and HTTP methods (GET, POST, PUT, DELETE)
Familiarity with Python 3.9+ (examples use Python, but concepts apply to any language)
PostgreSQL 15+ installed and running (our recommended database for context storage)
An AI model API key from OpenAI, Anthropic, or a similar provider for testing integration
Docker installed if you want to containerize your setup (see our Docker deployment guide)

Architecture Decision: Choosing Your Stack

Before writing any code, you need to make foundational decisions about your technology stack. The choices you make here will affect scalability, maintainability, and performance for months or years to come.

Database Selection

Your database is the most critical component. Here is how the main contenders compare for context management workloads:

Database	Best For	JSON Support	Vector Search	Scalability	Operational Complexity
PostgreSQL	General purpose, JSONB queries	Excellent (JSONB)	Via pgvector extension	Vertical + read replicas	Low
MongoDB	Schema-flexible documents	Native	Atlas Vector Search	Horizontal (sharding)	Medium
Redis	Caching, real-time access	Via RedisJSON	Via RediSearch	Redis Cluster	Low-Medium
Elasticsearch	Full-text search	Native	Dense vector fields	Horizontal (sharding)	High

We recommend PostgreSQL for most teams starting out. It offers excellent JSONB support for flexible context storage, the pgvector extension for future vector search capabilities, and a mature ecosystem of tools and libraries. You can always add specialized stores later as your needs evolve.

API Framework

For Python, FastAPI provides excellent performance, automatic OpenAPI documentation, and async support out of the box. Flask is simpler; it has accepted async views since version 2.0 but remains a WSGI framework, so it is a weaker fit for heavily concurrent I/O. For Node.js teams, Express or Fastify are solid choices.

Step 1: Design Your Context Schema

A well-designed schema is the foundation of your entire system. Start by defining what context you will store. A minimal but extensible schema includes: user identifier, context type, content payload, timestamps, and flexible metadata.

CREATE TABLE contexts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  context_type VARCHAR(50) NOT NULL,
  content JSONB NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  expires_at TIMESTAMP WITH TIME ZONE,
  metadata JSONB DEFAULT '{}',
  is_active BOOLEAN DEFAULT true,
  version INTEGER DEFAULT 1
);

-- Essential indexes for common query patterns
CREATE INDEX idx_contexts_user_type ON contexts(user_id, context_type);
CREATE INDEX idx_contexts_created ON contexts(created_at DESC);
CREATE INDEX idx_contexts_active ON contexts(is_active) WHERE is_active = true;
CREATE INDEX idx_contexts_metadata ON contexts USING GIN(metadata);

Schema Design Principles

Use UUIDs for identifiers to avoid sequential ID enumeration attacks and simplify distributed deployments
Store content as JSONB rather than TEXT to enable querying within context content without application-level parsing
Include an expires_at column so reads can filter out stale context and a scheduled job can purge it, which supports retention policies and GDPR compliance requirements
Add a version column for optimistic concurrency control and context versioning
Use soft deletes (is_active flag) rather than hard deletes for audit compliance
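
The expires_at principle only takes effect if something acts on the column. The following is a minimal sketch of a periodic purge, assuming an asyncpg-style pool whose execute method returns a status string such as "DELETE 3". The names purge_expired and purge_forever, the one-hour interval, and the choice to hard-delete expired rows are illustrative assumptions, not part of the schema above.

```python
import asyncio
from typing import Any

# Removes rows whose expiry has passed. Whether expired context should be
# hard-deleted or only deactivated depends on your retention policy (assumption).
PURGE_SQL = """
    DELETE FROM contexts
    WHERE expires_at IS NOT NULL AND expires_at <= NOW()
"""

async def purge_expired(pool: Any) -> int:
    """Delete expired rows and return how many were removed."""
    status = await pool.execute(PURGE_SQL)  # asyncpg returns e.g. "DELETE 3"
    return int(status.rsplit(" ", 1)[-1])

async def purge_forever(pool: Any, interval_seconds: float = 3600.0) -> None:
    """Run the purge on a fixed interval until the task is cancelled."""
    while True:
        await purge_expired(pool)
        await asyncio.sleep(interval_seconds)
```

If audit rules require keeping expired rows, the same pattern could run an UPDATE that sets is_active = false instead of a DELETE.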

Step 2: Build the Storage Layer

The storage layer abstracts database operations behind a clean interface. This separation lets you swap databases or add caching without changing your API code.

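import asyncpg
import json
from uuid import UUID
from datetime import datetime
from typing import Optional, List, Dict, Any

class ConflictError(Exception):
    """Raised when an update's expected_version no longer matches the row."""

class ContextStore: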
    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool

    async def create(self, user_id: UUID, context_type: str,
                     content: Dict[str, Any],
                     metadata: Optional[Dict] = None,
                     expires_at: Optional[datetime] = None) -> Dict:
        query = """
            INSERT INTO contexts (user_id, context_type, content,
                                  metadata, expires_at)
            VALUES ($1, $2, $3, $4, $5)
            RETURNING id, user_id, context_type, content,
                      created_at, metadata, version
        """
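        # asyncpg sends JSONB parameters as text by default, so serialize
        # the dictionaries before binding them.
        row = await self.pool.fetchrow(
            query, user_id, context_type,
            json.dumps(content),
            json.dumps(metadata or {}),
            expires_at
        )
        # JSONB columns in the returned row are JSON strings unless a
        # custom codec is registered on the connection.
        return dict(row)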

    async def get_by_user(self, user_id: UUID,
                          context_type: Optional[str] = None,
                          limit: int = 50) -> List[Dict]:
        if context_type:
            query = """
                SELECT * FROM contexts
                WHERE user_id = $1 AND context_type = $2
                  AND is_active = true
                  AND (expires_at IS NULL OR expires_at > NOW())
                ORDER BY created_at DESC LIMIT $3
            """
            rows = await self.pool.fetch(query, user_id,
                                         context_type, limit)
        else:
            query = """
                SELECT * FROM contexts
                WHERE user_id = $1 AND is_active = true
                  AND (expires_at IS NULL OR expires_at > NOW())
                ORDER BY created_at DESC LIMIT $2
            """
            rows = await self.pool.fetch(query, user_id, limit)
        return [dict(r) for r in rows]

    async def update(self, context_id: UUID,
                     content: Dict[str, Any],
                     expected_version: int) -> Dict:
        query = """
            UPDATE contexts
            SET content = $2, updated_at = NOW(),
                version = version + 1
            WHERE id = $1 AND version = $3 AND is_active = true
            RETURNING *
        """
        row = await self.pool.fetchrow(
            query, context_id,
            json.dumps(content), expected_version
        )
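        if not row:
            # No row matched: the id is unknown, the row is inactive,
            # or another writer already bumped the version.
            raise ConflictError(
                "Context not found or modified by another process")
        return dict(row)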

    async def soft_delete(self, context_id: UUID) -> bool:
        query = """
            UPDATE contexts SET is_active = false,
              updated_at = NOW()
            WHERE id = $1 AND is_active = true
        """
        result = await self.pool.execute(query, context_id)
        return result == "UPDATE 1"
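
One caveat with the methods above: unless a custom type codec is registered, asyncpg returns jsonb columns as JSON text, so content and metadata in the returned dictionaries are strings. A small helper, sketched here under the hypothetical name row_to_dict, could decode them before they leave the storage layer.

```python
import json
from typing import Any, Dict, Mapping

# Columns declared as JSONB in the contexts table.
JSON_COLUMNS = ("content", "metadata")

def row_to_dict(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy a database row into a dict, decoding JSONB columns that arrive as text."""
    record = dict(row)
    for column in JSON_COLUMNS:
        value = record.get(column)
        if isinstance(value, str):
            record[column] = json.loads(value)
    return record
```

Replacing dict(row) with row_to_dict(row) in each method would then give API clients nested JSON rather than an escaped string.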

Always use connection pooling in production. A single connection per request will exhaust your database connections under load. Libraries like asyncpg provide built-in connection pools that handle this efficiently.
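
As a rough sketch of what that can look like, the helper below gathers pool settings from environment variables. The variable names DATABASE_URL, DB_POOL_MIN, and DB_POOL_MAX and the default sizes are assumptions for illustration; dsn, min_size, and max_size are keyword arguments that asyncpg.create_pool accepts.

```python
import os
from typing import Any, Dict, Mapping

def pool_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for asyncpg.create_pool from the environment."""
    return {
        "dsn": env["DATABASE_URL"],
        "min_size": int(env.get("DB_POOL_MIN", "2")),
        "max_size": int(env.get("DB_POOL_MAX", "10")),
    }

async def open_pool() -> Any:
    # Imported lazily so the settings helper works without the driver installed.
    import asyncpg
    return await asyncpg.create_pool(**pool_settings())
```

When sizing the pool, keep max_size multiplied by the number of application instances below the max_connections limit of your PostgreSQL server.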

Step 3: Create the API Layer

Expose context operations through a REST API. Use FastAPI for automatic request validation, OpenAPI documentation, and high-performance async request handling.

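import os
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI, HTTPException, Depends, Request
from pydantic import BaseModel
from uuid import UUID
from typing import Optional, Dict, Any

# ContextStore comes from the storage layer in Step 2.

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open one shared pool at startup and close it at shutdown.
    # DATABASE_URL is an assumed environment variable name.
    app.state.pool = await asyncpg.create_pool(os.environ["DATABASE_URL"])
    yield
    await app.state.pool.close()

app = FastAPI(title="Context Management API", lifespan=lifespan)

def get_store(request: Request) -> ContextStore:
    # Dependency that wraps the shared pool in a ContextStore.
    return ContextStore(request.app.state.pool)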

class CreateContextRequest(BaseModel):
    user_id: UUID
    context_type: str
    content: Dict[str, Any]
    metadata: Optional[Dict] = None

@app.post("/contexts", status_code=201)
async def create_context(req: CreateContextRequest,
                         store: ContextStore = Depends(get_store)):
    result = await store.create(
        user_id=req.user_id,
        context_type=req.context_type,
        content=req.content,
        metadata=req.metadata
    )
    return result

@app.get("/contexts/{user_id}")
async def get_user_contexts(user_id: UUID,
                            context_type: Optional[str] = None,
                            limit: int = 50,
                            store: ContextStore = Depends(get_store)):
    return await store.get_by_user(user_id, context_type, limit)

API Best Practices

Paginate list endpoints, preferring cursor-based over offset-based pagination so that results stay consistent while new rows are being inserted
Add request validation with Pydantic models to reject malformed input before it reaches your storage layer
Use proper HTTP status codes: 201 for creation, 404 for missing resources, 409 for version conflicts
Include rate limiting to protect your database from abuse (see rate limiting strategies)
Return consistent error responses with error codes, messages, and request IDs for debugging
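
To make the pagination advice concrete, here is one possible sketch of an opaque cursor built from the sort key (created_at, id). The helper names encode_cursor and decode_cursor and the query text are illustrative assumptions; the query relies on PostgreSQL row-value comparison and mirrors the filters used in get_by_user.

```python
import base64
import json
from datetime import datetime
from typing import Tuple
from uuid import UUID

def encode_cursor(created_at: datetime, row_id: UUID) -> str:
    """Pack the last row's sort key into an opaque, URL-safe token."""
    payload = json.dumps({"created_at": created_at.isoformat(),
                          "id": str(row_id)})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(cursor: str) -> Tuple[datetime, UUID]:
    """Recover the sort key from a token produced by encode_cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return datetime.fromisoformat(data["created_at"]), UUID(data["id"])

# Fetch the page that follows the cursor. The id column breaks ties between
# rows that share a created_at value, so no row is skipped or repeated.
PAGE_QUERY = """
    SELECT * FROM contexts
    WHERE user_id = $1 AND is_active = true
      AND (expires_at IS NULL OR expires_at > NOW())
      AND (created_at, id) < ($2, $3)
    ORDER BY created_at DESC, id DESC
    LIMIT $4
"""
```

A list endpoint would return encode_cursor for the last row of each page, and the client would send it back to request the next page. Treat client-supplied cursors as untrusted input and return a 400 response if decoding fails.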

Step 4: Integrate with AI Models

The final step connects your context system to an AI model. Fetch relevant context, format it appropriately, and include it in prompts. This is where your context management system proves its value.

async def generate_response(user_id: UUID, user_message: str,
                            store: ContextStore):
    # 1. Retrieve relevant context
    contexts = await store.get_by_user(
        user_id, context_type="conversation", limit=10
    )
    profile = await store.get_by_user(
        user_id, context_type="profile", limit=1
    )

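    # 2. Format context for the model.
    #    format_contexts is a helper you supply (see Context Formatting Tips).
    context_text = format_contexts(contexts, profile)

    # 3. Build the prompt with context
    messages = [
        {"role": "system",
         "content": f"Use this context: {context_text}"},
        {"role": "user", "content": user_message}
    ]

    # 4. Call the AI model.
    #    ai_client stands in for your provider's SDK; the chat() call and the
    #    response.content attribute are placeholders, not a real API.
    response = await ai_client.chat(messages=messages)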

    # 5. Store the interaction as new context
    await store.create(
        user_id=user_id,
        context_type="conversation",
        content={"role": "assistant",
                 "message": response.content,
                 "user_message": user_message}
    )
    return response

Context Formatting Tips

How you format context for the AI model significantly impacts response quality. Keep these principles in mind:

Prioritize recency: More recent context should appear closer to the user's message
Summarize when necessary: If context volume exceeds your model's optimal window, summarize older entries
Include metadata: Timestamps and context types help the model understand what it is looking at
Separate concerns: Profile context, conversation history, and domain knowledge should be clearly delineated
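
The tips above can be sketched as a small formatter. This is one possible shape for the format_contexts helper used in Step 4, not a definitive implementation: the section headings, the line format, and the tolerance for content arriving as either a dict or JSON text are assumptions.

```python
import json
from typing import Any, Dict, List

def _as_dict(value: Any) -> Dict[str, Any]:
    """Accept JSONB content either as a dict or as JSON text."""
    return json.loads(value) if isinstance(value, str) else dict(value or {})

def format_contexts(contexts: List[Dict[str, Any]],
                    profile: List[Dict[str, Any]]) -> str:
    """Render profile and conversation rows as clearly separated sections."""
    sections = []
    if profile:
        body = json.dumps(_as_dict(profile[0]["content"]), sort_keys=True)
        sections.append("## User profile\n" + body)

    # Oldest first, so the most recent turn sits closest to the user's message.
    history = sorted(contexts, key=lambda row: row["created_at"])
    lines = []
    for row in history:
        content = _as_dict(row["content"])
        stamp = row["created_at"].isoformat()  # timestamp as metadata
        if "user_message" in content:
            lines.append(f"[{stamp}] user: {content['user_message']}")
        role = content.get("role", "unknown")
        lines.append(f"[{stamp}] {role}: {content.get('message', '')}")
    if lines:
        sections.append("## Conversation history\n" + "\n".join(lines))
    return "\n\n".join(sections)
```

Summarizing older entries is left out here; a natural extension would be to replace all but the last few history lines with a stored summary row.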

For advanced context formatting strategies, see our guide on effective context windows for LLMs and prompt engineering with dynamic context.

Step 5: Add Observability

Before going to production, add monitoring so you can understand how your system is performing. Track these key metrics:

Context retrieval latency (P50, P95, P99)
Context store size per user and overall
API request rates and error rates
AI response quality scores (if you have evaluation in place)

Set up structured logging for every context operation. Include the user ID, context type, operation, and duration in each log entry. This data will be invaluable when debugging issues or optimizing performance.
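
A minimal sketch of such a log entry, using only the standard library. The logger name context_store, the helper name log_operation, and the field names are assumptions; adapt them to whatever your log pipeline expects.

```python
import json
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("context_store")

@contextmanager
def log_operation(operation: str, user_id: str,
                  context_type: str) -> Iterator[None]:
    """Emit one JSON log line per context operation, including its duration."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        logger.info(json.dumps({
            "operation": operation,
            "user_id": user_id,
            "context_type": context_type,
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
        }))
```

Inside an async handler, the call site could look like: with log_operation("get_by_user", str(user_id), "conversation"): rows = await store.get_by_user(user_id). The same duration values can feed the latency percentiles listed above.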

Frequently Asked Questions

How much context should I store per user?

Start with the last 50-100 interactions and a single profile document per user. Monitor your storage growth and AI response quality to determine the right retention window. Most applications find that context older than 30 days has diminishing returns unless it contains critical reference information.

Should I use a relational database or a NoSQL database for context storage?

For most teams, PostgreSQL with JSONB columns offers the best of both worlds: flexible schema-free content storage with the reliability, ACID compliance, and query power of a relational database. If you anticipate storing billions of context documents with simple access patterns, a document database like MongoDB may be more appropriate. See our guide on scalable context store patterns for more architectural options.

How do I handle context for multiple AI models?

Store context in a model-agnostic format (plain text or structured JSON) and add a formatting layer that transforms context into the format each model expects. This way, switching or adding models does not require changes to your storage layer. Our multi-model context orchestration guide covers this topic in depth.
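
As an illustration of such a formatting layer, the sketch below turns one neutral representation into two provider-style payloads. The function names, the Turn shape, and the FORMATTERS registry are hypothetical. The payload shapes follow the common convention that OpenAI-style chat requests carry the system prompt as the first message while Anthropic-style requests carry it as a separate top-level field; required fields such as the model name are omitted, so check your provider's current API reference before relying on them.

```python
from typing import Any, Callable, Dict, List

# Neutral stored form: {"role": "user" | "assistant", "text": "..."}
Turn = Dict[str, str]

def to_openai_chat(context_text: str, turns: List[Turn]) -> Dict[str, Any]:
    """OpenAI-style chat payload: the system prompt is the first message."""
    messages = [{"role": "system", "content": context_text}]
    messages += [{"role": t["role"], "content": t["text"]} for t in turns]
    return {"messages": messages}

def to_anthropic_messages(context_text: str,
                          turns: List[Turn]) -> Dict[str, Any]:
    """Anthropic-style payload: the system prompt is a top-level field."""
    return {
        "system": context_text,
        "messages": [{"role": t["role"], "content": t["text"]} for t in turns],
    }

# Adding a model means adding one formatter; storage stays unchanged.
FORMATTERS: Dict[str, Callable[[str, List[Turn]], Dict[str, Any]]] = {
    "openai": to_openai_chat,
    "anthropic": to_anthropic_messages,
}
```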

What is the most common mistake when building a first context system?

Over-engineering the initial design. Start with the simple schema shown in this guide, get it working end-to-end with your AI model, and then iterate based on real usage patterns. Adding caching, vector search, and advanced features is much easier when you have a solid, simple foundation to build on.
