Context Caching with Redis: Setup & Best Practices

Why Cache Context

Every AI request that retrieves context from a database adds latency. A typical PostgreSQL query takes 5-20ms, and when your pipeline makes multiple context lookups per request, those milliseconds add up quickly. Caching frequently accessed context in Redis brings retrieval time for cached entries down to sub-millisecond levels, a difference that becomes critical in real-time applications where users expect instant responses.

But caching is not just about speed. It also protects your primary database from read amplification during traffic spikes. A well-implemented cache layer can absorb 80-95% of context reads, dramatically reducing load on your database and enabling your system to handle bursts of traffic without provisioning additional database capacity.

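In production context management systems, a properly tuned Redis cache typically achieves a 90%+ hit rate for active user context. Each hit is served in under 1ms instead of roughly 15ms from the database, and because only the misses reach the database, read load drops by about an order of magnitude. The average latency across all requests depends on the hit rate, since every miss still pays the full database cost.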
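
To see how hit rate translates into average latency, a back-of-the-envelope model helps. The sketch below is illustrative only: the function name is ours, the 0.5ms and 15ms figures are assumptions rather than measurements, and it assumes a miss pays for both the failed cache lookup and the database query.

```python
def expected_latency_ms(hit_rate: float,
                        cache_ms: float = 0.5,
                        db_ms: float = 15.0) -> float:
    """Average retrieval latency under the cache-aside pattern.

    A hit costs one cache lookup. A miss costs the cache lookup
    plus the database query. All default figures are assumptions.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ms + miss_rate * (cache_ms + db_ms)


# Compare a few hit rates side by side.
for rate in (0.0, 0.70, 0.90, 0.99):
    print(f"hit rate {rate:.0%}: "
          f"{expected_latency_ms(rate):.2f} ms average")
```

Under these assumptions a 90% hit rate averages about 2ms per lookup. Individual hits are still sub-millisecond, and the average approaches that figure only as the hit rate nears 100%.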

Caching Strategy: What to Cache and What Not To

Not all context benefits equally from caching. The key is to cache data that is read frequently, changes infrequently, and is expensive to retrieve. Here is a framework for making caching decisions:

Cache These Context Types

User profile context: Read on every request, changes infrequently, small payload. TTL: 30-60 minutes.
Recent conversation context: Accessed repeatedly during an active session. TTL: 15-30 minutes.
Reference data: Product catalogs, FAQ entries, policy documents. TTL: 1-4 hours.
Computed aggregations: User preference summaries, behavioral patterns. TTL: 1-2 hours.

Do Not Cache These

Rapidly changing context: Real-time sensor data or live metrics with high invalidation rates that negate cache benefits.
Rarely accessed historical context: Low hit rates waste memory. Let these stay in the database.
Very large context collections: Full conversation histories spanning months create memory pressure. Cache summaries instead.
Sensitive PII context: Unless your Redis instance has the same security controls as your database (encryption at rest, access controls). See our context encryption strategies guide.

Redis Architecture for Context Caching

Before writing code, decide on the right Redis deployment topology for your needs:

Topology	Availability	Use Case	Complexity	Cost
Single Instance	No failover	Development, low-traffic apps	Minimal	Low
Sentinel (1 primary + 2 replicas)	Automatic failover	Production with moderate traffic	Medium	Medium
Redis Cluster (6+ nodes)	High availability + horizontal scaling	High traffic, large datasets	High	High
Managed Redis (ElastiCache, Memorystore)	Provider-managed HA	Teams without Redis operations expertise	Low	Medium-High

For most context management systems, start with a Sentinel deployment in production. It provides automatic failover without the operational complexity of a full cluster. Move to Redis Cluster only when you need to store more cached context than fits on a single node (typically beyond 25-50GB).

Step 1: Set Up Redis

Deploy Redis with configuration tuned for caching workloads. Key settings to adjust from defaults:

# redis.conf for context caching

# Memory management
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence (optional for pure cache)
save ""
appendonly no

# Performance tuning
tcp-keepalive 300
timeout 0

# Security
requirepass your-strong-password-here
rename-command FLUSHALL ""
rename-command FLUSHDB ""

The allkeys-lru eviction policy tells Redis to remove the least recently used keys when memory is full, which suits context caching, where recent context is most valuable. If the same instance also holds keys that must never be evicted (stored without a TTL), consider volatile-lru, which only evicts keys that have an expiry set. Be aware that with volatile-lru, writes start failing once memory is full and no expiring keys remain to evict.

Step 2: Implement the Cache Layer

Build a cache layer that sits between your API and your database. The pattern is straightforward: check cache first, fall back to database on miss, populate cache on miss.

import redis.asyncio as redis
import json
from typing import Optional, Dict, Any, List
from datetime import timedelta

class ContextCache:
    # TTL defaults by context type
    TTL_MAP = {
        "profile": timedelta(minutes=60),
        "conversation": timedelta(minutes=15),
        "reference": timedelta(hours=4),
        "preference": timedelta(hours=2),
    }
    DEFAULT_TTL = timedelta(minutes=30)

    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url,
                                     decode_responses=True)

    def _cache_key(self, user_id: str,
                   context_type: str) -> str:
        return f"ctx:{user_id}:{context_type}"

    async def get(self, user_id: str,
                  context_type: str) -> Optional[List[Dict]]:
        """Retrieve cached context. Returns None on miss."""
        key = self._cache_key(user_id, context_type)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        return None

    async def set(self, user_id: str, context_type: str,
                  contexts: List[Dict]):
        """Cache context with type-appropriate TTL."""
        key = self._cache_key(user_id, context_type)
        ttl = self.TTL_MAP.get(context_type, self.DEFAULT_TTL)
        await self.redis.setex(
            key, int(ttl.total_seconds()),
            json.dumps(contexts, default=str)
        )

    async def invalidate(self, user_id: str,
                         context_type: str):
        """Remove cached context after an update."""
        key = self._cache_key(user_id, context_type)
        await self.redis.delete(key)

    async def invalidate_user(self, user_id: str):
        """Remove all cached context for a user."""
        pattern = f"ctx:{user_id}:*"
        cursor = 0
        while True:
            cursor, keys = await self.redis.scan(
                cursor, match=pattern, count=100
            )
            if keys:
                await self.redis.delete(*keys)
            if cursor == 0:
                break

Integrating Cache with the Storage Layer

Wrap your existing storage layer with cache-aware logic. This is the cache-aside (lazy loading) pattern:

class CachedContextStore:
    def __init__(self, store: ContextStore,
                 cache: ContextCache):
        self.store = store
        self.cache = cache

    async def get_by_user(self, user_id: str,
                          context_type: str,
                          limit: int = 50) -> List[Dict]:
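        # 1. Check cache.
        #    Caveat: the cached list holds at most the `limit` used by
        #    the call that populated it, so a later call with a larger
        #    limit can receive fewer items than the database holds.
        cached = await self.cache.get(user_id, context_type)
        if cached is not None:
            return cached[:limit]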

        # 2. Cache miss: fetch from database
        contexts = await self.store.get_by_user(
            user_id, context_type, limit
        )

        # 3. Populate cache
        await self.cache.set(user_id, context_type, contexts)
        return contexts

    async def create(self, user_id: str, context_type: str,
                     content: Dict, **kwargs) -> Dict:
        result = await self.store.create(
            user_id, context_type, content, **kwargs
        )
        # Invalidate after write
        await self.cache.invalidate(user_id, context_type)
        return result

Step 3: Implement Cache Invalidation

Cache invalidation is famously one of the two hard problems in computer science. For context caching, choose the right invalidation strategy based on your consistency requirements:

Write-Through Invalidation

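Invalidate the cache entry whenever the underlying context is modified. This is the simplest and most common approach, and the create method above demonstrates it. Strictly speaking this is invalidate-on-write: a true write-through cache would update the cached value rather than delete it. The approach keeps the cache close to consistent with the database at the cost of a cache miss on the next read after every write. It is not a strict guarantee, because a read that races with the write can repopulate the cache with the old value, so keep TTLs in place as a backstop.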

Pub/Sub Invalidation for Distributed Systems

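When all application instances share one Redis cache, deleting the key once is enough, because every instance sees the deletion. Broadcasting becomes necessary when instances also hold their own copies of context, for example an in-process cache in front of Redis. In that case, use Redis Pub/Sub to publish invalidation events so each instance can drop its copy. Pub/Sub delivery is fire-and-forget: an instance that is disconnected when an event is published never receives it, so TTLs remain the safety net.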

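# Publisher: after any context write.
# `cache` is a ContextCache instance; publish on its client,
# not on the imported `redis` module.
await cache.redis.publish("context_invalidation",
    json.dumps({"user_id": user_id,
                "context_type": context_type}))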

# Subscriber: running in each app instance
async def listen_for_invalidations(cache: ContextCache):
    pubsub = cache.redis.pubsub()
    await pubsub.subscribe("context_invalidation")
    async for message in pubsub.listen():
        if message["type"] == "message":
            data = json.loads(message["data"])
            await cache.invalidate(
                data["user_id"], data["context_type"]
            )
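
When each instance keeps an in-process copy of context, the subscriber's real job is to evict that local copy. The following is a minimal sketch of such a handler, assuming the local cache is a plain dict keyed the same way as the Redis keys and that messages have the shape produced by the publisher above. The function name handle_invalidation is hypothetical.

```python
import json
from typing import Any, Dict


def handle_invalidation(local_cache: Dict[str, Any],
                        message: Dict[str, Any]) -> bool:
    """Evict the local entry named by a Pub/Sub message.

    Returns True if an entry was evicted, False otherwise.
    """
    # Pub/Sub also delivers control messages such as the
    # "subscribe" confirmation; only "message" carries a payload.
    if message.get("type") != "message":
        return False
    try:
        data = json.loads(message["data"])
        key = f"ctx:{data['user_id']}:{data['context_type']}"
    except (KeyError, TypeError, ValueError):
        # Malformed payload: ignore it rather than crash the listener.
        return False
    if key in local_cache:
        del local_cache[key]
        return True
    return False


# Example with a hand-built message of the same shape.
local = {"ctx:u1:profile": [{"name": "Ada"}]}
event = {"type": "message",
         "data": json.dumps({"user_id": "u1",
                             "context_type": "profile"})}
print(handle_invalidation(local, event), local)
```

Keeping the handler a plain function with no Redis dependency makes it easy to unit test; the subscriber loop only needs to call it for each incoming message.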

For more complex event-driven invalidation patterns involving multiple data sources, see our guide on Change Data Capture for context.

Step 4: Advanced Caching Patterns

Write-Behind Caching

For write-heavy context (like conversation history), buffer writes in Redis and flush to the database periodically. This reduces database write load but introduces a risk window where data exists only in Redis. Only use this pattern when you can tolerate potential data loss during Redis failures.

Cache Warming

Pre-populate the cache for users who are likely to become active. If your application has predictable usage patterns (e.g., business hours), warm the cache shortly before peak times:

async def warm_cache(store: ContextStore,
                     cache: ContextCache,
                     active_user_ids: List[str]):
    """Pre-populate cache for expected active users."""
    for user_id in active_user_ids:
        for ctx_type in ["profile", "preference"]:
            contexts = await store.get_by_user(
                user_id, ctx_type
            )
            await cache.set(user_id, ctx_type, contexts)

Multi-Level Caching

For extreme performance requirements, add an in-process cache (like Python's lru_cache or a local dictionary with TTL) in front of Redis. This eliminates even the Redis network hop for the hottest keys. Be cautious with this pattern as it introduces an additional layer of cache coherence complexity.
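
As a rough illustration of the in-process layer, here is a small dictionary-based cache with a TTL. It is a sketch, not a production implementation: the class name LocalTTLCache and its defaults are hypothetical, it is not thread-safe, and it evicts the oldest inserted entry when full rather than the least recently used one. Note that functools.lru_cache has no expiry, so entries cached with it stay until evicted or cleared explicitly.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LocalTTLCache:
    """Tiny in-process cache meant to sit in front of Redis."""

    def __init__(self, ttl_seconds: float = 5.0,
                 max_entries: int = 1024,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock  # injectable so tests can control time
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls through to Redis.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key not in self._data and len(self._data) >= self._max:
            # Dicts preserve insertion order, so the first key
            # is the oldest inserted entry.
            self._data.pop(next(iter(self._data)))
        self._data[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        self._data.pop(key, None)


local = LocalTTLCache(ttl_seconds=5.0)
local.set("ctx:u1:profile", [{"name": "Ada"}])
print(local.get("ctx:u1:profile"))
```

A short local TTL (a few seconds) bounds how stale a hot key can become on any one instance, which limits the coherence problem without requiring every invalidation event to arrive.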

Monitoring and Troubleshooting

Track these metrics to ensure your cache is healthy and effective:

Hit rate: Should be above 85% for well-tuned caches. Below 70% suggests you are caching the wrong data or your TTLs are too short.
Memory usage: Monitor against your maxmemory setting. Consistent evictions indicate you need more memory or shorter TTLs.
Eviction rate: High eviction rates under memory pressure mean frequently accessed context is being evicted. Increase memory or reduce what you cache.
Latency: Redis operations should be under 1ms. Spikes indicate network issues, large key sizes, or slow commands.

# Monitor Redis metrics via CLI
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys"
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"
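
The hit rate is not reported directly; it has to be derived from keyspace_hits and keyspace_misses. The sketch below parses INFO-style text and computes it. The helper names are ours and the sample numbers are made up for illustration. Keep in mind that these counters are cumulative since the server started (or since the stats were last reset) and cover every key on the instance, not only context keys.

```python
from typing import Dict, Optional


def parse_info(raw: str) -> Dict[str, str]:
    """Parse INFO output ("key:value" lines) into a dict."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(info: Dict[str, str]) -> Optional[float]:
    """Return hits / (hits + misses), or None with no lookups yet."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Made-up sample in the format printed by `redis-cli INFO stats`.
sample = """# Stats
keyspace_hits:9120
keyspace_misses:880
evicted_keys:12
"""

rate = hit_rate(parse_info(sample))
if rate is not None:
    print(f"hit rate: {rate:.1%}")
```

To track the rate over a time window rather than since startup, sample the counters periodically and compute the ratio from the differences between two samples.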

For comprehensive performance monitoring beyond just caching, see our guides on sub-millisecond context retrieval and load testing context systems.

Frequently Asked Questions

How much memory should I allocate to my Redis context cache?

Estimate based on your active user count and average context size. If you have 10,000 daily active users, each with roughly 10KB of cached context across all types, you need about 100MB of raw data. Add 50% overhead for Redis data structures, giving you roughly 150MB. Start there and monitor eviction rates to adjust. Most context management systems run well with 1-4GB of Redis cache.
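
The arithmetic above is easy to turn into a small helper for trying other inputs. This is a sketch: the function name is hypothetical, it uses decimal units (1MB = 1,000KB) to match the figures in the text, and the 50% overhead is the rule of thumb from this answer rather than a measured value.

```python
def estimate_cache_mb(active_users: int,
                      kb_per_user: float,
                      overhead: float = 0.5) -> float:
    """Rough Redis memory estimate in megabytes.

    overhead is the fraction added on top of the raw payload
    for Redis data structures (0.5 means +50%).
    """
    raw_mb = active_users * kb_per_user / 1000
    return raw_mb * (1 + overhead)


# The worked example from the text: 10,000 users at 10KB each.
print(estimate_cache_mb(10_000, 10))
```

Treat the result as a starting point for maxmemory and adjust it based on the eviction rate you observe.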

Should I use Redis Strings, Hashes, or Sorted Sets for context?

Use Strings with JSON serialization for most context caching. Strings are the simplest to implement and work well with TTL-based expiration. Use Hashes only if you need to read or update individual fields within a cached context without deserializing the entire value. Use Sorted Sets only if you need server-side ordering or range queries on cached data.

What happens if Redis goes down? Will my application crash?

Never let a cache failure cascade into an application failure. Wrap all cache operations in try/except blocks and fall back to the database when Redis is unavailable. Your application should work correctly, just more slowly, without the cache. This is the cache-aside pattern's primary advantage: the cache is an optimization, not a dependency.

How do I handle cache consistency in a multi-tenant context system?

Include the tenant ID in your cache key structure (e.g., ctx:{tenant}:{user}:{type}) to ensure complete isolation between tenants. This prevents cross-tenant data leakage and allows per-tenant cache management. For broader multi-tenant architecture patterns, see our guide on multi-tenant context architecture.
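
A key builder along these lines keeps the structure in one place. This is a sketch with a hypothetical function name. The validation is our addition: an ID containing the ":" separator could make two different tenants produce the same key, and glob characters would interfere with SCAN match patterns like the one used in invalidate_user.

```python
FORBIDDEN = set(":*?[]")  # separator plus glob pattern characters


def tenant_cache_key(tenant_id: str, user_id: str,
                     context_type: str) -> str:
    """Build a key of the form ctx:{tenant}:{user}:{type}."""
    parts = (tenant_id, user_id, context_type)
    for part in parts:
        if not part or FORBIDDEN & set(part):
            raise ValueError(f"invalid key component: {part!r}")
    return "ctx:" + ":".join(parts)


print(tenant_cache_key("acme", "u42", "profile"))
```

With this layout, removing everything cached for one tenant becomes a scan over the pattern ctx:{tenant}:*, following the same approach as invalidate_user.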
