Caching Strategies and Patterns in Distributed Systems


Caching is one of the most effective performance optimizations in distributed systems. A well-designed cache reduces database load, decreases response latency, and improves system throughput. This article covers the major caching patterns, eviction policies, distributed caching with Redis, CDN caching, and what is famously one of the two hard problems in computer science: cache invalidation.

Caching Patterns

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern. The application checks the cache first. On a cache miss, it reads from the database and populates the cache.

```python
class CacheAside:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def get_user(self, user_id):
        # 1. Try the cache first
        cached = self.cache.get(f"user:{user_id}")
        if cached is not None:
            return cached

        # 2. Cache miss: read from the database
        user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)

        if user:
            # 3. Populate the cache for next time
            self.cache.set(f"user:{user_id}", user, ttl=3600)

        return user

    def update_user(self, user_id, data):
        # 1. Update the database
        self.database.execute("UPDATE users SET name = ? WHERE id = ?",
                              data['name'], user_id)

        # 2. Invalidate the cache entry (delete, not update!)
        self.cache.delete(f"user:{user_id}")
```





**Advantages**:

* Only caches data that is actually requested (no wasted space).

* Simple to implement and understand.

* Cache failures are not fatal (system falls back to database).


**Disadvantages**:

* Cache miss penalty includes both cache check and database read.

* Stale data until TTL expires (if items are not invalidated on update).

* Thundering herd problem on cache miss for popular items (a common mitigation is sketched below).
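
When a hot key expires, many requests miss at once and all hit the database together. A common mitigation is single-flight loading: only one caller recomputes the value while the rest wait. Below is a minimal per-process sketch using per-key locks; the `cache` and `database` objects are the same hypothetical interfaces as above, and a multi-node deployment would need a distributed lock instead.

```python
import threading

class SingleFlightCache:
    """Cache-aside with per-key locking to avoid thundering herds."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database
        self._locks = {}                      # key -> threading.Lock
        self._locks_guard = threading.Lock()  # protects the lock table

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_user(self, user_id):
        key = f"user:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached

        # Only one thread per key recomputes; the others block here,
        # then re-check the cache instead of querying the database.
        with self._lock_for(key):
            cached = self.cache.get(key)
            if cached is not None:
                return cached
            user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
            if user:
                self.cache.set(key, user, ttl=3600)
            return user
```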


Write-Through

Write-through caches update the cache synchronously when data is written to the database.

```python
class WriteThrough:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def update_user(self, user_id, data):
        # 1. Update the database
        self.database.execute("UPDATE users SET name = ? WHERE id = ?",
                              data['name'], user_id)

        # 2. Update the cache synchronously with the fresh row
        user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
        self.cache.set(f"user:{user_id}", user, ttl=3600)
```





**Advantages**:

* Cache stays in sync with the database on every write (reads rarely see stale data).

* No cache miss penalty for recently written data.

* Read path is simple (cache first, database on a miss).


**Disadvantages**:

* Writes are slower (must update both database and cache).

* Writes more data to cache than may ever be read (cache pollution).

* Cache and database updates are not atomic (risk of inconsistency).


Write-Behind (Write-Back)

Write-behind caches write to the cache immediately and asynchronously update the database.

```python
import asyncio

class WriteBehind:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database
        self.write_queue = asyncio.Queue()
        self._start_flusher()  # must be called from within a running event loop

    def _start_flusher(self):
        """Start a background task that flushes writes to the database."""
        async def flusher():
            while True:
                # Batch writes and flush periodically
                batch = []
                for _ in range(100):  # batch size
                    try:
                        item = await asyncio.wait_for(
                            self.write_queue.get(), timeout=1.0
                        )
                        batch.append(item)
                    except asyncio.TimeoutError:
                        break

                if batch:
                    # Synchronous flush for simplicity; production code would
                    # run this in an executor to avoid blocking the event loop
                    self._flush_to_database(batch)

        asyncio.create_task(flusher())

    def _flush_to_database(self, batch):
        """Apply a batch of queued writes to the database."""
        for item in batch:
            if item["type"] == "update_user":
                self.database.execute("UPDATE users SET name = ? WHERE id = ?",
                                      item["data"]["name"], item["user_id"])

    def update_user(self, user_id, data):
        # 1. Update the cache immediately (merging with any cached fields)
        user = {**(self.cache.get(f"user:{user_id}") or {}), **data}
        self.cache.set(f"user:{user_id}", user)

        # 2. Queue the database update for the background flusher
        self.write_queue.put_nowait({
            "type": "update_user",
            "user_id": user_id,
            "data": data,
        })
```
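
A usage sketch: because the flusher runs as an asyncio task, the class must be constructed inside a running event loop. The `cache` and `database` objects here are the same hypothetical interfaces used above.

```python
import asyncio

async def main():
    wb = WriteBehind(cache, database)    # constructed inside the running loop
    wb.update_user(42, {"name": "Ada"})  # returns immediately
    await asyncio.sleep(2)               # give the flusher time to drain the queue

asyncio.run(main())
```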





**Advantages**:

* Very fast writes (no database latency).

* Can batch database writes for efficiency.

* Reduces database write load.


**Disadvantages**:

* Risk of data loss if cache fails before flush completes.

* Complex to implement correctly.

* Inconsistency window between cache update and database update.


Refresh-Ahead

Refresh-ahead proactively refreshes the cache before data expires.

```python
import threading

class RefreshAhead:
    def __init__(self, cache, database, refresh_threshold=0.8):
        self.cache = cache
        self.database = database
        self.refresh_threshold = refresh_threshold  # refresh when 80% of TTL elapsed

    def get_user(self, user_id):
        cached = self.cache.get(f"user:{user_id}")
        if cached is None:
            user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
            self.cache.set(f"user:{user_id}", user, ttl=3600)
            return user

        # Check whether the remaining TTL is below the refresh threshold
        ttl = self.cache.ttl(f"user:{user_id}")
        if ttl < 3600 * (1 - self.refresh_threshold):
            # Asynchronously refresh in the background
            self._async_refresh(f"user:{user_id}", user_id)

        return cached

    def _async_refresh(self, cache_key, user_id):
        """Background refresh task."""
        def refresh():
            user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
            if user:
                self.cache.set(cache_key, user, ttl=3600)

        threading.Thread(target=refresh, daemon=True).start()
```





Cache Eviction Policies

Least Recently Used (LRU)

Evicts the item that was accessed least recently. Good for workloads with temporal locality.




```
Cache: [A(1min ago), B(30s ago), C(5s ago), D(now)]
A is accessed least recently -> evict A
```





Redis implements LRU approximation with `maxmemory-policy allkeys-lru`.
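
For intuition, here is a minimal LRU cache sketch built on Python's `collections.OrderedDict`; production caches use more optimized structures, but the eviction rule is the same.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used entries first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```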

Least Frequently Used (LFU)

Evicts the item accessed least frequently. Good for workloads with skewed popularity.




```
Cache: [A(100x), B(50x), C(30x), D(5x)]
D is least frequently accessed -> evict D
```





Redis supports LFU with `maxmemory-policy allkeys-lfu`.
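
Either policy only takes effect once Redis has a memory ceiling configured. A typical `redis.conf` fragment (the 2gb limit is illustrative):

```
# Evict across all keys by frequency (use allkeys-lru for recency)
maxmemory 2gb
maxmemory-policy allkeys-lfu
```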

Time-To-Live (TTL)

Evicts items when their TTL expires, regardless of access pattern. Almost every caching system combines a TTL with another policy as a safety net.

First In, First Out (FIFO)

Evicts the oldest item regardless of access frequency. Simple but less effective than LRU.

Choosing an Eviction Policy

| Workload | Best Policy |
|----------|-------------|
| Uniform access (all items equally likely) | FIFO or TTL |
| Temporal locality (recent items more likely) | LRU |
| Skewed access (some items much more popular) | LFU |
| Time-sensitive data (sessions, expiring offers) | TTL |
| Unknown | LRU + TTL |

Distributed Caching with Redis

Redis is the dominant distributed cache. It provides in-memory data structures, replication, persistence, and high availability.

Redis Cluster Setup




A minimal highly available cluster is six nodes: three primaries, each with one replica. Once the nodes are running with `cluster-enabled yes` (here on ports 7000-7005), a single `redis-cli` command wires them into a cluster:

```bash
# Create a cluster from six running nodes: 3 primaries, 1 replica each
redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
```





Redis Caching Best Practices




```python
import json
import redis

class RedisCache:
    def __init__(self, redis_url):
        self.client = redis.from_url(redis_url)

    def get_or_compute(self, key, compute_func, ttl=300):
        """Cache-aside with a compute function."""
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)

        value = compute_func()
        self.client.setex(key, ttl, json.dumps(value))
        return value

    def get_batch(self, keys):
        """Batch cache reads using a pipeline (one round trip)."""
        pipeline = self.client.pipeline()
        for key in keys:
            pipeline.get(key)
        results = pipeline.execute()

        return {
            key: json.loads(val) if val else None
            for key, val in zip(keys, results)
        }
```
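
A hypothetical call site, assuming a `load_user` function that reads from the primary store:

```python
cache = RedisCache("redis://localhost:6379/0")

# Computed on the first call, then served from Redis for up to 10 minutes
user = cache.get_or_compute("user:42", lambda: load_user(42), ttl=600)
```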





Cache Sharding

For very large caches, shard across multiple Redis nodes.




```python
import hashlib

class ShardedRedis:
    def __init__(self, nodes):
        self.nodes = nodes  # list of Redis clients

    def _get_node(self, key):
        """Determine which node holds this key."""
        hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[hash_val % len(self.nodes)]

    def get(self, key):
        node = self._get_node(key)
        return node.get(key)

    def set(self, key, value, ttl=300):
        node = self._get_node(key)
        node.setex(key, ttl, value)
```
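
One design caveat: modulo hashing remaps most keys whenever a node is added or removed, which effectively empties the cache. Consistent hashing, or the fixed hash slots Redis Cluster uses, keeps most keys in place and is preferable when the node set changes.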





CDN Caching

Content Delivery Networks (CDNs) cache static and dynamic content at edge locations close to users.

Cache Control Headers




```nginx
# Nginx: Static asset caching headers
location /static/ {
    expires 365d;
    add_header Cache-Control "public, immutable";
}

location /api/content/ {
    # Dynamic content: shorter cache
    expires 5m;
    add_header Cache-Control "public, must-revalidate";
}

location /api/user/ {
    # Private content: no CDN caching
    add_header Cache-Control "private, no-cache";
}
```
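
To verify what browsers and CDNs will actually see, inspect the response headers (the hostname is illustrative). Note that `expires` emits a `max-age` value, and the `add_header` line adds a second `Cache-Control` header alongside it:

```bash
curl -sI https://example.com/static/app.css | grep -i '^cache-control'
```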





CDN Cache Invalidation




```bash
# CloudFront: Invalidate specific paths
aws cloudfront create-invalidation \
  --distribution-id E123456 \
  --paths "/api/content/*" "/index.html"

# Fastly: Purge by surrogate key
curl -X POST https://api.fastly.com/service/SERVICE/purge \
  -H "Fastly-Key: $API_KEY" \
  -H "Surrogate-Key: product:1234" \
  -H "Accept: application/json"
```





Cache Invalidation

Cache invalidation is notoriously difficult. These strategies help.

Time-Based Invalidation (TTL)

The simplest approach: every cache entry carries a TTL, and an update becomes visible once the old entry expires, so data may be stale for at most the TTL window.




* Always safe: stale data is eventually replaced.

* Always simple: no complex invalidation logic.

* Limitation: data can be arbitrarily stale within the TTL window.
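
One practical refinement: keys written together with identical TTLs also expire together, producing synchronized misses. Adding jitter spreads the expirations out. A minimal sketch, assuming `client` is a redis-py client and the base TTL and jitter range are illustrative:

```python
import json
import random

def set_with_jitter(client, key, value, base_ttl=300, jitter=0.1):
    """Set a key with a TTL randomized by +/-10% to spread out expirations."""
    ttl = int(base_ttl * random.uniform(1 - jitter, 1 + jitter))
    client.setex(key, ttl, json.dumps(value))
```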





Event-Driven Invalidation

When data changes, publish an invalidation event.




```python
# Event-driven invalidation
import logging

log = logging.getLogger(__name__)

class EventDrivenCache:
    def __init__(self, cache, message_bus):
        self.cache = cache
        self.message_bus = message_bus

        # Subscribe to invalidation events
        self.message_bus.subscribe("cache.invalidate", self.handle_invalidation)

    def handle_invalidation(self, event):
        key = event.data['key']
        self.cache.delete(key)
        log.info(f"Invalidated cache key: {key} due to {event.data['reason']}")
```





Write-Through Invalidation

Invalidate (or update) the cache as part of the write transaction.




```python
def update_product(product_id, data):
    with transaction():
        # Update the database
        db.execute("UPDATE products SET price = ? WHERE id = ?",
                   data['price'], product_id)

        # Invalidate the local cache in the same transaction if possible
        cache.delete(f"product:{product_id}")

    # Publish the invalidation for other cache nodes
    message_bus.publish("cache.invalidate", {"key": f"product:{product_id}"})
```





Conclusion

Choose cache-aside for most general-purpose caching. Use write-through when read consistency is critical, write-behind when write performance is paramount, and refresh-ahead for predictable access patterns. Set appropriate TTLs as a safety net. Use Redis for distributed caching with proper cluster configuration, and CDNs for content delivery to global users. Remember that cache invalidation is hard: prefer TTLs over complex invalidation logic, use event-driven invalidation when TTLs are insufficient, and always have a fallback to the original data source.