Caching is one of the most effective performance optimizations in distributed systems. A well-designed cache reduces database load, decreases response latency, and improves system throughput. This article covers the major caching patterns, eviction policies, distributed caching with Redis, CDN caching, and what is famously called one of the hardest problems in computer science: cache invalidation.
Caching Patterns
Cache-Aside (Lazy Loading)
Cache-aside is the most common caching pattern. The application checks the cache first. On a cache miss, it reads from the database and populates the cache.
```python
class CacheAside:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def get_user(self, user_id):
        # 1. Try cache first
        cached = self.cache.get(f"user:{user_id}")
        if cached is not None:
            return cached
        # 2. Cache miss: read from database
        user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
        if user:
            # 3. Populate cache for next time
            self.cache.set(f"user:{user_id}", user, ttl=3600)
        return user

    def update_user(self, user_id, data):
        # 1. Update database
        self.database.execute("UPDATE users SET name = ? WHERE id = ?",
                              data['name'], user_id)
        # 2. Invalidate cache (not update!)
        self.cache.delete(f"user:{user_id}")
```
**Advantages**:

- Only requested data is cached, so memory is not wasted on entries nobody reads.
- Cache failures are non-fatal: the application can always fall back to the database.

**Disadvantages**:

- Every cache miss costs three trips (cache read, database read, cache write), adding latency.
- Data can be stale until the entry is invalidated or its TTL expires.
Write-Through
Write-through caches update the cache synchronously when data is written to the database.
```python
class WriteThrough:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def update_user(self, user_id, data):
        # 1. Update database
        self.database.execute("UPDATE users SET name = ? WHERE id = ?",
                              data['name'], user_id)
        # 2. Update cache synchronously
        user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
        self.cache.set(f"user:{user_id}", user, ttl=3600)
```
**Advantages**:

- The cache is consistent with the database as soon as a write completes.
- Reads never pay a miss penalty for recently written data.

**Disadvantages**:

- Writes are slower: every write pays for both a database update and a cache update.
- Data may be cached that is never read, wasting memory.
Write-Behind (Write-Back)
Write-behind caches write to the cache immediately and asynchronously update the database.
```python
import asyncio

class WriteBehind:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database
        self.write_queue = asyncio.Queue()
        # Note: create_task requires a running event loop, so this class
        # must be instantiated from within async code.
        self._start_flusher()

    def _start_flusher(self):
        """Background task that flushes writes to the database."""
        async def flusher():
            while True:
                # Batch writes and flush periodically
                batch = []
                for _ in range(100):  # Batch size
                    try:
                        item = await asyncio.wait_for(
                            self.write_queue.get(), timeout=1.0
                        )
                        batch.append(item)
                    except asyncio.TimeoutError:
                        break
                if batch:
                    self._flush_to_database(batch)

        asyncio.create_task(flusher())

    def _flush_to_database(self, batch):
        """Apply a batch of queued writes to the database."""
        for item in batch:
            # Only one write type exists in this example
            if item["type"] == "update_user":
                self.database.execute(
                    "UPDATE users SET name = ? WHERE id = ?",
                    item["data"]["name"], item["user_id"]
                )

    def update_user(self, user_id, data):
        # 1. Update cache immediately
        user = {**(self.cache.get(f"user:{user_id}") or {}), **data}
        self.cache.set(f"user:{user_id}", user)
        # 2. Queue database update
        self.write_queue.put_nowait({
            "type": "update_user",
            "user_id": user_id,
            "data": data
        })
```
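Because the flusher is started with `asyncio.create_task`, the class must be constructed inside a running event loop. A minimal, self-contained usage sketch with stand-in cache and database objects (both hypothetical):

```python
import asyncio

# Minimal in-memory stand-ins, for illustration only
class DictCache(dict):
    def set(self, key, value, ttl=None):
        self[key] = value

class FakeDB:
    def execute(self, sql, *params):
        print("DB write:", sql, params)

async def main():
    wb = WriteBehind(DictCache(), FakeDB())  # created inside the event loop
    wb.update_user(42, {"name": "Ada"})
    await asyncio.sleep(1.5)                 # let the flusher drain the queue

asyncio.run(main())
```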
**Advantages**:

- Writes are fast: the caller only waits for a cache update and a queue push.
- Batching database writes reduces load and can significantly improve write throughput.

**Disadvantages**:

- Queued writes can be lost if the process crashes before they are flushed.
- The database lags the cache, so other readers may see stale data.
Refresh-Ahead
Refresh-ahead proactively refreshes the cache before data expires.
```python
class RefreshAhead:
    def __init__(self, cache, database, refresh_threshold=0.8):
        self.cache = cache
        self.database = database
        self.refresh_threshold = refresh_threshold  # Refresh when 80% of TTL has elapsed

    def get_user(self, user_id):
        cached = self.cache.get(f"user:{user_id}")
        if cached is None:
            user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
            if user:
                self.cache.set(f"user:{user_id}", user, ttl=3600)
            return user
        # Check whether we should refresh: remaining TTL below 20% of the original
        ttl = self.cache.ttl(f"user:{user_id}")
        if ttl < 3600 * (1 - self.refresh_threshold):
            # Asynchronously refresh in the background
            self._async_refresh(f"user:{user_id}", user_id)
        return cached

    def _async_refresh(self, cache_key, user_id):
        """Background refresh task."""
        import threading

        def refresh():
            user = self.database.query("SELECT * FROM users WHERE id = ?", user_id)
            if user:
                self.cache.set(cache_key, user, ttl=3600)

        threading.Thread(target=refresh, daemon=True).start()
```
Cache Eviction Policies
Least Recently Used (LRU)
Evicts the item that was accessed least recently. Good for workloads with temporal locality.
```
Cache: [A(1 min ago), B(30 s ago), C(5 s ago), D(now)]
A was accessed least recently -> evict A
```
Redis implements LRU approximation with `maxmemory-policy allkeys-lru`.
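For reference, a minimal in-process LRU cache fits in a few lines on top of `collections.OrderedDict` (class and parameter names are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```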
Least Frequently Used (LFU)
Evicts the item accessed least frequently. Good for workloads with skewed popularity.
```
Cache: [A(100x), B(50x), C(30x), D(5x)]
D is accessed least frequently -> evict D
```
Redis supports LFU with `maxmemory-policy allkeys-lfu`.
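For contrast with LRU, here is a deliberately naive in-process LFU sketch (eviction is O(n); production systems such as Redis use approximate frequency counters instead):

```python
from collections import defaultdict

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.counts = defaultdict(int)

    def get(self, key):
        if key not in self.items:
            return None
        self.counts[key] += 1
        return self.items[key]

    def set(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the least frequently accessed key
            victim = min(self.items, key=lambda k: self.counts[k])
            del self.items[victim]
            del self.counts[victim]
        self.items[key] = value
        self.counts[key] += 1
```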
Time-To-Live (TTL)
Evicts items once their TTL expires, regardless of access pattern. Nearly every caching system uses TTLs as a safety net, often alongside another policy.
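A minimal in-process TTL cache, for illustration, using `time.monotonic` and lazy expiry on read (class and method names are illustrative):

```python
import time

class TTLCache:
    def __init__(self):
        self.items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self.items[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.items[key]  # lazy expiry on read
            return None
        return value
```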
First In, First Out (FIFO)
Evicts the oldest item regardless of access frequency. Simple to implement, but usually less effective than LRU because it ignores access recency.
Choosing an Eviction Policy
| Workload | Best Policy |
|----------|-------------|
| Uniform access (all items equally likely) | FIFO or TTL |
| Temporal locality (recent items more likely) | LRU |
| Skewed access (some items much more popular) | LFU |
| Time-sensitive data (session, expiring offers) | TTL |
| Unknown | LRU + TTL |
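In Redis, the eviction policy is chosen with the `maxmemory-policy` setting. A sketch using redis-py (the 256 MB cap is an illustrative value):

```python
import redis

client = redis.Redis()
# Cap memory and evict least recently used keys once the cap is reached
client.config_set("maxmemory", "256mb")
client.config_set("maxmemory-policy", "allkeys-lru")
```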
Distributed Caching with Redis
Redis is the dominant distributed cache. It provides in-memory data structures, replication, persistence, and high availability.
Redis Cluster Setup
```bash
# A minimal Redis Cluster needs six nodes: three primaries, three replicas.
# 1. Start six redis-server instances in cluster mode (ports 7000-7005):
redis-server --port 7000 --cluster-enabled yes \
             --cluster-config-file nodes-7000.conf --appendonly yes
# ...repeat for ports 7001-7005...

# 2. Join them into a cluster with one replica per primary:
redis-cli --cluster create \
    127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
    127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
    --cluster-replicas 1
```
Redis Caching Best Practices
```python
import redis
import json

class RedisCache:
    def __init__(self, redis_url):
        self.client = redis.from_url(redis_url)

    def get_or_compute(self, key, compute_func, ttl=300):
        """Cache-aside with a compute function."""
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)
        value = compute_func()
        self.client.setex(key, ttl, json.dumps(value))
        return value

    def get_batch(self, keys):
        """Batch cache get using a pipeline."""
        pipeline = self.client.pipeline()
        for key in keys:
            pipeline.get(key)
        results = pipeline.execute()
        return {
            key: json.loads(val) if val else None
            for key, val in zip(keys, results)
        }
```
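Usage is then a one-liner per lookup, assuming the computed value is JSON-serializable; `load_user` here is a hypothetical loader:

```python
cache = RedisCache("redis://localhost:6379/0")

# Computed on the first call, served from Redis for the next 10 minutes
user = cache.get_or_compute("user:42", lambda: load_user(42), ttl=600)
```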
Cache Sharding
For very large caches, shard across multiple Redis nodes.
```python
import hashlib

class ShardedRedis:
    def __init__(self, nodes):
        self.nodes = nodes  # List of Redis clients

    def _get_node(self, key):
        """Determine which node holds this key."""
        hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[hash_val % len(self.nodes)]

    def get(self, key):
        node = self._get_node(key)
        return node.get(key)

    def set(self, key, value, ttl=300):
        node = self._get_node(key)
        node.setex(key, ttl, value)
```
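One caveat: modulo sharding remaps almost every key when a node is added or removed, effectively flushing the cache. Consistent hashing keeps most keys in place when the node set changes; a minimal ring sketch over node identifiers (the virtual-node count of 100 is an illustrative choice):

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # nodes are node identifiers (e.g. "redis-1"); each gets many
        # virtual points on the ring to smooth out the distribution
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}:{i}")
                self.ring.append((h, node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        """Walk clockwise to the first point at or after the key's hash."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]
```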
CDN Caching
Content Delivery Networks (CDNs) cache static and dynamic content at edge locations close to users.
Cache Control Headers
```nginx
# Nginx: static asset caching headers
location /static/ {
    expires 365d;
    add_header Cache-Control "public, immutable";
}

location /api/content/ {
    # Dynamic content: shorter cache
    expires 5m;
    add_header Cache-Control "public, must-revalidate";
}

location /api/user/ {
    # Private content: no CDN caching
    add_header Cache-Control "private, no-cache";
}
```
CDN Cache Invalidation
```bash
# CloudFront: invalidate specific paths
aws cloudfront create-invalidation \
    --distribution-id E123456 \
    --paths "/api/content/*" "/index.html"

# Fastly: purge by surrogate key
curl -X POST https://api.fastly.com/service/SERVICE/purge \
    -H "Fastly-Key: $API_KEY" \
    -H "Surrogate-Key: product:1234" \
    -H "Accept: application/json"
```
Cache Invalidation
Cache invalidation is notoriously difficult. These strategies help.
Time-Based Invalidation (TTL)
The simplest approach: every cache entry carries a TTL, and cached data may be served stale until that TTL expires.

- Always safe: stale data is eventually replaced.
- Always simple: no complex invalidation logic.
- Limitation: data can be arbitrarily stale within the TTL window.
Event-Driven Invalidation
When data changes, publish an invalidation event.
```python
# Event-driven invalidation
import logging

log = logging.getLogger(__name__)

class EventDrivenCache:
    def __init__(self, cache, message_bus):
        self.cache = cache
        self.message_bus = message_bus
        # Subscribe to invalidation events
        self.message_bus.subscribe("cache.invalidate", self.handle_invalidation)

    def handle_invalidation(self, event):
        key = event.data['key']
        self.cache.delete(key)
        log.info(f"Invalidated cache key: {key} due to {event.data['reason']}")
```
Write-Through Invalidation
Invalidate (or update) the cache as part of the write transaction.
```python
def update_product(product_id, data):
    with transaction():
        # Update database
        db.execute("UPDATE products SET price = ? WHERE id = ?",
                   data['price'], product_id)
        # Invalidate cache in the same transaction if possible
        cache.delete(f"product:{product_id}")
    # Publish invalidation for other cache nodes (after commit)
    message_bus.publish("cache.invalidate", {"key": f"product:{product_id}"})
```
Conclusion
Choose cache-aside for most general-purpose caching. Use write-through when read consistency is critical. Use write-behind when write performance is paramount. Use refresh-ahead for predictable access patterns. Set appropriate TTLs as a safety net. Use Redis for distributed caching with proper cluster configuration. Use CDNs for content delivery to global users. Remember that cache invalidation is hard: prefer TTLs over complex invalidation logic, use event-driven invalidation when TTLs are insufficient, and always have a fallback to the original data source.