System Design Fundamentals 2026: A Developer's Guide to Scalable Applications


System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.


---


1. Microservices vs Monolith vs Modular Monolith


The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.


| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| **Monolith** | 1–5 | Low | Prototypes, internal tools, MVPs |
| **Modular Monolith** | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| **Microservices** | 10+ per service | High | Large orgs with clear domain boundaries |


The Modular Monolith Sweet Spot


A modular monolith is a single deployable unit with **strict module boundaries**. Modules communicate through well-defined interfaces but share the same process and database.



    ┌─────────────────────────────────────┐
    │         Modular Monolith            │
    │  ┌──────────┐  ┌──────────┐        │
    │  │  Orders  │  │  Billing │        │
    │  │  Module  │──│  Module  │        │
    │  └────┬─────┘  └────┬─────┘        │
    │       │              │              │
    │  ┌────▼──────────────▼─────┐       │
    │  │    Shared Kernel        │       │
    │  │  (DB, messaging, auth)  │       │
    │  └─────────────────────────┘       │
    └─────────────────────────────────────┘


**When to extract a service**: When two conditions are met — the module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.


**Rule of thumb**: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
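In code, a strict module boundary can look as simple as one module depending on another's public interface object. This is a minimal sketch with hypothetical names (`BillingModule`, `OrdersModule` are illustrative, not a prescribed framework):

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    order_id: str
    total_cents: int


class BillingModule:
    """Billing's public interface: the only supported entry point."""

    def create_invoice(self, order_id: str, total_cents: int) -> Invoice:
        # Billing-private persistence and helpers would live here
        return Invoice(order_id=order_id, total_cents=total_cents)


class OrdersModule:
    def __init__(self, billing: BillingModule):
        self.billing = billing  # depend on the interface, never on Billing's tables

    def place_order(self, order_id: str, total_cents: int) -> Invoice:
        # ... persist the order in Orders-private tables ...
        return self.billing.create_invoice(order_id, total_cents)
```

Import-linting tools can then enforce that nothing outside Billing imports its internals, which is what keeps later extraction surgical.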


Real-World Decision Tree



    Monolith → Modular Monolith → Selective Extraction → Full Microservices

    MVP Phase:         Monolith
    10k users/5 devs:  Modular monolith
    100k users:        Extract payments (PCI scope)
    1M users:          Extract search (separate scale)
    10M users:         Extract recommendations (different stack)


---


2. CQRS: Command Query Responsibility Segregation


CQRS separates **reads** from **writes** — different models, sometimes different databases.


When CQRS Makes Sense


  • Your read queries are complex and don't map well to your write model (e.g., reporting dashboards)
  • Your read and write workloads have different scaling requirements (10:1 read-to-write ratio)
  • You need different data shapes for reading vs writing (e.g., write normalized, read denormalized)

A Simple CQRS Implementation


    
    # --- Command Side (Writes) ---
    class CreateOrderCommand:
        def __init__(self, user_id: str, items: list):
            self.user_id = user_id
            self.items = items


    class OrderCommandHandler:
        def handle(self, cmd: CreateOrderCommand) -> str:
            # Validate business rules
            order = Order.create(cmd.user_id, cmd.items)
            order.save()  # Write to transactional DB (PostgreSQL)
            event_bus.publish("order.created", {"order_id": order.id})
            return order.id


    # --- Query Side (Reads) ---
    class OrderQueryHandler:
        def get_order_summary(self, user_id: str) -> dict:
            # Read from denormalized read model (could be a different DB)
            return read_db.query(
                "SELECT * FROM order_summaries WHERE user_id = :uid",
                {"uid": user_id}
            )
    
    

    CQRS Without Event Sourcing


    You don't need event sourcing to use CQRS. The most common pattern is:


  1. **Write** to a normalized PostgreSQL table
  2. **Sync** (synchronously or async via CDC) to a read-optimized table
  3. **Read** from the read table


    
    -- Write model: normalized
    CREATE TABLE orders (
        id UUID PRIMARY KEY,
        user_id UUID NOT NULL,
        status VARCHAR(20) NOT NULL,
        total_cents BIGINT NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    );

    CREATE TABLE order_items (
        id UUID PRIMARY KEY,
        order_id UUID REFERENCES orders(id),
        product_id UUID NOT NULL,
        quantity INT NOT NULL,
        unit_price_cents BIGINT NOT NULL
    );

    -- Read model: denormalized for fast queries
    CREATE TABLE order_summaries (
        order_id UUID PRIMARY KEY,
        user_id UUID NOT NULL,
        status VARCHAR(20) NOT NULL,
        item_count INT NOT NULL,
        total_cents BIGINT NOT NULL,
        product_names TEXT[] NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    );
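The sync step can be sketched as a pure projection function that derives one `order_summaries` row from the normalized rows. This is a sketch under one assumption: the `product_name` field on each item row would in practice come from a join against a products table, which the schema above doesn't show.

```python
# Hypothetical projection for the sync step: fold normalized order + item
# rows into one denormalized row shaped like order_summaries above.
def build_order_summary(order: dict, items: list) -> dict:
    return {
        "order_id": order["id"],
        "user_id": order["user_id"],
        "status": order["status"],
        "item_count": len(items),
        "total_cents": sum(i["quantity"] * i["unit_price_cents"] for i in items),
        "product_names": [i["product_name"] for i in items],  # assumed field
        "created_at": order["created_at"],
    }
```

Run this after each write for synchronous sync, or from a CDC consumer for async sync.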
    
    

    When NOT to Use CQRS


  • Your app is a simple CRUD interface with no complex queries
  • You don't need separate read/write scaling
  • Your team is small and you can't justify the infrastructure overhead

    ---


    3. Event-Driven Architecture


    Event-driven systems decouple producers from consumers. When an event happens, interested services react.


    Core Concepts


    
    ┌──────────┐     Event Bus      ┌──────────────┐
    │ Producer │─────(Kafka/RMQ)───▶│  Consumer 1  │
    │ (Orders) │                    │ (Analytics)  │
    └──────────┘                    └──────────────┘
                    ───────────────▶┌──────────────┐
                                    │  Consumer 2  │
                                    │ (Email)      │
                                    └──────────────┘
    
    

    Message Queue Comparison


| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| **Delivery** | At-least-once, exactly-once (idempotent producer) | At-most-once, at-least-once | At-least-once |
| **Ordering** | Guaranteed per partition | Not guaranteed (unless single queue) | FIFO queues (limited throughput) |
| **Persistence** | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| **Throughput** | Millions/sec | Thousands/sec | Nearly unlimited (standard); ~300/s per FIFO queue without batching |
| **Consumer model** | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| **Use case** | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| **Operational cost** | High (runs a ZooKeeper or KRaft cluster) | Medium | Minimal (fully managed) |


    Kafka in Practice: The URL Shortener Click Stream


    
    # Producer — emit click events
    def record_click(short_code: str, ip: str, user_agent: str):
        producer.send(
            topic="url_clicks",
            key=short_code.encode(),  # Same key → same partition → ordered
            value={
                "short_code": short_code,
                "ip": ip,
                "user_agent": user_agent,
                "timestamp": int(time.time()),
            }
        )


    # Consumer 1 — real-time analytics (e.g., update Redis counters)
    def consume_clicks_for_analytics():
        for message in consumer:
            click = message.value
            redis.zincrby("popular_urls:today", 1, click["short_code"])
            redis.incr(f"url:{click['short_code']}:clicks")


    # Consumer 2 — store raw clicks in data warehouse
    def consume_clicks_for_storage():
        for message in consumer:
            warehouse.insert_one(message.value)
    
    

    Event Sourcing: Storing State as Events


    Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.


    
    # Events (immutable facts)
    events = [
        {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
        {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
        {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
    ]


    # Derive current state by replaying events
    def get_account_state(events):
        state = {"email": None, "email_verified": False, "password_hash": None}
        for event in events:
            if event["type"] == "AccountCreated":
                state["email"] = event["data"]["email"]
            elif event["type"] == "EmailVerified":
                state["email_verified"] = True
            # (PasswordChanged handling elided for brevity)
        return state
    
    

    **Trade-offs**: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
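The "you need projections" point is concrete: rather than replaying the full history on every query, you fold the event stream into a read model once and then keep applying new events as they arrive. A minimal sketch, using a dict as a stand-in for a real read database:

```python
# Minimal projection: folds account events into a dict-backed read model.
class AccountProjection:
    def __init__(self):
        self.accounts = {}  # user_id -> current state

    def apply(self, event: dict) -> None:
        data = event["data"]
        account = self.accounts.setdefault(
            data["user_id"], {"email": None, "email_verified": False}
        )
        if event["type"] == "AccountCreated":
            account["email"] = data["email"]
        elif event["type"] == "EmailVerified":
            account["email_verified"] = True


projection = AccountProjection()
for event in [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
]:
    projection.apply(event)
# projection.accounts["u1"] now answers queries without replaying history
```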


    ---


    4. Database Scaling Strategies


    Read Replicas


    The simplest scaling strategy: one primary handles writes, replicas handle reads.


    
                        ┌─────────────┐
                        │  Primary DB │◀── Writes
                        └──────┬──────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
         ┌────▼─────┐   ┌─────▼────┐   ┌───────▼──┐
         │ Replica 1│   │ Replica 2│   │ Replica 3│
         │  (Reads) │   │  (Reads) │   │  (Reads) │
         └──────────┘   └──────────┘   └──────────┘
    
    

    
    # Using read/write separation in code
    class DatabaseRouter:
        def __init__(self):
            self.primary = create_engine(PRIMARY_URL)
            self.replicas = [create_engine(url) for url in REPLICA_URLS]
            self.replica_index = 0

        def write(self, query, params=None):
            with self.primary.begin() as conn:
                return conn.execute(query, params or {})

        def read(self, query, params=None):
            # Round-robin across replicas
            replica = self.replicas[self.replica_index % len(self.replicas)]
            self.replica_index += 1
            with replica.connect() as conn:
                return conn.execute(query, params or {})
    
    

    **Replication lag** is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called **read-after-write consistency**.


    
    async def create_order_and_redirect(user_id: str, items: list):
        order_id = db.write("INSERT INTO orders ... RETURNING id")

        # Read-after-write: force this read to the primary
        order = db.read_from_primary(
            "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
        )

        return redirect(f"/orders/{order_id}")
    
    

    Sharding (Horizontal Partitioning)


    Split data across databases by a shard key.


| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| **Hash-based** | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| **Range-based** | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| **Directory-based** | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |


    
    # Consistent hashing — minimizes re-sharding
    import hashlib

    class ConsistentHashRing:
        def __init__(self, nodes: list, replicas: int = 150):
            self.ring = {}
            for node in nodes:
                for i in range(replicas):
                    key = self._hash(f"{node}:{i}")
                    self.ring[key] = node
            self.sorted_keys = sorted(self.ring.keys())

        def get_node(self, key: str) -> str:
            if not self.ring:
                return None
            hash_val = self._hash(key)
            # Walk clockwise to the first virtual node at or past the hash
            for ring_key in self.sorted_keys:
                if hash_val <= ring_key:
                    return self.ring[ring_key]
            return self.ring[self.sorted_keys[0]]  # wrap around

        def _hash(self, key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)
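To check the "minimizes re-sharding" claim, here is a self-contained experiment (using `bisect` instead of the linear scan above) that counts how many keys change owner when a fourth node joins the ring:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def build_ring(nodes: list, vnodes: int = 150) -> list:
    # Sorted (hash, node) pairs: 150 virtual nodes per physical node
    return sorted((_hash(f"{n}:{i}"), n) for n in nodes for i in range(vnodes))


def get_node(ring: list, key: str) -> str:
    hashes = [h for h, _ in ring]
    idx = bisect.bisect_right(hashes, _hash(key)) % len(ring)  # wrap around
    return ring[idx][1]


keys = [f"user:{i}" for i in range(2000)]
ring3 = build_ring(["db1", "db2", "db3"])
ring4 = build_ring(["db1", "db2", "db3", "db4"])
moved = sum(get_node(ring3, k) != get_node(ring4, k) for k in keys)
# moved is roughly a quarter of the keys (only those now owned by db4);
# naive hash(key) % N would move roughly three quarters of them
```

Keys never move between the surviving nodes, which is exactly the property that makes resharding cheap.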
    
    

    Partitioning (Within a Database)


    Split a table into smaller physical chunks. PostgreSQL declarative partitioning:


    
    CREATE TABLE events (
        event_id UUID NOT NULL,
        occurred_at TIMESTAMP NOT NULL,
        payload JSONB
    ) PARTITION BY RANGE (occurred_at);

    CREATE TABLE events_2026_q1
        PARTITION OF events
        FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

    CREATE TABLE events_2026_q2
        PARTITION OF events
        FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
    
    

    Partition pruning means queries with `WHERE occurred_at >= '2026-04-01'` only scan relevant partitions.


    ---


    5. Caching Layers


    The Three Cache Levels


    
    CDN ─── Application Cache (Redis) ─── In-Memory Cache (Local)
    │               │                           │
    │        Expensive to fill             Fastest access
    │        Shared across servers         1-5μs per get
    │        50-500μs per get              Lost on restart
    
    

    Cache Strategies


| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| **Cache Aside** | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| **Read Through** | Cache is authoritative; loads from DB on miss | Write through to DB; cache handles loading | When cache handles persistence |
| **Write Through** | — | Write to cache first, then DB synchronously | Apps needing strong consistency |
| **Write Behind** | — | Write to cache, async flush to DB | High-write-throughput apps |
| **Write Around** | — | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |


    Cache Aside — The Default Choice


    
    async def get_user_profile(user_id: str) -> dict:
        cache_key = f"user:profile:{user_id}"

        # 1. Try cache
        cached = await redis.get(cache_key)
        if cached:
            return json.loads(cached)

        # 2. Cache miss — load from database
        profile = await db.query(
            "SELECT * FROM user_profiles WHERE user_id = :uid",
            {"uid": user_id}
        )

        if profile:
            # 3. Populate cache with TTL
            await redis.setex(cache_key, 300, json.dumps(profile))

        return profile


    async def update_user_profile(user_id: str, data: dict):
        # 1. Write to database
        await db.execute(
            "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
            {"uid": user_id, "name": data["name"]}
        )

        # 2. Invalidate cache (don't update it — let next read re-populate)
        await redis.delete(f"user:profile:{user_id}")
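For contrast, here is write through from the strategy table, sketched with dict-backed stand-ins for the database and cache: the write updates both synchronously, so the very next read is a fresh cache hit instead of a miss.

```python
import json

db = {}     # stand-in for the database: user_id -> profile row
cache = {}  # stand-in for Redis: cache_key -> serialized value


def update_user_profile_write_through(user_id: str, data: dict) -> None:
    # 1. Write to the database
    db[user_id] = {"user_id": user_id, **data}
    # 2. Write through to the cache in the same operation;
    #    cache aside would delete the key here instead
    cache[f"user:profile:{user_id}"] = json.dumps(db[user_id])
```

The cost is write latency (every write touches two systems) in exchange for reads that never see a stale or missing key.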
    
    

    Write Behind — For High-Volume Writes


    
    # Batch writer process — runs every 5 seconds
    write_buffer = []

    async def write_to_cache(key: str, value: dict):
        write_buffer.append((key, value))
        if len(write_buffer) >= 100:
            await flush_buffer()

    async def flush_buffer():
        async with db.transaction():
            for key, value in write_buffer:
                await db.execute(
                    "UPSERT INTO ... VALUES (:k, :v)",
                    {"k": key, "v": json.dumps(value)}
                )
        write_buffer.clear()

    # Start background flusher
    async def periodic_flush():
        while True:
            await asyncio.sleep(5)
            if write_buffer:
                await flush_buffer()
    
    

    **Write behind risk**: if the process crashes before the flush, data is lost. Use a persistent queue (Kafka) for critical writes.


    ---


    6. CAP Theorem Explained Practically


    CAP says a distributed data store can provide at most two of three guarantees: **Consistency**, **Availability**, and **Partition Tolerance**.


    What CAP Actually Means


  • **C (Consistency)**: Every read sees the most recent write (or an error)
  • **A (Availability)**: Every request gets a non-error response (not necessarily the latest data)
  • **P (Partition Tolerance)**: System continues working despite network failures

    The Key Insight


    You **must** choose CP or AP. Partition tolerance is non-negotiable in distributed systems — networks WILL fail.


| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |


    Practical CAP Decisions


    
    # AP choice — accept stale reads for availability
    async def get_product_stock(product_id: str) -> int:
        # Read from nearest replica, may be stale
        return await replica.query(
            "SELECT stock FROM products WHERE id = :pid",
            {"pid": product_id}
        )


    # CP choice — accept slower reads for consistency
    async def get_product_stock_cp(product_id: str) -> int:
        # Read from primary, always latest
        return await primary.query(
            "SELECT stock FROM products WHERE id = :pid",
            {"pid": product_id}
        )
    
    

    **Rule of thumb**: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.


    ---


    7. Load Balancing Strategies


    Layer 4 vs Layer 7


| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| **Routing based on** | IP + port | URL, headers, cookies, body |
| **Performance** | Very fast | Slower (inspects payload) |
| **Features** | Simple forwarding | Content-based routing, rate limiting |
| **Examples** | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |


    Algorithms


    
    # Round Robin — predictable, but doesn't handle different load sizes
    servers = ["app-01", "app-02", "app-03"]
    next_server = servers[current_index % len(servers)]
    current_index += 1

    # Least Connections — better for variable request durations
    def least_connections(servers: list) -> str:
        return min(servers, key=lambda s: s.active_connections)

    # IP Hash — session persistence without cookies
    def ip_hash(client_ip: str, servers: list) -> str:
        hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return servers[hash_val % len(servers)]
    
    

    Health Checks: The Bare Minimum


    
    ┌──────────┐    /healthz    ┌──────────┐
    │  LB      │───────────────▶│  App-01  │──▶ Returns 200
    │          │                ├──────────┤
    │          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
    │          │                ├──────────┤
    │          │───────────────▶│  App-03  │──▶ Returns 200
    └──────────┘                └──────────┘
    
    

    
    # /healthz endpoint (FastAPI-style: set the status code on the response,
    # rather than returning a Flask-style tuple)
    @app.get("/healthz")
    async def health_check(response: Response):
        # Check critical dependencies
        db_ok = await check_database()
        cache_ok = await check_redis()
        if db_ok and cache_ok:
            return {"status": "ok"}
        response.status_code = 503  # tells the LB to pull this instance
        return {"status": "degraded"}
    
    

    ---


    8. API Gateway Patterns


    An API gateway sits between clients and your services, handling cross-cutting concerns.


    
                       ┌──────────────────┐
                       │   API Gateway    │
                       │  ┌────────────┐  │
      Client ──────────┼─▶│ Auth       │  │
                       │  └────────────┘  │
                       │  ┌────────────┐  │
                       │  │ Rate Limit │  │──▶ Service A
                       │  └────────────┘  │
                       │  ┌────────────┐  │──▶ Service B
                       │  │ Routing    │  │
                       │  └────────────┘  │──▶ Service C
                       │  ┌────────────┐  │
                       │  │ Logging    │  │
                       │  └────────────┘  │
                       └──────────────────┘
    
    

    What the Gateway Handles


    
    # Before gateway — each service handles auth
    @app.route("/api/orders")
    class OrdersResource:
        def get(self):
            token = request.headers["Authorization"]
            user = verify_token(token)  # Duplicated in EVERY service


    # After gateway — auth is centralized
    # Service code is simpler:
    @app.route("/api/orders")
    class OrdersResource:
        def get(self):
            user_id = request.headers["X-Authenticated-User"]  # Set by gateway
            return get_orders(user_id)
    
    

    Gateway vs Service Mesh


| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| **Client-facing** | Yes (edge) | No (internal) |
| **Auth** | Token verification, API keys | mTLS between services |
| **Rate limiting** | Per-client, per-endpoint | Per-service |
| **Routing** | URL-based | Traffic splitting, canary |
| **Location** | Edge proxy | Sidecar per pod |


    **Recommendation**: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.


    ---


    9. Circuit Breaker and Resilience Patterns


    The Circuit Breaker Pattern


    
    import time

    class CircuitBreaker:
        STATES = ["CLOSED", "OPEN", "HALF_OPEN"]

        def __init__(self, failure_threshold=5, recovery_timeout=30):
            self.failure_count = 0
            self.failure_threshold = failure_threshold
            self.recovery_timeout = recovery_timeout  # seconds
            self.state = "CLOSED"
            self.last_failure_time = None

        async def call(self, func, fallback=None):
            if self.state == "OPEN":
                if time.time() - self.last_failure_time > self.recovery_timeout:
                    self.state = "HALF_OPEN"  # probe with one real call
                else:
                    return await fallback() if fallback else None

            try:
                result = await func()
                if self.state == "HALF_OPEN":
                    self.state = "CLOSED"
                    self.failure_count = 0
                return result
            except Exception:
                self.failure_count += 1
                self.last_failure_time = time.time()
                if self.failure_count >= self.failure_threshold:
                    self.state = "OPEN"
                return await fallback() if fallback else None


    # Usage
    cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

    async def get_recommendations(user_id: str):
        return await cb.call(
            func=lambda: recommendations_service.fetch(user_id),
            fallback=lambda: {"recommendations": [], "source": "fallback"}
        )
    
    

    Other Resilience Patterns


| Pattern | What It Does |
|---|---|
| **Retry with backoff** | Exponential backoff + jitter to avoid thundering herd |
| **Timeout** | Hard timeout per request (e.g., 5s) to prevent cascading |
| **Bulkhead** | Isolate resources — limit connections per service |
| **Rate limiting** | Token bucket or leaky bucket per client |
| **Dead letter queue** | Failed messages go to a DLQ for manual inspection |
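The bulkhead row deserves a sketch of its own: a semaphore-bounded compartment per downstream dependency, so one slow service can't tie up every worker in the process. The class name is illustrative, not a library API.

```python
import asyncio


class Bulkhead:
    """Caps concurrent calls to one dependency; extra callers wait."""

    def __init__(self, max_concurrent: int = 10):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def call(self, fn):
        async with self._sem:  # blocks when the compartment is full
            return await fn()
```

Give each dependency (payments, search, email) its own `Bulkhead` instance so saturation in one stays contained.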


    
    # Retry with exponential backoff and jitter
    import asyncio
    import random

    async def retry_with_backoff(func, max_retries=3):
        for attempt in range(max_retries):
            try:
                return await func()
            except (ConnectionError, TimeoutError):
                if attempt == max_retries - 1:
                    raise
                sleep_time = (2 ** attempt) + random.random()  # exp + jitter
                await asyncio.sleep(sleep_time)
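The table's rate-limiting row mentions the token bucket; a minimal per-client sketch (the class is illustrative, not a specific library's API):

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would keep one bucket per client key (API key or IP), often backed by Redis so limits hold across instances.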
    
    

    ---


    10. Real Example: Design a URL Shortener


    Let's design a bit.ly/TinyURL-style service step by step.


    Requirements


  • Generate a short, unique code for any URL
  • Redirect to the original URL when the short code is accessed
  • Track click analytics (count, referrer, timestamp)
  • Handle 10M URLs, 100M redirects/day

    Step 1: URL Encoding


    
    BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def encode_base62(num: int) -> str:
        if num == 0:
            return BASE62[0]
        result = []
        while num > 0:
            result.append(BASE62[num % 62])
            num //= 62
        return ''.join(reversed(result))

    def decode_base62(code: str) -> int:
        result = 0
        for char in code:
            result = result * 62 + BASE62.index(char)
        return result

    # Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique URLs
    encode_base62(123456789)  # "8m0Kx"
    
    

    Step 2: Architecture


    
                                             ┌─────────────┐
                                             │  Analytics  │
                                             │  (Kafka →   │
                                             │  ClickHouse)│
                                             └─────────────┘
                                                   ▲
                                                   │ (async)
    ┌──────────┐    POST /shorten    ┌──────────────────────────┐
    │  Client  │────────────────────▶│     API Gateway          │
    │          │                     │  ┌────────────────────┐  │
    │          │    GET /abc123      │  │  Write Service     │──┼──▶ PostgreSQL (URLs)
    │          │────────────────────▶│  │  (generate code)   │  │
    │          │                     │  └────────────────────┘  │
    │          │    302 Redirect     │  ┌────────────────────┐  │
    │          │◀────────────────────│  │  Read Service      │  │
    │          │                     │  │  (resolve + cache) │──┼──▶ Redis (cache)
    │          │                     │  └────────────────────┘  │
    │          │                     │  ┌────────────────────┐  │
    │          │                     │  │  Click Logger      │──┼──▶ Kafka
    │          │                     │  └────────────────────┘  │
    └──────────┘                     └──────────────────────────┘
    
    

    Step 3: Data Model


    
    -- PostgreSQL
    CREATE TABLE urls (
        id BIGSERIAL PRIMARY KEY,
        short_code VARCHAR(10) UNIQUE NOT NULL,
        original_url TEXT NOT NULL,
        user_id UUID,          -- nullable for anonymous users
        created_at TIMESTAMP DEFAULT NOW(),
        expires_at TIMESTAMP   -- nullable
    );

    -- Note: the UNIQUE constraint on short_code already creates the index
    -- the read path needs; no separate CREATE INDEX is required.

    -- Redis cache
    -- Key: "url:abc123" → Value: "https://example.com/long-url"
    -- TTL: 24 hours
    
    

    Step 4: Write Path


    
    @app.post("/shorten")
    async def shorten_url(url: str, user_id: str = None):
        # 1. Check if URL already shortened (optimization)
        existing = await db.query(
            "SELECT short_code FROM urls WHERE original_url = :url AND user_id = :uid",
            {"url": url, "uid": user_id}
        )
        if existing:
            return {"short_url": f"https://short.domain/{existing['short_code']}"}

        # 2. Generate unique code
        short_code = await generate_unique_code()

        # 3. Store in DB
        await db.execute(
            "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
            {"c": short_code, "u": url, "uid": user_id}
        )

        # 4. Warm the cache
        await redis.setex(f"url:{short_code}", 86400, url)

        return {"short_url": f"https://short.domain/{short_code}"}


    async def generate_unique_code() -> str:
        for _ in range(3):  # Retry on collision
            code = encode_base62(random.randint(0, 62**7 - 1))
            exists = await db.query(
                "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
            )
            if not exists:
                return code
        raise Exception("Collision rate too high — increase code length")
    
    

    Step 5: Read Path (The Hot Path — Handles 100M req/day)


    
    @app.get("/{short_code}")
    async def redirect(short_code: str, request: Request):
        # 1. Try cache (99% hit rate with 24h TTL)
        original_url = await redis.get(f"url:{short_code}")
        if not original_url:
            # 2. Cache miss — hit DB
            row = await db.query(
                "SELECT original_url FROM urls WHERE short_code = :c",
                {"c": short_code}
            )
            if not row:
                raise HTTPException(status_code=404)

            original_url = row["original_url"]

            # 3. Populate cache with TTL
            await redis.setex(f"url:{short_code}", 86400, original_url)

        # 4. Log click asynchronously (don't block the redirect)
        click_event = {
            "short_code": short_code,
            "ip": request.client.host,
            "user_agent": request.headers.get("user-agent"),
            "referer": request.headers.get("referer"),
            "timestamp": int(time.time()),
        }
        # Fire and forget — queue to Kafka
        await click_producer.send("url_clicks", click_event)

        # 5. Redirect with 302, not 301: browsers cache a 301 and skip the
        # server on repeat visits, which would break the click analytics
        return RedirectResponse(url=original_url, status_code=302)
    
    

    Step 6: Scale Considerations


  • **Read replicas** for URL resolution (read-heavy: 10:1 read-to-write ratio)
  • **Redis cluster** for cache (with consistent hashing)
  • **Kafka partitions** by short_code for ordered click logs
  • **Batch write** click analytics to ClickHouse every 30 seconds
  • **CDN** for the redirect page itself (not the API — API calls are cheap)
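    The last bullet on batching deserves a sketch: rather than one INSERT per click, buffer events in memory and flush them to ClickHouse as a single bulk insert every N seconds or M events. This is an illustrative in-memory version; in production it would hang off the Kafka consumer, and `insert_rows` here is a stand-in for a real ClickHouse client call.

```python
import time

class ClickBatcher:
    """Buffer click events; flush when the batch is full or stale."""

    def __init__(self, insert_rows, max_batch=10_000, max_age_s=30.0):
        self.insert_rows = insert_rows      # e.g. a ClickHouse bulk INSERT
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event: dict):
        self.buffer.append(event)
        age = time.monotonic() - self.last_flush
        if len(self.buffer) >= self.max_batch or age >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            # One bulk INSERT instead of thousands of single-row writes
            self.insert_rows(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()

# Demo with a tiny batch size so the behavior is visible:
batches = []
batcher = ClickBatcher(batches.append, max_batch=3)
for i in range(7):
    batcher.add({"short_code": "abc", "ts": i})
batcher.flush()                         # drain the final partial batch
print([len(b) for b in batches])        # [3, 3, 1]
```

    The same shape works for any write-heavy analytics sink; only the `insert_rows` callable changes.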

    ---


    11. Async Processing Patterns


    The Problem: Synchronous Chains


    
    Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
                             500ms          800ms        200ms = 1.5s total
    
    

    The client waits 1.5 seconds for work that doesn't need to finish before the response goes out.


    Solution: Decouple with Async


    
    Client ──▶ Service A ──▶ Response (immediate: "Accepted")
                    │
                    ▼
               Queue (Kafka/SQS)
                    │
             ┌──────┴──────┐
             ▼              ▼
        Service B      Service C
        (email)        (generate PDF)
    
    

    Pattern 1: Fire and Forget


    
    @app.post("/api/send-email")
    async def send_email(request: EmailRequest):
        # Validate request
        if not request.valid:
            raise HTTPException(400)

        # Generate the id before queueing so it travels with the message;
        # otherwise the caller holds an id nothing downstream knows about
        message_id = str(uuid.uuid4())

        # Queue the work — don't wait
        await email_queue.send({
            "message_id": message_id,
            "to": request.to,
            "template": request.template,
            "data": request.data,
        })

        # Return immediately
        return {"status": "queued", "message_id": message_id}
    
    

    Pattern 2: Polling with Status


    
    @app.post("/api/report/generate")
    async def generate_report(params: ReportParams):
        report_id = str(uuid.uuid4())
        await report_queue.send({"report_id": report_id, "params": params})
        return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}

    @app.get("/api/report/{report_id}/status")
    async def check_status(report_id: str):
        # Assumes a redis client created with decode_responses=True,
        # so the comparison below is str == str rather than bytes == str
        status = await redis.get(f"report:{report_id}:status")
        if status == "ready":
            return {"status": "ready", "url": f"/api/report/{report_id}/download"}
        return {"status": "processing"}
    
    

    Pattern 3: Webhook Callback


    Instead of polling, have the worker call a URL when done:


    
    async def process_report(report_id: str, params: dict, callback_url: str):
        # ... generate report ...
        await save_report(report_id, result)

        # Notify caller. Note: httpx.post is the synchronous API;
        # in async code use an AsyncClient.
        if callback_url:
            async with httpx.AsyncClient() as client:
                await client.post(callback_url, json={
                    "report_id": report_id,
                    "status": "completed",
                    "download_url": f"/api/report/{report_id}/download",
                })
    
    

    ---


    12. Common Anti-Patterns


    1. The Distributed Monolith


    You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.


    **Signs**: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database — or god forbid, tables.


    **Fix**: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.


    2. Over-Engineering from Day One


    "Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.


    **Fix**: Start with the simplest thing that works. A monolith with PostgreSQL and Redis will handle 99% of applications. Extract services when there's a proven need.


    3. Synchronous Coupling via HTTP


    
    Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
    
    

    If one service is slow, the whole chain slows. Latency adds up. Failures cascade.


    **Fix**: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.
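    To make the circuit-breaker part of the fix concrete, here is a minimal illustrative version (not a drop-in for a library like `pybreaker`): after N consecutive failures the circuit opens and calls fail fast for a cool-down period instead of piling onto a struggling downstream service.

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: reject immediately, don't touch the sick service
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                  # success resets the counter
        return result

# Demo: two failures trip the breaker; the third call never reaches fn.
def call_service_b():
    raise ConnectionError("service B is down")

breaker = CircuitBreaker(max_failures=2, reset_after=60)
for _ in range(2):
    try:
        breaker.call(call_service_b)
    except ConnectionError:
        pass
try:
    breaker.call(call_service_b)
    tripped = False
except RuntimeError:
    tripped = True
print(tripped)  # True
```

    A production breaker also needs per-dependency instances and metrics on state transitions, but the state machine above is the whole idea.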


    4. The Shared Database


    Two services reading/writing the same database table. Schema changes require coordination. One service can deadlock the other.


    **Fix**: Each service owns its data. Share via APIs or events, not databases.


    5. Ignoring Caching


    Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.


    **Fix**: Add Redis. Cache the most frequently accessed data. Even a 60-second cache TTL reduces DB load by 95% for read-heavy workloads.
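    The 95% figure is just TTL arithmetic. Assuming a hot key requested 20 times per minute (an illustrative number), a 60-second TTL caps the cost at one DB read per TTL window:

```python
# With a 60 s TTL, each hot key costs at most one DB read per minute,
# no matter how often it is requested within the window.
requests_per_min = 20     # assumed per-key request rate
db_reads_per_min = 1      # one cache miss per 60 s TTL window
reduction = 1 - db_reads_per_min / requests_per_min
print(f"{reduction:.0%}")  # 95%
```

    The hotter the key, the bigger the win; the trade-off is that readers may see data up to one TTL stale.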


    6. The N+1 Query Problem


    
    # Anti-pattern: N+1 queries
    def get_orders_with_items(user_id: str):
        orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
        for order in orders:
            # One query PER order — terrible!
            order["items"] = db.query(
                "SELECT * FROM order_items WHERE order_id = :oid",
                {"oid": order["id"]}
            )
        return orders

    # Fix: single query with JOIN
    def get_orders_with_items_fixed(user_id: str):
        # Note: the JOIN returns one flat row per item; regroup by o.id
        # in application code if callers expect nested order objects
        return db.query("""
            SELECT o.id, o.total, oi.product_id, oi.quantity
            FROM orders o
            LEFT JOIN order_items oi ON oi.order_id = o.id
            WHERE o.user_id = :uid
        """, {"uid": user_id})
    
    

    7. No Monitoring / No Observability


    "Everything looks fine" — until users complain that the site is slow and you have no idea why.


    **Baseline monitoring**: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
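    "Structured logging with correlation IDs" concretely: stash an id per request (e.g. taken from an `X-Request-ID` header by edge middleware) in a `ContextVar`, and stamp it on every log line so one request can be traced across log streams. A minimal sketch using only the standard library; the middleware step is simulated by a direct `set` call:

```python
import json
import logging
import uuid
from contextvars import ContextVar

# One ContextVar holds the id for the current request; middleware at the
# service edge would set it before handing off to route handlers, and
# every log line then picks it up automatically.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

# What per-request middleware would do once, at request entry:
correlation_id.set(str(uuid.uuid4()))
log.info("order created")   # emits one JSON line carrying the id
```

    Because `ContextVar` is task-local, concurrent requests in the same process each log their own id without passing it through every function signature.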


    ---


    Summary: Key Decisions for 2026


    | Decision | Default Choice | Upgrade When |
    |---|---|---|
    | **Architecture** | Modular monolith | Team >15 or clear independent scale need |
    | **Database** | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
    | **Cache** | Redis (cache aside) | Write-behind for high-throughput writes |
    | **Queue** | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
    | **Async** | Fire and forget for non-critical | Polling → Webhooks as needs grow |
    | **API Gateway** | NGINX / Traefik | Envoy / Kong for advanced routing |
    | **Resilience** | Circuit breaker + timeout | Bulkhead + rate limiting at scale |


    The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.