System Design Fundamentals 2026: A Developer's Guide to Scalable Applications


System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.


---


1. Microservices vs Monolith vs Modular Monolith


The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.


| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| **Monolith** | 1–5 | Low | Prototypes, internal tools, MVPs |
| **Modular Monolith** | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| **Microservices** | 10+ per service | High | Large orgs with clear domain boundaries |


The Modular Monolith Sweet Spot


A modular monolith is a single deployable unit with **strict module boundaries**. Modules communicate through well-defined interfaces but share the same process and database.



    ┌─────────────────────────────────────┐
    │         Modular Monolith            │
    │  ┌──────────┐  ┌──────────┐        │
    │  │  Orders  │  │  Billing │        │
    │  │  Module  │──│  Module  │        │
    │  └────┬─────┘  └────┬─────┘        │
    │       │              │              │
    │  ┌────▼──────────────▼─────┐       │
    │  │    Shared Kernel        │       │
    │  │  (DB, messaging, auth)  │       │
    │  └─────────────────────────┘       │
    └─────────────────────────────────────┘


**When to extract a service**: When two conditions are met — the module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.


**Rule of thumb**: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
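In code, a strict module boundary can look as simple as one module depending on another's public interface object. This is a minimal sketch with hypothetical names (`BillingModule`, `OrdersModule` are illustrative, not a prescribed framework):

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    order_id: str
    total_cents: int


class BillingModule:
    """Billing's public interface: the only supported entry point."""

    def create_invoice(self, order_id: str, total_cents: int) -> Invoice:
        # Billing-private persistence and helpers would live here
        return Invoice(order_id=order_id, total_cents=total_cents)


class OrdersModule:
    def __init__(self, billing: BillingModule):
        self.billing = billing  # depend on the interface, never on Billing's tables

    def place_order(self, order_id: str, total_cents: int) -> Invoice:
        # ... persist the order in Orders-private tables ...
        return self.billing.create_invoice(order_id, total_cents)
```

Import-linting tools can then enforce that nothing outside Billing imports its internals, which is what keeps later extraction surgical.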


Real-World Decision Tree



    Monolith → Modular Monolith → Selective Extraction → Full Microservices

    MVP Phase:         Monolith
    10k users/5 devs:  Modular monolith
    100k users:        Extract payments (PCI scope)
    1M users:          Extract search (separate scale)
    10M users:         Extract recommendations (different stack)


---


2. CQRS: Command Query Responsibility Segregation


CQRS separates **reads** from **writes** — different models, sometimes different databases.


When CQRS Makes Sense


  • Your read queries are complex and don't map well to your write model (e.g., reporting dashboards)
  • Your read and write workloads have different scaling requirements (10:1 read-to-write ratio)
  • You need different data shapes for reading vs writing (e.g., write normalized, read denormalized)

A Simple CQRS Implementation


    
    # --- Command Side (Writes) ---
    class CreateOrderCommand:
        def __init__(self, user_id: str, items: list):
            self.user_id = user_id
            self.items = items


    class OrderCommandHandler:
        def handle(self, cmd: CreateOrderCommand) -> str:
            # Validate business rules
            order = Order.create(cmd.user_id, cmd.items)
            order.save()  # Write to transactional DB (PostgreSQL)
            event_bus.publish("order.created", {"order_id": order.id})
            return order.id


    # --- Query Side (Reads) ---
    class OrderQueryHandler:
        def get_order_summary(self, user_id: str) -> dict:
            # Read from denormalized read model (could be a different DB)
            return read_db.query(
                "SELECT * FROM order_summaries WHERE user_id = :uid",
                {"uid": user_id}
            )
    
    

    CQRS Without Event Sourcing


    You don't need event sourcing to use CQRS. The most common pattern is:


  1. **Write** to a normalized PostgreSQL table
  2. **Sync** (synchronously or async via CDC) to a read-optimized table
  3. **Read** from the read table


    
    -- Write model: normalized
    CREATE TABLE orders (
        id UUID PRIMARY KEY,
        user_id UUID NOT NULL,
        status VARCHAR(20) NOT NULL,
        total_cents BIGINT NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    );

    CREATE TABLE order_items (
        id UUID PRIMARY KEY,
        order_id UUID REFERENCES orders(id),
        product_id UUID NOT NULL,
        quantity INT NOT NULL,
        unit_price_cents BIGINT NOT NULL
    );

    -- Read model: denormalized for fast queries
    CREATE TABLE order_summaries (
        order_id UUID PRIMARY KEY,
        user_id UUID NOT NULL,
        status VARCHAR(20) NOT NULL,
        item_count INT NOT NULL,
        total_cents BIGINT NOT NULL,
        product_names TEXT[] NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    );
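The sync step can be sketched as a pure projection function that derives one `order_summaries` row from the normalized rows. This is a sketch under one assumption: the `product_name` field on each item row would in practice come from a join against a products table, which the schema above doesn't show.

```python
# Hypothetical projection for the sync step: fold normalized order + item
# rows into one denormalized row shaped like order_summaries above.
def build_order_summary(order: dict, items: list) -> dict:
    return {
        "order_id": order["id"],
        "user_id": order["user_id"],
        "status": order["status"],
        "item_count": len(items),
        "total_cents": sum(i["quantity"] * i["unit_price_cents"] for i in items),
        "product_names": [i["product_name"] for i in items],  # assumed field
        "created_at": order["created_at"],
    }
```

Run this after each write for synchronous sync, or from a CDC consumer for async sync.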
    
    

    When NOT to Use CQRS


  • Your app is a simple CRUD interface with no complex queries
  • You don't need separate read/write scaling
  • Your team is small and you can't justify the infrastructure overhead

    ---


    3. Event-Driven Architecture


    Event-driven systems decouple producers from consumers. When an event happens, interested services react.


    Core Concepts


    
    ┌──────────┐     Event Bus      ┌──────────────┐
    │ Producer │─────(Kafka/RMQ)───▶│  Consumer 1  │
    │ (Orders) │                    │ (Analytics)  │
    └──────────┘                    └──────────────┘
                    ───────────────▶┌──────────────┐
                                    │  Consumer 2  │
                                    │ (Email)      │
                                    └──────────────┘
    
    

    Message Queue Comparison


| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| **Delivery** | At-least-once, exactly-once (idempotent producer) | At-most-once, at-least-once | At-least-once |
| **Ordering** | Guaranteed per partition | Not guaranteed (unless single queue) | FIFO queues (limited throughput) |
| **Persistence** | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| **Throughput** | Millions/sec | Thousands/sec | Nearly unlimited (standard); ~300/s per FIFO queue without batching |
| **Consumer model** | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| **Use case** | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| **Operational cost** | High (runs a ZooKeeper or KRaft cluster) | Medium | Minimal (fully managed) |


    Kafka in Practice: The URL Shortener Click Stream


    
    # Producer — emit click events
    def record_click(short_code: str, ip: str, user_agent: str):
        producer.send(
            topic="url_clicks",
            key=short_code.encode(),  # Same key → same partition → ordered
            value={
                "short_code": short_code,
                "ip": ip,
                "user_agent": user_agent,
                "timestamp": int(time.time()),
            }
        )


    # Consumer 1 — real-time analytics (e.g., update Redis counters)
    def consume_clicks_for_analytics():
        for message in consumer:
            click = message.value
            redis.zincrby("popular_urls:today", 1, click["short_code"])
            redis.incr(f"url:{click['short_code']}:clicks")


    # Consumer 2 — store raw clicks in data warehouse
    def consume_clicks_for_storage():
        for message in consumer:
            warehouse.insert_one(message.value)
    
    

    Event Sourcing: Storing State as Events


    Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.


    
    # Events (immutable facts)
    events = [
        {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
        {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
        {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
    ]


    # Derive current state by replaying events
    def get_account_state(events):
        state = {"email": None, "email_verified": False, "password_hash": None}
        for event in events:
            if event["type"] == "AccountCreated":
                state["email"] = event["data"]["email"]
            elif event["type"] == "EmailVerified":
                state["email_verified"] = True
            # (PasswordChanged handling elided for brevity)
        return state
    
    

    **Trade-offs**: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
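The "you need projections" point is concrete: rather than replaying the full history on every query, you fold the event stream into a read model once and then keep applying new events as they arrive. A minimal sketch, using a dict as a stand-in for a real read database:

```python
# Minimal projection: folds account events into a dict-backed read model.
class AccountProjection:
    def __init__(self):
        self.accounts = {}  # user_id -> current state

    def apply(self, event: dict) -> None:
        data = event["data"]
        account = self.accounts.setdefault(
            data["user_id"], {"email": None, "email_verified": False}
        )
        if event["type"] == "AccountCreated":
            account["email"] = data["email"]
        elif event["type"] == "EmailVerified":
            account["email_verified"] = True


projection = AccountProjection()
for event in [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
]:
    projection.apply(event)
# projection.accounts["u1"] now answers queries without replaying history
```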


    ---


    4. Database Scaling Strategies


    Read Replicas


    The simplest scaling strategy: one primary handles writes, replicas handle reads.


    
                        ┌─────────────┐
                        │  Primary DB │◀── Writes
                        └──────┬──────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
         ┌────▼─────┐   ┌─────▼────┐   ┌───────▼──┐
         │ Replica 1│   │ Replica 2│   │ Replica 3│
         │  (Reads) │   │  (Reads) │   │  (Reads) │
         └──────────┘   └──────────┘   └──────────┘
    
    

    
    # Using read/write separation in code
    class DatabaseRouter:
        def __init__(self):
            self.primary = create_engine(PRIMARY_URL)
            self.replicas = [create_engine(url) for url in REPLICA_URLS]
            self.replica_index = 0

        def write(self, query, params=None):
            with self.primary.begin() as conn:
                return conn.execute(query, params or {})

        def read(self, query, params=None):
            # Round-robin across replicas
            replica = self.replicas[self.replica_index % len(self.replicas)]
            self.replica_index += 1
            with replica.connect() as conn:
                return conn.execute(query, params or {})
    
    

    **Replication lag** is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called **read-after-write consistency**.


    
    async def create_order_and_redirect(user_id: str, items: list):
        order_id = db.write("INSERT INTO orders ... RETURNING id")

        # Read-after-write: force this read to the primary
        order = db.read_from_primary(
            "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
        )

        return redirect(f"/orders/{order_id}")
    
    

    Sharding (Horizontal Partitioning)


    Split data across databases by a shard key.


| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| **Hash-based** | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| **Range-based** | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| **Directory-based** | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |


    
    # Consistent hashing — minimizes re-sharding
    import hashlib

    class ConsistentHashRing:
        def __init__(self, nodes: list, replicas: int = 150):
            self.ring = {}
            for node in nodes:
                for i in range(replicas):
                    key = self._hash(f"{node}:{i}")
                    self.ring[key] = node
            self.sorted_keys = sorted(self.ring.keys())

        def get_node(self, key: str) -> str:
            if not self.ring:
                return None
            hash_val = self._hash(key)
            # Walk clockwise to the first virtual node at or past the hash
            for ring_key in self.sorted_keys:
                if hash_val <= ring_key:
                    return self.ring[ring_key]
            return self.ring[self.sorted_keys[0]]  # wrap around

        def _hash(self, key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)
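To check the "minimizes re-sharding" claim, here is a self-contained experiment (using `bisect` instead of the linear scan above) that counts how many keys change owner when a fourth node joins the ring:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def build_ring(nodes: list, vnodes: int = 150) -> list:
    # Sorted (hash, node) pairs: 150 virtual nodes per physical node
    return sorted((_hash(f"{n}:{i}"), n) for n in nodes for i in range(vnodes))


def get_node(ring: list, key: str) -> str:
    hashes = [h for h, _ in ring]
    idx = bisect.bisect_right(hashes, _hash(key)) % len(ring)  # wrap around
    return ring[idx][1]


keys = [f"user:{i}" for i in range(2000)]
ring3 = build_ring(["db1", "db2", "db3"])
ring4 = build_ring(["db1", "db2", "db3", "db4"])
moved = sum(get_node(ring3, k) != get_node(ring4, k) for k in keys)
# moved is roughly a quarter of the keys (only those now owned by db4);
# naive hash(key) % N would move roughly three quarters of them
```

Keys never move between the surviving nodes, which is exactly the property that makes resharding cheap.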
    
    

    Partitioning (Within a Database)


    Split a table into smaller physical chunks. PostgreSQL declarative partitioning:


    
    CREATE TABLE events (
        event_id UUID NOT NULL,
        occurred_at TIMESTAMP NOT NULL,
        payload JSONB
    ) PARTITION BY RANGE (occurred_at);

    CREATE TABLE events_2026_q1
        PARTITION OF events
        FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

    CREATE TABLE events_2026_q2
        PARTITION OF events
        FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
    
    

    Partition pruning means queries with `WHERE occurred_at >= '2026-04-01'` only scan relevant partitions.


    ---


    5. Caching Layers


    The Three Cache Levels


    
    CDN ─── Application Cache (Redis) ─── In-Memory Cache (Local)
    │               │                           │
    │        Expensive to fill             Fastest access
    │        Shared across servers         1-5μs per get
    │        50-500μs per get              Lost on restart
    
    

    Cache Strategies


| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| **Cache Aside** | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| **Read Through** | Cache is authoritative; loads from DB on miss | Write through to DB; cache handles loading | When cache handles persistence |
| **Write Through** | — | Write to cache first, then DB synchronously | Apps needing strong consistency |
| **Write Behind** | — | Write to cache, async flush to DB | High-write-throughput apps |
| **Write Around** | — | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |


    Cache Aside — The Default Choice


    
    async def get_user_profile(user_id: str) -> dict:
        cache_key = f"user:profile:{user_id}"

        # 1. Try cache
        cached = await redis.get(cache_key)
        if cached:
            return json.loads(cached)

        # 2. Cache miss — load from database
        profile = await db.query(
            "SELECT * FROM user_profiles WHERE user_id = :uid",
            {"uid": user_id}
        )

        if profile:
            # 3. Populate cache with TTL
            await redis.setex(cache_key, 300, json.dumps(profile))

        return profile


    async def update_user_profile(user_id: str, data: dict):
        # 1. Write to database
        await db.execute(
            "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
            {"uid": user_id, "name": data["name"]}
        )

        # 2. Invalidate cache (don't update it — let next read re-populate)
        await redis.delete(f"user:profile:{user_id}")
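For contrast, here is write through from the strategy table, sketched with dict-backed stand-ins for the database and cache: the write updates both synchronously, so the very next read is a fresh cache hit instead of a miss.

```python
import json

db = {}     # stand-in for the database: user_id -> profile row
cache = {}  # stand-in for Redis: cache_key -> serialized value


def update_user_profile_write_through(user_id: str, data: dict) -> None:
    # 1. Write to the database
    db[user_id] = {"user_id": user_id, **data}
    # 2. Write through to the cache in the same operation;
    #    cache aside would delete the key here instead
    cache[f"user:profile:{user_id}"] = json.dumps(db[user_id])
```

The cost is write latency (every write touches two systems) in exchange for reads that never see a stale or missing key.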
    
    

    Write Behind — For High-Volume Writes


    
    # Batch writer process — runs every 5 seconds
    write_buffer = []

    async def write_to_cache(key: str, value: dict):
        write_buffer.append((key, value))
        if len(write_buffer) >= 100:
            await flush_buffer()

    async def flush_buffer():
        async with db.transaction():
            for key, value in write_buffer:
                await db.execute(
                    "UPSERT INTO ... VALUES (:k, :v)",
                    {"k": key, "v": json.dumps(value)}
                )
        write_buffer.clear()

    # Start background flusher
    async def periodic_flush():
        while True:
            await asyncio.sleep(5)
            if write_buffer:
                await flush_buffer()
    
    

    **Write behind risk**: if the process crashes before the flush, data is lost. Use a persistent queue (Kafka) for critical writes.


    ---


    6. CAP Theorem Explained Practically


    CAP says a distributed data store can provide at most two of three guarantees: **Consistency**, **Availability**, and **Partition Tolerance**.


    What CAP Actually Means


  • **C (Consistency)**: Every read sees the most recent write (or an error)
  • **A (Availability)**: Every request gets a non-error response (not necessarily the latest data)
  • **P (Partition Tolerance)**: System continues working despite network failures

    The Key Insight


    You **must** choose CP or AP. Partition tolerance is non-negotiable in distributed systems — networks WILL fail.


| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |


    Practical CAP Decisions


    
    # AP choice — accept stale reads for availability
    async def get_product_stock(product_id: str) -> int:
        # Read from nearest replica, may be stale
        return await replica.query(
            "SELECT stock FROM products WHERE id = :pid",
            {"pid": product_id}
        )


    # CP choice — accept slower reads for consistency
    async def get_product_stock_cp(product_id: str) -> int:
        # Read from primary, always latest
        return await primary.query(
            "SELECT stock FROM products WHERE id = :pid",
            {"pid": product_id}
        )
    
    

    **Rule of thumb**: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.


    ---


    7. Load Balancing Strategies


    Layer 4 vs Layer 7


| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| **Routing based on** | IP + port | URL, headers, cookies, body |
| **Performance** | Very fast | Slower (inspects payload) |
| **Features** | Simple forwarding | Content-based routing, rate limiting |
| **Examples** | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |


    Algorithms


    
    # Round Robin — predictable, but doesn't handle different load sizes
    servers = ["app-01", "app-02", "app-03"]
    next_server = servers[current_index % len(servers)]
    current_index += 1

    # Least Connections — better for variable request durations
    def least_connections(servers: list) -> str:
        return min(servers, key=lambda s: s.active_connections)

    # IP Hash — session persistence without cookies
    def ip_hash(client_ip: str, servers: list) -> str:
        hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return servers[hash_val % len(servers)]
    
    

    Health Checks: The Bare Minimum


    
    ┌──────────┐    /healthz    ┌──────────┐
    │  LB      │───────────────▶│  App-01  │──▶ Returns 200
    │          │                ├──────────┤
    │          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
    │          │                ├──────────┤
    │          │───────────────▶│  App-03  │──▶ Returns 200
    └──────────┘                └──────────┘
    
    

    
    # /healthz endpoint (FastAPI-style: set the status code on the response,
    # rather than returning a Flask-style tuple)
    @app.get("/healthz")
    async def health_check(response: Response):
        # Check critical dependencies
        db_ok = await check_database()
        cache_ok = await check_redis()
        if db_ok and cache_ok:
            return {"status": "ok"}
        response.status_code = 503  # tells the LB to pull this instance
        return {"status": "degraded"}
    
    

    ---


    8. API Gateway Patterns


    An API gateway sits between clients and your services, handling cross-cutting concerns.


    
                       ┌──────────────────┐
                       │   API Gateway    │
                       │  ┌────────────┐  │
      Client ──────────┼─▶│ Auth       │  │
                       │  └────────────┘  │
                       │  ┌────────────┐  │
                       │  │ Rate Limit │  │──▶ Service A
                       │  └────────────┘  │
                       │  ┌────────────┐  │──▶ Service B
                       │  │ Routing    │  │
                       │  └────────────┘  │──▶ Service C
                       │  ┌────────────┐  │
                       │  │ Logging    │  │
                       │  └────────────┘  │
                       └──────────────────┘
    
    

    What the Gateway Handles


    
    # Before gateway — each service handles auth
    @app.route("/api/orders")
    class OrdersResource:
        def get(self):
            token = request.headers["Authorization"]
            user = verify_token(token)  # Duplicated in EVERY service


    # After gateway — auth is centralized
    # Service code is simpler:
    @app.route("/api/orders")
    class OrdersResource:
        def get(self):
            user_id = request.headers["X-Authenticated-User"]  # Set by gateway
            return get_orders(user_id)
    
    

    Gateway vs Service Mesh


| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| **Client-facing** | Yes (edge) | No (internal) |
| **Auth** | Token verification, API keys | mTLS between services |
| **Rate limiting** | Per-client, per-endpoint | Per-service |
| **Routing** | URL-based | Traffic splitting, canary |
| **Location** | Edge proxy | Sidecar per pod |


    **Recommendation**: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.


    ---


    9. Circuit Breaker and Resilience Patterns


    The Circuit Breaker Pattern


    
    import time

    class CircuitBreaker:
        STATES = ["CLOSED", "OPEN", "HALF_OPEN"]

        def __init__(self, failure_threshold=5, recovery_timeout=30):
            self.failure_count = 0
            self.failure_threshold = failure_threshold
            self.recovery_timeout = recovery_timeout  # seconds
            self.state = "CLOSED"
            self.last_failure_time = None

        async def call(self, func, fallback=None):
            if self.state == "OPEN":
                if time.time() - self.last_failure_time > self.recovery_timeout:
                    self.state = "HALF_OPEN"  # probe with one real call
                else:
                    return await fallback() if fallback else None

            try:
                result = await func()
                if self.state == "HALF_OPEN":
                    self.state = "CLOSED"
                    self.failure_count = 0
                return result
            except Exception:
                self.failure_count += 1
                self.last_failure_time = time.time()
                if self.failure_count >= self.failure_threshold:
                    self.state = "OPEN"
                return await fallback() if fallback else None


    # Usage
    cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

    async def get_recommendations(user_id: str):
        return await cb.call(
            func=lambda: recommendations_service.fetch(user_id),
            fallback=lambda: {"recommendations": [], "source": "fallback"}
        )
    
    

    Other Resilience Patterns


| Pattern | What It Does |
|---|---|
| **Retry with backoff** | Exponential backoff + jitter to avoid thundering herd |
| **Timeout** | Hard timeout per request (e.g., 5s) to prevent cascading |
| **Bulkhead** | Isolate resources — limit connections per service |
| **Rate limiting** | Token bucket or leaky bucket per client |
| **Dead letter queue** | Failed messages go to a DLQ for manual inspection |
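The bulkhead row deserves a sketch of its own: a semaphore-bounded compartment per downstream dependency, so one slow service can't tie up every worker in the process. The class name is illustrative, not a library API.

```python
import asyncio


class Bulkhead:
    """Caps concurrent calls to one dependency; extra callers wait."""

    def __init__(self, max_concurrent: int = 10):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def call(self, fn):
        async with self._sem:  # blocks when the compartment is full
            return await fn()
```

Give each dependency (payments, search, email) its own `Bulkhead` instance so saturation in one stays contained.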


    
    # Retry with exponential backoff and jitter
    import asyncio
    import random

    async def retry_with_backoff(func, max_retries=3):
        for attempt in range(max_retries):
            try:
                return await func()
            except (ConnectionError, TimeoutError):
                if attempt == max_retries - 1:
                    raise
                sleep_time = (2 ** attempt) + random.random()  # exp + jitter
                await asyncio.sleep(sleep_time)
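The table's rate-limiting row mentions the token bucket; a minimal per-client sketch (the class is illustrative, not a specific library's API):

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would keep one bucket per client key (API key or IP), often backed by Redis so limits hold across instances.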
    
    

    ---


    10. Real Example: Design a URL Shortener


    Let's design a bit.ly/TinyURL-style service step by step.


    Requirements


  • Generate a short, unique code for any URL
  • Redirect to the original URL when the short code is accessed
  • Track click analytics (count, referrer, timestamp)
  • Handle 10M URLs, 100M redirects/day

    Step 1: URL Encoding


    
    BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def encode_base62(num: int) -> str:
        if num == 0:
            return BASE62[0]
        result = []
        while num > 0:
            result.append(BASE62[num % 62])
            num //= 62
        return ''.join(reversed(result))

    def decode_base62(code: str) -> int:
        result = 0
        for char in code:
            result = result * 62 + BASE62.index(char)
        return result

    # Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique URLs
    encode_base62(123456789)  # "8m0Kx"
    
    

    Step 2: Architecture


    
                                             ┌─────────────┐
                                             │  Analytics  │
                                             │  (Kafka →   │
                                             │  ClickHouse)│
                                             └─────────────┘
                                                   ▲
                                                   │ (async)
    ┌──────────┐    POST /shorten    ┌──────────────────────────┐
    │  Client  │────────────────────▶│     API Gateway          │
    │          │                     │  ┌────────────────────┐  │
    │          │    GET /abc123      │  │  Write Service     │──┼──▶ PostgreSQL (URLs)
    │          │────────────────────▶│  │  (generate code)   │  │
    │          │                     │  └────────────────────┘  │
    │          │    302 Redirect     │  ┌────────────────────┐  │
    │          │◀────────────────────│  │  Read Service      │  │
    │          │                     │  │  (resolve + cache) │──┼──▶ Redis (cache)
    │          │                     │  └────────────────────┘  │
    │          │                     │  ┌────────────────────┐  │
    │          │                     │  │  Click Logger      │──┼──▶ Kafka
    │          │                     │  └────────────────────┘  │
    └──────────┘                     └──────────────────────────┘
    
    

    Step 3: Data Model


    
    -- PostgreSQL
    CREATE TABLE urls (
        id BIGSERIAL PRIMARY KEY,
        short_code VARCHAR(10) UNIQUE NOT NULL,
        original_url TEXT NOT NULL,
        user_id UUID,          -- nullable for anonymous users
        created_at TIMESTAMP DEFAULT NOW(),
        expires_at TIMESTAMP   -- nullable
    );

    -- Note: the UNIQUE constraint on short_code already creates the index
    -- the read path needs; no separate CREATE INDEX is required.

    -- Redis cache
    -- Key: "url:abc123" → Value: "https://example.com/long-url"
    -- TTL: 24 hours
    
    

    Step 4: Write Path


    
    @app.post("/shorten")
    async def shorten_url(url: str, user_id: str = None):
        # 1. Check if URL already shortened (optimization)
        existing = await db.query(
            "SELECT short_code FROM urls WHERE original_url = :url AND user_id = :uid",
            {"url": url, "uid": user_id}
        )
        if existing:
            return {"short_url": f"https://short.domain/{existing['short_code']}"}

        # 2. Generate unique code
        short_code = await generate_unique_code()

        # 3. Store in DB
        await db.execute(
            "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
            {"c": short_code, "u": url, "uid": user_id}
        )

        # 4. Warm the cache
        await redis.setex(f"url:{short_code}", 86400, url)

        return {"short_url": f"https://short.domain/{short_code}"}


    async def generate_unique_code() -> str:
        for _ in range(3):  # Retry on collision
            code = encode_base62(random.randint(0, 62**7 - 1))
            exists = await db.query(
                "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
            )
            if not exists:
                return code
        raise Exception("Collision rate too high — increase code length")
    
    

    Step 5: Read Path (The Hot Path — Handles 100M req/day)


    
    @app.get("/{short_code}")
    async def redirect(short_code: str, request: Request):
        # 1. Try cache (99% hit rate with 24h TTL)
        original_url = await redis.get(f"url:{short_code}")
        if not original_url:
            # 2. Cache miss — hit DB
            row = await db.query(
                "SELECT original_url FROM urls WHERE short_code = :c",
                {"c": short_code}
            )
            if not row:
                raise HTTPException(status_code=404)

            original_url = row["original_url"]

            # 3. Populate cache with TTL
            await redis.setex(f"url:{short_code}", 86400, original_url)

        # 4. Log click asynchronously (don't block the redirect)
        click_event = {
            "short_code": short_code,
            "ip": request.client.host,
            "user_agent": request.headers.get("user-agent"),
            "referer": request.headers.get("referer"),
            "timestamp": int(time.time()),
        }
        # Fire and forget — queue to Kafka
        await click_producer.send("url_clicks", click_event)

        # 5. Redirect with 302, not 301: browsers cache a 301 and skip the
        # server on repeat visits, which would break the click analytics
        return RedirectResponse(url=original_url, status_code=302)
    
    

    Step 6: Scale Considerations


  • **Read replicas** for URL resolution (read-heavy: 10:1 read-to-write ratio)
  • **Redis cluster** for cache (with consistent hashing)
  • **Kafka partitions** by short_code for ordered click logs
  • **Batch write** click analytics to ClickHouse every 30 seconds
  • **CDN** for the redirect page itself (not the API — API calls are cheap)
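    The last bullet on batching deserves a sketch: rather than one INSERT per click, buffer events in memory and flush them to ClickHouse as a single bulk insert every N seconds or M events. This is an illustrative in-memory version; in production it would hang off the Kafka consumer, and `insert_rows` here is a stand-in for a real ClickHouse client call.

```python
import time

class ClickBatcher:
    """Buffer click events; flush when the batch is full or stale."""

    def __init__(self, insert_rows, max_batch=10_000, max_age_s=30.0):
        self.insert_rows = insert_rows      # e.g. a ClickHouse bulk INSERT
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event: dict):
        self.buffer.append(event)
        age = time.monotonic() - self.last_flush
        if len(self.buffer) >= self.max_batch or age >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            # One bulk INSERT instead of thousands of single-row writes
            self.insert_rows(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()

# Demo with a tiny batch size so the behavior is visible:
batches = []
batcher = ClickBatcher(batches.append, max_batch=3)
for i in range(7):
    batcher.add({"short_code": "abc", "ts": i})
batcher.flush()                         # drain the final partial batch
print([len(b) for b in batches])        # [3, 3, 1]
```

    The same shape works for any write-heavy analytics sink; only the `insert_rows` callable changes.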

    ---


    11. Async Processing Patterns


    The Problem: Synchronous Chains


    
    Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
                             500ms          800ms        200ms = 1.5s total
    
    

    The client waits 1.5 seconds for work that doesn't need to finish before the response goes out.


    Solution: Decouple with Async


    
    Client ──▶ Service A ──▶ Response (immediate: "Accepted")
                    │
                    ▼
               Queue (Kafka/SQS)
                    │
             ┌──────┴──────┐
             ▼              ▼
        Service B      Service C
        (email)        (generate PDF)
    
    

    Pattern 1: Fire and Forget


    
    @app.post("/api/send-email")
    async def send_email(request: EmailRequest):
        # Validate request
        if not request.valid:
            raise HTTPException(400)

        # Generate the id before queueing so it travels with the message;
        # otherwise the caller holds an id nothing downstream knows about
        message_id = str(uuid.uuid4())

        # Queue the work — don't wait
        await email_queue.send({
            "message_id": message_id,
            "to": request.to,
            "template": request.template,
            "data": request.data,
        })

        # Return immediately
        return {"status": "queued", "message_id": message_id}
    
    

    Pattern 2: Polling with Status


    
    @app.post("/api/report/generate")
    async def generate_report(params: ReportParams):
        report_id = str(uuid.uuid4())
        await report_queue.send({"report_id": report_id, "params": params})
        return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}

    @app.get("/api/report/{report_id}/status")
    async def check_status(report_id: str):
        # Assumes a redis client created with decode_responses=True,
        # so the comparison below is str == str rather than bytes == str
        status = await redis.get(f"report:{report_id}:status")
        if status == "ready":
            return {"status": "ready", "url": f"/api/report/{report_id}/download"}
        return {"status": "processing"}
    
    

    Pattern 3: Webhook Callback


    Instead of polling, have the worker call a URL when done:


    
    async def process_report(report_id: str, params: dict, callback_url: str):
        # ... generate report ...
        await save_report(report_id, result)

        # Notify caller. Note: httpx.post is the synchronous API;
        # in async code use an AsyncClient.
        if callback_url:
            async with httpx.AsyncClient() as client:
                await client.post(callback_url, json={
                    "report_id": report_id,
                    "status": "completed",
                    "download_url": f"/api/report/{report_id}/download",
                })
    
    

    ---


    12. Common Anti-Patterns


    1. The Distributed Monolith


    You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.


    **Signs**: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database — or god forbid, tables.


    **Fix**: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.


    2. Over-Engineering from Day One


    "Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.


    **Fix**: Start with the simplest thing that works. A monolith with PostgreSQL and Redis will handle 99% of applications. Extract services when there's a proven need.


    3. Synchronous Coupling via HTTP


    
    Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
    
    

    If one service is slow, the whole chain slows. Latency adds up. Failures cascade.


    **Fix**: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.
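    To make the circuit-breaker part of the fix concrete, here is a minimal illustrative version (not a drop-in for a library like `pybreaker`): after N consecutive failures the circuit opens and calls fail fast for a cool-down period instead of piling onto a struggling downstream service.

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: reject immediately, don't touch the sick service
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                  # success resets the counter
        return result

# Demo: two failures trip the breaker; the third call never reaches fn.
def call_service_b():
    raise ConnectionError("service B is down")

breaker = CircuitBreaker(max_failures=2, reset_after=60)
for _ in range(2):
    try:
        breaker.call(call_service_b)
    except ConnectionError:
        pass
try:
    breaker.call(call_service_b)
    tripped = False
except RuntimeError:
    tripped = True
print(tripped)  # True
```

    A production breaker also needs per-dependency instances and metrics on state transitions, but the state machine above is the whole idea.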


    4. The Shared Database


    Two services reading/writing the same database table. Schema changes require coordination. One service can deadlock the other.


    **Fix**: Each service owns its data. Share via APIs or events, not databases.


    5. Ignoring Caching


    Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.


    **Fix**: Add Redis. Cache the most frequently accessed data. Even a 60-second cache TTL reduces DB load by 95% for read-heavy workloads.
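    The 95% figure is just TTL arithmetic. Assuming a hot key requested 20 times per minute (an illustrative number), a 60-second TTL caps the cost at one DB read per TTL window:

```python
# With a 60 s TTL, each hot key costs at most one DB read per minute,
# no matter how often it is requested within the window.
requests_per_min = 20     # assumed per-key request rate
db_reads_per_min = 1      # one cache miss per 60 s TTL window
reduction = 1 - db_reads_per_min / requests_per_min
print(f"{reduction:.0%}")  # 95%
```

    The hotter the key, the bigger the win; the trade-off is that readers may see data up to one TTL stale.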


    6. The N+1 Query Problem


    
    # Anti-pattern: N+1 queries
    def get_orders_with_items(user_id: str):
        orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
        for order in orders:
            # One query PER order — terrible!
            order["items"] = db.query(
                "SELECT * FROM order_items WHERE order_id = :oid",
                {"oid": order["id"]}
            )
        return orders

    # Fix: single query with JOIN
    def get_orders_with_items_fixed(user_id: str):
        # Note: the JOIN returns one flat row per item; regroup by o.id
        # in application code if callers expect nested order objects
        return db.query("""
            SELECT o.id, o.total, oi.product_id, oi.quantity
            FROM orders o
            LEFT JOIN order_items oi ON oi.order_id = o.id
            WHERE o.user_id = :uid
        """, {"uid": user_id})
    
    

    7. No Monitoring / No Observability


    "Everything looks fine" — until users complain that the site is slow and you have no idea why.


    **Baseline monitoring**: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
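    "Structured logging with correlation IDs" concretely: stash an id per request (e.g. taken from an `X-Request-ID` header by edge middleware) in a `ContextVar`, and stamp it on every log line so one request can be traced across log streams. A minimal sketch using only the standard library; the middleware step is simulated by a direct `set` call:

```python
import json
import logging
import uuid
from contextvars import ContextVar

# One ContextVar holds the id for the current request; middleware at the
# service edge would set it before handing off to route handlers, and
# every log line then picks it up automatically.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

# What per-request middleware would do once, at request entry:
correlation_id.set(str(uuid.uuid4()))
log.info("order created")   # emits one JSON line carrying the id
```

    Because `ContextVar` is task-local, concurrent requests in the same process each log their own id without passing it through every function signature.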


    ---


    Summary: Key Decisions for 2026


    | Decision | Default Choice | Upgrade When |
    |---|---|---|
    | **Architecture** | Modular monolith | Team >15 or clear independent scale need |
    | **Database** | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
    | **Cache** | Redis (cache aside) | Write-behind for high-throughput writes |
    | **Queue** | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
    | **Async** | Fire and forget for non-critical | Polling → Webhooks as needs grow |
    | **API Gateway** | NGINX / Traefik | Envoy / Kong for advanced routing |
    | **Resilience** | Circuit breaker + timeout | Bulkhead + rate limiting at scale |


    The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.