System Design Fundamentals 2026: A Developer's Guide to Scalable Applications

System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.

---

1. Microservices vs Monolith vs Modular Monolith

The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.

| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| **Monolith** | 1–5 | Low | Prototypes, internal tools, MVPs |
| **Modular Monolith** | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| **Microservices** | 10+ per service | High | Large orgs with clear domain boundaries |

The Modular Monolith Sweet Spot

A modular monolith is a single deployable unit with **strict module boundaries**. Modules communicate through well-defined interfaces but share the same process and database.




```
┌─────────────────────────────────────┐
│          Modular Monolith           │
│   ┌──────────┐      ┌──────────┐    │
│   │  Orders  │      │ Billing  │    │
│   │  Module  │──────│  Module  │    │
│   └────┬─────┘      └────┬─────┘    │
│        │                 │          │
│   ┌────▼─────────────────▼────┐     │
│   │       Shared Kernel       │     │
│   │   (DB, messaging, auth)   │     │
│   └───────────────────────────┘     │
└─────────────────────────────────────┘
```





**When to extract a service**: When two conditions are met — the module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.

**Rule of thumb**: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
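One lightweight way to enforce those strict module boundaries is to make each module depend only on another module's small, explicit interface. A minimal sketch; the `BillingFacade`, `InMemoryBilling`, and `OrdersModule` names are hypothetical, not a prescribed layout:

```python
from typing import Protocol

class BillingFacade(Protocol):
    # The ONLY surface Billing exposes to other modules
    def charge(self, user_id: str, amount_cents: int) -> bool: ...

class InMemoryBilling:
    """Toy implementation; a real module would wrap the shared DB."""
    def __init__(self):
        self.charges = []

    def charge(self, user_id: str, amount_cents: int) -> bool:
        self.charges.append((user_id, amount_cents))
        return True

class OrdersModule:
    # Orders depends on the Billing interface, never its internals
    def __init__(self, billing: BillingFacade):
        self.billing = billing

    def place_order(self, user_id: str, total_cents: int) -> str:
        if not self.billing.charge(user_id, total_cents):
            raise RuntimeError("payment failed")
        return f"order-for-{user_id}"

orders = OrdersModule(InMemoryBilling())
orders.place_order("u1", 4200)  # "order-for-u1"
```

If Billing later becomes its own service, only the facade implementation changes; Orders is untouched.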

Real-World Decision Tree




```
Monolith → Modular Monolith → Selective Extraction → Full Microservices

MVP phase:          Monolith
10k users / 5 devs: Modular monolith
100k users:         Extract payments (PCI scope)
1M users:           Extract search (separate scale)
10M users:          Extract recommendations (different stack)
```





---

2. CQRS: Command Query Responsibility Segregation

CQRS separates **reads** from **writes** — different models, sometimes different databases.

When CQRS Makes Sense


* Your read queries are complex and don't map well to your write model (e.g., reporting dashboards)

* Your read and write workloads have different scaling requirements (10:1 read-to-write ratio)

* You need different data shapes for reading vs writing (e.g., write normalized, read denormalized)


A Simple CQRS Implementation




```python
# --- Command Side (Writes) ---
class CreateOrderCommand:
    def __init__(self, user_id: str, items: list):
        self.user_id = user_id
        self.items = items


class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> str:
        # Validate business rules
        order = Order.create(cmd.user_id, cmd.items)
        order.save()  # Write to transactional DB (PostgreSQL)
        event_bus.publish("order.created", {"order_id": order.id})
        return order.id


# --- Query Side (Reads) ---
class OrderQueryHandler:
    def get_order_summary(self, user_id: str) -> dict:
        # Read from denormalized read model (could be a different DB)
        return read_db.query(
            "SELECT * FROM order_summaries WHERE user_id = :uid",
            {"uid": user_id}
        )
```





CQRS Without Event Sourcing

You don't need event sourcing to use CQRS. The most common pattern is:


1. **Write** to a normalized PostgreSQL table
2. **Sync** (or async via CDC) to a read-optimized table
3. **Read** from the read table




```sql
-- Write model: normalized
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    total_cents BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    unit_price_cents BIGINT NOT NULL
);

-- Read model: denormalized for fast queries
CREATE TABLE order_summaries (
    order_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    item_count INT NOT NULL,
    total_cents BIGINT NOT NULL,
    product_names TEXT[] NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```




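The sync step can be made concrete with a small in-memory sketch: collapsing normalized order rows into one denormalized summary per order. The `project_order_summary` function and the `product_name` field are illustrative additions, not part of the schema above:

```python
def project_order_summary(order: dict, items: list) -> dict:
    # Build the denormalized order_summaries row from normalized rows
    return {
        "order_id": order["id"],
        "user_id": order["user_id"],
        "status": order["status"],
        "item_count": sum(i["quantity"] for i in items),
        "total_cents": sum(i["quantity"] * i["unit_price_cents"] for i in items),
        "product_names": [i["product_name"] for i in items],
    }

order = {"id": "o1", "user_id": "u1", "status": "paid"}
items = [
    {"quantity": 2, "unit_price_cents": 500, "product_name": "pen"},
    {"quantity": 1, "unit_price_cents": 1500, "product_name": "notebook"},
]
summary = project_order_summary(order, items)
# item_count == 3, total_cents == 2500
```

In production this projection would run in a trigger, a CDC consumer, or an event handler; the transformation itself is the same.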

When NOT to Use CQRS


* Your app is a simple CRUD interface with no complex queries

* You don't need separate read/write scaling

* Your team is small and you can't justify the infrastructure overhead


---

3. Event-Driven Architecture

Event-driven systems decouple producers from consumers. When an event happens, interested services react.

Core Concepts




```
┌──────────┐       Event Bus       ┌──────────────┐
│ Producer │──────(Kafka/RMQ)───┬─▶│  Consumer 1  │
│ (Orders) │                    │  │ (Analytics)  │
└──────────┘                    │  └──────────────┘
                                └─▶┌──────────────┐
                                   │  Consumer 2  │
                                   │   (Email)    │
                                   └──────────────┘
```





Message Queue Comparison

| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| **Delivery** | At-least-once, exactly-once (idempotent) | At-most-once, at-least-once | At-least-once |
| **Ordering** | Per-partition guaranteed | Not guaranteed (unless single queue) | FIFO queue (limited throughput) |
| **Persistence** | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| **Throughput** | Millions/sec | Thousands/sec | Unlimited (soft limit 300/s for FIFO) |
| **Consumer model** | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| **Use case** | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| **Operational cost** | High (requires ZooKeeper/KRaft) | Medium | Zero (fully managed) |

Kafka in Practice: The URL Shortener Click Stream




```python
import time

# Producer — emit click events
def record_click(short_code: str, ip: str, user_agent: str):
    producer.send(
        topic="url_clicks",
        key=short_code.encode(),  # Same key → same partition → ordered
        value={
            "short_code": short_code,
            "ip": ip,
            "user_agent": user_agent,
            "timestamp": int(time.time()),
        }
    )


# Consumer 1 — real-time analytics (e.g., update Redis counters)
def consume_clicks_for_analytics():
    for message in consumer:
        click = message.value
        redis.zincrby("popular_urls:today", 1, click["short_code"])
        redis.incr(f"url:{click['short_code']}:clicks")


# Consumer 2 — store raw clicks in data warehouse
def consume_clicks_for_storage():
    for message in consumer:
        warehouse.insert_one(message.value)
```





Event Sourcing: Storing State as Events

Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.




```python
# Events (immutable facts)
events = [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
    {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
]


# Derive current state by replaying events
def get_account_state(events):
    state = {"email": None, "email_verified": False, "password_hash": None}
    for event in events:
        if event["type"] == "AccountCreated":
            state["email"] = event["data"]["email"]
        elif event["type"] == "EmailVerified":
            state["email_verified"] = True
        # (remaining event types, e.g. PasswordChanged, handled the same way)
    return state
```





**Trade-offs**: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
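The "projections" caveat can be made concrete: instead of replaying the full log on every query, a read model applies each event once and keeps a running view. A minimal sketch, with a hypothetical `VerifiedEmailsProjection` over the event shapes shown above:

```python
class VerifiedEmailsProjection:
    """Incrementally maintained read model: users with verified emails."""
    def __init__(self):
        self.verified_users = set()

    def apply(self, event: dict) -> None:
        # Called once per event as it is appended to the log
        if event["type"] == "EmailVerified":
            self.verified_users.add(event["data"]["user_id"])

proj = VerifiedEmailsProjection()
for e in [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
]:
    proj.apply(e)
# proj.verified_users == {"u1"}
```

Each query pattern typically gets its own projection; rebuilding one is just replaying the log from the start.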

---

4. Database Scaling Strategies

Read Replicas

The simplest scaling strategy: one primary handles writes, replicas handle reads.




```
              ┌─────────────┐
              │ Primary DB  │◀── Writes
              └──────┬──────┘
     ┌───────────────┼───────────────┐
     │               │               │
┌────▼─────┐   ┌─────▼────┐   ┌─────▼────┐
│ Replica 1│   │ Replica 2│   │ Replica 3│
│ (Reads)  │   │ (Reads)  │   │ (Reads)  │
└──────────┘   └──────────┘   └──────────┘
```








```python
# Using read/write separation in code
# (create_engine is SQLAlchemy-style; PRIMARY_URL/REPLICA_URLS come from config)
class DatabaseRouter:
    def __init__(self):
        self.primary = create_engine(PRIMARY_URL)
        self.replicas = [create_engine(url) for url in REPLICA_URLS]
        self.replica_index = 0

    def write(self, query, params=None):
        with self.primary.begin() as conn:
            return conn.execute(query, params or {})

    def read(self, query, params=None):
        # Round-robin across replicas
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        with replica.connect() as conn:
            return conn.execute(query, params or {})
```





**Replication lag** is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called **read-after-write consistency**.




```python
async def create_order_and_redirect(user_id: str, items: list):
    order_id = db.write("INSERT INTO orders ... RETURNING id")

    # Read-after-write: force this read to the primary
    order = db.read_from_primary(
        "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
    )

    return redirect(f"/orders/{order_id}")
```





Sharding (Horizontal Partitioning)

Split data across databases by a shard key.

| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| **Hash-based** | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| **Range-based** | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| **Directory-based** | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |
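Why "resharding is painful" for naive modulo sharding is easy to demonstrate: growing from 4 to 5 shards moves most keys, because `h % 4` and `h % 5` rarely agree. A small sketch using a stable hash so the result is reproducible:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_shards

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
print(f"{moved / len(keys):.0%} of keys change shard going 4 → 5")
```

Roughly 80% of keys relocate, each one requiring a data copy. Consistent hashing (below) moves only ~1/N of keys per added node.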




```python
import hashlib

# Consistent hashing — minimizes re-sharding
class ConsistentHashRing:
    def __init__(self, nodes: list, replicas: int = 150):
        self.ring = {}
        for node in nodes:
            # Each physical node gets many virtual points on the ring
            for i in range(replicas):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Walk clockwise to the first virtual node at or past the hash
        for ring_key in self.sorted_keys:
            if hash_val <= ring_key:
                return self.ring[ring_key]
        return self.ring[self.sorted_keys[0]]  # Wrap around

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
```





Partitioning (Within a Database)

Split a table into smaller physical chunks. PostgreSQL declarative partitioning:




```sql
CREATE TABLE events (
    event_id UUID NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    payload JSONB
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_q1
    PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2
    PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
```





Partition pruning means queries with `WHERE occurred_at >= '2026-04-01'` only scan relevant partitions.

---

5. Caching Layers

The Three Cache Levels




```
CDN ─────── Application Cache (Redis) ─────── In-Memory Cache (Local)
 │                    │                                │
 │             Expensive to fill                Fastest access
 │             Shared across servers            1-5μs per get
 │             50-500μs per get                 Lost on restart
```





Cache Strategies

| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| **Cache Aside** | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| **Read Through** | Cache is authoritative; loads from DB on miss | Write through to DB; cache handles loading | When cache handles persistence |
| **Write Through** | — | Write to cache first, then DB synchronously | Apps needing strong consistency |
| **Write Behind** | — | Write to cache, async flush to DB | High-write-throughput apps |
| **Write Around** | — | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |

Cache Aside — The Default Choice




```python
import json

async def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"

    # 1. Try cache
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — load from database
    profile = await db.query(
        "SELECT * FROM user_profiles WHERE user_id = :uid",
        {"uid": user_id}
    )

    if profile:
        # 3. Populate cache with TTL
        await redis.setex(cache_key, 300, json.dumps(profile))

    return profile


async def update_user_profile(user_id: str, data: dict):
    # 1. Write to database
    await db.execute(
        "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
        {"uid": user_id, "name": data["name"]}
    )

    # 2. Invalidate cache (don't update it — let next read re-populate)
    await redis.delete(f"user:profile:{user_id}")
```





Write Behind — For High-Volume Writes




```python
import asyncio
import json

# Batch writer process — runs every 5 seconds
write_buffer = []


async def write_to_cache(key: str, value: dict):
    write_buffer.append((key, value))
    if len(write_buffer) >= 100:
        await flush_buffer()


async def flush_buffer():
    async with db.transaction():
        for key, value in write_buffer:
            await db.execute(
                "UPSERT INTO ... VALUES (:k, :v)",
                {"k": key, "v": json.dumps(value)}
            )
    write_buffer.clear()


# Start background flusher
async def periodic_flush():
    while True:
        await asyncio.sleep(5)
        if write_buffer:
            await flush_buffer()
```





**Write behind risk**: if the process crashes before the flush, data is lost. Use a persistent queue (Kafka) for critical writes.

---

6. CAP Theorem Explained Practically

CAP says a distributed data store can provide at most two of three guarantees: **Consistency**, **Availability**, and **Partition Tolerance**.

What CAP Actually Means


* **C (Consistency)**: Every read sees the most recent write (or an error)

* **A (Availability)**: Every request gets a non-error response (not necessarily the latest data)

* **P (Partition Tolerance)**: System continues working despite network failures


The Key Insight

You **must** choose CP or AP. Partition tolerance is non-negotiable in distributed systems — networks WILL fail.

| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |

Practical CAP Decisions




```python
# AP choice — accept stale reads for availability
async def get_product_stock(product_id: str) -> int:
    # Read from nearest replica, may be stale
    return await replica.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )


# CP choice — accept slower reads for consistency
async def get_product_stock_cp(product_id: str) -> int:
    # Read from primary, always latest
    return await primary.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )
```





**Rule of thumb**: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.

---

7. Load Balancing Strategies

Layer 4 vs Layer 7

| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| **Routing based on** | IP + port | URL, headers, cookies, body |
| **Performance** | Very fast | Slower (inspects payload) |
| **Features** | Simple forwarding | Content-based routing, rate limiting |
| **Examples** | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |

Algorithms




```python
import hashlib

# Round Robin — predictable, but doesn't handle different load sizes
servers = ["app-01", "app-02", "app-03"]
current_index = 0

def round_robin() -> str:
    global current_index
    server = servers[current_index % len(servers)]
    current_index += 1
    return server


# Least Connections — better for variable request durations
def least_connections(servers: list) -> str:
    return min(servers, key=lambda s: s.active_connections)


# IP Hash — session persistence without cookies
def ip_hash(client_ip: str, servers: list) -> str:
    hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[hash_val % len(servers)]
```





Health Checks: The Bare Minimum




```
┌──────────┐    /healthz    ┌──────────┐
│    LB    │───────────────▶│  App-01  │──▶ Returns 200
│          │                ├──────────┤
│          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
│          │                ├──────────┤
│          │───────────────▶│  App-03  │──▶ Returns 200
└──────────┘                └──────────┘
```








```python
# /healthz endpoint (FastAPI-style)
from fastapi import Response

@app.get("/healthz")
async def health_check(response: Response):
    # Check critical dependencies
    db_ok = await check_database()
    cache_ok = await check_redis()
    if db_ok and cache_ok:
        return {"status": "ok"}
    response.status_code = 503
    return {"status": "degraded"}
```





---

8. API Gateway Patterns

An API gateway sits between clients and your services, handling cross-cutting concerns.




```
                 ┌─────────────────┐
                 │   API Gateway   │
                 │ ┌─────────────┐ │
Client ──────────┼▶│    Auth     │ │
                 │ └─────────────┘ │
                 │ ┌─────────────┐ │
                 │ │ Rate Limit  │ │──▶ Service A
                 │ └─────────────┘ │
                 │ ┌─────────────┐ │──▶ Service B
                 │ │   Routing   │ │
                 │ └─────────────┘ │──▶ Service C
                 │ ┌─────────────┐ │
                 │ │   Logging   │ │
                 │ └─────────────┘ │
                 └─────────────────┘
```





What the Gateway Handles




```python
# Before gateway — each service handles auth
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        token = request.headers["Authorization"]
        user = verify_token(token)  # Duplicated in EVERY service
        ...


# After gateway — auth is centralized; service code is simpler:
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        user_id = request.headers["X-Authenticated-User"]  # Set by gateway
        return get_orders(user_id)
```





Gateway vs Service Mesh

| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| **Client-facing** | Yes (edge) | No (internal) |
| **Auth** | Token verification, API keys | mTLS between services |
| **Rate limiting** | Per-client, per-endpoint | Per-service |
| **Routing** | URL-based | Traffic splitting, canary |
| **Location** | Edge proxy | Sidecar per pod |

**Recommendation**: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.
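To make "centralized cross-cutting concerns" concrete, here is a toy gateway dispatch function: pure Python, no real proxying, with hypothetical `ROUTES`/`API_KEYS` stores standing in for a config service:

```python
ROUTES = {
    "/api/orders": "http://orders-svc",
    "/api/billing": "http://billing-svc",
}

API_KEYS = {"key-123": "u1"}  # token → user id (toy auth store)

def gateway(path: str, headers: dict) -> dict:
    # 1. Centralized auth: verify once, inject identity downstream
    user_id = API_KEYS.get(headers.get("Authorization", ""))
    if user_id is None:
        return {"status": 401}
    # 2. Prefix routing to the owning service
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            forwarded = dict(headers, **{"X-Authenticated-User": user_id})
            return {"status": 200, "upstream": upstream, "headers": forwarded}
    return {"status": 404}
```

Real gateways (NGINX, Envoy, Kong) do exactly this shape of work, plus connection pooling, TLS termination, and retries.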

---

9. Circuit Breaker and Resilience Patterns

The Circuit Breaker Pattern




```python
import time

class CircuitBreaker:
    STATES = ["CLOSED", "OPEN", "HALF_OPEN"]

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.state = "CLOSED"
        self.last_failure_time = None

    async def call(self, func, fallback=None):
        # fallback is a plain (sync) callable returning a default value
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # Allow one trial request through
            else:
                return fallback() if fallback else None

        try:
            result = await func()
            if self.state == "HALF_OPEN":
                self.state = "CLOSED"  # Trial succeeded — close the circuit
                self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            return fallback() if fallback else None


# Usage
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

async def get_recommendations(user_id: str):
    return await cb.call(
        func=lambda: recommendations_service.fetch(user_id),
        fallback=lambda: {"recommendations": [], "source": "fallback"}
    )
```





Other Resilience Patterns

| Pattern | What It Does |
|---|---|
| **Retry with backoff** | Exponential backoff + jitter to avoid thundering herd |
| **Timeout** | Hard timeout per request (e.g., 5s) to prevent cascading |
| **Bulkhead** | Isolate resources — limit connections per service |
| **Rate limiting** | Token bucket or leaky bucket per client |
| **Dead letter queue** | Failed messages go to a DLQ for manual inspection |




```python
import asyncio
import random

# Retry with exponential backoff and jitter
async def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise
            sleep_time = (2 ** attempt) + random.random()  # exp + jitter
            await asyncio.sleep(sleep_time)
```




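The token bucket from the table can also be sketched in a few lines. The clock is injected so refill behavior is testable; the parameters are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `rate=1, capacity=2` permits a burst of two requests, then one request per second; callers that get `False` receive a 429.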

---

10. Real Example: Design a URL Shortener

Let's design bit.ly/tinyurl step by step.

Requirements


* Generate a short, unique code for any URL

* Redirect to the original URL when the short code is accessed

* Track click analytics (count, referrer, timestamp)

* Handle 10M URLs, 100M redirects/day


Step 1: URL Encoding




```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

def decode_base62(code: str) -> int:
    result = 0
    for char in code:
        result = result * 62 + BASE62.index(char)
    return result

# Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique URLs
encode_base62(123456789)  # "8m0Kx"
```





Step 2: Architecture




```
                                           ┌─────────────┐
                                           │  Analytics  │
                                           │  (Kafka →   │
                                           │ ClickHouse) │
                                           └─────────────┘
                                                  ▲
                                                  │ (async)
┌──────────┐   POST /shorten     ┌──────────────────────────┐
│  Client  │────────────────────▶│       API Gateway        │
│          │                     │  ┌────────────────────┐  │
│          │   GET /abc123       │  │   Write Service    │──┼──▶ PostgreSQL (URLs)
│          │────────────────────▶│  │  (generate code)   │  │
│          │                     │  └────────────────────┘  │
│          │   301 Redirect      │  ┌────────────────────┐  │
│          │◀────────────────────│  │    Read Service    │  │
│          │                     │  │  (resolve + cache) │──┼──▶ Redis (cache)
│          │                     │  └────────────────────┘  │
│          │                     │  ┌────────────────────┐  │
│          │                     │  │    Click Logger    │──┼──▶ Kafka
│          │                     │  └────────────────────┘  │
└──────────┘                     └──────────────────────────┘
```





Step 3: Data Model




```sql
-- PostgreSQL
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    original_url TEXT NOT NULL,
    user_id UUID,              -- nullable for anonymous users
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP       -- nullable
);

CREATE INDEX idx_short_code ON urls(short_code);

-- Redis cache
-- Key: "url:abc123" → Value: "https://example.com/long-url"
-- TTL: 24 hours
```





Step 4: Write Path




```python
import random

@app.post("/shorten")
async def shorten_url(url: str, user_id: str = None):
    # 1. Check if URL already shortened (optimization)
    existing = await db.query(
        "SELECT short_code FROM urls WHERE original_url = :url AND user_id = :uid",
        {"url": url, "uid": user_id}
    )
    if existing:
        return {"short_url": f"https://short.domain/{existing['short_code']}"}

    # 2. Generate unique code
    short_code = await generate_unique_code()

    # 3. Store in DB
    await db.execute(
        "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
        {"c": short_code, "u": url, "uid": user_id}
    )

    # 4. Warm the cache
    await redis.setex(f"url:{short_code}", 86400, url)

    return {"short_url": f"https://short.domain/{short_code}"}


async def generate_unique_code() -> str:
    for _ in range(3):  # Retry on collision
        code = encode_base62(random.randint(0, 62**7 - 1))
        exists = await db.query(
            "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
        )
        if not exists:
            return code
    raise Exception("Collision rate too high — increase code length")
```





Step 5: Read Path (The Hot Path — Handles 100M req/day)




```python
import time

@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    # 1. Try cache (99% hit rate with 24h TTL)
    original_url = await redis.get(f"url:{short_code}")
    if not original_url:
        # 2. Cache miss — hit DB
        row = await db.query(
            "SELECT original_url FROM urls WHERE short_code = :c",
            {"c": short_code}
        )
        if not row:
            raise HTTPException(status_code=404)

        original_url = row["original_url"]

        # 3. Populate cache with TTL
        await redis.setex(f"url:{short_code}", 86400, original_url)

    # 4. Log click asynchronously (don't block the redirect)
    click_event = {
        "short_code": short_code,
        "ip": request.client.host,
        "user_agent": request.headers.get("user-agent"),
        "referer": request.headers.get("referer"),
        "timestamp": int(time.time()),
    }
    # Fire and forget — queue to Kafka
    await click_producer.send("url_clicks", click_event)

    # 5. Redirect — 301 is cacheable by browsers (fewer server hits, but
    # repeat clicks skip logging); use 302 if every click must be counted
    return RedirectResponse(url=original_url, status_code=301)
```





Step 6: Scale Considerations


* **Read replicas** for URL resolution (read-heavy: 10:1 read-to-write ratio)

* **Redis cluster** for cache (with consistent hashing)

* **Kafka partitions** by short_code for ordered click logs

* **Batch write** click analytics to ClickHouse every 30 seconds

* **CDN** for the redirect page itself (not the API — API calls are cheap)


---

11. Async Processing Patterns

The Problem: Synchronous Chains




```
Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
              500ms         800ms         200ms    = 1.5s total
```





The client waits 1.5 seconds for something that doesn't need a response.

Solution: Decouple with Async




```
Client ──▶ Service A ──▶ Response (immediate: "Accepted")
               │
               ▼
        Queue (Kafka/SQS)
               │
        ┌──────┴──────┐
        ▼             ▼
   Service B      Service C
    (email)     (generate PDF)
```





Pattern 1: Fire and Forget




```python
import uuid

@app.post("/api/send-email")
async def send_email(request: EmailRequest):
    # Validate request
    if not request.valid:
        raise HTTPException(400)

    # Queue the work — don't wait
    await email_queue.send({
        "to": request.to,
        "template": request.template,
        "data": request.data,
    })

    # Return immediately
    return {"status": "queued", "message_id": str(uuid.uuid4())}
```





Pattern 2: Polling with Status




```python
import uuid

@app.post("/api/report/generate")
async def generate_report(params: ReportParams):
    report_id = str(uuid.uuid4())
    await report_queue.send({"report_id": report_id, "params": params})
    return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}


@app.get("/api/report/{report_id}/status")
async def check_status(report_id: str):
    status = await redis.get(f"report:{report_id}:status")
    if status == "ready":
        return {"status": "ready", "url": f"/api/report/{report_id}/download"}
    return {"status": "processing"}
```





Pattern 3: Webhook Callback

Instead of polling, have the worker call a URL when done:




```python
import httpx

async def process_report(report_id: str, params: dict, callback_url: str):
    # ... generate report ...
    await save_report(report_id, result)

    # Notify caller (httpx.post is sync — use the async client here)
    if callback_url:
        async with httpx.AsyncClient() as client:
            await client.post(callback_url, json={
                "report_id": report_id,
                "status": "completed",
                "download_url": f"/api/report/{report_id}/download",
            })
```





---

12. Common Anti-Patterns

1. The Distributed Monolith

You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.

**Signs**: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database — or god forbid, tables.

**Fix**: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.

2. Over-Engineering from Day One

"Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.

**Fix**: Start with the simplest thing that works. A monolith with PostgreSQL and Redis will handle 99% of applications. Extract services when there's a proven need.

3. Synchronous Coupling via HTTP




```
Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
```




If one service is slow, the whole chain slows. Latency adds up. Failures cascade.

**Fix**: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.

4. The Shared Database

Two services reading/writing the same database table. Schema changes require coordination. One service can deadlock the other.

**Fix**: Each service owns its data. Share via APIs or events, not databases.

5. Ignoring Caching

Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.

**Fix**: Add Redis. Cache the most frequently accessed data. Even a 60-second cache TTL reduces DB load by 95% for read-heavy workloads.

6. The N+1 Query Problem




```python
# Anti-pattern: N+1 queries
def get_orders_with_items(user_id: str):
    orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
    for order in orders:
        # One query PER order — terrible!
        order["items"] = db.query(
            "SELECT * FROM order_items WHERE order_id = :oid",
            {"oid": order["id"]}
        )
    return orders


# Fix: single query with JOIN
def get_orders_with_items_fixed(user_id: str):
    return db.query("""
        SELECT o.id, o.total, oi.product_id, oi.quantity
        FROM orders o
        LEFT JOIN order_items oi ON oi.order_id = o.id
        WHERE o.user_id = :uid
    """, {"uid": user_id})
```





7. No Monitoring / No Observability

"Everything looks fine" — until users complain that the site is slow and you have no idea why.

**Baseline monitoring**: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
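A tiny sketch of the latency side of that baseline: record request durations and report p50/p95/p99. This uses the simple nearest-rank percentile method over raw samples; real systems use histograms (HDR, Prometheus buckets) to bound memory:

```python
class LatencyRecorder:
    def __init__(self):
        self.samples_ms = []

    def record(self, duration_ms: float) -> None:
        self.samples_ms.append(duration_ms)

    def percentile(self, p: float) -> float:
        # Nearest-rank percentile over recorded samples
        ordered = sorted(self.samples_ms)
        idx = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[idx]

rec = LatencyRecorder()
for ms in range(1, 101):  # simulate 100 requests: 1..100 ms
    rec.record(float(ms))
print(rec.percentile(50), rec.percentile(95), rec.percentile(99))
# → 50.0 95.0 99.0
```

Wire `record()` into request middleware and emit the percentiles per endpoint; p99 is where your slowest real users live.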

---

Summary: Key Decisions for 2026

| Decision | Default Choice | Upgrade When |
|---|---|---|
| **Architecture** | Modular monolith | Team >15 or clear independent scale need |
| **Database** | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
| **Cache** | Redis (cache aside) | Write-behind for high-throughput writes |
| **Queue** | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
| **Async** | Fire and forget for non-critical | Polling → Webhooks as needs grow |
| **API Gateway** | NGINX / Traefik | Envoy / Kong for advanced routing |
| **Resilience** | Circuit breaker + timeout | Bulkhead + rate limiting at scale |

The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.