System Design Fundamentals 2026: A Developer's Guide to Scalable Applications
System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.
---
1. Microservices vs Monolith vs Modular Monolith
The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.
| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| **Monolith** | 1–5 | Low | Prototypes, internal tools, MVPs |
| **Modular Monolith** | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| **Microservices** | 10+ per service | High | Large orgs with clear domain boundaries |
The Modular Monolith Sweet Spot
A modular monolith is a single deployable unit with **strict module boundaries**. Modules communicate through well-defined interfaces but share the same process and database.
┌─────────────────────────────────────┐
│ Modular Monolith │
│ ┌──────────┐ ┌──────────┐ │
│ │ Orders │ │ Billing │ │
│ │ Module │──│ Module │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ ┌────▼──────────────▼─────┐ │
│ │ Shared Kernel │ │
│ │ (DB, messaging, auth) │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────┘
**When to extract a service**: When two conditions are met — the module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.
**Rule of thumb**: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
Real-World Decision Tree
Monolith → Modular Monolith → Selective Extraction → Full Microservices
**MVP phase**: Monolith
**10k users / 5 devs**: Modular monolith
**100k users**: Extract payments (PCI scope)
**1M users**: Extract search (separate scale)
**10M users**: Extract recommendations (different stack)
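What "strict module boundaries" looks like in code: each module exposes a small facade and keeps its tables private, so extracting it later means swapping the facade for an HTTP or event client. A minimal sketch with illustrative names (`BillingFacade` and `OrdersModule` are not from any particular framework):

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    order_id: str
    total_cents: int

class BillingFacade:
    """The Billing module's public interface — the only thing Orders may import."""
    def __init__(self):
        self._invoices = {}  # stand-in for Billing-owned tables

    def create_invoice(self, order_id: str, total_cents: int) -> Invoice:
        invoice = Invoice(order_id, total_cents)
        self._invoices[order_id] = invoice
        return invoice

class OrdersModule:
    """Orders talks to Billing only through the facade, never its tables."""
    def __init__(self, billing: BillingFacade):
        self._billing = billing

    def place_order(self, order_id: str, total_cents: int) -> Invoice:
        # ... persist the order in Orders-owned tables ...
        return self._billing.create_invoice(order_id, total_cents)
```

If Billing is later extracted as a service, only the facade's internals change — `OrdersModule` is untouched.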
---
2. CQRS: Command Query Responsibility Segregation
CQRS separates **reads** from **writes** — different models, sometimes different databases.
When CQRS Makes Sense
CQRS pays off when reads vastly outnumber writes, when read and write shapes diverge (normalized writes, denormalized reads), or when the two sides must scale independently.
A Simple CQRS Implementation
```python
# --- Command Side (Writes) ---
class CreateOrderCommand:
    def __init__(self, user_id: str, items: list):
        self.user_id = user_id
        self.items = items

class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> str:
        # Validate business rules
        order = Order.create(cmd.user_id, cmd.items)
        order.save()  # Write to transactional DB (PostgreSQL)
        event_bus.publish("order.created", {"order_id": order.id})
        return order.id

# --- Query Side (Reads) ---
class OrderQueryHandler:
    def get_order_summary(self, user_id: str) -> dict:
        # Read from denormalized read model (could be a different DB)
        return read_db.query(
            "SELECT * FROM order_summaries WHERE user_id = :uid",
            {"uid": user_id},
        )
```
CQRS Without Event Sourcing
You don't need event sourcing to use CQRS. The most common pattern is:
1. **Write** to a normalized write model
2. **Sync** (or async via CDC) to a read-optimized table
3. **Read** from the read table
```sql
-- Write model: normalized
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    total_cents BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    unit_price_cents BIGINT NOT NULL
);

-- Read model: denormalized for fast queries
CREATE TABLE order_summaries (
    order_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    item_count INT NOT NULL,
    total_cents BIGINT NOT NULL,
    product_names TEXT[] NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```
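Step 2 (sync) is often just a projection: fold the `order.created` event into the `order_summaries` shape. A sketch against an in-memory read model — the event fields are assumed here to mirror the tables above:

```python
def apply_order_created(read_model: dict, event: dict) -> None:
    """Upsert one denormalized order_summaries row from an order.created event."""
    items = event["items"]
    read_model[event["order_id"]] = {
        "user_id": event["user_id"],
        "status": "created",
        "item_count": sum(i["quantity"] for i in items),
        "total_cents": sum(i["quantity"] * i["unit_price_cents"] for i in items),
        "product_names": [i["product_name"] for i in items],
    }
```

In production this runs in an event consumer (or a CDC pipeline) and the dict becomes an `UPDATE`/`INSERT` against the read table.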
When NOT to Use CQRS
For straightforward CRUD, one model is enough — CQRS adds a second model to maintain and a sync step that can lag. Reach for it only when the read/write asymmetry is real.
---
3. Event-Driven Architecture
Event-driven systems decouple producers from consumers. When an event happens, interested services react.
Core Concepts
┌──────────┐       Event Bus        ┌──────────────┐
│ Producer │──────(Kafka/RMQ)──────▶│  Consumer 1  │
│ (Orders) │           │            │ (Analytics)  │
└──────────┘           │            └──────────────┘
                       │            ┌──────────────┐
                       └───────────▶│  Consumer 2  │
                                    │   (Email)    │
                                    └──────────────┘
Message Queue Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| **Delivery** | At-least-once, exactly-once (idempotent) | At-most-once, at-least-once | At-least-once |
| **Ordering** | Per-partition guaranteed | Not guaranteed (unless single queue) | FIFO queue (limited throughput) |
| **Persistence** | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| **Throughput** | Millions/sec | Thousands/sec | Nearly unlimited (standard); FIFO ~300 msg/s, 3,000 with batching |
| **Consumer model** | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| **Use case** | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| **Operational cost** | High (self-managed cluster; KRaft has replaced ZooKeeper) | Medium | Zero (fully managed) |
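All three brokers deliver at-least-once in their common configurations, so consumers should be idempotent: processing a duplicate must not double-count. A minimal dedupe sketch (in production the seen-set would be a Redis set or a unique DB constraint, keyed by a message ID or Kafka offset — the `message_id` field is an assumption here):

```python
class IdempotentConsumer:
    def __init__(self):
        self._seen = set()  # production: Redis SET or a unique DB constraint
        self.counter = 0    # stands in for the real side effect

    def handle(self, message: dict) -> bool:
        """Process a message exactly once; return False for duplicate deliveries."""
        msg_id = message["message_id"]
        if msg_id in self._seen:
            return False        # duplicate — skip the side effect
        self._seen.add(msg_id)
        self.counter += 1       # the actual (non-idempotent) work
        return True
```

With this in place, a redelivered message is a no-op instead of a double-count.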
Kafka in Practice: The URL Shortener Click Stream
```python
# Producer — emit click events
def record_click(short_code: str, ip: str, user_agent: str):
    producer.send(
        topic="url_clicks",
        key=short_code.encode(),  # Same key → same partition → ordered
        value={
            "short_code": short_code,
            "ip": ip,
            "user_agent": user_agent,
            "timestamp": int(time.time()),
        },
    )

# Consumer 1 — real-time analytics (e.g., update Redis counters)
def consume_clicks_for_analytics():
    for message in consumer:
        click = message.value
        redis.zincrby("popular_urls:today", 1, click["short_code"])
        redis.incr(f"url:{click['short_code']}:clicks")

# Consumer 2 — store raw clicks in data warehouse
def consume_clicks_for_storage():
    for message in consumer:
        warehouse.insert_one(message.value)
```
Event Sourcing: Storing State as Events
Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.
```python
# Events (immutable facts)
events = [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
    {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
]

# Derive current state by replaying events
def get_account_state(events):
    state = {"email": None, "email_verified": False, "password_changed_at": None}
    for event in events:
        if event["type"] == "AccountCreated":
            state["email"] = event["data"]["email"]
        elif event["type"] == "EmailVerified":
            state["email_verified"] = True
        elif event["type"] == "PasswordChanged":
            state["password_changed_at"] = event["data"]["changed_at"]
    return state
```
**Trade-offs**: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
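Those projections are simply replays aimed at one query shape. For instance, a projection answering "when was each account verified?" over event lists like the one above:

```python
def project_verification_dates(events: list) -> dict:
    """Fold events into {user_id: verified_at} — the shape of one read-side table."""
    verified = {}
    for event in events:
        if event["type"] == "EmailVerified":
            verified[event["data"]["user_id"]] = event["data"]["verified_at"]
    return verified
```

Each new query shape typically means another projection like this, kept up to date by consuming the event stream — which is exactly the maintenance cost the trade-off note refers to.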
---
4. Database Scaling Strategies
Read Replicas
The simplest scaling strategy: one primary handles writes, replicas handle reads.
                ┌─────────────┐
                │ Primary DB  │◀── Writes
                └──────┬──────┘
                       │ replication
      ┌────────────────┼────────────────┐
      │                │                │
┌─────▼────┐     ┌─────▼────┐     ┌─────▼────┐
│ Replica 1│     │ Replica 2│     │ Replica 3│
│ (Reads)  │     │ (Reads)  │     │ (Reads)  │
└──────────┘     └──────────┘     └──────────┘
```python
# Using read/write separation in code
class DatabaseRouter:
    def __init__(self):
        self.primary = create_engine(PRIMARY_URL)
        self.replicas = [create_engine(url) for url in REPLICA_URLS]
        self.replica_index = 0

    def write(self, query, params=None):
        with self.primary.begin() as conn:
            return conn.execute(query, params or {})

    def read(self, query, params=None):
        # Round-robin across replicas
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        return replica.execute(query, params or {})
```
**Replication lag** is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called **read-after-write consistency**.
```python
async def create_order_and_redirect(user_id: str, items: list):
    order_id = db.write("INSERT INTO orders ... RETURNING id")
    # Read-after-write: force this read to the primary
    order = db.read_from_primary(
        "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
    )
    return redirect(f"/orders/{order_id}")
```
Sharding (Horizontal Partitioning)
Split data across databases by a shard key.
| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| **Hash-based** | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| **Range-based** | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| **Directory-based** | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |
```python
import bisect
import hashlib

# Consistent hashing — minimizes re-sharding when nodes are added or removed
class ConsistentHashRing:
    def __init__(self, nodes: list, replicas: int = 150):
        self.ring = {}
        for node in nodes:
            for i in range(replicas):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Binary search for the first ring position >= hash_val,
        # wrapping around to the start of the ring if we run off the end
        idx = bisect.bisect_left(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
```
Partitioning (Within a Database)
Split a table into smaller physical chunks. PostgreSQL declarative partitioning:
```sql
CREATE TABLE events (
    event_id UUID NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    payload JSONB
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_q1
    PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2
    PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
```
Partition pruning means queries with `WHERE occurred_at >= '2026-04-01'` only scan relevant partitions.
---
5. Caching Layers
The Three Cache Levels
CDN ────────── Application Cache (Redis) ────────── In-Memory Cache (Local)
                          │                                   │
                          Expensive to fill                   Fastest access
                          Shared across servers               1–5μs per get
                          50–500μs per get                    Lost on restart
Cache Strategies
| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| **Cache Aside** | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| **Read Through** | Cache is authoritative; loads from DB on miss | Write through to DB; cache handles loading | When cache handles persistence |
| **Write Through** | — | Write to cache first, then DB synchronously | Apps needing strong consistency |
| **Write Behind** | — | Write to cache, async flush to DB | High-write-throughput apps |
| **Write Around** | — | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |
Cache Aside — The Default Choice
```python
async def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"
    # 1. Try cache
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)
    # 2. Cache miss — load from database
    profile = await db.query(
        "SELECT * FROM user_profiles WHERE user_id = :uid",
        {"uid": user_id},
    )
    if profile:
        # 3. Populate cache with TTL
        await redis.setex(cache_key, 300, json.dumps(profile))
    return profile

async def update_user_profile(user_id: str, data: dict):
    # 1. Write to database
    await db.execute(
        "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
        {"uid": user_id, "name": data["name"]},
    )
    # 2. Invalidate cache (don't update it — let next read re-populate)
    await redis.delete(f"user:profile:{user_id}")
```
Write Behind — For High-Volume Writes
```python
# Batch writer — flushes every 5 seconds, or immediately at 100 entries
write_buffer = []

async def write_to_cache(key: str, value: dict):
    write_buffer.append((key, value))
    if len(write_buffer) >= 100:
        await flush_buffer()

async def flush_buffer():
    async with db.transaction():
        for key, value in write_buffer:
            await db.execute(
                "INSERT INTO ... VALUES (:k, :v) ON CONFLICT ... DO UPDATE ...",
                {"k": key, "v": json.dumps(value)},
            )
    write_buffer.clear()

# Start background flusher
async def periodic_flush():
    while True:
        await asyncio.sleep(5)
        if write_buffer:
            await flush_buffer()
```
**Write behind risk**: if the process crashes before the flush, data is lost. Use a persistent queue (Kafka) for critical writes.
---
6. CAP Theorem Explained Practically
CAP says a distributed data store can provide at most two of three guarantees: **Consistency**, **Availability**, and **Partition Tolerance**.
What CAP Actually Means
**Consistency**: every read sees the most recent write (or an error).
**Availability**: every request gets a non-error response, even if the data is stale.
**Partition tolerance**: the system keeps operating when the network between nodes fails.
The Key Insight
You **must** choose CP or AP. Partition tolerance is non-negotiable in distributed systems — networks WILL fail.
| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |
Practical CAP Decisions
```python
# AP choice — accept stale reads for availability
async def get_product_stock(product_id: str) -> int:
    # Read from nearest replica, may be stale
    return await replica.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id},
    )

# CP choice — accept slower reads for consistency
async def get_product_stock_cp(product_id: str) -> int:
    # Read from primary, always latest
    return await primary.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id},
    )
```
**Rule of thumb**: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.
---
7. Load Balancing Strategies
Layer 4 vs Layer 7
| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| **Routing based on** | IP + port | URL, headers, cookies, body |
| **Performance** | Very fast | Slower (inspects payload) |
| **Features** | Simple forwarding | Content-based routing, rate limiting |
| **Examples** | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |
Algorithms
```python
# Round Robin — predictable, but doesn't account for uneven request cost
servers = ["app-01", "app-02", "app-03"]
next_server = servers[current_index % len(servers)]
current_index += 1

# Least Connections — better for variable request durations
def least_connections(servers: list) -> str:
    return min(servers, key=lambda s: s.active_connections)

# IP Hash — session persistence without cookies
def ip_hash(client_ip: str, servers: list) -> str:
    hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[hash_val % len(servers)]
```
Health Checks: The Bare Minimum
┌──────────┐    /healthz    ┌──────────┐
│    LB    │───────────────▶│  App-01  │──▶ Returns 200
│          │                ├──────────┤
│          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
│          │                ├──────────┤
│          │───────────────▶│  App-03  │──▶ Returns 200
└──────────┘                └──────────┘
```python
# /healthz endpoint
@app.get("/healthz")
async def health_check():
    # Check critical dependencies
    db_ok = await check_database()
    cache_ok = await check_redis()
    if db_ok and cache_ok:
        return {"status": "ok"}
    # FastAPI doesn't support Flask-style (body, status) tuples
    return JSONResponse({"status": "degraded"}, status_code=503)
```
---
8. API Gateway Patterns
An API gateway sits between clients and your services, handling cross-cutting concerns.
                 ┌─────────────────┐
                 │   API Gateway   │
                 │ ┌─────────────┐ │
Client ──────────┼▶│    Auth     │ │
                 │ ├─────────────┤ │
                 │ │ Rate Limit  │ │──▶ Service A
                 │ ├─────────────┤ │
                 │ │   Routing   │ │──▶ Service B
                 │ ├─────────────┤ │
                 │ │   Logging   │ │──▶ Service C
                 │ └─────────────┘ │
                 └─────────────────┘
What the Gateway Handles
```python
# Before gateway — each service handles auth
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        token = request.headers["Authorization"]
        user = verify_token(token)  # Duplicated in EVERY service
        ...

# After gateway — auth is centralized, service code is simpler:
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        # Gateway has already verified the token and injected the user id
        user_id = request.headers["X-Authenticated-User"]
        return get_orders(user_id)
```
Gateway vs Service Mesh
| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| **Client-facing** | Yes (edge) | No (internal) |
| **Auth** | Token verification, API keys | mTLS between services |
| **Rate limiting** | Per-client, per-endpoint | Per-service |
| **Routing** | URL-based | Traffic splitting, canary |
| **Location** | Edge proxy | Sidecar per pod |
**Recommendation**: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.
---
9. Circuit Breaker and Resilience Patterns
The Circuit Breaker Pattern
```python
import inspect
import time

class CircuitBreaker:
    STATES = ["CLOSED", "OPEN", "HALF_OPEN"]

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.state = "CLOSED"
        self.last_failure_time = None

    async def _fallback(self, fallback):
        # Fallbacks may be sync (return a value) or async (return a coroutine)
        result = fallback() if fallback else None
        return await result if inspect.isawaitable(result) else result

    async def call(self, func, fallback=None):
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # allow one probe request through
            else:
                return await self._fallback(fallback)
        try:
            result = await func()
            if self.state == "HALF_OPEN":
                self.state = "CLOSED"
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            # A failed probe re-opens immediately; repeated failures trip the breaker
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            return await self._fallback(fallback)

# Usage
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

async def get_recommendations(user_id: str):
    return await cb.call(
        func=lambda: recommendations_service.fetch(user_id),
        fallback=lambda: {"recommendations": [], "source": "fallback"},
    )
```
Other Resilience Patterns
| Pattern | What It Does |
|---|---|
| **Retry with backoff** | Exponential backoff + jitter to avoid thundering herd |
| **Timeout** | Hard timeout per request (e.g., 5s) to prevent cascading |
| **Bulkhead** | Isolate resources — limit connections per service |
| **Rate limiting** | Token bucket or leaky bucket per client |
| **Dead letter queue** | Failed messages go to a DLQ for manual inspection |
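The table's "token bucket" can be sketched in a few lines: tokens refill at a steady rate, and each request spends one. This version takes the current time as an argument so it's testable (in production you'd pass `time.monotonic()`); the numbers are illustrative:

```python
class TokenBucket:
    """Token-bucket rate limiter; the caller supplies the clock."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.last = 0.0                # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False
```

A burst up to `capacity` is allowed immediately; after that, requests are admitted at `refill_per_sec` — which is why token bucket is the usual default over a strict leaky bucket.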
```python
# Retry with exponential backoff and jitter
async def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise  # out of retries — propagate the last error
            sleep_time = (2 ** attempt) + random.random()  # exp + jitter
            await asyncio.sleep(sleep_time)
```
---
10. Real Example: Design a URL Shortener
Let's design a bit.ly/TinyURL-style service step by step.
Requirements
Shorten long URLs to short codes (~7 characters), redirect with minimal latency under a read-heavy load (the read path below assumes ~100M requests/day), support optional expiry, and record click analytics without slowing redirects.
Step 1: URL Encoding
```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

def decode_base62(code: str) -> int:
    result = 0
    for char in code:
        result = result * 62 + BASE62.index(char)
    return result

# Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique URLs
encode_base62(123456789)  # "8m0Kx"
```
Step 2: Architecture
┌──────────┐   POST /shorten     ┌──────────────────────────┐
│  Client  │────────────────────▶│       API Gateway        │
│          │                     │  ┌────────────────────┐  │
│          │   GET /abc123       │  │   Write Service    │──┼──▶ PostgreSQL (URLs)
│          │────────────────────▶│  │  (generate code)   │  │
│          │                     │  └────────────────────┘  │
│          │   301 Redirect      │  ┌────────────────────┐  │
│          │◀────────────────────│  │    Read Service    │  │
│          │                     │  │  (resolve + cache) │──┼──▶ Redis (cache)
│          │                     │  └────────────────────┘  │
│          │                     │  ┌────────────────────┐  │
│          │                     │  │    Click Logger    │──┼──▶ Kafka ──(async)──▶ Analytics
│          │                     │  └────────────────────┘  │             (ClickHouse)
└──────────┘                     └──────────────────────────┘
Step 3: Data Model
```sql
-- PostgreSQL
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,  -- UNIQUE also creates the lookup index
    original_url TEXT NOT NULL,
    user_id UUID,                  -- nullable for anonymous users
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP           -- nullable
);

-- Redis cache
-- Key: "url:abc123" → Value: "https://example.com/long-url"
-- TTL: 24 hours
```
Step 4: Write Path
```python
@app.post("/shorten")
async def shorten_url(url: str, user_id: str | None = None):
    # 1. Check if URL already shortened (optimization)
    existing = await db.query(
        "SELECT short_code FROM urls WHERE original_url = :url AND user_id = :uid",
        {"url": url, "uid": user_id},
    )
    if existing:
        return {"short_url": f"https://short.domain/{existing['short_code']}"}
    # 2. Generate unique code
    short_code = await generate_unique_code()
    # 3. Store in DB
    await db.execute(
        "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
        {"c": short_code, "u": url, "uid": user_id},
    )
    # 4. Warm the cache
    await redis.setex(f"url:{short_code}", 86400, url)
    return {"short_url": f"https://short.domain/{short_code}"}

async def generate_unique_code() -> str:
    for _ in range(3):  # Retry on collision
        code = encode_base62(random.randint(0, 62**7 - 1))
        exists = await db.query(
            "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
        )
        if not exists:
            return code
    raise Exception("Collision rate too high — increase code length")
```
Step 5: Read Path (The Hot Path — Handles 100M req/day)
```python
@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    # 1. Try cache (99% hit rate with 24h TTL)
    original_url = await redis.get(f"url:{short_code}")
    if not original_url:
        # 2. Cache miss — hit DB
        row = await db.query(
            "SELECT original_url FROM urls WHERE short_code = :c",
            {"c": short_code},
        )
        if not row:
            raise HTTPException(status_code=404)
        original_url = row["original_url"]
        # 3. Populate cache with TTL
        await redis.setex(f"url:{short_code}", 86400, original_url)
    # 4. Log click asynchronously (don't block the redirect)
    click_event = {
        "short_code": short_code,
        "ip": request.client.host,
        "user_agent": request.headers.get("user-agent"),
        "referer": request.headers.get("referer"),
        "timestamp": int(time.time()),
    }
    # Fire and forget — queue to Kafka
    await click_producer.send("url_clicks", click_event)
    # 5. Redirect. Note: browsers cache 301s, so repeat visits can skip the
    # server (and the click log) — use 302 if every click must be counted.
    return RedirectResponse(url=original_url, status_code=301)
```
Step 6: Scale Considerations
The hot path is reads: push the cache hit rate up (longer TTLs for stable URLs), add read replicas before sharding, and if a single primary still can't keep up, shard `urls` by hash of `short_code`. Keep analytics strictly async so Kafka backpressure never touches the redirect path.
---
11. Async Processing Patterns
The Problem: Synchronous Chains
Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
500ms 800ms 200ms = 1.5s total
The client waits 1.5 seconds for something that doesn't need a response.
Solution: Decouple with Async
Client ──▶ Service A ──▶ Response (immediate: "Accepted")
               │
               ▼
       Queue (Kafka/SQS)
               │
        ┌──────┴──────┐
        ▼             ▼
   Service B      Service C
    (email)     (generate PDF)
Pattern 1: Fire and Forget
```python
@app.post("/api/send-email")
async def send_email(request: EmailRequest):
    # Validate request
    if not request.valid:
        raise HTTPException(400)
    # Queue the work — don't wait
    await email_queue.send({
        "to": request.to,
        "template": request.template,
        "data": request.data,
    })
    # Return immediately
    return {"status": "queued", "message_id": str(uuid.uuid4())}
```
Pattern 2: Polling with Status
```python
@app.post("/api/report/generate")
async def generate_report(params: ReportParams):
    report_id = str(uuid.uuid4())
    await report_queue.send({"report_id": report_id, "params": params})
    return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}

@app.get("/api/report/{report_id}/status")
async def check_status(report_id: str):
    status = await redis.get(f"report:{report_id}:status")
    if status == "ready":
        return {"status": "ready", "url": f"/api/report/{report_id}/download"}
    return {"status": "processing"}
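The worker side of this pattern is what flips the status key the poller checks. A sketch using a plain dict as a stand-in for Redis (`generate` is a caller-supplied function, assumed here for illustration):

```python
def process_report_job(job: dict, status_store: dict, generate) -> None:
    """Worker: mark the job processing, do the slow work, then mark it ready."""
    report_id = job["report_id"]
    status_store[f"report:{report_id}:status"] = "processing"
    result = generate(job["params"])  # the slow part (query, render, export)
    status_store[f"report:{report_id}:result"] = result
    status_store[f"report:{report_id}:status"] = "ready"
```

The status write happens last, so a poller never sees "ready" before the result exists.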
Pattern 3: Webhook Callback
Instead of polling, have the worker call a URL when done:
```python
async def process_report(report_id: str, params: dict, callback_url: str):
    # ... generate report ...
    await save_report(report_id, result)
    # Notify caller (module-level httpx.post is sync — use an AsyncClient)
    if callback_url:
        async with httpx.AsyncClient() as client:
            await client.post(callback_url, json={
                "report_id": report_id,
                "status": "completed",
                "download_url": f"/api/report/{report_id}/download",
            })
```
---
12. Common Anti-Patterns
1. The Distributed Monolith
You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.
**Signs**: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database — or god forbid, tables.
**Fix**: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.
2. Over-Engineering from Day One
"Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.
**Fix**: Start with the simplest thing that works. A monolith with PostgreSQL and Redis will handle 99% of applications. Extract services when there's a proven need.
3. Synchronous Coupling via HTTP
Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
If one service is slow, the whole chain slows. Latency adds up. Failures cascade.
**Fix**: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.
4. The Shared Database
Two services reading/writing the same database table. Schema changes require coordination. One service can deadlock the other.
**Fix**: Each service owns its data. Share via APIs or events, not databases.
5. Ignoring Caching
Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.
**Fix**: Add Redis. Cache the most frequently accessed data. Even a 60-second TTL can absorb the vast majority of reads for hot keys in read-heavy workloads.
6. The N+1 Query Problem
```python
# Anti-pattern: N+1 queries
def get_orders_with_items(user_id: str):
    orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
    for order in orders:
        # One query PER order — terrible!
        order["items"] = db.query(
            "SELECT * FROM order_items WHERE order_id = :oid",
            {"oid": order["id"]},
        )
    return orders

# Fix: single query with JOIN
def get_orders_with_items_fixed(user_id: str):
    return db.query("""
        SELECT o.id, o.total, oi.product_id, oi.quantity
        FROM orders o
        LEFT JOIN order_items oi ON oi.order_id = o.id
        WHERE o.user_id = :uid
    """, {"uid": user_id})
```
7. No Monitoring / No Observability
"Everything looks fine" — until users complain that the site is slow and you have no idea why.
**Baseline monitoring**: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
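The "correlation IDs" above are minted once at the edge and stamped on every log line (and forwarded on outbound calls), so one request can be traced across services. A minimal structured-logging sketch, not tied to any logging library:

```python
import json
import uuid

def new_correlation_id() -> str:
    """Minted once at the edge (e.g., in the API gateway)."""
    return uuid.uuid4().hex

def log_event(correlation_id: str, service: str, message: str, **fields) -> str:
    """Emit one structured log line; returning it keeps the sketch testable."""
    record = {"correlation_id": correlation_id, "service": service,
              "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line
```

Grepping your log aggregator for one correlation ID then reconstructs the whole request path, service by service.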
---
Summary: Key Decisions for 2026
| Decision | Default Choice | Upgrade When |
|---|---|---|
| **Architecture** | Modular monolith | Team >15 or clear independent scale need |
| **Database** | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
| **Cache** | Redis (cache aside) | Write-behind for high-throughput writes |
| **Queue** | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
| **Async** | Fire and forget for non-critical | Polling → Webhooks as needs grow |
| **API Gateway** | NGINX / Traefik | Envoy / Kong for advanced routing |
| **Resilience** | Circuit breaker + timeout | Bulkhead + rate limiting at scale |
The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.