System Design Fundamentals 2026: A Developer's Guide to Scalable Applications

System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.

---

1. Microservices vs Monolith vs Modular Monolith

The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.

| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| **Monolith** | 1–5 | Low | Prototypes, internal tools, MVPs |
| **Modular Monolith** | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| **Microservices** | 10+ per service | High | Large orgs with clear domain boundaries |

The Modular Monolith Sweet Spot

A modular monolith is a single deployable unit with **strict module boundaries**. Modules communicate through well-defined interfaces but share the same process and database.




```
┌─────────────────────────────────────┐
│          Modular Monolith           │
│   ┌──────────┐      ┌──────────┐    │
│   │  Orders  │      │ Billing  │    │
│   │  Module  │──────│  Module  │    │
│   └────┬─────┘      └────┬─────┘    │
│        │                 │          │
│   ┌────▼─────────────────▼────┐     │
│   │       Shared Kernel       │     │
│   │   (DB, messaging, auth)   │     │
│   └───────────────────────────┘     │
└─────────────────────────────────────┘
```





**When to extract a service**: When two conditions are met — the module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.

**Rule of thumb**: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
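One lightweight way to enforce those strict module boundaries is to make each module depend only on another module's small, explicit interface. A minimal sketch; the `BillingFacade`, `InMemoryBilling`, and `OrdersModule` names are hypothetical, not a prescribed layout:

```python
from typing import Protocol

class BillingFacade(Protocol):
    # The ONLY surface Billing exposes to other modules
    def charge(self, user_id: str, amount_cents: int) -> bool: ...

class InMemoryBilling:
    """Toy implementation; a real module would wrap the shared DB."""
    def __init__(self):
        self.charges = []

    def charge(self, user_id: str, amount_cents: int) -> bool:
        self.charges.append((user_id, amount_cents))
        return True

class OrdersModule:
    # Orders depends on the Billing interface, never its internals
    def __init__(self, billing: BillingFacade):
        self.billing = billing

    def place_order(self, user_id: str, total_cents: int) -> str:
        if not self.billing.charge(user_id, total_cents):
            raise RuntimeError("payment failed")
        return f"order-for-{user_id}"

orders = OrdersModule(InMemoryBilling())
orders.place_order("u1", 4200)  # "order-for-u1"
```

If Billing later becomes its own service, only the facade implementation changes; Orders is untouched.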

Real-World Decision Tree




```
Monolith → Modular Monolith → Selective Extraction → Full Microservices

MVP phase:          Monolith
10k users / 5 devs: Modular monolith
100k users:         Extract payments (PCI scope)
1M users:           Extract search (separate scale)
10M users:          Extract recommendations (different stack)
```





---

2. CQRS: Command Query Responsibility Segregation

CQRS separates **reads** from **writes** — different models, sometimes different databases.

When CQRS Makes Sense


* Your read queries are complex and don't map well to your write model (e.g., reporting dashboards)

* Your read and write workloads have different scaling requirements (10:1 read-to-write ratio)

* You need different data shapes for reading vs writing (e.g., write normalized, read denormalized)


A Simple CQRS Implementation




```python
# --- Command Side (Writes) ---
class CreateOrderCommand:
    def __init__(self, user_id: str, items: list):
        self.user_id = user_id
        self.items = items


class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> str:
        # Validate business rules
        order = Order.create(cmd.user_id, cmd.items)
        order.save()  # Write to transactional DB (PostgreSQL)
        event_bus.publish("order.created", {"order_id": order.id})
        return order.id


# --- Query Side (Reads) ---
class OrderQueryHandler:
    def get_order_summary(self, user_id: str) -> dict:
        # Read from denormalized read model (could be a different DB)
        return read_db.query(
            "SELECT * FROM order_summaries WHERE user_id = :uid",
            {"uid": user_id}
        )
```





CQRS Without Event Sourcing

You don't need event sourcing to use CQRS. The most common pattern is:


1. **Write** to a normalized PostgreSQL table
2. **Sync** (or async via CDC) to a read-optimized table
3. **Read** from the read table




```sql
-- Write model: normalized
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    total_cents BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    unit_price_cents BIGINT NOT NULL
);

-- Read model: denormalized for fast queries
CREATE TABLE order_summaries (
    order_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    item_count INT NOT NULL,
    total_cents BIGINT NOT NULL,
    product_names TEXT[] NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```




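The sync step can be made concrete with a small in-memory sketch: collapsing normalized order rows into one denormalized summary per order. The `project_order_summary` function and the `product_name` field are illustrative additions, not part of the schema above:

```python
def project_order_summary(order: dict, items: list) -> dict:
    # Build the denormalized order_summaries row from normalized rows
    return {
        "order_id": order["id"],
        "user_id": order["user_id"],
        "status": order["status"],
        "item_count": sum(i["quantity"] for i in items),
        "total_cents": sum(i["quantity"] * i["unit_price_cents"] for i in items),
        "product_names": [i["product_name"] for i in items],
    }

order = {"id": "o1", "user_id": "u1", "status": "paid"}
items = [
    {"quantity": 2, "unit_price_cents": 500, "product_name": "pen"},
    {"quantity": 1, "unit_price_cents": 1500, "product_name": "notebook"},
]
summary = project_order_summary(order, items)
# item_count == 3, total_cents == 2500
```

In production this projection would run in a trigger, a CDC consumer, or an event handler; the transformation itself is the same.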

When NOT to Use CQRS


* Your app is a simple CRUD interface with no complex queries

* You don't need separate read/write scaling

* Your team is small and you can't justify the infrastructure overhead


---

3. Event-Driven Architecture

Event-driven systems decouple producers from consumers. When an event happens, interested services react.

Core Concepts




```
┌──────────┐       Event Bus       ┌──────────────┐
│ Producer │──────(Kafka/RMQ)───┬─▶│  Consumer 1  │
│ (Orders) │                    │  │ (Analytics)  │
└──────────┘                    │  └──────────────┘
                                └─▶┌──────────────┐
                                   │  Consumer 2  │
                                   │   (Email)    │
                                   └──────────────┘
```





Message Queue Comparison

| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| **Delivery** | At-least-once, exactly-once (idempotent) | At-most-once, at-least-once | At-least-once |
| **Ordering** | Per-partition guaranteed | Not guaranteed (unless single queue) | FIFO queue (limited throughput) |
| **Persistence** | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| **Throughput** | Millions/sec | Thousands/sec | Unlimited (soft limit 300/s for FIFO) |
| **Consumer model** | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| **Use case** | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| **Operational cost** | High (requires ZooKeeper/KRaft) | Medium | Zero (fully managed) |

Kafka in Practice: The URL Shortener Click Stream




```python
import time

# Producer — emit click events
def record_click(short_code: str, ip: str, user_agent: str):
    producer.send(
        topic="url_clicks",
        key=short_code.encode(),  # Same key → same partition → ordered
        value={
            "short_code": short_code,
            "ip": ip,
            "user_agent": user_agent,
            "timestamp": int(time.time()),
        }
    )


# Consumer 1 — real-time analytics (e.g., update Redis counters)
def consume_clicks_for_analytics():
    for message in consumer:
        click = message.value
        redis.zincrby("popular_urls:today", 1, click["short_code"])
        redis.incr(f"url:{click['short_code']}:clicks")


# Consumer 2 — store raw clicks in data warehouse
def consume_clicks_for_storage():
    for message in consumer:
        warehouse.insert_one(message.value)
```





Event Sourcing: Storing State as Events

Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.




```python
# Events (immutable facts)
events = [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
    {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
]


# Derive current state by replaying events
def get_account_state(events):
    state = {"email": None, "email_verified": False, "password_hash": None}
    for event in events:
        if event["type"] == "AccountCreated":
            state["email"] = event["data"]["email"]
        elif event["type"] == "EmailVerified":
            state["email_verified"] = True
        # (remaining event types, e.g. PasswordChanged, handled the same way)
    return state
```





**Trade-offs**: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
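The "projections" caveat can be made concrete: instead of replaying the full log on every query, a read model applies each event once and keeps a running view. A minimal sketch, with a hypothetical `VerifiedEmailsProjection` over the event shapes shown above:

```python
class VerifiedEmailsProjection:
    """Incrementally maintained read model: users with verified emails."""
    def __init__(self):
        self.verified_users = set()

    def apply(self, event: dict) -> None:
        # Called once per event as it is appended to the log
        if event["type"] == "EmailVerified":
            self.verified_users.add(event["data"]["user_id"])

proj = VerifiedEmailsProjection()
for e in [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
]:
    proj.apply(e)
# proj.verified_users == {"u1"}
```

Each query pattern typically gets its own projection; rebuilding one is just replaying the log from the start.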

---

4. Database Scaling Strategies

Read Replicas

The simplest scaling strategy: one primary handles writes, replicas handle reads.




```
              ┌─────────────┐
              │ Primary DB  │◀── Writes
              └──────┬──────┘
     ┌───────────────┼───────────────┐
     │               │               │
┌────▼─────┐   ┌─────▼────┐   ┌─────▼────┐
│ Replica 1│   │ Replica 2│   │ Replica 3│
│ (Reads)  │   │ (Reads)  │   │ (Reads)  │
└──────────┘   └──────────┘   └──────────┘
```








```python
# Using read/write separation in code
# (create_engine is SQLAlchemy-style; PRIMARY_URL/REPLICA_URLS come from config)
class DatabaseRouter:
    def __init__(self):
        self.primary = create_engine(PRIMARY_URL)
        self.replicas = [create_engine(url) for url in REPLICA_URLS]
        self.replica_index = 0

    def write(self, query, params=None):
        with self.primary.begin() as conn:
            return conn.execute(query, params or {})

    def read(self, query, params=None):
        # Round-robin across replicas
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        with replica.connect() as conn:
            return conn.execute(query, params or {})
```





**Replication lag** is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called **read-after-write consistency**.




```python
async def create_order_and_redirect(user_id: str, items: list):
    order_id = db.write("INSERT INTO orders ... RETURNING id")

    # Read-after-write: force this read to the primary
    order = db.read_from_primary(
        "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
    )

    return redirect(f"/orders/{order_id}")
```





Sharding (Horizontal Partitioning)

Split data across databases by a shard key.

| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| **Hash-based** | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| **Range-based** | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| **Directory-based** | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |
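Why "resharding is painful" for naive modulo sharding is easy to demonstrate: growing from 4 to 5 shards moves most keys, because `h % 4` and `h % 5` rarely agree. A small sketch using a stable hash so the result is reproducible:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_shards

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
print(f"{moved / len(keys):.0%} of keys change shard going 4 → 5")
```

Roughly 80% of keys relocate, each one requiring a data copy. Consistent hashing (below) moves only ~1/N of keys per added node.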




```python
import hashlib

# Consistent hashing — minimizes re-sharding
class ConsistentHashRing:
    def __init__(self, nodes: list, replicas: int = 150):
        self.ring = {}
        for node in nodes:
            # Each physical node gets many virtual points on the ring
            for i in range(replicas):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Walk clockwise to the first virtual node at or past the hash
        for ring_key in self.sorted_keys:
            if hash_val <= ring_key:
                return self.ring[ring_key]
        return self.ring[self.sorted_keys[0]]  # Wrap around

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
```





Partitioning (Within a Database)

Split a table into smaller physical chunks. PostgreSQL declarative partitioning:




```sql
CREATE TABLE events (
    event_id UUID NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    payload JSONB
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_q1
    PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2
    PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
```





Partition pruning means queries with `WHERE occurred_at >= '2026-04-01'` only scan relevant partitions.

---

5. Caching Layers

The Three Cache Levels




```
CDN ─────── Application Cache (Redis) ─────── In-Memory Cache (Local)
 │                    │                                │
 │             Expensive to fill                Fastest access
 │             Shared across servers            1-5μs per get
 │             50-500μs per get                 Lost on restart
```





Cache Strategies

| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| **Cache Aside** | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| **Read Through** | Cache is authoritative; loads from DB on miss | Write through to DB; cache handles loading | When cache handles persistence |
| **Write Through** | — | Write to cache first, then DB synchronously | Apps needing strong consistency |
| **Write Behind** | — | Write to cache, async flush to DB | High-write-throughput apps |
| **Write Around** | — | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |

Cache Aside — The Default Choice




```python
import json

async def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"

    # 1. Try cache
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — load from database
    profile = await db.query(
        "SELECT * FROM user_profiles WHERE user_id = :uid",
        {"uid": user_id}
    )

    if profile:
        # 3. Populate cache with TTL
        await redis.setex(cache_key, 300, json.dumps(profile))

    return profile


async def update_user_profile(user_id: str, data: dict):
    # 1. Write to database
    await db.execute(
        "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
        {"uid": user_id, "name": data["name"]}
    )

    # 2. Invalidate cache (don't update it — let next read re-populate)
    await redis.delete(f"user:profile:{user_id}")
```





Write Behind — For High-Volume Writes




```python
import asyncio
import json

# Batch writer process — runs every 5 seconds
write_buffer = []


async def write_to_cache(key: str, value: dict):
    write_buffer.append((key, value))
    if len(write_buffer) >= 100:
        await flush_buffer()


async def flush_buffer():
    async with db.transaction():
        for key, value in write_buffer:
            await db.execute(
                "UPSERT INTO ... VALUES (:k, :v)",
                {"k": key, "v": json.dumps(value)}
            )
    write_buffer.clear()


# Start background flusher
async def periodic_flush():
    while True:
        await asyncio.sleep(5)
        if write_buffer:
            await flush_buffer()
```





**Write behind risk**: if the process crashes before the flush, data is lost. Use a persistent queue (Kafka) for critical writes.

---

6. CAP Theorem Explained Practically

CAP says a distributed data store can provide at most two of three guarantees: **Consistency**, **Availability**, and **Partition Tolerance**.

What CAP Actually Means


* **C (Consistency)**: Every read sees the most recent write (or an error)

* **A (Availability)**: Every request gets a non-error response (not necessarily the latest data)

* **P (Partition Tolerance)**: System continues working despite network failures


The Key Insight

You **must** choose CP or AP. Partition tolerance is non-negotiable in distributed systems — networks WILL fail.

| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |

Practical CAP Decisions




```python
# AP choice — accept stale reads for availability
async def get_product_stock(product_id: str) -> int:
    # Read from nearest replica, may be stale
    return await replica.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )


# CP choice — accept slower reads for consistency
async def get_product_stock_cp(product_id: str) -> int:
    # Read from primary, always latest
    return await primary.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )
```





**Rule of thumb**: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.

---

7. Load Balancing Strategies

Layer 4 vs Layer 7

| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| **Routing based on** | IP + port | URL, headers, cookies, body |
| **Performance** | Very fast | Slower (inspects payload) |
| **Features** | Simple forwarding | Content-based routing, rate limiting |
| **Examples** | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |

Algorithms




```python
import hashlib

# Round Robin — predictable, but doesn't handle different load sizes
servers = ["app-01", "app-02", "app-03"]
current_index = 0

def round_robin() -> str:
    global current_index
    server = servers[current_index % len(servers)]
    current_index += 1
    return server


# Least Connections — better for variable request durations
def least_connections(servers: list) -> str:
    return min(servers, key=lambda s: s.active_connections)


# IP Hash — session persistence without cookies
def ip_hash(client_ip: str, servers: list) -> str:
    hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[hash_val % len(servers)]
```





Health Checks: The Bare Minimum




```
┌──────────┐    /healthz    ┌──────────┐
│    LB    │───────────────▶│  App-01  │──▶ Returns 200
│          │                ├──────────┤
│          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
│          │                ├──────────┤
│          │───────────────▶│  App-03  │──▶ Returns 200
└──────────┘                └──────────┘
```








```python
# /healthz endpoint (FastAPI-style)
from fastapi import Response

@app.get("/healthz")
async def health_check(response: Response):
    # Check critical dependencies
    db_ok = await check_database()
    cache_ok = await check_redis()
    if db_ok and cache_ok:
        return {"status": "ok"}
    response.status_code = 503
    return {"status": "degraded"}
```





---

8. API Gateway Patterns

An API gateway sits between clients and your services, handling cross-cutting concerns.




```
                 ┌─────────────────┐
                 │   API Gateway   │
                 │ ┌─────────────┐ │
Client ──────────┼▶│    Auth     │ │
                 │ └─────────────┘ │
                 │ ┌─────────────┐ │
                 │ │ Rate Limit  │ │──▶ Service A
                 │ └─────────────┘ │
                 │ ┌─────────────┐ │──▶ Service B
                 │ │   Routing   │ │
                 │ └─────────────┘ │──▶ Service C
                 │ ┌─────────────┐ │
                 │ │   Logging   │ │
                 │ └─────────────┘ │
                 └─────────────────┘
```





What the Gateway Handles




```python
# Before gateway — each service handles auth
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        token = request.headers["Authorization"]
        user = verify_token(token)  # Duplicated in EVERY service
        ...


# After gateway — auth is centralized; service code is simpler:
@app.route("/api/orders")
class OrdersResource:
    def get(self):
        user_id = request.headers["X-Authenticated-User"]  # Set by gateway
        return get_orders(user_id)
```





Gateway vs Service Mesh

| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| **Client-facing** | Yes (edge) | No (internal) |
| **Auth** | Token verification, API keys | mTLS between services |
| **Rate limiting** | Per-client, per-endpoint | Per-service |
| **Routing** | URL-based | Traffic splitting, canary |
| **Location** | Edge proxy | Sidecar per pod |

**Recommendation**: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.
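To make "centralized cross-cutting concerns" concrete, here is a toy gateway dispatch function: pure Python, no real proxying, with hypothetical `ROUTES`/`API_KEYS` stores standing in for a config service:

```python
ROUTES = {
    "/api/orders": "http://orders-svc",
    "/api/billing": "http://billing-svc",
}

API_KEYS = {"key-123": "u1"}  # token → user id (toy auth store)

def gateway(path: str, headers: dict) -> dict:
    # 1. Centralized auth: verify once, inject identity downstream
    user_id = API_KEYS.get(headers.get("Authorization", ""))
    if user_id is None:
        return {"status": 401}
    # 2. Prefix routing to the owning service
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            forwarded = dict(headers, **{"X-Authenticated-User": user_id})
            return {"status": 200, "upstream": upstream, "headers": forwarded}
    return {"status": 404}
```

Real gateways (NGINX, Envoy, Kong) do exactly this shape of work, plus connection pooling, TLS termination, and retries.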

---

9. Circuit Breaker and Resilience Patterns

The Circuit Breaker Pattern




```python
import time

class CircuitBreaker:
    STATES = ["CLOSED", "OPEN", "HALF_OPEN"]

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.state = "CLOSED"
        self.last_failure_time = None

    async def call(self, func, fallback=None):
        # fallback is a plain (sync) callable returning a default value
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # Allow one trial request through
            else:
                return fallback() if fallback else None

        try:
            result = await func()
            if self.state == "HALF_OPEN":
                self.state = "CLOSED"  # Trial succeeded — close the circuit
                self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            return fallback() if fallback else None


# Usage
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

async def get_recommendations(user_id: str):
    return await cb.call(
        func=lambda: recommendations_service.fetch(user_id),
        fallback=lambda: {"recommendations": [], "source": "fallback"}
    )
```





Other Resilience Patterns

| Pattern | What It Does |
|---|---|
| **Retry with backoff** | Exponential backoff + jitter to avoid thundering herd |
| **Timeout** | Hard timeout per request (e.g., 5s) to prevent cascading |
| **Bulkhead** | Isolate resources — limit connections per service |
| **Rate limiting** | Token bucket or leaky bucket per client |
| **Dead letter queue** | Failed messages go to a DLQ for manual inspection |




```python
import asyncio
import random

# Retry with exponential backoff and jitter
async def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise
            sleep_time = (2 ** attempt) + random.random()  # exp + jitter
            await asyncio.sleep(sleep_time)
```




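The token bucket from the table can also be sketched in a few lines. The clock is injected so refill behavior is testable; the parameters are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `rate=1, capacity=2` permits a burst of two requests, then one request per second; callers that get `False` receive a 429.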

---

10. Real Example: Design a URL Shortener

Let's design bit.ly/tinyurl step by step.

Requirements


* Generate a short, unique code for any URL

* Redirect to the original URL when the short code is accessed

* Track click analytics (count, referrer, timestamp)

* Handle 10M URLs, 100M redirects/day


Step 1: URL Encoding




```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

def decode_base62(code: str) -> int:
    result = 0
    for char in code:
        result = result * 62 + BASE62.index(char)
    return result

# Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique URLs
encode_base62(123456789)  # "8m0Kx"
```





Step 2: Architecture




```
                                           ┌─────────────┐
                                           │  Analytics  │
                                           │  (Kafka →   │
                                           │ ClickHouse) │
                                           └─────────────┘
                                                  ▲
                                                  │ (async)
┌──────────┐   POST /shorten     ┌──────────────────────────┐
│  Client  │────────────────────▶│       API Gateway        │
│          │                     │  ┌────────────────────┐  │
│          │   GET /abc123       │  │   Write Service    │──┼──▶ PostgreSQL (URLs)
│          │────────────────────▶│  │  (generate code)   │  │
│          │                     │  └────────────────────┘  │
│          │   301 Redirect      │  ┌────────────────────┐  │
│          │◀────────────────────│  │    Read Service    │  │
│          │                     │  │  (resolve + cache) │──┼──▶ Redis (cache)
│          │                     │  └────────────────────┘  │
│          │                     │  ┌────────────────────┐  │
│          │                     │  │    Click Logger    │──┼──▶ Kafka
│          │                     │  └────────────────────┘  │
└──────────┘                     └──────────────────────────┘
```





Step 3: Data Model




```sql
-- PostgreSQL
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    original_url TEXT NOT NULL,
    user_id UUID,              -- nullable for anonymous users
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP       -- nullable
);

CREATE INDEX idx_short_code ON urls(short_code);

-- Redis cache
-- Key: "url:abc123" → Value: "https://example.com/long-url"
-- TTL: 24 hours
```





Step 4: Write Path




```python
import random

@app.post("/shorten")
async def shorten_url(url: str, user_id: str = None):
    # 1. Check if URL already shortened (optimization)
    existing = await db.query(
        "SELECT short_code FROM urls WHERE original_url = :url AND user_id = :uid",
        {"url": url, "uid": user_id}
    )
    if existing:
        return {"short_url": f"https://short.domain/{existing['short_code']}"}

    # 2. Generate unique code
    short_code = await generate_unique_code()

    # 3. Store in DB
    await db.execute(
        "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
        {"c": short_code, "u": url, "uid": user_id}
    )

    # 4. Warm the cache
    await redis.setex(f"url:{short_code}", 86400, url)

    return {"short_url": f"https://short.domain/{short_code}"}


async def generate_unique_code() -> str:
    for _ in range(3):  # Retry on collision
        code = encode_base62(random.randint(0, 62**7 - 1))
        exists = await db.query(
            "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
        )
        if not exists:
            return code
    raise Exception("Collision rate too high — increase code length")
```





Step 5: Read Path (The Hot Path — Handles 100M req/day)




```python
import time

@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    # 1. Try cache (99% hit rate with 24h TTL)
    original_url = await redis.get(f"url:{short_code}")
    if not original_url:
        # 2. Cache miss — hit DB
        row = await db.query(
            "SELECT original_url FROM urls WHERE short_code = :c",
            {"c": short_code}
        )
        if not row:
            raise HTTPException(status_code=404)

        original_url = row["original_url"]

        # 3. Populate cache with TTL
        await redis.setex(f"url:{short_code}", 86400, original_url)

    # 4. Log click asynchronously (don't block the redirect)
    click_event = {
        "short_code": short_code,
        "ip": request.client.host,
        "user_agent": request.headers.get("user-agent"),
        "referer": request.headers.get("referer"),
        "timestamp": int(time.time()),
    }
    # Fire and forget — queue to Kafka
    await click_producer.send("url_clicks", click_event)

    # 5. Redirect — 301 is cacheable by browsers (fewer server hits, but
    # repeat clicks skip logging); use 302 if every click must be counted
    return RedirectResponse(url=original_url, status_code=301)
```





Step 6: Scale Considerations


* **Read replicas** for URL resolution (read-heavy: 10:1 read-to-write ratio)

* **Redis cluster** for cache (with consistent hashing)

* **Kafka partitions** by short_code for ordered click logs

* **Batch write** click analytics to ClickHouse every 30 seconds

* **CDN** for the redirect page itself (not the API — API calls are cheap)


---

11. Async Processing Patterns

The Problem: Synchronous Chains




```
Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
              500ms         800ms         200ms    = 1.5s total
```





The client waits 1.5 seconds for something that doesn't need a response.

Solution: Decouple with Async




```
Client ──▶ Service A ──▶ Response (immediate: "Accepted")
               │
               ▼
        Queue (Kafka/SQS)
               │
        ┌──────┴──────┐
        ▼             ▼
   Service B      Service C
    (email)     (generate PDF)
```





Pattern 1: Fire and Forget




```python
import uuid

@app.post("/api/send-email")
async def send_email(request: EmailRequest):
    # Validate request
    if not request.valid:
        raise HTTPException(400)

    # Queue the work — don't wait
    await email_queue.send({
        "to": request.to,
        "template": request.template,
        "data": request.data,
    })

    # Return immediately
    return {"status": "queued", "message_id": str(uuid.uuid4())}
```





Pattern 2: Polling with Status




```python
import uuid

@app.post("/api/report/generate")
async def generate_report(params: ReportParams):
    report_id = str(uuid.uuid4())
    await report_queue.send({"report_id": report_id, "params": params})
    return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}


@app.get("/api/report/{report_id}/status")
async def check_status(report_id: str):
    status = await redis.get(f"report:{report_id}:status")
    if status == "ready":
        return {"status": "ready", "url": f"/api/report/{report_id}/download"}
    return {"status": "processing"}
```





Pattern 3: Webhook Callback

Instead of polling, have the worker call a URL when done:




```python
import httpx

async def process_report(report_id: str, params: dict, callback_url: str):
    # ... generate report ...
    await save_report(report_id, result)

    # Notify caller (httpx.post is sync — use the async client here)
    if callback_url:
        async with httpx.AsyncClient() as client:
            await client.post(callback_url, json={
                "report_id": report_id,
                "status": "completed",
                "download_url": f"/api/report/{report_id}/download",
            })
```





---

12. Common Anti-Patterns

1. The Distributed Monolith

You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.

**Signs**: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database — or god forbid, tables.

**Fix**: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.

2. Over-Engineering from Day One

"Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.

**Fix**: Start with the simplest thing that works. A monolith with PostgreSQL and Redis will handle 99% of applications. Extract services when there's a proven need.

3. Synchronous Coupling via HTTP




```
Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
```




If one service is slow, the whole chain slows. Latency adds up. Failures cascade.

**Fix**: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.

4. The Shared Database

Two services reading/writing the same database table. Schema changes require coordination. One service can deadlock the other.

**Fix**: Each service owns its data. Share via APIs or events, not databases.

5. Ignoring Caching

Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.

**Fix**: Add Redis. Cache the most frequently accessed data. Even a 60-second cache TTL reduces DB load by 95% for read-heavy workloads.

6. The N+1 Query Problem




```python
# Anti-pattern: N+1 queries
def get_orders_with_items(user_id: str):
    orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
    for order in orders:
        # One query PER order — terrible!
        order["items"] = db.query(
            "SELECT * FROM order_items WHERE order_id = :oid",
            {"oid": order["id"]}
        )
    return orders


# Fix: single query with JOIN
def get_orders_with_items_fixed(user_id: str):
    return db.query("""
        SELECT o.id, o.total, oi.product_id, oi.quantity
        FROM orders o
        LEFT JOIN order_items oi ON oi.order_id = o.id
        WHERE o.user_id = :uid
    """, {"uid": user_id})
```





7. No Monitoring / No Observability

"Everything looks fine" — until users complain that the site is slow and you have no idea why.

**Baseline monitoring**: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
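A tiny sketch of the latency side of that baseline: record request durations and report p50/p95/p99. This uses the simple nearest-rank percentile method over raw samples; real systems use histograms (HDR, Prometheus buckets) to bound memory:

```python
class LatencyRecorder:
    def __init__(self):
        self.samples_ms = []

    def record(self, duration_ms: float) -> None:
        self.samples_ms.append(duration_ms)

    def percentile(self, p: float) -> float:
        # Nearest-rank percentile over recorded samples
        ordered = sorted(self.samples_ms)
        idx = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[idx]

rec = LatencyRecorder()
for ms in range(1, 101):  # simulate 100 requests: 1..100 ms
    rec.record(float(ms))
print(rec.percentile(50), rec.percentile(95), rec.percentile(99))
# → 50.0 95.0 99.0
```

Wire `record()` into request middleware and emit the percentiles per endpoint; p99 is where your slowest real users live.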

---

Summary: Key Decisions for 2026

| Decision | Default Choice | Upgrade When |
|---|---|---|
| **Architecture** | Modular monolith | Team >15 or clear independent scale need |
| **Database** | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
| **Cache** | Redis (cache aside) | Write-behind for high-throughput writes |
| **Queue** | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
| **Async** | Fire and forget for non-critical | Polling → Webhooks as needs grow |
| **API Gateway** | NGINX / Traefik | Envoy / Kong for advanced routing |
| **Resilience** | Circuit breaker + timeout | Bulkhead + rate limiting at scale |

The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.