What Is Database Replication?
Database replication is the process of copying data from one database server to another. Replication serves three primary purposes: high availability (failover if the primary fails), read scaling (distribute read queries across replicas), and disaster recovery (geographically distributed copies).
Replication Topologies
Leader-Follower (Single-Leader)
The most common replication pattern. One node accepts writes (leader), and one or more nodes replicate data asynchronously or synchronously (followers).
Read Queries
|
Client Writes ───> Leader ────> Follower 1 ───> Results
| |
| Follower 2 ───> Results
|
Follower 3 ───> Results
-- PostgreSQL: Configure streaming replication
-- On leader (postgresql.conf)
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024 -- MB
-- On follower (recovery.conf)
primary_conninfo = 'host=leader-ip port=5432 user=replicator'
primary_slot_name = 'follower_1'
# Application connection pooling (read/write splitting)
spring:
datasource:
write:
url: jdbc:postgresql://leader:5432/db
read:
url: jdbc:postgresql://follower:5432/db
# Or multiple read replicas:
# - jdbc:postgresql://follower1:5432/db
# - jdbc:postgresql://follower2:5432/db
Multi-Leader
Multiple nodes accept writes, each propagating changes to other leaders. Common in geo-distributed deployments.
US East Leader <───> EU West Leader
│ │
│ │
US East Follower EU West Follower
# Conflict resolution in multi-leader replication
# Last-Writer-Wins (LWW) approach
def resolve_conflict(existing, incoming):
"""Resolve by most recent timestamp."""
if incoming.timestamp > existing.timestamp:
return incoming
elif incoming.timestamp < existing.timestamp:
return existing
else:
# Same timestamp: merge values
return merge_values(existing, incoming)
Peer-to-Peer (Multi-Leader All-to-All)
Every node can accept reads and writes. Changes propagate to all other nodes.
Node 1 <───> Node 2
↑ ↑
↓ ↓
Node 3 <───> Node 4
Replication Methods
Statement-Based Replication
SQL statements are sent from leader to followers.
Leader executes: UPDATE users SET name = 'Bob' WHERE id = 1
Follower receives: UPDATE users SET name = 'Bob' WHERE id = 1
**Problem**: Non-deterministic functions (`NOW()`, `RANDOM()`) may produce different results on followers.
Write-Ahead Log (WAL) Shipping
The leader's transaction log is shipped to followers, which replay it.
# PostgreSQL WAL shipping
# Followers continuously apply WAL segments
# This is the most common PostgreSQL replication method
Logical Replication
A higher-level replication that streams changes by row rather than by disk block:
-- PostgreSQL logical replication
-- Publisher (leader)
CREATE PUBLICATION my_pub FOR TABLE users, orders;
-- Subscriber (follower)
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=leader port=5432 dbname=mydb'
PUBLICATION my_pub;
Synchronous vs Asynchronous
| Aspect | Synchronous | Asynchronous |
|--------|-------------|--------------|
| Data loss on failover | None (committed on all nodes) | Possible (unsynced writes lost) |
| Write latency | Higher (wait for all nodes) | Lower (ack immediately) |
| Availability | Lower (replica failure blocks writes) | Higher (replica failure ignored) |
| Use case | Financial systems | Most web applications |
-- PostgreSQL: Configure synchronous replication
synchronous_standby_names = 'FIRST 2 (*)'
-- Write commits only after at least 2 replicas confirm
Automatic Failover
# Patroni configuration for PostgreSQL HA
scope: mydb
namespace: /service/
name: pg_leader
consul:
host: consul.service.consul
postgresql:
listen: 0.0.0.0:5432
data_dir: /data/postgresql
bin_dir: /usr/lib/postgresql/16/bin
parameters:
max_connections: 200
authentication:
replication:
username: replicator
password: secure_password
superuser:
username: postgres
password: secure_password
callbacks:
on_role_change: /scripts/notify_role_change.sh
Read Replica Lag
Asynchronous replication introduces lag. Applications must handle stale reads:
class DatabaseRouter:
"""Route reads to replicas with staleness awareness."""
def __init__(self, primary, replicas):
self.primary = primary
self.replicas = replicas
def get_connection(self, need_freshness=False):
"""Get a database connection for reading."""
if need_freshness:
# Query must see the latest write
return self.primary
# Round-robin through replicas
replica = random.choice(self.replicas)
return replica
# Usage
def get_order(order_id, user_id):
# If user just placed this order, read from primary
if order_just_placed(user_id):
conn = router.get_connection(need_freshness=True)
else:
conn = router.get_connection()
return conn.query("SELECT * FROM orders WHERE id = %s", [order_id])
Monitoring Replication
-- PostgreSQL: Check replication status
SELECT
application_name,
state,
sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
(EXTRACT(EPOCH FROM NOW() - replay_lag) * 1000)::bigint AS lag_ms
FROM pg_stat_replication;
-- Check last WAL receive/replay time
SELECT
now() - pg_last_xact_replay_timestamp() AS replication_delay;
Common Pitfalls
| Pitfall | Consequence | Mitigation |
|---------|-------------|------------|
| Replication lag | Stale reads | Route freshness-sensitive queries to primary |
| Auto-increment conflicts | Duplicate keys | Use UUIDs or sequence ranges in multi-leader |
| Schema drift | Replication breaks | Apply migrations to followers first, then leader |
| Network latency | Replication delay | Keep replicas in same region for sync replication |
| Cascading failures | Overloaded replicas | Add connection pooling and query limits |
Summary
Leader-follower replication is the standard pattern for most applications, providing read scaling and failover capabilities. Use synchronous replication for zero data loss in critical systems and asynchronous replication for lower write latency. Implement automatic failover with tools like Patroni, handle read replicas with staleness-aware routing, and monitor replication lag as a key operational metric. For global deployments, consider multi-leader replication with conflict resolution strategies.