Multi-Master Replication: Conflict Resolution, CRDTs, Galera, and BDR
Multi-master replication allows writes to multiple database nodes simultaneously. Unlike primary-replica setups, there is no single point for writes. The trade-off is complexity, particularly around conflict resolution.
Why Multi-Master?
The primary reasons to consider multi-master replication are:
* **Multi-region writes**: Users in the US and Europe both write with local latency.
* **Zero downtime upgrades**: Any node can be taken offline without losing write availability.
* **Read scalability with local writes**: Each node can serve both reads and writes with low latency.
Conflict Resolution Strategies
When two nodes concurrently modify the same row, a conflict occurs. Resolution strategies vary by system:
Last-Write-Wins (LWW)
Each written cell carries a timestamp. The write with the latest timestamp wins. Simple but lossy:
-- Cassandra default: every written cell is tagged with a timestamp
UPDATE users SET email = 'new@example.com' WHERE id = 1;
-- On concurrent updates to the same cell, the highest timestamp wins
**Pros**: Simple, always converges. **Cons**: Loses data silently.
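The LWW rule itself is easy to model in application code. A minimal sketch (the timestamps and replica names here are illustrative; real systems use synchronized or hybrid logical clocks):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    """A last-write-wins register: value plus ordering metadata."""
    value: str
    ts: float      # wall-clock or hybrid logical timestamp
    replica: str   # deterministic tie-breaker when timestamps collide

def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    # Higher timestamp wins; replica id breaks exact ties deterministically.
    return max(a, b, key=lambda r: (r.ts, r.replica))

# Two replicas write concurrently; the later timestamp silently wins.
us = LWWRegister("old@example.com", ts=100.0, replica="us-east")
eu = LWWRegister("new@example.com", ts=100.5, replica="eu-west")
print(merge(us, eu).value)  # new@example.com -- the us-east write is lost
```

Note that the merge is commutative: both replicas converge to the same value regardless of the order in which they see each other's writes, which is exactly why LWW "always converges" and also exactly how it loses data.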
Application-Mediated Conflict Resolution
The database detects conflicts and presents them to the application for resolution. BDR (Bi-Directional Replication) for PostgreSQL supports this:
-- Create a conflict handler function (a trigger-style sketch;
-- BDR's actual handler signature varies by version)
CREATE OR REPLACE FUNCTION resolve_conflict()
RETURNS trigger AS $$
BEGIN
    -- Custom logic: keep the row with the higher version
    IF NEW.version >= OLD.version THEN
        RETURN NEW;
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;
-- Register the handler for the table (registration API names also vary by BDR version)
SELECT bdr.conflict_handler('orders', 'resolve_conflict');
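The keep-the-higher-version rule can also be modeled outside the database, which is useful for unit-testing resolution logic. A sketch, assuming an application-maintained `version` column as in the handler above:

```python
def resolve_conflict(local: dict, remote: dict) -> dict:
    """Keep whichever row carries the higher application version.

    Mirrors the plpgsql handler above: the incoming (remote) row
    wins ties, matching `NEW.version >= OLD.version`.
    """
    return remote if remote["version"] >= local["version"] else local

local = {"id": 1, "status": "shipped", "version": 7}
remote = {"id": 1, "status": "cancelled", "version": 6}
print(resolve_conflict(local, remote)["status"])  # shipped
```

Because the rule compares versions rather than timestamps, it is immune to clock skew between nodes, but the application must increment `version` on every write for it to be meaningful.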
CRDT-Based Reconciliation
Conflict-free Replicated Data Types (CRDTs) mathematically guarantee convergence without coordination. Instead of storing a scalar value, you store a data structure where concurrent operations commute or combine:
-- G-Counter: one row per (page, replica) pair instead of a single scalar
CREATE TABLE page_views (
    page_id    INTEGER NOT NULL,
    replica_id TEXT    NOT NULL,
    counter    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (page_id, replica_id)
);
-- Each replica increments only its own slot, so concurrent writes never conflict
UPDATE page_views SET counter = counter + 1
WHERE page_id = 42 AND replica_id = 'node1';
-- The logical value is the SUM of all slots; the merge rule during
-- reconciliation is the per-slot MAX (grow-only G-Counter).
-- A PN-Counter tracks increments and decrements in separate slots.
SELECT SUM(counter) FROM page_views WHERE page_id = 42;
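The per-replica bookkeeping behind these counters can be sketched directly (replica names are illustrative):

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica; value = sum of slots."""
    def __init__(self):
        self.slots: dict[str, int] = {}

    def increment(self, replica: str, n: int = 1) -> None:
        # A replica only ever increments its own slot.
        self.slots[replica] = self.slots.get(replica, 0) + n

    def value(self) -> int:
        return sum(self.slots.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max: commutative, associative, and idempotent,
        # so merges converge in any order without coordination.
        merged = GCounter()
        for r in self.slots.keys() | other.slots.keys():
            merged.slots[r] = max(self.slots.get(r, 0), other.slots.get(r, 0))
        return merged

a, b = GCounter(), GCounter()
a.increment("node1", 3)    # 3 views recorded on node1
b.increment("node2", 2)    # 2 views recorded on node2
print(a.merge(b).value())  # 5 -- no increments lost, in any merge order
```

A PN-Counter is simply two G-Counters, one for increments and one for decrements, with `value()` returning their difference.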
More sophisticated CRDTs include:
* **G-Set** (grow-only set): Elements can only be added. Union merges converge.
* **LWW-Register**: A single-value register resolved by last-write-wins, ordered by wall-clock or logical timestamps.
* **OR-Set** (observed-remove set): Supports both add and remove without losing concurrent adds.
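The OR-Set's key trick is tagging each add with a unique id; a remove tombstones only the tags it has observed, so a concurrent add survives. A state-based sketch:

```python
import uuid

class ORSet:
    """Observed-remove set CRDT (state-based, with tombstones)."""
    def __init__(self):
        self.adds: dict = {}        # element -> set of unique add-tags
        self.removed: set = set()   # tombstoned tags

    def add(self, elem) -> None:
        self.adds.setdefault(elem, set()).add(uuid.uuid4())

    def remove(self, elem) -> None:
        # Tombstone only the tags observed locally at call time.
        self.removed |= self.adds.get(elem, set())

    def contains(self, elem) -> bool:
        return bool(self.adds.get(elem, set()) - self.removed)

    def merge(self, other: "ORSet") -> "ORSet":
        merged = ORSet()
        for e in self.adds.keys() | other.adds.keys():
            merged.adds[e] = self.adds.get(e, set()) | other.adds.get(e, set())
        merged.removed = self.removed | other.removed
        return merged

a = ORSet()
a.add("x")
b = ORSet().merge(a)   # replicate a's state to b
a.remove("x")          # A removes the tag it observed
b.add("x")             # B concurrently re-adds with a fresh tag
print(a.merge(b).contains("x"))  # True -- the concurrent add is preserved
```

B's fresh tag was never observed by A's remove, so it escapes the tombstone set; this is precisely the "add wins over concurrent remove" semantics the bullet describes.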
Galera Cluster (MySQL / MariaDB)
Galera Cluster implements virtually synchronous multi-master replication for MySQL and MariaDB: a transaction's writeset must pass certification on every node before it commits anywhere:
-- All nodes are writable
-- On each node:
CREATE TABLE users (
    -- Galera offsets AUTO_INCREMENT values per node to avoid collisions
    id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL
);
INSERT INTO users (email) VALUES ('alice@example.com');
-- This INSERT is certified on all nodes before returning
Galera Key Properties
* **Virtually synchronous replication**: Writesets reach every node before commit; each node then applies them locally.
* **Certification-based**: Every node runs the same deterministic certification test on each writeset, so no commit-time voting is needed.
* **Automatic node provisioning**: New nodes join via State Snapshot Transfer (SST) or Incremental State Transfer (IST).
* **Near-zero replication lag**: Committed writesets are guaranteed to be queued on every node at commit time; `wsrep_sync_wait` can enforce causal reads.
Galera Limitations
* Cluster size penalty: three nodes is the practical minimum for quorum; each additional node adds certification traffic and conflict probability.
* Explicit table locks (`LOCK TABLES`) are unsupported, and hot-row `SELECT ... FOR UPDATE` workloads trigger frequent certification failures, surfaced to clients as deadlock errors.
* Network latency between nodes directly impacts write latency (the "slowest node" problem).
* `REPEATABLE READ` is the default isolation level; `SERIALIZABLE` is only honored within a single node, not cluster-wide.
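Because certification failures surface to clients as deadlock errors (MySQL error 1213), Galera applications conventionally wrap writes in a retry loop. A sketch of that pattern; the `flaky_txn` callable stands in for a real transaction against a driver connection:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for a driver error with errno 1213 (ER_LOCK_DEADLOCK)."""
    errno = 1213

def with_retries(run_txn, attempts: int = 5):
    """Retry a transaction on certification failure with jittered backoff."""
    for attempt in range(attempts):
        try:
            return run_txn()
        except DeadlockError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter to de-synchronize competing writers.
            time.sleep(random.uniform(0, 0.05 * 2 ** attempt))

# Illustrative: a transaction that fails certification twice, then commits.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError()
    return "committed"

print(with_retries(flaky_txn))  # committed
```

The jitter matters: if all contending clients retry on the same schedule, they simply collide again at certification.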
PostgreSQL BDR (Bi-Directional Replication)
BDR extends PostgreSQL's logical replication for multi-master scenarios. It operates at row level with conflict resolution:
-- Configure the BDR group (BDR 3.x style; exact parameter names vary by version)
-- On node1:
SELECT bdr.create_node(
    node_name := 'node1',
    local_dsn := 'host=node1.example.com port=5432 dbname=appdb'
);
SELECT bdr.create_node_group(node_group_name := 'global_app');

-- On node2:
SELECT bdr.create_node(
    node_name := 'node2',
    local_dsn := 'host=node2.example.com port=5432 dbname=appdb'
);
SELECT bdr.join_node_group(
    join_target_dsn := 'host=node1.example.com port=5432 dbname=appdb',
    node_group_name := 'global_app'
);
BDR supports multiple conflict resolution modes (names vary between BDR and pglogical versions):
* `last_update_wins` (latest commit timestamp wins)
* `first_update_wins` (earliest commit timestamp wins)
* `apply_remote` (the incoming remote change always wins)
* `keep_local` (the local row is kept; the remote change is discarded)
CAP Theorem Implications
Multi-master systems live in the CAP trade-off space:
* **Galera**: CP (Consistency + Partition tolerance). Sacrifices availability during network partitions because writes must certify on all nodes.
* **Cassandra-style LWW**: AP (Availability + Partition tolerance). Sacrifices immediate consistency but always accepts writes.
* **BDR with custom handlers**: Tunable. Configure per-table conflict strategies.
When to Avoid Multi-Master
Multi-master replication is not the default for good reasons:
* **Single-region apps**: A single PostgreSQL primary with streaming replicas handles the vast majority of workloads.
* **Apps with hot rows**: If 90% of writes target 10 rows (e.g., a counter), conflicts are guaranteed.
* **Teams without operational maturity**: Multi-master requires monitoring replication lag, conflict rates, and node health.
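Hot-row contention can often be designed away before reaching for multi-master. The classic fix is a sharded counter: spread one logical counter across N rows so concurrent writers rarely collide, then sum on read. A sketch using an in-memory dict to stand in for the table (the shard count and table layout are illustrative):

```python
import random

N_SHARDS = 16

# One logical counter stored as N_SHARDS independent rows, e.g.:
#   CREATE TABLE counter_shards (name TEXT, shard INT, n BIGINT,
#                                PRIMARY KEY (name, shard));
shards = {("page_views", s): 0 for s in range(N_SHARDS)}

def increment(name: str) -> None:
    # Each writer touches a random shard, so writers rarely contend.
    shards[(name, random.randrange(N_SHARDS))] += 1

def read(name: str) -> int:
    # Equivalent to: SELECT SUM(n) FROM counter_shards WHERE name = %s
    return sum(v for (n, _), v in shards.items() if n == name)

for _ in range(1000):
    increment("page_views")
print(read("page_views"))  # 1000
```

With 16 shards, two concurrent increments collide on the same row only about 6% of the time instead of always, which is usually enough to keep a single-primary setup viable.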
Monitoring Conflicts
Whichever system you choose, monitor conflict rates:
-- BDR conflict stats
SELECT * FROM bdr.conflict_history;
-- Galera cluster status
SHOW STATUS LIKE 'wsrep_cluster_size';
SHOW STATUS LIKE 'wsrep_local_cert_failures';
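The Galera counters above are cumulative, so a monitor should convert them into a rate. A minimal sketch; the sample values and alert threshold are illustrative, and a real poller would read `wsrep_local_cert_failures` via a MySQL client:

```python
def conflict_rate(samples):
    """Conflicts per second from cumulative cert-failure samples.

    Each sample is (unix_time, cumulative_cert_failures).
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)

# Illustrative samples: 12 certification failures over 60 seconds.
samples = [(1000.0, 500), (1060.0, 512)]
rate = conflict_rate(samples)
print(f"{rate:.2f} conflicts/sec")  # 0.20 conflicts/sec
if rate > 0.1:  # threshold is workload-specific
    print("warning: rising conflict rate; check for hot rows")
```

Trending the rate rather than the raw counter makes regressions visible: a deploy that introduces a hot row shows up as a step change.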
A rising conflict rate indicates application-level contention that might be better solved by sharding the data model rather than relying on replication to resolve conflicts.
Multi-master replication is a powerful but complex tool. Start with single-master and add multi-master only when your availability or latency requirements genuinely demand it.