Distributed Databases: Concepts and Implementation


Distributed Database Architecture

Distributed databases store data across multiple nodes, presenting a unified interface. They provide scalability and fault tolerance.

CAP Theorem

A distributed system can provide at most two of: Consistency, Availability, Partition Tolerance. Since partitions are inevitable, systems choose CP or AP.

Consensus Algorithms

Raft

Raft is a consensus algorithm designed for understandability:




class RaftNode:


def __init__(self):


self.state = "FOLLOWER"


self.current_term = 0




def start_election(self):


self.state = "CANDIDATE"


votes = 1


for peer in self.peers:


if peer.request_vote(self.current_term):


votes += 1


if votes > len(self.peers) // 2:


self.state = "LEADER"





Raft powers etcd, Consul, and MongoDB replication.

Paxos

Paxos is the original consensus algorithm. It is correct but difficult to understand. Used in Google Spanner and Chubby.

Gossip Protocol

Nodes periodically exchange state with random peers. Information spreads in O(log N) rounds. Used in Cassandra for membership detection and failure detection.

Dynamo-Style Architecture

Amazon DynamoDB and Cassandra prioritize availability:




-- Cassandra: tunable consistency


INSERT INTO users (user_id, name) VALUES ('u1', 'Alice')


USING CONSISTENCY QUORUM;




SELECT * FROM users WHERE user_id = 'u1'


USING CONSISTENCY ONE;





Spanner-Style Architecture

Google Spanner provides strong consistency globally using TrueTime (GPS + atomic clocks) for external consistency.

Conclusion

Choose Dynamo-style (Cassandra, DynamoDB) for availability and tunable consistency. Choose Spanner-style (Spanner, CockroachDB) for strong consistency. Choose Raft-based systems (etcd) for coordination. Consider your consistency requirements carefully.