Change Data Capture: Tracking Database Changes in Real-Time
Change Data Capture (CDC) tracks row-level changes in a database and streams them to other systems. CDC captures inserts, updates, and deletes without application-level instrumentation. It is the foundation for event-driven architectures and real-time data pipelines.
CDC Methods
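Log-based CDC reads the database transaction log (the WAL in PostgreSQL, the binlog in MySQL). Because every committed write is already recorded there, it captures inserts, updates, and deletes in commit order while adding little load to the database. Log-based CDC is generally the preferred method: it requires no schema changes and adds no work to the write path, although it usually needs some server configuration, such as logical replication in PostgreSQL or row-based binary logging in MySQL.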
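The idea of reading a log from a saved position can be illustrated with a toy model. This is a minimal sketch, not a real WAL or binlog reader: the LogRecord and LogReader names are hypothetical, and the log is simply an in-memory list ordered by a sequence number.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int      # log sequence number: the record's position in the log
    op: str       # "insert", "update", or "delete"
    table: str
    row: dict


class LogReader:
    """Hypothetical reader that returns records past a confirmed position."""

    def __init__(self, log, confirmed_lsn=0):
        self.log = log
        self.confirmed_lsn = confirmed_lsn

    def read_new(self):
        # Return only records newer than the last confirmed position,
        # then advance the position to the last record returned.
        new = [r for r in self.log if r.lsn > self.confirmed_lsn]
        if new:
            self.confirmed_lsn = new[-1].lsn
        return new


# Toy stand-in for a transaction log.
log = [
    LogRecord(1, "insert", "users", {"id": 1, "name": "Ada"}),
    LogRecord(2, "update", "users", {"id": 1, "name": "Ada L."}),
]
reader = LogReader(log)
first_batch = reader.read_new()

# A later write lands in the log; a restarted reader resumes from the
# saved position instead of re-reading everything.
log.append(LogRecord(3, "delete", "users", {"id": 1}))
resumed = LogReader(log, confirmed_lsn=reader.confirmed_lsn)
second_batch = resumed.read_new()
```

The saved position is what lets a log-based consumer restart without missing or rescanning changes. Real tools persist an equivalent offset durably rather than keeping it in memory.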
Trigger-based CDC uses database triggers to capture changes. It provides more control over what is captured but adds overhead to every write. Trigger-based CDC is suitable when log-based capture is not available.
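As a minimal sketch of the trigger approach, the following uses SQLite through Python's standard sqlite3 module. The customers table, the customers_changes audit table, and the trigger names are all assumptions made for illustration. Other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);

-- Hypothetical change table that the triggers write to.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,      -- 'I', 'U', or 'D'
    row_id    INTEGER NOT NULL,
    old_email TEXT,
    new_email TEXT
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, old_email, new_email)
    VALUES ('I', NEW.id, NULL, NEW.email);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, old_email, new_email)
    VALUES ('U', NEW.id, OLD.email, NEW.email);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, old_email, new_email)
    VALUES ('D', OLD.id, OLD.email, NULL);
END;
""")

# Ordinary writes; the application code knows nothing about the triggers.
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("UPDATE customers SET email = 'b@example.com' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

# A downstream consumer would read the change table in change_id order.
changes = conn.execute(
    "SELECT op, row_id, old_email, new_email "
    "FROM customers_changes ORDER BY change_id"
).fetchall()
```

Each write to customers now performs a second insert into customers_changes inside the same transaction, which is the source of the per-write overhead.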
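Polling-based CDC periodically queries tables for changes using timestamp or version columns. It is the simplest to implement, but latency is bounded by the polling interval and each poll adds query load to the database. It also cannot see hard deletes, and it misses intermediate states when a row changes more than once between polls. Polling is suitable for low-frequency synchronization.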
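A minimal polling sketch, again using SQLite through Python's sqlite3 module. The orders table, its version column, and the poll_changes helper are hypothetical. The sketch assumes the application assigns a strictly increasing version on every write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)


def poll_changes(conn, last_version):
    """Return rows changed since last_version and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, version FROM orders "
        "WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    new_high = rows[-1][2] if rows else last_version
    return rows, new_high


# Two inserts, then the first poll.
conn.execute("INSERT INTO orders VALUES (1, 'new', 1)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 2)")
first, watermark = poll_changes(conn, 0)

# An update bumps the version, so the second poll picks it up.
conn.execute("UPDATE orders SET status = 'shipped', version = 3 WHERE id = 1")
second, watermark = poll_changes(conn, watermark)

# A hard delete leaves no row behind for the query to find.
conn.execute("DELETE FROM orders WHERE id = 2")
third, watermark = poll_changes(conn, watermark)
```

The last poll returns no rows even though order 2 was deleted. This is why polling setups often rely on soft deletes, such as a deleted flag, instead of removing rows.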
Tools
Debezium is a widely used open-source CDC platform. It reads database transaction logs and streams the changes to Apache Kafka, typically running as a set of Kafka Connect connectors. Debezium provides connectors for PostgreSQL, MySQL, MongoDB, SQL Server, and Oracle, among others.
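Debezium change events carry an envelope with before, after, and op fields, where op is "c" for create, "u" for update, "d" for delete, and "r" for a snapshot read. The sketch below shows how a consumer might apply such events to an in-memory replica. The apply_change helper and the sample messages are hypothetical. It assumes the envelope arrives as plain JSON without the schema wrapper and that each row has an id primary key. Reading from Kafka is left out.

```python
import json


def apply_change(replica, event):
    """Apply one Debezium-style change event to a dict keyed by row id."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates carry the new row in "after".
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the old row (at least its key) in "before".
        replica.pop(event["before"]["id"], None)
    return replica


# Hand-written sample messages, not captured from a real connector.
messages = [
    '{"before": null, "after": {"id": 1, "name": "Ada"}, "op": "c"}',
    '{"before": null, "after": {"id": 2, "name": "Bob"}, "op": "c"}',
    '{"before": {"id": 2, "name": "Bob"},'
    ' "after": {"id": 2, "name": "Robert"}, "op": "u"}',
    '{"before": {"id": 1, "name": "Ada"}, "after": null, "op": "d"}',
]

replica = {}
for message in messages:
    apply_change(replica, json.loads(message))
```

The same dispatch on op works for a cache or a search index: writes become upserts and deletes become evictions.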
Use Cases
CDC supports data warehouse synchronization, cache invalidation, search index updates, event streams for microservices, and real-time analytics. It reduces coupling between operational and analytical systems.