Columnar Databases: When and How to Use Them
Columnar Storage
Columnar databases store data by column rather than by row. This enables aggressive compression and fast analytical queries.
Row vs Columnar
Row-oriented storage writes each row contiguously. Columnar stores all values of a column together, enabling efficient scans of a few columns across millions of rows.
-- ClickHouse: columnar query
SELECT region, SUM(revenue)
FROM sales
WHERE year = 2026
GROUP BY region;
-- Only reads 3 columns instead of all columns
Compression Techniques
Columnar databases use specialized compression: run-length encoding for low-cardinality columns, dictionary encoding, and delta encoding for sorted columns. Compression ratios of 5-10x are common.
When to Use Columnar
| Database | Best For | Notable Feature | |----------|----------|-----------------| | ClickHouse | Real-time analytics | MergeTree engine | | DuckDB | Embedded analytics | In-process OLAP | | Redshift | Cloud data warehousing | MPP architecture |
-- ClickHouse table
CREATE TABLE events (
timestamp DateTime,
event_type String,
user_id UInt32
) ENGINE = MergeTree()
ORDER BY (event_type, timestamp);
Conclusion
Use columnar databases for analytical workloads scanning many rows but few columns. ClickHouse for real-time, DuckDB for embedded, Redshift for cloud. Avoid for transactional workloads with frequent single-row operations.