Database Compression: Page-Level, Tuple-Level, Columnar, and TOAST

Database compression reduces storage footprint and, more importantly, improves query performance by reducing the amount of data read from disk. This article covers compression techniques in PostgreSQL and in columnar systems such as ClickHouse.

Why Compression Matters

Compression provides three benefits:


* **Storage savings**: Reduce disk costs by 2x-10x depending on data characteristics.

* **I/O reduction**: Fewer pages read per query means faster scans.

* **Cache efficiency**: More data fits in shared_buffers or the OS page cache.


The trade-off is CPU usage for compression and decompression. Modern hardware makes this trade-off favorable for most workloads.
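The trade-off can be measured directly. The sketch below uses Python's stdlib `zlib` as a stand-in for pglz/lz4/zstd (which need third-party bindings): higher compression levels buy ratio with CPU time, which is the same curve a database tuner navigates.

```python
import time
import zlib

# Illustrative sketch: the CPU-vs-ratio trade-off, with stdlib zlib standing
# in for the database codecs (pglz, lz4, zstd). Repetitive log-like payload.
payload = b"2024-01-15 INFO request served path=/api/v1/items status=200\n" * 20000

for level in (1, 6, 9):  # fast .. default .. best ratio
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    ratio = len(payload) / len(compressed)
    print(f"level={level} ratio={ratio:.1f}x time={elapsed * 1000:.1f}ms")
```

On repetitive data like logs, even the fastest level achieves a large ratio, which is why lightweight codecs (pglz, LZ4) are the default choice inside databases.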

PostgreSQL Compression Layers

Page-Level Compression

PostgreSQL stores data in 8 KB pages. Core PostgreSQL does not, however, compress heap pages as a unit: transparent page-level compression lives in extensions and forks (for example, TimescaleDB's compressed chunks or Greenplum's append-optimized tables). What core PostgreSQL does compress at the page level is the full-page images written to the write-ahead log, controlled by `wal_compression` (`pglz` built in; `lz4` and `zstd` since PostgreSQL 15):

```sql
-- Compress full-page images written to the WAL (PostgreSQL 15+)
ALTER SYSTEM SET wal_compression = 'zstd';
SELECT pg_reload_conf();

-- Older releases only toggle the built-in pglz algorithm
ALTER SYSTEM SET wal_compression = 'on';
```

WAL compression works transparently: full-page images are compressed when written and decompressed during recovery and replication, and the CPU overhead is usually small relative to the I/O saved.

TOAST (The Oversized-Attribute Storage Technique)

TOAST is PostgreSQL's built-in mechanism for handling large field values. When a row would exceed roughly 2 KB (`TOAST_TUPLE_THRESHOLD`, a quarter of the 8 KB page by default), PostgreSQL first tries to compress wide values and, if the row is still too large, moves them to a secondary TOAST table:




```sql
-- Check TOAST storage strategy per column
SELECT attname,
       CASE attstorage
           WHEN 'p' THEN 'plain'
           WHEN 'm' THEN 'main'
           WHEN 'x' THEN 'extended'
           WHEN 'e' THEN 'external'
       END AS storage_type
FROM pg_attribute
WHERE attrelid = 'documents'::regclass
  AND attnum > 0;
```





Storage types:


* `PLAIN`: No compression, no TOAST. For fixed-width types like `INTEGER`.

* `EXTENDED` (default for `TEXT`, `BYTEA`): Try compression first; if still too large, move to TOAST.

* `EXTERNAL`: Move to TOAST without compression. Useful for data that is already compressed (e.g., JPEG images or gzip archives).

* `MAIN`: Try compression but keep in main table if possible.





```sql
-- Change storage strategy for a column
ALTER TABLE documents ALTER COLUMN image SET STORAGE EXTERNAL;
```





TOAST compression uses a fast, lightweight algorithm: pglz by default, or lz4 when the column (or the `default_toast_compression` setting) selects it. It is invisible to queries: SELECT statements decompress transparently.
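The decision flow above can be sketched in a few lines. This is a simplified model, not PostgreSQL's implementation: the threshold mirrors `TOAST_TUPLE_THRESHOLD` (~2000 bytes), `zlib` stands in for pglz/lz4, and real TOAST operates on whole tuples, not single values.

```python
import zlib

# Simplified sketch of the TOAST decision flow for one wide value.
TOAST_THRESHOLD = 2000  # mirrors TOAST_TUPLE_THRESHOLD's default

def toast_value(value: bytes, storage: str = "extended"):
    """Return (location, payload) after applying a TOAST-like strategy."""
    if storage == "plain" or len(value) <= TOAST_THRESHOLD:
        return ("inline", value)                  # PLAIN never compresses or moves
    if storage in ("extended", "main"):
        value = zlib.compress(value)              # try compression first
        if len(value) <= TOAST_THRESHOLD:
            return ("inline-compressed", value)
    if storage == "main":
        return ("inline-compressed", value)       # MAIN resists moving out of line
    return ("toast-table", value)                 # EXTENDED/EXTERNAL: move out of line

print(toast_value(b"x" * 100)[0])                 # small value: stays inline
print(toast_value(b"log line\n" * 5000)[0])       # compressible: inline-compressed
print(toast_value(b"data" * 1000, "external")[0]) # EXTERNAL: out of line, uncompressed
```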

Tuple-Level Compression

Tuple-level compression compresses individual row values. PostgreSQL's built-in `COMPRESSION` clause (PostgreSQL 14+) allows per-column compression:




```sql
CREATE TABLE events (
    id         BIGSERIAL,
    event_type TEXT  COMPRESSION lz4,
    payload    JSONB COMPRESSION lz4,
    created_at TIMESTAMPTZ
);
```





Supported methods: `pglz` (built in, the default) and `lz4` (requires a server built with `--with-lz4`, which most packaged builds enable). LZ4 compresses and decompresses much faster than pglz at a similar ratio; core PostgreSQL does not accept `zstd` for column compression, though it offers zstd for WAL and backup compression.

Measuring Compression with pgstattuple

The `pgstattuple` extension reports tuple-level space usage, a baseline for measuring compression benefits:




```sql
CREATE EXTENSION pgstattuple;

SELECT * FROM pgstattuple('large_table');
-- Returns tuple_count, tuple_len, dead_tuple_count, dead_tuple_len, free_space, ...
```





Columnar Compression

Columnar databases like ClickHouse, Redshift, and Parquet-format files compress extremely well because same-typed values repeat within a column:




```sql
-- ClickHouse example: columnar compression is automatic
CREATE TABLE events (
    event_type LowCardinality(String),
    user_id    UInt32,
    timestamp  DateTime,
    payload    String
) ENGINE = MergeTree()
ORDER BY timestamp;

-- Each column is compressed independently:
--   event_type: dictionary encoding (LowCardinality), then the block codec
--   user_id:    can add delta encoding, e.g. CODEC(Delta, ZSTD)
--   payload:    LZ4 by default; ZSTD if configured
```





Columnar compression ratios are often 5x-10x higher than row-oriented compression because similar values cluster together.
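The effect is easy to demonstrate. The sketch below dictionary-encodes a low-cardinality column and then run-length encodes the codes, loosely the pipeline that ClickHouse's `LowCardinality(String)` enables before the block codec runs; the column names and values are made up for illustration.

```python
from itertools import groupby

# Sketch: why columnar layouts compress well. A low-cardinality column of
# 2000 strings collapses to a 3-entry dictionary plus 3 run-length pairs.
column = ["click"] * 500 + ["view"] * 1400 + ["purchase"] * 100

dictionary = {v: i for i, v in enumerate(dict.fromkeys(column))}
codes = [dictionary[v] for v in column]                  # one small int per row
rle = [(code, len(list(run))) for code, run in groupby(codes)]

print(dictionary)   # {'click': 0, 'view': 1, 'purchase': 2}
print(rle)          # [(0, 500), (1, 1400), (2, 100)]
```

In a row store these 2000 strings would be interleaved with other columns' bytes, so neither the dictionary nor the runs would form; clustering same-typed values together is what unlocks the 5x-10x gap.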

Compression Algorithm Comparison

| Algorithm | Speed | Ratio | CPU Use | PostgreSQL Support |
|-----------|-------|-------|---------|--------------------|
| pglz | Fast | Low | Low | Built-in (default) |
| LZ4 | Very fast | Low-Medium | Very low | PG 14+ (columns, WAL) |
| Zstd | Moderate | High | Moderate | PG 15+ (WAL, backups) |
| zlib | Slow | High | High | Extension |

Typical benchmarks show LZ4 compressing at 2-3 GB/s per core, while Zstd achieves 15-30% better ratios at around 500 MB/s.

Real-World Storage Savings

| Data Type | Raw Size | pglz | zstd | Notes |
|-----------|----------|------|------|-------|
| Log text | 100 GB | 35 GB | 22 GB | Highly compressible |
| JSON payloads | 50 GB | 42 GB | 30 GB | Semi-structured |
| UUIDs | 10 GB | 10 GB | 10 GB | Incompressible |
| Numeric arrays | 20 GB | 8 GB | 5 GB | Delta encoding helps |
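The "delta encoding helps" note for numeric arrays can be shown with stdlib tools: storing differences between neighbors turns a monotonic sequence into tiny, repetitive values that a general-purpose codec (here `zlib`, standing in for zstd) compresses far better.

```python
import zlib

# Sketch: delta encoding before general-purpose compression.
values = list(range(1_000_000, 1_050_000, 5))   # monotonically increasing IDs

raw = b"".join(v.to_bytes(8, "little") for v in values)
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
delta_bytes = b"".join(d.to_bytes(8, "little", signed=True) for d in deltas)

print(f"raw:   {len(zlib.compress(raw))} bytes compressed")
print(f"delta: {len(zlib.compress(delta_bytes))} bytes compressed")
```

Because every delta is the constant 5, the delta stream is almost pure repetition; this is the same idea behind ClickHouse's `CODEC(Delta, ZSTD)` and Parquet's delta encodings.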

Monitoring Compression




```sql
-- Table size breakdown
SELECT schemaname, relname,
       pg_size_pretty(pg_table_size(relid))          AS table_size,
       pg_size_pretty(pg_indexes_size(relid))        AS index_size,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
```





For TOAST-specific sizes:




```sql
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS toast_size
FROM pg_class
WHERE oid = (
    SELECT reltoastrelid FROM pg_class WHERE relname = 'large_table'
);
```





Best Practices


1. **Prefer LZ4 for per-column compression** of large TEXT/JSONB columns; it decompresses much faster than pglz at a similar ratio. Reserve Zstd for the places core PostgreSQL accepts it (WAL, backups).
2. **Use EXTERNAL storage** for columns that are already compressed (images, compressed archives).
3. **Benchmark your workload**: compression benefits vary. Run `pgstattuple` before and after.
4. **Consider BRIN indexes** on naturally ordered columns (timestamps, sequence IDs) for additional space savings.
5. **Balance CPU and I/O**: if your database is CPU-bound, avoid heavy compression; if I/O-bound, compress aggressively.

Compression is a rare optimization that reduces cost and improves performance simultaneously. Measure your data's compressibility and choose the right algorithm for each column type.