Composite Indexes: Column Order, Covering Indexes, and Partial Indexes


Composite Indexes: Column Order, Covering Indexes, and Partial Indexes

Composite indexes (indexes on multiple columns) are powerful but easy to misuse. The column order determines which queries benefit, and the right combination of covering and partial indexes can dramatically reduce query time.

Column Order: The Cardinality Rule

The general rule for column order in a composite B-tree index is:

**Put columns with the highest cardinality (most distinct values) first.**




-- Correct: high cardinality first


CREATE INDEX idx_user_status ON orders (user_id, status);




-- Less optimal: low cardinality first


CREATE INDEX idx_status_user ON orders (status, user_id);





Why? B-tree indexes sort by the first column, then the second, and so on. Equality conditions on the leading column prune the search space immediately. Range conditions on the leading column defeat subsequent columns.

Query Scenarios




-- Index: (user_id, status)




-- Excellent: equality on first column


SELECT * FROM orders WHERE user_id = 42; -- uses index




-- Excellent: equality on first column, equality on second


SELECT * FROM orders WHERE user_id = 42 AND status = 'paid'; -- uses index




-- Good: equality on first, range on second


SELECT * FROM orders WHERE user_id = 42 AND total > 100; -- still uses index




-- Excellent: ORDER BY on leading columns


SELECT * FROM orders WHERE user_id = 42 ORDER BY status; -- index provides sort order




-- Poor: only second column in WHERE


SELECT * FROM orders WHERE status = 'paid'; -- cannot use leading column





The "Skip Scan" Limitation

Without the first column in the WHERE clause, PostgreSQL cannot use the composite index efficiently (before PostgreSQL 16). Since PostgreSQL 16, the `enable_skip_scan` parameter allows the planner to skip through distinct values of the leading column, but it is less efficient than a dedicated index:




-- PostgreSQL 16+ can use this with skip scan, but still suboptimal


SELECT * FROM orders WHERE status = 'pending';


-- Better: CREATE INDEX idx_status ON orders (status);





Covering Indexes

A covering index contains all the columns needed by a query, enabling index-only scans. PostgreSQL calls these INCLUDE indexes:




CREATE INDEX idx_orders_covering ON orders (user_id, status) INCLUDE (total, created_at);




-- This query can be satisfied entirely from the index:


SELECT total, created_at FROM orders WHERE user_id = 42 AND status = 'paid';


-- Index Only Scan, no heap access




-- This query needs columns not in the index:


SELECT total, shipping_address FROM orders WHERE user_id = 42 AND status = 'paid';


-- Includes heap access for shipping_address





The `INCLUDE` columns are stored in the index's leaf pages but do not participate in sorting. They allow index-only scans without bloating the index's internal pages.

When to Use INCLUDE




-- Without INCLUDE: index has only the key column


CREATE INDEX idx_user_created ON orders (user_id, created_at);




-- Query triggers bitmap heap scan or fetches from heap:


SELECT user_id, created_at, total FROM orders WHERE user_id = 42;




-- With INCLUDE: total is stored in leaf pages


CREATE INDEX idx_user_created_covering ON orders (user_id, created_at) INCLUDE (total);




-- Index-only scan:


SELECT user_id, created_at, total FROM orders WHERE user_id = 42;





Key INCLUDE columns are those that:


* Are frequently selected in queries filtered by the index key columns.

* Have types that are expensive to fetch from the heap (large TEXT, JSONB).

* Significantly reduce the number of heap fetches.


Partial Indexes

Partial indexes cover only a subset of rows, making them smaller and faster to maintain:




CREATE INDEX idx_orders_active ON orders (user_id) WHERE status NOT IN ('cancelled', 'refunded');




-- This query uses the partial index:


SELECT * FROM orders WHERE user_id = 42 AND status NOT IN ('cancelled', 'refunded');




-- This query does NOT use the partial index:


SELECT * FROM orders WHERE user_id = 42 AND status = 'cancelled';





Partial indexes excel in scenarios where queries consistently target a subset of data:




-- Index for recent orders only


CREATE INDEX idx_orders_recent ON orders (user_id, created_at)


WHERE created_at > NOW() - INTERVAL '30 days';




-- Index for high-value customers


CREATE INDEX idx_orders_vip ON orders (user_id, total)


WHERE total > 1000;




-- Index that excludes nulls entirely (common pattern)


CREATE INDEX idx_users_email ON users (email) WHERE email IS NOT NULL;





Partial Unique Indexes




-- Enforce unique email only for active users


CREATE UNIQUE INDEX idx_active_email ON users (email) WHERE active = true;




-- Users can have duplicate inactive emails, but active users must have unique ones





Index-Only Scans

An index-only scan returns all needed data from the index without touching the heap table. For this to work, the visibility map must confirm that all tuples are visible to the current transaction:




-- Table with frequent updates: heap fetches still needed


EXPLAIN (ANALYZE, BUFFERS)


SELECT user_id, status FROM orders WHERE user_id = 42;


-- Look for: Heap Fetches: 0 (true index-only) or > 0 (partial)





Autovacuum maintains the visibility map. Run `VACUUM` aggressively on tables where index-only scans are critical:




ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);





Practical Examples

Multi-Column Filter and Sort




-- Query: find recent paid orders for a user, sorted by date


SELECT total, created_at


FROM orders


WHERE user_id = 42 AND status = 'paid' AND created_at > '2026-01-01'


ORDER BY created_at DESC;




-- Optimal index:


CREATE INDEX idx_orders_lookup ON orders (user_id, status, created_at DESC) INCLUDE (total);





Reporting Query




-- Query: aggregate orders by status for a date range


SELECT status, COUNT(*), SUM(total)


FROM orders


WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'


GROUP BY status;




-- Index: if created_at is monotonically increasing, BRIN might work


CREATE INDEX idx_orders_date_brin ON orders USING BRIN (created_at);




-- Or a regular B-tree for range scans


CREATE INDEX idx_orders_date ON orders (created_at, status) INCLUDE (total);





Maintenance

Composite indexes can become bloated. Monitor their efficiency:




-- Check index usage


SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch


FROM pg_stat_user_indexes


WHERE idx_scan > 0


ORDER BY idx_scan;




-- Find unused indexes


SELECT schemaname, tablename, indexname, idx_scan


FROM pg_stat_user_indexes


WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey';





A well-designed composite index can eliminate the need for multiple single-column indexes. Audit your indexes regularly: remove duplicates, add covering columns, and trim unused partial indexes. Each index adds write overhead, so the total set should serve your query patterns without redundancy.