Composite Indexes: Column Order, Covering Indexes, and Partial Indexes
Composite Indexes: Column Order, Covering Indexes, and Partial Indexes
Composite indexes (indexes on multiple columns) are powerful but easy to misuse. The column order determines which queries benefit, and the right combination of covering and partial indexes can dramatically reduce query time.
Column Order: The Cardinality Rule
The general rule for column order in a composite B-tree index is:
**Put columns with the highest cardinality (most distinct values) first.**
-- Correct: high cardinality first
CREATE INDEX idx_user_status ON orders (user_id, status);
-- Less optimal: low cardinality first
CREATE INDEX idx_status_user ON orders (status, user_id);
Why? B-tree indexes sort by the first column, then the second, and so on. Equality conditions on the leading column prune the search space immediately. Range conditions on the leading column defeat subsequent columns.
Query Scenarios
-- Index: (user_id, status)
-- Excellent: equality on first column
SELECT * FROM orders WHERE user_id = 42; -- uses index
-- Excellent: equality on first column, equality on second
SELECT * FROM orders WHERE user_id = 42 AND status = 'paid'; -- uses index
-- Good: equality on first, range on second
SELECT * FROM orders WHERE user_id = 42 AND total > 100; -- still uses index
-- Excellent: ORDER BY on leading columns
SELECT * FROM orders WHERE user_id = 42 ORDER BY status; -- index provides sort order
-- Poor: only second column in WHERE
SELECT * FROM orders WHERE status = 'paid'; -- cannot use leading column
The "Skip Scan" Limitation
Without the first column in the WHERE clause, PostgreSQL cannot use the composite index efficiently (before PostgreSQL 16). Since PostgreSQL 16, the `enable_skip_scan` parameter allows the planner to skip through distinct values of the leading column, but it is less efficient than a dedicated index:
-- PostgreSQL 16+ can use this with skip scan, but still suboptimal
SELECT * FROM orders WHERE status = 'pending';
-- Better: CREATE INDEX idx_status ON orders (status);
Covering Indexes
A covering index contains all the columns needed by a query, enabling index-only scans. PostgreSQL calls these INCLUDE indexes:
CREATE INDEX idx_orders_covering ON orders (user_id, status) INCLUDE (total, created_at);
-- This query can be satisfied entirely from the index:
SELECT total, created_at FROM orders WHERE user_id = 42 AND status = 'paid';
-- Index Only Scan, no heap access
-- This query needs columns not in the index:
SELECT total, shipping_address FROM orders WHERE user_id = 42 AND status = 'paid';
-- Includes heap access for shipping_address
The `INCLUDE` columns are stored in the index's leaf pages but do not participate in sorting. They allow index-only scans without bloating the index's internal pages.
When to Use INCLUDE
-- Without INCLUDE: index has only the key column
CREATE INDEX idx_user_created ON orders (user_id, created_at);
-- Query triggers bitmap heap scan or fetches from heap:
SELECT user_id, created_at, total FROM orders WHERE user_id = 42;
-- With INCLUDE: total is stored in leaf pages
CREATE INDEX idx_user_created_covering ON orders (user_id, created_at) INCLUDE (total);
-- Index-only scan:
SELECT user_id, created_at, total FROM orders WHERE user_id = 42;
Key INCLUDE columns are those that:
* Are frequently selected in queries filtered by the index key columns.
* Have types that are expensive to fetch from the heap (large TEXT, JSONB).
* Significantly reduce the number of heap fetches.
Partial Indexes
Partial indexes cover only a subset of rows, making them smaller and faster to maintain:
CREATE INDEX idx_orders_active ON orders (user_id) WHERE status NOT IN ('cancelled', 'refunded');
-- This query uses the partial index:
SELECT * FROM orders WHERE user_id = 42 AND status NOT IN ('cancelled', 'refunded');
-- This query does NOT use the partial index:
SELECT * FROM orders WHERE user_id = 42 AND status = 'cancelled';
Partial indexes excel in scenarios where queries consistently target a subset of data:
-- Index for recent orders only
CREATE INDEX idx_orders_recent ON orders (user_id, created_at)
WHERE created_at > NOW() - INTERVAL '30 days';
-- Index for high-value customers
CREATE INDEX idx_orders_vip ON orders (user_id, total)
WHERE total > 1000;
-- Index that excludes nulls entirely (common pattern)
CREATE INDEX idx_users_email ON users (email) WHERE email IS NOT NULL;
Partial Unique Indexes
-- Enforce unique email only for active users
CREATE UNIQUE INDEX idx_active_email ON users (email) WHERE active = true;
-- Users can have duplicate inactive emails, but active users must have unique ones
Index-Only Scans
An index-only scan returns all needed data from the index without touching the heap table. For this to work, the visibility map must confirm that all tuples are visible to the current transaction:
-- Table with frequent updates: heap fetches still needed
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, status FROM orders WHERE user_id = 42;
-- Look for: Heap Fetches: 0 (true index-only) or > 0 (partial)
Autovacuum maintains the visibility map. Run `VACUUM` aggressively on tables where index-only scans are critical:
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);
Practical Examples
Multi-Column Filter and Sort
-- Query: find recent paid orders for a user, sorted by date
SELECT total, created_at
FROM orders
WHERE user_id = 42 AND status = 'paid' AND created_at > '2026-01-01'
ORDER BY created_at DESC;
-- Optimal index:
CREATE INDEX idx_orders_lookup ON orders (user_id, status, created_at DESC) INCLUDE (total);
Reporting Query
-- Query: aggregate orders by status for a date range
SELECT status, COUNT(*), SUM(total)
FROM orders
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY status;
-- Index: if created_at is monotonically increasing, BRIN might work
CREATE INDEX idx_orders_date_brin ON orders USING BRIN (created_at);
-- Or a regular B-tree for range scans
CREATE INDEX idx_orders_date ON orders (created_at, status) INCLUDE (total);
Maintenance
Composite indexes can become bloated. Monitor their efficiency:
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan > 0
ORDER BY idx_scan;
-- Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey';
A well-designed composite index can eliminate the need for multiple single-column indexes. Audit your indexes regularly: remove duplicates, add covering columns, and trim unused partial indexes. Each index adds write overhead, so the total set should serve your query patterns without redundancy.