Composite Indexes: Column Order, Covering Indexes, and Partial Indexes

Composite indexes (indexes on multiple columns) are powerful but easy to misuse. The column order determines which queries benefit, and the right combination of covering and partial indexes can dramatically reduce query time.

Column Order: The Cardinality Rule

The general rule for column order in a composite B-tree index is:

**Put columns with the highest cardinality (most distinct values) first.**

-- Correct: high cardinality first

CREATE INDEX idx_user_status ON orders (user_id, status);

-- Less optimal: low cardinality first

CREATE INDEX idx_status_user ON orders (status, user_id);

Why? B-tree indexes sort by the first column, then the second, and so on. Equality conditions on the leading column prune the search space immediately. Range conditions on the leading column defeat subsequent columns.

Query Scenarios

-- Index: (user_id, status)

-- Excellent: equality on first column

SELECT * FROM orders WHERE user_id = 42; -- uses index

-- Excellent: equality on first column, equality on second

SELECT * FROM orders WHERE user_id = 42 AND status = 'paid'; -- uses index

-- Good: equality on first, range on second

SELECT * FROM orders WHERE user_id = 42 AND total > 100; -- still uses index

-- Excellent: ORDER BY on leading columns

SELECT * FROM orders WHERE user_id = 42 ORDER BY status; -- index provides sort order

-- Poor: only second column in WHERE

SELECT * FROM orders WHERE status = 'paid'; -- cannot use leading column

The "Skip Scan" Limitation

Without the first column in the WHERE clause, PostgreSQL cannot use the composite index efficiently (before PostgreSQL 16). Since PostgreSQL 16, the `enable_skip_scan` parameter allows the planner to skip through distinct values of the leading column, but it is less efficient than a dedicated index:

-- PostgreSQL 16+ can use this with skip scan, but still suboptimal

SELECT * FROM orders WHERE status = 'pending';

-- Better: CREATE INDEX idx_status ON orders (status);

Covering Indexes

A covering index contains all the columns needed by a query, enabling index-only scans. PostgreSQL calls these INCLUDE indexes:

CREATE INDEX idx_orders_covering ON orders (user_id, status) INCLUDE (total, created_at);

-- This query can be satisfied entirely from the index:

SELECT total, created_at FROM orders WHERE user_id = 42 AND status = 'paid';

-- Index Only Scan, no heap access

-- This query needs columns not in the index:

SELECT total, shipping_address FROM orders WHERE user_id = 42 AND status = 'paid';

-- Includes heap access for shipping_address

The `INCLUDE` columns are stored in the index's leaf pages but do not participate in sorting. They allow index-only scans without bloating the index's internal pages.

When to Use INCLUDE

-- Without INCLUDE: index has only the key column

CREATE INDEX idx_user_created ON orders (user_id, created_at);

-- Query triggers bitmap heap scan or fetches from heap:

SELECT user_id, created_at, total FROM orders WHERE user_id = 42;

-- With INCLUDE: total is stored in leaf pages

CREATE INDEX idx_user_created_covering ON orders (user_id, created_at) INCLUDE (total);

-- Index-only scan:

SELECT user_id, created_at, total FROM orders WHERE user_id = 42;

Key INCLUDE columns are those that:

* Are frequently selected in queries filtered by the index key columns.

* Have types that are expensive to fetch from the heap (large TEXT, JSONB).

* Significantly reduce the number of heap fetches.

Partial Indexes

Partial indexes cover only a subset of rows, making them smaller and faster to maintain:

CREATE INDEX idx_orders_active ON orders (user_id) WHERE status NOT IN ('cancelled', 'refunded');

-- This query uses the partial index:

SELECT * FROM orders WHERE user_id = 42 AND status NOT IN ('cancelled', 'refunded');

-- This query does NOT use the partial index:

SELECT * FROM orders WHERE user_id = 42 AND status = 'cancelled';

Partial indexes excel in scenarios where queries consistently target a subset of data:

-- Index for recent orders only

CREATE INDEX idx_orders_recent ON orders (user_id, created_at)

WHERE created_at > NOW() - INTERVAL '30 days';

-- Index for high-value customers

CREATE INDEX idx_orders_vip ON orders (user_id, total)

WHERE total > 1000;

-- Index that excludes nulls entirely (common pattern)

CREATE INDEX idx_users_email ON users (email) WHERE email IS NOT NULL;

Partial Unique Indexes

-- Enforce unique email only for active users

CREATE UNIQUE INDEX idx_active_email ON users (email) WHERE active = true;

-- Users can have duplicate inactive emails, but active users must have unique ones

Index-Only Scans

An index-only scan returns all needed data from the index without touching the heap table. For this to work, the visibility map must confirm that all tuples are visible to the current transaction:

-- Table with frequent updates: heap fetches still needed

EXPLAIN (ANALYZE, BUFFERS)

SELECT user_id, status FROM orders WHERE user_id = 42;

-- Look for: Heap Fetches: 0 (true index-only) or > 0 (partial)

Autovacuum maintains the visibility map. Run `VACUUM` aggressively on tables where index-only scans are critical:

ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);

Practical Examples

Multi-Column Filter and Sort

-- Query: find recent paid orders for a user, sorted by date

SELECT total, created_at

FROM orders

WHERE user_id = 42 AND status = 'paid' AND created_at > '2026-01-01'

ORDER BY created_at DESC;

-- Optimal index:

CREATE INDEX idx_orders_lookup ON orders (user_id, status, created_at DESC) INCLUDE (total);

Reporting Query

-- Query: aggregate orders by status for a date range

SELECT status, COUNT(*), SUM(total)

FROM orders

WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'

GROUP BY status;

-- Index: if created_at is monotonically increasing, BRIN might work

CREATE INDEX idx_orders_date_brin ON orders USING BRIN (created_at);

-- Or a regular B-tree for range scans

CREATE INDEX idx_orders_date ON orders (created_at, status) INCLUDE (total);

Maintenance

Composite indexes can become bloated. Monitor their efficiency:

-- Check index usage

SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch

FROM pg_stat_user_indexes

WHERE idx_scan > 0

ORDER BY idx_scan;

-- Find unused indexes

SELECT schemaname, tablename, indexname, idx_scan

FROM pg_stat_user_indexes

WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey';

A well-designed composite index can eliminate the need for multiple single-column indexes. Audit your indexes regularly: remove duplicates, add covering columns, and trim unused partial indexes. Each index adds write overhead, so the total set should serve your query patterns without redundancy.

Composite Indexes: Column Order, Covering Indexes, and Partial Indexes

Composite Indexes: Column Order, Covering Indexes, and Partial Indexes

Related Articles