Index Maintenance: Bloat, Rebuild, Reindex, and Fillfactor Tuning
Index Maintenance: Bloat, Rebuild, Reindex, and Fillfactor Tuning
Indexes degrade over time. PostgreSQL's MVCC architecture creates dead index entries that waste space and slow down scans. Regular maintenance keeps indexes healthy and query performance predictable.
Index Bloat
Index bloat occurs when dead tuples leave empty space in index pages. Unlike tables, PostgreSQL does not automatically reuse index page space aggressively.
Measuring Bloat
-- Quick bloat estimation using pg_stat_user_indexes
SELECT
indexrelname AS index_name,
relname AS table_name,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
For detailed bloat estimation, the `pgstattuple` extension provides accurate measurements:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('idx_orders_user_id');
-- version, tree_level, index_size, root_block_no, internal_pages, leaf_pages,
-- empty_pages, deleted_pages, avg_leaf_density, leaf_fragmentation
Key metrics:
* `avg_leaf_density`: Below 50% indicates significant bloat.
* `leaf_fragmentation`: High fragmentation slows sequential index scans.
* `empty_pages + deleted_pages`: Pages that are allocated but useless.
External Tools
`pg_repack` rebuilds indexes without locks:
# Rebuild all indexes on a table without blocking writes
pg_repack -h localhost -d mydb --table orders -o idx_orders_user_id
REINDEX
`REINDEX` rebuilds an index from scratch, eliminating bloat and restoring optimal structure:
-- Rebuild a single index
REINDEX INDEX idx_orders_user_id;
-- Rebuild all indexes on a table
REINDEX TABLE orders;
-- Rebuild all indexes in a schema
REINDEX SCHEMA public;
-- Rebuild all indexes in a database (offline)
REINDEX DATABASE mydb;
CONCURRENTLY
`REINDEX` by default takes an `ACCESS EXCLUSIVE` lock, blocking reads and writes. The `CONCURRENTLY` option avoids this:
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
Concurrent reindexing creates a new index, starts a new transaction, and drops the old index when complete. It uses more resources (CPU, I/O, temporary disk space) but does not block queries.
-- For production systems, always use CONCURRENTLY
-- But monitor for failure:
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
-- If the process fails, an "invalid" index remains:
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE indisvalid = false;
-- Drop and retry the invalid index
DROP INDEX IF EXISTS idx_orders_user_id_ccnew;
Fillfactor Tuning
Fillfactor controls how full each index page is when it is first built. The default is 90% for B-tree indexes (10% free space per page).
-- Create index with 70% fillfactor (30% free space)
CREATE INDEX idx_orders_user_id ON orders (user_id) WITH (fillfactor = 70);
-- Alter existing index (only affects future page splits)
ALTER INDEX idx_orders_user_id SET (fillfactor = 80);
When to Adjust Fillfactor
| Workload | Recommended Fillfactor | Reason | |----------|----------------------|--------| | Read-only, bulk-loaded | 100 | No updates expected | | Standard OLTP (default) | 90 | Balance reads and writes | | Write-heavy (updates) | 70-80 | Reduce page splits | | HOT updates via fillfactor | 50-60 | Maximize HOT update ratio |
Heap-Only Tuples (HOT) optimization occurs when updated row versions fit in the same page. Lower fillfactor means more page space for HOT updates:
-- Increase HOT update ratio with lower fillfactor
CREATE TABLE frequently_updated (
id INTEGER PRIMARY KEY,
status TEXT,
counter INTEGER
) WITH (fillfactor = 70);
Monitor HOT update ratio:
SELECT relname, n_tup_upd, n_tup_hot_upd,
ROUND(n_tup_hot_upd * 100.0 / NULLIF(n_tup_upd, 0), 2) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY hot_pct;
A low HOT ratio (below 50%) indicates many index-only updates that cause index bloat. Lowering fillfactor or adding INCLUDE columns to avoid index updates can help.
Automated Maintenance
Autovacuum Configuration
Autovacuum manages both table and index cleanup:
# postgresql.conf
autovacuum = on
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_threshold = 1000
autovacuum_vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 5ms
For write-heavy tables, configure per-table autovacuum:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.005,
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_cost_limit = 2000
);
Scheduled REINDEX
For tables with regular bloat patterns, schedule REINDEX:
-- Using pg_cron extension
SELECT cron.schedule('reindex-orders', '0 3 * * 0',
'REINDEX INDEX CONCURRENTLY idx_orders_user_id'
);
Monitoring Index Health
-- Index size trends
SELECT
indexrelid::regclass,
pg_size_pretty(pg_relation_size(indexrelid)) AS current_size,
pg_size_pretty(pg_relation_size(indexrelid) -
pg_indexes_size(indexrelid::regclass::text::regclass)) AS estimated_bloat
FROM pg_stat_user_indexes;
-- Top 10 largest indexes
SELECT
indexrelid::regclass,
pg_size_pretty(sum(pg_relation_size(indexrelid))) AS total_size
FROM pg_stat_user_indexes
GROUP BY indexrelid
ORDER BY sum(pg_relation_size(indexrelid)) DESC
LIMIT 10;
-- Index scan frequency vs bloat
SELECT
indexrelid::regclass,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Best Practices
* **Set up bloat monitoring** with thresholds (e.g., alert when avg_leaf_density < 60%).
2\. **Use CONCURRENTLY** for all production REINDEX operations. 3\. **Tune fillfactor** per table based on update frequency. 4\. **Schedule index maintenance** during low-traffic periods using pg_cron. 5\. **Drop unused indexes** identified by `pg_stat_user_indexes` with `idx_scan = 0`. 6\. **Consider BRIN indexes** for append-only tables to avoid B-tree bloat entirely.
Index maintenance is not a one-time activity. Build monitoring and automation into your database operations, and check index health as part of your regular performance review cycle.