Index Maintenance: Bloat, Rebuild, Reindex, and Fillfactor Tuning


Index Maintenance: Bloat, Rebuild, Reindex, and Fillfactor Tuning

Indexes degrade over time. PostgreSQL's MVCC architecture creates dead index entries that waste space and slow down scans. Regular maintenance keeps indexes healthy and query performance predictable.

Index Bloat

Index bloat occurs when dead tuples leave empty space in index pages. Unlike tables, PostgreSQL does not automatically reuse index page space aggressively.

Measuring Bloat




-- Quick bloat estimation using pg_stat_user_indexes


SELECT


indexrelname AS index_name,


relname AS table_name,


idx_scan,


idx_tup_read,


idx_tup_fetch,


pg_size_pretty(pg_relation_size(indexrelid)) AS index_size


FROM pg_stat_user_indexes


WHERE schemaname = 'public'


ORDER BY pg_relation_size(indexrelid) DESC;





For detailed bloat estimation, the `pgstattuple` extension provides accurate measurements:




CREATE EXTENSION IF NOT EXISTS pgstattuple;




SELECT * FROM pgstatindex('idx_orders_user_id');


-- version, tree_level, index_size, root_block_no, internal_pages, leaf_pages,


-- empty_pages, deleted_pages, avg_leaf_density, leaf_fragmentation





Key metrics:


* `avg_leaf_density`: Below 50% indicates significant bloat.

* `leaf_fragmentation`: High fragmentation slows sequential index scans.

* `empty_pages + deleted_pages`: Pages that are allocated but useless.


External Tools

`pg_repack` rebuilds indexes without locks:




# Rebuild all indexes on a table without blocking writes


pg_repack -h localhost -d mydb --table orders -o idx_orders_user_id





REINDEX

`REINDEX` rebuilds an index from scratch, eliminating bloat and restoring optimal structure:




-- Rebuild a single index


REINDEX INDEX idx_orders_user_id;




-- Rebuild all indexes on a table


REINDEX TABLE orders;




-- Rebuild all indexes in a schema


REINDEX SCHEMA public;




-- Rebuild all indexes in a database (offline)


REINDEX DATABASE mydb;





CONCURRENTLY

`REINDEX` by default takes an `ACCESS EXCLUSIVE` lock, blocking reads and writes. The `CONCURRENTLY` option avoids this:




REINDEX INDEX CONCURRENTLY idx_orders_user_id;





Concurrent reindexing creates a new index, starts a new transaction, and drops the old index when complete. It uses more resources (CPU, I/O, temporary disk space) but does not block queries.




-- For production systems, always use CONCURRENTLY


-- But monitor for failure:


REINDEX INDEX CONCURRENTLY idx_orders_user_id;




-- If the process fails, an "invalid" index remains:


SELECT indexrelid::regclass, indisvalid


FROM pg_index


WHERE indisvalid = false;




-- Drop and retry the invalid index


DROP INDEX IF EXISTS idx_orders_user_id_ccnew;





Fillfactor Tuning

Fillfactor controls how full each index page is when it is first built. The default is 90% for B-tree indexes (10% free space per page).




-- Create index with 70% fillfactor (30% free space)


CREATE INDEX idx_orders_user_id ON orders (user_id) WITH (fillfactor = 70);




-- Alter existing index (only affects future page splits)


ALTER INDEX idx_orders_user_id SET (fillfactor = 80);





When to Adjust Fillfactor

| Workload | Recommended Fillfactor | Reason | |----------|----------------------|--------| | Read-only, bulk-loaded | 100 | No updates expected | | Standard OLTP (default) | 90 | Balance reads and writes | | Write-heavy (updates) | 70-80 | Reduce page splits | | HOT updates via fillfactor | 50-60 | Maximize HOT update ratio |

Heap-Only Tuples (HOT) optimization occurs when updated row versions fit in the same page. Lower fillfactor means more page space for HOT updates:




-- Increase HOT update ratio with lower fillfactor


CREATE TABLE frequently_updated (


id INTEGER PRIMARY KEY,


status TEXT,


counter INTEGER


) WITH (fillfactor = 70);





Monitor HOT update ratio:




SELECT relname, n_tup_upd, n_tup_hot_upd,


ROUND(n_tup_hot_upd * 100.0 / NULLIF(n_tup_upd, 0), 2) AS hot_pct


FROM pg_stat_user_tables


WHERE n_tup_upd > 0


ORDER BY hot_pct;





A low HOT ratio (below 50%) indicates many index-only updates that cause index bloat. Lowering fillfactor or adding INCLUDE columns to avoid index updates can help.

Automated Maintenance

Autovacuum Configuration

Autovacuum manages both table and index cleanup:




# postgresql.conf


autovacuum = on


autovacuum_naptime = 1min


autovacuum_vacuum_scale_factor = 0.01


autovacuum_vacuum_threshold = 1000


autovacuum_vacuum_cost_limit = 1000


autovacuum_vacuum_cost_delay = 5ms





For write-heavy tables, configure per-table autovacuum:




ALTER TABLE orders SET (


autovacuum_vacuum_scale_factor = 0.005,


autovacuum_vacuum_threshold = 1000,


autovacuum_vacuum_cost_limit = 2000


);





Scheduled REINDEX

For tables with regular bloat patterns, schedule REINDEX:




-- Using pg_cron extension


SELECT cron.schedule('reindex-orders', '0 3 * * 0',


'REINDEX INDEX CONCURRENTLY idx_orders_user_id'


);





Monitoring Index Health




-- Index size trends


SELECT


indexrelid::regclass,


pg_size_pretty(pg_relation_size(indexrelid)) AS current_size,


pg_size_pretty(pg_relation_size(indexrelid) -


pg_indexes_size(indexrelid::regclass::text::regclass)) AS estimated_bloat


FROM pg_stat_user_indexes;




-- Top 10 largest indexes


SELECT


indexrelid::regclass,


pg_size_pretty(sum(pg_relation_size(indexrelid))) AS total_size


FROM pg_stat_user_indexes


GROUP BY indexrelid


ORDER BY sum(pg_relation_size(indexrelid)) DESC


LIMIT 10;




-- Index scan frequency vs bloat


SELECT


indexrelid::regclass,


idx_scan,


idx_tup_read,


idx_tup_fetch,


pg_size_pretty(pg_relation_size(indexrelid))


FROM pg_stat_user_indexes


WHERE idx_scan = 0


ORDER BY pg_relation_size(indexrelid) DESC;





Best Practices


* **Set up bloat monitoring** with thresholds (e.g., alert when avg_leaf_density < 60%).

2\. **Use CONCURRENTLY** for all production REINDEX operations. 3\. **Tune fillfactor** per table based on update frequency. 4\. **Schedule index maintenance** during low-traffic periods using pg_cron. 5\. **Drop unused indexes** identified by `pg_stat_user_indexes` with `idx_scan = 0`. 6\. **Consider BRIN indexes** for append-only tables to avoid B-tree bloat entirely.

Index maintenance is not a one-time activity. Build monitoring and automation into your database operations, and check index health as part of your regular performance review cycle.