SQL Query Optimization


Why Query Optimization Matters

A poorly written query can take seconds or minutes instead of milliseconds, consuming database resources and degrading the experience for all users. Optimizing queries is often the highest-ROI performance improvement you can make because it requires no infrastructure changes.

Understanding EXPLAIN

Every database has a query planner that decides how to execute your SQL. EXPLAIN shows you the plan.




EXPLAIN ANALYZE


SELECT u.name, COUNT(o.id) AS order_count


FROM users u


LEFT JOIN orders o ON u.id = o.user_id


WHERE u.created_at > '2026-01-01'


GROUP BY u.id, u.name


HAVING COUNT(o.id) > 5;





Output analysis:




Hash Join (cost=125.4..892.1 rows=342 width=72)


Hash Cond: (u.id = o.user_id)


-> Seq Scan on users u (cost=0.0..423.1 rows=1250 width=36)


Filter: (created_at > '2026-01-01')


-> Hash (cost=89.2..89.2 rows=2892 width=8)


-> Seq Scan on orders o (cost=0.0..89.2 rows=2892 width=8)





Key metrics to watch:

| Metric | What It Means | Good | Bad | |--------|---------------|------|-----| | `cost` | Estimated cost (arbitrary units) | Low | High | | `rows` | Estimated rows returned | Accurate | Off by 10x+ | | `actual time` | Actual execution time | ms | seconds | | `Seq Scan` | Full table scan | Small tables | Large tables | | `Sort` | Explicit sort operation | Sorted data | Large sorts to disk |

Indexing for Query Performance

Covering Indexes

An index that contains all columns needed by a query, enabling index-only scans:




-- Query needs: id, email, status, created_at


-- Instead of:


CREATE INDEX idx_users_status ON users(status);




-- Create a covering index:


CREATE INDEX idx_users_status_covering


ON users(status) INCLUDE (email, created_at);




-- Now this query uses an index-only scan:


SELECT email, created_at


FROM users


WHERE status = 'active'


ORDER BY created_at DESC;





JOIN Optimization

Nested Loop vs Hash Join vs Merge Join

| Join Type | When It Is Used | Best For | |-----------|-----------------|----------| | Nested Loop | One table is small, other is indexed | Small result sets | | Hash Join | No useful index, medium tables | Unindexed joins | | Merge Join | Both tables sorted on join key | Large sorted datasets |

Optimizing JOINs




-- INEFFICIENT: Joining without indexes


SELECT *


FROM orders o


JOIN customers c ON o.customer_id = c.id


WHERE c.country = 'US';




-- Add index on the join column


CREATE INDEX idx_orders_customer ON orders(customer_id);


CREATE INDEX idx_customers_country ON customers(country);




-- Now the planner can use indexes effectively





Subquery vs CTE vs JOIN




-- Option 1: Subquery


SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count


FROM users;




-- Option 2: Lateral join (often fastest)


SELECT u.name, sq.order_count


FROM users u


LEFT JOIN LATERAL (


SELECT COUNT(*) AS order_count


FROM orders


WHERE user_id = u.id


) sq ON true;




-- Option 3: Regular JOIN + GROUP BY


SELECT u.name, COUNT(o.id) AS order_count


FROM users u


LEFT JOIN orders o ON u.id = o.user_id


GROUP BY u.id, u.name;





For correlated subqueries, LATERAL joins are often the best choice because they can use indexes efficiently.

Avoiding Common Anti-Patterns

SELECT * (Especially with JOINs)




-- BAD: Selects all columns, may force heap lookups


SELECT * FROM users


JOIN orders ON users.id = orders.user_id;




-- GOOD: Select only needed columns


SELECT users.name, orders.total


FROM users


JOIN orders ON users.id = orders.user_id;





Functions on Indexed Columns




-- BAD: Function prevents index use


SELECT * FROM users


WHERE LOWER(email) = 'alice@example.com';




-- GOOD: Use a proper case-insensitive index or column


CREATE INDEX idx_users_email_lower ON users(LOWER(email));


SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';





Implicit Type Conversion




-- BAD: Type conversion prevents index use


SELECT * FROM orders


WHERE order_date = '2026-05-11'; -- text vs date




-- GOOD: Explicit cast


SELECT * FROM orders


WHERE order_date = DATE '2026-05-11';





Pagination with OFFSET




-- BAD: OFFSET re-scans skipped rows


SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;




-- GOOD: Keyset pagination (seek method)


SELECT * FROM orders


WHERE id > 10000


ORDER BY id


LIMIT 20;





Query Rewriting Examples




-- BEFORE (Slow - 12s)


SELECT p.*,


(SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id AND r.rating >= 4) AS good_reviews


FROM products p


WHERE p.category_id = 5;




-- AFTER (Fast - 0.3s)


SELECT p.*, COALESCE(gr.count, 0) AS good_reviews


FROM products p


LEFT JOIN (


SELECT product_id, COUNT(*) AS count


FROM reviews


WHERE rating >= 4


GROUP BY product_id


) gr ON p.id = gr.product_id


WHERE p.category_id = 5;





Performance Budgets

| Query Type | Target Latency | Max Rows Returned | |------------|---------------|-------------------| | Simple lookup (by PK) | < 5ms | 1 | | List with filter | < 50ms | 100 | | Report/aggregation | < 500ms | 10000 | | Dashboard | < 2s | 1000 | | Batch/ETL | < 30min | Unlimited |

Summary

SQL query optimization starts with understanding the query plan through EXPLAIN ANALYZE. Ensure appropriate indexes exist for your query patterns, prefer covering indexes for index-only scans, avoid functions on indexed columns, replace OFFSET pagination with keyset pagination, and rewrite correlated subqueries as joins or lateral joins. Measure before and after each optimization to confirm the improvement.