Why Query Optimization Matters
A poorly written query can take seconds or minutes instead of milliseconds, consuming database resources and degrading the experience for all users. Optimizing queries is often the highest-ROI performance improvement you can make because it requires no infrastructure changes.
Understanding EXPLAIN
Every database has a query planner that decides how to execute your SQL. EXPLAIN shows you the plan.
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;
Output analysis:
Hash Join (cost=125.4..892.1 rows=342 width=72)
Hash Cond: (u.id = o.user_id)
-> Seq Scan on users u (cost=0.0..423.1 rows=1250 width=36)
Filter: (created_at > '2026-01-01')
-> Hash (cost=89.2..89.2 rows=2892 width=8)
-> Seq Scan on orders o (cost=0.0..89.2 rows=2892 width=8)
Key metrics to watch:
| Metric | What It Means | Good | Bad |
|--------|---------------|------|-----|
| `cost` | Estimated cost (arbitrary units) | Low | High |
| `rows` | Estimated rows returned | Accurate | Off by 10x+ |
| `actual time` | Actual execution time | ms | seconds |
| `Seq Scan` | Full table scan | Small tables | Large tables |
| `Sort` | Explicit sort operation | Sorted data | Large sorts to disk |
Indexing for Query Performance
Covering Indexes
An index that contains all columns needed by a query, enabling index-only scans:
-- Query needs: id, email, status, created_at
-- Instead of:
CREATE INDEX idx_users_status ON users(status);
-- Create a covering index:
CREATE INDEX idx_users_status_covering
ON users(status) INCLUDE (email, created_at);
-- Now this query uses an index-only scan:
SELECT email, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC;
JOIN Optimization
Nested Loop vs Hash Join vs Merge Join
| Join Type | When It Is Used | Best For |
|-----------|-----------------|----------|
| Nested Loop | One table is small, other is indexed | Small result sets |
| Hash Join | No useful index, medium tables | Unindexed joins |
| Merge Join | Both tables sorted on join key | Large sorted datasets |
Optimizing JOINs
-- INEFFICIENT: Joining without indexes
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';
-- Add index on the join column
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_customers_country ON customers(country);
-- Now the planner can use indexes effectively
Subquery vs CTE vs JOIN
-- Option 1: Subquery
SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count
FROM users;
-- Option 2: Lateral join (often fastest)
SELECT u.name, sq.order_count
FROM users u
LEFT JOIN LATERAL (
SELECT COUNT(*) AS order_count
FROM orders
WHERE user_id = u.id
) sq ON true;
-- Option 3: Regular JOIN + GROUP BY
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
For correlated subqueries, LATERAL joins are often the best choice because they can use indexes efficiently.
Avoiding Common Anti-Patterns
SELECT * (Especially with JOINs)
-- BAD: Selects all columns, may force heap lookups
SELECT * FROM users
JOIN orders ON users.id = orders.user_id;
-- GOOD: Select only needed columns
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;
Functions on Indexed Columns
-- BAD: Function prevents index use
SELECT * FROM users
WHERE LOWER(email) = 'alice@example.com';
-- GOOD: Use a proper case-insensitive index or column
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
Implicit Type Conversion
-- BAD: Type conversion prevents index use
SELECT * FROM orders
WHERE order_date = '2026-05-11'; -- text vs date
-- GOOD: Explicit cast
SELECT * FROM orders
WHERE order_date = DATE '2026-05-11';
Pagination with OFFSET
-- BAD: OFFSET re-scans skipped rows
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: Keyset pagination (seek method)
SELECT * FROM orders
WHERE id > 10000
ORDER BY id
LIMIT 20;
Query Rewriting Examples
-- BEFORE (Slow - 12s)
SELECT p.*,
(SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id AND r.rating >= 4) AS good_reviews
FROM products p
WHERE p.category_id = 5;
-- AFTER (Fast - 0.3s)
SELECT p.*, COALESCE(gr.count, 0) AS good_reviews
FROM products p
LEFT JOIN (
SELECT product_id, COUNT(*) AS count
FROM reviews
WHERE rating >= 4
GROUP BY product_id
) gr ON p.id = gr.product_id
WHERE p.category_id = 5;
Performance Budgets
| Query Type | Target Latency | Max Rows Returned |
|------------|---------------|-------------------|
| Simple lookup (by PK) | < 5ms | 1 |
| List with filter | < 50ms | 100 |
| Report/aggregation | < 500ms | 10000 |
| Dashboard | < 2s | 1000 |
| Batch/ETL | < 30min | Unlimited |
Summary
SQL query optimization starts with understanding the query plan through EXPLAIN ANALYZE. Ensure appropriate indexes exist for your query patterns, prefer covering indexes for index-only scans, avoid functions on indexed columns, replace OFFSET pagination with keyset pagination, and rewrite correlated subqueries as joins or lateral joins. Measure before and after each optimization to confirm the improvement.