SQL Performance Tuning: Indexes, EXPLAIN, Query Optimization and Anti-Patterns
Slow database queries are one of the most common performance bottlenecks in web applications. The good news: most query performance problems are fixable with the right indexes, query rewrites, and an understanding of how the query planner works.
EXPLAIN ANALYZE: Your Most Important Tool
Before optimizing anything, measure. EXPLAIN ANALYZE shows how PostgreSQL actually executes your query and how long each step takes.
```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
```
Sample output:
```
Limit  (cost=1234.56..1234.61 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1237.06 rows=1000 width=40) (actual time=45.121..45.122 rows=10 loops=1)
        ->  HashAggregate  (cost=1200.00..1210.00 rows=1000 width=40) (actual time=44.800..44.900 rows=980 loops=1)
              ->  Hash Left Join  (cost=500.00..1150.00 rows=5000 width=32) (actual time=12.000..42.000 rows=50000 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders  (cost=0.00..400.00 rows=20000 width=16) (actual time=0.010..8.000 rows=20000 loops=1)
                    ->  Hash  (cost=450.00..450.00 rows=1000 width=24) (actual time=11.000..11.000 rows=980 loops=1)
                          ->  Index Scan on users  (cost=0.43..450.00 rows=1000 width=24) (actual time=0.050..10.500 rows=980 loops=1)
                                Index Cond: (created_at > (now() - '30 days'::interval))
Planning Time: 0.5 ms
Execution Time: 45.3 ms
```
Key things to look for:
- Seq Scan on large tables → usually needs an index
- actual rows vs estimated rows → a large discrepancy means stale statistics
- loops → a node executing many times is often the bottleneck
- High actual time at any node → that step dominates the query's runtime
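These checks can be automated. The sketch below walks the JSON form of a plan (from `EXPLAIN (ANALYZE, FORMAT JSON)`) and flags sequential scans and large estimate discrepancies; the 10x drift threshold is an arbitrary choice for illustration, not a PostgreSQL rule.

```javascript
// Sketch: flag suspicious nodes in an EXPLAIN (ANALYZE, FORMAT JSON) plan tree.
// Pass result[0]["Plan"] from the query output. The 10x threshold is illustrative.
function findSuspiciousNodes(plan, findings = []) {
  const estimated = plan["Plan Rows"];
  // Actual Rows is per loop, so multiply by Actual Loops for the true total
  const actual = plan["Actual Rows"] * (plan["Actual Loops"] ?? 1);
  if (plan["Node Type"] === "Seq Scan") {
    findings.push(`Seq Scan on ${plan["Relation Name"]}`);
  }
  if (estimated > 0 &&
      (actual / estimated > 10 || estimated / Math.max(actual, 1) > 10)) {
    findings.push(`${plan["Node Type"]}: estimated ${estimated} rows, got ${actual}`);
  }
  for (const child of plan["Plans"] ?? []) {
    findSuspiciousNodes(child, findings);
  }
  return findings;
}
```

A node flagged for estimate drift usually points to stale statistics; a flagged Seq Scan on a large relation points to a missing index.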
Indexing Strategies
Single-column indexes
```sql
-- Slow: full table scan
SELECT * FROM orders WHERE user_id = 42;

-- Add an index
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Now: index scan -- O(log n) instead of O(n)
```
Composite indexes
Index multiple columns together. Column order matters: the index supports queries that filter on a leftmost prefix of its columns.
```sql
-- This composite index supports:
--   WHERE status = 'pending'
--   WHERE status = 'pending' AND created_at > ...
-- But NOT: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Rule: put equality conditions before range conditions
-- Good: (status, created_at) -- filter by status, range on date
-- Bad:  (created_at, status) -- range on date first, status filter less useful
```
Covering indexes
Include all columns needed by the query so it never touches the main table:
```sql
-- Query only needs user_id, status, and total
SELECT user_id, status, total
FROM orders
WHERE user_id = 42 AND status = 'completed';

-- Covering index: all needed columns included
CREATE INDEX idx_orders_covering ON orders(user_id, status) INCLUDE (total);

-- PostgreSQL reads only the index, never the heap -- faster
```
Partial indexes
Index only a subset of rows, which keeps the index smaller and faster:
```sql
-- Most queries only care about active users
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;

-- Pending orders are queried frequently; completed/cancelled rarely
CREATE INDEX idx_orders_pending ON orders(created_at, user_id) WHERE status = 'pending';
```
Expression indexes
Index the result of a function or expression:
```sql
-- Slow: the function call prevents index use
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Create an expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now the same query uses the index
```
The N+1 Query Problem
One of the most common performance killers in ORM-heavy applications.
```javascript
// N+1: 1 query to get users, then N queries for each user's orders
const users = await User.findAll({ limit: 100 }); // 1 query
for (const user of users) {
  const orders = await user.getOrders(); // 100 queries!
}
// Total: 101 queries

// Fix: eager loading (JOIN)
const users = await User.findAll({
  limit: 100,
  include: [{ model: Order }], // 1 or 2 queries total
});
```
In raw SQL, the fix is a single JOIN:
```sql
-- N+1 pattern (the application does this in a loop)
SELECT * FROM users LIMIT 100;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... 100 times

-- Fix: single query with a JOIN
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = ANY(ARRAY[1,2,3,...,100]);
```
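The single JOIN returns one flat row per (user, order) pair, so the application still has to regroup the rows. A minimal sketch, assuming the column aliases from the query above (`id`, `name`, `order_id`, `total`):

```javascript
// Sketch: reassemble flat JOIN rows into one object per user.
// Column names follow the example query; adjust to your schema.
function groupOrdersByUser(rows) {
  const byUser = new Map();
  for (const row of rows) {
    if (!byUser.has(row.id)) {
      byUser.set(row.id, { id: row.id, name: row.name, orders: [] });
    }
    // LEFT JOIN yields order_id = NULL for users with no orders
    if (row.order_id !== null) {
      byUser.get(row.id).orders.push({ id: row.order_id, total: row.total });
    }
  }
  return [...byUser.values()];
}
```

Because a Map preserves insertion order, users come back in the order the database returned them.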
Query Anti-Patterns
SELECT *
```sql
-- Bad: fetches all columns, including large blobs
SELECT * FROM products WHERE category_id = 5;

-- Good: fetch only what you need
SELECT id, name, price FROM products WHERE category_id = 5;
```
SELECT * prevents covering indexes from working and transfers unnecessary data.
Functions on indexed columns in WHERE
```sql
-- Bad: an index on created_at cannot be used
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024;
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';

-- Good: use range conditions the index can satisfy
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
```
LIKE with leading wildcard
```sql
-- Bad: cannot use a B-tree index on name
SELECT * FROM products WHERE name LIKE '%laptop%';

-- Option 1: full-text search (the proper solution for text search)
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('laptop');

-- Option 2: the pg_trgm extension adds index support for LIKE
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN(name gin_trgm_ops);
-- Now LIKE '%laptop%' uses the index
```
Unbounded queries
```sql
-- Bad: no LIMIT on a large table
SELECT * FROM logs WHERE level = 'ERROR';

-- Good: always paginate
SELECT * FROM logs WHERE level = 'ERROR'
ORDER BY created_at DESC LIMIT 50 OFFSET 0;

-- Better: keyset pagination (faster than OFFSET for deep pages)
SELECT * FROM logs
WHERE level = 'ERROR' AND created_at < :last_seen_timestamp
ORDER BY created_at DESC LIMIT 50;
```
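In application code the keyset cursor is simply the `created_at` of the last row on the previous page. A minimal sketch of a query builder for the pattern (the function name and defaults are illustrative, and the table/column names follow the example above):

```javascript
// Sketch: build a keyset-paginated query for the logs example.
// First page has no cursor, so the extra predicate (and its parameter) is omitted.
function buildErrorLogPage(lastSeenTimestamp, pageSize = 50) {
  if (lastSeenTimestamp == null) {
    return {
      text: `SELECT * FROM logs WHERE level = 'ERROR'
             ORDER BY created_at DESC LIMIT $1`,
      values: [pageSize],
    };
  }
  return {
    text: `SELECT * FROM logs WHERE level = 'ERROR' AND created_at < $1
           ORDER BY created_at DESC LIMIT $2`,
    values: [lastSeenTimestamp, pageSize],
  };
}
```

One caveat: if `created_at` can have duplicates, add a unique tiebreaker (such as `id`) to both the ORDER BY and the cursor predicate so rows are neither skipped nor repeated across pages.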
Statistics and Vacuuming
PostgreSQL uses statistics to estimate row counts for query planning. Stale statistics cause bad query plans.
```sql
-- Update statistics manually (autovacuum usually handles this)
ANALYZE users;
ANALYZE orders;

-- Check when a table was last analyzed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('users', 'orders');

-- Check index usage
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;
```
Indexes with zero or very low idx_scan are candidates for removal: they add overhead to writes for no read benefit.
Partitioning
Split a large table into smaller physical partitions for better query performance and easier data management:
```sql
-- Range partitioning by date
CREATE TABLE orders (
  id BIGINT,
  user_id INT,
  total DECIMAL,
  created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Queries filtered by created_at automatically scan only relevant partitions
SELECT * FROM orders WHERE created_at >= '2025-01-01'; -- only scans orders_2025
```
Connection Pooling
Every new database connection is expensive: PostgreSQL spawns a backend process per connection, so setup typically costs several milliseconds or more before the first query runs. Use a connection pool:
```javascript
// Node.js with pg
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Reuse connections -- no connection overhead per query
const result = await pool.query("SELECT * FROM users WHERE id = $1", [userId]);
```
For high-traffic applications, add PgBouncer between your app and PostgreSQL for transaction-level connection pooling.
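A minimal pgbouncer.ini sketch for transaction-level pooling; the host, database name, and pool sizes here are illustrative assumptions to adapt to your deployment:

```ini
; Clients connect to PgBouncer on 6432 instead of PostgreSQL on 5432
[databases]
; illustrative connection target -- adjust host/dbname for your setup
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is held only for the duration
; of a transaction, then returned to the pool for other clients
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Note that transaction pooling breaks session-level features such as prepared statements held across transactions and session-scoped SET commands, so check your driver's compatibility before enabling it.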
Common Interview Questions
Q: What is the difference between a clustered and non-clustered index?
A clustered index determines the physical order of rows on disk, so there can be only one per table; in SQL Server and MySQL's InnoDB, the primary key is clustered. PostgreSQL has no clustered indexes: tables are heaps, and every index (including the primary key's) is a secondary structure pointing at heap tuples. The CLUSTER command physically reorders a table to match an index once, but the ordering is not maintained as rows change.
Q: When would you NOT add an index?
On small tables (a full scan is fast enough), on columns with very low cardinality (e.g., a boolean column, where the index is rarely selective enough to be useful), on tables with very high write volume where index maintenance overhead outweighs the read benefit, and on columns never used in WHERE, JOIN, or ORDER BY clauses.
Q: What is a query execution plan and why does it matter?
The query planner generates an execution plan that describes the steps PostgreSQL will take to answer a query: which indexes to use, which join algorithm to apply, and in what order to access tables. A bad plan (e.g., a sequential scan instead of an index scan due to stale statistics) can make a query 100x slower. EXPLAIN ANALYZE reveals the actual plan used.
Practice SQL on Froquiz
Performance and indexing questions appear in senior backend and data engineering interviews. Test your SQL knowledge on Froquiz β from basic queries to advanced optimization.
Summary
- `EXPLAIN ANALYZE` shows the actual execution plan: always profile before optimizing
- Composite index column order matters: equality columns first, range columns last
- Covering indexes eliminate heap access entirely, a significant speedup for hot queries
- Partial indexes are smaller and faster: index only the rows you actually query
- Never apply functions to indexed columns in `WHERE`: it prevents index use
- The N+1 problem: use JOINs or eager loading instead of querying in a loop
- `SELECT *` disables covering indexes and transfers unnecessary data
- Keyset pagination outperforms `OFFSET` for deep pages on large tables