SQL Performance Tuning: Indexes, EXPLAIN, Query Optimization and Anti-Patterns
Slow database queries are one of the most common performance bottlenecks in web applications. The good news: most query performance problems are fixable with the right indexes, query rewrites, and an understanding of how the query planner works.
EXPLAIN ANALYZE: Your Most Important Tool
Before optimizing anything, measure. EXPLAIN ANALYZE shows how PostgreSQL actually executes your query and how long each step takes.
```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
```
Sample output:
```
Limit  (cost=1234.56..1234.61 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1237.06 rows=1000 width=40) (actual time=45.121..45.122 rows=10 loops=1)
        ->  HashAggregate  (cost=1200.00..1210.00 rows=1000 width=40) (actual time=44.800..44.900 rows=980 loops=1)
              ->  Hash Left Join  (cost=500.00..1150.00 rows=5000 width=32) (actual time=12.000..42.000 rows=50000 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders  (cost=0.00..400.00 rows=20000 width=16) (actual time=0.010..8.000 rows=20000 loops=1)
                    ->  Hash  (cost=450.00..450.00 rows=1000 width=24) (actual time=11.000..11.000 rows=980 loops=1)
                          ->  Index Scan on users  (cost=0.43..450.00 rows=1000 width=24) (actual time=0.050..10.500 rows=980 loops=1)
                                Index Cond: (created_at > (now() - '30 days'::interval))
Planning Time: 0.5 ms
Execution Time: 45.3 ms
```
Key things to look for:
- Seq Scan on large tables → usually needs an index
- actual rows vs estimated rows → a large discrepancy means stale statistics
- loops → a node executing many times is often the bottleneck
- High actual time at any node → that step dominates the query's runtime
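These checks can be automated. The sketch below walks the JSON form of a plan (from `EXPLAIN (ANALYZE, FORMAT JSON)`) and flags sequential scans and large estimate discrepancies; the 10x drift threshold is an arbitrary choice for illustration, not a PostgreSQL rule.

```javascript
// Sketch: flag suspicious nodes in an EXPLAIN (ANALYZE, FORMAT JSON) plan tree.
// Pass result[0]["Plan"] from the query output. The 10x threshold is illustrative.
function findSuspiciousNodes(plan, findings = []) {
  const estimated = plan["Plan Rows"];
  // Actual Rows is per loop, so multiply by Actual Loops for the true total
  const actual = plan["Actual Rows"] * (plan["Actual Loops"] ?? 1);
  if (plan["Node Type"] === "Seq Scan") {
    findings.push(`Seq Scan on ${plan["Relation Name"]}`);
  }
  if (estimated > 0 &&
      (actual / estimated > 10 || estimated / Math.max(actual, 1) > 10)) {
    findings.push(`${plan["Node Type"]}: estimated ${estimated} rows, got ${actual}`);
  }
  for (const child of plan["Plans"] ?? []) {
    findSuspiciousNodes(child, findings);
  }
  return findings;
}
```

A node flagged for estimate drift usually points to stale statistics; a flagged Seq Scan on a large relation points to a missing index.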
Indexing Strategies
Single-column indexes
```sql
-- Slow: full table scan
SELECT * FROM orders WHERE user_id = 42;

-- Add an index
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Now: index scan -- O(log n) instead of O(n)
```
Composite indexes
Index multiple columns together. Column order matters: the index supports queries that filter on a leftmost prefix of its columns.
```sql
-- This composite index supports:
--   WHERE status = 'pending'
--   WHERE status = 'pending' AND created_at > ...
-- But NOT: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Rule: put equality conditions before range conditions
-- Good: (status, created_at) -- filter by status, range on date
-- Bad:  (created_at, status) -- range on date first, status filter less useful
```
Covering indexes
Include all columns needed by the query so it never touches the main table:
```sql
-- Query only needs user_id, status, and total
SELECT user_id, status, total
FROM orders
WHERE user_id = 42 AND status = 'completed';

-- Covering index: all needed columns included
CREATE INDEX idx_orders_covering ON orders(user_id, status) INCLUDE (total);

-- PostgreSQL reads only the index, never the heap -- faster
```
Partial indexes
Index only a subset of rows, which keeps the index smaller and faster:
```sql
-- Most queries only care about active users
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;

-- Pending orders are queried frequently; completed/cancelled rarely
CREATE INDEX idx_orders_pending ON orders(created_at, user_id) WHERE status = 'pending';
```
Expression indexes
Index the result of a function or expression:
```sql
-- Slow: the function call prevents index use
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Create an expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now the same query uses the index
```
The N+1 Query Problem
One of the most common performance killers in ORM-heavy applications.
```javascript
// N+1: 1 query to get users, then N queries for each user's orders
const users = await User.findAll({ limit: 100 }); // 1 query
for (const user of users) {
  const orders = await user.getOrders(); // 100 queries!
}
// Total: 101 queries

// Fix: eager loading (JOIN)
const users = await User.findAll({
  limit: 100,
  include: [{ model: Order }], // 1 or 2 queries total
});
```
In raw SQL, the fix is a single JOIN:
```sql
-- N+1 pattern (the application does this in a loop)
SELECT * FROM users LIMIT 100;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... 100 times

-- Fix: single query with a JOIN
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = ANY(ARRAY[1,2,3,...,100]);
```
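The single JOIN returns one flat row per (user, order) pair, so the application still has to regroup the rows. A minimal sketch, assuming the column aliases from the query above (`id`, `name`, `order_id`, `total`):

```javascript
// Sketch: reassemble flat JOIN rows into one object per user.
// Column names follow the example query; adjust to your schema.
function groupOrdersByUser(rows) {
  const byUser = new Map();
  for (const row of rows) {
    if (!byUser.has(row.id)) {
      byUser.set(row.id, { id: row.id, name: row.name, orders: [] });
    }
    // LEFT JOIN yields order_id = NULL for users with no orders
    if (row.order_id !== null) {
      byUser.get(row.id).orders.push({ id: row.order_id, total: row.total });
    }
  }
  return [...byUser.values()];
}
```

Because a Map preserves insertion order, users come back in the order the database returned them.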
Query Anti-Patterns
SELECT *
```sql
-- Bad: fetches all columns, including large blobs
SELECT * FROM products WHERE category_id = 5;

-- Good: fetch only what you need
SELECT id, name, price FROM products WHERE category_id = 5;
```
SELECT * prevents covering indexes from working and transfers unnecessary data.
Functions on indexed columns in WHERE
```sql
-- Bad: an index on created_at cannot be used
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024;
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';

-- Good: use range conditions the index can satisfy
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
```
LIKE with leading wildcard
```sql
-- Bad: cannot use a B-tree index on name
SELECT * FROM products WHERE name LIKE '%laptop%';

-- Option 1: full-text search (the proper solution for text search)
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('laptop');

-- Option 2: the pg_trgm extension adds index support for LIKE
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN(name gin_trgm_ops);
-- Now LIKE '%laptop%' uses the index
```
Unbounded queries
```sql
-- Bad: no LIMIT on a large table
SELECT * FROM logs WHERE level = 'ERROR';

-- Good: always paginate
SELECT * FROM logs WHERE level = 'ERROR'
ORDER BY created_at DESC LIMIT 50 OFFSET 0;

-- Better: keyset pagination (faster than OFFSET for deep pages)
SELECT * FROM logs
WHERE level = 'ERROR' AND created_at < :last_seen_timestamp
ORDER BY created_at DESC LIMIT 50;
```
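In application code the keyset cursor is simply the `created_at` of the last row on the previous page. A minimal sketch of a query builder for the pattern (the function name and defaults are illustrative, and the table/column names follow the example above):

```javascript
// Sketch: build a keyset-paginated query for the logs example.
// First page has no cursor, so the extra predicate (and its parameter) is omitted.
function buildErrorLogPage(lastSeenTimestamp, pageSize = 50) {
  if (lastSeenTimestamp == null) {
    return {
      text: `SELECT * FROM logs WHERE level = 'ERROR'
             ORDER BY created_at DESC LIMIT $1`,
      values: [pageSize],
    };
  }
  return {
    text: `SELECT * FROM logs WHERE level = 'ERROR' AND created_at < $1
           ORDER BY created_at DESC LIMIT $2`,
    values: [lastSeenTimestamp, pageSize],
  };
}
```

One caveat: if `created_at` can have duplicates, add a unique tiebreaker (such as `id`) to both the ORDER BY and the cursor predicate so rows are neither skipped nor repeated across pages.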
Statistics and Vacuuming
PostgreSQL uses statistics to estimate row counts for query planning. Stale statistics cause bad query plans.
```sql
-- Update statistics manually (autovacuum usually handles this)
ANALYZE users;
ANALYZE orders;

-- Check when a table was last analyzed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('users', 'orders');

-- Check index usage
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;
```
Indexes with zero or very low idx_scan are candidates for removal: they add overhead to writes for no read benefit.
Partitioning
Split a large table into smaller physical partitions for better query performance and easier data management:
```sql
-- Range partitioning by date
CREATE TABLE orders (
  id BIGINT,
  user_id INT,
  total DECIMAL,
  created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Queries filtered by created_at automatically scan only relevant partitions
SELECT * FROM orders WHERE created_at >= '2025-01-01'; -- only scans orders_2025
```
Connection Pooling
Every new database connection is expensive: PostgreSQL spawns a backend process per connection, so setup typically costs several milliseconds or more before the first query runs. Use a connection pool:
```javascript
// Node.js with pg
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Reuse connections -- no connection overhead per query
const result = await pool.query("SELECT * FROM users WHERE id = $1", [userId]);
```
For high-traffic applications, add PgBouncer between your app and PostgreSQL for transaction-level connection pooling.
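A minimal pgbouncer.ini sketch for transaction-level pooling; the host, database name, and pool sizes here are illustrative assumptions to adapt to your deployment:

```ini
; Clients connect to PgBouncer on 6432 instead of PostgreSQL on 5432
[databases]
; illustrative connection target -- adjust host/dbname for your setup
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is held only for the duration
; of a transaction, then returned to the pool for other clients
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Note that transaction pooling breaks session-level features such as prepared statements held across transactions and session-scoped SET commands, so check your driver's compatibility before enabling it.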
Common Interview Questions
Q: What is the difference between a clustered and non-clustered index?
A clustered index determines the physical order of rows on disk, so there can be only one per table; in SQL Server and MySQL's InnoDB, the primary key is clustered. PostgreSQL has no clustered indexes: tables are heaps, and every index (including the primary key's) is a secondary structure pointing at heap tuples. The CLUSTER command physically reorders a table to match an index once, but the ordering is not maintained as rows change.
Q: When would you NOT add an index?
On small tables (a full scan is fast enough), on columns with very low cardinality (e.g., a boolean column, where the index is rarely selective enough to be useful), on tables with very high write volume where index maintenance overhead outweighs the read benefit, and on columns never used in WHERE, JOIN, or ORDER BY clauses.
Q: What is a query execution plan and why does it matter?
The query planner generates an execution plan that describes the steps PostgreSQL will take to answer a query: which indexes to use, which join algorithm to apply, and in what order to access tables. A bad plan (e.g., a sequential scan instead of an index scan due to stale statistics) can make a query 100x slower. EXPLAIN ANALYZE reveals the actual plan used.
Practice SQL on Froquiz
Performance and indexing questions appear in senior backend and data engineering interviews. Test your SQL knowledge on Froquiz β from basic queries to advanced optimization.
Summary
- `EXPLAIN ANALYZE` shows the actual execution plan: always profile before optimizing
- Composite index column order matters: equality columns first, range columns last
- Covering indexes eliminate heap access entirely, a significant speedup for hot queries
- Partial indexes are smaller and faster: index only the rows you actually query
- Never apply functions to indexed columns in `WHERE`: it prevents index use
- The N+1 problem: use JOINs or eager loading instead of querying in a loop
- `SELECT *` disables covering indexes and transfers unnecessary data
- Keyset pagination outperforms `OFFSET` for deep pages on large tables