PostgreSQL Indexing and Performance Tuning: A Practical Guide
Slow queries are one of the most common production problems developers face. The good news is that most query performance issues have a clear solution β and understanding indexes is the single most valuable skill for diagnosing and fixing them.
This guide covers how PostgreSQL indexes work, when to use them, and how to read query plans.
How PostgreSQL Executes a Query
Without an index, PostgreSQL performs a sequential scan β it reads every row in the table to find matches. On a table with 10 million rows, that means reading 10 million rows for every query, regardless of how many results you need.
An index is a separate data structure that lets PostgreSQL jump directly to the rows that match your condition, like the index at the back of a book.
Key insight: Indexes speed up reads but slow down writes. Every INSERT, UPDATE, or DELETE must also update all indexes on that table. More indexes = faster queries, slower writes.
The Default: B-tree Index
PostgreSQL creates B-tree indexes by default. They work for equality (=), range (<, >, BETWEEN), and ORDER BY on the indexed column.
sql-- Create an index CREATE INDEX idx_users_email ON users(email); -- Unique index (also enforces uniqueness) CREATE UNIQUE INDEX idx_users_email ON users(email); -- Index is used automatically by the query planner SELECT * FROM users WHERE email = 'alice@example.com';
When PostgreSQL ignores your index
Even with an index, PostgreSQL may choose a sequential scan if:
- The query returns a large percentage of the table (full scans can be faster for bulk reads)
- The table is very small
- Statistics are stale β run
ANALYZE table_nameto update them - You use a function on the indexed column (see below)
Composite Indexes
A composite index covers multiple columns. Column order matters β the index is useful for queries that filter on the leading columns.
sqlCREATE INDEX idx_orders_user_status ON orders(user_id, status); -- Uses the index (leading column matches) SELECT * FROM orders WHERE user_id = 42; -- Uses the index (both columns match) SELECT * FROM orders WHERE user_id = 42 AND status = 'pending'; -- Does NOT use the index (skips leading column) SELECT * FROM orders WHERE status = 'pending';
Rule of thumb: Put the most selective column first, or the column you filter on most often. Think of a composite index like a phone book sorted by last name then first name β you can look up by last name alone, but not by first name alone.
Partial Indexes
A partial index only covers rows that match a condition. Much smaller and faster than a full index when you only query a subset of rows.
sql-- Only index active users CREATE INDEX idx_users_active_email ON users(email) WHERE active = true; -- Only index unprocessed orders CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
If 95% of your orders are completed and you only ever query pending orders, a partial index is dramatically smaller and faster than indexing the whole table.
Covering Indexes (INCLUDE)
A covering index includes extra columns so PostgreSQL can answer the query entirely from the index without touching the main table (an "index-only scan").
sqlCREATE INDEX idx_users_email_covering ON users(email) INCLUDE (id, name); -- This query never touches the users table SELECT id, name FROM users WHERE email = 'alice@example.com';
The INCLUDE columns are stored in the index leaf nodes but not used for sorting β they just allow index-only scans.
Expression Indexes
If your queries use a function on a column, index the expression:
sql-- This query cannot use a plain index on email SELECT * FROM users WHERE lower(email) = 'alice@example.com'; -- Create an expression index CREATE INDEX idx_users_lower_email ON users(lower(email)); -- Now this query uses the index SELECT * FROM users WHERE lower(email) = 'alice@example.com';
The same applies to date_trunc, substring, JSON operators, and any other function.
Reading EXPLAIN and EXPLAIN ANALYZE
EXPLAIN shows the query plan. EXPLAIN ANALYZE actually runs the query and shows real timings.
sqlEXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
Sample output:
codeIndex Scan using idx_orders_user_status on orders (cost=0.43..8.45 rows=3 width=72) (actual time=0.021..0.025 rows=2 loops=1) Index Cond: ((user_id = 42) AND (status = 'pending')) Planning Time: 0.3 ms Execution Time: 0.1 ms
Key things to look for
| Node type | Meaning |
|---|---|
Seq Scan | Full table scan β consider an index |
Index Scan | Using an index, fetching from heap |
Index Only Scan | Using covering index β fastest |
Bitmap Index Scan | Efficient for moderate result sets |
Hash Join | Joining with a hash table |
Nested Loop | Good for small result sets |
cost=X..Y β estimated startup cost .. total cost (arbitrary units) rows=N β estimated rows returned actual time=X..Y β real milliseconds loops=N β how many times this node executed
Watch for: Estimated rows far off from actual rows. This means statistics are stale. Run
ANALYZEor check if autovacuum is running.
The N+1 Problem in SQL
A common performance killer in applications: running one query to get a list, then N more queries for related data.
sql-- Query 1: get all users SELECT id FROM users LIMIT 100; -- Then 100 more queries, one per user: SELECT * FROM orders WHERE user_id = 1; SELECT * FROM orders WHERE user_id = 2; -- ... 98 more
Fix with a JOIN:
sqlSELECT u.id, u.name, o.id AS order_id, o.amount FROM users u LEFT JOIN orders o ON o.user_id = u.id LIMIT 100;
One query instead of 101.
Common Performance Pitfalls
Implicit type casting breaks indexes
sql-- Table has: user_id integer, indexed -- This breaks the index (casting integer to text): SELECT * FROM orders WHERE user_id = '42'; -- This uses the index: SELECT * FROM orders WHERE user_id = 42;
LIKE with leading wildcard skips the index
sql-- Cannot use B-tree index SELECT * FROM users WHERE name LIKE '%alice%'; -- Can use B-tree index (no leading wildcard) SELECT * FROM users WHERE name LIKE 'alice%'; -- For full-text search, use GIN index with tsvector instead
SELECT * fetches unnecessary data
sql-- Fetches all columns, cannot use covering index SELECT * FROM users WHERE email = 'alice@example.com'; -- Only fetches what you need SELECT id, name FROM users WHERE email = 'alice@example.com';
Missing indexes on foreign keys
PostgreSQL does not automatically index foreign key columns. Always index them:
sqlCREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_order_items_order_id ON order_items(order_id);
Useful Maintenance Queries
sql-- Find tables with sequential scans (candidates for indexing) SELECT schemaname, tablename, seq_scan, idx_scan FROM pg_stat_user_tables ORDER BY seq_scan DESC; -- Find unused indexes (wasting write performance) SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY schemaname, tablename; -- Find slow queries (requires pg_stat_statements extension) SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20; -- Check index sizes SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) FROM pg_indexes WHERE tablename = 'orders' ORDER BY pg_relation_size(indexname::regclass) DESC;
Practice SQL on Froquiz
Database performance is a core backend skill tested in interviews at every level. Practice SQL on Froquiz β we cover queries, indexes, joins, window functions, and more.
Summary
- Sequential scan = reads every row; Index scan = jumps to matching rows
- B-tree is the default β works for equality, ranges, and sorting
- Composite indexes β column order matters, leading column must be in the WHERE clause
- Partial indexes β index only the rows you actually query
- Covering indexes β include extra columns to enable index-only scans
- Expression indexes β required when filtering on functions like
lower()ordate_trunc() - Use
EXPLAIN ANALYZEto see what the query planner actually does - Always index foreign key columns β PostgreSQL does not do this automatically