PostgreSQL Indexing and Performance Tuning: A Practical Guide

Slow queries are one of the most common production problems developers face. The good news is that most query performance issues have a clear solution — and understanding indexes is the single most valuable skill for diagnosing and fixing them.

This guide covers how PostgreSQL indexes work, when to use them, and how to read query plans.

How PostgreSQL Executes a Query

Without an index, PostgreSQL performs a sequential scan — it reads every row in the table to find matches. On a table with 10 million rows, that means reading 10 million rows for every query, regardless of how many results you need.

An index is a separate data structure that lets PostgreSQL jump directly to the rows that match your condition, like the index at the back of a book.

Key insight: Indexes speed up reads but slow down writes. Every INSERT, UPDATE, or DELETE must also update all indexes on that table. More indexes = faster queries, slower writes.

The Default: B-tree Index

PostgreSQL creates B-tree indexes by default. They work for equality (=), range (<, >, BETWEEN), and ORDER BY on the indexed column.

sql
-- Create an index
CREATE INDEX idx_users_email ON users(email);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Index is used automatically by the query planner
SELECT * FROM users WHERE email = 'alice@example.com';

When PostgreSQL ignores your index

Even with an index, PostgreSQL may choose a sequential scan if:

The query returns a large percentage of the table (full scans can be faster for bulk reads)
The table is very small
Statistics are stale — run ANALYZE table_name to update them
You use a function on the indexed column (see below)

Composite Indexes

A composite index covers multiple columns. Column order matters — the index is useful for queries that filter on the leading columns.

sql
CREATE INDEX idx_orders_user_status
  ON orders(user_id, status);

-- Uses the index (leading column matches)
SELECT * FROM orders WHERE user_id = 42;

-- Uses the index (both columns match)
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Does NOT use the index (skips leading column)
SELECT * FROM orders WHERE status = 'pending';

Rule of thumb: Put the most selective column first, or the column you filter on most often. Think of a composite index like a phone book sorted by last name then first name — you can look up by last name alone, but not by first name alone.

Partial Indexes

A partial index only covers rows that match a condition. Much smaller and faster than a full index when you only query a subset of rows.

sql
-- Only index active users
CREATE INDEX idx_users_active_email
  ON users(email)
  WHERE active = true;

-- Only index unprocessed orders
CREATE INDEX idx_orders_pending
  ON orders(created_at)
  WHERE status = 'pending';

If 95% of your orders are completed and you only ever query pending orders, a partial index is dramatically smaller and faster than indexing the whole table.

Covering Indexes (INCLUDE)

A covering index includes extra columns so PostgreSQL can answer the query entirely from the index without touching the main table (an "index-only scan").

sql
CREATE INDEX idx_users_email_covering
  ON users(email)
  INCLUDE (id, name);

-- This query never touches the users table
SELECT id, name FROM users WHERE email = 'alice@example.com';

The INCLUDE columns are stored in the index leaf nodes but not used for sorting — they just allow index-only scans.

Expression Indexes

If your queries use a function on a column, index the expression:

sql
-- This query cannot use a plain index on email
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- Create an expression index
CREATE INDEX idx_users_lower_email ON users(lower(email));

-- Now this query uses the index
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

The same applies to date_trunc, substring, JSON operators, and any other function.

Reading EXPLAIN and EXPLAIN ANALYZE

EXPLAIN shows the query plan. EXPLAIN ANALYZE actually runs the query and shows real timings.

sql
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending';

Sample output:

code
Index Scan using idx_orders_user_status on orders
  (cost=0.43..8.45 rows=3 width=72)
  (actual time=0.021..0.025 rows=2 loops=1)
  Index Cond: ((user_id = 42) AND (status = 'pending'))
Planning Time: 0.3 ms
Execution Time: 0.1 ms

Key things to look for

Node type	Meaning
`Seq Scan`	Full table scan — consider an index
`Index Scan`	Using an index, fetching from heap
`Index Only Scan`	Using covering index — fastest
`Bitmap Index Scan`	Efficient for moderate result sets
`Hash Join`	Joining with a hash table
`Nested Loop`	Good for small result sets

cost=X..Y — estimated startup cost .. total cost (arbitrary units) rows=N — estimated rows returned actual time=X..Y — real milliseconds loops=N — how many times this node executed

Watch for: Estimated rows far off from actual rows. This means statistics are stale. Run ANALYZE or check if autovacuum is running.

The N+1 Problem in SQL

A common performance killer in applications: running one query to get a list, then N more queries for related data.

sql
-- Query 1: get all users
SELECT id FROM users LIMIT 100;

-- Then 100 more queries, one per user:
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... 98 more

Fix with a JOIN:

sql
SELECT u.id, u.name, o.id AS order_id, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
LIMIT 100;

One query instead of 101.

Common Performance Pitfalls

Implicit type casting breaks indexes

sql
-- Table has: user_id integer, indexed
-- This breaks the index (casting integer to text):
SELECT * FROM orders WHERE user_id = '42';

-- This uses the index:
SELECT * FROM orders WHERE user_id = 42;

LIKE with leading wildcard skips the index

sql
-- Cannot use B-tree index
SELECT * FROM users WHERE name LIKE '%alice%';

-- Can use B-tree index (no leading wildcard)
SELECT * FROM users WHERE name LIKE 'alice%';

-- For full-text search, use GIN index with tsvector instead

SELECT * fetches unnecessary data

sql
-- Fetches all columns, cannot use covering index
SELECT * FROM users WHERE email = 'alice@example.com';

-- Only fetches what you need
SELECT id, name FROM users WHERE email = 'alice@example.com';

Missing indexes on foreign keys

PostgreSQL does not automatically index foreign key columns. Always index them:

sql
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);

Useful Maintenance Queries

sql
-- Find tables with sequential scans (candidates for indexing)
SELECT schemaname, tablename, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- Find unused indexes (wasting write performance)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;

-- Find slow queries (requires pg_stat_statements extension)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check index sizes
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass))
FROM pg_indexes
WHERE tablename = 'orders'
ORDER BY pg_relation_size(indexname::regclass) DESC;

Practice SQL on Froquiz

Database performance is a core backend skill tested in interviews at every level. Practice SQL on Froquiz — we cover queries, indexes, joins, window functions, and more.

Summary

Sequential scan = reads every row; Index scan = jumps to matching rows
B-tree is the default — works for equality, ranges, and sorting
Composite indexes — column order matters, leading column must be in the WHERE clause
Partial indexes — index only the rows you actually query
Covering indexes — include extra columns to enable index-only scans
Expression indexes — required when filtering on functions like lower() or date_trunc()
Use EXPLAIN ANALYZE to see what the query planner actually does
Always index foreign key columns — PostgreSQL does not do this automatically