What Is Database Indexing? The Science of Instantly Speeding Up Slow Queries
You run a WHERE email = "ali@example.com" query on a table with 10 million rows. Without an index, the database checks every row one by one — a full table scan. With an index, it jumps directly to the result.
The difference? Seconds versus milliseconds.
Why Are Indexes Necessary?
Think of database tables: data is stored unsorted on disk. Finding a record requires the database to scan every row — O(n) complexity. 10 million comparisons for 10 million rows.
An index reduces this search problem to O(log n). Like the index at the back of a book: it keeps the values in the "email" column sorted and points to the physical location of each value.
-- Without index: 10 million rows scanned SELECT * FROM users WHERE email = 'ali@example.com'; -- Create index CREATE INDEX idx_users_email ON users(email); -- With index: B-tree traversal, ~20 steps SELECT * FROM users WHERE email = 'ali@example.com';
B-Tree: The Structure Behind Indexes
The index structure used by most databases is the B-tree (Balanced Tree). It stores sorted data in a balanced tree structure.
[M]
/ \
[D, H] [R, V]
/ | \ / | \
[A] [E] [J] [N] [S] [X]
Search always starts at the root. At each node, a decision is made: "smaller, equal, or larger?" and the relevant branch is followed. Even with 1 billion records, maximum ~30 steps.
B-tree is effective not only for equality queries but also for range queries:
-- B-tree handles these queries efficiently SELECT * FROM orders WHERE amount > 1000; SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'; SELECT * FROM products WHERE name LIKE 'iPhone%'; -- Prefix search
Index Types
Single Column Index:
CREATE INDEX idx_email ON users(email); CREATE INDEX idx_created ON orders(created_at);
Composite (Multi-Column) Index:
-- Column order matters: with (last_name, first_name) -- WHERE last_name = ? → uses index -- WHERE first_name = ? → does NOT use index -- WHERE last_name = ? AND first_name = ? → uses index CREATE INDEX idx_name ON users(last_name, first_name);
Column order in composite indexes is critical. The index is used left to right. Without the first column, the index won't kick in for subsequent columns.
Unique Index:
CREATE UNIQUE INDEX idx_unique_email ON users(email); -- PRIMARY KEY automatically creates a unique index
Partial Index — only on a subset of the table:
-- Only index active users — smaller, faster CREATE INDEX idx_active_users ON users(email) WHERE active = true; -- Only pending orders CREATE INDEX idx_pending ON orders(created_at) WHERE status = 'pending';
Covering Index — contains all columns the query needs, never touches the table:
-- Covering index for this query SELECT email, name FROM users WHERE department = 'engineering'; CREATE INDEX idx_dept_covering ON users(department, email, name); -- Database reads only the index, never goes to the table
EXPLAIN: Reading the Query Plan
The way to understand whether an index is being used is the EXPLAIN command:
EXPLAIN SELECT * FROM users WHERE email = 'ali@example.com';
id | select_type | table | type | key | rows | Extra
1 | SIMPLE | users | ref | idx_users_email | 1 | Using index
The type column is critical:
constoreq_ref— perfect, single rowref— good, index is being usedrange— acceptable, range scanindex— full index scan, be carefulALL— full table scan, no index or index not being used
Situations Where Indexes Don't Get Used
Index created but query is still slow? Probably one of these mistakes:
Using a function on a column:
-- Does NOT use index: column was transformed SELECT * FROM users WHERE LOWER(email) = 'ali@example.com'; SELECT * FROM orders WHERE YEAR(created_at) = 2024; -- Uses index: column left as is SELECT * FROM users WHERE email = 'ali@example.com'; SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
LIKE with non-prefix search:
-- Uses index: prefix search SELECT * FROM products WHERE name LIKE 'iPhone%'; -- Does NOT use index: no prefix SELECT * FROM products WHERE name LIKE '%phone'; SELECT * FROM products WHERE name LIKE '%phone%';
Implicit type conversion:
-- user_id is INTEGER but string is given → does NOT use index SELECT * FROM users WHERE user_id = '123'; -- Correct usage SELECT * FROM users WHERE user_id = 123;
The Cost of Indexes
Indexes aren't magic. Every index carries a cost:
Write cost: Every INSERT, UPDATE, DELETE operation also updates the index. Write operations slow down on tables with many indexes.
Disk space: Every index uses additional storage. On large tables, indexes can double the total data size.
Index bloat: On tables with constant deletes/updates, indexes fragment over time. Periodic REINDEX or VACUUM may be needed.
The rule: index the columns you filter, sort, and join on — not the columns you select. Instead of adding indexes to every column, identify slow queries and create indexes for them.
The N+1 Problem and Indexes
The N+1 problem frequently encountered when using ORMs isn't solved by indexes — but indexes reduce its impact.
// N+1: Separate query per user — 1001 queries const users = await User.findAll(); // 1 query for (const user of users) { const orders = await Order.findAll({ where: { userId: user.id } }); // N queries } // Correct: JOIN in a single query const users = await User.findAll({ include: [{ model: Order }] // 1 or 2 queries });
If there's an index on orders.user_id, the JOIN query runs much faster. Foreign key columns should almost always be indexed.
When indexes are set up correctly, the database stops being the bottleneck. When set up incorrectly, both reads and writes slow down. The difference is knowing which queries you run and designing indexes that support them.