PostgreSQL CTEs and Subqueries: Write Cleaner, More Powerful SQL
Subqueries and CTEs (Common Table Expressions) are two powerful tools for breaking complex SQL queries into manageable pieces. Knowing when to use each β and how to use them well β is a mark of a developer who truly understands SQL.
What Is a Subquery?
A subquery is a query nested inside another query. It can appear in the SELECT, FROM, or WHERE clause.
sql-- Subquery in WHERE clause SELECT name, salary FROM employees WHERE salary > ( SELECT AVG(salary) FROM employees ); -- Subquery in FROM clause (derived table) SELECT dept, avg_salary FROM ( SELECT department AS dept, AVG(salary) AS avg_salary FROM employees GROUP BY department ) AS dept_averages WHERE avg_salary > 80000; -- Subquery in SELECT clause (scalar subquery) SELECT e.name, e.salary, (SELECT AVG(salary) FROM employees WHERE department = e.department) AS dept_avg FROM employees e;
Types of Subqueries
Correlated Subqueries
A correlated subquery references a column from the outer query. It runs once per outer row β potentially slow on large tables.
sql-- Find employees earning more than their department average SELECT name, department, salary FROM employees e WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department -- references outer query );
EXISTS and NOT EXISTS
More readable and often faster than IN for checking existence:
sql-- Customers who have placed at least one order SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders WHERE customer_id = c.id ); -- Customers who have never ordered SELECT name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders WHERE customer_id = c.id );
IN vs EXISTS
sql-- IN -- works well for small subquery results SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders WHERE status = 'pending'); -- EXISTS -- better when subquery result is large or you just need to check presence SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders WHERE customer_id = c.id AND status = 'pending' );
EXISTS stops at the first match; IN collects all values. For large datasets, EXISTS is generally more efficient.
What Is a CTE?
A CTE (Common Table Expression) is a named temporary result set defined with WITH. It makes complex queries more readable by breaking them into named steps.
sqlWITH high_earners AS ( SELECT name, department, salary FROM employees WHERE salary > 100000 ), dept_counts AS ( SELECT department, COUNT(*) AS headcount FROM high_earners GROUP BY department ) SELECT h.department, d.headcount, AVG(h.salary) AS avg_high_salary FROM high_earners h JOIN dept_counts d ON h.department = d.department GROUP BY h.department, d.headcount ORDER BY avg_high_salary DESC;
You can reference a CTE multiple times in the same query, and CTEs can reference earlier CTEs.
CTE vs Subquery: When to Use Each
Both achieve similar results. The choice is often about readability:
sql-- Subquery version -- harder to read, logic buried SELECT name, revenue FROM ( SELECT c.name, SUM(o.amount) AS revenue FROM customers c JOIN orders o ON o.customer_id = c.id WHERE o.created_at >= NOW() - INTERVAL '30 days' GROUP BY c.name ) AS recent_revenue WHERE revenue > 1000 ORDER BY revenue DESC; -- CTE version -- reads like a story WITH recent_orders AS ( SELECT customer_id, SUM(amount) AS revenue FROM orders WHERE created_at >= NOW() - INTERVAL '30 days' GROUP BY customer_id ), high_value_customers AS ( SELECT c.name, r.revenue FROM customers c JOIN recent_orders r ON r.customer_id = c.id WHERE r.revenue > 1000 ) SELECT name, revenue FROM high_value_customers ORDER BY revenue DESC;
Use CTEs when the query has multiple logical steps. Use subqueries for simple inline filtering or scalar values.
Recursive CTEs
Recursive CTEs process hierarchical data β org charts, category trees, file systems, graph traversal.
sql-- Org chart: find all reports under a manager WITH RECURSIVE reports AS ( -- Anchor: start with the target manager SELECT id, name, manager_id, 0 AS depth FROM employees WHERE id = 5 -- CEO or any manager UNION ALL -- Recursive: join each employee to their manager in the CTE SELECT e.id, e.name, e.manager_id, r.depth + 1 FROM employees e JOIN reports r ON e.manager_id = r.id ) SELECT id, name, depth FROM reports ORDER BY depth, name;
Category breadcrumb path
sqlWITH RECURSIVE category_path AS ( SELECT id, name, parent_id, name::text AS path FROM categories WHERE id = 42 -- target category UNION ALL SELECT c.id, c.name, c.parent_id, c.name || ' > ' || cp.path FROM categories c JOIN category_path cp ON c.id = cp.parent_id ) SELECT path FROM category_path WHERE parent_id IS NULL; -- Returns: "Electronics > Computers > Laptops"
Preventing infinite recursion
Add a depth limit or cycle detection:
sqlWITH RECURSIVE safe_tree AS ( SELECT id, parent_id, 0 AS depth FROM nodes WHERE id = 1 UNION ALL SELECT n.id, n.parent_id, t.depth + 1 FROM nodes n JOIN safe_tree t ON n.parent_id = t.id WHERE t.depth < 10 -- stop at depth 10 ) SELECT * FROM safe_tree;
Writeable CTEs
CTEs can include INSERT, UPDATE, and DELETE statements with RETURNING:
sql-- Move rows: delete from one table, insert into another atomically WITH deleted_rows AS ( DELETE FROM orders_staging WHERE processed = true RETURNING * ) INSERT INTO orders_archive SELECT * FROM deleted_rows; -- Update and return the old values WITH old_prices AS ( UPDATE products SET price = price * 1.10 WHERE category = 'electronics' RETURNING id, name, price / 1.10 AS old_price, price AS new_price ) SELECT * FROM old_prices;
Performance Tips
MATERIALIZED vs non-materialized CTEs
In PostgreSQL 12+, CTEs are non-materialized by default β the planner can inline them and optimize them with the rest of the query.
If a CTE is referenced multiple times and its result is expensive to compute, force materialization so it runs once:
sqlWITH MATERIALIZED expensive_data AS ( SELECT ... FROM large_table WHERE ... ) SELECT * FROM expensive_data WHERE condition_1 UNION ALL SELECT * FROM expensive_data WHERE condition_2;
Without MATERIALIZED, the CTE might execute twice. With it, it executes once and the result is cached.
Avoid correlated subqueries on large tables
Replace correlated subqueries with JOINs or window functions when performance matters:
sql-- Slow: correlated subquery runs once per row SELECT name, salary, (SELECT AVG(salary) FROM employees WHERE department = e.department) AS dept_avg FROM employees e; -- Fast: window function, single pass SELECT name, salary, AVG(salary) OVER (PARTITION BY department) AS dept_avg FROM employees;
Common Interview Questions
Q: What is the difference between a CTE and a temporary table?
A CTE exists only within the single query where it is defined β it is gone when the query finishes. A temporary table persists for the duration of the session (or transaction) and can be indexed. Use CTEs for single-query clarity; use temp tables when you need indexes on intermediate results or reuse across multiple queries in a session.
Q: What makes a recursive CTE work?
A recursive CTE has two parts joined by UNION ALL: the anchor (non-recursive base case that runs once) and the recursive member (references the CTE itself and runs repeatedly until it produces no new rows).
Q: When would you choose NOT EXISTS over LEFT JOIN ... WHERE IS NULL?
Both find non-matching rows, but NOT EXISTS short-circuits on the first match and is often more readable. LEFT JOIN ... WHERE IS NULL can be less clear in intent. On large datasets, NOT EXISTS with a proper index is usually the better choice.
Practice SQL on Froquiz
CTEs and subqueries are tested in every advanced SQL interview. Test your SQL knowledge on Froquiz across beginner, intermediate, and advanced levels.
Summary
- Subqueries can appear in
SELECT,FROM, andWHEREclauses - Correlated subqueries reference the outer query and run once per outer row β watch performance
EXISTSshort-circuits on first match; prefer it overINfor large subquery results- CTEs (
WITH) name intermediate result sets, making complex queries readable - Recursive CTEs process hierarchical data β org charts, trees, graphs
- Writeable CTEs (
INSERT/UPDATE/DELETE ... RETURNING) enable multi-step data operations - Use
MATERIALIZEDto force a CTE to execute once when it is referenced multiple times