PostgreSQL CTEs and Subqueries: Write Cleaner, More Powerful SQL

Subqueries and CTEs (Common Table Expressions) are two powerful tools for breaking complex SQL queries into manageable pieces. Knowing when to use each — and how to use them well — is a mark of a developer who truly understands SQL.

What Is a Subquery?

A subquery is a query nested inside another query. It can appear in the SELECT, FROM, or WHERE clause.

sql
-- Subquery in WHERE clause
SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);

-- Subquery in FROM clause (derived table)
SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 80000;

-- Subquery in SELECT clause (scalar subquery)
SELECT
  e.name,
  e.salary,
  (SELECT AVG(salary) FROM employees WHERE department = e.department) AS dept_avg
FROM employees e;

Types of Subqueries

Correlated Subqueries

A correlated subquery references a column from the outer query. It runs once per outer row — potentially slow on large tables.

sql
-- Find employees earning more than their department average
SELECT name, department, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department = e.department  -- references outer query
);

EXISTS and NOT EXISTS

More readable and often faster than IN for checking existence:

sql
-- Customers who have placed at least one order
SELECT name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders WHERE customer_id = c.id
);

-- Customers who have never ordered
SELECT name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders WHERE customer_id = c.id
);

IN vs EXISTS

sql
-- IN -- works well for small subquery results
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE status = 'pending');

-- EXISTS -- better when subquery result is large or you just need to check presence
SELECT name FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders
  WHERE customer_id = c.id AND status = 'pending'
);

EXISTS stops at the first match; IN collects all values. For large datasets, EXISTS is generally more efficient.

What Is a CTE?

A CTE (Common Table Expression) is a named temporary result set defined with WITH. It makes complex queries more readable by breaking them into named steps.

sql
WITH high_earners AS (
  SELECT name, department, salary
  FROM employees
  WHERE salary > 100000
),
dept_counts AS (
  SELECT department, COUNT(*) AS headcount
  FROM high_earners
  GROUP BY department
)
SELECT
  h.department,
  d.headcount,
  AVG(h.salary) AS avg_high_salary
FROM high_earners h
JOIN dept_counts d ON h.department = d.department
GROUP BY h.department, d.headcount
ORDER BY avg_high_salary DESC;

You can reference a CTE multiple times in the same query, and CTEs can reference earlier CTEs.

CTE vs Subquery: When to Use Each

Both achieve similar results. The choice is often about readability:

sql
-- Subquery version -- harder to read, logic buried
SELECT name, revenue
FROM (
  SELECT
    c.name,
    SUM(o.amount) AS revenue
  FROM customers c
  JOIN orders o ON o.customer_id = c.id
  WHERE o.created_at >= NOW() - INTERVAL '30 days'
  GROUP BY c.name
) AS recent_revenue
WHERE revenue > 1000
ORDER BY revenue DESC;

-- CTE version -- reads like a story
WITH recent_orders AS (
  SELECT customer_id, SUM(amount) AS revenue
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY customer_id
),
high_value_customers AS (
  SELECT c.name, r.revenue
  FROM customers c
  JOIN recent_orders r ON r.customer_id = c.id
  WHERE r.revenue > 1000
)
SELECT name, revenue
FROM high_value_customers
ORDER BY revenue DESC;

Use CTEs when the query has multiple logical steps. Use subqueries for simple inline filtering or scalar values.

Recursive CTEs

Recursive CTEs process hierarchical data — org charts, category trees, file systems, graph traversal.

sql
-- Org chart: find all reports under a manager
WITH RECURSIVE reports AS (
  -- Anchor: start with the target manager
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE id = 5  -- CEO or any manager

  UNION ALL

  -- Recursive: join each employee to their manager in the CTE
  SELECT e.id, e.name, e.manager_id, r.depth + 1
  FROM employees e
  JOIN reports r ON e.manager_id = r.id
)
SELECT id, name, depth
FROM reports
ORDER BY depth, name;

Category breadcrumb path

sql
WITH RECURSIVE category_path AS (
  SELECT id, name, parent_id, name::text AS path
  FROM categories
  WHERE id = 42  -- target category

  UNION ALL

  SELECT c.id, c.name, c.parent_id,
         c.name || ' > ' || cp.path
  FROM categories c
  JOIN category_path cp ON c.id = cp.parent_id
)
SELECT path
FROM category_path
WHERE parent_id IS NULL;
-- Returns: "Electronics > Computers > Laptops"

Preventing infinite recursion

Add a depth limit or cycle detection:

sql
WITH RECURSIVE safe_tree AS (
  SELECT id, parent_id, 0 AS depth
  FROM nodes WHERE id = 1

  UNION ALL

  SELECT n.id, n.parent_id, t.depth + 1
  FROM nodes n
  JOIN safe_tree t ON n.parent_id = t.id
  WHERE t.depth < 10  -- stop at depth 10
)
SELECT * FROM safe_tree;

Writeable CTEs

CTEs can include INSERT, UPDATE, and DELETE statements with RETURNING:

sql
-- Move rows: delete from one table, insert into another atomically
WITH deleted_rows AS (
  DELETE FROM orders_staging
  WHERE processed = true
  RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM deleted_rows;

-- Update and return the old values
WITH old_prices AS (
  UPDATE products
  SET price = price * 1.10
  WHERE category = 'electronics'
  RETURNING id, name, price / 1.10 AS old_price, price AS new_price
)
SELECT * FROM old_prices;

Performance Tips

MATERIALIZED vs non-materialized CTEs

In PostgreSQL 12+, CTEs are non-materialized by default — the planner can inline them and optimize them with the rest of the query.

If a CTE is referenced multiple times and its result is expensive to compute, force materialization so it runs once:

sql
WITH MATERIALIZED expensive_data AS (
  SELECT ... FROM large_table WHERE ...
)
SELECT * FROM expensive_data WHERE condition_1
UNION ALL
SELECT * FROM expensive_data WHERE condition_2;

Without MATERIALIZED, the CTE might execute twice. With it, it executes once and the result is cached.

Avoid correlated subqueries on large tables

Replace correlated subqueries with JOINs or window functions when performance matters:

sql
-- Slow: correlated subquery runs once per row
SELECT name, salary,
       (SELECT AVG(salary) FROM employees WHERE department = e.department) AS dept_avg
FROM employees e;

-- Fast: window function, single pass
SELECT name, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Common Interview Questions

Q: What is the difference between a CTE and a temporary table?

A CTE exists only within the single query where it is defined — it is gone when the query finishes. A temporary table persists for the duration of the session (or transaction) and can be indexed. Use CTEs for single-query clarity; use temp tables when you need indexes on intermediate results or reuse across multiple queries in a session.

Q: What makes a recursive CTE work?

A recursive CTE has two parts joined by UNION ALL: the anchor (non-recursive base case that runs once) and the recursive member (references the CTE itself and runs repeatedly until it produces no new rows).

Q: When would you choose NOT EXISTS over LEFT JOIN ... WHERE IS NULL?

Both find non-matching rows, but NOT EXISTS short-circuits on the first match and is often more readable. LEFT JOIN ... WHERE IS NULL can be less clear in intent. On large datasets, NOT EXISTS with a proper index is usually the better choice.

Practice SQL on Froquiz

CTEs and subqueries are tested in every advanced SQL interview. Test your SQL knowledge on Froquiz across beginner, intermediate, and advanced levels.

Summary

Subqueries can appear in SELECT, FROM, and WHERE clauses
Correlated subqueries reference the outer query and run once per outer row — watch performance
EXISTS short-circuits on first match; prefer it over IN for large subquery results
CTEs (WITH) name intermediate result sets, making complex queries readable
Recursive CTEs process hierarchical data — org charts, trees, graphs
Writeable CTEs (INSERT/UPDATE/DELETE ... RETURNING) enable multi-step data operations
Use MATERIALIZED to force a CTE to execute once when it is referenced multiple times