Graph Queries in SQL: Recursive CTEs, Adjacency Lists, and WITH RECURSIVE
Graph Queries in SQL: Recursive CTEs, Adjacency Lists, and WITH RECURSIVE
Relational databases can model and query graph data effectively using recursive Common Table Expressions (CTEs). While not as optimized as dedicated graph databases, SQL-based graph queries handle many real-world use cases without adding infrastructure.
Modeling Graphs in SQL
The adjacency list model represents nodes and edges as separate tables:
CREATE TABLE nodes (
id BIGSERIAL PRIMARY KEY,
label TEXT NOT NULL,
properties JSONB DEFAULT '{}'
);
CREATE TABLE edges (
id BIGSERIAL PRIMARY KEY,
source_id BIGINT NOT NULL REFERENCES nodes(id),
target_id BIGINT NOT NULL REFERENCES nodes(id),
edge_type TEXT NOT NULL,
properties JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for graph traversal
CREATE INDEX idx_edges_source ON edges (source_id);
CREATE INDEX idx_edges_target ON edges (target_id);
CREATE INDEX idx_edges_type ON edges (edge_type);
For an organizational hierarchy, the "manager-employee" relationship:
INSERT INTO nodes (id, label) VALUES
(1, 'Alice CEO'),
(2, 'Bob CTO'),
(3, 'Carol CFO'),
(4, 'Dave Engineering Lead'),
(5, 'Eve Senior Engineer'),
(6, 'Frank Junior Engineer');
INSERT INTO edges (source_id, target_id, edge_type) VALUES
(1, 2, 'manages'),
(1, 3, 'manages'),
(2, 4, 'manages'),
(4, 5, 'manages'),
(5, 6, 'manages');
WITH RECURSIVE Basics
A recursive CTE has two parts: a base term and a recursive term, joined by `UNION ALL`:
WITH RECURSIVE org_chart AS (
-- Base: find the CEO
SELECT id, label, 0 AS depth, ARRAY[id] AS path
FROM nodes
WHERE id = 1
UNION ALL
-- Recursive: find direct reports
SELECT n.id, n.label, oc.depth + 1, oc.path || n.id
FROM nodes n
JOIN edges e ON e.target_id = n.id AND e.edge_type = 'manages'
JOIN org_chart oc ON oc.id = e.source_id
)
SELECT repeat(' ', depth) || label AS hierarchy
FROM org_chart
ORDER BY path;
Result:
Alice CEO
Bob CTO
Dave Engineering Lead
Eve Senior Engineer
Frank Junior Engineer
Carol CFO
Graph Traversal Patterns
Shortest Path (BFS)
WITH RECURSIVE bfs AS (
-- Base: start node
SELECT id AS node_id, 0 AS distance, ARRAY[id] AS path
FROM nodes WHERE id = 1
UNION ALL
-- Explore neighbors
SELECT e.target_id, bfs.distance + 1, bfs.path || e.target_id
FROM bfs
JOIN edges e ON e.source_id = bfs.node_id
WHERE NOT e.target_id = ANY(bfs.path) -- avoid cycles
AND bfs.distance < 10 -- max depth
)
SELECT * FROM bfs
WHERE node_id = 6 -- target node
LIMIT 1;
All Paths Between Two Nodes
WITH RECURSIVE paths AS (
SELECT e.source_id, e.target_id, ARRAY[e.source_id, e.target_id] AS path
FROM edges e
WHERE e.source_id = 1 AND e.target_id = 6
UNION ALL
SELECT p.source_id, e.target_id, p.path || e.target_id
FROM paths p
JOIN edges e ON e.source_id = p.target_id
WHERE NOT e.target_id = ANY(p.path)
AND array_length(p.path, 1) < 10
)
SELECT path FROM paths WHERE target_id = 6;
Cycle Detection
WITH RECURSIVE detect_cycles AS (
SELECT id, ARRAY[id] AS visited, false AS is_cycle
FROM nodes
WHERE label = 'Alice CEO'
UNION ALL
SELECT n.id, dc.visited || n.id, n.id = ANY(dc.visited)
FROM detect_cycles dc
JOIN edges e ON e.source_id = dc.id
JOIN nodes n ON n.id = e.target_id
WHERE NOT dc.is_cycle
)
SELECT * FROM detect_cycles WHERE is_cycle;
Optimizing Recursive Queries
Recursive CTEs execute sequentially (each iteration is one plan node). Optimization strategies:
* **Limit depth early**: Add a depth guard in the `WHERE` clause.
2\. **Use cycle detection**: The `ARRAY[...]` path check can be expensive for deep graphs. Use PostgreSQL 14+'s `CYCLE` clause:
WITH RECURSIVE org_chart AS (
SELECT id, label, 0 AS depth
FROM nodes WHERE id = 1
UNION ALL
SELECT n.id, n.label, oc.depth + 1
FROM nodes n
JOIN edges e ON e.target_id = n.id AND e.edge_type = 'manages'
JOIN org_chart oc ON oc.id = e.source_id
)
CYCLE id SET is_cycle USING path
SELECT * FROM org_chart;
The `CYCLE` clause is more efficient than manual `ARRAY` checks because it uses efficient internal data structures.
3\. **Create indexes**: The source and target columns must be indexed for good join performance.
SQL Graphs vs Dedicated Graph Databases
| Capability | PostgreSQL (Recursive CTEs) | Neo4j (Cypher) | |------------|-----------------------------|-----------------| | Query syntax | WITH RECURSIVE + JOINs | MATCH (n)-[r]->(m) | | Path expressions | Manual construction | Built-in | | Variable-length paths | Recursive CTE depth | [r*1..5] syntax | | Graph algorithms | Custom SQL | Built-in (PageRank, etc.) | | Performance | Table scans, JOINs | Index-free adjacency | | Horizontal scaling | Read replicas, Citus | Native sharding |
When to Use Recursive CTEs
Recursive CTEs are the right tool when:
* The graph is a tree or near-tree hierarchy (org charts, category trees, bill of materials).
* The maximum depth is bounded (typically under 20 levels).
* You already use PostgreSQL and do not want to introduce a separate graph database.
* Graph traversal is a small fraction of overall query volume.
Consider a dedicated graph database when:
* You need unbounded, many-to-many graph traversal at scale.
* Graph algorithms (shortest path, centrality, PageRank) are the core of your application.
* Path queries traverse millions of nodes and edges.
* You need property graph features (labels on both nodes and edges) as a primary data model.
Recursive CTEs prove that SQL can handle graph queries. For bounded-depth hierarchies, they perform well and keep your architecture simple. When your graph queries become the dominant workload, it is time to evaluate a dedicated graph database.