Graph Queries in SQL: Recursive CTEs, Adjacency Lists, and WITH RECURSIVE


Graph Queries in SQL: Recursive CTEs, Adjacency Lists, and WITH RECURSIVE

Relational databases can model and query graph data effectively using recursive Common Table Expressions (CTEs). While not as optimized as dedicated graph databases, SQL-based graph queries handle many real-world use cases without adding infrastructure.

Modeling Graphs in SQL

The adjacency list model represents nodes and edges as separate tables:




CREATE TABLE nodes (


id BIGSERIAL PRIMARY KEY,


label TEXT NOT NULL,


properties JSONB DEFAULT '{}'


);




CREATE TABLE edges (


id BIGSERIAL PRIMARY KEY,


source_id BIGINT NOT NULL REFERENCES nodes(id),


target_id BIGINT NOT NULL REFERENCES nodes(id),


edge_type TEXT NOT NULL,


properties JSONB DEFAULT '{}',


created_at TIMESTAMPTZ DEFAULT NOW()


);




-- Indexes for graph traversal


CREATE INDEX idx_edges_source ON edges (source_id);


CREATE INDEX idx_edges_target ON edges (target_id);


CREATE INDEX idx_edges_type ON edges (edge_type);





For an organizational hierarchy, the "manager-employee" relationship:




INSERT INTO nodes (id, label) VALUES


(1, 'Alice CEO'),


(2, 'Bob CTO'),


(3, 'Carol CFO'),


(4, 'Dave Engineering Lead'),


(5, 'Eve Senior Engineer'),


(6, 'Frank Junior Engineer');




INSERT INTO edges (source_id, target_id, edge_type) VALUES


(1, 2, 'manages'),


(1, 3, 'manages'),


(2, 4, 'manages'),


(4, 5, 'manages'),


(5, 6, 'manages');





WITH RECURSIVE Basics

A recursive CTE has two parts: a base term and a recursive term, joined by `UNION ALL`:




WITH RECURSIVE org_chart AS (


-- Base: find the CEO


SELECT id, label, 0 AS depth, ARRAY[id] AS path


FROM nodes


WHERE id = 1




UNION ALL




-- Recursive: find direct reports


SELECT n.id, n.label, oc.depth + 1, oc.path || n.id


FROM nodes n


JOIN edges e ON e.target_id = n.id AND e.edge_type = 'manages'


JOIN org_chart oc ON oc.id = e.source_id


)


SELECT repeat(' ', depth) || label AS hierarchy


FROM org_chart


ORDER BY path;





Result:




Alice CEO


Bob CTO


Dave Engineering Lead


Eve Senior Engineer


Frank Junior Engineer


Carol CFO





Graph Traversal Patterns

Shortest Path (BFS)




WITH RECURSIVE bfs AS (


-- Base: start node


SELECT id AS node_id, 0 AS distance, ARRAY[id] AS path


FROM nodes WHERE id = 1




UNION ALL




-- Explore neighbors


SELECT e.target_id, bfs.distance + 1, bfs.path || e.target_id


FROM bfs


JOIN edges e ON e.source_id = bfs.node_id


WHERE NOT e.target_id = ANY(bfs.path) -- avoid cycles


AND bfs.distance < 10 -- max depth


)


SELECT * FROM bfs


WHERE node_id = 6 -- target node


LIMIT 1;





All Paths Between Two Nodes




WITH RECURSIVE paths AS (


SELECT e.source_id, e.target_id, ARRAY[e.source_id, e.target_id] AS path


FROM edges e


WHERE e.source_id = 1 AND e.target_id = 6




UNION ALL




SELECT p.source_id, e.target_id, p.path || e.target_id


FROM paths p


JOIN edges e ON e.source_id = p.target_id


WHERE NOT e.target_id = ANY(p.path)


AND array_length(p.path, 1) < 10


)


SELECT path FROM paths WHERE target_id = 6;





Cycle Detection




WITH RECURSIVE detect_cycles AS (


SELECT id, ARRAY[id] AS visited, false AS is_cycle


FROM nodes


WHERE label = 'Alice CEO'




UNION ALL




SELECT n.id, dc.visited || n.id, n.id = ANY(dc.visited)


FROM detect_cycles dc


JOIN edges e ON e.source_id = dc.id


JOIN nodes n ON n.id = e.target_id


WHERE NOT dc.is_cycle


)


SELECT * FROM detect_cycles WHERE is_cycle;





Optimizing Recursive Queries

Recursive CTEs execute sequentially (each iteration is one plan node). Optimization strategies:


* **Limit depth early**: Add a depth guard in the `WHERE` clause.

2\. **Use cycle detection**: The `ARRAY[...]` path check can be expensive for deep graphs. Use PostgreSQL 14+'s `CYCLE` clause:




WITH RECURSIVE org_chart AS (


SELECT id, label, 0 AS depth


FROM nodes WHERE id = 1




UNION ALL




SELECT n.id, n.label, oc.depth + 1


FROM nodes n


JOIN edges e ON e.target_id = n.id AND e.edge_type = 'manages'


JOIN org_chart oc ON oc.id = e.source_id


)


CYCLE id SET is_cycle USING path


SELECT * FROM org_chart;





The `CYCLE` clause is more efficient than manual `ARRAY` checks because it uses efficient internal data structures.

3\. **Create indexes**: The source and target columns must be indexed for good join performance.

SQL Graphs vs Dedicated Graph Databases

| Capability | PostgreSQL (Recursive CTEs) | Neo4j (Cypher) | |------------|-----------------------------|-----------------| | Query syntax | WITH RECURSIVE + JOINs | MATCH (n)-[r]->(m) | | Path expressions | Manual construction | Built-in | | Variable-length paths | Recursive CTE depth | [r*1..5] syntax | | Graph algorithms | Custom SQL | Built-in (PageRank, etc.) | | Performance | Table scans, JOINs | Index-free adjacency | | Horizontal scaling | Read replicas, Citus | Native sharding |

When to Use Recursive CTEs

Recursive CTEs are the right tool when:


* The graph is a tree or near-tree hierarchy (org charts, category trees, bill of materials).

* The maximum depth is bounded (typically under 20 levels).

* You already use PostgreSQL and do not want to introduce a separate graph database.

* Graph traversal is a small fraction of overall query volume.


Consider a dedicated graph database when:


* You need unbounded, many-to-many graph traversal at scale.

* Graph algorithms (shortest path, centrality, PageRank) are the core of your application.

* Path queries traverse millions of nodes and edges.

* You need property graph features (labels on both nodes and edges) as a primary data model.


Recursive CTEs prove that SQL can handle graph queries. For bounded-depth hierarchies, they perform well and keep your architecture simple. When your graph queries become the dominant workload, it is time to evaluate a dedicated graph database.