PostgreSQL Can Handle Everything

"PostgreSQL can do everything." "Joins are free." That's what we were told. And technically, it's true. But just because you can walk a 3.1 million-edge graph, join four tables, and check access control in a single query doesn't mean you should.

We build a structured data app. Everything is a graph, items linked to items, nested arbitrarily deep, with access control on every node. Pure PostgreSQL, no cache layer. It worked fine at small scale. Then we hit 100k items and things got ugly: 5-second page loads, CPU spikes on the database, and one endpoint that took 28 minutes per request.

ironic-gif

Here are five patterns that brought us back to sub-second.

Stop materializing what you don't need

Our recursive CTEs pre-built the entire graph before walking it. The parent_links CTE joined every edge in the database, 3.1 million rows materialized just to find the ancestors of a single item.

-- before: materialize ALL edges, then walk
WITH RECURSIVE parent_links AS (
    SELECT child.id AS child_id, parent.id AS parent_id
    FROM instances
    JOIN value_item_list ON ...
    JOIN items child ON ...
    JOIN items parent ON ...
),
walk AS (
    SELECT parent_id FROM parent_links WHERE child_id = $1
    UNION ALL
    SELECT pl.parent_id FROM parent_links pl
    JOIN walk w ON pl.child_id = w.parent_id
)
SELECT * FROM walk;
sql

The fix: inline the edge lookup in each recursion step. Only visit ancestors of the target items.

-- after: walk only what you need
WITH RECURSIVE walk AS (
    SELECT vil.item_id AS current_id
    FROM value_item_list vil
    JOIN instances ins ON ins.id = vil.instance_id
    WHERE vil.item_id = $1
    UNION
    SELECT vil.item_id
    FROM walk w
    JOIN instances ins ON ins.item_id = w.current_id
    JOIN value_item_list vil ON vil.instance_id = ins.id
)
SELECT * FROM walk;
sql
Note the UNION instead of UNION ALL. In a DAG with shared ancestors (diamond shapes), UNION ALL revisits the same nodes through every path — exponential blowup. UNION deduplicates naturally.

5.9s → 106ms. 56x faster.

One query instead of N

Cycle detection is critical in a DAG — a cycle corrupts everything. Our original code checked each child one by one:

// before: N sequential queries
for child_id in &children {
    let has_cycle = is_cyclic(&mut tx, *child_id, &ancestors).await?;
    if has_cycle { return Err(...); }
}
rust

The fix is embarrassingly simple — batch with ANY:

// after: one query for all children
let has_cycle = are_cyclic(&mut tx, &children, &ancestors).await?;
rust

Under the hood, the SQL uses WHERE id = ANY($1) instead of a single WHERE id = $2. One round-trip instead of N.

350ms → 0.2ms for 5 children. 1750x faster.

The JOIN that multiplied your rows

We needed the maximum rank among existing children to insert new items at the right position. The natural approach:

-- before: LEFT JOIN + GROUP BY
SELECT p.item_id, MAX(vil.rank) AS max_rank
FROM parents p
LEFT JOIN value_item_list vil ON vil.instance_id = p.instance_id
GROUP BY p.item_id;
sql

With 64k existing children, this created a 320k-row cross-product. PostgreSQL spilled to disk.

-- after: correlated subquery
SELECT p.item_id,
    (SELECT MAX(rank) FROM value_item_list
     WHERE instance_id = p.instance_id AND attribute_id = p.attribute_id)
     AS max_rank
FROM parents p;
sql

Same result, no cross-product.

1091ms → 0.36ms. 3000x faster.

ironic-gif

We applied the same thinking to access control. A LEFT JOIN on access rights was multiplying rows by 4x, then deduplicating with GROUP BY. Replaced it with a NOT EXISTS anti-join, cleaner, better, feaster, stronger and it fixed a semantic bug where the old query passed if any item was accessible instead of all items.

Only join what you actually need

Our search query always joined the models and instances tables in the main CTE, even when no filter or sort column referenced them. With 2349 models and no index on models(item_id), PostgreSQL chose a nested loop with a sequential scan: 6 million row comparisons for 2568 matches.

The fix: conditionally emit the JOIN only when filters need it. And when you do need it, defer it past pagination, join on 50 paginated rows, not 64k CTE rows.

2.9x on multi-filter queries. 5.5x on search.

Split and run in parallel

We had a monolithic value retrieval query that LEFT JOIN-ed four backing tables (simple values, item lists, file lists, entity lists) regardless of which attribute types the model actually used. Every row paid for every join.

We partitioned attributes by type, then ran type-specific queries in parallel with tokio::try_join!. Each query only touches the table it needs.

211ms → 1.3ms for a single item. 160x faster.

What we learned

None of this required a new database, a cache layer, or a rewrite. Every fix was about understanding what PostgreSQL was actually doing with our queries — and stopping it from doing too much.

Recursive CTEs that materialize the entire graph? Inline the walk. Loops that fire N queries? Batch them. JOINs that multiply rows into cross-products? Use correlated subqueries. JOINs on tables you don't even filter on? Skip them. One giant query touching every table? Split by table, run in parallel.

The get-children endpoint went from 28 minutes to under a second. The insert endpoint from 10 seconds to 200 milliseconds. Same database, same hardware, same PostgreSQL.

The real lesson is simple: EXPLAIN ANALYZE is your best friend. Every one of these fixes started by reading the query plan and asking "why is this scanning 3 million rows to find 5 ancestors?" The answer was always the same, we were asking PostgreSQL to do work we didn't need.

PostgreSQL can handle everything. You just have to stop asking it to handle everything at once.

ironic-gif