Module 3 · Postgres & Data · Deep Dive

Advanced SQL & Postgres

CTEs, window functions, query plans, JSONB and transaction isolation — the layer that separates "I can write a SELECT" from "I own this database." This is the single biggest interview delta for a 2026 full-stack role.

IntermediateAdvancedBuild

Why this matters The fundamentals lesson got DocChat's tables built. This one is what gets asked in the technical screen. "Show me the 3 most recent documents per user" — that's a window function, and most candidates freeze on it. "Usage analytics by day" — that's date_trunc plus a running total. "Re-ingesting a document re-writes its chunks" — that's an UPSERT. "This endpoint is slow" — that's EXPLAIN ANALYZE. You already know MySQL from your PHP days; Postgres rewards you with sharper tools and stricter defaults. Learn these and you stop fearing the SQL round.

In this lesson

CTEs — WITH … AS
Window functions
Reading EXPLAIN plans
Indexes, deeper
UPSERT & RETURNING
JSONB
Dates, times & intervals
Transaction isolation & MVCC
Sharp small tools
Drill: top-N per user
Flashcards
Check yourself

1 · CTEs — readable multi-step queries

A Common Table Expression (CTE) is a named, temporary result set defined with WITH … AS (…) at the top of a query. Think of it as a local variable for a subquery: you name a step, then refer to that name below. It turns a deeply-nested mess into a top-to-bottom recipe.

-- "users who own more than 3 docs, with their newest doc's title"
WITH doc_counts AS (
    SELECT owner_id, COUNT(*) AS n
    FROM documents
    GROUP BY owner_id
)
SELECT u.name, c.n
FROM doc_counts c
JOIN users u ON u.id = c.owner_id
WHERE c.n > 3
ORDER BY c.n DESC;

You can chain several CTEs (comma-separated) so each step builds on the last — far easier to read and debug than one giant nested subquery. The query reads like prose: compute the counts, then join names, then filter.

PHP bridge: MySQL gained CTEs in 8.0, so the syntax is identical to what you may have seen. Postgres has had them for years and they're idiomatic — reach for a CTE the moment a subquery nests more than once.

Recursive CTEs walking a tree

Add RECURSIVE and a CTE can reference itself — the standard way to walk hierarchies (a category tree, an org chart, a comment thread) without a loop in application code. It runs the base query once, then repeatedly feeds its own output back in until nothing new appears.

-- all descendants of category 1, with their depth
WITH RECURSIVE subtree AS (
    -- base case: the starting node
    SELECT id, name, parent_id, 0 AS depth
    FROM categories
    WHERE id = 1

    UNION ALL

    -- recursive step: children of rows already found
    SELECT c.id, c.name, c.parent_id, s.depth + 1
    FROM categories c
    JOIN subtree s ON c.parent_id = s.id
)
SELECT * FROM subtree ORDER BY depth;

Interview: "How do you query a tree stored as parent_id?" Say: a recursive CTE — a base case selecting the root, UNION ALL, and a recursive term that joins the table to the CTE on parent_id = id. Postgres iterates until the recursive term returns no rows. The alternative (looping in app code, N queries deep) is what you're replacing.

2 · Window functions

This is the topic. A window function computes a value across a set of rows related to the current row — but, crucially, keeps every row in the output. That's the whole difference from GROUP BY, which collapses each group into a single summary row. A window lets you say "rank this row within its user's documents" while still returning the row itself.

The shape is always fn() OVER (PARTITION BY … ORDER BY …). PARTITION BY splits rows into groups (like GROUP BY but without collapsing); ORDER BY orders rows inside each partition so functions like ranking and running totals know the sequence.

ROW_NUMBER / RANK / DENSE_RANK

-- number each user's docs, newest = 1
SELECT
    id, owner_id, title, created_at,
    ROW_NUMBER() OVER (
        PARTITION BY owner_id
        ORDER BY created_at DESC
    ) AS rn
FROM documents;

Now wrap it in a CTE and filter WHERE rn <= 3 to get the classic top-3 docs per user — the canonical window-function interview question:

WITH ranked AS (
    SELECT id, owner_id, title, created_at,
           ROW_NUMBER() OVER (PARTITION BY owner_id
                              ORDER BY created_at DESC) AS rn
    FROM documents
)
SELECT owner_id, title, created_at
FROM ranked
WHERE rn <= 3;

The three ranking functions differ only on ties:

ROW_NUMBER() — always distinct: 1, 2, 3, 4. Ties broken arbitrarily.
RANK() — ties share a rank, then it skips: 1, 1, 3, 4.
DENSE_RANK() — ties share a rank, no gap: 1, 1, 2, 3.

Running totals & SUM() OVER

-- cumulative documents uploaded over time, per user
SELECT owner_id, created_at,
       COUNT(*) OVER (
           PARTITION BY owner_id
           ORDER BY created_at
       ) AS running_total
FROM documents;

With an ORDER BY inside the window, an aggregate like SUM or COUNT becomes a running total — it accumulates row by row instead of returning one grand total. Drop the ORDER BY and you get the partition's overall total repeated on every row, handy for "this row's value vs the group total."

LAG & LEAD reach across rows

-- gap in days between each upload and the user's previous one
SELECT owner_id, created_at,
       created_at - LAG(created_at) OVER (
           PARTITION BY owner_id ORDER BY created_at
       ) AS since_previous
FROM documents;

LAG(col) reads the column from the previous row in the window; LEAD(col) reads the next. Perfect for "change since last time" — week-over-week growth, time-between-events, price deltas — without a self-join.

Interview answer · GROUP BY vs window function

"They both partition rows — when do you use which?" Answer crisply: GROUP BY collapses each group into one summary row, so you lose the individual rows. A window function keeps every row and adds a computed column alongside it.

So: "total docs per user" is GROUP BY (one row per user). "Each doc, labelled with its rank among that user's docs" or "each doc next to its user's running total" is a window function (every doc still returned). If the question is "show me the rows and something computed over their group," it's a window. The giveaway phrases: per-row, top-N per group, running total, compared to previous.

3 · Reading EXPLAIN plans

When a query is slow, EXPLAIN shows you how Postgres intends to run it — the query plan — without running it. EXPLAIN ANALYZE actually executes the query and reports real timings and real row counts. That distinction matters: EXPLAIN alone gives estimates; EXPLAIN ANALYZE gives the truth (so never run it on a destructive statement in production — it really does the work).

EXPLAIN ANALYZE
SELECT * FROM documents WHERE owner_id = 42;

A plan is a tree of nodes, read inside-out. The lines you care about:

You see	It means
`Seq Scan`	Reading every row to find matches. Fine on tiny tables; a red flag on big ones with a selective filter — usually a missing index.
`Index Scan`	Jumping straight to matching rows via an index. What you want for selective lookups.
`cost=0.00..18.5`	Estimated startup..total cost in arbitrary units. Lower is better; compare alternatives.
`rows=…`	Estimated rows. With `ANALYZE` you also get `actual rows` — a big gap means stale stats (run `ANALYZE` the table).
`actual time=…`	Real milliseconds for that node (only with `ANALYZE`). Find the node eating the time.

The skill is scanning the tree for the most expensive node and asking "why is this a Seq Scan?" If the filter is selective (returns few rows) but Postgres scans the whole table, you're missing an index on that column.

Interview answer · "Your query is slow — how do you debug it?"

Walk the interviewer through the exact loop, naming the tools:

Run EXPLAIN ANALYZE on the real query with real parameters — get the actual plan and timings, not a guess.
Find the node eating the time — typically a Seq Scan over a large table, or a rows estimate wildly off from actual rows.
If it's scanning a big table to satisfy a selective WHERE/JOIN, add the right index (a B-tree on the filtered/joined column; a composite or partial index if the pattern calls for it).
Re-run EXPLAIN ANALYZE and confirm it flipped to an Index Scan and the time dropped. Measure, don't assume.

Bonus points: mention checking that statistics are fresh (ANALYZE the table), and that an index won't help if the filter isn't selective or the query returns most of the table.

4 · Indexes, deeper

The fundamentals lesson introduced the B-tree index. Three refinements come up constantly in interviews and real tuning.

Composite indexes column order matters

CREATE INDEX idx_docs_owner_created
ON documents (owner_id, created_at DESC);

A multi-column index is sorted by the first column, then the second within that. The leftmost-prefix rule: this index helps queries filtering on owner_id, or on owner_id AND created_at — but not a query filtering on created_at alone, because the second column is only ordered within each owner_id. Put the column you always filter by first. This index is also perfect for our top-N-per-user query: it filters by owner and pre-sorts by date.

Partial indexes index only what you query

-- only active users are ever looked up by email
CREATE INDEX idx_users_active_email
ON users (email)
WHERE is_active;

A WHERE on the index definition means Postgres indexes only the matching rows — smaller, faster, and cheaper to maintain. Ideal when queries always carry the same condition (e.g. WHERE is_active, or WHERE deleted_at IS NULL).

GIN indexes for JSONB & full-text

B-trees are for scalar equality and ranges. For "does this JSONB contain that key?" or "does this document match these search terms?" you need a GIN (Generalized Inverted) index — it indexes the contents of composite values. We use one below for JSONB.

When an index is NOT used Even a perfect index gets ignored when: the filter isn't selective (returns most of the table — a Seq Scan is genuinely faster); you wrapped the column in a function (WHERE lower(email) = … can't use a plain index on email — index the expression instead); a leading % in LIKE '%x' defeats a B-tree; or the table is so small the planner doesn't bother. EXPLAIN tells you which.

5 · UPSERT — insert-or-update in one statement

You often need "insert this row, but if it already exists, update it instead." Doing that as a SELECT-then-INSERT-or-UPDATE in app code is racy under concurrency. Postgres does it atomically with INSERT … ON CONFLICT.

-- re-ingesting a doc: same (document_id, chunk_index) overwrites the chunk
INSERT INTO chunks (document_id, chunk_index, content, embedding)
VALUES (7, 0, 'Retrieval augmented…', '[…]')
ON CONFLICT (document_id, chunk_index)
DO UPDATE SET
    content   = EXCLUDED.content,
    embedding = EXCLUDED.embedding
RETURNING id;

The conflict target (document_id, chunk_index) must be a UNIQUE constraint or primary key. EXCLUDED is the magic word: it's the row you tried to insert, so EXCLUDED.content is the new value. Two variants:

ON CONFLICT … DO UPDATE SET … — overwrite the existing row (a true upsert).
ON CONFLICT … DO NOTHING — silently skip the duplicate, no error. Great for idempotent inserts.

RETURNING hands back columns from the affected row in the same round-trip — the new id, a created_at, whatever you need. It works on INSERT, UPDATE and DELETE.

PHP bridge: MySQL's INSERT … ON DUPLICATE KEY UPDATE is the closest cousin. Postgres' version is more explicit (you name the conflicting columns) and pairs with RETURNING — there's no separate last_insert_id() call.

6 · JSONB — structured data in a column

Postgres can store JSON natively in a jsonb column — parsed, binary, indexable. It's perfect for semi-structured data that doesn't deserve its own columns: per-document metadata, flexible settings, API payloads you want to keep whole.

SELECT metadata -> 'source'            -- returns JSON: "pdf"
     , metadata ->> 'source'           -- returns text:  pdf
     , metadata -> 'tags' ->> 0         -- first tag as text
FROM documents
WHERE metadata @> '{"source": "pdf"}';  -- "contains" this key/value

-> extracts a field/element as JSON (chainable).
->> extracts it as text (use this when comparing to a string).
@> is "contains" — true if the left JSONB contains the right. This is the workhorse filter, and it's indexable.

-- make @> / key-existence lookups fast on a big table
CREATE INDEX idx_docs_metadata
ON documents USING GIN (metadata);

JSONB vs a real column Use a column for anything you filter, join, sort, or constrain on regularly — it's typed, indexed by default, and the schema documents it. Use JSONB for genuinely variable, sparse, or rarely-queried data where adding columns would be churn. The trap is reaching for JSONB to dodge schema design, then needing to query inside it constantly. If you @> the same key every request, promote it to a column.

PHP bridge: MySQL has a JSON type too, but Postgres' jsonb is more mature — richer operators (@>, ?, path queries) and proper GIN indexing. Prefer jsonb over plain json in Postgres: json stores raw text and re-parses every time.

7 · Dates, times & intervals

Analytics queries live and die on date handling. The essentials:

SELECT now();                              -- current timestamptz
SELECT now() - INTERVAL '7 days';          -- a week ago
SELECT date_trunc('day', created_at);      -- floor to midnight
SELECT AGE(now(), created_at);             -- human interval, e.g. "3 mons 2 days"

date_trunc('day', …) is how you bucket rows for daily/weekly/monthly analytics — it chops a timestamp down to the start of the day (or 'hour', 'week', 'month'). The DocChat usage chart is one query:

-- documents uploaded per day, last 30 days
SELECT date_trunc('day', created_at) AS day,
       COUNT(*) AS uploads
FROM documents
WHERE created_at >= now() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;

Always use timestamptz, never timestamp timestamp stores a wall-clock with no timezone — ambiguous and a source of bugs. timestamptz stores an absolute instant (internally UTC) and converts to the session's timezone on the way out. For a UAE app whose users span timezones, timestamptz is the only sane choice. The "tz" costs nothing and saves you a class of bug.

PHP bridge: if you fought MySQL's DATETIME vs TIMESTAMP and stored UTC strings by hand, timestamptz does that bookkeeping for you correctly.

8 · Transaction isolation & MVCC

The fundamentals lesson covered ACID. Isolation — the "I" — is the part interviewers push on, because it governs what concurrent transactions see of each other.

MVCC (Multi-Version Concurrency Control) is how Postgres pulls it off: instead of locking rows for reads, every write creates a new version of the row, and each transaction sees a consistent snapshot of the data as of a point in time. The headline consequence: readers never block writers, and writers never block readers. That's why Postgres stays responsive under load where a lock-everything database would stall.

You choose how strict the snapshot is with the isolation level:

Level	What it guarantees
`READ COMMITTED` (default)	Each statement sees rows committed before it began. Two reads in the same transaction can differ if another commits between them.
`REPEATABLE READ`	The whole transaction sees one frozen snapshot from its start — re-reading a row always returns the same value. Blocks non-repeatable reads.
`SERIALIZABLE`	Strongest: the result is as if transactions ran one-at-a-time. Postgres may abort one with a serialization error you must retry.

BEGIN ISOLATION LEVEL REPEATABLE READ;
  -- … reads here all see one consistent snapshot …
COMMIT;

Pick READ COMMITTED for ordinary web requests (the default, fine 95% of the time). Step up to REPEATABLE READ or SERIALIZABLE for multi-step money/quota logic where a mid-transaction change would corrupt a decision.

Deadlocks & how to avoid them A deadlock is two transactions each holding a lock the other needs — A locked row 1 and waits for row 2, B locked row 2 and waits for row 1. Neither can proceed. Postgres detects this and kills one with a deadlock detected error. The fix is consistent lock ordering: if every transaction always touches rows in the same order (e.g. always lower id first), the cycle can't form. Keep transactions short, and be ready to retry the aborted one.

Interview: "What's MVCC?" Multi-version concurrency control — writes create new row versions rather than overwriting, so each transaction reads a consistent snapshot without blocking writers (and vice versa). It's why Postgres' default READ COMMITTED is cheap and why you rarely need explicit read locks.

9 · Sharp small tools

Four constructs that punch above their weight in real queries:

-- DISTINCT ON: the newest doc per user, one row each (Postgres-only, very handy)
SELECT DISTINCT ON (owner_id) owner_id, title, created_at
FROM documents
ORDER BY owner_id, created_at DESC;

-- COALESCE: first non-null wins — a default for missing values
SELECT COALESCE(nickname, name, 'Anonymous') FROM users;

-- CASE: inline if/else inside a SELECT
SELECT title,
       CASE WHEN char_length(body) > 5000 THEN 'long'
            WHEN body IS NULL            THEN 'empty'
            ELSE 'short' END AS size_bucket
FROM documents;

And the set operators combine two result sets (each must have matching columns): UNION stacks rows and removes duplicates (UNION ALL keeps them, faster), while INTERSECT returns only rows present in both. DISTINCT ON is a Postgres specialty — a terser "first row per group" than a window function when you just want one row each.

10 · Drill it

Your tangible win Write the top-3-most-recent-documents-per-user query from scratch — the single most-asked advanced SQL question. If you can produce it without notes, you've cleared the bar most candidates trip on.

ChallengeSQL

Given documents(id, owner_id, title, created_at), return each user's 3 most recent documents — the owner_id, title, and created_at, with the newest first within each user. Bonus: which index makes it fast, and which single line would you change to get the single newest doc per user instead?

Show the answer

Number each user's docs newest-first with ROW_NUMBER(), then keep ranks 1–3:

WITH ranked AS (
    SELECT owner_id, title, created_at,
           ROW_NUMBER() OVER (
               PARTITION BY owner_id
               ORDER BY created_at DESC
           ) AS rn
    FROM documents
)
SELECT owner_id, title, created_at
FROM ranked
WHERE rn <= 3
ORDER BY owner_id, created_at DESC;

Index: CREATE INDEX ON documents (owner_id, created_at DESC); — it partitions by owner_id and pre-sorts by date, exactly what the window needs, so Postgres avoids a sort.

Single newest per user: change the filter to WHERE rn = 1 — or skip the CTE entirely with SELECT DISTINCT ON (owner_id) … ORDER BY owner_id, created_at DESC.

11 · Flashcards

Tap a card to flip it. Quiz yourself until the answer arrives before the flip.

Window function vs GROUP BY?

GROUP BY collapses each group to one summary row; a window function keeps every row and adds a computed column over its partition.

tap to flip

Shape of a window function?

fn() OVER (PARTITION BY … ORDER BY …). PARTITION groups; ORDER sequences rows inside each group.

tap to flip

EXPLAIN vs EXPLAIN ANALYZE?

EXPLAIN shows the estimated plan without running it; EXPLAIN ANALYZE actually executes it and reports real timings and row counts.

tap to flip

Seq Scan in a plan means?

Postgres reads every row. On a large table with a selective filter, it signals a missing or unusable index.

tap to flip

UPSERT syntax?

INSERT … ON CONFLICT (cols) DO UPDATE SET c = EXCLUDED.c (or DO NOTHING). Target must be a unique/primary key.

tap to flip

JSONB -> vs ->>?

-> returns the field as JSON (chainable); ->> returns it as text. @> tests containment and is GIN-indexable.

tap to flip

timestamp vs timestamptz?

timestamptz stores an absolute UTC instant and converts per session; timestamp is a bare wall-clock with no zone. Always use timestamptz.

tap to flip

What is MVCC?

Multi-version concurrency control: writes create new row versions, each transaction reads a consistent snapshot, so readers and writers don't block each other.

tap to flip

Composite index leftmost rule?

A (a, b) index serves filters on a or a AND b, but not b alone — it's sorted by a first, then b within each a.

tap to flip

How to avoid deadlocks?

Consistent lock ordering — always touch rows in the same order (e.g. lowest id first), keep transactions short, and retry the aborted one.

tap to flip

12 · Check yourself

Answer from memory — retrieval is what moves this from "I read it" to "I know it".

Recall quiz

Which keeps every row while adding a per-group rank?

What does EXPLAIN ANALYZE do that EXPLAIN won't?

In an UPSERT, what does EXCLUDED refer to?

Which JSONB operator tests for containment?

How do you reliably avoid most deadlocks?

Primary source ⭐ PostgreSQL Docs — Window Functions tutorial, the authoritative reference for §2 above and the whole advanced toolkit. For indexes and query plans done right, Use The Index, Luke is the definitive, free guide — read its chapters on the leftmost-prefix rule and reading EXPLAIN.