Module 6 · RAG · Drills
Knowing the words "embedding" and "pgvector" isn't the same as writing the SQL. Type every query, reason through every question out loud, then reveal the solution. This is the muscle an interviewer tests.
psql or say the answer aloud, then click “Show solution” to compare. The reasoning drills matter as much as the SQL — they're the ones interviewers ask. Tick each box as you go; your progress is saved in this browser.
Drill 1 pgvector setup
Enable the vector extension, then create a chunks table with an id, a document_id, the chunk content, and a 1536-dimension embedding.
CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE chunks ( id bigserial PRIMARY KEY, document_id bigint NOT NULL, content text NOT NULL, embedding vector(1536) NOT NULL );
The vector(1536) dimension must match your embedding model. Mismatch it and inserts fail.
Drill 2 retrieval SQL
Write the top-k retrieval query: return the content of the 5 chunks nearest in meaning to a question vector passed as $1.
SELECT id, content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
Use <=> (cosine distance) for meaning. Smaller distance = closer match, so plain ORDER BY ascending puts the best chunks first.
Drill 3 recall
Without looking back: name the five RAG steps in order, and say which run once at upload vs. live on every question.
# Indexing (once, at upload): 1. CHUNK — split the document into passages 2. EMBED — turn each chunk into a vector 3. STORE — save text + vector in pgvector # Querying (live, every question): 4. RETRIEVE — embed the question, find nearest chunks 5. GENERATE — give chunks + question to the LLM
Drill 4 reasoning
You're chunking two sources: a 40-page employment contract and a noisy customer chat log. What chunk size would you choose for each, and why?
Contract → larger chunks (~600–800 tokens). Legal clauses build on definitions and conditions across several sentences; a clause split too finely loses the "subject to clause 4.2" context. Bigger chunks keep a full clause together.
Chat log → smaller chunks (~150–300 tokens), maybe one message or turn each. Each message is a self-contained thought, and topics shift fast. Small chunks keep retrieval precise instead of dragging in unrelated banter.
The real interview answer: "it depends on the document's structure — I'd tune, not guess." Add overlap in both cases so boundary sentences survive.
Drill 5 reasoning
A user asks DocChat: "What's the money-back period?" but the policy only ever says "refund window." Explain in words why a keyword search fails here and vector search wins.
Keyword search fails because it matches letters. The query words "money-back period" never appear in the document, so a LIKE or full-text search returns nothing — even though the answer is right there under a different name.
Vector search wins because embeddings capture meaning, not spelling. "money-back period" and "refund window" produce vectors pointing in nearly the same direction, so cosine distance between them is small — the right chunk surfaces despite zero shared keywords.
This synonym/paraphrase gap is the single clearest reason to reach for RAG over a keyword index — a great thing to say in an interview.
chunks table, add a speed index, and write the live retrieval query. This is the exact schema you'll run in the capstone — get it right once and reuse it.
Build · DocChat schema + retrieval
Goal: a migration that an interviewer could read top-to-bottom and understand your whole retrieval path.
001_chunks.sql
-- enable the vector type & operators CREATE EXTENSION IF NOT EXISTS vector; -- one row per chunk of an uploaded document CREATE TABLE chunks ( id bigserial PRIMARY KEY, document_id bigint NOT NULL, content text NOT NULL, embedding vector(1536) NOT NULL ); -- approximate index so search stays fast at scale CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops); -- RETRIEVE: 5 nearest chunks to the question vector $1 SELECT id, content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
Note the index uses vector_cosine_ops to match the <=> operator in the query — the index and the search must agree on the distance metric, or the index won't be used.
Click a card to flip it. Say the answer out loud before you flip — that's the rep that builds storage strength.
<=> do in pgvector?ORDER BY.vector(1536) mean?Tick each only if you can do it without looking:
chunks table from memory<=> and LIMIT