Module 6 · RAG · Drills

Drills: RAG Foundations

Knowing the words "embedding" and "pgvector" isn't the same as writing the SQL. Type every query, reason through every question out loud, then reveal the solution. This is the muscle an interviewer tests.

How to use this page Each drill is a small task — some SQL, some reasoning. Attempt it first: write the query in psql or say the answer aloud, then click “Show solution” to compare. The reasoning drills matter as much as the SQL — they're the ones interviewers ask. Tick each box as you go; your progress is saved in this browser.

A · Warm-up reps Basic

Drill 1 pgvector setup

Enable the vector extension, then create a chunks table with an id, a document_id, the chunk content, and a 1536-dimension embedding.

Show solution

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint      NOT NULL,
    content     text        NOT NULL,
    embedding   vector(1536) NOT NULL
);

The vector(1536) dimension must match your embedding model. Mismatch it and inserts fail.

Drill 2 retrieval SQL

Write the top-k retrieval query: return the content of the 5 chunks nearest in meaning to a question vector passed as $1.

Show solution

SELECT id, content
FROM   chunks
ORDER BY embedding <=> $1
LIMIT  5;

Use <=> (cosine distance) for meaning. Smaller distance = closer match, so plain ORDER BY ascending puts the best chunks first.

Drill 3 recall

Without looking back: name the five RAG steps in order, and say which run once at upload vs. live on every question.

Show solution

# Indexing (once, at upload):
1. CHUNK    — split the document into passages
2. EMBED    — turn each chunk into a vector
3. STORE    — save text + vector in pgvector

# Querying (live, every question):
4. RETRIEVE — embed the question, find nearest chunks
5. GENERATE — give chunks + question to the LLM

B · Stretch Intermediate

Drill 4 reasoning

You're chunking two sources: a 40-page employment contract and a noisy customer chat log. What chunk size would you choose for each, and why?

Show solution

Contract → larger chunks (~600–800 tokens). Legal clauses build on definitions and conditions across several sentences; a clause split too finely loses the "subject to clause 4.2" context. Bigger chunks keep a full clause together.

Chat log → smaller chunks (~150–300 tokens), maybe one message or turn each. Each message is a self-contained thought, and topics shift fast. Small chunks keep retrieval precise instead of dragging in unrelated banter.

The real interview answer: "it depends on the document's structure — I'd tune, not guess." Add overlap in both cases so boundary sentences survive.

Drill 5 reasoning

A user asks DocChat: "What's the money-back period?" but the policy only ever says "refund window." Explain in words why a keyword search fails here and vector search wins.

Show solution

Keyword search fails because it matches letters. The query words "money-back period" never appear in the document, so a LIKE or full-text search returns nothing — even though the answer is right there under a different name.

Vector search wins because embeddings capture meaning, not spelling. "money-back period" and "refund window" produce vectors pointing in nearly the same direction, so cosine distance between them is small — the right chunk surfaces despite zero shared keywords.

This synonym/paraphrase gap is the single clearest reason to reach for RAG over a keyword index — a great thing to say in an interview.

C · Build challenge Build

Mini-project Write the full storage layer for DocChat in one SQL file: enable pgvector, create the chunks table, add a speed index, and write the live retrieval query. This is the exact schema you'll run in the capstone — get it right once and reuse it.

Build · DocChat schema + retrieval

Goal: a migration that an interviewer could read top-to-bottom and understand your whole retrieval path.

Show solution

001_chunks.sql

-- enable the vector type & operators
CREATE EXTENSION IF NOT EXISTS vector;

-- one row per chunk of an uploaded document
CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint      NOT NULL,
    content     text        NOT NULL,
    embedding   vector(1536) NOT NULL
);

-- approximate index so search stays fast at scale
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- RETRIEVE: 5 nearest chunks to the question vector $1
SELECT id, content
FROM   chunks
ORDER BY embedding <=> $1
LIMIT  5;

Note the index uses vector_cosine_ops to match the <=> operator in the query — the index and the search must agree on the distance metric, or the index won't be used.

D · Rapid recall Flashcards

Click a card to flip it. Say the answer out loud before you flip — that's the rep that builds storage strength.

What is an embedding?

A vector — a list of numbers — that captures the meaning of text. Similar meaning → nearby vectors.

click to flip

The five RAG steps, in order?

Chunk → Embed → Store → Retrieve → Generate.

click to flip

What does <=> do in pgvector?

Cosine distance between two vectors. Smaller = closer in meaning; use it in ORDER BY.

click to flip

What does vector(1536) mean?

A column holding a 1536-dimension embedding. The size must match your embedding model.

click to flip

Why give chunks overlap?

So a sentence split across a chunk boundary still appears whole in one chunk — preserving context.

click to flip

Why does vector beat keyword search?

It matches meaning, not spelling — "refund window" finds "money-back period" with zero shared words.

click to flip

E · Self-check before moving on

Tick each only if you can do it without looking:

I can explain what an embedding is and why similar meaning lands at nearby vectors
I can recite the five RAG steps and say which run at upload vs. at query time
I can enable pgvector and create the chunks table from memory
I can write the top-k cosine retrieval query with <=> and LIMIT
I can reason about chunk size and explain why vector beats keyword search in an interview

Next All ticked? You own the foundations — the concepts and the storage layer. Next we wire the live pieces around it: Lesson 6.2 — Building a RAG Pipeline.