Module 3 · Postgres & Data · Deep Dive

SQL & Postgres Fundamentals

You already write SQL from your PHP/MySQL days — this lesson sharpens it, fixes the MySQL-isms, and grounds it in Postgres, the database your DocChat backend will actually run on.

BasicIntermediateBuild

Why this matters Every document DocChat ingests, every user who logs in, every chat message — all of it lives in Postgres. The Python and FastAPI you've learned are just the doors; the data is the building. You wrote SQL against MySQL in PHP, so the shapes are familiar. Here we make that knowledge production-grade: correct joins, indexes that keep queries fast, and transactions that keep data honest. Interviewers probe SQL hard because it separates people who "use an ORM" from people who understand their data.

In this lesson

The relational model
Creating tables & constraints
CRUD: select, insert, update, delete
JOINs: inner vs left
Aggregation & grouping
Indexes
Transactions & ACID
Connecting to Postgres
Build: the DocChat schema
Check yourself

1 · The relational model

A relational database stores data in tables. A table has named columns (each with a fixed type) and holds rows (one record each). A users table is exactly a spreadsheet of users: columns across the top, one user per row.

Type	Example	Use for
`integer`	`42`	Counts, IDs, whole numbers.
`text`	`'intro.pdf'`	Strings of any length (Postgres has no penalty).
`timestamptz`	`now()`	A date + time with timezone.
`boolean`	`true` / `false`	Yes/no flags like `is_active`.

Two constraints define how rows relate:

PRIMARY KEY — the column that uniquely identifies a row (usually an id). No two rows share it, and it's never null.
FOREIGN KEY — a column that points at another table's primary key. A documents.owner_id that references users.id is what makes "this document belongs to that user" true and enforced.

PHP bridge: this is the same model you used with MySQL — tables, rows, PRIMARY KEY, FOREIGN KEY. Postgres is very close to MySQL in SQL syntax; it's just the more rigorous, standards-compliant database and the production standard for modern Python apps. Most of your MySQL muscle memory transfers directly.

2 · Creating tables & constraints

You define a table with CREATE TABLE. Each column gets a name, a type, and optional constraints that the database will enforce for you — forever, on every write.

CREATE TABLE users (
    id          serial PRIMARY KEY,
    email       text NOT NULL UNIQUE,
    name        text NOT NULL,
    is_active   boolean NOT NULL DEFAULT true,
    created_at  timestamptz NOT NULL DEFAULT now()
);

What each constraint buys you:

serial PRIMARY KEY — an auto-incrementing integer id, unique per row. (Postgres' serial ≈ MySQL's AUTO_INCREMENT.)
NOT NULL — the column can never be empty; an insert without it is rejected.
UNIQUE — no two rows may share this value. Two users can't register the same email.
DEFAULT — the value used when you don't supply one. New users are active and stamped with the current time automatically.

Constraints are documentation that can't lie A NOT NULL or UNIQUE in the schema is a guarantee the database keeps no matter what code writes to it. Push rules into the schema; don't rely on application code to remember them.

3 · CRUD: the four operations

Create, Read, Update, Delete — INSERT, SELECT, UPDATE, DELETE. These are 90% of what you write daily. We'll use a documents table that belongs to users.

SELECT read rows

-- pick columns, filter, sort, cap the count
SELECT id, title, created_at
FROM documents
WHERE owner_id = 1
ORDER BY created_at DESC
LIMIT 10;

Read it left to right: from documents, keep rows where the owner is user 1, order newest first, return at most 10. Use SELECT * to grab every column, but name columns explicitly in real code — it's faster and won't break when the table grows.

PHP bridge: identical to the SELECT … WHERE … ORDER BY … LIMIT you ran through mysqli/PDO. Same clauses, same order.

INSERT create rows

INSERT INTO documents (owner_id, title, body)
VALUES (1, 'Intro to RAG', 'Retrieval augmented…')
RETURNING id;

You list the columns, then the matching values. Columns you skip take their DEFAULT. The Postgres-only RETURNING clause hands back the new row's id in the same round-trip — no separate "last insert id" call like in MySQL.

UPDATE change rows

UPDATE documents
SET title = 'Intro to RAG (v2)'
WHERE id = 7;

The career-ending omission An UPDATE or DELETE without a WHERE clause hits every row in the table. Always write the WHERE first. Many teams run inside a transaction (next section) so a mistake can be rolled back.

DELETE remove rows

DELETE FROM documents
WHERE id = 7;

4 · JOINs: combining tables

Data is split across tables to avoid duplication — users in one, their documents in another, linked by the owner_id foreign key. A JOIN stitches them back together in a query.

-- every document, with its owner's name
SELECT d.title, u.name AS owner
FROM documents d
INNER JOIN users u ON u.id = d.owner_id;

The ON clause says how the two tables line up: a document's owner_id matches a user's id. The d and u are short table aliases.

The difference between the two joins you'll use most:

INNER JOIN keeps only rows that match on both sides. A document with no valid owner, or a user with no documents, simply doesn't appear.
LEFT JOIN keeps every row from the left (first) table, and fills the right side with NULL where there's no match. Use it to ask "all users, even those with zero documents."

-- every user, plus their documents — users with none still show, body NULL
SELECT u.name, d.title
FROM users u
LEFT JOIN documents d ON d.owner_id = u.id;

Interview: "Explain a LEFT JOIN." Say: it returns all rows from the left table and the matching rows from the right; where there's no match, the right-hand columns come back NULL. The classic use is finding records with nothing on the other side — e.g. LEFT JOIN … WHERE d.id IS NULL lists users who have uploaded no documents. An INNER JOIN would silently drop them.

5 · Aggregation & grouping

Aggregate functions collapse many rows into one number: COUNT, SUM, AVG, MIN, MAX. On their own they summarise the whole table:

SELECT COUNT(*) AS total_docs
FROM documents;

GROUP BY splits the rows into buckets first, then aggregates each bucket. "How many documents does each user own?" is one query:

SELECT owner_id, COUNT(*) AS doc_count
FROM documents
GROUP BY owner_id
HAVING COUNT(*) > 3
ORDER BY doc_count DESC;

WHERE vs HAVING WHERE filters individual rows before grouping; HAVING filters the groups after aggregating. "Only documents from 2026" is a WHERE; "only users with more than 3 documents" is a HAVING. Mixing them up is a common interview slip.

6 · Indexes

Without an index, finding rows by a column means the database reads every row to check — a "sequential scan." On a big table that's slow. An index is a sorted lookup structure (a B-tree) that lets Postgres jump straight to the matching rows, the way a book index sends you to a page instead of skimming the whole book.

CREATE INDEX idx_documents_owner
ON documents (owner_id);

Now WHERE owner_id = 1 and joins on owner_id get dramatically faster as the table grows. When to add one: on columns you frequently filter, join, or sort by — foreign keys and lookup columns are prime candidates. Don't index everything: each index makes INSERT/UPDATE a little slower and uses disk, so add them where reads actually need them.

To see whether a query uses an index, prefix it with EXPLAIN — Postgres shows the plan, where "Index Scan" means it's using your index and "Seq Scan" means it isn't:

EXPLAIN SELECT * FROM documents WHERE owner_id = 1;

Interview: "What's an index?" A separate sorted data structure (usually a B-tree) on one or more columns that lets the database find matching rows without scanning the whole table — turning a slow linear search into a fast lookup. The trade-off: faster reads, slightly slower writes and extra storage, so you index the columns you actually query on (especially foreign keys), not every column.

7 · Transactions & ACID

A transaction groups several statements into one all-or-nothing unit. You open it with BEGIN, and either COMMIT to save everything or ROLLBACK to discard everything as if it never happened.

BEGIN;
  INSERT INTO documents (owner_id, title) VALUES (1, 'Report');
  UPDATE users SET doc_quota = doc_quota - 1 WHERE id = 1;
COMMIT;   -- both happen, or if anything fails, ROLLBACK and neither does

This is atomicity — the "A" in ACID — and it's why transactions matter: the insert and the quota decrement must both succeed or both fail, never half. If the server crashed between them without a transaction, you'd have a document with no quota charged. The full set:

Atomicity — all statements succeed together, or none do.
Consistency — the database moves from one valid state to another; constraints always hold.
Isolation — concurrent transactions don't see each other's half-done work.
Durability — once committed, the data survives a crash.

PHP bridge: if you ever wrapped MySQL writes in beginTransaction() / commit() / rollBack() with PDO, this is the same idea — Postgres just gives you stronger guarantees by default.

8 · Connecting to Postgres

psql is the official command-line client — your REPL for SQL. Create a database and connect:

# create a database from the shell
createdb docchat

# open an interactive session against it
psql docchat
docchat=# SELECT version();
docchat=# \dt        -- list tables
docchat=# \d users   -- describe the users table
docchat=# \q         -- quit

Backslash commands (\dt, \d, \q) are psql's own helpers, not SQL. Prefer a GUI? pgAdmin ships with Postgres, and DBeaver is a free, popular cross-database client — both let you browse tables and run queries visually, like phpMyAdmin did for MySQL. In Week 7 we'll run Postgres in Docker so DocChat's database is reproducible on any machine.

9 · Build it

Your tangible win Design and create DocChat's core schema: a users table and a documents table, linked by a foreign key so every document belongs to exactly one user. This is the literal foundation your capstone is built on — you'll add to it in every later lesson.

Open psql docchat and run these. Try writing them from memory first, then compare:

schema.sql

CREATE TABLE users (
    id          serial PRIMARY KEY,
    email       text        NOT NULL UNIQUE,
    name        text        NOT NULL,
    is_active   boolean     NOT NULL DEFAULT true,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE documents (
    id          serial PRIMARY KEY,
    owner_id    integer     NOT NULL REFERENCES users (id),
    title       text        NOT NULL,
    body        text,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- foreign keys are filtered/joined constantly — index it
CREATE INDEX idx_documents_owner ON documents (owner_id);

The REFERENCES users (id) on owner_id is the foreign key — Postgres now refuses any document whose owner_id doesn't match a real user, and refuses to delete a user who still owns documents. Seed a row and read it back:

INSERT INTO users (email, name) VALUES ('sam@docchat.ae', 'Sam') RETURNING id;
INSERT INTO documents (owner_id, title) VALUES (1, 'My first doc');

SELECT u.name, d.title
FROM users u
JOIN documents d ON d.owner_id = u.id;

10 · Check yourself

Answer from memory — retrieval is what moves this from "I read it" to "I know it".

Recall quiz

What does a FOREIGN KEY actually enforce?

Which join keeps users who own no documents?

Which clause filters groups after aggregating rows?

Why add an index to a frequently-filtered column?

What does ROLLBACK do to a transaction's writes?

Primary source ⭐ The Official PostgreSQL Tutorial. The canonical, authoritative reference for every statement above — tables, queries, joins, and transactions, straight from the source.