Module 3 · Postgres & Data · Deep Dive

SQLAlchemy 2.0 & Migrations

Stop writing raw SQL strings by hand. Map Python classes to Postgres tables, query in pure Python, and version-control your schema with Alembic — the way every modern Python backend ships.

BasicIntermediateBuild

Why this matters Last lesson you wrote SQL by hand against Postgres. That's vital to understand, but production Python code rarely concatenates SQL — it uses an ORM. SQLAlchemy is the one the entire ecosystem standardised on, and FastAPI jobs assume it. In DocChat your documents currently live in a Python list that vanishes on restart. By the end of this lesson they'll be real rows in Postgres, mapped to clean Python models, served through a FastAPI dependency — and you'll be able to evolve the schema safely with Alembic.
In this lesson
  1. What an ORM is (and why)
  2. 2.0 declarative models
  3. Engine & Session
  4. CRUD through the ORM
  5. Relationships & loading
  6. Wiring into FastAPI
  7. Alembic migrations
  8. Build: persist DocChat
  9. Many-to-many relationships
  10. Cascades & delete behavior
  11. Loading strategies compared
  12. Connection pooling & engine lifecycle
  13. Hybrid properties
  14. Check yourself

1 · What an ORM is

An ORM (Object-Relational Mapper) is a translation layer. A Python class maps to a database table; an instance of that class maps to a row; an attribute maps to a column. You work with normal Python objects, and the ORM generates the SQL to load and save them.

In PythonIn Postgres
class Documentthe documents table
doc = Document(...)one row in that table
doc.titlethe title column

Why bother instead of raw SQL? Three reasons interviewers want to hear: you stay in one language (no string-glued SQL scattered everywhere), you get safety (parameters are bound, so SQL injection is much harder), and you get portability and refactoring (rename a column in one model, not across fifty query strings). The trade-off: a layer of abstraction you must understand to debug.

PHP bridge: SQLAlchemy is to Python what Eloquent or Doctrine is to PHP — model classes standing in for tables. SQLAlchemy's style sits closer to Doctrine's data-mapper than to Eloquent's active-record, but the mental model is the same.

2 · 2.0 declarative models

SQLAlchemy 2.0 is a real break from the old 1.x tutorials you'll find online. The modern way uses a DeclarativeBase subclass plus typed attributes with Mapped[...] and mapped_column(). The type annotation drives the column type, so models read like ordinary typed Python.

models.py
from datetime import datetime
from sqlalchemy import String, ForeignKey, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True)
    name: Mapped[str | None] = mapped_column(String(100))


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    body: Mapped[str]
    owner_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

Read the annotations carefully — they carry meaning. Mapped[int] with primary_key=True is an auto-incrementing PK. Mapped[str] is NOT NULL; Mapped[str | None] allows NULL. A bare Mapped[str] with no mapped_column still becomes a column — the annotation alone is enough.

2.0 vs the old tutorials (interviewers notice) If you see Column(Integer, primary_key=True) and declarative_base() called as a function, that's 1.x style. The 2.0 way is mapped_column(...) with Mapped[...] annotations and a class Base(DeclarativeBase). Knowing the difference signals you learned this in 2026, not from a 2019 blog post.

3 · Engine & Session

Two objects connect your models to the database. The Engine is the connection factory — created once, at startup, for the whole app. The Session is your unit of work — short-lived, one per request, the thing you actually add and query through.

db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# postgresql+psycopg = the Postgres dialect using the psycopg 3 driver
engine = create_engine("postgresql+psycopg://user:pass@localhost:5432/docchat")

# A factory that produces new Session objects
SessionLocal = sessionmaker(bind=engine)

The connection string has a fixed shape: dialect+driver://user:pass@host:port/database. In real code the URL comes from an environment variable, never a hardcoded literal.

A session's lifecycle is four moves. You add objects to it, commit to flush them to Postgres in a transaction, optionally refresh to pull back server-generated values (like the new id), and close to release the connection.

with SessionLocal() as session:
    user = User(email="sam@uae.dev", name="Sam")
    session.add(user)        # staged, not yet in the DB
    session.commit()         # transaction committed
    session.refresh(user)    # now user.id is populated
    print(user.id)
# leaving the `with` block closes the session automatically
PHP bridge: the Engine ≈ your PDO connection pool set up once; the Session ≈ Doctrine's EntityManager you get per request. commit() is the same idea as flushing a transaction.

4 · CRUD through the ORM

This is the day-to-day. In 2.0 you read with select() and execute it through the session — the old session.query(...) API still exists but is legacy. Get used to select().

Create add & commit

doc = Document(title="Q2 Report", body="...", owner_id=1)
session.add(doc)
session.commit()

Read select / where / order_by

from sqlalchemy import select

stmt = (
    select(Document)
    .where(Document.owner_id == 1)
    .order_by(Document.created_at.desc())
)
docs = session.scalars(stmt).all()   # list[Document], newest first

# A single row (or None):
one = session.scalars(
    select(Document).where(Document.id == 5)
).first()

Note session.scalars(stmt).all()scalars() unwraps each row to the single ORM object, so you get a clean list[Document] rather than rows of tuples. Use .first() for one-or-none.

Update mutate & commit

doc = session.scalars(
    select(Document).where(Document.id == 5)
).first()
doc.title = "Q2 Report (final)"   # just set the attribute
session.commit()                     # SQLAlchemy issues the UPDATE

You don't write an UPDATE statement — you change the object and commit. The session tracks what changed and emits the SQL. This "dirty tracking" is core ORM behaviour.

Delete remove & commit

session.delete(doc)
session.commit()

5 · Relationships & loading

A ForeignKey gives you the column; a relationship() gives you the convenient Python attribute that navigates it. One User owns many Documents — a classic one-to-many. back_populates wires both sides together so they stay in sync.

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True)

    documents: Mapped[list["Document"]] = relationship(
        back_populates="owner"
    )


class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    owner_id: Mapped[int] = mapped_column(ForeignKey("users.id"))

    owner: Mapped["User"] = relationship(back_populates="documents")

Now navigation is just attribute access:

user.documents      # list[Document] owned by this user
doc.owner.email     # walk from a document back to its user
Lazy vs selectin loading (the N+1 trap) By default relationships load lazily — each user.documents fires a fresh query the moment you touch it. Loop over 100 users and you get 100 extra queries: the infamous N+1 problem. Fix it by eager-loading with selectinload: select(User).options(selectinload(User.documents)) fetches all the documents in one extra query. Interviewers love this question.

6 · Wiring into FastAPI

Back in Module 2 you learned dependency injection with Depends. The session fits that pattern perfectly: a get_db dependency that yields a session and guarantees it closes after the request — even if the route raises.

deps.py
from collections.abc import Generator
from sqlalchemy.orm import Session
from .db import SessionLocal


def get_db() -> Generator[Session]:
    db = SessionLocal()
    try:
        yield db          # hand the session to the route
    finally:
        db.close()       # always runs, even on error
main.py
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.orm import Session
from .deps import get_db
from .models import Document

app = FastAPI()


@app.get("/documents")
def list_documents(db: Session = Depends(get_db)):
    docs = db.scalars(
        select(Document).order_by(Document.created_at.desc())
    ).all()
    return docs

The yield is the whole trick: code before it runs at request start, code in finally runs at request end. One dependency, reused by every route that touches the database.

7 · Alembic migrations

Your models describe the schema you want. The database has the schema it currently has. Alembic (SQLAlchemy's migration tool) bridges the gap and — crucially — version-controls it, so every change is a reviewable, repeatable file committed to git. No more "did anyone run the ALTER on prod?"

# 1. One-time setup — creates the alembic/ folder + config
alembic init alembic

# 2. Compare models to the DB and auto-generate a revision
alembic revision --autogenerate -m "create users and documents"

# 3. Apply all pending revisions to the database
alembic upgrade head

# (to undo the latest revision)
alembic downgrade -1

To make --autogenerate work, point Alembic at your models' metadata. In alembic/env.py set target_metadata = Base.metadata so Alembic can diff your classes against the live schema and write the op.create_table(...) / op.add_column(...) calls for you.

Always read the generated migration Autogenerate is a draft, not gospel. It can miss renames (sees a drop + an add) and server defaults. Open the new file under alembic/versions/, read the upgrade() and downgrade() functions, and fix anything off before you run upgrade head. Reviewing migrations is a senior habit.

8 · Build it

Your tangible win Convert DocChat's in-memory document list into real SQLAlchemy models persisted in Postgres, served through get_db. After this, your documents survive a restart — the moment a toy app becomes a real one.

Before: documents lived in a module-level list that reset on every reload.

before — in-memory (don't keep this)
DOCS = []   # lost on restart

@app.post("/documents")
def create(title: str, body: str):
    DOCS.append({"title": title, "body": body})
    return {"count": len(DOCS)}

After: the same endpoints, now writing to and reading from Postgres through the session.

after — persisted in Postgres
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.orm import Session
from .deps import get_db
from .models import Document

app = FastAPI()


@app.post("/documents")
def create(title: str, body: str, db: Session = Depends(get_db)):
    doc = Document(title=title, body=body, owner_id=1)
    db.add(doc)
    db.commit()
    db.refresh(doc)            # pull back the new id
    return doc


@app.get("/documents")
def list_docs(db: Session = Depends(get_db)):
    return db.scalars(
        select(Document).order_by(Document.created_at.desc())
    ).all()

Steps in order: define Base, User, Document in models.py; create the engine and SessionLocal in db.py; add get_db; then alembic revision --autogenerate and alembic upgrade head to create the tables. Restart, POST a document, restart again — it's still there.

9 · Many-to-many relationships

One-to-many covered UserDocument. But a Document can carry many tags, and each tag labels many documents — that's many-to-many. SQL has no native column for this, so you model it with a third association table holding just the two foreign keys. In 2.0 you declare a plain Table and hand it to relationship(secondary=...).

models.py
from sqlalchemy import Table, Column, ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


# Association table: just the two foreign keys, no model class
document_tag = Table(
    "document_tag",
    Base.metadata,
    Column("document_id", ForeignKey("documents.id"), primary_key=True),
    Column("tag_id", ForeignKey("tags.id"), primary_key=True),
)


class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))

    tags: Mapped[list["Tag"]] = relationship(
        secondary=document_tag, back_populates="documents"
    )


class Tag(Base):
    __tablename__ = "tags"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), unique=True)

    documents: Mapped[list["Document"]] = relationship(
        secondary=document_tag, back_populates="tags"
    )

Now linking is pure Python — append a Tag object to doc.tags and commit; SQLAlchemy inserts the join row for you:

doc.tags.append(Tag(name="finance"))
session.commit()        # a row appears in document_tag
When the link itself has data: the association-object pattern A bare secondary table works only when the join carries nothing but the two FKs. The moment the link needs its own columns — say added_at or added_by — you promote the join to a real model (an association object) with its own __tablename__ and two relationship()s, and you navigate through it rather than across it. Reach for it the instant a many-to-many grows attributes.

10 · Cascades & delete behavior

In DocChat a Document gets split into chunks for embedding. Those chunks are meaningless without their parent — delete the document and the chunks must die with it. By default SQLAlchemy does not do that; you ask for it with cascade="all, delete-orphan" on the parent side.

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))

    chunks: Mapped[list["Chunk"]] = relationship(
        back_populates="document",
        cascade="all, delete-orphan",
        passive_deletes=True,
    )


class Chunk(Base):
    __tablename__ = "chunks"
    id: Mapped[int] = mapped_column(primary_key=True)
    text: Mapped[str]
    document_id: Mapped[int] = mapped_column(
        ForeignKey("documents.id", ondelete="CASCADE")
    )

    document: Mapped["Document"] = relationship(back_populates="chunks")

Two distinct things are happening, and interviewers probe the difference. cascade="all, delete-orphan" is the ORM-level rule: deleting a Document deletes its Chunks, and removing a chunk from doc.chunks (orphaning it) deletes that chunk too. The ondelete="CASCADE" on the ForeignKey is the database-level rule — Postgres itself removes the rows. passive_deletes=True tells SQLAlchemy to trust the database to do it, so it skips loading every child into memory just to delete them one by one.

doc = session.get(Document, 5)
session.delete(doc)
session.commit()        # the document AND all its chunks are gone
Belt and braces — set both The ORM cascade only fires for objects the session knows about; a stray DELETE straight to Postgres would orphan chunks if the FK had no ondelete. Set both the ORM cascade and the DB ondelete="CASCADE", then add passive_deletes=True so they cooperate instead of doing the work twice. That trio is the production-correct answer.

11 · Loading strategies compared

Section 5 met the N+1 trap and fixed it with selectinload. But that's one of several strategies, and picking the right one per relationship is a senior skill. You pass them through .options(...) on a select().

from sqlalchemy import select
from sqlalchemy.orm import selectinload, joinedload

# Collections (one-to-many / many-to-many): selectinload
stmt = select(User).options(selectinload(User.documents))

# Many-to-one (the single parent): joinedload
stmt = select(Document).options(joinedload(Document.owner))
StrategyHow it loadsBest for
selectinloadone extra query, WHERE id IN (...)collections (one-to-many, many-to-many)
joinedloada JOIN in the same querymany-to-one (the single parent row)
subqueryloadone extra query via a subqueryolder fallback for collections; mostly superseded by selectinload
lazy="raise"refuses to load — raises insteadforcing every access to be explicit; catches accidental N+1 in tests

The rule of thumb: collections want selectinload (a single IN query, no row multiplication), while a many-to-one wants joinedload (one JOIN, no extra round trip). Avoid joinedload on a collection — the join multiplies the parent row once per child and bloats the result set. Set lazy="raise" on a relationship during development so any forgotten eager-load blows up loudly instead of silently firing N queries in prod.

Interview answer · selectinload vs joinedload "Use selectinload for collections and joinedload for many-to-one. selectinload issues one extra query with WHERE id IN (...), so it never multiplies rows — perfect for a one-to-many like a user's documents. joinedload emits a single JOIN, which is ideal for fetching the one parent of a many-to-one (a document's owner) but wasteful on a collection because the parent row repeats for every child. Both kill the N+1 problem; the difference is row shape and whether you want a second query or a join."

12 · Connection pooling & engine lifecycle

Opening a Postgres connection is expensive — a TCP handshake plus auth, every time. The Engine avoids that by keeping a pool of live connections and lending one out per session, then taking it back on close() rather than tearing it down. You create the engine once for the whole process and tune the pool with arguments.

db.py
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_size=5,            # persistent connections kept open
    max_overflow=10,        # extra burst connections under load
    pool_pre_ping=True,     # test a connection before handing it out
)

SessionLocal = sessionmaker(bind=engine)

pool_size is the steady-state pool; max_overflow is how many extra connections it may open during a spike before requests have to wait. The one that saves you in production is pool_pre_ping=True: it sends a tiny test query before lending a connection, and silently reconnects if it's dead.

Why pool_pre_ping matters on Neon / Supabase Managed Postgres providers (Neon, Supabase) aggressively close idle connections to save resources. Without pool_pre_ping, your pool hands out a connection the server already hung up on, and the request dies with a "server closed the connection unexpectedly" error. pool_pre_ping=True detects the dead socket and transparently opens a fresh one — it's effectively mandatory on serverless Postgres in 2026.

The engine lives for the whole app; the session lives for one request. That's exactly the get_db dependency from section 6 — SessionLocal() borrows a connection from the pool, db.close() in the finally returns it. One session per request, one engine per process: the standard FastAPI shape.

13 · Hybrid properties

Sometimes you want a computed attribute that works the same in Python and in SQL — readable as obj.prop on a loaded object, and usable inside a where() so the database does the filtering. That's @hybrid_property.

from sqlalchemy.ext.hybrid import hybrid_property


class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    body: Mapped[str]

    @hybrid_property
    def is_long(self) -> bool:
        return func.length(self.body) > 5000

The same expression evaluates in Python on an instance and compiles to SQL in a query:

doc.is_long                         # True/False on a loaded object
session.scalars(
    select(Document).where(Document.is_long)   # filters in Postgres
).all()

14 · Check yourself

Answer from memory — retrieval is what moves this from "I read it" to "I know it".

Recall quiz

In an ORM, a Python class maps to which thing?

Which is the correct SQLAlchemy 2.0 column style?

How do you read rows as clean model objects?

What links the two sides of a relationship together?

What does Alembic actually give your project?

Which eager-loading strategy fits a collection best?

Why set pool_pre_ping on managed Postgres?

Primary source ⭐ SQLAlchemy 2.0 — ORM Quick Start. The canonical, authoritative reference for the declarative models, Session, and select() patterns above. Pair it with the Alembic autogenerate docs for the migration workflow.