Module 1 · Python · Deep Dive

Pythonic & Advanced

The features that separate "I write Python" from "I write Pythonic Python" — comprehensions, generators, decorators, context managers, type hints, and the async keyword that powers FastAPI.


Why this matters: Everything from here is what an interviewer probes to tell a real Python dev from a tourist. The DocChat backend leans on every idea below: comprehensions clean ingested text, generators stream document chunks without blowing up memory, decorators are literally how FastAPI declares routes (@app.get), context managers open files safely, and async/await is the engine under every endpoint. Learn these and the rest of the course stops feeling like magic.
In this lesson
  1. Comprehensions
  2. Generators & yield
  3. Iterators, zip, enumerate, unpacking
  4. Error handling & custom exceptions
  5. Decorators
  6. Context managers
  7. Type hints, deeper
  8. Standard-library gold
  9. A taste of async/await
  10. Build: a log-line parser
  11. functools — caching, wraps, partial
  12. Parameterized decorators
  13. collections & itertools
  14. A taste of re (regex)
  15. Check yourself

1 · Comprehensions

A comprehension builds a collection in one expressive line. It is the feature that most visibly sets Python apart from PHP, and interviewers expect you to read and write comprehensions fluently.

# The long way
squares = []
for n in range(5):
    squares.append(n * n)

# The Pythonic way — a list comprehension
squares = [n * n for n in range(5)]   # [0, 1, 4, 9, 16]

Read it left to right: "the expression n*n, for each n in the range." Add a trailing if to filter:

nums = [4, 1, 7, 2, 9]
evens = [n for n in nums if n % 2 == 0]   # [4, 2]
labels = [f"#{n}" for n in nums if n > 3]   # ['#4', '#7', '#9']

The same syntax builds dicts and sets — just swap the brackets:

# dict comprehension — {key: value for ...}
prices = {"pen": 3, "pad": 8, "ink": 12}
with_vat = {name: p * 1.05 for name, p in prices.items()}
cheap    = {name: p for name, p in prices.items() if p < 10}

# set comprehension — unique results, no duplicates
words = "the cat the dog the cat".split()
unique_lengths = {len(w) for w in words}   # {3}
PHP bridge: there's no real PHP equivalent. The closest is chaining array_map and array_filter, which is clunkier and harder to read at a glance. A comprehension says map and filter in one breath.
Don't over-nest: Comprehensions can nest ([x for row in grid for x in row]), but if one takes more than a glance to read, write a plain loop. Clever is not the goal; clear is.
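
As a rough illustration of where that line falls, the sketch below flattens a small grid both ways. The grid data and the variable names are made up for this example.

```python
# Hypothetical sample data: a 2x3 grid of numbers.
grid = [[1, 2, 3], [4, 5, 6]]

# Nested comprehension: the for-clauses read in the same order as nested loops.
flat = [x for row in grid for x in row]

# Equivalent plain loop, easier to extend with extra conditions or logging.
flat_loop = []
for row in grid:
    for x in row:
        flat_loop.append(x)

print(flat)               # [1, 2, 3, 4, 5, 6]
print(flat == flat_loop)  # True
```

Two for-clauses is usually still readable; once you add a filter on each level, the plain loop tends to win.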

2 · Generators & yield

A list builds all its items in memory at once. A generator produces items one at a time, on demand — it's lazy. You write one with yield instead of return:

def count_up(limit):
    n = 0
    while n < limit:
        yield n        # hand back one value, then PAUSE here
        n += 1

for x in count_up(3):
    print(x)          # 0, 1, 2 — produced one at a time

When the loop asks for the next value, the function resumes right where it paused. Only the function's local variables and its paused position are kept; the values already produced are not stored. The memory win is enormous: a generator over a billion items uses the same memory as one over three.
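
To make the pause-and-resume behaviour concrete, here is a small sketch that drives the same count_up generator by hand with next() instead of a for loop. The variable names (gen, first, second, exhausted) are only for illustration.

```python
def count_up(limit):
    n = 0
    while n < limit:
        yield n
        n += 1

gen = count_up(2)      # calling the function runs no code yet; it returns a generator object
first = next(gen)      # runs until the first yield -> 0
second = next(gen)     # resumes after the yield, loops, yields again -> 1

try:
    next(gen)          # the while condition is now false, so the function ends
    exhausted = False
except StopIteration:
    exhausted = True   # a for loop catches this for you and simply stops

print(first, second, exhausted)   # 0 1 True
```

A for loop does exactly this under the hood: it calls next() repeatedly and stops quietly on StopIteration.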

There's also a generator expression — a comprehension with round brackets — for the same laziness inline:

total = sum(n * n for n in range(1_000_000))   # no million-item list ever built
Why this matters for DocChat: To ingest a 300-page PDF you don't load every chunk into a list; you stream chunks through a generator, embedding and storing each as it flows past. A list of every chunk could exhaust memory on a big upload, while a generator keeps a small, constant footprint. This is the exact pattern you'll write in the RAG module.
# Streaming chunks of text — lazily, one at a time
def chunks(text, size):
    for i in range(0, len(text), size):
        yield text[i : i + size]

for piece in chunks(big_document, 500):
    embed_and_store(piece)   # process, then discard — memory stays flat

3 · Iterators, zip, enumerate, unpacking

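A generator is one kind of iterator: an object that hands out its values one at a time through next(). Anything you can loop over (a list, a dict, a file, a generator) is an iterable, and a for loop asks it for an iterator behind the scenes. Three tools make iteration clean and come up constantly in interviews: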

# enumerate — index + item together (you met this in 1.1)
for i, name in enumerate(["a", "b"], start=1):
    print(i, name)            # 1 a / 2 b

# zip — walk two (or more) sequences in lockstep
names  = ["Sam", "Mei"]
scores = [90, 85]
for name, score in zip(names, scores):
    print(f"{name}: {score}")

paired = dict(zip(names, scores))   # {'Sam': 90, 'Mei': 85}
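
One behaviour of zip worth knowing before an interview: it stops at the shortest input without complaining. The sketch below uses made-up data to show that, along with itertools.zip_longest as one way to keep the leftovers. On Python 3.10 and later, zip(names, scores, strict=True) raises ValueError on a length mismatch instead.

```python
from itertools import zip_longest

names = ["Sam", "Mei", "Ali"]   # hypothetical data: three names...
scores = [90, 85]               # ...but only two scores

# zip stops at the shortest input, so "Ali" is silently dropped.
pairs = list(zip(names, scores))

# zip_longest keeps going and fills the gaps with a value you choose.
padded = list(zip_longest(names, scores, fillvalue=None))

print(pairs)    # [('Sam', 90), ('Mei', 85)]
print(padded)   # [('Sam', 90), ('Mei', 85), ('Ali', None)]
```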

Tuple unpacking assigns several names at once, and the * "star" collects the rest:

x, y = (10, 20)            # x=10, y=20
first, *rest = [1, 2, 3, 4]   # first=1, rest=[2, 3, 4]
head, *mid, tail = [1, 2, 3, 4]   # head=1, mid=[2,3], tail=4

# * also SPREADS a sequence into function args
nums = [3, 7, 1]
print(max(*nums))           # 7  (same as max(3, 7, 1))

# ** spreads a dict into keyword args
opts = {"sep": " | "}
print("a", "b", **opts)        # a | b
PHP bridge: *args ≈ PHP's ...$args spread/variadic, and **kwargs ≈ passing a named-argument array. zip has no neat PHP twin — you'd loop an index by hand.
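
The same stars work in the other direction inside a function definition, where they collect arguments instead of spreading them. A minimal sketch, with describe and call_through as hypothetical helper names; this collect-then-spread pattern is what the wrapper functions in the decorators section rely on.

```python
def describe(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    return f"{len(args)} positional, keywords: {sorted(kwargs)}"

def call_through(fn, *args, **kwargs):
    # Collect whatever was passed, then spread it back out unchanged.
    return fn(*args, **kwargs)

print(describe(1, 2, sep="|"))                        # 2 positional, keywords: ['sep']
print(call_through(max, 3, 7, 1))                     # 7
print(call_through(sorted, [3, 1, 2], reverse=True))  # [3, 2, 1]
```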

4 · Error handling & custom exceptions

Python uses try / except / else / finally. You catch specific exception types, never a bare blanket catch:

try:
    pages = int(raw_value)      # might raise ValueError
except ValueError:
    pages = 0                   # runs only if THAT error happened
else:
    print("parsed cleanly")     # runs only if NO error happened
finally:
    print("always runs")        # cleanup — runs no matter what

You raise errors yourself with raise, and you can define your own exception classes by subclassing Exception — this is how real APIs signal domain problems:

class DocumentTooLargeError(Exception):
    """Raised when an upload exceeds the page limit."""

def ingest(doc):
    if doc["pages"] > 1000:
        raise DocumentTooLargeError(f"{doc['pages']} pages is too many")
    return "ok"
Interview note — EAFP vs LBYL. Python prefers EAFP ("Easier to Ask Forgiveness than Permission"): just try the operation and catch the failure. LBYL ("Look Before You Leap") checks conditions first with if. PHP code leans LBYL (if (isset(...))); idiomatic Python leans EAFP — try: d[k] except KeyError: over if k in d: when a miss is the exception, not the norm. Saying this in an interview signals you think in Python, not translated PHP.
PHP bridge: nearly identical to PHP's try/catch/finally and throw. The new parts are the else clause and the cultural lean toward EAFP.
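
The two styles side by side, as a rough sketch. The helper names pages_lbyl and pages_eafp and the limit parameter are assumptions added for this example; DocumentTooLargeError is the class from above.

```python
class DocumentTooLargeError(Exception):
    """Raised when an upload exceeds the page limit."""

def pages_lbyl(doc):
    # LBYL: check first, then read.
    if "pages" in doc:
        return doc["pages"]
    return 0

def pages_eafp(doc):
    # EAFP: just read, and handle the one specific failure.
    try:
        return doc["pages"]
    except KeyError:
        return 0

def ingest(doc, limit=1000):   # `limit` is a parameter added for this sketch
    pages = pages_eafp(doc)
    if pages > limit:
        raise DocumentTooLargeError(f"{pages} pages is too many")
    return "ok"

try:
    ingest({"pages": 1200})
except DocumentTooLargeError as exc:
    message = str(exc)         # the caller decides how to report a domain error

print(pages_lbyl({}), pages_eafp({}), message)   # 0 0 1200 pages is too many
```

Catching the custom class by name, rather than a broad Exception, keeps unrelated bugs from being swallowed by the same handler.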

5 · Decorators

A decorator is a function that wraps another function to add behaviour — without editing the original. The @name syntax sits on the line above a function. It's the feature that makes FastAPI's route declarations possible, so understand it now.

import time

def timed(fn):                       # takes a function...
    def wrapper(*args, **kwargs):     # ...returns a wrapped version
        start = time.perf_counter()
        result = fn(*args, **kwargs)   # call the original
        ms = (time.perf_counter() - start) * 1000
        print(f"{fn.__name__} took {ms:.2f} ms")
        return result
    return wrapper

@timed                              # == greet = timed(greet)
def greet(name):
    return f"Hi {name}"

greet("Sam")                       # prints something like: greet took 0.01 ms

The @timed line is pure sugar for greet = timed(greet). The wrapper now runs around every call — perfect for timing, logging, caching, or auth checks.

The bridge to FastAPI: When you write @app.get("/docs") over a function in Module 2, you are using a decorator. FastAPI's app.get(...) returns a decorator that registers your function as the handler for GET /docs. The framework then wires the route, parses the request, and validates the response, all without you touching the function body. The mental model you just built, "a decorator takes a function and adds behaviour around it", is exactly what makes @app.get click instead of feeling like incantation.
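
To see how a decorator with a path argument could register a handler, here is a toy sketch. MiniApp, its routes dict, and the /health route are all hypothetical; this is not FastAPI's implementation, only the same shape in a dozen lines.

```python
class MiniApp:
    """Toy stand-in for a web framework's app object (hypothetical, not FastAPI)."""

    def __init__(self):
        self.routes = {}                       # maps ("GET", path) -> handler function

    def get(self, path):
        def decorator(fn):
            self.routes[("GET", path)] = fn    # register the handler
            return fn                          # hand the function back unchanged
        return decorator

app = MiniApp()

@app.get("/health")                            # hypothetical route for this sketch
def health():
    return {"status": "ok"}

handler = app.routes[("GET", "/health")]       # what a server would look up per request
print(handler())                               # {'status': 'ok'}
```

Note that app.get("/health") is called first and returns the real decorator, which is the two-step pattern a decorator with arguments always follows.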

6 · Context managers

A context manager guarantees setup and cleanup around a block — most famously, closing a file even if an error is thrown. The with statement is how you use one:

with open("notes.txt") as f:
    text = f.read()
# file is AUTOMATICALLY closed here — even if an exception fired inside

Without with, you'd have to call f.close() by hand and wrap it in try/finally to be safe. The with block does that for you. The same pattern manages database sessions, network connections, and locks — anything that must be released.

PHP bridge: PHP leaves resource cleanup mostly to you (fclose, or hoping the request ends). Python's with makes "always release it" a one-liner — a habit interviewers love to see.
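
You can write your own context manager too. One common route is contextlib.contextmanager from the standard library, sketched below; the managed function and the events list are illustrative names, standing in for something like a database session.

```python
from contextlib import contextmanager

events = []   # records the order things happen in, for illustration

@contextmanager
def managed(name):
    events.append(f"open {name}")        # setup: runs on entering the with block
    try:
        yield name                       # the value bound by "as"
    finally:
        events.append(f"close {name}")   # cleanup: runs even if the block raised

try:
    with managed("session") as s:
        events.append(f"use {s}")
        raise RuntimeError("boom")       # simulate a failure inside the block
except RuntimeError:
    events.append("handled")

print(events)   # ['open session', 'use session', 'close session', 'handled']
```

The cleanup step runs before the exception reaches the outer except, which is the guarantee the with statement exists to give.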

7 · Type hints, deeper

You met basic hints in 1.2. Real codebases (and FastAPI especially) lean on richer ones. They don't change runtime behaviour — they document intent and let tools catch bugs before you run.

from typing import Optional

def find_title(doc_id: int) -> Optional[str]:
    """Returns the title, or None if not found."""
    ...

tags: list[str] = ["ai", "uae"]          # a list of strings
counts: dict[str, int] = {"ai": 2}        # str keys, int values
A note on mypy: Run mypy your_file.py and it flags type mismatches statically, such as passing a str where an int is declared, before your code ever executes. It's the closest Python gets to a compiler's safety net, and it is widely used on professional teams.
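
A quick way to convince yourself that hints are not enforced at runtime: the sketch below calls an int-annotated function with a str and Python runs it anyway. The double function and the titles parameter are made up for this example; a checker such as mypy would be expected to flag the double("ab") call. On Python 3.10 and later, Optional[str] can also be spelled str | None.

```python
from typing import Optional

def double(n: int) -> int:
    return n * 2

def find_title(doc_id: int, titles: dict[int, str]) -> Optional[str]:
    # dict.get returns None for a missing key, matching Optional[str]
    return titles.get(doc_id)

print(double(4))                      # 8
print(double("ab"))                   # abab  (the int hint is not enforced at runtime)
print(find_title(7, {1: "Intro"}))    # None
print(double.__annotations__)         # the hints are stored as plain metadata
```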

8 · Standard-library gold

Python ships "batteries included." Four pieces of the standard library you'll reach for weekly:

from collections import Counter
from pathlib import Path
import json, datetime

# Counter — tally items, instantly
votes = Counter(["ai", "uae", "ai", "ai"])
votes.most_common(1)              # [('ai', 3)]

# pathlib — modern, OS-safe file paths (no string juggling)
p = Path("docs") / "report.pdf"   # 'docs/report.pdf'
p.exists(), p.suffix, p.stem      # False, '.pdf', 'report'

# json — dict ⇄ JSON string, the lingua franca of APIs
text = json.dumps({"ok": True})    # '{"ok": true}'
back = json.loads(text)            # {'ok': True}

# datetime — timestamps
now = datetime.datetime.now()
now.isoformat()                    # e.g. '2026-06-19T18:55:00.123456'
PHP bridge: json.dumps/loads ≈ json_encode/json_decode. Counter ≈ array_count_values, but smarter. pathlib ≈ a cleaner DIRECTORY_SEPARATOR story: paths become objects you compose with /.
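
These pieces combine naturally. The sketch below tallies some tags, writes the result to a JSON file through pathlib, and reads it back. The tag data and the file name tag_counts.json are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

tags = ["ai", "uae", "ai", "rag", "ai"]          # hypothetical sample data
counts = Counter(tags)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tag_counts.json"         # hypothetical file name
    # Counter is a dict subclass, so json.dumps can serialise it directly.
    path.write_text(json.dumps(counts), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))
    existed = path.exists()

print(loaded)                            # {'ai': 3, 'uae': 1, 'rag': 1}
print(existed, path.suffix, path.stem)   # True .json tag_counts
```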

9 · A taste of async/await

You don't need to master async yet, but you'll see it from the first FastAPI lesson, so meet it now. An async def function is a coroutine function: calling it creates a coroutine object, which only runs when it is awaited or handed to the event loop. Inside it, await pauses on a slow operation (a network call, a DB query) and lets other work run meanwhile, instead of blocking.

import asyncio

async def fetch_title(doc_id):
    await asyncio.sleep(1)        # pretend this is a slow DB/network call
    return f"Title {doc_id}"

async def main():
    title = await fetch_title(7)   # wait without blocking everything else
    print(title)

asyncio.run(main())
Why FastAPI cares: A web server spends most of its time waiting on the database, on an AI embedding call, on disk. async lets one server handle many requests during those waits instead of sitting idle. That's why you'll write async def handlers in FastAPI: it's how a single process serves lots of concurrent DocChat users efficiently.
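
The payoff only shows up when several waits overlap. As a minimal sketch, asyncio.gather below starts three of the fake fetches together; the 0.2-second sleep is an arbitrary stand-in for real I/O, and the exact elapsed time will vary by machine.

```python
import asyncio
import time

async def fetch_title(doc_id):
    await asyncio.sleep(0.2)          # stand-in for a slow DB or network call
    return f"Title {doc_id}"

async def main():
    start = time.perf_counter()
    # Start all three coroutines together; their waits overlap.
    titles = await asyncio.gather(fetch_title(1), fetch_title(2), fetch_title(3))
    elapsed = time.perf_counter() - start
    return titles, elapsed

titles, elapsed = asyncio.run(main())
print(titles)                 # ['Title 1', 'Title 2', 'Title 3']
print(f"{elapsed:.1f}s")      # roughly 0.2s rather than 0.6s
```

Awaiting the three calls one after another would take about three times as long, because each await would finish before the next call even started.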

10 · Build it

Your tangible win: Parse a block of server log lines and report how many of each level (INFO, WARN, ERROR) appeared, using a comprehension to extract levels and a Counter to tally them. This is real ops code: you'll skim logs exactly like this when DocChat is deployed.

Try it first, then compare. A clean version:

logparse.py
from collections import Counter

logs = """INFO  server started
WARN  slow query 1200ms
INFO  request /docs 200
ERROR db connection lost
WARN  retry scheduled
INFO  request /ask 200
ERROR embedding timeout"""

# 1. comprehension: first word of each non-empty line = the level
levels = [line.split()[0] for line in logs.splitlines() if line.strip()]

# 2. Counter: tally them
counts = Counter(levels)

for level, n in counts.most_common():
    print(f"{level}: {n}")
# INFO: 3 / WARN: 2 / ERROR: 2

Two lines of real logic — a comprehension to extract, a Counter to tally. That's the Pythonic toolkit doing exactly what it's for.

11 · functools — caching, wraps, partial

The functools module is a toolbox of helpers that operate on functions themselves. Four of its pieces show up constantly in real backends and interviews.

@lru_cache / @cache memoize a function: the result for a given set of arguments is computed once, then served from cache. Perfect for an expensive, pure lookup — like fetching an embedding for a repeated piece of text:

from functools import lru_cache, cache

@lru_cache(maxsize=1024)      # keep the 1024 most-recent results
def embed(text):
    # pretend this is a slow, costly model call
    return expensive_embedding_api(text)

embed("hello")   # slow — actually calls the API
embed("hello")   # instant — served from cache, no API call

@cache                     # Python 3.9+: lru_cache with no size limit
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
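
To check that the cache is actually doing its job, functions wrapped by lru_cache expose cache_info() and cache_clear(). In the sketch below the calls list and the fake one-number "embedding" are stand-ins for the slow model call. Arguments must be hashable, and the function returns a tuple so that callers cannot mutate a shared cached value.

```python
from functools import lru_cache

calls = []   # records each real invocation, for illustration

@lru_cache(maxsize=1024)
def embed(text):
    calls.append(text)                 # stand-in for the slow, costly model call
    return (float(len(text)),)         # fake "embedding" for this sketch

embed("hello")
embed("hello")                         # cache hit: the body does not run again
embed("world")

info = embed.cache_info()
print(calls)                           # ['hello', 'world']
print(info.hits, info.misses)          # 1 2
embed.cache_clear()                    # empty the cache, e.g. between tests
```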

@wraps ties straight back to the decorators section. When you wrap a function, the inner wrapper replaces the original — so its name and docstring get clobbered. @wraps(fn) copies that metadata back across:

from functools import wraps

def timed(fn):
    @wraps(fn)                  # preserve fn's __name__ and __doc__
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@timed
def greet(name):
    """Say hello."""
    return f"Hi {name}"

greet.__name__   # 'greet'  — WITHOUT @wraps this would be 'wrapper'
greet.__doc__    # 'Say hello.'  — preserved
Interview answer (what @functools.wraps fixes): "When you write a decorator, the wrapper function it returns replaces the original, so the decorated function's __name__ becomes 'wrapper', the docstring disappears, and the visible signature is lost. @wraps(fn) copies the identity metadata (__name__, __qualname__, __doc__, __module__, and more) from the original onto the wrapper and sets __wrapped__ to point back at it." Why it matters in practice: introspection, logging, and especially FastAPI rely on a function's real name and signature to build docs and validate requests. Forget @wraps and your decorated route handlers all report themselves as wrapper, and the OpenAPI schema can come out wrong. Saying this out loud marks you as someone who has actually written decorators, not just used them.
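
The effect on the signature is easy to check with inspect.signature from the standard library, which follows __wrapped__ by default. A small sketch with two hypothetical decorators, plain (no @wraps) and preserving (with @wraps):

```python
import inspect
from functools import wraps

def plain(fn):
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

def preserving(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@plain
def greet_a(name: str) -> str:
    """Say hello."""
    return f"Hi {name}"

@preserving
def greet_b(name: str) -> str:
    """Say hello."""
    return f"Hi {name}"

print(greet_a.__name__, inspect.signature(greet_a))   # wrapper (*args, **kwargs)
print(greet_b.__name__, inspect.signature(greet_b))   # greet_b (name: str) -> str
print(greet_b.__wrapped__("Sam"))                     # Hi Sam  (the undecorated original)
```

Tools that build documentation or validation from a signature see only (*args, **kwargs) in the first case, which is why the missing @wraps tends to surface as confusing framework behaviour rather than an obvious error.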

partial pre-fills some of a function's arguments, handing you back a new, shorter-to-call function:

from functools import partial

def connect(host, port, timeout):
    ...

local = partial(connect, "127.0.0.1", 5432)   # host+port locked in
local(timeout=5)                            # only the last arg left to give
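
A runnable variant of the same idea, assuming a stand-in connect that simply reports its arguments (the default timeout and the host name db.internal are invented for this sketch). It also shows that keyword arguments can be pre-filled, and that the partial object remembers what it wraps.

```python
from functools import partial

def connect(host, port, timeout=10):
    # Stand-in that just reports what it was called with.
    return f"{host}:{port} (timeout={timeout}s)"

local = partial(connect, "127.0.0.1", 5432)       # positional args fill from the left
quick = partial(connect, port=5432, timeout=1)    # keywords can be pre-filled too

print(local(timeout=5))                    # 127.0.0.1:5432 (timeout=5s)
print(quick("db.internal"))                # db.internal:5432 (timeout=1s)
print(local.func is connect, local.args)   # True ('127.0.0.1', 5432)
```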

reduce folds a sequence down to a single value by applying a function pairwise. It's niche — a plain loop or sum() is usually clearer — but worth recognising:

from functools import reduce

reduce(lambda a, b: a * b, [1, 2, 3, 4])   # 24  (1*2*3*4)
PHP bridge: reduce ≈ PHP's array_reduce. partial ≈ a closure that captures a few arguments. @lru_cache has no built-in PHP twin — you'd hand-roll a static $cache array.

12 · Parameterized decorators

The decorators in section 5 took no arguments. But what about @retry(times=3) — a decorator you configure? This is a classic interview stumper because it needs three nested layers: a function that returns a decorator that returns a wrapper.

import time
from functools import wraps

def retry(times=3, delay=0.5):        # LAYER 1: takes the arguments
    def decorator(fn):                   # LAYER 2: the actual decorator
        @wraps(fn)
        def wrapper(*args, **kwargs):   # LAYER 3: wraps each call
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise            # out of attempts — give up
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(times=3, delay=1)
def fetch_embedding(text):
    return flaky_network_call(text)   # retried up to 3 times

Read the layers from the outside in. @retry(times=3) calls retry first, which returns decorator — and that is what actually decorates fetch_embedding. So @retry(times=3) is sugar for fetch_embedding = retry(times=3)(fetch_embedding). The extra pair of parentheses is the whole trick: a no-arg decorator is one call; a parameterized one is two.

Interview note (the parentheses test): If a decorator is used with parentheses (@retry(times=3)) it must be a function that returns a decorator. If it's used without (@timed), it is the decorator directly. Mixing these up, e.g. writing @retry when retry expects arguments, is a very common decorator bug: the function being decorated is passed in as times, and the name ends up bound to the inner decorator instead of a wrapped function. Knowing exactly why is a strong signal.
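
Here is the retry decorator exercised end to end, as a sketch. The flaky function, which fails twice and then succeeds, and the attempts list are invented for the demonstration, and delay defaults to 0 so the example runs instantly. The last few lines show what the missing-parentheses mistake actually produces.

```python
import time
from functools import wraps

def retry(times=3, delay=0.0):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

attempts = []

@retry(times=3)                    # retry(...) runs first and returns the decorator
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")   # fails on the first two calls
    return "ok"

result = flaky()
print(result, len(attempts))       # ok 3

# Forgetting the parentheses passes the function itself in as `times`,
# so the name ends up bound to the inner decorator, not to a wrapped function.
@retry
def broken():
    return "never runs"

print(broken.__name__)             # decorator
```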

13 · collections & itertools

Two standard-library modules pull their weight on almost every backend. You already met Counter in section 8 — here are its siblings.

defaultdict gives every missing key a default value automatically, so you skip the clumsy d.get(k, 0) + 1 dance:

from collections import defaultdict

# the awkward plain-dict way
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# the clean defaultdict way — missing keys start at 0
counts = defaultdict(int)
for word in words:
    counts[word] += 1                  # no key-exists check needed

# group items into lists — missing keys start as []
by_level = defaultdict(list)
by_level["ERROR"].append("db lost")   # key auto-created as []

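Counter (from section 8) is a dict subclass specialised for tallying. Like defaultdict(int) it reports 0 for a missing key, though it does not store the key when you merely read it. Reach for Counter when counting is all you need, and defaultdict(list) when you're grouping.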
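
One behavioural difference between the two is worth seeing once, because it can cause subtle bugs: reading a missing key. The sketch below uses a tiny made-up word list.

```python
from collections import Counter, defaultdict

words = ["ai", "uae", "ai"]       # hypothetical sample data

tally = Counter(words)
dd = defaultdict(int)
for w in words:
    dd[w] += 1

# Reading a missing key: both answer 0...
print(tally["rag"], dd["rag"])       # 0 0

# ...but only defaultdict stores the key as a side effect of the read.
print("rag" in tally, "rag" in dd)   # False True

# Grouping is where defaultdict(list) earns its place.
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)
print(dict(by_length))               # {2: ['ai', 'ai'], 3: ['uae']}
```

If you only want to peek at a defaultdict without creating keys, use dd.get(key) or a membership test instead of square brackets.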

itertools is a set of lazy, memory-efficient iterator builders. Four you'll actually use:

from itertools import chain, islice, groupby, count

# chain — flatten several iterables into one stream
all_tags = chain(["ai", "uae"], ["rag"], ["fastapi"])   # one lazy iterator over all four tags

# islice — take a slice of any iterator (great for batching)
def batched(it, size):
    it = iter(it)
    while batch := list(islice(it, size)):
        yield batch                # e.g. chunk an embedding stream by 100

# groupby — group ADJACENT items by a key (sort first!)
rows = sorted(rows, key=lambda r: r["level"])
for level, group in groupby(rows, key=lambda r: r["level"]):
    print(level, list(group))

# count — an endless counter, handy for auto-incrementing ids
ids = count(1)
next(ids), next(ids)              # 1, 2, ...
Why this matters for DocChat: Embedding APIs typically meter usage and often cap how many chunks you can send in one request. The batched helper above, built on islice, lazily slices your chunk generator into groups of, say, 100, so you embed a whole document in efficient batches without ever holding it all in memory. That's the generator pattern from section 2 plus itertools working together.
PHP bridge: defaultdict ≈ PHP arrays that spring into existence on first write ($counts[$k]++ just works) — Python normally forbids that, and defaultdict restores the convenience deliberately. itertools has no neat PHP twin; you'd write manual loops or array_chunk for islice-style batching.
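
Putting the pieces together on toy input: the sketch below chains two lazily chunked "documents" (short made-up strings) into one stream and batches it in pairs. Python 3.12 also ships its own itertools.batched, which yields tuples rather than lists; the hand-written helper here works on older versions too.

```python
from itertools import chain, islice

def batched(it, size):
    it = iter(it)
    while batch := list(islice(it, size)):
        yield batch

def chunks(text, size):
    for i in range(0, len(text), size):
        yield text[i : i + size]

# Two hypothetical documents, chunked lazily and chained into one stream.
stream = chain(chunks("abcdefgh", 3), chunks("XYZ", 2))

batches = list(batched(stream, 2))
print(batches)   # [['abc', 'def'], ['gh', 'XY'], ['Z']]
```

Nothing is read from either document until batched asks for it, so peak memory is bounded by the batch size rather than the document size.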

14 · A taste of re (regex)

The re module is Python's regular-expression engine — for finding and extracting patterns in text. You'll use it to pull structured bits (dates, IDs, emails) out of messy document text. Always write patterns as raw strings (r"...") so backslashes mean what you expect.

import re

text = "Invoice INV-2026-0042 dated 2026-06-19, contact a@b.com"

# re.search — find the FIRST match; returns a match object or None
m = re.search(r"INV-\d{4}-\d{4}", text)
if m:
    print(m.group())             # 'INV-2026-0042'

# re.findall — return EVERY match as a list of strings
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)                     # ['2026-06-19']

emails = re.findall(r"[\w.]+@[\w.]+", text)
print(emails)                    # ['a@b.com']

\d matches a digit, {4} means "exactly four", \w is a word character. re.search stops at the first hit; re.findall sweeps the whole string. That's enough to extract invoice numbers or dates from ingested PDF text before you store them.

PHP bridge: re.search ≈ preg_match, re.findall ≈ preg_match_all. The big difference: no slash delimiters and no /i-style flags glued to the pattern. Python passes flags as arguments (re.IGNORECASE) and uses raw strings instead of delimiters.
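
Two small steps beyond the basics, sketched on a made-up sentence: named groups pull out parts of a match, and flags are passed as an argument to re.compile. Note how findall changes shape once the pattern contains groups.

```python
import re

# Hypothetical sample text with two invoice numbers in different letter case.
text = "Invoice inv-2026-0042 dated 2026-06-19, then INV-2025-0007 dated 2025-01-03"

# Compile once when a pattern is reused; flags are passed as an argument.
invoice_re = re.compile(r"INV-(?P<year>\d{4})-(?P<seq>\d{4})", re.IGNORECASE)

for m in invoice_re.finditer(text):
    print(m.group(), m.group("year"), int(m.group("seq")))
# inv-2026-0042 2026 42
# INV-2025-0007 2025 7

# With groups in the pattern, findall returns tuples of the groups, not whole matches.
pairs = invoice_re.findall(text)
print(pairs)    # [('2026', '0042'), ('2025', '0007')]
```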

15 · Check yourself

Answer from memory — retrieval is what turns "I read it" into "I know it".

Recall quiz

What does yield give a function?

Which builds a dict from an expression?

A decorator is best described as a thing that…

What does Optional[str] mean?

Why use with open(...) over plain open?

What does @functools.wraps mainly fix?

Why does @retry(times=3) need three layers?

Primary source ⭐ The Official Python Tutorial — §8 Errors and Exceptions. Authoritative on try/except/else/finally, raising, and custom exception classes. For approachable deep-dives on comprehensions, generators, and decorators, Real Python is the best supplement on the web.