Module 1 · Python · Deep Dive
The features that separate "I write Python" from "I write Pythonic Python" — comprehensions, generators, decorators, context managers, type hints, and the async keyword that powers FastAPI.
@app.get), context managers open files safely, and async/await is the engine under every endpoint. Learn these and the rest of the course stops feeling like magic.
A comprehension builds a collection in one expressive line. It's the single most "Python looks different from PHP" feature, and interviewers expect you to read and write them fluently.
    # The long way
    squares = []
    for n in range(5):
        squares.append(n * n)

    # The Pythonic way: a list comprehension
    squares = [n * n for n in range(5)]   # [0, 1, 4, 9, 16]
Read it left to right: "the expression n*n, for each n in the range." Add a trailing if to filter:
    nums = [4, 1, 7, 2, 9]
    evens = [n for n in nums if n % 2 == 0]      # [4, 2]
    labels = [f"#{n}" for n in nums if n > 3]    # ['#4', '#7', '#9']
The same syntax builds dicts and sets — just swap the brackets:
    # dict comprehension: {key: value for ...}
    prices = {"pen": 3, "pad": 8, "ink": 12}
    with_vat = {name: p * 1.05 for name, p in prices.items()}
    cheap = {name: p for name, p in prices.items() if p < 10}

    # set comprehension: unique results, no duplicates
    words = "the cat the dog the cat".split()
    unique_lengths = {len(w) for w in words}     # {3}

PHP bridge: there's no real PHP equivalent. The closest is array_map + array_filter chained, but those are clunkier and lazier readers skip them. A comprehension says map and filter in one breath.
[x for row in grid for x in row]) but if it takes more than a glance to read, write a plain loop. Clever is not the goal; clear is.
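To make the nested form concrete, here is a small sketch that flattens a grid both ways. The grid values are made up for illustration; the point is that the for-clauses in the comprehension appear in the same order as the nested loops.

```python
# Hypothetical 2x3 grid used only for illustration.
grid = [[1, 2, 3], [4, 5, 6]]

# Nested comprehension: the for-clauses read in the same order as nested loops.
flat = [x for row in grid for x in row]

# Equivalent plain loop, often clearer once a filter or a third level appears.
flat_loop = []
for row in grid:
    for x in row:
        flat_loop.append(x)

print(flat)               # [1, 2, 3, 4, 5, 6]
print(flat == flat_loop)  # True
```

If the comprehension needs a comment to explain its order, that is usually the signal to switch to the loop.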
yield
A list builds all its items in memory at once. A generator produces items one at a time, on demand: it's lazy. You write one with yield instead of return:
    def count_up(limit):
        n = 0
        while n < limit:
            yield n        # hand back one value, then PAUSE here
            n += 1

    for x in count_up(3):
        print(x)           # 0, 1, 2, produced one at a time
When the loop asks for the next value, the function resumes right where it paused. Nothing is stored except the current position. The memory win is enormous: a generator over a billion items uses the same memory as one over three.
There's also a generator expression — a comprehension with round brackets — for the same laziness inline:
total = sum(n * n for n in range(1_000_000)) # no million-item list ever built
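A rough way to see the memory difference is to compare the container sizes directly. This is only a sketch: sys.getsizeof measures the list or generator object itself, not the integers it refers to, and the exact byte counts vary by Python version.

```python
import sys

n = 100_000
as_list = [i * i for i in range(n)]   # every item exists in memory now
as_gen = (i * i for i in range(n))    # nothing computed yet

# getsizeof reports the container only, not the ints inside it,
# so the true gap is even larger than this comparison suggests.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True

# A generator is single-use: once consumed, it is empty.
print(sum(as_gen) == sum(as_list))  # True
print(sum(as_gen))                  # 0
```

The last line is the trade-off to remember: laziness saves memory, but you cannot loop over the same generator twice.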
    # Streaming chunks of text, lazily, one at a time
    def chunks(text, size):
        for i in range(0, len(text), size):
            yield text[i : i + size]

    for piece in chunks(big_document, 500):
        embed_and_store(piece)    # process, then discard; memory stays flat
A generator is one kind of iterator: an object that hands out its values one at a time whenever a loop asks for the next one. Built-ins such as enumerate and zip make iteration clean and come up constantly in interviews:
    # enumerate: index + item together (you met this in 1.1)
    for i, name in enumerate(["a", "b"], start=1):
        print(i, name)                 # 1 a / 2 b

    # zip: walk two (or more) sequences in lockstep
    names = ["Sam", "Mei"]
    scores = [90, 85]
    for name, score in zip(names, scores):
        print(f"{name}: {score}")

    paired = dict(zip(names, scores))  # {'Sam': 90, 'Mei': 85}
Tuple unpacking assigns several names at once, and the * "star" collects the rest:
    x, y = (10, 20)                    # x=10, y=20
    first, *rest = [1, 2, 3, 4]        # first=1, rest=[2, 3, 4]
    head, *mid, tail = [1, 2, 3, 4]    # head=1, mid=[2, 3], tail=4

    # * also SPREADS a sequence into function args
    nums = [3, 7, 1]
    print(max(*nums))                  # 7 (same as max(3, 7, 1))

    # ** spreads a dict into keyword args
    opts = {"sep": " | "}
    print("a", "b", **opts)            # a | b

PHP bridge: *args ≈ PHP's ...$args spread/variadic, and **kwargs ≈ passing a named-argument array. zip has no neat PHP twin; you'd loop an index by hand.
Python uses try / except / else / finally. You catch specific exception types, never a bare blanket catch:
    try:
        pages = int(raw_value)     # might raise ValueError
    except ValueError:
        pages = 0                  # runs only if THAT error happened
    else:
        print("parsed cleanly")    # runs only if NO error happened
    finally:
        print("always runs")       # cleanup, runs no matter what
You raise errors yourself with raise, and you can define your own exception classes by subclassing Exception — this is how real APIs signal domain problems:
    class DocumentTooLargeError(Exception):
        """Raised when an upload exceeds the page limit."""

    def ingest(doc):
        if doc["pages"] > 1000:
            raise DocumentTooLargeError(f"{doc['pages']} pages is too many")
        return "ok"
if. PHP code leans LBYL (if (isset(...))); idiomatic Python leans EAFP — try: d[k] except KeyError: over if k in d: when a miss is the exception, not the norm. Saying this in an interview signals you think in Python, not translated PHP.
try/catch/finally and throw. The new parts are the else clause and the cultural lean toward EAFP.
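The LBYL and EAFP styles are easiest to compare side by side. The sketch below uses a made-up page_counts dictionary; both functions return the same results, and the difference is purely in where the "miss" is handled.

```python
# Hypothetical lookup table, for illustration only.
page_counts = {"report.pdf": 12, "notes.pdf": 3}

def pages_lbyl(name):
    # Look Before You Leap: test first, then access.
    if name in page_counts:
        return page_counts[name]
    return 0

def pages_eafp(name):
    # Easier to Ask Forgiveness than Permission: try it, handle the miss.
    try:
        return page_counts[name]
    except KeyError:
        return 0

print(pages_lbyl("report.pdf"), pages_eafp("report.pdf"))    # 12 12
print(pages_lbyl("missing.pdf"), pages_eafp("missing.pdf"))  # 0 0
```

For a simple default like this, page_counts.get(name, 0) is shorter still; EAFP earns its keep when the "try" step is an operation that can fail in ways a pre-check cannot fully predict.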
A decorator is a function that wraps another function to add behaviour — without editing the original. The @name syntax sits on the line above a function. It's the feature that makes FastAPI's route declarations possible, so understand it now.
    import time

    def timed(fn):                          # takes a function...
        def wrapper(*args, **kwargs):       # ...returns a wrapped version
            start = time.perf_counter()
            result = fn(*args, **kwargs)    # call the original
            ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__} took {ms:.2f} ms")
            return result
        return wrapper

    @timed                                  # == greet = timed(greet)
    def greet(name):
        return f"Hi {name}"

    greet("Sam")    # prints something like: greet took 0.01 ms
The @timed line is pure sugar for greet = timed(greet). The wrapper now runs around every call — perfect for timing, logging, caching, or auth checks.
@app.get("/docs") over a function in Module 2, you are using a decorator. FastAPI's app.get(...) returns a decorator that registers your function as the handler for GET /docs — it wires the route, parses the request, and validates the response, all without you touching the function body. The mental model you just built — "a decorator wraps and adds behaviour" — is exactly what makes @app.get click instead of feeling like incantation.
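To see how a route-style decorator can register a function, here is a toy sketch. TinyApp and its routes dictionary are invented for this example and are not FastAPI's real implementation; the sketch only shows the shape of "a method that takes a path and returns a decorator".

```python
class TinyApp:
    """Toy stand-in for a web framework; NOT FastAPI's real implementation."""

    def __init__(self):
        self.routes = {}

    def get(self, path):
        # Called with the path first, so it must return the actual decorator.
        def decorator(fn):
            self.routes[("GET", path)] = fn
            return fn  # the function is registered and handed back unchanged
        return decorator

app = TinyApp()

@app.get("/docs")
def list_docs():
    return ["a.pdf", "b.pdf"]

handler = app.routes[("GET", "/docs")]
print(handler is list_docs)  # True
print(handler())             # ['a.pdf', 'b.pdf']
```

Note that this decorator does not wrap anything: it records the function and returns it as-is. Decorators can add behaviour around a call, register the function somewhere, or both.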
A context manager guarantees setup and cleanup around a block — most famously, closing a file even if an error is thrown. The with statement is how you use one:
    with open("notes.txt") as f:
        text = f.read()
    # file is AUTOMATICALLY closed here, even if an exception fired inside
Without with, you'd have to call f.close() by hand and wrap it in try/finally to be safe. The with block does that for you. The same pattern manages database sessions, network connections, and locks — anything that must be released.
fclose, or hoping the request ends). Python's with makes "always release it" a one-liner — a habit interviewers love to see.
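You can also write your own context manager. One common route is contextlib.contextmanager from the standard library, which turns a generator into one; the managed function and the events list below are made up for illustration.

```python
from contextlib import contextmanager

events = []  # records what happened, so the order is visible

@contextmanager
def managed(name):
    events.append(f"open {name}")       # setup
    try:
        yield name                      # the body of the with-block runs here
    finally:
        events.append(f"close {name}")  # cleanup, even if the body raised

try:
    with managed("db-session") as s:
        events.append(f"use {s}")
        raise RuntimeError("boom")
except RuntimeError:
    events.append("error handled")

print(events)
# ['open db-session', 'use db-session', 'close db-session', 'error handled']
```

The "close" entry lands before "error handled", which is the guarantee in action: cleanup runs on the way out of the block, before the exception reaches the caller.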
You met basic hints in 1.2. Real codebases (and FastAPI especially) lean on richer ones. They don't change runtime behaviour — they document intent and let tools catch bugs before you run.
    from typing import Optional

    def find_title(doc_id: int) -> Optional[str]:
        """Returns the title, or None if not found."""
        ...

    tags: list[str] = ["ai", "uae"]        # a list of strings
    counts: dict[str, int] = {"ai": 2}     # str keys, int values
Optional[str] means "a str or None"; the modern equivalent (Python 3.10+) is str | None.
list[str], dict[str, int]: built-in generics (lowercase, since Python 3.9). No import needed.
from typing import ... pulls in extras like Optional, Union, Any, Callable.
Run mypy your_file.py and it flags type mismatches statically, like passing a str where an int is declared, before your code ever executes. It's the closest Python gets to a compiler's safety net, and it's standard on professional 2026 teams.
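The claim that hints don't change runtime behaviour is easy to check. In this sketch the titles dictionary is a hypothetical stand-in for a real data source; the interesting call is the last one, which passes a str where an int is declared.

```python
from typing import Optional

def find_title(doc_id: int) -> Optional[str]:
    # Hypothetical in-memory lookup standing in for a real data source.
    titles = {1: "Annual report"}
    return titles.get(doc_id)

print(find_title(1))    # Annual report
print(find_title(2))    # None

# Hints are not enforced at runtime: this call runs without raising, although a
# static checker such as mypy would be expected to flag the str argument.
print(find_title("1"))  # None

# The hints are stored as plain metadata on the function.
print(find_title.__annotations__)
```

That silent None is the kind of bug a static checker is meant to catch before the code runs.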
Python ships "batteries included." Four pieces of the standard library you'll reach for weekly:
    from collections import Counter
    from pathlib import Path
    import json, datetime

    # Counter: tally items, instantly
    votes = Counter(["ai", "uae", "ai", "ai"])
    votes.most_common(1)               # [('ai', 3)]

    # pathlib: modern, OS-safe file paths (no string juggling)
    p = Path("docs") / "report.pdf"    # 'docs/report.pdf'
    p.exists(), p.suffix, p.stem       # False, '.pdf', 'report'

    # json: dict ⇄ JSON string, the lingua franca of APIs
    text = json.dumps({"ok": True})    # '{"ok": true}'
    back = json.loads(text)            # {'ok': True}

    # datetime: timestamps
    now = datetime.datetime.now()
    now.isoformat()                    # e.g. '2026-06-19T18:55:00'

PHP bridge: json.dumps/loads ≈ json_encode/json_decode. Counter ≈ array_count_values but smarter. pathlib ≈ a cleaner DIRECTORY_SEPARATOR story: paths become objects you compose with /.
You don't need to master async yet, but you'll see it from the first FastAPI lesson, so meet it now. Calling an async def function creates a coroutine; inside it, await pauses on a slow operation (a network call, a DB query) and lets other work run meanwhile, instead of blocking.
    import asyncio

    async def fetch_title(doc_id):
        await asyncio.sleep(1)           # pretend this is a slow DB/network call
        return f"Title {doc_id}"

    async def main():
        title = await fetch_title(7)     # wait without blocking everything else
        print(title)

    asyncio.run(main())
async lets one server handle many requests during those waits instead of sitting idle. That's why you'll write async def handlers in FastAPI: it's how a single process serves lots of concurrent DocChat users efficiently.
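The benefit only shows up when several waits overlap. As a minimal sketch, the snippet below runs three simulated lookups with asyncio.gather; the 0.1-second sleep stands in for a real network or database call, and the timing figure is approximate.

```python
import asyncio
import time

async def fetch_title(doc_id):
    await asyncio.sleep(0.1)  # stand-in for a slow DB/network call
    return f"Title {doc_id}"

async def main():
    start = time.perf_counter()
    # gather schedules all three coroutines; their waits overlap.
    titles = await asyncio.gather(*(fetch_title(i) for i in (1, 2, 3)))
    elapsed = time.perf_counter() - start
    return titles, elapsed

titles, elapsed = asyncio.run(main())
print(titles)             # ['Title 1', 'Title 2', 'Title 3']
print(f"{elapsed:.2f}s")  # roughly 0.1 s in total, not 0.3 s
```

Awaiting the three calls one after another would take about three times as long, because each wait would have to finish before the next one started.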
INFO, WARN, ERROR) appeared — using a comprehension to extract levels and a Counter to tally them. This is real ops code: you'll skim logs exactly like this when DocChat is deployed.
Try it first, then compare. A clean version:
logparse.py
    from collections import Counter

    logs = """INFO server started
    WARN slow query 1200ms
    INFO request /docs 200
    ERROR db connection lost
    WARN retry scheduled
    INFO request /ask 200
    ERROR embedding timeout"""

    # 1. comprehension: first word of each non-empty line = the level
    levels = [line.split()[0] for line in logs.splitlines() if line.strip()]

    # 2. Counter: tally them
    counts = Counter(levels)
    for level, n in counts.most_common():
        print(f"{level}: {n}")
    # INFO: 3 / WARN: 2 / ERROR: 2
Two lines of real logic — a comprehension to extract, a Counter to tally. That's the Pythonic toolkit doing exactly what it's for.
functools: caching, wraps, partial
The functools module is a toolbox of helpers that operate on functions themselves. Four of its pieces show up constantly in real backends and interviews.
@lru_cache / @cache memoize a function: the result for a given set of arguments is computed once, then served from cache. Perfect for an expensive, pure lookup — like fetching an embedding for a repeated piece of text:
    from functools import lru_cache, cache

    @lru_cache(maxsize=1024)       # keep the 1024 most-recent results
    def embed(text):
        # pretend this is a slow, costly model call
        return expensive_embedding_api(text)

    embed("hello")    # slow: actually calls the API
    embed("hello")    # instant: served from cache, no API call

    @cache                         # Python 3.9+: lru_cache with no size limit
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)
@wraps ties straight back to the decorators section. When you wrap a function, the inner wrapper replaces the original — so its name and docstring get clobbered. @wraps(fn) copies that metadata back across:
    from functools import wraps

    def timed(fn):
        @wraps(fn)                 # preserve fn's __name__ and __doc__
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper

    @timed
    def greet(name):
        """Say hello."""
        return f"Hi {name}"

    greet.__name__    # 'greet' (WITHOUT @wraps this would be 'wrapper')
    greet.__doc__     # 'Say hello.' (preserved)
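The same effect applies to the function's signature, which is what introspection tools read. The sketch below compares a decorator with and without @wraps using inspect.signature; the bare and careful decorators and the ask_* functions are made up for the comparison.

```python
import inspect
from functools import wraps

def bare(fn):
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

def careful(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@bare
def ask_bare(question: str, top_k: int = 3):
    """Answer a question."""

@careful
def ask_careful(question: str, top_k: int = 3):
    """Answer a question."""

print(ask_bare.__name__, inspect.signature(ask_bare))
# wrapper (*args, **kwargs)
print(ask_careful.__name__, inspect.signature(ask_careful))
# ask_careful (question: str, top_k: int = 3)
```

inspect.signature can recover the real parameters because @wraps also sets a __wrapped__ attribute that points back at the original function.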
@functools.wraps fixes
"When you write a decorator, the wrapper function it returns replaces the original, so fn.__name__ becomes 'wrapper', the docstring disappears, and the signature is lost. @wraps(fn) copies that identity metadata (__name__, __qualname__, __doc__, and more) from the original onto the wrapper, and adds a __wrapped__ attribute that points back to the original." Why it matters in practice: introspection, logging, and especially FastAPI rely on a function's real name and signature to build docs and validate requests. Forget @wraps and your decorated route handlers all report themselves as wrapper in logs and introspection, and the generated OpenAPI schema no longer reflects their real parameters. Saying this out loud marks you as someone who has actually written decorators, not just used them.
partial pre-fills some of a function's arguments, handing you back a new, shorter-to-call function:
    from functools import partial

    def connect(host, port, timeout):
        ...

    local = partial(connect, "127.0.0.1", 5432)    # host+port locked in
    local(timeout=5)                               # only the last arg left to give
reduce folds a sequence down to a single value by applying a function pairwise. It's niche — a plain loop or sum() is usually clearer — but worth recognising:
    from functools import reduce

    reduce(lambda a, b: a * b, [1, 2, 3, 4])    # 24 (1*2*3*4)

PHP bridge: reduce ≈ PHP's array_reduce. partial ≈ a closure that captures a few arguments. @lru_cache has no built-in PHP twin; you'd hand-roll a static $cache array.
The decorators in section 5 took no arguments. But what about @retry(times=3) — a decorator you configure? This is a classic interview stumper because it needs three nested layers: a function that returns a decorator that returns a wrapper.
    import time
    from functools import wraps

    def retry(times=3, delay=0.5):              # LAYER 1: takes the arguments
        def decorator(fn):                      # LAYER 2: the actual decorator
            @wraps(fn)
            def wrapper(*args, **kwargs):       # LAYER 3: wraps each call
                for attempt in range(1, times + 1):
                    try:
                        return fn(*args, **kwargs)
                    except Exception:
                        if attempt == times:
                            raise               # out of attempts, give up
                        time.sleep(delay)
            return wrapper
        return decorator

    @retry(times=3, delay=1)
    def fetch_embedding(text):
        return flaky_network_call(text)         # retried up to 3 times
Read the layers from the outside in. @retry(times=3) calls retry first, which returns decorator — and that is what actually decorates fetch_embedding. So @retry(times=3) is sugar for fetch_embedding = retry(times=3)(fetch_embedding). The extra pair of parentheses is the whole trick: a no-arg decorator is one call; a parameterized one is two.
@retry(times=3)) it must be a function that returns a decorator. If it's used without (@timed), it is the decorator directly. Mixing these up — e.g. writing @retry when retry expects arguments — is the single most common decorator bug. Knowing exactly why is a strong signal.
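Here is what that bug looks like in practice, using a simplified retry (no delay, no @wraps) written just for this demonstration. The shout functions are hypothetical; watch what the broken one returns.

```python
def retry(times=3):
    # Simplified, hypothetical version of a configurable retry decorator.
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise
        return wrapper
    return decorator

@retry(times=2)   # correct: retry(...) runs first and returns the decorator
def shout(text):
    return text.upper()

@retry            # bug: the function itself is passed in as `times`
def shout_broken(text):
    return text.upper()

print(shout("hi"))       # HI
result = shout_broken("hi")
print(callable(result))  # True: we got a wrapper function back, not 'HI'
```

With the parentheses missing, shout_broken is really the inner decorator, so calling it "decorates" the string "hi" instead of running the original function. Nothing fails at definition time, which is why this bug is easy to miss.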
collections & itertools
Two standard-library modules pull their weight on almost every backend. You already met Counter in section 8; here are its siblings.
defaultdict gives every missing key a default value automatically, so you skip the clumsy d.get(k, 0) + 1 dance:
    from collections import defaultdict

    # the awkward plain-dict way
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # the clean defaultdict way: missing keys start at 0
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1                  # no key-exists check needed

    # group items into lists: missing keys start as []
    by_level = defaultdict(list)
    by_level["ERROR"].append("db lost")    # key auto-created as []
Counter (from section 8) is a dict subclass specialised for tallying, and like defaultdict(int) it reads missing keys as 0. Reach for it when counting is all you need, and for defaultdict(list) when you're grouping.
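Counter and defaultdict(int) produce the same tallies but differ in one detail worth knowing: what happens when you merely read a key that isn't there. A short sketch with a made-up word list:

```python
from collections import Counter, defaultdict

words = ["ai", "uae", "ai"]

tally = Counter(words)
dd = defaultdict(int)
for w in words:
    dd[w] += 1

print(dict(tally) == dict(dd))       # True: same counts either way

# Reading a missing key: Counter returns 0 and stores nothing,
# while defaultdict creates the key as a side effect.
print(tally["rag"], "rag" in tally)  # 0 False
print(dd["rag"], "rag" in dd)        # 0 True

# Counter also brings tally-specific helpers.
print(tally.most_common(1))          # [('ai', 2)]
```

That side effect matters if you later iterate over the dictionary or serialise it: a defaultdict can quietly accumulate keys you only ever looked up.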
itertools is a set of lazy, memory-efficient iterator builders. Four you'll actually use:
    from itertools import chain, islice, groupby, count

    # chain: flatten several iterables into one stream
    all_tags = chain(["ai", "uae"], ["rag"], ["fastapi"])    # one sequence

    # islice: take a slice of any iterator (great for batching)
    def batched(it, size):
        it = iter(it)
        while batch := list(islice(it, size)):
            yield batch            # e.g. chunk an embedding stream by 100

    # groupby: group ADJACENT items by a key (sort first!)
    rows = sorted(rows, key=lambda r: r["level"])
    for level, group in groupby(rows, key=lambda r: r["level"]):
        print(level, list(group))

    # count: an endless counter, handy for auto-incrementing ids
    ids = count(1)
    next(ids), next(ids)           # 1, 2, ...
batched helper above — built on islice — lazily slices your chunk generator into groups of, say, 100, so you embed a whole document in efficient batches without ever holding it all in memory. That's the generator pattern from section 2 plus itertools working together.
defaultdict ≈ PHP arrays that spring into existence on first write ($counts[$k]++ just works) — Python normally forbids that, and defaultdict restores the convenience deliberately. itertools has no neat PHP twin; you'd write manual loops or array_chunk for islice-style batching.
re (regex)
The re module is Python's regular-expression engine, for finding and extracting patterns in text. You'll use it to pull structured bits (dates, IDs, emails) out of messy document text. Always write patterns as raw strings (r"...") so backslashes mean what you expect.
    import re

    text = "Invoice INV-2026-0042 dated 2026-06-19, contact a@b.com"

    # re.search: find the FIRST match; returns a match object or None
    m = re.search(r"INV-\d{4}-\d{4}", text)
    if m:
        print(m.group())           # 'INV-2026-0042'

    # re.findall: return EVERY match as a list of strings
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    print(dates)                   # ['2026-06-19']

    emails = re.findall(r"[\w.]+@[\w.]+", text)
    print(emails)                  # ['a@b.com']
\d matches a digit, {4} means "exactly four", \w is a word character. re.search stops at the first hit; re.findall sweeps the whole string. That's enough to extract invoice numbers or dates from ingested PDF text before you store them.
re.search ≈ preg_match, re.findall ≈ preg_match_all. The big difference: no slash delimiters and no /i-style flags glued to the pattern — Python passes flags as arguments (re.IGNORECASE) and uses raw strings instead of delimiters.
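As a small sketch of flags passed as arguments, the example below matches a lowercase invoice number with re.IGNORECASE. The named groups (year, seq) and the sample text are additions for illustration; they are one convenient way to pull out the parts of a match, not something the examples above require.

```python
import re

text = "Invoice inv-2026-0042 dated 2026-06-19"

# Flags are passed as an argument rather than appended to the pattern.
m = re.search(r"INV-(?P<year>\d{4})-(?P<seq>\d{4})", text, re.IGNORECASE)
if m:
    print(m.group())        # inv-2026-0042
    print(m.group("year"))  # 2026
    print(m.group("seq"))   # 0042

# Compile once when the same pattern is reused many times.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")
print(date_re.findall(text))  # ['2026-06-19']
```

Without the flag, the same search returns None here, because the pattern's INV is uppercase and the text's inv is not.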
Answer from memory — retrieval is what turns "I read it" into "I know it".
What does yield give a function?
Which builds a dict from an expression?
A decorator is best described as a thing that…
What does Optional[str] mean?
Why use with open(...) over plain open?
What does @functools.wraps mainly fix?
Why does @retry(times=3) need three layers?
try/except/else/finally, raising, and custom exception classes. For approachable deep-dives on comprehensions, generators, and decorators, Real Python is the best supplement on the web.