Module 1 · Python · Deep Dive
The capstone of Module 1. Concurrency, the event loop, task groups, timeouts, and the one blocking mistake that fails FastAPI interviews — all of it pointed straight at DocChat.
Most backend work is waiting. Your code asks a database, an HTTP API, or a disk for something, then sits idle for milliseconds — an eternity for a CPU — until the answer arrives. Async is a way to do other useful work during that wait, on a single thread.
Two words that get confused in interviews:
| Term | Means | How Python does it |
|---|---|---|
| Concurrency | Many tasks in progress, interleaved on one worker. Like one chef juggling several pans. | asyncio — the event loop |
| Parallelism | Many tasks running at the same instant on many cores. Several chefs, several stoves. | multiprocessing |
The event loop is the chef. It runs one coroutine until that coroutine hits an await on something slow (a network call), then sets it aside and runs another. When the network answers, it picks the first one back up. Nothing truly happens "at the same time" — it just never wastes the wait.
Async helps only with I/O-bound work. For CPU-bound work, which has no idle wait to fill, use multiprocessing or a ProcessPoolExecutor.
The GIL in one line: CPython's Global Interpreter Lock means only one thread runs Python bytecode at a time — which is why threads don't give you CPU parallelism, and why async (single-threaded) and processes (separate interpreters) are the two real tools.
PHP bridge: classic PHP is synchronous and blocking. Each request runs start-to-finish on its own process/worker, and the process manager (php-fpm) spins up many workers to handle many users. Python flips it: one async process interleaves hundreds of requests itself, as long as it never blocks on I/O.
async def & await
Mark a function async def and it becomes a coroutine function. Inside it you may use await to pause on something asynchronous and hand control back to the event loop.
    import asyncio

    async def fetch_user(uid):
        await asyncio.sleep(1)   # stand-in for a real network wait
        return {"id": uid, "name": "Sam"}
Here's the trap that surprises everyone: calling a coroutine function does nothing. Coroutines are lazy. You get a coroutine object back; the body hasn't run.
    c = fetch_user(7)    # <coroutine object>: NOTHING has executed yet
    result = await c     # NOW it runs and yields the dict
At the very top — your main — there's no await available because you're not inside a coroutine yet. That's what asyncio.run() is for: it starts an event loop, runs your top coroutine to completion, and tears the loop down.
    async def main():
        user = await fetch_user(7)
        print(user)

    asyncio.run(main())   # the single entry point: call this once
If you call fetch_user(7) without await, Python emits a RuntimeWarning ("coroutine 'fetch_user' was never awaited") and the work silently never happens. When a result is mysteriously empty in async code, this is the first thing to check.
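The laziness is easy to verify. The sketch below is a minimal illustration, not part of DocChat: the `calls` list and the `created_lazily` flag are names introduced here only to record when the coroutine body actually runs.

```python
import asyncio
import inspect

calls = []  # illustrative: records when the coroutine body actually runs

async def fetch_user(uid):
    calls.append(uid)
    await asyncio.sleep(0.01)   # stand-in for a real network wait
    return {"id": uid, "name": "Sam"}

async def main():
    coro = fetch_user(7)        # creates a coroutine object only
    # At this point the body has not started, so nothing was recorded.
    created_lazily = inspect.iscoroutine(coro) and calls == []
    user = await coro           # the body runs here
    return created_lazily, user

created_lazily, user = asyncio.run(main())
print(created_lazily, user)
```

The check happens between creating the coroutine object and awaiting it, which is exactly the window in which a forgotten await leaves the work undone.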
Awaiting in a loop is still sequential — each await finishes before the next starts. Three one-second calls take three seconds:
    # SLOW: sequential, 3 seconds total
    results = []
    for uid in [1, 2, 3]:
        results.append(await fetch_user(uid))
To overlap the waits you must schedule them together. Two modern ways.
asyncio.gather takes many coroutines, runs them concurrently, and returns their results in order. Three one-second calls now take ~one second:
    # FAST: concurrent, ~1 second total
    results = await asyncio.gather(
        fetch_user(1), fetch_user(2), fetch_user(3)
    )

    # Fan out over a list of inputs:
    results = await asyncio.gather(*[fetch_user(u) for u in ids])
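To measure the difference rather than take it on faith, a rough timing sketch follows. The `sequential`, `concurrent`, and `timed` helpers are hypothetical names for this illustration, and the waits are shortened to 0.1 seconds so it runs quickly.

```python
import asyncio
import time

async def fetch_user(uid):
    await asyncio.sleep(0.1)    # stand-in for a real network wait
    return uid

async def sequential(ids):
    # Each await finishes before the next one starts.
    return [await fetch_user(u) for u in ids]

async def concurrent(ids):
    # All waits overlap; results come back in input order.
    return await asyncio.gather(*[fetch_user(u) for u in ids])

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

seq_result, seq_time = timed(sequential([1, 2, 3]))
con_result, con_time = timed(concurrent([1, 2, 3]))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same list; only the wall-clock time changes, because the concurrent version pays for the longest wait instead of the sum of all waits.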
The modern, recommended way is asyncio.TaskGroup. You open an async with block, create tasks inside it, and the block does not exit until they all finish. This is structured concurrency: the tasks' lifetime is bounded by the block.
    async def main():
        async with asyncio.TaskGroup() as tg:
            t1 = tg.create_task(fetch_user(1))
            t2 = tg.create_task(fetch_user(2))
            t3 = tg.create_task(fetch_user(3))
        # on exit, ALL tasks are done; read results with .result()
        print(t1.result(), t2.result(), t3.result())
Why prefer it over gather:
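- If one task fails, its siblings are cancelled automatically.
- No task outlives the block, so nothing is left dangling.
- Failures are collected into an ExceptionGroup, which you catch with except* (3.11+), so no error is silently lost.

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(fetch_user(1))
            tg.create_task(might_fail())
    except* ValueError as eg:   # note the star: matches ValueErrors inside the group
        print("one or more failed:", eg.exceptions)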
Use TaskGroup by default in new code. Keep gather for quick "run these and give me the list" cases where you don't need cancellation semantics, and remember gather(..., return_exceptions=True) if you want failures returned as values instead of raised.
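As a sketch of the return_exceptions=True form: the rule that a negative id fails is invented for this example, so that one call raises while the others succeed.

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.01)               # stand-in for a real network wait
    if uid < 0:                             # invented failure rule for the demo
        raise ValueError(f"bad id: {uid}")
    return {"id": uid}

async def main():
    results = await asyncio.gather(
        fetch_user(1), fetch_user(-1), fetch_user(3),
        return_exceptions=True,             # failures come back as values
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed

ok, failed = asyncio.run(main())
print(ok, failed)
```

Nothing is raised at the await, so the caller has to inspect each result; that is the trade-off against TaskGroup, where a failure cannot be overlooked.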
A network call that hangs forever will hang your request forever. Always bound external waits. The modern tool is the asyncio.timeout() context manager (3.11+):
    try:
        async with asyncio.timeout(5):   # 5 seconds for everything inside
            data = await slow_api_call()
    except TimeoutError:
        data = None                      # fall back gracefully
The older one-shot form still appears in codebases: await asyncio.wait_for(slow_api_call(), timeout=5). Both wrap the awaited work and cancel it if the clock runs out.
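For codebases that still use the one-shot form, a minimal sketch follows. Here `slow_api_call` is a hypothetical stand-in that simply sleeps, and `fetch_with_deadline` is a helper name introduced for the example.

```python
import asyncio

async def slow_api_call():
    await asyncio.sleep(0.2)        # hypothetical slow dependency
    return {"ok": True}

async def fetch_with_deadline(seconds):
    try:
        return await asyncio.wait_for(slow_api_call(), timeout=seconds)
    except asyncio.TimeoutError:    # the builtin TimeoutError on 3.11+
        return None                 # fall back gracefully

too_short = asyncio.run(fetch_with_deadline(0.05))
long_enough = asyncio.run(fetch_with_deadline(1))
print(too_short, long_enough)
```

Catching asyncio.TimeoutError works on older interpreters too; from Python 3.11 it is the same class as the builtin TimeoutError.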
What cancellation actually does: it raises asyncio.CancelledError inside the awaiting coroutine at its current await point. That's how a timeout or a TaskGroup stops sibling tasks.
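A bare except: or except BaseException: catches CancelledError and breaks cancellation: your "cancelled" task keeps running. (Since Python 3.8, CancelledError derives from BaseException, so a plain except Exception: no longer catches it.) If you must catch it to clean up, re-raise it: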
    try:
        await work()
    except asyncio.CancelledError:
        cleanup()
        raise   # always re-raise: let cancellation propagate
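A runnable sketch of the same pattern, cancelling a task by hand. The `worker` coroutine and the `log` list are illustrative names; the 10-second sleep is never allowed to finish.

```python
import asyncio

log = []  # illustrative: the order in which things happened

async def worker():
    try:
        await asyncio.sleep(10)         # CancelledError is raised at this await
    except asyncio.CancelledError:
        log.append("cleanup")
        raise                           # re-raise so the task ends up cancelled

async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.01)           # let the worker reach its await
    task.cancel()                       # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        log.append("worker was cancelled")
    return task.cancelled()

was_cancelled = asyncio.run(main())
print(was_cancelled, log)
```

Because the worker re-raises, the task reports itself as cancelled. If it swallowed the exception instead, task.cancelled() would be False and the caller would have no way to tell that the work was interrupted.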
The event loop is a single thread. If you call a blocking, synchronous function inside a coroutine, that thread cannot do anything else — every other request on the server freezes until it returns. This is the async mistake interviewers probe for.
    import time
    import requests

    # DISASTER: blocks the ENTIRE event loop for 5 seconds.
    # Every other user's request is frozen.
    async def handler():
        time.sleep(5)                    # sync sleep: NOT awaitable, blocks all
        response = requests.get(url)     # sync HTTP lib: same problem
The offenders: time.sleep, the requests library, heavy CPU loops, synchronous database drivers, file reads with the plain open(). Anything that isn't awaited but takes real time.
Two fixes:
1. Use an async-native library. Swap requests for httpx and await it; swap time.sleep for asyncio.sleep. Best option when one exists.
2. Offload the call with asyncio.to_thread(). It runs the sync function on a worker thread and gives you back an awaitable, freeing the loop:

    # Blocking function you can't avoid (a sync SDK, a CPU-light parse)
    async def handler():
        result = await asyncio.to_thread(slow_blocking_fn, arg1, arg2)
        return result
Interview answer: "An async handler runs on the event loop's single thread. A sync call like requests blocks that thread until the HTTP call returns, so while it waits, no other request can be served, and async throughput collapses to one-at-a-time. The fix is an async client like httpx that I can await, or wrapping the blocking call in asyncio.to_thread so it runs off the loop. For genuinely CPU-bound work I'd use a process pool, since threads can't beat the GIL." Thirty seconds, and it tells them you've felt this in production.
PHP bridge: under php-fpm, a slow curl only blocks that request, because each request has its own worker. In async Python a slow blocking call blocks everyone sharing the loop. Different model, different discipline.
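One way to see the freeze for yourself is a heartbeat coroutine that ticks every 10 ms while a blocking call runs. This is a rough sketch: `heartbeat`, `run`, and the 0.2-second `slow_blocking_fn` are hypothetical names and values chosen for the demonstration.

```python
import asyncio
import time

def slow_blocking_fn(seconds):
    time.sleep(seconds)             # stands in for a sync SDK call
    return "done"

async def heartbeat(ticks, stop):
    # Ticks roughly every 10 ms for as long as the loop is free to run it.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def run(offload):
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)          # give the heartbeat its first turn
    if offload:
        await asyncio.to_thread(slow_blocking_fn, 0.2)   # loop stays free
    else:
        slow_blocking_fn(0.2)       # blocks the loop: no ticks meanwhile
    stop.set()
    await hb
    return len(ticks)

blocked_ticks = asyncio.run(run(offload=False))
free_ticks = asyncio.run(run(offload=True))
print(f"ticks while blocked: {blocked_ticks}, ticks with to_thread: {free_ticks}")
```

The heartbeat plays the role of every other request on the server. With the direct call it barely ticks; with asyncio.to_thread it keeps ticking throughout the wait.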
Some resources need to set up and tear down asynchronously — opening a connection pool is itself I/O. For those you use async with instead of plain with:
    import asyncio
    import httpx

    async def main():
        async with httpx.AsyncClient() as client:   # async setup/teardown
            r = await client.get("https://api.example.com/docs")
            print(r.json())

    asyncio.run(main())
And when data arrives in a stream — say a model streaming tokens, or a large download — you consume it with async for, awaiting each chunk as it lands:
    async with client.stream("GET", url) as resp:
        async for chunk in resp.aiter_bytes():
            process(chunk)   # handle each piece as it streams in
You'll use both directly in DocChat: async with for the HTTP client to your embedding/LLM provider, and async for to stream the chat answer back to the browser token-by-token.
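To try async for without a network connection, an async generator can stand in for the stream. This is only a sketch: `stream_tokens` is a hypothetical producer that yields one word at a time, not an API of httpx or of any LLM provider.

```python
import asyncio

async def stream_tokens(text):
    """Hypothetical token stream: yields one word at a time."""
    for token in text.split():
        await asyncio.sleep(0.01)   # stand-in for waiting on the next chunk
        yield token

async def main():
    received = []
    async for token in stream_tokens("async for consumes streams"):
        received.append(token)      # handle each piece as it arrives
    return received

tokens = asyncio.run(main())
print(tokens)
```

Each iteration awaits the next item, so the event loop is free to serve other work between chunks.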
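Fanning out every call at once can overwhelm a rate-limited API. To cap how many calls are in flight at a time, use a Semaphore.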
A Semaphore(n) is a permit counter. At most n coroutines hold a permit at once; the rest wait their turn at async with sem:. Combine it with a TaskGroup to fan out safely:
embed.py
    import asyncio
    import httpx

    async def embed_chunk(client, sem, text):
        async with sem:                        # wait for a free permit (max N in flight)
            async with asyncio.timeout(30):    # bound each call
                r = await client.post(EMBED_URL, json={"input": text})
                return r.json()["embedding"]

    async def embed_all(chunks):
        sem = asyncio.Semaphore(8)             # at most 8 concurrent API calls
        async with httpx.AsyncClient() as client:
            async with asyncio.TaskGroup() as tg:
                tasks = [
                    tg.create_task(embed_chunk(client, sem, c))
                    for c in chunks
                ]
        return [t.result() for t in tasks]     # all done; collect in order

    vectors = asyncio.run(embed_all(chunks))
Read what this gives you: one shared HTTP client (connection reuse), eight calls in flight at a time (fast but polite), a 30-second cap on each (no infinite hangs), automatic cancellation of siblings if one blows up, and results in original order. That single function is genuinely the heart of DocChat's ingestion.
Concurrency vs parallelism — one line each?
Concurrency = many tasks interleaved on one worker (asyncio). Parallelism = many tasks at the same instant on many cores (multiprocessing).
Does async speed up CPU-bound work?
No. Async only overlaps waiting (I/O). CPU work has no idle wait to fill — use multiprocessing / a ProcessPoolExecutor for that.
What does calling a coroutine function do?
Nothing yet — it returns a lazy coroutine object. It only runs when you await it (or schedule it as a task / pass it to gather).
TaskGroup over gather — why?
Structured concurrency: auto-cancels siblings on failure, groups errors as an ExceptionGroup (catch with except*), and never leaks dangling tasks.
You must run a blocking sync function. How?
await asyncio.to_thread(fn, *args): it runs fn on a worker thread so the event loop stays free. Or switch to an async-native library.
What does cancellation raise, and the rule?
It raises asyncio.CancelledError at the current await point. Never swallow it — if you catch it to clean up, always raise again.
Build it · Drill
Write an async script that "fetches" three fake URLs concurrently. Each fetch(url) should await asyncio.sleep for a different number of seconds (1, 2, 3) and return f"{url} done". Run all three inside a TaskGroup, bound the whole group to a 2.5-second timeout, and print which results came back before the deadline. Total wall-clock should be ~2.5s, not 6s.
    import asyncio

    async def fetch(url, delay):
        await asyncio.sleep(delay)
        return f"{url} done"

    async def main():
        jobs = [("a", 1), ("b", 2), ("c", 3)]
        tasks = []
        try:
            async with asyncio.timeout(2.5):
                async with asyncio.TaskGroup() as tg:
                    tasks = [tg.create_task(fetch(u, d)) for u, d in jobs]
        except TimeoutError:
            print("deadline hit: slow ones were cancelled")
        for t in tasks:
            if t.done() and not t.cancelled():
                print(t.result())   # a and b made it; c was cancelled

    asyncio.run(main())
Note how the timeout cancels the still-running task c by raising CancelledError inside it — exactly the mechanism from §4. The TaskGroup ensures nothing is left dangling.
Answer from memory — retrieval is what moves this from "I read it" to "I know it".
Async mainly helps which kind of work?
What does calling a coroutine function return?
Why prefer TaskGroup over plain gather?
How do you run blocking code safely?
What should you do with CancelledError?
asyncio documentation — authoritative for coroutines, TaskGroup, timeouts, and to_thread. For a gentler narrative walkthrough, read Real Python — Async IO in Python: A Complete Walkthrough.