Module 1 · Python · Deep Dive

Async Python, properly

The capstone of Module 1. Concurrency, the event loop, task groups, timeouts, and the one blocking mistake that fails FastAPI interviews — all of it pointed straight at DocChat.

Intermediate · Advanced · Build

Why this matters: DocChat lives or dies on async. When a user drops a 200-page PDF, you embed dozens of chunks, and waiting for each API call one-by-one is unbearable. Async lets you fire them concurrently. And FastAPI, the framework under the whole backend, is async to its core: one process serves hundreds of users at once because it never sits idle waiting on I/O. Get this lesson wrong and your server stalls under load. Get it right and you sound like someone who has actually shipped.

In this lesson

Why async exists
async def & await
Running things concurrently
Timeouts & cancellation
The blocking trap
async context managers & iterators
DocChat: bounded concurrent embedding
Check yourself

1 · Why async exists

Most backend work is waiting. Your code asks a database, an HTTP API, or a disk for something, then sits idle for milliseconds — an eternity for a CPU — until the answer arrives. Async is a way to do other useful work during that wait, on a single thread.

Two words that get confused in interviews:

Term	Means	How Python does it
Concurrency	Many tasks in progress, interleaved on one worker. Like one chef juggling several pans.	`asyncio` — the event loop
Parallelism	Many tasks running at the same instant on many cores. Several chefs, several stoves.	`multiprocessing`

The event loop is the chef. It runs one coroutine until that coroutine hits an await on something slow (a network call), then sets it aside and runs another. When the network answers, it picks the first one back up. Nothing truly happens "at the same time" — it just never wastes the wait.
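A tiny trace makes the juggling visible. In this sketch (the worker names and delays are made up for illustration), two coroutines share one event loop; each await asyncio.sleep(...) hands control back, so B starts and even finishes before A does, all on a single thread.

```python
import asyncio

log = []  # records the order in which things happen

async def worker(name, delay):
    log.append(f"{name} start")
    await asyncio.sleep(delay)      # pause here; the event loop runs something else
    log.append(f"{name} end")

async def main():
    # Schedule both workers on the same loop and wait for both to finish.
    await asyncio.gather(worker("A", 0.2), worker("B", 0.1))

asyncio.run(main())
print(log)  # ['A start', 'B start', 'B end', 'A end']
```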

I/O-bound vs CPU-bound: the dividing line. Async helps I/O-bound work (waiting on network, disk, database) because waiting is exactly what it overlaps. It does not speed up CPU-bound work (resizing images, crunching numbers): there's no idle wait to fill, and one thread is one thread. For CPU work you need real parallelism: multiprocessing or ProcessPoolExecutor.

The GIL in one line: in standard CPython builds, the Global Interpreter Lock lets only one thread run Python bytecode at a time, so threads give you no CPU parallelism (they still help with blocking I/O, because the lock is released while a thread waits). That is why async (single-threaded) and processes (separate interpreters) are the two real tools.

PHP bridge: classic PHP is synchronous and blocking. Each request runs start-to-finish on its own worker process, and the process manager (php-fpm) keeps many workers around to handle many users. Async Python flips it: one process interleaves hundreds of requests itself, because it never blocks on I/O.

2 · `async def` & `await`

Mark a function async def and it becomes a coroutine function. Inside it you may use await to pause on something asynchronous and hand control back to the event loop.

import asyncio

async def fetch_user(uid):
    await asyncio.sleep(1)        # stand-in for a real network wait
    return {"id": uid, "name": "Sam"}

Here's the trap that surprises everyone: calling a coroutine function does nothing. Coroutines are lazy. You get a coroutine object back; the body hasn't run.

async def demo():
    c = fetch_user(7)    # <coroutine object>: NOTHING has executed yet
    result = await c     # NOW it runs and returns the dict

At the very top — your main — there's no await available because you're not inside a coroutine yet. That's what asyncio.run() is for: it starts an event loop, runs your top coroutine to completion, and tears the loop down.

async def main():
    user = await fetch_user(7)
    print(user)

asyncio.run(main())   # the single entry point — call this once

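Forgot to await? If you write fetch_user(7) without await, the body never runs: you get a coroutine object instead of the dict, and Python emits a RuntimeWarning ("coroutine 'fetch_user' was never awaited") once that object is garbage-collected. When a value in async code is mysteriously a <coroutine object>, or work silently never happened, this is the first thing to check.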
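To see the laziness directly, this sketch gives fetch_user a side effect (the ran list is an illustrative addition, not part of the lesson's function) and checks it before and after the await. inspect.iscoroutine is the standard-library check for a coroutine object.

```python
import asyncio
import inspect

ran = []  # the body appends here, so we can tell whether it executed

async def fetch_user(uid):
    ran.append(uid)
    await asyncio.sleep(0)          # a trivial await standing in for real I/O
    return {"id": uid, "name": "Sam"}

async def main():
    c = fetch_user(7)                   # creates a coroutine object only
    print(inspect.iscoroutine(c), ran)  # True []  (the body has not run)
    user = await c                      # now the body runs
    print(ran)                          # [7]
    return user

user = asyncio.run(main())
print(user)  # {'id': 7, 'name': 'Sam'}
```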

PHP bridge: there's no real equivalent — PHP just runs the function. The mental shift: in async Python, defining and scheduling work are two separate steps.

3 · Running things concurrently

Awaiting in a loop is still sequential — each await finishes before the next starts. Three one-second calls take three seconds:

# SLOW — sequential, 3 seconds total
results = []
for uid in [1, 2, 3]:
    results.append(await fetch_user(uid))

To overlap the waits you must schedule them together. There are two standard ways.

gather: the classic fan-out

asyncio.gather takes many coroutines, runs them concurrently, and returns their results in order. Three one-second calls now take ~one second:

# FAST — concurrent, ~1 second total
results = await asyncio.gather(
    fetch_user(1), fetch_user(2), fetch_user(3)
)

# Fan out over a list of inputs:
results = await asyncio.gather(*[fetch_user(u) for u in ids])
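A rough timing check of the claim above, assuming a fake fetch_user that sleeps 0.2 s instead of a real network call (the delay is arbitrary). The sequential loop pays for every wait in turn; gather pays roughly once. Exact timings vary by machine.

```python
import asyncio
import time

async def fetch_user(uid):
    await asyncio.sleep(0.2)        # pretend network wait of 0.2 s
    return {"id": uid}

async def sequential(ids):
    return [await fetch_user(u) for u in ids]   # one after another

async def concurrent(ids):
    return await asyncio.gather(*(fetch_user(u) for u in ids))  # all at once

def timed(coro_fn, ids):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(ids))
    return result, time.perf_counter() - start

ids = [1, 2, 3]
seq_result, seq_secs = timed(sequential, ids)
con_result, con_secs = timed(concurrent, ids)
print(f"sequential {seq_secs:.2f}s, gather {con_secs:.2f}s")
```

Both versions return the same list in the same order; only the wall-clock time differs.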

TaskGroup: structured concurrency (3.11+)

The modern, recommended way is asyncio.TaskGroup. You open an async with block, create tasks inside it, and the block does not exit until they all finish. This is structured concurrency: the tasks' lifetime is bounded by the block.

async def main():
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(fetch_user(1))
        t2 = tg.create_task(fetch_user(2))
        t3 = tg.create_task(fetch_user(3))
    # on exit, ALL tasks are done — read results with .result()
    print(t1.result(), t2.result(), t3.result())

Why prefer it over gather:

Auto-cancellation: if one task raises, the group cancels the siblings instead of leaving them running in the background.
Exception grouping: failures surface as an ExceptionGroup, which you catch with except* (3.11+) — no silently lost errors.
No leaks: you cannot exit the block with tasks still dangling.

try:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(fetch_user(1))
        tg.create_task(might_fail())
except* ValueError as eg:        # note the star — catches from the group
    print("one or more failed:", eg.exceptions)

Rule of thumb: reach for TaskGroup by default in new code. Keep gather for quick "run these and give me the list" cases where you don't need cancellation semantics, and remember gather(..., return_exceptions=True) if you want failures returned as values instead of raised.
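A short sketch of return_exceptions=True, using made-up ok and boom coroutines: the exception lands in its slot of the result list (results keep the order you passed the awaitables in), and nothing is raised.

```python
import asyncio

async def ok(n):
    await asyncio.sleep(0.01)
    return n * 10

async def boom():
    await asyncio.sleep(0.01)
    raise ValueError("bad input")

async def main():
    # Failures come back as values in their slot instead of being raised.
    results = await asyncio.gather(ok(1), boom(), ok(3), return_exceptions=True)
    good = [r for r in results if not isinstance(r, BaseException)]
    bad = [r for r in results if isinstance(r, BaseException)]
    return results, good, bad

results, good, bad = asyncio.run(main())
print(good)  # [10, 30]
print(bad)   # [ValueError('bad input')]
```

The trade-off: you must inspect every result yourself, and a failure does not cancel the other calls.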

4 · Timeouts & cancellation

A network call that hangs forever will hang your request forever. Always bound external waits. The modern tool is the asyncio.timeout() context manager (3.11+):

try:
    async with asyncio.timeout(5):     # 5 seconds for everything inside
        data = await slow_api_call()
except TimeoutError:
    data = None                     # fall back gracefully

The older one-shot form still appears in codebases: await asyncio.wait_for(slow_api_call(), timeout=5). Both wrap the awaited work and cancel it if the clock runs out.

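What cancellation actually does: it raises asyncio.CancelledError inside the target coroutine, at the await where it is currently suspended. That is how a timeout stops the work it wraps (asyncio.timeout then converts the CancelledError into TimeoutError) and how a TaskGroup stops sibling tasks.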

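Never swallow CancelledError: a bare except: (or except BaseException:) catches CancelledError and breaks cancellation, so your "cancelled" task keeps running. Since Python 3.8, CancelledError derives from BaseException, so except Exception: no longer catches it; the broad handlers are the danger. If you must catch it to clean up, re-raise it: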

try:
    await work()
except asyncio.CancelledError:
    cleanup()
    raise                       # always re-raise — let cancellation propagate
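The difference the re-raise makes can be observed from the caller. In this sketch, good_work and bad_work are illustrative names: both get cancelled while sleeping, but only the one that re-raises actually ends up cancelled; the other swallows the request and completes normally.

```python
import asyncio

log = []

async def good_work():
    try:
        await asyncio.sleep(10)            # stand-in for a long I/O wait
    except asyncio.CancelledError:
        log.append("good: cleanup")        # release resources here
        raise                              # re-raise: the task ends as cancelled

async def bad_work():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("bad: swallowed")       # no re-raise: the cancel request is lost
    return "kept going"

async def cancel_soon(coro_fn):
    task = asyncio.create_task(coro_fn())
    await asyncio.sleep(0.05)              # let the task reach its await
    task.cancel()                          # request cancellation
    try:
        outcome = await task
    except asyncio.CancelledError:
        outcome = "cancelled"
    return outcome, task.cancelled()

async def main():
    return await cancel_soon(good_work), await cancel_soon(bad_work)

good, bad = asyncio.run(main())
print(good, bad, log)
# ('cancelled', True) ('kept going', False) ['good: cleanup', 'bad: swallowed']
```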

5 · The blocking trap (the FastAPI interview gotcha)

The event loop is a single thread. If you call a blocking, synchronous function inside a coroutine, that thread cannot do anything else — every other request on the server freezes until it returns. This is the async mistake interviewers probe for.

import time
import requests   # sync HTTP library

# DISASTER: blocks the ENTIRE event loop for 5+ seconds.
# Every other user's request is frozen meanwhile.
async def handler(url):
    time.sleep(5)                  # sync sleep: NOT awaitable, blocks everyone
    response = requests.get(url)   # sync HTTP call: same problem
    return response.text

The offenders: time.sleep, the requests library, heavy CPU loops, synchronous database drivers, file reads with the plain open(). Anything that isn't awaited but takes real time.

Two fixes:

Use an async-native library. Swap requests for httpx and await it; swap time.sleep for asyncio.sleep. Best option when one exists.
Push blocking work to a thread with asyncio.to_thread(). It runs the sync function on a worker thread and gives you back an awaitable, freeing the loop:

# Blocking function you can't avoid (a sync SDK, a CPU-light parse)
async def handler():
    result = await asyncio.to_thread(slow_blocking_fn, arg1, arg2)
    return result
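You can measure the freeze with a "heartbeat" coroutine that should tick every 50 ms. This is an illustrative sketch (heartbeat and run are made-up helpers, and time.sleep stands in for any blocking call): while the loop is blocked the heartbeat gets no turns, while to_thread keeps it ticking.

```python
import asyncio
import time

async def heartbeat(ticks, stop):
    # Records a tick roughly every 50 ms while the event loop is free.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)

async def run(blocking):
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)                          # let the heartbeat start
    if blocking:
        time.sleep(0.5)                             # freezes the loop: no ticks
    else:
        await asyncio.to_thread(time.sleep, 0.5)    # runs on a worker thread
    stop.set()
    await hb
    return len(ticks)

blocked_ticks = asyncio.run(run(blocking=True))
free_ticks = asyncio.run(run(blocking=False))
print(blocked_ticks, free_ticks)  # blocked stays at 1; free is about 10 (timing varies)
```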

Interview answer: "Why is calling requests.get in an async route bad?" Say: "Because the event loop runs on one thread. requests blocks that thread until the HTTP call returns, so while it waits, no other request can be served and async throughput collapses to one-at-a-time. The fix is an async client like httpx that I can await, or wrapping the blocking call in asyncio.to_thread so it runs off the loop. For genuinely CPU-bound work I'd use a process pool, since threads can't beat the GIL." Thirty seconds, and it tells them you've felt this in production.

PHP bridge: in PHP this never bites you — each request owns its own worker, so a slow curl only blocks that request. In async Python a slow blocking call blocks everyone sharing the loop. Different model, different discipline.

6 · async context managers & iterators

Some resources need to set up and tear down asynchronously — opening a connection pool is itself I/O. For those you use async with instead of plain with:

import httpx

async with httpx.AsyncClient() as client:   # async setup/teardown
    r = await client.get("https://api.example.com/docs")
    print(r.json())
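You can write your own async context manager with contextlib.asynccontextmanager. The pool below is hypothetical (the sleeps stand in for real async connect and close I/O); the point is that both setup and teardown can await, and teardown runs even if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager

steps = []

@asynccontextmanager
async def connection_pool(size):
    # Hypothetical pool: the sleeps stand in for real async connect/close I/O.
    await asyncio.sleep(0.01)
    steps.append(f"opened {size} connections")
    try:
        yield {"size": size}                 # the value bound by "as"
    finally:
        await asyncio.sleep(0.01)            # teardown can await too
        steps.append("closed pool")

async def main():
    async with connection_pool(4) as pool:
        steps.append(f"using pool of {pool['size']}")

asyncio.run(main())
print(steps)  # ['opened 4 connections', 'using pool of 4', 'closed pool']
```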

And when data arrives in a stream — say a model streaming tokens, or a large download — you consume it with async for, awaiting each chunk as it lands:

async with client.stream("GET", url) as resp:
    async for chunk in resp.aiter_bytes():
        process(chunk)              # handle each piece as it streams in
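The producing side of async for is usually an async generator: an async def that uses yield. This offline sketch uses a hypothetical fake_token_stream in place of a real model stream, so you can run it without a network.

```python
import asyncio

async def fake_token_stream(answer):
    # Hypothetical stand-in for an LLM stream: yields one word at a time.
    for token in answer.split():
        await asyncio.sleep(0.01)   # pretend each token arrives over the network
        yield token

async def main():
    received = []
    async for token in fake_token_stream("async for consumes streams"):
        received.append(token)      # handle each piece as it lands
    return received

tokens = asyncio.run(main())
print(tokens)  # ['async', 'for', 'consumes', 'streams']
```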

You'll use both directly in DocChat: async with for the HTTP client to your embedding/LLM provider, and async for to stream the chat answer back to the browser token-by-token.

7 · DocChat: embedding N chunks, concurrently but bounded

The real problem: a user uploads a PDF. You've split it into 60 text chunks, and each must be sent to the embedding API. Sequentially that's 60 round-trips, which is slow. Fire all 60 at once and you'll hit rate limits or get throttled. The professional answer: concurrent, but capped, with a Semaphore.

A Semaphore(n) is a permit counter. At most n coroutines hold a permit at once; the rest wait their turn at async with sem:. Combine it with a TaskGroup to fan out safely:

embed.py

import asyncio
import httpx

EMBED_URL = "https://api.example.com/v1/embeddings"   # hypothetical endpoint

async def embed_chunk(client, sem, text):
    async with sem:                              # wait for a free permit (max N in flight)
        async with asyncio.timeout(30):          # bound each call once it holds a permit
            r = await client.post(EMBED_URL, json={"input": text})
            r.raise_for_status()                 # turn a 429/5xx into an exception
            return r.json()["embedding"]         # response shape depends on the provider

async def embed_all(chunks):
    sem = asyncio.Semaphore(8)                   # at most 8 concurrent API calls
    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            tasks = [
                tg.create_task(embed_chunk(client, sem, c))
                for c in chunks
            ]
    return [t.result() for t in tasks]           # all done; collect in original order

chunks = ["first chunk of text", "second chunk of text"]   # e.g. the split PDF
vectors = asyncio.run(embed_all(chunks))

Read what this gives you: one shared HTTP client (connection reuse), eight calls in flight at a time (fast but polite), a 30-second cap on each (no infinite hangs), automatic cancellation of siblings if one blows up, and results in original order. That single function is genuinely the heart of DocChat's ingestion.

Your tangible win: by the end of this you can explain, and write, concurrent, bounded, timeout-guarded I/O. That's the exact skill the FastAPI module is built on, and it's a senior-sounding answer in any 2026 backend interview.

Flashcards — flip before you move on

Concurrency vs parallelism — one line each?


Concurrency = many tasks interleaved on one worker (asyncio). Parallelism = many tasks at the same instant on many cores (multiprocessing).

Does async speed up CPU-bound work?


No. Async only overlaps waiting (I/O). CPU work has no idle wait to fill — use multiprocessing / a ProcessPoolExecutor for that.

What does calling a coroutine function do?


Nothing yet — it returns a lazy coroutine object. It only runs when you await it (or schedule it as a task / pass it to gather).

TaskGroup over gather — why?


Structured concurrency: auto-cancels siblings on failure, groups errors as an ExceptionGroup (catch with except*), and never leaks dangling tasks.

You must run a blocking sync function. How?


await asyncio.to_thread(fn, *args, **kwargs) runs it on a worker thread so the event loop stays free. Or switch to an async-native library.

What does cancellation raise, and the rule?


It raises asyncio.CancelledError at the current await point. Never swallow it — if you catch it to clean up, always raise again.

Build it · Drill

Write an async script that "fetches" three fake URLs concurrently. Each fetch(url) should await asyncio.sleep for a different number of seconds (1, 2, 3) and return f"{url} done". Run all three inside a TaskGroup, bound the whole group to a 2.5-second timeout, and print which results came back before the deadline. Total wall-clock should be ~2.5s, not 6s.

A clean solution

import asyncio

async def fetch(url, delay):
    await asyncio.sleep(delay)
    return f"{url} done"

async def main():
    jobs = [("a", 1), ("b", 2), ("c", 3)]
    tasks = []
    try:
        async with asyncio.timeout(2.5):
            async with asyncio.TaskGroup() as tg:
                tasks = [tg.create_task(fetch(u, d)) for u, d in jobs]
    except TimeoutError:
        print("deadline hit — slow ones were cancelled")

    for t in tasks:
        if t.done() and not t.cancelled():
            print(t.result())          # a and b made it; c was cancelled

asyncio.run(main())

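Note how the deadline works: asyncio.timeout cancels the code it wraps, the TaskGroup responds by cancelling the still-running task c (raising CancelledError inside it, exactly the mechanism from §4), and the timeout then surfaces the whole thing as TimeoutError. The TaskGroup ensures nothing is left dangling.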

8 · Check yourself

Answer from memory — retrieval is what moves this from "I read it" to "I know it".

Recall quiz

Async mainly helps which kind of work?

What does calling a coroutine function return?

Why prefer TaskGroup over plain gather?

How do you run blocking code safely?

What should you do with CancelledError?

Primary source: the official Python asyncio documentation, authoritative for coroutines, TaskGroup, timeouts, and to_thread. For a gentler narrative walkthrough, read Real Python's "Async IO in Python: A Complete Walkthrough".
