The plumbing you paste into every backend: structured logging, typed settings, UTC dates, an httpx client, retry/backoff, safe file I/O, custom errors. Not language tutorials — reusable recipes with the code, when to reach for it, and the gotcha.
Why these, for real backend work
A FastAPI + RAG service lives or dies on its boring parts. It calls embedding and LLM APIs over flaky networks (so: httpx + retry), reads its config from the environment in Docker (so: typed settings), writes timestamps that cross timezones (so: UTC everywhere), and needs to be debuggable at 2am from log aggregation (so: structured logging, never print()). These thirteen recipes are that kit.
When: every app, from line one — you want to see what happened, with levels and context, in stdout that your platform ships to log aggregation.
app/logging_setup.py
import logging
from logging.config import dictConfig


def configure_logging(level: str = "INFO") -> None:
    dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "kv": {
                # key=value is grep-able and parses cleanly in most stacks
                "format": 'time=%(asctime)s level=%(levelname)s '
                          'logger=%(name)s msg="%(message)s"',
            },
        },
        "handlers": {
            "stdout": {
                "class": "logging.StreamHandler",
                "formatter": "kv",
                # StreamHandler writes to stderr unless a stream is given
                "stream": "ext://sys.stdout",
            },
        },
        "root": {"handlers": ["stdout"], "level": level},
    })


# In every other module, name the logger after the module, never the root logger:
logger = logging.getLogger(__name__)
logger.info("embedded chunks count=%s doc_id=%s", 128, doc_id)
A module-level getLogger(__name__) gives every line a logger name (so you can filter by module), and one dictConfig at startup sets format + level once for the whole tree.
print() has no levels, no timestamps, and no routing, and it floods prod: it's a debugging crutch, not application output. Call configure_logging() once at startup, not per request.
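To see what such a format actually emits, here is a minimal sketch that points the same key=value layout at an in-memory stream instead of stdout. The logger name `app.ingest` and the `buf` stream are illustrative assumptions, and `asctime` is left out only so the output is stable.

```python
import io
import logging

# Same key=value layout as the recipe, minus asctime so the line is stable.
FORMAT = 'level=%(levelname)s logger=%(name)s msg="%(message)s"'

buf = io.StringIO()                       # stands in for stdout in this demo
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(FORMAT))

logger = logging.getLogger("app.ingest")  # hypothetical module name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                  # keep the demo out of the root logger

# %-style args are interpolated lazily, only if the level is enabled.
logger.info("embedded chunks count=%s doc_id=%s", 128, "abc")
logger.debug("dropped: below the INFO threshold")

line = buf.getvalue().strip()
print(line)
# level=INFO logger=app.ingest msg="embedded chunks count=128 doc_id=abc"
```

The debug call leaves no trace, which is the level filtering that print() cannot give you.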
Typed settings from env (pydantic-settings)
When: you need config — DB URL, API keys, model name — that comes from .env locally and real env vars in Docker, validated and typed.
Field types coerce and validate env strings for free ("1" → bool, "30" → float), defaults document what's optional, and one cached instance means config is parsed once and shared.
Don't sprinkle os.getenv("THING") across the codebase — it's untyped, undocumented, and fails at the point of use instead of at startup; never commit the real .env.
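pydantic-settings does the coercion and validation declaratively. To make those behaviors visible in isolation, the sketch below spells them out by hand with only the standard library. It is a stand-in, not the pydantic-settings API. The field names mirror the ones the httpx recipe reads from `settings`; the environment variable names, the default model name, and the `load_settings` / `get_settings` helpers are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    openai_api_key: str                        # required: no default
    embed_model: str = "example-embed-model"   # placeholder default
    request_timeout: float = 30.0
    debug: bool = False


def _as_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=os.environ) -> Settings:
    """Coerce and validate once, so bad config fails at startup."""
    try:
        key = env["OPENAI_API_KEY"]
    except KeyError:
        raise RuntimeError("OPENAI_API_KEY is required") from None
    return Settings(
        openai_api_key=key,
        embed_model=env.get("EMBED_MODEL", Settings.embed_model),
        request_timeout=float(env.get("REQUEST_TIMEOUT", Settings.request_timeout)),
        debug=_as_bool(env.get("DEBUG", "0")),
    )


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    return load_settings()   # parsed on first call, then shared


# Demo with an explicit mapping instead of the real environment.
demo = load_settings({"OPENAI_API_KEY": "sk-test", "REQUEST_TIMEOUT": "30", "DEBUG": "1"})
```

A missing key raises inside `load_settings`, at startup, rather than deep inside a request handler that happens to touch the value first.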
Timezone-aware datetimes
When: any timestamp you store, compare, or send across a wire — created_at, token expiry, "embedded at".
from datetime import datetime, timezone

now = datetime.now(timezone.utc)  # aware, in UTC: the only now() you want
stamp = now.isoformat()           # e.g. "2026-06-19T14:03:22.511204+00:00"

# store/transmit the isoformat string; parse it back the same way:
parsed = datetime.fromisoformat(stamp)  # aware again, ready to compare
age_seconds = (datetime.now(timezone.utc) - parsed).total_seconds()

# Py 3.11+: datetime.UTC is a shorter alias for timezone.utc
UTC-aware datetimes sort and subtract correctly across servers and clients, and isoformat() ↔ fromisoformat() is a lossless, unambiguous round-trip that every language understands.
datetime.now() and datetime.utcnow() are naive (no tzinfo): ordering or subtracting a naive and an aware datetime raises TypeError (== quietly returns False), and naive timestamps silently drift when machines disagree on local time. utcnow() is also deprecated as of Python 3.12. Always pass timezone.utc.
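The difference between naive and aware values is easiest to see by mixing them. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)
naive = datetime.now()            # no tzinfo attached

try:
    aware - naive                 # mixing the two fails loudly...
    mixed_ok = True
except TypeError:
    mixed_ok = False

# ...but equality does not raise, it is simply False.
silently_unequal = (aware == naive)

# The ISO 8601 round-trip keeps both the offset and the microseconds.
stamp = aware.isoformat()
restored = datetime.fromisoformat(stamp)
```

The quiet `False` from `==` is the case to watch for: a lookup or dedupe keyed on timestamps can miss without any error.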
Talking to the outside world 2 recipes
HTTP calls with a reusable httpx client
When: you call any external API — embeddings, an LLM, a webhook — and especially when you're inside an async FastAPI route.
app/clients.py
import httpx
from app.settings import settings

# Build the client ONCE and reuse it: it pools connections.
client = httpx.AsyncClient(
    base_url="https://api.openai.com/v1",
    timeout=httpx.Timeout(settings.request_timeout),  # always set explicitly (httpx default: 5s)
    headers={"Authorization": f"Bearer {settings.openai_api_key}"},
    http2=True,  # needs the optional extra: pip install "httpx[http2]"
)


async def embed(text: str) -> list[float]:
    resp = await client.post(
        "/embeddings",
        json={"model": settings.embed_model, "input": text},
    )
    resp.raise_for_status()  # turn 4xx/5xx into an exception
    return resp.json()["data"][0]["embedding"]

# on shutdown: await client.aclose()
A single long-lived client reuses TCP connections (huge under load), a base_url + headers keep call sites clean, and httpx gives you the same API sync or async plus HTTP/2 — which requests can't do.
requests is fine for a script, but it has no async and defaults to no timeout — a hung upstream can freeze a worker forever. Set a timeout on every client, and don't create a new client per request.
Retry with backoff (tenacity + hand-rolled)
When: calling a flaky network service that occasionally 500s or times out — embedding/LLM APIs especially. Retry the transient, not the permanent.
import httpx
from tenacity import (retry, retry_if_exception, stop_after_attempt,
                      wait_exponential)


def is_transient(exc: BaseException) -> bool:
    """Retry timeouts and 5xx only; a 4xx fails the same way every time."""
    if isinstance(exc, httpx.TimeoutException):
        return True
    return (isinstance(exc, httpx.HTTPStatusError)
            and exc.response.status_code >= 500)


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=0.5, max=8),  # 0.5, 1, 2, 4… capped at 8s
    retry=retry_if_exception(is_transient),
    reraise=True,
)
async def embed_with_retry(text: str) -> list[float]:
    return await embed(text)


# --- hand-rolled, when you don't want the dependency ---
import asyncio
import random


async def with_backoff(fn, *, attempts=4, base=0.5, cap=8.0):
    for i in range(attempts):
        try:
            return await fn()
        except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
            if not is_transient(exc) or i == attempts - 1:
                raise  # permanent error or out of tries: let it bubble
            delay = min(cap, base * 2 ** i) + random.uniform(0, 0.3)  # jitter
            await asyncio.sleep(delay)
Exponential backoff with a cap (and a little jitter) spreads load instead of hammering a struggling service in lockstep, and limiting which exceptions retry keeps you from masking real bugs.
Never retry 4xx — a 400/401/404 means the request itself is wrong and will fail identically forever; retry only timeouts and 5xx, and always cap both the delay and the attempt count or you'll DOS yourself.
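To make the capped schedule concrete without touching the network, here is a dependency-free sketch. `Transient` stands in for a timeout or 5xx, `flaky` is a fake call that fails twice, and the injectable `sleep` parameter is an addition for the demo so nothing actually waits. All of these names are assumptions for illustration.

```python
import asyncio
import random


class Transient(Exception):
    """Stand-in for a timeout or 5xx (hypothetical, demo only)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Delay before retry number `attempt` (0-based), without jitter."""
    return min(cap, base * 2 ** attempt)


async def with_backoff(fn, *, attempts=4, base=0.5, cap=8.0, sleep=asyncio.sleep):
    for i in range(attempts):
        try:
            return await fn()
        except Transient:
            if i == attempts - 1:
                raise                      # out of tries
            await sleep(backoff_delay(i, base, cap) + random.uniform(0, 0.3))


calls = 0
slept = []


async def flaky():
    """Fails twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise Transient("try again")
    return "ok"


async def fake_sleep(seconds):
    slept.append(seconds)                  # record instead of waiting


result = asyncio.run(with_backoff(flaky, sleep=fake_sleep))
schedule = [backoff_delay(i) for i in range(6)]
print(schedule)   # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Injecting `sleep` is also a convenient way to unit-test retry logic without slowing the test suite down.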
Data & files 3 recipes
Safe JSON read/write
When: caching a response to disk, reading a fixture, or persisting a small index — and the data might contain dates, UUIDs, or a malformed file.
default=str stops dumps from crashing on non-JSON types you forgot were in there, and catching FileNotFoundError / JSONDecodeError turns two real failure modes into a graceful fallback.
A bare json.load(open(path)) leaks the file handle and explodes on the first missing or half-written file — read via Path.read_text() and guard the parse.
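One way the pieces described above might fit together, using only the standard library. The helper names `write_json` and `read_json`, the `fallback` parameter, and the temp-directory demo are assumptions for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4


def write_json(path: Path, data) -> None:
    # default=str: datetimes, UUIDs, Paths become strings instead of raising
    path.write_text(json.dumps(data, default=str, indent=2), encoding="utf-8")


def read_json(path: Path, fallback=None):
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return fallback        # nothing cached yet
    except json.JSONDecodeError:
        return fallback        # truncated or corrupt file


tmp = Path(tempfile.mkdtemp())
cache_file = tmp / "cache.json"
write_json(cache_file, {"id": uuid4(), "at": datetime.now(timezone.utc), "n": 3})
loaded = read_json(cache_file)

# Simulate a half-written file.
(tmp / "broken.json").write_text('{"n": ', encoding="utf-8")
```

Note that `default=str` is one-way: the UUID and the timestamp come back as plain strings, so parse them again if you need the original types.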
pathlib for paths
When: you touch the filesystem at all — locating files relative to a module, joining paths, checking existence, globbing a directory.
from pathlib import Path

HERE = Path(__file__).parent   # dir of the current file
DATA = HERE / "data" / "docs"  # "/" joins with OS-correct separators
DATA.mkdir(parents=True, exist_ok=True)  # make it if missing, no error if there

if (DATA / "index.faiss").exists():
    ...

for pdf in DATA.glob("*.pdf"):  # iterate matching files
    print(pdf.name, pdf.stat().st_size)
Path(__file__).parent anchors paths to the code, not the working directory (which changes), and the / operator produces correct separators on every OS while giving you .exists(), .glob(), .read_text() in one object.
String-concatenating paths (dir + "/" + name) breaks on Windows, double-slashes, and relative-cwd surprises — let pathlib handle joining.
Dataclass / TypedDict for shape
When: you pass a small structured value around internally and want the editor and the next reader to know its fields.
from dataclasses import dataclass
from typing import TypedDict


# dataclass: an internal value object you construct and pass
@dataclass(frozen=True, slots=True)  # slots=True needs Py 3.10+
class Chunk:
    doc_id: str
    text: str
    page: int = 0


c = Chunk(doc_id="abc", text="…", page=3)  # c.doc_id, repr, == all free


# TypedDict: you're stuck with a dict (an API payload) but want field types
class EmbedResult(TypedDict):
    embedding: list[float]
    tokens: int


def use(r: EmbedResult) -> int:
    return r["tokens"]  # type-checker knows the keys
A @dataclass gives you __init__/__repr__/__eq__ with zero boilerplate (and frozen+slots = immutable and memory-light), while TypedDict documents the shape of dicts you can't replace — types as documentation the editor enforces.
Don't reach for a full Pydantic model when nothing crosses a trust boundary — validation costs cycles; use a dataclass for internal values and save Pydantic for request/response edges.
Reliability & structure 5 recipes
Custom exceptions & a boundary
When: your code has domain failure modes ("doc not found", "embedding failed") and you want to handle them in one place — the request handler — not scattered everywhere.
# app/errors.py: one base, specific subclasses
class AppError(Exception):
    """Base for all errors this app raises on purpose."""
    status_code = 500


class DocumentNotFound(AppError):
    status_code = 404


class EmbeddingFailed(AppError):
    status_code = 502


# deep in the code: just try it (EAFP), raise a domain error
def get_doc(doc_id: str) -> dict:
    try:
        return STORE[doc_id]
    except KeyError as exc:
        raise DocumentNotFound(doc_id) from exc  # translate to a domain error


# at the EDGE (e.g. a FastAPI handler): catch the base once
try:
    doc = get_doc(doc_id)
except AppError as e:
    logger.warning("app error type=%s detail=%s", type(e).__name__, e)
    # each subclass carries its own status, so not everything becomes a 404
    raise HTTPException(status_code=e.status_code, detail=str(e))
A single AppError base lets the boundary catch all intentional failures with one except, while subclasses keep the meaning — and EAFP (try, then handle the error) is cleaner and race-free versus checking first.
Don't catch broad exceptions deep in helpers (you swallow bugs and lose the stack) and don't let raw KeyError/httpx errors leak to the client — translate at the source, handle at the edge.
lru_cache / cache for expensive pure calls
When: a function is pure (same input → same output, no side effects) and expensive — a config parse, a tokenizer load, a deterministic computation called repeatedly with the same args.
from functools import cache, lru_cache


@cache  # Py 3.9+: unbounded memoize
def load_tokenizer(name: str):
    return Tokenizer.from_pretrained(name)  # slow: do it once per name


@lru_cache(maxsize=1024)  # bounded: evicts least-recently-used
def token_count(text: str) -> int:
    return len(load_tokenizer("cl100k").encode(text))
Memoizing a pure function trades a little memory for skipping repeated work entirely, and maxsize on lru_cache bounds that memory by evicting the coldest entries.
Never cache impure functions — caching something that hits the DB or reads now() returns stale data — and arguments must be hashable (no lists/dicts); unbounded @cache on unbounded inputs is a slow memory leak.
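Eviction and the hashability rule can both be observed with `cache_info()`. In this sketch `slow_square` is a hypothetical stand-in for an expensive pure function, with a deliberately tiny `maxsize` so eviction shows up quickly.

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=2)
def slow_square(n: int) -> int:
    global calls
    calls += 1                 # counts real executions, not cache hits
    return n * n


slow_square(3)
slow_square(3)                 # hit: the body does not run again
slow_square(4)
slow_square(5)                 # cache is full: evicts 3 (least recently used)
slow_square(3)                 # miss again after eviction

info = slow_square.cache_info()
print(info)   # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)

try:
    slow_square([3])           # lists are unhashable, so they cannot be keys
    hashable_ok = True
except TypeError:
    hashable_ok = False
```

A hit rate that stays near zero in `cache_info()` usually means the arguments rarely repeat, in which case the cache is only costing memory.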
Batch an iterable into chunks
When: you must process a large sequence in groups — embed 1000 chunks 100 at a time, bulk-insert rows, or respect an API's batch limit.
from itertools import islice


def batched(iterable, n):
    """Yield lists of up to n items. (Py 3.12+: use itertools.batched.)"""
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


# Py 3.12+: from itertools import batched
# (the loop below must run inside an async function)
for group in batched(chunks, 100):     # 1000 chunks → 10 calls of 100
    vectors = await embed_many(group)  # one API round-trip per batch
    store(vectors)
Lazy batching with islice chunks any iterable (even a generator) without ever building the whole list in memory, and grouping turns N tiny API calls into N/100 efficient ones.
On Python 3.12+ just use the built-in itertools.batched — but note it yields tuples, not lists; and don't slice with list[i:i+n] if the source is a one-shot generator (you'd exhaust or re-iterate it wrong).
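A quick check of the hand-rolled version against a one-shot generator, including the short final batch. The `numbers` generator is illustrative, and the `n < 1` guard is an addition (an assumption on my part) that mirrors how `itertools.batched` rejects such sizes; without it, `n=0` would silently yield nothing.

```python
from itertools import islice


def batched(iterable, n):
    """Yield lists of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


def numbers():
    """A one-shot generator: no len(), no slicing."""
    yield from range(7)


groups = list(batched(numbers(), 3))
print(groups)   # [[0, 1, 2], [3, 4, 5], [6]]

try:
    numbers()[0:3]             # generators cannot be sliced
    sliceable = True
except TypeError:
    sliceable = False
```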
A tiny CLI (typer)
When: a script needs to be run by a human or a cron job with arguments — reindex a doc, backfill embeddings — instead of editing constants and re-running.
scripts/reindex.py
import typer

app = typer.Typer()


@app.command()
def reindex(doc_id: str, force: bool = False, batch: int = 100):
    """Re-embed and re-index one document."""
    typer.echo(f"reindexing {doc_id} (force={force}, batch={batch})")
    ...


if __name__ == "__main__":
    app()

# run: python -m scripts.reindex abc123 --force --batch 50
# argparse is the stdlib alternative: no dependency, more boilerplate
Typer turns type-hinted function params into validated --flags with a --help for free, so the script is self-documenting and safe to hand to someone else (or a scheduler).
A CLI beats editing hard-coded constants because those changes get committed by accident and aren't reproducible — but don't put real logic in the command function; have it call your importable, testable code.
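For comparison, roughly the same interface with the stdlib's argparse. This is a sketch: `build_parser`, and a `main` that returns the message instead of doing the work, are assumptions made so the example can be exercised without a real reindex function.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="reindex", description="Re-embed and re-index one document."
    )
    parser.add_argument("doc_id")
    parser.add_argument("--force", action="store_true")
    parser.add_argument("--batch", type=int, default=100)
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)   # argv=None reads sys.argv[1:]
    return f"reindexing {args.doc_id} (force={args.force}, batch={args.batch})"


# Passing argv explicitly makes the CLI testable without a subprocess.
message = main(["abc123", "--force", "--batch", "50"])
defaults = main(["abc123"])
print(message)   # reindexing abc123 (force=True, batch=50)
```

The trade-off is visible: each flag is declared by hand, where Typer derives it from the function signature.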
The __main__ import-safe entry
When: a file is both something you run directly and something other modules import — which, in practice, is almost every module.
def main() -> None:
    configure_logging()
    run_the_thing()


if __name__ == "__main__":  # True only when run directly
    main()                  # NOT when imported, NOT under pytest
The guard runs main() only when the file is the entry point — so importing it (for tests, or to reuse a function) doesn't fire side effects, kick off a server, or block on input.
Code at module top level runs on every import — putting your startup there means import app.cli silently launches it, and tools like pytest or autodoc that import your modules will trigger it too.
The pattern behind the patterns
Notice the through-line: do the boring thing once, at the edge, with types. Configure logging once. Read settings once. Build the client once. Catch errors at the boundary. Anchor paths to the code. That discipline is what makes a backend debuggable and calm under load — and it's nearly all paste-able.