Module 1 · Python · Deep Dive
From loose scripts to reusable, organised code — functions, classes, dataclasses, and the module system that every FastAPI app is built from.
self. Nail both today.
You define a function with def, a colon, and an indented body. return hands a value back; a function with no return gives back None.
    def greet(name):
        return f"Hello, {name}"

    print(greet("Sam"))  # Hello, Sam

PHP bridge: function greet($name) {} becomes def greet(name): with the same idea, but no braces, no $, and no function keyword.
Give a parameter a fallback value and callers can omit it:
    def greet(name, greeting="Hello"):
        return f"{greeting}, {name}"

    greet("Sam")             # "Hello, Sam"
    greet("Sam", "Marhaba")  # "Marhaba, Sam"
You can pass arguments by name, in any order — this makes calls self-documenting:
    greet(name="Sam", greeting="Hi")
    greet(greeting="Hi", name="Sam")  # same result: order doesn't matter
Positional arguments must come before keyword arguments: greet(name="Sam", "Hi") is a syntax error; greet("Sam", greeting="Hi") is fine.
A string as the first line of a function body is its docstring — it shows up in help() and in your editor's tooltips. Write them for anything non-obvious.
    def word_count(text):
        """Return the number of whitespace-separated words in text."""
        return len(text.split())
Sometimes you don't know how many arguments you'll get. *args collects extra positional arguments into a tuple; **kwargs collects extra keyword arguments into a dict.
    def log(*args, **kwargs):
        print("positional:", args)  # a tuple
        print("keyword:", kwargs)   # a dict

    log(1, 2, user="sam", level="info")
    # positional: (1, 2)
    # keyword: {'user': 'sam', 'level': 'info'}
The names args and kwargs are convention, not magic — the * and ** do the work. You'll see **kwargs constantly when wrapping or forwarding to other functions.
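As a rough sketch of the forwarding pattern mentioned above: the helper call_logged below is hypothetical (it is not part of this module's code) and simply accepts any arguments and passes them on unchanged.

```python
def call_logged(fn, *args, **kwargs):
    """Call fn with whatever arguments were given, and report the call.

    call_logged is a hypothetical helper used only for illustration.
    """
    print(f"calling {fn.__name__} with {args} and {kwargs}")
    # In a call, * and ** UNPACK the tuple and dict back into arguments.
    return fn(*args, **kwargs)


def greet(name, greeting="Hello"):
    return f"{greeting}, {name}"


result = call_logged(greet, "Sam", greeting="Marhaba")
print(result)  # Marhaba, Sam
```

The wrapper never needs to know greet's signature, which is why this shape turns up so often in decorators and thin wrappers.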
Python is dynamically typed, but you can annotate what you expect. Hints don't enforce anything at runtime — they power your editor, tools like mypy, and (crucially) FastAPI and Pydantic, which read them to validate requests.
    def repeat(text: str, times: int = 2) -> str:
        return text * times

    repeat("ab", 3)  # "ababab"
Read text: str as "text, expected to be a str", and -> str as "returns a str". You'll write hints on every FastAPI route, so get comfortable now.
PHP bridge: hints resemble PHP type declarations (function repeat(string $text): string), except Python does not enforce them at runtime; they're documentation plus tooling fuel.
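To make "not enforced at runtime" concrete, here is a small sketch that reuses repeat from above. Passing a list is deliberately wrong usage, chosen only to show that Python runs the body anyway.

```python
def repeat(text: str, times: int = 2) -> str:
    return text * times


# The hints are stored as plain metadata on the function object.
print(repeat.__annotations__)

# Nothing stops a caller from passing a list instead of a str:
# Python simply runs the body, and list * int happens to work.
wrong = repeat([1, 2], 2)
print(wrong)  # [1, 2, 1, 2]
```

A type checker such as mypy would flag that call before the program runs; the interpreter itself does not.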
Variables created inside a function are local — they don't leak out. To read a module-level (global) name you can just reference it; to reassign one from inside a function you'd need the global keyword (which you should almost never do).
    counter = 0

    def bump():
        global counter  # needed only to REASSIGN a global
        counter += 1

    bump()
    print(counter)  # 1
Prefer passing values in and returning them out over reaching for global — it keeps functions testable, which interviewers like to hear.
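One possible way to write the counter without global, sketched under the assumption that the caller owns the state:

```python
def bump(counter: int) -> int:
    """Return the next counter value without touching any global."""
    return counter + 1


counter = 0
counter = bump(counter)  # the caller decides where the new value goes
print(counter)  # 1

# Testing needs no setup or teardown: same input, same output.
assert bump(41) == 42
```

Because bump depends only on its argument, a test can call it with any number and check the result directly.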
    # BUG: the list is created once and reused
    def add_tag(tag, tags=[]):
        tags.append(tag)
        return tags

    add_tag("ai")   # ['ai']
    add_tag("uae")  # ['ai', 'uae']  ← leaked from the previous call!
The fix is the universal idiom: default to None, then create a fresh object inside.
    # FIX: None sentinel, fresh list each call
    def add_tag(tag, tags=None):
        if tags is None:
            tags = []
        tags.append(tag)
        return tags

    add_tag("ai")   # ['ai']
    add_tag("uae")  # ['uae']  ← correct, independent
Interview question: "What's wrong with def f(x, items=[]):?" The mutable default is created once, when the def statement runs, and persists between calls, so it accumulates state. The fix: default to None and assign a new list inside. This question appears in real UAE screening rounds; say "evaluated once at definition time" and you've passed it.
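If you want to see "evaluated once at definition time" for yourself, the sketch below inspects the buggy version. It relies on the standard __defaults__ attribute, where Python stores a function's default values.

```python
def add_tag(tag, tags=[]):  # the buggy version, kept on purpose
    tags.append(tag)
    return tags


# Default values live on the function object, created when def ran.
print(add_tag.__defaults__)  # ([],)

add_tag("ai")
print(add_tag.__defaults__)  # (['ai'],)  the stored default itself changed

first = add_tag("uae")
second = add_tag("rag")
print(first is second)  # True: every call returned the same list object
```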
In Python, functions are values. You can store one in a variable, pass it to another function, or return it. This is the foundation of decorators, callbacks, and FastAPI's route registration.
    def shout(text):
        return text.upper()

    f = shout       # no parentheses: store the function itself
    print(f("hi"))  # "HI"

    # pass a function as an argument
    def apply(fn, value):
        return fn(value)

    apply(shout, "hello")  # "HELLO"
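The third option, returning a function, can be sketched like this. make_prefixer is a hypothetical name used only for illustration.

```python
def make_prefixer(prefix):
    """Build and return a new function that remembers prefix."""
    def add_prefix(text):
        return f"{prefix}{text}"
    return add_prefix  # no parentheses: return the function itself


warn = make_prefixer("WARNING: ")
info = make_prefixer("INFO: ")

print(warn("disk almost full"))  # WARNING: disk almost full
print(info("backup finished"))   # INFO: backup finished
```

Each returned function keeps its own prefix, which is the same mechanism decorators build on.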
A lambda is a tiny anonymous function for one-off use, most often as the key= argument to sorted() or list.sort():
docs = [{"title": "B", "pages": 5}, {"title": "A", "pages": 9}]
docs.sort(key=lambda d: d["pages"], reverse=True)
# sorted by pages, biggest first
PHP bridge: a lambda ≈ a PHP fn($d) => $d['pages'] arrow function. key=lambda fills the role of PHP's usort comparator, but instead of comparing two items it returns the value to sort by, which is usually cleaner.
A class is a blueprint; an instance is one object built from it. __init__ is the constructor; self is the current instance, passed automatically as the first parameter of every instance method.

    class Document:
        def __init__(self, title, body):
            self.title = title  # instance attribute
            self.body = body

        def word_count(self):
            return len(self.body.split())

    doc = Document("Intro", "the cat sat")
    print(doc.title)         # "Intro"
    print(doc.word_count())  # 3
What is self, really?
self is just the instance, handed to the method automatically. doc.word_count() is sugar for Document.word_count(doc). You must list self as the first parameter of every instance method, but you never pass it manually.
PHP bridge: self.title is Python's $this->title; __init__ is PHP's __construct. The big difference: Python makes self an explicit first parameter rather than an implicit $this.
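The "sugar" claim is easy to check. This short sketch calls the same method both ways on the Document class from above:

```python
class Document:
    def __init__(self, title, body):
        self.title = title
        self.body = body

    def word_count(self):
        return len(self.body.split())


doc = Document("Intro", "the cat sat")

via_instance = doc.word_count()       # Python supplies doc as self
via_class = Document.word_count(doc)  # the same call, with self passed by hand

print(via_instance, via_class)  # 3 3
```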
An attribute set on self is per-instance. An attribute set directly on the class is shared by all instances — useful for constants and defaults.
    class Document:
        extension = ".txt"  # class attribute, shared

        def __init__(self, title):
            self.title = title  # instance attribute, per object

    a = Document("A")
    b = Document("B")
    print(a.extension, b.extension)  # .txt .txt (same shared value)
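One detail worth seeing once, sketched with the same Document class: assigning through an instance does not change the shared value. It creates a separate instance attribute that shadows it.

```python
class Document:
    extension = ".txt"  # class attribute, shared

    def __init__(self, title):
        self.title = title


a = Document("A")
b = Document("B")

# Assigning through an instance creates a NEW instance attribute on a only.
a.extension = ".md"
print(a.extension, b.extension)  # .md .txt

# Assigning on the class changes the shared value that b still looks up.
Document.extension = ".rst"
print(a.extension, b.extension)  # .md .rst
```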
By default, printing an object shows an ugly <__main__.Document object at 0x...>. Define __repr__ to control how it appears — invaluable for debugging.
    class Document:
        def __init__(self, title, body):
            self.title = title
            self.body = body

        def __repr__(self):
            return f"Document(title={self.title!r}, words={len(self.body.split())})"

    print(Document("Intro", "a b c"))  # Document(title='Intro', words=3)
The !r in an f-string calls repr() on the value — that's why the title comes out quoted.
A class can inherit from another, reusing and extending its behaviour. Call the parent's constructor with super().__init__(...).
    class Document:
        def __init__(self, title, body):
            self.title = title
            self.body = body

    class PDFDocument(Document):
        def __init__(self, title, body, pages):
            super().__init__(title, body)  # run the parent's __init__
            self.pages = pages

    pdf = PDFDocument("Report", "...text...", 12)
    print(pdf.title, pdf.pages)  # Report 12
A @property turns a method into something you access like an attribute — no parentheses. Use it for values derived from other data.
    class Document:
        def __init__(self, body):
            self.body = body

        @property
        def word_count(self):
            return len(self.body.split())

    doc = Document("the cat sat")
    print(doc.word_count)  # 3, with no () because it's a property

PHP bridge: @property is like a PHP magic __get getter, but explicit and per-attribute, which makes it far cleaner and more discoverable.
Most classes just hold data. Writing __init__ and __repr__ by hand for those is tedious. The @dataclass decorator generates them for you from the annotated fields.
    from dataclasses import dataclass

    @dataclass
    class Document:
        title: str
        body: str
        pages: int = 1  # field with a default

    doc = Document("Intro", "the cat sat")
    print(doc)        # Document(title='Intro', body='the cat sat', pages=1)
    print(doc.title)  # "Intro"
You still add methods normally — the dataclass only generates the boilerplate (constructor, __repr__, equality). It's the idiomatic 2026 way to model a record.
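A brief sketch of both points, a hand-written method living alongside the generated equality, using the same fields as above:

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    body: str
    pages: int = 1

    def word_count(self) -> int:  # an ordinary method, written by hand
        return len(self.body.split())


a = Document("Intro", "the cat sat")
b = Document("Intro", "the cat sat")

print(a.word_count())  # 3
print(a == b)          # True: generated __eq__ compares field by field
print(a is b)          # False: still two separate objects
```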
@dataclass is the exact mental model for Pydantic's BaseModel, which you meet in Module 2 (FastAPI). Same shape: a class, typed fields, defaults. The difference: Pydantic also validates and parses incoming data against those types at runtime. When you understand dataclasses, Pydantic is a two-minute leap — and every FastAPI request body is a Pydantic model.
Every .py file is a module. You pull names from other modules with import.
    # two import styles
    import math
    math.sqrt(9)  # 3.0

    from math import sqrt, pi
    sqrt(9)  # 3.0, with the name imported directly
Your own files work the same way. Say you have textutils.py next to your script:
textutils.py
    def word_count(text: str) -> int:
        return len(text.split())
main.py
    from textutils import word_count

    print(word_count("the cat sat"))  # 3

PHP bridge: import ≈ require/use, but Python imports a whole module namespace at once: no manual file paths, just the module name. A folder with an __init__.py is a package (PHP's namespace-per-directory idea).
When Python runs a file directly, it sets that file's __name__ to "__main__". When the file is imported, __name__ is the module's name instead. The guard below means "only run this when executed directly, not when imported":
    def main():
        print("running directly")

    if __name__ == "__main__":
        main()
Without the guard, your script's top-level code would fire every time another file imported it. This pattern is on nearly every Python entry point you'll ever see.
A virtual environment is a per-project sandbox for packages, so Project A's FastAPI version can't collide with Project B's. Create, activate, install, then freeze your dependency list:
    # create a venv in a .venv folder
    python -m venv .venv

    # activate it
    source .venv/bin/activate   # macOS / Linux
    .venv\Scripts\activate      # Windows

    # install packages into THIS project only
    pip install fastapi uvicorn

    # record exact versions so others can reproduce it
    pip freeze > requirements.txt

    # later, on another machine:
    pip install -r requirements.txt
When the environment is active, your shell prompt is prefixed with (.venv). Commit requirements.txt to git, but never the .venv folder; add it to .gitignore. This is exactly how DocChat's dependencies will be managed.
PHP bridge: requirements.txt ≈ composer.json; pip install ≈ composer require; the .venv folder ≈ vendor/. Same dependency-management instincts, different commands.
Not every method needs an instance. Two decorators change what gets passed as the first argument. A normal method receives self (the instance); a @classmethod receives cls (the class itself); a @staticmethod receives nothing automatic — it's just a plain function that lives inside the class for namespacing.
    class User:
        def __init__(self, name, email):
            self.name = name
            self.email = email

        @classmethod
        def from_row(cls, row):  # cls is User (or a subclass)
            return cls(row["name"], row["email"])

        @staticmethod
        def is_valid_email(email):  # no self, no cls: just a utility
            return "@" in email

    row = {"name": "Sam", "email": "sam@x.ae"}
    u = User.from_row(row)            # build a User straight from a DB row
    User.is_valid_email("sam@x.ae")   # True, called on the class with no instance
The classic use of @classmethod is an alternative constructor: __init__ takes the "normal" arguments, while from_row, from_json, from_file etc. each build an instance from a different source and then call cls(...). Because it uses cls (not the hard-coded User), a subclass that calls SubUser.from_row(row) correctly gets a SubUser back.
A @staticmethod earns its place when a helper is logically part of the class but needs neither the instance nor the class — like is_valid_email above. It's a namespacing choice, nothing more.
PHP bridge: @staticmethod ≈ a PHP public static function called as User::isValidEmail(). @classmethod is the piece PHP lacks a clean equivalent for: a static-style method that still receives the class as cls, so it works polymorphically through static::-like late binding.
Interview question: "What's the difference between @classmethod and @staticmethod?" A classmethod receives the class as cls and is typically an alternative constructor (User.from_row(row)); a staticmethod receives nothing automatic and is just a namespaced utility. The tell of a strong answer is naming the alternative-constructor pattern, since it's the single most common real use.
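The subclass behaviour described above can be sketched in a few lines. AdminUser is a hypothetical subclass added only for illustration.

```python
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    @classmethod
    def from_row(cls, row):
        return cls(row["name"], row["email"])  # cls, not a hard-coded User


class AdminUser(User):  # hypothetical subclass, for illustration only
    pass


row = {"name": "Sam", "email": "sam@x.ae"}

print(type(User.from_row(row)).__name__)       # User
print(type(AdminUser.from_row(row)).__name__)  # AdminUser
```

Had from_row returned User(...) directly, the second call would have produced a plain User.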
"Dunder" methods (double-underscore, like __init__) let your objects plug into Python's built-in operators. You already met __repr__; here are the ones interviewers actually ask about.
By default a == b is identity — true only if they're the same object. Define __eq__ to compare by value instead. But there's a catch every interviewer loves: defining __eq__ sets __hash__ to None, which makes the object unhashable — you can no longer put it in a set or use it as a dict key. If you want both, define __hash__ too.
    class Tag:
        def __init__(self, name):
            self.name = name

        def __eq__(self, other):
            return isinstance(other, Tag) and self.name == other.name

        def __hash__(self):
            return hash(self.name)  # keep it hashable for sets / dict keys

    Tag("ai") == Tag("ai")       # True: compared by value
    len({Tag("ai"), Tag("ai")})  # 1: duplicates collapse, thanks to __hash__
@dataclass generates __eq__ automatically. By default it stays unhashable (matching the rule above); add @dataclass(frozen=True) and you also get a __hash__, because a frozen instance is immutable and safe to hash.
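Here is a small sketch of that difference. Tag and FrozenTag are illustrative names; only the frozen=True flag differs between them.

```python
from dataclasses import dataclass


@dataclass
class Tag:  # eq=True by default, so __hash__ is set to None
    name: str


@dataclass(frozen=True)
class FrozenTag:  # frozen + eq: a __hash__ is generated from the fields
    name: str


print(Tag("ai") == Tag("ai"))  # True

try:
    {Tag("ai")}
except TypeError as exc:
    print("unhashable:", exc)

unique = {FrozenTag("ai"), FrozenTag("ai")}
print(len(unique))  # 1
```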
Define __lt__ (less-than) and Python can sort() your objects and answer <. Rather than writing all four of __lt__, __le__, __gt__, __ge__ by hand, define __eq__ plus one ordering method and let functools.total_ordering fill in the rest.
    from functools import total_ordering

    @total_ordering
    class Version:
        def __init__(self, n):
            self.n = n

        def __eq__(self, other):
            if not isinstance(other, Version):
                return NotImplemented  # let Python handle other types
            return self.n == other.n

        def __lt__(self, other):
            if not isinstance(other, Version):
                return NotImplemented
            return self.n < other.n

    sorted([Version(3), Version(1), Version(2)])  # sorted ascending
    Version(1) >= Version(1)  # True: synthesised from __eq__ + __lt__
These look similar but serve different readers. __repr__ is for developers — unambiguous, debug-friendly, ideally something you could paste back into code. __str__ is for end users — friendly and readable. print(obj) and str(obj) use __str__; the REPL, containers, and repr(obj) use __repr__. If you define only one, define __repr__ — str() falls back to it.
    class Money:
        def __init__(self, dirhams):
            self.dirhams = dirhams

        def __repr__(self):
            return f"Money(dirhams={self.dirhams!r})"  # for devs / debugging

        def __str__(self):
            return f"AED {self.dirhams:.2f}"  # for users

    m = Money(42)
    print(m)  # AED 42.00 (uses __str__)
    m         # Money(dirhams=42) (REPL uses __repr__)
    repr(m)   # "Money(dirhams=42)"
A strong answer: "__repr__ is for developers: unambiguous and ideally eval-able, shown in the REPL, debugger, and inside containers. __str__ is the friendly, user-facing form used by print() and str(). If I write only one I write __repr__, because str() falls back to it but never the other way around." That last sentence is what separates a memorised answer from an understood one.
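The one-way fallback can be checked directly. OnlyRepr and OnlyStr are hypothetical classes that each define a single method.

```python
class OnlyRepr:  # hypothetical class that defines __repr__ only
    def __repr__(self):
        return "OnlyRepr()"


class OnlyStr:  # hypothetical class that defines __str__ only
    def __str__(self):
        return "friendly text"


print(str(OnlyRepr()))  # OnlyRepr()  (str() fell back to __repr__)
print(str(OnlyStr()))   # friendly text

# repr() does NOT fall back to __str__; it uses the default object repr.
print(repr(OnlyStr()).startswith("<"))  # True
```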
A few more dunders make your object behave like a built-in collection — Python never checks its type, only whether it supports the right methods. That's duck typing: "if it has __len__ and __getitem__, it walks like a list." Define them and len(), indexing, iteration, and in all start working on your object.
    class Library:
        def __init__(self, docs):
            self.docs = docs

        def __len__(self):  # powers len(lib)
            return len(self.docs)

        def __getitem__(self, i):  # powers lib[0] AND iteration
            return self.docs[i]

        def __contains__(self, title):  # powers the `in` operator
            return any(d == title for d in self.docs)

    lib = Library(["intro", "setup", "deploy"])
    len(lib)           # 3, via __len__
    lib[0]             # "intro", via __getitem__
    "setup" in lib     # True, via __contains__
    for d in lib: ...  # works! Python falls back to __getitem__
Note the bonus: with __getitem__ alone Python can iterate by calling it with 0, 1, 2… until IndexError. Implement __contains__ only if you want in to be smarter or faster than that default scan.
PHP bridge: __len__ ≈ Countable::count, __getitem__ ≈ ArrayAccess::offsetGet, iteration ≈ IteratorAggregate. Python just uses dunder methods instead of explicit implements declarations, which is the duck-typing philosophy.
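To see how far __getitem__ alone gets you, here is a minimal sketch. Shelf is a hypothetical class that defines nothing else.

```python
class Shelf:  # hypothetical class with __getitem__ and nothing else
    def __init__(self, docs):
        self.docs = docs

    def __getitem__(self, i):
        return self.docs[i]  # raises IndexError past the end, which stops iteration


shelf = Shelf(["intro", "setup", "deploy"])

print(list(shelf))         # ['intro', 'setup', 'deploy']
print("setup" in shelf)    # True: a linear scan through __getitem__
print("billing" in shelf)  # False
```

Note that len(shelf) would still fail here; that one needs __len__.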
Build a Document class with a word_count() method, a summary() method, and a clean __repr__, then split it into its own module with a __main__ guard. This is the literal seed of DocChat's document model.
Try it yourself first, then compare. A clean version:
document.py
    class Document:
        """A single uploaded document in DocChat."""

        def __init__(self, title: str, body: str):
            self.title = title
            self.body = body

        def word_count(self) -> int:
            return len(self.body.split())

        def summary(self, limit: int = 8) -> str:
            words = self.body.split()
            return " ".join(words[:limit]) + ("…" if len(words) > limit else "")

        def __repr__(self) -> str:
            return f"Document(title={self.title!r}, words={self.word_count()})"


    def main():
        doc = Document("Welcome", "the cat sat on the mat and then the cat ran away")
        print(doc)               # Document(title='Welcome', words=12)
        print(doc.word_count())  # 12
        print(doc.summary())     # the cat sat on the mat and then…


    if __name__ == "__main__":
        main()
Run it with python document.py. Then in another file, from document import Document — note the main() demo does not fire on import, thanks to the guard. That separation is what makes code reusable.
Answer from memory — retrieval is what moves this from "I read it" to "I know it".
What does self refer to inside a method?
Why is def f(x, items=[]): a known trap?
What collects extra keyword arguments into a dict?
What does @dataclass save you from writing?
When is a file's __name__ set to "__main__"?
Which method is the typical alternative constructor?
How should you describe __repr__ against __str__?
dataclasses module docs.