Interview Bank · 2026
The round where they stop testing syntax and start testing judgement: can you reason about a system out loud, tell your story, and read the UAE market. Don't recite — think aloud, structure it, and keep pointing at DocChat.
This is exactly your DocChat. Don't list parts — narrate how you'd reason it out. Structure beats completeness.
"Before I design, let me scope it." Out loud, ask: who uses it (one team or public?), scale (10 docs or 10 million?), what 'done' means (just Q&A, or auth, history, citations?), and constraints (latency budget, cost ceiling). For DocChat I'd say: "Authenticated users upload PDFs and ask questions; answers must cite sources; moderate scale — hundreds of users. I'll optimise for correct, grounded answers over raw throughput." Stating assumptions out loud is the single biggest signal in this round — it shows you design for a real problem, not a generic one.
"Here's the high level, then I'll go deeper where it matters." Five boxes: a frontend (Next.js — upload UI + chat), an API (FastAPI — auth, orchestration), a database (Postgres — users, documents, chat history), a vector store (pgvector in that same Postgres — embeddings for retrieval), and the LLM (Claude, for the generation step). I'd note the deliberate choice: "I keep vectors in Postgres via pgvector rather than a separate vector DB — one less system to run, and it's plenty at this scale." Naming the trade-off as you draw shows seniority.
"Let me follow a document through the system." Upload: user sends a PDF; API stores the file and a row in Postgres. Ingest: extract text → chunk it → embed each chunk → write vectors to pgvector. This is slow, so I'd push it to a background job / queue and return immediately — the user doesn't wait. Ask: a question comes in → embed the question → vector-search for the top-k nearest chunks → stuff them into the prompt as context → call the LLM → stream the grounded answer back with citations. Walking one request end-to-end proves you understand the system as a living thing, not a static diagram.
"If traffic grows, here's what I'd reach for, in order." Make the API stateless (JWT in the request, no in-memory sessions) so I can run many instances behind a load balancer and scale out. Add connection pooling (e.g. PgBouncer) so a burst of instances doesn't exhaust Postgres connections. Put a proper vector index (HNSW in pgvector) on the embeddings so search stays fast as documents grow. Cache repeated/identical questions and their answers to cut LLM cost and latency. Add rate limiting per user to protect cost. Move ingestion fully onto a queue so spikes don't block the API. The discipline that impresses: tie each lever to the bottleneck it relieves.
This is where you sound senior. Chunk size: small chunks = precise retrieval but lost context; large chunks = more context but noisier, pricier prompts — I'd start ~500 tokens with overlap and tune on real questions. Cost vs latency: a bigger model answers better but costs more and is slower; I might route easy questions to a cheaper model and hard ones to a stronger one. Build vs buy: pgvector (build, full control, one system) vs a managed vector DB like Pinecone (buy, less ops, more cost/lock-in) — at this scale I build; at massive scale I'd reconsider. Closing with "here's what I'd watch and revisit" beats pretending a design is final.
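The chunk-size trade-off is easier to discuss with the mechanics in view. A minimal sketch of fixed-size chunking with overlap, assuming the text is already split into tokens (here plain list items; a real pipeline would count tokens with the model's tokenizer). The function name and the default overlap of 50 are illustrative assumptions.

```python
def chunk_tokens(tokens: list, size: int = 500, overlap: int = 50) -> list:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Small numbers so the overlap is visible: windows of 4 tokens sharing 1 token.
example = chunk_tokens([f"t{i}" for i in range(10)], size=4, overlap=1)
```

Shrinking `size` gives more, tighter chunks; raising `overlap` reduces the chance that a sentence is cut in half at a boundary, at the cost of storing and embedding more tokens.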
Vertical (scale up) = a bigger machine: simplest, but there's a ceiling and a single point of failure. Horizontal (scale out) = more machines behind a load balancer: near-limitless and fault-tolerant, but only works if the layer is stateless. Rule of thumb: scale up until it's awkward or risky, then scale out. Stateful pieces (the database) are the hard ones to scale horizontally — that's why we reach for read replicas and pooling.
You can cache at many levels: the browser/CDN (static assets), the app (computed results in Redis), and the DB (query cache). Caching trades freshness for speed, so the hard part is invalidation: knowing when cached data is stale and evicting it. As Phil Karlton's line goes, "There are only two hard things in Computer Science: cache invalidation and naming things." For DocChat, caching identical question→answer pairs is cheap and safe as long as the underlying documents haven't changed; caching document content needs a clear bust-on-update rule.
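A bust-on-update rule can be as simple as putting a document version into the cache key. The sketch below is one way to do that, under assumptions: the `AnswerCache` class and its method names are hypothetical, and a plain dict stands in for Redis. Bumping the version makes old entries unreachable rather than deleting them, so a real cache would also want a TTL to reclaim the space.

```python
import hashlib


class AnswerCache:
    """Question -> answer cache whose keys include the document's current version."""

    def __init__(self):
        self._store = {}        # cache key -> cached answer
        self._doc_version = {}  # doc_id -> version counter

    def _key(self, doc_id: str, question: str) -> str:
        version = self._doc_version.get(doc_id, 0)
        # Normalise case and whitespace so trivially different phrasings share a key.
        normalized = " ".join(question.lower().split())
        raw = f"{doc_id}:{version}:{normalized}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, doc_id: str, question: str):
        return self._store.get(self._key(doc_id, question))

    def put(self, doc_id: str, question: str, answer: str) -> None:
        self._store[self._key(doc_id, question)] = answer

    def document_updated(self, doc_id: str) -> None:
        """Bust every cached answer for this document by bumping its version."""
        self._doc_version[doc_id] = self._doc_version.get(doc_id, 0) + 1
```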
Latency = how long one request takes (a single answer feeling snappy). Throughput = how many requests you handle per second overall. They're different and can trade off: batching work raises throughput but adds latency to each item. Know which the question is really about — a chat answer is a latency problem (stream it so it feels instant); a nightly re-ingest of 100k docs is a throughput problem.
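The batching trade-off is simple arithmetic. The sketch below assumes a toy cost model (a fixed overhead per call plus a per-item cost) and made-up timings; neither comes from a real measurement.

```python
def batch_stats(batch_size: int, overhead_s: float, per_item_s: float):
    """Return (latency per item in seconds, throughput in items per second) for one batch.

    Toy model: every call pays a fixed overhead, then a cost per item, and each
    item is only finished when its whole batch is finished.
    """
    batch_time = overhead_s + batch_size * per_item_s
    throughput = batch_size / batch_time
    return batch_time, throughput


# Illustrative numbers: 100 ms fixed overhead, 10 ms per item.
latency_single, throughput_single = batch_stats(1, overhead_s=0.1, per_item_s=0.01)
latency_batched, throughput_batched = batch_stats(50, overhead_s=0.1, per_item_s=0.01)
```

With these assumed numbers, a batch of 50 finishes far more items per second than 50 single calls would, but every item in it waits for the whole batch. That is fine for a nightly re-ingest and wrong for a chat answer.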
The primary handles writes and copies changes to one or more read replicas; reads are spread across the replicas. This scales read-heavy workloads (most apps) and gives you a failover target. The catch is replication lag: a replica can be milliseconds (or, under load, seconds) behind, so a user might not instantly see their own write. So route ordinary reads to replicas, but send reads-after-writes (like "show me what I just saved") to the primary.
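One common way to handle reads-after-writes is to pin a user's reads to the primary for a short window after they write. A minimal sketch, with assumptions stated: the `ReadRouter` class, the one-second window, and the in-memory dict are all illustrative. In a multi-instance deployment the "last write" marker would travel in a cookie or a shared store instead.

```python
import time


class ReadRouter:
    """Send a user's reads to the primary for a short window after they write."""

    def __init__(self, pin_seconds: float = 1.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # should comfortably exceed typical replication lag
        self.clock = clock              # injectable so the routing can be tested without sleeping
        self._last_write = {}           # user_id -> timestamp of their latest write

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"  # the replica may not have this user's write yet
        return "replica"
```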
Naïve vector search compares the query to every vector — fine for thousands, fatal for millions. The fix is an approximate nearest neighbour (ANN) index like HNSW (a navigable graph) or IVFFlat — trading a sliver of recall for huge speed. In pgvector you build an HNSW index on the embedding column. Beyond that: shard by tenant/namespace so each search hits a smaller set, and pre-filter by metadata (e.g. only this user's documents) before the vector search.
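In pgvector terms this comes down to two statements, shown here as Python constants you might pass to a driver such as psycopg. The table name `chunks`, the columns `embedding`, `user_id`, and `content`, and the index name are hypothetical. `hnsw`, `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector's own. How the `WHERE` filter interacts with the approximate index depends on the pgvector version and the query planner, so treat this as a sketch to verify against your setup.

```python
# Build an HNSW index for cosine distance on the (hypothetical) chunks table.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Restrict to one user's chunks, then order by cosine distance to the query vector.
SEARCH_SQL = """
SELECT id, content
FROM chunks
WHERE user_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: list) -> str:
    """Format a list of floats in pgvector's text form, e.g. '[0.1,2.0,-3.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def search_params(user_id: int, query_embedding: list, k: int = 5) -> tuple:
    """Parameters for SEARCH_SQL, in placeholder order."""
    return (user_id, to_vector_literal(query_embedding), k)
```

With a psycopg cursor this would be used roughly as `cur.execute(SEARCH_SQL, search_params(42, question_embedding))`.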
If a server keeps session state in its own memory, requests are stuck to that server (sticky sessions) — you can't freely add or kill instances, and one crash loses sessions. Make each request self-contained (auth via JWT, shared state pushed to Postgres/Redis) and suddenly any instance can serve any request. That's the precondition for horizontal scaling, zero-downtime deploys, and autoscaling. It's the quiet reason your FastAPI app scales at all.
STAR = Situation, Task, Action, Result. Keep DocChat as your running example — a real shipped project makes every answer concrete.
Not STAR: this is a 60-second past → present → future arc. Past: "I'm a developer with a web background in PHP and jQuery." Present: "Over the last few weeks I retrained on the modern stack (Python, FastAPI, Postgres, Next.js) and built DocChat, a deployed document Q&A app with a RAG pipeline." Future: "Now I'm looking for a full-stack role in the UAE where I can ship features like that." End by pointing at the project, not your CV. Keep it tight: they're testing whether you can be concise and signal direction.
S: "In DocChat, answers sometimes ignored the right document." T: "I had to find why retrieval was missing obvious passages." A: "I logged the retrieved chunks, saw my chunks were too large and diluting the match, reduced chunk size and added overlap, and verified with a set of known question→source pairs." R: "Retrieval accuracy jumped and answers started citing the correct passage." The trick is to show method — reproduce, isolate, fix, verify — not heroics. Interviewers want a debugging process they'd trust on their codebase.
S: a teammate wanted a separate vector database; I argued for pgvector. T: agree on an approach without it turning personal. A: "I asked what problem the separate DB solved, shared our actual scale numbers, and proposed we start with pgvector and revisit if we hit limits." R: "We shipped faster on one system and kept the door open." The point isn't winning — it's showing you listen, use data, and disagree-and-commit. Never badmouth the other person.
Frame it as deliberate, not random. S/T: "My web fundamentals were strong but my stack was dated for the UAE market." A: "I picked exactly what employers here hire for (Python/FastAPI, Postgres, React/Next.js) and proved it by building and deploying DocChat end-to-end, including an AI feature." R: "I moved onto an in-demand stack, with something shippable to show for it." Motivation + evidence beats either alone.
Give a real one with a correction in motion — not a humblebrag. "Coming from PHP, I'd reach for synchronous patterns; FastAPI's async model took deliberate effort. So when building DocChat I forced myself to learn where async actually helps — I/O-bound calls like the LLM request — and where it doesn't." That's S→A→R: honest gap, concrete action, you're already past it. Avoid "I work too hard" — interviewers have heard it a thousand times and it reads as evasive.
Show ambition that fits the role, not a five-year manifesto. "I want to go deep as a full-stack engineer — own features end-to-end, get strong at the AI side since that's where I already have an edge, and grow toward mentoring or leading on technical decisions." Tie it back: "Shipping DocChat solo showed me I like owning the whole slice — that's the direction I want to keep going." Keep it grounded in this job, not a different one.
Referrals. In the UAE market, a referral from someone inside a company beats a cold application many times over — it gets your CV actually read. So the real job before the job is building a small network: engage where UAE developers are, do a little visible work, and ask for intros. Cold-applying still happens, but treat referrals as your primary channel, not a backup.
Real places to start: r/dubai for local job-market reality and leads; Dubai developer meetups on Meetup for in-person networking that turns into referrals; and LinkedIn "Tech in UAE" groups for postings and people to connect with. Show up consistently, be useful, and ask genuine questions — networking that leads to referrals is a relationship, not a single message.
Research first: know the realistic band for your level and stack in the UAE before the call. Then give a range (not a single number that might undersell you), framed as AED per month (the local convention: salaries are quoted monthly and are typically tax-free). Something like: "Based on what I've seen for full-stack roles at my level, I'd expect somewhere in the X–Y AED/month range, but I'm flexible depending on the full package." Anchor with data, stay open, and let them counter.
Be concise — answer the question asked, then stop; rambling reads as uncertainty. And when you hit something you don't know, say so gracefully: "I don't know that off the top of my head — here's how I'd find out." Then describe your approach (check the docs, test it, reason from first principles). That honesty plus a method beats a confident wrong answer every time, and it's exactly how you'd behave on the job.
Expect direct questions about Copilot, Claude, and similar in your workflow. The strong answer is responsible use: "I use them to move faster — scaffolding, boilerplate, rubber-ducking a bug — but I read and understand every line, I don't ship what I can't explain, and I never paste secrets or proprietary code into them." That signals you're productive and trustworthy. Saying "I don't use AI" reads as out of touch in 2026; saying "I let it write everything" reads as risky. The middle is the win.
People who can ship — take something from idea to deployed and working — and who can add AI features rather than just talk about them. That's precisely your edge: DocChat is a real, deployed, end-to-end app with a working RAG pipeline. In a market full of candidates reciting theory, "I built and shipped this, here's the live link, let me walk you through the retrieval flow" is rare and exactly what fast-moving teams want.