Interview Bank · 2026

🏗 System Design & Behavioural

The round where they stop testing syntax and start testing judgement: can you reason about a system out loud, tell your story, and read the UAE market. Don't recite — think aloud, structure it, and keep pointing at DocChat.

Design building blocks

Name each block in one breath, then check the definition.

Load balancer
Spreads incoming requests across many server instances — distributes load and removes single points of failure.
Cache
A fast, temporary store (memory/Redis) for hot data so you skip the slow source — trades freshness for speed.
Message queue
A buffer between producer and consumer — decouples work so slow tasks run async (e.g. ingest a document later).
CDN
Edge servers that cache static assets near users — cuts latency by serving from the closest location.
Read replica
A copy of the DB that takes read traffic, letting the primary focus on writes — scales reads horizontally.
Rate limiting
Caps how many requests a client can make in a window — protects the system from abuse and runaway cost.
Horizontal scaling
Add more machines (scale out), not a bigger one — the cloud-native way to grow, behind a load balancer.
Statelessness
Each request carries everything it needs; the server stores no session in memory — so any instance can serve any request.

Design walk-through: a document Q&A app · the centrepiece, think aloud

This is exactly your DocChat. Don't list parts — narrate how you'd reason it out. Structure beats completeness.

1 · Clarify requirements before drawing anything

"Before I design, let me scope it." Out loud, ask: who uses it (one team or public?), scale (10 docs or 10 million?), what 'done' means (just Q&A, or auth, history, citations?), and constraints (latency budget, cost ceiling). For DocChat I'd say: "Authenticated users upload PDFs and ask questions; answers must cite sources; moderate scale — hundreds of users. I'll optimise for correct, grounded answers over raw throughput." Stating assumptions out loud is the single biggest signal in this round — it shows you design for a real problem, not a generic one.

2 · Lay out the core components

"Here's the high level, then I'll go deeper where it matters." Five boxes: a frontend (Next.js — upload UI + chat), an API (FastAPI — auth, orchestration), a database (Postgres — users, documents, chat history), a vector store (pgvector in that same Postgres — embeddings for retrieval), and the LLM (Claude, for the generation step). I'd note the deliberate choice: "I keep vectors in Postgres via pgvector rather than a separate vector DB — one less system to run, and it's plenty at this scale." Naming the trade-off as you draw shows seniority.

3 · Trace the data flow: upload → ingest → ask

"Let me follow a document through the system." Upload: user sends a PDF; API stores the file and a row in Postgres. Ingest: extract text → chunk it → embed each chunk → write vectors to pgvector. This is slow, so I'd push it to a background job / queue and return immediately — the user doesn't wait. Ask: a question comes in → embed the question → vector-search for the top-k nearest chunks → stuff them into the prompt as context → call the LLM → stream the grounded answer back with citations. Walking one request end-to-end proves you understand the system as a living thing, not a static diagram.
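
The ask path above can be sketched in a few lines. This is a minimal illustration, not DocChat's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, and `top_k` and `build_prompt` are hypothetical helper names. The LLM call itself is left out.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[int, str]]:
    """Return the k chunks most similar to the question, with their indices for citation."""
    q = embed(question)
    ranked = sorted(enumerate(chunks), key=lambda pair: cosine(q, embed(pair[1])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, hits: list[tuple[int, str]]) -> str:
    """Put the retrieved chunks into the prompt as numbered context."""
    context = "\n".join(f"[{i}] {text}" for i, text in hits)
    return (
        "Answer using only the context and cite chunk numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is open Sunday to Thursday.",
    "Shipping takes three to five business days.",
]
hits = top_k("When are refunds issued?", chunks, k=1)
prompt = build_prompt("When are refunds issued?", hits)
print(prompt)
```

In a real pipeline the chunk vectors would be computed once at ingest time and stored, so only the question is embedded per request.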

4 · Now scale it — and say why each move helps

"If traffic grows, here's what I'd reach for, in order." Make the API stateless (JWT in the request, no in-memory sessions) so I can run many instances behind a load balancer and scale out. Add connection pooling (e.g. PgBouncer) so a burst of instances doesn't exhaust Postgres connections. Put a proper vector index (HNSW in pgvector) on the embeddings so search stays fast as documents grow. Cache repeated/identical questions and their answers to cut LLM cost and latency. Add rate limiting per user to protect cost. Move ingestion fully onto a queue so spikes don't block the API. The discipline that impresses: tie each lever to the bottleneck it relieves.
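
As one example of a lever from that list, per-user rate limiting could look roughly like this. It is a sketch under assumptions: a fixed-window counter (one of several possible algorithms), a hypothetical `FixedWindowLimiter` class, and counters held in process memory for brevity.

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per user in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._state = {}    # user_id -> (window_start, request_count)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        start, count = self._state.get(user_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the old window has expired, start a new one
        if count >= self.limit:
            self._state[user_id] = (start, count)
            return False
        self._state[user_id] = (start, count + 1)
        return True

fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
results = [limiter.allow("alice") for _ in range(3)]  # third call is over the limit
fake_now[0] = 61.0
after_reset = limiter.allow("alice")                  # a new window has started
```

With several API instances behind a load balancer, the counters would need to live in a shared store such as Redis rather than in one process, otherwise each instance enforces its own separate limit.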

5 · Name the trade-offs honestly

This is where you sound senior. Chunk size: small chunks = precise retrieval but lost context; large chunks = more context but noisier, pricier prompts — I'd start ~500 tokens with overlap and tune on real questions. Cost vs latency: a bigger model answers better but costs more and is slower; I might route easy questions to a cheaper model and hard ones to a stronger one. Build vs buy: pgvector (build, full control, one system) vs a managed vector DB like Pinecone (buy, less ops, more cost/lock-in) — at this scale I build; at massive scale I'd reconsider. Closing with "here's what I'd watch and revisit" beats pretending a design is final.
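
To make the chunk-size trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_tokens` and the default overlap of 50 are assumptions for illustration, and whitespace-split words stand in for real model tokens.

```python
def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks

words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last token of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.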

Theory deep-cuts · the "why"

Vertical vs horizontal scaling — when each? theory

Vertical (scale up) = a bigger machine: simplest, but there's a ceiling and a single point of failure. Horizontal (scale out) = more machines behind a load balancer: near-limitless and fault-tolerant, but only works if the layer is stateless. Rule of thumb: scale up until it's awkward or risky, then scale out. Stateful pieces (the database) are the hard ones to scale horizontally — that's why we reach for read replicas and pooling.

What are caching layers, and what's the catch? theory

You can cache at many levels: the browser/CDN (static assets), the app (computed results in Redis), and the DB (query cache). Caching trades freshness for speed, so the hard part is invalidation: knowing when cached data is stale and evicting it. As Phil Karlton's line goes, "There are only two hard things in Computer Science: cache invalidation and naming things." For DocChat, caching identical question→answer pairs is cheap and safe as long as the underlying documents haven't changed; caching document content needs a clear bust-on-update rule.
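
One way to get a bust-on-update rule for cached answers is to put a document version into the cache key, so an edited document simply stops matching old entries. The sketch below assumes a hypothetical `AnswerCache` class and an integer `doc_version` that the app bumps on every re-ingest; neither comes from DocChat itself.

```python
import hashlib

class AnswerCache:
    """Cache answers keyed by document, document version, and normalized question."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(doc_id: str, doc_version: int, question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share an entry.
        normalized = " ".join(question.lower().split())
        raw = f"{doc_id}:{doc_version}:{normalized}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, doc_id: str, doc_version: int, question: str):
        return self._store.get(self._key(doc_id, doc_version, question))

    def put(self, doc_id: str, doc_version: int, question: str, answer: str) -> None:
        self._store[self._key(doc_id, doc_version, question)] = answer

cache = AnswerCache()
cache.put("doc-1", 1, "What is the refund window?", "14 days")
hit = cache.get("doc-1", 1, "  what is the REFUND window? ")  # same question, same version
miss = cache.get("doc-1", 2, "What is the refund window?")    # document was re-ingested
```

Entries for old versions are never read again but still take up space, so a production cache would also want a TTL or an eviction policy.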

Latency vs throughput — don't conflate them. theory

Latency = how long one request takes (a single answer feeling snappy). Throughput = how many requests you handle per second overall. They're different and can trade off: batching work raises throughput but adds latency to each item. Know which the question is really about — a chat answer is a latency problem (stream it so it feels instant); a nightly re-ingest of 100k docs is a throughput problem.
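
The batching trade-off is easy to see with back-of-the-envelope numbers. The cost model below is invented for illustration: every call pays a fixed 100 ms overhead plus 10 ms per item, and an item is only done when its whole batch is done.

```python
def batch_stats(batch_size: int, overhead_ms: float = 100.0, per_item_ms: float = 10.0) -> tuple[float, float]:
    """Return (time for one batch in ms, throughput in items per second)."""
    batch_ms = overhead_ms + per_item_ms * batch_size
    throughput = batch_size / (batch_ms / 1000.0)
    return batch_ms, throughput

for size in (1, 10, 50):
    latency_ms, items_per_s = batch_stats(size)
    print(f"batch={size:>2}  latency={latency_ms:.0f} ms  throughput={items_per_s:.1f} items/s")
```

Under these made-up costs, going from a batch of 1 to a batch of 50 raises throughput from about 9 to about 83 items per second, while each item's wait grows from 110 ms to 600 ms.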

DB replication and read replicas — how do they help? theory

The primary handles writes and copies changes to one or more read replicas; reads are spread across the replicas. This scales read-heavy workloads (most apps) and gives you a standby to promote if the primary fails. The catch is replication lag: with asynchronous replication a replica can be milliseconds behind (more under heavy load), so a user might not instantly see their own write. So you route ordinary reads to replicas, but send reads-after-writes (like "show me what I just saved") to the primary.
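
A simple way to implement that routing rule is to pin a user's reads to the primary for a short time after they write. This is a sketch, assuming a hypothetical `ReadRouter` class and a pin duration of 2 seconds chosen arbitrarily; a real value should sit comfortably above the observed replication lag.

```python
import time

class ReadRouter:
    """Send a user's reads to the primary for `pin_seconds` after their last write."""

    def __init__(self, pin_seconds: float = 2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock      # injectable so the behaviour can be tested without sleeping
        self._last_write = {}   # user_id -> time of that user's most recent write

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"    # the replica may not have this user's write yet
        return "replica"

now = [0.0]
router = ReadRouter(pin_seconds=2.0, clock=lambda: now[0])
before = router.target_for_read("alice")       # no recent write
router.record_write("alice")
right_after = router.target_for_read("alice")  # inside the pin window
now[0] = 5.0
later = router.target_for_read("alice")        # pin window has passed
```

The last-write timestamps here live in process memory; to keep the API stateless they could instead travel in a cookie or sit in a shared store.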

How do you scale vector search? theory

Naïve vector search compares the query to every vector — fine for thousands, fatal for millions. The fix is an approximate nearest neighbour (ANN) index like HNSW (a navigable graph) or IVFFlat — trading a sliver of recall for huge speed. In pgvector you build an HNSW index on the embedding column. Beyond that: shard by tenant/namespace so each search hits a smaller set, and pre-filter by metadata (e.g. only this user's documents) before the vector search.
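
The sharding idea can be shown independently of any ANN index. The sketch below keeps one list of vectors per tenant and searches only that list, so the number of comparisons depends on the tenant's data rather than the whole corpus. `ShardedIndex` is a hypothetical name, and the search inside a shard is still exact brute force; an index such as HNSW would replace that inner scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ShardedIndex:
    """One list of (chunk_id, vector) per tenant; a search only touches its own shard."""

    def __init__(self):
        self._shards = {}

    def add(self, tenant_id: str, chunk_id: str, vector: list[float]) -> None:
        self._shards.setdefault(tenant_id, []).append((chunk_id, vector))

    def search(self, tenant_id: str, query: list[float], k: int = 1) -> tuple[list[str], int]:
        """Return the top-k chunk ids and how many vectors were compared."""
        candidates = self._shards.get(tenant_id, [])
        ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:k]], len(candidates)

index = ShardedIndex()
index.add("alice", "a1", [1.0, 0.0])
index.add("alice", "a2", [0.0, 1.0])
index.add("bob", "b1", [1.0, 0.0])
ids, compared = index.search("alice", [0.9, 0.1], k=1)
```

Bob's vector is never scored when Alice searches, which is also a useful isolation property: one tenant's documents cannot leak into another tenant's results.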

Statelessness — why does it unlock scaling? theory

If a server keeps session state in its own memory, requests are stuck to that server (sticky sessions) — you can't freely add or kill instances, and one crash loses sessions. Make each request self-contained (auth via JWT, shared state pushed to Postgres/Redis) and suddenly any instance can serve any request. That's the precondition for horizontal scaling, zero-downtime deploys, and autoscaling. It's the quiet reason your FastAPI app scales at all.
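
The core of the idea is that the request carries a signed claim any instance can verify with a shared secret, without looking anything up in its own memory. The sketch below is a simplified stand-in for a JWT, not a JWT-compliant implementation: it has no expiry or header, and the secret and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-change-me"  # hypothetical; in practice loaded from configuration

def issue_token(user_id: str, secret: bytes = SECRET) -> str:
    """Return 'payload.signature', where the signature is an HMAC-SHA256 of the payload."""
    payload = base64.urlsafe_b64encode(json.dumps({"sub": user_id}).encode("utf-8"))
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode('ascii')}.{signature}"

def verify_token(token: str, secret: bytes = SECRET):
    """Return the user id if the signature checks out, else None. Needs no server-side session."""
    try:
        payload_b64, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no separator, so this is not one of our tokens
    expected = hmac.new(secret, payload_b64.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_b64))["sub"]

token = issue_token("alice")
user = verify_token(token)                   # any instance holding SECRET can do this
tampered = verify_token(token[:-1] + ("0" if token[-1] != "0" else "1"))
```

Because verification depends only on the token and the shared secret, an instance that started a second ago can serve the request just as well as one that has been up for days. In a real app an established JWT library would be the safer choice.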

Behavioural · answer with STAR structure, then story

STAR = Situation, Task, Action, Result. Keep DocChat as your running example — a real shipped project makes every answer concrete.

"Tell me about yourself."

Not STAR: this is a 60-second past → present → future arc. Past: "I'm a developer with a web background in PHP and jQuery." Present: "Over the last few weeks I retrained on the modern stack (Python, FastAPI, Postgres, Next.js) and built DocChat, a deployed document Q&A app with a RAG pipeline." Future: "Now I'm looking for a full-stack role in the UAE where I can ship features like that." End by pointing at the project, not your CV. Keep it tight: they're testing whether you can be concise and signal direction.

"Tell me about a hard bug you solved." tricky

S: "In DocChat, answers sometimes ignored the right document." T: "I had to find why retrieval was missing obvious passages." A: "I logged the retrieved chunks, saw my chunks were too large and diluting the match, reduced chunk size and added overlap, and verified with a set of known question→source pairs." R: "Retrieval accuracy jumped and answers started citing the correct passage." The trick is to show method — reproduce, isolate, fix, verify — not heroics. Interviewers want a debugging process they'd trust on their codebase.

"Tell me about a disagreement with a teammate."

S: a teammate wanted a separate vector database; I argued for pgvector. T: agree on an approach without it turning personal. A: "I asked what problem the separate DB solved, shared our actual scale numbers, and proposed we start with pgvector and revisit if we hit limits." R: "We shipped faster on one system and kept the door open." The point isn't winning — it's showing you listen, use data, and disagree-and-commit. Never badmouth the other person.

"Why this stack / why the career switch?"

Frame it as deliberate, not random. S/T: "My web fundamentals were strong but my stack was dated for the UAE market." A: "I picked exactly what employers here hire for — Python/FastAPI, Postgres, React/Next.js — and proved it by building and deploying DocChat end-to-end, including an AI feature." R: "I switched into demand with something shippable to show for it." Motivation + evidence beats either alone.

"What's your biggest weakness?" tricky

Give a real one with a correction in motion — not a humblebrag. "Coming from PHP, I'd reach for synchronous patterns; FastAPI's async model took deliberate effort. So when building DocChat I forced myself to learn where async actually helps — I/O-bound calls like the LLM request — and where it doesn't." That's S→A→R: honest gap, concrete action, you're already past it. Avoid "I work too hard" — interviewers have heard it a thousand times and it reads as evasive.

"Where do you see yourself in a few years?"

Show ambition that fits the role, not a five-year manifesto. "I want to go deep as a full-stack engineer — own features end-to-end, get strong at the AI side since that's where I already have an edge, and grow toward mentoring or leading on technical decisions." Tie it back: "Shipping DocChat solo showed me I like owning the whole slice — that's the direction I want to keep going." Keep it grounded in this job, not a different one.

UAE specifics & applying · the local edge

What's the most effective way to actually get hired here? new

Referrals. In the UAE market, a referral from someone inside a company beats a cold application many times over — it gets your CV actually read. So the real job before the job is building a small network: engage where UAE developers are, do a little visible work, and ask for intros. Cold-applying still happens, but treat referrals as your primary channel, not a backup.

Where do I engage the UAE tech community? new

Real places to start: r/dubai for local job-market reality and leads; Dubai developer meetups on Meetup for in-person networking that turns into referrals; and LinkedIn "Tech in UAE" groups for postings and people to connect with. Show up consistently, be useful, and ask genuine questions — networking that leads to referrals is a relationship, not a single message.

How do I handle "what are your salary expectations?" tricky

Research first — know the realistic band for your level and stack in the UAE before the call. Then give a range (not a single number you might undersell yourself with), framed as AED per month (the local convention — salaries are quoted monthly, often tax-free). Something like: "Based on what I've seen for full-stack roles at my level, I'd expect somewhere in the X–Y AED/month range, but I'm flexible depending on the full package." Anchor with data, stay open, and let them counter.

How do I come across well in the interview itself? new

Be concise — answer the question asked, then stop; rambling reads as uncertainty. And when you hit something you don't know, say so gracefully: "I don't know that off the top of my head — here's how I'd find out." Then describe your approach (check the docs, test it, reason from first principles). That honesty plus a method beats a confident wrong answer every time, and it's exactly how you'd behave on the job.

What's new in 2026 · say this and stand out

They'll ask how you use AI tools — what's the right answer? 2026

Expect direct questions about Copilot, Claude, and similar in your workflow. The strong answer is responsible use: "I use them to move faster — scaffolding, boilerplate, rubber-ducking a bug — but I read and understand every line, I don't ship what I can't explain, and I never paste secrets or proprietary code into them." That signals you're productive and trustworthy. Saying "I don't use AI" reads as out of touch in 2026; saying "I let it write everything" reads as risky. The middle is the win.

What do UAE startups actually prize right now? 2026

People who can ship — take something from idea to deployed and working — and who can add AI features rather than just talk about them. That's precisely your edge: DocChat is a real, deployed, end-to-end app with a working RAG pipeline. In a market full of candidates reciting theory, "I built and shipped this, here's the live link, let me walk you through the retrieval flow" is rare and exactly what fast-moving teams want.

Memory hooks

STAR = Situation, Task, Action, Result. Set the scene, your job, what you did, the outcome, in that order.
Design order: Requirements → Components → Data flow → Scale → Trade-offs. Always clarify before you draw; always end on trade-offs.

Tie it to DocChat

Your DocChat capstone is the answer to most of these: the system to design, the hard bug, the disagreement, the AI feature, the shipped thing. A real, deployed, end-to-end project beats rehearsed theory every single time. When in doubt, point at it.