System Design Interview Complete Guide
A complete, practical guide for system design interviews — frameworks, estimation, scalability patterns, and walkthroughs.
A self-contained reference for FAANG-style and senior engineering system design interviews
How to Use This Guide
This document is designed as your single source of truth for system design interview preparation. Read it linearly once for orientation, then use the sidebar table of contents to drill into weak areas. Each part builds on prior concepts: fundamentals (Parts 1–3), building blocks (Parts 4–20), patterns, storage, and domain designs (Parts 21–26), full designs (Part 27), and interview execution (Parts 28–33).
- First pass (2 weeks): Parts 0–3, 28–30. Skim Part 27 walkthrough titles.
- Second pass (4 weeks): Parts 4–20 in depth. Do one Part 27 walkthrough per day.
- Third pass (ongoing): Mock interviews using Part 33 rubric. Re-read trade-off matrices before interviews.
Practice aloud: explain diagrams as if to an interviewer. Time-box yourself to 45 minutes per mock design.
Study modes: (1) Reading mode — understand concepts. (2) Active recall — cover diagrams and explain from memory. (3) Timed mock — random Part 27 problem, 45 min timer. (4) Peer review — swap designs and critique using Part 33 rubric.
Document Map
| Parts | Topic | When to study |
|---|---|---|
| 1–3 | Interview mechanics & estimation | Week 1 |
| 4–7 | Traffic path: LB, cache, CDN | Week 2 |
| 8–12 | Data: DB, replication, transactions | Week 3–4 |
| 13–20 | Distributed systems & ops | Week 5–6 |
| 21–26 | Patterns & domain designs | Week 7 |
| 27 | 25 full walkthroughs | Daily practice weeks 8–12 |
| 28–33 | Execution & checklists | Before every interview |
Part 1: Interview Format & What Interviewers Score
Typical 45–60 Minute Structure
Most system design interviews at large tech companies run 45–60 minutes with a single problem. The first 5–10 minutes are requirements and scope; 10–15 minutes high-level architecture; 20–30 minutes deep dives on data model, scaling, and failure modes; the last 5 minutes trade-offs and extensions.
| Phase | Time | Your Goal |
|---|---|---|
| Clarify & scope | 5–10 min | Functional/non-functional requirements, users, scale, constraints |
| High-level design | 10–15 min | Boxes-and-arrows: clients, LB, services, caches, DBs, queues |
| Deep dive | 20–30 min | Schema, APIs, sharding, consistency, bottlenecks — interviewer-led |
| Wrap-up | 5 min | Summary, monitoring, future work, what you'd do with more time |
What Interviewers Score
Interviewers use a holistic rubric, not a single correct diagram. They evaluate:
- Problem solving: Can you decompose an ambiguous problem and prioritize what matters?
- Technical depth: Do you understand how databases, caches, queues, and networks behave at scale?
- Trade-off reasoning: Can you articulate why you chose SQL vs NoSQL, sync vs async replication, etc.?
- Communication: Do you think aloud, check assumptions, and respond to hints?
- Operational awareness: Monitoring, failure modes, security, cost — not just happy path.
Senior vs Mid-Level Expectations
| Dimension | Mid (L4/L5) | Senior (L6+) |
|---|---|---|
| Scope | One clear product feature | Multi-region, org boundaries, platform concerns |
| Depth | Correct building blocks | CAP, consistency, idempotency, saga, observability SLOs |
| Leadership | Follows hints | Proactively surfaces risks, drives discussion |
| Estimation | Order-of-magnitude OK | Back-of-envelope with explicit assumptions |
Virtual Whiteboard Tips
- Use a consistent layout: users left, data stores right, async flows bottom.
- Label arrows (HTTPS, events, replication). Unlabeled lines confuse you and the interviewer.
- Draw incrementally — don't erase entire diagrams; add layers (MVP → scale).
- Keep text large enough to read on a shared screen; abbreviate (API GW, DB) consistently.
- Excalidraw, Miro, or built-in CoderPad — practice one tool before interview day.
- When stuck, narrate: "I'd pause here and validate QPS assumptions with you."
Part 2: The Answer Framework
Use this repeatable framework for every design question. Interviewers recognize structured thinking even when the exact architecture differs.
Step 1: Requirements
Functional: What the system must do (users post tweets, shorten URLs, book seats).
Non-functional: Scale, latency, availability, durability, consistency, security, cost.
Example script: "I'll assume 100M DAU, a 100:1 read-to-write ratio, p99 read latency under 200ms, and 99.9% availability; if payments are in scope, those paths need stronger consistency guarantees."
Step 2: Constraints & Assumptions
- Budget, team size, existing stack, regulatory (GDPR, PCI), geographic focus
- Explicitly state what you are not building (e.g., ML ranking v1, admin portal)
Step 3: Back-of-the-Envelope
DAU → QPS (peak ~2–5× average), storage per object × objects/year, bandwidth. See Part 3.
Step 4: API Design
RESTful resources or RPC methods; idempotency keys for writes; pagination cursors. Keep to 5–8 core endpoints in the interview.
Step 5: High-Level Diagram
[Clients] → [CDN] → [LB] → [API Servers] → [Cache]
↓
[Workers] ← [Queue] → [DB / Object Store]
Step 6: Deep Dives
Interviewer picks: data model, hot paths, sharding key, cache strategy, fan-out, consistency.
Step 7: Bottlenecks & Mitigations
DB write throughput, hot keys, thundering herd, single points of failure — pair each with a fix.
Step 8: Trade-offs Summary
One sentence each: "We chose eventual consistency for feeds because… at the cost of…"
Step 9: Closing Summary
Recap architecture in 30 seconds; mention monitoring and phased rollout.
Part 3: Back-of-the-Envelope Estimation
Why Estimation Matters in Interviews
Interviewers rarely expect exact numbers; they want to see that you decompose a fuzzy problem into measurable quantities, state assumptions explicitly, and sanity-check whether your architecture can handle the load. A five-minute back-of-the-envelope (BOE) prevents you from proposing a single MySQL instance for a billion-read-per-day product.
Good estimation is a chain of reasoning: daily active users (DAU) lead to actions per day, which become average and peak queries per second (QPS), which drive storage growth, egress bandwidth, cache sizing, and shard counts. Each hop should be spoken aloud so the interviewer can correct assumptions early.
Script: "With 100M DAU and each user viewing 20 pages/day, that is 2B page views/day. Dividing by 86,400 seconds gives ~23K average QPS; peak is often 2–5×, so I will plan for ~100K read QPS at peak."
Latency Numbers Every Engineer Should Know
Memorize orders of magnitude so you can reason without looking up charts. Times vary by hardware; use these as interview anchors when arguing for caches, CDNs, or async processing.
| Operation | Typical Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | CPU-local |
| Branch mispredict | 5 ns | Pipeline flush |
| L2 cache | 7 ns | |
| Mutex lock/unlock | 25 ns | Uncontended |
| Main memory reference | 100 ns | DDR4/5 |
| SSD random read | 16 µs | NVMe faster |
| Round trip in datacenter | 0.5 ms | Same AZ |
| Redis/Memcached RTT | 0.5–1 ms | Local network |
| SSD sequential 1 MB | 1 ms | |
| Disk seek (HDD) | 10 ms | Avoid in hot path |
| Send 1 MB over 1 Gbps LAN | 10 ms | |
| Cross-country RTT | 40–80 ms | US coast-to-coast |
| Read 1 MB from S3 (first byte) | 100–300 ms | Region-dependent |
| Database query (simple indexed) | 1–10 ms | Local DB |
| Complex DB join / full scan | 10–100+ ms | Why indexes matter |
Rule of thumb: one cross-region RTT (50–150 ms) dominates a datacenter cache hit (sub-ms). If your design needs 20 sequential RPCs across regions, latency will exceed 1 second before application logic runs — batch, parallelize, or move data closer.
From DAU to QPS
requests_per_day = DAU × actions_per_user_per_day
avg_QPS = requests_per_day / 86,400
peak_QPS ≈ avg_QPS × peak_multiplier   # often 2–5× for consumer apps
Example: 50M DAU, 10 timeline loads/day → 500M reads/day → ~5,800 average QPS → ~29K peak at a 5× multiplier during evening hours.
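The DAU-to-QPS chain above can be checked with a few lines of Python. This is a minimal sketch: the helper name `estimate_qps` and its default 5× peak multiplier are illustrative assumptions, not a standard API.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(dau: int, actions_per_user_per_day: float,
                 peak_multiplier: float = 5.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a daily-active-user workload.

    peak_multiplier is an assumption; consumer apps often see 2-5x.
    """
    requests_per_day = dau * actions_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_multiplier
    return avg_qps, peak_qps


if __name__ == "__main__":
    avg, peak = estimate_qps(50_000_000, 10)
    # 500M reads/day -> roughly 5,800 average and 29K peak QPS
    print(f"avg ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```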
Writes: Often one to two orders of magnitude lower than reads in social and feed products. State read:write ratio explicitly (e.g., 100:1). For write-heavy systems (logging, IoT ingestion), invert the analysis and size for ingest QPS first.
Storage Estimation Formulas
storage_per_year = objects_per_year × bytes_per_object × replication_factor
objects_per_year = new_objects_per_second × 86,400 × 365
| Object | Size (order of magnitude) |
|---|---|
| User profile row | 1–4 KB |
| Tweet / short post metadata | 300 B – 2 KB; media separate |
| Image (compressed) | 200 KB – 2 MB |
| Video minute (1080p) | 50–150 MB |
| Log line (JSON) | 0.5–2 KB |
| UUID + indexes overhead | +30–50% on row size |
Worked example: 10M new photos/day × 500 KB average × 3× replication ≈ 15 TB/day before compression and lifecycle tiering — clearly object storage (S3/GCS) territory, not inline BLOB columns in OLTP.
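A minimal sketch of the storage formulas, assuming decimal units (1 TB = 10^12 bytes) and a default replication factor of 3; the function names are hypothetical. It reproduces the 15 TB/day photo figure:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def storage_per_day_bytes(objects_per_day: float, bytes_per_object: float,
                          replication_factor: int = 3) -> float:
    """Raw replicated bytes written per day, before compression or tiering."""
    return objects_per_day * bytes_per_object * replication_factor


def storage_per_year_bytes(new_objects_per_second: float, bytes_per_object: float,
                           replication_factor: int = 3) -> float:
    """Replicated bytes per year, following the objects_per_year formula."""
    objects_per_year = new_objects_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR
    return objects_per_year * bytes_per_object * replication_factor


if __name__ == "__main__":
    TB = 10**12  # decimal terabytes, as in the worked example
    daily = storage_per_day_bytes(10_000_000, 500 * 10**3)
    print(f"{daily / TB:.0f} TB/day")  # 10M photos x 500 KB x 3 replicas
```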
Account for soft deletes, audit trails, and backups: operational storage often exceeds raw user data by 2×. Cold tier (Glacier) reduces cost but not logical size on planning spreadsheets.
Power of Two for Capacity Planning
| Power | Exact | Approx |
|---|---|---|
| 2^10 | 1,024 | ~1 thousand (1 KB) |
| 2^20 | 1,048,576 | ~1 million (1 MB) |
| 2^30 | 1,073,741,824 | ~1 billion (1 GB) |
| 2^40 | ~1.1×10^12 | ~1 trillion (1 TB) |
| 2^50 | ~1.1×10^15 | ~1 quadrillion (1 PB) |
Use powers of two when estimating shard counts, hash ring size, and memory: a 32-bit user ID space has 4B values; at 1 KB per cached profile, fully populated memory would be 4 TB (never fully hot). Sharding by user_id mod 1024 yields 1024 shards — a clean power-of-two boundary.
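As a small illustration of power-of-two shard routing (a sketch; `shard_for` is a hypothetical helper), masking the low 10 bits gives the same result as `user_id mod 1024` without a division. Note that this is plain modulo sharding: changing the shard count remaps almost every key, which is why consistent hashing (Part 17) exists.

```python
NUM_SHARDS = 1024            # 2^10 shards
SHARD_MASK = NUM_SHARDS - 1  # 0x3FF: the low 10 bits


def shard_for(user_id: int) -> int:
    """Route an integer user_id to one of 1024 shards.

    For a power-of-two shard count, a bitmask equals modulo.
    """
    return user_id & SHARD_MASK


if __name__ == "__main__":
    for uid in (7, 1024, 123_456_789):
        assert shard_for(uid) == uid % NUM_SHARDS
    print(shard_for(123_456_789))  # 277
```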
Bandwidth Estimation
egress_Gbps = peak_QPS × avg_response_bytes × 8 / 10^9
1 Gbps ≈ 125 MB/s theoretical maximum. An API returning 500 KB JSON responses at 10K QPS needs roughly 40 Gbps of egress at the origin, so CDN edge caching and compression are mandatory, not optional optimizations.
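The egress formula as a one-line helper (a sketch in decimal units, where 1 KB = 1,000 bytes; the name `egress_gbps` is illustrative), checked against the 500 KB and 1 KB examples in this part:

```python
def egress_gbps(peak_qps: float, avg_response_bytes: float) -> float:
    """Egress in gigabits per second: bytes/s x 8 bits/byte / 10^9."""
    return peak_qps * avg_response_bytes * 8 / 10**9


if __name__ == "__main__":
    print(egress_gbps(10_000, 500_000))  # 500 KB at 10K QPS -> 40.0 Gbps
    print(egress_gbps(100_000, 1_000))   # 1 KB at 100K QPS -> 0.8 Gbps
```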
Include upload bandwidth for user-generated content: 1M uploads/day × 2 MB average ≈ 2 TB/day, or about 23 MB/s (≈185 Mbps) if spread evenly. In reality, peak upload windows concentrate load on ingress load balancers and object-store write paths.
Availability Math
Independent components in series multiply reliability: if A is 99.9% and B is 99.9%, combined ≈ 99.8%. Parallel redundancy improves availability: 1 - (1-p)^n for n identical redundant nodes.
| Nines | Downtime/year |
|---|---|
| 99% | 3.65 days |
| 99.9% | 8.76 hours |
| 99.99% | 52.6 minutes |
| 99.999% | 5.26 minutes |
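The series, parallel, and downtime rules can be sketched as below. These helpers assume independent failures, which real systems often violate (shared power, shared deploys, correlated bugs), so treat the parallel-redundancy result in particular as optimistic.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def series(*availabilities: float) -> float:
    """All components must be up: multiply availabilities (assumes independence)."""
    return math.prod(availabilities)


def parallel(p: float, n: int) -> float:
    """n identical redundant nodes, any one suffices: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def downtime_minutes_per_year(availability: float) -> float:
    """Convert an availability fraction into expected downtime per year."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"{series(0.999, 0.999):.4%}")                         # ~99.80%
    print(f"{parallel(0.99, 2):.4%}")                            # two 99% nodes -> 99.99%
    print(f"{downtime_minutes_per_year(0.9999):.1f} min/year")   # ~52.6
```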
Interview tip: tie nines to product — 99.9% may be fine for a news feed; payment authorization often needs multi-region active-active and stricter SLOs. Mention error budgets (Part 18) when discussing how much downtime is acceptable.
Servers as a Sanity Check
Rough capacity: one modern app server might handle 500–2,000 RPS for light JSON (highly workload-dependent). 100K QPS divided by 1K RPS per server ≈ 100 servers before caching. Then apply the cache hit ratio: a 90% hit rate cuts the load on whatever sits behind the cache (origin servers or database) by 10×.
Database connection limits often bind before CPU: 500 app servers × 10 connections each = 5,000 connections. Many managed Postgres tiers cap below that, so plan for a connection pooler such as PgBouncer, which multiplexes many client connections onto a smaller set of server connections, and tune pool sizes carefully.
[Assumption chain]
DAU → actions/day → QPS (avg & peak)
→ storage/year (× replication)
→ bandwidth (× bytes/response)
→ cache hit ratio → DB QPS
→ shard count / machine count
→ monthly cost (servers + egress + storage)
Common BOE Mistakes
- Forgetting peak multiplier and planning only for average QPS
- Ignoring replication factor and backup storage in disk math
- Using HDD seek latency assumptions for SSD/NVMe-backed stores
- Treating CDN hit ratio as 100% without stating edge cache assumptions
- Confusing bits and bytes in bandwidth (×8 conversion)
Practice Problem
Design a photo-sharing app BOE: 20M DAU, 3 photo views and 0.2 uploads per user per day, 400 KB average display size, 2 MB upload, 5-year retention. Walk through QPS, storage/year, and peak egress. Compare with and without 85% CDN cache hit on reads.
Interview BOE Drills
Practice these until automatic:
- URL shortener: 100M URLs/month → writes/s; 10:1 read → read QPS; 500 B row → GB/year
- Photo app: 10M photos/day × 2 MB → 20 TB/day raw; CDN hit ratio effect on origin
- Chat: 1M concurrent × 1 msg/min → message QPS; WS memory per connection
Latency Budget Example
p99 target 200ms for API: CDN 20ms + LB 5ms + app 30ms + cache 2ms + DB 80ms + serialization 10ms + margin 53ms. If the DB takes 80ms, you cannot add 5 sequential microservice hops without busting the budget.
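A quick way to sanity-check a latency budget is to subtract each component from the target. This sketch uses the component numbers from the example and assumes, for illustration only, about 10 ms per extra sequential service hop. Summing per-component p99 values is a rough planning heuristic; tail latencies do not add exactly.

```python
# Component targets from the example, in milliseconds.
BUDGET_MS = 200
COMPONENTS_MS = {"cdn": 20, "lb": 5, "app": 30, "cache": 2, "db": 80, "serialization": 10}


def remaining_margin(budget_ms: float, components: dict[str, float],
                     extra_hops: int = 0, ms_per_hop: float = 10.0) -> float:
    """Budget left after sequential components plus extra RPC hops.

    ms_per_hop is an assumed cost per additional sequential service call.
    """
    return budget_ms - sum(components.values()) - extra_hops * ms_per_hop


if __name__ == "__main__":
    print(remaining_margin(BUDGET_MS, COMPONENTS_MS))                # 53.0 ms of margin
    print(remaining_margin(BUDGET_MS, COMPONENTS_MS, extra_hops=5))  # 3.0 ms left
```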
Availability Budget Math
99.9% monthly ≈ 43 minutes downtime. If you deploy 20 times a month, each deploy risking a 0.1% blast radius, plan canary releases and automatic rollback. An error budget policy links reliability to release velocity.
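The error-budget arithmetic, sketched under the assumption of a 30-day month (function names are hypothetical). Spreading a 99.9% budget evenly across 20 deploys leaves roughly 2 minutes per deploy, which is why fast detection and automatic rollback matter:

```python
DAYS_PER_MONTH = 30  # assumption: 30-day month


def error_budget_minutes(slo: float, days: int = DAYS_PER_MONTH) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    return (1 - slo) * days * 24 * 60


def budget_per_deploy(slo: float, deploys_per_month: int) -> float:
    """Downtime each deploy may consume if the budget were spread evenly."""
    return error_budget_minutes(slo) / deploys_per_month


if __name__ == "__main__":
    print(f"{error_budget_minutes(0.999):.1f} min/month")    # ~43.2
    print(f"{budget_per_deploy(0.999, 20):.2f} min/deploy")  # ~2.16
```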
Worked Example: News Site BOE
Assumptions: 20M DAU, 50 article views/user/day, 500 KB average page (HTML+assets), 80% CDN hit ratio.
Views/day = 20M × 50 = 1B. Avg QPS = 1B/86400 ≈ 11,600. Peak 5× ≈ 58,000 read QPS.
Origin QPS = 58K × 20% = 11,600 if CDN handles 80%. Egress without CDN: 1B × 500KB = 500 TB/day — impossible without CDN.
Storage: 10K new articles/day × (50 KB text + 2 MB images) × 3 replicas ≈ 61.5 GB/day, dominated by media: images account for ≈ 60 GB/day and belong in S3, while article text in the DB adds only ≈ 1.5 GB/day.
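The whole news-site chain can be reproduced in one function. This is a sketch: every default mirrors an assumption stated in this example, units are decimal, and text stored in the DB is kept separate from media stored in object storage. `news_site_boe` is a hypothetical name.

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units


def news_site_boe(dau=20_000_000, views_per_user=50, page_bytes=500 * KB,
                  cdn_hit_ratio=0.80, peak_multiplier=5,
                  articles_per_day=10_000, text_bytes=50 * KB,
                  image_bytes=2 * MB, replicas=3):
    """Reproduce the news-site estimate; defaults are the example's assumptions."""
    views_per_day = dau * views_per_user
    avg_qps = views_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_multiplier
    origin_peak_qps = peak_qps * (1 - cdn_hit_ratio)            # CDN misses only
    egress_no_cdn_per_day = views_per_day * page_bytes
    text_per_day = articles_per_day * text_bytes * replicas      # stays in the DB
    media_per_day = articles_per_day * image_bytes * replicas    # object storage
    return {
        "views_per_day": views_per_day,
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "origin_peak_qps": origin_peak_qps,
        "egress_no_cdn_TB_per_day": egress_no_cdn_per_day / TB,
        "db_text_GB_per_day": text_per_day / GB,
        "media_GB_per_day": media_per_day / GB,
    }


if __name__ == "__main__":
    for name, value in news_site_boe().items():
        print(f"{name}: {value:,.1f}")
```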
Q&A
Q: Why powers of two for shards? A: Clean routing bitmask (user_id & 0x3FF), even split in consistent hash rings.
Q: How many servers for 58K QPS? A: If 2K RPS/instance → ~30 origin app servers before DB/cache; cache cuts DB load further.
Bandwidth Worked Numbers
| Payload | QPS | Egress Gbps |
|---|---|---|
| 1 KB JSON | 100K | 0.8 |
| 10 KB | 100K | 8 |
| 100 KB | 100K | 80 |
| 1 MB | 10K | 80 |
Interview Question Bank — Back-of-Envelope
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
How do you estimate peak QPS from DAU?
DAU × actions per day / 86400 × peak multiplier (2–5×). State assumptions explicitly.
How much storage for 5 years of tweets?
Daily tweets × size × 365 × 5 × replication. Separate media to object storage.
What latency dominates cross-region design?
RTT 50–150ms per round trip — minimize sequential RPCs.
How do you convert availability % to downtime?
99.9% ≈ 8.76 hours/year. Use for error budget discussions.
Additional BOE Practice
Review this section with the Part 27 walkthroughs and apply BOE calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 3 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Latency table memorized
- DAU→QPS formula
- Storage/year calc
- Bandwidth Gbps
- Power of two
- Availability nines
- Assumption chain spoken
- Peak multiplier 2-5x
- Sanity check servers
- CDN impact on egress
Self-test prompt
Explain Part 3 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 3 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 4: Scalability
What Scalability Means
Scalability is the ability of a system to handle increased load by adding resources without redesigning core architecture. In interviews, distinguish vertical scaling (bigger machines) from horizontal scaling (more machines). Most web-scale systems scale stateless tiers horizontally and partition stateful data.
Scalability has dimensions: load (QPS), data volume, fan-out complexity, geographic distribution, and team/org scale. Clarify which dimension dominates for the problem at hand.
Vertical vs Horizontal Scaling
| Aspect | Vertical (scale-up) | Horizontal (scale-out) |
|---|---|---|
| How | More CPU/RAM/disk on one node | Add nodes behind LB |
| Limits | Hardware ceiling, single point of failure | Requires partition-friendly design |
| Cost curve | Expensive high-end boxes | Commodity hardware, linear-ish |
| Downtime | Often requires restart | Rolling deploys, replace nodes |
| Interview use | Quick MVP, DB until sharding | Default for stateless app tier |
Databases often scale vertically first (read replicas, bigger instance), then shard horizontally. Application servers scale horizontally from day one in most designs.
Stateless Application Tier
Stateless servers store no session data locally; any request can land on any instance. Session state lives in client tokens (JWT), centralized session store (Redis), or database. This enables elastic autoscaling and zero-downtime deploys.
          [LB]
        /  |  \
[App1][App2][App3]   ← no local session
        \  |  /
[Redis sessions] or [JWT in cookie]
Anti-pattern: sticky files on disk per server without shared storage; this breaks scale-in and causes data loss on node termination.
Sticky Sessions
Load balancers can pin a user to one backend via cookie or connection affinity. This is useful when a legacy app keeps a local cache or non-replicated sessions. Downsides: uneven load, poor failover, and more complicated deploys and autoscaling.
- When acceptable: Short migration period, WebSocket origin pinning with reconnect logic
- Prefer instead: External session store, stateless APIs, connection draining on deploy
- If you mention sticky sessions, always note the load-imbalance and failover risks and how you would mitigate them (session replication or an external session store)
Autoscaling
Autoscaling adjusts instance count based on metrics (CPU, request count, queue depth, custom business metrics). Scale-out triggers add capacity before SLO breach; scale-in removes idle capacity to save cost.
| Signal | Pros | Cons |
|---|---|---|
| CPU utilization | Simple | Laggy; misleads on I/O-bound work |
| Request rate / latency p99 | User-visible | Needs good LB metrics |
| Queue depth | Great for workers | Not for synchronous API tier alone |
| Schedule-based | Predictable peaks (TV events) | Wastes capacity if wrong |
Cooldown periods prevent flapping. Warm pools and pre-warmed AMIs reduce cold-start latency for latency-sensitive APIs. Mention minimum instance count for availability during scale-from-zero (if allowed).
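A minimal sketch of target-tracking autoscaling with damping. The replica formula follows the shape used by the Kubernetes Horizontal Pod Autoscaler, `ceil(current × metric / target)`; the tolerance band, replica bounds, and 300 s scale-in cooldown here are illustrative assumptions.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 2, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Scale in proportion to metric/target (e.g. CPU 90% vs a 60% target).

    Inside the +/- tolerance band nothing changes, which damps small oscillations.
    The result is clamped so a minimum pool always stays up for availability.
    """
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


class ScaleInCooldown:
    """Scale out immediately, but scale in only after `cooldown_s` seconds have
    passed since the last change (scale up fast, scale down slowly)."""

    def __init__(self, cooldown_s: float = 300.0):
        self.cooldown_s = cooldown_s
        self.last_change = float("-inf")

    def apply(self, current: int, desired: int, now: float) -> int:
        if desired > current:
            self.last_change = now
            return desired
        if desired < current and now - self.last_change >= self.cooldown_s:
            self.last_change = now
            return desired
        return current


if __name__ == "__main__":
    gate = ScaleInCooldown(cooldown_s=300)
    n = gate.apply(10, desired_replicas(10, metric=90, target=60), now=0)  # 10 -> 15
    n = gate.apply(n, desired_replicas(n, metric=30, target=60), now=60)   # scale-in blocked
    print(n)  # 15
```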
Scaling Stateful Components
Caches scale via clustering and consistent hashing. Databases scale via read replicas, sharding, and federation. Queues scale via partitions and consumer groups. Each stateful layer needs its own scaling story — do not assume app autoscaling fixes DB writes.
Bottleneck Hierarchy
- Single DB master write throughput
- Hot keys / hot partitions
- Expensive synchronous RPC chains
- Lock contention on shared resources
- Thundering herd on cache miss
- Cross-region replication lag
Interview flow: identify the first bottleneck at estimated peak load, propose mitigation, re-estimate capacity, repeat.
Elasticity vs Performance
Serverless and aggressive autoscaling maximize elasticity; fixed large pools minimize tail latency variance. Cost vs latency trade-off: financial systems may keep warm capacity; batch analytics may scale to zero overnight.
Senior signal: Discuss scaling limits of the team — microservices scale independently but multiply operational overhead. A monolith with modular boundaries may scale further with one on-call rotation.
Case Study: E-commerce Checkout
Browse/catalog tier: horizontal stateless, CDN, read replicas. Cart: Redis per user with TTL. Checkout: smaller pool, stricter timeouts, idempotent payment API, queue for order fulfillment. Scale browse 100× checkout — different tiers, different scaling policies.
[Browse]   → many replicas, CDN, cache-heavy
[Cart]     → Redis cluster, moderate replicas
[Checkout] → few replicas, sync payment, saga async
Scaling Case Study
Instagram scaled Python app servers horizontally behind LB; Memcached for hot objects; sharded Postgres/Cassandra for data. Key lesson: stateless app tier scales linearly until database becomes bottleneck — then shard or cache.
Auto-Scaling Signals
| Signal | Scale out when | Caution |
|---|---|---|
| CPU | >70% for 5 min | CPU low but queue deep — scale on queue depth |
| Request rate | Approaching RPS limit per instance | Coordinate with DB capacity |
| Custom | Kafka consumer lag > threshold | Consumers beyond the partition count sit idle |
Sticky Sessions Detail
Cookie-based affinity routes user to same server for session data in memory — fragile on deploy (drain connections). Prefer external session store (Redis) + stateless servers.
Worked Example: E-Commerce Checkout Scale
Black Friday 10× normal: auto-scale API 50→500 pods in 10 min. Database cannot scale 10× instantly — queue checkout requests, show wait time, prioritize payment capture.
Stateless cart in Redis keyed by session_id; order creation idempotent. Sticky sessions avoided — all state external.
Q&A
Q: Vertical vs horizontal first? A: Vertical until single-machine limits (CPU/RAM/disk IOPS), then read replicas, then shard writes.
Interview Question Bank — Scalability
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
When is vertical scaling enough?
Low traffic MVP, single-region, team velocity priority — until CPU/IO saturates.
What makes a service stateless?
Any instance handles any request; session in Redis/DB; no local disk state.
How does auto-scaling avoid flapping?
Cooldown periods, hysteresis thresholds, scale-up faster than scale-down.
Additional Scale Practice
Review this section with Part 27 walkthroughs — apply scale calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 4 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Vertical limits named
- Stateless app tier
- Sticky session downside
- Auto-scale signals
- Scale DB last
- Read replicas
- Connection pool limits
- Split compute/storage
- No local disk state
- Phase scaling plan
Self-test prompt
Explain Part 4 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 4 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 5: Load Balancing
Role of Load Balancers
Load balancers distribute traffic across healthy backends, terminate TLS, enforce routing rules, and provide a stable endpoint while instances churn. They sit between clients and your application tier, and between internal services in multi-tier designs.
Layer 4 vs Layer 7
| Layer | OSI | Routes on | Aware of HTTP | Use case |
|---|---|---|---|---|
| L4 | Transport | IP + port | No | TCP pass-through, gaming, extreme throughput |
| L7 | Application | URL path, headers, host | Yes | REST APIs, sticky cookies, A/B routes |
An L4 LB forwards packets with minimal inspection: lower latency, but it cannot route /api to a different pool than /static. An L7 LB can route Host: api.example.com to a gRPC pool and www to a web pool, and can inject headers (X-Request-ID).
Load Balancing Algorithms
| Algorithm | Behavior | When to use |
|---|---|---|
| Round robin | Cycle backends | Homogeneous, equal capacity |
| Weighted round robin | Proportional to weight | Mixed instance sizes |
| Least connections | Fewest active conns | Long-lived requests, variable duration |
| Least response time | Lowest latency backend | Heterogeneous performance |
| Random + two choices | Pick 2 random, use least loaded | Power of two choices — near-optimal |
| IP hash | Client IP → fixed backend | Legacy sticky without cookies |
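The "random + two choices" row can be sketched in a few lines. This is an in-memory illustration with hypothetical backend names; a real balancer would track in-flight requests from proxy events and handle backends being added or removed.

```python
import random
from typing import Optional


class PowerOfTwoChoices:
    """Sample two distinct backends at random and route to the one with
    fewer in-flight requests."""

    def __init__(self, backends: list[str], rng: Optional[random.Random] = None):
        if len(backends) < 2:
            raise ValueError("need at least two backends")
        self.in_flight = {b: 0 for b in backends}
        self.rng = rng or random.Random()

    def acquire(self) -> str:
        """Choose a backend for a new request and count it as in flight."""
        a, b = self.rng.sample(list(self.in_flight), 2)
        chosen = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[chosen] += 1
        return chosen

    def release(self, backend: str) -> None:
        """Call when the request or connection finishes."""
        self.in_flight[backend] -= 1


if __name__ == "__main__":
    lb = PowerOfTwoChoices(["app1", "app2", "app3", "app4"], random.Random(42))
    for _ in range(1_000):
        lb.acquire()  # never released: models long-lived connections piling up
    print(lb.in_flight)  # counts stay close to 250 each
```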
Consistent hashing (Part 17) appears at cache layers and some L7 gateways for shard-aware routing. Do not confuse LB algorithms with data partitioning hashes.
Health Checks
Active health checks: LB periodically calls /health and removes failing nodes. Passive checks: observe error rates from real traffic. Use deep checks sparingly — hitting DB on every probe overloads dependencies.
- Liveness: Process up? Return 200 if server binds port.
- Readiness: Can serve traffic? DB connected, cache warmed, migrations done.
- Kubernetes: liveness vs readiness probes map directly to interview answers
Graceful shutdown: on SIGTERM, stop accepting new connections, drain in-flight requests (30–60s), then deregister. Prevents 502 spikes during deploys.
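Load balancers commonly avoid flapping by requiring several consecutive failed probes before removing a backend, and several consecutive successes before restoring it. A minimal sketch of that state machine; `BackendHealth` is a hypothetical name and the threshold values are chosen purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Mark a backend unhealthy only after `unhealthy_threshold` consecutive
    failed probes, and healthy again after `healthy_threshold` successes."""
    healthy_threshold: int = 2
    unhealthy_threshold: int = 3
    healthy: bool = True
    _streak: int = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self._streak = 0  # probe agrees with current state: reset
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


if __name__ == "__main__":
    h = BackendHealth()
    probes = [False, False, True, False, False, False, True, True]
    print([h.record(ok) for ok in probes])
```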
Global Load Balancing
Global server load balancing (GSLB) directs users to nearest healthy region using DNS, anycast, or edge networks. Goals: lower latency, disaster recovery, regulatory data residency.
User in Tokyo  → GSLB → ap-northeast-1
User in London → GSLB → eu-west-1
Region failure → DNS/health failover → us-east-1
Challenges: cross-region data consistency, session stickiness across regions, cache invalidation globally. Often pair GSLB with geo-replicated data or region-scoped user accounts.
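A sketch of latency-based regional routing with failover, assuming the router already knows per-region RTTs for a client and a set of healthy regions (both hypothetical inputs). It ignores DNS TTL propagation and data-residency rules, which real GSLB setups must also handle.

```python
def pick_region(client_rtt_ms: dict[str, float], healthy: set[str]) -> str:
    """Choose the healthy region with the lowest measured RTT for this client."""
    candidates = {r: rtt for r, rtt in client_rtt_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    tokyo_user = {"ap-northeast-1": 8, "us-west-2": 110, "eu-west-1": 230}
    all_up = {"ap-northeast-1", "us-west-2", "eu-west-1"}
    print(pick_region(tokyo_user, all_up))                       # ap-northeast-1
    print(pick_region(tokyo_user, {"us-west-2", "eu-west-1"}))   # us-west-2 after failure
```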
DNS Load Balancing
DNS returns multiple A/AAAA records with short TTL (30–300s). Clients pick randomly or by resolver behavior — crude load spread. DNS failover removes unhealthy IPs after TTL propagation delay.
Limitations: DNS caching causes stale routes; not good for fine-grained load control. Commonly combined with Anycast IP (one IP, BGP routes to nearest POP) at CDN/LB edge.
TLS and Connection Management
TLS termination at the LB offloads crypto from app servers. TLS passthrough preserves end-to-end encryption but limits L7 routing. HTTP/2 and gRPC multiplex many streams over one long-lived connection, so connection-level (L4) balancing can pin most traffic to a few backends; balance per request at L7 or with client-side load balancing instead.
Internal Service Load Balancing
Sidecars (Envoy) and client-side LB (gRPC name resolution) distribute east-west traffic inside Kubernetes. Service mesh adds retries, timeouts, circuit breaking at data plane — see Part 14.
Failure Modes
- Thundering herd when all backends marked unhealthy — keep minimum healthy pool
- SYN flood — SYN cookies, rate limits at edge
- LB itself as SPOF — cloud LB is managed; self-hosted needs HA pair (VRRP)
- Misconfigured idle timeout killing long WebSockets
Interview Checklist
- Where does TLS terminate?
- L4 or L7 — can we route by path/host?
- Health check type and drain strategy on deploy?
- Single region or GSLB — how failover works?
Health Check Types
- Liveness: process up? Restart if fails
- Readiness: can accept traffic? Remove from LB if DB down
- Deep check: optional dependency ping — use sparingly (cascades)
Global Load Balancing
GeoDNS or Anycast routes user to nearest healthy region. Health checks per region; failover when region degraded. Data replication lag limits active-active for strongly consistent apps.
DNS Round Robin vs LB
DNS multiple A records — client picks; TTL caching causes stale routes. Application LB (ALB, NGINX) preferred for HTTP with health checks.
Worked Example: Global API
Users in US, EU, APAC. Route53 latency-based routing to regional ALB. EU data stays EU (GDPR). Health check removes region on 5xx spike.
Q&A
Q: L7 vs L4 for WebSocket? A: L7 ALB supports WS upgrade; L4 passes through opaque TCP — use when need raw TCP or extreme throughput.
Interview Question Bank — Load Balancing
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Why least connections vs round robin?
Long-polling/WebSocket ties up connections — least connections balances better.
How do health checks cause outages?
Too aggressive checks mark healthy nodes bad — use readiness not deep dependency chain.
Explain DNS load balancing limits.
TTL caches old IPs; not aware of server load — use for geo routing with health-checked endpoints.
Additional LB Practice
Review this section with the Part 27 walkthroughs and apply LB calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 5 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- L4 vs L7 explained
- Round robin vs least conn
- Health check types
- SSL at LB
- Global DNS routing
- Avoid DNS round robin pitfalls
- Session affinity alternative
- LB as choke point HA
- DDoS at edge
- Cross-zone LB
Self-test prompt
Explain Part 5 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 5 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 6: Caching
Why Cache
Caching stores copies of expensive-to-compute or expensive-to-fetch data closer to the consumer. A cache hit avoids repeated database queries, RPC chains, or disk reads. In system design interviews, caching is often the difference between a design that meets 100ms p99 and one that collapses at 10K QPS.
Caches trade freshness for speed. Every cache introduces staleness risk and invalidation complexity — state these trade-offs explicitly rather than treating cache as a free performance boost.
Cache Layers
Caches exist at every layer of the stack. Understanding the hierarchy helps you place the right cache for the right bottleneck.
| Layer | Examples | Typical TTL | Invalidation |
|---|---|---|---|
| Client | Browser HTTP cache, mobile disk | Minutes–days | Cache-Control headers |
| CDN / Edge | CloudFront, Cloudflare | Seconds–hours | URL purge, versioned paths |
| API gateway | Response cache by route | Seconds | Key eviction |
| Application | In-process LRU (Caffeine) | Seconds–minutes | Process restart |
| Distributed | Redis, Memcached | Minutes–hours | TTL, pub/sub invalidation |
| Database | Buffer pool, materialized views | Varies | Query refresh, CDC |
Cache-Aside (Lazy Loading)
Application checks cache first; on miss, reads from DB, writes to cache, returns. Most common pattern for read-heavy workloads.
def get(key):
    value = cache.get(key)
    if value is None:              # miss: load from the source of truth
        value = db.get(key)
        cache.set(key, value, ttl=300)
    return value
- Pros: Only caches requested data; survives cache failure (degrades to DB)
- Cons: First request always slow; stale data if DB updated without invalidation
- Race: Two misses can double-load DB — use singleflight or lock per key
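A runnable version of the cache-aside read path, with a per-key lock standing in for singleflight. This is a single-process sketch: a plain dict plays the role of Redis, the loader is a hypothetical DB lookup, and across a fleet you would need a distributed lock or request coalescing instead. The lock table also grows with key count, which a production version would bound.

```python
import threading
import time
from typing import Any, Callable


class CacheAside:
    """Cache-aside with singleflight-style protection: on a miss, only one
    thread per key loads from the backing store; the others wait and reuse it."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float = 300.0):
        self._loader = loader  # e.g. a database lookup (hypothetical)
        self._ttl_s = ttl_s
        self._data: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _fresh(self, key: str) -> tuple[bool, Any]:
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return True, entry[0]
        return False, None

    def get(self, key: str) -> Any:
        hit, value = self._fresh(key)
        if hit:
            return value
        with self._locks_guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            hit, value = self._fresh(key)  # another thread may have filled it
            if hit:
                return value
            value = self._loader(key)
            self._data[key] = (value, time.monotonic() + self._ttl_s)
            return value

    def invalidate(self, key: str) -> None:
        """Delete on write so the next read reloads from the DB."""
        self._data.pop(key, None)
```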
Read-Through & Write-Through
Read-through: Cache library loads from DB on miss transparently to app. Write-through: Writes go to cache and DB synchronously — cache always consistent but write latency equals DB latency.
Write-behind (write-back): Writes update cache immediately; async flush to DB. Higher write throughput but risk of data loss on cache crash before persistence — use for analytics counters, not financial balances without durable queue.
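To contrast the two write paths, a toy sketch with dicts standing in for the cache and database (class names are hypothetical). The write-behind class makes the durability risk visible: anything still in `pending` when the process dies never reaches the store.

```python
import queue
from typing import Any


class WriteThroughCache:
    """Persist to the store first, then update the cache, so a successful
    write leaves both in sync (write latency includes the store)."""

    def __init__(self, store: dict[str, Any]):
        self.store = store  # stands in for the database
        self.cache: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.store[key] = value
        self.cache[key] = value


class WriteBehindCache:
    """Update the cache now and queue the write; flush() drains the queue to
    the store later. Queued writes are lost if the process crashes first."""

    def __init__(self, store: dict[str, Any]):
        self.store = store
        self.cache: dict[str, Any] = {}
        self.pending: "queue.Queue[tuple[str, Any]]" = queue.Queue()

    def put(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.pending.put((key, value))

    def flush(self) -> int:
        """Apply queued writes in order; return how many were flushed."""
        flushed = 0
        while True:
            try:
                key, value = self.pending.get_nowait()
            except queue.Empty:
                return flushed
            self.store[key] = value
            flushed += 1
```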
Eviction Policies
| Policy | Behavior | Use when |
|---|---|---|
| LRU | Evict least recently used | General purpose hot set |
| LFU | Evict least frequently used | Stable popularity skew |
| TTL | Time-based expiry | Naturally stale data (feeds, config) |
| Random | Evict a random key; no per-key metadata | Roughly uniform access (e.g. Redis allkeys-random) |
| Size-based | Max memory cap triggers eviction | Redis maxmemory-policy |
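A minimal exact-LRU sketch built on `collections.OrderedDict`. Production caches often approximate LRU instead; Redis, for example, samples keys rather than maintaining a full recency list.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity LRU: get/put move the key to the most-recent end;
    inserting beyond capacity evicts from the least-recent end."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```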
Cache Stampede (Thundering Herd)
When a hot key expires, thousands of requests may miss simultaneously and hammer the database. Mitigations:
- TTL jitter and probabilistic early expiration: randomize expiry so hot keys do not expire together, and let a few requests refresh a key shortly before it expires
- Lock / singleflight — first miss rebuilds; others wait or serve stale
- External pre-warm — background job refreshes hot keys before expiry
- Stale-while-revalidate — return old value while async refresh runs
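Two of the stampede mitigations above fit in a few lines. This is a sketch: the 10% jitter is an assumption, and `should_refresh_early` follows the XFetch-style rule `now - delta * beta * ln(U) >= expiry`, where delta is how long a recompute takes and U is uniform on (0, 1].

```python
import math
import random


def jittered_ttl(base_ttl_s: float, jitter_fraction: float = 0.1, rng=random) -> float:
    """Spread expirations: base TTL +/- up to jitter_fraction (assumed 10%)."""
    return base_ttl_s * (1 + rng.uniform(-jitter_fraction, jitter_fraction))


def should_refresh_early(now: float, expires_at: float, recompute_s: float,
                         beta: float = 1.0, rng=random) -> bool:
    """Probabilistic early expiration.

    -ln(U) is positive, so each reader 'looks ahead' a random amount. Early
    refreshes become likelier as expiry approaches and when recomputation is
    slow; beta > 1 refreshes earlier.
    """
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return now - recompute_s * beta * math.log(u) >= expires_at
```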
TTL Strategy
Short TTL for rapidly changing data (stock prices). Long TTL + explicit invalidation for user profiles. Version keys (user:123:v5) allow instant logical invalidation without scanning Redis.
Negative caching: cache 'not found' briefly to protect DB from repeated lookups for bogus IDs (security scanning, bots).
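A sketch combining negative caching with TTL jitter, assuming an injectable clock and a hypothetical `db_get` that returns None for unknown IDs. A sentinel object distinguishes "the DB said not found" from "not cached yet"; the TTL values are illustrative.

```python
import random
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel: "DB said not found", distinct from "not cached"


def jittered_ttl(base: float, jitter: float = 0.1,
                 rng: Callable[[], float] = random.random) -> float:
    # Spread expiries uniformly over [base*(1-jitter), base*(1+jitter)).
    return base * (1.0 - jitter + 2.0 * jitter * rng())


class NegativeCachingReader:
    def __init__(self, db_get: Callable[[str], Optional[str]], clock: Callable[[], float],
                 positive_ttl: float = 300.0, negative_ttl: float = 30.0) -> None:
        self._db_get = db_get
        self._clock = clock
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._store: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and cached[1] > now:
            return None if cached[0] is _MISSING else cached[0]
        value = self._db_get(key)
        if value is None:
            # Short TTL: if the ID is created later, it becomes visible quickly.
            self._store[key] = (_MISSING, now + self._negative_ttl)
        else:
            self._store[key] = (value, now + jittered_ttl(self._positive_ttl))
        return value
```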
Consistency & Invalidation
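Invalidation strategies: delete the key on write; publish an invalidation event that every app server's local cache consumes; or rely on TTL alone for low-stakes data. For large fleets, asynchronous event-driven invalidation (for example, driven by DB change events) scales better than the writer synchronously broadcasting to every server.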
[Write path]
Client → API → DB commit → publish invalidation
→ subscribers delete cache keys
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, streams | Strings only |
| Persistence | Optional RDB/AOF | Pure memory |
| Clustering / HA | Redis Cluster (sharding), Sentinel (failover) | Client-side consistent hashing |
| Typical use | Sessions, leaderboards, pub/sub | Simple object cache |
Interview Pitfalls
- Caching without stating hit ratio assumption in BOE
- No plan for cold start or cache cluster failure
- Caching personalized data at CDN without Vary: Cookie
- Ignoring memory cost at scale (1M keys × 10 KB = 10 GB)
Cache Key Design
Namespace keys: v1:user:123:profile. Version prefix enables bulk invalidation on schema change. Avoid unbounded key cardinality (per-request keys).
Memcached vs Redis for Pure Cache
Memcached is multithreaded with simple LRU eviction; Facebook famously ran it as a pure cache layer at massive scale. Choose Redis when you need data structures (sorted sets for leaderboards) or persistence.
Multi-Layer Example
Browser cache → CDN → API in-process LRU → Redis → DB, with example hit ratios of 95%, then 90% of the remainder, then 80%: each layer only sees the previous layer's misses.
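The arithmetic behind a layered cache is a product of miss ratios. Assuming, for illustration, that the three ratios above apply to three successive layers and that a hypothetical 100K QPS arrives at the first layer:

```python
from typing import Sequence


def fraction_reaching_db(hit_ratios: Sequence[float]) -> float:
    """Share of requests that miss every cache layer, given per-layer hit ratios."""
    remaining = 1.0
    for ratio in hit_ratios:
        remaining *= 1.0 - ratio  # only this layer's misses fall through
    return remaining


def db_qps(total_qps: float, hit_ratios: Sequence[float]) -> float:
    return total_qps * fraction_reaching_db(hit_ratios)


print(f"{fraction_reaching_db([0.95, 0.90, 0.80]):.4f}")  # 0.0010
print(round(db_qps(100_000, [0.95, 0.90, 0.80])))         # 100
```

So 0.05 × 0.10 × 0.20 = 0.1% of traffic reaches the DB; losing any one layer multiplies DB load by that layer's 1 / miss ratio, which is worth stating in the BOE.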
Worked Example: Product Page Cache
Cache-aside key product:42 TTL 300s. On price update, DELETE key + publish invalidation to local caches. Stampede on flash sale: singleflight + pre-warm top 1000 SKUs.
Q&A
Q: Write-behind for inventory? A: Risky — loss on crash. Use for analytics page views, not stock count.
Interview Question Bank — Caching
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
How do you prevent cache stampede?
Jitter TTL, singleflight, stale-while-revalidate, proactive pre-warm.
Cache-aside vs write-through?
Cache-aside: flexible, app controls. Write-through: stronger consistency, higher write latency.
When is negative caching used?
Repeated lookups for non-existent keys — bots scanning IDs — short TTL prevents DB hammering.
Additional Cache Practice
Review this section with Part 27 walkthroughs — apply cache calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 6 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Cache layers drawn
- Cache-aside flow
- Write-through vs behind
- Eviction policy pick
- TTL + invalidation
- Stampede mitigation
- Redis vs Memcached
- Hit ratio in BOE
- Negative caching
- Cache failure degrade
Self-test prompt
Explain Part 6 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 6 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 7: CDN Deep Dive
What a CDN Does
A Content Delivery Network caches static and cacheable dynamic content at Points of Presence (POPs) geographically distributed near users. Reduces origin load, latency, and egress cost. Essential when BOE shows high read QPS or large asset payloads (images, video segments, JS bundles).
CDN Architecture
[User] → [DNS GeoDNS] → [Edge POP cache]
miss ↓
[Shield / Mid-tier] → [Origin / S3]
Edge POP serves from SSD/RAM. The shield layer collapses origin fetches: many edge misses become one shield-to-origin request. Origin shield protects S3 from thundering herd during viral content.
What to Cache at the Edge
- Static assets: CSS, JS, images with content-hash filenames (immutable)
- Video segments (HLS/DASH .ts chunks) with long TTL
- API responses only if identical for many users (public product catalog)
- Do NOT cache authenticated personalized HTML without careful Vary headers
Cache Control & Headers
| Header | Purpose |
|---|---|
| Cache-Control: max-age | Browser and CDN TTL |
| s-maxage | CDN-specific TTL (shared caches) |
| stale-while-revalidate | Serve stale while fetching fresh |
| ETag / If-None-Match | Conditional GET — 304 saves bandwidth |
| Vary | Cache variants by Accept-Encoding, Cookie, etc. |
Versioned URLs (/static/app.v42.js) allow an effectively unlimited TTL (commonly one year, often marked immutable): invalidation means deploying a new filename. A purge API is still needed for emergency takedown of bad assets.
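A sketch of content-hashed filenames plus matching Cache-Control values. The naming scheme (first 8 hex characters of SHA-256) and the HTML header values are assumptions chosen for illustration, not a standard.

```python
import hashlib


def hashed_asset_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """app.js + bytes -> app.<first 8 hex chars of SHA-256>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def cache_control(immutable_asset: bool) -> str:
    if immutable_asset:
        # Content-addressed: the URL changes whenever the bytes change.
        return "public, max-age=31536000, immutable"
    # HTML entry points: browsers revalidate, the CDN may hold 60 s and
    # serve stale for 30 s while it refetches (example values).
    return "public, max-age=0, s-maxage=60, stale-while-revalidate=30"
```

The HTML page that references `app.<hash>.js` keeps a short TTL, so a deploy propagates within about a minute while the assets themselves never need a purge.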
Dynamic Content Acceleration
CDNs can terminate TLS closer to user, use persistent connections to origin, and route over private backbone (AWS CloudFront to S3). Dynamic Site Accelerator still cannot cache POST responses — focus on connection reuse and TCP optimization.
Video Streaming & CDN
Adaptive bitrate streaming splits video into small files; CDN caches each segment independently. Live streaming uses low-latency protocols (LL-HLS) and origin packagers — harder than VOD. BOE: concurrent viewers × bitrate = egress Gbps.
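The BOE formula as code, with a second helper for segment size (bitrate × duration ÷ 8 bits per byte). The viewer count and bitrate below are hypothetical inputs:

```python
def egress_gbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    return concurrent_viewers * avg_bitrate_mbps / 1_000  # Mbps -> Gbps


def segment_megabytes(bitrate_mbps: float, segment_seconds: float) -> float:
    return bitrate_mbps * segment_seconds / 8  # megabits -> megabytes


# Hypothetical live event: 1M concurrent viewers at an average 5 Mbps.
print(egress_gbps(1_000_000, 5))  # 5000.0 (Gbps, i.e. 5 Tbps at the edge)
print(segment_megabytes(5, 4))    # 2.5 (MB per 4-second segment)
```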
Invalidation & Consistency
Purge by URL, wildcard, or tag (Cloudflare cache-tags). Propagation takes seconds to minutes globally. Prefer immutable assets over purge for routine deploys. For news sites, short TTL + stale-while-revalidate balances freshness and load.
Security at CDN Edge
- DDoS absorption — CDN scales to absorb volumetric attacks
- WAF rules at edge (OWASP Top 10 patterns)
- Bot management, rate limiting before origin
- Geo blocking, IP allowlists for admin paths
Multi-CDN & Failover
Large properties use multiple CDNs for resilience and price negotiation. DNS or traffic manager weighted routing splits traffic. Complexity: cache efficiency drops if same asset on two CDNs — coordinate TTL and purge.
Cost Model
CDN bills per GB egress and request count. Origin egress to CDN is often cheaper than internet egress. Calculate: monthly page views × asset size × (1 - edge_hit_ratio) = origin traffic. Improving hit ratio from 85% to 95% cuts origin load to one third (misses fall from 15% to 5%).
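Plugging hypothetical numbers into the formula shows why hit ratio matters: origin traffic scales with the miss ratio, so going from 15% misses to 5% misses divides it by three.

```python
def origin_traffic_tb(page_views: int, bytes_per_view: float, edge_hit_ratio: float) -> float:
    """Monthly origin egress in TB (decimal units)."""
    return page_views * bytes_per_view * (1.0 - edge_hit_ratio) / 1e12


# Hypothetical: 1B page views/month, 2 MB of cacheable assets per view.
at_85 = origin_traffic_tb(1_000_000_000, 2e6, 0.85)
at_95 = origin_traffic_tb(1_000_000_000, 2e6, 0.95)
print(round(at_85), round(at_95), round(at_85 / at_95, 2))  # 300 100 3.0
```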
Interview Script
"I will put all static media behind a CDN with content-hashed paths and 1-year TTL. API responses stay origin-only unless we have a truly public read API; user-specific data never caches at edge without explicit design."
CDN Providers Comparison (Conceptual)
| Feature | Typical offering |
|---|---|
| Edge locations | 100+ POPs global |
| Origin shield | Reduce origin load |
| Image optimization | Resize on edge |
| Edge compute (e.g., Cloudflare Workers, Lambda@Edge) | Light compute at POP |
Origin Collapse
Without shield: 1000 edge POPs miss simultaneously → 1000 origin requests. Shield tier: 1000 misses → 1 shield fetch → 1 origin. Critical for viral content.
Worked Example: Video Platform
1080p segment 2 MB, 10M views/day on a popular video. The CDN serves 95%, so the origin handles 500K segment fetches: 500K × 2 MB = 1 TB/day, which is manageable, versus 20 TB/day (10M × 2 MB) without a CDN.
Interview Question Bank — CDN
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
What should never be cached at CDN?
Personalized HTML with user PII, uncacheable Set-Cookie responses without Vary.
How does cache poisoning happen?
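Unkeyed inputs, such as a Host or X-Forwarded-Host header, that the origin reflects into the response but the CDN leaves out of the cache key, so one poisoned response is served to everyone. Validate Host, strip or key such headers, and use signed URLs for origin.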
Origin shield benefit?
Collapses many edge misses into one origin fetch during viral traffic.
Additional CDN Practice
Review this section with Part 27 walkthroughs: apply CDN calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 7 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- CDN for static
- Origin shield
- Cache-Control headers
- Immutable hashed assets
- Purge vs version URL
- Personalized not at edge
- Video segments
- Multi-CDN note
- Cost per GB
- DDoS absorption
Self-test prompt
Explain Part 7 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 7 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 8: Databases — SQL vs NoSQL
Choosing SQL vs NoSQL
SQL (relational) databases excel at structured data, ACID transactions, complex joins, and ad-hoc analytics. NoSQL broad category includes document, wide-column, key-value, graph, and time-series — each optimizes for specific access patterns at scale.
| Factor | SQL (Postgres, MySQL) | NoSQL (varies) |
|---|---|---|
| Schema | Fixed, migrations | Flexible or schema-on-read |
| Transactions | Strong ACID | Often per-document or eventual |
| Joins | Native, optimizer | Denormalize or application-side |
| Scale writes | Vertical + sharding harder | Partition-friendly (Cassandra, Dynamo) |
| Query patterns | Ad-hoc SQL | Must know partition key upfront |
Indexes & B-Trees
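Most OLTP databases use B+ trees for indexes: balanced tree, O(log n) lookups, sequential leaf scans for range queries. In InnoDB the primary key is the clustered index, so it determines physical row order; Postgres stores rows in an unordered heap, and its CLUSTER command reorders a table once without maintaining that order on later writes.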
Composite index (user_id, created_at) supports queries that filter on user_id and sort by time. Left-prefix rule: the index generally cannot serve queries that filter only on created_at without user_id (some engines, such as MySQL 8 with skip scan, can use it partially).
- Covering index includes all SELECT columns — avoids table lookup
- Too many indexes slow writes (each index updated on INSERT)
- Full table scan acceptable for rare admin reports, not user path
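The left-prefix and covering-index rules are easy to check empirically. This sketch uses Python's built-in `sqlite3` module as a stand-in for Postgres or MySQL; the table and index names are hypothetical, and planner wording (and behavior, e.g. MySQL 8 skip scan) differs by engine and version.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,"
             " created_at INTEGER, body TEXT)")
conn.execute("CREATE INDEX idx_user_time ON posts (user_id, created_at)")

by_user = query_plan(conn, "SELECT * FROM posts WHERE user_id = ? "
                           "ORDER BY created_at DESC LIMIT 20", (7,))
time_only = query_plan(conn, "SELECT * FROM posts WHERE created_at > ?", (0,))
covering = query_plan(conn, "SELECT user_id, created_at FROM posts WHERE user_id = ?", (7,))

print(by_user)    # expect an index SEARCH on idx_user_time
print(time_only)  # expect a full SCAN: created_at alone breaks the left prefix
print(covering)   # expect a SEARCH using the index as a COVERING INDEX
```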
Normalization vs Denormalization
Normalization (3NF): Eliminate redundancy; joins reconstruct data. Good for OLTP consistency, smaller writes. Denormalization: Duplicate fields to avoid joins at read time — standard in Cassandra, MongoDB feed designs, and read-heavy SQL when join cost dominates.
Interview pattern: normalized writes in OLTP, denormalized read models via CDC to search/feed store (CQRS-lite).
Connection Pooling
Opening a DB connection is expensive (TLS, auth, memory). App servers use pools (PgBouncer, HikariCP) to reuse connections. Pool size ≈ (core_count × 2) + effective_spindle_count per Postgres folklore — but at scale, thousands of microservices × pool size can exhaust max_connections.
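The sizing problem in numbers, as a sketch with hypothetical fleet and server sizes. The cores × 2 + spindles rule sizes active connections at the database server, not per app instance, so the gap between client demand and useful server connections is what PgBouncer must multiplex (Postgres ships with max_connections = 100 by default).

```python
def target_db_connections(db_cores: int, effective_spindles: int) -> int:
    """Rule-of-thumb active connections for the DB server: cores x 2 + spindles."""
    return db_cores * 2 + effective_spindles


def app_connection_demand(app_instances: int, pool_per_instance: int) -> int:
    return app_instances * pool_per_instance


# Hypothetical fleet: 500 app instances with 10-connection pools, 16-core DB
# (its SSD treated here as 1 effective spindle, an assumption).
demand = app_connection_demand(500, 10)  # client connections requested
target = target_db_connections(16, 1)    # busy server connections worth having
print(demand, target, demand // target)  # 5000 33 151
```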
App (500 instances) → PgBouncer (transaction pooling) → Postgres
# Transaction pooling: connection returned after each transaction
Document Stores (MongoDB)
JSON documents, flexible schema, replica sets, sharded cluster by shard key. Good for catalogs, content management, user profiles with nested objects. Avoid unbounded document growth (embedding unbounded arrays).
Wide-Column (Cassandra, HBase)
Partition key determines node; clustering columns sort within partition. Optimized for high write throughput and time-series. Query must include partition key — designing access patterns first is mandatory.
Key-Value (DynamoDB, Redis)
Simple get/put by key, predictable latency at scale. DynamoDB: partition key + optional sort key, on-demand or provisioned capacity, GSIs for alternate access patterns (with consistency caveats).
Graph Databases
Neo4j, Neptune for relationship-heavy queries (social graph friends-of-friends, fraud rings). Not a replacement for primary OLTP at billion-user scale — often specialized subgraph service.
Operational Concerns
- Backup, PITR, replication lag monitoring
- Migration strategy (expand-contract, dual-write)
- Read replica routing for analytics vs user traffic
[Write] → Primary SQL
[Read hot path] → Redis → optional replica
[Analytics] → Read replica / warehouse (never on primary)
Index Types Beyond B-Tree
- Hash index: equality only (Postgres hash, limited use)
- GIN/GiST: full-text, JSON, geo in Postgres
- Column store: analytics (Redshift, ClickHouse)
Migration at Scale
Online schema change: gh-ost, pt-online-schema-change copy rows in background. Expand-contract: add nullable column → dual-write → backfill → switch reads → remove old.
Read Path Routing
ORM must distinguish writer vs reader endpoints. Stale replica reads acceptable for dashboards, not for "withdraw balance" immediately after deposit.
Worked Example: Social Graph in SQL
follows(follower_id, followee_id, created_at) with composite index (follower_id, created_at). To build a feed of followees' recent posts, JOIN posts on author = followee_id. When one account has 10M followers, move that celebrity's posts to a separate, denormalized fan-out pipeline instead of joining at read time.
Interview Question Bank — Databases
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
B-tree vs LSM-tree?
B-tree: better read, default OLTP. LSM (RocksDB): better write throughput, compaction overhead.
When denormalize?
Read path >> write, join cost high, acceptable inconsistency window with CDC refresh.
Connection pool exhaustion symptom?
Timeouts under load while CPU low — increase pool cautiously or use PgBouncer.
Additional DB Practice
Review this section with Part 27 walkthroughs: apply DB calculations to each classic problem.
| Exercise | Goal |
|---|---|
| Recalculate QPS | Under 2 min without notes |
| Identify bottleneck | Label on diagram |
| Propose mitigation | With trade-off sentence |
Part 8 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- SQL vs NoSQL table
- B-tree index
- Composite index rule
- Normalize vs denorm
- Connection pooling
- Read replica routing
- Shard when needed
- Migration strategy
- Covering index
- Avoid SELECT *
Self-test prompt
Explain Part 8 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 8 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 9: Replication
Why Replicate Data
Replication copies data across multiple nodes for read scalability, lower latency (geo-local reads), and fault tolerance. In interviews, always pair replication with a consistency story: synchronous replication favors durability; asynchronous favors write latency.
Leader-Follower (Primary-Replica)
One leader accepts all writes; followers tail the write-ahead log (WAL) or binlog. PostgreSQL streaming replication, MySQL binlog replication, and MongoDB replica sets follow this pattern. Reads can hit followers to scale SELECT traffic.
| Aspect | Detail |
|---|---|
| Write path | Client → leader only |
| Read path | Leader or any follower (may be stale) |
| Failover | Promote follower via Patroni, Orchestrator, RDS Multi-AZ |
| Risk | Replication lag → stale reads; split-brain if fencing fails |
Synchronous vs Asynchronous Replication
| Mode | Behavior | Trade-off |
|---|---|---|
| Synchronous | Leader waits for follower ACK before commit | No lost committed writes if leader dies; higher write latency |
| Asynchronous | Leader commits locally; followers catch up later | Lower latency; possible data loss on leader crash |
| Semi-sync | Wait for at least one follower | Balance of durability and latency |
Expose replication_lag_seconds as a metric. Route critical reads (balance, inventory) to leader or use linearizable reads; route timelines to followers with "may be stale" UX.
Multi-Leader (Multi-Primary)
Multiple nodes accept writes — useful for multi-datacenter active-active. Conflicts are inevitable when two leaders update the same row. Resolution strategies:
- Last-write-wins (LWW): Timestamp-based; simple but can drop updates
- Vector clocks / version vectors: Track causality; surface conflicts to application
- CRDTs: Data structures that merge without conflicts (counters, sets) — good for collaborative editing
Interview probe: "Two users like the same post from different regions simultaneously — how do you merge counts?" Answer with idempotent increments or CRDT counters.
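A grow-only counter (G-Counter) is the simplest CRDT answer to the like-count probe. In this sketch (class and replica names hypothetical), each region increments only its own slot and merge takes the per-slot maximum, so merges are commutative, associative, and idempotent. Unlikes would need a PN-Counter (two G-Counters), and deduplicating per-user likes is a separate concern.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```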
Leaderless Replication (Quorum)
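Dynamo-style systems (the original Amazon Dynamo design, Cassandra, Riak): no single leader. Replication factor N; write quorum W; read quorum R. If W + R > N, every read quorum overlaps every write quorum, so each read reaches at least one replica holding the latest acknowledged write. Sloppy quorums and concurrent writes can still break this, so treat it as strong reads for that configuration rather than full linearizability.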
N=3, W=2, R=2 → tolerate 1 node failure, strong reads
N=3, W=1, R=1 → fast but weak; eventual consistency
Hinted handoff: Temporarily store writes for down nodes. Read repair: On read, detect stale replicas and update them. Anti-entropy: Background Merkle-tree comparison fixes drift.
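A brute-force check of the quorum rule, as a sketch: it enumerates every possible write set and read set and confirms they always share a node exactly when W + R > N (pigeonhole principle). It ignores sloppy quorums and hinted handoff, which deliberately relax this guarantee.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every possible write set share a node with every read set?"""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))


def tolerated_node_failures(n: int, w: int, r: int) -> int:
    """Nodes that can be down while both reads and writes still reach quorum."""
    return n - max(w, r)


for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1)]:
    print(n, w, r, quorums_always_overlap(n, w, r), tolerated_node_failures(n, w, r))
```

Note the (3, 3, 1) row: reads are cheap and strong, but a single down node blocks all writes.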
Change Data Capture (CDC)
Stream WAL/binlog to Kafka (Debezium) → search index, warehouse, cache invalidation. Avoids dual-write bugs where app writes DB and search separately and they diverge.
[Leader DB] → WAL → CDC connector → Kafka → [Consumers]
├→ Elasticsearch
├→ Data warehouse
└→ Cache invalidation
Replication Topology Diagram
Leader-Follower:
Writes → [Leader] ──repl──→ [Follower1]
└──repl──→ [Follower2]
Reads → any node (stale OK?)
Multi-Leader:
[DC-East Leader] ←──conflict──→ [DC-West Leader]
Leaderless (N=3):
Client writes to any 2 of 3 nodes (W=2)
Interview Checklist
- State sync vs async and what happens when leader dies mid-write
- How followers are chosen for failover (lag, priority)
- Whether reads need strong consistency or eventual is acceptable
- How cross-region replication affects CAP trade-offs
Script: "I use leader-follower with async replication for the feed service — followers serve 95% of reads. Payment ledger reads go to the leader or a sync replica because we cannot tolerate lost commits."
Split-Brain Prevention
Fencing: isolate old leader via STONITH or lease in etcd before promoting replica. TTL lease shorter than failover detection time.
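Leases can still be violated by a leader that pauses (GC, VM freeze) and wakes up believing it is in charge. A common complement is a fencing token: the lock service hands out a strictly increasing number with each grant, and storage rejects writes carrying an older one. A minimal sketch; `LockService` and `FencedStorage` are hypothetical stand-ins for a coordination service and a storage layer.

```python
from typing import Dict


class LockService:
    """Stand-in for a lease/lock service; each grant carries a larger token."""

    def __init__(self) -> None:
        self._token = 0

    def grant_leadership(self) -> int:
        self._token += 1
        return self._token


class FencedStorage:
    """Storage that rejects writes carrying an older token than it has seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> None:
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value
```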
Read Replica Routing
PgBouncer + ORM: route analytics queries to replicas through an ORM-level replica hint or a separate reader endpoint. Causal consistency: read from a replica that has applied at least transaction T.
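A sketch of the routing decision, assuming each replica reports the log position (LSN) it has applied and the user's session token carries the LSN of their latest write. Names and the tie-break rule are illustrative:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    applied_lsn: int  # last log position this replica has replayed


def choose_read_target(leader: str, replicas: List[Replica],
                       session_lsn: int, critical: bool) -> str:
    """session_lsn: commit position of this user's latest write (from their session token)."""
    if critical:
        return leader  # balances, inventory: never read possibly stale data
    caught_up = [r for r in replicas if r.applied_lsn >= session_lsn]
    if caught_up:
        # Prefer the most caught-up replica (a real router would also weigh load).
        return max(caught_up, key=lambda r: r.applied_lsn).name
    return leader  # no replica has the user's write yet: fall back to the leader
```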
Lag Monitoring
| Alert | Action |
|---|---|
| lag > 30s | Page DBA |
| lag > 5min | Exclude replica from failover promotion |
Worked Example: Replication
Leader failover in 30s: promote replica, update DNS/VIP, invalidate connection pools. Clients retry with backoff. Apps must handle brief write errors during failover.
Extended Notes
Connect replication to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Replication
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Sync replication when?
Financial ledger, leader election metadata — when lost write unacceptable.
Read replica lag handling?
Route critical reads to primary; show 'syncing' UX for non-critical stale reads.
What is split-brain?
Two nodes both think they are leader — use fencing and quorum.
Extended Reference — Replication
Write path latency
Synchronous replication adds RTT to nearest replica per commit — measure p99 write impact before enabling on hot path.
Semi-synchronous 'at least one replica' is popular compromise in MySQL production clusters.
Failover testing
Game day: kill primary during load test; measure detection time, promotion time, client error rate.
Applications must reconnect: pooled connections stay pinned to the old primary's IP until the pool is refreshed.
Global readers
Geo-routed read replicas serve local users; replication lag means EU user may not see US write for seconds.
Causal tracking: Google Spanner TrueTime; application-level: version tokens in API responses.
Binlog consumption
Multiple consumers read same binlog stream for search, warehouse, cache — coordinate retention size.
Binlog growth is a disk-space risk: monitor it and archive old binlogs to S3.
Interview diagram
Draw primary + 2 replicas; label sync vs async arrows; mark read traffic to replicas with 'stale OK' note.
Part 9 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Leader-follower diagram
- Sync vs async
- Replication lag metric
- Failover fencing
- Multi-leader conflicts
- Quorum W+R>N
- Read repair
- CDC pipeline
- Split-brain prevent
- Never hide lag
Self-test prompt
Explain Part 9 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 9 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 10: Partitioning & Sharding
Partitioning vs Sharding
Partitioning splits data within one database (Postgres table partitions by date). Sharding distributes partitions across independent database servers. Interviews often use the terms interchangeably for horizontal scale-out.
Partitioning Strategies
| Strategy | Key | Pros | Cons |
|---|---|---|---|
| Range | user_id 1–1M on shard A | Range queries efficient | Hot spots on latest range |
| Hash | hash(user_id) mod N | Even distribution | Range scans across shards expensive |
| Geo | country/region | Data locality, compliance | Uneven country sizes |
| Directory | lookup table shard_id | Flexible rebalancing | Lookup service is SPOF unless replicated |
Choosing a Shard Key
The shard key determines query locality forever. Good keys: high cardinality, even distribution, align with dominant query pattern.
- Good: user_id for user-scoped data; all of a user's queries hit one shard
- Bad: country if the US is 40% of traffic; hot shard
- Bad: created_at alone; all writes hit the "today" shard
Composite keys ((tenant_id, user_id)) help SaaS multi-tenancy isolate noisy neighbors.
Hot Keys & Hot Shards
Celebrity problem: one logical key (Beyoncé's tweet ID) receives disproportionate traffic. Mitigations:
- Split key: logical key → 100 random suffix keys; read aggregates
- Local cache: in-process cache on each API server for hot entities
- Separate service: dedicated read path for global counters (Redis INCR sharded)
- CDN / edge: for read-heavy public content
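The split-key mitigation as a sketch, assuming a dict stands in for a sharded key-value store and `likes:123` is a hypothetical hot key. Writes spread across sub-keys; reads pay by summing them, which is why the total is usually cached or aggregated asynchronously.

```python
import random
from typing import Callable, Dict


class SplitCounter:
    """Spread one hot logical counter over `splits` sub-keys, e.g. likes:123#0..#99."""

    def __init__(self, key: str, splits: int = 100,
                 pick: Callable[[int], int] = random.randrange) -> None:
        self.key = key
        self.splits = splits
        self._pick = pick

    def subkey(self, i: int) -> str:
        return f"{self.key}#{i}"

    def incr(self, store: Dict[str, int]) -> None:
        # Each write lands on a random sub-key, so writes spread across shards.
        sub = self.subkey(self._pick(self.splits))
        store[sub] = store.get(sub, 0) + 1

    def total(self, store: Dict[str, int]) -> int:
        # Reads pay the cost: fetch all sub-keys and sum (cache this result).
        return sum(store.get(self.subkey(i), 0) for i in range(self.splits))
```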
Cross-Shard Operations
Joins across shards require scatter-gather (query all shards, merge) — expensive. Design schemas so hot queries are single-shard. Global secondary indexes (DynamoDB GSI) replicate data under alternate keys at write cost.
Resharding
When N shards is insufficient, move from 256 to 512 shards. Strategies:
- Fixed partitions: 4096 logical partitions mapped to shards; move partitions between shards without changing app hash
- Dual-write: write to old and new shard during migration
- Backfill: copy data with CDC; cutover when caught up
- Consistent hashing: only K/N keys move when adding a node (see Part 17)
[Router] hash(user_id) → shard map → [Shard 0] [Shard 1] ... [Shard N]
hot key? → local cache / key splitting
Elasticsearch / Cassandra Sharding Notes
Elasticsearch: index split into shards + replicas; routing by document ID. Cassandra: partition key required in every query; clustering columns for sort within partition.
Interview Pitfalls
- Sharding too early — single Postgres with read replicas handles surprising scale
- Shard key that does not match access pattern
- No plan for resharding or tenant growth
Uber Ringpop / Scuttlebutt
Service discovery + shard ownership — gossip protocol distributes shard map to nodes.
Vitess (YouTube)
MySQL sharding middleware: VTGate routes SQL by sharding key; resharding with minimal app change.
Interview: Design Sharded DB
Start with hash(user_id) mod 256 logical shards mapped to 32 physical MySQL instances. Router layer in app or sidecar.
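A sketch of the logical-shard indirection: the application always hashes into a fixed number of logical shards, and only a small map decides which physical instance owns each one. The MD5-based hash and the rebalancing rule (move just enough logical shards to the new instance) are assumptions for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict

LOGICAL_SHARDS = 256  # fixed forever: the app's hash never changes


def key_hash(user_id: int) -> int:
    return int.from_bytes(hashlib.md5(str(user_id).encode()).digest()[:8], "big")


def logical_shard(user_id: int) -> int:
    return key_hash(user_id) % LOGICAL_SHARDS


def initial_map(physical: int) -> Dict[int, int]:
    return {ls: ls % physical for ls in range(LOGICAL_SHARDS)}


def physical_shard(user_id: int, shard_map: Dict[int, int]) -> int:
    return shard_map[logical_shard(user_id)]


def add_physical_shard(shard_map: Dict[int, int], new_shard: int) -> Dict[int, int]:
    """Move just enough logical shards onto the new instance to even out load."""
    counts = Counter(shard_map.values())
    target = LOGICAL_SHARDS // (len(counts) + 1)
    new_map = dict(shard_map)
    moved = 0
    for ls in sorted(shard_map):
        if moved >= target:
            break
        owner = new_map[ls]
        if counts[owner] > target:  # take only from instances above the target
            new_map[ls] = new_shard
            counts[owner] -= 1
            moved += 1
    return new_map
```

Adding a 33rd instance this way moves 7 of 256 logical shards (under 3% of data), whereas switching from hash mod 32 to hash mod 33 would remap roughly 97% of keys.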
Worked Example: Sharding
Reshard user_id 0-1M from shard A to new shard B: dual-write phase, backfill historical rows, verify counts, switch reads, stop writes to A.
Extended Notes
Connect sharding to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Sharding
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Hot shard mitigation?
Split key, local cache, async aggregation, dedicated hardware for hot tenant.
How to choose shard count?
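Plan for about 2× headroom: pick a shard count so each shard holds roughly half its comfortable capacity at the projected data size, and use virtual nodes or many logical partitions so growth moves partitions instead of rehashing.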
Cross-shard query?
Scatter-gather parallel queries + merge — expensive; redesign access pattern if frequent.
Extended Reference — Partitioning & Sharding
Shard map service
Directory service stores range → shard mapping; update map during migration without client redeploy if using discovery API.
Co-location
Place related entities on same shard: user_id shard carries user profile, settings, private posts — avoids cross-shard transactions.
Secondary indexes
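A local (document-partitioned) secondary index lives on each shard next to its rows, so querying by the indexed attribute scatters to every shard: high cost. A global (term-partitioned) index such as a DynamoDB GSI is partitioned by the indexed attribute itself, so the query hits one partition, at the price of extra write work and eventually consistent index reads.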
Rebalancing
Consistent hash minimizes movement; still schedule low-traffic window; throttle migration bandwidth.
Monitoring
Per-shard QPS, storage, replication lag heatmap — detect hot shard before outage.
Part 10 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Shard key choice
- Hash vs range
- Hot key mitigations
- Cross-shard cost
- Resharding plan
- Directory lookup
- Co-locate related data
- Scatter-gather aware
- Vitess mention OK
- Monitor per shard
Self-test prompt
Explain Part 10 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 10 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 11: CAP, PACELC & Consistency Models
CAP Theorem
In a network partition (P), a distributed system must choose between consistency (C) and availability (A). You cannot have all three in the strict sense during a partition.
- CP: Refuse writes/reads until consensus (ZooKeeper, etcd, HBase) — correct but may be unavailable
- AP: Accept requests; replicas may diverge (Cassandra tunable, DynamoDB eventual)
Most production systems are not purely one letter — they offer tunable consistency per operation.
PACELC Extension
If there is a Partition (P), choose Availability (A) or Consistency (C); Else (E, normal operation), choose Latency (L) or Consistency (C). Even without a partition, you still trade sync replication latency against strong consistency.
Consistency Models (Weakest to Strongest)
| Model | Guarantee | Example |
|---|---|---|
| Eventual | Replicas converge if no new writes | DNS, Cassandra default |
| Read-your-writes | User sees own updates | Session stickiness or user-scoped routing |
| Monotonic reads | No going backward in time | Route user to same replica |
| Consistent prefix | Reads see writes in the order they were made, with no gaps | Kafka partition ordering |
| Linearizable | Appears instantaneous global order | etcd, Spanner TrueTime |
| Serializable | Transactions as if serial order | Postgres SERIALIZABLE |
Linearizability vs Serializability
Linearizability: single-object, real-time order — register read sees latest write. Serializability: multi-object transaction isolation — no interleaving anomalies. Spanner provides external consistency via TrueTime bounded clock uncertainty.
Practical Interview Mapping
| Product feature | Typical choice |
|---|---|
| Social feed | Eventual + read-your-writes |
| Like counter | Eventual or CRDT; approximate OK |
| Inventory / seat booking | Strong consistency, transactions |
| Chat messages | Per-channel ordering (Kafka partition) |
| Config flags | Eventual with short TTL |
Quorum Recap
W + R > N makes reads observe the latest acknowledged write, at the cost of waiting for W acknowledgements on every write and R responses on every read. Mention tunable per query in Cassandra (ONE vs QUORUM vs ALL).
Clocks & Ordering
Lamport clocks, vector clocks, and hybrid logical clocks (HLC) order events without perfect sync clocks. Never assume NTP is perfect — design for clock skew in distributed IDs (Snowflake uses time + machine id).
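A Snowflake-style ID generator shows how time plus machine id (plus a per-millisecond sequence) yields unique, roughly time-ordered IDs without coordination. Bit widths follow the commonly described 41/10/12 layout; the custom epoch and class name are assumptions. Note the explicit check for a clock that moves backwards.

```python
import threading
import time
from typing import Callable


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeLikeIds:
    """64-bit IDs: 41-bit ms timestamp | 10-bit machine id | 12-bit sequence."""

    def __init__(self, machine_id: int, clock_ms: Callable[[], int] = _now_ms,
                 epoch_ms: int = 1_700_000_000_000) -> None:
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._clock = clock_ms
        self._epoch = epoch_ms
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock stepped backwards (e.g. NTP correction): refuse rather than
                # risk reissuing IDs already handed out for those milliseconds.
                raise RuntimeError(f"clock moved back {self._last_ms - now} ms")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:  # 4096 IDs used this ms: wait for the next ms
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - self._epoch) << 22) | (self.machine_id << 12) | self._seq
```

IDs from different machines sort by time only as well as those machines' clocks agree, which is exactly the skew caveat above.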
Partition happens:
CP system: some nodes reject traffic → lower availability
AP system: nodes diverge → need merge / conflict resolution later
Script: "Feeds are AP — we accept eventual consistency with 30s staleness on followers. Seat reservation is CP on the shard leader with row-level locking."
Dynamo Paper Takeaways
Consistent hashing + quorum + sloppy quorum + hinted handoff — foundation for AP systems.
Google Spanner
TrueTime API bounds clock uncertainty → external consistency globally. Not magic — GPS/atomic clocks in datacenters.
Session Guarantees in Practice
Sticky sessions + read-your-writes: route the same user to the primary, or to a replica whose applied LSN is at least the LSN recorded in the user's session token.
Worked Example: CAP
Bank transfer during partition: CP choice — reject transfer if cannot reach quorum. Social like during partition: AP — accept like, merge count later.
Extended Notes
Connect CAP to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here; prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — CAP & Consistency
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Is CAP a theorem to cite blindly?
Explain partition behavior practically — tunable quorums, not binary CAP labels.
Linearizable example?
Distributed lock, leader election — user expects immediate global visibility.
Eventual consistency user impact?
Delayed notification count, duplicate like possible — product must accept or merge.
Extended Reference — CAP & Consistency
PACELC in interview
Normal operation: choose between latency and consistency — sync replication is LC trade-off.
Client-side choices
DynamoDB ConsistentRead=true on GetItem; Cassandra QUORUM vs ONE per query.
Session tokens
Return version with write; client passes version on read — server routes to replica ≥ version.
Split brain during partition
AP system may accept conflicting writes — product must define merge UX.
Avoid CAP buzzword only
Explain concrete failure: 'If link between DCs drops, we pause writes to enforce CP for wallet.'
Part 11 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- CAP during partition
- PACELC else branch
- Consistency models list
- Linearizable example
- Eventual product OK
- Read-your-writes how
- Tunable quorum
- Clock skew aware
- Not buzzword only
- Map feature to model
Self-test prompt
Explain Part 11 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 11 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 12: Distributed Transactions
Why Distributed Transactions Are Hard
A transaction spanning multiple databases or services cannot use a single node's lock manager. Network failures leave systems in partial states. Interviews favor pragmatic patterns over pure 2PC unless banking-level ACID is required.
Two-Phase Commit (2PC)
The coordinator runs a prepare (vote) phase, then a commit phase. Every participant must vote yes before the coordinator commits.
- Prepare: Participants lock resources, vote yes/no
- Commit: If all yes, coordinator sends commit; else abort
Problems: blocking if the coordinator dies after prepare (participants keep their locks until it recovers); extra round-trip latency; poorly suited to unreliable WAN links. Used inside distributed databases (Spanner, distributed Postgres experiments) more than across microservices.
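The prepare/commit flow can be sketched with in-memory participants. This is a simplification: the hypothetical `Participant` class stands in for real resource managers, and the sketch omits the coordinator's durable decision log and timeouts, which is exactly where 2PC's blocking problem lives.

```python
class Participant:
    """In-memory participant; `will_vote_yes` simulates whether prepare succeeds."""

    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: lock resources, persist a prepared record, then vote
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (prepare/vote): stop collecting votes at the first "no"
    all_yes = True
    for p in participants:
        if not p.prepare():
            all_yes = False
            break
    # Phase 2: commit only if every participant voted yes, otherwise abort everyone
    for p in participants:
        if all_yes:
            p.commit()
        else:
            p.abort()
    return all_yes


ok = two_phase_commit([Participant("inventory-db"), Participant("payments-db", will_vote_yes=False)])
print(ok)  # False: one "no" vote aborts the whole transaction
```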
Saga Pattern
Sequence of local transactions with compensating actions on failure. Choreography (events) vs orchestration (central coordinator).
| Step | Action | Compensate |
|---|---|---|
| 1 | Reserve inventory | Release inventory |
| 2 | Charge payment | Refund payment |
| 3 | Ship order | Cancel shipment |
Compensations must be idempotent — retries are inevitable. Sagas are eventually consistent; not a substitute for single-node ACID when you need atomic debit+credit.
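A minimal orchestration-style saga runner, assuming each step is a hypothetical (name, action, compensate) triple held in memory. A production orchestrator would persist saga state and retry compensations, which is why they must be idempotent.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


def decline_card() -> None:
    raise RuntimeError("card declined")


log: List[str] = []
ok = run_saga(
    [
        ("reserve_inventory", lambda: None, lambda: None),
        ("charge_payment", decline_card, lambda: None),
        ("ship_order", lambda: None, lambda: None),
    ],
    log,
)
print(ok, log)
```

Only completed steps are compensated: the failed payment releases the inventory reservation, and shipping never starts.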
Transactional Outbox
Write business row + outbox event in same local DB transaction. Relay process publishes to Kafka. Consumers achieve at-least-once; idempotent handlers required.
BEGIN;
INSERT INTO orders ...;
INSERT INTO outbox (topic, payload) ...;
COMMIT;
-- separate relay: read outbox → publish → mark sent
Idempotency
Duplicate requests must not double-charge or double-ship. Store idempotency_key with unique constraint; return cached response on replay.
- Client generates UUID per user action
- Server stores (key → response) with TTL 24h
- Payment APIs (Stripe) mandate idempotency keys
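An in-memory sketch of the key → response store with a 24-hour TTL. `IdempotencyStore` is a hypothetical name; a real service would use a database table with a unique constraint, because this dict version does not stop two concurrent duplicates from both running the handler.

```python
import time
import uuid
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory key -> response cache with a TTL (stand-in for an idempotency_keys table)."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def execute(self, key: str, handler: Callable[[], dict]) -> dict:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # replay: return the cached response without re-running side effects
        response = handler()
        self._entries[key] = (now, response)
        return response


charges = []


def charge_card() -> dict:
    charges.append(100)
    return {"status": "charged", "amount": 100}


store = IdempotencyStore()
key = str(uuid.uuid4())  # client generates one key per user action
first = store.execute(key, charge_card)
retry = store.execute(key, charge_card)  # e.g. client retried after a timeout
print(first == retry, len(charges))  # True 1
```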
TCC (Try-Confirm-Cancel)
Reserve resources in the try phase, then confirm or cancel them. Like a saga, but with explicit resource holds; used in some Chinese payment ecosystems.
When to Use What
| Pattern | Use when |
|---|---|
| Local ACID only | Single service owns all data |
| Outbox + events | Notify other services reliably |
| Saga | Multi-service workflow with compensations |
| 2PC | Rare; internal to specialized DB |
Interview Example: Order Service
"Order service writes order + outbox in Postgres. Payment service consumes PaymentRequested event, calls Stripe with idempotency key. On failure, publishes PaymentFailed; order service runs compensating cancel saga step."
Outbox vs Dual Write
| Approach | Risk |
|---|---|
| Dual write DB+Kafka | One succeeds one fails — inconsistent |
| Outbox | Single transaction; relay may lag |
Idempotency Table Schema
CREATE TABLE idempotency_keys (
key VARCHAR(64) PRIMARY KEY,
response_body JSONB,
created_at TIMESTAMPTZ
);
Poison Message Handling
After N failed saga steps, move to manual review queue — do not infinite retry charging user.
Worked Example: Transactions
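Order saga: reserve → pay → ship. If payment fails after the reservation succeeds, compensate by releasing inventory; if shipping fails after payment, refund and then release inventory. Each step updates a state-machine row keyed by saga_id.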
Extended Notes
Connect transactions to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Distributed Transactions
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Why avoid 2PC in microservices?
Blocking, coordinator SPOF, latency — use saga/outbox instead.
Outbox relay failure?
Relay retries; at-least-once delivery; consumers idempotent.
Saga compensation failure?
Manual intervention queue; alert; never silent money loss.
Extended Reference — Distributed Transactions
Outbox ordering
Relay publishes in order per aggregate id — consumers depend on order for state machine.
Saga timeouts
Each step has deadline; timeout triggers compensate — avoid stuck saga occupying inventory.
Duplicate event handling
Consumer stores processed event_id; unique constraint prevents double ship.
Testing sagas
Inject failure after step 2 in integration test; verify compensate called once.
vs local transaction
Prefer single-service ACID when boundary allows — extract service only when necessary.
Part 12 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- 2PC limitations
- Saga compensate
- Outbox pattern
- Idempotency keys
- At-least-once consumers
- Poison saga handling
- TCC optional mention
- Prefer local TX
- Event ordering
- Test failure injection
Self-test prompt
Explain Part 12 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 12 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 13: Message Queues & Streams
Queues vs Logs
Message queue (RabbitMQ, SQS): message deleted after ack — task distribution. Log / stream (Kafka, Pulsar): messages retained; consumers track offset — replay and multiple consumer groups.
Kafka Core Concepts
| Term | Meaning |
|---|---|
| Topic | Named stream of records |
| Partition | Ordered, immutable sequence; parallelism unit |
| Offset | Position in partition |
| Consumer group | Partitions divided among consumers in group |
| Replication | Leader + ISR followers per partition |
Throughput scales with partition count. Key by user_id to preserve per-user ordering.
Ordering Guarantees
- Within partition: strict order
- Across partitions: no global order
- Fix: partition key = entity id needing order (order_id, user_id)
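To see why keying by entity id preserves per-entity order, here is a sketch that hashes keys to partitions. It uses `zlib.crc32` purely as a stable stand-in; Kafka's default partitioner uses murmur2, so real partition numbers will differ.

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so a given key always maps to the same partition.
    # crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-17", "created"),
    ("order-42", "created"),
    ("order-17", "paid"),
    ("order-17", "shipped"),
]
partitions: Dict[int, List[Tuple[str, str]]] = {}
for key, event in events:
    # Appending per partition models the partition log: order is kept within it
    partitions.setdefault(partition_for(key, 12), []).append((key, event))

p = partition_for("order-17", 12)
print(p, [e for k, e in partitions[p] if k == "order-17"])  # order-17 events stay in order
```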
Delivery Semantics
| Semantic | Meaning | How |
|---|---|---|
| At-most-once | May lose messages | Fire-and-forget, no retry |
| At-least-once | May duplicate | Retry until ack; idempotent consumer |
| Exactly-once | Hard end-to-end | Kafka transactions + idempotent producer + dedup DB |
Interview default: at-least-once + idempotent handlers. Exactly-once is expensive; justify for billing.
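A minimal idempotent consumer, assuming each event carries a unique `event_id` and the handler's side effect (a hypothetical ledger insert) lives in the same database, so the dedup record and the effect commit in one transaction. sqlite3 stands in for the service's real store.

```python
import sqlite3


class IdempotentConsumer:
    """At-least-once consumer that drops redelivered events by event_id."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ledger (order_id TEXT NOT NULL, amount INTEGER NOT NULL)"
        )

    def handle(self, event_id: str, order_id: str, amount: int) -> bool:
        try:
            with self.conn:  # dedup marker and side effect commit atomically
                self.conn.execute(
                    "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
                )
                self.conn.execute(
                    "INSERT INTO ledger (order_id, amount) VALUES (?, ?)", (order_id, amount)
                )
            return True
        except sqlite3.IntegrityError:
            return False  # event_id already seen: the redelivery is a no-op


consumer = IdempotentConsumer(sqlite3.connect(":memory:"))
print(consumer.handle("evt-1", "order-17", 50))  # True
print(consumer.handle("evt-1", "order-17", 50))  # False (duplicate delivery)
```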
Consumer Groups
Each partition consumed by at most one consumer in a group. Scale consumers ≤ partition count. Rebalance on consumer join/leave — causes brief pause; use cooperative sticky assignors in production.
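The one-owner-per-partition rule can be illustrated with a simple round-robin assignment. This is not Kafka's actual range or cooperative-sticky assignor, only a sketch of the invariant and of why consumers beyond the partition count sit idle.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions over one group's consumers; every partition gets exactly one owner."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign_partitions([0, 1, 2], ["A", "B"]))    # {'A': [0, 2], 'B': [1]}
print(assign_partitions([0, 1], ["A", "B", "C"]))  # {'A': [0], 'B': [1], 'C': []}
```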
Backpressure & Retention
Retention policy (7 days default) bounds disk. Slow consumers fall behind (lag). Monitor consumer lag alert. Dead-letter queue (DLQ) for poison messages after N failures.
Use Cases
- Async jobs: email, thumbnails, search indexing
- Event sourcing / CDC propagation
- Metrics aggregation pipeline
- Decouple peak write spikes from slow processors
Producer → [Topic: orders]
  ├─ partition 0 → Consumer A (group billing)
  ├─ partition 1 → Consumer B (group billing)
  └─ partition 2 → Consumer A (group billing)
  Consumer C (group analytics) reads partitions 0–2 independently, with its own offsets
RabbitMQ vs Kafka
|  | RabbitMQ | Kafka |
|---|---|---|
| Model | Queue, routing | Distributed log |
| Replay | Limited | Native by offset |
| Throughput | High | Very high |
| Routing | Exchanges, bindings | Topic + key |
Partition Sizing
Target 10–100 MB/s per partition; too few partitions limits parallelism; too many increases broker overhead.
Kafka vs SQS
|  | Kafka | SQS |
|---|---|---|
| Ordering | Per partition | FIFO queues only |
| Retention | Days+ | 14 days max |
| Consumers | Pull, groups | Competing consumers |
Event Schema Evolution
Avro/Protobuf with schema registry; backward compatible field addition; never remove required fields without version bump.
Worked Example: Kafka
Order events keyed by order_id preserve per-order ordering. 12 partitions → max 12 parallel consumers in group.
Extended Notes
Connect Kafka to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here; prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Message Queues
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Kafka partition key choice?
Entity needing ordering — order_id, user_id — not random if order matters.
At-least-once duplicate handling?
Idempotent consumer: upsert by event_id, check processed table.
When queue vs direct RPC?
Async, burst absorption, fan-out to many consumers, decouple peak load.
Extended Reference — Message Queues & Streams
Message size
Kafka default 1MB max; large payloads store S3 pointer in message body.
Compaction
Log compaction retains the latest value for each key in a topic, which suits changelog topics for config/state.
Consumer lag SLO
Alert lag > 60s for billing pipeline; > 5min for analytics acceptable.
Ordering vs parallelism
More partitions = more parallelism but no global order — business must accept per-entity order only.
Poison pill
Message fails parse — DLQ after 3 tries; manual fix schema or skip with audit.
Part 13 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Queue vs log
- Kafka partitions
- Consumer groups
- Delivery semantics
- Idempotent consumer
- DLQ
- Lag monitoring
- Partition key
- Message size S3
- Schema evolution
Self-test prompt
Explain Part 13 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 13 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 14: Microservices vs Monolith
Monolith First
A single deployable application with shared database is faster to build and debug. Many successful products scale monoliths vertically and with read replicas before splitting. Interview tip: do not jump to 50 microservices without scale pain.
When to Split Services
- Independent scaling (video transcoding vs API)
- Different release cadences per team
- Technology fit (Python ML vs Go API)
- Fault isolation (billing outage must not take down feed)
- Regulatory boundaries (PCI scope reduction)
Microservices Challenges
| Challenge | Mitigation |
|---|---|
| Distributed debugging | Tracing (Jaeger), correlation IDs |
| Data consistency | Sagas, outbox, eventual consistency |
| Network latency | Batch APIs, avoid chatty chains |
| Operational overhead | K8s, Helm, service mesh maturity |
| Testing | Contract tests, staging environments |
API Gateway
Single entry for clients: auth, rate limiting, routing, SSL termination, request aggregation (BFF pattern for mobile vs web).
[Mobile]    ──┐
[Web]       ──┼→ [API Gateway] → [User Svc] [Order Svc] [Feed Svc]
[3rd party] ──┘        ↓
            auth, throttle, route
Service Mesh (Istio, Linkerd)
Sidecar proxy per pod handles mTLS, retries, timeouts, traffic splitting without app code changes. Cost: latency hop, complexity. Worth it at dozens+ services with strong platform team.
Communication Patterns
- Sync REST/gRPC: simple request-response; cascading failure risk
- Async events: loose coupling; harder to debug
- BFF: Backend-for-frontend tailored API per client type
Data Per Service
Each service owns its database — no shared tables. Cross-service queries via API composition or materialized views fed by CDC. Violating this creates distributed monolith.
Interview Script
"I start with a modular monolith — clear package boundaries. If transcoding becomes a bottleneck, extract media-worker service behind a queue while keeping user API monolithic."
Domain-Driven Boundaries
Split by bounded context (billing, catalog, shipping), not by technical layer; carving out, say, all databases as one separate layer is the wrong cut.
Strangler Fig Migration
Proxy routes 5% traffic to new service; increment until monolith retired.
Service Mesh Cost
Roughly 1–2 ms of added latency per sidecar hop, plus the operational burden of running the mesh control plane across something like 1000 services; justify the cost before adopting.
Worked Example: Microservices
Extract notification service first — clear boundary, async, reduces monolith deploy risk without splitting core transaction path.
Extended Notes
Connect microservices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Microservices
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Monolith to microservices first split?
Highest churn isolated component with clear API — not arbitrary layer split.
API gateway vs service mesh?
Gateway: edge auth/routing. Mesh: service-to-service mTLS, retries, traffic split.
Distributed monolith antipattern?
Microservices sharing database tables — no bounded context isolation.
Extended Reference — Microservices
Team topology
Conway's law: service boundaries match team boundaries — align org before splitting code.
Contract testing
Pact tests verify provider/consumer API contracts in CI — prevent breaking downstream.
Shared libraries
Thin shared libs only — fat shared library recreates monolith coupling.
Observability tax
Each service needs metrics, logs, traces — platform team provides templates.
Decomposition trigger
Extract when independent scale, deploy, or failure domain justified — not preemptively.
Part 14 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Monolith first OK
- Split boundaries
- API gateway role
- Service mesh cost
- BFF pattern
- Data per service
- Saga across services
- Contract tests
- Strangler migration
- Avoid distributed monolith
Self-test prompt
Explain Part 14 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 14 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 15: REST, GraphQL, gRPC & WebSockets
REST
Resource-oriented HTTP: nouns as URLs, verbs as methods. Stateless; cacheable GETs. Standard for public APIs and browser clients.
| Method | Idempotent | Safe | Use |
|---|---|---|---|
| GET | Yes | Yes | Read |
| POST | No | No | Create |
| PUT | Yes | No | Replace |
| PATCH | No | No | Partial update |
| DELETE | Yes | No | Remove |
Pagination: cursor-based (?cursor=abc) scales better than offset for large tables. Version in path (/v1/) or header.
GraphQL
Client specifies exact fields needed — one round trip for nested data. Server defines schema (types, queries, mutations).
- Pros: flexible clients, reduced over-fetching
- Cons: complex caching (no HTTP cache per URL easily), N+1 query risk (DataLoader batching), expensive arbitrary queries — need depth/complexity limits
gRPC
HTTP/2 + Protocol Buffers: binary, fast, strongly typed. Four call types: unary, server streaming, client streaming, and bidirectional streaming. Best for service-to-service calls; browsers need a grpc-web gateway.
|  | REST/JSON | gRPC |
|---|---|---|
| Contract | OpenAPI optional | .proto required |
| Performance | Good | Better (binary) |
| Browser | Native | Needs proxy |
| Streaming | SSE, WS | Native |
WebSockets
Persistent bidirectional TCP — chat, live games, collaborative docs. Stateful connections complicate load balancing (sticky sessions or pub/sub backplane). Heartbeats detect dead connections.
Server-Sent Events (SSE)
One-way server → client over HTTP. Simpler than WebSockets for live feeds, notifications. Auto-reconnect built-in.
Choosing in Interviews
| Scenario | Choice |
|---|---|
| Public mobile API | REST or GraphQL |
| Internal microservices | gRPC |
| Live chat | WebSockets + Redis pub/sub |
| Stock ticker | SSE or WebSocket |
External: REST/GraphQL → API Gateway
Internal: gRPC between services
Realtime: WebSocket tier → Redis channel → all WS nodes
REST Pagination Patterns
Cursor: ?after=tweet_id stable under concurrent inserts. Offset bad for deep pages (OFFSET 1000000 slow).
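A keyset (cursor) pagination sketch with sqlite3; the `tweets` table and `fetch_page` helper are hypothetical. The cursor is the last id seen, so each page is an index seek instead of an OFFSET scan.

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_page(
    conn: sqlite3.Connection, after_id: Optional[int], limit: int
) -> Tuple[List[tuple], Optional[int]]:
    """Return one page plus the cursor for the next page (None when this page is short)."""
    if after_id is None:
        rows = conn.execute("SELECT id, body FROM tweets ORDER BY id LIMIT ?", (limit,)).fetchall()
    else:
        # Seek on the primary key index instead of scanning and discarding OFFSET rows
        rows = conn.execute(
            "SELECT id, body FROM tweets WHERE id > ? ORDER BY id LIMIT ?", (after_id, limit)
        ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO tweets (id, body) VALUES (?, ?)", [(i, f"tweet {i}") for i in range(1, 11)])
cursor = None
while True:
    page, cursor = fetch_page(conn, cursor, 4)
    print([row[0] for row in page])  # [1, 2, 3, 4], then [5, 6, 7, 8], then [9, 10]
    if cursor is None:
        break
```

One simplification: when the last page is exactly full, the client gets a non-null cursor and one extra empty page; fetching `limit + 1` rows avoids that.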
GraphQL N+1
Resolvers per field cause N DB queries — DataLoader batches loads per request tick.
gRPC Streaming Use Cases
Server stream: log tail. Client stream: bulk upload. Bidi: collaborative editing.
Worked Example: APIs
Mobile uses GraphQL for home screen single request; backend services still gRPC internally.
Extended Notes
Connect APIs to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here; prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — API Styles
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
REST versioning?
URL /v1/ or Accept header — be consistent; deprecate with sunset headers.
GraphQL complexity attack?
Limit query depth, cost analysis, timeouts, persisted queries allowlist.
gRPC vs REST for public API?
REST/JSON for third parties; gRPC internal — developer experience and browser support.
Extended Reference — REST, GraphQL, gRPC & WebSockets
API versioning
Deprecation timeline communicated via Sunset header; maintain v1 for 12 months.
Idempotent HTTP
PUT/DELETE idempotent by definition; POST needs Idempotency-Key for payments.
GraphQL complexity
Calculate cost: depth × breadth; reject expensive queries at gateway.
gRPC deadlines
In Go, context.WithDeadline sets a deadline that gRPC propagates to downstream calls, so the whole call chain shares one timeout budget.
WebSocket auth
Validate JWT on connect message; re-auth on long-lived connections periodically.
Part 15 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- REST verbs idempotent
- Cursor pagination
- GraphQL N+1 fix
- gRPC internal
- WebSocket LB
- SSE one-way
- Versioning strategy
- Proto breaking change
- Timeout deadlines gRPC
- Pick API per client
Self-test prompt
Explain Part 15 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 15 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 16: Rate Limiting
Why Rate Limit
Protect origin from abuse, ensure fair usage, enforce SLA tiers, and prevent cascade failure. Apply at edge (CDN), API gateway, and service level.
Token Bucket
Bucket holds tokens refilled at rate R (e.g., 100/sec). Each request consumes one token; overflow requests rejected or queued.
- Allows bursts up to bucket capacity B
- Smooth average rate over time
- Used by many APIs (Stripe, AWS)
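A runnable version of the refill pseudocode that follows, assuming a single process; the injectable `clock` parameter is an addition for testability. A distributed limiter would keep `tokens` and `last` in Redis and update them atomically.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate          # tokens added per second (R)
        self.capacity = capacity  # maximum burst size (B)
        self.clock = clock
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with 429 Too Many Requests


fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]: burst of 3, then empty
fake_now[0] = 0.5  # half a second refills one token at 2 tokens/sec
print(bucket.allow(), bucket.allow())  # True False
```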
tokens = min(capacity, tokens + (now - last) * rate)
last = now
if tokens >= 1: tokens -= 1; allow
else: reject 429
Leaky Bucket
Requests enter queue; processed at fixed rate. Smoother output than token bucket; less bursty allowance.
Fixed & Sliding Window
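Fixed window: count requests per one-minute bucket. Boundary bursts slip through: with a 200/min limit, 199 requests at 0:59 plus 199 at 1:00 all pass, nearly twice the limit within two seconds. Sliding window log: store a timestamp per request; exact but memory-heavy. Sliding window counter: estimate the rolling count as the previous fixed window's count, weighted by how much of it still overlaps, plus the current window's count; a good balance of accuracy and memory (common with Redis).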
| Algorithm | Burst | Memory | Accuracy |
|---|---|---|---|
| Token bucket | Yes | Low | Good |
| Fixed window | Edge spike | Low | OK |
| Sliding log | No | High | Exact |
| Sliding counter | Moderate | Medium | Good |
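A sketch of the sliding window counter, assuming a single process and caller-supplied timestamps in seconds. It keeps only two integers per key, which is why its memory row in the table is far below the sliding log's.

```python
class SlidingWindowCounter:
    """Approximate rolling count: prev_count * (unexpired share of previous window) + curr_count."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.curr_start = 0.0
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now: float) -> bool:
        start = (now // self.window) * self.window
        if start != self.curr_start:
            # Roll forward; the old window only counts if it is directly adjacent
            adjacent = start - self.curr_start == self.window
            self.prev_count = self.curr_count if adjacent else 0
            self.curr_start = start
            self.curr_count = 0
        elapsed_fraction = (now - start) / self.window
        estimate = self.prev_count * (1 - elapsed_fraction) + self.curr_count
        if estimate < self.limit:
            self.curr_count += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=10, window=60)
print(sum(limiter.allow(59) for _ in range(10)))  # 10 allowed late in the first minute
print(limiter.allow(60))  # False: a fixed window would reset here and allow a burst
print(sum(limiter.allow(90) for _ in range(10)))  # 5: half of the previous window still counts
```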
Distributed Rate Limiting
Redis centralizes counters; all API nodes check INCR with TTL. Race conditions: use Lua script for atomic check-and-decrement. For strict global limits across regions, Redis Cluster or dedicated rate-limit service.
Response Headers
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716300000
Retry-After: 60
Rate Limit Dimensions
- Per IP (anonymous)
- Per API key / user id
- Per endpoint (expensive vs cheap)
- Global (protect DB)
Interview: Design API Rate Limiter
Redis sorted set sliding window per key; rules service stores limits; gateway enforces before business logic. Mention fail-open vs fail-closed on Redis outage.
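Before reaching for Redis, the sorted-set sliding log can be modeled in-process with one sorted timestamp list per key. This sketch is illustrative only (names are hypothetical); unlike a naive ZADD-then-ZCARD sequence, it records a request only when the request is admitted.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class SlidingLogLimiter:
    """In-process analogue of a Redis sorted-set sliding log, one list per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.logs: Dict[str, List[float]] = defaultdict(list)

    def allow(self, key: str, now: float) -> bool:
        log = self.logs[key]
        # Like ZREMRANGEBYSCORE: drop timestamps at or before now - window
        cutoff = bisect.bisect_right(log, now - self.window)
        del log[:cutoff]
        # Like ZCARD: count requests still inside the window
        if len(log) >= self.limit:
            return False
        # Like ZADD: record the admitted request
        bisect.insort(log, now)
        return True


limiter = SlidingLogLimiter(limit=3, window=10)
print([limiter.allow("user-1", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("user-1", 10.5))  # True: the request at t=0 has left the window
```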
Redis Implementation Sketch
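ZREMRANGEBYSCORE key 0 (now - window)   -- evict timestamps older than the window
ZADD key now request_id                 -- request_id must be unique per request
ZCARD key                               -- if > limit: 429
EXPIRE key window                       -- let idle keys expire
Run the four commands as one MULTI/EXEC transaction or Lua script so concurrent requests see a consistent count.
Hierarchical Limits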
Global 1M RPS → per-tenant 10K → per-user 100 — check cheapest filter first.
Fairness vs Priority
Paid tier higher limits; burst allowance for onboarding flows.
Worked Example: Rate Limit
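Free tier 100 req/min, Pro 10K. Enforce at the gateway; return 429 with Retry-After and an upgrade hint in the response body (the HTTP Upgrade header is for protocol switching, not plan upsells).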
Extended Notes
Connect rate limit to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Rate Limiting
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Token bucket vs sliding window?
Bucket allows controlled burst; sliding window smoother rate over window.
Distributed rate limit race?
Atomic Lua in Redis; or centralized limiter service.
Fail open or closed on limiter outage?
Fail closed for abuse protection; fail open for internal low-risk if business prefers availability.
Extended Reference — Rate Limiting
Burst vs sustained
Token bucket separates concerns — document both limits in API docs.
Per-tenant fairness
Noisy neighbor: one API key cannot consume entire global quota — hierarchical caps.
Cost of Redis limiter
One Redis round trip per request — acceptable at 100K RPS with cluster; shard keys.
Edge vs app limit
CDN/WAF blocks obvious abuse; app enforces business tier limits.
Testing
Load test verifies 429 at the threshold and recovery after the window resets.
Part 16 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Token bucket
- Sliding window
- Redis atomic limit
- 429 headers
- Hierarchical limits
- Fail open vs closed
- Per user and global
- Burst allowance
- Edge + app limits
- Load test 429
Self-test prompt
Explain Part 16 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 16 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 17: Consistent Hashing, Bloom Filters & More
Consistent Hashing
When adding or removing cache nodes, hash mod N remaps almost all keys. Consistent hashing maps keys and nodes onto a ring, so adding one node moves only about K/N keys on average (K keys, N nodes).
Virtual nodes (vnodes): each physical node has 100–200 points on ring for even distribution. Used in Dynamo, Cassandra, Memcached clients, CDNs.
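A compact ring with virtual nodes, assuming MD5 as a uniform (non-security) hash and 100 vnodes per physical node; `ConsistentHashRing` is an illustrative name, not a specific library's API.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    # MD5 only for an even spread of points; not used for security
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#vnode{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def get(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping past the end
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.get(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly a quarter, all to cache-d
```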
hash(key) → clockwise first node on ring
[N1·] [N2·] [N3·] [N1·] ...
ring 0° → 360° (wraps around)
Bloom Filters
Probabilistic set: test membership with zero false negatives (if says no, definitely no) but possible false positives (says yes, might not exist). Space-efficient.
- Web crawler: skip already-visited URLs
- Cassandra: avoid disk read if key definitely absent
- CDN: cache an object only on a repeat request, preventing cache pollution by one-hit wonders
- Spell check: dictionary in compact filter
Cannot delete from standard bloom; counting bloom or rebuild. Size m bits, k hash functions — tune false positive rate p.
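A minimal Bloom filter sketch using the standard sizing formulas and double hashing (deriving k positions from two halves of one SHA-256 digest). These choices are assumptions for illustration, not what any particular database uses.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    def __init__(self, capacity: int, fp_rate: float):
        # Sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.m = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: position_i = h1 + i * h2 (mod m)
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, fp_rate=0.01)
for i in range(1000):
    bf.add(f"https://example.com/page/{i}")
print(bf.might_contain("https://example.com/page/42"))  # True (no false negatives)
print(bf.m, bf.k)  # 9586 7
```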
Geohashing
Encode lat/long into string prefix; nearby places share prefix — efficient proximity search in Redis/Elasticsearch. Precision = string length. Used in Uber/Lyft driver matching, Yelp nearby.
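A sketch of standard geohash encoding: longitude and latitude bits are interleaved (longitude first) by repeatedly halving their ranges, and every 5 bits map to one base32 character. This covers encoding only; neighbor lookup is omitted.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits per base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 7))  # 7-char cell, roughly 150 m; starts with "u4pr"
```

Because each extra character only subdivides the parent cell, nearby points usually share a prefix, which is what makes prefix queries work.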
Merkle Trees
Hash tree: leaf = data block hash; parent = hash(children). Equal root hashes mean identical data; otherwise recurse only into differing subtrees, which locates a single differing block in O(log n) comparisons.
- Git: commit tree integrity
- Bitcoin: block verification
- Cassandra anti-entropy: replica sync without full compare
- Distributed DBs: efficient replica reconciliation
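A sketch of Merkle-based replica comparison, assuming both replicas hold the same number of blocks and that odd levels are padded by duplicating the last hash (one common convention among several):

```python
import hashlib
from typing import List


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Tree levels from leaf hashes (index 0) up to the root (last level)."""
    level = [sha(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [sha(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def diff_leaves(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Indices of differing blocks, descending only into subtrees whose hashes differ."""
    if a[-1] == b[-1]:
        return []  # roots match: replicas are in sync
    suspects = [0]
    for depth in range(len(a) - 2, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                # Skip padding positions that do not exist at this level
                if child < len(a[depth]) and a[depth][child] != b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = [f"row{i}".encode() for i in range(5)]
replica_b = list(replica_a)
replica_b[3] = b"row3-changed"
print(diff_leaves(build_levels(replica_a), build_levels(replica_b)))  # [3]
```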
HyperLogLog
Approximate distinct count in fixed memory — unique visitors, cardinality analytics. Redis PFADD/PFCOUNT.
Count-Min Sketch
Frequency estimation in streaming — heavy hitter detection, hotspot keys.
| Structure | Answers | False? |
|---|---|---|
| Bloom filter | Maybe in set? | FP only |
| HyperLogLog | How many unique? | Approximate |
| Count-Min | How many of X? | Overestimate |
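A Count-Min sketch in a few lines, assuming SHA-256 prefixed with the row number as the per-row hash; width and depth values are illustrative. Because collisions only add to counters, each estimate is an upper bound on the true count, matching the "Overestimate" row above.

```python
import hashlib
from typing import List


class CountMinSketch:
    def __init__(self, width: int = 1000, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash per row, made independent-ish by prefixing the row number
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest upper bound
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=200, depth=4)
for i in range(2000):
    cms.add(f"#tag{i % 50}")  # 50 tags, 40 occurrences each
cms.add("#trending", 500)
print(cms.estimate("#trending"))  # at least 500; collisions may add a little
```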
Consistent Hashing Math
With m keys and n nodes, adding one node moves about m/(n+1) keys, roughly m/n. Modulo hashing moves about n/(n+1) of all keys, i.e. nearly 100%.
Bloom Filter Sizing
m = -n ln(p) / (ln 2)² bits for n items and false positive rate p. k = m/n × ln 2 hash functions.
Geohash Neighbor Search
Query 8 neighboring cells plus center — handle edge cases at equator/prime meridian.
Worked Example: Algorithms
URL dedup crawler: bloom filter 10B URLs, 1% FP → 100M false positives still saves disk — verify with disk set on positive.
Extended Notes
Connect algorithms to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Specialized Algorithms
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Consistent hashing use case?
Cache cluster, DynamoDB partitions, CDN origin selection — minimal remapping on node add.
Bloom filter false positive impact?
Extra DB read occasionally — tune false positive rate vs memory budget.
Merkle tree in anti-entropy?
Compare root hashes; recurse into differing branches only — efficient replica sync.
Extended Reference — Consistent Hashing & Probabilistic Structures
Virtual nodes
100 vnodes per physical node prevents uneven ring distribution when few servers.
Bloom in practice
Sizing for 1% FP and 1B items needs about 9.6 billion bits ≈ 1.2 GB (≈ 1.12 GiB), still cheaper than an exact set in RAM.
Geohash precision
6 chars ≈ 1.2km; 7 chars ≈ 150m — pick for urban driver matching.
Merkle sync
Compare subtree hashes top-down — bandwidth proportional to differences not total data.
Sketch algorithms
Use Count-Min for trending hashtags; HyperLogLog for UV — not exact counts.
Part 17 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Consistent hash ring
- Virtual nodes
- Bloom filter FP
- Geohash neighbors
- Merkle sync
- HyperLogLog UV
- Count-Min sketch
- Use case per structure
- Modulo hash bad
- Size bloom formula
Self-test prompt
Explain Part 17 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 17 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 18: Observability — Logs, Metrics, Traces
Three Pillars
Logs: discrete events (errors, audit). Metrics: numeric time series (CPU, QPS). Traces: request path across services. Together they answer: what broke, how bad, where in the chain.
Structured Logging
JSON logs with trace_id, user_id, service, level. Centralize in ELK (Elasticsearch, Logstash, Kibana), Loki, or CloudWatch. Avoid logging PII/passwords. Sample debug logs at high QPS.
Metrics (RED & USE)
| Method | Scope | Metrics |
|---|---|---|
| RED | Services | Rate, Errors, Duration |
| USE | Resources | Utilization, Saturation, Errors |
Prometheus pull model + Grafana dashboards. Histograms for p50/p99 latency — averages lie.
Distributed Tracing
OpenTelemetry → Jaeger/Tempo. Propagate trace context (W3C traceparent) across HTTP/gRPC/Kafka. One slow span in 20-service chain visible immediately.
Request trace_id=abc
  API 45ms → Auth 12ms → DB 180ms   ← bottleneck
           → Cache 2ms
SLI, SLO, SLA
- SLI: measurable indicator (availability = successful / total)
- SLO: target (99.9% availability over 30 days)
- SLA: contract with customer (refund if missed)
Error budget = 1 - SLO. If budget exhausted, freeze features; focus reliability. Burn rate alerts predict SLO violation early.
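The error-budget arithmetic is easy to get wrong under pressure. A small sketch, assuming a 30-day window; the function names are illustrative:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of failure the SLO allows over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than sustainable the budget is spent (1.0 = exactly on budget)."""
    return error_ratio / (1 - slo)


print(round(error_budget_minutes(0.999), 1))  # 43.2 minutes per 30 days
print(round(burn_rate(0.01, 0.999), 1))       # 1% errors vs a 99.9% SLO: 10.0
```

At a burn rate of 10, a 30-day budget is gone in about 3 days, which is why fast-burn alerts page and slow-burn alerts open tickets.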
Alerting
Alert on symptoms (high 5xx rate, p99 latency) not causes (CPU 80%) unless correlated. Page humans for user-facing SLO breach; ticket for disk 70%. Runbooks linked in alert.
Interview Mention
"I define SLO 99.95% for read API, SLI from load balancer success rate, alert when 1-hour burn rate exceeds 10× budget consumption."
Log Levels
ERROR: action needed. WARN: degraded. INFO: business events. DEBUG: dev only, sampled in prod.
Cardinality Explosion
Never label metrics with unbounded user_id — use aggregated histograms. High cardinality kills Prometheus.
On-Call Hygiene
Runbooks, escalation policy, blameless postmortems within 48 hours.
Worked Example: Observability
SLO 99.9%: burn rate alert when 5xx > 0.1% for 5 min. Trace slow checkout to payment RPC timeout.
Extended Notes
Connect observability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Observability
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
SLI vs SLO vs SLA?
Indicator vs internal target vs customer contract.
High cardinality metric example?
http_requests{user_id=x} — forbidden at scale.
Trace sampling?
100% errors, 1% success — balance cost and debuggability.
Extended Reference — Observability
Log sampling
Sample 1% debug at 1M RPS — still 10K logs/sec — tune levels.
Metric labels
service, endpoint, status_code — bounded cardinality.
Trace context propagation
Inject trace_id into logs for correlation — single pane search.
SLO dashboard
Burn rate panels for executives — error budget remaining this quarter.
On-call
Every alert actionable — if not, fix alert or delete.
Part 18 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Logs metrics traces
- RED metrics
- Trace propagation
- SLI SLO SLA
- Error budget
- High cardinality avoid
- Alert symptoms
- Runbooks
- Sampling strategy
- Postmortem blameless
Self-test prompt
Explain Part 18 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 18 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 19: Reliability & Disaster Recovery
Reliability Goals
System continues correctly despite failures. Measure with availability SLOs and MTTR (mean time to repair). Design for failure — everything fails eventually.
Redundancy
- Active-active: all nodes serve traffic — no idle capacity waste; harder consistency
- Active-passive: standby takes over on failover — simpler, wasted standby
- N+1, N+2: spare capacity for component failure
Failover
Health checks detect unhealthy instances; the LB removes them from the pool. DNS failover handles regional outages but is slow, because resolvers and clients cache records until the TTL expires. Database automatic failover uses fencing (STONITH) to prevent dual-writer split-brain.
Disaster Recovery (DR)
| Term | Meaning |
|---|---|
| RPO | Recovery Point Objective: maximum acceptable data loss, measured as the time since the last backup or replicated write |
| RTO | Recovery Time Objective — max downtime to restore service |
Async cross-region replication increases RPO (minutes of loss possible). Sync replication lowers RPO but raises latency.
Multi-Region Strategies
- Backup restore: cheapest; highest RTO/RPO
- Pilot light: minimal DR region, scale up on disaster
- Warm standby: reduced capacity always running
- Active-active: full capacity both regions; hardest
Chaos Engineering
Proactively inject failures (Chaos Monkey, Litmus) in controlled environments. Validate retries, circuit breakers, and runbooks before real outages. Start with game days, not random prod kills.
Dependency Failure
Every sync call is a failure domain. Timeouts + circuit breakers + graceful degradation (show cached feed if ranking service down).
[Primary Region] ←──async repl──→ [DR Region]
↓ failover DNS / traffic manager
RPO 5 min, RTO 30 min (example targets)
Blast Radius
Isolate by cell (subset of users), shard, or region — failure affects 1% not 100%.
Game Day Checklist
- Inject DB failover
- Kill AZ
- Spike traffic 3×
- Verify alerts fire
- Measure actual RTO
Backup Testing
Untested restore = no backup. Quarterly restore drill to staging.
Worked Example: Reliability
RPO 1 hour: asynchronous binlog replication to the DR region, alerting if replica lag approaches an hour. RTO 15 min: automated failover plus a runbook for the DNS flip.
Extended Notes
Connect reliability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Reliability
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
RPO vs RTO example?
RPO 5 min = lose 5 min data max. RTO 30 min = down 30 min max.
Chaos engineering prerequisite?
Observability, on-call, steady state hypothesis — otherwise chaos is reckless.
Active-active database challenge?
Write conflicts across regions — need CRDT or conflict resolution.
Extended Reference — Reliability & DR
Dependency map
Maintain tier-0 dependency graph — if Redis down, which features degrade?
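A minimal sketch of querying the dependency map, assuming it is kept as a dict from each feature or service to the components it calls synchronously (every name below is hypothetical). Inverting the edges and walking them breadth-first answers "if Redis is down, what is affected?".

```python
from collections import deque

# Hypothetical tier-0 map: node -> components it calls synchronously.
DEPENDS_ON = {
    "home_feed": ["feed_service"],
    "feed_service": ["redis", "ranking_service"],
    "ranking_service": ["feature_store"],
    "checkout": ["payments_service"],
    "payments_service": ["postgres"],
}


def impacted_by(failed: str, depends_on: dict) -> set:
    """Return every node that depends, directly or transitively, on `failed`."""
    # Invert edges: component -> nodes that call it.
    callers: dict = {}
    for node, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(node)
    impacted = set()
    queue = deque([failed])
    while queue:  # breadth-first walk up the call graph
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


print(sorted(impacted_by("redis", DEPENDS_ON)))  # ['feed_service', 'home_feed']
```

The result lists what is at risk; whether each item degrades gracefully or fails outright depends on the fallback it has (for example, serving the feed from cache).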
Graceful degradation
Feature flags disable recommendations; core feed still serves from cache.
DR drill
Quarterly failover to secondary region with production-like traffic shadow.
Data backup
PITR 35 days; test restore to new cluster monthly.
Incident response
Severity levels, comms template, status page update cadence.
Part 19 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Redundancy N+1
- Active-active vs passive
- RPO RTO defined
- DR drill
- Chaos engineering safe
- Graceful degradation
- Blast radius
- Backup restore test
- Multi-region tradeoff
- Dependency map
Self-test prompt
Explain Part 19 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 19 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 20: Security Fundamentals
Authentication vs Authorization
Authn: who are you (login, JWT, session). Authz: what may you do (RBAC, ABAC, ACL). Always authenticate at gateway; authorize per resource in service.
OAuth 2.0 / OpenID Connect
Delegate auth to identity provider (Google, Okta). Authorization code flow with PKCE for SPAs. Access token (short) + refresh token (long, stored securely). OIDC adds ID token (user profile).
Session vs JWT
| | Server session | JWT |
|---|---|---|
| Revocation | Easy (delete session) | Hard until expiry |
| Scale | Needs Redis session store | Stateless verification |
| Size | Small cookie | Large header |
Encryption
- In transit: TLS 1.2+ everywhere (HTTPS, mTLS service mesh)
- At rest: AES-256 disk encryption (AWS KMS, envelope encryption)
- Application-level: encrypt PII fields before DB for defense in depth
DDoS Protection
Volumetric attacks absorbed at CDN/scrubbing center. Rate limiting, WAF, geo blocking. Anycast spreads load. Never expose origin IP directly.
OWASP Top 10 (Overview)
- Broken access control
- Cryptographic failures
- Injection (SQL, XSS)
- Insecure design
- Security misconfiguration
- Vulnerable components
- Auth failures
- Integrity failures
- Logging failures
- SSRF
Mitigate: parameterized queries, input validation, CSP headers, least privilege IAM, secret rotation, security scanning in CI.
Zero Trust
Never trust internal network; verify every request. mTLS between services, network policies in K8s.
Secrets Management
Vault, AWS Secrets Manager — never commit .env. Rotate keys; short-lived tokens.
SQL Injection Prevention
Parameterized queries only; using an ORM is no excuse for building raw SQL by string concatenation.
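A small illustration using Python's built-in sqlite3 module (the table and data are made up): the same malicious input matches every row when spliced into the SQL text, but matches nothing when passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("alice@example.com",), ("bob@example.com",)])


def find_user_unsafe(email: str) -> list:
    # DON'T: user input becomes part of the SQL text.
    return conn.execute(f"SELECT id, email FROM users WHERE email = '{email}'").fetchall()


def find_user(email: str) -> list:
    # DO: the driver sends the value separately; it can never change the query shape.
    return conn.execute("SELECT id, email FROM users WHERE email = ?", (email,)).fetchall()


payload = "x' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR matches every row
print(len(find_user(payload)))         # 0: treated as a literal string
```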
PCI Scope
Use hosted fields / tokenization — card data never touches your servers.
Worked Example: Security
OAuth scopes: read:profile vs write:post. JWT 15 min access + refresh rotation.
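To make the 15-minute access token concrete, here is a standard-library sketch of issuing and verifying an HS256 JWT, assuming a shared HMAC secret. It is for understanding only; production systems should use a vetted JWT library, and usually asymmetric keys (RS256/ES256) so that verifying services never hold the signing secret.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _json_b64(obj: dict) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def issue_token(claims: dict, key: bytes, ttl_s: int = 15 * 60,
                now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    payload = {**claims, "iat": int(now), "exp": int(now) + ttl_s}
    signing_input = _json_b64({"alg": "HS256", "typ": "JWT"}) + "." + _json_b64(payload)
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Stateless check: signature and expiry only, no session lookup."""
    now = time.time() if now is None else now
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")  # reject anything but the expected algorithm
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant-time compare
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because verification needs only the key, revoking a token before exp requires extra state (a denylist, or short TTLs plus refresh rotation), which is the revocation trade-off in the session vs JWT table.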
Extended Notes
Connect security to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Security
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
OAuth implicit flow deprecated why?
Token exposed in browser — use authorization code + PKCE.
mTLS benefit?
Mutual authentication service-to-service — no trusted network assumption.
OWASP injection fix?
Parameterized queries, ORM, input validation, least privilege DB user.
Extended Reference — Security
Least privilege IAM
Service account per microservice; no shared admin keys in apps.
Secrets in CI
Short-lived OIDC to cloud — no long-lived AWS keys in GitHub.
Audit logging
Immutable audit trail for admin actions — who changed ACL when.
DDoS layers
Volumetric at CDN; application layer at WAF rate rules; origin protection hide IP.
Supply chain
Dependabot, signed containers, SBOM for compliance.
Part 20 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Authn vs authz
- OAuth PKCE
- JWT vs session
- TLS everywhere
- Encryption at rest
- OWASP top aware
- DDoS layers
- Least privilege IAM
- PCI scope reduce
- No secrets in git
Self-test prompt
Explain Part 20 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 20 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 21: Resilience Design Patterns
Circuit Breaker
Stop calling failing dependency after threshold — fail fast, give time to recover. States: closed (normal), open (reject), half-open (probe).
Libraries: Resilience4j, Hystrix (legacy). Pair with fallback (cached response, degraded mode).
Bulkhead
Isolate resource pools — thread pool per dependency so one slow service cannot exhaust all threads. K8s resource limits per container are bulkheads at infra level.
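A minimal bulkhead sketch in Python. It swaps the thread-pool-per-dependency idea for a per-dependency semaphore, which bounds concurrency the same way; the class and dependency names are hypothetical. Calls beyond the limit are rejected immediately instead of queuing, so the caller can fall back.

```python
import threading
from contextlib import contextmanager


class BulkheadFull(Exception):
    """Raised when a dependency's concurrency budget is used up."""


class Bulkhead:
    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: fail fast rather than pile up waiting threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(f"{self.name}: no free slots")
        try:
            yield
        finally:
            self._slots.release()


# One bulkhead per downstream: a slow search cannot consume payments' capacity.
search = Bulkhead("search", max_concurrent=2)
payments = Bulkhead("payments", max_concurrent=2)
```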
Retry with Backoff
Transient failures (503, timeout): retry with exponential backoff plus jitter, and cap the number of retries. Retry only idempotent operations; a POST without an idempotency key must not be retried blindly, because it may be applied twice.
delay = min(cap, base * 2^attempt + random_jitter)
Timeout
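Set a timeout at every hop, and make each downstream call's timeout shorter than its caller's, so the remaining deadline shrinks along the chain and no callee keeps working after its caller has given up. Cascading waits kill systems: a default 30 s HTTP client timeout is dangerous at scale.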
CQRS
Command Query Responsibility Segregation — separate write model (normalized OLTP) from read model (denormalized Elasticsearch). Updates propagate via events. Scales reads independently.
Event Sourcing
Store sequence of events as source of truth; state derived by replay. Audit trail for free; complex queries need projections. Pair with snapshots for long streams.
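A minimal sketch of deriving state by replay, assuming a hypothetical account aggregate with two event types and amounts in cents. A snapshot records the state as of a sequence number, so replay only needs to apply later events.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    seq: int      # position in the append-only log
    kind: str     # "deposited" or "withdrawn" (hypothetical event types)
    amount: int   # cents


@dataclass(frozen=True)
class Snapshot:
    seq: int      # last event folded into `balance`
    balance: int


def apply(balance: int, event: Event) -> int:
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: Iterable[Event], snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Fold events (in seq order) that come after the snapshot into a new snapshot."""
    state = snapshot or Snapshot(seq=0, balance=0)
    for event in events:
        if event.seq > state.seq:  # a real store would only read events after snapshot.seq
            state = Snapshot(seq=event.seq, balance=apply(state.balance, event))
    return state
```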
| Pattern | Problem solved |
|---|---|
| Circuit breaker | Cascade failure |
| Bulkhead | Resource exhaustion |
| Retry + backoff | Transient errors |
| Timeout | Hung connections |
| CQRS | Read/write scale mismatch |
| Event sourcing | Audit, temporal queries |
[Service] --timeout 200ms--> [Dependency]
| circuit OPEN → fallback cache
| bulkhead pool max 50 threads
Retry Storm
Clients retry on 503 simultaneously → overload. Jittered backoff + server Retry-After header.
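A sketch of jittered exponential backoff that also honors a server-supplied Retry-After, following this part's delay = min(cap, base * 2^attempt + random_jitter) formula (additive jitter; "full jitter", which picks a random delay between 0 and the capped value, is a common alternative). RetryableError and its retry_after field are hypothetical stand-ins for however your client surfaces 503/429 responses. Use it only for idempotent operations.

```python
import random
import time
from typing import Callable, Optional


class RetryableError(Exception):
    """Hypothetical: raised for 503/429/timeouts, carrying Retry-After seconds if sent."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("retryable failure")
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  max_jitter: float = 0.1, retry_after: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    rng = rng or random.Random()
    # Jitter de-synchronizes clients that failed at the same moment.
    delay = min(cap, base * 2 ** attempt + rng.uniform(0, max_jitter))
    if retry_after is not None:
        delay = max(delay, retry_after)  # never retry sooner than the server asked
    return delay


def call_with_retries(op: Callable, max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None):
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error
            sleep(backoff_delay(attempt, retry_after=exc.retry_after, rng=rng))
```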
CQRS Read Model Build
Projection worker consumes events → updates Elasticsearch doc. Rebuild projection from event log on corruption.
Saga vs 2PC Decision Tree
Money across services → saga + ledger audit. Config update across services → saga OK with compensate.
Worked Example: Patterns
Circuit open after 50% errors in 10s window; half-open allow 3 probe requests.
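A single-threaded sketch of the breaker from this worked example: it opens at a 50% failure rate over a 10 s window and allows 3 half-open probes. min_calls (so one failure out of one call cannot trip it) and open_for_s (how long to stay open) are assumed parameters, and the clock is injectable for testing. Libraries such as Resilience4j add thread safety and metrics on top of this logic.

```python
import time
from collections import deque
from typing import Callable


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_ratio: float = 0.5, window_s: float = 10.0,
                 min_calls: int = 10, open_for_s: float = 30.0,
                 half_open_probes: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_ratio = failure_ratio
        self.window_s = window_s
        self.min_calls = min_calls
        self.open_for_s = open_for_s
        self.half_open_probes = half_open_probes
        self.clock = clock
        self.state = "closed"
        self._results: deque = deque()  # (timestamp, succeeded) inside the rolling window
        self._opened_at = 0.0
        self._probes_started = 0
        self._probe_successes = 0

    def call(self, fn: Callable):
        now = self.clock()
        if self.state == "open":
            if now - self._opened_at < self.open_for_s:
                raise CircuitOpenError("failing fast")
            self.state = "half_open"  # cool-down over: start probing
            self._probes_started = self._probe_successes = 0
        if self.state == "half_open":
            if self._probes_started >= self.half_open_probes:
                raise CircuitOpenError("probe limit reached")
            self._probes_started += 1
        try:
            result = fn()
        except Exception:
            self._record(now, ok=False)
            raise
        self._record(now, ok=True)
        return result

    def _trip(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._results.clear()

    def _record(self, now: float, ok: bool) -> None:
        if self.state == "half_open":
            if not ok:
                self._trip(now)  # any failed probe reopens the circuit
                return
            self._probe_successes += 1
            if self._probe_successes >= self.half_open_probes:
                self.state = "closed"  # all probes succeeded
            return
        self._results.append((now, ok))
        while self._results and self._results[0][0] <= now - self.window_s:
            self._results.popleft()  # drop calls older than the window
        failures = sum(1 for _, succeeded in self._results if not succeeded)
        if (len(self._results) >= self.min_calls
                and failures / len(self._results) >= self.failure_ratio):
            self._trip(now)
```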
Extended Notes
Connect patterns to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Resilience Patterns
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Circuit breaker half-open?
Test whether the dependency has recovered: let a small number of probe requests (one or a few) through before restoring full traffic.
Retry idempotency?
POST without key may duplicate — require Idempotency-Key header.
CQRS without event sourcing?
Yes. Separate read/write stores synced by CDC are enough for many systems.
Extended Reference — Resilience Patterns
Timeout budgets
Example: a 300 ms end-to-end budget allows at most 4 sequential hops at 50 ms each, leaving about 100 ms for the entry service's own work.
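A minimal sketch of deadline budgeting, assuming each service receives the remaining budget from its caller (for example via a request header, or natively via gRPC deadlines). Each downstream timeout is capped by both the per-hop budget and the time left, so a late hop gets less time instead of overrunning the user-facing budget.

```python
import time
from typing import Callable


class Deadline:
    """Tracks the remaining end-to-end budget for one user request."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def timeout_for_hop(self, hop_budget_s: float) -> float:
        """Timeout for the next downstream call: hop budget or time left, whichever is smaller."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("deadline exceeded; fail fast instead of calling downstream")
        return min(hop_budget_s, left)
```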
Bulkhead thread pools
One pool per downstream: a slow search dependency cannot exhaust the threads reserved for payments.
Fallback quality
Stale cache better than 500 error for product listing — label 'prices may be delayed'.
CQRS rebuild
Replay event log 24h to rebuild corrupted read model — disaster recovery for projections.
Anti-pattern
Retry storm without jitter — amplifies outage.
Part 21 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Circuit breaker states
- Bulkhead pools
- Retry jitter
- Timeout budgets
- CQRS projection
- Event sourcing snapshot
- Fallback defined
- No retry storm
- Half-open probe
- Degrade UX message
Self-test prompt
Explain Part 21 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 21 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 22: Storage — Block, File & Object
Block Storage
Raw volumes (EBS, SAN) mounted as disks. Low-level, high IOPS. Used for databases, VM boot volumes. Snapshots for backup; attach/detach to instances.
File Storage
POSIX filesystem (NFS, EFS, HDFS). Shared folders, legacy apps, data science home directories. Not ideal for internet-scale static assets — latency and cost.
Object Storage
Blob + metadata via HTTP API (S3, GCS). Virtually unlimited scale, 11 nines durability claim, cheap per GB. Keys like s3://bucket/user/123/photo.jpg.
| Type | Access | Best for |
|---|---|---|
| Block | Disk protocol | Databases, transactional local state |
| File | Filesystem path | Shared files, Hadoop |
| Object | HTTP key-value | Media, backups, data lake |
Object Storage Patterns
- Pre-signed URLs for direct client upload (bypass API bandwidth)
- Lifecycle policies: Standard → IA → Glacier
- CDN origin for static delivery
- Versioning + replication for DR
Interview: Photo App
Metadata in SQL; binary in S3; thumbnail via async worker; CloudFront in front. Never store 5 MB images in Postgres rows.
Client → presigned PUT → S3
→ POST /photos {s3_key} → API → SQL metadata
Worker ← SQS ← event → generate thumbnails → S3
EBS vs Instance Store
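EBS is network-attached and backed up with snapshots. Instance store is faster but ephemeral (data is lost when the instance stops or terminates), so use it only for caches or data you can rebuild.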
Data Lake
S3 + Parquet + Spark/Presto for analytics decoupled from OLTP.
Erasure Coding
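Large object stores such as S3 use erasure coding rather than full replicas to reach high durability at lower storage overhead. For example, a 10+4 Reed-Solomon layout survives the loss of any 4 fragments while storing 1.4× the data, versus 3× for triple replication.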
Worked Example: Storage
Dropbox: metadata SQL, chunks object storage, dedupe by content hash per user namespace.
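A sketch of content-hash deduplication, with a dict standing in for object storage and fixed-size chunks (the chunk size is an assumption; content-defined chunking dedupes better when bytes shift within a file). Chunks are keyed by (namespace, SHA-256), so identical chunks are stored once per user namespace, and the file's SQL metadata row only needs the ordered list of hashes.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size (4 MiB)


class ChunkStore:
    """Dict stand-in for object storage, keyed by (namespace, sha256 hex)."""

    def __init__(self) -> None:
        self.blobs: dict = {}

    def put_file(self, namespace: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
        """Upload only chunks this namespace lacks; return the ordered hash manifest."""
        manifest = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if (namespace, digest) not in self.blobs:
                self.blobs[(namespace, digest)] = chunk
            manifest.append(digest)
        return manifest  # persist this list in the file's metadata row

    def get_file(self, namespace: str, manifest: list) -> bytes:
        return b"".join(self.blobs[(namespace, digest)] for digest in manifest)
```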
Extended Notes
Connect storage to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Storage
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
S3 eventual consistency?
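Since December 2020, S3 provides strong read-after-write consistency for PUTs, DELETEs, and LIST operations, so workarounds for eventually consistent listings are no longer needed. Concurrent writes to the same key are still last-writer-wins, so coordinate writers in the application.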
Block storage for DB?
EBS gp3/io2 — provisioned IOPS for latency-sensitive OLTP.
File vs object for ML training?
Object store + parallel read workers; POSIX file mount optional layer.
Extended Reference — Storage Systems
S3 key design
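Prefix keys with a short hash of user_id so load spreads across many prefixes (S3 scales request rates per prefix, roughly 3,500 writes and 5,500 reads per second each). A deterministic hash keeps keys recomputable from user_id, unlike a random prefix.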
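A sketch of the hashed-prefix key scheme. The 4-hex-character prefix length and the path layout are assumptions; SHA-256 is used purely to spread keys, not for security.

```python
import hashlib


def object_key(user_id: int, filename: str, prefix_len: int = 4) -> str:
    """Deterministic hashed prefix: spreads load, yet any service can rebuild the key."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/user/{user_id}/{filename}"


print(object_key(123, "photo.jpg"))  # "<4 hex chars>/user/123/photo.jpg"
```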
Lifecycle cost
Most storage cost (often around 80%) sits in old, infrequently accessed objects; lifecycle rules that move them to cheaper tiers save real money.
EBS snapshot
Incremental snapshots; cross-region copy for DR.
POSIX on object
Mount s3fs for legacy — not performance path; use native SDK.
Compliance
Object lock WORM for regulatory retention.
Part 22 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Block file object diff
- S3 presigned upload
- Lifecycle tiers
- EBS for DB
- Data lake S3
- No big BLOB SQL
- Erasure coding note
- POSIX on object caution
- Cross-region replicate
- Backup snapshots
Self-test prompt
Explain Part 22 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 22 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 23: Search & Indexing
Why Separate Search Engine
SQL LIKE '%foo%' has a leading wildcard, so a B-tree index cannot help and the database scans every row; that is unusable at scale. Inverted indexes power fast full-text search, fuzzy matching, facets, and ranking.
Inverted Index
Maps each term → list of document IDs containing it. Query intersects posting lists for AND queries.
Query "quick fox":
quick → [doc1, doc5]
fox → [doc1, doc9]
AND → [doc1]
Elasticsearch Architecture
- Index: logical namespace (like database)
- Shard: horizontal partition of index
- Replica: copy for read scale and HA
- Analyzers: tokenize, stem, lowercase text
Writes route to primary shard; replicas sync. Near-real-time search (refresh interval ~1s default).
Ranking & Relevance
TF-IDF, BM25 scoring. Boost fields (title > body). Function scores for popularity, recency. Personalization often hybrid: ES retrieval + ML rerank.
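A toy search pipeline: an inverted index answers AND queries by intersecting posting lists, and BM25 ranks the matches. It uses the IDF form Lucene uses, log(1 + (N - n + 0.5) / (n + 0.5)), with its defaults k1 = 1.2 and b = 0.75; the regex tokenizer is a stand-in for a real analyzer, and posting lists are sets rather than the sorted, compressed lists real engines keep.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Stand-in analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.doc_tokens = {}              # doc id -> token list (for BM25)

    def add(self, doc_id: str, text: str) -> None:
        tokens = tokenize(text)
        self.doc_tokens[doc_id] = tokens
        for term in tokens:
            self.postings[term].add(doc_id)

    def match_all(self, terms: list) -> set:
        """AND query: intersect posting lists, smallest first."""
        if not terms:
            return set()
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for plist in lists[1:]:
            result &= plist
        return result

    def bm25(self, terms: list, doc_id: str, k1: float = 1.2, b: float = 0.75) -> float:
        n_docs = len(self.doc_tokens)
        avgdl = sum(len(t) for t in self.doc_tokens.values()) / n_docs
        tokens = self.doc_tokens[doc_id]
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avgdl  # longer docs are penalized
        score = 0.0
        for term in terms:
            f = tf[term]
            if f == 0:
                continue
            n = len(self.postings[term])  # document frequency
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))  # rare terms weigh more
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)  # term frequency saturates
        return score

    def search(self, query: str) -> list:
        terms = tokenize(query)
        hits = self.match_all(terms)
        return sorted(hits, key=lambda d: (-self.bm25(terms, d), d))
```

Field boosts, function scores, and ML reranking would be layered on top of this base score.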
Sync from Primary DB
CDC or dual-write to index. Reindex on mapping changes (new field type). Handle deletes — tombstone in index.
Alternatives
Algolia/Typesense for managed SaaS; Postgres full-text for small scale; vector DB for semantic search (embeddings).
| Feature | SQL | Elasticsearch |
|---|---|---|
| Prefix search | Poor | Good (edge n-grams) |
| Faceted browse | Heavy GROUP BY | Native aggregations |
| ACID writes | Yes | Eventual index refresh |
Autocomplete Pipeline
Edge n-gram tokenizer at index time; completion suggester on prefix queries.
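A sketch of what an edge n-gram analyzer produces at index time and how a prefix lookup then becomes an exact term match. min_gram and max_gram mirror the Elasticsearch edge_ngram tokenizer settings of the same name, but the values and the dict-based index are illustrative.

```python
from collections import defaultdict


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list:
    """Prefixes of `token` from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def build_prefix_index(titles: dict, min_gram: int = 2, max_gram: int = 10) -> dict:
    index = defaultdict(set)  # gram -> doc ids; the cost is paid once at index time
    for doc_id, title in titles.items():
        for token in title.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def autocomplete(index: dict, prefix: str) -> list:
    # Query time is a single exact lookup; prefixes longer than max_gram
    # would need truncating to max_gram before the lookup.
    return sorted(index.get(prefix.lower(), set()))
```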
Pagination in Search
search_after with sort keys — deep pagination without costly offset.
Index Mapping Mistakes
Wrong field type (text vs keyword) breaks aggregations and exact filters.
Worked Example: Search
Yelp search: geo filter + text + rating facet — inverted index + geo index combined.
Extended Notes
Connect search to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Search
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Inverted index update?
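Documents become searchable after the next refresh (~1 s by default). If a write must be visible immediately, request refresh=wait_for on that write; use external versioning to stop out-of-order updates from overwriting newer data.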
Why Elasticsearch for logs?
Full-text + aggregations + time-series index patterns (ELK stack).
Vector search addition?
Embeddings index for semantic similarity — hybrid with keyword BM25.
Extended Reference — Search & Indexing
Analyzer chain
Lowercase → stopwords → stemmer — tune for language.
Shard sizing ES
20–50 GB per shard is a common guideline; force-merge indices that no longer receive writes during a maintenance window.
Hybrid search
BM25 retrieve top 100 → vector rerank top 10 — best of keyword + semantic.
Index rebuild
Blue-green indices alias swap — zero downtime reindex.
Security
Filter queries by tenant_id mandatory — prevent cross-tenant leak.
Part 23 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Inverted index
- ES shards replicas
- Analyzers
- BM25 ranking
- CDC to index
- Reindex blue-green
- search_after pagination
- Tenant filter mandatory
- Vector hybrid optional
- Autocomplete edge ngram
Self-test prompt
Explain Part 23 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 23 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 24: Real-Time Systems & News Feeds
Fan-Out on Write
When user posts, push post ID into all followers' timeline caches (Redis sorted sets). Read is O(1) — fetch precomputed timeline.
- Pros: fast reads, predictable latency
- Cons: slow write for celebrities (millions of followers); wasted work if follower inactive
Fan-Out on Read
On read, merge recent posts from all followed users. Write is cheap; read is expensive and slow for users following many accounts.
Hybrid (Twitter-style)
Fan-out on write for normal users (<10K followers). Fan-out on read for celebrities — merge celebrity tweets at read time from dedicated cache.
Post tweet:
  if followers < 10K → push to each follower's timeline cache
  else → write to celebrity tweet cache only
Read timeline: merge(user_timeline_cache, celebrity_tweets_cache)
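An in-memory sketch of the hybrid model. Python lists kept newest-first stand in for Redis ZSETs, and the threshold is a constructor parameter so the example can use a tiny value; the 10K default and the 1,000-entry trim come from this part.

```python
import heapq
import itertools
from collections import defaultdict


class FeedService:
    def __init__(self, followers: dict, celebrity_threshold: int = 10_000,
                 timeline_max: int = 1000) -> None:
        self.followers = followers                # author -> set of follower ids
        self.threshold = celebrity_threshold
        self.timeline_max = timeline_max
        self.following = defaultdict(set)         # user -> authors they follow
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.timelines = defaultdict(list)        # user -> [(ts, post_id)], newest first
        self.celebrity_posts = defaultdict(list)  # author -> [(ts, post_id)], newest first

    def _is_celebrity(self, author) -> bool:
        return len(self.followers.get(author, ())) >= self.threshold

    @staticmethod
    def _push(entries: list, item: tuple, cap: int) -> None:
        entries.append(item)
        entries.sort(reverse=True)  # keep newest first
        del entries[cap:]           # trim (Redis would use ZREMRANGEBYRANK)

    def post(self, author, post_id: str, ts: int) -> None:
        if self._is_celebrity(author):
            # Fan-out on read: store once, merge at read time.
            self._push(self.celebrity_posts[author], (ts, post_id), self.timeline_max)
        else:
            # Fan-out on write: one insert per follower timeline.
            for fan in self.followers.get(author, ()):
                self._push(self.timelines[fan], (ts, post_id), self.timeline_max)

    def read(self, user, limit: int = 20) -> list:
        celeb_lists = [self.celebrity_posts[a] for a in self.following[user]
                       if self._is_celebrity(a)]
        merged = heapq.merge(self.timelines[user], *celeb_lists, reverse=True)
        return [post_id for _, post_id in itertools.islice(merged, limit)]
```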
Pull vs Push Models
| | Pull | Push |
|---|---|---|
| Client | Polls server periodically | Server sends via WS/SSE/push notification |
| Latency | Poll interval bound | Near real-time |
| Server load | Empty polls waste resources | Connection state per client |
| Battery | Worse if aggressive poll | Push can be efficient with FCM/APNs |
Activity Streams
FQL-style aggregation: store activities, fan-out to inboxes, rank by ML offline. Kafka for event pipeline; Redis for hot timelines; cold storage in Cassandra.
Ranking Feed
Not chronological at scale — score = f(recency, engagement, affinity). Precompute scores in batch; blend with real-time signals.
Timeline Storage
Redis ZSET: key=timeline:user_id, score=timestamp, member=tweet_id. Trim to top 1000 entries.
Cold Start User
Global popular feed until follow graph populated — onboarding engagement.
Feed Ranking Features
Recency, author affinity, engagement probability — offline model + online blend.
Worked Example: Feeds
Normal user 500 followers: fan-out write 500 Redis ZADDs ~5ms. Celebrity 50M: fan-out read only.
Extended Notes
Connect feeds to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Real-Time Feeds
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Celebrity fan-out hybrid threshold?
Industry often 10K–100K followers — tune by infrastructure cost.
WebSocket scaling?
Pub/sub backplane (Redis) so any WS node receives broadcast to local connections.
Feed ranking offline/online?
Offline batch scores + online rerank with fresh engagement signals.
Extended Reference — Real-Time & Feeds
Ranking pipeline
Offline Spark computes scores hourly; online feature store serves p99 < 10ms lookup.
Feed pagination
Cursor = last tweet_id in page; stable if no deletes; tombstone deleted ids.
Live updates
SSE fanout from pub/sub cheaper than WS for one-way notifications.
Write amplification
Fan-out write 10M followers = 10M writes — async queue required; rate limit celebrity post.
Read merging
K-way merge sorted lists from followees — heap O(log k) per item.
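The heap-based merge written out, assuming each followee list is already sorted newest-first as (timestamp, post_id) pairs. The heap holds one cursor per list, so each emitted item costs O(log k).

```python
import heapq


def k_way_merge_desc(lists: list, limit: int) -> list:
    """Merge k newest-first lists of (ts, post_id); stop after `limit` items."""
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            ts, post_id = lst[0]
            # heapq is a min-heap, so negate ts to pop the newest first;
            # the list index i breaks ties between equal timestamps.
            heapq.heappush(heap, (-ts, i, 0, post_id))
    out = []
    while heap and len(out) < limit:
        _, i, j, post_id = heapq.heappop(heap)
        out.append(post_id)
        if j + 1 < len(lists[i]):  # advance this list's cursor
            ts, next_id = lists[i][j + 1]
            heapq.heappush(heap, (-ts, i, j + 1, next_id))
    return out
```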
Part 24 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Fan-out write read
- Hybrid celebrities
- Redis ZSET timeline
- Pull vs push
- K-way merge
- Ranking offline online
- Cold start feed
- WS pub/sub scale
- Write amplification aware
- Stale feed OK?
Self-test prompt
Explain Part 24 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 24 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 25: Payments & Ledger Design
Requirements
Money must appear to move exactly once (in practice, at-least-once delivery plus idempotency), with an audit trail and PCI scope minimization (use Stripe/Adyen tokenization). Balances need strong consistency.
Double-Entry Ledger
Every transaction has equal debit and credit entries; sum of accounts always balances.
Transfer $100 A → B:
DEBIT account_A 100
CREDIT account_B 100
Immutable ledger entries: never UPDATE balance in place; append entries and compute balance as SUM, or maintain a materialized balance updated in the same DB transaction.
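A minimal in-memory sketch of an append-only double-entry ledger. As a simplifying assumption, each leg is a signed change in integer cents (money leaving A is negative, money arriving at B is positive), and a transaction is rejected unless its legs sum to zero. to_cents shows parsing a decimal string without ever passing through float.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    txn_id: str
    account: str
    amount_cents: int  # signed change: negative = money out, positive = money in


class Ledger:
    def __init__(self) -> None:
        self._entries: list = []  # append-only; rows are never updated in place

    def post(self, txn_id: str, legs: list) -> None:
        """legs: [(account, signed_cents), ...]; must balance to zero."""
        if any(not isinstance(cents, int) for _, cents in legs):
            raise TypeError("amounts must be integer cents")
        if sum(cents for _, cents in legs) != 0:
            raise ValueError("unbalanced transaction")
        self._entries.extend(Entry(txn_id, acct, cents) for acct, cents in legs)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self._entries if e.account == account)

    def trial_balance(self) -> int:
        """Sum over all accounts; anything other than 0 signals a bug."""
        return sum(e.amount_cents for e in self._entries)


def to_cents(amount: str) -> int:
    """Parse a decimal string like "19.99" into integer cents, rejecting sub-cent values."""
    value = Decimal(amount) * 100
    if value != value.to_integral_value():
        raise ValueError("sub-cent amount")
    return int(value)
```

Corrections are new balancing transactions (for example a reversal with the legs' signs flipped), never edits to earlier entries.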
Idempotency Keys
Client sends Idempotency-Key: uuid on POST /charges. Server stores key → result mapping. Retries return same response without double charge.
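An in-memory sketch of the key-to-result mapping. In production this would be a table with a unique index on the key and a TTL; storing a request fingerprint so that a reused key with different parameters is rejected is an added assumption, and holding one lock across the handler is a simplification that serializes all requests.

```python
import threading
from typing import Callable, Hashable


class IdempotencyStore:
    def __init__(self) -> None:
        self._results: dict = {}  # key -> (request fingerprint, stored response)
        self._lock = threading.Lock()

    def run_once(self, key: str, fingerprint: Hashable,
                 handler: Callable[[], dict]) -> dict:
        with self._lock:  # simplification: serializes all requests
            if key in self._results:
                stored_fp, response = self._results[key]
                if stored_fp != fingerprint:
                    raise ValueError("Idempotency-Key reused with different parameters")
                return response  # retry: replay the original response, no second charge
            response = handler()  # if this raises, nothing is stored and a retry runs again
            self._results[key] = (fingerprint, response)
            return response
```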
Payment Flow
- Create payment intent (pending)
- Call PSP (payment service provider)
- Webhook confirms success/failure (async)
- Update ledger + order state atomically
Webhook handler must be idempotent — PSP may retry webhooks.
Reconciliation
Nightly batch compare internal ledger vs PSP settlement files. Discrepancy alerts for fraud or bugs.
Outbox for Side Effects
Ledger write + outbox event in one transaction → email receipt, analytics without losing money record.
| Failure | Handling |
|---|---|
| PSP timeout | Query PSP status; never assume failure |
| Duplicate webhook | Idempotent webhook handler |
| Partial saga | Compensating refund saga |
PCI DSS Layers
SAQ A if all card data on Stripe Elements — smallest compliance burden.
Currency & Rounding
Store amounts in minor units (cents) as integers — never float for money.
Chargeback Flow
Webhook dispute.created → freeze merchant payout → evidence upload workflow.
Worked Example: Payments
Stripe webhook idempotent by event_id unique index. Ledger append-only, never UPDATE amount.
Extended Notes
Connect payments to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Payments
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Double-entry why?
Audit trail, detection of imbalances caused by bugs or fraud, and accounting compliance.
Idempotency key storage TTL?
24–72 hours covers client retry windows; Stripe documents 24h.
Webhook ordering?
Do not assume order — use event_id dedup and state machine.
Extended Reference — Payments & Ledger
Immutable ledger
Append-only entries; corrections via compensating entries not UPDATE.
Minor units
BIGINT cents prevents float rounding 0.1 + 0.2 bugs.
PSP abstraction
Interface PaymentProvider — swap Stripe/Adyen; mock in tests.
Fraud checks
Sync fraud score before capture — async review for high value.
Regulatory
KYC/AML separate service; PCI scope minimization documented.
Part 25 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Double-entry ledger
- Idempotency Stripe
- Webhook dedup
- Integer cents
- Saga payment
- Reconciliation batch
- PCI tokenize
- Never float money
- Outbox notify
- Compensate refund
Self-test prompt
Explain Part 25 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 25 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 26: Notification System Design
Channels
Email, SMS, push (FCM/APNs), in-app, web push. Each channel has different providers, rate limits, cost, and delivery guarantees.
High-Level Architecture
[Event: order shipped] → Kafka → [Notification Service]
├→ Email worker → SendGrid
├→ SMS worker → Twilio
└→ Push worker → FCM/APNs
User Preferences
Store per-user channel opt-in, quiet hours, locale. Check preferences before enqueue. Regulatory: marketing vs transactional (CAN-SPAM, TCPA).
Template & Localization
Template ID + variables rendered per locale. Version templates; A/B test subject lines offline.
Delivery & Retries
At-least-once queue per channel. Exponential backoff on provider 5xx. DLQ for bad addresses. Track delivery webhooks (email opened, bounce).
Volume Estimation
10M DAU × 5 notifications/day = 50M messages/day ≈ 580/sec average, higher peak. Shard queue by user_id. Rate limit per provider (SMS expensive).
Idempotency
Event id + notification type dedupes — avoid duplicate push on Kafka replay.
Priority Queues
Transactional (password reset) > marketing. Separate queues so blast campaign does not delay 2FA codes.
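A single-process sketch of separate queues with strict priority: workers always drain the transactional queue first, so a large campaign cannot delay a 2FA code. In production these would be separate topics or queues with dedicated workers; strict priority can starve marketing, which is why dedicated capacity per queue is often preferred.

```python
from collections import deque


class TieredQueues:
    """Separate FIFO queues; workers drain higher tiers first (strict priority)."""

    TIERS = ("transactional", "marketing")  # highest priority first (assumed two tiers)

    def __init__(self) -> None:
        self.queues = {tier: deque() for tier in self.TIERS}

    def enqueue(self, tier: str, message: str) -> None:
        self.queues[tier].append(message)  # KeyError on an unknown tier is intentional

    def next_batch(self, n: int) -> list:
        batch = []
        for tier in self.TIERS:
            queue = self.queues[tier]
            while queue and len(batch) < n:
                batch.append(queue.popleft())
        return batch
```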
Monitoring
Metrics: sent, delivered, failed, latency per channel. Alert on bounce rate spike (bad list) or provider outage.
| Channel | Latency | Cost |
|---|---|---|
| Push | Seconds | Low |
| Email | Seconds–minutes | Low |
| SMS | Seconds | High |
Push Token Registry
device_tokens table: user_id, platform, token, updated_at. Invalidate on bounce.
Email Bounce Handling
Hard bounce → suppress address. Soft bounce → retry with backoff.
Unsubscribe One-Click
Send the List-Unsubscribe header (plus List-Unsubscribe-Post for one-click unsubscribe, per RFC 8058) on marketing mail for compliance.
Worked Example: Notifications
Password reset: SMS+email parallel, priority queue bypasses marketing throttle.
Extended Notes
Connect notifications to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Notifications
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Push vs SMS for 2FA?
SMS deliverability issues — prefer TOTP app; SMS fallback with rate limit.
Notification dedup?
event_id + channel unique constraint before send.
Quiet hours?
Store user timezone; scheduler delays non-urgent marketing sends.
Extended Reference — Notification Systems
Template versioning
v2 template rollback if conversion drops — A/B metric driven.
Provider failover
Primary SendGrid fail → secondary SES — circuit breaker per provider.
Batching
Digest email aggregates 50 events — reduces send volume.
Compliance
STOP keyword for SMS; one-click unsubscribe link tracking.
Load test
Simulate Black Friday notification spike through queue without sending real SMS cost.
Part 26 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Multi-channel queue
- Priority queues
- Template locale
- Device token registry
- Bounce suppress
- Dedup event_id
- Rate limit SMS cost
- Quiet hours TZ
- Provider failover
- Transactional vs marketing
Self-test prompt
Explain Part 26 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record a confidence score out of 5 for Part 26 after the self-test; revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 27: Full System Design Walkthroughs
Twenty-five classic interview problems. For each: clarify requirements, run numbers, define APIs, draw architecture, sketch schema, name bottlenecks, and discuss extensions. Time-box to 45 minutes per problem in mock practice.
How to practice: Minute 0–8 requirements + estimates. Minute 8–20 high-level diagram. Minute 20–38 deep dive (interviewer choice). Minute 38–45 trade-offs and monitoring. Record yourself and score with Part 33 rubric.
URL Shortener (TinyURL)
Functional & Non-Functional Requirements
Scope summary: 100M URLs/month, 100:1 read:write
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
100M/mo → ~40 writes/s and ~4,000 reads/s on average (100:1); plan for a multiple of that at peak
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average and peak QPS = daily_ops/86400 × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.
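The five steps as small Python helpers, applied to the URL shortener. The 30-day month, 500-byte record, 3× replication, and 5-year retention are assumptions to state on the whiteboard, not figures from the problem.

```python
SECONDS_PER_DAY = 86_400


def avg_qps(ops_per_day: float) -> float:
    """Step 3: average operations per second."""
    return ops_per_day / SECONDS_PER_DAY


def peak_qps(ops_per_day: float, peak_factor: float) -> float:
    return avg_qps(ops_per_day) * peak_factor


def storage_bytes(new_objects_per_day: float, object_bytes: int,
                  replication: int, retention_days: int) -> float:
    """Step 4: count x size x replication x retention."""
    return new_objects_per_day * retention_days * object_bytes * replication


def bandwidth_bytes_per_s(qps: float, payload_bytes: int) -> float:
    """Step 5: QPS x payload."""
    return qps * payload_bytes


# URL shortener: 100M new URLs/month, 100:1 read:write (assumed 30-day month).
writes_per_day = 100e6 / 30
reads_per_day = writes_per_day * 100

print(round(avg_qps(writes_per_day)))  # 39 writes/s on average
print(round(avg_qps(reads_per_day)))   # 3858 reads/s on average
print(round(peak_qps(reads_per_day, peak_factor=3)))  # assumed 3x peak
print(storage_bytes(writes_per_day, 500, 3, 5 * 365) / 1e12)  # about 9.1 TB
```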
API Design
POST /v1/urls {long_url} → {short_code}; GET /{code} → 302
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Hash (base62) or counter+encode; collision retry; custom aliases optional
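A sketch of the counter+encode option: a numeric ID (from a counter, or from an ID range pre-allocated to each node) is encoded in base62. The alphabet order is an arbitrary choice; seven characters give 62^7 ≈ 3.5 trillion codes, enough for 100M URLs/month for well over a thousand years.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a short code."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # ValueError on characters outside the alphabet
    return n


print(BASE ** 7)  # 3521614606208 possible 7-character codes
```

Sequential counters make codes enumerable; if that matters, prefer the hash-based option.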
┌─────────────┐
Users ───────────►│ CDN / Edge │ (static, cacheable GETs)
└──────┬──────┘
▼
┌─────────────┐
│ Load Balancer│ L7 routing, TLS, WAF
└──────┬──────┘
▼
┌────────────────────────┐
│ Stateless API tier │ autoscale on CPU/latency
└───────────┬────────────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌────────────┐ ┌──────────────┐
│ Cache │ │ Primary │ │ Message Queue │
│ (Redis) │ │ Database │ │ (async work) │
└──────────┘ └────────────┘ └──────────────┘
│
┌──────▼──────┐
│ Object store │ (media, large blobs)
└─────────────┘
Data Model & Schema Sketch
urls(id, short_code PK, long_url, user_id, created_at); the primary key on short_code already serves the redirect lookup, so no extra index is needed
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: a hot ID-counter shard; redirect read load (serve redirects from cache); durability of the code → URL mapping (keep the DB as the source of truth)
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Analytics pipeline, expiration, abuse detection
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Paste Bin
Functional & Non-Functional Requirements
Scope summary: 10M pastes/month, public/private
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
~4 pastes/s average (10M ÷ ~2.6M seconds per month); reads are much higher and concentrated on popular pastes
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
POST /pastes; GET /pastes/{id}; optional expiry
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Object store for body; metadata in SQL; CDN for public reads
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
pastes(id, user_id, visibility, expiry, s3_key, created_at)
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: very large pastes (enforce a size cap); spam; storing identical content repeatedly (dedupe by content hash)
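Deduplicating identical bodies can be sketched with content addressing: the object-store key is derived from a SHA-256 of the body, so identical pastes share one blob, and a reference count decides when the blob can be deleted. The 10 MiB cap, the key prefix, and the in-memory dicts standing in for the object store and metadata DB are assumptions.

```python
import hashlib

MAX_PASTE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


class PasteBodyStore:
    """Content-addressed sketch: identical bodies share one object-store key."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # stands in for the object store
        self.refcounts: dict[str, int] = {}  # number of pastes pointing at each blob

    def put(self, body: bytes) -> str:
        if len(body) > MAX_PASTE_BYTES:
            raise ValueError("paste exceeds size limit")
        key = "pastes/" + hashlib.sha256(body).hexdigest()
        if key not in self.objects:          # upload only the first copy
            self.objects[key] = body
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key                           # stored as pastes.s3_key

    def release(self, key: str) -> None:
        """Call when a paste expires or is deleted; drop the blob at zero refs."""
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            del self.refcounts[key]
            del self.objects[key]


bodies = PasteBodyStore()
k1 = bodies.put(b"print('hello')")
k2 = bodies.put(b"print('hello')")  # same content, same key, no second upload
print(k1 == k2, len(bodies.objects))
```

Deletion must go through the reference count: removing the blob when one paste expires would break every other paste that shares it.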
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Syntax highlighting service, rate limits
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Distributed Rate Limiter
Functional & Non-Functional Requirements
Scope summary: 1M users, rules per API key
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Per-key QPS limits, sliding window
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
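Middleware enforces the limit per API key and returns X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers; rejected requests get 429 Too Many Requests with Retry-After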
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Redis sorted sets (sliding-window log) or a token bucket per key; for strict limits, make each check-and-update atomic (e.g. a Lua script) and synchronous
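A token bucket for one key can be sketched as below. This is a single-process illustration with an injectable clock for testing; in the Redis variant the same read-refill-decrement must run atomically per key (commonly as a Lua script via EVAL), otherwise concurrent API servers can overspend the bucket. Capacity and refill rate are illustrative values.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: `capacity` allows bursts, `refill_per_sec` is the steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller responds 429 with Retry-After


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.0
print(bucket.allow())                      # True: one token refilled
```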
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
rules(key, limit, window); counters in Redis
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
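The sliding-window log (the sorted-set variant) can be sketched in memory as below; the comments name the Redis sorted-set command that would play each role in the Redis version (ZREMRANGEBYSCORE, ZCARD, ZADD). The class name and parameters are illustrative.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Allow at most `limit` requests per key in any trailing `window_sec` window."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.log: dict[str, deque] = defaultdict(deque)  # key -> request timestamps

    def allow(self, key: str, now: float) -> bool:
        q = self.log[key]
        # Evict timestamps that fell out of the window (ZREMRANGEBYSCORE in Redis).
        while q and q[0] <= now - self.window_sec:
            q.popleft()
        if len(q) < self.limit:  # ZCARD
            q.append(now)        # ZADD
            return True
        return False


limiter = SlidingWindowLog(limit=2, window_sec=1.0)
print([limiter.allow("api-key-1", t) for t in (0.0, 0.5, 0.9, 1.0)])
# [True, True, False, True]: at t=1.0 the request from t=0.0 has aged out
```

The log keeps one timestamp per allowed request, so memory grows with limit × active keys; that is the Redis memory risk listed below, and the usual remedy for high limits is a sliding-window counter or a token bucket.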
Bottlenecks, Failure Modes & Mitigations
Primary risks: Redis memory; clock skew; burst traffic
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Hierarchical limits, dynamic config
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Web Crawler
Functional & Non-Functional Requirements
Scope summary: 1B pages, polite crawling
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Frontier queue dominates
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
BFS frontier; fetcher workers; dedupe URL hash
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
URL frontier queue, Bloom filter for visited URLs, robots.txt cache
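The visited-URL Bloom filter can be sketched as below, using the standard sizing formulas and double hashing over SHA-256. The capacity and false-positive rate are assumptions. A false positive means a new URL is wrongly skipped, which a crawler can usually tolerate; there are no false negatives.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Visited-URL set with no false negatives and a tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


seen = BloomFilter(expected_items=1_000_000, fp_rate=0.01)
url = "https://example.com/page"
if url not in seen:  # a "maybe seen" answer means the URL is skipped
    seen.add(url)
print(f"{seen.m / 8 / 1e6:.2f} MB, k={seen.k}")  # about 1.2 MB for 1M URLs, 7 hashes
```

At 1% false positives the filter needs about 9.6 bits per URL, so 1B URLs fit in roughly 1.2 GB.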
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
urls(url_hash PK, status, priority, last_crawled)
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: per-host politeness (request rate per host, robots.txt); duplicate URLs and content; DNS resolution latency
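Per-host politeness can be sketched as a frontier that assigns each URL the next free time slot for its host. Assumptions: a fixed minimum delay per host (a real crawler might widen it for slow hosts or honor a robots.txt Crawl-delay, which is a non-standard directive), and a single in-process heap rather than the per-host queues of a distributed frontier. The class name is hypothetical.

```python
import heapq
from typing import Optional
from urllib.parse import urlsplit


class PoliteFrontier:
    """Frontier that never releases two URLs for one host within `min_delay_sec`."""

    def __init__(self, min_delay_sec: float) -> None:
        self.min_delay_sec = min_delay_sec
        self.next_slot: dict[str, float] = {}         # host -> earliest free slot
        self.heap: list[tuple[float, int, str]] = []  # (ready_at, seq, url)
        self.seq = 0                                  # FIFO tie-breaker

    def add(self, url: str, now: float) -> None:
        host = urlsplit(url).hostname or ""
        ready_at = max(now, self.next_slot.get(host, now))
        self.next_slot[host] = ready_at + self.min_delay_sec
        heapq.heappush(self.heap, (ready_at, self.seq, url))
        self.seq += 1

    def pop_ready(self, now: float) -> Optional[str]:
        """Return the next URL whose host slot has arrived, else None."""
        if self.heap and self.heap[0][0] <= now:
            return heapq.heappop(self.heap)[2]
        return None


frontier = PoliteFrontier(min_delay_sec=1.0)
for u in ("https://a.example/1", "https://a.example/2", "https://b.example/1"):
    frontier.add(u, now=0.0)
print(frontier.pop_ready(0.0), frontier.pop_ready(0.0), frontier.pop_ready(0.0))
# a.example/1 and b.example/1 are released; a.example/2 waits until t=1.0
```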
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Distributed scheduling, PageRank pipeline
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Twitter / X News Feed
Functional & Non-Functional Requirements
Scope summary: 300M DAU, fan-out on write vs read
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
5K tweets/s write, massive read
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
POST /tweets; GET /timeline; follow graph
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Hybrid fan-out: celebrities fan-out on read; normal users on write
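The hybrid fan-out can be sketched in memory as below. The class name, the follower-count threshold that marks a celebrity, and the dicts standing in for the timeline cache and tweet store are assumptions.

```python
from collections import defaultdict


class HybridFeed:
    """Fan-out on write for normal authors, fan-out on read for celebrities."""

    def __init__(self, celebrity_threshold: int) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.followers: dict[str, set] = defaultdict(set)   # author -> followers
        self.following: dict[str, set] = defaultdict(set)   # user -> authors
        self.inbox: dict[str, list] = defaultdict(list)     # user -> [(ts, tweet_id)] precomputed
        self.outbox: dict[str, list] = defaultdict(list)    # author -> [(ts, tweet_id)] own tweets

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author: str, tweet_id: str, ts: int) -> None:
        self.outbox[author].append((ts, tweet_id))
        if not self.is_celebrity(author):
            for f in self.followers[author]:  # write amplification happens here
                self.inbox[f].append((ts, tweet_id))

    def timeline(self, user: str, limit: int = 20) -> list:
        # dict dedupes tweets reachable both ways (e.g. an author who crossed the threshold)
        merged = {tid: ts for ts, tid in self.inbox[user]}
        for author in self.following[user]:
            if self.is_celebrity(author):     # pulled at read time
                merged.update((tid, ts) for ts, tid in self.outbox[author][-limit:])
        ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
        return [tid for tid, _ in ranked[:limit]]


feed = HybridFeed(celebrity_threshold=2)
feed.follow("alice", "bob")
feed.follow("alice", "star")
feed.follow("carol", "star")             # star now has 2 followers: celebrity
feed.post("bob", "t1", ts=1)             # fanned out to alice's inbox
feed.post("star", "t2", ts=2)            # not fanned out; merged on read
print(feed.timeline("alice"))            # ['t2', 't1']
```

Posts by normal authors cost one inbox append per follower; celebrities skip that cost and are merged at read time, so a single post never triggers millions of writes.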
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
tweets, follows, timeline cache (Redis sorted sets)
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: Hot users; thundering herd on celebrities
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
ML ranking, Spaces, ad injection
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Instagram
Functional & Non-Functional Requirements
Scope summary: Photo-heavy, social graph
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
S3 + CDN; metadata DB
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
POST /media; GET /feed; likes/comments
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Blob store; Cassandra for feeds; graph for follows
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
media, users, feeds, likes — denormalized counters
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
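Denormalized counters such as likes on a viral post are a classic hot key; the random-suffix mitigation in the failure table can be sketched as a sharded counter. The shard count and key format are assumptions, and a dict stands in for Redis keys or counter rows.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Spread one hot counter across N sub-keys named '<name>:<shard>'."""

    def __init__(self, shards: int = 16) -> None:
        self.shards = shards
        self.store: dict[str, int] = defaultdict(int)  # stands in for Redis/DB rows

    def incr(self, name: str, by: int = 1) -> None:
        # A random suffix picks one of N keys, so writes stop piling onto one key.
        key = f"{name}:{random.randrange(self.shards)}"
        self.store[key] += by

    def value(self, name: str) -> int:
        # Reads pay instead: sum all N sub-keys (cache the total if read-heavy).
        return sum(self.store.get(f"{name}:{i}", 0) for i in range(self.shards))


likes = ShardedCounter(shards=8)
for _ in range(1000):
    likes.incr("likes:post:42")
print(likes.value("likes:post:42"))  # 1000
```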
Bottlenecks, Failure Modes & Mitigations
Primary risks: Image processing pipeline; feed generation
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Stories TTL, recommendations
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
WhatsApp / Chat
Functional & Non-Functional Requirements
Scope summary: 1B messages/day, delivery guarantees
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
~12K msg/s average
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
WebSocket gateway; message service; presence
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Per-chat sequence; store-and-forward; offline inbox
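Per-chat sequencing, idempotent sends, and store-and-forward sync can be sketched together as below. Assumptions: a client-generated `client_msg_id` for deduplicating retries and a per-device acknowledged cursor; both names are hypothetical, and in-memory dicts stand in for the message store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    chat_id: str
    seq: int
    sender: str
    body: str


class ChatLog:
    """Per-chat sequence numbers, idempotent sends, and per-device sync cursors."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = defaultdict(int)             # chat_id -> last seq
        self.messages: dict[str, list] = defaultdict(list)           # chat_id -> ordered log
        self.cursor: dict[tuple, int] = defaultdict(int)             # (device, chat_id) -> acked seq
        self.seen_client_ids: dict[tuple, int] = {}                  # (chat_id, client_msg_id) -> seq

    def send(self, chat_id: str, sender: str, body: str, client_msg_id: str) -> int:
        key = (chat_id, client_msg_id)
        if key in self.seen_client_ids:  # client retry: do not store twice
            return self.seen_client_ids[key]
        self.last_seq[chat_id] += 1
        seq = self.last_seq[chat_id]
        self.messages[chat_id].append(Message(chat_id, seq, sender, body))
        self.seen_client_ids[key] = seq
        return seq

    def pending(self, device: str, chat_id: str) -> list:
        """Store-and-forward: everything after this device's acknowledged cursor."""
        after = self.cursor[(device, chat_id)]
        return [m for m in self.messages[chat_id] if m.seq > after]

    def ack(self, device: str, chat_id: str, seq: int) -> None:
        key = (device, chat_id)
        self.cursor[key] = max(self.cursor[key], seq)


chat = ChatLog()
chat.send("chat-1", "alice", "hi", "m-1")
chat.send("chat-1", "alice", "hi", "m-1")          # retry returns seq 1 again
chat.send("chat-1", "alice", "you there?", "m-2")
chat.ack("bob-phone", "chat-1", 1)
print([m.seq for m in chat.pending("bob-phone", "chat-1")])   # [2]
print([m.seq for m in chat.pending("bob-laptop", "chat-1")])  # [1, 2]
```

Because sequence numbers are per chat, a single writer (or a conditional increment) per chat_id is enough to order messages; no global ordering is needed.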
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
messages(chat_id, seq, body, status); users, devices
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: Connection count; multi-device sync; E2E optional
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Groups, media, encryption
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
YouTube / Netflix Video
Functional & Non-Functional Requirements
Scope summary: Upload + transcode + stream
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Huge egress bandwidth
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
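For video, step (5) dominates. A sketch with hypothetical inputs (1M concurrent viewers at an average 5 Mbps); substitute the numbers you agree on with the interviewer.

```python
def egress_tbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Aggregate streaming egress in terabits per second."""
    return concurrent_viewers * avg_bitrate_mbps * 1e6 / 1e12


def gb_per_viewer_hour(avg_bitrate_mbps: float) -> float:
    """Data delivered to one viewer in one hour, in gigabytes (bits / 8 = bytes)."""
    return avg_bitrate_mbps * 1e6 / 8 * 3600 / 1e9


print(egress_tbps(1_000_000, 5))  # 5.0 Tbps for 1M concurrent 5 Mbps streams
print(gb_per_viewer_hour(5))      # 2.25 GB per viewer-hour
```

Terabits per second of sustained egress is why segments are served from CDN edges rather than from origin storage.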
API Design
Multipart upload; HLS/DASH segments; CDN
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Upload → queue → transcode workers → object store + CDN
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
videos, renditions, view_counts
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: Transcode cost; copyright; regional CDN
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Live stream, recommendations, DRM
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Uber / Lyft
Functional & Non-Functional Requirements
Scope summary: Real-time location, matching
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Geospatial index critical
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
POST /rides; driver location stream; match service
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Geohash/grid index; dispatch service; trip state machine
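The geohash index can be sketched with the standard encoding below: the longitude and latitude ranges are bisected alternately, and every 5 bits become one character of the geohash base32 alphabet. The function name and default precision are illustrative.

```python
# Standard geohash base32 alphabet (omits a, i, l, o).
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per output character."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars: list = []
    value, nbits, use_lng = 0, 0, True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, coord = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid                 # keep the upper half
        else:
            value <<= 1
            rng[1] = mid                 # keep the lower half
        use_lng = not use_lng
        nbits += 1
        if nbits == 5:
            chars.append(GEOHASH_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # "u4pru"
```

Nearby points share a prefix, so a driver lookup scans the rider's cell plus its 8 neighbors (points just across a cell edge can have very different hashes). A 6-character cell is roughly 1.2 km × 0.6 km.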
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
drivers(location, status), rides, users
-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs (time-sortable 64-bit IDs generated without a central coordinator) when you need ordered keys across shards.
Bottlenecks, Failure Modes & Mitigations
Primary risks: split-brain matching (one driver dispatched to two riders); load spikes during surge-pricing events
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Pooling, ETA ML, payments
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Yelp Proximity Search
Functional & Non-Functional Requirements
Scope summary: Search nearby businesses
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Geospatial queries <100ms
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention (days) × replication factor, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
API Design
GET /search?lat&lng&radius&query
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Elasticsearch/OpenSearch geo_distance queries; cache results for popular cities
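As a reference for what a radius filter computes, here is a linear-scan sketch using the haversine formula on a spherical Earth (mean radius 6,371 km). The business records and function names are hypothetical; at real scale the search index or a geohash prefix prunes candidates before any exact distance check.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(businesses: list, lat: float, lng: float, radius_m: float) -> list:
    """Linear-scan reference: keep businesses within radius, nearest first."""
    hits = []
    for b in businesses:
        d = haversine_m(lat, lng, b["lat"], b["lng"])
        if d <= radius_m:
            hits.append((d, b["id"], b))
    hits.sort(key=lambda t: (t[0], t[1]))  # distance, then id for stable ties
    return [b for _, _, b in hits]


shops = [
    {"id": "a", "lat": 37.7749, "lng": -122.4194},
    {"id": "b", "lat": 37.7849, "lng": -122.4194},  # ~1.1 km north of "a"
    {"id": "c", "lat": 37.3382, "lng": -121.8863},  # tens of km away
]
print([b["id"] for b in nearby(shops, 37.7749, -122.4194, radius_m=2000)])  # ['a', 'b']
```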
                  ┌────────────────┐
Users ───────────►│   CDN / Edge   │  (static, cacheable GETs)
                  └───────┬────────┘
                          ▼
                  ┌────────────────┐
                  │ Load Balancer  │  L7 routing, TLS, WAF
                  └───────┬────────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
  ┌──────────┐      ┌────────────┐   ┌───────────────┐
  │  Cache   │      │  Primary   │   │ Message Queue │
  │ (Redis)  │      │  Database  │   │ (async work)  │
  └──────────┘      └─────┬──────┘   └───────────────┘
                          │
                  ┌───────▼──────┐
                  │ Object store │  (media, large blobs)
                  └──────────────┘
Data Model & Schema Sketch
businesses(id, lat, lng, categories, rating)
-- Index for hot access path; avoid over-indexing writes
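-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.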
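A minimal sketch of the geospatial access path, assuming each business row stores a precomputed geohash in an indexed column: encode the query point, match on a geohash prefix as a coarse filter, then apply an exact haversine distance check. The helper names are hypothetical. Precision 5 gives cells of roughly 5 km × 5 km; a production query must also scan the 8 neighboring cells, because a nearby business can sit just across a cell boundary. A search engine `geo_distance` query does this filtering internally.

```python
import math
from typing import Dict, List

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)
EARTH_RADIUS_KM = 6371.0


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits (longitude first), 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars: List[str] = []
    bits, bit_count, use_lng = 0, 0, True
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(businesses: List[Dict], lat: float, lng: float,
           radius_km: float, precision: int = 5) -> List[str]:
    """Coarse geohash-prefix filter, then exact distance; nearest first.
    Simplification: only the query's own cell is checked (no neighbor cells)."""
    prefix = geohash_encode(lat, lng, precision)
    hits = []
    for b in businesses:
        # In a real table this geohash is a stored, indexed column
        if geohash_encode(b["lat"], b["lng"], precision) != prefix:
            continue
        d = haversine_km(lat, lng, b["lat"], b["lng"])
        if d <= radius_km:
            hits.append((d, b["id"]))
    return [bid for _, bid in sorted(hits)]
```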
Bottlenecks, Failure Modes & Mitigations
Primary risks: Index size; ranking relevance vs distance
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
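Burn rate is the observed error rate divided by the error budget (1 - SLO); a burn rate of 1 spends exactly the budget over the SLO window. A minimal sketch of a two-window page condition, assuming a 99.9% SLO and the commonly cited 14.4 threshold (2% of a 30-day budget consumed in one hour, since 0.02 × 720 h = 14.4), evaluated over a long window (e.g. 1 h) and a short window (e.g. 5 min) so the alert clears quickly once the problem stops. Function names are hypothetical.

```python
from typing import Tuple

Window = Tuple[int, int]  # (errors, requests) observed in the window


def burn_rate(errors: int, requests: int, slo: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO)."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo)


def should_page(long_window: Window, short_window: Window,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    # Both windows must burn fast: the long one shows the problem is significant,
    # the short one shows it is still happening.
    return (burn_rate(*long_window, slo) > threshold
            and burn_rate(*short_window, slo) > threshold)
```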
Extensions & Senior Follow-Ups
Reviews, photos, ads
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Ticketmaster
Functional & Non-Functional Requirements
Scope summary: High contention on-sale
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Spike 100x normal at drop
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Reserve → pay → confirm; queue users virtually
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Virtual waiting room; inventory row locks; idempotent booking
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
events, seats(status), reservations, orders
-- Index for hot access path; avoid over-indexing writes
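-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.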
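The hold-then-confirm flow can be sketched in memory as below, with a lock standing in for the database row lock. In a relational store the same check is usually a single conditional `UPDATE seats SET status = 'HELD', ... WHERE id = ? AND (status = 'AVAILABLE' OR (status = 'HELD' AND hold_until < now()))` whose affected-row count tells you whether this request won the seat. `SeatInventory`, the 5-minute `HOLD_TTL_S`, and the status names are assumptions. Confirm is idempotent so a client retry after a timeout neither fails nor double-sells.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

HOLD_TTL_S = 300  # assumed 5-minute hold window


@dataclass
class Seat:
    status: str = "AVAILABLE"   # AVAILABLE | HELD | SOLD
    holder: Optional[str] = None
    hold_until: float = 0.0


class SeatInventory:
    def __init__(self, seat_ids: Iterable[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._seats: Dict[str, Seat] = {s: Seat() for s in seat_ids}
        self._lock = threading.Lock()  # stands in for the row lock
        self._clock = clock

    def hold(self, seat_id: str, user: str) -> bool:
        with self._lock:
            seat = self._seats[seat_id]
            now = self._clock()
            expired = seat.status == "HELD" and seat.hold_until < now
            if seat.status == "AVAILABLE" or expired:
                seat.status, seat.holder = "HELD", user
                seat.hold_until = now + HOLD_TTL_S
                return True
            return False  # someone else holds or bought it

    def confirm(self, seat_id: str, user: str) -> bool:
        """Call after payment succeeds; safe to retry."""
        with self._lock:
            seat = self._seats[seat_id]
            if seat.status == "SOLD":
                return seat.holder == user  # retry of our own confirm succeeds
            if (seat.status == "HELD" and seat.holder == user
                    and seat.hold_until >= self._clock()):
                seat.status = "SOLD"
                return True
            return False  # hold expired or was never ours
```

The virtual waiting room sits in front of `hold`, so this path only sees an admitted, rate-limited stream of users even during a 100x on-sale spike.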
Bottlenecks, Failure Modes & Mitigations
Primary risks: Overselling; bots; payment failures
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Secondary market, dynamic pricing
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Dropbox
Functional & Non-Functional Requirements
Scope summary: File sync, conflict resolution
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Chunk-level dedupe
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Upload blocks; sync metadata; delta sync
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Metadata DB + block blob store; content-hash dedupe
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
files, blocks, devices, versions
-- Index for hot access path; avoid over-indexing writes
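-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.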
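Content-hash dedupe can be sketched as: split the file into blocks, hash each block, and upload only the blocks the server does not already have; the file's metadata row then stores just the ordered list of block hashes. `BlockStore`, `upload`, and the 4 MiB fixed block size are assumptions. Fixed-size blocks are the simplest choice, but an insertion near the start of a file shifts every later block; content-defined chunking with a rolling hash avoids that at the cost of more CPU.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # assumed fixed block size (4 MiB)


class BlockStore:
    """Hypothetical content-addressed store: SHA-256 hex digest -> block bytes."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def missing(self, hashes: List[str]) -> List[str]:
        # Server tells the client which blocks it still needs
        return [h for h in hashes if h not in self.blocks]

    def put(self, digest: str, data: bytes) -> None:
        self.blocks.setdefault(digest, data)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload(data: bytes, store: BlockStore,
           block_size: int = BLOCK_SIZE) -> Tuple[List[str], int]:
    """Return (ordered block hashes for the file's metadata row, blocks actually sent)."""
    blocks = split_blocks(data, block_size)
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
    need = set(store.missing(hashes))
    sent = 0
    for digest, block in zip(hashes, blocks):
        if digest in need:
            store.put(digest, block)
            need.discard(digest)  # a block repeated within the file is sent once
            sent += 1
    return hashes, sent


def download(hashes: List[str], store: BlockStore) -> bytes:
    return b"".join(store.blocks[h] for h in hashes)
```

Delta sync falls out of the same mechanism: a new version re-sends only its changed blocks, and the version row points at the new hash list.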
Bottlenecks, Failure Modes & Mitigations
Primary risks: Large file uploads; conflict merges
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Sharing permissions, encryption
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Typeahead / Autocomplete
Functional & Non-Functional Requirements
Scope summary: Low latency <50ms
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Prefix queries, trending
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
GET /suggest?q=pre
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Trie or Elasticsearch completion; popular queries cache
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
n-gram index; query log aggregation
-- Index for hot access path; avoid over-indexing writes
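-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.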
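A trie that caches the top-k completions at every prefix node turns each lookup into a walk of len(prefix) steps with no scoring at request time, which is what keeps p99 low. A minimal sketch, assuming the trie is rebuilt offline from aggregated query-log counts and swapped in periodically to pick up trending queries; the class and method names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.top: List[Tuple[int, str]] = []  # cached (count, query), best first


class Autocomplete:
    def __init__(self, k: int = 5) -> None:
        self.root = TrieNode()
        self.k = k

    def build(self, query_counts: Dict[str, int]) -> None:
        """Offline build from aggregated query logs."""
        for query, count in query_counts.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((count, query))
        # Trim every node to its k most popular completions once, after all inserts
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.top = heapq.nlargest(self.k, node.top)
            stack.extend(node.children.values())

    def suggest(self, prefix: str) -> List[str]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top]
```

Hot prefixes (one or two characters) are the best candidates for an extra cache or CDN layer because their answers change only when the trie is rebuilt.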
Bottlenecks, Failure Modes & Mitigations
Primary risks: Hot prefixes; personalization
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Spell-check, ranking by CTR
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
News Feed Ranking
Functional & Non-Functional Requirements
Scope summary: Personalized ranked feed
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
ML feature store + scoring
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Candidate generation → rank → filter
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Stream processing for features; cache ranked pages
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
posts, user_features, impressions
-- Index for hot access path; avoid over-indexing writes
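-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.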
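The candidate generation → rank → filter pipeline can be sketched as three small functions. The scoring formula (author affinity × log-scaled engagement × exponential freshness decay with a 6-hour half-life) is a hypothetical stand-in for a learned model reading from a feature store; only the shape of the pipeline is the point, and all names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set

HALF_LIFE_S = 6 * 3600  # hypothetical freshness half-life


@dataclass
class Post:
    post_id: str
    author_id: str
    created_at: float  # unix seconds
    engagement: int    # likes + comments, a hypothetical signal


def generate_candidates(followees: Set[str], recent_posts: List[Post]) -> List[Post]:
    # Stage 1: cheap recall, here recent posts from accounts the user follows
    return [p for p in recent_posts if p.author_id in followees]


def score(post: Post, affinity: Dict[str, float], now: float) -> float:
    # Stage 2: stand-in for a model score computed from stored features
    age_s = max(0.0, now - post.created_at)
    freshness = 0.5 ** (age_s / HALF_LIFE_S)
    quality = 1.0 + math.log1p(post.engagement)
    return affinity.get(post.author_id, 0.1) * quality * freshness


def rank_feed(followees: Set[str], recent_posts: List[Post],
              affinity: Dict[str, float], seen: Set[str],
              now: float, limit: int = 10) -> List[str]:
    candidates = generate_candidates(followees, recent_posts)
    ranked = sorted(candidates, key=lambda p: score(p, affinity, now), reverse=True)
    # Stage 3: filter already-seen (in practice also blocked or policy-violating) posts
    return [p.post_id for p in ranked if p.post_id not in seen][:limit]
```

The freshness term is where the freshness-vs-relevance trade-off lives: a shorter half-life favors recency, a longer one favors engagement.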
Bottlenecks, Failure Modes & Mitigations
Primary risks: Freshness vs relevance; filter bubbles
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Real-time re-rank, A/B infra
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Metrics Monitoring (Datadog)
Functional & Non-Functional Requirements
Scope summary: 1M metrics × 10 tags, write heavy
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Time-series DB
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Agents push; rollup; query API
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Kafka → TSDB (Cassandra/ClickHouse); downsampling
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
series_id, timestamp, value
-- Index for hot access path; avoid over-indexing writes
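-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.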
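Downsampling keeps per-bucket count, sum, min, and max rather than a precomputed average, so 1-minute rollups can later be merged into 1-hour rollups without losing correctness. A minimal sketch, assuming points arrive as `(series_id, unix_ts, value)` tuples where `series_id` already encodes the metric name plus its tag set; `Rollup` and `downsample` are hypothetical names. Each distinct tag combination is a separate series, which is why unbounded tag values (user IDs, request IDs) cause the cardinality explosion listed as a primary risk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Rollup:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


def downsample(points: Iterable[Tuple[str, int, float]],
               bucket_s: int) -> Dict[Tuple[str, int], Rollup]:
    """Group raw points into per-series buckets aligned to multiples of bucket_s."""
    out: Dict[Tuple[str, int], Rollup] = defaultdict(Rollup)
    for series_id, ts, value in points:
        bucket_start = ts - ts % bucket_s
        out[(series_id, bucket_start)].add(value)
    return dict(out)
```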
Bottlenecks, Failure Modes & Mitigations
Primary risks: Cardinality explosion; query cost
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Alerting, anomaly detection
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Distributed Cache (Redis Cluster)
Functional & Non-Functional Requirements
Scope summary: Cache 100GB+, HA
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Shard by key hash (Redis Cluster maps keys to 16,384 fixed hash slots rather than a consistent-hash ring)
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
GET/SET; TTL; cluster gossip
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Redis cluster slots; replication per shard
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
In-memory only; persistence optional
-- Index for hot access path; avoid over-indexing writes
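-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.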
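Redis Cluster routes a key by computing `CRC16(key) mod 16384` (the XMODEM CRC-16 variant) to pick one of 16,384 hash slots, and each shard owns a set of slots; resharding moves slots, not individual keys. If the key contains a non-empty `{...}` hash tag, only the tag is hashed, which lets related keys share a slot for multi-key operations (and can also make that slot hot). A minimal Python sketch of the routing rule; the function names are illustrative.

```python
SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % SLOT_COUNT
```

A cluster-aware client caches the slot-to-node map and follows `MOVED` redirects when that map is stale, so routing stays a local computation on the hot path.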
Bottlenecks, Failure Modes & Mitigations
Primary risks: Hot keys; resharding
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Multi-DC, client-side caching
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
E-commerce Checkout
Functional & Non-Functional Requirements
Scope summary: Cart → inventory → payment
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Strong consistency for inventory
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
POST /checkout idempotent; saga for payment
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Reserve inventory; charge; confirm; compensate on fail
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
orders, inventory, payments — transactional
-- Index for hot access path; avoid over-indexing writes
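-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.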
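An orchestrated saga for reserve → charge → confirm can be sketched as a list of (action, compensation) pairs: run the actions in order and, on the first failure, run the compensations of the completed steps in reverse. A minimal sketch; `run_saga` and `SagaFailed` are hypothetical, and a real orchestrator would persist saga state and retry compensations (which therefore must be idempotent) instead of keeping everything in memory.

```python
from typing import Callable, List, Tuple

Action = Callable[[dict], None]
Step = Tuple[str, Action, Action]  # (name, action, compensation)


class SagaFailed(Exception):
    pass


def run_saga(steps: List[Step], ctx: dict) -> None:
    """Run steps in order; on failure undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            for _, _, compensate in reversed(completed):
                compensate(ctx)  # e.g. release inventory, refund the charge
            raise SagaFailed(f"step {name!r} failed: {exc}") from exc
        completed.append(step)
```

For checkout the steps would be hypothetical pairs such as (reserve_inventory, release_inventory), (charge_card, refund_card), and (confirm_order, cancel_order); the charge call should carry the same idempotency key on every retry so a timeout cannot turn into a double charge.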
Bottlenecks, Failure Modes & Mitigations
Primary risks: Race on last item; double charge
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Fulfillment, returns, fraud
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Hotel Booking
Functional & Non-Functional Requirements
Scope summary: Date-range inventory
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Similar to tickets, less spike
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Search availability; book room-night
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
Inventory per room-type per night; hold TTL
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
hotels, room_nights, bookings
-- Index for hot access path; avoid over-indexing writes
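-- Partition/shard key: choose what spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider Snowflake-style IDs when you need compact, roughly time-ordered keys, and hash-shard them rather than range-shard them so new writes do not all land on the newest shard.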
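Modeling inventory per room type per night makes a multi-night stay an all-or-nothing decrement across several rows. A minimal in-memory sketch, assuming a lock stands in for a single database transaction (row locks or a conditional `available >= 1` check on each night) and that the check-out day is not a booked night; hold TTLs and overbooking allowances are omitted for brevity, and the class name is hypothetical.

```python
import threading
from datetime import date, timedelta
from typing import Dict, List, Tuple

Key = Tuple[str, date]  # (room_type_id, night)


def nights(check_in: date, check_out: date) -> List[date]:
    """Nights occupied: check-in inclusive, check-out exclusive."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


class RoomNightInventory:
    def __init__(self, available: Dict[Key, int]) -> None:
        self._available = dict(available)
        self._lock = threading.Lock()  # stands in for one DB transaction

    def book(self, room_type: str, check_in: date, check_out: date) -> bool:
        keys = [(room_type, n) for n in nights(check_in, check_out)]
        if not keys:
            return False  # zero-night or inverted date range
        with self._lock:
            if any(self._available.get(k, 0) < 1 for k in keys):
                return False  # any sold-out night rejects the whole stay
            for k in keys:
                self._available[k] -= 1
            return True

    def cancel(self, room_type: str, check_in: date, check_out: date) -> None:
        with self._lock:
            for night in nights(check_in, check_out):
                key = (room_type, night)
                self._available[key] = self._available.get(key, 0) + 1

    def remaining(self, room_type: str, night: date) -> int:
        return self._available.get((room_type, night), 0)
```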
Bottlenecks, Failure Modes & Mitigations
Primary risks: Overbooking policies; cancellation
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Rate parity, loyalty
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Google Docs Collaboration
Functional & Non-Functional Requirements
Scope summary: Real-time OT/CRDT
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
WebSocket + operation log
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload (size it for peak). Document assumptions on the whiteboard before calculating.
API Design
Send ops; server orders; broadcast
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PUT that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
High-Level Architecture
OT or CRDT; snapshot + op log
                   ┌──────────────┐
Users ───────────► │  CDN / Edge  │  (static, cacheable GETs)
                   └──────┬───────┘
                          ▼
                   ┌──────────────┐
                   │ Load Balancer│  L7 routing, TLS, WAF
                   └──────┬───────┘
                          ▼
              ┌────────────────────────┐
              │   Stateless API tier   │  autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼─────────────────┐
          ▼               ▼                 ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └──────┬─────┘   └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
doc_id, revision, operations
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
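The API line (send ops; server orders; broadcast) and the doc_id / revision / operations model can be illustrated with a deliberately tiny operational transformation (OT) sketch. It assumes a single server per doc_id, insert-only operations, and hypothetical names (Insert, DocServer). Real editors also transform deletes, transform incoming ops against the client's own pending ops, and snapshot the log periodically; a CRDT would remove the central orderer instead.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int   # character offset in the document
    text: str
    site: str  # client id, used only to break ties deterministically


def transform(op: Insert, applied: Insert) -> Insert:
    """Rewrite `op` so it keeps the same intent after `applied` has run."""
    if applied.pos < op.pos or (applied.pos == op.pos and applied.site < op.site):
        return replace(op, pos=op.pos + len(applied.text))
    return op


class DocServer:
    """Single orderer for one doc_id: assigns revisions and keeps the op log."""

    def __init__(self, text: str = "") -> None:
        self.text = text              # latest snapshot
        self.log: list[Insert] = []   # log[i] produced revision i + 1

    @property
    def revision(self) -> int:
        return len(self.log)

    def submit(self, op: Insert, base_revision: int) -> tuple[int, Insert]:
        # Transform against every op committed since the client's base revision.
        for applied in self.log[base_revision:]:
            op = transform(op, applied)
        self.text = self.text[: op.pos] + op.text + self.text[op.pos :]
        self.log.append(op)
        return self.revision, op  # broadcast (revision, op) to other clients here


server = DocServer("helo")
# Both clients edited revision 0 concurrently.
server.submit(Insert(3, "l", "alice"), base_revision=0)  # "hello"
server.submit(Insert(4, "!", "bob"), base_revision=0)    # bob meant "after the o"
print(server.text)  # hello!
```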
Bottlenecks, Failure Modes & Mitigations
Primary risks: Conflict resolution; offline sync
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
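The hot-key row's "random suffix" mitigation can be sketched as a salted counter: writes spread over N sub-keys, reads sum them. This assumes a plain dict standing in for a sharded key-value store (in Redis Cluster the sub-keys would generally land on different hash slots); SaltedCounter and the "key#N" naming are hypothetical.

```python
import random
from typing import Dict, Optional


class SaltedCounter:
    """Spread one hot counter across N sub-keys named 'key#0' .. 'key#N-1'."""

    def __init__(self, store: Dict[str, int], salts: int = 8,
                 rng: Optional[random.Random] = None) -> None:
        self.store = store  # stand-in for a sharded KV store
        self.salts = salts
        self.rng = rng or random.Random()

    def incr(self, key: str, by: int = 1) -> None:
        # Each write picks a random suffix, so load spreads over `salts` keys.
        sub_key = f"{key}#{self.rng.randrange(self.salts)}"
        self.store[sub_key] = self.store.get(sub_key, 0) + by

    def get(self, key: str) -> int:
        # Reads pay for the spread: N lookups plus a sum.
        return sum(self.store.get(f"{key}#{i}", 0) for i in range(self.salts))


store: Dict[str, int] = {}
likes = SaltedCounter(store, salts=8, rng=random.Random(42))
for _ in range(1_000):
    likes.incr("likes:post42")
print(likes.get("likes:post42"))  # 1000
```

The trade-off is explicit: write throughput on the hot key scales with the number of salts, while every read fans out to all of them, so this fits write-heavy counters better than read-heavy keys (where a local cache is the better lever).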
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Comments, permissions, history
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Stack Overflow
Functional & Non-Functional Requirements
Scope summary: Q&A, search, reputation
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Read-heavy
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
API Design
POST questions/answers; search; vote
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
SQL for integrity; ES for search; cache hot questions
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
posts, votes, users, tags
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
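One way to keep votes correct while serving scores cheaply is to normalize votes (one row per user and post) and maintain a denormalized posts.score in the same transaction. This is a single-connection SQLite sketch with hypothetical table and column names; tags and reputation updates (often handled asynchronously) are omitted. With many concurrent writers, the votes primary key (or an upsert) is what ultimately rejects a duplicate vote.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, reputation INTEGER NOT NULL DEFAULT 1);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    parent_id INTEGER REFERENCES posts(id),  -- NULL for a question, question id for an answer
    body TEXT NOT NULL,
    score INTEGER NOT NULL DEFAULT 0         -- denormalized SUM(votes.value) for fast reads
);
CREATE TABLE votes (
    user_id INTEGER NOT NULL REFERENCES users(id),
    post_id INTEGER NOT NULL REFERENCES posts(id),
    value INTEGER NOT NULL CHECK (value IN (-1, 1)),
    PRIMARY KEY (user_id, post_id)           -- at most one vote per user per post
);
CREATE INDEX answers_by_question ON posts (parent_id, score);
"""


def vote(conn, user_id, post_id, value):
    """Insert or change a vote and adjust posts.score in the same transaction."""
    with conn:  # commit on success, roll back on exception
        row = conn.execute(
            "SELECT value FROM votes WHERE user_id = ? AND post_id = ?", (user_id, post_id)
        ).fetchone()
        previous = row[0] if row else 0
        if previous == value:
            return  # repeated click is a no-op
        if row:
            conn.execute("UPDATE votes SET value = ? WHERE user_id = ? AND post_id = ?",
                         (value, user_id, post_id))
        else:
            conn.execute("INSERT INTO votes (user_id, post_id, value) VALUES (?, ?, ?)",
                         (user_id, post_id, value))
        conn.execute("UPDATE posts SET score = score + ? WHERE id = ?",
                     (value - previous, post_id))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO users (id) VALUES (1), (2)")
    conn.execute("INSERT INTO posts (id, author_id, body) VALUES (10, 1, 'How do I reverse a list?')")
vote(conn, 2, 10, 1)
vote(conn, 2, 10, 1)   # duplicate click: ignored
vote(conn, 2, 10, -1)  # user switches to a downvote
print(conn.execute("SELECT score FROM posts WHERE id = 10").fetchone()[0])  # -1
```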
Bottlenecks, Failure Modes & Mitigations
Primary risks: Reputation gaming; duplicate detection
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Moderation queue, notifications
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Zoom Video Conferencing
Functional & Non-Functional Requirements
Scope summary: SFU/MCU architecture
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
UDP media, signaling TCP
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
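For video, media bandwidth dominates the estimate, and it depends on the topology. A rough sketch assuming one video stream per participant at a fixed, assumed bitrate of 1,500 kbps, no simulcast, and audio ignored; the function names are hypothetical.

```python
from __future__ import annotations


def per_client_kbps(topology: str, participants: int, stream_kbps: float) -> tuple[float, float]:
    """(upload, download) video bitrate for one participant; audio ignored."""
    others = participants - 1
    if topology == "mesh":  # every client sends its stream to every peer
        return others * stream_kbps, others * stream_kbps
    if topology == "sfu":   # send once; the SFU forwards each peer's stream
        return stream_kbps, others * stream_kbps
    if topology == "mcu":   # send once; the MCU decodes, mixes, re-encodes one stream
        return stream_kbps, stream_kbps
    raise ValueError(f"unknown topology: {topology}")


def server_egress_kbps(topology: str, participants: int, stream_kbps: float) -> float:
    """Media the server sends out for one meeting."""
    if topology == "mesh":
        return 0.0  # peer to peer (ignoring TURN relays)
    _, download = per_client_kbps(topology, participants, stream_kbps)
    return participants * download


for topo in ("mesh", "sfu", "mcu"):
    up, down = per_client_kbps(topo, participants=10, stream_kbps=1_500)
    print(topo, up, down, server_egress_kbps(topo, 10, 1_500))
# sfu: 1500 up, 13500 down per client, 135000 kbps egress for the meeting
```

The participants × (participants - 1) egress term is why SFUs in large rooms usually forward only some streams (for example active speakers or lower simulcast layers), while an MCU trades that bandwidth for transcoding CPU.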
API Design
Signaling server; media SFU; TURN fallback
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
Regional SFU mesh; recording to S3
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
rooms, participants, sessions
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
Bottlenecks, Failure Modes & Mitigations
Primary risks: NAT traversal; CPU for video
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Webinar mode, breakout rooms
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Payment Wallet
Functional & Non-Functional Requirements
Scope summary: Ledger correctness
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
ACID + idempotency
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
API Design
Transfer with idempotency-key; double-entry ledger
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
Immutable ledger entries; balance materialized view
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
accounts, ledger_entries, transfers
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
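The transfer path (idempotency key, double-entry ledger, materialized balance) can be sketched with SQLite. Assumptions: amounts are integer minor units, both accounts live in one database, and the table, function, and exception names are hypothetical. `BEGIN IMMEDIATE` takes SQLite's write lock before the balance check; in PostgreSQL you would typically lock the source row with `SELECT ... FOR UPDATE` instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    id TEXT PRIMARY KEY,
    balance INTEGER NOT NULL DEFAULT 0     -- materialized from ledger_entries, minor units
);
CREATE TABLE transfers (
    id INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,  -- client-supplied Idempotency-Key
    from_account TEXT NOT NULL REFERENCES accounts(id),
    to_account TEXT NOT NULL REFERENCES accounts(id),
    amount INTEGER NOT NULL CHECK (amount > 0)
);
CREATE TABLE ledger_entries (              -- append-only: never UPDATE or DELETE
    id INTEGER PRIMARY KEY,
    transfer_id INTEGER NOT NULL REFERENCES transfers(id),
    account_id TEXT NOT NULL REFERENCES accounts(id),
    amount INTEGER NOT NULL                -- negative = debit, positive = credit
);
"""


class InsufficientFunds(Exception):
    pass


def _apply_transfer(conn, key, src, dst, amount):
    row = conn.execute("SELECT id FROM transfers WHERE idempotency_key = ?", (key,)).fetchone()
    if row:
        return row[0]  # retry of an already-applied request: same result, no new writes
    (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
    if balance < amount:
        raise InsufficientFunds(src)
    tid = conn.execute(
        "INSERT INTO transfers (idempotency_key, from_account, to_account, amount) "
        "VALUES (?, ?, ?, ?)",
        (key, src, dst, amount),
    ).lastrowid
    # Double entry: the two rows of one transfer always sum to zero.
    conn.executemany(
        "INSERT INTO ledger_entries (transfer_id, account_id, amount) VALUES (?, ?, ?)",
        [(tid, src, -amount), (tid, dst, amount)],
    )
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    return tid


def transfer(conn, key, src, dst, amount):
    """Apply a transfer at most once per idempotency key, atomically."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the balance
    try:
        tid = _apply_transfer(conn, key, src, dst, amount)
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return tid


conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.executescript(SCHEMA)
# Opening balances seeded directly for brevity (a real ledger would post opening entries).
conn.execute("INSERT INTO accounts (id, balance) VALUES ('alice', 10000), ('bob', 0)")
t1 = transfer(conn, "req-123", "alice", "bob", 2500)
t2 = transfer(conn, "req-123", "alice", "bob", 2500)  # client retry with the same key
print(t1 == t2, conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# True [('alice', 7500), ('bob', 2500)]
```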
Bottlenecks, Failure Modes & Mitigations
Primary risks: Exactly-once; reconciliation
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
KYC, fraud, multi-currency
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Notification Service
Functional & Non-Functional Requirements
Scope summary: Multi-channel delivery
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
1M notifs/min
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
API Design
Enqueue → workers → email/SMS/push
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
Priority queues; templates; device tokens
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
notifications, templates, user_preferences
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
Bottlenecks, Failure Modes & Mitigations
Primary risks: Provider rate limits; retries
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
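For the provider rate-limit and retry risks, a common approach is capped exponential backoff with "full jitter" (a uniform random delay up to the exponential cap), honoring any Retry-After hint and giving up after a bounded number of attempts so the message can go to a dead-letter queue. A minimal sketch; `send` stands in for a hypothetical provider client, and the base, cap, and attempt limits are assumed values.

```python
import random
import time
from typing import Callable, Optional


class RetryableError(Exception):
    """Provider asked us to try later (for example HTTP 429 or 503)."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__(f"retryable (retry_after={retry_after})")
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    r = rng if rng is not None else random
    return r.uniform(0, min(cap, base * 2 ** attempt))


def send_with_retries(send: Callable[[], None], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> int:
    """Call `send` until it succeeds and return the number of attempts used.

    Re-raises after `max_attempts` so the worker can park the message in a
    dead-letter queue instead of retrying forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            send()
            return attempt + 1
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise
            delay = backoff_delay(attempt, rng=rng)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry sooner than asked
            sleep(delay)
    raise AssertionError("unreachable")


calls = 0


def flaky_send() -> None:
    global calls
    calls += 1
    if calls == 1:
        raise RetryableError()                 # e.g. 503 with no hint
    if calls == 2:
        raise RetryableError(retry_after=1.0)  # e.g. 429 with Retry-After: 1


delays = []
attempts = send_with_retries(flaky_send, sleep=delays.append, rng=random.Random(7))
print(attempts)  # 3
```

Jitter matters here because thousands of workers retrying on the same schedule would hit the provider in synchronized waves.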
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Digest batching, A/B
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Ad Click Aggregator
Functional & Non-Functional Requirements
Scope summary: 1M clicks/s aggregate
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Stream processing
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
API Design
Kafka → Flink → OLAP
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
Counting, billing, fraud filters
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
raw_clicks stream; aggregates by campaign
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
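The late-data and duplicate risks become concrete in a tiny event-time aggregator: one-minute tumbling windows per campaign, dedupe by click_id, and a watermark that lags the newest event by an allowed lateness. This is an in-memory, single-partition stand-in for what a Flink job would do with checkpointed state; the names, the 60 s window, and the 30 s lateness are assumptions. Exactly-once billing additionally needs idempotent or transactional writes to the OLAP store.

```python
from collections import defaultdict
from dataclasses import dataclass, field

WINDOW_S = 60  # one-minute tumbling windows


@dataclass(frozen=True)
class Click:
    click_id: str
    campaign_id: str
    event_time: int  # epoch seconds when the click happened, not when it arrived


@dataclass
class ClickAggregator:
    allowed_lateness_s: int = 30
    open_windows: dict = field(default_factory=lambda: defaultdict(int))  # (campaign, start) -> count
    finalized: dict = field(default_factory=dict)  # emitted counts, e.g. written to OLAP
    late: list = field(default_factory=list)       # side output for a reconciliation job
    seen_ids: set = field(default_factory=set)     # dedupe; needs a TTL-bounded store in practice
    max_event_time: int = 0

    def watermark(self) -> int:
        # Assumption encoded here: nothing older than this will still arrive.
        return self.max_event_time - self.allowed_lateness_s

    def process(self, click: Click) -> None:
        if click.click_id in self.seen_ids:
            return  # duplicate from an at-least-once upstream
        self.seen_ids.add(click.click_id)
        start = click.event_time - click.event_time % WINDOW_S
        if start + WINDOW_S <= self.watermark():
            self.late.append(click)  # its window was already emitted
            return
        self.open_windows[(click.campaign_id, start)] += 1
        self.max_event_time = max(self.max_event_time, click.event_time)
        self._emit_closed_windows()

    def _emit_closed_windows(self) -> None:
        wm = self.watermark()
        for key in [k for k in self.open_windows if k[1] + WINDOW_S <= wm]:
            self.finalized[key] = self.open_windows.pop(key)


agg = ClickAggregator(allowed_lateness_s=30)
for e in [
    Click("c1", "camp-A", 1_000),  # window [960, 1020)
    Click("c1", "camp-A", 1_000),  # duplicate, ignored
    Click("c2", "camp-A", 1_010),
    Click("c3", "camp-A", 1_100),  # watermark -> 1070, closes [960, 1020)
    Click("c4", "camp-A", 1_015),  # too late for the closed window
]:
    agg.process(e)
print(agg.finalized)                    # {('camp-A', 960): 2}
print([c.click_id for c in agg.late])  # ['c4']
```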
Bottlenecks, Failure Modes & Mitigations
Primary risks: Late data; exactly-once billing
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Real-time dashboard, attribution
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
API Rate Limiter at Scale
Functional & Non-Functional Requirements
Scope summary: Global edge + regional
- Define MVP features vs phase-2 (analytics, admin, ML ranking).
- State who the users are (consumers, businesses, internal operators).
- Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
- Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
- Call out compliance if relevant: GDPR delete, PCI for payments, data residency.
| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |
Back-of-Envelope Estimates
Millions of keys
Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = new objects per day × object size × retention days × replication factor, (5) bandwidth = peak QPS × payload size. Write your assumptions on the whiteboard before calculating.
API Design
Edge PoP counters + sync; token bucket
# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid> # for POST/PATCH requests that must not double-apply
X-Request-Id: <uuid> # tracing
# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }
Version APIs (/v1/), use appropriate status codes (201 Created, 409 Conflict, 429 Too Many Requests), and document idempotent retry behavior for clients.
High-Level Architecture
CDN edge + central Redis; GCRA algorithm
                  ┌───────────────┐
Users ──────────► │  CDN / Edge   │  (static, cacheable GETs)
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ Load Balancer │  L7 routing, TLS, WAF
                  └───────┬───────┘
                          ▼
              ┌───────────────────────┐
              │  Stateless API tier   │  autoscale on CPU/latency
              └───────────┬───────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     ┌─────────┐    ┌───────────┐ ┌───────────────┐
     │  Cache  │    │  Primary  │ │ Message Queue │
     │ (Redis) │    │ Database  │ │ (async work)  │
     └─────────┘    └─────┬─────┘ └───────────────┘
                          │
                   ┌──────▼───────┐
                   │ Object store │  (media, large blobs)
                   └──────────────┘
Data Model & Schema Sketch
policy store; sharded counters
-- Index the hot access path; avoid over-indexing write-heavy tables
-- Partition/shard key: choose one that spreads load (user_id, tenant_id)
Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use opaque IDs (e.g., UUIDs) externally; consider Snowflake-style IDs (timestamp + worker ID + sequence) when you need roughly time-ordered IDs generated without a central coordinator.
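The architecture line names GCRA (Generic Cell Rate Algorithm), which behaves like a token bucket but stores a single timestamp per key, the theoretical arrival time (TAT). A single-process sketch where a dict stands in for the central Redis; in production the read-compare-write on TAT must be atomic (for example inside a Redis Lua script), and edge PoP synchronization is not shown. The class and parameter names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCRALimiter:
    """Generic Cell Rate Algorithm: stores one timestamp (TAT) per key."""

    rate: float    # requests allowed per `period`
    period: float  # seconds
    burst: int     # requests allowed back to back when the key is idle
    tat: dict = field(default_factory=dict)  # key -> theoretical arrival time

    def allow(self, key: str, now: float) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds); state changes only if allowed."""
        interval = self.period / self.rate   # ideal spacing between requests
        tolerance = interval * self.burst    # how far TAT may run ahead of now
        tat = max(self.tat.get(key, now), now)
        new_tat = tat + interval
        allow_at = new_tat - tolerance
        if now < allow_at:
            return False, allow_at - now
        self.tat[key] = new_tat
        return True, 0.0


limiter = GCRALimiter(rate=4, period=1.0, burst=3)  # 4 req/s, bursts of up to 3
print([limiter.allow("user:42", now=0.0)[0] for _ in range(4)])  # [True, True, True, False]
print(limiter.allow("user:42", now=0.0))   # (False, 0.25)
print(limiter.allow("user:42", now=0.25))  # (True, 0.0)
```

Because the denial path returns `retry_after` directly, the API tier can put it in a Retry-After header on the 429 response.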
Bottlenecks, Failure Modes & Mitigations
Primary risks: Cross-PoP consistency; config propagation
| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |
Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
Extensions & Senior Follow-Ups
Per-tenant custom limits, burst
- Multi-region active-active or active-passive — CAP trade-offs on writes.
- Cost: egress, storage tiering, reserved capacity vs serverless.
- Security: abuse, authZ scopes, encryption at rest and in transit.
- Migration: dual-write, backfill, feature flags for rollout.
Quick Reference: Picking Building Blocks
| Need | Often choose |
|---|---|
| Strong transactions | PostgreSQL + application saga for cross-service |
| Massive write throughput | Cassandra, DynamoDB, sharded MySQL |
| Full-text search | Elasticsearch / OpenSearch |
| Async decoupling | Kafka, SQS, RabbitMQ |
| Sub-ms reads | Redis cluster + CDN |
| Blob media | S3 + CloudFront |
Part 28: Trade-Off Matrices
How to Use Matrices in Interviews
After proposing a design, summarize your decisions in a table: option A vs B across dimensions such as latency, consistency, cost, and operational complexity. This shows structured trade-off thinking.
SQL vs NoSQL
| Dimension | SQL (Postgres) | Document (Mongo) | Wide-column (Cassandra) | Key-value (DynamoDB) |
|---|---|---|---|---|
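| Schema | Rigid, migrations | Flexible JSON | CQL tables; rows grouped by partition key | Only key attributes fixed; other attributes vary per item |
| Transactions | Multi-row ACID | Single-doc atomic; multi-doc transactions since 4.0 | Lightweight transactions (compare-and-set) within one partition | Conditional writes; TransactWriteItems across items |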
| Joins | Native | $lookup or app-side | Denormalize | No joins |
| Scale pattern | Read replicas + shard | Shard by key | Built for write scale | Managed partition |
| Best fit | Orders, accounts | Catalog, CMS | Feeds, metrics | Sessions, locks |
Push vs Pull (Updates)
| Dimension | Push | Pull |
|---|---|---|
| Latency to client | Low (server initiated) | Bounded by poll interval |
| Server connections | Stateful (WS) | Stateless HTTP |
| Missed messages | Need reconnect logic | Client controls cursor |
| Scale cost | Connection memory | Wasted empty polls |
| Example | Chat, live scores | Email client sync |
Fan-Out Write vs Read
| Dimension | Fan-out on write | Fan-out on read |
|---|---|---|
| Read cost | O(1) prebuilt | O(followees) merge |
| Write cost | O(followers) | O(1) |
| Celebrity problem | Severe | Manageable |
| Storage | High (many copies) | Low |
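A sketch of the hybrid answer to the celebrity problem: fan out on write for ordinary authors, and merge celebrity posts at read time. In-memory dicts stand in for the timeline cache; the class name and the follower threshold are assumptions, and authors who cross the threshold later (whose older posts would be both pushed and pulled) are not handled.

```python
import heapq
from collections import defaultdict
from itertools import islice


class HybridFeed:
    """Push to followers' timelines for most authors; pull for celebrities."""

    def __init__(self, celebrity_threshold: int = 10_000) -> None:
        self.celebrity_threshold = celebrity_threshold  # assumed cut-off
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> [(ts, post_id)] built on write
        self.outbox = defaultdict(list)     # author -> [(ts, post_id)] they posted

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author: str, ts: int, post_id: str) -> None:
        self.outbox[author].append((ts, post_id))
        if not self.is_celebrity(author):
            for user in self.followers[author]:  # fan-out on write: O(followers)
                self.timelines[user].append((ts, post_id))

    def read(self, user: str, limit: int = 10) -> list:
        # Fan-out on read only for the celebrities this user follows.
        sources = [self.timelines[user]]
        sources += [self.outbox[a] for a in self.following[user] if self.is_celebrity(a)]
        newest_first = heapq.merge(*(sorted(s, reverse=True) for s in sources), reverse=True)
        return [post_id for _, post_id in islice(newest_first, limit)]


feed = HybridFeed(celebrity_threshold=3)
for u in ("u1", "u2", "u3"):
    feed.follow(u, "star")   # 3 followers: treated as a celebrity
feed.follow("u1", "friend")  # 1 follower: fanned out on write
feed.post("friend", ts=1, post_id="f1")
feed.post("star", ts=2, post_id="s1")
feed.post("friend", ts=3, post_id="f2")
print(feed.read("u1"))  # ['f2', 's1', 'f1']
```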
Cache Patterns
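| Pattern | Mechanism | Write cost | When |
|---|---|---|---|
| Cache-aside | App reads cache, loads from DB on miss; TTL or explicit invalidation | Low | General reads |
| Read-through | Cache layer loads from DB on miss | Low | Simpler app code |
| Write-through | Every write goes to cache and DB synchronously | Higher (each write hits both) | Strong read-after-write |
| Write-behind | Write to cache; flushed to DB asynchronously | Low (batched DB writes); data-loss risk if cache fails | Counters, analytics |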
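Cache-aside is the usual default, so it is worth being able to sketch it. A minimal version with a 5-minute TTL (the same TTL used in the Part 30 phrasing table) and delete-on-write invalidation; a dict stands in for Redis, and the class and parameter names are hypothetical.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class CacheAside:
    """Application-managed cache: read cache, fall back to the DB, populate with a TTL."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.load_from_db(key)  # miss or expired: read the source of truth
        self.entries[key] = (self.clock() + self.ttl_s, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after the DB write commits; the next get() reloads fresh data."""
        self.entries.pop(key, None)


db = {"user:1": "Ada"}
now = [0.0]  # fake clock so the example is deterministic
cache = CacheAside(db.__getitem__, ttl_s=300, clock=lambda: now[0])
cache.get("user:1")
cache.get("user:1")          # served from cache
db["user:1"] = "Ada L."      # 1) write the database
cache.invalidate("user:1")   # 2) then drop the cached copy
print(cache.get("user:1"), cache.hits, cache.misses)  # Ada L. 1 2
```

Deleting after the write still leaves a small race (a reader that loaded the old value just before the write can repopulate it), which is one reason the TTL stays as a backstop.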
Consistency vs Availability (during partition)
| Choice | During partition | Example systems |
|---|---|---|
| CP | Reject ops to stay consistent | ZooKeeper, etcd |
| AP | Accept ops; reconcile later | Cassandra, DynamoDB (default) |
Monolith vs Microservices
| Factor | Monolith | Microservices |
|---|---|---|
| Time to market | Faster early | Slower (infra) |
| Scale | Vertical + replicas | Per-service scale |
| Failures | All-or-nothing deploy | Isolated blast radius |
| Data | Single DB joins | Distributed transactions hard |
REST vs gRPC (internal)
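| Dimension | REST+JSON | gRPC |
|---|---|---|
| Performance | Good | Better: binary Protobuf over HTTP/2 |
| Contract | Loose (OpenAPI optional) | Strict .proto schema |
| Browser | Native | Needs a proxy such as gRPC-Web |
| Streaming | Limited (SSE or WebSockets alongside) | First-class: client, server, bidirectional |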
Strong vs Eventual — When to Say What
Strong: inventory, wallet, booking. Eventual: likes, view counts, recommendations.
Blob Storage in SQL vs S3
| Dimension | SQL BLOB | S3 |
|---|---|---|
| Files > 1 MB | Poor: bloats backups and replication | Good: cheap, durable, CDN-friendly |
| Metadata query | Good | Need index table |
Worked Example: Matrices
State the decision explicitly in the interview: 'Chose Cassandra (AP) because write QPS is 500K/s and an eventually consistent timeline is acceptable.'
Extended Notes
Connect matrices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Trade-Off Matrices
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
How to present matrix in interview?
After proposing design: 'Summarizing: SQL for orders, Redis cache, S3 media — see trade-offs.'
Push vs pull for mobile?
Push for engagement; pull for battery-sensitive background sync.
Extended Reference — Trade-Off Matrices
Using matrices well
Do not read table verbatim — highlight 2 cells relevant to your design decision.
Consistency spectrum
Place your feature on spectrum from strong to eventual — justify with product requirement.
Cost dimension
Add row: operational complexity 1–5 — microservices score high.
When matrices fail
Nuanced decisions need prose — matrix is summary not analysis.
Compare three options
SQL vs Dynamo vs Cassandra — pick two dimensions interviewer cares about.
Part 28 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- SQL vs NoSQL matrix
- Push vs pull
- Fan-out matrix
- Cache pattern matrix
- Monolith vs micro
- Summarize after design
- Two relevant cells
- Cost row optional
- Consistency spectrum
- Do not read table verbatim
Self-test prompt
Explain Part 28 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 28 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 29: Common Interview Mistakes
Jumping to Diagram Too Fast
Drawing boxes before requirements loses points. Spend 5–10 minutes on functional scope, DAU, read:write ratio, latency, consistency needs.
No Numbers
Architecture without back-of-envelope (BOE) numbers feels hand-wavy. Always compute rough QPS, storage, and bandwidth.
Single Point of Failure Blindness
One database, one region, one cache with no replica — interviewers will probe failure. Label replicas, failover, multi-AZ.
Ignoring the Hot Path
Optimize what users do 100×/day (read feed), not edge admin features. State which paths get cache, CDN, sharding.
Cache Everything
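Caching without an invalidation story or a hit-ratio assumption. Caching personalized responses at the CDN without Cache-Control: private (or a correct Vary header / cache key) is a common trap: one user's data can be served to another.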
Wrong Database Choice
Graph DB for simple CRUD; SQL for billion-scale, write-heavy counters with no sharding plan. Justify the choice with the access pattern.
Over-Engineering
Kubernetes + Kafka + microservices for 1000 users MVP. Phased approach: monolith → cache → shard → extract services.
Under-Engineering Critical Paths
Payments with eventual consistency and no idempotency. Seat booking without transactions.
Not Thinking Aloud
Silent drawing confuses interviewer. Narrate trade-offs: "I could use X but choose Y because…"
Ignoring Interviewer Hints
Hints steer toward intended deep dive. If they ask "what if the DB is slow?" — discuss indexes and replicas, not unrelated CDN.
No Monitoring or Launch Plan
Senior candidates mention SLOs, feature flags, gradual rollout, rollback.
- Fix: use Part 2 framework every time
- Fix: end with trade-off summary table
- Fix: invite feedback: "Should we deep dive data model or scaling?"
Red Flags Interviewers Notice
- Vague 'we'll scale horizontally' without shard key
- No failure discussion
- Buzzwords without mechanism
- Copying Netflix stack for CRUD app
Recovery Phrases
"Let me step back and clarify scale assumptions" — shows maturity when caught in hole.
Worked Example: Mistakes
Candidate drew 15 boxes in 2 minutes with no requirements — failed communication dimension.
Extended Notes
Connect mistakes to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.
Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Reference related parts: see adjacent sections in this guide for complementary patterns.
Interview Question Bank — Common Mistakes
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Biggest junior mistake?
No requirements — jump to Kafka and microservices.
Biggest senior expectation?
Operational completeness: metrics, rollout, failure modes unprompted.
Extended Reference — Common Mistakes
Time management
Spending 25 min on DB schema before high-level diagram — reverse order loses structure points.
Hint integration
Interviewer says 'what about cache' — pivot immediately; ignoring hint is negative signal.
Overconfidence
Claiming zero downtime without explaining mechanism — credibility loss.
Underconfidence
Silence is worse than wrong try — think aloud partial ideas.
Post-interview
Do not argue feedback — note and improve.
Part 29 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Requirements first
- BOE before diagram
- No SPOF blind
- Hot path focus
- Trade-offs spoken
- Think aloud
- Take hints
- Monitoring mentioned
- Phased rollout
- No buzzword soup
Self-test prompt
Explain Part 29 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
Record score after self-test: /5 on confidence for Part 29 — revisit if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 30: Communication Scripts
Opening (First 2 Minutes)
"Thanks — before I design, I want to clarify scope. Is this mobile + web? Rough scale in DAU? Should I focus on read or write path first? Any constraints like existing AWS stack or strong consistency requirements?"
Clarifying Functional Requirements
- "Core user actions are X, Y, Z — anything else in v1?"
- "Do we need real-time updates, or is a 30-second delay OK?"
- "Public vs private content — different retention?"
- "Anonymous users or login required?"
Clarifying Non-Functional Requirements
- "Target p99 latency for reads? Writes?"
- "Availability target — 99.9% or 99.99%?"
- "Durability — can we ever lose a post / payment?"
- "Geographic focus — single region or global?"
While Estimating
"I'll assume 50M DAU, 10 reads per user per day — that's 500M reads/day, about 6K average QPS, ~30K peak with a 5× multiplier. Does that match your expectations?"
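The same chain of arithmetic as a quick sketch you can rerun with different assumptions. `read_qps` is a hypothetical helper; 86,400 seconds per day is exact, while the 5× peak multiplier is the script's assumption, not a universal constant.

```python
SECONDS_PER_DAY = 86_400


def read_qps(dau: int, reads_per_user_per_day: float, peak_multiplier: float = 5.0):
    """Return (average QPS, peak QPS) for a given daily read volume."""
    reads_per_day = dau * reads_per_user_per_day
    average = reads_per_day / SECONDS_PER_DAY
    return average, average * peak_multiplier


avg, peak = read_qps(dau=50_000_000, reads_per_user_per_day=10)
print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
# prints: average ~5,787 QPS, peak ~28,935 QPS
```

Rounded aloud, that is the "about 6K average, ~30K peak" in the script. Saying the rounding out loud shows the interviewer you know which digits actually matter.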
Introducing High-Level Design
"I'll sketch clients → CDN for static → load balancer → stateless API tier → cache → primary database, with async workers on a queue for heavy tasks."
Trade-Off Phrasing
| Instead of | Say |
|---|---|
| "We'll use NoSQL" | "Access pattern is key-value by user_id, so I'll use DynamoDB for horizontal scale; we give up joins" |
| "We'll cache it" | "80% hit ratio assumed; TTL 5 min with invalidation on write" |
| "Eventually consistent" | "Followers may see new post up to 30s late; acceptable for feed per product" |
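The caching row commits to two mechanisms, a 5-minute TTL and invalidation on write, plus a measurable hit ratio. A minimal single-process cache-aside sketch of those, assuming plain Python dicts stand in for Redis and the database; `TTLCache`, `read`, and `write` are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache such as Redis: each value carries an expiry time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.clock():
            self._data.pop(key, None)  # evict the expired entry, if there was one
            self.misses += 1
            return None
        self.hits += 1
        return entry[0]

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl_seconds)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def hit_ratio(self) -> float:
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0


db: Dict[str, str] = {}            # stand-in for the primary database
cache = TTLCache(ttl_seconds=300)  # "TTL 5 min"


def read(key: str) -> Optional[str]:
    value = cache.get(key)
    if value is None:              # miss: fall through to the DB, then populate the cache
        value = db.get(key)
        if value is not None:
            cache.set(key, value)
    return value


def write(key: str, value: str) -> None:
    db[key] = value                # update the source of truth first
    cache.delete(key)              # then invalidate; the next read repopulates
```

Deleting rather than overwriting the cached value keeps the write path simple. Under concurrency, a slow reader can still repopulate a stale value just after the delete, and the TTL is what bounds that staleness. `hit_ratio()` is the number you would monitor against the 80% assumption.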
When Stuck
"I'm weighing fan-out on write vs. fan-out on read. For celebrity accounts, a hybrid is the common industry approach, so I'll go hybrid unless you'd rather optimize for write simplicity."
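A toy sketch of the hybrid, assuming in-memory dicts in place of a timeline cache and an illustrative follower-count cutoff; `HybridFeed`, `celebrity_threshold`, and the method names are hypothetical. Posts from normal authors are pushed into follower timelines at write time; posts from celebrity authors are pulled and merged at read time.

```python
from collections import defaultdict
from typing import List


class HybridFeed:
    """Push (fan-out on write) for normal authors, pull (fan-out on read) for celebrities."""

    def __init__(self, celebrity_threshold: int):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users who follow them
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> pushed (ts, post_id) entries
        self.posts = defaultdict(list)      # author -> their own (ts, post_id) entries

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def publish(self, author: str, ts: int, post_id: str) -> None:
        self.posts[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: copy the post into every follower's timeline.
            for follower in self.followers[author]:
                self.timelines[follower].append((ts, post_id))

    def read_feed(self, user: str, limit: int = 10) -> List[str]:
        entries = set(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out on read: merge celebrity posts at query time.
                entries.update(self.posts[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

Using a set also removes duplicates if an author crosses the threshold after some of their posts were already pushed. A production feed would cap timeline length and paginate instead of sorting everything on each read.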
Closing (Last 2 Minutes)
"To recap: stateless APIs behind LB, Redis timeline cache with hybrid fan-out, Postgres sharded by user_id, S3 for media, Kafka for async. I'd add p99 latency and replication lag alerts. With more time I'd detail search indexing and multi-region DR."
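"Sharded by user_id" implies a deterministic function from user to shard. A minimal sketch, assuming simple hash-mod-N placement as a stand-in (not a claim about any particular Postgres sharding layer); `shard_for` and `fraction_moved` are hypothetical helpers. It hashes with SHA-256 because Python's built-in `hash` for strings is randomized per process.

```python
import hashlib
from typing import Sequence


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user_id to a shard index in [0, num_shards), identically on every server."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(user_ids: Sequence[str], old_shards: int, new_shards: int) -> float:
    """Share of users whose shard changes when the shard count changes."""
    moved = sum(shard_for(u, old_shards) != shard_for(u, new_shards) for u in user_ids)
    return moved / len(user_ids)
```

With plain modulo, growing from 4 to 5 shards relocates roughly 80% of users (a key stays put only when its hash agrees mod 4 and mod 5), which is why resharding usually calls for consistent hashing or a directory of virtual shards.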
Responding to Challenges
"Good point — if the cache fails we degrade to DB with circuit breaker and higher latency; we don't fail closed unless data correctness requires it."
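The "degrade to DB with circuit breaker" answer can be made concrete. A minimal single-threaded sketch, assuming the cache client signals failure by raising `ConnectionError`; `CircuitBreaker`, `get_with_fallback`, and the threshold and cooldown values are hypothetical. After `failure_threshold` consecutive failures the breaker opens and reads go straight to the DB; once `cooldown_s` has elapsed it lets calls through again (half-open) and closes on the first success.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures; allows trial calls again after a cooldown."""

    def __init__(self, failure_threshold: int, cooldown_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # open: block until the cooldown passes, then let calls through (half-open)
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


def get_with_fallback(key: str,
                      cache_get: Callable[[str], Optional[str]],
                      db_get: Callable[[str], str],
                      breaker: CircuitBreaker) -> str:
    """Try the cache behind the breaker; on error, open circuit, or miss, read the DB."""
    if breaker.allow():
        try:
            value = cache_get(key)
        except ConnectionError:
            breaker.record_failure()
        else:
            breaker.record_success()
            if value is not None:
                return value
    return db_get(key)  # degraded path: higher latency, but the request still succeeds
```

A production breaker would also limit how many trial calls pass while half-open and emit a metric when it trips; this sketch lets every call through once the cooldown has passed.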
Deep Dive Invitation
"I can go deeper on data model, consistency, or ops — which is most valuable?"
Acknowledging Unknown
"I haven't operated Cassandra in production; at a high level it distributes data by partition key and offers tunable consistency levels such as quorum. I'd partner with a DBA for SLA specifics."
Worked Example: Scripts
Once a week, record yourself doing a 5-minute clarify + BOE segment aloud; playback exposes filler words and silences.
Extended Notes
Connect these scripts to step 6 (deep dive) of the Part 2 framework. The interviewer may spend 10 minutes there, so prepare one diagram and one failure scenario.
State trade-offs aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Related parts: the Part 2 framework shows when each script applies, and the communication dimension of the Part 33 rubric shows how it is scored.
Interview Question Bank — Communication
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
How long to clarify?
5–10 minutes is acceptable and shows thoroughness; don't go beyond that without checking in with the interviewer.
How to handle 'you're wrong'?
Explore it: 'If we need strong consistency here, I'd serve these reads from the primary instead of replicas. Does that match the product requirement?'
Extended Reference — Communication Scripts
Pacing
Pause after the BOE to ask 'Does 100M DAU sound right?'; this engages the interviewer as a collaborator.
Jargon control
Define acronyms once, e.g. 'CDN (content delivery network, an edge cache)'; the interviewer may be cross-functional.
Diagram narration
Narrate left to right ('The user hits the CDN, then...') so the viewer stays oriented.
Trade-off sandwich
We gain X, we sacrifice Y, because the product prioritizes Z.
Closing question
Ask the interviewer 'What would you prioritize next for v2?'; it shows curiosity.
Part 30 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Opening clarify script
- Assumption validation
- BOE narrated
- Trade-off sandwich
- Deep dive offer
- Stuck recovery phrase
- Closing recap 30s
- Ask interviewer question
- Acknowledge challenge
- Collaborative tone
Self-test prompt
Explain Part 30 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
After the self-test, rate your confidence on Part 30 from 1 to 5; revisit it if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 31: 8-Week & 12-Week Study Plans
8-Week Plan (Intensive)
| Week | Focus | Daily topics (Mon–Sun) |
|---|---|---|
| 1 | Foundation | Parts 0–2; 1 BOE exercise/day; 1 clarify-only mock |
| 2 | Estimation & scale | Parts 3–4; daily latency quiz; scale 5 products on paper |
| 3 | Networking & caching | Parts 5–7; draw CDN + LB for 3 apps |
| 4 | Databases | Parts 8–12; SQL vs NoSQL matrix; saga exercise |
| 5 | Distributed systems | Parts 11–13; CAP scenarios; Kafka ordering drill |
| 6 | Architecture styles | Parts 14–17; rate limiter design; consistent hash drill |
| 7 | Ops & patterns | Parts 18–21; SLO math; circuit breaker scenarios |
| 8 | Mocks & execution | Parts 27–33; 3 full mocks; rubric self-score |
12-Week Plan (Steady)
| Week | Topics | Practice |
|---|---|---|
| 1–2 | Parts 1–3, 28–30 | 2 BOE drills/week; communication scripts aloud |
| 3–4 | Parts 4–7 | 1 design: URL shortener, rate limiter |
| 5–6 | Parts 8–10 | 1 design: Twitter feed, shard key exercises |
| 7–8 | Parts 11–15 | 1 design: chat; API style comparison writeup |
| 9–10 | Parts 16–22 | 1 design: Dropbox, payment ledger outline |
| 11 | Parts 23–26 | 1 design: notification system end-to-end |
| 12 | Parts 27, 32–33 | 4 full timed mocks; review mistake list |
Daily 90-Minute Block Template
- 15 min: flash review (latency table, CAP, one matrix)
- 45 min: read one Part section deeply; take notes in your own words
- 30 min: whiteboard a mini-design, or record yourself explaining one aloud
Weekend Deep Work
Saturday: full 45-min mock with peer or AI. Sunday: postmortem using Part 33 rubric; update weak-area queue for next week.
Part 27 Walkthrough Rotation
From week 8 onward, do one classic Part 27 design daily: URL shortener, Twitter, Uber, WhatsApp, YouTube.
Spaced Repetition
Re-read Parts 3, 11, 28 every 2 weeks — core interview anchors.
Worked Example: Study
Track study hours against a 40% reading, 40% whiteboard, 20% mock split; adjust the split if mock scores stay low.
Extended Notes
Connect your study to step 6 (deep dive) of the Part 2 framework. The interviewer may spend 10 minutes there, so prepare one diagram and one failure scenario.
State trade-offs aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Related parts: Part 27 walkthroughs supply the daily designs, and the Part 33 rubric scores the weekend mocks.
Interview Question Bank — Study Plans
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
8 vs 12 week plan?
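Choose the 8-week plan if your interview is about 2 months out and you can study intensively; choose the 12-week plan if you are studying part-time while employed.
How many mocks?
Aim for at least 8–12 full mocks before the onsite loop.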
Extended Reference — Study Plans
Active recall
Close the guide and sketch Twitter on blank paper from memory; the gaps tell you what to read next.
Spaced repetition
An Anki deck for latency numbers, CAP, and algorithms, reviewed 10 minutes daily.
Peer mocks
Swap the interviewer role with a partner; teaching exposes gaps.
Company-specific
Meta: feed/ranking. Amazon: retail inventory. Google: search/index. Stripe: payments/idempotency.
Burnout prevention
Take one day off each week; retention drops when you are exhausted.
Part 31 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- 8-week plan track
- 12-week if employed
- Daily 90 min block
- Weekend mock
- Part 27 rotation
- Spaced repetition
- Active recall
- Company specific focus
- Peer exchange
- Rest day weekly
Self-test prompt
Explain Part 31 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
After the self-test, rate your confidence on Part 31 from 1 to 5; revisit it if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 32: Day-Before & Day-Of Checklist
Day Before Interview
- Review latency numbers (Part 3 table) — 10 min
- Skim trade-off matrices (Part 28) — 15 min
- Re-read communication scripts (Part 30) — 10 min
- One 25-min timed mini-design (clarify + BOE + high-level only)
- Prepare 2 questions for interviewer about team/system
- Test whiteboard tool (Excalidraw, CoderPad), camera, mic, internet backup
- Sleep 7+ hours — cognitive performance drops sharply when tired
Day Of — 2 Hours Before
- Light breakfast; hydrate
- No cramming new topics; draw confidence from the frameworks you already know
- Close noisy apps; phone silent
- Open blank board tab + one-page cheat sheet (BOE formulas only)
15 Minutes Before
- Bathroom, water nearby
- Deep breath; review opening script once
- Remind yourself: it's a collaboration, not an exam; think aloud
During Interview
- Clarify requirements before drawing
- State assumptions and ask validation
- BOE before deep architecture
- Label diagram components and arrows
- Pause for questions: "Does this direction make sense?"
- Leave 5 min for summary and trade-offs
After Interview
Write notes while fresh: questions asked, hints given, what to study. Do not obsess on outcome — process improvement matters.
| Item | Done? |
|---|---|
| Tool tested | ☐ |
| Framework internalized | ☐ |
| Opening script ready | ☐ |
| Questions for interviewer | ☐ |
Virtual Interview Setup
- Second monitor for notes
- Browser zoom 100%
- Pen and paper backup if whiteboard fails
Energy Management
For back-to-back interviews, eat a protein snack in between and avoid a heavy, carb-rich lunch that leaves you sluggish.
Worked Example: Checklist
Keep water nearby. If you need 10 seconds to think, say 'Let me structure this'; the interviewer will wait.
Extended Notes
Connect this checklist to step 6 (deep dive) of the Part 2 framework. The interviewer may spend 10 minutes there, so prepare one diagram and one failure scenario.
State trade-offs aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Related parts: the Part 3 latency table, Part 28 trade-off matrices, and Part 30 communication scripts are the day-before review material.
Interview Question Bank — Checklists
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Whiteboard tool failure?
Fall back to a verbal description plus an ASCII diagram in chat; communication is still scored.
Post-interview note?
Within 1 hour, write down the questions asked, hints given, and weak dimensions to study next week.
Extended Reference — Day-Before & Day-Of
Materials
Water, charger, backup internet hotspot for virtual.
Mindset
Treat the interview as a collaborative design session, not an exam; this framing reduces anxiety.
During lag
If video freezes, summarize your last sentence once reconnected to keep the thread.
Note taking
The interviewer may allow notes; have your BOE formulas written down.
After
Thank-you notes are not required at big tech companies; focus on your own debrief.
Part 32 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Tool tested
- Latency table skim
- Matrices skim
- Opening script
- Water and charger
- No cramming new topics
- Pausing to think is OK
- Post debrief notes
- Questions for interviewer
- Sleep priority
Self-test prompt
Explain Part 32 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
After the self-test, rate your confidence on Part 32 from 1 to 5; revisit it if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
Part 33: Mock Interview Rubric — Self-Score
How to Use This Rubric
After each mock, score each dimension from 1 to 5 (1 = weak, 5 = strong). Track scores weekly and target an average of 4 or higher on the dimensions that matter for your target level. Compare with the interviewer expectations in Part 1.
Scoring Scale
| Score | Meaning |
|---|---|
| 1 | Missing or incorrect |
| 2 | Superficial mention |
| 3 | Adequate with gaps |
| 4 | Solid, minor misses |
| 5 | Strong, proactive depth |
Dimension Definitions
| Dimension | Score 1 | Score 5 |
|---|---|---|
| Requirements | Jumped to design | Functional + NFR + scale + constraints |
| Estimation | No numbers | Full BOE chain with stated assumptions |
| High-level design | Confusing diagram | Clear layers, labeled flows |
| Data model | Missing schema | Tables/keys/indexes justified |
| Scaling | No sharding/cache | Hot keys, replicas, CDN addressed |
| Reliability | Happy path only | Failures, retries, SPOF mitigation |
| Trade-offs | One-sided | Explicit pros/cons; matrices |
| Communication | Silent or rambling | Structured, collaborative, concise |
Self-Score Sheet (Copy Per Mock)
| Dimension | 1–5 | Notes / evidence |
|---|---|---|
| Requirements & scope | ||
| Back-of-envelope | ||
| API / interface design | ||
| High-level architecture | ||
| Data storage & model | ||
| Caching & CDN | ||
| Async / queues | ||
| Scaling & sharding | ||
| Consistency & reliability | ||
| Security & privacy | ||
| Observability & ops | ||
| Communication | ||
| Total | /60 | |
Interpretation
- 48–60: Interview-ready for most senior loops
- 36–47: Targeted study on lowest 3 dimensions
- <36: Repeat framework (Part 2); more mocks before real interviews
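The banding above is simple arithmetic, so it can be scripted to keep totals consistent across mocks. A small sketch; `interpret` is a hypothetical helper, the dimension names are the rows of the self-score sheet, and the band labels are copied from the interpretation list.

```python
from typing import Dict, List, Tuple

DIMENSIONS = (
    "Requirements & scope", "Back-of-envelope", "API / interface design",
    "High-level architecture", "Data storage & model", "Caching & CDN",
    "Async / queues", "Scaling & sharding", "Consistency & reliability",
    "Security & privacy", "Observability & ops", "Communication",
)


def interpret(scores: Dict[str, int]) -> Tuple[int, str, List[str]]:
    """Total a 12-dimension sheet, map it to a band, and list the 3 lowest dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each dimension must be scored 1-5")
    total = sum(scores[d] for d in DIMENSIONS)  # maximum is 12 * 5 = 60
    if total >= 48:
        band = "Interview-ready for most senior loops"
    elif total >= 36:
        band = "Targeted study on lowest 3 dimensions"
    else:
        band = "Repeat framework (Part 2); more mocks before real interviews"
    # sorted() is stable, so ties keep the sheet's row order.
    lowest = sorted(DIMENSIONS, key=lambda d: scores[d])[:3]
    return total, band, lowest
```

Note that 48/60 is exactly an average of 4 per dimension, which matches the "target average ≥4" guidance.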
Action Template
Lowest dimension this week: ___. Study Part #___. Drill: one mock focusing only on that phase next session.
Peer Mock Exchange
Swap rubrics with study partner; score each other blind; compare self vs peer scores for calibration.
Weekly Trend
Plot the total score week over week; a plateau means it's time to change the mock format (harder problems, shorter time).
Worked Example: Rubric
Score communication separately even when the design is weak; strong communication can tip borderline hire/no-hire decisions.
Extended Notes
Connect the rubric to step 6 (deep dive) of the Part 2 framework. The interviewer may spend 10 minutes there, so prepare one diagram and one failure scenario.
State trade-offs aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.
Related parts: Part 1 describes what interviewers score, and the Part 2 framework is what to repeat when totals fall below 36.
Interview Question Bank — Rubric
Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.
Self-score inflation?
Compare with peer mock scores — calibrate harshly on communication and depth.
Hire bar mapping?
Consistently scoring 48/60 or higher across 3 mocks suggests readiness for many FAANG loops.
Extended Reference — Mock Interview Rubric
Calibration
Score your first mock harshly (around a 3 average) so that improvement is visible by mock 5.
Dimension weighting
At L5, depth and trade-offs weigh more than polished diagram art.
Communication 5
A 5 requires thinking aloud through the entire session without long silences.
Tracking spreadsheet
Log date, problem, per-dimension scores, and action items; review weekly.
Hire decision
The rubric guides study; actual hiring decisions are holistic across the loop, so don't overfit to one mock score.
Part 33 Mastery Checklist
Before mock interviews, verify you can explain each item without reading:
- Score 12 dimensions
- Notes column evidence
- Weekly trend
- Peer calibration
- Action on lowest
- 48+ target
- Communication separate
- Mock count 8+
- Honest scoring
- Hire bar holistic
Self-test prompt
Explain Part 33 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.
Mock tie-in
Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).
After the self-test, rate your confidence on Part 33 from 1 to 5; revisit it if below 4.
Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
This guide was created with the help of Cursor, which assisted with structuring, drafting, and refining the content for clarity and completeness.
