System Design Interview Complete Guide

A complete, practical guide for system design interviews — frameworks, estimation, scalability patterns, and walkthroughs.

System Design Interview — Complete Mastery Guide

A self-contained reference for FAANG-style and senior engineering system design interviews

How to Use This Guide

This document is designed as your single source of truth for system design interview preparation. Read it linearly once for orientation, then use the sidebar table of contents to drill into weak areas. Each part builds on prior concepts: fundamentals (Parts 1–3), building blocks (Parts 4–20), patterns and storage (Parts 21–25), full designs (Part 27), and interview execution (Parts 28–33).

  • First pass (2 weeks): Parts 0–3, 28–30. Skim Part 27 walkthrough titles.
  • Second pass (4 weeks): Parts 4–20 in depth. Do one Part 27 walkthrough per day.
  • Third pass (ongoing): Mock interviews using Part 33 rubric. Re-read trade-off matrices before interviews.

Practice aloud: explain diagrams as if to an interviewer. Time-box yourself to 45 minutes per mock design.

Study modes: (1) Reading mode — understand concepts. (2) Active recall — cover diagrams and explain from memory. (3) Timed mock — random Part 27 problem, 45 min timer. (4) Peer review — swap designs and critique using Part 33 rubric.

Document Map

Parts | Topic | When to study
1–3 | Interview mechanics & estimation | Week 1
4–7 | Traffic path: LB, cache, CDN | Week 2
8–12 | Data: DB, replication, transactions | Week 3–4
13–20 | Distributed systems & ops | Week 5–6
21–26 | Patterns & domain designs | Week 7
27 | 25 full walkthroughs | Daily practice weeks 8–12
28–33 | Execution & checklists | Before every interview


Part 1: Interview Format & What Interviewers Score

Typical 45–60 Minute Structure

Most system design interviews at large tech companies run 45–60 minutes with a single problem. The first 5–10 minutes are requirements and scope; 10–15 minutes high-level architecture; 20–30 minutes deep dives on data model, scaling, and failure modes; the last 5 minutes trade-offs and extensions.

Phase | Time | Your Goal
Clarify & scope | 5–10 min | Functional/non-functional requirements, users, scale, constraints
High-level design | 10–15 min | Boxes-and-arrows: clients, LB, services, caches, DBs, queues
Deep dive | 20–30 min | Schema, APIs, sharding, consistency, bottlenecks (interviewer-led)
Wrap-up | 5 min | Summary, monitoring, future work, what you'd do with more time

What Interviewers Score

Interviewers use a holistic rubric, not a single correct diagram. They evaluate:

  • Problem solving: Can you decompose an ambiguous problem and prioritize what matters?
  • Technical depth: Do you understand how databases, caches, queues, and networks behave at scale?
  • Trade-off reasoning: Can you articulate why you chose SQL vs NoSQL, sync vs async replication, etc.?
  • Communication: Do you think aloud, check assumptions, and respond to hints?
  • Operational awareness: Monitoring, failure modes, security, cost — not just happy path.

Senior vs Mid-Level Expectations

Dimension | Mid (L4/L5) | Senior (L6+)
Scope | One clear product feature | Multi-region, org boundaries, platform concerns
Depth | Correct building blocks | CAP, consistency, idempotency, saga, observability SLOs
Leadership | Follows hints | Proactively surfaces risks, drives discussion
Estimation | Order-of-magnitude OK | Back-of-envelope with explicit assumptions

Virtual Whiteboard Tips

  • Use a consistent layout: users left, data stores right, async flows bottom.
  • Label arrows (HTTPS, events, replication). Unlabeled lines confuse you and the interviewer.
  • Draw incrementally — don't erase entire diagrams; add layers (MVP → scale).
  • Keep text large enough to read on a shared screen; abbreviate (API GW, DB) consistently.
  • Excalidraw, Miro, or built-in CoderPad — practice one tool before interview day.
  • When stuck, narrate: "I'd pause here and validate QPS assumptions with you."


Part 2: The Answer Framework

Use this repeatable framework for every design question. Interviewers recognize structured thinking even when the exact architecture differs.

Step 1: Requirements

Functional: What the system must do (users post tweets, shorten URLs, book seats).

Non-functional: Scale, latency, availability, durability, consistency, security, cost.

Example script: "I'll assume 100M DAU, read-heavy 100:1, p99 read latency under 200ms, 99.9% availability unless we need stronger consistency for payments."

Step 2: Constraints & Assumptions

  • Budget, team size, existing stack, regulatory (GDPR, PCI), geographic focus
  • Explicitly state what you are not building (e.g., ML ranking v1, admin portal)

Step 3: Back-of-the-Envelope

DAU → QPS (peak ~2–5× average), storage per object × objects/year, bandwidth. See Part 3.

Step 4: API Design

RESTful resources or RPC methods; idempotency keys for writes; pagination cursors. Keep to 5–8 core endpoints in the interview.

Step 5: High-Level Diagram

[Clients] → [CDN] → [LB] → [API Servers] → [Cache]
                              ↓
                    [Workers] ← [Queue] → [DB / Object Store]

Step 6: Deep Dives

Interviewer picks: data model, hot paths, sharding key, cache strategy, fan-out, consistency.

Step 7: Bottlenecks & Mitigations

DB write throughput, hot keys, thundering herd, single points of failure — pair each with a fix.

Step 8: Trade-offs Summary

One sentence each: "We chose eventual consistency for feeds because… at the cost of…"

Step 9: Closing Summary

Recap architecture in 30 seconds; mention monitoring and phased rollout.


Part 3: Back-of-the-Envelope Estimation

Why Estimation Matters in Interviews

Interviewers rarely expect exact numbers; they want to see that you decompose a fuzzy problem into measurable quantities, state assumptions explicitly, and sanity-check whether your architecture can handle the load. A five-minute back-of-the-envelope (BOE) prevents you from proposing a single MySQL instance for a billion-read-per-day product.

Good estimation is a chain of reasoning: daily active users (DAU) lead to actions per day, which become average and peak queries per second (QPS), which drive storage growth, egress bandwidth, cache sizing, and shard counts. Each hop should be spoken aloud so the interviewer can correct assumptions early.

Script: "With 100M DAU and each user viewing 20 pages/day, that is 2B page views/day. Dividing by 86,400 seconds gives ~23K average QPS; peak is often 2–5×, so I will plan for ~100K read QPS at peak."

Latency Numbers Every Engineer Should Know

Memorize orders of magnitude so you can reason without looking up charts. Times vary by hardware; use these as interview anchors when arguing for caches, CDNs, or async processing.

Operation | Typical Latency | Notes
L1 cache reference | 0.5 ns | CPU-local
Branch mispredict | 5 ns | Pipeline flush
L2 cache | 7 ns |
Mutex lock/unlock | 25 ns | Uncontended
Main memory reference | 100 ns | DDR4/5
SSD random read | 16 µs | NVMe faster
Round trip in datacenter | 0.5 ms | Same AZ
Redis/Memcached RTT | 0.5–1 ms | Local network
SSD sequential 1 MB | 1 ms |
Disk seek (HDD) | 10 ms | Avoid in hot path
Send 1 MB over 1 Gbps LAN | 10 ms |
Cross-country RTT | 40–80 ms | US coast-to-coast
Read 1 MB from S3 (first byte) | 100–300 ms | Region-dependent
Database query (simple indexed) | 1–10 ms | Local DB
Complex DB join / full scan | 10–100+ ms | Why indexes matter

Rule of thumb: one cross-region RTT (50–150 ms) dominates a datacenter cache hit (sub-ms). If your design needs 20 sequential RPCs across regions, latency will exceed 1 second before application logic runs — batch, parallelize, or move data closer.

From DAU to QPS

requests_per_day = DAU × actions_per_user_per_day
avg_QPS = requests_per_day / 86,400
peak_QPS ≈ avg_QPS × peak_multiplier   # often 2–5× for consumer apps

Example: 50M DAU, 10 timeline loads/day → 500M reads/day → ~5,800 average QPS → ~29K peak at 5× multiplier during evening hours.
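
The formulas above can be checked with a few lines of Python. This is a minimal sketch: the function name `estimate_qps` and its parameters are illustrative assumptions, and the 5× multiplier is simply the upper end of the range quoted in the text.

```python
SECONDS_PER_DAY = 86_400

def estimate_qps(dau, actions_per_user_per_day, peak_multiplier=5.0):
    """Return (avg_qps, peak_qps) from daily active users and per-user activity."""
    requests_per_day = dau * actions_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    return avg_qps, avg_qps * peak_multiplier

# The 50M DAU, 10 timeline loads/day example from the text.
avg, peak = estimate_qps(dau=50_000_000, actions_per_user_per_day=10)
print(f"avg ≈ {avg:,.0f} QPS, peak ≈ {peak:,.0f} QPS")  # avg ≈ 5,787 QPS, peak ≈ 28,935 QPS
```

In an interview you would round these to ~5,800 and ~29K; the point of the script is only to confirm the order of magnitude.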

Writes: Often one to two orders of magnitude lower than reads in social and feed products. State read:write ratio explicitly (e.g., 100:1). For write-heavy systems (logging, IoT ingestion), invert the analysis and size for ingest QPS first.

Storage Estimation Formulas

storage_per_year = objects_per_year × bytes_per_object × replication_factor
objects_per_year = new_objects_per_second × 86,400 × 365

Object | Size (order of magnitude)
User profile row | 1–4 KB
Tweet / short post metadata | 300 B – 2 KB; media separate
Image (compressed) | 200 KB – 2 MB
Video minute (1080p) | 50–150 MB
Log line (JSON) | 0.5–2 KB
UUID + indexes overhead | +30–50% on row size

Worked example: 10M new photos/day × 500 KB average × 3× replication ≈ 15 TB/day before compression and lifecycle tiering — clearly object storage (S3/GCS) territory, not inline BLOB columns in OLTP.
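
The worked example can be reproduced as a short calculation. This is a sketch under stated assumptions: decimal units (1 KB = 1,000 bytes), and a hypothetical helper name `storage_bytes_per_day` chosen for illustration.

```python
KB = 1_000
TB = 1_000 ** 4

def storage_bytes_per_day(new_objects_per_day, bytes_per_object, replication_factor=3):
    """Raw bytes written per day, including replicas, before compression or tiering."""
    return new_objects_per_day * bytes_per_object * replication_factor

per_day = storage_bytes_per_day(10_000_000, 500 * KB)
print(f"{per_day / TB:.1f} TB/day, {per_day * 365 / TB:,.0f} TB/year")  # 15.0 TB/day, 5,475 TB/year
```

Extending the daily figure to a year (about 5.5 PB here) is usually what makes the case for object storage and lifecycle tiering obvious.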

Account for soft deletes, audit trails, and backups: operational storage often exceeds raw user data by 2×. Cold tier (Glacier) reduces cost but not logical size on planning spreadsheets.

Power of Two for Capacity Planning

Power | Exact | Approx
2^10 | 1,024 | ~1 thousand (1 KB)
2^20 | 1,048,576 | ~1 million (1 MB)
2^30 | 1,073,741,824 | ~1 billion (1 GB)
2^40 | ~1.1×10^12 | ~1 trillion (1 TB)
2^50 | ~1.1×10^15 | ~1 quadrillion (1 PB)

Use powers of two when estimating shard counts, hash ring size, and memory: a 32-bit user ID space has 4B values; at 1 KB per cached profile, fully populated memory would be 4 TB (never fully hot). Sharding by user_id mod 1024 yields 1024 shards — a clean power-of-two boundary.
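
One reason a power-of-two shard count is convenient: the modulo can be replaced by a bitmask. The sketch below assumes integer user IDs and a fixed shard count; `shard_for` is an illustrative name, not an API from any particular system.

```python
NUM_SHARDS = 1024  # must be a power of two for the mask trick to be valid

# A power of two has exactly one bit set, so n & (n - 1) is zero.
assert NUM_SHARDS & (NUM_SHARDS - 1) == 0

def shard_for(user_id: int) -> int:
    """Map a user ID to a shard index; same result as user_id % NUM_SHARDS."""
    return user_id & (NUM_SHARDS - 1)  # mask is 0x3FF for 1024 shards

print(shard_for(123_456_789), 123_456_789 % NUM_SHARDS)
```

Note that plain modulo sharding forces a large data reshuffle when the shard count changes, which is the problem consistent hashing addresses.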

Bandwidth Estimation

egress_Gbps = peak_QPS × avg_response_bytes × 8 / 10^9

1 Gbps ≈ 125 MB/s theoretical maximum. A 500 KB JSON API at 10K QPS needs roughly 40 Gbps egress at origin — CDN edge caching and compression are mandatory, not optional optimizations.
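
The egress formula is easy to get wrong by a factor of 8, so a quick check helps. This is a minimal sketch with decimal units; `egress_gbps` is a name chosen here for illustration.

```python
def egress_gbps(peak_qps, avg_response_bytes):
    """Egress in gigabits per second: bytes/s converted to bits/s, then to Gbps."""
    return peak_qps * avg_response_bytes * 8 / 1e9

# 500 KB responses at 10K QPS, as in the text.
print(egress_gbps(10_000, 500_000))  # 40.0
```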

Include upload bandwidth for user-generated content: 1M uploads/day × 2 MB average = 2 TB/day, or about 23 MB/s if spread evenly. In reality, peak upload windows concentrate load on ingress load balancers and object-store write paths.

Availability Math

Independent components in series multiply reliability: if A is 99.9% and B is 99.9%, combined ≈ 99.8%. Parallel redundancy improves availability: 1 - (1-p)^n for n identical redundant nodes.

Nines | Downtime/year
99% | 3.65 days
99.9% | 8.76 hours
99.99% | 52.6 minutes
99.999% | 5.26 minutes
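
The series and parallel rules, and the downtime table, follow from a few one-line functions. This is a sketch that assumes failures are independent (rarely true in practice) and a 365-day year; the function names are illustrative.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24

def series(*availabilities):
    """All components must be up: availabilities multiply."""
    return prod(availabilities)

def parallel(p, n):
    """At least one of n identical redundant nodes must be up."""
    return 1 - (1 - p) ** n

def downtime_hours_per_year(availability):
    return (1 - availability) * HOURS_PER_YEAR

print(f"{series(0.999, 0.999):.4%}")               # two 99.9% components in series
print(f"{parallel(0.99, 2):.4%}")                  # two 99% nodes in parallel
print(f"{downtime_hours_per_year(0.999):.2f} h")   # 99.9% per year
```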

Interview tip: tie nines to product — 99.9% may be fine for a news feed; payment authorization often needs multi-region active-active and stricter SLOs. Mention error budgets (Part 18) when discussing how much downtime is acceptable.

Servers as a Sanity Check

Rough capacity: one modern app server might handle 500–2,000 RPS for light JSON (highly workload-dependent). 100K QPS divided by 1K per server ≈ 100 servers before cache — then apply cache hit ratio: 90% hit rate cuts origin load by 10×.

Database connection limits often bind before CPU: 500 app servers × 10 connections each = 5,000 connections — many managed Postgres tiers cap below that, requiring PgBouncer or fewer, larger connection pools with careful tuning.
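
Both sanity checks above reduce to simple arithmetic. The sketch below uses hypothetical helper names and treats per-server throughput and pool size as given assumptions rather than measured values.

```python
import math

def servers_needed(qps, rps_per_server):
    """Round up: a fractional server still means one more machine."""
    return math.ceil(qps / rps_per_server)

def origin_qps(peak_qps, cache_hit_ratio):
    """Traffic that misses the cache and reaches the tier behind it."""
    return peak_qps * (1 - cache_hit_ratio)

def total_db_connections(app_servers, pool_size_per_server):
    return app_servers * pool_size_per_server

print(servers_needed(100_000, 1_000))            # 100 servers before cache
print(round(origin_qps(100_000, 0.90)))          # 10000 QPS reaches origin at 90% hit rate
print(total_db_connections(500, 10))             # 5000 connections
```

Comparing the last number with the database's connection cap is what tells you whether a pooler is needed.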

[Assumption chain]
  DAU → actions/day → QPS (avg & peak)
       → storage/year (× replication)
       → bandwidth (× bytes/response)
       → cache hit ratio → DB QPS
       → shard count / machine count
       → monthly cost (servers + egress + storage)

Common BOE Mistakes

  • Forgetting peak multiplier and planning only for average QPS
  • Ignoring replication factor and backup storage in disk math
  • Using HDD seek latency assumptions for SSD/NVMe-backed stores
  • Treating CDN hit ratio as 100% without stating edge cache assumptions
  • Confusing bits and bytes in bandwidth (×8 conversion)

Practice Problem

Design a photo-sharing app BOE: 20M DAU, 3 photo views and 0.2 uploads per user per day, 400 KB average display size, 2 MB upload, 5-year retention. Walk through QPS, storage/year, and peak egress. Compare with and without 85% CDN cache hit on reads.

Interview BOE Drills

Practice these until automatic:

  1. URL shortener: 100M URLs/month → writes/s; 10:1 read → read QPS; 500 B row → GB/year
  2. Photo app: 10M photos/day × 2 MB → 20 TB/day raw; CDN hit ratio effect on origin
  3. Chat: 1M concurrent × 1 msg/min → message QPS; WS memory per connection

Latency Budget Example

p99 target 200ms for API: CDN 20ms + LB 5ms + app 30ms + cache 2ms + DB 80ms + serialization 10ms + margin 53ms. If DB is 80ms, you cannot add 5 sequential microservice hops without busting budget.
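
A latency budget is just a sum checked against a target. The sketch below uses the component figures from the text; the 15 ms per extra hop is a hypothetical number chosen only to illustrate the overflow, and adding per-component figures is a rough planning heuristic rather than a true p99 calculation.

```python
P99_TARGET_MS = 200

budget_ms = {"cdn": 20, "lb": 5, "app": 30, "cache": 2, "db": 80, "serialization": 10}

def remaining_margin(components, target_ms):
    """Milliseconds left after all budgeted components are spent."""
    return target_ms - sum(components.values())

margin = remaining_margin(budget_ms, P99_TARGET_MS)
print(margin)  # 53

# Hypothetical: five sequential microservice hops at 15 ms each.
extra_hops_ms = 5 * 15
print(extra_hops_ms <= margin)  # False: the hops do not fit in the margin
```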

Availability Budget Math

99.9% monthly availability allows about 43 minutes of downtime. If you deploy 20 times per month with a 0.1% blast radius per deploy, plan for canary releases and automatic rollback. An error budget policy links reliability to release velocity.

Worked Example: News Site BOE

Assumptions: 20M DAU, 50 article views/user/day, 500 KB average page (HTML+assets), 80% CDN hit ratio.

Views/day = 20M × 50 = 1B. Avg QPS = 1B/86400 ≈ 11,600. Peak 5× ≈ 58,000 read QPS.

Origin QPS = 58K × 20% = 11,600 if CDN handles 80%. Egress without CDN: 1B × 500KB = 500 TB/day — impossible without CDN.

Storage: 10K new articles/day × (50 KB text + 2 MB images) × 3 replicas ≈ 61.5 GB/day, of which text is only about 1.5 GB/day. Images go to S3 and are not counted in DB row size.

Q&A

Q: Why powers of two for shards? A: Clean routing bitmask (user_id & 0x3FF), even split in consistent hash rings.

Q: How many servers for 58K QPS? A: If 2K RPS/instance → ~30 origin app servers before DB/cache; cache cuts DB load further.

Bandwidth Worked Numbers

Payload | QPS | Egress Gbps
1 KB JSON | 100K | 0.8
10 KB | 100K | 8
100 KB | 100K | 80
1 MB | 10K | 80

Interview Question Bank — Back-of-Envelope

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How do you estimate peak QPS from DAU?

DAU × actions per day / 86400 × peak multiplier (2–5×). State assumptions explicitly.

How much storage for 5 years of tweets?

Daily tweets × size × 365 × 5 × replication. Separate media to object storage.

What latency dominates cross-region design?

RTT 50–150ms per round trip — minimize sequential RPCs.

How do you convert availability % to downtime?

99.9% ≈ 8.76 hours/year. Use for error budget discussions.

Additional BOE Practice

Review this section with Part 27 walkthroughs: apply BOE calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 3 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Latency table memorized
  2. DAU→QPS formula
  3. Storage/year calc
  4. Bandwidth Gbps
  5. Power of two
  6. Availability nines
  7. Assumption chain spoken
  8. Peak multiplier 2-5x
  9. Sanity check servers
  10. CDN impact on egress

Self-test prompt

Explain Part 3 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 3 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.


Part 4: Scalability

What Scalability Means

Scalability is the ability of a system to handle increased load by adding resources without redesigning its core architecture. In interviews, distinguish vertical scaling (bigger machines) from horizontal scaling (more machines). Most web-scale systems scale stateless tiers horizontally and partition stateful data.

Scalability has dimensions: load (QPS), data volume, fan-out complexity, geographic distribution, and team/org scale. Clarify which dimension dominates for the problem at hand.

Vertical vs Horizontal Scaling

Aspect | Vertical (scale-up) | Horizontal (scale-out)
How | More CPU/RAM/disk on one node | Add nodes behind LB
Limits | Hardware ceiling, single point of failure | Requires partition-friendly design
Cost curve | Expensive high-end boxes | Commodity hardware, linear-ish
Downtime | Often requires restart | Rolling deploys, replace nodes
Interview use | Quick MVP, DB until sharding | Default for stateless app tier

Databases often scale vertically first (a bigger instance), then add read replicas, then shard horizontally. Application servers scale horizontally from day one in most designs.

Stateless Application Tier

Stateless servers store no session data locally; any request can land on any instance. Session state lives in client tokens (JWT), centralized session store (Redis), or database. This enables elastic autoscaling and zero-downtime deploys.

           [LB]
         /  |  \
  [App1][App2][App3]  ← no local session
         \  |  /
  [Redis sessions] or [JWT in cookie]

Anti-pattern: storing files or session data on each server's local disk without shared storage. This breaks scale-in and causes data loss on node termination.

Sticky Sessions

Load balancers can pin a user to one backend via cookie or connection affinity. Useful when legacy app keeps local cache or non-replicated sessions. Downsides: uneven load, poor failover, complicates deploys and autoscaling.

  • When acceptable: Short migration period, WebSocket origin pinning with reconnect logic
  • Prefer instead: External session store, stateless APIs, connection draining on deploy
  • If you mention sticky sessions, always note load imbalance risk and mitigation (session replication)

Autoscaling

Autoscaling adjusts instance count based on metrics (CPU, request count, queue depth, custom business metrics). Scale-out triggers add capacity before SLO breach; scale-in removes idle capacity to save cost.

Signal | Pros | Cons
CPU utilization | Simple | Laggy; misleads on I/O-bound work
Request rate / latency p99 | User-visible | Needs good LB metrics
Queue depth | Great for workers | Not for synchronous API tier alone
Schedule-based | Predictable peaks (TV events) | Wastes capacity if wrong

Cooldown periods prevent flapping. Warm pools and pre-warmed AMIs reduce cold-start latency for latency-sensitive APIs. Mention minimum instance count for availability during scale-from-zero (if allowed).
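
How a cooldown prevents flapping can be sketched as a small decision function. Everything here is an illustrative assumption (thresholds, doubling on scale-out, single-step scale-in, the `Autoscaler` class itself); real autoscalers expose these as policy settings rather than code.

```python
from dataclasses import dataclass

@dataclass
class Autoscaler:
    min_instances: int = 2
    max_instances: int = 100
    scale_out_above: float = 0.70   # utilization threshold (assumed)
    scale_in_below: float = 0.30    # lower threshold gives hysteresis (assumed)
    cooldown_s: float = 300.0
    last_change_at: float = float("-inf")

    def decide(self, current: int, utilization: float, now: float) -> int:
        """Return the desired instance count for this evaluation tick."""
        if now - self.last_change_at < self.cooldown_s:
            return current  # still cooling down; ignore the signal
        desired = current
        if utilization > self.scale_out_above:
            desired = min(self.max_instances, current * 2)   # scale out fast
        elif utilization < self.scale_in_below:
            desired = max(self.min_instances, current - 1)   # scale in slowly
        if desired != current:
            self.last_change_at = now
        return desired

scaler = Autoscaler()
print(scaler.decide(current=4, utilization=0.9, now=0))    # 8
print(scaler.decide(current=8, utilization=0.9, now=100))  # 8 (inside cooldown)
```

The gap between the two thresholds means a utilization hovering around one value cannot trigger alternating scale-out and scale-in decisions.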

Scaling Stateful Components

Caches scale via clustering and consistent hashing. Databases scale via read replicas, sharding, and federation. Queues scale via partitions and consumer groups. Each stateful layer needs its own scaling story — do not assume app autoscaling fixes DB writes.

Bottleneck Hierarchy

  1. Single DB master write throughput
  2. Hot keys / hot partitions
  3. Expensive synchronous RPC chains
  4. Lock contention on shared resources
  5. Thundering herd on cache miss
  6. Cross-region replication lag

Interview flow: identify the first bottleneck at estimated peak load, propose mitigation, re-estimate capacity, repeat.

Elasticity vs Performance

Serverless and aggressive autoscaling maximize elasticity; fixed large pools minimize tail latency variance. Cost vs latency trade-off: financial systems may keep warm capacity; batch analytics may scale to zero overnight.

Senior signal: Discuss scaling limits of the team — microservices scale independently but multiply operational overhead. A monolith with modular boundaries may scale further with one on-call rotation.

Case Study: E-commerce Checkout

Browse/catalog tier: horizontal stateless, CDN, read replicas. Cart: Redis per user with TTL. Checkout: smaller pool, stricter timeouts, idempotent payment API, queue for order fulfillment. Scale browse 100× checkout — different tiers, different scaling policies.

[Browse]  → many replicas, CDN, cache-heavy
[Cart]    → Redis cluster, moderate replicas
[Checkout]→ few replicas, sync payment, saga async

Scaling Case Study

Instagram scaled Python app servers horizontally behind LB; Memcached for hot objects; sharded Postgres/Cassandra for data. Key lesson: stateless app tier scales linearly until database becomes bottleneck — then shard or cache.

Auto-Scaling Signals

Signal | Scale out when | Caution
CPU | >70% for 5 min | If CPU is low but the queue is deep, scale on queue depth
Request rate | Approaching RPS limit per instance | Coordinate with DB capacity
Custom | Kafka consumer lag > threshold | Adding more consumers than partitions is useless

Sticky Sessions Detail

Cookie-based affinity routes user to same server for session data in memory — fragile on deploy (drain connections). Prefer external session store (Redis) + stateless servers.

Worked Example: E-Commerce Checkout Scale

Black Friday 10× normal: auto-scale API 50→500 pods in 10 min. Database cannot scale 10× instantly — queue checkout requests, show wait time, prioritize payment capture.

Stateless cart in Redis keyed by session_id; order creation idempotent. Sticky sessions avoided — all state external.

Q&A

Q: Vertical vs horizontal first? A: Vertical until single-machine limits (CPU/RAM/disk IOPS), then read replicas, then shard writes.

Interview Question Bank — Scalability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

When is vertical scaling enough?

Low traffic MVP, single-region, team velocity priority — until CPU/IO saturates.

What makes a service stateless?

Any instance handles any request; session in Redis/DB; no local disk state.

How does auto-scaling avoid flapping?

Cooldown periods, hysteresis thresholds, scale-up faster than scale-down.

Additional Scale Practice

Review this section with Part 27 walkthroughs — apply scale calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 4 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Vertical limits named
  2. Stateless app tier
  3. Sticky session downside
  4. Auto-scale signals
  5. Scale DB last
  6. Read replicas
  7. Connection pool limits
  8. Split compute/storage
  9. No local disk state
  10. Phase scaling plan

Self-test prompt

Explain Part 4 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 4 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.


Part 5: Load Balancing

Role of Load Balancers

Load balancers distribute traffic across healthy backends, terminate TLS, enforce routing rules, and provide a stable endpoint while instances churn. They sit between clients and your application tier, and between internal services in multi-tier designs.

Layer 4 vs Layer 7

Layer | OSI | Routes on | Aware of HTTP | Use case
L4 | Transport | IP + port | No | TCP pass-through, gaming, extreme throughput
L7 | Application | URL path, headers, host | Yes | REST APIs, sticky cookies, A/B routes

L4 LB forwards packets with minimal inspection — lower latency, cannot route /api to different pool than /static. L7 can route Host: api.example.com to gRPC pool and www to web pool; can inject headers (X-Request-ID).

Load Balancing Algorithms

Algorithm | Behavior | When to use
Round robin | Cycle backends | Homogeneous, equal capacity
Weighted round robin | Proportional to weight | Mixed instance sizes
Least connections | Fewest active conns | Long-lived requests, variable duration
Least response time | Lowest latency backend | Heterogeneous performance
Random + two choices | Pick 2 random, use least loaded | Power of two choices; near-optimal
IP hash | Client IP → fixed backend | Legacy sticky without cookies

Consistent hashing (Part 17) appears at cache layers and some L7 gateways for shard-aware routing. Do not confuse LB algorithms with data partitioning hashes.
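
Three of the algorithms in the table fit in a few lines each. This is a toy sketch: backends are plain strings, connection counts live in a dict, and the class and function names are illustrative rather than taken from any load balancer.

```python
import itertools
import random

class RoundRobin:
    """Cycle through backends in a fixed order."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

def least_connections(active):
    """active maps backend name -> current open connection count."""
    return min(active, key=active.get)

def two_random_choices(active, rng=random):
    """Sample two distinct backends and keep the less loaded one."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b

rr = RoundRobin(["app1", "app2", "app3"])
print([rr.pick() for _ in range(4)])                           # ['app1', 'app2', 'app3', 'app1']
print(least_connections({"app1": 5, "app2": 2, "app3": 9}))    # app2
```

Two random choices avoids scanning every backend on each request while still steering away from the busiest node, which is why it scales well to large pools.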

Health Checks

Active health checks: LB periodically calls /health and removes failing nodes. Passive checks: observe error rates from real traffic. Use deep checks sparingly — hitting DB on every probe overloads dependencies.

  • Liveness: Process up? Return 200 if server binds port.
  • Readiness: Can serve traffic? DB connected, cache warmed, migrations done.
  • Kubernetes: liveness vs readiness probes map directly to interview answers

Graceful shutdown: on SIGTERM, stop accepting new connections, drain in-flight requests (30–60s), then deregister. Prevents 502 spikes during deploys.
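
The liveness/readiness split and the drain step can be modelled as a tiny state object. This is a sketch only: the `ServiceHealth` class is hypothetical, and returning 503 for "not ready" is a common convention assumed here rather than something the text specifies.

```python
class ServiceHealth:
    def __init__(self):
        self.dependencies_ready = False  # set True once DB/cache/migrations are ready
        self.draining = False

    def liveness(self) -> int:
        """Process is up and can answer; says nothing about dependencies."""
        return 200

    def readiness(self) -> int:
        """Only ready when dependencies are up and we are not shutting down."""
        return 200 if self.dependencies_ready and not self.draining else 503

    def begin_shutdown(self):
        """Call from a SIGTERM handler: fail readiness so the LB stops sending traffic."""
        self.draining = True

health = ServiceHealth()
print(health.liveness(), health.readiness())   # 200 503 (still warming up)
health.dependencies_ready = True
print(health.readiness())                      # 200
health.begin_shutdown()
print(health.liveness(), health.readiness())   # 200 503 (draining in-flight requests)
```

Keeping liveness green while readiness fails is what lets in-flight requests finish during the drain window instead of the process being restarted.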

Global Load Balancing

Global server load balancing (GSLB) directs users to nearest healthy region using DNS, anycast, or edge networks. Goals: lower latency, disaster recovery, regulatory data residency.

User in Tokyo → GSLB → ap-northeast-1
User in London → GSLB → eu-west-1
Region failure → DNS/health failover → us-east-1

Challenges: cross-region data consistency, session stickiness across regions, cache invalidation globally. Often pair GSLB with geo-replicated data or region-scoped user accounts.

DNS Load Balancing

DNS returns multiple A/AAAA records with short TTL (30–300s). Clients pick randomly or by resolver behavior — crude load spread. DNS failover removes unhealthy IPs after TTL propagation delay.

Limitations: DNS caching causes stale routes; not good for fine-grained load control. Commonly combined with Anycast IP (one IP, BGP routes to nearest POP) at CDN/LB edge.

TLS and Connection Management

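TLS termination at LB offloads crypto from app servers. TLS passthrough preserves end-to-end encryption but limits L7 routing. HTTP/2 and gRPC multiplex many streams over one long-lived connection, so balancing per connection can pin heavy traffic to a few backends; balance per request at L7 (for example, by least outstanding requests) rather than relying on round robin across connections.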

Internal Service Load Balancing

Sidecars (Envoy) and client-side LB (gRPC name resolution) distribute east-west traffic inside Kubernetes. Service mesh adds retries, timeouts, circuit breaking at data plane — see Part 14.

Failure Modes

  • Thundering herd when all backends marked unhealthy — keep minimum healthy pool
  • SYN flood — SYN cookies, rate limits at edge
  • LB itself as SPOF — cloud LB is managed; self-hosted needs HA pair (VRRP)
  • Misconfigured idle timeout killing long WebSockets

Interview Checklist

  1. Where does TLS terminate?
  2. L4 or L7 — can we route by path/host?
  3. Health check type and drain strategy on deploy?
  4. Single region or GSLB — how failover works?

Health Check Types

  • Liveness: process up? Restart if fails
  • Readiness: can accept traffic? Remove from LB if DB down
  • Deep check: optional dependency ping — use sparingly (cascades)

Global Load Balancing

GeoDNS or Anycast routes user to nearest healthy region. Health checks per region; failover when region degraded. Data replication lag limits active-active for strongly consistent apps.

DNS Round Robin vs LB

DNS multiple A records — client picks; TTL caching causes stale routes. Application LB (ALB, NGINX) preferred for HTTP with health checks.

Worked Example: Global API

Users in US, EU, APAC. Route53 latency-based routing to regional ALB. EU data stays EU (GDPR). Health check removes region on 5xx spike.

Q&A

Q: L7 vs L4 for WebSocket? A: L7 ALB supports WS upgrade; L4 passes through opaque TCP — use when need raw TCP or extreme throughput.

Interview Question Bank — Load Balancing

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Why least connections vs round robin?

Long-polling/WebSocket ties up connections — least connections balances better.

How do health checks cause outages?

Too aggressive checks mark healthy nodes bad — use readiness not deep dependency chain.

Explain DNS load balancing limits.

TTL caches old IPs; not aware of server load — use for geo routing with health-checked endpoints.

Additional LB Practice

Review this section with Part 27 walkthroughs: apply LB calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 5 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. L4 vs L7 explained
  2. Round robin vs least conn
  3. Health check types
  4. SSL at LB
  5. Global DNS routing
  6. Avoid DNS round robin pitfalls
  7. Session affinity alternative
  8. LB as choke point HA
  9. DDoS at edge
  10. Cross-zone LB

Self-test prompt

Explain Part 5 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 5 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.


Part 6: Caching

Why Cache

Caching stores copies of expensive-to-compute or expensive-to-fetch data closer to the consumer. A cache hit avoids repeated database queries, RPC chains, or disk reads. In system design interviews, caching is often the difference between a design that meets 100ms p99 and one that collapses at 10K QPS.

Caches trade freshness for speed. Every cache introduces staleness risk and invalidation complexity — state these trade-offs explicitly rather than treating cache as a free performance boost.

Cache Layers

Caches exist at every layer of the stack. Understanding the hierarchy helps you place the right cache for the right bottleneck.

Layer | Examples | Typical TTL | Invalidation
Client | Browser HTTP cache, mobile disk | Minutes–days | Cache-Control headers
CDN / Edge | CloudFront, Cloudflare | Seconds–hours | URL purge, versioned paths
API gateway | Response cache by route | Seconds | Key eviction
Application | In-process LRU (Caffeine) | Seconds–minutes | Process restart
Distributed | Redis, Memcached | Minutes–hours | TTL, pub/sub invalidation
Database | Buffer pool, materialized views | Varies | Query refresh, CDC

Cache-Aside (Lazy Loading)

Application checks cache first; on miss, reads from DB, writes to cache, returns. Most common pattern for read-heavy workloads.

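def get_with_cache(key, cache, db, ttl=300):
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    value = cache.get(key)
    if value is None:  # miss (a cached None is also treated as a miss)
        value = db.get(key)
        cache.set(key, value, ttl=ttl)
    return value

  • Pros: Only caches requested data; survives cache failure (degrades to DB)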
  • Cons: First request always slow; stale data if DB updated without invalidation
  • Race: Two misses can double-load DB — use singleflight or lock per key

Read-Through & Write-Through

Read-through: Cache library loads from DB on miss transparently to app. Write-through: Writes go to cache and DB synchronously — cache always consistent but write latency equals DB latency.

Write-behind (write-back): Writes update cache immediately; async flush to DB. Higher write throughput but risk of data loss on cache crash before persistence — use for analytics counters, not financial balances without durable queue.

Eviction Policies

Policy | Behavior | Use when
LRU | Evict least recently used | General purpose hot set
LFU | Evict least frequently used | Stable popularity skew
TTL | Time-based expiry | Naturally stale data (feeds, config)
Random | Evict a random key; simple, no metadata | Uniform access patterns (Redis offers allkeys-random; Memcached defaults to LRU)
Size-based | Max memory cap triggers eviction | Redis maxmemory-policy
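
LRU is the policy interviewers most often ask you to sketch. The version below is a minimal single-threaded illustration built on `collections.OrderedDict`; the `LRUCache` class is written for this guide and is not how Redis or Memcached implement eviction internally.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a", so "b" is now the oldest
cache.set("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```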

Cache Stampede (Thundering Herd)

When a hot key expires, thousands of requests may miss simultaneously and hammer the database. Mitigations:

  1. Probabilistic early expiration — jitter TTL so keys do not expire together
  2. Lock / singleflight — first miss rebuilds; others wait or serve stale
  3. External pre-warm — background job refreshes hot keys before expiry
  4. Stale-while-revalidate — return old value while async refresh runs
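
Two of the mitigations above, TTL jitter and singleflight, can be sketched in-process. Assumptions to note: the cache is a plain dict, the helper names are hypothetical, per-key locks are never cleaned up, and this only deduplicates loads within one process (a fleet-wide version would need a distributed lock or request coalescing at the cache tier).

```python
import random
import threading

def jittered_ttl(base_ttl_s, jitter_fraction=0.1, rng=random):
    """Spread expiry times so hot keys do not all expire at the same instant."""
    spread = base_ttl_s * jitter_fraction
    return base_ttl_s + rng.uniform(-spread, spread)

class SingleFlight:
    """Collapse concurrent loads of the same key into one loader call."""
    def __init__(self):
        self._guard = threading.Lock()
        self._key_locks = {}

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def load(self, key, cache, loader):
        value = cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = cache.get(key)  # re-check: another thread may have filled it
            if value is None:
                value = loader(key)
                cache[key] = value
            return value

sf, cache = SingleFlight(), {}
print(sf.load("product:42", cache, lambda key: {"id": 42}))  # loads once, then cached
```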

TTL Strategy

Short TTL for rapidly changing data (stock prices). Long TTL + explicit invalidation for user profiles. Version keys (user:123:v5) allow instant logical invalidation without scanning Redis.

Negative caching: cache 'not found' briefly to protect DB from repeated lookups for bogus IDs (security scanning, bots).
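
Negative caching needs a way to tell "not cached" apart from "cached as missing". The sketch below uses a sentinel object for that; the `NegativeCache` class, its TTL defaults, and the injected clock are all illustrative assumptions.

```python
import time

_MISSING = object()  # sentinel meaning "we looked and the row does not exist"

class NegativeCache:
    def __init__(self, hit_ttl_s=300, miss_ttl_s=30, clock=time.monotonic):
        self._store = {}  # key -> (expires_at, value or _MISSING)
        self._hit_ttl_s = hit_ttl_s
        self._miss_ttl_s = miss_ttl_s
        self._clock = clock

    def get(self, key, db_lookup):
        entry = self._store.get(key)
        if entry and entry[0] > self._clock():
            return None if entry[1] is _MISSING else entry[1]
        value = db_lookup(key)  # returns None when the row is absent
        ttl = self._miss_ttl_s if value is None else self._hit_ttl_s
        stored = _MISSING if value is None else value
        self._store[key] = (self._clock() + ttl, stored)
        return value

users = {"user:1": "alice"}
nc = NegativeCache()
print(nc.get("user:999", users.get))  # None, and the miss is remembered briefly
print(nc.get("user:1", users.get))    # alice
```

The shorter TTL for misses limits how long a newly created row can appear absent.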

Consistency & Invalidation

Invalidation strategies: delete key on write; publish invalidation event to all app servers; rely on TTL only for low-stakes data. Event-driven invalidation scales better than broadcast for large fleets.

[Write path]
  Client → API → DB commit → publish invalidation
                    → subscribers delete cache keys

Redis vs Memcached

Feature | Redis | Memcached
Data structures | Strings, hashes, lists, sets, streams | Strings only
Persistence | Optional RDB/AOF | Pure memory
Clustering | Redis Cluster, Sentinel | Client-side consistent hash
Typical use | Sessions, leaderboards, pub/sub | Simple object cache

Interview Pitfalls

  • Caching without stating hit ratio assumption in BOE
  • No plan for cold start or cache cluster failure
  • Caching personalized data at CDN without Vary: Cookie
  • Ignoring memory cost at scale (1M keys × 10 KB = 10 GB)

Cache Key Design

Namespace keys: v1:user:123:profile. Version prefix enables bulk invalidation on schema change. Avoid unbounded key cardinality (per-request keys).

Memcached vs Redis for Pure Cache

Memcached is multithreaded with simple LRU eviction, which suits a pure cache layer (Facebook runs it at very large scale). Choose Redis when you need data structures (sorted sets for leaderboards) or persistence.

Multi-Layer Example

Browser cache → CDN → API in-process LRU → Redis → DB
  95%           90% of remainder    80%              hit

Worked Example: Product Page Cache

Cache-aside key product:42 TTL 300s. On price update, DELETE key + publish invalidation to local caches. Stampede on flash sale: singleflight + pre-warm top 1000 SKUs.

Q&A

Q: Write-behind for inventory? A: Risky — loss on crash. Use for analytics page views, not stock count.

Interview Question Bank — Caching

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How do you prevent cache stampede?

Jitter TTL, singleflight, stale-while-revalidate, proactive pre-warm.

Cache-aside vs write-through?

Cache-aside: flexible, app controls. Write-through: stronger consistency, higher write latency.

When is negative caching used?

Repeated lookups for non-existent keys — bots scanning IDs — short TTL prevents DB hammering.

Additional Cache Practice

Review this section with Part 27 walkthroughs — apply cache calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 6 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Cache layers drawn
  2. Cache-aside flow
  3. Write-through vs behind
  4. Eviction policy pick
  5. TTL + invalidation
  6. Stampede mitigation
  7. Redis vs Memcached
  8. Hit ratio in BOE
  9. Negative caching
  10. Cache failure degrade

Self-test prompt

Explain Part 6 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 6 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 7: CDN Deep Dive

What a CDN Does

A Content Delivery Network caches static and cacheable dynamic content at Points of Presence (POPs) geographically distributed near users. Reduces origin load, latency, and egress cost. Essential when BOE shows high read QPS or large asset payloads (images, video segments, JS bundles).

CDN Architecture

[User] → [DNS GeoDNS] → [Edge POP cache]
              miss ↓
         [Shield / Mid-tier] → [Origin / S3]

Edge POP serves from SSD/RAM. Shield layer collapses origin fetches — many edge misses become one shield-to-origin request. Origin shield protects S3 from thundering herd during viral content.

What to Cache at the Edge

  • Static assets: CSS, JS, images with content-hash filenames (immutable)
  • Video segments (HLS .ts or fMP4/DASH .m4s chunks) with long TTL
  • API responses only if identical for many users (public product catalog)
  • Do NOT cache authenticated personalized HTML without careful Vary headers

Cache Control & Headers

Header | Purpose
Cache-Control: max-age | Browser and CDN TTL
s-maxage | CDN-specific TTL (shared caches)
stale-while-revalidate | Serve stale while fetching fresh
ETag / If-None-Match | Conditional GET; a 304 saves bandwidth
Vary | Cache variants by Accept-Encoding, Cookie, etc.

Versioned URLs (/static/app.v42.js) allow an effectively unlimited TTL: invalidation is simply deploying a new filename. A purge API is still needed for emergency takedown of bad assets.
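
One common way to produce versioned URLs is to embed a content digest in the filename at build time. The sketch below assumes an 8-character SHA-256 prefix and example header values; the function names hashed_asset_name and cache_headers are hypothetical, and the exact TTLs are illustrative rather than recommended settings.

```python
import hashlib


def hashed_asset_name(filename, content):
    """Return a name like 'app.<digest>.js' derived from the file's bytes."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:                       # no extension: append the digest
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def cache_headers(immutable):
    """Long-lived headers for hashed assets, a short shared-cache TTL otherwise."""
    if immutable:
        # One year in seconds; safe because the URL changes whenever the content does.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    return {"Cache-Control": "public, max-age=0, s-maxage=60, stale-while-revalidate=30"}
```

Because any change to the bytes changes the name, browsers and CDNs never need to revalidate a hashed asset; only the HTML that references it needs a short TTL.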

Dynamic Content Acceleration

CDNs can terminate TLS closer to the user, keep persistent connections to the origin, and route over a private backbone (AWS CloudFront to S3). Dynamic site acceleration still cannot cache POST responses, so the gains come from connection reuse and TCP optimization.

Video Streaming & CDN

Adaptive bitrate streaming splits video into small files; CDN caches each segment independently. Live streaming uses low-latency protocols (LL-HLS) and origin packagers — harder than VOD. BOE: concurrent viewers × bitrate = egress Gbps.
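
The egress formula above is simple enough to check in code. The helper name egress_gbps and the sample numbers are assumptions for illustration, and the model ignores that adaptive bitrate viewers sit at different quality levels.

```python
def egress_gbps(concurrent_viewers, bitrate_mbps):
    """Aggregate egress in Gbps if every viewer streams at the same bitrate."""
    return concurrent_viewers * bitrate_mbps / 1000.0


# Example: 1M concurrent viewers at 5 Mbps each.
peak = egress_gbps(1_000_000, 5)
```

At these example numbers the result is 5,000 Gbps (5 Tbps), which is why live events are served from edge caches rather than from origin.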

Invalidation & Consistency

Purge by URL, wildcard, or tag (Cloudflare cache-tags). Propagation takes seconds to minutes globally. Prefer immutable assets over purge for routine deploys. For news sites, short TTL + stale-while-revalidate balances freshness and load.

Security at CDN Edge

  • DDoS absorption — CDN scales to absorb volumetric attacks
  • WAF rules at edge (OWASP Top 10 patterns)
  • Bot management, rate limiting before origin
  • Geo blocking, IP allowlists for admin paths

Multi-CDN & Failover

Large properties use multiple CDNs for resilience and price negotiation. DNS or traffic manager weighted routing splits traffic. Complexity: cache efficiency drops if same asset on two CDNs — coordinate TTL and purge.

Cost Model

CDN bills per GB egress and request count. Origin egress to CDN is often cheaper than internet egress. Calculate: monthly page views × asset size × (1 - edge_hit_ratio) = origin traffic. Improving hit ratio from 85% to 95% cuts the miss rate from 15% to 5%, so origin load drops to one third.
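
The origin-traffic formula can be checked numerically. The function name origin_traffic_gb and the sample inputs (100M monthly views, 2 MB per view) are assumptions for illustration only.

```python
def origin_traffic_gb(monthly_page_views, asset_size_mb, edge_hit_ratio):
    """Monthly GB fetched from origin: views x size x miss ratio."""
    return monthly_page_views * asset_size_mb * (1.0 - edge_hit_ratio) / 1000.0


at_85 = origin_traffic_gb(100_000_000, 2.0, 0.85)   # about 30,000 GB
at_95 = origin_traffic_gb(100_000_000, 2.0, 0.95)   # about 10,000 GB
```

Origin load scales with the miss ratio, not the hit ratio, so small gains near the top of the hit-ratio range have an outsized effect on origin traffic.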

Interview Script

"I will put all static media behind a CDN with content-hashed paths and 1-year TTL. API responses stay origin-only unless we have a truly public read API; user-specific data never caches at edge without explicit design."

CDN Providers Comparison (Conceptual)

Feature | Typical offering
Edge locations | 100+ POPs global
Origin shield | Reduce origin load
Image optimization | Resize on edge
Edge compute (Cloudflare Workers, Lambda@Edge) | Light compute at POP

Origin Collapse

Without shield: 1000 edge POPs miss simultaneously → 1000 origin requests. Shield tier: 1000 misses → 1 shield fetch → 1 origin. Critical for viral content.

Worked Example: Video Platform

A 1080p segment is 2 MB, and a popular video gets 10M views/day (treating each view as one segment fetch). With the CDN serving 95%, the origin handles 500K segment fetches: 500K × 2 MB = 1 TB/day, which is manageable, versus 10M × 2 MB = 20 TB/day without a CDN.

Interview Question Bank — CDN

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

What should never be cached at CDN?

Personalized HTML with user PII, uncacheable Set-Cookie responses without Vary.

How does cache poisoning happen?

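An attacker sends a request whose unkeyed input (for example a spoofed Host or X-Forwarded-Host header) changes the response; the CDN caches that response and serves it to other users. Validate Host, include every response-affecting header in the cache key, and use signed URLs for origin.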

Origin shield benefit?

Collapses many edge misses into one origin fetch during viral traffic.

Additional CDN Practice

Review this section with Part 27 walkthroughs and apply CDN calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 7 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. CDN for static
  2. Origin shield
  3. Cache-Control headers
  4. Immutable hashed assets
  5. Purge vs version URL
  6. Personalized not at edge
  7. Video segments
  8. Multi-CDN note
  9. Cost per GB
  10. DDoS absorption

Self-test prompt

Explain Part 7 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 7 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 8: Databases — SQL vs NoSQL

Choosing SQL vs NoSQL

SQL (relational) databases excel at structured data, ACID transactions, complex joins, and ad-hoc analytics. NoSQL broad category includes document, wide-column, key-value, graph, and time-series — each optimizes for specific access patterns at scale.

Factor | SQL (Postgres, MySQL) | NoSQL (varies)
Schema | Fixed, migrations | Flexible or schema-on-read
Transactions | Strong ACID | Often per-document or eventual
Joins | Native, optimizer | Denormalize or application-side
Scale writes | Vertical + sharding harder | Partition-friendly (Cassandra, Dynamo)
Query patterns | Ad-hoc SQL | Must know partition key upfront

Indexes & B-Trees

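Most OLTP databases use B+ trees for indexes: balanced tree, O(log n) lookups, sequential leaf scans for range queries. In InnoDB the primary key is a clustered index and determines physical row order; Postgres stores rows in an unordered heap, and its CLUSTER command reorders a table once rather than maintaining that order.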

Composite index (user_id, created_at) supports queries that filter on user_id and sort by time. Left-prefix rule: the index generally cannot serve queries that filter only on created_at without user_id.

  • Covering index includes all SELECT columns — avoids table lookup
  • Too many indexes slow writes (each index updated on INSERT)
  • Full table scan acceptable for rare admin reports, not user path
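
The left-prefix rule can be observed with a query planner. The sketch below uses SQLite from the Python standard library purely as a stand-in; the posts table and index name are hypothetical, and other engines (Postgres, MySQL) print different plans and may apply optimizations such as skip scans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER, body TEXT)"
)
conn.execute("CREATE INDEX idx_posts_user_created ON posts (user_id, created_at)")


def plan(sql, params=()):
    """Return the planner's detail strings for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filters on the leading column, sorts on the second: the composite index applies.
by_user = plan("SELECT body FROM posts WHERE user_id = ? ORDER BY created_at DESC", (7,))

# Filters only on the second column: the planner falls back to scanning the table.
by_time_only = plan("SELECT body FROM posts WHERE created_at > ?", (0,))
```

Printing by_user and by_time_only shows the index named in the first plan and absent from the second.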

Normalization vs Denormalization

Normalization (3NF): Eliminate redundancy; joins reconstruct data. Good for OLTP consistency, smaller writes. Denormalization: Duplicate fields to avoid joins at read time — standard in Cassandra, MongoDB feed designs, and read-heavy SQL when join cost dominates.

Interview pattern: normalized writes in OLTP, denormalized read models via CDC to search/feed store (CQRS-lite).

Connection Pooling

Opening a DB connection is expensive (TLS, auth, memory). App servers use pools (PgBouncer, HikariCP) to reuse connections. Pool size ≈ (core_count × 2) + effective_spindle_count per Postgres folklore — but at scale, thousands of microservices × pool size can exhaust max_connections.

App (500 instances) → PgBouncer (transaction pooling) → Postgres
# Transaction pooling: connection returned after each transaction
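
A quick calculation shows why a pooler is needed in front of the database. The helper names and the sample numbers (8 cores, 1 spindle, 10 connections per instance) are assumptions for illustration.

```python
def suggested_pool_size(core_count, effective_spindle_count):
    """Rule-of-thumb pool size for the database server: (cores x 2) + spindles."""
    return core_count * 2 + effective_spindle_count


def total_connections(instances, pool_size_per_instance):
    """Connections the database would see if every app instance connected directly."""
    return instances * pool_size_per_instance


db_side_target = suggested_pool_size(8, 1)      # 17 active connections
direct_demand = total_connections(500, 10)      # 5000 connections without a pooler
```

The gap between 17 and 5,000 is what PgBouncer absorbs: many client connections are multiplexed onto a small number of server connections, well under max_connections (typically 100 by default in Postgres).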

Document Stores (MongoDB)

JSON documents, flexible schema, replica sets, sharded cluster by shard key. Good for catalogs, content management, user profiles with nested objects. Avoid unbounded document growth (embedding unbounded arrays).

Wide-Column (Cassandra, HBase)

Partition key determines node; clustering columns sort within partition. Optimized for high write throughput and time-series. Query must include partition key — designing access patterns first is mandatory.

Key-Value (DynamoDB, Redis)

Simple get/put by key, predictable latency at scale. DynamoDB: partition key + optional sort key, on-demand or provisioned capacity, GSIs for alternate access patterns (with consistency caveats).

Graph Databases

Neo4j, Neptune for relationship-heavy queries (social graph friends-of-friends, fraud rings). Not a replacement for primary OLTP at billion-user scale — often specialized subgraph service.

Operational Concerns

  • Backup, PITR, replication lag monitoring
  • Migration strategy (expand-contract, dual-write)
  • Read replica routing for analytics vs user traffic

[Write] → Primary SQL
[Read hot path] → Redis → optional replica
[Analytics] → Read replica / warehouse (never on primary)

Index Types Beyond B-Tree

  • Hash index: equality only (Postgres hash, limited use)
  • GIN/GiST: full-text, JSON, geo in Postgres
  • Column store: analytics (Redshift, ClickHouse)

Migration at Scale

Online schema change: gh-ost, pt-online-schema-change copy rows in background. Expand-contract: add nullable column → dual-write → backfill → switch reads → remove old.

Read Path Routing

ORM must distinguish writer vs reader endpoints. Stale replica reads acceptable for dashboards, not for "withdraw balance" immediately after deposit.

Worked Example: Social Graph in SQL

follows(follower_id, followee_id) composite index (follower_id, created_at). Query followees' recent posts: JOIN posts ON followee_id — at 10M followers for one user, denormalize celebrity follows to separate fan-out pipeline.

Interview Question Bank — Databases

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

B-tree vs LSM-tree?

B-tree: better read, default OLTP. LSM (RocksDB): better write throughput, compaction overhead.

When denormalize?

Read path >> write, join cost high, acceptable inconsistency window with CDC refresh.

Connection pool exhaustion symptom?

Timeouts under load while CPU low — increase pool cautiously or use PgBouncer.

Additional DB Practice

Review this section with Part 27 walkthroughs and apply database calculations to each classic problem.

Exercise | Goal
Recalculate QPS | Under 2 min without notes
Identify bottleneck | Label on diagram
Propose mitigation | With trade-off sentence

Part 8 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. SQL vs NoSQL table
  2. B-tree index
  3. Composite index rule
  4. Normalize vs denorm
  5. Connection pooling
  6. Read replica routing
  7. Shard when needed
  8. Migration strategy
  9. Covering index
  10. Avoid SELECT *

Self-test prompt

Explain Part 8 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 8 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 9: Replication

Why Replicate Data

Replication copies data across multiple nodes for read scalability, lower latency (geo-local reads), and fault tolerance. In interviews, always pair replication with a consistency story: synchronous replication favors durability; asynchronous favors write latency.

Leader-Follower (Primary-Replica)

One leader accepts all writes; followers tail the write-ahead log (WAL) or binlog. PostgreSQL streaming replication, MySQL binlog replication, and MongoDB replica sets follow this pattern. Reads can hit followers to scale SELECT traffic.

Aspect | Detail
Write path | Client → leader only
Read path | Leader or any follower (may be stale)
Failover | Promote follower via Patroni, Orchestrator, RDS Multi-AZ
Risk | Replication lag → stale reads; split-brain if fencing fails

Synchronous vs Asynchronous Replication

Mode | Behavior | Trade-off
Synchronous | Leader waits for follower ACK before commit | No lost committed writes if leader dies; higher write latency
Asynchronous | Leader commits locally; followers catch up later | Lower latency; possible data loss on leader crash
Semi-sync | Wait for at least one follower | Balance of durability and latency

Expose replication_lag_seconds as a metric. Route critical reads (balance, inventory) to leader or use linearizable reads; route timelines to followers with "may be stale" UX.

Multi-Leader (Multi-Primary)

Multiple nodes accept writes — useful for multi-datacenter active-active. Conflicts are inevitable when two leaders update the same row. Resolution strategies:

  • Last-write-wins (LWW): Timestamp-based; simple but can drop updates
  • Vector clocks / version vectors: Track causality; surface conflicts to application
  • CRDTs: Data structures that merge without conflicts (counters, sets) — good for collaborative editing

Interview probe: "Two users like the same post from different regions simultaneously — how do you merge counts?" Answer with idempotent increments or CRDT counters.
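
A grow-only counter (G-Counter) is the simplest CRDT that answers this probe. The sketch below is a minimal in-memory version; the class name GCounter and the replica IDs are illustrative, and unlikes would need a second counter (a PN-Counter), which is not shown.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}   # replica_id -> increments originated at that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Max per slot makes merge commutative, associative, and idempotent.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        return sum(self.counts.values())
```

Each region only ever writes its own slot, so concurrent likes never conflict, and replaying the same merge twice does not double count.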

Leaderless Replication (Quorum)

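Dynamo-style systems (Cassandra, Riak, Amazon's original Dynamo): no single leader. Replication factor N; write quorum W; read quorum R. If W + R > N, every read quorum overlaps every write quorum, so a read contacts at least one replica holding the latest acknowledged write. This is often described as strong consistency for that configuration, though sloppy quorums and concurrent writes can still weaken it.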

N=3, W=2, R=2  → tolerate 1 node failure, strong reads
N=3, W=1, R=1  → fast but weak; eventual consistency
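
The two configurations above can be checked with a small helper. The function name quorum_profile and its return shape are assumptions for illustration.

```python
def quorum_profile(n, w, r):
    """Summarize a quorum configuration for replication factor n."""
    return {
        "overlap": w + r > n,                 # every read set intersects every write set
        "write_failures_tolerated": n - w,    # replicas that can be down while writes succeed
        "read_failures_tolerated": n - r,     # replicas that can be down while reads succeed
    }


strong = quorum_profile(3, 2, 2)
fast = quorum_profile(3, 1, 1)
```

With N=3, W=2, R=2 the sets overlap and one node can fail; with W=1, R=1 two nodes can fail but a read may miss the latest write.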

Hinted handoff: Temporarily store writes for down nodes. Read repair: On read, detect stale replicas and update them. Anti-entropy: Background Merkle-tree comparison fixes drift.

Change Data Capture (CDC)

Stream WAL/binlog to Kafka (Debezium) → search index, warehouse, cache invalidation. Avoids dual-write bugs where app writes DB and search separately and they diverge.

[Leader DB] → WAL → CDC connector → Kafka → [Consumers]
                                              ├→ Elasticsearch
                                              ├→ Data warehouse
                                              └→ Cache invalidation

Replication Topology Diagram

Leader-Follower:
  Writes → [Leader] ──repl──→ [Follower1]
                    └──repl──→ [Follower2]
  Reads  → any node (stale OK?)

Multi-Leader:
  [DC-East Leader] ←──conflict──→ [DC-West Leader]

Leaderless (N=3):
  Client writes to any 2 of 3 nodes (W=2)

Interview Checklist

  • State sync vs async and what happens when leader dies mid-write
  • How followers are chosen for failover (lag, priority)
  • Whether reads need strong consistency or eventual is acceptable
  • How cross-region replication affects CAP trade-offs

Script: "I use leader-follower with async replication for the feed service — followers serve 95% of reads. Payment ledger reads go to the leader or a sync replica because we cannot tolerate lost commits."

Split-Brain Prevention

Fencing: isolate the old leader via STONITH or a lease in etcd before promoting a replica. Keep the lease TTL shorter than the failover detection time so the old leader's lease has expired before a new leader is promoted.

Read Replica Routing

PgBouncer + ORM: @replica hint for analytics. Causal consistency: read from replica that has applied at least transaction T.

Lag Monitoring

Alert | Action
lag > 30s | Page DBA
lag > 5min | Block automatic promotion of the lagging replica

Worked Example: Replication

Leader failover in 30s: promote replica, update DNS/VIP, invalidate connection pools. Clients retry with backoff. Apps must handle brief write errors during failover.

Extended Notes

Connect replication to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Replication

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Sync replication when?

Financial ledger, leader election metadata — when lost write unacceptable.

Read replica lag handling?

Route critical reads to primary; show 'syncing' UX for non-critical stale reads.

What is split-brain?

Two nodes both think they are leader — use fencing and quorum.

Extended Reference — Replication

Write path latency

Synchronous replication adds RTT to nearest replica per commit — measure p99 write impact before enabling on hot path.

Semi-synchronous 'at least one replica' is popular compromise in MySQL production clusters.

Failover testing

Game day: kill primary during load test; measure detection time, promotion time, client error rate.

Applications must reconnect: pooled connections keep pointing at the old primary's IP until the pool is refreshed.

Global readers

Geo-routed read replicas serve local users; replication lag means EU user may not see US write for seconds.

Causal tracking: Google Spanner TrueTime; application-level: version tokens in API responses.

Binlog consumption

Multiple consumers read same binlog stream for search, warehouse, cache — coordinate retention size.

Binlog growth can fill the disk; monitor it and archive old segments to S3.

Interview diagram

Draw primary + 2 replicas; label sync vs async arrows; mark read traffic to replicas with 'stale OK' note.

Part 9 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Leader-follower diagram
  2. Sync vs async
  3. Replication lag metric
  4. Failover fencing
  5. Multi-leader conflicts
  6. Quorum W+R>N
  7. Read repair
  8. CDC pipeline
  9. Split-brain prevent
  10. Never hide lag

Self-test prompt

Explain Part 9 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 9 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 10: Partitioning & Sharding

Partitioning vs Sharding

Partitioning splits data within one database (Postgres table partitions by date). Sharding distributes partitions across independent database servers. Interviews often use the terms interchangeably for horizontal scale-out.

Partitioning Strategies

Strategy | Key | Pros | Cons
Range | user_id 1–1M on shard A | Range queries efficient | Hot spots on latest range
Hash | hash(user_id) mod N | Even distribution | Range scans across shards expensive
Geo | country/region | Data locality, compliance | Uneven country sizes
Directory | lookup table shard_id | Flexible rebalancing | Lookup service is SPOF unless replicated

Choosing a Shard Key

The shard key determines query locality forever. Good keys: high cardinality, even distribution, align with dominant query pattern.

  • Good: user_id for user-scoped data — all user queries hit one shard
  • Bad: country if US is 40% of traffic — hot shard
  • Bad: created_at alone — all writes hit "today" shard

Composite keys such as (tenant_id, user_id) help SaaS multi-tenancy isolate noisy neighbors.

Hot Keys & Hot Shards

Celebrity problem: one logical key (Beyoncé's tweet ID) receives disproportionate traffic. Mitigations:

  1. Split key: logical key → 100 random suffix keys; read aggregates
  2. Local cache: in-process cache on each API server for hot entities
  3. Separate service: dedicated read path for global counters (Redis INCR sharded)
  4. CDN / edge: for read-heavy public content
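
The split-key mitigation can be sketched as a counter whose writes are spread over several physical keys. This is an illustration only: the class name SplitCounter, the "#i" suffix scheme, and the dict standing in for a key-value store are all assumptions.

```python
import random


class SplitCounter:
    """Spread increments for one logical key across `fanout` physical keys."""

    def __init__(self, store, fanout=100, rng=None):
        self.store = store            # hypothetical key-value store (a dict here)
        self.fanout = fanout
        self.rng = rng or random.Random()

    def _subkey(self, key, i):
        return f"{key}#{i}"

    def increment(self, key, amount=1):
        # Each write picks a random sub-key, so no single shard takes all the traffic.
        sub = self._subkey(key, self.rng.randrange(self.fanout))
        self.store[sub] = self.store.get(sub, 0) + amount

    def read(self, key):
        # Reads pay for the split: they must fetch and sum every sub-key.
        return sum(self.store.get(self._subkey(key, i), 0) for i in range(self.fanout))
```

The trade-off is explicit: writes scale with the fanout, while each read becomes a fan-in over all sub-keys, so the aggregate is usually cached or computed periodically.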

Cross-Shard Operations

Joins across shards require scatter-gather (query all shards, merge) — expensive. Design schemas so hot queries are single-shard. Global secondary indexes (DynamoDB GSI) replicate data under alternate keys at write cost.

Resharding

When the current shard count is insufficient (for example, growing from 256 to 512 shards), you need to reshard. Strategies:

  • Fixed partitions: 4096 logical partitions mapped to shards; move partitions between shards without changing app hash
  • Dual-write: write to old and new shard during migration
  • Backfill: copy data with CDC; cutover when caught up
  • Consistent hashing: on average only about K/N keys move when a node is added (K keys, N nodes; see Part 17)

[Router] hash(user_id) → shard map → [Shard 0] [Shard 1] ... [Shard N]
         hot key? → local cache / key splitting
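
The fixed-partitions strategy can be sketched as a stable hash into logical partitions plus a mutable partition-to-shard map. The function names, the SHA-256 choice, and the round-robin initial assignment are assumptions for illustration; in practice the map would live in a replicated directory service.

```python
import hashlib

NUM_PARTITIONS = 4096   # fixed for the lifetime of the dataset


def partition_for(user_id):
    """Stable hash of the key to a logical partition (never changes on reshard)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def build_map(shards):
    """Assign partitions to shards round-robin as a starting layout."""
    return {p: shards[p % len(shards)] for p in range(NUM_PARTITIONS)}


def shard_for(user_id, partition_map):
    return partition_map[partition_for(user_id)]
```

Resharding then means copying one partition's data and updating a single map entry, for example partition_map[p] = "shard-2"; the application's hash function never changes. Python's built-in hash() is avoided here because it is randomized per process for strings.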

Elasticsearch / Cassandra Sharding Notes

Elasticsearch: index split into shards + replicas; routing by document ID. Cassandra: partition key required in every query; clustering columns for sort within partition.

Interview Pitfalls

  • Sharding too early — single Postgres with read replicas handles surprising scale
  • Shard key that does not match access pattern
  • No plan for resharding or tenant growth

Uber Ringpop / Scuttlebutt

Service discovery + shard ownership — gossip protocol distributes shard map to nodes.

Vitess (YouTube)

MySQL sharding middleware: VTGate routes SQL by sharding key; resharding with minimal app change.

Interview: Design Sharded DB

Start with hash(user_id) mod 256 logical shards mapped to 32 physical MySQL instances. Router layer in app or sidecar.

Worked Example: Sharding

Reshard user_id 0-1M from shard A to new shard B: dual-write phase, backfill historical rows, verify counts, switch reads, stop writes to A.

Extended Notes

Connect sharding to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Sharding

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Hot shard mitigation?

Split key, local cache, async aggregation, dedicated hardware for hot tenant.

How to choose shard count?

Size shards with about 2× headroom over the expected data per shard; plan consistent hash virtual nodes for growth.

Cross-shard query?

Scatter-gather parallel queries + merge — expensive; redesign access pattern if frequent.

Extended Reference — Partitioning & Sharding

Shard map service

Directory service stores range → shard mapping; update map during migration without client redeploy if using discovery API.

Co-location

Place related entities on same shard: user_id shard carries user profile, settings, private posts — avoids cross-shard transactions.

Secondary indexes

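Local (per-shard) secondary indexes force a scatter-gather across all shards for queries that do not include the shard key, which is costly. A global index (for example a DynamoDB GSI) is partitioned by the indexed attribute, so reads hit one partition, at the price of extra write cost and eventually consistent index updates.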

Rebalancing

Consistent hash minimizes movement; still schedule low-traffic window; throttle migration bandwidth.

Monitoring

Per-shard QPS, storage, replication lag heatmap — detect hot shard before outage.

Part 10 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Shard key choice
  2. Hash vs range
  3. Hot key mitigations
  4. Cross-shard cost
  5. Resharding plan
  6. Directory lookup
  7. Co-locate related data
  8. Scatter-gather aware
  9. Vitess mention OK
  10. Monitor per shard

Self-test prompt

Explain Part 10 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 10 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 11: CAP, PACELC & Consistency Models

CAP Theorem

In a network partition (P), a distributed system must choose between consistency (C) and availability (A). You cannot have all three in the strict sense during a partition.

  • CP: Refuse writes/reads until consensus (ZooKeeper, etcd, HBase) — correct but may be unavailable
  • AP: Accept requests; replicas may diverge (Cassandra tunable, DynamoDB eventual)

Most production systems are not purely one letter — they offer tunable consistency per operation.

PACELC Extension

If Partition (P): choose Availability or Consistency (AC). Else (normal operation): choose Latency or Consistency (LC). Under no partition, you still trade off sync replication latency vs strong consistency.

Consistency Models (Weakest to Strongest)

Model | Guarantee | Example
Eventual | Replicas converge if no new writes | DNS, Cassandra default
Read-your-writes | User sees own updates | Session stickiness or user-scoped routing
Monotonic reads | No going backward in time | Route user to same replica
Consistent prefix | Causal order preserved | Kafka partition ordering
Linearizable | Appears instantaneous global order | etcd, Spanner TrueTime
Serializable | Transactions as if serial order | Postgres SERIALIZABLE

Linearizability vs Serializability

Linearizability: single-object, real-time order — register read sees latest write. Serializability: multi-object transaction isolation — no interleaving anomalies. Spanner provides external consistency via TrueTime bounded clock uncertainty.

Practical Interview Mapping

Product feature | Typical choice
Social feed | Eventual + read-your-writes
Like counter | Eventual or CRDT; approximate OK
Inventory / seat booking | Strong consistency, transactions
Chat messages | Per-channel ordering (Kafka partition)
Config flags | Eventual with short TTL

Quorum Recap

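W + R > N makes every read quorum overlap the latest acknowledged write; the cost is higher latency on every read and write, because more replicas must respond. Mention that consistency is tunable per query in Cassandra (ONE vs QUORUM vs ALL).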

Clocks & Ordering

Lamport clocks, vector clocks, and hybrid logical clocks (HLC) order events without perfect sync clocks. Never assume NTP is perfect — design for clock skew in distributed IDs (Snowflake uses time + machine id).
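
A Lamport clock is the smallest of these mechanisms and fits in a few lines. The class and method names below are illustrative.

```python
class LamportClock:
    """Logical clock: counts events, never reads wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event or a message send; returns the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Merge the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

If node A sends a message stamped 3 to a node B whose clock is still 0, B's clock jumps to 4, so the receive is ordered after the send regardless of wall-clock skew. Lamport timestamps give an order consistent with causality but cannot tell whether two events were concurrent; vector clocks add that.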

Partition happens:
  CP system: some nodes reject traffic → lower availability
  AP system: nodes diverge → need merge / conflict resolution later

Script: "Feeds are AP — we accept eventual consistency with 30s staleness on followers. Seat reservation is CP on the shard leader with row-level locking."

Dynamo Paper Takeaways

Consistent hashing + quorum + sloppy quorum + hinted handoff — foundation for AP systems.

Google Spanner

TrueTime API bounds clock uncertainty → external consistency globally. Not magic — GPS/atomic clocks in datacenters.

Session Guarantees in Practice

Sticky sessions + read-your-writes: route same user to primary or replica with session token tracking applied LSN.

Worked Example: CAP

Bank transfer during partition: CP choice — reject transfer if cannot reach quorum. Social like during partition: AP — accept like, merge count later.

Extended Notes

Connect CAP to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here, so prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — CAP & Consistency

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Is CAP a theorem to cite blindly?

Explain partition behavior practically — tunable quorums, not binary CAP labels.

Linearizable example?

Distributed lock, leader election — user expects immediate global visibility.

Eventual consistency user impact?

Delayed notification count, duplicate like possible — product must accept or merge.

Extended Reference — CAP & Consistency

PACELC in interview

Normal operation: choose between latency and consistency — sync replication is LC trade-off.

Client-side choices

DynamoDB ConsistentRead=true on GetItem; Cassandra QUORUM vs ONE per query.

Session tokens

Return version with write; client passes version on read — server routes to replica ≥ version.
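
The routing step can be sketched as a function that picks a replica at or beyond the client's version token and falls back to the primary. The function name choose_replica, the replica names, and the use of plain integers for versions (standing in for something like an LSN) are assumptions for illustration.

```python
def choose_replica(replicas, min_version, primary="primary"):
    """Pick a replica whose applied version is >= the client's token, else the primary.

    `replicas` maps a replica name to the last version it has applied.
    """
    fresh = [name for name, applied in replicas.items() if applied >= min_version]
    # Sorting only makes the choice deterministic; real routers would load-balance.
    return sorted(fresh)[0] if fresh else primary


replicas = {"replica-a": 100, "replica-b": 120}
target = choose_replica(replicas, 110)   # only replica-b has applied version 110
```

A client that just wrote version 130 is sent to the primary until some replica catches up, which gives read-your-writes without pinning all reads to the primary.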

Split brain during partition

AP system may accept conflicting writes — product must define merge UX.

Avoid CAP buzzword only

Explain concrete failure: 'If link between DCs drops, we pause writes to enforce CP for wallet.'

Part 11 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. CAP during partition
  2. PACELC else branch
  3. Consistency models list
  4. Linearizable example
  5. Eventual product OK
  6. Read-your-writes how
  7. Tunable quorum
  8. Clock skew aware
  9. Not buzzword only
  10. Map feature to model

Self-test prompt

Explain Part 11 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 11 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 12: Distributed Transactions

Why Distributed Transactions Are Hard

A transaction spanning multiple databases or services cannot use a single node's lock manager. Network failures leave systems in partial states. Interviews favor pragmatic patterns over pure 2PC unless banking-level ACID is required.

Two-Phase Commit (2PC)

Coordinator runs prepare (vote) then commit. All participants must vote yes before the coordinator commits.

  1. Prepare: Participants lock resources, vote yes/no
  2. Commit: If all yes, coordinator sends commit; else abort

Problems: blocking if coordinator dies after prepare; latency; not suited across unreliable WAN. Used inside distributed databases (Spanner, distributed Postgres experiments) more than microservices.
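
The happy path and the abort path of 2PC can be sketched in memory. This is a deliberately simplified illustration: the Participant class and two_phase_commit function are hypothetical, and the sketch omits the durable logging, timeouts, and coordinator recovery that cause the blocking problem described above.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real participant would lock rows and durably log its vote here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant voted yes and was told to commit."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                        # any "no" vote aborts everyone
        p.abort()
    return False
```

The blocking case arises between the two phases: a participant in the "prepared" state holds its locks and cannot decide on its own if the coordinator disappears.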

Saga Pattern

Sequence of local transactions with compensating actions on failure. Choreography (events) vs orchestration (central coordinator).

Step | Action | Compensate
1 | Reserve inventory | Release inventory
2 | Charge payment | Refund payment
3 | Ship order | Cancel shipment

Compensations must be idempotent — retries are inevitable. Sagas are eventually consistent; not a substitute for single-node ACID when you need atomic debit+credit.
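A minimal orchestration sketch of the table above, assuming in-process callables stand in for service calls. `run_saga` and the step functions are hypothetical names; a production orchestrator would persist state between steps.

```python
def run_saga(steps):
    """steps: list of (name, action, compensate). Returns (ok, log)."""
    log, completed = [], []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"{name}: ok")
            completed.append((name, compensate))
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Undo already-completed steps in reverse order.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
    return True, log

inventory = {"reserved": 0}

def reserve():
    inventory["reserved"] += 1

def release():
    inventory["reserved"] -= 1

def charge():
    raise RuntimeError("card declined")

def refund():
    pass

ok, log = run_saga([("reserve", reserve, release), ("charge", charge, refund)])
print(ok, log)
```

Only completed steps are compensated; the failed step itself is assumed to have left no side effect.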

Transactional Outbox

Write business row + outbox event in same local DB transaction. Relay process publishes to Kafka. Consumers achieve at-least-once; idempotent handlers required.

BEGIN;
  INSERT INTO orders ...;
  INSERT INTO outbox (topic, payload) ...;
COMMIT;
-- separate relay: read outbox → publish → mark sent
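The same flow as a runnable sketch. Assumptions: SQLite stands in for the service's database, the `publish` callback stands in for a Kafka producer, and the `sent` column and function names are illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT, payload TEXT, sent INTEGER DEFAULT 0
    );
""")

def place_order(order_id, total_cents):
    # One local transaction: both rows commit or neither does.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id})),
        )

def relay_once(publish):
    # Publish unsent rows in insertion order, then mark each one sent.
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # a crash after this line means a re-publish later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)

published = []
place_order(1, 2500)
relay_once(lambda topic, payload: published.append((topic, payload)))
print(published)
```

Because the row is marked sent only after publishing, a relay crash produces duplicates rather than losses, which is why consumers must be idempotent.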

Idempotency

Duplicate requests must not double-charge or double-ship. Store idempotency_key with unique constraint; return cached response on replay.

  • Client generates UUID per user action
  • Server stores (key → response) with TTL 24h
  • Payment APIs (Stripe) mandate idempotency keys
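A minimal in-memory sketch of replay handling. `PaymentService` is a hypothetical class; a real service would back `responses` with a table that has a unique constraint and a TTL, and would handle concurrent requests with the same key.

```python
import uuid

class PaymentService:
    def __init__(self):
        self.responses = {}  # idempotency_key -> cached response
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key, user_id, amount_cents):
        if idempotency_key in self.responses:
            # Replay: return the stored response without charging again.
            return self.responses[idempotency_key]
        self.charges.append((user_id, amount_cents))
        response = {"status": "charged", "charge_no": len(self.charges)}
        self.responses[idempotency_key] = response
        return response

service = PaymentService()
key = str(uuid.uuid4())            # client generates one key per user action
first = service.charge(key, "u1", 500)
retry = service.charge(key, "u1", 500)  # e.g. a retry after a network timeout
print(first == retry, len(service.charges))
```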

TCC (Try-Confirm-Cancel)

Reserve resources in try phase, confirm or cancel. Like saga with explicit resource holds — used in some Chinese payment ecosystems.

When to Use What

Pattern | Use when
Local ACID only | Single service owns all data
Outbox + events | Notify other services reliably
Saga | Multi-service workflow with compensations
2PC | Rare; internal to specialized DB

Interview Example: Order Service

"Order service writes order + outbox in Postgres. Payment service consumes PaymentRequested event, calls Stripe with idempotency key. On failure, publishes PaymentFailed; order service runs compensating cancel saga step."

Outbox vs Dual Write

Approach | Risk
Dual write DB+Kafka | One succeeds, one fails: inconsistent
Outbox | Single transaction; relay may lag

Idempotency Table Schema

CREATE TABLE idempotency_keys (
  key VARCHAR(64) PRIMARY KEY,
  response_body JSONB,
  created_at TIMESTAMPTZ
);

Poison Message Handling

After N failed attempts at a saga step, move the saga to a manual review queue; do not retry charging the user indefinitely.

Worked Example: Transactions

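Order saga: reserve→pay→ship. If payment fails after reserve, compensate by releasing the reserved inventory; if shipping fails after pay, refund and then release. Each step's outcome is stored in a state machine row keyed by saga_id.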

Extended Notes

Connect transactions to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Distributed Transactions

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Why avoid 2PC in microservices?

Blocking, coordinator SPOF, latency — use saga/outbox instead.

Outbox relay failure?

Relay retries; at-least-once delivery; consumers idempotent.

Saga compensation failure?

Manual intervention queue; alert; never silent money loss.

Extended Reference — Distributed Transactions

Outbox ordering

Relay publishes in order per aggregate id — consumers depend on order for state machine.

Saga timeouts

Each step has deadline; timeout triggers compensate — avoid stuck saga occupying inventory.

Duplicate event handling

Consumer stores processed event_id; unique constraint prevents double ship.

Testing sagas

Inject failure after step 2 in integration test; verify compensate called once.

vs local transaction

Prefer single-service ACID when boundary allows — extract service only when necessary.

Part 12 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. 2PC limitations
  2. Saga compensate
  3. Outbox pattern
  4. Idempotency keys
  5. At-least-once consumers
  6. Poison saga handling
  7. TCC optional mention
  8. Prefer local TX
  9. Event ordering
  10. Test failure injection

Self-test prompt

Explain Part 12 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 12 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 13: Message Queues & Streams

Queues vs Logs

Message queue (RabbitMQ, SQS): message deleted after ack — task distribution. Log / stream (Kafka, Pulsar): messages retained; consumers track offset — replay and multiple consumer groups.

Kafka Core Concepts

Term | Meaning
Topic | Named stream of records
Partition | Ordered, immutable sequence; parallelism unit
Offset | Position in partition
Consumer group | Partitions divided among consumers in group
Replication | Leader + ISR followers per partition

Throughput scales with partition count. Key by user_id to preserve per-user ordering.

Ordering Guarantees

  • Within partition: strict order
  • Across partitions: no global order
  • Fix: partition key = entity id needing order (order_id, user_id)
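A small sketch of why keying preserves per-entity order: the same key always maps to the same partition. The MD5-based `partition_for` is an illustrative stand-in, not Kafka's actual default partitioner.

```python
import hashlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

events = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"), ("order-1", "shipped"),
]

partitions = defaultdict(list)  # partition number -> events in arrival order
for key, event in events:
    partitions[partition_for(key, 12)].append((key, event))

# All order-1 events sit in one partition, in the order they were produced.
order_1 = [e for k, e in partitions[partition_for("order-1", 12)] if k == "order-1"]
print(order_1)
```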

Delivery Semantics

Semantic | Meaning | How
At-most-once | May lose messages | Fire-and-forget, no retry
At-least-once | May duplicate | Retry until ack; idempotent consumer
Exactly-once | Hard end-to-end | Kafka transactions + idempotent producer + dedup DB

Interview default: at-least-once + idempotent handlers. Exactly-once is expensive; justify for billing.

Consumer Groups

Each partition consumed by at most one consumer in a group. Scale consumers ≤ partition count. Rebalance on consumer join/leave — causes brief pause; use cooperative sticky assignors in production.
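To see why consumers beyond the partition count sit idle, here is a simplified round-robin assignment. It is an illustration only and does not reproduce Kafka's real assignor algorithms.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

print(assign([0, 1, 2], ["a", "b"]))
print(assign([0, 1, 2], ["a", "b", "c", "d"]))  # "d" gets nothing
```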

Backpressure & Retention

Retention policy (7 days default) bounds disk. Slow consumers fall behind (lag); monitor consumer lag and alert on it. Dead-letter queue (DLQ) for poison messages after N failures.

Use Cases

  • Async jobs: email, thumbnails, search indexing
  • Event sourcing / CDC propagation
  • Metrics aggregation pipeline
  • Decouple peak write spikes from slow processors

Producer → [Topic: orders]
              ├─ partition 0 → Consumer A (group billing)
              ├─ partition 1 → Consumer B (group billing)
              └─ partition 2 → Consumer C (group billing)
Group analytics reads all three partitions independently, tracking its own offsets.

RabbitMQ vs Kafka

Aspect | RabbitMQ | Kafka
Model | Queue, routing | Distributed log
Replay | Limited | Native by offset
Throughput | High | Very high
Routing | Exchanges, bindings | Topic + key

Partition Sizing

Target 10–100 MB/s per partition; too few partitions limits parallelism; too many increases broker overhead.

Kafka vs SQS

Aspect | Kafka | SQS
Ordering | Per partition | FIFO queues only
Retention | Days+ | 14 days max
Consumers | Pull, groups | Competing consumers

Event Schema Evolution

Avro/Protobuf with schema registry; backward compatible field addition; never remove required fields without version bump.

Worked Example: Kafka

Order events keyed by order_id preserve per-order ordering. 12 partitions → max 12 parallel consumers in group.

Extended Notes

Connect Kafka to the Part 2 framework step 6 deep dive. The interviewer may spend 10 minutes here, so prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Message Queues

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Kafka partition key choice?

Entity needing ordering — order_id, user_id — not random if order matters.

At-least-once duplicate handling?

Idempotent consumer: upsert by event_id, check processed table.

When queue vs direct RPC?

Async, burst absorption, fan-out to many consumers, decouple peak load.

Extended Reference — Message Queues & Streams

Message size

Kafka default 1MB max; large payloads store S3 pointer in message body.

Compaction

Log compaction retains at least the latest value for each key within a topic partition; used for changelog topics holding config/state.

Consumer lag SLO

Alert lag > 60s for billing pipeline; > 5min for analytics acceptable.

Ordering vs parallelism

More partitions = more parallelism but no global order — business must accept per-entity order only.

Poison pill

Message fails parse — DLQ after 3 tries; manual fix schema or skip with audit.
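A sketch of the retry-then-DLQ policy, assuming an in-memory list stands in for the dead-letter topic and `consume` is a hypothetical helper. Real consumers would also add backoff between attempts.

```python
import json

def consume(messages, handler, max_attempts=3):
    processed, dlq = [], []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(msg))
                break
            except ValueError:
                if attempt == max_attempts:
                    dlq.append(msg)  # park it; keep the partition moving
    return processed, dlq

# json.loads raises JSONDecodeError (a ValueError subclass) on bad input.
processed, dlq = consume(['{"a": 1}', "not json", '{"b": 2}'], json.loads)
print(processed, dlq)
```

The point is that one unparseable message does not block the messages behind it.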

Part 13 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Queue vs log
  2. Kafka partitions
  3. Consumer groups
  4. Delivery semantics
  5. Idempotent consumer
  6. DLQ
  7. Lag monitoring
  8. Partition key
  9. Message size S3
  10. Schema evolution

Self-test prompt

Explain Part 13 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 13 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 14: Microservices vs Monolith

Monolith First

A single deployable application with shared database is faster to build and debug. Many successful products scale monoliths vertically and with read replicas before splitting. Interview tip: do not jump to 50 microservices without scale pain.

When to Split Services

  • Independent scaling (video transcoding vs API)
  • Different release cadences per team
  • Technology fit (Python ML vs Go API)
  • Fault isolation (billing outage must not take down feed)
  • Regulatory boundaries (PCI scope reduction)

Microservices Challenges

Challenge | Mitigation
Distributed debugging | Tracing (Jaeger), correlation IDs
Data consistency | Sagas, outbox, eventual consistency
Network latency | Batch APIs, avoid chatty chains
Operational overhead | K8s, Helm, service mesh maturity
Testing | Contract tests, staging environments

API Gateway

Single entry for clients: auth, rate limiting, routing, SSL termination, request aggregation (BFF pattern for mobile vs web).

[Mobile] ──┐
[Web]    ──┼→ [API Gateway] → [User Svc] [Order Svc] [Feed Svc]
[3rd party]─┘         ↓ auth, throttle, route

Service Mesh (Istio, Linkerd)

Sidecar proxy per pod handles mTLS, retries, timeouts, traffic splitting without app code changes. Cost: latency hop, complexity. Worth it at dozens+ services with strong platform team.

Communication Patterns

  • Sync REST/gRPC: simple request-response; cascading failure risk
  • Async events: loose coupling; harder to debug
  • BFF: Backend-for-frontend tailored API per client type

Data Per Service

Each service owns its database — no shared tables. Cross-service queries via API composition or materialized views fed by CDC. Violating this creates distributed monolith.

Interview Script

"I start with a modular monolith — clear package boundaries. If transcoding becomes a bottleneck, extract media-worker service behind a queue while keeping user API monolithic."

Domain-Driven Boundaries

Split by bounded context (billing, catalog, shipping), not by technical layer (for example, one service that owns all database access is the wrong cut).

Strangler Fig Migration

Proxy routes 5% traffic to new service; increment until monolith retired.

Service Mesh Cost

Roughly 1–2 ms of added latency per hop, plus the operational burden of running the mesh control plane, which grows with service count; justify before adopting.

Worked Example: Microservices

Extract notification service first — clear boundary, async, reduces monolith deploy risk without splitting core transaction path.

Extended Notes

Connect microservices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Microservices

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Monolith to microservices first split?

Highest churn isolated component with clear API — not arbitrary layer split.

API gateway vs service mesh?

Gateway: edge auth/routing. Mesh: service-to-service mTLS, retries, traffic split.

Distributed monolith antipattern?

Microservices sharing database tables — no bounded context isolation.

Extended Reference — Microservices

Team topology

Conway's law: service boundaries match team boundaries — align org before splitting code.

Contract testing

Pact tests verify provider/consumer API contracts in CI — prevent breaking downstream.

Shared libraries

Thin shared libs only — fat shared library recreates monolith coupling.

Observability tax

Each service needs metrics, logs, traces — platform team provides templates.

Decomposition trigger

Extract when independent scale, deploy, or failure domain justified — not preemptively.

Part 14 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Monolith first OK
  2. Split boundaries
  3. API gateway role
  4. Service mesh cost
  5. BFF pattern
  6. Data per service
  7. Saga across services
  8. Contract tests
  9. Strangler migration
  10. Avoid distributed monolith

Self-test prompt

Explain Part 14 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 14 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 15: REST, GraphQL, gRPC & WebSockets

REST

Resource-oriented HTTP: nouns as URLs, verbs as methods. Stateless; cacheable GETs. Standard for public APIs and browser clients.

Method | Idempotent | Safe | Use
GET | Yes | Yes | Read
POST | No | No | Create
PUT | Yes | No | Replace
PATCH | No | No | Partial update
DELETE | Yes | No | Remove

Pagination: cursor-based (?cursor=abc) scales better than offset for large tables. Version in path (/v1/) or header.

GraphQL

Client specifies exact fields needed — one round trip for nested data. Server defines schema (types, queries, mutations).

  • Pros: flexible clients, reduced over-fetching
  • Cons: complex caching (no HTTP cache per URL easily), N+1 query risk (DataLoader batching), expensive arbitrary queries — need depth/complexity limits

gRPC

HTTP/2 + Protocol Buffers: binary, fast, strongly typed. Four call types: unary, server streaming, client streaming, and bidirectional streaming. Best service-to-service; browsers need a grpc-web gateway.

Aspect | REST/JSON | gRPC
Contract | OpenAPI optional | .proto required
Performance | Good | Better (binary)
Browser | Native | Needs proxy
Streaming | SSE, WS | Native

WebSockets

Persistent, full-duplex channel over a single TCP connection, established through an HTTP Upgrade handshake. Used for chat, live games, collaborative docs. Stateful connections complicate load balancing (sticky sessions or pub/sub backplane). Heartbeats detect dead connections.

Server-Sent Events (SSE)

One-way server → client over HTTP. Simpler than WebSockets for live feeds, notifications. Auto-reconnect built-in.

Choosing in Interviews

Scenario | Choice
Public mobile API | REST or GraphQL
Internal microservices | gRPC
Live chat | WebSockets + Redis pub/sub
Stock ticker | SSE or WebSocket

External: REST/GraphQL → API Gateway
Internal: gRPC between services
Realtime: WebSocket tier → Redis channel → all WS nodes

REST Pagination Patterns

Cursor: ?after=tweet_id stable under concurrent inserts. Offset bad for deep pages (OFFSET 1000000 slow).
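A keyset (cursor) pagination sketch, assuming SQLite and a hypothetical `tweets` table with a monotonically increasing `id`. The cursor is simply the last id the client saw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)", [(i, f"tweet {i}") for i in range(1, 11)]
)

def page_after(cursor_id, limit):
    # Seeks by primary key instead of scanning and discarding OFFSET rows.
    rows = conn.execute(
        "SELECT id, body FROM tweets WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, limit),
    ).fetchall()
    # A short page means we reached the end; a full page may have more.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor

rows, cursor = page_after(0, 4)
print([r[0] for r in rows], cursor)
```

Rows inserted while a client is paging do not shift earlier pages, because each query resumes from an id rather than a position.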

GraphQL N+1

Resolvers per field cause N DB queries — DataLoader batches loads per request tick.
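The batching idea can be sketched without any GraphQL library. `BatchLoader` below is a hypothetical, synchronous stand-in for the DataLoader pattern, not the real DataLoader API: resolvers register keys, then one batched fetch serves them all.

```python
class BatchLoader:
    """Collects keys during one resolution pass, then fetches them in one batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self.pending = []
        self.cache = {}

    def load(self, key):
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)

    def dispatch(self):
        if self.pending:
            self.cache.update(self.batch_fn(self.pending))
            self.pending = []

    def get(self, key):
        return self.cache[key]

queries = []  # records each simulated DB round trip

def fetch_authors(ids):
    queries.append(list(ids))
    return {i: f"author-{i}" for i in ids}

posts = [{"id": 1, "author_id": 7}, {"id": 2, "author_id": 8}, {"id": 3, "author_id": 7}]
loader = BatchLoader(fetch_authors)
for post in posts:
    loader.load(post["author_id"])  # what each per-field resolver would do
loader.dispatch()                   # one query instead of one per post
names = [loader.get(p["author_id"]) for p in posts]
print(queries, names)
```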

gRPC Streaming Use Cases

Server stream: log tail. Client stream: bulk upload. Bidi: collaborative editing.

Worked Example: APIs

Mobile uses GraphQL for home screen single request; backend services still gRPC internally.

Extended Notes

Connect APIs to the Part 2 framework step 6 deep dive. The interviewer may spend 10 minutes here, so prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — API Styles

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

REST versioning?

URL /v1/ or Accept header — be consistent; deprecate with sunset headers.

GraphQL complexity attack?

Limit query depth, cost analysis, timeouts, persisted queries allowlist.

gRPC vs REST for public API?

REST/JSON for third parties; gRPC internal — developer experience and browser support.

Extended Reference — REST, GraphQL, gRPC & WebSockets

API versioning

Deprecation timeline communicated via Sunset header; maintain v1 for 12 months.

Idempotent HTTP

PUT/DELETE idempotent by definition; POST needs Idempotency-Key for payments.

GraphQL complexity

Calculate cost: depth × breadth; reject expensive queries at gateway.

gRPC deadlines

context.WithDeadline propagates timeout across call chain.

WebSocket auth

Validate JWT on connect message; re-auth on long-lived connections periodically.

Part 15 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. REST verbs idempotent
  2. Cursor pagination
  3. GraphQL N+1 fix
  4. gRPC internal
  5. WebSocket LB
  6. SSE one-way
  7. Versioning strategy
  8. Proto breaking change
  9. Timeout deadlines gRPC
  10. Pick API per client

Self-test prompt

Explain Part 15 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 15 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 16: Rate Limiting

Why Rate Limit

Protect origin from abuse, ensure fair usage, enforce SLA tiers, and prevent cascade failure. Apply at edge (CDN), API gateway, and service level.

Token Bucket

Bucket holds tokens refilled at rate R (e.g., 100/sec). Each request consumes one token; requests that arrive when the bucket is empty are rejected or queued.

  • Allows bursts up to bucket capacity B
  • Smooth average rate over time
  • Used by many APIs (Stripe, AWS)

tokens = min(capacity, tokens + (now - last) * rate)
last = now  # record refill time so elapsed time is not counted twice
if tokens >= 1: tokens -= 1; allow
else: reject 429
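A runnable version of the refill logic, as a single-process sketch. The caller passes `now` explicitly (an assumption made so the behavior is deterministic); a real limiter would read a monotonic clock and guard state with a lock or keep it in Redis.

```python
class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = now

    def allow(self, now):
        elapsed = max(0.0, now - self.last)
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with 429

bucket = TokenBucket(rate=1, capacity=3)
burst = [bucket.allow(0.0) for _ in range(4)]  # burst of 4 at t=0
print(burst)
```

The first three requests drain the bucket and the fourth is rejected; one second later a single token has been refilled.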

Leaky Bucket

Requests enter queue; processed at fixed rate. Smoother output than token bucket; less bursty allowance.

Fixed & Sliding Window

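Fixed window: count requests per fixed interval (for example, per minute). Its weakness is the boundary burst: with a limit of, say, 200/min, a client can send 199 requests at 0:59 and 199 more at 1:00, nearly twice the limit within about a second. Sliding window log: store a timestamp per request; accurate but memory heavy. Sliding window counter: weight the previous fixed window's count by its overlap with the sliding window and add the current window's count; a good balance and a common choice on Redis.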

Algorithm | Burst | Memory | Accuracy
Token bucket | Yes | Low | Good
Fixed window | Edge spike | Low | OK
Sliding log | No | High | Exact
Sliding counter | Moderate | Medium | Good
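A sliding window counter sketch for one key, kept in memory. The weighting formula (previous count × remaining overlap + current count) is a common approximation; the class and field names are illustrative, and `now` is passed in for determinism.

```python
class SlidingWindowCounter:
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window      # window length in seconds
        self.current_start = 0
        self.current = 0          # requests in the current fixed window
        self.previous = 0         # requests in the window before it

    def allow(self, now):
        start = int(now // self.window) * self.window
        if start != self.current_start:
            # Roll forward; if a whole window was skipped, nothing carries over.
            adjacent = start - self.current_start == self.window
            self.previous = self.current if adjacent else 0
            self.current = 0
            self.current_start = start
        overlap = 1 - (now - start) / self.window
        estimated = self.previous * overlap + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False

limiter = SlidingWindowCounter(limit=10, window=60)
late = sum(limiter.allow(59) for _ in range(11))  # 10 allowed at 0:59
at_boundary = limiter.allow(60)                   # fixed window would allow this
print(late, at_boundary)
```

At 1:00 the previous window still overlaps fully, so the boundary burst that a fixed window permits is rejected here.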

Distributed Rate Limiting

Redis centralizes counters; all API nodes check INCR with TTL. Race conditions: use Lua script for atomic check-and-decrement. For strict global limits across regions, Redis Cluster or dedicated rate-limit service.

Response Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716300000
Retry-After: 60

Rate Limit Dimensions

  • Per IP (anonymous)
  • Per API key / user id
  • Per endpoint (expensive vs cheap)
  • Global (protect DB)

Interview: Design API Rate Limiter

Redis sorted set sliding window per key; rules service stores limits; gateway enforces before business logic. Mention fail-open vs fail-closed on Redis outage.

Redis Implementation Sketch

ZREMRANGEBYSCORE key 0 (now - window)
ZADD key now request_id
ZCARD key -- if > limit: 429

Hierarchical Limits

Global 1M RPS → per-tenant 10K → per-user 100 — check cheapest filter first.

Fairness vs Priority

Paid tier higher limits; burst allowance for onboarding flows.

Worked Example: Rate Limit

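Free tier 100 req/min, Pro 10K. Enforce at the gateway; return 429 with Retry-After and point to the upgrade path in the response body (the HTTP Upgrade header is for protocol switching, not plan upgrades).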

Extended Notes

Connect rate limit to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Rate Limiting

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Token bucket vs sliding window?

Bucket allows controlled burst; sliding window smoother rate over window.

Distributed rate limit race?

Atomic Lua in Redis; or centralized limiter service.

Fail open or closed on limiter outage?

Fail closed for abuse protection; fail open for internal low-risk if business prefers availability.

Extended Reference — Rate Limiting

Burst vs sustained

Token bucket separates concerns — document both limits in API docs.

Per-tenant fairness

Noisy neighbor: one API key cannot consume entire global quota — hierarchical caps.

Cost of Redis limiter

One Redis round trip per request — acceptable at 100K RPS with cluster; shard keys.

Edge vs app limit

CDN/WAF blocks obvious abuse; app enforces business tier limits.

Testing

Load test verifies 429 at threshold and recovery after window reset.

Part 16 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Token bucket
  2. Sliding window
  3. Redis atomic limit
  4. 429 headers
  5. Hierarchical limits
  6. Fail open vs closed
  7. Per user and global
  8. Burst allowance
  9. Edge + app limits
  10. Load test 429

Self-test prompt

Explain Part 16 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 16 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 17: Consistent Hashing, Bloom Filters & More

Consistent Hashing

When adding or removing cache nodes, hash mod N remaps almost all keys. Consistent hashing maps keys and nodes onto a ring, so with K keys and N nodes only about K/N keys move on average when one node is added.

Virtual nodes (vnodes): each physical node has 100–200 points on ring for even distribution. Used in Dynamo, Cassandra, Memcached clients, CDNs.

        hash(key) → clockwise first node on ring
    [N1·]  [N2·]  [N3·]  [N1·] ...
         ring 0° — 360°
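A compact ring with virtual nodes, as a sketch. Assumptions: MD5 as the hash, 100 vnodes per node, and the `HashRing` name and vnode label format `node#i` are all illustrative choices.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]

ring = HashRing(["n1", "n2", "n3"])
keys = [f"key-{i}" for i in range(2000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("n4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Every moved key lands on the new node, and the moved fraction is near one quarter rather than near 100% as with modulo hashing.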

Bloom Filters

Probabilistic set: test membership with zero false negatives (if says no, definitely no) but possible false positives (says yes, might not exist). Space-efficient.

  • Web crawler: skip already-visited URLs
  • Cassandra: avoid disk read if key definitely absent
  • CDN: prevent cache pollution
  • Spell check: dictionary in compact filter

Cannot delete from standard bloom; counting bloom or rebuild. Size m bits, k hash functions — tune false positive rate p.
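A small Bloom filter sketch. It derives the k bit positions from one SHA-256 digest using double hashing, which is an implementation choice made here for brevity; the class name and the example sizes are assumptions.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

seen = BloomFilter(m_bits=9586, k_hashes=7)  # sized for ~1000 items at ~1% FP
for i in range(1000):
    seen.add(f"https://example.com/{i}")
print(seen.might_contain("https://example.com/5"))
```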

Geohashing

Encode lat/long into string prefix; nearby places share prefix — efficient proximity search in Redis/Elasticsearch. Precision = string length. Used in Uber/Lyft driver matching, Yelp nearby.
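A sketch of the standard geohash encoding: interleave longitude and latitude bisection bits (longitude first) and emit one base-32 character per 5 bits. The function name is illustrative.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, 6))
```

A longer hash of the same point always extends the shorter one, which is what makes prefix queries work for proximity search.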

Merkle Trees

Hash tree: leaf = data block hash; parent = hash(children). Compare root hashes, then descend only into subtrees whose hashes differ; locating a single changed block among n takes O(log n) comparisons.

  • Git: commit tree integrity
  • Bitcoin: block verification
  • Cassandra anti-entropy: replica sync without full compare
  • Distributed DBs: efficient replica reconciliation
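A sketch of building a Merkle tree and finding differing blocks top-down. Assumptions: SHA-256, both replicas hold the same number of blocks, and an odd node is paired with itself; the function names are illustrative.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return tree levels, leaves first and root last."""
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff_leaves(levels_a, levels_b):
    """Indices of differing leaves, visiting only mismatched subtrees."""
    differing = []

    def walk(depth, idx):
        if levels_a[depth][idx] == levels_b[depth][idx]:
            return  # identical subtree: skip it entirely
        if depth == 0:
            differing.append(idx)
            return
        for child in (2 * idx, 2 * idx + 1):
            if child < len(levels_a[depth - 1]):
                walk(depth - 1, child)

    walk(len(levels_a) - 1, 0)
    return differing

replica_a = [b"a", b"b", b"c", b"d", b"e"]
replica_b = [b"a", b"b", b"c", b"X", b"e"]
print(diff_leaves(build_levels(replica_a), build_levels(replica_b)))
```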

HyperLogLog

Approximate distinct count in fixed memory — unique visitors, cardinality analytics. Redis PFADD/PFCOUNT.

Count-Min Sketch

Frequency estimation in streaming — heavy hitter detection, hotspot keys.
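A minimal Count-Min sketch, assuming SHA-256 salted by row index as the hash family and illustrative width/depth values. Estimates can only overcount, never undercount.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

sketch = CountMinSketch()
for _ in range(100):
    sketch.add("#launch")
sketch.add("#quiet", 3)
print(sketch.estimate("#launch"), sketch.estimate("#quiet"))
```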

Structure | Answers | Error type
Bloom filter | Maybe in set? | False positives only
HyperLogLog | How many unique? | Approximate
Count-Min | How many of X? | Overestimate

Consistent Hashing Math

With m keys and n nodes, expected keys to move when add node ≈ m/n. Modulo hash moves ~100% keys.

Bloom Filter Sizing

m = -n ln(p) / (ln 2)² bits for n items and false positive rate p. k = m/n × ln 2 hash functions.
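Plugging numbers into the two formulas above, as a quick check. `bloom_size` is a hypothetical helper; rounding m up and k to the nearest integer is a choice made here.

```python
import math

def bloom_size(n_items, fp_rate):
    """Return (m bits, k hash functions) for n items at the target FP rate."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

m, k = bloom_size(1_000_000_000, 0.01)
print(f"{m / n:.2f} bits/item" if (n := 1_000_000_000) else "")
print(f"{m / 8 / 1e9:.2f} GB, k = {k}")
```

For 1% false positives this works out to roughly 9.6 bits per item regardless of n, so memory scales linearly with the number of items.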

Geohash Neighbor Search

Query 8 neighboring cells plus center — handle edge cases at equator/prime meridian.

Worked Example: Algorithms

URL dedup crawler: bloom filter 10B URLs, 1% FP → 100M false positives still saves disk — verify with disk set on positive.

Extended Notes

Connect algorithms to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Specialized Algorithms

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Consistent hashing use case?

Cache cluster, DynamoDB partitions, CDN origin selection — minimal remapping on node add.

Bloom filter false positive impact?

Extra DB read occasionally — tune false positive rate vs memory budget.

Merkle tree in anti-entropy?

Compare root hashes; recurse into differing branches only — efficient replica sync.

Extended Reference — Consistent Hashing & Probabilistic Structures

Virtual nodes

100 vnodes per physical node prevents uneven ring distribution when few servers.

Bloom in practice

Size for 1% FP and 1B items ≈ 9.6 bits per item ≈ 1.2 GB (about 1.12 GiB), still far cheaper than an exact set in RAM.

Geohash precision

6 chars ≈ 1.2km; 7 chars ≈ 150m — pick for urban driver matching.

Merkle sync

Compare subtree hashes top-down — bandwidth proportional to differences not total data.

Sketch algorithms

Use Count-Min for trending hashtags; HyperLogLog for UV — not exact counts.

Part 17 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Consistent hash ring
  2. Virtual nodes
  3. Bloom filter FP
  4. Geohash neighbors
  5. Merkle sync
  6. HyperLogLog UV
  7. Count-Min sketch
  8. Use case per structure
  9. Modulo hash bad
  10. Size bloom formula

Self-test prompt

Explain Part 17 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 17 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 18: Observability — Logs, Metrics, Traces

Three Pillars

Logs: discrete events (errors, audit). Metrics: numeric time series (CPU, QPS). Traces: request path across services. Together they answer: what broke, how bad, where in the chain.

Structured Logging

JSON logs with trace_id, user_id, service, level. Centralize in ELK (Elasticsearch, Logstash, Kibana), Loki, or CloudWatch. Avoid logging PII/passwords. Sample debug logs at high QPS.

Metrics (RED & USE)

Method | Scope | Metrics
RED | Services | Rate, Errors, Duration
USE | Resources | Utilization, Saturation, Errors

Prometheus pull model + Grafana dashboards. Histograms for p50/p99 latency — averages lie.

Distributed Tracing

OpenTelemetry → Jaeger/Tempo. Propagate trace context (W3C traceparent) across HTTP/gRPC/Kafka. One slow span in 20-service chain visible immediately.

Request trace_id=abc
  API 45ms → Auth 12ms → DB 180ms ← bottleneck
              → Cache 2ms
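A sketch of W3C `traceparent` propagation: the header is `version-traceid-parentid-flags`, and each hop keeps the trace id while minting a new span id. The helper names are hypothetical; in practice an OpenTelemetry SDK handles this.

```python
import secrets

def new_traceparent(sampled=True):
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared by the whole request
    span_id = secrets.token_hex(8)    # 16 hex chars, unique per hop
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def child_traceparent(parent_header):
    version, trace_id, _parent_span, flags = parent_header.split("-")
    # Same trace id, fresh span id: this is what links spans across services.
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

incoming = new_traceparent()
outgoing = child_traceparent(incoming)  # header to attach to the downstream call
print(incoming)
print(outgoing)
```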

SLI, SLO, SLA

  • SLI: measurable indicator (availability = successful / total)
  • SLO: target (99.9% availability over 30 days)
  • SLA: contract with customer (refund if missed)

Error budget = 1 - SLO. If budget exhausted, freeze features; focus reliability. Burn rate alerts predict SLO violation early.
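The budget arithmetic as a short sketch. The function names and the 30-day window default are assumptions; burn rate is taken as the observed error ratio divided by the allowed error ratio.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of full outage the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60

def burn_rate(observed_error_ratio, slo):
    """How many times faster than sustainable the budget is being spent."""
    return observed_error_ratio / (1 - slo)

print(round(error_budget_minutes(0.999), 1))  # minutes per 30 days at 99.9%
print(round(burn_rate(0.01, 0.999), 1))       # 1% errors against a 0.1% allowance
```

A burn rate of 1 exhausts the budget exactly at the end of the window; a rate of 10 exhausts it in a tenth of the window, which is why high burn rates page immediately.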

Alerting

Alert on symptoms (high 5xx rate, p99 latency) not causes (CPU 80%) unless correlated. Page humans for user-facing SLO breach; ticket for disk 70%. Runbooks linked in alert.

Interview Mention

"I define SLO 99.95% for read API, SLI from load balancer success rate, alert when 1-hour burn rate exceeds 10× budget consumption."

Log Levels

ERROR: action needed. WARN: degraded. INFO: business events. DEBUG: dev only, sampled in prod.

Cardinality Explosion

Never label metrics with unbounded user_id — use aggregated histograms. High cardinality kills Prometheus.

On-Call Hygiene

Runbooks, escalation policy, blameless postmortems within 48 hours.

Worked Example: Observability

SLO 99.9%: the allowed error ratio is 0.1%, so a 5xx rate above 0.1% burns budget faster than 1×; alert when it persists for 5 min. Trace slow checkout to payment RPC timeout.

Extended Notes

Connect observability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Observability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

SLI vs SLO vs SLA?

Indicator vs internal target vs customer contract.

High cardinality metric example?

http_requests{user_id=x} — forbidden at scale.

Trace sampling?

100% errors, 1% success — balance cost and debuggability.

Extended Reference — Observability

Log sampling

Sample 1% debug at 1M RPS — still 10K logs/sec — tune levels.

Metric labels

service, endpoint, status_code — bounded cardinality.

Trace context propagation

Inject trace_id into logs for correlation — single pane search.

SLO dashboard

Burn rate panels for executives — error budget remaining this quarter.

On-call

Every alert actionable — if not, fix alert or delete.

Part 18 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Logs metrics traces
  2. RED metrics
  3. Trace propagation
  4. SLI SLO SLA
  5. Error budget
  6. High cardinality avoid
  7. Alert symptoms
  8. Runbooks
  9. Sampling strategy
  10. Postmortem blameless

Self-test prompt

Explain Part 18 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 18 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 19: Reliability & Disaster Recovery

Reliability Goals

System continues correctly despite failures. Measure with availability SLOs and MTTR (mean time to repair). Design for failure — everything fails eventually.

Redundancy

  • Active-active: all nodes serve traffic — no idle capacity waste; harder consistency
  • Active-passive: standby takes over on failover — simpler, wasted standby
  • N+1, N+2: spare capacity for component failure

Failover

Health checks detect unhealthy instances; the LB removes them from the pool. DNS failover handles regional outages but is slow, because clients and resolvers cache records until the TTL expires. Database automatic failover needs fencing (STONITH) to prevent dual-writer split-brain.

Disaster Recovery (DR)

Term | Meaning
RPO | Recovery Point Objective: max data loss (time since last backup/replica)
RTO | Recovery Time Objective: max downtime to restore service

Async cross-region replication increases RPO (minutes of loss possible). Sync replication lowers RPO but raises latency.

Multi-Region Strategies

  • Backup restore: cheapest; highest RTO/RPO
  • Pilot light: minimal DR region, scale up on disaster
  • Warm standby: reduced capacity always running
  • Active-active: full capacity both regions; hardest

Chaos Engineering

Proactively inject failures (Chaos Monkey, Litmus) in controlled environments. Validate retries, circuit breakers, and runbooks before real outages. Start with game days, not random prod kills.

Dependency Failure

Every sync call is a failure domain. Timeouts + circuit breakers + graceful degradation (show cached feed if ranking service down).

[Primary Region] ←──async repl──→ [DR Region]
        ↓ failover DNS / traffic manager
   RPO 5 min, RTO 30 min (example targets)

Blast Radius

Isolate by cell (subset of users), shard, or region — failure affects 1% not 100%.

Game Day Checklist

  1. Inject DB failover
  2. Kill AZ
  3. Spike traffic 3×
  4. Verify alerts fire
  5. Measure RTO actual

Backup Testing

Untested restore = no backup. Quarterly restore drill to staging.

Worked Example: Reliability

RPO 1 hour: async binlog replicate. RTO 15 min: automated failover + runbook for DNS flip.

Extended Notes

Connect reliability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Reliability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

RPO vs RTO example?

RPO 5 min = lose 5 min data max. RTO 30 min = down 30 min max.

Chaos engineering prerequisite?

Observability, on-call, steady state hypothesis — otherwise chaos is reckless.

Active-active database challenge?

Write conflicts across regions — need CRDT or conflict resolution.

Extended Reference — Reliability & DR

Dependency map

Maintain tier-0 dependency graph — if Redis down, which features degrade?

Graceful degradation

Feature flags disable recommendations; core feed still serves from cache.

DR drill

Quarterly failover to secondary region with production-like traffic shadow.

Data backup

PITR 35 days; test restore to new cluster monthly.

Incident response

Severity levels, comms template, status page update cadence.

Part 19 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Redundancy N+1
  2. Active-active vs passive
  3. RPO RTO defined
  4. DR drill
  5. Chaos engineering safe
  6. Graceful degradation
  7. Blast radius
  8. Backup restore test
  9. Multi-region tradeoff
  10. Dependency map

Self-test prompt

Explain Part 19 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 19 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 20: Security Fundamentals

Authentication vs Authorization

Authn: who are you (login, JWT, session). Authz: what may you do (RBAC, ABAC, ACL). Always authenticate at gateway; authorize per resource in service.

OAuth 2.0 / OpenID Connect

Delegate auth to identity provider (Google, Okta). Authorization code flow with PKCE for SPAs. Access token (short) + refresh token (long, stored securely). OIDC adds ID token (user profile).

Session vs JWT

Aspect | Server session | JWT
Revocation | Easy (delete session) | Hard until expiry
Scale | Needs Redis session store | Stateless verification
Size | Small cookie | Large header

Encryption

  • In transit: TLS 1.2+ everywhere (HTTPS, mTLS service mesh)
  • At rest: AES-256 disk encryption (AWS KMS, envelope encryption)
  • Application-level: encrypt PII fields before DB for defense in depth

DDoS Protection

Volumetric attacks absorbed at CDN/scrubbing center. Rate limiting, WAF, geo blocking. Anycast spreads load. Never expose origin IP directly.

OWASP Top 10 (2021 Overview)

  1. Broken access control
  2. Cryptographic failures
  3. Injection (SQL, XSS)
  4. Insecure design
  5. Security misconfiguration
  6. Vulnerable components
  7. Auth failures
  8. Integrity failures
  9. Logging failures
  10. SSRF

Mitigate: parameterized queries, input validation, CSP headers, least privilege IAM, secret rotation, security scanning in CI.

Zero Trust

Never trust internal network; verify every request. mTLS between services, network policies in K8s.

Secrets Management

Vault, AWS Secrets Manager — never commit .env. Rotate keys; short-lived tokens.

SQL Injection Prevention

Use parameterized queries only; an ORM is no excuse for concatenating raw strings into SQL.
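
As a minimal sketch of the difference, the following uses Python's built-in sqlite3 module. The users table, the find_user helper, and the sample input are assumptions made up for illustration; the point is only that a placeholder keeps user input out of the SQL text.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    # The ? placeholder passes the value separately from the SQL text,
    # so quotes inside the input cannot change the query structure.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchall()


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

malicious = "alice' OR '1'='1"

# Parameterized: the whole input is compared as one literal string.
safe_rows = find_user(conn, malicious)

# Unsafe pattern, shown only for contrast: concatenation lets the input
# rewrite the WHERE clause and match every row.
unsafe_sql = "SELECT id, username FROM users WHERE username = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()
```

Here the parameterized call matches no user, while the concatenated query returns every row in the table.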

PCI Scope

Use hosted fields / tokenization — card data never touches your servers.

Worked Example: Security

OAuth scopes: read:profile vs write:post. JWT 15 min access + refresh rotation.

Extended Notes

Connect security to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Security

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

OAuth implicit flow deprecated why?

The access token is returned in the URL fragment, where browser history and page scripts can expose it; use authorization code + PKCE instead.

mTLS benefit?

Mutual authentication service-to-service — no trusted network assumption.

OWASP injection fix?

Parameterized queries, ORM, input validation, least privilege DB user.

Extended Reference — Security

Least privilege IAM

Service account per microservice; no shared admin keys in apps.

Secrets in CI

Short-lived OIDC to cloud — no long-lived AWS keys in GitHub.

Audit logging

Immutable audit trail for admin actions — who changed ACL when.

DDoS layers

Volumetric at CDN; application layer at WAF rate rules; origin protection hide IP.

Supply chain

Dependabot, signed containers, SBOM for compliance.

Part 20 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Authn vs authz
  2. OAuth PKCE
  3. JWT vs session
  4. TLS everywhere
  5. Encryption at rest
  6. OWASP top aware
  7. DDoS layers
  8. Least privilege IAM
  9. PCI scope reduce
  10. No secrets in git

Self-test prompt

Explain Part 20 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 20 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 21: Resilience Design Patterns

Circuit Breaker

Stop calling failing dependency after threshold — fail fast, give time to recover. States: closed (normal), open (reject), half-open (probe).

Libraries: Resilience4j, Hystrix (legacy). Pair with fallback (cached response, degraded mode).
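
The three states can be sketched in a few lines. This is an illustrative toy, not how Resilience4j or Hystrix implement it: the CircuitBreaker class, its consecutive-failure threshold, and the injectable clock are all assumptions chosen to keep the example small and testable.

```python
import time


class CircuitBreaker:
    """Toy breaker: opens after N consecutive failures, probes after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"      # timeout elapsed: allow a probe
        return "open"

    def call(self, fn, fallback):
        if self.state == "open":
            return fallback()       # fail fast, do not touch the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe (half-open) or too many failures (closed) opens it.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, for example breaker.call(fetch_ranking, lambda: cached_feed), where both callables are placeholders. A production breaker would also limit concurrent probes in the half-open state and use an error-rate window instead of a simple counter.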

Bulkhead

Isolate resource pools — thread pool per dependency so one slow service cannot exhaust all threads. K8s resource limits per container are bulkheads at infra level.

Retry with Backoff

Transient failures (503, timeout): retry with exponential backoff + jitter. Cap max retries. Retry only idempotent operations; a POST without an idempotency key must not be retried automatically.

delay = min(cap, base * 2^attempt + random_jitter)
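
The formula above can be turned into a small helper. This is a sketch under assumptions: the base and cap defaults, the choice of jitter drawn uniformly from [0, base], and the call_with_retries wrapper that treats only TimeoutError as transient are all illustrative.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    jitter = random.uniform(0, base)          # assumed jitter range
    return min(cap, base * 2 ** attempt + jitter)


def call_with_retries(fn, max_attempts=4, sleep=time.sleep):
    """Call fn, retrying on TimeoutError up to max_attempts total tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise                          # retries exhausted: surface the error
            sleep(backoff_delay(attempt))
```

Passing sleep as a parameter keeps the wrapper testable without real waiting. Only wrap calls that are safe to repeat.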

Timeout

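Set a timeout at every hop, and make each outbound (client) timeout shorter than the timeout of the request that triggered it, so the caller is still waiting when the result or error comes back. Cascading waits kill systems: a default 30s HTTP client timeout is dangerous at scale.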

CQRS

Command Query Responsibility Segregation — separate write model (normalized OLTP) from read model (denormalized Elasticsearch). Updates propagate via events. Scales reads independently.

Event Sourcing

Store sequence of events as source of truth; state derived by replay. Audit trail for free; complex queries need projections. Pair with snapshots for long streams.
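
A minimal sketch of replay with a snapshot, using a bank-account stream as the example. The Event type, the "deposited"/"withdrawn" event kinds, and the (last_seq, balance) snapshot shape are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int       # position in the stream
    kind: str      # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def apply(balance, event):
    """Pure state transition: current state + one event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, snapshot=(0, 0)):
    """Rebuild state; snapshot = (last_seq_included, balance_at_that_point)."""
    last_seq, balance = snapshot
    for e in events:
        if e.seq > last_seq:        # skip events the snapshot already covers
            balance = apply(balance, e)
            last_seq = e.seq
    return last_seq, balance


log = [Event(1, "deposited", 100), Event(2, "withdrawn", 30), Event(3, "deposited", 5)]
full = replay(log)                       # replay everything from the start
snap = replay(log[:2])                   # snapshot taken after event 2
from_snap = replay(log, snapshot=snap)   # only event 3 is applied
```

Replaying from the snapshot yields the same state as a full replay, which is what makes snapshots a safe optimization for long streams.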

Pattern | Problem solved
Circuit breaker | Cascade failure
Bulkhead | Resource exhaustion
Retry + backoff | Transient errors
Timeout | Hung connections
CQRS | Read/write scale mismatch
Event sourcing | Audit, temporal queries

[Service] --timeout 200ms--> [Dependency]
     | circuit OPEN → fallback cache
     | bulkhead pool max 50 threads

Retry Storm

Clients retry on 503 simultaneously → overload. Jittered backoff + server Retry-After header.

CQRS Read Model Build

Projection worker consumes events → updates Elasticsearch doc. Rebuild projection from event log on corruption.

Saga vs 2PC Decision Tree

Money across services → saga + ledger audit. Config update across services → saga OK with compensate.

Worked Example: Patterns

Circuit open after 50% errors in 10s window; half-open allow 3 probe requests.

Extended Notes

Connect patterns to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Resilience Patterns

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Circuit breaker half-open?

Test whether the dependency has recovered: let a small number of probe requests through (one or a few) before restoring full traffic.

Retry idempotency?

POST without key may duplicate — require Idempotency-Key header.

CQRS without event sourcing?

Yes — separate read/write stores synced by CDC enough for many systems.

Extended Reference — Resilience Patterns

Timeout budgets

Total user request budget 300 ms: cap each hop at 50 ms across at most 4 sequential hops (200 ms), leaving about 100 ms for local work and network overhead.
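
One way to enforce such a budget is to carry a deadline through the request and derive each hop's timeout from what is left. The Deadline class below is a hypothetical helper written for this sketch, not an API from any framework.

```python
import time


class Deadline:
    """Tracks the remaining time budget for one user request."""

    def __init__(self, total_seconds, clock=time.monotonic):
        self.clock = clock                       # injectable for tests
        self.expires_at = clock() + total_seconds

    def remaining(self):
        return max(0.0, self.expires_at - self.clock())

    def timeout_for_hop(self, per_hop_cap):
        """Timeout for the next downstream call: capped per hop,
        and never larger than the budget that is left."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted")
        return min(per_hop_cap, left)
```

A handler would create Deadline(0.3) on entry and pass deadline.timeout_for_hop(0.05) as the timeout of each outbound call, so a slow early hop automatically shrinks what later hops may spend.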

Bulkhead thread pools

Pool per downstream — search slow does not exhaust pools for payments.

Fallback quality

Stale cache better than 500 error for product listing — label 'prices may be delayed'.

CQRS rebuild

Replay event log 24h to rebuild corrupted read model — disaster recovery for projections.

Anti-pattern

Retry storm without jitter — amplifies outage.

Part 21 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Circuit breaker states
  2. Bulkhead pools
  3. Retry jitter
  4. Timeout budgets
  5. CQRS projection
  6. Event sourcing snapshot
  7. Fallback defined
  8. No retry storm
  9. Half-open probe
  10. Degrade UX message

Self-test prompt

Explain Part 21 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 21 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 22: Storage — Block, File & Object

Block Storage

Raw volumes (EBS, SAN) mounted as disks. Low-level, high IOPS. Used for databases, VM boot volumes. Snapshots for backup; attach/detach to instances.

File Storage

POSIX filesystem (NFS, EFS, HDFS). Shared folders, legacy apps, data science home directories. Not ideal for internet-scale static assets — latency and cost.

Object Storage

Blob + metadata via HTTP API (S3, GCS). Virtually unlimited scale, 11 nines durability claim, cheap per GB. Keys like s3://bucket/user/123/photo.jpg.

Type | Access | Best for
Block | Disk protocol | Databases, transactional local state
File | Filesystem path | Shared files, Hadoop
Object | HTTP key-value | Media, backups, data lake

Object Storage Patterns

  • Pre-signed URLs for direct client upload (bypass API bandwidth)
  • Lifecycle policies: Standard → IA → Glacier
  • CDN origin for static delivery
  • Versioning + replication for DR

Interview: Photo App

Metadata in SQL; binary in S3; thumbnail via async worker; CloudFront in front. Never store 5 MB images in Postgres rows.

Client → presigned PUT → S3
       → POST /photos {s3_key} → API → SQL metadata
Worker ← SQS ← event → generate thumbnails → S3

EBS vs Instance Store

EBS network-attached, snapshot backup. Instance store faster ephemeral — cache nodes only.

Data Lake

S3 + Parquet + Spark/Presto for analytics decoupled from OLTP.

Erasure Coding

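Object stores use erasure coding (data split into fragments plus parity) to reach high durability with less overhead than full replicas. The exact layout per S3 storage class is an internal detail, so present it as a general technique rather than a documented property of IA or Glacier.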

Worked Example: Storage

Dropbox: metadata SQL, chunks object storage, dedupe by content hash per user namespace.
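
A minimal sketch of content-hash deduplication scoped to a user namespace. The ChunkStore class, the tiny 4-byte chunk size, and the in-memory dict standing in for object storage are assumptions for illustration; this is not a description of Dropbox's actual implementation.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real systems use MB-scale chunks


class ChunkStore:
    def __init__(self):
        # (namespace, sha256 hex digest) -> chunk bytes; stands in for object storage
        self.blobs = {}

    def put_file(self, namespace, data):
        """Store a file as chunks and return its manifest (ordered digests)."""
        manifest = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks in the same namespace are stored only once.
            self.blobs.setdefault((namespace, digest), chunk)
            manifest.append(digest)
        return manifest

    def get_file(self, namespace, manifest):
        return b"".join(self.blobs[(namespace, d)] for d in manifest)
```

The manifest is what would live in the SQL metadata store. Keying by namespace trades some storage savings for isolation: one user cannot probe whether another user already holds a given chunk.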

Extended Notes

Connect storage to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Storage

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

S3 eventual consistency?

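Since December 2020, S3 provides strong read-after-write consistency for PUT, overwrite, DELETE, and LIST. The older model (eventual consistency for overwrites and listings) may still apply to other S3-compatible stores, so state which model you assume.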

Block storage for DB?

EBS gp3/io2 — provisioned IOPS for latency-sensitive OLTP.

File vs object for ML training?

Object store + parallel read workers; POSIX file mount optional layer.

Extended Reference — Storage Systems

S3 key design

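Spread keys across prefixes (for example, a hash of user_id at the front) only at extreme request rates. S3 now scales request throughput per prefix automatically, so random prefixes are rarely needed.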

Lifecycle cost

80% storage cost in old infrequent access — lifecycle rules save money.

EBS snapshot

Incremental snapshots; cross-region copy for DR.

POSIX on object

Mount s3fs for legacy — not performance path; use native SDK.

Compliance

Object lock WORM for regulatory retention.

Part 22 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Block file object diff
  2. S3 presigned upload
  3. Lifecycle tiers
  4. EBS for DB
  5. Data lake S3
  6. No big BLOB SQL
  7. Erasure coding note
  8. POSIX on object caution
  9. Cross-region replicate
  10. Backup snapshots

Self-test prompt

Explain Part 22 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 22 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 23: Search & Indexing

Why Separate Search Engine

SQL LIKE '%foo%' full table scan — unusable at scale. Inverted indexes power fast full-text search, fuzzy match, facets, ranking.

Inverted Index

Maps each term → list of document IDs containing it. Query intersects posting lists for AND queries.

"quick fox" indexed:
  quick → [doc1, doc5]
  fox   → [doc1, doc9]
  AND   → [doc1]
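
The same example as a runnable sketch. The whitespace tokenizer and the build_index and search_and helper names are simplifications assumed for illustration; a real engine would add analyzers, positions, and compressed posting lists.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, query):
    """Return IDs of documents that contain every term in the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    postings.sort(key=len)              # intersect the smallest lists first
    return set.intersection(*postings)


docs = {
    "doc1": "the quick brown fox",
    "doc5": "quick service",
    "doc9": "a red fox",
}
index = build_index(docs)
result = search_and(index, "quick fox")
```

Starting the intersection with the shortest posting list keeps intermediate results small, which is the same idea search engines use when ordering AND clauses.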

Elasticsearch Architecture

  • Index: logical namespace (like database)
  • Shard: horizontal partition of index
  • Replica: copy for read scale and HA
  • Analyzers: tokenize, stem, lowercase text

Writes route to primary shard; replicas sync. Near-real-time search (refresh interval ~1s default).

Ranking & Relevance

TF-IDF, BM25 scoring. Boost fields (title > body). Function scores for popularity, recency. Personalization often hybrid: ES retrieval + ML rerank.

Sync from Primary DB

CDC or dual-write to index. Reindex on mapping changes (new field type). Handle deletes — tombstone in index.

Alternatives

Algolia/Typesense for managed SaaS; Postgres full-text for small scale; vector DB for semantic search (embeddings).

Feature | SQL | Elasticsearch
Prefix search | Poor | Good (edge n-grams)
Faceted browse | Heavy GROUP BY | Native aggregations
ACID writes | Yes | Eventual index refresh

Autocomplete Pipeline

Two common options: an edge n-gram tokenizer applied at index time, or the completion suggester for prefix queries.

Pagination in Search

search_after with sort keys — deep pagination without costly offset.

Index Mapping Mistakes

Wrong field type (text vs keyword) breaks aggregations and exact filters.

Worked Example: Search

Yelp search: geo filter + text + rating facet — inverted index + geo index combined.

Extended Notes

Connect search to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Search

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Inverted index update?

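Documents become searchable after the next refresh (near real-time). For read-your-write freshness, index with refresh=wait_for. External versioning solves a different problem: rejecting out-of-order updates.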

Why Elasticsearch for logs?

Full-text + aggregations + time-series index patterns (ELK stack).

Vector search addition?

Embeddings index for semantic similarity — hybrid with keyword BM25.

Extended Reference — Search & Indexing

Analyzer chain

Lowercase → stopwords → stemmer — tune for language.

Shard sizing ES

20–50 GB per shard guideline; force merge maintenance window.

Hybrid search

BM25 retrieve top 100 → vector rerank top 10 — best of keyword + semantic.

Index rebuild

Blue-green indices alias swap — zero downtime reindex.

Security

Filter queries by tenant_id mandatory — prevent cross-tenant leak.

Part 23 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Inverted index
  2. ES shards replicas
  3. Analyzers
  4. BM25 ranking
  5. CDC to index
  6. Reindex blue-green
  7. search_after pagination
  8. Tenant filter mandatory
  9. Vector hybrid optional
  10. Autocomplete edge ngram

Self-test prompt

Explain Part 23 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 23 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 24: Real-Time Systems & News Feeds

Fan-Out on Write

When a user posts, push the post ID into all followers' timeline caches (Redis sorted sets). Read is cheap: one fetch of the precomputed timeline, independent of how many accounts the user follows.

  • Pros: fast reads, predictable latency
  • Cons: slow write for celebrities (millions of followers); wasted work if follower inactive

Fan-Out on Read

On read, merge recent posts from all followed users. Write is cheap; read is expensive and slow for users following many accounts.

Hybrid (Twitter-style)

Fan-out on write for normal users (<10K followers). Fan-out on read for celebrities — merge celebrity tweets at read time from dedicated cache.

Post tweet:
  if followers < 10K → push to each follower timeline cache
  else → write to celebrity tweet cache only

Read timeline:
  merge(user_timeline_cache, celebrity_tweets_cache)
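
The pseudocode above can be sketched as a small in-memory service. The FeedService class, its dict-based caches (standing in for Redis), and the configurable threshold are assumptions for illustration, not a description of Twitter's implementation.

```python
from collections import defaultdict


class FeedService:
    def __init__(self, followers, celebrity_threshold=10_000):
        self.followers = followers                  # author -> set of follower ids
        self.threshold = celebrity_threshold
        self.following = defaultdict(set)           # user -> set of authors
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.timelines = defaultdict(list)          # user -> [(ts, post_id)]
        self.celebrity_posts = defaultdict(list)    # author -> [(ts, post_id)]

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= self.threshold

    def post(self, author, post_id, ts):
        if self.is_celebrity(author):
            # Fan-out on read: store once, merge at read time.
            self.celebrity_posts[author].append((ts, post_id))
        else:
            # Fan-out on write: push to every follower's timeline.
            for fan in self.followers.get(author, ()):
                self.timelines[fan].append((ts, post_id))

    def read_timeline(self, user, limit=20):
        merged = list(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.extend(self.celebrity_posts[author])
        merged.sort(reverse=True)                   # newest first
        return [post_id for _, post_id in merged[:limit]]
```

In a real system the fan-out loop would run asynchronously from a queue, and the read path would merge only a bounded window of recent celebrity posts.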

Pull vs Push Models

Aspect | Pull | Push
Client | Polls server periodically | Server sends via WS/SSE/push notification
Latency | Poll interval bound | Near real-time
Server load | Empty polls waste resources | Connection state per client
Battery | Worse if aggressive poll | Push can be efficient with FCM/APNs

Activity Streams

FQL-style aggregation: store activities, fan-out to inboxes, rank by ML offline. Kafka for event pipeline; Redis for hot timelines; cold storage in Cassandra.

Ranking Feed

Not chronological at scale — score = f(recency, engagement, affinity). Precompute scores in batch; blend with real-time signals.

Timeline Storage

Redis ZSET: key=timeline:user_id, score=timestamp, member=tweet_id. Trim to top 1000 entries.

Cold Start User

Global popular feed until follow graph populated — onboarding engagement.

Feed Ranking Features

Recency, author affinity, engagement probability — offline model + online blend.

Worked Example: Feeds

Normal user 500 followers: fan-out write 500 Redis ZADDs ~5ms. Celebrity 50M: fan-out read only.

Extended Notes

Connect feeds to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Real-Time Feeds

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Celebrity fan-out hybrid threshold?

Industry often 10K–100K followers — tune by infrastructure cost.

WebSocket scaling?

Pub/sub backplane (Redis) so any WS node receives broadcast to local connections.

Feed ranking offline/online?

Offline batch scores + online rerank with fresh engagement signals.

Extended Reference — Real-Time & Feeds

Ranking pipeline

Offline Spark computes scores hourly; online feature store serves p99 < 10ms lookup.

Feed pagination

Cursor = last tweet_id in page; stable if no deletes; tombstone deleted ids.

Live updates

SSE fanout from pub/sub cheaper than WS for one-way notifications.

Write amplification

Fan-out write 10M followers = 10M writes — async queue required; rate limit celebrity post.

Read merging

K-way merge sorted lists from followees — heap O(log k) per item.
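
Python's standard library already provides a heap-based k-way merge, so a read-time merge can be sketched briefly. The merge_timelines name and the (timestamp, post_id) tuple shape are assumptions for illustration.

```python
import heapq
from itertools import islice


def merge_timelines(per_followee, limit):
    """Merge lists that are each sorted newest-first as (timestamp, post_id).

    heapq.merge keeps one cursor per input list in a heap, so producing
    each output item costs O(log k) for k followees.
    """
    merged = heapq.merge(*per_followee, key=lambda item: item[0], reverse=True)
    return list(islice(merged, limit))   # lazily stop after `limit` items


a = [(9, "a3"), (5, "a2"), (1, "a1")]
b = [(8, "b2"), (2, "b1")]
c = [(7, "c1")]
page = merge_timelines([a, b, c], limit=4)
```

Because the merge is lazy, fetching the first page touches only a handful of entries from each list instead of sorting everything.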

Part 24 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Fan-out write read
  2. Hybrid celebrities
  3. Redis ZSET timeline
  4. Pull vs push
  5. K-way merge
  6. Ranking offline online
  7. Cold start feed
  8. WS pub/sub scale
  9. Write amplification aware
  10. Stale feed OK?

Self-test prompt

Explain Part 24 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 24 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 25: Payments & Ledger Design

Requirements

Money movement that appears exactly-once (at-least-once delivery made safe by idempotency), an audit trail, and PCI scope minimization (use Stripe/Adyen tokenization). Strong consistency for balances.

Double-Entry Ledger

Every transaction has equal debit and credit entries; sum of accounts always balances.

Transfer $100 A → B:
  DEBIT  account_A  100
  CREDIT account_B  100

Ledger entries are immutable: never UPDATE or DELETE an entry. Compute a balance as the SUM of its entries, or keep a materialized balance row that is updated in the same DB transaction as the appended entries.
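
A minimal append-only ledger sketch using sqlite3. The entries schema, the helper names, and the sign convention (balance = credits minus debits, as for a customer balance) are assumptions for illustration.

```python
import sqlite3


def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE entries (
               id INTEGER PRIMARY KEY,
               txn_id TEXT NOT NULL,
               account TEXT NOT NULL,
               direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')),
               amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
           )"""
    )
    return conn


def transfer(conn, txn_id, src, dst, amount_cents):
    insert = (
        "INSERT INTO entries (txn_id, account, direction, amount_cents) "
        "VALUES (?, ?, ?, ?)"
    )
    with conn:  # both rows commit together, or both roll back on error
        conn.execute(insert, (txn_id, src, "DEBIT", amount_cents))
        conn.execute(insert, (txn_id, dst, "CREDIT", amount_cents))


def balance(conn, account):
    # Assumed convention: credits add to the balance, debits subtract.
    row = conn.execute(
        """SELECT COALESCE(SUM(CASE direction WHEN 'CREDIT' THEN amount_cents
                                             ELSE -amount_cents END), 0)
           FROM entries WHERE account = ?""",
        (account,),
    ).fetchone()
    return row[0]
```

After transfer(conn, "t1", "A", "B", 10_000), the two balances sum to zero, which is the invariant a reconciliation job can check across the whole table.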

Idempotency Keys

Client sends Idempotency-Key: uuid on POST /charges. Server stores key → result mapping. Retries return same response without double charge.
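
The key-to-result mapping can be sketched with an in-memory service. ChargeService, its response shape, and the charge ID format are hypothetical; a production version would store keys in the database under a unique constraint so that concurrent retries cannot both proceed.

```python
import uuid


class ChargeService:
    def __init__(self):
        self.results = {}   # idempotency key -> stored response
        self.charges = []   # side effects actually performed

    def create_charge(self, idempotency_key, amount_cents):
        # A retry with a known key returns the stored response unchanged.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount_cents))
        response = {
            "charge_id": charge_id,
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self.results[idempotency_key] = response
        return response


svc = ChargeService()
key = str(uuid.uuid4())                 # generated once by the client
first = svc.create_charge(key, 500)
retry = svc.create_charge(key, 500)     # network retry with the same key
```

The retry returns the same response and performs no second charge; a new key would create a new charge.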

Payment Flow

  1. Create payment intent (pending)
  2. Call PSP (payment service provider)
  3. Webhook confirms success/failure (async)
  4. Update ledger + order state atomically

Webhook handler must be idempotent — PSP may retry webhooks.

Reconciliation

Nightly batch compare internal ledger vs PSP settlement files. Discrepancy alerts for fraud or bugs.

Outbox for Side Effects

Ledger write + outbox event in one transaction → email receipt, analytics without losing money record.

Failure | Handling
PSP timeout | Query PSP status; never assume failure
Duplicate webhook | Idempotent webhook handler
Partial saga | Compensating refund saga

PCI DSS Layers

SAQ A if all card data on Stripe Elements — smallest compliance burden.

Currency & Rounding

Store amounts in minor units (cents) as integers — never float for money.
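
A short illustration of why integers are used. The split_evenly helper is a hypothetical example of distributing a remainder without losing a cent.

```python
def split_evenly(total_cents, parts):
    """Split an integer amount so the pieces always sum to the total."""
    base, remainder = divmod(total_cents, parts)
    # The first `remainder` parts receive one extra cent each.
    return [base + 1 if i < remainder else base for i in range(parts)]


float_sum = 0.1 + 0.2      # binary floats cannot represent 0.1 or 0.2 exactly
cents_sum = 10 + 20        # integer cents add exactly
shares = split_evenly(1000, 3)
```

Here float_sum is not equal to 0.3, while cents_sum is exactly 30 and the three shares of $10.00 add back up to 1000 cents.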

Chargeback Flow

Webhook dispute.created → freeze merchant payout → evidence upload workflow.

Worked Example: Payments

Stripe webhook idempotent by event_id unique index. Ledger append-only, never UPDATE amount.

Extended Notes

Connect payments to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Payments

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Double-entry why?

Audit trail, imbalance detection fraud, accounting compliance.

Idempotency key storage TTL?

24–72 hours covers client retry windows; Stripe documents 24h.

Webhook ordering?

Do not assume order — use event_id dedup and state machine.

Extended Reference — Payments & Ledger

Immutable ledger

Append-only entries; corrections via compensating entries not UPDATE.

Minor units

BIGINT cents prevents float rounding 0.1 + 0.2 bugs.

PSP abstraction

Interface PaymentProvider — swap Stripe/Adyen; mock in tests.

Fraud checks

Sync fraud score before capture — async review for high value.

Regulatory

KYC/AML separate service; PCI scope minimization documented.

Part 25 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Double-entry ledger
  2. Idempotency Stripe
  3. Webhook dedup
  4. Integer cents
  5. Saga payment
  6. Reconciliation batch
  7. PCI tokenize
  8. Never float money
  9. Outbox notify
  10. Compensate refund

Self-test prompt

Explain Part 25 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 25 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 26: Notification System Design

Channels

Email, SMS, push (FCM/APNs), in-app, web push. Each channel has different providers, rate limits, cost, and delivery guarantees.

High-Level Architecture

[Event: order shipped] → Kafka → [Notification Service]
                                      ├→ Email worker → SendGrid
                                      ├→ SMS worker → Twilio
                                      └→ Push worker → FCM/APNs

User Preferences

Store per-user channel opt-in, quiet hours, locale. Check preferences before enqueue. Regulatory: marketing vs transactional (CAN-SPAM, TCPA).

Template & Localization

Template ID + variables rendered per locale. Version templates; A/B test subject lines offline.

Delivery & Retries

At-least-once queue per channel. Exponential backoff on provider 5xx. DLQ for bad addresses. Track delivery webhooks (email opened, bounce).

Volume Estimation

10M DAU × 5 notifications/day = 50M messages/day ≈ 580/sec average, higher peak. Shard queue by user_id. Rate limit per provider (SMS expensive).

Idempotency

Event id + notification type dedupes — avoid duplicate push on Kafka replay.

Priority Queues

Transactional (password reset) > marketing. Separate queues so blast campaign does not delay 2FA codes.
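
A sketch of the separate-queue idea with in-memory deques. The PriorityDispatcher class and the two queue names are assumptions; in practice these would be separate topics or queues with their own worker pools.

```python
from collections import deque


class PriorityDispatcher:
    """Two independent queues; workers always drain transactional first."""

    ORDER = ("transactional", "marketing")

    def __init__(self):
        self.queues = {kind: deque() for kind in self.ORDER}

    def enqueue(self, kind, message):
        self.queues[kind].append(message)

    def next_message(self):
        for kind in self.ORDER:
            if self.queues[kind]:
                return self.queues[kind].popleft()
        return None   # nothing to send


d = PriorityDispatcher()
d.enqueue("marketing", "promo-1")
d.enqueue("marketing", "promo-2")
d.enqueue("transactional", "2fa-code")
order = [d.next_message() for _ in range(4)]
```

The 2FA code is sent first even though it was enqueued last. Strict priority can starve marketing traffic under sustained transactional load, so dedicated workers per queue are a common alternative.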

Monitoring

Metrics: sent, delivered, failed, latency per channel. Alert on bounce rate spike (bad list) or provider outage.

Channel | Latency | Cost
Push | Seconds | Low
Email | Seconds–minutes | Low
SMS | Seconds | High

Push Token Registry

device_tokens table: user_id, platform, token, updated_at. Delete or mark a token invalid when FCM/APNs reports it as unregistered.

Email Bounce Handling

Hard bounce → suppress address. Soft bounce → retry with backoff.

Unsubscribe One-Click

List-Unsubscribe header for marketing compliance.

Worked Example: Notifications

Password reset: SMS+email parallel, priority queue bypasses marketing throttle.

Extended Notes

Connect notifications to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Interview Question Bank — Notifications

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Push vs SMS for 2FA?

SMS deliverability issues — prefer TOTP app; SMS fallback with rate limit.

Notification dedup?

event_id + channel unique constraint before send.

Quiet hours?

Store user timezone; scheduler delays non-urgent marketing sends.

Extended Reference — Notification Systems

Template versioning

v2 template rollback if conversion drops — A/B metric driven.

Provider failover

Primary SendGrid fail → secondary SES — circuit breaker per provider.

Batching

Digest email aggregates 50 events — reduces send volume.

Compliance

STOP keyword for SMS; one-click unsubscribe link tracking.

Load test

Simulate Black Friday notification spike through queue without sending real SMS cost.

Part 26 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Multi-channel queue
  2. Priority queues
  3. Template locale
  4. Device token registry
  5. Bounce suppress
  6. Dedup event_id
  7. Rate limit SMS cost
  8. Quiet hours TZ
  9. Provider failover
  10. Transactional vs marketing

Self-test prompt

Explain Part 26 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 26 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 27: Full System Design Walkthroughs

Twenty-five classic interview problems. For each: clarify requirements, run numbers, define APIs, draw architecture, sketch schema, name bottlenecks, and discuss extensions. Time-box to 45 minutes per problem in mock practice.

How to practice: Minute 0–8 requirements + estimates. Minute 8–20 high-level diagram. Minute 20–38 deep dive (interviewer choice). Minute 38–45 trade-offs and monitoring. Record yourself and score with Part 33 rubric.

URL Shortener (TinyURL)

Functional & Non-Functional Requirements

Scope summary: 100M URLs/month, 100:1 read:write

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

100M/mo ÷ ~2.6M s/mo → ~40 writes/s average; × 100 → ~4,000 reads/s average. Multiply by a peak factor (for example 2–3×) for peak load.

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average and peak QPS = daily_ops/86400 × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.
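
The steps above can be written out as arithmetic. The estimate helper, the 30-day month, the peak factor of 2, the 500 bytes per record, and the 5-year retention are all assumptions picked for illustration; replication is left out for simplicity.

```python
SECONDS_PER_MONTH = 30 * 86_400   # assumes a 30-day month


def estimate(new_urls_per_month, read_write_ratio, peak_factor,
             bytes_per_record, retention_years):
    write_qps = new_urls_per_month / SECONDS_PER_MONTH
    read_qps = write_qps * read_write_ratio
    records = new_urls_per_month * 12 * retention_years
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "storage_gb": records * bytes_per_record / 1e9,   # before replication
    }


est = estimate(
    new_urls_per_month=100_000_000,
    read_write_ratio=100,
    peak_factor=2,           # assumed
    bytes_per_record=500,    # assumed average row size
    retention_years=5,       # assumed
)
```

With these inputs the average load is roughly 39 writes/s and 3,858 reads/s, and five years of records come to about 3 TB before replication.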

API Design

POST /v1/urls {long_url} → {short_code}; GET /{code} → 302

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Hash (base62) or counter+encode; collision retry; custom aliases optional
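
A minimal sketch of the counter+encode option. The alphabet ordering and the encode/decode helper names are assumptions; any fixed 62-character ordering works as long as both directions agree.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode(n):
    """Turn a non-negative counter value into a base62 short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, BASE)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))   # most significant digit first


def decode(code):
    """Inverse of encode: recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven characters cover 62^7, about 3.5 trillion codes. Sequential counters produce guessable codes, so some designs shuffle or offset the counter before encoding.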

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
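The counter-plus-encode option can be sketched as follows. The alphabet ordering (digits, lowercase, uppercase) is an assumption; any fixed ordering of 62 symbols works as long as encoder and decoder agree.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (ordering is an arbitrary choice for this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n):
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code):
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven characters give 62^7 ≈ 3.5 trillion codes, which comfortably covers 100M new URLs per month for decades. Sequential codes are guessable, so mention that trade-off if the interviewer asks about enumeration.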

Data Model & Schema Sketch

urls(short_code PK, id, long_url, user_id, created_at); the primary key on short_code already covers the redirect lookup, so no separate index is needed

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: hot counter shard during ID generation; redirect reads overloading the DB (cache redirects and keep the DB as the durable source of truth)

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Analytics pipeline, expiration, abuse detection

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Paste Bin

Functional & Non-Functional Requirements

Scope summary: 10M pastes/month, public/private

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

~4 pastes/s, reads higher for popular

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

POST /pastes; GET /pastes/{id}; optional expiry

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Object store for body; metadata in SQL; CDN for public reads

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

pastes(id, user_id, visibility, expiry, s3_key, created_at)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Large paste size; spam; dedupe identical content

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays
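For the "dedupe identical content" risk, one possible approach is content-addressed storage: derive the object key from a hash of the body so identical pastes share one blob. The sketch below assumes SHA-256 and uses dictionaries as stand-ins for the object store and the metadata table; `PasteStore` is a hypothetical name.

```python
import hashlib


class PasteStore:
    def __init__(self):
        self.blobs = {}    # object key -> body bytes (stand-in for the object store)
        self.pastes = {}   # paste id -> metadata row (stand-in for SQL)
        self._next_id = 1

    def create(self, user_id, body, visibility="public"):
        key = hashlib.sha256(body).hexdigest()
        if key not in self.blobs:          # upload only the first copy
            self.blobs[key] = body
        paste_id = self._next_id
        self._next_id += 1
        self.pastes[paste_id] = {
            "user_id": user_id,
            "visibility": visibility,
            "s3_key": key,
        }
        return paste_id

    def read(self, paste_id):
        return self.blobs[self.pastes[paste_id]["s3_key"]]
```

Each paste keeps its own metadata row, so visibility and expiry stay per-paste. Deleting a blob then needs a reference count or a garbage-collection pass, which is worth calling out as the cost of this design.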

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Syntax highlighting service, rate limits

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Distributed Rate Limiter

Functional & Non-Functional Requirements

Scope summary: 1M users, rules per API key

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Per-key QPS limits, sliding window

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

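Middleware checks the per-key counter before the handler runs, returns 429 when the limit is exceeded, and sets X-RateLimit-* response headers so clients can back off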

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Redis sorted sets (sliding-window log) or a token bucket per key; cross-node synchronization is optional and only needed when limits must be strict

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
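The sliding-window option can be sketched in memory as below. A deque per key plays the role a Redis sorted set would play in a distributed deployment; `SlidingWindowLimiter` is a hypothetical name, and the caller passes `now` explicitly so the behavior is easy to reason about and test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any trailing `window_s` seconds."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)   # api key -> timestamps inside the window

    def allow(self, key, now):
        hits = self._hits[key]
        # Evict timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False                  # caller should answer 429
        hits.append(now)
        return True
```

Memory grows with the number of requests kept per key, which is why a token bucket (two numbers per key) is often preferred when limits are high. With multiple API servers, the evict-count-add sequence must run atomically in the shared store.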

Data Model & Schema Sketch

rules(key, limit, window); counters in Redis

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Redis memory; clock skew; burst traffic

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Hierarchical limits, dynamic config

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Web Crawler

Functional & Non-Functional Requirements

Scope summary: 1B pages, polite crawling

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Frontier queue dominates

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

BFS frontier; fetcher workers; dedupe URL hash

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

URL frontier queue, visited bloom, robots.txt cache

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
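The "visited bloom" in front of the frontier queue can be sketched as follows. This is a toy Bloom filter under stated assumptions: bit positions are derived from one SHA-256 digest by double hashing, and the filter size, hash count, and the `enqueue_if_new` helper are hypothetical choices for illustration.

```python
import hashlib
from collections import deque


class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # keep the step non-zero
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def enqueue_if_new(url, seen, frontier):
    """Add url to the frontier unless the filter says it was probably seen."""
    if url in seen:
        return False
    seen.add(url)
    frontier.append(url)
    return True


seen = BloomFilter(size_bits=1 << 16, num_hashes=4)
frontier = deque()
```

A Bloom filter never reports a seen URL as new, but it can occasionally report a new URL as seen, so a small fraction of pages is skipped. That is usually acceptable for a crawler and far cheaper than an exact set at 1B URLs.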

Data Model & Schema Sketch

urls(url_hash PK, status, priority, last_crawled)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Politeness per host; duplicate detection; DNS

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Distributed scheduling, PageRank pipeline

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Twitter / X News Feed

Functional & Non-Functional Requirements

Scope summary: 300M DAU, fan-out on write vs read

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

5K tweets/s write, massive read

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

POST /tweets; GET /timeline; follow graph

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Hybrid fan-out: celebrities fan-out on read; normal users on write

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
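The hybrid fan-out rule can be sketched in memory as below. `HybridFeed` and the follower threshold are hypothetical; in a real system the precomputed timelines would live in something like the Redis sorted sets mentioned in the data model, and the threshold would be in the thousands or millions rather than 2.

```python
from collections import defaultdict


class HybridFeed:
    def __init__(self, celebrity_threshold):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)         # author -> followers
        self.following = defaultdict(set)         # user -> authors
        self.tweets_by_author = defaultdict(list)  # author -> [(ts, author, text)]
        self.timelines = defaultdict(list)        # user -> precomputed entries

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, ts, text):
        entry = (ts, author, text)
        self.tweets_by_author[author].append(entry)
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's timeline.
            for follower in self.followers[author]:
                self.timelines[follower].append(entry)

    def timeline(self, user, limit=20):
        merged = set(self.timelines[user])
        # Fan-out on read: pull celebrity tweets at query time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.tweets_by_author[author])
        return sorted(merged, reverse=True)[:limit]
```

The set merge avoids duplicates when an author crosses the threshold after some tweets were already pushed. An author who drops back below the threshold would need a backfill, which this sketch does not handle.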

Data Model & Schema Sketch

tweets, follows, timeline cache (Redis sorted sets)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot users; thundering herd on celebrities

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Ranking ML, spaces, ads injection

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Instagram

Functional & Non-Functional Requirements

Scope summary: Photo-heavy, social graph

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

S3 + CDN; metadata DB

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

POST /media; GET /feed; likes/comments

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Blob store; Cassandra for feeds; graph for follows

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

media, users, feeds, likes — denormalized counters

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Image processing pipeline; feed generation

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Stories TTL, recommendations

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

WhatsApp / Chat

Functional & Non-Functional Requirements

Scope summary: 1B messages/day, delivery guarantees

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

~12K msg/s average

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

WebSocket gateway; message service; presence

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Per-chat sequence; store-and-forward; offline inbox

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
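Per-chat sequencing with a store-and-forward offline inbox can be sketched as follows. `ChatServer` is a hypothetical in-memory stand-in: lists play the role of WebSocket connections, and a real system would persist both the sequence counters and the inboxes.

```python
from collections import defaultdict


class ChatServer:
    def __init__(self):
        self.last_seq = defaultdict(int)   # chat_id -> last assigned sequence number
        self.members = defaultdict(set)    # chat_id -> user ids
        self.online = {}                   # user -> messages pushed over the live connection
        self.inbox = defaultdict(list)     # user -> messages held while offline

    def send(self, chat_id, sender, body):
        # A monotonically increasing seq per chat gives recipients a total order.
        self.last_seq[chat_id] += 1
        msg = (chat_id, self.last_seq[chat_id], sender, body)
        for user in self.members[chat_id] - {sender}:
            if user in self.online:
                self.online[user].append(msg)
            else:
                self.inbox[user].append(msg)   # store now, forward on reconnect
        return msg

    def connect(self, user):
        """Mark the user online and return everything queued while offline."""
        pending = self.inbox.pop(user, [])
        self.online[user] = []
        return pending
```

Clients can detect gaps by checking that seq values within a chat are consecutive and can request a resend for any missing range.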

Data Model & Schema Sketch

messages(chat_id, seq, body, status); users, devices

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Connection count; multi-device sync; end-to-end encryption (optional)

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Groups, media, encryption

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

YouTube / Netflix Video

Functional & Non-Functional Requirements

Scope summary: Upload + transcode + stream

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Huge egress bandwidth

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

Multipart upload; HLS/DASH segments; CDN

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Upload → queue → transcode workers → object store + CDN

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

videos, renditions, view_counts

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Transcode cost; copyright; regional CDN

Failure         | Symptom               | Mitigation
Traffic spike   | Latency ↑, errors ↑   | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts    | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state       | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Live stream, recommendations, DRM

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Uber / Lyft

Functional & Non-Functional Requirements

Scope summary: Real-time location, matching

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target             | Design lever
Availability   | 99.9%–99.99%               | Multi-AZ, redundancy, health checks
Latency (p99)  | 50–300 ms reads            | Cache, CDN, regional deployment
Durability     | No acknowledged write loss | Replication, fsync policy, backups
Scale          | See estimates below        | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Geospatial index critical

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = write rate × object size × replication × retention period, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.

API Design

POST /rides; driver location stream; match service

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Geohash/grid index; dispatch service; trip state machine

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
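The grid index named above can be sketched as a dictionary from cell to driver ids. This is a simplified stand-in for a geohash, not a geohash implementation: the 0.01-degree cell size and the class name `GridIndex` are assumptions, and a lookup scans the 3×3 block of cells around the rider.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size, roughly 1.1 km of latitude


def cell_of(lat, lng):
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


class GridIndex:
    def __init__(self):
        self.cells = defaultdict(set)   # cell -> driver ids currently inside
        self.where = {}                 # driver id -> current cell

    def update(self, driver_id, lat, lng):
        """Apply one location ping, moving the driver between cells."""
        new_cell = cell_of(lat, lng)
        old_cell = self.where.get(driver_id)
        if old_cell is not None and old_cell != new_cell:
            self.cells[old_cell].discard(driver_id)
        self.cells[new_cell].add(driver_id)
        self.where[driver_id] = new_cell

    def nearby(self, lat, lng):
        """Drivers in the rider's cell and its eight neighbours."""
        cx, cy = cell_of(lat, lng)
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        return found


index = GridIndex()
index.update("d1", 37.7749, -122.4194)
index.update("d2", 37.7790, -122.4180)
index.update("d3", 37.9000, -122.4194)   # far away
candidates = index.nearby(37.7750, -122.4190)
```

The dispatch service would then rank `candidates` by true distance or ETA; the index only narrows the search.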

Data Model & Schema Sketch

drivers(location, status), rides, users

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
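A Snowflake-style ID packs a millisecond timestamp, a worker id, and a per-millisecond sequence into one 64-bit integer, so IDs sort roughly by creation time. The sketch below assumes the common 41/10/12 bit split and a custom epoch picked arbitrarily for illustration.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (illustrative)


class SnowflakeGenerator:
    """41 bits of timestamp | 10 bits of worker id | 12 bits of sequence."""

    def __init__(self, worker_id):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:             # 4096 ids used this millisecond
                    while now <= self.last_ms:
                        now = self._now_ms()  # spin until the next millisecond
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq


gen = SnowflakeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(10_000)]
```

Because the timestamp occupies the high bits, range-sharding on these IDs concentrates new writes on the newest shard; hash the ID if you need even spread.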

Bottlenecks, Failure Modes & Mitigations

Primary risks: Split-brain matching; surge pricing events

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
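Burn rate is the observed error rate divided by the error budget (1 − SLO); a value of 1.0 means the budget is being spent exactly on schedule. The sketch below pairs a long and a short window so that a page fires only when both are burning fast. The 14.4 default is a commonly cited fast-burn threshold (about 2% of a 30-day budget spent in one hour) and is an assumption to tune, as are the function names.

```python
def burn_rate(errors: int, requests: int, slo: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO)."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo)


def should_page(long_window: float, short_window: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows exceed the threshold (reduces flapping)."""
    return long_window >= threshold and short_window >= threshold


# 50 failed requests out of 10,000 against a 99.9% SLO.
rate = burn_rate(errors=50, requests=10_000, slo=0.999)
print(f"burn rate ≈ {rate:.1f}x")
```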

Extensions & Senior Follow-Ups

Pooling, ETA ML, payments

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Yelp Proximity Search

Functional & Non-Functional Requirements

Scope summary: Search nearby businesses

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Geospatial queries <100ms

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

GET /search?lat&lng&radius&query

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Elastic/OpenSearch geo_distance; cache popular cities

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
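To make the radius filter concrete, here is a plain-Python stand-in for what a geo_distance query computes: great-circle distance by the haversine formula, then a filter and sort. It is not the Elasticsearch/OpenSearch query itself, and the linear scan is only for illustration; a real index avoids touching every business. The sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_nearby(businesses, lat, lng, radius_km):
    """Return ids of businesses within radius_km, nearest first."""
    hits = []
    for b in businesses:
        d = haversine_km(lat, lng, b["lat"], b["lng"])
        if d <= radius_km:
            hits.append((d, b["id"]))
    return [bid for _, bid in sorted(hits)]


# Hypothetical sample data.
businesses = [
    {"id": "a", "lat": 40.7580, "lng": -73.9855},
    {"id": "b", "lat": 40.7484, "lng": -73.9857},   # about 1 km away
    {"id": "c", "lat": 40.6892, "lng": -74.0445},   # about 9 km away
]
nearby = search_nearby(businesses, 40.7580, -73.9855, radius_km=2)
```

Ranking purely by distance is the simplest policy; the "relevance vs distance" risk listed for this design is about blending this score with rating and text match.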

Data Model & Schema Sketch

businesses(id, lat, lng, categories, rating)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Index size; ranking relevance vs distance

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Reviews, photos, ads

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Ticketmaster

Functional & Non-Functional Requirements

Scope summary: High contention on-sale

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Spike 100x normal at drop

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

Reserve → pay → confirm; queue users virtually

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Virtual waiting room; inventory row locks; idempotent booking

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
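The reserve → pay → confirm flow depends on a seat hold that expires if payment never arrives. Below is a single-process sketch of that state machine; the class name, the 120-second TTL, and the injectable clock are assumptions. In a real system each transition would be a row lock or conditional update in the database, which is what actually prevents overselling.

```python
import time


class SeatInventory:
    def __init__(self, seat_ids, hold_ttl_s=120, clock=time.monotonic):
        self.status = {s: "available" for s in seat_ids}
        self.holds = {}            # seat -> (user, expires_at)
        self.hold_ttl_s = hold_ttl_s
        self.clock = clock

    def _expire(self, seat):
        """Release the seat if its hold has timed out."""
        hold = self.holds.get(seat)
        if hold and hold[1] <= self.clock():
            del self.holds[seat]
            self.status[seat] = "available"

    def reserve(self, seat, user):
        self._expire(seat)
        if self.status[seat] != "available":
            return False           # held or sold: never double-book
        self.status[seat] = "held"
        self.holds[seat] = (user, self.clock() + self.hold_ttl_s)
        return True

    def confirm(self, seat, user):
        """Called after payment succeeds; only the holder may confirm."""
        self._expire(seat)
        hold = self.holds.get(seat)
        if hold is None or hold[0] != user:
            return False
        del self.holds[seat]
        self.status[seat] = "sold"
        return True


now = [0.0]                        # fake clock so the example is deterministic
inv = SeatInventory(["A1"], hold_ttl_s=120, clock=lambda: now[0])
first = inv.reserve("A1", "alice")     # alice holds the seat
second = inv.reserve("A1", "bob")      # rejected while the hold is live
```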

Data Model & Schema Sketch

events, seats(status), reservations, orders

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Overselling; bots; payment failures

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Secondary market, dynamic pricing

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Dropbox

Functional & Non-Functional Requirements

Scope summary: File sync, conflict resolution

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Chunk-level dedupe

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

Upload blocks; sync metadata; delta sync

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Metadata DB + block blob store; content-hash dedupe

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
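Content-hash dedupe can be sketched in a few lines: split a file into fixed-size chunks, key each chunk by its SHA-256 digest, and store a manifest of digests as the file version. The 4 MiB chunk size and the in-memory `BlockStore` are assumptions for illustration; the blob store in the diagram would take its place.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB)


class BlockStore:
    """Content-addressed store: identical chunks are kept once."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)   # no-op if already present
        return digest


def upload(store, content: bytes, chunk_size=CHUNK_SIZE):
    """Return the manifest (ordered chunk digests) for one file version."""
    return [store.put(content[i:i + chunk_size])
            for i in range(0, len(content), chunk_size)]


def download(store, manifest):
    return b"".join(store.blocks[h] for h in manifest)


store = BlockStore()
v1 = b"A" * 8 + b"B" * 8
v2 = b"A" * 8 + b"C" * 8             # only the second chunk changed
m1 = upload(store, v1, chunk_size=8)
m2 = upload(store, v2, chunk_size=8)
```

Delta sync falls out of the same structure: a client compares manifests and uploads only the digests the server does not already have.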

Data Model & Schema Sketch

files, blocks, devices, versions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Large file uploads; conflict merges

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Sharing permissions, encryption

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Typeahead / Autocomplete

Functional & Non-Functional Requirements

Scope summary: Low latency <50ms

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | <50 ms (scope target above) | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Prefix queries, trending

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

GET /suggest?q=pre

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Trie or Elasticsearch completion; popular queries cache

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
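The trie option can be sketched as follows: each node stores how often the exact query ending there was seen, and a suggestion walks to the prefix node and collects the most frequent completions. Class and method names are assumptions. This version scans the whole subtree on every request, which is fine for a whiteboard but too slow for hot prefixes; production designs usually precompute the top-k list at each node.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0      # times this exact query was recorded


class Typeahead:
    def __init__(self):
        self.root = TrieNode()

    def record(self, query, times=1):
        """Fold one aggregated query-log entry into the trie."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += times

    def suggest(self, prefix, k=5):
        """Top-k completions by count; ties break alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.count:
                results.append((-cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        return [text for _, text in sorted(results)[:k]]


t = Typeahead()
for query, times in [("prefix", 5), ("premium", 9), ("present", 2), ("apple", 7)]:
    t.record(query, times)
top = t.suggest("pre", k=2)
```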

Data Model & Schema Sketch

n-gram index; query log aggregation

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot prefixes; personalization

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Spell-check, ranking by CTR

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

News Feed Ranking

Functional & Non-Functional Requirements

Scope summary: Personalized ranked feed

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

ML feature store + scoring

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

Candidate generation → rank → filter

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Stream processing for features; cache ranked pages

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
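The candidate generation → rank → filter pipeline can be sketched end to end. The scoring rule here (a predicted engagement value decayed by post age with a one-hour half-life) is a placeholder assumption standing in for an ML model, and the field names are hypothetical.

```python
def generate_candidates(posts, following):
    """Stage 1: cheap recall, here simply posts from followed authors."""
    return [p for p in posts if p["author"] in following]


def score(post, now, half_life_s=3600.0):
    """Stage 2: placeholder ranker, engagement decayed by age."""
    age = max(0.0, now - post["created_at"])
    return post["predicted_engagement"] * 0.5 ** (age / half_life_s)


def build_feed(posts, following, seen_ids, now, limit=10):
    candidates = generate_candidates(posts, following)
    ranked = sorted(candidates, key=lambda p: score(p, now), reverse=True)
    # Stage 3: filter out what the user has already seen, then truncate.
    return [p["id"] for p in ranked if p["id"] not in seen_ids][:limit]


now = 10_000
posts = [
    {"id": 1, "author": "a", "created_at": 10_000, "predicted_engagement": 0.2},
    {"id": 2, "author": "b", "created_at": 6_400, "predicted_engagement": 0.6},
    {"id": 3, "author": "c", "created_at": 10_000, "predicted_engagement": 0.9},
    {"id": 4, "author": "a", "created_at": 2_800, "predicted_engagement": 0.4},
    {"id": 5, "author": "b", "created_at": 10_000, "predicted_engagement": 0.5},
]
feed = build_feed(posts, following={"a", "b"}, seen_ids={5}, now=now)
```

The half-life is the knob behind the "freshness vs relevance" risk: shorten it and new posts dominate, lengthen it and high-engagement older posts linger.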

Data Model & Schema Sketch

posts, user_features, impressions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Freshness vs relevance; filter bubbles

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Real-time re-rank, A/B infra

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Metrics Monitoring (Datadog)

Functional & Non-Functional Requirements

Scope summary: 1M metrics × 10 tags, write heavy

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Time-series DB

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

Agents push; rollup; query API

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Kafka → TSDB (Cassandra/ClickHouse); downsampling

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
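Downsampling trades resolution for storage: raw points are grouped into fixed time buckets and replaced by a few aggregates. A minimal sketch, assuming points arrive as `(timestamp_seconds, value)` pairs for a single series; the output field names are illustrative.

```python
from collections import defaultdict


def downsample(points, bucket_s):
    """Roll raw (ts, value) points up into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_s].append(value)   # bucket start time
    return [
        {"ts": start, "min": min(vals), "max": max(vals),
         "avg": sum(vals) / len(vals), "count": len(vals)}
        for start, vals in sorted(buckets.items())
    ]


raw = [(0, 1.0), (10, 3.0), (59, 5.0), (60, 10.0), (119, 20.0)]
rollup = downsample(raw, bucket_s=60)
```

Keeping `min`, `max`, and `count` alongside `avg` lets later rollups (for example, 1-minute into 1-hour) be computed from rollups rather than from raw data.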

Data Model & Schema Sketch

series_id, timestamp, value

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Cardinality explosion; query cost

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Alerting, anomaly detection

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

Distributed Cache (Redis Cluster)

Functional & Non-Functional Requirements

Scope summary: Cache 100GB+, HA

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Shard keys by consistent hashing; Redis Cluster itself maps each key to one of 16,384 fixed hash slots

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

GET/SET; TTL; cluster gossip

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Redis cluster slots; replication per shard

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
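Redis Cluster assigns every key to one of 16,384 hash slots using CRC16 of the key modulo 16,384, and hashes only the text inside the first `{...}` when a non-empty hash tag is present. The sketch below assumes that Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant in the Redis Cluster specification; verify against `CLUSTER KEYSLOT` on a real cluster before relying on it.

```python
import binascii

HASH_SLOTS = 16384


def key_hash_slot(key: str) -> int:
    """Slot for a key, honouring {hash tags} as described in the spec."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:     # tag must be non-empty
            data = data[start + 1:end]
    # Assumption: crc_hqx(data, 0) is the CRC16 variant Redis uses.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


# Keys sharing a hash tag land in the same slot, so multi-key
# operations on them stay on one shard.
slot_a = key_hash_slot("{user1000}.following")
slot_b = key_hash_slot("{user1000}.followers")
```

Because slots, not keys, are what move between nodes, resharding means migrating slot ranges; a hot key still pins its whole slot to one shard.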

Data Model & Schema Sketch

In-memory only; persistence optional

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot keys; resharding

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Multi-DC, client-side caching

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.

E-commerce Checkout

Functional & Non-Functional Requirements

Scope summary: Cart → inventory → payment

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional | Typical target | Design lever
Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks
Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment
Durability | No acknowledged write loss | Replication, fsync policy, backups
Scale | See estimates below | Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Strong consistency for inventory

Walk through explicitly: (1) DAU or total objects; (2) operations per user per day; (3) average QPS = daily_ops / 86,400, and peak QPS = average QPS × peak_factor; (4) storage = count × size × replication × retention; (5) bandwidth = QPS × payload size. Write your assumptions on the whiteboard before calculating.

API Design

POST /checkout idempotent; saga for payment

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
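An idempotent POST /checkout can be sketched as a lookup keyed by the Idempotency-Key header: the first request does the work and stores its response, and retries replay that response instead of charging again. The class name, response shape, and the choice to answer 409 when a key is reused with a different payload are assumptions of this sketch; a real service would persist the key in the same transaction as the order.

```python
class CheckoutService:
    def __init__(self):
        self._seen = {}      # idempotency key -> (request, response)
        self.charges = []    # stand-in for the payment provider

    def checkout(self, idempotency_key, cart_id, amount_cents):
        request = (cart_id, amount_cents)
        if idempotency_key in self._seen:
            saved_request, saved_response = self._seen[idempotency_key]
            if saved_request != request:
                # Same key, different payload: refuse rather than guess.
                return {"status": 409, "error": "idempotency key reused"}
            return saved_response            # replay, no second charge
        self.charges.append(request)
        response = {"status": 201, "order_id": f"order-{len(self.charges)}"}
        self._seen[idempotency_key] = (request, response)
        return response


svc = CheckoutService()
r1 = svc.checkout("key-1", "cart-1", 5000)
r2 = svc.checkout("key-1", "cart-1", 5000)   # client retry after a timeout
```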

High-Level Architecture

Reserve inventory; charge; confirm; compensate on fail

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘
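The reserve → charge → confirm flow with compensation is a saga: each step is paired with an undo action, and a failure triggers the undo actions of the completed steps in reverse order. A minimal in-process sketch follows; the step names and the `PaymentDeclined` exception are hypothetical, and a production saga would persist its progress so it can resume after a crash.

```python
class PaymentDeclined(Exception):
    pass


def run_checkout_saga(steps):
    """steps: list of (action, compensate) pairs, run in order."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):   # undo newest first
            compensate()
        return False
    return True


log = []


def charge_card():
    raise PaymentDeclined("card declined")


ok = run_checkout_saga([
    (lambda: log.append("reserve_inventory"), lambda: log.append("release_inventory")),
    (charge_card, lambda: log.append("refund")),
    (lambda: log.append("confirm_order"), lambda: log.append("cancel_order")),
])
```

Only completed steps are compensated: the failed charge never registers its refund, so the saga releases the inventory and stops.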

Data Model & Schema Sketch

orders, inventory, payments — transactional

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Race on last item; double charge

Failure | Symptom | Mitigation
Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit
Hot key / shard | Single node saturated | Split key, local cache, random suffix
Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks
Data corruption | Incorrect state | Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Fulfillment, returns, fraud

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Hotel Booking

Functional & Non-Functional Requirements

Scope summary: Date-range inventory

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

Similar to ticket booking, but with less spiky traffic.

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

Search availability; book room-night

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Inventory per room-type per night; hold TTL

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

hotels, room_nights, bookings

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
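Inventory per room type per night with a hold TTL can be sketched as follows. This is a single-process illustration under stated assumptions: `RoomNightInventory`, the 600-second default TTL, and the explicit `now` argument (used to keep the example deterministic) are hypothetical, and a real service would do the availability check and the hold inside one database transaction.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Key = Tuple[str, date]  # (room_type, night)


class RoomNightInventory:
    def __init__(self, capacity: Dict[Key, int]):
        self.capacity = dict(capacity)
        self.booked: Dict[Key, int] = {}
        # hold_id -> (expiry time in seconds, nights held)
        self.holds: Dict[str, Tuple[float, List[Key]]] = {}

    def _held(self, key: Key, now: float) -> int:
        # Count only holds that have not expired yet.
        return sum(1 for exp, nights in self.holds.values() if exp > now and key in nights)

    def available(self, key: Key, now: float) -> int:
        return self.capacity.get(key, 0) - self.booked.get(key, 0) - self._held(key, now)

    def hold(self, hold_id: str, room_type: str, check_in: date, check_out: date,
             now: float, ttl: float = 600.0) -> bool:
        nights = [(room_type, check_in + timedelta(days=i))
                  for i in range((check_out - check_in).days)]
        # Every night in the range must be free, or the whole hold is refused.
        if not nights or any(self.available(k, now) < 1 for k in nights):
            return False
        self.holds[hold_id] = (now + ttl, nights)
        return True

    def confirm(self, hold_id: str, now: float) -> bool:
        entry = self.holds.pop(hold_id, None)
        if entry is None or entry[0] <= now:
            return False  # unknown or expired hold
        for k in entry[1]:
            self.booked[k] = self.booked.get(k, 0) + 1
        return True


d = date(2030, 1, 10)
inv = RoomNightInventory({("king", d): 1, ("king", d + timedelta(days=1)): 1})
first = inv.hold("h1", "king", d, d + timedelta(days=2), now=0.0)
overlap = inv.hold("h2", "king", d + timedelta(days=1), d + timedelta(days=2), now=1.0)
after_expiry = inv.hold("h3", "king", d, d + timedelta(days=1), now=601.0)
```

The second hold overlaps the first on one night and is refused. Once the first hold expires without confirmation, the night becomes bookable again.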

Bottlenecks, Failure Modes & Mitigations

Primary risks: Overbooking policies; cancellation

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Rate parity, loyalty

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Google Docs Collaboration

Functional & Non-Functional Requirements

Scope summary: Real-time OT/CRDT

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

WebSocket + operation log

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

Send ops; server orders; broadcast

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

OT or CRDT; snapshot + op log

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

doc_id, revision, operations

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
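To make the OT option concrete, here is a minimal sketch of transforming two concurrent insert operations so that both replicas converge. It covers inserts only; deletes, revision tracking, and the server-side ordering are omitted. `Insert`, `apply`, `transform`, and the `site` tie-breaker are hypothetical names, not the API of any particular OT library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # character offset in the document
    text: str
    site: int   # client id, used only to break ties at the same position


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # `against` inserted text before our position: shift right by its length.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


base = "abc"
a = Insert(1, "X", site=1)
b = Insert(1, "Y", site=2)  # concurrent with `a`, same position

# Replica 1 applies a, then the transformed b.
replica1 = apply(apply(base, a), transform(b, a))
# Replica 2 applies b, then the transformed a.
replica2 = apply(apply(base, b), transform(a, b))
```

Both replicas end with the same text regardless of arrival order, which is the convergence property the server relies on when it orders and broadcasts operations.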

Bottlenecks, Failure Modes & Mitigations

Primary risks: Conflict resolution; offline sync

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Comments, permissions, history

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Stack Overflow

Functional & Non-Functional Requirements

Scope summary: Q&A, search, reputation

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

Read-heavy

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

POST questions/answers; search; vote

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

SQL for integrity; ES for search; cache hot questions

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

posts, votes, users, tags

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Reputation gaming; duplicate detection

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Moderation queue, notifications

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Zoom Video Conferencing

Functional & Non-Functional Requirements

Scope summary: SFU/MCU architecture

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

UDP media, signaling TCP

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
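For conferencing, the bandwidth step is dominated by media streams rather than request QPS. A rough sketch of the stream counts per topology follows. It assumes each participant publishes one video stream at 1.5 Mbps (an illustrative figure, not from this guide) and ignores simulcast, audio, and screen share. The peer-to-peer mesh row is included only as a baseline for comparison with SFU and MCU.

```python
STREAM_MBPS = 1.5  # assumed bitrate of one video stream


def stream_counts(n: int) -> dict:
    """Stream counts for n participants, each publishing one stream."""
    return {
        # Mesh: every client sends to and receives from every other client.
        "mesh": {"client_up": n - 1, "client_down": n - 1, "server_out": 0},
        # SFU: a client uploads once; the server forwards each stream to the other n - 1 clients.
        "sfu": {"client_up": 1, "client_down": n - 1, "server_out": n * (n - 1)},
        # MCU: the server mixes all streams and sends one composite stream per client.
        "mcu": {"client_up": 1, "client_down": 1, "server_out": n},
    }


def server_egress_mbps(topology: str, n: int) -> float:
    return stream_counts(n)[topology]["server_out"] * STREAM_MBPS


ten = stream_counts(10)
sfu_egress = server_egress_mbps("sfu", 10)
mcu_egress = server_egress_mbps("mcu", 10)
```

Under these assumptions, a 10-person call costs an SFU 90 outgoing streams (135 Mbps) against 10 for an MCU (15 Mbps). The MCU pays for that saving in CPU, since it must decode and re-encode video.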

API Design

Signaling server; media SFU; TURN fallback

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Regional SFU mesh; recording to S3

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

rooms, participants, sessions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: NAT traversal; CPU for video

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Webinar mode, breakout rooms

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Payment Wallet

Functional & Non-Functional Requirements

Scope summary: Ledger correctness

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

ACID + idempotency

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

Transfer with idempotency-key; double-entry ledger

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Immutable ledger entries; balance materialized view

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

accounts, ledger_entries, transfers

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
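A transfer keyed by an idempotency key over an immutable double-entry ledger can be sketched with SQLite from the Python standard library. The table layout follows the accounts, ledger_entries, and transfers names above, but the column names, the `transfer` and `balance` helpers, and the seed data are assumptions for illustration. A production ledger would also need row locking or serializable isolation around the balance check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY);
CREATE TABLE transfers (
    idempotency_key TEXT PRIMARY KEY, src TEXT, dst TEXT, amount_cents INTEGER
);
CREATE TABLE ledger_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    transfer_key TEXT,
    account_id TEXT NOT NULL,
    amount_cents INTEGER NOT NULL  -- negative = debit, positive = credit
);
""")


def balance(account_id: str) -> int:
    # The balance is derived from the ledger; entries are never updated in place.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger_entries WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


def transfer(key: str, src: str, dst: str, amount_cents: int) -> str:
    """Apply a transfer at most once per idempotency key."""
    try:
        with conn:  # one transaction: commit on success, roll back on any exception
            conn.execute("INSERT INTO transfers VALUES (?, ?, ?, ?)",
                         (key, src, dst, amount_cents))
            if balance(src) < amount_cents:
                raise ValueError("insufficient funds")
            conn.executemany(
                "INSERT INTO ledger_entries (transfer_key, account_id, amount_cents) "
                "VALUES (?, ?, ?)",
                [(key, src, -amount_cents), (key, dst, amount_cents)],
            )
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate"  # key already recorded, so the retry is a no-op


# Seed data: fund alice from a hypothetical external account.
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?)", [("external",), ("alice",), ("bob",)])
    conn.executemany(
        "INSERT INTO ledger_entries (transfer_key, account_id, amount_cents) VALUES (NULL, ?, ?)",
        [("external", -10_000), ("alice", 10_000)],
    )

first = transfer("key-1", "alice", "bob", 2_500)
retry = transfer("key-1", "alice", "bob", 2_500)
```

The retry with the same key changes nothing, and the sum of all ledger entries stays at zero, which is the invariant a reconciliation job can check. A fuller version would also compare the retried payload against the stored one and return the original response.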

Bottlenecks, Failure Modes & Mitigations

Primary risks: Exactly-once; reconciliation

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

KYC, fraud, multi-currency

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Notification Service

Functional & Non-Functional Requirements

Scope summary: Multi-channel delivery

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

Target load: 1M notifications/min (about 16.7K/s on average).

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.
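Applying those steps to the 1M notifications per minute figure gives the following worked arithmetic. Only the per-minute rate comes from the text; the peak factor, payload size, replication factor, and retention period are assumptions chosen to make the numbers concrete.

```python
notifs_per_min = 1_000_000   # from the estimate above
peak_factor = 3              # assumption
payload_bytes = 1_000        # assumption: about 1 KB per stored notification
replication = 3              # assumption
retention_days = 30          # assumption

avg_qps = notifs_per_min / 60                    # about 16,667 per second
peak_qps = avg_qps * peak_factor                 # about 50,000 per second
daily_count = notifs_per_min * 60 * 24           # 1.44 billion per day
storage_bytes = daily_count * payload_bytes * replication * retention_days
peak_bandwidth_bytes_per_s = peak_qps * payload_bytes

print(f"avg {avg_qps:,.0f}/s, peak {peak_qps:,.0f}/s")
print(f"storage {storage_bytes / 1e12:.1f} TB, peak bandwidth {peak_bandwidth_bytes_per_s / 1e6:.0f} MB/s")
```

With these inputs, storage comes to roughly 130 TB for 30 days and peak ingest to about 50 MB/s. That is enough to justify partitioned queues and a sharded store before any component is drawn.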

API Design

Enqueue → workers → email/SMS/push

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Priority queues; templates; device tokens

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

notifications, templates, user_preferences

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Provider rate limits; retries

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Digest batching, A/B

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Ad Click Aggregator

Functional & Non-Functional Requirements

Scope summary: 1M clicks/s aggregate

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

Stream processing

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

Kafka → Flink → OLAP

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Counting, billing, fraud filters

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

raw_clicks stream; aggregates by campaign

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
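Aggregating a raw click stream by campaign can be sketched as tumbling-window counts with deduplication on a click ID. This is a simplified, single-process stand-in for what a stream processor would do. `ClickAggregator`, the 60-second window, the 120-second allowed lateness, and using the maximum event time seen as the watermark are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ClickAggregator:
    def __init__(self, window_s: int = 60, allowed_lateness_s: int = 120):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.seen: Set[str] = set()                       # click ids already processed
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)
        self.late: List[Tuple[str, str, int]] = []        # side output for batch reconciliation
        self.watermark = 0                                # simplification: max event time seen

    def add(self, click_id: str, campaign: str, ts: int) -> str:
        if click_id in self.seen:
            return "duplicate"                            # redelivered event: count once
        self.seen.add(click_id)
        self.watermark = max(self.watermark, ts)
        window_start = ts - ts % self.window_s
        if window_start + self.window_s + self.allowed_lateness_s <= self.watermark:
            # The window is treated as finalized; do not mutate it.
            self.late.append((click_id, campaign, ts))
            return "late"
        self.counts[(campaign, window_start)] += 1
        return "counted"


agg = ClickAggregator()
results = [
    agg.add("c1", "camp", 10),
    agg.add("c1", "camp", 10),   # retry of the same click
    agg.add("c2", "camp", 65),
    agg.add("c3", "camp", 400),  # advances the watermark
    agg.add("c4", "camp", 20),   # arrives after its window closed
]
```

Duplicates are dropped by ID, and events for already-closed windows are routed to a side list instead of changing published counts. That split is one way to keep billing figures stable while still reconciling late data offline.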

Bottlenecks, Failure Modes & Mitigations

Primary risks: Late data; exactly-once billing

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Real-time dashboard, attribution

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


API Rate Limiter at Scale

Functional & Non-Functional Requirements

Scope summary: Global edge + regional

  • Define MVP features vs phase-2 (analytics, admin, ML ranking).
  • State who the users are (consumers, businesses, internal operators).
  • Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
  • Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
  • Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

| Non-functional | Typical target | Design lever |
|---|---|---|
| Availability | 99.9%–99.99% | Multi-AZ, redundancy, health checks |
| Latency (p99) | 50–300 ms reads | Cache, CDN, regional deployment |
| Durability | No acknowledged write loss | Replication, fsync policy, backups |
| Scale | See estimates below | Sharding, async pipelines, autoscale |

Back-of-Envelope Estimates

Millions of keys

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average QPS = daily_ops / 86,400 and peak QPS = average QPS × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload size. Document assumptions on the whiteboard before calculating.

API Design

Edge PoP counters + sync; token bucket

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

CDN edge + central Redis; GCRA algorithm

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

policy store; sharded counters

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
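The GCRA algorithm named in the architecture note keeps a single timestamp per key, the theoretical arrival time (TAT), which makes it compact when there are millions of keys. The sketch below is a minimal in-process version: the `GCRA` class is a hypothetical name, the dictionary stands in for the central Redis store, and `now` is passed explicitly to keep the example deterministic. In a shared store, the read-compare-write would need to run atomically.

```python
from typing import Dict


class GCRA:
    """Generic Cell Rate Algorithm with one stored timestamp (TAT) per key."""

    def __init__(self, rate_per_s: float, burst: int):
        self.interval = 1.0 / rate_per_s          # emission interval between requests
        self.tolerance = self.interval * burst    # how far TAT may run ahead of now
        self.tat: Dict[str, float] = {}           # key -> theoretical arrival time

    def allow(self, key: str, now: float) -> bool:
        tat = max(self.tat.get(key, now), now)
        new_tat = tat + self.interval
        if new_tat - now > self.tolerance:
            return False                          # over the limit; state is left unchanged
        self.tat[key] = new_tat
        return True


limiter = GCRA(rate_per_s=1.0, burst=3)
burst_results = [limiter.allow("tenant-a", now=0.0) for _ in range(4)]
one_second_later = limiter.allow("tenant-a", now=1.0)
still_limited = limiter.allow("tenant-a", now=1.0)
```

A burst of three is admitted at once, the fourth request is rejected, and one more is admitted after each emission interval. Rejected requests do not advance the TAT, so a client that keeps hammering is not penalized beyond the configured rate.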

Bottlenecks, Failure Modes & Mitigations

Primary risks: Cross-PoP consistency; config propagation

| Failure | Symptom | Mitigation |
|---|---|---|
| Traffic spike | Latency ↑, errors ↑ | Autoscale, queue absorption, rate limit |
| Hot key / shard | Single node saturated | Split key, local cache, random suffix |
| Dependency down | Cascading timeouts | Circuit breaker, timeouts, fallbacks |
| Data corruption | Incorrect state | Checksums, audits, idempotent replays |

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Per-tenant custom limits, burst

  • Multi-region active-active or active-passive — CAP trade-offs on writes.
  • Cost: egress, storage tiering, reserved capacity vs serverless.
  • Security: abuse, authZ scopes, encryption at rest and in transit.
  • Migration: dual-write, backfill, feature flags for rollout.


Quick Reference: Picking Building Blocks

| Need | Often choose |
|---|---|
| Strong transactions | PostgreSQL + application saga for cross-service |
| Massive write throughput | Cassandra, DynamoDB, sharded MySQL |
| Full-text search | Elasticsearch / OpenSearch |
| Async decoupling | Kafka, SQS, RabbitMQ |
| Sub-ms reads | Redis cluster + CDN |
| Blob media | S3 + CloudFront |


Part 28: Trade-Off Matrices

How to Use Matrices in Interviews

After proposing a design, summarize your decisions in a table: option A vs option B across dimensions such as latency, consistency, cost, and operational complexity. This shows the interviewer structured trade-off thinking.

SQL vs NoSQL

| Dimension | SQL (Postgres) | Document (Mongo) | Wide-column (Cassandra) | Key-value (DynamoDB) |
|---|---|---|---|---|
| Schema | Rigid, migrations | Flexible JSON | Row per partition key | Schemaless per item |
| Transactions | Multi-row ACID | Single-doc ACID | Per-partition lightweight | Conditional writes |
| Joins | Native | $lookup or app-side | Denormalize | No joins |
| Scale pattern | Read replicas + shard | Shard by key | Built for write scale | Managed partition |
| Best fit | Orders, accounts | Catalog, CMS | Feeds, metrics | Sessions, locks |

Push vs Pull (Updates)

Dimension | Push | Pull
Latency to client | Low (server initiated) | Bounded by poll interval
Server connections | Stateful (WS) | Stateless HTTP
Missed messages | Need reconnect logic | Client controls cursor
Scale cost | Connection memory | Wasted empty polls
Example | Chat, live scores | Email client sync

Fan-Out Write vs Read

Dimension | Fan-out on write | Fan-out on read
Read cost | O(1) prebuilt | O(followees) merge
Write cost | O(followers) | O(1)
Celebrity problem | Severe | Manageable
Storage | High (many copies) | Low
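The two columns above can be combined into a hybrid: fan out on write for ordinary authors and fan out on read for authors with very large followings. The sketch below is a minimal in-memory illustration; the class name, the follower threshold, and the tuple layout are all assumptions made for the example.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tune per product


class HybridFeed:
    """Hypothetical in-memory feed mixing both fan-out strategies."""

    def __init__(self, followers_of, threshold=CELEBRITY_THRESHOLD):
        self.followers_of = followers_of          # author -> set of followers
        self.threshold = threshold
        self.timelines = defaultdict(list)        # user -> [(ts, author, post)]
        self.celebrity_posts = defaultdict(list)  # author -> [(ts, author, post)]

    def publish(self, author, ts, post):
        followers = self.followers_of.get(author, set())
        if len(followers) >= self.threshold:
            # Fan-out on read: a single O(1) write, merged at read time.
            self.celebrity_posts[author].append((ts, author, post))
        else:
            # Fan-out on write: O(followers) copies into prebuilt timelines.
            for follower in followers:
                self.timelines[follower].append((ts, author, post))

    def read(self, user, following, limit=10):
        # Start from the prebuilt timeline, then merge in celebrity posts.
        items = list(self.timelines[user])
        for author in following:
            items.extend(self.celebrity_posts.get(author, []))
        return sorted(items, reverse=True)[:limit]
```

Reads stay close to O(1) for users who follow few celebrities, while a celebrity post costs one write instead of millions.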

Cache Patterns

Pattern | Consistency | Write amplification | When
Cache-aside | App-managed TTL | Low | General reads
Read-through | Cache loads on miss | Low | Simpler app code
Write-through | Sync to cache+DB | High | Strong read-after-write
Write-behind | Async to DB | Batch writes | Counters, analytics
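As an illustration of the first row, here is a minimal cache-aside sketch with a TTL and invalidate-on-write. Dicts stand in for the cache and the database, and the class and method names are assumptions made for the example.

```python
import time


class CacheAside:
    """Hypothetical cache-aside wrapper: the application manages the cache."""

    def __init__(self, db, ttl_seconds=300, clock=time.monotonic):
        self.db = db              # dict standing in for the database
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested
        self.cache = {}           # key -> (value, expires_at)
        self.db_reads = 0         # counts cache misses that hit the DB

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                       # cache hit
        self.db_reads += 1                        # miss or expired: go to DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                      # write to the source of truth
        self.cache.pop(key, None)                 # invalidate; next read refills
```

Invalidating instead of updating the cache on write keeps the write path simple, at the cost of one extra miss after each write.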

Consistency vs Availability (during partition)

Choice | During partition | Example systems
CP | Reject ops to stay consistent | ZooKeeper, etcd
AP | Accept ops; reconcile later | Cassandra, DynamoDB (default)
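Systems on the AP row often let you tune how many replicas must acknowledge a read or write. A common rule of thumb is that reads see the latest acknowledged write when R + W > N. The helper names below are assumptions, and the sketch ignores edge cases such as sloppy quorums or concurrent writes.

```python
def read_write_overlap(n, r, w):
    """True if every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def replicas_that_can_fail(n, r, w):
    """Replicas that can be down while both reads and writes still succeed."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return n - max(r, w)
```

With N=3, choosing R=2 and W=2 overlaps and tolerates one replica failure; R=1 and W=1 tolerates two failures but may return stale reads.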

Monolith vs Microservices

Factor | Monolith | Microservices
Time to market | Faster early | Slower (infra)
Scale | Vertical + replicas | Per-service scale
Failures | All-or-nothing deploy | Isolated blast radius
Data | Single DB joins | Distributed transactions hard

REST vs gRPC (internal)

Dimension | REST+JSON | gRPC
Performance | Good | Better
Contract | Loose | Strict proto
Browser | Yes | Needs gateway
Streaming | Limited | First-class

Strong vs Eventual — When to Say What

Strong: inventory, wallet, booking. Eventual: likes, view counts, recommendations.

Blob Storage in SQL vs S3

Dimension | SQL BLOB | S3
>1MB file | Bad | Good
Metadata query | Good | Need index table

Worked Example: Matrices

State the decision explicitly in the interview: 'I chose Cassandra (AP) because write load is about 500K/s and an eventually consistent timeline is acceptable.'

Extended Notes

Connect matrices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Trade-Off Matrices

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How do you present a matrix in an interview?

After proposing design: 'Summarizing: SQL for orders, Redis cache, S3 media — see trade-offs.'

Push vs pull for mobile?

Push for engagement; pull for battery-sensitive background sync.

Extended Reference — Trade-Off Matrices

Using matrices well

Do not read table verbatim — highlight 2 cells relevant to your design decision.

Consistency spectrum

Place your feature on spectrum from strong to eventual — justify with product requirement.

Cost dimension

Add row: operational complexity 1–5 — microservices score high.

When matrices fail

Nuanced decisions need prose; a matrix is a summary, not an analysis.

Compare three options

SQL vs Dynamo vs Cassandra — pick two dimensions interviewer cares about.

Part 28 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. SQL vs NoSQL matrix
  2. Push vs pull
  3. Fan-out matrix
  4. Cache pattern matrix
  5. Monolith vs micro
  6. Summarize after design
  7. Two relevant cells
  8. Cost row optional
  9. Consistency spectrum
  10. Do not read table verbatim

Self-test prompt

Explain Part 28 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 28 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 29: Common Interview Mistakes

Jumping to Diagram Too Fast

Drawing boxes before requirements loses points. Spend 5–10 minutes on functional scope, DAU, read:write ratio, latency, consistency needs.

No Numbers

Architecture without BOE feels hand-wavy. Always compute rough QPS, storage, and bandwidth.
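A back-of-envelope pass can be written as a few lines of arithmetic. The sketch below is illustrative only: the function name, the 5× peak multiplier, and the example inputs (DAU, actions per user, bytes per item) are assumptions you would state aloud and validate with the interviewer.

```python
SECONDS_PER_DAY = 86_400


def back_of_envelope(dau, reads_per_user, writes_per_user,
                     bytes_per_item, peak_multiplier=5):
    """Rough QPS, storage, and bandwidth from a handful of stated assumptions."""
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    avg_read_qps = reads_per_day / SECONDS_PER_DAY
    return {
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * peak_multiplier,
        "avg_write_qps": writes_per_day / SECONDS_PER_DAY,
        # New data written per day, in decimal gigabytes.
        "storage_gb_per_day": writes_per_day * bytes_per_item / 1e9,
        # Average outbound read traffic, in decimal megabytes per second.
        "read_bandwidth_mb_per_s": avg_read_qps * bytes_per_item / 1e6,
    }


# Example inputs (all assumed): 50M DAU, 10 reads and 1 write per user, 1 KB items.
estimate = back_of_envelope(50_000_000, 10, 1, 1_000)
```

With these inputs the average read rate comes out near 6K QPS and the peak near 30K, which is the level of precision an interviewer expects.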

Single Point of Failure Blindness

One database, one region, one cache with no replica — interviewers will probe failure. Label replicas, failover, multi-AZ.
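When the interviewer probes what happens if a single cache fails, one possible answer is to degrade to the database behind a circuit breaker. The sketch below is a minimal illustration; the class, the thresholds, and the `cache_get`/`db_get` callables are hypothetical stand-ins, not a specific library's API.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit one trial once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def read_with_fallback(key, cache_get, db_get, breaker):
    """Try the cache unless the breaker is open; otherwise read from the DB."""
    if breaker.allow():
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value, "cache"
        except ConnectionError:
            breaker.record_failure()
    return db_get(key), "db"
```

While the breaker is open, requests skip the failing cache entirely, so they pay database latency but avoid stacking cache timeouts on top of it.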

Ignoring the Hot Path

Optimize what users do 100×/day (read feed), not edge admin features. State which paths get cache, CDN, sharding.

Cache Everything

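Caching without an invalidation story or a hit-ratio assumption. Serving personalized data from a CDN without Cache-Control: private, or without a cache key that varies per user (for example via Vary headers), is a common trap.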
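A hit-ratio assumption matters because it determines how much traffic still reaches the database. The following sketch shows the arithmetic; the function names and the example numbers (30K read QPS, 1 ms cache, 20 ms database) are assumptions for illustration.

```python
def db_read_qps(total_read_qps, hit_ratio):
    """Reads that miss the cache and fall through to the database."""
    return total_read_qps * (1.0 - hit_ratio)


def average_read_latency_ms(hit_ratio, cache_ms, db_ms):
    """Expected latency, treating a miss as costing the database latency."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * db_ms


# Assumed example: 30K read QPS, 80% hit ratio, 1 ms cache, 20 ms database.
db_load = db_read_qps(30_000, 0.80)
latency = average_read_latency_ms(0.80, 1.0, 20.0)
```

At an 80% hit ratio the database still serves about 6K QPS; if the ratio drops to 50% after a cold start, that load rises to 15K, which is why the assumption should be stated explicitly.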

Wrong Database Choice

Graph DB for simple CRUD; SQL for billion-scale, write-heavy counters without a sharding or batching plan. Justify the choice with the access pattern.

Over-Engineering

Kubernetes + Kafka + microservices for 1000 users MVP. Phased approach: monolith → cache → shard → extract services.

Under-Engineering Critical Paths

Payments with eventual consistency and no idempotency. Seat booking without transactions.
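Idempotency on a payment path can be sketched with an idempotency key that maps each request to its stored result. This is a minimal in-memory illustration; the class and field names are hypothetical, and in a real system the lookup, balance check, and write would need to run in one database transaction.

```python
class PaymentService:
    """Hypothetical service showing retry-safe charges via idempotency keys."""

    def __init__(self):
        self.balances = {}     # account -> balance in cents
        self.processed = {}    # idempotency_key -> stored result

    def charge(self, idempotency_key, account, amount_cents):
        # A retried request returns the original result without charging again.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]

        balance = self.balances.get(account, 0)
        if amount_cents <= 0 or balance < amount_cents:
            result = {"status": "declined", "balance": balance}
        else:
            self.balances[account] = balance - amount_cents
            result = {"status": "charged", "balance": balance - amount_cents}

        # Store the outcome so later retries with the same key are no-ops.
        self.processed[idempotency_key] = result
        return result
```

A client that times out and retries with the same key gets the first response back, so a network glitch cannot produce a double charge.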

Not Thinking Aloud

Silent drawing confuses interviewer. Narrate trade-offs: "I could use X but choose Y because…"

Ignoring Interviewer Hints

Hints steer toward intended deep dive. If they ask "what if the DB is slow?" — discuss indexes and replicas, not unrelated CDN.

No Monitoring or Launch Plan

Senior candidates mention SLOs, feature flags, gradual rollout, rollback.

  • Fix: use Part 2 framework every time
  • Fix: end with trade-off summary table
  • Fix: invite feedback: "Should we deep dive data model or scaling?"

Red Flags Interviewers Notice

  • Vague 'we'll scale horizontally' without shard key
  • No failure discussion
  • Buzzwords without mechanism
  • Copying Netflix stack for CRUD app

Recovery Phrases

"Let me step back and clarify scale assumptions" — shows maturity when caught in hole.

Worked Example: Mistakes

Candidate drew 15 boxes in 2 minutes with no requirements — failed communication dimension.

Extended Notes

Connect mistakes to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Common Mistakes

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Biggest junior mistake?

No requirements — jump to Kafka and microservices.

Biggest senior expectation?

Operational completeness: metrics, rollout, failure modes unprompted.

Extended Reference — Common Mistakes

Time management

Spending 25 min on the DB schema before drawing the high-level diagram reverses the expected order and loses structure points.

Hint integration

Interviewer says 'what about cache' — pivot immediately; ignoring hint is negative signal.

Overconfidence

Claiming zero downtime without explaining mechanism — credibility loss.

Underconfidence

Silence is worse than a wrong attempt; voice partial ideas as you think.

Post-interview

Do not argue feedback — note and improve.

Part 29 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Requirements first
  2. BOE before diagram
  3. No SPOF blind spots
  4. Hot path focus
  5. Trade-offs spoken
  6. Think aloud
  7. Take hints
  8. Monitoring mentioned
  9. Phased rollout
  10. No buzzword soup

Self-test prompt

Explain Part 29 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 29 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 30: Communication Scripts

Opening (First 2 Minutes)

"Thanks — before I design, I want to clarify scope. Is this mobile + web? Rough scale in DAU? Should I focus on read or write path first? Any constraints like existing AWS stack or strong consistency requirements?"

Clarifying Functional Requirements

  • "Core user actions are X, Y, Z — anything else in v1?"
  • "Do we need real-time updates or is 30-second delay OK?"
  • "Public vs private content — different retention?"
  • "Anonymous users or login required?"

Clarifying Non-Functional Requirements

  • "Target p99 latency for reads? Writes?"
  • "Availability target — 99.9% or 99.99%?"
  • "Durability — can we ever lose a post / payment?"
  • "Geographic focus — single region or global?"

While Estimating

"I'll assume 50M DAU, 10 reads per user per day — that's 500M reads/day, about 6K average QPS, ~30K peak with a 5× multiplier. Does that match your expectations?"

Introducing High-Level Design

"I'll sketch clients → CDN for static → load balancer → stateless API tier → cache → primary database, with async workers on a queue for heavy tasks."

Trade-Off Phrasing

Instead of | Say
"We'll use NoSQL" | "Access pattern is key-value by user_id; I'll use Dynamo for horizontal scale; we give up cross-shard joins"
"We'll cache it" | "80% hit ratio assumed; TTL 5 min with invalidation on write"
"Eventually consistent" | "Followers may see new post up to 30s late; acceptable for feed per product"

When Stuck

"I'm weighing fan-out on write vs read — for celebrities, hybrid is industry standard. I'll go hybrid unless you want to optimize for write simplicity."

Closing (Last 2 Minutes)

"To recap: stateless APIs behind LB, Redis timeline cache with hybrid fan-out, Postgres sharded by user_id, S3 for media, Kafka for async. I'd add p99 latency and replication lag alerts. With more time I'd detail search indexing and multi-region DR."

Responding to Challenges

"Good point — if the cache fails we degrade to DB with circuit breaker and higher latency; we don't fail closed unless data correctness requires it."

Deep Dive Invitation

"I can go deeper on data model, consistency, or ops — which is most valuable?"

Acknowledging Unknown

"I haven't operated Cassandra in prod; at high level it uses partition keys and tunable quorum — I'd partner with DBA for SLA specifics."

Worked Example: Scripts

Practice recording 5-min clarify+BOE aloud weekly; playback catches filler and silence.

Extended Notes

Connect scripts to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Communication

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How long to clarify?

5–10 min acceptable — shows thoroughness; don't exceed without checkpoint.

How to handle 'you're wrong'?

Explore: 'If we need strong consistency here, I'd move writes to primary — does that match product?'

Extended Reference — Communication Scripts

Pacing

Pause after BOE: 'Does 100M DAU sound right?' — engages interviewer as collaborator.

Jargon control

Define acronyms once: 'CDN (edge cache)' — interviewer may be cross-functional.

Diagram narration

Left to right: 'User hits CDN, then...' — orient viewer continuously.

Trade-off sandwich

We gain X, we sacrifice Y, because product priority Z.

Closing question

Ask interviewer: 'What would you prioritize next for v2?' — shows curiosity.

Part 30 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Opening clarify script
  2. Assumption validation
  3. BOE narrated
  4. Trade-off sandwich
  5. Deep dive offer
  6. Stuck recovery phrase
  7. Closing recap 30s
  8. Ask interviewer question
  9. Acknowledge challenge
  10. Collaborative tone

Self-test prompt

Explain Part 30 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 30 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 31: 8-Week & 12-Week Study Plans

8-Week Plan (Intensive)

Week | Focus | Daily topics (Mon–Sun)
1 | Foundation | Part 0–2; 1 BOE exercise/day; 1 mock clarify-only
2 | Estimation & scale | Part 3–4; daily latency quiz; scale 5 products on paper
3 | Networking & caching | Part 5–7; draw CDN+LB for 3 apps
4 | Databases | Part 8–12; SQL vs NoSQL matrix; saga exercise
5 | Distributed systems | Part 11–13; CAP scenarios; Kafka ordering drill
6 | Architecture styles | Part 14–17; rate limiter design; consistent hash drill
7 | Ops & patterns | Part 18–21; SLO math; circuit breaker scenarios
8 | Mocks & execution | Part 27–33; 3 full mocks; rubric self-score

12-Week Plan (Steady)

Week | Topics | Practice
1–2 | Parts 1–3, 28–30 | 2 BOE drills/week; communication scripts aloud
3–4 | Parts 4–7 | 1 design: URL shortener, rate limiter
5–6 | Parts 8–10 | 1 design: Twitter feed, shard key exercises
7–8 | Parts 11–15 | 1 design: chat; API style comparison writeup
9–10 | Parts 16–22 | 1 design: Dropbox, payment ledger outline
11 | Parts 23–26 | 1 design: notification system end-to-end
12 | Parts 27, 32–33 | 4 full timed mocks; review mistake list

Daily 90-Minute Block Template

  1. 15 min — flash review (latency table, CAP, one matrix)
  2. 45 min — read one Part section deeply; notes in own words
  3. 30 min — whiteboard mini-design or explain aloud recorded

Weekend Deep Work

Saturday: full 45-min mock with peer or AI. Sunday: postmortem using Part 33 rubric; update weak-area queue for next week.

Part 27 Walkthrough Rotation

Week 8+: one classic design daily from guide Part 27: URL shortener, Twitter, Uber, WhatsApp, YouTube.

Spaced Repetition

Re-read Parts 3, 11, 28 every 2 weeks — core interview anchors.

Worked Example: Study

Track hours: 40% reading, 40% whiteboard, 20% mock — adjust if mocks score low.

Extended Notes

Connect study to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Study Plans

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

8 vs 12 week plan?

8 weeks if the interview is about 2 months away and you can study intensively; 12 weeks if you are studying part-time while employed.

How many mocks?

Minimum 8–12 full mocks before onsite loop.

Extended Reference — Study Plans

Active recall

Close guide; sketch Twitter on blank paper from memory — gaps drive next reading.

Spaced repetition

Anki deck for latency numbers, CAP, algorithms — 10 min daily.

Peer mocks

Swap interviewer role — teaching exposes gaps.

Company-specific

Meta: feed/ranking. Amazon: retail inventory. Google: search/index. Stripe: payments/idempotency.

Burnout prevention

One day off weekly — retention drops when exhausted.

Part 31 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. 8-week plan track
  2. 12-week if employed
  3. Daily 90 min block
  4. Weekend mock
  5. Part 27 rotation
  6. Spaced repetition
  7. Active recall
  8. Company specific focus
  9. Peer exchange
  10. Rest day weekly

Self-test prompt

Explain Part 31 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 31 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 32: Day-Before & Day-Of Checklist

Day Before Interview

  • Review latency numbers (Part 3 table) — 10 min
  • Skim trade-off matrices (Part 28) — 15 min
  • Re-read communication scripts (Part 30) — 10 min
  • One 25-min timed mini-design (clarify + BOE + high-level only)
  • Prepare 2 questions for interviewer about team/system
  • Test whiteboard tool (Excalidraw, CoderPad), camera, mic, internet backup
  • Sleep 7+ hours — cognitive performance drops sharply when tired

Day Of — 2 Hours Before

  • Light breakfast; hydrate
  • No cramming new topics — confidence from frameworks
  • Close noisy apps; phone silent
  • Open blank board tab + one-page cheat sheet (BOE formulas only)

15 Minutes Before

  • Bathroom, water nearby
  • Deep breath; review opening script once
  • Remind: collaboration, not exam — think aloud

During Interview

  1. Clarify requirements before drawing
  2. State assumptions and ask validation
  3. BOE before deep architecture
  4. Label diagram components and arrows
  5. Pause for questions: "Does this direction make sense?"
  6. Leave 5 min for summary and trade-offs

After Interview

Write notes while fresh: questions asked, hints given, what to study. Do not obsess over the outcome; process improvement matters.

Item | Done?
Tool tested | ___
Framework internalized | ___
Opening script ready | ___
Questions for interviewer | ___

Virtual Interview Setup

  • Second monitor for notes
  • Browser zoom 100%
  • Pen and paper backup if whiteboard fails

Energy Management

Back-to-back interviews: protein snack between; avoid a heavy, carb-laden lunch that causes an energy crash.

Worked Example: Checklist

Bring water. The interviewer will wait if you need 10 seconds to think; say 'let me structure this.'

Extended Notes

Connect checklist to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Checklists

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Whiteboard tool failure?

Verbal description + ASCII in chat — communication still scored.

Post-interview note?

Within 1 hour: questions, hints, weak dimensions for next study week.

Extended Reference — Day-Before & Day-Of

Materials

Water, charger, backup internet hotspot for virtual.

Mindset

Interview is collaborative design session not exam — reduces anxiety.

During lag

If video freezes, summarize last sentence when reconnected — maintain thread.

Note taking

Interviewer may allow notes — have BOE formulas written.

After

A thank-you note is not required at big tech; focus on your own debrief.

Part 32 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Tool tested
  2. Latency table skim
  3. Matrices skim
  4. Opening script
  5. Water, charger
  6. No cram new topic
  7. Think pause OK
  8. Post debrief notes
  9. Questions for interviewer
  10. Sleep priority

Self-test prompt

Explain Part 32 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 32 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 33: Mock Interview Rubric — Self-Score

How to Use This Rubric

After each mock, score 1–5 per dimension (1 = weak, 5 = strong). Track weekly; target average ≥4 on dimensions that matter for level. Compare with Part 1 interviewer expectations.

Scoring Scale

Score | Meaning
1 | Missing or incorrect
2 | Superficial mention
3 | Adequate with gaps
4 | Solid, minor misses
5 | Strong, proactive depth

Dimension Definitions

Dimension | Score 1 | Score 5
Requirements | Jumped to design | Functional + NFR + scale + constraints
Estimation | No numbers | Full BOE chain with stated assumptions
High-level design | Confusing diagram | Clear layers, labeled flows
Data model | Missing schema | Tables/keys/indexes justified
Scaling | No sharding/cache | Hot keys, replicas, CDN addressed
Reliability | Happy path only | Failures, retries, SPOF mitigation
Trade-offs | One-sided | Explicit pros/cons; matrices
Communication | Silent or rambling | Structured, collaborative, concise

Self-Score Sheet (Copy Per Mock)

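Dimension | 1–5 | Notes / evidence
Requirements & scope | ___ | ___
Back-of-envelope | ___ | ___
API / interface design | ___ | ___
High-level architecture | ___ | ___
Data storage & model | ___ | ___
Caching & CDN | ___ | ___
Async / queues | ___ | ___
Scaling & sharding | ___ | ___
Consistency & reliability | ___ | ___
Security & privacy | ___ | ___
Observability & ops | ___ | ___
Communication | ___ | ___
Total | ___ /60 | ___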

Interpretation

  • 48–60: Interview-ready for most senior loops
  • 36–47: Targeted study on lowest 3 dimensions
  • <36: Repeat framework (Part 2); more mocks before real interviews
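The score sheet and the interpretation bands above can be tallied with a short script. This is a convenience sketch only; the function name and return shape are assumptions, while the dimension names and thresholds are copied from the sheet.

```python
DIMENSIONS = [
    "Requirements & scope", "Back-of-envelope", "API / interface design",
    "High-level architecture", "Data storage & model", "Caching & CDN",
    "Async / queues", "Scaling & sharding", "Consistency & reliability",
    "Security & privacy", "Observability & ops", "Communication",
]


def summarize(scores):
    """Return (total out of 60, interpretation, three lowest dimensions)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each score must be between 1 and 5")

    total = sum(scores[d] for d in DIMENSIONS)
    if total >= 48:
        verdict = "Interview-ready for most senior loops"
    elif total >= 36:
        verdict = "Targeted study on lowest 3 dimensions"
    else:
        verdict = "Repeat framework (Part 2); more mocks before real interviews"

    # sorted() is stable, so ties keep the order of the score sheet.
    lowest = sorted(DIMENSIONS, key=lambda d: scores[d])[:3]
    return total, verdict, lowest
```

Logging the three lowest dimensions after every mock feeds directly into the action template that follows.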

Action Template

Lowest dimension this week: ___. Study Part #___. Drill: one mock focusing only on that phase next session.

Peer Mock Exchange

Swap rubrics with study partner; score each other blind; compare self vs peer scores for calibration.

Weekly Trend

Plot total score week over week — plateau means change mock format (harder problems, shorter time).

Worked Example: Rubric

Score communication separately even if the design was weak; strong communication can help in borderline hire/no-hire cases.

Extended Notes

Connect rubric to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Rubric

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Self-score inflation?

Compare with peer mock scores — calibrate harshly on communication and depth.

Hire bar mapping?

48/60+ consistent across 3 mocks suggests readiness for many FAANG loops.

Extended Reference — Mock Interview Rubric

Calibration

Score first mock harshly (3 average) — improvement visible by mock 5.

Dimension weighting

L5: depth + trade-offs weighted higher than perfect diagram art.

Communication 5

Requires thinking aloud entire session without long silence.

Tracking spreadsheet

Date, problem, scores per dimension, action items — weekly review.

Hire decision

Rubric guides study; actual hire uses holistic loop — don't overfit one mock score.

Part 33 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

  1. Score 12 dimensions
  2. Notes column evidence
  3. Weekly trend
  4. Peer calibration
  5. Action on lowest
  6. 48+ target
  7. Communication separate
  8. Mock count 8+
  9. Honest scoring
  10. Hire bar holistic

Self-test prompt

Explain Part 33 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to the rubric dimensions above: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 33 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

This guide was created with the help of Cursor, which assisted with structuring, drafting, and refining the content for clarity and completeness.