System Design Interview — Complete Mastery Guide

A self-contained reference for FAANG-style and senior engineering system design interviews

How to Use This Guide

This document is designed as your single source of truth for system design interview preparation. Read it linearly once for orientation, then use the sidebar table of contents to drill into weak areas. Each part builds on prior concepts: fundamentals (Parts 1–3), building blocks (Parts 4–20), patterns, storage, and domain designs (Parts 21–26), full designs (Part 27), and interview execution (Parts 28–33).

First pass (2 weeks): Parts 0–3, 28–30. Skim Part 27 walkthrough titles.
Second pass (4 weeks): Parts 4–20 in depth. Do one Part 27 walkthrough per day.
Third pass (ongoing): Mock interviews using Part 33 rubric. Re-read trade-off matrices before interviews.

Practice aloud: explain diagrams as if to an interviewer. Time-box yourself to 45 minutes per mock design.

Study modes: (1) Reading mode — understand concepts. (2) Active recall — cover diagrams and explain from memory. (3) Timed mock — random Part 27 problem, 45 min timer. (4) Peer review — swap designs and critique using Part 33 rubric.

Document Map

Parts	Topic	When to study
1–3	Interview mechanics & estimation	Week 1
4–7	Traffic path: LB, cache, CDN	Week 2
8–12	Data: DB, replication, transactions	Week 3–4
13–20	Distributed systems & ops	Week 5–6
21–26	Patterns & domain designs	Week 7
27	25 full walkthroughs	Daily practice weeks 8–12
28–33	Execution & checklists	Before every interview

Part 1: Interview Format & What Interviewers Score

Typical 45–60 Minute Structure

Most system design interviews at large tech companies run 45–60 minutes with a single problem. The first 5–10 minutes are requirements and scope; 10–15 minutes high-level architecture; 20–30 minutes deep dives on data model, scaling, and failure modes; the last 5 minutes trade-offs and extensions.

Phase	Time	Your Goal
Clarify & scope	5–10 min	Functional/non-functional requirements, users, scale, constraints
High-level design	10–15 min	Boxes-and-arrows: clients, LB, services, caches, DBs, queues
Deep dive	20–30 min	Schema, APIs, sharding, consistency, bottlenecks — interviewer-led
Wrap-up	5 min	Summary, monitoring, future work, what you'd do with more time

What Interviewers Score

Interviewers use a holistic rubric, not a single correct diagram. They evaluate:

Problem solving: Can you decompose an ambiguous problem and prioritize what matters?
Technical depth: Do you understand how databases, caches, queues, and networks behave at scale?
Trade-off reasoning: Can you articulate why you chose SQL vs NoSQL, sync vs async replication, etc.?
Communication: Do you think aloud, check assumptions, and respond to hints?
Operational awareness: Monitoring, failure modes, security, cost — not just happy path.

Senior vs Mid-Level Expectations

Dimension	Mid (L4/L5)	Senior (L6+)
Scope	One clear product feature	Multi-region, org boundaries, platform concerns
Depth	Correct building blocks	CAP, consistency, idempotency, saga, observability SLOs
Leadership	Follows hints	Proactively surfaces risks, drives discussion
Estimation	Order-of-magnitude OK	Back-of-envelope with explicit assumptions

Virtual Whiteboard Tips

Use a consistent layout: users left, data stores right, async flows bottom.
Label arrows (HTTPS, events, replication). Unlabeled lines confuse you and the interviewer.
Draw incrementally — don't erase entire diagrams; add layers (MVP → scale).
Keep text large enough to read on a shared screen; abbreviate (API GW, DB) consistently.
Excalidraw, Miro, or built-in CoderPad — practice one tool before interview day.
When stuck, narrate: "I'd pause here and validate QPS assumptions with you."

Part 2: The Answer Framework

Use this repeatable framework for every design question. Interviewers recognize structured thinking even when the exact architecture differs.

Step 1: Requirements

Functional: What the system must do (users post tweets, shorten URLs, book seats).

Non-functional: Scale, latency, availability, durability, consistency, security, cost.

Example script: "I'll assume 100M DAU, read-heavy 100:1, p99 read latency under 200ms, 99.9% availability unless we need stronger consistency for payments."

Step 2: Constraints & Assumptions

Budget, team size, existing stack, regulatory (GDPR, PCI), geographic focus
Explicitly state what you are not building (e.g., ML ranking v1, admin portal)

Step 3: Back-of-the-Envelope

DAU → QPS (peak ~2–5× average), storage per object × objects/year, bandwidth. See Part 3.

Step 4: API Design

RESTful resources or RPC methods; idempotency keys for writes; pagination cursors. Keep to 5–8 core endpoints in the interview.

Step 5: High-Level Diagram

[Clients] → [CDN] → [LB] → [API Servers] → [Cache]
                              ↓
                    [Workers] ← [Queue] → [DB / Object Store]

Step 6: Deep Dives

Interviewer picks: data model, hot paths, sharding key, cache strategy, fan-out, consistency.

Step 7: Bottlenecks & Mitigations

DB write throughput, hot keys, thundering herd, single points of failure — pair each with a fix.

Step 8: Trade-offs Summary

One sentence each: "We chose eventual consistency for feeds because… at the cost of…"

Step 9: Closing Summary

Recap architecture in 30 seconds; mention monitoring and phased rollout.

Part 3: Back-of-the-Envelope Estimation

Why Estimation Matters in Interviews

Interviewers rarely expect exact numbers; they want to see that you decompose a fuzzy problem into measurable quantities, state assumptions explicitly, and sanity-check whether your architecture can handle the load. A five-minute back-of-the-envelope (BOE) prevents you from proposing a single MySQL instance for a billion-read-per-day product.

Good estimation is a chain of reasoning: daily active users (DAU) lead to actions per day, which become average and peak queries per second (QPS), which drive storage growth, egress bandwidth, cache sizing, and shard counts. Each hop should be spoken aloud so the interviewer can correct assumptions early.

Script: "With 100M DAU and each user viewing 20 pages/day, that is 2B page views/day. Dividing by 86,400 seconds gives ~23K average QPS; peak is often 2–5×, so I will plan for ~100K read QPS at peak."

Latency Numbers Every Engineer Should Know

Memorize orders of magnitude so you can reason without looking up charts. Times vary by hardware; use these as interview anchors when arguing for caches, CDNs, or async processing.

Operation	Typical Latency	Notes
L1 cache reference	0.5 ns	CPU-local
Branch mispredict	5 ns	Pipeline flush
L2 cache	7 ns
Mutex lock/unlock	25 ns	Uncontended
Main memory reference	100 ns	DDR4/5
SSD random read	16 µs	NVMe faster
Round trip in datacenter	0.5 ms	Same AZ
Redis/Memcached RTT	0.5–1 ms	Local network
SSD sequential 1 MB	1 ms
Disk seek (HDD)	10 ms	Avoid in hot path
Send 1 MB over 1 Gbps LAN	10 ms
Cross-country RTT	40–80 ms	US coast-to-coast
Read 1 MB from S3 (first byte)	100–300 ms	Region-dependent
Database query (simple indexed)	1–10 ms	Local DB
Complex DB join / full scan	10–100+ ms	Why indexes matter

Rule of thumb: one cross-region RTT (50–150 ms) dominates a datacenter cache hit (sub-ms). If your design needs 20 sequential RPCs across regions, latency will exceed 1 second before application logic runs — batch, parallelize, or move data closer.

From DAU to QPS

requests_per_day = DAU × actions_per_user_per_day
avg_QPS = requests_per_day / 86,400
peak_QPS ≈ avg_QPS × peak_multiplier   # often 2–5× for consumer apps

Example: 50M DAU, 10 timeline loads/day → 500M reads/day → ~5,800 average QPS → ~29K peak at 5× multiplier during evening hours.
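
The formulas above can be sketched as a small helper. This is a minimal illustration, not a capacity-planning tool; the function name estimate_qps and its default multiplier are assumptions made for this example.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(dau, actions_per_user_per_day, peak_multiplier=5.0):
    """Return (average QPS, peak QPS) for a given DAU and per-user activity."""
    requests_per_day = dau * actions_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_multiplier  # 2-5x is typical for consumer apps
    return avg_qps, peak_qps


# The timeline example: 50M DAU, 10 loads per user per day, 5x peak.
avg, peak = estimate_qps(dau=50_000_000, actions_per_user_per_day=10, peak_multiplier=5)
print(f"avg ~ {avg:,.0f} QPS, peak ~ {peak:,.0f} QPS")
```

In an interview the arithmetic is done aloud; the point of the helper is only to make each assumption (DAU, actions per user, multiplier) an explicit input.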

Writes: Often one to two orders of magnitude lower than reads in social and feed products. State read:write ratio explicitly (e.g., 100:1). For write-heavy systems (logging, IoT ingestion), invert the analysis and size for ingest QPS first.

Storage Estimation Formulas

storage_per_year = objects_per_year × bytes_per_object × replication_factor
objects_per_year = (new_objects_per_second) × 86,400 × 365

Object	Size (order of magnitude)
User profile row	1–4 KB
Tweet / short post metadata	300 B – 2 KB; media separate
Image (compressed)	200 KB – 2 MB
Video minute (1080p)	50–150 MB
Log line (JSON)	0.5–2 KB
UUID + indexes overhead	+30–50% on row size

Worked example: 10M new photos/day × 500 KB average × 3× replication ≈ 15 TB/day before compression and lifecycle tiering — clearly object storage (S3/GCS) territory, not inline BLOB columns in OLTP.
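
The photo example can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes), which is the usual shortcut in back-of-the-envelope work; the helper name is hypothetical.

```python
KB = 1_000
TB = 1_000_000_000_000


def storage_bytes_per_day(objects_per_day, bytes_per_object, replication_factor=3):
    """Raw bytes written per day, including replicas."""
    return objects_per_day * bytes_per_object * replication_factor


per_day = storage_bytes_per_day(10_000_000, 500 * KB, replication_factor=3)
per_year = per_day * 365

print(f"{per_day / TB:.1f} TB/day, {per_year / TB:,.0f} TB/year before compression")
```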

Account for soft deletes, audit trails, and backups: operational storage often exceeds raw user data by 2×. Cold tier (Glacier) reduces cost but not logical size on planning spreadsheets.

Power of Two for Capacity Planning

Power	Exact	Approx
2^10	1,024	~1 thousand (1 KB)
2^20	1,048,576	~1 million (1 MB)
2^30	1,073,741,824	~1 billion (1 GB)
2^40	~1.1×10^12	~1 trillion (1 TB)
2^50	~1.1×10^15	~1 quadrillion (1 PB)

Use powers of two when estimating shard counts, hash ring size, and memory: a 32-bit user ID space has 4B values; at 1 KB per cached profile, fully populated memory would be 4 TB (never fully hot). Sharding by user_id mod 1024 yields 1024 shards — a clean power-of-two boundary.
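
One reason a power-of-two shard count is convenient: the modulo can be replaced by a bitmask. The sketch below assumes non-negative integer user IDs that are already reasonably well distributed; skewed or sequentially allocated IDs would normally be hashed first. The names are illustrative only.

```python
NUM_SHARDS = 1024             # must be a power of two for the mask to work
SHARD_MASK = NUM_SHARDS - 1   # 1023, i.e. 0x3FF


def shard_for(user_id):
    """Map a user ID to a shard index in [0, NUM_SHARDS)."""
    return user_id & SHARD_MASK


# The mask gives the same answer as user_id % 1024.
print(shard_for(123_456_789), 123_456_789 % NUM_SHARDS)
```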

Bandwidth Estimation

egress_Gbps = peak_QPS × avg_response_bytes × 8 / 10^9

1 Gbps ≈ 125 MB/s theoretical maximum. A 500 KB JSON API at 10K QPS needs roughly 40 Gbps egress at origin — CDN edge caching and compression are mandatory, not optional optimizations.
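
The egress formula translates directly into code. This is a rough sketch; the cdn_hit_ratio parameter and the 90% figure in the usage line are assumptions added to show how edge caching changes origin egress.

```python
def egress_gbps(peak_qps, avg_response_bytes, cdn_hit_ratio=0.0):
    """Origin egress in Gbps; note the x8 to convert bytes to bits."""
    origin_qps = peak_qps * (1.0 - cdn_hit_ratio)
    return origin_qps * avg_response_bytes * 8 / 1e9


print(egress_gbps(10_000, 500_000))                      # no CDN
print(egress_gbps(10_000, 500_000, cdn_hit_ratio=0.9))   # assumed 90% edge hit ratio
```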

Include upload bandwidth for user-generated content: 1M uploads/day × 2 MB average = 2 TB/day ≈ 23 MB/s average if spread evenly. In reality, peak upload windows concentrate load on ingress load balancers and object-store write paths.

Availability Math

Independent components in series multiply reliability: if A is 99.9% and B is 99.9%, combined ≈ 99.8%. Parallel redundancy improves availability: 1 - (1-p)^n for n identical redundant nodes.

Nines	Downtime/year
99%	3.65 days
99.9%	8.76 hours
99.99%	52.6 minutes
99.999%	5.26 minutes
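
The series and parallel rules, plus the nines-to-downtime conversion, fit in three small functions. A minimal sketch, assuming independent failures and a 365-day year; the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24


def series(*availabilities):
    """All components must be up: multiply availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(p, n):
    """At least one of n identical redundant nodes is up: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


def downtime_hours_per_year(availability):
    return (1.0 - availability) * HOURS_PER_YEAR


print(series(0.999, 0.999))            # two 99.9% components in series
print(parallel(0.99, 2))               # two redundant 99% nodes
print(downtime_hours_per_year(0.999))  # hours of downtime at three nines
```

Real components rarely fail independently (shared power, shared deploys), so treat the parallel result as an upper bound.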

Interview tip: tie nines to product — 99.9% may be fine for a news feed; payment authorization often needs multi-region active-active and stricter SLOs. Mention error budgets (Part 18) when discussing how much downtime is acceptable.

Servers as a Sanity Check

Rough capacity: one modern app server might handle 500–2,000 RPS for light JSON (highly workload-dependent). 100K QPS divided by 1K per server ≈ 100 servers before cache — then apply cache hit ratio: 90% hit rate cuts origin load by 10×.

Database connection limits often bind before CPU: 500 app servers × 10 connections each = 5,000 connections — many managed Postgres tiers cap below that, requiring PgBouncer or fewer, larger connection pools with careful tuning.
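
The sanity check above, written out as a sketch. The per-server throughput and pool size are the illustrative figures from the text, not measurements, and the helper names are hypothetical.

```python
import math


def app_servers_needed(peak_qps, rps_per_server):
    """Servers required to absorb peak traffic, rounded up."""
    return math.ceil(peak_qps / rps_per_server)


def origin_qps(peak_qps, cache_hit_ratio):
    """Traffic that still reaches the backing store after the cache."""
    return peak_qps * (1.0 - cache_hit_ratio)


def total_db_connections(app_servers, pool_size_per_server):
    """Connections the database sees if every server fills its pool."""
    return app_servers * pool_size_per_server


print(app_servers_needed(100_000, 1_000))
print(round(origin_qps(100_000, cache_hit_ratio=0.9)))
print(total_db_connections(500, 10))
```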

[Assumption chain]
  DAU → actions/day → QPS (avg & peak)
       → storage/year (× replication)
       → bandwidth (× bytes/response)
       → cache hit ratio → DB QPS
       → shard count / machine count
       → monthly cost (servers + egress + storage)

Common BOE Mistakes

Forgetting peak multiplier and planning only for average QPS
Ignoring replication factor and backup storage in disk math
Using HDD seek latency assumptions for SSD/NVMe-backed stores
Treating CDN hit ratio as 100% without stating edge cache assumptions
Confusing bits and bytes in bandwidth (×8 conversion)

Practice Problem

Design a photo-sharing app BOE: 20M DAU, 3 photo views and 0.2 uploads per user per day, 400 KB average display size, 2 MB upload, 5-year retention. Walk through QPS, storage/year, and peak egress. Compare with and without 85% CDN cache hit on reads.

Interview BOE Drills

Practice these until automatic:

URL shortener: 100M URLs/month → writes/s; 10:1 read → read QPS; 500 B row → GB/year
Photo app: 10M photos/day × 2 MB → 20 TB/day raw; CDN hit ratio effect on origin
Chat: 1M concurrent × 1 msg/min → message QPS; WS memory per connection

Latency Budget Example

p99 target 200ms for API: CDN 20ms + LB 5ms + app 30ms + cache 2ms + DB 80ms + serialization 10ms + margin 53ms. If DB is 80ms, you cannot add 5 sequential microservice hops without busting budget.
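
A latency budget is just a sum checked against a target. The sketch below uses the component figures from the example; the 15 ms cost per extra hop is an assumed number chosen only to illustrate the check.

```python
BUDGET_MS = {"cdn": 20, "lb": 5, "app": 30, "cache": 2, "db": 80, "serialization": 10}


def remaining_margin_ms(target_ms, components):
    """Headroom left after summing the per-component latencies."""
    return target_ms - sum(components.values())


def fits_in_budget(target_ms, components, extra_hops, per_hop_ms):
    """True if adding sequential hops still leaves non-negative margin."""
    return remaining_margin_ms(target_ms, components) - extra_hops * per_hop_ms >= 0


print(remaining_margin_ms(200, BUDGET_MS))
print(fits_in_budget(200, BUDGET_MS, extra_hops=5, per_hop_ms=15))  # 15 ms/hop is assumed
```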

Availability Budget Math

99.9% monthly ≈ 43 minutes of downtime. If you deploy 20 times/month with a 0.1% blast radius per deploy, plan canary releases and automatic rollback. An error budget policy links reliability to release velocity.

Worked Example: News Site BOE

Assumptions: 20M DAU, 50 article views/user/day, 500 KB average page (HTML+assets), 80% CDN hit ratio.

Views/day = 20M × 50 = 1B. Avg QPS = 1B/86400 ≈ 11,600. Peak 5× ≈ 58,000 read QPS.

Origin QPS = 58K × 20% = 11,600 if CDN handles 80%. Egress without CDN: 1B × 500 KB = 500 TB/day, about 46 Gbps averaged over the day and several times that at peak, which is impractical to serve from origin alone.

Storage: 10K new articles/day × (50 KB text + 2 MB images) × 3 replicas ≈ 60 GB/day, dominated by images (text alone is about 1.5 GB/day). Media lives in S3 and is not counted in DB row size.

Q&A

Q: Why powers of two for shards? A: Clean routing bitmask (user_id & 0x3FF), even split in consistent hash rings.

Q: How many servers for 58K QPS? A: If 2K RPS/instance → ~30 origin app servers before DB/cache; cache cuts DB load further.

Bandwidth Worked Numbers

Payload	QPS	Egress Gbps
1 KB JSON	100K	0.8
10 KB	100K	8
100 KB	100K	80
1 MB	10K	80

Interview Question Bank — Back-of-Envelope

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How do you estimate peak QPS from DAU?

DAU × actions per day / 86400 × peak multiplier (2–5×). State assumptions explicitly.

How much storage for 5 years of tweets?

Daily tweets × size × 365 × 5 × replication. Separate media to object storage.

What latency dominates cross-region design?

RTT 50–150ms per round trip — minimize sequential RPCs.

How do you convert availability % to downtime?

99.9% ≈ 8.76 hours/year. Use for error budget discussions.

Additional BOE Practice

Review this section with Part 27 walkthroughs and apply BOE calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 3 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Latency table memorized
DAU→QPS formula
Storage/year calc
Bandwidth Gbps
Power of two
Availability nines
Assumption chain spoken
Peak multiplier 2–5×
Sanity check servers
CDN impact on egress

Self-test prompt

Explain Part 3 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 3 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 4: Scalability

What Scalability Means

Scalability is the ability of a system to handle increased load by adding resources without redesigning core architecture. In interviews, distinguish vertical scaling (bigger machines) from horizontal scaling (more machines). Most web-scale systems scale stateless tiers horizontally and partition stateful data.

Scalability has dimensions: load (QPS), data volume, fan-out complexity, geographic distribution, and team/org scale. Clarify which dimension dominates for the problem at hand.

Vertical vs Horizontal Scaling

Aspect	Vertical (scale-up)	Horizontal (scale-out)
How	More CPU/RAM/disk on one node	Add nodes behind LB
Limits	Hardware ceiling, single point of failure	Requires partition-friendly design
Cost curve	Expensive high-end boxes	Commodity hardware, linear-ish
Downtime	Often requires restart	Rolling deploys, replace nodes
Interview use	Quick MVP, DB until sharding	Default for stateless app tier

Databases often scale vertically first (bigger instance), then add read replicas, then shard horizontally. Application servers scale horizontally from day one in most designs.

Stateless Application Tier

Stateless servers store no session data locally; any request can land on any instance. Session state lives in client tokens (JWT), centralized session store (Redis), or database. This enables elastic autoscaling and zero-downtime deploys.

         [LB]
       /  |  \
 [App1][App2][App3]  ← no local session
       \  |  /
      [Redis sessions] or [JWT in cookie]

Anti-pattern: keeping files or session state on each server's local disk without shared storage. It breaks scale-in and causes data loss on node termination.

Sticky Sessions

Load balancers can pin a user to one backend via cookie or connection affinity. Useful when legacy app keeps local cache or non-replicated sessions. Downsides: uneven load, poor failover, complicates deploys and autoscaling.

When acceptable: Short migration period, WebSocket origin pinning with reconnect logic
Prefer instead: External session store, stateless APIs, connection draining on deploy
If you mention sticky sessions, always note load imbalance risk and mitigation (session replication)

Autoscaling

Autoscaling adjusts instance count based on metrics (CPU, request count, queue depth, custom business metrics). Scale-out triggers add capacity before SLO breach; scale-in removes idle capacity to save cost.

Signal	Pros	Cons
CPU utilization	Simple	Laggy; misleads on I/O-bound work
Request rate / latency p99	User-visible	Needs good LB metrics
Queue depth	Great for workers	Not for synchronous API tier alone
Schedule-based	Predictable peaks (TV events)	Wastes capacity if wrong

Cooldown periods prevent flapping. Warm pools and pre-warmed AMIs reduce cold-start latency for latency-sensitive APIs. Mention minimum instance count for availability during scale-from-zero (if allowed).
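
A toy controller shows how a cooldown and a gap between the scale-out and scale-in thresholds keep instance counts from flapping. This is a sketch under assumed numbers (70% and 30% thresholds, 300 s cooldown, doubling on scale-out); real autoscalers expose their own policy settings, and the class name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Autoscaler:
    instances: int = 2
    min_instances: int = 2          # floor kept for availability
    max_instances: int = 100
    scale_out_above: float = 0.70   # utilization that triggers adding capacity
    scale_in_below: float = 0.30    # lower threshold; the gap is the hysteresis band
    cooldown_s: float = 300.0
    _last_change: float = float("-inf")

    def observe(self, utilization, now):
        """Feed one utilization sample; return the resulting instance count."""
        if now - self._last_change < self.cooldown_s:
            return self.instances                  # still cooling down: ignore the sample
        target = self.instances
        if utilization > self.scale_out_above:
            target = min(self.max_instances, self.instances * 2)   # scale out quickly
        elif utilization < self.scale_in_below:
            target = max(self.min_instances, self.instances - 1)   # scale in slowly
        if target != self.instances:
            self.instances = target
            self._last_change = now
        return self.instances


scaler = Autoscaler()
for t, util in [(0, 0.9), (10, 0.9), (400, 0.9), (800, 0.5), (1200, 0.1)]:
    print(t, util, scaler.observe(util, now=t))
```

Scaling out by doubling and in by one instance at a time reflects the common preference for adding capacity faster than removing it.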

Scaling Stateful Components

Caches scale via clustering and consistent hashing. Databases scale via read replicas, sharding, and federation. Queues scale via partitions and consumer groups. Each stateful layer needs its own scaling story — do not assume app autoscaling fixes DB writes.

Bottleneck Hierarchy

Single DB master write throughput
Hot keys / hot partitions
Expensive synchronous RPC chains
Lock contention on shared resources
Thundering herd on cache miss
Cross-region replication lag

Interview flow: identify the first bottleneck at estimated peak load, propose mitigation, re-estimate capacity, repeat.

Elasticity vs Performance

Serverless and aggressive autoscaling maximize elasticity; fixed large pools minimize tail latency variance. Cost vs latency trade-off: financial systems may keep warm capacity; batch analytics may scale to zero overnight.

Senior signal: Discuss scaling limits of the team — microservices scale independently but multiply operational overhead. A monolith with modular boundaries may scale further with one on-call rotation.

Case Study: E-commerce Checkout

Browse/catalog tier: horizontal stateless, CDN, read replicas. Cart: Redis per user with TTL. Checkout: smaller pool, stricter timeouts, idempotent payment API, queue for order fulfillment. Browse traffic can run roughly 100× checkout traffic, so the tiers scale separately under different policies.

[Browse]  → many replicas, CDN, cache-heavy
[Cart]    → Redis cluster, moderate replicas
[Checkout]→ few replicas, sync payment, saga async

Scaling Case Study

Instagram scaled Python app servers horizontally behind LB; Memcached for hot objects; sharded Postgres/Cassandra for data. Key lesson: stateless app tier scales linearly until database becomes bottleneck — then shard or cache.

Auto-Scaling Signals

Signal	Scale out when	Caution
CPU	>70% for 5 min	CPU low but queue deep — scale on queue depth
Request rate	Approaching RPS limit per instance	Coordinate with DB capacity
Custom	Kafka consumer lag > threshold	Adding consumers > partitions useless

Sticky Sessions Detail

Cookie-based affinity routes user to same server for session data in memory — fragile on deploy (drain connections). Prefer external session store (Redis) + stateless servers.

Worked Example: E-Commerce Checkout Scale

Black Friday 10× normal: auto-scale API 50→500 pods in 10 min. Database cannot scale 10× instantly — queue checkout requests, show wait time, prioritize payment capture.

Stateless cart in Redis keyed by session_id; order creation idempotent. Sticky sessions avoided — all state external.

Q&A

Q: Vertical vs horizontal first? A: Vertical until single-machine limits (CPU/RAM/disk IOPS), then read replicas, then shard writes.

Interview Question Bank — Scalability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

When is vertical scaling enough?

Low traffic MVP, single-region, team velocity priority — until CPU/IO saturates.

What makes a service stateless?

Any instance handles any request; session in Redis/DB; no local disk state.

How does auto-scaling avoid flapping?

Cooldown periods, hysteresis thresholds, scale-up faster than scale-down.

Additional Scale Practice

Review this section with Part 27 walkthroughs — apply scale calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 4 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Vertical limits named
Stateless app tier
Sticky session downside
Auto-scale signals
Scale DB last
Read replicas
Connection pool limits
Split compute/storage
No local disk state
Phase scaling plan

Self-test prompt

Explain Part 4 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 4 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 5: Load Balancing

Role of Load Balancers

Load balancers distribute traffic across healthy backends, terminate TLS, enforce routing rules, and provide a stable endpoint while instances churn. They sit between clients and your application tier, and between internal services in multi-tier designs.

Layer 4 vs Layer 7

Layer	OSI	Routes on	Aware of HTTP	Use case
L4	Transport	IP + port	No	TCP pass-through, gaming, extreme throughput
L7	Application	URL path, headers, host	Yes	REST APIs, sticky cookies, A/B routes

L4 LB forwards packets with minimal inspection — lower latency, cannot route /api to different pool than /static. L7 can route Host: api.example.com to gRPC pool and www to web pool; can inject headers (X-Request-ID).

Load Balancing Algorithms

Algorithm	Behavior	When to use
Round robin	Cycle backends	Homogeneous, equal capacity
Weighted round robin	Proportional to weight	Mixed instance sizes
Least connections	Fewest active conns	Long-lived requests, variable duration
Least response time	Lowest latency backend	Heterogeneous performance
Random + two choices	Pick 2 random, use least loaded	Power of two choices — near-optimal
IP hash	Client IP → fixed backend	Legacy sticky without cookies

Consistent hashing (Part 17) appears at cache layers and some L7 gateways for shard-aware routing. Do not confuse LB algorithms with data partitioning hashes.
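
Three of the algorithms in the table are short enough to sketch directly. The backend names and connection counts below are made up for illustration; a production balancer would also track health and update counts as connections open and close.

```python
import itertools
import random


def round_robin(backends):
    """Endless iterator that cycles through backends in order."""
    return itertools.cycle(backends)


def least_connections(active_conns):
    """Pick the backend with the fewest active connections."""
    return min(active_conns, key=active_conns.get)


def two_random_choices(active_conns, rng=random):
    """Power of two choices: sample two backends, keep the less loaded one."""
    first, second = rng.sample(list(active_conns), 2)
    return first if active_conns[first] <= active_conns[second] else second


active = {"app1": 12, "app2": 3, "app3": 7}   # hypothetical connection counts
rr = round_robin(list(active))
print([next(rr) for _ in range(4)])
print(least_connections(active))
print(two_random_choices(active))
```

Two random choices never selects the single most loaded backend (as long as loads differ), yet it only inspects two counters per request instead of scanning the whole pool.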

Health Checks

Active health checks: LB periodically calls /health and removes failing nodes. Passive checks: observe error rates from real traffic. Use deep checks sparingly — hitting DB on every probe overloads dependencies.

Liveness: Process up? Return 200 if server binds port.
Readiness: Can serve traffic? DB connected, cache warmed, migrations done.
Kubernetes: liveness vs readiness probes map directly to interview answers

Graceful shutdown: on SIGTERM, stop accepting new connections, drain in-flight requests (30–60s), then deregister. Prevents 502 spikes during deploys.

Global Load Balancing

Global server load balancing (GSLB) directs users to nearest healthy region using DNS, anycast, or edge networks. Goals: lower latency, disaster recovery, regulatory data residency.

User in Tokyo → GSLB → ap-northeast-1
User in London → GSLB → eu-west-1
Region failure → DNS/health failover → us-east-1

Challenges: cross-region data consistency, session stickiness across regions, cache invalidation globally. Often pair GSLB with geo-replicated data or region-scoped user accounts.

DNS Load Balancing

DNS returns multiple A/AAAA records with short TTL (30–300s). Clients pick randomly or by resolver behavior — crude load spread. DNS failover removes unhealthy IPs after TTL propagation delay.

Limitations: DNS caching causes stale routes; not good for fine-grained load control. Commonly combined with Anycast IP (one IP, BGP routes to nearest POP) at CDN/LB edge.

TLS and Connection Management

TLS termination at LB offloads crypto from app servers. TLS passthrough preserves end-to-end encryption but limits L7 routing. HTTP/2 and gRPC multiplex many streams over one long-lived connection, so connection-level algorithms (round robin or least connections) can leave load uneven; balance per request at L7 or use client-side load balancing.

Internal Service Load Balancing

Sidecars (Envoy) and client-side LB (gRPC name resolution) distribute east-west traffic inside Kubernetes. Service mesh adds retries, timeouts, circuit breaking at data plane — see Part 14.

Failure Modes

Thundering herd when all backends marked unhealthy — keep minimum healthy pool
SYN flood — SYN cookies, rate limits at edge
LB itself as SPOF — cloud LB is managed; self-hosted needs HA pair (VRRP)
Misconfigured idle timeout killing long WebSockets

Interview Checklist

Where does TLS terminate?
L4 or L7 — can we route by path/host?
Health check type and drain strategy on deploy?
Single region or GSLB — how failover works?

Health Check Types

Liveness: process up? Restart if fails
Readiness: can accept traffic? Remove from LB if DB down
Deep check: optional dependency ping — use sparingly (cascades)

Global Load Balancing

GeoDNS or Anycast routes user to nearest healthy region. Health checks per region; failover when region degraded. Data replication lag limits active-active for strongly consistent apps.

DNS Round Robin vs LB

DNS multiple A records — client picks; TTL caching causes stale routes. Application LB (ALB, NGINX) preferred for HTTP with health checks.

Worked Example: Global API

Users in US, EU, APAC. Route53 latency-based routing to regional ALB. EU data stays EU (GDPR). Health check removes region on 5xx spike.

Q&A

Q: L7 vs L4 for WebSocket? A: L7 ALB supports WS upgrade; L4 passes through opaque TCP — use when need raw TCP or extreme throughput.

Interview Question Bank — Load Balancing

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Why least connections vs round robin?

Long-polling/WebSocket ties up connections — least connections balances better.

How do health checks cause outages?

Too aggressive checks mark healthy nodes bad — use readiness not deep dependency chain.

Explain DNS load balancing limits.

TTL caches old IPs; not aware of server load — use for geo routing with health-checked endpoints.

Additional LB Practice

Review this section with Part 27 walkthroughs and apply LB calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 5 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

L4 vs L7 explained
Round robin vs least conn
Health check types
TLS termination at LB
Global DNS routing
Avoid DNS round robin pitfalls
Session affinity alternative
LB as choke point HA
DDoS at edge
Cross-zone LB

Self-test prompt

Explain Part 5 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 5 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 6: Caching

Why Cache

Caching stores copies of expensive-to-compute or expensive-to-fetch data closer to the consumer. A cache hit avoids repeated database queries, RPC chains, or disk reads. In system design interviews, caching is often the difference between a design that meets 100ms p99 and one that collapses at 10K QPS.

Caches trade freshness for speed. Every cache introduces staleness risk and invalidation complexity — state these trade-offs explicitly rather than treating cache as a free performance boost.

Cache Layers

Caches exist at every layer of the stack. Understanding the hierarchy helps you place the right cache for the right bottleneck.

Layer	Examples	Typical TTL	Invalidation
Client	Browser HTTP cache, mobile disk	Minutes–days	Cache-Control headers
CDN / Edge	CloudFront, Cloudflare	Seconds–hours	URL purge, versioned paths
API gateway	Response cache by route	Seconds	Key eviction
Application	In-process LRU (Caffeine)	Seconds–minutes	Process restart
Distributed	Redis, Memcached	Minutes–hours	TTL, pub/sub invalidation
Database	Buffer pool, materialized views	Varies	Query refresh, CDC

Cache-Aside (Lazy Loading)

Application checks cache first; on miss, reads from DB, writes to cache, returns. Most common pattern for read-heavy workloads.

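def get_cached(key, cache, db, ttl=300):
    # cache and db are placeholder clients exposing get/set
    value = cache.get(key)
    if value is None:                    # cache miss
        value = db.get(key)              # read from the source of truth
        cache.set(key, value, ttl=ttl)   # populate for later readers
    return value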

Pros: Only caches requested data; survives cache failure (degrades to DB)
Cons: First request always slow; stale data if DB updated without invalidation
Race: Two misses can double-load DB — use singleflight or lock per key
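
The race noted above can be closed with a singleflight wrapper: the first caller for a key does the load, and concurrent callers for the same key wait for that result. The sketch below is an in-process, thread-based illustration (the class and method names are assumptions, loosely modeled on the pattern of the same name); a multi-server fleet would need a distributed lock or request coalescing at the cache tier instead.

```python
import threading


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlight:
    """Collapse concurrent loads of the same key into one call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call currently being computed

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call

        if leader:
            try:
                call.value = fn()          # only the leader runs the expensive load
            except Exception as exc:
                call.error = exc           # followers will see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()            # wake everyone waiting on this key
        else:
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.value
```

Usage: wrap the miss path, for example sf.do(key, lambda: db.get(key)), and write the result to the cache afterwards. Results are not retained once the call finishes, so this complements the cache rather than replacing it.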

Read-Through & Write-Through

Read-through: Cache library loads from DB on miss transparently to app. Write-through: Writes go to cache and DB synchronously, so the cache stays consistent with the DB, but every write pays at least the DB write latency.

Write-behind (write-back): Writes update cache immediately; async flush to DB. Higher write throughput but risk of data loss on cache crash before persistence — use for analytics counters, not financial balances without durable queue.

Eviction Policies

Policy	Behavior	Use when
LRU	Evict least recently used	General purpose hot set
LFU	Evict least frequently used	Stable popularity skew
TTL	Time-based expiry	Naturally stale data (feeds, config)
Random	Simple, no metadata	Near-uniform access patterns (e.g., Redis allkeys-random)
Size-based	Max memory cap triggers eviction	Redis maxmemory-policy
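
LRU is the policy most often asked about in follow-ups, and it is short to sketch with an ordered map. This is a single-threaded, in-process illustration with capacity counted in entries rather than bytes; the class name is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" is now the most recently used
cache.set("c", 3)    # evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```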

Cache Stampede (Thundering Herd)

When a hot key expires, thousands of requests may miss simultaneously and hammer the database. Mitigations:

TTL jitter and probabilistic early expiration: randomize TTLs so keys do not expire together, and refresh hot keys slightly before expiry
Lock / singleflight — first miss rebuilds; others wait or serve stale
External pre-warm — background job refreshes hot keys before expiry
Stale-while-revalidate — return old value while async refresh runs
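
Two of these mitigations reduce to a few lines of arithmetic. The sketch below shows TTL jitter and one common formulation of probabilistic early expiration, in which a request refreshes the key early with a probability that rises as expiry approaches. The parameter names, the 10% jitter, and beta = 1.0 are assumptions for illustration; the random source is injectable so the behavior can be tested.

```python
import math
import random


def jittered_ttl(base_ttl_s, jitter_fraction=0.1, rand=random.random):
    """Spread expiry times uniformly within +/- jitter_fraction of the base TTL."""
    offset = (2 * rand() - 1) * jitter_fraction
    return base_ttl_s * (1 + offset)


def should_refresh_early(now, expires_at, recompute_cost_s, beta=1.0, rand=random.random):
    """Decide whether this request should rebuild the value before it expires.

    The random head start is exponentially distributed and scaled by how long
    a rebuild takes, so expensive keys start refreshing earlier.
    """
    u = 1.0 - rand()   # random.random() is in [0, 1); shift to (0, 1] so log is defined
    head_start = -recompute_cost_s * beta * math.log(u)
    return now + head_start >= expires_at


print(jittered_ttl(300))
print(should_refresh_early(now=95.0, expires_at=100.0, recompute_cost_s=2.0))
```

Once the key has actually expired the function always returns True, so it can replace a plain expiry check in the read path.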

TTL Strategy

Short TTL for rapidly changing data (stock prices). Long TTL + explicit invalidation for user profiles. Version keys (user:123:v5) allow instant logical invalidation without scanning Redis.

Negative caching: cache 'not found' briefly to protect DB from repeated lookups for bogus IDs (security scanning, bots).
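
Versioned keys and negative caching can both be shown with an in-memory stand-in for the cache. This is a sketch only: the class, the sentinel, and the TTL values (300 s for hits, 30 s for misses) are assumptions, and the loader is any function that returns None when the row does not exist.

```python
import time

NOT_FOUND = object()   # sentinel: "we looked and the row does not exist"


def versioned_key(user_id, version):
    """Bumping the version makes old entries unreachable without deleting them."""
    return f"user:{user_id}:v{version}"


class NegativeCache:
    def __init__(self, loader, hit_ttl_s=300.0, miss_ttl_s=30.0, clock=time.monotonic):
        self._loader = loader
        self._hit_ttl_s = hit_ttl_s
        self._miss_ttl_s = miss_ttl_s     # keep misses short so new rows appear quickly
        self._clock = clock
        self._entries = {}                # key -> (value or NOT_FOUND, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]              # fresh entry, possibly a cached miss
        else:
            loaded = self._loader(key)    # hits the backing store
            value = NOT_FOUND if loaded is None else loaded
            ttl = self._miss_ttl_s if value is NOT_FOUND else self._hit_ttl_s
            self._entries[key] = (value, now + ttl)
        return None if value is NOT_FOUND else value


print(versioned_key(123, 5))
```

With this wrapper, repeated lookups for a bogus ID reach the database at most once per miss TTL instead of once per request.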

Consistency & Invalidation

Invalidation strategies: delete key on write; publish invalidation event to all app servers; rely on TTL only for low-stakes data. Event-driven invalidation scales better than broadcast for large fleets.

[Write path]
  Client → API → DB commit → publish invalidation
                    → subscribers delete cache keys

Redis vs Memcached

Feature	Redis	Memcached
Data structures	Strings, hashes, lists, sets, streams	Strings only
Persistence	Optional RDB/AOF	Pure memory
Clustering	Redis Cluster (sharding), Sentinel (failover)	Client-side consistent hash
Typical use	Sessions, leaderboards, pub/sub	Simple object cache

Interview Pitfalls

Caching without stating hit ratio assumption in BOE
No plan for cold start or cache cluster failure
Caching personalized data at CDN without Vary: Cookie
Ignoring memory cost at scale (1M keys × 10 KB = 10 GB)

Cache Key Design

Namespace keys: v1:user:123:profile. Version prefix enables bulk invalidation on schema change. Avoid unbounded key cardinality (per-request keys).

Memcached vs Redis for Pure Cache

Memcached is multithreaded with simple LRU eviction, which suits a pure cache layer (Facebook runs it at very large scale). Choose Redis when you need data structures (sorted sets for leaderboards) or persistence.

Multi-Layer Example

Browser cache → CDN → API in-process LRU → Redis → DB
  95%           90% of remainder    80%              hit

Worked Example: Product Page Cache

Cache-aside key product:42 TTL 300s. On price update, DELETE key + publish invalidation to local caches. Stampede on flash sale: singleflight + pre-warm top 1000 SKUs.

Q&A

Q: Write-behind for inventory? A: Risky — loss on crash. Use for analytics page views, not stock count.

Interview Question Bank — Caching

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How do you prevent cache stampede?

Jitter TTL, singleflight, stale-while-revalidate, proactive pre-warm.

Cache-aside vs write-through?

Cache-aside: flexible, app controls. Write-through: stronger consistency, higher write latency.

When is negative caching used?

Repeated lookups for non-existent keys — bots scanning IDs — short TTL prevents DB hammering.

Additional Cache Practice

Review this section with Part 27 walkthroughs — apply cache calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 6 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Cache layers drawn
Cache-aside flow
Write-through vs behind
Eviction policy pick
TTL + invalidation
Stampede mitigation
Redis vs Memcached
Hit ratio in BOE
Negative caching
Cache failure degrade

Self-test prompt

Explain Part 6 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 6 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 7: CDN Deep Dive

What a CDN Does

A Content Delivery Network caches static and cacheable dynamic content at Points of Presence (POPs) geographically distributed near users. Reduces origin load, latency, and egress cost. Essential when BOE shows high read QPS or large asset payloads (images, video segments, JS bundles).

CDN Architecture

[User] → [DNS GeoDNS] → [Edge POP cache]
              miss ↓
         [Shield / Mid-tier] → [Origin / S3]

Edge POP serves from SSD/RAM. Shield layer collapses origin fetches — many edge misses become one shield-to-origin request. Origin shield protects S3 from thundering herd during viral content.

What to Cache at the Edge

Static assets: CSS, JS, images with content-hash filenames (immutable)
Video segments (HLS .ts or DASH .m4s chunks) with long TTL
API responses only if identical for many users (public product catalog)
Do NOT cache authenticated personalized HTML without careful Vary headers

Cache Control & Headers

Header	Purpose
Cache-Control: max-age	Browser and CDN TTL
s-maxage	CDN-specific TTL (shared caches)
stale-while-revalidate	Serve stale while fetching fresh
ETag / If-None-Match	Conditional GET — 304 saves bandwidth
Vary	Cache variants by Accept-Encoding, Cookie, etc.

Versioned URLs (/static/app.v42.js) allow infinite TTL — invalidation is deploy a new filename. Purge API needed for emergency takedown of bad assets.

Dynamic Content Acceleration

CDNs can terminate TLS closer to user, use persistent connections to origin, and route over private backbone (AWS CloudFront to S3). Dynamic Site Accelerator still cannot cache POST responses — focus on connection reuse and TCP optimization.

Video Streaming & CDN

Adaptive bitrate streaming splits video into small files; CDN caches each segment independently. Live streaming uses low-latency protocols (LL-HLS) and origin packagers — harder than VOD. BOE: concurrent viewers × bitrate = egress Gbps.
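The BOE formula for egress can be checked with a small helper. The viewer count and bitrate below are made-up inputs for illustration, and the function name egress_gbps is hypothetical.

```python
def egress_gbps(concurrent_viewers, bitrate_mbps):
    """Peak egress in Gbps: concurrent viewers x per-viewer bitrate (Mbps)."""
    return concurrent_viewers * bitrate_mbps / 1000.0


# Assumed example: 1M concurrent viewers at 5 Mbps each.
print(egress_gbps(1_000_000, 5))  # 5000.0 Gbps, i.e. 5 Tbps
```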

Invalidation & Consistency

Purge by URL, wildcard, or tag (Cloudflare cache-tags). Propagation takes seconds to minutes globally. Prefer immutable assets over purge for routine deploys. For news sites, short TTL + stale-while-revalidate balances freshness and load.

Security at CDN Edge

DDoS absorption — CDN scales to absorb volumetric attacks
WAF rules at edge (OWASP Top 10 patterns)
Bot management, rate limiting before origin
Geo blocking, IP allowlists for admin paths

Multi-CDN & Failover

Large properties use multiple CDNs for resilience and price negotiation. DNS or traffic manager weighted routing splits traffic. Complexity: cache efficiency drops if same asset on two CDNs — coordinate TTL and purge.

Cost Model

CDN bills per GB egress and request count. Origin egress to CDN is often cheaper than internet egress. Calculate: monthly page views × asset size × (1 - edge_hit_ratio) = origin traffic. Improving hit ratio from 85% to 95% cuts the miss rate from 15% to 5%, so origin load drops to one third.
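A quick way to sanity-check the origin traffic formula is to evaluate it at two hit ratios. The page view count and asset size here are assumed inputs, and origin_traffic_gb is a hypothetical helper name.

```python
def origin_traffic_gb(monthly_page_views, asset_size_mb, edge_hit_ratio):
    """Monthly origin traffic in GB (decimal units, 1 GB = 1000 MB)."""
    if not 0.0 <= edge_hit_ratio <= 1.0:
        raise ValueError("edge_hit_ratio must be between 0 and 1")
    return monthly_page_views * asset_size_mb * (1.0 - edge_hit_ratio) / 1000.0


# Assumed example: 100M page views per month, 2 MB of assets per view.
at_85 = origin_traffic_gb(100_000_000, 2.0, 0.85)
at_95 = origin_traffic_gb(100_000_000, 2.0, 0.95)
print(round(at_85), round(at_95), round(at_85 / at_95, 2))
```

Origin load scales with the miss rate (1 - hit ratio), so small hit-ratio gains near the top of the range have an outsized effect.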

Interview Script

"I will put all static media behind a CDN with content-hashed paths and 1-year TTL. API responses stay origin-only unless we have a truly public read API; user-specific data never caches at edge without explicit design."

CDN Providers Comparison (Conceptual)

Feature	Typical offering
Edge locations	100+ POPs global
Origin shield	Reduce origin load
Image optimization	Resize on edge
Edge compute	Light compute at POP (e.g., Cloudflare Workers, Lambda@Edge)

Origin Collapse

Without shield: 1000 edge POPs miss simultaneously → 1000 origin requests. Shield tier: 1000 misses → 1 shield fetch → 1 origin. Critical for viral content.

Worked Example: Video Platform

1080p segment 2 MB, 10M views/day on a popular video (simplified to one segment fetch per view). CDN serves 95%; origin handles 500K segment fetches. Origin bandwidth: 500K × 2 MB = 1 TB/day, manageable, versus 10M × 2 MB = 20 TB/day without a CDN.

Interview Question Bank — CDN

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

What should never be cached at CDN?

Personalized HTML containing user PII, and responses that set cookies (Set-Cookie) unless the cache key varies per user.

How does cache poisoning happen?

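Unkeyed inputs such as a spoofed Host or X-Forwarded-Host header change the response that gets stored under a shared cache key. Validate Host, include every response-affecting header in the cache key, and use signed URLs for origin.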

Origin shield benefit?

Collapses many edge misses into one origin fetch during viral traffic.

Additional CDN Practice

Review this section with Part 27 walkthroughs and apply CDN calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 7 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

CDN for static
Origin shield
Cache-Control headers
Immutable hashed assets
Purge vs version URL
Personalized not at edge
Video segments
Multi-CDN note
Cost per GB
DDoS absorption

Self-test prompt

Explain Part 7 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 7 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 8: Databases — SQL vs NoSQL

Choosing SQL vs NoSQL

SQL (relational) databases excel at structured data, ACID transactions, complex joins, and ad-hoc analytics. NoSQL is a broad category that includes document, wide-column, key-value, graph, and time-series stores; each optimizes for specific access patterns at scale.

Factor	SQL (Postgres, MySQL)	NoSQL (varies)
Schema	Fixed, migrations	Flexible or schema-on-read
Transactions	Strong ACID	Often per-document or eventual
Joins	Native, optimizer	Denormalize or application-side
Scale writes	Vertical + sharding harder	Partition-friendly (Cassandra, Dynamo)
Query patterns	Ad-hoc SQL	Must know partition key upfront

Indexes & B-Trees

Most OLTP databases use B+ trees for indexes: balanced tree, O(log n) lookups, sequential leaf scans for range queries. In InnoDB the primary key is the clustered index and determines physical row order; Postgres stores rows in an unordered heap, and its CLUSTER command reorders a table only once.

Composite index (user_id, created_at) supports queries filtering on user_id and sorting by time — left-prefix rule: index useless for queries filtering only created_at without user_id.

Covering index includes all SELECT columns — avoids table lookup
Too many indexes slow writes (each index updated on INSERT)
Full table scan acceptable for rare admin reports, not user path
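The left-prefix rule can be observed with a query planner. The sketch below uses SQLite from the Python standard library purely as a stand-in (the guide's examples are Postgres and MySQL, whose planners differ in detail); the table posts and index idx_user_created are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at INTEGER, body TEXT)"
)
conn.execute("CREATE INDEX idx_user_created ON posts (user_id, created_at)")


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Filters on the leading column: the composite index can be used.
by_user = plan(
    "SELECT body FROM posts WHERE user_id = ? ORDER BY created_at DESC", (42,)
)
# Filters only on the second column: the planner falls back to a table scan.
by_time = plan("SELECT body FROM posts WHERE created_at > ?", (0,))

print(by_user)
print(by_time)
```

The exact plan text varies by SQLite version, so treat the printed output as indicative rather than fixed.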

Normalization vs Denormalization

Normalization (3NF): Eliminate redundancy; joins reconstruct data. Good for OLTP consistency, smaller writes. Denormalization: Duplicate fields to avoid joins at read time — standard in Cassandra, MongoDB feed designs, and read-heavy SQL when join cost dominates.

Interview pattern: normalized writes in OLTP, denormalized read models via CDC to search/feed store (CQRS-lite).

Connection Pooling

Opening a DB connection is expensive (TLS, auth, memory). App servers use pools (PgBouncer, HikariCP) to reuse connections. Pool size ≈ (core_count × 2) + effective_spindle_count per Postgres folklore — but at scale, thousands of microservices × pool size can exhaust max_connections.

App (500 instances) → PgBouncer (transaction pooling) → Postgres
# Transaction pooling: connection returned after each transaction
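To see why fleet size matters more than per-instance pool size, the folklore formula and the fleet total can be computed side by side. The core count, pool size of 10, and max_connections of 500 are assumed values for illustration; the function names are hypothetical.

```python
def recommended_pool_size(core_count, effective_spindle_count):
    """Folklore starting point: (cores x 2) + effective spindles."""
    return core_count * 2 + effective_spindle_count


def total_connections(app_instances, pool_size_per_instance):
    """Connections the database sees if every instance fills its pool."""
    return app_instances * pool_size_per_instance


def exceeds_limit(app_instances, pool_size_per_instance, max_connections):
    return total_connections(app_instances, pool_size_per_instance) > max_connections


# Assumed example: 8-core database host, 500 app instances, pool of 10 each.
print(recommended_pool_size(8, 1))      # 17
print(total_connections(500, 10))       # 5000
print(exceeds_limit(500, 10, 500))      # True: a pooler such as PgBouncer is needed
```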

Document Stores (MongoDB)

JSON documents, flexible schema, replica sets, sharded cluster by shard key. Good for catalogs, content management, user profiles with nested objects. Avoid unbounded document growth (embedding unbounded arrays).

Wide-Column (Cassandra, HBase)

Partition key determines node; clustering columns sort within partition. Optimized for high write throughput and time-series. Query must include partition key — designing access patterns first is mandatory.

Key-Value (DynamoDB, Redis)

Simple get/put by key, predictable latency at scale. DynamoDB: partition key + optional sort key, on-demand or provisioned capacity, GSIs for alternate access patterns (with consistency caveats).

Graph Databases

Neo4j, Neptune for relationship-heavy queries (social graph friends-of-friends, fraud rings). Not a replacement for primary OLTP at billion-user scale — often specialized subgraph service.

Operational Concerns

Backup, PITR, replication lag monitoring
Migration strategy (expand-contract, dual-write)
Read replica routing for analytics vs user traffic

[Write] → Primary SQL
[Read hot path] → Redis → optional replica
[Analytics] → Read replica / warehouse (never on primary)

Index Types Beyond B-Tree

Hash index: equality only (Postgres hash, limited use)
GIN/GiST: full-text, JSON, geo in Postgres
Column store: analytics (Redshift, ClickHouse)

Migration at Scale

Online schema change: gh-ost, pt-online-schema-change copy rows in background. Expand-contract: add nullable column → dual-write → backfill → switch reads → remove old.

Read Path Routing

ORM must distinguish writer vs reader endpoints. Stale replica reads acceptable for dashboards, not for "withdraw balance" immediately after deposit.

Worked Example: Social Graph in SQL

follows(follower_id, followee_id) composite index (follower_id, created_at). Query followees' recent posts: JOIN posts ON followee_id — at 10M followers for one user, denormalize celebrity follows to separate fan-out pipeline.

Interview Question Bank — Databases

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

B-tree vs LSM-tree?

B-tree: better read, default OLTP. LSM (RocksDB): better write throughput, compaction overhead.

When denormalize?

Read path >> write, join cost high, acceptable inconsistency window with CDC refresh.

Connection pool exhaustion symptom?

Timeouts under load while CPU low — increase pool cautiously or use PgBouncer.

Additional DB Practice

Review this section with Part 27 walkthroughs and apply database calculations to each classic problem.

Exercise	Goal
Recalculate QPS	Under 2 min without notes
Identify bottleneck	Label on diagram
Propose mitigation	With trade-off sentence

Part 8 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

SQL vs NoSQL table
B-tree index
Composite index rule
Normalize vs denorm
Connection pooling
Read replica routing
Shard when needed
Migration strategy
Covering index
Avoid SELECT *

Self-test prompt

Explain Part 8 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 8 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 9: Replication

Why Replicate Data

Replication copies data across multiple nodes for read scalability, lower latency (geo-local reads), and fault tolerance. In interviews, always pair replication with a consistency story: synchronous replication favors durability; asynchronous favors write latency.

Leader-Follower (Primary-Replica)

One leader accepts all writes; followers tail the write-ahead log (WAL) or binlog. PostgreSQL streaming replication, MySQL binlog replication, and MongoDB replica sets follow this pattern. Reads can hit followers to scale SELECT traffic.

Aspect	Detail
Write path	Client → leader only
Read path	Leader or any follower (may be stale)
Failover	Promote follower via Patroni, Orchestrator, RDS Multi-AZ
Risk	Replication lag → stale reads; split-brain if fencing fails

Synchronous vs Asynchronous Replication

Mode	Behavior	Trade-off
Synchronous	Leader waits for follower ACK before commit	No lost committed writes if leader dies; higher write latency
Asynchronous	Leader commits locally; followers catch up later	Lower latency; possible data loss on leader crash
Semi-sync	Wait for at least one follower	Balance of durability and latency

Expose replication_lag_seconds as a metric. Route critical reads (balance, inventory) to leader or use linearizable reads; route timelines to followers with "may be stale" UX.

Multi-Leader (Multi-Primary)

Multiple nodes accept writes — useful for multi-datacenter active-active. Conflicts are inevitable when two leaders update the same row. Resolution strategies:

Last-write-wins (LWW): Timestamp-based; simple but can drop updates
Vector clocks / version vectors: Track causality; surface conflicts to application
CRDTs: Data structures that merge without conflicts (counters, sets) — good for collaborative editing

Interview probe: "Two users like the same post from different regions simultaneously — how do you merge counts?" Answer with idempotent increments or CRDT counters.
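One way to answer the like-count probe is a grow-only counter (G-Counter), the simplest CRDT. The sketch below is a minimal illustration; the class name GCounter and the region names are assumptions, and a real like counter that supports unlikes would need a richer structure such as a PN-Counter.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> increments observed from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


east, west = GCounter("east"), GCounter("west")
east.increment(2)   # two likes recorded in the east region
west.increment(3)   # three likes recorded in the west region
east.merge(west)
west.merge(east)
print(east.value(), west.value())  # 5 5
```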

Leaderless Replication (Quorum)

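Dynamo-style systems (Cassandra, Riak, the original Amazon Dynamo): no single leader. Replication factor N; write quorum W; read quorum R. If W + R > N, every read quorum overlaps the latest write quorum, so reads see the latest acknowledged write (strong consistency for that config, barring sloppy quorums and concurrent-write edge cases).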

N=3, W=2, R=2  → tolerate 1 node failure, strong reads
N=3, W=1, R=1  → fast but weak; eventual consistency
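The W + R > N rule is a pigeonhole argument, and it can be checked by brute force for small clusters. The helper names below are hypothetical; the sketch enumerates every possible write set and read set and confirms they always share a node exactly when W + R > N.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every write set of size w intersects every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def failures_tolerated(n, quorum):
    """Nodes that can be down while a quorum of this size is still reachable."""
    return n - quorum


print(quorums_always_overlap(3, 2, 2), failures_tolerated(3, 2))  # True 1
print(quorums_always_overlap(3, 1, 1), failures_tolerated(3, 1))  # False 2
```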

Hinted handoff: Temporarily store writes for down nodes. Read repair: On read, detect stale replicas and update them. Anti-entropy: Background Merkle-tree comparison fixes drift.

Change Data Capture (CDC)

Stream WAL/binlog to Kafka (Debezium) → search index, warehouse, cache invalidation. Avoids dual-write bugs where app writes DB and search separately and they diverge.

[Leader DB] → WAL → CDC connector → Kafka → [Consumers]
                                              ├→ Elasticsearch
                                              ├→ Data warehouse
                                              └→ Cache invalidation

Replication Topology Diagram

Leader-Follower:
  Writes → [Leader] ──repl──→ [Follower1]
                    └──repl──→ [Follower2]
  Reads  → any node (stale OK?)

Multi-Leader:
  [DC-East Leader] ←──conflict──→ [DC-West Leader]

Leaderless (N=3):
  Client writes to any 2 of 3 nodes (W=2)

Interview Checklist

State sync vs async and what happens when leader dies mid-write
How followers are chosen for failover (lag, priority)
Whether reads need strong consistency or eventual is acceptable
How cross-region replication affects CAP trade-offs

Script: "I use leader-follower with async replication for the feed service, and followers serve 95% of reads. The payment ledger replicates synchronously because we cannot tolerate lost commits, and its reads go to the leader or that sync replica because we cannot tolerate stale balances."

Split-Brain Prevention

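Fencing: isolate the old leader via STONITH or an expired lease in etcd before promoting a replica. Keep the lease TTL shorter than the failover detection time, so the old leader's lease has lapsed by the time a replica is promoted.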

Read Replica Routing

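Route reads in the application or ORM, for example with a @replica hint for analytics queries; PgBouncer pools connections but does not split reads from writes on its own. Causal consistency: read from a replica that has applied at least transaction T.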

Lag Monitoring

Alert	Action
lag > 30s	Page DBA
lag > 5min	Block automatic promotion of the lagging replica

Worked Example: Replication

Leader failover in 30s: promote replica, update DNS/VIP, invalidate connection pools. Clients retry with backoff. Apps must handle brief write errors during failover.

Extended Notes

Connect replication to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Replication

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Sync replication when?

Financial ledger, leader election metadata — when lost write unacceptable.

Read replica lag handling?

Route critical reads to primary; show 'syncing' UX for non-critical stale reads.

What is split-brain?

Two nodes both think they are leader — use fencing and quorum.

Extended Reference — Replication

Write path latency

Synchronous replication adds RTT to nearest replica per commit — measure p99 write impact before enabling on hot path.

Semi-synchronous 'at least one replica' is popular compromise in MySQL production clusters.

Failover testing

Game day: kill primary during load test; measure detection time, promotion time, client error rate.

Applications must reconnect — connection pools stale to old primary IP until refreshed.

Global readers

Geo-routed read replicas serve local users; replication lag means EU user may not see US write for seconds.

Causal tracking: Google Spanner TrueTime; application-level: version tokens in API responses.

Binlog consumption

Multiple consumers read same binlog stream for search, warehouse, cache — coordinate retention size.

Binlog growth disk risk — monitor and archive to S3.

Interview diagram

Draw primary + 2 replicas; label sync vs async arrows; mark read traffic to replicas with 'stale OK' note.

Part 9 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Leader-follower diagram
Sync vs async
Replication lag metric
Failover fencing
Multi-leader conflicts
Quorum W+R>N
Read repair
CDC pipeline
Split-brain prevent
Never hide lag

Self-test prompt

Explain Part 9 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 9 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 10: Partitioning & Sharding

Partitioning vs Sharding

Partitioning splits data within one database (Postgres table partitions by date). Sharding distributes partitions across independent database servers. Interviews often use the terms interchangeably for horizontal scale-out.

Partitioning Strategies

Strategy	Key	Pros	Cons
Range	user_id 1–1M on shard A	Range queries efficient	Hot spots on latest range
Hash	hash(user_id) mod N	Even distribution	Range scans across shards expensive
Geo	country/region	Data locality, compliance	Uneven country sizes
Directory	lookup table shard_id	Flexible rebalancing	Lookup service is SPOF unless replicated

Choosing a Shard Key

The shard key determines query locality forever. Good keys: high cardinality, even distribution, align with dominant query pattern.

Good: user_id for user-scoped data — all user queries hit one shard
Bad: country if US is 40% of traffic — hot shard
Bad: created_at alone — all writes hit "today" shard

Composite keys (tenant_id, user_id) help SaaS multi-tenancy isolate noisy neighbors.

Hot Keys & Hot Shards

Celebrity problem: one logical key (Beyoncé's tweet ID) receives disproportionate traffic. Mitigations:

Split key: logical key → 100 random suffix keys; read aggregates
Local cache: in-process cache on each API server for hot entities
Separate service: dedicated read path for global counters (Redis INCR sharded)
CDN / edge: for read-heavy public content
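The split-key mitigation can be sketched as a counter whose writes are spread over suffixed sub-keys and whose reads sum them. A plain dict stands in for Redis here; the class name SplitCounter, the key format, and the fanout of 100 are illustrative assumptions.

```python
import random


class SplitCounter:
    """Spread increments for one hot key across `fanout` sub-keys."""

    def __init__(self, store, fanout=100):
        self.store = store    # dict standing in for a sharded key-value store
        self.fanout = fanout

    def incr(self, key):
        # A random suffix sends each write to a different sub-key (and shard).
        sub_key = f"{key}:{random.randrange(self.fanout)}"
        self.store[sub_key] = self.store.get(sub_key, 0) + 1

    def read(self, key):
        # Reads pay for the split: they aggregate every sub-key.
        return sum(self.store.get(f"{key}:{i}", 0) for i in range(self.fanout))


store = {}
likes = SplitCounter(store)
for _ in range(1000):
    likes.incr("post:42:likes")
print(likes.read("post:42:likes"))  # 1000
```

The trade-off is write spread versus read cost, so the aggregated total is usually cached for a short TTL.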

Cross-Shard Operations

Joins across shards require scatter-gather (query all shards, merge) — expensive. Design schemas so hot queries are single-shard. Global secondary indexes (DynamoDB GSI) replicate data under alternate keys at write cost.

Resharding

When N shards become insufficient (for example, growing from 256 to 512 shards), common strategies are:

Fixed partitions: 4096 logical partitions mapped to shards; move partitions between shards without changing app hash
Dual-write: write to old and new shard during migration
Backfill: copy data with CDC; cutover when caught up
Consistent hashing: only about K/N keys move (K keys, N nodes) when adding a node (see Part 17)

[Router] hash(user_id) → shard map → [Shard 0] [Shard 1] ... [Shard N]
         hot key? → local cache / key splitting

Elasticsearch / Cassandra Sharding Notes

Elasticsearch: index split into shards + replicas; routing by document ID. Cassandra: partition key required in every query; clustering columns for sort within partition.

Interview Pitfalls

Sharding too early — single Postgres with read replicas handles surprising scale
Shard key that does not match access pattern
No plan for resharding or tenant growth

Uber Ringpop / Scuttlebutt

Service discovery + shard ownership — gossip protocol distributes shard map to nodes.

Vitess (YouTube)

MySQL sharding middleware: VTGate routes SQL by sharding key; resharding with minimal app change.

Interview: Design Sharded DB

Start with hash(user_id) mod 256 logical shards mapped to 32 physical MySQL instances. Router layer in app or sidecar.
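A minimal sketch of that router, assuming a SHA-256 based hash and instance names like mysql-00 (both are illustrative choices, not prescribed by the design). The key point it shows is that the hash to logical shard step never changes; only the logical to physical map is edited during a migration.

```python
import hashlib

LOGICAL_SHARDS = 256
PHYSICAL_INSTANCES = 32


def logical_shard(user_id):
    """Stable hash of user_id into one of the logical shards."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS


# Initial placement: 8 logical shards per physical MySQL instance.
shard_map = {ls: f"mysql-{ls % PHYSICAL_INSTANCES:02d}" for ls in range(LOGICAL_SHARDS)}


def route(user_id, mapping=shard_map):
    return mapping[logical_shard(user_id)]


# Rebalancing: move logical shard 7 to a new instance by editing the map only.
rebalanced = dict(shard_map)
rebalanced[7] = "mysql-new"
```

Users whose logical shard is not 7 keep routing to the same instance under the rebalanced map, which is what makes the migration incremental.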

Worked Example: Sharding

Reshard user_id 0-1M from shard A to new shard B: dual-write phase, backfill historical rows, verify counts, switch reads, stop writes to A.

Extended Notes

Connect sharding to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Sharding

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Hot shard mitigation?

Split key, local cache, async aggregation, dedicated hardware for hot tenant.

How to choose shard count?

Size shards with roughly 2× headroom over the expected data per shard; plan consistent-hash virtual nodes for growth.

Cross-shard query?

Scatter-gather parallel queries + merge — expensive; redesign access pattern if frequent.

Extended Reference — Partitioning & Sharding

Shard map service

Directory service stores range → shard mapping; update map during migration without client redeploy if using discovery API.

Co-location

Place related entities on same shard: user_id shard carries user profile, settings, private posts — avoids cross-shard transactions.

Secondary indexes

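Local (per-shard) indexes cover only that shard's rows, so a lookup without the shard key must scatter to all shards at high cost. A global index (DynamoDB GSI) is partitioned by the indexed attribute, so reads hit one partition at the price of an extra, eventually consistent write.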

Rebalancing

Consistent hash minimizes movement; still schedule low-traffic window; throttle migration bandwidth.

Monitoring

Per-shard QPS, storage, replication lag heatmap — detect hot shard before outage.

Part 10 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Shard key choice
Hash vs range
Hot key mitigations
Cross-shard cost
Resharding plan
Directory lookup
Co-locate related data
Scatter-gather aware
Vitess mention OK
Monitor per shard

Self-test prompt

Explain Part 10 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 10 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 11: CAP, PACELC & Consistency Models

CAP Theorem

In a network partition (P), a distributed system must choose between consistency (C) and availability (A). You cannot have all three in the strict sense during a partition.

CP: Refuse writes/reads until consensus (ZooKeeper, etcd, HBase) — correct but may be unavailable
AP: Accept requests; replicas may diverge (Cassandra tunable, DynamoDB eventual)

Most production systems are not purely one letter — they offer tunable consistency per operation.

PACELC Extension

If Partition (P): choose Availability or Consistency (AC). Else (normal operation): choose Latency or Consistency (LC). Under no partition, you still trade off sync replication latency vs strong consistency.

Consistency Models (Weakest to Strongest)

Model	Guarantee	Example
Eventual	Replicas converge if no new writes	DNS, Cassandra default
Read-your-writes	User sees own updates	Session stickiness or user-scoped routing
Monotonic reads	No going backward in time	Route user to same replica
Consistent prefix	Writes seen in order, no gaps	Kafka partition ordering
Linearizable	Appears instantaneous global order	etcd, Spanner TrueTime
Serializable	Transactions as if serial order	Postgres SERIALIZABLE

Linearizability vs Serializability

Linearizability: single-object, real-time order — register read sees latest write. Serializability: multi-object transaction isolation — no interleaving anomalies. Spanner provides external consistency via TrueTime bounded clock uncertainty.

Practical Interview Mapping

Product feature	Typical choice
Social feed	Eventual + read-your-writes
Like counter	Eventual or CRDT; approximate OK
Inventory / seat booking	Strong consistency, transactions
Chat messages	Per-channel ordering (Kafka partition)
Config flags	Eventual with short TTL

Quorum Recap

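W + R > N guarantees that read and write quorums overlap, so reads see the latest acknowledged write; the cost is higher latency on every operation that waits for a quorum. Mention that Cassandra lets you tune this per query (ONE vs QUORUM vs ALL).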

Clocks & Ordering

Lamport clocks, vector clocks, and hybrid logical clocks (HLC) order events without perfect sync clocks. Never assume NTP is perfect — design for clock skew in distributed IDs (Snowflake uses time + machine id).
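As a small illustration of ordering without synchronized wall clocks, here is a Lamport clock sketch. The class and method names are assumptions made for the example; the rule itself (increment on local events, take max plus one on receive) is the standard Lamport scheme.

```python
class LamportClock:
    """Logical clock: orders events causally without wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, remote_time):
        # Jump past the sender's timestamp so the receive sorts after the send.
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()
a.tick()
stamp = a.send()          # 3
print(b.receive(stamp))   # 4, later than the send even though b started at 0
```

Lamport timestamps give a consistent order for causally related events but cannot detect concurrency; vector clocks add that at the cost of one counter per node.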

Partition happens:
  CP system: some nodes reject traffic → lower availability
  AP system: nodes diverge → need merge / conflict resolution later

Script: "Feeds are AP — we accept eventual consistency with 30s staleness on followers. Seat reservation is CP on the shard leader with row-level locking."

Dynamo Paper Takeaways

Consistent hashing + quorum + sloppy quorum + hinted handoff — foundation for AP systems.

Google Spanner

TrueTime API bounds clock uncertainty → external consistency globally. Not magic — GPS/atomic clocks in datacenters.

Session Guarantees in Practice

Sticky sessions + read-your-writes: route same user to primary or replica with session token tracking applied LSN.

Worked Example: CAP

Bank transfer during partition: CP choice — reject transfer if cannot reach quorum. Social like during partition: AP — accept like, merge count later.

Extended Notes

Connect CAP to the Part 2 framework step 6 deep dive. The interviewer may spend 10 minutes here, so prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — CAP & Consistency

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Is CAP a theorem to cite blindly?

Explain partition behavior practically — tunable quorums, not binary CAP labels.

Linearizable example?

Distributed lock, leader election — user expects immediate global visibility.

Eventual consistency user impact?

Delayed notification count, duplicate like possible — product must accept or merge.

Extended Reference — CAP & Consistency

PACELC in interview

Normal operation: choose between latency and consistency — sync replication is LC trade-off.

Client-side choices

DynamoDB ConsistentRead=true on GetItem; Cassandra QUORUM vs ONE per query.

Session tokens

Return version with write; client passes version on read — server routes to replica ≥ version.

Split brain during partition

AP system may accept conflicting writes — product must define merge UX.

Avoid CAP buzzword only

Explain concrete failure: 'If link between DCs drops, we pause writes to enforce CP for wallet.'

Part 11 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

CAP during partition
PACELC else branch
Consistency models list
Linearizable example
Eventual product OK
Read-your-writes how
Tunable quorum
Clock skew aware
Not buzzword only
Map feature to model

Self-test prompt

Explain Part 11 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 11 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 12: Distributed Transactions

Why Distributed Transactions Are Hard

A transaction spanning multiple databases or services cannot use a single node's lock manager. Network failures leave systems in partial states. Interviews favor pragmatic patterns over pure 2PC unless banking-level ACID is required.

Two-Phase Commit (2PC)

Coordinator runs prepare (vote) then commit. All participants must vote yes in the prepare phase before the coordinator commits.

Prepare: Participants lock resources, vote yes/no
Commit: If all yes, coordinator sends commit; else abort

Problems: blocking if coordinator dies after prepare; latency; not suited across unreliable WAN. Used inside distributed databases (Spanner, distributed Postgres experiments) more than microservices.

Saga Pattern

Sequence of local transactions with compensating actions on failure. Choreography (events) vs orchestration (central coordinator).

Step	Action	Compensate
1	Reserve inventory	Release inventory
2	Charge payment	Refund payment
3	Ship order	Cancel shipment

Compensations must be idempotent — retries are inevitable. Sagas are eventually consistent; not a substitute for single-node ACID when you need atomic debit+credit.
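An orchestrated saga can be sketched as a loop that remembers completed steps and undoes them in reverse order on failure. This is a simplified in-memory illustration: run_saga and the step names are hypothetical, and a real orchestrator would persist saga state and retry compensations.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


def fail_payment():
    raise RuntimeError("card declined")


ok, log = run_saga([
    ("reserve_inventory", lambda: None, lambda: None),
    ("charge_payment", fail_payment, lambda: None),
    ("ship_order", lambda: None, lambda: None),
])
print(ok, log)
```

Because the payment step fails, only the inventory reservation is compensated; shipping never ran, so it has nothing to undo.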

Transactional Outbox

Write business row + outbox event in same local DB transaction. Relay process publishes to Kafka. Consumers achieve at-least-once; idempotent handlers required.

BEGIN;
  INSERT INTO orders ...;
  INSERT INTO outbox (topic, payload) ...;
COMMIT;
-- separate relay: read outbox → publish → mark sent
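The outbox flow above can be exercised end to end with an in-memory SQLite database standing in for the service's own database. The column names, the sent flag, and the publish callback (a stand-in for a Kafka producer) are assumptions for this sketch.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
);
""")


def place_order(order_id, item):
    # One local transaction: both rows commit together or neither does.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id, "item": item})),
        )


def relay_once(publish):
    """Publish unsent outbox rows in order, marking each as sent afterwards."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # a crash here means the row is published again later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Calling place_order(1, "book") and then relay_once with a publish function delivers one event. Publishing before marking sent is what makes delivery at-least-once, which is why consumers need idempotent handlers.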

Idempotency

Duplicate requests must not double-charge or double-ship. Store idempotency_key with unique constraint; return cached response on replay.

Client generates UUID per user action
Server stores (key → response) with TTL 24h
Payment APIs (e.g., Stripe) accept idempotency keys so that retries are safe

TCC (Try-Confirm-Cancel)

Reserve resources in try phase, confirm or cancel. Like saga with explicit resource holds — used in some Chinese payment ecosystems.

When to Use What

Pattern	Use when
Local ACID only	Single service owns all data
Outbox + events	Notify other services reliably
Saga	Multi-service workflow with compensations
2PC	Rare; internal to specialized DB

Interview Example: Order Service

"Order service writes order + outbox in Postgres. Payment service consumes PaymentRequested event, calls Stripe with idempotency key. On failure, publishes PaymentFailed; order service runs compensating cancel saga step."

Outbox vs Dual Write

Approach	Risk
Dual write DB+Kafka	One succeeds one fails — inconsistent
Outbox	Single transaction; relay may lag

Idempotency Table Schema

CREATE TABLE idempotency_keys (
  key VARCHAR(64) PRIMARY KEY,
  response_body JSONB,
  created_at TIMESTAMPTZ
);
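
One way to use such a table is to claim the key first and let the primary-key constraint reject replays. The sketch below uses SQLite and a list standing in for the real charge call; `charge`, `charges`, and the `in_progress` response are hypothetical names for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response_body TEXT)"
)
charges = []  # stands in for the real side effect (hypothetical)


def charge(idempotency_key: str, amount_cents: int) -> dict:
    try:
        # Claim the key; the primary-key constraint rejects a replay.
        with db:
            db.execute(
                "INSERT INTO idempotency_keys (key) VALUES (?)", (idempotency_key,)
            )
    except sqlite3.IntegrityError:
        row = db.execute(
            "SELECT response_body FROM idempotency_keys WHERE key = ?",
            (idempotency_key,),
        ).fetchone()
        # A NULL body means the first attempt has not finished yet.
        return json.loads(row[0]) if row[0] else {"status": "in_progress"}
    charges.append(amount_cents)
    response = {"status": "charged", "amount_cents": amount_cents}
    with db:
        db.execute(
            "UPDATE idempotency_keys SET response_body = ? WHERE key = ?",
            (json.dumps(response), idempotency_key),
        )
    return response


first = charge("key-123", 500)
replay = charge("key-123", 500)
```

The replay returns the stored response and the side effect runs once. Expiring old keys (the 24h TTL mentioned earlier) is left out of this sketch.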

Poison Message Handling

After N failed attempts at a saga step, move the saga to a manual review queue; do not retry charging the user indefinitely.

Worked Example: Transactions

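Order saga: reserve→pay→ship. If pay fails after reserve, compensate by releasing the inventory; cancelling the shipment applies only once ship has run. Each step persists its state in a saga state-machine row keyed by saga_id.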

Extended Notes

Connect transactions to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Distributed Transactions

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Why avoid 2PC in microservices?

Blocking, coordinator SPOF, latency — use saga/outbox instead.

Outbox relay failure?

Relay retries; at-least-once delivery; consumers idempotent.

Saga compensation failure?

Manual intervention queue; alert; never silent money loss.

Extended Reference — Distributed Transactions

Outbox ordering

Relay publishes in order per aggregate id — consumers depend on order for state machine.

Saga timeouts

Each step has deadline; timeout triggers compensate — avoid stuck saga occupying inventory.

Duplicate event handling

Consumer stores processed event_id; unique constraint prevents double ship.

Testing sagas

Inject failure after step 2 in integration test; verify compensate called once.

vs local transaction

Prefer single-service ACID when boundary allows — extract service only when necessary.

Part 12 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

2PC limitations
Saga compensate
Outbox pattern
Idempotency keys
At-least-once consumers
Poison saga handling
TCC optional mention
Prefer local TX
Event ordering
Test failure injection

Self-test prompt

Explain Part 12 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 12 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 13: Message Queues & Streams

Queues vs Logs

Message queue (RabbitMQ, SQS): message deleted after ack — task distribution. Log / stream (Kafka, Pulsar): messages retained; consumers track offset — replay and multiple consumer groups.

Kafka Core Concepts

Term	Meaning
Topic	Named stream of records
Partition	Ordered, immutable sequence; parallelism unit
Offset	Position in partition
Consumer group	Partitions divided among consumers in group
Replication	Leader + ISR followers per partition

Throughput scales with partition count. Key by user_id to preserve per-user ordering.

Ordering Guarantees

Within partition: strict order
Across partitions: no global order
Fix: partition key = entity id needing order (order_id, user_id)
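
To illustrate why keying by entity id preserves order, here is a small sketch of key-to-partition mapping. CRC32 is used only as a stand-in hash (Kafka's default partitioner uses murmur2), and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Same key always hashes to the same partition, so its events stay ordered.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key, 12), []).append((key, event))

order_1_events = [
    event
    for key, event in partitions[partition_for("order-1", 12)]
    if key == "order-1"
]
```

Events for `order-1` land in one partition in the order they were produced; events for different orders may interleave across partitions.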

Delivery Semantics

Semantic	Meaning	How
At-most-once	May lose messages	Fire-and-forget, no retry
At-least-once	May duplicate	Retry until ack; idempotent consumer
Exactly-once	Hard end-to-end	Kafka transactions + idempotent producer + dedup DB

Interview default: at-least-once + idempotent handlers. Exactly-once is expensive; justify for billing.

Consumer Groups

Each partition consumed by at most one consumer in a group. Scale consumers ≤ partition count. Rebalance on consumer join/leave — causes brief pause; use cooperative sticky assignors in production.

Backpressure & Retention

Retention policy (7 days by default in Kafka) bounds disk usage. Slow consumers fall behind (lag), so monitor consumer lag and alert on it. Route poison messages to a dead-letter queue (DLQ) after N failures.

Use Cases

Async jobs: email, thumbnails, search indexing
Event sourcing / CDC propagation
Metrics aggregation pipeline
Decouple peak write spikes from slow processors

Producer → [Topic: orders]
              ├─ partition 0 → Consumer A (group billing)
              ├─ partition 1 → Consumer B (group billing)
              └─ partition 2 → Consumer C (group billing)
Group analytics reads all three partitions independently, tracking its own offsets.

RabbitMQ vs Kafka

	RabbitMQ	Kafka
Model	Queue, routing	Distributed log
Replay	Limited	Native by offset
Throughput	High	Very high
Routing	Exchanges, bindings	Topic + key

Partition Sizing

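As a rough rule of thumb, plan for on the order of 10 MB/s per partition and measure on your own hardware (actual throughput depends on message size, replication, and disks, and can be much higher); too few partitions limits parallelism; too many increases broker overhead.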

Kafka vs SQS

	Kafka	SQS
Ordering	Per partition	FIFO queues only
Retention	Days+	14 days max
Consumers	Pull, groups	Competing consumers

Event Schema Evolution

Avro/Protobuf with schema registry; backward compatible field addition; never remove required fields without version bump.

Worked Example: Kafka

Order events keyed by order_id preserve per-order ordering. 12 partitions → max 12 parallel consumers in group.

Extended Notes

Connect Kafka to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here; prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Message Queues

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Kafka partition key choice?

Entity needing ordering — order_id, user_id — not random if order matters.

At-least-once duplicate handling?

Idempotent consumer: upsert by event_id, check processed table.

When queue vs direct RPC?

Async, burst absorption, fan-out to many consumers, decouple peak load.

Extended Reference — Message Queues & Streams

Message size

Kafka default 1MB max; large payloads store S3 pointer in message body.

Compaction

Log compaction retains the latest value per key within a topic; used for changelog topics holding config/state.

Consumer lag SLO

Alert lag > 60s for billing pipeline; > 5min for analytics acceptable.

Ordering vs parallelism

More partitions = more parallelism but no global order — business must accept per-entity order only.

Poison pill

Message fails parse — DLQ after 3 tries; manual fix schema or skip with audit.

Part 13 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Queue vs log
Kafka partitions
Consumer groups
Delivery semantics
Idempotent consumer
DLQ
Lag monitoring
Partition key
Message size S3
Schema evolution

Self-test prompt

Explain Part 13 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 13 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 14: Microservices vs Monolith

Monolith First

A single deployable application with shared database is faster to build and debug. Many successful products scale monoliths vertically and with read replicas before splitting. Interview tip: do not jump to 50 microservices without scale pain.

When to Split Services

Independent scaling (video transcoding vs API)
Different release cadences per team
Technology fit (Python ML vs Go API)
Fault isolation (billing outage must not take down feed)
Regulatory boundaries (PCI scope reduction)

Microservices Challenges

Challenge	Mitigation
Distributed debugging	Tracing (Jaeger), correlation IDs
Data consistency	Sagas, outbox, eventual consistency
Network latency	Batch APIs, avoid chatty chains
Operational overhead	K8s, Helm, service mesh maturity
Testing	Contract tests, staging environments

API Gateway

Single entry for clients: auth, rate limiting, routing, SSL termination, request aggregation (BFF pattern for mobile vs web).

[Mobile] ──┐
[Web]    ──┼→ [API Gateway] → [User Svc] [Order Svc] [Feed Svc]
[3rd party]─┘         ↓ auth, throttle, route

Service Mesh (Istio, Linkerd)

Sidecar proxy per pod handles mTLS, retries, timeouts, traffic splitting without app code changes. Cost: latency hop, complexity. Worth it at dozens+ services with strong platform team.

Communication Patterns

Sync REST/gRPC: simple request-response; cascading failure risk
Async events: loose coupling; harder to debug
BFF: Backend-for-frontend tailored API per client type

Data Per Service

Each service owns its database — no shared tables. Cross-service queries via API composition or materialized views fed by CDC. Violating this creates distributed monolith.

Interview Script

"I start with a modular monolith — clear package boundaries. If transcoding becomes a bottleneck, extract media-worker service behind a queue while keeping user API monolithic."

Domain-Driven Boundaries

Split by bounded context (billing, catalog, shipping), not by technical layer (for example, one shared "database service" for all data access is the wrong split).

Strangler Fig Migration

Proxy routes 5% traffic to new service; increment until monolith retired.

Service Mesh Cost

Roughly 1–2 ms added latency per hop (varies by mesh and configuration); at 1000 services the mesh control plane becomes a significant operational burden, so justify the mesh before adopting it.

Worked Example: Microservices

Extract notification service first — clear boundary, async, reduces monolith deploy risk without splitting core transaction path.

Extended Notes

Connect microservices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Microservices

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Monolith to microservices first split?

Highest churn isolated component with clear API — not arbitrary layer split.

API gateway vs service mesh?

Gateway: edge auth/routing. Mesh: service-to-service mTLS, retries, traffic split.

Distributed monolith antipattern?

Microservices sharing database tables — no bounded context isolation.

Extended Reference — Microservices

Team topology

Conway's law: service boundaries match team boundaries — align org before splitting code.

Contract testing

Pact tests verify provider/consumer API contracts in CI — prevent breaking downstream.

Shared libraries

Thin shared libs only — fat shared library recreates monolith coupling.

Observability tax

Each service needs metrics, logs, traces — platform team provides templates.

Decomposition trigger

Extract when independent scale, deploy, or failure domain justified — not preemptively.

Part 14 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Monolith first OK
Split boundaries
API gateway role
Service mesh cost
BFF pattern
Data per service
Saga across services
Contract tests
Strangler migration
Avoid distributed monolith

Self-test prompt

Explain Part 14 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 14 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 15: REST, GraphQL, gRPC & WebSockets

REST

Resource-oriented HTTP: nouns as URLs, verbs as methods. Stateless; cacheable GETs. Standard for public APIs and browser clients.

Method	Idempotent	Safe	Use
GET	Yes	Yes	Read
POST	No	No	Create
PUT	Yes	No	Replace
PATCH	No	No	Partial update
DELETE	Yes	No	Remove

Pagination: cursor-based (?cursor=abc) scales better than offset for large tables. Version in path (/v1/) or header.

GraphQL

Client specifies exact fields needed — one round trip for nested data. Server defines schema (types, queries, mutations).

Pros: flexible clients, reduced over-fetching
Cons: complex caching (no HTTP cache per URL easily), N+1 query risk (DataLoader batching), expensive arbitrary queries — need depth/complexity limits

gRPC

HTTP/2 + Protocol Buffers: binary, fast, strongly typed. Four call types: unary, server streaming, client streaming, and bidirectional streaming. Best for service-to-service calls; browsers need a gRPC-Web proxy.

	REST/JSON	gRPC
Contract	OpenAPI optional	.proto required
Performance	Good	Better (binary)
Browser	Native	Needs proxy
Streaming	SSE, WS	Native

WebSockets

Persistent bidirectional TCP — chat, live games, collaborative docs. Stateful connections complicate load balancing (sticky sessions or pub/sub backplane). Heartbeats detect dead connections.

Server-Sent Events (SSE)

One-way server → client over HTTP. Simpler than WebSockets for live feeds, notifications. Auto-reconnect built-in.

Choosing in Interviews

Scenario	Choice
Public mobile API	REST or GraphQL
Internal microservices	gRPC
Live chat	WebSockets + Redis pub/sub
Stock ticker	SSE or WebSocket

External: REST/GraphQL → API Gateway
Internal: gRPC between services
Realtime: WebSocket tier → Redis channel → all WS nodes

REST Pagination Patterns

Cursor: ?after=tweet_id stable under concurrent inserts. Offset bad for deep pages (OFFSET 1000000 slow).
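
A minimal sketch of keyset (cursor) pagination against SQLite. The `tweets` table and the `page_after` helper are assumptions for illustration; the point is that `WHERE id > ?` uses the primary-key index instead of scanning and discarding skipped rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany(
    "INSERT INTO tweets VALUES (?, ?)", [(i, f"tweet {i}") for i in range(1, 8)]
)


def page_after(after_id: int, limit: int):
    """Return rows with id > after_id plus the cursor for the next page."""
    rows = db.execute(
        "SELECT id, body FROM tweets WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


page1, cursor1 = page_after(0, 3)
page2, cursor2 = page_after(cursor1, 3)
page3, cursor3 = page_after(cursor2, 3)
```

A row inserted while the client is between pages does not shift later pages, because each page is defined by the last id seen rather than by a row count.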

GraphQL N+1

Resolvers per field cause N DB queries — DataLoader batches loads per request tick.

gRPC Streaming Use Cases

Server stream: log tail. Client stream: bulk upload. Bidi: collaborative editing.

Worked Example: APIs

Mobile uses GraphQL for home screen single request; backend services still gRPC internally.

Extended Notes

Connect APIs to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here; prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — API Styles

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

REST versioning?

URL /v1/ or Accept header — be consistent; deprecate with sunset headers.

GraphQL complexity attack?

Limit query depth, cost analysis, timeouts, persisted queries allowlist.

gRPC vs REST for public API?

REST/JSON for third parties; gRPC internal — developer experience and browser support.

Extended Reference — REST, GraphQL, gRPC & WebSockets

API versioning

Deprecation timeline communicated via Sunset header; maintain v1 for 12 months.

Idempotent HTTP

PUT/DELETE idempotent by definition; POST needs Idempotency-Key for payments.

GraphQL complexity

Calculate cost: depth × breadth; reject expensive queries at gateway.

gRPC deadlines

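Set a deadline on every call (in Go, context.WithDeadline or context.WithTimeout). gRPC sends the deadline to the server, and passing the same context to downstream calls propagates it across the call chain.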

WebSocket auth

Validate JWT on connect message; re-auth on long-lived connections periodically.

Part 15 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

REST verbs idempotent
Cursor pagination
GraphQL N+1 fix
gRPC internal
WebSocket LB
SSE one-way
Versioning strategy
Proto breaking change
Timeout deadlines gRPC
Pick API per client

Self-test prompt

Explain Part 15 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 15 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 16: Rate Limiting

Why Rate Limit

Protect origin from abuse, ensure fair usage, enforce SLA tiers, and prevent cascade failure. Apply at edge (CDN), API gateway, and service level.

Token Bucket

Bucket holds tokens refilled at rate R (e.g., 100/sec). Each request consumes one token; requests that arrive when the bucket is empty are rejected or queued.

Allows bursts up to bucket capacity B
Smooth average rate over time
Used by many APIs (Stripe, AWS)

elapsed = now - last; last = now    # record the refill time
tokens = min(capacity, tokens + elapsed * rate)
if tokens >= 1: tokens -= 1; allow
else: reject 429
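
The refill logic above can be written as a small class. This is a single-process sketch that takes the clock as an argument so it is easy to test; the `TokenBucket` name and its interface are assumptions, and a distributed limiter would keep this state in a shared store.

```python
class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = now           # time of the last refill

    def allow(self, now: float) -> bool:
        elapsed = now - self.last
        self.last = now
        # Refill lazily, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with HTTP 429


bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

The first two requests use the burst capacity, the third is rejected, and one second later a single token has been refilled.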

Leaky Bucket

Requests enter queue; processed at fixed rate. Smoother output than token bucket; less bursty allowance.

Fixed & Sliding Window

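Fixed window: count requests per minute bucket; allows a boundary burst (with a 200/min limit, 199 requests at 0:59 plus 199 at 1:00 all pass, nearly 2× the limit within two seconds). Sliding window log: store a timestamp per request; accurate but memory heavy. Sliding window counter: weights the previous fixed window's count by its overlap with the sliding window and adds the current window's count; a good balance, commonly built on Redis.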

Algorithm	Burst	Memory	Accuracy
Token bucket	Yes	Low	Good
Fixed window	Edge spike	Low	OK
Sliding log	No	High	Exact
Sliding counter	Moderate	Medium	Good
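
A sketch of the sliding window counter, assuming the common weighting scheme (previous window's count scaled by its overlap with the sliding window, plus the current window's count). The class name and in-memory state are illustrative only.

```python
class SlidingWindowCounter:
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_start = 0.0
        self.current = 0
        self.previous = 0

    def allow(self, now: float) -> bool:
        window_start = now - (now % self.window)
        if window_start != self.current_start:
            # Roll forward; if more than one window passed, previous is empty.
            adjacent = window_start - self.current_start == self.window
            self.previous = self.current if adjacent else 0
            self.current = 0
            self.current_start = window_start
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now - window_start) / self.window
        estimated = self.previous * overlap + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=10, window=60)
burst = [limiter.allow(59) for _ in range(10)]
```

After ten requests at second 59, a request at second 60 is still rejected because the previous window counts in full, which is the boundary burst that a plain fixed window lets through.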

Distributed Rate Limiting

Redis centralizes counters; all API nodes check INCR with TTL. Race conditions: use Lua script for atomic check-and-decrement. For strict global limits across regions, Redis Cluster or dedicated rate-limit service.

Response Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716300000
Retry-After: 60

Rate Limit Dimensions

Per IP (anonymous)
Per API key / user id
Per endpoint (expensive vs cheap)
Global (protect DB)

Interview: Design API Rate Limiter

Redis sorted set sliding window per key; rules service stores limits; gateway enforces before business logic. Mention fail-open vs fail-closed on Redis outage.

Redis Implementation Sketch

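MULTI
ZREMRANGEBYSCORE key 0 (now - window)   -- drop entries older than the window
ZADD key now request_id                 -- record this request
ZCARD key                               -- if > limit: 429
EXPIRE key window                       -- let idle keys expire
EXEC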

Hierarchical Limits

Global 1M RPS → per-tenant 10K → per-user 100 — check cheapest filter first.

Fairness vs Priority

Paid tier higher limits; burst allowance for onboarding flows.

Worked Example: Rate Limit

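Free tier 100 req/min, Pro 10K. Enforce at gateway; return 429 with Retry-After and a response body that links to the upgrade page (the HTTP Upgrade header is for protocol switching, not plan upsell).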

Extended Notes

Connect rate limit to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Rate Limiting

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Token bucket vs sliding window?

Bucket allows controlled burst; sliding window smoother rate over window.

Distributed rate limit race?

Atomic Lua in Redis; or centralized limiter service.

Fail open or closed on limiter outage?

Fail closed for abuse protection; fail open for internal low-risk if business prefers availability.

Extended Reference — Rate Limiting

Burst vs sustained

Token bucket separates concerns — document both limits in API docs.

Per-tenant fairness

Noisy neighbor: one API key cannot consume entire global quota — hierarchical caps.

Cost of Redis limiter

One Redis round trip per request — acceptable at 100K RPS with cluster; shard keys.

Edge vs app limit

CDN/WAF blocks obvious abuse; app enforces business tier limits.

Testing

Load test verifies 429 at threshold and recovery after window reset.

Part 16 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Token bucket
Sliding window
Redis atomic limit
429 headers
Hierarchical limits
Fail open vs closed
Per user and global
Burst allowance
Edge + app limits
Load test 429

Self-test prompt

Explain Part 16 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 16 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 17: Consistent Hashing, Bloom Filters & More

Consistent Hashing

When adding or removing cache nodes, hash mod N remaps almost all keys. Consistent hashing maps keys and nodes onto a ring, so with K keys and N nodes only about K/N keys move on average when one node is added or removed.

Virtual nodes (vnodes): each physical node has 100–200 points on ring for even distribution. Used in Dynamo, Cassandra, Memcached clients, CDNs.

        hash(key) → clockwise first node on ring
    [N1·]  [N2·]  [N3·]  [N1·] ...
         ring 0° — 360°
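
A compact sketch of a ring with virtual nodes. MD5 is used here only as a convenient uniform hash, and the `HashRing` class and node names are assumptions for the example rather than any particular client library's API.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def get(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"key-{i}" for i in range(10000)]
before = {key: ring.get(key) for key in keys}
ring.add("cache-4")
moved = [key for key in keys if ring.get(key) != before[key]]
```

Every key that moves goes to the new node, and the moved share should be close to one quarter of the keys rather than nearly all of them.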

Bloom Filters

Probabilistic set: test membership with zero false negatives (if says no, definitely no) but possible false positives (says yes, might not exist). Space-efficient.

Web crawler: skip already-visited URLs
Cassandra: avoid disk read if key definitely absent
CDN: prevent cache pollution by caching an object only on its second request, keeping one-hit wonders out
Spell check: dictionary in compact filter

Cannot delete from standard bloom; counting bloom or rebuild. Size m bits, k hash functions — tune false positive rate p.
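
A minimal Bloom filter sketch. It derives the k positions from two halves of one SHA-256 digest (double hashing), which is an implementation choice made for this example; the class and method names are illustrative.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


visited = BloomFilter(m_bits=10_000, k_hashes=7)
urls = [f"https://example.com/page/{i}" for i in range(500)]
for url in urls:
    visited.add(url)
```

Every added URL is reported as possibly present (no false negatives); an unseen URL is usually reported absent but may occasionally be a false positive.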

Geohashing

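Encode lat/long into a string; nearby places usually share a prefix, enabling efficient proximity search in Redis/Elasticsearch. Precision = string length. Commonly cited for ride-hailing driver matching and Yelp-style nearby search; note that Uber's own spatial index is the hexagonal H3 grid rather than geohash.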
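
The encoding can be sketched as interleaved binary search over longitude and latitude, emitting one base-32 character per five bits. This follows the standard geohash scheme; the function name is an assumption for the example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits, bit_count, use_lon = 0, 0, True  # bits alternate, longitude first
    chars = []
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=5)
fine = geohash_encode(57.64911, 10.40744, precision=7)
```

A longer hash for the same point always extends the shorter one, which is what makes prefix queries work as a coarse proximity filter.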

Merkle Trees

Hash tree: leaf = data block hash; parent = hash(children). Compare root hashes, then descend only into subtrees whose hashes differ; locating a single differing block takes O(log n) comparisons.

Git: commit tree integrity
Bitcoin: block verification
Cassandra anti-entropy: replica sync without full compare
Distributed DBs: efficient replica reconciliation
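
A small sketch of building a Merkle tree and finding differing blocks by descending only into mismatched subtrees. It assumes both replicas hold the same number of blocks and pairs an odd trailing node with itself; the function names are illustrative.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return hash levels from leaves (index 0) up to the root (last)."""
    level = [_h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair the odd node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a_levels, b_levels):
    """Descend from the root, following only subtrees whose hashes differ."""
    suspects = [0]
    for depth in range(len(a_levels) - 1, 0, -1):
        children = []
        for idx in suspects:
            if a_levels[depth][idx] != b_levels[depth][idx]:
                children += [2 * idx, 2 * idx + 1]
        below = len(a_levels[depth - 1])
        suspects = [i for i in children if i < below]
    return [i for i in suspects if a_levels[0][i] != b_levels[0][i]]


replica_a = build_levels([b"a", b"b", b"c", b"d", b"e"])
replica_b = build_levels([b"a", b"b", b"c", b"X", b"e"])
diff = differing_leaves(replica_a, replica_b)
```

Only the path from the root to block 3 is examined; identical subtrees are skipped after a single hash comparison.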

HyperLogLog

Approximate distinct count in fixed memory — unique visitors, cardinality analytics. Redis PFADD/PFCOUNT.

Count-Min Sketch

Frequency estimation in streaming — heavy hitter detection, hotspot keys.

Structure	Answers	False?
Bloom filter	Maybe in set?	FP only
HyperLogLog	How many unique?	Approximate
Count-Min	How many of X?	Overestimate
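
To show why Count-Min only overestimates, here is a minimal sketch: each item increments one counter per row, and the estimate is the minimum across rows, so collisions can only inflate a count. The class, default sizes, and hashing scheme are assumptions for the example.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One column per row, using the row index as a hash salt.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
for _ in range(100):
    sketch.add("#python")
sketch.add("#go", 5)
```

Memory stays fixed at width × depth counters no matter how many distinct items arrive, which is the trade against exact counting.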

Consistent Hashing Math

With m keys and n nodes, the expected number of keys that move when a node is added is ≈ m/(n+1), roughly m/n for large n. Modulo hashing moves about n/(n+1) of all keys, close to 100%.

Bloom Filter Sizing

m = -n ln(p) / (ln 2)² bits for n items and false positive rate p. k = m/n × ln 2 hash functions.
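
The two formulas can be checked numerically. The helper below is a direct transcription; the name `bloom_size` and the choice to round k to the nearest integer are assumptions made for this sketch.

```python
import math


def bloom_size(n_items: int, fp_rate: float):
    """Return (bits, hash count) for n items at the target false positive rate."""
    m_bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


m_bits, k_hashes = bloom_size(10**9, 0.01)
bits_per_item = m_bits / 10**9
```

For a 1% false positive rate this works out to roughly 9.6 bits per item and 7 hash functions, independent of how large each item is.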

Geohash Neighbor Search

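Query the 8 neighboring cells plus the center cell, because points just across a cell boundary can share only a short prefix (or none, at the equator and prime meridian). Also handle wrap-around at the ±180° meridian and the poles.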

Worked Example: Algorithms

URL dedup crawler: Bloom filter at 1% FP; checking 10B unseen URLs yields roughly 100M false positives, yet the filter still avoids most disk reads. On a positive, verify against the on-disk set.

Extended Notes

Connect algorithms to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Specialized Algorithms

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Consistent hashing use case?

Cache cluster, DynamoDB partitions, CDN origin selection — minimal remapping on node add.

Bloom filter false positive impact?

Extra DB read occasionally — tune false positive rate vs memory budget.

Merkle tree in anti-entropy?

Compare root hashes; recurse into differing branches only — efficient replica sync.

Extended Reference — Consistent Hashing & Probabilistic Structures

Virtual nodes

100 vnodes per physical node prevents uneven ring distribution when few servers.

Bloom in practice

Size for 1% FP and 1B items ≈ 1.2 GB (about 9.6 bits per item); still cheaper than an exact set in RAM.

Geohash precision

6 chars ≈ 1.2km; 7 chars ≈ 150m — pick for urban driver matching.

Merkle sync

Compare subtree hashes top-down — bandwidth proportional to differences not total data.

Sketch algorithms

Use Count-Min for trending hashtags; HyperLogLog for UV — not exact counts.

Part 17 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Consistent hash ring
Virtual nodes
Bloom filter FP
Geohash neighbors
Merkle sync
HyperLogLog UV
Count-Min sketch
Use case per structure
Modulo hash bad
Size bloom formula

Self-test prompt

Explain Part 17 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 17 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 18: Observability — Logs, Metrics, Traces

Three Pillars

Logs: discrete events (errors, audit). Metrics: numeric time series (CPU, QPS). Traces: request path across services. Together they answer: what broke, how bad, where in the chain.

Structured Logging

JSON logs with trace_id, user_id, service, level. Centralize in ELK (Elasticsearch, Logstash, Kibana), Loki, or CloudWatch. Avoid logging PII/passwords. Sample debug logs at high QPS.

Metrics (RED & USE)

Method	Scope	Metrics
RED	Services	Rate, Errors, Duration
USE	Resources	Utilization, Saturation, Errors

Prometheus pull model + Grafana dashboards. Histograms for p50/p99 latency — averages lie.

Distributed Tracing

OpenTelemetry → Jaeger/Tempo. Propagate trace context (W3C traceparent) across HTTP/gRPC/Kafka. One slow span in 20-service chain visible immediately.

Request trace_id=abc
  API 45ms → Auth 12ms → DB 180ms ← bottleneck
              → Cache 2ms

SLI, SLO, SLA

SLI: measurable indicator (availability = successful / total)
SLO: target (99.9% availability over 30 days)
SLA: contract with customer (refund if missed)

Error budget = 1 - SLO. If budget exhausted, freeze features; focus reliability. Burn rate alerts predict SLO violation early.
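
The arithmetic behind error budgets and burn rate, as a short sketch. The function names are hypothetical; burn rate here means the observed error ratio divided by the ratio the SLO allows, so 1× uses the budget exactly over the window.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full outage the SLO tolerates over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """How many times faster than sustainable the budget is being spent."""
    return observed_error_ratio / (1.0 - slo)


budget = error_budget_minutes(0.999)        # 99.9% over 30 days
rate = burn_rate(0.01, 0.999)               # 1% of requests failing
days_to_exhaust = 30 / rate                 # at this pace, from a full budget
```

At 99.9% over 30 days the budget is about 43 minutes of downtime; a sustained 1% error ratio burns it ten times too fast and would exhaust it in about three days.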

Alerting

Alert on symptoms (high 5xx rate, p99 latency) not causes (CPU 80%) unless correlated. Page humans for user-facing SLO breach; ticket for disk 70%. Runbooks linked in alert.

Interview Mention

"I define SLO 99.95% for read API, SLI from load balancer success rate, alert when the 1-hour burn rate exceeds 10× the rate that would exactly use up the error budget."

Log Levels

ERROR: action needed. WARN: degraded. INFO: business events. DEBUG: dev only, sampled in prod.

Cardinality Explosion

Never label metrics with unbounded user_id — use aggregated histograms. High cardinality kills Prometheus.

On-Call Hygiene

Runbooks, escalation policy, blameless postmortems within 48 hours.

Worked Example: Observability

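SLO 99.9% leaves a 0.1% error budget. A 5xx rate of exactly 0.1% is a burn rate of 1×, so page on a multiple of it (e.g., 5xx > 1% for 5 min is a 10× burn). Trace slow checkout to payment RPC timeout.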

Extended Notes

Connect observability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Observability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

SLI vs SLO vs SLA?

Indicator vs internal target vs customer contract.

High cardinality metric example?

http_requests{user_id=x} — forbidden at scale.

Trace sampling?

100% errors, 1% success — balance cost and debuggability.

Extended Reference — Observability

Log sampling

Sample 1% debug at 1M RPS — still 10K logs/sec — tune levels.

Metric labels

service, endpoint, status_code — bounded cardinality.

Trace context propagation

Inject trace_id into logs for correlation — single pane search.

SLO dashboard

Burn rate panels for executives — error budget remaining this quarter.

On-call

Every alert actionable — if not, fix alert or delete.

Part 18 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Logs metrics traces
RED metrics
Trace propagation
SLI SLO SLA
Error budget
High cardinality avoid
Alert symptoms
Runbooks
Sampling strategy
Postmortem blameless

Self-test prompt

Explain Part 18 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 18 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 19: Reliability & Disaster Recovery

Reliability Goals

System continues correctly despite failures. Measure with availability SLOs and MTTR (mean time to repair). Design for failure — everything fails eventually.

Redundancy

Active-active: all nodes serve traffic — no idle capacity waste; harder consistency
Active-passive: standby takes over on failover — simpler, wasted standby
N+1, N+2: spare capacity for component failure

Failover

Health checks detect unhealthy instances; LB removes them from the pool. DNS failover handles regional outages but is slow, since clients keep cached records until the TTL expires. Database automatic failover uses fencing (STONITH) to prevent dual-writer split-brain.

Disaster Recovery (DR)

Term	Meaning
RPO	Recovery Point Objective — max data loss (time of last backup/replica)
RTO	Recovery Time Objective — max downtime to restore service

Async cross-region replication increases RPO (minutes of loss possible). Sync replication lowers RPO but raises latency.

Multi-Region Strategies

Backup restore: cheapest; highest RTO/RPO
Pilot light: minimal DR region, scale up on disaster
Warm standby: reduced capacity always running
Active-active: full capacity both regions; hardest

Chaos Engineering

Proactively inject failures (Chaos Monkey, Litmus) in controlled environments. Validate retries, circuit breakers, and runbooks before real outages. Start with game days, not random prod kills.

Dependency Failure

Every sync call is a failure domain. Timeouts + circuit breakers + graceful degradation (show cached feed if ranking service down).

[Primary Region] ←──async repl──→ [DR Region]
        ↓ failover DNS / traffic manager
   RPO 5 min, RTO 30 min (example targets)

Blast Radius

Isolate by cell (subset of users), shard, or region — failure affects 1% not 100%.

Game Day Checklist

Inject DB failover
Kill AZ
Spike traffic 3×
Verify alerts fire
Measure RTO actual

Backup Testing

Untested restore = no backup. Quarterly restore drill to staging.

Worked Example: Reliability

RPO 1 hour: async binlog replicate. RTO 15 min: automated failover + runbook for DNS flip.

Extended Notes

Connect reliability to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Reliability

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

RPO vs RTO example?

RPO 5 min = lose 5 min data max. RTO 30 min = down 30 min max.

Chaos engineering prerequisite?

Observability, on-call, steady state hypothesis — otherwise chaos is reckless.

Active-active database challenge?

Write conflicts across regions — need CRDT or conflict resolution.

Extended Reference — Reliability & DR

Dependency map

Maintain tier-0 dependency graph — if Redis down, which features degrade?

Graceful degradation

Feature flags disable recommendations; core feed still serves from cache.

DR drill

Quarterly failover to secondary region with production-like traffic shadow.

Data backup

PITR 35 days; test restore to new cluster monthly.

Incident response

Severity levels, comms template, status page update cadence.

Part 19 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Redundancy N+1
Active-active vs passive
RPO RTO defined
DR drill
Chaos engineering safe
Graceful degradation
Blast radius
Backup restore test
Multi-region tradeoff
Dependency map

Self-test prompt

Explain Part 19 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 19 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 20: Security Fundamentals

Authentication vs Authorization

Authn: who are you (login, JWT, session). Authz: what may you do (RBAC, ABAC, ACL). Always authenticate at gateway; authorize per resource in service.

OAuth 2.0 / OpenID Connect

Delegate auth to identity provider (Google, Okta). Authorization code flow with PKCE for SPAs. Access token (short) + refresh token (long, stored securely). OIDC adds ID token (user profile).
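
A sketch of the PKCE pieces, assuming the S256 method from RFC 7636: the client keeps a random verifier and sends only its SHA-256 challenge with the authorization request. The function names are illustrative; a real client would use an OAuth library.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    # 32 random bytes encode to 43 URL-safe characters (RFC 7636 allows 43-128).
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    # S256: base64url(sha256(verifier)) with padding stripped.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_verifies(verifier: str, stored_challenge: str) -> bool:
    # At token exchange the server recomputes the challenge from the verifier.
    return secrets.compare_digest(make_code_challenge(verifier), stored_challenge)
```

An attacker who intercepts the authorization code still lacks the verifier, so the token exchange fails.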

Session vs JWT

	Server session	JWT
Revocation	Easy (delete session)	Hard until expiry
Scale	Needs Redis session store	Stateless verification
Size	Small cookie	Large header

Encryption

In transit: TLS 1.2+ everywhere (HTTPS, mTLS service mesh)
At rest: AES-256 disk encryption (AWS KMS, envelope encryption)
Application-level: encrypt PII fields before DB for defense in depth

DDoS Protection

Volumetric attacks absorbed at CDN/scrubbing center. Rate limiting, WAF, geo blocking. Anycast spreads load. Never expose origin IP directly.

OWASP Top 10 (Overview)

Broken access control
Cryptographic failures
Injection (SQL, XSS)
Insecure design
Security misconfiguration
Vulnerable components
Auth failures
Integrity failures
Logging failures
SSRF

Mitigate: parameterized queries, input validation, CSP headers, least privilege IAM, secret rotation, security scanning in CI.

Zero Trust

Never trust internal network; verify every request. mTLS between services, network policies in K8s.

Secrets Management

Vault, AWS Secrets Manager — never commit .env. Rotate keys; short-lived tokens.

SQL Injection Prevention

Parameterized queries only; ORM not excuse for raw string concat.
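
A small demonstration with the standard-library sqlite3 module, assuming a hypothetical users table. It contrasts string concatenation with a bound parameter on the same malicious input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name: str):
    # Vulnerable: attacker-controlled text becomes part of the SQL statement.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()


def find_user_safe(name: str):
    # The driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
```

The unsafe version returns every row for this payload; the parameterized version returns none.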

PCI Scope

Use hosted fields / tokenization — card data never touches your servers.

Worked Example: Security

OAuth scopes: read:profile vs write:post. JWT 15 min access + refresh rotation.

Extended Notes

Connect security to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Security

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

OAuth implicit flow deprecated why?

Token exposed in browser — use authorization code + PKCE.

mTLS benefit?

Mutual authentication service-to-service — no trusted network assumption.

OWASP injection fix?

Parameterized queries, ORM, input validation, least privilege DB user.

Extended Reference — Security

Least privilege IAM

Service account per microservice; no shared admin keys in apps.

Secrets in CI

Short-lived OIDC to cloud — no long-lived AWS keys in GitHub.

Audit logging

Immutable audit trail for admin actions — who changed ACL when.

DDoS layers

Volumetric at CDN; application layer at WAF rate rules; origin protection hide IP.

Supply chain

Dependabot, signed containers, SBOM for compliance.

Part 20 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Authn vs authz
OAuth PKCE
JWT vs session
TLS everywhere
Encryption at rest
OWASP top aware
DDoS layers
Least privilege IAM
PCI scope reduce
No secrets in git

Self-test prompt

Explain Part 20 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 20 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 21: Resilience Design Patterns

Circuit Breaker

Stop calling failing dependency after threshold — fail fast, give time to recover. States: closed (normal), open (reject), half-open (probe).

Libraries: Resilience4j, Hystrix (legacy). Pair with fallback (cached response, degraded mode).
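
A minimal sketch of the three states, assuming a simple consecutive-failure threshold and a single probe after the reset timeout. Production libraries usually track error rates over a sliding window instead; the class and parameter names here are illustrative.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # fail fast without touching the dependency
            self.state = "half_open"  # timeout elapsed: let one probe through
        try:
            result = fn()
        except Exception:
            self._on_failure()
            return fallback()
        self.state = "closed"       # success closes the circuit again
        self.failures = 0
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

The fallback argument is where a cached response or degraded mode plugs in.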

Bulkhead

Isolate resource pools — thread pool per dependency so one slow service cannot exhaust all threads. K8s resource limits per container are bulkheads at infra level.
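
One way to sketch a bulkhead without a dedicated thread pool is a semaphore per dependency that rejects calls once its slots are taken. The Bulkhead class below is an illustration under that assumption, not a library API.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one dependency; excess calls get the fallback."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, fallback):
        # Non-blocking acquire: reject immediately instead of queueing threads.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._slots.release()
```

Create one instance per downstream (for example search and payments) so a slow search cannot consume the slots reserved for payments.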

Retry with Backoff

Transient failures (503, timeout): retry with exponential backoff + jitter. Cap max retries. Retry only idempotent operations; a POST is safe to retry only when it carries an idempotency key.

delay = min(cap, base * 2^attempt + random_jitter)
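
The formula can be sketched as a small retry helper. TransientError is a hypothetical stand-in for a 503 or timeout, and the base, cap, and attempt limit are example values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (503, timeout)."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    # delay = min(cap, base * 2^attempt + jitter), with jitter in [0, base]
    jitter = random.uniform(0, base)
    return min(cap, base * (2 ** attempt) + jitter)


def call_with_retry(fn, max_attempts: int = 4, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                      # retries exhausted: surface the error
            sleep(backoff_delay(attempt))  # wait longer after each failure
```

Only wrap operations that are idempotent or carry an idempotency key.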

Timeout

Set a timeout at every hop, and make each downstream call's timeout shorter than its caller's so inner calls give up before the outer request does. Cascading waits kill systems; a default 30s HTTP client timeout is dangerous at scale.

CQRS

Command Query Responsibility Segregation — separate write model (normalized OLTP) from read model (denormalized Elasticsearch). Updates propagate via events. Scales reads independently.

Event Sourcing

Store sequence of events as source of truth; state derived by replay. Audit trail for free; complex queries need projections. Pair with snapshots for long streams.
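
A minimal sketch of replay with a snapshot, assuming a hypothetical account stream with deposited and withdrawn events. The snapshot is just the last applied sequence number plus the state at that point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str    # "deposited" or "withdrawn" (illustrative event names)
    amount: int


def apply_event(balance: int, event: Event) -> int:
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, snapshot=(0, 0)):
    """Return (last_seq, balance); snapshot is (last_seq_included, balance_then)."""
    last_seq, balance = snapshot
    for e in events:
        if e.seq > last_seq:       # skip events already folded into the snapshot
            balance = apply_event(balance, e)
            last_seq = e.seq
    return last_seq, balance
```

Replaying from a snapshot gives the same state as replaying the full stream, which is what makes snapshots safe for long streams.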

Pattern	Problem solved
Circuit breaker	Cascade failure
Bulkhead	Resource exhaustion
Retry + backoff	Transient errors
Timeout	Hung connections
CQRS	Read/write scale mismatch
Event sourcing	Audit, temporal queries

[Service] --timeout 200ms--> [Dependency]
     | circuit OPEN → fallback cache
     | bulkhead pool max 50 threads

Retry Storm

Clients retry on 503 simultaneously → overload. Jittered backoff + server Retry-After header.

CQRS Read Model Build

Projection worker consumes events → updates Elasticsearch doc. Rebuild projection from event log on corruption.

Saga vs 2PC Decision Tree

Money across services → saga + ledger audit. Config update across services → saga OK with compensate.

Worked Example: Patterns

Circuit open after 50% errors in 10s window; half-open allow 3 probe requests.

Extended Notes

Connect patterns to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Resilience Patterns

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Circuit breaker half-open?

Test whether the dependency has recovered: let a small number of probe requests through (e.g. 1–3) before restoring full traffic.

Retry idempotency?

POST without key may duplicate — require Idempotency-Key header.

CQRS without event sourcing?

Yes — separate read/write stores synced by CDC enough for many systems.

Extended Reference — Resilience Patterns

Timeout budgets

Total user request budget 300ms: cap each hop at 50ms across at most 4 hops (200ms), leaving 100ms for the service's own work and network overhead.
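
A sketch of deadline propagation under these numbers: each hop gets the smaller of its cap and whatever is left of the request budget. The Deadline class is illustrative; frameworks such as gRPC provide their own deadline mechanism.

```python
import time


class DeadlineExceeded(Exception):
    pass


class Deadline:
    def __init__(self, budget_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return self._expires_at - self._clock()

    def timeout_for_hop(self, per_hop_cap_s: float) -> float:
        left = self.remaining()
        if left <= 0:
            # Budget already spent: fail now rather than start another call.
            raise DeadlineExceeded("request budget already spent")
        return min(per_hop_cap_s, left)
```

Pass the value from timeout_for_hop(0.050) as the client timeout of each downstream call.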

Bulkhead thread pools

Pool per downstream — search slow does not exhaust pools for payments.

Fallback quality

Stale cache better than 500 error for product listing — label 'prices may be delayed'.

CQRS rebuild

Replay event log 24h to rebuild corrupted read model — disaster recovery for projections.

Anti-pattern

Retry storm without jitter — amplifies outage.

Part 21 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Circuit breaker states
Bulkhead pools
Retry jitter
Timeout budgets
CQRS projection
Event sourcing snapshot
Fallback defined
No retry storm
Half-open probe
Degrade UX message

Self-test prompt

Explain Part 21 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 21 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 22: Storage — Block, File & Object

Block Storage

Raw volumes (EBS, SAN) mounted as disks. Low-level, high IOPS. Used for databases, VM boot volumes. Snapshots for backup; attach/detach to instances.

File Storage

POSIX filesystem (NFS, EFS, HDFS). Shared folders, legacy apps, data science home directories. Not ideal for internet-scale static assets — latency and cost.

Object Storage

Blob + metadata via HTTP API (S3, GCS). Virtually unlimited scale, 11 nines durability claim, cheap per GB. Keys like s3://bucket/user/123/photo.jpg.

Type	Access	Best for
Block	Disk protocol	Databases, transactional local state
File	Filesystem path	Shared files, Hadoop
Object	HTTP key-value	Media, backups, data lake

Object Storage Patterns

Pre-signed URLs for direct client upload (bypass API bandwidth)
Lifecycle policies: Standard → IA → Glacier
CDN origin for static delivery
Versioning + replication for DR

Interview: Photo App

Metadata in SQL; binary in S3; thumbnail via async worker; CloudFront in front. Never store 5 MB images in Postgres rows.

Client → presigned PUT → S3
       → POST /photos {s3_key} → API → SQL metadata
Worker ← SQS ← event → generate thumbnails → S3

EBS vs Instance Store

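EBS is network-attached and persists independently of the instance; back it up with snapshots. Instance store is faster but ephemeral (data is lost when the instance stops), so use it only for caches and scratch data.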

Data Lake

S3 + Parquet + Spark/Presto for analytics decoupled from OLTP.

Erasure Coding

Object stores typically use erasure coding rather than full replicas for cost-effective durability at rest, especially in colder tiers.

Worked Example: Storage

Dropbox: metadata SQL, chunks object storage, dedupe by content hash per user namespace.
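
A toy sketch of chunk-level dedupe by content hash within a user namespace. ChunkStore is an in-memory stand-in for object storage, and the 4-byte chunk size is only for the demo; real systems use chunks in the megabyte range.

```python
import hashlib

CHUNK_SIZE = 4  # demo value only


class ChunkStore:
    """In-memory stand-in for object storage, keyed by (namespace, sha256)."""

    def __init__(self):
        self.blobs = {}
        self.puts = 0   # counts chunks actually uploaded

    def put_file(self, namespace: str, data: bytes) -> list:
        manifest = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            key = (namespace, digest)
            if key not in self.blobs:   # upload only chunks not seen before
                self.blobs[key] = chunk
                self.puts += 1
            manifest.append(digest)     # the manifest goes in the SQL metadata
        return manifest

    def get_file(self, namespace: str, manifest: list) -> bytes:
        return b"".join(self.blobs[(namespace, h)] for h in manifest)
```

Scoping the key by namespace means identical chunks from different users are stored separately, which trades some storage for isolation.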

Extended Notes

Connect storage to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Storage

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

S3 eventual consistency?

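Since December 2020, S3 provides strong read-after-write consistency for PUTs, overwrites, DELETEs, and LIST. The older model (eventual consistency for overwrites and LIST) is still worth knowing, because other object stores may behave that way; check before relying on listing.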

Block storage for DB?

EBS gp3/io2 — provisioned IOPS for latency-sensitive OLTP.

File vs object for ML training?

Object store + parallel read workers; POSIX file mount optional layer.

Extended Reference — Storage Systems

S3 key design

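S3 scales request rates per prefix (about 3,500 writes and 5,500 reads per second per prefix), so a hashed user_id prefix is needed only when a single prefix would exceed that; add random prefixes only at extreme scale.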

Lifecycle cost

80% storage cost in old infrequent access — lifecycle rules save money.

EBS snapshot

Incremental snapshots; cross-region copy for DR.

POSIX on object

Mount s3fs for legacy — not performance path; use native SDK.

Compliance

Object lock WORM for regulatory retention.

Part 22 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Block file object diff
S3 presigned upload
Lifecycle tiers
EBS for DB
Data lake S3
No big BLOB SQL
Erasure coding note
POSIX on object caution
Cross-region replicate
Backup snapshots

Self-test prompt

Explain Part 22 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 22 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 23: Search & Indexing

Why Separate Search Engine

SQL LIKE '%foo%' full table scan — unusable at scale. Inverted indexes power fast full-text search, fuzzy match, facets, ranking.

Inverted Index

Maps each term → list of document IDs containing it. Query intersects posting lists for AND queries.

"quick fox" indexed:
  quick → [doc1, doc5]
  fox   → [doc1, doc9]
  AND   → [doc1]
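
The same example as a minimal sketch, assuming whitespace tokenization and lowercasing only (no stemming or stopwords). The document texts are made up to match the posting lists above.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, query: str) -> list:
    postings = [index.get(t, set()) for t in query.lower().split()]
    if not postings:
        return []
    postings.sort(key=len)          # intersect the smallest list first
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return sorted(result)


docs = {
    "doc1": "The quick brown fox",
    "doc5": "quick start guide",
    "doc9": "a fox in the snow",
}
index = build_index(docs)
```

Starting from the shortest posting list keeps the intermediate result small, which is the usual optimization for AND queries.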

Elasticsearch Architecture

Index: logical namespace (like database)
Shard: horizontal partition of index
Replica: copy for read scale and HA
Analyzers: tokenize, stem, lowercase text

Writes route to primary shard; replicas sync. Near-real-time search (refresh interval ~1s default).

Ranking & Relevance

TF-IDF, BM25 scoring. Boost fields (title > body). Function scores for popularity, recency. Personalization often hybrid: ES retrieval + ML rerank.

Sync from Primary DB

CDC or dual-write to index. Reindex on mapping changes (new field type). Handle deletes — tombstone in index.

Alternatives

Algolia/Typesense for managed SaaS; Postgres full-text for small scale; vector DB for semantic search (embeddings).

Feature	SQL	Elasticsearch
Prefix search	Poor	Good (edge n-grams)
Faceted browse	Heavy GROUP BY	Native aggregations
ACID writes	Yes	Eventual index refresh

Autocomplete Pipeline

Two options: an edge n-gram tokenizer applied at index time, or the completion suggester for prefix queries.
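
What edge n-grams do can be shown in a few lines. This sketch assumes min and max gram lengths of 1 and 10 and a plain dict as the index; it mimics the idea, not Elasticsearch's implementation.

```python
def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> list:
    """All prefixes of the term between min_gram and max_gram characters."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]


def build_prefix_index(titles) -> dict:
    """Index every word prefix so a typed prefix is an exact-match lookup."""
    index = {}
    for title in titles:
        for word in title.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(title)
    return index
```

The cost moves to index time and index size: every word is stored once per prefix, so queries become cheap lookups.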

Pagination in Search

search_after with sort keys — deep pagination without costly offset.

Index Mapping Mistakes

Wrong field type (text vs keyword) breaks aggregations and exact filters.

Worked Example: Search

Yelp search: geo filter + text + rating facet — inverted index + geo index combined.

Extended Notes

Connect search to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Search

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Inverted index update?

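New documents become searchable after the next refresh (~1s by default). Use refresh=wait_for when a write must be visible to the next search, and external versioning to keep out-of-order updates from overwriting newer data.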

Why Elasticsearch for logs?

Full-text + aggregations + time-series index patterns (ELK stack).

Vector search addition?

Embeddings index for semantic similarity — hybrid with keyword BM25.

Extended Reference — Search & Indexing

Analyzer chain

Lowercase → stopwords → stemmer — tune for language.

Shard sizing ES

20–50 GB per shard guideline; force merge maintenance window.

Hybrid search

BM25 retrieve top 100 → vector rerank top 10 — best of keyword + semantic.

Index rebuild

Blue-green indices alias swap — zero downtime reindex.

Security

Filter queries by tenant_id mandatory — prevent cross-tenant leak.

Part 23 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Inverted index
ES shards replicas
Analyzers
BM25 ranking
CDC to index
Reindex blue-green
search_after pagination
Tenant filter mandatory
Vector hybrid optional
Autocomplete edge ngram

Self-test prompt

Explain Part 23 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 23 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 24: Real-Time Systems & News Feeds

Fan-Out on Write

When a user posts, push the post ID into all followers' timeline caches (Redis sorted sets). Reads are cheap: one fetch of the precomputed timeline, with no per-followee work.

Pros: fast reads, predictable latency
Cons: slow write for celebrities (millions of followers); wasted work if follower inactive

Fan-Out on Read

On read, merge recent posts from all followed users. Write is cheap; read is expensive and slow for users following many accounts.

Hybrid (Twitter-style)

Fan-out on write for normal users (<10K followers). Fan-out on read for celebrities — merge celebrity tweets at read time from dedicated cache.

Post tweet:
  if followers < 10K → push to each follower timeline cache
  else → write to celebrity tweet cache only

Read timeline:
  merge(user_timeline_cache, celebrity_tweets_cache)
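
The pseudocode above can be sketched with in-memory dicts standing in for the Redis caches. FeedService and its fields are illustrative names, and the threshold is a constructor argument so it can be tuned.

```python
class FeedService:
    def __init__(self, followers: dict, following: dict, celebrity_threshold: int = 10_000):
        self.followers = followers        # author -> set of follower IDs
        self.following = following        # user -> set of followee IDs
        self.threshold = celebrity_threshold
        self.timelines = {}               # user -> [(ts, post_id)], stand-in for a ZSET
        self.celebrity_posts = {}         # author -> [(ts, post_id)]

    def post(self, author: str, post_id: str, ts: int) -> None:
        fans = self.followers.get(author, set())
        if len(fans) < self.threshold:
            for fan in fans:              # fan-out on write
                self.timelines.setdefault(fan, []).append((ts, post_id))
        else:                             # celebrity: store once, merge at read time
            self.celebrity_posts.setdefault(author, []).append((ts, post_id))

    def read_timeline(self, user: str, limit: int = 20) -> list:
        merged = list(self.timelines.get(user, []))
        for followee in self.following.get(user, set()):
            merged.extend(self.celebrity_posts.get(followee, []))
        merged.sort(reverse=True)         # newest first
        return [post_id for _, post_id in merged[:limit]]
```

The read path pays a merge only for the celebrities a user follows, which is typically a small number.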

Pull vs Push Models

	Pull	Push
Client	Polls server periodically	Server sends via WS/SSE/push notification
Latency	Poll interval bound	Near real-time
Server load	Empty polls waste resources	Connection state per client
Battery	Worse if aggressive poll	Push can be efficient with FCM/APNs

Activity Streams

FQL-style aggregation: store activities, fan-out to inboxes, rank by ML offline. Kafka for event pipeline; Redis for hot timelines; cold storage in Cassandra.

Ranking Feed

Not chronological at scale — score = f(recency, engagement, affinity). Precompute scores in batch; blend with real-time signals.

Timeline Storage

Redis ZSET: key=timeline:user_id, score=timestamp, member=tweet_id. Trim to top 1000 entries.

Cold Start User

Global popular feed until follow graph populated — onboarding engagement.

Feed Ranking Features

Recency, author affinity, engagement probability — offline model + online blend.

Worked Example: Feeds

Normal user 500 followers: fan-out write 500 Redis ZADDs ~5ms. Celebrity 50M: fan-out read only.

Extended Notes

Connect feeds to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Real-Time Feeds

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Celebrity fan-out hybrid threshold?

Industry often 10K–100K followers — tune by infrastructure cost.

WebSocket scaling?

Pub/sub backplane (Redis) so any WS node receives broadcast to local connections.

Feed ranking offline/online?

Offline batch scores + online rerank with fresh engagement signals.

Extended Reference — Real-Time & Feeds

Ranking pipeline

Offline Spark computes scores hourly; online feature store serves p99 < 10ms lookup.

Feed pagination

Cursor = last tweet_id in page; stable if no deletes; tombstone deleted ids.

Live updates

SSE fanout from pub/sub cheaper than WS for one-way notifications.

Write amplification

Fan-out write 10M followers = 10M writes — async queue required; rate limit celebrity post.

Read merging

K-way merge sorted lists from followees — heap O(log k) per item.
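
In Python the standard library already provides a heap-based k-way merge. This sketch assumes each followee's list is sorted newest-first as (timestamp, post_id) pairs.

```python
import heapq
from itertools import islice


def merge_timelines(per_author_posts, limit: int = 20) -> list:
    """Merge newest-first lists lazily and stop after `limit` items."""
    merged = heapq.merge(*per_author_posts, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Because heapq.merge is lazy, taking the first page touches only about limit items plus one per list, not every post.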

Part 24 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Fan-out write read
Hybrid celebrities
Redis ZSET timeline
Pull vs push
K-way merge
Ranking offline online
Cold start feed
WS pub/sub scale
Write amplification aware
Stale feed OK?

Self-test prompt

Explain Part 24 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 24 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 25: Payments & Ledger Design

Requirements

Exactly-once money movement illusion, audit trail, idempotency, PCI scope minimization (use Stripe/Adyen tokenization). Strong consistency for balances.

Double-Entry Ledger

Every transaction has equal debit and credit entries; sum of accounts always balances.

Transfer $100 A → B:
  DEBIT  account_A  100
  CREDIT account_B  100

Ledger entries are immutable: never UPDATE a balance in place. Append entries and compute the balance as a SUM, or maintain a materialized balance that is updated in the same DB transaction as the entries.
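
A minimal append-only ledger in sqlite3, assuming the sign convention of the transfer example (DEBIT reduces an account's balance, CREDIT increases it). The table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger_entries (
        id INTEGER PRIMARY KEY,
        txn_id TEXT NOT NULL,
        account TEXT NOT NULL,
        direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')),
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
    )
""")

INSERT_ENTRY = (
    "INSERT INTO ledger_entries (txn_id, account, direction, amount_cents) "
    "VALUES (?, ?, ?, ?)"
)
SIGNED_SUM = (
    "SELECT COALESCE(SUM(CASE direction WHEN 'CREDIT' THEN amount_cents "
    "ELSE -amount_cents END), 0) FROM ledger_entries"
)


def transfer(txn_id: str, from_account: str, to_account: str, amount_cents: int) -> None:
    with conn:  # one transaction: both entries commit or neither does
        conn.execute(INSERT_ENTRY, (txn_id, from_account, "DEBIT", amount_cents))
        conn.execute(INSERT_ENTRY, (txn_id, to_account, "CREDIT", amount_cents))


def balance(account: str) -> int:
    return conn.execute(SIGNED_SUM + " WHERE account = ?", (account,)).fetchone()[0]


def ledger_is_balanced() -> bool:
    # Total credits minus total debits across all accounts must be zero.
    return conn.execute(SIGNED_SUM).fetchone()[0] == 0
```

A non-zero total from ledger_is_balanced points at a bug or tampering, which is the imbalance detection that double-entry provides.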

Idempotency Keys

Client sends Idempotency-Key: uuid on POST /charges. Server stores key → result mapping. Retries return same response without double charge.
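
A sketch of the key-to-result mapping using an in-memory dict. ChargeService is hypothetical; a real service would store keys in the database under a unique constraint so concurrent retries cannot both pass the check.

```python
import uuid


class ChargeService:
    def __init__(self):
        self._results = {}      # idempotency key -> stored response
        self.charges_made = 0   # counts real charges, for illustration

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Retry of a request already handled: replay the stored response.
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self._results[idempotency_key] = response
        return response


key = str(uuid.uuid4())  # the client generates one key per logical operation
```

The client must reuse the same key on every retry of one operation and a fresh key for each new operation.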

Payment Flow

Create payment intent (pending)
Call PSP (payment service provider)
Webhook confirms success/failure (async)
Update ledger + order state atomically

Webhook handler must be idempotent — PSP may retry webhooks.

Reconciliation

Nightly batch compare internal ledger vs PSP settlement files. Discrepancy alerts for fraud or bugs.

Outbox for Side Effects

Ledger write + outbox event in one transaction → email receipt, analytics without losing money record.
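
A minimal outbox sketch in sqlite3, assuming a simplified single-row ledger table and a polling relay. The table names, the receipt.email topic, and the publish callback are all illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, order_id TEXT, amount_cents INTEGER)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT, "
    "published INTEGER DEFAULT 0)"
)


def record_payment(order_id: str, amount_cents: int) -> None:
    payload = json.dumps({"order_id": order_id, "amount_cents": amount_cents})
    with conn:  # the money record and the event commit together or not at all
        conn.execute("INSERT INTO ledger (order_id, amount_cents) VALUES (?, ?)",
                     (order_id, amount_cents))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("receipt.email", payload))


def relay_once(publish) -> int:
    """Publish pending events, then mark them; returns how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # a crash after this line re-sends later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Delivery is at-least-once, so consumers such as the email sender still need their own dedupe.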

Failure	Handling
PSP timeout	Query PSP status; never assume failure
Duplicate webhook	Idempotent webhook handler
Partial saga	Compensating refund saga

PCI DSS Layers

SAQ A if all card data on Stripe Elements — smallest compliance burden.

Currency & Rounding

Store amounts in minor units (cents) as integers — never float for money.
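
A short illustration of why integers are used, with two hypothetical helpers: one splits an amount without losing a cent, the other formats non-negative cents for display.

```python
def split_evenly(total_cents: int, parts: int) -> list:
    """Split an amount so the shares always sum back to the total."""
    share, remainder = divmod(total_cents, parts)
    # The first `remainder` shares carry one extra cent each.
    return [share + 1 if i < remainder else share for i in range(parts)]


def format_usd(cents: int) -> str:
    """Format non-negative cents as dollars; formatting happens only at the edge."""
    return f"${cents // 100}.{cents % 100:02d}"


float_sum_is_exact = (0.1 + 0.2 == 0.3)   # False: binary floats cannot represent 0.1
int_sum_is_exact = (10 + 20 == 30)        # True: cents as integers add exactly
```

Splitting $10.00 three ways yields 334, 333, and 333 cents, so nothing is lost to rounding.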

Chargeback Flow

Webhook dispute.created → freeze merchant payout → evidence upload workflow.

Worked Example: Payments

Stripe webhook idempotent by event_id unique index. Ledger append-only, never UPDATE amount.
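
A sketch of dedupe by unique event ID plus a guarded state transition, using sqlite3. The table layout and handle_webhook are assumptions for illustration, not Stripe's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE payments (payment_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO payments VALUES ('pay_1', 'pending')")
conn.commit()


def handle_webhook(event_id: str, payment_id: str, new_status: str) -> bool:
    """Return True if the event was new, False if it was a duplicate delivery."""
    try:
        with conn:  # dedupe record and state change commit together
            conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            # State guard: only a pending payment may move to a final state,
            # so late or out-of-order events cannot overwrite it.
            conn.execute(
                "UPDATE payments SET status = ? WHERE payment_id = ? AND status = 'pending'",
                (new_status, payment_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key violation: this event_id was already handled


def payment_status(payment_id: str) -> str:
    return conn.execute(
        "SELECT status FROM payments WHERE payment_id = ?", (payment_id,)
    ).fetchone()[0]
```

Either way the handler should answer the PSP with a success status so it stops retrying.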

Extended Notes

Connect payments to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Payments

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Double-entry why?

Audit trail, imbalance detection fraud, accounting compliance.

Idempotency key storage TTL?

24–72 hours covers client retry windows; Stripe documents 24h.

Webhook ordering?

Do not assume order — use event_id dedup and state machine.

Extended Reference — Payments & Ledger

Immutable ledger

Append-only entries; corrections via compensating entries not UPDATE.

Minor units

BIGINT cents prevents float rounding 0.1 + 0.2 bugs.

PSP abstraction

Interface PaymentProvider — swap Stripe/Adyen; mock in tests.

Fraud checks

Sync fraud score before capture — async review for high value.

Regulatory

KYC/AML separate service; PCI scope minimization documented.

Part 25 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Double-entry ledger
Idempotency Stripe
Webhook dedup
Integer cents
Saga payment
Reconciliation batch
PCI tokenize
Never float money
Outbox notify
Compensate refund

Self-test prompt

Explain Part 25 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 25 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 26: Notification System Design

Channels

Email, SMS, push (FCM/APNs), in-app, web push. Each channel has different providers, rate limits, cost, and delivery guarantees.

High-Level Architecture

[Event: order shipped] → Kafka → [Notification Service]
                                      ├→ Email worker → SendGrid
                                      ├→ SMS worker → Twilio
                                      └→ Push worker → FCM/APNs

User Preferences

Store per-user channel opt-in, quiet hours, locale. Check preferences before enqueue. Regulatory: marketing vs transactional (CAN-SPAM, TCPA).

Template & Localization

Template ID + variables rendered per locale. Version templates; A/B test subject lines offline.

Delivery & Retries

At-least-once queue per channel. Exponential backoff on provider 5xx. DLQ for bad addresses. Track delivery webhooks (email opened, bounce).

Volume Estimation

10M DAU × 5 notifications/day = 50M messages/day ≈ 580/sec average, higher peak. Shard queue by user_id. Rate limit per provider (SMS expensive).

Idempotency

Event id + notification type dedupes — avoid duplicate push on Kafka replay.

Priority Queues

Transactional (password reset) > marketing. Separate queues so blast campaign does not delay 2FA codes.
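
A toy model of separate queues with strict priority, using in-memory deques in place of real queues. The class and queue names are illustrative.

```python
from collections import deque


class NotificationQueues:
    """Two separate queues; workers always drain transactional first."""

    ORDER = ("transactional", "marketing")

    def __init__(self):
        self.queues = {kind: deque() for kind in self.ORDER}

    def enqueue(self, kind: str, message: str) -> None:
        self.queues[kind].append(message)

    def next_message(self):
        for kind in self.ORDER:
            if self.queues[kind]:
                return self.queues[kind].popleft()
        return None  # nothing pending
```

In production the same effect usually comes from dedicated worker pools per queue, so marketing backlog never consumes transactional capacity.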

Monitoring

Metrics: sent, delivered, failed, latency per channel. Alert on bounce rate spike (bad list) or provider outage.

Channel	Latency	Cost
Push	Seconds	Low
Email	Seconds–minutes	Low
SMS	Seconds	High

Push Token Registry

device_tokens table: user_id, platform, token, updated_at. Invalidate a token when the push provider (FCM/APNs) reports it as unregistered or invalid.

Email Bounce Handling

Hard bounce → suppress address. Soft bounce → retry with backoff.

Unsubscribe One-Click

List-Unsubscribe header for marketing compliance.

Worked Example: Notifications

Password reset: SMS+email parallel, priority queue bypasses marketing throttle.

Extended Notes

Connect notifications to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Notifications

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Push vs SMS for 2FA?

SMS deliverability issues — prefer TOTP app; SMS fallback with rate limit.

Notification dedup?

event_id + channel unique constraint before send.

Quiet hours?

Store user timezone; scheduler delays non-urgent marketing sends.

Extended Reference — Notification Systems

Template versioning

v2 template rollback if conversion drops — A/B metric driven.

Provider failover

Primary SendGrid fail → secondary SES — circuit breaker per provider.

Batching

Digest email aggregates 50 events — reduces send volume.

Compliance

STOP keyword for SMS; one-click unsubscribe link tracking.

Load test

Simulate Black Friday notification spike through queue without sending real SMS cost.

Part 26 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Multi-channel queue
Priority queues
Template locale
Device token registry
Bounce suppress
Dedup event_id
Rate limit SMS cost
Quiet hours TZ
Provider failover
Transactional vs marketing

Self-test prompt

Explain Part 26 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 26 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 27: Full System Design Walkthroughs

Twenty-five classic interview problems. For each: clarify requirements, run numbers, define APIs, draw architecture, sketch schema, name bottlenecks, and discuss extensions. Time-box to 45 minutes per problem in mock practice.

How to practice: Minute 0–8 requirements + estimates. Minute 8–20 high-level diagram. Minute 20–38 deep dive (interviewer choice). Minute 38–45 trade-offs and monitoring. Record yourself and score with Part 33 rubric.

URL Shortener (TinyURL)

Functional & Non-Functional Requirements

Scope summary: 100M URLs/month, 100:1 read:write

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

100M/mo ≈ 3.3M/day → ~40 writes/s average; at 100:1, ~4,000 reads/s average. Multiply by the peak factor for peak QPS.

Walk through explicitly: (1) DAU or total objects, (2) operations per user per day, (3) average and peak QPS = daily_ops/86400 × peak_factor, (4) storage = count × size × replication × retention, (5) bandwidth = QPS × payload. Document assumptions on the whiteboard before calculating.
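
The arithmetic for this problem can be checked in a few lines. The peak factor of 3, the 500-byte row size, the 5-year retention, and 3× replication are assumptions chosen for illustration, not figures from the problem statement.

```python
SECONDS_PER_DAY = 86_400


def average_qps(ops_per_day: float) -> float:
    return ops_per_day / SECONDS_PER_DAY


def peak_qps(ops_per_day: float, peak_factor: float) -> float:
    return average_qps(ops_per_day) * peak_factor


def storage_bytes(count: int, size_bytes: int, replication: int) -> int:
    return count * size_bytes * replication


writes_per_day = 100_000_000 / 30            # 100M new URLs per month
write_qps = average_qps(writes_per_day)      # about 39 writes/s
read_qps = write_qps * 100                   # 100:1 read:write ratio
peak_read_qps = peak_qps(writes_per_day * 100, peak_factor=3)  # assumed factor

five_year_rows = 100_000_000 * 12 * 5        # assumed 5-year retention
total_bytes = storage_bytes(five_year_rows, size_bytes=500, replication=3)  # 9 TB
```

Stating each assumption as a named argument mirrors what to write on the whiteboard before calculating.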

API Design

POST /v1/urls {long_url} → {short_code}; GET /{code} → 302

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Hash (base62) or counter+encode; collision retry; custom aliases optional
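
A sketch of the counter+encode option: a unique integer ID is converted to base62. The alphabet order below is an arbitrary choice; any fixed ordering of the 62 characters works.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base62(n: int) -> str:
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))   # most significant digit first


def decode_base62(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven characters cover 62^7 (about 3.5 trillion) IDs, and a counter never collides, so no retry loop is needed. Sequential codes are guessable, though, which matters if links are meant to be private.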

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

urls(id, short_code PK, long_url, user_id, created_at); the primary key on short_code already serves the redirect lookup, so no extra index is needed

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: hot counter shard when generating IDs; read load on the DB. Mitigations: cache redirects for the read path; keep the DB as the durable source of truth.

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Analytics pipeline, expiration, abuse detection

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Paste Bin

Functional & Non-Functional Requirements

Scope summary: 10M pastes/month, public/private

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

~4 pastes/s average write rate; the read rate is higher and concentrated on popular pastes
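The write rate above follows directly from the scope summary. The arithmetic can be sketched as below; the 30-day month and the average paste size are assumptions made only for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600      # 2,592,000, assuming a 30-day month
pastes_per_month = 10_000_000           # from the scope summary

write_qps = pastes_per_month / SECONDS_PER_MONTH
print(f"average writes: {write_qps:.2f} pastes/s")   # about 3.86, i.e. ~4/s

# Assumption (not from the guide): 10 KB average paste body.
avg_paste_bytes = 10_000
monthly_gb = pastes_per_month * avg_paste_bytes / 1e9
print(f"new body storage: {monthly_gb:.0f} GB/month")
```

State the assumed paste size aloud in an interview; the interviewer may supply a different figure, and the rest of the estimate scales linearly with it.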

API Design

POST /pastes (with an optional expiry); GET /pastes/{id}

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
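One way to implement the opaque cursor from the pagination convention above is to encode the last seen ID so that clients cannot depend on its structure. This is a sketch under assumptions: the `last_id` field, the helper names, and positive ascending IDs are all hypothetical.

```python
import base64
import json

def encode_cursor(last_id: int) -> str:
    """Wrap the last returned id in an opaque, URL-safe token."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]

def list_page(items, cursor=None, limit=20):
    """items: list of dicts sorted by ascending, positive 'id'."""
    last_id = decode_cursor(cursor) if cursor else 0
    page = [it for it in items if it["id"] > last_id][:limit]
    # Only hand out a cursor when more items remain after this page.
    more = bool(page) and page[-1]["id"] < items[-1]["id"]
    next_cursor = encode_cursor(page[-1]["id"]) if more else None
    return {"items": page, "next_cursor": next_cursor}
```

Unlike offset pagination, a cursor keyed on the last ID stays stable when new rows are inserted while the client is paging.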

High-Level Architecture

Object store for body; metadata in SQL; CDN for public reads

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

pastes(id, user_id, visibility, expiry, s3_key, created_at)
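A possible way to fill `s3_key` is content addressing: derive the key from a hash of the body so that identical pastes share one stored object. The sketch below uses an in-memory dict as a stand-in for the object store; the key layout and function names are assumptions.

```python
import hashlib

object_store: dict = {}   # stand-in for S3: key -> body bytes

def body_key(body: bytes) -> str:
    """Derive the object-store key from the paste body (content addressing)."""
    digest = hashlib.sha256(body).hexdigest()
    # A short prefix directory spreads keys; this layout is an illustrative choice.
    return f"pastes/{digest[:2]}/{digest}"

def save_body(body: bytes) -> str:
    """Store the body once and return the key to record in pastes.s3_key."""
    key = body_key(body)
    object_store.setdefault(key, body)   # identical bodies are written only once
    return key
```

The trade-off is that several metadata rows may then point at one object, so deletion and expiry need reference counting or a periodic sweep before the object itself is removed.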

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: large paste bodies; spam; duplicate content (deduplicate identical bodies)

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Syntax highlighting service, rate limits

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Distributed Rate Limiter

Functional & Non-Functional Requirements

Scope summary: 1M users, rules per API key

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Limits are per-key QPS, enforced over a sliding window; counter state grows with the number of active keys (up to ~1M users)
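A sliding-window limit can be sketched as a per-key log of request timestamps. The in-memory version below is a single-process stand-in for shared storage; the class and parameter names are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)   # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Memory is proportional to the limit times the number of active keys, which is why the log variant suits low limits and counter-based approximations suit high ones.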

API Design

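Middleware checks the per-key counters on every request and returns X-RateLimit-* response headers (limit, remaining, reset); over-limit requests get 429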

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Redis sorted sets (sliding-window log) or a token bucket per key; cross-node synchronization is optional and needed only for strict global limits
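The token-bucket option can be sketched in a few lines. This is a single-process illustration, not a Redis implementation; the caller passes the current time explicitly so that the behaviour is deterministic, and all names are assumptions.

```python
class TokenBucket:
    """Refill at `rate_per_s` tokens per second up to `capacity`; each request costs tokens."""

    def __init__(self, rate_per_s: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity        # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity sets the burst size and the refill rate sets the sustained rate; state is two numbers per key, which is much smaller than a timestamp log.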

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

rules(key, limit, window); counters in Redis

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Redis memory; clock skew; burst traffic

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Hierarchical limits, dynamic config

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Web Crawler

Functional & Non-Functional Requirements

Scope summary: 1B pages, polite crawling

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

The URL frontier queue is the dominant sizing concern at 1B pages

API Design

No client-facing API; the internal components are a BFS frontier, fetcher workers, and URL-hash deduplication

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

URL frontier queue, Bloom filter of visited URLs, robots.txt cache
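A visited-URL Bloom filter can be sketched as a bit array with several hash positions per URL. The version below derives the positions from one SHA-256 digest by double hashing; that derivation, the sizes, and the names are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: combine two 64-bit halves of one digest into k positions.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For sizing, a 1% false-positive rate needs roughly 9.6 bits per item, so 1B URLs fit in about 1.2 GB. A false positive means a page is skipped, which most crawlers accept.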

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

urls(url_hash PK, status, priority, last_crawled)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: per-host politeness; duplicate detection; DNS resolution latency
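Per-host politeness is usually enforced in the frontier itself. One possible shape, assuming a fixed delay between requests to the same host (the class name and the delay are hypothetical):

```python
import heapq
from collections import deque
from urllib.parse import urlsplit

class PoliteFrontier:
    """Hand out URLs so that each host is fetched at most once per `delay_s`."""

    def __init__(self, delay_s: float):
        self.delay_s = delay_s
        self._queues = {}    # host -> deque of pending URLs
        self._ready = []     # heap of (earliest_fetch_time, host)
        self._next_ok = {}   # host -> earliest time the next fetch is allowed

    def add(self, url: str, now: float = 0.0) -> None:
        host = urlsplit(url).netloc
        if host not in self._queues:
            self._queues[host] = deque()
            start = max(now, self._next_ok.get(host, now))
            heapq.heappush(self._ready, (start, host))
        self._queues[host].append(url)

    def next_url(self, now: float):
        """Return a URL whose host may be fetched at `now`, or None."""
        if not self._ready or self._ready[0][0] > now:
            return None
        _, host = heapq.heappop(self._ready)
        pending = self._queues[host]
        url = pending.popleft()
        self._next_ok[host] = now + self.delay_s
        if pending:
            heapq.heappush(self._ready, (self._next_ok[host], host))
        else:
            del self._queues[host]
        return url
```

Keying the queues by host also gives a natural shard key for distributing the frontier across workers.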

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Distributed scheduling, PageRank pipeline

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Twitter / X News Feed

Functional & Non-Functional Requirements

Scope summary: 300M DAU, fan-out on write vs read

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

5K tweets/s write, massive read

API Design

POST /tweets; GET /timeline; follow graph

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

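Hybrid fan-out: tweets from normal users are pushed to follower timelines at write time; tweets from celebrities (very large follower counts) are merged in at read time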
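The hybrid rule can be sketched with in-memory structures standing in for the tweet store and the timeline cache. The follower threshold that marks an account as a celebrity is an assumed tuning knob, and all names are hypothetical.

```python
from collections import defaultdict

class Feed:
    def __init__(self, celebrity_threshold: int = 10_000):   # assumed cutoff
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> follower ids
        self.following = defaultdict(set)          # user -> authors followed
        self.tweets_by_author = defaultdict(list)  # author -> [(ts, tweet_id)]
        self.timelines = defaultdict(list)         # user -> precomputed [(ts, tweet_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, ts):
        self.tweets_by_author[author].append((ts, tweet_id))
        if not self.is_celebrity(author):
            for follower in self.followers[author]:    # fan-out on write
                self.timelines[follower].append((ts, tweet_id))

    def read_timeline(self, user, limit=20):
        merged = set(self.timelines[user])             # set avoids duplicates
        for author in self.following[user]:            # fan-out on read
            if self.is_celebrity(author):
                merged.update(self.tweets_by_author[author])
        return [tid for _, tid in sorted(merged, reverse=True)[:limit]]
```

Write cost is bounded by the threshold, while read cost grows only with the number of celebrities a user follows, which is typically small.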

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

tweets, follows, timeline cache (Redis sorted sets)

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot users; thundering herd on celebrities

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Ranking ML, Spaces (live audio), ads injection

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Instagram

Functional & Non-Functional Requirements

Scope summary: Photo-heavy, social graph

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Media bytes dominate storage and bandwidth: keep photos in S3 behind a CDN and only metadata in the database

API Design

POST /media; GET /feed; likes/comments

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Blob store; Cassandra for feeds; graph for follows

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

media, users, feeds, likes — denormalized counters
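A denormalized like counter is easiest to keep correct when the like rows remain the source of truth and the counter is only a cached aggregate. A minimal in-memory sketch, with hypothetical names:

```python
class LikeStore:
    def __init__(self):
        self._likes = set()     # (media_id, user_id) pairs: source of truth
        self.like_count = {}    # media_id -> denormalized counter for fast reads

    def like(self, media_id, user_id) -> bool:
        if (media_id, user_id) in self._likes:
            return False        # idempotent: a repeated like does not inflate the counter
        self._likes.add((media_id, user_id))
        self.like_count[media_id] = self.like_count.get(media_id, 0) + 1
        return True

    def unlike(self, media_id, user_id) -> bool:
        if (media_id, user_id) not in self._likes:
            return False
        self._likes.remove((media_id, user_id))
        self.like_count[media_id] -= 1
        return True

    def recount(self, media_id) -> int:
        """Repair job: rebuild the counter from the source of truth."""
        n = sum(1 for m, _ in self._likes if m == media_id)
        self.like_count[media_id] = n
        return n
```

In a real deployment the counter update is often asynchronous, so a periodic recount like the one above bounds how far the displayed number can drift.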

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Image processing pipeline; feed generation

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Stories TTL, recommendations

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

WhatsApp / Chat

Functional & Non-Functional Requirements

Scope summary: 1B messages/day, delivery guarantees

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

~12K msg/s average
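The average above comes straight from the scope summary. The peak multiplier in this sketch is an assumption for illustration; ask the interviewer for the real peak-to-average ratio.

```python
SECONDS_PER_DAY = 24 * 3600            # 86,400
messages_per_day = 1_000_000_000       # from the scope summary

avg_qps = messages_per_day / SECONDS_PER_DAY
print(f"average: {avg_qps:,.0f} msg/s")    # about 11.6K, rounded to ~12K

PEAK_FACTOR = 3                        # assumption, not from the guide
peak_qps = avg_qps * PEAK_FACTOR
print(f"peak (assumed {PEAK_FACTOR}x): {peak_qps:,.0f} msg/s")
```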

API Design

WebSocket gateway; message service; presence

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Per-chat sequence; store-and-forward; offline inbox
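The three ideas above fit together as follows: each chat has its own increasing sequence number, online recipients get the message forwarded immediately, and offline recipients get it appended to an inbox that is drained on reconnect. This is a single-process sketch with hypothetical names; a list stands in for a live connection.

```python
from collections import defaultdict
from itertools import count

class ChatServer:
    def __init__(self):
        self._seq = defaultdict(lambda: count(1))   # chat_id -> sequence generator
        self.online = {}                            # user -> list standing in for a connection
        self.inbox = defaultdict(list)              # user -> messages awaiting delivery

    def send(self, chat_id, sender, recipient, body) -> int:
        msg = {"chat_id": chat_id, "seq": next(self._seq[chat_id]),
               "from": sender, "body": body}
        if recipient in self.online:
            self.online[recipient].append(msg)      # forward immediately
        else:
            self.inbox[recipient].append(msg)       # store until the user reconnects
        return msg["seq"]

    def connect(self, user):
        conn = self.online[user] = []
        conn.extend(self.inbox.pop(user, []))       # drain the offline inbox in order
        return conn

    def disconnect(self, user) -> None:
        self.online.pop(user, None)
```

Because the sequence is scoped to one chat, clients can detect gaps and reorder within a conversation without any global ordering.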

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

messages(chat_id, seq, body, status); users, devices

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: number of concurrent connections; multi-device sync; optional end-to-end encryption (E2E)

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Groups, media, encryption

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

YouTube / Netflix Video

Functional & Non-Functional Requirements

Scope summary: Upload + transcode + stream

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Egress bandwidth is the dominant cost and capacity driver

API Design

Multipart upload; HLS/DASH segments; CDN

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Upload → queue → transcode workers → object store + CDN
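The pipeline above can be sketched with a standard-library queue and a dict standing in for the object store. The rendition ladder, the key layout, and the placeholder `transcode` function are assumptions; a real worker would call an encoder.

```python
import queue

RENDITIONS = ["360p", "720p", "1080p"]   # assumed output ladder
object_store = {}                        # stand-in for the object store
jobs = queue.Queue()                     # stand-in for the message queue

def upload(video_id: str, data: bytes) -> None:
    """Store the raw upload, then enqueue one transcode job per rendition."""
    object_store[f"raw/{video_id}"] = data
    for rendition in RENDITIONS:
        jobs.put((video_id, rendition))

def transcode(data: bytes, rendition: str) -> bytes:
    """Placeholder for the real encoder."""
    return rendition.encode() + b":" + data

def run_worker() -> int:
    """Drain the queue; returns the number of jobs processed."""
    done = 0
    while True:
        try:
            video_id, rendition = jobs.get_nowait()
        except queue.Empty:
            return done
        raw = object_store[f"raw/{video_id}"]
        # Deterministic output key: a retried job overwrites rather than duplicates.
        object_store[f"out/{video_id}/{rendition}"] = transcode(raw, rendition)
        jobs.task_done()
        done += 1
```

One job per rendition lets the worker pool scale independently of uploads, and a failed rendition can be retried without redoing the others.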

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

videos, renditions, view_counts

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Transcode cost; copyright; regional CDN

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Live stream, recommendations, DRM

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Uber / Lyft

Functional & Non-Functional Requirements

Scope summary: Real-time location, matching

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

A geospatial index is critical: driver locations update continuously and nearby-driver lookups sit on the matching path

API Design

POST /rides; driver location stream; match service

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Geohash/grid index; dispatch service; trip state machine
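A fixed grid is the simplest form of the spatial index mentioned above: bucket drivers by cell and search the rider's cell plus its neighbours. The cell size and the names below are assumptions; a geohash or similar scheme replaces `cell_of` in practice.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01   # assumed cell size: about 1.1 km of latitude

def cell_of(lat: float, lng: float):
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

class DriverIndex:
    def __init__(self):
        self._cells = defaultdict(set)   # cell -> driver ids
        self._where = {}                 # driver id -> current cell

    def update(self, driver_id, lat, lng) -> None:
        """Apply a location ping, moving the driver between cells if needed."""
        new = cell_of(lat, lng)
        old = self._where.get(driver_id)
        if old is not None and old != new:
            self._cells[old].discard(driver_id)
        self._cells[new].add(driver_id)
        self._where[driver_id] = new

    def nearby(self, lat, lng, ring: int = 1):
        """Drivers in the caller's cell and `ring` surrounding cells."""
        cx, cy = cell_of(lat, lng)
        found = set()
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                found |= self._cells.get((cx + dx, cy + dy), set())
        return found
```

Candidates returned by `nearby` still need an exact distance or ETA check; the grid only narrows the search.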

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

drivers(location, status), rides, users
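The ride status column is where a trip state machine would live. The states and transitions below are assumed for illustration; the guide does not enumerate them.

```python
from enum import Enum

class TripState(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"

# Assumed transition table: which states may follow each state.
ALLOWED = {
    TripState.REQUESTED: {TripState.MATCHED, TripState.CANCELLED},
    TripState.MATCHED: {TripState.IN_PROGRESS, TripState.CANCELLED},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),
    TripState.CANCELLED: set(),
}

def transition(current: TripState, target: TripState) -> TripState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions in one place makes duplicate or out-of-order events (for example a late cancel after completion) safe to ignore.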

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: split-brain matching (one driver assigned to two rides); demand spikes during surge pricing events

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Pooling, ETA ML, payments

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

↑ Back to top

Yelp Proximity Search

Functional & Non-Functional Requirements

Scope summary: Search nearby businesses

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Latency target: geospatial queries under 100 ms

API Design

GET /search?lat=<lat>&lng=<lng>&radius=<radius>&query=<text>

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Elasticsearch/OpenSearch geo_distance query; cache results for popular cities

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌───────────────┐
                    │ Load Balancer │ L7 routing, TLS, WAF
                    └──────┬────────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌───────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └───────────────┘
                           │
                    ┌──────▼───────┐
                    │ Object store │ (media, large blobs)
                    └──────────────┘

Data Model & Schema Sketch

businesses(id, lat, lng, categories, rating)
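With `lat` and `lng` on each row, the distance filter behind a radius search is the haversine formula. The brute-force scan below only illustrates the filter and ordering; a search engine or spatial index performs the same check on a pre-narrowed candidate set. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def search_nearby(businesses, lat, lng, radius_km):
    """Return businesses within radius_km, nearest first."""
    scored = [(haversine_km(lat, lng, b["lat"], b["lng"]), b) for b in businesses]
    scored.sort(key=lambda pair: pair[0])
    return [b for dist, b in scored if dist <= radius_km]
```

Sorting purely by distance is the baseline; the relevance-versus-distance trade-off comes from blending this distance with rating and text match.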

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Index size; ranking relevance vs distance

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Reviews, photos, ads

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Ticketmaster

Functional & Non-Functional Requirements

Scope summary: High contention on-sale

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Traffic spikes to roughly 100x normal at the moment tickets go on sale.

API Design

Reserve → pay → confirm; queue users virtually

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
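
One way to picture "queue users virtually" is a FIFO that admits a fixed number of users per time step, so the booking path only ever sees a bounded arrival rate. The WaitingRoom class, its method names, and the per-tick admission policy are hypothetical; a real waiting room would also issue signed admission tokens and persist the queue.

```python
from collections import deque


class WaitingRoom:
    """FIFO waiting room that admits at most admit_per_tick users per tick."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick
        self._queue = deque()
        self._queued = set()

    def join(self, user_id):
        """Enqueue a user once and return their 1-based position in line."""
        if user_id not in self._queued:
            self._queue.append(user_id)
            self._queued.add(user_id)
        return self._queue.index(user_id) + 1

    def tick(self):
        """Admit the next batch of users to the reservation flow."""
        admitted = []
        while self._queue and len(admitted) < self.admit_per_tick:
            user_id = self._queue.popleft()
            self._queued.discard(user_id)
            admitted.append(user_id)
        return admitted


room = WaitingRoom(admit_per_tick=2)
for user in ("u1", "u2", "u3"):
    room.join(user)
print(room.tick())  # first batch of admitted users
```

The admission rate is the knob to tie back to capacity: set it to what the inventory database can sustain, not to what the API tier can accept.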

High-Level Architecture

Virtual waiting room; inventory row locks; idempotent booking

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

events, seats(status), reservations, orders

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
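
The seats(status) table is where overselling is prevented. A minimal sketch, assuming a single relational database and a hold-then-confirm flow: a conditional UPDATE only succeeds if the seat is still available (or its hold has expired), so two concurrent buyers cannot both win. The column names, the 300-second hold, and the use of SQLite are assumptions for the example.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE seats ("
        " seat_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"       # available | held | sold
        " held_by TEXT,"
        " hold_expires REAL)"
    )
    db.executemany(
        "INSERT INTO seats (seat_id, status) VALUES (?, 'available')",
        [("A1",), ("A2",)],
    )
    db.commit()
    return db


def reserve(db, seat_id, user_id, now, hold_seconds=300):
    """Place a hold; returns True only if this caller got the seat."""
    with db:  # one transaction; the WHERE clause is the guard against overselling
        cur = db.execute(
            "UPDATE seats SET status='held', held_by=?, hold_expires=? "
            "WHERE seat_id=? AND (status='available' "
            "   OR (status='held' AND hold_expires <= ?))",
            (user_id, now + hold_seconds, seat_id, now),
        )
    return cur.rowcount == 1


def confirm(db, seat_id, user_id, now):
    """Turn a live hold into a sale after payment succeeds."""
    with db:
        cur = db.execute(
            "UPDATE seats SET status='sold' "
            "WHERE seat_id=? AND status='held' AND held_by=? AND hold_expires > ?",
            (seat_id, user_id, now),
        )
    return cur.rowcount == 1


db = make_db()
print(reserve(db, "A1", "alice", now=0))   # alice gets the hold
print(reserve(db, "A1", "bob", now=10))    # bob is rejected while the hold is live
```

Checking the affected row count is the whole trick: the database serializes the two updates, and the loser sees zero rows changed and can be shown another seat.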

Bottlenecks, Failure Modes & Mitigations

Primary risks: Overselling; bots; payment failures

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Secondary market, dynamic pricing

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Dropbox

Functional & Non-Functional Requirements

Scope summary: File sync, conflict resolution

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Chunk-level dedupe

API Design

Upload blocks; sync metadata; delta sync

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Metadata DB + block blob store; content-hash dedupe
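
Content-hash dedupe and delta sync can be sketched together: split a file into blocks, key each block by its SHA-256 digest, and upload only blocks the store has not seen. The BlockStore class, the manifest format, and the tiny 4-byte block size are assumptions chosen to keep the example readable; real clients use blocks in the megabyte range.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """In-memory stand-in for the block blob store, keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def has(self, digest):
        return digest in self._blobs

    def put(self, digest, block):
        self._blobs[digest] = block

    def get(self, digest):
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


def upload(store, data):
    """Upload a file version; returns (manifest, number of blocks actually sent)."""
    manifest, sent = [], 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if not store.has(digest):      # delta sync: skip blocks already stored
            store.put(digest, block)
            sent += 1
        manifest.append(digest)        # the metadata DB keeps this ordered list
    return manifest, sent


def download(store, manifest):
    """Rebuild a file version from its manifest."""
    return b"".join(store.get(d) for d in manifest)


store = BlockStore()
v1, sent_v1 = upload(store, b"aaaabbbbcccc")
v2, sent_v2 = upload(store, b"aaaabbbbdddd")  # only the last block changed
print(sent_v1, sent_v2, len(store))
```

The manifest is what lives in the metadata DB (files, versions); the blobs are immutable, which is why versions and dedupe come almost for free.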

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

files, blocks, devices, versions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Large file uploads; conflict merges

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Sharing permissions, encryption

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Typeahead / Autocomplete

Functional & Non-Functional Requirements

Scope summary: Low latency <50ms

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	<50 ms for suggestions	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Prefix queries, trending

API Design

GET /suggest?q=pre

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Trie or Elasticsearch completion suggester; cache popular queries
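
A minimal sketch of the trie option, assuming suggestions are ranked by how often each full query was seen. The Typeahead class and its record/suggest methods are hypothetical names; a production trie would precompute the top-k at each node instead of walking the subtree on every request.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # how many times a query ending at this node was seen


class Typeahead:
    def __init__(self):
        self.root = TrieNode()

    def record(self, query, times=1):
        """Add a query (e.g. from aggregated query logs) to the trie."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += times

    def suggest(self, prefix, k=5):
        """Return up to k completions of prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # depth-first walk of the subtree under the prefix
            current, text = stack.pop()
            if current.count:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest count first; ties broken alphabetically for stable output.
        best = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
        return [text for _, text in best]


index = Typeahead()
for query, seen in [("prefix", 3), ("premium", 5), ("prepare", 1), ("apple", 9)]:
    index.record(query, seen)
print(index.suggest("pre", k=2))
```

The subtree walk is the cost to call out: for a hot, short prefix it touches a large part of the trie, which is the argument for caching the top-k per prefix.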

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

n-gram index; query log aggregation

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot prefixes; personalization

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays
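
The "split key, random suffix" mitigation can be illustrated with a counter for a hot prefix: writes are spread over several sub-keys and reads sum them. The ShardedCounter class, the key#n naming, and the dict used in place of a cache cluster are assumptions for the sketch.

```python
import random


class ShardedCounter:
    """Spread increments for one hot key across several sub-keys."""

    def __init__(self, store, shards=8, rng=None):
        self.store = store          # dict-like stand-in for a cache or KV store
        self.shards = shards
        self.rng = rng or random.Random()

    def incr(self, key, amount=1):
        # A random suffix means no single sub-key takes all the write traffic.
        sub_key = f"{key}#{self.rng.randrange(self.shards)}"
        self.store[sub_key] = self.store.get(sub_key, 0) + amount

    def total(self, key):
        # Reads pay for the split: they must gather every sub-key.
        return sum(self.store.get(f"{key}#{i}", 0) for i in range(self.shards))


store = {}
counter = ShardedCounter(store, shards=4, rng=random.Random(0))
for _ in range(1000):
    counter.incr("prefix:pre")
print(counter.total("prefix:pre"), len(store))
```

The trade-off is explicit in the two methods: writes get cheaper per node, reads fan out, so this fits write-hot keys and not read-hot ones (where a local cache is the better lever).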

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Spell-check, ranking by CTR

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

News Feed Ranking

Functional & Non-Functional Requirements

Scope summary: Personalized ranked feed

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

ML feature store + scoring

API Design

Candidate generation → rank → filter

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
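
The candidate generation → rank → filter flow can be sketched as three small functions composed into one feed call. The Post fields, the likes-over-age scoring formula, and the function names are illustrative assumptions; a real ranker would score with a trained model fed from the feature store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    id: int
    author: str
    age_hours: float
    likes: int


def generate_candidates(posts, following):
    """Stage 1: cheap recall, here simply posts from followed authors."""
    return [p for p in posts if p.author in following]


def score(post):
    """Stage 2 scoring: hypothetical engagement score decayed by age."""
    return post.likes / (1.0 + post.age_hours)


def rank(candidates):
    return sorted(candidates, key=score, reverse=True)


def filter_posts(ranked, seen_ids, blocked_authors):
    """Stage 3: drop already-seen posts and blocked authors."""
    return [p for p in ranked if p.id not in seen_ids and p.author not in blocked_authors]


def build_feed(posts, following, seen_ids=frozenset(), blocked_authors=frozenset(), limit=20):
    candidates = generate_candidates(posts, following)
    return filter_posts(rank(candidates), seen_ids, blocked_authors)[:limit]


POSTS = [
    Post(1, "ann", age_hours=1, likes=10),
    Post(2, "bob", age_hours=0, likes=3),
    Post(3, "cat", age_hours=9, likes=100),
    Post(4, "dan", age_hours=0, likes=50),   # not followed, never a candidate
    Post(5, "ann", age_hours=3, likes=44),
]

feed = build_feed(POSTS, following={"ann", "bob", "cat"}, seen_ids={2})
print([p.id for p in feed])
```

Keeping the stages separate is the design point: candidate generation bounds how many items the expensive scorer sees, and filtering last keeps policy rules out of the model.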

High-Level Architecture

Stream processing for features; cache ranked pages

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

posts, user_features, impressions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
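
For the snowflake ID suggestion, a minimal sketch of the commonly described layout: 41 bits of millisecond timestamp, 10 bits of worker ID, and 12 bits of per-millisecond sequence. The custom epoch, the class name, and the injectable clock are assumptions made so the example is testable.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms():
    return int(time.time() * 1000)


class SnowflakeGenerator:
    """Time-ordered 64-bit IDs: timestamp | worker_id | sequence."""

    def __init__(self, worker_id, clock=_now_ms):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (timestamp << (WORKER_BITS + SEQUENCE_BITS)) | (
                self.worker_id << SEQUENCE_BITS
            ) | self._sequence


gen = SnowflakeGenerator(worker_id=5)
first, second = gen.next_id(), gen.next_id()
print(first < second)
```

IDs from one worker are strictly increasing, and IDs across workers are roughly time-ordered, which is what makes them useful as sort keys for posts and impressions.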

Bottlenecks, Failure Modes & Mitigations

Primary risks: Freshness vs relevance; filter bubbles

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Real-time re-rank, A/B infra

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Metrics Monitoring (Datadog)

Functional & Non-Functional Requirements

Scope summary: 1M metrics × 10 tags, write heavy

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Time-series DB

API Design

Agents push; rollup; query API

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Kafka → TSDB (Cassandra/ClickHouse); downsampling
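
Downsampling can be sketched as bucketing raw (series_id, timestamp, value) points into fixed windows and keeping a few aggregates per bucket. The function name and the choice of min/max/sum/count are assumptions; keeping sum and count (rather than a stored average) lets coarser rollups be built from finer ones.

```python
def downsample(points, bucket_seconds):
    """Roll raw points up into fixed-width time buckets.

    points: iterable of (series_id, timestamp_seconds, value)
    returns: {(series_id, bucket_start): {"min", "max", "sum", "count"}}
    """
    buckets = {}
    for series_id, timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        key = (series_id, bucket_start)
        agg = buckets.get(key)
        if agg is None:
            buckets[key] = {"min": value, "max": value, "sum": value, "count": 1}
        else:
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
            agg["sum"] += value
            agg["count"] += 1
    return buckets


raw = [
    ("cpu", 0, 1.0),
    ("cpu", 30, 3.0),
    ("cpu", 60, 5.0),
    ("mem", 59, 2.0),
]
rollup = downsample(raw, bucket_seconds=60)
first = rollup[("cpu", 0)]
print(first["sum"] / first["count"])  # average for the first minute of cpu
```

In the pipeline this runs as a stream or batch job reading from the queue, with raw points kept for a short retention window and rollups kept much longer.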

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

series_id, timestamp, value

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Cardinality explosion; query cost

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.
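
"Alert on SLO burn rate" has a simple definition worth being able to write down: burn rate is the observed error rate divided by the error budget (1 minus the SLO). The sketch below also pages only when both a long and a short window exceed the limit; the function names, the two-window rule as coded here, and the sample numbers are assumptions for illustration.

```python
def burn_rate(errors, requests, slo):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo
    return (errors / requests) / error_budget


def burn_limit(budget_fraction, window_hours, period_hours=30 * 24):
    """Burn rate that would spend budget_fraction of the period's budget in one window."""
    return budget_fraction * period_hours / window_hours


def should_page(long_window, short_window, slo, limit):
    """Page only if both windows burn too fast, which filters out brief blips."""
    return burn_rate(*long_window, slo) >= limit and burn_rate(*short_window, slo) >= limit


SLO = 0.999
limit = burn_limit(budget_fraction=0.02, window_hours=1)  # 2% of a 30-day budget in 1 hour
print(round(limit, 1))
print(should_page(long_window=(20, 1000), short_window=(3, 100), slo=SLO, limit=limit))
```

A threshold alert on raw error rate fires the same way for a 99% and a 99.99% service; dividing by the budget is what makes the alert proportional to the promise being made.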

Extensions & Senior Follow-Ups

Alerting, anomaly detection

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Distributed Cache (Redis Cluster)

Functional & Non-Functional Requirements

Scope summary: Cache 100GB+, HA

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

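Consistent hashing across shards (note that Redis Cluster itself assigns keys to 16,384 fixed hash slots rather than a hash ring)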
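
A minimal consistent hashing ring with virtual nodes, as a generic sketch rather than anything Redis-specific. The HashRing class, the SHA-256-based hash, and the 100 virtual nodes per server are assumptions; the property to notice is that adding a node only moves keys onto the new node.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            position = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, position)
            self._owner[position] = node

    def remove(self, node):
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning key: the first ring position clockwise from it."""
        if not self._hashes:
            raise LookupError("ring is empty")
        index = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[index]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved")
```

With plain modulo hashing, going from three to four nodes would remap most keys; here the moved set is roughly the new node's fair share, which is what keeps resharding from emptying the cache.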

API Design

GET/SET; TTL; cluster gossip

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Redis cluster slots; replication per shard
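
Redis Cluster maps every key to one of 16,384 slots by taking a CRC16 of the key modulo 16384, and hashes only the part inside {...} when a non-empty hash tag is present. The sketch below follows that rule, using Python's binascii.crc_hqx with an initial value of 0 on the assumption that it is the same CRC16 variant (it reproduces the 0x31C3 check value for "123456789"); the key_slot name is made up for the example.

```python
import binascii

SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Slot for a key, honouring {hash tags} so related keys share a slot."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # ignore an empty "{}" tag
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


# Keys sharing a hash tag land in the same slot, so multi-key operations work.
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

Slots, not keys, are the unit of resharding: moving a slot between shards moves every key in it, and clients cache the slot-to-node map.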

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

Data lives in memory; persistence (RDB snapshots or an append-only file) is optional

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Hot keys; resharding

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Multi-DC, client-side caching

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

E-commerce Checkout

Functional & Non-Functional Requirements

Scope summary: Cart → inventory → payment

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Strong consistency for inventory

API Design

POST /checkout idempotent; saga for payment

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
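
How the Idempotency-Key header protects POST /checkout from double charges can be sketched as a lookup table from key to stored response. The IdempotentCheckout class, the request fingerprint, and returning 409 when a key is reused with a different body are assumptions for this example; a real service would store keys durably with a TTL and insert them atomically.

```python
class IdempotentCheckout:
    """Replays the stored response when the same Idempotency-Key is retried."""

    def __init__(self, charge):
        self._charge = charge     # callable(cart) -> order_id; the side effect
        self._responses = {}      # idempotency key -> (fingerprint, response)

    def post_checkout(self, idempotency_key, cart):
        fingerprint = tuple(sorted(cart.items()))
        cached = self._responses.get(idempotency_key)
        if cached is not None:
            previous_fingerprint, response = cached
            if previous_fingerprint != fingerprint:
                return {"status": 409, "error": "idempotency key reused with a different request"}
            return response       # retry: no second charge
        order_id = self._charge(cart)
        response = {"status": 201, "order_id": order_id}
        self._responses[idempotency_key] = (fingerprint, response)
        return response


charges = []


def fake_charge(cart):
    """Stand-in for the payment call; records each charge it performs."""
    charges.append(cart)
    return len(charges)


api = IdempotentCheckout(fake_charge)
first = api.post_checkout("key-1", {"sku-1": 2})
retry = api.post_checkout("key-1", {"sku-1": 2})
print(first == retry, len(charges))
```

The client generates the key once per user intent and reuses it on every retry; the server's job is only to make "same key" mean "same outcome".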

High-Level Architecture

Reserve inventory; charge; confirm; compensate on fail
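
Reserve, charge, confirm, compensate on failure is a saga: each step has an undo, and a failure runs the undos of completed steps in reverse. A minimal in-process sketch follows; run_saga, the checkout wiring, and the in-memory inventory dict are hypothetical, and a real saga would persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):
                undo()
            return {"status": "failed", "step": name, "reason": str(exc)}
        completed.append(compensation)
    return {"status": "confirmed"}


def checkout(inventory, sku, payment_ok):
    """Hypothetical checkout wiring: reserve -> charge -> confirm."""

    def reserve():
        if inventory.get(sku, 0) < 1:
            raise RuntimeError("out of stock")
        inventory[sku] -= 1

    def release():
        inventory[sku] += 1

    def charge():
        if not payment_ok:
            raise RuntimeError("card declined")

    def refund():
        pass  # a real system would call the payment provider here

    def confirm():
        pass  # mark the order as confirmed

    def cancel():
        pass  # mark the order as cancelled

    return run_saga([
        ("reserve", reserve, release),
        ("charge", charge, refund),
        ("confirm", confirm, cancel),
    ])


stock = {"sku-1": 1}
print(checkout(stock, "sku-1", payment_ok=False), stock)  # reservation is released
print(checkout(stock, "sku-1", payment_ok=True), stock)
```

Compensations must themselves be idempotent, because a crashed orchestrator will retry them; that is the follow-up question this pattern usually invites.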

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

orders, inventory, payments — transactional

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Race on last item; double charge

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Fulfillment, returns, fraud

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Hotel Booking

Functional & Non-Functional Requirements

Scope summary: Date-range inventory

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Similar to the Ticketmaster design, but with smaller traffic spikes.

API Design

Search availability; book room-night

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Inventory per room-type per night; hold TTL
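
Inventory per room type per night with a hold TTL can be sketched as a counter per (room_type, night): a stay succeeds only if every night in the range has a room left, and an unconfirmed hold gives its nights back when it expires. The RoomNightInventory class, the 600-second TTL, and passing now explicitly are assumptions that keep the example deterministic.

```python
import itertools
from datetime import date, timedelta


def stay_nights(check_in, check_out):
    """Nights occupied by a stay: check_in inclusive, check_out exclusive."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


class RoomNightInventory:
    def __init__(self):
        self._left = {}    # (room_type, night) -> rooms still available
        self._holds = {}   # hold_id -> (room_type, nights, expires_at)
        self._ids = itertools.count(1)

    def set_rooms(self, room_type, night, count):
        self._left[(room_type, night)] = count

    def _expire(self, now):
        """Return the nights of any hold whose TTL has passed."""
        for hold_id, (room_type, nights, expires_at) in list(self._holds.items()):
            if expires_at <= now:
                for night in nights:
                    self._left[(room_type, night)] += 1
                del self._holds[hold_id]

    def hold(self, room_type, check_in, check_out, now, ttl=600):
        """Hold one room for every night of the stay, or return None."""
        self._expire(now)
        nights = stay_nights(check_in, check_out)
        if not nights or any(self._left.get((room_type, n), 0) < 1 for n in nights):
            return None
        for night in nights:
            self._left[(room_type, night)] -= 1
        hold_id = next(self._ids)
        self._holds[hold_id] = (room_type, nights, now + ttl)
        return hold_id

    def confirm(self, hold_id, now):
        """Convert a live hold into a booking; the nights stay decremented."""
        self._expire(now)
        return self._holds.pop(hold_id, None) is not None


inv = RoomNightInventory()
for day in (1, 2, 3):
    inv.set_rooms("double", date(2030, 6, day), 1)
first = inv.hold("double", date(2030, 6, 1), date(2030, 6, 3), now=0)
clash = inv.hold("double", date(2030, 6, 2), date(2030, 6, 4), now=10)
print(first, clash)
```

The all-nights-or-nothing check is what differs from seat booking: in a database it becomes one transaction that decrements every room_nights row in the range or rolls back.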

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

hotels, room_nights, bookings

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Overbooking policies; cancellation

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Rate parity, loyalty

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.

Google Docs Collaboration

Functional & Non-Functional Requirements

Scope summary: Real-time OT/CRDT

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

WebSocket + operation log

API Design

Clients send ops; the server orders them and broadcasts the result

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

OT or CRDT; snapshot + op log
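
The core of OT is a transform function that rewrites one operation so it still means the same thing after a concurrent operation has been applied. The sketch below covers only insert-versus-insert on a plain string, with ties at the same position broken by a site ID; the Insert type, the tie-break rule, and the omission of deletes are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # client/site ID, used only to break ties


def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against):
    """Rewrite op so it can be applied after `against`.

    Both operations must have been created against the same document revision.
    """
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_before:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "abc"
a = Insert(pos=1, text="X", site=1)
b = Insert(pos=1, text="Y", site=2)

# Two replicas apply the concurrent ops in opposite orders and still converge.
replica_1 = apply(apply(doc, a), transform(b, a))
replica_2 = apply(apply(doc, b), transform(a, b))
print(replica_1, replica_2)
```

With a central server assigning the order, each client only ever transforms its pending ops against ops the server has already accepted, which is why the revision number belongs in the data model.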

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │ Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └───────────┬────────────┘
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────────┐
    │  Cache   │   │  Primary   │   │ Message Queue │
    │ (Redis)  │   │  Database  │   │ (async work)  │
    └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store │ (media, large blobs)
                    └─────────────┘

Data Model & Schema Sketch

doc_id, revision, operations

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Conflict resolution; offline sync

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Comments, permissions, history

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Stack Overflow

Functional & Non-Functional Requirements

Scope summary: Q&A, search, reputation

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Read-heavy

API Design

POST questions/answers; search; vote

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

SQL for integrity; ES for search; cache hot questions

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

posts, votes, users, tags

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
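
For the votes table, the "normalize writes, denormalize read models" advice could look like the sketch below. It assumes one vote per (user, post) pair as the source of truth and a per-post score counter kept in step with it; VoteStore and its fields are hypothetical names, and the dictionaries stand in for a table with a unique key and a counter column updated in the same transaction.

```python
class VoteStore:
    def __init__(self):
        self.votes = {}    # (user_id, post_id) -> +1 or -1; source of truth
        self.scores = {}   # post_id -> denormalized score for fast reads

    def cast(self, user_id: str, post_id: str, value: int) -> int:
        """Insert or change a vote and adjust the counter by the difference."""
        if value not in (1, -1):
            raise ValueError("value must be +1 or -1")
        previous = self.votes.get((user_id, post_id), 0)
        self.votes[(user_id, post_id)] = value
        # Applying the delta keeps repeated or changed votes from double counting.
        self.scores[post_id] = self.scores.get(post_id, 0) + value - previous
        return self.scores[post_id]
```

Keying on (user_id, post_id) is also a first line of defense against the simplest form of reputation gaming, repeat voting by one account.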

Bottlenecks, Failure Modes & Mitigations

Primary risks: Reputation gaming; duplicate detection

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Moderation queue, notifications

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Zoom Video Conferencing

Functional & Non-Functional Requirements

Scope summary: SFU/MCU architecture

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

UDP media, signaling TCP
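
A quick way to turn the SFU/MCU choice into numbers is to count media streams per topology. The sketch below assumes every participant sends one video stream and receives every other participant's stream; it ignores simulcast and audio, and the bitrate passed to sfu_egress_mbps is an assumption you would state aloud. Function names are hypothetical.

```python
def stream_counts(participants: int) -> dict:
    """Streams per client and leaving the server for each topology."""
    n = participants
    return {
        # Peer-to-peer mesh: each client sends to and receives from n-1 peers.
        "mesh": {"up_per_client": n - 1, "down_per_client": n - 1, "server_out_streams": 0},
        # SFU: one upload per client; the server forwards n-1 streams to each client.
        "sfu": {"up_per_client": 1, "down_per_client": n - 1, "server_out_streams": n * (n - 1)},
        # MCU: the server decodes and mixes, then sends one composite per client.
        "mcu": {"up_per_client": 1, "down_per_client": 1, "server_out_streams": n},
    }


def sfu_egress_mbps(participants: int, bitrate_mbps: float) -> float:
    """Server egress for one room, given an assumed per-stream bitrate."""
    return participants * (participants - 1) * bitrate_mbps
```

The quadratic SFU egress is why large rooms usually forward only a few active speakers, while the MCU trades that bandwidth for server-side CPU spent on mixing.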

API Design

Signaling server; media SFU; TURN fallback

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Regional SFU mesh; recording to S3

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

rooms, participants, sessions

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: NAT traversal; CPU for video

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Webinar mode, breakout rooms

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Payment Wallet

Functional & Non-Functional Requirements

Scope summary: Ledger correctness

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

ACID + idempotency

API Design

Transfer with idempotency-key; double-entry ledger

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
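
The "transfer with idempotency-key; double-entry ledger" line can be made concrete with a small in-memory sketch. It assumes integer cents, a single process, and that the key lookup, the two ledger rows, and the balance updates would all sit inside one database transaction in a real system. Wallet, LedgerEntry, and the transfer id format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    transfer_id: str
    account: str
    amount_cents: int   # negative = debit, positive = credit


class Wallet:
    def __init__(self, opening_balances: dict):
        self.balances = dict(opening_balances)  # materialized view of the ledger
        self.ledger = []                        # append-only entries
        self.seen = {}                          # idempotency key -> transfer_id

    def transfer(self, idempotency_key: str, src: str, dst: str, amount_cents: int) -> str:
        # A retried request returns the original result instead of re-applying.
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(src, 0) < amount_cents:
            raise ValueError("insufficient funds")
        transfer_id = f"t{len(self.seen) + 1}"
        # Double entry: every transfer writes a debit and a matching credit.
        self.ledger.append(LedgerEntry(transfer_id, src, -amount_cents))
        self.ledger.append(LedgerEntry(transfer_id, dst, amount_cents))
        self.balances[src] -= amount_cents
        self.balances[dst] = self.balances.get(dst, 0) + amount_cents
        self.seen[idempotency_key] = transfer_id
        return transfer_id
```

Because each transfer's entries sum to zero, a reconciliation job can check that the whole ledger sums to zero and that each balance equals the sum of its entries plus the opening balance.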

High-Level Architecture

Immutable ledger entries; balance materialized view

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

accounts, ledger_entries, transfers

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Exactly-once; reconciliation

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

KYC, fraud, multi-currency

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Notification Service

Functional & Non-Functional Requirements

Scope summary: Multi-channel delivery

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

1M notifs/min
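
Turning 1M notifications per minute into a per-second rate and a rough worker count might look like this. The 3× peak multiplier and the 200 sends per second per worker are illustrative assumptions, not figures from this guide; workers_needed is a hypothetical helper.

```python
def workers_needed(per_minute: int, peak_multiplier: int, per_worker_per_sec: int) -> int:
    """Ceiling of peak load divided by per-worker capacity, in integer math."""
    peak_per_minute = per_minute * peak_multiplier
    per_worker_per_minute = per_worker_per_sec * 60
    return -(-peak_per_minute // per_worker_per_minute)  # ceiling division


NOTIFS_PER_MINUTE = 1_000_000
avg_per_sec = NOTIFS_PER_MINUTE / 60                   # about 16.7K per second
workers = workers_needed(NOTIFS_PER_MINUTE, 3, 200)    # assumed peak and capacity
print(f"avg {avg_per_sec:,.0f}/s, workers at peak: {workers}")
```

In an interview, state the multiplier and the per-worker throughput as assumptions and ask whether they match the interviewer's expectations.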

API Design

Enqueue → workers → email/SMS/push

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Priority queues; templates; device tokens

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

notifications, templates, user_preferences

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Provider rate limits; retries

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays
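
For the "provider rate limits; retries" risk, a worker would typically retry with exponential backoff and random jitter so that many workers do not retry in lockstep. The sketch below is one way to do that; the base delay, cap, and attempt count are assumed values, and send is a hypothetical callable that returns True on success and False on a retryable failure.

```python
import random
import time


def backoff_delays(max_attempts: int, base_sec: float = 0.5, cap_sec: float = 30.0, rng=None) -> list:
    """Delay for attempt k is uniform in [0, min(cap, base * 2**k)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_sec, base_sec * 2 ** k)) for k in range(max_attempts)]


def send_with_retries(send, max_attempts: int = 5, sleep=time.sleep, rng=None) -> bool:
    """Try send() up to max_attempts times, sleeping between failed attempts."""
    delays = backoff_delays(max_attempts, rng=rng)
    for attempt, delay in enumerate(delays):
        if send():
            return True
        if attempt < max_attempts - 1:
            sleep(delay)   # no sleep after the final attempt
    return False           # caller would move the message to a dead-letter queue
```

Retries mean a message can be delivered more than once, so the send path should carry a deduplication key that the provider or the device can use to drop repeats.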

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Digest batching, A/B

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Ad Click Aggregator

Functional & Non-Functional Requirements

Scope summary: 1M clicks/s aggregate

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Stream processing

API Design

Kafka → Flink → OLAP

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.

High-Level Architecture

Counting, billing, fraud filters

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

raw_clicks stream; aggregates by campaign

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.
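
The "raw_clicks stream; aggregates by campaign" model can be illustrated with a tiny tumbling-window counter. This is a plain-Python sketch of the idea, not Flink code: it assumes event times in whole seconds, deduplicates on a click id, and drops events older than a watermark derived from the largest event time seen. The window size, allowed lateness, and class name are assumptions.

```python
from collections import defaultdict


class ClickAggregator:
    def __init__(self, window_sec: int = 60, allowed_lateness_sec: int = 120):
        self.window_sec = window_sec
        self.allowed_lateness_sec = allowed_lateness_sec
        self.counts = defaultdict(int)   # (campaign_id, window_start) -> clicks
        self.seen_click_ids = set()      # dedup set; a real job would expire entries
        self.max_event_time = 0          # basis for the watermark
        self.dropped_late = 0

    def process(self, click_id: str, campaign_id: str, event_time: int) -> bool:
        """Return True if the click was counted."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness_sec
        if event_time < watermark:
            self.dropped_late += 1       # too late; route to a correction job instead
            return False
        if click_id in self.seen_click_ids:
            return False                 # duplicate delivery; count once
        self.seen_click_ids.add(click_id)
        window_start = event_time - event_time % self.window_sec
        self.counts[(campaign_id, window_start)] += 1
        return True
```

For billing, dropped late events are usually reconciled by a separate batch pass over the raw click log rather than silently lost.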

Bottlenecks, Failure Modes & Mitigations

Primary risks: Late data; exactly-once billing

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Real-time dashboard, attribution

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


API Rate Limiter at Scale

Functional & Non-Functional Requirements

Scope summary: Global edge + regional

Define MVP features vs phase-2 (analytics, admin, ML ranking).
State who the users are (consumers, businesses, internal operators).
Agree on the read vs write ratio and consistency needs (strong for money, eventual for feeds).
Target availability (e.g. 99.9% vs 99.99%) and p99 latency for core paths.
Call out compliance if relevant: GDPR delete, PCI for payments, data residency.

Non-functional	Typical target	Design lever
Availability	99.9%–99.99%	Multi-AZ, redundancy, health checks
Latency (p99)	50–300 ms reads	Cache, CDN, regional deployment
Durability	No acknowledged write loss	Replication, fsync policy, backups
Scale	See estimates below	Sharding, async pipelines, autoscale

Back-of-Envelope Estimates

Millions of keys

API Design

Edge PoP counters + sync; token bucket

# Common headers
Authorization: Bearer <token>
Idempotency-Key: <uuid>   # for POST/PUT that must not double-apply
X-Request-Id: <uuid>      # tracing

# Pagination
GET /resources?cursor=<opaque>&limit=20
# Response: { "items": [], "next_cursor": "..." }

Version APIs (/v1/), use appropriate status codes (201 create, 409 conflict, 429 rate limited), and document idempotent retry behavior for clients.
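
The token bucket named above can be sketched in a few lines. This version keeps state in process memory and takes the current time as an argument so it is easy to test; in the edge-plus-regional design the same two fields (tokens, last update time) would live per key in a shared store. The class name and parameters are hypothetical.

```python
class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0):
        self.rate = rate_per_sec       # steady-state refill rate
        self.capacity = burst          # maximum tokens, i.e. allowed burst size
        self.tokens = float(burst)     # start full
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend tokens if enough remain."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(now, self.updated_at)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                   # caller responds with 429
```

With rate 1 per second and burst 2, a client can send two requests at once and then one per second; idle time refills the bucket up to the burst size and no further.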

High-Level Architecture

CDN edge + central Redis; GCRA algorithm

                    ┌─────────────┐
  Users ───────────►│ CDN / Edge  │ (static, cacheable GETs)
                    └──────┬──────┘
                           ▼
                    ┌─────────────┐
                    │Load Balancer│ L7 routing, TLS, WAF
                    └──────┬──────┘
                           ▼
              ┌────────────────────────┐
              │  Stateless API tier    │ autoscale on CPU/latency
              └────────────┬───────────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐   ┌────────────┐   ┌──────────────┐
     │  Cache   │   │  Primary   │   │Message Queue │
     │ (Redis)  │   │  Database  │   │ (async work) │
     └──────────┘   └────────────┘   └──────────────┘
                           │
                    ┌──────▼──────┐
                    │ Object store│ (media, large blobs)
                    └─────────────┘
                    └─────────────┘

Data Model & Schema Sketch

policy store; sharded counters

-- Index for hot access path; avoid over-indexing writes
-- Partition/shard key: choose what spreads load (user_id, tenant_id)

Normalize for correctness on writes; denormalize read models (counters, feeds) when read QPS dominates. Use UUIDs externally; consider snowflake IDs for ordered sharding.

Bottlenecks, Failure Modes & Mitigations

Primary risks: Cross-PoP consistency; config propagation

Failure	Symptom	Mitigation
Traffic spike	Latency ↑, errors ↑	Autoscale, queue absorption, rate limit
Hot key / shard	Single node saturated	Split key, local cache, random suffix
Dependency down	Cascading timeouts	Circuit breaker, timeouts, fallbacks
Data corruption	Incorrect state	Checksums, audits, idempotent replays

Observability: metrics (RED/USE), structured logs with request_id, distributed traces across services. Alert on SLO burn rate, not only thresholds.

Extensions & Senior Follow-Ups

Per-tenant custom limits, burst

Multi-region active-active or active-passive — CAP trade-offs on writes.
Cost: egress, storage tiering, reserved capacity vs serverless.
Security: abuse, authZ scopes, encryption at rest and in transit.
Migration: dual-write, backfill, feature flags for rollout.


Quick Reference: Picking Building Blocks

Need	Often choose
Strong transactions	PostgreSQL + application saga for cross-service
Massive write throughput	Cassandra, DynamoDB, sharded MySQL
Full-text search	Elasticsearch / OpenSearch
Async decoupling	Kafka, SQS, RabbitMQ
Sub-ms reads	Redis cluster (in-memory); CDN for edge-cacheable content
Blob media	S3 + CloudFront


Part 28: Trade-Off Matrices

How to Use Matrices in Interviews

After proposing a design, summarize your decisions in a table: option A vs B across dimensions (latency, consistency, cost, ops complexity). This shows structured trade-off thinking.

SQL vs NoSQL

Dimension	SQL (Postgres)	Document (Mongo)	Wide-column (Cassandra)	Key-value (DynamoDB)
Schema	Rigid, migrations	Flexible JSON	Row per partition key	Schemaless per item
Transactions	Multi-row ACID	Single-doc atomic; multi-doc since 4.0	Per-partition lightweight (LWT)	Conditional writes; TransactWriteItems
Joins	Native	$lookup or app-side	Denormalize	No joins
Scale pattern	Read replicas + shard	Shard by key	Built for write scale	Managed partition
Best fit	Orders, accounts	Catalog, CMS	Feeds, metrics	Sessions, locks

Push vs Pull (Updates)

Dimension	Push	Pull
Latency to client	Low (server initiated)	Bounded by poll interval
Server connections	Stateful (WS)	Stateless HTTP
Missed messages	Need reconnect logic	Client controls cursor
Scale cost	Connection memory	Wasted empty polls
Example	Chat, live scores	Email client sync

Fan-Out Write vs Read

Dimension	Fan-out on write	Fan-out on read
Read cost	O(1) prebuilt	O(followees) merge
Write cost	O(followers)	O(1)
Celebrity problem	Severe	Manageable
Storage	High (many copies)	Low
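
The two columns above are often combined into a hybrid: fan out on write for ordinary authors and on read for celebrities. The sketch below shows that split with in-memory dictionaries; the threshold of 3 followers is only there to keep the example small, and all names are hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3   # hypothetical; real thresholds are far larger


class Feed:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.timelines = defaultdict(list)   # user -> prebuilt (ts, post) entries
        self.posts = defaultdict(list)       # author -> (ts, post) entries

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author: str, ts: int, post: str) -> None:
        self.posts[author].append((ts, post))
        if not self.is_celebrity(author):
            for user in self.followers[author]:      # fan-out on write: O(followers)
                self.timelines[user].append((ts, post))

    def read(self, user: str, limit: int = 10) -> list:
        merged = set(self.timelines[user])           # prebuilt part, O(1) lookup
        for author in self.following[user]:
            if self.is_celebrity(author):            # fan-out on read for celebrities
                merged.update(self.posts[author])
        return [post for _, post in sorted(merged, reverse=True)[:limit]]
```

Merging through a set also removes duplicates for an author who crossed the threshold after some posts had already been fanned out.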

Cache Patterns

Pattern	Consistency	Write amplification	When
Cache-aside	App-managed TTL	Low	General reads
Read-through	Cache loads on miss	Low	Simpler app code
Write-through	Sync to cache+DB	High	Strong read-after-write
Write-behind	Async to DB	Batch writes	Counters, analytics
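
As a concrete reference for the first row, a cache-aside read path with TTL and explicit invalidation might look like the following. The dictionary stands in for Redis, load_from_db is a hypothetical loader passed in by the caller, and the injectable clock exists only to make the sketch testable.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_sec: float = 300, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl_sec = ttl_sec
        self.clock = clock
        self.cache = {}            # key -> (value, expires_at); stand-in for Redis
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        # Miss or expired: the application loads from the DB and fills the cache.
        self.misses += 1
        value = self.load_from_db(key)
        self.cache[key] = (value, self.clock() + self.ttl_sec)
        return value

    def invalidate(self, key) -> None:
        """Call after a write to the DB so the next read reloads fresh data."""
        self.cache.pop(key, None)
```

Between a write and its invalidation (or TTL expiry) readers can see stale data, which is the consistency cost the table attributes to this pattern.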

Consistency vs Availability (during partition)

Choice	During partition	Example systems
CP	Reject ops to stay consistent	ZooKeeper, etcd
AP	Accept ops; reconcile later	Cassandra, DynamoDB (default)

Monolith vs Microservices

Factor	Monolith	Microservices
Time to market	Faster early	Slower (infra)
Scale	Vertical + replicas	Per-service scale
Failures	All-or-nothing deploy	Isolated blast radius
Data	Single DB joins	Distributed transactions hard

REST vs gRPC (internal)

Dimension	REST+JSON	gRPC
Performance	Good	Better
Contract	Loose	Strict proto
Browser	Yes	Needs gateway
Streaming	Limited	First-class

Strong vs Eventual — When to Say What

Strong: inventory, wallet, booking. Eventual: likes, view counts, recommendations.

Blob Storage in SQL vs S3

Dimension	SQL BLOB	S3
>1MB file	Bad	Good
Metadata query	Good	Need index table

Worked Example: Matrices

State the decision and its reason in one sentence: 'I chose Cassandra (AP) because write load is about 500K QPS, and the product can accept an eventually consistent timeline.'

Extended Notes

Connect matrices to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Trade-Off Matrices

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How to present a matrix in an interview?

After proposing design: 'Summarizing: SQL for orders, Redis cache, S3 media — see trade-offs.'

Push vs pull for mobile?

Push for engagement; pull for battery-sensitive background sync.

Extended Reference — Trade-Off Matrices

Using matrices well

Do not read table verbatim — highlight 2 cells relevant to your design decision.

Consistency spectrum

Place your feature on spectrum from strong to eventual — justify with product requirement.

Cost dimension

Add row: operational complexity 1–5 — microservices score high.

When matrices fail

Nuanced decisions need prose — matrix is summary not analysis.

Compare three options

SQL vs Dynamo vs Cassandra — pick two dimensions interviewer cares about.

Part 28 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

SQL vs NoSQL matrix
Push vs pull
Fan-out matrix
Cache pattern matrix
Monolith vs microservices
Summarize after design
Two relevant cells
Cost row optional
Consistency spectrum
Do not read table verbatim

Self-test prompt

Explain Part 28 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 28 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.


Part 29: Common Interview Mistakes

Jumping to Diagram Too Fast

Drawing boxes before requirements loses points. Spend 5–10 minutes on functional scope, DAU, read:write ratio, latency, consistency needs.

No Numbers

Architecture without BOE feels hand-wavy. Always compute rough QPS, storage, and bandwidth.

Single Point of Failure Blindness

One database, one region, one cache with no replica — interviewers will probe failure. Label replicas, failover, multi-AZ.

Ignoring the Hot Path

Optimize what users do 100×/day (read feed), not edge admin features. State which paths get cache, CDN, sharding.

Cache Everything

Caching without an invalidation story or a hit-ratio assumption is a red flag. Serving personalized data through a CDN without Cache-Control: private or a correct Vary header is a common trap.

Wrong Database Choice

Examples: a graph DB for simple CRUD, or a single SQL instance for billion-scale write-heavy counters with no sharding or batching plan. Justify the choice with the access pattern.

Over-Engineering

Kubernetes + Kafka + microservices for a 1,000-user MVP. Prefer a phased approach: monolith → cache → shard → extract services.

Under-Engineering Critical Paths

Payments with eventual consistency and no idempotency. Seat booking without transactions.

Not Thinking Aloud

Silent drawing confuses interviewer. Narrate trade-offs: "I could use X but choose Y because…"

Ignoring Interviewer Hints

Hints steer toward intended deep dive. If they ask "what if the DB is slow?" — discuss indexes and replicas, not unrelated CDN.

No Monitoring or Launch Plan

Senior candidates mention SLOs, feature flags, gradual rollout, rollback.

Fix: use Part 2 framework every time
Fix: end with trade-off summary table
Fix: invite feedback: "Should we deep dive data model or scaling?"

Red Flags Interviewers Notice

Vague 'we'll scale horizontally' without shard key
No failure discussion
Buzzwords without mechanism
Copying Netflix stack for CRUD app

Recovery Phrases

"Let me step back and clarify scale assumptions." This shows maturity when you realize you have dug yourself into a hole.

Worked Example: Mistakes

Candidate drew 15 boxes in 2 minutes with no requirements — failed communication dimension.

Extended Notes

Connect mistakes to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Common Mistakes

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Biggest junior mistake?

No requirements — jump to Kafka and microservices.

Biggest senior expectation?

Operational completeness: metrics, rollout, failure modes unprompted.

Extended Reference — Common Mistakes

Time management

Spending 25 min on DB schema before high-level diagram — reverse order loses structure points.

Hint integration

Interviewer says 'what about cache' — pivot immediately; ignoring hint is negative signal.

Overconfidence

Claiming zero downtime without explaining mechanism — credibility loss.

Underconfidence

Silence is worse than a wrong attempt; think aloud, even with partial ideas.

Post-interview

Do not argue feedback — note and improve.

Part 29 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Requirements first
BOE before diagram
No SPOF blindness
Hot path focus
Trade-offs spoken
Think aloud
Take hints
Monitoring mentioned
Phased rollout
No buzzword soup

Self-test prompt

Explain Part 29 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 29 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.


Part 30: Communication Scripts

Opening (First 2 Minutes)

"Thanks — before I design, I want to clarify scope. Is this mobile + web? Rough scale in DAU? Should I focus on read or write path first? Any constraints like existing AWS stack or strong consistency requirements?"

Clarifying Functional Requirements

"Core user actions are X, Y, Z — anything else in v1?"
"Do we need real-time updates or is a 30-second delay OK?"
"Public vs private content — different retention?"
"Anonymous users or login required?"

Clarifying Non-Functional Requirements

"Target p99 latency for reads? Writes?"
"Availability target — 99.9% or 99.99%?"
"Durability — can we ever lose a post / payment?"
"Geographic focus — single region or global?"

While Estimating

"I'll assume 50M DAU, 10 reads per user per day — that's 500M reads/day, about 6K average QPS, ~30K peak with a 5× multiplier. Does that match your expectations?"

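The arithmetic in the script above can be checked with a few lines. The inputs (50M DAU, 10 reads per user per day, 5× peak) come from the script; estimate_qps is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(dau: int, actions_per_user_per_day: int, peak_multiplier: float):
    """Return (requests per day, average QPS, peak QPS)."""
    per_day = dau * actions_per_user_per_day
    avg_qps = per_day / SECONDS_PER_DAY
    return per_day, avg_qps, avg_qps * peak_multiplier


per_day, avg_qps, peak_qps = estimate_qps(50_000_000, 10, 5)
# 500M reads/day; about 5.8K average QPS (say 6K); about 29K peak (say 30K)
print(f"{per_day:,} reads/day, avg {avg_qps:,.0f} QPS, peak {peak_qps:,.0f} QPS")
```

Rounding to one significant figure, as the script does, is the expected level of precision; the point is the order of magnitude, not the third digit.
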
Introducing High-Level Design

"I'll sketch clients → CDN for static → load balancer → stateless API tier → cache → primary database, with async workers on a queue for heavy tasks."

Trade-Off Phrasing

Instead of	Say
"We'll use NoSQL"	"Access pattern is key-value by user_id; I'll use DynamoDB for horizontal scale; we give up cross-shard joins"
"We'll cache it"	"I'll assume an 80% hit ratio, with a 5-minute TTL and invalidation on write"
"Eventually consistent"	"Followers may see new post up to 30s late; acceptable for feed per product"

When Stuck

"I'm weighing fan-out on write vs read — for celebrities, hybrid is industry standard. I'll go hybrid unless you want to optimize for write simplicity."

Closing (Last 2 Minutes)

"To recap: stateless APIs behind LB, Redis timeline cache with hybrid fan-out, Postgres sharded by user_id, S3 for media, Kafka for async. I'd add p99 latency and replication lag alerts. With more time I'd detail search indexing and multi-region DR."

Responding to Challenges

"Good point — if the cache fails we degrade to DB with circuit breaker and higher latency; we don't fail closed unless data correctness requires it."

Deep Dive Invitation

"I can go deeper on data model, consistency, or ops — which is most valuable?"

Acknowledging Unknown

"I haven't operated Cassandra in prod; at a high level it uses partition keys and tunable quorum consistency. I'd partner with a DBA for SLA specifics."

Worked Example: Scripts

Record yourself doing a 5-minute clarify + BOE run-through aloud each week; playback exposes filler words and long silences.

Extended Notes

Connect scripts to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Document trade-off aloud: what you optimize for (latency, cost, consistency) and what you explicitly deprioritize in v1.

Reference related parts: see adjacent sections in this guide for complementary patterns.

Interview Question Bank — Communication

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

How long to clarify?

5–10 minutes is acceptable and shows thoroughness; don't go longer without checking in with the interviewer.

How to handle 'you're wrong'?

Explore rather than defend: 'If we need strong consistency here, I'd serve these reads from the primary instead of replicas; does that match the product requirement?'

Extended Reference — Communication Scripts

Pacing

Pause after BOE: 'Does 100M DAU sound right?' — engages interviewer as collaborator.

Jargon control

Define acronyms once: 'CDN (edge cache)' — interviewer may be cross-functional.

Diagram narration

Left to right: 'User hits CDN, then...' — orient viewer continuously.

Trade-off sandwich

We gain X, we sacrifice Y, because product priority Z.

Closing question

Ask interviewer: 'What would you prioritize next for v2?' — shows curiosity.

Part 30 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Opening clarify script
Assumption validation
BOE narrated
Trade-off sandwich
Deep dive offer
Stuck recovery phrase
Closing recap 30s
Ask interviewer question
Acknowledge challenge
Collaborative tone

Self-test prompt

Explain Part 30 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 30 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 31: 8-Week & 12-Week Study Plans

8-Week Plan (Intensive)

Week	Focus	Daily topics (Mon–Sun)
1	Foundation	Parts 0–2; 1 BOE exercise/day; 1 mock clarify-only
2	Estimation & scale	Parts 3–4; daily latency quiz; scale 5 products on paper
3	Networking & caching	Parts 5–7; draw CDN+LB for 3 apps
4	Databases	Parts 8–12; SQL vs NoSQL matrix; saga exercise
5	Distributed systems	Parts 11–13; CAP scenarios; Kafka ordering drill
6	Architecture styles	Parts 14–17; rate limiter design; consistent hash drill
7	Ops & patterns	Parts 18–21; SLO math; circuit breaker scenarios
8	Mocks & execution	Parts 27–33; 3 full mocks; rubric self-score

12-Week Plan (Steady)

Week	Topics	Practice
1–2	Parts 1–3, 28–30	2 BOE drills/week; communication scripts aloud
3–4	Parts 4–7	1 design: URL shortener, rate limiter
5–6	Parts 8–10	1 design: Twitter feed, shard key exercises
7–8	Parts 11–15	1 design: chat; API style comparison writeup
9–10	Parts 16–22	1 design: Dropbox, payment ledger outline
11	Parts 23–26	1 design: notification system end-to-end
12	Parts 27, 32–33	4 full timed mocks; review mistake list

Daily 90-Minute Block Template

15 min — flash review (latency table, CAP, one matrix)
45 min — read one Part section deeply; notes in own words
30 min — whiteboard mini-design or explain aloud recorded

Weekend Deep Work

Saturday: full 45-min mock with peer or AI. Sunday: postmortem using Part 33 rubric; update weak-area queue for next week.

Part 27 Walkthrough Rotation

Week 8+: one classic design daily from guide Part 27: URL shortener, Twitter, Uber, WhatsApp, YouTube.

Spaced Repetition

Re-read Parts 3, 11, 28 every 2 weeks — core interview anchors.

Worked Example: Study

Track hours: 40% reading, 40% whiteboard, 20% mock — adjust if mocks score low.

Extended Notes

Connect study to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Study Plans

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

8 vs 12 week plan?

Choose the 8-week plan if your interview is about two months out and you can study intensively; choose the 12-week plan if you are studying part-time while employed.

How many mocks?

Aim for at least 8–12 full mocks before an onsite loop.

Extended Reference — Study Plans

Active recall

Close guide; sketch Twitter on blank paper from memory — gaps drive next reading.

Spaced repetition

Anki deck for latency numbers, CAP, algorithms — 10 min daily.

Peer mocks

Swap interviewer role — teaching exposes gaps.

Company-specific

Meta: feed/ranking. Amazon: retail inventory. Google: search/index. Stripe: payments/idempotency.

Burnout prevention

One day off weekly — retention drops when exhausted.

Part 31 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

8-week plan track
12-week if employed
Daily 90 min block
Weekend mock
Part 27 rotation
Spaced repetition
Active recall
Company specific focus
Peer exchange
Rest day weekly

Self-test prompt

Explain Part 31 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 31 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 32: Day-Before & Day-Of Checklist

Day Before Interview

Review latency numbers (Part 3 table) — 10 min
Skim trade-off matrices (Part 28) — 15 min
Re-read communication scripts (Part 30) — 10 min
One 25-min timed mini-design (clarify + BOE + high-level only)
Prepare 2 questions for interviewer about team/system
Test whiteboard tool (Excalidraw, CoderPad), camera, mic, internet backup
Sleep 7+ hours — cognitive performance drops sharply when tired

Day Of — 2 Hours Before

Light breakfast; hydrate
No cramming new topics — confidence from frameworks
Close noisy apps; phone silent
Open blank board tab + one-page cheat sheet (BOE formulas only)

15 Minutes Before

Bathroom, water nearby
Deep breath; review opening script once
Remind: collaboration, not exam — think aloud

During Interview

Clarify requirements before drawing
State assumptions and ask for validation
BOE before deep architecture
Label diagram components and arrows
Pause for questions: "Does this direction make sense?"
Leave 5 min for summary and trade-offs

After Interview

Write notes while fresh: questions asked, hints given, what to study. Do not obsess over the outcome; process improvement is what matters.

Item	Done?
Tool tested	☐
Framework internalized	☐
Opening script ready	☐
Questions for interviewer	☐

Virtual Interview Setup

Second monitor for notes
Browser zoom 100%
Pen and paper backup if whiteboard fails

Energy Management

Back-to-back interviews: have a protein snack in between, and avoid a heavy, carb-laden lunch that leads to an energy crash.

Worked Example: Checklist

Keep water within reach. Interviewers will wait if you need 10 seconds to think; say 'let me structure this.'

Extended Notes

Connect checklist to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Checklists

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Whiteboard tool failure?

Verbal description + ASCII in chat — communication still scored.

Post-interview note?

Within 1 hour: questions, hints, weak dimensions for next study week.

Extended Reference — Day-Before & Day-Of

Materials

Water, charger, backup internet hotspot for virtual.

Mindset

Treat the interview as a collaborative design session, not an exam; this framing reduces anxiety.

During lag

If video freezes, summarize last sentence when reconnected — maintain thread.

Note taking

Interviewer may allow notes — have BOE formulas written.

After

A thank-you note is not required at big tech; focus on your self-debrief.

Part 32 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Tool tested
Latency table skim
Matrices skim
Opening script
Water and charger
No cramming new topics
Thinking pause is OK
Post debrief notes
Questions for interviewer
Sleep priority

Self-test prompt

Explain Part 32 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to Part 33 rubric: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 32 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.

Part 33: Mock Interview Rubric — Self-Score

How to Use This Rubric

After each mock, score 1–5 per dimension (1 = weak, 5 = strong). Track scores weekly; target an average of ≥4 on the dimensions that matter for your target level. Compare with the interviewer expectations in Part 1.

Scoring Scale

Score	Meaning
1	Missing or incorrect
2	Superficial mention
3	Adequate with gaps
4	Solid, minor misses
5	Strong, proactive depth

Dimension Definitions

Dimension	Score 1	Score 5
Requirements	Jumped to design	Functional + NFR + scale + constraints
Estimation	No numbers	Full BOE chain with stated assumptions
High-level design	Confusing diagram	Clear layers, labeled flows
Data model	Missing schema	Tables/keys/indexes justified
Scaling	No sharding/cache	Hot keys, replicas, CDN addressed
Reliability	Happy path only	Failures, retries, SPOF mitigation
Trade-offs	One-sided	Explicit pros/cons; matrices
Communication	Silent or rambling	Structured, collaborative, concise

Self-Score Sheet (Copy Per Mock)

Dimension	1–5	Notes / evidence
Requirements & scope
Back-of-envelope
API / interface design
High-level architecture
Data storage & model
Caching & CDN
Async / queues
Scaling & sharding
Consistency & reliability
Security & privacy
Observability & ops
Communication
Total /60

Interpretation

48–60: Interview-ready for most senior loops
36–47: Targeted study on lowest 3 dimensions
<36: Repeat framework (Part 2); more mocks before real interviews
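
The interpretation bands above can be applied mechanically once a sheet is filled in. The sketch below assumes the 12 dimensions from the self-score sheet, each scored 1–5; the function name `interpret_mock` and the tie-breaking rule (sheet order) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

DIMENSIONS = [
    "Requirements & scope", "Back-of-envelope", "API / interface design",
    "High-level architecture", "Data storage & model", "Caching & CDN",
    "Async / queues", "Scaling & sharding", "Consistency & reliability",
    "Security & privacy", "Observability & ops", "Communication",
]


def interpret_mock(scores: Dict[str, int]) -> Tuple[int, str, List[str]]:
    """Return (total out of 60, interpretation band, three lowest dimensions)."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("expected exactly the 12 rubric dimensions")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    total = sum(scores.values())
    if total >= 48:
        band = "Interview-ready for most senior loops"
    elif total >= 36:
        band = "Targeted study on lowest 3 dimensions"
    else:
        band = "Repeat framework (Part 2); more mocks before real interviews"
    # sorted() is stable, so ties keep the order in which scores were entered.
    lowest = sorted(scores, key=scores.get)[:3]
    return total, band, lowest


scores = {d: 4 for d in DIMENSIONS}
scores.update({"Security & privacy": 2, "Async / queues": 3, "Observability & ops": 3})
total, band, lowest = interpret_mock(scores)
print(total, band, lowest)
# prints: 44 Targeted study on lowest 3 dimensions
#         ['Security & privacy', 'Async / queues', 'Observability & ops']
```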

Action Template

Lowest dimension this week: ___. Study Part #___. Drill: one mock focusing only on that phase next session.

Peer Mock Exchange

Swap rubrics with study partner; score each other blind; compare self vs peer scores for calibration.

Weekly Trend

Plot total score week over week — plateau means change mock format (harder problems, shorter time).
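
One way to make "plateau" less subjective is to define it numerically. The sketch below is only an illustration: the 3-week window and the 2-point minimum gain are assumed thresholds, not values from this guide, and `is_plateau` is a hypothetical helper.

```python
from typing import Sequence


def is_plateau(weekly_totals: Sequence[int], window: int = 3, min_gain: int = 2) -> bool:
    """Flag a plateau when the total rose by less than min_gain over the last window weeks."""
    if len(weekly_totals) < window:
        return False  # not enough history to judge
    recent = weekly_totals[-window:]
    return recent[-1] - recent[0] < min_gain


print(is_plateau([30, 36, 41, 42, 42]))  # prints True: only +1 over the last 3 weeks
print(is_plateau([30, 36, 41]))          # prints False: +11 over the last 3 weeks
```

When the check trips, that is the cue to change the mock format, as described above.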

Worked Example: Rubric

Score communication separately even when the design is weak; strong communication can tip borderline hire/no-hire decisions.

Extended Notes

Connect rubric to Part 2 framework step 6 deep dive. Interviewer may spend 10 minutes here — prepare one diagram and one failure scenario.

Interview Question Bank — Rubric

Practice answering aloud in 60–90 seconds each. Tie answers to diagrams when possible.

Self-score inflation?

Compare with peer mock scores — calibrate harshly on communication and depth.

Hire bar mapping?

48/60+ consistent across 3 mocks suggests readiness for many FAANG loops.

Extended Reference — Mock Interview Rubric

Calibration

Score first mock harshly (3 average) — improvement visible by mock 5.

Dimension weighting

L5: depth + trade-offs weighted higher than perfect diagram art.

Communication 5

A 5 requires thinking aloud for the entire session without long silences.

Tracking spreadsheet

Date, problem, scores per dimension, action items — weekly review.

Hire decision

Rubric guides study; actual hire uses holistic loop — don't overfit one mock score.

Part 33 Mastery Checklist

Before mock interviews, verify you can explain each item without reading:

Score 12 dimensions
Notes column evidence
Weekly trend
Peer calibration
Action on lowest
48+ target
Communication separate
Mock count 8+
Honest scoring
Hire bar holistic

Self-test prompt

Explain Part 33 to a rubber duck in 3 minutes, then answer two follow-up "what if it fails?" questions.

Mock tie-in

Map this checklist to the rubric above: each item supports one rubric dimension (requirements, depth, trade-offs, communication).

Record score after self-test: /5 on confidence for Part 33 — revisit if below 4.

Link to Part 27: pick one walkthrough that stresses these concepts and whiteboard end-to-end.
