Module 2 · Scale, Latency, Capacity · Day 012 · 25 min

Capacity Planning

Provision for peak — and prove it.


[Signal path: Forecast → Headroom → Zoning → Verify. Capacity planning building blocks.]
Memory hook

Capacity Planning: provision for peak

Mental model

Make the invisible limits visible.

Design lens

Always-on capacity wastes money outside spikes.

Recall anchors
Forecast · Redundancy · Verification

Why it matters

Capacity planning takes your traffic forecast plus your failure assumptions and produces an infrastructure shape that survives both with margin. Average is a comfortable lie — peaks, spikes, and failure modes are what bring systems down.

Deep dive

Start with peak QPS. For consumer apps, peak is roughly 3–5× average. Black Friday or launch events can be 50× — model these explicitly.
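As a rough sketch of that forecast step (the multipliers are the rules of thumb above, not measured values):

```python
def peak_qps(avg_qps: float, multiplier: float = 4.0) -> float:
    """Estimate peak QPS from average traffic.

    Consumer apps typically peak at 3-5x average; launch events or
    Black Friday can hit 50x and should be modeled explicitly.
    """
    return avg_qps * multiplier

everyday_peak = peak_qps(100)      # 4x everyday peak -> 400.0
launch_spike = peak_qps(100, 50)   # 50x launch event -> 5000.0
```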

Apply N+1 or N+2 redundancy: if 4 instances handle peak, run 5 or 6 so a single failure still leaves full capacity. At the region level, keep one region's worth of headroom available in the others.
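The N+1/N+2 sizing rule reduces to one line of arithmetic (a minimal sketch; function name and signature are illustrative):

```python
import math

def instances_needed(peak_qps: float, per_instance_qps: float, spares: int = 1) -> int:
    """Size a fleet: N instances to absorb peak, plus spares for failures.

    spares=1 gives N+1 (survive one instance loss), spares=2 gives N+2.
    """
    n = math.ceil(peak_qps / per_instance_qps)
    return n + spares

instances_needed(800, 200)            # 4 for peak + 1 spare -> 5
instances_needed(800, 200, spares=2)  # N+2 -> 6
```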

The estimate is a hypothesis; the load test is the verification. Run synthetic peak traffic through staging and watch p99/p999 — the tails break long before averages do.
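Why watch the tail rather than the mean? A nearest-rank percentile over a skewed sample makes the gap concrete (the latency numbers below are invented for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 100 requests: 98 fast, 2 pathological. The average looks healthy;
# the tail does not -- which is exactly what a load test surfaces.
latencies_ms = [10] * 98 + [500, 900]
mean = sum(latencies_ms) / len(latencies_ms)  # 23.8 ms
p99 = percentile(latencies_ms, 99)            # 500 ms
p999 = percentile(latencies_ms, 99.9)         # 900 ms
```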

Demo / scenario

Plan capacity for an online ticketing event.

  1. Average: 100 RPS. Tickets drop at noon → expected 50× spike = 5,000 RPS for 60s.
  2. Each instance handles 200 RPS at p99 ≤ 200ms.
  3. Need 25 instances for the spike, plus 5 spares as redundancy headroom → 30.
  4. Spread the 30 across 3 zones so losing one zone costs only a third of capacity.
  5. Load test at 6,000 RPS: p99 rises to 240ms, above the 200ms target — add instances or tighten admission control before launch.
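The arithmetic in the steps above can be reproduced directly (numbers taken from the scenario; the variable names are illustrative):

```python
import math

avg_rps = 100
spike_multiplier = 50
per_instance_rps = 200   # each instance's capacity at p99 <= 200ms
zones = 3

spike_rps = avg_rps * spike_multiplier           # 5,000 RPS for 60s
base = math.ceil(spike_rps / per_instance_rps)   # 25 instances for the spike
spares = 5                                       # redundancy headroom
total = base + spares                            # 30 instances
per_zone = total // zones                        # 10 per zone across 3 AZs
```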

Tradeoffs

  • Always-on capacity wastes money outside spikes.
  • Autoscaling has minute-scale lag — too slow for 60s spikes.
  • Pre-warming + queue-based admission control is the safer combo.
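One way to sketch the queue-based admission control mentioned above is a bounded queue that sheds excess load fast instead of letting latency grow unboundedly (class and method names are hypothetical; this is a sketch, not a production implementation):

```python
from collections import deque

class AdmissionQueue:
    """Bounded admission queue: accept up to max_depth pending requests,
    reject the rest immediately.

    Shedding early keeps p99 bounded during spikes that outrun
    minute-scale autoscaling; rejected requests can retry after the spike.
    """
    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.queue = deque()

    def try_admit(self, request) -> bool:
        if len(self.queue) >= self.max_depth:
            return False  # shed: fail fast rather than queue unboundedly
        self.queue.append(request)
        return True

q = AdmissionQueue(max_depth=2)
results = [q.try_admit(i) for i in range(3)]  # [True, True, False]
```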

Diagram

Forecast (peak QPS) → Headroom (+ N+1) → Zoning (multi-AZ) → Verify (load test)
Capacity planning building blocks.


