Module 2 · Scale, Latency, Capacity · Day 010 · 20 min

Throughput vs Latency

Two different curves with different bottlenecks.


Diagram: Little's Law, L (concurrency) = λ (throughput) × W (latency), with the signal path flowing through each service.
Memory hook

Throughput vs Latency: two different curves with different bottlenecks.

Mental model

Make the invisible limits visible.

Design lens

Bigger pool → memory + downstream pressure.

Recall anchors

Throughput levers · Latency levers · Coupling

Why it matters

Latency is how long one request takes. Throughput is how many requests finish per second. They are not the same: a system can have low latency at low load and collapse at high throughput, or sustain high throughput while individual requests crawl.

Deep dive

Little's Law: average concurrent requests L = arrival rate λ × average latency W. If you can hold 100 requests in flight and each takes 50 ms, max throughput is 2,000/s. To raise throughput, tighten W or grow L.
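That arithmetic as a minimal sketch (the function name is mine, not from the module):

```python
def max_throughput_rps(concurrency: float, latency_ms: float) -> float:
    """Little's Law rearranged: throughput = L / W."""
    return concurrency * 1000.0 / latency_ms

# 100 in-flight requests at 50 ms each:
print(max_throughput_rps(100, 50))  # 2000.0 requests/s
```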

Throughput often improves with batching, queueing, parallelism. Latency often improves with caching, locality, smaller payloads. Tactics differ.
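One way to see why the tactics differ: batching amortizes a fixed per-call overhead, so throughput climbs while the latency of each batch stretches. A sketch with illustrative numbers (not from the module):

```python
def batched_throughput(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    """Items completed per second when one call carries a whole batch."""
    return batch_size / (overhead_s + batch_size * per_item_s)

def batch_latency(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    """Wall-clock time one batch (and thus its last item) takes."""
    return overhead_s + batch_size * per_item_s

# 10 ms fixed overhead per call, 1 ms of work per item:
# growing the batch raises throughput ~10x but also stretches latency ~10x.
for n in (1, 10, 100):
    print(n, batched_throughput(n, 0.010, 0.001), batch_latency(n, 0.010, 0.001))
```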

Capacity ceilings are hit when L grows but W also grows nonlinearly: queueing kicks in and tail latency explodes well before the raw QPS limit.

Demo / scenario

A service takes 200 ms per request and has 50 concurrent slots.

  1. Theoretical max: 50 / 0.2 s = 250 RPS.
  2. Observed 240 RPS: near saturation.
  3. Tail latency is rising: requests are queueing.
  4. Adding 50 more slots raises the ceiling to 500 RPS, but check that the downstream DB can absorb it.
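The scenario's arithmetic, as a sketch (the function name is mine):

```python
def ceiling_rps(slots: int, latency_ms: float) -> float:
    """Max sustainable throughput with every slot busy (Little's Law)."""
    return slots * 1000.0 / latency_ms

print(ceiling_rps(50, 200))        # 250.0 RPS theoretical max
print(ceiling_rps(100, 200))       # 500.0 RPS after doubling the pool
print(240 / ceiling_rps(50, 200))  # 0.96 -> 96% utilization: queueing territory
```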

Tradeoffs

  • Bigger pool → memory + downstream pressure.
  • Smaller payloads → more dev work.
  • Adding cache cuts W and raises throughput simultaneously.

Diagram

Little's Law: L (concurrency) = λ (throughput) × W (latency).
