Module 2 · Scale, Latency, Capacity · Day 015 · 25 min

Tail Latency

p99 is your real product.


Memory hook

Tail Latency: p99 is your real product

Mental model

Make the invisible limits visible.

Design lens

Hedging doubles bandwidth on tail requests.

Recall anchors
Causes · Measure · Mitigate

Why it matters

Tail latency (p99 and beyond) is what users actually feel, because it dominates page loads composed of many parallel requests. A page that calls 10 backends, each under 100 ms 99% of the time, has only a ~90% chance of finishing under 100 ms.

Deep dive

p99 grows with fan-out. The probability all 10 of your subrequests are fast is 0.99^10 ≈ 90%. Reduce fan-out, hedge slow requests, or set tight timeouts.
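The fan-out arithmetic is easy to check directly. A minimal sketch (the function name `page_fast_probability` is ours, for illustration only):

```python
def page_fast_probability(p: float, fan_out: int) -> float:
    # Probability that ALL fan_out parallel subrequests are fast,
    # given each one is fast with independent probability p.
    return p ** fan_out

# 10 backends, each under the threshold 99% of the time:
print(round(page_fast_probability(0.99, 10), 3))  # → 0.904
```

The same function shows why shrinking fan-out helps structurally: `page_fast_probability(0.99, 3)` is ≈ 0.970, already much closer to the per-service 0.99.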

Sources of tails: GC pauses, JIT warmup, packet retransmits, lock contention, noisy neighbors on shared cores. Most are stochastic and not worth chasing individually.

Mitigations: hedged requests (send to 2 replicas, take fastest), tight timeouts with retries, shedding under load, isolating noisy neighbors.
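The hedged-request pattern can be sketched with threads. A toy version, assuming `call_replica`, `hedged_call`, and the latency mix are hypothetical stand-ins for real RPC plumbing:

```python
import concurrent.futures
import random
import time

def call_replica(replica_id):
    # Stand-in for an RPC: usually fast, occasionally stuck in the tail.
    time.sleep(0.01 if random.random() < 0.99 else 0.5)
    return f"response-from-{replica_id}"

def hedged_call(replicas, hedge_after=0.05):
    # Send to the first replica; if it has not answered within hedge_after
    # seconds, send a second copy to another replica and take the winner.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call_replica, replicas[0])
        done, _ = concurrent.futures.wait([first], timeout=hedge_after)
        if done:
            return first.result()
        second = pool.submit(call_replica, replicas[1])
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        return done.pop().result()

print(hedged_call([1, 2]))
```

A production version would also cancel the losing request (here the executor's shutdown simply waits for it) and cap the hedge rate so duplicated requests cannot amplify an overload.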

Demo / scenario

A search page calls 8 services in parallel.

  1. Each service: p99 = 100 ms.
  2. Chance at least one call is slow: 1 − (0.99)^8 ≈ 7.7% → the page misses its 100 ms budget nearly 8× as often as any single service.
  3. Add hedging: dispatch a second request to another replica after 50 ms.
  4. The tail shrinks; the cost is ~5% extra load.
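The scenario can be simulated end to end. A Monte Carlo sketch, where the latency mix (20 ms fast, 300 ms tail) and all helper names are toy assumptions, not measurements:

```python
import random

def subrequest_latency():
    # Toy model: 99% of calls take ~20 ms, 1% hit a ~300 ms tail.
    return 0.02 if random.random() < 0.99 else 0.3

def page_latency(fan_out=8, hedge_after=0.05):
    # Page latency = slowest of fan_out parallel calls; if a call has not
    # returned by hedge_after seconds, fire a second copy, take the faster.
    worst = 0.0
    for _ in range(fan_out):
        t = subrequest_latency()
        if hedge_after is not None and t > hedge_after:
            t = min(t, hedge_after + subrequest_latency())
        worst = max(worst, t)
    return worst

random.seed(0)
p99 = lambda xs: sorted(xs)[int(0.99 * len(xs))]
plain = [page_latency(hedge_after=None) for _ in range(20_000)]
hedged = [page_latency() for _ in range(20_000)]
print(f"page p99 without hedging: {p99(plain) * 1000:.0f} ms")
print(f"page p99 with hedging:    {p99(hedged) * 1000:.0f} ms")
```

In this toy model the unhedged page p99 lands on the 300 ms tail, while hedging pulls it down toward the hedge-fire time plus one fast retry (~70 ms); the extra load is the small fraction of calls that trigger a hedge.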

Tradeoffs

  • Hedging doubles bandwidth on tail requests.
  • Tight timeouts cause more user-visible errors.
  • Reducing fan-out is structurally better but slower to ship.
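The timeout tradeoff above can be made concrete: a tight per-attempt deadline plus one retry usually converts a tail request into a fast one, but an exhausted retry budget becomes a user-visible error. A sketch, where `Timeout`, `call_backend`, and the latency mix are hypothetical:

```python
import random

class Timeout(Exception):
    pass

def call_backend(deadline_s):
    # Toy backend: 99% of calls are fast, 1% exceed any tight deadline.
    latency = 0.01 if random.random() < 0.99 else 0.5
    if latency > deadline_s:
        raise Timeout(f"no reply within {deadline_s * 1000:.0f} ms")
    return "ok"

def tight_timeout_with_retry(deadline_s=0.05, retries=1):
    for _ in range(retries + 1):
        try:
            return call_backend(deadline_s)
        except Timeout:
            continue  # retry within the budget
    # Every attempt timed out: this error reaches the user.
    raise Timeout("retry budget exhausted")

random.seed(1)
print(tight_timeout_with_retry())
```

Tightening `deadline_s` sharpens the tail cutoff but raises the odds that all attempts time out, which is exactly the "more user-visible errors" tradeoff.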

Diagram

Page → S1 · S2 · S3 (parallel). Fan-out amplifies tail latency.


