Module 2 · Scale, Latency, Capacity · Day 015 · 25 min

Tail Latency

p99 is your real product.


Memory hook

Tail Latency: p99 is your real product

Mental model

Make the invisible limits visible.

Design lens

Hedging doubles bandwidth on tail requests.

Recall anchors
Causes · Measure · Mitigate

Why it matters

Tail latency (p99 and beyond) is what users actually feel, because it dominates page loads composed of many parallel requests. A page that calls 10 backends, each under 100 ms 99% of the time, has only a ~90% chance of finishing under 100 ms.

Deep dive

p99 grows with fan-out. The probability all 10 of your subrequests are fast is 0.99^10 ≈ 90%. Reduce fan-out, hedge slow requests, or set tight timeouts.
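The fan-out arithmetic is easy to check directly. A minimal sketch (the function name `page_fast_probability` is ours, for illustration only):

```python
def page_fast_probability(p: float, fan_out: int) -> float:
    # Probability that ALL fan_out parallel subrequests are fast,
    # given each one is fast with independent probability p.
    return p ** fan_out

# 10 backends, each under the threshold 99% of the time:
print(round(page_fast_probability(0.99, 10), 3))  # → 0.904
```

The same function shows why shrinking fan-out helps structurally: `page_fast_probability(0.99, 3)` is ≈ 0.970, already much closer to the per-service 0.99.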

Sources of tails: GC pauses, JIT warmup, packet retransmits, lock contention, noisy neighbors on shared cores. Most are stochastic and not worth chasing individually.

Mitigations: hedged requests (send to 2 replicas, take fastest), tight timeouts with retries, shedding under load, isolating noisy neighbors.
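The hedged-request pattern can be sketched with threads. A toy version, assuming `call_replica`, `hedged_call`, and the latency mix are hypothetical stand-ins for real RPC plumbing:

```python
import concurrent.futures
import random
import time

def call_replica(replica_id):
    # Stand-in for an RPC: usually fast, occasionally stuck in the tail.
    time.sleep(0.01 if random.random() < 0.99 else 0.5)
    return f"response-from-{replica_id}"

def hedged_call(replicas, hedge_after=0.05):
    # Send to the first replica; if it has not answered within hedge_after
    # seconds, send a second copy to another replica and take the winner.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call_replica, replicas[0])
        done, _ = concurrent.futures.wait([first], timeout=hedge_after)
        if done:
            return first.result()
        second = pool.submit(call_replica, replicas[1])
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        return done.pop().result()

print(hedged_call([1, 2]))
```

A production version would also cancel the losing request (here the executor's shutdown simply waits for it) and cap the hedge rate so duplicated requests cannot amplify an overload.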

Demo / scenario

A search page calls 8 services in parallel.

  1. Each service: p99 = 100 ms.
  2. Chance at least one call is slow: 1 − (0.99)^8 ≈ 7.7% → the page misses its 100 ms budget nearly 8× as often as any single service.
  3. Add hedging: dispatch a second request to another replica after 50 ms.
  4. The tail shrinks; the cost is ~5% extra load.
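The scenario can be simulated end to end. A Monte Carlo sketch, where the latency mix (20 ms fast, 300 ms tail) and all helper names are toy assumptions, not measurements:

```python
import random

def subrequest_latency():
    # Toy model: 99% of calls take ~20 ms, 1% hit a ~300 ms tail.
    return 0.02 if random.random() < 0.99 else 0.3

def page_latency(fan_out=8, hedge_after=0.05):
    # Page latency = slowest of fan_out parallel calls; if a call has not
    # returned by hedge_after seconds, fire a second copy, take the faster.
    worst = 0.0
    for _ in range(fan_out):
        t = subrequest_latency()
        if hedge_after is not None and t > hedge_after:
            t = min(t, hedge_after + subrequest_latency())
        worst = max(worst, t)
    return worst

random.seed(0)
p99 = lambda xs: sorted(xs)[int(0.99 * len(xs))]
plain = [page_latency(hedge_after=None) for _ in range(20_000)]
hedged = [page_latency() for _ in range(20_000)]
print(f"page p99 without hedging: {p99(plain) * 1000:.0f} ms")
print(f"page p99 with hedging:    {p99(hedged) * 1000:.0f} ms")
```

In this toy model the unhedged page p99 lands on the 300 ms tail, while hedging pulls it down toward the hedge-fire time plus one fast retry (~70 ms); the extra load is the small fraction of calls that trigger a hedge.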

Tradeoffs

  • Hedging doubles bandwidth on tail requests.
  • Tight timeouts cause more user-visible errors.
  • Reducing fan-out is structurally better but slower to ship.
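The timeout tradeoff above can be made concrete: a tight per-attempt deadline plus one retry usually converts a tail request into a fast one, but an exhausted retry budget becomes a user-visible error. A sketch, where `Timeout`, `call_backend`, and the latency mix are hypothetical:

```python
import random

class Timeout(Exception):
    pass

def call_backend(deadline_s):
    # Toy backend: 99% of calls are fast, 1% exceed any tight deadline.
    latency = 0.01 if random.random() < 0.99 else 0.5
    if latency > deadline_s:
        raise Timeout(f"no reply within {deadline_s * 1000:.0f} ms")
    return "ok"

def tight_timeout_with_retry(deadline_s=0.05, retries=1):
    for _ in range(retries + 1):
        try:
            return call_backend(deadline_s)
        except Timeout:
            continue  # retry within the budget
    # Every attempt timed out: this error reaches the user.
    raise Timeout("retry budget exhausted")

random.seed(1)
print(tight_timeout_with_retry())
```

Tightening `deadline_s` sharpens the tail cutoff but raises the odds that all attempts time out, which is exactly the "more user-visible errors" tradeoff.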

Diagram

Page → S1 · S2 · S3 (parallel). Fan-out amplifies tail latency.


