Tail Latency: p99 is your real product
make the invisible limits visible
Hedging doubles the bandwidth spent on the tail requests it duplicates.
Tail latency, meaning p99 and beyond, is what users actually feel, because it dominates page loads composed of many parallel requests. A page that calls 10 backends, each answering within 100 ms 99% of the time, has only about a 90% chance of finishing within 100 ms, assuming the backends are slow independently of one another.
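Page-level tail latency grows with fan-out. If each subrequest is fast with probability 0.99, the probability that all 10 are fast is 0.99^10 ≈ 0.904, so roughly 1 page load in 10 waits on at least one slow backend. Reduce fan-out, hedge slow requests, or set tight timeouts.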
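The fan-out arithmetic can be sketched in a few lines of Python. This is an illustration only: the function name `p_all_fast` is made up for the example, and it assumes subrequests are slow independently of one another, which real backends sharing hosts or networks may not be.

```python
def p_all_fast(p_fast: float, fan_out: int) -> float:
    """Probability that every one of `fan_out` subrequests is fast.

    Assumes each subrequest is fast with probability `p_fast`,
    independently of the others (an assumption, not a measured fact).
    """
    if not 0.0 <= p_fast <= 1.0:
        raise ValueError("p_fast must be between 0 and 1")
    if fan_out < 0:
        raise ValueError("fan_out must be non-negative")
    return p_fast ** fan_out


if __name__ == "__main__":
    # Each backend meets its latency target 99% of the time.
    for n in (1, 8, 10, 100):
        print(f"fan-out {n:>3}: {p_all_fast(0.99, n):.1%} of page loads avoid the tail")
```

Under these assumptions a fan-out of 8 leaves about 92% of page loads fast, a fan-out of 10 about 90%, and a fan-out of 100 only about 37%.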
Sources of tail latency: GC pauses, JIT compilation, packet retransmits, resource contention, noisy neighbors on shared cores. Most are stochastic and cannot be eliminated one by one.
Mitigations: hedged requests (send to 2 replicas, take the fastest response), tight timeouts with retries, load shedding under overload, and isolating noisy neighbors.
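A hedged request might look like the following minimal `asyncio` sketch. Everything named here is hypothetical: `fake_replica` stands in for a real network call, and `hedged` with its `hedge_after_s` parameter is one possible shape, not a library API. It uses a delayed variant of the idea: the second replica is contacted only if the first has not answered within `hedge_after_s`, which keeps the extra load confined to slow requests. Setting `hedge_after_s=0` gives the "send to 2 replicas" behavior.

```python
import asyncio


async def fake_replica(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for a network call to one replica."""
    await asyncio.sleep(delay_s)
    return name


async def hedged(primary, backup, hedge_after_s: float):
    """Call primary(); if it is still pending after hedge_after_s, also call backup().

    Returns whichever result arrives first and cancels the loser.
    If the first call to finish raised, that exception propagates.
    """
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()  # fast path: no hedge was sent

    second = asyncio.ensure_future(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop paying for the slower request
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


async def demo() -> str:
    # The primary is stuck in the tail (500 ms); the backup answers in 10 ms.
    return await hedged(
        lambda: fake_replica("primary", 0.5),
        lambda: fake_replica("backup", 0.01),
        hedge_after_s=0.05,
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints "backup"
```

A common choice is to set the hedge delay near the observed p95 of the call, so that only a small fraction of requests are ever duplicated. The sketch ignores error handling and retries, which a production client would need.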
A search page calls 8 services in parallel.