
Latency percentile

A latency percentile (p50, p95, p99, p999) is the response time at or below which that percentage of requests completed; p999 denotes the 99.9th percentile. p99 = 500ms means 99% of requests finished in 500ms or less; the slowest 1% took longer. Averages hide tail latency, which is what users actually feel when the system is degraded; percentiles surface it.
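As a rough illustration of how an average can hide the tail, the sketch below computes percentiles with the nearest-rank method, which is one of several common definitions. The `percentile` helper and the sample data are hypothetical, chosen only to make the effect visible.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical data: 980 fast requests at 20 ms and 20 slow ones at 2000 ms.
latencies_ms = [20] * 980 + [2000] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
```

In this made-up sample the mean is about 60ms, which looks healthy, while p99 is 2000ms: 2% of requests take two seconds and only the percentile shows it.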

May 23, 2026

Tail latency matters more than averages because, in distributed systems, a single user request often fans out to dozens of backend calls made in parallel, and the user's response time is roughly that of the slowest backend response, not the average. With a backend p99 of 200ms and a 30-way fanout, about 26% of user requests (1 - 0.99^30, assuming independent calls) wait on at least one call slower than 200ms, so the backend's 1-in-100 outlier becomes a roughly 1-in-4 experience for users. p999 (the slowest 0.1%) matters at scale: a service handling 1B requests/day still has 1M requests per day landing in the slowest 0.1%. Healthy SLOs target p95 or p99, not averages; healthy alerting fires on percentile breaches, not average breaches.
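A minimal sketch of the fanout arithmetic, assuming backend calls are independent and issued in parallel (real backends are often correlated, so treat the result as an approximation). The function names are hypothetical.

```python
def prob_any_slow(fanout, backend_percentile=99.0):
    """Probability that at least one of `fanout` independent backend calls
    is slower than the backend's given latency percentile."""
    p_fast = backend_percentile / 100
    return 1 - p_fast ** fanout

def requests_in_tail(requests_per_day, tail_fraction):
    """Expected number of daily requests that fall in the slowest tail."""
    return requests_per_day * tail_fraction

# Share of user requests that hit at least one backend p99 outlier.
for fanout in (1, 10, 30, 100):
    print(f"fanout={fanout:>3}: {prob_any_slow(fanout):.1%}")

# Slowest 0.1% of 1B requests/day.
print(f"{requests_in_tail(1_000_000_000, 0.001):,.0f} requests/day")
```

For a 30-way fanout this gives roughly 0.26, and the share keeps climbing as fanout grows, which is why a backend percentile that looks rare in isolation becomes routine from the user's side.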