All glossary terms
Verify

Latency percentile

A latency percentile (p50, p95, p99, p999) is the response time below which that share of requests completed. p99 = 500ms means 99% of requests finished in 500ms or less; the slowest 1% took longer. Averages hide tail latency, which is what users actually feel when the system is degraded; percentiles surface it.

Tail latency matters more than averages because, in distributed systems, a single user request often fans out to dozens of backend calls — and the user's response time is roughly the slowest backend response, not the average. p99 backend latency of 200ms with a 30-way fanout means median user latency is ~600ms (because the chance of hitting at least one p99 outlier is high). p999 (the 0.1% slowest) matters at scale: a service handling 1B requests/day still has 1M users hitting the slowest 0.1%. Healthy SLOs target p95 or p99, not averages; healthy alerting fires on percentile breaches, not average breaches.

Related terms