Latency percentile
A latency percentile (p50, p95, p99, p999) is the response time below which that share of requests completed. p99 = 500ms means 99% of requests finished in 500ms or less; the slowest 1% took longer. Averages hide tail latency, which is what users actually feel when the system is degraded; percentiles surface it.
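To make the definition concrete, here is a minimal sketch of how a percentile can be computed from raw latency samples. It assumes the nearest-rank method (one of several common definitions; monitoring systems often interpolate or approximate from histograms instead). The `percentile` helper and the sample data are hypothetical, chosen so that two slow requests out of 100 barely move the mean and leave the median untouched, but dominate the p99.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical data: 98 fast requests at 20 ms and 2 slow ones at 2000 ms.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
```

In this sample the mean is 59.6ms and the p50 is 20ms, while the p99 is 2000ms: the average gives no hint that some requests take two seconds.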
Tail latency matters more than averages because, in distributed systems, a single user request often fans out to dozens of backend calls, and the user's response time is roughly that of the slowest backend response, not the average. With a backend p99 of 200ms and a 30-way fanout, about 26% of user requests (1 - 0.99^30, assuming independent calls) include at least one call at or beyond the p99, so roughly one user request in four takes 200ms or longer even though each backend is that slow only 1% of the time. p999 (the slowest 0.1%) matters at scale: a service handling 1B requests/day still serves 1M requests a day from the slowest 0.1%. Healthy SLOs target p95 or p99, not averages; healthy alerting fires on percentile breaches, not average breaches.
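The fan-out and scale arithmetic can be checked with a short sketch. It assumes backend calls are independent and that the user request waits for all of them, which is a simplification (real backends often slow down together). The function names are hypothetical.

```python
def chance_of_slow_call(fanout: int, quantile: float = 0.99) -> float:
    """Probability that at least one of `fanout` independent backend calls
    lands at or beyond the given latency quantile (assumes independence)."""
    if fanout < 0:
        raise ValueError("fanout must be non-negative")
    return 1.0 - quantile ** fanout


def tail_requests_per_day(requests_per_day: int, quantile: float = 0.999) -> int:
    """Expected number of requests per day that fall beyond the quantile."""
    return round(requests_per_day * (1.0 - quantile))


# 30-way fan-out against a backend p99: share of user requests that
# include at least one p99-slow backend call.
slow_share_30 = chance_of_slow_call(30)

# 1B requests/day: how many land in the slowest 0.1% (beyond p999).
tail_1b = tail_requests_per_day(1_000_000_000)

print(f"{slow_share_30:.1%} of user requests hit a p99-slow call")
print(f"{tail_1b} requests/day fall beyond p999")
```

Under these assumptions a 1% backend tail becomes a roughly 26% user-facing tail at 30-way fan-out, which is why per-backend SLOs are usually set on high percentiles.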
Related terms
- SLI
A Service Level Indicator is a numerical measurement of one specific dimension of a service's behaviour — request latency, error rate, throughput, availability — expressed over a defined window.
- SLO
A Service Level Objective is a target value for a reliability metric of a service, typically expressed as a percentage over a time window.
- Saturation
Saturation is the measure of how full the most-constrained resource of a system is — CPU, memory, IOPS, network bandwidth, queue depth, file descriptors.