Saturation
Saturation is the measure of how full the most-constrained resource of a system is — CPU, memory, IOPS, network bandwidth, queue depth, file descriptors. Saturation is the leading indicator of capacity problems: latency stays flat until saturation crosses ~70%, then degrades nonlinearly until queue depth explodes.
Saturation is one of the four golden signals because it predicts trouble before users feel it. The diagnostic value: a service with 95% CPU saturation but acceptable latency is one traffic spike away from collapse. Common saturation metrics include CPU run queue length, memory pressure (PSI on Linux), disk IO utilisation %, network bandwidth %, garbage collection pause time, and application-level queue depth. The single biggest mistake is monitoring averages rather than tail percentiles — a service that averages 40% CPU saturation but spends 5 minutes per hour at 99% will look healthy in dashboards while routinely paging the on-call.
Related terms
- Four golden signals
The four golden signals — latency, traffic, errors, saturation — are the minimum monitoring set Google SRE recommends for any user-facing service.
- Latency percentile
A latency percentile (p50, p95, p99, p999) is the response time below which that share of requests completed.
- Horizontal autoscaling
Horizontal autoscaling adds or removes service instances in response to load — typically CPU, memory, or a custom metric like queue depth.