Vertical autoscaling
Vertical autoscaling resizes an existing instance by adding or removing CPU, memory, or storage, rather than changing the number of instances. It suits workloads that don't parallelise well (single-threaded servers, stateful databases) and the right-sizing of chronically over-provisioned services. It is less responsive than horizontal scaling because a resize typically requires a restart.
Vertical scaling is the simpler design: there is no need for load balancing, no concern about session affinity, and no distributed state coordination. It has three trade-offs. First, there is a hard ceiling on how big an instance can get; the limit varies by provider and instance family, on the order of 128 vCPU / 2 TB RAM for common families, with specialised high-memory types going higher. Second, a resize is slow (seconds to minutes) and disruptive: the Kubernetes Vertical Pod Autoscaler (VPA), for instance, typically evicts the pod and recreates it with the new resource requests. Third, cost tends to scale superlinearly above mid-range instance sizes. The most common use case is a stateful database that can't be sharded: the operator runs it on an instance sized for the workload's needs and resizes it roughly annually. For stateless services, horizontal scaling almost always wins on flexibility and cost.
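The right-sizing decision described above can be sketched in a few lines. This is a minimal illustration, not how any particular autoscaler works: the instance ladder, the `recommend` function, the 95th-percentile choice, and the 20% headroom are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    vcpu: int
    memory_gib: int


# Hypothetical instance ladder, smallest to largest.
# Real catalogues differ by provider and instance family.
LADDER = [Size(2, 8), Size(4, 16), Size(8, 32), Size(16, 64), Size(32, 128)]


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend(cpu_samples, mem_samples_gib, headroom=1.2, p=95):
    """Pick the smallest size that covers p-th percentile usage plus headroom.

    Returns None when even the largest size is too small, which is the
    hard ceiling at which vertical scaling stops being an option.
    """
    need_cpu = percentile(cpu_samples, p) * headroom
    need_mem = percentile(mem_samples_gib, p) * headroom
    for size in LADDER:
        if size.vcpu >= need_cpu and size.memory_gib >= need_mem:
            return size
    return None
```

Calling `recommend` with usage samples returns the target size; applying it is the slow, disruptive step, which is why a sketch like this would normally run on a long interval rather than in a tight control loop. A `None` result signals that the workload has outgrown the ladder and needs sharding or horizontal scaling instead.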
Related terms
- Horizontal autoscaling
Horizontal autoscaling adds or removes service instances in response to load — typically CPU, memory, or a custom metric like queue depth.
- Saturation
Saturation is the measure of how full the most-constrained resource of a system is — CPU, memory, IOPS, network bandwidth, queue depth, file descriptors.
- Immutable infrastructure
Immutable infrastructure is the operational pattern where servers are never modified after deployment — to change configuration or apply patches, a new image is built and the old instances are replaced rather than updated in-place.
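For contrast with the vertical case, two of the related terms can be made concrete. The sketch below is illustrative only: `saturation` and `desired_replicas` are hypothetical helper names, and the replica rule is the simple proportional formula (the same shape as the one documented for the Kubernetes Horizontal Pod Autoscaler), without the tolerances and stabilisation windows a real autoscaler adds.

```python
import math


def saturation(utilisation):
    """Return (resource, fraction) for the most-constrained resource.

    `utilisation` maps a resource name to used/capacity in the range 0..1.
    """
    resource = max(utilisation, key=utilisation.get)
    return resource, utilisation[resource]


def desired_replicas(current_replicas, current_metric, target_metric):
    """Proportional rule: scale the replica count by metric/target, rounding up."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))
```

For example, a service running 4 replicas at 90% CPU against a 60% target would be scaled to `desired_replicas(4, 90, 60)`, that is, 6 replicas, whereas a vertical autoscaler would instead resize each of the existing instances.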