High availability
High availability is the design objective of keeping a system continuously operational for a defined uptime target — typically expressed in nines (99.9% = ~8.7 hours/year downtime, 99.99% = ~52 minutes/year, 99.999% = ~5 minutes/year). HA is achieved through redundancy, automatic failover, and elimination of single points of failure.
Each additional nine of availability costs roughly an order of magnitude more — 99.9% can be a single-AZ deployment with good monitoring; 99.99% requires multi-AZ; 99.999% requires multi-region with active-active routing and runbook automation that doesn't wait for humans. Most B2B SaaS targets 99.9% (three nines) because the cost-benefit beyond that is poor unless customers are paying a premium. The honest framing: availability targets are budget decisions, not just engineering decisions, and most teams over-promise nines in marketing collateral while measuring at three nines internally.
Related terms
- Disaster recovery
Disaster recovery is the set of plans, procedures, and infrastructure that restores a service after a major failure — region outage, data corruption, ransomware, deletion of production data.
- RTO and RPO
Recovery Time Objective (RTO) is the maximum acceptable time between a disaster and service restoration.
- Fault tolerance
Fault tolerance is the property of a system to continue operating, possibly in a degraded state, when one or more of its components fail.