Error budget

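An error budget is the amount of unreliability a service-level objective (SLO) permits: 100% minus the SLO target. With a 99.9% SLO, the budget is 0.1% of the measurement window; if you are currently delivering 99.95%, you have used half of it and have 0.05% left to spend on risky changes such as new features, infrastructure migrations, and schema rewrites. The SLO is usually set stricter than the SLA (the customer contract), so the budget runs out before contractual penalties apply. Error budgets convert reliability from a yes/no debate into a tradeable resource.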
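The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the `ErrorBudget` class, its field names, and the 30-day window are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: the budget implied by an availability SLO."""

    slo: float           # target availability as a fraction, e.g. 0.999
    window_minutes: int  # length of the SLO window (assumed 30 days below)

    @property
    def budget_fraction(self) -> float:
        # Share of the window that may be "bad" without breaching the SLO.
        return 1.0 - self.slo

    @property
    def budget_minutes(self) -> float:
        # The same budget expressed as minutes of full downtime.
        return self.budget_fraction * self.window_minutes

    def remaining_fraction(self, measured_availability: float) -> float:
        # 1.0 = untouched, 0.0 = exhausted, negative = SLO breached.
        consumed = 1.0 - measured_availability
        return 1.0 - consumed / self.budget_fraction


budget = ErrorBudget(slo=0.999, window_minutes=30 * 24 * 60)
print(f"{budget.budget_minutes:.1f} minutes of allowed downtime")       # 43.2
print(f"{budget.remaining_fraction(0.9995):.0%} of the budget remains")  # 50%
```

Expressing the remaining budget as a share of the total, rather than as a raw percentage, makes it comparable across services with different SLOs.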

The team's relationship with the error budget shapes release cadence: when there's budget, ship boldly; when budget is depleted, slow down and prioritise reliability work. Google's SRE literature treats the rate at which the budget burns as a primary alerting signal: burning faster than the SLO allows pages on-call, and exhausting the budget triggers a freeze on non-reliability work. The tradeable-resource framing is what makes the concept stick organisationally; without it, reliability vs. velocity becomes a recurring philosophical debate.
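As a rough sketch of how that policy might be encoded, the snippet below computes a burn rate and maps it to a release posture. The function names, the three posture labels, and the `fast_burn` threshold of 2.0 are illustrative assumptions; real policies pick their own thresholds and windows.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """Speed at which the budget is being spent.

    1.0 means the budget lasts exactly one SLO window; 2.0 means it is
    gone in half a window.
    """
    return error_ratio / (1.0 - slo)


def release_posture(remaining_fraction: float, rate: float,
                    fast_burn: float = 2.0) -> str:
    """Hypothetical policy: map budget state to a release decision."""
    if remaining_fraction <= 0.0:
        return "freeze"      # budget spent: reliability work only
    if rate >= fast_burn:
        return "slow down"   # budget left, but burning too quickly
    return "ship"            # budget available and burn is sustainable


# 0.5% of requests failing against a 99.9% SLO burns roughly 5x too fast.
rate = burn_rate(error_ratio=0.005, slo=0.999)
print(release_posture(remaining_fraction=0.5, rate=rate))  # slow down
```

Separating the remaining budget from the burn rate lets the policy distinguish "nothing left" from "plenty left but disappearing quickly", which call for different responses.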
