MTTR
Mean Time To Recovery is the average elapsed time between an incident's detection and its resolution. It's one of the four DORA metrics (lead time, deploy frequency, change failure rate, MTTR) and indicates how quickly a team can return to healthy production after a failure.
Confusingly, MTTR has been used for at least four distinct meanings: Mean Time To Recovery, Mean Time To Repair, Mean Time To Resolve, and Mean Time To Respond. The DORA framing uses Recovery — wall-clock from incident-fired to service-restored. Healthy teams report MTTR in minutes; teams missing the runbook discipline often report it in hours or days. The metric is most useful when combined with frequency (deploys per day) — high frequency + low MTTR is the elite tier.
Long-form posts that explore mttr in depth — when to use it, common failure modes, how AI helps.
- BPMN process mining without Celonis moneyCelonis charges $100K-$1M+ for process mining. It's genuinely good. It's also wildly overpriced for 95% of teams. This is the lighter-weight playbook that actually works.9 min read
- What's the actual ROI of AI in software delivery?$4-$8 back for every dollar spent within 6 months, for most teams. The honest math from real data, not the deck.7 min read
Related terms
- Lead time
Lead time is the elapsed time from when work is requested (story created, ticket filed) to when it's delivered (deployed to production).
- Throughput
Throughput is the count of work items completed per unit of time (typically per week or per sprint).
- Regression test
A regression test verifies that previously working functionality still works after a code change.