Runbook

A runbook is a step-by-step operational document that describes how to diagnose and resolve a specific failure mode: which alert fires, what to check first, which commands to run, and when to escalate. Runbooks are linked from alert payloads so the responder reaches the procedure within seconds of the page firing.

May 23, 2026

Runbooks degrade quickly when not exercised. The two failure modes are absence (the alert fires and the responder has nothing to follow) and rot (the runbook references a deprecated tool, a renamed dashboard, or a person who left two years ago). Healthy runbook practice has four parts: every alert links to a runbook; every game day surfaces runbook gaps; every postmortem produces a runbook edit; and the runbook lives in version control alongside the service it covers, not in a wiki nobody updates. For routine incidents, a current runbook can be the difference between a 5-minute MTTR and a 50-minute MTTR.
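Because the runbook lives in the same repository as the service, the "every alert links to a runbook" rule can be checked automatically. The sketch below is a minimal illustration, not a reference implementation: `AlertRule`, `audit_runbooks`, and the `runbooks/` directory layout are hypothetical names chosen for this example, and a real setup would load the rules from whatever alerting configuration the team uses. It detects absence directly, but only one symptom of rot (a link that points at a file that has been moved or deleted); stale content inside the runbook still needs game days and postmortems to surface.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class AlertRule:
    """Hypothetical, simplified view of an alert definition."""
    name: str
    runbook_path: Optional[str]  # repo-relative path to the runbook, or None


def audit_runbooks(rules: List[AlertRule], repo_root: Path) -> Dict[str, List[str]]:
    """Group alert names by the runbook problem found, if any.

    "absent"      -> the alert has no runbook link at all
    "broken_link" -> the link points at a file that is not in the repo
    """
    findings: Dict[str, List[str]] = {"absent": [], "broken_link": []}
    for rule in rules:
        if not rule.runbook_path:
            findings["absent"].append(rule.name)
        elif not (repo_root / rule.runbook_path).is_file():
            findings["broken_link"].append(rule.name)
    return findings
```

Run as a CI step, a check like this could fail the build whenever either list is non-empty, so a new alert cannot ship without its runbook and a renamed runbook file cannot silently break the link in the page.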