Incident commander
The incident commander is the single individual with end-to-end authority during a production incident — coordinating responders, deciding on mitigation actions, communicating to stakeholders, and declaring when the incident is over. The IC role is usually rotated, not assigned to a fixed person, and the IC is explicitly not the person doing the hands-on debugging.
The IC structure is borrowed from the Incident Command System used by fire services and emergency response. Its central insight: under stress, coordination is a full-time job, and someone needs to do it without context-switching into debugging. Typical structure: the IC coordinates and communicates; the ops lead executes mitigations; the communications lead handles external updates; the scribe captures the timeline. In small incidents one person may cover every role; large incidents (multi-hour, customer-impacting) need the full separation. Healthy teams train every senior engineer as an IC and rotate the role per incident; teams that always assign the same person create a single point of failure and burn out that person.
Related terms
- Runbook
A runbook is a step-by-step operational document that describes how to diagnose and resolve a specific failure mode — what alert fires, what to check first, which commands to run, when to escalate.
- Blameless postmortem
A blameless postmortem is an incident review structured to identify systemic causes — flawed processes, missing alerts, fragile dependencies — rather than individual fault.
- On-call rotation
An on-call rotation is the scheduled assignment of engineers to be the primary responder for production incidents during a defined window, usually one week per engineer, 24/7. A secondary responder acts as backup and is paged if the primary doesn't acknowledge within the agreed window.
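
The weekly rotation and the escalation to a secondary responder can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular paging tool. The roster names, the rotation start date, the 15-minute acknowledgement window, and the choice of next week's primary as the current secondary are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical roster and settings; every value here is illustrative only.
ENGINEERS = ["alice", "bob", "carol"]
ROTATION_START = datetime(2024, 1, 1, tzinfo=timezone.utc)  # a Monday
SHIFT_LENGTH = timedelta(weeks=1)   # one week per engineer
ACK_WINDOW = timedelta(minutes=15)  # assumed "agreed window"


def on_call(at):
    """Return (primary, secondary) for the weekly shift containing `at`."""
    shift_index = (at - ROTATION_START) // SHIFT_LENGTH
    primary = ENGINEERS[shift_index % len(ENGINEERS)]
    # Assumption: the secondary is whoever is primary the following week.
    secondary = ENGINEERS[(shift_index + 1) % len(ENGINEERS)]
    return primary, secondary


def who_to_page(alert_time, now, acknowledged):
    """Return who should be paged at `now`, or None once the alert is acknowledged."""
    if acknowledged:
        return None
    primary, secondary = on_call(alert_time)
    if now - alert_time < ACK_WINDOW:
        return primary
    # The primary did not acknowledge in time, so the page goes to the secondary.
    return secondary
```

With this roster, an alert at 03:00 UTC on 10 January 2024 falls in the second weekly shift, so `who_to_page` returns `bob` during the first 15 minutes and `carol` afterwards if nobody has acknowledged.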