Postmortem
A postmortem is a structured retrospective on an incident or failure that captures what happened, why it happened, what was learned, and what will change. Blameless postmortems focus on systemic causes rather than individual mistakes, on the premise that a reasonable person in the same situation, with the same information, would likely have made the same call. They are a central practice of high-reliability engineering cultures.
A good postmortem has a timeline (when each event happened, in UTC), a root-cause analysis (often via the Five Whys technique), an impact assessment (which users were affected, for how long, and at what business cost), action items with owners and due dates, and lessons learned. The output should be readable cold by someone who wasn't involved. Blameless framing matters: the moment a postmortem becomes about who to blame, people start hiding incidents instead of analyzing them, which erodes reliability over time.
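As a rough illustration, the sections listed above can be modeled as a small record with a completeness check. This is a minimal sketch, not a standard schema: the class names, the field names, and the helpers `missing_sections` and `non_utc_events` are all hypothetical, chosen only to mirror the list in the previous paragraph.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta, timezone


@dataclass
class TimelineEvent:
    at: datetime          # expected to be timezone-aware and in UTC
    description: str


@dataclass
class ActionItem:
    description: str
    owner: str            # a named owner, not a team alias
    due: date


@dataclass
class Postmortem:
    # Hypothetical field names mirroring the sections described in the text.
    title: str
    timeline: list[TimelineEvent] = field(default_factory=list)
    root_cause: str = ""
    impact: str = ""
    action_items: list[ActionItem] = field(default_factory=list)
    lessons_learned: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "timeline": self.timeline,
            "root_cause": self.root_cause,
            "impact": self.impact,
            "action_items": self.action_items,
            "lessons_learned": self.lessons_learned,
        }
        return [name for name, value in sections.items() if not value]

    def non_utc_events(self) -> list[TimelineEvent]:
        """Return timeline events whose timestamp is naive or not in UTC."""
        return [e for e in self.timeline if e.at.utcoffset() != timedelta(0)]


# Example data is invented for illustration only.
draft = Postmortem(
    title="Checkout API outage (hypothetical)",
    timeline=[
        TimelineEvent(datetime(2024, 1, 5, 14, 2, tzinfo=timezone.utc),
                      "Error-rate alert fired"),
    ],
    root_cause="Connection pool exhausted after a config change",
    impact="Checkout failed for some users for 40 minutes",
)
print(draft.missing_sections())  # ['action_items', 'lessons_learned']
```

A check like this only confirms that each section is present. Whether the write-up is blameless and readable cold still needs a human reviewer.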
Related terms
- MTTR
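Mean Time To Recovery (MTTR) is the average elapsed time between an incident's detection and its resolution. Some teams measure from the start of the incident rather than from its detection, so state which definition is in use when comparing numbers.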
- Five Whys
Five Whys is a root-cause-analysis technique: ask 'why?' five times in a row (or until the answer becomes systemic rather than situational) to find the underlying cause of a problem.
- Regression test
A regression test verifies that previously working functionality still works after a code change.
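Two of the terms above lend themselves to a short sketch: MTTR as an average of detection-to-resolution durations, and a regression test that pins behavior so a later code change cannot silently break it. The function name `mean_time_to_recovery`, the `(detected, resolved)` pair format, and the sample incidents are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def mean_time_to_recovery(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one resolved incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


def utc(hour, minute):
    """Helper for the invented sample data: a UTC timestamp on one fixed day."""
    return datetime(2024, 1, 5, hour, minute, tzinfo=timezone.utc)


# Two hypothetical incidents: 30 minutes and 90 minutes to recover.
incidents = [
    (utc(14, 0), utc(14, 30)),
    (utc(9, 0), utc(10, 30)),
]
print(mean_time_to_recovery(incidents))  # 1:00:00


def test_mttr_of_known_incidents():
    # Regression test: pins the result for a known input.
    assert mean_time_to_recovery(incidents) == timedelta(hours=1)


def test_mttr_rejects_empty_input():
    # Regression test: an empty list must keep raising ValueError,
    # rather than dividing by zero or returning a misleading 0.
    try:
        mean_time_to_recovery([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


test_mttr_of_known_incidents()
test_mttr_rejects_empty_input()
```

In a real project the two test functions would usually live in a test suite and run on every change. A common use is to add one such test as a postmortem action item, reproducing the failure that caused the incident.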