
Game day

A game day is a scheduled exercise in which an engineering team deliberately triggers a failure scenario in a live or production-like environment (taking down a database, failing a region, exhausting a quota) to verify that monitoring fires, runbooks work, and the on-call rotation can respond within the agreed SLO.

Game days originated at Amazon as a way to keep recovery procedures from atrophying between real incidents. The typical structure is a 2-4 hour session with a pre-written scenario, an injection point, observers who time each detection and response step, and a post-exercise debrief that produces concrete runbook edits. Game days complement chaos engineering: chaos experiments typically run automated and unattended, often at random intervals in production, whereas game days are scheduled, observed, and focused on validating the combined human and tooling response. Both surface the same kinds of latent failure (stale runbooks, missing alerts, unclear escalation), but game days give a higher-resolution picture because a human team is actively diagnosing the fault while observers watch.
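
The observer's timing sheet described above can be sketched in a few lines of Python. This is only an illustration, not a standard tool: the class names, step names, and target times are all hypothetical, and a real team would substitute the steps and targets from its own runbook and SLO.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One detection or response step timed by an observer."""
    name: str          # hypothetical step label, e.g. "alert fired"
    elapsed_s: float   # seconds since fault injection, as recorded
    target_s: float    # agreed target for this step (assumed value)

    @property
    def within_target(self) -> bool:
        return self.elapsed_s <= self.target_s


@dataclass
class GameDay:
    """Timing record for a single pre-written scenario."""
    scenario: str
    steps: list[Step] = field(default_factory=list)

    def record(self, name: str, elapsed_s: float, target_s: float) -> None:
        self.steps.append(Step(name, elapsed_s, target_s))

    def misses(self) -> list[Step]:
        # Steps that exceeded their target: candidates for the debrief.
        return [s for s in self.steps if not s.within_target]


# Example run with made-up numbers.
game_day = GameDay("primary database unavailable")
game_day.record("alert fired", elapsed_s=90, target_s=120)
game_day.record("on-call acknowledged", elapsed_s=400, target_s=300)
game_day.record("service restored", elapsed_s=1500, target_s=1800)

for step in game_day.misses():
    overshoot = step.elapsed_s - step.target_s
    print(f"{step.name}: {overshoot:.0f}s over target")
```

Steps returned by `misses()` are the natural starting point for the debrief, since each one points at a runbook step, alert, or escalation path that took longer than agreed.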
