Retrospectives that change behavior
Formats that work (Mad/Sad/Glad, Sailboat, 4Ls, Lean Coffee), formats that don't, and the action-item discipline that turns retros into actual change.
Most retrospectives are theatre. The team gathers, complains about the same issues for 30 minutes, votes on three action items, and then never tracks whether those action items happened. Six months later, the same complaints surface again.
A retrospective is supposed to change behaviour. If yours don't, you don't have retros — you have a recurring grievance forum.
This article covers the difference: formats that work, formats that don't, and what AI can usefully do (and what it shouldn't).
What a retro is for
Two jobs.
- Surface patterns from the sprint that the team can act on. Not every story-level issue — the issues with momentum across the sprint.
- Decide on 1-3 specific behavioural changes that will be visible in the next sprint. Vague "communicate better" doesn't count. "Standup gets a 5-min hard timer, enforced by the scrum master" counts.
If your retro produces neither, it's wasted time.
Formats that work
Mad / Sad / Glad
The simplest format. Three columns. What made you mad? Sad? Glad?
Works because: low cognitive overhead, emotional access (good signal in software teams that under-discuss emotional load), trivial to facilitate.
Doesn't work when: the team always writes the same things in each column. At that point, switch formats.
Start / Stop / Continue
Three columns again. What should the team start doing? Stop doing? Continue doing?
Works because: each item maps directly to an action. "Stop scheduling planning the day after a release" is an immediately actionable change.
Doesn't work when: the team is gun-shy about saying "stop." Some teams accumulate "continue" items as a polite way to avoid hard conversations.
Sailboat
Drawing on a whiteboard (or its remote equivalent): a sailboat with wind, anchors, rocks, and an island.
- Wind = what's pushing us forward
- Anchors = what's holding us back
- Rocks = risks ahead
- Island = the goal
Works because: visual + metaphorical, surfaces risks and goals (which the other formats don't), good when the team is bored of the column-based formats.
Doesn't work when: the team includes people who hate metaphors. They'll stay quiet rather than engage.
The 4Ls (Liked / Learned / Lacked / Longed for)
Four columns this time. Adds "longed for" — what's missing from the team's environment.
Works because: surfaces structural / environmental issues the team can escalate up. Useful when the issues are leadership-level rather than within-team.
Lean Coffee
No fixed columns. Team members brainstorm topics on stickies, vote on which to discuss, set a per-topic timer (5-7 min), and roll on to the next one when time's up. Topics that didn't make the cut get rolled to next retro.
Works because: democratic agenda, time-boxed, prevents one topic from eating the whole hour.
Doesn't work when: the facilitator doesn't enforce the timer. Topics drag, and the retro covers half the ground it should.
Formats that don't work
No format. "Let's just talk about how the sprint went." Devolves into senior voices dominating and the same three issues coming up.
The Same Format Every Sprint. Even good formats get stale. Mad/Sad/Glad for 18 months produces fatigue. Rotate every quarter.
The Read-the-Burndown-Out-Loud retro. Some teams open with 15 minutes of metrics review. By the time the team is supposed to discuss, energy is dead. Metrics are inputs to the retro; they're not the retro.
The Confession Booth. The team only surfaces individual mistakes. This is never useful. Retrospectives are about systems, not individuals. If someone made a specific error, that conversation is a 1:1, not a retro.
The action-item discipline
This is where most retros fail. Lots of energy in the meeting → 3 action items → none of them happen → next retro produces 3 more.
Three rules:
1. Each action item has an owner. Not "the team." A specific person who's accountable for the change happening.
2. Each action item has a definition of done. "Improve communication" is not a done-able thing. "Add a #blockers channel and require eng leads to post a daily blocker update by 10am" is.
3. Each action item gets reviewed at the start of the next retro. Did it happen? If yes, did it work? If no, why not? This is the discipline that turns retrospectives into behaviour change.
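The three rules can be made concrete as a record the team actually keeps between retros. A minimal sketch — the field names and the `ActionItem` type are illustrative, not from any particular tool:

```python
from dataclasses import dataclass

@dataclass
class ActionItem:
    description: str         # the behavioural change
    owner: str               # rule 1: a specific person, never "the team"
    definition_of_done: str  # rule 2: an observable condition in the next sprint
    done: bool = False       # rule 3: set during the next retro's review
    notes: str = ""          # if done: did it work? if not: why not?

def open_next_retro(items):
    """Rule 3: start the next retro by walking last retro's items."""
    return [(i.description, i.owner, i.done, i.notes) for i in items]

items = [
    ActionItem(
        description="Stop blocker discussion eating standup",
        owner="Priya",
        definition_of_done="Standup has a 5-min hard timer, enforced by the scrum master",
    )
]
```

The point of the structure is that a vague item ("improve communication") simply can't be filled in: it has no owner and no done-able condition.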
The role of AI
AI is genuinely useful in retros — for the data side. Specifically:
Pattern recognition. Across 6 sprints, the team's stories that touched the auth module took 1.7x their estimated time. The team didn't notice. The AI does.
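The overrun pattern is simple arithmetic once the data is in one place: group actual-vs-estimated time by module touched. A sketch with hypothetical story data (field names assumed, any units work as long as estimate and actual match):

```python
from collections import defaultdict

# Hypothetical sprint data: modules each story touched, plus estimate and actual.
stories = [
    {"modules": ["auth"], "estimate": 3, "actual": 5},
    {"modules": ["auth", "billing"], "estimate": 5, "actual": 8},
    {"modules": ["billing"], "estimate": 2, "actual": 2},
    {"modules": ["ui"], "estimate": 3, "actual": 3},
]

def overrun_by_module(stories):
    """Ratio of actual to estimated time, grouped by module touched."""
    est, act = defaultdict(float), defaultdict(float)
    for s in stories:
        for m in s["modules"]:
            est[m] += s["estimate"]
            act[m] += s["actual"]
    return {m: round(act[m] / est[m], 2) for m in est}

ratios = overrun_by_module(stories)
# auth: (5 + 8) / (3 + 5) = 1.62 — the kind of drift a human skims past
```

Run across six sprints instead of four stories, the same grouping surfaces the 1.7x auth pattern the team never noticed.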
Sentiment trends. Standup comments and PR review comments over the sprint can be summarised: the team's tone shifted negative around day 7. What happened then? The model finds the inflection point.
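Finding the inflection point is also mechanical once comments are scored per day. A sketch, assuming a hypothetical per-day sentiment series in the range -1 to 1 (the scoring itself would come from the model):

```python
# Hypothetical daily sentiment averaged from standup and PR comments.
daily_sentiment = [0.4, 0.5, 0.4, 0.3, 0.4, 0.3, -0.1, -0.2, 0.0, -0.3]

def inflection_day(scores):
    """1-indexed sprint day with the steepest negative shift vs the prior day."""
    drops = [scores[i] - scores[i - 1] for i in range(1, len(scores))]
    return drops.index(min(drops)) + 2  # +2: 1-indexed, and the drop lands on the later day

day = inflection_day(daily_sentiment)
```

The output is a question for the retro, not an answer: what happened on that day?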
Action item tracking. Did last retro's action items happen? The AI can match the action ("Add a #blockers channel") against actual events (channel created? eng leads posting?) and surface the answer.
Pattern correlation. Sprints where the team hit their goal had different characteristics than sprints where they didn't. The model can surface those characteristics — usually 2-3 actionable patterns.
What AI should NOT do: lead the retrospective. The conversations of a retro depend on the team's emotional and political dynamics. The AI is a data input, not a facilitator.
The Plan module surfaces patterns across sprints — which stories took longer than estimated, where capacity drift happened, what changed when the team hit (or missed) its goal.
Read next
- Sprint goals worth committing to — the retrospective checks whether you hit the goal; the goal sets up the retrospective.
- Burndown charts and what they actually tell you — the mid-sprint metric that feeds the retro.
- How AI writes acceptance criteria (and where it fails) — the AI-on-the-data theme, applied to a different surface.