Exploratory testing alongside automation
Charters, time-boxes, observed defect rates. The structured discipline that finds the bugs automation never catches — UX issues, unexpected combinations, real-world data quirks.
Automated tests verify what you know to test for. Exploratory testing finds what you didn't. Despite the dominance of automation in modern QA practice, exploratory testing remains the most efficient way to discover the bugs that no automated test would have caught — UX problems, unexpected combinations, real-world data quirks, performance regressions in edge conditions.
The answer isn't less automation. It's running explicit exploratory sessions alongside it, with charters, time-boxes, and observed defect rates that show whether the practice is earning its keep.
What exploratory testing actually is
Exploratory testing is structured, time-boxed investigation of an area of the application with the explicit goal of finding defects automation wouldn't catch. It's not "click around and see if anything breaks" — that's ad-hoc testing, which produces low signal and no learning.
The structured version has four components:
- A charter: a 1-2 sentence statement of what's being explored and why. "Investigate the new bulk-import flow with adversarial inputs to find data-validation gaps."
- A time-box: typically 60-120 minutes. Long enough to dig in; short enough to maintain focus.
- An exploration approach: usually a heuristic (boundary values, illegal inputs, sequence violations, race conditions, accessibility-only navigation).
- A session report: what was tested, what was found, what was learned, what should be investigated further.
The output of a session is typically: 2-5 defects filed, 1-2 follow-up areas identified, and 0-2 improvements to the automated test suite (where the session revealed gaps that automation should have caught).
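As a minimal sketch, the four components can be modelled as a small record that flags a session missing its structure. The class names, field names, and the `missing_components` helper are illustrative assumptions, not part of any tool mentioned in this article; the 60-120 minute range comes from the time-box guidance above.

```python
from dataclasses import dataclass, field


@dataclass
class SessionReport:
    """Written outcome of one session (field names are hypothetical)."""
    defects: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)
    automation_gaps: list[str] = field(default_factory=list)


@dataclass
class ExploratorySession:
    charter: str            # 1-2 sentences: what is explored and why
    time_box_minutes: int   # typically 60-120
    heuristic: str          # e.g. "boundary values", "illegal inputs"
    report: SessionReport = field(default_factory=SessionReport)

    def missing_components(self) -> list[str]:
        """Return the structural components that are absent or out of range."""
        problems = []
        if not self.charter.strip():
            problems.append("charter")
        if not 60 <= self.time_box_minutes <= 120:
            problems.append("time-box")
        if not self.heuristic.strip():
            problems.append("heuristic")
        return problems


session = ExploratorySession(
    charter="Investigate the new bulk-import flow with adversarial inputs "
            "to find data-validation gaps.",
    time_box_minutes=90,
    heuristic="illegal inputs",
)
print(session.missing_components())  # an empty list means the structure is in place
```

A check like this only confirms that the structure exists; whether the charter is a good one is still a judgement call.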
When exploratory pays off
Exploratory has the highest ROI in these situations:
- New feature areas where automation hasn't caught up. The first 2-3 sessions on a new feature typically find 5-10 defects that automation alone would have missed.
- Pre-release verification of high-stakes flows. Payment, authentication, data migration. Automation catches most regressions; exploratory catches the UX issues automation doesn't notice ("the form works but it's confusing").
- Investigating customer-reported issues. A customer says "the import doesn't work for me" and you can't reproduce. Exploratory testing with the customer's data shape is faster than guessing.
- Verifying AI-generated code. AI-assisted code generation has expanded the surface area of changes per sprint; exploratory testing helps catch the failure modes it introduces (plausible-but-wrong patterns, hallucinated edge-case handling).
Exploratory has lower ROI:
- In stable areas with extensive automation. The marginal defect-find rate is low; the team's time is better spent elsewhere.
- As a substitute for automation. Exploratory finds a defect once; automation re-checks for it on every release. Use exploratory to identify gaps, then automate the gaps it finds.
The charter-driven format
A good charter is specific enough to focus the session but open enough to allow discovery. Examples:
- Too vague: "Test the dashboard."
- Too specific: "Verify that the dashboard page renders the user's recent activity in chronological order."
- Right: "Investigate the dashboard's behaviour with edge-case user states (no activity, very high activity, partial-data states)."
The right charter gives the tester a starting heuristic ("edge-case user states") but doesn't pre-specify the test cases. The session's value comes from the tester's domain expertise applied within the charter's scope.
Heuristics that find defects
A short menu of exploration heuristics, each suited to a specific failure-mode category:
Boundary heuristics
- Test values at 0, 1, -1, MAX, MIN. Test empty lists. Test lists of exactly 1.
- Test dates at year boundaries (Dec 31, Jan 1), DST transitions, leap years.
- Test text inputs at empty, 1 char, max length, max+1.
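The boundary lists above are mechanical enough to generate before a session starts. The sketch below is one possible way to do it; the function names are hypothetical, and DST transitions are left out because they depend on timezone data for the regions under test.

```python
import calendar
from datetime import date


def int_boundaries(minimum: int, maximum: int) -> list[int]:
    """Candidate integers: 0, 1, -1 and the two ends of the allowed range."""
    return sorted({0, 1, -1, minimum, maximum})


def text_boundaries(max_length: int, fill: str = "a") -> list[str]:
    """Strings that are empty, 1 char, max length, and max+1."""
    return [fill * n for n in (0, 1, max_length, max_length + 1)]


def date_boundaries(year: int) -> list[date]:
    """Year edges plus the last day of February (Feb 29 in leap years)."""
    feb_last = 29 if calendar.isleap(year) else 28
    return [date(year, 1, 1), date(year, 2, feb_last), date(year, 12, 31)]


print(int_boundaries(1, 100))
print([len(s) for s in text_boundaries(255)])
print(date_boundaries(2024))
```

The generated values are a starting checklist for the tester, not a replacement for judgement about which fields are worth probing.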
Adversarial heuristics
- Submit forms with SQL injection patterns, XSS patterns, Unicode shenanigans (right-to-left text, zero-width spaces, surrogate pairs).
- Try to break the auth flow: refresh mid-flow, navigate back, open it in multiple tabs simultaneously.
- Submit forms with deliberately malformed data (text in numeric fields, future dates in past-date fields, negative quantities).
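A small, reusable set of adversarial inputs saves retyping them in every session. The values below are illustrative samples only, and the form field names are hypothetical. The `utf16_units` helper shows why surrogate pairs are worth trying: a character outside the Basic Multilingual Plane is one code point but two UTF-16 code units, so length limits that count code units (as JavaScript's `String.length` does) can disagree with limits that count code points.

```python
ADVERSARIAL_TEXT = {
    "sql_injection": "' OR '1'='1' --",
    "xss": "<script>alert(1)</script>",
    "right_to_left": "\u05E9\u05DC\u05D5\u05DD",  # Hebrew text, rendered right to left
    "zero_width_space": "ad\u200Bmin",            # looks like "admin" but is a different string
    "surrogate_pair": "\U0001F600",               # outside the BMP: two UTF-16 code units
}

# Hypothetical form fields, each filled with a deliberately malformed value.
MALFORMED_FORM = {
    "quantity": "-3",            # negative quantity
    "age": "twelve",             # text in a numeric field
    "birth_date": "2999-01-01",  # future date in a past-date field
}


def utf16_units(text: str) -> int:
    """Count UTF-16 code units rather than Unicode code points."""
    return len(text.encode("utf-16-le")) // 2


emoji = ADVERSARIAL_TEXT["surrogate_pair"]
print(len(emoji), utf16_units(emoji))  # code points vs UTF-16 code units
```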
Sequence heuristics
- Perform actions in unusual order (logout-then-undo, edit-then-delete-then-edit-again).
- Cancel mid-flow (start a checkout, abandon it, start again with different data).
- Trigger race conditions (rapidly click submit, open the same record in two tabs and edit simultaneously).
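For the "unusual order" heuristic, it can help to list every ordering of a short action sequence and work through the ones that are not the happy path. This is a sketch with a hypothetical `unusual_orders` helper and made-up action names; it produces a checklist to try by hand and does not drive the application.

```python
from itertools import permutations


def unusual_orders(actions: list[str], expected: list[str]) -> list[tuple[str, ...]]:
    """All orderings of the actions except the expected happy-path order."""
    return [order for order in permutations(actions) if list(order) != expected]


happy_path = ["edit", "save", "logout"]
for order in unusual_orders(happy_path, happy_path):
    print(" -> ".join(order))
```

The number of orderings grows factorially, so this is only practical for three or four actions; for longer flows, pick the orderings the charter makes interesting.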
Accessibility heuristics
- Navigate the entire flow using only the keyboard.
- Navigate using only a screen reader.
- Resize the browser to 320px wide and verify nothing breaks.
- Disable JavaScript and verify the app degrades gracefully (where progressive enhancement applies).
Real-world data heuristics
- Test with the largest real customer's data shape (not synthetic data).
- Test in the timezone where customers actually are (not just UTC).
- Test in the languages your customers use (not just English).
Session reports
The session report doesn't need to be long. A useful template:
Charter: [the one-sentence charter]
Tester: [name]
Date: [date]
Duration: [actual time spent]
Defects found:
- DEF-119: [one-line summary, link to full defect]
- DEF-120: [one-line summary, link to full defect]
Areas covered:
- [bulleted list of what was tested]
Areas NOT covered (recommend follow-up):
- [bulleted list of what was deferred]
Suggested automation gaps:
- [if the session found something automation should have caught]
This format takes 5-10 minutes to fill out post-session. The accumulated reports become the team's collective knowledge of how the system actually behaves vs how the test suite says it behaves.
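If reports are kept as plain text, a few lines of code can fill in the template consistently. This is a sketch only: the `render_report` and `bullets` names are hypothetical, the sample values are made up, and printing "- none" for an empty section is an assumption rather than part of the template.

```python
def bullets(items: list[str]) -> list[str]:
    """Format items as "- item" lines, or a single "- none" line when empty."""
    return [f"- {item}" for item in items] or ["- none"]


def render_report(charter: str, tester: str, session_date: str, duration: str,
                  defects: list[str], covered: list[str],
                  not_covered: list[str], automation_gaps: list[str]) -> str:
    """Render one session report in the template's section order."""
    lines = [
        f"Charter: {charter}",
        f"Tester: {tester}",
        f"Date: {session_date}",
        f"Duration: {duration}",
        "Defects found:", *bullets(defects),
        "Areas covered:", *bullets(covered),
        "Areas NOT covered (recommend follow-up):", *bullets(not_covered),
        "Suggested automation gaps:", *bullets(automation_gaps),
    ]
    return "\n".join(lines)


report = render_report(
    charter="Investigate the bulk-import flow with adversarial inputs.",
    tester="A. Tester",
    session_date="2025-01-15",
    duration="85 min",
    defects=["DEF-119: import accepts negative quantities"],
    covered=["CSV upload", "field validation"],
    not_covered=["files over 10 MB"],
    automation_gaps=[],
)
print(report)
```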
How often to run sessions
The pragmatic cadence:
- Per release: 1-2 sessions per major feature shipped in the release.
- Per sprint: 1-2 sessions of ~90 minutes each per QA-aligned team member.
- Triggered: a session per customer-reported issue that automation didn't catch.
For a typical 6-engineer team with 1 dedicated QA engineer or QA-aware engineer, this works out to 2-4 sessions per sprint — roughly 10% of one person's time. The defect-find rate justifies it: most teams running this discipline find 8-15 defects per month via exploratory that would have escaped to production otherwise.
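The time share can be sanity-checked with simple arithmetic. The sketch below assumes a two-week sprint of 80 working hours and 10 minutes of report writing per session; both figures are assumptions for illustration, so substitute your own sprint length.

```python
def sprint_share(sessions: int, session_minutes: int = 90, report_minutes: int = 10,
                 sprint_hours: float = 80.0) -> float:
    """Fraction of one person's sprint spent on sessions plus their reports."""
    total_minutes = sessions * (session_minutes + report_minutes)
    return total_minutes / (sprint_hours * 60)


for sessions in (2, 4):
    print(f"{sessions} sessions: {sprint_share(sessions):.1%}")
# prints:
# 2 sessions: 4.2%
# 4 sessions: 8.3%
```

Under these assumptions, session and report time alone comes out somewhat under 10%. Charter preparation and defect filing are not modelled here and would add to it.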
Tooling
Exploratory testing tools focus on session capture rather than test execution. The leaders:
- Session Tester: lightweight, free session-capture tool built around session-based test management (the approach developed by James and Jonathan Bach).
- Rapid Reporter: structured note-taking for exploratory sessions.
- Test & Feedback by Microsoft: browser extension integrated with Azure DevOps; captures screenshots, screen recordings, browser actions during the session.
- Built into modern test management tools: Xray, qTest, TestRail all support session-style test execution that captures defects and notes inline.
The tool matters less than the discipline. A team using a Markdown template in a shared doc is as effective as one using a dedicated tool, as long as the sessions actually happen.
Common pitfalls
- Ad-hoc instead of structured: clicking around without a charter or time-box produces little useful output. Always start with a charter.
- Treating session output as bug reports only: the suggested-automation-gaps section is often the most valuable output. Skipping it loses the learning.
- Skipping the report: a session without a written report becomes invisible; the team can't learn from it, and the tester can't reference it later.
- Defaulting to senior testers only: junior testers often find different bugs than senior testers because they are less affected by the curse of knowledge. Rotate exploratory testing across the team.
What this looks like at the team level
A team running structured exploratory testing well shows three indicators:
- Steady stream of exploratory-sourced defects per sprint. If sessions consistently find zero defects, either the charters are wrong (testing the wrong areas) or the application is genuinely mature (rare).
- Automation suite grows from exploratory findings. Most session reports should produce 0-2 candidates for new automated tests. If they never do, the team isn't closing the loop.
- Cross-team exploratory rotation. Developers occasionally run exploratory sessions, not just QA specialists. This spreads the practice and produces fresh perspectives.
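The three indicators can be tracked from the session reports themselves. The sketch below assumes each report has been reduced to a small summary record; the `SessionSummary` fields, the `indicators` function, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    sprint: int
    defects_found: int
    automation_candidates: int
    tester_role: str  # e.g. "qa" or "developer"


def indicators(sessions: list[SessionSummary]) -> dict[str, float]:
    """Average defects and automation candidates per session, plus non-QA share."""
    if not sessions:
        raise ValueError("no sessions recorded")
    n = len(sessions)
    return {
        "defects_per_session": sum(s.defects_found for s in sessions) / n,
        "automation_candidates_per_session": sum(s.automation_candidates for s in sessions) / n,
        "non_qa_session_share": sum(s.tester_role != "qa" for s in sessions) / n,
    }


history = [
    SessionSummary(sprint=1, defects_found=3, automation_candidates=1, tester_role="qa"),
    SessionSummary(sprint=1, defects_found=0, automation_candidates=0, tester_role="developer"),
    SessionSummary(sprint=2, defects_found=4, automation_candidates=2, tester_role="qa"),
    SessionSummary(sprint=2, defects_found=1, automation_candidates=0, tester_role="qa"),
]
print(indicators(history))
```

A defects-per-session figure that sits at zero for several sprints, or an automation-candidate figure that never moves, is the prompt to revisit the charters or the follow-through.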
Related reading
For the automated test discipline that exploratory complements, see Test-case design. For the regression strategy that determines what runs when, see Regression strategy. For what to do with the defects exploratory surfaces, see Defect triage.
Frequently asked questions
- Isn't exploratory testing just clicking around?
- No — that's ad-hoc testing, which produces low signal. Structured exploratory testing has four components: (1) a 1-2 sentence charter stating what's being explored and why, (2) a time-box (typically 60-120 min), (3) an exploration approach (a specific heuristic like boundary values or adversarial inputs), and (4) a written session report. The discipline is what produces useful findings.
- How often should we run exploratory sessions?
- For a 6-engineer team with 1 dedicated QA or QA-aware engineer: 2-4 sessions of ~90 minutes per sprint — roughly 10% of one person's time. Most teams running this cadence find 8-15 defects per month that automation alone would have missed.
- When does exploratory testing have the highest ROI?
- On new feature areas where automation hasn't caught up (the first 2-3 sessions on a new feature typically find 5-10 defects), pre-release verification of high-stakes flows (payment, auth, migration), investigating customer-reported issues that can't be reproduced, and verifying AI-generated code (which has different failure modes than human-written code).
- Should we use a dedicated exploratory testing tool?
- A team using a Markdown template in a shared doc is as effective as one using Session Tester, Rapid Reporter, or built-in tool features (Xray, qTest, TestRail all support session-style execution). The tool matters less than the discipline of actually running structured sessions and writing reports.