
DORA Metrics in Practice 2026

Seven years of DORA Accelerate data, synthesised. Plus the pre-registered design for an 800-engineering-leader segmented refresh.

Contents
  1. Why another DORA report
  2. The four metrics, properly defined
  3. Seven years of DORA
  4. Reading the framework: right and wrong
  5. The AI-adoption-bucket question
  6. What Volume 1 will measure
  7. Methodology summary
  8. Limitations and what to expect
  9. FAQ
  10. Related work
  11. Cite this volume
  12. Participate in Volume 1

Key findings

  1. Elite-quartile DORA performance has held at ~15–20% of respondents (±4pp) across all seven published years (DORA Accelerate Reports 2018–2024).
  2. AI adoption among elite-quartile teams was ~1.4× higher than low-quartile teams in DORA 2024 — correlation, not causation. DORA explicitly disclaims a causal claim.
  3. The popular "AI causes elite DORA" interpretation routinely ignores DORA's explicit hedge. Volume 1 tests directly whether the AI-quartile correlation survives controls for company size and engineering practice maturity.
  4. Change failure rate is the noisiest of the four metrics, with the largest year-over-year variance in the published cohort.
  5. DORA renamed "Time to Recovery" to "Time to Restore Service" in 2024 to disambiguate from MTTR-as-Repair; cross-year comparisons should account for this and other periodic definition shifts.

Methodology

800-respondent engineering-leader survey using DORA's published item wording verbatim, segmented by team size × industry × AI-adoption-depth × practice-maturity. Hierarchical regression with practice covariates; Wilson 95% CIs; Benjamini–Hochberg FDR correction.


Why another DORA report when DORA already publishes one annually

The four DORA metrics (deployment frequency, lead time for changes, time to restore service, and change failure rate) are the most-cited delivery benchmark in software engineering. The framework is rigorous; the popular interpretation often isn't. DORA explicitly does NOT claim causation between any single metric and organisational performance; the DORA 2024 report goes to considerable lengths to disclaim it. Yet the popular reading ("AI raises elite-DORA performance," "improving deployment frequency causes elite quartile membership") routinely treats correlation as causation.

We chose to publish Volume 0 before Volume 1 for three reasons.

First, the cross-year synthesis of DORA reports rarely surfaces in industry coverage. DORA publishes annually; the year-over-year trends are visible in the side-by-side, not in any single year's report. The shift from "MTTR" to "Time to Restore Service" in the 2024 report, the addition of "Documentation" and "User-centricity" as supporting capabilities, the explicit AI cohort framing — each of these is most legible when read across years, which is what Volume 0 does.

Second, pre-registering hypotheses before fielding is the difference between a research report and a press release with footnotes. The hypotheses we register today are the hypotheses we will test when our 800-respondent segmented refresh fields. DORA's annual cadence means the most-recent comparative segment — AI cohort by team size — is not interrogated in real time; the Stride 2026 refresh fills that gap.

Third, there is a 2026 question that DORA's annual cadence doesn't capture in real time: does AI adoption cause elite-quartile performance, or does pre-existing engineering practice maturity explain both? DORA 2024 reports correlation; the Stride 2026 Volume 1 tests whether the correlation survives controls for company size and practice maturity.

The four metrics, properly defined

The DORA framework was established by the early DORA research summarised in the Accelerate book (Forsgren, Humble & Kim 2018) and refined across seven annual reports. The current canonical definitions (from DORA 2024) are:

Deployment frequency. How often does your team deploy to production? Elite-quartile threshold: on demand (multiple deploys per day). Low-quartile threshold: less than once every six months.

Lead time for changes. How long does it take a code change to go from "committed" to "running in production"? Elite: less than one day. Low: one to six months.

Time to restore service. After a production incident is detected, how long until service is restored? (DORA renamed this metric from "Mean Time to Recovery / Repair" in 2024 to disambiguate it from MTTR-as-Repair, which measures engineering time on the underlying fix rather than the time to restore service.) Elite: less than one hour. Low: one week to one month.

Change failure rate (CFR). What percentage of deployments cause a degradation in service requiring remediation? Elite: 0–5%. Low: ≥15%.
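The four definitions above can be sketched as a small calculation over deployment and incident records. This is a minimal illustration, not DORA's survey instrument (DORA collects self-reported answers); the `Deployment` and `Incident` record shapes, the `four_metrics` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:  # hypothetical record shape, not a DORA schema
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it was running in production
    caused_failure: bool     # degraded service and needed remediation


@dataclass
class Incident:  # hypothetical record shape
    detected_at: datetime
    restored_at: datetime


def four_metrics(deploys, incidents, window_days):
    """Compute the four metrics over one observation window."""
    if not deploys or not incidents:
        raise ValueError("need at least one deployment and one incident")
    lead_s = median((d.deployed_at - d.committed_at).total_seconds() for d in deploys)
    restore_s = median((i.restored_at - i.detected_at).total_seconds() for i in incidents)
    return {
        "deploys_per_day": len(deploys) / window_days,
        "lead_time": timedelta(seconds=lead_s),
        "time_to_restore": timedelta(seconds=restore_s),
        "change_failure_rate": sum(d.caused_failure for d in deploys) / len(deploys),
    }


# Made-up four-day window: four deployments, one of which caused an incident.
t0 = datetime(2026, 1, 1)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2), False),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4), True),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6), False),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8), False),
]
incidents = [
    Incident(t0 + timedelta(days=1, hours=5), t0 + timedelta(days=1, hours=5, minutes=30)),
]
metrics = four_metrics(deploys, incidents, window_days=4)
print(metrics)
```

The median is used here as one reasonable summary for lead time and restore time; the choice of summary statistic is an assumption of this sketch, not something the definitions above prescribe.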

≈17%
Share of respondents in the elite quartile across the seven DORA reports (2018–2024), with year-over-year variance of about ±4pp. Source: DORA Accelerate State of DevOps Reports 2018–2024.

Any one of these metrics can be improved in isolation, but doing so is not desirable. Deploying more frequently without driving down change failure rate is just shipping bugs faster. Reducing lead time without reducing CFR can hide accumulating technical debt. The framework's empirical observation: elite-quartile teams demonstrate all four metrics at the elite threshold simultaneously, and exhibit characteristic engineering practices (trunk-based development, CI/CD, test automation, on-call rotation maturity).

Seven years of DORA, side by side

DORA's annual cadence makes the framework the longest-running large-N cohort study in software-delivery research. Reading across years surfaces patterns that single-year reports flatten.

[Figure: stacked bar chart, DORA quartile distribution 2018 to 2024. Bars: 2018, 2019, 2021, 2022, 2023, 2024. Series: Elite, High, Medium, Low. Y-axis: share of respondents, 0–100%.]
Figure 1. Seven years of DORA, quartile distribution (2018–2024). Stacked bars show the share of respondents in each quartile per year. Elite quartile is ~15–20% in most years; low-quartile cohort grew in 2023–2024.

Source: annual DORA Accelerate State of DevOps Reports 2018–2024.

Chart description (text)

A stacked bar chart with one bar per plotted DORA Accelerate Report from 2018 to 2024; six years are plotted (2018, 2019, 2021, 2022, 2023, 2024). Each bar shows the percentage of respondents in elite, high, medium, and low quartiles, summing to approximately 100 percent. 2018: elite 7 percent, high 31 percent, medium 48 percent, low 14 percent. 2019: elite 20 percent, high 23 percent, medium 44 percent, low 12 percent. 2021: elite 26 percent, high 47 percent, medium 16 percent, low 11 percent (DORA briefly used a three-tier framing this year). 2022: elite 11 percent, high 53 percent, medium 28 percent, low 8 percent. 2023: elite 18 percent, high 31 percent, medium 33 percent, low 18 percent. 2024: elite 17 percent, high 31 percent, medium 31 percent, low 21 percent. DORA renamed Time to Recover to Time to Restore in 2024 and re-anchored thresholds, so cross-year comparisons are directional rather than precise.

Three patterns are visible across the seven years. First, the elite-quartile share sits at roughly 15–20% in most years (about 17% on average across the plotted years), although 2018 (7%), 2021 (26%), and 2022 (11%) fall outside that band. The framework's reweighting in 2021 (three-tier framing) and 2024 (Time to Restore replacing Time to Recover) shifts these numbers, but the underlying cohort shape holds. Second, the medium quartile has compressed from a 2018 high of ~48% toward ~30% by 2024: the population is bimodalising into "elite + high" and "low," with fewer teams sitting in the middle. Third, the low-quartile share grew from 14% (2018) to 21% (2024). DORA's commentary attributes part of this growth to the broader population now being sampled (DORA's cohort has expanded beyond the early-adopter subset).
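The three patterns can be checked against the shares given in the chart description. The sketch below simply tabulates those published percentages; the variable names are illustrative, and the values are the rounded figures listed above, not DORA's raw data.

```python
from statistics import mean

# Quartile shares in percent, as listed in the chart description:
# year -> (elite, high, medium, low)
shares = {
    2018: (7, 31, 48, 14),
    2019: (20, 23, 44, 12),
    2021: (26, 47, 16, 11),
    2022: (11, 53, 28, 8),
    2023: (18, 31, 33, 18),
    2024: (17, 31, 31, 21),
}

years = sorted(shares)
elite = [shares[y][0] for y in years]

elite_mean = mean(elite)                    # average elite share across listed years
elite_range = (min(elite), max(elite))      # lowest and highest elite share
elite_steps = [abs(b - a) for a, b in zip(elite, elite[1:])]  # change between listed years

medium_change = shares[2024][2] - shares[2018][2]  # percentage points, 2018 to 2024
low_change = shares[2024][3] - shares[2018][3]     # percentage points, 2018 to 2024

print(f"elite mean {elite_mean:.1f}%, range {elite_range}, steps {elite_steps}")
print(f"medium change {medium_change:+d}pp, low change {low_change:+d}pp")
```

On these listed values the elite share averages 16.5%, which matches the ≈17% headline figure, while individual years spread more widely around that average. The medium and low changes come out at -17pp and +7pp.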

What modern reading of the framework gets right and wrong

The framework's empirical core — four metrics, four quartiles, characteristic practice clusters at each tier — has held up well across seven years. Practitioners broadly recognise the quartile descriptors. Engineering leaders use the framework to benchmark internal teams against the published cohort.

Where the popular reading goes off the rails is on causation. The 2024 report adds AI adoption as a cross-tab variable and reports that AI usage is ~1.4× more prevalent in elite-quartile teams than low-quartile teams. The popular interpretation: AI causes elite performance. The published reading, with the explicit DORA caveat: AI adoption correlates with characteristic engineering practices already present in elite-quartile teams.

[Figure: horizontal bar chart, AI adoption rate by DORA quartile (DORA 2024). Elite ≈80%, High ≈70%, Medium ≈60%, Low ≈56%. X-axis: 0–100%. Annotation: correlation only; DORA explicitly disclaims causation.]
Figure 2. AI adoption by DORA quartile (DORA 2024). Bar chart of AI adoption rate by quartile. The 1.4× elite-vs-low ratio is correlation, not causation; AI adoption and quartile membership likely share an upstream variable (practice maturity).

Source: DORA Accelerate Report 2024.

Chart description (text)

Horizontal bar chart with four bars representing AI adoption rate in each DORA quartile from the 2024 Accelerate Report. Elite quartile: approximately 80 percent AI adoption. High quartile: approximately 70 percent. Medium quartile: approximately 60 percent. Low quartile: approximately 56 percent. The elite-to-low ratio of 1.4 times is correlation observed in cross-sectional data; DORA explicitly does not claim that AI adoption causes elite-quartile performance. A pre-existing engineering-practice maturity variable likely contributes to both.
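The 1.4× figure follows directly from the approximate adoption rates in Figure 2. The short sketch below reproduces that arithmetic and also expresses the same gap as Cohen's h, the effect size the methodology names for proportions. The input rates are the rounded values from the chart description, so the outputs are approximate.

```python
from math import asin, sqrt

# Approximate AI adoption rates by quartile, from the chart description.
adoption = {"elite": 0.80, "high": 0.70, "medium": 0.60, "low": 0.56}


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


ratio = adoption["elite"] / adoption["low"]            # elite-to-low ratio
gap_pp = (adoption["elite"] - adoption["low"]) * 100   # gap in percentage points
h = cohens_h(adoption["elite"], adoption["low"])       # effect size for the same gap

print(f"ratio {ratio:.2f}x, gap {gap_pp:.0f}pp, Cohen's h {h:.2f}")
```

The ratio, the percentage-point gap, and h all describe the same cross-sectional difference; none of them says anything about which way causation runs.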

The pattern most commonly misread is "increasing your AI adoption will move you up a quartile." DORA's framework treats AI as a capability alongside practices like trunk-based development and on-call maturity, not as a standalone cause. The Stride 2026 segmented refresh tests whether the AI-quartile correlation survives controls for company size and pre-existing practice maturity.

The four landmark sources, side by side

DORA Accelerate 2024 (2024)

  • Sample: n≈39,000 respondents
  • Method: Annual cross-sectional survey
  • Headline finding: AI adoption correlates with elite-quartile membership at ~1.4× rate. DORA explicitly disclaims causation.
  • Key limitation: Self-reported metrics, not telemetry-measured. Cross-sectional only.

Forsgren/Humble/Kim 2018 (Accelerate book) (2018)

  • Sample: n≈23,000 multi-year synthesis
  • Method: Book + book-supported research
  • Headline finding: Four-metric framework predicts organisational performance. Established the canonical DORA quartile structure.
  • Key limitation: Pre-AI era. Correlation observed; causation not claimed.

Software Engineering at Google 2020 (2020)

  • Sample: Internal Google practice case study
  • Method: Practice case study + qualitative analysis
  • Headline finding: Trunk-based development + CI/CD + code review correlate with elite performance in a single-org large-scale context.
  • Key limitation: Single-org bias; non-generalisable. Practices were already mature at Google; can't test whether adopting them in a less-mature org would replicate the outcome.

Stack Overflow Developer Survey 2024 (DevOps section) (2024)

  • Sample: n≈65,000
  • Method: Annual cross-sectional survey
  • Headline finding: Independent corroboration of DORA's deployment-frequency findings on a much larger non-DORA-recruited sample.
  • Key limitation: Different question wording than DORA; not directly comparable to DORA quartiles. Different respondent selection.

The AI-adoption-bucket question

The single most-quoted finding from DORA 2024 — elite-quartile teams adopt AI at 1.4× the rate of low-quartile teams — admits at least three different causal stories.

Story 1: AI raises performance. Adopting AI tools (Copilot, ChatGPT, Stride) makes engineering teams measurably faster and more reliable, moving them up DORA quartiles.

Story 2: Performance enables AI. Elite-quartile teams already have the engineering practices (CI/CD, code review, test automation) that make AI adoption tractable. Low-quartile teams without those practices can't get AI to stick because the surrounding workflow can't support it.

Story 3: Selection effects. Elite-quartile teams self-select into adopting new tools faster across the board. AI adoption is a marker of "team that tries new things," not a cause of performance.

DORA 2024 explicitly does not claim Story 1. The Stride 2026 segmented refresh tests directly which story (or which combination) the data supports.
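A toy simulation shows why the cross-sectional correlation alone cannot separate these stories. In the sketch below, a single latent "practice maturity" variable drives both AI adoption and delivery performance, and AI has no direct effect at all (a deliberately extreme version of Stories 2 and 3). The effect sizes, sample size, and variable names are assumptions chosen for illustration, not estimates from any DORA or Stride data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumed data-generating process: maturity causes both outcomes,
# and AI adoption has NO direct effect on performance.
maturity = [rng.gauss(0, 1) for _ in range(n)]
ai_adoption = [0.7 * m + rng.gauss(0, 1) for m in maturity]
performance = [0.7 * m + rng.gauss(0, 1) for m in maturity]

raw = pearson(ai_adoption, performance)
controlled = partial_corr(ai_adoption, performance, maturity)
print(f"raw correlation {raw:.2f}, after controlling for maturity {controlled:.2f}")
```

Under this assumed process the raw correlation is clearly positive while the controlled correlation is close to zero. A survey that measures practice maturity can therefore distinguish "AI raises performance" from "maturity explains both" only to the extent that the maturity measure is good.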

What Volume 1 will measure

[Figure: DORA literature timeline, 2014 to 2026. Markers: DORA research (2014); Accelerate book (2018); DORA '19; SWE @ Google (2020); DORA '21; DORA '22; DORA '23; DORA '24 (AI, Oct 2024); Stride V0 (May 2026, this page); Stride V1 (Q4 2026).]
Figure 3. The DORA literature timeline 2014–2026. Where Volume 0 sits in the DORA literature and where Volume 1 lands. DORA 2024 (the AI-era refresh) is highlighted as the most-cited modern source.

Markers compiled from each source's published release date.

Chart description (text)

Horizontal timeline with ten markers from 2014 to 2026. DORA research begins in 2014 with the early Forsgren research. The Accelerate book publishes in 2018. DORA Accelerate Reports are marked for 2019 and for 2021 through 2024. Software Engineering at Google publishes in 2020. The DORA 2024 report (highlighted) is the AI-era refresh and the most-cited recent source. Stride Volume 0 publishes May 2026 (highlighted as the current page). Stride Volume 1 publishes Q4 2026 as a forthcoming marker shown in dashed muted treatment.

The Stride 2026 DORA segmented refresh is an 800-respondent engineering-leader survey designed to test the three causal stories above. The survey uses DORA's published item wording verbatim — same quartile-classification questions — so cohort comparisons against DORA 2024 are direct.

What the design adds beyond DORA 2024:

  • Tighter segmentation: industry × team size × AI-adoption-depth × regulated-vs-unregulated. DORA reports primary tables at the aggregate level; the segmented refresh runs the cross-tabs DORA's annual cadence doesn't publish.
  • Practice-maturity controls: hierarchical regression with engineering-practice covariates (trunk-based development, CI/CD maturity, on-call rotation maturity, test automation maturity) to test whether the AI-quartile correlation survives controls.
  • Causal-story tests: explicit operationalisation of the three stories above as testable hypotheses (H1–H3 below).

Pre-registered hypotheses

These are the five hypotheses we register on the Open Science Framework before fielding closes. They are the hypotheses we test in Q4 2026.

  1. H1 — Quartile correlation weakens with controls. Elite-quartile DORA performance will correlate with high AI adoption (replicating DORA 2024 directionally), but correlation strength will weaken by ≥30% after controlling for company size and pre-existing engineering-practice maturity. Test: hierarchical linear regression; Cohen's effect sizes; Wilson 95% CIs.
  2. H2 — Heavy AI raises CFR in non-elite quartiles. Change failure rate will be statistically higher in heavy-AI cohorts than selective-AI cohorts within the lower three quartiles. The elite quartile will not show this gap. Test: ANOVA across cohorts × quartile.
  3. H3 — Deployment frequency improvements concentrate in small teams. Year-over-year deployment-frequency gains will be larger for teams ≤50 (≥30% improvement) than teams 5,000+ (≤10% improvement). Test: difference-in-differences regression.
  4. H4 (null) — Time to Restore is practice-driven, not AI-driven. Time to Restore service will not differ between AI-adopted and AI-opted-out cohorts after controlling for team practice maturity (CI/CD, on-call rotation, runbook quality). The null is the finding. Test: ANCOVA with practice covariates.
  5. H5 (exploratory) — Two-factor explanation. A two-factor model (engineering practices × AI-adoption depth) will explain ≥80% of DORA quartile membership variance, with practices alone explaining ≥60%. Exploratory framing.
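Because the hypotheses are tested as a family, the methodology applies a Benjamini–Hochberg FDR correction. The sketch below shows the standard step-up procedure on five made-up p-values; the p-values are hypothetical placeholders (no data has been fielded), and treating all five hypotheses as one family at q = 0.05 is an assumption of this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value

    # Find the largest rank k whose p-value is at or below (k / m) * q.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
hypothetical_p = {"H1": 0.001, "H2": 0.039, "H3": 0.008, "H4": 0.20, "H5": 0.041}
decisions = dict(zip(hypothetical_p, benjamini_hochberg(list(hypothetical_p.values()))))
print(decisions)
```

In this made-up example H2 and H5 would pass an uncorrected 0.05 threshold but are not rejected after correction, which is the behaviour the correction is there to provide.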

Methodology summary

The full methodology is on the companion page. The short version:

  • Survey instrument: ~52 substantive items + DORA's published quartile questions verbatim. Median completion 13 minutes.
  • Recruitment: 800 engineering leaders via Prolific Academic + organic top-up. Balanced across team size (1-9, 10-49, 50-499, 500-4999, 5000+), industry, regulated/unregulated, AI-adoption depth.
  • Statistics: Wilson 95% CIs; Cohen's h for proportions; Cohen's d for continuous measures. Benjamini–Hochberg FDR correction for the planned hypothesis family.
  • Practice-maturity scoring: 12-item composite covering trunk-based development, CI/CD, code review, test automation, on-call rotation, post-incident review, runbook quality.
  • Pre-registration: will be cross-registered on OSF before fielding closes; the link will appear here once registration completes.
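For readers unfamiliar with the Wilson interval named in the statistics bullet, a minimal sketch follows. The formula is the standard Wilson score interval; the example count (136 of 800, i.e. 17%) is a hypothetical input chosen to echo the headline elite share, not a Volume 1 result.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 136 of 800 respondents (17%) in some category.
low, high = wilson_interval(136, 800)
print(f"17% of 800 -> 95% CI [{low:.3f}, {high:.3f}]")

# The same proportion in a smaller segment gives a wider interval.
seg_low, seg_high = wilson_interval(27, 160)
print(f"27 of 160 -> 95% CI [{seg_low:.3f}, {seg_high:.3f}]")
```

The second call illustrates why segment-level percentages carry wider intervals than full-sample ones: the interval widens as the cell size shrinks.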

Dataset publication (Volume 1)

When Volume 1 lands, the survey dataset publishes under CC-BY-4.0: anonymised individual-response CSV with weighted-indicator column, pre-computed cross-tab tables (industry × quartile, team-size × quartile, AI-depth × quartile, regulated × quartile, practice-maturity × quartile), JSON-Schema data dictionary. Single ZIP bundle + Zenodo DOI for permanent citability.

Limitations and what to expect

  • Self-reported quartile classification. Like DORA itself, the Stride refresh asks respondents to self-classify against the four metrics. Self-classification is known to skew slightly toward higher quartiles; a sensitivity check against telemetry-derived classification will be reported.
  • English-language, predominantly Western sample. Like the State-of-AI Volume 0, the panel arm skews US/UK/EU.
  • Cross-sectional only. The Stride refresh measures a single time point. Causal claims require longitudinal data; Volume 1 will not claim causation any more than DORA does, even where the segmentation suggests it directionally.
  • The literature is moving. A new DORA report typically publishes annually around October; if DORA 2025 lands between Volume 0 and Volume 1, we will append a "Recent developments" note rather than rewriting the body.

Participate in Volume 1

If you are an engineering leader and would like to participate in the Volume 1 segmented refresh (Prolific arm or organic arm), reach out to research@newlightai.com.

References

  1. Forsgren, N., Humble, J. & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution.
  2. DORA Accelerate State of DevOps Report 2024. Google DORA.
  3. DORA Accelerate State of DevOps Reports 2018–2023. Google Cloud + DORA.
  4. Winters, T., Manshreck, T. & Wright, H. (2020). Software Engineering at Google. O'Reilly.
  5. Humble, J. & Farley, D. (2010). Continuous Delivery. Addison-Wesley.
  6. Kim, G., Behr, K. & Spafford, G. (2013). The Phoenix Project. IT Revolution.
  7. Stack Overflow Developer Survey 2024. Stack Overflow.

Frequently asked questions

  • Why publish a DORA report when DORA already publishes one annually?

    Three reasons: (1) the cross-year synthesis of seven DORA reports rarely surfaces in industry coverage and is more legible side-by-side than in any single year; (2) pre-registering hypotheses before data collection is the difference between a research report and a press release with footnotes; (3) DORA's annual cadence doesn't publish the segmented cross-tabs (industry × team size × AI-adoption-depth × practice-maturity) the Stride refresh does.

  • Does AI cause elite-quartile DORA performance?

    DORA 2024 reports correlation only; DORA explicitly does NOT claim causation in any published report. The popular reading routinely ignores that hedge. The Stride 2026 Volume 1 segmented refresh tests directly whether the AI-quartile correlation survives controls for company size and engineering practice maturity. Volume 0 is honest that the literature does not yet support a causal claim.

  • Are these the same four metrics as the 2014 framework?

    Substantively yes, with periodic refinement. Notably, "Time to Recover" became "Time to Restore Service" in 2024 to distinguish MTTR-as-Repair (engineering time on the underlying fix) from MTTR-as-Restore (wall-clock time to service restoration). The Stride Volume 1 instrument uses DORA's current published wording verbatim.

  • Why include the Accelerate book — it's pre-AI?

    Because the four-metric framework was established in the book + the supporting early DORA research. The book's empirical claim (the four metrics correlate with organisational performance) predates the AI era; AI is a 2024 addition, not the foundation. Citing the book frames AI as one capability among many, not as the framework's organising variable.

  • Is DORA still relevant for high-velocity AI-augmented teams?

    Yes — but with the caveat that the framework measures team-level outcomes, not AI-attributable contributions to those outcomes. A team that gets faster with AI on the input side and slower with AI's defects on the output side may net the same DORA score with a different underlying mechanism. The Volume 1 refresh tests this directly by segmenting on AI-adoption depth.

  • Will the dataset be released?

    Yes, under CC-BY-4.0 when Volume 1 publishes. The release includes the anonymised individual-response CSV, per-respondent quartile classifications, per-respondent practice-maturity composite scores, pre-computed cross-tab tables, and a JSON-Schema data dictionary.

  • When does Volume 1 publish?

    Target Q4 2026 at the same canonical URL. The synthesis sections of Volume 0 stay as the "Prior public evidence" framing; Volume 1 primary findings replace the "What Volume 1 will measure" section and add a dataset link. No URL change.

  • Why 800 respondents and not 8,000?

    800 is large enough for the planned hypothesis family (5 hypotheses × 4-way segmentation) to achieve adequate statistical power, and small enough to permit higher-quality recruitment (Prolific Academic panel + organic top-up) without the data-quality compromises that come with very large convenience samples. DORA's 39,000-respondent reach is a different design optimised for breadth; the Stride refresh is optimised for tighter segmentation.

  • Can I participate?

    Yes — if you are an engineering manager, director, VP, or staff+ engineer with ≥3 years of leadership tenure. Compensated panel arm is recruited through Prolific Academic when fielding opens; email research@newlightai.com to be notified. Organic top-up arm is open to industry partners and community referrals.

  • How should I cite Volume 0?

    See §"Cite this volume" further down the page for APA, Chicago, BibTeX, and Markdown formats. Short version: cite the report by title, year 2026, author "Stride Research," URL at the canonical /research/dora-metrics-in-practice-2026.

Related work

DevTools landscape

HCI and behavioural research

  • Accelerate: The Science of Lean Software and DevOps

    Forsgren, N., Humble, J. & Kim, G. · 2018

    Book establishing the four-metric framework. The empirical core that all subsequent DORA reports build on.

  • Continuous Delivery

    Humble, J. & Farley, D. · 2010

    Pre-DORA foundational work on the engineering practices that DORA later observed correlating with elite-quartile performance.

  • The Phoenix Project

    Kim, G., Behr, K. & Spafford, G. · 2013

    Narrative companion to the technical DORA literature; useful for sense-making about why the practices DORA correlates with matter.

  • Software Engineering at Google

    Winters, T., Manshreck, T. & Wright, H. · 2020

    Single-org case study at the largest scale; comparator for DORA's cross-org cohort findings on which practices matter.

Methodology references

  • DORA Methodology Page

    Google DORA · 2024

    Official DORA methodology documentation. The item wording the Volume 1 instrument uses verbatim.

  • Wilson 1927 Confidence Intervals

    Wilson, E. B. · 1927

    The CI methodology Volume 1 uses for every quoted percentage. Foundational paper.

  • OSF Pre-Registration Template

    Open Science Framework · 2024

    The pre-registration template Volume 1 will cross-register before fielding closes.

From the Stride blog

  • ROI of AI in software delivery

    Stride · 2025

    Pre-figures the H1 hypothesis: AI-quartile correlation may weaken after controlling for practice maturity.

  • Connected delivery graph

    Stride · 2025

    Argues that the DORA metrics need an integrated delivery graph to interpret — context for the Volume 1 practice-maturity composite.

Reference this Volume 0 in your own writing using the citation below.

How to cite this report

Stride Research. (2026). DORA Metrics in Practice 2026 — Volume 0: Seven-year synthesis and pre-registered segmented refresh. Newlight Solutions. https://www.stride.page/research/dora-metrics-in-practice-2026