Four research-backed metrics for measuring software delivery performance, derived from the DORA programme and the book Accelerate, covering deployment frequency, lead time, failure rate, and recovery speed.
How fast, how often, how reliably, and how safely does our team ship software?
How often you deploy to production
Time from commit to production
% of deployments causing failures
How long to recover from a failure
change_failure_rate * mean_time_to_recovery
DORA metrics are four quantitative measures of software delivery and operational performance: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service. Together they let an engineering team place itself on a performance spectrum from low to elite, track improvement over time, and make the cost of slow or fragile delivery concrete to a non-technical audience.
The metrics come from the DevOps Research and Assessment programme, founded by Nicole Forsgren, Jez Humble, and Gene Kim. The research began in 2013 as an annual State of DevOps survey, co-produced with Puppet, drawing on thousands of responses from software practitioners worldwide. The goal was to move the conversation about DevOps maturity away from tool adoption ("are you using containers?") and onto observable outcomes ("how fast can you ship, and how often do you break things?"). The programme produced a large-scale dataset linking delivery practices to both software performance and organisational outcomes.
Forsgren, Humble, and Kim published the findings in Accelerate: The Science of Lean Software and DevOps in 2018 (IT Revolution Press). The book is notable for grounding its claims in statistical analysis of several years of survey data. It showed that high performers on the four metrics also outperformed on commercial outcomes: higher profitability, higher market share, and better employee satisfaction scores.
Google acquired DORA in 2018. The programme now publishes annual State of DevOps reports and maintains a reference site at dora.dev. In 2021 the DORA team added a fifth metric, Reliability (capturing availability, latency, and error-rate targets), reflecting that delivery speed means little if the system is unhealthy. The four original keys remain the primary reference in most engineering teams' measurement work.
Each metric measures a different part of the delivery and recovery loop.
Deployment Frequency measures how often an organisation deploys code to production. Elite performers deploy on demand, multiple times per day. Low performers deploy monthly or less. The number is a proxy for batch size: teams deploying frequently are working in small, low-risk increments. High deployment frequency is both a goal and a prerequisite for the other metrics to be meaningful.
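As a minimal sketch of how Deployment Frequency might be computed, the snippet below counts production deployments inside a measurement window and normalises the count to a per-week rate. The function name `deployments_per_week`, the per-week unit, and the sample timestamps are assumptions made for illustration; they are not part of the DORA definitions.

```python
from datetime import datetime, timedelta


def deployments_per_week(
    deploy_times: list[datetime],
    window_start: datetime,
    window_end: datetime,
) -> float:
    """Average number of production deployments per 7-day week in the window."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    # Count only deployments inside the half-open window [start, end).
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks


# Hypothetical data: one deployment every second day across a 28-day window.
start = datetime(2024, 1, 1)
end = start + timedelta(days=28)
deploys = [start + timedelta(days=d) for d in range(0, 28, 2)]

print(deployments_per_week(deploys, start, end))  # 3.5
```

Normalising to a fixed unit keeps the figure comparable when measurement windows differ in length.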
Lead Time for Changes measures the elapsed time from a code commit to that code running in production. Elite performers measure this in hours. Low performers measure in months. Lead time reflects the efficiency of the whole pipeline: code review practices, build times, deployment automation, approval gates. Long lead times signal bottlenecks worth finding.
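One way to sketch the Lead Time calculation is to pair each commit timestamp with the time that commit reached production and summarise the differences. Using the median instead of the mean is an assumption made here so that a single slow change does not dominate the figure; the function name and sample data are likewise hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time_hours(changes: list[tuple[datetime, datetime]]) -> float:
    """Median hours between commit and production for (committed, deployed) pairs."""
    if not changes:
        raise ValueError("at least one change is required")
    hours = []
    for committed, deployed in changes:
        if deployed < committed:
            raise ValueError("a change cannot be deployed before it is committed")
        hours.append((deployed - committed).total_seconds() / 3600)
    return median(hours)


# Hypothetical data: three changes with lead times of 2, 6, and 30 hours.
base = datetime(2024, 3, 4, 9, 0)
changes = [
    (base, base + timedelta(hours=2)),
    (base, base + timedelta(hours=6)),
    (base, base + timedelta(hours=30)),
]

print(median_lead_time_hours(changes))  # 6.0
```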
Change Failure Rate measures the percentage of deployments that cause a production incident or require a rollback. Elite performers target below 5 per cent. Low performers see 46 to 60 per cent. This metric distinguishes speed from recklessness. A team with high deployment frequency and a high failure rate is shipping fast but also breaking things frequently. The goal is high frequency paired with low failure rate.
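The definition above translates into a simple ratio. The following is a sketch only: the `Deployment` record and its two boolean fields are hypothetical, and in practice the hard part is deciding which incidents are attributed to which deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment record; field names are illustrative."""
    deploy_id: str
    caused_incident: bool = False
    rolled_back: bool = False


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Percentage of deployments that caused an incident or needed a rollback."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    failed = sum(1 for d in deployments if d.caused_incident or d.rolled_back)
    return 100.0 * failed / len(deployments)


# Hypothetical data: 25 deployments, two causing incidents and one rolled back.
history = [
    Deployment(
        deploy_id=f"deploy-{i}",
        caused_incident=i in (3, 11),
        rolled_back=i == 17,
    )
    for i in range(25)
]

print(change_failure_rate(history))  # 12.0
```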
Time to Restore Service measures how long it takes to recover from a production failure. Elite performers restore in under an hour. Low performers take between one week and one month. This metric reflects both the quality of incident response (runbooks, on-call practice, observability tooling) and the architectural properties that make recovery fast (the ability to roll back, feature-flag off a bad change, or deploy a hotfix quickly).
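Time to Restore can be sketched as the average gap between the start of an incident and the moment service was restored. The function name and the incident timestamps below are hypothetical, and the sketch assumes both timestamps are recorded reliably for every incident.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean duration between (started, restored) timestamps of incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = []
    for started, restored in incidents:
        if restored < started:
            raise ValueError("an incident cannot be restored before it starts")
        durations.append(restored - started)
    # sum() needs a timedelta start value because the default start is the int 0.
    return sum(durations, timedelta(0)) / len(durations)


# Hypothetical data: two incidents lasting 4 and 8 hours.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 14, 0)),
    (datetime(2024, 5, 9, 22, 0), datetime(2024, 5, 10, 6, 0)),
]

print(mean_time_to_restore(incidents))  # 6:00:00
```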
A worked example. A mid-size SaaS company runs a quarterly measurement. Deployment Frequency is once per week. Lead Time is four days from commit to production. Change Failure Rate is 12 per cent. Time to Restore is six hours. This profile places the team in the "medium" band. The data prompts two conversations: the Change Failure Rate suggests test coverage or review discipline is weak, and the four-day Lead Time suggests a slow CI pipeline or a heavyweight approval process. The team targets both. After a quarter of work, Deployment Frequency has risen to twice weekly, Lead Time has dropped to eighteen hours, and Change Failure Rate has fallen to 6 per cent. Time to Restore is unchanged at six hours, which gives the team a concrete next thread to pull.
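The quarter-on-quarter movement in the worked example can be checked with a few lines of arithmetic. The sketch below uses the figures from the example; the dictionary keys and the `percent_change` helper are illustrative names, and the four-day lead time is assumed to mean 96 calendar hours.

```python
# Figures from the worked example, expressed in consistent units.
before = {
    "deploys_per_week": 1.0,
    "lead_time_hours": 4 * 24,  # four days, assumed to be calendar days
    "change_failure_rate_pct": 12.0,
    "time_to_restore_hours": 6.0,
}
after = {
    "deploys_per_week": 2.0,
    "lead_time_hours": 18.0,
    "change_failure_rate_pct": 6.0,
    "time_to_restore_hours": 6.0,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, value in changes.items():
    print(f"{name}: {value:+.2f}%")
```

Under these assumptions, Deployment Frequency doubles, Lead Time falls by 81.25 per cent, Change Failure Rate halves, and Time to Restore shows no change.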
DORA metrics are most valuable in teams that have enough deployment volume to produce meaningful numbers, a baseline of observability (you can see when things fail and when they recover), and the organisational buy-in to act on what the data shows.
They suit engineering organisations at any scale. A team of five using the metrics as a retrospective tool gets value. A platform engineering team using them to benchmark fifty product squads gets more value at the cost of instrumentation investment.
They are less useful in contexts where deployments are genuinely rare by design (regulated industries with mandatory change advisory board processes, hardware-tied firmware releases), because the denominator is too small for Deployment Frequency and Change Failure Rate to be meaningful. In those environments the metrics can still inform the internal software pipeline while acknowledging that the final deployment gate has external constraints.
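To make the small-denominator point concrete, the sketch below puts an uncertainty range around an observed Change Failure Rate using the Wilson score interval. Choosing the Wilson interval and a 95 per cent level (z = 1.96) is an assumption made for illustration; it is a standard statistical technique and not something the DORA research prescribes.

```python
from math import sqrt


def wilson_interval(
    failures: int, deployments: int, z: float = 1.96
) -> tuple[float, float]:
    """Wilson score interval for the true failure proportion (z=1.96 is ~95%)."""
    if deployments <= 0 or not 0 <= failures <= deployments:
        raise ValueError("need deployments > 0 and 0 <= failures <= deployments")
    n = deployments
    p = failures / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Same observed rate (25 per cent), very different amounts of evidence.
small = wilson_interval(failures=1, deployments=4)
large = wilson_interval(failures=100, deployments=400)

print(f"4 deployments:   {small[0]:.1%} to {small[1]:.1%}")
print(f"400 deployments: {large[0]:.1%} to {large[1]:.1%}")
```

With four deployments the plausible range spans roughly 5 to 70 per cent, while with four hundred it narrows to roughly 21 to 29 per cent, which is why a quarterly release cadence says little about the underlying failure rate.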
The common failure modes are measuring only Deployment Frequency because it is the easiest to instrument and ignoring the others; treating a low Change Failure Rate as the goal at the expense of Deployment Frequency (a team that ships once a quarter can keep its failure rate low simply by taking very few risks); and gaming the numbers, for example by marking incidents as "planned maintenance" to improve Change Failure Rate. The metrics work as a system. Optimise for all four together.
A further trap is presenting the numbers to leadership as a performance ranking of teams. DORA research is clear that the metrics are improvement tools, not league tables. Teams in different contexts (new product development versus a legacy codebase with twelve years of accumulated debt) are not comparable on the same scale.
DORA metrics form a collection framework in the metrics category. All four measures map to the same entity type, reflecting that they are a family of related performance indicators and not a hierarchy:
Deployment Frequency: metric entity, capturing the rate of production deployments over a measurement window.
Lead Time for Changes: metric entity, capturing the cycle time from commit to production.
Change Failure Rate: metric entity, capturing the proportion of deployments that cause an incident or rollback.
Time to Restore Service: metric entity, capturing the mean recovery time from production incidents.
The Unified Product Graph models all four as Metric nodes, which can carry target values, current values, and trend data as properties. Because each metric is a distinct entity, it can link to the capability or practice node that is expected to move it. A metric for Change Failure Rate can link to a feature node representing the investment in automated testing, making the causal chain explicit in the graph.
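The node-and-link model described above can be sketched with plain data classes. This is an illustration only and does not reflect the Unified Product Graph's actual schema or API: the `Node` and `Graph` classes, the property keys, and the `expected_to_be_moved_by` relation name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node; node_type might be "metric" or "feature"."""
    node_id: str
    node_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    """Hypothetical in-memory graph of nodes and labelled, directed links."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        # Refuse links to nodes that have not been added yet.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both nodes must exist before they are linked")
        self.edges.append((source_id, relation, target_id))

    def linked_from(self, source_id: str, relation: str) -> list[Node]:
        return [
            self.nodes[target]
            for source, rel, target in self.edges
            if source == source_id and rel == relation
        ]


graph = Graph()
graph.add_node(Node(
    node_id="change-failure-rate",
    node_type="metric",
    # Target, current value, and trend carried as properties (per cent).
    properties={"target": 5.0, "current": 12.0, "trend": [12.0, 6.0]},
))
graph.add_node(Node(node_id="automated-testing", node_type="feature"))
graph.link("change-failure-rate", "expected_to_be_moved_by", "automated-testing")

movers = graph.linked_from("change-failure-rate", "expected_to_be_moved_by")
print([node.node_id for node in movers])  # ['automated-testing']
```

Keeping each metric as its own node means the link to the investment expected to move it is stored as data, so it can be queried instead of living only in a planning document.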