A procedure for operational tasks or incidents
A runbook is a documented procedure for carrying out an operational task or responding to a known failure: an ordered set of steps, with the checks and escalations that go between them, written so that someone other than the author can execute it correctly under pressure. It is institutional memory turned into a repeatable action.
The runbook is one of the oldest artefacts in operations, older than the web. In the mainframe era of the 1960s and 1970s, operators kept physical binders, literally "run books", describing how to run scheduled jobs on systems like the IBM System/360, complete with the Job Control Language to invoke and the error messages to expect. The form absorbed influence from aviation and nuclear checklists: short, unambiguous steps with explicit verification and clear escalation.
The practice modernised as operations did. Network operations centres turned binders into wikis; the rise of Site Reliability Engineering reframed the runbook as a living, linked document, ideally reachable straight from the alert that needs it. The Google SRE book makes the case bluntly that a good runbook, combined with sensible alerting, cuts mean time to repair and reduces the cognitive load on a tired on-call engineer at 3am.
The frontier is automation. As runbook steps become precise enough to script, they graduate into runbook automation, where the document executes itself and the human approves rather than types. The open debate is how far to push that: an automated runbook that no one understands is a new failure mode, so the better teams keep the prose runbook as the source of truth even when a machine runs the steps.
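The "human approves rather than types" pattern can be sketched as follows. This is a minimal illustration, not a description of any particular runbook automation product: the `Step` class, the `run_with_approval` function, and the `approve` callback are all hypothetical names, and the actions only record that they ran. Each step keeps its prose description alongside the scripted action, so the written runbook stays the source of truth.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str            # the prose step, kept as the source of truth
    action: Callable[[], None]  # hypothetical scripted equivalent of the step


def run_with_approval(steps: List[Step], approve: Callable[[str], bool]) -> List[str]:
    """Run steps in order, stopping at the first one the human declines.

    Returns the descriptions of the steps that were actually executed.
    """
    executed = []
    for step in steps:
        if not approve(step.description):
            break  # declined: stop here and leave the rest to manual handling
        step.action()
        executed.append(step.description)
    return executed


# Usage: these actions only append to a log; real ones would call tooling.
log: List[str] = []
steps = [
    Step("Confirm memory pressure on the dashboard", lambda: log.append("checked")),
    Step("Scale the instance", lambda: log.append("scaled")),
    Step("Flush the approved key prefix", lambda: log.append("flushed")),
]

# Approve everything except the flush, which stays a human decision here.
done = run_with_approval(steps, approve=lambda desc: "Flush" not in desc)
print(done)
```

Stopping at the first declined step, rather than skipping it and continuing, is a deliberate choice in this sketch: later steps in a runbook usually assume the earlier ones succeeded.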
A team owns a Redis cache that occasionally fills to capacity. Rather than rediscover the fix each time, they write a runbook titled "Redis memory pressure". It opens with how to confirm the symptom, the exact dashboard panel and the threshold, then the safe remediation: scale the instance, flush a specific key prefix, and the explicit warning about which keys must never be flushed.
The alert that fires on memory pressure links directly to this runbook. When it triggers at 04:00, the on-call engineer, who has never touched Redis, follows the steps and resolves it in six minutes without paging the specialist. The procedure that lived in one person's head now lives where the alert can find it, which is the entire point.
In the Unified Product Graph, Runbook is a leaf in the Operations and Quality region, with devops as its home domain. It is reached through hierarchy edges such as product_documented_in_runbook and infrastructure_component_documented_in_runbook, which anchor it to whatever it documents. Two cross-domain edges capture its operational life: alert_rule_triggers_runbook connects detection to procedure, and runbook_mitigates_incident connects procedure to the live event it resolves. The causal edge postmortem_produces_runbook closes the loop, recording that this procedure exists because something once went wrong and the team chose to remember.
Type-specific fields on BaseNode
trigger (string): Triggering event or alert. Examples: "Error rate exceeds 5% for 5 minutes", "Database connection pool exhausted"
steps (string[]): Ordered steps, one action per element. Example: ["Check Grafana dashboard X", "SSH into affected node", "Restart service Y"]
last_tested (string): ISO date last tested or rehearsed. Runbooks degrade if untested. Example: "2026-03-15"
automation_level (string): Operational maturity. Manual runbooks are candidates for automation investment. `semi_automated` = some steps scripted; human judgment still required.
id (string, required): Unique identifier (UUID)
type (NodeType, required): Discriminator for the entity type
title (string, required): Display name
description (string): Optional detailed description
status (string): Lifecycle status
tags (string[]): Freeform tags for filtering
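The field list above can be pictured as a single record. The sketch below is an assumption about shape only, not the graph's actual schema: the `Runbook` dataclass, the `is_stale` helper, the 180-day default, the placeholder UUID, the `"runbook"` type value, and the trigger wording are all illustrative. It uses the "Redis memory pressure" runbook from this page as the instance and shows one way `last_tested` could drive a staleness check.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Runbook:
    # Required fields
    id: str
    type: str
    title: str
    # Optional general fields
    description: Optional[str] = None
    status: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    # Type-specific fields
    trigger: Optional[str] = None
    steps: List[str] = field(default_factory=list)
    last_tested: Optional[str] = None       # ISO date, e.g. "2026-03-15"
    automation_level: Optional[str] = None  # e.g. "semi_automated"


def is_stale(runbook: Runbook, today: date, max_age_days: int = 180) -> bool:
    """Treat a runbook as stale if it was never tested or was tested too long ago.

    The 180-day default is an arbitrary illustrative threshold.
    """
    if runbook.last_tested is None:
        return True
    tested = date.fromisoformat(runbook.last_tested)
    return (today - tested).days > max_age_days


redis_runbook = Runbook(
    id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    type="runbook",                             # assumed discriminator value
    title="Redis memory pressure",
    trigger="Redis memory usage exceeds the alert threshold",  # illustrative wording
    steps=[
        "Confirm the symptom on the dashboard panel",
        "Scale the instance",
        "Flush the specific key prefix",
    ],
    last_tested="2026-03-15",
    automation_level="semi_automated",
)

print(is_stale(redis_runbook, today=date(2026, 4, 1)))  # tested 17 days earlier
print(is_stale(redis_runbook, today=date(2027, 1, 1)))  # tested 292 days earlier
```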
4 phases — initial: draft · template: PUBLISHING
5 edge types connected to this entity.
product_documented_in_runbook
infrastructure_component_documented_in_runbook
alert_rule_triggers_runbook
runbook_mitigates_incident
postmortem_produces_runbook
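As a rough illustration of how these five edge types relate to the Runbook node, they can be written as (name, source, target, kind) records. The source and target node types and the hierarchy, cross-domain, and causal kinds are read from the edge descriptions on this page; the `EdgeType` tuple and the `incoming` and `outgoing` helpers are hypothetical and not part of any published API.

```python
from typing import List, NamedTuple


class EdgeType(NamedTuple):
    name: str
    source: str
    target: str
    kind: str  # "hierarchy", "cross-domain", or "causal"


EDGE_TYPES = [
    EdgeType("product_documented_in_runbook", "Product", "Runbook", "hierarchy"),
    EdgeType("infrastructure_component_documented_in_runbook",
             "Infrastructure Component", "Runbook", "hierarchy"),
    EdgeType("alert_rule_triggers_runbook", "Alert Rule", "Runbook", "cross-domain"),
    EdgeType("runbook_mitigates_incident", "Runbook", "Incident", "cross-domain"),
    EdgeType("postmortem_produces_runbook", "Postmortem", "Runbook", "causal"),
]


def incoming(node_type: str) -> List[str]:
    """Names of edge types that point at the given node type."""
    return [e.name for e in EDGE_TYPES if e.target == node_type]


def outgoing(node_type: str) -> List[str]:
    """Names of edge types that start from the given node type."""
    return [e.name for e in EDGE_TYPES if e.source == node_type]


print(incoming("Runbook"))
print(outgoing("Runbook"))
```

Four of the five edges point into Runbook, and only runbook_mitigates_incident points out of it, which matches the description of a runbook as something other entities lead to and an incident as what it acts on.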