A post-incident analysis documenting lessons
A postmortem is the written record of an incident: what happened, what caused it, what was done, and what will change so it does not happen the same way again. Its defining quality is blamelessness. The document interrogates the system that allowed the failure, and it deliberately refuses to interrogate the engineer who tripped over it.
The blameless idea was imported from safety science, not invented in software. James Reason's work on human error and Sidney Dekker's writing on Just Culture framed a hard finding from aviation and medicine: punishing the individual at the sharp end leaves the latent system conditions untouched, so the next person inherits the same trap. Dekker named the punitive reflex the Bad Apple Theory and showed it does not make complex systems safer.
John Allspaw carried that thinking into engineering. In 2012, while leading technical operations at Etsy (he later became its CTO), he published "Blameless PostMortems and a Just Culture" on Etsy's engineering blog, arguing that engineers who feel safe to give a detailed account of what they saw and did are the organisation's best source of truth about how the system actually behaves. Etsy backed it with a tool, Morgue, for recording these reviews. Google's Site Reliability Engineering book then codified the practice and the template, and the industry followed.
The debate since has been about accountability. Blameless does not mean consequence-free; the refinement most teams reach is that you separate the account of the failure, which must be safe and honest, from any question of competence, which belongs to a different conversation. The phrase "blameless, not accountability-free" captures where the field landed.
After a Sev2 checkout outage, the incident commander schedules a postmortem within two days while memory is fresh. The author builds a minute-by-minute timeline from chat logs and graphs, then writes the analysis in language that names systems and not people: the deploy pipeline had no automated canary, so a bad config reached 100 percent of traffic in ninety seconds.
The document lists contributing factors rather than a single culprit, and ends with dated, owned action items: add a canary stage, alert on config diff size, document the rollback in a runbook. It is shared widely, because the point of writing it down is that the team three doors over learns the lesson without living the outage. A year later the canary action item is the reason a similar bad config is caught at 1 percent.
In the Unified Product Graph, Postmortem is a leaf in the Operations and Quality region, reached from the anchor Incident through incident_analysed_in_postmortem (hierarchy) and incident_triggers_postmortem (cross-domain). Its own causal edges are what make it valuable: postmortem_identifies_root_cause connects the analysis to the condition it found, and postmortem_produces_runbook connects it to the operational change it generated. That second edge is the learning loop made structural. A postmortem that produces no runbook and identifies no root cause is visibly a document that changed nothing.
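The "changed nothing" test described above can be sketched as a check over a postmortem's outgoing causal edges. This is only an illustration: the `Edge` record, its `from`/`to` fields, and the `learningLoopStatus` helper are hypothetical, since the page names the edge types but not how they are stored or queried.

```typescript
// Edge type names are taken from the page; everything else here is an assumed shape.
type EdgeType =
  | "incident_analysed_in_postmortem"
  | "incident_triggers_postmortem"
  | "postmortem_identifies_root_cause"
  | "postmortem_produces_runbook";

// Hypothetical edge record: source node id, target node id, edge type.
interface Edge {
  type: EdgeType;
  from: string;
  to: string;
}

interface LoopStatus {
  identifiesRootCause: boolean;
  producesRunbook: boolean;
  // True only when the postmortem has neither causal edge.
  changedNothing: boolean;
}

// Inspect the edges that start at the given postmortem and report
// whether it closed the learning loop.
function learningLoopStatus(postmortemId: string, edges: Edge[]): LoopStatus {
  const outgoing = edges.filter((e) => e.from === postmortemId);
  const identifiesRootCause = outgoing.some(
    (e) => e.type === "postmortem_identifies_root_cause"
  );
  const producesRunbook = outgoing.some(
    (e) => e.type === "postmortem_produces_runbook"
  );
  return {
    identifiesRootCause,
    producesRunbook,
    changedNothing: !identifiesRootCause && !producesRunbook,
  };
}
```

A report built on a check like this could flag postmortems that were published without either outgoing edge, which is the structural signal the paragraph describes.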
Type-specific fields on BaseNode
timeline · string · Chronological timeline. Events with timestamps in order. Example: "03:15 Alert fired. 03:20 On-call acknowledged. 03:45 Root cause identified. 06:30 Service restored."
action_items · string · Follow-up actions with owners and due dates. Example: "1. Add circuit breaker to auth service (owner: Platform, due: 2026-04-12). 2. Update runbook for DB failover."
detection_method · string · Detection source. Key learning for improving detection coverage. Example: "alert" if monitoring caught it, "customer_report" if a user reported first.
id · string · required · Unique identifier (UUID)
type · NodeType · required · Discriminator for the entity type
title · string · required · Display name
description · string · Optional detailed description
status · string · Lifecycle status
tags · string[] · Freeform tags for filtering
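As a rough illustration of the fields listed above, the shape could be written in TypeScript as follows. The field names, types, and required flags come from the list; the `NodeType` alias, the `PostmortemNode` name, and the `parseTimeline` helper are assumptions. In particular, the "HH:MM text" timeline format is inferred from the single example string and is not a documented grammar.

```typescript
// Assumed alias: the page names NodeType but does not define it.
type NodeType = string;

// Shared fields; only id, type and title are marked required.
interface BaseNode {
  id: string; // UUID
  type: NodeType;
  title: string;
  description?: string;
  status?: string;
  tags?: string[];
}

// Postmortem-specific fields, all plain strings per the list.
interface PostmortemNode extends BaseNode {
  timeline?: string;
  action_items?: string;
  detection_method?: string; // e.g. "alert" or "customer_report"
}

interface TimelineEvent {
  time: string; // "HH:MM" as written in the timeline string
  text: string;
}

// Hypothetical helper: split a timeline string into events, assuming each
// event starts with an HH:MM timestamp followed by free text.
function parseTimeline(timeline: string): TimelineEvent[] {
  const pattern = /(\d{2}:\d{2})\s+(.*?)(?=\s+\d{2}:\d{2}\s|$)/g;
  const events: TimelineEvent[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(timeline)) !== null) {
    events.push({ time: match[1] ?? "", text: (match[2] ?? "").trim() });
  }
  return events;
}
```

Keeping `timeline` as a string matches the listed type; a parser like this is only useful if authors hold to the timestamp-first convention shown in the example.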
4 phases — initial: draft · template: PUBLISHING
4 edge types connected to this entity.
incident_analysed_in_postmortem · incident_triggers_postmortem · postmortem_identifies_root_cause · postmortem_produces_runbook