Operations & Quality
Incidents that leave a lesson the next on-call can find
An incident resolves, a postmortem is written, and six months later the same class of failure recurs because the lesson lived in a doc nobody re-reads. UPG types the whole operational backbone: the pipeline that ships, the monitors and SLOs that watch, the incident that breaks, the postmortem and root cause that explain it, the fix that prevents the next one, the quality gates that guard the release, and the compliance and support that close the loop. The break-and-learn cycle becomes structure a team can query, rather than knowledge that leaves with the on-call engineer.
“Everything fails, all the time.”
The pipeline that ships, and the runbook that recovers
A CI pipeline produces the build artifact and deploys the service, and the runbook that mitigates an incident is a typed node alongside them.
The path out of an incident links to the path that shipped it. Recovery is not improvised under pressure, because the runbook that handles a given failure is one edge from the incident it was written for.
ci_pipeline: A CI/CD pipeline
build_artifact: A build output (binary, container image)
service: A deployable service or microservice
runbook: A runbook for incident response
incident: A production incident
The backbone carries code from commit to production. A CI pipeline produces the build artifact and deploys the service. The runbook that mitigates an incident is a node in the graph, so the path out of a failed deploy is linked to the path in.
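To make "one edge from the incident" concrete, here is a minimal sketch in plain Python. It is not UPG's API: the Edge tuple, the "type:id" node naming, the relation names (produces, deploys, mitigates) and every identifier in EDGES are assumptions chosen to mirror the prose above.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # hypothetical "type:id" identifier of the source node
    relation: str  # assumed edge name, taken from the verbs in the prose
    target: str    # hypothetical "type:id" identifier of the target node


# Illustrative data only; none of these identifiers come from a real graph.
EDGES = [
    Edge("ci_pipeline:main", "produces", "build_artifact:api-image"),
    Edge("ci_pipeline:main", "deploys", "service:api"),
    Edge("runbook:rollback-api", "mitigates", "incident:inc-1"),
]


def runbooks_for(incident: str, edges: list[Edge]) -> list[str]:
    """Return every runbook with a 'mitigates' edge pointing at the incident."""
    return [e.source for e in edges
            if e.relation == "mitigates" and e.target == incident]


def shipped_by(service: str, edges: list[Edge]) -> list[str]:
    """Return every pipeline with a 'deploys' edge pointing at the service."""
    return [e.source for e in edges
            if e.relation == "deploys" and e.target == service]
```

Both lookups are a single filter over the edge list, which is the point of the passage: recovery and delivery are reachable from the same structure without a search through documents.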
What the monitor measures, and what the SLO promises
A monitor watches the service, measures the service level indicator (SLI) behind an SLO, triggers via an alert rule wired to a named owner, and detects the symptom before a user reports it.
The measurement and the promise sit in one graph. An alert always names the service and the indicator it fired on, so the page that opens already says where the problem is.
monitor: A monitoring check
service: A deployable service or microservice
service_level_indicator: A service level indicator (SLI)
alert_rule: An alerting rule
symptom: A symptom of a problem
Measurement and the promise made on it sit in the same graph. A monitor watches the service, measures the indicator behind an SLO, triggers via an alert rule routed to a person, and detects the symptom ahead of a user report. An alert names the service and the indicator it fired on.
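One way to read "an alert names the service and the indicator it fired on" is that the alert's context is derived from the monitor's edges rather than typed in by hand. The sketch below assumes that reading. The Alert class, the relation names and all identifiers are hypothetical, not part of UPG.

```python
from dataclasses import dataclass

# Illustrative (source, relation, target) triples; all names are assumptions.
EDGES = [
    ("monitor:latency-check", "watches", "service:checkout"),
    ("monitor:latency-check", "measures", "service_level_indicator:p99-latency"),
    ("monitor:latency-check", "triggers_via", "alert_rule:page-checkout-oncall"),
    ("monitor:latency-check", "detects", "symptom:slow-checkout"),
]


@dataclass(frozen=True)
class Alert:
    rule: str
    service: str
    indicator: str


def single_target(monitor: str, relation: str, edges) -> str:
    """Follow one outgoing edge; fail loudly if it is missing or ambiguous."""
    matches = [t for s, r, t in edges if s == monitor and r == relation]
    if len(matches) != 1:
        raise LookupError(f"{monitor} needs exactly one '{relation}' edge")
    return matches[0]


def build_alert(monitor: str, edges) -> Alert:
    """Assemble the alert context from the monitor's typed edges."""
    return Alert(
        rule=single_target(monitor, "triggers_via", edges),
        service=single_target(monitor, "watches", edges),
        indicator=single_target(monitor, "measures", edges),
    )
```

Raising on a missing edge is a design choice for this sketch: a monitor that cannot say which service it watches would otherwise produce a page with no location in it.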
What an incident connects to across the graph
An incident breaches a service level objective (SLO), affects a feature, triggers a postmortem, is caused by a root cause, and generates support tickets. Each is a typed edge, so the blast radius resolves to a query rather than a reconstruction.
The loop closes back onto the rest of the product. The feature affected is the same one the roadmap tracks, and the SLO breached is the one engineering set. When a similar failure starts, the graph already records what happened last time.
incident: A production incident
service_level_objective: A service level objective (SLO)
feature: A product capability or feature
postmortem: A post-incident review
root_cause: An identified root cause of an issue
support_ticket: Customer support request or issue
An incident is the operations anchor, and it does not vanish when the page resolves. It breaches an SLO, affects a feature, triggers a postmortem, and is caused by a root cause, which links to the change that prevents the next one. The loop from break to learning to prevention is structure the team can query, not tribal memory.
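As a rough illustration of the blast radius as a query, the sketch below groups an incident's outgoing edges by relation. The triple format, the relation names and the sample identifiers are assumptions made for the example. A real graph store would expose its own query interface.

```python
from collections import defaultdict

# Illustrative (source, relation, target) triples; all identifiers are made up.
EDGES = [
    ("incident:inc-7", "breaches", "service_level_objective:checkout-availability"),
    ("incident:inc-7", "affects", "feature:checkout"),
    ("incident:inc-7", "triggers", "postmortem:pm-7"),
    ("incident:inc-7", "caused_by", "root_cause:pool-exhaustion"),
    ("incident:inc-7", "generates", "support_ticket:t-101"),
    ("incident:inc-7", "generates", "support_ticket:t-102"),
    ("incident:inc-8", "affects", "feature:search"),
]


def blast_radius(incident: str, edges) -> dict[str, list[str]]:
    """Group everything the incident points at by the relation that links it."""
    radius = defaultdict(list)
    for source, relation, target in edges:
        if source == incident:
            radius[relation].append(target)
    return dict(radius)
```

Calling blast_radius("incident:inc-7", EDGES) returns one entry per relation, so the breached SLO, the affected feature and the generated tickets come back from a single pass over the edges.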
How the postmortem turns into prevention
A postmortem identifies the root cause and produces a runbook. The root cause is resolved by a fix, affects the service, and causes the bug it explains.
Prevention links to the failure it answers. When a similar incident appears, the graph names the root cause it shares and the fix that worked, in place of an engineer half-remembering a thread from last year.
postmortem: A post-incident review
root_cause: An identified root cause of an issue
runbook: A runbook for incident response
fix: A fix applied to resolve an issue
service: A deployable service or microservice
bug: A defect or unexpected behaviour
A postmortem retains the learning after the page resolves. It identifies the root cause and produces a runbook. The root cause is resolved by a fix, affects the service, and causes the bug it explains. A later incident that looks similar resolves against a graph that already records the earlier one.
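The lookup a later incident would run can be sketched as follows: given a suspected root cause, return the earlier incidents that share it and the fixes that resolved it. This assumes incidents carry a caused_by edge and root causes a resolved_by edge, as the prose describes. The function name prior_art and all identifiers are hypothetical.

```python
# Illustrative (source, relation, target) triples; all identifiers are made up.
EDGES = [
    ("incident:inc-3", "caused_by", "root_cause:pool-exhaustion"),
    ("incident:inc-7", "caused_by", "root_cause:pool-exhaustion"),
    ("incident:inc-5", "caused_by", "root_cause:expired-cert"),
    ("postmortem:pm-3", "identifies", "root_cause:pool-exhaustion"),
    ("root_cause:pool-exhaustion", "resolved_by", "fix:raise-pool-limit"),
    ("root_cause:expired-cert", "resolved_by", "fix:auto-renew-certs"),
]


def prior_art(root_cause: str, edges) -> tuple[list[str], list[str]]:
    """Return (earlier incidents sharing the root cause, fixes that resolved it)."""
    incidents = sorted(s for s, r, t in edges
                       if r == "caused_by" and t == root_cause)
    fixes = sorted(t for s, r, t in edges
                   if s == root_cause and r == "resolved_by")
    return incidents, fixes
```

The sketch matches on an exact root cause node. Deciding that two failures share a root cause in the first place is a human judgment recorded in the postmortem, not something this lookup infers.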
The gate a release passes before it ships
A test suite contains the test cases and includes the regression tests. It is tested via a QA session, covers the feature it protects, and is measured by a test coverage report.
The question “is this safe to ship?” resolves to the gate the graph already holds. The regression that caught last quarter’s bug is linked to the feature it protects, so the reason a check exists stays attached to the work it guards.
test_suite: A suite of related tests
test_case: An individual test case
regression_test: A regression test
qa_session: An exploratory QA session
feature: A product capability or feature
test_coverage_report: A test coverage report
A test suite gates delivery. It contains the cases and includes the regressions that guard against old bugs, is tested via a QA session, covers the feature it protects, and is measured by a coverage report. Whether a change is safe to ship reads as the gate the graph already holds.
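A hedged sketch of how "is this safe to ship?" could resolve against the graph: walk from the feature to the suites that cover it, then to the regression tests those suites include, and check their latest results. The results mapping and the "passed" status are assumptions added for illustration. The source does not say how test outcomes are stored.

```python
# Illustrative (source, relation, target) triples; all identifiers are made up.
EDGES = [
    ("test_suite:checkout-suite", "covers", "feature:checkout"),
    ("test_suite:checkout-suite", "contains", "test_case:adds-item"),
    ("test_suite:checkout-suite", "includes", "regression_test:double-charge"),
    ("test_suite:checkout-suite", "includes", "regression_test:empty-cart"),
    ("test_suite:search-suite", "covers", "feature:search"),
]


def regression_tests_for(feature: str, edges) -> list[str]:
    """Regression tests included by any suite that covers the feature."""
    suites = {s for s, r, t in edges if r == "covers" and t == feature}
    return sorted(t for s, r, t in edges if r == "includes" and s in suites)


def safe_to_ship(feature: str, edges, results: dict[str, str]) -> bool:
    """True only if the feature has regression tests and all of them passed.

    `results` maps a test id to its latest status (hypothetical format).
    """
    tests = regression_tests_for(feature, edges)
    return bool(tests) and all(results.get(t) == "passed" for t in tests)
```

Treating a feature with no regression tests as not safe is a deliberate choice in this sketch: an empty gate should not read as a green one.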
The audit and the support ticket, on one graph
A compliance framework mandates a compliance requirement, requires the security control that satisfies it, and is verified by a security audit. The customer’s voice is typed in the same way: a support ticket reveals a need and reports a bug.
An audit and a customer report draw on the same structure. A ticket becomes discovery and delivery work, linked to the need it raised and the bug it found, so the help desk feeds the rest of the product rather than closing the thread.
compliance_framework: A compliance framework (SOC 2, GDPR, etc.)
compliance_requirement: A compliance requirement
security_control: A security control or mitigation
security_audit: A security audit
need: A user need, pain, desire, or constraint
bug: A defect or unexpected behaviour
A compliance framework mandates requirements, requires the security controls that satisfy them, and is verified by an audit, while a support ticket reveals a need and reports a bug. A support ticket lands in the graph as discovery and delivery work rather than staying in a help desk.
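"Typed" can be read as a constraint: each relation is only valid between particular node types. The sketch below checks edges against a small allow-list built from the relations named in this section. The ALLOWED set, the relation spellings and the "type:id" naming are assumptions for illustration, not UPG's actual schema.

```python
# Allowed (source type, relation, target type) triples, assumed from the prose.
ALLOWED = {
    ("compliance_framework", "mandates", "compliance_requirement"),
    ("compliance_framework", "requires", "security_control"),
    ("compliance_framework", "verified_by", "security_audit"),
    ("support_ticket", "reveals", "need"),
    ("support_ticket", "reports", "bug"),
}


def node_type(node_id: str) -> str:
    """Extract the type from a hypothetical 'type:id' identifier."""
    return node_id.split(":", 1)[0]


def invalid_edges(edges) -> list[tuple[str, str, str]]:
    """Return the edges whose type triple is not in the allow-list."""
    return [
        (source, relation, target)
        for source, relation, target in edges
        if (node_type(source), relation, node_type(target)) not in ALLOWED
    ]


# Illustrative data: the last edge is deliberately ill-typed.
EDGES = [
    ("compliance_framework:soc2", "mandates", "compliance_requirement:access-review"),
    ("support_ticket:t-101", "reports", "bug:double-charge"),
    ("support_ticket:t-101", "mandates", "bug:double-charge"),
]
```

Under these assumptions, invalid_edges(EDGES) flags only the last edge, because a support ticket cannot mandate a bug.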
Operations ties delivery and engineering to the reality of running the thing. Follow a thread back to the system or the measurement:
Pipelines, monitoring, incident response, security, quality gates, compliance, and support, with every property and edge.
The services an incident traces back to and the deploy that triggered it.
The metrics an SLO tracks and the data quality behind them.