What is the purpose of a Symptom?

A symptom is evidence of trouble, never the trouble itself, and it is the entry point to diagnosis. Examples include an error spike, a customer complaint, a slow page, or a dropped metric, all held deliberately apart from the underlying cause. The symptom opens a diagnostic chain: it triggers an investigation that traces back to a root cause, which is then resolved by a fix.

How do you use a Symptom in product management?

Record exactly what was observed and how it was detected, without leaping to a cause: "checkout error rate jumped to 4% at 14:10", not "the payment service is broken". Capture impact and timing, since these scope the investigation. Resist fixing the symptom alone; link it onward to the investigation that will find why.

What are common mistakes with a Symptom?

The classic failure is treating the symptom as the problem and fixing it directly (restarting the service, clearing the queue, papering over the error) so the same fault resurfaces because its cause was never found. Equally misleading is recording a symptom already laced with a presumed cause, which biases the investigation before it begins. And a symptom logged without its impact and timing leaves responders unable to tell an emergency from a curiosity.
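The last mistake, a symptom logged without impact and timing, can be caught mechanically before the record is filed. The following is a minimal sketch, not part of the schema: the `SymptomDraft` type, the `TRIAGE_FIELDS` list, and the `missingTriageFields` helper are hypothetical names, and treating these four fields as the triage minimum is an assumption.

```typescript
// Hypothetical draft shape. Field names follow the symptom properties;
// the choice of which fields count as "triage-critical" is an assumption.
interface SymptomDraft {
  title: string;
  symptom_description?: string;     // what was observed
  first_observed_at?: string;       // timing
  severity?: string;                // impact on each affected user
  affected_users_estimate?: number; // breadth of impact
}

type TriageField =
  | "symptom_description"
  | "first_observed_at"
  | "severity"
  | "affected_users_estimate";

const TRIAGE_FIELDS: TriageField[] = [
  "symptom_description",
  "first_observed_at",
  "severity",
  "affected_users_estimate",
];

// Returns the triage fields that are absent or blank, in a fixed order.
// A numeric estimate of 0 counts as present.
function missingTriageFields(draft: SymptomDraft): TriageField[] {
  return TRIAGE_FIELDS.filter((field) => {
    const value = draft[field];
    return value === undefined || value === "";
  });
}

// A draft that records the observation but no timing or impact.
const incompleteDraft: SymptomDraft = {
  title: "Checkout error spike",
  symptom_description: "Payment-failure rate jumped from 0.3% to 4.1%",
};

console.log(missingTriageFields(incompleteDraft));
```

A check like this only confirms that the fields are filled in. It cannot tell whether the description has a presumed cause baked into it; that still needs a human reader.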

Symptom

Q: What is a Symptom?

The observable manifestation of a fault: the thing a user, a dashboard, or an alert can actually see.

Q: What is an example of a Symptom?

Checkout error spike: Payment-failure rate jumped from 0.3% to 4.1% at 14:10 UTC, detected by an alert. Affects all card payments; impact is direct revenue loss. Cause unknown. This is the symptom that opens the investigation.

An observable sign of a problem, not its cause.

Engineering | Engineering & Platform | type: 'symptom' | interface: BaseNode

Description

A symptom is the observable manifestation of a fault: the thing a user, a dashboard, or an alert can actually see. It is evidence of trouble, never the trouble itself.

Origin & evolution

The symptom-versus-cause distinction was borrowed from medicine, where a symptom is what the patient reports and the cause is what the clinician must infer. Kaoru Ishikawa's cause-and-effect diagram, presented in 1968, carried the idea into manufacturing, sorting observable effects from their candidate causes across families such as method and machine so that teams stop treating the visible defect as the explanation.

Modern operations reframed the distinction around alerting. Google's Site Reliability Engineering practice argues for symptom-based alerting: page on what the user experiences, such as elevated latency or a failing checkout, and avoid alerting directly on every internal cause. The reasoning is that causes are many and shift as the system changes, while the symptoms that hurt users are few and stable. Rob Ewaschuk's "My Philosophy on Alerting", which seeded the SRE book's monitoring chapter, makes the case that monitoring has to answer two questions, what is broken and why: the page should fire on the "what" (the symptom), while the "why" (the cause) is left to investigation.

The live tension is calibration. Alert only on symptoms and you learn late, after users are already hurting. Alert on every cause and you drown in noise and start ignoring the page. The accepted balance pairs symptom alerts for paging with cause signals for diagnosis once you are already looking.
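One way to picture that balance is as a routing rule: a breached symptom signal pages someone, while a breached cause signal is only recorded for whoever is already investigating. This is a hedged sketch and not any particular alerting tool's API; the `Signal` shape, the `Route` values, and the signal names in the usage lines are hypothetical.

```typescript
// Hypothetical signal shape; "kind" mirrors the symptom/cause split.
type SignalKind = "symptom" | "cause";

interface Signal {
  name: string;
  kind: SignalKind;
  breached: boolean; // true while the signal's threshold is broken
}

type Route = "page" | "diagnostic" | "ignore";

// Symptoms page a human; causes are kept as context for diagnosis.
function routeSignal(signal: Signal): Route {
  if (!signal.breached) {
    return "ignore";
  }
  return signal.kind === "symptom" ? "page" : "diagnostic";
}

// Usage with illustrative signal names.
const conversionDrop: Signal = { name: "checkout_conversion_drop", kind: "symptom", breached: true };
const slowFraudCheck: Signal = { name: "fraud_check_latency", kind: "cause", breached: true };

console.log(routeSignal(conversionDrop)); // "page"
console.log(routeSignal(slowFraudCheck)); // "diagnostic"
```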

How it works in practice

Checkout conversion drops six percent at 14:00. That fall is the symptom: it is what the business dashboard shows and what a monitor flags. It says nothing about why. The investigation finds three contributing conditions: a third-party fraud check has slowed, a retry loop is amplifying the load, and a recent deploy shortened the client timeout. The single visible symptom sat on top of a small cluster of causes. Had the team alerted on raw fraud-check latency alone, they would have paged nightly on blips that never reached a customer. Alerting on the conversion symptom paged them exactly when it mattered, and the investigation supplied the rest.

Symptom vs. its neighbours

Root cause is the underlying condition the symptom points to. The root_cause explains the symptom; the symptom is the evidence trail back to it. One root cause can throw several symptoms, and several causes can converge on one.

Sidney Dekker's work in safety science sharpens the same point in the context of organisational incidents: what an investigation labels "human error" is itself a symptom, not a cause. In Dekker's framing, the operator's action is the observable end-point of a deeper system — the training gaps, time pressures, and design conditions that shaped the decision. Treating the visible act as the explanation stops the causal chain exactly where the useful work begins.

Incident is the declared, time-bounded event a serious symptom opens. A symptom is a signal; an incident is an organisational response with an owner and a timeline. A symptom can pass without ever becoming an incident.

Monitor is the instrument that watches for a symptom and fires when thresholds break. The monitor is the sensor; the symptom is what it senses. A symptom nobody monitors is one nobody learns about until a user reports it.

In the graph

In the Unified Product Graph, symptom sits in the engineering and reliability region as the observable layer of an incident chain. It is produced by root_cause_causes_symptom and located by investigation_surfaces_symptom. The watch comes from monitor_detects_symptom, and escalation runs through symptom_triggers_incident. Keeping the symptom as its own node, distinct from cause and incident, preserves the honest sequence of a diagnosis: a team can see whether a closed incident actually removed the underlying root_cause or merely silenced the symptom that exposed it.
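The closing check described above, whether a closed incident removed the root cause or only silenced the symptom, amounts to a two-hop traversal. The sketch below makes several assumptions: the `GraphNode` and `GraphEdge` shapes are hypothetical, edge direction is inferred from each edge name (subject first, object last), and the status value `"resolved"` is invented for illustration, since the schema only says `status` is a lifecycle string.

```typescript
// Hypothetical minimal graph shapes; only the edge names come from the page.
type ChainEdgeType = "root_cause_causes_symptom" | "symptom_triggers_incident";

interface GraphNode {
  id: string;
  type: "symptom" | "root_cause" | "incident";
  status?: string; // assumption: "resolved" marks a removed root cause
}

interface GraphEdge {
  type: ChainEdgeType;
  from: string; // id of the source node
  to: string;   // id of the target node
}

// Walks incident <- symptom <- root_cause and returns the root causes
// that are not marked resolved.
function unresolvedRootCauses(
  incidentId: string,
  nodes: GraphNode[],
  edges: GraphEdge[],
): GraphNode[] {
  const symptomIds = edges
    .filter((e) => e.type === "symptom_triggers_incident" && e.to === incidentId)
    .map((e) => e.from);
  const causeIds = edges
    .filter((e) => e.type === "root_cause_causes_symptom" && symptomIds.indexOf(e.to) !== -1)
    .map((e) => e.from);
  return nodes.filter(
    (n) => n.type === "root_cause" && causeIds.indexOf(n.id) !== -1 && n.status !== "resolved",
  );
}

// Usage: one incident, one symptom, two root causes (one still open).
const chainNodes: GraphNode[] = [
  { id: "inc-1", type: "incident", status: "closed" },
  { id: "sym-1", type: "symptom" },
  { id: "rc-1", type: "root_cause", status: "open" },
  { id: "rc-2", type: "root_cause", status: "resolved" },
];
const chainEdges: GraphEdge[] = [
  { type: "symptom_triggers_incident", from: "sym-1", to: "inc-1" },
  { type: "root_cause_causes_symptom", from: "rc-1", to: "sym-1" },
  { type: "root_cause_causes_symptom", from: "rc-2", to: "sym-1" },
];

console.log(unresolvedRootCauses("inc-1", chainNodes, chainEdges).map((n) => n.id));
```

A non-empty result for a closed incident suggests the symptom was silenced while at least one cause behind it is still live.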

Preview

Symptom: Directors abandon the Safe Change approval modal without confirming or rejecting

Severity: Mild inconvenience
Frequency rating: Regular
Reproducibility: frequent
Symptom description: After the agent previews a structural change, directors close the modal without taking action. The workspace is left in a pending-approval state that blocks further agent actions.
First observed at: 2026-05-13T10:00:00Z
Frequency count: 340
Frequency period: P7D
Affected users estimate: 210
Steps to reproduce: Create a new workspace, describe a process to the Builder agent, wait for the change proposal, open the approval modal, then close it without acting.

Properties

Type-specific fields on BaseNode

symptom_description (string)

Plain-language description of observed behaviour. Primary content of the entity.

first_observed_at (string)

ISO 8601 timestamp of when the symptom was first observed in the wild. Pairs with `frequency_rating` and `reproducibility` for triage.

severity (assessment)

Severity for affected users. Independent of how widespread the symptom is. Canonicalised in v0.4.0: the ad-hoc `'low' | 'medium' | 'high' | 'critical'` shape was replaced by `UPGAssessment`.

Severity (5-point) scale:

Mild inconvenience: Notices but works around easily
Annoying: Frustrated but can continue
Significant: Has to change approach
Severe: Struggles to accomplish goal
Blocker: Cannot accomplish goal

frequency_count (number)

Exact observation count in the period. Pairs with `frequency_period` for a precise rate.

frequency_period (string)

Recurrence period (ISO-8601 `Duration`). Examples: 'P7D' (per week), 'P1D' (per day), 'PT1H' (per hour).
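To show how `frequency_count` and `frequency_period` combine into a rate, here is a small sketch that normalises the pair to observations per day. The helper names `durationToHours` and `observationsPerDay` are hypothetical, and the parser deliberately handles only fixed-length duration components (weeks, days, hours, minutes, seconds); month and year durations are rejected because their length varies.

```typescript
// Converts a fixed-length ISO-8601 duration such as "P7D" or "PT1H" to hours.
// Assumption: only W, D, H, M (minutes) and S components are supported.
function durationToHours(duration: string): number {
  const match = /^P(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(duration);
  if (match === null) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  // Capture groups that did not participate are undefined; treat them as 0.
  const [weeks, days, hours, minutes, seconds] = match
    .slice(1)
    .map((part) => (part === undefined ? 0 : Number(part)));
  const totalHours = weeks * 168 + days * 24 + hours + minutes / 60 + seconds / 3600;
  if (totalHours === 0) {
    throw new Error(`Empty duration: ${duration}`);
  }
  return totalHours;
}

// Normalises a count observed over a period to a per-day rate.
function observationsPerDay(frequencyCount: number, frequencyPeriod: string): number {
  return (frequencyCount * 24) / durationToHours(frequencyPeriod);
}

// 340 observations over P7D works out to roughly 48.6 per day.
console.log(observationsPerDay(340, "P7D").toFixed(1));
```

Normalising to one unit makes symptoms recorded with different periods comparable; the per-day unit is an arbitrary choice for this sketch.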

frequency_rating (enum)

Qualitative frequency tier. Canonical replacement for the legacy `'once' | 'sporadic' | 'frequent' | 'constant' | string` shape. Use when an exact rate is unknown. Migration: `once → rare`, `sporadic → occasional`, `frequent → regular`, `constant → constant`.

Frequency rating scale:

constant (Constant): Effectively always; continuous occurrence.
regular (Regular): Happens on a predictable, recurring basis.
occasional (Occasional): Happens sometimes, without a fixed pattern.
rare (Rare): Happens infrequently.
other (Other): A frequency not captured by the above tiers.
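The migration from the legacy frequency shape can be sketched as a lookup table. The four mappings come from the description above; sending any other legacy string to `other` is an assumption made here, since the page does not say how free-form values should migrate. The names `FrequencyRating`, `LEGACY_TO_RATING`, and `migrateFrequency` are hypothetical.

```typescript
type FrequencyRating = "constant" | "regular" | "occasional" | "rare" | "other";

// Documented legacy-to-canonical mappings.
const LEGACY_TO_RATING: Record<string, FrequencyRating> = {
  once: "rare",
  sporadic: "occasional",
  frequent: "regular",
  constant: "constant",
};

function migrateFrequency(legacy: string): FrequencyRating {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(LEGACY_TO_RATING, legacy)) {
    return LEGACY_TO_RATING[legacy];
  }
  // Assumption: unmapped free-form legacy strings land in "other".
  return "other";
}

console.log(migrateFrequency("sporadic")); // "occasional"
console.log(migrateFrequency("every full moon")); // "other"
```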

affected_users_estimate (number)

Approximate count of users affected. Snapshot estimate.

reproducibility (enum)

Reproduction reliability. One of: always, frequent, intermittent, rare, once.

steps_to_reproduce (string)

Steps to reproduce.

Inherited from BaseNode (6 fields)

id (string, required)

Unique identifier (UUID)

type (NodeType, required)

Discriminator for the entity type

title (string, required)

Display name

description (string)

Optional detailed description

status (string)

Lifecycle status

tags (string[])

Freeform tags for filtering
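Taken together, the fields above suggest a shape along the following lines. This is a sketch under stated assumptions and not the library's actual declaration: `NodeType` and `UPGAssessment` are not defined on this page, so both are stand-ins; fields without a "required" marker are treated as optional; and the `id` in the example is a placeholder. The example values are those shown in the Preview.

```typescript
// Assumption: the real NodeType union is not listed on this page.
type NodeType = string;

// Assumption: the real UPGAssessment shape is not shown; a label-only
// stand-in is used so the 5-point scale can be expressed.
interface UPGAssessment {
  label: "Mild inconvenience" | "Annoying" | "Significant" | "Severe" | "Blocker";
}

interface BaseNode {
  id: string;     // UUID
  type: NodeType; // discriminator
  title: string;
  description?: string;
  status?: string;
  tags?: string[];
}

interface Symptom extends BaseNode {
  type: "symptom";
  symptom_description?: string;
  first_observed_at?: string; // ISO timestamp
  severity?: UPGAssessment;
  frequency_count?: number;
  frequency_period?: string;  // ISO-8601 duration, e.g. "P7D"
  frequency_rating?: "constant" | "regular" | "occasional" | "rare" | "other";
  affected_users_estimate?: number;
  reproducibility?: "always" | "frequent" | "intermittent" | "rare" | "once";
  steps_to_reproduce?: string;
}

const previewSymptom: Symptom = {
  id: "00000000-0000-0000-0000-000000000000", // placeholder, not a real id
  type: "symptom",
  title: "Directors abandon the Safe Change approval modal without confirming or rejecting",
  symptom_description:
    "After the agent previews a structural change, directors close the modal without taking action. " +
    "The workspace is left in a pending-approval state that blocks further agent actions.",
  first_observed_at: "2026-05-13T10:00:00Z",
  severity: { label: "Mild inconvenience" },
  frequency_count: 340,
  frequency_period: "P7D",
  frequency_rating: "regular",
  affected_users_estimate: 210,
  reproducibility: "frequent",
  steps_to_reproduce:
    "Create a new workspace, describe a process to the Builder agent, wait for the change proposal, " +
    "open the approval modal, then close it without acting.",
};

console.log(previewSymptom.title);
```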

Relationships

4 edge types connected to this entity.

Parents

Entities that can contain this type

Investigation: investigation_surfaces_symptom

Cross-References

Contextual links across the graph

Root Cause: root_cause_causes_symptom

Monitor: monitor_detects_symptom

Incident: symptom_triggers_incident
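The four edge types can be written down as a typed endpoint table, which makes it possible to reject an edge wired the wrong way round. This sketch rests on two assumptions: that each edge runs from the entity named first in the edge name to the entity named last, and that the other entity type identifiers follow the same snake_case pattern as `symptom`. `EDGE_ENDPOINTS` and `isValidEdge` are hypothetical names.

```typescript
type SymptomEdgeType =
  | "investigation_surfaces_symptom"
  | "root_cause_causes_symptom"
  | "monitor_detects_symptom"
  | "symptom_triggers_incident";

// Assumption: type identifiers other than "symptom" are guessed from edge names.
type EntityType = "investigation" | "root_cause" | "monitor" | "incident" | "symptom";

// Assumption: direction is read off the edge name (subject, verb, object).
const EDGE_ENDPOINTS: Record<SymptomEdgeType, { from: EntityType; to: EntityType }> = {
  investigation_surfaces_symptom: { from: "investigation", to: "symptom" },
  root_cause_causes_symptom: { from: "root_cause", to: "symptom" },
  monitor_detects_symptom: { from: "monitor", to: "symptom" },
  symptom_triggers_incident: { from: "symptom", to: "incident" },
};

// True when the endpoints match the table for the given edge type.
function isValidEdge(edge: SymptomEdgeType, from: EntityType, to: EntityType): boolean {
  const expected = EDGE_ENDPOINTS[edge];
  return expected.from === from && expected.to === to;
}

console.log(isValidEdge("monitor_detects_symptom", "monitor", "symptom")); // true
console.log(isValidEdge("symptom_triggers_incident", "incident", "symptom")); // false
```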

Graph Position

Symptom: 1 parent, 3 cross-references

Definition

A symptom is an observable sign that something is wrong, such as an error spike or a slow page, held apart from its cause. It opens a diagnostic chain, triggering an investigation that traces back to a root cause and forward to a fix.

Examples

Recurring onboarding complaints

Eight support tickets in a week all describe "can't find where to invite teammates", a symptom pointing to a navigation problem, but not yet the cause.
