The recorded outcome of executing a test (passed, failed, skipped, or errored) together with the evidence that explains why.
A test result is the verdict a test case or suite emits when it runs: pass, fail, skip, or the unwelcome fourth state, flaky. It is the signal a team gates releases on, and its authority rests entirely on being trustworthy. A result is only as useful as the confidence that a green means safe and a red means broken; the moment a suite produces results that flip without the code changing, the whole signal degrades.
The pass/fail verdict is foundational to automated testing, and for most of its history the model was binary: a test asserted something, and the assertion either held or it did not. Skip arrived as an honest third state for tests deliberately not run: on an unsupported platform, behind a disabled feature flag, or pending a fix.
The fourth state, flaky, is the one that reshaped how teams read results. A flaky test produces both passing and failing outcomes against the same code, so any single result from it says nothing reliable about whether the product works. Google quantified the scale of the problem in its widely cited post, Flaky Tests at Google and How We Mitigate Them (2016): about 1.5% of all test runs reported a flaky result, and roughly 84% of the pass-to-fail transitions it observed involved a flaky test. At that scale, a single run's result becomes nearly meaningless as a gate, because most red is noise.
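The definition above reduces to a check over run history: a test is flaky if the same code has produced both a pass and a fail. A minimal sketch, assuming each run is recorded as a hypothetical (commit, outcome) pair with outcomes "pass", "fail", or "skip"; the function `is_flaky` is illustrative and not taken from Google's tooling or any test framework.

```python
from collections import defaultdict


def is_flaky(runs: list[tuple[str, str]]) -> bool:
    """Return True if any single commit has both a 'pass' and a 'fail' outcome.

    `runs` is a hypothetical history of (commit_sha, outcome) pairs,
    where outcome is 'pass', 'fail', or 'skip'.
    """
    outcomes_by_commit: dict[str, set[str]] = defaultdict(set)
    for commit, outcome in runs:
        outcomes_by_commit[commit].add(outcome)
    # Skips carry no verdict, so only pass/fail disagreement on one commit counts.
    return any({"pass", "fail"} <= seen for seen in outcomes_by_commit.values())


history = [("a1b2", "pass"), ("a1b2", "fail"), ("c3d4", "pass")]
print(is_flaky(history))  # True: commit a1b2 both passed and failed
```

Grouping by commit is the important design choice: a pass on one commit and a fail on the next may be a genuine regression, so only disagreement on identical code is treated as flakiness.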
That finding pushed the field from reading single results toward reading trends. Google built tooling to automatically quarantine tests above a flakiness threshold, removing them from the critical path and filing bugs against them. The lesson generalised: a result is a data point, and the honest reading of a suite is the pattern across runs, not the verdict of the latest one.
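Reading the trend rather than the latest verdict can be sketched as a rolling flakiness rate compared against a threshold. This is a minimal illustration, not Google's tooling: the 40-run window, the 5% threshold, and the choice to measure flakiness as the share of pass/fail flips between consecutive runs are all assumptions, and `flip_rate` and `decide` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class QuarantineDecision:
    flip_rate: float
    quarantine: bool


def flip_rate(outcomes: list[str]) -> float:
    """Fraction of consecutive run pairs whose pass/fail verdict differs.

    Skipped runs are ignored because they carry no verdict.
    """
    verdicts = [o for o in outcomes if o in ("pass", "fail")]
    if len(verdicts) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(verdicts, verdicts[1:]) if prev != cur)
    return flips / (len(verdicts) - 1)


def decide(outcomes: list[str], window: int = 40, threshold: float = 0.05) -> QuarantineDecision:
    """Quarantine a test whose recent flip rate exceeds `threshold` (hypothetical policy).

    Assumes `outcomes` are ordered oldest first and come from runs
    with no related code change.
    """
    recent = outcomes[-window:]
    rate = flip_rate(recent)
    return QuarantineDecision(flip_rate=rate, quarantine=rate > threshold)


print(decide(["pass"] * 40).quarantine)                        # False
print(decide(["pass", "fail"] * 5 + ["pass"] * 30).quarantine)  # True
```

Counting flips rather than raw failures means a test that fails consistently, which is what a real regression looks like, scores low and stays in the gate, while one that alternates scores high. Both the window and the threshold are placeholders a team would tune.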
A team's CI shows a failing build. The failed test is an end-to-end checkout test asserting that an order confirmation appears within three seconds. An engineer re-runs it; it passes. Re-runs again; fails. Same commit each time. The result is flaky, and the cause is a race between the test and an asynchronous email-send the page waits on.
Treating the single red as a real failure would have blocked a clean release for nothing. Reading the trend instead, the team sees this test has flipped on 9 of the last 40 runs with no related code changes. They quarantine it out of the merge gate, file a bug, and fix the race by waiting on an explicit confirmation event rather than a timeout. The result becomes deterministic again, and only then does it return to gating releases. The verdict regained its authority by becoming repeatable.
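The fix in this example, waiting on an explicit confirmation event instead of a fixed delay, can be illustrated with Python's `threading.Event`. This is a sketch under assumptions: the background thread stands in for the asynchronous email send, `send_confirmation_email` and `place_order` are hypothetical names, and the latencies are invented. The generous cap on `wait()` exists only so a genuinely broken build fails instead of hanging; it is not what synchronises the test.

```python
import random
import threading
import time


def send_confirmation_email(done: threading.Event) -> None:
    """Hypothetical async email send whose latency varies from run to run."""
    time.sleep(random.uniform(0.01, 0.2))
    done.set()  # explicit signal: the confirmation now exists


def place_order() -> threading.Event:
    """Start the background send and return the event a test can wait on."""
    done = threading.Event()
    threading.Thread(target=send_confirmation_email, args=(done,), daemon=True).start()
    return done


def flaky_check() -> bool:
    """Races the email send: passes only if it happens to finish within 0.1 s."""
    done = place_order()
    time.sleep(0.1)  # fixed delay, analogous to the original timeout assertion
    return done.is_set()


def deterministic_check() -> bool:
    """Blocks until the confirmation event fires; the 10 s cap only guards against hangs."""
    done = place_order()
    return done.wait(timeout=10.0)


print(sum(flaky_check() for _ in range(10)))          # passes out of 10; typically varies
print(all(deterministic_check() for _ in range(10)))  # True
```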
In the Unified Product Graph, Test Result sits in the quality and testing region as the emitted signal of verification. It is produced both at the granular level, through test_case_produces_test_result (Test Case produces Test Result, a causal edge), and at the aggregate level, through test_suite_produces_test_result (Test Suite produces Test Result, also causal); a single case result tells you which assertion moved. Modelling the result as its own node, separate from the case that emits it, is what lets the graph hold history: a sequence of results over time exposes flakiness, surfaces trends, and turns "the build is red" into the more useful question of whether this red has been red before.
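Keeping each result as its own node is what makes "has this red been red before?" answerable. A minimal sketch of that query, assuming results are stored as records that point back to the case that produced them; the record borrows field names from the schema below (`result_status`, `executed_at`, `version_tested`, `error_message`), while `test_case_id`, `seen_before`, and the example values are hypothetical and not part of any Unified Product Graph API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TestResult:
    test_case_id: str    # hypothetical link to the producing test case
    result_status: str   # passed | failed | timed_out | skipped | interrupted
    executed_at: datetime
    version_tested: str
    error_message: str = ""


def seen_before(history: list[TestResult], latest: TestResult) -> list[TestResult]:
    """Return earlier failures of the same case with the same error message.

    An empty list suggests a new regression; a non-empty list means this red
    has been red before, possibly on other versions, which hints at flakiness.
    """
    return [
        r for r in history
        if r.test_case_id == latest.test_case_id
        and r.result_status == "failed"
        and r.error_message == latest.error_message
        and r.executed_at < latest.executed_at
    ]


history = [
    TestResult("checkout-e2e", "failed", datetime(2026, 4, 1), "1.4.0", "confirmation not shown"),
    TestResult("checkout-e2e", "passed", datetime(2026, 4, 2), "1.4.0"),
]
latest = TestResult("checkout-e2e", "failed", datetime(2026, 4, 5), "1.4.1", "confirmation not shown")
print(len(seen_before(history, latest)))  # 1
```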
Type-specific fields on BaseNode
result_status (string): Outcome of this execution. passed = all assertions met; failed = one or more assertions failed; timed_out = execution exceeded the timeout; skipped = test was not run; interrupted = test was stopped mid-run.
duration_ms (number): Duration of this execution, in milliseconds.
retry_index (number): Retry index: 0 = first attempt, 1 = first retry, and so on.
error_message (string): Error message, if the test failed.
version_tested (string): Version of the product or build under test.
executed_at (string): ISO 8601 timestamp of the execution, for example "2026-04-05T14:30:00Z".
attachments (string): Comma-separated list of attachment names or URLs (screenshots, logs, traces).
comment (string): Notes or commentary about this result.
id (string, required): Unique identifier (UUID).
type (NodeType, required): Discriminator for the entity type.
title (string, required): Display name.
description (string): Optional detailed description.
status (string): Lifecycle status.
tags (string[]): Freeform tags for filtering.
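Because each retry is recorded as a separate result with its own `retry_index`, a pass that only arrives on a retry stays visible in the data instead of being hidden by the retry. A minimal sketch, assuming the results for one test case are plain dictionaries using the field names above; the grouping by `version_tested` and the `passed_on_retry` helper are hypothetical.

```python
from collections import defaultdict


def passed_on_retry(results: list[dict]) -> set[str]:
    """Return version_tested values where attempt 0 failed but a later retry passed.

    Each dict is assumed to describe one execution of a single test case and
    to carry `version_tested`, `retry_index`, and `result_status`.
    """
    attempts: dict[str, list[dict]] = defaultdict(list)
    for r in results:
        attempts[r["version_tested"]].append(r)

    suspicious = set()
    for version, runs in attempts.items():
        runs.sort(key=lambda r: r["retry_index"])
        # A first attempt that failed or timed out...
        first_failed = runs[0]["retry_index"] == 0 and runs[0]["result_status"] in ("failed", "timed_out")
        # ...followed by a pass on the same build is the per-run signature of flakiness.
        later_passed = any(r["result_status"] == "passed" for r in runs[1:])
        if first_failed and later_passed:
            suspicious.add(version)
    return suspicious


results = [
    {"version_tested": "1.4.0", "retry_index": 0, "result_status": "failed"},
    {"version_tested": "1.4.0", "retry_index": 1, "result_status": "passed"},
    {"version_tested": "1.4.1", "retry_index": 0, "result_status": "passed"},
]
print(passed_on_retry(results))  # {'1.4.0'}
```

A CI setup that only keeps the final attempt would report 1.4.0 as green; keeping every attempt as its own result is what lets this check exist.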
2 edge types are connected to this entity:
test_suite_produces_test_result
test_case_produces_test_result