What is an Experiment?

A structured test that confronts a hypothesis with evidence to produce a decision, fixing what to measure and the result that would change the team's mind.

What is the purpose of an Experiment?

An experiment is the mechanism for turning a hypothesis into a learning, generating the evidence base for product decisions. It moves a team from "we think" to "we know" by defining success before the test runs, so the bar cannot drift to fit the outcome.

How do you use an Experiment in product management?

Choose the cheapest experiment that can validate the riskiest assumption. Define the method, timeline, and success criteria before starting. Every experiment should produce at least one learning, even if the result is "inconclusive".

Where does the concept of an Experiment come from?

Controlled experimentation in product development was championed by Eric Ries ("The Lean Startup", 2011) and systematised by David Bland and Alexander Osterwalder in "Testing Business Ideas" (2019), which catalogued 44 experiment methods, from landing page tests to Wizard of Oz prototypes.

What are common mistakes with an Experiment?

Running an experiment without a falsifiable hypothesis or a pre-declared success threshold means any result can be rationalised after the fact. Stopping the test the moment numbers look favourable bakes in peeking bias and inflates false positives. Treating every change as an experiment when there is no real uncertainty wastes traffic and slows the team. And shipping the winning variant without recording why it won leaves the next person to relitigate the same question.

What is an example of an Experiment?

5-day prototype test with 8 users: Built a clickable Figma prototype of the graph editor and tested with 8 target users. 6/8 completed the core task in under 3 minutes, validating the interaction model.

🧫 Experiment

A structured test with hypothesis, measure, and decision threshold fixed in advance, not interpreted after the fact.

Validation · Discovery, Research & Validation · type: 'experiment' · interface: BaseNode

Description

An experiment is a structured test designed to confront a hypothesis with evidence and produce a decision. It specifies what you will do, what you will measure, and the result that would change your mind, all fixed before you start.

Origin & evolution

Controlled experimentation is centuries old in science, but its move into product practice came through The Lean Startup (2011). Eric Ries argued that a startup is an engine for learning, and that experiments, not opinions or executive intuition, are how it earns validated knowledge. He paired this with innovation accounting: run an experiment, measure the effect on the metrics that matter, and decide whether to persevere or pivot. The build-measure-learn loop made the experiment the heartbeat of the method.

The practice matured from "run an experiment" into "choose the right experiment". David Bland and Alexander Osterwalder's Testing Business Ideas (2019) catalogued 44 named experiments, sequenced from cheap and weak (interviews, landing pages, Wizard of Oz tests) to expensive and strong (working prototypes, paid pilots). Their rule of thumb reorders intuition: test desirability before feasibility, run the cheapest experiment that produces meaningful evidence, and aim it at the riskiest assumption first. A polished test of a safe belief removes little risk.

Jake Knapp's Sprint (2016) extends the same discipline by fixing both the timebox and the test format: a five-day process in which a team builds a realistic prototype on day four and watches five target users interact with it on day five. Its contribution to experiment design is structural rather than a catalogue of methods: compressing the cycle to a single week forces the team to commit to one critical question before building anything, and a high-fidelity façade tested with real users can resolve that question without shipping a line of production code.

The cost-of-evidence frame is now the discipline's centre of gravity. Each experiment is judged by how much risk it removes per unit of time and money spent, which is why a fake door or a concierge test can beat months of engineering when the open question is whether anyone wants the thing at all.

How it works in practice

A fintech team believes small businesses will pay for automated invoice chasing. The riskiest assumption is desirability: will anyone actually pay? Building the feature would take a quarter, so they design the cheapest test that still draws real evidence. They ship a "smoke test" landing page describing the service with a "Start free trial" button, drive 2,000 visitors through a small ad spend, and define the decision rule in advance: a click-through to the pricing step above 8% justifies building a concierge version, anything under 4% kills the idea.

The result comes back at 11%. That is not proof the product will succeed, and they do not treat it as such. It clears one specific hurdle, desirability, cheaply, and licenses the next, more expensive experiment: a manually-operated concierge service for ten real customers to test whether they stay once the chasing actually happens. The experiments are staged so each one buys the right to run the next.
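
The pre-declared decision rule in this example can be written down before any traffic is sent. What follows is a minimal sketch, not the team's actual tooling: the names `DecisionRule`, `smokeTestRule`, and `decide` are hypothetical, and because the example does not say what happens between 4% and 8%, the sketch assumes that band is recorded as inconclusive.

```typescript
// Hypothetical names throughout; only the thresholds come from the example above.
type Decision = "build_concierge" | "kill" | "inconclusive";

interface DecisionRule {
  proceedAbove: number; // click-through rate that justifies the next experiment
  killBelow: number; // click-through rate that ends the idea
}

// Fixed before the test runs, so the bar cannot drift to fit the outcome.
const smokeTestRule: DecisionRule = { proceedAbove: 0.08, killBelow: 0.04 };

function decide(clicks: number, visitors: number, rule: DecisionRule): Decision {
  if (visitors <= 0) throw new Error("visitors must be positive");
  const rate = clicks / visitors;
  if (rate > rule.proceedAbove) return "build_concierge";
  if (rate < rule.killBelow) return "kill";
  // Assumption: the source leaves the 4%-8% band undefined; treat it as inconclusive.
  return "inconclusive";
}

// 11% of 2,000 visitors is 220 clicks.
console.log(decide(220, 2000, smokeTestRule)); // build_concierge
```

Keeping the rule as data rather than as a judgement made after the fact is the point: the same function is applied whatever the result turns out to be.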

Experiment vs. its neighbours

A/B test is one experiment type: a live, randomised, controlled comparison of two variants on real traffic, strong for optimising an existing flow. The broader category of experiment includes tests with no live product at all, such as interviews and landing pages, used precisely when there is nothing yet to split-test.
Prototype is an artefact you build to provoke a reaction; an experiment is the test that surrounds it, including the hypothesis, the metric, and the decision rule. A prototype with no measurement plan is a demo, not a test.
Hypothesis is the prediction; the experiment is the procedure that judges it. A common failure is running activity (shipping, demoing, surveying) with no hypothesis attached, which generates data nobody can act on.

In the graph

In the Unified Product Graph, experiment is a self-nesting hub in the Discovery, Research & Validation region, sitting downstream of the hypothesis it tests. The structure separates intent from execution: an experiment receives its design via experiment_plan_designs_experiment and its execution via experiment_executed_as_experiment_run, so a planned test and an actual run are distinct nodes. Output is first-class: the run validates the claim through experiment_run_validates_hypothesis and measures its effect through experiment_run_measures_metric, while the experiment yields experiment_produces_learning and experiment_produces_evidence. Because evidence and learning are their own entities, the graph can answer which beliefs have actually been tested, and which still rest on intuition.
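
To make the last point concrete, here is a rough sketch of how typed edges could answer "which beliefs have actually been tested". It assumes a hypothetical in-memory edge list: the `Edge` shape, the edge directions, and the node ids are illustrative assumptions, and only the edge type names `hypothesis_tested_by_experiment` and `experiment_produces_evidence` come from this page.

```typescript
// Hypothetical in-memory edge list; only the edge type names come from the page.
interface Edge {
  type: string;
  from: string; // source node id (direction is an assumption)
  to: string; // target node id
}

// A hypothesis counts as "tested" here only if an experiment that tests it
// has also produced evidence.
function testedHypotheses(edges: Edge[]): Set<string> {
  const experimentsWithEvidence = new Set<string>(
    edges.filter((e) => e.type === "experiment_produces_evidence").map((e) => e.from)
  );
  return new Set<string>(
    edges
      .filter(
        (e) => e.type === "hypothesis_tested_by_experiment" && experimentsWithEvidence.has(e.to)
      )
      .map((e) => e.from)
  );
}

// Illustrative ids only.
const edges: Edge[] = [
  { type: "hypothesis_tested_by_experiment", from: "hyp-trust", to: "exp-rollout" },
  { type: "experiment_produces_evidence", from: "exp-rollout", to: "ev-1" },
  { type: "hypothesis_tested_by_experiment", from: "hyp-speed", to: "exp-planned" },
];

const tested = testedHypotheses(edges);
console.log(Array.from(tested).join(", ")); // hyp-trust
```

Here `hyp-speed` has an experiment attached but no evidence yet, so under this sketch's definition it still rests on intuition.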

Worked example: Trellis

Trellis tested the trust hypothesis by shipping explainable, reversible agent changes to 10 percent of workspaces and measuring approved-versus-reverted agent changes and week-4 retention against a control. The experiment is the sharpest way to distinguish between a hypothesis about governed autonomy and the competing assumption that raw generation speed is what directors actually want.

Preview

Title: Safe Change explainability rollout: 10 percent of workspaces, 4-week measurement
Method: A/B test measuring approved versus reverted agent changes and week-4 retention
Start date: 2025-03-01
End date: 2025-03-28
Sample size: 400
Expected lift: 20
Expected lift unit: percentage
Actual lift: 27

Properties

Type-specific fields on BaseNode

method (string): Experimental method (e.g. "A/B test", "usability study", "smoke test")
start_date (string): ISO start date
end_date (string): ISO end date
sample_size (number): Targeted participants or observations
expected_lift (number): Expected change in the primary metric
expected_lift_unit (enum: percentage, absolute, ratio): Unit of `expected_lift`
actual_lift (number): Observed change in the primary metric

Inherited from BaseNode (6 fields)

id (string, required): Unique identifier (UUID)
type (NodeType, required): Discriminator for the entity type
title (string, required): Display name
description (string): Optional detailed description
status (string): Lifecycle status
tags (string[]): Freeform tags for filtering
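
The field tables above might translate into TypeScript roughly as follows. This is a sketch reconstructed from the tables, not the published type definitions: `NodeType` is narrowed to one literal because the full union is not shown here, fields without a "required" marker are assumed optional, and `LiftUnit`, `safeChangeRollout`, and `metExpectation` are hypothetical names. The example values are the ones in the Trellis preview; the id is a placeholder.

```typescript
// Assumption: the full NodeType union is not shown on this page.
type NodeType = "experiment";

interface BaseNode {
  id: string; // Unique identifier (UUID)
  type: NodeType; // Discriminator for the entity type
  title: string; // Display name
  description?: string;
  status?: string; // Lifecycle status
  tags?: string[];
}

type LiftUnit = "percentage" | "absolute" | "ratio";

// Type-specific fields are assumed optional because none is marked required.
interface Experiment extends BaseNode {
  type: "experiment";
  method?: string;
  start_date?: string; // ISO date
  end_date?: string; // ISO date
  sample_size?: number;
  expected_lift?: number;
  expected_lift_unit?: LiftUnit;
  actual_lift?: number;
}

// Values from the Trellis preview; the id is a placeholder, not a real identifier.
const safeChangeRollout: Experiment = {
  id: "00000000-0000-0000-0000-000000000000",
  type: "experiment",
  title: "Safe Change explainability rollout: 10 percent of workspaces, 4-week measurement",
  method: "A/B test measuring approved versus reverted agent changes and week-4 retention",
  start_date: "2025-03-01",
  end_date: "2025-03-28",
  sample_size: 400,
  expected_lift: 20,
  expected_lift_unit: "percentage",
  actual_lift: 27,
};

// Hypothetical helper: did the observed lift reach the lift declared in advance?
function metExpectation(e: Experiment): boolean | undefined {
  if (e.expected_lift === undefined || e.actual_lift === undefined) return undefined;
  return e.actual_lift >= e.expected_lift;
}

console.log(metExpectation(safeChangeRollout)); // true
```

Storing `expected_lift` alongside `actual_lift` keeps the pre-declared bar on the record, so the comparison does not depend on anyone's memory of what was promised.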

Lifecycle

5 phases, initial: planned · template: STUDY

Relationships

6 edge types connected to this entity.

Parents

Entities that can contain this type

Experiment Plan: experiment_plan_designs_experiment

Children

Entities this type can contain

Experiment Run: experiment_executed_as_experiment_run
Learning: experiment_produces_learning
Evidence: experiment_produces_evidence

Cross-References

Contextual links across the graph

Hypothesis: hypothesis_tested_by_experiment
Hypothesis: experiment_validates_hypothesis

Graph Position

1 parent · 🧫 Experiment · 3 children · 2 cross-references

Definition

An experiment is a structured test, designed to validate or invalidate a hypothesis, that fixes what to measure, how to measure it, and what result would change the team's mind. It turns a hypothesis into evidence.

Usage Guidance

Choose the cheapest experiment that can validate the riskiest assumption.
Define the method, timeline, and success criteria before starting.
Every experiment should produce at least one learning, even if the result is "inconclusive".

Anti-Patterns

Running an experiment without a falsifiable hypothesis or a pre-declared success threshold means any result can be rationalised after the fact.
Stopping the test the moment numbers look favourable bakes in peeking bias and inflates false positives.
Treating every change as an experiment when there is no real uncertainty wastes traffic and slows the team.
Shipping the winning variant without recording why it won leaves the next person to relitigate the same question.
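
The peeking anti-pattern above can be illustrated with a small A/A simulation, in which both arms share the same true conversion rate and any "significant" result is therefore a false positive. Everything here is an illustrative assumption rather than a recommended analysis method: the seeded generator, the pooled two-proportion z statistic with a 1.96 cut-off (a nominal 5% two-sided test), and the sample sizes and look interval.

```typescript
// Small seeded PRNG (mulberry32) so the simulation is repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two-proportion z statistic with a pooled standard error, n users per arm.
function zStat(convA: number, convB: number, n: number): number {
  const pooled = (convA + convB) / (2 * n);
  const se = Math.sqrt((2 * pooled * (1 - pooled)) / n);
  return se === 0 ? 0 : (convA / n - convB / n) / se;
}

function simulate(runs: number, perArm: number, lookEvery: number, rate: number, seed: number) {
  const rand = mulberry32(seed);
  let peekingHits = 0; // runs where ANY interim or final look crossed the cut-off
  let fixedHits = 0; // runs where the single pre-planned final look crossed it
  for (let r = 0; r < runs; r++) {
    let a = 0;
    let b = 0;
    let sawSignificant = false;
    for (let n = 1; n <= perArm; n++) {
      if (rand() < rate) a++;
      if (rand() < rate) b++;
      const isLook = n % lookEvery === 0 || n === perArm;
      if (isLook && Math.abs(zStat(a, b, n)) > 1.96) {
        sawSignificant = true;
        if (n === perArm) fixedHits++;
      }
    }
    if (sawSignificant) peekingHits++;
  }
  return { peeking: peekingHits / runs, fixed: fixedHits / runs };
}

// Both arms convert at 10%; look every 100 users up to 1,000 per arm.
const result = simulate(2000, 1000, 100, 0.1, 42);
console.log(result);
```

Because the final look is one of the looks, the peeking rate can never be lower than the fixed-horizon rate, and with repeated looks it commonly lands well above the nominal 5%. That gap is the cost of stopping the moment the numbers look favourable.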

Examples

5-day prototype test with 8 users

Built a clickable Figma prototype of the graph editor and tested with 8 target users. 6/8 completed the core task in under 3 minutes, validating the interaction model.
