What is an Experiment Plan?

The design of a test written down before any data exists: the hypothesis, method, sample, success metric, and stopping rule.

What is the purpose of an Experiment Plan?

An experiment plan states what a team believes, how it will test that belief, and what result would change its mind, all committed before any outcome is seen, so the bar for success cannot shift afterwards. Kept separate from the run that executes it, one plan can be run more than once, with each run judged against the criteria fixed in advance.

How do you use an Experiment Plan in product management?

Write the hypothesis as a falsifiable prediction with a number and a threshold ("adding social proof lifts trial-to-paid conversion by at least 3 points"). Fix the success criteria, sample size, and duration before launch, never after seeing results. Connect the plan to the hypothesis or assumption it tests.

What are common mistakes with an Experiment Plan?

The cardinal sin is deciding the success criteria after the data is in. Once the threshold is movable, the experiment can only confirm what you hoped. Underpowered plans are nearly as bad: a test with too small a sample produces noise that gets read as signal. And a plan with no stated way to be wrong is not an experiment but a launch with a chart attached.


Experiment Plan

Where does the concept of an Experiment Plan come from?

The idea that a test must be planned before it runs is old, but its modern formalisation comes from the replication crisis in psychology and biomedicine, where p-hacking and HARKing dressed up post-hoc patterns as predictions. Pre-registration is the corrective: Brian Nosek and colleagues at the Center for Open Science built the Open Science Framework and argued the case in The preregistration revolution (PNAS, 2018), and Alison Ledgerwood later separated the prediction from the analysis plan. Product experimentation inherited the discipline, choosing a metric and a stop rule up front.

What is an example of an Experiment Plan?

Pricing-page A/B plan. Hypothesis: showing annual pricing by default lifts annual-plan selection by ≥5 points. Setup: 50/50 split, 2-week run, minimum 4,000 visitors per arm, significance at p < 0.05.

The hypothesis, setup, and success criteria for a test.

Validation | Discovery, Research & Validation | type: 'experiment_plan' | interface: BaseNode


Description

An experiment plan is the designed test written down before any data exists: the hypothesis, the method, the sample, the success metric, and the rule that says when to stop. It is fixed in advance, so the success criterion is committed before any outcome is seen.


Origin & evolution

The idea that a test must be planned before it runs is old, but its modern formalisation comes from the replication crisis in psychology and biomedicine. Researchers noticed that statistically significant results kept failing to reproduce, and a chief culprit was p-hacking: trying many analyses, then reporting only the one that crossed the significance threshold. A close relative, HARKing, hypothesising after the results are known, dressed up post-hoc patterns as predictions.

Pre-registration is the corrective. You write the plan, including the analysis, and lodge it with a date stamp before collecting data. Brian Nosek and colleagues at the Center for Open Science built the Open Science Framework to host these plans, and argued the case directly in The preregistration revolution (PNAS, 2018). Their core claim is that pre-registration sharpens the line between hypothesis generation and hypothesis testing, so a reader can tell which results were predicted and which were discovered after the fact.

A useful debate followed. Alison Ledgerwood pointed out that a pre-registration mixes two separable things: the prediction and the analysis plan. Confirming a registered prediction with a registered analysis is the strong case; everything else is exploration, which is valuable and honest as long as it is labelled as such. Product experimentation inherited this discipline wholesale. An A/B test with a metric and a stop rule chosen up front resists the same biases that pre-registration was built to defeat.

Eric Ries made the same logic central to product practice in The Lean Startup, which frames a startup's core activity as running experiments rather than executing plans. His Build-Measure-Learn loop treats every product bet as a testable hypothesis: build the minimum version needed to generate data, measure a pre-chosen outcome, and learn whether the assumption holds. The experiment plan is what keeps that loop honest: without a success criterion committed before build, the measurement step becomes an audit of whatever the data happens to show.

How it works in practice

A growth team believes a shorter signup form will lift completions. The experiment plan states the hypothesis precisely: cutting the form from nine fields to four raises signup completion from a baseline of 38% by at least three percentage points. It fixes the method (a 50/50 split on new visitors), the primary metric (completion rate), and a power calculation: at 80% power and a two-sided 5% significance level, roughly 4,200 visitors per arm are needed to detect that effect. It names a stop rule: run for two full weeks, with no peeking and stopping early when the line looks good. It also pre-commits a guardrail metric, downstream activation, so a cheap completion win that produces worse users gets caught. With all of that written before launch, the result is interpretable whichever way it lands.
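The per-arm sample size in a plan like this depends on the power and significance level the team commits to. The following is a minimal sketch of the standard normal-approximation formula for comparing two proportions, assuming 80% power and a two-sided 5% significance level; neither setting comes from the example itself, and the function name `sampleSizePerArm` is hypothetical. Under those assumptions the signup-form example needs a little under 4,200 visitors per arm; stricter power, a lower significance threshold, or corrections for multiple comparisons push the number up.

```typescript
// Sketch only: per-arm sample size for detecting a difference between two
// conversion rates, using the normal-approximation formula.
// The z-values are hard-coded for the ASSUMED settings below.
const Z_ALPHA_TWO_SIDED_05 = 1.96; // two-sided 5% significance (assumption)
const Z_POWER_80 = 0.8416;         // 80% power (assumption)

function sampleSizePerArm(baselineRate: number, targetRate: number): number {
  const inRange = (p: number): boolean => p > 0 && p < 1;
  if (!inRange(baselineRate) || !inRange(targetRate)) {
    throw new RangeError("rates must be strictly between 0 and 1");
  }
  const delta = targetRate - baselineRate;
  if (delta === 0) {
    throw new RangeError("target rate must differ from the baseline");
  }
  // Sum of the two binomial variances, one per arm.
  const variance =
    baselineRate * (1 - baselineRate) + targetRate * (1 - targetRate);
  const zSum = Z_ALPHA_TWO_SIDED_05 + Z_POWER_80;
  return Math.ceil((zSum * zSum * variance) / (delta * delta));
}

// Signup-form example: 38% baseline, minimum lift of three percentage points.
const perArm = sampleSizePerArm(0.38, 0.41);
console.log(`visitors needed per arm: ${perArm}`);
```

Because the required sample grows with the inverse square of the effect size, halving the minimum detectable lift roughly quadruples the traffic the plan has to budget for.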

Experiment plan vs. its neighbours

Hypothesis is the testable belief, a single falsifiable statement. The experiment plan is the apparatus built around it: how the hypothesis will be put at risk. One hypothesis can demand several experiment plans before it is settled.
Experiment run is a single execution of the plan against real users at a real time. The plan is the design; the run is one instance of carrying it out. Separating them lets the same plan be run twice, on different cohorts, and compared honestly.
Metric is what gets measured. The plan names a metric as its success criterion and points at it, but the metric exists independently and is usually tracked across many experiments.

In the graph

In the Unified Product Graph, experiment_plan sits in the validation region as the bridge between belief and evidence. A hypothesis reaches for it through hypothesis_requires_experiment_plan, the plan points at its yardstick through experiment_plan_targets_metric, and execution is recorded as a distinct node linked by experiment_plan_ran_as_experiment_run. Growth work connects through growth_campaign_tests_via_experiment_plan. Keeping plan, run, and result as separate connected nodes is the structural version of pre-registration: the design is committed and queryable before any outcome is attached to it, so nobody can quietly rewrite the test to fit the answer.
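To illustrate why keeping plan and run as separate nodes is useful, here is a minimal sketch of querying the runs attached to one plan. The edge type names are taken from the text above; the `Edge` shape (`type`, `from`, `to`), the node ids, and the helper `runsOfPlan` are hypothetical and are not the graph's actual API.

```typescript
// Edge type names come from the page; everything else here is an assumption.
type EdgeType =
  | "hypothesis_requires_experiment_plan"
  | "experiment_plan_targets_metric"
  | "experiment_plan_ran_as_experiment_run"
  | "growth_campaign_tests_via_experiment_plan";

// Hypothetical edge record: a typed, directed link between two node ids.
interface Edge {
  type: EdgeType;
  from: string;
  to: string;
}

// Returns the ids of every run recorded against the given plan.
function runsOfPlan(edges: readonly Edge[], planId: string): string[] {
  return edges
    .filter(
      (e) =>
        e.type === "experiment_plan_ran_as_experiment_run" && e.from === planId
    )
    .map((e) => e.to);
}

// Illustrative ids only.
const edges: Edge[] = [
  { type: "hypothesis_requires_experiment_plan", from: "hyp-1", to: "plan-1" },
  { type: "experiment_plan_targets_metric", from: "plan-1", to: "metric-completion" },
  { type: "experiment_plan_ran_as_experiment_run", from: "plan-1", to: "run-a" },
  { type: "experiment_plan_ran_as_experiment_run", from: "plan-1", to: "run-b" },
];

console.log(runsOfPlan(edges, "plan-1"));
```

Both runs resolve to the same plan node, so each can be judged against the one set of criteria that was committed before either started.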

Worked example: Trellis

The experiment plan for the 10 percent rollout records Trellis's trust hypothesis, specifies the explainable-and-reversible-change setup as the treatment condition, and names approved-versus-reverted agent changes as the primary success criterion alongside week-4 retention. Capturing the plan as a distinct artifact means the learning that follows can be traced back to the exact setup and hypothesis rather than reconstructed after the fact.

Preview


Experiment Plan

Safe Change 10 percent rollout: A/B plan for explainable reversible agent change

Method: a_b_test
Projected reach: Almost no one
Projected impact: Minimal
Confidence: Confident
Cost estimate: Trivial

Success criteria: Approved-versus-reverted agent change ratio is at least 20 percent higher in the preview group and week-4 retention is at least 8 percentage points above control.

Sample size: 400

Planned start date: 2025-03-01

Planned end date: 2025-03-28

Properties

Type-specific fields on BaseNode

method (enum)

Experimental method. Drives renderer and analysis tooling.

Values: a_b_test, multivariate, qual_interview, prototype_test, fake_door, wizard_of_oz, longitudinal

success_criteria (string)

Plain-English description of "passing"

sample_size (number)

Targeted participants or observations for the planned test. Absorbed from `test_plan` (UPG-678) when it re-homed to QA; the planning sample size now lives on the validation plan.

projected_reach (assessment)

Projected reach: how many people the run is expected to touch (UPGAssessment)

Reach (5-point) scale:

Almost no one: Affects <5% of users
A few: Affects 5-20% of users
Some: Affects 20-50% of users
Most: Affects 50-80% of users
Nearly everyone: Affects >80% of users
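The reach scale can be read as a lookup from an estimated share of users to a label. The sketch below is one hedged way to encode it; the function name `reachLabel` is hypothetical, and how the exact boundary values (5%, 20%, 50%, 80%) are assigned is an assumption, since the ranges on the scale touch at their edges.

```typescript
type ReachLabel =
  | "Almost no one"
  | "A few"
  | "Some"
  | "Most"
  | "Nearly everyone";

// Maps a share of users (0 to 1) to the 5-point reach label.
// ASSUMPTION: a value on a shared boundary goes to the higher bucket,
// except exactly 80%, which stays in "Most" because the top bucket is ">80%".
function reachLabel(shareOfUsers: number): ReachLabel {
  if (shareOfUsers < 0 || shareOfUsers > 1) {
    throw new RangeError("shareOfUsers must be between 0 and 1");
  }
  if (shareOfUsers < 0.05) return "Almost no one";
  if (shareOfUsers < 0.2) return "A few";
  if (shareOfUsers < 0.5) return "Some";
  if (shareOfUsers <= 0.8) return "Most";
  return "Nearly everyone";
}

console.log(reachLabel(0.03)); // a change touching 3% of users
console.log(reachLabel(0.65)); // a change touching 65% of users
```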

projected_impact (assessment)

Projected impact on the target metric (UPGAssessment)

Impact (5-point) scale:

Minimal: Barely moves the needle
Low: Small improvement
Moderate: Noticeable improvement
High: Significant improvement
Transformative: Game-changing

confidence (assessment)

Team confidence at plan-time (UPGAssessment, scale `confidence_5`)

Confidence (5-point) scale:

Guessing: No evidence
Hunch: Anecdotal evidence
Some evidence: A few data points
Confident: Multiple data sources
Data-backed: Strong quantitative evidence

cost_estimate (assessment)

Cost estimate at plan-time (UPGAssessment)

Effort (5-point) scale:

Trivial: Hours
Small: Days
Medium: 1-2 weeks
Significant: Weeks to months
Massive: Months+

planned_start_date (string)

Planned start date

planned_end_date (string)

Planned end date

Inherited from BaseNode (6 fields)

id (string, required)

Unique identifier (UUID)

type (NodeType, required)

Discriminator for the entity type

title (string, required)

Display name

description (string)

Optional detailed description

status (string)

Lifecycle status

tags (string[])

Freeform tags for filtering
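Taken together, the type-specific and inherited fields suggest a shape along the following lines. This is a sketch under stated assumptions, not the published schema: the field names and required markers come from the lists above, but treating the unmarked fields as optional, typing the assessment fields as plain label strings (the UPGAssessment shape is not shown here), and the placeholder `id` are all assumptions.

```typescript
type ExperimentMethod =
  | "a_b_test"
  | "multivariate"
  | "qual_interview"
  | "prototype_test"
  | "fake_door"
  | "wizard_of_oz"
  | "longitudinal";

// Inherited fields; only id, type, and title are marked required on the page.
interface BaseNode {
  id: string;
  type: string; // NodeType on the page; simplified to string in this sketch
  title: string;
  description?: string;
  status?: string;
  tags?: string[];
}

// ASSUMPTION: assessment fields are modelled as their scale labels.
interface ExperimentPlan extends BaseNode {
  type: "experiment_plan";
  method?: ExperimentMethod;
  success_criteria?: string;
  sample_size?: number;
  projected_reach?: string;
  projected_impact?: string;
  confidence?: string;
  cost_estimate?: string;
  planned_start_date?: string;
  planned_end_date?: string;
}

// The Trellis preview card expressed as a node.
const trellisPlan: ExperimentPlan = {
  id: "00000000-0000-4000-8000-000000000000", // placeholder UUID, not real
  type: "experiment_plan",
  title:
    "Safe Change 10 percent rollout: A/B plan for explainable reversible agent change",
  status: "drafted", // initial lifecycle phase
  method: "a_b_test",
  success_criteria:
    "Approved-versus-reverted agent change ratio is at least 20 percent higher " +
    "in the preview group and week-4 retention is at least 8 percentage points above control.",
  sample_size: 400,
  projected_reach: "Almost no one",
  projected_impact: "Minimal",
  confidence: "Confident",
  cost_estimate: "Trivial",
  planned_start_date: "2025-03-01",
  planned_end_date: "2025-03-28",
};

console.log(trellisPlan.title);
```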

Lifecycle

4 phases, initial: drafted


Relationships

7 edge types connected to this entity.

Parents

Entities that can contain this type

Hypothesis: hypothesis_requires_experiment_plan

Growth Campaign: growth_campaign_tests_via_experiment_plan

Pricing Strategy: pricing_strategy_tests_experiment_plan

Children

Entities this type can contain

Experiment: experiment_plan_designs_experiment

Experiment Run: experiment_plan_ran_as_experiment_run

Cross-References

Contextual links across the graph

Metric: experiment_plan_targets_metric

Behavioral Segment: experiment_plan_targets_behavioral_segment

Graph Position

Experiment Plan: 3 parents, 2 children, 2 cross-references

Definition

An experiment plan is the design of a test written before any data exists: its hypothesis, method, sample, success metric, and stopping rule. It is kept separate from the run so one design can be executed more than once and each run is judged against criteria fixed in advance.

Usage Guidance

Write the hypothesis as a falsifiable prediction with a number and a threshold ("adding social proof lifts trial-to-paid conversion by at least 3 points").
Fix the success criteria, sample size, and duration before launch, never after seeing results.
Connect the plan to the hypothesis or assumption it tests.
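The guidance above amounts to freezing the pass criteria before launch and then judging each run only against them. A minimal sketch of that idea follows; the names `CommittedCriteria`, `ArmResult`, and `judgeRun` are hypothetical, and the check deliberately ignores statistical significance, which a real analysis would also need to pre-commit.

```typescript
// Criteria written down before launch; readonly so they cannot drift afterwards.
interface CommittedCriteria {
  readonly minLiftPoints: number;   // minimum lift in percentage points
  readonly minSamplePerArm: number; // visitors each arm must reach
}

interface ArmResult {
  visitors: number;
  conversions: number;
}

type Verdict = "pass" | "fail" | "inconclusive";

// Judges one run against the pre-committed criteria.
// Simplification: compares observed lift only, with no significance test.
function judgeRun(
  criteria: CommittedCriteria,
  control: ArmResult,
  treatment: ArmResult
): Verdict {
  const minVisitors = Math.min(control.visitors, treatment.visitors);
  if (minVisitors <= 0 || minVisitors < criteria.minSamplePerArm) {
    return "inconclusive"; // underpowered: do not read noise as signal
  }
  const controlRate = control.conversions / control.visitors;
  const treatmentRate = treatment.conversions / treatment.visitors;
  const liftPoints = (treatmentRate - controlRate) * 100;
  return liftPoints >= criteria.minLiftPoints ? "pass" : "fail";
}

// Illustrative numbers only.
const criteria: CommittedCriteria = Object.freeze({
  minLiftPoints: 3,
  minSamplePerArm: 4000,
});

console.log(
  judgeRun(
    criteria,
    { visitors: 4000, conversions: 800 },
    { visitors: 4000, conversions: 940 }
  )
);
```

Returning "inconclusive" for a short sample, rather than "fail", keeps an underpowered run from being read as evidence in either direction.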

Anti-Patterns

The cardinal sin is deciding the success criteria after the data is in.
Once the threshold is movable, the experiment can only confirm what you hoped.
Underpowered plans are nearly as bad: a test with too small a sample produces noise that gets read as signal.
And a plan with no stated way to be wrong is not an experiment but a launch with a chart attached.

Examples

Pricing-page A/B plan

Hypothesis: showing annual pricing by default lifts annual-plan selection by ≥5 points. Setup: 50/50 split, 2-week run, minimum 4,000 visitors per arm, significance at p < 0.05.

Onboarding-checklist plan

Hypothesis: a 3-step activation checklist raises day-7 retention by ≥2 points. Success measured on the cohort entering during the run window, with a guardrail metric on support-ticket volume.
