A structured plan for testing a hypothesis
A test plan is the written specification for validating a single hypothesis: the method you will use, the metric that counts as success, the sample you will run it against, and the threshold at which you stop and decide. It exists because a hypothesis on its own is just a sentence. The plan is what turns a belief into something a team can run, observe, and be wrong about on purpose.
The discipline traces to the lean startup movement's insistence that beliefs be tested cheaply before they are built expensively. The clearest codification arrived with David J. Bland and Alexander Osterwalder's Testing Business Ideas (2019), which cataloged 43 experiment types organised by cost, time, and strength of evidence, each framed as a way to attack a stated hypothesis under desirability, feasibility, or viability.
Bland's earlier work, developed alongside Jeff Gothelf and Josh Seiden, sharpened the targeting logic. Assumptions mapping ranks a team's beliefs by how risky each one is and how little evidence supports it, so that testing effort lands on the assumption whose failure would sink the whole idea. That assumption became known as the riskiest assumption, and testing it first is the economic case for writing a test plan at all: you spend on evidence where the payoff of being wrong early is highest.
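The ranking step can be sketched as a sort. This is a minimal illustration, not Bland's published method: it assumes a hypothetical 1 to 5 score for impact and for existing evidence, and orders beliefs by highest impact first, then thinnest evidence. The £19 belief comes from the example later on this page; the other two beliefs and every score are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    statement: str
    impact: int    # hypothetical 1-5 scale: how badly the idea fails if this belief is wrong
    evidence: int  # hypothetical 1-5 scale: how much evidence already supports it


def riskiest_first(assumptions: list[Assumption]) -> list[Assumption]:
    """Order assumptions so high-impact, weakly evidenced beliefs come first."""
    # Sort by impact descending, then by evidence ascending (thinner evidence = riskier).
    return sorted(assumptions, key=lambda a: (-a.impact, a.evidence))


# Illustrative inputs only; the scores are invented.
beliefs = [
    Assumption("Solo founders want automated bookkeeping", impact=4, evidence=3),
    Assumption("Solo founders will pay £19 a month", impact=5, evidence=1),
    Assumption("We can parse bank exports reliably", impact=3, evidence=4),
]

for a in riskiest_first(beliefs):
    print(a.impact, a.evidence, a.statement)
```

The first item in the sorted list is the one a test plan should target next; a real map may weigh the two axes differently, so treat the lexicographic ordering as a stand-in.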
Practice has settled on a separation that older "test everything" habits blurred: a test plan is scoped to one hypothesis and one decision. It names success before the test runs, which is the discipline that stops teams from reading whatever result they get as confirmation.
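Naming success before the run can be enforced in data rather than left to willpower. The sketch below assumes a hypothetical PreregisteredPlan type whose success criteria can be edited only until the test starts; SuccessCriterion, the metric name and the freeze-on-start rule are illustrative choices, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    metric: str       # hypothetical metric name, e.g. "payment_intent_rate"
    threshold: float  # minimum rate that counts as success, e.g. 0.015 for 1.5%


@dataclass
class PreregisteredPlan:
    hypothesis: str
    criteria: tuple[SuccessCriterion, ...] = ()
    started: bool = False

    def set_criteria(self, criteria: tuple[SuccessCriterion, ...]) -> None:
        # Criteria may only be written while the plan is still a draft.
        if self.started:
            raise RuntimeError("success criteria are frozen once the test has started")
        self.criteria = criteria

    def start(self) -> None:
        # Refuse to run a test that has not named success in advance.
        if not self.criteria:
            raise ValueError("define success criteria before starting the test")
        self.started = True


plan = PreregisteredPlan("Solo founders will pay £19 a month")
plan.set_criteria((SuccessCriterion("payment_intent_rate", 0.015),))
plan.start()

try:
    # Moving the goalposts after the run has begun is rejected.
    plan.set_criteria((SuccessCriterion("payment_intent_rate", 0.005),))
except RuntimeError as err:
    print(err)
```

Raising on a late edit, rather than silently accepting it, is the point of the design: a threshold lowered after the run starts would reopen exactly the confirmation bias the plan exists to prevent.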
A subscription team believes that solo founders will pay £19 a month for an automated bookkeeping feature. That belief is the riskiest assumption on their map: high impact, thin evidence. The test plan reads as follows. Method: a pricing page with a live checkout, driven by a £400 ad spend. Sample: 600 visitors from the solo-founder segment. Success criterion: at least 4% click through to checkout and at least 1.5% complete payment intent. Decision rule: below 1.5% payment intent, the feature is reframed or dropped; at or above, it proceeds to a built prototype.
The test runs for nine days. Click-through lands at 5.1%, but payment-intent completion stalls at 0.7%. The pre-committed rule does its job: interest is real, willingness to pay at £19 is not. The team learns this for £400 rather than for a quarter of engineering.
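The example's pre-committed rule is small enough to write down as a function. This sketch hard-codes the thresholds from the plan above and reports each success criterion separately from the decision, because the stated decision rule keys on payment intent alone; the function name and return shape are assumptions made for illustration.

```python
def evaluate(click_through: float, payment_intent: float) -> tuple[dict[str, bool], str]:
    """Score observed rates against the example plan's pre-committed thresholds.

    Rates are fractions of visitors: 0.051 means 5.1%.
    """
    criteria_met = {
        "click_through": click_through >= 0.04,     # at least 4% reach checkout
        "payment_intent": payment_intent >= 0.015,  # at least 1.5% complete payment intent
    }
    # The decision rule keys on payment intent alone: at or above 1.5% proceeds.
    if criteria_met["payment_intent"]:
        decision = "proceed to a built prototype"
    else:
        decision = "reframe or drop the feature"
    return criteria_met, decision


met, decision = evaluate(click_through=0.051, payment_intent=0.007)
print(met)       # {'click_through': True, 'payment_intent': False}
print(decision)  # reframe or drop the feature
```

Keeping the thresholds as literals inside the function mirrors the plan's intent: the numbers were fixed before the nine-day run, so nothing in the evaluation step should be able to change them.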
In the Unified Product Graph, Test Plan sits in the validation region as the bridge between a belief and its evidence. A hypothesis connects down to it via hypothesis_planned_via_test_plan (Hypothesis planned via Test Plan), and the plan connects forward to its execution via test_plan_ran_as_experiment_run (Test Plan ran as Experiment Run). Both edges are hierarchical, which encodes the real dependency: a plan that validates no hypothesis is busywork, and a plan that never ran as an experiment is an intention with no outcome. The structure makes the riskiest-assumption discipline queryable, because you can ask which hypotheses still lack a plan and which plans still lack a run.
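The two "still lacks" questions can be sketched against an in-memory edge list. The representation here, (source_id, edge_type, target_id) triples with direction read from the edge names, is an assumption; the graph's actual storage and query interface are not described on this page. The function also flags plans that validate no hypothesis, the busywork case above.

```python
Edge = tuple[str, str, str]  # (source_id, edge_type, target_id); hypothetical representation


def coverage_gaps(hypotheses: set[str], plans: set[str], edges: list[Edge]) -> dict[str, list[str]]:
    """Find hypotheses with no plan, plans with no hypothesis, and plans with no run."""
    # hypothesis_planned_via_test_plan points from a hypothesis to a test plan.
    planned = [(s, t) for s, kind, t in edges if kind == "hypothesis_planned_via_test_plan"]
    # test_plan_ran_as_experiment_run points from a test plan to an experiment run.
    ran = {s for s, kind, _ in edges if kind == "test_plan_ran_as_experiment_run"}

    hypotheses_with_plan = {s for s, _ in planned}
    plans_with_hypothesis = {t for _, t in planned}

    return {
        "hypotheses_without_plan": sorted(hypotheses - hypotheses_with_plan),
        "plans_without_hypothesis": sorted(plans - plans_with_hypothesis),
        "plans_without_run": sorted(plans - ran),
    }


edges = [
    ("h1", "hypothesis_planned_via_test_plan", "p1"),
    ("p1", "test_plan_ran_as_experiment_run", "r1"),
]
print(coverage_gaps({"h1", "h2"}, {"p1", "p2"}, edges))
# {'hypotheses_without_plan': ['h2'], 'plans_without_hypothesis': ['p2'], 'plans_without_run': ['p2']}
```

Because both edges are hierarchical, each gap list maps directly to a next action: write a plan, attach the plan to a hypothesis, or run it.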
Type-specific fields (in addition to BaseNode)
plan_type (string): Test type
sample_size (number): Participants or observations
duration (string): Run duration, e.g. "2 weeks"
success_criteria (string): Criteria determining whether the test passes
Fields inherited from BaseNode
id (string, required): Unique identifier (UUID)
type (NodeType, required): Discriminator for the entity type
title (string, required): Display name
description (string): Optional detailed description
status (string): Lifecycle status
tags (string[]): Freeform tags for filtering
Lifecycle: 4 phases, starting at drafted.
2 edge types connected to this entity.
hypothesis_planned_via_test_plan
test_plan_ran_as_experiment_run
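The fields listed above can be mirrored in a typed record for local validation. This is a hypothetical Python rendering, not a published SDK: the class name, the "test_plan" discriminator value and the positive-sample-size check are assumptions; the field names, types, required flags and the drafted initial status come from the schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TestPlanNode:
    # BaseNode fields; only id, type and title are marked required.
    id: str                              # UUID string
    type: str                            # NodeType discriminator (exact values not listed here)
    title: str
    description: Optional[str] = None
    status: Optional[str] = None         # lifecycle status; the initial phase is "drafted"
    tags: list[str] = field(default_factory=list)
    # Type-specific fields; none are marked required in the schema.
    plan_type: Optional[str] = None
    sample_size: Optional[float] = None  # schema type "number"
    duration: Optional[str] = None       # e.g. "2 weeks"
    success_criteria: Optional[str] = None


def validate(node: TestPlanNode) -> list[str]:
    """Return a list of problems; an empty list means the node looks well formed."""
    problems = []
    try:
        uuid.UUID(node.id)  # raises ValueError on a malformed UUID string
    except ValueError:
        problems.append("id is not a valid UUID")
    if not node.type:
        problems.append("type is required")
    if not node.title:
        problems.append("title is required")
    # Assumed business rule, not stated in the schema: a sample must be positive.
    if node.sample_size is not None and node.sample_size <= 0:
        problems.append("sample_size must be positive")
    return problems


node = TestPlanNode(
    id=str(uuid.uuid4()),
    type="test_plan",  # assumed discriminator value
    title="£19 bookkeeping pricing test",
    status="drafted",
    plan_type="pricing page with live checkout",
    sample_size=600,
    duration="9 days",
    success_criteria="at least 1.5% complete payment intent",
)
print(validate(node))  # []
```

Returning a list of problems instead of raising on the first one lets a form or import job report every missing field at once.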