A structured test designed to validate or invalidate a hypothesis. Experiments define what to measure, how to measure it, and what success looks like.
An experiment is a structured test designed to confront a hypothesis with evidence and produce a decision. It specifies what you will do, what you will measure, and the result that would change your mind, all fixed before you start. The craft is choosing the cheapest test that meaningfully reduces the largest risk, because a flawless study of a trivial question wastes the same time as a sloppy one.
Controlled experimentation is centuries old in science, but its move into product practice came through The Lean Startup (2011). Eric Ries argued that a startup is an engine for learning, and that experiments, not opinions or executive intuition, are how it earns validated knowledge. He paired this with innovation accounting: run an experiment, measure the effect on the metrics that matter, and decide whether to persevere or pivot. The build-measure-learn loop made the experiment the heartbeat of the method.
The practice matured from "run an experiment" into "choose the right experiment". David Bland and Alexander Osterwalder's Testing Business Ideas (2019) catalogued 44 named experiments, sequenced from cheap and weak (interviews, landing pages, Wizard of Oz tests) to expensive and strong (working prototypes, paid pilots). Their rule of thumb reorders intuition: test desirability before feasibility, run the cheapest experiment that produces meaningful evidence, and aim it at the riskiest assumption first. A polished test of a safe belief is theatre.
The cost-of-evidence frame is now the discipline's centre of gravity. Each experiment is judged by how much risk it removes per unit of time and money spent, which is why a fake door or a concierge test can beat months of engineering when the open question is whether anyone wants the thing at all.
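One way to make "risk removed per unit of time and money" concrete is to score candidate experiments with a single ratio. The sketch below is illustrative only: the `CandidateExperiment` shape, the `riskReduction` estimate, the flat `dailyRate` used to convert time into money, and every number in it are assumptions made for the example, not part of the method described here.

```typescript
// Hypothetical shape: none of these field names come from the source text.
interface CandidateExperiment {
  label: string;
  riskReduction: number; // team's own estimate (0..1) of how much open risk the test retires
  days: number;          // calendar time needed to run the test
  spend: number;         // direct money spent, in one currency
}

// Assumption: time is converted to money with a flat daily rate so both costs share a unit.
function evidencePerCost(c: CandidateExperiment, dailyRate: number): number {
  const totalCost = c.days * dailyRate + c.spend;
  if (totalCost <= 0) throw new Error("total cost must be positive");
  return c.riskReduction / totalCost;
}

// Returns a new array, best ratio first; the input array is left untouched.
function rankByEvidencePerCost(
  cs: CandidateExperiment[],
  dailyRate: number
): CandidateExperiment[] {
  return cs
    .slice()
    .sort((a, b) => evidencePerCost(b, dailyRate) - evidencePerCost(a, dailyRate));
}

// Placeholder numbers, invented purely to exercise the function.
const candidateTests: CandidateExperiment[] = [
  { label: "build the feature", riskReduction: 0.9, days: 90, spend: 0 },
  { label: "fake door landing page", riskReduction: 0.4, days: 5, spend: 500 },
];

const rankedTests = rankByEvidencePerCost(candidateTests, 1000);
console.log(rankedTests.map((c) => c.label)); // the fake door ranks first with these inputs
```

With these placeholder inputs the stronger but slower test loses to the weaker, cheaper one, which is the trade-off the cost-of-evidence frame is meant to surface. The ratio is only as good as the risk estimates fed into it.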
A fintech team believes small businesses will pay for automated invoice chasing. The riskiest assumption is desirability: will anyone actually pay? Building the feature would take a quarter, so they design the cheapest test that still draws real evidence. They ship a "smoke test" landing page describing the service with a "Start free trial" button, drive 2,000 visitors through a small ad spend, and define the decision rule in advance: a click-through to the pricing step above 8% justifies building a concierge version, anything under 4% kills the idea.
The result comes back at 11%. That is not proof the product will succeed, and they do not treat it as such. It clears one specific hurdle, desirability, cheaply, and licenses the next, more expensive experiment: a manually operated concierge service for ten real customers to test whether they stay once the chasing actually happens. The experiments are staged so each one buys the right to run the next.
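The decision rule from this example can be written down before any traffic arrives, which is what keeps the result from being reinterpreted afterwards. The following is a minimal sketch: the names `DecisionRule`, `Verdict`, and `decide` are hypothetical, and the "inconclusive" outcome for rates between 4% and 8% is an assumption, since the example does not say what the team does in that band.

```typescript
type Verdict = "proceed" | "kill" | "inconclusive";

// Readonly fields: the thresholds are fixed before the test starts.
interface DecisionRule {
  readonly proceedAbove: number; // a rate strictly above this licenses the next experiment
  readonly killBelow: number;    // a rate strictly below this ends the idea
}

// Thresholds taken from the worked example: above 8% proceed, under 4% kill.
const smokeTestRule: DecisionRule = { proceedAbove: 0.08, killBelow: 0.04 };

function decide(clicks: number, visitors: number, rule: DecisionRule): Verdict {
  if (visitors <= 0) throw new Error("visitors must be positive");
  const rate = clicks / visitors;
  if (rate > rule.proceedAbove) return "proceed";
  if (rate < rule.killBelow) return "kill";
  // Assumption: the example leaves the 4% to 8% band undefined, so it is flagged here.
  return "inconclusive";
}

// 11% of 2,000 visitors corresponds to 220 click-throughs.
console.log(decide(220, 2000, smokeTestRule)); // "proceed"
```

Returning a verdict rather than a boolean makes the unspecified middle band visible, so a team has to decide in advance what an ambiguous result means.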
In the Unified Product Graph, Experiment is a self-nesting hub in the Discovery, Research & Validation region, sitting downstream of the hypothesis it tests. The structure separates intent from execution: an experiment holds its design via experiment_has_plan and its execution via experiment_executed_as_experiment_run (both hierarchy edges), so a planned test and an actual run are distinct nodes. Output is first-class: the run validates the claim through experiment_run_validates_hypothesis (a causal edge) and measures its effect through experiment_run_measures_metric (a cross-domain edge), while the experiment yields experiment_produces_learning and experiment_produces_evidence (both causal edges). Because evidence and learning are their own entities, the graph can answer which beliefs have actually been tested, and which still rest on intuition.
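As a rough illustration of the "which beliefs have actually been tested" query, the sketch below treats the graph as a flat list of typed edges. The six edge identifiers are the ones named in the text; the `GraphEdge` shape, the `from`/`to` field names, the helper functions, and all node ids are assumptions made for the example and may not match the real graph API.

```typescript
// Edge identifiers as named in the text.
type ExperimentEdgeType =
  | "experiment_has_plan"
  | "experiment_executed_as_experiment_run"
  | "experiment_run_validates_hypothesis"
  | "experiment_run_measures_metric"
  | "experiment_produces_learning"
  | "experiment_produces_evidence";

// Hypothetical edge record: source node id, target node id, edge type.
interface GraphEdge {
  type: ExperimentEdgeType;
  from: string;
  to: string;
}

// Hypotheses that at least one experiment run points at with a validates edge.
function testedHypotheses(edges: GraphEdge[]): string[] {
  const seen: string[] = [];
  for (const e of edges) {
    if (e.type === "experiment_run_validates_hypothesis" && seen.indexOf(e.to) === -1) {
      seen.push(e.to);
    }
  }
  return seen;
}

// Hypotheses from a known list that no run has touched yet.
function untestedHypotheses(all: string[], edges: GraphEdge[]): string[] {
  const tested = testedHypotheses(edges);
  return all.filter((h) => tested.indexOf(h) === -1);
}

// Made-up node ids, for illustration only.
const sampleEdges: GraphEdge[] = [
  { type: "experiment_has_plan", from: "exp-1", to: "plan-1" },
  { type: "experiment_executed_as_experiment_run", from: "exp-1", to: "run-1" },
  { type: "experiment_run_validates_hypothesis", from: "run-1", to: "hyp-pay" },
  { type: "experiment_produces_evidence", from: "exp-1", to: "ev-1" },
];

console.log(untestedHypotheses(["hyp-pay", "hyp-retain"], sampleEdges)); // ["hyp-retain"]
```

Because the validates edge hangs off the run rather than the experiment, a hypothesis with a planned but never executed experiment still shows up as untested in this sketch.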
Type-specific fields on BaseNode
method (string): Experimental method (e.g. "A/B test", "usability study", "smoke test")
start_date (string): ISO start date
end_date (string): ISO end date
sample_size (number): Targeted participants or observations
expected_lift (number): Expected change in the primary metric
expected_lift_unit (string): Unit of `expected_lift`
actual_lift (number): Observed change in the primary metric
id (string, required): Unique identifier (UUID)
type (NodeType, required): Discriminator for the entity type
title (string, required): Display name
description (string): Optional detailed description
status (string): Lifecycle status
tags (string[]): Freeform tags for filtering
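Read as TypeScript, the fields above might look like the following. This is a sketch under assumptions rather than the graph's actual type definitions: the literal `"experiment"` used for `NodeType`, the interface name `ExperimentNode`, and the helper `liftGap` are all hypothetical. Fields not marked required in the listing are modelled as optional.

```typescript
// Assumption: NodeType is a string discriminator; only one illustrative member is shown.
type NodeType = "experiment";

interface BaseNode {
  id: string;           // required, UUID
  type: NodeType;       // required discriminator
  title: string;        // required display name
  description?: string;
  status?: string;      // lifecycle status
  tags?: string[];
}

interface ExperimentNode extends BaseNode {
  type: "experiment";
  method?: string;             // e.g. "A/B test", "usability study", "smoke test"
  start_date?: string;         // ISO date
  end_date?: string;           // ISO date
  sample_size?: number;
  expected_lift?: number;
  expected_lift_unit?: string; // unit of expected_lift
  actual_lift?: number;
}

// Hypothetical helper: observed minus expected lift, or undefined if either is missing.
function liftGap(e: ExperimentNode): number | undefined {
  if (e.expected_lift === undefined || e.actual_lift === undefined) return undefined;
  return e.actual_lift - e.expected_lift;
}

const smokeTestNode: ExperimentNode = {
  id: "00000000-0000-0000-0000-000000000000", // placeholder UUID
  type: "experiment",
  title: "Invoice chasing smoke test",
  method: "smoke test",
  status: "planned",
  sample_size: 2000,
};

console.log(liftGap(smokeTestNode)); // undefined, since no lift values are recorded yet
```

Keeping `expected_lift` and `actual_lift` as separate optional fields means a planned experiment can record its prediction before any result exists.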
4 phases (initial: planned)
4 edge types connected to this entity.
experiment_has_plan
experiment_executed_as_experiment_run
experiment_produces_learning
experiment_produces_evidence