A contractual promise to customers about service levels, with defined consequences if it is breached.
A service level agreement is the external, contractual promise a provider makes to a customer about how a service will perform, with consequences attached when the promise breaks. It names a target, says how performance is measured, and specifies the penalty, usually a service credit, for falling short. The defining feature is the consequence. An aspiration with no penalty is a statement of intent; the moment money rides on the number, it becomes an agreement.
The agreement and the engineering behind it grew up together but stayed distinct. Google's site reliability engineering practice gave the cleanest separation of the three terms that are still routinely confused. A service level indicator (SLI) is a quantitative measure of some aspect of the service, such as latency or availability. A service level objective (SLO) is a target for that indicator, for example average latency under 120 milliseconds. A service level agreement (SLA) is the explicit contract with users that includes the consequences of meeting or missing the objectives it contains (Google SRE).
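The three terms can be kept apart in code as a measurement, a threshold on that measurement, and a threshold with a consequence attached. The following is a minimal sketch, not Google's implementation: the names `latency_sli`, `Slo`, and `Sla` are hypothetical, the 120 ms figure comes from the example above, and the 150 ms contractual limit and the "service credit" wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


def latency_sli(samples_ms: list[float]) -> float:
    """SLI: a measurement, here the mean request latency in milliseconds."""
    return mean(samples_ms)


@dataclass(frozen=True)
class Slo:
    """SLO: a target for the indicator (mean latency under a limit)."""
    max_mean_latency_ms: float = 120.0

    def is_met(self, sli_value: float) -> bool:
        return sli_value < self.max_mean_latency_ms


@dataclass(frozen=True)
class Sla:
    """SLA: an objective plus the consequence of missing it."""
    objective: Slo
    consequence: str  # assumed wording, not taken from any real contract


samples = [95.0, 110.0, 130.0, 105.0]  # made-up latency samples
sli = latency_sli(samples)
slo = Slo()
# Assumed contractual limit of 150 ms, looser than the internal target.
sla = Sla(objective=Slo(max_mean_latency_ms=150.0), consequence="service credit")

print(f"SLI={sli} ms, SLO met={slo.is_met(sli)}, SLA met={sla.objective.is_met(sli)}")
```

Only the `Sla` type carries a consequence; the indicator and the objective are purely engineering constructs.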
The hard-won lesson from that work is that the contractual number should sit looser than the internal one. Teams set the SLA threshold below the SLO they actually operate to, so the internal alarm sounds well before the contract is breached (Google Cloud). An SLA promising 99.9 per cent uptime is typically backed by an internal SLO of 99.95 per cent, leaving a buffer to fix problems before customers can claim credits.
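The size of the buffer is easier to see in minutes than in percentages. The sketch below assumes a 30-day measurement window and a purely time-based uptime figure; the helper name `allowed_downtime_minutes` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target_pct: float, window_days: int = 30) -> float:
    """Downtime a target tolerates over the window, in minutes."""
    window_minutes = window_days * MINUTES_PER_DAY
    return window_minutes * (100.0 - target_pct) / 100.0


sla_allowance = allowed_downtime_minutes(99.9)   # contractual threshold
slo_allowance = allowed_downtime_minutes(99.95)  # internal objective
buffer_minutes = sla_allowance - slo_allowance

print(f"SLA allows {sla_allowance:.1f} min, SLO allows {slo_allowance:.1f} min, "
      f"buffer {buffer_minutes:.1f} min")
```

Under these assumptions the internal objective is exhausted after roughly half the downtime the contract tolerates, and the remaining minutes are the time available to fix the problem before credits become claimable.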
The error budget formalised the buffer. The gap between perfect and the objective becomes a spendable quantity: as long as the budget holds, the team can ship and take risks; when it runs low, reliability work takes priority. The SLA sets the floor a business is willing to be paid to defend.
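An error budget can be tracked as the fraction of permitted failures not yet used. This is a rough sketch under stated assumptions: the budget is counted in failed requests, and the 25 per cent "low" threshold is an arbitrary illustrative value, not a standard. Both function names are hypothetical.

```python
def error_budget_remaining(slo_pct: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed_failures = total_requests * (100.0 - slo_pct) / 100.0
    if allowed_failures <= 0:
        raise ValueError("no error budget: need traffic and an objective below 100%")
    return 1.0 - failed_requests / allowed_failures


def release_policy(remaining: float, low_water_mark: float = 0.25) -> str:
    """Map the remaining budget to a priority (the threshold is an assumption)."""
    if remaining <= low_water_mark:
        return "prioritise reliability work"
    return "ship"


healthy = error_budget_remaining(99.95, total_requests=1_000_000, failed_requests=200)
strained = error_budget_remaining(99.95, total_requests=1_000_000, failed_requests=450)

print(release_policy(healthy))   # budget mostly intact
print(release_policy(strained))  # budget nearly spent
```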
A payments API publishes an SLA of 99.9 per cent monthly uptime, which permits about 43 minutes of downtime in a 30-day month. The contract specifies the remedy: a 10 per cent service credit for breach in the 99.0 to 99.9 band, scaling to 30 per cent below 99.0.
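The credit schedule in this example maps directly to a tiered lookup. The sketch below makes two assumptions the text does not settle: uptime of exactly 99.0 per cent falls in the 10 per cent band, and credits are applied to a monthly fee held in integer cents. The function names and the fee are hypothetical.

```python
def service_credit_pct(uptime_pct: float) -> int:
    """Credit owed, as a percentage of the monthly fee, for a measured uptime."""
    if uptime_pct >= 99.9:
        return 0   # SLA met, no credit
    if uptime_pct >= 99.0:
        return 10  # breach within the 99.0 to 99.9 band (boundary is assumed)
    return 30      # below 99.0


def credit_amount_cents(monthly_fee_cents: int, uptime_pct: float) -> int:
    """Credit in cents, using integer arithmetic to avoid rounding drift."""
    return monthly_fee_cents * service_credit_pct(uptime_pct) // 100


for uptime in (99.95, 99.5, 98.7):
    print(uptime, service_credit_pct(uptime), credit_amount_cents(200_000, uptime))
```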
Internally the team runs to an SLO of 99.95 per cent and watches an availability SLI computed from successful requests. In a month with a 25-minute outage, uptime lands at 99.94 per cent. The SLO is missed, so the on-call review and the error-budget freeze kick in, but the SLA holds, so no credits are owed. The buffer did its job: the internal target caught the problem the contract was never meant to absorb.
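The month described here can be checked with a few lines of arithmetic. This sketch assumes a 30-day month and treats the 25-minute outage as fully unavailable time, which is a time-based simplification of the request-based SLI mentioned above. The function name is hypothetical.

```python
SLO_PCT = 99.95  # internal objective
SLA_PCT = 99.9   # contractual threshold


def uptime_pct(outage_minutes: float, window_days: int = 30) -> float:
    """Time-based uptime over the window, as a percentage."""
    window_minutes = window_days * 24 * 60
    return 100.0 * (1.0 - outage_minutes / window_minutes)


uptime = uptime_pct(25)
slo_met = uptime >= SLO_PCT
sla_met = uptime >= SLA_PCT

print(f"uptime: {uptime:.2f}%")
if not slo_met:
    print("SLO missed: on-call review and error-budget freeze")
print("credits owed" if not sla_met else "SLA holds: no credits owed")
```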
In the Unified Product Graph, a service level agreement sits in the Customer Success region. The edge `service_level_objective_satisfies_service_level_agreement` (Service Level Objective satisfies Service Level Agreement, cross-domain) encodes the buffer relationship directly: the internal objective is what keeps the external promise. `product_guarantees_via_service_level_agreement` (Product guarantees via Service Level Agreement, hierarchy) ties the commitment to the product, `service_level_agreement_governs_service` (Service Level Agreement governs Service, cross-domain) binds it to the service it constrains, and `service_blueprint_contains_service_level_agreement` (Service Blueprint contains Service Level Agreement, hierarchy) places it in the operational design. Separating the agreement, objective, and indicator into distinct connected types is what stops the common error of treating a contractual promise and an engineering target as the same thing.
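The four edges named in this paragraph can be written down as typed source and target pairs. The representation below is an illustrative sketch, not the Unified Product Graph's actual storage format; the `Edge` tuple and the `neighbours` helper are hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    edge_type: str
    source: str
    target: str
    kind: str  # "cross-domain" or "hierarchy", as labelled in the text


SLA = "Service Level Agreement"

EDGES = [
    Edge("service_level_objective_satisfies_service_level_agreement",
         "Service Level Objective", SLA, "cross-domain"),
    Edge("product_guarantees_via_service_level_agreement",
         "Product", SLA, "hierarchy"),
    Edge("service_level_agreement_governs_service",
         SLA, "Service", "cross-domain"),
    Edge("service_blueprint_contains_service_level_agreement",
         "Service Blueprint", SLA, "hierarchy"),
]


def neighbours(node_type: str) -> set[str]:
    """Entity types directly connected to node_type, in either direction."""
    found = set()
    for edge in EDGES:
        if edge.source == node_type:
            found.add(edge.target)
        elif edge.target == node_type:
            found.add(edge.source)
    return found


print(sorted(neighbours(SLA)))
```

Because the objective and the agreement are separate node types joined by an edge, a query can ask which objectives back a given agreement without conflating the two.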
Type-specific fields on BaseNode

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `target` | string | | Target value for the primary metric (e.g. "99.9%", "< 200ms p95") |
| `measurement_window` | string | | Time period over which `target` is measured (e.g. "monthly", "quarterly") |
| `coverage_hours` | string | | Hours during which the SLA applies (e.g. "24/7", "business hours", "follow-the-sun") |
| `response_time_target` | string | | Target time to first acknowledgement of an incident (e.g. "15 minutes") |
| `resolution_time_target` | string | | Target time to incident resolution (e.g. "4 hours" for sev-1) |
| `agreement_term` | string | | Effective term of the agreement (e.g. "12 months", "auto-renewing annual") |
| `effective_date` | string | | ISO date on which the agreement takes effect |
| `expiry_date` | string | | ISO date on which the agreement expires. Pairs with `agreement_term` for renewal logic. |
| `owner` | string | | Party accountable on the service provider side |
| `consequence_of_breach` | string | | What happens if the SLA is breached (credits, penalties, escalation path) |
| `id` | string | required | Unique identifier (UUID) |
| `type` | NodeType | required | Discriminator for the entity type |
| `title` | string | required | Display name |
| `description` | string | | Optional detailed description |
| `status` | string | | Lifecycle status |
| `tags` | string[] | | Freeform tags for filtering |
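The fields above could be modelled as a single record. The following is a sketch only: the class name `ServiceLevelAgreementNode`, the mapping of `NodeType` to a plain string, the discriminator value `"service_level_agreement"`, and the sample values are all assumptions for illustration, not the graph's published schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceLevelAgreementNode:
    # Fields marked required in the listing
    id: str
    type: str  # NodeType discriminator, modelled here as a plain string
    title: str
    # Remaining base fields
    description: Optional[str] = None
    status: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    # Type-specific fields
    target: Optional[str] = None
    measurement_window: Optional[str] = None
    coverage_hours: Optional[str] = None
    response_time_target: Optional[str] = None
    resolution_time_target: Optional[str] = None
    agreement_term: Optional[str] = None
    effective_date: Optional[str] = None  # ISO date string
    expiry_date: Optional[str] = None     # ISO date string
    owner: Optional[str] = None
    consequence_of_breach: Optional[str] = None


# Sample values loosely based on the payments API example; all are illustrative.
node = ServiceLevelAgreementNode(
    id=str(uuid.uuid4()),
    type="service_level_agreement",  # assumed discriminator value
    title="Payments API uptime SLA",
    target="99.9%",
    measurement_window="monthly",
    coverage_hours="24/7",
    consequence_of_breach="10% credit from 99.0% to 99.9%, 30% below 99.0%",
)

print(node.title, node.target, node.measurement_window)
```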
6 edge types connected to this entity:
`product_guarantees_via_service_level_agreement`
`service_blueprint_contains_service_level_agreement`
`service_level_objective_satisfies_service_level_agreement`
`service_level_agreement_governs_service`
`service_level_agreement_measures_metric`
`service_level_agreement_covers_account`