A safety mechanism that constrains AI model output, such as a content filter, format validator, or toxicity check.
An AI guardrail is a runtime constraint placed around a model's behaviour: it inspects inputs before they reach the model and outputs before they reach the user, blocking what falls outside policy. Filtering personally identifiable information, refusing off-topic or unsafe requests, and catching jailbreak attempts are all guardrail jobs. The constraint lives outside the model's weights, which is what lets a team change the rules without retraining anything.
Guardrails emerged as a practical answer to a gap left by alignment. Training techniques such as reinforcement learning from human feedback steer a model's default behaviour, but they cannot guarantee that a deployed system stays on policy for every input. The industry response was an external enforcement layer. NVIDIA released NeMo Guardrails as an open-source toolkit in 2023, defining programmable rules for topic control, dialogue flow, and jailbreak prevention. Meta shipped Llama Guard the same year, a model trained specifically to classify whether an input or output violates a safety taxonomy.
The distinction that settled the field is that alignment is baked into the model and guardrails wrap it. Alignment changes what the model tends to do; a guardrail changes what the surrounding system permits, and it can be updated in minutes when a new attack appears.
A healthcare assistant runs every model response through two guardrails. The first scans for PII and redacts any patient name or record number before the text is logged. The second checks responses against a medical-advice policy and blocks anything that reads as a diagnosis, replacing it with a referral message. When a prompt-injection campaign tries to coax the model into ignoring its instructions, the input guardrail flags the override pattern and refuses before the model ever runs. None of these changes touched the model itself.
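The healthcare scenario above can be sketched as a thin wrapper around the model call. This is a minimal illustration and not a production detector: the regular expressions, the `MRN-` record-number format, the message strings, and every function name are assumptions made for the example. Patient-name detection is left out because it usually needs a dedicated entity recognizer.

```python
import re
from typing import Callable

# Hypothetical patterns, chosen only for illustration; a real deployment
# would rely on a maintained detector or a classifier model.
OVERRIDE_PATTERN = re.compile(
    r"ignore (all |any )?(previous|prior|your) instructions", re.IGNORECASE
)
RECORD_NUMBER = re.compile(r"\bMRN-\d{6}\b")  # assumed record-number format
DIAGNOSIS_PATTERN = re.compile(r"\byou (have|are suffering from)\b", re.IGNORECASE)

REFUSAL = "This request cannot be processed."
REFERRAL = "Please consult a clinician about this question."


def input_guardrail(prompt: str) -> bool:
    """Return True when the prompt may be passed to the model."""
    return OVERRIDE_PATTERN.search(prompt) is None


def redact_pii(text: str) -> str:
    """Replace record numbers before the text is logged or returned."""
    return RECORD_NUMBER.sub("[REDACTED]", text)


def enforce_advice_policy(text: str) -> str:
    """Swap responses that read as a diagnosis for a referral message."""
    return REFERRAL if DIAGNOSIS_PATTERN.search(text) else text


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Run the input guardrail, then the model, then both output guardrails."""
    if not input_guardrail(prompt):
        return REFUSAL  # the model is never invoked
    response = model(prompt)
    return enforce_advice_policy(redact_pii(response))
```

Because the rules are ordinary data outside the model, adding a new override pattern is a change to `OVERRIDE_PATTERN` only; the `model` callable stays untouched.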
In the Unified Product Graph, a guardrail sits in the AI and intelligence region as the constraint layer between a model and its users. It connects upward via the hierarchy edge "AI Model constrained by AI Guardrail" (ai_model_constrained_by_ai_guardrail) and downward to policy through the cross-domain edge "AI Guardrail enforces Security Policy" (ai_guardrail_enforces_security_policy). Those edges make the enforcement chain queryable: from a model, you can see every constraint on it, and from a security policy, you can see whether a runtime mechanism actually backs it.
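As a rough illustration of what "queryable" could mean here, the two edge types can be walked in either direction over a plain edge list. The edge type names come from this page; the `Edge` tuple, the node identifiers, and both helper functions are hypothetical and do not reflect the graph's actual query API.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str
    edge_type: str
    target: str


CONSTRAINED_BY = "ai_model_constrained_by_ai_guardrail"
ENFORCES = "ai_guardrail_enforces_security_policy"

# Hypothetical node identifiers, for illustration only.
EDGES = [
    Edge("model:triage-assistant", CONSTRAINED_BY, "guardrail:pii-redaction"),
    Edge("model:triage-assistant", CONSTRAINED_BY, "guardrail:advice-policy"),
    Edge("guardrail:pii-redaction", ENFORCES, "policy:patient-privacy"),
]


def guardrails_on(model_id: str, edges: List[Edge]) -> List[str]:
    """From a model, list every guardrail that constrains it."""
    return [
        e.target
        for e in edges
        if e.edge_type == CONSTRAINED_BY and e.source == model_id
    ]


def mechanisms_backing(policy_id: str, edges: List[Edge]) -> List[str]:
    """From a security policy, list the guardrails that enforce it."""
    return [
        e.source
        for e in edges
        if e.edge_type == ENFORCES and e.target == policy_id
    ]
```

An empty result from `mechanisms_backing` is the interesting case: it marks a policy that exists on paper with no runtime mechanism behind it.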
Type-specific fields on BaseNode
guardrail_type (string): Protection category
enforcement (string): Action when triggered
trigger_count (number): Times triggered
id (string, required): Unique identifier (UUID)
type (NodeType, required): Discriminator for the entity type
title (string, required): Display name
description (string): Optional detailed description
status (string): Lifecycle status
tags (string[]): Freeform tags for filtering
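One way to picture the field list is as a pair of typed records, with the three guardrail-specific fields layered on the shared base. This is a sketch under assumptions: the class names, the choice of `None` defaults for non-required fields, and the use of a plain string for `NodeType` are illustrative, since the page does not list the discriminator values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseNode:
    id: str     # required: unique identifier (UUID)
    type: str   # required: NodeType discriminator (values not listed here)
    title: str  # required: display name
    description: Optional[str] = None
    status: Optional[str] = None  # lifecycle status
    tags: List[str] = field(default_factory=list)


@dataclass
class AIGuardrail(BaseNode):
    guardrail_type: Optional[str] = None  # protection category
    enforcement: Optional[str] = None     # action when triggered
    trigger_count: Optional[int] = None   # times triggered


# Example values are hypothetical, including the "ai_guardrail" discriminator.
example = AIGuardrail(
    id="00000000-0000-0000-0000-000000000000",
    type="ai_guardrail",
    title="PII redaction",
    guardrail_type="pii",
    enforcement="redact",
    trigger_count=0,
)
```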
Lifecycle: 4 phases; initial phase is proposed.
2 edge types connected to this entity.
ai_model_constrained_by_ai_guardrail
ai_guardrail_enforces_security_policy