Safety & Moderation
Refusal policies, content filters and output validation.
Refusal Policy Guardrail
A drop-in policy block that defines what an assistant must refuse, how to refuse gracefully, and how to offer safe alternatives.
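A minimal sketch of such a policy block as data, assuming the request has already been classified into a category upstream. The category names, messages, and the `refuse` helper are hypothetical illustrations, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefusalRule:
    """One entry of a hypothetical refusal policy."""
    refusal: str      # brief, non-judgmental statement of what will not be done
    alternative: str  # safe alternative offered in its place


# Hypothetical policy table: category -> how to refuse and what to offer instead.
POLICY = {
    "weapons": RefusalRule(
        "I can't help with making weapons.",
        "I can point you to general information on firearm safety laws.",
    ),
    "medical_dosing": RefusalRule(
        "I can't give personalised dosing advice.",
        "A pharmacist or doctor can advise on dosing for your situation.",
    ),
}


def refuse(category: str) -> Optional[str]:
    """Return a graceful refusal plus alternative, or None if the category is allowed."""
    rule = POLICY.get(category)
    if rule is None:
        return None
    return f"{rule.refusal} {rule.alternative}"
```

Keeping the refusal and the alternative in separate fields makes it harder to ship a refusal that leaves the user with nowhere to go.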
Content Safety (NeMo)
These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check
Structured Output Guardrail
Constrains a model to emit valid, schema-conformant JSON and specifies what to do when validation fails: reask, repair, or refuse.
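A stdlib-only sketch of the validate, repair, reask, refuse loop. The toy `SCHEMA` (required key to exact type) stands in for a real JSON Schema validator, `generate` is any hypothetical callable that sends a prompt to the model, and the retry budget and refusal payload are assumptions.

```python
import json
from typing import Callable, Optional, Tuple

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the sketch fence-safe

# Hypothetical toy schema: required key -> exact Python type.
SCHEMA = {"name": str, "age": int}


def validate(text: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) if text is schema-conformant JSON, else (None, error)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for key, expected in SCHEMA.items():
        if key not in data:
            return None, f"missing key: {key}"
        if type(data[key]) is not expected:  # exact check, so True is not accepted as an int
            return None, f"wrong type for {key}"
    return data, None


def repair(text: str) -> str:
    """Cheap repair: strip a surrounding Markdown code fence, optionally tagged json."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        stripped = stripped.strip("`")
        if stripped.startswith("json"):
            stripped = stripped[len("json"):]
    return stripped.strip()


def guarded_call(generate: Callable[[str], str], prompt: str, max_reasks: int = 2) -> dict:
    """Validate, then repair, then reask; refuse once the retry budget is spent."""
    current = prompt
    for _ in range(max_reasks + 1):
        raw = generate(current)
        for candidate in (raw, repair(raw)):
            data, error = validate(candidate)
            if data is not None:
                return data
        # Reask: feed the last validation error back to the model.
        current = f"{prompt}\nYour previous output was rejected ({error}). Return only valid JSON."
    return {"refusal": "Unable to produce valid structured output."}
```

Trying a local repair before spending another model call keeps the common "JSON wrapped in a code fence" failure cheap.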
PII Redaction Guardrail
Instructs a model to detect and redact personally identifiable information from its inputs and outputs before responding.
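A deliberately simple, regex-only sketch of the redaction step. The patterns and placeholder labels are assumptions; dedicated detectors such as the Private AI and GLiNER rails listed below exist precisely because regexes miss names, addresses, and most free-form PII.

```python
import re

# Order matters: SSN runs before PHONE so an SSN is not relabelled as a phone number.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace every match with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

The same function can be applied to both the user input and the model output.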
Private AI (NeMo Guardrail)
PII DETECTION RAILS INPUT RAILS flow detect pii on input """Check if the user input has PII.""" $has_pii = await DetectPiiAction(source="input", text=
Injection Detection (NeMo Guardrail)
flow injection detection """ Reject, omit, or sanitize injection attempts from the bot. This rail operates on the $bot_message. """ response = await I
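A sketch of the three handling modes named above (reject, omit, sanitize) applied to a bot message. The two regex signatures are hypothetical stand-ins; a production rail would rely on a vetted rule set rather than these patterns.

```python
import re
from typing import Literal

# Hypothetical stand-in signatures for injected code in a bot message.
SIGNATURES = {
    "xss": re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL),
    "sqli": re.compile(r"(?:'\s*or\s*'1'\s*=\s*'1|;\s*drop\s+table\b)", re.IGNORECASE),
}


def handle_bot_message(message: str, mode: Literal["reject", "omit", "sanitize"]) -> str:
    """Return the (possibly cleaned) message, or raise when mode is 'reject'."""
    hits = [name for name, pattern in SIGNATURES.items() if pattern.search(message)]
    if not hits:
        return message
    if mode == "reject":
        raise ValueError(f"injection detected: {', '.join(hits)}")
    if mode == "omit":
        return ""  # drop the whole message
    # sanitize: remove only the matched spans and keep the rest
    for name in hits:
        message = SIGNATURES[name].sub("", message)
    return message
```

Sanitizing preserves the useful part of an answer but trusts the signatures to catch every variant; rejecting is the more conservative choice.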
GCP Moderate Text (NeMo Guardrail)
""" https://cloud.google.com/natural-language/docs/moderating-text Supported Violations: Safety Attribute Description Toxic Content that is rude, disr
Abc V2 (NeMo)
prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com
Llama Guard (NeMo Guardrail)
flow llama guard check input $llama_guard_response = await LlamaGuardCheckInputAction global $allowed $allowed = $llama_guard_response["allowed"] Poli
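A sketch of turning a raw Llama Guard reply into the `allowed` flag the flow reads, assuming the usual reply format: `safe`, or `unsafe` followed on the next line by comma-separated category codes such as `O3`. The `policy_violations` key name is an assumption, since the snippet is cut off.

```python
from typing import Dict, List, Union


def parse_llama_guard(reply: str) -> Dict[str, Union[bool, List[str]]]:
    """Parse 'safe' / 'unsafe\\n<codes>' into an allowed flag and violated categories."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    verdict = lines[0].lower() if lines else ""
    violations: List[str] = []
    if verdict == "unsafe" and len(lines) > 1:
        violations = [code.strip() for code in lines[1].split(",") if code.strip()]
    # Fail closed: only an explicit "safe" verdict is allowed.
    return {"allowed": verdict == "safe", "policy_violations": violations}
```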
Cleanlab (NeMo Guardrail)
""" https://cleanlab.ai/tlm/ https://help.cleanlab.ai/tutorials/tlm/ how-does-the-tlm-trustworthiness-score-work """ flow cleanlab trustworthiness """
Topic Safety (NeMo Guardrail)
flow topic safety check input $model $response = await TopicSafetyCheckInputAction(model_name=$model) global $on_topic $on_topic = $response["on_topic
Output Check (NeMo Guardrail)
flow self check output $allowed = await SelfCheckOutputAction if not $allowed if $system.config.enable_rails_exceptions send OutputRailException(messa
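The branch in this flow (raise an exception when rails exceptions are enabled, otherwise refuse) can be sketched in plain Python. `RailsConfig`, `OutputRailException`, and the refusal text here are hypothetical analogues of the flow's names, not the library's classes.

```python
from dataclasses import dataclass


@dataclass
class RailsConfig:
    """Hypothetical stand-in for the $system.config flag checked by the flow."""
    enable_rails_exceptions: bool = False


class OutputRailException(Exception):
    """Hypothetical Python analogue of the exception event the flow sends."""


REFUSAL = "I'm sorry, I can't respond to that."  # hypothetical refusal text


def apply_output_rail(bot_message: str, allowed: bool, config: RailsConfig) -> str:
    """Pass allowed messages through; otherwise raise or refuse, depending on config."""
    if allowed:
        return bot_message
    if config.enable_rails_exceptions:
        raise OutputRailException("Output not allowed")
    return REFUSAL
```

Raising lets the calling application decide how to surface the block; the canned refusal keeps the conversation going without extra handling.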
Sensitive Data Detection (NeMo Guardrail)
INPUT RAILS flow detect sensitive data on input """Check if the user input has any sensitive data.""" $has_sensitive_data = await DetectSensitiveDataA
Nemoguards V2 (NeMo)
prompts: - task: topic_safety_check_input $model=topic_control content: | You are to act as a customer service agent, providing users with factual inf
Patronus AI (NeMo Guardrail)
flow patronus lynx check output hallucination $patronus_lynx_response = await PatronusLynxCheckOutputHallucinationAction global $hallucination $halluc
Content Safety Vision (NeMo)
prompts: - task: content_safety_check_input $model=vision_rails messages: - type: system content: | You are an AI assistant that evaluates if a user's
AutoAlign (NeMo Guardrail)
flow autoalign check input $input_result = await AutoalignInputApiAction(show_autoalign_message=True) if $input_result["guardrails_triggered"] global
Trend Micro (NeMo Guardrail)
INPUT AND/OR OUTPUT RAIL flow trend ai guard input $text $result = await TrendAiGuardAction(text=$text) if $result.blocked Fails open if AI Guard serv
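The snippet notes that this rail fails open. A sketch of the choice between failing open and failing closed when an external guard service is unreachable; `check` is a hypothetical callable that returns True when text should be blocked.

```python
from typing import Callable


def passes(check: Callable[[str], bool], text: str, fail_open: bool) -> bool:
    """Return True if the text may pass the guard."""
    try:
        blocked = check(text)
    except (ConnectionError, TimeoutError):
        # Service unreachable: fail open lets traffic through, fail closed blocks it.
        return fail_open
    return not blocked
```

Failing open favours availability; failing closed favours safety at the cost of blocking every request during an outage.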
Facts (NeMo Guardrail)
flow self check facts """Check if the previous answer is accurate w.r.t. the relevant chunks. This output rail must be enabled explicitly per output m
Content Safety Reasoning (NeMo)
prompts: - task: content_safety_check_input $model=content_safety_reasoning content: | You are a classifier for analyzing interactions between humans
AI Defense (NeMo Guardrail)
INPUT RAILS flow ai defense inspect prompt $input_text """Check if the prompt is safe according to AI Defense.""" $result = await AiDefenseInspectActi
Topic Safety (NeMo)
prompts: - task: topic_safety_check_input $model=topic_control content: | You are to act as a customer service agent, providing users with factual inf
Content Safety API Keys (NeMo)
These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check
Self Check Thinking (NeMo)
prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com
Pangea (NeMo Guardrail)
INPUT RAILS flow pangea ai guard input $result = await PangeaAiGuardAction(mode="input") if $result.blocked if $system.config.enable_rails_exceptions
HF Classifier (NeMo Guardrail)
flow hf classifier check input $classifier $allowed = await HfClassifierCheckInputAction(classifier=$classifier) if not $allowed if $system.config.ena
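A sketch of deciding `$allowed` from a text classifier's output, assuming predictions arrive as a list of `{"label": ..., "score": ...}` dicts (the shape Hugging Face `transformers` text-classification pipelines return). The blocked label set and threshold are hypothetical and depend on the chosen classifier.

```python
from typing import Dict, Iterable

# Hypothetical label set; real labels depend on the classifier model.
BLOCKED_LABELS = {"toxic", "insult"}


def allowed_by_classifier(predictions: Iterable[Dict[str, object]], threshold: float = 0.5) -> bool:
    """Block when any blocked label scores at or above the threshold."""
    for pred in predictions:
        label = str(pred["label"]).lower()
        if label in BLOCKED_LABELS and float(pred["score"]) >= threshold:
            return False
    return True
```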
Patronus AI (NeMo)
prompts: - task: patronus_lynx_check_output_hallucination content: | Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided a
AlignScore (NeMo Guardrail)
flow alignscore check facts """Check if the previous answer is accurate w.r.t. the relevant chunks. This output rail must be enabled explicitly per ou
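A sketch of the final gating step of a fact-check rail, assuming the scorer returns a single value in [0, 1] where higher means the answer is better supported by the retrieved chunks. The 0.7 threshold and fallback text are hypothetical.

```python
FALLBACK = "I'm not confident that answer is supported by the sources."  # hypothetical


def gate_by_alignment(answer: str, score: float, threshold: float = 0.7) -> str:
    """Keep the answer only if its support score meets the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return answer if score >= threshold else FALLBACK
```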
Llama Guard (NeMo)
These are the default prompts released by Meta, with the exception of policy O7, which was added to address direct insults. prompts: - task: llama_gua
Input Check (NeMo Guardrail)
flow self check input $allowed = await SelfCheckInputAction if not $allowed if $system.config.enable_rails_exceptions send InputRailException(message=
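A sketch of what a self-check input action could do: ask the model a yes/no question about the message and allow it only on a clear "No". The shortened prompt paraphrases the `self_check_input` task; its exact wording and the async `llm` callable are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical, shortened prompt in the spirit of the self_check_input task.
PROMPT = (
    "Your task is to check if the user message below complies with the company policy.\n"
    'User message: "{user_input}"\n'
    "Question: Should the user message be blocked (Yes or No)?\n"
    "Answer:"
)


async def self_check_input(llm: Callable[[str], Awaitable[str]], user_input: str) -> bool:
    """Return True (allowed) only if the model's first word is a clear 'No'."""
    reply = (await llm(PROMPT.format(user_input=user_input))).strip()
    first_word = reply.split()[0].strip(".,!").lower() if reply else ""
    # Fail closed: anything other than an explicit "no" counts as blocked.
    return first_word == "no"


async def _demo_llm(prompt: str) -> str:
    return "No"  # hypothetical model that never blocks


if __name__ == "__main__":
    print(asyncio.run(self_check_input(_demo_llm, "What are your opening hours?")))
```

Matching the whole first word (not a prefix) avoids treating replies like "Not sure" as an approval.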
Content Safety Local (NeMo)
These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check
GLiNER (NeMo Guardrail)
GLiNER PII DETECTION RAILS INPUT RAILS flow gliner detect pii on input """Check if the user input has PII using GLiNER.""" $has_pii = await GlinerDete
Content Safety Multilingual (NeMo)
prompts: - task: content_safety_check_input $model=content_safety content: | Task: Check if there is unsafe content in the user or agent messages in t
Guardrails AI (NeMo Guardrail)
flow guardrailsai check input $validator """Check input text using relevant Guardrails AI validators.""" $result = await ValidateGuardrailsAiInputActi
Content Safety (NeMo Guardrail)
flow content safety check input $model $response = await ContentSafetyCheckInputAction(model_name=$model) global $allowed $allowed = $response["allowe
ActiveFence (NeMo Guardrail)
""" https://docs.activefence.com/index.html section/Integrating-with-the-TandS-Platform-Overview/Violation-Types Supported Violations: Abusive or Harm
Prompt Security (NeMo Guardrail)
INPUT RAILS flow protect prompt """Check if the prompt is valid according to Prompt Security.""" $result = await ProtectTextAction(user_prompt=$user_m
Abc (NeMo)
prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com
Config (NeMo)
These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check
CrowdStrike AIDR (NeMo Guardrail)
INPUT RAILS flow crowdstrike aidr guard input $result = await CrowdstrikeAidrGuardAction(mode="input") if $result.blocked if $system.config.enable_rai
Clavata (NeMo Guardrail)
POLICY DETECTION RAILS INPUT and OUTPUT RAILS @active flow clavata check for $text $policy $labels="" """Check if the input text is benign.""" $is_mat
Regex (NeMo Guardrail)
INPUT RAILS flow regex check input """Check if the user input matches any forbidden regex patterns.""" $result = await DetectRegexMatchAction(source="
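A sketch of a forbidden-pattern check on user input. The pattern list, case-insensitive matching, and the `find_forbidden_matches` name are hypothetical; this is not the rail's own action.

```python
import re
from typing import List, Sequence

# Hypothetical forbidden patterns; a real deployment would load these from config.
FORBIDDEN = [
    r"\bpassword\s*[:=]",
    r"\bignore\s+(?:all\s+)?previous\s+instructions\b",
]


def find_forbidden_matches(text: str, patterns: Sequence[str]) -> List[str]:
    """Return the patterns that match; an empty list means the input may pass."""
    return [p for p in patterns if re.search(p, text, flags=re.IGNORECASE)]
```

Returning the matching patterns rather than a bare boolean makes blocked inputs easier to audit.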
Hallucination (NeMo Guardrail)
flow bot inform answer unknown bot say "I don't know the answer to that." flow hallucination warning """Warning rail for hallucination.""" bot said so
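One common way to detect hallucinations is to sample several extra answers and flag the main answer when they disagree with it. The sketch below uses token-set Jaccard similarity as a crude, swapped-in agreement measure (an assumption, not this rail's method) and falls back to the `bot inform answer unknown` message.

```python
from typing import Sequence

UNKNOWN = "I don't know the answer to that."  # text from the bot inform answer unknown flow


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def likely_hallucination(answer: str, samples: Sequence[str], min_similarity: float = 0.5) -> bool:
    """Flag the answer when fewer than half of the resampled answers agree with it."""
    if not samples:
        return False
    agreeing = sum(1 for s in samples if jaccard(answer, s) >= min_similarity)
    return agreeing < len(samples) / 2


def answer_or_unknown(answer: str, samples: Sequence[str]) -> str:
    return UNKNOWN if likely_hallucination(answer, samples) else answer
```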
Fiddler (NeMo Guardrail)
flow fiddler user safety """ Use Fiddler's SLMs to detect jailbreaks """ $is_jailbreak = await CallFiddlerSafetyUserAction() if $is_jailbreak if $conf
PolicyAI (NeMo Guardrail)
""" PolicyAI Integration Flows (Colang 2.x) PolicyAI provides content moderation and policy enforcement capabilities. For more information, see: https
Vertex AI (NeMo)
prompts: - task: self_check_input content: |- Instruction: {{ user_input }} Would this instruction make a language model break moderation policies, de
Jailbreak Detection (NeMo Guardrail)
flow jailbreak detection heuristics """ Heuristic checks to assess whether the user's prompt is an attempted jailbreak. """ $is_jailbreak = await Jail
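The snippet does not show which heuristics the rail runs. As a deliberately crude stand-in (an assumption, not the rail's actual checks), the sketch flags prompts whose tail is dominated by punctuation, a pattern typical of machine-generated adversarial suffixes. The window size and threshold are hypothetical.

```python
def suffix_symbol_ratio(prompt: str, window: int = 40) -> float:
    """Fraction of the last `window` characters that are neither alphanumeric nor whitespace."""
    tail = prompt[-window:]
    if not tail:
        return 0.0
    symbols = sum(1 for ch in tail if not (ch.isalnum() or ch.isspace()))
    return symbols / len(tail)


def looks_like_jailbreak(prompt: str, threshold: float = 0.3) -> bool:
    return suffix_symbol_ratio(prompt) > threshold
```

A cheap heuristic like this is usually a first filter in front of a model-based check, not a replacement for one.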