Safety & Moderation

Refusal policies, content filters and output validation.

Refusal Policy Guardrail

A drop-in policy block that defines what an assistant must refuse, how to refuse gracefully, and how to offer safe alternatives.

Safety & Moderation · 2026-06-15

NVIDIA · Guardrails · New

Content Safety (NeMo)

These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check

Safety & Moderation · 2026-06-14

Community · Guardrails · New

Structured Output Guardrail

Constrains a model to emit valid, schema-conformant JSON and specifies what to do when validation fails — reask, repair, or refuse.

Safety & Moderation · 2026-06-09
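The reask, repair, or refuse behaviour described for the Structured Output Guardrail can be sketched roughly as follows. This is a minimal illustration, not the guardrail's actual implementation: `SCHEMA`, `validate`, `repair`, `guard`, and the `generate` callable are all hypothetical names, and the schema check is deliberately simplistic (required keys with exact types).

```python
import json

# Hypothetical schema: required keys mapped to their expected Python types.
SCHEMA = {"name": str, "age": int}


def validate(obj, schema=SCHEMA):
    """Return a list of validation errors; an empty list means obj conforms."""
    if not isinstance(obj, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for key, expected in schema.items():
        if key not in obj:
            errors.append(f"missing key: {key}")
        elif type(obj[key]) is not expected:
            errors.append(f"wrong type for {key}")
    return errors


def repair(text):
    """Cheap repair: keep only the span from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return text  # nothing to repair; caller will fail to parse it
    return text[start:end + 1]


def guard(generate, prompt, max_reasks=1):
    """Try the raw output, then a repaired copy, then reask, then refuse."""
    for _ in range(max_reasks + 1):
        raw = generate(prompt)
        for candidate in (raw, repair(raw)):
            try:
                obj = json.loads(candidate)
            except json.JSONDecodeError:
                continue
            if not validate(obj):
                return {"status": "ok", "data": obj}
        # Reask: append a corrective instruction and call the model again.
        prompt += "\nReturn only valid JSON matching the schema."
    return {"status": "refused", "data": None}
```

Here `generate` stands in for whatever function calls the model. Trying repair before reasking is one possible ordering, chosen because repair costs no extra model call.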

Community · Guardrails

PII Redaction Guardrail

Instructs a model to detect and redact personally identifiable information from its inputs and outputs before responding.

Safety & Moderation · 2026-05-18
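The PII Redaction Guardrail above is a prompt-level instruction, but the same detect-and-redact idea can be shown as a small pre- and post-processing step. The sketch below is an assumption-laden illustration: the `PATTERNS` table and the `redact` function are hypothetical, and the three regular expressions cover only simple email, US-style SSN, and US-style phone formats. Real PII detection needs far broader coverage.

```python
import re

# Hypothetical, deliberately narrow patterns; not an exhaustive PII detector.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted text and the list of labels that matched.
    """
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


# Apply the same function to the user input and to the model output.
clean, labels = redact("Mail ada@example.com or call 555-123-4567")
```

Running the same `redact` call on both the input and the output mirrors the "inputs and outputs" wording of the guardrail description.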

NVIDIA · Guardrails

Privateai (NeMo Guardrail)

PII DETECTION RAILS INPUT RAILS flow detect pii on input """Check if the user input has PII.""" $has_pii = await DetectPiiAction(source="input", text=

Safety & Moderation · 2026-05-16

NVIDIA · Guardrails

Injection Detection (NeMo Guardrail)

flow injection detection """ Reject, omit, or sanitize injection attempts from the bot. This rail operates on the $bot_message. """ response = await I

Safety & Moderation · 2026-05-10

NVIDIA · Guardrails

Gcp Moderate Text (NeMo Guardrail)

""" https://cloud.google.com/natural-language/docs/moderating-text Supported Violations: Safety Attribute Description Toxic Content that is rude, disr

Safety & Moderation · 2026-05-09

NVIDIA · Guardrails

Abc V2 (NeMo)

prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com

Safety & Moderation · 2026-05-04

NVIDIA · Guardrails

Llama Guard (NeMo Guardrail)

flow llama guard check input $llama_guard_response = await LlamaGuardCheckInputAction global $allowed $allowed = $llama_guard_response["allowed"] Poli

Safety & Moderation · 2026-05-04

NVIDIA · Guardrails

Cleanlab (NeMo Guardrail)

""" https://cleanlab.ai/tlm/ https://help.cleanlab.ai/tutorials/tlm/ how-does-the-tlm-trustworthiness-score-work """ flow cleanlab trustworthiness """

Safety & Moderation · 2026-04-17

NVIDIA · Guardrails

Topic Safety (NeMo Guardrail)

flow topic safety check input $model $response = await TopicSafetyCheckInputAction(model_name=$model) global $on_topic $on_topic = $response["on_topic

Safety & Moderation · 2026-04-13

NVIDIA · Guardrails

Output Check (NeMo Guardrail)

flow self check output $allowed = await SelfCheckOutputAction if not $allowed if $system.config.enable_rails_exceptions send OutputRailException(messa

Safety & Moderation · 2026-04-09

NVIDIA · Guardrails

Sensitive Data Detection (NeMo Guardrail)

INPUT RAILS flow detect sensitive data on input """Check if the user input has any sensitive data.""" $has_sensitive_data = await DetectSensitiveDataA

Safety & Moderation · 2026-04-08

NVIDIA · Guardrails

Nemoguards V2 (NeMo)

prompts: - task: topic_safety_check_input $model=topic_control content: | You are to act as a customer service agent, providing users with factual inf

Safety & Moderation · 2026-03-12

NVIDIA · Guardrails

Patronusai (NeMo Guardrail)

flow patronus lynx check output hallucination $patronus_lynx_response = await PatronusLynxCheckOutputHallucinationAction global $hallucination $halluc

Safety & Moderation · 2026-03-10

NVIDIA · Guardrails

Content Safety Vision (NeMo)

prompts: - task: content_safety_check_input $model=vision_rails messages: - type: system content: | You are an AI assistant that evaluates if a user's

Safety & Moderation · 2026-03-01

NVIDIA · Guardrails

Autoalign (NeMo Guardrail)

flow autoalign check input $input_result = await AutoalignInputApiAction(show_autoalign_message=True) if $input_result["guardrails_triggered"] global

Safety & Moderation · 2026-02-22

NVIDIA · Guardrails

Trend Micro (NeMo Guardrail)

INPUT AND/OR OUTPUT RAIL flow trend ai guard input $text $result = await TrendAiGuardAction(text=$text) if $result.blocked Fails open if AI Guard serv

Safety & Moderation · 2026-02-16

NVIDIA · Guardrails

Facts (NeMo Guardrail)

flow self check facts """Check if the previous answer is accurate w.r.t. the relevant chunks. This output rail must be enabled explicitly per output m

Safety & Moderation · 2026-02-15

NVIDIA · Guardrails

Content Safety Reasoning (NeMo)

prompts: - task: content_safety_check_input $model=content_safety_reasoning content: | You are a classifier for analyzing interactions between humans

Safety & Moderation · 2026-01-28

NVIDIA · Guardrails

AI Defense (NeMo Guardrail)

INPUT RAILS flow ai defense inspect prompt $input_text """Check if the prompt is safe according to AI Defense.""" $result = await AiDefenseInspectActi

Safety & Moderation · 2026-01-24

NVIDIA · Guardrails

Topic Safety (NeMo)

prompts: - task: topic_safety_check_input $model=topic_control content: | You are to act as a customer service agent, providing users with factual inf

Safety & Moderation · 2026-01-23

NVIDIA · Guardrails

Content Safety API Keys (NeMo)

These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check

Safety & Moderation · 2026-01-19

NVIDIA · Guardrails

Self Check Thinking (NeMo)

prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com

Safety & Moderation · 2026-01-19

NVIDIA · Guardrails

Pangea (NeMo Guardrail)

INPUT RAILS flow pangea ai guard input $result = await PangeaAiGuardAction(mode="input") if $result.blocked if $system.config.enable_rails_exceptions

Safety & Moderation · 2026-01-13

NVIDIA · Guardrails

Hf Classifier (NeMo Guardrail)

flow hf classifier check input $classifier $allowed = await HfClassifierCheckInputAction(classifier=$classifier) if not $allowed if $system.config.ena

Safety & Moderation · 2026-01-09

NVIDIA · Guardrails

Patronusai (NeMo)

prompts: - task: patronus_lynx_check_output_hallucination content: | Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided a

Safety & Moderation · 2026-01-09

NVIDIA · Guardrails

Align Score (NeMo Guardrail)

flow alignscore check facts """Check if the previous answer is accurate w.r.t. the relevant chunks. This output rail must be enabled explicitly per ou

Safety & Moderation · 2026-01-03

NVIDIA · Guardrails

Llama Guard (NeMo)

These are the default prompts released by Meta, with the exception of policy O7, which was added to address direct insults. prompts: - task: llama_gua

Safety & Moderation · 2025-12-27

NVIDIA · Guardrails

Input Check (NeMo Guardrail)

flow self check input $allowed = await SelfCheckInputAction if not $allowed if $system.config.enable_rails_exceptions send InputRailException(message=

Safety & Moderation · 2025-12-19

NVIDIA · Guardrails

Content Safety Local (NeMo)

These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check

Safety & Moderation · 2025-12-14

NVIDIA · Guardrails

Gliner (NeMo Guardrail)

GLiNER PII DETECTION RAILS INPUT RAILS flow gliner detect pii on input """Check if the user input has PII using GLiNER.""" $has_pii = await GlinerDete

Safety & Moderation · 2025-11-21

NVIDIA · Guardrails

Content Safety Multilingual (NeMo)

prompts: - task: content_safety_check_input $model=content_safety content: | Task: Check if there is unsafe content in the user or agent messages in t

Safety & Moderation · 2025-11-14

NVIDIA · Guardrails

Guardrails AI (NeMo Guardrail)

flow guardrailsai check input $validator """Check input text using relevant Guardrails AI validators.""" $result = await ValidateGuardrailsAiInputActi

Safety & Moderation · 2025-11-11

NVIDIA · Guardrails

Content Safety (NeMo Guardrail)

flow content safety check input $model $response = await ContentSafetyCheckInputAction(model_name=$model) global $allowed $allowed = $response["allowe

Safety & Moderation · 2025-11-06

NVIDIA · Guardrails

Activefence (NeMo Guardrail)

""" https://docs.activefence.com/index.html section/Integrating-with-the-TandS-Platform-Overview/Violation-Types Supported Violations: Abusive or Harm

Safety & Moderation · 2025-10-26

NVIDIA · Guardrails

Prompt Security (NeMo Guardrail)

INPUT RAILS flow protect prompt """Check if the prompt is valid according to Prompt Security.""" $result = await ProtectTextAction(user_prompt=$user_m

Safety & Moderation · 2025-10-06

NVIDIA · Guardrails

Abc (NeMo)

prompts: - task: self_check_input content: | Your task is to check if the user message below complies with the company policy for talking with the com

Safety & Moderation · 2025-10-01

NVIDIA · Guardrails

Config (NeMo)

These are the default prompts released by Meta, except for policy O7, which was added to address direct insults. prompts: - task: content_safety_check

Safety & Moderation · 2025-10-01

NVIDIA · Guardrails

Crowdstrike Aidr (NeMo Guardrail)

INPUT RAILS flow crowdstrike aidr guard input $result = await CrowdstrikeAidrGuardAction(mode="input") if $result.blocked if $system.config.enable_rai

Safety & Moderation · 2025-10-01

NVIDIA · Guardrails

Clavata (NeMo Guardrail)

POLICY DETECTION RAILS INPUT and OUTPUT RAILS @active flow clavata check for $text $policy $labels="" """Check if the input text is benign.""" $is_mat

Safety & Moderation · 2025-09-19

NVIDIA · Guardrails

Regex (NeMo Guardrail)

INPUT RAILS flow regex check input """Check if the user input matches any forbidden regex patterns.""" $result = await DetectRegexMatchAction(source="

Safety & Moderation · 2025-09-08

NVIDIA · Guardrails

Hallucination (NeMo Guardrail)

flow bot inform answer unknown bot say "I don't know the answer to that." flow hallucination warning """Warning rail for hallucination.""" bot said so

Safety & Moderation · 2025-09-06

NVIDIA · Guardrails

Fiddler (NeMo Guardrail)

flow fiddler user safety """ Use Fiddler's SLMs to detect jailbreaks """ $is_jailbreak = await CallFiddlerSafetyUserAction() if $is_jailbreak if $conf

⚠ Safety & Moderation · 2025-09-03

NVIDIA · Guardrails

Policyai (NeMo Guardrail)

""" PolicyAI Integration Flows (Colang 2.x) PolicyAI provides content moderation and policy enforcement capabilities. For more information, see: https

Safety & Moderation · 2025-09-01

NVIDIA · Guardrails

Vertexai (NeMo)

prompts: - task: self_check_input content: |- Instruction: {{ user_input }} Would this instruction make a language model break moderation policies, de

Safety & Moderation · 2025-08-28

NVIDIA · Guardrails

Jailbreak Detection (NeMo Guardrail)

flow jailbreak detection heuristics """ Heuristic checks to assess whether the user's prompt is an attempted jailbreak. """ $is_jailbreak = await Jail

⚠ Safety & Moderation · 2025-08-25