Guardrails AI

Guardrails AI is a framework for validating LLM outputs using a community-driven hub of validators for safety, PII detection, content quality, and more. MLflow's Guardrails AI integration allows you to use Guardrails validators as MLflow scorers, providing rule-based evaluation without requiring LLM calls.

Prerequisites

Guardrails AI scorers require the guardrails-ai package:

```bash
pip install guardrails-ai
```

Quick Start

```python
from mlflow.genai.scorers.guardrails import ToxicLanguage

scorer = ToxicLanguage(threshold=0.7)
feedback = scorer(
    outputs="This is a professional and helpful response.",
)

print(feedback.value)  # "yes" or "no"
```

Available Guardrails AI Scorers

Safety and Content Quality

Detect harmful content, PII, and other safety issues in LLM outputs:

| Scorer | What does it evaluate? | Guardrails Hub |
| --- | --- | --- |
| ToxicLanguage | Does the output contain toxic or offensive language? | Link |
| NSFWText | Does the output contain NSFW or explicit content? | Link |
| DetectJailbreak | Does the input contain a jailbreak or prompt injection attempt? | Link |
| DetectPII | Does the output contain personally identifiable information? | Link |
| SecretsPresent | Does the output contain API keys, tokens, or other secrets? | Link |
| GibberishText | Does the output contain nonsensical or incoherent text? | Link |

Creating Scorers by Name

You can also create Guardrails AI scorers dynamically using get_scorer:

```python
from mlflow.genai.scorers.guardrails import get_scorer

# Create a scorer by name
scorer = get_scorer(
    validator_name="ToxicLanguage",
    threshold=0.7,
)

feedback = scorer(
    outputs="This is a professional response.",
)
```

Configuration

Guardrails AI scorers accept validator-specific parameters. Any additional keyword arguments are passed directly to the underlying Guardrails validator:

```python
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII, DetectJailbreak

# Toxicity detection with a custom threshold
scorer = ToxicLanguage(
    threshold=0.7,  # Confidence threshold for detection
    validation_method="sentence",  # "sentence" or "full" text validation
)

# PII detection with custom entity types
pii_scorer = DetectPII(
    pii_entities=["CREDIT_CARD", "SSN", "EMAIL_ADDRESS"],
)

# Jailbreak detection with custom sensitivity
jailbreak_scorer = DetectJailbreak(
    threshold=0.9,  # Lower values are more sensitive
)
```

Refer to the Guardrails AI documentation and the Guardrails Hub for validator-specific parameters and the full list of available validators.

Next Steps