Guardrails are validation and filtering mechanisms that help ensure your AI agents produce high-quality, accurate, and compliant outputs. They act as checkpoints within your workflows.

What are Guardrails?

Guardrails are special tasks that review and validate the output of other agents before passing it along. They can:

Validate content meets specific criteria
Filter inappropriate or incorrect information
Transform outputs into required formats
Flag content that needs human review

How Guardrails Work

[Agent Task] → [Guardrail Task] → [Next Task or Output]

The guardrail agent receives the previous task's output and evaluates it against defined criteria. Based on the evaluation, it can:

Pass: Forward the output unchanged
Modify: Make corrections and forward
Reject: Stop the workflow with an error
Flag: Continue but mark for review

Setting Up Guardrails

Step 1: Create a Guardrail Agent

First, create an AI agent specifically for validation:

Go to Agents → Create Agent
Name it descriptively (e.g., "Content Compliance Checker")
Configure the system prompt with validation criteria:

You are a content validation agent. Review the provided content and check for:

1. Factual accuracy - Flag any unverified claims
2. Tone appropriateness - Ensure professional language
3. Compliance - No sensitive data exposure
4. Completeness - All required sections present

Respond with:
- STATUS: PASS, MODIFY, REJECT, or FLAG
- ISSUES: List any problems found
- CORRECTED_CONTENT: If MODIFY, provide corrected version
- NOTES: Any additional context
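
Because the system prompt above asks for labelled fields, the reply can be parsed in code before the workflow acts on it. The following is a minimal sketch, assuming the agent returns plain text with one `LABEL: value` per field; `GuardrailResult` and `parse_guardrail_reply` are hypothetical names for illustration, not part of any product API.

```python
import re
from dataclasses import dataclass

VALID_STATUSES = {"PASS", "MODIFY", "REJECT", "FLAG"}
# Matches "STATUS: PASS" as well as "- STATUS: PASS".
FIELD_RE = re.compile(r"^-?\s*(STATUS|ISSUES|CORRECTED_CONTENT|NOTES):\s*(.*)$")


@dataclass
class GuardrailResult:
    status: str
    issues: str = ""
    corrected_content: str = ""
    notes: str = ""


def parse_guardrail_reply(reply: str) -> GuardrailResult:
    """Split a reply into labelled fields; unlabelled lines extend the current field."""
    fields = {}
    current = None
    for line in reply.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            current = match.group(1)
            fields[current] = [match.group(2)]
        elif current is not None:
            fields[current].append(line)
    values = {name: "\n".join(parts).strip() for name, parts in fields.items()}
    status = values.get("STATUS", "").upper()
    if status not in VALID_STATUSES:
        # Assumption: an unparseable reply is treated as FLAG so a person looks at it.
        status = "FLAG"
    return GuardrailResult(
        status=status,
        issues=values.get("ISSUES", ""),
        corrected_content=values.get("CORRECTED_CONTENT", ""),
        notes=values.get("NOTES", ""),
    )
```

Falling back to FLAG when the status is missing is one possible design choice; a stricter setup might treat a malformed reply as REJECT instead.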

Step 2: Add to Workflow

Open your workflow in the builder
Add a new task after the task you want to validate
Select your guardrail agent
Configure context settings:
- Full Context: The guardrail sees the entire previous conversation
- Isolated: The guardrail sees only the immediate output it is validating

Step 3: Configure Instructions

Add task-specific instructions for what to validate:

Validate the blog post draft for:
- No placeholder text remaining
- Proper heading structure (H1, H2, H3)
- All links are formatted correctly
- Word count between 800 and 1,200 words
- Includes a call-to-action
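
Several of the checks above are mechanical and could also be verified in code alongside the guardrail agent. The sketch below assumes the draft is Markdown and uses a hypothetical `check_blog_draft` helper; the placeholder patterns are illustrative guesses, and the call-to-action check is left to the agent because it requires judgment.

```python
import re

# Illustrative placeholder markers; adjust to whatever your drafts actually use.
PLACEHOLDER_RE = re.compile(r"lorem ipsum|\bTODO\b|\bTBD\b|\[insert[^\]]*\]", re.IGNORECASE)
HEADING_RE = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)
EMPTY_LINK_RE = re.compile(r"\[[^\]]*\]\(\s*\)")


def check_blog_draft(markdown: str, min_words: int = 800, max_words: int = 1200) -> list:
    """Return a list of human-readable issues; an empty list means all checks passed."""
    issues = []
    if PLACEHOLDER_RE.search(markdown):
        issues.append("placeholder text remaining")
    levels = [len(m.group(1)) for m in HEADING_RE.finditer(markdown)]
    if levels.count(1) != 1:
        issues.append("expected exactly one H1")
    # A jump such as H1 directly to H3 counts as a skipped level.
    if any(b - a > 1 for a, b in zip(levels, levels[1:])):
        issues.append("heading level skipped")
    if EMPTY_LINK_RE.search(markdown):
        issues.append("link with empty URL")
    # Rough count: whitespace-separated tokens, including markup such as "#".
    words = len(markdown.split())
    if not min_words <= words <= max_words:
        issues.append(f"word count {words} outside {min_words}-{max_words}")
    return issues
```

Any issues returned could be appended to the guardrail task's instructions or logged next to the agent's own verdict.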

Types of Guardrails

Content Quality Guardrails

Ensure outputs meet quality standards:

Check	Description
Grammar & Spelling	Catch errors before publishing
Tone Consistency	Maintain brand voice
Completeness	All required sections present
Length	Within specified word/character limits

Example prompt:

Review this content for quality:
- Fix any grammar or spelling errors
- Ensure tone matches our brand (professional but friendly)
- Verify all sections from the outline are included
- Confirm length is 800-1200 words

If issues found, correct them and return the improved version.

Factual Accuracy Guardrails

Verify claims and data:

Check	Description
Claim Verification	Flag unsubstantiated claims
Data Accuracy	Verify numbers and statistics
Source Attribution	Ensure claims have sources
Outdated Info	Flag potentially stale data

Example prompt:

Fact-check this content:
- Identify any claims that need verification
- Flag statistics without sources
- Check for potentially outdated information
- Verify company names and product details

Return PASS if accurate, or FLAG with specific concerns.

Compliance Guardrails

Ensure regulatory and policy compliance:

Check	Description
PII Detection	No personal data exposure
Legal Compliance	GDPR, CCPA, industry regulations
Brand Guidelines	Approved messaging only
Prohibited Content	No banned topics or language

Example prompt:

Compliance check:
- Scan for any personally identifiable information (PII)
- Verify GDPR compliance for any data mentions
- Ensure no competitor disparagement
- Check for prohibited terms from our brand guidelines

REJECT if PII found. FLAG for legal review if uncertain.
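
A pattern-based pre-scan can complement the compliance prompt above by catching obvious PII before the agent runs. This is a rough sketch only: the three patterns cover US-style SSNs, email addresses, and simple phone formats, `scan_for_pii` is a hypothetical helper, and the allowlist for shared addresses is an assumption. It will miss many kinds of personal data and should not be treated as a complete detector.

```python
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_pii(text, allowed_emails=frozenset()):
    """Return (kind, value) pairs for each suspected PII match in the text."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            # Shared addresses such as a support inbox are not personal data.
            if kind == "email" and value.lower() in allowed_emails:
                continue
            findings.append((kind, value))
    return findings


findings = scan_for_pii(
    "John Smith's SSN is 123-45-6789. Contact support@company.com for help.",
    allowed_emails={"support@company.com"},
)
status = "REJECT" if findings else "PASS"
```

Here a non-empty result maps to REJECT, mirroring the rule in the example prompt.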

Format Guardrails

Ensure proper output structure:

Check	Description
JSON Validation	Proper JSON structure
Schema Compliance	Matches expected schema
Markdown Format	Correct heading levels, links
Required Fields	All mandatory fields present

Example prompt:

Validate the JSON output:
- Must be valid JSON
- Required fields: title, summary, content, tags
- Tags must be an array with 3-5 items
- Content must be at least 500 characters

If invalid, attempt to fix and return corrected JSON.
If unfixable, REJECT with specific error.
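
The rules in this prompt are deterministic, so they could also be checked in code before or after the agent sees the output. A minimal sketch using only the standard library follows; `validate_output` is a hypothetical helper, and the field names and limits are taken from the example prompt above.

```python
import json

REQUIRED_FIELDS = ("title", "summary", "content", "tags")


def validate_output(raw: str) -> list:
    """Return a list of validation errors; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    tags = data.get("tags")
    if "tags" in data and not (isinstance(tags, list) and 3 <= len(tags) <= 5):
        errors.append("tags must be an array with 3-5 items")
    content = data.get("content")
    if "content" in data and not (isinstance(content, str) and len(content) >= 500):
        errors.append("content must be a string of at least 500 characters")
    return errors
```

An empty list corresponds to PASS; a non-empty list could be handed back to the guardrail agent as the specific errors to fix, or used as the REJECT message.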

Guardrail Response Handling

Configure how your workflow responds to guardrail results:

Pass-Through Mode

The guardrail validates the output, but the workflow always continues:

Content is passed to next task
Issues are logged for review
Workflow completes normally

Strict Mode

Guardrail can halt the workflow:

PASS: Continue normally
MODIFY: Use corrected content and continue
REJECT: Stop workflow, return error
FLAG: Continue but mark run for review
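
The difference between pass-through and strict mode can be sketched as a single dispatch function. The names here (`apply_result`, `Decision`, `WorkflowRejected`) are hypothetical and only illustrate the behaviour described above; they are not an actual product API.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("guardrails")


class WorkflowRejected(Exception):
    """Raised in strict mode when a guardrail returns REJECT."""


@dataclass
class Decision:
    content: str               # what the next task receives
    needs_review: bool = False


def apply_result(status: str, original: str, corrected: str = "", strict: bool = True) -> Decision:
    if not strict:
        # Pass-through mode: never block, only log anything other than PASS.
        if status != "PASS":
            logger.warning("guardrail returned %s; continuing", status)
        return Decision(original)
    if status == "PASS":
        return Decision(original)
    if status == "MODIFY":
        # Assumption: fall back to the original if no corrected text was supplied.
        return Decision(corrected or original)
    if status == "FLAG":
        return Decision(original, needs_review=True)
    if status == "REJECT":
        raise WorkflowRejected("guardrail rejected the content")
    raise ValueError(f"unknown guardrail status: {status!r}")
```

Raising an exception for REJECT keeps the "stop workflow, return error" path explicit, so a caller cannot accidentally forward rejected content.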

Human-in-the-Loop

Require human approval for flagged content:

Guardrail flags uncertain content
Workflow pauses for human review
Human approves, rejects, or edits
Workflow continues or stops based on decision

Best Practices

1. Layer Your Guardrails

Use multiple guardrails for critical workflows:

[Content Agent]
    → [Quality Guardrail]
    → [Compliance Guardrail]
    → [Format Guardrail]
    → [Output]
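
Layering amounts to running guardrails in sequence, with each layer receiving the previous layer's (possibly corrected) content. A minimal sketch, assuming each guardrail is a function that returns a `(status, content)` pair; `run_layers` and the three stand-in guardrails are hypothetical and far simpler than real agent calls.

```python
from typing import Callable, List, Tuple

Guardrail = Callable[[str], Tuple[str, str]]  # returns (status, content)


def run_layers(content: str, layers: List[Tuple[str, Guardrail]]) -> Tuple[str, List[str]]:
    """Run guardrails in order; return final content and the names of layers that flagged it."""
    flagged_by = []
    for name, guardrail in layers:
        status, new_content = guardrail(content)
        if status == "REJECT":
            raise ValueError(f"rejected by {name}")
        if status == "MODIFY":
            content = new_content
        elif status == "FLAG":
            flagged_by.append(name)
    return content, flagged_by


# Stand-in guardrails for illustration only.
def quality(text):
    fixed = text.replace("teh", "the")
    return ("MODIFY", fixed) if fixed != text else ("PASS", text)


def compliance(text):
    return ("REJECT", text) if "SSN" in text else ("PASS", text)


def fmt(text):
    return ("PASS", text) if text.endswith(".") else ("FLAG", text)


layers = [("quality", quality), ("compliance", compliance), ("format", fmt)]
final, flags = run_layers("Fix teh typo", layers)
```

Ordering matters in this design: later layers validate the corrected text, so a cheap fix-up layer placed first can reduce rejections downstream.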

2. Be Specific

Vague criteria lead to inconsistent results:

❌ Bad: "Make sure it's good"
✅ Good: "Verify word count is 800-1200, tone is professional, no first-person pronouns"

3. Include Examples

Show the guardrail what good and bad look like:

Examples of PASS:
- "Our Q3 results showed 15% growth..."
- "Contact support@company.com for help..."

Examples of REJECT:
- "John Smith's SSN is 123-45-6789..." (PII)
- "Our competitor's product is terrible..." (disparagement)

4. Log Everything

Keep records of guardrail decisions for:

Debugging workflow issues
Training better guardrails
Compliance audits
Performance optimization
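
One way to keep such records is to write each decision as a structured log line that can be searched later. The sketch below uses only the standard library; the record fields and the `decision_record` and `log_decision` helpers are assumptions, not a prescribed schema.

```python
import json
import logging
from datetime import datetime, timezone


def decision_record(guardrail: str, status: str, issues, run_id: str) -> dict:
    """Build a JSON-serializable record of one guardrail decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "guardrail": guardrail,
        "status": status,
        "issues": list(issues),
    }


def log_decision(logger: logging.Logger, record: dict) -> None:
    # One JSON object per line keeps the log easy to filter and aggregate.
    logger.info(json.dumps(record, sort_keys=True))


logging.basicConfig(level=logging.INFO)
record = decision_record("Compliance Guardrail", "REJECT", ["PII found"], run_id="run-001")
log_decision(logging.getLogger("guardrails"), record)
```

Avoid copying the flagged content itself into the log when it may contain PII; recording the issue type is usually enough for audits.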

5. Test Edge Cases

Test your guardrails with:

Obviously good content
Obviously bad content
Borderline cases
Adversarial inputs

Example: Complete Guardrailed Workflow

Here's a content creation workflow with comprehensive guardrails:

[Form Trigger: Content Brief]
        ↓
[Research Agent]
  - Gathers information
        ↓
[Fact-Check Guardrail]    ← Verifies research accuracy
        ↓
[Writer Agent]
  - Creates draft
        ↓
[Quality Guardrail]       ← Checks grammar, tone, length
        ↓
[Compliance Guardrail]    ← Scans for PII, legal issues
        ↓
[Editor Agent]
  - Final polish
        ↓
[Format Guardrail]        ← Validates output structure
        ↓
[Output: Published Content]

Monitoring Guardrail Performance

Track these metrics:

Pass Rate: Percentage of outputs that pass on the first attempt
Common Issues: Most frequent reasons for rejection or flagging
False Positives: Valid content that was incorrectly rejected
False Negatives: Invalid content that was incorrectly passed

Use this data to refine your guardrail prompts and improve over time.
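
If decisions are logged, these metrics can be computed from the records. The sketch below assumes each record is a dict with a `status`, an optional list of `reasons`, and an optional human-assigned `label` of "valid" or "invalid"; that record shape and the `summarize` helper are hypothetical.

```python
from collections import Counter


def summarize(decisions):
    """Compute pass rate, most common issues, and false positive/negative counts."""
    total = len(decisions)
    passed = sum(d["status"] == "PASS" for d in decisions)
    reasons = Counter(r for d in decisions for r in d.get("reasons", []))
    # False positives and negatives need a human label to compare against.
    false_pos = sum(d["status"] == "REJECT" and d.get("label") == "valid" for d in decisions)
    false_neg = sum(d["status"] == "PASS" and d.get("label") == "invalid" for d in decisions)
    return {
        "pass_rate": passed / total if total else 0.0,
        "common_issues": reasons.most_common(3),
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }
```

False positives and false negatives can only be counted for records that a person has reviewed and labelled, so a periodic review of a sample of runs is needed to make these numbers meaningful.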

Troubleshooting

Too Many False Positives

Make criteria more specific
Add more examples of acceptable content
Adjust confidence thresholds

Missing Real Issues

Expand validation criteria
Add specific checks for missed categories
Consider multiple guardrails in sequence

Inconsistent Results

Use more deterministic prompts
Lower the temperature setting on the guardrail agent
Add structured output requirements

Next Steps

Creating a Workflow - Build workflows with guardrails
Workflow Triggers - Configure workflow triggers
Introduction to Workflows - Workflow fundamentals
