Adding Guardrails

Validate and filter AI agent outputs to ensure quality, accuracy, and compliance in your workflows.

February 6, 2024

Guardrails are validation and filtering mechanisms that ensure your AI agents produce high-quality, accurate, and compliant outputs. They act as checkpoints within your workflows.

What are Guardrails?

Guardrails are special tasks that review and validate the output of other agents before passing it along. They can:

  • Validate that content meets specific criteria
  • Filter inappropriate or incorrect information
  • Transform outputs into required formats
  • Flag content that needs human review

How Guardrails Work

[Agent Task] → [Guardrail Task] → [Next Task or Output]

The guardrail agent receives the previous task's output and evaluates it against defined criteria. Based on the evaluation, it can:

  1. Pass: Forward the output unchanged
  2. Modify: Make corrections and forward
  3. Reject: Stop the workflow with an error
  4. Flag: Continue but mark for review
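
The four outcomes can be pictured as a small dispatch function. This is a minimal sketch, not the product's implementation: the names `Status`, `GuardrailResult`, `GuardrailRejected`, and `apply_outcome` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PASS = "PASS"
    MODIFY = "MODIFY"
    REJECT = "REJECT"
    FLAG = "FLAG"


@dataclass
class GuardrailResult:
    status: Status
    content: str                # the original or the corrected output
    needs_review: bool = False  # True only when the output was flagged


class GuardrailRejected(Exception):
    """Raised when a guardrail rejects an output and the workflow must stop."""


def apply_outcome(status: Status, original: str,
                  corrected: Optional[str] = None) -> GuardrailResult:
    """Translate a guardrail decision into what the next task receives."""
    if status is Status.PASS:
        return GuardrailResult(status, original)
    if status is Status.MODIFY:
        if corrected is None:
            raise ValueError("MODIFY requires corrected content")
        return GuardrailResult(status, corrected)
    if status is Status.FLAG:
        return GuardrailResult(status, original, needs_review=True)
    # Status.REJECT: stop the workflow with an error.
    raise GuardrailRejected("output rejected by guardrail")
```

Raising an exception for REJECT keeps the "stop the workflow" path distinct from the three outcomes that forward content.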

Setting Up Guardrails

Step 1: Create a Guardrail Agent

First, create an AI agent specifically for validation:

  1. Go to Agents → Create Agent
  2. Name it descriptively (e.g., "Content Compliance Checker")
  3. Configure the system prompt with validation criteria:

You are a content validation agent. Review the provided content and check for:

1. Factual accuracy - Flag any unverified claims
2. Tone appropriateness - Ensure professional language
3. Compliance - No sensitive data exposure
4. Completeness - All required sections present

Respond with:
- STATUS: PASS, MODIFY, REJECT, or FLAG
- ISSUES: List any problems found
- CORRECTED_CONTENT: If MODIFY, provide corrected version
- NOTES: Any additional context
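
If you post-process guardrail replies yourself, a response in the format above could be parsed roughly as follows. This sketch assumes the agent follows the labelled-field layout exactly; `parse_guardrail_response` is a hypothetical helper, not a built-in function.

```python
import re

FIELDS = ("STATUS", "ISSUES", "CORRECTED_CONTENT", "NOTES")
VALID_STATUSES = {"PASS", "MODIFY", "REJECT", "FLAG"}

# A field starts on a line such as "STATUS: PASS" or "- ISSUES: ...".
FIELD_START = re.compile(
    r"^[ \t]*-?[ \t]*(%s)[ \t]*:" % "|".join(FIELDS), re.MULTILINE
)


def parse_guardrail_response(text: str) -> dict:
    """Split a guardrail reply into its labelled fields."""
    matches = list(FIELD_START.finditer(text))
    parsed = {}
    for i, match in enumerate(matches):
        # A field's value runs until the next field label or the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parsed[match.group(1)] = text[match.end():end].strip()
    status = parsed.get("STATUS", "").upper()
    if status not in VALID_STATUSES:
        raise ValueError(f"unrecognised STATUS: {status!r}")
    parsed["STATUS"] = status
    return parsed
```

Treating a missing or unknown STATUS as an error is a deliberate choice here: a reply that cannot be parsed should not be mistaken for a PASS.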

Step 2: Add to Workflow

  1. Open your workflow in the builder
  2. Add a new task after the task you want to validate
  3. Select your guardrail agent
  4. Configure context settings:
    • Full Context: The guardrail sees the entire preceding conversation
    • Isolated: The guardrail sees only the immediate output to validate

Step 3: Configure Instructions

Add task-specific instructions for what to validate:

Validate the blog post draft for:
- No placeholder text remaining
- Proper heading structure (H1, H2, H3)
- All links are formatted correctly
- Word count between 800-1200 words
- Includes a call-to-action
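
Some of these criteria are mechanical enough that they could also be checked in code, which gives a useful cross-check on the guardrail agent. The sketch below assumes the draft is Markdown; `check_blog_draft` and the placeholder patterns are hypothetical examples, and the word count is approximate because it counts heading markers as words.

```python
import re
from typing import List

# Hypothetical patterns for leftover placeholder text.
PLACEHOLDER_PATTERNS = [r"lorem ipsum", r"\bTODO\b", r"\bTBD\b", r"\[insert[^\]]*\]"]


def check_blog_draft(markdown: str, min_words: int = 800,
                     max_words: int = 1200) -> List[str]:
    """Return a list of issues; an empty list means all checks passed."""
    issues = []
    for pattern in PLACEHOLDER_PATTERNS:
        if re.search(pattern, markdown, re.IGNORECASE):
            issues.append(f"placeholder text matches {pattern!r}")

    words = len(markdown.split())
    if not min_words <= words <= max_words:
        issues.append(f"word count {words} outside {min_words}-{max_words}")

    # Heading levels in document order, e.g. [1, 2, 3, 2].
    levels = [len(m.group(1))
              for m in re.finditer(r"^(#{1,6})\s", markdown, re.MULTILINE)]
    if levels.count(1) != 1:
        issues.append("expected exactly one H1")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"heading jumps from H{prev} to H{cur}")
    return issues
```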

Types of Guardrails

Content Quality Guardrails

Ensure outputs meet quality standards:

Check               Description
Grammar & Spelling  Catch errors before publishing
Tone Consistency    Maintain brand voice
Completeness        All required sections present
Length              Within specified word/character limits

Example prompt:

Review this content for quality:
- Fix any grammar or spelling errors
- Ensure tone matches our brand (professional but friendly)
- Verify all sections from the outline are included
- Confirm length is 800-1200 words

If issues found, correct them and return the improved version.

Factual Accuracy Guardrails

Verify claims and data:

Check               Description
Claim Verification  Flag unsubstantiated claims
Data Accuracy       Verify numbers and statistics
Source Attribution  Ensure claims have sources
Outdated Info       Flag potentially stale data

Example prompt:

Fact-check this content:
- Identify any claims that need verification
- Flag statistics without sources
- Check for potentially outdated information
- Verify company names and product details

Return PASS if accurate, or FLAG with specific concerns.

Compliance Guardrails

Ensure regulatory and policy compliance:

Check               Description
PII Detection       No personal data exposure
Legal Compliance    GDPR, CCPA, industry regulations
Brand Guidelines    Approved messaging only
Prohibited Content  No banned topics or language

Example prompt:

Compliance check:
- Scan for any personally identifiable information (PII)
- Verify GDPR compliance for any data mentions
- Ensure no competitor disparagement
- Check for prohibited terms from our brand guidelines

REJECT if PII found. FLAG for legal review if uncertain.
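
A simple pattern scan can back up the "REJECT if PII found" rule for the most obvious cases. This is only a sketch: the two regular expressions (US Social Security numbers and 16-digit card numbers) are illustrative assumptions, `scan_for_pii` is a hypothetical helper, and pattern matching alone will miss many kinds of PII.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def scan_for_pii(text: str) -> dict:
    """Return REJECT with the matches if any pattern hits, otherwise PASS."""
    findings = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return {"status": "REJECT" if findings else "PASS", "findings": findings}
```

Uncertain cases still belong with the guardrail agent or a human reviewer; the scan only covers the clear-cut REJECT path.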

Format Guardrails

Ensure proper output structure:

Check              Description
JSON Validation    Proper JSON structure
Schema Compliance  Matches expected schema
Markdown Format    Correct heading levels, links
Required Fields    All mandatory fields present

Example prompt:

Validate the JSON output:
- Must be valid JSON
- Required fields: title, summary, content, tags
- Tags must be an array with 3-5 items
- Content must be at least 500 characters

If invalid, attempt to fix and return corrected JSON.
If unfixable, REJECT with specific error.
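
Format rules like these are deterministic, so they can also be verified in code. The sketch below applies the four rules from the example prompt using only the standard library; `validate_article_json` is a hypothetical helper name.

```python
import json
from typing import List

REQUIRED_FIELDS = ("title", "summary", "content", "tags")


def validate_article_json(raw: str) -> List[str]:
    """Return a list of errors; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in data]
    if "tags" in data:
        tags = data["tags"]
        if not (isinstance(tags, list) and 3 <= len(tags) <= 5):
            errors.append("tags must be an array with 3-5 items")
    if "content" in data:
        content = data["content"]
        if not (isinstance(content, str) and len(content) >= 500):
            errors.append("content must be at least 500 characters")
    return errors
```

The returned error list can be passed back to the guardrail agent as the "specific error" for a REJECT.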

Guardrail Response Handling

Configure how your workflow responds to guardrail results:

Pass-Through Mode

The guardrail validates the output, but the workflow always continues:

  • Content is passed to next task
  • Issues are logged for review
  • Workflow completes normally

Strict Mode

The guardrail can halt the workflow:

  • PASS: Continue normally
  • MODIFY: Use corrected content and continue
  • REJECT: Stop workflow, return error
  • FLAG: Continue but mark run for review

Human-in-the-Loop

Require human approval for flagged content:

  • Guardrail flags uncertain content
  • Workflow pauses for human review
  • Human approves, rejects, or edits
  • Workflow continues or stops based on decision
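
The three modes differ only in what happens after the guardrail returns a status. The sketch below models that difference. The mode names, `RunState`, and `handle_result` are hypothetical, and two behaviors are assumptions rather than documented facts: pass-through mode leaves the content unchanged even on MODIFY, and human-in-the-loop mode otherwise behaves like strict mode.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence

MODES = {"pass_through", "strict", "human_in_the_loop"}


@dataclass
class RunState:
    content: str
    halted: bool = False             # workflow stopped with an error
    flagged: bool = False            # run marked for later review
    paused_for_review: bool = False  # waiting for a human decision
    log: list = field(default_factory=list)


def handle_result(state: RunState, status: str, mode: str,
                  corrected: Optional[str] = None,
                  issues: Sequence[str] = ()) -> RunState:
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Every mode records the decision.
    state.log.append({"status": status, "issues": list(issues)})
    if mode == "pass_through":
        return state  # issues are logged, the workflow continues normally
    if status == "MODIFY" and corrected is not None:
        state.content = corrected
    elif status == "REJECT":
        state.halted = True
    elif status == "FLAG":
        state.flagged = True
        if mode == "human_in_the_loop":
            state.paused_for_review = True
    return state
```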

Best Practices

1. Layer Your Guardrails

Use multiple guardrails for critical workflows:

[Content Agent]
    → [Quality Guardrail]
    → [Compliance Guardrail]
    → [Format Guardrail]
    → [Output]
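
Layering amounts to running guardrails in order, with each one receiving the output of the one before. In this minimal sketch the three guardrails are toy stand-ins written as plain functions; in a real workflow each would be a guardrail agent. `run_layers` and the `(status, content, issues)` tuple shape are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A guardrail takes content and returns (status, content, issues).
Guardrail = Callable[[str], Tuple[str, str, List[str]]]


def run_layers(content: str,
               guardrails: List[Tuple[str, Guardrail]]) -> Tuple[str, List[str]]:
    """Run guardrails in order; return the final content and collected flags."""
    flags: List[str] = []
    for name, guardrail in guardrails:
        status, new_content, issues = guardrail(content)
        if status == "REJECT":
            raise RuntimeError(f"{name} rejected the content: {issues}")
        if status == "MODIFY":
            content = new_content  # later layers see the corrected version
        if status == "FLAG":
            flags.extend(f"{name}: {issue}" for issue in issues)
    return content, flags


# Toy stand-ins for real guardrail agents.
def quality(text: str):
    fixed = text.replace("teh", "the")
    return ("MODIFY", fixed, ["typo"]) if fixed != text else ("PASS", text, [])


def compliance(text: str):
    return ("REJECT", text, ["possible SSN"]) if "SSN" in text else ("PASS", text, [])


def format_check(text: str):
    if text.endswith("."):
        return ("PASS", text, [])
    return ("FLAG", text, ["missing final period"])


LAYERS = [("quality", quality), ("compliance", compliance), ("format", format_check)]
```

Ordering matters: putting the quality layer first means the compliance and format layers check the corrected text rather than the raw draft.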

2. Be Specific

Vague criteria lead to inconsistent results:

❌ Bad: "Make sure it's good"
✅ Good: "Verify word count is 800-1200, tone is professional, no first-person pronouns"

3. Include Examples

Show the guardrail what good and bad look like:

Examples of PASS:
- "Our Q3 results showed 15% growth..."
- "Contact support@company.com for help..."

Examples of REJECT:
- "John Smith's SSN is 123-45-6789..." (PII)
- "Our competitor's product is terrible..." (disparagement)

4. Log Everything

Keep records of guardrail decisions for:

  • Debugging workflow issues
  • Training better guardrails
  • Compliance audits
  • Performance optimization
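
One lightweight way to keep such records is one JSON object per line, which is easy to search and to load into analysis tools. The sketch below is an assumption about how you might do this yourself; `log_decision` and the field names are hypothetical, not a built-in logging feature.

```python
import io
import json
from datetime import datetime, timezone
from typing import List, TextIO


def log_decision(stream: TextIO, run_id: str, guardrail: str,
                 status: str, issues: List[str]) -> dict:
    """Append one guardrail decision to a JSON Lines stream."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "guardrail": guardrail,
        "status": status,
        "issues": issues,
    }
    stream.write(json.dumps(record) + "\n")
    return record


# Usage: any writable text stream works, for example an open log file.
buffer = io.StringIO()
log_decision(buffer, "run-1", "compliance", "REJECT", ["PII: SSN"])
log_decision(buffer, "run-1", "quality", "PASS", [])
```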

5. Test Edge Cases

Test your guardrails with:

  • Obviously good content
  • Obviously bad content
  • Borderline cases
  • Adversarial inputs
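
The four categories map naturally onto a table of test cases. In this sketch, `toy_pii_guardrail` is a deliberately simple stand-in for a real guardrail agent, and the cases and expected statuses are illustrative assumptions.

```python
import re
from typing import Callable, List, Tuple


def toy_pii_guardrail(text: str) -> str:
    """Stand-in guardrail: reject any run of exactly nine digits."""
    # Strip separators so "123 45 6789" and "123.45.6789" are caught too.
    compact = re.sub(r"[\s.\-]", "", text)
    return "REJECT" if re.search(r"(?<!\d)\d{9}(?!\d)", compact) else "PASS"


# (category, input, expected status)
CASES: List[Tuple[str, str, str]] = [
    ("obviously good", "Our Q3 results showed 15% growth.", "PASS"),
    ("obviously bad", "John Smith's SSN is 123-45-6789.", "REJECT"),
    ("borderline", "Call extension 6789 for help.", "PASS"),
    ("adversarial", "SSN: 123 45 6789", "REJECT"),
]


def run_cases(guardrail: Callable[[str], str]) -> List[str]:
    """Return a description of every case whose result differs from expected."""
    failures = []
    for category, text, expected in CASES:
        actual = guardrail(text)
        if actual != expected:
            failures.append(f"{category}: expected {expected}, got {actual}")
    return failures
```

Rerunning the same case table after every prompt change helps catch regressions in the guardrail.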

Example: Complete Guardrailed Workflow

Here's a content creation workflow with comprehensive guardrails:

[Form Trigger: Content Brief]
        ↓
[Research Agent]
  - Gathers information
        ↓
[Fact-Check Guardrail]    ← Verifies research accuracy
        ↓
[Writer Agent]
  - Creates draft
        ↓
[Quality Guardrail]       ← Checks grammar, tone, length
        ↓
[Compliance Guardrail]    ← Scans for PII, legal issues
        ↓
[Editor Agent]
  - Final polish
        ↓
[Format Guardrail]        ← Validates output structure
        ↓
[Output: Published Content]

Monitoring Guardrail Performance

Track these metrics:

  • Pass Rate: Percentage of content passing first time
  • Common Issues: Most frequent rejection reasons
  • False Positives: Valid content incorrectly rejected
  • False Negatives: Invalid content that passed

Use this data to refine your guardrail prompts and improve over time.
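
These metrics can be computed from logged decisions, provided some runs have also been reviewed by a person. The sketch below assumes each record carries a `status`, a list of `issues`, and an optional `human_verdict` of "valid" or "invalid"; those field names and `guardrail_metrics` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def guardrail_metrics(records: List[Dict]) -> Dict:
    """Summarise guardrail decisions; verdict fields are optional per record."""
    total = len(records)
    passed = sum(1 for r in records if r["status"] == "PASS")
    # Count the reasons attached to rejected outputs.
    rejection_reasons = Counter(
        issue
        for r in records if r["status"] == "REJECT"
        for issue in r.get("issues", [])
    )
    false_positives = sum(
        1 for r in records
        if r["status"] == "REJECT" and r.get("human_verdict") == "valid"
    )
    false_negatives = sum(
        1 for r in records
        if r["status"] == "PASS" and r.get("human_verdict") == "invalid"
    )
    return {
        "pass_rate": passed / total if total else 0.0,
        "common_issues": rejection_reasons.most_common(3),
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }
```

False positives and false negatives can only be counted for records that a human has labelled, so unreviewed runs contribute to the pass rate but not to those two counts.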

Troubleshooting

Too Many False Positives

  • Make criteria more specific
  • Add more examples of acceptable content
  • Adjust confidence thresholds

Missing Real Issues

  • Expand validation criteria
  • Add specific checks for missed categories
  • Consider multiple guardrails in sequence

Inconsistent Results

  • Use more deterministic prompts
  • Lower temperature settings on guardrail agent
  • Add structured output requirements

Next Steps