Adding Guardrails
Validate and filter AI agent outputs to ensure quality, accuracy, and compliance in your workflows.
Guardrails are validation and filtering mechanisms that ensure your AI agents produce high-quality, accurate, and compliant outputs. They act as checkpoints within your workflows.
What are Guardrails?
Guardrails are special tasks that review and validate the output of other agents before passing it along. They can:
- Validate content meets specific criteria
- Filter inappropriate or incorrect information
- Transform outputs into required formats
- Flag content that needs human review
How Guardrails Work
[Agent Task] → [Guardrail Task] → [Next Task or Output]
The guardrail agent receives the previous task's output and evaluates it against defined criteria. Based on the evaluation, it can:
- Pass: Forward the output unchanged
- Modify: Make corrections and forward
- Reject: Stop the workflow with an error
- Flag: Continue but mark for review
Setting Up Guardrails
Step 1: Create a Guardrail Agent
First, create an AI agent specifically for validation:
- Go to Agents → Create Agent
- Name it descriptively (e.g., "Content Compliance Checker")
- Configure the system prompt with validation criteria:
You are a content validation agent. Review the provided content and check for:
1. Factual accuracy - Flag any unverified claims
2. Tone appropriateness - Ensure professional language
3. Compliance - No sensitive data exposure
4. Completeness - All required sections present
Respond with:
- STATUS: PASS, MODIFY, REJECT, or FLAG
- ISSUES: List any problems found
- CORRECTED_CONTENT: If MODIFY, provide corrected version
- NOTES: Any additional context
Step 2: Add to Workflow
- Open your workflow in the builder
- Add a new task after the task you want to validate
- Select your guardrail agent
- Configure context settings:
- Full Context: See all previous conversation
- Isolated: Only see the immediate output to validate
Step 3: Configure Instructions
Add task-specific instructions for what to validate:
Validate the blog post draft for:
- No placeholder text remaining
- Proper heading structure (H1, H2, H3)
- All links are formatted correctly
- Word count between 800-1200 words
- Includes a call-to-action
Types of Guardrails
Content Quality Guardrails
Ensure outputs meet quality standards:
| Check | Description |
|---|---|
| Grammar & Spelling | Catch errors before publishing |
| Tone Consistency | Maintain brand voice |
| Completeness | All required sections present |
| Length | Within specified word/character limits |
Example prompt:
Review this content for quality:
- Fix any grammar or spelling errors
- Ensure tone matches our brand (professional but friendly)
- Verify all sections from the outline are included
- Confirm length is 800-1200 words
If issues found, correct them and return the improved version.
Factual Accuracy Guardrails
Verify claims and data:
| Check | Description |
|---|---|
| Claim Verification | Flag unsubstantiated claims |
| Data Accuracy | Verify numbers and statistics |
| Source Attribution | Ensure claims have sources |
| Outdated Info | Flag potentially stale data |
Example prompt:
Fact-check this content:
- Identify any claims that need verification
- Flag statistics without sources
- Check for potentially outdated information
- Verify company names and product details
Return PASS if accurate, or FLAG with specific concerns.
Compliance Guardrails
Ensure regulatory and policy compliance:
| Check | Description |
|---|---|
| PII Detection | No personal data exposure |
| Legal Compliance | GDPR, CCPA, industry regulations |
| Brand Guidelines | Approved messaging only |
| Prohibited Content | No banned topics or language |
Example prompt:
Compliance check:
- Scan for any personal identifiable information (PII)
- Verify GDPR compliance for any data mentions
- Ensure no competitor disparagement
- Check for prohibited terms from our brand guidelines
REJECT if PII found. FLAG for legal review if uncertain.
Format Guardrails
Ensure proper output structure:
| Check | Description |
|---|---|
| JSON Validation | Proper JSON structure |
| Schema Compliance | Matches expected schema |
| Markdown Format | Correct heading levels, links |
| Required Fields | All mandatory fields present |
Example prompt:
Validate the JSON output:
- Must be valid JSON
- Required fields: title, summary, content, tags
- Tags must be an array with 3-5 items
- Content must be at least 500 characters
If invalid, attempt to fix and return corrected JSON.
If unfixable, REJECT with specific error.
Guardrail Response Handling
Configure how your workflow responds to guardrail results:
Pass-Through Mode
Guardrail validates but always continues:
- Content is passed to next task
- Issues are logged for review
- Workflow completes normally
Strict Mode
Guardrail can halt the workflow:
- PASS: Continue normally
- MODIFY: Use corrected content and continue
- REJECT: Stop workflow, return error
- FLAG: Continue but mark run for review
Human-in-the-Loop
Require human approval for flagged content:
- Guardrail flags uncertain content
- Workflow pauses for human review
- Human approves, rejects, or edits
- Workflow continues or stops based on decision
Best Practices
1. Layer Your Guardrails
Use multiple guardrails for critical workflows:
[Content Agent]
→ [Quality Guardrail]
→ [Compliance Guardrail]
→ [Format Guardrail]
→ [Output]
2. Be Specific
Vague criteria lead to inconsistent results:
❌ Bad: "Make sure it's good" ✅ Good: "Verify word count is 800-1200, tone is professional, no first-person pronouns"
3. Include Examples
Show the guardrail what good and bad look like:
Examples of PASS:
- "Our Q3 results showed 15% growth..."
- "Contact support@company.com for help..."
Examples of REJECT:
- "John Smith's SSN is 123-45-6789..." (PII)
- "Our competitor's product is terrible..." (disparagement)
4. Log Everything
Keep records of guardrail decisions for:
- Debugging workflow issues
- Training better guardrails
- Compliance audits
- Performance optimization
5. Test Edge Cases
Test your guardrails with:
- Obviously good content
- Obviously bad content
- Borderline cases
- Adversarial inputs
Example: Complete Guardrailed Workflow
Here's a content creation workflow with comprehensive guardrails:
[Form Trigger: Content Brief]
↓
[Research Agent]
- Gathers information
↓
[Fact-Check Guardrail] ← Verifies research accuracy
↓
[Writer Agent]
- Creates draft
↓
[Quality Guardrail] ← Checks grammar, tone, length
↓
[Compliance Guardrail] ← Scans for PII, legal issues
↓
[Editor Agent]
- Final polish
↓
[Format Guardrail] ← Validates output structure
↓
[Output: Published Content]
Monitoring Guardrail Performance
Track these metrics:
- Pass Rate: Percentage of content passing first time
- Common Issues: Most frequent rejection reasons
- False Positives: Valid content incorrectly rejected
- False Negatives: Invalid content that passed
Use this data to refine your guardrail prompts and improve over time.
Troubleshooting
Too Many False Positives
- Make criteria more specific
- Add more examples of acceptable content
- Adjust confidence thresholds
Missing Real Issues
- Expand validation criteria
- Add specific checks for missed categories
- Consider multiple guardrails in sequence
Inconsistent Results
- Use more deterministic prompts
- Lower temperature settings on guardrail agent
- Add structured output requirements
Next Steps
- Creating a Workflow - Build workflows with guardrails
- Workflow Triggers - Configure workflow triggers
- Introduction to Workflows - Workflow fundamentals