What Are Guardrails and Why Do You Need Them?

An agent that responds helpfully to valid requests can still cause serious problems if it: processes inputs it should refuse, leaks PII in its outputs, returns data in the wrong format, or gets manipulated into off-topic behaviour by prompt injection.

The OpenAI Agents SDK provides a first-class guardrails system — input_guardrails run before the agent processes a message, and output_guardrails run before the response reaches the user. Both can block, modify, or pass through based on any logic you define.

How Guardrails Work

Guardrails are async functions that return a GuardrailFunctionOutput. If tripwire_triggered is True, the SDK raises a GuardrailTripwireTriggered exception and the agent's response is blocked. The guardrail can also return an output_info dict with metadata about why it triggered.

from agents import (
    Agent, Runner,
    input_guardrail, output_guardrail,
    GuardrailFunctionOutput, RunContextWrapper,
    TResponseInputItem
)
from pydantic import BaseModel
 
class GuardrailOutput(BaseModel):
    is_safe: bool
    reason: str = ""
 

Input Guardrail: Block Off-Topic Requests

from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput
from pydantic import BaseModel
from openai import AsyncOpenAI
 
client = AsyncOpenAI()
 
class TopicCheck(BaseModel):
    is_on_topic: bool
    reason: str
 
@input_guardrail
async def topic_guardrail(ctx, agent, input):
    """Ensure the user is asking about software engineering topics only."""
    user_message = input[-1]["content"] if isinstance(input, list) else str(input)
 
    response = await client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Determine if the following message is about software engineering, "
                "coding, or technical topics. Return is_on_topic=false for anything else."
            )},
            {"role": "user", "content": user_message}
        ],
        response_format=TopicCheck
    )
    result = response.choices[0].message.parsed
 
    return GuardrailFunctionOutput(
        output_info=result,
        tripwire_triggered=not result.is_on_topic
    )
 
agent = Agent(
    name="Engineering Assistant",
    instructions="You are a software engineering assistant.",
    input_guardrails=[topic_guardrail]
)
 
from agents import GuardrailTripwireTriggered
import asyncio
 
async def run():
    try:
        result = await Runner.run(agent, input="What is the best recipe for pasta?")
        print(result.final_output)
    except GuardrailTripwireTriggered as e:
        print("Blocked:", e.guardrail_result.output.output_info)
        # Return a friendly message to the user instead
        print("I can only help with software engineering topics.")
 
asyncio.run(run())
 
Run guardrail checks in parallel with the agent using asyncio.gather() where possible — this removes the latency cost of guardrail checks on the critical path. The SDK does this automatically when guardrails are attached to the agent object.

Output Guardrail: Prevent PII Leakage

import re
from agents import output_guardrail, GuardrailFunctionOutput
 
# Patterns to detect common PII
PII_PATTERNS = [
    r'\b\d{3}-\d{2}-\d{4}\b',           # SSN
    r'\b\d{16}\b',                          # credit card (naive)
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b',  # email
]
 
@output_guardrail
async def pii_guardrail(ctx, agent, output):
    """Block any response that appears to contain PII."""
    output_text = output.final_output if hasattr(output, 'final_output') else str(output)
 
    detected = []
    for pattern in PII_PATTERNS:
        if re.search(pattern, output_text):
            detected.append(pattern)
 
    return GuardrailFunctionOutput(
        output_info={"detected_patterns": detected},
        tripwire_triggered=len(detected) > 0
    )
 
agent = Agent(
    name="Data Assistant",
    instructions="You help users query data. Never output raw PII.",
    output_guardrails=[pii_guardrail]
)
 

Output Guardrail: Enforce Structured Format

If downstream systems require a specific output format, use an output guardrail to verify it before the response leaves the agent:

import json
from agents import output_guardrail, GuardrailFunctionOutput
 
@output_guardrail
async def json_format_guardrail(ctx, agent, output):
    """Ensure the output is valid JSON with required keys."""
    output_text = output.final_output if hasattr(output, 'final_output') else str(output)
 
    try:
        parsed = json.loads(output_text)
        required_keys = {"summary", "action_items", "priority"}
        missing = required_keys - set(parsed.keys())
        if missing:
            return GuardrailFunctionOutput(
                output_info={"error": f"Missing keys: {missing}"},
                tripwire_triggered=True
            )
    except json.JSONDecodeError:
        return GuardrailFunctionOutput(
            output_info={"error": "Output is not valid JSON"},
            tripwire_triggered=True
        )
 
    return GuardrailFunctionOutput(output_info={}, tripwire_triggered=False)
 

Lightweight Guardrails Without a Second LLM Call

LLM-based guardrail checks add latency and cost. For many cases, a fast rule-based check is sufficient and much cheaper:

from agents import input_guardrail, GuardrailFunctionOutput
 
BLOCKED_KEYWORDS = {"jailbreak", "ignore previous instructions", "pretend you are"}
MAX_INPUT_LENGTH = 2000
 
@input_guardrail
async def fast_input_guardrail(ctx, agent, input):
    """Fast rule-based checks — no LLM call needed."""
    user_message = input[-1]["content"] if isinstance(input, list) else str(input)
    msg_lower = user_message.lower()
 
    # Length check
    if len(user_message) > MAX_INPUT_LENGTH:
        return GuardrailFunctionOutput(
            output_info={"reason": "Input too long"},
            tripwire_triggered=True
        )
 
    # Keyword check for prompt injection attempts
    for keyword in BLOCKED_KEYWORDS:
        if keyword in msg_lower:
            return GuardrailFunctionOutput(
                output_info={"reason": f"Blocked keyword: {keyword}"},
                tripwire_triggered=True
            )
 
    return GuardrailFunctionOutput(output_info={}, tripwire_triggered=False)
 
Layer fast rule-based guardrails first, then LLM-based guardrails only for ambiguous cases. This keeps latency low for the common case where the input is clearly safe.

Handling Tripwire Events in Production

from agents import GuardrailTripwireTriggered, Runner
import logging
 
logger = logging.getLogger(__name__)
 
async def safe_agent_run(user_input: str, user_id: str) -> str:
    try:
        result = await Runner.run(agent, input=user_input)
        return result.final_output
 
    except GuardrailTripwireTriggered as e:
        # Log for monitoring — frequent triggers may indicate abuse
        logger.warning(
            "Guardrail triggered",
            extra={
                "user_id": user_id,
                "guardrail": e.guardrail_result.guardrail.__name__,
                "reason": e.guardrail_result.output.output_info,
            }
        )
        # Return a user-friendly message — never expose the raw guardrail reason
        return "I'm not able to help with that request. Please try a different question."
 

Guardrail Design Principles

  • Make guardrails specific — a guardrail that catches everything also catches legitimate requests. Define exactly what should be blocked.
  • Never expose guardrail internals to users — do not show the raw tripwire reason in the UI, only a generic message.
  • Log every tripwire event — a spike in guardrail triggers is a signal worth alerting on.
  • Test guardrails with adversarial inputs — red-team your own guardrails before launch.
  • Use gpt-4o-mini or claude-haiku for LLM-based guardrail checks — they are fast and cheap enough that latency impact is minimal.