PydanticAI's killer feature is type-safe, validated agent outputs. Here is how to use it properly -- and what breaks when you don't.

Why This Matters

Most agent frameworks return plain text and leave parsing to you. PydanticAI takes a different approach: you declare a Pydantic model as the agent's result type, and PydanticAI guarantees the output conforms to that schema -- or retries automatically until it does. This is its core differentiator.

The feature sounds straightforward but has sharp edges. This guide covers the full pattern, the common mistakes, and how to handle validation failures gracefully in production.

The Basic Pattern

from pydantic import BaseModel, Field
from pydantic_ai import Agent
 
# Define the exact structure you want back
class ResearchReport(BaseModel):
    title: str
    summary: str = Field(description="2-3 sentence executive summary")
    key_findings: list[str] = Field(min_length=3, max_length=8)
    confidence_score: float = Field(ge=0.0, le=1.0)
    sources: list[str]
 
# Declare it as the result type -- PydanticAI enforces it
agent = Agent(
    model="claude-sonnet-4-6",
    result_type=ResearchReport,
    system_prompt=(
        "You are a research analyst. Always provide thorough research "
        "with verifiable sources and a confidence score."
    ),
)
 
# The result is always a ResearchReport -- fully type-safe
result = await agent.run("Research the current state of AI agent frameworks")
report: ResearchReport = result.data
 
print(report.title)
print(report.confidence_score)  # guaranteed to be 0.0-1.0
for finding in report.key_findings:  # guaranteed to be 3-8 items
    print(f"- {finding}")
Add Field(description=...) to every field. PydanticAI passes these descriptions to the LLM as guidance for what each field should contain. The better your field descriptions, the higher your first-attempt success rate.
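Those descriptions are not decoration: they land in the JSON schema that PydanticAI sends to the model. You can verify this locally with plain Pydantic, no LLM call needed -- a quick sketch:

```python
from pydantic import BaseModel, Field

class ResearchReport(BaseModel):
    title: str
    summary: str = Field(description="2-3 sentence executive summary")
    key_findings: list[str] = Field(min_length=3, max_length=8)
    confidence_score: float = Field(ge=0.0, le=1.0)
    sources: list[str]

# Inspect the JSON schema the model will see: descriptions and
# constraints are rendered as standard JSON Schema keywords
schema = ResearchReport.model_json_schema()
props = schema["properties"]
```

Constraints become machine-readable guidance too: `min_length=3` on a list renders as `minItems: 3`, and `le=1.0` as `maximum: 1.0`, so the model sees them alongside the descriptions.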

Validation Constraints That Work Well

Pydantic's full validation toolkit works in result types. These are the constraints that have the most impact on output quality:

from pydantic import BaseModel, Field, model_validator
from typing import Literal
 
class ProductReview(BaseModel):
    product_name: str = Field(min_length=1, max_length=100)
    rating: int = Field(ge=1, le=5, description="Rating from 1 (worst) to 5 (best)")
    sentiment: Literal["positive", "neutral", "negative"]
    pros: list[str] = Field(min_length=1, description="At least one positive aspect")
    cons: list[str] = Field(description="Negative aspects -- can be empty list")
    word_count_estimate: int = Field(gt=0)
 
    # Custom cross-field business logic. Use a model validator, not a
    # field validator: a field_validator on "rating" runs before
    # "sentiment" has been parsed (fields validate in definition order),
    # so the sentiment value would not be available yet.
    @model_validator(mode="after")
    def rating_matches_sentiment(self):
        if self.sentiment == "positive" and self.rating < 3:
            raise ValueError("Positive sentiment requires rating >= 3")
        if self.sentiment == "negative" and self.rating > 2:
            raise ValueError("Negative sentiment requires rating <= 2")
        return self
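
You can exercise a cross-field check like this locally, with no LLM in the loop. A minimal sketch -- `MiniReview` is a hypothetical cut-down stand-in for `ProductReview` (an `mode="after"` model validator sees all parsed fields, which a field validator on an earlier field does not):

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError, model_validator

class MiniReview(BaseModel):
    rating: int = Field(ge=1, le=5)
    sentiment: Literal["positive", "neutral", "negative"]

    @model_validator(mode="after")
    def rating_matches_sentiment(self):
        # Runs after all fields are validated, so both are available
        if self.sentiment == "positive" and self.rating < 3:
            raise ValueError("Positive sentiment requires rating >= 3")
        return self

ok = MiniReview(rating=5, sentiment="positive")  # passes both checks

try:
    MiniReview(rating=1, sentiment="positive")   # fails the cross-field check
    raised = False
except ValidationError:
    raised = True
```

When the LLM produces a review that trips this check, PydanticAI feeds the `ValueError` message back to the model on retry -- so write validator messages as instructions the model can act on.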

Handling Validation Failures

When the LLM returns data that fails validation, PydanticAI automatically retries with the validation error included in the next prompt. By default it retries once. You can configure this, but you should also handle the case where retries are exhausted.

from pydantic_ai import Agent
from pydantic_ai.exceptions import UnexpectedModelBehavior
 
agent = Agent(
    model="claude-sonnet-4-6",
    result_type=ResearchReport,
    result_retries=3,  # retry up to 3 times on validation failure (default: 1)
)
 
try:
    result = await agent.run("Research AI agent frameworks")
    report = result.data
 
except UnexpectedModelBehavior as e:
    # All retries exhausted -- model could not produce valid output
    print(f"Agent failed after retries: {e}")
    # Options: fall back to unstructured output, alert, or queue for human review
    # (unstructured_agent: a separate Agent defined elsewhere with no result_type)
    fallback = await unstructured_agent.run("Research AI agent frameworks")
Setting result_retries too high increases latency and cost. Each retry is a full LLM call with the validation error injected. For most use cases, 2-3 retries is sufficient. If you regularly need more than 3 retries, the problem is usually in your Pydantic model design or field descriptions, not the retry count.
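
To make the cost concrete, here is a conceptual sketch of the retry loop -- `call_model` and `run_with_retries` are hypothetical stand-ins, and PydanticAI's real implementation feeds errors back through the tool-calling protocol rather than plain prompt text:

```python
from pydantic import BaseModel, Field, ValidationError

class Scored(BaseModel):
    confidence: float = Field(ge=0.0, le=1.0)

def run_with_retries(call_model, prompt: str, retries: int = 3) -> Scored:
    error = None
    for _ in range(retries + 1):
        # Every attempt is a fresh, full model call; after a failure,
        # the validation error is appended to the prompt
        full_prompt = prompt if error is None else f"{prompt}\n\nFix this error:\n{error}"
        try:
            return Scored.model_validate(call_model(full_prompt))
        except ValidationError as e:
            error = e
    raise RuntimeError("validation retries exhausted")

# Stub model: returns out-of-range data once, then corrects itself
attempts = []
def stub_model(prompt: str) -> dict:
    attempts.append(prompt)
    return {"confidence": 1.5} if len(attempts) == 1 else {"confidence": 0.9}

result = run_with_retries(stub_model, "score this")
```

Each iteration pays full latency and token cost, and the failed attempt's output plus the error text inflate the retry prompt -- which is why high retry counts are a smell, not a fix.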

Union Types for Multi-Format Responses

Sometimes an agent needs to return different structures depending on what it finds. PydanticAI supports Union result types -- the model picks the appropriate one.

from typing import Literal, Union
from pydantic import BaseModel
 
class SuccessResult(BaseModel):
    status: Literal["success"]
    data: dict
    record_count: int
 
class ErrorResult(BaseModel):
    status: Literal["error"]
    error_message: str
    suggested_fix: str
 
class NoDataResult(BaseModel):
    status: Literal["no_data"]
    reason: str
 
agent = Agent(
    model="claude-sonnet-4-6",
    result_type=Union[SuccessResult, ErrorResult, NoDataResult],
    system_prompt="Query the data source and return the appropriate result type.",
)
 
result = await agent.run("Fetch Q1 2026 sales data")
 
# Type narrowing in Python
match result.data.status:
    case "success":
        print(f"Got {result.data.record_count} records")
    case "error":
        print(f"Error: {result.data.error_message}")
    case "no_data":
        print(f"No data: {result.data.reason}")

Testing Structured Outputs

PydanticAI has first-class support for testing via its TestModel -- a mock model that returns fixed responses without making LLM calls. This is the right way to test your validation logic.

from pydantic_ai.models.test import TestModel
 
# Test that your schema accepts valid data
def test_valid_report():
    with agent.override(model=TestModel()):
        result = agent.run_sync("test query")
        # TestModel returns minimal valid data matching your schema
        assert isinstance(result.data, ResearchReport)
        assert len(result.data.key_findings) >= 3
 
# Test that invalid data is rejected -- wrap the construction in
# pytest.raises, otherwise the ValidationError errors the test out
# instead of passing it
import pytest
from pydantic import ValidationError
 
def test_validation_rejects_bad_confidence():
    with pytest.raises(ValidationError):
        ResearchReport(
            title="Test",
            summary="Test summary that is long enough",
            key_findings=["a", "b", "c"],
            confidence_score=1.5,  # invalid -- should be <= 1.0
            sources=["http://example.com"],
        )

Quick Reference

  • Set result_type to a Pydantic model to get type-safe, validated outputs
  • Add Field(description=...) to every field -- the LLM uses these as instructions
  • Use Literal types for fields with fixed allowed values (e.g. sentiment, status)
  • Set result_retries=2 or 3 in production -- always catch UnexpectedModelBehavior
  • Use Union types when the agent might return different structures
  • Test with TestModel to validate your schema logic without LLM calls