When One Agent Is Not Enough

A single Claude agent with 20 tools and a 10,000-token task description works — until it doesn't. Long contexts degrade instruction-following. Too many tools dilute selection quality. Tasks with clearly separable concerns (research, write, review) benefit from dedicated agents with focused system prompts and trimmed tool sets.

The rule of thumb: split when a single agent's context would need to hold the full state of two unrelated workstreams simultaneously, or when you want different reliability/cost tradeoffs per subtask.

The Orchestrator/Subagent Pattern

An orchestrator agent receives the high-level task, decomposes it, delegates to specialist subagents, collects results, and synthesises the final output. Subagents receive narrow, well-defined tasks and return structured results.

The orchestrator calls subagents via tools. From Claude's perspective, a subagent is just another tool that takes a task description and returns a result string.

import anthropic
import json
 
client = anthropic.Anthropic()
 
# --- Subagent: Researcher ---
 
RESEARCHER_SYSTEM = (
    'You are a research specialist. Given a topic, search the web and return '
    'a concise JSON summary with keys: summary, key_facts (list), sources (list of URLs).'
)
 
def run_researcher(topic: str) -> str:
    messages = [{'role': 'user', 'content': f'Research this topic: {topic}'}]
    # run_agent and search_tools are assumed defined elsewhere:
    # a standard tool-use loop and its web-search tool definitions
    response, _ = run_agent(messages, search_tools, system=RESEARCHER_SYSTEM)
    # Return the text content as a string result
    for block in response.content:
        if hasattr(block, 'text'):
            return block.text
    return 'No research result'
 
# --- Subagent: Writer ---
 
WRITER_SYSTEM = (
    'You are a content writer. Given a research summary, write a 400-word '
    'blog section in markdown. Be factual, clear, and cite sources inline.'
)
 
def run_writer(research_summary: str, section_title: str) -> str:
    prompt = f'Section title: {section_title}\n\nResearch:\n{research_summary}'
    messages = [{'role': 'user', 'content': prompt}]
    response, _ = run_agent(messages, [], system=WRITER_SYSTEM)
    for block in response.content:
        if hasattr(block, 'text'):
            return block.text
    return ''
 

Wiring the Orchestrator

The orchestrator has tools that call subagents. It never directly calls the web search API or the writing logic — it delegates. This keeps the orchestrator's context lean and its tools list short.

ORCHESTRATOR_TOOLS = [
    {
        'name': 'research_topic',
        'description': 'Research a topic on the web and return a structured summary.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'topic': {'type': 'string', 'description': 'The topic to research'},
            },
            'required': ['topic'],
        },
    },
    {
        'name': 'write_section',
        'description': 'Write a blog section given a research summary and section title.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'research_summary': {'type': 'string'},
                'section_title': {'type': 'string'},
            },
            'required': ['research_summary', 'section_title'],
        },
    },
]
 
def orchestrator_execute_tool(name, inputs):
    if name == 'research_topic':
        return run_researcher(inputs['topic'])
    elif name == 'write_section':
        return run_writer(inputs['research_summary'], inputs['section_title'])
    return f'Unknown tool: {name}'
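
What the tools list and executor above leave implicit is the loop that drives the orchestrator: call the model, execute whichever subagent tool it requests, feed the result back, and repeat until the model stops asking for tools. A minimal sketch, assuming the standard Messages API tool-use protocol; run_orchestrator and its parameters are illustrative (not part of the original code), the model id is an example, and the client is injected so the loop can be exercised without network access:

```python
def run_orchestrator(client, task, tools, execute_tool,
                     model='claude-sonnet-4-5', system='', max_turns=10):
    """Standard tool-use loop: call the model, run any requested tools,
    feed results back, and stop when the model stops asking for tools.
    `client` is an anthropic.Anthropic() instance (injected for testability)."""
    messages = [{'role': 'user', 'content': task}]
    for _ in range(max_turns):
        response = client.messages.create(
            model=model,  # example model id; substitute your own
            max_tokens=4096, system=system,
            tools=tools, messages=messages,
        )
        if response.stop_reason != 'tool_use':
            # Final answer: concatenate the text blocks
            return ''.join(b.text for b in response.content if b.type == 'text')
        # Echo the assistant turn, then answer each tool_use block
        messages.append({'role': 'assistant', 'content': response.content})
        results = [
            {'type': 'tool_result', 'tool_use_id': b.id,
             'content': execute_tool(b.name, b.input)}
            for b in response.content if b.type == 'tool_use'
        ]
        messages.append({'role': 'user', 'content': results})
    return 'Stopped: max turns reached'
```

Wired to the definitions above, this would be called as run_orchestrator(anthropic.Anthropic(), task, ORCHESTRATOR_TOOLS, orchestrator_execute_tool).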
 
Pass structured data (JSON strings) between agents rather than free-form text. The researcher returns a JSON object as text; the writer receives it as a string and can parse it. This makes the pipeline testable: you can unit-test each agent independently with fixed inputs.
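
Parsing on the receiving side should be defensive: models occasionally wrap JSON in prose or return plain text. A sketch of a normalising parser (parse_research is an illustrative helper, not from the code above):

```python
import json

def parse_research(raw: str) -> dict:
    """Normalise the researcher's output into the expected schema,
    falling back to a plain-text wrapper if JSON parsing fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Model returned prose (or a non-object): keep it as the summary
        return {'summary': raw, 'key_facts': [], 'sources': []}
    # Guarantee every expected key so downstream agents can rely on them
    return {
        'summary': data.get('summary', ''),
        'key_facts': data.get('key_facts', []),
        'sources': data.get('sources', []),
    }
```

With this in place, run_writer can always be handed a dict with the same three keys, whatever the researcher actually produced.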

Parallel Subagent Execution

If subagents are independent (no output from one feeds another), run them in parallel with concurrent.futures. The orchestrator still calls them via its tool loop — but your tool executor runs them concurrently.

from concurrent.futures import ThreadPoolExecutor, as_completed
 
def run_parallel_research(topics: list[str]) -> dict[str, str]:
    results = {}
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(run_researcher, t): t for t in topics}
        for future in as_completed(futures):
            topic = futures[future]
            try:
                results[topic] = future.result()
            except Exception as e:
                results[topic] = f'Research failed: {e}'
    return results
 

Passing Context Between Agents

Agents have no shared state. Everything a subagent needs must be in its initial message or system prompt. If the orchestrator has gathered context over multiple turns, summarise it before passing to a subagent — do not pass the entire conversation history.

  • Task inputs: embed them in the user message; do not pass raw conversation history.
  • Shared config (company name, tone guidelines): put it in the system prompt; do not repeat it in every tool result.
  • Results from previous subagents: pass a summarised string in the user message; do not pass the full raw output if it runs to thousands of tokens.
  • Structured data (IDs, schemas): pass a JSON string in the user message; do not rely on Claude to remember it across agent boundaries.
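
These rules can be folded into a small helper that assembles a subagent's user message from structured pieces, truncating oversized prior results instead of forwarding them raw. A sketch; build_subagent_prompt and its character limit are illustrative choices, not from the original:

```python
import json

def build_subagent_prompt(task: str, shared: dict,
                          prior_results: dict[str, str],
                          max_result_chars: int = 2000) -> str:
    """Assemble a subagent's user message from structured pieces.
    Oversized prior results are truncated rather than passed raw."""
    parts = [f'Task: {task}']
    if shared:
        # Shared config travels as compact JSON, not repeated prose
        parts.append('Shared context:\n' + json.dumps(shared, indent=2))
    for name, result in prior_results.items():
        if len(result) > max_result_chars:
            result = result[:max_result_chars] + '\n[truncated]'
        parts.append(f'Result from {name}:\n{result}')
    return '\n\n'.join(parts)
```

A character cap is a crude proxy for a token budget; replacing the truncation with a model-generated summary is the natural upgrade when prior results matter in full.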

When NOT to Use Multiple Agents

Multi-agent systems add complexity, latency, and cost. Do not split into multiple agents just because a task has multiple steps. A single agent with a well-structured system prompt handles most tasks up to 10,000-15,000 tokens of context.

  • Use one agent when steps are sequential and share state (each step's output is the next step's input)
  • Use one agent when the total context fits comfortably within the model's window
  • Use multiple agents when tasks are truly parallel and independent
  • Use multiple agents when different subtasks need different system prompts, tools, or model tiers
  • Use multiple agents when you want to isolate failures (one subagent failing should not abort the whole pipeline)
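The failure-isolation point in the list above can be made concrete: run each subagent behind a try/except and collect successes and failures separately, so the orchestrator can synthesise from partial results instead of aborting. A sketch (run_isolated is an illustrative helper, not from the code above):

```python
from typing import Callable

def run_isolated(subagents: dict[str, Callable[[str], str]],
                 inputs: dict[str, str]) -> tuple[dict, dict]:
    """Run each subagent independently; one crash never aborts the pipeline.
    Returns (results, failures), keyed by subagent name."""
    results, failures = {}, {}
    for name, fn in subagents.items():
        try:
            results[name] = fn(inputs[name])
        except Exception as e:
            failures[name] = f'{type(e).__name__}: {e}'
    return results, failures
```

The orchestrator can then report the failed subtasks explicitly in its final synthesis rather than silently omitting them.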