Skip to content

Chapter 4 — Multi-Agent

🧩

"Anthropic Research multi-agent beat Claude Opus 4 single by +90.2%.But cost 15x tokens. Worth it?"

You'll learn

  • 4 multi-agent patterns: orchestrator-worker, debate, hierarchical, swarm
  • Anthropic Research case (+90.2% internal eval)
  • Devin + Goldman Sachs: hundreds of instances parallel
  • Frameworks 2026: LangGraph, CrewAI, Claude Code SDK, A2A
  • Cost reality + when to use / not use

01 Why multi-agent?

Single-agent limits

BottleneckSingle-agent
Context windowBloats, performance drops
SequentialMust do A → B → C
Cognitive load1 model handles many domains → confused

Multi-agent unlocks

  • Parallel — 3 tasks at once → 3x faster
  • Specialized — each sub-agent clear role
  • Fresh context — sub-agents don't bloat
  • 15x token cost — expensive!
  • Coordination overhead — orchestrator + comm protocol

02 Anthropic Research case (Apr 2025)

Story

Anthropic built internal multi-agent research system for company research tasks (technical analysis, market scan, competitive intel).

Architecture

        LEAD AGENT (Sonnet)
        Plan + spawn + synthesize

   ┌────────────┼────────────┐
   ▼            ▼            ▼
Worker 1     Worker 2      Worker 3
Research     Verify        Citation
topic A      claims        check

Result

MetricSingle agentMulti-agent
Eval scorebaseline+90.2%
Token cost1x~15x
Latency1x1.5-3x (parallel sub)
Coordination overhead010-20%

"Multi-agent systems can consume approximately 15x more tokens than standard chat interactions."Anthropic


03 Devin + Goldman Sachs — hybrid workforce

Story

Jul 2025: Goldman Sachs deployed Devin (Cognition) alongside 12,000 human engineers.

ItemNumber
Human engineers12,000
Devin instances (peak)Hundreds
Task typeReal engineering tickets
Speed gain3-4x faster on complex tasks
Devin pricing$20-500/month (Devin 2.0 slashed)

"Devin = employee #1 in hybrid workforce."Goldman CIO

Pattern: multi-instance Devin parallel = enterprise-scale multi-agent.

Cognition AI growth

MetricNumber
ARR Sept 2024$1M
ARR June 2025$73M (73x in 9 months)
Valuation Apr 2026$25B

04 4 multi-agent patterns

Pattern 1: Orchestrator-Worker

When: Task parallel-breakable, each subtask different domain, needs final synthesis. Example: Research + audit + report

Pattern 2: Debate

AGENT A ←──→ AGENT B    →    JUDGE AGENT
(propose)   (critique)        (decide)

When: Decision needs multi-perspective, debug LLM bias, critical judgment. Example: Code review (proposer vs critic), legal contract review

Pattern 3: Hierarchical

       CEO AGENT (strategy)

    ┌───────┴───────┐
 MANAGER A      MANAGER B
    │              │
  [W1,W2]      [W3,W4]

When: Big task needs multi-level breakdown, mimicking company. Example: Full product (research → design → code → marketing)

Pattern 4: Swarm (peer-to-peer)

AGENT 1 ←──→ AGENT 2
   ↕              ↕
AGENT 4 ←──→ AGENT 3

When: No clear hierarchy, emergent behavior, simulation/research. Example: Multi-agent RL, simulation


05 Anthropic principles for sub-agents

5 principles Anthropic published (Apr 2025)

1. Each sub-agent needs clear objective — not "help me" vague 2. Output format defined — JSON structured, not free text 3. Tool guidance — tell sub-agent which tool, when 4. Task boundary clear — sub-agent knows when done 5. Fresh context window — no peer-to-peer comm, single summary return


06 Frameworks 2026 landscape

LangGraph — production powerhouse

ItemDetail
ApproachGraph nodes + shared state
Production useKlarna, Uber, LinkedIn agents
StrengthBest LangSmith observability, time-travel checkpoints
Stars (May 2026)Surpassed CrewAI early 2026
Best forMission-critical production

CrewAI — easiest learning curve

ItemDetail
ApproachRole-based DSL (Crew + Task + Agent)
Strength20 lines to start, intuitive
Adoption60% Fortune 500, Insight Partners backing, 44K+ stars
Best forPrototype, agency

AutoGen — maintenance mode (Feb 2026)

ItemDetail
StatusMicrosoft moved to maintenance mode
ReplacementMicrosoft Agent Framework
ImplicationDon't start new projects with AutoGen

OpenAI Swarm → Agents SDK

Lightweight handoff-based, OpenAI-native.

Anthropic Claude Code SDK

Orchestrator-worker via Task tool, MCP-native, Claude-first.

A2A protocol (Google → Linux Foundation)

  • Open standard, donated Jun 2025
  • 150+ supporters — Atlassian, Salesforce, ServiceNow, SAP, Workday
  • Protocol: HTTP + SSE + JSON-RPC 2.0 + Agent Cards
  • Best for: cross-vendor agent communication

07 When to use / not use multi-agent

Use multi-agent when

  • Task has > 3 parallel subtasks
  • Cost not a constraint (15x tokens)
  • Need specialists (security audit + performance + UX)
  • Wide domain breadth (broad research)

Don't use multi-agent when

  • Sequential task (A must finish before B)
  • Tight budget
  • Single domain
  • Latency-sensitive (real-time chat)
  • Coordination overhead > task value

Quick decision

Q1: > 3 parallel subtasks?
   NO  → Single agent
   YES → Q2

Q2: Budget allows 15x tokens?
   NO  → Single agent
   YES → Q3

Q3: Need different-domain specialists?
   NO  → Single agent (just split task)
   YES → ✅ Multi-agent

08 Cost economics

Real numbers (May 2026)

SetupToken / taskCost (Sonnet 4.6)Time
Single Sonnet50K$0.7530s
Orchestrator + 3 Haiku worker200K$1.0520s (parallel)
Orchestrator + 5 Sonnet worker500K$7.5030s
Anthropic Research full750K~$1160-120s

ROI calculation

Task value > $10: multi-agent OK Task value < $1: single agent Thousand tasks/day: cost compounds — optimize prompt cache + Haiku for sub


09 Common pitfalls

🚨 6 multi-agent mistakes

1. Multi-agent for sequential task → 15x cost no gain 2. Sub-agent context overlap → redundant info, wasted cost 3. Inconsistent output format → orchestrator can't parse 4. Forget timeout → 1 sub stuck → orchestrator waits forever 5. No cost monitoring → end-of-month bill shock 6. Eval doesn't cover multi-agent → don't know output quality


10 Application in emerging markets

Multi-agent use cases

Use casePatternStack
Multi-channel CS (Messenger + Zalo + web)Orchestrator-WorkerSmax.ai + Anthropic
Tax compliance auditHierarchicalClaude + MCP for tax DB
Multi-marketplace e-comSwarmn8n + Claude per channel
Investment research local marketDebateLangGraph + market data

Smax.ai = micro multi-agent

Each channel (Messenger, Zalo, web, IG) = 1 sub-agent:

  • Own context (channel + customer history)
  • Own tools (CRM, inventory, shipping)
  • Orchestrator: human agent handover

Yody case: +15-20% close rate = multi-agent in easy-access form for SMEs.

Cost economics

  • Emerging market dev rate: $20-50/hr
  • Multi-agent system build: 2-4 weeks
  • End-product cost: $200-1K/month API
  • Project price (consultant): $5K-30K

→ Clear ROI for SMEs with > 1K conversations/month.


11 Practice exercises

✍️ 3 levels

Level 1 — 1 week

  • Implement basic orchestrator-worker
  • Task: "Audit OSS repo for X, Y, Z" parallel
  • Compare cost vs single agent

Level 2 — 1 month

  • Use CrewAI or LangGraph
  • Build 1 production multi-agent (3+ workers)
  • Add observability (LangSmith)

Level 3 — 3 months

  • Pitch SME: multi-channel agent
  • Deliver (Smax.ai + n8n hybrid)
  • Charge $10K+

12 Continue reading

Final word

"Multi-agent isn't always better.15x tokens = need clear ROI.Simple rule:- Task value > $10 → orchestrator-worker- Task value < $1 → single agent- Parallel breakdown clear → multi-agent- Sequential → single agent, even if big."