# Multi-Agent Case Studies

Real-world examples of multi-agent systems in production, drawn from field experience.

## Case Study Index

| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop work while you sleep |

---

## Case Study 1: AI Docs Loader

**Pattern:** Sub-agent delegation for parallel work

**Problem:** Loading 10 documentation URLs consumes 30k+ tokens per scrape. A single agent would hit 150k+ tokens.

**Solution:** Delegate each scrape to an isolated sub-agent.

**Architecture:**

```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)

Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```

**Implementation:**

```bash
# Single command
/load-ai-docs

# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
#   - Spawn sub-agent
#   - Sub-agent scrapes URL
#   - Sub-agent saves to file
#   - Sub-agent reports completion
# Primary agent never sees scrape content
```

**Key techniques:**

- **Sub-agents for isolation** - Each scrape runs in a separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of the primary agent

**Results:**

- **Time:** 10 scrapes in parallel vs. sequential (10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues

**Source:** Elite Context Engineering transcript

---

## Case Study 2: SDK Migration

**Pattern:** Scout-plan-build with multiple perspectives

**Problem:** Migrating a codebase to the new Claude Agent SDK across 8 applications

**Challenge:**

- 100+ files potentially affected
- An agent reading everything = 150k+ tokens
- Planning without full context = mistakes

**Solution:** Three-phase workflow with delegation

**Phase 1: Scout (reduce context for the planner)**

```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)

Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")

Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```

**Why multiple models?** Diverse perspectives catch edge cases a single model might miss.
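A minimal sketch of how the scout fan-out could be driven, assuming a hypothetical `run_scout` helper that launches one isolated scout agent and returns its markdown report. The model names and the merged-report idea come from the case study; everything else is illustrative:

```python
# Sketch: fan out the same search prompt to several scout models in parallel,
# then merge their findings into relevant-files.md for the planner.
from concurrent.futures import ThreadPoolExecutor

SCOUT_MODELS = ["gemini-lightning", "codex", "gemini-flash-preview", "haiku"]
QUERY = "Find every usage of the old Claude Agent SDK; report file, line number, and snippet."

def run_scout(model: str, query: str) -> str:
    # Placeholder: wire this to your agent runner. Returning a stub string
    # keeps the sketch runnable end to end.
    return f"(scout '{model}' not wired up yet)"

def gather_scout_reports() -> None:
    with ThreadPoolExecutor(max_workers=len(SCOUT_MODELS)) as pool:
        reports = list(pool.map(lambda m: run_scout(m, QUERY), SCOUT_MODELS))
    # Only this small merged file ever reaches the planner's context.
    with open("relevant-files.md", "w") as f:
        f.write("# Relevant files (merged from 4 scouts)\n\n")
        for model, report in zip(SCOUT_MODELS, reports):
            f.write(f"## Findings from {model}\n\n{report}\n\n")

if __name__ == "__main__":
    gather_scout_reports()
```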
**Phase 2: Plan (focus on relevant subset)**

```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)

Context used: 16k tokens vs. 150k if reading entire codebase
Savings: 89% reduction
```

**Phase 3: Build (execute plan)**

```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion

Context used: ~80k tokens
Still within safe limits
```

**Final context analysis:**

```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)

With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```

**Key techniques:**

- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect the planner's context
- **Fresh agents per phase** - No context accumulation

**Results:**

- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across the entire workflow
- **Completed in a single session** (~30 minutes)

**Near miss:** "We were 14% away from exploding our context" due to the autocompact buffer.

**Lesson:** Disable the autocompact buffer. That 22% matters at scale.

**Source:** Claude 2.0 transcript

---

## Case Study 3: Codebase Summarization

**Pattern:** Orchestrator with specialized QA agents

**Problem:** Summarize a large codebase (frontend + backend) with architecture docs

**Approach:** Divide and conquer with synthesis

**Architecture:**

```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│  ├─ Summarizes frontend components
│  └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│  ├─ Summarizes backend APIs
│  └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
   ├─ Reads both summaries
   ├─ Synthesizes unified view
   └─ Outputs: codebase-overview.md
```

**Orchestrator behavior:**

```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```

**Prompts from orchestrator:**

```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"

Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"

Primary QA Agent:
"Read frontend-summary.md and backend-summary.md.
Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```
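The sleep-and-wake behavior in steps 4-6 above can be made concrete with a small sketch. Here `spawn_agent` is a hypothetical stand-in for whatever creates and commands a sub-agent, and the output paths mirror the prompts above:

```python
# Sketch of the orchestrator's sleep/wake loop: command agents, then poll for
# their output files instead of streaming their work into the orchestrator's context.
import time
from pathlib import Path

def spawn_agent(name: str, prompt: str) -> None:
    # Placeholder: hand the prompt to your agent runtime.
    print(f"[orchestrator] commanded {name}")

def wait_for(paths: list[str], poll_seconds: int = 15, timeout_seconds: int = 600) -> None:
    # The orchestrator sleeps between checks; it only learns that the files
    # exist, never their contents, so its own context stays small.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        if all(Path(p).exists() for p in paths):
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"agents did not finish in time: {paths}")

def summarize_codebase() -> None:
    spawn_agent("frontend-qa", "Summarize src/frontend/ into docs/frontend-summary.md")
    spawn_agent("backend-qa", "Summarize src/backend/ into docs/backend-summary.md")
    wait_for(["docs/frontend-summary.md", "docs/backend-summary.md"])
    spawn_agent("primary-qa", "Merge both summaries into docs/codebase-overview.md")
    wait_for(["docs/codebase-overview.md"])
```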
**Observability interface shows:**

```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds

[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds

[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds

Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```

**Key techniques:**

- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use

**Results:**

- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)

**Source:** One Agent to Rule Them All transcript

---

## Case Study 4: UI Component Creation

**Pattern:** Scout-builder two-stage

**Problem:** Create gray pills for app header information display

**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.

**Solution:** Scout locates, builder implements

**Phase 1: Scout**

```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│   ├── src/components/AppHeader.vue
│   ├── src/styles/pills.css
│   └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
    ├── File locations
    ├── Line numbers for modifications
    ├── Existing patterns to follow
    └── Recommended approach
```

**Phase 2: Builder**

```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```

**Orchestrator involvement:**

```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
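One way to picture the scout-to-builder hand-off is to model scout-header-report.md as structured data. The real report is plain markdown; the dataclasses below are purely illustrative, showing what the builder needs to know before it touches any file:

```python
# Sketch: the scout's findings as data the builder can act on directly.
from dataclasses import dataclass, field

@dataclass
class ScoutFinding:
    path: str      # e.g. "src/components/AppHeader.vue"
    line: int      # where the builder should make its change
    pattern: str   # existing convention to follow (pill style, color token)

@dataclass
class ScoutReport:
    findings: list[ScoutFinding] = field(default_factory=list)
    recommended_approach: str = ""

    def to_markdown(self) -> str:
        lines = ["# Scout report: header pills", "", "## Files to touch"]
        lines += [f"- {f.path}:{f.line} (follow: {f.pattern})" for f in self.findings]
        lines += ["", "## Recommended approach", self.recommended_approach]
        return "\n".join(lines)
```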
**Key techniques:**

- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads

**Results:**

- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try

**Source:** One Agent to Rule Them All transcript

---

## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board

**Pattern:** Structured lifecycle with quality gates

**Problem:** Ensure all changes go through proper review before shipping

**Architecture:**

```text
Task Board Columns:

┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│  PLAN   │ →  │  BUILD  │ →  │ REVIEW  │ →  │  SHIP   │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
```

**Example task: "Update HTML titles"**

**Column 1: PLAN**

```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│   ├── index.html
│   └── src/App.tsx (has title in render)
├── Creates implementation plan:
│   1. Update index.html <title>
│   2. Update App.tsx header component
│   3. Test both pages load correctly
└── Moves task to BUILD column
```

**Column 2: BUILD**

```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│   ├── index.html: "Plan Build Review Ship"
│   └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```

**Column 3: REVIEW**

```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│   ├── Plan followed? ✅
│   ├── Tests passing? ✅
│   ├── Code quality? ✅
│   └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```

**Column 4: SHIP**

```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```

**Orchestrator's role:**

```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```

**UI representation:**

```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│   ├── PLAN: planner-001 (completed 2m ago)
│   ├── BUILD: builder-002 (completed 1m ago)
│   └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```

**Key techniques:**

- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved

**Results:**

- **Zero shipping untested code** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)
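The quality-gate rule ("can't skip to SHIP without REVIEW") boils down to a tiny state machine. A minimal sketch follows, with an illustrative `Task` class that is not part of the actual system:

```python
# Sketch: tasks advance one column at a time and can never skip REVIEW.
from dataclasses import dataclass

PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]

@dataclass
class Task:
    title: str
    phase: str = "PLAN"

    def advance(self, checks_passed: bool) -> str:
        """Move to the next column only if the current phase's checks passed."""
        if not checks_passed:
            return self.phase  # e.g. failing tests keep the task in BUILD
        i = PHASES.index(self.phase)
        if i < len(PHASES) - 1:
            self.phase = PHASES[i + 1]
        return self.phase

task = Task("Update HTML titles")
task.advance(checks_passed=True)    # PLAN -> BUILD
task.advance(checks_passed=False)   # stays in BUILD until tests pass
```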
**Source:** Custom Agents transcript

---

## Case Study 6: Meta-Agent System

**Pattern:** Agents building agents

**Problem:** Need a new specialized agent but don't want to hand-write the configuration

**Solution:** Meta-agent that builds other agents

**Meta-agent prompt:**

```markdown
# meta-agent.md

You are a meta-agent that builds new sub-agents from user descriptions.

When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples

Output: .claude/agents/<agent-name>.md with complete configuration
```

**Example: Building a TTS summary agent**

**User:** "Build agent that summarizes what my code does using text-to-speech"

**Meta-agent process:**

```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)

Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format

Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"

Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
  ---
  name: tts-summary
  description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
  ---
  Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
  [... full system prompt ...]

Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅

Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```

**Result:** Fully functional TTS summary agent created from a natural language description

**Recursion depth:**

```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
   └→ Level 2: TTS summary agent (built by meta-agent)
      └→ Level 3: Sub-agents (if TTS agent spawns any)
```

**Key techniques:**

- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents

**Challenges:**

- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated

**Results:**

- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**
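Step 4 of the process, writing the configuration file, is mechanically simple. A minimal sketch, assuming a hypothetical helper whose frontmatter fields mirror the tts-summary example above (the helper itself is not part of Claude Code):

```python
# Sketch: write a new agent configuration file in the format shown in step 4.
from pathlib import Path

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    path = Path(f".claude/agents/{name}.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{system_prompt}\n"
    )
    return path

# Example: roughly the file the meta-agent produced for the TTS case.
write_agent_config(
    name="tts-summary",
    description='Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
    system_prompt="Review the user's code and provide a 1-sentence summary via 11Labs voice.",
)
```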
**Source:** Sub-Agents transcript

---

## Case Study 7: Observability Dashboard

**Pattern:** Real-time multi-agent monitoring

**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.

**Solution:** Centralized observability system

**Architecture:**

```text
┌─────────────────── Multiple Agents ───────────────────┐
│  Agent 1   Agent 2   Agent 3   Agent 4   Agent 5      │
│     ↓         ↓         ↓         ↓         ↓          │
│              pre/post-tool-use hooks                   │
│                        ↓                               │
└────────────────────────────────────────────────────────┘
                         ↓
┌─────────────────── Bun Server ────────────────────────┐
│  POST /events endpoint                                 │
│  ├→ Store in SQLite (persistence)                      │
│  └→ Broadcast via WebSocket (real-time)                │
└────────────────────────────────────────────────────────┘
                         ↓
┌─────────────────── Web Client ────────────────────────┐
│  ┌──────────────────────────────────────────────────┐ │
│  │ Live Activity Pulse (1min/3min/5min windows)     │ │
│  ├──────────────────────────────────────────────────┤ │
│  │ Event Stream (filtered by app/session/type)      │ │
│  ├──────────────────────────────────────────────────┤ │
│  │ Event Details (with AI-generated summaries)      │ │
│  └──────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
```

**Hook implementation:**

```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)

    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run", ".claude/hooks/utils/send-event.py",
        "my-codebase",     # App name
        "post-tool-use",   # Event type
        "--summarize"      # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```

**What the user sees:**

```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁  Agent A (very active)                    │
│ ▁▁▂▂▃▃▂▂  Agent B (moderate activity)              │
│ ▂▂▂▂▂▂▂▂  Agent C (steady work)                    │
│ ▁▁▁█▁▁▁▁  Agent D (spike, then quiet)              │
└────────────────────────────────────────────────────┘

┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use                            │
│   Summary: "Wrote authentication logic to user.py" │
│   Time: 2s ago                                     │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop                           │
│   Summary: "Completed documentation scrape"        │
│   Time: 5s ago                                     │
├────────────────────────────────────────────────────┤
│ [Agent C] notification                             │
│   Summary: "Needs approval for rm command"         │
│   Time: 8s ago                                     │
└────────────────────────────────────────────────────┘
```

**Filtering:**

```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```

**Event summarization:**

```text
# Each event summarized by Haiku ($0.0002 per event)

Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success

Summary generated: "Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```

**Key techniques:**

- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session

**Results:**

- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries

**Business value:**

- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools are used most?)
- **Debug issues** (what happened before a failure?)
- **Scale confidence** (can observe 10+ agents easily)
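The send-event.py utility the hook calls is referenced above but not shown; a minimal sketch of one plausible shape follows. The server address, payload fields, and the decision to skip summarization here are all assumptions, not the documented tool:

```python
# Sketch: read the hook event from stdin and POST it to the observability server.
import json
import sys
import urllib.request

SERVER_URL = "http://localhost:4000/events"  # hypothetical server address

def main() -> None:
    app_name, event_type = sys.argv[1], sys.argv[2]   # e.g. "my-codebase", "post-tool-use"
    payload = {
        "app": app_name,
        "event_type": event_type,
        "event": json.load(sys.stdin),                # raw hook payload
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)                       # one-way, fire-and-forget stream

if __name__ == "__main__":
    main()
```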
**Source:** Multi-Agent Observability transcript

---

## Case Study 8: AFK Agent Device

**Pattern:** Autonomous background work while you're away

**Problem:** Long-running tasks block your terminal. You want to work on something else.

**Solution:** Dedicated device running an agent fleet

**Architecture:**

```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates

Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```

**Workflow:**

```bash
# From your device
/afk-agents \
  --prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
  --adw "plan-build-ship" \
  --docs "https://openai-agent-sdk.com/docs"

# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```

**Agent device execution:**

```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```

**Status updates (every 60s):**

```text
Your device shows:

[60s]  Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅

Click to view: results/sdk-agents-20250105/
```

**What you do:**

```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```

**Key techniques:**

- **Job queue** - Agents pick up work from the queue
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed

**Results:**

- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality

**Infrastructure required:**

- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting

**Use cases:**

- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
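A rough sketch of the hand-off between the two devices, assuming a shared file-based queue; the paths, field names, and polling scheme are illustrative, not the actual infrastructure from the transcript:

```python
# Sketch: submit a job from the interactive device and watch for status lines
# written back by the agent device, roughly once a minute.
import json
import time
from pathlib import Path

QUEUE_FILE = Path("afk/queue.jsonl")    # hypothetical shared job queue
STATUS_FILE = Path("afk/status.jsonl")  # hypothetical status log

def submit_job(prompt: str, workflow: str, docs_url: str) -> None:
    QUEUE_FILE.parent.mkdir(parents=True, exist_ok=True)
    with QUEUE_FILE.open("a") as f:
        f.write(json.dumps({"prompt": prompt, "adw": workflow, "docs": docs_url}) + "\n")

def watch_status(poll_seconds: int = 60) -> None:
    """Print new status lines as they appear; stop when the job reports complete."""
    seen = 0
    while True:
        if STATUS_FILE.exists():
            lines = STATUS_FILE.read_text().splitlines()
            for line in lines[seen:]:
                print(line)
            seen = len(lines)
            if lines and "complete" in lines[-1].lower():
                return
        time.sleep(poll_seconds)
```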
**Source:** Claude 2.0 transcript

---

## Cross-Cutting Patterns

### Pattern: Context Window as Resource Constraint

**Appears in:**

- Case 1: Sub-agent delegation protects the primary agent
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)

**Lesson:** Context is precious. Protect it aggressively.

### Pattern: Specialized Agents Over General

**Appears in:**

- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute

**Lesson:** "A focused agent is a performant agent."

### Pattern: Observability Enables Scale

**Appears in:**

- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s

**Lesson:** "If you can't measure it, you can't scale it."

### Pattern: Deletable Temporary Resources

**Appears in:**

- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after the task moves on
- Case 8: Builder agents deleted after shipping

**Lesson:** "The best agent is a deleted agent."

## Performance Comparisons

### Single Agent vs. Multi-Agent

| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |

### With vs. Without Orchestration

| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|--------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |

## Common Failure Modes

### Failure: Context Explosion

**Scenario:** Case 2 without scouts

- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out

**Fix:** Add a scout phase to filter files first

### Failure: Orchestrator Watching Everything

**Scenario:** Case 3 with an observing orchestrator

- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale

**Fix:** Implement the orchestrator sleep pattern

### Failure: No Observability

**Scenario:** Case 7 without the dashboard

- 5 agents running
- One agent stuck on a permission request
- No way to know which agent needs attention
- Entire workflow blocked

**Fix:** Add hooks + observability system

### Failure: Agent Accumulation

**Scenario:** Case 5 without deleting agents

- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start

**Fix:** Delete agents after task completion

## Key Takeaways

1. **Parallelization = Sub-agents** - Sub-agents are the only mechanism here that runs work in parallel
2. **Context protection = Specialization** - Focused agents use less context
3. **Orchestration = Scale** - A single interface manages the fleet
4. **Observability = Confidence** - Can't scale what you can't see
5. **Deletable = Sustainable** - Free resources for the next task
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first

## When to Use Multi-Agent Patterns

Use multi-agent when:

- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure

Don't use multi-agent when:

- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback

## Source Attribution

All case studies drawn from field experience documented in 8 source transcripts:

1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)

## Related Documentation

- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery

---

**Remember:** These are real systems in production. Start simple, add complexity only when needed.