# Multi-Agent Case Studies
Real-world examples of multi-agent systems in production, drawn from field experience.
## Case Study Index
| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
---
## Case Study 1: AI Docs Loader
**Pattern:** Sub-agent delegation for parallel work
**Problem:** Scraping documentation URLs is token-heavy. A single agent loading all 10 would hit 150k+ tokens.
**Solution:** Delegate each scrape to isolated sub-agent
**Architecture:**
```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```
**Implementation:**
```bash
# Single command
/load-ai-docs
# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
```
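For illustration, here is a minimal sketch of the delegation pattern in Python. The `spawn_subagent` stub stands in for whatever mechanism spawns an isolated sub-agent (in Claude Code, the Task tool); everything below is an assumption, not the transcript's implementation:
```python
import asyncio

async def spawn_subagent(prompt: str) -> str:
    """Hypothetical stand-in for spawning an isolated sub-agent.
    The sub-agent does the heavy scraping in its own context and
    returns only a short completion report."""
    await asyncio.sleep(0.1)  # simulate the scrape
    return f"done: {prompt[:40]}"

async def scrape_all(urls: list[str]) -> list[str]:
    # All scrapes run in parallel; the primary agent's context only
    # ever holds these short reports, never the raw page content.
    return await asyncio.gather(*(spawn_subagent(f"Scrape {u}") for u in urls))

if __name__ == "__main__":
    urls = [f"https://example.com/doc{i}" for i in range(1, 11)]
    for report in asyncio.run(scrape_all(urls)):
        print(report)
```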
**Key techniques:**
- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary
**Results:**
- **Time:** 10 scrapes in parallel vs. sequential (10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues
**Source:** Elite Context Engineering transcript
---
## Case Study 2: SDK Migration
**Pattern:** Scout-plan-build with multiple perspectives
**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications
**Challenge:**
- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes
**Solution:** Three-phase workflow with delegation
**Phase 1: Scout (Reduce context for planner)**
```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: Codex (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)
Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")
Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```
**Why multiple models?** Diverse perspectives catch edge cases a single model might miss.
**Phase 2: Plan (Focus on relevant subset)**
```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)
Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```
**Phase 3: Build (Execute plan)**
```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion
Context used: ~80k tokens
Still within safe limits
```
**Final context analysis:**
```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)
With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```
**Key techniques:**
- **Composable workflows** - Chain /scout, /plan, /build (see the sketch below)
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation
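As a sketch of that composability (assuming /scout, /plan, and /build slash commands exist in your setup, and that the `claude` CLI accepts a prompt via `-p`), the chain could be driven from a few lines of Python:
```python
import subprocess

def run_phase(slash_command: str) -> None:
    """Run one phase in a fresh agent so context never accumulates
    across phases (hypothetical invocation; adjust to your setup)."""
    subprocess.run(["claude", "-p", slash_command], check=True)

# Each phase reads its input file and writes its output file; those
# files are the only context that crosses a phase boundary.
run_phase("/scout find all Claude Agent SDK usage, write relevant-files.md")
run_phase("/plan from relevant-files.md, write detailed-plan.md")
run_phase("/build from detailed-plan.md")
```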
**Results:**
- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)
**Near miss:** "We were 14% away from exploding our context" due to autocompact buffer
**Lesson:** Disable the autocompact buffer. The ~22% of context it reserves matters at scale.
**Source:** Claude 2.0 transcript
---
## Case Study 3: Codebase Summarization
**Pattern:** Orchestrator with specialized QA agents
**Problem:** Summarize large codebase (frontend + backend) with architecture docs
**Approach:** Divide and conquer with synthesis
**Architecture:**
```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│ ├─ Summarizes frontend components
│ └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│ ├─ Summarizes backend APIs
│ └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
├─ Reads both summaries
├─ Synthesizes unified view
└─ Outputs: codebase-overview.md
```
**Orchestrator behavior:**
```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
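The sleep/poll loop is the key move: the orchestrator commands and collects, but never reads agent transcripts. A minimal sketch, assuming hypothetical agent handles (stubbed here for illustration):
```python
import time
from dataclasses import dataclass

POLL_INTERVAL_S = 15  # wake every 15s, matching the loop above

@dataclass
class AgentHandle:
    """Hypothetical handle to a running agent (stub for illustration)."""
    name: str
    output_file: str
    done: bool = False

    def delete(self) -> None:
        print(f"deleted {self.name}")

def orchestrate(agents: list[AgentHandle]) -> None:
    # SLEEP between polls: the orchestrator never observes agent work,
    # so its own context stays small (12k tokens in this case study).
    while not all(a.done for a in agents):
        time.sleep(POLL_INTERVAL_S)
    # Wake: collect results by reading produced files, not transcripts,
    # then free the agents.
    for a in agents:
        print(f"{a.name} -> {a.output_file}")
        a.delete()
```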
**Prompts from orchestrator:**
```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"
Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"
Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```
**Observability interface shows:**
```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds
[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds
[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds
Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: ~82 seconds (52s parallel summarization + 30s synthesis)
└── All agents deleted after completion
```
**Key techniques:**
- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use
**Results:**
- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** ~82 seconds (vs. 2+ minutes sequential)
**Source:** One Agent to Rule Them All transcript
---
## Case Study 4: UI Component Creation
**Pattern:** Scout-builder two-stage
**Problem:** Create gray pills for app header information display
**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.
**Solution:** Scout locates, builder implements
**Phase 1: Scout**
```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│ ├── src/components/AppHeader.vue
│ ├── src/styles/pills.css
│ └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
├── File locations
├── Line numbers for modifications
├── Existing patterns to follow
└── Recommended approach
```
**Phase 2: Builder**
```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```
**Orchestrator involvement:**
```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
**Key techniques:**
- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads
**Results:**
- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try
**Source:** One Agent to Rule Them All transcript
---
## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
**Pattern:** Structured lifecycle with quality gates
**Problem:** Ensure all changes go through proper review before shipping
**Architecture:**
```text
Task Board Columns:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Example task: "Update HTML titles"**
**Column 1: PLAN**
```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│ ├── index.html
│ └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│ 1. Update index.html <title>
│ 2. Update App.tsx header component
│ 3. Test both pages load correctly
└── Moves task to BUILD column
```
**Column 2: BUILD**
```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│ ├── index.html: "Plan Build Review Ship"
│ └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```
**Column 3: REVIEW**
```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│ ├── Plan followed? ✅
│ ├── Tests passing? ✅
│ ├── Code quality? ✅
│ └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```
**Column 4: SHIP**
```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```
**Orchestrator's role:**
```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```
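One way the quality gates might be enforced in code — a minimal sketch of the phase state machine, not the transcript's implementation:
```python
from enum import Enum

class Phase(Enum):
    PLAN = "plan"
    BUILD = "build"
    REVIEW = "review"
    SHIP = "ship"

# The only legal transitions. There is no edge that skips REVIEW,
# so untested code can never reach SHIP.
NEXT_PHASE = {
    Phase.PLAN: Phase.BUILD,
    Phase.BUILD: Phase.REVIEW,
    Phase.REVIEW: Phase.SHIP,
}

def advance(current: Phase) -> Phase:
    """Move a task to the next column, enforcing the gate order."""
    if current not in NEXT_PHASE:
        raise ValueError(f"{current.name} is a terminal phase")
    return NEXT_PHASE[current]
```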
**UI representation:**
```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│ ├── PLAN: planner-001 (completed 2m ago)
│ ├── BUILD: builder-002 (completed 1m ago)
│ └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```
**Key techniques:**
- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved
**Results:**
- **Zero shipping untested code** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)
**Source:** Custom Agents transcript
---
## Case Study 6: Meta-Agent System
**Pattern:** Agents building agents
**Problem:** Need new specialized agent but don't want to hand-write configuration
**Solution:** Meta-agent that builds other agents
**Meta-agent prompt:**
```markdown
# meta-agent.md
You are a meta-agent that builds new sub-agents from user descriptions.
When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples
Output: .claude/agents/<agent-name>.md with complete configuration
```
**Example: Building TTS summary agent**
**User:** "Build agent that summarizes what my code does using text-to-speech"
**Meta-agent process:**
```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)
Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format
Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"
Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]
Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅
Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
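Step 4 amounts to writing a small frontmatter-plus-prompt file. A minimal sketch following the format shown above (the helper itself is illustrative, not the meta-agent's actual code):
```python
from pathlib import Path

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    """Write a sub-agent config to .claude/agents/<name>.md in the
    frontmatter-plus-prompt format shown above."""
    config = f"---\nname: {name}\ndescription: {description}\n---\n{system_prompt}\n"
    path = Path(f".claude/agents/{name}.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(config)
    return path

write_agent_config(
    "tts-summary",
    'Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
    "Purpose: Review the user's code and provide a 1-sentence summary "
    "via 11Labs voice.",
)
```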
**Result:** Fully functional TTS summary agent created from natural language description
**Recursion depth:**
```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
└→ Level 2: TTS summary agent (built by meta-agent)
└→ Level 3: Sub-agents (if TTS agent spawns any)
```
**Key techniques:**
- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents
**Challenges:**
- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated
**Results:**
- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**
**Source:** Sub-Agents transcript
---
## Case Study 7: Observability Dashboard
**Pattern:** Real-time multi-agent monitoring
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
**Solution:** Centralized observability system
**Architecture:**
```text
┌──────────────────── Multiple Agents ────────────────────┐
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
│ ↓ ↓ ↓ ↓ ↓ │
│ pre/post-tool-use hooks │
│ ↓ │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Bun Server ─────────────────────────┐
│ POST /events endpoint │
│ ├→ Store in SQLite (persistence) │
│ └→ Broadcast via WebSocket (real-time) │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Web Client ─────────────────────────┐
│ ┌─────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/type) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Details (with AI-generated summaries) │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
**Hook implementation:**
```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)
    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",    # App name
        "post-tool-use",  # Event type
        "--summarize"     # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
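The `send-event.py` utility isn't shown in the transcript; a plausible minimal sketch — forward the event JSON to the Bun server's POST /events endpoint — might look like this (the server URL and payload shape are assumptions):
```python
# .claude/hooks/utils/send-event.py (hypothetical sketch)
import sys, json, argparse, urllib.request

SERVER_URL = "http://localhost:4000/events"  # assumed server address

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("app")          # e.g. "my-codebase"
    parser.add_argument("event_type")   # e.g. "post-tool-use"
    parser.add_argument("--summarize", action="store_true")
    args = parser.parse_args()

    payload = {
        "app": args.app,
        "event_type": args.event_type,
        "payload": json.load(sys.stdin),
        "summarize": args.summarize,  # ask the server for a Haiku summary
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    main()
```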
**What user sees:**
```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
└────────────────────────────────────────────────────┘
┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use │
│ Summary: "Wrote authentication logic to user.py"│
│ Time: 2s ago │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop │
│ Summary: "Completed documentation scrape" │
│ Time: 5s ago │
├────────────────────────────────────────────────────┤
│ [Agent C] notification │
│ Summary: "Needs approval for rm command" │
│ Time: 8s ago │
└────────────────────────────────────────────────────┘
```
**Filtering:**
```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```
**Event summarization:**
```text
# Each event summarized by Haiku ($0.0002 per event)
Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success
Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```
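A hedged sketch of that edge summarization using the Anthropic Python SDK (the model alias and prompt wording are assumptions):
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_event(event_json: str) -> str:
    """Generate a one-sentence, human-readable summary of a hook event."""
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # cheap, fast model (assumed alias)
        max_tokens=50,
        messages=[{
            "role": "user",
            "content": "Summarize this agent tool event in one sentence: "
                       + event_json,
        }],
    )
    return msg.content[0].text
```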
**Key techniques:**
- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session
**Results:**
- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries
**Business value:**
- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools used most?)
- **Debug issues** (what happened before failure?)
- **Scale confidence** (can observe 10+ agents easily)
**Source:** Multi-Agent Observability transcript
---
## Case Study 8: AFK Agent Device
**Pattern:** Autonomous background work while you're away
**Problem:** Long-running tasks block your terminal. You want to work on something else.
**Solution:** Dedicated device running agent fleet
**Architecture:**
```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates
Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```
**Workflow:**
```bash
# From your device
/afk-agents \
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
--adw "plan-build-ship" \
--docs "https://openai-agent-sdk.com/docs"
# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
**Agent device execution:**
```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```
**Status updates (every 60s):**
```text
Your device shows:
[60s] Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅
Click to view: results/sdk-agents-20250105/
```
**What you do:**
```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```
**Key techniques:**
- **Job queue** - Agents pick up work from a queue (see the sketch after this list)
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed
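A minimal sketch of the agent device's side — poll a job queue, run the workflow headlessly, report status every 60s. The queue endpoints and the `claude -p` invocation are assumptions, not the transcript's implementation:
```python
import json, subprocess, time, urllib.request

QUEUE_URL = "http://agent-device.local/jobs"  # hypothetical queue endpoint
STATUS_INTERVAL_S = 60                        # report back every 60s

def report_status(job_id: str, status: str) -> None:
    data = json.dumps({"id": job_id, "status": status}).encode()
    urllib.request.urlopen(urllib.request.Request(
        f"{QUEUE_URL}/status", data=data,
        headers={"Content-Type": "application/json"}))

def run_job(job: dict) -> None:
    # Execute the requested workflow (e.g. plan-build-ship) headlessly,
    # assuming a `claude -p <prompt>` style CLI invocation.
    proc = subprocess.Popen(["claude", "-p", job["prompt"]])
    while proc.poll() is None:
        report_status(job["id"], "running")
        time.sleep(STATUS_INTERVAL_S)
    report_status(job["id"], "complete")

while True:
    # Pick up the next job from the queue; idle briefly when empty.
    with urllib.request.urlopen(f"{QUEUE_URL}/next") as resp:
        body = resp.read()
    if body:
        run_job(json.loads(body))
    else:
        time.sleep(5)
```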
**Results:**
- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality
**Infrastructure required:**
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting
**Use cases:**
- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
**Source:** Claude 2.0 transcript
---
## Cross-Cutting Patterns
### Pattern: Context Window as Resource Constraint
**Appears in:**
- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)
**Lesson:** Context is precious. Protect it aggressively.
### Pattern: Specialized Agents Over General
**Appears in:**
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute
**Lesson:** "A focused agent is a performant agent."
### Pattern: Observability Enables Scale
**Appears in:**
- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s
**Lesson:** "If you can't measure it, you can't scale it."
### Pattern: Deletable Temporary Resources
**Appears in:**
- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping
**Lesson:** "The best agent is a deleted agent."
## Performance Comparisons
### Single Agent vs. Multi-Agent
| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
### With vs. Without Orchestration
| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
## Common Failure Modes
### Failure: Context Explosion
**Scenario:** Case 2 without scouts
- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out
**Fix:** Add scout phase to filter files first
### Failure: Orchestrator Watching Everything
**Scenario:** Case 3 with observing orchestrator
- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale
**Fix:** Implement orchestrator sleep pattern
### Failure: No Observability
**Scenario:** Case 7 without dashboard
- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked
**Fix:** Add hooks + observability system
### Failure: Agent Accumulation
**Scenario:** Case 5 not deleting agents
- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start
**Fix:** Delete agents after task completion
## Key Takeaways
1. **Parallelization = Sub-agents** - Sub-agents are the only built-in way to run agent work in parallel
2. **Context protection = Specialization** - Focused agents use less context
3. **Orchestration = Scale** - Single interface manages fleet
4. **Observability = Confidence** - Can't scale what you can't see
5. **Deletable = Sustainable** - Free resources for next task
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
## When to Use Multi-Agent Patterns
Use multi-agent when:
- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure
Don't use multi-agent when:
- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback
## Source Attribution
All case studies drawn from field experience documented in 8 source transcripts:
1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)
## Related Documentation
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
---
**Remember:** These are real systems in production. Start simple, add complexity only when needed.