Multi-Agent Case Studies
Real-world examples of multi-agent systems in production, drawn from field experience.
Case Study Index
| # | Name | Pattern | Agents | Key Lesson |
|---|---|---|---|---|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
Case Study 1: AI Docs Loader
Pattern: Sub-agent delegation for parallel work
Problem: Loading 10 documentation URLs consumes 30k+ tokens per scrape; a single agent handling all of them would blow past 150k tokens.
Solution: Delegate each scrape to an isolated sub-agent
Architecture:
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
Implementation:
# Single command
/load-ai-docs
# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
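A minimal sketch of the delegation step, assuming the `claude` CLI is on PATH and supports headless invocation via `-p`; the URLs and output paths are illustrative, not the actual ai-docs list.
import asyncio

DOC_URLS = [
    "https://example.com/docs/page-1",
    "https://example.com/docs/page-2",
    # ...one entry per URL listed in ai-docs/README.md
]

async def scrape_with_subagent(url: str, index: int) -> int:
    # Each URL gets its own isolated sub-agent; its scrape never enters our context.
    prompt = (
        f"Scrape {url} and save a markdown summary to ai-docs/doc-{index}.md. "
        "Reply only with the output file path."
    )
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", prompt,
        stdout=asyncio.subprocess.DEVNULL,  # keep scrape output out of the primary context
        stderr=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait()

async def main() -> None:
    # All scrapes run in parallel; the primary process only sees exit codes.
    codes = await asyncio.gather(
        *(scrape_with_subagent(url, i) for i, url in enumerate(DOC_URLS, start=1))
    )
    print(f"{sum(c == 0 for c in codes)}/{len(codes)} scrapes succeeded")

if __name__ == "__main__":
    asyncio.run(main())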
Key techniques:
- Sub-agents for isolation - Each scrape in separate context
- Parallel execution - All 10 scrapes run simultaneously
- Context delegation - 30k tokens stay out of primary
Results:
- Time: 10 scrapes in parallel vs. sequential (10x faster)
- Context: Primary agent stays at 9k tokens throughout
- Scalability: Can handle 50+ URLs without primary context issues
Source: Elite Context Engineering transcript
Case Study 2: SDK Migration
Pattern: Scout-plan-build with multiple perspectives
Problem: Migrating a codebase to the new Claude Agent SDK across 8 applications
Challenge:
- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes
Solution: Three-phase workflow with delegation
Phase 1: Scout (Reduce context for planner)
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)
Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")
Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
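A sketch of how the four scout reports might be merged into the single relevant-files artifact; the JSON report format and file names below are assumptions for illustration.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class RelevantFile:
    path: str          # file path relative to the repo root
    line_start: int    # first line of the relevant region
    line_end: int      # last line of the relevant region
    snippet: str       # short excerpt showing the SDK usage

def merge_scout_reports(report_paths: list[Path]) -> list[RelevantFile]:
    merged: dict[str, RelevantFile] = {}
    for report in report_paths:
        for entry in json.loads(report.read_text()):
            rf = RelevantFile(**entry)
            if rf.path in merged:
                # Keep the widest line range any scout reported for the same file.
                prev = merged[rf.path]
                prev.line_start = min(prev.line_start, rf.line_start)
                prev.line_end = max(prev.line_end, rf.line_end)
            else:
                merged[rf.path] = rf
    return list(merged.values())

if __name__ == "__main__":
    reports = sorted(Path("scout-reports").glob("scout-*.json"))
    files = merge_scout_reports(reports)
    Path("relevant-files.json").write_text(json.dumps([asdict(f) for f in files], indent=2))
    print(f"{len(files)} unique files across {len(reports)} scout reports")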
Why multiple models? Diverse perspectives catch edge cases a single model might miss.
Phase 2: Plan (Focus on relevant subset)
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)
Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
Phase 3: Build (Execute plan)
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion
Context used: ~80k tokens
Still within safe limits
Final context analysis:
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)
With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
Key techniques:
- Composable workflows - Chain /scout, /plan, /build
- Multiple scout models - Diverse perspectives
- Context offloading - Scouts protect planner's context
- Fresh agents per phase - No context accumulation
Results:
- 8 applications migrated successfully
- 51% context used in builder phase (safe margins)
- No context explosions across entire workflow
- Completed in single session (~30 minutes)
Near miss: "We were 14% away from exploding our context" due to autocompact buffer
Lesson: Disable the autocompact buffer. That 22% of headroom matters at scale.
Source: Claude 2.0 transcript
Case Study 3: Codebase Summarization
Pattern: Orchestrator with specialized QA agents
Problem: Summarize a large codebase (frontend + backend) and produce architecture docs
Approach: Divide and conquer with synthesis
Architecture:
Orchestrator Agent
├→ Creates Frontend QA Agent
│ ├─ Summarizes frontend components
│ └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│ ├─ Summarizes backend APIs
│ └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
├─ Reads both summaries
├─ Synthesizes unified view
└─ Outputs: codebase-overview.md
Orchestrator behavior:
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
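A minimal sketch of the sleep-and-wake cycle in steps 4-6 above, assuming a check_status() helper that reads each agent's status from wherever the orchestrator tracks it (status file, database row, etc.); all names are illustrative.
import time

POLL_INTERVAL_SECONDS = 15

def wait_for_agents(agent_ids: list[str], check_status) -> None:
    # Sleeping between polls keeps agent output out of the orchestrator's context;
    # only short status strings are ever read.
    pending = set(agent_ids)
    while pending:
        time.sleep(POLL_INTERVAL_SECONDS)   # the orchestrator is "asleep" here
        pending -= {a for a in pending if check_status(a) == "complete"}

# Usage: wait_for_agents(["frontend-qa", "backend-qa"], check_status=my_status_lookup),
# then spawn the primary QA agent, wait again, collect the produced files, delete all three.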
Prompts from orchestrator:
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"
Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"
Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
Observability interface shows:
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds
[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds
[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds
Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
Key techniques:
- Parallel frontend/backend - 2x speedup
- Orchestrator sleeps - Protects its context
- Synthesis agent - Combines perspectives
- Deletable agents - Freed after use
Results:
- 3 comprehensive docs created
- Max context per agent: 32k tokens (16%)
- Orchestrator context: 12k tokens (6%)
- Time: 52 seconds (vs. 2+ minutes sequential)
Source: One Agent to Rule Them All transcript
Case Study 4: UI Component Creation
Pattern: Scout-builder two-stage
Problem: Create gray pills for app header information display
Challenge: The codebase has specific conventions; the builder needs to find the exact files and follow existing patterns.
Solution: Scout locates, builder implements
Phase 1: Scout
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│ ├── src/components/AppHeader.vue
│ ├── src/styles/pills.css
│ └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
├── File locations
├── Line numbers for modifications
├── Existing patterns to follow
└── Recommended approach
Phase 2: Builder
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
Orchestrator involvement:
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
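A minimal sketch of that two-stage handoff as two headless CLI calls, assuming the `claude` CLI is on PATH; the report file name and prompts are illustrative.
import subprocess
from pathlib import Path

REPORT = Path("scout-header-report.md")

def run_agent(prompt: str) -> None:
    # Each call is a fresh, isolated context; only files are shared between phases.
    subprocess.run(["claude", "-p", prompt], check=True)

def scout_then_build(feature_request: str) -> None:
    run_agent(
        "Scout only, do not modify code. Find the files and conventions needed for: "
        f"{feature_request}. Write file paths, line numbers, and patterns to {REPORT}."
    )
    findings = REPORT.read_text()   # a small report instead of the whole codebase
    run_agent(
        f"Implement: {feature_request}\nFollow these scout findings exactly:\n{findings}"
    )

if __name__ == "__main__":
    scout_then_build("Create gray info pills for the app header")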
Key techniques:
- Scout reduces uncertainty - Builder knows exactly where to work
- Pattern following - Scout identifies conventions
- Orchestrator sleep - Two phases, minimal orchestrator context
- Precise targeting - No wasted reads
Results:
- Scout: 15k tokens, 20 seconds
- Builder: 30k tokens, 35 seconds
- Orchestrator: 8k tokens final
- Total time: 55 seconds
- Feature shipped correctly on first try
Source: One Agent to Rule Them All transcript
Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
Pattern: Structured lifecycle with quality gates
Problem: Ensure all changes go through proper review before shipping
Architecture:
Task Board Columns:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Example task: "Update HTML titles"
Column 1: PLAN
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│ ├── index.html
│ └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│ 1. Update index.html <title>
│ 2. Update App.tsx header component
│ 3. Test both pages load correctly
└── Moves task to BUILD column
Column 2: BUILD
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│ ├── index.html: "Plan Build Review Ship"
│ └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
Column 3: REVIEW
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│ ├── Plan followed? ✅
│ ├── Tests passing? ✅
│ ├── Code quality? ✅
│ └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
Column 4: SHIP
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
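The gate ordering across the four columns can be enforced with a tiny state machine. A minimal sketch, using in-memory objects for illustration (the real board tracks state in a database):
from enum import Enum

class Phase(Enum):
    PLAN = 1
    BUILD = 2
    REVIEW = 3
    SHIP = 4

class Task:
    def __init__(self, title: str):
        self.title = title
        self.phase = Phase.PLAN
        self.history: list[str] = []

    def advance(self, agent_id: str) -> Phase:
        # A task moves exactly one column forward; it cannot skip REVIEW to reach SHIP.
        if self.phase is Phase.SHIP:
            raise ValueError(f"{self.title} has already shipped")
        self.history.append(f"{self.phase.name}: {agent_id}")
        self.phase = Phase(self.phase.value + 1)
        return self.phase

# Usage:
# task = Task("Update HTML titles")
# task.advance("planner-001"); task.advance("builder-002")
# task.advance("reviewer-003")   # only now may a shipper move it to SHIP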
Orchestrator's role:
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
UI representation:
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│ ├── PLAN: planner-001 (completed 2m ago)
│ ├── BUILD: builder-002 (completed 1m ago)
│ └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
Key techniques:
- Clear phases - No ambiguity about current state
- Quality gates - Can't skip to SHIP without REVIEW
- Agent specialization - Each agent expert in its phase
- Failure isolation - If BUILD fails, PLAN preserved
Results:
- No untested code ships (the REVIEW gate catches issues)
- Clear audit trail (who did what in which phase)
- Parallel tasks (multiple agents in different columns)
- Single interface (user sees all tasks across all phases)
Source: Custom Agents transcript
Case Study 6: Meta-Agent System
Pattern: Agents building agents
Problem: You need a new specialized agent but don't want to hand-write its configuration
Solution: Meta-agent that builds other agents
Meta-agent prompt:
# meta-agent.md
You are a meta-agent that builds new sub-agents from user descriptions.
When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples
Output: .claude/agents/<agent-name>.md with complete configuration
Example: Building TTS summary agent
User: "Build agent that summarizes what my code does using text-to-speech"
Meta-agent process:
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)
Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format
Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"
Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]
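A minimal sketch of how Step 4 might write that configuration file, assuming the name, description, and system prompt have already been drafted; the values shown are illustrative.
from pathlib import Path

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    # Emit the frontmatter format shown above: name and description, then the prompt body.
    path = Path(".claude/agents") / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"---\nname: {name}\ndescription: {description}\n---\n\n{system_prompt}\n")
    return path

# Usage:
# write_agent_config(
#     "tts-summary",
#     'Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
#     "Purpose: Review the user's code and provide a 1-sentence summary via 11Labs voice...",
# )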
Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅
Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
Result: Fully functional TTS summary agent created from natural language description
Recursion depth:
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
└→ Level 2: TTS summary agent (built by meta-agent)
└→ Level 3: Sub-agents (if TTS agent spawns any)
Key techniques:
- Documentation fetching - Meta-agent reads official docs
- Template following - Follows agent configuration patterns
- Validation loop - Tests before declaring success
- Recursive creation - Agents can build agents
Challenges:
- Dependency coupling - New agent depends on meta-agent's understanding
- Debugging difficulty - If a generated agent fails, the cause is hard to trace
- Version drift - Meta-agent's docs knowledge may become outdated
Results:
- New agent in ~2 minutes vs. 15+ minutes manually
- Follows best practices automatically
- Tested before delivery
- Documented usage
Source: Sub-Agents transcript
Case Study 7: Observability Dashboard
Pattern: Real-time multi-agent monitoring
Problem: Running 5-10 agents simultaneously. Can't see what they're doing.
Solution: Centralized observability system
Architecture:
┌──────────────────── Multiple Agents ────────────────────┐
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
│ ↓ ↓ ↓ ↓ ↓ │
│ pre/post-tool-use hooks │
│ ↓ │
└──────────────────────────────────────────────────────────┘
↓
┌──────────────────── Bun Server ─────────────────────────┐
│ POST /events endpoint │
│ ├→ Store in SQLite (persistence) │
│ └→ Broadcast via WebSocket (real-time) │
└──────────────────────────────────────────────────────────┘
↓
┌──────────────────── Web Client ─────────────────────────┐
│ ┌─────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/type) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Details (with AI-generated summaries) │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
Hook implementation:
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)
    # Send to observability server (with AI summary)
    subprocess.run(
        [
            "uv", "run",
            ".claude/hooks/utils/send-event.py",
            "my-codebase",     # App name
            "post-tool-use",   # Event type
            "--summarize",     # Generate Haiku summary
        ],
        input=json.dumps(event),
        text=True,
    )

if __name__ == "__main__":
    main()
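The send-event.py utility referenced above is not shown in the source; here is a minimal sketch of what it might do, assuming the Bun server listens on localhost:4000 (the port is an assumption) and accepts JSON at POST /events.
# .claude/hooks/utils/send-event.py (sketch)
import argparse, json, sys, urllib.request

SERVER_URL = "http://localhost:4000/events"   # assumed address of the Bun server

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("app")                 # e.g. "my-codebase"
    parser.add_argument("event_type")          # e.g. "post-tool-use"
    parser.add_argument("--summarize", action="store_true")
    args = parser.parse_args()

    payload = {
        "app": args.app,
        "event_type": args.event_type,
        "summarize": args.summarize,
        "payload": json.load(sys.stdin),       # the hook event piped in by the hook script
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5)

if __name__ == "__main__":
    main()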
What user sees:
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
└────────────────────────────────────────────────────┘
┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use │
│ Summary: "Wrote authentication logic to user.py"│
│ Time: 2s ago │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop │
│ Summary: "Completed documentation scrape" │
│ Time: 5s ago │
├────────────────────────────────────────────────────┤
│ [Agent C] notification │
│ Summary: "Needs approval for rm command" │
│ Time: 8s ago │
└────────────────────────────────────────────────────┘
Filtering:
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
Event summarization:
# Each event summarized by Haiku ($0.0002 per event)
Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success
Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
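A sketch of that summarization call using the anthropic Python SDK, assuming ANTHROPIC_API_KEY is set; the model alias is an assumption and should be swapped for whichever Haiku-class model is current.
import json
from anthropic import Anthropic

client = Anthropic()

def summarize_event(event: dict) -> str:
    # One cheap Haiku call per hook event; returns a single human-readable sentence.
    response = client.messages.create(
        model="claude-3-5-haiku-latest",       # assumed alias
        max_tokens=60,
        messages=[{
            "role": "user",
            "content": "Summarize this agent tool event in one short sentence:\n"
                       + json.dumps(event)[:4000],   # truncate large payloads
        }],
    )
    return response.content[0].text.strip()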
Key techniques:
- One-way data stream - Simple, fast, scalable
- Edge summarization - AI summaries generated at hook time
- Dual storage - SQLite (history) + WebSocket (real-time)
- Color coding - Consistent colors per agent session
Results:
- 5-10 agents monitored simultaneously
- Thousands of events logged (cost: ~$0.20)
- Real-time visibility into all agent work
- Historical analysis via SQLite queries
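A sketch of that historical analysis, assuming the events live in an events.db SQLite file with app, event_type, and timestamp columns (the schema is an assumption for illustration).
import sqlite3

with sqlite3.connect("events.db") as conn:
    rows = conn.execute(
        """
        SELECT app, event_type, COUNT(*) AS n
        FROM events
        WHERE timestamp >= datetime('now', '-1 day')
        GROUP BY app, event_type
        ORDER BY n DESC
        """
    ).fetchall()

for app, event_type, n in rows:
    print(f"{app:20s} {event_type:15s} {n}")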
Business value:
- Catch errors fast (notification events = agent blocked)
- Optimize workflows (which tools used most?)
- Debug issues (what happened before failure?)
- Scale confidence (can observe 10+ agents easily)
Source: Multi-Agent Observability transcript
Case Study 8: AFK Agent Device
Pattern: Autonomous background work while you're away
Problem: Long-running tasks block your terminal. You want to work on something else.
Solution: Dedicated device running agent fleet
Architecture:
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates
Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
Workflow:
# From your device
/afk-agents \
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
--adw "plan-build-ship" \
--docs "https://openai-agent-sdk.com/docs"
# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
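A minimal sketch of the agent-device side: poll a shared job queue, run the workflow headlessly, and write periodic status updates. The queue and status paths, the 60-second cadence, and the use of `claude -p` are all assumptions.
import json, subprocess, time
from pathlib import Path

QUEUE_DIR = Path("queue/pending")
STATUS_FILE = Path("queue/status.json")

def report(message: str) -> None:
    STATUS_FILE.write_text(json.dumps({"status": message, "ts": time.time()}))

def run_job(job_file: Path) -> None:
    job = json.loads(job_file.read_text())
    report(f"Running: {job['prompt'][:60]}...")
    # One long-running headless session executes the plan-build-ship workflow.
    subprocess.run(["claude", "-p", job["prompt"]], check=True)
    report("Complete")
    job_file.unlink()                # remove the finished job from the queue

if __name__ == "__main__":
    while True:
        jobs = sorted(QUEUE_DIR.glob("*.json"))
        if jobs:
            run_job(jobs[0])
        time.sleep(60)               # poll/report cadence matches the 60s status updates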
Agent device execution:
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
Status updates (every 60s):
Your device shows:
[60s] Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅
Click to view: results/sdk-agents-20250105/
What you do:
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
Key techniques:
- Job queue - Agents pick up work from queue
- Async status - Reports back periodically
- Autonomous execution - No human in the loop
- Git integration - Results automatically committed
Results:
- 3 SDK agents built in 7 minutes
- You worked on other things during that time
- Autonomous end-to-end - plan + build + test + ship
- Code review - Quick glance confirms quality
Infrastructure required:
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting
Use cases:
- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
Source: Claude 2.0 transcript
Cross-Cutting Patterns
Pattern: Context Window as Resource Constraint
Appears in:
- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)
Lesson: Context is precious. Protect it aggressively.
Pattern: Specialized Agents Over General
Appears in:
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute
Lesson: "A focused agent is a performant agent."
Pattern: Observability Enables Scale
Appears in:
- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s
Lesson: "If you can't measure it, you can't scale it."
Pattern: Deletable Temporary Resources
Appears in:
- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping
Lesson: "The best agent is a deleted agent."
Performance Comparisons
Single Agent vs. Multi-Agent
| Task | Single Agent | Multi-Agent | Speedup |
|---|---|---|---|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
With vs. Without Orchestration
| Metric | Manual (no orchestrator) | With Orchestrator |
|---|---|---|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
Common Failure Modes
Failure: Context Explosion
Scenario: Case 2 without scouts
- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out
Fix: Add scout phase to filter files first
Failure: Orchestrator Watching Everything
Scenario: Case 3 with observing orchestrator
- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale
Fix: Implement orchestrator sleep pattern
Failure: No Observability
Scenario: Case 7 without dashboard
- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked
Fix: Add hooks + observability system
Failure: Agent Accumulation
Scenario: Case 5 not deleting agents
- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start
Fix: Delete agents after task completion
Key Takeaways
- Parallelization = Sub-agents - Nothing else runs agents in parallel
- Context protection = Specialization - Focused agents use less context
- Orchestration = Scale - Single interface manages fleet
- Observability = Confidence - Can't scale what you can't see
- Deletable = Sustainable - Free resources for next task
- Multi-agent is Level 5 - Requires mastering Levels 1-4 first
When to Use Multi-Agent Patterns
Use multi-agent when:
- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure
Don't use multi-agent when:
- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback
Source Attribution
All case studies drawn from field experience documented in 8 source transcripts:
- Elite Context Engineering - Case 1 (AI docs loader)
- Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
- Custom Agents - Case 5 (task board)
- Sub-Agents - Case 6 (meta-agent)
- Multi-Agent Observability - Case 7 (dashboard)
- Hooked - Supporting patterns
- One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
- (Transcript 8 name not specified in context)
Related Documentation
- Orchestrator Pattern - Multi-agent coordination
- Hooks for Observability - Monitoring implementation
- Context Window Protection - Resource management
- Evolution Path - Progression to multi-agent mastery
Remember: These are real systems in production. Start simple, add complexity only when needed.