# Multi-Agent Case Studies

Real-world examples of multi-agent systems in production, drawn from field experience.

## Case Study Index

| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |

---

## Case Study 1: AI Docs Loader

**Pattern:** Sub-agent delegation for parallel work

**Problem:** Loading 10 documentation URLs consumes 30k+ tokens per scrape. A single agent would hit 150k+ tokens.

**Solution:** Delegate each scrape to an isolated sub-agent

**Architecture:**

```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)

Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```

**Implementation:**

```bash
# Single command
/load-ai-docs

# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
```
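**Illustrative sketch:** the fan-out logic reduced to plain Python. The `spawn_subagent()` helper and the 24-hour staleness check are assumptions standing in for the actual `/load-ai-docs` command, not its real implementation.

```python
import asyncio
import time
from pathlib import Path

STALE_AFTER = 24 * 60 * 60  # re-scrape only docs older than 24 hours

async def spawn_subagent(url: str, dest: Path) -> str:
    """Hypothetical helper: run one isolated sub-agent that scrapes `url`,
    writes the content to `dest`, and returns only a one-line status."""
    ...  # delegate to the agent runtime (e.g., a Task/sub-agent call) here
    return f"saved {url} -> {dest}"

async def load_ai_docs(urls: dict[str, Path]) -> list[str]:
    """Fan out one sub-agent per stale URL; the primary agent only ever
    sees these short status lines, never the scraped content itself."""
    stale = {
        url: path for url, path in urls.items()
        if not path.exists() or time.time() - path.stat().st_mtime > STALE_AFTER
    }
    return await asyncio.gather(
        *(spawn_subagent(url, path) for url, path in stale.items())
    )
```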
**Key techniques:**

- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary

**Results:**

- **Time:** 10 scrapes run in parallel rather than sequentially (~10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues

**Source:** Elite Context Engineering transcript

---

## Case Study 2: SDK Migration

**Pattern:** Scout-plan-build with multiple perspectives

**Problem:** Migrating a codebase to the new Claude Agent SDK across 8 applications

**Challenge:**

- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes

**Solution:** Three-phase workflow with delegation

**Phase 1: Scout (Reduce context for planner)**

```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)

Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")

Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```
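**Illustrative sketch:** fanning the same scout prompt out to several models and collecting their findings into `relevant-files.md`. The `run_scout()` helper and the model labels are placeholders for whatever CLI or API actually backs each scout.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SCOUT_MODELS = ["gemini-lightning", "codex", "gemini-flash-preview", "haiku"]  # labels only
SCOUT_PROMPT = (
    "Find every usage of the old Claude Agent SDK in this repo. "
    "Report file paths, line numbers, character ranges, and short snippets."
)

def run_scout(model: str, prompt: str) -> str:
    """Hypothetical: run one scout in an isolated context and return its
    findings as markdown."""
    ...
    return f"## Findings from {model}\n"

def scout_phase(output: Path = Path("relevant-files.md")) -> Path:
    # All scouts run in parallel; only their condensed reports (not the
    # files they read) are written out for the planner.
    with ThreadPoolExecutor(max_workers=len(SCOUT_MODELS)) as pool:
        reports = list(pool.map(lambda m: run_scout(m, SCOUT_PROMPT), SCOUT_MODELS))
    output.write_text("\n".join(reports))
    return output
```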
**Why multiple models?** Diverse perspectives catch edge cases a single model might miss.

**Phase 2: Plan (Focus on relevant subset)**

```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)

Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```

**Phase 3: Build (Execute plan)**

```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion

Context used: ~80k tokens
Still within safe limits
```

**Final context analysis:**

```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)

With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```

**Key techniques:**

- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation

**Results:**

- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)

**Near miss:** "We were 14% away from exploding our context" due to the autocompact buffer

**Lesson:** Disable the autocompact buffer. That 22% matters at scale.

**Source:** Claude 2.0 transcript

---

## Case Study 3: Codebase Summarization

**Pattern:** Orchestrator with specialized QA agents

**Problem:** Summarize a large codebase (frontend + backend) with architecture docs

**Approach:** Divide and conquer with synthesis

**Architecture:**

```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│   ├─ Summarizes frontend components
│   └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│   ├─ Summarizes backend APIs
│   └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
    ├─ Reads both summaries
    ├─ Synthesizes unified view
    └─ Outputs: codebase-overview.md
```

**Orchestrator behavior:**

```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
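**Illustrative sketch:** the sleep/poll loop in Python. `create_agent`, `agent_status`, and `delete_agent` are hypothetical stand-ins for the orchestrator's real primitives; the point is that the orchestrator never streams agent output into its own context.

```python
import time

POLL_INTERVAL = 15  # seconds between status checks while the orchestrator "sleeps"

# Hypothetical primitives -- stand-ins for the actual agent runtime.
def create_agent(name: str, prompt: str) -> str: ...
def agent_status(agent_id: str) -> str: ...
def delete_agent(agent_id: str) -> None: ...

def orchestrate_summary() -> None:
    workers = {
        "frontend-qa": "Analyze src/frontend/ and write docs/frontend-summary.md",
        "backend-qa": "Analyze src/backend/ and write docs/backend-summary.md",
    }
    agents = [create_agent(name, prompt) for name, prompt in workers.items()]

    # Sleep and poll rather than observing the agents' work directly.
    while not all(agent_status(a) == "complete" for a in agents):
        time.sleep(POLL_INTERVAL)

    # The synthesis agent reads only the two summary files.
    synthesizer = create_agent(
        "primary-qa",
        "Read docs/frontend-summary.md and docs/backend-summary.md, "
        "then write docs/codebase-overview.md",
    )
    while agent_status(synthesizer) != "complete":
        time.sleep(POLL_INTERVAL)

    for agent in [*agents, synthesizer]:
        delete_agent(agent)  # free resources once the work is collected
```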
**Prompts from orchestrator:**

```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"

Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"

Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```

**Observability interface shows:**

```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds

[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds

[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds

Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```

**Key techniques:**

- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use

**Results:**

- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)

**Source:** One Agent to Rule Them All transcript

---

## Case Study 4: UI Component Creation

**Pattern:** Scout-builder two-stage

**Problem:** Create gray pills for the app header information display

**Challenge:** The codebase has specific conventions. The builder needs to find the exact files and follow existing patterns.

**Solution:** Scout locates, builder implements

**Phase 1: Scout**

```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│   ├── src/components/AppHeader.vue
│   ├── src/styles/pills.css
│   └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
    ├── File locations
    ├── Line numbers for modifications
    ├── Existing patterns to follow
    └── Recommended approach
```

**Phase 2: Builder**

```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```

**Orchestrator involvement:**

```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
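**Illustrative sketch:** the hand-off between phases. The `run_agent()` helper is hypothetical; the key move is that only the scout's small report, not the search itself, is threaded into the builder's prompt.

```python
from pathlib import Path

def run_agent(role: str, prompt: str) -> None:
    """Hypothetical: create an agent, wait for it to finish, then delete it."""
    ...

def scout_then_build(feature_request: str) -> None:
    # Phase 1: the scout writes a small, precise report instead of raw file dumps.
    run_agent(
        "scout",
        f"Locate the files and conventions needed for: {feature_request}. "
        "Write findings to scout-header-report.md",
    )
    report = Path("scout-header-report.md").read_text()

    # Phase 2: the builder receives only the report (a few k tokens).
    run_agent(
        "builder",
        f"Implement the feature using exactly these locations and patterns:\n\n{report}",
    )
```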
**Key techniques:**

- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads

**Results:**

- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try

**Source:** One Agent to Rule Them All transcript

---

## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board

**Pattern:** Structured lifecycle with quality gates

**Problem:** Ensure all changes go through proper review before shipping

**Architecture:**

```text
Task Board Columns:
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│  PLAN   │ →  │  BUILD  │ →  │ REVIEW  │ →  │  SHIP   │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
```

**Example task: "Update HTML titles"**

**Column 1: PLAN**

```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│   ├── index.html
│   └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│   1. Update index.html <title>
│   2. Update App.tsx header component
│   3. Test both pages load correctly
└── Moves task to BUILD column
```

**Column 2: BUILD**

```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│   ├── index.html: "Plan Build Review Ship"
│   └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```

**Column 3: REVIEW**

```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│   ├── Plan followed? ✅
│   ├── Tests passing? ✅
│   ├── Code quality? ✅
│   └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```

**Column 4: SHIP**

```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```

**Orchestrator's role:**

```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```
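**Illustrative sketch:** the quality gates as a small state machine. The task fields (`tests_passing`, `approved`) are assumptions, used only to show that a task cannot skip a phase.

```python
from enum import Enum

class Phase(str, Enum):
    PLAN = "PLAN"
    BUILD = "BUILD"
    REVIEW = "REVIEW"
    SHIP = "SHIP"

# Each phase may only advance to the next one -- no jumping straight to SHIP.
NEXT_PHASE = {Phase.PLAN: Phase.BUILD, Phase.BUILD: Phase.REVIEW, Phase.REVIEW: Phase.SHIP}

def advance(task: dict) -> dict:
    """Move a task to the next column, enforcing the BUILD and REVIEW gates."""
    current = Phase(task["phase"])
    if current is Phase.SHIP:
        raise ValueError("task already shipped")
    if current is Phase.BUILD and not task.get("tests_passing"):
        raise ValueError("BUILD gate: tests must pass before REVIEW")
    if current is Phase.REVIEW and not task.get("approved"):
        raise ValueError("REVIEW gate: reviewer approval required before SHIP")
    task.setdefault("history", []).append(current.value)
    task["phase"] = NEXT_PHASE[current].value
    return task
```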
**UI representation:**

```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│   ├── PLAN: planner-001 (completed 2m ago)
│   ├── BUILD: builder-002 (completed 1m ago)
│   └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```

**Key techniques:**

- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved

**Results:**

- **No untested code shipped** (the REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)

**Source:** Custom Agents transcript

---

## Case Study 6: Meta-Agent System

**Pattern:** Agents building agents

**Problem:** Need a new specialized agent but don't want to hand-write the configuration

**Solution:** Meta-agent that builds other agents

**Meta-agent prompt:**

```markdown
# meta-agent.md

You are a meta-agent that builds new sub-agents from user descriptions.

When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples

Output: .claude/agents/<agent-name>.md with complete configuration
```

**Example: Building TTS summary agent**

**User:** "Build agent that summarizes what my code does using text-to-speech"

**Meta-agent process:**

```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)

Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format

Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"

Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]

Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅

Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
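**Illustrative sketch:** step 4 as a helper that writes the generated configuration file. The frontmatter keys mirror the example above; anything beyond `name` and `description` is an assumption about the config format.

```python
from pathlib import Path
from textwrap import dedent

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    """Write a .claude/agents/<name>.md file: YAML frontmatter followed by
    the system prompt the meta-agent designed."""
    config = dedent(f"""\
        ---
        name: {name}
        description: {description}
        ---
        {system_prompt}
        """)
    path = Path(".claude/agents") / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(config)
    return path

# The TTS summary agent from the walkthrough:
write_agent_config(
    "tts-summary",
    'Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
    "Purpose: Review the user's code and provide a 1-sentence summary via 11Labs voice.",
)
```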
**Result:** Fully functional TTS summary agent created from a natural-language description

**Recursion depth:**

```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
   └→ Level 2: TTS summary agent (built by meta-agent)
      └→ Level 3: Sub-agents (if TTS agent spawns any)
```

**Key techniques:**

- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents

**Challenges:**

- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated

**Results:**

- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**

**Source:** Sub-Agents transcript

---

## Case Study 7: Observability Dashboard

**Pattern:** Real-time multi-agent monitoring

**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.

**Solution:** Centralized observability system

**Architecture:**

```text
┌──────────────────── Multiple Agents ─────────────────────┐
│  Agent 1    Agent 2    Agent 3    Agent 4    Agent 5     │
│     ↓          ↓          ↓          ↓          ↓        │
│                 pre/post-tool-use hooks                  │
│                            ↓                             │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────── Bun Server ──────────────────────────┐
│  POST /events endpoint                                   │
│   ├→ Store in SQLite (persistence)                       │
│   └→ Broadcast via WebSocket (real-time)                 │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────── Web Client ──────────────────────────┐
│  ┌─────────────────────────────────────────────────┐     │
│  │ Live Activity Pulse (1min/3min/5min windows)    │     │
│  ├─────────────────────────────────────────────────┤     │
│  │ Event Stream (filtered by app/session/type)     │     │
│  ├─────────────────────────────────────────────────┤     │
│  │ Event Details (with AI-generated summaries)     │     │
│  └─────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────┘
```
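**Illustrative sketch:** the source system uses a Bun server; as a rough stand-in, here is the same store-then-broadcast shape using only Python's standard library. The port, route, and table schema are assumptions, and the WebSocket broadcast is left as a comment.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("events.db", check_same_thread=False)
db.execute(
    "CREATE TABLE IF NOT EXISTS events "
    "(app TEXT, type TEXT, payload TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        event = json.loads(body)
        # 1) Persist for historical analysis.
        db.execute(
            "INSERT INTO events (app, type, payload) VALUES (?, ?, ?)",
            (event.get("app"), event.get("type"), json.dumps(event)),
        )
        db.commit()
        # 2) Broadcast to connected dashboard clients (WebSocket layer omitted).
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 4000), EventHandler).serve_forever()
```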
**Hook implementation:**

```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)

    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",      # App name
        "post-tool-use",    # Event type
        "--summarize"       # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
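**Illustrative sketch:** the `send-event.py` utility the hook shells out to is not shown in the source; a minimal version might look like this, assuming the server accepts `POST /events`. The argument order mirrors the hook's call; the summarization step is stubbed.

```python
#!/usr/bin/env python3
# .claude/hooks/utils/send-event.py  (sketch, not the original utility)
import argparse
import json
import sys
import urllib.request

SERVER = "http://127.0.0.1:4000/events"  # assumed endpoint

def summarize(event: dict) -> str:
    """Stub: call a small, cheap model (e.g. Haiku) for a one-line summary."""
    ...
    return ""

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("app")           # e.g. "my-codebase"
    parser.add_argument("event_type")    # e.g. "post-tool-use"
    parser.add_argument("--summarize", action="store_true")
    args = parser.parse_args()

    event = json.load(sys.stdin)
    payload = {"app": args.app, "type": args.event_type, "event": event}
    if args.summarize:
        payload["summary"] = summarize(event)

    request = urllib.request.Request(
        SERVER,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)

if __name__ == "__main__":
    main()
```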
**What user sees:**

```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁  Agent A (very active)                   │
│ ▁▁▂▂▃▃▂▂  Agent B (moderate activity)             │
│ ▂▂▂▂▂▂▂▂  Agent C (steady work)                   │
│ ▁▁▁█▁▁▁▁  Agent D (spike, then quiet)             │
└────────────────────────────────────────────────────┘

┌─────────────── Event Stream ───────────────────────┐
│ [Agent A] post-tool-use                             │
│   Summary: "Wrote authentication logic to user.py"  │
│   Time: 2s ago                                      │
├─────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop                            │
│   Summary: "Completed documentation scrape"         │
│   Time: 5s ago                                      │
├─────────────────────────────────────────────────────┤
│ [Agent C] notification                              │
│   Summary: "Needs approval for rm command"          │
│   Time: 8s ago                                      │
└─────────────────────────────────────────────────────┘
```

**Filtering:**

```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```

**Event summarization:**

```text
Each event is summarized by Haiku ($0.0002 per event).

Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success

Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"

Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```
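**Illustrative sketch:** one way to produce those per-event summaries with the Anthropic Python SDK. The model id and prompt wording are illustrative; the actual system may generate its summaries differently.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_event(tool_name: str, tool_input: dict) -> str:
    """Return a one-line, human-readable summary of a tool-use event."""
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # illustrative model id
        max_tokens=60,
        messages=[{
            "role": "user",
            "content": (
                "Summarize this agent action in one short sentence:\n"
                f"tool: {tool_name}\ninput: {tool_input}"
            ),
        }],
    )
    return response.content[0].text
```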
**Key techniques:**

- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session

**Results:**

- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries

**Business value:**

- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools used most?)
- **Debug issues** (what happened before failure?)
- **Scale confidence** (can observe 10+ agents easily)

**Source:** Multi-Agent Observability transcript

---

## Case Study 8: AFK Agent Device

**Pattern:** Autonomous background work while you're away

**Problem:** Long-running tasks block your terminal. You want to work on something else.

**Solution:** Dedicated device running an agent fleet

**Architecture:**

```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates

Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```

**Workflow:**

```bash
# From your device
/afk-agents \
  --prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
  --adw "plan-build-ship" \
  --docs "https://openai-agent-sdk.com/docs"

# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
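**Illustrative sketch:** the job hand-off, modeled here as a shared directory of JSON job files. The real system's queue, scheduler, and ADW runner are not shown in the source, so everything below is an assumption about the shape, not a description of the actual infrastructure.

```python
import json
import time
from pathlib import Path

QUEUE = Path("afk-queue")   # shared location both devices can reach (assumption)
STATUS_INTERVAL = 60        # report/check status every 60 seconds

def enqueue(prompt: str, adw: str, docs: str) -> Path:
    """Run on your interactive device: drop a job file and go AFK."""
    QUEUE.mkdir(exist_ok=True)
    job = QUEUE / f"job-{int(time.time())}.json"
    job.write_text(json.dumps({"prompt": prompt, "adw": adw, "docs": docs, "status": "queued"}))
    return job

def run_adw(spec: dict, job: Path) -> None:
    """Hypothetical: execute the agentic developer workflow (scout -> plan ->
    build -> ship) and update the job file with status along the way."""
    ...

def worker_loop() -> None:
    """Run on the dedicated agent device: pick up queued jobs and execute them."""
    while True:
        for job in sorted(QUEUE.glob("job-*.json")):
            spec = json.loads(job.read_text())
            if spec.get("status") == "queued":
                run_adw(spec, job)
        time.sleep(STATUS_INTERVAL)
```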
**Agent device execution:**

```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```

**Status updates (every 60s):**

```text
Your device shows:

[60s]  Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅

Click to view: results/sdk-agents-20250105/
```

**What you do:**

```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```

**Key techniques:**

- **Job queue** - Agents pick up work from queue
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed

**Results:**

- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality

**Infrastructure required:**

- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting

**Use cases:**

- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors

**Source:** Claude 2.0 transcript

---

## Cross-Cutting Patterns

### Pattern: Context Window as Resource Constraint

**Appears in:**

- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)

**Lesson:** Context is precious. Protect it aggressively.

### Pattern: Specialized Agents Over General

**Appears in:**

- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute

**Lesson:** "A focused agent is a performant agent."

### Pattern: Observability Enables Scale

**Appears in:**

- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s

**Lesson:** "If you can't measure it, you can't scale it."

### Pattern: Deletable Temporary Resources

**Appears in:**

- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping

**Lesson:** "The best agent is a deleted agent."

## Performance Comparisons

### Single Agent vs. Multi-Agent

| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |

### With vs. Without Orchestration

| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |

## Common Failure Modes

### Failure: Context Explosion

**Scenario:** Case 2 without scouts

- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out

**Fix:** Add scout phase to filter files first

### Failure: Orchestrator Watching Everything

**Scenario:** Case 3 with observing orchestrator

- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale

**Fix:** Implement orchestrator sleep pattern

### Failure: No Observability

**Scenario:** Case 7 without dashboard

- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked

**Fix:** Add hooks + observability system

### Failure: Agent Accumulation

**Scenario:** Case 5 not deleting agents

- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start

**Fix:** Delete agents after task completion

## Key Takeaways

1. **Parallelization = Sub-agents** - Nothing else runs agents in parallel

2. **Context protection = Specialization** - Focused agents use less context

3. **Orchestration = Scale** - Single interface manages fleet

4. **Observability = Confidence** - Can't scale what you can't see

5. **Deletable = Sustainable** - Free resources for next task

6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first

## When to Use Multi-Agent Patterns

Use multi-agent when:

- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure

Don't use multi-agent when:

- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback

## Source Attribution

All case studies drawn from field experience documented in 8 source transcripts:

1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)

## Related Documentation

- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery

---

**Remember:** These are real systems in production. Start simple, add complexity only when needed.