# Multi-Agent Case Studies
Real-world examples of multi-agent systems in production, drawn from field experience.
## Case Study Index
| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
---
## Case Study 1: AI Docs Loader
**Pattern:** Sub-agent delegation for parallel work
**Problem:** Loading 10 documentation URLs pulls in 30k+ tokens of scraped content. A single agent doing the whole job would climb past 150k tokens.
**Solution:** Delegate each scrape to isolated sub-agent
**Architecture:**
```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```
**Implementation:**
```bash
# Single command
/load-ai-docs
# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
```
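The delegation loop itself is simple. A minimal sketch, assuming a hypothetical `spawn_subagent` helper that runs a prompt in a fresh, isolated context and returns only the sub-agent's final report:
```python
import asyncio

async def spawn_subagent(prompt: str) -> str:
    """Placeholder: run `prompt` in an isolated sub-agent context."""
    await asyncio.sleep(0)  # stand-in for the actual agent runtime call
    return "saved to ai-docs/"

async def load_ai_docs(urls: list[str]) -> list[str]:
    tasks = [
        spawn_subagent(
            f"Scrape {url}, save the markdown under ai-docs/, "
            "and reply with only the output file path."
        )
        for url in urls  # one isolated sub-agent per URL
    ]
    # gather() runs all scrapes concurrently; the primary agent only ever
    # sees these short reports, so scrape content stays out of its context.
    return await asyncio.gather(*tasks)

# asyncio.run(load_ai_docs(["https://example.com/docs/1", "..."]))
```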
**Key techniques:**
- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary
**Results:**
- **Time:** 10 scrapes run in parallel rather than one after another (~10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues
**Source:** Elite Context Engineering transcript
---
## Case Study 2: SDK Migration
**Pattern:** Scout-plan-build with multiple perspectives
**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications
**Challenge:**
- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes
**Solution:** Three-phase workflow with delegation
**Phase 1: Scout (Reduce context for planner)**
```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)
Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")
Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```
**Why multiple models?** Diverse perspectives catch edge cases single model might miss.
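A minimal sketch of the fan-out, assuming a hypothetical `run_scout` dispatcher and illustrative model names; the point is that each scout searches independently and only the merged file list reaches the planner:
```python
from concurrent.futures import ThreadPoolExecutor

SCOUT_MODELS = ["gemini-lightning", "codex", "gemini-flash-preview", "haiku"]

def run_scout(model: str) -> set[str]:
    """Placeholder: ask `model` to search the codebase for old SDK usage
    and return file:line references."""
    return set()  # stand-in, e.g. {"apps/cli/main.py:42"}

with ThreadPoolExecutor(max_workers=len(SCOUT_MODELS)) as pool:
    findings = pool.map(run_scout, SCOUT_MODELS)

# Union the perspectives: a reference flagged by any scout makes the report.
relevant = sorted(set().union(*findings))
with open("relevant-files.md", "w") as f:
    f.write("# Relevant Files\n")
    f.writelines(f"- {ref}\n" for ref in relevant)
```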
**Phase 2: Plan (Focus on relevant subset)**
```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)
Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```
**Phase 3: Build (Execute plan)**
```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion
Context used: ~80k tokens
Still within safe limits
```
**Final context analysis:**
```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)
With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```
**Key techniques:**
- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation
**Results:**
- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)
**Near miss:** "We were 14% away from exploding our context" due to the autocompact buffer
**Lesson:** Disable the autocompact buffer. The ~22% of the window it reserves matters at scale.
**Source:** Claude 2.0 transcript
---
## Case Study 3: Codebase Summarization
**Pattern:** Orchestrator with specialized QA agents
**Problem:** Summarize large codebase (frontend + backend) with architecture docs
**Approach:** Divide and conquer with synthesis
**Architecture:**
```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│ ├─ Summarizes frontend components
│ └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│ ├─ Summarizes backend APIs
│ └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
├─ Reads both summaries
├─ Synthesizes unified view
└─ Outputs: codebase-overview.md
```
**Orchestrator behavior:**
```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
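A minimal sketch of the sleep pattern, assuming a hypothetical `get_status` flag that each agent updates as it works:
```python
import time

def get_status(agent_id: str) -> str:
    """Placeholder: read the agent's status flag (a few bytes, not its transcript)."""
    return "complete"  # stand-in for the real status store

def orchestrate(agent_ids: list[str]) -> None:
    # Steps 2-3: agents already created and commanded with detailed prompts
    while not all(get_status(a) == "complete" for a in agent_ids):
        time.sleep(15)  # steps 4-5: SLEEP; the orchestrator's context does not grow here
    # Steps 6-8: wake, read only the produced summary files, report to the user
    # Step 9: delete all agents to free resources

orchestrate(["frontend-qa", "backend-qa", "primary-qa"])
```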
**Prompts from orchestrator:**
```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"
Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"
Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```
**Observability interface shows:**
```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds
[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds
[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds
Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```
**Key techniques:**
- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use
**Results:**
- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)
**Source:** One Agent to Rule Them All transcript
---
## Case Study 4: UI Component Creation
**Pattern:** Scout-builder two-stage
**Problem:** Create gray pills for app header information display
**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.
**Solution:** Scout locates, builder implements
**Phase 1: Scout**
```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│ ├── src/components/AppHeader.vue
│ ├── src/styles/pills.css
│ └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
├── File locations
├── Line numbers for modifications
├── Existing patterns to follow
└── Recommended approach
```
**Phase 2: Builder**
```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```
**Orchestrator involvement:**
```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
**Key techniques:**
- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads
**Results:**
- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try
**Source:** One Agent to Rule Them All transcript
---
## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
**Pattern:** Structured lifecycle with quality gates
**Problem:** Ensure all changes go through proper review before shipping
**Architecture:**
```text
Task Board Columns:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Example task: "Update HTML titles"**
**Column 1: PLAN**
```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│ ├── index.html
│ └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│ 1. Update index.html <title>
│ 2. Update App.tsx header component
│ 3. Test both pages load correctly
└── Moves task to BUILD column
```
**Column 2: BUILD**
```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│ ├── index.html: "Plan Build Review Ship"
│ └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```
**Column 3: REVIEW**
```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│ ├── Plan followed? ✅
│ ├── Tests passing? ✅
│ ├── Code quality? ✅
│ └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```
**Column 4: SHIP**
```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```
**Orchestrator's role:**
```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```
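A minimal sketch of the gate logic (the phase ordering is from the board above; the `Task` class is illustrative):
```python
PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]

class Task:
    def __init__(self, name: str):
        self.name = name
        self.phase = "PLAN"

    def advance(self) -> None:
        i = PHASES.index(self.phase)
        if i == len(PHASES) - 1:
            raise ValueError(f"{self.name} is already shipped")
        self.phase = PHASES[i + 1]  # one gate at a time; no skipping to SHIP

task = Task("Update HTML titles")
task.advance()  # PLAN -> BUILD
task.advance()  # BUILD -> REVIEW
task.advance()  # REVIEW -> SHIP
```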
**UI representation:**
```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│ ├── PLAN: planner-001 (completed 2m ago)
│ ├── BUILD: builder-002 (completed 1m ago)
│ └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```
**Key techniques:**
- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved
**Results:**
- **Zero shipping untested code** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)
**Source:** Custom Agents transcript
---
## Case Study 6: Meta-Agent System
**Pattern:** Agents building agents
**Problem:** Need new specialized agent but don't want to hand-write configuration
**Solution:** Meta-agent that builds other agents
**Meta-agent prompt:**
```markdown
# meta-agent.md
You are a meta-agent that builds new sub-agents from user descriptions.
When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples
Output: .claude/agents/<agent-name>.md with complete configuration
```
**Example: Building TTS summary agent**
**User:** "Build agent that summarizes what my code does using text-to-speech"
**Meta-agent process:**
```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)
Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format
Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"
Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]
Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅
Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
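Step 4 reduces to writing a frontmatter file in the format shown above. A minimal sketch, with the name, description, and system prompt assumed to come from the meta-agent's earlier design steps:
```python
from pathlib import Path

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    path = Path(f".claude/agents/{name}.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{system_prompt}\n"
    )
    return path

write_agent_config(
    "tts-summary",
    'Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
    "Purpose: Review user's code and provide 1-sentence summary via 11Labs voice",
)
```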
**Result:** Fully functional TTS summary agent created from natural language description
**Recursion depth:**
```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
└→ Level 2: TTS summary agent (built by meta-agent)
└→ Level 3: Sub-agents (if TTS agent spawns any)
```
**Key techniques:**
- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents
**Challenges:**
- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated
**Results:**
- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**
**Source:** Sub-Agents transcript
---
## Case Study 7: Observability Dashboard
**Pattern:** Real-time multi-agent monitoring
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
**Solution:** Centralized observability system
**Architecture:**
```text
┌──────────────────── Multiple Agents ────────────────────┐
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
│ ↓ ↓ ↓ ↓ ↓ │
│ pre/post-tool-use hooks │
│ ↓ │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Bun Server ─────────────────────────┐
│ POST /events endpoint │
│ ├→ Store in SQLite (persistence) │
│ └→ Broadcast via WebSocket (real-time) │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Web Client ─────────────────────────┐
│ ┌─────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/type) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Details (with AI-generated summaries) │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
**Hook implementation:**
```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)
    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",      # App name
        "post-tool-use",    # Event type
        "--summarize",      # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
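For illustration, a minimal Python stand-in for the receiving side; the source uses a Bun server with WebSocket broadcast, so this sketch covers only the `/events` storage half:
```python
import json, sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("events.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS events (app TEXT, type TEXT, payload TEXT)")

class EventsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)
        db.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (event.get("app"), event.get("type"), json.dumps(event)),
        )
        db.commit()
        # A real server would also broadcast `event` over WebSocket here.
        self.send_response(204)
        self.end_headers()

# HTTPServer(("127.0.0.1", 4000), EventsHandler).serve_forever()
```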
**What user sees:**
```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
└────────────────────────────────────────────────────┘
┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use │
│ Summary: "Wrote authentication logic to user.py"│
│ Time: 2s ago │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop │
│ Summary: "Completed documentation scrape" │
│ Time: 5s ago │
├────────────────────────────────────────────────────┤
│ [Agent C] notification │
│ Summary: "Needs approval for rm command" │
│ Time: 8s ago │
└────────────────────────────────────────────────────┘
```
**Filtering:**
```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```
**Event summarization:**
```text
# Each event summarized by Haiku ($0.0002 per event)
Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success
Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```
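A minimal sketch of what `send-event.py --summarize` might do, assuming the Anthropic Python SDK and a Haiku-class model id (the actual script may differ):
```python
import json
import anthropic

def summarize_event(event: dict) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumed model id
        max_tokens=50,
        messages=[{
            "role": "user",
            "content": "Summarize this agent event in one sentence:\n"
                       + json.dumps(event)[:2000],  # cap payload size
        }],
    )
    return msg.content[0].text
```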
**Key techniques:**
- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session
**Results:**
- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries
**Business value:**
- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools used most?)
- **Debug issues** (what happened before failure?)
- **Scale confidence** (can observe 10+ agents easily)
**Source:** Multi-Agent Observability transcript
---
## Case Study 8: AFK Agent Device
**Pattern:** Autonomous background work while you're away
**Problem:** Long-running tasks block your terminal. You want to work on something else.
**Solution:** Dedicated device running agent fleet
**Architecture:**
```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates
Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```
**Workflow:**
```bash
# From your device
/afk-agents \
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
--adw "plan-build-ship" \
--docs "https://openai-agent-sdk.com/docs"
# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
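On the device side, the core is a pull-and-report loop. A minimal sketch, with the queue, workflow runner, and status reporter assumed as injected dependencies:
```python
import threading

def run_device(queue, run_workflow, report_status) -> None:
    while True:
        job = queue.get()  # blocks until a job arrives
        done = threading.Event()

        def heartbeat():
            while not done.wait(timeout=60):       # fires every 60s until the job ends
                report_status(job, "in progress")  # e.g. "Building agent 2 of 3..."

        threading.Thread(target=heartbeat, daemon=True).start()
        try:
            run_workflow(job)  # Scout -> Plan -> Build -> Ship, then git push
            report_status(job, "complete")
        finally:
            done.set()
```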
**Agent device execution:**
```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```
**Status updates (every 60s):**
```text
Your device shows:
[60s] Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅
Click to view: results/sdk-agents-20250105/
```
**What you do:**
```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```
**Key techniques:**
- **Job queue** - Agents pick up work from queue
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed
**Results:**
- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality
**Infrastructure required:**
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting
**Use cases:**
- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
**Source:** Claude 2.0 transcript
---
## Cross-Cutting Patterns
### Pattern: Context Window as Resource Constraint
**Appears in:**
- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)
**Lesson:** Context is precious. Protect it aggressively.
### Pattern: Specialized Agents Over General
**Appears in:**
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute
**Lesson:** "A focused agent is a performant agent."
### Pattern: Observability Enables Scale
**Appears in:**
- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s
**Lesson:** "If you can't measure it, you can't scale it."
### Pattern: Deletable Temporary Resources
**Appears in:**
- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping
**Lesson:** "The best agent is a deleted agent."
## Performance Comparisons
### Single Agent vs. Multi-Agent
| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
### With vs. Without Orchestration
| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
## Common Failure Modes
### Failure: Context Explosion
**Scenario:** Case 2 without scouts
- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out
**Fix:** Add scout phase to filter files first
### Failure: Orchestrator Watching Everything
**Scenario:** Case 3 with observing orchestrator
- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale
**Fix:** Implement orchestrator sleep pattern
### Failure: No Observability
**Scenario:** Case 7 without dashboard
- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked
**Fix:** Add hooks + observability system
### Failure: Agent Accumulation
**Scenario:** Case 5 not deleting agents
- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start
**Fix:** Delete agents after task completion
## Key Takeaways
1. **Parallelization = Sub-agents** - Nothing else runs agents in parallel
2. **Context protection = Specialization** - Focused agents use less context
3. **Orchestration = Scale** - Single interface manages fleet
4. **Observability = Confidence** - Can't scale what you can't see
5. **Deletable = Sustainable** - Free resources for next task
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
## When to Use Multi-Agent Patterns
Use multi-agent when:
- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure
Don't use multi-agent when:
- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback
## Source Attribution
All case studies drawn from field experience documented in 8 source transcripts:
1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)
## Related Documentation
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
---
**Remember:** These are real systems in production. Start simple, add complexity only when needed.

# Work Tree Manager: Evolution Path Example
**Real-world case study** showing the proper progression from prompt → sub-agent → skill.
## The Problem
Managing git work trees across a project requires multiple related operations:
- Creating new work trees
- Listing existing work trees
- Removing old work trees
- Merging work tree changes
- Updating work tree status
## Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple slash command that creates one work tree:
```bash
/create-worktree feature-branch
```
**Implementation:**
```markdown
# .claude/commands/create-worktree.md
Create a new git worktree for the specified branch.
Steps:
1. Check if branch exists
2. Create worktree directory
3. Initialize worktree
4. Report success
```
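Under the hood these steps map onto the real `git worktree` CLI. A minimal sketch (paths and error handling are illustrative):
```python
import subprocess

def create_worktree(branch: str, base_dir: str = "../worktrees") -> str:
    # 1. Check if the branch exists
    exists = subprocess.run(
        ["git", "rev-parse", "--verify", branch], capture_output=True
    ).returncode == 0
    # 2-3. Create and initialize the worktree (creating the branch if new)
    path = f"{base_dir}/{branch}"
    if exists:
        cmd = ["git", "worktree", "add", path, branch]
    else:
        cmd = ["git", "worktree", "add", "-b", branch, path]
    subprocess.run(cmd, check=True)
    # 4. Report success
    return path
```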
**When to stay here:** The task is infrequent or one-off.
**Signal to advance:** You find yourself creating work trees regularly.
## Stage 2: Add Sub-Agent for Parallelism
**Goal:** Scale to multiple parallel operations
When you need to create multiple work trees at once, use a sub-agent:
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**Why sub-agent:**
- **Parallelization** - Create 3 work trees simultaneously
- **Context isolation** - Each creation is independent
- **Speed** - 3x faster than sequential
**Sub-agent prompt:**
```markdown
Create work trees for the following branches in parallel:
- feature-a
- feature-b
- feature-c
For each branch:
1. Verify branch exists
2. Create worktree directory
3. Initialize worktree
4. Report status
Use the /create-worktree command for each.
```
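A minimal sketch of the parallel stage, reusing `create_worktree` from the Stage 1 sketch; each creation is independent, so a thread pool is enough:
```python
from concurrent.futures import ThreadPoolExecutor

branches = ["feature-a", "feature-b", "feature-c"]
with ThreadPoolExecutor(max_workers=len(branches)) as pool:
    for path in pool.map(create_worktree, branches):  # all three at once
        print(f"ready: {path}")
```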
**When to stay here:** Parallel creation is the only requirement.
**Signal to advance:** You need to **manage** work trees (not just create them).
## Stage 3: Create Skill for Management
**Goal:** Bundle multiple related operations
The problem has grown beyond creation—you need comprehensive work tree **management**:
```text
skills/work-tree-manager/
├── SKILL.md
├── scripts/
│ ├── validate.py
│ └── cleanup.py
└── reference/
└── git-worktree-commands.md
```
**SKILL.md:**
````markdown
---
name: work-tree-manager
description: Manage git worktrees - create, list, remove, merge, and update across projects. Use when working with git worktrees or when managing multiple branches simultaneously.
---
# Work Tree Manager
## Operations
### Create
Use /create-worktree command for single operations.
For parallel creation, delegate to sub-agent.
### List
Run: `git worktree list`
Parse output and present in readable format.
### Remove
1. Check if work tree is clean
2. Remove work tree directory
3. Prune references
### Merge
1. Fetch latest changes
2. Merge work tree branch to target
3. Clean up if merge successful
### Update
1. Check status of all work trees
2. Pull latest changes
3. Report any conflicts
## Validation
Before any destructive operation, run:
```bash
python scripts/validate.py <worktree-path>
```
## Cleanup
Periodically run cleanup to remove stale work trees:
```bash
python scripts/cleanup.py --dry-run
```
````
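A minimal sketch of what `scripts/validate.py` might check before a destructive operation: `git status --porcelain` prints nothing when the tree is clean.
```python
# scripts/validate.py - hypothetical pre-destruction check:
# refuse to touch a worktree with uncommitted changes.
import subprocess, sys

def is_clean(worktree_path: str) -> bool:
    out = subprocess.run(
        ["git", "-C", worktree_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip() == ""  # porcelain output is empty when clean

if __name__ == "__main__":
    path = sys.argv[1]
    if not is_clean(path):
        sys.exit(f"refusing: {path} has uncommitted changes")
    print(f"ok: {path} is clean")
```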
**Why skill:**
- **Multiple related operations** - Create, list, remove, merge, update
- **Repeat problem** - Managing work trees is ongoing
- **Domain-specific** - Specialized knowledge about git worktrees
- **Orchestration** - Coordinates slash commands, sub-agents, and scripts
**When to stay here:** Most workflows stop here.
**Signal to advance:** Need external data (GitHub API, CI/CD status).
## Stage 4: Add MCP for External Data
**Goal:** Integrate external systems
Add MCP server to query external repo metadata:
```text
skills/work-tree-manager/
├── SKILL.md (updated)
└── ... (existing files)
# Now references GitHub MCP for:
# - Branch protection rules
# - CI/CD status
# - Pull request information
```
**Updated SKILL.md section:**
```markdown
## External Integration
Before creating work tree, check GitHub status:
- Use GitHub MCP to query branch protection
- Check if CI is passing
- Verify no open blocking PRs
Query: `GitHub:get_branch_status <branch-name>`
```
**Why MCP:**
- **External data** - Information lives outside Claude Code
- **Real-time** - CI/CD status changes frequently
- **Third-party** - GitHub API integration
## Final State
```text
Prompt (Slash Command)
└─→ Creates single work tree
Sub-Agent
└─→ Creates multiple work trees in parallel
Skill
├─→ Orchestrates: Create, list, remove, merge, update
├─→ Uses: Slash commands for primitives
├─→ Uses: Sub-agents for parallel operations
└─→ Uses: Scripts for validation
MCP Server (GitHub)
└─→ Provides: Branch status, CI/CD info, PR data
Skill + MCP
└─→ Full-featured work tree manager with external integration
```
## Key Takeaways
### Progression Signals
**Prompt → Sub-Agent:**
- Signal: Need parallelization
- Keyword: "multiple," "parallel," "batch"
**Sub-Agent → Skill:**
- Signal: Need management, not just execution
- Keywords: "manage," "coordinate," "workflow"
- Multiple related operations emerge
**Skill → Skill + MCP:**
- Signal: Need external data or services
- Keywords: "GitHub," "API," "real-time," "status"
### Common Mistakes
**Skipping the prompt**
- Starting with a skill for simple creation
**Overusing sub-agents**
- Using sub-agents when main conversation would work
**Skill too early**
- Creating skill before understanding the full problem domain
**Correct approach**
- Build from bottom up
- Add complexity only when needed
- Each stage solves a real problem
### Decision Checklist
Before advancing to next stage:
**Prompt → Sub-Agent:**
- [ ] Do I need parallelization?
- [ ] Are operations truly independent?
- [ ] Am I okay losing context after?
**Sub-Agent → Skill:**
- [ ] Am I doing this repeatedly (3+ times)?
- [ ] Do I have multiple related operations?
- [ ] Is this a management problem, not just execution?
- [ ] Would orchestration add real value?
**Skill → Skill + MCP:**
- [ ] Do I need external data?
- [ ] Is the data outside Claude Code's control?
- [ ] Would real-time info improve the workflow?
## Real Usage
### Scenario 1: Quick One-Off
**Task:** Create one work tree for hotfix
**Solution:** Slash command
```bash
/create-worktree hotfix-urgent-bug
```
**Why:** Simple, direct, one-time task.
### Scenario 2: Feature Development Sprint
**Task:** Create work trees for 5 feature branches
**Solution:** Sub-agent
```bash
Create work trees in parallel for sprint features:
feature-auth, feature-api, feature-ui, feature-tests, feature-docs
```
**Why:** Parallel execution, independent operations.
### Scenario 3: Ongoing Project
**Task:** Manage all work trees across development lifecycle
**Solution:** Skill
```text
List all work trees, check status, merge completed features, clean up stale ones
```
**Why:** Multiple operations, repeat problem, management need.
### Scenario 4: CI/CD Integration
**Task:** Only create work trees for branches passing CI
**Solution:** Skill + MCP
```bash
Create work trees for features that:
- Have passing CI (check via GitHub MCP)
- Are approved by reviewers
- Have no merge conflicts
```
**Why:** Need external data from GitHub API.
## Summary
The work tree manager evolution demonstrates:
1. **Start simple** - Slash command for basic operation
2. **Scale for parallelism** - Sub-agent for batch operations
3. **Manage complexity** - Skill for full workflow orchestration
4. **Integrate externally** - MCP for real-time external data
**The principle:** Each stage solves a real problem. Don't advance until you hit the limitation of your current approach.
> "When you're starting out, I always recommend you just build a prompt. Everything is a prompt in the end."
Build from the foundation upward.