# Multi-Agent Case Studies
Real-world examples of multi-agent systems in production, drawn from field experience.
## Case Study Index
| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
---
## Case Study 1: AI Docs Loader
**Pattern:** Sub-agent delegation for parallel work
**Problem:** Loading 10 documentation URLs pulls in 30k+ tokens of scraped content. A single agent doing the whole job would climb past 150k tokens.
**Solution:** Delegate each scrape to isolated sub-agent
**Architecture:**
```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```
**Implementation:**
```bash
# Single command
/load-ai-docs
# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
```
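The delegation loop itself is simple. A minimal sketch, assuming a hypothetical `spawn_subagent` helper that runs a prompt in a fresh, isolated context and returns only the sub-agent's final report:
```python
import asyncio

async def spawn_subagent(prompt: str) -> str:
    """Placeholder: run `prompt` in an isolated sub-agent context."""
    await asyncio.sleep(0)  # stand-in for the actual agent runtime call
    return "saved to ai-docs/"

async def load_ai_docs(urls: list[str]) -> list[str]:
    tasks = [
        spawn_subagent(
            f"Scrape {url}, save the markdown under ai-docs/, "
            "and reply with only the output file path."
        )
        for url in urls  # one isolated sub-agent per URL
    ]
    # gather() runs all scrapes concurrently; the primary agent only ever
    # sees these short reports, so scrape content stays out of its context.
    return await asyncio.gather(*tasks)

# asyncio.run(load_ai_docs(["https://example.com/docs/1", "..."]))
```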
**Key techniques:**
- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary
**Results:**
- **Time:** 10 scrapes run in parallel rather than one after another (~10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues
**Source:** Elite Context Engineering transcript
---
## Case Study 2: SDK Migration
**Pattern:** Scout-plan-build with multiple perspectives
**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications
**Challenge:**
- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes
**Solution:** Three-phase workflow with delegation
**Phase 1: Scout (Reduce context for planner)**
```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)
Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")
Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```
**Why multiple models?** Diverse perspectives catch edge cases single model might miss.
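A minimal sketch of the fan-out, assuming a hypothetical `run_scout` dispatcher and illustrative model names; the point is that each scout searches independently and only the merged file list reaches the planner:
```python
from concurrent.futures import ThreadPoolExecutor

SCOUT_MODELS = ["gemini-lightning", "codex", "gemini-flash-preview", "haiku"]

def run_scout(model: str) -> set[str]:
    """Placeholder: ask `model` to search the codebase for old SDK usage
    and return file:line references."""
    return set()  # stand-in, e.g. {"apps/cli/main.py:42"}

with ThreadPoolExecutor(max_workers=len(SCOUT_MODELS)) as pool:
    findings = pool.map(run_scout, SCOUT_MODELS)

# Union the perspectives: a reference flagged by any scout makes the report.
relevant = sorted(set().union(*findings))
with open("relevant-files.md", "w") as f:
    f.write("# Relevant Files\n")
    f.writelines(f"- {ref}\n" for ref in relevant)
```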
**Phase 2: Plan (Focus on relevant subset)**
```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)
Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```
**Phase 3: Build (Execute plan)**
```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion
Context used: ~80k tokens
Still within safe limits
```
**Final context analysis:**
```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)
With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```
**Key techniques:**
- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation
**Results:**
- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)
**Near miss:** "We were 14% away from exploding our context" due to the autocompact buffer
**Lesson:** Disable the autocompact buffer. The ~22% of the window it reserves matters at scale.
**Source:** Claude 2.0 transcript
---
## Case Study 3: Codebase Summarization
**Pattern:** Orchestrator with specialized QA agents
**Problem:** Summarize large codebase (frontend + backend) with architecture docs
**Approach:** Divide and conquer with synthesis
**Architecture:**
```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│ ├─ Summarizes frontend components
│ └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│ ├─ Summarizes backend APIs
│ └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
├─ Reads both summaries
├─ Synthesizes unified view
└─ Outputs: codebase-overview.md
```
**Orchestrator behavior:**
```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
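A minimal sketch of the sleep pattern, assuming a hypothetical `get_status` flag that each agent updates as it works:
```python
import time

def get_status(agent_id: str) -> str:
    """Placeholder: read the agent's status flag (a few bytes, not its transcript)."""
    return "complete"  # stand-in for the real status store

def orchestrate(agent_ids: list[str]) -> None:
    # Steps 2-3: agents already created and commanded with detailed prompts
    while not all(get_status(a) == "complete" for a in agent_ids):
        time.sleep(15)  # steps 4-5: SLEEP; the orchestrator's context does not grow here
    # Steps 6-8: wake, read only the produced summary files, report to the user
    # Step 9: delete all agents to free resources

orchestrate(["frontend-qa", "backend-qa", "primary-qa"])
```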
**Prompts from orchestrator:**
```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"
Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"
Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```
**Observability interface shows:**
```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds
[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds
[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds
Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```
**Key techniques:**
- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use
**Results:**
- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)
**Source:** One Agent to Rule Them All transcript
---
## Case Study 4: UI Component Creation
**Pattern:** Scout-builder two-stage
**Problem:** Create gray pills for app header information display
**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.
**Solution:** Scout locates, builder implements
**Phase 1: Scout**
```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│ ├── src/components/AppHeader.vue
│ ├── src/styles/pills.css
│ └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
├── File locations
├── Line numbers for modifications
├── Existing patterns to follow
└── Recommended approach
```
**Phase 2: Builder**
```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```
**Orchestrator involvement:**
```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
**Key techniques:**
- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads
**Results:**
- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try
**Source:** One Agent to Rule Them All transcript
---
## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
**Pattern:** Structured lifecycle with quality gates
**Problem:** Ensure all changes go through proper review before shipping
**Architecture:**
```text
Task Board Columns:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Example task: "Update HTML titles"**
**Column 1: PLAN**
```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│ ├── index.html
│ └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│ 1. Update index.html <title>
│ 2. Update App.tsx header component
│ 3. Test both pages load correctly
└── Moves task to BUILD column
```
**Column 2: BUILD**
```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│ ├── index.html: "Plan Build Review Ship"
│ └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```
**Column 3: REVIEW**
```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│ ├── Plan followed? ✅
│ ├── Tests passing? ✅
│ ├── Code quality? ✅
│ └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```
**Column 4: SHIP**
```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```
**Orchestrator's role:**
```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```
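A minimal sketch of the gate logic (the phase ordering is from the board above; the `Task` class is illustrative):
```python
PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]

class Task:
    def __init__(self, name: str):
        self.name = name
        self.phase = "PLAN"

    def advance(self) -> None:
        i = PHASES.index(self.phase)
        if i == len(PHASES) - 1:
            raise ValueError(f"{self.name} is already shipped")
        self.phase = PHASES[i + 1]  # one gate at a time; no skipping to SHIP

task = Task("Update HTML titles")
task.advance()  # PLAN -> BUILD
task.advance()  # BUILD -> REVIEW
task.advance()  # REVIEW -> SHIP
```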
**UI representation:**
```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│ ├── PLAN: planner-001 (completed 2m ago)
│ ├── BUILD: builder-002 (completed 1m ago)
│ └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```
**Key techniques:**
- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved
**Results:**
- **Zero shipping untested code** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)
**Source:** Custom Agents transcript
---
## Case Study 6: Meta-Agent System
**Pattern:** Agents building agents
**Problem:** Need new specialized agent but don't want to hand-write configuration
**Solution:** Meta-agent that builds other agents
**Meta-agent prompt:**
```markdown
# meta-agent.md
You are a meta-agent that builds new sub-agents from user descriptions.
When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples
Output: .claude/agents/<agent-name>.md with complete configuration
```
**Example: Building TTS summary agent**
**User:** "Build agent that summarizes what my code does using text-to-speech"
**Meta-agent process:**
```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)
Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format
Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"
Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]
Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅
Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
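Step 4 reduces to writing a frontmatter file in the format shown above. A minimal sketch, with the name, description, and system prompt assumed to come from the meta-agent's earlier design steps:
```python
from pathlib import Path

def write_agent_config(name: str, description: str, system_prompt: str) -> Path:
    path = Path(f".claude/agents/{name}.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{system_prompt}\n"
    )
    return path

write_agent_config(
    "tts-summary",
    'Concisely summarizes code with text-to-speech. Trigger: "TTS summary"',
    "Purpose: Review user's code and provide 1-sentence summary via 11Labs voice",
)
```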
**Result:** Fully functional TTS summary agent created from natural language description
**Recursion depth:**
```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
└→ Level 2: TTS summary agent (built by meta-agent)
└→ Level 3: Sub-agents (if TTS agent spawns any)
```
**Key techniques:**
- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents
**Challenges:**
- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated
**Results:**
- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**
**Source:** Sub-Agents transcript
---
## Case Study 7: Observability Dashboard
**Pattern:** Real-time multi-agent monitoring
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
**Solution:** Centralized observability system
**Architecture:**
```text
┌──────────────────── Multiple Agents ────────────────────┐
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
│ ↓ ↓ ↓ ↓ ↓ │
│ pre/post-tool-use hooks │
│ ↓ │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Bun Server ─────────────────────────┐
│ POST /events endpoint │
│ ├→ Store in SQLite (persistence) │
│ └→ Broadcast via WebSocket (real-time) │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Web Client ─────────────────────────┐
│ ┌─────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/type) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Details (with AI-generated summaries) │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
**Hook implementation:**
```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)
    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",      # App name
        "post-tool-use",    # Event type
        "--summarize",      # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
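For illustration, a minimal Python stand-in for the receiving side; the source uses a Bun server with WebSocket broadcast, so this sketch covers only the `/events` storage half:
```python
import json, sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("events.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS events (app TEXT, type TEXT, payload TEXT)")

class EventsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)
        db.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (event.get("app"), event.get("type"), json.dumps(event)),
        )
        db.commit()
        # A real server would also broadcast `event` over WebSocket here.
        self.send_response(204)
        self.end_headers()

# HTTPServer(("127.0.0.1", 4000), EventsHandler).serve_forever()
```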
**What user sees:**
```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
└────────────────────────────────────────────────────┘
┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use │
│ Summary: "Wrote authentication logic to user.py"│
│ Time: 2s ago │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop │
│ Summary: "Completed documentation scrape" │
│ Time: 5s ago │
├────────────────────────────────────────────────────┤
│ [Agent C] notification │
│ Summary: "Needs approval for rm command" │
│ Time: 8s ago │
└────────────────────────────────────────────────────┘
```
**Filtering:**
```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```
**Event summarization:**
```text
# Each event summarized by Haiku ($0.0002 per event)
Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success
Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```
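A minimal sketch of what `send-event.py --summarize` might do, assuming the Anthropic Python SDK and a Haiku-class model id (the actual script may differ):
```python
import json
import anthropic

def summarize_event(event: dict) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumed model id
        max_tokens=50,
        messages=[{
            "role": "user",
            "content": "Summarize this agent event in one sentence:\n"
                       + json.dumps(event)[:2000],  # cap payload size
        }],
    )
    return msg.content[0].text
```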
**Key techniques:**
- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session
**Results:**
- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries
**Business value:**
- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools used most?)
- **Debug issues** (what happened before failure?)
- **Scale confidence** (can observe 10+ agents easily)
**Source:** Multi-Agent Observability transcript
---
## Case Study 8: AFK Agent Device
**Pattern:** Autonomous background work while you're away
**Problem:** Long-running tasks block your terminal. You want to work on something else.
**Solution:** Dedicated device running agent fleet
**Architecture:**
```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates
Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```
**Workflow:**
```bash
# From your device
/afk-agents \
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
--adw "plan-build-ship" \
--docs "https://openai-agent-sdk.com/docs"
# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
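On the device side, the core is a pull-and-report loop. A minimal sketch, with the queue, workflow runner, and status reporter assumed as injected dependencies:
```python
import threading

def run_device(queue, run_workflow, report_status) -> None:
    while True:
        job = queue.get()  # blocks until a job arrives
        done = threading.Event()

        def heartbeat():
            while not done.wait(timeout=60):       # fires every 60s until the job ends
                report_status(job, "in progress")  # e.g. "Building agent 2 of 3..."

        threading.Thread(target=heartbeat, daemon=True).start()
        try:
            run_workflow(job)  # Scout -> Plan -> Build -> Ship, then git push
            report_status(job, "complete")
        finally:
            done.set()
```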
**Agent device execution:**
```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```
**Status updates (every 60s):**
```text
Your device shows:
[60s] Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅
Click to view: results/sdk-agents-20250105/
```
**What you do:**
```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```
**Key techniques:**
- **Job queue** - Agents pick up work from queue
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed
**Results:**
- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality
**Infrastructure required:**
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting
**Use cases:**
- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
**Source:** Claude 2.0 transcript
---
## Cross-Cutting Patterns
### Pattern: Context Window as Resource Constraint
**Appears in:**
- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)
**Lesson:** Context is precious. Protect it aggressively.
### Pattern: Specialized Agents Over General
**Appears in:**
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute
**Lesson:** "A focused agent is a performant agent."
### Pattern: Observability Enables Scale
**Appears in:**
- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s
**Lesson:** "If you can't measure it, you can't scale it."
### Pattern: Deletable Temporary Resources
**Appears in:**
- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping
**Lesson:** "The best agent is a deleted agent."
## Performance Comparisons
### Single Agent vs. Multi-Agent
| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
### With vs. Without Orchestration
| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
## Common Failure Modes
### Failure: Context Explosion
**Scenario:** Case 2 without scouts
- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out
**Fix:** Add scout phase to filter files first
### Failure: Orchestrator Watching Everything
**Scenario:** Case 3 with observing orchestrator
- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale
**Fix:** Implement orchestrator sleep pattern
### Failure: No Observability
**Scenario:** Case 7 without dashboard
- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked
**Fix:** Add hooks + observability system
### Failure: Agent Accumulation
**Scenario:** Case 5 not deleting agents
- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start
**Fix:** Delete agents after task completion
## Key Takeaways
1. **Parallelization = Sub-agents** - Nothing else runs agents in parallel
2. **Context protection = Specialization** - Focused agents use less context
3. **Orchestration = Scale** - Single interface manages fleet
4. **Observability = Confidence** - Can't scale what you can't see
5. **Deletable = Sustainable** - Free resources for next task
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
## When to Use Multi-Agent Patterns
Use multi-agent when:
- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure
Don't use multi-agent when:
- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback
## Source Attribution
All case studies drawn from field experience documented in 8 source transcripts:
1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)
## Related Documentation
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
---
**Remember:** These are real systems in production. Start simple, add complexity only when needed.

# Work Tree Manager: Evolution Path Example
**Real-world case study** showing the proper progression from prompt → sub-agent → skill.
## The Problem
Managing git work trees across a project requires multiple related operations:
- Creating new work trees
- Listing existing work trees
- Removing old work trees
- Merging work tree changes
- Updating work tree status
## Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple slash command that creates one work tree:
```bash
/create-worktree feature-branch
```
**Implementation:**
```markdown
# .claude/commands/create-worktree.md
Create a new git worktree for the specified branch.
Steps:
1. Check if branch exists
2. Create worktree directory
3. Initialize worktree
4. Report success
```
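Under the hood these steps map onto the real `git worktree` CLI. A minimal sketch (paths and error handling are illustrative):
```python
import subprocess

def create_worktree(branch: str, base_dir: str = "../worktrees") -> str:
    # 1. Check if the branch exists
    exists = subprocess.run(
        ["git", "rev-parse", "--verify", branch], capture_output=True
    ).returncode == 0
    # 2-3. Create and initialize the worktree (creating the branch if new)
    path = f"{base_dir}/{branch}"
    if exists:
        cmd = ["git", "worktree", "add", path, branch]
    else:
        cmd = ["git", "worktree", "add", "-b", branch, path]
    subprocess.run(cmd, check=True)
    # 4. Report success
    return path
```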
**When to stay here:** The task is infrequent or one-off.
**Signal to advance:** You find yourself creating work trees regularly.
## Stage 2: Add Sub-Agent for Parallelism
**Goal:** Scale to multiple parallel operations
When you need to create multiple work trees at once, use a sub-agent:
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**Why sub-agent:**
- **Parallelization** - Create 3 work trees simultaneously
- **Context isolation** - Each creation is independent
- **Speed** - 3x faster than sequential
**Sub-agent prompt:**
```markdown
Create work trees for the following branches in parallel:
- feature-a
- feature-b
- feature-c
For each branch:
1. Verify branch exists
2. Create worktree directory
3. Initialize worktree
4. Report status
Use the /create-worktree command for each.
```
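A minimal sketch of the parallel stage, reusing `create_worktree` from the Stage 1 sketch; each creation is independent, so a thread pool is enough:
```python
from concurrent.futures import ThreadPoolExecutor

branches = ["feature-a", "feature-b", "feature-c"]
with ThreadPoolExecutor(max_workers=len(branches)) as pool:
    for path in pool.map(create_worktree, branches):  # all three at once
        print(f"ready: {path}")
```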
**When to stay here:** Parallel creation is the only requirement.
**Signal to advance:** You need to **manage** work trees (not just create them).
## Stage 3: Create Skill for Management
**Goal:** Bundle multiple related operations
The problem has grown beyond creation—you need comprehensive work tree **management**:
```text
skills/work-tree-manager/
├── SKILL.md
├── scripts/
│ ├── validate.py
│ └── cleanup.py
└── reference/
└── git-worktree-commands.md
```
**SKILL.md:**
````markdown
---
name: work-tree-manager
description: Manage git worktrees - create, list, remove, merge, and update across projects. Use when working with git worktrees or when managing multiple branches simultaneously.
---
# Work Tree Manager
## Operations
### Create
Use /create-worktree command for single operations.
For parallel creation, delegate to sub-agent.
### List
Run: `git worktree list`
Parse output and present in readable format.
### Remove
1. Check if work tree is clean
2. Remove work tree directory
3. Prune references
### Merge
1. Fetch latest changes
2. Merge work tree branch to target
3. Clean up if merge successful
### Update
1. Check status of all work trees
2. Pull latest changes
3. Report any conflicts
## Validation
Before any destructive operation, run:
```bash
python scripts/validate.py <worktree-path>
```
## Cleanup
Periodically run cleanup to remove stale work trees:
```bash
python scripts/cleanup.py --dry-run
```
````
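A minimal sketch of what `scripts/validate.py` might check before a destructive operation: `git status --porcelain` prints nothing when the tree is clean.
```python
# scripts/validate.py - hypothetical pre-destruction check:
# refuse to touch a worktree with uncommitted changes.
import subprocess, sys

def is_clean(worktree_path: str) -> bool:
    out = subprocess.run(
        ["git", "-C", worktree_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip() == ""  # porcelain output is empty when clean

if __name__ == "__main__":
    path = sys.argv[1]
    if not is_clean(path):
        sys.exit(f"refusing: {path} has uncommitted changes")
    print(f"ok: {path} is clean")
```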
**Why skill:**
- **Multiple related operations** - Create, list, remove, merge, update
- **Repeat problem** - Managing work trees is ongoing
- **Domain-specific** - Specialized knowledge about git worktrees
- **Orchestration** - Coordinates slash commands, sub-agents, and scripts
**When to stay here:** Most workflows stop here.
**Signal to advance:** Need external data (GitHub API, CI/CD status).
## Stage 4: Add MCP for External Data
**Goal:** Integrate external systems
Add MCP server to query external repo metadata:
```text
skills/work-tree-manager/
├── SKILL.md (updated)
└── ... (existing files)
# Now references GitHub MCP for:
# - Branch protection rules
# - CI/CD status
# - Pull request information
```
**Updated SKILL.md section:**
```markdown
## External Integration
Before creating work tree, check GitHub status:
- Use GitHub MCP to query branch protection
- Check if CI is passing
- Verify no open blocking PRs
Query: `GitHub:get_branch_status <branch-name>`
```
**Why MCP:**
- **External data** - Information lives outside Claude Code
- **Real-time** - CI/CD status changes frequently
- **Third-party** - GitHub API integration
## Final State
```text
Prompt (Slash Command)
└─→ Creates single work tree
Sub-Agent
└─→ Creates multiple work trees in parallel
Skill
├─→ Orchestrates: Create, list, remove, merge, update
├─→ Uses: Slash commands for primitives
├─→ Uses: Sub-agents for parallel operations
└─→ Uses: Scripts for validation
MCP Server (GitHub)
└─→ Provides: Branch status, CI/CD info, PR data
Skill + MCP
└─→ Full-featured work tree manager with external integration
```
## Key Takeaways
### Progression Signals
**Prompt → Sub-Agent:**
- Signal: Need parallelization
- Keyword: "multiple," "parallel," "batch"
**Sub-Agent → Skill:**
- Signal: Need management, not just execution
- Keywords: "manage," "coordinate," "workflow"
- Multiple related operations emerge
**Skill → Skill + MCP:**
- Signal: Need external data or services
- Keywords: "GitHub," "API," "real-time," "status"
### Common Mistakes
**Skipping the prompt**
- Starting with a skill for simple creation
**Overusing sub-agents**
- Using sub-agents when main conversation would work
**Skill too early**
- Creating skill before understanding the full problem domain
**Correct approach**
- Build from bottom up
- Add complexity only when needed
- Each stage solves a real problem
### Decision Checklist
Before advancing to next stage:
**Prompt → Sub-Agent:**
- [ ] Do I need parallelization?
- [ ] Are operations truly independent?
- [ ] Am I okay losing context after?
**Sub-Agent → Skill:**
- [ ] Am I doing this repeatedly (3+ times)?
- [ ] Do I have multiple related operations?
- [ ] Is this a management problem, not just execution?
- [ ] Would orchestration add real value?
**Skill → Skill + MCP:**
- [ ] Do I need external data?
- [ ] Is the data outside Claude Code's control?
- [ ] Would real-time info improve the workflow?
## Real Usage
### Scenario 1: Quick One-Off
**Task:** Create one work tree for hotfix
**Solution:** Slash command
```bash
/create-worktree hotfix-urgent-bug
```
**Why:** Simple, direct, one-time task.
### Scenario 2: Feature Development Sprint
**Task:** Create work trees for 5 feature branches
**Solution:** Sub-agent
```bash
Create work trees in parallel for sprint features:
feature-auth, feature-api, feature-ui, feature-tests, feature-docs
```
**Why:** Parallel execution, independent operations.
### Scenario 3: Ongoing Project
**Task:** Manage all work trees across development lifecycle
**Solution:** Skill
```text
List all work trees, check status, merge completed features, clean up stale ones
```
**Why:** Multiple operations, repeat problem, management need.
### Scenario 4: CI/CD Integration
**Task:** Only create work trees for branches passing CI
**Solution:** Skill + MCP
```bash
Create work trees for features that:
- Have passing CI (check via GitHub MCP)
- Are approved by reviewers
- Have no merge conflicts
```
**Why:** Need external data from GitHub API.
## Summary
The work tree manager evolution demonstrates:
1. **Start simple** - Slash command for basic operation
2. **Scale for parallelism** - Sub-agent for batch operations
3. **Manage complexity** - Skill for full workflow orchestration
4. **Integrate externally** - MCP for real-time external data
**The principle:** Each stage solves a real problem. Don't advance until you hit the limitation of your current approach.
> "When you're starting out, I always recommend you just build a prompt. Everything is a prompt in the end."
Build from the foundation upward.