Initial commit
skills/multi-agent-composition/examples/case-studies.md
@@ -0,0 +1,992 @@
|
||||
# Multi-Agent Case Studies
|
||||
|
||||
Real-world examples of multi-agent systems in production, drawn from field experience.
|
||||
|
||||
## Case Study Index
|
||||
|
||||
| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
|
||||
|
||||
---
|
||||
|
||||
## Case Study 1: AI Docs Loader
|
||||
|
||||
**Pattern:** Sub-agent delegation for parallel work
|
||||
|
||||
**Problem:** Loading 10 documentation URLs consumes 30k+ tokens per scrape. A single agent would hit 150k+ tokens.
|
||||
|
||||
**Solution:** Delegate each scrape to an isolated sub-agent
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
Primary Agent (9k tokens)
|
||||
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
|
||||
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
|
||||
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
|
||||
...
|
||||
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
|
||||
|
||||
Total work: 39k tokens
|
||||
Primary agent: Only 9k tokens ✅
|
||||
Context protected: 30k tokens kept out of primary
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```bash
# Single command
/load-ai-docs

# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
#   - Spawn sub-agent
#   - Sub-agent scrapes URL
#   - Sub-agent saves to file
#   - Sub-agent reports completion
# Primary agent never sees scrape content
```
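The delegation loop above can be sketched in Python. This is a minimal illustration only, not the `/load-ai-docs` implementation: threads stand in for sub-agents, and `scrape_in_subagent`, `load_docs`, and the `example.com` URLs are hypothetical names. The point it shows is that the bulky content stays inside the worker and only a one-line report returns to the primary.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def scrape_in_subagent(url: str, out_dir: Path) -> str:
    """Stand-in for one isolated sub-agent: fetch, save, return a short report."""
    # Hypothetical placeholder for the real scrape; the bulky content stays local.
    content = f"# Docs for {url}\n" + "lorem ipsum " * 500
    out_file = out_dir / (url.rstrip("/").split("/")[-1] + ".md")
    out_file.write_text(content)
    # Only this one-line report goes back to the primary agent.
    return f"saved {out_file.name} ({len(content)} chars)"


def load_docs(urls: list[str], out_dir: Path) -> list[str]:
    """Primary agent: fan out one worker per URL and collect only the reports."""
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(lambda u: scrape_in_subagent(u, out_dir), urls))


if __name__ == "__main__":
    urls = [f"https://example.com/docs/page-{i}" for i in range(1, 11)]
    with tempfile.TemporaryDirectory() as tmp:
        for report in load_docs(urls, Path(tmp)):
            print(report)
```

The primary's share of the work grows with the number of reports, not with the size of the scraped pages, which is the property the case study relies on.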
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Sub-agents for isolation** - Each scrape in separate context
|
||||
- **Parallel execution** - All 10 scrapes run simultaneously
|
||||
- **Context delegation** - 30k tokens stay out of primary
|
||||
|
||||
**Results:**
|
||||
|
||||
- **Time:** 10 scrapes run in parallel instead of sequentially (up to 10x faster when scrapes take similar time)
|
||||
- **Context:** Primary agent stays at 9k tokens throughout
|
||||
- **Scalability:** Can handle 50+ URLs without primary context issues
|
||||
|
||||
**Source:** Elite Context Engineering transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 2: SDK Migration
|
||||
|
||||
**Pattern:** Scout-plan-build with multiple perspectives
|
||||
|
||||
**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications
|
||||
|
||||
**Challenge:**
|
||||
|
||||
- 100+ files potentially affected
|
||||
- Agent reading everything = 150k+ tokens
|
||||
- Planning without full context = mistakes
|
||||
|
||||
**Solution:** Three-phase workflow with delegation
|
||||
|
||||
**Phase 1: Scout (Reduce context for planner)**
|
||||
|
||||
```text
|
||||
Orchestrator spawns 4 scout agents (parallel):
|
||||
├→ Scout 1: Gemini Lightning (fast, different perspective)
|
||||
├→ Scout 2: CodeX (specialized for code search)
|
||||
├→ Scout 3: Gemini Flash Preview
|
||||
└→ Scout 4: Haiku (cheap, fast)
|
||||
|
||||
Each scout:
|
||||
- Searches codebase for SDK usage
|
||||
- Identifies exact files and line numbers
|
||||
- Notes patterns (e.g., "system prompt now explicit")
|
||||
|
||||
Output: relevant-files.md (5k tokens)
|
||||
├── File paths
|
||||
├── Line number offsets
|
||||
├── Character ranges
|
||||
└── Relevant code snippets
|
||||
```
|
||||
|
||||
**Why multiple models?** Diverse perspectives catch edge cases a single model might miss.
|
||||
|
||||
**Phase 2: Plan (Focus on relevant subset)**
|
||||
|
||||
```text
|
||||
Planner agent (new instance):
|
||||
├── Reads relevant-files.md (5k tokens)
|
||||
├── Scrapes SDK documentation (8k tokens)
|
||||
├── Analyzes migration patterns
|
||||
└── Creates detailed-plan.md (3k tokens)
|
||||
|
||||
Context used: 16k tokens
|
||||
vs. 150k if reading entire codebase
|
||||
Savings: 89% reduction
|
||||
```
|
||||
|
||||
**Phase 3: Build (Execute plan)**
|
||||
|
||||
```text
|
||||
Builder agent (new instance):
|
||||
├── Reads detailed-plan.md (3k tokens)
|
||||
├── Implements changes across 8 apps
|
||||
├── Updates system prompts
|
||||
├── Tests each application
|
||||
└── Reports completion
|
||||
|
||||
Context used: ~80k tokens
|
||||
Still within safe limits
|
||||
```
|
||||
|
||||
**Final context analysis:**
|
||||
|
||||
```text
|
||||
If single agent:
|
||||
├── Search: 40k tokens
|
||||
├── Read files: 60k tokens
|
||||
├── Plan: 20k tokens
|
||||
├── Implement: 30k tokens
|
||||
└── Total: 150k tokens (75% used)
|
||||
|
||||
With scout-plan-build:
|
||||
├── Primary orchestrator: 10k tokens
|
||||
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
|
||||
├── Planner (new agent): 16k tokens
|
||||
├── Builder (new agent): 80k tokens
|
||||
└── Max per agent: 80k tokens (40% per agent)
|
||||
```
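The percentages in this comparison can be checked with a few lines of arithmetic. The sketch below assumes a 200k-token context window, which is inferred from "150k tokens (75% used)" rather than stated in the source; the variable names are illustrative.

```python
CONTEXT_WINDOW = 200_000  # assumption: inferred from "150k tokens (75% used)"


def pct_of_window(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Share of the context window that a token count occupies, in percent."""
    return 100 * tokens / window


# Figures copied from the comparison above.
single_agent = {"search": 40_000, "read_files": 60_000, "plan": 20_000, "implement": 30_000}
multi_agent = {"orchestrator": 10_000, "scout": 15_000, "planner": 16_000, "builder": 80_000}

single_total = sum(single_agent.values())
peak_agent, peak_tokens = max(multi_agent.items(), key=lambda kv: kv[1])
# Planner context compared with reading the entire codebase.
planner_saving = 100 * (1 - multi_agent["planner"] / single_total)

print(f"single agent: {single_total:,} tokens ({pct_of_window(single_total):.0f}% of window)")
print(f"largest multi-agent context: {peak_agent}, {pct_of_window(peak_tokens):.0f}% of window")
print(f"planner saving vs. full read: {planner_saving:.0f}%")
```

What matters in the multi-agent case is the largest single context, not the sum, because each agent starts with a fresh window.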
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Composable workflows** - Chain /scout, /plan, /build
|
||||
- **Multiple scout models** - Diverse perspectives
|
||||
- **Context offloading** - Scouts protect planner's context
|
||||
- **Fresh agents per phase** - No context accumulation
|
||||
|
||||
**Results:**
|
||||
|
||||
- **8 applications migrated** successfully
|
||||
- **51% context used** in builder phase (safe margins)
|
||||
- **No context explosions** across entire workflow
|
||||
- **Completed in single session** (~30 minutes)
|
||||
|
||||
**Near miss:** "We were 14% away from exploding our context" due to autocompact buffer
|
||||
|
||||
**Lesson:** Disable the autocompact buffer. That 22% matters at scale.
|
||||
|
||||
**Source:** Claude 2.0 transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 3: Codebase Summarization
|
||||
|
||||
**Pattern:** Orchestrator with specialized QA agents
|
||||
|
||||
**Problem:** Summarize large codebase (frontend + backend) with architecture docs
|
||||
|
||||
**Approach:** Divide and conquer with synthesis
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
Orchestrator Agent
|
||||
├→ Creates Frontend QA Agent
|
||||
│ ├─ Summarizes frontend components
|
||||
│ └─ Outputs: frontend-summary.md
|
||||
├→ Creates Backend QA Agent
|
||||
│ ├─ Summarizes backend APIs
|
||||
│ └─ Outputs: backend-summary.md
|
||||
└→ Creates Primary QA Agent
|
||||
├─ Reads both summaries
|
||||
├─ Synthesizes unified view
|
||||
└─ Outputs: codebase-overview.md
|
||||
```
|
||||
|
||||
**Orchestrator behavior:**
|
||||
|
||||
```text
|
||||
1. Parse user request: "Summarize codebase"
|
||||
2. Create 3 agents with specialized tasks
|
||||
3. Command each agent with detailed prompts
|
||||
4. SLEEP (not observing their work)
|
||||
5. Wake every 15s to check status
|
||||
6. Agents complete → Orchestrator wakes
|
||||
7. Collect results (read produced files)
|
||||
8. Summarize for user
|
||||
9. Delete all 3 agents
|
||||
```
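Steps 4 to 7 amount to a poll loop that checks for finished work without reading any of it. A minimal sketch, assuming completion is signalled by the output files appearing on disk; `wait_for_outputs` and the timeout parameter are hypothetical, and only the 15-second interval comes from the steps above.

```python
import time
from pathlib import Path


def wait_for_outputs(expected: list[Path], poll_seconds: float = 15.0,
                     timeout_seconds: float = 600.0) -> list[Path]:
    """Sleep between checks; return once every expected file exists."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        missing = [p for p in expected if not p.exists()]
        if not missing:
            return expected
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still waiting on: {[p.name for p in missing]}")
        # The orchestrator does nothing here, so its context does not grow.
        time.sleep(poll_seconds)


if __name__ == "__main__":
    docs = Path("docs")
    wait_for_outputs([docs / "frontend-summary.md", docs / "backend-summary.md"])
```

Checking only for file existence keeps the orchestrator's cost per poll constant, regardless of how much the agents read or write.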
|
||||
|
||||
**Prompts from orchestrator:**
|
||||
|
||||
```markdown
|
||||
Frontend QA Agent:
|
||||
"Analyze all files in src/frontend/. Create markdown summary with:
|
||||
- Key components and their responsibilities
|
||||
- State management approach
|
||||
- Routing structure
|
||||
- Technology stack
|
||||
Output to docs/frontend-summary.md"
|
||||
|
||||
Backend QA Agent:
|
||||
"Analyze all files in src/backend/. Create markdown summary with:
|
||||
- API endpoints and their purposes
|
||||
- Database schema
|
||||
- Authentication/authorization
|
||||
- External integrations
|
||||
Output to docs/backend-summary.md"
|
||||
|
||||
Primary QA Agent:
|
||||
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
|
||||
- High-level architecture
|
||||
- How components interact
|
||||
- Data flow
|
||||
- Key technologies
|
||||
Output to docs/codebase-overview.md"
|
||||
```
|
||||
|
||||
**Observability interface shows:**
|
||||
|
||||
```text
|
||||
[Agent 1] Frontend QA
|
||||
├── Status: Complete ✅
|
||||
├── Context: 28k tokens used
|
||||
├── Files consumed: 15 files
|
||||
├── Files produced: frontend-summary.md
|
||||
└── Time: 45 seconds
|
||||
|
||||
[Agent 2] Backend QA
|
||||
├── Status: Complete ✅
|
||||
├── Context: 32k tokens used
|
||||
├── Files consumed: 12 files
|
||||
├── Files produced: backend-summary.md
|
||||
└── Time: 52 seconds
|
||||
|
||||
[Agent 3] Primary QA
|
||||
├── Status: Complete ✅
|
||||
├── Context: 18k tokens used
|
||||
├── Files consumed: 2 files (summaries)
|
||||
├── Files produced: codebase-overview.md
|
||||
└── Time: 30 seconds
|
||||
|
||||
Orchestrator:
|
||||
├── Context: 12k tokens (commands only, not observing work)
|
||||
├── Total time: 52 seconds (parallel execution)
|
||||
└── All agents deleted after completion
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Parallel frontend/backend** - 2x speedup
|
||||
- **Orchestrator sleeps** - Protects its context
|
||||
- **Synthesis agent** - Combines perspectives
|
||||
- **Deletable agents** - Freed after use
|
||||
|
||||
**Results:**
|
||||
|
||||
- **3 comprehensive docs** created
|
||||
- **Max context per agent:** 32k tokens (16%)
|
||||
- **Orchestrator context:** 12k tokens (6%)
|
||||
- **Time:** 52 seconds for the parallel frontend/backend phase, plus 30 seconds for synthesis (vs. 2+ minutes fully sequential)
|
||||
|
||||
**Source:** One Agent to Rule Them All transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 4: UI Component Creation
|
||||
|
||||
**Pattern:** Scout-builder two-stage
|
||||
|
||||
**Problem:** Create gray pills for app header information display
|
||||
|
||||
**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.
|
||||
|
||||
**Solution:** Scout locates, builder implements
|
||||
|
||||
**Phase 1: Scout**
|
||||
|
||||
```text
|
||||
Scout Agent:
|
||||
├── Task: "Find header UI component files"
|
||||
├── Searches for: header, display, pills, info components
|
||||
├── Identifies patterns: existing pill styles, color conventions
|
||||
├── Locates exact files:
|
||||
│ ├── src/components/AppHeader.vue
|
||||
│ ├── src/styles/pills.css
|
||||
│ └── src/utils/formatters.ts
|
||||
└── Outputs: scout-header-report.md with:
|
||||
├── File locations
|
||||
├── Line numbers for modifications
|
||||
├── Existing patterns to follow
|
||||
└── Recommended approach
|
||||
```
|
||||
|
||||
**Phase 2: Builder**
|
||||
|
||||
```text
|
||||
Builder Agent:
|
||||
├── Reads scout-header-report.md
|
||||
├── Follows identified patterns
|
||||
├── Creates gray pill components
|
||||
├── Applies consistent styling
|
||||
├── Outputs modified files with exact changes
|
||||
└── Context: Only 30k tokens (vs. 80k+ without scout)
|
||||
```
|
||||
|
||||
**Orchestrator involvement:**
|
||||
|
||||
```text
|
||||
1. User prompts: "Create gray pills for header"
|
||||
2. Orchestrator creates Scout
|
||||
3. Orchestrator SLEEPS (checks every 15s)
|
||||
4. Scout completes → Orchestrator wakes
|
||||
5. Orchestrator reads scout output
|
||||
6. Orchestrator creates Builder with detailed instructions
|
||||
7. Orchestrator SLEEPS again
|
||||
8. Builder completes → Orchestrator wakes
|
||||
9. Orchestrator reports results
|
||||
10. Orchestrator deletes both agents
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Scout reduces uncertainty** - Builder knows exactly where to work
|
||||
- **Pattern following** - Scout identifies conventions
|
||||
- **Orchestrator sleep** - Two phases, minimal orchestrator context
|
||||
- **Precise targeting** - No wasted reads
|
||||
|
||||
**Results:**
|
||||
|
||||
- **Scout:** 15k tokens, 20 seconds
|
||||
- **Builder:** 30k tokens, 35 seconds
|
||||
- **Orchestrator:** 8k tokens final
|
||||
- **Total time:** 55 seconds
|
||||
- **Feature shipped** correctly on first try
|
||||
|
||||
**Source:** One Agent to Rule Them All transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
|
||||
|
||||
**Pattern:** Structured lifecycle with quality gates
|
||||
|
||||
**Problem:** Ensure all changes go through proper review before shipping
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
Task Board Columns:
|
||||
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
|
||||
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
|
||||
└─────────┘ └─────────┘ └─────────┘ └─────────┘
|
||||
```
|
||||
|
||||
**Example task: "Update HTML titles"**
|
||||
|
||||
**Column 1: PLAN**
|
||||
|
||||
```text
|
||||
Planner Agent:
|
||||
├── Analyzes requirement
|
||||
├── Identifies affected files:
|
||||
│ ├── index.html
|
||||
│ └── src/App.tsx (has <title> in render)
|
||||
├── Creates implementation plan:
|
||||
│ 1. Update index.html <title>
|
||||
│ 2. Update App.tsx header component
|
||||
│ 3. Test both pages load correctly
|
||||
└── Moves task to BUILD column
|
||||
```
|
||||
|
||||
**Column 2: BUILD**
|
||||
|
||||
```text
|
||||
Builder Agent:
|
||||
├── Reads plan from PLAN column
|
||||
├── Implements changes:
|
||||
│ ├── index.html: "Plan Build Review Ship"
|
||||
│ └── App.tsx: header="Plan Build Review Ship"
|
||||
├── Runs tests: All passing ✅
|
||||
└── Moves task to REVIEW column
|
||||
```
|
||||
|
||||
**Column 3: REVIEW**
|
||||
|
||||
```text
|
||||
Reviewer Agent:
|
||||
├── Reads plan and implementation
|
||||
├── Checks:
|
||||
│ ├── Plan followed? ✅
|
||||
│ ├── Tests passing? ✅
|
||||
│ ├── Code quality? ✅
|
||||
│ └── No security issues? ✅
|
||||
├── Approves changes
|
||||
└── Moves task to SHIP column
|
||||
```
|
||||
|
||||
**Column 4: SHIP**
|
||||
|
||||
```text
|
||||
Shipper Agent:
|
||||
├── Creates git commit
|
||||
├── Pushes to remote
|
||||
├── Updates deployment
|
||||
└── Marks task complete
|
||||
```
|
||||
|
||||
**Orchestrator's role:**
|
||||
|
||||
```text
|
||||
- NOT micromanaging each step
|
||||
- Responding to user commands like "Move task to next phase"
|
||||
- Tracking task state in database
|
||||
- Providing UI showing current phase
|
||||
- Can intervene if phase fails (e.g., tests fail in BUILD)
|
||||
```
|
||||
|
||||
**UI representation:**
|
||||
|
||||
```text
|
||||
Task: Update Titles
|
||||
├── Status: REVIEW
|
||||
├── Assigned: reviewer-agent-003
|
||||
├── History:
|
||||
│ ├── PLAN: planner-001 (completed 2m ago)
|
||||
│ ├── BUILD: builder-002 (completed 1m ago)
|
||||
│ └── REVIEW: reviewer-003 (in progress)
|
||||
└── Files modified: 2
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Clear phases** - No ambiguity about current state
|
||||
- **Quality gates** - Can't skip to SHIP without REVIEW
|
||||
- **Agent specialization** - Each agent expert in its phase
|
||||
- **Failure isolation** - If BUILD fails, PLAN preserved
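The quality-gate idea can be sketched as a small state machine. This is an illustration under assumptions, not the task board's actual code: the `Task` class, its `advance` method, and the `DONE` state are hypothetical, while the four column names come from the board above.

```python
from dataclasses import dataclass, field

PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]


@dataclass
class Task:
    title: str
    phase: str = "PLAN"
    history: list = field(default_factory=list)  # (phase, agent) pairs

    def advance(self, agent: str, passed: bool) -> str:
        """Move one column to the right, but only if the current gate passed."""
        if self.phase == "DONE":
            raise ValueError("task is already complete")
        if not passed:
            # Failure isolation: the task stays put and earlier work is kept.
            raise ValueError(f"{self.phase} gate failed; task stays in {self.phase}")
        self.history.append((self.phase, agent))
        next_index = PHASES.index(self.phase) + 1
        self.phase = PHASES[next_index] if next_index < len(PHASES) else "DONE"
        return self.phase


if __name__ == "__main__":
    task = Task("Update HTML titles")
    task.advance("planner-001", passed=True)
    task.advance("builder-002", passed=True)
    print(task.phase, task.history)
```

Because `advance` only ever moves one column at a time, there is no code path from BUILD to SHIP that skips REVIEW, and the `history` list doubles as the audit trail.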
|
||||
|
||||
**Results:**
|
||||
|
||||
- **No untested code shipped** (REVIEW gate catches issues)
|
||||
- **Clear audit trail** (who did what in which phase)
|
||||
- **Parallel tasks** (multiple agents in different columns)
|
||||
- **Single interface** (user sees all tasks across all phases)
|
||||
|
||||
**Source:** Custom Agents transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 6: Meta-Agent System
|
||||
|
||||
**Pattern:** Agents building agents
|
||||
|
||||
**Problem:** Need new specialized agent but don't want to hand-write configuration
|
||||
|
||||
**Solution:** Meta-agent that builds other agents
|
||||
|
||||
**Meta-agent prompt:**
|
||||
|
||||
```markdown
|
||||
# meta-agent.md
|
||||
|
||||
You are a meta-agent that builds new sub-agents from user descriptions.
|
||||
|
||||
When user says "build a new sub-agent":
|
||||
1. Ask what the agent should do
|
||||
2. Fetch Claude Code sub-agent documentation
|
||||
3. Design system prompt for new agent
|
||||
4. Create agent configuration file
|
||||
5. Test agent with sample prompts
|
||||
6. Report usage examples
|
||||
|
||||
Output: .claude/agents/<agent-name>.md with complete configuration
|
||||
```
|
||||
|
||||
**Example: Building TTS summary agent**
|
||||
|
||||
**User:** "Build agent that summarizes what my code does using text-to-speech"
|
||||
|
||||
**Meta-agent process:**
|
||||
|
||||
```text
|
||||
Step 1: Understand requirements
|
||||
├── Parse: "summarize code" + "text-to-speech"
|
||||
├── Infer: Needs code reading + TTS API access
|
||||
└── Clarify: Voice provider? (user chooses 11Labs)
|
||||
|
||||
Step 2: Fetch documentation
|
||||
├── Reads Claude Code sub-agent docs
|
||||
├── Reads 11Labs API docs
|
||||
└── Understands agent configuration format
|
||||
|
||||
Step 3: Design system prompt
|
||||
├── Purpose: Concise code summaries via voice
|
||||
├── Tools needed: read files, 11Labs TTS
|
||||
├── Response format: Audio file output
|
||||
└── Trigger: "use TTS summary"
|
||||
|
||||
Step 4: Create configuration
|
||||
Writes .claude/agents/tts-summary.md:
|
||||
---
|
||||
name: tts-summary
|
||||
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
|
||||
---
|
||||
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
|
||||
[... full system prompt ...]
|
||||
|
||||
Step 5: Test
|
||||
├── Runs test prompt: "TTS summary for hooks.py"
|
||||
├── Agent reads file, generates summary
|
||||
├── Outputs audio with summary
|
||||
└── Validates: Works correctly ✅
|
||||
|
||||
Step 6: Report
|
||||
├── Explains how to use new agent
|
||||
├── Shows example prompts
|
||||
└── Notes: Can adjust voice, length, etc.
|
||||
```
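Step 4 boils down to writing a Markdown file with a short front-matter header. A minimal sketch of that step, assuming only the `name` and `description` fields shown above; `render_agent_file` and `write_agent_file` are hypothetical helpers, not part of any real tool.

```python
from pathlib import Path


def render_agent_file(name: str, description: str, system_prompt: str) -> str:
    """Build the front matter plus system prompt text shown in Step 4."""
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n"
        f"{system_prompt.strip()}\n"
    )


def write_agent_file(root: Path, name: str, description: str, system_prompt: str) -> Path:
    """Write the configuration to <root>/.claude/agents/<name>.md."""
    path = root / ".claude" / "agents" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_agent_file(name, description, system_prompt))
    return path


if __name__ == "__main__":
    print(render_agent_file(
        "tts-summary",
        "Concisely summarizes code with text-to-speech",
        "Purpose: Review user's code and provide 1-sentence summary via 11Labs voice",
    ))
```

Keeping rendering separate from writing makes the generated configuration easy to inspect before it lands on disk, which helps with the debugging difficulty noted under Challenges.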
|
||||
|
||||
**Result:** Fully functional TTS summary agent created from natural language description
|
||||
|
||||
**Recursion depth:**
|
||||
|
||||
```text
|
||||
Level 0: Human user
|
||||
└→ Level 1: Meta-agent (builds agents)
|
||||
└→ Level 2: TTS summary agent (built by meta-agent)
|
||||
└→ Level 3: Sub-agents (if TTS agent spawns any)
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Documentation fetching** - Meta-agent reads official docs
|
||||
- **Template following** - Follows agent configuration patterns
|
||||
- **Validation loop** - Tests before declaring success
|
||||
- **Recursive creation** - Agents can build agents
|
||||
|
||||
**Challenges:**
|
||||
|
||||
- **Dependency coupling** - New agent depends on meta-agent's understanding
|
||||
- **Debugging difficulty** - If generated agent fails, hard to trace
|
||||
- **Version drift** - Meta-agent's docs knowledge may become outdated
|
||||
|
||||
**Results:**
|
||||
|
||||
- **New agent in ~2 minutes** vs. 15+ minutes manually
|
||||
- **Follows best practices** automatically
|
||||
- **Tested before delivery**
|
||||
- **Documented usage**
|
||||
|
||||
**Source:** Sub-Agents transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 7: Observability Dashboard
|
||||
|
||||
**Pattern:** Real-time multi-agent monitoring
|
||||
|
||||
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
|
||||
|
||||
**Solution:** Centralized observability system
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
┌──────────────────── Multiple Agents ────────────────────┐
|
||||
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
|
||||
│ ↓ ↓ ↓ ↓ ↓ │
|
||||
│ pre/post-tool-use hooks │
|
||||
│ ↓ │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────── Bun Server ─────────────────────────┐
|
||||
│ POST /events endpoint │
|
||||
│ ├→ Store in SQLite (persistence) │
|
||||
│ └→ Broadcast via WebSocket (real-time) │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────── Web Client ─────────────────────────┐
|
||||
│ ┌─────────────────────────────────────────────────┐ │
|
||||
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
|
||||
│ ├─────────────────────────────────────────────────┤ │
|
||||
│ │ Event Stream (filtered by app/session/type) │ │
|
||||
│ ├─────────────────────────────────────────────────┤ │
|
||||
│ │ Event Details (with AI-generated summaries) │ │
|
||||
│ └─────────────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
```
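The server's two responsibilities, persisting each event and then broadcasting it, can be sketched in a few lines. The original runs on Bun with WebSockets; the sketch below is a Python stand-in that uses plain callbacks in place of WebSocket connections, and the `EventHub` class and its table schema are assumptions for illustration.

```python
import json
import sqlite3
import time
from typing import Callable


class EventHub:
    """Stores each event in SQLite, then pushes it to live subscribers."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, app TEXT, session_id TEXT, "
            "event_type TEXT, payload TEXT, ts REAL)"
        )
        # Callbacks stand in for open WebSocket connections.
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def post_event(self, app: str, session_id: str, event_type: str, payload: dict) -> int:
        """Rough equivalent of POST /events: persist first, then broadcast."""
        ts = time.time()
        cursor = self.db.execute(
            "INSERT INTO events (app, session_id, event_type, payload, ts) "
            "VALUES (?, ?, ?, ?, ?)",
            (app, session_id, event_type, json.dumps(payload), ts),
        )
        self.db.commit()
        event = {"id": cursor.lastrowid, "app": app, "session_id": session_id,
                 "event_type": event_type, "payload": payload, "ts": ts}
        for callback in self.subscribers:
            callback(event)
        return event["id"]


if __name__ == "__main__":
    hub = EventHub()
    hub.subscribe(lambda e: print("live:", e["app"], e["event_type"]))
    hub.post_event("my-codebase", "session-a", "post-tool-use", {"tool": "Write"})
```

Persisting before broadcasting means a client that connects late can still replay history from SQLite, while connected clients see the event immediately.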
|
||||
|
||||
**Hook implementation:**
|
||||
|
||||
```python
# .claude/hooks/post-tool-use.py
import json
import subprocess
import sys


def main():
    # Read the tool-use event as JSON from stdin.
    event = json.load(sys.stdin)

    # Send to observability server (with AI summary)
    subprocess.run(
        [
            "uv", "run",
            ".claude/hooks/utils/send-event.py",
            "my-codebase",    # App name
            "post-tool-use",  # Event type
            "--summarize",    # Generate Haiku summary
        ],
        input=json.dumps(event),
        text=True,
    )


if __name__ == "__main__":
    main()
```
|
||||
|
||||
**What user sees:**
|
||||
|
||||
```text
|
||||
┌─────────────── Live Activity Pulse ───────────────┐
|
||||
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
|
||||
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
|
||||
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
|
||||
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
|
||||
└────────────────────────────────────────────────────┘
|
||||
|
||||
┌─────────────── Event Stream ──────────────────────┐
|
||||
│ [Agent A] post-tool-use │
|
||||
│ Summary: "Wrote authentication logic to user.py"│
|
||||
│ Time: 2s ago │
|
||||
├────────────────────────────────────────────────────┤
|
||||
│ [Agent B] sub-agent-stop │
|
||||
│ Summary: "Completed documentation scrape" │
|
||||
│ Time: 5s ago │
|
||||
├────────────────────────────────────────────────────┤
|
||||
│ [Agent C] notification │
|
||||
│ Summary: "Needs approval for rm command" │
|
||||
│ Time: 8s ago │
|
||||
└────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Filtering:**
|
||||
|
||||
```text
|
||||
Filters available:
|
||||
├── By app (codebase-1, codebase-2, etc.)
|
||||
├── By agent session ID
|
||||
├── By event type (pre-tool, post-tool, stop, etc.)
|
||||
└── By time window (1min, 3min, 5min)
|
||||
```
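The four filters combine as a simple conjunction: an event is shown only if it passes every filter that is set. A minimal sketch over in-memory events; the `filter_events` function and the event field names (`app`, `session_id`, `event_type`, `ts`) are assumptions for illustration.

```python
import time
from typing import Optional


def filter_events(events: list[dict], app: Optional[str] = None,
                  session_id: Optional[str] = None, event_type: Optional[str] = None,
                  window_seconds: Optional[float] = None,
                  now: Optional[float] = None) -> list[dict]:
    """Keep events matching every filter that is set; None means no filter."""
    now = time.time() if now is None else now
    wanted = {"app": app, "session_id": session_id, "event_type": event_type}
    kept = []
    for event in events:
        # Drop the event if any active field filter does not match.
        if any(v is not None and event[k] != v for k, v in wanted.items()):
            continue
        # Drop the event if it is older than the selected time window.
        if window_seconds is not None and now - event["ts"] > window_seconds:
            continue
        kept.append(event)
    return kept


if __name__ == "__main__":
    sample = [{"app": "codebase-1", "session_id": "a",
               "event_type": "post-tool", "ts": time.time()}]
    # 3-minute window, one app, one event type.
    print(filter_events(sample, app="codebase-1", event_type="post-tool", window_seconds=180))
```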
|
||||
|
||||
**Event summarization:**
|
||||
|
||||
```text
|
||||
# Each event summarized by Haiku ($0.0002 per event)
|
||||
Event: post-tool-use for Write tool
|
||||
Input: {file: "auth.py", content: "...500 lines..."}
|
||||
Output: Success
|
||||
|
||||
Summary generated:
|
||||
"Implemented JWT authentication with refresh tokens in auth.py"
|
||||
|
||||
Cost: $0.0002
|
||||
Human value: Instant understanding without reading 500 lines
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **One-way data stream** - Simple, fast, scalable
|
||||
- **Edge summarization** - AI summaries generated at hook time
|
||||
- **Dual storage** - SQLite (history) + WebSocket (real-time)
|
||||
- **Color coding** - Consistent colors per agent session
|
||||
|
||||
**Results:**
|
||||
|
||||
- **5-10 agents monitored** simultaneously
|
||||
- **Thousands of events logged** (about $0.20 per 1,000 events at $0.0002 each)
|
||||
- **Real-time visibility** into all agent work
|
||||
- **Historical analysis** via SQLite queries
|
||||
|
||||
**Business value:**
|
||||
|
||||
- **Catch errors fast** (notification events = agent blocked)
|
||||
- **Optimize workflows** (which tools used most?)
|
||||
- **Debug issues** (what happened before failure?)
|
||||
- **Scale confidence** (can observe 10+ agents easily)
|
||||
|
||||
**Source:** Multi-Agent Observability transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 8: AFK Agent Device
|
||||
|
||||
**Pattern:** Autonomous background work while you're away
|
||||
|
||||
**Problem:** Long-running tasks block your terminal. You want to work on something else.
|
||||
|
||||
**Solution:** Dedicated device running agent fleet
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
Your Device (interactive):
|
||||
├── Claude Code session
|
||||
├── Send job to agent device
|
||||
└── Monitor status updates
|
||||
|
||||
Agent Device (autonomous):
|
||||
├── Picks up job from queue
|
||||
├── Executes: Plan → Build → Ship
|
||||
├── Reports status every 60s
|
||||
└── Ships results to git
|
||||
```
|
||||
|
||||
**Workflow:**
|
||||
|
||||
```bash
# From your device
/afk-agents \
  --prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
  --adw "plan-build-ship" \
  --docs "https://openai-agent-sdk.com/docs"

# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
|
||||
|
||||
**Agent device execution:**
|
||||
|
||||
```text
|
||||
[00:00] Job received: Build 3 SDK agents
|
||||
[00:05] Planner agent created
|
||||
[00:45] Plan complete: 3 agents specified
|
||||
[01:00] Builder agent 1 created (basic agent)
|
||||
[02:30] Builder agent 1 complete: basic-agent.py ✅
|
||||
[02:35] Builder agent 2 created (with tools)
|
||||
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
|
||||
[04:20] Builder agent 3 created (realtime voice)
|
||||
[07:45] Builder agent 3 partial: needs audio libraries
|
||||
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
|
||||
[08:05] Shipper agent created
|
||||
[08:20] Git commit created
|
||||
[08:25] Pushed to remote
|
||||
[08:30] Job complete ✅
|
||||
```
|
||||
|
||||
**Status updates (every 60s):**
|
||||
|
||||
```text
|
||||
Your device shows:
|
||||
|
||||
[60s] Status: Planning agents...
|
||||
[120s] Status: Building agent 1 of 3...
|
||||
[180s] Status: Building agent 2 of 3...
|
||||
[240s] Status: Building agent 3 of 3...
|
||||
[300s] Status: Testing agents...
|
||||
[360s] Status: Shipping to git...
|
||||
[420s] Status: Complete ✅
|
||||
|
||||
Click to view: results/sdk-agents-20250105/
|
||||
```
|
||||
|
||||
**What you do:**
|
||||
|
||||
```text
|
||||
1. Send job (10 seconds)
|
||||
2. Go AFK (work on something else)
|
||||
3. Get notified when complete (7 minutes later)
|
||||
4. Review results
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Job queue** - Agents pick up work from queue
|
||||
- **Async status** - Reports back periodically
|
||||
- **Autonomous execution** - No human in the loop
|
||||
- **Git integration** - Results automatically committed
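The job-queue and async-status techniques can be sketched with two queues and a worker thread. This is an in-process illustration only: a real setup would put the queue on a separate machine, and the sketch reports after each phase rather than every 60 seconds. The `agent_device` function and the job dictionary shape are hypothetical; the `plan-build-ship` string mirrors the `--adw` value in the workflow above.

```python
import queue
import threading


def agent_device(jobs: queue.Queue, status: queue.Queue) -> None:
    """Worker loop: take a job, run its phases in order, report after each one."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the device down
            jobs.task_done()
            return
        for phase in job["adw"].split("-"):
            # A real device would run an agent here; this sketch only reports.
            status.put(f"{job['id']}: {phase} done")
        status.put(f"{job['id']}: complete")
        jobs.task_done()


if __name__ == "__main__":
    jobs, status = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=agent_device, args=(jobs, status), daemon=True)
    worker.start()

    # Sending the job returns immediately; the sender is free to do other work.
    jobs.put({"id": "job-1", "adw": "plan-build-ship"})
    jobs.put(None)
    jobs.join()

    while not status.empty():
        print(status.get())
```

The sender and the device share nothing but the two queues, so the sender never blocks on agent work and only reads short status strings.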
|
||||
|
||||
**Results:**
|
||||
|
||||
- **3 SDK agents built** in 7 minutes
|
||||
- **You worked on other things** during that time
|
||||
- **Autonomous end-to-end** - plan + build + test + ship
|
||||
- **Code review** - Quick glance confirms quality
|
||||
|
||||
**Infrastructure required:**
|
||||
|
||||
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
|
||||
- Agent queue system
|
||||
- Job scheduler
|
||||
- Status reporting
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Long-running builds
|
||||
- Overnight work
|
||||
- Prototyping experiments
|
||||
- Documentation generation
|
||||
- Codebase refactors
|
||||
|
||||
**Source:** Claude 2.0 transcript
|
||||
|
||||
---
|
||||
|
||||
## Cross-Cutting Patterns
|
||||
|
||||
### Pattern: Context Window as Resource Constraint
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 1: Sub-agent delegation protects primary
|
||||
- Case 2: Scout-plan-build reduces planner context
|
||||
- Case 3: Orchestrator sleeps to protect its context
|
||||
- Case 8: Fresh agents for each phase (no accumulation)
|
||||
|
||||
**Lesson:** Context is precious. Protect it aggressively.
|
||||
|
||||
### Pattern: Specialized Agents Over General
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
|
||||
- Case 4: Scout finds, builder builds (not one agent doing both)
|
||||
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
|
||||
- Case 6: Meta-agent only builds, doesn't execute
|
||||
|
||||
**Lesson:** "A focused agent is a performant agent."
|
||||
|
||||
### Pattern: Observability Enables Scale
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: Orchestrator tracks agent status
|
||||
- Case 5: Task board shows current phase
|
||||
- Case 7: Real-time dashboard for all agents
|
||||
- Case 8: Status updates every 60s
|
||||
|
||||
**Lesson:** "If you can't measure it, you can't scale it."
|
||||
|
||||
### Pattern: Deletable Temporary Resources
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: All 3 agents deleted after completion
|
||||
- Case 4: Scout and builder deleted
|
||||
- Case 5: Each phase agent deleted after task moves
|
||||
- Case 8: Builder agents deleted after shipping
|
||||
|
||||
**Lesson:** "The best agent is a deleted agent."
|
||||
|
||||
## Performance Comparisons
|
||||
|
||||
### Single Agent vs. Multi-Agent
|
||||
|
||||
| Task | Single Agent | Multi-Agent | Improvement |
|------|--------------|-------------|-------------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
|
||||
|
||||
### With vs. Without Orchestration
|
||||
|
||||
| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
|
||||
|
||||
## Common Failure Modes
|
||||
|
||||
### Failure: Context Explosion
|
||||
|
||||
**Scenario:** Case 2 without scouts
|
||||
|
||||
- Single agent reads 100+ files
|
||||
- Context hits 180k tokens
|
||||
- Agent slows down, makes mistakes
|
||||
- Eventually fails or times out
|
||||
|
||||
**Fix:** Add scout phase to filter files first
|
||||
|
||||
### Failure: Orchestrator Watching Everything
|
||||
|
||||
**Scenario:** Case 3 with observing orchestrator
|
||||
|
||||
- Orchestrator watches all agent work
|
||||
- Orchestrator context grows to 100k+
|
||||
- Can't coordinate more than 2-3 agents
|
||||
- System doesn't scale
|
||||
|
||||
**Fix:** Implement orchestrator sleep pattern
|
||||
|
||||
### Failure: No Observability
|
||||
|
||||
**Scenario:** Case 7 without dashboard
|
||||
|
||||
- 5 agents running
|
||||
- One agent stuck on permission request
|
||||
- No way to know which agent needs attention
|
||||
- Entire workflow blocked
|
||||
|
||||
**Fix:** Add hooks + observability system
|
||||
|
||||
### Failure: Agent Accumulation
|
||||
|
||||
**Scenario:** Case 5 not deleting agents
|
||||
|
||||
- 20 tasks completed
|
||||
- 80 agents still running (4 per task)
|
||||
- System resources exhausted
|
||||
- New agents can't start
|
||||
|
||||
**Fix:** Delete agents after task completion
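One way to make deletion hard to forget is to tie an agent's lifetime to a scope. A minimal sketch, assuming agents are tracked in a simple registry; `AgentPool` and `temporary_agent` are hypothetical names, not part of any real agent framework.

```python
from contextlib import contextmanager


class AgentPool:
    """Tracks which agents currently exist."""

    def __init__(self) -> None:
        self.active: dict = {}

    def create(self, name: str) -> dict:
        self.active[name] = {"name": name}
        return self.active[name]

    def delete(self, name: str) -> None:
        self.active.pop(name, None)


@contextmanager
def temporary_agent(pool: AgentPool, name: str):
    """Create an agent for one phase and delete it afterwards, even on failure."""
    agent = pool.create(name)
    try:
        yield agent
    finally:
        pool.delete(name)


if __name__ == "__main__":
    pool = AgentPool()
    with temporary_agent(pool, "builder-002") as agent:
        print("working with", agent["name"], "active:", len(pool.active))
    print("after task, active:", len(pool.active))
```

With this shape, 20 completed tasks leave zero agents behind, because cleanup runs whether the phase succeeds or raises.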
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Parallelization = Sub-agents** - In every case above, parallel work runs in separate, isolated agents
|
||||
|
||||
2. **Context protection = Specialization** - Focused agents use less context
|
||||
|
||||
3. **Orchestration = Scale** - Single interface manages fleet
|
||||
|
||||
4. **Observability = Confidence** - Can't scale what you can't see
|
||||
|
||||
5. **Deletable = Sustainable** - Free resources for next task
|
||||
|
||||
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
|
||||
|
||||
## When to Use Multi-Agent Patterns
|
||||
|
||||
Use multi-agent when:
|
||||
|
||||
- ✅ Task naturally divides into parallel subtasks
|
||||
- ✅ Single agent context approaching limits
|
||||
- ✅ Need quality gates between phases
|
||||
- ✅ Want to work on other things while agents execute
|
||||
- ✅ Have observability infrastructure
|
||||
|
||||
Don't use multi-agent when:
|
||||
|
||||
- ❌ Simple one-off task
|
||||
- ❌ Learning/prototyping phase
|
||||
- ❌ No way to monitor agents
|
||||
- ❌ Task requires tight human-in-loop feedback
|
||||
|
||||
## Source Attribution
|
||||
|
||||
All case studies drawn from field experience documented in 8 source transcripts:
|
||||
|
||||
1. Elite Context Engineering - Case 1 (AI docs loader)
|
||||
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
|
||||
3. Custom Agents - Case 5 (task board)
|
||||
4. Sub-Agents - Case 6 (meta-agent)
|
||||
5. Multi-Agent Observability - Case 7 (dashboard)
|
||||
6. Hooked - Supporting patterns
|
||||
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
|
||||
8. (Name not specified)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
|
||||
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
|
||||
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
|
||||
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
|
||||
|
||||
---
|
||||
|
||||
**Remember:** These are real systems in production. Start simple, add complexity only when needed.
|
||||