Initial commit

# Context in Composition

**Strategic framework for managing context when composing multi-agent systems.**

## The Core Problem

The context window is your most precious resource when composing multiple agents. A focused agent is a performant agent.

**The Reality:**

```text
Single agent doing everything:
├── Context explodes to 150k+ tokens
├── Performance degrades
└── Eventually fails or times out

Multi-agent composition:
├── Each agent: <40k tokens
├── Main agent: Stays lean
└── Work completes successfully
```

## The R&D Framework

There are only two strategies for managing context in multi-agent systems:

**R - Reduce**

- Minimize what enters context windows
- Remove unused MCP servers (can consume 24k+ tokens)
- Shrink static CLAUDE.md files
- Use context priming instead of static loading

**D - Delegate**

- Move work to sub-agents' isolated contexts
- Use background agents for autonomous work
- Employ orchestrator sleep patterns
- Treat agents as deletable temporary resources

**Everything else is a tactic implementing R or D.**

## The Four Levels of Context Mastery

### Level 1: Beginner - Stop Wasting Tokens

**Focus:** Resource management

**Key Actions:**

- Remove unused MCP servers (reclaim 20k+ tokens)
- Minimize CLAUDE.md (<1k tokens)
- Disable autocompact buffer (reclaim 20%)

**Success Metric:** 85-90% context window free at startup

**Move to Level 2 when:** Resources are cleaned up, but you are still rebuilding context for different tasks

---

### Level 2: Intermediate - Load Selectively

**Focus:** Dynamic context loading

**Key Actions:**

- Context priming (`/prime` commands vs. static files)
- Sub-agent delegation for parallel work
- Composable workflows (scout-plan-build)

**Success Metric:** 60-75% context window free during work

**Move to Level 3 when:** You are managing multiple agents but struggling with handoffs

---

### Level 3: Advanced - Multi-Agent Handoff

**Focus:** Agent-to-agent context transfer

**Key Actions:**

- Context bundles (transfer 60-70% of the context in ~10% of the tokens)
- Monitor context limits proactively
- Chain multiple agents without overflow

**Success Metric:** Per-agent context <60k tokens, successful handoffs

**Move to Level 4 when:** You need agents working autonomously while you do other work

---

### Level 4: Agentic - Out-of-Loop Systems

**Focus:** Fleet orchestration

**Key Actions:**

- Background agents (`/background` command)
- Dedicated agent environments
- Orchestrator sleep patterns
- Zero-touch execution

**Success Metric:** Agents ship work end-to-end without intervention

---

## When Context Becomes a Composition Issue

**Trigger 1: Single Agent Exceeds 150k Tokens**
→ Delegate to sub-agents with isolated contexts

**Trigger 2: Agent Reading >20 Files**
→ Use scout agents to identify the relevant subset first

**Trigger 3: `/context` Shows >80% Used**
→ Start a fresh agent and use context bundles for the handoff

**Trigger 4: Performance Degrading Mid-Workflow**
→ Split the workflow across multiple focused agents

**Trigger 5: Same Analysis Repeated Multiple Times**
→ Context overflow is forcing re-reads; delegate earlier
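
The five triggers reduce to a handful of threshold checks. The sketch below is a minimal illustration and not a Claude Code feature. `ContextSnapshot` and its fields are hypothetical measurements you would collect yourself, for example from `/context` and a count of the files the agent has opened. The thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class ContextSnapshot:
    """Hypothetical per-agent measurements; not a Claude Code data structure."""
    tokens_used: int
    files_read: int
    window: int = 200_000
    performance_degrading: bool = False
    repeated_analysis: bool = False


def composition_triggers(snapshot: ContextSnapshot) -> list[str]:
    """Return the recommended action for every trigger that fires, in order."""
    actions = []
    if snapshot.tokens_used > 150_000:  # Trigger 1
        actions.append("delegate to sub-agents with isolated contexts")
    if snapshot.files_read > 20:  # Trigger 2
        actions.append("run a scout to find the relevant subset first")
    if snapshot.tokens_used / snapshot.window > 0.80:  # Trigger 3
        actions.append("start a fresh agent and hand off with a context bundle")
    if snapshot.performance_degrading:  # Trigger 4
        actions.append("split the workflow across focused agents")
    if snapshot.repeated_analysis:  # Trigger 5
        actions.append("delegate earlier; overflow is forcing re-reads")
    return actions


if __name__ == "__main__":
    for action in composition_triggers(ContextSnapshot(tokens_used=165_000, files_read=12)):
        print(action)
```

At 165k tokens, both the 150k trigger and the 80% trigger fire. That is the point where the work should be split.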

## Composition Patterns by Level

**Beginner:** Single agent, minimal static context

**Intermediate:** Main agent + sub-agents for parallel work

**Advanced:** Agent chains with context bundles for handoff

**Agentic:** Orchestrator + fleet of specialized agents

## Key Principles

1. **Focused agents perform better** - Single purpose, minimal context
2. **Agents are deletable** - Free context by removing completed agents
3. **200k is plenty** - Context explosions are design problems, not capacity problems
4. **Orchestrators must sleep** - Don't observe all sub-agent work
5. **Context bundles over full replay** - ~70% of the context in ~10% of the tokens

## Implementation Details

For practical patterns, see:

- [Multi-Agent Context Isolation](../reference/multi-agent-context-isolation.md) - Parallel execution, context bundling
- [Orchestrator Pattern](orchestrator-pattern.md) - Sleep patterns, fleet management
- [Decision Framework](decision-framework.md) - When to use each component

## Source Attribution

Primary: Elite Context Engineering, Claude 2.0 transcripts

Supporting: One Agent to Rule Them All, Sub-Agents documentation

---

**Remember:** Context is the first pillar of the Core 4. Master context strategy, and you can keep scaling with focused agents.

715
skills/multi-agent-composition/patterns/context-management.md
Normal file

# Context Window Protection

> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."

Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.

## The Core Problem

**Every engineer hits this wall:**

```text
Agent starts: 10k tokens (5% used)
↓
After exploration: 80k tokens (40% used)
↓
After planning: 120k tokens (60% used)
↓
During implementation: 170k tokens (85% used) ⚠️
↓
Context explodes: 195k tokens (98% used) ❌
↓
Agent performance degrades, fails, or times out
```

**The realization:** More context ≠ better performance. Too much context = cognitive overload.

## The R&D Framework

There are only two ways to manage your context window:

```text
R - REDUCE
└─→ Minimize what enters the context window

D - DELEGATE
└─→ Move work to other agents' context windows
```

**Everything else is a tactic implementing R or D.**

## The Four Levels of Context Protection

### Level 1: Beginner - Reduce Waste

**Focus:** Stop wasting tokens on unused resources

#### Tactic 1: Eliminate Default MCP Servers

**Problem:**

```jsonc
// Default mcp.json
{
  "mcpServers": {
    "firecrawl": { /* ... */ }, // 6k tokens
    "github": { /* ... */ },    // 8k tokens
    "postgres": { /* ... */ },  // 5k tokens
    "redis": { /* ... */ }      // 5k tokens
  }
}
// Total: 24k tokens always loaded (12% of a 200k window!)
```

**Solution:**

```bash
# Option 1: Delete the default mcp.json entirely
rm .claude/mcp.json

# Option 2: Load only the servers this session needs
claude --mcp-config specialized-configs/firecrawl-only.json --strict-mcp-config
# Result: 6k tokens instead of 24k (75% reduction)
```
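
Option 2 assumes you keep smaller, task-specific config files on hand. The following is a minimal sketch for generating them from a full config. It assumes the servers live under the `mcpServers` key, as in the example. The per-server token costs are the example's illustrative estimates rather than measurements, and `write_specialized_config` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Illustrative token costs from the example above (not measured values).
ESTIMATED_TOKENS = {"firecrawl": 6_000, "github": 8_000, "postgres": 5_000, "redis": 5_000}


def filter_mcp_config(config: dict, keep: set[str]) -> dict:
    """Return a copy of an mcp.json-style config containing only the `keep` servers."""
    servers = config.get("mcpServers", {})
    missing = keep - set(servers)
    if missing:
        raise KeyError(f"unknown MCP servers: {sorted(missing)}")
    return {"mcpServers": {name: servers[name] for name in servers if name in keep}}


def estimated_overhead(config: dict, window: int = 200_000) -> tuple[int, float]:
    """Sum the illustrative per-server estimates and give their share of the window."""
    tokens = sum(ESTIMATED_TOKENS.get(name, 0) for name in config.get("mcpServers", {}))
    return tokens, tokens / window


def write_specialized_config(src: Path, dest: Path, keep: set[str]) -> dict:
    """Read a full config from `src` and write the trimmed version to `dest`."""
    trimmed = filter_mcp_config(json.loads(src.read_text()), keep)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(json.dumps(trimmed, indent=2) + "\n")
    return trimmed


if __name__ == "__main__":
    full = {"mcpServers": {name: {} for name in ESTIMATED_TOKENS}}
    print(estimated_overhead(full))  # (24000, 0.12)
    print(estimated_overhead(filter_mcp_config(full, {"firecrawl"})))  # (6000, 0.03)
```

You can then load the trimmed file in place of the default config, as in Option 2.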

#### Tactic 2: Minimize CLAUDE.md

**Before:**

```markdown
# CLAUDE.md (23,000 tokens = 11.5% of window)
- 500 lines of API documentation
- 300 lines of deployment procedures
- 1,500 lines of coding standards
- Architecture diagrams
- Always loaded, whether relevant or not
```

**After:**

```markdown
# CLAUDE.md (500 tokens = 0.25% of window)
# Only universal essentials

- Fenced code blocks MUST have language
- Use rg instead of grep
- ALWAYS use set -euo pipefail
```

**Rule:** Only include what you're 100% sure you want loaded 100% of the time.

#### Tactic 3: Disable Autocompact Buffer

**Problem:**

```text
/context

# Output:
autocompact buffer: 22% ⚠️ (44k tokens gone!)
messages: 51%
system_tools: 8%
---
Total available: 78% (should be 100%)
```

**Solution:**

```text
/config
# Set: autocompact = false

# Now:
/context
# Output:
messages: 51%
system_tools: 8%
custom_agents: 2%
---
Total available: 100% ✅ (the 22% buffer is reclaimed)
```

**Impact:** Reclaims 40k+ tokens immediately.

### Level 2: Intermediate - Dynamic Loading

**Focus:** Load what you need, when you need it

#### Tactic 4: Context Priming

**Replace static CLAUDE.md with task-specific `/prime` commands**

```markdown
# .claude/commands/prime.md
# General codebase context (2k tokens)
Read README, understand structure, report findings

# .claude/commands/prime-feature.md
# Feature development context (3k tokens)
Read feature requirements, understand dependencies, plan implementation

# .claude/commands/prime-api.md
# API work context (4k tokens)
Read API docs, understand endpoints, review integration patterns
```

**Usage pattern:**

```text
# Starting feature work
/prime-feature

# vs. having 23k tokens always loaded
```

**Savings:** 20k tokens (87% reduction)
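
Claude Code treats each Markdown file in `.claude/commands/` as a custom slash command named after the file. That means the priming commands above are just files. Below is a minimal scaffolding sketch. The prompt bodies are adapted from the example and are meant to be edited for your project.

```python
from pathlib import Path

# Prompt bodies adapted from the example above; tailor them to your project.
PRIME_COMMANDS = {
    "prime": "Read README, understand structure, report findings.",
    "prime-feature": "Read feature requirements, understand dependencies, plan implementation.",
    "prime-api": "Read API docs, understand endpoints, review integration patterns.",
}


def scaffold_prime_commands(project_root: Path, overwrite: bool = False) -> list[Path]:
    """Create .claude/commands/<name>.md for each priming command.

    Existing files are skipped unless overwrite=True, so hand-edited prompts survive.
    """
    commands_dir = project_root / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, body in PRIME_COMMANDS.items():
        path = commands_dir / f"{name}.md"
        if path.exists() and not overwrite:
            continue
        path.write_text(body + "\n")
        written.append(path)
    return written
```

Run `scaffold_prime_commands(Path("."))` from the repository root to create `/prime`, `/prime-feature`, and `/prime-api`. None of their content enters the context until you actually invoke one.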

#### Tactic 5: Sub-Agent Delegation

**Problem:** A primary agent doing parallel work fills its own context

```text
Primary Agent tries to do:
├── Web scraping (15k tokens)
├── Documentation fetch (12k tokens)
├── Data analysis (10k tokens)
└── Synthesis (5k tokens)
= 42k tokens in one agent
```

**Solution:** Delegate to sub-agents with isolated contexts

```text
Primary Agent (9k tokens):
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
└→ Sub-Agent 3: Analysis (10k tokens, isolated)

Total work: 46k tokens
Primary agent context: Only 9k tokens ✅
```

**Example:**

```text
/load-ai-docs

# Agent spawns 10 sub-agents for web scraping
# Each scrape: ~3k tokens
# Total work: 30k tokens
# Primary agent context: Still only 9k tokens
# Savings: 30k tokens kept out of the primary context
```

**Key insight:** Each sub-agent runs under its own system prompt in a separate context window, so only its final report flows back to the primary agent.
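
You can check the isolation argument with a toy model. This is a simulation, not Claude Code's sub-agent mechanism. Each hypothetical sub-agent keeps its own context list, and the sub-agents run concurrently via `asyncio.gather`. Only a one-line summary is appended to the primary agent's context. Token counts are crude word counts.

```python
import asyncio
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class Agent:
    """Toy agent whose context window is just a list of text chunks."""
    name: str
    context: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(count_tokens(chunk) for chunk in self.context)


async def run_sub_agent(task: str, pages: int) -> str:
    """Do the heavy work in an isolated context and return only a summary."""
    sub = Agent(name=f"sub:{task}")
    for i in range(pages):
        await asyncio.sleep(0)  # Stand-in for a real tool call such as a web scrape
        sub.context.append(f"page {i} of {task} " + "lorem " * 100)
    return f"{task}: {pages} pages processed, {sub.tokens()} tokens stayed in sub-agent"


async def delegate(primary: Agent, tasks: dict[str, int]) -> None:
    """Run all sub-agents concurrently; only their summaries enter the primary context."""
    summaries = await asyncio.gather(
        *(run_sub_agent(task, pages) for task, pages in tasks.items())
    )
    primary.context.extend(summaries)


if __name__ == "__main__":
    primary = Agent(name="primary", context=["system prompt and user request"])
    asyncio.run(delegate(primary, {"scrape": 10, "docs": 8, "analysis": 5}))
    print(primary.tokens(), primary.context[1:])
```

In the demo, the three sub-agents handle over 2,000 word-tokens between them, yet the primary context ends at 32.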

### Level 3: Advanced - Multi-Agent Handoff

**Focus:** Chain agents together without context explosion

#### Tactic 6: Context Bundles

**Problem:** Agent 1's context explodes (180k tokens). You need to hand off to a fresh Agent 2 without replaying the full session.

**Solution:** Save a bundle that captures the essential 60-70% of the context

```markdown
# context-bundle-2025-01-05-<session-id>.md

## Context Bundle
Created: 2025-01-05 14:30
Source Agent: agent-abc123

## Initial Setup
/prime-feature

## Read Operations (deduplicated)
- src/api/endpoints.ts
- src/components/Auth.tsx
- config/env.ts

## Key Findings
- Auth system uses JWT
- API has 15 endpoints
- Config needs migration

## User Prompts (summarized)
1. "Implement OAuth2 flow"
2. "Add refresh token logic"

[Excluded: full write operations, detailed read contents, tool execution details]
```

**Usage:**

```text
# Agent 1: Context exploding at 180k
# Automatic bundle saved

# Agent 2: Fresh start (10k base)
/loadbundle /path/to/context-bundle-<timestamp>.md
# Agent 2 now has 70% of Agent 1's context in ~15k tokens

# Total: 25k tokens vs. 180k (86% reduction)
```
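
A bundle is a filtered, deduplicated digest of a session. Here is a minimal sketch of producing one. It assumes a hypothetical event list of `(kind, payload)` pairs rather than any real transcript format. Setup commands, unique file reads, findings, and user prompts are kept, and everything else is dropped, mirroring the template above. Summarizing the prompts and findings themselves is left to you or to an agent.

```python
from datetime import datetime


def build_context_bundle(events: list[tuple[str, str]], source_agent: str,
                         created: datetime) -> str:
    """Render the essential parts of a session log as a Markdown context bundle.

    `events` is a hypothetical list of (kind, payload) pairs. Kinds other than
    command/read/finding/prompt (writes, tool output, file contents) are dropped.
    """
    commands: list[str] = []
    reads: list[str] = []
    findings: list[str] = []
    prompts: list[str] = []
    for kind, payload in events:
        if kind == "command":
            commands.append(payload)
        elif kind == "read":
            if payload not in reads:  # Deduplicate repeated reads of the same file
                reads.append(payload)
        elif kind == "finding":
            findings.append(payload)
        elif kind == "prompt":
            prompts.append(payload)
        # Anything else is intentionally excluded from the bundle.

    lines = [
        "## Context Bundle",
        f"Created: {created:%Y-%m-%d %H:%M}",
        f"Source Agent: {source_agent}",
        "",
        "## Initial Setup",
        *commands,
        "",
        "## Read Operations (deduplicated)",
        *(f"- {path}" for path in reads),
        "",
        "## Key Findings",
        *(f"- {item}" for item in findings),
        "",
        "## User Prompts (summarized)",
        *(f'{n}. "{text}"' for n, text in enumerate(prompts, start=1)),
    ]
    return "\n".join(lines) + "\n"
```

Write the result to a `context-bundle-<timestamp>.md` file so the next agent has something to load.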

#### Tactic 7: Composable Workflows (Scout-Plan-Build)

**Problem:** A single agent searching + planning + building = context explosion

```text
Monolithic Agent:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```

**Solution:** Break into composable steps that delegate

```text
/scout-plan-build workflow:

Step 1: /scout (delegates to 4 parallel sub-agents)
├→ Sub-agents search codebase: 4 × 15k = 60k total
├→ Output: relevant-files.md (5k tokens)
└→ Primary agent context: unchanged

Step 2: /plan-with-docs
├→ Reads relevant-files.md: 5k tokens
├→ Scrapes docs: 8k tokens
├→ Creates plan: 3k tokens
└→ Total added: 16k tokens

Step 3: /build
├→ Reads plan: 3k tokens
├→ Implements: 30k tokens
└→ Total added: 33k tokens

Final primary agent context: 10k (base) + 16k + 33k = 59k tokens
Savings: 106k tokens (64% reduction)
```

**Why this works:** The scout step offloads searching from the planner (R&D: Reduce + Delegate).
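
Totalling these budgets by hand is error-prone, so a small accounting helper is useful when you design a new workflow. The sketch below reproduces only the example's illustrative numbers, and `Stage` is a hypothetical structure. Delegated stages add nothing to the primary context, because their output file is counted by the stage that reads it.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    tokens: int              # Tokens the stage's work consumes
    delegated: bool = False  # True when the work runs in isolated sub-agents


def primary_context(base: int, stages: list[Stage]) -> int:
    """Tokens that end up in the primary agent's context window."""
    total = base
    for stage in stages:
        if not stage.delegated:
            total += stage.tokens  # Paid in full by the primary agent
        # Delegated work stays in sub-agent contexts; its output file is counted
        # by whichever later stage reads it.
    return total


MONOLITHIC = [Stage("search", 40_000), Stage("read", 60_000), Stage("plan", 20_000),
              Stage("implement", 30_000), Stage("test", 15_000)]
SCOUT_PLAN_BUILD = [Stage("scout", 4 * 15_000, delegated=True),
                    Stage("plan-with-docs", 5_000 + 8_000 + 3_000),
                    Stage("build", 3_000 + 30_000)]

if __name__ == "__main__":
    before = primary_context(0, MONOLITHIC)  # The example counts no base context here
    after = primary_context(10_000, SCOUT_PLAN_BUILD)
    saved = before - after
    print(before, after, saved, f"{saved / before:.0%}")  # 165000 59000 106000 64%
```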

### Level 4: Agentic - Out-of-Loop Systems

**Focus:** Agents working autonomously while you're AFK

#### Tactic 8: Focused Agents (One Agent, One Task)

**Anti-pattern:**

```text
Super Agent (trying to do everything):
├── API development
├── UI implementation
├── Database migrations
├── Testing
├── Documentation
├── Deployment
└── Context: 170k tokens (85% used)
```

**Pattern:**

```text
Focused Agent Fleet:
├── Agent 1: API only (30k tokens)
├── Agent 2: UI only (35k tokens)
├── Agent 3: DB only (20k tokens)
├── Agent 4: Tests only (25k tokens)
├── Agent 5: Docs only (15k tokens)
└── Each agent: ≤35k tokens (max ~18% per agent)
```

**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."

#### Tactic 9: Deletable Agents

**Pattern:**

```text
# Create agent for specific task
/create-agent docs-writer "Document frontend components"

# Agent completes task (used 30k tokens)

# DELETE agent immediately
/delete-agent docs-writer

# Result: 30k tokens freed for next agent
```

**Lifecycle:**

```text
1. Create agent → Task-specific context loaded
2. Agent works → Context grows as the task progresses
3. Agent completes → Context at its peak
4. DELETE agent → Context freed
5. Create new agent → Fresh start
6. Repeat
```

**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
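
In code, the create-work-delete lifecycle is exactly what a context manager is for: the delete step runs even if the task fails. The sketch below uses a hypothetical in-memory `AgentRegistry`. It models the lifecycle only and is not how `/create-agent` or `/delete-agent` are implemented.

```python
from contextlib import contextmanager
from typing import Iterator


class AgentRegistry:
    """Hypothetical in-memory registry of live agents and their token usage."""

    def __init__(self) -> None:
        self.live: dict[str, int] = {}

    def create(self, name: str) -> None:
        if name in self.live:
            raise ValueError(f"agent {name!r} already exists")
        self.live[name] = 0

    def record(self, name: str, tokens: int) -> None:
        self.live[name] += tokens

    def delete(self, name: str) -> int:
        """Remove the agent and return the tokens its context held."""
        return self.live.pop(name)

    def total_tokens(self) -> int:
        return sum(self.live.values())


@contextmanager
def temporary_agent(registry: AgentRegistry, name: str) -> Iterator[str]:
    """Create an agent for one task and always delete it afterward."""
    registry.create(name)
    try:
        yield name
    finally:
        freed = registry.delete(name)  # Runs even if the task raised
        print(f"deleted {name}, freed {freed} tokens")


if __name__ == "__main__":
    registry = AgentRegistry()
    with temporary_agent(registry, "docs-writer") as agent:
        registry.record(agent, 30_000)  # The task's work
    print(registry.total_tokens())  # 0
```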

#### Tactic 10: Background Agent Delegation

**Problem:** You're in the loop, waiting for the agent to finish a long task

**Solution:** Delegate to a background agent and keep working

```text
# In-loop (you wait, your context stays open)
/implement-feature "Build auth system"
# Your terminal blocked for 20 minutes
# Context accumulates: 150k tokens

# Out-of-loop (you continue working)
/background "Build auth system" \
  --model opus \
  --report agents/auth-report.md

# Background agent works independently
# Your terminal freed immediately
# Background agent context isolated
# You get notified when complete
```

**Context protection:**

- Primary agent: 10k tokens (just manages the job queue)
- Background agent: 150k tokens (isolated, will be deleted)
- Your interactive session: 10k tokens (protected)
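
Out-of-loop work is ordinary background execution plus a notification. This minimal sketch uses the standard library's `concurrent.futures`. `long_task` is a hypothetical stand-in for the agent run that writes a report file. The caller gets control back immediately and is notified through a done-callback. The sketch does not model the `/background` command or its flags.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path


def long_task(prompt: str, report_path: Path) -> Path:
    """Hypothetical stand-in for a long background agent run."""
    time.sleep(0.1)  # Pretend this is 20 minutes of agent work
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(f"# Report\n\nTask: {prompt}\nStatus: done\n")
    return report_path


def notify(future: Future) -> None:
    """Done-callback: only the report path comes back to you."""
    print(f"background job finished: {future.result()}")


def run_in_background(executor: ThreadPoolExecutor, prompt: str, report: Path) -> Future:
    """Submit the job and return at once; the caller keeps working."""
    future = executor.submit(long_task, prompt, report)
    future.add_done_callback(notify)
    return future
```

Submit the job with `run_in_background(executor, "Build auth system", Path("agents/auth-report.md"))` and carry on. When the callback fires, read only the report.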

#### Tactic 11: Orchestrator Sleep Pattern

**Problem:** Orchestrator observing all agent work = context explosion

```text
Orchestrator watches everything:
├── Scout 1 work: 15k tokens observed
├── Scout 2 work: 15k tokens observed
├── Scout 3 work: 15k tokens observed
├── Planner work: 25k tokens observed
├── Builder work: 35k tokens observed
└── Orchestrator context: 105k tokens
```

**Solution:** Orchestrator sleeps while agents work

```text
Orchestrator pattern:
1. Create scouts → 3k tokens (commands only)
2. SLEEP (not observing)
3. Wake every 15s, check status → 1k tokens
4. Scouts complete, read outputs → 5k tokens
5. Create planner → 2k tokens
6. SLEEP (not observing)
7. Wake every 15s, check status → 1k tokens
8. Planner completes, read output → 3k tokens
9. Create builder → 2k tokens
10. SLEEP (not observing)

Orchestrator final context: 17k tokens ✅
vs. 105k if watching everything (84% reduction)
```

**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
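
The following runnable sketch covers one phase of this loop (for example, the three scouts), with `asyncio` tasks standing in for agents. It is hypothetical scaffolding, not a real orchestration API. Each worker keeps its own transcript, while the orchestrator's context records only the commands it issued, short status checks, and final results. The poll interval is shortened from the example's 15 seconds so the demo finishes quickly.

```python
import asyncio


async def worker(name: str, steps: int, transcript: list[str]) -> str:
    """Stand-in agent: its detailed work stays in its own transcript."""
    for i in range(steps):
        await asyncio.sleep(0.01)
        transcript.append(f"{name} step {i}: detailed tool output")
    return f"{name}: done in {steps} steps"


async def orchestrate(jobs: dict[str, int], poll_interval: float = 0.02) -> list[str]:
    """Create workers, sleep between status checks, then read final results only."""
    context: list[str] = []  # Everything the orchestrator actually "sees"
    transcripts: dict[str, list[str]] = {name: [] for name in jobs}
    tasks = {}
    for name, steps in jobs.items():
        tasks[name] = asyncio.create_task(worker(name, steps, transcripts[name]))
        context.append(f"created {name}")  # The command only

    while not all(task.done() for task in tasks.values()):
        await asyncio.sleep(poll_interval)  # SLEEP: not observing the work
        running = sum(not task.done() for task in tasks.values())
        context.append(f"status: {running} running")  # Wake: one short check

    for task in tasks.values():
        context.append(task.result())  # Final outputs only
    return context


if __name__ == "__main__":
    for line in asyncio.run(orchestrate({"scout-1": 5, "scout-2": 3, "scout-3": 4})):
        print(line)
```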

## Monitoring Context Health

### The /context Command

```text
/context

# Healthy agent (beginner level):
messages: 8%
system_tools: 5%
custom_agents: 2%
---
Total used: 15% ✅ (85% free)

# Warning (intermediate):
messages: 45%
mcp_tools: 18%
system_tools: 5%
---
Total used: 68% ⚠️ (32% free, approaching limits)

# Danger (needs intervention):
messages: 72%
mcp_tools: 24%
system_tools: 5%
---
Total used: 101% ❌ (context overflow!)
```
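
To track these numbers across a long session, you can parse the breakdown. This minimal sketch assumes the simplified `category: NN%` lines shown above. The real `/context` output is laid out differently and may change between versions. The 80% danger line comes from the warning signs, while the 60% warning line is an assumption chosen to match the example.

```python
import re

# Matches simplified breakdown lines such as "messages: 45%".
CATEGORY_LINE = re.compile(r"^\s*([a-z_ ]+):\s*(\d+(?:\.\d+)?)%\s*$")


def parse_breakdown(text: str) -> dict[str, float]:
    """Extract `category: NN%` lines; headers, separators, and totals are ignored."""
    usage = {}
    for line in text.splitlines():
        match = CATEGORY_LINE.match(line)
        if match:
            usage[match.group(1).strip()] = float(match.group(2))
    return usage


def classify(usage: dict[str, float], warn_at: float = 60.0, danger_at: float = 80.0) -> str:
    """Label total usage; warn_at is an assumed threshold, danger_at matches >80% used."""
    total = sum(usage.values())
    if total > 100:
        return "overflow"
    if total > danger_at:
        return "danger"
    if total > warn_at:
        return "warning"
    return "healthy"
```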

### Success Metrics by Level

| Level | Target Context Free | What This Enables |
|-------|---------------------|-------------------|
| Beginner | 85-90% | Basic tasks without running out |
| Intermediate | 60-75% | Complex tasks with breathing room |
| Advanced | 40-60% | Multi-step workflows without overflow |
| Agentic | Per-agent 60-80% | Fleet of focused agents |

### Warning Signs

**Your context window is in danger when:**

❌ **Single agent exceeds 150k tokens**

- Solution: Split work across multiple agents

❌ **Agent needs to read >20 files**

- Solution: Use scout agents to find the relevant subset

❌ **`/context` shows >80% used**

- Solution: Start a fresh agent and use context bundles

❌ **Agent gets slower/less accurate**

- Solution: Check context usage and delegate to sub-agents

❌ **Autocompact buffer active**

- Solution: Disable it to reclaim 20%+ of the window

## Context Window Hard Limits

> "Context window is a hard limit. We have to respect this and work around it."

### The Reality

```text
Claude Opus 200k limit:
├── System prompt: ~8k tokens (4%)
├── Available tools: ~5k tokens (2.5%)
├── MCP servers: 0-24k tokens (0-12%)
├── CLAUDE.md: 0-23k tokens (0-11.5%)
├── Custom agents: ~2k tokens (1%)
└── Available for work: 138-185k tokens (69-92.5%)

Best case (optimized): 185k available
Worst case (unoptimized): 138k available
Difference: 47k tokens (23.5% of total capacity!)
```
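
The budget above is simple subtraction, which makes it easy to recompute with your own overhead figures. In this minimal sketch, the component sizes are the illustrative numbers from the breakdown, not measurements.

```python
WINDOW = 200_000

# Illustrative overhead figures from the breakdown above (not measured values).
BEST_CASE = {"system_prompt": 8_000, "tools": 5_000, "mcp_servers": 0,
             "claude_md": 0, "custom_agents": 2_000}
WORST_CASE = {**BEST_CASE, "mcp_servers": 24_000, "claude_md": 23_000}


def available_for_work(overhead: dict[str, int], window: int = WINDOW) -> int:
    """Tokens left for actual work once fixed overhead is loaded."""
    used = sum(overhead.values())
    if used > window:
        raise ValueError("overhead alone exceeds the context window")
    return window - used


if __name__ == "__main__":
    best = available_for_work(BEST_CASE)
    worst = available_for_work(WORST_CASE)
    gap = best - worst
    print(best, worst, gap, f"{gap / WINDOW:.1%}")  # 185000 138000 47000 23.5%
```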

### Real Example from the Field

> "We were 14% away from exploding our context in our scout-plan-build workflow."

```text
Scout-Plan-Build execution:
├── Base context: 15k tokens
├── Scout work (4 sub-agents): +40k tokens
├── Planner work: +35k tokens
├── Builder work: +80k tokens
└── Total: 170k tokens

With autocompact buffer (22%):
Window needed: 170k / 0.78 ≈ 218k tokens
❌ Exceeds 200k limit by 18k (9% overflow)

Without autocompact buffer:
Window needed: 170k / 1.0 = 170k tokens
✅ Within limits with 30k of headroom (15% free)
```

**Lesson:** Every percentage point matters when approaching limits.
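
The same check can be phrased as headroom against the usable part of the window. This minimal sketch uses the example's numbers and models the autocompact buffer as a flat reserved fraction, which is an approximation. Measured this way, the 170k workload overflows the usable 156k by 14k. The figure above instead scales the workload up to the roughly 218k window it would need. Both views reach the same verdict.

```python
def headroom(workload: int, window: int = 200_000, reserved_fraction: float = 0.0) -> int:
    """Usable tokens left after the workload; negative means overflow."""
    usable = round(window * (1 - reserved_fraction))
    return usable - workload


def window_needed(workload: int, reserved_fraction: float = 0.0) -> float:
    """Window size required for the workload to fit beside the reserved share."""
    return workload / (1 - reserved_fraction)


if __name__ == "__main__":
    workload = 15_000 + 40_000 + 35_000 + 80_000      # 170k, from the example
    print(headroom(workload, reserved_fraction=0.22))  # -14000: overflow
    print(headroom(workload))                          # 30000: fits
    print(round(window_needed(workload, 0.22)))        # 217949
```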

## Common Context Explosion Patterns

### Pattern 1: The Sponge Agent

**Symptoms:**

- Agent reads the entire codebase
- Opens 50+ files
- Context grows 10k tokens every few minutes

**Cause:** No filtering strategy

**Fix:**

```text
# Before: Agent reads everything
Agent: "Analyzing codebase..."
[reads 100 files = 150k tokens]

# After: Scout first
/scout "Find files related to authentication"
# Scout outputs: 5 relevant files
Agent reads only those 5 files = 8k tokens
```
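
A scout's job is cheap filtering before anything expensive is read. This minimal keyword-based sketch uses only the standard library. A real scout agent judges relevance semantically, so treat the keyword scoring, the suffix list, and the `max_files` cap as placeholder assumptions.

```python
from pathlib import Path


def scout(root: Path, keywords: list[str],
          suffixes: tuple[str, ...] = (".ts", ".tsx", ".py"),
          max_files: int = 5) -> list[Path]:
    """Return up to `max_files` source files under `root`, ranked by keyword hits."""
    lowered = [k.lower() for k in keywords]
    scored: list[tuple[int, str, Path]] = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="ignore").lower()
        name = path.name.lower()
        hits = sum(text.count(k) + name.count(k) for k in lowered)
        if hits:
            scored.append((hits, str(path), path))
    scored.sort(key=lambda item: (-item[0], item[1]))  # Most hits first, then by path
    return [path for _, _, path in scored[:max_files]]
```

`scout(Path("src"), ["auth", "login", "token"])` returns the handful of paths worth handing to the implementing agent instead of the whole tree.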

### Pattern 2: The Accumulator

**Symptoms:**

- Long conversation
- Many tool calls
- Context steadily grows to the limit

**Cause:** Not resetting the agent between phases

**Fix:**

```text
# Phase 1: Exploration
[Agent explores, context hits 120k]

# Phase 2: Implementation
# ❌ Bad: Continue same agent (will overflow)
# ✅ Good: New agent with context bundle

/loadbundle context-from-phase-1.md
# Fresh agent (15k) + bundle (20k) = 35k tokens
# Ready for implementation without overflow
```

### Pattern 3: The Observer

**Symptoms:**

- Orchestrator context growing rapidly
- Watching all sub-agent work
- Can't coordinate more than 2-3 agents

**Cause:** Not using the sleep pattern

**Fix:**

```python
# Illustrative only: `orchestrator` and `agents` stand in for your own
# orchestration objects; these method names are not a real library API.

# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result  # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)  # Only the command enters context
    orchestrator.sleep()  # Not observing

orchestrator.wake_and_check_status()  # Only reads summaries
```

## The "200k is Plenty" Principle

> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."

**The mindset shift:**

```text
Beginner thinking:
  "I need a bigger context window"
  "If only I had 500k tokens..."
  "My task is too complex for 200k"

Expert thinking:
  "I need better context management"
  "I'm overloading a single agent"
  "I should split this across focused agents"
```

**The truth:** Most context explosions are design problems, not capacity problems.

### Why 200k is Sufficient

**With proper protection:**

```text
Task: Refactor authentication across a 50-file codebase

Approach 1 (Single Agent - fails):
├── Agent reads 50 files: 75k tokens
├── Agent plans changes: 20k tokens
├── Agent implements: 80k tokens
├── Agent tests: 30k tokens
└── Total: 205k tokens ❌ (overflow by 5k)

Approach 2 (Multi-Agent - succeeds):
├── Scout finds the 10 relevant files: 15k tokens
├── Planner creates strategy: 20k tokens (new agent)
├── Builder 1 (auth logic): 35k tokens (new agent)
├── Builder 2 (UI changes): 30k tokens (new agent)
├── Tester verifies: 25k tokens (new agent)
└── Max per agent: 35k tokens ✅ (all within limits)
```

## Integration with Other Patterns

Context window protection enables:

**Progressive Disclosure:**

- Reduces: Minimal static context
- Enables: Dynamic loading via priming

**Core 4 Management:**

- Protects: Context (pillar #1)
- Enables: Better model/prompt/tools choices

**Orchestration:**

- Requires: Context protection (orchestrator sleep)
- Enables: Fleet management without overflow

**Observability:**

- Monitors: Context usage via hooks
- Prevents: Unnoticed context explosion

## Key Principles

1. **Reduce and Delegate** - The only two strategies that matter

2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose

3. **Agents are deletable** - Free context by removing completed agents

4. **200k is plenty** - Context explosions are design problems

5. **Monitor constantly** - The `/context` command is your best friend

6. **Orchestrators must sleep** - Don't observe all agent work

7. **Context bundles over full replay** - 70% of context in 10% of tokens

## Source Attribution

**Primary sources:**

- Elite Context Engineering (R&D framework, 4 levels, all tactics)
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)

**Supporting sources:**

- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
- Sub-Agents (sub-agent delegation, context isolation)

**Key quotes:**

- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
- "A focused agent is a performant agent." (Elite Context Engineering)
- "We were 14% away from exploding our context." (Claude 2.0)
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)

## Related Documentation

- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar

---

**Remember:** Context window management separates beginners from experts. Master it, and you can keep scaling with focused agents.

434
skills/multi-agent-composition/patterns/decision-framework.md
Normal file

# Decision Framework: Choosing the Right Claude Code Component

This guide helps you choose the right Claude Code component for your specific task. **Always start with prompts**: master the primitive first before scaling to other components.

## Table of Contents

- [The Decision Tree](#the-decision-tree)
- [Quick Reference: Decision Matrix](#quick-reference-decision-matrix)
- [When to Use Each Component](#when-to-use-each-component)
  - [Use Skills When](#use-skills-when)
  - [Use Sub-Agents When](#use-sub-agents-when)
  - [Use Slash Commands When](#use-slash-commands-when)
  - [Use MCP Servers When](#use-mcp-servers-when)
  - [Use Hooks When](#use-hooks-when)
  - [Use Plugins When](#use-plugins-when)
- [Use Case Examples from the Field](#use-case-examples-from-the-field)
- [Composition Rules and Boundaries](#composition-rules-and-boundaries)
  - [What Can Compose What](#what-can-compose-what)
  - [Critical Composition Rules](#critical-composition-rules)
- [The Proper Evolution Path](#the-proper-evolution-path)
  - [Stage 1: Start with a Prompt](#stage-1-start-with-a-prompt)
  - [Stage 2: Add Sub-Agent if Parallelism Needed](#stage-2-add-sub-agent-if-parallelism-needed)
  - [Stage 3: Create Skill When Management Needed](#stage-3-create-skill-when-management-needed)
  - [Stage 4: Add MCP if External Data Needed](#stage-4-add-mcp-if-external-data-needed)
- [Common Decision Anti-Patterns](#common-decision-anti-patterns)
  - [Anti-Pattern 1: Converting All Slash Commands to Skills](#anti-pattern-1-converting-all-slash-commands-to-skills)
  - [Anti-Pattern 2: Using Skills for One-Off Tasks](#anti-pattern-2-using-skills-for-one-off-tasks)
  - [Anti-Pattern 3: Skipping the Primitive](#anti-pattern-3-skipping-the-primitive)
  - [Anti-Pattern 4: Using Sub-Agents When Context Matters](#anti-pattern-4-using-sub-agents-when-context-matters)
  - [Anti-Pattern 5: Forgetting MCP is for External Only](#anti-pattern-5-forgetting-mcp-is-for-external-only)
- [Decision Checklist](#decision-checklist)
- [Summary: The Golden Rules](#summary-the-golden-rules)

## The Decision Tree

Start here when deciding which component to use:

```text
1. START HERE: Build a Prompt (Slash Command)
   ↓
2. Need parallelization or isolated context?
   YES → Use Sub-Agent
   NO → Continue
   ↓
3. External data/service integration?
   YES → Use MCP Server
   NO → Continue
   ↓
4. One-off task (simple, direct)?
   YES → Use Slash Command
   NO → Continue
   ↓
5. Repeatable workflow (pattern detection)?
   YES → Use Agent Skill
   NO → Continue
   ↓
6. Lifecycle event automation?
   YES → Use Hook
   NO → Continue
   ↓
7. Sharing/distributing to team?
   YES → Use Plugin
   NO → Default to Slash Command (prompt)
```

**Critical Rule:** Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
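
Because the tree is a first-match rule list, it translates directly into code. This sketch uses a hypothetical `TaskProfile` holding the yes/no answers. It encodes the order of steps 2-7 above and nothing more.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical yes/no answers to the questions in the decision tree."""
    needs_parallelism_or_isolation: bool = False
    needs_external_data: bool = False
    one_off: bool = False
    repeatable_workflow: bool = False
    lifecycle_event: bool = False
    team_distribution: bool = False


def choose_component(task: TaskProfile) -> str:
    """Walk steps 2-7 of the tree in order; the first YES wins."""
    rules = [
        (task.needs_parallelism_or_isolation, "Sub-Agent"),
        (task.needs_external_data, "MCP Server"),
        (task.one_off, "Slash Command"),
        (task.repeatable_workflow, "Agent Skill"),
        (task.lifecycle_event, "Hook"),
        (task.team_distribution, "Plugin"),
    ]
    for applies, component in rules:
        if applies:
            return component
    return "Slash Command"  # Default: the prompt you started with
```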
|
||||
## Quick Reference: Decision Matrix
|
||||
|
||||
| Task Type | Component | Reason |
|
||||
|-----------|-----------|---------|
|
||||
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
|
||||
| External data/service access | MCP Server | Integration point |
|
||||
| Parallel/isolated work | Sub-Agent | Context isolation |
|
||||
| Parallel workflow tasks | Sub-Agent | **Whenever you see "parallel", think sub-agents** |
|
||||
| One-off task | Slash Command | Simple, direct |
|
||||
| Lifecycle automation | Hook | Event-driven |
|
||||
| Team distribution | Plugin | Packaging |
|
||||
|
||||
## When to Use Each Component
|
||||
|
||||
### Use Skills When
|
||||
|
||||
**Signal keywords:** "automatic," "repeat," "manage," "workflow"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- You have a **REPEAT** problem that needs **MANAGEMENT**
|
||||
- Multiple related operations need coordination
|
||||
- You want **automatic** behavior (agent-invoked)
|
||||
- The problem domain requires orchestration of multiple components
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Managing git work trees (create, list, remove, merge, update)
|
||||
- Detecting style guide violations across codebase
|
||||
- Automatic PDF text extraction and processing
|
||||
- Video processing workflows with multiple steps
|
||||
|
||||
**NOT for:**
|
||||
|
||||
- One-off tasks → Use Slash Command instead
|
||||
- Simple operations → Use Slash Command instead
|
||||
- Problems solved well by a single prompt → Don't over-engineer
|
||||
|
||||
**Remember:** Skills are for managing problem domains, not solving one-off tasks.

### Use Sub-Agents When

**Signal keywords:** "parallel," "scale," "bulk," "isolated," "batch"

**Criteria:**

- **Parallelization** is needed
- **Context isolation** is required
- You are scaling tasks or running batch operations
- You're okay with losing context afterward
- Each task can run independently

**Example scenarios:**

- Comprehensive security audits
- Fixing and debugging tests at scale
- Parallel workflow tasks
- Bulk operations on multiple files
- Isolated research that doesn't pollute the main context

**NOT for:**

- Tasks that need to share context → Use the main conversation
- Sequential operations → Use Slash Command or Skill
- Tasks that need to spawn more sub-agents → Hard limit: no nesting

**Critical constraint:** You must be okay with losing context afterward. Sub-agent context doesn't persist in the main conversation (unless you use resumable sub-agents).

**Golden rule:** "Whenever you see parallel, you should always just think sub-agents. Nothing else supports parallel calling."
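
The isolation trade-off is easier to see in a toy model. This sketch uses plain Python threads to stand in for sub-agents; it is an analogy, not how Claude Code implements them. Each worker builds its own private context list, and only a one-line summary flows back to the orchestrator. `run_subagent` and `orchestrate` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical sub-agent: does all its work in a private context list."""
    context = [f"system: you are a focused agent for {task!r}"]
    for step in range(50):  # stand-in for many tool calls and file reads
        context.append(f"tool output {step} for {task}")
    # Only this summary leaves the sub-agent; `context` is discarded on return.
    return f"{task}: done after {len(context) - 1} steps"


def orchestrate(tasks: list[str]) -> list[str]:
    main_context = ["user: run these tasks in parallel"]
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        summaries = list(pool.map(run_subagent, tasks))  # map keeps input order
    main_context.extend(summaries)  # main context grows by one line per task
    return main_context


if __name__ == "__main__":
    for line in orchestrate(["audit auth/", "audit api/", "audit db/"]):
        print(line)
```

The 50 intermediate entries per task never reach `main_context`. That is why anything you need later must be returned in the summary, or the work must be done in the main conversation.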

### Use Slash Commands When

**Signal keywords:** "one-off," "simple," "quick," "manual"

**Criteria:**

- One-off tasks
- Simple repeatable actions
- You're starting a new workflow
- Building the primitive before composing
- You want manual control over invocation

**Example scenarios:**

- Git commit messages (one at a time)
- Create a UI component
- Run specific code generation
- Execute a well-defined task
- Quick transformations

**Philosophy:** "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."

**Remember:** Slash commands are the primitive foundation. Master these first before anything else.

### Use MCP Servers When

**Signal keywords:** "external," "database," "API," "service," "integration"

**Criteria:**

- External integrations are needed
- Data sources outside Claude Code
- Third-party services
- Database connections
- Real-time data access

**Example scenarios:**

- Connect to Jira
- Query databases (PostgreSQL, etc.)
- Fetch real-time weather data
- GitHub integration
- Slack integration
- Figma designs

**NOT for:**

- Internal orchestration → Use Skills instead
- Pure computation → Use Slash Command or Skill

**Clear rule:** External = MCP, Internal orchestration = Skills

**Context consideration:** MCP servers can "torch your context window" because their tool definitions are loaded into context at startup, unlike Skills, which use progressive disclosure.

### Use Hooks When

**Signal keywords:** "lifecycle," "event," "automation," "deterministic"

**Criteria:**

- You need deterministic automation at lifecycle events
- You want to execute commands at specific moments
- You need to balance agent autonomy with deterministic control
- The workflow automation should always happen

**Example scenarios:**

- Run linters before code submission
- Auto-format code after generation
- Trigger tests after file changes
- Capture context at specific points

**Philosophy:** "If you really want to scale, you need both" - agents AND deterministic workflows.

**Use for:** Adding determinism rather than always relying on the agent to decide.

### Use Plugins When

**Signal keywords:** "share," "distribute," "package," "team"

**Criteria:**

- Sharing/distributing to team
- Packaging multiple components together
- Reusable work across projects
- Team-wide extensions

**Example scenarios:**

- Distribute custom skills to team
- Bundle MCP servers for automatic start
- Share slash commands across projects
- Package hooks and configurations

**Philosophy:** "Plugins let you package and distribute these sets of work. This isn't super interesting. It's just a way to share and reuse Claude Code extensions."
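
The signal keywords in each subsection above can be turned into a rough first-pass router. This is a minimal sketch that assumes simple substring matching on a free-text task description, so it only flags which subsection to reread; real requests still need judgment. The keyword lists are copied from the subsections. The check order is an assumption, except that "parallel" is checked first, following the golden rule that it always means sub-agents.

```python
SIGNALS = {
    # Checked in this order; the first component with a matching keyword wins.
    "Sub-Agent": ["parallel", "scale", "bulk", "isolated", "batch"],
    "MCP Server": ["external", "database", "api", "service", "integration"],
    "Plugin": ["share", "distribute", "package", "team"],
    "Hook": ["lifecycle", "event", "automation", "deterministic"],
    "Agent Skill": ["automatic", "repeat", "manage", "workflow"],
    "Slash Command": ["one-off", "simple", "quick", "manual"],
}


def suggest_component(description: str) -> str:
    text = description.lower()
    for component, keywords in SIGNALS.items():
        if any(keyword in text for keyword in keywords):
            return component
    return "Slash Command"  # no signal: default to the primitive


print(suggest_component("Fix failing tests in parallel"))  # Sub-Agent
print(suggest_component("Query the orders database"))      # MCP Server
print(suggest_component("Automatic PDF text extraction"))  # Agent Skill
print(suggest_component("Write a commit message"))         # Slash Command
```

Naive substring matching can misfire ("api" also matches "rapid", "team" matches "steam"). Treat the result as a hint, not a decision.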

## Use Case Examples from the Field

Real examples with reasoning:

| Use Case | Component | Reasoning |
|----------|-----------|-----------|
| Automatic PDF text extraction | Agent Skill | Keyword "automatic", repeat behavior |
| Connect to Jira | MCP Server | External source |
| Comprehensive security audit | Sub-Agent | Scale, isolated context, not automatic |
| Generalized git commit messages | Slash Command | Simple one-step task |
| Query database | MCP Server | External data source (start here) |
| Fix/debug tests at scale | Sub-Agent | Parallel work, scale |
| Detect style guide violations | Agent Skill | Repeat behavior pattern |
| Fetch real-time weather | MCP Server | Third-party service integration |
| Create UI component | Slash Command | Simple one-off task |
| Parallel workflow tasks | Sub-Agent | Keyword "parallel" |

## Composition Rules and Boundaries

### What Can Compose What

**Skills (Top Compositional Layer):**

- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Can use: Slash Commands
- ✅ Can use: Other Skills
- ❌ Cannot: Nest sub-agents/prompts directly (must use the SlashCommand tool)

**Slash Commands (Primitive + Compositional):**

- ✅ Can use: Skills (via SlashCommand tool)
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Acts as: BOTH primitive AND composition point

**Sub-Agents (Execution Layer):**

- ✅ Can use: Slash Commands (via SlashCommand tool)
- ✅ Can use: Skills (via SlashCommand tool)
- ❌ CANNOT use: Other Sub-Agents (hard limit)

**MCP Servers (Integration Layer):**

- ℹ️ Lower-level unit, used BY skills
- ℹ️ Does not use skills
- ℹ️ Exposes services to all components

### Critical Composition Rules

1. **Sub-Agents cannot nest** - No sub-agent can spawn other sub-agents (prevents infinite nesting)
2. **Skills don't execute code** - They guide Claude to use available tools
3. **Slash commands can be invoked manually or via the SlashCommand tool**
4. **Skills use the SlashCommand tool** to compose prompts and sub-agents
5. **No circular dependencies** - Skills can use other skills but cannot nest circularly
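
Rules 1 and 5 are structural, so they can be checked mechanically before anything runs. This sketch assumes you describe your planned composition as a hypothetical dictionary mapping each component name to its kind and the components it invokes. It flags any sub-agent that invokes another sub-agent, and any cycle among skills (found by depth-first search). Nothing here reads real Claude Code configuration.

```python
Graph = dict[str, tuple[str, list[str]]]  # name -> (kind, names it invokes)


def check_composition(graph: Graph) -> list[str]:
    problems = []

    # Rule 1: a sub-agent may not invoke another sub-agent.
    for name, (kind, callees) in graph.items():
        if kind == "subagent":
            for callee in callees:
                if graph.get(callee, ("", []))[0] == "subagent":
                    problems.append(f"{name} -> {callee}: sub-agents cannot nest")

    # Rule 5: no cycles among skills (DFS; "grey" marks nodes on the current path).
    state: dict[str, str] = {}

    def visit(node: str, path: list[str]) -> None:
        state[node] = "grey"
        for callee in graph[node][1]:
            if graph.get(callee, ("", []))[0] != "skill":
                continue
            if state.get(callee) == "grey":
                cycle = path[path.index(callee):] + [callee]
                problems.append("skill cycle: " + " -> ".join(cycle))
            elif callee not in state:
                visit(callee, path + [callee])
        state[node] = "black"

    for name, (kind, _) in graph.items():
        if kind == "skill" and name not in state:
            visit(name, [name])
    return problems


plan: Graph = {
    "worktree-skill": ("skill", ["create-worktree", "git-helper-skill"]),
    "git-helper-skill": ("skill", ["worktree-skill"]),
    "create-worktree": ("command", []),
    "auditor": ("subagent", ["researcher"]),
    "researcher": ("subagent", []),
}
for problem in check_composition(plan):
    print(problem)
```

For the sample plan this reports `auditor -> researcher: sub-agents cannot nest` and `skill cycle: worktree-skill -> git-helper-skill -> worktree-skill`.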

## The Proper Evolution Path

When building new capabilities, follow this progression:

### Stage 1: Start with a Prompt

**Goal:** Solve the basic problem

Create a simple prompt or slash command that accomplishes the core task.

**Example (Git Work Trees):** Create one work tree

```text
/create-worktree feature-branch
```

**When to stay here:** The task is one-off or infrequent.

### Stage 2: Add Sub-Agent if Parallelism Needed

**Goal:** Scale to multiple parallel operations

If you need to do the same thing many times in parallel, use sub-agents.

**Example (Git Work Trees):** Create multiple work trees in parallel

```text
Use sub-agents to create work trees for feature-a, feature-b, and feature-c in parallel
```

**When to stay here:** Parallel execution is the only requirement; no orchestration is needed.

### Stage 3: Create Skill When Management Needed

**Goal:** Bundle multiple related operations

When the problem grows to require management, create a skill.

**Example (Git Work Trees):** Manage work trees (create, list, remove, merge, update)

Now you have a cohesive work tree manager skill that:

- Creates new work trees
- Lists existing work trees
- Removes old work trees
- Merges work trees
- Updates work tree status

**When to stay here:** Most domain-specific workflows stop here.
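
Underneath, a work tree manager skill would typically direct Claude to run ordinary `git worktree` subcommands. This sketch shows the kind of helper script such a skill might bundle. The module and its function names are hypothetical; it wraps `git worktree add -b`, `git worktree list --porcelain`, and `git worktree remove`. The porcelain parser is kept separate so it can be tested without a repository.

```python
import subprocess


def git(*args: str) -> str:
    """Run a git command and return stdout (raises CalledProcessError on failure)."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout


def parse_worktree_porcelain(output: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain`: blank-line-separated records of
    `key value` lines, plus bare flags such as `bare` or `detached`."""
    worktrees, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")  # bare flags get an empty value
        current[key] = value
    if current:
        worktrees.append(current)
    return worktrees


def create(branch: str, path: str) -> None:
    git("worktree", "add", "-b", branch, path)


def list_worktrees() -> list[dict[str, str]]:
    return parse_worktree_porcelain(git("worktree", "list", "--porcelain"))


def remove(path: str) -> None:
    git("worktree", "remove", path)


sample = (
    "worktree /repo\nHEAD 1a2b3c\nbranch refs/heads/main\n\n"
    "worktree /repo-feature-a\nHEAD 4d5e6f\nbranch refs/heads/feature-a\n"
)
for wt in parse_worktree_porcelain(sample):
    print(wt["worktree"], wt["branch"])
```

Merge and status operations would follow the same pattern with `git merge` and `git status`. The skill's instructions decide when each one is called.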

### Stage 4: Add MCP if External Data Needed

**Goal:** Integrate external systems

Only add MCP servers when you need data from outside Claude Code.

**Example (Git Work Trees):** Query external repo metadata from the GitHub API

Now your skill can query GitHub for:

- Branch protection rules
- CI/CD status
- Pull request information

**Final state:** Full-featured work tree manager with external integration.

## Common Decision Anti-Patterns

### ❌ Anti-Pattern 1: Converting All Slash Commands to Skills

**Mistake:** "I'm going to convert all my slash commands to skills because skills are better."

**Why wrong:** Skills are for repeatable workflows that need management, not simple one-off tasks. Slash commands are the primitive; you need them.

**Correct approach:** Keep slash commands for simple tasks. Only create a skill when you're managing a problem domain with multiple related operations.

### ❌ Anti-Pattern 2: Using Skills for One-Off Tasks

**Mistake:** "I need to create a UI component once, so I'll build a skill for it."

**Why wrong:** Skills are for repeat problems. One-off tasks should use slash commands.

**Correct approach:** Use a slash command for the one-off task. If you find yourself doing it repeatedly, then consider a skill.

### ❌ Anti-Pattern 3: Skipping the Primitive

**Mistake:** "I'm going to start by building a skill because it's more advanced."

**Why wrong:** If you don't master prompts, you can't build effective skills. Everything is prompts in the end.

**Correct approach:** Always start with a prompt. Build the primitive first. Scale up only when needed.

### ❌ Anti-Pattern 4: Using Sub-Agents When Context Matters

**Mistake:** "I'll use a sub-agent for this research task and then reference the findings later."

**Why wrong:** Sub-agent context is isolated. You lose it after the sub-agent finishes (unless using resumable sub-agents).

**Correct approach:** If you need the context later, do the work in the main conversation or use a resumable sub-agent.

### ❌ Anti-Pattern 5: Forgetting MCP is for External Only

**Mistake:** "I'll build an MCP server to orchestrate internal workflows."

**Why wrong:** MCP servers are for external integrations. Internal orchestration should use skills.

**Correct approach:** MCP = external, Skills = internal orchestration. Keep them separate.

## Decision Checklist

Before you start building, ask yourself:

**Basic Questions:**

- [ ] Have I started with a prompt? (Non-negotiable)
- [ ] Is this a one-off task or repeatable?
- [ ] Do I need external data or services?
- [ ] Is parallelization required?
- [ ] Am I okay losing context after execution?

**Composition Questions:**

- [ ] Am I trying to nest sub-agents? (Not allowed)
- [ ] Am I converting a simple slash command to a skill? (Probably wrong)
- [ ] Am I using MCP for internal orchestration? (Should use skills)
- [ ] Have I considered the evolution path? (Prompt → Sub-agent → Skill → MCP)

**Context Questions:**

- [ ] Will this torch my context window? (MCP consideration)
- [ ] Do I need progressive disclosure? (Skills benefit)
- [ ] Is context isolation critical? (Sub-agent benefit)
- [ ] Will I need this context later? (If so, don't use a sub-agent)

## Summary: The Golden Rules

1. **Always start with prompts** - Master the primitive first
2. **"Parallel" keyword = Sub-Agents** - Nothing else supports parallel calling
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Skills compose upward, not downward** - Build from primitives

Remember The Core 4: Context, Model, Prompt, Tools. Master these fundamentals, and you'll master the compositional units.

925
skills/multi-agent-composition/patterns/hooks-in-composition.md
Normal file
# Hooks for Observability and Control

> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."

Claude Code hooks provide deterministic control over agent behavior and enable comprehensive monitoring of multi-agent systems.

## What Are Hooks?

**Hooks are lifecycle event handlers that let you execute custom code at specific points in Claude Code's execution.**

```text
Agent Lifecycle:
├── PreToolUse hook    → Before any tool executes
├── [Tool executes]
├── PostToolUse hook   → After the tool completes
├── Notification hook  → When Claude Code needs your attention (e.g., a permission request)
├── SubagentStop hook  → When a sub-agent finishes
└── Stop hook          → When the main agent finishes responding
```

In `settings.json` the event names are case-sensitive (`PreToolUse`, `PostToolUse`, `Notification`, `SubagentStop`, `Stop`). The kebab-case file names used for the scripts in this guide are only a naming convention.

**Two killer use cases:**

1. **Observability** - Know what your agents are doing
2. **Control** - Steer and block agent behavior

## The Five Hooks

### 1. PreToolUse

**When it fires:** Before any tool executes

**Use cases:**

- Block dangerous commands (`rm -rf`, destructive operations)
- Prevent access to sensitive files (`.env`, `credentials.json`)
- Log tool attempts before execution
- Validate tool parameters

**Available data** (JSON on stdin, abridged):

```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {
    "command": "rm -rf /",
    "description": "Remove all files"
  }
}
```

**Example: Block dangerous commands**

```python
# .claude/hooks/pre-tool-use.py
# /// script
# dependencies = []
# ///

import json
import re
import sys

DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\brm\s+-fr\b",
    r"\brm\s+.*-[rf].*\*",
]


def is_dangerous_remove_command(tool_name, tool_input):
    """Return True for rm -rf style commands run through the Bash tool."""
    if tool_name != "Bash":
        return False
    command = tool_input.get("command", "")
    return any(re.search(pattern, command) for pattern in DANGEROUS_PATTERNS)


def main():
    input_data = json.load(sys.stdin)
    tool_name = input_data.get("tool_name")
    tool_input = input_data.get("tool_input", {})

    if is_dangerous_remove_command(tool_name, tool_input):
        # Exit code 2 blocks the tool call; stderr is fed back to Claude
        print("❌ Blocked dangerous rm command", file=sys.stderr)
        sys.exit(2)

    # Exit code 0 lets the tool call proceed
    sys.exit(0)


if __name__ == "__main__":
    main()
```

**Configuration in `.claude/settings.json`:**

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "uv run .claude/hooks/pre-tool-use.py"
          }
        ]
      }
    ]
  }
}
```

An empty `matcher` matches every tool. JSON does not allow comments, so keep notes like this outside the file.

### 2. PostToolUse

**When it fires:** After a tool completes execution

**Use cases:**

- Log tool execution results
- Track which tools are used most frequently
- Measure tool execution time (paired with PreToolUse timestamps)
- Build observability dashboards
- Summarize tool output with small models

**Available data** (JSON on stdin, abridged):

```json
{
  "session_id": "abc123",
  "hook_event_name": "PostToolUse",
  "tool_name": "Write",
  "tool_input": {
    "file_path": "/path/to/file.py",
    "content": "..."
  },
  "tool_response": {
    "filePath": "/path/to/file.py",
    "success": true
  }
}
```

**Example: Event logging with summarization**

```python
# .claude/hooks/post-tool-use.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///

import json
import os
import sys
from datetime import datetime, timezone

import requests
from anthropic import Anthropic


def summarize_event(tool_name, tool_input, tool_response):
    """Use Haiku to summarize what happened."""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    prompt = f"""Summarize this tool execution in 1 sentence:
Tool: {tool_name}
Input: {json.dumps(tool_input, indent=2)}
Result: {json.dumps(tool_response, indent=2)}

Be concise and focus on what was accomplished."""

    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Small, fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def send_to_server(event):
    """Post the event to the local observability server described below."""
    requests.post("http://localhost:3000/events", json=event, timeout=1)


def main():
    input_data = json.load(sys.stdin)

    # Generate summary using small model
    summary = summarize_event(
        input_data.get("tool_name"),
        input_data.get("tool_input", {}),
        input_data.get("tool_response", {}),
    )

    # Log the event with summary (the hook input has no timestamp, so add one)
    event = {
        "toolName": input_data.get("tool_name"),
        "sessionId": input_data.get("session_id"),
        "summary": summary,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

    # Send to observability server
    send_to_server(event)


if __name__ == "__main__":
    main()
```

**Why small models?** "I've sent thousands of these events. I've spent less than 20 cents. This is where small fast models really shine."

### 3. Notification

**When it fires:** When Claude Code needs your attention, e.g. it needs permission to use a tool or has been waiting for your input

**Use cases:**

- Text-to-speech notifications
- Send alerts to phone/Slack
- Log permission requests

**Available data** (JSON on stdin, abridged):

```json
{
  "session_id": "abc123",
  "hook_event_name": "Notification",
  "message": "Claude needs your permission to use Bash"
}
```

**Example: Text-to-speech notification**

```python
# .claude/hooks/notification.py
import json
import subprocess
import sys


def speak(text):
    """Speak text via the ElevenLabs text-to-speech utility script."""
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        text,
    ])


def main():
    input_data = json.load(sys.stdin)
    message = input_data.get("message", "Your agent needs your input")

    # Speak the notification
    speak(message)

    # Log it
    print(json.dumps({"notified": True}))


if __name__ == "__main__":
    main()
```

### 4. Stop

**When it fires:** Every time the main Claude Code agent finishes responding

**Use cases:**

- Copy full chat transcript for analysis
- Completion notifications (text-to-speech)
- Session logging
- Performance metrics
- Agent output summarization

**Available data** (JSON on stdin, abridged):

```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "hook_event_name": "Stop",
  "stop_hook_active": false
}
```

**Example: Save full conversation**

```python
# .claude/hooks/stop.py
import json
import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def main():
    input_data = json.load(sys.stdin)
    transcript_path = input_data.get("transcript_path")

    if not transcript_path:
        return

    # Copy transcript to logs directory
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    session_id = input_data.get("session_id", "unknown")

    logs_dir = Path(".claude/logs")
    logs_dir.mkdir(parents=True, exist_ok=True)

    dest = logs_dir / f"chat-{timestamp}-{session_id[:8]}.jsonl"
    shutil.copy(Path(transcript_path).expanduser(), dest)

    # Announce completion
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "All set and ready for your next step",
    ])

    print(json.dumps({"logged": True, "file": str(dest)}))


if __name__ == "__main__":
    main()
```

**Key insight:** "The stop event is the perfect time to copy the entire chat conversation. This is key for observability. What happened? How can we improve it?"
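
Once transcripts are being copied, the "What happened?" question can start with simple counts of which tools were used. This sketch assumes the transcript is JSON Lines, with assistant entries carrying a `message.content` list of `{"type": "tool_use", "name": ...}` blocks. That layout is an assumption about Claude Code's internal transcript format, which is not a documented, stable API, so check it against your own files. `count_tool_uses` and `analyze` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def count_tool_uses(lines) -> Counter:
    """Count tool_use blocks per tool name across JSONL transcript lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines instead of failing
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry a string, not blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts


def analyze(path: str) -> Counter:
    with Path(path).open(encoding="utf-8") as f:
        return count_tool_uses(f)


sample = [
    '{"type": "user", "message": {"role": "user", "content": "fix the tests"}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash"}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "ok"},'
    ' {"type": "tool_use", "name": "Edit"}, {"type": "tool_use", "name": "Bash"}]}}',
]
print(count_tool_uses(sample).most_common())  # [('Bash', 2), ('Edit', 1)]
```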

### 5. SubagentStop

**When it fires:** When a sub-agent completes its work

**Use cases:**

- Track parallel sub-agent completion
- Per-agent performance metrics
- Multi-agent orchestration logging
- Progress notifications for long-running jobs

**Available data** (JSON on stdin, abridged):

```json
{
  "session_id": "parent-abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "hook_event_name": "SubagentStop",
  "stop_hook_active": false
}
```

**Example: Sub-agent completion tracking**

```python
# .claude/hooks/sub-agent-stop.py
import json
import subprocess
import sys
from datetime import datetime, timezone


def main():
    input_data = json.load(sys.stdin)

    # Log sub-agent completion
    event = {
        "type": "sub-agent-complete",
        "session_id": input_data.get("session_id"),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

    # Send to observability system via the shared send-event utility
    subprocess.run(
        ["uv", "run", ".claude/hooks/utils/send-event.py", "my-app", "sub-agent-stop"],
        input=json.dumps(event),
        text=True,
    )

    # Announce
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        "Sub agent complete",
    ])


if __name__ == "__main__":
    main()
```

## Multi-Agent Observability Architecture

When scaling to 3, 5, 10+ agents, hooks enable comprehensive system visibility.

### Architecture Overview

```text
┌───────────────────────────────────────────────────────────┐
│                      Multiple Agents                      │
│   Agent 1     Agent 2     Agent 3     ...     Agent N     │
│      │           │           │                   │        │
│      └───────────┴───────────┼───────────────────┘        │
│                              │                            │
│                         Hooks fire                        │
│                              ↓                            │
└───────────────────────────────────────────────────────────┘
                               │
                               ↓
┌───────────────────────────────────────────────────────────┐
│                      Bun/Node Server                      │
│  ┌────────────────┐          ┌───────────────┐            │
│  │ HTTP Endpoint  │─────────→│   SQLite DB   │            │
│  │ /events        │          │ (persistence) │            │
│  └───────┬────────┘          └───────────────┘            │
│          │                                                │
│          └──────────→ WebSocket Broadcast                 │
└───────────────────────────────────────────────────────────┘
                               │
                               ↓
┌───────────────────────────────────────────────────────────┐
│                  Web Client (Vue/React)                   │
│  ┌──────────────────────────────────────────────────────┐ │
│  │ Live Activity Pulse (1min/3min/5min windows)         │ │
│  ├──────────────────────────────────────────────────────┤ │
│  │ Event Stream (filtered by app/session/event type)    │ │
│  ├──────────────────────────────────────────────────────┤ │
│  │ Event Details (with AI summaries)                    │ │
│  └──────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
```

### Key Design Principles

**1. One-Way Data Stream**

```text
Agent → Hook → Server → Database + WebSocket → Client
```

"This one-way data stream keeps things really simple. Every agent is responsible for summarizing their work in the hook before they send it off."

**Benefits:**

- Simple architecture
- Easy to reason about
- No bidirectional complexity
- Fast real-time updates

**2. Event Summarization at the Edge**

```python
# In the hook (runs on the agent side)
import requests


def send_event(app_name, event_type, event_data, summarize=True):
    if summarize:
        # Use Haiku to summarize before sending
        # (summarize_with_haiku is defined in send-event.py below)
        event_data["summary"] = summarize_with_haiku(event_data, event_type)

    # Send to server
    requests.post("http://localhost:3000/events", json={
        "app": app_name,
        "type": event_type,
        "data": event_data,
        "sessionId": event_data.get("session_id"),  # from the hook's stdin JSON
    }, timeout=1)
```

**Why summarize at the edge?**

- Reduces server load
- Cheap: small, fast models are called directly from the hook
- Human-readable summaries immediately available
- No server-side LLM dependencies

**3. Persistent + Real-Time Storage**

```sql
-- SQLite schema
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

**Dual persistence:**

- SQLite for historical queries and analysis
- WebSocket for live streaming to UI
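
The storage half of the server is small enough to sketch. The document's server is Bun/Node; this is a Python stand-in using the standard-library `sqlite3` module and the same table as above. It covers the insert path and the kind of query a "Live Activity Pulse" panel needs: event counts over the last 1, 3, and 5 minutes. `store_event` and `activity_pulse` are hypothetical helper names, and the WebSocket broadcast is left out.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
)
"""
FMT = "%Y-%m-%d %H:%M:%S"  # same text format as SQLite's CURRENT_TIMESTAMP (UTC)


def store_event(conn, app, session_id, event_type, payload, summary=None, ts=None):
    """Insert one event; `ts` (a UTC datetime) defaults to the database clock."""
    cols = ["source_app", "session_id", "event_type", "raw_payload", "summary"]
    values = [app, session_id, event_type, json.dumps(payload), summary]
    if ts is not None:
        cols.append("timestamp")
        values.append(ts.strftime(FMT))
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f"INSERT INTO events ({', '.join(cols)}) VALUES ({placeholders})", values)
    conn.commit()


def activity_pulse(conn, now, windows=(1, 3, 5)):
    """Count events in each trailing window of `windows` minutes before `now`."""
    counts = {}
    for minutes in windows:
        since = (now - timedelta(minutes=minutes)).strftime(FMT)
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM events WHERE timestamp >= ?", (since,)
        ).fetchone()
        counts[f"{minutes}min"] = n
    return counts


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = datetime(2025, 1, 5, 14, 30, tzinfo=timezone.utc)
for seconds_ago in (30, 90, 150, 400):
    store_event(conn, "my-app", "abc123", "PostToolUse", {"tool_name": "Bash"},
                ts=now - timedelta(seconds=seconds_ago))
print(activity_pulse(conn, now))  # {'1min': 1, '3min': 3, '5min': 3}
```

In a live server you would pass `datetime.now(timezone.utc)` as `now`. The fixed timestamp here just keeps the example deterministic.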

### Implementation Example

**Hook script structure:**

```python
# .claude/hooks/utils/send-event.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///

import json
import os
import sys
from datetime import datetime, timezone

import requests
from anthropic import Anthropic


def summarize_with_haiku(event_data, event_type):
    """Generate a 1-sentence summary using Haiku (tool events only)."""
    if event_type not in ["pre-tool-use", "post-tool-use"]:
        return None

    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"Summarize this {event_type} event in 1 sentence: {json.dumps(event_data)}"

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def send_event(app_name, event_type, event_data, summarize=False):
    """Send event to observability server."""
    payload = {
        "app": app_name,
        # Claude Code passes session_id in the hook's stdin JSON
        "sessionId": event_data.get("session_id", "unknown"),
        "eventType": event_type,
        "data": event_data,
        "timestamp": event_data.get("timestamp") or datetime.now(timezone.utc).isoformat(),
    }

    try:
        if summarize:
            payload["summary"] = summarize_with_haiku(event_data, event_type)
        response = requests.post(
            "http://localhost:3000/events",
            json=payload,
            timeout=1,
        )
        return response.status_code == 200
    except Exception as e:
        # Don't break the agent if observability (or summarization) fails
        print(f"Warning: Failed to send event: {e}", file=sys.stderr)
        return False


def main():
    if len(sys.argv) < 3:
        print("Usage: send-event.py <app-name> <event-type> [--summarize]", file=sys.stderr)
        sys.exit(1)

    app_name = sys.argv[1]
    event_type = sys.argv[2]
    summarize = "--summarize" in sys.argv

    # Read event data from stdin
    event_data = json.load(sys.stdin)

    success = send_event(app_name, event_type, event_data, summarize)
    print(json.dumps({"sent": success}))


if __name__ == "__main__":
    main()
```

**Using in hooks:**

```python
# .claude/hooks/post-tool-use.py
import json
import subprocess
import sys


def main():
    input_data = json.load(sys.stdin)

    # Send to observability system with summarization
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",         # App name
        "post-tool-use",  # Event type
        "--summarize",    # Generate AI summary
    ], input=json.dumps(input_data), text=True)

    print(json.dumps({"logged": True}))


if __name__ == "__main__":
    main()
```

## Best Practices

### 1. Use Isolated Scripts (Astral UV Pattern)

**Why:** Hooks should be self-contained, portable, and independent of your codebase.

```python
# /// script
# dependencies = ["anthropic", "requests"]
# ///

# Astral UV single-file script
# Runs independently with: uv run script.py
# Auto-installs dependencies
```

**Benefits:**

- Works in any codebase
- No virtual environment setup
- Portable across projects
- Easy to test in isolation

**Alternative: Bun for TypeScript**

```typescript
// .claude/hooks/post-tool-use.ts
// Run with: bun run post-tool-use.ts

import { readFileSync } from "node:fs";

// File descriptor 0 is stdin
const input = JSON.parse(readFileSync(0, "utf-8"));
// ... hook logic
```
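
"Easy to test in isolation" can be taken literally. A hook is just a program that reads JSON on stdin and signals through its exit code, so you can drive it from a script. This harness sketch runs hooks with plain Python (`sys.executable`) rather than `uv run`, so it works without uv. For PEP 723 scripts that declare dependencies, swap the command for `["uv", "run", script]`. The demo hook and `run_hook` are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_hook(script: str, payload: dict, timeout: float = 10) -> subprocess.CompletedProcess:
    """Pipe `payload` as JSON into a hook script, the way a hook receives it."""
    return subprocess.run(
        [sys.executable, script],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# A throwaway hook that blocks Bash commands containing "rm -rf" via exit code 2.
demo_hook = """
import json, sys
data = json.load(sys.stdin)
if data.get("tool_name") == "Bash" and "rm -rf" in data.get("tool_input", {}).get("command", ""):
    print("blocked", file=sys.stderr)
    sys.exit(2)
"""

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_hook.py"
    script.write_text(demo_hook)
    blocked = run_hook(str(script), {"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}})
    allowed = run_hook(str(script), {"tool_name": "Bash", "tool_input": {"command": "ls"}})

print(blocked.returncode, blocked.stderr.strip())  # 2 blocked
print(allowed.returncode)                          # 0
```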

### 2. Never Block the Agent

```python
import json
import sys


def main():
    try:
        event = json.load(sys.stdin)
        # Hook logic, e.g. a send_to_server() helper like the PostToolUse example's
        send_to_server(event)
    except Exception as e:
        # Log but don't fail
        print(f"Warning: {e}", file=sys.stderr)
        # Still emit valid JSON
        print(json.dumps({"error": str(e)}))
    # Exit 0 so a failed side effect never stops the agent
    sys.exit(0)
```

**Rule:** If observability fails, the agent should continue working. Reserve exit code 2 for deliberate blocking: Claude Code treats it as a blocking error and feeds stderr back to Claude.

### 3. Use Small Fast Models for Summaries

```text
Approximate input price per million tokens (Claude 3 list prices):
├── Opus:   $15.00 (overkill for summaries)
├── Sonnet: $3.00  (still expensive)
└── Haiku:  $0.25  ✅ (perfect for this)
```

"Thousands of events, less than 20 cents. Small fast cheap models shine here."
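
To see how thousands of events can cost cents, multiply tokens by price. This back-of-the-envelope sketch assumes about 200 input tokens and 20 output tokens per summarized event. Those counts are illustrative guesses; real events vary with the size of the tool input. The prices are Claude 3 list prices in USD per million tokens: Haiku $0.25 in / $1.25 out, Sonnet $3 / $15, Opus $15 / $75.

```python
def summary_cost_usd(events, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Total dollars to summarize `events` events at the given per-million-token prices."""
    input_cost = events * in_tokens * in_price_per_m / 1_000_000
    output_cost = events * out_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost


# Assumed workload: 2,000 events, ~200 input and ~20 output tokens each.
haiku = summary_cost_usd(2_000, 200, 20, 0.25, 1.25)
sonnet = summary_cost_usd(2_000, 200, 20, 3.00, 15.00)
opus = summary_cost_usd(2_000, 200, 20, 15.00, 75.00)
print(f"Haiku ${haiku:.2f}, Sonnet ${sonnet:.2f}, Opus ${opus:.2f}")
# Haiku $0.15, Sonnet $1.80, Opus $9.00
```

Under these assumptions the same workload costs 60 times more on Opus than on Haiku. That gap is the reason to push summarization to the smallest model.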
|
||||
### 4. Hash Session IDs for UI Consistency

```python
import hashlib


def color_for_session(session_id):
    """Generate consistent color from session ID"""
    hash_val = int(hashlib.md5(session_id.encode()).hexdigest()[:6], 16)
    return f"#{hash_val:06x}"
```

**Result:** Same agent = same color in UI, making it easy to track.

### 5. Filter and Paginate Events

```javascript
// Client-side filtering
const filteredEvents = events
  .filter(e => e.app === selectedApp || selectedApp === "all")
  .filter(e => e.eventType === selectedType || selectedType === "all")
  .slice(0, 100); // Limit displayed events

// Auto-refresh
setInterval(() => fetchLatestEvents(), 5000);
```
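
`.slice(0, 100)` caps what is rendered, but the browser still receives every event. Once the event list grows, the server can filter and page instead. A minimal sketch, assuming events are dicts with the same `app` and `eventType` keys used above and that newer events are appended at the end of the list; `page` is zero-based, and `page_events` and its parameters are illustrative names.

```python
from typing import Dict, List


def page_events(events: List[Dict], app: str = "all", event_type: str = "all",
                page: int = 0, page_size: int = 100) -> Dict:
    """Filter like the client code above, newest first, then return one page."""
    matches = [
        e for e in reversed(events)
        if (app == "all" or e.get("app") == app)
        and (event_type == "all" or e.get("eventType") == event_type)
    ]
    start = page * page_size
    return {
        "items": matches[start:start + page_size],
        "total": len(matches),
        "has_more": start + page_size < len(matches),
    }
```

The client then requests `page + 1` only while `has_more` is true, instead of holding the full history in memory.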

### 6. Multiple Hooks Per Event

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "uv run .claude/hooks/stop-chat-log.py" },
          { "type": "command", "command": "uv run .claude/hooks/stop-tts.py" },
          { "type": "command", "command": "uv run .claude/hooks/stop-notify.py" }
        ]
      }
    ]
  }
}
```

**Matching hooks run in parallel**, not in the order listed, so keep each script independent of the others.

### 7. Matcher Patterns for Selective Execution

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "uv run .claude/hooks/bash-validator.py" }
        ]
      },
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "uv run .claude/hooks/block-env-write.py" }
        ]
      }
    ]
  }
}
```

Matchers select hooks by tool name (an exact name or a regex such as `Write|Edit`); they do not filter on tool input. Path rules, such as refusing writes to `.env` files, belong inside the hook script, which reads `tool_input.file_path` from the event JSON on stdin.

## Directory Structure Best Practice

```text
.claude/
├── settings.json    # Hook configuration
├── commands/        # Slash commands
├── agents/          # Sub-agent definitions
└── hooks/           # ← New essential directory
    ├── pre-tool-use.py
    ├── post-tool-use.py
    ├── notification.py
    ├── stop.py
    ├── sub-agent-stop.py
    └── utils/       # Shared utilities
        ├── send-event.py
        ├── text-to-speech-elevenlabs.py
        ├── text-to-speech-openai.py
        └── summarize-haiku.py
```

## Real-World Use Cases

### Use Case 1: Block Dangerous Operations

```python
# .claude/hooks/pre-tool-use.py
import json
import re
import sys

BLOCKED_COMMANDS = [
    r'\brm\s+-rf\b',                 # rm -rf
    r'\bsudo\s+rm\b',                # sudo rm
    r'\bgit\s+push.*--force\b',      # git push --force
    r'\bdocker\s+system\s+prune\b',  # docker system prune
]

BLOCKED_FILES = [
    r'\.env$',
    r'credentials\.json$',
    r'\.ssh/id_rsa$',
    r'aws.*credentials',
]


def is_blocked(tool_name, tool_input):
    # Claude Code tool names are capitalized ("Bash", "Write"); compare case-insensitively
    name = tool_name.lower()
    if name == "bash":
        command = tool_input.get("command", "")
        return any(re.search(p, command) for p in BLOCKED_COMMANDS)

    if name in ["read", "write", "edit"]:
        file_path = tool_input.get("file_path", "")
        return any(re.search(p, file_path) for p in BLOCKED_FILES)

    return False


def main():
    event = json.load(sys.stdin)
    if is_blocked(event.get("tool_name", ""), event.get("tool_input", {})):
        print("Blocked by pre-tool-use policy", file=sys.stderr)
        sys.exit(2)  # exit code 2 blocks the tool call and shows stderr to Claude


if __name__ == "__main__":
    main()
```

### Use Case 2: Multi-Agent Task Board

```text
Observability UI showing:

Active Agents (5):
├── [Agent 1] Planning feature (12s ago)
├── [Agent 2] Writing tests (45s ago) ⚠️ Needs input
├── [Agent 3] Building UI (2m ago)
├── [Agent 4] Deploying (5m ago) ✅ Complete
└── [Agent 5] Monitoring (ongoing)

Recent Events (filtered: post-tool-use):
├── Agent 3: Wrote src/components/Button.tsx
├── Agent 1: Read src/api/endpoints.ts
├── Agent 4: Bash: git push origin main
└── Agent 2: Test failed: test/auth.test.ts
```

### Use Case 3: Long-Running AFK Agents

```bash
# Start agent with background work
/background "Implement entire auth system" --report agents/auth-report.md

# Agent works autonomously
# Hooks send notifications:
# - "Starting authentication module"
# - "Database schema created"
# - "Tests passing"
# - "All set and ready for your next step"

# You're notified via text-to-speech when complete
```

### Use Case 4: Debugging Agent Behavior

```python
import json

# Filter stop events to analyze full chat transcripts.
# `events` is your stored event list; transcripts are JSONL (one JSON object per line).
for event in (e for e in events if e["eventType"] == "stop"):
    with open(event["transcriptPath"], encoding="utf-8") as f:
        transcript = [json.loads(line) for line in f if line.strip()]

    # Analyze:
    # - What files did agent read?
    # - What tools were used most?
    # - Where did agent get confused?
    # - What patterns led to errors?
```
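
Answering "what tools were used most?" takes one pass over a transcript. This sketch assumes each JSONL line is an object whose `message.content` may be a list of blocks, with tool calls appearing as blocks of `"type": "tool_use"` that carry a `name`. That matches Claude Code transcripts as we understand them, but the format is not a documented, stable API, so check it against your own files first.

```python
import json
from collections import Counter
from typing import Iterable


def tool_usage(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks by tool name across transcript JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line).get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry content as a string
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts
```

Usage: open the file from a stop event's transcript path and call `tool_usage(f).most_common(5)` to see the top five tools for that session.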

## Performance Considerations

### Webhook Timeouts

```python
import requests

# Don't block agent on slow external services
# (webhook_url, event, and log_to_file come from your own hook code)
try:
    requests.post(webhook_url, json=event, timeout=0.5)  # 500ms max
except requests.RequestException:
    # Timeouts and connection errors: log locally instead
    log_to_file(event)
```

### Database Size Management

```sql
-- Rotate old events
-- (datetime() yields 'YYYY-MM-DD HH:MM:SS' in UTC; store timestamps in that
-- format so the text comparison is meaningful)
DELETE FROM events
WHERE timestamp < datetime('now', '-30 days');

-- Or archive, in one transaction so both steps succeed or fail together
BEGIN;
INSERT INTO events_archive SELECT * FROM events
WHERE timestamp < datetime('now', '-30 days');

DELETE FROM events
WHERE id IN (SELECT id FROM events_archive);
COMMIT;
```
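
The same rotation can run from Python's standard `sqlite3` module, computing the cutoff once so the archive and delete steps agree on it. A sketch, assuming an `events` table with `id` and `timestamp` columns (timestamps stored as `YYYY-MM-DD HH:MM:SS` UTC text) and an `events_archive` table with the same columns; `archive_old_events` and its `days` parameter are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def archive_old_events(conn: sqlite3.Connection, days: int = 30) -> int:
    """Move events older than `days` into events_archive; return rows moved."""
    # One cutoff for both statements, formatted like SQLite's datetime() output
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO events_archive SELECT * FROM events WHERE timestamp < ?",
            (cutoff,),
        )
        deleted = conn.execute("DELETE FROM events WHERE timestamp < ?", (cutoff,))
    return deleted.rowcount
```

Running it from a daily job (or at server startup) keeps the table bounded without touching the hook scripts.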
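
### Event Batching

```python
import atexit

import requests

# Batch events before sending. This only pays off in a long-lived process
# (for example a local relay); each hook run is a fresh process, so an
# in-memory buffer inside the hook script itself would be lost on exit.
events_buffer = []


def send_event(event):
    events_buffer.append(event)
    if len(events_buffer) >= 10:
        flush_events()


def flush_events():
    if not events_buffer:
        return
    # server_url: your observability endpoint
    requests.post(server_url, json={"events": events_buffer}, timeout=2)
    events_buffer.clear()


atexit.register(flush_events)  # don't drop a partial batch on shutdown
```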
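
Because every hook invocation starts a new process, one way to batch from the hook side is to spool events to a file and flush once enough have accumulated. A sketch, assuming a single-user machine where the lack of file locking is acceptable, and a `send` callable you supply (for example a function wrapping `requests.post`); the spool path, `spool_event`, and the default batch size are illustrative.

```python
import json
from pathlib import Path
from typing import Callable, List

SPOOL = Path(".claude/logs/event-spool.jsonl")  # illustrative path


def spool_event(event: dict, send: Callable[[List[dict]], None],
                spool: Path = SPOOL, batch_size: int = 10) -> bool:
    """Append one event to the spool; send and clear it once batch_size is reached.

    Returns True when a batch was sent. No file locking: two hook processes
    flushing at the same moment could send a batch twice.
    """
    spool.parent.mkdir(parents=True, exist_ok=True)
    with spool.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

    lines = [ln for ln in spool.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if len(lines) < batch_size:
        return False

    send([json.loads(ln) for ln in lines])  # if this raises, the spool is kept
    spool.unlink()  # start a fresh batch only after a successful send
    return True
```

Deleting the spool only after `send` returns means a failed upload is retried with the next event rather than lost.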

## Integration with Observability Platforms

### Datadog

```python
from datadog import statsd


def send_to_datadog(event):
    statsd.increment(f"claude.tool.{event['toolName']}")
    statsd.histogram(f"claude.duration.{event['toolName']}", event['duration'])
```

### Prometheus

```python
from prometheus_client import Counter, Histogram

# Prometheus scrapes a running process, so define these metrics in the
# long-lived observability server rather than in per-event hook scripts.
tool_counter = Counter('claude_tool_executions', 'Tool executions', ['tool_name'])
tool_duration = Histogram('claude_tool_duration_seconds', 'Tool duration', ['tool_name'])


def send_to_prometheus(event):
    tool_counter.labels(tool_name=event['toolName']).inc()
    tool_duration.labels(tool_name=event['toolName']).observe(event['duration'])
```

### Slack

```python
import os

import requests


def send_to_slack(event):
    if event['eventType'] == 'notification':
        requests.post(
            os.getenv("SLACK_WEBHOOK_URL"),
            json={"text": f"🤖 Agent needs input: {event['message']}"},
            timeout=2,
        )
```

## Key Principles

1. **If you don't measure it, you can't improve it** - Observability is critical for scaling agents

2. **Keep hooks simple and isolated** - Use single-file scripts (UV, bun, shell)

3. **Never block the agent** - Hooks should be fast and fault-tolerant

4. **Small models for summaries** - Haiku is perfect and costs pennies

5. **One-way data streams** - Simple architecture beats complex bidirectional systems

6. **Context, Model, Prompt** - Even with hooks, the big three still matter

## Source Attribution

**Primary source:** Multi-Agent Observability transcript (complete system architecture, WebSocket streaming, event summarization, SQLite persistence)

**Supporting source:** Hooked transcript (5 hooks fundamentals, pre-tool-use implementation, text-to-speech integration, isolated scripts pattern)

**Key quotes:**

- "When it comes to agentic coding, observability is everything." (Hooked)
- "This one-way data stream keeps things really simple." (Multi-Agent Observability)
- "Thousands of events, less than 20 cents. Small fast models shine here." (Multi-Agent Observability)

## Related Documentation

- [Hooks Reference](../reference/hooks-reference.md) - Complete API reference for all 5 hooks
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real observability systems in action
- [Core 4 Framework](../reference/core-4-framework.md) - Context, Model, Prompt, Tools

---

**Remember:** Observability isn't optional when scaling agents. If you can't see what they're doing, you can't scale them effectively.
673
skills/multi-agent-composition/patterns/orchestrator-pattern.md
Normal file
@@ -0,0 +1,673 @@
# The Orchestrator Pattern

> "The rate at which you can create and command your agents becomes the constraint of your engineering output. When your agents are slow, you're slow."

The orchestrator pattern is **Level 5** of agentic engineering: managing fleets of agents through a single interface.

## The Journey to Orchestration

```text
Level 1: Base agents    → Use agents out of the box
Level 2: Better agents  → Customize prompts and workflows
Level 3: More agents    → Run multiple agents
Level 4: Custom agents  → Build specialized solutions
Level 5: Orchestration  → Manage fleets of agents ← You are here
```

**Key realization:** Single agents hit context window limits. You need orchestration to scale beyond one agent.

## The Three Pillars

Multi-agent orchestration requires three components working together:

```text
┌────────────────────────────────────────────────────────┐
│ 1. ORCHESTRATOR AGENT                                  │
│    (Single interface to your fleet)                    │
└────────────────────────────────────────────────────────┘
                            ↓
┌────────────────────────────────────────────────────────┐
│ 2. CRUD FOR AGENTS                                     │
│    (Create, Read, Update, Delete agents at scale)      │
└────────────────────────────────────────────────────────┘
                            ↓
┌────────────────────────────────────────────────────────┐
│ 3. OBSERVABILITY                                       │
│    (Monitor performance, costs, and results)           │
└────────────────────────────────────────────────────────┘
```

Without all three, orchestration fails. You need:

- **Orchestrator** to command agents
- **CRUD** to manage agent lifecycle
- **Observability** to understand what agents are doing

## Core Principle: The Orchestrator Sleeps

> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. It has created and commanded our agents. Now, our agents are doing the work."

**The pattern:**

```text
1. User prompts Orchestrator
2. Orchestrator creates specialized agents
3. Orchestrator commands agents with detailed prompts
4. Orchestrator SLEEPS (stops consuming context)
5. Agents work autonomously
6. Orchestrator wakes periodically to check status
7. Orchestrator reports results to user
8. Agents are deleted
```

**Why the orchestrator sleeps:**

- Protects its context window
- Avoids observing all agent work (too much information)
- Only wakes when needed to check status or command agents

**Example orchestrator sleep pattern:**

```python
# Orchestrator commands agents
orchestrator.create_agent("scout", task="Find relevant files")
orchestrator.create_agent("builder", task="Implement changes")

# Orchestrator sleeps, checking status every 15s
while not all_agents_complete():
    orchestrator.sleep(15)  # Not consuming context
    status = orchestrator.check_agent_status()
    orchestrator.log(status)

# Wake up to collect results
results = orchestrator.get_agent_results()
orchestrator.summarize_to_user(results)
```
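
The loop above relies on hypothetical orchestrator methods. Stripped to its core, the sleep pattern is a poll with a fixed interval and an overall deadline. The sketch below assumes you can fetch a `{agent_id: status}` mapping and treats `"complete"` and `"failed"` as terminal statuses; the status names, the 30-minute default timeout, and the injectable `sleep` parameter are illustrative choices.

```python
import time
from typing import Callable, Dict

TERMINAL = {"complete", "failed"}


def wait_for_agents(get_statuses: Callable[[], Dict[str, str]],
                    interval: float = 15.0, timeout: float = 1800.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Sleep between status checks until every agent reaches a terminal state.

    Only the compact {agent_id: status} map comes back, never the agents'
    transcripts, which is what keeps the orchestrator's own context small.
    """
    waited = 0.0
    while True:
        statuses = get_statuses()
        if all(s in TERMINAL for s in statuses.values()):
            return statuses
        if waited >= timeout:
            raise TimeoutError(f"agents still running after {waited:.0f}s: {statuses}")
        sleep(interval)
        waited += interval
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default `time.sleep` is used.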

## Orchestration Patterns

### Pattern 1: Scout-Plan-Build (Sequential Chaining)

**Use case:** Complex tasks requiring multiple specialized steps

**Flow:**

```text
User: "Migrate codebase to new SDK"
    ↓
Orchestrator creates Scout agents (4 parallel)
    ├→ Scout 1: Search with Gemini
    ├→ Scout 2: Search with Codex
    ├→ Scout 3: Search with Haiku
    └→ Scout 4: Search with Flash
    ↓
Scouts output: relevant-files.md with exact locations
    ↓
Orchestrator creates Planner agent
    ├→ Reads relevant-files.md
    ├→ Scrapes documentation
    └→ Outputs: detailed-plan.md
    ↓
Orchestrator creates Builder agent
    ├→ Reads detailed-plan.md
    ├→ Executes implementation
    └→ Tests and validates
```

**Why this works:**

- **Scout step offloads searching from Planner** (R&D framework: Reduce + Delegate)
- **Multiple scout models** provide diverse perspectives
- **Planner only sees relevant files**, not entire codebase
- **Builder focused on execution**, not planning

**Implementation:**

```bash
# Composable slash commands
/scout-plan-build "Migrate to new Claude Agent SDK"

# Internally runs:
/scout "Find files needing SDK migration"
/plan-with-docs docs=https://agent-sdk-docs.com
/build plan=agents/plans/sdk-migration.md
```

**Context savings:**

```text
Without scouts:
├── Planner searches entire codebase: 50k tokens
├── Planner reads irrelevant files: 30k tokens
└── Total wasted: 80k tokens

With scouts:
├── 4 scouts search in parallel (isolated contexts)
├── Planner reads only relevant-files.md: 5k tokens
└── Savings: 75k tokens (94% reduction)
```
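
The scout stage is a fan-out/fan-in: several searches run independently, and only their merged, de-duplicated findings reach the planner. A minimal sketch with Python's `concurrent.futures`, where each scout is stood in for by a callable that returns file paths; in a real system each callable would launch and wait on a separate agent. The report layout is an assumption, not the exact contents of `relevant-files.md` from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_scouts(scouts: Dict[str, Callable[[str], List[str]]], task: str) -> str:
    """Run every scout on the task in parallel and merge their findings."""
    # Fan-out: each scout searches independently (in real use, its own agent)
    with ThreadPoolExecutor(max_workers=max(1, len(scouts))) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in scouts.items()}
        findings = {name: fut.result() for name, fut in futures.items()}

    # Fan-in: de-duplicate paths, remembering which scouts reported each one
    votes: Dict[str, List[str]] = {}
    for name, paths in findings.items():
        for path in dict.fromkeys(paths):  # ignore repeats within one scout
            votes.setdefault(path, []).append(name)

    # Files that more scouts agree on come first
    lines = [f"# Relevant files for: {task}", ""]
    for path in sorted(votes, key=lambda p: (-len(votes[p]), p)):
        lines.append(f"- {path} (found by: {', '.join(sorted(votes[path]))})")
    return "\n".join(lines)
```

Only the returned report, a few hundred tokens, would be handed to the planner; the scouts' search transcripts stay in their own contexts.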

### Pattern 2: Plan-Build-Review-Ship (Task Board)

**Use case:** Structured development lifecycle with quality gates

**Flow:**

```text
User: "Update HTML titles across application"
    ↓
Task created → PLAN column
    ↓
Orchestrator creates Planner agent
    ├→ Analyzes requirements
    ├→ Creates implementation plan
    └→ Moves task to BUILD
    ↓
Orchestrator creates Builder agent
    ├→ Reads plan
    ├→ Implements changes
    ├→ Runs tests
    └→ Moves task to REVIEW
    ↓
Orchestrator creates Reviewer agent
    ├→ Checks implementation against plan
    ├→ Validates tests pass
    └→ Moves task to SHIP
    ↓
Orchestrator creates Shipper agent
    ├→ Creates git commit
    ├→ Pushes to remote
    └→ Task complete
```

**Why this works:**

- **Clear phases** with distinct responsibilities
- **Each agent focused** on single phase
- **Quality gates** between phases
- **Failure isolation** - if builder fails, planner work preserved

**Visual representation:**

```text
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│  PLAN   │→ │  BUILD  │→ │ REVIEW  │→ │  SHIP   │
├─────────┤  ├─────────┤  ├─────────┤  ├─────────┤
│ Task A  │  │         │  │         │  │         │
│         │  │         │  │         │  │         │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
```

**Agent handoff:**

```python
# Orchestrator manages task board state
task = {
    "id": "update-titles",
    "status": "planning",
    "assigned_agent": "planner-001",
    "artifacts": []
}

# Planner completes
task["status"] = "building"
task["artifacts"].append("plan.md")
task["assigned_agent"] = "builder-001"

# Orchestrator hands off to builder
orchestrator.command_agent(
    "builder-001",
    f"Implement plan from {task['artifacts'][0]}"
)
```
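
The quality gates amount to a small state machine: a task moves only to the next column, and only when the current phase has produced its artifact. A sketch of that rule, assuming the four columns above plus a final `done` state and one required artifact per phase; the `Task` fields and the `advance` helper are illustrative, not taken from the source system.

```python
from dataclasses import dataclass, field
from typing import List

PHASES = ["plan", "build", "review", "ship", "done"]


@dataclass
class Task:
    task_id: str
    phase: str = "plan"
    artifacts: List[str] = field(default_factory=list)


def advance(task: Task, artifact: str) -> Task:
    """Record the current phase's artifact and move the task one column right."""
    if task.phase == "done":
        raise ValueError(f"task {task.task_id} is already done")
    if not artifact:
        # Quality gate: a phase without an output cannot hand off
        raise ValueError(f"phase '{task.phase}' produced no artifact")
    task.artifacts.append(artifact)
    task.phase = PHASES[PHASES.index(task.phase) + 1]
    return task
```

After each `advance`, the orchestrator would create the agent for the new `task.phase` and delete the one that just finished.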

### Pattern 3: Scout-Builder (Two-Stage)

**Use case:** UI changes, targeted modifications

**Flow:**

```text
User: "Create gray pills for app header information"
    ↓
Orchestrator creates Scout
    ├→ Locates exact files and line numbers
    ├→ Identifies patterns and conventions
    └→ Outputs: scout-report.md
    ↓
Orchestrator creates Builder
    ├→ Reads scout-report.md
    ├→ Implements precise changes
    └→ Outputs: modified files
    ↓
Orchestrator wakes, verifies, reports
```

**Orchestrator sleep pattern:**

```python
# Orchestrator creates scout
orchestrator.create_agent("scout-header", task="Find header UI components")

# Orchestrator sleeps, checking every 15s
orchestrator.sleep_with_status_checks(interval=15)

# Scout completes, orchestrator wakes
scout_output = orchestrator.get_agent_output("scout-header")

# Orchestrator creates builder with scout's output
orchestrator.create_agent(
    "builder-ui",
    task=f"Create gray pills based on scout findings: {scout_output}"
)

# Orchestrator sleeps again
orchestrator.sleep_with_status_checks(interval=15)
```

## Context Window Protection

> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."

**The problem:** A single agent doing everything explodes its context window.

```text
Single Agent Approach:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% of a 200k window!)
```

**The solution:** Specialized agents with focused context

```text
Orchestrator Approach:
├── Orchestrator: 10k tokens (coordinates)
├── Scout 1: 15k tokens (searches)
├── Scout 2: 15k tokens (searches)
├── Planner: 25k tokens (plans using scout output)
├── Builder: 35k tokens (implements)
└── Per agent: ≤35k tokens (at most ~18% of each agent's window)
```

**Key principle:** Agents are deletable temporary resources

```text
1. Create agent for specific task
2. Agent completes task
3. DELETE agent (free memory)
4. Create new agent for next task
5. Repeat
```

**Example:**

```bash
# User: "Build documentation for frontend and backend"

# Orchestrator creates 3 agents
/create-agent frontend-docs "Document frontend components"
/create-agent backend-docs "Document backend APIs"
/create-agent qa-docs "Combine and QA both docs"

# Work completes...

# Delete all agents when done
/delete-all-agents

# Result: All agents gone, context freed
```

**Why delete agents:**

- Frees context windows for new work
- Prevents context accumulation
- Enforces single-purpose design
- Matches engineering principle: "The best code is no code at all"

## CRUD for Agents

The orchestrator needs full control over the agent lifecycle:

**Create:**

```python
agent_id = orchestrator.create_agent(
    name="scout-api",
    task="Find all API endpoints",
    model="haiku",  # Fast, cheap for search
    max_tokens=100000
)
```

**Read:**

```python
# Check agent status
status = orchestrator.get_agent_status(agent_id)
# => {"status": "working", "progress": "60%", "context_used": "15k tokens"}

# Read agent output
output = orchestrator.get_agent_output(agent_id)
# => {"files_consumed": [...], "files_produced": [...]}
```

**Update:**

```python
# Command existing agent with new task
orchestrator.command_agent(
    agent_id,
    "Now implement the changes based on your findings"
)
```

**Delete:**

```python
# Single agent
orchestrator.delete_agent(agent_id)

# All agents
orchestrator.delete_all_agents()
```
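
Behind those four operations sits a registry of agent records. A minimal in-memory sketch, assuming each agent is tracked by a generated ID with a name, model, status, and task history; `AgentRegistry` and `AgentRecord` are hypothetical stand-ins for the `orchestrator` object above and do not launch real agents.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentRecord:
    name: str
    model: str
    status: str = "created"
    tasks: List[str] = field(default_factory=list)


class AgentRegistry:
    """In-memory CRUD for agent records; it does not start real agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}
        self._ids = itertools.count(1)

    # Create
    def create_agent(self, name: str, task: str, model: str = "sonnet") -> str:
        agent_id = f"{name}-{next(self._ids):03d}"
        self._agents[agent_id] = AgentRecord(name, model, "working", [task])
        return agent_id

    # Read
    def get_agent(self, agent_id: str) -> AgentRecord:
        return self._agents[agent_id]

    def get_agent_status(self, agent_id: str) -> str:
        return self._agents[agent_id].status

    # Update
    def command_agent(self, agent_id: str, task: str) -> None:
        record = self._agents[agent_id]
        record.tasks.append(task)
        record.status = "working"

    # Delete
    def delete_agent(self, agent_id: str) -> None:
        self._agents.pop(agent_id, None)

    def delete_all_agents(self) -> None:
        self._agents.clear()

    def __len__(self) -> int:
        return len(self._agents)
```

Keeping the task history on each record makes it easy to see why an agent was re-commanded before it is deleted.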

## Observability Requirements

Without observability, orchestration is blind. You need:

### 1. Agent-Level Visibility

```text
For each agent, track:
├── Name and ID
├── Status (creating, working, complete, failed)
├── Context window usage
├── Model and cost
├── Files consumed
├── Files produced
└── Tool calls executed
```

### 2. Cross-Agent Visibility

```text
Fleet overview:
├── Total agents active
├── Total context consumed
├── Total cost
├── Agent dependencies (who's waiting on whom)
└── Bottlenecks (slow agents blocking others)
```
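
The fleet overview is an aggregation over the per-agent records. A sketch, assuming each agent reports its status, context tokens used, cost in dollars, and the IDs of agents it is waiting on; the `AgentSnapshot` fields and the `summarize_fleet` helper are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentSnapshot:
    agent_id: str
    status: str            # creating | working | complete | failed
    context_tokens: int
    cost_usd: float
    waiting_on: List[str] = field(default_factory=list)


def summarize_fleet(agents: List[AgentSnapshot]) -> Dict:
    """Totals across the fleet plus the agents others are blocked on."""
    active = [a for a in agents if a.status in ("creating", "working")]
    blocking: Dict[str, int] = {}
    for a in active:
        for dep in a.waiting_on:
            blocking[dep] = blocking.get(dep, 0) + 1
    return {
        "active_agents": len(active),
        "total_context_tokens": sum(a.context_tokens for a in agents),
        "total_cost_usd": round(sum(a.cost_usd for a in agents), 4),
        # Most-waited-on agents first: the likeliest bottlenecks
        "bottlenecks": sorted(blocking, key=lambda k: (-blocking[k], k)),
    }
```

An agent that appears at the top of `bottlenecks` while still `working` is the first one to inspect when the fleet stalls.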

### 3. Real-Time Streaming

```text
User sees:
├── Agent creation events
├── Tool calls as they happen
├── Progress updates
├── Completion notifications
└── Error alerts
```

**Implementation:** See [Hooks for Observability](hooks-observability.md) for complete architecture

## Information Flow in Orchestrated Systems

```text
User
  ↓ (prompts)
Orchestrator
  ↓ (creates & commands)
Agent 1 → Agent 2 → Agent 3
   ↓         ↓         ↓
   (results flow back up)
           ↓
Orchestrator (summarizes)
           ↓
User
```

**Critical understanding:** Agents never talk directly to the user. They report to the orchestrator.

**Example:**

```text
# User prompts orchestrator
user: "Summarize codebase"

# Orchestrator creates agent with detailed instructions
orchestrator → agent: """
Read all files in src/
Create markdown summary with:
- Architecture overview
- Key components
- File structure
- Tech stack

Report results back to orchestrator (not user!)
"""

# Agent completes, reports to orchestrator
agent → orchestrator: "Summary complete at docs/summary.md"

# Orchestrator reports to user
orchestrator → user: "Codebase summary created with 4 main sections: architecture, components, file structure, and tech stack"
```

## When to Use Orchestration

### Use orchestration when

✅ **Task requires 3+ specialized agents**

- Example: Scout + Plan + Build

✅ **Context window exploding in single agent**

- Single agent using >150k tokens

✅ **Need parallel execution**

- Multiple independent subtasks

✅ **Quality gates required**

- Plan → Build → Review → Ship

✅ **Long-running autonomous work**

- Agents work while you're AFK

### Don't use orchestration when

❌ **Simple one-off task**

- Single agent sufficient

❌ **Learning/prototyping**

- Orchestration adds complexity

❌ **No observability infrastructure**

- You'll be blind to agent behavior

❌ **Haven't mastered custom agents**

- Level 5 requires Level 4 foundation

## Practical Implementation

### Minimal Orchestrator Agent

Sub-agent definition (`orchestrator-agent.md`):

````markdown
---
name: orchestrator
description: Manages fleet of agents for complex multi-step tasks
---

# Orchestrator Agent

You are an orchestrator agent managing a fleet of specialized agents.

## Your Tools

- create_agent(name, task, model): Create new agent
- command_agent(agent_id, task): Send task to existing agent
- get_agent_status(agent_id): Check agent progress
- get_agent_output(agent_id): Retrieve agent results
- delete_agent(agent_id): Remove completed agent
- delete_all_agents(): Clean up all agents

## Your Responsibilities

1. **Break down user requests** into specialized subtasks
2. **Create focused agents** for each subtask
3. **Command agents** with detailed instructions
4. **Monitor progress** without micromanaging
5. **Collect results** and synthesize for user
6. **Delete agents** when work is complete

## Orchestrator Sleep Pattern

After creating and commanding agents:

1. **SLEEP** - Stop consuming context
2. **Wake every 15-30s** to check agent status
3. **SLEEP again** if agents still working
4. **Wake when all complete** to collect results

DO NOT observe all agent work. This explodes your context window.

## Example Workflow

```text
User: "Migrate codebase to new SDK"

You:

1. Create scout agents (parallel search)
2. Command scouts to find SDK usage
3. SLEEP (check status every 15s)
4. Wake when scouts complete
5. Create planner agent
6. Command planner with scout results
7. SLEEP (check status every 15s)
8. Wake when planner completes
9. Create builder agent
10. Command builder with plan
11. SLEEP (check status every 15s)
12. Wake when builder completes
13. Summarize results for user
14. Delete all agents
```

## Key Principles

- **One agent, one task** - Don't overload agents
- **Sleep between phases** - Protect your context
- **Delete when done** - Treat agents as temporary
- **Detailed commands** - Don't assume agents know context
- **Results-oriented** - Every agent must produce concrete output
````

### Orchestrator Tools (SDK)

```python
# Sketch using the Claude Agent SDK's in-process MCP tools.
# `agent_manager` is a hypothetical object that owns the sub-agents.
from claude_agent_sdk import create_sdk_mcp_server, tool


# create_agent tool
@tool("create_agent", "Create a new specialized agent",
      {"name": str, "task": str, "model": str})
async def create_agent(args: dict) -> dict:
    name = args["name"]
    model = args.get("model", "sonnet")

    agent_id = agent_manager.create(
        name=name,
        system_prompt=args["task"],
        model=model
    )

    return {"content": [{"type": "text",
                         "text": f"Agent {name} created (id: {agent_id})"}]}


# command_agent tool
@tool("command_agent", "Send task to existing agent",
      {"agent_id": str, "task": str})
async def command_agent(args: dict) -> dict:
    agent_id = args["agent_id"]

    agent_manager.prompt(agent_id, args["task"])

    return {"content": [{"type": "text",
                         "text": f"Agent {agent_id} received task"}]}


orchestrator_tools = create_sdk_mcp_server(
    name="orchestrator", version="1.0.0",
    tools=[create_agent, command_agent]
)
```

## Trade-offs

### Benefits

- ✅ Scales beyond single agent limits
- ✅ Parallel execution (3x-10x speedup)
- ✅ Context window protection
- ✅ Specialized agent focus
- ✅ Quality gates between phases
- ✅ Autonomous out-of-loop work

### Costs

- ❌ Upfront investment to build
- ❌ Infrastructure complexity (database, WebSocket)
- ❌ More moving parts to manage
- ❌ Requires observability
- ❌ Orchestrator agent needs careful prompting
- ❌ Not worth it for simple tasks

## Key Quotes

> "The orchestrator agent is the first pattern where I felt the perfect combination of observability, customizability, and agents at scale."
>
> "Treat your agents as deletable temporary resources that serve a single purpose."
>
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. Now, our agents are doing the work."
>
> "200k context window is plenty. You're just stuffing a single agent with too much work."

## Source Attribution

**Primary source:** One Agent to Rule Them All (orchestrator architecture, three pillars, sleep pattern, CRUD)

**Supporting sources:**

- Claude 2.0 (scout-plan-build workflow, composable prompts)
- Custom Agents (plan-build-review-ship task board)
- Sub-Agents (information flow, delegation patterns)

## Related Documentation

- [Hooks for Observability](hooks-observability.md) - Required for orchestration
- [Context Window Protection](context-window-protection.md) - Why orchestration matters
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real orchestration systems

---

**Remember:** Orchestration is Level 5. Master Levels 1-4 first. Then build your fleet.