Initial commit

Zhongwei Li
2025-11-29 18:00:36 +08:00
commit c83b4639c5
49 changed files with 18594 additions and 0 deletions


@@ -0,0 +1,158 @@
# Context in Composition
**Strategic framework for managing context when composing multi-agent systems.**
## The Core Problem
The context window is your most precious resource when composing multiple agents. A focused agent is a performant agent.
**The Reality:**
```text
Single agent doing everything:
├── Context explodes to 150k+ tokens
├── Performance degrades
└── Eventually fails or times out
Multi-agent composition:
├── Each agent: <40k tokens
├── Main agent: Stays lean
└── Work completes successfully
```
## The R&D Framework
There are only two strategies for managing context in multi-agent systems:
**R - Reduce**
- Minimize what enters context windows
- Remove unused MCP servers (can consume 24k+ tokens)
- Shrink static CLAUDE.md files
- Use context priming instead of static loading
**D - Delegate**
- Move work to sub-agents' isolated contexts
- Use background agents for autonomous work
- Employ orchestrator sleep patterns
- Treat agents as deletable temporary resources
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Mastery
### Level 1: Beginner - Stop Wasting Tokens
**Focus:** Resource management
**Key Actions:**
- Remove unused MCP servers (reclaim 20k+ tokens)
- Minimize CLAUDE.md (<1k tokens)
- Disable autocompact buffer (reclaim 20%)
**Success Metric:** 85-90% context window free at startup
**Move to Level 2 when:** Resources cleaned but still rebuilding context for different tasks
---
### Level 2: Intermediate - Load Selectively
**Focus:** Dynamic context loading
**Key Actions:**
- Context priming (`/prime` commands vs. static files)
- Sub-agent delegation for parallel work
- Composable workflows (scout-plan-build)
**Success Metric:** 60-75% context window free during work
**Move to Level 3 when:** Managing multiple agents but struggling with handoffs
---
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Agent-to-agent context transfer
**Key Actions:**
- Context bundles (60-70% transfer in 10% tokens)
- Monitor context limits proactively
- Chain multiple agents without overflow
**Success Metric:** Per-agent context <60k tokens, successful handoffs
**Move to Level 4 when:** Need agents working autonomously while you do other work
---
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Fleet orchestration
**Key Actions:**
- Background agents (`/background` command)
- Dedicated agent environments
- Orchestrator sleep patterns
- Zero-touch execution
**Success Metric:** Agents ship work end-to-end without intervention
---
## When Context Becomes a Composition Issue
**Trigger 1: Single Agent Exceeds 150k Tokens**
→ Delegate to sub-agents with isolated contexts
**Trigger 2: Agent Reading >20 Files**
→ Use scout agents to identify relevant subset first
**Trigger 3: `/context` Shows >80% Used**
→ Start fresh agent, use context bundles for handoff
**Trigger 4: Performance Degrading Mid-Workflow**
→ Split workflow across multiple focused agents
**Trigger 5: Same Analysis Repeated Multiple Times**
→ Context overflow forcing re-reads; delegate earlier
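These triggers are mechanical enough to encode. A minimal sketch, assuming hypothetical inputs you would pull from `/context` output and your own bookkeeping (none of these names are a real Claude Code API):
```python
# Hypothetical trigger check mapping the five triggers above to actions.
def composition_action(context_tokens: int, files_read: int,
                       context_pct_used: float, repeated_reads: int) -> str:
    if context_tokens > 150_000:
        return "delegate to sub-agents with isolated contexts"
    if files_read > 20:
        return "run a scout agent to narrow the file set first"
    if context_pct_used > 0.80:
        return "start a fresh agent and hand off via context bundle"
    if repeated_reads > 1:
        return "context overflow is forcing re-reads; delegate earlier"
    return "stay the course: single focused agent"
```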
## Composition Patterns by Level
**Beginner:** Single agent, minimal static context
**Intermediate:** Main agent + sub-agents for parallel work
**Advanced:** Agent chains with context bundles for handoff
**Agentic:** Orchestrator + fleet of specialized agents
## Key Principles
1. **Focused agents perform better** - Single purpose, minimal context
2. **Agents are deletable** - Free context by removing completed agents
3. **200k is plenty** - Context explosions are design problems, not capacity problems
4. **Orchestrators must sleep** - Don't observe all sub-agent work
5. **Context bundles over full replay** - 70% context in 10% tokens
## Implementation Details
For practical patterns, see:
- [Multi-Agent Context Isolation](../reference/multi-agent-context-isolation.md) - Parallel execution, context bundling
- [Orchestrator Pattern](orchestrator-pattern.md) - Sleep patterns, fleet management
- [Decision Framework](decision-framework.md) - When to use each component
## Source Attribution
Primary: Elite Context Engineering, Claude 2.0 transcripts
Supporting: One Agent to Rule Them All, Sub-Agents documentation
---
**Remember:** Context is the first pillar of the Core 4. Master context strategy, and you can scale infinitely with focused agents.


@@ -0,0 +1,715 @@
# Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.
## The Core Problem
**Every engineer hits this wall:**
```text
Agent starts: 10k tokens (5% used)
After exploration: 80k tokens (40% used)
After planning: 120k tokens (60% used)
During implementation: 170k tokens (85% used) ⚠️
Context explodes: 195k tokens (98% used) ❌
Agent performance degrades, fails, or times out
```
**The realization:** More context ≠ better performance. Too much context = cognitive overload.
## The R&D Framework
There are only two ways to manage your context window:
```text
R - REDUCE
└─→ Minimize what enters the context window
D - DELEGATE
└─→ Move work to other agents' context windows
```
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Protection
### Level 1: Beginner - Reduce Waste
**Focus:** Stop wasting tokens on unused resources
#### Tactic 1: Eliminate Default MCP Servers
**Problem:**
```jsonc
// Default mcp.json
{
  "mcpServers": {
    "firecrawl": {...},  // 6k tokens
    "github": {...},     // 8k tokens
    "postgres": {...},   // 5k tokens
    "redis": {...}       // 5k tokens
  }
}
// Total: 24k tokens always loaded (12% of 200k window!)
```
**Solution:**
```bash
# Option 1: Delete default mcp.json entirely
rm .claude/mcp.json
# Option 2: Load selectively
claude-mcp-config --strict specialized-configs/firecrawl-only.json
# Result: 4k tokens instead of 24k (83% reduction)
```
#### Tactic 2: Minimize CLAUDE.md
**Before:**
```markdown
# CLAUDE.md (23,000 tokens = 11.5% of window)
- 500 lines of API documentation
- 300 lines of deployment procedures
- 1,500 lines of coding standards
- Architecture diagrams
- Always loaded, whether relevant or not
```
**After:**
```markdown
# CLAUDE.md (500 tokens = 0.25% of window)
# Only universal essentials
- Fenced code blocks MUST have language
- Use rg instead of grep
- ALWAYS use set -euo pipefail
```
**Rule:** Only include what you're 100% sure you want loaded 100% of the time.
#### Tactic 3: Disable Autocompact Buffer
**Problem:**
```bash
/context
# Output:
autocompact buffer: 22% ⚠️ (44k tokens gone!)
messages: 51%
system_tools: 8%
---
Total available: 78% (should be 100%)
```
**Solution:**
```bash
/config
# Set: autocompact = false
# Now:
/context
# Output:
messages: 51%
system_tools: 8%
custom_agents: 2%
---
Total available: 100% ✅ (reclaimed the 22% buffer!)
```
**Impact:** Reclaims 40k+ tokens immediately.
### Level 2: Intermediate - Dynamic Loading
**Focus:** Load what you need, when you need it
#### Tactic 4: Context Priming
**Replace static CLAUDE.md with task-specific `/prime` commands**
```markdown
# .claude/commands/prime.md
# General codebase context (2k tokens)
Read README, understand structure, report findings
# .claude/commands/prime-feature.md
# Feature development context (3k tokens)
Read feature requirements, understand dependencies, plan implementation
# .claude/commands/prime-api.md
# API work context (4k tokens)
Read API docs, understand endpoints, review integration patterns
```
**Usage pattern:**
```bash
# Starting feature work
/prime-feature
# vs. having 23k tokens always loaded
```
**Savings:** 20k tokens (87% reduction)
#### Tactic 5: Sub-Agent Delegation
**Problem:** Primary agent doing parallel work fills its own context
```text
Primary Agent tries to do:
├── Web scraping (15k tokens)
├── Documentation fetch (12k tokens)
├── Data analysis (10k tokens)
└── Synthesis (5k tokens)
= 42k tokens in one agent
```
**Solution:** Delegate to sub-agents with isolated contexts
```text
Primary Agent (9k tokens):
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
└→ Sub-Agent 3: Analysis (10k tokens, isolated)
Total work: 46k tokens
Primary agent context: Only 9k tokens ✅
```
**Example:**
```bash
/load-ai-docs
# Agent spawns 10 sub-agents for web scraping
# Each scrape: ~3k tokens
# Total work: 30k tokens
# Primary agent context: Still only 9k tokens
# Savings: 21k tokens protected
```
**Key insight:** Sub-agents run on their own system prompts (not user prompts), keeping their context isolated from the primary agent.
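The fan-out itself can be pictured as ordinary async code. This is a sketch of the isolation idea only; `run_sub_agent` is a stand-in, not Claude Code's actual sub-agent interface:
```python
import asyncio

# Illustrative only: each task accumulates tokens in its own context
# and hands back nothing but a short summary.
async def run_sub_agent(task: str) -> str:
    sub_context = []                  # isolated: never merged back
    sub_context.append(f"working on: {task}")
    return f"summary of {task}"       # only this reaches the primary

async def primary_agent():
    tasks = ["web scraping", "docs fetch", "analysis"]
    summaries = await asyncio.gather(*(run_sub_agent(t) for t in tasks))
    # The primary grows by a few hundred tokens of summaries,
    # not by the ~37k tokens the sub-agents consumed internally.
    return summaries

asyncio.run(primary_agent())
```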
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Chain agents together without context explosion
#### Tactic 6: Context Bundles
**Problem:** Agent 1's context explodes (180k tokens). Need to hand off to fresh Agent 2 without full replay.
**Solution:** Bundle 60-70% of essential context
```markdown
# context-bundle-2025-01-05-<session-id>.md
## Context Bundle
Created: 2025-01-05 14:30
Source Agent: agent-abc123
## Initial Setup
/prime-feature
## Read Operations (deduplicated)
- src/api/endpoints.ts
- src/components/Auth.tsx
- config/env.ts
## Key Findings
- Auth system uses JWT
- API has 15 endpoints
- Config needs migration
## User Prompts (summarized)
1. "Implement OAuth2 flow"
2. "Add refresh token logic"
[Excluded: full write operations, detailed read contents, tool execution details]
```
**Usage:**
```bash
# Agent 1: Context exploding at 180k
# Automatic bundle saved
# Agent 2: Fresh start (10k base)
/loadbundle /path/to/context-bundle-<timestamp>.md
# Agent 2 now has 70% of Agent 1's context in ~15k tokens
# Total: 25k tokens vs. 180k (86% reduction)
```
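Bundle generation is essentially log filtering. A minimal sketch, assuming a transcript logged as a JSON list of typed entries (the entry format here is hypothetical; adapt it to whatever your hooks record):
```python
import json

def build_bundle(transcript_path: str, bundle_path: str) -> None:
    entries = json.load(open(transcript_path))
    # Keep only cheap, high-signal items: file names and user intent.
    reads = {e["file"] for e in entries if e.get("type") == "read"}
    prompts = [e["text"] for e in entries if e.get("type") == "user_prompt"]
    with open(bundle_path, "w") as f:
        f.write("## Context Bundle\n\n## Read Operations (deduplicated)\n")
        f.writelines(f"- {path}\n" for path in sorted(reads))
        f.write("\n## User Prompts (summarized)\n")
        f.writelines(f"{i}. {p}\n" for i, p in enumerate(prompts, 1))
        # Deliberately excluded: write contents, tool outputs, file bodies
```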
#### Tactic 7: Composable Workflows (Scout-Plan-Build)
**Problem:** Single agent searching + planning + building = context explosion
```text
Monolithic Agent:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**Solution:** Break into composable steps that delegate
```text
/scout-plan-build workflow:
Step 1: /scout (delegates to 4 parallel sub-agents)
├→ Sub-agents search codebase: 4 × 15k = 60k total
├→ Output: relevant-files.md (5k tokens)
└→ Primary agent context: unchanged
Step 2: /plan-with-docs
├→ Reads relevant-files.md: 5k tokens
├→ Scrapes docs: 8k tokens
├→ Creates plan: 3k tokens
└→ Total added: 16k tokens
Step 3: /build
├→ Reads plan: 3k tokens
├→ Implements: 30k tokens
└→ Total added: 33k tokens
Final primary agent context: 10k + 16k + 33k = 59k tokens
Savings: 106k tokens (64% reduction)
```
**Why this works:** Scout step offloads searching from planner (R&D: Reduce + Delegate)
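The composition can be pictured as three functions passing artifacts through files, so no step inherits another step's working context. A sketch assuming a hypothetical `run_agent(name, prompt, ...)` launcher:
```python
from pathlib import Path

# `run_agent` is a stand-in for launching a fresh, focused agent.
def scout(task: str) -> Path:
    out = Path("relevant-files.md")
    run_agent("scout", f"Find files for: {task}", output=out)
    return out

def plan(files: Path) -> Path:
    out = Path("plan.md")
    run_agent("planner", f"Plan using only the files listed in {files}", output=out)
    return out

def build(plan_file: Path) -> None:
    run_agent("builder", f"Implement the plan in {plan_file}")

build(plan(scout("migrate to new SDK")))
```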
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Agents working autonomously while you're AFK
#### Tactic 8: Focused Agents (One Agent, One Task)
**Anti-pattern:**
```text
Super Agent (trying to do everything):
├── API development
├── UI implementation
├── Database migrations
├── Testing
├── Documentation
├── Deployment
└── Context: 170k tokens (85% used)
```
**Pattern:**
```text
Focused Agent Fleet:
├── Agent 1: API only (30k tokens)
├── Agent 2: UI only (35k tokens)
├── Agent 3: DB only (20k tokens)
├── Agent 4: Tests only (25k tokens)
├── Agent 5: Docs only (15k tokens)
└── Each agent: <35k tokens (max 18% per agent)
```
**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."
#### Tactic 9: Deletable Agents
**Pattern:**
```bash
# Create agent for specific task
/create-agent docs-writer "Document frontend components"
# Agent completes task (used 30k tokens)
# DELETE agent immediately
/delete-agent docs-writer
# Result: 30k tokens freed for next agent
```
**Lifecycle:**
```text
1. Create agent → Task-specific context loaded
2. Agent works → Context grows to completion
3. Agent completes → Context maxed out
4. DELETE agent → Context freed
5. Create new agent → Fresh start
6. Repeat
```
**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
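In code, the lifecycle is a create-work-delete loop. A sketch with a hypothetical `Agent` handle standing in for whatever creates and destroys real agents:
```python
# Deleting the agent releases its entire context before the next task.
class AgentPool:
    def run_task(self, name: str, task: str) -> str:
        agent = Agent(name)         # fresh, task-specific context
        result = agent.work(task)   # context grows to completion
        agent.delete()              # context freed (tokens reclaimed)
        return result               # keep only the small result

pool = AgentPool()
for task in ["document components", "write tests", "update docs"]:
    pool.run_task("worker", task)   # each iteration starts near 0 tokens
```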
#### Tactic 10: Background Agent Delegation
**Problem:** You're in the loop, waiting for agent to finish long task
**Solution:** Delegate to background agent, continue working
```bash
# In-loop (you wait, your context stays open)
/implement-feature "Build auth system"
# Your terminal blocked for 20 minutes
# Context accumulates: 150k tokens
# Out-of-loop (you continue working)
/background "Build auth system" \
--model opus \
--report agents/auth-report.md
# Background agent works independently
# Your terminal freed immediately
# Background agent context isolated
# You get notified when complete
```
**Context protection:**
- Primary agent: 10k tokens (just manages job queue)
- Background agent: 150k tokens (isolated, will be deleted)
- Your interactive session: 10k tokens (protected)
#### Tactic 11: Orchestrator Sleep Pattern
**Problem:** Orchestrator observing all agent work = context explosion
```text
Orchestrator watches everything:
├── Scout 1 work: 15k tokens observed
├── Scout 2 work: 15k tokens observed
├── Scout 3 work: 15k tokens observed
├── Planner work: 25k tokens observed
├── Builder work: 35k tokens observed
└── Orchestrator context: 105k tokens
```
**Solution:** Orchestrator sleeps while agents work
```text
Orchestrator pattern:
1. Create scouts → 3k tokens (commands only)
2. SLEEP (not observing)
3. Wake every 15s, check status → 1k tokens
4. Scouts complete, read outputs → 5k tokens
5. Create planner → 2k tokens
6. SLEEP (not observing)
7. Wake every 15s, check status → 1k tokens
8. Planner completes, read output → 3k tokens
9. Create builder → 2k tokens
10. SLEEP (not observing)
Orchestrator final context: 17k tokens ✅
vs. 105k if watching everything (84% reduction)
```
**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
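A compressed sketch of that loop, with a hypothetical `fleet` handle; the comments mark the only points where the orchestrator spends tokens:
```python
import time

def orchestrate(fleet, stages):
    for stage in stages:
        fleet.create_and_command(stage)    # few k tokens: commands only
        while not fleet.stage_complete():
            time.sleep(15)                 # sleeping: zero tokens spent
        fleet.read_output_summary(stage)   # few k tokens: summary only
```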
## Monitoring Context Health
### The /context Command
```bash
/context
# Healthy agent (beginner level):
messages: 8%
system_tools: 5%
custom_agents: 2%
---
Total used: 15% ✅ (85% free)
# Warning (intermediate):
messages: 45%
mcp_tools: 18%
system_tools: 5%
---
Total used: 68% ⚠️ (32% free, approaching limits)
# Danger (needs intervention):
messages: 72%
mcp_tools: 24%
system_tools: 5%
---
Total used: 101% ❌ (context overflow!)
```
### Success Metrics by Level
| Level | Target Context Free | What This Enables |
|-------|---------------------|-------------------|
| Beginner | 85-90% | Basic tasks without running out |
| Intermediate | 60-75% | Complex tasks with breathing room |
| Advanced | 40-60% | Multi-step workflows without overflow |
| Agentic | Per-agent 60-80% | Fleet of focused agents |
### Warning Signs
**Your context window is in danger when:**
**Single agent exceeds 150k tokens**
- Solution: Split work across multiple agents
**Agent needs to read >20 files**
- Solution: Use scout agents to find relevant subset
**`/context` shows >80% used**
- Solution: Start fresh agent, use context bundles
**Agent gets slower/less accurate**
- Solution: Check context usage, delegate to sub-agents
**Autocompact buffer active**
- Solution: Disable it, reclaim 20%+ tokens
## Context Window Hard Limits
> "Context window is a hard limit. We have to respect this and work around it."
### The Reality
```text
Claude Opus 200k limit:
├── System prompt: ~8k tokens (4%)
├── Available tools: ~5k tokens (2.5%)
├── MCP servers: 0-24k tokens (0-12%)
├── CLAUDE.md: 0-23k tokens (0-11.5%)
├── Custom agents: ~2k tokens (1%)
└── Available for work: 138-185k tokens (69-92.5%)
Best case (optimized): 185k available
Worst case (unoptimized): 138k available
Difference: 47k tokens (25% of total capacity!)
```
### Real Example from the Field
> "We were 14% away from exploding our context in our scout-plan-build workflow."
```text
Scout-Plan-Build execution:
├── Base context: 15k tokens
├── Scout work (4 sub-agents): +40k tokens
├── Planner work: +35k tokens
├── Builder work: +80k tokens
└── Total: 170k tokens
With autocompact buffer (22%):
170k / 0.78 = 218k tokens
❌ Exceeds 200k limit by 18k (9% overflow)
Without autocompact buffer:
170k / 1.0 = 170k tokens
✅ Within limits with 30k buffer (15% free)
```
**Lesson:** Every percentage point matters when approaching limits.
## Common Context Explosion Patterns
### Pattern 1: The Sponge Agent
**Symptoms:**
- Agent reads entire codebase
- Opens 50+ files
- Context grows 10k tokens every few minutes
**Cause:** No filtering strategy
**Fix:**
```bash
# Before: Agent reads everything
Agent: "Analyzing codebase..."
[reads 100 files = 150k tokens]
# After: Scout first
/scout "Find files related to authentication"
# Scout outputs: 5 relevant files
Agent reads only those 5 files = 8k tokens
```
### Pattern 2: The Accumulator
**Symptoms:**
- Long conversation
- Many tool calls
- Context steadily grows to limit
**Cause:** Not resetting agent between phases
**Fix:**
```bash
# Phase 1: Exploration
[Agent explores, context hits 120k]
# Phase 2: Implementation
# ❌ Bad: Continue same agent (will overflow)
# ✅ Good: New agent with context bundle
/loadbundle context-from-phase-1.md
# Fresh agent (15k) + bundle (20k) = 35k tokens
# Ready for implementation without overflow
```
### Pattern 3: The Observer
**Symptoms:**
- Orchestrator context growing rapidly
- Watching all sub-agent work
- Can't coordinate more than 2-3 agents
**Cause:** Not using sleep pattern
**Fix:**
```python
# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result                 # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)
orchestrator.sleep()                  # Not observing
orchestrator.wake_and_check_status()  # Only reads summaries
```
## The "200k is Plenty" Principle
> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."
**The mindset shift:**
```text
Beginner thinking:
"I need a bigger context window"
"If only I had 500k tokens..."
"My task is too complex for 200k"
Expert thinking:
"I need better context management"
"I'm overloading a single agent"
"I should split this across focused agents"
```
**The truth:** Most context explosions are design problems, not capacity problems.
### Why 200k is Sufficient
**With proper protection:**
```text
Task: Refactor authentication across 50-file codebase
Approach 1 (Single Agent - fails):
├── Agent reads 50 files: 75k tokens
├── Agent plans changes: 20k tokens
├── Agent implements: 80k tokens
├── Agent tests: 30k tokens
└── Total: 205k tokens ❌ (overflow by 5k)
Approach 2 (Multi-Agent - succeeds):
├── Scout finds relevant 10 files: 15k tokens
├── Planner creates strategy: 20k tokens (new agent)
├── Builder 1 (auth logic): 35k tokens (new agent)
├── Builder 2 (UI changes): 30k tokens (new agent)
├── Tester verifies: 25k tokens (new agent)
└── Max per agent: 35k tokens ✅ (all within limits)
```
## Integration with Other Patterns
Context window protection enables:
**Progressive Disclosure:**
- Reduces: Minimal static context
- Enables: Dynamic loading via priming
**Core 4 Management:**
- Protects: Context (pillar #1)
- Enables: Better model/prompt/tools choices
**Orchestration:**
- Requires: Context protection (orchestrator sleep)
- Enables: Fleet management without overflow
**Observability:**
- Monitors: Context usage via hooks
- Prevents: Unnoticed context explosion
## Key Principles
1. **Reduce and Delegate** - The only two strategies that matter
2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose
3. **Agents are deletable** - Free context by removing completed agents
4. **200k is plenty** - Context explosions are design problems
5. **Monitor constantly** - `/context` command is your best friend
6. **Orchestrators must sleep** - Don't observe all agent work
7. **Context bundles over full replay** - 70% of context in 10% of tokens
## Source Attribution
**Primary sources:**
- Elite Context Engineering (R&D framework, 4 levels, all tactics)
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)
**Supporting sources:**
- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
- Sub-Agents (sub-agent delegation, context isolation)
**Key quotes:**
- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
- "A focused agent is a performant agent." (Elite Context Engineering)
- "We were 14% away from exploding our context." (Claude 2.0)
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)
## Related Documentation
- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar
---
**Remember:** Context window management separates beginners from experts. Master it, and you can scale infinitely with focused agents.


@@ -0,0 +1,434 @@
# Decision Framework: Choosing the Right Claude Code Component
This guide helps you choose the right Claude Code component for your specific task. **Always start with prompts**—master the primitive first before scaling to other components.
## Table of Contents
- [The Decision Tree](#the-decision-tree)
- [Quick Reference: Decision Matrix](#quick-reference-decision-matrix)
- [When to Use Each Component](#when-to-use-each-component)
- [Use Skills When](#use-skills-when)
- [Use Sub-Agents When](#use-sub-agents-when)
- [Use Slash Commands When](#use-slash-commands-when)
- [Use MCP Servers When](#use-mcp-servers-when)
- [Use Hooks When](#use-hooks-when)
- [Use Plugins When](#use-plugins-when)
- [Use Case Examples from the Field](#use-case-examples-from-the-field)
- [Composition Rules and Boundaries](#composition-rules-and-boundaries)
- [What Can Compose What](#what-can-compose-what)
- [Critical Composition Rules](#critical-composition-rules)
- [The Proper Evolution Path](#the-proper-evolution-path)
- [Stage 1: Start with a Prompt](#stage-1-start-with-a-prompt)
- [Stage 2: Add Sub-Agent if Parallelism Needed](#stage-2-add-sub-agent-if-parallelism-needed)
- [Stage 3: Create Skill When Management Needed](#stage-3-create-skill-when-management-needed)
- [Stage 4: Add MCP if External Data Needed](#stage-4-add-mcp-if-external-data-needed)
- [Common Decision Anti-Patterns](#common-decision-anti-patterns)
- [Anti-Pattern 1: Converting All Slash Commands to Skills](#anti-pattern-1-converting-all-slash-commands-to-skills)
- [Anti-Pattern 2: Using Skills for One-Off Tasks](#anti-pattern-2-using-skills-for-one-off-tasks)
- [Anti-Pattern 3: Skipping the Primitive](#anti-pattern-3-skipping-the-primitive)
- [Anti-Pattern 4: Using Sub-Agents When Context Matters](#anti-pattern-4-using-sub-agents-when-context-matters)
- [Anti-Pattern 5: Forgetting MCP is for External Only](#anti-pattern-5-forgetting-mcp-is-for-external-only)
- [Decision Checklist](#decision-checklist)
- [Summary: The Golden Rules](#summary-the-golden-rules)
## The Decision Tree
Start here when deciding which component to use:
```text
1. START HERE: Build a Prompt (Slash Command)

2. Need parallelization or isolated context?
   YES → Use Sub-Agent
   NO  → Continue

3. External data/service integration?
   YES → Use MCP Server
   NO  → Continue

4. One-off task (simple, direct)?
   YES → Use Slash Command
   NO  → Continue

5. Repeatable workflow (pattern detection)?
   YES → Use Agent Skill
   NO  → Continue

6. Lifecycle event automation?
   YES → Use Hook
   NO  → Continue

7. Sharing/distributing to team?
   YES → Use Plugin
   NO  → Default to Slash Command (prompt)
```
**Critical Rule:** Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
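For reference, the same tree as a first-match-wins function; the parameters are simply the tree's questions, answered as booleans:
```python
def choose_component(parallel_or_isolated: bool, external_data: bool,
                     one_off: bool, repeatable_workflow: bool,
                     lifecycle_event: bool, team_distribution: bool) -> str:
    if parallel_or_isolated:
        return "Sub-Agent"
    if external_data:
        return "MCP Server"
    if one_off:
        return "Slash Command"
    if repeatable_workflow:
        return "Agent Skill"
    if lifecycle_event:
        return "Hook"
    if team_distribution:
        return "Plugin"
    return "Slash Command"  # default: start with the prompt primitive
```
Order matters: parallelism and external integration outrank the one-off/repeatable split, exactly as in the tree.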
## Quick Reference: Decision Matrix
| Task Type | Component | Reason |
|-----------|-----------|---------|
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
| External data/service access | MCP Server | Integration point |
| Parallel/isolated work | Sub-Agent | Context isolation |
| Parallel workflow tasks | Sub-Agent | **Whenever you see "parallel", think sub-agents** |
| One-off task | Slash Command | Simple, direct |
| Lifecycle automation | Hook | Event-driven |
| Team distribution | Plugin | Packaging |
## When to Use Each Component
### Use Skills When
**Signal keywords:** "automatic," "repeat," "manage," "workflow"
**Criteria:**
- You have a **REPEAT** problem that needs **MANAGEMENT**
- Multiple related operations need coordination
- You want **automatic** behavior (agent-invoked)
- The problem domain requires orchestration of multiple components
**Example scenarios:**
- Managing git work trees (create, list, remove, merge, update)
- Detecting style guide violations across codebase
- Automatic PDF text extraction and processing
- Video processing workflows with multiple steps
**NOT for:**
- One-off tasks → Use Slash Command instead
- Simple operations → Use Slash Command instead
- Problems solved well by a single prompt → Don't over-engineer
**Remember:** Skills are for managing problem domains, not solving one-off tasks.
### Use Sub-Agents When
**Signal keywords:** "parallel," "scale," "bulk," "isolated," "batch"
**Criteria:**
- **Parallelization** is needed
- **Context isolation** is required
- Scale tasks and batch operations
- You're okay with losing context afterward
- Each task can run independently
**Example scenarios:**
- Comprehensive security audits
- Fix & debug tests at scale
- Parallel workflow tasks
- Bulk operations on multiple files
- Isolated research that doesn't pollute main context
**NOT for:**
- Tasks that need to share context → Use main conversation
- Sequential operations → Use Slash Command or Skill
- Tasks that need to spawn more sub-agents → Hard limit: no nesting
**Critical constraint:** You must be okay with losing context afterward. Sub-agent context doesn't persist in the main conversation (unless you use resumable sub-agents).
**Golden rule:** "Whenever you see parallel, you should always just think sub-agents. Nothing else supports parallel calling."
### Use Slash Commands When
**Signal keywords:** "one-off," "simple," "quick," "manual"
**Criteria:**
- One-off tasks
- Simple repeatable actions
- You're starting a new workflow
- Building the primitive before composing
- You want manual control over invocation
**Example scenarios:**
- Git commit messages (one at a time)
- Create UI component
- Run specific code generation
- Execute a well-defined task
- Quick transformations
**Philosophy:** "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
**Remember:** Slash commands are the primitive foundation. Master these first before anything else.
### Use MCP Servers When
**Signal keywords:** "external," "database," "API," "service," "integration"
**Criteria:**
- External integrations are needed
- Data sources outside Claude Code
- Third-party services
- Database connections
- Real-time data access
**Example scenarios:**
- Connect to Jira
- Query databases (PostgreSQL, etc.)
- Fetch real-time weather data
- GitHub integration
- Slack integration
- Figma designs
**NOT for:**
- Internal orchestration → Use Skills instead
- Pure computation → Use Slash Command or Skill
**Clear rule:** External = MCP, Internal orchestration = Skills
**Context consideration:** MCP servers can "torch your context window" by loading all their context at startup, unlike Skills which use progressive disclosure.
### Use Hooks When
**Signal keywords:** "lifecycle," "event," "automation," "deterministic"
**Criteria:**
- Deterministic automation at lifecycle events
- Want to execute commands at specific moments
- Need to balance agent autonomy with deterministic control
- Workflow automation that should always happen
**Example scenarios:**
- Run linters before code submission
- Auto-format code after generation
- Trigger tests after file changes
- Capture context at specific points
**Philosophy:** "If you really want to scale, you need both" - agents AND deterministic workflows.
**Use for:** Adding determinism rather than always relying on the agent to decide.
### Use Plugins When
**Signal keywords:** "share," "distribute," "package," "team"
**Criteria:**
- Sharing/distributing to team
- Packaging multiple components together
- Reusable work across projects
- Team-wide extensions
**Example scenarios:**
- Distribute custom skills to team
- Bundle MCP servers for automatic start
- Share slash commands across projects
- Package hooks and configurations
**Philosophy:** "Plugins let you package and distribute these sets of work. This isn't super interesting. It's just a way to share and reuse cloud code extensions."
## Use Case Examples from the Field
Real examples with reasoning:
| Use Case | Component | Reasoning |
|----------|-----------|-----------|
| Automatic PDF text extraction | Agent Skill | Keyword "automatic", repeat behavior |
| Connect to Jira | MCP Server | External source |
| Comprehensive security audit | Sub-Agent | Scale, isolated context, not automatic |
| Generalized git commit messages | Slash Command | Simple one-step task |
| Query database | MCP Server | External data source (start here) |
| Fix/debug tests at scale | Sub-Agent | Parallel work, scale |
| Detect style guide violations | Agent Skill | Repeat behavior pattern |
| Fetch real-time weather | MCP Server | Third-party service integration |
| Create UI component | Slash Command | Simple one-off task |
| Parallel workflow tasks | Sub-Agent | Keyword "parallel" |
## Composition Rules and Boundaries
### What Can Compose What
**Skills (Top Compositional Layer):**
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Can use: Slash Commands
- ✅ Can use: Other Skills
- ❌ Cannot: Nest sub-agents/prompts directly (must use SlashCommand tool)
**Slash Commands (Primitive + Compositional):**
- ✅ Can use: Skills (via SlashCommand tool)
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Acts as: BOTH primitive AND composition point
**Sub-Agents (Execution Layer):**
- ✅ Can use: Slash Commands (via SlashCommand tool)
- ✅ Can use: Skills (via SlashCommand tool)
- ❌ CANNOT use: Other Sub-Agents (hard limit)
**MCP Servers (Integration Layer):**
- Lower-level unit, used BY skills
- Does not itself invoke skills
- Expose services to all components
### Critical Composition Rules
1. **Sub-Agents cannot nest** - No sub-agent spawning other sub-agents (prevents infinite nesting)
2. **Skills don't execute code** - They guide Claude to use available tools
3. **Slash commands can be invoked manually or via SlashCommand tool**
4. **Skills use the SlashCommand tool** to compose prompts and sub-agents
5. **No circular dependencies** - Skills can use other skills but cannot nest circularly
## The Proper Evolution Path
When building new capabilities, follow this progression:
### Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple prompt or slash command that accomplishes the core task.
**Example (Git Work Trees):** Create one work tree
```bash
/create-worktree feature-branch
```
**When to stay here:** The task is one-off or infrequent.
### Stage 2: Add Sub-Agent if Parallelism Needed
**Goal:** Scale to multiple parallel operations
If you need to do the same thing many times in parallel, use a sub-agent.
**Example (Git Work Trees):** Create multiple work trees in parallel
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**When to stay here:** Parallel execution is the only requirement, no orchestration needed.
### Stage 3: Create Skill When Management Needed
**Goal:** Bundle multiple related operations
When the problem grows to require management, create a skill.
**Example (Git Work Trees):** Manage work trees (create, list, remove, merge, update)
Now you have a cohesive work tree manager skill that:
- Creates new work trees
- Lists existing work trees
- Removes old work trees
- Merges work trees
- Updates work tree status
**When to stay here:** Most domain-specific workflows stop here.
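Concretely, a skill is typically a `SKILL.md` whose frontmatter tells the agent when to invoke it. A hypothetical sketch for the work tree manager (field names and layout may differ in your version):
```markdown
---
name: worktree-manager
description: "Manage git work trees: create, list, remove, merge, and update. Use when the user asks for work tree operations."
---

# Worktree Manager

- Create: `git worktree add ../<branch> <branch>`
- List: `git worktree list`
- Remove: `git worktree remove ../<branch>`
- Merge/update: delegate to the existing /create-worktree and related commands
```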
### Stage 4: Add MCP if External Data Needed
**Goal:** Integrate external systems
Only add MCP servers when you need data from outside Claude Code.
**Example (Git Work Trees):** Query external repo metadata from GitHub API
Now your skill can query GitHub for:
- Branch protection rules
- CI/CD status
- Pull request information
**Final state:** Full-featured work tree manager with external integration.
## Common Decision Anti-Patterns
### ❌ Anti-Pattern 1: Converting All Slash Commands to Skills
**Mistake:** "I'm going to convert all my slash commands to skills because skills are better."
**Why wrong:** Skills are for repeatable workflows that need management, not simple one-off tasks. Slash commands are the primitive—you need them.
**Correct approach:** Keep slash commands for simple tasks. Only create a skill when you're managing a problem domain with multiple related operations.
### ❌ Anti-Pattern 2: Using Skills for One-Off Tasks
**Mistake:** "I need to create a UI component once, so I'll build a skill for it."
**Why wrong:** Skills are for repeat problems. One-off tasks should use slash commands.
**Correct approach:** Use a slash command for the one-off task. If you find yourself doing it repeatedly, then consider a skill.
### ❌ Anti-Pattern 3: Skipping the Primitive
**Mistake:** "I'm going to start by building a skill because it's more advanced."
**Why wrong:** If you don't master prompts, you can't build effective skills. Everything is prompts in the end.
**Correct approach:** Always start with a prompt. Build the primitive first. Scale up only when needed.
### ❌ Anti-Pattern 4: Using Sub-Agents When Context Matters
**Mistake:** "I'll use a sub-agent for this research task and then reference the findings later."
**Why wrong:** Sub-agent context is isolated. You lose it after the sub-agent finishes (unless using resumable sub-agents).
**Correct approach:** If you need the context later, do the work in the main conversation or use a resumable sub-agent.
### ❌ Anti-Pattern 5: Forgetting MCP is for External Only
**Mistake:** "I'll build an MCP server to orchestrate internal workflows."
**Why wrong:** MCP servers are for external integrations. Internal orchestration should use skills.
**Correct approach:** MCP = external, Skills = internal orchestration. Keep them separate.
## Decision Checklist
Before you start building, ask yourself:
**Basic Questions:**
- [ ] Have I started with a prompt? (Non-negotiable)
- [ ] Is this a one-off task or repeatable?
- [ ] Do I need external data or services?
- [ ] Is parallelization required?
- [ ] Am I okay losing context after execution?
**Composition Questions:**
- [ ] Am I trying to nest sub-agents? (Not allowed)
- [ ] Am I converting a simple slash command to a skill? (Probably wrong)
- [ ] Am I using MCP for internal orchestration? (Should use skills)
- [ ] Have I considered the evolution path? (Prompt → Sub-agent → Skill → MCP)
**Context Questions:**
- [ ] Will this torch my context window? (MCP consideration)
- [ ] Do I need progressive disclosure? (Skills benefit)
- [ ] Is context isolation critical? (Sub-agent benefit)
- [ ] Will I need this context later? (Don't use sub-agent)
## Summary: The Golden Rules
1. **Always start with prompts** - Master the primitive first
2. **"Parallel" keyword = Sub-Agents** - Nothing else supports parallel calling
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Skills compose upward, not downward** - Build from primitives
Remember The Core 4: Context, Model, Prompt, Tools. Master these fundamentals, and you'll master the compositional units.


@@ -0,0 +1,925 @@
# Hooks for Observability and Control
> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."
Claude Code hooks provide deterministic control over agent behavior and enable comprehensive monitoring of multi-agent systems.
## What Are Hooks?
**Hooks are lifecycle event handlers that let you execute custom code at specific points in Claude Code's execution.**
```text
Agent Lifecycle:
├── pre-tool-use hook → Before any tool executes
├── [Tool executes]
├── post-tool-use hook → After tool completes
├── notification hook → When agent needs input
├── sub-agent-stop hook → When sub-agent finishes
└── stop hook → When agent completes response
```
**Two killer use cases:**
1. **Observability** - Know what your agents are doing
2. **Control** - Steer and block agent behavior
## The Five Hooks
### 1. pre-tool-use
**When it fires:** Before any tool executes
**Use cases:**
- Block dangerous commands (`rm -rf`, destructive operations)
- Prevent access to sensitive files (`.env`, `credentials.json`)
- Log tool attempts before execution
- Validate tool parameters
**Available data:**
```json
{
  "toolName": "bash",
  "toolInput": {
    "command": "rm -rf /",
    "description": "Remove all files"
  }
}
```
**Example: Block dangerous commands**
```python
# .claude/hooks/pre-tool-use.py
# /// script
# dependencies = []
# ///
import sys
import json
import re

def is_dangerous_remove_command(tool_name, tool_input):
    """Block any rm -rf commands"""
    if tool_name != "bash":
        return False
    command = tool_input.get("command", "")
    dangerous_patterns = [
        r'\brm\s+-rf\b',
        r'\brm\s+-fr\b',
        r'\brm\s+.*-[rf].*\*',
    ]
    return any(re.search(pattern, command) for pattern in dangerous_patterns)

def main():
    input_data = json.load(sys.stdin)
    tool_name = input_data.get("toolName")
    tool_input = input_data.get("toolInput", {})
    if is_dangerous_remove_command(tool_name, tool_input):
        # Block the command
        output = {
            "allow": False,
            "message": "❌ Blocked dangerous rm command"
        }
    else:
        output = {"allow": True}
    print(json.dumps(output))

if __name__ == "__main__":
    main()
```
**Configuration in settings.json:**
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {},  // Empty = matches all tools
        "commands": [
          "uv run .claude/hooks/pre-tool-use.py"
        ]
      }
    ]
  }
}
```
### 2. post-tool-use
**When it fires:** After a tool completes execution
**Use cases:**
- Log tool execution results
- Track which tools are used most frequently
- Measure tool execution time
- Build observability dashboards
- Summarize tool output with small models
**Available data:**
```json
{
  "toolName": "write",
  "toolInput": {
    "file_path": "/path/to/file.py",
    "content": "..."
  },
  "toolResult": {
    "success": true,
    "output": "File written successfully"
  }
}
```
**Example: Event logging with summarization**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import os
from anthropic import Anthropic

def summarize_event(tool_name, tool_input, tool_result):
    """Use Haiku to summarize what happened"""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"""Summarize this tool execution in 1 sentence:
Tool: {tool_name}
Input: {json.dumps(tool_input, indent=2)}
Result: {json.dumps(tool_result, indent=2)}
Be concise and focus on what was accomplished."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Small, fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def main():
    input_data = json.load(sys.stdin)
    # Generate summary using small model
    summary = summarize_event(
        input_data.get("toolName"),
        input_data.get("toolInput", {}),
        input_data.get("toolResult", {})
    )
    # Log the event with summary
    event = {
        "toolName": input_data["toolName"],
        "summary": summary,
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability server (shared helper, e.g. utils/send-event.py)
    send_to_server(event)

if __name__ == "__main__":
    main()
```
**Why small models?** "I've sent thousands of these events. I've spent less than 20 cents. This is where small fast models really shine."
### 3. notification
**When it fires:** When Claude Code needs user input (permission request)
**Use cases:**
- Text-to-speech notifications
- Send alerts to phone/Slack
- Log permission requests
- Auto-approve specific tools
**Available data:**
```json
{
  "message": "Your agent needs your input",
  "context": {
    "toolName": "bash",
    "command": "bun run apps/hello.ts"
  }
}
```
**Example: Text-to-speech notification**
```python
# .claude/hooks/notification.py
import sys
import json
import subprocess

def speak(text):
    """Use 11Labs API for text-to-speech"""
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        text
    ])

def main():
    input_data = json.load(sys.stdin)
    message = input_data.get("message", "Your agent needs your input")
    # Speak the notification
    speak(message)
    # Log it
    print(json.dumps({"notified": True}))

if __name__ == "__main__":
    main()
```
### 4. stop
**When it fires:** Every time Claude Code finishes responding
**Use cases:**
- Copy full chat transcript for analysis
- Completion notifications (text-to-speech)
- Session logging
- Performance metrics
- Agent output summarization
**Available data:**
```json
{
  "transcriptPath": "/path/to/chat-transcript.json",
  "sessionId": "abc123",
  "timestamp": "2025-01-05T14:30:00Z"
}
```
**Example: Save full conversation**
```python
# .claude/hooks/stop.py
import sys
import json
import shutil
import subprocess
from pathlib import Path
from datetime import datetime

def main():
    input_data = json.load(sys.stdin)
    transcript_path = input_data.get("transcriptPath")
    if not transcript_path:
        return
    # Copy transcript to logs directory
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    session_id = input_data.get("sessionId", "unknown")
    logs_dir = Path(".claude/logs")
    logs_dir.mkdir(parents=True, exist_ok=True)
    dest = logs_dir / f"chat-{timestamp}-{session_id[:8]}.json"
    shutil.copy(transcript_path, dest)
    # Announce completion
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "All set and ready for your next step"
    ])
    print(json.dumps({"logged": True, "file": str(dest)}))

if __name__ == "__main__":
    main()
```
**Key insight:** "The stop event is the perfect time to copy the entire chat conversation. This is key for observability. What happened? How can we improve it?"
### 5. sub-agent-stop
**When it fires:** When a sub-agent completes its work
**Use cases:**
- Track parallel sub-agent completion
- Per-agent performance metrics
- Multi-agent orchestration logging
- Progress notifications for long-running jobs
**Available data:**
```json
{
  "subAgentId": "agent-123",
  "transcriptPath": "/path/to/sub-agent-transcript.json",
  "sessionId": "parent-abc123",
  "timestamp": "2025-01-05T14:32:00Z"
}
```
**Example: Sub-agent completion tracking**
```python
# .claude/hooks/sub-agent-stop.py
import sys
import json

def main():
    input_data = json.load(sys.stdin)
    # Log sub-agent completion
    event = {
        "type": "sub-agent-complete",
        "agentId": input_data.get("subAgentId"),
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability system (send_event/speak are shared helpers
    # from .claude/hooks/utils/)
    send_event(event)
    # Announce
    speak("Sub agent complete")

if __name__ == "__main__":
    main()
```
## Multi-Agent Observability Architecture
When scaling to 3, 5, 10+ agents, hooks enable comprehensive system visibility.
### Architecture Overview
```text
┌─────────────────────────────────────────────────────────────┐
│                       Multiple Agents                       │
│   Agent 1    Agent 2    Agent 3    ...    Agent N           │
│      │          │          │               │                │
│      └──────────┴──────────┴───────────────┘                │
│                          │                                  │
│                     Hooks fire                              │
│                          ↓                                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                       Bun/Node Server                       │
│  ┌────────────────┐           ┌──────────────┐              │
│  │ HTTP Endpoint  │──────────→│  SQLite DB   │              │
│  │    /events     │           │ (persistence)│              │
│  └────────────────┘           └──────────────┘              │
│          │                                                  │
│          └────────────→ WebSocket Broadcast                 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                   Web Client (Vue/React)                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Live Activity Pulse (1min/3min/5min windows)         │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │ Event Stream (filtered by app/session/event type)    │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │ Event Details (with AI summaries)                    │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
### Key Design Principles
**1. One-Way Data Stream**
```text
Agent → Hook → Server → Database + WebSocket → Client
```
"This one-way data stream keeps things really simple. Every agent is responsible for summarizing their work in the hook before they send it off."
**Benefits:**
- Simple architecture
- Easy to reason about
- No bidirectional complexity
- Fast real-time updates
**2. Event Summarization at the Edge**
```python
# In the hook (runs on agent side)
def send_event(app_name, event_type, event_data, summarize=True):
    if summarize:
        # Use Haiku to summarize before sending
        summary = summarize_with_haiku(event_data)
        event_data["summary"] = summary
    # Send to server
    requests.post("http://localhost:3000/events", json={
        "app": app_name,
        "type": event_type,
        "data": event_data,
        "sessionId": os.getenv("CLAUDE_SESSION_ID")
    })
```
**Why summarize at the edge?**
- Reduces server load
- Cheaper (uses small models locally)
- Human-readable summaries immediately available
- No server-side LLM dependencies
**3. Persistent + Real-Time Storage**
```sql
-- SQLite schema
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Dual persistence:**
- SQLite for historical queries and analysis
- WebSocket for live streaming to UI
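As an illustration of the persistence half, here is a minimal ingest sketch using only the Python standard library and the schema above. The real system described here uses a Bun/Node server, and the WebSocket broadcast is omitted:
```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("events.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_response(404); self.end_headers(); return
        body = self.rfile.read(int(self.headers["Content-Length"]))
        e = json.loads(body)
        # Persist the event exactly as the hook sent it
        db.execute(
            "INSERT INTO events (source_app, session_id, event_type, "
            "raw_payload, summary) VALUES (?, ?, ?, ?, ?)",
            (e.get("app"), e.get("sessionId"), e.get("eventType"),
             json.dumps(e.get("data", {})), e.get("summary")))
        db.commit()
        self.send_response(200); self.end_headers()

HTTPServer(("localhost", 3000), EventHandler).serve_forever()
```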
### Implementation Example
**Hook script structure:**
```python
# .claude/hooks/utils/send-event.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///
import sys
import json
import os
import requests
from anthropic import Anthropic

def summarize_with_haiku(event_data, event_type):
    """Generate 1-sentence summary using Haiku"""
    if event_type not in ["pre-tool-use", "post-tool-use"]:
        return None
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"Summarize this {event_type} event in 1 sentence: {json.dumps(event_data)}"
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def send_event(app_name, event_type, event_data, summarize=False):
    """Send event to observability server"""
    payload = {
        "app": app_name,
        "sessionId": os.getenv("CLAUDE_SESSION_ID", "unknown"),
        "eventType": event_type,
        "data": event_data,
        "timestamp": event_data.get("timestamp")
    }
    if summarize:
        payload["summary"] = summarize_with_haiku(event_data, event_type)
    try:
        response = requests.post(
            "http://localhost:3000/events",
            json=payload,
            timeout=1
        )
        return response.status_code == 200
    except Exception as e:
        # Don't break agent if observability fails
        print(f"Warning: Failed to send event: {e}", file=sys.stderr)
        return False

def main():
    if len(sys.argv) < 3:
        print("Usage: send-event.py <app-name> <event-type> [--summarize]")
        sys.exit(1)
    app_name = sys.argv[1]
    event_type = sys.argv[2]
    summarize = "--summarize" in sys.argv
    # Read event data from stdin
    event_data = json.load(sys.stdin)
    success = send_event(app_name, event_type, event_data, summarize)
    print(json.dumps({"sent": success}))

if __name__ == "__main__":
    main()
```
**Using in hooks:**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import subprocess

def main():
    input_data = json.load(sys.stdin)
    # Send to observability system with summarization
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",          # App name
        "post-tool-use",   # Event type
        "--summarize"      # Generate AI summary
    ], input=json.dumps(input_data), text=True)
    print(json.dumps({"logged": True}))

if __name__ == "__main__":
    main()
```
## Best Practices
### 1. Use Isolated Scripts (Astral UV Pattern)
**Why:** Hooks should be self-contained, portable, and not depend on your codebase.
```python
# /// script
# dependencies = ["anthropic", "requests"]
# ///
# Astral UV single-file script
# Runs independently with: uv run script.py
# Auto-installs dependencies
```
**Benefits:**
- Works in any codebase
- No virtual environment setup
- Portable across projects
- Easy to test in isolation
**Alternative: Bun for TypeScript**
```typescript
// .claude/hooks/post-tool-use.ts
// Run with: bun run post-tool-use.ts
import { readFileSync } from "fs";

const input = JSON.parse(readFileSync(0, "utf-8")); // fd 0 = stdin
// ... hook logic
```
### 2. Never Block the Agent
```python
def main():
    try:
        # Hook logic
        send_to_server(event)
    except Exception as e:
        # Log but don't fail
        print(f"Warning: {e}", file=sys.stderr)
        # Always output valid JSON
        print(json.dumps({"error": str(e)}))
```
**Rule:** If observability fails, the agent should continue working.
### 3. Use Small Fast Models for Summaries
```text
Cost comparison (1,000 events):
├── Opus: $15 (overkill for summaries)
├── Sonnet: $3 (still expensive)
└── Haiku: $0.20 ✅ (perfect for this)
```
"Thousands of events, less than 20 cents. Small fast cheap models shine here."
### 4. Hash Session IDs for UI Consistency
```python
import hashlib

def color_for_session(session_id):
    """Generate consistent color from session ID"""
    hash_val = int(hashlib.md5(session_id.encode()).hexdigest()[:6], 16)
    return f"#{hash_val:06x}"
```
**Result:** Same agent = same color in UI, making it easy to track.
### 5. Filter and Paginate Events
```javascript
// Client-side filtering
const filteredEvents = events
  .filter(e => e.app === selectedApp || selectedApp === "all")
  .filter(e => e.eventType === selectedType || selectedType === "all")
  .slice(0, 100); // Limit displayed events

// Auto-refresh
setInterval(() => fetchLatestEvents(), 5000);
```
### 6. Multiple Hooks Per Event
```json
{
  "hooks": {
    "stop": [
      {
        "matcher": {},
        "commands": [
          "uv run .claude/hooks/stop-chat-log.py",
          "uv run .claude/hooks/stop-tts.py",
          "uv run .claude/hooks/stop-notify.py"
        ]
      }
    ]
  }
}
```
**Hooks run sequentially** in the order specified.
### 7. Matcher Patterns for Selective Execution
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {
          "toolName": "bash"
        },
        "commands": ["uv run .claude/hooks/bash-validator.py"]
      },
      {
        "matcher": {
          "toolName": "write",
          "toolInput": {
            "file_path": "**/.env"
          }
        },
        "commands": ["uv run .claude/hooks/block-env-write.py"]
      }
    ]
  }
}
```
## Directory Structure Best Practice
```text
.claude/
├── commands/              # Slash commands
├── agents/                # Sub-agent definitions
└── hooks/                 # ← New essential directory
    ├── settings.json      # Hook configuration
    ├── pre-tool-use.py
    ├── post-tool-use.py
    ├── notification.py
    ├── stop.py
    ├── sub-agent-stop.py
    └── utils/             # Shared utilities
        ├── send-event.py
        ├── text-to-speech-elevenlabs.py
        ├── text-to-speech-openai.py
        └── summarize-haiku.py
```
## Real-World Use Cases
### Use Case 1: Block Dangerous Operations
```python
# .claude/hooks/pre-tool-use.py (pattern tables + matcher; plug
# is_blocked() into the main() shown earlier)
import re

BLOCKED_COMMANDS = [
    r'\brm\s+-rf\b',                 # rm -rf
    r'\bsudo\s+rm\b',                # sudo rm
    r'\bgit\s+push.*--force\b',      # git push --force
    r'\bdocker\s+system\s+prune\b',  # docker system prune
]

BLOCKED_FILES = [
    r'\.env$',
    r'credentials\.json$',
    r'\.ssh/id_rsa$',
    r'aws.*credentials',
]

def is_blocked(tool_name, tool_input):
    if tool_name == "bash":
        command = tool_input.get("command", "")
        return any(re.search(p, command) for p in BLOCKED_COMMANDS)
    if tool_name in ["read", "write", "edit"]:
        file_path = tool_input.get("file_path", "")
        return any(re.search(p, file_path) for p in BLOCKED_FILES)
    return False
```
### Use Case 2: Multi-Agent Task Board
```text
Observability UI showing:
Active Agents (5):
├── [Agent 1] Planning feature (12s ago)
├── [Agent 2] Writing tests (45s ago) ⚠️ Needs input
├── [Agent 3] Building UI (2m ago)
├── [Agent 4] Deploying (5m ago) ✅ Complete
└── [Agent 5] Monitoring (ongoing)
Recent Events (filtered: post-tool-use):
├── Agent 3: Wrote src/components/Button.tsx
├── Agent 1: Read src/api/endpoints.ts
├── Agent 4: Bash: git push origin main
└── Agent 2: Test failed: test/auth.test.ts
```
### Use Case 3: Long-Running AFK Agents
```bash
# Start agent with background work
/background "Implement entire auth system" --report agents/auth-report.md
# Agent works autonomously
# Hooks send notifications:
# - "Starting authentication module"
# - "Database schema created"
# - "Tests passing"
# - "All set and ready for your next step"
# You're notified via text-to-speech when complete
```
### Use Case 4: Debugging Agent Behavior
```python
# Filter stop events to analyze full chat transcripts
# (`events.filter` is pseudocode for querying the events DB)
for event in events.filter(type="stop"):
    transcript = json.load(open(event.transcriptPath))
    # Analyze:
    # - What files did agent read?
    # - What tools were used most?
    # - Where did agent get confused?
    # - What patterns led to errors?
```
## Performance Considerations
### Webhook Timeouts
```python
# Don't block agent on slow external services
try:
    requests.post(webhook_url, json=event, timeout=0.5)  # 500ms max
except requests.Timeout:
    # Log locally instead
    log_to_file(event)
```
### Database Size Management
```sql
-- Rotate old events
DELETE FROM events
WHERE timestamp < datetime('now', '-30 days');
-- Or archive
INSERT INTO events_archive SELECT * FROM events
WHERE timestamp < datetime('now', '-30 days');
DELETE FROM events
WHERE id IN (SELECT id FROM events_archive);
```
### Event Batching
```python
import atexit

# Batch events before sending (server_url defined elsewhere)
events_buffer = []

def send_event(event):
    events_buffer.append(event)
    if len(events_buffer) >= 10:
        flush_events()

def flush_events():
    if not events_buffer:
        return
    requests.post(server_url, json={"events": events_buffer})
    events_buffer.clear()

# Flush whatever is left when the process exits
atexit.register(flush_events)
```
## Integration with Observability Platforms
### Datadog
```python
from datadog import statsd

def send_to_datadog(event):
    statsd.increment(f"claude.tool.{event['toolName']}")
    statsd.histogram(f"claude.duration.{event['toolName']}", event['duration'])
```
### Prometheus
```python
from prometheus_client import Counter, Histogram

tool_counter = Counter('claude_tool_executions', 'Tool executions', ['tool_name'])
tool_duration = Histogram('claude_tool_duration_seconds', 'Tool duration', ['tool_name'])

def send_to_prometheus(event):
    tool_counter.labels(tool_name=event['toolName']).inc()
    tool_duration.labels(tool_name=event['toolName']).observe(event['duration'])
```
### Slack
```python
import os
import requests

def send_to_slack(event):
    if event['eventType'] == 'notification':
        requests.post(
            os.getenv("SLACK_WEBHOOK_URL"),
            json={"text": f"🤖 Agent needs input: {event['message']}"}
        )
```
## Key Principles
1. **If you don't measure it, you can't improve it** - Observability is critical for scaling agents
2. **Keep hooks simple and isolated** - Use single-file scripts (UV, bun, shell)
3. **Never block the agent** - Hooks should be fast and fault-tolerant
4. **Small models for summaries** - Haiku is perfect and costs pennies
5. **One-way data streams** - Simple architecture beats complex bidirectional systems
6. **Context, Model, Prompt** - Even with hooks, the big three still matter
## Source Attribution
**Primary source:** Multi-Agent Observability transcript (complete system architecture, WebSocket streaming, event summarization, SQLite persistence)
**Supporting source:** Hooked transcript (5 hooks fundamentals, pre-tool-use implementation, text-to-speech integration, isolated scripts pattern)
**Key quotes:**
- "When it comes to agentic coding, observability is everything." (Hooked)
- "This one-way data stream keeps things really simple." (Multi-Agent Observability)
- "Thousands of events, less than 20 cents. Small fast models shine here." (Multi-Agent Observability)
## Related Documentation
- [Hooks Reference](../reference/hooks-reference.md) - Complete API reference for all 5 hooks
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real observability systems in action
- [Core 4 Framework](../reference/core-4-framework.md) - Context, Model, Prompt, Tools
---
**Remember:** Observability isn't optional when scaling agents. If you can't see what they're doing, you can't scale them effectively.


@@ -0,0 +1,673 @@
# The Orchestrator Pattern
> "The rate at which you can create and command your agents becomes the constraint of your engineering output. When your agents are slow, you're slow."
The orchestrator pattern is **Level 5** of agentic engineering: managing fleets of agents through a single interface.
## The Journey to Orchestration
```text
Level 1: Base agents → Use agents out of the box
Level 2: Better agents → Customize prompts and workflows
Level 3: More agents → Run multiple agents
Level 4: Custom agents → Build specialized solutions
Level 5: Orchestration → Manage fleets of agents ← You are here
```
**Key realization:** Single agents hit context window limits. You need orchestration to scale beyond one agent.
## The Three Pillars
Multi-agent orchestration requires three components working together:
```text
┌─────────────────────────────────────────────────────────┐
│ 1. ORCHESTRATOR AGENT │
│ (Single interface to your fleet) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. CRUD FOR AGENTS │
│ (Create, Read, Update, Delete agents at scale) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. OBSERVABILITY │
│ (Monitor performance, costs, and results) │
└─────────────────────────────────────────────────────────┘
```
Without all three, orchestration fails. You need:
- **Orchestrator** to command agents
- **CRUD** to manage agent lifecycle
- **Observability** to understand what agents are doing
## Core Principle: The Orchestrator Sleeps
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. It has created and commanded our agents. Now, our agents are doing the work."
**The pattern:**
```text
1. User prompts Orchestrator
2. Orchestrator creates specialized agents
3. Orchestrator commands agents with detailed prompts
4. Orchestrator SLEEPS (stops consuming context)
5. Agents work autonomously
6. Orchestrator wakes periodically to check status
7. Orchestrator reports results to user
8. Agents are deleted
```
**Why orchestrator sleeps:**
- Protects its context window
- Avoids observing all agent work (too much information)
- Only wakes when needed to check status or command agents
**Example orchestrator sleep pattern:**
```python
# Illustrative sketch against a hypothetical orchestrator API

# Orchestrator commands agents
orchestrator.create_agent("scout", task="Find relevant files")
orchestrator.create_agent("builder", task="Implement changes")

# Orchestrator sleeps, checking status every 15s
while not orchestrator.all_agents_complete():
    orchestrator.sleep(15)  # not consuming context while asleep
    status = orchestrator.check_agent_status()
    orchestrator.log(status)

# Wake up to collect results
results = orchestrator.get_agent_results()
orchestrator.summarize_to_user(results)
```
## Orchestration Patterns
### Pattern 1: Scout-Plan-Build (Sequential Chaining)
**Use case:** Complex tasks requiring multiple specialized steps
**Flow:**
```text
User: "Migrate codebase to new SDK"
Orchestrator creates Scout agents (4 parallel)
├→ Scout 1: Search with Gemini
├→ Scout 2: Search with CodeX
├→ Scout 3: Search with Haiku
└→ Scout 4: Search with Flash
Scouts output: relevant-files.md with exact locations
Orchestrator creates Planner agent
├→ Reads relevant-files.md
├→ Scrapes documentation
└→ Outputs: detailed-plan.md
Orchestrator creates Builder agent
├→ Reads detailed-plan.md
├→ Executes implementation
└→ Tests and validates
```
**Why this works:**
- **Scout step offloads searching from Planner** (R&D framework: Reduce + Delegate)
- **Multiple scout models** provide diverse perspectives
- **Planner only sees relevant files**, not entire codebase
- **Builder focused on execution**, not planning
**Implementation:**
```bash
# Composable slash commands
/scout-plan-build "Migrate to new Claude Agent SDK"
# Internally runs:
/scout "Find files needing SDK migration"
/plan-with-docs docs=https://agent-sdk-docs.com
/build plan=agents/plans/sdk-migration.md
```
**Context savings:**
```text
Without scouts:
├── Planner searches entire codebase: 50k tokens
├── Planner reads irrelevant files: 30k tokens
└── Total wasted: 80k tokens
With scouts:
├── 4 scouts search in parallel (isolated contexts)
├── Planner reads only relevant-files.md: 5k tokens
└── Savings: 75k tokens (94% reduction)
```
### Pattern 2: Plan-Build-Review-Ship (Task Board)
**Use case:** Structured development lifecycle with quality gates
**Flow:**
```text
User: "Update HTML titles across application"
Task created → PLAN column
Orchestrator creates Planner agent
├→ Analyzes requirements
├→ Creates implementation plan
└→ Moves task to BUILD
Orchestrator creates Builder agent
├→ Reads plan
├→ Implements changes
├→ Runs tests
└→ Moves task to REVIEW
Orchestrator creates Reviewer agent
├→ Checks implementation against plan
├→ Validates tests pass
└→ Moves task to SHIP
Orchestrator creates Shipper agent
├→ Creates git commit
├→ Pushes to remote
└→ Task complete
```
**Why this works:**
- **Clear phases** with distinct responsibilities
- **Each agent focused** on single phase
- **Quality gates** between phases
- **Failure isolation** - if builder fails, planner work preserved
**Visual representation:**
```text
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ Task A │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Agent handoff:**
```python
# Orchestrator manages task board state
task = {
"id": "update-titles",
"status": "planning",
"assigned_agent": "planner-001",
"artifacts": []
}
# Planner completes
task["status"] = "building"
task["artifacts"].append("plan.md")
task["assigned_agent"] = "builder-001"
# Orchestrator hands off to builder
orchestrator.command_agent(
"builder-001",
f"Implement plan from {task['artifacts'][0]}"
)
```
### Pattern 3: Scout-Builder (Two-Stage)
**Use case:** UI changes, targeted modifications
**Flow:**
```text
User: "Create gray pills for app header information"
Orchestrator creates Scout
├→ Locates exact files and line numbers
├→ Identifies patterns and conventions
└→ Outputs: scout-report.md
Orchestrator creates Builder
├→ Reads scout-report.md
├→ Implements precise changes
└→ Outputs: modified files
Orchestrator wakes, verifies, reports
```
**Orchestrator sleep pattern:**
```python
# Orchestrator creates scout
orchestrator.create_agent("scout-header", task="Find header UI components")
# Orchestrator sleeps, checking every 15s
orchestrator.sleep_with_status_checks(interval=15)
# Scout completes, orchestrator wakes
scout_output = orchestrator.get_agent_output("scout-header")
# Orchestrator creates builder with scout's output
orchestrator.create_agent(
"builder-ui",
task=f"Create gray pills based on scout findings: {scout_output}"
)
# Orchestrator sleeps again
orchestrator.sleep_with_status_checks(interval=15)
```
## Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
**The problem:** A single agent doing everything explodes its context window
```text
Single Agent Approach:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**The solution:** Specialized agents with focused context
```text
Orchestrator Approach:
├── Orchestrator: 10k tokens (coordinates)
├── Scout 1: 15k tokens (searches)
├── Scout 2: 15k tokens (searches)
├── Planner: 25k tokens (plans using scout output)
├── Builder: 35k tokens (implements)
└── Total per agent: <35k tokens (max 18% per agent)
```
**Key principle:** Agents are deletable temporary resources
```text
1. Create agent for specific task
2. Agent completes task
3. DELETE agent (free memory)
4. Create new agent for next task
5. Repeat
```
**Example:**
```bash
# User: "Build documentation for frontend and backend"
# Orchestrator creates 3 agents
/create-agent frontend-docs "Document frontend components"
/create-agent backend-docs "Document backend APIs"
/create-agent qa-docs "Combine and QA both docs"
# Work completes...
# Delete all agents when done
/delete-all-agents
# Result: All agents gone, context freed
```
**Why delete agents:**
- Frees context windows for new work
- Prevents context accumulation
- Enforces single-purpose design
- Matches engineering principle: "The best code is no code at all"
## CRUD for Agents
Orchestrator needs full agent lifecycle control:
**Create:**
```python
agent_id = orchestrator.create_agent(
name="scout-api",
task="Find all API endpoints",
model="haiku", # Fast, cheap for search
max_tokens=100000
)
```
**Read:**
```python
# Check agent status
status = orchestrator.get_agent_status(agent_id)
# => {"status": "working", "progress": "60%", "context_used": "15k tokens"}
# Read agent output
output = orchestrator.get_agent_output(agent_id)
# => {"files_consumed": [...], "files_produced": [...]}
```
**Update:**
```python
# Command existing agent with new task
orchestrator.command_agent(
agent_id,
"Now implement the changes based on your findings"
)
```
**Delete:**
```python
# Single agent
orchestrator.delete_agent(agent_id)
# All agents
orchestrator.delete_all_agents()
```
## Observability Requirements
Without observability, orchestration is blind. You need:
### 1. Agent-Level Visibility
```text
For each agent, track:
├── Name and ID
├── Status (creating, working, complete, failed)
├── Context window usage
├── Model and cost
├── Files consumed
├── Files produced
└── Tool calls executed
```
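One way to hold this per-agent record is a small dataclass. A sketch with hypothetical field names mirroring the list above:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    agent_id: str
    name: str
    status: str = "creating"   # creating | working | complete | failed
    context_tokens: int = 0    # context window usage
    model: str = "sonnet"
    cost_usd: float = 0.0
    files_consumed: list[str] = field(default_factory=list)
    files_produced: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
```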
### 2. Cross-Agent Visibility
```text
Fleet overview:
├── Total agents active
├── Total context consumed
├── Total cost
├── Agent dependencies (who's waiting on whom)
└── Bottlenecks (slow agents blocking others)
```
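Building on the `AgentRecord` sketch above, the fleet overview is then a simple aggregation:

```python
def fleet_overview(agents: list[AgentRecord]) -> dict:
    # Roll per-agent records up into a fleet-level summary.
    active = [a for a in agents if a.status == "working"]
    return {
        "total_active": len(active),
        "total_context_tokens": sum(a.context_tokens for a in agents),
        "total_cost_usd": round(sum(a.cost_usd for a in agents), 2),
    }
```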
### 3. Real-Time Streaming
```text
User sees:
├── Agent creation events
├── Tool calls as they happen
├── Progress updates
├── Completion notifications
└── Error alerts
```
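The source system streams these events over WebSocket. A minimal client-side sketch, assuming a server listening at the hypothetical `ws://localhost:4000/stream` and the third-party `websockets` package:

```python
import asyncio
import json

import websockets  # pip install websockets

async def stream_event(event: dict):
    # One connection per event keeps the sketch simple; a real
    # emitter would hold a single long-lived connection instead.
    async with websockets.connect("ws://localhost:4000/stream") as ws:
        await ws.send(json.dumps(event))

if __name__ == "__main__":
    asyncio.run(stream_event({"eventType": "notification", "message": "demo"}))
```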
**Implementation:** See [Hooks for Observability](hooks-observability.md) for complete architecture
## Information Flow in Orchestrated Systems
```text
User
↓ (prompts)
Orchestrator
↓ (creates & commands)
Agent 1 → Agent 2 → Agent 3
↓ ↓ ↓
(results flow back up)
Orchestrator (summarizes)
User
```
**Critical understanding:** Agents never talk directly to the user. They report to the orchestrator.
**Example:**
```text
# User prompts the orchestrator
User → Orchestrator: "Summarize codebase"

# Orchestrator creates an agent with detailed instructions
Orchestrator → Agent:
    Read all files in src/
    Create markdown summary with:
    - Architecture overview
    - Key components
    - File structure
    - Tech stack
    Report results back to orchestrator (not user!)

# Agent completes, reports to the orchestrator
Agent → Orchestrator: "Summary complete at docs/summary.md"

# Orchestrator reports to the user
Orchestrator → User: "Codebase summary created with 3 main sections: architecture, components, and tech stack"
```
## When to Use Orchestration
### Use orchestration when
**Task requires 3+ specialized agents**
- Example: Scout + Plan + Build
**Context window exploding in single agent**
- Single agent using >150k tokens
**Need parallel execution**
- Multiple independent subtasks
**Quality gates required**
- Plan → Build → Review → Ship
**Long-running autonomous work**
- Agents work while you're AFK
### Don't use orchestration when
**Simple one-off task**
- Single agent sufficient
**Learning/prototyping**
- Orchestration adds complexity
**No observability infrastructure**
- You'll be blind to agent behavior
**Haven't mastered custom agents**
- Level 5 requires Level 4 foundation
## Practical Implementation
### Minimal Orchestrator Agent
**orchestrator-agent.md** (sub-agent definition):

````markdown
---
name: orchestrator
description: Manages fleet of agents for complex multi-step tasks
---
# Orchestrator Agent
You are an orchestrator agent managing a fleet of specialized agents.
## Your Tools
- create_agent(name, task, model): Create new agent
- command_agent(agent_id, task): Send task to existing agent
- get_agent_status(agent_id): Check agent progress
- get_agent_output(agent_id): Retrieve agent results
- delete_agent(agent_id): Remove completed agent
- delete_all_agents(): Clean up all agents
## Your Responsibilities
1. **Break down user requests** into specialized subtasks
2. **Create focused agents** for each subtask
3. **Command agents** with detailed instructions
4. **Monitor progress** without micromanaging
5. **Collect results** and synthesize for user
6. **Delete agents** when work is complete
## Orchestrator Sleep Pattern
After creating and commanding agents:
1. **SLEEP** - Stop consuming context
2. **Wake every 15-30s** to check agent status
3. **SLEEP again** if agents still working
4. **Wake when all complete** to collect results
DO NOT observe all agent work. This explodes your context window.
## Example Workflow
```text
User: "Migrate codebase to new SDK"
You:
1. Create scout agents (parallel search)
2. Command scouts to find SDK usage
3. SLEEP (check status every 15s)
4. Wake when scouts complete
5. Create planner agent
6. Command planner with scout results
7. SLEEP (check status every 15s)
8. Wake when planner completes
9. Create builder agent
10. Command builder with plan
11. SLEEP (check status every 15s)
12. Wake when builder completes
13. Summarize results for user
14. Delete all agents
```
## Key Principles
- **One agent, one task** - Don't overload agents
- **Sleep between phases** - Protect your context
- **Delete when done** - Treat agents as temporary
- **Detailed commands** - Don't assume agents know context
- **Results-oriented** - Every agent must produce concrete output
````
### Orchestrator Tools (SDK)
```python
# Assumes an SDK exposing an `mcptool` decorator and an `agent_manager`
# runtime for agent lifecycle management

# create_agent tool
@mcptool(
    name="create_agent",
    description="Create a new specialized agent"
)
def create_agent(params: dict) -> dict:
    name = params["name"]
    task = params["task"]
    model = params.get("model", "sonnet")
    agent_id = agent_manager.create(
        name=name,
        system_prompt=task,
        model=model
    )
    return {
        "agent_id": agent_id,
        "status": "created",
        "message": f"Agent {name} created"
    }

# command_agent tool
@mcptool(
    name="command_agent",
    description="Send task to existing agent"
)
def command_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    task = params["task"]
    result = agent_manager.prompt(agent_id, task)  # result surfaces via observability
    return {
        "agent_id": agent_id,
        "status": "commanded",
        "message": "Agent received task"
    }
```
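The remaining lifecycle tools follow the same shape. A sketch of `delete_agent` under the same assumed decorator and `agent_manager` (the `delete` call mirrors the CRUD section above):

```python
# delete_agent tool
@mcptool(
    name="delete_agent",
    description="Delete a completed agent and free its context"
)
def delete_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    agent_manager.delete(agent_id)
    return {
        "agent_id": agent_id,
        "status": "deleted",
        "message": "Agent removed and context freed"
    }
```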
## Trade-offs
### Benefits
- ✅ Scales beyond single agent limits
- ✅ Parallel execution (3x-10x speedup)
- ✅ Context window protection
- ✅ Specialized agent focus
- ✅ Quality gates between phases
- ✅ Autonomous out-of-loop work
### Costs
- ❌ Upfront investment to build
- ❌ Infrastructure complexity (database, WebSocket)
- ❌ More moving parts to manage
- ❌ Requires observability
- ❌ Orchestrator agent needs careful prompting
- ❌ Not worth it for simple tasks
## Key Quotes
> "The orchestrator agent is the first pattern where I felt the perfect combination of observability, customizability, and agents at scale."
>
> "Treat your agents as deletable temporary resources that serve a single purpose."
>
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. Now, our agents are doing the work."
>
> "200k context window is plenty. You're just stuffing a single agent with too much work."
## Source Attribution
**Primary source:** One Agent to Rule Them All (orchestrator architecture, three pillars, sleep pattern, CRUD)
**Supporting sources:**
- Claude 2.0 (scout-plan-build workflow, composable prompts)
- Custom Agents (plan-build-review-ship task board)
- Sub-Agents (information flow, delegation patterns)
## Related Documentation
- [Hooks for Observability](hooks-observability.md) - Required for orchestration
- [Context Window Protection](context-window-protection.md) - Why orchestration matters
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real orchestration systems
---
**Remember:** Orchestration is Level 5. Master Levels 1-4 first. Then build your fleet.