Initial commit

Zhongwei Li
2025-11-29 18:00:36 +08:00
commit c83b4639c5
49 changed files with 18594 additions and 0 deletions


@@ -0,0 +1,158 @@
# Context in Composition
**Strategic framework for managing context when composing multi-agent systems.**
## The Core Problem
The context window is your most precious resource when composing multiple agents. A focused agent is a performant agent.
**The Reality:**
```text
Single agent doing everything:
├── Context explodes to 150k+ tokens
├── Performance degrades
└── Eventually fails or times out
Multi-agent composition:
├── Each agent: <40k tokens
├── Main agent: Stays lean
└── Work completes successfully
```
## The R&D Framework
There are only two strategies for managing context in multi-agent systems:
**R - Reduce**
- Minimize what enters context windows
- Remove unused MCP servers (can consume 24k+ tokens)
- Shrink static CLAUDE.md files
- Use context priming instead of static loading
**D - Delegate**
- Move work to sub-agents' isolated contexts
- Use background agents for autonomous work
- Employ orchestrator sleep patterns
- Treat agents as deletable temporary resources
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Mastery
### Level 1: Beginner - Stop Wasting Tokens
**Focus:** Resource management
**Key Actions:**
- Remove unused MCP servers (reclaim 20k+ tokens)
- Minimize CLAUDE.md (<1k tokens)
- Disable autocompact buffer (reclaim 20%)
**Success Metric:** 85-90% context window free at startup
**Move to Level 2 when:** Resources cleaned but still rebuilding context for different tasks
---
### Level 2: Intermediate - Load Selectively
**Focus:** Dynamic context loading
**Key Actions:**
- Context priming (`/prime` commands vs. static files)
- Sub-agent delegation for parallel work
- Composable workflows (scout-plan-build)
**Success Metric:** 60-75% context window free during work
**Move to Level 3 when:** Managing multiple agents but struggling with handoffs
---
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Agent-to-agent context transfer
**Key Actions:**
- Context bundles (60-70% transfer in 10% tokens)
- Monitor context limits proactively
- Chain multiple agents without overflow
**Success Metric:** Per-agent context <60k tokens, successful handoffs
**Move to Level 4 when:** Need agents working autonomously while you do other work
---
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Fleet orchestration
**Key Actions:**
- Background agents (`/background` command)
- Dedicated agent environments
- Orchestrator sleep patterns
- Zero-touch execution
**Success Metric:** Agents ship work end-to-end without intervention
---
## When Context Becomes a Composition Issue
**Trigger 1: Single Agent Exceeds 150k Tokens**
→ Delegate to sub-agents with isolated contexts
**Trigger 2: Agent Reading >20 Files**
→ Use scout agents to identify relevant subset first
**Trigger 3: `/context` Shows >80% Used**
→ Start fresh agent, use context bundles for handoff
**Trigger 4: Performance Degrading Mid-Workflow**
→ Split workflow across multiple focused agents
**Trigger 5: Same Analysis Repeated Multiple Times**
→ Context overflow forcing re-reads; delegate earlier
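These triggers are mechanical enough to encode. A minimal sketch, assuming hypothetical inputs you would pull from `/context` output and your own bookkeeping (none of these names are a real Claude Code API):
```python
# Hypothetical trigger check mapping the five triggers above to actions.
def composition_action(context_tokens: int, files_read: int,
                       context_pct_used: float, repeated_reads: int) -> str:
    if context_tokens > 150_000:
        return "delegate to sub-agents with isolated contexts"
    if files_read > 20:
        return "run a scout agent to narrow the file set first"
    if context_pct_used > 0.80:
        return "start a fresh agent and hand off via context bundle"
    if repeated_reads > 1:
        return "context overflow is forcing re-reads; delegate earlier"
    return "stay the course: single focused agent"
```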
## Composition Patterns by Level
**Beginner:** Single agent, minimal static context
**Intermediate:** Main agent + sub-agents for parallel work
**Advanced:** Agent chains with context bundles for handoff
**Agentic:** Orchestrator + fleet of specialized agents
## Key Principles
1. **Focused agents perform better** - Single purpose, minimal context
2. **Agents are deletable** - Free context by removing completed agents
3. **200k is plenty** - Context explosions are design problems, not capacity problems
4. **Orchestrators must sleep** - Don't observe all sub-agent work
5. **Context bundles over full replay** - 70% context in 10% tokens
## Implementation Details
For practical patterns, see:
- [Multi-Agent Context Isolation](../reference/multi-agent-context-isolation.md) - Parallel execution, context bundling
- [Orchestrator Pattern](orchestrator-pattern.md) - Sleep patterns, fleet management
- [Decision Framework](decision-framework.md) - When to use each component
## Source Attribution
Primary: Elite Context Engineering, Claude 2.0 transcripts
Supporting: One Agent to Rule Them All, Sub-Agents documentation
---
**Remember:** Context is the first pillar of the Core 4. Master context strategy, and you can scale infinitely with focused agents.


@@ -0,0 +1,715 @@
# Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.
## The Core Problem
**Every engineer hits this wall:**
```text
Agent starts: 10k tokens (5% used)
After exploration: 80k tokens (40% used)
After planning: 120k tokens (60% used)
During implementation: 170k tokens (85% used) ⚠️
Context explodes: 195k tokens (98% used) ❌
Agent performance degrades, fails, or times out
```
**The realization:** More context ≠ better performance. Too much context = cognitive overload.
## The R&D Framework
There are only two ways to manage your context window:
```text
R - REDUCE
└─→ Minimize what enters the context window
D - DELEGATE
└─→ Move work to other agents' context windows
```
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Protection
### Level 1: Beginner - Reduce Waste
**Focus:** Stop wasting tokens on unused resources
#### Tactic 1: Eliminate Default MCP Servers
**Problem:**
```jsonc
// Default mcp.json
{
  "mcpServers": {
    "firecrawl": {...},  // 6k tokens
    "github": {...},     // 8k tokens
    "postgres": {...},   // 5k tokens
    "redis": {...}       // 5k tokens
  }
}
// Total: 24k tokens always loaded (12% of 200k window!)
```
**Solution:**
```bash
# Option 1: Delete default mcp.json entirely
rm .claude/mcp.json
# Option 2: Load selectively
claude-mcp-config --strict specialized-configs/firecrawl-only.json
# Result: 4k tokens instead of 24k (83% reduction)
```
#### Tactic 2: Minimize CLAUDE.md
**Before:**
```markdown
# CLAUDE.md (23,000 tokens = 11.5% of window)
- 500 lines of API documentation
- 300 lines of deployment procedures
- 1,500 lines of coding standards
- Architecture diagrams
- Always loaded, whether relevant or not
```
**After:**
```markdown
# CLAUDE.md (500 tokens = 0.25% of window)
# Only universal essentials
- Fenced code blocks MUST have language
- Use rg instead of grep
- ALWAYS use set -euo pipefail
```
**Rule:** Only include what you're 100% sure you want loaded 100% of the time.
#### Tactic 3: Disable Autocompact Buffer
**Problem:**
```bash
/context
# Output:
autocompact buffer: 22% ⚠️ (44k tokens gone!)
messages: 51%
system_tools: 8%
---
Total available: 78% (should be 100%)
```
**Solution:**
```bash
/config
# Set: autocompact = false
# Now:
/context
# Output:
messages: 51%
system_tools: 8%
custom_agents: 2%
---
Total available: 100% ✅ (reclaimed the 22% buffer!)
```
**Impact:** Reclaims 40k+ tokens immediately.
### Level 2: Intermediate - Dynamic Loading
**Focus:** Load what you need, when you need it
#### Tactic 4: Context Priming
**Replace static CLAUDE.md with task-specific `/prime` commands**
```markdown
# .claude/commands/prime.md
# General codebase context (2k tokens)
Read README, understand structure, report findings
# .claude/commands/prime-feature.md
# Feature development context (3k tokens)
Read feature requirements, understand dependencies, plan implementation
# .claude/commands/prime-api.md
# API work context (4k tokens)
Read API docs, understand endpoints, review integration patterns
```
**Usage pattern:**
```bash
# Starting feature work
/prime-feature
# vs. having 23k tokens always loaded
```
**Savings:** 20k tokens (87% reduction)
#### Tactic 5: Sub-Agent Delegation
**Problem:** Primary agent doing parallel work fills its own context
```text
Primary Agent tries to do:
├── Web scraping (15k tokens)
├── Documentation fetch (12k tokens)
├── Data analysis (10k tokens)
└── Synthesis (5k tokens)
= 42k tokens in one agent
```
**Solution:** Delegate to sub-agents with isolated contexts
```text
Primary Agent (9k tokens):
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
└→ Sub-Agent 3: Analysis (10k tokens, isolated)
Total work: 46k tokens
Primary agent context: Only 9k tokens ✅
```
**Example:**
```bash
/load-ai-docs
# Agent spawns 10 sub-agents for web scraping
# Each scrape: ~3k tokens
# Total work: 30k tokens
# Primary agent context: Still only 9k tokens
# Savings: 21k tokens protected
```
**Key insight:** Sub-agents run on their own system prompts (not user prompts), keeping their context isolated from the primary agent.
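The fan-out itself can be pictured as ordinary async code. This is a sketch of the isolation idea only; `run_sub_agent` is a stand-in, not Claude Code's actual sub-agent interface:
```python
import asyncio

# Illustrative only: each task accumulates tokens in its own context
# and hands back nothing but a short summary.
async def run_sub_agent(task: str) -> str:
    sub_context = []                  # isolated: never merged back
    sub_context.append(f"working on: {task}")
    return f"summary of {task}"       # only this reaches the primary

async def primary_agent():
    tasks = ["web scraping", "docs fetch", "analysis"]
    summaries = await asyncio.gather(*(run_sub_agent(t) for t in tasks))
    # The primary grows by a few hundred tokens of summaries,
    # not by the ~37k tokens the sub-agents consumed internally.
    return summaries

asyncio.run(primary_agent())
```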
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Chain agents together without context explosion
#### Tactic 6: Context Bundles
**Problem:** Agent 1's context explodes (180k tokens). Need to hand off to fresh Agent 2 without full replay.
**Solution:** Bundle 60-70% of essential context
```markdown
# context-bundle-2025-01-05-<session-id>.md
## Context Bundle
Created: 2025-01-05 14:30
Source Agent: agent-abc123
## Initial Setup
/prime-feature
## Read Operations (deduplicated)
- src/api/endpoints.ts
- src/components/Auth.tsx
- config/env.ts
## Key Findings
- Auth system uses JWT
- API has 15 endpoints
- Config needs migration
## User Prompts (summarized)
1. "Implement OAuth2 flow"
2. "Add refresh token logic"
[Excluded: full write operations, detailed read contents, tool execution details]
```
**Usage:**
```bash
# Agent 1: Context exploding at 180k
# Automatic bundle saved
# Agent 2: Fresh start (10k base)
/loadbundle /path/to/context-bundle-<timestamp>.md
# Agent 2 now has 70% of Agent 1's context in ~15k tokens
# Total: 25k tokens vs. 180k (86% reduction)
```
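Bundle generation is essentially log filtering. A minimal sketch, assuming a transcript logged as a JSON list of typed entries (the entry format here is hypothetical; adapt it to whatever your hooks record):
```python
import json

def build_bundle(transcript_path: str, bundle_path: str) -> None:
    entries = json.load(open(transcript_path))
    # Keep only cheap, high-signal items: file names and user intent.
    reads = {e["file"] for e in entries if e.get("type") == "read"}
    prompts = [e["text"] for e in entries if e.get("type") == "user_prompt"]
    with open(bundle_path, "w") as f:
        f.write("## Context Bundle\n\n## Read Operations (deduplicated)\n")
        f.writelines(f"- {path}\n" for path in sorted(reads))
        f.write("\n## User Prompts (summarized)\n")
        f.writelines(f"{i}. {p}\n" for i, p in enumerate(prompts, 1))
        # Deliberately excluded: write contents, tool outputs, file bodies
```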
#### Tactic 7: Composable Workflows (Scout-Plan-Build)
**Problem:** Single agent searching + planning + building = context explosion
```text
Monolithic Agent:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**Solution:** Break into composable steps that delegate
```text
/scout-plan-build workflow:
Step 1: /scout (delegates to 4 parallel sub-agents)
├→ Sub-agents search codebase: 4 × 15k = 60k total
├→ Output: relevant-files.md (5k tokens)
└→ Primary agent context: unchanged
Step 2: /plan-with-docs
├→ Reads relevant-files.md: 5k tokens
├→ Scrapes docs: 8k tokens
├→ Creates plan: 3k tokens
└→ Total added: 16k tokens
Step 3: /build
├→ Reads plan: 3k tokens
├→ Implements: 30k tokens
└→ Total added: 33k tokens
Final primary agent context: 10k + 16k + 33k = 59k tokens
Savings: 106k tokens (64% reduction)
```
**Why this works:** Scout step offloads searching from planner (R&D: Reduce + Delegate)
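The composition can be pictured as three functions passing artifacts through files, so no step inherits another step's working context. A sketch assuming a hypothetical `run_agent(name, prompt, ...)` launcher:
```python
from pathlib import Path

# `run_agent` is a stand-in for launching a fresh, focused agent.
def scout(task: str) -> Path:
    out = Path("relevant-files.md")
    run_agent("scout", f"Find files for: {task}", output=out)
    return out

def plan(files: Path) -> Path:
    out = Path("plan.md")
    run_agent("planner", f"Plan using only the files listed in {files}", output=out)
    return out

def build(plan_file: Path) -> None:
    run_agent("builder", f"Implement the plan in {plan_file}")

build(plan(scout("migrate to new SDK")))
```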
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Agents working autonomously while you're AFK
#### Tactic 8: Focused Agents (One Agent, One Task)
**Anti-pattern:**
```text
Super Agent (trying to do everything):
├── API development
├── UI implementation
├── Database migrations
├── Testing
├── Documentation
├── Deployment
└── Context: 170k tokens (85% used)
```
**Pattern:**
```text
Focused Agent Fleet:
├── Agent 1: API only (30k tokens)
├── Agent 2: UI only (35k tokens)
├── Agent 3: DB only (20k tokens)
├── Agent 4: Tests only (25k tokens)
├── Agent 5: Docs only (15k tokens)
└── Each agent: <35k tokens (max 18% per agent)
```
**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."
#### Tactic 9: Deletable Agents
**Pattern:**
```bash
# Create agent for specific task
/create-agent docs-writer "Document frontend components"
# Agent completes task (used 30k tokens)
# DELETE agent immediately
/delete-agent docs-writer
# Result: 30k tokens freed for next agent
```
**Lifecycle:**
```text
1. Create agent → Task-specific context loaded
2. Agent works → Context grows to completion
3. Agent completes → Context maxed out
4. DELETE agent → Context freed
5. Create new agent → Fresh start
6. Repeat
```
**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
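In code, the lifecycle is a create-work-delete loop. A sketch with a hypothetical `Agent` handle standing in for whatever creates and destroys real agents:
```python
# Deleting the agent releases its entire context before the next task.
class AgentPool:
    def run_task(self, name: str, task: str) -> str:
        agent = Agent(name)         # fresh, task-specific context
        result = agent.work(task)   # context grows to completion
        agent.delete()              # context freed (tokens reclaimed)
        return result               # keep only the small result

pool = AgentPool()
for task in ["document components", "write tests", "update docs"]:
    pool.run_task("worker", task)   # each iteration starts near 0 tokens
```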
#### Tactic 10: Background Agent Delegation
**Problem:** You're in the loop, waiting for agent to finish long task
**Solution:** Delegate to background agent, continue working
```bash
# In-loop (you wait, your context stays open)
/implement-feature "Build auth system"
# Your terminal blocked for 20 minutes
# Context accumulates: 150k tokens
# Out-of-loop (you continue working)
/background "Build auth system" \
--model opus \
--report agents/auth-report.md
# Background agent works independently
# Your terminal freed immediately
# Background agent context isolated
# You get notified when complete
```
**Context protection:**
- Primary agent: 10k tokens (just manages job queue)
- Background agent: 150k tokens (isolated, will be deleted)
- Your interactive session: 10k tokens (protected)
#### Tactic 11: Orchestrator Sleep Pattern
**Problem:** Orchestrator observing all agent work = context explosion
```text
Orchestrator watches everything:
├── Scout 1 work: 15k tokens observed
├── Scout 2 work: 15k tokens observed
├── Scout 3 work: 15k tokens observed
├── Planner work: 25k tokens observed
├── Builder work: 35k tokens observed
└── Orchestrator context: 105k tokens
```
**Solution:** Orchestrator sleeps while agents work
```text
Orchestrator pattern:
1. Create scouts → 3k tokens (commands only)
2. SLEEP (not observing)
3. Wake every 15s, check status → 1k tokens
4. Scouts complete, read outputs → 5k tokens
5. Create planner → 2k tokens
6. SLEEP (not observing)
7. Wake every 15s, check status → 1k tokens
8. Planner completes, read output → 3k tokens
9. Create builder → 2k tokens
10. SLEEP (not observing)
Orchestrator final context: 17k tokens ✅
vs. 105k if watching everything (84% reduction)
```
**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
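A compressed sketch of that loop, with a hypothetical `fleet` handle; the comments mark the only points where the orchestrator spends tokens:
```python
import time

def orchestrate(fleet, stages):
    for stage in stages:
        fleet.create_and_command(stage)    # few k tokens: commands only
        while not fleet.stage_complete():
            time.sleep(15)                 # sleeping: zero tokens spent
        fleet.read_output_summary(stage)   # few k tokens: summary only
```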
## Monitoring Context Health
### The /context Command
```bash
/context
# Healthy agent (beginner level):
messages: 8%
system_tools: 5%
custom_agents: 2%
---
Total used: 15% ✅ (85% free)
# Warning (intermediate):
messages: 45%
mcp_tools: 18%
system_tools: 5%
---
Total used: 68% ⚠️ (32% free, approaching limits)
# Danger (needs intervention):
messages: 72%
mcp_tools: 24%
system_tools: 5%
---
Total used: 101% ❌ (context overflow!)
```
### Success Metrics by Level
| Level | Target Context Free | What This Enables |
|-------|---------------------|-------------------|
| Beginner | 85-90% | Basic tasks without running out |
| Intermediate | 60-75% | Complex tasks with breathing room |
| Advanced | 40-60% | Multi-step workflows without overflow |
| Agentic | Per-agent 60-80% | Fleet of focused agents |
### Warning Signs
**Your context window is in danger when:**
**Single agent exceeds 150k tokens**
- Solution: Split work across multiple agents
**Agent needs to read >20 files**
- Solution: Use scout agents to find relevant subset
**`/context` shows >80% used**
- Solution: Start fresh agent, use context bundles
**Agent gets slower/less accurate**
- Solution: Check context usage, delegate to sub-agents
**Autocompact buffer active**
- Solution: Disable it, reclaim 20%+ tokens
## Context Window Hard Limits
> "Context window is a hard limit. We have to respect this and work around it."
### The Reality
```text
Claude Opus 200k limit:
├── System prompt: ~8k tokens (4%)
├── Available tools: ~5k tokens (2.5%)
├── MCP servers: 0-24k tokens (0-12%)
├── CLAUDE.md: 0-23k tokens (0-11.5%)
├── Custom agents: ~2k tokens (1%)
└── Available for work: 138-185k tokens (69-92.5%)
Best case (optimized): 185k available
Worst case (unoptimized): 138k available
Difference: 47k tokens (25% of total capacity!)
```
### Real Example from the Field
> "We were 14% away from exploding our context in our scout-plan-build workflow."
```text
Scout-Plan-Build execution:
├── Base context: 15k tokens
├── Scout work (4 sub-agents): +40k tokens
├── Planner work: +35k tokens
├── Builder work: +80k tokens
└── Total: 170k tokens
With autocompact buffer (22%):
170k / 0.78 = 218k tokens
❌ Exceeds 200k limit by 18k (9% overflow)
Without autocompact buffer:
170k / 1.0 = 170k tokens
✅ Within limits with 30k buffer (15% free)
```
**Lesson:** Every percentage point matters when approaching limits.
## Common Context Explosion Patterns
### Pattern 1: The Sponge Agent
**Symptoms:**
- Agent reads entire codebase
- Opens 50+ files
- Context grows 10k tokens every few minutes
**Cause:** No filtering strategy
**Fix:**
```bash
# Before: Agent reads everything
Agent: "Analyzing codebase..."
[reads 100 files = 150k tokens]
# After: Scout first
/scout "Find files related to authentication"
# Scout outputs: 5 relevant files
Agent reads only those 5 files = 8k tokens
```
### Pattern 2: The Accumulator
**Symptoms:**
- Long conversation
- Many tool calls
- Context steadily grows to limit
**Cause:** Not resetting agent between phases
**Fix:**
```bash
# Phase 1: Exploration
[Agent explores, context hits 120k]
# Phase 2: Implementation
# ❌ Bad: Continue same agent (will overflow)
# ✅ Good: New agent with context bundle
/loadbundle context-from-phase-1.md
# Fresh agent (15k) + bundle (20k) = 35k tokens
# Ready for implementation without overflow
```
### Pattern 3: The Observer
**Symptoms:**
- Orchestrator context growing rapidly
- Watching all sub-agent work
- Can't coordinate more than 2-3 agents
**Cause:** Not using sleep pattern
**Fix:**
```python
# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result                 # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)
orchestrator.sleep()                  # Not observing
orchestrator.wake_and_check_status()  # Only reads summaries
```
## The "200k is Plenty" Principle
> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."
**The mindset shift:**
```text
Beginner thinking:
"I need a bigger context window"
"If only I had 500k tokens..."
"My task is too complex for 200k"
Expert thinking:
"I need better context management"
"I'm overloading a single agent"
"I should split this across focused agents"
```
**The truth:** Most context explosions are design problems, not capacity problems.
### Why 200k is Sufficient
**With proper protection:**
```text
Task: Refactor authentication across 50-file codebase
Approach 1 (Single Agent - fails):
├── Agent reads 50 files: 75k tokens
├── Agent plans changes: 20k tokens
├── Agent implements: 80k tokens
├── Agent tests: 30k tokens
└── Total: 205k tokens ❌ (overflow by 5k)
Approach 2 (Multi-Agent - succeeds):
├── Scout finds relevant 10 files: 15k tokens
├── Planner creates strategy: 20k tokens (new agent)
├── Builder 1 (auth logic): 35k tokens (new agent)
├── Builder 2 (UI changes): 30k tokens (new agent)
├── Tester verifies: 25k tokens (new agent)
└── Max per agent: 35k tokens ✅ (all within limits)
```
## Integration with Other Patterns
Context window protection enables:
**Progressive Disclosure:**
- Reduces: Minimal static context
- Enables: Dynamic loading via priming
**Core 4 Management:**
- Protects: Context (pillar #1)
- Enables: Better model/prompt/tools choices
**Orchestration:**
- Requires: Context protection (orchestrator sleep)
- Enables: Fleet management without overflow
**Observability:**
- Monitors: Context usage via hooks
- Prevents: Unnoticed context explosion
## Key Principles
1. **Reduce and Delegate** - The only two strategies that matter
2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose
3. **Agents are deletable** - Free context by removing completed agents
4. **200k is plenty** - Context explosions are design problems
5. **Monitor constantly** - `/context` command is your best friend
6. **Orchestrators must sleep** - Don't observe all agent work
7. **Context bundles over full replay** - 70% of context in 10% of tokens
## Source Attribution
**Primary sources:**
- Elite Context Engineering (R&D framework, 4 levels, all tactics)
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)
**Supporting sources:**
- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
- Sub-Agents (sub-agent delegation, context isolation)
**Key quotes:**
- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
- "A focused agent is a performant agent." (Elite Context Engineering)
- "We were 14% away from exploding our context." (Claude 2.0)
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)
## Related Documentation
- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar
---
**Remember:** Context window management separates beginners from experts. Master it, and you can scale infinitely with focused agents.


@@ -0,0 +1,434 @@
# Decision Framework: Choosing the Right Claude Code Component
This guide helps you choose the right Claude Code component for your specific task. **Always start with prompts**—master the primitive first before scaling to other components.
## Table of Contents
- [The Decision Tree](#the-decision-tree)
- [Quick Reference: Decision Matrix](#quick-reference-decision-matrix)
- [When to Use Each Component](#when-to-use-each-component)
- [Use Skills When](#use-skills-when)
- [Use Sub-Agents When](#use-sub-agents-when)
- [Use Slash Commands When](#use-slash-commands-when)
- [Use MCP Servers When](#use-mcp-servers-when)
- [Use Hooks When](#use-hooks-when)
- [Use Plugins When](#use-plugins-when)
- [Use Case Examples from the Field](#use-case-examples-from-the-field)
- [Composition Rules and Boundaries](#composition-rules-and-boundaries)
- [What Can Compose What](#what-can-compose-what)
- [Critical Composition Rules](#critical-composition-rules)
- [The Proper Evolution Path](#the-proper-evolution-path)
- [Stage 1: Start with a Prompt](#stage-1-start-with-a-prompt)
- [Stage 2: Add Sub-Agent if Parallelism Needed](#stage-2-add-sub-agent-if-parallelism-needed)
- [Stage 3: Create Skill When Management Needed](#stage-3-create-skill-when-management-needed)
- [Stage 4: Add MCP if External Data Needed](#stage-4-add-mcp-if-external-data-needed)
- [Common Decision Anti-Patterns](#common-decision-anti-patterns)
- [Anti-Pattern 1: Converting All Slash Commands to Skills](#anti-pattern-1-converting-all-slash-commands-to-skills)
- [Anti-Pattern 2: Using Skills for One-Off Tasks](#anti-pattern-2-using-skills-for-one-off-tasks)
- [Anti-Pattern 3: Skipping the Primitive](#anti-pattern-3-skipping-the-primitive)
- [Anti-Pattern 4: Using Sub-Agents When Context Matters](#anti-pattern-4-using-sub-agents-when-context-matters)
- [Anti-Pattern 5: Forgetting MCP is for External Only](#anti-pattern-5-forgetting-mcp-is-for-external-only)
- [Decision Checklist](#decision-checklist)
- [Summary: The Golden Rules](#summary-the-golden-rules)
## The Decision Tree
Start here when deciding which component to use:
```text
1. START HERE: Build a Prompt (Slash Command)

2. Need parallelization or isolated context?
   YES → Use Sub-Agent
   NO  → Continue

3. External data/service integration?
   YES → Use MCP Server
   NO  → Continue

4. One-off task (simple, direct)?
   YES → Use Slash Command
   NO  → Continue

5. Repeatable workflow (pattern detection)?
   YES → Use Agent Skill
   NO  → Continue

6. Lifecycle event automation?
   YES → Use Hook
   NO  → Continue

7. Sharing/distributing to team?
   YES → Use Plugin
   NO  → Default to Slash Command (prompt)
```
**Critical Rule:** Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
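For reference, the same tree as a first-match-wins function; the parameters are simply the tree's questions, answered as booleans:
```python
def choose_component(parallel_or_isolated: bool, external_data: bool,
                     one_off: bool, repeatable_workflow: bool,
                     lifecycle_event: bool, team_distribution: bool) -> str:
    if parallel_or_isolated:
        return "Sub-Agent"
    if external_data:
        return "MCP Server"
    if one_off:
        return "Slash Command"
    if repeatable_workflow:
        return "Agent Skill"
    if lifecycle_event:
        return "Hook"
    if team_distribution:
        return "Plugin"
    return "Slash Command"  # default: start with the prompt primitive
```
Order matters: parallelism and external integration outrank the one-off/repeatable split, exactly as in the tree.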
## Quick Reference: Decision Matrix
| Task Type | Component | Reason |
|-----------|-----------|---------|
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
| External data/service access | MCP Server | Integration point |
| Parallel/isolated work | Sub-Agent | Context isolation |
| Parallel workflow tasks | Sub-Agent | **Whenever you see "parallel", think sub-agents** |
| One-off task | Slash Command | Simple, direct |
| Lifecycle automation | Hook | Event-driven |
| Team distribution | Plugin | Packaging |
## When to Use Each Component
### Use Skills When
**Signal keywords:** "automatic," "repeat," "manage," "workflow"
**Criteria:**
- You have a **REPEAT** problem that needs **MANAGEMENT**
- Multiple related operations need coordination
- You want **automatic** behavior (agent-invoked)
- The problem domain requires orchestration of multiple components
**Example scenarios:**
- Managing git work trees (create, list, remove, merge, update)
- Detecting style guide violations across codebase
- Automatic PDF text extraction and processing
- Video processing workflows with multiple steps
**NOT for:**
- One-off tasks → Use Slash Command instead
- Simple operations → Use Slash Command instead
- Problems solved well by a single prompt → Don't over-engineer
**Remember:** Skills are for managing problem domains, not solving one-off tasks.
### Use Sub-Agents When
**Signal keywords:** "parallel," "scale," "bulk," "isolated," "batch"
**Criteria:**
- **Parallelization** is needed
- **Context isolation** is required
- Scale tasks and batch operations
- You're okay with losing context afterward
- Each task can run independently
**Example scenarios:**
- Comprehensive security audits
- Fix & debug tests at scale
- Parallel workflow tasks
- Bulk operations on multiple files
- Isolated research that doesn't pollute main context
**NOT for:**
- Tasks that need to share context → Use main conversation
- Sequential operations → Use Slash Command or Skill
- Tasks that need to spawn more sub-agents → Hard limit: no nesting
**Critical constraint:** You must be okay with losing context afterward. Sub-agent context doesn't persist in the main conversation (unless you use resumable sub-agents).
**Golden rule:** "Whenever you see parallel, you should always just think sub-agents. Nothing else supports parallel calling."
### Use Slash Commands When
**Signal keywords:** "one-off," "simple," "quick," "manual"
**Criteria:**
- One-off tasks
- Simple repeatable actions
- You're starting a new workflow
- Building the primitive before composing
- You want manual control over invocation
**Example scenarios:**
- Git commit messages (one at a time)
- Create UI component
- Run specific code generation
- Execute a well-defined task
- Quick transformations
**Philosophy:** "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
**Remember:** Slash commands are the primitive foundation. Master these first before anything else.
### Use MCP Servers When
**Signal keywords:** "external," "database," "API," "service," "integration"
**Criteria:**
- External integrations are needed
- Data sources outside Claude Code
- Third-party services
- Database connections
- Real-time data access
**Example scenarios:**
- Connect to Jira
- Query databases (PostgreSQL, etc.)
- Fetch real-time weather data
- GitHub integration
- Slack integration
- Figma designs
**NOT for:**
- Internal orchestration → Use Skills instead
- Pure computation → Use Slash Command or Skill
**Clear rule:** External = MCP, Internal orchestration = Skills
**Context consideration:** MCP servers can "torch your context window" by loading all their context at startup, unlike Skills which use progressive disclosure.
### Use Hooks When
**Signal keywords:** "lifecycle," "event," "automation," "deterministic"
**Criteria:**
- Deterministic automation at lifecycle events
- Want to execute commands at specific moments
- Need to balance agent autonomy with deterministic control
- Workflow automation that should always happen
**Example scenarios:**
- Run linters before code submission
- Auto-format code after generation
- Trigger tests after file changes
- Capture context at specific points
**Philosophy:** "If you really want to scale, you need both" - agents AND deterministic workflows.
**Use for:** Adding determinism rather than always relying on the agent to decide.
### Use Plugins When
**Signal keywords:** "share," "distribute," "package," "team"
**Criteria:**
- Sharing/distributing to team
- Packaging multiple components together
- Reusable work across projects
- Team-wide extensions
**Example scenarios:**
- Distribute custom skills to team
- Bundle MCP servers for automatic start
- Share slash commands across projects
- Package hooks and configurations
**Philosophy:** "Plugins let you package and distribute these sets of work. This isn't super interesting. It's just a way to share and reuse cloud code extensions."
## Use Case Examples from the Field
Real examples with reasoning:
| Use Case | Component | Reasoning |
|----------|-----------|-----------|
| Automatic PDF text extraction | Agent Skill | Keyword "automatic", repeat behavior |
| Connect to Jira | MCP Server | External source |
| Comprehensive security audit | Sub-Agent | Scale, isolated context, not automatic |
| Generalized git commit messages | Slash Command | Simple one-step task |
| Query database | MCP Server | External data source (start here) |
| Fix/debug tests at scale | Sub-Agent | Parallel work, scale |
| Detect style guide violations | Agent Skill | Repeat behavior pattern |
| Fetch real-time weather | MCP Server | Third-party service integration |
| Create UI component | Slash Command | Simple one-off task |
| Parallel workflow tasks | Sub-Agent | Keyword "parallel" |
## Composition Rules and Boundaries
### What Can Compose What
**Skills (Top Compositional Layer):**
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Can use: Slash Commands
- ✅ Can use: Other Skills
- ❌ Cannot: Nest sub-agents/prompts directly (must use SlashCommand tool)
**Slash Commands (Primitive + Compositional):**
- ✅ Can use: Skills (via SlashCommand tool)
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Acts as: BOTH primitive AND composition point
**Sub-Agents (Execution Layer):**
- ✅ Can use: Slash Commands (via SlashCommand tool)
- ✅ Can use: Skills (via SlashCommand tool)
- ❌ CANNOT use: Other Sub-Agents (hard limit)
**MCP Servers (Integration Layer):**
- Lower-level unit, used BY skills
- Does not itself invoke skills
- Expose services to all components
### Critical Composition Rules
1. **Sub-Agents cannot nest** - No sub-agent spawning other sub-agents (prevents infinite nesting)
2. **Skills don't execute code** - They guide Claude to use available tools
3. **Slash commands can be invoked manually or via SlashCommand tool**
4. **Skills use the SlashCommand tool** to compose prompts and sub-agents
5. **No circular dependencies** - Skills can use other skills but cannot nest circularly
## The Proper Evolution Path
When building new capabilities, follow this progression:
### Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple prompt or slash command that accomplishes the core task.
**Example (Git Work Trees):** Create one work tree
```bash
/create-worktree feature-branch
```
**When to stay here:** The task is one-off or infrequent.
### Stage 2: Add Sub-Agent if Parallelism Needed
**Goal:** Scale to multiple parallel operations
If you need to do the same thing many times in parallel, use a sub-agent.
**Example (Git Work Trees):** Create multiple work trees in parallel
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**When to stay here:** Parallel execution is the only requirement, no orchestration needed.
### Stage 3: Create Skill When Management Needed
**Goal:** Bundle multiple related operations
When the problem grows to require management, create a skill.
**Example (Git Work Trees):** Manage work trees (create, list, remove, merge, update)
Now you have a cohesive work tree manager skill that:
- Creates new work trees
- Lists existing work trees
- Removes old work trees
- Merges work trees
- Updates work tree status
**When to stay here:** Most domain-specific workflows stop here.
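Concretely, a skill is typically a `SKILL.md` whose frontmatter tells the agent when to invoke it. A hypothetical sketch for the work tree manager (field names and layout may differ in your version):
```markdown
---
name: worktree-manager
description: "Manage git work trees: create, list, remove, merge, and update. Use when the user asks for work tree operations."
---

# Worktree Manager

- Create: `git worktree add ../<branch> <branch>`
- List: `git worktree list`
- Remove: `git worktree remove ../<branch>`
- Merge/update: delegate to the existing /create-worktree and related commands
```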
### Stage 4: Add MCP if External Data Needed
**Goal:** Integrate external systems
Only add MCP servers when you need data from outside Claude Code.
**Example (Git Work Trees):** Query external repo metadata from GitHub API
Now your skill can query GitHub for:
- Branch protection rules
- CI/CD status
- Pull request information
**Final state:** Full-featured work tree manager with external integration.
## Common Decision Anti-Patterns
### ❌ Anti-Pattern 1: Converting All Slash Commands to Skills
**Mistake:** "I'm going to convert all my slash commands to skills because skills are better."
**Why wrong:** Skills are for repeatable workflows that need management, not simple one-off tasks. Slash commands are the primitive—you need them.
**Correct approach:** Keep slash commands for simple tasks. Only create a skill when you're managing a problem domain with multiple related operations.
### ❌ Anti-Pattern 2: Using Skills for One-Off Tasks
**Mistake:** "I need to create a UI component once, so I'll build a skill for it."
**Why wrong:** Skills are for repeat problems. One-off tasks should use slash commands.
**Correct approach:** Use a slash command for the one-off task. If you find yourself doing it repeatedly, then consider a skill.
### ❌ Anti-Pattern 3: Skipping the Primitive
**Mistake:** "I'm going to start by building a skill because it's more advanced."
**Why wrong:** If you don't master prompts, you can't build effective skills. Everything is prompts in the end.
**Correct approach:** Always start with a prompt. Build the primitive first. Scale up only when needed.
### ❌ Anti-Pattern 4: Using Sub-Agents When Context Matters
**Mistake:** "I'll use a sub-agent for this research task and then reference the findings later."
**Why wrong:** Sub-agent context is isolated. You lose it after the sub-agent finishes (unless using resumable sub-agents).
**Correct approach:** If you need the context later, do the work in the main conversation or use a resumable sub-agent.
### ❌ Anti-Pattern 5: Forgetting MCP is for External Only
**Mistake:** "I'll build an MCP server to orchestrate internal workflows."
**Why wrong:** MCP servers are for external integrations. Internal orchestration should use skills.
**Correct approach:** MCP = external, Skills = internal orchestration. Keep them separate.
## Decision Checklist
Before you start building, ask yourself:
**Basic Questions:**
- [ ] Have I started with a prompt? (Non-negotiable)
- [ ] Is this a one-off task or repeatable?
- [ ] Do I need external data or services?
- [ ] Is parallelization required?
- [ ] Am I okay losing context after execution?
**Composition Questions:**
- [ ] Am I trying to nest sub-agents? (Not allowed)
- [ ] Am I converting a simple slash command to a skill? (Probably wrong)
- [ ] Am I using MCP for internal orchestration? (Should use skills)
- [ ] Have I considered the evolution path? (Prompt → Sub-agent → Skill → MCP)
**Context Questions:**
- [ ] Will this torch my context window? (MCP consideration)
- [ ] Do I need progressive disclosure? (Skills benefit)
- [ ] Is context isolation critical? (Sub-agent benefit)
- [ ] Will I need this context later? (Don't use sub-agent)
## Summary: The Golden Rules
1. **Always start with prompts** - Master the primitive first
2. **"Parallel" keyword = Sub-Agents** - Nothing else supports parallel calling
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Skills compose upward, not downward** - Build from primitives
Remember The Core 4: Context, Model, Prompt, Tools. Master these fundamentals, and you'll master the compositional units.


@@ -0,0 +1,925 @@
# Hooks for Observability and Control
> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."
Claude Code hooks provide deterministic control over agent behavior and enable comprehensive monitoring of multi-agent systems.
## What Are Hooks?
**Hooks are lifecycle event handlers that let you execute custom code at specific points in Claude Code's execution.**
```text
Agent Lifecycle:
├── pre-tool-use hook → Before any tool executes
├── [Tool executes]
├── post-tool-use hook → After tool completes
├── notification hook → When agent needs input
├── sub-agent-stop hook → When sub-agent finishes
└── stop hook → When agent completes response
```
**Two killer use cases:**
1. **Observability** - Know what your agents are doing
2. **Control** - Steer and block agent behavior
## The Five Hooks
### 1. pre-tool-use
**When it fires:** Before any tool executes
**Use cases:**
- Block dangerous commands (`rm -rf`, destructive operations)
- Prevent access to sensitive files (`.env`, `credentials.json`)
- Log tool attempts before execution
- Validate tool parameters
**Available data:**
```json
{
  "toolName": "bash",
  "toolInput": {
    "command": "rm -rf /",
    "description": "Remove all files"
  }
}
```
**Example: Block dangerous commands**
```python
# .claude/hooks/pre-tool-use.py
# /// script
# dependencies = []
# ///
import sys
import json
import re

def is_dangerous_remove_command(tool_name, tool_input):
    """Block any rm -rf commands"""
    if tool_name != "bash":
        return False
    command = tool_input.get("command", "")
    dangerous_patterns = [
        r'\brm\s+-rf\b',
        r'\brm\s+-fr\b',
        r'\brm\s+.*-[rf].*\*',
    ]
    return any(re.search(pattern, command) for pattern in dangerous_patterns)

def main():
    input_data = json.load(sys.stdin)
    tool_name = input_data.get("toolName")
    tool_input = input_data.get("toolInput", {})
    if is_dangerous_remove_command(tool_name, tool_input):
        # Block the command
        output = {
            "allow": False,
            "message": "❌ Blocked dangerous rm command"
        }
    else:
        output = {"allow": True}
    print(json.dumps(output))

if __name__ == "__main__":
    main()
```
**Configuration in settings.json:**
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {},  // Empty = matches all tools
        "commands": [
          "uv run .claude/hooks/pre-tool-use.py"
        ]
      }
    ]
  }
}
```
### 2. post-tool-use
**When it fires:** After a tool completes execution
**Use cases:**
- Log tool execution results
- Track which tools are used most frequently
- Measure tool execution time
- Build observability dashboards
- Summarize tool output with small models
**Available data:**
```json
{
  "toolName": "write",
  "toolInput": {
    "file_path": "/path/to/file.py",
    "content": "..."
  },
  "toolResult": {
    "success": true,
    "output": "File written successfully"
  }
}
```
**Example: Event logging with summarization**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import os
from anthropic import Anthropic

def summarize_event(tool_name, tool_input, tool_result):
    """Use Haiku to summarize what happened"""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"""Summarize this tool execution in 1 sentence:
Tool: {tool_name}
Input: {json.dumps(tool_input, indent=2)}
Result: {json.dumps(tool_result, indent=2)}
Be concise and focus on what was accomplished."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Small, fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def main():
    input_data = json.load(sys.stdin)
    # Generate summary using small model
    summary = summarize_event(
        input_data.get("toolName"),
        input_data.get("toolInput", {}),
        input_data.get("toolResult", {})
    )
    # Log the event with summary
    event = {
        "toolName": input_data["toolName"],
        "summary": summary,
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability server (shared helper, e.g. utils/send-event.py)
    send_to_server(event)

if __name__ == "__main__":
    main()
```
**Why small models?** "I've sent thousands of these events. I've spent less than 20 cents. This is where small fast models really shine."
### 3. notification
**When it fires:** When Claude Code needs user input (permission request)
**Use cases:**
- Text-to-speech notifications
- Send alerts to phone/Slack
- Log permission requests
- Auto-approve specific tools
**Available data:**
```json
{
  "message": "Your agent needs your input",
  "context": {
    "toolName": "bash",
    "command": "bun run apps/hello.ts"
  }
}
```
**Example: Text-to-speech notification**
```python
# .claude/hooks/notification.py
import sys
import json
import subprocess

def speak(text):
    """Use 11Labs API for text-to-speech"""
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        text
    ])

def main():
    input_data = json.load(sys.stdin)
    message = input_data.get("message", "Your agent needs your input")
    # Speak the notification
    speak(message)
    # Log it
    print(json.dumps({"notified": True}))

if __name__ == "__main__":
    main()
```
### 4. stop
**When it fires:** Every time Claude Code finishes responding
**Use cases:**
- Copy full chat transcript for analysis
- Completion notifications (text-to-speech)
- Session logging
- Performance metrics
- Agent output summarization
**Available data:**
```json
{
  "transcriptPath": "/path/to/chat-transcript.json",
  "sessionId": "abc123",
  "timestamp": "2025-01-05T14:30:00Z"
}
```
**Example: Save full conversation**
```python
# .claude/hooks/stop.py
import sys
import json
import shutil
import subprocess
from pathlib import Path
from datetime import datetime

def main():
    input_data = json.load(sys.stdin)
    transcript_path = input_data.get("transcriptPath")
    if not transcript_path:
        return
    # Copy transcript to logs directory
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    session_id = input_data.get("sessionId", "unknown")
    logs_dir = Path(".claude/logs")
    logs_dir.mkdir(parents=True, exist_ok=True)
    dest = logs_dir / f"chat-{timestamp}-{session_id[:8]}.json"
    shutil.copy(transcript_path, dest)
    # Announce completion
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "All set and ready for your next step"
    ])
    print(json.dumps({"logged": True, "file": str(dest)}))

if __name__ == "__main__":
    main()
```
**Key insight:** "The stop event is the perfect time to copy the entire chat conversation. This is key for observability. What happened? How can we improve it?"
### 5. sub-agent-stop
**When it fires:** When a sub-agent completes its work
**Use cases:**
- Track parallel sub-agent completion
- Per-agent performance metrics
- Multi-agent orchestration logging
- Progress notifications for long-running jobs
**Available data:**
```json
{
  "subAgentId": "agent-123",
  "transcriptPath": "/path/to/sub-agent-transcript.json",
  "sessionId": "parent-abc123",
  "timestamp": "2025-01-05T14:32:00Z"
}
```
**Example: Sub-agent completion tracking**
```python
# .claude/hooks/sub-agent-stop.py
import sys
import json

def main():
    input_data = json.load(sys.stdin)
    # Log sub-agent completion
    event = {
        "type": "sub-agent-complete",
        "agentId": input_data.get("subAgentId"),
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability system (send_event/speak are shared helpers
    # from .claude/hooks/utils/)
    send_event(event)
    # Announce
    speak("Sub agent complete")

if __name__ == "__main__":
    main()
```
## Multi-Agent Observability Architecture
When scaling to 3, 5, 10+ agents, hooks enable comprehensive system visibility.
### Architecture Overview
```text
┌─────────────────────────────────────────────────────────────┐
│                       Multiple Agents                       │
│   Agent 1    Agent 2    Agent 3    ...    Agent N           │
│      │          │          │               │                │
│      └──────────┴──────────┴───────────────┘                │
│                          │                                  │
│                     Hooks fire                              │
│                          ↓                                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                       Bun/Node Server                       │
│  ┌────────────────┐           ┌──────────────┐              │
│  │ HTTP Endpoint  │──────────→│  SQLite DB   │              │
│  │    /events     │           │ (persistence)│              │
│  └────────────────┘           └──────────────┘              │
│          │                                                  │
│          └────────────→ WebSocket Broadcast                 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                   Web Client (Vue/React)                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Live Activity Pulse (1min/3min/5min windows)         │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │ Event Stream (filtered by app/session/event type)    │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │ Event Details (with AI summaries)                    │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
### Key Design Principles
**1. One-Way Data Stream**
```text
Agent → Hook → Server → Database + WebSocket → Client
```
"This one-way data stream keeps things really simple. Every agent is responsible for summarizing their work in the hook before they send it off."
**Benefits:**
- Simple architecture
- Easy to reason about
- No bidirectional complexity
- Fast real-time updates
**2. Event Summarization at the Edge**
```python
# In the hook (runs on agent side)
def send_event(app_name, event_type, event_data, summarize=True):
    if summarize:
        # Use Haiku to summarize before sending
        summary = summarize_with_haiku(event_data)
        event_data["summary"] = summary
    # Send to server
    requests.post("http://localhost:3000/events", json={
        "app": app_name,
        "type": event_type,
        "data": event_data,
        "sessionId": os.getenv("CLAUDE_SESSION_ID")
    })
```
**Why summarize at the edge?**
- Reduces server load
- Cheaper (uses small models locally)
- Human-readable summaries immediately available
- No server-side LLM dependencies
**3. Persistent + Real-Time Storage**
```sql
-- SQLite schema
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Dual persistence:**
- SQLite for historical queries and analysis
- WebSocket for live streaming to UI
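As an illustration of the persistence half, here is a minimal ingest sketch using only the Python standard library and the schema above. The real system described here uses a Bun/Node server, and the WebSocket broadcast is omitted:
```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("events.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_response(404); self.end_headers(); return
        body = self.rfile.read(int(self.headers["Content-Length"]))
        e = json.loads(body)
        # Persist the event exactly as the hook sent it
        db.execute(
            "INSERT INTO events (source_app, session_id, event_type, "
            "raw_payload, summary) VALUES (?, ?, ?, ?, ?)",
            (e.get("app"), e.get("sessionId"), e.get("eventType"),
             json.dumps(e.get("data", {})), e.get("summary")))
        db.commit()
        self.send_response(200); self.end_headers()

HTTPServer(("localhost", 3000), EventHandler).serve_forever()
```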
### Implementation Example
**Hook script structure:**
```python
# .claude/hooks/utils/send-event.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///
import sys
import json
import os
import requests
from anthropic import Anthropic

def summarize_with_haiku(event_data, event_type):
    """Generate 1-sentence summary using Haiku"""
    if event_type not in ["pre-tool-use", "post-tool-use"]:
        return None
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"Summarize this {event_type} event in 1 sentence: {json.dumps(event_data)}"
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def send_event(app_name, event_type, event_data, summarize=False):
    """Send event to observability server"""
    payload = {
        "app": app_name,
        "sessionId": os.getenv("CLAUDE_SESSION_ID", "unknown"),
        "eventType": event_type,
        "data": event_data,
        "timestamp": event_data.get("timestamp")
    }
    if summarize:
        payload["summary"] = summarize_with_haiku(event_data, event_type)
    try:
        response = requests.post(
            "http://localhost:3000/events",
            json=payload,
            timeout=1
        )
        return response.status_code == 200
    except Exception as e:
        # Don't break agent if observability fails
        print(f"Warning: Failed to send event: {e}", file=sys.stderr)
        return False

def main():
    if len(sys.argv) < 3:
        print("Usage: send-event.py <app-name> <event-type> [--summarize]")
        sys.exit(1)
    app_name = sys.argv[1]
    event_type = sys.argv[2]
    summarize = "--summarize" in sys.argv
    # Read event data from stdin
    event_data = json.load(sys.stdin)
    success = send_event(app_name, event_type, event_data, summarize)
    print(json.dumps({"sent": success}))

if __name__ == "__main__":
    main()
```
**Using in hooks:**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import subprocess

def main():
    input_data = json.load(sys.stdin)
    # Send to observability system with summarization
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",          # App name
        "post-tool-use",   # Event type
        "--summarize"      # Generate AI summary
    ], input=json.dumps(input_data), text=True)
    print(json.dumps({"logged": True}))

if __name__ == "__main__":
    main()
```
## Best Practices
### 1. Use Isolated Scripts (Astral UV Pattern)
**Why:** Hooks should be self-contained, portable, and not depend on your codebase.
```python
# /// script
# dependencies = ["anthropic", "requests"]
# ///
# Astral UV single-file script
# Runs independently with: uv run script.py
# Auto-installs dependencies
```
**Benefits:**
- Works in any codebase
- No virtual environment setup
- Portable across projects
- Easy to test in isolation
**Alternative: Bun for TypeScript**
```typescript
// .claude/hooks/post-tool-use.ts
// Run with: bun run post-tool-use.ts
import { readFileSync } from "fs";

const input = JSON.parse(readFileSync(0, "utf-8")); // fd 0 = stdin
// ... hook logic
```
### 2. Never Block the Agent
```python
def main():
    try:
        # Hook logic
        send_to_server(event)
    except Exception as e:
        # Log but don't fail
        print(f"Warning: {e}", file=sys.stderr)
        # Always output valid JSON
        print(json.dumps({"error": str(e)}))
```
**Rule:** If observability fails, the agent should continue working.
### 3. Use Small Fast Models for Summaries
```text
Cost comparison (1,000 events):
├── Opus: $15 (overkill for summaries)
├── Sonnet: $3 (still expensive)
└── Haiku: $0.20 ✅ (perfect for this)
```
"Thousands of events, less than 20 cents. Small fast cheap models shine here."
### 4. Hash Session IDs for UI Consistency
```python
import hashlib

def color_for_session(session_id):
    """Generate consistent color from session ID"""
    hash_val = int(hashlib.md5(session_id.encode()).hexdigest()[:6], 16)
    return f"#{hash_val:06x}"
```
**Result:** Same agent = same color in UI, making it easy to track.
### 5. Filter and Paginate Events
```javascript
// Client-side filtering
const filteredEvents = events
  .filter(e => e.app === selectedApp || selectedApp === "all")
  .filter(e => e.eventType === selectedType || selectedType === "all")
  .slice(0, 100); // Limit displayed events

// Auto-refresh
setInterval(() => fetchLatestEvents(), 5000);
```
### 6. Multiple Hooks Per Event
```json
{
  "hooks": {
    "stop": [
      {
        "matcher": {},
        "commands": [
          "uv run .claude/hooks/stop-chat-log.py",
          "uv run .claude/hooks/stop-tts.py",
          "uv run .claude/hooks/stop-notify.py"
        ]
      }
    ]
  }
}
```
**Hooks run sequentially** in the order specified.
### 7. Matcher Patterns for Selective Execution
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {
          "toolName": "bash"
        },
        "commands": ["uv run .claude/hooks/bash-validator.py"]
      },
      {
        "matcher": {
          "toolName": "write",
          "toolInput": {
            "file_path": "**/.env"
          }
        },
        "commands": ["uv run .claude/hooks/block-env-write.py"]
      }
    ]
  }
}
```
## Directory Structure Best Practice
```text
.claude/
├── commands/              # Slash commands
├── agents/                # Sub-agent definitions
└── hooks/                 # ← New essential directory
    ├── settings.json      # Hook configuration
    ├── pre-tool-use.py
    ├── post-tool-use.py
    ├── notification.py
    ├── stop.py
    ├── sub-agent-stop.py
    └── utils/             # Shared utilities
        ├── send-event.py
        ├── text-to-speech-elevenlabs.py
        ├── text-to-speech-openai.py
        └── summarize-haiku.py
```
## Real-World Use Cases
### Use Case 1: Block Dangerous Operations
```python
# .claude/hooks/pre-tool-use.py (pattern tables + matcher; plug
# is_blocked() into the main() shown earlier)
import re

BLOCKED_COMMANDS = [
    r'\brm\s+-rf\b',                 # rm -rf
    r'\bsudo\s+rm\b',                # sudo rm
    r'\bgit\s+push.*--force\b',      # git push --force
    r'\bdocker\s+system\s+prune\b',  # docker system prune
]

BLOCKED_FILES = [
    r'\.env$',
    r'credentials\.json$',
    r'\.ssh/id_rsa$',
    r'aws.*credentials',
]

def is_blocked(tool_name, tool_input):
    if tool_name == "bash":
        command = tool_input.get("command", "")
        return any(re.search(p, command) for p in BLOCKED_COMMANDS)
    if tool_name in ["read", "write", "edit"]:
        file_path = tool_input.get("file_path", "")
        return any(re.search(p, file_path) for p in BLOCKED_FILES)
    return False
```
### Use Case 2: Multi-Agent Task Board
```text
Observability UI showing:
Active Agents (5):
├── [Agent 1] Planning feature (12s ago)
├── [Agent 2] Writing tests (45s ago) ⚠️ Needs input
├── [Agent 3] Building UI (2m ago)
├── [Agent 4] Deploying (5m ago) ✅ Complete
└── [Agent 5] Monitoring (ongoing)
Recent Events (filtered: post-tool-use):
├── Agent 3: Wrote src/components/Button.tsx
├── Agent 1: Read src/api/endpoints.ts
├── Agent 4: Bash: git push origin main
└── Agent 2: Test failed: test/auth.test.ts
```
### Use Case 3: Long-Running AFK Agents
```bash
# Start agent with background work
/background "Implement entire auth system" --report agents/auth-report.md
# Agent works autonomously
# Hooks send notifications:
# - "Starting authentication module"
# - "Database schema created"
# - "Tests passing"
# - "All set and ready for your next step"
# You're notified via text-to-speech when complete
```
### Use Case 4: Debugging Agent Behavior
```python
# Filter stop events to analyze full chat transcripts
# (`events.filter` is pseudocode for querying the events DB)
for event in events.filter(type="stop"):
    transcript = json.load(open(event.transcriptPath))
    # Analyze:
    # - What files did agent read?
    # - What tools were used most?
    # - Where did agent get confused?
    # - What patterns led to errors?
```
## Performance Considerations
### Webhook Timeouts
```python
# Don't block agent on slow external services
try:
    requests.post(webhook_url, json=event, timeout=0.5)  # 500ms max
except requests.Timeout:
    # Log locally instead
    log_to_file(event)
```
### Database Size Management
```sql
-- Rotate old events
DELETE FROM events
WHERE timestamp < datetime('now', '-30 days');
-- Or archive
INSERT INTO events_archive SELECT * FROM events
WHERE timestamp < datetime('now', '-30 days');
DELETE FROM events
WHERE id IN (SELECT id FROM events_archive);
```
### Event Batching
```python
import atexit

# Batch events before sending (server_url defined elsewhere)
events_buffer = []

def send_event(event):
    events_buffer.append(event)
    if len(events_buffer) >= 10:
        flush_events()

def flush_events():
    if not events_buffer:
        return
    requests.post(server_url, json={"events": events_buffer})
    events_buffer.clear()

# Flush whatever is left when the process exits
atexit.register(flush_events)
```
## Integration with Observability Platforms
### Datadog
```python
from datadog import statsd

def send_to_datadog(event):
    statsd.increment(f"claude.tool.{event['toolName']}")
    statsd.histogram(f"claude.duration.{event['toolName']}", event['duration'])
```
### Prometheus
```python
from prometheus_client import Counter, Histogram

tool_counter = Counter('claude_tool_executions', 'Tool executions', ['tool_name'])
tool_duration = Histogram('claude_tool_duration_seconds', 'Tool duration', ['tool_name'])

def send_to_prometheus(event):
    tool_counter.labels(tool_name=event['toolName']).inc()
    tool_duration.labels(tool_name=event['toolName']).observe(event['duration'])
```
### Slack
```python
import os
import requests

def send_to_slack(event):
    if event['eventType'] == 'notification':
        requests.post(
            os.getenv("SLACK_WEBHOOK_URL"),
            json={"text": f"🤖 Agent needs input: {event['message']}"}
        )
```
## Key Principles
1. **If you don't measure it, you can't improve it** - Observability is critical for scaling agents
2. **Keep hooks simple and isolated** - Use single-file scripts (UV, bun, shell)
3. **Never block the agent** - Hooks should be fast and fault-tolerant
4. **Small models for summaries** - Haiku is perfect and costs pennies
5. **One-way data streams** - Simple architecture beats complex bidirectional systems
6. **Context, Model, Prompt** - Even with hooks, the big three still matter
## Source Attribution
**Primary source:** Multi-Agent Observability transcript (complete system architecture, WebSocket streaming, event summarization, SQLite persistence)
**Supporting source:** Hooked transcript (5 hooks fundamentals, pre-tool-use implementation, text-to-speech integration, isolated scripts pattern)
**Key quotes:**
- "When it comes to agentic coding, observability is everything." (Hooked)
- "This one-way data stream keeps things really simple." (Multi-Agent Observability)
- "Thousands of events, less than 20 cents. Small fast models shine here." (Multi-Agent Observability)
## Related Documentation
- [Hooks Reference](../reference/hooks-reference.md) - Complete API reference for all 5 hooks
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real observability systems in action
- [Core 4 Framework](../reference/core-4-framework.md) - Context, Model, Prompt, Tools
---
**Remember:** Observability isn't optional when scaling agents. If you can't see what they're doing, you can't scale them effectively.


@@ -0,0 +1,673 @@
# The Orchestrator Pattern
> "The rate at which you can create and command your agents becomes the constraint of your engineering output. When your agents are slow, you're slow."
The orchestrator pattern is **Level 5** of agentic engineering: managing fleets of agents through a single interface.
## The Journey to Orchestration
```text
Level 1: Base agents → Use agents out of the box
Level 2: Better agents → Customize prompts and workflows
Level 3: More agents → Run multiple agents
Level 4: Custom agents → Build specialized solutions
Level 5: Orchestration → Manage fleets of agents ← You are here
```
**Key realization:** Single agents hit context window limits. You need orchestration to scale beyond one agent.
## The Three Pillars
Multi-agent orchestration requires three components working together:
```text
┌─────────────────────────────────────────────────────────┐
│ 1. ORCHESTRATOR AGENT │
│ (Single interface to your fleet) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. CRUD FOR AGENTS │
│ (Create, Read, Update, Delete agents at scale) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. OBSERVABILITY │
│ (Monitor performance, costs, and results) │
└─────────────────────────────────────────────────────────┘
```
Without all three, orchestration fails. You need:
- **Orchestrator** to command agents
- **CRUD** to manage agent lifecycle
- **Observability** to understand what agents are doing
## Core Principle: The Orchestrator Sleeps
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. It has created and commanded our agents. Now, our agents are doing the work."
**The pattern:**
```text
1. User prompts Orchestrator
2. Orchestrator creates specialized agents
3. Orchestrator commands agents with detailed prompts
4. Orchestrator SLEEPS (stops consuming context)
5. Agents work autonomously
6. Orchestrator wakes periodically to check status
7. Orchestrator reports results to user
8. Agents are deleted
```
**Why orchestrator sleeps:**
- Protects its context window
- Avoids observing all agent work (too much information)
- Only wakes when needed to check status or command agents
**Example orchestrator sleep pattern:**
```python
# Illustrative sketch against a hypothetical orchestrator API

# Orchestrator commands agents
orchestrator.create_agent("scout", task="Find relevant files")
orchestrator.create_agent("builder", task="Implement changes")

# Orchestrator sleeps, checking status every 15s
while not orchestrator.all_agents_complete():
    orchestrator.sleep(15)  # not consuming context while asleep
    status = orchestrator.check_agent_status()
    orchestrator.log(status)

# Wake up to collect results
results = orchestrator.get_agent_results()
orchestrator.summarize_to_user(results)
```
## Orchestration Patterns
### Pattern 1: Scout-Plan-Build (Sequential Chaining)
**Use case:** Complex tasks requiring multiple specialized steps
**Flow:**
```text
User: "Migrate codebase to new SDK"
Orchestrator creates Scout agents (4 parallel)
├→ Scout 1: Search with Gemini
├→ Scout 2: Search with CodeX
├→ Scout 3: Search with Haiku
└→ Scout 4: Search with Flash
Scouts output: relevant-files.md with exact locations
Orchestrator creates Planner agent
├→ Reads relevant-files.md
├→ Scrapes documentation
└→ Outputs: detailed-plan.md
Orchestrator creates Builder agent
├→ Reads detailed-plan.md
├→ Executes implementation
└→ Tests and validates
```
**Why this works:**
- **Scout step offloads searching from Planner** (R&D framework: Reduce + Delegate)
- **Multiple scout models** provide diverse perspectives
- **Planner only sees relevant files**, not entire codebase
- **Builder focused on execution**, not planning
**Implementation:**
```bash
# Composable slash commands
/scout-plan-build "Migrate to new Claude Agent SDK"
# Internally runs:
/scout "Find files needing SDK migration"
/plan-with-docs docs=https://agent-sdk-docs.com
/build plan=agents/plans/sdk-migration.md
```
**Context savings:**
```text
Without scouts:
├── Planner searches entire codebase: 50k tokens
├── Planner reads irrelevant files: 30k tokens
└── Total wasted: 80k tokens
With scouts:
├── 4 scouts search in parallel (isolated contexts)
├── Planner reads only relevant-files.md: 5k tokens
└── Savings: 75k tokens (94% reduction)
```
### Pattern 2: Plan-Build-Review-Ship (Task Board)
**Use case:** Structured development lifecycle with quality gates
**Flow:**
```text
User: "Update HTML titles across application"
Task created → PLAN column
Orchestrator creates Planner agent
├→ Analyzes requirements
├→ Creates implementation plan
└→ Moves task to BUILD
Orchestrator creates Builder agent
├→ Reads plan
├→ Implements changes
├→ Runs tests
└→ Moves task to REVIEW
Orchestrator creates Reviewer agent
├→ Checks implementation against plan
├→ Validates tests pass
└→ Moves task to SHIP
Orchestrator creates Shipper agent
├→ Creates git commit
├→ Pushes to remote
└→ Task complete
```
**Why this works:**
- **Clear phases** with distinct responsibilities
- **Each agent focused** on single phase
- **Quality gates** between phases
- **Failure isolation** - if builder fails, planner work preserved
**Visual representation:**
```text
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ Task A │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Agent handoff:**
```python
# Orchestrator manages task board state
task = {
"id": "update-titles",
"status": "planning",
"assigned_agent": "planner-001",
"artifacts": []
}
# Planner completes
task["status"] = "building"
task["artifacts"].append("plan.md")
task["assigned_agent"] = "builder-001"
# Orchestrator hands off to builder
orchestrator.command_agent(
"builder-001",
f"Implement plan from {task['artifacts'][0]}"
)
```
### Pattern 3: Scout-Builder (Two-Stage)
**Use case:** UI changes, targeted modifications
**Flow:**
```text
User: "Create gray pills for app header information"
Orchestrator creates Scout
├→ Locates exact files and line numbers
├→ Identifies patterns and conventions
└→ Outputs: scout-report.md
Orchestrator creates Builder
├→ Reads scout-report.md
├→ Implements precise changes
└→ Outputs: modified files
Orchestrator wakes, verifies, reports
```
**Orchestrator sleep pattern:**
```python
# Orchestrator creates scout
orchestrator.create_agent("scout-header", task="Find header UI components")
# Orchestrator sleeps, checking every 15s
orchestrator.sleep_with_status_checks(interval=15)
# Scout completes, orchestrator wakes
scout_output = orchestrator.get_agent_output("scout-header")
# Orchestrator creates builder with scout's output
orchestrator.create_agent(
"builder-ui",
task=f"Create gray pills based on scout findings: {scout_output}"
)
# Orchestrator sleeps again
orchestrator.sleep_with_status_checks(interval=15)
```
## Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
**The problem:** A single agent doing everything explodes its context window
```text
Single Agent Approach:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**The solution:** Specialized agents with focused context
```text
Orchestrator Approach:
├── Orchestrator: 10k tokens (coordinates)
├── Scout 1: 15k tokens (searches)
├── Scout 2: 15k tokens (searches)
├── Planner: 25k tokens (plans using scout output)
├── Builder: 35k tokens (implements)
└── Total per agent: <35k tokens (max 18% per agent)
```
**Key principle:** Agents are deletable temporary resources
```text
1. Create agent for specific task
2. Agent completes task
3. DELETE agent (free memory)
4. Create new agent for next task
5. Repeat
```
**Example:**
```bash
# User: "Build documentation for frontend and backend"
# Orchestrator creates 3 agents
/create-agent frontend-docs "Document frontend components"
/create-agent backend-docs "Document backend APIs"
/create-agent qa-docs "Combine and QA both docs"
# Work completes...
# Delete all agents when done
/delete-all-agents
# Result: All agents gone, context freed
```
**Why delete agents:**
- Frees context windows for new work
- Prevents context accumulation
- Enforces single-purpose design
- Matches engineering principle: "The best code is no code at all"
## CRUD for Agents
Orchestrator needs full agent lifecycle control:
**Create:**
```python
agent_id = orchestrator.create_agent(
name="scout-api",
task="Find all API endpoints",
model="haiku", # Fast, cheap for search
max_tokens=100000
)
```
**Read:**
```python
# Check agent status
status = orchestrator.get_agent_status(agent_id)
# => {"status": "working", "progress": "60%", "context_used": "15k tokens"}
# Read agent output
output = orchestrator.get_agent_output(agent_id)
# => {"files_consumed": [...], "files_produced": [...]}
```
**Update:**
```python
# Command existing agent with new task
orchestrator.command_agent(
agent_id,
"Now implement the changes based on your findings"
)
```
**Delete:**
```python
# Single agent
orchestrator.delete_agent(agent_id)
# All agents
orchestrator.delete_all_agents()
```
## Observability Requirements
Without observability, orchestration is blind. You need:
### 1. Agent-Level Visibility
```text
For each agent, track:
├── Name and ID
├── Status (creating, working, complete, failed)
├── Context window usage
├── Model and cost
├── Files consumed
├── Files produced
└── Tool calls executed
```
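One way to hold this per-agent record is a small dataclass. A sketch with hypothetical field names mirroring the list above:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    agent_id: str
    name: str
    status: str = "creating"   # creating | working | complete | failed
    context_tokens: int = 0    # context window usage
    model: str = "sonnet"
    cost_usd: float = 0.0
    files_consumed: list[str] = field(default_factory=list)
    files_produced: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
```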
### 2. Cross-Agent Visibility
```text
Fleet overview:
├── Total agents active
├── Total context consumed
├── Total cost
├── Agent dependencies (who's waiting on whom)
└── Bottlenecks (slow agents blocking others)
```
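Building on the `AgentRecord` sketch above, the fleet overview is then a simple aggregation:

```python
def fleet_overview(agents: list[AgentRecord]) -> dict:
    # Roll per-agent records up into a fleet-level summary.
    active = [a for a in agents if a.status == "working"]
    return {
        "total_active": len(active),
        "total_context_tokens": sum(a.context_tokens for a in agents),
        "total_cost_usd": round(sum(a.cost_usd for a in agents), 2),
    }
```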
### 3. Real-Time Streaming
```text
User sees:
├── Agent creation events
├── Tool calls as they happen
├── Progress updates
├── Completion notifications
└── Error alerts
```
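The source system streams these events over WebSocket. A minimal client-side sketch, assuming a server listening at the hypothetical `ws://localhost:4000/stream` and the third-party `websockets` package:

```python
import asyncio
import json

import websockets  # pip install websockets

async def stream_event(event: dict):
    # One connection per event keeps the sketch simple; a real
    # emitter would hold a single long-lived connection instead.
    async with websockets.connect("ws://localhost:4000/stream") as ws:
        await ws.send(json.dumps(event))

if __name__ == "__main__":
    asyncio.run(stream_event({"eventType": "notification", "message": "demo"}))
```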
**Implementation:** See [Hooks for Observability](hooks-observability.md) for complete architecture
## Information Flow in Orchestrated Systems
```text
User
↓ (prompts)
Orchestrator
↓ (creates & commands)
Agent 1 → Agent 2 → Agent 3
↓ ↓ ↓
(results flow back up)
Orchestrator (summarizes)
User
```
**Critical understanding:** Agents never talk directly to the user. They report to the orchestrator.
**Example:**
```text
# User prompts the orchestrator
User → Orchestrator: "Summarize codebase"

# Orchestrator creates an agent with detailed instructions
Orchestrator → Agent:
    Read all files in src/
    Create markdown summary with:
    - Architecture overview
    - Key components
    - File structure
    - Tech stack
    Report results back to orchestrator (not user!)

# Agent completes, reports to the orchestrator
Agent → Orchestrator: "Summary complete at docs/summary.md"

# Orchestrator reports to the user
Orchestrator → User: "Codebase summary created with 3 main sections: architecture, components, and tech stack"
```
## When to Use Orchestration
### Use orchestration when
**Task requires 3+ specialized agents**
- Example: Scout + Plan + Build
**Context window exploding in single agent**
- Single agent using >150k tokens
**Need parallel execution**
- Multiple independent subtasks
**Quality gates required**
- Plan → Build → Review → Ship
**Long-running autonomous work**
- Agents work while you're AFK
### Don't use orchestration when
**Simple one-off task**
- Single agent sufficient
**Learning/prototyping**
- Orchestration adds complexity
**No observability infrastructure**
- You'll be blind to agent behavior
**Haven't mastered custom agents**
- Level 5 requires Level 4 foundation
## Practical Implementation
### Minimal Orchestrator Agent
**orchestrator-agent.md** (sub-agent definition):

````markdown
---
name: orchestrator
description: Manages fleet of agents for complex multi-step tasks
---
# Orchestrator Agent
You are an orchestrator agent managing a fleet of specialized agents.
## Your Tools
- create_agent(name, task, model): Create new agent
- command_agent(agent_id, task): Send task to existing agent
- get_agent_status(agent_id): Check agent progress
- get_agent_output(agent_id): Retrieve agent results
- delete_agent(agent_id): Remove completed agent
- delete_all_agents(): Clean up all agents
## Your Responsibilities
1. **Break down user requests** into specialized subtasks
2. **Create focused agents** for each subtask
3. **Command agents** with detailed instructions
4. **Monitor progress** without micromanaging
5. **Collect results** and synthesize for user
6. **Delete agents** when work is complete
## Orchestrator Sleep Pattern
After creating and commanding agents:
1. **SLEEP** - Stop consuming context
2. **Wake every 15-30s** to check agent status
3. **SLEEP again** if agents still working
4. **Wake when all complete** to collect results
DO NOT observe all agent work. This explodes your context window.
## Example Workflow
```text
User: "Migrate codebase to new SDK"
You:
1. Create scout agents (parallel search)
2. Command scouts to find SDK usage
3. SLEEP (check status every 15s)
4. Wake when scouts complete
5. Create planner agent
6. Command planner with scout results
7. SLEEP (check status every 15s)
8. Wake when planner completes
9. Create builder agent
10. Command builder with plan
11. SLEEP (check status every 15s)
12. Wake when builder completes
13. Summarize results for user
14. Delete all agents
```
## Key Principles
- **One agent, one task** - Don't overload agents
- **Sleep between phases** - Protect your context
- **Delete when done** - Treat agents as temporary
- **Detailed commands** - Don't assume agents know context
- **Results-oriented** - Every agent must produce concrete output
````
### Orchestrator Tools (SDK)
```python
# Assumes an SDK exposing an `mcptool` decorator and an `agent_manager`
# runtime for agent lifecycle management

# create_agent tool
@mcptool(
    name="create_agent",
    description="Create a new specialized agent"
)
def create_agent(params: dict) -> dict:
    name = params["name"]
    task = params["task"]
    model = params.get("model", "sonnet")
    agent_id = agent_manager.create(
        name=name,
        system_prompt=task,
        model=model
    )
    return {
        "agent_id": agent_id,
        "status": "created",
        "message": f"Agent {name} created"
    }

# command_agent tool
@mcptool(
    name="command_agent",
    description="Send task to existing agent"
)
def command_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    task = params["task"]
    result = agent_manager.prompt(agent_id, task)  # result surfaces via observability
    return {
        "agent_id": agent_id,
        "status": "commanded",
        "message": "Agent received task"
    }
```
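The remaining lifecycle tools follow the same shape. A sketch of `delete_agent` under the same assumed decorator and `agent_manager` (the `delete` call mirrors the CRUD section above):

```python
# delete_agent tool
@mcptool(
    name="delete_agent",
    description="Delete a completed agent and free its context"
)
def delete_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    agent_manager.delete(agent_id)
    return {
        "agent_id": agent_id,
        "status": "deleted",
        "message": "Agent removed and context freed"
    }
```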
## Trade-offs
### Benefits
- ✅ Scales beyond single agent limits
- ✅ Parallel execution (3x-10x speedup)
- ✅ Context window protection
- ✅ Specialized agent focus
- ✅ Quality gates between phases
- ✅ Autonomous out-of-loop work
### Costs
- ❌ Upfront investment to build
- ❌ Infrastructure complexity (database, WebSocket)
- ❌ More moving parts to manage
- ❌ Requires observability
- ❌ Orchestrator agent needs careful prompting
- ❌ Not worth it for simple tasks
## Key Quotes
> "The orchestrator agent is the first pattern where I felt the perfect combination of observability, customizability, and agents at scale."
>
> "Treat your agents as deletable temporary resources that serve a single purpose."
>
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. Now, our agents are doing the work."
>
> "200k context window is plenty. You're just stuffing a single agent with too much work."
## Source Attribution
**Primary source:** One Agent to Rule Them All (orchestrator architecture, three pillars, sleep pattern, CRUD)
**Supporting sources:**
- Claude 2.0 (scout-plan-build workflow, composable prompts)
- Custom Agents (plan-build-review-ship task board)
- Sub-Agents (information flow, delegation patterns)
## Related Documentation
- [Hooks for Observability](hooks-observability.md) - Required for orchestration
- [Context Window Protection](context-window-protection.md) - Why orchestration matters
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real orchestration systems
---
**Remember:** Orchestration is Level 5. Master Levels 1-4 first. Then build your fleet.