Initial commit

Zhongwei Li
2025-11-29 18:00:36 +08:00
commit c83b4639c5
49 changed files with 18594 additions and 0 deletions


@@ -0,0 +1,211 @@
---
name: multi-agent-composition
description: >
This skill should be used when the user asks to "choose between skill and agent", "compose
multi-agent system", "orchestrate agents", "manage agent context", "design component
architecture", "should I use a skill or agent", "when to use hooks vs MCP", "build orchestrator
workflow", needs decision frameworks for Claude Code components (skills, sub-agents, hooks, MCP
servers, slash commands), context management patterns, or wants to build effective multi-component
agentic systems with proper orchestration and anti-patterns guidance.
---
# Multi-Agent Composition
**Master Claude Code's components, patterns, and principles** to build effective agentic systems.
## When to Use This Knowledge
Use this knowledge when:
- **Learning Claude Code** - Understanding what each component does
- **Making architectural decisions** - Choosing Skills vs Sub-Agents vs MCP vs Slash Commands
- **Building custom solutions** - Creating specialized agents or orchestration systems
- **Scaling agentic workflows** - Moving from single agents to multi-agent orchestration
- **Debugging issues** - Understanding why components behave the way they do
- **Adding observability** - Implementing hooks for monitoring and control
## Quick Reference
### The Core 4 Framework
Every agent is built on these four elements:
1. **Context** - What information does the agent have?
2. **Model** - What capabilities does the model provide?
3. **Prompt** - What instruction are you giving?
4. **Tools** - What actions can the agent take?
> "Everything comes down to just four pieces. If you understand these, you will win."
### Component Overview
| Component | Trigger | Use When | Best For |
|-----------|---------|----------|----------|
| **Skills** | Agent-invoked | Repeat problems needing management | Domain-specific workflows |
| **Sub-Agents** | Tool-invoked | Parallelization & context isolation | Scale & batch operations |
| **MCP Servers** | As needed | External data/services | Integration with external systems |
| **Slash Commands** | Manual/tool | One-off tasks | Simple repeatable prompts |
| **Hooks** | Lifecycle events | Observability & control | Monitoring & blocking |
### Composition Hierarchy
```text
Skills (Top Layer)
├─→ Can use: Sub-Agents, Slash Commands, MCP Servers, Other Skills
└─→ Purpose: Orchestrate primitives for repeatable workflows
Sub-Agents (Execution Layer)
├─→ Can use: Slash Commands, Skills
└─→ Cannot nest other Sub-Agents
Slash Commands (Primitive Layer)
└─→ The fundamental building block
MCP Servers (Integration Layer)
└─→ Connect external systems
```
### Golden Rules
1. **Always start with prompts** - Master the primitive first
2. **"Parallel" = Sub-Agents** - Nothing else supports parallel execution
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Context, Model, Prompt, Tools** - Never forget the foundation
## Documentation Structure
This skill uses progressive disclosure. Start here, then navigate to specific topics as needed.
### Reference Documentation
**Architecture fundamentals** - What each component is and how they work
- **[architecture.md](reference/architecture.md)** - Component definitions, capabilities, restrictions
- **[core-4-framework.md](reference/core-4-framework.md)** - Deep dive into Context, Model, Prompt, Tools
### Implementation Patterns
**How to use components effectively** - Decision-making and implementation
- **[decision-framework.md](patterns/decision-framework.md)** - When to use Skills vs Sub-Agents vs MCP vs Slash Commands
- **[hooks-in-composition.md](patterns/hooks-in-composition.md)** - Implementing hooks for observability and control
- **[orchestrator-pattern.md](patterns/orchestrator-pattern.md)** - Multi-agent orchestration at scale
- **[context-management.md](patterns/context-management.md)** - Managing context across agents
- **[context-in-composition.md](patterns/context-in-composition.md)** - Context handling in multi-agent systems
### Anti-Patterns
**Common mistakes to avoid**
- **[common-mistakes.md](anti-patterns/common-mistakes.md)** - Converting all slash commands to
skills, using skills for one-offs, context explosion, and more
### Examples
**Real-world case studies and progression paths**
- **[progression-example.md](examples/progression-example.md)** - Evolution from prompt → sub-agent → skill (work tree manager example)
- **[case-studies.md](examples/case-studies.md)** - Scout-builder patterns, orchestration workflows, multi-agent systems
### Workflows
**Visual guides and decision trees**
- **[decision-tree.md](workflows/decision-tree.md)** - Decision trees, mindmaps, and visual guides for choosing components
## Getting Started
### If you're new to Claude Code
1. Start with **[reference/architecture.md](reference/architecture.md)** to understand components
2. Read **[reference/core-4-framework.md](reference/core-4-framework.md)** to grasp the foundation
3. Use **[patterns/decision-framework.md](patterns/decision-framework.md)** to make your first architectural choice
4. Check **[anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md)** to avoid pitfalls
### If you're making an architectural decision
1. Open **[patterns/decision-framework.md](patterns/decision-framework.md)**
2. Follow the decision tree to identify the right component
3. Review the specific component in **[reference/architecture.md](reference/architecture.md)**
4. Check **[examples/](examples/)** for similar use cases
### If you're adding observability
1. Read **[patterns/hooks-in-composition.md](patterns/hooks-in-composition.md)** to understand available hooks and implementation
2. Use the isolated-scripts pattern (UV, Bun, or shell), as sketched below
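As a sketch of the isolated-script pattern, a self-contained UV script might log each stop event to a JSONL file. The shebang and inline-metadata block follow UV's single-file script convention; the assumption that the hook payload arrives as JSON on stdin matches the post-tool-use example in the case studies, and the log path is illustrative:
```python
#!/usr/bin/env -S uv run --script
# /// script
# dependencies = []
# ///
# Minimal stop-hook sketch: append each stop event to a JSONL log.
# Assumes the hook payload arrives as JSON on stdin; path is illustrative.
import json
import sys
from pathlib import Path

def main() -> None:
    event = json.load(sys.stdin)
    log = Path(".claude/logs/stop-events.jsonl")
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a") as f:
        f.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    main()
```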
### If you're scaling to multi-agent orchestration
1. Ensure you've mastered custom agents first
2. Read **[patterns/orchestrator-pattern.md](patterns/orchestrator-pattern.md)**
3. Study **[examples/case-studies.md](examples/case-studies.md)**
4. Review **[patterns/context-management.md](patterns/context-management.md)**
## Key Principles from the Field
### Prompts Are the Primitive
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."
**Everything is prompts in the end.** Master slash commands before skills. Have a strong bias toward slash commands.
### Skills Are Compositional, Not Replacements
> "It is very clear this does not replace any existing feature or capability. It is a higher compositional level."
Skills orchestrate other components; they don't replace them. Don't convert all your
slash commands to skills—that's a huge mistake.
### Observability is Everything
> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."
If you can't measure it, you can't improve it. If you can't measure it, you can't scale it.
### Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
Create focused agents with single purposes. Delete them when done. Treat agents as temporary, deletable resources.
### The Agentic Engineering Progression
```text
Level 1: Base agents → Use agents out of the box
Level 2: Better agents → Customize prompts and workflows
Level 3: More agents → Run multiple agents
Level 4: Custom agents → Build specialized solutions
Level 5: Orchestration → Manage fleets of agents
```
## Source Attribution
This knowledge synthesizes:
- Video presentations by Claude Code engineering team
- Official Claude Code documentation (docs.claude.com)
- Hands-on experimentation and validation
- Multi-agent orchestration patterns from the field
## Quick Navigation
**Need to understand what a component is?** → [reference/architecture.md](reference/architecture.md)
**Need to choose the right component?** → [patterns/decision-framework.md](patterns/decision-framework.md)
**Need to implement hooks?** → [patterns/hooks-in-composition.md](patterns/hooks-in-composition.md)
**Need to scale to multiple agents?** → [patterns/orchestrator-pattern.md](patterns/orchestrator-pattern.md)
**Need to see real examples?** → [examples/](examples/)
**Need visual guides?** → [workflows/decision-tree.md](workflows/decision-tree.md)
**Want to avoid mistakes?** → [anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md)
---
**Remember:** Context, Model, Prompt, Tools. Master these four, and you master Claude Code.


@@ -0,0 +1,429 @@
# Common Anti-Patterns in Claude Code
**Critical mistakes to avoid** when building with Claude Code components.
## Table of Contents
- [The Fatal Five](#the-fatal-five)
- [1. Converting All Slash Commands to Skills](#1-converting-all-slash-commands-to-skills)
- [2. Using Skills for One-Off Tasks](#2-using-skills-for-one-off-tasks)
- [3. Skipping the Primitive (Not Mastering Prompts First)](#3-skipping-the-primitive-not-mastering-prompts-first)
- [4. Forcing Single Agents to Do Too Much (Context Explosion)](#4-forcing-single-agents-to-do-too-much-context-explosion)
- [5. Using Sub-Agents When Context Matters](#5-using-sub-agents-when-context-matters)
- [Secondary Anti-Patterns](#secondary-anti-patterns)
- [6. Confusing MCP with Internal Orchestration](#6-confusing-mcp-with-internal-orchestration)
- [7. Forgetting the Core Four](#7-forgetting-the-core-four)
- [8. No Observability (Can't Measure, Can't Improve)](#8-no-observability-cant-measure-cant-improve)
- [9. Nesting Sub-Agents](#9-nesting-sub-agents)
- [10. Over-Engineering Simple Problems](#10-over-engineering-simple-problems)
- [11. Agent Dependency Coupling](#11-agent-dependency-coupling)
- [Anti-Pattern Detection Checklist](#anti-pattern-detection-checklist)
- [Recovery Strategies](#recovery-strategies)
- [Remember](#remember)
## The Fatal Five
These are the most common and damaging mistakes engineers make:
### 1. Converting All Slash Commands to Skills
**The Mistake:**
> "There are a lot of engineers right now that are going all in on skills. They're converting all their slash commands to skills. I think that's a huge mistake."
**Why it's wrong:**
- Skills are for **repeat problems that need management**, not simple one-off tasks
- Slash commands are the **primitive foundation** - you need them
- You're adding unnecessary complexity and context overhead
- Skills should **complement** slash commands, not replace them
**Correct approach:**
- Keep slash commands for simple, direct tasks
- Only create a skill when you're **managing a problem domain** with multiple related operations
- Have a strong bias toward slash commands
**Example:**
- ❌ Wrong: Create a skill for generating a single commit message
- ✅ Right: Use a slash command for one-off commit messages; create a skill only if managing an entire git workflow system
---
### 2. Using Skills for One-Off Tasks
**The Mistake:**
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill. This is not what skills are for."
**Why it's wrong:**
- Skills have overhead (metadata, loading, management)
- One-off tasks don't benefit from reuse
- You're over-engineering a simple problem
**Signal words that indicate you DON'T need a skill:**
- "One time"
- "Quick"
- "Just need to..."
- "Simple task"
**Correct approach:**
- Use a slash command for one-off tasks
- If you find yourself doing it repeatedly (3+ times), **then** consider a skill
**Example:**
- ❌ Wrong: Build a skill to create one UI component
- ✅ Right: Use a slash command; upgrade to skill only after creating components repeatedly
---
### 3. Skipping the Primitive (Not Mastering Prompts First)
**The Mistake:**
> "When you're starting out, I always recommend you just build a prompt. Don't build a skill. Don't build a sub-agent. Don't build out an MCP server. Keep it simple. Build a prompt."
**Why it's wrong:**
- If you don't master prompts, you can't build effective skills
- Everything is prompts in the end (tokens in, tokens out)
- You're building on a weak foundation
**The fundamental truth:**
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."
**Correct approach:**
1. Always start with a prompt/slash command
2. Master the primitive first
3. Scale up only when needed
4. Build from the foundation upward
**Example:**
- ❌ Wrong: "I'm going to start by building a skill because it's more advanced"
- ✅ Right: "I'll write a prompt first, see if it works, then consider scaling to a skill"
---
### 4. Forcing Single Agents to Do Too Much (Context Explosion)
**The Mistake:**
> "200k context window is plenty. You're just stuffing a single agent with too much work, just like your boss did to you at your last job. Don't force your agent to context switch."
**Why it's wrong:**
- Context explosion leads to poor performance
- Agent loses focus across too many unrelated tasks
- You're treating your agent like an overworked employee
- Results degrade as context window fills
**Correct approach:**
- Create **focused agents** with single purposes
- Use **sub-agents** for parallel, isolated work
- **Delete agents** when their task is complete
- Treat agents as **temporary, deletable resources**
**Example:**
- ❌ Wrong: One agent that reads codebase, writes tests, updates docs, and deploys
- ✅ Right: Four focused agents - one for reading, one for tests, one for docs, one for deployment
---
### 5. Using Sub-Agents When Context Matters
**The Mistake:**
> "Sub-agents isolate and protect your context window... But of course, you have to be okay with losing that context afterward because it will be lost."
**Why it's wrong:**
- Sub-agent context is **isolated**
- You can't reference sub-agent work later without resumable sub-agents
- You lose the conversation history
**Correct approach:**
- Use sub-agents when:
- You need **parallelization**
- Context **isolation** is desired
- You're okay **losing context** after
- Use main conversation when:
- You need context later
- Work builds on previous steps
- Conversation continuity matters
**Example:**
- ❌ Wrong: Use sub-agent for research task, then try to reference findings 10 prompts later
- ✅ Right: Do research in main conversation if you'll need it later; use sub-agent only for isolated batch work
---
## Secondary Anti-Patterns
### 6. Confusing MCP with Internal Orchestration
**The Mistake:** Using MCP servers for internal workflows instead of external integrations.
**Why it's wrong:**
> "To me, there is very very little overlap here between agent skills and MCP servers. These are fully distinct."
**Clear rule:** External = MCP, Internal orchestration = Skills
**Example:**
- ❌ Wrong: Build MCP server to orchestrate your internal test suite
- ✅ Right: Build a skill for internal test orchestration; use MCP to connect to external CI/CD service
---
### 7. Forgetting the Core Four
**The Mistake:** Not monitoring Context, Model, Prompt, and Tools at critical moments.
**Why it's wrong:**
> "Context, model, prompt, tools. Do you know what these four leverage points are at every critical moment? This is the foundation."
**Correct approach:**
- Always know the state of the Core Four for your agents
- Monitor context window usage
- Understand which model is active
- Track what prompts are being used
- Know what tools are available
---
### 8. No Observability (Can't Measure, Can't Improve)
**The Mistake:** Running agents without logging, monitoring, or hooks.
**Why it's wrong:**
> "When it comes to agentic coding, observability is everything. If you can't measure it, you can't improve it. And if you can't measure it, you can't scale it."
**Correct approach:**
- Implement hooks for logging (post-tool-use, stop)
- Track agent performance and costs
- Monitor what files are read/written
- Capture chat transcripts
- Review agent behavior to improve prompts
---
### 9. Nesting Sub-Agents
**The Mistake:** Trying to spawn sub-agents from within other sub-agents.
**Why it's wrong:**
- Hard limit in Claude Code architecture
- Prevents infinite nesting
- Not supported by the system
**The restriction:**
> "Sub-agents cannot spawn other sub-agents. This prevents infinite nesting while still allowing Claude to gather necessary context."
**Correct approach:**
- Use orchestrator pattern instead
- Flatten your agent hierarchy
- Have main agent create all sub-agents
---
### 10. Over-Engineering Simple Problems
**The Mistake:** Building complex multi-agent orchestration for tasks that could be a single prompt.
**Why it's wrong:**
- Unnecessary complexity
- Maintenance burden
- Slower execution
- Higher costs
**The principle:** Start simple, scale only when needed.
**Decision checklist before scaling:**
- [ ] Have I tried solving this with a single prompt?
- [ ] Is this actually a repeat problem?
- [ ] Will the added complexity pay off?
- [ ] Am I solving a real problem or just playing with new features?
---
### 11. Agent Dependency Coupling
**The Mistake:** Creating agents that depend on the exact output format of other agents.
**Why it's wrong:**
- Creates **brittle coupling** between agents
- Changes to one agent's output **break downstream agents**
- Makes the system **hard to maintain** and evolve
- Creates a **hidden dependency graph** that's not explicit
**The problem:**
When Agent B expects Agent A to return data in a specific format (e.g., JSON with specific field names, or markdown with specific structure), you create tight coupling. If Agent A's output changes, Agent B silently breaks.
**Warning signs:**
- Agents parsing other agents' string outputs
- Hard-coded field names or output structure assumptions
- Agents that "expect" data in a certain format without validation
- No explicit contracts between agents
**Correct approach:**
**1. Use explicit contracts:**
```text
Agent A prompt:
"Return JSON with these exact fields: {id, name, status, created_at}"
Agent B prompt:
"You will receive JSON with fields: {id, name, status, created_at}
Validate the structure before processing."
```
**2. Use structured data formats:**
- Define JSON schemas explicitly
- Document expected fields
- Validate inputs before processing
- Handle missing or malformed data gracefully (see the validation sketch below)
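A minimal validation sketch for such a contract, using the illustrative field names from the Agent A/B example above (stdlib only; adapt the required fields to your own contract):
```python
# Sketch: validate a sub-agent's JSON output against an explicit
# contract before processing, so contract drift fails loudly.
import json

REQUIRED_FIELDS = {"id", "name", "status", "created_at"}

def parse_agent_output(raw: str) -> dict:
    """Parse agent output and reject anything missing contract fields."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Agent output missing fields: {sorted(missing)}")
    return data
```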
**3. Minimize agent-to-agent communication:**
- Prefer orchestrator pattern (main agent coordinates)
- Pass data through orchestrator, not agent-to-agent
- Keep sub-agents independent when possible
**4. Version your agent contracts:**
```text
Agent output format v2:
{
"version": "2.0",
"data": {...},
"metadata": {...}
}
```
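A version gate to pair with the format above might look like this (the version string and envelope fields come from the example; `SUPPORTED_VERSIONS` is an assumed convention):
```python
# Sketch: reject agent output whose contract version you don't support.
SUPPORTED_VERSIONS = {"2.0"}

def unwrap_versioned_output(payload: dict) -> dict:
    """Check the contract version, then return the data envelope."""
    version = payload.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported agent contract version: {version!r}")
    return payload["data"]
```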
**Example:**
**Wrong (Brittle Coupling):**
```text
Agent A: "Analyze files and report findings"
[Returns: "Found 3 issues in foo.py and 2 in bar.py"]
Agent B: "Parse Agent A's output and fix the issues"
[Expects: "Found N issues in X and Y in Z" format]
```
**Problem:** If Agent A changes its output format, Agent B breaks silently.
**Right (Explicit Contract):**
```text
Agent A: "Analyze files and return JSON:
{
'files_analyzed': [...],
'findings': [
{'file': 'foo.py', 'line': 10, 'issue': '...'},
{'file': 'bar.py', 'line': 20, 'issue': '...'}
]
}"
Agent B: "You will receive JSON with fields: {files_analyzed, findings}.
First validate the structure. Then fix each issue in findings array."
```
**Better (Orchestrator Pattern):**
```text
Main Agent:
1. Spawn Agent A to analyze files
2. Parse Agent A's JSON output
3. Transform to format Agent B needs
4. Spawn Agent B with explicit data structure
5. Agent B doesn't need to know about Agent A
```
**Best practice:** The orchestrator (main agent) owns the contracts and data transformations. Sub-agents are independent and don't depend on each other's formats.
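In code terms, the transformation the orchestrator owns can be very small. A sketch, using the illustrative field names from the JSON contract above:
```python
# Sketch: the orchestrator adapts Agent A's output into the shape
# Agent B expects, so neither agent knows about the other.
def transform_for_builder(analysis: dict) -> list[dict]:
    """Flatten Agent A's findings into the fix tasks Agent B receives."""
    return [
        {"file": f["file"], "line": f["line"], "fix": f["issue"]}
        for f in analysis["findings"]
    ]
```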
---
## Anti-Pattern Detection Checklist
Ask yourself these questions:
**Before creating a skill:**
- [ ] Is this a **repeat problem** that needs **management**?
- [ ] Have I solved this with a prompt/slash command first?
- [ ] Am I avoiding the mistake of converting simple commands to skills?
**Before using a sub-agent:**
- [ ] Do I need **parallelization** or **context isolation**?
- [ ] Am I okay **losing this context** afterward?
- [ ] Could this be done in the main conversation instead?
**Before using MCP:**
- [ ] Is this for **external** data/services?
- [ ] Am I sure this isn't internal orchestration in disguise?
**Before scaling to multi-agent orchestration:**
- [ ] Have I mastered custom agents first?
- [ ] Do I have observability in place?
- [ ] Am I solving a real scale problem?
---
## Recovery Strategies
**If you've fallen into these anti-patterns:**
1. **Converted slash commands to skills?**
- Evaluate each skill: Is it truly a repeat management problem?
- Downgrade skills that are just one-off tasks back to slash commands
- Keep your slash command library strong
2. **Context explosion in single agent?**
- Split work across focused sub-agents
- Use orchestrator pattern for complex workflows
- Delete agents when tasks complete
3. **No observability?**
- Add hooks immediately (start with stop and post-tool-use)
- Log chat transcripts
- Track tool usage
- Monitor costs
4. **Lost in complexity?**
- Step back to basics: What's the simplest solution?
- Remove unnecessary abstractions
- Return to prompts/slash commands
- Scale up only when proven necessary
---
## Remember
> "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
>
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill."
>
> "Context, model, prompt, tools. This never goes away."
**The golden path:** Start with prompts → Scale thoughtfully → Add observability → Manage complexity


@@ -0,0 +1,992 @@
# Multi-Agent Case Studies
Real-world examples of multi-agent systems in production, drawn from field experience.
## Case Study Index
| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |
---
## Case Study 1: AI Docs Loader
**Pattern:** Sub-agent delegation for parallel work
**Problem:** Loading 10 documentation URLs pulls 30k+ tokens of scraped content into context; a single agent handling every scrape itself would climb to 150k+ tokens.
**Solution:** Delegate each scrape to isolated sub-agent
**Architecture:**
```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)
Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```
**Implementation:**
```bash
# Single command
/load-ai-docs
# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
# - Spawn sub-agent
# - Sub-agent scrapes URL
# - Sub-agent saves to file
# - Sub-agent reports completion
# Primary agent never sees scrape content
```
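The freshness check behind "older than 24 hours" could be as simple as a file-mtime comparison. A sketch; the cache layout is an assumption, not from the transcript:
```python
# Sketch: decide whether a cached doc needs a scrape sub-agent.
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # 24-hour staleness threshold

def needs_refresh(doc_path: Path) -> bool:
    """True if the cached doc is missing or older than 24 hours."""
    if not doc_path.exists():
        return True
    return time.time() - doc_path.stat().st_mtime > MAX_AGE_SECONDS
```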
**Key techniques:**
- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary
**Results:**
- **Time:** 10 scrapes in parallel vs. sequential (10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues
**Source:** Elite Context Engineering transcript
---
## Case Study 2: SDK Migration
**Pattern:** Scout-plan-build with multiple perspectives
**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications
**Challenge:**
- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes
**Solution:** Three-phase workflow with delegation
**Phase 1: Scout (Reduce context for planner)**
```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)
Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")
Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```
**Why multiple models?** Diverse perspectives catch edge cases single model might miss.
**Phase 2: Plan (Focus on relevant subset)**
```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)
Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```
**Phase 3: Build (Execute plan)**
```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion
Context used: ~80k tokens
Still within safe limits
```
**Final context analysis:**
```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)
With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```
**Key techniques:**
- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation
**Results:**
- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)
**Near miss:** "We were 14% away from exploding our context" due to autocompact buffer
**Lesson:** Disable autocompact buffer. That 22% matters at scale.
**Source:** Claude 2.0 transcript
---
## Case Study 3: Codebase Summarization
**Pattern:** Orchestrator with specialized QA agents
**Problem:** Summarize large codebase (frontend + backend) with architecture docs
**Approach:** Divide and conquer with synthesis
**Architecture:**
```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│ ├─ Summarizes frontend components
│ └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│ ├─ Summarizes backend APIs
│ └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
├─ Reads both summaries
├─ Synthesizes unified view
└─ Outputs: codebase-overview.md
```
**Orchestrator behavior:**
```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
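The sleep/wake loop in steps 4-6 amounts to polling on a timer rather than streaming agent output. A minimal sketch; the `get_status` callable is an assumed integration point, not a Claude Code API:
```python
# Sketch: the orchestrator sleeps between status checks so sub-agent
# output never enters its context window.
import time
from typing import Callable, Iterable

def wait_for_agents(
    agent_ids: Iterable[str],
    get_status: Callable[[str], str],
    poll_seconds: float = 15.0,
) -> None:
    """Block until every agent reports 'complete', polling on a timer."""
    pending = set(agent_ids)
    while pending:
        pending = {a for a in pending if get_status(a) != "complete"}
        if pending:
            time.sleep(poll_seconds)
```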
**Prompts from orchestrator:**
```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"
Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"
Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```
**Observability interface shows:**
```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds
[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds
[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds
Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```
**Key techniques:**
- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use
**Results:**
- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)
**Source:** One Agent to Rule Them All transcript
---
## Case Study 4: UI Component Creation
**Pattern:** Scout-builder two-stage
**Problem:** Create gray pills for app header information display
**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.
**Solution:** Scout locates, builder implements
**Phase 1: Scout**
```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│ ├── src/components/AppHeader.vue
│ ├── src/styles/pills.css
│ └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
├── File locations
├── Line numbers for modifications
├── Existing patterns to follow
└── Recommended approach
```
**Phase 2: Builder**
```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```
**Orchestrator involvement:**
```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```
**Key techniques:**
- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads
**Results:**
- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try
**Source:** One Agent to Rule Them All transcript
---
## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board
**Pattern:** Structured lifecycle with quality gates
**Problem:** Ensure all changes go through proper review before shipping
**Architecture:**
```text
Task Board Columns:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Example task: "Update HTML titles"**
**Column 1: PLAN**
```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│ ├── index.html
│ └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│ 1. Update index.html <title>
│ 2. Update App.tsx header component
│ 3. Test both pages load correctly
└── Moves task to BUILD column
```
**Column 2: BUILD**
```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│ ├── index.html: "Plan Build Review Ship"
│ └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```
**Column 3: REVIEW**
```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│ ├── Plan followed? ✅
│ ├── Tests passing? ✅
│ ├── Code quality? ✅
│ └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```
**Column 4: SHIP**
```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```
**Orchestrator's role:**
```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```
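The board's quality-gate rule reduces to a one-step-at-a-time state machine over the four columns. A sketch, with the task representation assumed:
```python
# Sketch: tasks advance exactly one column at a time, in order,
# so REVIEW can never be skipped on the way to SHIP.
PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]

def advance(task: dict) -> dict:
    """Move a task to the next phase of the board."""
    i = PHASES.index(task["phase"])
    if i == len(PHASES) - 1:
        raise ValueError("Task already shipped")
    return {**task, "phase": PHASES[i + 1]}
```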
**UI representation:**
```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│ ├── PLAN: planner-001 (completed 2m ago)
│ ├── BUILD: builder-002 (completed 1m ago)
│ └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```
**Key techniques:**
- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved
**Results:**
- **Zero shipping untested code** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)
**Source:** Custom Agents transcript
---
## Case Study 6: Meta-Agent System
**Pattern:** Agents building agents
**Problem:** Need new specialized agent but don't want to hand-write configuration
**Solution:** Meta-agent that builds other agents
**Meta-agent prompt:**
```markdown
# meta-agent.md
You are a meta-agent that builds new sub-agents from user descriptions.
When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples
Output: .claude/agents/<agent-name>.md with complete configuration
```
**Example: Building TTS summary agent**
**User:** "Build agent that summarizes what my code does using text-to-speech"
**Meta-agent process:**
```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)
Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format
Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"
Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]
Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅
Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
**Result:** Fully functional TTS summary agent created from natural language description
**Recursion depth:**
```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
└→ Level 2: TTS summary agent (built by meta-agent)
└→ Level 3: Sub-agents (if TTS agent spawns any)
```
**Key techniques:**
- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents
**Challenges:**
- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated
**Results:**
- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**
**Source:** Sub-Agents transcript
---
## Case Study 7: Observability Dashboard
**Pattern:** Real-time multi-agent monitoring
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
**Solution:** Centralized observability system
**Architecture:**
```text
┌──────────────────── Multiple Agents ────────────────────┐
│ Agent 1 Agent 2 Agent 3 Agent 4 Agent 5 │
│ ↓ ↓ ↓ ↓ ↓ │
│ pre/post-tool-use hooks │
│ ↓ │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Bun Server ─────────────────────────┐
│ POST /events endpoint │
│ ├→ Store in SQLite (persistence) │
│ └→ Broadcast via WebSocket (real-time) │
└──────────────────────────────────────────────────────────┘
┌──────────────────── Web Client ─────────────────────────┐
│ ┌─────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/type) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Event Details (with AI-generated summaries) │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
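The transcript's server is a Bun app; as a language-neutral sketch of the same ingest flow, a Python version using Flask and sqlite3 (stand-ins, chosen here only to show the shape of the flow) might look like this, with the WebSocket broadcast step omitted:
```python
# Sketch of the /events ingest endpoint. The source system uses a Bun
# server; Flask + sqlite3 are stand-ins for illustration.
import json
import sqlite3

from flask import Flask, request

app = Flask(__name__)
db = sqlite3.connect("events.db", check_same_thread=False)  # demo-only setup
db.execute(
    "CREATE TABLE IF NOT EXISTS events "
    "(app TEXT, type TEXT, payload TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

@app.post("/events")
def ingest():
    event = request.get_json()
    db.execute(
        "INSERT INTO events (app, type, payload) VALUES (?, ?, ?)",
        (event.get("app"), event.get("type"), json.dumps(event)),
    )
    db.commit()
    # A real server would also broadcast the event to WebSocket clients here.
    return {"ok": True}
```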
**Hook implementation:**
```python
# .claude/hooks/post-tool-use.py
import json
import subprocess
import sys

def main():
    event = json.load(sys.stdin)
    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",    # App name
        "post-tool-use",  # Event type
        "--summarize",    # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
**What user sees:**
```text
┌─────────────── Live Activity Pulse ───────────────┐
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
└────────────────────────────────────────────────────┘
┌─────────────── Event Stream ──────────────────────┐
│ [Agent A] post-tool-use │
│ Summary: "Wrote authentication logic to user.py"│
│ Time: 2s ago │
├────────────────────────────────────────────────────┤
│ [Agent B] sub-agent-stop │
│ Summary: "Completed documentation scrape" │
│ Time: 5s ago │
├────────────────────────────────────────────────────┤
│ [Agent C] notification │
│ Summary: "Needs approval for rm command" │
│ Time: 8s ago │
└────────────────────────────────────────────────────┘
```
**Filtering:**
```text
Filters available:
├── By app (codebase-1, codebase-2, etc.)
├── By agent session ID
├── By event type (pre-tool, post-tool, stop, etc.)
└── By time window (1min, 3min, 5min)
```
**Event summarization:**
```text
# Each event summarized by Haiku ($0.0002 per event)
Event: post-tool-use for Write tool
Input: {file: "auth.py", content: "...500 lines..."}
Output: Success
Summary generated:
"Implemented JWT authentication with refresh tokens in auth.py"
Cost: $0.0002
Human value: Instant understanding without reading 500 lines
```
**Key techniques:**
- **One-way data stream** - Simple, fast, scalable
- **Edge summarization** - AI summaries generated at hook time
- **Dual storage** - SQLite (history) + WebSocket (real-time)
- **Color coding** - Consistent colors per agent session
**Results:**
- **5-10 agents monitored** simultaneously
- **Thousands of events logged** (cost: ~$0.20)
- **Real-time visibility** into all agent work
- **Historical analysis** via SQLite queries
**Business value:**
- **Catch errors fast** (notification events = agent blocked)
- **Optimize workflows** (which tools used most?)
- **Debug issues** (what happened before failure?)
- **Scale confidence** (can observe 10+ agents easily)
**Source:** Multi-Agent Observability transcript
---
## Case Study 8: AFK Agent Device
**Pattern:** Autonomous background work while you're away
**Problem:** Long-running tasks block your terminal. You want to work on something else.
**Solution:** Dedicated device running agent fleet
**Architecture:**
```text
Your Device (interactive):
├── Claude Code session
├── Send job to agent device
└── Monitor status updates
Agent Device (autonomous):
├── Picks up job from queue
├── Executes: Scout → Plan → Build → Ship
├── Reports status every 60s
└── Ships results to git
```
**Workflow:**
```bash
# From your device
/afk-agents \
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
--adw "plan-build-ship" \
--docs "https://openai-agent-sdk.com/docs"
# Job sent to dedicated device
# You continue working on your device
# Background: Agent device executes workflow
```
**Agent device execution:**
```text
[00:00] Job received: Build 3 SDK agents
[00:05] Planner agent created
[00:45] Plan complete: 3 agents specified
[01:00] Builder agent 1 created (basic agent)
[02:30] Builder agent 1 complete: basic-agent.py ✅
[02:35] Builder agent 2 created (with tools)
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
[04:20] Builder agent 3 created (realtime voice)
[07:45] Builder agent 3 partial: needs audio libraries
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
[08:05] Shipper agent created
[08:20] Git commit created
[08:25] Pushed to remote
[08:30] Job complete ✅
```
**Status updates (every 60s):**
```text
Your device shows:
[60s] Status: Planning agents...
[120s] Status: Building agent 1 of 3...
[180s] Status: Building agent 2 of 3...
[240s] Status: Building agent 3 of 3...
[300s] Status: Testing agents...
[360s] Status: Shipping to git...
[420s] Status: Complete ✅
Click to view: results/sdk-agents-20250105/
```
**What you do:**
```text
1. Send job (10 seconds)
2. Go AFK (work on something else)
3. Get notified when complete (7 minutes later)
4. Review results
```
**Key techniques:**
- **Job queue** - Agents pick up work from queue
- **Async status** - Reports back periodically
- **Autonomous execution** - No human in the loop
- **Git integration** - Results automatically committed
**Results:**
- **3 SDK agents built** in 7 minutes
- **You worked on other things** during that time
- **Autonomous end-to-end** - plan + build + test + ship
- **Code review** - Quick glance confirms quality
**Infrastructure required:**
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
- Agent queue system
- Job scheduler
- Status reporting
**Use cases:**
- Long-running builds
- Overnight work
- Prototyping experiments
- Documentation generation
- Codebase refactors
**Source:** Claude 2.0 transcript
---
## Cross-Cutting Patterns
### Pattern: Context Window as Resource Constraint
**Appears in:**
- Case 1: Sub-agent delegation protects primary
- Case 2: Scout-plan-build reduces planner context
- Case 3: Orchestrator sleeps to protect its context
- Case 8: Fresh agents for each phase (no accumulation)
**Lesson:** Context is precious. Protect it aggressively.
### Pattern: Specialized Agents Over General
**Appears in:**
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
- Case 4: Scout finds, builder builds (not one agent doing both)
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
- Case 6: Meta-agent only builds, doesn't execute
**Lesson:** "A focused agent is a performant agent."
### Pattern: Observability Enables Scale
**Appears in:**
- Case 3: Orchestrator tracks agent status
- Case 5: Task board shows current phase
- Case 7: Real-time dashboard for all agents
- Case 8: Status updates every 60s
**Lesson:** "If you can't measure it, you can't scale it."
### Pattern: Deletable Temporary Resources
**Appears in:**
- Case 3: All 3 agents deleted after completion
- Case 4: Scout and builder deleted
- Case 5: Each phase agent deleted after task moves
- Case 8: Builder agents deleted after shipping
**Lesson:** "The best agent is a deleted agent."
## Performance Comparisons
### Single Agent vs. Multi-Agent
| Task | Single Agent | Multi-Agent | Speedup |
|------|--------------|-------------|---------|
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
### With vs. Without Orchestration
| Metric | Manual (no orchestrator) | With Orchestrator |
|--------|-------------------------|-------------------|
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
| Error recovery | Start over | Retry failed phase only |
| Observability | Terminal logs | Real-time dashboard |
## Common Failure Modes
### Failure: Context Explosion
**Scenario:** Case 2 without scouts
- Single agent reads 100+ files
- Context hits 180k tokens
- Agent slows down, makes mistakes
- Eventually fails or times out
**Fix:** Add scout phase to filter files first
### Failure: Orchestrator Watching Everything
**Scenario:** Case 3 with observing orchestrator
- Orchestrator watches all agent work
- Orchestrator context grows to 100k+
- Can't coordinate more than 2-3 agents
- System doesn't scale
**Fix:** Implement orchestrator sleep pattern
### Failure: No Observability
**Scenario:** Case 7 without dashboard
- 5 agents running
- One agent stuck on permission request
- No way to know which agent needs attention
- Entire workflow blocked
**Fix:** Add hooks + observability system
### Failure: Agent Accumulation
**Scenario:** Case 5 not deleting agents
- 20 tasks completed
- 80 agents still running (4 per task)
- System resources exhausted
- New agents can't start
**Fix:** Delete agents after task completion
## Key Takeaways
1. **Parallelization = Sub-agents** - Nothing else runs agents in parallel
2. **Context protection = Specialization** - Focused agents use less context
3. **Orchestration = Scale** - Single interface manages fleet
4. **Observability = Confidence** - Can't scale what you can't see
5. **Deletable = Sustainable** - Free resources for next task
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
## When to Use Multi-Agent Patterns
Use multi-agent when:
- ✅ Task naturally divides into parallel subtasks
- ✅ Single agent context approaching limits
- ✅ Need quality gates between phases
- ✅ Want to work on other things while agents execute
- ✅ Have observability infrastructure
Don't use multi-agent when:
- ❌ Simple one-off task
- ❌ Learning/prototyping phase
- ❌ No way to monitor agents
- ❌ Task requires tight human-in-loop feedback
## Source Attribution
All case studies drawn from field experience documented in 8 source transcripts:
1. Elite Context Engineering - Case 1 (AI docs loader)
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
3. Custom Agents - Case 5 (task board)
4. Sub-Agents - Case 6 (meta-agent)
5. Multi-Agent Observability - Case 7 (dashboard)
6. Hooked - Supporting patterns
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
8. (Transcript 8 name not specified in context)
## Related Documentation
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
---
**Remember:** These are real systems in production. Start simple, add complexity only when needed.


@@ -0,0 +1,358 @@
# Work Tree Manager: Evolution Path Example
**Real-world case study** showing the proper progression from prompt → sub-agent → skill.
## The Problem
Managing git work trees across a project requires multiple related operations:
- Creating new work trees
- Listing existing work trees
- Removing old work trees
- Merging work tree changes
- Updating work tree status
## Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple slash command that creates one work tree:
```bash
/create-worktree feature-branch
```
**Implementation:**
```markdown
# .claude/commands/create-worktree.md
Create a new git worktree for the specified branch.
Steps:
1. Check if branch exists
2. Create worktree directory
3. Initialize worktree
4. Report success
```
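A sketch of the git operations this command wraps (paths and the branch-creation fallback are assumptions; `git worktree add` covers steps 2 and 3 in one call):
```python
# Sketch: create a worktree for a branch, creating the branch if
# it doesn't exist yet. Paths are illustrative.
import subprocess

def create_worktree(branch: str, base_dir: str = "../worktrees") -> None:
    path = f"{base_dir}/{branch}"
    # Step 1: check whether the branch already exists
    exists = subprocess.run(
        ["git", "rev-parse", "--verify", branch],
        capture_output=True,
    ).returncode == 0
    # Steps 2-3: create the worktree directory and check out the branch
    if exists:
        subprocess.run(["git", "worktree", "add", path, branch], check=True)
    else:
        subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    # Step 4: report success
    print(f"Worktree ready at {path}")
```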
**When to stay here:** The task is infrequent or one-off.
**Signal to advance:** You find yourself creating work trees regularly.
## Stage 2: Add Sub-Agent for Parallelism
**Goal:** Scale to multiple parallel operations
When you need to create multiple work trees at once, use a sub-agent:
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**Why sub-agent:**
- **Parallelization** - Create 3 work trees simultaneously
- **Context isolation** - Each creation is independent
- **Speed** - 3x faster than sequential
**Sub-agent prompt:**
```markdown
Create work trees for the following branches in parallel:
- feature-a
- feature-b
- feature-c
For each branch:
1. Verify branch exists
2. Create worktree directory
3. Initialize worktree
4. Report status
Use the /create-worktree command for each.
```
**When to stay here:** Parallel creation is the only requirement.
**Signal to advance:** You need to **manage** work trees (not just create them).
## Stage 3: Create Skill for Management
**Goal:** Bundle multiple related operations
The problem has grown beyond creation—you need comprehensive work tree **management**:
```text
skills/work-tree-manager/
├── SKILL.md
├── scripts/
│ ├── validate.py
│ └── cleanup.py
└── reference/
└── git-worktree-commands.md
```
**SKILL.md:**
````markdown
---
name: work-tree-manager
description: Manage git worktrees - create, list, remove, merge, and update across projects. Use when working with git worktrees or when managing multiple branches simultaneously.
---
# Work Tree Manager
## Operations
### Create
Use /create-worktree command for single operations.
For parallel creation, delegate to sub-agent.
### List
Run: `git worktree list`
Parse output and present in readable format.
### Remove
1. Check if work tree is clean
2. Remove work tree directory
3. Prune references
### Merge
1. Fetch latest changes
2. Merge work tree branch to target
3. Clean up if merge successful
### Update
1. Check status of all work trees
2. Pull latest changes
3. Report any conflicts
## Validation
Before any destructive operation, run:
```bash
python scripts/validate.py <worktree-path>
```
## Cleanup
Periodically run cleanup to remove stale work trees:
```bash
python scripts/cleanup.py --dry-run
```
````
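As a sketch of what `scripts/validate.py` might check before a destructive operation (the clean-worktree criterion is an assumption based on the Remove steps above):
```python
# scripts/validate.py sketch: refuse destructive operations on a
# worktree with uncommitted changes ("clean" criterion is assumed).
import subprocess
import sys

def worktree_is_clean(path: str) -> bool:
    """True if `git status --porcelain` reports nothing in the worktree."""
    result = subprocess.run(
        ["git", "-C", path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == ""

if __name__ == "__main__":
    path = sys.argv[1]
    if not worktree_is_clean(path):
        sys.exit(f"Worktree {path} has uncommitted changes; aborting.")
```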
**Why skill:**
- **Multiple related operations** - Create, list, remove, merge, update
- **Repeat problem** - Managing work trees is ongoing
- **Domain-specific** - Specialized knowledge about git worktrees
- **Orchestration** - Coordinates slash commands, sub-agents, and scripts
**When to stay here:** Most workflows stop here.
**Signal to advance:** Need external data (GitHub API, CI/CD status).
## Stage 4: Add MCP for External Data
**Goal:** Integrate external systems
Add MCP server to query external repo metadata:
```text
skills/work-tree-manager/
├── SKILL.md (updated)
└── ... (existing files)
# Now references GitHub MCP for:
# - Branch protection rules
# - CI/CD status
# - Pull request information
```
**Updated SKILL.md section:**
```markdown
## External Integration
Before creating work tree, check GitHub status:
- Use GitHub MCP to query branch protection
- Check if CI is passing
- Verify no open blocking PRs
Query: `GitHub:get_branch_status <branch-name>`
```
**Why MCP:**
- **External data** - Information lives outside Claude Code
- **Real-time** - CI/CD status changes frequently
- **Third-party** - GitHub API integration
## Final State
```text
Prompt (Slash Command)
└─→ Creates single work tree
Sub-Agent
└─→ Creates multiple work trees in parallel
Skill
├─→ Orchestrates: Create, list, remove, merge, update
├─→ Uses: Slash commands for primitives
├─→ Uses: Sub-agents for parallel operations
└─→ Uses: Scripts for validation
MCP Server (GitHub)
└─→ Provides: Branch status, CI/CD info, PR data
Skill + MCP
└─→ Full-featured work tree manager with external integration
```
## Key Takeaways
### Progression Signals
**Prompt → Sub-Agent:**
- Signal: Need parallelization
- Keyword: "multiple," "parallel," "batch"
**Sub-Agent → Skill:**
- Signal: Need management, not just execution
- Keywords: "manage," "coordinate," "workflow"
- Multiple related operations emerge
**Skill → Skill + MCP:**
- Signal: Need external data or services
- Keywords: "GitHub," "API," "real-time," "status"
### Common Mistakes
**Skipping the prompt**
- Starting with a skill for simple creation
**Overusing sub-agents**
- Using sub-agents when main conversation would work
**Skill too early**
- Creating skill before understanding the full problem domain
**Correct approach**
- Build from bottom up
- Add complexity only when needed
- Each stage solves a real problem
### Decision Checklist
Before advancing to next stage:
**Prompt → Sub-Agent:**
- [ ] Do I need parallelization?
- [ ] Are operations truly independent?
- [ ] Am I okay losing context after?
**Sub-Agent → Skill:**
- [ ] Am I doing this repeatedly (3+ times)?
- [ ] Do I have multiple related operations?
- [ ] Is this a management problem, not just execution?
- [ ] Would orchestration add real value?
**Skill → Skill + MCP:**
- [ ] Do I need external data?
- [ ] Is the data outside Claude Code's control?
- [ ] Would real-time info improve the workflow?
## Real Usage
### Scenario 1: Quick One-Off
**Task:** Create one work tree for hotfix
**Solution:** Slash command
```bash
/create-worktree hotfix-urgent-bug
```
**Why:** Simple, direct, one-time task.
### Scenario 2: Feature Development Sprint
**Task:** Create work trees for 5 feature branches
**Solution:** Sub-agent
```bash
Create work trees in parallel for sprint features:
feature-auth, feature-api, feature-ui, feature-tests, feature-docs
```
**Why:** Parallel execution, independent operations.
### Scenario 3: Ongoing Project
**Task:** Manage all work trees across development lifecycle
**Solution:** Skill
```text
List all work trees, check status, merge completed features, clean up stale ones
```
**Why:** Multiple operations, repeat problem, management need.
### Scenario 4: CI/CD Integration
**Task:** Only create work trees for branches passing CI
**Solution:** Skill + MCP
```bash
Create work trees for features that:
- Have passing CI (check via GitHub MCP)
- Are approved by reviewers
- Have no merge conflicts
```
**Why:** Need external data from GitHub API.
## Summary
The work tree manager evolution demonstrates:
1. **Start simple** - Slash command for basic operation
2. **Scale for parallelism** - Sub-agent for batch operations
3. **Manage complexity** - Skill for full workflow orchestration
4. **Integrate externally** - MCP for real-time external data
**The principle:** Each stage solves a real problem. Don't advance until you hit the limitation of your current approach.
> "When you're starting out, I always recommend you just build a prompt. Everything is a prompt in the end."
Build from the foundation upward.


@@ -0,0 +1,158 @@
# Context in Composition
**Strategic framework for managing context when composing multi-agent systems.**
## The Core Problem
Context window is your most precious resource when composing multiple agents. A focused agent is a performant agent.
**The Reality:**
```text
Single agent doing everything:
├── Context explodes to 150k+ tokens
├── Performance degrades
└── Eventually fails or times out
Multi-agent composition:
├── Each agent: <40k tokens
├── Main agent: Stays lean
└── Work completes successfully
```
## The R&D Framework
There are only two strategies for managing context in multi-agent systems:
**R - Reduce**
- Minimize what enters context windows
- Remove unused MCP servers (can consume 24k+ tokens)
- Shrink static CLAUDE.md files
- Use context priming instead of static loading
**D - Delegate**
- Move work to sub-agents' isolated contexts
- Use background agents for autonomous work
- Employ orchestrator sleep patterns
- Treat agents as deletable temporary resources
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Mastery
### Level 1: Beginner - Stop Wasting Tokens
**Focus:** Resource management
**Key Actions:**
- Remove unused MCP servers (reclaim 20k+ tokens)
- Minimize CLAUDE.md (<1k tokens)
- Disable autocompact buffer (reclaim 20%)
**Success Metric:** 85-90% context window free at startup
**Move to Level 2 when:** Resources cleaned but still rebuilding context for different tasks
---
### Level 2: Intermediate - Load Selectively
**Focus:** Dynamic context loading
**Key Actions:**
- Context priming (`/prime` commands vs. static files)
- Sub-agent delegation for parallel work
- Composable workflows (scout-plan-build)
**Success Metric:** 60-75% context window free during work
**Move to Level 3 when:** Managing multiple agents but struggling with handoffs
---
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Agent-to-agent context transfer
**Key Actions:**
- Context bundles (60-70% transfer in 10% tokens)
- Monitor context limits proactively
- Chain multiple agents without overflow
**Success Metric:** Per-agent context <60k tokens, successful handoffs
**Move to Level 4 when:** Need agents working autonomously while you do other work
---
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Fleet orchestration
**Key Actions:**
- Background agents (`/background` command)
- Dedicated agent environments
- Orchestrator sleep patterns
- Zero-touch execution
**Success Metric:** Agents ship work end-to-end without intervention
---
## When Context Becomes a Composition Issue
**Trigger 1: Single Agent Exceeds 150k Tokens**
→ Delegate to sub-agents with isolated contexts
**Trigger 2: Agent Reading >20 Files**
→ Use scout agents to identify relevant subset first
**Trigger 3: `/context` Shows >80% Used**
→ Start fresh agent, use context bundles for handoff
**Trigger 4: Performance Degrading Mid-Workflow**
→ Split workflow across multiple focused agents
**Trigger 5: Same Analysis Repeated Multiple Times**
→ Context overflow forcing re-reads; delegate earlier
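Taken together, these triggers collapse into a simple dispatch: measure usage, then pick Reduce or Delegate. A minimal sketch of that dispatch, assuming hypothetical stats and thresholds (none of this is a Claude Code API):
```python
# Illustrative only: maps the five triggers above to an R&D action.
# ContextStats and the thresholds are hypothetical, not a Claude Code API.
from dataclasses import dataclass

@dataclass
class ContextStats:
    tokens_used: int   # tokens currently in the agent's window
    files_read: int    # files the agent has opened this session
    window_size: int = 200_000

def recommend_action(stats: ContextStats) -> str:
    usage = stats.tokens_used / stats.window_size
    if stats.tokens_used > 150_000:
        return "DELEGATE: split work across sub-agents with isolated contexts"
    if stats.files_read > 20:
        return "DELEGATE: run a scout agent to find the relevant subset first"
    if usage > 0.80:
        return "DELEGATE: start a fresh agent and hand off via a context bundle"
    return "REDUCE: keep trimming static context; no handoff needed yet"

print(recommend_action(ContextStats(tokens_used=165_000, files_read=12)))
# → DELEGATE: split work across sub-agents with isolated contexts
```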
## Composition Patterns by Level
**Beginner:** Single agent, minimal static context
**Intermediate:** Main agent + sub-agents for parallel work
**Advanced:** Agent chains with context bundles for handoff
**Agentic:** Orchestrator + fleet of specialized agents
## Key Principles
1. **Focused agents perform better** - Single purpose, minimal context
2. **Agents are deletable** - Free context by removing completed agents
3. **200k is plenty** - Context explosions are design problems, not capacity problems
4. **Orchestrators must sleep** - Don't observe all sub-agent work
5. **Context bundles over full replay** - 70% context in 10% tokens
## Implementation Details
For practical patterns, see:
- [Multi-Agent Context Isolation](../reference/multi-agent-context-isolation.md) - Parallel execution, context bundling
- [Orchestrator Pattern](orchestrator-pattern.md) - Sleep patterns, fleet management
- [Decision Framework](decision-framework.md) - When to use each component
## Source Attribution
Primary: Elite Context Engineering, Claude 2.0 transcripts
Supporting: One Agent to Rule Them All, Sub-Agents documentation
---
**Remember:** Context is the first pillar of the Core 4. Master context strategy, and you can scale infinitely with focused agents.

View File

@@ -0,0 +1,715 @@
# Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.
## The Core Problem
**Every engineer hits this wall:**
```text
Agent starts: 10k tokens (5% used)
After exploration: 80k tokens (40% used)
After planning: 120k tokens (60% used)
During implementation: 170k tokens (85% used) ⚠️
Context explodes: 195k tokens (98% used) ❌
Agent performance degrades, fails, or times out
```
**The realization:** More context ≠ better performance. Too much context = cognitive overload.
## The R&D Framework
There are only two ways to manage your context window:
```text
R - REDUCE
└─→ Minimize what enters the context window
D - DELEGATE
└─→ Move work to other agents' context windows
```
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Protection
### Level 1: Beginner - Reduce Waste
**Focus:** Stop wasting tokens on unused resources
#### Tactic 1: Eliminate Default MCP Servers
**Problem:**
```text
# Default mcp.json
{
  "mcpServers": {
    "firecrawl": {...},  # 6k tokens
    "github": {...},     # 8k tokens
    "postgres": {...},   # 5k tokens
    "redis": {...}       # 5k tokens
  }
}
# Total: 24k tokens always loaded (12% of 200k window!)
```
**Solution:**
```bash
# Option 1: Delete default mcp.json entirely
rm .claude/mcp.json
# Option 2: Load selectively
claude-mcp-config --strict specialized-configs/firecrawl-only.json
# Result: 4k tokens instead of 24k (83% reduction)
```
#### Tactic 2: Minimize CLAUDE.md
**Before:**
```markdown
# CLAUDE.md (23,000 tokens = 11.5% of window)
- 500 lines of API documentation
- 300 lines of deployment procedures
- 1,500 lines of coding standards
- Architecture diagrams
- Always loaded, whether relevant or not
```
**After:**
```markdown
# CLAUDE.md (500 tokens = 0.25% of window)
# Only universal essentials
- Fenced code blocks MUST have language
- Use rg instead of grep
- ALWAYS use set -euo pipefail
```
**Rule:** Only include what you're 100% sure you want loaded 100% of the time.
#### Tactic 3: Disable Autocompact Buffer
**Problem:**
```bash
/context
# Output:
autocompact buffer: 22% ⚠️ (44k tokens gone!)
messages: 51%
system_tools: 8%
---
Total available: 78% (should be 100%)
```
**Solution:**
```bash
/config
# Set: autocompact = false
# Now:
/context
# Output:
messages: 51%
system_tools: 8%
custom_agents: 2%
---
Total available: 91% ✅ (reclaimed 22%!)
```
**Impact:** Reclaims 40k+ tokens immediately.
### Level 2: Intermediate - Dynamic Loading
**Focus:** Load what you need, when you need it
#### Tactic 4: Context Priming
**Replace static CLAUDE.md with task-specific `/prime` commands**
```markdown
# .claude/commands/prime.md
# General codebase context (2k tokens)
Read README, understand structure, report findings
# .claude/commands/prime-feature.md
# Feature development context (3k tokens)
Read feature requirements, understand dependencies, plan implementation
# .claude/commands/prime-api.md
# API work context (4k tokens)
Read API docs, understand endpoints, review integration patterns
```
**Usage pattern:**
```bash
# Starting feature work
/prime-feature
# vs. having 23k tokens always loaded
```
**Savings:** 20k tokens (87% reduction)
#### Tactic 5: Sub-Agent Delegation
**Problem:** Primary agent doing parallel work fills its own context
```text
Primary Agent tries to do:
├── Web scraping (15k tokens)
├── Documentation fetch (12k tokens)
├── Data analysis (10k tokens)
└── Synthesis (5k tokens)
= 42k tokens in one agent
```
**Solution:** Delegate to sub-agents with isolated contexts
```text
Primary Agent (9k tokens):
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
└→ Sub-Agent 3: Analysis (10k tokens, isolated)
Total work: 46k tokens
Primary agent context: Only 9k tokens ✅
```
**Example:**
```bash
/load-ai-docs
# Agent spawns 10 sub-agents for web scraping
# Each scrape: ~3k tokens
# Total work: 30k tokens
# Primary agent context: Still only 9k tokens
# Savings: 21k tokens protected
```
**Key insight:** Sub-agents use their own system prompts (not user prompts), keeping their context isolated from the primary agent.
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Chain agents together without context explosion
#### Tactic 6: Context Bundles
**Problem:** Agent 1's context explodes (180k tokens). Need to hand off to fresh Agent 2 without full replay.
**Solution:** Bundle 60-70% of essential context
```markdown
# context-bundle-2025-01-05-<session-id>.md
## Context Bundle
Created: 2025-01-05 14:30
Source Agent: agent-abc123
## Initial Setup
/prime-feature
## Read Operations (deduplicated)
- src/api/endpoints.ts
- src/components/Auth.tsx
- config/env.ts
## Key Findings
- Auth system uses JWT
- API has 15 endpoints
- Config needs migration
## User Prompts (summarized)
1. "Implement OAuth2 flow"
2. "Add refresh token logic"
[Excluded: full write operations, detailed read contents, tool execution details]
```
**Usage:**
```bash
# Agent 1: Context exploding at 180k
# Automatic bundle saved
# Agent 2: Fresh start (10k base)
/loadbundle /path/to/context-bundle-<timestamp>.md
# Agent 2 now has 70% of Agent 1's context in ~15k tokens
# Total: 25k tokens vs. 180k (86% reduction)
```
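Bundles like this can be accumulated mechanically rather than written by hand, for example from a post-tool-use hook that appends each read operation. A minimal sketch, assuming a hypothetical bundle path and the same hook payload shape used elsewhere in this document:
```python
# Sketch: append read operations to a context bundle as the agent works.
# The bundle path and markdown layout are assumptions for illustration.
import json
import sys
from datetime import date
from pathlib import Path

BUNDLE = Path(f".claude/bundles/context-bundle-{date.today()}.md")

def log_read(tool_name: str, tool_input: dict) -> None:
    """Record file reads so a fresh agent can replay the essentials."""
    if tool_name != "read":
        return
    BUNDLE.parent.mkdir(parents=True, exist_ok=True)
    if not BUNDLE.exists():
        BUNDLE.write_text("## Context Bundle\n## Read Operations (deduplicated)\n")
    entry = f"- {tool_input.get('file_path', '')}"
    if entry not in BUNDLE.read_text().splitlines():  # deduplicate
        with BUNDLE.open("a") as f:
            f.write(entry + "\n")

if __name__ == "__main__":
    event = json.load(sys.stdin)  # post-tool-use hook payload
    log_read(event.get("toolName", ""), event.get("toolInput", {}))
```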
#### Tactic 7: Composable Workflows (Scout-Plan-Build)
**Problem:** Single agent searching + planning + building = context explosion
```text
Monolithic Agent:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**Solution:** Break into composable steps that delegate
```text
/scout-plan-build workflow:
Step 1: /scout (delegates to 4 parallel sub-agents)
├→ Sub-agents search codebase: 4 × 15k = 60k total
├→ Output: relevant-files.md (5k tokens)
└→ Primary agent context: unchanged
Step 2: /plan-with-docs
├→ Reads relevant-files.md: 5k tokens
├→ Scrapes docs: 8k tokens
├→ Creates plan: 3k tokens
└→ Total added: 16k tokens
Step 3: /build
├→ Reads plan: 3k tokens
├→ Implements: 30k tokens
└→ Total added: 33k tokens
Final primary agent context: 10k + 16k + 33k = 59k tokens
Savings: 106k tokens (64% reduction)
```
**Why this works:** Scout step offloads searching from planner (R&D: Reduce + Delegate)
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Agents working autonomously while you're AFK
#### Tactic 8: Focused Agents (One Agent, One Task)
**Anti-pattern:**
```text
Super Agent (trying to do everything):
├── API development
├── UI implementation
├── Database migrations
├── Testing
├── Documentation
├── Deployment
└── Context: 170k tokens (85% used)
```
**Pattern:**
```text
Focused Agent Fleet:
├── Agent 1: API only (30k tokens)
├── Agent 2: UI only (35k tokens)
├── Agent 3: DB only (20k tokens)
├── Agent 4: Tests only (25k tokens)
├── Agent 5: Docs only (15k tokens)
└── Each agent: <35k tokens (max 18% per agent)
```
**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."
#### Tactic 9: Deletable Agents
**Pattern:**
```bash
# Create agent for specific task
/create-agent docs-writer "Document frontend components"
# Agent completes task (used 30k tokens)
# DELETE agent immediately
/delete-agent docs-writer
# Result: 30k tokens freed for next agent
```
**Lifecycle:**
```text
1. Create agent → Task-specific context loaded
2. Agent works → Context grows to completion
3. Agent completes → Context maxed out
4. DELETE agent → Context freed
5. Create new agent → Fresh start
6. Repeat
```
**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
#### Tactic 10: Background Agent Delegation
**Problem:** You're in the loop, waiting for agent to finish long task
**Solution:** Delegate to background agent, continue working
```bash
# In-loop (you wait, your context stays open)
/implement-feature "Build auth system"
# Your terminal blocked for 20 minutes
# Context accumulates: 150k tokens
# Out-of-loop (you continue working)
/background "Build auth system" \
--model opus \
--report agents/auth-report.md
# Background agent works independently
# Your terminal freed immediately
# Background agent context isolated
# You get notified when complete
```
**Context protection:**
- Primary agent: 10k tokens (just manages job queue)
- Background agent: 150k tokens (isolated, will be deleted)
- Your interactive session: 10k tokens (protected)
#### Tactic 11: Orchestrator Sleep Pattern
**Problem:** Orchestrator observing all agent work = context explosion
```text
Orchestrator watches everything:
├── Scout 1 work: 15k tokens observed
├── Scout 2 work: 15k tokens observed
├── Scout 3 work: 15k tokens observed
├── Planner work: 25k tokens observed
├── Builder work: 35k tokens observed
└── Orchestrator context: 105k tokens
```
**Solution:** Orchestrator sleeps while agents work
```text
Orchestrator pattern:
1. Create scouts → 3k tokens (commands only)
2. SLEEP (not observing)
3. Wake every 15s, check status → 1k tokens
4. Scouts complete, read outputs → 5k tokens
5. Create planner → 2k tokens
6. SLEEP (not observing)
7. Wake every 15s, check status → 1k tokens
8. Planner completes, read output → 3k tokens
9. Create builder → 2k tokens
10. SLEEP (not observing)
Orchestrator final context: 17k tokens ✅
vs. 105k if watching everything (84% reduction)
```
**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
## Monitoring Context Health
### The /context Command
```bash
/context
# Healthy agent (beginner level):
messages: 8%
system_tools: 5%
custom_agents: 2%
---
Total used: 15% ✅ (85% free)
# Warning (intermediate):
messages: 45%
mcp_tools: 18%
system_tools: 5%
---
Total used: 68% ⚠️ (32% free, approaching limits)
# Danger (needs intervention):
messages: 72%
mcp_tools: 24%
system_tools: 5%
---
Total used: 101% ❌ (context overflow!)
```
### Success Metrics by Level
| Level | Target Context Free | What This Enables |
|-------|---------------------|-------------------|
| Beginner | 85-90% | Basic tasks without running out |
| Intermediate | 60-75% | Complex tasks with breathing room |
| Advanced | 40-60% | Multi-step workflows without overflow |
| Agentic | Per-agent 60-80% | Fleet of focused agents |
### Warning Signs
**Your context window is in danger when:**
**Single agent exceeds 150k tokens**
- Solution: Split work across multiple agents
**Agent needs to read >20 files**
- Solution: Use scout agents to find relevant subset
**`/context` shows >80% used**
- Solution: Start fresh agent, use context bundles
**Agent gets slower/less accurate**
- Solution: Check context usage, delegate to sub-agents
**Autocompact buffer active**
- Solution: Disable it, reclaim 20%+ tokens
## Context Window Hard Limits
> "Context window is a hard limit. We have to respect this and work around it."
### The Reality
```text
Claude Opus 200k limit:
├── System prompt: ~8k tokens (4%)
├── Available tools: ~5k tokens (2.5%)
├── MCP servers: 0-24k tokens (0-12%)
├── CLAUDE.md: 0-23k tokens (0-11.5%)
├── Custom agents: ~2k tokens (1%)
└── Available for work: 138-185k tokens (69-92.5%)
Best case (optimized): 185k available
Worst case (unoptimized): 138k available
Difference: 47k tokens (25% of total capacity!)
```
### Real Example from the Field
> "We were 14% away from exploding our context in our scout-plan-build workflow."
```text
Scout-Plan-Build execution:
├── Base context: 15k tokens
├── Scout work (4 sub-agents): +40k tokens
├── Planner work: +35k tokens
├── Builder work: +80k tokens
└── Total: 170k tokens
With autocompact buffer (22%):
170k / 0.78 = 218k tokens
❌ Exceeds 200k limit by 18k (9% overflow)
Without autocompact buffer:
170k / 1.0 = 170k tokens
✅ Within limits with 30k buffer (15% free)
```
**Lesson:** Every percentage point matters when approaching limits.
## Common Context Explosion Patterns
### Pattern 1: The Sponge Agent
**Symptoms:**
- Agent reads entire codebase
- Opens 50+ files
- Context grows 10k tokens every few minutes
**Cause:** No filtering strategy
**Fix:**
```bash
# Before: Agent reads everything
Agent: "Analyzing codebase..."
[reads 100 files = 150k tokens]
# After: Scout first
/scout "Find files related to authentication"
# Scout outputs: 5 relevant files
Agent reads only those 5 files = 8k tokens
```
### Pattern 2: The Accumulator
**Symptoms:**
- Long conversation
- Many tool calls
- Context steadily grows to limit
**Cause:** Not resetting agent between phases
**Fix:**
```bash
# Phase 1: Exploration
[Agent explores, context hits 120k]
# Phase 2: Implementation
# ❌ Bad: Continue same agent (will overflow)
# ✅ Good: New agent with context bundle
/loadbundle context-from-phase-1.md
# Fresh agent (15k) + bundle (20k) = 35k tokens
# Ready for implementation without overflow
```
### Pattern 3: The Observer
**Symptoms:**
- Orchestrator context growing rapidly
- Watching all sub-agent work
- Can't coordinate more than 2-3 agents
**Cause:** Not using sleep pattern
**Fix:**
```python
# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result                 # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)
orchestrator.sleep()                  # Not observing
orchestrator.wake_and_check_status()  # Only reads summaries
```
## The "200k is Plenty" Principle
> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."
**The mindset shift:**
```text
Beginner thinking:
"I need a bigger context window"
"If only I had 500k tokens..."
"My task is too complex for 200k"
Expert thinking:
"I need better context management"
"I'm overloading a single agent"
"I should split this across focused agents"
```
**The truth:** Most context explosions are design problems, not capacity problems.
### Why 200k is Sufficient
**With proper protection:**
```text
Task: Refactor authentication across 50-file codebase
Approach 1 (Single Agent - fails):
├── Agent reads 50 files: 75k tokens
├── Agent plans changes: 20k tokens
├── Agent implements: 80k tokens
├── Agent tests: 30k tokens
└── Total: 205k tokens ❌ (overflow by 5k)
Approach 2 (Multi-Agent - succeeds):
├── Scout finds relevant 10 files: 15k tokens
├── Planner creates strategy: 20k tokens (new agent)
├── Builder 1 (auth logic): 35k tokens (new agent)
├── Builder 2 (UI changes): 30k tokens (new agent)
├── Tester verifies: 25k tokens (new agent)
└── Max per agent: 35k tokens ✅ (all within limits)
```
## Integration with Other Patterns
Context window protection enables:
**Progressive Disclosure:**
- Reduces: Minimal static context
- Enables: Dynamic loading via priming
**Core 4 Management:**
- Protects: Context (pillar #1)
- Enables: Better model/prompt/tools choices
**Orchestration:**
- Requires: Context protection (orchestrator sleep)
- Enables: Fleet management without overflow
**Observability:**
- Monitors: Context usage via hooks
- Prevents: Unnoticed context explosion
## Key Principles
1. **Reduce and Delegate** - The only two strategies that matter
2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose
3. **Agents are deletable** - Free context by removing completed agents
4. **200k is plenty** - Context explosions are design problems
5. **Monitor constantly** - `/context` command is your best friend
6. **Orchestrators must sleep** - Don't observe all agent work
7. **Context bundles over full replay** - 70% of context in 10% of tokens
## Source Attribution
**Primary sources:**
- Elite Context Engineering (R&D framework, 4 levels, all tactics)
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)
**Supporting sources:**
- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
- Sub-Agents (sub-agent delegation, context isolation)
**Key quotes:**
- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
- "A focused agent is a performant agent." (Elite Context Engineering)
- "We were 14% away from exploding our context." (Claude 2.0)
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)
## Related Documentation
- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar
---
**Remember:** Context window management separates beginners from experts. Master it, and you can scale infinitely with focused agents.

View File

@@ -0,0 +1,434 @@
# Decision Framework: Choosing the Right Claude Code Component
This guide helps you choose the right Claude Code component for your specific task. **Always start with prompts**—master the primitive first before scaling to other components.
## Table of Contents
- [The Decision Tree](#the-decision-tree)
- [Quick Reference: Decision Matrix](#quick-reference-decision-matrix)
- [When to Use Each Component](#when-to-use-each-component)
- [Use Skills When](#use-skills-when)
- [Use Sub-Agents When](#use-sub-agents-when)
- [Use Slash Commands When](#use-slash-commands-when)
- [Use MCP Servers When](#use-mcp-servers-when)
- [Use Hooks When](#use-hooks-when)
- [Use Plugins When](#use-plugins-when)
- [Use Case Examples from the Field](#use-case-examples-from-the-field)
- [Composition Rules and Boundaries](#composition-rules-and-boundaries)
- [What Can Compose What](#what-can-compose-what)
- [Critical Composition Rules](#critical-composition-rules)
- [The Proper Evolution Path](#the-proper-evolution-path)
- [Stage 1: Start with a Prompt](#stage-1-start-with-a-prompt)
- [Stage 2: Add Sub-Agent if Parallelism Needed](#stage-2-add-sub-agent-if-parallelism-needed)
- [Stage 3: Create Skill When Management Needed](#stage-3-create-skill-when-management-needed)
- [Stage 4: Add MCP if External Data Needed](#stage-4-add-mcp-if-external-data-needed)
- [Common Decision Anti-Patterns](#common-decision-anti-patterns)
- [Anti-Pattern 1: Converting All Slash Commands to Skills](#anti-pattern-1-converting-all-slash-commands-to-skills)
- [Anti-Pattern 2: Using Skills for One-Off Tasks](#anti-pattern-2-using-skills-for-one-off-tasks)
- [Anti-Pattern 3: Skipping the Primitive](#anti-pattern-3-skipping-the-primitive)
- [Anti-Pattern 4: Using Sub-Agents When Context Matters](#anti-pattern-4-using-sub-agents-when-context-matters)
- [Anti-Pattern 5: Forgetting MCP is for External Only](#anti-pattern-5-forgetting-mcp-is-for-external-only)
- [Decision Checklist](#decision-checklist)
- [Summary: The Golden Rules](#summary-the-golden-rules)
## The Decision Tree
Start here when deciding which component to use:
```text
1. START HERE: Build a Prompt (Slash Command)
2. Need parallelization or isolated context?
   YES → Use Sub-Agent
   NO  → Continue
3. External data/service integration?
   YES → Use MCP Server
   NO  → Continue
4. One-off task (simple, direct)?
   YES → Use Slash Command
   NO  → Continue
5. Repeatable workflow (pattern detection)?
   YES → Use Agent Skill
   NO  → Continue
6. Lifecycle event automation?
   YES → Use Hook
   NO  → Continue
7. Sharing/distributing to team?
   YES → Use Plugin
   NO  → Default to Slash Command (prompt)
```
**Critical Rule:** Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
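The tree is mechanical enough to encode directly. A toy sketch of the same seven questions in Python (each boolean is a judgment call you answer yourself, not something Claude Code computes):
```python
# Toy encoding of the decision tree above; every flag is your own answer
# to the corresponding question, not an API.
def choose_component(*, parallel: bool, external: bool, one_off: bool,
                     repeatable: bool, lifecycle: bool, distribute: bool) -> str:
    if parallel:
        return "Sub-Agent"
    if external:
        return "MCP Server"
    if one_off:
        return "Slash Command"
    if repeatable:
        return "Agent Skill"
    if lifecycle:
        return "Hook"
    if distribute:
        return "Plugin"
    return "Slash Command"  # the default: start with a prompt

# "Fix and debug tests at scale" → parallel work
print(choose_component(parallel=True, external=False, one_off=False,
                       repeatable=False, lifecycle=False, distribute=False))
# → Sub-Agent
```
The order of the checks matters: parallelization outranks everything else, and the fall-through default is always a prompt.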
## Quick Reference: Decision Matrix
| Task Type | Component | Reason |
|-----------|-----------|---------|
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
| External data/service access | MCP Server | Integration point |
| Parallel/isolated work | Sub-Agent | Context isolation |
| Parallel workflow tasks | Sub-Agent | **Whenever you see "parallel", think sub-agents** |
| One-off task | Slash Command | Simple, direct |
| Lifecycle automation | Hook | Event-driven |
| Team distribution | Plugin | Packaging |
## When to Use Each Component
### Use Skills When
**Signal keywords:** "automatic," "repeat," "manage," "workflow"
**Criteria:**
- You have a **REPEAT** problem that needs **MANAGEMENT**
- Multiple related operations need coordination
- You want **automatic** behavior (agent-invoked)
- The problem domain requires orchestration of multiple components
**Example scenarios:**
- Managing git work trees (create, list, remove, merge, update)
- Detecting style guide violations across codebase
- Automatic PDF text extraction and processing
- Video processing workflows with multiple steps
**NOT for:**
- One-off tasks → Use Slash Command instead
- Simple operations → Use Slash Command instead
- Problems solved well by a single prompt → Don't over-engineer
**Remember:** Skills are for managing problem domains, not solving one-off tasks.
### Use Sub-Agents When
**Signal keywords:** "parallel," "scale," "bulk," "isolated," "batch"
**Criteria:**
- **Parallelization** is needed
- **Context isolation** is required
- Scale tasks and batch operations
- You're okay with losing context afterward
- Each task can run independently
**Example scenarios:**
- Comprehensive security audits
- Fix & debug tests at scale
- Parallel workflow tasks
- Bulk operations on multiple files
- Isolated research that doesn't pollute main context
**NOT for:**
- Tasks that need to share context → Use main conversation
- Sequential operations → Use Slash Command or Skill
- Tasks that need to spawn more sub-agents → Hard limit: no nesting
**Critical constraint:** You must be okay with losing context afterward. Sub-agent context doesn't persist in the main conversation (unless you use resumable sub-agents).
**Golden rule:** "Whenever you see parallel, you should always just think sub-agents. Nothing else supports parallel calling."
### Use Slash Commands When
**Signal keywords:** "one-off," "simple," "quick," "manual"
**Criteria:**
- One-off tasks
- Simple repeatable actions
- You're starting a new workflow
- Building the primitive before composing
- You want manual control over invocation
**Example scenarios:**
- Git commit messages (one at a time)
- Create UI component
- Run specific code generation
- Execute a well-defined task
- Quick transformations
**Philosophy:** "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
**Remember:** Slash commands are the primitive foundation. Master these first before anything else.
### Use MCP Servers When
**Signal keywords:** "external," "database," "API," "service," "integration"
**Criteria:**
- External integrations are needed
- Data sources outside Claude Code
- Third-party services
- Database connections
- Real-time data access
**Example scenarios:**
- Connect to Jira
- Query databases (PostgreSQL, etc.)
- Fetch real-time weather data
- GitHub integration
- Slack integration
- Figma designs
**NOT for:**
- Internal orchestration → Use Skills instead
- Pure computation → Use Slash Command or Skill
**Clear rule:** External = MCP, Internal orchestration = Skills
**Context consideration:** MCP servers can "torch your context window" by loading all their context at startup, unlike Skills which use progressive disclosure.
### Use Hooks When
**Signal keywords:** "lifecycle," "event," "automation," "deterministic"
**Criteria:**
- Deterministic automation at lifecycle events
- Want to execute commands at specific moments
- Need to balance agent autonomy with deterministic control
- Workflow automation that should always happen
**Example scenarios:**
- Run linters before code submission
- Auto-format code after generation
- Trigger tests after file changes
- Capture context at specific points
**Philosophy:** "If you really want to scale, you need both" - agents AND deterministic workflows.
**Use for:** Adding determinism rather than always relying on the agent to decide.
### Use Plugins When
**Signal keywords:** "share," "distribute," "package," "team"
**Criteria:**
- Sharing/distributing to team
- Packaging multiple components together
- Reusable work across projects
- Team-wide extensions
**Example scenarios:**
- Distribute custom skills to team
- Bundle MCP servers for automatic start
- Share slash commands across projects
- Package hooks and configurations
**Philosophy:** "Plugins let you package and distribute these sets of work. This isn't super interesting. It's just a way to share and reuse cloud code extensions."
## Use Case Examples from the Field
Real examples with reasoning:
| Use Case | Component | Reasoning |
|----------|-----------|-----------|
| Automatic PDF text extraction | Agent Skill | Keyword "automatic", repeat behavior |
| Connect to Jira | MCP Server | External source |
| Comprehensive security audit | Sub-Agent | Scale, isolated context, not automatic |
| Generalized git commit messages | Slash Command | Simple one-step task |
| Query database | MCP Server | External data source (start here) |
| Fix/debug tests at scale | Sub-Agent | Parallel work, scale |
| Detect style guide violations | Agent Skill | Repeat behavior pattern |
| Fetch real-time weather | MCP Server | Third-party service integration |
| Create UI component | Slash Command | Simple one-off task |
| Parallel workflow tasks | Sub-Agent | Keyword "parallel" |
## Composition Rules and Boundaries
### What Can Compose What
**Skills (Top Compositional Layer):**
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Can use: Slash Commands
- ✅ Can use: Other Skills
- ❌ Cannot: Nest sub-agents/prompts directly (must use SlashCommand tool)
**Slash Commands (Primitive + Compositional):**
- ✅ Can use: Skills (via SlashCommand tool)
- ✅ Can use: MCP Servers
- ✅ Can use: Sub-Agents
- ✅ Acts as: BOTH primitive AND composition point
**Sub-Agents (Execution Layer):**
- ✅ Can use: Slash Commands (via SlashCommand tool)
- ✅ Can use: Skills (via SlashCommand tool)
- ❌ CANNOT use: Other Sub-Agents (hard limit)
**MCP Servers (Integration Layer):**
- Lower-level unit: used BY skills, never the other way around
- Exposes services to all components
### Critical Composition Rules
1. **Sub-Agents cannot nest** - No sub-agent spawning other sub-agents (prevents infinite nesting)
2. **Skills don't execute code** - They guide Claude to use available tools
3. **Slash commands can be invoked manually or via SlashCommand tool**
4. **Skills use the SlashCommand tool** to compose prompts and sub-agents
5. **No circular dependencies** - Skills can use other skills but cannot nest circularly
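Viewed as a graph problem, these rules say sub-agents are leaves and skill references must be acyclic. A small illustrative validator, assuming a hypothetical graph format (this is not a Claude Code feature, just the rules restated as code):
```python
# Illustrative validator for the composition rules above.
# `graph` maps each component to the components it invokes;
# `kinds` labels each component as "skill", "sub-agent", etc.
def validate(graph: dict[str, list[str]], kinds: dict[str, str]) -> list[str]:
    errors = []
    # Rule 1: sub-agents cannot nest
    for src, targets in graph.items():
        for dst in targets:
            if kinds[src] == "sub-agent" and kinds[dst] == "sub-agent":
                errors.append(f"{src} → {dst}: sub-agents cannot spawn sub-agents")
    # Rule 5: no circular skill dependencies (depth-first cycle check)
    def has_cycle(node, stack):
        if node in stack:
            return True
        return any(has_cycle(n, stack | {node})
                   for n in graph.get(node, []) if kinds[n] == "skill")
    for node, kind in kinds.items():
        if kind == "skill" and has_cycle(node, set()):
            errors.append(f"{node}: circular skill dependency")
    return errors

kinds = {"worktree-skill": "skill", "scout": "sub-agent", "helper": "sub-agent"}
graph = {"worktree-skill": ["scout"], "scout": ["helper"]}
print(validate(graph, kinds))
# → ['scout → helper: sub-agents cannot spawn sub-agents']
```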
## The Proper Evolution Path
When building new capabilities, follow this progression:
### Stage 1: Start with a Prompt
**Goal:** Solve the basic problem
Create a simple prompt or slash command that accomplishes the core task.
**Example (Git Work Trees):** Create one work tree
```bash
/create-worktree feature-branch
```
**When to stay here:** The task is one-off or infrequent.
### Stage 2: Add Sub-Agent if Parallelism Needed
**Goal:** Scale to multiple parallel operations
If you need to do the same thing many times in parallel, use a sub-agent.
**Example (Git Work Trees):** Create multiple work trees in parallel
```bash
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
```
**When to stay here:** Parallel execution is the only requirement, no orchestration needed.
### Stage 3: Create Skill When Management Needed
**Goal:** Bundle multiple related operations
When the problem grows to require management, create a skill.
**Example (Git Work Trees):** Manage work trees (create, list, remove, merge, update)
Now you have a cohesive work tree manager skill that:
- Creates new work trees
- Lists existing work trees
- Removes old work trees
- Merges work trees
- Updates work tree status
**When to stay here:** Most domain-specific workflows stop here.
### Stage 4: Add MCP if External Data Needed
**Goal:** Integrate external systems
Only add MCP servers when you need data from outside Claude Code.
**Example (Git Work Trees):** Query external repo metadata from GitHub API
Now your skill can query GitHub for:
- Branch protection rules
- CI/CD status
- Pull request information
**Final state:** Full-featured work tree manager with external integration.
## Common Decision Anti-Patterns
### ❌ Anti-Pattern 1: Converting All Slash Commands to Skills
**Mistake:** "I'm going to convert all my slash commands to skills because skills are better."
**Why wrong:** Skills are for repeatable workflows that need management, not simple one-off tasks. Slash commands are the primitive—you need them.
**Correct approach:** Keep slash commands for simple tasks. Only create a skill when you're managing a problem domain with multiple related operations.
### ❌ Anti-Pattern 2: Using Skills for One-Off Tasks
**Mistake:** "I need to create a UI component once, so I'll build a skill for it."
**Why wrong:** Skills are for repeat problems. One-off tasks should use slash commands.
**Correct approach:** Use a slash command for the one-off task. If you find yourself doing it repeatedly, then consider a skill.
### ❌ Anti-Pattern 3: Skipping the Primitive
**Mistake:** "I'm going to start by building a skill because it's more advanced."
**Why wrong:** If you don't master prompts, you can't build effective skills. Everything is prompts in the end.
**Correct approach:** Always start with a prompt. Build the primitive first. Scale up only when needed.
### ❌ Anti-Pattern 4: Using Sub-Agents When Context Matters
**Mistake:** "I'll use a sub-agent for this research task and then reference the findings later."
**Why wrong:** Sub-agent context is isolated. You lose it after the sub-agent finishes (unless using resumable sub-agents).
**Correct approach:** If you need the context later, do the work in the main conversation or use a resumable sub-agent.
### ❌ Anti-Pattern 5: Forgetting MCP is for External Only
**Mistake:** "I'll build an MCP server to orchestrate internal workflows."
**Why wrong:** MCP servers are for external integrations. Internal orchestration should use skills.
**Correct approach:** MCP = external, Skills = internal orchestration. Keep them separate.
## Decision Checklist
Before you start building, ask yourself:
**Basic Questions:**
- [ ] Have I started with a prompt? (Non-negotiable)
- [ ] Is this a one-off task or repeatable?
- [ ] Do I need external data or services?
- [ ] Is parallelization required?
- [ ] Am I okay losing context after execution?
**Composition Questions:**
- [ ] Am I trying to nest sub-agents? (Not allowed)
- [ ] Am I converting a simple slash command to a skill? (Probably wrong)
- [ ] Am I using MCP for internal orchestration? (Should use skills)
- [ ] Have I considered the evolution path? (Prompt → Sub-agent → Skill → MCP)
**Context Questions:**
- [ ] Will this torch my context window? (MCP consideration)
- [ ] Do I need progressive disclosure? (Skills benefit)
- [ ] Is context isolation critical? (Sub-agent benefit)
- [ ] Will I need this context later? (Don't use sub-agent)
## Summary: The Golden Rules
1. **Always start with prompts** - Master the primitive first
2. **"Parallel" keyword = Sub-Agents** - Nothing else supports parallel calling
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Skills compose upward, not downward** - Build from primitives
Remember The Core 4: Context, Model, Prompt, Tools. Master these fundamentals, and you'll master the compositional units.

View File

@@ -0,0 +1,925 @@
# Hooks for Observability and Control
> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."
Claude Code hooks provide deterministic control over agent behavior and enable comprehensive monitoring of multi-agent systems.
## What Are Hooks?
**Hooks are lifecycle event handlers that let you execute custom code at specific points in Claude Code's execution.**
```text
Agent Lifecycle:
├── pre-tool-use hook → Before any tool executes
├── [Tool executes]
├── post-tool-use hook → After tool completes
├── notification hook → When agent needs input
├── sub-agent-stop hook → When sub-agent finishes
└── stop hook → When agent completes response
```
**Two killer use cases:**
1. **Observability** - Know what your agents are doing
2. **Control** - Steer and block agent behavior
## The Five Hooks
### 1. pre-tool-use
**When it fires:** Before any tool executes
**Use cases:**
- Block dangerous commands (`rm -rf`, destructive operations)
- Prevent access to sensitive files (`.env`, `credentials.json`)
- Log tool attempts before execution
- Validate tool parameters
**Available data:**
```json
{
  "toolName": "bash",
  "toolInput": {
    "command": "rm -rf /",
    "description": "Remove all files"
  }
}
```
**Example: Block dangerous commands**
```python
# .claude/hooks/pre-tool-use.py
# /// script
# dependencies = []
# ///
import sys
import json
import re

def is_dangerous_remove_command(tool_name, tool_input):
    """Block any rm -rf commands"""
    if tool_name != "bash":
        return False
    command = tool_input.get("command", "")
    dangerous_patterns = [
        r'\brm\s+-rf\b',
        r'\brm\s+-fr\b',
        r'\brm\s+.*-[rf].*\*',
    ]
    return any(re.search(pattern, command) for pattern in dangerous_patterns)

def main():
    input_data = json.load(sys.stdin)
    tool_name = input_data.get("toolName")
    tool_input = input_data.get("toolInput", {})
    if is_dangerous_remove_command(tool_name, tool_input):
        # Block the command
        output = {
            "allow": False,
            "message": "❌ Blocked dangerous rm command"
        }
    else:
        output = {"allow": True}
    print(json.dumps(output))

if __name__ == "__main__":
    main()
```
**Configuration in settings.json:**
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {},  // Empty = matches all tools
        "commands": [
          "uv run .claude/hooks/pre-tool-use.py"
        ]
      }
    ]
  }
}
```
### 2. post-tool-use
**When it fires:** After a tool completes execution
**Use cases:**
- Log tool execution results
- Track which tools are used most frequently
- Measure tool execution time
- Build observability dashboards
- Summarize tool output with small models
**Available data:**
```json
{
  "toolName": "write",
  "toolInput": {
    "file_path": "/path/to/file.py",
    "content": "..."
  },
  "toolResult": {
    "success": true,
    "output": "File written successfully"
  }
}
```
**Example: Event logging with summarization**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import os
from anthropic import Anthropic

def summarize_event(tool_name, tool_input, tool_result):
    """Use Haiku to summarize what happened"""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"""Summarize this tool execution in 1 sentence:
Tool: {tool_name}
Input: {json.dumps(tool_input, indent=2)}
Result: {json.dumps(tool_result, indent=2)}
Be concise and focus on what was accomplished."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Small, fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def main():
    input_data = json.load(sys.stdin)
    # Generate summary using small model
    summary = summarize_event(
        input_data.get("toolName"),
        input_data.get("toolInput", {}),
        input_data.get("toolResult", {})
    )
    # Log the event with summary
    event = {
        "toolName": input_data["toolName"],
        "summary": summary,
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability server; send_to_server is assumed to be a
    # shared helper (see utils/send-event.py later in this doc)
    send_to_server(event)

if __name__ == "__main__":
    main()
```
**Why small models?** "I've sent thousands of these events. I've spent less than 20 cents. This is where small fast models really shine."
### 3. notification
**When it fires:** When Claude Code needs user input (permission request)
**Use cases:**
- Text-to-speech notifications
- Send alerts to phone/Slack
- Log permission requests
- Auto-approve specific tools
**Available data:**
```json
{
  "message": "Your agent needs your input",
  "context": {
    "toolName": "bash",
    "command": "bun run apps/hello.ts"
  }
}
```
**Example: Text-to-speech notification**
```python
# .claude/hooks/notification.py
import sys
import json
import subprocess

def speak(text):
    """Use 11Labs API for text-to-speech"""
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        text
    ])

def main():
    input_data = json.load(sys.stdin)
    message = input_data.get("message", "Your agent needs your input")
    # Speak the notification
    speak(message)
    # Log it
    print(json.dumps({"notified": True}))

if __name__ == "__main__":
    main()
```
### 4. stop
**When it fires:** Every time Claude Code finishes responding
**Use cases:**
- Copy full chat transcript for analysis
- Completion notifications (text-to-speech)
- Session logging
- Performance metrics
- Agent output summarization
**Available data:**
```json
{
  "transcriptPath": "/path/to/chat-transcript.json",
  "sessionId": "abc123",
  "timestamp": "2025-01-05T14:30:00Z"
}
```
**Example: Save full conversation**
```python
# .claude/hooks/stop.py
import sys
import json
import shutil
import subprocess
from pathlib import Path
from datetime import datetime

def main():
    input_data = json.load(sys.stdin)
    transcript_path = input_data.get("transcriptPath")
    if not transcript_path:
        return
    # Copy transcript to logs directory
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    session_id = input_data.get("sessionId", "unknown")
    logs_dir = Path(".claude/logs")
    logs_dir.mkdir(exist_ok=True)
    dest = logs_dir / f"chat-{timestamp}-{session_id[:8]}.json"
    shutil.copy(transcript_path, dest)
    # Announce completion
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "All set and ready for your next step"
    ])
    print(json.dumps({"logged": True, "file": str(dest)}))

if __name__ == "__main__":
    main()
```
**Key insight:** "The stop event is the perfect time to copy the entire chat conversation. This is key for observability. What happened? How can we improve it?"
### 5. sub-agent-stop
**When it fires:** When a sub-agent completes its work
**Use cases:**
- Track parallel sub-agent completion
- Per-agent performance metrics
- Multi-agent orchestration logging
- Progress notifications for long-running jobs
**Available data:**
```json
{
  "subAgentId": "agent-123",
  "transcriptPath": "/path/to/sub-agent-transcript.json",
  "sessionId": "parent-abc123",
  "timestamp": "2025-01-05T14:32:00Z"
}
```
**Example: Sub-agent completion tracking**
```python
# .claude/hooks/sub-agent-stop.py
import sys
import json

def main():
    input_data = json.load(sys.stdin)
    # Log sub-agent completion
    event = {
        "type": "sub-agent-complete",
        "agentId": input_data.get("subAgentId"),
        "timestamp": input_data.get("timestamp")
    }
    # Send to observability system; send_event and speak are assumed
    # shared helpers from .claude/hooks/utils
    send_event(event)
    # Announce
    speak("Sub agent complete")

if __name__ == "__main__":
    main()
```
## Multi-Agent Observability Architecture
When scaling to 3, 5, 10+ agents, hooks enable comprehensive system visibility.
### Architecture Overview
```text
┌─────────────────────────────────────────────────────────────┐
│ Multiple Agents │
│ Agent 1 Agent 2 Agent 3 ... Agent N │
│ │ │ │ │ │
│ └──────────┴──────────┴──────────────────┘ │
│ │ │
│ Hooks fire │
│ ↓ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Bun/Node Server │
│ ┌────────────────┐ ┌──────────────┐ │
│ │ HTTP Endpoint │────────→│ SQLite DB │ │
│ │ /events │ │ (persistence)│ │
│ └────────────────┘ └──────────────┘ │
│ │ │
│ └────────────→ WebSocket Broadcast │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Web Client (Vue/React) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Live Activity Pulse (1min/3min/5min windows) │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Event Stream (filtered by app/session/event type) │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Event Details (with AI summaries) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Key Design Principles
**1. One-Way Data Stream**
```text
Agent → Hook → Server → Database + WebSocket → Client
```
"This one-way data stream keeps things really simple. Every agent is responsible for summarizing their work in the hook before they send it off."
**Benefits:**
- Simple architecture
- Easy to reason about
- No bidirectional complexity
- Fast real-time updates
**2. Event Summarization at the Edge**
```python
# In the hook (runs on agent side)
def send_event(app_name, event_type, event_data, summarize=True):
    if summarize:
        # Use Haiku to summarize before sending
        summary = summarize_with_haiku(event_data)
        event_data["summary"] = summary
    # Send to server
    requests.post("http://localhost:3000/events", json={
        "app": app_name,
        "type": event_type,
        "data": event_data,
        "sessionId": os.getenv("CLAUDE_SESSION_ID")
    })
```
**Why summarize at the edge?**
- Reduces server load
- Cheaper (uses small models locally)
- Human-readable summaries immediately available
- No server-side LLM dependencies
**3. Persistent + Real-Time Storage**
```sql
-- SQLite schema
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Dual persistence:**
- SQLite for historical queries and analysis
- WebSocket for live streaming to UI
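The persistence half of that pipeline is small. A minimal sketch using Python's stdlib `sqlite3` against the schema above (the reference implementation is a Bun/Node server; this is the same idea restated in Python):
```python
# Minimal sketch of the persistence step; the table matches the schema shown.
import json
import sqlite3

conn = sqlite3.connect("events.db")
conn.execute("""CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
)""")

def store_event(event: dict) -> None:
    """Insert one hook event; the WebSocket broadcast would follow this."""
    conn.execute(
        "INSERT INTO events (source_app, session_id, event_type, raw_payload, summary) "
        "VALUES (?, ?, ?, ?, ?)",
        (event["app"], event["sessionId"], event["eventType"],
         json.dumps(event["data"]), event.get("summary")),
    )
    conn.commit()

store_event({"app": "my-app", "sessionId": "abc123",
             "eventType": "post-tool-use", "data": {"toolName": "write"}})
```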
### Implementation Example
**Hook script structure:**
```python
# .claude/hooks/utils/send-event.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///
import sys
import json
import os
import requests
from anthropic import Anthropic

def summarize_with_haiku(event_data, event_type):
    """Generate 1-sentence summary using Haiku"""
    if event_type not in ["pre-tool-use", "post-tool-use"]:
        return None
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    prompt = f"Summarize this {event_type} event in 1 sentence: {json.dumps(event_data)}"
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def send_event(app_name, event_type, event_data, summarize=False):
    """Send event to observability server"""
    payload = {
        "app": app_name,
        "sessionId": os.getenv("CLAUDE_SESSION_ID", "unknown"),
        "eventType": event_type,
        "data": event_data,
        "timestamp": event_data.get("timestamp")
    }
    if summarize:
        payload["summary"] = summarize_with_haiku(event_data, event_type)
    try:
        response = requests.post(
            "http://localhost:3000/events",
            json=payload,
            timeout=1
        )
        return response.status_code == 200
    except Exception as e:
        # Don't break agent if observability fails
        print(f"Warning: Failed to send event: {e}", file=sys.stderr)
        return False

def main():
    if len(sys.argv) < 3:
        print("Usage: send-event.py <app-name> <event-type> [--summarize]")
        sys.exit(1)
    app_name = sys.argv[1]
    event_type = sys.argv[2]
    summarize = "--summarize" in sys.argv
    # Read event data from stdin
    event_data = json.load(sys.stdin)
    success = send_event(app_name, event_type, event_data, summarize)
    print(json.dumps({"sent": success}))

if __name__ == "__main__":
    main()
```
**Using in hooks:**
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import subprocess

def main():
    input_data = json.load(sys.stdin)
    # Send to observability system with summarization
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",          # App name
        "post-tool-use",   # Event type
        "--summarize"      # Generate AI summary
    ], input=json.dumps(input_data), text=True)
    print(json.dumps({"logged": True}))

if __name__ == "__main__":
    main()
```
## Best Practices
### 1. Use Isolated Scripts (Astral UV Pattern)
**Why:** Hooks should be self-contained, portable, and not depend on your codebase.
```python
# /// script
# dependencies = ["anthropic", "requests"]
# ///
# Astral UV single-file script
# Runs independently with: uv run script.py
# Auto-installs dependencies
```
**Benefits:**
- Works in any codebase
- No virtual environment setup
- Portable across projects
- Easy to test in isolation
**Alternative: Bun for TypeScript**
```typescript
// .claude/hooks/post-tool-use.ts
// Run with: bun run post-tool-use.ts
import { readFileSync } from "fs";
const input = JSON.parse(readFileSync(0, "utf-8")); // read hook payload from stdin (fd 0)
// ... hook logic
```
### 2. Never Block the Agent
```python
def main():
    try:
        # Hook logic
        send_to_server(event)
    except Exception as e:
        # Log but don't fail
        print(f"Warning: {e}", file=sys.stderr)
        # Always output valid JSON
        print(json.dumps({"error": str(e)}))
```
**Rule:** If observability fails, the agent should continue working.
### 3. Use Small Fast Models for Summaries
```text
Cost comparison (1,000 events):
├── Opus: $15 (overkill for summaries)
├── Sonnet: $3 (still expensive)
└── Haiku: $0.20 ✅ (perfect for this)
```
"Thousands of events, less than 20 cents. Small fast cheap models shine here."
### 4. Hash Session IDs for UI Consistency
```python
import hashlib

def color_for_session(session_id):
    """Generate consistent color from session ID"""
    hash_val = int(hashlib.md5(session_id.encode()).hexdigest()[:6], 16)
    return f"#{hash_val:06x}"
```
**Result:** Same agent = same color in UI, making it easy to track.
### 5. Filter and Paginate Events
```javascript
// Client-side filtering
const filteredEvents = events
    .filter(e => e.app === selectedApp || selectedApp === "all")
    .filter(e => e.eventType === selectedType || selectedType === "all")
    .slice(0, 100); // Limit displayed events
// Auto-refresh
setInterval(() => fetchLatestEvents(), 5000);
```
### 6. Multiple Hooks Per Event
```json
{
  "hooks": {
    "stop": [
      {
        "matcher": {},
        "commands": [
          "uv run .claude/hooks/stop-chat-log.py",
          "uv run .claude/hooks/stop-tts.py",
          "uv run .claude/hooks/stop-notify.py"
        ]
      }
    ]
  }
}
```
**Hooks run sequentially** in the order specified.
### 7. Matcher Patterns for Selective Execution
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {
          "toolName": "bash"
        },
        "commands": ["uv run .claude/hooks/bash-validator.py"]
      },
      {
        "matcher": {
          "toolName": "write",
          "toolInput": {
            "file_path": "**/.env"
          }
        },
        "commands": ["uv run .claude/hooks/block-env-write.py"]
      }
    ]
  }
}
```
## Directory Structure Best Practice
```text
.claude/
├── commands/              # Slash commands
├── agents/                # Sub-agent definitions
└── hooks/                 # ← New essential directory
    ├── settings.json      # Hook configuration
    ├── pre-tool-use.py
    ├── post-tool-use.py
    ├── notification.py
    ├── stop.py
    ├── sub-agent-stop.py
    └── utils/             # Shared utilities
        ├── send-event.py
        ├── text-to-speech-elevenlabs.py
        ├── text-to-speech-openai.py
        └── summarize-haiku.py
```
## Real-World Use Cases
### Use Case 1: Block Dangerous Operations
```python
# .claude/hooks/pre-tool-use.py
import re

BLOCKED_COMMANDS = [
    r'\brm\s+-rf\b',                 # rm -rf
    r'\bsudo\s+rm\b',                # sudo rm
    r'\bgit\s+push.*--force\b',      # git push --force
    r'\bdocker\s+system\s+prune\b',  # docker system prune
]

BLOCKED_FILES = [
    r'\.env$',
    r'credentials\.json$',
    r'\.ssh/id_rsa$',
    r'aws.*credentials',
]

def is_blocked(tool_name, tool_input):
    if tool_name == "bash":
        command = tool_input.get("command", "")
        return any(re.search(p, command) for p in BLOCKED_COMMANDS)
    if tool_name in ["read", "write", "edit"]:
        file_path = tool_input.get("file_path", "")
        return any(re.search(p, file_path) for p in BLOCKED_FILES)
    return False
```
### Use Case 2: Multi-Agent Task Board
```text
Observability UI showing:
Active Agents (5):
├── [Agent 1] Planning feature (12s ago)
├── [Agent 2] Writing tests (45s ago) ⚠️ Needs input
├── [Agent 3] Building UI (2m ago)
├── [Agent 4] Deploying (5m ago) ✅ Complete
└── [Agent 5] Monitoring (ongoing)
Recent Events (filtered: post-tool-use):
├── Agent 3: Wrote src/components/Button.tsx
├── Agent 1: Read src/api/endpoints.ts
├── Agent 4: Bash: git push origin main
└── Agent 2: Test failed: test/auth.test.ts
```
### Use Case 3: Long-Running AFK Agents
```bash
# Start agent with background work
/background "Implement entire auth system" --report agents/auth-report.md
# Agent works autonomously
# Hooks send notifications:
# - "Starting authentication module"
# - "Database schema created"
# - "Tests passing"
# - "All set and ready for your next step"
# You're notified via text-to-speech when complete
```
### Use Case 4: Debugging Agent Behavior
```python
# Filter stop events to analyze full chat transcripts
for event in events.filter(type="stop"):
    transcript = json.load(open(event.transcriptPath))
    # Analyze:
    # - What files did agent read?
    # - What tools were used most?
    # - Where did agent get confused?
    # - What patterns led to errors?
```
## Performance Considerations
### Webhook Timeouts
```python
# Don't block agent on slow external services
try:
    requests.post(webhook_url, json=event, timeout=0.5)  # 500ms max
except requests.Timeout:
    # Log locally instead
    log_to_file(event)
```
### Database Size Management
```sql
-- Rotate old events
DELETE FROM events
WHERE timestamp < datetime('now', '-30 days');
-- Or archive
INSERT INTO events_archive SELECT * FROM events
WHERE timestamp < datetime('now', '-30 days');
DELETE FROM events
WHERE id IN (SELECT id FROM events_archive);
```
### Event Batching
```python
# Batch events before sending
events_buffer = []

def send_event(event):
    events_buffer.append(event)
    if len(events_buffer) >= 10:
        flush_events()

def flush_events():
    requests.post(server_url, json={"events": events_buffer})
    events_buffer.clear()
```
## Integration with Observability Platforms
### Datadog
```python
from datadog import statsd

def send_to_datadog(event):
    statsd.increment(f"claude.tool.{event['toolName']}")
    statsd.histogram(f"claude.duration.{event['toolName']}", event['duration'])
```
### Prometheus
```python
from prometheus_client import Counter, Histogram

tool_counter = Counter('claude_tool_executions', 'Tool executions', ['tool_name'])
tool_duration = Histogram('claude_tool_duration_seconds', 'Tool duration', ['tool_name'])

def send_to_prometheus(event):
    tool_counter.labels(tool_name=event['toolName']).inc()
    tool_duration.labels(tool_name=event['toolName']).observe(event['duration'])
```
### Slack
```python
import os
import requests

def send_to_slack(event):
    if event['eventType'] == 'notification':
        requests.post(
            os.getenv("SLACK_WEBHOOK_URL"),
            json={"text": f"🤖 Agent needs input: {event['message']}"}
        )
```
## Key Principles
1. **If you don't measure it, you can't improve it** - Observability is critical for scaling agents
2. **Keep hooks simple and isolated** - Use single-file scripts (UV, bun, shell)
3. **Never block the agent** - Hooks should be fast and fault-tolerant
4. **Small models for summaries** - Haiku is perfect and costs pennies
5. **One-way data streams** - Simple architecture beats complex bidirectional systems
6. **Context, Model, Prompt** - Even with hooks, the big three still matter
## Source Attribution
**Primary source:** Multi-Agent Observability transcript (complete system architecture, WebSocket streaming, event summarization, SQLite persistence)
**Supporting source:** Hooked transcript (5 hooks fundamentals, pre-tool-use implementation, text-to-speech integration, isolated scripts pattern)
**Key quotes:**
- "When it comes to agentic coding, observability is everything." (Hooked)
- "This one-way data stream keeps things really simple." (Multi-Agent Observability)
- "Thousands of events, less than 20 cents. Small fast models shine here." (Multi-Agent Observability)
## Related Documentation
- [Hooks Reference](../reference/hooks-reference.md) - Complete API reference for all 5 hooks
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real observability systems in action
- [Core 4 Framework](../reference/core-4-framework.md) - Context, Model, Prompt, Tools
---
**Remember:** Observability isn't optional when scaling agents. If you can't see what they're doing, you can't scale them effectively.

View File

@@ -0,0 +1,673 @@
# The Orchestrator Pattern
> "The rate at which you can create and command your agents becomes the constraint of your engineering output. When your agents are slow, you're slow."
The orchestrator pattern is **Level 5** of agentic engineering: managing fleets of agents through a single interface.
## The Journey to Orchestration
```text
Level 1: Base agents → Use agents out of the box
Level 2: Better agents → Customize prompts and workflows
Level 3: More agents → Run multiple agents
Level 4: Custom agents → Build specialized solutions
Level 5: Orchestration → Manage fleets of agents ← You are here
```
**Key realization:** Single agents hit context window limits. You need orchestration to scale beyond one agent.
## The Three Pillars
Multi-agent orchestration requires three components working together:
```text
┌─────────────────────────────────────────────────────────┐
│ 1. ORCHESTRATOR AGENT │
│ (Single interface to your fleet) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. CRUD FOR AGENTS │
│ (Create, Read, Update, Delete agents at scale) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. OBSERVABILITY │
│ (Monitor performance, costs, and results) │
└─────────────────────────────────────────────────────────┘
```
Without all three, orchestration fails. You need:
- **Orchestrator** to command agents
- **CRUD** to manage agent lifecycle
- **Observability** to understand what agents are doing
## Core Principle: The Orchestrator Sleeps
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. It has created and commanded our agents. Now, our agents are doing the work."
**The pattern:**
```text
1. User prompts Orchestrator
2. Orchestrator creates specialized agents
3. Orchestrator commands agents with detailed prompts
4. Orchestrator SLEEPS (stops consuming context)
5. Agents work autonomously
6. Orchestrator wakes periodically to check status
7. Orchestrator reports results to user
8. Agents are deleted
```
**Why orchestrator sleeps:**
- Protects its context window
- Avoids observing all agent work (too much information)
- Only wakes when needed to check status or command agents
**Example orchestrator sleep pattern:**
```python
# Orchestrator commands agents (illustrative orchestrator API)
orchestrator.create_agent("scout", task="Find relevant files")
orchestrator.create_agent("builder", task="Implement changes")

# Orchestrator sleeps, checking status every 15s
while not all_agents_complete():
    orchestrator.sleep(15)  # Not consuming context
    status = orchestrator.check_agent_status()
    orchestrator.log(status)

# Wake up to collect results
results = orchestrator.get_agent_results()
orchestrator.summarize_to_user(results)
```
## Orchestration Patterns
### Pattern 1: Scout-Plan-Build (Sequential Chaining)
**Use case:** Complex tasks requiring multiple specialized steps
**Flow:**
```text
User: "Migrate codebase to new SDK"
Orchestrator creates Scout agents (4 parallel)
├→ Scout 1: Search with Gemini
├→ Scout 2: Search with CodeX
├→ Scout 3: Search with Haiku
└→ Scout 4: Search with Flash
Scouts output: relevant-files.md with exact locations
Orchestrator creates Planner agent
├→ Reads relevant-files.md
├→ Scrapes documentation
└→ Outputs: detailed-plan.md
Orchestrator creates Builder agent
├→ Reads detailed-plan.md
├→ Executes implementation
└→ Tests and validates
```
**Why this works:**
- **Scout step offloads searching from Planner** (R&D framework: Reduce + Delegate)
- **Multiple scout models** provide diverse perspectives
- **Planner only sees relevant files**, not entire codebase
- **Builder focused on execution**, not planning
**Implementation:**
```bash
# Composable slash commands
/scout-plan-build "Migrate to new Claude Agent SDK"
# Internally runs:
/scout "Find files needing SDK migration"
/plan-with-docs docs=https://agent-sdk-docs.com
/build plan=agents/plans/sdk-migration.md
```
**Context savings:**
```text
Without scouts:
├── Planner searches entire codebase: 50k tokens
├── Planner reads irrelevant files: 30k tokens
└── Total wasted: 80k tokens
With scouts:
├── 4 scouts search in parallel (isolated contexts)
├── Planner reads only relevant-files.md: 5k tokens
└── Savings: 75k tokens (94% reduction)
```
### Pattern 2: Plan-Build-Review-Ship (Task Board)
**Use case:** Structured development lifecycle with quality gates
**Flow:**
```text
User: "Update HTML titles across application"
Task created → PLAN column
Orchestrator creates Planner agent
├→ Analyzes requirements
├→ Creates implementation plan
└→ Moves task to BUILD
Orchestrator creates Builder agent
├→ Reads plan
├→ Implements changes
├→ Runs tests
└→ Moves task to REVIEW
Orchestrator creates Reviewer agent
├→ Checks implementation against plan
├→ Validates tests pass
└→ Moves task to SHIP
Orchestrator creates Shipper agent
├→ Creates git commit
├→ Pushes to remote
└→ Task complete
```
**Why this works:**
- **Clear phases** with distinct responsibilities
- **Each agent focused** on single phase
- **Quality gates** between phases
- **Failure isolation** - if builder fails, planner work preserved
**Visual representation:**
```text
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ Task A │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```
**Agent handoff:**
```python
# Orchestrator manages task board state
task = {
    "id": "update-titles",
    "status": "planning",
    "assigned_agent": "planner-001",
    "artifacts": []
}

# Planner completes
task["status"] = "building"
task["artifacts"].append("plan.md")
task["assigned_agent"] = "builder-001"

# Orchestrator hands off to builder
orchestrator.command_agent(
    "builder-001",
    f"Implement plan from {task['artifacts'][0]}"
)
```
### Pattern 3: Scout-Builder (Two-Stage)
**Use case:** UI changes, targeted modifications
**Flow:**
```text
User: "Create gray pills for app header information"
Orchestrator creates Scout
├→ Locates exact files and line numbers
├→ Identifies patterns and conventions
└→ Outputs: scout-report.md
Orchestrator creates Builder
├→ Reads scout-report.md
├→ Implements precise changes
└→ Outputs: modified files
Orchestrator wakes, verifies, reports
```
**Orchestrator sleep pattern:**
```python
# Orchestrator creates scout
orchestrator.create_agent("scout-header", task="Find header UI components")

# Orchestrator sleeps, checking every 15s
orchestrator.sleep_with_status_checks(interval=15)

# Scout completes, orchestrator wakes
scout_output = orchestrator.get_agent_output("scout-header")

# Orchestrator creates builder with scout's output
orchestrator.create_agent(
    "builder-ui",
    task=f"Create gray pills based on scout findings: {scout_output}"
)

# Orchestrator sleeps again
orchestrator.sleep_with_status_checks(interval=15)
```
## Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
**The problem:** Single agent doing everything explodes context window
```text
Single Agent Approach:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**The solution:** Specialized agents with focused context
```text
Orchestrator Approach:
├── Orchestrator: 10k tokens (coordinates)
├── Scout 1: 15k tokens (searches)
├── Scout 2: 15k tokens (searches)
├── Planner: 25k tokens (plans using scout output)
├── Builder: 35k tokens (implements)
└── Total per agent: <35k tokens (max 18% per agent)
```
**Key principle:** Agents are deletable temporary resources
```text
1. Create agent for specific task
2. Agent completes task
3. DELETE agent (free memory)
4. Create new agent for next task
5. Repeat
```
**Example:**
```bash
# User: "Build documentation for frontend and backend"
# Orchestrator creates 3 agents
/create-agent frontend-docs "Document frontend components"
/create-agent backend-docs "Document backend APIs"
/create-agent qa-docs "Combine and QA both docs"
# Work completes...
# Delete all agents when done
/delete-all-agents
# Result: All agents gone, context freed
```
**Why delete agents:**
- Frees context windows for new work
- Prevents context accumulation
- Enforces single-purpose design
- Matches engineering principle: "The best code is no code at all"
## CRUD for Agents
Orchestrator needs full agent lifecycle control:
**Create:**
```python
agent_id = orchestrator.create_agent(
name="scout-api",
task="Find all API endpoints",
model="haiku", # Fast, cheap for search
max_tokens=100000
)
```
**Read:**
```python
# Check agent status
status = orchestrator.get_agent_status(agent_id)
# => {"status": "working", "progress": "60%", "context_used": "15k tokens"}
# Read agent output
output = orchestrator.get_agent_output(agent_id)
# => {"files_consumed": [...], "files_produced": [...]}
```
**Update:**
```python
# Command existing agent with new task
orchestrator.command_agent(
agent_id,
"Now implement the changes based on your findings"
)
```
**Delete:**
```python
# Single agent
orchestrator.delete_agent(agent_id)
# All agents
orchestrator.delete_all_agents()
```
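Putting the lifecycle together, a sketch using the same illustrative API (the agent name and tasks are examples):
```python
# Full lifecycle: create, command, poll, collect, delete
scout_id = orchestrator.create_agent(
    name="scout-api",
    task="Find all API endpoints",
    model="haiku",  # fast, cheap model for search
)
orchestrator.command_agent(scout_id, "Write findings to relevant-files.md")

# Orchestrator sleeps between status checks (protects its context)
while orchestrator.get_agent_status(scout_id)["status"] == "working":
    orchestrator.sleep(15)

report = orchestrator.get_agent_output(scout_id)
orchestrator.delete_agent(scout_id)  # agents are deletable temporary resources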
## Observability Requirements
Without observability, orchestration is blind. You need:
### 1. Agent-Level Visibility
```text
For each agent, track:
├── Name and ID
├── Status (creating, working, complete, failed)
├── Context window usage
├── Model and cost
├── Files consumed
├── Files produced
└── Tool calls executed
```
### 2. Cross-Agent Visibility
```text
Fleet overview:
├── Total agents active
├── Total context consumed
├── Total cost
├── Agent dependencies (who's waiting on whom)
└── Bottlenecks (slow agents blocking others)
```
### 3. Real-Time Streaming
```text
User sees:
├── Agent creation events
├── Tool calls as they happen
├── Progress updates
├── Completion notifications
└── Error alerts
```
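A minimal sketch of the per-agent record these requirements imply (field names are illustrative):
```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    """Illustrative per-agent tracking record for fleet observability."""
    agent_id: str
    name: str
    status: str = "creating"   # creating | working | complete | failed
    model: str = "sonnet"
    context_tokens: int = 0    # context window usage
    cost_usd: float = 0.0
    tool_calls: int = 0
    files_consumed: list[str] = field(default_factory=list)
    files_produced: list[str] = field(default_factory=list)
```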
**Implementation:** See [Hooks for Observability](hooks-observability.md) for complete architecture
## Information Flow in Orchestrated Systems
```text
User
↓ (prompts)
Orchestrator
↓ (creates & commands)
Agent 1 → Agent 2 → Agent 3
↓ ↓ ↓
(results flow back up)
Orchestrator (summarizes)
User
```
**Critical understanding:** Agents never talk directly to the user. They report to the orchestrator.
**Example:**
```text
# User prompts orchestrator
user → orchestrator: "Summarize codebase"

# Orchestrator creates agent with detailed instructions
orchestrator → agent: """
Read all files in src/
Create markdown summary with:
- Architecture overview
- Key components
- File structure
- Tech stack
Report results back to orchestrator (not user!)
"""

# Agent completes, reports to orchestrator
agent → orchestrator: "Summary complete at docs/summary.md"

# Orchestrator reports to user
orchestrator → user: "Codebase summary created with 3 main sections: architecture, components, and tech stack"
```
## When to Use Orchestration
### Use orchestration when
**Task requires 3+ specialized agents**
- Example: Scout + Plan + Build
**Context window exploding in single agent**
- Single agent using >150k tokens
**Need parallel execution**
- Multiple independent subtasks
**Quality gates required**
- Plan → Build → Review → Ship
**Long-running autonomous work**
- Agents work while you're AFK
### Don't use orchestration when
**Simple one-off task**
- Single agent sufficient
**Learning/prototyping**
- Orchestration adds complexity
**No observability infrastructure**
- You'll be blind to agent behavior
**Haven't mastered custom agents**
- Level 5 requires Level 4 foundation
## Practical Implementation
### Minimal Orchestrator Agent
The sub-agent definition file (`orchestrator-agent.md`):

````markdown
---
name: orchestrator
description: Manages fleet of agents for complex multi-step tasks
---
# Orchestrator Agent
You are an orchestrator agent managing a fleet of specialized agents.
## Your Tools
- create_agent(name, task, model): Create new agent
- command_agent(agent_id, task): Send task to existing agent
- get_agent_status(agent_id): Check agent progress
- get_agent_output(agent_id): Retrieve agent results
- delete_agent(agent_id): Remove completed agent
- delete_all_agents(): Clean up all agents
## Your Responsibilities
1. **Break down user requests** into specialized subtasks
2. **Create focused agents** for each subtask
3. **Command agents** with detailed instructions
4. **Monitor progress** without micromanaging
5. **Collect results** and synthesize for user
6. **Delete agents** when work is complete
## Orchestrator Sleep Pattern
After creating and commanding agents:
1. **SLEEP** - Stop consuming context
2. **Wake every 15-30s** to check agent status
3. **SLEEP again** if agents still working
4. **Wake when all complete** to collect results
DO NOT observe all agent work. This explodes your context window.
## Example Workflow
```
User: "Migrate codebase to new SDK"
You:
1. Create scout agents (parallel search)
2. Command scouts to find SDK usage
3. SLEEP (check status every 15s)
4. Wake when scouts complete
5. Create planner agent
6. Command planner with scout results
7. SLEEP (check status every 15s)
8. Wake when planner completes
9. Create builder agent
10. Command builder with plan
11. SLEEP (check status every 15s)
12. Wake when builder completes
13. Summarize results for user
14. Delete all agents
```
## Key Principles
- **One agent, one task** - Don't overload agents
- **Sleep between phases** - Protect your context
- **Delete when done** - Treat agents as temporary
- **Detailed commands** - Don't assume agents know context
- **Results-oriented** - Every agent must produce concrete output
````
### Orchestrator Tools (SDK)
```python
# Illustrative SDK sketch: assumes an `agent_manager` wrapping the SDK's agent lifecycle APIs

# create_agent tool
@mcptool(
    name="create_agent",
    description="Create a new specialized agent"
)
def create_agent(params: dict) -> dict:
    name = params["name"]
    task = params["task"]
    model = params.get("model", "sonnet")
    agent_id = agent_manager.create(
        name=name,
        system_prompt=task,
        model=model
    )
    return {
        "agent_id": agent_id,
        "status": "created",
        "message": f"Agent {name} created"
    }

# command_agent tool
@mcptool(
    name="command_agent",
    description="Send task to existing agent"
)
def command_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    task = params["task"]
    result = agent_manager.prompt(agent_id, task)
    return {
        "agent_id": agent_id,
        "status": "commanded",
        "message": f"Agent {agent_id} received task",
        "result": result
    }
```
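The remaining lifecycle tools follow the same shape; for example, a delete tool (same illustrative `agent_manager` assumption):
```python
# delete_agent tool
@mcptool(
    name="delete_agent",
    description="Remove a completed agent and free its context"
)
def delete_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    agent_manager.delete(agent_id)
    return {"agent_id": agent_id, "status": "deleted"}
```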
## Trade-offs
### Benefits
- ✅ Scales beyond single agent limits
- ✅ Parallel execution (3x-10x speedup)
- ✅ Context window protection
- ✅ Specialized agent focus
- ✅ Quality gates between phases
- ✅ Autonomous out-of-loop work
### Costs
- ❌ Upfront investment to build
- ❌ Infrastructure complexity (database, WebSocket)
- ❌ More moving parts to manage
- ❌ Requires observability
- ❌ Orchestrator agent needs careful prompting
- ❌ Not worth it for simple tasks
## Key Quotes
> "The orchestrator agent is the first pattern where I felt the perfect combination of observability, customizability, and agents at scale."
>
> "Treat your agents as deletable temporary resources that serve a single purpose."
>
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. Now, our agents are doing the work."
>
> "200k context window is plenty. You're just stuffing a single agent with too much work."
## Source Attribution
**Primary source:** One Agent to Rule Them All (orchestrator architecture, three pillars, sleep pattern, CRUD)
**Supporting sources:**
- Claude 2.0 (scout-plan-build workflow, composable prompts)
- Custom Agents (plan-build-review-ship task board)
- Sub-Agents (information flow, delegation patterns)
## Related Documentation
- [Hooks for Observability](hooks-observability.md) - Required for orchestration
- [Context Window Protection](context-window-protection.md) - Why orchestration matters
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real orchestration systems
---
**Remember:** Orchestration is Level 5. Master Levels 1-4 first. Then build your fleet.

# Core Concepts: Claude Code Architecture
## Table of Contents
- [Executive Summary](#executive-summary)
- [The Core 4 Framework](#the-core-4-framework)
- [Component Definitions](#component-definitions)
- [Skills](#skills)
- [MCP Servers (External Data Sources)](#mcp-servers-external-data-sources)
- [Sub-Agents](#sub-agents)
- [Slash Commands (Custom Prompts)](#slash-commands-custom-prompts)
- [Compositional Hierarchy](#compositional-hierarchy)
- [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
- [Three-Level Loading Mechanism](#three-level-loading-mechanism)
- [How They Relate](#how-they-relate)
- [When to Use Each Component](#when-to-use-each-component)
- [Use Skills When](#use-skills-when)
- [Use Sub-Agents When](#use-sub-agents-when)
- [Use Slash Commands When](#use-slash-commands-when)
- [Use MCP Servers When](#use-mcp-servers-when)
- [Critical Insights and Warnings](#critical-insights-and-warnings)
- [1. Don't Convert All Slash Commands to Skills](#1-dont-convert-all-slash-commands-to-skills)
- [2. Skills Are Not Replacements](#2-skills-are-not-replacements)
- [3. One-Off Tasks Don't Need Skills](#3-one-off-tasks-dont-need-skills)
- [4. Master the Fundamentals First](#4-master-the-fundamentals-first)
- [5. Prompts Are Non-Negotiable](#5-prompts-are-non-negotiable)
- [Skills: Honest Assessment](#skills-honest-assessment)
- [Pros](#pros)
- [Cons](#cons)
- [Evolution Path](#evolution-path)
- [Context Management](#context-management)
- [Key Quotes for Reference](#key-quotes-for-reference)
- [Summary](#summary)
## Executive Summary
Claude Code's architecture is built on a fundamental principle: **prompts are the primitive foundation** for everything. This document provides the authoritative reference for understanding Skills, Sub-Agents, MCP Servers, and Slash Commands—how they work, how they relate, and how they compose.
**Key insight:** Skills are powerful compositional units but should NOT replace the fundamental building blocks (prompts, sub-agents, MCPs). They orchestrate these primitives to solve repeat problems in an agent-first way.
## The Core 4 Framework
**Everything comes down to four pieces:**
1. **Context**
2. **Model**
3. **Prompt**
4. **Tools**
> "If you understand these, if you can build and manage these, you will win. Why is that? It's because every agent is the core 4. And every feature that every one of these agent coding tools is going to build is going to build directly on the core 4. This is the foundation."
This is the thinking framework for understanding and building with Claude Code.
## Component Definitions
### Skills
**What they are:** A dedicated, modular solution that packages a domain-specific capability for autonomous, repeatable workflows.
**Triggering:** Agent-invoked. Claude autonomously decides when to use them based on your request and the Skill's description. You don't explicitly invoke them—they activate automatically when relevant.
**Context and structure:** High modularity with a dedicated directory structure. Supports progressive disclosure (metadata, instructions, resources) and context persistence within the skill's scope.
**Composition:** Can use prompts, other skills, MCP servers, and sub-agents. They sit on top of other capabilities and can orchestrate them through instructions.
**Best use cases:** Automatic or recurring behavior that you want to reuse across workflows (e.g., a work-tree manager that handles create, list, remove, merge, update operations).
**Not a replacement for:** MCP servers, sub-agents, or slash commands. Skills are a higher-level composition unit that coordinates these primitives.
**Critical insight:** Skills don't directly execute code; they provide declarative guidance that coordinates multiple components. When a skill activates, Claude reads the instructions and uses available tools to follow the workflow.
### MCP Servers (External Data Sources)
**What they are:** External data sources or tools integrated into agents through the Model Context Protocol (MCP).
**Triggering:** Typically invoked as needed, often by skills or prompts.
**Context:** They don't bundle a workflow; they connect to external systems and bring in data/services.
**Composition:** Can be used within skills or prompts to fetch data or perform actions with external tools.
**Best use cases:** Connecting to Jira, databases, GitHub, Figma, Slack, and hundreds of other external services. Bundling multiple services together for exposure to the agent.
**Practical examples:**
- Implement features from issue trackers: "Add the feature described in JIRA issue ENG-4521"
- Query databases: "Find emails of 10 random users based on our Postgres database"
- Integrate designs: "Update our email template based on new Figma designs"
- Automate workflows: "Create Gmail drafts inviting these users to a feedback session"
**Clear differentiation:** MCP = external integration, Skills = internal orchestration.
**Plugin integration:** Plugins can bundle MCP servers that start automatically when the plugin is enabled, providing tools and integrations team-wide.
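One common way to register an MCP server is a project-level `.mcp.json`; a minimal sketch (the server name and package are illustrative):
```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```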
### Sub-Agents
**What they are:** Isolated workflows with separate contexts that can run in parallel.
**Triggering:** Invoked by the main agent to do a task in parallel without polluting the main context.
**Context and isolation:** Each sub-agent uses its own context window, separate from the main conversation. This prevents context pollution and enables longer overall sessions.
**Composition:** Can be used inside skills and prompts, but you **cannot nest sub-agents inside other sub-agents** (hard limit to prevent infinite nesting).
**Best use cases:** Parallelizable, isolated tasks (e.g., bulk/scale tasks like fixing failing tests, batch operations, comprehensive audits).
**Critical constraint:** You must be okay with losing context afterward—sub-agent context doesn't persist in the main conversation.
**Resumable sub-agents:** Each execution gets a unique `agentId` stored in `agent-{agentId}.jsonl`. Sub-agents can be resumed to continue previous conversations, useful for:
- Long-running research across multiple sessions
- Iterative refinement without losing context
- Multi-step workflows with maintained context
**Model selection:** Sub-agents support `model` field to specify model alias (`sonnet`, `opus`, `haiku`) or `'inherit'` to use the main conversation's model.
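A minimal sub-agent definition illustrating this format (the name, description, and body are illustrative):
```markdown
---
name: test-fixer
description: Fixes failing tests in an isolated context. Use for batch test repair.
model: haiku
---
You are a test-repair agent. You will receive failing test output from the
primary agent. Fix the offending tests, re-run them to verify, and report
the results back to the primary agent (not the user).
```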
### Slash Commands (Custom Prompts)
**What they are:** The primitive, reusable prompts you invoke manually. The closest compositional unit to "bare metal agent plus LLM."
**Triggering:** Manual triggers by a user (or by a higher-level unit like a sub-agent or skill via the SlashCommand tool).
**Context:** They're the most fundamental unit. You should master prompt design here.
**Composition:** Can be used alone or as building blocks inside skills, sub-agents, and MCPs. Acts as BOTH primitive AND composition point.
**Best use cases:** One-off tasks or basic, repeatable prompts. The starting point for building more complex capabilities.
**Critical principle:**
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."
**SlashCommand tool:** Claude can programmatically invoke custom slash commands via the `SlashCommand` tool during conversations. Both skills and sub-agents compose prompts using this tool.
**Advanced features:**
- **Bash execution:** Use `!` prefix to execute bash commands before the slash command runs
- **File references:** Use `@` prefix to include file contents
- **Arguments:** Support `$ARGUMENTS` for all args or `$1`, `$2` for individual parameters
- **Frontmatter:** Control `allowed-tools`, `model`, `description`, `argument-hint`
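A minimal slash command sketch combining these features (the frontmatter values, referenced files, and command body are illustrative):
```markdown
---
description: Summarize recent changes on a branch
argument-hint: [branch]
allowed-tools: Bash(git log:*)
model: haiku
---
## Context

- Recent commits: !`git log --oneline -5 $1`
- Contribution conventions: @CONTRIBUTING.md

## Task

Summarize the recent changes on branch $1 in three bullet points.
```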
**Comparison to Skills:**
| Aspect | Slash Commands | Skills |
|----------------|-----------------------------|-------------------------------------|
| **Complexity** | Simple prompts | Complex capabilities |
| **Structure** | Single .md file | Directory with SKILL.md + resources |
| **Discovery** | Explicit (`/command`) | Automatic (context-based) |
| **Files** | One file only | Multiple files, scripts, templates |
## Compositional Hierarchy
**Skills sit at the top of the composition hierarchy:**
```text
Skills (Top Compositional Layer)
├─→ Can use: MCP Servers
├─→ Can use: Sub-Agents
├─→ Can use: Slash Commands
└─→ Can use: Other Skills
Slash Commands (Primitive + Compositional)
├─→ Can use: Skills (via SlashCommand tool)
├─→ Can use: MCP Servers
├─→ Can use: Sub-Agents
└─→ Acts as BOTH primitive AND composition point
Sub-Agents (Execution Layer)
├─→ Can use: Slash Commands (via SlashCommand tool)
├─→ Can use: Skills (via SlashCommand tool)
└─→ CANNOT use: Other Sub-Agents (hard limit)
MCP Servers (Integration Layer)
└─→ Lower level unit, used BY skills, not using skills
```
**Key principle:** Skills provide **coordinated guidance** for repeatable workflows. They orchestrate other components through instructions, not by executing code directly.
**Verified restrictions:**
- Sub-agents cannot nest (no sub-agent spawning other sub-agents)
- Skills don't execute code; they guide Claude to use available tools
- Slash commands can be invoked manually or via `SlashCommand` tool
## Progressive Disclosure Architecture
### Three-Level Loading Mechanism
Skills use a sophisticated loading system that minimizes context usage:
**Level 1: Metadata (always loaded)** - ~100 tokens per skill
- YAML frontmatter with `name` and `description`
- Loaded at startup into system prompt
- Enables discovery without context penalty
- You can install many Skills with minimal overhead
**Level 2: Instructions (loaded when triggered)** - Under 5k tokens
- Main SKILL.md body with procedural knowledge
- Read from filesystem via bash when skill activates
- Only enters context when the skill is relevant
- Contains workflows, best practices, guidance
**Level 3: Resources (loaded as needed)** - Effectively unlimited
- Additional markdown files, scripts, templates
- Executed via bash without loading contents into context
- Scripts provide deterministic operations efficiently
- No context penalty for bundled content that isn't used
**Example skill structure:**
```text
work-tree-manager/
├── SKILL.md # Main instructions (Level 2)
├── reference.md # Detailed reference (Level 3)
├── examples.md # Usage examples (Level 3)
└── scripts/
├── validate.py # Utility script (Level 3, executed)
└── cleanup.py # Cleanup script (Level 3, executed)
```
When this skill activates:
1. Claude already knows the skill exists (Level 1 metadata pre-loaded)
2. Claude reads SKILL.md when the skill is relevant (Level 2)
3. Claude reads reference.md only if needed (Level 3)
4. Claude executes scripts without loading their code (Level 3)
**Key advantage:** Unlike MCP servers which load all context at startup, Skills are extremely context-efficient. Progressive disclosure means only relevant content occupies the context window at any given time.
## How They Relate
**Prompts / slash commands are the primitive building blocks.**
- Master these first before anything else
- "Everything is a prompt in the end. It's tokens in, tokens out."
- Strong bias towards slash commands for simple tasks
**Sub-agents are for isolated, parallelizable tasks with separate contexts.**
- Use when you see the keyword "parallel"
- Nothing else supports parallel calling
- Critical for scale tasks and batch operations
**MCP servers connect to external systems and data sources.**
- Very little overlap with Skills
- These are fully distinct components
- Clear separation: external (MCP) vs internal (Skills)
**Skills are higher-level, domain-specific bundles that orchestrate or compose prompts, sub-agents, and MCP servers to solve repeat problems.**
- Use for MANAGEMENT problems, not one-off tasks
- Keywords: "automatic," "repeat," "manage"
- Don't convert all slash commands to skills—this is a huge mistake
## When to Use Each Component
### Use Skills When
- You have a **REPEAT** problem that needs **MANAGEMENT**
- Multiple related operations need coordination
- You want **automatic** behavior
- Example: Managing git work trees (create, list, remove, merge, update)
**Not for:**
- One-off tasks
- Simple operations
- Problems solved well by a single prompt
### Use Sub-Agents When
- **Parallelization** is needed
- **Context isolation** is required
- Scale tasks and batch operations
- You're okay with losing context afterward
**Signal words:** "parallel," "scale," "bulk," "isolated"
### Use Slash Commands When
- One-off tasks
- Simple repeatable actions
- You're starting a new workflow
- Building the primitive before composing
**Remember:** "Have a strong bias towards slash commands."
### Use MCP Servers When
- External integrations are needed
- Data sources outside Claude Code
- Third-party services
- Database connections
**Clear rule:** External = MCP, Internal orchestration = Skills
## Critical Insights and Warnings
### 1. Don't Convert All Slash Commands to Skills
> "There are a lot of engineers right now that are going all in on skills. They're converting all their slash commands to skills. I think that's a huge mistake."
Keep your slash commands. They are the primitive foundation.
### 2. Skills Are Not Replacements
> "It is very clear this does not replace any existing feature or capability. It is a higher compositional level."
Skills complement other components; they don't replace them.
### 3. One-Off Tasks Don't Need Skills
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill. This is not what skills are for."
Use the right tool for the job. Not everything needs a skill.
### 4. Master the Fundamentals First
> "When you're starting out, I always recommend you just build a prompt. Don't build a skill. Don't build a sub-agent. Don't build out an MCP server. Keep it simple. Build a prompt."
Start simple. Build upward from primitives.
### 5. Prompts Are Non-Negotiable
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming."
Everything comes back to prompts. Master them first.
## Skills: Honest Assessment
### Pros
1. **Agent-invoked** - Dial up the autonomy knob to 11
2. **Context protection** - Progressive disclosure unlike MCP servers
3. **Dedicated file system pattern** - Logically compose and group skills together
4. **Composability** - Can compose other elements or features
5. **Agentic approach** - Agent just does the right thing
**Biggest value:** "Dedicated isolated file system pattern" + "agent invoked"
### Cons
1. **Doesn't go all the way** - No first-class support for embedding prompts and sub-agents directly in skill directories (must use SlashCommand tool to compose them)
2. **Reliability in complex chains is uncertain** - "Will the agent actually use the right skills when chained? I think individually it's less concerning but when you stack these up... how reliable is that?"
3. **Limited innovation** - Skills are effectively "curated prompt engineering plus modularity." The real innovation is having a dedicated, opinionated way to operate agents.
**Rating:** "8 out of 10"
**Bottom line:** "Having a dedicated specific way to operate your agents in an agent first way is still powerful."
## Evolution Path
The proper progression for building with Claude Code:
1. **Start with a prompt/slash command** - Solve the basic problem
2. **Add sub-agent if parallelism needed** - Scale to multiple parallel operations
3. **Create skill when management needed** - Bundle multiple related operations
4. **Add MCP if external data needed** - Integrate external systems
**Example: Git Work Trees**
- **Prompt:** Create one work tree ✓
- **Sub-agent:** Create multiple work trees in parallel ✓
- **Skill:** Manage work trees (create, list, remove, merge, update) ✓
- **MCP:** Query external repo metadata (if needed) ✓
## Context Management
**Progressive Disclosure (Skills):**
Skills are very context efficient. Three levels of progressive disclosure ensure only relevant content is loaded:
1. Metadata level (always in context, ~100 tokens)
2. Instructions (loaded when triggered, <5k tokens)
3. Resources (loaded as needed, effectively unlimited)
**Context Isolation (Sub-Agents):**
Sub-agents isolate and protect your context window by using separate contexts for each task. This is what makes sub-agents great for parallel work—but you must be okay with losing that context afterward.
**Context Explosion (MCP Servers):**
Unlike Skills, MCP servers can "torch your context window" by loading all their context at startup. This is a tradeoff for immediate availability of external tools.
## Key Quotes for Reference
1. **On Prompts:**
> "The prompt is the fundamental unit of knowledge work and of programming."
2. **On Skills vs Prompts:**
> "If you can do the job with a sub agent or custom slash command and it's a one-off job, do not use a skill."
3. **On Composition:**
> "Skills at the top of the composition hierarchy... can compose everything into a skill, but you can also compose everything into a slash command."
4. **On The Core 4:**
> "Everything comes down to just four pieces... context, model, prompt, and tools."
5. **On Skills' Purpose:**
> "Skills offer a dedicated solution, right? An opinionated structure on how to solve repeat problems in an agent first way."
## Summary
**Start simple:** Build prompts first.
**Compose upward:** Prompts → Skills (not Skills → prompts as primary).
**Use the right tool:** Not everything needs a skill.
**Master The Core 4:** Context, Model, Prompt, Tools—these are the foundation.
**Remember:** Skills are powerful compositional units for repeat problems, but prompts remain the fundamental primitive. Build from this foundation, and compose upward as complexity requires.

# The Core 4 Framework
> "Keep track of the core four. If you understand the core 4 and how each element flows and controls your agent, you will understand compute and you'll understand how to scale your compute."
The Core 4 Framework is the foundation of all agentic systems. Every agent—whether base, custom, or sub-agent—operates on these four pillars:
1. **Context** - What information does the agent have?
2. **Model** - What capabilities does the model provide?
3. **Prompt** - What instruction are you giving?
4. **Tools** - What actions can the agent take?
## Why the Core 4 Matters
**Understanding compute = Understanding the Core 4**
When you analyze any agent configuration, isolate the Core 4:
- How is context being managed?
- Which model is selected and why?
- What are the system prompts vs user prompts?
- What tools are available?
**Everything comes down to just four pieces. If you understand these, you will win.**
## The Four Pillars in Detail
### 1. Context - What Information Does the Agent Have?
Context is the information available to your agent at any given moment.
**Types of Context:**
```text
Static Context (always loaded):
├── CLAUDE.md (global instructions)
├── System prompt (agent definition)
└── MCP servers (tool descriptions)
Dynamic Context (accumulated during session):
├── Conversation history
├── File reads
├── Tool execution results
└── User prompts
```
**Context Management Strategies:**
| Strategy | When to Use | Token Cost |
|----------|-------------|------------|
| Minimal CLAUDE.md | Always | 500-1k tokens |
| Context priming | Task-specific setup | 2-5k tokens |
| Context bundles | Agent handoffs | 10-20k tokens |
| Sub-agent delegation | Parallel work | Isolated per agent |
**Key Principle:** A focused agent is a performant agent.
**Anti-pattern:** Loading all context upfront regardless of relevance.
### 2. Model - What Capabilities Does the Model Provide?
The model determines intelligence, speed, and cost characteristics.
**Model Selection:**
```text
Claude Opus:
├── Use: Complex reasoning, large codebases, architectural decisions
├── Cost: Highest
└── Speed: Slower
Claude Sonnet:
├── Use: Balanced tasks, general development
├── Cost: Medium
└── Speed: Medium
Claude Haiku:
├── Use: Simple tasks, fast iteration, text transformation
├── Cost: Lowest (pennies)
└── Speed: Fastest
```
**Example: Echo Agent (Custom Agents)**
```python
model: "claude-3-haiku-20240307" # Downgraded for simple text manipulation
# Result: Much faster, much cheaper, still effective for the task
```
**Key Principle:** Match model capability to task complexity. Don't pay for Opus when Haiku will do.
### 3. Prompt - What Instruction Are You Giving?
Prompts are the fundamental unit of knowledge work and programming.
**Critical Distinction: System Prompts vs User Prompts**
```text
System Prompts:
├── Define agent identity and capabilities
├── Loaded once at agent initialization
├── Affect every user prompt that follows
├── Used in: Custom agents, sub-agents
└── Not visible in conversation history
User Prompts:
├── Request specific work from the agent
├── Added to conversation history
├── Build on system prompt foundation
├── Used in: Interactive Claude Code sessions
└── Visible in conversation history
```
**The Pong Agent Example:**
```python
# System prompt (3 lines):
"You are a pong agent. Always respond exactly with 'pong'. That's it."
# Result: No matter what user prompts ("hello", "summarize codebase", "what can you do?")
# Agent always responds: "pong"
```
**Key Insight:** "As soon as you touch the system prompt, you change the product, you change the agent."
**Information Flow in Multi-Agent Systems:**
```text
User Prompt → Primary Agent (System + User Prompts)
Primary prompts Sub-Agent (System Prompt + Primary's instructions)
Sub-Agent responds → Primary Agent (not to user!)
Primary Agent → User
```
**Why this matters:** Sub-agents respond to your primary agent, not to you. This changes how you write sub-agent prompts.
### 4. Tools - What Actions Can the Agent Take?
Tools are the agent's ability to interact with the world.
**Tool Sources:**
```text
Built-in Claude Code Tools:
├── Read, Write, Edit files
├── Bash commands
├── Grep, Glob searches
├── Git operations
└── ~15 standard tools
MCP Servers (External):
├── APIs, databases, services
├── Added via mcp.json
└── Can consume 24k+ tokens if not managed
Custom Tools (SDK):
├── Built with @mcptool decorator
├── Passed to create_sdk_mcp_server()
└── Integrated with system prompt
```
**Example: Custom Echo Agent Tool**
```python
@mcptool(
    name="text_transformer",
    description="Transform text with reverse, uppercase, repeat operations"
)
def text_transformer(params: dict) -> dict:
    text = params["text"]
    operation = params["operation"]
    # Do whatever you want inside your tool
    if operation == "reverse":
        transformed_text = text[::-1]
    elif operation == "uppercase":
        transformed_text = text.upper()
    elif operation == "repeat":
        transformed_text = text * 2
    else:
        transformed_text = text
    return {"result": transformed_text}
```
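A sketch of wiring the tool into an agent, assuming the `create_sdk_mcp_server()` helper noted above (the server name is illustrative):
```python
server = create_sdk_mcp_server(
    name="text-tools",          # illustrative server name
    tools=[text_transformer],   # custom tools built with @mcptool
)
```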
**Key Principle:** Tools consume context. The `/context` command shows what's loaded—every tool takes space in your agent's mind.
## The Core 4 in Different Agent Types
### Base Claude Code Agent
```text
Context: CLAUDE.md + conversation history
Model: User-selected (Opus/Sonnet/Haiku)
Prompt: User prompts → system prompt
Tools: All 15 built-in + loaded MCP servers
```
### Custom Agent (SDK)
```text
Context: Can be customized (override or extend)
Model: Specified in options (can use Haiku for speed)
Prompt: Custom system prompt (can override completely)
Tools: Custom tools + optionally built-in tools
```
**Example:** The Pong agent completely overrides Claude Code's system prompt—it's no longer Claude Code, it's a custom agent.
### Sub-Agent
```text
Context: Isolated context window (no history from primary)
Model: Inherits from primary or can be specified
Prompt: System prompt (in .md file) + primary agent's instructions
Tools: Configurable (can restrict to subset)
```
**Key distinction:** Sub-agents have no context history. They only have what the primary agent prompts them with.
## Information Flow Between Agents
### Single Agent Flow
```text
User Prompt
Primary Agent (Context + Model + Prompt + Tools)
Response to User
```
### Multi-Agent Flow
```text
User Prompt
Primary Agent
├→ Sub-Agent 1 (isolated context)
├→ Sub-Agent 2 (isolated context)
└→ Sub-Agent 3 (isolated context)
Aggregates responses
Response to User
```
**Critical Understanding:**
- Your sub-agents respond to your primary agent, not to you
- Each sub-agent has its own Core 4
- You must track multiple sets of (Context, Model, Prompt, Tools)
## Context Preservation vs Context Isolation
### Context Preservation (Benefit)
```text
Primary Agent:
├── Conversation history maintained
├── Can reference previous work
├── Builds on accumulated knowledge
└── Uses client class in SDK for multi-turn conversations
```
### Context Isolation (Feature + Limitation)
```text
Sub-Agent:
├── Fresh context window (no pollution from main conversation)
├── Focused on single purpose
├── Cannot access primary agent's full history
└── Operates on what primary agent passes it
```
**The Trade-off:** Context isolation makes agents focused (good) but limits information flow (limitation).
## The 12 Leverage Points of Agent Coding
While the Core 4 are foundational, experienced engineers track 12 leverage points:
1. **Context** (Core 4)
2. **Model** (Core 4)
3. **Prompt** (Core 4)
4. **Tools** (Core 4)
5. System prompt structure
6. Tool permission management
7. Context window monitoring
8. Model selection per task
9. Multi-agent orchestration
10. Information flow design
11. Debugging and observability
12. Dependency coupling management
**Key Principle:** "Whenever you see Claude Code options, isolate the Core 4. How will the Core 4 be managed given this setup?"
## Practical Applications
### Application 1: Choosing the Right Model
```text
Task: Simple text transformation
Core 4 Analysis:
├── Context: Minimal (just the text to transform)
├── Model: Haiku (fast, cheap, sufficient)
├── Prompt: Simple instruction ("reverse this text")
└── Tools: Custom text_transformer tool
Result: Pennies cost, sub-second response
```
### Application 2: Managing Context Explosion
```text
Problem: Primary agent context at 180k tokens
Core 4 Analysis:
├── Context: Too much accumulated history
├── Model: Opus (expensive at high token count)
├── Prompt: Gets diluted in massive context
└── Tools: All 15 + 5 MCP servers (24k tokens)
Solution: Delegate to sub-agents
├── Context: Split work across 3 sub-agents (60k each)
├── Model: Keep Opus only where needed
├── Prompt: Focused sub-agent system prompts
└── Tools: Restrict to relevant subset per agent
Result: Work completed, context manageable
```
### Application 3: Custom Agent for Specialized Workflow
```text
Use Case: Plan-Build-Review-Ship task board
Core 4 Design:
├── Context: Task board state + file structure
├── Model: Sonnet (balanced for coding + reasoning)
├── Prompt: Custom system prompt defining PBRS workflow
└── Tools: Built-in file ops + custom task board tools
Implementation: SDK with custom system prompt and tools
Result: Specialized agent that understands your specific workflow
```
## System Prompts vs User Prompts in Practice
### The Confusion
Many engineers treat sub-agent `.md` files as user prompts. **This is wrong.**
```markdown
# ❌ Wrong: Writing sub-agent prompt like a user prompt
Please analyze this codebase and tell me what it does.
```
```markdown
# ✅ Correct: Writing sub-agent prompt as system prompt
Purpose: Analyze codebases and provide concise summaries
When called, you will receive a user's request from the PRIMARY AGENT.
Your job is to read relevant files and create a summary.
Report Format:
Respond to the PRIMARY AGENT (not the user) with:
"Claude, tell the user: [your summary]"
```
### Why the Distinction Matters
```text
System Prompt:
├── Defines WHO the agent is
├── Loaded once (persistent)
└── Affects all user interactions
User Prompt:
├── Defines WHAT work to do
├── Changes with each interaction
└── Builds on system prompt foundation
```
## Debugging with the Core 4
When an agent misbehaves, audit the Core 4:
```text
1. Check Context:
└── Run /context to see what's loaded
└── Are unused MCP servers consuming tokens?
2. Check Model:
└── Is Haiku trying to do Opus-level reasoning?
└── Is cost/speed appropriate for task?
3. Check Prompt:
└── Is system prompt clear and focused?
└── Are sub-agents responding to primary, not user?
4. Check Tools:
└── Run /all-tools to see available options
└── Are too many tools creating choice paralysis?
```
## Key Takeaways
1. **Everything is Core 4** - Every agent configuration comes down to Context, Model, Prompt, Tools
2. **System ≠ User** - System prompts define agent identity; user prompts define work requests
3. **Information flows matter** - In multi-agent systems, understand who's talking to whom
4. **Focused agents perform better** - Like engineers, agents work best with clear, bounded context
5. **Model selection is strategic** - Don't overpay for Opus when Haiku will work
6. **Tools consume context** - Every MCP server and tool takes space in the agent's mind
7. **Context isolation is powerful** - Sub-agents get fresh starts, preventing context pollution
## Source Attribution
**Primary sources:**
- Custom Agents transcript (Core 4 framework, system prompts, SDK usage)
- Sub-Agents transcript (information flow, context preservation, multi-agent systems)
**Key quotes:**
- "Keep track of the core four. If you understand the core 4 and how each element flows and controls your agent, you will understand compute." (Custom Agents)
- "Context, model, prompt, and specifically the flow of the context, model, and prompt between different agents." (Sub-Agents)
## Related Documentation
- [Progressive Disclosure](progressive-disclosure.md) - Managing context (Core 4 pillar #1)
- [Architecture Reference](architecture.md) - How components use the Core 4
- [Decision Framework](../patterns/decision-framework.md) - Choosing components based on Core 4 needs
- [Context Window Protection](../patterns/context-window-protection.md) - Advanced context management
---
**Remember:** Context, Model, Prompt, Tools. Master these four, and you master Claude Code.

# Claude Code Agent Features - Comprehensive Guide
This document visualizes the complete structure of Claude Code agent features, their relationships, use cases, and best practices.
---
## How to Use This Guide
- **New to Claude Code?** Start with "The Core 4 Thinking Framework"
- **Choosing a component?** Use the "Decision Tree"
- **Understanding architecture?** Study the "Mindmap"
- **Quick reference?** Check the "Decision Matrix"
---
## Terminology
Understanding these terms is critical for navigating Claude Code's composition model:
- **Use** - Invoke a single component for a task (e.g., calling a slash command)
- **Compose** - Wire multiple components together into a larger workflow (e.g., a skill that orchestrates prompts, sub-agents, and MCPs)
- **Nest** - Hierarchical containment (placing one capability inside another's scope)
- **Hard Limit:** Sub-agents cannot nest other sub-agents (technical restriction)
- **Allowed:** Skills can compose/use sub-agents, prompts, MCPs, and other skills
---
## The Core 4 Thinking Framework
Every agent is built on these four fundamental pieces:
1. **Context** - What information does the agent have access to?
2. **Model** - What capabilities does the model provide?
3. **Prompt** - What instruction are you giving?
4. **Tools** - What actions can the agent take?
**Master these fundamentals first.** If you understand these four elements, you can master any agentic feature or tool. This is the foundation - everything else builds on top of this.
---
## Component Overview Mindmap
```mermaid
mindmap
root((Claude Code Agent Features))
Core Agentic Elements
The Core 4 Thinking Framework
Context: What information?
Model: What capability?
Prompt: What instruction?
Tools: What actions?
Context
Model
Prompt
Tools
Key Components
Agent Skills
Capabilities
Triggered by Agents
Context Efficient
Progressive Disclosure
Modular Directory Structure
Composability w/ Features
Dedicated Solutions
Pros
Agent-Initiated Automation
Context Window Protection
Logical Organization/File Structure
Feature Composition Ability
Agentic Approach
Cons
Subject to sub-agent nesting limitation
Reliability in complex chains needs attention
Not a replacement for other features
Examples
Meta Skill
Video Processor Skill
Work Tree Manager Skill
Author Assessment
Rating: 8/10
Not a replacement for other features
Higher compositional level
Thin opinionated file structure
MCP Servers
External Integrations
Expose Services to Agent
Context Window Impact
Sub Agents
Isolated Workflows
Context Protection
Parallelization Support
Cannot nest other sub-agents
Custom Slash Commands
Manual Triggers
Reusable Prompt Shortcuts
Primitive Unit (Prompt)
Hooks
Deterministic Automation
Executes on Lifecycle Events
Code/Agent Integration
Plugins
Distribute Extensions
Reusable Work
Output Styles
Customizable Output
Examples
text-to-speech
diff
summary
Use Case Examples
Automatic PDF Text Extraction → Agent Skill
Connect to Jira → MCP Server
Security Audit → Sub Agent
Git Commit Messages → Slash Command
Database Queries → MCP Server
Fix & Debug Tests → Sub Agent
Detect Style Guide Violations → Agent Skill
Fetch Real-Time Weather → MCP Server
Create UI Component → Slash Command
Parallel Workflow Tasks → Sub Agent
Proper Usage Patterns
CRITICAL: Prompts Are THE Primitive
Everything is prompts (tokens in/out)
Master this FIRST (non-negotiable)
Don't convert all slash commands to skills
Core building block for all components
When To Use Each Feature
Start Simple With Prompts
Scaling to Skill (Repeat Use)
Skill As Solution Manager
Compositional Hierarchy
Skills: Top Compositional Layer
Composition Examples
Technical Limits
Agentic Composability Advice
Context considerations
Model selection
Prompt design
Tool integration
Common Anti-Patterns
Converting all slash commands to skills (HUGE MISTAKE)
Using skills for one-off tasks
Forgetting prompts are the foundation
Not mastering prompts first
Best Practices & Recommendations
Auto-Organize workflows
Leverage progressive disclosure
Maintain clear boundaries between components
Use appropriate abstraction levels
Capabilities Breakdown
Detailed analysis of each component's capabilities and limitations
Key Insights
Hierarchical Understanding
Prompts = Primitive foundation
Slash Commands = Reusable prompts
Sub-Agents = Isolated execution contexts
MCP Servers = External integrations
Skills = Top-level orchestration layer
Hooks = Lifecycle automation
Plugins = Distribution mechanism
Output Styles = Presentation layer
Critical Distinctions
Sub-agents cannot nest other sub-agents (hard limit)
Skills can compose sub-agents, prompts, MCPs, other skills
Prompts are the fundamental primitive
Skills are compositional layers, not replacements
Context efficiency matters
Reliability in complex chains needs attention
Decision Framework
Repeatable pattern detection → Agent Skill
External data/service access → MCP Server
Parallel/isolated work → Sub Agent
Parallel workflow tasks → Sub Agent (whenever you see parallel, think sub-agents)
One-off task → Slash Command
Lifecycle automation → Hook
Team distribution → Plugin
Composition Model
Skills Orchestration Layer
Can compose: Prompts/Slash Commands, MCP Servers, Sub-Agents, Other Skills
Restriction: Avoid circular dependencies (skill A → skill B → skill A)
Purpose: Domain-specific workflow orchestration
Sub-Agents Execution Layer
Can compose: Prompts, MCP Servers
Cannot nest: Sub-agents within sub-agents (hard technical limitation)
Purpose: Isolated/parallel task execution
Slash Commands Primitive Layer
Manual invocation
Reusable prompts
Can be composed into higher layers
MCP Servers Integration Layer
External connections
Expose services to all components
```
---
## Composition Hierarchy
The mindmap shows a clear composition hierarchy:
1. **Prompts** = Primitive foundation (everything builds on this)
2. **Slash Commands** = Reusable prompts
3. **Sub-Agents** = Isolated execution contexts
4. **MCP Servers** = External integrations
5. **Skills** = Top-level orchestration layer
6. **Hooks** = Lifecycle automation
7. **Plugins** = Distribution mechanism
8. **Output Styles** = Presentation layer
### Verified Composition Capabilities
**Skills can compose:**
- ✅ Prompts/Slash Commands
- ✅ MCP Servers
- ✅ Sub-Agents
- ✅ Other Skills (avoid circular dependencies)
**Sub-Agents can compose:**
- ✅ Prompts
- ✅ MCP Servers
- ❌ Other Sub-Agents (hard technical limitation - verified in official docs)
**Technical Limit (Verified):**
- Sub-agents **cannot nest other sub-agents** (this prevents infinite recursion)
- This is the only hard nesting restriction in the system
---
## Decision Matrix
| Task Type | Component | Reason |
|-----------|-----------|---------|
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
| External data/service access | MCP Server | Integration point |
| Parallel/isolated work | Sub Agent | Context isolation |
| Parallel workflow tasks | Sub Agent | **Whenever you see parallel, think sub-agents** |
| One-off task | Slash Command | Simple, direct |
| Lifecycle automation | Hook | Event-driven |
| Team distribution | Plugin | Packaging |
---
## Decision Tree: When to Use What
This decision tree helps you choose the right Claude Code component based on your needs. **Always start with prompts** - master the primitive first!
```graphviz
digraph decision_tree {
rankdir=TB;
node [shape=box, style=rounded];
start [label="What are you trying to do?", shape=diamond, style="filled", fillcolor=lightblue];
prompt_start [label="START HERE:\nBuild a Prompt\n(Slash Command)", shape=rect, style="filled", fillcolor=lightyellow];
parallel_check [label="Need parallelization\nor isolated context?", shape=diamond];
external_check [label="External data/service\nintegration?", shape=diamond];
oneoff_check [label="One-off task\n(simple, direct)?", shape=diamond];
repeatable_check [label="Repeatable workflow\n(pattern detection)?", shape=diamond];
lifecycle_check [label="Lifecycle event\nautomation?", shape=diamond];
distribution_check [label="Sharing/distributing\nto team?", shape=diamond];
subagent [label="Use Sub Agent\nIsolated context\nParallel execution\nContext protection", shape=rect, style="filled", fillcolor=lightgreen];
mcp [label="Use MCP Server\nExternal integrations\nExpose services\nContext window impact", shape=rect, style="filled", fillcolor=lightgreen];
slash_cmd [label="Use Slash Command\nManual trigger\nReusable prompt\nPrimitive unit", shape=rect, style="filled", fillcolor=lightgreen];
skill [label="Use Agent Skill\nAgent-triggered\nContext efficient\nProgressive disclosure\nModular structure", shape=rect, style="filled", fillcolor=lightgreen];
hook [label="Use Hook\nDeterministic automation\nLifecycle events\nCode/Agent integration", shape=rect, style="filled", fillcolor=lightgreen];
plugin [label="Use Plugin\nDistribute extensions\nReusable work\nPackaging/sharing", shape=rect, style="filled", fillcolor=lightgreen];
start -> prompt_start [label="Always start here", style=dashed, color=red];
prompt_start -> parallel_check;
parallel_check -> subagent [label="Yes\n⚠ Whenever you see\n'parallel', think sub-agents"];
parallel_check -> external_check [label="No"];
external_check -> mcp [label="Yes"];
external_check -> oneoff_check [label="No"];
oneoff_check -> slash_cmd [label="Yes\nKeep it simple"];
oneoff_check -> repeatable_check [label="No"];
repeatable_check -> skill [label="Yes\nScale to skill\nfor repeat use"];
repeatable_check -> lifecycle_check [label="No"];
lifecycle_check -> hook [label="Yes"];
lifecycle_check -> distribution_check [label="No"];
distribution_check -> plugin [label="Yes"];
distribution_check -> slash_cmd [label="No\nDefault: Use prompt"];
}
```
### Decision Tree Key Points
**Critical Rule**: Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
**Decision Flow**:
1. **Parallel/Isolated?** → Sub Agent (whenever you see "parallel", think sub-agents)
2. **External Integration?** → MCP Server
3. **One-off Task?** → Slash Command (keep it simple)
4. **Repeatable Pattern?** → Agent Skill (scale up)
5. **Lifecycle Automation?** → Hook
6. **Team Distribution?** → Plugin
7. **Default** → Slash Command (prompt)
**Remember**: Skills are compositional layers, not replacements. Don't convert all your slash commands to skills - that's a HUGE MISTAKE!
---
## Critical Principles
- **⚠️ CRITICAL: Prompts are THE fundamental primitive** - Everything is prompts (tokens in/out). Master this FIRST (non-negotiable). Don't convert all slash commands to skills.
- **Sub-agents cannot nest other sub-agents** (hard technical limitation - verified in official docs)
- **Skills CAN compose sub-agents, prompts, MCPs, and other skills** (verified through first-hand experience)
- **Skills are compositional layers, not replacements** (complementary, not substitutes). Rating: 8/10 - "Higher compositional level" not a replacement.
- **Context efficiency matters** (progressive disclosure, isolation)
- **Reliability in complex chains needs attention** (acknowledged challenge)
- **Parallel keyword = Sub Agents** - Whenever you see parallel, think sub-agents
---
## Verified Composition Rules
Based on official documentation and empirical testing:
### Skills (Top Orchestration Layer)
- ✅ **Can invoke/compose:** Prompts/Slash Commands, MCP Servers, Sub-Agents, Other Skills
- ⚠️ **Best Practice:** Avoid circular dependencies (skill A → skill B → skill A)
- **Purpose:** Domain-specific workflow orchestration
- **When to use:** Repeatable workflows that benefit from automatic triggering
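To make the composition concrete, here is a minimal sketch of a composing skill's SKILL.md. All component names (`release-prep`, `/run-tests`, `changelog-writer`) are hypothetical; only the frontmatter-plus-instructions layout follows the standard skill format.

```markdown
---
name: release-prep
description: >
  Use when the user asks to "prepare a release". Orchestrates the test
  run, changelog drafting, and a final summary.
---

# Release Prep

1. Invoke the `/run-tests` slash command and confirm the suite passes.
2. Delegate changelog drafting to the `changelog-writer` sub-agent.
3. Summarize both results for the user.
```

Note the composition mechanism: a skill orchestrates by instructing the agent which commands and sub-agents to use, in what order.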
### Sub-Agents (Execution Layer)
- ✅ **Can invoke/compose:** Prompts, MCP Servers
- ❌ **Cannot nest:** Other sub-agents (hard technical limitation from official docs)
- **Purpose:** Isolated/parallel task execution with separate context
- **When to use:** Parallel work, context isolation, specialized roles
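As a sketch of the execution layer, a sub-agent is a single markdown file with a frontmatter header and a system prompt as its body. The `code-reviewer` name and tool list below are illustrative:

```markdown
---
name: code-reviewer
description: Reviews code changes for bugs, security issues, and style. Use after edits.
tools: Read, Grep, Glob
---

You are a senior code reviewer. Examine the changes you are given for
correctness, security, and style, then report findings with file and
line references. You run in your own context window and cannot spawn
other sub-agents.
```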
### Slash Commands (Primitive Layer)
- ✅ **Can be composed into:** Skills, Sub-Agents
- **Purpose:** Manual invocation of reusable prompts
- **When to use:** One-off tasks, simple workflows, building blocks
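A slash command is the simplest file of all: one markdown file whose body is the prompt. This sketch assumes a hypothetical `/run-tests` command; `$ARGUMENTS` is the standard placeholder for whatever the user types after the command name.

```markdown
---
description: Run the test suite and summarize failures
---

Run the project's test suite, then summarize any failures with their
likely causes. Pay particular attention to: $ARGUMENTS
```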
### MCP Servers (Integration Layer)
- ✅ **Can be used by:** Skills, Sub-Agents, Main Agent
- **Purpose:** External service/data integration
- **When to use:** Need to access external APIs, databases, or services
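Configuration is declarative; a minimal `.mcp.json` sketch is shown below, assuming the GitHub MCP server package and token variable used here (substitute whatever server you actually run):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_PERSONAL_ACCESS_TOKEN}"
      }
    }
  }
}
```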
---
## Common Anti-Patterns to Avoid
- **Converting all slash commands to skills** - This is a HUGE MISTAKE. Skills are for repeatable workflows, not one-off tasks.
- **Using skills for one-off tasks** - Use slash commands (prompts) instead.
- **Not mastering prompts first** - Prompts are the foundation; if you skip understanding them, you will not progress as an agentic engineer.
- **Trying to nest sub-agents** - This is a hard technical limitation and will fail.
---
## Best Practices
### When to Use Each Component
**Start with Prompts:**
- Begin every workflow as a prompt/slash command
- Test and validate the approach
- Only promote to skill when pattern repeats
**Scale to Skills:**
- Pattern used multiple times? → Create a skill
- Need automatic triggering? → Create a skill
- Complex multi-step workflow? → Create a skill
- One-off task? → Keep as slash command
**Use Sub-Agents for:**
- Parallel execution needs
- Context isolation required
- Specialized roles with separate context
- Research or planning phases
**Use MCP Servers for:**
- External API integration
- Database access
- Third-party service connections
---
## Detailed Component Analysis
### Agent Skills
**Capabilities:**
- Triggered automatically by agents based on description matching
- Context efficient through progressive disclosure
- Modular directory structure (SKILL.md, scripts/, references/, assets/)
- Can compose with all other features
**Pros:**
- Agent-initiated automation (no manual invocation needed)
- Context window protection (progressive disclosure)
- Logical organization and file structure
- Feature composition ability
- Scales from simple to complex
**Cons:**
- Subject to sub-agent nesting limitation (composed sub-agents can't nest others)
- Reliability in complex chains needs attention
- Not a replacement for other features (complementary)
**When to Use:**
- Repeatable workflows
- Domain-specific expertise
- Complex multi-step processes
- When you want automatic triggering
**Examples:**
- PDF processing workflows
- Code generation patterns
- Documentation generation
- Brand guidelines enforcement
### Sub-Agents
**Capabilities:**
- Isolated execution context (separate from main agent)
- Can run in parallel
- Custom system prompts
- Tool access (can inherit or specify)
- Access to MCP servers
**Pros:**
- Context isolation
- Parallel execution
- Specialized expertise
- Separate tool permissions
**Cons:**
- Cannot nest other sub-agents (hard limit)
- No memory between invocations
- Need to re-gather context each time
**When to Use:**
- Parallel workflow tasks
- Isolated research/planning
- Specialized roles (architect, tester, reviewer)
- When you need separate context
**Technical Note:**
- **VERIFIED:** Sub-agents cannot spawn other sub-agents (official docs)
- This prevents infinite nesting and maintains system stability
### MCP Servers
**Capabilities:**
- External service integration
- Standardized protocol
- Authentication handling
- Available to all components
**When to Use:**
- Need external data
- API access required
- Database queries
- Third-party service integration
### Slash Commands
**Capabilities:**
- Manual invocation
- Reusable prompts
- Project or global scope
- Can be composed into skills and sub-agents
**When to Use:**
- One-off tasks
- Simple workflows
- Testing new patterns
- Building blocks for skills
### Hooks
**Capabilities:**
- Lifecycle event automation
- Deterministic execution
- Code/agent integration
**When to Use:**
- Pre/post command execution
- File change reactions
- Environment validation
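For example, a `PreToolUse` hook in `settings.json` can gate every Bash invocation through a validation script before it runs (the script path is hypothetical; the event name, matcher, and structure follow the documented hooks config):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/validate-bash-command.sh"
          }
        ]
      }
    ]
  }
}
```

The script can block the tool call by exiting with the documented blocking status, which is what makes hooks deterministic rather than prompt-dependent.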
### Plugins
**Capabilities:**
- Bundle multiple components
- Distribution mechanism
- Team sharing
**When to Use:**
- Sharing complete workflows
- Team standardization
- Marketplace distribution
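The manifest itself is small; a hedged sketch (the field set and the `.claude-plugin/plugin.json` location are assumptions to verify against current docs before shipping):

```json
{
  "name": "team-workflows",
  "description": "Shared skills, commands, and hooks for our team",
  "version": "0.1.0"
}
```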
---
## Composition Examples
### Example 1: Full-Stack Development Skill
A skill that orchestrates:
- Calls planning sub-agent (for architecture)
- Calls coding sub-agent (for implementation)
- Uses MCP server (for database queries)
- Invokes testing slash command (for validation)
**This is valid** because:
- Skill composes sub-agents ✓
- Skill composes MCP servers ✓
- Skill composes slash commands ✓
- Sub-agents don't nest each other ✓
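The orchestration is nothing more than the skill's instructions. A hedged excerpt of such a SKILL.md body (all component names hypothetical):

```markdown
1. Ask the `planner` sub-agent for an architecture outline.
2. Hand the approved outline to the `implementer` sub-agent.
3. Query the database MCP server for the current schema where needed.
4. Finish by invoking the `/validate` slash command.
```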
### Example 2: Research Workflow
A skill that:
- Calls research sub-agent #1 (searches documentation)
- Calls research sub-agent #2 (analyzes codebase)
- Both run in parallel
- Both use MCP server for external docs
**This is valid** because:
- Skill orchestrates multiple sub-agents ✓
- Sub-agents run in parallel (separate contexts) ✓
- Sub-agents don't nest each other ✓
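To actually get the parallelism, the skill's instructions should request both sub-agents in one step; a one-line sketch (agent names hypothetical):

```markdown
Launch the `doc-searcher` and `code-analyzer` sub-agents in a single
message (two Task tool calls at once) so they run in parallel, then
merge both reports into one summary.
```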
### Example 3: INVALID - Nested Sub-Agents
A sub-agent that tries to:
- ❌ Call another sub-agent from within itself
**This will FAIL** because:
- Sub-agents cannot nest other sub-agents (hard limit)
---
## Key Insights Summary
### Hierarchical Understanding
1. **Prompts** = Primitive foundation (everything builds on this)
2. **Slash Commands** = Reusable prompts with manual invocation
3. **Sub-Agents** = Isolated execution contexts with separate context windows
4. **MCP Servers** = External integrations available to all
5. **Skills** = Top-level orchestration layer (composes everything)
6. **Hooks** = Lifecycle automation
7. **Plugins** = Distribution mechanism
8. **Output Styles** = Presentation layer
### Critical Technical Facts
**Verified from Official Docs:**
- ✅ Sub-agents CANNOT nest other sub-agents (hard technical limitation)
**Verified from First-Hand Experience:**
- ✅ Skills CAN invoke/compose sub-agents
- ✅ Skills CAN invoke/compose slash commands
- ✅ Skills CAN invoke/compose other skills
**Best Practices:**
- Start with prompts (master the primitive)
- Don't convert all slash commands to skills
- Use sub-agents for parallel/isolated work
- Use skills for repeatable workflows
- Avoid circular skill dependencies
---
## Testing Recommendations
Before deploying any complex workflow:
1. **Test individual components** - Verify each slash command works
2. **Test sub-agent isolation** - Confirm context separation
3. **Test skill triggering** - Ensure description matches use cases
4. **Test composition** - Verify skills can call sub-agents
5. **Test parallel execution** - Confirm sub-agents run independently
---
**Document Status:** Corrected and Verified
**Last Updated:** November 2025 (reflects Claude Code capabilities as of that date)
**Verification:** Technical facts confirmed via official docs + empirical testing