Initial commit
199
skills/agent-creator/SKILL.md
Normal file
@@ -0,0 +1,199 @@
---
name: agent-creator
description: >
  This skill should be used when the user asks to "create an agent", "write a subagent", "generate
  agent definition", "add agent to plugin", "write agent frontmatter", "create autonomous agent",
  "build subagent", needs agent structure guidance, YAML frontmatter configuration, invocation
  criteria with examples, or wants to add specialized subagents to Claude Code plugins with proper
  capabilities lists and tool access definitions.
---

# Agent Creator

## Overview

Creates subagent definitions for Claude Code. Subagents are specialized assistants
that Claude can invoke for specific tasks.

**When to use:** The user requests an agent, wants to add a specialized subagent to a plugin, or needs agent structure guidance.

**References:** Consult
`plugins/meta/claude-docs/skills/official-docs/reference/plugins-reference.md` and
`plugins/meta/claude-docs/skills/official-docs/reference/sub-agents.md` for specifications.

## CRITICAL: Two Types of Agents

Claude Code has **two distinct agent types** with **different requirements**:

### Plugin Agents (plugins/*/agents/)

**Purpose:** Agents distributed via plugins for team/community use

**Required frontmatter fields:**
- `description` (required) - What this agent specializes in
- `capabilities` (required) - Array of specific capabilities

**Location:** `plugins/<category>/<plugin-name>/agents/agent-name.md`

**Example:**
```markdown
---
description: Expert code reviewer validating security and quality
capabilities: ["vulnerability detection", "code quality review", "best practices"]
---
```

### User/Project Agents (.claude/agents/)

**Purpose:** Personal agents for individual workflows

**Frontmatter fields:**
- `name` (required) - Agent identifier
- `description` (required) - When to invoke this agent
- `tools` (optional) - Comma-separated tool list
- `model` (optional) - Model alias (sonnet, opus, haiku)

**Location:** `.claude/agents/agent-name.md` or `~/.claude/agents/agent-name.md`

**Example:**
```markdown
---
name: code-reviewer
description: Expert code review. Use after code changes.
tools: Read, Grep, Glob, Bash
model: sonnet
---
```

**Key difference:** User agents have a `name` field and a system prompt. Plugin agents have a `capabilities` array and documentation.

## Agent Structure Requirements (Plugin Agents)

Every **plugin agent** MUST include:

1. **Frontmatter** with `description` and `capabilities` array
2. **Agent title** as h1
3. **Capabilities** section explaining what the agent does
4. **When to Use** section with invocation criteria
5. **Context and Examples** with concrete scenarios
6. **Location** at `agents/agent-name.md` within the plugin
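
The required frontmatter fields for each agent type can be checked mechanically. The sketch below is a minimal, hypothetical helper: the names `REQUIRED_FIELDS`, `parse_frontmatter`, and `missing_fields` are illustrative and not part of Claude Code, and the parser assumes single-line `key: value` pairs rather than full YAML.

```python
# Hypothetical helper, not part of Claude Code or any plugin tooling.
REQUIRED_FIELDS = {
    "plugin": ("description", "capabilities"),
    "user": ("name", "description"),
}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading `---` block.

    Deliberately naive: single-line values only, not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and lines without a key.
        if sep and key.strip() and not line[0].isspace():
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def missing_fields(text: str, agent_type: str) -> list[str]:
    """List required frontmatter fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in REQUIRED_FIELDS[agent_type] if not fields.get(name)]


PLUGIN_AGENT = """---
description: Expert code reviewer validating security and quality
capabilities: ["vulnerability detection", "code quality review", "best practices"]
---
"""

print(missing_fields(PLUGIN_AGENT, "plugin"))  # nothing missing for a plugin agent
print(missing_fields(PLUGIN_AGENT, "user"))    # a user agent would also need `name`
```

A check like this only confirms that fields are present; the official references remain the authority on what each field may contain.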

## Creation Process

### Step 0: Determine Agent Type

**Ask the user:**
- Is this for a plugin (team/community distribution)?
- Or for personal use (.claude/agents/)?

**If personal use:** Use the user agent format with `name`, `description`, and a system prompt. See `plugins/meta/claude-docs/skills/official-docs/reference/sub-agents.md` for examples.

**If plugin:** Continue with the plugin agent format below.

### Step 1: Define Agent Purpose

Ask the user:

- What specialized task does this agent handle?
- What capabilities distinguish it from other agents?
- When should Claude invoke this vs doing work directly?

### Step 2: Determine Agent Name

Create a descriptive kebab-case name:

- "security review" → `security-reviewer`
- "performance testing" → `performance-tester`
- "API documentation" → `api-documenter`

### Step 3: List Capabilities

Identify 3-5 specific capabilities:

- Concrete actions the agent performs
- Specialized knowledge it applies
- Outputs it generates

### Step 4: Structure the Agent

Use this template:

```markdown
---
description: One-line agent description
capabilities: ["capability-1", "capability-2", "capability-3"]
---

# Agent Name

Detailed description of agent's role and expertise.

## Capabilities

- **Capability 1**: What this enables
- **Capability 2**: What this enables
- **Capability 3**: What this enables

## When to Use This Agent

Claude should invoke when:
- Specific condition 1
- Specific condition 2
- Specific condition 3

## Context and Examples

**Example 1: Scenario Name**

User requests: "Help with X"
Agent provides: Specific assistance using capabilities

**Example 2: Another Scenario**

When Y happens, agent does Z.
```
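
If agents are generated programmatically, the template above can be filled from a few inputs. The following is a rough sketch under stated assumptions: `render_plugin_agent` and its parameters are hypothetical names, the one-line description is reused as the body paragraph, and the description is assumed to contain no characters that need YAML quoting.

```python
import json


def render_plugin_agent(title, description, capabilities, conditions, examples):
    """Render a plugin agent file that follows the template's sections.

    capabilities: dict mapping capability name -> what it enables
    conditions:   list of invocation conditions
    examples:     dict mapping scenario name -> short description
    """
    lines = [
        "---",
        f"description: {description}",
        # json.dumps produces a double-quoted inline list, which YAML also accepts.
        f"capabilities: {json.dumps(list(capabilities))}",
        "---",
        "",
        f"# {title}",
        "",
        description,  # assumption: reuse the one-line description as the body text
        "",
        "## Capabilities",
        "",
    ]
    lines += [f"- **{name}**: {enables}" for name, enables in capabilities.items()]
    lines += ["", "## When to Use This Agent", "", "Claude should invoke when:"]
    lines += [f"- {condition}" for condition in conditions]
    lines += ["", "## Context and Examples"]
    for scenario, text in examples.items():
        lines += ["", f"**Example: {scenario}**", "", text]
    return "\n".join(lines) + "\n"
```

The returned string would then be written to `agents/<agent-name>.md` inside the plugin.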

### Step 5: Verify Against Official Docs

**For plugin agents:**
Check `plugins/meta/claude-docs/skills/official-docs/reference/plugins-reference.md` (requires `capabilities` array).

**For user agents:**
Check `plugins/meta/claude-docs/skills/official-docs/reference/sub-agents.md` (requires `name` field).

## Key Principles

- **Specialization**: Agents should have focused expertise
- **Clear Invocation**: Claude must know when to use this agent
- **Concrete Capabilities**: List specific things the agent can do
- **Examples**: Show real scenarios where the agent helps

## Examples

### Example 1: Security Reviewer Agent

User: "Create an agent for security reviews"

Process:

1. Purpose: Reviews code for security vulnerabilities
2. Name: `security-reviewer`
3. Capabilities: ["vulnerability detection", "security best practices", "threat modeling"]
4. Structure: Include when to invoke, examples of security issues
5. Create: `agents/security-reviewer.md`

Output: Agent that Claude invokes for security-related code review

### Example 2: Performance Tester Agent

User: "I need an agent for performance testing"

Process:

1. Purpose: Designs and analyzes performance tests
2. Name: `performance-tester`
3. Capabilities: ["load testing", "benchmark design", "performance analysis"]
4. Structure: When to use for optimization vs testing
5. Create: `agents/performance-tester.md`

Output: Agent that Claude invokes for performance concerns
183
skills/command-creator/SKILL.md
Normal file
@@ -0,0 +1,183 @@
---
name: command-creator
description: >
  This skill should be used when the user asks to "create a slash command", "write a command file",
  "add command to plugin", "create /command", "write command frontmatter", "add command arguments",
  "configure command tools", needs guidance on command structure, YAML frontmatter fields
  (description, argument-hint, allowed-tools), markdown command body, or wants to add custom slash
  commands to Claude Code plugins with proper argument handling and tool restrictions.
---

# Command Creator

## Overview

Creates slash commands for Claude Code plugins. Commands are user-invoked prompts that
expand into detailed instructions for Claude.

**When to use:** The user wants to create a command, add a command to a plugin, or needs command structure help.

**References:** See
`plugins/meta/claude-docs/skills/claude-docs/reference/plugins-reference.md` for command specifications.

## Command Structure Requirements

Every command MUST:

1. Be a `.md` file in the `commands/` directory
2. Include frontmatter with `description`
3. Contain clear instructions for Claude
4. Use a descriptive kebab-case filename
5. Have its instructions written from Claude's perspective
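
The mechanical parts of these requirements (location, extension, filename style, and a `description` in the frontmatter) lend themselves to an automated check. This is a minimal sketch, assuming POSIX-style paths and single-line frontmatter values; `check_command_file` is a hypothetical name, and the quality of the instructions themselves still needs human review.

```python
import re
from pathlib import PurePosixPath

# Lowercase words or digits joined by single hyphens, e.g. "generate-tests".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def frontmatter_lines(text: str) -> list[str]:
    """Return the lines between the opening and closing `---`, if both exist."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:end]
    return []


def check_command_file(path: str, text: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    p = PurePosixPath(path)
    if p.suffix != ".md":
        problems.append("file must have a .md extension")
    if p.parent.name != "commands":
        problems.append("file must live in a commands/ directory")
    if not KEBAB_CASE.fullmatch(p.stem):
        problems.append("filename must be kebab-case")
    has_description = any(
        key.strip() == "description" and value.strip()
        for key, _, value in (line.partition(":") for line in frontmatter_lines(text))
    )
    if not has_description:
        problems.append("frontmatter must include a description")
    return problems
```

Calling `check_command_file("commands/generate-tests.md", text)` on a well-formed file should return an empty list.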

## Creation Process

### Step 1: Define Command Purpose

Ask the user:

- What should this command do?
- What inputs/context does it need?
- What should Claude produce?

### Step 2: Choose Command Name

Create a concise kebab-case name:

- "generate tests" → `generate-tests.md`
- "review pr" → `review-pr.md`
- "deploy app" → `deploy-app.md`

The filename (without `.md`) becomes the command: `/generate-tests`
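
The mapping from a short phrase to a filename and then to the command name is simple enough to sketch in code. The helper names below (`command_filename`, `slash_command`) are hypothetical and only illustrate the naming convention described above.

```python
import re


def command_filename(phrase: str) -> str:
    """Turn a short phrase into a kebab-case command filename."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        raise ValueError("phrase must contain at least one letter or digit")
    return "-".join(words) + ".md"


def slash_command(filename: str) -> str:
    """Derive the command name from the filename by dropping the extension."""
    return "/" + filename.removesuffix(".md")  # str.removesuffix needs Python 3.9+


filename = command_filename("generate tests")
print(filename, slash_command(filename))
```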

### Step 3: Write Frontmatter

Required frontmatter:

```markdown
---
description: Brief description of what command does
---
```

### Step 4: Write Instructions

Write clear instructions for Claude:

```markdown
# Command Title

Detailed instructions telling Claude exactly what to do when this command is invoked.

## Steps

1. First action Claude should take
2. Second action
3. Final action

## Output Format

Describe how Claude should present results.

## Examples

Show example scenarios if helpful.
```

### Step 5: Verify Against Official Docs

Check
`plugins/meta/claude-docs/skills/claude-docs/reference/plugins-reference.md` for command specifications.

## Key Principles

- **Clarity**: Instructions must be unambiguous
- **Completeness**: Include all steps Claude needs
- **Perspective**: Write as if instructing Claude directly
- **Frontmatter**: Always include description

## Examples

### Example 1: Test Generator Command

User: "Create command to generate tests for a file"

Command file `commands/generate-tests.md`:

```markdown
---
description: Generate comprehensive tests for a source file
---

# Generate Tests

Generate test cases for the file provided by the user.

## Process

1. Read and analyze the source file
2. Identify testable functions and methods
3. Determine test scenarios (happy path, edge cases, errors)
4. Write tests using the project's testing framework
5. Ensure tests are comprehensive and follow best practices

## Test Structure

- One test file per source file
- Clear test names describing what's tested
- Arrange-Act-Assert pattern
- Cover edge cases and error conditions

## Output

Present the generated tests and explain coverage.
```

Invoked with: `/generate-tests`

### Example 2: PR Review Command

User: "Create command for reviewing pull requests"

Command file `commands/review-pr.md`:

```markdown
---
description: Conduct thorough code review of a pull request
---

# Review PR

Review the specified pull request for code quality, correctness, and best practices.

## Review Process

1. Fetch PR changes using git or gh CLI
2. Analyze changed files for:
   - Code correctness and logic errors
   - Style and formatting issues
   - Test coverage
   - Documentation completeness
   - Security concerns
   - Performance implications
3. Provide structured feedback

## Feedback Format

**Summary**: Brief overview of PR

**Strengths**: What's done well

**Issues**: Categorized by severity
- Critical: Must fix
- Important: Should fix
- Minor: Nice to have

**Suggestions**: Specific improvements with examples

## Usage

`/review-pr <pr-number>` or provide PR URL
```

Invoked with: `/review-pr 123`
190
skills/hook-creator/SKILL.md
Normal file
@@ -0,0 +1,190 @@
---
name: hook-creator
description: >
  This skill should be used when the user asks to "create a hook", "write hook config", "add
  hooks.json", "configure event hooks", "create PreToolUse hook", "add SessionStart hook",
  "implement hook validation", "set up event-driven automation", needs guidance on hooks.json
  structure, hook events (PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd,
  UserPromptSubmit), or wants to automate workflows and implement event-driven behavior in Claude
  Code plugins.
---

# Hook Creator

## Overview

Creates hook configurations that respond to Claude Code events automatically. Hooks
enable automation like formatting on save, running tests after edits, or custom session
initialization.

**When to use:** The user wants to automate workflows, needs event-driven behavior, or requests hooks for their plugin.

**References:** Consult
`plugins/meta/claude-docs/skills/claude-docs/reference/plugins-reference.md` for hook specifications and available
events.

## Hook Structure Requirements

Hooks are defined in `hooks/hooks.json` with:

1. **Event type** (SessionStart, PostToolUse, etc.)
2. **Matcher** (optional, for filtering which tool uses trigger the hook)
3. **Hook actions** (command, validation, notification)
4. **Proper use of** `${CLAUDE_PLUGIN_ROOT}` for plugin-relative paths

## Available Events

From official documentation:

- `PreToolUse` - Before Claude uses any tool
- `PostToolUse` - After Claude uses any tool
- `UserPromptSubmit` - When user submits a prompt
- `Notification` - When Claude Code sends notifications
- `Stop` - When Claude attempts to stop
- `SubagentStop` - When subagent attempts to stop
- `SessionStart` - At session beginning
- `SessionEnd` - At session end
- `PreCompact` - Before conversation history compaction

## Creation Process

### Step 1: Identify Event and Purpose

Ask the user:

- What should happen automatically?
- When should it happen (which event)?
- Which tool uses should trigger it (if PreToolUse or PostToolUse)?

### Step 2: Choose Hook Type

Three hook types:

- **command**: Execute shell commands/scripts
- **validation**: Validate file contents or project state
- **notification**: Send alerts or status updates

### Step 3: Write Hook Configuration

Structure for `hooks/hooks.json`:

```json
{
  "hooks": {
    "EventName": [
      {
        "matcher": "ToolName1|ToolName2",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/script.sh"
          }
        ]
      }
    ]
  }
}
```
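
When several hooks are needed, building the configuration from small helpers keeps the nesting consistent. The sketch below mirrors the structure shown above; `hook_entry` and `build_hooks_config` are hypothetical helper names, and only `command` hooks are covered.

```python
import json


def hook_entry(command, matcher=None):
    """One entry in an event's list: an optional matcher plus its command hooks."""
    entry = {"hooks": [{"type": "command", "command": command}]}
    if matcher is not None:
        entry = {"matcher": matcher, **entry}
    return entry


def build_hooks_config(events):
    """Wrap a mapping of event name -> list of entries in the top-level `hooks` key."""
    return {"hooks": {event: list(entries) for event, entries in events.items()}}


config = build_hooks_config({
    "PostToolUse": [
        hook_entry("${CLAUDE_PLUGIN_ROOT}/scripts/format-code.sh", matcher="Write|Edit"),
    ],
    "SessionStart": [
        hook_entry("echo 'Welcome! Plugin loaded successfully.'"),
    ],
})

# Text that could be written to hooks/hooks.json
hooks_json = json.dumps(config, indent=2)
print(hooks_json)
```

Generating the file this way does not replace checking event names and the schema against the official reference.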

### Step 4: Create Associated Scripts

If using command hooks:

1. Create the script in the plugin's `scripts/` directory
2. Make it executable: `chmod +x scripts/script.sh`
3. Use `${CLAUDE_PLUGIN_ROOT}` for paths

### Step 5: Verify Against Official Docs

Check
`plugins/meta/claude-docs/skills/claude-docs/reference/plugins-reference.md` for:

- Current event names
- Hook configuration schema
- Environment variable usage

## Key Principles

- **Event Selection**: Choose the most specific event for the need
- **Matcher Precision**: Use matchers to avoid unnecessary executions
- **Script Paths**: Always use `${CLAUDE_PLUGIN_ROOT}` for portability
- **Error Handling**: Scripts should handle errors gracefully
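
To make the matcher precision principle concrete, the sketch below models a matcher as a regular expression tested against the whole tool name. This is an assumption for illustration, not a description of how Claude Code evaluates matchers internally; `matcher_applies` is a hypothetical function, and treating a missing, empty, or `*` matcher as "match everything" is likewise assumed.

```python
import re


def matcher_applies(matcher, tool_name):
    """Return True if a hook with this matcher would run for the given tool.

    Assumption: the matcher is a regex compared against the full tool name.
    """
    if matcher in (None, "", "*"):
        return True
    return re.fullmatch(matcher, tool_name) is not None


for tool in ("Write", "Edit", "Read", "Bash"):
    print(tool, matcher_applies("Write|Edit", tool))
```

Under this model, a hook with `"matcher": "Write|Edit"` is skipped for `Read` and `Bash`, while a hook without a matcher runs for every tool use.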

## Examples

### Example 1: Code Formatting Hook

User: "Auto-format code after I edit files"

Hook configuration:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/format-code.sh"
          }
        ]
      }
    ]
  }
}
```

Creates `scripts/format-code.sh`, which runs a formatter on modified files.

### Example 2: Session Welcome Message

User: "Show a message when Claude starts"

Hook configuration:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Welcome! Plugin loaded successfully.'"
          }
        ]
      }
    ]
  }
}
```

Simple command hook, no external script needed.

### Example 3: Test Runner Hook

User: "Run tests after I modify test files"

Hook configuration:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/run-tests.sh"
          }
        ]
      }
    ]
  }
}
```

Creates `scripts/run-tests.sh`, which detects test file changes and runs the relevant tests.
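
The detection step inside such a script might look like the following. This is a sketch in Python rather than shell, and it rests on two assumptions to verify against the official hooks reference: that the hook receives a JSON payload on standard input, and that the edited path is available at `tool_input.file_path`. The function names and the test-file naming conventions are illustrative only.

```python
import json
from pathlib import PurePosixPath


def edited_path(payload_text):
    """Extract the edited file path from a hook payload, or None if absent.

    Assumed payload shape: {"tool_input": {"file_path": "..."}}.
    """
    try:
        payload = json.loads(payload_text)
    except json.JSONDecodeError:
        return None  # handle malformed input gracefully instead of crashing
    tool_input = payload.get("tool_input") if isinstance(payload, dict) else None
    if not isinstance(tool_input, dict):
        return None
    path = tool_input.get("file_path")
    return path if isinstance(path, str) else None


def is_test_file(path):
    """Heuristic check based on common naming conventions (illustrative, not exhaustive)."""
    name = PurePosixPath(path).name
    stem = name.split(".")[0]
    return (
        name.startswith("test_")
        or stem.endswith("_test")
        or ".test." in name
        or ".spec." in name
    )
```

A wrapper script would read standard input, call `edited_path`, and invoke the project's test runner only when `is_test_file` returns true, exiting quietly otherwise.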
211
skills/multi-agent-composition/SKILL.md
Normal file
@@ -0,0 +1,211 @@
---
name: multi-agent-composition
description: >
  This skill should be used when the user asks to "choose between skill and agent", "compose
  multi-agent system", "orchestrate agents", "manage agent context", "design component
  architecture", "should I use a skill or agent", "when to use hooks vs MCP", "build orchestrator
  workflow", needs decision frameworks for Claude Code components (skills, sub-agents, hooks, MCP
  servers, slash commands), context management patterns, or wants to build effective multi-component
  agentic systems with proper orchestration and anti-patterns guidance.
---

# Multi-Agent Composition

**Master Claude Code's components, patterns, and principles** to build effective agentic systems.

## When to Use This Knowledge

Use this knowledge when:

- **Learning Claude Code** - Understanding what each component does
- **Making architectural decisions** - Choosing Skills vs Sub-Agents vs MCP vs Slash Commands
- **Building custom solutions** - Creating specialized agents or orchestration systems
- **Scaling agentic workflows** - Moving from single agents to multi-agent orchestration
- **Debugging issues** - Understanding why components behave certain ways
- **Adding observability** - Implementing hooks for monitoring and control

## Quick Reference

### The Core 4 Framework

Every agent is built on these four elements:

1. **Context** - What information does the agent have?
2. **Model** - What capabilities does the model provide?
3. **Prompt** - What instruction are you giving?
4. **Tools** - What actions can the agent take?

> "Everything comes down to just four pieces. If you understand these, you will win."

### Component Overview

| Component | Trigger | Use When | Best For |
|-----------|---------|----------|----------|
| **Skills** | Agent-invoked | Repeat problems needing management | Domain-specific workflows |
| **Sub-Agents** | Tool-invoked | Parallelization & context isolation | Scale & batch operations |
| **MCP Servers** | As needed | External data/services | Integration with external systems |
| **Slash Commands** | Manual/tool | One-off tasks | Simple repeatable prompts |
| **Hooks** | Lifecycle events | Observability & control | Monitoring & blocking |

### Composition Hierarchy

```text
Skills (Top Layer)
├─→ Can use: Sub-Agents, Slash Commands, MCP Servers, Other Skills
└─→ Purpose: Orchestrate primitives for repeatable workflows

Sub-Agents (Execution Layer)
├─→ Can use: Slash Commands, Skills
└─→ Cannot nest other Sub-Agents

Slash Commands (Primitive Layer)
└─→ The fundamental building block

MCP Servers (Integration Layer)
└─→ Connect external systems
```
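
The "can use" relationships in this hierarchy can be written down as a small lookup table, which makes the no-nesting rule for sub-agents easy to check. This is only an illustrative encoding: the identifiers are hypothetical, and only the two layers with an explicit "Can use" line are modeled.

```python
# Encodes only the relationships the hierarchy states explicitly.
CAN_USE = {
    "skill": {"sub-agent", "slash-command", "mcp-server", "skill"},
    "sub-agent": {"slash-command", "skill"},
}


def can_use(parent, child):
    """Return True if the hierarchy lists `child` under what `parent` can use."""
    if parent not in CAN_USE:
        raise ValueError(f"the hierarchy does not list what {parent!r} can use")
    return child in CAN_USE[parent]


print(can_use("skill", "sub-agent"))      # skills may delegate to sub-agents
print(can_use("sub-agent", "sub-agent"))  # sub-agents cannot nest other sub-agents
```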

### Golden Rules

1. **Always start with prompts** - Master the primitive first
2. **"Parallel" = Sub-Agents** - Nothing else supports parallel execution
3. **External = MCP, Internal = Skills** - Clear separation of concerns
4. **One-off = Slash Command** - Don't over-engineer
5. **Repeat + Management = Skill** - Only scale when needed
6. **Don't convert all slash commands to skills** - Huge mistake
7. **Context, Model, Prompt, Tools** - Never forget the foundation
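
Rules 2 through 5 read like a small decision procedure, sketched below. The function name, its flags, and especially the order in which the rules are checked are assumptions made for illustration; the rules themselves do not specify a precedence, and the slash command is used as the default in line with the bias toward prompts.

```python
def choose_component(*, parallel=False, external=False, repeated=False, needs_management=False):
    """Pick a component from the golden rules (precedence order is an assumption)."""
    if parallel:
        return "sub-agent"      # rule 2: "Parallel" = Sub-Agents
    if external:
        return "mcp-server"     # rule 3: External = MCP
    if repeated and needs_management:
        return "skill"          # rule 5: Repeat + Management = Skill
    return "slash-command"      # rules 1 and 4: start with a prompt, don't over-engineer


print(choose_component(repeated=True))                          # repeated but unmanaged work
print(choose_component(repeated=True, needs_management=True))   # a managed problem domain
```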

## Documentation Structure

This skill uses progressive disclosure. Start here, then navigate to specific topics as needed.

### Reference Documentation

**Architecture fundamentals** - What each component is and how it works

- **[architecture.md](reference/architecture.md)** - Component definitions, capabilities, restrictions
- **[core-4-framework.md](reference/core-4-framework.md)** - Deep dive into Context, Model, Prompt, Tools

### Implementation Patterns

**How to use components effectively** - Decision-making and implementation

- **[decision-framework.md](patterns/decision-framework.md)** - When to use Skills vs Sub-Agents vs MCP vs Slash Commands
- **[hooks-in-composition.md](patterns/hooks-in-composition.md)** - Implementing hooks for observability and control
- **[orchestrator-pattern.md](patterns/orchestrator-pattern.md)** - Multi-agent orchestration at scale
- **[context-management.md](patterns/context-management.md)** - Managing context across agents
- **[context-in-composition.md](patterns/context-in-composition.md)** - Context handling in multi-agent systems

### Anti-Patterns

#### Common mistakes to avoid

- **[common-mistakes.md](anti-patterns/common-mistakes.md)** - Converting all slash commands to
  skills, using skills for one-offs, context explosion, and more

### Examples

#### Real-world case studies and progression paths

- **[progression-example.md](examples/progression-example.md)** - Evolution from prompt → sub-agent → skill (work tree manager example)
- **[case-studies.md](examples/case-studies.md)** - Scout-builder patterns, orchestration workflows, multi-agent systems

### Workflows

#### Visual guides and decision trees

- **[decision-tree.md](workflows/decision-tree.md)** - Decision trees, mindmaps, and visual guides for choosing components

## Getting Started

### If you're new to Claude Code

1. Start with **[reference/architecture.md](reference/architecture.md)** to understand components
2. Read **[reference/core-4-framework.md](reference/core-4-framework.md)** to grasp the foundation
3. Use **[patterns/decision-framework.md](patterns/decision-framework.md)** to make your first architectural choice
4. Check **[anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md)** to avoid pitfalls

### If you're making an architectural decision

1. Open **[patterns/decision-framework.md](patterns/decision-framework.md)**
2. Follow the decision tree to identify the right component
3. Review the specific component in **[reference/architecture.md](reference/architecture.md)**
4. Check **[examples/](examples/)** for similar use cases

### If you're adding observability

1. Read **[patterns/hooks-in-composition.md](patterns/hooks-in-composition.md)** to understand available hooks and implementation
2. Use the isolated scripts pattern (UV, bun, or shell)

### If you're scaling to multi-agent orchestration

1. Ensure you've mastered custom agents first
2. Read **[patterns/orchestrator-pattern.md](patterns/orchestrator-pattern.md)**
3. Study **[examples/case-studies.md](examples/case-studies.md)**
4. Review **[patterns/context-management.md](patterns/context-management.md)**

## Key Principles from the Field

### Prompts Are the Primitive

> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."

**Everything is prompts in the end.** Master slash commands before skills. Have a strong bias toward slash commands.

### Skills Are Compositional, Not Replacements

> "It is very clear this does not replace any existing feature or capability. It is a higher compositional level."

Skills orchestrate other components; they don't replace them. Don't convert all your
slash commands to skills; that's a huge mistake.

### Observability is Everything

> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."

If you can't measure it, you can't improve it. If you can't measure it, you can't scale it.

### Context Window Protection

> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."

Create focused agents with single purposes. Delete them when done. Treat agents as temporary, deletable resources.

### The Agentic Engineering Progression

```text
Level 1: Base agents → Use agents out of the box
Level 2: Better agents → Customize prompts and workflows
Level 3: More agents → Run multiple agents
Level 4: Custom agents → Build specialized solutions
Level 5: Orchestration → Manage fleets of agents
```

## Source Attribution

This knowledge synthesizes:

- Video presentations by Claude Code engineering team
- Official Claude Code documentation (docs.claude.com)
- Hands-on experimentation and validation
- Multi-agent orchestration patterns from the field

## Quick Navigation

**Need to understand what a component is?** → [reference/architecture.md](reference/architecture.md)

**Need to choose the right component?** → [patterns/decision-framework.md](patterns/decision-framework.md)

**Need to implement hooks?** → [patterns/hooks-in-composition.md](patterns/hooks-in-composition.md)

**Need to scale to multiple agents?** → [patterns/orchestrator-pattern.md](patterns/orchestrator-pattern.md)

**Need to see real examples?** → [examples/](examples/)

**Need visual guides?** → [workflows/decision-tree.md](workflows/decision-tree.md)

**Want to avoid mistakes?** → [anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md)

---

**Remember:** Context, Model, Prompt, Tools. Master these four, and you master Claude Code.
429
skills/multi-agent-composition/anti-patterns/common-mistakes.md
Normal file
@@ -0,0 +1,429 @@
# Common Anti-Patterns in Claude Code

**Critical mistakes to avoid** when building with Claude Code components.

## Table of Contents

- [The Fatal Five](#the-fatal-five)
  - [1. Converting All Slash Commands to Skills](#1-converting-all-slash-commands-to-skills)
  - [2. Using Skills for One-Off Tasks](#2-using-skills-for-one-off-tasks)
  - [3. Skipping the Primitive (Not Mastering Prompts First)](#3-skipping-the-primitive-not-mastering-prompts-first)
  - [4. Forcing Single Agents to Do Too Much (Context Explosion)](#4-forcing-single-agents-to-do-too-much-context-explosion)
  - [5. Using Sub-Agents When Context Matters](#5-using-sub-agents-when-context-matters)
- [Secondary Anti-Patterns](#secondary-anti-patterns)
  - [6. Confusing MCP with Internal Orchestration](#6-confusing-mcp-with-internal-orchestration)
  - [7. Forgetting the Core Four](#7-forgetting-the-core-four)
  - [8. No Observability (Can't Measure, Can't Improve)](#8-no-observability-cant-measure-cant-improve)
  - [9. Nesting Sub-Agents](#9-nesting-sub-agents)
  - [10. Over-Engineering Simple Problems](#10-over-engineering-simple-problems)
  - [11. Agent Dependency Coupling](#11-agent-dependency-coupling)
- [Anti-Pattern Detection Checklist](#anti-pattern-detection-checklist)
- [Recovery Strategies](#recovery-strategies)
- [Remember](#remember)

## The Fatal Five

These are the most common and damaging mistakes engineers make:

### 1. Converting All Slash Commands to Skills

**The Mistake:**
> "There are a lot of engineers right now that are going all in on skills. They're converting all their slash commands to skills. I think that's a huge mistake."

**Why it's wrong:**

- Skills are for **repeat problems that need management**, not simple one-off tasks
- Slash commands are the **primitive foundation** - you need them
- You're adding unnecessary complexity and context overhead
- Skills should **complement** slash commands, not replace them

**Correct approach:**

- Keep slash commands for simple, direct tasks
- Only create a skill when you're **managing a problem domain** with multiple related operations
- Have a strong bias toward slash commands

**Example:**

- ❌ Wrong: Create a skill for generating a single commit message
- ✅ Right: Use a slash command for one-off commit messages; create a skill only if managing an entire git workflow system

---

### 2. Using Skills for One-Off Tasks

**The Mistake:**
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill. This is not what skills are for."

**Why it's wrong:**

- Skills have overhead (metadata, loading, management)
- One-off tasks don't benefit from reuse
- You're over-engineering a simple problem

**Signal words that indicate you DON'T need a skill:**

- "One time"
- "Quick"
- "Just need to..."
- "Simple task"

**Correct approach:**

- Use a slash command for one-off tasks
- If you find yourself doing it repeatedly (3+ times), **then** consider a skill

**Example:**

- ❌ Wrong: Build a skill to create one UI component
- ✅ Right: Use a slash command; upgrade to a skill only after creating components repeatedly
|
||||
|
||||
---
|
||||
|
||||
### 3. Skipping the Primitive (Not Mastering Prompts First)

**The Mistake:**

> "When you're starting out, I always recommend you just build a prompt. Don't build a skill. Don't build a sub-agent. Don't build out an MCP server. Keep it simple. Build a prompt."

**Why it's wrong:**

- If you don't master prompts, you can't build effective skills
- Everything is prompts in the end (tokens in, tokens out)
- You're building on a weak foundation

**The fundamental truth:**

> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."

**Correct approach:**

1. Always start with a prompt/slash command
2. Master the primitive first
3. Scale up only when needed
4. Build from the foundation upward

**Example:**

- ❌ Wrong: "I'm going to start by building a skill because it's more advanced"
- ✅ Right: "I'll write a prompt first, see if it works, then consider scaling to a skill"

---

### 4. Forcing Single Agents to Do Too Much (Context Explosion)

**The Mistake:**

> "200k context window is plenty. You're just stuffing a single agent with too much work, just like your boss did to you at your last job. Don't force your agent to context switch."

**Why it's wrong:**

- Context explosion leads to poor performance
- Agent loses focus across too many unrelated tasks
- You're treating your agent like an overworked employee
- Results degrade as context window fills

**Correct approach:**

- Create **focused agents** with single purposes
- Use **sub-agents** for parallel, isolated work
- **Delete agents** when their task is complete
- Treat agents as **temporary, deletable resources**

**Example:**

- ❌ Wrong: One agent that reads codebase, writes tests, updates docs, and deploys
- ✅ Right: Four focused agents - one for reading, one for tests, one for docs, one for deployment

---

### 5. Using Sub-Agents When Context Matters

**The Mistake:**

> "Sub-agents isolate and protect your context window... But of course, you have to be okay with losing that context afterward because it will be lost."

**Why it's wrong:**

- Sub-agent context is **isolated**
- You can't reference sub-agent work later without resumable sub-agents
- You lose the conversation history

**Correct approach:**

- Use sub-agents when:
  - You need **parallelization**
  - Context **isolation** is desired
  - You're okay **losing context** after
- Use main conversation when:
  - You need context later
  - Work builds on previous steps
  - Conversation continuity matters

**Example:**

- ❌ Wrong: Use sub-agent for research task, then try to reference findings 10 prompts later
- ✅ Right: Do research in main conversation if you'll need it later; use sub-agent only for isolated batch work

---

## Secondary Anti-Patterns

### 6. Confusing MCP with Internal Orchestration

**The Mistake:** Using MCP servers for internal workflows instead of external integrations.

**Why it's wrong:**

> "To me, there is very very little overlap here between agent skills and MCP servers. These are fully distinct."

**Clear rule:** External = MCP, Internal orchestration = Skills

**Example:**

- ❌ Wrong: Build MCP server to orchestrate your internal test suite
- ✅ Right: Build a skill for internal test orchestration; use MCP to connect to external CI/CD service

---

### 7. Forgetting the Core Four

**The Mistake:** Not monitoring Context, Model, Prompt, and Tools at critical moments.

**Why it's wrong:**

> "Context, model, prompt, tools. Do you know what these four leverage points are at every critical moment? This is the foundation."

**Correct approach:**

- Always know the state of the Core Four for your agents
- Monitor context window usage
- Understand which model is active
- Track what prompts are being used
- Know what tools are available

---

### 8. No Observability (Can't Measure, Can't Improve)

**The Mistake:** Running agents without logging, monitoring, or hooks.

**Why it's wrong:**

> "When it comes to agentic coding, observability is everything. If you can't measure it, you can't improve it. And if you can't measure it, you can't scale it."

**Correct approach:**

- Implement hooks for logging (post-tool-use, stop)
- Track agent performance and costs
- Monitor what files are read/written
- Capture chat transcripts
- Review agent behavior to improve prompts
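
A minimal sketch of the logging-hook idea from the list above. It assumes the hook is handed a JSON payload and that the payload carries `session_id`, `tool_name`, and `tool_input` keys; those key names, the `logs/tool-use.jsonl` path, and the helper names are illustrative assumptions, not a confirmed schema. To wire it up as a hook script, one would parse stdin with `json.load(sys.stdin)` and pass the result to `build_record`.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("logs/tool-use.jsonl")  # hypothetical log location


def build_record(payload: dict, now: float) -> dict:
    """Keep only the fields worth reviewing later; missing keys become None."""
    return {
        "ts": now,
        "session_id": payload.get("session_id"),  # assumed key name
        "tool_name": payload.get("tool_name"),    # assumed key name
        "tool_input": payload.get("tool_input"),  # assumed key name
    }


def append_record(record: dict, log_path: Path = LOG_PATH) -> None:
    """Append one JSON object per line so the log stays easy to grep."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Example: record a single (made-up) tool call.
example = build_record({"session_id": "s1", "tool_name": "Read"}, time.time())
```
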

---

### 9. Nesting Sub-Agents

**The Mistake:** Trying to spawn sub-agents from within other sub-agents.

**Why it's wrong:**

- Hard limit in Claude Code architecture
- Prevents infinite nesting
- Not supported by the system

**The restriction:**

> "Sub-agents cannot spawn other sub-agents. This prevents infinite nesting while still allowing Claude to gather necessary context."

**Correct approach:**

- Use orchestrator pattern instead
- Flatten your agent hierarchy
- Have main agent create all sub-agents

---

### 10. Over-Engineering Simple Problems

**The Mistake:** Building complex multi-agent orchestration for tasks that could be a single prompt.

**Why it's wrong:**

- Unnecessary complexity
- Maintenance burden
- Slower execution
- Higher costs

**The principle:** Start simple, scale only when needed.

**Decision checklist before scaling:**

- [ ] Have I tried solving this with a single prompt?
- [ ] Is this actually a repeat problem?
- [ ] Will the added complexity pay off?
- [ ] Am I solving a real problem or just playing with new features?

---

### 11. Agent Dependency Coupling

**The Mistake:** Creating agents that depend on the exact output format of other agents.

**Why it's wrong:**

- Creates **brittle coupling** between agents
- Changes to one agent's output **break downstream agents**
- Makes the system **hard to maintain** and evolve
- Creates a **hidden dependency graph** that's not explicit

**The problem:**
When Agent B expects Agent A to return data in a specific format (e.g., JSON with specific field names, or markdown with specific structure), you create tight coupling. If Agent A's output changes, Agent B silently breaks.

**Warning signs:**

- Agents parsing other agents' string outputs
- Hard-coded field names or output structure assumptions
- Agents that "expect" data in a certain format without validation
- No explicit contracts between agents

**Correct approach:**

**1. Use explicit contracts:**

```text
Agent A prompt:
"Return JSON with these exact fields: {id, name, status, created_at}"

Agent B prompt:
"You will receive JSON with fields: {id, name, status, created_at}
Validate the structure before processing."
```

**2. Use structured data formats:**

- Define JSON schemas explicitly
- Document expected fields
- Validate inputs before processing
- Handle missing or malformed data gracefully
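
As a rough illustration of "validate inputs before processing", the sketch below checks the `{id, name, status, created_at}` contract from the prompt example before anything downstream touches the data. The function and exception names are hypothetical, and a real setup might use a JSON Schema validator instead of hand-written checks.

```python
import json

REQUIRED_FIELDS = {"id", "name", "status", "created_at"}


class ContractError(ValueError):
    """Raised when an upstream agent's output violates the agreed contract."""


def parse_agent_a_output(raw: str) -> dict:
    """Parse and validate Agent A's JSON before Agent B acts on it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object at the top level")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ContractError(f"missing fields: {sorted(missing)}")
    return data
```

Failing loudly with a named error is the point here: a malformed payload stops at the boundary instead of silently breaking the downstream agent.
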

**3. Minimize agent-to-agent communication:**

- Prefer orchestrator pattern (main agent coordinates)
- Pass data through orchestrator, not agent-to-agent
- Keep sub-agents independent when possible

**4. Version your agent contracts:**

```text
Agent output format v2:
{
  "version": "2.0",
  "data": {...},
  "metadata": {...}
}
```
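
One way a consumer might act on the `version` field, sketched under the assumption that any `2.x` envelope is compatible and anything else should be rejected. That compatibility rule and the function name are illustrative choices, not something the source prescribes.

```python
SUPPORTED_MAJOR = 2  # assumption: this consumer understands any 2.x payload


def unwrap_versioned(payload: dict) -> dict:
    """Return payload["data"] if the envelope's major version is supported."""
    version = str(payload.get("version", ""))
    major = version.split(".", 1)[0]
    if not major.isdigit() or int(major) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported contract version: {version!r}")
    if "data" not in payload:
        raise ValueError("envelope has no 'data' field")
    return payload["data"]
```

Checking only the major version lets the producer add optional fields in a minor release without breaking existing consumers.
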

**Example:**

❌ **Wrong (Brittle Coupling):**

```text
Agent A: "Analyze files and report findings"
[Returns: "Found 3 issues in foo.py and 2 in bar.py"]

Agent B: "Parse Agent A's output and fix the issues"
[Expects: "Found N issues in X and Y in Z" format]
```

**Problem:** If Agent A changes its output format, Agent B breaks silently.

✅ **Right (Explicit Contract):**

```text
Agent A: "Analyze files and return JSON:
{
  'files_analyzed': [...],
  'findings': [
    {'file': 'foo.py', 'line': 10, 'issue': '...'},
    {'file': 'bar.py', 'line': 20, 'issue': '...'}
  ]
}"

Agent B: "You will receive JSON with fields: {files_analyzed, findings}.
First validate the structure. Then fix each issue in findings array."
```

**Better (Orchestrator Pattern):**

```text
Main Agent:
1. Spawn Agent A to analyze files
2. Parse Agent A's JSON output
3. Transform to format Agent B needs
4. Spawn Agent B with explicit data structure
5. Agent B doesn't need to know about Agent A
```

**Best practice:** The orchestrator (main agent) owns the contracts and data transformations. Sub-agents are independent and don't depend on each other's formats.
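
Steps 2 and 3 of the orchestrator pattern can be sketched as a plain transformation owned by the main agent. The input shape follows the `files_analyzed` / `findings` example above; the per-file task shape handed to Agent B and the function name are assumptions made for illustration.

```python
import json
from collections import defaultdict


def findings_to_fix_tasks(agent_a_raw: str) -> list[dict]:
    """Turn Agent A's findings into one self-contained task per file for Agent B."""
    report = json.loads(agent_a_raw)
    by_file: dict[str, list[dict]] = defaultdict(list)
    for finding in report["findings"]:
        by_file[finding["file"]].append(
            {"line": finding["line"], "issue": finding["issue"]}
        )
    # Sort files and issues so repeated runs produce comparable task lists.
    return [
        {"file": path, "issues": sorted(issues, key=lambda i: i["line"])}
        for path, issues in sorted(by_file.items())
    ]
```
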

---

## Anti-Pattern Detection Checklist

Ask yourself these questions:

**Before creating a skill:**

- [ ] Is this a **repeat problem** that needs **management**?
- [ ] Have I solved this with a prompt/slash command first?
- [ ] Am I avoiding the mistake of converting simple commands to skills?

**Before using a sub-agent:**

- [ ] Do I need **parallelization** or **context isolation**?
- [ ] Am I okay **losing this context** afterward?
- [ ] Could this be done in the main conversation instead?

**Before using MCP:**

- [ ] Is this for **external** data/services?
- [ ] Am I sure this isn't internal orchestration?

**Before scaling to multi-agent orchestration:**

- [ ] Have I mastered custom agents first?
- [ ] Do I have observability in place?
- [ ] Am I solving a real scale problem?

---

## Recovery Strategies

**If you've fallen into these anti-patterns:**

1. **Converted slash commands to skills?**
   - Evaluate each skill: Is it truly a repeat management problem?
   - Downgrade skills that are just one-off tasks back to slash commands
   - Keep your slash command library strong

2. **Context explosion in single agent?**
   - Split work across focused sub-agents
   - Use orchestrator pattern for complex workflows
   - Delete agents when tasks complete

3. **No observability?**
   - Add hooks immediately (start with stop and post-tool-use)
   - Log chat transcripts
   - Track tool usage
   - Monitor costs

4. **Lost in complexity?**
   - Step back to basics: What's the simplest solution?
   - Remove unnecessary abstractions
   - Return to prompts/slash commands
   - Scale up only when proven necessary

---

## Remember

> "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
>
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill."
>
> "Context, model, prompt, tools. This never goes away."

**The golden path:** Start with prompts → Scale thoughtfully → Add observability → Manage complexity
992
skills/multi-agent-composition/examples/case-studies.md
Normal file
@@ -0,0 +1,992 @@
# Multi-Agent Case Studies

Real-world examples of multi-agent systems in production, drawn from field experience.

## Case Study Index

| # | Name | Pattern | Agents | Key Lesson |
|---|------|---------|--------|------------|
| 1 | AI Docs Loader | Sub-agent delegation | 8-10 | Parallel work without context pollution |
| 2 | SDK Migration | Scout-plan-build | 6 | Search + plan + implement workflow |
| 3 | Codebase Summarization | Orchestrator + QA | 3 | Divide and conquer with synthesis |
| 4 | UI Component Creation | Scout-builder | 2 | Precise targeting before building |
| 5 | PLAN-BUILD-REVIEW-SHIP | Task board lifecycle | 4 | Quality gates between phases |
| 6 | Meta-Agent System | Agent building agents | Variable | Recursive agent creation |
| 7 | Observability Dashboard | Fleet monitoring | 5-10+ | Real-time multi-agent visibility |
| 8 | AFK Agent Device | Autonomous background work | 3-5 | Out-of-loop while you sleep |

---

## Case Study 1: AI Docs Loader

**Pattern:** Sub-agent delegation for parallel work

**Problem:** Loading 10 documentation URLs consumes 30k+ tokens per scrape. Single agent would hit 150k+ tokens.

**Solution:** Delegate each scrape to isolated sub-agent

**Architecture:**

```text
Primary Agent (9k tokens)
├→ Sub-Agent 1: Scrape doc 1 (3k tokens, isolated)
├→ Sub-Agent 2: Scrape doc 2 (3k tokens, isolated)
├→ Sub-Agent 3: Scrape doc 3 (3k tokens, isolated)
   ...
└→ Sub-Agent 10: Scrape doc 10 (3k tokens, isolated)

Total work: 39k tokens
Primary agent: Only 9k tokens ✅
Context protected: 30k tokens kept out of primary
```

**Implementation:**

```bash
# Single command
/load-ai-docs

# Agent reads list from ai-docs/README.md
# For each URL older than 24 hours:
#   - Spawn sub-agent
#   - Sub-agent scrapes URL
#   - Sub-agent saves to file
#   - Sub-agent reports completion
# Primary agent never sees scrape content
```
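
The "older than 24 hours" check and the fan-out can be sketched in Python. The `scrape` callable here is a hypothetical stand-in for spawning a sub-agent: it is assumed to return only a short status string, never the page content, which is what keeps the primary context small. The data shapes and function names are also assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)


def stale_urls(last_fetched: dict, now: datetime) -> list:
    """Return URLs never fetched or last fetched more than 24 hours ago."""
    return [
        url
        for url, fetched_at in last_fetched.items()
        if fetched_at is None or now - fetched_at > MAX_AGE
    ]


def refresh(last_fetched: dict, now: datetime, scrape) -> dict:
    """Run one scrape per stale URL in parallel; collect only short statuses."""
    urls = stale_urls(last_fetched, now)
    if not urls:
        return {}
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        statuses = list(pool.map(scrape, urls))
    return dict(zip(urls, statuses))
```

Fresh documents are skipped entirely, so a repeat run within the day costs the primary agent almost nothing.
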

**Key techniques:**

- **Sub-agents for isolation** - Each scrape in separate context
- **Parallel execution** - All 10 scrapes run simultaneously
- **Context delegation** - 30k tokens stay out of primary

**Results:**

- **Time:** 10 scrapes in parallel vs. sequential (10x faster)
- **Context:** Primary agent stays at 9k tokens throughout
- **Scalability:** Can handle 50+ URLs without primary context issues

**Source:** Elite Context Engineering transcript

---

## Case Study 2: SDK Migration

**Pattern:** Scout-plan-build with multiple perspectives

**Problem:** Migrating codebase to new Claude Agent SDK across 8 applications

**Challenge:**

- 100+ files potentially affected
- Agent reading everything = 150k+ tokens
- Planning without full context = mistakes

**Solution:** Three-phase workflow with delegation

**Phase 1: Scout (Reduce context for planner)**

```text
Orchestrator spawns 4 scout agents (parallel):
├→ Scout 1: Gemini Lightning (fast, different perspective)
├→ Scout 2: CodeX (specialized for code search)
├→ Scout 3: Gemini Flash Preview
└→ Scout 4: Haiku (cheap, fast)

Each scout:
- Searches codebase for SDK usage
- Identifies exact files and line numbers
- Notes patterns (e.g., "system prompt now explicit")

Output: relevant-files.md (5k tokens)
├── File paths
├── Line number offsets
├── Character ranges
└── Relevant code snippets
```

**Why multiple models?** Diverse perspectives catch edge cases a single model might miss.

**Phase 2: Plan (Focus on relevant subset)**

```text
Planner agent (new instance):
├── Reads relevant-files.md (5k tokens)
├── Scrapes SDK documentation (8k tokens)
├── Analyzes migration patterns
└── Creates detailed-plan.md (3k tokens)

Context used: 16k tokens
vs. 150k if reading entire codebase
Savings: 89% reduction
```

**Phase 3: Build (Execute plan)**

```text
Builder agent (new instance):
├── Reads detailed-plan.md (3k tokens)
├── Implements changes across 8 apps
├── Updates system prompts
├── Tests each application
└── Reports completion

Context used: ~80k tokens
Still within safe limits
```

**Final context analysis:**

```text
If single agent:
├── Search: 40k tokens
├── Read files: 60k tokens
├── Plan: 20k tokens
├── Implement: 30k tokens
└── Total: 150k tokens (75% used)

With scout-plan-build:
├── Primary orchestrator: 10k tokens
├── 4 scouts (parallel, isolated): 4 × 15k = 60k total, 0k in primary
├── Planner (new agent): 16k tokens
├── Builder (new agent): 80k tokens
└── Max per agent: 80k tokens (40% per agent)
```
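
The percentages in the analysis above follow from simple arithmetic. The sketch below reproduces them, assuming a 200k-token context window (the figure implied by "150k tokens (75% used)"); the helper name `pct` is illustrative.

```python
CONTEXT_WINDOW = 200_000  # assumption: implied by "150k tokens (75% used)"


def pct(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Share of the context window used, as a percentage."""
    return 100 * tokens / window


single_agent = 40_000 + 60_000 + 20_000 + 30_000  # search + read + plan + implement
planner = 5_000 + 8_000 + 3_000                   # relevant files + SDK docs + plan
builder = 80_000

print(single_agent, pct(single_agent))            # 150000 75.0
print(planner, pct(planner))                      # 16000 8.0
print(pct(builder))                               # 40.0
print(round(100 * (1 - planner / single_agent)))  # 89 (planner savings vs. 150k)
```
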

**Key techniques:**

- **Composable workflows** - Chain /scout, /plan, /build
- **Multiple scout models** - Diverse perspectives
- **Context offloading** - Scouts protect planner's context
- **Fresh agents per phase** - No context accumulation

**Results:**

- **8 applications migrated** successfully
- **51% context used** in builder phase (safe margins)
- **No context explosions** across entire workflow
- **Completed in single session** (~30 minutes)

**Near miss:** "We were 14% away from exploding our context" due to autocompact buffer

**Lesson:** Disable autocompact buffer. That 22% matters at scale.

**Source:** Claude 2.0 transcript

---

## Case Study 3: Codebase Summarization

**Pattern:** Orchestrator with specialized QA agents

**Problem:** Summarize large codebase (frontend + backend) with architecture docs

**Approach:** Divide and conquer with synthesis

**Architecture:**

```text
Orchestrator Agent
├→ Creates Frontend QA Agent
│  ├─ Summarizes frontend components
│  └─ Outputs: frontend-summary.md
├→ Creates Backend QA Agent
│  ├─ Summarizes backend APIs
│  └─ Outputs: backend-summary.md
└→ Creates Primary QA Agent
   ├─ Reads both summaries
   ├─ Synthesizes unified view
   └─ Outputs: codebase-overview.md
```

**Orchestrator behavior:**

```text
1. Parse user request: "Summarize codebase"
2. Create 3 agents with specialized tasks
3. Command each agent with detailed prompts
4. SLEEP (not observing their work)
5. Wake every 15s to check status
6. Agents complete → Orchestrator wakes
7. Collect results (read produced files)
8. Summarize for user
9. Delete all 3 agents
```
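
Steps 4 to 6 amount to a polling loop. A minimal sketch, assuming a hypothetical `get_status(agent_id)` callable that returns a short string such as `"running"` or `"complete"`; the function name, the status values, and the `max_polls` safety limit are all assumptions rather than part of the original system.

```python
import time


def wait_for_agents(agent_ids, get_status, poll_seconds=15,
                    sleep=time.sleep, max_polls=240):
    """Block until every agent reports 'complete'; return the number of polls.

    Only a short status string is read per agent, never the agents'
    transcripts, so the orchestrator's own context stays small.
    """
    pending = set(agent_ids)
    for polls in range(1, max_polls + 1):
        pending = {a for a in pending if get_status(a) != "complete"}
        if not pending:
            return polls
        sleep(poll_seconds)
    raise TimeoutError(f"agents still running: {sorted(pending)}")
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting 15 seconds per poll.
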

**Prompts from orchestrator:**

```markdown
Frontend QA Agent:
"Analyze all files in src/frontend/. Create markdown summary with:
- Key components and their responsibilities
- State management approach
- Routing structure
- Technology stack
Output to docs/frontend-summary.md"

Backend QA Agent:
"Analyze all files in src/backend/. Create markdown summary with:
- API endpoints and their purposes
- Database schema
- Authentication/authorization
- External integrations
Output to docs/backend-summary.md"

Primary QA Agent:
"Read frontend-summary.md and backend-summary.md. Create unified overview with:
- High-level architecture
- How components interact
- Data flow
- Key technologies
Output to docs/codebase-overview.md"
```

**Observability interface shows:**

```text
[Agent 1] Frontend QA
├── Status: Complete ✅
├── Context: 28k tokens used
├── Files consumed: 15 files
├── Files produced: frontend-summary.md
└── Time: 45 seconds

[Agent 2] Backend QA
├── Status: Complete ✅
├── Context: 32k tokens used
├── Files consumed: 12 files
├── Files produced: backend-summary.md
└── Time: 52 seconds

[Agent 3] Primary QA
├── Status: Complete ✅
├── Context: 18k tokens used
├── Files consumed: 2 files (summaries)
├── Files produced: codebase-overview.md
└── Time: 30 seconds

Orchestrator:
├── Context: 12k tokens (commands only, not observing work)
├── Total time: 52 seconds (parallel execution)
└── All agents deleted after completion
```

**Key techniques:**

- **Parallel frontend/backend** - 2x speedup
- **Orchestrator sleeps** - Protects its context
- **Synthesis agent** - Combines perspectives
- **Deletable agents** - Freed after use

**Results:**

- **3 comprehensive docs** created
- **Max context per agent:** 32k tokens (16%)
- **Orchestrator context:** 12k tokens (6%)
- **Time:** 52 seconds (vs. 2+ minutes sequential)

**Source:** One Agent to Rule Them All transcript

---

## Case Study 4: UI Component Creation

**Pattern:** Scout-builder two-stage

**Problem:** Create gray pills for app header information display

**Challenge:** Codebase has specific conventions. Need to find exact files and follow patterns.

**Solution:** Scout locates, builder implements

**Phase 1: Scout**

```text
Scout Agent:
├── Task: "Find header UI component files"
├── Searches for: header, display, pills, info components
├── Identifies patterns: existing pill styles, color conventions
├── Locates exact files:
│   ├── src/components/AppHeader.vue
│   ├── src/styles/pills.css
│   └── src/utils/formatters.ts
└── Outputs: scout-header-report.md with:
    ├── File locations
    ├── Line numbers for modifications
    ├── Existing patterns to follow
    └── Recommended approach
```

**Phase 2: Builder**

```text
Builder Agent:
├── Reads scout-header-report.md
├── Follows identified patterns
├── Creates gray pill components
├── Applies consistent styling
├── Outputs modified files with exact changes
└── Context: Only 30k tokens (vs. 80k+ without scout)
```

**Orchestrator involvement:**

```text
1. User prompts: "Create gray pills for header"
2. Orchestrator creates Scout
3. Orchestrator SLEEPS (checks every 15s)
4. Scout completes → Orchestrator wakes
5. Orchestrator reads scout output
6. Orchestrator creates Builder with detailed instructions
7. Orchestrator SLEEPS again
8. Builder completes → Orchestrator wakes
9. Orchestrator reports results
10. Orchestrator deletes both agents
```

**Key techniques:**

- **Scout reduces uncertainty** - Builder knows exactly where to work
- **Pattern following** - Scout identifies conventions
- **Orchestrator sleep** - Two phases, minimal orchestrator context
- **Precise targeting** - No wasted reads

**Results:**

- **Scout:** 15k tokens, 20 seconds
- **Builder:** 30k tokens, 35 seconds
- **Orchestrator:** 8k tokens final
- **Total time:** 55 seconds
- **Feature shipped** correctly on first try

**Source:** One Agent to Rule Them All transcript

---

## Case Study 5: PLAN-BUILD-REVIEW-SHIP Task Board

**Pattern:** Structured lifecycle with quality gates

**Problem:** Ensure all changes go through proper review before shipping

**Architecture:**

```text
Task Board Columns:
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│  PLAN   │→ │  BUILD  │→ │ REVIEW  │→ │  SHIP   │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
```
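
The four columns behave like a small state machine in which a task can only move one column to the right, and only when the current phase passes its gate. A minimal sketch of that idea; the `Task` class and its method names are hypothetical, not taken from the original task board.

```python
PHASES = ["PLAN", "BUILD", "REVIEW", "SHIP"]


class Task:
    def __init__(self, title: str):
        self.title = title
        self.phase = "PLAN"
        self.history = []  # (phase, agent_id) pairs, oldest first

    def advance(self, agent_id: str, passed: bool = True) -> str:
        """Move to the next column only if the current phase passed its gate."""
        if self.phase == "SHIP":
            raise ValueError("task is already in the final column")
        if not passed:
            raise ValueError(f"{self.phase} gate failed; task stays in {self.phase}")
        self.history.append((self.phase, agent_id))
        self.phase = PHASES[PHASES.index(self.phase) + 1]
        return self.phase
```

Because `advance` is the only way to change `phase`, there is no code path that jumps from BUILD straight to SHIP, and `history` doubles as the audit trail.
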

**Example task: "Update HTML titles"**

**Column 1: PLAN**

```text
Planner Agent:
├── Analyzes requirement
├── Identifies affected files:
│   ├── index.html
│   └── src/App.tsx (has <title> in render)
├── Creates implementation plan:
│   1. Update index.html <title>
│   2. Update App.tsx header component
│   3. Test both pages load correctly
└── Moves task to BUILD column
```

**Column 2: BUILD**

```text
Builder Agent:
├── Reads plan from PLAN column
├── Implements changes:
│   ├── index.html: "Plan Build Review Ship"
│   └── App.tsx: header="Plan Build Review Ship"
├── Runs tests: All passing ✅
└── Moves task to REVIEW column
```

**Column 3: REVIEW**

```text
Reviewer Agent:
├── Reads plan and implementation
├── Checks:
│   ├── Plan followed? ✅
│   ├── Tests passing? ✅
│   ├── Code quality? ✅
│   └── No security issues? ✅
├── Approves changes
└── Moves task to SHIP column
```

**Column 4: SHIP**

```text
Shipper Agent:
├── Creates git commit
├── Pushes to remote
├── Updates deployment
└── Marks task complete
```

**Orchestrator's role:**

```text
- NOT micromanaging each step
- Responding to user commands like "Move task to next phase"
- Tracking task state in database
- Providing UI showing current phase
- Can intervene if phase fails (e.g., tests fail in BUILD)
```

**UI representation:**

```text
Task: Update Titles
├── Status: REVIEW
├── Assigned: reviewer-agent-003
├── History:
│   ├── PLAN: planner-001 (completed 2m ago)
│   ├── BUILD: builder-002 (completed 1m ago)
│   └── REVIEW: reviewer-003 (in progress)
└── Files modified: 2
```

**Key techniques:**

- **Clear phases** - No ambiguity about current state
- **Quality gates** - Can't skip to SHIP without REVIEW
- **Agent specialization** - Each agent expert in its phase
- **Failure isolation** - If BUILD fails, PLAN preserved

**Results:**

- **No untested code shipped** (REVIEW gate catches issues)
- **Clear audit trail** (who did what in which phase)
- **Parallel tasks** (multiple agents in different columns)
- **Single interface** (user sees all tasks across all phases)

**Source:** Custom Agents transcript

---

## Case Study 6: Meta-Agent System

**Pattern:** Agents building agents

**Problem:** Need new specialized agent but don't want to hand-write configuration

**Solution:** Meta-agent that builds other agents

**Meta-agent prompt:**

```markdown
# meta-agent.md

You are a meta-agent that builds new sub-agents from user descriptions.

When user says "build a new sub-agent":
1. Ask what the agent should do
2. Fetch Claude Code sub-agent documentation
3. Design system prompt for new agent
4. Create agent configuration file
5. Test agent with sample prompts
6. Report usage examples

Output: .claude/agents/<agent-name>.md with complete configuration
```

**Example: Building TTS summary agent**

**User:** "Build agent that summarizes what my code does using text-to-speech"

**Meta-agent process:**

```text
Step 1: Understand requirements
├── Parse: "summarize code" + "text-to-speech"
├── Infer: Needs code reading + TTS API access
└── Clarify: Voice provider? (user chooses 11Labs)

Step 2: Fetch documentation
├── Reads Claude Code sub-agent docs
├── Reads 11Labs API docs
└── Understands agent configuration format

Step 3: Design system prompt
├── Purpose: Concise code summaries via voice
├── Tools needed: read files, 11Labs TTS
├── Response format: Audio file output
└── Trigger: "use TTS summary"

Step 4: Create configuration
Writes .claude/agents/tts-summary.md:
---
name: tts-summary
description: Concisely summarizes code with text-to-speech. Trigger: "TTS summary"
---
Purpose: Review user's code and provide 1-sentence summary via 11Labs voice
[... full system prompt ...]

Step 5: Test
├── Runs test prompt: "TTS summary for hooks.py"
├── Agent reads file, generates summary
├── Outputs audio with summary
└── Validates: Works correctly ✅

Step 6: Report
├── Explains how to use new agent
├── Shows example prompts
└── Notes: Can adjust voice, length, etc.
```
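
Step 4 boils down to writing a markdown file with YAML frontmatter. A minimal sketch, assuming only the `name` and `description` fields shown above; the function names are hypothetical. The description is quoted because an unquoted value containing `: ` (as in `Trigger: "TTS summary"`) is not a valid plain YAML scalar.

```python
from pathlib import Path


def render_agent_file(name: str, description: str, system_prompt: str) -> str:
    """Build the markdown for an agent definition: YAML frontmatter, then prompt."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("agent name must be non-empty and contain no whitespace")
    # Quote the description so colons or quotes inside it cannot break the YAML.
    quoted = '"' + description.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return (
        f"---\nname: {name}\ndescription: {quoted}\n---\n\n"
        f"{system_prompt.strip()}\n"
    )


def write_agent_file(root: Path, name: str, description: str,
                     system_prompt: str) -> Path:
    """Write the definition to <root>/.claude/agents/<name>.md; return its path."""
    path = root / ".claude" / "agents" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_agent_file(name, description, system_prompt),
                    encoding="utf-8")
    return path
```
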

**Result:** Fully functional TTS summary agent created from natural language description

**Recursion depth:**

```text
Level 0: Human user
└→ Level 1: Meta-agent (builds agents)
   └→ Level 2: TTS summary agent (built by meta-agent)
      └→ Level 3: Sub-agents (if TTS agent spawns any)
```

**Key techniques:**

- **Documentation fetching** - Meta-agent reads official docs
- **Template following** - Follows agent configuration patterns
- **Validation loop** - Tests before declaring success
- **Recursive creation** - Agents can build agents

**Challenges:**

- **Dependency coupling** - New agent depends on meta-agent's understanding
- **Debugging difficulty** - If generated agent fails, hard to trace
- **Version drift** - Meta-agent's docs knowledge may become outdated

**Results:**

- **New agent in ~2 minutes** vs. 15+ minutes manually
- **Follows best practices** automatically
- **Tested before delivery**
- **Documented usage**

**Source:** Sub-Agents transcript

---

## Case Study 7: Observability Dashboard
|
||||
|
||||
**Pattern:** Real-time multi-agent monitoring
|
||||
|
||||
**Problem:** Running 5-10 agents simultaneously. Can't see what they're doing.
|
||||
|
||||
**Solution:** Centralized observability system
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
┌──────────────────── Multiple Agents ────────────────────┐
│   Agent 1    Agent 2    Agent 3    Agent 4    Agent 5   │
│      ↓          ↓          ↓          ↓          ↓      │
│                 pre/post-tool-use hooks                 │
│                            ↓                            │
└─────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────── Bun Server ─────────────────────────┐
│  POST /events endpoint                                  │
│  ├→ Store in SQLite (persistence)                       │
│  └→ Broadcast via WebSocket (real-time)                 │
└─────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────── Web Client ─────────────────────────┐
│  ┌───────────────────────────────────────────────────┐  │
│  │ Live Activity Pulse (1min/3min/5min windows)      │  │
│  ├───────────────────────────────────────────────────┤  │
│  │ Event Stream (filtered by app/session/type)       │  │
│  ├───────────────────────────────────────────────────┤  │
│  │ Event Details (with AI-generated summaries)       │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
|
||||
|
||||
**Hook implementation:**
|
||||
|
||||
```python
# .claude/hooks/post-tool-use.py
import sys, json, subprocess

def main():
    event = json.load(sys.stdin)

    # Send to observability server (with AI summary)
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-codebase",      # App name
        "post-tool-use",    # Event type
        "--summarize"       # Generate Haiku summary
    ], input=json.dumps(event), text=True)

if __name__ == "__main__":
    main()
```
|
||||
|
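The hook above hands the event to `send-event.py`, which the source does not show. The following is a rough sketch of what such a sender could look like, under stated assumptions: the server URL, the envelope field names, and the `session_id` key are all hypothetical, and the `--summarize` step is left out because it needs a model call.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

# Assumption: the Bun server listens here; the source does not give a URL.
SERVER_URL = "http://localhost:4000/events"


def build_payload(app: str, event_type: str, event: dict, summary: Optional[str] = None) -> dict:
    """Wrap a raw hook event in an envelope the dashboard could filter on."""
    payload = {
        "app": app,
        "session_id": event.get("session_id", "unknown"),
        "event_type": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data": event,
    }
    if summary is not None:
        payload["summary"] = summary
    return payload


def send_event(payload: dict, url: str = SERVER_URL, timeout: float = 2.0) -> bool:
    """POST the payload as JSON; never raise, so a down server cannot break the agent."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, refused connections, and timeouts
        return False


# Demo: build (but do not send) a payload for a Write tool event.
demo = build_payload("my-codebase", "post-tool-use", {"session_id": "abc", "tool_name": "Write"})
print(json.dumps(demo, indent=2))
```

Swallowing network errors is a deliberate choice in this sketch: the data stream is one-way, so losing an event is preferable to blocking the agent that emitted it.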
||||
**What user sees:**
|
||||
|
||||
```text
|
||||
┌─────────────── Live Activity Pulse ───────────────┐
|
||||
│ ▂▄▆█▆▄▂▁ Agent A (very active) │
|
||||
│ ▁▁▂▂▃▃▂▂ Agent B (moderate activity) │
|
||||
│ ▂▂▂▂▂▂▂▂ Agent C (steady work) │
|
||||
│ ▁▁▁█▁▁▁▁ Agent D (spike, then quiet) │
|
||||
└────────────────────────────────────────────────────┘
|
||||
|
||||
┌─────────────── Event Stream ──────────────────────┐
|
||||
│ [Agent A] post-tool-use │
|
||||
│ Summary: "Wrote authentication logic to user.py"│
|
||||
│ Time: 2s ago │
|
||||
├────────────────────────────────────────────────────┤
|
||||
│ [Agent B] sub-agent-stop │
|
||||
│ Summary: "Completed documentation scrape" │
|
||||
│ Time: 5s ago │
|
||||
├────────────────────────────────────────────────────┤
|
||||
│ [Agent C] notification │
|
||||
│ Summary: "Needs approval for rm command" │
|
||||
│ Time: 8s ago │
|
||||
└────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Filtering:**
|
||||
|
||||
```text
|
||||
Filters available:
|
||||
├── By app (codebase-1, codebase-2, etc.)
|
||||
├── By agent session ID
|
||||
├── By event type (pre-tool, post-tool, stop, etc.)
|
||||
└── By time window (1min, 3min, 5min)
|
||||
```
|
||||
|
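The four filters listed above compose naturally as optional predicates. This is an illustrative sketch only; the `Event` fields are assumptions chosen to match the filter names, not the dashboard's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    app: str
    session_id: str
    event_type: str
    timestamp: float  # seconds since the epoch


def filter_events(
    events: Iterable[Event],
    now: float,
    app: Optional[str] = None,
    session_id: Optional[str] = None,
    event_type: Optional[str] = None,
    window_seconds: Optional[float] = None,
) -> List[Event]:
    """Keep events matching every filter that is set; None means 'any'."""
    kept = []
    for event in events:
        if app is not None and event.app != app:
            continue
        if session_id is not None and event.session_id != session_id:
            continue
        if event_type is not None and event.event_type != event_type:
            continue
        if window_seconds is not None and now - event.timestamp > window_seconds:
            continue
        kept.append(event)
    return kept


events = [
    Event("codebase-1", "agent-a", "post-tool-use", timestamp=998.0),
    Event("codebase-1", "agent-b", "sub-agent-stop", timestamp=995.0),
    Event("codebase-2", "agent-c", "notification", timestamp=700.0),
]
recent = filter_events(events, now=1000.0, window_seconds=60)
blocked = filter_events(events, now=1000.0, event_type="notification")
print(len(recent), len(blocked))
```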
||||
**Event summarization:**
|
||||
|
||||
```text
|
||||
# Each event summarized by Haiku ($0.0002 per event)
|
||||
Event: post-tool-use for Write tool
|
||||
Input: {file: "auth.py", content: "...500 lines..."}
|
||||
Output: Success
|
||||
|
||||
Summary generated:
|
||||
"Implemented JWT authentication with refresh tokens in auth.py"
|
||||
|
||||
Cost: $0.0002
|
||||
Human value: Instant understanding without reading 500 lines
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **One-way data stream** - Simple, fast, scalable
|
||||
- **Edge summarization** - AI summaries generated at hook time
|
||||
- **Dual storage** - SQLite (history) + WebSocket (real-time)
|
||||
- **Color coding** - Consistent colors per agent session
|
||||
|
||||
**Results:**
|
||||
|
||||
- **5-10 agents monitored** simultaneously
|
||||
- **Thousands of events logged** (cost: ~$0.20)
|
||||
- **Real-time visibility** into all agent work
|
||||
- **Historical analysis** via SQLite queries
|
||||
|
||||
**Business value:**
|
||||
|
||||
- **Catch errors fast** (notification events = agent blocked)
|
||||
- **Optimize workflows** (which tools used most?)
|
||||
- **Debug issues** (what happened before failure?)
|
||||
- **Scale confidence** (can observe 10+ agents easily)
|
||||
|
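Two of the questions above ("which tools used most?" and "which agent is blocked?") map directly onto SQL over the stored events. The sketch below uses an in-memory SQLite database with a hypothetical `events` table; the real schema is not given in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        app TEXT NOT NULL,
        session_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        tool_name TEXT,
        created_at INTEGER NOT NULL
    )
    """
)
rows = [
    ("my-codebase", "agent-a", "post-tool-use", "Write", 100),
    ("my-codebase", "agent-a", "post-tool-use", "Read", 101),
    ("my-codebase", "agent-b", "post-tool-use", "Read", 102),
    ("my-codebase", "agent-c", "notification", None, 103),
]
conn.executemany(
    "INSERT INTO events (app, session_id, event_type, tool_name, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Which tools are used most?
tool_usage = conn.execute(
    """
    SELECT tool_name, COUNT(*) AS uses
    FROM events
    WHERE event_type = 'post-tool-use'
    GROUP BY tool_name
    ORDER BY uses DESC, tool_name
    """
).fetchall()

# Which sessions raised a notification (a likely sign the agent is blocked)?
blocked_sessions = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT session_id FROM events WHERE event_type = 'notification'"
    )
]
print(tool_usage, blocked_sessions)
```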
||||
**Source:** Multi-Agent Observability transcript
|
||||
|
||||
---
|
||||
|
||||
## Case Study 8: AFK Agent Device
|
||||
|
||||
**Pattern:** Autonomous background work while you're away
|
||||
|
||||
**Problem:** Long-running tasks block your terminal. You want to work on something else.
|
||||
|
||||
**Solution:** Dedicated device running agent fleet
|
||||
|
||||
**Architecture:**
|
||||
|
||||
```text
|
||||
Your Device (interactive):
|
||||
├── Claude Code session
|
||||
├── Send job to agent device
|
||||
└── Monitor status updates
|
||||
|
||||
Agent Device (autonomous):
|
||||
├── Picks up job from queue
|
||||
├── Executes: Scout → Plan → Build → Ship
|
||||
├── Reports status every 60s
|
||||
└── Ships results to git
|
||||
```
|
||||
|
||||
**Workflow:**
|
||||
|
||||
```bash
|
||||
# From your device
|
||||
/afk-agents \
|
||||
--prompt "Build 3 OpenAI SDK agents: basic, with-tools, realtime-voice" \
|
||||
--adw "plan-build-ship" \
|
||||
--docs "https://openai-agent-sdk.com/docs"
|
||||
|
||||
# Job sent to dedicated device
|
||||
# You continue working on your device
|
||||
# Background: Agent device executes workflow
|
||||
```
|
||||
|
||||
**Agent device execution:**
|
||||
|
||||
```text
|
||||
[00:00] Job received: Build 3 SDK agents
|
||||
[00:05] Planner agent created
|
||||
[00:45] Plan complete: 3 agents specified
|
||||
[01:00] Builder agent 1 created (basic agent)
|
||||
[02:30] Builder agent 1 complete: basic-agent.py ✅
|
||||
[02:35] Builder agent 2 created (with tools)
|
||||
[04:15] Builder agent 2 complete: agent-with-tools.py ✅
|
||||
[04:20] Builder agent 3 created (realtime voice)
|
||||
[07:45] Builder agent 3 partial: needs audio libraries
|
||||
[08:00] Builder agent 3 complete: realtime-agent.py ⚠️ (partial)
|
||||
[08:05] Shipper agent created
|
||||
[08:20] Git commit created
|
||||
[08:25] Pushed to remote
|
||||
[08:30] Job complete ✅
|
||||
```
|
||||
|
||||
**Status updates (every 60s):**
|
||||
|
||||
```text
|
||||
Your device shows:
|
||||
|
||||
[60s] Status: Planning agents...
|
||||
[120s] Status: Building agent 1 of 3...
|
||||
[180s] Status: Building agent 2 of 3...
|
||||
[240s] Status: Building agent 3 of 3...
|
||||
[300s] Status: Testing agents...
|
||||
[360s] Status: Shipping to git...
|
||||
[420s] Status: Complete ✅
|
||||
|
||||
Click to view: results/sdk-agents-20250105/
|
||||
```
|
||||
|
||||
**What you do:**
|
||||
|
||||
```text
|
||||
1. Send job (10 seconds)
|
||||
2. Go AFK (work on something else)
|
||||
3. Get notified when complete (7 minutes later)
|
||||
4. Review results
|
||||
```
|
||||
|
||||
**Key techniques:**
|
||||
|
||||
- **Job queue** - Agents pick up work from queue
|
||||
- **Async status** - Reports back periodically
|
||||
- **Autonomous execution** - No human in the loop
|
||||
- **Git integration** - Results automatically committed
|
||||
|
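The job queue and autonomous execution techniques above can be sketched with the standard library. Everything here is illustrative: the phase handlers are no-op stand-ins for real agents, and status is reported per phase rather than on a 60-second timer.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List

PHASES = ["scout", "plan", "build", "ship"]


@dataclass
class Job:
    prompt: str


def run_job(
    job: Job,
    phase_handlers: Dict[str, Callable[[Job], None]],
    report: Callable[[str], None],
) -> bool:
    """Run each phase in order; stop and report at the first failure."""
    for phase in PHASES:
        report(f"Status: {phase}...")
        try:
            phase_handlers[phase](job)
        except Exception as error:
            report(f"Status: failed in {phase}: {error}")
            return False
    report("Status: Complete")
    return True


def worker(jobs: "queue.Queue[Job]", phase_handlers, report) -> int:
    """Drain the queue without human input; return the number of completed jobs."""
    completed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return completed
        if run_job(job, phase_handlers, report):
            completed += 1
        jobs.task_done()


# Demo with no-op phases standing in for real agents.
updates: List[str] = []
handlers = {phase: (lambda job: None) for phase in PHASES}
jobs: "queue.Queue[Job]" = queue.Queue()
jobs.put(Job(prompt="Build 3 OpenAI SDK agents"))
done = worker(jobs, handlers, updates.append)
print(done, updates)
```

A real agent device would replace `updates.append` with a call that pushes status back to your machine, and would block on the queue instead of returning when it is empty.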
||||
**Results:**
|
||||
|
||||
- **3 SDK agents built** in 7 minutes
|
||||
- **You worked on other things** during that time
|
||||
- **Autonomous end-to-end** - plan + build + test + ship
|
||||
- **Code review** - Quick glance confirms quality
|
||||
|
||||
**Infrastructure required:**
|
||||
|
||||
- Dedicated machine (M4 Mac Mini, cloud VM, etc.)
|
||||
- Agent queue system
|
||||
- Job scheduler
|
||||
- Status reporting
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Long-running builds
|
||||
- Overnight work
|
||||
- Prototyping experiments
|
||||
- Documentation generation
|
||||
- Codebase refactors
|
||||
|
||||
**Source:** Claude 2.0 transcript
|
||||
|
||||
---
|
||||
|
||||
## Cross-Cutting Patterns
|
||||
|
||||
### Pattern: Context Window as Resource Constraint
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 1: Sub-agent delegation protects primary
|
||||
- Case 2: Scout-plan-build reduces planner context
|
||||
- Case 3: Orchestrator sleeps to protect its context
|
||||
- Case 8: Fresh agents for each phase (no accumulation)
|
||||
|
||||
**Lesson:** Context is precious. Protect it aggressively.
|
||||
|
||||
### Pattern: Specialized Agents Over General
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: Frontend/Backend/QA agents vs. one do-everything agent
|
||||
- Case 4: Scout finds, builder builds (not one agent doing both)
|
||||
- Case 5: Planner/builder/reviewer/shipper (4 specialists)
|
||||
- Case 6: Meta-agent only builds, doesn't execute
|
||||
|
||||
**Lesson:** "A focused agent is a performant agent."
|
||||
|
||||
### Pattern: Observability Enables Scale
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: Orchestrator tracks agent status
|
||||
- Case 5: Task board shows current phase
|
||||
- Case 7: Real-time dashboard for all agents
|
||||
- Case 8: Status updates every 60s
|
||||
|
||||
**Lesson:** "If you can't measure it, you can't scale it."
|
||||
|
||||
### Pattern: Deletable Temporary Resources
|
||||
|
||||
**Appears in:**
|
||||
|
||||
- Case 3: All 3 agents deleted after completion
|
||||
- Case 4: Scout and builder deleted
|
||||
- Case 5: Each phase agent deleted after task moves
|
||||
- Case 8: Builder agents deleted after shipping
|
||||
|
||||
**Lesson:** "The best agent is a deleted agent."
|
||||
|
||||
## Performance Comparisons
|
||||
|
||||
### Single Agent vs. Multi-Agent
|
||||
|
||||
| Task | Single Agent | Multi-Agent | Speedup |
|
||||
|------|--------------|-------------|---------|
|
||||
| Load 10 docs | 150k tokens, 5min | 30k primary, 2min | 2.5x faster, 80% less context |
|
||||
| SDK migration | Fails (overflow) | 80k max/agent, 30min | Completes vs. fails |
|
||||
| Codebase summary | 120k tokens, 3min | 32k max/agent, 52s | 3.5x faster |
|
||||
| UI components | 80k tokens, 2min | 30k max, 55s | 2.2x faster |
|
||||
|
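The speedup and context figures in the table follow from simple ratios. This short check recomputes them from the table's own numbers; the helper names are illustrative.

```python
def speedup(single_seconds: float, multi_seconds: float) -> float:
    """How many times faster the multi-agent run finished."""
    return single_seconds / multi_seconds


def context_reduction(single_tokens: int, multi_tokens: int) -> float:
    """Fraction of context saved in the largest multi-agent window."""
    return 1 - multi_tokens / single_tokens


# (single-agent tokens, single-agent seconds, multi-agent tokens, multi-agent seconds)
rows = {
    "Load 10 docs": (150_000, 5 * 60, 30_000, 2 * 60),
    "Codebase summary": (120_000, 3 * 60, 32_000, 52),
    "UI components": (80_000, 2 * 60, 30_000, 55),
}
for task, (single_tokens, single_s, multi_tokens, multi_s) in rows.items():
    print(
        f"{task}: {speedup(single_s, multi_s):.1f}x faster, "
        f"{context_reduction(single_tokens, multi_tokens):.0%} less context"
    )
```

The SDK migration row is omitted because the single-agent run never completes, so no ratio exists.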
||||
### With vs. Without Orchestration
|
||||
|
||||
| Metric | Manual (no orchestrator) | With Orchestrator |
|
||||
|--------|-------------------------|-------------------|
|
||||
| Commands per task | 8-12 manual prompts | 1 prompt to orchestrator |
|
||||
| Context management | Manual (forget limits) | Automatic (orchestrator sleeps) |
|
||||
| Error recovery | Start over | Retry failed phase only |
|
||||
| Observability | Terminal logs | Real-time dashboard |
|
||||
|
||||
## Common Failure Modes
|
||||
|
||||
### Failure: Context Explosion
|
||||
|
||||
**Scenario:** Case 2 without scouts
|
||||
|
||||
- Single agent reads 100+ files
|
||||
- Context hits 180k tokens
|
||||
- Agent slows down, makes mistakes
|
||||
- Eventually fails or times out
|
||||
|
||||
**Fix:** Add scout phase to filter files first
|
||||
|
||||
### Failure: Orchestrator Watching Everything
|
||||
|
||||
**Scenario:** Case 3 with observing orchestrator
|
||||
|
||||
- Orchestrator watches all agent work
|
||||
- Orchestrator context grows to 100k+
|
||||
- Can't coordinate more than 2-3 agents
|
||||
- System doesn't scale
|
||||
|
||||
**Fix:** Implement orchestrator sleep pattern
|
||||
|
||||
### Failure: No Observability
|
||||
|
||||
**Scenario:** Case 7 without dashboard
|
||||
|
||||
- 5 agents running
|
||||
- One agent stuck on permission request
|
||||
- No way to know which agent needs attention
|
||||
- Entire workflow blocked
|
||||
|
||||
**Fix:** Add hooks + observability system
|
||||
|
||||
### Failure: Agent Accumulation
|
||||
|
||||
**Scenario:** Case 5 not deleting agents
|
||||
|
||||
- 20 tasks completed
|
||||
- 80 agents still running (4 per task)
|
||||
- System resources exhausted
|
||||
- New agents can't start
|
||||
|
||||
**Fix:** Delete agents after task completion
|
||||
|
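One way to make deletion hard to forget is to tie agent lifetime to a scope. The sketch below is a toy model: `AgentPool` stands in for whatever create and delete calls an orchestrator actually exposes, and the four roles are the specialists named for Case 5.

```python
from contextlib import contextmanager
from typing import Iterator, List, Set


class AgentPool:
    """Tracks live agents; a stand-in for real create/delete operations."""

    def __init__(self) -> None:
        self.live: Set[str] = set()

    def create(self, name: str) -> str:
        self.live.add(name)
        return name

    def delete(self, name: str) -> None:
        self.live.discard(name)


@contextmanager
def task_agents(pool: AgentPool, task_id: int, roles: List[str]) -> Iterator[List[str]]:
    """Create one agent per role and delete them all when the task ends, even on error."""
    names = [pool.create(f"task{task_id}-{role}") for role in roles]
    try:
        yield names
    finally:
        for name in names:
            pool.delete(name)


ROLES = ["planner", "builder", "reviewer", "shipper"]

leaky = AgentPool()
for task_id in range(20):
    for role in ROLES:
        leaky.create(f"task{task_id}-{role}")  # never deleted

clean = AgentPool()
for task_id in range(20):
    with task_agents(clean, task_id, ROLES):
        pass  # the task's work would happen here

print(len(leaky.live), len(clean.live))
```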
||||
## Key Takeaways
|
||||
|
||||
1. **Parallelization = Sub-agents** - Nothing else runs agents in parallel
|
||||
|
||||
2. **Context protection = Specialization** - Focused agents use less context
|
||||
|
||||
3. **Orchestration = Scale** - Single interface manages fleet
|
||||
|
||||
4. **Observability = Confidence** - Can't scale what you can't see
|
||||
|
||||
5. **Deletable = Sustainable** - Free resources for next task
|
||||
|
||||
6. **Multi-agent is Level 5** - Requires mastering Levels 1-4 first
|
||||
|
||||
## When to Use Multi-Agent Patterns
|
||||
|
||||
Use multi-agent when:
|
||||
|
||||
- ✅ Task naturally divides into parallel subtasks
|
||||
- ✅ Single agent context approaching limits
|
||||
- ✅ Need quality gates between phases
|
||||
- ✅ Want to work on other things while agents execute
|
||||
- ✅ Have observability infrastructure
|
||||
|
||||
Don't use multi-agent when:
|
||||
|
||||
- ❌ Simple one-off task
|
||||
- ❌ Learning/prototyping phase
|
||||
- ❌ No way to monitor agents
|
||||
- ❌ Task requires tight human-in-loop feedback
|
||||
|
||||
## Source Attribution
|
||||
|
||||
All case studies drawn from field experience documented in 8 source transcripts:
|
||||
|
||||
1. Elite Context Engineering - Case 1 (AI docs loader)
|
||||
2. Claude 2.0 - Case 2 (SDK migration), Case 8 (AFK device)
|
||||
3. Custom Agents - Case 5 (task board)
|
||||
4. Sub-Agents - Case 6 (meta-agent)
|
||||
5. Multi-Agent Observability - Case 7 (dashboard)
|
||||
6. Hooked - Supporting patterns
|
||||
7. One Agent to Rule Them All - Case 3 (summarization), Case 4 (UI components)
|
||||
8. (Transcript 8 name not specified in context)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Orchestrator Pattern](../patterns/orchestrator-pattern.md) - Multi-agent coordination
|
||||
- [Hooks for Observability](../patterns/hooks-observability.md) - Monitoring implementation
|
||||
- [Context Window Protection](../patterns/context-window-protection.md) - Resource management
|
||||
- [Evolution Path](../workflows/evolution-path.md) - Progression to multi-agent mastery
|
||||
|
||||
---
|
||||
|
||||
**Remember:** These are real systems in production. Start simple, add complexity only when needed.
|
||||
358
skills/multi-agent-composition/examples/progression-example.md
Normal file
@@ -0,0 +1,358 @@
|
||||
# Work Tree Manager: Evolution Path Example
|
||||
|
||||
**Real-world case study** showing the proper progression from prompt → sub-agent → skill → skill + MCP.
|
||||
|
||||
## The Problem
|
||||
|
||||
Managing git work trees across a project requires multiple related operations:
|
||||
|
||||
- Creating new work trees
|
||||
- Listing existing work trees
|
||||
- Removing old work trees
|
||||
- Merging work tree changes
|
||||
- Updating work tree status
|
||||
|
||||
## Stage 1: Start with a Prompt
|
||||
|
||||
**Goal:** Solve the basic problem
|
||||
|
||||
Create a simple slash command that creates one work tree:
|
||||
|
||||
```bash
|
||||
/create-worktree feature-branch
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```markdown
|
||||
# .claude/commands/create-worktree.md
|
||||
|
||||
Create a new git worktree for the specified branch.
|
||||
|
||||
Steps:
|
||||
1. Check if branch exists
|
||||
2. Create worktree directory
|
||||
3. Initialize worktree
|
||||
4. Report success
|
||||
```
|
||||
|
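The four steps in the command above correspond to two git invocations. As a rough sketch, assuming worktrees live under a sibling `../worktrees` directory (a made-up default), the helper names below are hypothetical while the git subcommands are standard:

```python
import subprocess
from pathlib import Path
from typing import List


def branch_exists_command(branch: str) -> List[str]:
    # Exit code 0 means the local branch exists.
    return ["git", "show-ref", "--verify", "--quiet", f"refs/heads/{branch}"]


def worktree_add_command(branch: str, base_dir: Path) -> List[str]:
    # `git worktree add <path> <branch>` creates the directory and checks out the branch.
    return ["git", "worktree", "add", str(base_dir / branch), branch]


def create_worktree(branch: str, base_dir: Path = Path("../worktrees")) -> str:
    """Follow the four steps: check, create, initialize, report."""
    if subprocess.run(branch_exists_command(branch)).returncode != 0:
        return f"Branch '{branch}' does not exist"
    result = subprocess.run(
        worktree_add_command(branch, base_dir), capture_output=True, text=True
    )
    if result.returncode != 0:
        return f"Failed to create worktree: {result.stderr.strip()}"
    return f"Created worktree for '{branch}' at {base_dir / branch}"


print(worktree_add_command("feature-branch", Path("../worktrees")))
```

Steps 2 and 3 collapse into one call because `git worktree add` both creates and initializes the directory.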
||||
**When to stay here:** The task is infrequent or one-off.
|
||||
|
||||
**Signal to advance:** You find yourself creating work trees regularly.
|
||||
|
||||
## Stage 2: Add Sub-Agent for Parallelism
|
||||
|
||||
**Goal:** Scale to multiple parallel operations
|
||||
|
||||
When you need to create multiple work trees at once, use a sub-agent:
|
||||
|
||||
```text
|
||||
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
|
||||
```
|
||||
|
||||
**Why sub-agent:**
|
||||
|
||||
- **Parallelization** - Create 3 work trees simultaneously
|
||||
- **Context isolation** - Each creation is independent
|
||||
- **Speed** - 3x faster than sequential
|
||||
|
||||
**Sub-agent prompt:**
|
||||
|
||||
```markdown
|
||||
Create work trees for the following branches in parallel:
|
||||
- feature-a
|
||||
- feature-b
|
||||
- feature-c
|
||||
|
||||
For each branch:
|
||||
1. Verify branch exists
|
||||
2. Create worktree directory
|
||||
3. Initialize worktree
|
||||
4. Report status
|
||||
|
||||
Use the /create-worktree command for each.
|
||||
```
|
||||
|
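The fan-out and per-branch status reporting in this prompt resemble a thread pool. The sketch below is an analogy in plain Python, not a description of how Claude Code schedules sub-agents; `create_one` is any callable that creates a single work tree, and `fake_create` is a stand-in for the demo.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def create_in_parallel(
    branches: Iterable[str],
    create_one: Callable[[str], str],
    max_workers: int = 3,
) -> Dict[str, str]:
    """Run create_one for every branch concurrently and collect a status per branch."""
    statuses: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {branch: pool.submit(create_one, branch) for branch in branches}
        for branch, future in futures.items():
            try:
                statuses[branch] = future.result()
            except Exception as error:  # one failure must not hide the others
                statuses[branch] = f"failed: {error}"
    return statuses


def fake_create(branch: str) -> str:
    if branch == "feature-b":
        raise RuntimeError("branch not found")
    return "created"


statuses = create_in_parallel(["feature-a", "feature-b", "feature-c"], fake_create)
print(statuses)
```

Collecting a status for every branch, including failures, mirrors step 4 of the prompt: each operation is independent, so one missing branch should not abort the rest.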
||||
**When to stay here:** Parallel creation is the only requirement.
|
||||
|
||||
**Signal to advance:** You need to **manage** work trees (not just create them).
|
||||
|
||||
## Stage 3: Create Skill for Management
|
||||
|
||||
**Goal:** Bundle multiple related operations
|
||||
|
||||
The problem has grown beyond creation—you need comprehensive work tree **management**:
|
||||
|
||||
```text
skills/work-tree-manager/
├── SKILL.md
├── scripts/
│   ├── validate.py
│   └── cleanup.py
└── reference/
    └── git-worktree-commands.md
```
|
||||
|
||||
**SKILL.md:**
|
||||
|
||||
````markdown
---
name: work-tree-manager
description: Manage git worktrees - create, list, remove, merge, and update across projects. Use when working with git worktrees or when managing multiple branches simultaneously.
---

# Work Tree Manager

## Operations

### Create
Use /create-worktree command for single operations.
For parallel creation, delegate to sub-agent.

### List
Run: `git worktree list`
Parse output and present in readable format.

### Remove
1. Check if work tree is clean
2. Remove work tree directory
3. Prune references

### Merge
1. Fetch latest changes
2. Merge work tree branch to target
3. Clean up if merge successful

### Update
1. Check status of all work trees
2. Pull latest changes
3. Report any conflicts

## Validation

Before any destructive operation, run:
```bash
python scripts/validate.py <worktree-path>
```

## Cleanup

Periodically run cleanup to remove stale work trees:

```bash
python scripts/cleanup.py --dry-run
```
````

**Why skill:**

- **Multiple related operations** - Create, list, remove, merge, update
- **Repeat problem** - Managing work trees is ongoing
- **Domain-specific** - Specialized knowledge about git worktrees
- **Orchestration** - Coordinates slash commands, sub-agents, and scripts

**When to stay here:** Most workflows stop here.

**Signal to advance:** Need external data (GitHub API, CI/CD status).

## Stage 4: Add MCP for External Data

**Goal:** Integrate external systems

Add MCP server to query external repo metadata:

```text
skills/work-tree-manager/
├── SKILL.md (updated)
└── ... (existing files)

# Now references GitHub MCP for:
# - Branch protection rules
# - CI/CD status
# - Pull request information
```
|
||||
|
||||
**Updated SKILL.md section:**
|
||||
```markdown
|
||||
## External Integration
|
||||
|
||||
Before creating work tree, check GitHub status:
|
||||
- Use GitHub MCP to query branch protection
|
||||
- Check if CI is passing
|
||||
- Verify no open blocking PRs
|
||||
|
||||
Query: `GitHub:get_branch_status <branch-name>`
|
||||
```
|
||||
|
||||
**Why MCP:**
|
||||
|
||||
- **External data** - Information lives outside Claude Code
|
||||
- **Real-time** - CI/CD status changes frequently
|
||||
- **Third-party** - GitHub API integration
|
||||
|
||||
## Final State
|
||||
|
||||
```text
|
||||
Prompt (Slash Command)
|
||||
└─→ Creates single work tree
|
||||
|
||||
Sub-Agent
|
||||
└─→ Creates multiple work trees in parallel
|
||||
|
||||
Skill
|
||||
├─→ Orchestrates: Create, list, remove, merge, update
|
||||
├─→ Uses: Slash commands for primitives
|
||||
├─→ Uses: Sub-agents for parallel operations
|
||||
└─→ Uses: Scripts for validation
|
||||
|
||||
MCP Server (GitHub)
|
||||
└─→ Provides: Branch status, CI/CD info, PR data
|
||||
|
||||
Skill + MCP
|
||||
└─→ Full-featured work tree manager with external integration
|
||||
```
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
### Progression Signals
|
||||
|
||||
**Prompt → Sub-Agent:**
|
||||
|
||||
- Signal: Need parallelization
|
||||
- Keywords: "multiple," "parallel," "batch"
|
||||
|
||||
**Sub-Agent → Skill:**
|
||||
|
||||
- Signal: Need management, not just execution
|
||||
- Keywords: "manage," "coordinate," "workflow"
|
||||
- Multiple related operations emerge
|
||||
|
||||
**Skill → Skill + MCP:**
|
||||
|
||||
- Signal: Need external data or services
|
||||
- Keywords: "GitHub," "API," "real-time," "status"
|
||||
|
||||
### Common Mistakes
|
||||
|
||||
❌ **Skipping the prompt**
|
||||
|
||||
- Starting with a skill for simple creation
|
||||
|
||||
❌ **Overusing sub-agents**
|
||||
|
||||
- Using sub-agents when main conversation would work
|
||||
|
||||
❌ **Skill too early**
|
||||
|
||||
- Creating skill before understanding the full problem domain
|
||||
|
||||
✅ **Correct approach**
|
||||
|
||||
- Build from bottom up
|
||||
- Add complexity only when needed
|
||||
- Each stage solves a real problem
|
||||
|
||||
### Decision Checklist
|
||||
|
||||
Before advancing to next stage:
|
||||
|
||||
**Prompt → Sub-Agent:**
|
||||
|
||||
- [ ] Do I need parallelization?
|
||||
- [ ] Are operations truly independent?
|
||||
- [ ] Am I okay losing the sub-agent's context once it finishes?
|
||||
|
||||
**Sub-Agent → Skill:**
|
||||
|
||||
- [ ] Am I doing this repeatedly (3+ times)?
|
||||
- [ ] Do I have multiple related operations?
|
||||
- [ ] Is this a management problem, not just execution?
|
||||
- [ ] Would orchestration add real value?
|
||||
|
||||
**Skill → Skill + MCP:**
|
||||
|
||||
- [ ] Do I need external data?
|
||||
- [ ] Is the data outside Claude Code's control?
|
||||
- [ ] Would real-time info improve the workflow?
|
||||
|
||||
## Real Usage
|
||||
|
||||
### Scenario 1: Quick One-Off
|
||||
|
||||
**Task:** Create one work tree for hotfix
|
||||
|
||||
**Solution:** Slash command
|
||||
|
||||
```bash
|
||||
/create-worktree hotfix-urgent-bug
|
||||
```
|
||||
|
||||
**Why:** Simple, direct, one-time task.
|
||||
|
||||
### Scenario 2: Feature Development Sprint
|
||||
|
||||
**Task:** Create work trees for 5 feature branches
|
||||
|
||||
**Solution:** Sub-agent
|
||||
|
||||
```text
|
||||
Create work trees in parallel for sprint features:
|
||||
feature-auth, feature-api, feature-ui, feature-tests, feature-docs
|
||||
```
|
||||
|
||||
**Why:** Parallel execution, independent operations.
|
||||
|
||||
### Scenario 3: Ongoing Project
|
||||
|
||||
**Task:** Manage all work trees across development lifecycle
|
||||
|
||||
**Solution:** Skill
|
||||
|
||||
```text
|
||||
List all work trees, check status, merge completed features, clean up stale ones
|
||||
```
|
||||
|
||||
**Why:** Multiple operations, repeat problem, management need.
|
||||
|
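Listing and status checks like these usually start from `git worktree list --porcelain`, which prints one blank-line-separated record per work tree. Below is a small parser sketch; the sample output is invented for illustration and the function name is hypothetical.

```python
from typing import Dict, List

SAMPLE = """\
worktree /repo
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main

worktree /repo-worktrees/feature-auth
HEAD 2222222222222222222222222222222222222222
branch refs/heads/feature-auth

worktree /repo-worktrees/experiment
HEAD 3333333333333333333333333333333333333333
detached
"""


def parse_worktrees(porcelain: str) -> List[Dict[str, str]]:
    """Parse porcelain output: one record per blank-line-separated block."""
    worktrees = []
    for block in porcelain.strip().split("\n\n"):
        record: Dict[str, str] = {}
        for line in block.splitlines():
            key, _, value = line.partition(" ")
            record[key] = value  # flags such as 'detached' get an empty value
        if "worktree" in record:
            worktrees.append(record)
    return worktrees


worktrees = parse_worktrees(SAMPLE)
for wt in worktrees:
    branch = wt.get("branch", "(detached)").split("refs/heads/", 1)[-1]
    print(f"{wt['worktree']}  {branch}")
```

In a skill script, `SAMPLE` would be replaced by the captured output of the git command.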
||||
### Scenario 4: CI/CD Integration
|
||||
|
||||
**Task:** Only create work trees for branches passing CI
|
||||
|
||||
**Solution:** Skill + MCP
|
||||
|
||||
```text
|
||||
Create work trees for features that:
|
||||
|
||||
- Have passing CI (check via GitHub MCP)
|
||||
- Are approved by reviewers
|
||||
- Have no merge conflicts
|
||||
```
|
||||
|
||||
**Why:** Need external data from GitHub API.
|
||||
|
||||
## Summary
|
||||
|
||||
The work tree manager evolution demonstrates:
|
||||
|
||||
1. **Start simple** - Slash command for basic operation
|
||||
2. **Scale for parallelism** - Sub-agent for batch operations
|
||||
3. **Manage complexity** - Skill for full workflow orchestration
|
||||
4. **Integrate externally** - MCP for real-time external data
|
||||
|
||||
**The principle:** Each stage solves a real problem. Don't advance until you hit the limitation of your current approach.
|
||||
|
||||
> "When you're starting out, I always recommend you just build a prompt. Everything is a prompt in the end."
|
||||
|
||||
Build from the foundation upward.
|
||||
@@ -0,0 +1,158 @@
|
||||
# Context in Composition
|
||||
|
||||
**Strategic framework for managing context when composing multi-agent systems.**
|
||||
|
||||
## The Core Problem
|
||||
|
||||
Context window is your most precious resource when composing multiple agents. A focused agent is a performant agent.
|
||||
|
||||
**The Reality:**
|
||||
|
||||
```text
|
||||
Single agent doing everything:
|
||||
├── Context explodes to 150k+ tokens
|
||||
├── Performance degrades
|
||||
└── Eventually fails or times out
|
||||
|
||||
Multi-agent composition:
|
||||
├── Each agent: <40k tokens
|
||||
├── Main agent: Stays lean
|
||||
└── Work completes successfully
|
||||
```
|
||||
|
||||
## The R&D Framework
|
||||
|
||||
There are only two strategies for managing context in multi-agent systems:
|
||||
|
||||
**R - Reduce**
|
||||
|
||||
- Minimize what enters context windows
|
||||
- Remove unused MCP servers (can consume 24k+ tokens)
|
||||
- Shrink static CLAUDE.md files
|
||||
- Use context priming instead of static loading
|
||||
|
||||
**D - Delegate**
|
||||
|
||||
- Move work to sub-agents' isolated contexts
|
||||
- Use background agents for autonomous work
|
||||
- Employ orchestrator sleep patterns
|
||||
- Treat agents as deletable temporary resources
|
||||
|
||||
**Everything else is a tactic implementing R or D.**
|
||||
|
||||
## The Four Levels of Context Mastery
|
||||
|
||||
### Level 1: Beginner - Stop Wasting Tokens
|
||||
|
||||
**Focus:** Resource management
|
||||
|
||||
**Key Actions:**
|
||||
|
||||
- Remove unused MCP servers (reclaim 20k+ tokens)
|
||||
- Minimize CLAUDE.md (<1k tokens)
|
||||
- Disable autocompact buffer (reclaim 20%)
|
||||
|
||||
**Success Metric:** 85-90% context window free at startup
|
||||
|
||||
**Move to Level 2 when:** Resources cleaned but still rebuilding context for different tasks
|
||||
|
||||
---
|
||||
|
||||
### Level 2: Intermediate - Load Selectively
|
||||
|
||||
**Focus:** Dynamic context loading
|
||||
|
||||
**Key Actions:**
|
||||
|
||||
- Context priming (`/prime` commands vs. static files)
|
||||
- Sub-agent delegation for parallel work
|
||||
- Composable workflows (scout-plan-build)
|
||||
|
||||
**Success Metric:** 60-75% context window free during work
|
||||
|
||||
**Move to Level 3 when:** Managing multiple agents but struggling with handoffs
|
||||
|
||||
---
|
||||
|
||||
### Level 3: Advanced - Multi-Agent Handoff
|
||||
|
||||
**Focus:** Agent-to-agent context transfer
|
||||
|
||||
**Key Actions:**
|
||||
|
||||
- Context bundles (60-70% transfer in 10% tokens)
|
||||
- Monitor context limits proactively
|
||||
- Chain multiple agents without overflow
|
||||
|
||||
**Success Metric:** Per-agent context <60k tokens, successful handoffs
|
||||
|
||||
**Move to Level 4 when:** Need agents working autonomously while you do other work
|
||||
|
||||
---
|
||||
|
||||
### Level 4: Agentic - Out-of-Loop Systems
|
||||
|
||||
**Focus:** Fleet orchestration
|
||||
|
||||
**Key Actions:**
|
||||
|
||||
- Background agents (`/background` command)
|
||||
- Dedicated agent environments
|
||||
- Orchestrator sleep patterns
|
||||
- Zero-touch execution
|
||||
|
||||
**Success Metric:** Agents ship work end-to-end without intervention
|
||||
|
||||
---
|
||||
|
||||
## When Context Becomes a Composition Issue
|
||||
|
||||
**Trigger 1: Single Agent Exceeds 150k Tokens**
|
||||
→ Delegate to sub-agents with isolated contexts
|
||||
|
||||
**Trigger 2: Agent Reading >20 Files**
|
||||
→ Use scout agents to identify relevant subset first
|
||||
|
||||
**Trigger 3: `/context` Shows >80% Used**
|
||||
→ Start fresh agent, use context bundles for handoff
|
||||
|
||||
**Trigger 4: Performance Degrading Mid-Workflow**
|
||||
→ Split workflow across multiple focused agents
|
||||
|
||||
**Trigger 5: Same Analysis Repeated Multiple Times**
|
||||
→ Context overflow forcing re-reads; delegate earlier
|
||||
|
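The measurable triggers above can be written as a simple checklist function. This is a sketch with the thresholds copied from the list and a 200k window assumed; Trigger 4 (degrading performance) is left out because it has no single number to test.

```python
from typing import List

CONTEXT_WINDOW = 200_000  # tokens; the window size used throughout this guide


def composition_triggers(tokens_used: int, files_read: int, repeated_analyses: int = 0) -> List[str]:
    """Return the recommended action for every trigger that has fired."""
    actions = []
    if tokens_used > 150_000:
        actions.append("Delegate to sub-agents with isolated contexts")
    if files_read > 20:
        actions.append("Use scout agents to identify the relevant subset first")
    if tokens_used / CONTEXT_WINDOW > 0.80:
        actions.append("Start a fresh agent and hand off with a context bundle")
    if repeated_analyses > 1:
        actions.append("Delegate earlier: overflow is forcing re-reads")
    return actions


overloaded = composition_triggers(tokens_used=170_000, files_read=25)
healthy = composition_triggers(tokens_used=40_000, files_read=5)
print(overloaded, healthy)
```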
||||
## Composition Patterns by Level
|
||||
|
||||
**Beginner:** Single agent, minimal static context
|
||||
|
||||
**Intermediate:** Main agent + sub-agents for parallel work
|
||||
|
||||
**Advanced:** Agent chains with context bundles for handoff
|
||||
|
||||
**Agentic:** Orchestrator + fleet of specialized agents
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Focused agents perform better** - Single purpose, minimal context
|
||||
2. **Agents are deletable** - Free context by removing completed agents
|
||||
3. **200k is plenty** - Context explosions are design problems, not capacity problems
|
||||
4. **Orchestrators must sleep** - Don't observe all sub-agent work
|
||||
5. **Context bundles over full replay** - 70% context in 10% tokens
|
||||
|
||||
## Implementation Details
|
||||
|
||||
For practical patterns, see:
|
||||
|
||||
- [Multi-Agent Context Isolation](../reference/multi-agent-context-isolation.md) - Parallel execution, context bundling
|
||||
- [Orchestrator Pattern](orchestrator-pattern.md) - Sleep patterns, fleet management
|
||||
- [Decision Framework](decision-framework.md) - When to use each component
|
||||
|
||||
## Source Attribution
|
||||
|
||||
Primary: Elite Context Engineering, Claude 2.0 transcripts
|
||||
Supporting: One Agent to Rule Them All, Sub-Agents documentation
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Context is the first pillar of the Core 4. Master context strategy, and you can scale infinitely with focused agents.
|
||||
715
skills/multi-agent-composition/patterns/context-management.md
Normal file
@@ -0,0 +1,715 @@
|
||||
# Context Window Protection
|
||||
|
||||
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
|
||||
|
||||
Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.
|
||||
|
||||
## The Core Problem
|
||||
|
||||
**Every engineer hits this wall:**
|
||||
|
||||
```text
|
||||
Agent starts: 10k tokens (5% used)
|
||||
↓
|
||||
After exploration: 80k tokens (40% used)
|
||||
↓
|
||||
After planning: 120k tokens (60% used)
|
||||
↓
|
||||
During implementation: 170k tokens (85% used) ⚠️
|
||||
↓
|
||||
Context explodes: 195k tokens (98% used) ❌
|
||||
↓
|
||||
Agent performance degrades, fails, or times out
|
||||
```
|
||||
|
||||
**The realization:** More context ≠ better performance. Too much context = cognitive overload.
|
||||
|
||||
## The R&D Framework
|
||||
|
||||
There are only two ways to manage your context window:
|
||||
|
||||
```text
|
||||
R - REDUCE
|
||||
└─→ Minimize what enters the context window
|
||||
|
||||
D - DELEGATE
|
||||
└─→ Move work to other agents' context windows
|
||||
```
|
||||
|
||||
**Everything else is a tactic implementing R or D.**
|
||||
|
||||
## The Four Levels of Context Protection
|
||||
|
||||
### Level 1: Beginner - Reduce Waste
|
||||
|
||||
**Focus:** Stop wasting tokens on unused resources
|
||||
|
||||
#### Tactic 1: Eliminate Default MCP Servers
|
||||
|
||||
**Problem:**
|
||||
|
||||
```bash
|
||||
# Default mcp.json
|
||||
{
|
||||
"mcpServers": {
|
||||
"firecrawl": {...}, # 6k tokens
|
||||
"github": {...}, # 8k tokens
|
||||
"postgres": {...}, # 5k tokens
|
||||
"redis": {...} # 5k tokens
|
||||
}
|
||||
}
|
||||
# Total: 24k tokens always loaded (12% of 200k window!)
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
```bash
|
||||
# Option 1: Delete default mcp.json entirely
|
||||
rm .claude/mcp.json
|
||||
|
||||
# Option 2: Load selectively
|
||||
claude --strict-mcp-config --mcp-config specialized-configs/firecrawl-only.json
|
||||
# Result: 6k tokens instead of 24k (75% reduction)
|
||||
```
|
||||
|
||||
#### Tactic 2: Minimize CLAUDE.md
|
||||
|
||||
**Before:**
|
||||
|
||||
```markdown
|
||||
# CLAUDE.md (23,000 tokens = 11.5% of window)
|
||||
- 500 lines of API documentation
|
||||
- 300 lines of deployment procedures
|
||||
- 1,500 lines of coding standards
|
||||
- Architecture diagrams
|
||||
- Always loaded, whether relevant or not
|
||||
```
|
||||
|
||||
**After:**
|
||||
|
||||
```markdown
|
||||
# CLAUDE.md (500 tokens = 0.25% of window)
|
||||
# Only universal essentials
|
||||
|
||||
- Fenced code blocks MUST have language
|
||||
- Use rg instead of grep
|
||||
- ALWAYS use set -euo pipefail
|
||||
```
|
||||
|
||||
**Rule:** Only include what you're 100% sure you want loaded 100% of the time.
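
One way to apply this rule is to check how much of the window a CLAUDE.md file would occupy before committing it. The sketch below gives a rough estimate only: it assumes about four characters per token, which is a common rule of thumb rather than an exact tokenizer count, and the function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4      # rough heuristic (assumption), not a real tokenizer
WINDOW_TOKENS = 200_000  # assumed window size, matching the 200k figure


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN


def window_share(text: str, window: int = WINDOW_TOKENS) -> float:
    """Percentage of the window the text would occupy, under the heuristic."""
    return 100 * estimate_tokens(text) / window


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        content = path.read_text(encoding="utf-8")
        print(f"~{estimate_tokens(content)} tokens, "
              f"{window_share(content):.2f}% of window")
```

Running this in a repository root prints an approximate size for the local CLAUDE.md. Treat the number as an order-of-magnitude check, not a measurement.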
|
||||
|
||||
#### Tactic 3: Disable Autocompact Buffer
|
||||
|
||||
**Problem:**
|
||||
|
||||
```bash
|
||||
/context
|
||||
|
||||
# Output:
|
||||
autocompact buffer: 22% ⚠️ (44k tokens gone!)
|
||||
messages: 51%
|
||||
system_tools: 8%
|
||||
---
|
||||
Total available: 78% (should be 100%)
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
```bash
|
||||
/config
|
||||
# Set: autocompact = false
|
||||
|
||||
# Now:
|
||||
/context
|
||||
# Output:
|
||||
messages: 51%
|
||||
system_tools: 8%
|
||||
custom_agents: 2%
|
||||
---
|
||||
Total available: 100% ✅ (reclaimed 22%!)
|
||||
```
|
||||
|
||||
**Impact:** Reclaims 40k+ tokens immediately.
|
||||
|
||||
### Level 2: Intermediate - Dynamic Loading
|
||||
|
||||
**Focus:** Load what you need, when you need it
|
||||
|
||||
#### Tactic 4: Context Priming
|
||||
|
||||
**Replace static CLAUDE.md with task-specific `/prime` commands**
|
||||
|
||||
```markdown
|
||||
# .claude/commands/prime.md
|
||||
# General codebase context (2k tokens)
|
||||
Read README, understand structure, report findings
|
||||
|
||||
# .claude/commands/prime-feature.md
|
||||
# Feature development context (3k tokens)
|
||||
Read feature requirements, understand dependencies, plan implementation
|
||||
|
||||
# .claude/commands/prime-api.md
|
||||
# API work context (4k tokens)
|
||||
Read API docs, understand endpoints, review integration patterns
|
||||
```
|
||||
|
||||
**Usage pattern:**
|
||||
|
||||
```bash
|
||||
# Starting feature work
|
||||
/prime-feature
|
||||
|
||||
# vs. having 23k tokens always loaded
|
||||
```
|
||||
|
||||
**Savings:** 20k tokens (87% reduction)
|
||||
|
||||
#### Tactic 5: Sub-Agent Delegation
|
||||
|
||||
**Problem:** Primary agent doing parallel work fills its own context
|
||||
|
||||
```text
|
||||
Primary Agent tries to do:
|
||||
├── Web scraping (15k tokens)
|
||||
├── Documentation fetch (12k tokens)
|
||||
├── Data analysis (10k tokens)
|
||||
└── Synthesis (5k tokens)
|
||||
= 42k tokens in one agent
|
||||
```
|
||||
|
||||
**Solution:** Delegate to sub-agents with isolated contexts
|
||||
|
||||
```text
|
||||
Primary Agent (9k tokens):
|
||||
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
|
||||
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
|
||||
└→ Sub-Agent 3: Analysis (10k tokens, isolated)
|
||||
|
||||
Total work: 46k tokens
|
||||
Primary agent context: Only 9k tokens ✅
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
/load-ai-docs
|
||||
|
||||
# Agent spawns 10 sub-agents for web scraping
|
||||
# Each scrape: ~3k tokens
|
||||
# Total work: 30k tokens
|
||||
# Primary agent context: Still only 9k tokens
|
||||
# Savings: 30k tokens kept out of the primary context
|
||||
```
|
||||
|
||||
**Key insight:** Sub-agents use system prompts (not user prompts), keeping their context isolated from primary.
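
The isolation idea can be sketched in plain Python. This is a toy model, not Claude Code's actual mechanism: each hypothetical `run_subagent` call does its work with a private list standing in for its context and hands back only a short summary, so the primary's context grows by the summaries alone.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str, work_tokens: int) -> dict:
    """Toy sub-agent: consumes tokens privately, returns only a summary."""
    private_context = [f"{task}: chunk {i}" for i in range(work_tokens // 1000)]
    return {
        "task": task,
        "tokens_used": len(private_context) * 1000,  # stays in the sub-agent
        "summary": f"{task} finished",               # all the primary sees
    }


# Task sizes taken from the delegation example above.
tasks = {"web scraping": 15_000, "docs fetch": 12_000, "analysis": 10_000}

with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    # map() preserves input order, so results line up with `tasks`.
    results = list(pool.map(run_subagent, tasks.keys(), tasks.values()))

primary_context = [r["summary"] for r in results]
isolated_tokens = sum(r["tokens_used"] for r in results)

print(primary_context)
print(f"{isolated_tokens:,} tokens of work never entered the primary context")
```

The design choice to note is the return value: the sub-agent's working data is discarded when the function returns, which is the same trade-off the text describes (isolation in exchange for losing the detailed context).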
|
||||
|
||||
### Level 3: Advanced - Multi-Agent Handoff
|
||||
|
||||
**Focus:** Chain agents together without context explosion
|
||||
|
||||
#### Tactic 6: Context Bundles
|
||||
|
||||
**Problem:** Agent 1's context explodes (180k tokens). Need to hand off to fresh Agent 2 without full replay.
|
||||
|
||||
**Solution:** Bundle 60-70% of essential context
|
||||
|
||||
```markdown
|
||||
# context-bundle-2025-01-05-<session-id>.md
|
||||
|
||||
## Context Bundle
|
||||
Created: 2025-01-05 14:30
|
||||
Source Agent: agent-abc123
|
||||
|
||||
## Initial Setup
|
||||
/prime-feature
|
||||
|
||||
## Read Operations (deduplicated)
|
||||
- src/api/endpoints.ts
|
||||
- src/components/Auth.tsx
|
||||
- config/env.ts
|
||||
|
||||
## Key Findings
|
||||
- Auth system uses JWT
|
||||
- API has 15 endpoints
|
||||
- Config needs migration
|
||||
|
||||
## User Prompts (summarized)
|
||||
1. "Implement OAuth2 flow"
|
||||
2. "Add refresh token logic"
|
||||
|
||||
[Excluded: full write operations, detailed read contents, tool execution details]
|
||||
```
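
A bundle like this could be assembled from a session log. The sketch below assumes a hypothetical log format (a list of `(kind, value)` tuples) that is not defined anywhere in the source. It keeps the priming command, deduplicated read paths, and user prompts, and drops writes and tool details, mirroring the exclusions listed in the bundle.

```python
def build_bundle(log: list[tuple[str, str]]) -> str:
    """Render a markdown context bundle from a (kind, value) session log."""
    setup = [value for kind, value in log if kind == "command"]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    reads = list(dict.fromkeys(value for kind, value in log if kind == "read"))
    prompts = [value for kind, value in log if kind == "prompt"]

    lines = ["## Context Bundle", "", "## Initial Setup", *setup, ""]
    lines += ["## Read Operations (deduplicated)", *[f"- {p}" for p in reads], ""]
    lines += ["## User Prompts (summarized)"]
    lines += [f'{i}. "{p}"' for i, p in enumerate(prompts, start=1)]
    return "\n".join(lines)


# Hypothetical session log; the write path is invented for illustration.
session_log = [
    ("command", "/prime-feature"),
    ("read", "src/api/endpoints.ts"),
    ("prompt", "Implement OAuth2 flow"),
    ("read", "src/api/endpoints.ts"),   # duplicate read, dropped
    ("write", "src/auth/oauth.ts"),     # writes are excluded from the bundle
    ("read", "config/env.ts"),
    ("prompt", "Add refresh token logic"),
]

bundle = build_bundle(session_log)
print(bundle)
```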
|
||||
|
||||
**Usage:**
|
||||
|
||||
```bash
|
||||
# Agent 1: Context exploding at 180k
|
||||
# Automatic bundle saved
|
||||
|
||||
# Agent 2: Fresh start (10k base)
|
||||
/loadbundle /path/to/context-bundle-<timestamp>.md
|
||||
# Agent 2 now has 70% of Agent 1's context in ~15k tokens
|
||||
|
||||
# Total: 25k tokens vs. 180k (86% reduction)
|
||||
```
|
||||
|
||||
#### Tactic 7: Composable Workflows (Scout-Plan-Build)
|
||||
|
||||
**Problem:** Single agent searching + planning + building = context explosion
|
||||
|
||||
```text
|
||||
Monolithic Agent:
|
||||
├── Search codebase: 40k tokens
|
||||
├── Read files: 60k tokens
|
||||
├── Plan changes: 20k tokens
|
||||
├── Implement: 30k tokens
|
||||
├── Test: 15k tokens
|
||||
└── Total: 165k tokens (83% used!)
|
||||
```
|
||||
|
||||
**Solution:** Break into composable steps that delegate
|
||||
|
||||
```text
|
||||
/scout-plan-build workflow:
|
||||
|
||||
Step 1: /scout (delegates to 4 parallel sub-agents)
|
||||
├→ Sub-agents search codebase: 4 × 15k = 60k total
|
||||
├→ Output: relevant-files.md (5k tokens)
|
||||
└→ Primary agent context: unchanged
|
||||
|
||||
Step 2: /plan-with-docs
|
||||
├→ Reads relevant-files.md: 5k tokens
|
||||
├→ Scrapes docs: 8k tokens
|
||||
├→ Creates plan: 3k tokens
|
||||
└→ Total added: 16k tokens
|
||||
|
||||
Step 3: /build
|
||||
├→ Reads plan: 3k tokens
|
||||
├→ Implements: 30k tokens
|
||||
└→ Total added: 33k tokens
|
||||
|
||||
Final primary agent context: 10k + 16k + 33k = 59k tokens
|
||||
Savings: 106k tokens (64% reduction)
|
||||
```
|
||||
|
||||
**Why this works:** Scout step offloads searching from planner (R&D: Reduce + Delegate)
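
The savings figure can be reproduced from the numbers in the workflow. In this sketch only the tokens added to the primary agent are summed; the scout sub-agents' 60k are treated as isolated. Variable names are illustrative.

```python
BASE = 10_000  # primary agent's starting context

# Monolithic agent: search, read, plan, implement, test.
monolithic = sum([40_000, 60_000, 20_000, 30_000, 15_000])

steps_added_to_primary = {
    "scout": 0,                               # work happens in sub-agents
    "plan-with-docs": 5_000 + 8_000 + 3_000,  # files list + docs + plan
    "build": 3_000 + 30_000,                  # plan + implementation
}

composed = BASE + sum(steps_added_to_primary.values())
savings = monolithic - composed

print(f"monolithic: {monolithic:,}")
print(f"composed:   {composed:,}")
print(f"savings:    {savings:,} ({100 * savings / monolithic:.0f}% reduction)")
```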
|
||||
|
||||
### Level 4: Agentic - Out-of-Loop Systems
|
||||
|
||||
**Focus:** Agents working autonomously while you're AFK
|
||||
|
||||
#### Tactic 8: Focused Agents (One Agent, One Task)
|
||||
|
||||
**Anti-pattern:**
|
||||
|
||||
```text
|
||||
Super Agent (trying to do everything):
|
||||
├── API development
|
||||
├── UI implementation
|
||||
├── Database migrations
|
||||
├── Testing
|
||||
├── Documentation
|
||||
├── Deployment
|
||||
└── Context: 170k tokens (85% used)
|
||||
```
|
||||
|
||||
**Pattern:**
|
||||
|
||||
```text
|
||||
Focused Agent Fleet:
|
||||
├── Agent 1: API only (30k tokens)
|
||||
├── Agent 2: UI only (35k tokens)
|
||||
├── Agent 3: DB only (20k tokens)
|
||||
├── Agent 4: Tests only (25k tokens)
|
||||
├── Agent 5: Docs only (15k tokens)
|
||||
└── Each agent: ≤35k tokens (max 18% per agent)
|
||||
```
|
||||
|
||||
**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."
|
||||
|
||||
#### Tactic 9: Deletable Agents
|
||||
|
||||
**Pattern:**
|
||||
|
||||
```bash
|
||||
# Create agent for specific task
|
||||
/create-agent docs-writer "Document frontend components"
|
||||
|
||||
# Agent completes task (used 30k tokens)
|
||||
|
||||
# DELETE agent immediately
|
||||
/delete-agent docs-writer
|
||||
|
||||
# Result: 30k tokens freed for next agent
|
||||
```
|
||||
|
||||
**Lifecycle:**
|
||||
|
||||
```text
|
||||
1. Create agent → Task-specific context loaded
|
||||
2. Agent works → Context grows to completion
|
||||
3. Agent completes → Context maxed out
|
||||
4. DELETE agent → Context freed
|
||||
5. Create new agent → Fresh start
|
||||
6. Repeat
|
||||
```
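
The lifecycle maps naturally onto a context manager, which guarantees the delete step runs even if the task fails. This is a sketch with a hypothetical in-memory `registry`; it does not call any real agent API.

```python
from contextlib import contextmanager

registry: dict[str, list[str]] = {}  # hypothetical: agent name -> its context


@contextmanager
def temporary_agent(name: str, task: str):
    """Create an agent for one task and delete it when the block exits."""
    registry[name] = [f"task: {task}"]   # 1. create, task-specific context
    try:
        yield registry[name]             # 2-3. agent works, context grows
    finally:
        del registry[name]               # 4. delete, context freed


with temporary_agent("docs-writer", "Document frontend components") as context:
    context.append("wrote docs for one component")
    print(f"while running: {list(registry)}")

print(f"after exit:    {list(registry)}")
```

Putting the deletion in `finally` means a crashed agent is cleaned up the same way as a successful one, so no stale context lingers between tasks.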
|
||||
|
||||
**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
|
||||
|
||||
#### Tactic 10: Background Agent Delegation
|
||||
|
||||
**Problem:** You're in the loop, waiting for agent to finish long task
|
||||
|
||||
**Solution:** Delegate to background agent, continue working
|
||||
|
||||
```bash
|
||||
# In-loop (you wait, your context stays open)
|
||||
/implement-feature "Build auth system"
|
||||
# Your terminal blocked for 20 minutes
|
||||
# Context accumulates: 150k tokens
|
||||
|
||||
# Out-of-loop (you continue working)
|
||||
/background "Build auth system" \
|
||||
--model opus \
|
||||
--report agents/auth-report.md
|
||||
|
||||
# Background agent works independently
|
||||
# Your terminal freed immediately
|
||||
# Background agent context isolated
|
||||
# You get notified when complete
|
||||
```
|
||||
|
||||
**Context protection:**
|
||||
|
||||
- Primary agent: 10k tokens (just manages job queue)
|
||||
- Background agent: 150k tokens (isolated, will be deleted)
|
||||
- Your interactive session: 10k tokens (protected)
|
||||
|
||||
#### Tactic 11: Orchestrator Sleep Pattern
|
||||
|
||||
**Problem:** Orchestrator observing all agent work = context explosion
|
||||
|
||||
```text
|
||||
Orchestrator watches everything:
|
||||
├── Scout 1 work: 15k tokens observed
|
||||
├── Scout 2 work: 15k tokens observed
|
||||
├── Scout 3 work: 15k tokens observed
|
||||
├── Planner work: 25k tokens observed
|
||||
├── Builder work: 35k tokens observed
|
||||
└── Orchestrator context: 105k tokens
|
||||
```
|
||||
|
||||
**Solution:** Orchestrator sleeps while agents work
|
||||
|
||||
```text
|
||||
Orchestrator pattern:
|
||||
1. Create scouts → 3k tokens (commands only)
|
||||
2. SLEEP (not observing)
|
||||
3. Wake every 15s, check status → 1k tokens
|
||||
4. Scouts complete, read outputs → 5k tokens
|
||||
5. Create planner → 2k tokens
|
||||
6. SLEEP (not observing)
|
||||
7. Wake every 15s, check status → 1k tokens
|
||||
8. Planner completes, read output → 3k tokens
|
||||
9. Create builder → 2k tokens
|
||||
10. SLEEP (not observing)
|
||||
|
||||
Orchestrator final context: 17k tokens ✅
|
||||
vs. 105k if watching everything (84% reduction)
|
||||
```
|
||||
|
||||
**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
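
A minimal sketch of the sleep-and-poll loop, assuming a hypothetical `Agent` class whose `done` flag and `summary` stand in for whatever status mechanism a real orchestrator uses. The 15-second interval from the pattern is shortened here so the example finishes quickly.

```python
import threading
import time


class Agent:
    """Toy agent that works on its own thread and exposes only a summary."""

    def __init__(self, name: str, seconds: float):
        self.name = name
        self.done = False
        self.summary = ""
        self._thread = threading.Thread(target=self._work, args=(seconds,))
        self._thread.start()

    def _work(self, seconds: float) -> None:
        time.sleep(seconds)              # detailed work the orchestrator never sees
        self.summary = f"{self.name}: complete"
        self.done = True                 # set last, after the summary is ready


def orchestrate(agents: list[Agent], poll_interval: float = 0.05) -> list[str]:
    """Sleep between status checks; read only summaries once all agents finish."""
    while not all(agent.done for agent in agents):
        time.sleep(poll_interval)        # SLEEP (not observing)
    return [agent.summary for agent in agents]


scouts = [Agent(f"scout-{i}", seconds=0.1) for i in range(1, 4)]
print(orchestrate(scouts))
```

The orchestrator only ever touches `done` and `summary`, which is the point of the pattern: status checks are cheap, and the agents' working detail stays out of its context.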
|
||||
|
||||
## Monitoring Context Health
|
||||
|
||||
### The /context Command
|
||||
|
||||
```bash
|
||||
/context
|
||||
|
||||
# Healthy agent (beginner level):
|
||||
messages: 8%
|
||||
system_tools: 5%
|
||||
custom_agents: 2%
|
||||
---
|
||||
Total used: 15% ✅ (85% free)
|
||||
|
||||
# Warning (intermediate):
|
||||
messages: 45%
|
||||
mcp_tools: 18%
|
||||
system_tools: 5%
|
||||
---
|
||||
Total used: 68% ⚠️ (32% free, approaching limits)
|
||||
|
||||
# Danger (needs intervention):
|
||||
messages: 72%
|
||||
mcp_tools: 24%
|
||||
system_tools: 5%
|
||||
---
|
||||
Total used: 101% ❌ (context overflow!)
|
||||
```
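
The three readings can be turned into a simple check. This sketch sums the per-category percentages and labels the total. The 80% cutoff follows the ">80% used" warning sign in this guide; the 60% cutoff is an assumption chosen for illustration, and `classify_context` is a hypothetical helper, not part of `/context`.

```python
def classify_context(categories: dict[str, float]) -> tuple[float, str]:
    """Sum per-category percentages and label the result."""
    used = sum(categories.values())
    if used > 80:        # the >80% used warning sign from this guide
        label = "danger"
    elif used >= 60:     # assumed cutoff, chosen for illustration
        label = "warning"
    else:
        label = "healthy"
    return used, label


# Percentages copied from the three example readings.
readings = {
    "healthy": {"messages": 8, "system_tools": 5, "custom_agents": 2},
    "warning": {"messages": 45, "mcp_tools": 18, "system_tools": 5},
    "danger": {"messages": 72, "mcp_tools": 24, "system_tools": 5},
}

for name, categories in readings.items():
    used, label = classify_context(categories)
    print(f"{name}: {used}% used -> {label}")
```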
|
||||
|
||||
### Success Metrics by Level
|
||||
|
||||
| Level | Target Context Free | What This Enables |
|
||||
|-------|---------------------|-------------------|
|
||||
| Beginner | 85-90% | Basic tasks without running out |
|
||||
| Intermediate | 60-75% | Complex tasks with breathing room |
|
||||
| Advanced | 40-60% | Multi-step workflows without overflow |
|
||||
| Agentic | Per-agent 60-80% | Fleet of focused agents |
|
||||
|
||||
### Warning Signs
|
||||
|
||||
**Your context window is in danger when:**
|
||||
|
||||
❌ **Single agent exceeds 150k tokens**
|
||||
|
||||
- Solution: Split work across multiple agents
|
||||
|
||||
❌ **Agent needs to read >20 files**
|
||||
|
||||
- Solution: Use scout agents to find relevant subset
|
||||
|
||||
❌ **`/context` shows >80% used**
|
||||
|
||||
- Solution: Start fresh agent, use context bundles
|
||||
|
||||
❌ **Agent gets slower/less accurate**
|
||||
|
||||
- Solution: Check context usage, delegate to sub-agents
|
||||
|
||||
❌ **Autocompact buffer active**
|
||||
|
||||
- Solution: Disable it, reclaim 20%+ tokens
|
||||
|
||||
## Context Window Hard Limits
|
||||
|
||||
> "Context window is a hard limit. We have to respect this and work around it."
|
||||
|
||||
### The Reality
|
||||
|
||||
```text
|
||||
Claude Opus 200k limit:
|
||||
├── System prompt: ~8k tokens (4%)
|
||||
├── Available tools: ~5k tokens (2.5%)
|
||||
├── MCP servers: 0-24k tokens (0-12%)
|
||||
├── CLAUDE.md: 0-23k tokens (0-11.5%)
|
||||
├── Custom agents: ~2k tokens (1%)
|
||||
└── Available for work: 138-185k tokens (69-92.5%)
|
||||
|
||||
Best case (optimized): 185k available
|
||||
Worst case (unoptimized): 138k available
|
||||
Difference: 47k tokens (23.5% of total capacity!)
|
||||
```
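
The best and worst cases come from subtracting overhead from the window. A sketch of that subtraction using the figures in the breakdown; the dictionary keys are illustrative labels.

```python
WINDOW = 200_000

fixed = {"system_prompt": 8_000, "tools": 5_000, "custom_agents": 2_000}
optional_max = {"mcp_servers": 24_000, "claude_md": 23_000}  # zero when optimized

best_case = WINDOW - sum(fixed.values())
worst_case = best_case - sum(optional_max.values())

print(f"best case:  {best_case:,} tokens available")
print(f"worst case: {worst_case:,} tokens available")
print(f"difference: {best_case - worst_case:,} tokens")
```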
|
||||
|
||||
### Real Example from the Field
|
||||
|
||||
> "We were 14% away from exploding our context in our scout-plan-build workflow."
|
||||
|
||||
```text
|
||||
Scout-Plan-Build execution:
|
||||
├── Base context: 15k tokens
|
||||
├── Scout work (4 sub-agents): +40k tokens
|
||||
├── Planner work: +35k tokens
|
||||
├── Builder work: +80k tokens
|
||||
└── Total: 170k tokens
|
||||
|
||||
With autocompact buffer (22%):
|
||||
170k / 0.78 = 218k tokens
|
||||
❌ Exceeds 200k limit by 18k (9% overflow)
|
||||
|
||||
Without autocompact buffer:
|
||||
170k / 1.0 = 170k tokens
|
||||
✅ Within limits with 30k buffer (15% free)
|
||||
```
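
The buffer arithmetic treats the reserved share as shrinking the usable window, so the window needed is the work divided by the usable fraction. A short sketch of that calculation, using the figures from the example (the function name is illustrative):

```python
WINDOW = 200_000


def required_window(work_tokens: int, buffer_fraction: float) -> float:
    """Window size needed to fit the work when a fraction is reserved."""
    return work_tokens / (1 - buffer_fraction)


work = 15_000 + 40_000 + 35_000 + 80_000  # base + scout + planner + builder

for buffer in (0.22, 0.0):
    needed = required_window(work, buffer)
    status = "overflow" if needed > WINDOW else "fits"
    print(f"buffer {buffer:.0%}: needs {needed:,.0f} tokens -> {status}")
```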
|
||||
|
||||
**Lesson:** Every percentage point matters when approaching limits.
|
||||
|
||||
## Common Context Explosion Patterns
|
||||
|
||||
### Pattern 1: The Sponge Agent
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Agent reads entire codebase
|
||||
- Opens 50+ files
|
||||
- Context grows 10k tokens every few minutes
|
||||
|
||||
**Cause:** No filtering strategy
|
||||
|
||||
**Fix:**
|
||||
|
||||
```bash
|
||||
# Before: Agent reads everything
|
||||
Agent: "Analyzing codebase..."
|
||||
[reads 100 files = 150k tokens]
|
||||
|
||||
# After: Scout first
|
||||
/scout "Find files related to authentication"
|
||||
# Scout outputs: 5 relevant files
|
||||
Agent reads only those 5 files = 8k tokens
|
||||
```
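
A scout can be as simple as a keyword filter that returns paths instead of contents. The sketch below is one possible stand-in for the `/scout` step, assuming plain-text source files; `scout` and its parameters are hypothetical names, and the demo files are created in a temporary directory.

```python
import tempfile
from pathlib import Path


def scout(root: Path, keyword: str, limit: int = 5) -> list[Path]:
    """Return up to `limit` file paths whose text mentions the keyword."""
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue                     # skip binaries and unreadable files
        if keyword.lower() in text.lower():
            matches.append(path)
            if len(matches) == limit:
                break
    return matches


# Demo on throwaway files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth.py").write_text("# authentication helpers\n", encoding="utf-8")
    (root / "ui.py").write_text("# button styles\n", encoding="utf-8")
    relevant = [p.name for p in scout(root, "authentication")]

print(relevant)
```

The scout returns paths, not file contents, so the agent that consumes its output decides what to load and pays only for the files it actually reads.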
|
||||
|
||||
### Pattern 2: The Accumulator
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Long conversation
|
||||
- Many tool calls
|
||||
- Context steadily grows to limit
|
||||
|
||||
**Cause:** Not resetting agent between phases
|
||||
|
||||
**Fix:**
|
||||
|
||||
```bash
|
||||
# Phase 1: Exploration
|
||||
[Agent explores, context hits 120k]
|
||||
|
||||
# Phase 2: Implementation
|
||||
# ❌ Bad: Continue same agent (will overflow)
|
||||
# ✅ Good: New agent with context bundle
|
||||
|
||||
/loadbundle context-from-phase-1.md
|
||||
# Fresh agent (15k) + bundle (20k) = 35k tokens
|
||||
# Ready for implementation without overflow
|
||||
```
|
||||
|
||||
### Pattern 3: The Observer
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Orchestrator context growing rapidly
|
||||
- Watching all sub-agent work
|
||||
- Can't coordinate more than 2-3 agents
|
||||
|
||||
**Cause:** Not using sleep pattern
|
||||
|
||||
**Fix:**
|
||||
|
||||
```python
# Illustrative pseudocode: `orchestrator` and `agents` are placeholders.

# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result  # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)
    orchestrator.sleep()  # Not observing

orchestrator.wake_and_check_status()  # Only reads summaries
```
|
||||
|
||||
## The "200k is Plenty" Principle
|
||||
|
||||
> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."
|
||||
|
||||
**The mindset shift:**
|
||||
|
||||
```text
|
||||
Beginner thinking:
|
||||
"I need a bigger context window"
|
||||
"If only I had 500k tokens..."
|
||||
"My task is too complex for 200k"
|
||||
|
||||
Expert thinking:
|
||||
"I need better context management"
|
||||
"I'm overloading a single agent"
|
||||
"I should split this across focused agents"
|
||||
```
|
||||
|
||||
**The truth:** Most context explosions are design problems, not capacity problems.
|
||||
|
||||
### Why 200k is Sufficient
|
||||
|
||||
**With proper protection:**
|
||||
|
||||
```text
|
||||
Task: Refactor authentication across 50-file codebase
|
||||
|
||||
Approach 1 (Single Agent - fails):
|
||||
├── Agent reads 50 files: 75k tokens
|
||||
├── Agent plans changes: 20k tokens
|
||||
├── Agent implements: 80k tokens
|
||||
├── Agent tests: 30k tokens
|
||||
└── Total: 205k tokens ❌ (overflow by 5k)
|
||||
|
||||
Approach 2 (Multi-Agent - succeeds):
|
||||
├── Scout finds relevant 10 files: 15k tokens
|
||||
├── Planner creates strategy: 20k tokens (new agent)
|
||||
├── Builder 1 (auth logic): 35k tokens (new agent)
|
||||
├── Builder 2 (UI changes): 30k tokens (new agent)
|
||||
├── Tester verifies: 25k tokens (new agent)
|
||||
└── Max per agent: 35k tokens ✅ (all within limits)
|
||||
```
|
||||
|
||||
## Integration with Other Patterns
|
||||
|
||||
Context window protection enables:
|
||||
|
||||
**Progressive Disclosure:**
|
||||
|
||||
- Reduces: Minimal static context
|
||||
- Enables: Dynamic loading via priming
|
||||
|
||||
**Core 4 Management:**
|
||||
|
||||
- Protects: Context (pillar #1)
|
||||
- Enables: Better model/prompt/tools choices
|
||||
|
||||
**Orchestration:**
|
||||
|
||||
- Requires: Context protection (orchestrator sleep)
|
||||
- Enables: Fleet management without overflow
|
||||
|
||||
**Observability:**
|
||||
|
||||
- Monitors: Context usage via hooks
|
||||
- Prevents: Unnoticed context explosion
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Reduce and Delegate** - The only two strategies that matter
|
||||
|
||||
2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose
|
||||
|
||||
3. **Agents are deletable** - Free context by removing completed agents
|
||||
|
||||
4. **200k is plenty** - Context explosions are design problems
|
||||
|
||||
5. **Monitor constantly** - `/context` command is your best friend
|
||||
|
||||
6. **Orchestrators must sleep** - Don't observe all agent work
|
||||
|
||||
7. **Context bundles over full replay** - 70% of context in 10% of tokens
|
||||
|
||||
## Source Attribution
|
||||
|
||||
**Primary sources:**
|
||||
|
||||
- Elite Context Engineering (R&D framework, 4 levels, all tactics)
|
||||
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)
|
||||
|
||||
**Supporting sources:**
|
||||
|
||||
- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
|
||||
- Sub-Agents (sub-agent delegation, context isolation)
|
||||
|
||||
**Key quotes:**
|
||||
|
||||
- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
|
||||
- "A focused agent is a performant agent." (Elite Context Engineering)
|
||||
- "We were 14% away from exploding our context." (Claude 2.0)
|
||||
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
|
||||
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
|
||||
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
|
||||
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Context window management separates beginners from experts. Master it, and you can scale infinitely with focused agents.
|
||||
434
skills/multi-agent-composition/patterns/decision-framework.md
Normal file
@@ -0,0 +1,434 @@
|
||||
# Decision Framework: Choosing the Right Claude Code Component
|
||||
|
||||
This guide helps you choose the right Claude Code component for your specific task. **Always start with prompts**—master the primitive first before scaling to other components.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [The Decision Tree](#the-decision-tree)
|
||||
- [Quick Reference: Decision Matrix](#quick-reference-decision-matrix)
|
||||
- [When to Use Each Component](#when-to-use-each-component)
|
||||
- [Use Skills When](#use-skills-when)
|
||||
- [Use Sub-Agents When](#use-sub-agents-when)
|
||||
- [Use Slash Commands When](#use-slash-commands-when)
|
||||
- [Use MCP Servers When](#use-mcp-servers-when)
|
||||
- [Use Hooks When](#use-hooks-when)
|
||||
- [Use Plugins When](#use-plugins-when)
|
||||
- [Use Case Examples from the Field](#use-case-examples-from-the-field)
|
||||
- [Composition Rules and Boundaries](#composition-rules-and-boundaries)
|
||||
- [What Can Compose What](#what-can-compose-what)
|
||||
- [Critical Composition Rules](#critical-composition-rules)
|
||||
- [The Proper Evolution Path](#the-proper-evolution-path)
|
||||
- [Stage 1: Start with a Prompt](#stage-1-start-with-a-prompt)
|
||||
- [Stage 2: Add Sub-Agent if Parallelism Needed](#stage-2-add-sub-agent-if-parallelism-needed)
|
||||
- [Stage 3: Create Skill When Management Needed](#stage-3-create-skill-when-management-needed)
|
||||
- [Stage 4: Add MCP if External Data Needed](#stage-4-add-mcp-if-external-data-needed)
|
||||
- [Common Decision Anti-Patterns](#common-decision-anti-patterns)
|
||||
- [Anti-Pattern 1: Converting All Slash Commands to Skills](#anti-pattern-1-converting-all-slash-commands-to-skills)
|
||||
- [Anti-Pattern 2: Using Skills for One-Off Tasks](#anti-pattern-2-using-skills-for-one-off-tasks)
|
||||
- [Anti-Pattern 3: Skipping the Primitive](#anti-pattern-3-skipping-the-primitive)
|
||||
- [Anti-Pattern 4: Using Sub-Agents When Context Matters](#anti-pattern-4-using-sub-agents-when-context-matters)
|
||||
- [Anti-Pattern 5: Forgetting MCP is for External Only](#anti-pattern-5-forgetting-mcp-is-for-external-only)
|
||||
- [Decision Checklist](#decision-checklist)
|
||||
- [Summary: The Golden Rules](#summary-the-golden-rules)
|
||||
|
||||
## The Decision Tree
|
||||
|
||||
Start here when deciding which component to use:
|
||||
|
||||
```text
|
||||
1. START HERE: Build a Prompt (Slash Command)
|
||||
↓
|
||||
2. Need parallelization or isolated context?
|
||||
YES → Use Sub-Agent
|
||||
NO → Continue
|
||||
↓
|
||||
3. External data/service integration?
|
||||
YES → Use MCP Server
|
||||
NO → Continue
|
||||
↓
|
||||
4. One-off task (simple, direct)?
|
||||
YES → Use Slash Command
|
||||
NO → Continue
|
||||
↓
|
||||
5. Repeatable workflow (pattern detection)?
|
||||
YES → Use Agent Skill
|
||||
NO → Continue
|
||||
↓
|
||||
6. Lifecycle event automation?
|
||||
YES → Use Hook
|
||||
NO → Continue
|
||||
↓
|
||||
7. Sharing/distributing to team?
|
||||
YES → Use Plugin
|
||||
NO → Default to Slash Command (prompt)
|
||||
```
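
The tree can be read as an ordered chain of checks where the first match wins. A sketch of that reading follows; the `Task` fields are hypothetical names for the six questions, not an API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_parallel_or_isolation: bool = False
    needs_external_data: bool = False
    is_one_off: bool = False
    is_repeatable_workflow: bool = False
    is_lifecycle_event: bool = False
    is_shared_with_team: bool = False


def choose_component(task: Task) -> str:
    """Walk the decision tree top to bottom; the first YES wins."""
    checks = [
        (task.needs_parallel_or_isolation, "Sub-Agent"),
        (task.needs_external_data, "MCP Server"),
        (task.is_one_off, "Slash Command"),
        (task.is_repeatable_workflow, "Agent Skill"),
        (task.is_lifecycle_event, "Hook"),
        (task.is_shared_with_team, "Plugin"),
    ]
    for answer, component in checks:
        if answer:
            return component
    return "Slash Command"  # default: start with a prompt


print(choose_component(Task(needs_external_data=True)))
print(choose_component(Task(is_repeatable_workflow=True)))
print(choose_component(Task()))
```

Because the checks are ordered, a task that is both parallel and external resolves to a sub-agent first, which matches the top-to-bottom reading of the tree.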
|
||||
|
||||
**Critical Rule:** Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
|
||||
|
||||
## Quick Reference: Decision Matrix
|
||||
|
||||
| Task Type | Component | Reason |
|
||||
|-----------|-----------|---------|
|
||||
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
|
||||
| External data/service access | MCP Server | Integration point |
|
||||
| Parallel/isolated work | Sub-Agent | Context isolation |
|
||||
| Parallel workflow tasks | Sub-Agent | **Whenever you see "parallel", think sub-agents** |
|
||||
| One-off task | Slash Command | Simple, direct |
|
||||
| Lifecycle automation | Hook | Event-driven |
|
||||
| Team distribution | Plugin | Packaging |
|
||||
|
||||
## When to Use Each Component
|
||||
|
||||
### Use Skills When
|
||||
|
||||
**Signal keywords:** "automatic," "repeat," "manage," "workflow"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- You have a **REPEAT** problem that needs **MANAGEMENT**
|
||||
- Multiple related operations need coordination
|
||||
- You want **automatic** behavior (agent-invoked)
|
||||
- The problem domain requires orchestration of multiple components
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Managing git work trees (create, list, remove, merge, update)
|
||||
- Detecting style guide violations across codebase
|
||||
- Automatic PDF text extraction and processing
|
||||
- Video processing workflows with multiple steps
|
||||
|
||||
**NOT for:**
|
||||
|
||||
- One-off tasks → Use Slash Command instead
|
||||
- Simple operations → Use Slash Command instead
|
||||
- Problems solved well by a single prompt → Don't over-engineer
|
||||
|
||||
**Remember:** Skills are for managing problem domains, not solving one-off tasks.
|
||||
|
||||
### Use Sub-Agents When
|
||||
|
||||
**Signal keywords:** "parallel," "scale," "bulk," "isolated," "batch"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- **Parallelization** is needed
|
||||
- **Context isolation** is required
|
||||
- Scale tasks and batch operations
|
||||
- You're okay with losing context afterward
|
||||
- Each task can run independently
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Comprehensive security audits
|
||||
- Fix & debug tests at scale
|
||||
- Parallel workflow tasks
|
||||
- Bulk operations on multiple files
|
||||
- Isolated research that doesn't pollute main context
|
||||
|
||||
**NOT for:**
|
||||
|
||||
- Tasks that need to share context → Use main conversation
|
||||
- Sequential operations → Use Slash Command or Skill
|
||||
- Tasks that need to spawn more sub-agents → Hard limit: no nesting
|
||||
|
||||
**Critical constraint:** You must be okay with losing context afterward. Sub-agent context doesn't persist in the main conversation (unless you use resumable sub-agents).
|
||||
|
||||
**Golden rule:** "Whenever you see parallel, you should always just think sub-agents. Nothing else supports parallel calling."
|
||||
|
||||
### Use Slash Commands When
|
||||
|
||||
**Signal keywords:** "one-off," "simple," "quick," "manual"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- One-off tasks
|
||||
- Simple repeatable actions
|
||||
- You're starting a new workflow
|
||||
- Building the primitive before composing
|
||||
- You want manual control over invocation
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Git commit messages (one at a time)
|
||||
- Create UI component
|
||||
- Run specific code generation
|
||||
- Execute a well-defined task
|
||||
- Quick transformations
|
||||
|
||||
**Philosophy:** "Have a strong bias towards slash commands. And then when you're thinking about composing many slash commands, sub-agents or MCPs, think about putting them in a skill."
|
||||
|
||||
**Remember:** Slash commands are the primitive foundation. Master these first before anything else.
|
||||
|
||||
### Use MCP Servers When
|
||||
|
||||
**Signal keywords:** "external," "database," "API," "service," "integration"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- External integrations are needed
|
||||
- Data sources outside Claude Code
|
||||
- Third-party services
|
||||
- Database connections
|
||||
- Real-time data access
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Connect to Jira
|
||||
- Query databases (PostgreSQL, etc.)
|
||||
- Fetch real-time weather data
|
||||
- GitHub integration
|
||||
- Slack integration
|
||||
- Figma designs
|
||||
|
||||
**NOT for:**
|
||||
|
||||
- Internal orchestration → Use Skills instead
|
||||
- Pure computation → Use Slash Command or Skill
|
||||
|
||||
**Clear rule:** External = MCP, Internal orchestration = Skills
|
||||
|
||||
**Context consideration:** MCP servers can "torch your context window" by loading all their context at startup, unlike Skills which use progressive disclosure.
|
||||
|
||||
### Use Hooks When
|
||||
|
||||
**Signal keywords:** "lifecycle," "event," "automation," "deterministic"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Deterministic automation at lifecycle events
|
||||
- Want to execute commands at specific moments
|
||||
- Need to balance agent autonomy with deterministic control
|
||||
- Workflow automation that should always happen
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Run linters before code submission
|
||||
- Auto-format code after generation
|
||||
- Trigger tests after file changes
|
||||
- Capture context at specific points
|
||||
|
||||
**Philosophy:** "If you really want to scale, you need both" - agents AND deterministic workflows.
|
||||
|
||||
**Use for:** Adding determinism rather than always relying on the agent to decide.
|
||||
|
||||
### Use Plugins When
|
||||
|
||||
**Signal keywords:** "share," "distribute," "package," "team"
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Sharing/distributing to team
|
||||
- Packaging multiple components together
|
||||
- Reusable work across projects
|
||||
- Team-wide extensions
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
- Distribute custom skills to team
|
||||
- Bundle MCP servers for automatic start
|
||||
- Share slash commands across projects
|
||||
- Package hooks and configurations
|
||||
|
||||
**Philosophy:** "Plugins let you package and distribute these sets of work. This isn't super interesting. It's just a way to share and reuse Claude Code extensions."
|
||||
|
||||
## Use Case Examples from the Field
|
||||
|
||||
Real examples with reasoning:
|
||||
|
||||
| Use Case | Component | Reasoning |
|
||||
|----------|-----------|-----------|
|
||||
| Automatic PDF text extraction | Agent Skill | Keyword "automatic", repeat behavior |
|
||||
| Connect to Jira | MCP Server | External source |
|
||||
| Comprehensive security audit | Sub-Agent | Scale, isolated context, not automatic |
|
||||
| Generalized git commit messages | Slash Command | Simple one-step task |
|
||||
| Query database | MCP Server | External data source (start here) |
|
||||
| Fix/debug tests at scale | Sub-Agent | Parallel work, scale |
|
||||
| Detect style guide violations | Agent Skill | Repeat behavior pattern |
|
||||
| Fetch real-time weather | MCP Server | Third-party service integration |
|
||||
| Create UI component | Slash Command | Simple one-off task |
|
||||
| Parallel workflow tasks | Sub-Agent | Keyword "parallel" |
|
||||
|
||||
## Composition Rules and Boundaries
|
||||
|
||||
### What Can Compose What
|
||||
|
||||
**Skills (Top Compositional Layer):**
|
||||
|
||||
- ✅ Can use: MCP Servers
|
||||
- ✅ Can use: Sub-Agents
|
||||
- ✅ Can use: Slash Commands
|
||||
- ✅ Can use: Other Skills
|
||||
- ❌ Cannot: Nest sub-agents/prompts directly (must use SlashCommand tool)
|
||||
|
||||
**Slash Commands (Primitive + Compositional):**
|
||||
|
||||
- ✅ Can use: Skills (via SlashCommand tool)
|
||||
- ✅ Can use: MCP Servers
|
||||
- ✅ Can use: Sub-Agents
|
||||
- ✅ Acts as: BOTH primitive AND composition point
|
||||
|
||||
**Sub-Agents (Execution Layer):**
|
||||
|
||||
- ✅ Can use: Slash Commands (via SlashCommand tool)
|
||||
- ✅ Can use: Skills (via SlashCommand tool)
|
||||
- ❌ CANNOT use: Other Sub-Agents (hard limit)
|
||||
|
||||
**MCP Servers (Integration Layer):**
|
||||
|
||||
- ℹ️ Lower level unit, used BY skills
|
||||
- ℹ️ Do not use skills themselves
|
||||
- ℹ️ Expose services to all components
|
||||
|
||||
### Critical Composition Rules
|
||||
|
||||
1. **Sub-Agents cannot nest** - No sub-agent spawning other sub-agents (prevents infinite nesting)
|
||||
2. **Skills don't execute code** - They guide Claude to use available tools
|
||||
3. **Slash commands can be invoked manually or via SlashCommand tool**
|
||||
4. **Skills use the SlashCommand tool** to compose prompts and sub-agents
|
||||
5. **No circular dependencies** - Skills can use other skills but cannot nest circularly
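The rules above can be encoded as a small lookup table, which is handy when sanity-checking a planned design. This is a minimal sketch: `CAN_USE`, `can_compose`, and `check_chain` are illustrative names, not part of Claude Code, and the sub-agent entry for MCP servers is inferred from "expose services to all components".

```python
# Which component types may use which others, per the lists above.
# Illustrative only: Claude Code does not expose a table like this.
CAN_USE = {
    "skill": {"mcp_server", "sub_agent", "slash_command", "skill"},
    "slash_command": {"skill", "mcp_server", "sub_agent"},
    # Hard limit: a sub-agent never uses another sub-agent.
    "sub_agent": {"slash_command", "skill", "mcp_server"},
    # Lowest layer: used by the others, uses nothing itself.
    "mcp_server": set(),
}


def can_compose(parent: str, child: str) -> bool:
    """Return True if `parent` is allowed to use `child`."""
    return child in CAN_USE.get(parent, set())


def check_chain(chain: list[str]) -> list[str]:
    """Return one message per illegal step in a call chain."""
    return [
        f"{parent} cannot use {child}"
        for parent, child in zip(chain, chain[1:])
        if not can_compose(parent, child)
    ]
```

For example, `check_chain(["skill", "sub_agent", "sub_agent"])` reports the nested sub-agent step, while a skill that calls a sub-agent that calls a slash command passes.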
|
||||
|
||||
## The Proper Evolution Path
|
||||
|
||||
When building new capabilities, follow this progression:
|
||||
|
||||
### Stage 1: Start with a Prompt
|
||||
|
||||
**Goal:** Solve the basic problem
|
||||
|
||||
Create a simple prompt or slash command that accomplishes the core task.
|
||||
|
||||
**Example (Git Work Trees):** Create one work tree
|
||||
|
||||
```bash
|
||||
/create-worktree feature-branch
|
||||
```
|
||||
|
||||
**When to stay here:** The task is one-off or infrequent.
|
||||
|
||||
### Stage 2: Add Sub-Agent if Parallelism Needed
|
||||
|
||||
**Goal:** Scale to multiple parallel operations
|
||||
|
||||
If you need to do the same thing many times in parallel, use a sub-agent.
|
||||
|
||||
**Example (Git Work Trees):** Create multiple work trees in parallel
|
||||
|
||||
```text
|
||||
Use sub-agent to create work trees for: feature-a, feature-b, feature-c in parallel
|
||||
```
|
||||
|
||||
**When to stay here:** Parallel execution is the only requirement, no orchestration needed.
|
||||
|
||||
### Stage 3: Create Skill When Management Needed
|
||||
|
||||
**Goal:** Bundle multiple related operations
|
||||
|
||||
When the problem grows to require management, create a skill.
|
||||
|
||||
**Example (Git Work Trees):** Manage work trees (create, list, remove, merge, update)
|
||||
|
||||
Now you have a cohesive work tree manager skill that:
|
||||
|
||||
- Creates new work trees
|
||||
- Lists existing work trees
|
||||
- Removes old work trees
|
||||
- Merges work trees
|
||||
- Updates work tree status
|
||||
|
||||
**When to stay here:** Most domain-specific workflows stop here.
|
||||
|
||||
### Stage 4: Add MCP if External Data Needed
|
||||
|
||||
**Goal:** Integrate external systems
|
||||
|
||||
Only add MCP servers when you need data from outside Claude Code.
|
||||
|
||||
**Example (Git Work Trees):** Query external repo metadata from GitHub API
|
||||
|
||||
Now your skill can query GitHub for:
|
||||
|
||||
- Branch protection rules
|
||||
- CI/CD status
|
||||
- Pull request information
|
||||
|
||||
**Final state:** Full-featured work tree manager with external integration.
|
||||
|
||||
## Common Decision Anti-Patterns
|
||||
|
||||
### ❌ Anti-Pattern 1: Converting All Slash Commands to Skills
|
||||
|
||||
**Mistake:** "I'm going to convert all my slash commands to skills because skills are better."
|
||||
|
||||
**Why wrong:** Skills are for repeatable workflows that need management, not simple one-off tasks. Slash commands are the primitive—you need them.
|
||||
|
||||
**Correct approach:** Keep slash commands for simple tasks. Only create a skill when you're managing a problem domain with multiple related operations.
|
||||
|
||||
### ❌ Anti-Pattern 2: Using Skills for One-Off Tasks
|
||||
|
||||
**Mistake:** "I need to create a UI component once, so I'll build a skill for it."
|
||||
|
||||
**Why wrong:** Skills are for repeat problems. One-off tasks should use slash commands.
|
||||
|
||||
**Correct approach:** Use a slash command for the one-off task. If you find yourself doing it repeatedly, then consider a skill.
|
||||
|
||||
### ❌ Anti-Pattern 3: Skipping the Primitive
|
||||
|
||||
**Mistake:** "I'm going to start by building a skill because it's more advanced."
|
||||
|
||||
**Why wrong:** If you don't master prompts, you can't build effective skills. Everything is prompts in the end.
|
||||
|
||||
**Correct approach:** Always start with a prompt. Build the primitive first. Scale up only when needed.
|
||||
|
||||
### ❌ Anti-Pattern 4: Using Sub-Agents When Context Matters
|
||||
|
||||
**Mistake:** "I'll use a sub-agent for this research task and then reference the findings later."
|
||||
|
||||
**Why wrong:** Sub-agent context is isolated. You lose it after the sub-agent finishes (unless using resumable sub-agents).
|
||||
|
||||
**Correct approach:** If you need the context later, do the work in the main conversation or use a resumable sub-agent.
|
||||
|
||||
### ❌ Anti-Pattern 5: Forgetting MCP is for External Only
|
||||
|
||||
**Mistake:** "I'll build an MCP server to orchestrate internal workflows."
|
||||
|
||||
**Why wrong:** MCP servers are for external integrations. Internal orchestration should use skills.
|
||||
|
||||
**Correct approach:** MCP = external, Skills = internal orchestration. Keep them separate.
|
||||
|
||||
## Decision Checklist
|
||||
|
||||
Before you start building, ask yourself:
|
||||
|
||||
**Basic Questions:**
|
||||
|
||||
- [ ] Have I started with a prompt? (Non-negotiable)
|
||||
- [ ] Is this a one-off task or repeatable?
|
||||
- [ ] Do I need external data or services?
|
||||
- [ ] Is parallelization required?
|
||||
- [ ] Am I okay losing context after execution?
|
||||
|
||||
**Composition Questions:**
|
||||
|
||||
- [ ] Am I trying to nest sub-agents? (Not allowed)
|
||||
- [ ] Am I converting a simple slash command to a skill? (Probably wrong)
|
||||
- [ ] Am I using MCP for internal orchestration? (Should use skills)
|
||||
- [ ] Have I considered the evolution path? (Prompt → Sub-agent → Skill → MCP)
|
||||
|
||||
**Context Questions:**
|
||||
|
||||
- [ ] Will this torch my context window? (MCP consideration)
|
||||
- [ ] Do I need progressive disclosure? (Skills benefit)
|
||||
- [ ] Is context isolation critical? (Sub-agent benefit)
|
||||
- [ ] Will I need this context later? (Don't use sub-agent)
|
||||
|
||||
## Summary: The Golden Rules
|
||||
|
||||
1. **Always start with prompts** - Master the primitive first
|
||||
2. **"Parallel" keyword = Sub-Agents** - Nothing else supports parallel calling
|
||||
3. **External = MCP, Internal = Skills** - Clear separation of concerns
|
||||
4. **One-off = Slash Command** - Don't over-engineer
|
||||
5. **Repeat + Management = Skill** - Only scale when needed
|
||||
6. **Don't convert all slash commands to skills** - Huge mistake
|
||||
7. **Skills compose upward, not downward** - Build from primitives
|
||||
|
||||
Remember The Core 4: Context, Model, Prompt, Tools. Master these fundamentals, and you'll master the compositional units.
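As a rough illustration, the golden rules can be read as an ordered set of checks. The sketch below is one possible encoding, not an official decision procedure; `Task` and `choose_component` are hypothetical names, and it returns only the single most specific component even though real designs usually compose several.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of the work to be done."""
    needs_external_data: bool = False
    needs_parallelism: bool = False
    repeatable: bool = False
    needs_management: bool = False  # several related operations to coordinate


def choose_component(task: Task) -> str:
    """Map a task to a component by applying the golden rules in order."""
    if task.needs_external_data:
        return "MCP Server"       # External = MCP
    if task.needs_parallelism:
        return "Sub-Agent"        # "Parallel" keyword = Sub-Agents
    if task.repeatable and task.needs_management:
        return "Skill"            # Repeat + Management = Skill
    return "Slash Command"        # One-off = Slash Command (start with a prompt)
```

A repeatable task with no management need still maps to a slash command here, which mirrors rule 6: do not promote a command to a skill just because it is reused.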
|
||||
925
skills/multi-agent-composition/patterns/hooks-in-composition.md
Normal file
@@ -0,0 +1,925 @@
|
||||
# Hooks for Observability and Control
|
||||
|
||||
> "When it comes to agentic coding, observability is everything. How well you can observe, iterate, and improve your agentic system is going to be a massive differentiating factor."
|
||||
|
||||
Claude Code hooks provide deterministic control over agent behavior and enable comprehensive monitoring of multi-agent systems.
|
||||
|
||||
## What Are Hooks?
|
||||
|
||||
**Hooks are lifecycle event handlers that let you execute custom code at specific points in Claude Code's execution.**
|
||||
|
||||
```text
|
||||
Agent Lifecycle:
|
||||
├── pre-tool-use hook → Before any tool executes
|
||||
├── [Tool executes]
|
||||
├── post-tool-use hook → After tool completes
|
||||
├── notification hook → When agent needs input
|
||||
├── sub-agent-stop hook → When sub-agent finishes
|
||||
└── stop hook → When agent completes response
|
||||
```
|
||||
|
||||
**Two killer use cases:**
|
||||
|
||||
1. **Observability** - Know what your agents are doing
|
||||
2. **Control** - Steer and block agent behavior
|
||||
|
||||
## The Five Hooks
|
||||
|
||||
### 1. pre-tool-use
|
||||
|
||||
**When it fires:** Before any tool executes
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Block dangerous commands (`rm -rf`, destructive operations)
|
||||
- Prevent access to sensitive files (`.env`, `credentials.json`)
|
||||
- Log tool attempts before execution
|
||||
- Validate tool parameters
|
||||
|
||||
**Available data:**
|
||||
|
||||
```json
|
||||
{
|
||||
"toolName": "bash",
|
||||
"toolInput": {
|
||||
"command": "rm -rf /",
|
||||
"description": "Remove all files"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Example: Block dangerous commands**
|
||||
|
||||
```python
# .claude/hooks/pre-tool-use.py
# /// script
# dependencies = []
# ///

import sys
import json
import re

def is_dangerous_remove_command(tool_name, tool_input):
    """Block any rm -rf commands"""
    if tool_name != "bash":
        return False

    command = tool_input.get("command", "")
    dangerous_patterns = [
        r'\brm\s+-rf\b',
        r'\brm\s+-fr\b',
        r'\brm\s+.*-[rf].*\*',
    ]

    return any(re.search(pattern, command) for pattern in dangerous_patterns)

def main():
    input_data = json.load(sys.stdin)
    tool_name = input_data.get("toolName")
    tool_input = input_data.get("toolInput", {})

    if is_dangerous_remove_command(tool_name, tool_input):
        # Block the command
        output = {
            "allow": False,
            "message": "❌ Blocked dangerous rm command"
        }
    else:
        output = {"allow": True}

    print(json.dumps(output))

if __name__ == "__main__":
    main()
```
|
||||
|
||||
**Configuration in settings.json** (an empty `matcher` matches all tools):

```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {},
        "commands": [
          "uv run .claude/hooks/pre-tool-use.py"
        ]
      }
    ]
  }
}
```
|
||||
|
||||
### 2. post-tool-use
|
||||
|
||||
**When it fires:** After a tool completes execution
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Log tool execution results
|
||||
- Track which tools are used most frequently
|
||||
- Measure tool execution time
|
||||
- Build observability dashboards
|
||||
- Summarize tool output with small models
|
||||
|
||||
**Available data:**
|
||||
|
||||
```json
|
||||
{
|
||||
"toolName": "write",
|
||||
"toolInput": {
|
||||
"file_path": "/path/to/file.py",
|
||||
"content": "..."
|
||||
},
|
||||
"toolResult": {
|
||||
"success": true,
|
||||
"output": "File written successfully"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Example: Event logging with summarization**
|
||||
|
||||
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import os
import urllib.request
from anthropic import Anthropic

def summarize_event(tool_name, tool_input, tool_result):
    """Use Haiku to summarize what happened"""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    prompt = f"""Summarize this tool execution in 1 sentence:
Tool: {tool_name}
Input: {json.dumps(tool_input, indent=2)}
Result: {json.dumps(tool_result, indent=2)}

Be concise and focus on what was accomplished."""

    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Small, fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}]
    )

    return response.content[0].text

def send_to_server(event):
    """POST the event to the observability server.

    Assumption: the server listens on http://localhost:3000/events,
    the endpoint used in the architecture section of this document.
    """
    request = urllib.request.Request(
        "http://localhost:3000/events",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(request, timeout=1)
    except OSError as e:
        # Observability failures must not break the agent
        print(f"Warning: failed to send event: {e}", file=sys.stderr)

def main():
    input_data = json.load(sys.stdin)

    # Generate summary using small model
    summary = summarize_event(
        input_data.get("toolName"),
        input_data.get("toolInput", {}),
        input_data.get("toolResult", {})
    )

    # Log the event with summary
    event = {
        "toolName": input_data.get("toolName"),
        "summary": summary,
        "timestamp": input_data.get("timestamp")
    }

    # Send to observability server
    send_to_server(event)

if __name__ == "__main__":
    main()
```
|
||||
|
||||
**Why small models?** "I've sent thousands of these events. I've spent less than 20 cents. This is where small fast models really shine."
|
||||
|
||||
### 3. notification
|
||||
|
||||
**When it fires:** When Claude Code needs user input (permission request)
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Text-to-speech notifications
|
||||
- Send alerts to phone/Slack
|
||||
- Log permission requests
|
||||
- Auto-approve specific tools
|
||||
|
||||
**Available data:**
|
||||
|
||||
```json
|
||||
{
|
||||
"message": "Your agent needs your input",
|
||||
"context": {
|
||||
"toolName": "bash",
|
||||
"command": "bun run apps/hello.ts"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Example: Text-to-speech notification**
|
||||
|
||||
```python
# .claude/hooks/notification.py
import sys
import json
import subprocess

def speak(text):
    """Use the ElevenLabs API for text-to-speech"""
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech-elevenlabs.py",
        text
    ])

def main():
    input_data = json.load(sys.stdin)
    message = input_data.get("message", "Your agent needs your input")

    # Speak the notification
    speak(message)

    # Log it
    print(json.dumps({"notified": True}))

if __name__ == "__main__":
    main()
```
|
||||
|
||||
### 4. stop
|
||||
|
||||
**When it fires:** Every time Claude Code finishes responding
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Copy full chat transcript for analysis
|
||||
- Completion notifications (text-to-speech)
|
||||
- Session logging
|
||||
- Performance metrics
|
||||
- Agent output summarization
|
||||
|
||||
**Available data:**
|
||||
|
||||
```json
|
||||
{
|
||||
"transcriptPath": "/path/to/chat-transcript.json",
|
||||
"sessionId": "abc123",
|
||||
"timestamp": "2025-01-05T14:30:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Example: Save full conversation**
|
||||
|
||||
```python
# .claude/hooks/stop.py
import sys
import json
import shutil
import subprocess
from pathlib import Path
from datetime import datetime

def main():
    input_data = json.load(sys.stdin)
    transcript_path = input_data.get("transcriptPath")

    if not transcript_path:
        return

    # Copy transcript to logs directory
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    session_id = input_data.get("sessionId", "unknown")

    logs_dir = Path(".claude/logs")
    logs_dir.mkdir(parents=True, exist_ok=True)

    dest = logs_dir / f"chat-{timestamp}-{session_id[:8]}.json"
    shutil.copy(transcript_path, dest)

    # Announce completion
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "All set and ready for your next step"
    ])

    print(json.dumps({"logged": True, "file": str(dest)}))

if __name__ == "__main__":
    main()
```
|
||||
|
||||
**Key insight:** "The stop event is the perfect time to copy the entire chat conversation. This is key for observability. What happened? How can we improve it?"
|
||||
|
||||
### 5. sub-agent-stop
|
||||
|
||||
**When it fires:** When a sub-agent completes its work
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- Track parallel sub-agent completion
|
||||
- Per-agent performance metrics
|
||||
- Multi-agent orchestration logging
|
||||
- Progress notifications for long-running jobs
|
||||
|
||||
**Available data:**
|
||||
|
||||
```json
|
||||
{
|
||||
"subAgentId": "agent-123",
|
||||
"transcriptPath": "/path/to/sub-agent-transcript.json",
|
||||
"sessionId": "parent-abc123",
|
||||
"timestamp": "2025-01-05T14:32:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Example: Sub-agent completion tracking**
|
||||
|
||||
```python
# .claude/hooks/sub-agent-stop.py
import sys
import json
import subprocess

def main():
    input_data = json.load(sys.stdin)

    # Log sub-agent completion
    event = {
        "type": "sub-agent-complete",
        "agentId": input_data.get("subAgentId"),
        "timestamp": input_data.get("timestamp")
    }

    # Send to observability system via the shared send-event.py utility
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",           # App name (placeholder)
        "sub-agent-stop",   # Event type
    ], input=json.dumps(event), text=True)

    # Announce
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/text-to-speech.py",
        "Sub agent complete"
    ])

if __name__ == "__main__":
    main()
```
|
||||
|
||||
## Multi-Agent Observability Architecture
|
||||
|
||||
When scaling to 3, 5, 10+ agents, hooks enable comprehensive system visibility.
|
||||
|
||||
### Architecture Overview
|
||||
|
||||
```text
┌─────────────────────────────────────────────────────────────┐
│                       Multiple Agents                       │
│   Agent 1    Agent 2    Agent 3     ...    Agent N          │
│      │          │          │                  │             │
│      └──────────┴──────────┴──────────────────┘             │
│                              │                              │
│                         Hooks fire                          │
│                              ↓                              │
└─────────────────────────────────────────────────────────────┘
                               │
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                       Bun/Node Server                       │
│  ┌────────────────┐         ┌──────────────┐                │
│  │ HTTP Endpoint  │────────→│  SQLite DB   │                │
│  │    /events     │         │ (persistence)│                │
│  └────────────────┘         └──────────────┘                │
│          │                                                  │
│          └────────────→ WebSocket Broadcast                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                   Web Client (Vue/React)                    │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Live Activity Pulse (1min/3min/5min windows)          │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Event Stream (filtered by app/session/event type)     │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Event Details (with AI summaries)                     │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### Key Design Principles
|
||||
|
||||
**1. One-Way Data Stream**
|
||||
|
||||
```text
|
||||
Agent → Hook → Server → Database + WebSocket → Client
|
||||
```
|
||||
|
||||
"This one-way data stream keeps things really simple. Every agent is responsible for summarizing their work in the hook before they send it off."
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Simple architecture
|
||||
- Easy to reason about
|
||||
- No bidirectional complexity
|
||||
- Fast real-time updates
|
||||
|
||||
**2. Event Summarization at the Edge**
|
||||
|
||||
```python
# In the hook (runs on agent side)
import os
import requests

def send_event(app_name, event_type, event_data, summarize=True):
    if summarize:
        # Use Haiku to summarize before sending
        # (summarize_with_haiku is defined in the implementation example below)
        summary = summarize_with_haiku(event_data, event_type)
        event_data["summary"] = summary

    # Send to server; a short timeout keeps a slow server from stalling the agent
    requests.post("http://localhost:3000/events", json={
        "app": app_name,
        "eventType": event_type,
        "data": event_data,
        "sessionId": os.getenv("CLAUDE_SESSION_ID")
    }, timeout=1)
```
|
||||
|
||||
**Why summarize at the edge?**
|
||||
|
||||
- Reduces server load
|
||||
- Cheaper (uses small models locally)
|
||||
- Human-readable summaries immediately available
|
||||
- No server-side LLM dependencies
|
||||
|
||||
**3. Persistent + Real-Time Storage**
|
||||
|
||||
```sql
-- SQLite schema
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
|
||||
|
||||
**Dual persistence:**
|
||||
|
||||
- SQLite for historical queries and analysis
|
||||
- WebSocket for live streaming to UI
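The persistence half can be tried out with Python's built-in `sqlite3` module. The sketch below reuses the schema above; the helper names (`open_db`, `store_event`, `recent_events`) are assumptions for illustration, and the document's actual server is Bun/Node rather than Python.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    source_app TEXT NOT NULL,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    raw_payload JSON,
    summary TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the events database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_event(conn, app, session_id, event_type, payload, summary=None) -> int:
    """Insert one event and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (source_app, session_id, event_type, raw_payload, summary) "
        "VALUES (?, ?, ?, ?, ?)",
        (app, session_id, event_type, json.dumps(payload), summary),
    )
    conn.commit()
    return cur.lastrowid


def recent_events(conn, session_id, limit=100):
    """Return (event_type, summary, payload) tuples for a session, newest first."""
    rows = conn.execute(
        "SELECT event_type, summary, raw_payload FROM events "
        "WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return [(etype, summary, json.loads(raw)) for etype, summary, raw in rows]
```

Storing the payload as a JSON string keeps the table schema stable while hook payloads vary by event type.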
|
||||
|
||||
### Implementation Example
|
||||
|
||||
**Hook script structure:**
|
||||
|
||||
```python
# .claude/hooks/utils/send-event.py
# /// script
# dependencies = ["anthropic", "requests"]
# ///

import sys
import json
import os
import requests
from anthropic import Anthropic

def summarize_with_haiku(event_data, event_type):
    """Generate 1-sentence summary using Haiku"""
    if event_type not in ["pre-tool-use", "post-tool-use"]:
        return None

    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    prompt = f"Summarize this {event_type} event in 1 sentence: {json.dumps(event_data)}"

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )

    return response.content[0].text

def send_event(app_name, event_type, event_data, summarize=False):
    """Send event to observability server"""

    payload = {
        "app": app_name,
        "sessionId": os.getenv("CLAUDE_SESSION_ID", "unknown"),
        "eventType": event_type,
        "data": event_data,
        "timestamp": event_data.get("timestamp")
    }

    if summarize:
        payload["summary"] = summarize_with_haiku(event_data, event_type)

    try:
        response = requests.post(
            "http://localhost:3000/events",
            json=payload,
            timeout=1
        )
        return response.status_code == 200
    except Exception as e:
        # Don't break agent if observability fails
        print(f"Warning: Failed to send event: {e}", file=sys.stderr)
        return False

def main():
    if len(sys.argv) < 3:
        print("Usage: send-event.py <app-name> <event-type> [--summarize]")
        sys.exit(1)

    app_name = sys.argv[1]
    event_type = sys.argv[2]
    summarize = "--summarize" in sys.argv

    # Read event data from stdin
    event_data = json.load(sys.stdin)

    success = send_event(app_name, event_type, event_data, summarize)
    print(json.dumps({"sent": success}))

if __name__ == "__main__":
    main()
```
|
||||
|
||||
**Using in hooks:**
|
||||
|
||||
```python
# .claude/hooks/post-tool-use.py
import sys
import json
import subprocess

def main():
    input_data = json.load(sys.stdin)

    # Send to observability system with summarization
    subprocess.run([
        "uv", "run",
        ".claude/hooks/utils/send-event.py",
        "my-app",         # App name
        "post-tool-use",  # Event type
        "--summarize"     # Generate AI summary
    ], input=json.dumps(input_data), text=True)

    print(json.dumps({"logged": True}))

if __name__ == "__main__":
    main()
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Use Isolated Scripts (Astral UV Pattern)
|
||||
|
||||
**Why:** Hooks should be self-contained, portable, and not depend on your codebase.
|
||||
|
||||
```python
|
||||
# /// script
|
||||
# dependencies = ["anthropic", "requests"]
|
||||
# ///
|
||||
|
||||
# Astral UV single-file script
|
||||
# Runs independently with: uv run script.py
|
||||
# Auto-installs dependencies
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Works in any codebase
|
||||
- No virtual environment setup
|
||||
- Portable across projects
|
||||
- Easy to test in isolation
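Testing in isolation can be as simple as piping a JSON payload into the script and parsing what it prints. The helper below is a minimal sketch; `run_hook` is a hypothetical name, and it assumes the hook reads one JSON object from stdin and writes one JSON object to stdout, as the examples in this document do.

```python
import json
import subprocess
import sys


def run_hook(command: list[str], payload: dict, timeout: float = 10.0) -> dict:
    """Run a hook command with `payload` as JSON on stdin and parse its JSON stdout."""
    result = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,  # keep the hook's stdout/stderr out of the terminal
        text=True,
        timeout=timeout,
    )
    return json.loads(result.stdout)
```

For example, `run_hook(["uv", "run", ".claude/hooks/pre-tool-use.py"], {"toolName": "bash", "toolInput": {"command": "rm -rf /"}})` should come back with `"allow": False` for the blocking hook shown earlier.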
|
||||
|
||||
**Alternative: Bun for TypeScript**
|
||||
|
||||
```typescript
|
||||
// .claude/hooks/post-tool-use.ts
|
||||
// Run with: bun run post-tool-use.ts
|
||||
|
||||
import { readFileSync } from "fs";

// File descriptor 0 is stdin; read the whole hook payload as UTF-8 text
const input = JSON.parse(readFileSync(0, "utf-8"));
|
||||
// ... hook logic
|
||||
```
|
||||
|
||||
### 2. Never Block the Agent
|
||||
|
||||
```python
def main():
    try:
        # Hook logic (send_to_server and event stand in for your own code)
        send_to_server(event)
    except Exception as e:
        # Log but don't fail
        print(f"Warning: {e}", file=sys.stderr)
        # Always output valid JSON
        print(json.dumps({"error": str(e)}))
```
|
||||
|
||||
**Rule:** If observability fails, the agent should continue working.
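One way to apply this rule uniformly is a small wrapper that every observability hook calls. This is a sketch under assumptions: `run_safely` is a hypothetical helper, and it presumes the hook's only contract is "print one JSON object and exit normally".

```python
import json
import sys


def run_safely(hook_logic, fallback=None):
    """Run `hook_logic()` and always print exactly one line of valid JSON.

    Any exception is reported on stderr and converted to a JSON error object,
    so a failing observability hook exits normally instead of crashing.
    """
    try:
        output = hook_logic()
    except Exception as e:  # deliberately broad: the hook must not crash
        print(f"Warning: {e}", file=sys.stderr)
        output = {"error": str(e)} if fallback is None else fallback
    print(json.dumps(output if output is not None else {}))
    return output
```

A hook's entry point then becomes `run_safely(main_logic)`. Hooks whose job is to block (such as the `pre-tool-use` validator) are a different case, since there a failure may need to deny the tool call rather than wave it through.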
|
||||
|
||||
### 3. Use Small Fast Models for Summaries
|
||||
|
||||
```text
|
||||
Cost comparison (1,000 events):
|
||||
├── Opus: $15 (overkill for summaries)
|
||||
├── Sonnet: $3 (still expensive)
|
||||
└── Haiku: $0.20 ✅ (perfect for this)
|
||||
```
|
||||
|
||||
"Thousands of events, less than 20 cents. Small fast cheap models shine here."
|
||||
|
||||
### 4. Hash Session IDs for UI Consistency
|
||||
|
||||
```python
import hashlib

def color_for_session(session_id):
    """Generate consistent color from session ID"""
    hash_val = int(hashlib.md5(session_id.encode()).hexdigest()[:6], 16)
    return f"#{hash_val:06x}"
```
|
||||
|
||||
**Result:** Same agent = same color in UI, making it easy to track.
|
||||
|
||||
### 5. Filter and Paginate Events
|
||||
|
||||
```javascript
// Client-side filtering
const filteredEvents = events
  .filter(e => e.app === selectedApp || selectedApp === "all")
  .filter(e => e.eventType === selectedType || selectedType === "all")
  .slice(0, 100); // Limit displayed events

// Auto-refresh
setInterval(() => fetchLatestEvents(), 5000);
```
|
||||
|
||||
### 6. Multiple Hooks Per Event
|
||||
|
||||
```json
{
  "hooks": {
    "stop": [
      {
        "matcher": {},
        "commands": [
          "uv run .claude/hooks/stop-chat-log.py",
          "uv run .claude/hooks/stop-tts.py",
          "uv run .claude/hooks/stop-notify.py"
        ]
      }
    ]
  }
}
```
|
||||
|
||||
**Hooks run sequentially** in the order specified.
|
||||
|
||||
### 7. Matcher Patterns for Selective Execution
|
||||
|
||||
```json
{
  "hooks": {
    "pre-tool-use": [
      {
        "matcher": {
          "toolName": "bash"
        },
        "commands": ["uv run .claude/hooks/bash-validator.py"]
      },
      {
        "matcher": {
          "toolName": "write",
          "toolInput": {
            "file_path": "**/.env"
          }
        },
        "commands": ["uv run .claude/hooks/block-env-write.py"]
      }
    ]
  }
}
```
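To reason about which hooks a given event would trigger, it can help to model the matcher as a function. The semantics below are assumptions made for illustration (an empty matcher matches everything, nested objects match recursively, string values are glob patterns); the source does not specify how Claude Code evaluates matchers, and `matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def matches(matcher: dict, event: dict) -> bool:
    """Return True if every key in `matcher` matches the corresponding event field.

    Assumed semantics, not confirmed by the source: an empty matcher matches
    everything, nested dicts are matched recursively, and string values are
    treated as case-sensitive glob patterns.
    """
    for key, expected in matcher.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif not isinstance(actual, str) or not fnmatchcase(actual, str(expected)):
            return False
    return True
```

Under these assumptions, the second matcher above fires for a `write` to `/app/.env` but not for a `write` to `/app/main.py`.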
|
||||
|
||||
## Directory Structure Best Practice
|
||||
|
||||
```text
.claude/
├── commands/                  # Slash commands
├── agents/                    # Sub-agent definitions
└── hooks/                     # ← New essential directory
    ├── settings.json          # Hook configuration
    ├── pre-tool-use.py
    ├── post-tool-use.py
    ├── notification.py
    ├── stop.py
    ├── sub-agent-stop.py
    └── utils/                 # Shared utilities
        ├── send-event.py
        ├── text-to-speech-elevenlabs.py
        ├── text-to-speech-openai.py
        └── summarize-haiku.py
```
|
||||
|
||||
## Real-World Use Cases
|
||||
|
||||
### Use Case 1: Block Dangerous Operations
|
||||
|
||||
```python
# .claude/hooks/pre-tool-use.py
import re

BLOCKED_COMMANDS = [
    r'\brm\s+-rf\b',                  # rm -rf
    r'\bsudo\s+rm\b',                 # sudo rm
    r'\bgit\s+push.*--force\b',       # git push --force
    r'\bdocker\s+system\s+prune\b',   # docker system prune
]

BLOCKED_FILES = [
    r'\.env$',
    r'credentials\.json$',
    r'\.ssh/id_rsa$',
    r'aws.*credentials',
]

def is_blocked(tool_name, tool_input):
    if tool_name == "bash":
        command = tool_input.get("command", "")
        return any(re.search(p, command) for p in BLOCKED_COMMANDS)

    if tool_name in ["read", "write", "edit"]:
        file_path = tool_input.get("file_path", "")
        return any(re.search(p, file_path) for p in BLOCKED_FILES)

    return False
```
|
||||
|
||||
### Use Case 2: Multi-Agent Task Board
|
||||
|
||||
```text
|
||||
Observability UI showing:
|
||||
|
||||
Active Agents (5):
|
||||
├── [Agent 1] Planning feature (12s ago)
|
||||
├── [Agent 2] Writing tests (45s ago) ⚠️ Needs input
|
||||
├── [Agent 3] Building UI (2m ago)
|
||||
├── [Agent 4] Deploying (5m ago) ✅ Complete
|
||||
└── [Agent 5] Monitoring (ongoing)
|
||||
|
||||
Recent Events (filtered: post-tool-use):
|
||||
├── Agent 3: Wrote src/components/Button.tsx
|
||||
├── Agent 1: Read src/api/endpoints.ts
|
||||
├── Agent 4: Bash: git push origin main
|
||||
└── Agent 2: Test failed: test/auth.test.ts
|
||||
```
|
||||
|
||||
### Use Case 3: Long-Running AFK Agents
|
||||
|
||||
```bash
|
||||
# Start agent with background work
|
||||
/background "Implement entire auth system" --report agents/auth-report.md
|
||||
|
||||
# Agent works autonomously
|
||||
# Hooks send notifications:
|
||||
# - "Starting authentication module"
|
||||
# - "Database schema created"
|
||||
# - "Tests passing"
|
||||
# - "All set and ready for your next step"
|
||||
|
||||
# You're notified via text-to-speech when complete
|
||||
```
|
||||
|
||||
### Use Case 4: Debugging Agent Behavior
|
||||
|
||||
```python
# Filter stop events to analyze full chat transcripts
import json

# `events` is assumed to be a list of event dicts loaded from the observability store
for event in (e for e in events if e["type"] == "stop"):
    with open(event["transcriptPath"]) as f:
        transcript = json.load(f)

    # Analyze:
    # - What files did agent read?
    # - What tools were used most?
    # - Where did agent get confused?
    # - What patterns led to errors?
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Webhook Timeouts
|
||||
|
||||
```python
# Don't block agent on slow external services
try:
    requests.post(webhook_url, json=event, timeout=0.5)  # 500ms max
except requests.RequestException:
    # Timeouts, refused connections, and similar failures: log locally instead
    log_to_file(event)
```
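The snippet above leaves `log_to_file` undefined. One possible shape for it is an append-only JSON Lines file; the default path and the companion `read_logged_events` reader are assumptions for illustration, not part of the original system.

```python
import json
from pathlib import Path


def log_to_file(event: dict, path: str = ".claude/logs/events.jsonl") -> None:
    """Append one event as a single JSON line, creating the directory if needed."""
    log_path = Path(path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_logged_events(path: str = ".claude/logs/events.jsonl") -> list[dict]:
    """Load every event previously written by log_to_file."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One line per event keeps appends cheap and lets a later job replay the file to the server once it is reachable again.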
|
||||
|
||||
### Database Size Management
|
||||
|
||||
```sql
|
||||
-- Rotate old events
|
||||
DELETE FROM events
|
||||
WHERE timestamp < datetime('now', '-30 days');
|
||||
|
||||
-- Or archive
|
||||
INSERT INTO events_archive SELECT * FROM events
|
||||
WHERE timestamp < datetime('now', '-30 days');
|
||||
|
||||
DELETE FROM events
|
||||
WHERE id IN (SELECT id FROM events_archive);
|
||||
```
|
||||
|
||||
### Event Batching
|
||||
|
||||
```python
# Batch events before sending.
# Note: this in-memory buffer only accumulates inside a long-lived process;
# a hook script launched once per event starts with an empty buffer each time.
events_buffer = []

def send_event(event):
    events_buffer.append(event)

    if len(events_buffer) >= 10:
        flush_events()

def flush_events():
    requests.post(server_url, json={"events": events_buffer})
    events_buffer.clear()
```
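When each hook command runs as its own short-lived process, the buffer has to live somewhere that survives between runs. The sketch below swaps the in-memory list for an on-disk JSON Lines file; `buffer_event`, its parameters, and the `send` callback are hypothetical, and there is no file locking, so hooks firing at the same moment from parallel agents could interleave writes.

```python
import json
from pathlib import Path


def buffer_event(event: dict, buffer_path: Path, batch_size: int = 10, send=None) -> bool:
    """Append `event` to an on-disk buffer; flush once `batch_size` events are queued.

    `send` is a callable taking a list of events (for example, a function that
    POSTs them to the observability server). Returns True if a flush happened.
    """
    buffer_path.parent.mkdir(parents=True, exist_ok=True)
    with buffer_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

    lines = buffer_path.read_text(encoding="utf-8").splitlines()
    if len(lines) < batch_size or send is None:
        return False

    send([json.loads(line) for line in lines])
    buffer_path.unlink()  # clear the buffer only after send() returned without raising
    return True
```

If `send` raises, the buffer file is left in place, so the queued events are retried on the next call.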
|
||||
|
||||
## Integration with Observability Platforms
|
||||
|
||||
### Datadog
|
||||
|
||||
```python
from datadog import statsd

def send_to_datadog(event):
    statsd.increment(f"claude.tool.{event['toolName']}")
    statsd.histogram(f"claude.duration.{event['toolName']}", event['duration'])
```
|
||||
|
||||
### Prometheus
|
||||
|
||||
```python
from prometheus_client import Counter, Histogram

tool_counter = Counter('claude_tool_executions', 'Tool executions', ['tool_name'])
tool_duration = Histogram('claude_tool_duration_seconds', 'Tool duration', ['tool_name'])

def send_to_prometheus(event):
    tool_counter.labels(tool_name=event['toolName']).inc()
    tool_duration.labels(tool_name=event['toolName']).observe(event['duration'])
```
|
||||
|
||||
### Slack
|
||||
|
||||
```python
import os
import requests

def send_to_slack(event):
    if event['eventType'] == 'notification':
        requests.post(
            os.getenv("SLACK_WEBHOOK_URL"),
            json={"text": f"🤖 Agent needs input: {event['message']}"}
        )
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **If you don't measure it, you can't improve it** - Observability is critical for scaling agents
|
||||
|
||||
2. **Keep hooks simple and isolated** - Use single-file scripts (uv, Bun, shell)
|
||||
|
||||
3. **Never block the agent** - Hooks should be fast and fault-tolerant
|
||||
|
||||
4. **Small models for summaries** - Haiku is perfect and costs pennies
|
||||
|
||||
5. **One-way data streams** - Simple architecture beats complex bidirectional systems
|
||||
|
||||
6. **Context, Model, Prompt** - Even with hooks, the big three still matter
|
||||
|
||||
## Source Attribution
|
||||
|
||||
**Primary source:** Multi-Agent Observability transcript (complete system architecture, WebSocket streaming, event summarization, SQLite persistence)
|
||||
|
||||
**Supporting source:** Hooked transcript (5 hooks fundamentals, pre-tool-use implementation, text-to-speech integration, isolated scripts pattern)
|
||||
|
||||
**Key quotes:**
|
||||
|
||||
- "When it comes to agentic coding, observability is everything." (Hooked)
|
||||
- "This one-way data stream keeps things really simple." (Multi-Agent Observability)
|
||||
- "Thousands of events, less than 20 cents. Small fast models shine here." (Multi-Agent Observability)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Hooks Reference](../reference/hooks-reference.md) - Complete API reference for all 5 hooks
|
||||
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real observability systems in action
|
||||
- [Core 4 Framework](../reference/core-4-framework.md) - Context, Model, Prompt, Tools
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Observability isn't optional when scaling agents. If you can't see what they're doing, you can't scale them effectively.
|
||||
673
skills/multi-agent-composition/patterns/orchestrator-pattern.md
Normal file
@@ -0,0 +1,673 @@
|
||||
# The Orchestrator Pattern
|
||||
|
||||
> "The rate at which you can create and command your agents becomes the constraint of your engineering output. When your agents are slow, you're slow."
|
||||
|
||||
The orchestrator pattern is **Level 5** of agentic engineering: managing fleets of agents through a single interface.
|
||||
|
||||
## The Journey to Orchestration
|
||||
|
||||
```text
|
||||
Level 1: Base agents → Use agents out of the box
|
||||
Level 2: Better agents → Customize prompts and workflows
|
||||
Level 3: More agents → Run multiple agents
|
||||
Level 4: Custom agents → Build specialized solutions
|
||||
Level 5: Orchestration → Manage fleets of agents ← You are here
|
||||
```
|
||||
|
||||
**Key realization:** Single agents hit context window limits. You need orchestration to scale beyond one agent.
|
||||
|
||||
## The Three Pillars
|
||||
|
||||
Multi-agent orchestration requires three components working together:
|
||||
|
||||
```text
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 1. ORCHESTRATOR AGENT │
|
||||
│ (Single interface to your fleet) │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 2. CRUD FOR AGENTS │
|
||||
│ (Create, Read, Update, Delete agents at scale) │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 3. OBSERVABILITY │
|
||||
│ (Monitor performance, costs, and results) │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Without all three, orchestration fails. You need:
|
||||
|
||||
- **Orchestrator** to command agents
|
||||
- **CRUD** to manage agent lifecycle
|
||||
- **Observability** to understand what agents are doing
|
||||
|
||||
## Core Principle: The Orchestrator Sleeps
|
||||
|
||||
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. It has created and commanded our agents. Now, our agents are doing the work."
|
||||
|
||||
**The pattern:**
|
||||
|
||||
```text
|
||||
1. User prompts Orchestrator
|
||||
2. Orchestrator creates specialized agents
|
||||
3. Orchestrator commands agents with detailed prompts
|
||||
4. Orchestrator SLEEPS (stops consuming context)
|
||||
5. Agents work autonomously
|
||||
6. Orchestrator wakes periodically to check status
|
||||
7. Orchestrator reports results to user
|
||||
8. Agents are deleted
|
||||
```
|
||||
|
||||
**Why orchestrator sleeps:**
|
||||
|
||||
- Protects its context window
|
||||
- Avoids observing all agent work (too much information)
|
||||
- Only wakes when needed to check status or command agents
|
||||
|
||||
**Example orchestrator sleep pattern:**
|
||||
|
||||
```python
# Orchestrator commands agents
orchestrator.create_agent("scout", task="Find relevant files")
orchestrator.create_agent("builder", task="Implement changes")

# Orchestrator sleeps, checking status every 15s
while not all_agents_complete():
    orchestrator.sleep(15)  # Not consuming context
    status = orchestrator.check_agent_status()
    orchestrator.log(status)

# Wake up to collect results
results = orchestrator.get_agent_results()
orchestrator.summarize_to_user(results)
```
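The snippet above is pseudocode. A runnable sketch of the same polling loop might look like the following; `wait_for_agents`, `get_statuses`, and the timeout are assumptions introduced here, and the status names are borrowed from the observability section of this document.

```python
import time
from typing import Callable, Dict


def wait_for_agents(
    get_statuses: Callable[[], Dict[str, str]],
    interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every agent is 'complete' or 'failed'; raise on timeout."""
    waited = 0.0
    while True:
        statuses = get_statuses()
        if all(s in ("complete", "failed") for s in statuses.values()):
            return statuses
        if waited >= timeout:
            raise TimeoutError(f"agents still running after {waited:.0f}s")
        sleep(interval)  # the orchestrator does no work between checks
        waited += interval


# Fake status feed standing in for a real agent runtime
snapshots = iter([
    {"scout": "working", "builder": "working"},
    {"scout": "complete", "builder": "working"},
    {"scout": "complete", "builder": "complete"},
])
final = wait_for_agents(lambda: next(snapshots), interval=15, sleep=lambda s: None)
print(final)
```

Only the compact status dictionary reaches the orchestrator on each wake-up, which is what keeps its context small. The timeout is an addition so a stuck agent cannot keep the loop alive forever.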
|
||||
|
||||
## Orchestration Patterns
|
||||
|
||||
### Pattern 1: Scout-Plan-Build (Sequential Chaining)
|
||||
|
||||
**Use case:** Complex tasks requiring multiple specialized steps
|
||||
|
||||
**Flow:**
|
||||
|
||||
```text
|
||||
User: "Migrate codebase to new SDK"
|
||||
↓
|
||||
Orchestrator creates Scout agents (4 parallel)
|
||||
├→ Scout 1: Search with Gemini
|
||||
├→ Scout 2: Search with Codex
|
||||
├→ Scout 3: Search with Haiku
|
||||
└→ Scout 4: Search with Flash
|
||||
↓
|
||||
Scouts output: relevant-files.md with exact locations
|
||||
↓
|
||||
Orchestrator creates Planner agent
|
||||
├→ Reads relevant-files.md
|
||||
├→ Scrapes documentation
|
||||
└→ Outputs: detailed-plan.md
|
||||
↓
|
||||
Orchestrator creates Builder agent
|
||||
├→ Reads detailed-plan.md
|
||||
├→ Executes implementation
|
||||
└→ Tests and validates
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
|
||||
- **Scout step offloads searching from Planner** (R&D framework: Reduce + Delegate)
|
||||
- **Multiple scout models** provide diverse perspectives
|
||||
- **Planner only sees relevant files**, not entire codebase
|
||||
- **Builder focused on execution**, not planning
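The parallel scout step can be sketched as below. This is an illustration under stated assumptions, not the source's implementation: the scouts are stub functions standing in for model-backed searches, and `run_scouts` is a hypothetical helper that merges their findings into one file-to-lines map, roughly what `relevant-files.md` would contain.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scouts(scouts, query):
    """Run each scout callable in parallel and merge their file hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(scouts))) as pool:
        results = list(pool.map(lambda scout: scout(query), scouts))

    merged = {}
    for hits in results:
        for path, lines in hits.items():
            merged.setdefault(path, set()).update(lines)
    # Sort for a stable report regardless of which scout finished first
    return {path: sorted(lines) for path, lines in sorted(merged.items())}


# Stub scouts (hypothetical); real ones would call different models
def scout_a(query):
    return {"src/client.py": [12, 40]}


def scout_b(query):
    return {"src/client.py": [40, 88], "src/auth.py": [5]}


relevant = run_scouts([scout_a, scout_b], "old SDK imports")
print(relevant)
```

Taking the union of hits reflects the "diverse perspectives" point: a location found by only one scout still reaches the planner, while duplicates collapse.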
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```bash
|
||||
# Composable slash commands
|
||||
/scout-plan-build "Migrate to new Claude Agent SDK"
|
||||
|
||||
# Internally runs:
|
||||
/scout "Find files needing SDK migration"
|
||||
/plan-with-docs docs=https://agent-sdk-docs.com
|
||||
/build plan=agents/plans/sdk-migration.md
|
||||
```
|
||||
|
||||
**Context savings:**
|
||||
|
||||
```text
|
||||
Without scouts:
|
||||
├── Planner searches entire codebase: 50k tokens
|
||||
├── Planner reads irrelevant files: 30k tokens
|
||||
└── Total wasted: 80k tokens
|
||||
|
||||
With scouts:
|
||||
├── 4 scouts search in parallel (isolated contexts)
|
||||
├── Planner reads only relevant-files.md: 5k tokens
|
||||
└── Savings: 75k tokens (94% reduction)
|
||||
```
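The arithmetic behind these figures is simple enough to check directly. The helper name `context_savings` is introduced here for illustration; the token counts are the ones quoted above.

```python
from typing import Tuple


def context_savings(before_tokens: int, after_tokens: int) -> Tuple[int, float]:
    """Return (tokens saved, percentage reduction) for the planner's context."""
    saved = before_tokens - after_tokens
    return saved, 100 * saved / before_tokens


saved, pct = context_savings(before_tokens=50_000 + 30_000, after_tokens=5_000)
print(f"{saved:,} tokens saved ({pct:.0f}% reduction)")
```

Note that this counts only the planner's context. The scouts still spend tokens searching, but they do so in isolated contexts that are discarded afterwards.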
|
||||
|
||||
### Pattern 2: Plan-Build-Review-Ship (Task Board)
|
||||
|
||||
**Use case:** Structured development lifecycle with quality gates
|
||||
|
||||
**Flow:**
|
||||
|
||||
```text
|
||||
User: "Update HTML titles across application"
|
||||
↓
|
||||
Task created → PLAN column
|
||||
↓
|
||||
Orchestrator creates Planner agent
|
||||
├→ Analyzes requirements
|
||||
├→ Creates implementation plan
|
||||
└→ Moves task to BUILD
|
||||
↓
|
||||
Orchestrator creates Builder agent
|
||||
├→ Reads plan
|
||||
├→ Implements changes
|
||||
├→ Runs tests
|
||||
└→ Moves task to REVIEW
|
||||
↓
|
||||
Orchestrator creates Reviewer agent
|
||||
├→ Checks implementation against plan
|
||||
├→ Validates tests pass
|
||||
└→ Moves task to SHIP
|
||||
↓
|
||||
Orchestrator creates Shipper agent
|
||||
├→ Creates git commit
|
||||
├→ Pushes to remote
|
||||
└→ Task complete
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
|
||||
- **Clear phases** with distinct responsibilities
|
||||
- **Each agent focused** on single phase
|
||||
- **Quality gates** between phases
|
||||
- **Failure isolation** - if builder fails, planner work preserved
|
||||
|
||||
**Visual representation:**
|
||||
|
||||
```text
|
||||
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
|
||||
│ PLAN │→ │ BUILD │→ │ REVIEW │→ │ SHIP │
|
||||
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
|
||||
│ Task A │ │ │ │ │ │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
└─────────┘ └─────────┘ └─────────┘ └─────────┘
|
||||
```
|
||||
|
||||
**Agent handoff:**
|
||||
|
||||
```python
# Orchestrator manages task board state
task = {
    "id": "update-titles",
    "status": "planning",
    "assigned_agent": "planner-001",
    "artifacts": []
}

# Planner completes
task["status"] = "building"
task["artifacts"].append("plan.md")
task["assigned_agent"] = "builder-001"

# Orchestrator hands off to builder
orchestrator.command_agent(
    "builder-001",
    f"Implement plan from {task['artifacts'][0]}"
)
```
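The handoff can be generalized into a small state machine over the board's columns. This is a sketch under assumptions: the column names (`plan`, `build`, `review`, `ship`, `done`) and the `advance` helper are introduced here and differ slightly from the `planning`/`building` status strings used above.

```python
from typing import Optional

COLUMNS = ["plan", "build", "review", "ship", "done"]


def advance(task: dict, artifact: Optional[str] = None,
            next_agent: Optional[str] = None) -> dict:
    """Move a task one column to the right, recording the finished phase's artifact."""
    position = COLUMNS.index(task["status"])
    if position == len(COLUMNS) - 1:
        raise ValueError(f"task {task['id']} is already done")
    if artifact is not None:
        task["artifacts"].append(artifact)
    task["status"] = COLUMNS[position + 1]
    task["assigned_agent"] = next_agent
    return task


task = {"id": "update-titles", "status": "plan",
        "assigned_agent": "planner-001", "artifacts": []}
advance(task, artifact="plan.md", next_agent="builder-001")
print(task["status"], task["assigned_agent"], task["artifacts"])
```

Because a task can only move one column at a time, every phase acts as a gate, and artifacts from earlier phases survive if a later agent fails.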
|
||||
|
||||
### Pattern 3: Scout-Builder (Two-Stage)
|
||||
|
||||
**Use case:** UI changes, targeted modifications
|
||||
|
||||
**Flow:**
|
||||
|
||||
```text
|
||||
User: "Create gray pills for app header information"
|
||||
↓
|
||||
Orchestrator creates Scout
|
||||
├→ Locates exact files and line numbers
|
||||
├→ Identifies patterns and conventions
|
||||
└→ Outputs: scout-report.md
|
||||
↓
|
||||
Orchestrator creates Builder
|
||||
├→ Reads scout-report.md
|
||||
├→ Implements precise changes
|
||||
└→ Outputs: modified files
|
||||
↓
|
||||
Orchestrator wakes, verifies, reports
|
||||
```
|
||||
|
||||
**Orchestrator sleep pattern:**
|
||||
|
||||
```python
# Orchestrator creates scout
orchestrator.create_agent("scout-header", task="Find header UI components")

# Orchestrator sleeps, checking every 15s
orchestrator.sleep_with_status_checks(interval=15)

# Scout completes, orchestrator wakes
scout_output = orchestrator.get_agent_output("scout-header")

# Orchestrator creates builder with scout's output
orchestrator.create_agent(
    "builder-ui",
    task=f"Create gray pills based on scout findings: {scout_output}"
)

# Orchestrator sleeps again
orchestrator.sleep_with_status_checks(interval=15)
```
|
||||
|
||||
## Context Window Protection
|
||||
|
||||
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
|
||||
|
||||
**The problem:** A single agent doing everything exhausts its context window
|
||||
|
||||
```text
|
||||
Single Agent Approach:
|
||||
├── Search codebase: 40k tokens
|
||||
├── Read files: 60k tokens
|
||||
├── Plan changes: 20k tokens
|
||||
├── Implement: 30k tokens
|
||||
├── Test: 15k tokens
|
||||
└── Total: 165k tokens (83% used!)
|
||||
```
|
||||
|
||||
**The solution:** Specialized agents with focused context
|
||||
|
||||
```text
|
||||
Orchestrator Approach:
|
||||
├── Orchestrator: 10k tokens (coordinates)
|
||||
├── Scout 1: 15k tokens (searches)
|
||||
├── Scout 2: 15k tokens (searches)
|
||||
├── Planner: 25k tokens (plans using scout output)
|
||||
├── Builder: 35k tokens (implements)
|
||||
└── Max per agent: ≤35k tokens (about 18% of a 200k window)
|
||||
```
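A quick way to sanity-check a breakdown like this is to compute each agent's share of the window. The sketch below uses the illustrative numbers above and the 200k window from the quote; the `utilization` helper is a name introduced here.

```python
CONTEXT_WINDOW = 200_000  # tokens, the window size quoted in this section

# Per-agent context usage from the orchestrator breakdown above
usage = {
    "orchestrator": 10_000,
    "scout-1": 15_000,
    "scout-2": 15_000,
    "planner": 25_000,
    "builder": 35_000,
}


def utilization(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Percentage of one context window that `tokens` occupies."""
    return 100 * tokens / window


for name, tokens in usage.items():
    print(f"{name:<12} {tokens:>7,} tokens  {utilization(tokens):5.1f}%")

peak = max(usage, key=usage.get)
print(f"peak: {peak} at {utilization(usage[peak]):.1f}%")
```

The figure that matters is the peak per agent, not the sum: each agent has its own window, so no single context comes close to the limit.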
|
||||
|
||||
**Key principle:** Agents are deletable temporary resources
|
||||
|
||||
```text
|
||||
1. Create agent for specific task
|
||||
2. Agent completes task
|
||||
3. DELETE agent (free memory)
|
||||
4. Create new agent for next task
|
||||
5. Repeat
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
# User: "Build documentation for frontend and backend"
|
||||
|
||||
# Orchestrator creates 3 agents
|
||||
/create-agent frontend-docs "Document frontend components"
|
||||
/create-agent backend-docs "Document backend APIs"
|
||||
/create-agent qa-docs "Combine and QA both docs"
|
||||
|
||||
# Work completes...
|
||||
|
||||
# Delete all agents when done
|
||||
/delete-all-agents
|
||||
|
||||
# Result: All agents gone, context freed
|
||||
```
|
||||
|
||||
**Why delete agents:**
|
||||
|
||||
- Frees context windows for new work
|
||||
- Prevents context accumulation
|
||||
- Enforces single-purpose design
|
||||
- Matches engineering principle: "The best code is no code at all"
|
||||
|
||||
## CRUD for Agents
|
||||
|
||||
The orchestrator needs full control over the agent lifecycle:
|
||||
|
||||
**Create:**
|
||||
|
||||
```python
agent_id = orchestrator.create_agent(
    name="scout-api",
    task="Find all API endpoints",
    model="haiku",  # Fast, cheap for search
    max_tokens=100000
)
```
|
||||
|
||||
**Read:**
|
||||
|
||||
```python
|
||||
# Check agent status
|
||||
status = orchestrator.get_agent_status(agent_id)
|
||||
# => {"status": "working", "progress": "60%", "context_used": "15k tokens"}
|
||||
|
||||
# Read agent output
|
||||
output = orchestrator.get_agent_output(agent_id)
|
||||
# => {"files_consumed": [...], "files_produced": [...]}
|
||||
```
|
||||
|
||||
**Update:**
|
||||
|
||||
```python
# Command existing agent with new task
orchestrator.command_agent(
    agent_id,
    "Now implement the changes based on your findings"
)
```
|
||||
|
||||
**Delete:**
|
||||
|
||||
```python
|
||||
# Single agent
|
||||
orchestrator.delete_agent(agent_id)
|
||||
|
||||
# All agents
|
||||
orchestrator.delete_all_agents()
|
||||
```
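To make the four operations concrete, here is a minimal in-memory sketch. `AgentRegistry` and `Agent` are hypothetical names, not part of any SDK; a real orchestrator would start and stop actual agent sessions where this code only updates a dictionary.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    task: str
    model: str = "sonnet"
    status: str = "creating"
    history: List[str] = field(default_factory=list)


class AgentRegistry:
    """In-memory stand-in for the orchestrator's agent bookkeeping."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def create(self, name: str, task: str, model: str = "sonnet") -> str:
        agent_id = uuid.uuid4().hex[:8]
        self._agents[agent_id] = Agent(name, task, model, status="working", history=[task])
        return agent_id

    def status(self, agent_id: str) -> dict:
        agent = self._agents[agent_id]
        return {"name": agent.name, "status": agent.status, "model": agent.model}

    def command(self, agent_id: str, task: str) -> None:
        agent = self._agents[agent_id]
        agent.task = task
        agent.history.append(task)
        agent.status = "working"

    def delete(self, agent_id: str) -> None:
        del self._agents[agent_id]

    def delete_all(self) -> None:
        self._agents.clear()

    def __len__(self) -> int:
        return len(self._agents)


registry = AgentRegistry()
scout_id = registry.create("scout-api", "Find all API endpoints", model="haiku")
registry.command(scout_id, "Now list the handler for each endpoint")
print(registry.status(scout_id))
registry.delete_all()
print(len(registry))
```

Keeping deletion as cheap as creation is the point of the pattern: agents are meant to be thrown away once their single task is done.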
|
||||
|
||||
## Observability Requirements
|
||||
|
||||
Without observability, orchestration is blind. You need:
|
||||
|
||||
### 1. Agent-Level Visibility
|
||||
|
||||
```text
|
||||
For each agent, track:
|
||||
├── Name and ID
|
||||
├── Status (creating, working, complete, failed)
|
||||
├── Context window usage
|
||||
├── Model and cost
|
||||
├── Files consumed
|
||||
├── Files produced
|
||||
└── Tool calls executed
|
||||
```
|
||||
|
||||
### 2. Cross-Agent Visibility
|
||||
|
||||
```text
|
||||
Fleet overview:
|
||||
├── Total agents active
|
||||
├── Total context consumed
|
||||
├── Total cost
|
||||
├── Agent dependencies (who's waiting on whom)
|
||||
└── Bottlenecks (slow agents blocking others)
|
||||
```
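The fleet overview can be derived from per-agent records. The sketch below is one possible shape, with assumptions labeled: `AgentRecord`, `fleet_overview`, and the `waiting_on` field are names introduced here, and the token and cost values are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentRecord:
    name: str
    status: str  # creating, working, complete, failed
    context_tokens: int
    cost_usd: float
    waiting_on: List[str] = field(default_factory=list)


def fleet_overview(records: List[AgentRecord]) -> dict:
    """Aggregate per-agent records into the fleet-level view described above."""
    active = [r for r in records if r.status in ("creating", "working")]

    # Map each agent to the active agents that are waiting on it
    blocked_by = {}
    for record in active:
        for dependency in record.waiting_on:
            blocked_by.setdefault(dependency, []).append(record.name)

    return {
        "active_agents": len(active),
        "total_context_tokens": sum(r.context_tokens for r in records),
        "total_cost_usd": round(sum(r.cost_usd for r in records), 4),
        "bottlenecks": {name: sorted(waiters) for name, waiters in blocked_by.items()},
    }


# Sample data for illustration only
records = [
    AgentRecord("scout-1", "complete", 15_000, 0.02),
    AgentRecord("planner", "working", 25_000, 0.10),
    AgentRecord("builder", "creating", 0, 0.0, waiting_on=["planner"]),
]
overview = fleet_overview(records)
print(overview)
```

Recording dependencies explicitly is what makes bottlenecks visible: here the planner shows up as the agent the builder is waiting on.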
|
||||
|
||||
### 3. Real-Time Streaming
|
||||
|
||||
```text
|
||||
User sees:
|
||||
├── Agent creation events
|
||||
├── Tool calls as they happen
|
||||
├── Progress updates
|
||||
├── Completion notifications
|
||||
└── Error alerts
|
||||
```
|
||||
|
||||
**Implementation:** See [Hooks for Observability](hooks-observability.md) for complete architecture
|
||||
|
||||
## Information Flow in Orchestrated Systems
|
||||
|
||||
```text
|
||||
User
|
||||
↓ (prompts)
|
||||
Orchestrator
|
||||
↓ (creates & commands)
|
||||
Agent 1 → Agent 2 → Agent 3
|
||||
↓ ↓ ↓
|
||||
(results flow back up)
|
||||
↓
|
||||
Orchestrator (summarizes)
|
||||
↓
|
||||
User
|
||||
```
|
||||
|
||||
**Critical understanding:** Agents never talk directly to the user. They report to the orchestrator.
|
||||
|
||||
**Example:**
|
||||
|
||||
```text
|
||||
# User prompts orchestrator
|
||||
user: "Summarize codebase"
|
||||
|
||||
# Orchestrator creates agent with detailed instructions
|
||||
orchestrator → agent: """
|
||||
Read all files in src/
|
||||
Create markdown summary with:
|
||||
- Architecture overview
|
||||
- Key components
|
||||
- File structure
|
||||
- Tech stack
|
||||
|
||||
Report results back to orchestrator (not user!)
|
||||
"""
|
||||
|
||||
# Agent completes, reports to orchestrator
|
||||
agent → orchestrator: "Summary complete at docs/summary.md"
|
||||
|
||||
# Orchestrator reports to user
|
||||
orchestrator → user: "Codebase summary created with 3 main sections: architecture, components, and tech stack"
|
||||
```
|
||||
|
||||
## When to Use Orchestration
|
||||
|
||||
### Use orchestration when
|
||||
|
||||
✅ **Task requires 3+ specialized agents**
|
||||
|
||||
- Example: Scout + Plan + Build
|
||||
|
||||
✅ **Context window exploding in single agent**
|
||||
|
||||
- Single agent using >150k tokens
|
||||
|
||||
✅ **Need parallel execution**
|
||||
|
||||
- Multiple independent subtasks
|
||||
|
||||
✅ **Quality gates required**
|
||||
|
||||
- Plan → Build → Review → Ship
|
||||
|
||||
✅ **Long-running autonomous work**
|
||||
|
||||
- Agents work while you're AFK
|
||||
|
||||
### Don't use orchestration when
|
||||
|
||||
❌ **Simple one-off task**
|
||||
|
||||
- Single agent sufficient
|
||||
|
||||
❌ **Learning/prototyping**
|
||||
|
||||
- Orchestration adds complexity
|
||||
|
||||
❌ **No observability infrastructure**
|
||||
|
||||
- You'll be blind to agent behavior
|
||||
|
||||
❌ **Haven't mastered custom agents**
|
||||
|
||||
- Level 5 requires Level 4 foundation
|
||||
|
||||
## Practical Implementation
|
||||
|
||||
### Minimal Orchestrator Agent
|
||||
|
||||
**`orchestrator-agent.md` (sub-agent definition):**

````markdown
---
name: orchestrator
description: Manages fleet of agents for complex multi-step tasks
---

# Orchestrator Agent

You are an orchestrator agent managing a fleet of specialized agents.

## Your Tools

- create_agent(name, task, model): Create new agent
- command_agent(agent_id, task): Send task to existing agent
- get_agent_status(agent_id): Check agent progress
- get_agent_output(agent_id): Retrieve agent results
- delete_agent(agent_id): Remove completed agent
- delete_all_agents(): Clean up all agents

## Your Responsibilities

1. **Break down user requests** into specialized subtasks
2. **Create focused agents** for each subtask
3. **Command agents** with detailed instructions
4. **Monitor progress** without micromanaging
5. **Collect results** and synthesize for user
6. **Delete agents** when work is complete

## Orchestrator Sleep Pattern

After creating and commanding agents:

1. **SLEEP** - Stop consuming context
2. **Wake every 15-30s** to check agent status
3. **SLEEP again** if agents still working
4. **Wake when all complete** to collect results

DO NOT observe all agent work. This explodes your context window.

## Example Workflow

```text
User: "Migrate codebase to new SDK"

You:

1. Create scout agents (parallel search)
2. Command scouts to find SDK usage
3. SLEEP (check status every 15s)
4. Wake when scouts complete
5. Create planner agent
6. Command planner with scout results
7. SLEEP (check status every 15s)
8. Wake when planner completes
9. Create builder agent
10. Command builder with plan
11. SLEEP (check status every 15s)
12. Wake when builder completes
13. Summarize results for user
14. Delete all agents
```

## Key Principles

- **One agent, one task** - Don't overload agents
- **Sleep between phases** - Protect your context
- **Delete when done** - Treat agents as temporary
- **Detailed commands** - Don't assume agents know context
- **Results-oriented** - Every agent must produce concrete output
````
|
||||
|
||||
### Orchestrator Tools (SDK)
|
||||
|
||||
```python
# Illustrative sketch: `mcptool` and `agent_manager` are placeholders for
# your SDK's tool decorator and agent runtime, not a specific library API.

# create_agent tool
@mcptool(
    name="create_agent",
    description="Create a new specialized agent"
)
def create_agent(params: dict) -> dict:
    name = params["name"]
    task = params["task"]
    model = params.get("model", "sonnet")

    agent_id = agent_manager.create(
        name=name,
        system_prompt=task,
        model=model
    )

    return {
        "agent_id": agent_id,
        "status": "created",
        "message": f"Agent {name} created"
    }

# command_agent tool
@mcptool(
    name="command_agent",
    description="Send task to existing agent"
)
def command_agent(params: dict) -> dict:
    agent_id = params["agent_id"]
    task = params["task"]

    result = agent_manager.prompt(agent_id, task)

    return {
        "agent_id": agent_id,
        "status": "commanded",
        "message": "Agent received task"
    }
```
|
||||
|
||||
## Trade-offs
|
||||
|
||||
### Benefits
|
||||
|
||||
- ✅ Scales beyond single agent limits
|
||||
- ✅ Parallel execution (3x-10x speedup)
|
||||
- ✅ Context window protection
|
||||
- ✅ Specialized agent focus
|
||||
- ✅ Quality gates between phases
|
||||
- ✅ Autonomous out-of-loop work
|
||||
|
||||
### Costs
|
||||
|
||||
- ❌ Upfront investment to build
|
||||
- ❌ Infrastructure complexity (database, WebSocket)
|
||||
- ❌ More moving parts to manage
|
||||
- ❌ Requires observability
|
||||
- ❌ Orchestrator agent needs careful prompting
|
||||
- ❌ Not worth it for simple tasks
|
||||
|
||||
## Key Quotes
|
||||
|
||||
> "The orchestrator agent is the first pattern where I felt the perfect combination of observability, customizability, and agents at scale."
|
||||
>
|
||||
> "Treat your agents as deletable temporary resources that serve a single purpose."
|
||||
>
|
||||
> "Our orchestrator has stopped doing work. Its orchestration tasks are completed. Now, our agents are doing the work."
|
||||
>
|
||||
> "200k context window is plenty. You're just stuffing a single agent with too much work."
|
||||
|
||||
## Source Attribution
|
||||
|
||||
**Primary source:** One Agent to Rule Them All (orchestrator architecture, three pillars, sleep pattern, CRUD)
|
||||
|
||||
**Supporting sources:**
|
||||
|
||||
- Claude 2.0 (scout-plan-build workflow, composable prompts)
|
||||
- Custom Agents (plan-build-review-ship task board)
|
||||
- Sub-Agents (information flow, delegation patterns)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Hooks for Observability](hooks-observability.md) - Required for orchestration
|
||||
- [Context Window Protection](context-window-protection.md) - Why orchestration matters
|
||||
- [Multi-Agent Case Studies](../examples/multi-agent-case-studies.md) - Real orchestration systems
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Orchestration is Level 5. Master Levels 1-4 first. Then build your fleet.
|
||||
408
skills/multi-agent-composition/reference/architecture.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# Core Concepts: Claude Code Architecture
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Executive Summary](#executive-summary)
- [The Core 4 Framework](#the-core-4-framework)
- [Component Definitions](#component-definitions)
  - [Skills](#skills)
  - [MCP Servers (External Data Sources)](#mcp-servers-external-data-sources)
  - [Sub-Agents](#sub-agents)
  - [Slash Commands (Custom Prompts)](#slash-commands-custom-prompts)
- [Compositional Hierarchy](#compositional-hierarchy)
- [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
  - [Three-Level Loading Mechanism](#three-level-loading-mechanism)
- [How They Relate](#how-they-relate)
- [When to Use Each Component](#when-to-use-each-component)
  - [Use Skills When](#use-skills-when)
  - [Use Sub-Agents When](#use-sub-agents-when)
  - [Use Slash Commands When](#use-slash-commands-when)
  - [Use MCP Servers When](#use-mcp-servers-when)
- [Critical Insights and Warnings](#critical-insights-and-warnings)
  - [1. Don't Convert All Slash Commands to Skills](#1-dont-convert-all-slash-commands-to-skills)
  - [2. Skills Are Not Replacements](#2-skills-are-not-replacements)
  - [3. One-Off Tasks Don't Need Skills](#3-one-off-tasks-dont-need-skills)
  - [4. Master the Fundamentals First](#4-master-the-fundamentals-first)
  - [5. Prompts Are Non-Negotiable](#5-prompts-are-non-negotiable)
- [Skills: Honest Assessment](#skills-honest-assessment)
  - [Pros](#pros)
  - [Cons](#cons)
- [Evolution Path](#evolution-path)
- [Context Management](#context-management)
- [Key Quotes for Reference](#key-quotes-for-reference)
- [Summary](#summary)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Claude Code's architecture is built on a fundamental principle: **prompts are the primitive foundation** for everything. This document provides the authoritative reference for understanding Skills, Sub-Agents, MCP Servers, and Slash Commands—how they work, how they relate, and how they compose.
|
||||
|
||||
**Key insight:** Skills are powerful compositional units but should NOT replace the fundamental building blocks (prompts, sub-agents, MCPs). They orchestrate these primitives to solve repeat problems in an agent-first way.
|
||||
|
||||
## The Core 4 Framework
|
||||
|
||||
**Everything comes down to four pieces:**
|
||||
|
||||
1. **Context**
|
||||
2. **Model**
|
||||
3. **Prompt**
|
||||
4. **Tools**
|
||||
|
||||
> "If you understand these, if you can build and manage these, you will win. Why is that? It's because every agent is the core 4. And every feature that every one of these agent coding tools is going to build is going to build directly on the core 4. This is the foundation."
|
||||
|
||||
This is the thinking framework for understanding and building with Claude Code.
|
||||
|
||||
## Component Definitions
|
||||
|
||||
### Skills
|
||||
|
||||
**What they are:** A dedicated, modular solution that packages a domain-specific capability for autonomous, repeatable workflows.
|
||||
|
||||
**Triggering:** Agent-invoked. Claude autonomously decides when to use them based on your request and the Skill's description. You don't explicitly invoke them—they activate automatically when relevant.
|
||||
|
||||
**Context and structure:** High modularity with a dedicated directory structure. Supports progressive disclosure (metadata, instructions, resources) and context persistence within the skill's scope.
|
||||
|
||||
**Composition:** Can use prompts, other skills, MCP servers, and sub-agents. They sit on top of other capabilities and can orchestrate them through instructions.
|
||||
|
||||
**Best use cases:** Automatic or recurring behavior that you want to reuse across workflows (e.g., a work-tree manager that handles create, list, remove, merge, update operations).
|
||||
|
||||
**Not a replacement for:** MCP servers, sub-agents, or slash commands. Skills are a higher-level composition unit that coordinates these primitives.
|
||||
|
||||
**Critical insight:** Skills don't directly execute code; they provide declarative guidance that coordinates multiple components. When a skill activates, Claude reads the instructions and uses available tools to follow the workflow.
|
||||
|
||||
### MCP Servers (External Data Sources)
|
||||
|
||||
**What they are:** External data sources or tools integrated into agents through the Model Context Protocol (MCP).
|
||||
|
||||
**Triggering:** Typically invoked as needed, often by skills or prompts.
|
||||
|
||||
**Context:** They don't bundle a workflow; they connect to external systems and bring in data/services.
|
||||
|
||||
**Composition:** Can be used within skills or prompts to fetch data or perform actions with external tools.
|
||||
|
||||
**Best use cases:** Connecting to Jira, databases, GitHub, Figma, Slack, and hundreds of other external services. Bundling multiple services together for exposure to the agent.
|
||||
|
||||
**Practical examples:**
|
||||
- Implement features from issue trackers: "Add the feature described in JIRA issue ENG-4521"
|
||||
- Query databases: "Find emails of 10 random users based on our Postgres database"
|
||||
- Integrate designs: "Update our email template based on new Figma designs"
|
||||
- Automate workflows: "Create Gmail drafts inviting these users to a feedback session"
|
||||
|
||||
**Clear differentiation:** MCP = external integration, Skills = internal orchestration.
|
||||
|
||||
**Plugin integration:** Plugins can bundle MCP servers that start automatically when the plugin is enabled, providing tools and integrations team-wide.
|
||||
|
||||
### Sub-Agents
|
||||
|
||||
**What they are:** Isolated workflows with separate contexts that can run in parallel.
|
||||
|
||||
**Triggering:** Invoked by the main agent to do a task in parallel without polluting the main context.
|
||||
|
||||
**Context and isolation:** Each sub-agent uses its own context window separate from main conversation. This prevents context pollution and enables longer overall sessions.
|
||||
|
||||
**Composition:** Can be used inside skills and prompts, but you **cannot nest sub-agents inside other sub-agents** (hard limit to prevent infinite nesting).
|
||||
|
||||
**Best use cases:** Parallelizable, isolated tasks (e.g., bulk/scale tasks like fixing failing tests, batch operations, comprehensive audits).
|
||||
|
||||
**Critical constraint:** You must be okay with losing context afterward—sub-agent context doesn't persist in the main conversation.
|
||||
|
||||
**Resumable sub-agents:** Each execution gets a unique `agentId`, and its transcript is stored in `agent-{agentId}.jsonl`. Sub-agents can be resumed to continue previous conversations, useful for:
|
||||
- Long-running research across multiple sessions
|
||||
- Iterative refinement without losing context
|
||||
- Multi-step workflows with maintained context
|
||||
|
||||
**Model selection:** Sub-agents support a `model` field that takes a model alias (`sonnet`, `opus`, `haiku`) or `'inherit'` to use the main conversation's model.
|
||||
|
||||
### Slash Commands (Custom Prompts)
|
||||
|
||||
**What they are:** The primitive, reusable prompts you invoke manually. The closest compositional unit to "bare metal agent plus LLM."
|
||||
|
||||
**Triggering:** Manual triggers by a user (or by a higher-level unit like a sub-agent or skill via the SlashCommand tool).
|
||||
|
||||
**Context:** They're the most fundamental unit. You should master prompt design here.
|
||||
|
||||
**Composition:** Can be used alone or as building blocks inside skills, sub-agents, and MCPs. Acts as BOTH primitive AND composition point.
|
||||
|
||||
**Best use cases:** One-off tasks or basic, repeatable prompts. The starting point for building more complex capabilities.
|
||||
|
||||
**Critical principle:**
|
||||
|
||||
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming. If you don't know how to build and manage prompts, you will lose."
|
||||
|
||||
**SlashCommand tool:** Claude can programmatically invoke custom slash commands via the `SlashCommand` tool during conversations. Both skills and sub-agents compose prompts using this tool.
|
||||
|
||||
**Advanced features:**
|
||||
- **Bash execution:** Use `!` prefix to execute bash commands before the slash command runs
|
||||
- **File references:** Use `@` prefix to include file contents
|
||||
- **Arguments:** Support `$ARGUMENTS` for all args or `$1`, `$2` for individual parameters
|
||||
- **Frontmatter:** Control `allowed-tools`, `model`, `description`, `argument-hint`
|
||||
|
||||
**Comparison to Skills:**
|
||||
|
||||
| Aspect         | Slash Commands              | Skills                              |
|----------------|-----------------------------|-------------------------------------|
| **Complexity** | Simple prompts              | Complex capabilities                |
| **Structure**  | Single .md file             | Directory with SKILL.md + resources |
| **Discovery**  | Explicit (`/command`)       | Automatic (context-based)           |
| **Files**      | One file only               | Multiple files, scripts, templates  |
|
||||
|
||||
## Compositional Hierarchy
|
||||
|
||||
**Skills sit at the top of the composition hierarchy:**
|
||||
|
||||
```text
Skills (Top Compositional Layer)
├─→ Can use: MCP Servers
├─→ Can use: Sub-Agents
├─→ Can use: Slash Commands
└─→ Can use: Other Skills

Slash Commands (Primitive + Compositional)
├─→ Can use: Skills (via SlashCommand tool)
├─→ Can use: MCP Servers
├─→ Can use: Sub-Agents
└─→ Acts as BOTH primitive AND composition point

Sub-Agents (Execution Layer)
├─→ Can use: Slash Commands (via SlashCommand tool)
├─→ Can use: Skills (via SlashCommand tool)
└─→ CANNOT use: Other Sub-Agents (hard limit)

MCP Servers (Integration Layer)
└─→ Lower level unit, used BY skills, not using skills
```
|
||||
|
||||
**Key principle:** Skills provide **coordinated guidance** for repeatable workflows. They orchestrate other components through instructions, not by executing code directly.
|
||||
|
||||
**Verified restrictions:**
|
||||
- Sub-agents cannot nest (no sub-agent spawning other sub-agents)
|
||||
- Skills don't execute code; they guide Claude to use available tools
|
||||
- Slash commands can be invoked manually or via `SlashCommand` tool
|
||||
|
||||
## Progressive Disclosure Architecture
|
||||
|
||||
### Three-Level Loading Mechanism
|
||||
|
||||
Skills use a sophisticated loading system that minimizes context usage:
|
||||
|
||||
**Level 1: Metadata (always loaded)** - ~100 tokens per skill
|
||||
- YAML frontmatter with `name` and `description`
|
||||
- Loaded at startup into system prompt
|
||||
- Enables discovery without context penalty
|
||||
- You can install many Skills with minimal overhead
|
||||
|
||||
**Level 2: Instructions (loaded when triggered)** - Under 5k tokens
|
||||
- Main SKILL.md body with procedural knowledge
|
||||
- Read from filesystem via bash when skill activates
|
||||
- Only enters context when the skill is relevant
|
||||
- Contains workflows, best practices, guidance
|
||||
|
||||
**Level 3: Resources (loaded as needed)** - Effectively unlimited
|
||||
- Additional markdown files, scripts, templates
|
||||
- Executed via bash without loading contents into context
|
||||
- Scripts provide deterministic operations efficiently
|
||||
- No context penalty for bundled content that isn't used
|
||||
|
||||
**Example skill structure:**
|
||||
|
||||
```text
work-tree-manager/
├── SKILL.md              # Main instructions (Level 2)
├── reference.md          # Detailed reference (Level 3)
├── examples.md           # Usage examples (Level 3)
└── scripts/
    ├── validate.py       # Utility script (Level 3, executed)
    └── cleanup.py        # Cleanup script (Level 3, executed)
```
|
||||
|
||||
When this skill activates:
|
||||
1. Claude already knows the skill exists (Level 1 metadata pre-loaded)
|
||||
2. Claude reads SKILL.md when the skill is relevant (Level 2)
|
||||
3. Claude reads reference.md only if needed (Level 3)
|
||||
4. Claude executes scripts without loading their code (Level 3)
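
The loading steps above can be sketched in Python. This is a minimal illustration of the idea, not how Claude Code loads skills: `SkillEntry`, `split_frontmatter`, and `discover_skills` are hypothetical names, and the frontmatter parser assumes simple one-line `key: value` pairs (it does not handle folded YAML such as `description: >`).

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md file into one-line `key: value` metadata and the body."""
    if not text.startswith("---\n"):
        return {}, text
    header, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, body.lstrip("\n")


@dataclass
class SkillEntry:
    """Level 1 metadata plus a pointer to the rest of the skill on disk."""
    name: str
    description: str
    path: Path

    def load_instructions(self) -> str:
        # Level 2: read the SKILL.md body only when the skill is triggered.
        return split_frontmatter(self.path.read_text(encoding="utf-8"))[1]


def discover_skills(root: Path) -> list[SkillEntry]:
    # Level 1: keep only name and description in memory at startup.
    entries = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_file.read_text(encoding="utf-8"))
        entries.append(SkillEntry(meta.get("name", skill_file.parent.name),
                                  meta.get("description", ""), skill_file))
    return entries


def demo() -> tuple[str, str]:
    """Create a throwaway skill directory and load it level by level."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "work-tree-manager"
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text(
            "---\nname: work-tree-manager\n"
            "description: Manage git work trees\n---\n\n# Instructions\n",
            encoding="utf-8")
        skill = discover_skills(Path(tmp))[0]
        return skill.description, skill.load_instructions()
```

The design point is that `discover_skills` holds only the short metadata, while the body is read from disk by `load_instructions` at the moment it is needed.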
|
||||
|
||||
**Key advantage:** Unlike MCP servers which load all context at startup, Skills are extremely context-efficient. Progressive disclosure means only relevant content occupies the context window at any given time.
|
||||
|
||||
## How They Relate
|
||||
|
||||
**Prompts / slash commands are the primitive building blocks.**
|
||||
|
||||
- Master these first before anything else
|
||||
- "Everything is a prompt in the end. It's tokens in, tokens out."
|
||||
- Strong bias towards slash commands for simple tasks
|
||||
|
||||
**Sub-agents are for isolated, parallelizable tasks with separate contexts.**
|
||||
|
||||
- Use when you see the keyword "parallel"
|
||||
- Nothing else supports parallel calling
|
||||
- Critical for large-scale tasks and batch operations
|
||||
|
||||
**MCP servers connect to external systems and data sources.**
|
||||
|
||||
- Very little overlap with Skills
|
||||
- These are fully distinct components
|
||||
- Clear separation: external (MCP) vs internal (Skills)
|
||||
|
||||
**Skills are higher-level, domain-specific bundles that orchestrate or compose prompts, sub-agents, and MCP servers to solve repeat problems.**
|
||||
|
||||
- Use for MANAGEMENT problems, not one-off tasks
|
||||
- Keywords: "automatic," "repeat," "manage"
|
||||
- Don't convert all slash commands to skills—this is a huge mistake
|
||||
|
||||
## When to Use Each Component
|
||||
|
||||
### Use Skills When
|
||||
|
||||
- You have a **REPEAT** problem that needs **MANAGEMENT**
|
||||
- Multiple related operations need coordination
|
||||
- You want **automatic** behavior
|
||||
- Example: Managing git work trees (create, list, remove, merge, update)
|
||||
|
||||
**Not for:**
|
||||
- One-off tasks
|
||||
- Simple operations
|
||||
- Problems solved well by a single prompt
|
||||
|
||||
### Use Sub-Agents When
|
||||
|
||||
- **Parallelization** is needed
|
||||
- **Context isolation** is required
|
||||
- Scale tasks and batch operations
|
||||
- You're okay with losing context afterward
|
||||
|
||||
**Signal words:** "parallel," "scale," "bulk," "isolated"
|
||||
|
||||
### Use Slash Commands When
|
||||
|
||||
- One-off tasks
|
||||
- Simple repeatable actions
|
||||
- You're starting a new workflow
|
||||
- Building the primitive before composing
|
||||
|
||||
**Remember:** "Have a strong bias towards slash commands."
|
||||
|
||||
### Use MCP Servers When
|
||||
|
||||
- External integrations are needed
|
||||
- Data sources outside Claude Code
|
||||
- Third-party services
|
||||
- Database connections
|
||||
|
||||
**Clear rule:** External = MCP, Internal orchestration = Skills
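
The selection rules in this section can be condensed into a small decision function. This is a sketch under assumptions: the function name, the flag names, and the order in which the checks run are illustrative choices, not something the source prescribes.

```python
def choose_component(*, external: bool = False, parallel: bool = False,
                     isolated: bool = False, repeat: bool = False,
                     needs_management: bool = False) -> str:
    """Pick a Claude Code component from the signals described above."""
    # External systems and data sources are the job of MCP servers.
    if external:
        return "mcp-server"
    # "Parallel" and "isolated" are the signal words for sub-agents.
    if parallel or isolated:
        return "sub-agent"
    # Skills are for repeat problems that need management.
    if repeat and needs_management:
        return "skill"
    # Default: keep a strong bias towards slash commands.
    return "slash-command"


print(choose_component(repeat=True, needs_management=True))  # skill
print(choose_component())                                    # slash-command
```

Note that a repeatable task without a management need still falls through to a slash command, which matches the advice not to convert every slash command into a skill.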
|
||||
|
||||
## Critical Insights and Warnings
|
||||
|
||||
### 1. Don't Convert All Slash Commands to Skills
|
||||
|
||||
> "There are a lot of engineers right now that are going all in on skills. They're converting all their slash commands to skills. I think that's a huge mistake."
|
||||
|
||||
Keep your slash commands. They are the primitive foundation.
|
||||
|
||||
### 2. Skills Are Not Replacements
|
||||
|
||||
> "It is very clear this does not replace any existing feature or capability. It is a higher compositional level."
|
||||
|
||||
Skills complement other components; they don't replace them.
|
||||
|
||||
### 3. One-Off Tasks Don't Need Skills
|
||||
|
||||
> "If you can do the job with a sub-agent or custom slash command and it's a one-off job, do not use a skill. This is not what skills are for."
|
||||
|
||||
Use the right tool for the job. Not everything needs a skill.
|
||||
|
||||
### 4. Master the Fundamentals First
|
||||
|
||||
> "When you're starting out, I always recommend you just build a prompt. Don't build a skill. Don't build a sub-agent. Don't build out an MCP server. Keep it simple. Build a prompt."
|
||||
|
||||
Start simple. Build upward from primitives.
|
||||
|
||||
### 5. Prompts Are Non-Negotiable
|
||||
|
||||
> "Do not give away the prompt. The prompt is the fundamental unit of knowledge work and of programming."
|
||||
|
||||
Everything comes back to prompts. Master them first.
|
||||
|
||||
## Skills: Honest Assessment
|
||||
|
||||
### Pros
|
||||
|
||||
1. **Agent-invoked** - Dial up the autonomy knob to 11
|
||||
2. **Context protection** - Progressive disclosure unlike MCP servers
|
||||
3. **Dedicated file system pattern** - Logically compose and group skills together
|
||||
4. **Composability** - Can compose other elements or features
|
||||
5. **Agentic approach** - Agent just does the right thing
|
||||
|
||||
**Biggest value:** "Dedicated isolated file system pattern" + "agent invoked"
|
||||
|
||||
### Cons
|
||||
|
||||
1. **Doesn't go all the way** - No first-class support for embedding prompts and sub-agents directly in skill directories (must use SlashCommand tool to compose them)
|
||||
|
||||
2. **Reliability in complex chains is uncertain** - "Will the agent actually use the right skills when chained? I think individually it's less concerning but when you stack these up... how reliable is that?"
|
||||
|
||||
3. **Limited innovation** - Skills are effectively "curated prompt engineering plus modularity." The real innovation is having a dedicated, opinionated way to operate agents.
|
||||
|
||||
**Rating:** "8 out of 10"
|
||||
|
||||
**Bottom line:** "Having a dedicated specific way to operate your agents in an agent first way is still powerful."
|
||||
|
||||
## Evolution Path
|
||||
|
||||
The proper progression for building with Claude Code:
|
||||
|
||||
1. **Start with a prompt/slash command** - Solve the basic problem
|
||||
2. **Add sub-agent if parallelism needed** - Scale to multiple parallel operations
|
||||
3. **Create skill when management needed** - Bundle multiple related operations
|
||||
4. **Add MCP if external data needed** - Integrate external systems
|
||||
|
||||
**Example: Git Work Trees**
|
||||
|
||||
- **Prompt:** Create one work tree ✓
|
||||
- **Sub-agent:** Create multiple work trees in parallel ✓
|
||||
- **Skill:** Manage work trees (create, list, remove, merge, update) ✓
|
||||
- **MCP:** Query external repo metadata (if needed) ✓
|
||||
|
||||
## Context Management
|
||||
|
||||
**Progressive Disclosure (Skills):**
|
||||
|
||||
Skills are very context efficient. Three levels of progressive disclosure ensure only relevant content is loaded:
|
||||
1. Metadata level (always in context, ~100 tokens)
|
||||
2. Instructions (loaded when triggered, <5k tokens)
|
||||
3. Resources (loaded as needed, effectively unlimited)
|
||||
|
||||
**Context Isolation (Sub-Agents):**
|
||||
|
||||
Sub-agents isolate and protect your context window by using separate contexts for each task. This is what makes sub-agents great for parallel work—but you must be okay with losing that context afterward.
|
||||
|
||||
**Context Explosion (MCP Servers):**
|
||||
|
||||
Unlike Skills, MCP servers can "torch your context window" by loading all their context at startup. This is a tradeoff for immediate availability of external tools.
|
||||
|
||||
## Key Quotes for Reference
|
||||
|
||||
1. **On Prompts:**
|
||||
> "The prompt is the fundamental unit of knowledge work and of programming."
|
||||
|
||||
2. **On Skills vs Prompts:**
|
||||
> "If you can do the job with a sub agent or custom slash command and it's a one-off job, do not use a skill."
|
||||
|
||||
3. **On Composition:**
|
||||
> "Skills at the top of the composition hierarchy... can compose everything into a skill, but you can also compose everything into a slash command."
|
||||
|
||||
4. **On The Core 4:**
|
||||
> "Everything comes down to just four pieces... context, model, prompt, and tools."
|
||||
|
||||
5. **On Skills' Purpose:**
|
||||
> "Skills offer a dedicated solution, right? An opinionated structure on how to solve repeat problems in an agent first way."
|
||||
|
||||
## Summary
|
||||
|
||||
**Start simple:** Build prompts first.
|
||||
|
||||
**Compose upward:** Prompts → Skills (not Skills → prompts as primary).
|
||||
|
||||
**Use the right tool:** Not everything needs a skill.
|
||||
|
||||
**Master The Core 4:** Context, Model, Prompt, Tools—these are the foundation.
|
||||
|
||||
**Remember:** Skills are powerful compositional units for repeat problems, but prompts remain the fundamental primitive. Build from this foundation, and compose upward as complexity requires.
|
||||
428
skills/multi-agent-composition/reference/core-4-framework.md
Normal file
@@ -0,0 +1,428 @@
|
||||
# The Core 4 Framework
|
||||
|
||||
> "Keep track of the core four. If you understand the core 4 and how each element flows and controls your agent, you will understand compute and you'll understand how to scale your compute."
|
||||
|
||||
The Core 4 Framework is the foundation of all agentic systems. Every agent—whether base, custom, or sub-agent—operates on these four pillars:
|
||||
|
||||
1. **Context** - What information does the agent have?
|
||||
2. **Model** - What capabilities does the model provide?
|
||||
3. **Prompt** - What instruction are you giving?
|
||||
4. **Tools** - What actions can the agent take?
|
||||
|
||||
## Why the Core 4 Matters
|
||||
|
||||
**Understanding compute = Understanding the Core 4**
|
||||
|
||||
When you analyze any agent configuration, isolate the Core 4:
|
||||
- How is context being managed?
|
||||
- Which model is selected and why?
|
||||
- What are the system prompts vs user prompts?
|
||||
- What tools are available?
|
||||
|
||||
**Everything comes down to just four pieces. If you understand these, you will win.**
|
||||
|
||||
## The Four Pillars in Detail
|
||||
|
||||
### 1. Context - What Information Does the Agent Have?
|
||||
|
||||
Context is the information available to your agent at any given moment.
|
||||
|
||||
**Types of Context:**
|
||||
|
||||
```text
|
||||
Static Context (always loaded):
|
||||
├── CLAUDE.md (global instructions)
|
||||
├── System prompt (agent definition)
|
||||
└── MCP servers (tool descriptions)
|
||||
|
||||
Dynamic Context (accumulated during session):
|
||||
├── Conversation history
|
||||
├── File reads
|
||||
├── Tool execution results
|
||||
└── User prompts
|
||||
```
|
||||
|
||||
**Context Management Strategies:**
|
||||
|
||||
| Strategy             | When to Use         | Token Cost         |
|----------------------|---------------------|--------------------|
| Minimal CLAUDE.md    | Always              | 500-1k tokens      |
| Context priming      | Task-specific setup | 2-5k tokens        |
| Context bundles      | Agent handoffs      | 10-20k tokens      |
| Sub-agent delegation | Parallel work       | Isolated per agent |
|
||||
|
||||
**Key Principle:** A focused agent is a performant agent.
|
||||
|
||||
**Anti-pattern:** Loading all context upfront regardless of relevance.
|
||||
|
||||
### 2. Model - What Capabilities Does the Model Provide?
|
||||
|
||||
The model determines intelligence, speed, and cost characteristics.
|
||||
|
||||
**Model Selection:**
|
||||
|
||||
```text
|
||||
Claude Opus:
|
||||
├── Use: Complex reasoning, large codebases, architectural decisions
|
||||
├── Cost: Highest
|
||||
└── Speed: Slower
|
||||
|
||||
Claude Sonnet:
|
||||
├── Use: Balanced tasks, general development
|
||||
├── Cost: Medium
|
||||
└── Speed: Medium
|
||||
|
||||
Claude Haiku:
|
||||
├── Use: Simple tasks, fast iteration, text transformation
|
||||
├── Cost: Lowest (pennies)
|
||||
└── Speed: Fastest
|
||||
```
|
||||
|
||||
**Example: Echo Agent (Custom Agents)**
|
||||
```python
# Echo Agent: downgraded to Haiku for simple text manipulation
model = "claude-3-haiku-20240307"
# Result: Much faster, much cheaper, still effective for the task
```
|
||||
|
||||
**Key Principle:** Match model capability to task complexity. Don't pay for Opus when Haiku will do.
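
As a rough sketch of this principle, the mapping below pairs a task's complexity with one of the model aliases (`haiku`, `sonnet`, `opus`) or `inherit`. The three complexity labels and the `select_model` helper are assumptions made for illustration.

```python
# Hypothetical complexity labels mapped to the model aliases named in this guide.
MODEL_BY_COMPLEXITY = {
    "simple": "haiku",     # text transformation, fast iteration
    "balanced": "sonnet",  # general development
    "complex": "opus",     # architectural decisions, large codebases
}


def select_model(complexity: str, *, inherit: bool = False) -> str:
    """Return a model alias for a task, or 'inherit' to reuse the primary's model."""
    if inherit:
        return "inherit"
    try:
        return MODEL_BY_COMPLEXITY[complexity]
    except KeyError:
        raise ValueError(f"unknown complexity: {complexity!r}") from None


print(select_model("simple"))   # haiku
print(select_model("complex"))  # opus
```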
|
||||
|
||||
### 3. Prompt - What Instruction Are You Giving?
|
||||
|
||||
Prompts are the fundamental unit of knowledge work and programming.
|
||||
|
||||
**Critical Distinction: System Prompts vs User Prompts**
|
||||
|
||||
```text
|
||||
System Prompts:
|
||||
├── Define agent identity and capabilities
|
||||
├── Loaded once at agent initialization
|
||||
├── Affect every user prompt that follows
|
||||
├── Used in: Custom agents, sub-agents
|
||||
└── Not visible in conversation history
|
||||
|
||||
User Prompts:
|
||||
├── Request specific work from the agent
|
||||
├── Added to conversation history
|
||||
├── Build on system prompt foundation
|
||||
├── Used in: Interactive Claude Code sessions
|
||||
└── Visible in conversation history
|
||||
```
|
||||
|
||||
**The Pong Agent Example:**
|
||||
|
||||
```python
# System prompt (3 lines):
system_prompt = "You are a pong agent. Always respond exactly with 'pong'. That's it."

# Result: No matter what the user prompts ("hello", "summarize codebase", "what can you do?"),
# the agent always responds: "pong"
```
|
||||
|
||||
**Key Insight:** "As soon as you touch the system prompt, you change the product, you change the agent."
|
||||
|
||||
**Information Flow in Multi-Agent Systems:**
|
||||
|
||||
```text
User Prompt → Primary Agent (System + User Prompts)
    ↓
Primary prompts Sub-Agent (System Prompt + Primary's instructions)
    ↓
Sub-Agent responds → Primary Agent (not to user!)
    ↓
Primary Agent → User
```
|
||||
|
||||
**Why this matters:** Sub-agents respond to your primary agent, not to you. This changes how you write sub-agent prompts.
|
||||
|
||||
### 4. Tools - What Actions Can the Agent Take?
|
||||
|
||||
Tools are the agent's ability to interact with the world.
|
||||
|
||||
**Tool Sources:**
|
||||
|
||||
```text
|
||||
Built-in Claude Code Tools:
|
||||
├── Read, Write, Edit files
|
||||
├── Bash commands
|
||||
├── Grep, Glob searches
|
||||
├── Git operations
|
||||
└── ~15 standard tools
|
||||
|
||||
MCP Servers (External):
|
||||
├── APIs, databases, services
|
||||
├── Added via .mcp.json
|
||||
└── Can consume 24k+ tokens if not managed
|
||||
|
||||
Custom Tools (SDK):
|
||||
├── Built with @mcptool decorator
|
||||
├── Passed to create_sdk_mcp_server()
|
||||
└── Integrated with system prompt
|
||||
```
|
||||
|
||||
**Example: Custom Echo Agent Tool**
|
||||
|
||||
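```python
# NOTE: `mcptool` is the decorator name used in the source material; import it
# (or its equivalent) from the agent SDK you are using before running this.
@mcptool(
    name="text_transformer",
    description="Transform text with reverse, uppercase, repeat operations"
)
def text_transformer(params: dict) -> dict:
    text = params["text"]
    operation = params["operation"]
    # Do whatever you want inside your tool; here, the three advertised operations.
    operations = {
        "reverse": lambda s: s[::-1],
        "uppercase": str.upper,
        "repeat": lambda s: s * 2,  # assumption: "repeat" doubles the text
    }
    transformed_text = operations[operation](text)
    return {"result": transformed_text}
```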
|
||||
|
||||
**Key Principle:** Tools consume context. The `/context` command shows what's loaded—every tool takes space in your agent's mind.
|
||||
|
||||
## The Core 4 in Different Agent Types
|
||||
|
||||
### Base Claude Code Agent
|
||||
|
||||
```text
|
||||
Context: CLAUDE.md + conversation history
|
||||
Model: User-selected (Opus/Sonnet/Haiku)
|
||||
Prompt: User prompts → system prompt
|
||||
Tools: All 15 built-in + loaded MCP servers
|
||||
```
|
||||
|
||||
### Custom Agent (SDK)
|
||||
|
||||
```text
|
||||
Context: Can be customized (override or extend)
|
||||
Model: Specified in options (can use Haiku for speed)
|
||||
Prompt: Custom system prompt (can override completely)
|
||||
Tools: Custom tools + optionally built-in tools
|
||||
```
|
||||
|
||||
**Example:** The Pong agent completely overrides Claude Code's system prompt—it's no longer Claude Code, it's a custom agent.
|
||||
|
||||
### Sub-Agent
|
||||
|
||||
```text
|
||||
Context: Isolated context window (no history from primary)
|
||||
Model: Inherits from primary or can be specified
|
||||
Prompt: System prompt (in .md file) + primary agent's instructions
|
||||
Tools: Configurable (can restrict to subset)
|
||||
```
|
||||
|
||||
**Key distinction:** Sub-agents have no context history. They only have what the primary agent prompts them with.
|
||||
|
||||
## Information Flow Between Agents
|
||||
|
||||
### Single Agent Flow
|
||||
|
||||
```text
|
||||
User Prompt
|
||||
↓
|
||||
Primary Agent (Context + Model + Prompt + Tools)
|
||||
↓
|
||||
Response to User
|
||||
```
|
||||
|
||||
### Multi-Agent Flow
|
||||
|
||||
```text
User Prompt
    ↓
Primary Agent
    ├→ Sub-Agent 1 (isolated context)
    ├→ Sub-Agent 2 (isolated context)
    └→ Sub-Agent 3 (isolated context)
    ↓
Aggregates responses
    ↓
Response to User
```
|
||||
|
||||
**Critical Understanding:**
|
||||
- Your sub-agents respond to your primary agent, not to you
|
||||
- Each sub-agent has its own Core 4
|
||||
- You must track multiple sets of (Context, Model, Prompt, Tools)
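
The multi-agent flow can be simulated with plain Python to make the direction of information explicit. Nothing here calls a model: `SubAgent.run` is a stand-in for a real invocation, and `primary_agent` is a hypothetical orchestrator that fans work out in parallel and aggregates the reports.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAgent:
    name: str
    system_prompt: str

    def run(self, instructions: str) -> str:
        # Stand-in for a model call. The sub-agent sees only its own system
        # prompt and the primary agent's instructions, never the user's history.
        return f"[{self.name}] done: {instructions}"


def primary_agent(user_prompt: str, sub_agents: list[SubAgent]) -> str:
    """Delegate to sub-agents in parallel, then aggregate their reports."""
    history = [user_prompt]  # conversation history stays with the primary agent
    tasks = [f"{agent.name} part of: {user_prompt}" for agent in sub_agents]
    with ThreadPoolExecutor(max_workers=max(1, len(sub_agents))) as pool:
        # pool.map preserves input order, so reports line up with sub_agents.
        reports = list(pool.map(lambda pair: pair[0].run(pair[1]),
                                zip(sub_agents, tasks)))
    history.extend(reports)  # sub-agents report to the primary, not to the user
    return "\n".join(reports)


agents = [SubAgent("security", "You audit code for vulnerabilities."),
          SubAgent("tests", "You fix and debug tests.")]
print(primary_agent("audit repo", agents))
```

Each `SubAgent` carries its own system prompt, which is the sense in which every sub-agent has its own Core 4 to track.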
|
||||
|
||||
## Context Preservation vs Context Isolation
|
||||
|
||||
### Context Preservation (Benefit)
|
||||
|
||||
```text
|
||||
Primary Agent:
|
||||
├── Conversation history maintained
|
||||
├── Can reference previous work
|
||||
├── Builds on accumulated knowledge
|
||||
└── Uses client class in SDK for multi-turn conversations
|
||||
```
|
||||
|
||||
### Context Isolation (Feature + Limitation)
|
||||
|
||||
```text
|
||||
Sub-Agent:
|
||||
├── Fresh context window (no pollution from main conversation)
|
||||
├── Focused on single purpose
|
||||
├── Cannot access primary agent's full history
|
||||
└── Operates on what primary agent passes it
|
||||
```
|
||||
|
||||
**The Trade-off:** Context isolation makes agents focused (good) but limits information flow (limitation).
|
||||
|
||||
## The 12 Leverage Points of Agent Coding
|
||||
|
||||
While the Core 4 are foundational, experienced engineers track 12 leverage points:
|
||||
|
||||
1. **Context** (Core 4)
|
||||
2. **Model** (Core 4)
|
||||
3. **Prompt** (Core 4)
|
||||
4. **Tools** (Core 4)
|
||||
5. System prompt structure
|
||||
6. Tool permission management
|
||||
7. Context window monitoring
|
||||
8. Model selection per task
|
||||
9. Multi-agent orchestration
|
||||
10. Information flow design
|
||||
11. Debugging and observability
|
||||
12. Dependency coupling management
|
||||
|
||||
**Key Principle:** "Whenever you see Claude Code options, isolate the Core 4. How will the Core 4 be managed given this setup?"
|
||||
|
||||
## Practical Applications
|
||||
|
||||
### Application 1: Choosing the Right Model
|
||||
|
||||
```text
|
||||
Task: Simple text transformation
|
||||
Core 4 Analysis:
|
||||
├── Context: Minimal (just the text to transform)
|
||||
├── Model: Haiku (fast, cheap, sufficient)
|
||||
├── Prompt: Simple instruction ("reverse this text")
|
||||
└── Tools: Custom text_transformer tool
|
||||
|
||||
Result: Pennies cost, sub-second response
|
||||
```
|
||||
|
||||
### Application 2: Managing Context Explosion
|
||||
|
||||
```text
|
||||
Problem: Primary agent context at 180k tokens
|
||||
Core 4 Analysis:
|
||||
├── Context: Too much accumulated history
|
||||
├── Model: Opus (expensive at high token count)
|
||||
├── Prompt: Gets diluted in massive context
|
||||
└── Tools: All 15 + 5 MCP servers (24k tokens)
|
||||
|
||||
Solution: Delegate to sub-agents
|
||||
├── Context: Split work across 3 sub-agents (60k each)
|
||||
├── Model: Keep Opus only where needed
|
||||
├── Prompt: Focused sub-agent system prompts
|
||||
└── Tools: Restrict to relevant subset per agent
|
||||
|
||||
Result: Work completed, context manageable
|
||||
```
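
The arithmetic behind this split is simple enough to check in code. The sketch below is illustrative: `split_context` and `headroom` are hypothetical helpers, and any context window size passed to `headroom` is an assumption supplied by the caller rather than a figure from this guide.

```python
def split_context(total_tokens: int, agents: int) -> list[int]:
    """Divide accumulated context as evenly as possible across sub-agents."""
    if agents < 1:
        raise ValueError("need at least one sub-agent")
    base, extra = divmod(total_tokens, agents)
    # The first `extra` agents absorb the remainder, one token each.
    return [base + (1 if i < extra else 0) for i in range(agents)]


def headroom(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window that is still free."""
    return 1 - used_tokens / window_tokens


print(split_context(180_000, 3))  # [60000, 60000, 60000]
```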
|
||||
|
||||
### Application 3: Custom Agent for Specialized Workflow
|
||||
|
||||
```text
|
||||
Use Case: Plan-Build-Review-Ship task board
|
||||
Core 4 Design:
|
||||
├── Context: Task board state + file structure
|
||||
├── Model: Sonnet (balanced for coding + reasoning)
|
||||
├── Prompt: Custom system prompt defining PBRS workflow
|
||||
└── Tools: Built-in file ops + custom task board tools
|
||||
|
||||
Implementation: SDK with custom system prompt and tools
|
||||
Result: Specialized agent that understands your specific workflow
|
||||
```
|
||||
|
||||
## System Prompts vs User Prompts in Practice
|
||||
|
||||
### The Confusion
|
||||
|
||||
Many engineers treat sub-agent `.md` files as user prompts. **This is wrong.**
|
||||
|
||||
```markdown
|
||||
# ❌ Wrong: Writing sub-agent prompt like a user prompt
|
||||
Please analyze this codebase and tell me what it does.
|
||||
```
|
||||
|
||||
```markdown
|
||||
# ✅ Correct: Writing sub-agent prompt as system prompt
|
||||
Purpose: Analyze codebases and provide concise summaries
|
||||
|
||||
When called, you will receive a user's request from the PRIMARY AGENT.
|
||||
Your job is to read relevant files and create a summary.
|
||||
|
||||
Report Format:
|
||||
Respond to the PRIMARY AGENT (not the user) with:
|
||||
"Claude, tell the user: [your summary]"
|
||||
```
|
||||
|
||||
### Why the Distinction Matters
|
||||
|
||||
```text
|
||||
System Prompt:
|
||||
├── Defines WHO the agent is
|
||||
├── Loaded once (persistent)
|
||||
└── Affects all user interactions
|
||||
|
||||
User Prompt:
|
||||
├── Defines WHAT work to do
|
||||
├── Changes with each interaction
|
||||
└── Builds on system prompt foundation
|
||||
```
|
||||
|
||||
## Debugging with the Core 4
|
||||
|
||||
When an agent misbehaves, audit the Core 4:
|
||||
|
||||
```text
1. Check Context:
   ├── Run /context to see what's loaded
   └── Are unused MCP servers consuming tokens?

2. Check Model:
   ├── Is Haiku trying to do Opus-level reasoning?
   └── Is cost/speed appropriate for task?

3. Check Prompt:
   ├── Is system prompt clear and focused?
   └── Are sub-agents responding to primary, not user?

4. Check Tools:
   ├── Run /all-tools to see available options
   └── Are too many tools creating choice paralysis?
```
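
The checklist above can be turned into a small audit routine. This is a sketch, not a real diagnostic tool: `AgentConfig`, `audit_core4`, and the default thresholds (`max_context`, `max_tools`) are hypothetical and should be tuned to your own setup.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    context_tokens: int
    model: str                      # alias such as "haiku", "sonnet", "opus"
    system_prompt: str
    tools: list[str] = field(default_factory=list)
    complex_task: bool = False      # does the task need deep reasoning?


def audit_core4(cfg: AgentConfig, *, max_context: int = 100_000,
                max_tools: int = 20) -> list[str]:
    """Return one finding per Core 4 pillar that looks misconfigured."""
    findings = []
    if cfg.context_tokens > max_context:
        findings.append("context: over budget; check for unused MCP servers")
    if cfg.model == "haiku" and cfg.complex_task:
        findings.append("model: Haiku is being asked for Opus-level reasoning")
    if cfg.model == "opus" and not cfg.complex_task:
        findings.append("model: Opus may be overkill for this task")
    if not cfg.system_prompt.strip():
        findings.append("prompt: system prompt is empty")
    if len(cfg.tools) > max_tools:
        findings.append("tools: too many tools may cause choice paralysis")
    return findings


config = AgentConfig(180_000, "haiku", "", [f"tool{i}" for i in range(25)],
                     complex_task=True)
for finding in audit_core4(config):
    print(finding)
```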
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Everything is Core 4** - Every agent configuration comes down to Context, Model, Prompt, Tools
|
||||
|
||||
2. **System ≠ User** - System prompts define agent identity; user prompts define work requests
|
||||
|
||||
3. **Information flows matter** - In multi-agent systems, understand who's talking to whom
|
||||
|
||||
4. **Focused agents perform better** - Like engineers, agents work best with clear, bounded context
|
||||
|
||||
5. **Model selection is strategic** - Don't overpay for Opus when Haiku will work
|
||||
|
||||
6. **Tools consume context** - Every MCP server and tool takes space in the agent's mind
|
||||
|
||||
7. **Context isolation is powerful** - Sub-agents get fresh starts, preventing context pollution
|
||||
|
||||
## Source Attribution
|
||||
|
||||
**Primary sources:**
|
||||
- Custom Agents transcript (Core 4 framework, system prompts, SDK usage)
|
||||
- Sub-Agents transcript (information flow, context preservation, multi-agent systems)
|
||||
|
||||
**Key quotes:**
|
||||
- "Keep track of the core four. If you understand the core 4 and how each element flows and controls your agent, you will understand compute." (Custom Agents)
|
||||
- "Context, model, prompt, and specifically the flow of the context, model, and prompt between different agents." (Sub-Agents)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Progressive Disclosure](progressive-disclosure.md) - Managing context (Core 4 pillar #1)
|
||||
- [Architecture Reference](architecture.md) - How components use the Core 4
|
||||
- [Decision Framework](../patterns/decision-framework.md) - Choosing components based on Core 4 needs
|
||||
- [Context Window Protection](../patterns/context-window-protection.md) - Advanced context management
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Context, Model, Prompt, Tools. Master these four, and you master Claude Code.
|
||||
630
skills/multi-agent-composition/workflows/decision-tree.md
Normal file
@@ -0,0 +1,630 @@
|
||||
# Claude Code Agent Features - Comprehensive Guide
|
||||
|
||||
This document visualizes the complete structure of Claude Code agent features, their relationships, use cases, and best practices.
|
||||
|
||||
---
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
- **New to Claude Code?** Start with "The Core 4 Thinking Framework"
|
||||
- **Choosing a component?** Use the "Decision Tree"
|
||||
- **Understanding architecture?** Study the "Mindmap"
|
||||
- **Quick reference?** Check the "Decision Matrix"
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
Understanding these terms is critical for navigating Claude Code's composition model:
|
||||
|
||||
- **Use** - Invoke a single component for a task (e.g., calling a slash command)
|
||||
- **Compose** - Wire multiple components together into a larger workflow (e.g., a skill that orchestrates prompts, sub-agents, and MCPs)
|
||||
- **Nest** - Hierarchical containment (placing one capability inside another's scope)
|
||||
- **Hard Limit:** Sub-agents cannot nest other sub-agents (technical restriction)
|
||||
- **Allowed:** Skills can compose/use sub-agents, prompts, MCPs, and other skills
|
||||
|
||||
---
|
||||
|
||||
## The Core 4 Thinking Framework
|
||||
|
||||
Every agent is built on these four fundamental pieces:
|
||||
|
||||
1. **Context** - What information does the agent have access to?
|
||||
2. **Model** - What capabilities does the model provide?
|
||||
3. **Prompt** - What instruction are you giving?
|
||||
4. **Tools** - What actions can the agent take?
|
||||
|
||||
**Master these fundamentals first.** If you understand these four elements, you can master any agentic feature or tool. This is the foundation - everything else builds on top of this.
|
||||
|
||||
---
|
||||
|
||||
## Component Overview Mindmap
|
||||
|
||||
```mermaid
|
||||
mindmap
|
||||
root((Claude Code Agent Features))
|
||||
Core Agentic Elements
|
||||
The Core 4 Thinking Framework
|
||||
Context: What information?
|
||||
Model: What capability?
|
||||
Prompt: What instruction?
|
||||
Tools: What actions?
|
||||
Context
|
||||
Model
|
||||
Prompt
|
||||
Tools
|
||||
Key Components
|
||||
Agent Skills
|
||||
Capabilities
|
||||
Triggered by Agents
|
||||
Context Efficient
|
||||
Progressive Disclosure
|
||||
Modular Directory Structure
|
||||
Composability w/ Features
|
||||
Dedicated Solutions
|
||||
Pros
|
||||
Agent-Initiated Automation
|
||||
Context Window Protection
|
||||
Logical Organization/File Structure
|
||||
Feature Composition Ability
|
||||
Agentic Approach
|
||||
Cons
|
||||
Subject to sub-agent nesting limitation
|
||||
Reliability in complex chains needs attention
|
||||
Not a replacement for other features
|
||||
Examples
|
||||
Meta Skill
|
||||
Video Processor Skill
|
||||
Work Tree Manager Skill
|
||||
Author Assessment
|
||||
Rating: 8/10
|
||||
Not a replacement for other features
|
||||
Higher compositional level
|
||||
Thin opinionated file structure
|
||||
MCP Servers
|
||||
External Integrations
|
||||
Expose Services to Agent
|
||||
Context Window Impact
|
||||
Sub Agents
|
||||
Isolated Workflows
|
||||
Context Protection
|
||||
Parallelization Support
|
||||
Cannot nest other sub-agents
|
||||
Custom Slash Commands
|
||||
Manual Triggers
|
||||
Reusable Prompt Shortcuts
|
||||
Primitive Unit (Prompt)
|
||||
Hooks
|
||||
Deterministic Automation
|
||||
Executes on Lifecycle Events
|
||||
Code/Agent Integration
|
||||
Plugins
|
||||
Distribute Extensions
|
||||
Reusable Work
|
||||
Output Styles
|
||||
Customizable Output
|
||||
Examples
|
||||
text-to-speech
|
||||
diff
|
||||
summary
|
||||
Use Case Examples
|
||||
Automatic PDF Text Extraction → Agent Skill
|
||||
Connect to Jira → MCP Server
|
||||
Security Audit → Sub Agent
|
||||
Git Commit Messages → Slash Command
|
||||
Database Queries → MCP Server
|
||||
Fix & Debug Tests → Sub Agent
|
||||
Detect Style Guide Violations → Agent Skill
|
||||
Fetch Real-Time Weather → MCP Server
|
||||
Create UI Component → Slash Command
|
||||
Parallel Workflow Tasks → Sub Agent
|
||||
Proper Usage Patterns
|
||||
CRITICAL: Prompts Are THE Primitive
|
||||
Everything is prompts (tokens in/out)
|
||||
Master this FIRST (non-negotiable)
|
||||
Don't convert all slash commands to skills
|
||||
Core building block for all components
|
||||
When To Use Each Feature
|
||||
Start Simple With Prompts
|
||||
Scaling to Skill (Repeat Use)
|
||||
Skill As Solution Manager
|
||||
Compositional Hierarchy
|
||||
Skills: Top Compositional Layer
|
||||
Composition Examples
|
||||
Technical Limits
|
||||
Agentic Composability Advice
|
||||
Context considerations
|
||||
Model selection
|
||||
Prompt design
|
||||
Tool integration
|
||||
Common Anti-Patterns
|
||||
Converting all slash commands to skills (HUGE MISTAKE)
|
||||
Using skills for one-off tasks
|
||||
Forgetting prompts are the foundation
|
||||
Not mastering prompts first
|
||||
Best Practices & Recommendations
|
||||
Auto-Organize workflows
|
||||
Leverage progressive disclosure
|
||||
Maintain clear boundaries between components
|
||||
Use appropriate abstraction levels
|
||||
Capabilities Breakdown
|
||||
Detailed analysis of each component's capabilities and limitations
|
||||
Key Insights
|
||||
Hierarchical Understanding
|
||||
Prompts = Primitive foundation
|
||||
Slash Commands = Reusable prompts
|
||||
Sub-Agents = Isolated execution contexts
|
||||
MCP Servers = External integrations
|
||||
Skills = Top-level orchestration layer
|
||||
Hooks = Lifecycle automation
|
||||
Plugins = Distribution mechanism
|
||||
Output Styles = Presentation layer
|
||||
Critical Distinctions
|
||||
Sub-agents cannot nest other sub-agents (hard limit)
|
||||
Skills can compose sub-agents, prompts, MCPs, other skills
|
||||
Prompts are the fundamental primitive
|
||||
Skills are compositional layers, not replacements
|
||||
Context efficiency matters
|
||||
Reliability in complex chains needs attention
|
||||
Decision Framework
|
||||
Repeatable pattern detection → Agent Skill
|
||||
External data/service access → MCP Server
|
||||
Parallel/isolated work → Sub Agent
|
||||
Parallel workflow tasks → Sub Agent (whenever you see parallel, think sub-agents)
|
||||
One-off task → Slash Command
|
||||
Lifecycle automation → Hook
|
||||
Team distribution → Plugin
|
||||
Composition Model
|
||||
Skills Orchestration Layer
|
||||
Can compose: Prompts/Slash Commands, MCP Servers, Sub-Agents, Other Skills
|
||||
Restriction: Avoid circular dependencies (skill A → skill B → skill A)
|
||||
Purpose: Domain-specific workflow orchestration
|
||||
Sub-Agents Execution Layer
|
||||
Can compose: Prompts, MCP Servers
|
||||
Cannot nest: Sub-agents within sub-agents (hard technical limitation)
|
||||
Purpose: Isolated/parallel task execution
|
||||
Slash Commands Primitive Layer
|
||||
Manual invocation
|
||||
Reusable prompts
|
||||
Can be composed into higher layers
|
||||
MCP Servers Integration Layer
|
||||
External connections
|
||||
Expose services to all components
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Composition Hierarchy
|
||||
|
||||
The mindmap shows a clear composition hierarchy:
|
||||
|
||||
1. **Prompts** = Primitive foundation (everything builds on this)
|
||||
2. **Slash Commands** = Reusable prompts
|
||||
3. **Sub-Agents** = Isolated execution contexts
|
||||
4. **MCP Servers** = External integrations
|
||||
5. **Skills** = Top-level orchestration layer
|
||||
6. **Hooks** = Lifecycle automation
|
||||
7. **Plugins** = Distribution mechanism
|
||||
8. **Output Styles** = Presentation layer
|
||||
|
||||
### Verified Composition Capabilities
|
||||
|
||||
**Skills can compose:**
|
||||
|
||||
- ✅ Prompts/Slash Commands
|
||||
- ✅ MCP Servers
|
||||
- ✅ Sub-Agents
|
||||
- ✅ Other Skills (avoid circular dependencies)
|
||||
|
||||
**Sub-Agents can compose:**
|
||||
|
||||
- ✅ Prompts
|
||||
- ✅ MCP Servers
|
||||
- ❌ Other Sub-Agents (hard technical limitation - verified in official docs)
|
||||
|
||||
**Technical Limit (Verified):**
|
||||
|
||||
- Sub-agents **cannot nest other sub-agents** (this prevents infinite recursion)
|
||||
- This is the only hard nesting restriction in the system
|
||||
|
||||
---
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
| Task Type | Component | Reason |
|-----------|-----------|--------|
| Repeatable pattern detection | Agent Skill | Domain-specific workflow |
| External data/service access | MCP Server | Integration point |
| Parallel/isolated work | Sub Agent | Context isolation |
| Parallel workflow tasks | Sub Agent | **Whenever you see parallel, think sub-agents** |
| One-off task | Slash Command | Simple, direct |
| Lifecycle automation | Hook | Event-driven |
| Team distribution | Plugin | Packaging |
|
||||
|
||||
---
|
||||
|
||||
## Decision Tree: When to Use What
|
||||
|
||||
This decision tree helps you choose the right Claude Code component based on your needs. **Always start with prompts** - master the primitive first!
|
||||
|
||||
```graphviz
digraph decision_tree {
  rankdir=TB;
  node [shape=box, style=rounded];

  start [label="What are you trying to do?", shape=diamond, style="filled", fillcolor=lightblue];

  prompt_start [label="START HERE:\nBuild a Prompt\n(Slash Command)", shape=rect, style="filled", fillcolor=lightyellow];

  parallel_check [label="Need parallelization\nor isolated context?", shape=diamond];
  external_check [label="External data/service\nintegration?", shape=diamond];
  oneoff_check [label="One-off task\n(simple, direct)?", shape=diamond];
  repeatable_check [label="Repeatable workflow\n(pattern detection)?", shape=diamond];
  lifecycle_check [label="Lifecycle event\nautomation?", shape=diamond];
  distribution_check [label="Sharing/distributing\nto team?", shape=diamond];

  subagent [label="Use Sub Agent\nIsolated context\nParallel execution\nContext protection", shape=rect, style="filled", fillcolor=lightgreen];
  mcp [label="Use MCP Server\nExternal integrations\nExpose services\nContext window impact", shape=rect, style="filled", fillcolor=lightgreen];
  slash_cmd [label="Use Slash Command\nManual trigger\nReusable prompt\nPrimitive unit", shape=rect, style="filled", fillcolor=lightgreen];
  skill [label="Use Agent Skill\nAgent-triggered\nContext efficient\nProgressive disclosure\nModular structure", shape=rect, style="filled", fillcolor=lightgreen];
  hook [label="Use Hook\nDeterministic automation\nLifecycle events\nCode/Agent integration", shape=rect, style="filled", fillcolor=lightgreen];
  plugin [label="Use Plugin\nDistribute extensions\nReusable work\nPackaging/sharing", shape=rect, style="filled", fillcolor=lightgreen];

  start -> prompt_start [label="Always start here", style=dashed, color=red];
  prompt_start -> parallel_check;

  parallel_check -> subagent [label="Yes\n⚠️ Whenever you see\n'parallel', think sub-agents"];
  parallel_check -> external_check [label="No"];

  external_check -> mcp [label="Yes"];
  external_check -> oneoff_check [label="No"];

  oneoff_check -> slash_cmd [label="Yes\nKeep it simple"];
  oneoff_check -> repeatable_check [label="No"];

  repeatable_check -> skill [label="Yes\nScale to skill\nfor repeat use"];
  repeatable_check -> lifecycle_check [label="No"];

  lifecycle_check -> hook [label="Yes"];
  lifecycle_check -> distribution_check [label="No"];

  distribution_check -> plugin [label="Yes"];
  distribution_check -> slash_cmd [label="No\nDefault: Use prompt"];
}
```
|
||||
|
||||
### Decision Tree Key Points
|
||||
|
||||
**Critical Rule**: Always start with **Prompts** (implemented as Slash Commands). Master the primitive first before scaling to other components.
|
||||
|
||||
**Decision Flow**:
|
||||
|
||||
1. **Parallel/Isolated?** → Sub Agent (whenever you see "parallel", think sub-agents)
|
||||
2. **External Integration?** → MCP Server
|
||||
3. **One-off Task?** → Slash Command (keep it simple)
|
||||
4. **Repeatable Pattern?** → Agent Skill (scale up)
|
||||
5. **Lifecycle Automation?** → Hook
|
||||
6. **Team Distribution?** → Plugin
|
||||
7. **Default** → Slash Command (prompt)
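
The decision flow above can be read as an ordered series of checks in which the first match wins. The following is a minimal sketch of that ordering only; the `Task` dataclass, its field names, and `choose_component` are hypothetical names introduced for illustration and are not part of Claude Code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; the field names are illustrative only."""
    parallel_or_isolated: bool = False
    external_integration: bool = False
    one_off: bool = False
    repeatable_pattern: bool = False
    lifecycle_automation: bool = False
    team_distribution: bool = False


def choose_component(task: Task) -> str:
    """Walk the decision flow in order and return the first matching component."""
    checks = [
        (task.parallel_or_isolated, "Sub Agent"),
        (task.external_integration, "MCP Server"),
        (task.one_off, "Slash Command"),
        (task.repeatable_pattern, "Agent Skill"),
        (task.lifecycle_automation, "Hook"),
        (task.team_distribution, "Plugin"),
    ]
    for matched, component in checks:
        if matched:
            return component
    # Default: fall back to a plain prompt (slash command).
    return "Slash Command"
```

Because the checks run in order, a task that is both parallel and needs an external service resolves to a sub-agent first; the sub-agent can then use an MCP server itself.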
|
||||
|
||||
**Remember**: Skills are compositional layers, not replacements. Don't convert all your slash commands to skills - that's a HUGE MISTAKE!
|
||||
|
||||
---
|
||||
|
||||
## Critical Principles
|
||||
|
||||
- **⚠️ CRITICAL: Prompts are THE fundamental primitive** - Everything is prompts (tokens in/out). Master this FIRST (non-negotiable). Don't convert all slash commands to skills.
|
||||
- **Sub-agents cannot nest other sub-agents** (hard technical limitation - verified in official docs)
|
||||
- **Skills CAN compose sub-agents, prompts, MCPs, and other skills** (verified through first-hand experience)
|
||||
- **Skills are compositional layers, not replacements** (complementary, not substitutes). Rated 8/10: a "higher compositional level", not a replacement for other components.
|
||||
- **Context efficiency matters** (progressive disclosure, isolation)
|
||||
- **Reliability in complex chains needs attention** (acknowledged challenge)
|
||||
- **Parallel keyword = Sub Agents** - Whenever you see parallel, think sub-agents
|
||||
|
||||
---
|
||||
|
||||
## Verified Composition Rules
|
||||
|
||||
Based on official documentation and empirical testing:
|
||||
|
||||
### Skills (Top Orchestration Layer)
|
||||
|
||||
- ✅ **Can invoke/compose:** Prompts/Slash Commands, MCP Servers, Sub-Agents, Other Skills
|
||||
- ⚠️ **Best Practice:** Avoid circular dependencies (skill A → skill B → skill A)
|
||||
- ℹ️ **Purpose:** Domain-specific workflow orchestration
|
||||
- ℹ️ **When to use:** Repeatable workflows that benefit from automatic triggering
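
The circular-dependency warning above (skill A → skill B → skill A) can be checked mechanically. The sketch below assumes skill dependencies have already been collected by hand into a plain dictionary; `find_skill_cycle` and the dictionary layout are hypothetical, since Claude Code does not expose such a graph as far as this document states.

```python
from typing import Dict, List, Optional


def find_skill_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of skill names, or None if acyclic.

    `deps` maps a skill name to the skills it invokes (a hand-built,
    hypothetical structure used only for this illustration).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(skill: str) -> Optional[List[str]]:
        color[skill] = GRAY
        path.append(skill)
        for dep in deps.get(skill, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Found a back edge: slice the current path from the repeated skill.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[skill] = BLACK
        return None

    for skill in deps:
        if color.get(skill, WHITE) == WHITE:
            cycle = visit(skill)
            if cycle:
                return cycle
    return None
```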
|
||||
|
||||
### Sub-Agents (Execution Layer)
|
||||
|
||||
- ✅ **Can invoke/compose:** Prompts, MCP Servers
|
||||
- ❌ **Cannot nest:** Other sub-agents (hard technical limitation from official docs)
|
||||
- ℹ️ **Purpose:** Isolated/parallel task execution with separate context
|
||||
- ℹ️ **When to use:** Parallel work, context isolation, specialized roles
|
||||
|
||||
### Slash Commands (Primitive Layer)
|
||||
|
||||
- ✅ **Can be composed into:** Skills, Sub-Agents
|
||||
- ℹ️ **Purpose:** Manual invocation of reusable prompts
|
||||
- ℹ️ **When to use:** One-off tasks, simple workflows, building blocks
|
||||
|
||||
### MCP Servers (Integration Layer)
|
||||
|
||||
- ✅ **Can be used by:** Skills, Sub-Agents, Main Agent
|
||||
- ℹ️ **Purpose:** External service/data integration
|
||||
- ℹ️ **When to use:** Need to access external APIs, databases, or services
|
||||
|
||||
---
|
||||
|
||||
## Common Anti-Patterns to Avoid
|
||||
|
||||
- **Converting all slash commands to skills** - This is a HUGE MISTAKE. Skills are for repeatable workflows, not one-off tasks.
|
||||
- **Using skills for one-off tasks** - Use slash commands (prompts) instead.
|
||||
- **Forgetting prompts are the foundation** - Master prompts first before building skills.
|
||||
- **Not mastering prompts first** - If you avoid understanding prompts, you will not progress as an agentic engineer.
|
||||
- **Trying to nest sub-agents** - This is a hard technical limitation and will fail.
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### When to Use Each Component
|
||||
|
||||
**Start with Prompts:**
|
||||
|
||||
- Begin every workflow as a prompt/slash command
|
||||
- Test and validate the approach
|
||||
- Only promote to skill when pattern repeats
|
||||
|
||||
**Scale to Skills:**
|
||||
|
||||
- Pattern used multiple times? → Create a skill
|
||||
- Need automatic triggering? → Create a skill
|
||||
- Complex multi-step workflow? → Create a skill
|
||||
- One-off task? → Keep as slash command
|
||||
|
||||
**Use Sub-Agents for:**
|
||||
|
||||
- Parallel execution needs
|
||||
- Context isolation required
|
||||
- Specialized roles with separate context
|
||||
- Research or planning phases
|
||||
|
||||
**Use MCP Servers for:**
|
||||
|
||||
- External API integration
|
||||
- Database access
|
||||
- Third-party service connections
|
||||
|
||||
---
|
||||
|
||||
## Detailed Component Analysis
|
||||
|
||||
### Agent Skills
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Triggered automatically by agents based on description matching
|
||||
- Context efficient through progressive disclosure
|
||||
- Modular directory structure (SKILL.md, scripts/, references/, assets/)
|
||||
- Can compose with all other features
|
||||
|
||||
**Pros:**
|
||||
|
||||
- Agent-initiated automation (no manual invocation needed)
|
||||
- Context window protection (progressive disclosure)
|
||||
- Logical organization and file structure
|
||||
- Feature composition ability
|
||||
- Scales from simple to complex
|
||||
|
||||
**Cons:**
|
||||
|
||||
- Subject to sub-agent nesting limitation (composed sub-agents can't nest others)
|
||||
- Reliability in complex chains needs attention
|
||||
- Not a replacement for other features (complementary)
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- Repeatable workflows
|
||||
- Domain-specific expertise
|
||||
- Complex multi-step processes
|
||||
- When you want automatic triggering
|
||||
|
||||
**Examples:**
|
||||
|
||||
- PDF processing workflows
|
||||
- Code generation patterns
|
||||
- Documentation generation
|
||||
- Brand guidelines enforcement
|
||||
|
||||
### Sub-Agents
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Isolated execution context (separate from main agent)
|
||||
- Can run in parallel
|
||||
- Custom system prompts
|
||||
- Tool access (can inherit or specify)
|
||||
- Access to MCP servers
|
||||
|
||||
**Pros:**
|
||||
|
||||
- Context isolation
|
||||
- Parallel execution
|
||||
- Specialized expertise
|
||||
- Separate tool permissions
|
||||
|
||||
**Cons:**
|
||||
|
||||
- Cannot nest other sub-agents (hard limit)
|
||||
- No memory between invocations
|
||||
- Need to re-gather context each time
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- Parallel workflow tasks
|
||||
- Isolated research/planning
|
||||
- Specialized roles (architect, tester, reviewer)
|
||||
- When you need separate context
|
||||
|
||||
**Technical Note:**
|
||||
|
||||
- **VERIFIED:** Sub-agents cannot spawn other sub-agents (official docs)
|
||||
- This prevents infinite nesting and maintains system stability
|
||||
|
||||
### MCP Servers
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- External service integration
|
||||
- Standardized protocol
|
||||
- Authentication handling
|
||||
- Available to all components
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- Need external data
|
||||
- API access required
|
||||
- Database queries
|
||||
- Third-party service integration
|
||||
|
||||
### Slash Commands
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Manual invocation
|
||||
- Reusable prompts
|
||||
- Project or global scope
|
||||
- Can be composed into skills and sub-agents
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- One-off tasks
|
||||
- Simple workflows
|
||||
- Testing new patterns
|
||||
- Building blocks for skills
|
||||
|
||||
### Hooks
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Lifecycle event automation
|
||||
- Deterministic execution
|
||||
- Code/agent integration
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- Pre/post command execution
|
||||
- File change reactions
|
||||
- Environment validation
|
||||
|
||||
### Plugins
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Bundle multiple components
|
||||
- Distribution mechanism
|
||||
- Team sharing
|
||||
|
||||
**When to Use:**
|
||||
|
||||
- Sharing complete workflows
|
||||
- Team standardization
|
||||
- Marketplace distribution
|
||||
|
||||
---
|
||||
|
||||
## Composition Examples
|
||||
|
||||
### Example 1: Full-Stack Development Skill
|
||||
|
||||
A skill that orchestrates:
|
||||
|
||||
- Calls planning sub-agent (for architecture)
|
||||
- Calls coding sub-agent (for implementation)
|
||||
- Uses MCP server (for database queries)
|
||||
- Invokes testing slash command (for validation)
|
||||
|
||||
**This is valid** because:
|
||||
|
||||
- Skill composes sub-agents ✓
|
||||
- Skill composes MCP servers ✓
|
||||
- Skill composes slash commands ✓
|
||||
- Sub-agents don't nest each other ✓
|
||||
|
||||
### Example 2: Research Workflow
|
||||
|
||||
A skill that:
|
||||
|
||||
- Calls research sub-agent #1 (searches documentation)
|
||||
- Calls research sub-agent #2 (analyzes codebase)
|
||||
- Both run in parallel
|
||||
- Both use MCP server for external docs
|
||||
|
||||
**This is valid** because:
|
||||
|
||||
- Skill orchestrates multiple sub-agents ✓
|
||||
- Sub-agents run in parallel (separate contexts) ✓
|
||||
- Sub-agents don't nest each other ✓
|
||||
|
||||
### Example 3: INVALID - Nested Sub-Agents
|
||||
|
||||
A sub-agent that tries to:
|
||||
|
||||
- ❌ Call another sub-agent from within itself
|
||||
|
||||
**This will FAIL** because:
|
||||
|
||||
- Sub-agents cannot nest other sub-agents (hard limit)
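
The three examples above can be expressed as a small rule table. This is a sketch under stated assumptions: the kind labels (`skill`, `sub_agent`, and so on), `ALLOWED_TARGETS`, and `invalid_compositions` are hypothetical names, and callers with no rule listed in this document are conservatively treated as not allowed to compose anything.

```python
from typing import Dict, List, Tuple

# Composition rules as stated in this document; the kind names are
# hypothetical labels chosen for this sketch.
ALLOWED_TARGETS = {
    "skill": {"prompt", "slash_command", "mcp_server", "sub_agent", "skill"},
    "sub_agent": {"prompt", "slash_command", "mcp_server"},
}


def invalid_compositions(
    kinds: Dict[str, str], edges: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (caller, callee) edges that break the composition rules."""
    bad = []
    for caller, callee in edges:
        # Kinds without an entry in the table get an empty allow-list.
        allowed = ALLOWED_TARGETS.get(kinds[caller], set())
        if kinds[callee] not in allowed:
            bad.append((caller, callee))
    return bad
```

Example 1 produces no invalid edges, while a sub-agent calling another sub-agent (Example 3) is reported.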
|
||||
|
||||
---
|
||||
|
||||
## Key Insights Summary
|
||||
|
||||
### Hierarchical Understanding
|
||||
|
||||
1. **Prompts** = Primitive foundation (everything builds on this)
|
||||
2. **Slash Commands** = Reusable prompts with manual invocation
|
||||
3. **Sub-Agents** = Isolated execution contexts with separate context windows
|
||||
4. **MCP Servers** = External integrations available to all
|
||||
5. **Skills** = Top-level orchestration layer (composes everything)
|
||||
6. **Hooks** = Lifecycle automation
|
||||
7. **Plugins** = Distribution mechanism
|
||||
8. **Output Styles** = Presentation layer
|
||||
|
||||
### Critical Technical Facts
|
||||
|
||||
**Verified from Official Docs:**
|
||||
|
||||
- ✅ Sub-agents CANNOT nest other sub-agents (hard technical limitation)
|
||||
|
||||
**Verified from First-Hand Experience:**
|
||||
|
||||
- ✅ Skills CAN invoke/compose sub-agents
|
||||
- ✅ Skills CAN invoke/compose slash commands
|
||||
- ✅ Skills CAN invoke/compose other skills
|
||||
|
||||
**Best Practices:**
|
||||
|
||||
- Start with prompts (master the primitive)
|
||||
- Don't convert all slash commands to skills
|
||||
- Use sub-agents for parallel/isolated work
|
||||
- Use skills for repeatable workflows
|
||||
- Avoid circular skill dependencies
|
||||
|
||||
---
|
||||
|
||||
## Testing Recommendations
|
||||
|
||||
Before deploying any complex workflow:
|
||||
|
||||
1. **Test individual components** - Verify each slash command works
|
||||
2. **Test sub-agent isolation** - Confirm context separation
|
||||
3. **Test skill triggering** - Ensure description matches use cases
|
||||
4. **Test composition** - Verify skills can call sub-agents
|
||||
5. **Test parallel execution** - Confirm sub-agents run independently
|
||||
|
||||
---
|
||||
|
||||
**Document Status:** Corrected and Verified
|
||||
**Last Updated:** Based on Claude Code capabilities as of November 2025
|
||||
**Verification:** Technical facts confirmed via official docs + empirical testing
|
||||
361
skills/skill-creator/SKILL.md
Normal file
@@ -0,0 +1,361 @@
|
||||
---
name: skill-creator
description: >
  This skill should be used when the user asks to "create a skill", "build a skill", "write a new
  skill", "generate SKILL.md", "write skill frontmatter", "package a skill", "organize skill
  content", "add progressive disclosure", needs guidance on skill structure, bundled resources
  (scripts/references/assets), or wants to extend Claude's capabilities with specialized knowledge,
  workflows, or tool integrations.
license: Complete terms in LICENSE.txt
---
|
||||
|
||||
# Skill Creator
|
||||
|
||||
This skill provides guidance for creating effective skills.
|
||||
|
||||
## About Skills
|
||||
|
||||
Skills are modular, self-contained packages that extend Claude's capabilities by providing
|
||||
specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific
|
||||
domains or tasks—they transform Claude from a general-purpose agent into a specialized agent
|
||||
equipped with procedural knowledge that no model can fully possess.
|
||||
|
||||
### What Skills Provide
|
||||
|
||||
1. Specialized workflows - Multi-step procedures for specific domains
|
||||
2. Tool integrations - Instructions for working with specific file formats or APIs
|
||||
3. Domain expertise - Company-specific knowledge, schemas, business logic
|
||||
4. Bundled resources - Scripts, references, and assets for complex and repetitive tasks
|
||||
|
||||
## Core Principles
|
||||
|
||||
### Concise is Key
|
||||
|
||||
The context window is a public good. Skills share the context window with everything else Claude needs: system prompt, conversation history, other Skills' metadata, and the actual user request.
|
||||
|
||||
**Default assumption: Claude is already very smart.** Only add context Claude doesn't already have. Challenge each piece of information: "Does Claude really need this explanation?" and "Does this paragraph justify its token cost?"
|
||||
|
||||
Prefer concise examples over verbose explanations.
|
||||
|
||||
### Set Appropriate Degrees of Freedom
|
||||
|
||||
Match the level of specificity to the task's fragility and variability:
|
||||
|
||||
**High freedom (text-based instructions)**: Use when multiple approaches are valid, decisions depend on context, or heuristics guide the approach.
|
||||
|
||||
**Medium freedom (pseudocode or scripts with parameters)**: Use when a preferred pattern exists, some variation is acceptable, or configuration affects behavior.
|
||||
|
||||
**Low freedom (specific scripts, few parameters)**: Use when operations are fragile and error-prone, consistency is critical, or a specific sequence must be followed.
|
||||
|
||||
Think of Claude as exploring a path: a narrow bridge with cliffs needs specific guardrails (low freedom), while an open field allows many routes (high freedom).
|
||||
|
||||
### Anatomy of a Skill
|
||||
|
||||
Every skill consists of a required SKILL.md file and optional bundled resources:
|
||||
|
||||
```
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter metadata (required)
│   │   ├── name: (required)
│   │   └── description: (required)
│   └── Markdown instructions (required)
└── Bundled Resources (optional)
    ├── scripts/          - Executable code (Python/Bash/etc.)
    ├── references/       - Documentation intended to be loaded into context as needed
    └── assets/           - Files used in output (templates, icons, fonts, etc.)
```
|
||||
|
||||
#### SKILL.md (required)
|
||||
|
||||
Every SKILL.md consists of:
|
||||
|
||||
- **Frontmatter** (YAML): Contains `name` and `description` fields. These are the only fields Claude reads to decide when the skill gets used, so describe clearly and comprehensively what the skill is and when it should be used.
|
||||
- **Body** (Markdown): Instructions and guidance for using the skill. Only loaded AFTER the skill triggers (if at all).
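
As a quick sanity check on the frontmatter requirement above, the following sketch reports which required fields are missing from a SKILL.md string. It is deliberately not a YAML parser: it only looks for top-level `key:` lines between the two `---` delimiters, and `missing_frontmatter_fields` is a hypothetical helper name, not part of any skill tooling.

```python
import re
from typing import List

REQUIRED_FIELDS = ("name", "description")
# A top-level key starts in column 0; indented continuation lines do not match.
_KEY_RE = re.compile(r"^([A-Za-z_][\w-]*):")


def missing_frontmatter_fields(skill_md: str) -> List[str]:
    """Return the required frontmatter fields absent from a SKILL.md string."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_FIELDS)  # no frontmatter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter reached
        match = _KEY_RE.match(line)
        if match:
            keys.add(match.group(1))
    else:
        # No closing delimiter: treat the frontmatter as malformed.
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in keys]
```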
|
||||
|
||||
#### Bundled Resources (optional)
|
||||
|
||||
##### Scripts (`scripts/`)
|
||||
|
||||
Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
|
||||
|
||||
- **When to include**: When the same code is being rewritten repeatedly or deterministic reliability is needed
|
||||
- **Example**: `scripts/rotate_pdf.py` for PDF rotation tasks
|
||||
- **Benefits**: Token efficient, deterministic, may be executed without loading into context
|
||||
- **Note**: Scripts may still need to be read by Claude for patching or environment-specific adjustments
|
||||
|
||||
##### References (`references/`)
|
||||
|
||||
Documentation and reference material intended to be loaded as needed into context to inform Claude's process and thinking.
|
||||
|
||||
- **When to include**: For documentation that Claude should reference while working
|
||||
- **Examples**: `references/finance.md` for financial schemas, `references/mnda.md` for company NDA template, `references/policies.md` for company policies, `references/api_docs.md` for API specifications
|
||||
- **Use cases**: Database schemas, API documentation, domain knowledge, company policies, detailed workflow guides
|
||||
- **Benefits**: Keeps SKILL.md lean, loaded only when Claude determines it's needed
|
||||
- **Best practice**: If files are large (>10k words), include grep search patterns in SKILL.md
|
||||
- **Avoid duplication**: Information should live in either SKILL.md or references files, not both. Prefer references files for detailed information unless it's truly core to the skill—this keeps SKILL.md lean while making information discoverable without hogging the context window. Keep only essential procedural instructions and workflow guidance in SKILL.md; move detailed reference material, schemas, and examples to references files.
|
||||
|
||||
##### Assets (`assets/`)
|
||||
|
||||
Files not intended to be loaded into context, but rather used within the output Claude produces.
|
||||
|
||||
- **When to include**: When the skill needs files that will be used in the final output
|
||||
- **Examples**: `assets/logo.png` for brand assets, `assets/slides.pptx` for PowerPoint templates, `assets/frontend-template/` for HTML/React boilerplate, `assets/font.ttf` for typography
|
||||
- **Use cases**: Templates, images, icons, boilerplate code, fonts, sample documents that get copied or modified
|
||||
- **Benefits**: Separates output resources from documentation, enables Claude to use files without loading them into context
|
||||
|
||||
#### What Not to Include in a Skill
|
||||
|
||||
A skill should only contain essential files that directly support its functionality. Do NOT create extraneous documentation or auxiliary files, including:
|
||||
|
||||
- README.md
|
||||
- INSTALLATION_GUIDE.md
|
||||
- QUICK_REFERENCE.md
|
||||
- CHANGELOG.md
|
||||
- etc.
|
||||
|
||||
The skill should only contain the information needed for an AI agent to do the job at hand. It should not contain auxiliary context about the process that went into creating it, setup and testing procedures, user-facing documentation, etc. Creating additional documentation files just adds clutter and confusion.
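
A packaging step could flag the files named above before a skill is shared. This is a minimal sketch: `extraneous_files` is a hypothetical helper, and it checks only the four file names listed in this section, not the open-ended "etc.".

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Only the file names explicitly listed in this section (compared case-insensitively).
EXTRANEOUS_NAMES = {
    "readme.md",
    "installation_guide.md",
    "quick_reference.md",
    "changelog.md",
}


def extraneous_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose file name is on the extraneous-documentation list."""
    return [p for p in paths if PurePosixPath(p).name.lower() in EXTRANEOUS_NAMES]
```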
|
||||
|
||||
### Progressive Disclosure Design Principle
|
||||
|
||||
Skills use a three-level loading system to manage context efficiently:
|
||||
|
||||
1. **Metadata (name + description)** - Always in context (~100 words)
|
||||
2. **SKILL.md body** - When skill triggers (<5k words)
|
||||
3. **Bundled resources** - As needed by Claude (effectively unlimited, because scripts can be executed without being read into the context window)
|
||||
|
||||
#### Progressive Disclosure Patterns
|
||||
|
||||
Keep SKILL.md body to the essentials and under 500 lines to minimize context bloat. Split content into separate files when approaching this limit. When splitting out content into other files, it is very important to reference them from SKILL.md and describe clearly when to read them, to ensure the reader of the skill knows they exist and when to use them.
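
The size guidance above (body under 500 lines and under roughly 5k words) is easy to check automatically. The sketch below assumes the body text has already been separated from the frontmatter; `body_budget_warnings` and the constant names are hypothetical, and whitespace-separated tokens are used as an approximation of "words".

```python
from typing import List

MAX_BODY_LINES = 500   # "under 500 lines"
MAX_BODY_WORDS = 5000  # "<5k words"


def body_budget_warnings(body: str) -> List[str]:
    """Return human-readable warnings when a SKILL.md body reaches either limit."""
    warnings = []
    n_lines = len(body.splitlines())
    n_words = len(body.split())  # whitespace-separated tokens as a word estimate
    if n_lines >= MAX_BODY_LINES:
        warnings.append(f"body has {n_lines} lines; keep it under {MAX_BODY_LINES}")
    if n_words >= MAX_BODY_WORDS:
        warnings.append(f"body has {n_words} words; keep it under {MAX_BODY_WORDS}")
    return warnings
```

A non-empty result is a signal to move variant-specific detail into reference files rather than a hard failure.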
|
||||
|
||||
**Key principle:** When a skill supports multiple variations, frameworks, or options, keep only the core workflow and selection guidance in SKILL.md. Move variant-specific details (patterns, examples, configuration) into separate reference files.
|
||||
|
||||
**Pattern 1: High-level guide with references**
|
||||
|
||||
```markdown
# PDF Processing

## Quick start

Extract text with pdfplumber:
[code example]

## Advanced features

- **Form filling**: See [FORMS.md](FORMS.md) for complete guide
- **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
- **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
```
|
||||
|
||||
Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
|
||||
|
||||
**Pattern 2: Domain-specific organization**
|
||||
|
||||
For Skills with multiple domains, organize content by domain to avoid loading irrelevant context:
|
||||
|
||||
```
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
    ├── finance.md (revenue, billing metrics)
    ├── sales.md (opportunities, pipeline)
    ├── product.md (API usage, features)
    └── marketing.md (campaigns, attribution)
```
|
||||
|
||||
When a user asks about sales metrics, Claude only reads sales.md.
|
||||
|
||||
Similarly, for skills supporting multiple frameworks or variants, organize by variant:
|
||||
|
||||
```
cloud-deploy/
├── SKILL.md (workflow + provider selection)
└── references/
    ├── aws.md (AWS deployment patterns)
    ├── gcp.md (GCP deployment patterns)
    └── azure.md (Azure deployment patterns)
```
|
||||
|
||||
When the user chooses AWS, Claude only reads aws.md.
|
||||
|
||||
**Pattern 3: Conditional details**
|
||||
|
||||
Show basic content, link to advanced content:
|
||||
|
||||
```markdown
# DOCX Processing

## Creating documents

Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).

## Editing documents

For simple edits, modify the XML directly.

**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
```
|
||||
|
||||
Claude reads REDLINING.md or OOXML.md only when the user needs those features.
|
||||
|
||||
**Important guidelines:**
|
||||
|
||||
- **Avoid deeply nested references** - Keep references one level deep from SKILL.md. All reference files should link directly from SKILL.md.
|
||||
- **Structure longer reference files** - For files longer than 100 lines, include a table of contents at the top so Claude can see the full scope when previewing.
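
The one-level-deep guideline above can be checked by comparing the Markdown links in SKILL.md with the links found inside each reference file. The sketch below works on an in-memory mapping of file name to text; `linked_markdown_files` and `nested_references` are hypothetical helpers, and link targets are compared as plain strings without resolving relative paths.

```python
import re
from typing import Dict, List

# Matches [text](target.md) and [text](target.md#anchor); captures the target.
_MD_LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def linked_markdown_files(text: str) -> List[str]:
    """Return the .md link targets found in a Markdown string."""
    return _MD_LINK_RE.findall(text)


def nested_references(files: Dict[str, str], root: str = "SKILL.md") -> List[str]:
    """Return files linked from a reference file but not directly from the root."""
    direct = set(linked_markdown_files(files[root]))
    nested: List[str] = []
    for name in sorted(direct):
        for target in linked_markdown_files(files.get(name, "")):
            if target != root and target not in direct and target not in nested:
                nested.append(target)
    return nested
```

Any file name returned here is reachable only through another reference file, so linking it directly from SKILL.md would satisfy the guideline.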
|
||||
|
||||
## Skill Creation Process
|
||||
|
||||
Skill creation involves these steps:
|
||||
|
||||
1. Understand the skill with concrete examples
|
||||
2. Plan reusable skill contents (scripts, references, assets)
|
||||
3. Initialize the skill (run init_skill.py)
|
||||
4. Edit the skill (implement resources and write SKILL.md)
|
||||
5. Package the skill (run package_skill.py)
|
||||
6. Iterate based on real usage
|
||||
|
||||
Follow these steps in order, skipping only if there is a clear reason why they are not applicable.
|
||||
|
||||
### Step 1: Understanding the Skill with Concrete Examples
|
||||
|
||||
Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
|
||||
|
||||
To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
|
||||
|
||||
For example, when building an image-editor skill, relevant questions include:
|
||||
|
||||
- "What functionality should the image-editor skill support? Editing, rotating, anything else?"
|
||||
- "Can you give some examples of how this skill would be used?"
|
||||
- "I can imagine users asking for things like 'Remove the red-eye from this image' or 'Rotate this image'. Are there other ways you imagine this skill being used?"
|
||||
- "What would a user say that should trigger this skill?"
|
||||
|
||||
To avoid overwhelming users, limit the number of questions in a single message. Start with the most important questions and follow up as needed.
|
||||
|
||||
Conclude this step when there is a clear sense of the functionality the skill should support.
|
||||
|
||||
### Step 2: Planning the Reusable Skill Contents
|
||||
|
||||
To turn concrete examples into an effective skill, analyze each example by:
|
||||
|
||||
1. Considering how to execute on the example from scratch
|
||||
2. Identifying what scripts, references, and assets would be helpful when executing these workflows repeatedly
|
||||
|
||||
Example: When building a `pdf-editor` skill to handle queries like "Help me rotate this PDF," the analysis shows:
|
||||
|
||||
1. Rotating a PDF requires re-writing the same code each time
|
||||
2. A `scripts/rotate_pdf.py` script would be helpful to store in the skill
|
||||
|
||||
Example: When designing a `frontend-webapp-builder` skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
|
||||
|
||||
1. Writing a frontend webapp requires the same boilerplate HTML/React each time
|
||||
2. An `assets/hello-world/` template containing the boilerplate HTML/React project files would be helpful to store in the skill
|
||||
|
||||
Example: When building a `big-query` skill to handle queries like "How many users have logged in today?" the analysis shows:
|
||||
|
||||
1. Querying BigQuery requires re-discovering the table schemas and relationships each time
|
||||
2. A `references/schema.md` file documenting the table schemas would be helpful to store in the skill
|
||||
|
||||
To establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
|
||||
|
||||
### Step 3: Initializing the Skill
|
||||
|
||||
At this point, it is time to actually create the skill.
|
||||
|
||||
Skip this step only if the skill being developed already exists, and iteration or packaging is needed. In this case, continue to the next step.
|
||||
|
||||
When creating a new skill from scratch, always run the `init_skill.py` script. The script generates a template skill directory containing everything a skill requires, which makes skill creation faster and less error-prone.
|
||||
|
||||
Usage:
|
||||
|
||||
```bash
scripts/init_skill.py <skill-name> --path <output-directory>
```
|
||||
|
||||
The script:
|
||||
|
||||
- Creates the skill directory at the specified path
|
||||
- Generates a SKILL.md template with proper frontmatter and TODO placeholders
|
||||
- Creates example resource directories: `scripts/`, `references/`, and `assets/`
|
||||
- Adds example files in each directory that can be customized or deleted
|
||||
|
||||
After initialization, customize or remove the generated SKILL.md and example files as needed.
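
As a quick sanity check after initialization, the generated layout can be compared against what the script is described as creating. The sketch below is hypothetical: the helper `missing_entries` is not part of the skill's scripts, and the example file names (`scripts/example.py`, `references/api_reference.md`, `assets/example_asset.txt`) are assumptions that should be adjusted if the initializer in use writes different files.

```python
import tempfile
from pathlib import Path

# Assumed layout; the example file names are illustrative and may differ.
EXPECTED_ENTRIES = [
    "SKILL.md",
    "scripts/example.py",
    "references/api_reference.md",
    "assets/example_asset.txt",
]


def missing_entries(skill_dir):
    """Return the expected entries that do not exist under skill_dir."""
    root = Path(skill_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Demonstrate on a throwaway directory that only contains SKILL.md.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "my-new-skill"
    demo.mkdir()
    (demo / "SKILL.md").write_text("---\nname: my-new-skill\n---\n")
    missing = missing_entries(demo)

print(missing)
```

An empty result suggests the directory matches the assumed layout; anything listed is either not yet created or was deliberately deleted during customization.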
|
||||
|
||||
### Step 4: Edit the Skill
|
||||
|
||||
When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of Claude to use. Include information that would be beneficial and non-obvious to Claude. Consider what procedural knowledge, domain-specific details, or reusable assets would help another Claude instance execute these tasks more effectively.
|
||||
|
||||
#### Learn Proven Design Patterns
|
||||
|
||||
Consult these helpful guides based on your skill's needs:
|
||||
|
||||
- **Multi-step processes**: See references/workflows.md for sequential workflows and conditional logic
|
||||
- **Specific output formats or quality standards**: See references/output-patterns.md for template and example patterns
|
||||
|
||||
These files contain established best practices for effective skill design.
|
||||
|
||||
#### Start with Reusable Skill Contents
|
||||
|
||||
To begin implementation, start with the reusable resources identified above: `scripts/`, `references/`, and `assets/` files. Note that this step may require user input. For example, when implementing a `brand-guidelines` skill, the user may need to provide brand assets or templates to store in `assets/`, or documentation to store in `references/`.
|
||||
|
||||
Added scripts must be tested by actually running them to ensure there are no bugs and that the output matches what is expected. If there are many similar scripts, only a representative sample needs to be tested to ensure confidence that they all work while balancing time to completion.
|
||||
|
||||
Any example files and directories not needed for the skill should be deleted. The initialization script creates example files in `scripts/`, `references/`, and `assets/` to demonstrate structure, but most skills won't need all of them.
|
||||
|
||||
#### Update SKILL.md
|
||||
|
||||
**Writing Guidelines:** Always use imperative/infinitive form.
|
||||
|
||||
##### Frontmatter
|
||||
|
||||
Write the YAML frontmatter with `name` and `description`:
|
||||
|
||||
- `name`: The skill name
|
||||
- `description`: This is the primary triggering mechanism for your skill, and helps Claude understand when to use the skill.
|
||||
- Include both what the Skill does and specific triggers/contexts for when to use it.
|
||||
- Include all "when to use" information here, not in the body. The body is loaded only after the skill triggers, so a "When to Use This Skill" section in the body cannot help Claude decide whether to use the skill.
|
||||
- Example description for a `docx` skill: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
|
||||
|
||||
Do not include any other fields in YAML frontmatter.
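
To make the two-field rule concrete, the following sketch lists the top-level keys of a frontmatter block and reports anything beyond `name` and `description`. It is a simplified illustration that uses only the standard library and a naive line-based scan rather than a real YAML parser; the function name `frontmatter_keys` and the sample text are assumptions made for this example.

```python
import re


def frontmatter_keys(skill_md):
    """Return the top-level keys of the leading frontmatter block, in order."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if match is None:
        raise ValueError("no frontmatter block found")
    keys = []
    for line in match.group(1).splitlines():
        # Top-level keys start in column 0; indented lines are continuations.
        key = re.match(r"^([A-Za-z0-9_-]+):", line)
        if key:
            keys.append(key.group(1))
    return keys


example = """---
name: docx
description: Create, edit, and analyze .docx files. Use when working with Word documents.
---

# DOCX
"""

extra = set(frontmatter_keys(example)) - {"name", "description"}
print(sorted(extra))
```

For anything beyond a quick check, a proper YAML parser is the safer choice, since this scan does not understand quoting or nested structures.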
|
||||
|
||||
##### Body
|
||||
|
||||
Write instructions for using the skill and its bundled resources.
|
||||
|
||||
### Step 5: Packaging a Skill
|
||||
|
||||
Once development of the skill is complete, it must be packaged into a distributable .skill file that gets shared with the user. The packaging process automatically validates the skill first to ensure it meets all requirements:
|
||||
|
||||
```bash
scripts/package_skill.py <path/to/skill-folder>
```
|
||||
|
||||
Optional output directory specification:
|
||||
|
||||
```bash
scripts/package_skill.py <path/to/skill-folder> ./dist
```
|
||||
|
||||
The packaging script will:
|
||||
|
||||
1. **Validate** the skill automatically, checking:
|
||||
|
||||
- YAML frontmatter format and required fields
|
||||
- Skill naming conventions and directory structure
|
||||
- Description completeness and quality
|
||||
- File organization and resource references
|
||||
|
||||
2. **Package** the skill if validation passes, creating a .skill file named after the skill (e.g., `my-skill.skill`) that includes all files and maintains the proper directory structure for distribution. The .skill file is a zip file with a .skill extension.
|
||||
|
||||
If validation fails, the script will report the errors and exit without creating a package. Fix any validation errors and run the packaging command again.
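
Because a .skill file is a zip archive, its contents can be inspected with standard tooling. The sketch below builds a tiny archive in a temporary directory and lists its entries with Python's `zipfile` module. The helper `list_skill_archive` and the demo file names are hypothetical, and the archive is assembled by hand here rather than by the packaging script.

```python
import tempfile
import zipfile
from pathlib import Path


def list_skill_archive(skill_file):
    """Return the sorted entry names stored in a .skill (zip) archive."""
    with zipfile.ZipFile(skill_file) as archive:
        return sorted(archive.namelist())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    skill_dir = root / "my-skill"
    (skill_dir / "scripts").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(
        "---\nname: my-skill\ndescription: Demo skill.\n---\n"
    )
    (skill_dir / "scripts" / "example.py").write_text("print('hi')\n")

    # Store paths relative to the parent so the archive keeps the skill folder.
    skill_file = root / "my-skill.skill"
    with zipfile.ZipFile(skill_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in skill_dir.rglob("*"):
            if path.is_file():
                archive.write(path, path.relative_to(skill_dir.parent).as_posix())

    names = list_skill_archive(skill_file)

print(names)
```

Listing the entries this way is a cheap check that the top-level folder is named after the skill and that no stray files were included.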
|
||||
|
||||
### Step 6: Iterate
|
||||
|
||||
After testing the skill, users may request improvements. Often this happens right after using the skill, with fresh context of how the skill performed.
|
||||
|
||||
**Iteration workflow:**
|
||||
|
||||
1. Use the skill on real tasks
|
||||
2. Notice struggles or inefficiencies
|
||||
3. Identify how SKILL.md or bundled resources should be updated
|
||||
4. Implement changes and test again
|
||||
86
skills/skill-creator/references/output-patterns.md
Normal file
@@ -0,0 +1,86 @@
|
||||
# Output Patterns
|
||||
|
||||
Use these patterns when skills need to produce consistent, high-quality output.
|
||||
|
||||
## Template Pattern
|
||||
|
||||
Provide templates for output format. Match the level of strictness to your needs.
|
||||
|
||||
**For strict requirements (like API responses or data formats):**
|
||||
|
||||
```markdown
## Report structure

ALWAYS use this exact template structure:

# [Analysis Title]

## Executive summary
[One-paragraph overview of key findings]

## Key findings
- Finding 1 with supporting data
- Finding 2 with supporting data
- Finding 3 with supporting data

## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
```
|
||||
|
||||
**For flexible guidance (when adaptation is useful):**
|
||||
|
||||
```markdown
## Report structure

Here is a sensible default format, but use your best judgment:

# [Analysis Title]

## Executive summary
[Overview]

## Key findings
[Adapt sections based on what you discover]

## Recommendations
[Tailor to the specific context]

Adjust sections as needed for the specific analysis type.
```
|
||||
|
||||
## Examples Pattern
|
||||
|
||||
For skills where output quality depends on seeing examples, provide input/output pairs:
|
||||
|
||||
````markdown
## Commit message format

Generate commit messages following these examples:

**Example 1:**
Input: Added user authentication with JWT tokens
Output:
```
feat(auth): implement JWT-based authentication

Add login endpoint and token validation middleware
```

**Example 2:**
Input: Fixed bug where dates displayed incorrectly in reports
Output:
```
fix(reports): correct date formatting in timezone conversion

Use UTC timestamps consistently across report generation
```

Follow this style: type(scope): brief description, then detailed explanation.
````
|
||||
|
||||
Examples help Claude understand the desired style and level of detail more clearly than descriptions alone.
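
When a skill keeps many input/output pairs, the section can be generated from data instead of being maintained by hand. The following is a minimal sketch under that assumption; `render_examples` is a hypothetical helper, not something the skill ships, and it indents outputs by four spaces so they render as code without nested fences.

```python
def render_examples(title, instruction, pairs):
    """Render (input, output) pairs as a markdown section for SKILL.md."""
    lines = [f"## {title}", "", instruction, ""]
    for number, (given, expected) in enumerate(pairs, start=1):
        lines.append(f"**Example {number}:**")
        lines.append(f"Input: {given}")
        lines.append("Output:")
        lines.append("")
        # Indent by four spaces so the output renders as a code block
        # without needing nested fences.
        lines.extend(("    " + row).rstrip() for row in expected.splitlines())
        lines.append("")
    return "\n".join(lines)


section = render_examples(
    "Commit message format",
    "Generate commit messages following these examples:",
    [
        ("Added user authentication with JWT tokens",
         "feat(auth): implement JWT-based authentication\n\n"
         "Add login endpoint and token validation middleware"),
        ("Fixed bug where dates displayed incorrectly in reports",
         "fix(reports): correct date formatting in timezone conversion\n\n"
         "Use UTC timestamps consistently across report generation"),
    ],
)
print(section)
```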
|
||||
28
skills/skill-creator/references/workflows.md
Normal file
@@ -0,0 +1,28 @@
|
||||
# Workflow Patterns
|
||||
|
||||
## Sequential Workflows
|
||||
|
||||
For complex tasks, break operations into clear, sequential steps. It is often helpful to give Claude an overview of the process towards the beginning of SKILL.md:
|
||||
|
||||
```markdown
Filling a PDF form involves these steps:

1. Analyze the form (run analyze_form.py)
2. Create field mapping (edit fields.json)
3. Validate mapping (run validate_fields.py)
4. Fill the form (run fill_form.py)
5. Verify output (run verify_output.py)
```
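
The value of a sequential overview is that each step depends on the previous one succeeding. As a rough illustration, the sketch below runs named steps in order and stops at the first failure. The `run_workflow` helper and the stand-in lambda actions are assumptions for this example; a real driver might instead invoke the scripts named in the overview.

```python
def run_workflow(steps):
    """Run (name, action) steps in order; stop at the first action returning False."""
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Stand-in actions that only simulate success or failure.
steps = [
    ("Analyze the form", lambda: True),
    ("Create field mapping", lambda: True),
    ("Validate mapping", lambda: False),  # simulated validation failure
    ("Fill the form", lambda: True),
    ("Verify output", lambda: True),
]

done = run_workflow(steps)
print(done)
```

Stopping at the failed validation step means the form is never filled from a bad mapping, which is the behavior the numbered overview is meant to encourage.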
|
||||
|
||||
## Conditional Workflows
|
||||
|
||||
For tasks with branching logic, guide Claude through decision points:
|
||||
|
||||
```markdown
1. Determine the modification type:
   **Creating new content?** → Follow "Creation workflow" below
   **Editing existing content?** → Follow "Editing workflow" below

2. Creation workflow: [steps]
3. Editing workflow: [steps]
```
|
||||
304
skills/skill-creator/scripts/init_skill.py
Normal file
@@ -0,0 +1,304 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Skill Initializer - Creates a new skill from template
|
||||
|
||||
Usage:
|
||||
init_skill.py <skill-name> --path <path>
|
||||
|
||||
Examples:
|
||||
init_skill.py my-new-skill --path skills/public
|
||||
init_skill.py my-api-helper --path skills/private
|
||||
init_skill.py custom-skill --path /custom/location
|
||||
"""
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
SKILL_TEMPLATE = """---
|
||||
name: {skill_name}
|
||||
description: [TODO: Complete and informative explanation of what the skill does and when to use it. Include WHEN to use this skill - specific scenarios, file types, or tasks that trigger it.]
|
||||
---
|
||||
|
||||
# {skill_title}
|
||||
|
||||
## Overview
|
||||
|
||||
[TODO: 1-2 sentences explaining what this skill enables]
|
||||
|
||||
## Structuring This Skill
|
||||
|
||||
[TODO: Choose the structure that best fits this skill's purpose. Common patterns:
|
||||
|
||||
**1. Workflow-Based** (best for sequential processes)
|
||||
- Works well when there are clear step-by-step procedures
|
||||
- Example: DOCX skill with "Workflow Decision Tree" → "Reading" → "Creating" → "Editing"
|
||||
- Structure: ## Overview → ## Workflow Decision Tree → ## Step 1 → ## Step 2...
|
||||
|
||||
**2. Task-Based** (best for tool collections)
|
||||
- Works well when the skill offers different operations/capabilities
|
||||
- Example: PDF skill with "Quick Start" → "Merge PDFs" → "Split PDFs" → "Extract Text"
|
||||
- Structure: ## Overview → ## Quick Start → ## Task Category 1 → ## Task Category 2...
|
||||
|
||||
**3. Reference/Guidelines** (best for standards or specifications)
|
||||
- Works well for brand guidelines, coding standards, or requirements
|
||||
- Example: Brand styling with "Brand Guidelines" → "Colors" → "Typography" → "Features"
|
||||
- Structure: ## Overview → ## Guidelines → ## Specifications → ## Usage...
|
||||
|
||||
**4. Capabilities-Based** (best for integrated systems)
|
||||
- Works well when the skill provides multiple interrelated features
|
||||
- Example: Product Management with "Core Capabilities" → numbered capability list
|
||||
- Structure: ## Overview → ## Core Capabilities → ### 1. Feature → ### 2. Feature...
|
||||
|
||||
Patterns can be mixed and matched as needed. Most skills combine patterns (e.g., start with task-based, add workflow for complex operations).
|
||||
|
||||
Delete this entire "Structuring This Skill" section when done - it's just guidance.]
|
||||
|
||||
## [TODO: Replace with the first main section based on chosen structure]
|
||||
|
||||
[TODO: Add content here. See examples in existing skills:
|
||||
- Code samples for technical skills
|
||||
- Decision trees for complex workflows
|
||||
- Concrete examples with realistic user requests
|
||||
- References to scripts/templates/references as needed]
|
||||
|
||||
## Resources
|
||||
|
||||
This skill includes example resource directories that demonstrate how to organize different types of bundled resources:
|
||||
|
||||
### scripts/
|
||||
Executable code (Python/Bash/etc.) that can be run directly to perform specific operations.
|
||||
|
||||
**Examples from other skills:**
|
||||
- PDF skill: `fill_fillable_fields.py`, `extract_form_field_info.py` - utilities for PDF manipulation
|
||||
- DOCX skill: `document.py`, `utilities.py` - Python modules for document processing
|
||||
|
||||
**Appropriate for:** Python scripts, shell scripts, or any executable code that performs automation, data processing, or specific operations.
|
||||
|
||||
**Note:** Scripts may be executed without loading into context, but can still be read by Claude for patching or environment adjustments.
|
||||
|
||||
### references/
|
||||
Documentation and reference material intended to be loaded into context to inform Claude's process and thinking.
|
||||
|
||||
**Examples from other skills:**
|
||||
- Product management: `communication.md`, `context_building.md` - detailed workflow guides
|
||||
- BigQuery: API reference documentation and query examples
|
||||
- Finance: Schema documentation, company policies
|
||||
|
||||
**Appropriate for:** In-depth documentation, API references, database schemas, comprehensive guides, or any detailed information that Claude should reference while working.
|
||||
|
||||
### assets/
|
||||
Files not intended to be loaded into context, but rather used within the output Claude produces.
|
||||
|
||||
**Examples from other skills:**
|
||||
- Brand styling: PowerPoint template files (.pptx), logo files
|
||||
- Frontend builder: HTML/React boilerplate project directories
|
||||
- Typography: Font files (.ttf, .woff2)
|
||||
|
||||
**Appropriate for:** Templates, boilerplate code, document templates, images, icons, fonts, or any files meant to be copied or used in the final output.
|
||||
|
||||
---
|
||||
|
||||
**Any unneeded directories can be deleted.** Not every skill requires all three types of resources.
|
||||
"""
|
||||
|
||||
EXAMPLE_SCRIPT = '''#!/usr/bin/env python3
|
||||
"""
|
||||
Example helper script for {skill_name}
|
||||
|
||||
This is a placeholder script that can be executed directly.
|
||||
Replace with actual implementation or delete if not needed.
|
||||
|
||||
Example real scripts from other skills:
|
||||
- pdf/scripts/fill_fillable_fields.py - Fills PDF form fields
|
||||
- pdf/scripts/convert_pdf_to_images.py - Converts PDF pages to images
|
||||
"""
|
||||
|
||||
def main():
|
||||
print("This is an example script for {skill_name}")
|
||||
# TODO: Add actual script logic here
|
||||
# This could be data processing, file conversion, API calls, etc.
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
'''
|
||||
|
||||
EXAMPLE_REFERENCE = """# Reference Documentation for {skill_title}
|
||||
|
||||
This is a placeholder for detailed reference documentation.
|
||||
Replace with actual reference content or delete if not needed.
|
||||
|
||||
Example real reference docs from other skills:
|
||||
- product-management/references/communication.md - Comprehensive guide for status updates
|
||||
- product-management/references/context_building.md - Deep-dive on gathering context
|
||||
- bigquery/references/ - API references and query examples
|
||||
|
||||
## When Reference Docs Are Useful
|
||||
|
||||
Reference docs are ideal for:
|
||||
- Comprehensive API documentation
|
||||
- Detailed workflow guides
|
||||
- Complex multi-step processes
|
||||
- Information too lengthy for main SKILL.md
|
||||
- Content that's only needed for specific use cases
|
||||
|
||||
## Structure Suggestions
|
||||
|
||||
### API Reference Example
|
||||
- Overview
|
||||
- Authentication
|
||||
- Endpoints with examples
|
||||
- Error codes
|
||||
- Rate limits
|
||||
|
||||
### Workflow Guide Example
|
||||
- Prerequisites
|
||||
- Step-by-step instructions
|
||||
- Common patterns
|
||||
- Troubleshooting
|
||||
- Best practices
|
||||
"""
|
||||
|
||||
EXAMPLE_ASSET = """# Example Asset File
|
||||
|
||||
This placeholder represents where asset files would be stored.
|
||||
Replace with actual asset files (templates, images, fonts, etc.) or delete if not needed.
|
||||
|
||||
Asset files are NOT intended to be loaded into context, but rather used within
|
||||
the output Claude produces.
|
||||
|
||||
Example asset files from other skills:
|
||||
- Brand guidelines: logo.png, slides_template.pptx
|
||||
- Frontend builder: hello-world/ directory with HTML/React boilerplate
|
||||
- Typography: custom-font.ttf, font-family.woff2
|
||||
- Data: sample_data.csv, test_dataset.json
|
||||
|
||||
## Common Asset Types
|
||||
|
||||
- Templates: .pptx, .docx, boilerplate directories
|
||||
- Images: .png, .jpg, .svg, .gif
|
||||
- Fonts: .ttf, .otf, .woff, .woff2
|
||||
- Boilerplate code: Project directories, starter files
|
||||
- Icons: .ico, .svg
|
||||
- Data files: .csv, .json, .xml, .yaml
|
||||
|
||||
Note: This is a text placeholder. Actual assets can be any file type.
|
||||
"""
|
||||
|
||||
|
||||
def title_case_skill_name(skill_name):
    """Convert hyphenated skill name to Title Case for display."""
    return ' '.join(word.capitalize() for word in skill_name.split('-'))
|
||||
|
||||
|
||||
def init_skill(skill_name, path):
    """
    Initialize a new skill directory with template SKILL.md.

    Args:
        skill_name: Name of the skill
        path: Path where the skill directory should be created

    Returns:
        Path to created skill directory, or None if error
    """
    # Determine skill directory path
    skill_dir = Path(path).resolve() / skill_name

    # Check if directory already exists
    if skill_dir.exists():
        print(f"❌ Error: Skill directory already exists: {skill_dir}")
        return None

    # Create skill directory
    try:
        skill_dir.mkdir(parents=True, exist_ok=False)
        print(f"✅ Created skill directory: {skill_dir}")
    except Exception as e:
        print(f"❌ Error creating directory: {e}")
        return None

    # Create SKILL.md from template
    skill_title = title_case_skill_name(skill_name)
    skill_content = SKILL_TEMPLATE.format(
        skill_name=skill_name,
        skill_title=skill_title
    )

    skill_md_path = skill_dir / 'SKILL.md'
    try:
        skill_md_path.write_text(skill_content)
        print("✅ Created SKILL.md")
    except Exception as e:
        print(f"❌ Error creating SKILL.md: {e}")
        return None

    # Create resource directories with example files
    try:
        # Create scripts/ directory with example script
        scripts_dir = skill_dir / 'scripts'
        scripts_dir.mkdir(exist_ok=True)
        example_script = scripts_dir / 'example.py'
        example_script.write_text(EXAMPLE_SCRIPT.format(skill_name=skill_name))
        example_script.chmod(0o755)
        print("✅ Created scripts/example.py")

        # Create references/ directory with example reference doc
        references_dir = skill_dir / 'references'
        references_dir.mkdir(exist_ok=True)
        example_reference = references_dir / 'api_reference.md'
        example_reference.write_text(
            EXAMPLE_REFERENCE.format(skill_title=skill_title))
        print("✅ Created references/api_reference.md")

        # Create assets/ directory with example asset placeholder
        assets_dir = skill_dir / 'assets'
        assets_dir.mkdir(exist_ok=True)
        example_asset = assets_dir / 'example_asset.txt'
        example_asset.write_text(EXAMPLE_ASSET)
        print("✅ Created assets/example_asset.txt")
    except Exception as e:
        print(f"❌ Error creating resource directories: {e}")
        return None

    # Print next steps
    print(f"\n✅ Skill '{skill_name}' initialized successfully at {skill_dir}")
    print("\nNext steps:")
    print("1. Edit SKILL.md to complete the TODO items and update the description")
    print("2. Customize or delete the example files in scripts/, references/, and assets/")
    print("3. Run the validator when ready to check the skill structure")

    return skill_dir
|
||||
|
||||
|
||||
def main():
    # Expect exactly: init_skill.py <skill-name> --path <path>
    if len(sys.argv) < 4 or sys.argv[2] != '--path':
        print("Usage: init_skill.py <skill-name> --path <path>")
        print("\nSkill name requirements:")
        print(" - Hyphen-case identifier (e.g., 'data-analyzer')")
        print(" - Lowercase letters, digits, and hyphens only")
        print(" - Max 64 characters")
        print(" - Must match directory name exactly")
        print("\nExamples:")
        print(" init_skill.py my-new-skill --path skills/public")
        print(" init_skill.py my-api-helper --path skills/private")
        print(" init_skill.py custom-skill --path /custom/location")
        sys.exit(1)

    skill_name = sys.argv[1]
    path = sys.argv[3]

    print(f"🚀 Initializing skill: {skill_name}")
    print(f" Location: {path}")
    print()

    result = init_skill(skill_name, path)

    if result:
        sys.exit(0)
    else:
        sys.exit(1)


if __name__ == "__main__":
    main()
|
||||
111
skills/skill-creator/scripts/package_skill.py
Normal file
@@ -0,0 +1,111 @@
|
||||
#!/usr/bin/env python3
"""
Skill Packager - Creates a distributable .skill file of a skill folder

Usage:
    python scripts/package_skill.py <path/to/skill-folder> [output-directory]

Example:
    python scripts/package_skill.py skills/public/my-skill
    python scripts/package_skill.py skills/public/my-skill ./dist
"""

import sys
import zipfile
from pathlib import Path
from quick_validate import validate_skill


def package_skill(skill_path, output_dir=None):
    """
    Package a skill folder into a .skill file.

    Args:
        skill_path: Path to the skill folder
        output_dir: Optional output directory for the .skill file (defaults to current directory)

    Returns:
        Path to the created .skill file, or None if error
    """
    skill_path = Path(skill_path).resolve()

    # Validate skill folder exists
    if not skill_path.exists():
        print(f"❌ Error: Skill folder not found: {skill_path}")
        return None

    if not skill_path.is_dir():
        print(f"❌ Error: Path is not a directory: {skill_path}")
        return None

    # Validate SKILL.md exists
    skill_md = skill_path / "SKILL.md"
    if not skill_md.exists():
        print(f"❌ Error: SKILL.md not found in {skill_path}")
        return None

    # Run validation before packaging
    print("🔍 Validating skill...")
    valid, message = validate_skill(skill_path)
    if not valid:
        print(f"❌ Validation failed: {message}")
        print(" Please fix the validation errors before packaging.")
        return None
    print(f"✅ {message}\n")

    # Determine output location
    skill_name = skill_path.name
    if output_dir:
        output_path = Path(output_dir).resolve()
        output_path.mkdir(parents=True, exist_ok=True)
    else:
        output_path = Path.cwd()

    skill_filename = output_path / f"{skill_name}.skill"

    # Create the .skill file (zip format)
    try:
        with zipfile.ZipFile(skill_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
            # Walk through the skill directory
            for file_path in skill_path.rglob('*'):
                if file_path.is_file():
                    # Calculate the relative path within the zip
                    arcname = file_path.relative_to(skill_path.parent)
                    zipf.write(file_path, arcname)
                    print(f" Added: {arcname}")

        print(f"\n✅ Successfully packaged skill to: {skill_filename}")
        return skill_filename

    except Exception as e:
        print(f"❌ Error creating .skill file: {e}")
        return None


def main():
    if len(sys.argv) < 2:
        print(
            "Usage: python scripts/package_skill.py <path/to/skill-folder> [output-directory]")
        print("\nExample:")
        print(" python scripts/package_skill.py skills/public/my-skill")
        print(" python scripts/package_skill.py skills/public/my-skill ./dist")
        sys.exit(1)

    skill_path = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else None

    print(f"📦 Packaging skill: {skill_path}")
    if output_dir:
        print(f" Output directory: {output_dir}")
    print()

    result = package_skill(skill_path, output_dir)

    if result:
        sys.exit(0)
    else:
        sys.exit(1)


if __name__ == "__main__":
    main()
|
||||
98
skills/skill-creator/scripts/quick_validate.py
Normal file
@@ -0,0 +1,98 @@
|
||||
#!/usr/bin/env python3
"""
Quick validation script for skills - minimal version
"""

import sys
import os
import re
import yaml
from pathlib import Path


def validate_skill(skill_path):
    """Basic validation of a skill"""
    skill_path = Path(skill_path)

    # Check SKILL.md exists
    skill_md = skill_path / 'SKILL.md'
    if not skill_md.exists():
        return False, "SKILL.md not found"

    # Read and validate frontmatter
    content = skill_md.read_text()
    if not content.startswith('---'):
        return False, "No YAML frontmatter found"

    # Extract frontmatter
    match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
    if not match:
        return False, "Invalid frontmatter format"

    frontmatter_text = match.group(1)

    # Parse YAML frontmatter
    try:
        frontmatter = yaml.safe_load(frontmatter_text)
        if not isinstance(frontmatter, dict):
            return False, "Frontmatter must be a YAML dictionary"
    except yaml.YAMLError as e:
        return False, f"Invalid YAML in frontmatter: {e}"

    # Define allowed properties
    ALLOWED_PROPERTIES = {'name', 'description',
                          'license', 'allowed-tools', 'metadata'}

    # Check for unexpected properties (excluding nested keys under metadata)
    unexpected_keys = set(frontmatter.keys()) - ALLOWED_PROPERTIES
    if unexpected_keys:
        return False, (
            f"Unexpected key(s) in SKILL.md frontmatter: {', '.join(sorted(unexpected_keys))}. "
            f"Allowed properties are: {', '.join(sorted(ALLOWED_PROPERTIES))}"
        )

    # Check required fields
    if 'name' not in frontmatter:
        return False, "Missing 'name' in frontmatter"
    if 'description' not in frontmatter:
        return False, "Missing 'description' in frontmatter"

    # Extract name for validation
    name = frontmatter.get('name', '')
    if not isinstance(name, str):
        return False, f"Name must be a string, got {type(name).__name__}"
    name = name.strip()
    if name:
        # Check naming convention (hyphen-case: lowercase with hyphens)
        if not re.match(r'^[a-z0-9-]+$', name):
            return False, f"Name '{name}' should be hyphen-case (lowercase letters, digits, and hyphens only)"
        if name.startswith('-') or name.endswith('-') or '--' in name:
            return False, f"Name '{name}' cannot start/end with hyphen or contain consecutive hyphens"
        # Check name length (max 64 characters per spec)
        if len(name) > 64:
            return False, f"Name is too long ({len(name)} characters). Maximum is 64 characters."

    # Extract and validate description
    description = frontmatter.get('description', '')
    if not isinstance(description, str):
        return False, f"Description must be a string, got {type(description).__name__}"
    description = description.strip()
    if description:
        # Check for angle brackets
        if '<' in description or '>' in description:
            return False, "Description cannot contain angle brackets (< or >)"
        # Check description length (max 1024 characters per spec)
        if len(description) > 1024:
            return False, f"Description is too long ({len(description)} characters). Maximum is 1024 characters."

    return True, "Skill is valid!"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python quick_validate.py <skill_directory>")
        sys.exit(1)

    valid, message = validate_skill(sys.argv[1])
    print(message)
    sys.exit(0 if valid else 1)
|
||||
445
skills/skill-factory/SKILL.md
Normal file
@@ -0,0 +1,445 @@
|
||||
---
|
||||
name: skill-factory
|
||||
description: >
|
||||
Research-backed skill creation workflow with automated firecrawl research gathering, multi-tier
|
||||
validation, and comprehensive auditing. Use when "create skills with research automation",
|
||||
"build research-backed skills", "validate skills end-to-end", "automate skill research and
|
||||
creation", needs 8-phase workflow from research through final audit, wants firecrawl-powered
|
||||
research combined with validation, or requires quality-assured skill creation following
|
||||
Anthropic specifications for Claude Code.
|
||||
---
|
||||
|
||||
# Skill Factory
|
||||
|
||||
Comprehensive workflow orchestrator for creating high-quality Claude Code skills with automated research, content
|
||||
review, and multi-tier validation.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use skill-factory when:
|
||||
|
||||
- **Creating any new skill** - From initial idea to validated, production-ready skill
|
||||
- **Research needed** - Automate gathering of documentation, examples, and best practices
|
||||
- **Quality assurance required** - Ensure skills meet official specifications and best practices
|
||||
- **Guided workflow preferred** - Step-by-step progression with clear checkpoints
|
||||
- **Validation needed** - Runtime testing, integration checks, and comprehensive auditing
|
||||
|
||||
**Scope:** Creates skills for ANY purpose (not limited to meta-claude plugin):
|
||||
|
||||
- Infrastructure skills (terraform-best-practices, ansible-vault-security)
|
||||
- Development skills (docker-compose-helper, git-workflow-automation)
|
||||
- Domain-specific skills (brand-guidelines, conventional-git-commits)
|
||||
- Any skill that extends Claude's capabilities
|
||||
|
||||
## Available Operations
|
||||
|
||||
The skill-factory provides 8 specialized commands for the create-review-validate lifecycle:
|
||||
|
||||
| Command | Purpose | Use When |
|---------|---------|----------|
| `/meta-claude:skill:research` | Gather domain knowledge using firecrawl API | Need automated web scraping for skill research |
| `/meta-claude:skill:format` | Clean and structure research materials | Have raw research needing markdown formatting |
| `/meta-claude:skill:create` | Generate SKILL.md with YAML frontmatter | Ready to create skill structure from research |
| `/meta-claude:skill:review-content` | Validate content quality and clarity | Need content review before compliance check |
| `/meta-claude:skill:review-compliance` | Run quick_validate.py on SKILL.md | Validate YAML frontmatter and naming conventions |
| `/meta-claude:skill:validate-runtime` | Test skill loading in Claude context | Verify skill loads without syntax errors |
| `/meta-claude:skill:validate-integration` | Check for conflicts with existing skills | Ensure no duplicate names or overlaps |
| `/meta-claude:skill:validate-audit` | Invoke claude-skill-auditor agent | Get comprehensive audit against Anthropic specs |
|
||||
|
||||
**Power user tip:** Commands work standalone or orchestrated. Use individual commands for targeted fixes,
|
||||
or invoke the skill for full workflow automation.
|
||||
|
||||
**Visual learners:** See [workflows/visual-guide.md](workflows/visual-guide.md) for decision trees, state diagrams,
|
||||
and workflow visualizations.
|
||||
|
||||
## Quick Decision Guide
|
||||
|
||||
### Full Workflow vs Individual Commands
|
||||
|
||||
**Creating new skill (full workflow):**
|
||||
|
||||
- With research → `skill-factory <skill-name> <research-path>`
|
||||
- Without research → `skill-factory <skill-name>` (includes firecrawl research)
|
||||
- From knowledge only → `skill-factory <skill-name>` → Select "Skip - I'll create from knowledge only"
|
||||
|
||||
**Using individual commands (power users):**
|
||||
|
||||
| Scenario | Command | Why |
|----------|---------|-----|
| Need web research for skill topic | `/meta-claude:skill:research <name> [sources]` | Automated firecrawl scraping |
| Have messy research files | `/meta-claude:skill:format <research-dir>` | Clean markdown formatting |
| Ready to generate SKILL.md | `/meta-claude:skill:create <name> <research-dir>` | Creates structure with YAML |
| Content unclear or incomplete | `/meta-claude:skill:review-content <skill-path>` | Quality gate before compliance |
| Check frontmatter syntax | `/meta-claude:skill:review-compliance <skill-path>` | Runs quick_validate.py |
| Skill won't load in Claude | `/meta-claude:skill:validate-runtime <skill-path>` | Tests actual loading |
| Worried about name conflicts | `/meta-claude:skill:validate-integration <skill-path>` | Checks existing skills |
| Want Anthropic spec audit | `/meta-claude:skill:validate-audit <skill-path>` | Runs claude-skill-auditor |
|
||||
|
||||
**When to use full workflow:** Creating new skills from scratch
|
||||
**When to use individual commands:** Fixing specific issues, power user iteration
|
||||
|
||||
For full workflow details, see the Quick Start section below.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Path 1: Research Already Gathered
|
||||
|
||||
If you have research materials ready:
|
||||
|
||||
```bash
|
||||
# Research exists at docs/research/skills/<skill-name>/
|
||||
skill-factory <skill-name> docs/research/skills/<skill-name>/
|
||||
```
|
||||
|
||||
The skill will:
|
||||
|
||||
1. Format research materials
|
||||
2. Create skill structure
|
||||
3. Review content quality
|
||||
4. Review technical compliance
|
||||
5. Validate runtime loading
|
||||
6. Validate integration
|
||||
7. Run comprehensive audit
|
||||
8. Present completion options
|
||||
|
||||
### Path 2: Research Needed
|
||||
|
||||
If starting from scratch:
|
||||
|
||||
```bash
|
||||
# Let skill-factory handle research
|
||||
skill-factory <skill-name>
|
||||
```
|
||||
|
||||
The skill will ask about research sources and proceed through the full workflow.
|
||||
|
||||
### Example Usage
|
||||
|
||||
```text
|
||||
User: "Create a skill for CodeRabbit code review best practices"
|
||||
|
||||
skill-factory detects no research path provided, asks:
|
||||
|
||||
"Have you already gathered research for this skill?
|
||||
[Yes - I have research at <path>]
|
||||
[No - Help me gather research]
|
||||
[Skip - I'll create from knowledge only]"
|
||||
|
||||
User: "No - Help me gather research"
|
||||
|
||||
skill-factory proceeds through Path 2:
|
||||
1. Research skill domain
|
||||
2. Format research materials
|
||||
3. Create skill structure
|
||||
... (continues through all phases)
|
||||
```
|
||||
|
||||
## When This Skill Is Invoked
|
||||
|
||||
**Your role:** You are the skill-factory orchestrator. Your task is to guide the user through creating
|
||||
a high-quality, validated skill using 8 primitive slash commands.
|
||||
|
||||
### Step 1: Entry Point Detection
|
||||
|
||||
Analyze the user's prompt to determine which workflow path to use:
|
||||
|
||||
**If research path is explicitly provided:**
|
||||
|
||||
```text
|
||||
User: "skill-factory coderabbit docs/research/skills/coderabbit/"
|
||||
→ Use Path 1 (skip research phase)
|
||||
```
|
||||
|
||||
**If no research path is provided:**
|
||||
|
||||
Ask the user using AskUserQuestion:
|
||||
|
||||
```text
|
||||
"Have you already gathered research for this skill?"
|
||||
|
||||
Options:
|
||||
[Yes - I have research at a specific location]
|
||||
[No - Help me gather research]
|
||||
[Skip - I'll create from knowledge only]
|
||||
```
|
||||
|
||||
**Based on user response:**
|
||||
|
||||
- **Yes** → Ask for research path, use Path 1
|
||||
- **No** → Use Path 2 (include research phase)
|
||||
- **Skip** → Use Path 1 without research (create from existing knowledge)
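
The entry point rules above can be sketched as a small decision function. This is only an illustration: the names `WorkflowPath` and `select_path` are hypothetical, and the short answer strings (`"yes"`, `"no"`, `"skip"`) are assumed stand-ins for the three AskUserQuestion options.

```python
from enum import Enum
from typing import Optional, Tuple


class WorkflowPath(Enum):
    """Hypothetical labels for the two workflow paths."""
    PATH_1 = 1  # research exists or is skipped
    PATH_2 = 2  # research phase included


def select_path(research_path: Optional[str],
                answer: Optional[str] = None) -> Tuple[WorkflowPath, bool]:
    """Return (path, has_research).

    research_path: a path found in the user's prompt, if any.
    answer: assumed short form of the AskUserQuestion reply.
    """
    if research_path:
        # An explicit research path always means Path 1.
        return WorkflowPath.PATH_1, True
    if answer == "yes":
        # The caller still has to ask where the research lives.
        return WorkflowPath.PATH_1, True
    if answer == "no":
        return WorkflowPath.PATH_2, False
    if answer == "skip":
        # Path 1 without research: create from existing knowledge.
        return WorkflowPath.PATH_1, False
    raise ValueError("no research path given and no recognized answer")
```

Returning `has_research` separately keeps the "Skip" case distinct from the "Yes" case even though both use Path 1.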
|
||||
|
||||
### Step 2: Initialize TodoWrite
|
||||
|
||||
Create a TodoWrite list based on the selected path:
|
||||
|
||||
**Path 2 (Full Workflow with Research):**
|
||||
|
||||
```javascript
|
||||
TodoWrite([
|
||||
{"content": "Research skill domain", "status": "pending", "activeForm": "Researching skill domain"},
|
||||
{"content": "Format research materials", "status": "pending", "activeForm": "Formatting research materials"},
|
||||
{"content": "Create skill structure", "status": "pending", "activeForm": "Creating skill structure"},
|
||||
{"content": "Review content quality", "status": "pending", "activeForm": "Reviewing content quality"},
|
||||
{"content": "Review technical compliance", "status": "pending", "activeForm": "Reviewing technical compliance"},
|
||||
{"content": "Validate runtime loading", "status": "pending", "activeForm": "Validating runtime loading"},
|
||||
{"content": "Validate integration", "status": "pending", "activeForm": "Validating integration"},
|
||||
{"content": "Run comprehensive audit", "status": "pending", "activeForm": "Running comprehensive audit"},
|
||||
{"content": "Complete workflow", "status": "pending", "activeForm": "Completing workflow"}
|
||||
])
|
||||
```
|
||||
|
||||
**Path 1 (Research Exists or Skipped):**
|
||||
|
||||
Omit the first "Research skill domain" task. Start with "Format research materials" or
|
||||
"Create skill structure" depending on whether research exists.
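
As a rough sketch, the per-path task lists can be derived from one shared table. The helper name `build_todos` and its two flags are assumptions made for illustration; the item fields mirror the TodoWrite example above.

```python
from typing import Dict, List

# (content, activeForm) pairs, in workflow order.
PHASES = [
    ("Research skill domain", "Researching skill domain"),
    ("Format research materials", "Formatting research materials"),
    ("Create skill structure", "Creating skill structure"),
    ("Review content quality", "Reviewing content quality"),
    ("Review technical compliance", "Reviewing technical compliance"),
    ("Validate runtime loading", "Validating runtime loading"),
    ("Validate integration", "Validating integration"),
    ("Run comprehensive audit", "Running comprehensive audit"),
    ("Complete workflow", "Completing workflow"),
]


def build_todos(include_research: bool, include_format: bool = True) -> List[Dict[str, str]]:
    """Build the pending TodoWrite items for the selected path (hypothetical helper)."""
    skipped = set()
    if not include_research:
        skipped.add("Research skill domain")
    if not include_format:
        # Nothing to format when the skill is created from knowledge only.
        skipped.add("Format research materials")
    return [
        {"content": content, "status": "pending", "activeForm": active}
        for content, active in PHASES
        if content not in skipped
    ]
```

Under these assumptions, Path 2 corresponds to `build_todos(True)`, Path 1 with research to `build_todos(False)`, and the "Skip" case to `build_todos(False, include_format=False)`.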
|
||||
|
||||
### Step 3: Execute Workflow Sequentially
|
||||
|
||||
For each phase in the workflow, follow this pattern:
|
||||
|
||||
#### 1. Mark phase as in_progress
|
||||
|
||||
Update the corresponding TodoWrite item to `in_progress` status.
|
||||
|
||||
#### 2. Check dependencies
|
||||
|
||||
Before running a command, verify prior phases completed:
|
||||
|
||||
- Review-compliance requires review-content to pass
|
||||
- Validate-runtime requires review-compliance to pass
|
||||
- Validate-integration requires validate-runtime to pass
|
||||
- Validate-audit runs regardless (non-blocking feedback)
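
The rules above can be read as a small prerequisite table. The sketch below is one way to express it; `REQUIRES` and `can_run` are hypothetical names, and the set of passed commands is assumed to be tracked by the orchestrator.

```python
from typing import Dict, Set

# Each gated command maps to the command that must have passed first.
REQUIRES: Dict[str, str] = {
    "review-compliance": "review-content",
    "validate-runtime": "review-compliance",
    "validate-integration": "validate-runtime",
}


def can_run(command: str, passed: Set[str]) -> bool:
    """Return True when the command's prerequisite (if any) has passed."""
    prerequisite = REQUIRES.get(command)
    # Commands without an entry, such as validate-audit, are never blocked.
    return prerequisite is None or prerequisite in passed
```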
|
||||
|
||||
#### 3. Invoke command using SlashCommand tool
|
||||
|
||||
```text
|
||||
/meta-claude:skill:research <skill-name> [sources]
|
||||
/meta-claude:skill:format <research-dir>
|
||||
/meta-claude:skill:create <skill-name> <research-dir>
|
||||
/meta-claude:skill:review-content <skill-path>
|
||||
/meta-claude:skill:review-compliance <skill-path>
|
||||
/meta-claude:skill:validate-runtime <skill-path>
|
||||
/meta-claude:skill:validate-integration <skill-path>
|
||||
/meta-claude:skill:validate-audit <skill-path>
|
||||
```
|
||||
|
||||
**IMPORTANT:** Wait for each command to complete before proceeding to the next phase.
|
||||
Do not invoke multiple commands in parallel.
|
||||
|
||||
#### 4. Check command result
|
||||
|
||||
Each command returns success or failure with specific error details.
|
||||
|
||||
#### 5. Apply fix strategy if needed
|
||||
|
||||
The workflow uses a three-tier fix strategy:
|
||||
|
||||
- **Tier 1 (Simple):** Auto-fix formatting, frontmatter, markdown syntax
|
||||
- **Tier 2 (Medium):** Guided fixes with user approval
|
||||
- **Tier 3 (Complex):** Stop and report - requires manual fixes
|
||||
|
||||
**One-shot policy:** Each fix applied once, re-run once, then fail fast if still broken.
|
||||
|
||||
**For complete tier definitions, issue categorization, examples, and fix workflows:**
|
||||
See [references/error-handling.md](references/error-handling.md)
|
||||
|
||||
#### 6. Mark phase completed
|
||||
|
||||
Update TodoWrite item to `completed` status.
|
||||
|
||||
#### 7. Continue to next phase
|
||||
|
||||
Proceed to the next workflow phase, or exit if fail-fast triggered.
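
Taken together, the seven steps form a simple sequential loop. The following is a minimal sketch under stated assumptions: `invoke` is a hypothetical callable standing in for a SlashCommand invocation that reports pass or fail, dependency checks and the fix strategy are left out for brevity, and treating the audit as the only non-blocking phase is an interpretation of the rule in step 2.

```python
from typing import Callable, Dict, FrozenSet, List

# Assumption: only the audit phase is allowed to fail without stopping.
NON_BLOCKING: FrozenSet[str] = frozenset({"Run comprehensive audit"})


def run_phases(todos: List[Dict[str, str]], invoke: Callable[[str], bool]) -> bool:
    """Run phases one at a time; return False as soon as a blocking phase fails."""
    for todo in todos:
        todo["status"] = "in_progress"
        passed = invoke(todo["content"])  # waits for the command to finish
        if not passed and todo["content"] not in NON_BLOCKING:
            # Fail fast: the failed item stays in_progress, later ones stay pending.
            return False
        todo["status"] = "completed"
    return True
```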
|
||||
|
||||
### Step 4: Completion
|
||||
|
||||
When all phases pass successfully:
|
||||
|
||||
**Present completion summary:**
|
||||
|
||||
```text
|
||||
✅ Skill created and validated successfully!
|
||||
|
||||
Location: <skill-output-path>/
|
||||
|
||||
Research materials: docs/research/skills/<skill-name>/
|
||||
```
|
||||
|
||||
**Ask about artifact cleanup:**
|
||||
|
||||
```text
|
||||
Keep research materials? [Keep/Remove] (default: Keep)
|
||||
```
|
||||
|
||||
**Present next steps using AskUserQuestion:**
|
||||
|
||||
```text
|
||||
Next steps - choose an option:
|
||||
[Test the skill now - Try invoking it in a new conversation]
|
||||
[Create PR - Submit skill to repository]
|
||||
[Add to plugin.json - Integrate with plugin manifest]
|
||||
[Done - Exit workflow]
|
||||
```
|
||||
|
||||
**Execute user's choice:**
|
||||
|
||||
- **Test** → Guide user to test skill invocation
|
||||
- **Create PR** → Create git branch, commit, push, open PR
|
||||
- **Add to plugin.json** → Update manifest, validate structure
|
||||
- **Done** → Clean exit
|
||||
|
||||
### Key Execution Principles
|
||||
|
||||
**Sequential Execution:** Do not run commands in parallel. Wait for each phase to complete before proceeding.
|
||||
|
||||
**Context Window Protection:** You are orchestrating commands, not sub-agents. Your context window is safe
|
||||
because you're invoking slash commands sequentially, not spawning multiple agents.
|
||||
|
||||
**State Management:** TodoWrite provides real-time progress visibility. Update it at every phase
|
||||
transition.
|
||||
|
||||
**Fail Fast:** When Tier 3 issues occur or user declines fixes, exit immediately with clear guidance.
|
||||
Don't attempt complex recovery.
|
||||
|
||||
**Dependency Enforcement:** Never skip dependency checks. Review phases are sequential, validation
|
||||
phases are tiered.
|
||||
|
||||
**One-shot Fixes:** Apply each fix once, re-run once, then fail if still broken. This prevents infinite loops.
|
||||
|
||||
**User Communication:** Report progress clearly. Show which phase is running, what the result was,
|
||||
and what's happening next.
|
||||
|
||||
## Workflow Architecture
|
||||
|
||||
Two paths based on research availability: Path 1 (research exists) and Path 2 (research needed).
|
||||
TodoWrite tracks progress through 7-8 phases. Entry point detection uses prompt analysis and AskUserQuestion.
|
||||
|
||||
**Details:** See [references/workflow-architecture.md](references/workflow-architecture.md)
|
||||
|
||||
## Workflow Execution
|
||||
|
||||
Sequential phase invocation pattern: mark in_progress → check dependencies → invoke command →
|
||||
check result → apply fixes → mark completed → continue. Dependencies enforced (review sequential,
|
||||
validation tiered). Commands invoked via SlashCommand tool with wait-for-completion pattern.
|
||||
|
||||
**Details:** See [references/workflow-execution.md](references/workflow-execution.md)
|
||||
|
||||
## Success Completion
|
||||
|
||||
When all phases pass successfully:
|
||||
|
||||
```text
|
||||
✅ Skill created and validated successfully!
|
||||
|
||||
Location: <skill-output-path>/
|
||||
|
||||
Research materials: docs/research/skills/<skill-name>/
|
||||
Keep research materials? [Keep/Remove] (default: Keep)
|
||||
```
|
||||
|
||||
**Artifact Cleanup:**
|
||||
|
||||
Ask user about research materials:
|
||||
|
||||
- **Keep** (default): Preserves research for future iterations, builds knowledge base
|
||||
- **Remove**: Cleans up workspace, research can be re-gathered if needed
|
||||
|
||||
**Next Steps:**
|
||||
|
||||
Present options to user:
|
||||
|
||||
```text
|
||||
Next steps - choose an option:
|
||||
[1] Test the skill now - Try invoking it in a new conversation
|
||||
[2] Create PR - Submit skill to repository
|
||||
[3] Add to plugin.json - Integrate with plugin manifest (if applicable)
|
||||
[4] Done - Exit workflow
|
||||
|
||||
What would you like to do?
|
||||
```
|
||||
|
||||
**User Actions:**
|
||||
|
||||
1. **Test the skill now** → Guide user to test skill invocation
|
||||
2. **Create PR** → Create git branch, commit, push, open PR
|
||||
3. **Add to plugin.json** → Update manifest, validate structure (for plugin skills)
|
||||
4. **Done** → Clean exit
|
||||
|
||||
Execute the user's choice, then exit cleanly.
|
||||
|
||||
## Examples
|
||||
|
||||
The skill-factory workflow supports various scenarios:
|
||||
|
||||
1. **Path 2 (Full Workflow):** Creating skills from scratch with automated research gathering
|
||||
2. **Path 1 (Existing Research):** Creating skills when research materials already exist
|
||||
3. **Guided Fix Workflow:** Applying Tier 2 fixes with user approval
|
||||
4. **Fail-Fast Pattern:** Handling Tier 3 complex issues with immediate exit
|
||||
|
||||
**Detailed Examples:** See [references/workflow-examples.md](references/workflow-examples.md) for complete walkthrough
|
||||
scenarios showing TodoWrite state transitions, command invocations, error handling, and success paths.
|
||||
|
||||
## Design Principles
|
||||
|
||||
Six core principles: (1) Primitives First (slash commands foundation), (2) KISS State Management (TodoWrite only),
|
||||
(3) Fail Fast (no complex recovery), (4) Context-Aware Entry (prompt analysis), (5) Composable & Testable
|
||||
(standalone or orchestrated), (6) Quality Gates (sequential dependencies).
|
||||
|
||||
**Details:** See [references/design-principles.md](references/design-principles.md)
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Delegation Architecture
|
||||
|
||||
skill-factory extends the proven skill-creator skill by adding:
|
||||
|
||||
- **Pre-creation phases:** Research gathering and formatting
|
||||
- **Post-creation phases:** Content review and validation
|
||||
- **Quality gates:** Compliance checking, runtime testing, integration validation
|
||||
|
||||
**Delegation to existing tools:**
|
||||
|
||||
- **skill-creator skill** → Core creation workflow (Understand → Plan → Initialize → Edit → Package)
|
||||
- **quick_validate.py** → Compliance validation (frontmatter, naming, structure)
|
||||
- **claude-skill-auditor agent** → Comprehensive audit
|
||||
|
||||
This separation maintains the stability of skill-creator while adding research-backed, validated skill creation
|
||||
with quality gates.
|
||||
|
||||
### Progressive Disclosure
|
||||
|
||||
This skill provides:
|
||||
|
||||
1. **Quick Start** - Fast path for common use cases
|
||||
2. **Workflow Architecture** - Understanding the orchestration model
|
||||
3. **Detailed Phase Documentation** - Deep dive into each phase
|
||||
4. **Error Handling** - Comprehensive fix strategies
|
||||
5. **Examples** - Real-world scenarios
|
||||
|
||||
Load sections as needed for your use case.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
Common issues: research phase failures (check FIRECRAWL_API_KEY), content review loops (Tier 3 issues need
|
||||
redesign), compliance validation (run quick_validate.py manually), integration conflicts (check duplicate names).
|
||||
|
||||
**Details:** See [references/troubleshooting.md](references/troubleshooting.md)
|
||||
|
||||
## Success Metrics
|
||||
|
||||
You know skill-factory succeeds when:
|
||||
|
||||
1. **Time to create skill:** Reduced from hours to minutes
|
||||
2. **Skill quality:** 100% compliance with official specs on first validation
|
||||
3. **User satisfaction:** Beginners create high-quality skills without deep knowledge
|
||||
4. **Maintainability:** Primitives are independently testable and reusable
|
||||
5. **Workflow clarity:** Users understand current phase and next steps at all times
|
||||
|
||||
## Related Resources
|
||||
|
||||
- **skill-creator skill** - Core skill creation workflow (skill-factory delegates creation to it)
|
||||
- **multi-agent-composition skill** - Architectural patterns and composition rules
|
||||
- **Primitive commands** - Individual slash commands under the `/meta-claude:skill:*` namespace
|
||||
- **quick_validate.py** - Compliance validation script
|
||||
- **claude-skill-auditor agent** - Comprehensive skill audit agent
|
||||
31
skills/skill-factory/references/design-principles.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Design Principles

## 1. Primitives First

Slash commands are the foundation. The skill orchestrates them using the SlashCommand tool. This follows the
multi-agent-composition principle: "Always start with prompts."

## 2. KISS State Management

TodoWrite provides visibility without complexity. No external state files, no databases, no complex checkpointing.
Simple, effective progress tracking.

## 3. Fail Fast

No complex recovery mechanisms. When something can't be auto-fixed or the user declines a fix, exit immediately with
clear guidance. Preserves artifacts, provides next steps.

## 4. Context-Aware Entry

Detects workflow path from the user's prompt. Explicit research location → Path 1. Ambiguous → Ask user. Natural
language interface.

## 5. Composable & Testable

Every primitive works standalone (power users) or orchestrated (guided users). Each command is independently
testable and verifiable.

## 6. Quality Gates

Sequential dependencies ensure quality: content before compliance, runtime before integration. Tiered validation
with non-blocking audit for comprehensive feedback.
|
||||
181
skills/skill-factory/references/error-handling.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# Error Handling & Fix Strategy
|
||||
|
||||
## Core Principle: Fail Fast
|
||||
|
||||
When a phase fails without auto-fix capability, the workflow **stops immediately**. No complex recovery, no
|
||||
checkpointing, no resume commands—only clean exit with clear error reporting and preserved artifacts.
|
||||
|
||||
## Rule-Based Fix Tiers
|
||||
|
||||
Issues are categorized into three tiers based on complexity:
|
||||
|
||||
### Tier 1: Simple (Auto-Fix)
|
||||
|
||||
**Issue Types:**
|
||||
|
||||
- Formatting issues (whitespace, indentation)
|
||||
- Missing frontmatter fields (can be inferred)
|
||||
- Markdown syntax errors (quote escaping, link formatting)
|
||||
- File structure issues (missing directories)
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Automatically apply fix
|
||||
2. Auto re-run the failed command ONCE
|
||||
3. Continue if passes, fail fast if still broken
|
||||
|
||||
**Example:**
|
||||
|
||||
```text
|
||||
/meta-claude:skill:review-compliance fails: "Missing frontmatter description field"
|
||||
↓
|
||||
Tier: Simple → AUTO-FIX
|
||||
↓
|
||||
Fix: Add description field inferred from skill name
|
||||
↓
|
||||
Auto re-run: /meta-claude:skill:review-compliance <skill-path>
|
||||
↓
|
||||
Result: Pass → Mark todo completed, continue to /meta-claude:skill:validate-runtime
|
||||
```
|
||||
|
||||
### Tier 2: Medium (Guided Fix with Approval)
|
||||
|
||||
**Issue Types:**
|
||||
|
||||
- Content clarity suggestions
|
||||
- Example improvements
|
||||
- Instruction rewording
|
||||
- Structure optimization
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Present issue and suggested fix
|
||||
2. Ask user: "Apply this fix? [Yes/No/Edit]"
|
||||
3. If Yes → Apply fix, re-run command once
|
||||
4. If No → Fail fast
|
||||
5. If Edit → Show fix, let user modify, apply, re-run
|
||||
|
||||
**Example:**
|
||||
|
||||
```text
|
||||
/meta-claude:skill:review-content fails: "Examples section unclear, lacks practical context"
|
||||
↓
|
||||
Tier: Medium → GUIDED FIX
|
||||
↓
|
||||
Suggested fix: [Shows proposed rewrite with clearer examples]
|
||||
↓
|
||||
Ask: "Apply this fix? [Yes/No/Edit]"
|
||||
↓
|
||||
User: Yes
|
||||
↓
|
||||
Apply fix
|
||||
↓
|
||||
Re-run: /meta-claude:skill:review-content <skill-path>
|
||||
↓
|
||||
Result: Pass → Mark todo completed, continue to /meta-claude:skill:review-compliance
|
||||
```
|
||||
|
||||
### Tier 3: Complex (Stop and Report)
|
||||
|
||||
**Issue Types:**
|
||||
|
||||
- Architectural problems (skill design flaws)
|
||||
- Insufficient research (missing critical information)
|
||||
- Unsupported use cases (doesn't fit Claude Code model)
|
||||
- Schema violations (fundamental structure issues)
|
||||
- Composition rule violations (e.g., attempting to nest sub-agents)
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Report the issue with detailed explanation
|
||||
2. Provide recommendations for manual fixes
|
||||
3. **Fail fast** - exit workflow immediately
|
||||
4. User must fix manually and restart workflow
|
||||
|
||||
**Example:**
|
||||
|
||||
```text
|
||||
/meta-claude:skill:review-content fails: "Skill attempts to nest sub-agents, violates composition rules"
|
||||
↓
|
||||
Tier: Complex → STOP AND REPORT
|
||||
↓
|
||||
Report:
|
||||
❌ Skill creation failed at: Review Content Quality
|
||||
|
||||
Issue found:
|
||||
- [Tier 3: Complex] Skill attempts to nest sub-agents, which violates composition rules
|
||||
|
||||
Recommendation:
|
||||
- Restructure skill to invoke sub-agents via SlashCommand tool instead
|
||||
- See: plugins/meta/meta-claude/skills/multi-agent-composition/
|
||||
|
||||
Workflow stopped. Please fix manually and restart.
|
||||
|
||||
Artifacts preserved at:
|
||||
Research: docs/research/skills/coderabbit/
|
||||
Partial skill: plugins/meta/meta-claude/skills/coderabbit/
|
||||
|
||||
↓
|
||||
WORKFLOW EXITS (fail fast)
|
||||
```
|
||||
|
||||
## One-Shot Fix Policy
|
||||
|
||||
To prevent infinite loops:
|
||||
|
||||
```text
|
||||
Phase fails
|
||||
↓
|
||||
Apply fix (auto or guided)
|
||||
↓
|
||||
Re-run command ONCE
|
||||
↓
|
||||
Result:
|
||||
- Pass → Continue to next phase
|
||||
- Fail → FAIL FAST (no second fix attempt)
|
||||
```
|
||||
|
||||
**Rationale:** If the first fix fails, the issue is more complex than initially assessed. Stop and let the user investigate rather
|
||||
than looping infinitely.
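
The one-shot policy is small enough to sketch directly. Both callables here are hypothetical stand-ins: `run` re-invokes the phase command and reports pass or fail, and `apply_fix` returns False when no fix is available or the user declines it.

```python
from typing import Callable


def run_with_one_shot_fix(run: Callable[[], bool], apply_fix: Callable[[], bool]) -> bool:
    """Run a phase, allow at most one fix and one re-run, then give up."""
    if run():
        return True
    if not apply_fix():
        # No fix applied (Tier 3, or the user said No): fail fast.
        return False
    # Single re-run; a second failure is final, with no further fix attempt.
    return run()
```

Because the function contains no loop, the bound on attempts is structural rather than enforced by a counter.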
|
||||
|
||||
## Issue Categorization Response Format
|
||||
|
||||
Each primitive command returns errors with tier metadata:
|
||||
|
||||
```javascript
|
||||
{
|
||||
"status": "fail",
|
||||
"issues": [
|
||||
{
|
||||
"tier": "simple",
|
||||
"category": "frontmatter",
|
||||
"description": "Missing description field",
|
||||
"fix": "Add description: 'Guide for CodeRabbit code review'",
|
||||
"auto_fixable": true
|
||||
},
|
||||
{
|
||||
"tier": "medium",
|
||||
"category": "content-clarity",
|
||||
"description": "Examples section unclear, lacks practical context",
|
||||
"suggestion": "[Proposed rewrite with clearer examples]",
|
||||
"auto_fixable": false
|
||||
},
|
||||
{
|
||||
"tier": "complex",
|
||||
"category": "architectural",
|
||||
"description": "Skill violates composition rules by nesting sub-agents",
|
||||
"recommendation": "Restructure to use SlashCommand tool for sub-agent invocation",
|
||||
"auto_fixable": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
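
A response in this shape can be mapped to a fix approach with a few lines of code. The sketch below assumes that when several issues are reported the most severe tier decides the action, and that unknown tiers are treated as complex; neither rule is stated in this document, and `plan_action` is a hypothetical name.

```python
from typing import Any, Dict

TIER_SEVERITY = {"simple": 1, "medium": 2, "complex": 3}
TIER_ACTION = {
    "simple": "auto_fix",
    "medium": "guided_fix",
    "complex": "stop_and_report",
}


def plan_action(response: Dict[str, Any]) -> str:
    """Pick the next step for a command response (assumed 'worst tier wins' rule)."""
    if response.get("status") != "fail":
        return "continue"
    issues = response.get("issues", [])
    if not issues:
        # A failure without details cannot be fixed automatically.
        return "stop_and_report"
    worst = max(issues, key=lambda issue: TIER_SEVERITY.get(issue.get("tier"), 3))
    return TIER_ACTION.get(worst.get("tier"), "stop_and_report")
```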
|
||||
|
||||
## Parsing Command Responses
|
||||
|
||||
When a command completes, analyze its output to determine status:
|
||||
|
||||
- Look for "Success", "PASS", or exit code 0 → Continue
|
||||
- Look for "Error", "FAIL", or a non-zero exit code → Apply fix strategy
|
||||
- Parse issue tier metadata (if provided) to select fix approach
|
||||
- If no tier metadata, infer tier from issue description
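
One possible reading of these rules, as a sketch: an exit code, when available, takes precedence over text markers, and failure markers are checked before success markers so that mixed output counts as a failure. The function name, the marker patterns, and the `"unknown"` fallback are all assumptions. Keyword matching is crude (a line such as "0 errors" would count as a failure), so structured tier metadata is preferable when a command provides it.

```python
import re
from typing import Optional

# Prefix matches, so "failed" and "passed" are recognized too.
FAIL_MARKERS = re.compile(r"\b(error|fail)", re.IGNORECASE)
PASS_MARKERS = re.compile(r"\b(success|pass)", re.IGNORECASE)


def parse_status(output: str, exit_code: Optional[int] = None) -> str:
    """Classify command output as 'pass', 'fail', or 'unknown'."""
    if exit_code is not None:
        return "pass" if exit_code == 0 else "fail"
    if FAIL_MARKERS.search(output):
        return "fail"
    if PASS_MARKERS.search(output):
        return "pass"
    return "unknown"
```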
|
||||
45
skills/skill-factory/references/troubleshooting.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# Troubleshooting

## Research Phase Fails

**Symptom:** `/meta-claude:skill:research` command fails with API errors

**Solutions:**

- Verify FIRECRAWL_API_KEY is set: `echo $FIRECRAWL_API_KEY`
- Check network connectivity
- Verify research script permissions: `chmod +x scripts/firecrawl_*.py`
- Try manual research and use Path 1 (skip research phase)

## Content Review Fails Repeatedly

**Symptom:** `/meta-claude:skill:review-content` fails even after applying fixes

**Solutions:**

- Review the specific issues in the quality report
- Check if issues are Tier 3 (complex) - these require manual redesign
- Consider if the skill design matches Claude Code's composition model
- Consult multi-agent-composition skill for architectural guidance

## Compliance Validation Fails

**Symptom:** `/meta-claude:skill:review-compliance` reports frontmatter or naming violations

**Solutions:**

- Run quick_validate.py manually: `python scripts/quick_validate.py <skill-path>`
- Check frontmatter YAML syntax (valid YAML, required fields)
- Verify skill name follows hyphen-case convention
- Ensure description is clear and within 1024 characters

## Integration Validation Fails

**Symptom:** `/meta-claude:skill:validate-integration` reports conflicts

**Solutions:**

- Check for duplicate skill names in the plugin
- Review skill description for overlap with existing skills
- Consider renaming or refining scope to avoid conflicts
- Ensure skill complements rather than duplicates existing functionality
|
||||
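
For the compliance checks listed under "Compliance Validation Fails", the name and description rules can be checked by hand with a few lines of Python. This is a hypothetical sketch, not the logic of quick_validate.py: the regular expression is one assumed interpretation of "hyphen-case" (lowercase letters and digits separated by single hyphens), and `check_frontmatter` is an illustrative name.

```python
import re
from typing import List

# Assumed definition of hyphen-case: lowercase alphanumerics joined by single hyphens.
HYPHEN_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
MAX_DESCRIPTION_CHARS = 1024


def check_frontmatter(name: str, description: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if not HYPHEN_CASE.match(name):
        problems.append(f"name {name!r} is not hyphen-case")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} characters (limit {MAX_DESCRIPTION_CHARS})"
        )
    return problems
```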
67
skills/skill-factory/references/workflow-architecture.md
Normal file
@@ -0,0 +1,67 @@
|
||||
# Workflow Architecture
|
||||
|
||||
## Entry Point Detection
|
||||
|
||||
The skill analyzes your prompt to determine the workflow path:
|
||||
|
||||
**Explicit Research Path (Path 1):**
|
||||
|
||||
```text
|
||||
User: "Create coderabbit skill, research in docs/research/skills/coderabbit/"
|
||||
→ Detects research location, uses Path 1 (skip research phase)
|
||||
```
|
||||
|
||||
**Ambiguous Path:**
|
||||
|
||||
```text
|
||||
User: "Create coderabbit skill"
|
||||
→ Asks: "Have you already gathered research?"
|
||||
→ User response determines path
|
||||
```
|
||||
|
||||
**Research Needed (Path 2):**
|
||||
|
||||
```text
|
||||
User selects "No - Help me gather research"
|
||||
→ Uses Path 2 (full workflow including research)
|
||||
```
|
||||
|
||||
## Workflow Paths
|
||||
|
||||
### Path 1: Research Exists
|
||||
|
||||
```text
|
||||
format → create → review-content → review-compliance →
|
||||
validate-runtime → validate-integration → validate-audit → complete
|
||||
```
|
||||
|
||||
### Path 2: Research Needed
|
||||
|
||||
```text
|
||||
research → format → create → review-content → review-compliance →
|
||||
validate-runtime → validate-integration → validate-audit → complete
|
||||
```
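
The two paths differ only in whether the research phase comes first, which a short sketch makes explicit. The helper name `phase_commands` is hypothetical; the command names and namespace are the ones used throughout this document, and the final "complete" step is omitted because it is not a slash command.

```python
from typing import List

NAMESPACE = "/meta-claude:skill:"

# Phases shared by both paths, in execution order.
SHARED_PHASES = [
    "format",
    "create",
    "review-content",
    "review-compliance",
    "validate-runtime",
    "validate-integration",
    "validate-audit",
]


def phase_commands(research_needed: bool) -> List[str]:
    """Return the ordered slash commands for Path 2 (True) or Path 1 (False)."""
    phases = (["research"] if research_needed else []) + SHARED_PHASES
    return [NAMESPACE + phase for phase in phases]
```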
|
||||
|
||||
## State Management
|
||||
|
||||
Progress tracking uses TodoWrite for real-time visibility:
|
||||
|
||||
**Path 2 Example (Full Workflow):**
|
||||
|
||||
```javascript
|
||||
[
|
||||
{"content": "Research skill domain", "status": "in_progress", "activeForm": "Researching skill domain"},
|
||||
{"content": "Format research materials", "status": "pending", "activeForm": "Formatting research materials"},
|
||||
{"content": "Create skill structure", "status": "pending", "activeForm": "Creating skill structure"},
|
||||
{"content": "Review content quality", "status": "pending", "activeForm": "Reviewing content quality"},
|
||||
{"content": "Review technical compliance", "status": "pending", "activeForm": "Reviewing technical compliance"},
|
||||
{"content": "Validate runtime loading", "status": "pending", "activeForm": "Validating runtime loading"},
|
||||
{"content": "Validate integration", "status": "pending", "activeForm": "Validating integration"},
|
||||
{"content": "Audit skill (non-blocking)", "status": "pending", "activeForm": "Auditing skill"},
|
||||
{"content": "Complete workflow", "status": "pending", "activeForm": "Completing workflow"}
|
||||
]
|
||||
```
|
||||
|
||||
**Path 1 Example (Research Exists):**
|
||||
|
||||
Omit first "Research skill domain" task from TodoWrite list.
|
||||
266
skills/skill-factory/references/workflow-examples.md
Normal file
@@ -0,0 +1,266 @@
# Workflow Examples

## Example 1: Creating Infrastructure Skill (Path 2)

```text
User: "Create terraform-best-practices skill"

skill-factory:
"Have you already gathered research for this skill?
[Yes - I have research at <path>]
[No - Help me gather research]
[Skip - I'll create from knowledge only]"

User: "No - Help me gather research"

skill-factory initializes TodoWrite with 9 tasks, starts workflow:

[Phase 1: Research]
Invokes: /meta-claude:skill:research terraform-best-practices
Mini brainstorm about scope and categories
Executes firecrawl research script
Research saved to docs/research/skills/terraform-best-practices/
✓ Research completed

[Phase 2: Format]
Invokes: /meta-claude:skill:format docs/research/skills/terraform-best-practices
Cleans UI artifacts and navigation elements
✓ Formatting completed

[Phase 3: Create]
Invokes: /meta-claude:skill:create terraform-best-practices docs/research/skills/terraform-best-practices
Delegates to skill-creator skill
Follows Understand → Plan → Initialize → Edit → Package workflow
✓ Skill created at plugins/infrastructure/terraform-skills/skills/terraform-best-practices/

[Phase 4: Review Content]
Invokes: /meta-claude:skill:review-content plugins/infrastructure/terraform-skills/skills/terraform-best-practices
Analyzes clarity, completeness, examples, actionability, usefulness
✓ Content review passed (5/5 quality dimensions)

[Phase 5: Review Compliance]
Invokes: /meta-claude:skill:review-compliance plugins/infrastructure/terraform-skills/skills/terraform-best-practices
Runs quick_validate.py
✓ Compliance check passed

[Phase 6: Validate Runtime]
Invokes: /meta-claude:skill:validate-runtime plugins/infrastructure/terraform-skills/skills/terraform-best-practices
Tests skill loading in Claude Code context
✓ Runtime validation passed

[Phase 7: Validate Integration]
Invokes: /meta-claude:skill:validate-integration plugins/infrastructure/terraform-skills/skills/terraform-best-practices
Checks for conflicts with existing skills
✓ Integration validation passed

[Phase 8: Audit]
Invokes: /meta-claude:skill:validate-audit plugins/infrastructure/terraform-skills/skills/terraform-best-practices
Runs claude-skill-auditor agent
ℹ Audit completed with recommendations (non-blocking)

[Phase 9: Complete]
✅ Skill created and validated successfully!

Location: plugins/infrastructure/terraform-skills/skills/terraform-best-practices/

Research materials: docs/research/skills/terraform-best-practices/
Keep research materials? [Keep/Remove] (default: Keep)

User: Keep

Next steps - choose an option:
[1] Test the skill now
[2] Create PR
[3] Add to plugin.json
[4] Done

User: [2] Create PR

skill-factory creates branch, commits skill, pushes, opens PR
Workflow complete!
```

## Example 2: Creating Skill with Existing Research (Path 1)

```text
User: "Create coderabbit skill, research in docs/research/skills/coderabbit/"

skill-factory detects explicit research path, uses Path 1

Initializes TodoWrite with 8 tasks (skips research), starts workflow:

[Phase 1: Format]
Invokes: /meta-claude:skill:format docs/research/skills/coderabbit
✓ Formatting completed

[Phase 2: Create]
Invokes: /meta-claude:skill:create coderabbit docs/research/skills/coderabbit
✓ Skill created

[Continues through remaining phases...]
✓ Workflow complete
```

## Example 3: Workflow Failure with Guided Fix

```text
[Phase 4: Review Content]
Invokes: /meta-claude:skill:review-content plugins/meta/meta-claude/skills/docker-compose

Command response:
{
  "status": "fail",
  "issues": [
    {
      "tier": "medium",
      "category": "examples",
      "description": "Examples section missing practical docker-compose.yml configurations",
      "suggestion": "[Shows proposed examples with common patterns]",
      "auto_fixable": false
    }
  ]
}

skill-factory detects Tier 2 (guided fix):

"Content review found issues that can be fixed:

Issue: Examples section missing practical docker-compose.yml configurations

Suggested fix:
[Shows proposed docker-compose.yml examples]

Apply this fix? [Yes/No/Edit]"

User: Yes

skill-factory applies fix, re-runs command:

Invokes: /meta-claude:skill:review-content plugins/meta/meta-claude/skills/docker-compose
✓ Content review passed

Continues to next phase...
```

## Example 4: Workflow Failure with Complex Issue

```text
[Phase 4: Review Content]
Invokes: /meta-claude:skill:review-content plugins/meta/meta-claude/skills/advanced-orchestration

Command response:
{
  "status": "fail",
  "issues": [
    {
      "tier": "complex",
      "category": "architectural",
      "description": "Skill attempts to nest sub-agents within sub-agents, which violates Claude Code composition rules",
      "recommendation": "Restructure skill to use SlashCommand tool for sub-agent invocation. See multi-agent-composition skill for patterns.",
      "auto_fixable": false
    }
  ]
}

skill-factory detects Tier 3 (complex), fails fast:

❌ Skill creation failed at: Review Content Quality

Issue found:
- [Tier 3: Complex] Skill attempts to nest sub-agents within sub-agents, which violates Claude Code composition rules

Recommendation:
- Restructure skill to use SlashCommand tool for sub-agent invocation
- See: plugins/meta/meta-claude/skills/multi-agent-composition/patterns/orchestrator-pattern.md

Workflow stopped. Please fix manually and restart with:
skill-factory advanced-orchestration docs/research/skills/advanced-orchestration/

Artifacts preserved at:
Research: docs/research/skills/advanced-orchestration/
Partial skill: plugins/meta/meta-claude/skills/advanced-orchestration/

WORKFLOW EXITS
```

## Command Output Reference

### Successful Command Outputs

**Research:**

```text
✓ Research completed for terraform-best-practices
Saved to: docs/research/skills/terraform-best-practices/
Files: 5 documents (github: 3, research: 2)
```

**Format:**

```text
✓ Formatting completed
Cleaned: 5 files, removed 247 UI artifacts
Output: docs/research/skills/terraform-best-practices/
```

**Validation (Pass):**

```text
✓ Content review passed (5/5 quality dimensions)
✓ Compliance check passed
✓ Runtime validation passed
✓ Integration validation passed
```

### Failed Command Outputs

**Tier 1 (Auto-fix):**

```json
{
  "status": "fail",
  "issues": [
    {
      "tier": "simple",
      "category": "frontmatter",
      "description": "Missing description field",
      "fix": "Add description: 'Terraform infrastructure best practices'",
      "auto_fixable": true
    }
  ]
}
```

**Tier 2 (Guided fix):**

```json
{
  "status": "fail",
  "issues": [
    {
      "tier": "medium",
      "category": "examples",
      "description": "Examples section lacks practical configurations",
      "suggestion": "[Proposed examples with common patterns]",
      "auto_fixable": false
    }
  ]
}
```

**Tier 3 (Complex):**

```json
{
  "status": "fail",
  "issues": [
    {
      "tier": "complex",
      "category": "architectural",
      "description": "Violates composition rules",
      "recommendation": "Restructure to use SlashCommand tool",
      "auto_fixable": false
    }
  ]
}
```
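
The three failure payloads above share one shape, so a caller could choose a fix strategy from the `tier` field. The following is a minimal Python sketch, not skill-factory's actual implementation. It assumes the response arrives as a JSON string, that any status other than `"fail"` means success, and that the most severe issue decides the strategy. The name `pick_strategy` and the strategy labels are hypothetical.

```python
import json

# Map the "tier" values shown above to the fix strategies named in this guide.
# The strategy labels are illustrative, not identifiers used by skill-factory.
TIER_STRATEGY = {"simple": "auto_fix", "medium": "guided_fix", "complex": "fail_fast"}

# Severity order, least to most severe (assumption: the worst issue wins).
SEVERITY = ["simple", "medium", "complex"]


def pick_strategy(response_text: str) -> str:
    """Return the strategy for a command response, or "continue" on success."""
    response = json.loads(response_text)
    if response.get("status") != "fail":
        return "continue"
    tiers = [issue["tier"] for issue in response.get("issues", [])]
    if not tiers:
        # Assumption: a failure without classified issues cannot be fixed automatically.
        return "fail_fast"
    worst = max(tiers, key=SEVERITY.index)
    return TIER_STRATEGY[worst]


example = '{"status": "fail", "issues": [{"tier": "medium", "auto_fixable": false}]}'
print(pick_strategy(example))
```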
80
skills/skill-factory/references/workflow-execution.md
Normal file
@@ -0,0 +1,80 @@
# Workflow Execution

## Phase Invocation Pattern

For each phase in the workflow:

1. **Mark phase as in_progress** (update TodoWrite)
2. **Check dependencies** (verify prior phases completed)
3. **Invoke command** using SlashCommand tool:

   ```text
   /meta-claude:skill:research <skill-name> [sources]
   /meta-claude:skill:format <research-dir>
   /meta-claude:skill:create <skill-name> <research-dir>
   /meta-claude:skill:review-content <skill-path>
   /meta-claude:skill:review-compliance <skill-path>
   /meta-claude:skill:validate-runtime <skill-path>
   /meta-claude:skill:validate-integration <skill-path>
   /meta-claude:skill:validate-audit <skill-path>
   ```

4. **Check result** (success or failure with tier metadata)
5. **Apply fix strategy** (if needed - see Error Handling section)
6. **Mark phase completed** (update TodoWrite)
7. **Continue to next phase** (or exit if fail-fast triggered)
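
The loop described by these steps can be sketched in Python as follows. This is an illustration only: `run_workflow` and the injected `run_command` callable are hypothetical stand-ins for the SlashCommand call, and the sketch leaves out steps 2 and 5 (explicit dependency checks and fix strategies), relying on strict ordering instead.

```python
from typing import Callable, Dict, List

# Commands in workflow order, as listed in step 3 above.
PHASES: List[str] = [
    "/meta-claude:skill:research",
    "/meta-claude:skill:format",
    "/meta-claude:skill:create",
    "/meta-claude:skill:review-content",
    "/meta-claude:skill:review-compliance",
    "/meta-claude:skill:validate-runtime",
    "/meta-claude:skill:validate-integration",
    "/meta-claude:skill:validate-audit",
]


def run_workflow(run_command: Callable[[str], bool],
                 phases: List[str] = PHASES) -> Dict[str, str]:
    """Run phases in order and stop at the first failure (fail-fast)."""
    status = {phase: "pending" for phase in phases}
    for phase in phases:
        status[phase] = "in_progress"    # step 1
        if run_command(phase):           # steps 3 and 4
            status[phase] = "completed"  # step 6
        else:
            # Step 7: exit. The failed phase is left as "in_progress" here;
            # how a real run records a failed phase is not specified above.
            break
    return status
```

Injecting `run_command` keeps the loop testable without a live Claude Code session: a stub that returns `False` for one phase shows every later phase staying `pending`.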
### Dependency Enforcement

Before running each command, verify dependencies:

**Review Phase (Sequential):**

```text
/meta-claude:skill:review-content (no dependency)
  ↓ (must pass)
/meta-claude:skill:review-compliance (depends on content passing)
```

**Validation Phase (Tiered):**

```text
/meta-claude:skill:validate-runtime (depends on compliance passing)
  ↓ (must pass)
/meta-claude:skill:validate-integration (depends on runtime passing)
  ↓ (runs regardless)
/meta-claude:skill:validate-audit (non-blocking, informational)
```

**Dependency Check Pattern:**

```text
Before running /meta-claude:skill:review-compliance:
  Check: Is "Review content quality" completed?
    - Yes → Invoke /meta-claude:skill:review-compliance
    - No → Skip (workflow failed earlier, stop here)
```
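
One way to express these checks is a lookup table from each command to the command that must have passed first. The Python sketch below is an assumption-laden illustration: `DEPENDS_ON` and `can_run` are hypothetical names, and the entry for `validate-audit` reflects one reading of "runs regardless", namely that the audit runs whether or not the integration check passed, as long as the workflow reached that point.

```python
from typing import Dict, Optional, Set

# Command -> command that must have passed before it may run.
# None means there is no blocking dependency. Derived from the diagrams above.
DEPENDS_ON: Dict[str, Optional[str]] = {
    "review-content": None,
    "review-compliance": "review-content",
    "validate-runtime": "review-compliance",
    "validate-integration": "validate-runtime",
    # Assumption: the audit does not wait for validate-integration to pass,
    # only for the workflow to have reached the validation phase.
    "validate-audit": "validate-runtime",
}


def can_run(command: str, passed: Set[str]) -> bool:
    """Return True if the command's dependency (if any) has already passed."""
    dependency = DEPENDS_ON[command]
    return dependency is None or dependency in passed
```

For example, `can_run("review-compliance", set())` is `False`, which corresponds to the "No → Skip" branch in the pattern above.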
### Command Invocation with SlashCommand Tool

Use the SlashCommand tool to invoke each primitive command:

```javascript
// Example: Invoking research phase
SlashCommand({
  command: "/meta-claude:skill:research ansible-vault-security"
})

// Example: Invoking format phase
SlashCommand({
  command: "/meta-claude:skill:format docs/research/skills/ansible-vault-security"
})

// Example: Invoking create phase
SlashCommand({
  command: "/meta-claude:skill:create ansible-vault-security docs/research/skills/ansible-vault-security"
})
```

**IMPORTANT:** Wait for each command to complete before proceeding to the next phase. Check the response status
before continuing.
368
skills/skill-factory/workflows/visual-guide.md
Normal file
@@ -0,0 +1,368 @@
# Skill-Factory Visual Guide

Visual decision trees and workflow diagrams for the skill-factory orchestrator.

---

## How to Use This Guide

- **New to skill-factory?** Start with "Decision Tree: Full Workflow vs Individual Commands"
- **Understanding the workflow?** Study the "Workflow State Diagram"
- **Quick reference?** Check the "Command Decision Matrix"
- **Troubleshooting?** Use the "Error Handling Flow"

---

## Decision Tree: Full Workflow vs Individual Commands

This decision tree helps you choose between using the orchestrated workflow or individual slash commands.

```graphviz
digraph skill_factory_decision {
    rankdir=TB;
    node [shape=box, style=rounded];

    start [label="What are you trying to do?", shape=diamond, style="filled", fillcolor=lightblue];

    new_skill [label="Creating a\nnew skill?", shape=diamond];
    have_research [label="Research\nalready gathered?", shape=diamond];
    specific_issue [label="Fixing a\nspecific issue?", shape=diamond];
    which_phase [label="Which phase\nhas the issue?", shape=diamond];

    full_workflow_research [label="Use Full Workflow\n(Path 2)\n\nskill-factory <name>\n\n✓ Includes research\n✓ 8-phase validation\n✓ Progress tracking", shape=rect, style="filled", fillcolor=lightgreen];

    full_workflow_skip [label="Use Full Workflow\n(Path 1)\n\nskill-factory <name> <research-path>\n\n✓ Skips research phase\n✓ Full 7-phase validation\n✓ TodoWrite progress tracking", shape=rect, style="filled", fillcolor=lightgreen];

    research_cmd [label="/meta-claude:skill:research\n\nAutomate firecrawl scraping\nfor skill domain knowledge", shape=rect, style="filled", fillcolor=lightyellow];

    format_cmd [label="/meta-claude:skill:format\n\nClean and structure\nraw research materials", shape=rect, style="filled", fillcolor=lightyellow];

    create_cmd [label="/meta-claude:skill:create\n\nGenerate SKILL.md with\nYAML frontmatter", shape=rect, style="filled", fillcolor=lightyellow];

    review_content_cmd [label="/meta-claude:skill:review-content\n\nValidate content quality\nand clarity", shape=rect, style="filled", fillcolor=lightyellow];

    review_compliance_cmd [label="/meta-claude:skill:review-compliance\n\nRun quick_validate.py on\nSKILL.md", shape=rect, style="filled", fillcolor=lightyellow];

    validate_runtime_cmd [label="/meta-claude:skill:validate-runtime\n\nTest skill loading\nin Claude context", shape=rect, style="filled", fillcolor=lightyellow];

    validate_integration_cmd [label="/meta-claude:skill:validate-integration\n\nCheck for conflicts with\nexisting skills", shape=rect, style="filled", fillcolor=lightyellow];

    validate_audit_cmd [label="/meta-claude:skill:validate-audit\n\nInvoke claude-skill-auditor\nfor comprehensive audit", shape=rect, style="filled", fillcolor=lightyellow];

    start -> new_skill;

    new_skill -> have_research [label="Yes"];
    new_skill -> specific_issue [label="No"];

    have_research -> full_workflow_skip [label="Yes\nHave research at\nspecific path"];
    have_research -> full_workflow_research [label="No\nNeed to gather\nresearch"];

    specific_issue -> which_phase [label="Yes"];
    specific_issue -> full_workflow_research [label="No\nUse full workflow"];

    which_phase -> research_cmd [label="Research gathering"];
    which_phase -> format_cmd [label="Research formatting"];
    which_phase -> create_cmd [label="Skill generation"];
    which_phase -> review_content_cmd [label="Content quality"];
    which_phase -> review_compliance_cmd [label="YAML/compliance"];
    which_phase -> validate_runtime_cmd [label="Skill won't load"];
    which_phase -> validate_integration_cmd [label="Name conflicts"];
    which_phase -> validate_audit_cmd [label="Anthropic spec audit"];
}
```
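
The decision tree above reduces to a few branches. The Python sketch below mirrors the graph edges for illustration; `choose_entry`, its parameters, and the `"path-1"` / `"path-2"` return labels are hypothetical and not part of skill-factory.

```python
from typing import Optional


def choose_entry(new_skill: bool,
                 research_path: Optional[str] = None,
                 failing_phase: Optional[str] = None) -> str:
    """Pick an entry point by following the decision tree's edges."""
    if new_skill:
        # Research already gathered -> Path 1, otherwise Path 2.
        return "path-1" if research_path else "path-2"
    if failing_phase:
        # Fixing a specific issue -> the individual command for that phase.
        return f"/meta-claude:skill:{failing_phase}"
    # Neither a new skill nor a specific fix: the graph routes to the full workflow.
    return "path-2"
```

For instance, `choose_entry(False, failing_phase="validate-runtime")` selects the single runtime-validation command rather than the full workflow.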

### Decision Tree Key Points

**Critical Rule**: For new skills, use the **full workflow** (orchestrated). For specific fixes,
use **individual commands**.

**Decision Flow**:

1. **Creating new skill?**
   - Yes → Check if research exists
     - Research exists → Full Workflow (Path 1)
     - Research needed → Full Workflow (Path 2)
   - No → Check if fixing specific issue
2. **Fixing specific issue?**
   - Yes → Use individual command for that phase
   - No → Use full workflow

**Remember**: Individual commands are power user tools. Most users should use the full orchestrated workflow.

---

## Workflow State Diagram

Shows the phases and state transitions during skill creation.

```mermaid
stateDiagram-v2
    [*] --> EntryPoint

    EntryPoint --> PathDecision: Analyze prompt

    PathDecision --> Path1: Research exists
    PathDecision --> Path2: Research needed

    Path2 --> Research: Phase 1
    Research --> Format: Success
    Research --> FailFast: Tier 3 Error

    Path1 --> Format: Skip research

    Format --> Create: Success
    Format --> AutoFix: Tier 1 Error
    Format --> GuidedFix: Tier 2 Error
    Format --> FailFast: Tier 3 Error

    AutoFix --> Format: Retry once
    GuidedFix --> Format: User approves
    GuidedFix --> FailFast: User declines

    Create --> ReviewContent: Success
    Create --> AutoFix2: Tier 1 Error
    Create --> GuidedFix2: Tier 2 Error
    Create --> FailFast: Tier 3 Error

    AutoFix2 --> Create: Retry once
    GuidedFix2 --> Create: User approves
    GuidedFix2 --> FailFast: User declines

    ReviewContent --> ReviewCompliance: Pass
    ReviewContent --> FailFast: Fail

    ReviewCompliance --> ValidateRuntime: Pass
    ReviewCompliance --> FailFast: Fail

    ValidateRuntime --> ValidateIntegration: Pass
    ValidateRuntime --> FailFast: Fail

    ValidateIntegration --> ValidateAudit: Pass
    ValidateIntegration --> FailFast: Fail

    ValidateAudit --> Complete: Always runs

    Complete --> NextSteps: Present options

    NextSteps --> Test: User choice
    NextSteps --> CreatePR: User choice
    NextSteps --> UpdatePlugin: User choice
    NextSteps --> [*]: Done

    FailFast --> [*]: Exit with guidance

    note right of PathDecision
        Uses AskUserQuestion
        if path ambiguous
    end note

    note right of AutoFix
        One-shot policy:
        Apply fix once,
        retry once,
        then fail fast
    end note

    note right of ValidateAudit
        Non-blocking:
        Runs regardless of
        prior failures
    end note
```

### State Diagram Key Points

**Entry Point Detection**:

- Analyzes user prompt
- Uses AskUserQuestion if ambiguous
- Routes to Path 1 (skip research) or Path 2 (include research)

**Error Handling States**:

- **AutoFix**: Tier 1 errors (formatting, syntax) - automated fix
- **GuidedFix**: Tier 2 errors (content clarity) - user approval required
- **FailFast**: Tier 3 errors (architectural) - exit immediately

**Quality Gates**:

- ReviewContent must pass before ReviewCompliance
- ReviewCompliance must pass before ValidateRuntime
- ValidateRuntime must pass before ValidateIntegration
- ValidateAudit always runs (non-blocking feedback)

---

## Command Decision Matrix

Quick reference for choosing the right command.

| Scenario | Command | Why | Phase |
|----------|---------|-----|-------|
| **Need web research** | `/meta-claude:skill:research` | Automated firecrawl scraping | 1 |
| **Have messy research** | `/meta-claude:skill:format` | Clean markdown formatting | 2 |
| **Ready to generate SKILL.md** | `/meta-claude:skill:create` | Creates structure with YAML | 3 |
| **Content unclear** | `/meta-claude:skill:review-content` | Quality gate before compliance | 4 |
| **Check frontmatter** | `/meta-claude:skill:review-compliance` | Runs quick_validate.py | 5 |
| **Skill won't load** | `/meta-claude:skill:validate-runtime` | Tests actual loading | 6 |
| **Worried about conflicts** | `/meta-claude:skill:validate-integration` | Checks existing skills | 7 |
| **Want Anthropic audit** | `/meta-claude:skill:validate-audit` | Runs claude-skill-auditor | 8 |

**Phase numbers** show the sequential order in the full workflow.

---

## Error Handling Flow

Visual representation of the three-tier fix strategy.

```mermaid
flowchart TD
    Start([Command Executes]) --> Check{Check Result}

    Check -->|Success| MarkComplete[Mark Phase Completed]
    Check -->|Failure| ClassifyError{Classify Error Tier}

    ClassifyError -->|Tier 1<br/>Formatting, Syntax| AutoFix[Auto-Fix]
    ClassifyError -->|Tier 2<br/>Content Clarity| GuidedFix[Guided Fix]
    ClassifyError -->|Tier 3<br/>Architecture| FailFast[Fail Fast]

    AutoFix --> ApplyFix1[Apply Fix Automatically]
    ApplyFix1 --> Retry1[Retry Command Once]
    Retry1 --> Check2{Check Result}

    Check2 -->|Success| MarkComplete
    Check2 -->|Still Failed| EscalateTier2[Escalate to Tier 2]
    EscalateTier2 --> GuidedFix

    GuidedFix --> Present[Present Fix to User]
    Present --> AskApproval{User Approves?}

    AskApproval -->|Yes| ApplyFix2[Apply Fix]
    AskApproval -->|No| FailFast

    ApplyFix2 --> Retry2[Retry Command Once]
    Retry2 --> Check3{Check Result}

    Check3 -->|Success| MarkComplete
    Check3 -->|Still Failed| FailFast

    FailFast --> Report[Report Issue with Detail]
    Report --> Guidance[Provide Fix Guidance]
    Guidance --> Exit([Exit Workflow])

    MarkComplete --> Continue[Continue to Next Phase]

    style AutoFix fill:#90EE90
    style GuidedFix fill:#FFE4B5
    style FailFast fill:#FFB6C1
    style MarkComplete fill:#ADD8E6
```
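
The failure branch of the flowchart above can be traced in code. The following Python sketch is an illustration under assumptions, not the orchestrator's implementation: `handle_failure` and its callback parameters (`apply_fix`, `retry`, `user_approves`) are hypothetical hooks standing in for the real fix, re-run, and approval steps.

```python
from typing import Callable


def handle_failure(tier: int,
                   apply_fix: Callable[[int], None],
                   retry: Callable[[], bool],
                   user_approves: Callable[[], bool]) -> str:
    """Follow the flowchart for a failed command.

    Returns "completed" if a retry succeeds, otherwise "fail_fast".
    """
    if tier == 1:
        apply_fix(1)             # Auto-Fix: apply automatically
        if retry():              # retry the command once
            return "completed"
        tier = 2                 # still failing: escalate to Tier 2
    if tier == 2:
        if not user_approves():  # Guided Fix: user must approve
            return "fail_fast"
        apply_fix(2)
        if retry():              # retry the command once
            return "completed"
    # Tier 3, a declined fix, or a failed Tier 2 retry all end here.
    return "fail_fast"
```

Because each tier is visited at most once, the function makes at most two fix attempts and two retries before it returns, so it cannot loop.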

### Error Handling Key Points

**Tier 1 (Auto-Fix)**: Formatting errors, YAML syntax, markdown issues

- **Action**: Apply fix automatically
- **Retry**: Once
- **Escalation**: If still fails → Tier 2

**Tier 2 (Guided-Fix)**: Content clarity, instruction rewording

- **Action**: Present suggested fix to user
- **User Choice**: Approve or decline
- **Retry**: Once if approved
- **Escalation**: If still fails or user declines → Tier 3

**Tier 3 (Fail-Fast)**: Architectural problems, schema violations

- **Action**: Report issue with detailed explanation
- **Recovery**: Exit immediately with guidance
- **Manual**: User must fix manually

**One-Shot Policy**: Each tier gets one fix attempt and one retry, then escalates or fails. This prevents infinite loops.

---

## TodoWrite Progress Visualization

Shows how TodoWrite tracks progress through the workflow.

```mermaid
gantt
    title Skill-Factory Progress Tracking (Path 2 - Full Workflow)
    dateFormat X
    axisFormat %s

    section Research
    Research skill domain :done, phase1, 0, 1

    section Format
    Format research materials :active, phase2, 1, 2

    section Create
    Create skill structure :phase3, 2, 3

    section Review
    Review content quality :phase4, 3, 4
    Review technical compliance :phase5, 4, 5

    section Validate
    Validate runtime loading :phase6, 5, 6
    Validate integration :phase7, 6, 7

    section Audit
    Run comprehensive audit :phase8, 7, 8

    section Complete
    Complete workflow :phase9, 8, 9
```

**Status Indicators**:

- **Green** (done): Phase completed successfully
- **Blue** (active): Phase currently in progress
- **Gray** (pending): Phase not yet started

**TodoWrite Example** (Phase 2 in progress):

```javascript
[
  {"content": "Research skill domain", "status": "completed", "activeForm": "Researching skill domain"},
  {"content": "Format research materials", "status": "in_progress", "activeForm": "Formatting research materials"},
  {"content": "Create skill structure", "status": "pending", "activeForm": "Creating skill structure"},
  {"content": "Review content quality", "status": "pending", "activeForm": "Reviewing content quality"},
  {"content": "Review technical compliance", "status": "pending", "activeForm": "Reviewing technical compliance"},
  {"content": "Validate runtime loading", "status": "pending", "activeForm": "Validating runtime loading"},
  {"content": "Validate integration", "status": "pending", "activeForm": "Validating integration"},
  {"content": "Run comprehensive audit", "status": "pending", "activeForm": "Running comprehensive audit"},
  {"content": "Complete workflow", "status": "pending", "activeForm": "Completing workflow"}
]
```
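
A list like the one above could be generated rather than written by hand. The Python sketch below is a hypothetical helper, not part of skill-factory: `build_todos` and its parameters are assumptions. It drops the research task for Path 1, as the workflow examples describe, and sets each status from the index of the phase currently running.

```python
from typing import Dict, List

# (content, activeForm) pairs, in workflow order, copied from the example above.
TASKS = [
    ("Research skill domain", "Researching skill domain"),
    ("Format research materials", "Formatting research materials"),
    ("Create skill structure", "Creating skill structure"),
    ("Review content quality", "Reviewing content quality"),
    ("Review technical compliance", "Reviewing technical compliance"),
    ("Validate runtime loading", "Validating runtime loading"),
    ("Validate integration", "Validating integration"),
    ("Run comprehensive audit", "Running comprehensive audit"),
    ("Complete workflow", "Completing workflow"),
]


def build_todos(research_exists: bool, current: int = 0) -> List[Dict[str, str]]:
    """Build the todo list; `current` is the index of the in-progress task."""
    # Path 1 (research exists) omits the first task, leaving 8 instead of 9.
    tasks = TASKS[1:] if research_exists else TASKS
    todos = []
    for index, (content, active_form) in enumerate(tasks):
        if index < current:
            status = "completed"
        elif index == current:
            status = "in_progress"
        else:
            status = "pending"
        todos.append({"content": content, "status": status, "activeForm": active_form})
    return todos


# Reproduces the "Phase 2 in progress" example for Path 2.
todos = build_todos(research_exists=False, current=1)
```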

---

## Best Practices

### When to Use Visual Guides

- **New users**: Start with Decision Tree to understand full workflow vs individual commands
- **Debugging**: Use Error Handling Flow to understand fix strategies
- **Learning**: Study Workflow State Diagram to understand phase transitions
- **Quick reference**: Use Command Decision Matrix for fast lookup

### Composition Pattern

This visual guide follows the same pattern as **multi-agent-composition/workflows/decision-tree.md**:

- Multiple visual formats (Graphviz, Mermaid, Tables)
- Decision trees with diamond decision nodes
- State diagrams showing transitions
- Quick reference matrices
- Best practices sections

---

**Document Status:** Complete Visual Guide
**Pattern Source:** multi-agent-composition/workflows/decision-tree.md
**Last Updated:** 2025-11-17