Initial commit

Zhongwei Li
2025-11-29 18:00:36 +08:00
commit c83b4639c5

# Context Window Protection
> "200k context window is plenty. You're just stuffing a single agent with too much work. Don't force your agent to context switch."
Context window protection is about managing your agent's most precious resource: attention. A focused agent is a performant agent.
## The Core Problem
**Every engineer hits this wall:**
```text
Agent starts: 10k tokens (5% used)
After exploration: 80k tokens (40% used)
After planning: 120k tokens (60% used)
During implementation: 170k tokens (85% used) ⚠️
Context explodes: 195k tokens (98% used) ❌
Agent performance degrades, fails, or times out
```
**The realization:** More context ≠ better performance. Too much context = cognitive overload.
## The R&D Framework
There are only two ways to manage your context window:
```text
R - REDUCE
└─→ Minimize what enters the context window
D - DELEGATE
└─→ Move work to other agents' context windows
```
**Everything else is a tactic implementing R or D.**
## The Four Levels of Context Protection
### Level 1: Beginner - Reduce Waste
**Focus:** Stop wasting tokens on unused resources
#### Tactic 1: Eliminate Default MCP Servers
**Problem:**
```jsonc
// Default mcp.json
{
  "mcpServers": {
    "firecrawl": {...},   // 6k tokens
    "github": {...},      // 8k tokens
    "postgres": {...},    // 5k tokens
    "redis": {...}        // 5k tokens
  }
}
// Total: 24k tokens always loaded (12% of 200k window!)
```
**Solution:**
```bash
# Option 1: Delete default mcp.json entirely
rm .claude/mcp.json
# Option 2: Load selectively
claude-mcp-config --strict specialized-configs/firecrawl-only.json
# Result: 4k tokens instead of 24k (83% reduction)
```
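As a rough sanity check before loading a config, the overhead is just a sum over enabled servers. A minimal sketch, using the article's illustrative per-server figures; the `mcp_overhead` helper is hypothetical, not a real CLI:

```python
# Estimate the context-window cost of an mcp.json config by summing
# per-server token estimates. Figures are the article's illustrative
# numbers, not measured values.
MCP_TOKEN_COSTS = {"firecrawl": 6_000, "github": 8_000, "postgres": 5_000, "redis": 5_000}

def mcp_overhead(config: dict, window: int = 200_000) -> tuple[int, float]:
    """Return (tokens, fraction of window) consumed by enabled MCP servers."""
    tokens = sum(MCP_TOKEN_COSTS.get(name, 0) for name in config.get("mcpServers", {}))
    return tokens, tokens / window

full = {"mcpServers": {"firecrawl": {}, "github": {}, "postgres": {}, "redis": {}}}
lean = {"mcpServers": {"firecrawl": {}}}
print(mcp_overhead(full))  # (24000, 0.12) — 12% of the window, always
print(mcp_overhead(lean))  # (6000, 0.03)
```

Running the check on each candidate config makes the "load selectively" tradeoff concrete before any tokens are spent.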
#### Tactic 2: Minimize CLAUDE.md
**Before:**
```markdown
# CLAUDE.md (23,000 tokens = 11.5% of window)
- 500 lines of API documentation
- 300 lines of deployment procedures
- 1,500 lines of coding standards
- Architecture diagrams
- Always loaded, whether relevant or not
```
**After:**
```markdown
# CLAUDE.md (500 tokens = 0.25% of window)
# Only universal essentials
- Fenced code blocks MUST have language
- Use rg instead of grep
- ALWAYS use set -euo pipefail
```
**Rule:** Only include what you're 100% sure you want loaded 100% of the time.
#### Tactic 3: Disable Autocompact Buffer
**Problem:**
```bash
/context
# Output:
autocompact buffer: 22% ⚠️ (44k tokens gone!)
messages: 51%
system_tools: 8%
---
Total available: 78% (should be 100%)
```
**Solution:**
```bash
/config
# Set: autocompact = false
# Now:
/context
# Output:
messages: 51%
system_tools: 8%
custom_agents: 2%
---
Total available: 100% ✅ (reclaimed 22%!)
```
**Impact:** Reclaims 40k+ tokens immediately.
### Level 2: Intermediate - Dynamic Loading
**Focus:** Load what you need, when you need it
#### Tactic 4: Context Priming
**Replace static CLAUDE.md with task-specific `/prime` commands**
```markdown
# .claude/commands/prime.md
# General codebase context (2k tokens)
Read README, understand structure, report findings
# .claude/commands/prime-feature.md
# Feature development context (3k tokens)
Read feature requirements, understand dependencies, plan implementation
# .claude/commands/prime-api.md
# API work context (4k tokens)
Read API docs, understand endpoints, review integration patterns
```
**Usage pattern:**
```bash
# Starting feature work
/prime-feature
# vs. having 23k tokens always loaded
```
**Savings:** 20k tokens (87% reduction)
#### Tactic 5: Sub-Agent Delegation
**Problem:** Primary agent doing parallel work fills its own context
```text
Primary Agent tries to do:
├── Web scraping (15k tokens)
├── Documentation fetch (12k tokens)
├── Data analysis (10k tokens)
└── Synthesis (5k tokens)
= 42k tokens in one agent
```
**Solution:** Delegate to sub-agents with isolated contexts
```text
Primary Agent (9k tokens):
├→ Sub-Agent 1: Web scraping (15k tokens, isolated)
├→ Sub-Agent 2: Docs fetch (12k tokens, isolated)
└→ Sub-Agent 3: Analysis (10k tokens, isolated)
Total work: 46k tokens
Primary agent context: Only 9k tokens ✅
```
**Example:**
```bash
/load-ai-docs
# Agent spawns 10 sub-agents for web scraping
# Each scrape: ~3k tokens
# Total work: 30k tokens
# Primary agent context: Still only 9k tokens
# Savings: 21k tokens protected
```
**Key insight:** Sub-agents use system prompts (not user prompts), keeping their context isolated from primary.
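The isolation above reduces to simple bookkeeping: each sub-agent burns tokens in its own context and hands back only a short summary to the primary. The `Agent` class, the `run_subtask` helper, and the token figures are illustrative assumptions, not a real agent API:

```python
# Sketch of sub-agent delegation: work happens in isolated contexts;
# only summaries land in the primary agent's context.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    context_tokens: int = 0
    notes: list = field(default_factory=list)

def run_subtask(task: str, cost: int) -> tuple[int, str]:
    """Do `cost` tokens of work in an isolated sub-agent; return a ~200-token summary."""
    sub = Agent(f"sub:{task}", context_tokens=cost)  # isolated, discarded after
    return 200, f"{task}: done ({sub.context_tokens} tokens, isolated)"

primary = Agent("primary", context_tokens=9_000)
for task, cost in [("scrape", 15_000), ("docs", 12_000), ("analysis", 10_000)]:
    summary_cost, summary = run_subtask(task, cost)
    primary.context_tokens += summary_cost  # only the summary enters the primary
    primary.notes.append(summary)

print(primary.context_tokens)  # 9600 — not 9000 + 37000
```

37k tokens of work happened, but the primary paid only 600 tokens of summaries.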
### Level 3: Advanced - Multi-Agent Handoff
**Focus:** Chain agents together without context explosion
#### Tactic 6: Context Bundles
**Problem:** Agent 1's context explodes (180k tokens). Need to hand off to fresh Agent 2 without full replay.
**Solution:** Bundle 60-70% of essential context
```markdown
# context-bundle-2025-01-05-<session-id>.md
## Context Bundle
Created: 2025-01-05 14:30
Source Agent: agent-abc123
## Initial Setup
/prime-feature
## Read Operations (deduplicated)
- src/api/endpoints.ts
- src/components/Auth.tsx
- config/env.ts
## Key Findings
- Auth system uses JWT
- API has 15 endpoints
- Config needs migration
## User Prompts (summarized)
1. "Implement OAuth2 flow"
2. "Add refresh token logic"
[Excluded: full write operations, detailed read contents, tool execution details]
```
**Usage:**
```bash
# Agent 1: Context exploding at 180k
# Automatic bundle saved
# Agent 2: Fresh start (10k base)
/loadbundle /path/to/context-bundle-<timestamp>.md
# Agent 2 now has 70% of Agent 1's context in ~15k tokens
# Total: 25k tokens vs. 180k (86% reduction)
```
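A bundle builder boils down to filtering a session log: keep deduplicated read paths and user prompts, drop write contents and tool output. A sketch, assuming a simple event-dict shape — this is not a real session format:

```python
# Build a context bundle from a session event log: reads (deduplicated)
# and prompts survive; writes and tool output are intentionally excluded.
def build_bundle(events: list[dict]) -> str:
    reads, prompts = [], []
    for e in events:
        if e["type"] == "read" and e["path"] not in reads:
            reads.append(e["path"])        # dedupe repeat reads
        elif e["type"] == "user_prompt":
            prompts.append(e["text"])
        # "write" and "tool" events are dropped — the expensive 30-40%
    lines = ["## Context Bundle", "## Read Operations (deduplicated)"]
    lines += [f"- {p}" for p in reads]
    lines += ["## User Prompts"] + [f"{i}. {t}" for i, t in enumerate(prompts, 1)]
    return "\n".join(lines)

events = [
    {"type": "read", "path": "src/api/endpoints.ts"},
    {"type": "read", "path": "src/api/endpoints.ts"},  # duplicate, dropped
    {"type": "user_prompt", "text": "Implement OAuth2 flow"},
    {"type": "write", "path": "src/auth.ts", "content": "..."},  # excluded
]
bundle = build_bundle(events)
print(bundle)
```

A fresh agent replaying this bundle re-reads a handful of paths instead of replaying 180k tokens of history.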
#### Tactic 7: Composable Workflows (Scout-Plan-Build)
**Problem:** Single agent searching + planning + building = context explosion
```text
Monolithic Agent:
├── Search codebase: 40k tokens
├── Read files: 60k tokens
├── Plan changes: 20k tokens
├── Implement: 30k tokens
├── Test: 15k tokens
└── Total: 165k tokens (83% used!)
```
**Solution:** Break into composable steps that delegate
```text
/scout-plan-build workflow:
Step 1: /scout (delegates to 4 parallel sub-agents)
├→ Sub-agents search codebase: 4 × 15k = 60k total
├→ Output: relevant-files.md (5k tokens)
└→ Primary agent context: unchanged
Step 2: /plan-with-docs
├→ Reads relevant-files.md: 5k tokens
├→ Scrapes docs: 8k tokens
├→ Creates plan: 3k tokens
└→ Total added: 16k tokens
Step 3: /build
├→ Reads plan: 3k tokens
├→ Implements: 30k tokens
└→ Total added: 33k tokens
Final primary agent context: 10k + 16k + 33k = 59k tokens
Savings: 106k tokens (64% reduction)
```
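The accounting above can be written out directly: only step artifacts accrue to the primary agent, while the scouts' ~60k of searching stays in sub-agent contexts. A sketch using the figures from the breakdown:

```python
# Token accounting for scout-plan-build vs. a monolithic agent,
# using the illustrative figures from the workflow breakdown above.
MONOLITHIC = 40_000 + 60_000 + 20_000 + 30_000 + 15_000  # 165k in one agent

base = 10_000                        # primary agent's starting context
plan_step = 5_000 + 8_000 + 3_000    # read scout artifact + scrape docs + plan
build_step = 3_000 + 30_000          # read plan + implement
composed = base + plan_step + build_step

print(composed)               # 59000
print(MONOLITHIC - composed)  # 106000 tokens saved
```

The scouts' search cost never appears in the primary's total — only their 5k-token artifact does.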
**Why this works:** Scout step offloads searching from planner (R&D: Reduce + Delegate)
### Level 4: Agentic - Out-of-Loop Systems
**Focus:** Agents working autonomously while you're AFK
#### Tactic 8: Focused Agents (One Agent, One Task)
**Anti-pattern:**
```text
Super Agent (trying to do everything):
├── API development
├── UI implementation
├── Database migrations
├── Testing
├── Documentation
├── Deployment
└── Context: 170k tokens (85% used)
```
**Pattern:**
```text
Focused Agent Fleet:
├── Agent 1: API only (30k tokens)
├── Agent 2: UI only (35k tokens)
├── Agent 3: DB only (20k tokens)
├── Agent 4: Tests only (25k tokens)
├── Agent 5: Docs only (15k tokens)
└── Each agent: ≤35k tokens (max ~18% per agent)
```
**Principle:** "A focused engineer is a performant engineer. A focused agent is a performant agent."
#### Tactic 9: Deletable Agents
**Pattern:**
```bash
# Create agent for specific task
/create-agent docs-writer "Document frontend components"
# Agent completes task (used 30k tokens)
# DELETE agent immediately
/delete-agent docs-writer
# Result: 30k tokens freed for next agent
```
**Lifecycle:**
```text
1. Create agent → Task-specific context loaded
2. Agent works → Context grows to completion
3. Agent completes → Context maxed out
4. DELETE agent → Context freed
5. Create new agent → Fresh start
6. Repeat
```
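The lifecycle maps naturally onto a context manager: the agent's tokens are reclaimed the moment its block exits. The pool accounting below is an illustrative sketch, not a real agent API:

```python
# Deletable-agent lifecycle: create, work, delete — tokens freed on exit.
from contextlib import contextmanager

POOL = {"allocated": 0}  # tokens currently held by live agents

@contextmanager
def ephemeral_agent(name: str):
    agent = {"name": name, "tokens": 0}
    try:
        yield agent
    finally:
        POOL["allocated"] -= agent["tokens"]  # DELETE: free the context

with ephemeral_agent("docs-writer") as a:
    a["tokens"] = 30_000          # task-specific context grows to completion
    POOL["allocated"] += a["tokens"]

print(POOL["allocated"])  # 0 — all 30k reclaimed for the next agent
```

Each new task gets a fresh agent against an empty pool instead of inheriting a maxed-out context.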
**Engineering analogy:** "The best code is no code at all. The best agent is a deleted agent."
#### Tactic 10: Background Agent Delegation
**Problem:** You're in the loop, waiting for agent to finish long task
**Solution:** Delegate to background agent, continue working
```bash
# In-loop (you wait, your context stays open)
/implement-feature "Build auth system"
# Your terminal blocked for 20 minutes
# Context accumulates: 150k tokens
# Out-of-loop (you continue working)
/background "Build auth system" \
--model opus \
--report agents/auth-report.md
# Background agent works independently
# Your terminal freed immediately
# Background agent context isolated
# You get notified when complete
```
**Context protection:**
- Primary agent: 10k tokens (just manages job queue)
- Background agent: 150k tokens (isolated, will be deleted)
- Your interactive session: 10k tokens (protected)
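The same idea in miniature: launch a child process that writes its report to a file and return immediately, keeping the interactive session free. The one-liner child stands in for a real background-agent launcher:

```python
# Background delegation sketch: the parent returns immediately; the child
# does the long work and writes a report file for later pickup.
import os
import subprocess
import sys
import tempfile

report = os.path.join(tempfile.mkdtemp(), "auth-report.md")
# Popen returns immediately — the terminal is not blocked.
proc = subprocess.Popen(
    [sys.executable, "-c", f"open({report!r}, 'w').write('# Auth system: done\\n')"]
)
# ... the interactive session keeps working here, its context untouched ...
proc.wait()  # in practice: poll the report path or get a notification
print(open(report).read())
```

The parent's only context cost is the job handle and, later, the report — never the child's working context.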
#### Tactic 11: Orchestrator Sleep Pattern
**Problem:** Orchestrator observing all agent work = context explosion
```text
Orchestrator watches everything:
├── Scout 1 work: 15k tokens observed
├── Scout 2 work: 15k tokens observed
├── Scout 3 work: 15k tokens observed
├── Planner work: 25k tokens observed
├── Builder work: 35k tokens observed
└── Orchestrator context: 105k tokens
```
**Solution:** Orchestrator sleeps while agents work
```text
Orchestrator pattern:
1. Create scouts → 3k tokens (commands only)
2. SLEEP (not observing)
3. Wake every 15s, check status → 1k tokens
4. Scouts complete, read outputs → 5k tokens
5. Create planner → 2k tokens
6. SLEEP (not observing)
7. Wake every 15s, check status → 1k tokens
8. Planner completes, read output → 3k tokens
9. Create builder → 2k tokens
10. SLEEP (not observing)
Orchestrator final context: 17k tokens ✅
vs. 105k if watching everything (84% reduction)
```
**Key principle:** Orchestrator wakes to coordinate, sleeps while agents work.
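The sleep/wake loop can be sketched as a poll: the orchestrator pays tokens for commands and summaries, never for streamed work. The cost figures, the poll interval, and the instantly-finishing agent stub are illustrative:

```python
# Orchestrator sleep pattern: command, sleep, wake to check status,
# read only the summary. Work done by agents never enters this context.
import time

def run_phase(orch: dict, agent: dict, command_cost: int, summary_cost: int,
              poll: float = 0.01) -> None:
    orch["tokens"] += command_cost    # issue the command only
    while not agent["done"]:
        time.sleep(poll)              # SLEEP: not observing the work
        agent["done"] = True          # (stub: the agent finishes)
    orch["tokens"] += summary_cost    # read a short summary, not the work

orch = {"tokens": 0}
for command_cost, summary_cost in [(3_000, 5_000),   # scouts
                                   (2_000, 3_000),   # planner
                                   (2_000, 3_000)]:  # builder
    run_phase(orch, {"done": False}, command_cost, summary_cost)

print(orch["tokens"])  # 18000 — vs ~105k if it streamed every agent's work
```

The orchestrator's final footprint is the sum of its commands and summaries, independent of how much work the fleet did.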
## Monitoring Context Health
### The /context Command
```bash
/context
# Healthy agent (beginner level):
messages: 8%
system_tools: 5%
custom_agents: 2%
---
Total used: 15% ✅ (85% free)
# Warning (intermediate):
messages: 45%
mcp_tools: 18%
system_tools: 5%
---
Total used: 68% ⚠️ (32% free, approaching limits)
# Danger (needs intervention):
messages: 72%
mcp_tools: 24%
system_tools: 5%
---
Total used: 101% ❌ (context overflow!)
```
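The three bands above can be captured in a small helper that sums section percentages; the 60%/80% thresholds here are inferred from the examples, and `context_health` is an illustrative sketch, not a real command:

```python
# Map /context-style section percentages to the health bands shown above.
def context_health(sections: dict[str, float]) -> str:
    used = sum(sections.values())
    if used > 80:
        return "danger"    # needs intervention: fresh agent, bundles
    if used > 60:
        return "warning"   # approaching limits
    return "healthy"

# The three example readouts from above:
assert context_health({"messages": 8, "system_tools": 5, "custom_agents": 2}) == "healthy"
assert context_health({"messages": 45, "mcp_tools": 18, "system_tools": 5}) == "warning"
assert context_health({"messages": 72, "mcp_tools": 24, "system_tools": 5}) == "danger"
```

A hook running this check after each tool call catches an explosion before it reaches the limit.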
### Success Metrics by Level
| Level | Target Context Free | What This Enables |
|-------|---------------------|-------------------|
| Beginner | 85-90% | Basic tasks without running out |
| Intermediate | 60-75% | Complex tasks with breathing room |
| Advanced | 40-60% | Multi-step workflows without overflow |
| Agentic | Per-agent 60-80% | Fleet of focused agents |
### Warning Signs
**Your context window is in danger when:**
**Single agent exceeds 150k tokens**
- Solution: Split work across multiple agents
**Agent needs to read >20 files**
- Solution: Use scout agents to find relevant subset
**`/context` shows >80% used**
- Solution: Start fresh agent, use context bundles
**Agent gets slower/less accurate**
- Solution: Check context usage, delegate to sub-agents
**Autocompact buffer active**
- Solution: Disable it, reclaim 20%+ tokens
## Context Window Hard Limits
> "Context window is a hard limit. We have to respect this and work around it."
### The Reality
```text
Claude Opus 200k limit:
├── System prompt: ~8k tokens (4%)
├── Available tools: ~5k tokens (2.5%)
├── MCP servers: 0-24k tokens (0-12%)
├── CLAUDE.md: 0-23k tokens (0-11.5%)
├── Custom agents: ~2k tokens (1%)
└── Available for work: 138-185k tokens (69-92.5%)
Best case (optimized): 185k available
Worst case (unoptimized): 138k available
Difference: 47k tokens (nearly a quarter of total capacity!)
```
### Real Example from the Field
> "We were 14% away from exploding our context in our scout-plan-build workflow."
```text
Scout-Plan-Build execution:
├── Base context: 15k tokens
├── Scout work (4 sub-agents): +40k tokens
├── Planner work: +35k tokens
├── Builder work: +80k tokens
└── Total: 170k tokens
With autocompact buffer (22% reserved = 44k tokens):
Usable window: 200k - 44k = 156k tokens
❌ 170k exceeds the usable limit by 14k tokens
Without autocompact buffer:
Usable window: full 200k tokens
✅ 170k fits with 30k headroom (15% free)
```
**Lesson:** Every percentage point matters when approaching limits.
## Common Context Explosion Patterns
### Pattern 1: The Sponge Agent
**Symptoms:**
- Agent reads entire codebase
- Opens 50+ files
- Context grows 10k tokens every few minutes
**Cause:** No filtering strategy
**Fix:**
```bash
# Before: Agent reads everything
Agent: "Analyzing codebase..."
[reads 100 files = 150k tokens]
# After: Scout first
/scout "Find files related to authentication"
# Scout outputs: 5 relevant files
Agent reads only those 5 files = 8k tokens
```
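Scout-first filtering is just cheap ranking before expensive reading. A sketch where keyword counting stands in for a real scout agent's relevance judgment:

```python
# Scout filter sketch: rank files by a cheap relevance signal and read
# only the top handful, instead of sponging up the whole tree.
def scout_filter(files: dict[str, str], query: str, top: int = 5) -> list[str]:
    """Return up to `top` paths with at least one query hit, most relevant first."""
    scored = [(path, text.lower().count(query)) for path, text in files.items()]
    return [p for p, hits in sorted(scored, key=lambda x: -x[1]) if hits > 0][:top]

files = {
    "src/auth/login.ts": "auth token login auth",
    "src/ui/button.tsx": "render click",
    "src/auth/jwt.ts": "verify auth jwt",
}
print(scout_filter(files, "auth"))  # ['src/auth/login.ts', 'src/auth/jwt.ts']
```

The primary agent then reads only the returned subset; irrelevant files never touch its context.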
### Pattern 2: The Accumulator
**Symptoms:**
- Long conversation
- Many tool calls
- Context steadily grows to limit
**Cause:** Not resetting agent between phases
**Fix:**
```bash
# Phase 1: Exploration
[Agent explores, context hits 120k]
# Phase 2: Implementation
# ❌ Bad: Continue same agent (will overflow)
# ✅ Good: New agent with context bundle
/loadbundle context-from-phase-1.md
# Fresh agent (15k) + bundle (20k) = 35k tokens
# Ready for implementation without overflow
```
### Pattern 3: The Observer
**Symptoms:**
- Orchestrator context growing rapidly
- Watching all sub-agent work
- Can't coordinate more than 2-3 agents
**Cause:** Not using sleep pattern
**Fix:**
```python
# ❌ Bad: Orchestrator watches everything
for agent in agents:
    result = orchestrator.watch_agent_work(agent)  # Observes all work
    orchestrator.context += result                 # Context explodes

# ✅ Good: Orchestrator sleeps
for agent in agents:
    orchestrator.create_and_command(agent)
    orchestrator.sleep()                  # Not observing
    orchestrator.wake_and_check_status()  # Only reads summaries
```
## The "200k is Plenty" Principle
> "I'm super excited for larger effective context windows, but 200k context window is plenty. You're just stuffing a single agent with too much work."
**The mindset shift:**
```text
Beginner thinking:
"I need a bigger context window"
"If only I had 500k tokens..."
"My task is too complex for 200k"
Expert thinking:
"I need better context management"
"I'm overloading a single agent"
"I should split this across focused agents"
```
**The truth:** Most context explosions are design problems, not capacity problems.
### Why 200k is Sufficient
**With proper protection:**
```text
Task: Refactor authentication across 50-file codebase
Approach 1 (Single Agent - fails):
├── Agent reads 50 files: 75k tokens
├── Agent plans changes: 20k tokens
├── Agent implements: 80k tokens
├── Agent tests: 30k tokens
└── Total: 205k tokens ❌ (overflow by 5k)
Approach 2 (Multi-Agent - succeeds):
├── Scout finds relevant 10 files: 15k tokens
├── Planner creates strategy: 20k tokens (new agent)
├── Builder 1 (auth logic): 35k tokens (new agent)
├── Builder 2 (UI changes): 30k tokens (new agent)
├── Tester verifies: 25k tokens (new agent)
└── Max per agent: 35k tokens ✅ (all within limits)
```
## Integration with Other Patterns
Context window protection enables:
**Progressive Disclosure:**
- Reduces: Minimal static context
- Enables: Dynamic loading via priming
**Core 4 Management:**
- Protects: Context (pillar #1)
- Enables: Better model/prompt/tools choices
**Orchestration:**
- Requires: Context protection (orchestrator sleep)
- Enables: Fleet management without overflow
**Observability:**
- Monitors: Context usage via hooks
- Prevents: Unnoticed context explosion
## Key Principles
1. **Reduce and Delegate** - The only two strategies that matter
2. **A focused agent is a performant agent** - Single-purpose beats multi-purpose
3. **Agents are deletable** - Free context by removing completed agents
4. **200k is plenty** - Context explosions are design problems
5. **Monitor constantly** - `/context` command is your best friend
6. **Orchestrators must sleep** - Don't observe all agent work
7. **Context bundles over full replay** - 70% of context in 10% of tokens
## Source Attribution
**Primary sources:**
- Elite Context Engineering (R&D framework, 4 levels, all tactics)
- Claude 2.0 (autocompact buffer, hard limits, scout-plan-build)
**Supporting sources:**
- One Agent to Rule Them All (orchestrator sleep, 200k principle, deletable agents)
- Sub-Agents (sub-agent delegation, context isolation)
**Key quotes:**
- "200k context window is plenty. You're just stuffing a single agent with too much work." (One Agent)
- "A focused agent is a performant agent." (Elite Context Engineering)
- "We were 14% away from exploding our context." (Claude 2.0)
- "There are only two ways to manage your context window: R and D." (Elite Context Engineering)
## Related Documentation
- [Progressive Disclosure](../reference/progressive-disclosure.md) - Context loading strategies
- [Orchestrator Pattern](orchestrator-pattern.md) - Fleet management requiring protection
- [Evolution Path](../workflows/evolution-path.md) - Progression through protection levels
- [Core 4 Framework](../reference/core-4-framework.md) - Context as first pillar
---
**Remember:** Context window management separates beginners from experts. Master it, and you can scale infinitely with focused agents.