Initial commit

Zhongwei Li
2025-11-30 08:39:00 +08:00
commit 56486a03ae
8 changed files with 4910 additions and 0 deletions


@@ -0,0 +1,13 @@
{
"name": "orchestration",
"description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
"version": "0.1.1",
"author": {
"name": "Jack Rudenko",
"email": "i@madappgang.com",
"company": "MadAppGang"
},
"skills": [
"./skills"
]
}

README.md Normal file

@@ -0,0 +1,3 @@
# orchestration
Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.

plugin.lock.json Normal file

@@ -0,0 +1,61 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:MadAppGang/claude-code:plugins/orchestration",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "ad90df36843224b97a17f14cfd5a207d4e053c67",
"treeHash": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150",
"generatedAt": "2025-11-28T10:12:05.859643Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "orchestration",
"description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
"version": "0.1.1"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "215babb6dff86f8783d8e97d0a21546e2aaa3b055bc1cde5c4e16c6bf3d6c7a5"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "36414e18947889714f9d80576e01edaab8b3ffdf9efd44107e0f5fb42b0e2270"
},
{
"path": "skills/todowrite-orchestration/SKILL.md",
"sha256": "f681467a2eef99945f90b8f2b654c8c9713f4153afdff19a0c0b312d2f6084de"
},
{
"path": "skills/quality-gates/SKILL.md",
"sha256": "ba13c21d8e9f8abeb856bbec4a6ebc821e92dfe0857942797959087452b175c3"
},
{
"path": "skills/error-recovery/SKILL.md",
"sha256": "133564d1bc0d35a8c35074b089120fe7d7a757b71bdd6222a7a5c23e45f20aa3"
},
{
"path": "skills/multi-agent-coordination/SKILL.md",
"sha256": "9e0156350eb09447221898598611a5270921c31168e7698c4bd0d3bd0ced4616"
},
{
"path": "skills/multi-model-validation/SKILL.md",
"sha256": "9d5c46dfa531f911f4fcc4070fd6c039900bcdb440c997f7eac384001a1ba33e"
}
],
"dirSha256": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

File diff suppressed because it is too large


@@ -0,0 +1,742 @@
---
name: multi-agent-coordination
description: Coordinate multiple agents in parallel or sequential workflows. Use when running agents simultaneously, delegating to sub-agents, switching between specialized agents, or managing agent selection. Trigger keywords - "parallel agents", "sequential workflow", "delegate", "multi-agent", "sub-agent", "agent switching", "task decomposition".
version: 0.1.0
tags: [orchestration, multi-agent, parallel, sequential, delegation, coordination]
keywords: [parallel, sequential, delegate, sub-agent, agent-switching, multi-agent, task-decomposition, coordination]
---
# Multi-Agent Coordination
**Version:** 1.0.0
**Purpose:** Patterns for coordinating multiple agents in complex workflows
**Status:** Production Ready
## Overview
Multi-agent coordination is the foundation of sophisticated Claude Code workflows. This skill provides battle-tested patterns for orchestrating multiple specialized agents to accomplish complex tasks that are beyond the capabilities of a single agent.
The key challenge in multi-agent systems is **dependencies**. Some tasks must execute sequentially (one agent's output feeds into another), while others can run in parallel (independent validations from different perspectives). Getting this right is the difference between a 5-minute workflow and a 15-minute one.
This skill teaches you:
- When to run agents in **parallel** vs **sequential**
- How to **select the right agent** for each task
- How to **delegate** to sub-agents without polluting context
- How to manage **context windows** across multiple agent calls
## Core Patterns
### Pattern 1: Sequential vs Parallel Execution
**When to Use Sequential:**
Use sequential execution when there are **dependencies** between agents:
- Agent B needs Agent A's output as input
- Workflow phases must complete in order (plan → implement → test → review)
- Each agent modifies shared state (same files)
**Example: Multi-Phase Implementation**
```
Phase 1: Architecture Planning
Task: api-architect
Output: ai-docs/architecture-plan.md
Wait for completion ✓
Phase 2: Implementation (depends on Phase 1)
Task: backend-developer
Input: Read ai-docs/architecture-plan.md
Output: src/auth.ts, src/routes.ts
Wait for completion ✓
Phase 3: Testing (depends on Phase 2)
Task: test-architect
Input: Read src/auth.ts, src/routes.ts
Output: tests/auth.test.ts
```
**When to Use Parallel:**
Use parallel execution when agents are **independent**:
- Multiple validation perspectives (designer + tester + reviewer)
- Multiple AI models reviewing same code (Grok + Gemini + Claude)
- Multiple feature implementations in separate files
**Example: Multi-Perspective Validation**
```
Single Message with Multiple Task Calls:
Task: designer
Prompt: Validate UI against Figma design
Output: ai-docs/design-review.md
---
Task: ui-manual-tester
Prompt: Test UI in browser for usability
Output: ai-docs/testing-report.md
---
Task: senior-code-reviewer
Prompt: Review code quality and patterns
Output: ai-docs/code-review.md
All three execute simultaneously (3x speedup!)
Wait for all to complete, then consolidate results.
```
**The 4-Message Pattern for True Parallel Execution:**
This is **CRITICAL** for achieving true parallelism:
```
Message 1: Preparation (Bash Only)
- Create workspace directories
- Validate inputs
- Write context files
- NO Task calls, NO TodoWrite
Message 2: Parallel Execution (Task Only)
- Launch ALL agents in SINGLE message
- ONLY Task tool calls
- Each Task is independent
- All execute simultaneously
Message 3: Consolidation (Task Only)
- Launch consolidation agent
- Automatically triggered when N agents complete
Message 4: Present Results
- Show user final consolidated results
- Include links to detailed reports
```
**Anti-Pattern: Mixing Tool Types Breaks Parallelism**
```
❌ WRONG - Executes Sequentially:
await TodoWrite({...}); // Tool 1
await Task({...}); // Tool 2 - waits for TodoWrite
await Bash({...}); // Tool 3 - waits for Task
await Task({...}); // Tool 4 - waits for Bash
✅ CORRECT - Executes in Parallel:
await Task({...}); // Task 1
await Task({...}); // Task 2
await Task({...}); // Task 3
// All execute simultaneously
```
**Why Mixing Fails:**
Claude Code sees different tool types and assumes there are dependencies between them, forcing sequential execution. Using a single tool type (all Task calls) signals that operations are independent and can run in parallel.
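A minimal TypeScript sketch of the difference, using a hypothetical `task()` helper as a stand-in for the Task tool (not the actual Claude Code tool API): independent calls dispatched together run concurrently, while awaiting each call in turn serializes them.
```
// Hypothetical stand-in for the Task tool; not the real Claude Code API.
type TaskRequest = { agent: string; prompt: string };

async function task(req: TaskRequest): Promise<string> {
  // In a real workflow this would dispatch work to a sub-agent.
  return `${req.agent}: done`;
}

// Parallel: launch all independent reviews together, then wait for all of them.
async function parallelReviews(): Promise<string[]> {
  return Promise.all([
    task({ agent: "designer", prompt: "Validate UI against Figma design" }),
    task({ agent: "ui-manual-tester", prompt: "Test UI in browser for usability" }),
    task({ agent: "senior-code-reviewer", prompt: "Review code quality and patterns" }),
  ]);
}

// Sequential: awaiting each call (or interleaving other tools) makes each one
// wait for the previous, which is what mixed tool types force in practice.
async function sequentialReviews(): Promise<string[]> {
  const design = await task({ agent: "designer", prompt: "Validate UI against Figma design" });
  const testing = await task({ agent: "ui-manual-tester", prompt: "Test UI in browser" });
  const review = await task({ agent: "senior-code-reviewer", prompt: "Review code quality" });
  return [design, testing, review];
}
```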
---
### Pattern 2: Agent Selection by Task Type
**Task Detection Logic:**
Intelligent workflows automatically detect task type and select appropriate agents:
```
Task Type Detection:
IF request mentions "API", "endpoint", "backend", "database":
→ API-focused workflow
→ Use: api-architect, backend-developer, test-architect
→ Skip: designer, ui-developer (not relevant)
ELSE IF request mentions "UI", "component", "design", "Figma":
→ UI-focused workflow
→ Use: designer, ui-developer, ui-manual-tester
→ Optional: ui-developer-codex (external validation)
ELSE IF request mentions both API and UI:
→ Mixed workflow
→ Use all relevant agents from both categories
→ Coordinate between backend and frontend agents
ELSE IF request mentions "test", "coverage", "bug":
→ Testing-focused workflow
→ Use: test-architect, ui-manual-tester
→ Optional: codebase-detective (for bug investigation)
ELSE IF request mentions "review", "validate", "feedback":
→ Review-focused workflow
→ Use: senior-code-reviewer, designer, ui-developer
→ Optional: external model reviewers
```
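A minimal TypeScript sketch of this keyword-based detection (the keyword lists and workflow labels are illustrative assumptions, not a fixed specification):
```
type Workflow = "api" | "ui" | "mixed" | "testing" | "review" | "unknown";

// Illustrative keyword lists; extend them to match your own agent roster.
const API_KEYWORDS = ["api", "endpoint", "backend", "database"];
const UI_KEYWORDS = ["ui", "component", "design", "figma"];
const TEST_KEYWORDS = ["test", "coverage", "bug"];
const REVIEW_KEYWORDS = ["review", "validate", "feedback"];

function mentionsAny(request: string, keywords: string[]): boolean {
  const lower = request.toLowerCase();
  return keywords.some((keyword) => lower.includes(keyword));
}

function detectWorkflow(request: string): Workflow {
  const isApi = mentionsAny(request, API_KEYWORDS);
  const isUi = mentionsAny(request, UI_KEYWORDS);
  if (isApi && isUi) return "mixed";
  if (isApi) return "api";
  if (isUi) return "ui";
  if (mentionsAny(request, TEST_KEYWORDS)) return "testing";
  if (mentionsAny(request, REVIEW_KEYWORDS)) return "review";
  return "unknown"; // Default: ask the user to clarify the task type.
}

// detectWorkflow("Add a login endpoint to the backend") === "api"
// detectWorkflow("Fix the navbar component to match the Figma design") === "ui"
```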
**Agent Capability Matrix:**
| Task Type | Primary Agent | Secondary Agent | Optional External |
|-----------|---------------|-----------------|-------------------|
| API Implementation | backend-developer | api-architect | - |
| UI Implementation | ui-developer | designer | ui-developer-codex |
| Testing | test-architect | ui-manual-tester | - |
| Code Review | senior-code-reviewer | - | codex-code-reviewer |
| Architecture Planning | api-architect OR frontend-architect | - | plan-reviewer |
| Bug Investigation | codebase-detective | test-architect | - |
| Design Validation | designer | ui-developer | designer-codex |
**Agent Switching Pattern:**
Some workflows benefit from **adaptive agent selection** based on context:
```
Example: UI Development with External Validation
Base Implementation:
Task: ui-developer
Prompt: Implement navbar component from design
User requests external validation:
→ Switch to ui-developer-codex OR add parallel ui-developer-codex
→ Run both: embedded ui-developer + external ui-developer-codex
→ Consolidate feedback from both
Scenario 1: User wants speed
→ Use ONLY ui-developer (embedded, fast)
Scenario 2: User wants highest quality
→ Use BOTH ui-developer AND ui-developer-codex (parallel)
→ Consensus analysis on feedback
Scenario 3: User is out of credits
→ Fallback to ui-developer only
→ Notify user external validation unavailable
```
---
### Pattern 3: Sub-Agent Delegation
**File-Based Instructions (Context Isolation):**
When delegating to sub-agents, use **file-based instructions** to avoid context pollution:
```
✅ CORRECT - File-Based Delegation:
Step 1: Write instructions to file
Write: ai-docs/architecture-instructions.md
Content: "Design authentication system with JWT tokens..."
Step 2: Delegate to agent with file reference
Task: api-architect
Prompt: "Read instructions from ai-docs/architecture-instructions.md
and create architecture plan."
Step 3: Agent reads file, does work, writes output
Agent reads: ai-docs/architecture-instructions.md
Agent writes: ai-docs/architecture-plan.md
Step 4: Agent returns brief summary ONLY
Return: "Architecture plan complete. See ai-docs/architecture-plan.md"
Step 5: Orchestrator reads output file if needed
Read: ai-docs/architecture-plan.md
(Only if orchestrator needs to process the output)
```
**Why File-Based?**
- **Avoids context pollution:** Long user requirements don't bloat orchestrator context
- **Reusable:** Multiple agents can read same instruction file
- **Debuggable:** Files persist after workflow completes
- **Clean separation:** Input file, output file, orchestrator stays lightweight
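A minimal TypeScript sketch of this file-based flow, assuming a hypothetical `task()` helper that returns the sub-agent's brief summary (the real Task tool is invoked differently):
```
import { mkdir, writeFile } from "node:fs/promises";

// Hypothetical stand-in for the Task tool; returns the sub-agent's brief summary.
async function task(agent: string, prompt: string): Promise<string> {
  return `${agent}: plan complete. See ai-docs/architecture-plan.md`;
}

async function delegateArchitecture(requirements: string): Promise<string> {
  // Step 1: write the long requirements to a file instead of inlining them.
  await mkdir("ai-docs", { recursive: true });
  await writeFile("ai-docs/architecture-instructions.md", requirements, "utf8");

  // Step 2: delegate with a file reference; the prompt stays a single sentence.
  const summary = await task(
    "api-architect",
    "Read instructions from ai-docs/architecture-instructions.md and create an architecture plan."
  );

  // Step 3: only the brief summary flows back into the orchestrator context.
  // The full plan stays on disk (ai-docs/architecture-plan.md) for agents that need it.
  return summary;
}
```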
**Anti-Pattern: Inline Delegation**
```
❌ WRONG - Context Pollution:
Task: api-architect
Prompt: "Design authentication system with:
- JWT tokens with refresh token rotation
- Email/password login with bcrypt hashing
- OAuth2 integration with Google, GitHub
- Rate limiting on login endpoint (5 attempts per 15 min)
- Password reset flow with time-limited tokens
- Email verification on signup
- Role-based access control (admin, user, guest)
- Session management with Redis
- Security headers (CORS, CSP, HSTS)
- ... (500 more lines of requirements)"
Problem: Orchestrator's context now contains 500+ lines of requirements
that are only relevant to the architect agent.
```
**Brief Summary Returns:**
Sub-agents should return **2-5 sentence summaries**, not full output:
```
✅ CORRECT - Brief Summary:
"Architecture plan complete. Designed 3-layer authentication:
JWT with refresh tokens, OAuth2 integration (Google/GitHub),
and Redis session management. See ai-docs/architecture-plan.md
for detailed component breakdown."
❌ WRONG - Full Output:
"Architecture plan:
[500 lines of detailed architecture documentation]
Components: AuthController, TokenService, OAuthService...
[another 500 lines]"
```
**Proxy Mode Invocation:**
For external AI models (Claudish), use the PROXY_MODE directive:
```
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
Prompt: "Review authentication implementation for security issues.
Code context in ai-docs/code-review-context.md"
Agent Behavior:
1. Detects PROXY_MODE directive
2. Extracts model: x-ai/grok-code-fast-1
3. Extracts task: "Review authentication implementation..."
4. Executes: claudish --model x-ai/grok-code-fast-1 --stdin <<< "..."
5. Waits for full response (blocking execution)
6. Writes: ai-docs/grok-review.md (full detailed review)
7. Returns: "Grok review complete. Found 3 CRITICAL issues. See ai-docs/grok-review.md"
```
**Key: Blocking Execution**
External models MUST execute synchronously (blocking) so the agent waits for the full response:
```
✅ CORRECT - Blocking:
RESULT=$(claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT")
echo "$RESULT" > ai-docs/grok-review.md
echo "Review complete - see ai-docs/grok-review.md"
❌ WRONG - Background (returns before completion):
claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT" &
echo "Review started..." # Agent returns immediately, review not done!
```
---
### Pattern 4: Context Window Management
**When to Delegate:**
Delegate to sub-agents when:
- Task is self-contained (clear input → output)
- Output is large (architecture plan, test suite, review report)
- Task requires specialized expertise (designer, tester, reviewer)
- Multiple independent tasks can run in parallel
**When to Execute in Main Context:**
Execute in main orchestrator when:
- Task is small (simple file edit, command execution)
- Output is brief (yes/no decision, status check)
- Task depends on orchestrator state (current phase, iteration count)
- Context pollution risk is low
**Context Size Estimation:**
**Note:** Token estimates below are approximations based on typical usage. Actual context consumption varies by skill complexity, Claude model version, and conversation history. Use these as guidelines, not exact measurements.
Estimate context usage to decide delegation strategy:
```
Context Budget: ~200k tokens (Claude Sonnet 4.5 - actual varies by model)
Current context usage breakdown:
- System prompt: 10k tokens
- Skill content (5 skills): 10k tokens
- Command instructions: 5k tokens
- User request: 1k tokens
- Conversation history: 20k tokens
───────────────────────────────────
Total used: 46k tokens
Remaining: 154k tokens
Safe threshold for delegation: If task will consume >30k tokens, delegate
Example: Architecture planning for large system
- Requirements: 5k tokens
- Expected output: 20k tokens
- Total: 25k tokens
───────────────────────────────────
Decision: Delegate (keeps orchestrator lightweight)
```
**Delegation Strategy by Context Size:**
| Task Output Size | Strategy |
|------------------|----------|
| < 1k tokens | Execute in orchestrator |
| 1k - 10k tokens | Delegate with summary return |
| 10k - 30k tokens | Delegate with file-based output |
| > 30k tokens | Multi-agent decomposition |
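A minimal TypeScript sketch of this decision table (thresholds are the guideline values above, not hard limits):
```
type DelegationStrategy =
  | "execute-in-orchestrator"
  | "delegate-with-summary"
  | "delegate-with-file-output"
  | "multi-agent-decomposition";

function chooseDelegationStrategy(estimatedOutputTokens: number): DelegationStrategy {
  if (estimatedOutputTokens < 1_000) return "execute-in-orchestrator";
  if (estimatedOutputTokens <= 10_000) return "delegate-with-summary";
  if (estimatedOutputTokens <= 30_000) return "delegate-with-file-output";
  return "multi-agent-decomposition";
}

// Example: an architecture plan estimated at ~25k tokens
// chooseDelegationStrategy(25_000) === "delegate-with-file-output"
```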
**Example: Multi-Agent Decomposition**
```
User Request: "Implement complete e-commerce system"
This is >100k tokens if done by a single agent. Decompose:
Phase 1: Break into sub-systems
- Product catalog
- Shopping cart
- Checkout flow
- User authentication
- Order management
- Payment integration
Phase 2: Delegate each sub-system to separate agent
Task: backend-developer
Instruction file: ai-docs/product-catalog-requirements.md
Output file: ai-docs/product-catalog-implementation.md
Task: backend-developer
Instruction file: ai-docs/shopping-cart-requirements.md
Output file: ai-docs/shopping-cart-implementation.md
... (6 parallel agent invocations)
Phase 3: Integration agent
Task: backend-developer
Instruction: "Integrate 6 sub-systems. Read output files:
ai-docs/*-implementation.md"
Output: ai-docs/integration-plan.md
Total context per agent: ~20k tokens (manageable)
vs. Single agent: 120k+ tokens (context overflow risk)
```
---
## Integration with Other Skills
**multi-agent-coordination + multi-model-validation:**
```
Use Case: Code review with multiple AI models
Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: Code review
- Select agents: senior-code-reviewer (embedded) + external models
Step 2: Parallel Execution (multi-model-validation)
- Follow 4-Message Pattern
- Launch all reviewers simultaneously
- Wait for all to complete
Step 3: Consolidation (multi-model-validation)
- Auto-consolidate reviews
- Apply consensus analysis
```
**multi-agent-coordination + quality-gates:**
```
Use Case: Iterative UI validation
Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: UI validation
- Select agents: designer, ui-developer
Step 2: Iteration Loop (quality-gates)
- Run designer validation
- If not PASS: delegate to ui-developer for fixes
- Loop until PASS or max iterations
Step 3: User Validation Gate (quality-gates)
- MANDATORY user approval
- Collect feedback if issues found
```
**multi-agent-coordination + todowrite-orchestration:**
```
Use Case: Multi-phase implementation workflow
Step 1: Initialize TodoWrite (todowrite-orchestration)
- Create task list for all phases
Step 2: Sequential Agent Delegation (multi-agent-coordination)
- Phase 1: api-architect
- Phase 2: backend-developer (depends on Phase 1)
- Phase 3: test-architect (depends on Phase 2)
- Update TodoWrite after each phase
```
---
## Best Practices
**Do:**
- ✅ Use parallel execution for independent tasks (3-5x speedup)
- ✅ Use sequential execution when there are dependencies
- ✅ Use file-based instructions to avoid context pollution
- ✅ Return brief summaries (2-5 sentences) from sub-agents
- ✅ Select agents based on task type (API/UI/Testing/Review)
- ✅ Decompose large tasks into multiple sub-agent calls
- ✅ Estimate context usage before delegating
**Don't:**
- ❌ Mix tool types in parallel execution (breaks parallelism)
- ❌ Inline long instructions in Task prompts (context pollution)
- ❌ Return full output from sub-agents (use files instead)
- ❌ Use parallel execution for dependent tasks (wrong results)
- ❌ Use single agent for >100k token tasks (context overflow)
- ❌ Forget to wait for all parallel tasks before consolidating
**Performance Tips:**
- Parallel execution: 3-5x faster than sequential (5min vs 15min)
- File-based delegation: Saves 50-80% context usage
- Agent switching: Adapt to user preferences (speed vs quality)
- Context decomposition: Enables tasks that would otherwise overflow
---
## Examples
### Example 1: Parallel Multi-Model Code Review
**Scenario:** User requests "Review my authentication code with Grok and Gemini"
**Agent Selection:**
- Task type: Code review
- Agents: senior-code-reviewer (embedded), external Grok, external Gemini
**Execution:**
```
Message 1: Preparation
- Write code context to ai-docs/code-review-context.md
Message 2: Parallel Execution (3 Task calls in single message)
Task: senior-code-reviewer
Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: google/gemini-2.5-flash
Prompt: "Review ai-docs/code-review-context.md for security issues"
All 3 execute simultaneously (3x faster than sequential)
Message 3: Auto-Consolidation
Task: senior-code-reviewer
Prompt: "Consolidate 3 reviews from:
- ai-docs/claude-review.md
- ai-docs/grok-review.md
- ai-docs/gemini-review.md
Prioritize by consensus."
Message 4: Present Results
"Review complete. 3 models analyzed your code.
Top 5 issues by consensus:
1. [UNANIMOUS] Missing input validation on login endpoint
2. [STRONG] SQL injection risk in user query
3. [MAJORITY] Weak password requirements
See ai-docs/consolidated-review.md for details."
```
**Result:** 5 minutes total (vs 15+ if sequential), consensus-based prioritization
---
### Example 2: Sequential Multi-Phase Implementation
**Scenario:** User requests "Implement payment integration feature"
**Agent Selection:**
- Task type: API implementation
- Agents: api-architect → backend-developer → test-architect → senior-code-reviewer
**Execution:**
```
Phase 1: Architecture Planning
Write: ai-docs/payment-requirements.md
"Integrate Stripe payment processing with webhook support..."
Task: api-architect
Prompt: "Read ai-docs/payment-requirements.md
Create architecture plan"
Output: ai-docs/payment-architecture.md
Return: "Architecture plan complete. Designed 3-layer payment system."
Wait for completion ✓
Phase 2: Implementation (depends on Phase 1)
Task: backend-developer
Prompt: "Read ai-docs/payment-architecture.md
Implement payment integration"
Output: src/payment.ts, src/webhooks.ts
Return: "Payment integration implemented. 2 new files, 500 lines."
Wait for completion ✓
Phase 3: Testing (depends on Phase 2)
Task: test-architect
Prompt: "Write tests for src/payment.ts and src/webhooks.ts"
Output: tests/payment.test.ts, tests/webhooks.test.ts
Return: "Test suite complete. 20 tests covering payment flows."
Wait for completion ✓
Phase 4: Code Review (depends on Phase 3)
Task: senior-code-reviewer
Prompt: "Review payment integration implementation"
Output: ai-docs/payment-review.md
Return: "Review complete. 2 MEDIUM issues found."
Wait for completion ✓
```
**Result:** Sequential execution ensures each phase has correct inputs
---
### Example 3: Adaptive Agent Switching
**Scenario:** User requests "Validate navbar implementation" with optional external AI
**Agent Selection:**
- Task type: UI validation
- Base agent: designer
- Optional: designer-codex (if user wants external validation)
**Execution:**
```
Step 1: Ask user preference
"Do you want external AI validation? (Yes/No)"
Step 2a: If user says NO (speed mode)
Task: designer
Prompt: "Validate navbar against Figma design"
Output: ai-docs/design-review.md
Return: "Design validation complete. PASS with 2 minor suggestions."
Step 2b: If user says YES (quality mode)
Message 1: Parallel Validation
Task: designer
Prompt: "Validate navbar against Figma design"
---
Task: designer PROXY_MODE: design-review-codex
Prompt: "Validate navbar against Figma design"
Message 2: Consolidate
Task: designer
Prompt: "Consolidate 2 design reviews. Prioritize by consensus."
Output: ai-docs/design-review-consolidated.md
Return: "Consolidated review complete. Both agree on 1 CRITICAL issue."
Step 3: User validation
Present consolidated review to user for approval
```
**Result:** Adaptive workflow based on user preference (speed vs quality)
---
## Troubleshooting
**Problem: Parallel tasks executing sequentially**
Cause: Mixed tool types in same message
Solution: Use 4-Message Pattern with ONLY Task calls in Message 2
```
❌ Wrong:
await TodoWrite({...});
await Task({...});
await Task({...});
✅ Correct:
Message 1: await Bash({...}); (prep only)
Message 2: await Task({...}); await Task({...}); (parallel)
```
---
**Problem: Orchestrator context overflowing**
Cause: Inline instructions or full output returns
Solution: Use file-based delegation + brief summaries
```
❌ Wrong:
Task: agent
Prompt: "[1000 lines of inline requirements]"
Return: "[500 lines of full output]"
✅ Correct:
Write: ai-docs/requirements.md
Task: agent
Prompt: "Read ai-docs/requirements.md"
Return: "Complete. See ai-docs/output.md"
```
---
**Problem: Wrong agent selected for task**
Cause: Task type detection failed
Solution: Explicitly detect task type using keywords
```
Check user request for keywords:
- API/endpoint/backend → api-architect, backend-developer
- UI/component/design → designer, ui-developer
- test/coverage → test-architect
- review/validate → senior-code-reviewer
Default: Ask user to clarify task type
```
---
**Problem: Agent returns immediately before external model completes**
Cause: Background execution (non-blocking claudish call)
Solution: Use synchronous (blocking) execution
```
❌ Wrong:
claudish --model grok ... & (background, returns immediately)
✅ Correct:
RESULT=$(claudish --model grok ...) (blocks until complete)
```
---
## Summary
Multi-agent coordination is about choosing the right execution strategy:
- **Parallel** when tasks are independent (3-5x speedup)
- **Sequential** when tasks have dependencies (correct results)
- **File-based delegation** to avoid context pollution (50-80% savings)
- **Brief summaries** from sub-agents (clean orchestrator context)
- **Task type detection** for intelligent agent selection
- **Context decomposition** for large tasks (avoid overflow)
Master these patterns and you can orchestrate workflows of any complexity.
---
**Extracted From:**
- `/implement` command (task detection, sequential workflows)
- `/validate-ui` command (adaptive agent switching)
- `/review` command (parallel execution, 4-Message Pattern)
- `CLAUDE.md` Parallel Multi-Model Execution Protocol

File diff suppressed because it is too large


@@ -0,0 +1,996 @@
---
name: quality-gates
description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus".
version: 0.1.0
tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD]
keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval]
---
# Quality Gates
**Version:** 1.0.0
**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows
**Status:** Production Ready
## Overview
Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met.
This skill provides battle-tested patterns for:
- **User approval gates** (cost gates, quality gates, final acceptance)
- **Iteration loops** (automated refinement until quality threshold met)
- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW)
- **Multi-reviewer consensus** (unanimous vs majority agreement)
- **Feedback loops** (user reports issues → agent fixes → user validates)
- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat)
Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results.
## Core Patterns
### Pattern 1: User Approval Gates
**When to Ask for Approval:**
Use approval gates for:
- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring)
- **Quality gates:** Before proceeding to next phase (design validation before implementation)
- **Final validation:** Before completing workflow (user acceptance testing)
- **Irreversible operations:** Before destructive actions (delete files, database migrations)
**How to Present Approval:**
```
Good Approval Prompt:
"You selected 5 AI models for code review:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
- GPT-5 Codex (external, $0.004)
- DeepSeek Coder (external, $0.001)
Estimated total cost: $0.008 ($0.005 - $0.010)
Expected duration: ~5 minutes
Proceed with multi-model review? (Yes/No/Cancel)"
Why it works:
✓ Clear context (what will happen)
✓ Cost transparency (range, not single number)
✓ Time expectation (5 minutes)
✓ Multiple options (Yes/No/Cancel)
```
**Anti-Pattern: Vague Approval**
```
❌ Wrong:
"This will cost money. Proceed? (Yes/No)"
Why it fails:
✗ No cost details (how much?)
✗ No context (what will happen?)
✗ No alternatives (what if user says no?)
```
**Handling User Responses:**
```
User says YES:
→ Proceed with workflow
→ Track approval in logs
→ Continue to next step
User says NO:
→ Offer alternatives:
1. Use fewer models (reduce cost)
2. Use only free embedded Claude
3. Skip this step entirely
4. Cancel workflow
→ Ask user to choose alternative
→ Proceed based on choice
User says CANCEL:
→ Gracefully exit workflow
→ Save partial results (if any)
→ Log cancellation reason
→ Clean up temporary files
→ Notify user: "Workflow cancelled. Partial results saved to..."
```
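A minimal TypeScript sketch of such a gate, assuming a hypothetical `ask` prompt helper supplied by the caller; the key elements are the cost range, the time estimate, and an explicit branch for each answer:
```
type Answer = "yes" | "no" | "cancel";
type GateResult = "proceed" | "offer-alternatives" | "cancelled";

// `ask` is a hypothetical prompt helper injected by the caller
// (in a real workflow it is simply a question posed to the user).
async function approvalGate(
  ask: (question: string) => Promise<Answer>,
  costLow: number,
  costHigh: number,
  minutes: number
): Promise<GateResult> {
  const answer = await ask(
    `Estimated total cost: $${costLow.toFixed(3)} - $${costHigh.toFixed(3)}\n` +
      `Expected duration: ~${minutes} minutes\n` +
      `Proceed? (Yes/No/Cancel)`
  );

  if (answer === "yes") return "proceed";            // Continue and log the approval.
  if (answer === "no") return "offer-alternatives";  // Fewer models, free-only, or skip.
  return "cancelled";                                 // Exit gracefully, save partial results.
}
```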
**Approval Bypasses (Advanced):**
For automated workflows, allow approval bypass:
```
Automated Workflow Mode:
If workflow is triggered by CI/CD or scheduled task:
→ Skip user approval gates
→ Use predefined defaults (e.g., max cost $0.10)
→ Log decisions for audit trail
→ Email report to stakeholders after completion
Example:
if (isAutomatedMode) {
if (estimatedCost <= maxAutomatedCost) {
log("Auto-approved: $0.008 <= $0.10 threshold");
proceed();
} else {
log("Auto-rejected: $0.008 > $0.10 threshold");
notifyStakeholders("Cost exceeds automated threshold");
abort();
}
}
```
---
### Pattern 2: Iteration Loop Patterns
**Max Iteration Limits:**
Always set a **max iteration limit** to prevent infinite loops:
```
Typical Iteration Limits:
Automated quality loops: 10 iterations
- Designer validation → Developer fixes → Repeat
- Test failures → Developer fixes → Repeat
User feedback loops: 5 rounds
- User reports issues → Developer fixes → User validates → Repeat
Code review loops: 3 rounds
- Reviewer finds issues → Developer fixes → Re-review → Repeat
Multi-model consensus: 1 iteration (no loop)
- Parallel review → Consolidate → Present
```
**Exit Criteria:**
Define clear **exit criteria** for each loop type:
```
Loop Type: Design Validation
Exit Criteria (checked after each iteration):
1. Designer assessment = PASS → Exit loop (success)
2. Iteration count >= 10 → Exit loop (max iterations)
3. User manually approves → Exit loop (user override)
4. No changes made by developer → Exit loop (stuck, escalate)
Example:
for (let i = 1; i <= 10; i++) {
const review = await designer.validate();
if (review.assessment === "PASS") {
log("Design validation passed on iteration " + i);
break; // Success exit
}
if (i === 10) {
log("Max iterations reached. Escalating to user validation.");
break; // Max iterations exit
}
await developer.fix(review.issues);
}
```
**Progress Tracking:**
Show clear progress to user during iterations:
```
Iteration Loop Progress:
Iteration 1/10: Designer found 5 issues → Developer fixing...
Iteration 2/10: Designer found 3 issues → Developer fixing...
Iteration 3/10: Designer found 1 issue → Developer fixing...
Iteration 4/10: Designer assessment: PASS ✓
Loop completed in 4 iterations.
```
**Iteration History Documentation:**
Track what happened in each iteration:
```
Iteration History (ai-docs/iteration-history.md):
## Iteration 1
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Button color doesn't match design (#3B82F6 vs #2563EB)
- Spacing between elements too tight (8px vs 16px)
- Font size incorrect (14px vs 16px)
Developer Actions:
- Updated button color to #2563EB
- Increased spacing to 16px
- Changed font size to 16px
## Iteration 2
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Border radius too large (8px vs 4px)
Developer Actions:
- Reduced border radius to 4px
## Iteration 3
Designer Assessment: PASS ✓
Issues Found: None
Result: Design validation complete
```
---
### Pattern 3: Issue Severity Classification
**Severity Levels:**
Use 4-level severity classification:
```
CRITICAL - Must fix immediately
- Blocks core functionality
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss risk
- System crashes
- Build failures
Action: STOP workflow, fix immediately, re-validate
HIGH - Should fix soon
- Major bugs (incorrect behavior)
- Performance issues (>3s page load, memory leaks)
- Accessibility violations (keyboard navigation broken)
- User experience blockers
Action: Fix in current iteration, proceed after fix
MEDIUM - Should fix
- Minor bugs (edge cases, visual glitches)
- Code quality issues (duplication, complexity)
- Non-blocking performance issues
- Incomplete error handling
Action: Fix if time permits, or schedule for next iteration
LOW - Nice to have
- Code style inconsistencies
- Minor refactoring opportunities
- Documentation improvements
- Polish and optimization
Action: Log for future improvement, proceed without fixing
```
**Severity-Based Prioritization:**
```
Issue List (sorted by severity):
CRITICAL Issues (must fix all before proceeding):
1. SQL injection in user search endpoint
2. Missing authentication check on admin routes
3. Password stored in plaintext
HIGH Issues (fix before code review):
4. Memory leak in WebSocket connection
5. Missing error handling in payment flow
6. Accessibility: keyboard navigation broken
MEDIUM Issues (fix if time permits):
7. Code duplication in auth controllers
8. Inconsistent error messages
9. Missing JSDoc comments
LOW Issues (defer to future):
10. Variable naming inconsistency
11. Redundant type annotations
12. CSS could use more specificity
Action Plan:
- Fix CRITICAL (1-3) immediately → Re-run tests
- Fix HIGH (4-6) before code review
- Log MEDIUM (7-9) for next iteration
- Ignore LOW (10-12) for now
```
**Severity Escalation:**
Issues can escalate in severity based on context:
```
Context-Based Escalation:
Issue: "Missing error handling in payment flow"
Base Severity: MEDIUM (code quality issue)
Context 1: Development environment
→ Severity: MEDIUM (not user-facing yet)
Context 2: Production environment
→ Severity: HIGH (affects real users, money involved)
Context 3: Production + recent payment failures
→ Severity: CRITICAL (actively causing issues)
Rule: Escalate severity when:
- Issue affects production users
- Issue involves money/security/data
- Issue is currently causing failures
```
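A minimal TypeScript sketch of this escalation, covering the worked example above (money/security/data involvement can be treated as an extra bump in the same way):
```
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

const ORDER: Severity[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// Raise a severity by one level, never past CRITICAL.
function bump(severity: Severity): Severity {
  return ORDER[Math.min(ORDER.indexOf(severity) + 1, ORDER.length - 1)];
}

interface IssueContext {
  affectsProduction: boolean; // Real users are exposed to the issue.
  currentlyFailing: boolean;  // The issue is actively causing failures right now.
}

function escalate(base: Severity, ctx: IssueContext): Severity {
  if (ctx.currentlyFailing) return "CRITICAL";     // Actively causing issues.
  if (ctx.affectsProduction) return bump(base);    // Production exposure raises urgency.
  return base;                                     // Development: keep the base severity.
}

// Payment-flow example from above:
// escalate("MEDIUM", { affectsProduction: false, currentlyFailing: false }) === "MEDIUM"
// escalate("MEDIUM", { affectsProduction: true,  currentlyFailing: false }) === "HIGH"
// escalate("MEDIUM", { affectsProduction: true,  currentlyFailing: true  }) === "CRITICAL"
```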
---
### Pattern 4: Multi-Reviewer Consensus
**Consensus Levels:**
When multiple reviewers evaluate the same code/design:
```
UNANIMOUS (100% agreement):
- ALL reviewers flagged this issue
- VERY HIGH confidence
- Highest priority (likely a real problem)
Example:
3/3 reviewers: "SQL injection in search endpoint"
→ UNANIMOUS consensus
→ CRITICAL priority (all agree it's critical)
STRONG CONSENSUS (67-99% agreement):
- MOST reviewers flagged this issue
- HIGH confidence
- High priority (probably a real problem)
Example:
2/3 reviewers: "Missing input validation"
→ STRONG consensus (67%)
→ HIGH priority
MAJORITY (50-66% agreement):
- HALF or more flagged this issue
- MEDIUM confidence
- Medium priority (worth investigating)
Example:
2/4 reviewers: "Code duplication in controllers"
→ MAJORITY consensus (50%)
→ MEDIUM priority
DIVERGENT (< 50% agreement):
- Only 1-2 reviewers flagged this issue
- LOW confidence
- Low priority (may be model-specific or false positive)
Example:
1/3 reviewers: "Variable naming could be better"
→ DIVERGENT (33%)
→ LOW priority (one reviewer's opinion)
```
**Consensus-Based Prioritization:**
```
Prioritized Issue List (by consensus + severity):
1. [UNANIMOUS - CRITICAL] SQL injection in search
ALL reviewers agree: Claude, Grok, Gemini (3/3)
2. [UNANIMOUS - HIGH] Missing input validation
ALL reviewers agree: Claude, Grok, Gemini (3/3)
3. [STRONG - HIGH] Memory leak in WebSocket
MOST reviewers agree: Claude, Grok (2/3)
4. [STRONG - MEDIUM] Code duplication
MOST reviewers agree: Claude, Gemini (2/3)
5. [DIVERGENT - LOW] Variable naming
SINGLE reviewer: Claude only (1/3)
Action:
- Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH)
- Fix issue 3 before review (strong consensus)
- Consider issue 4 (strong consensus, but medium severity)
- Ignore issue 5 (divergent, likely false positive)
```
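A minimal TypeScript sketch of turning agreement ratios into consensus levels and a priority order, assuming each issue records how many reviewers flagged it:
```
type Consensus = "UNANIMOUS" | "STRONG" | "MAJORITY" | "DIVERGENT";
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface ReviewedIssue {
  title: string;
  severity: Severity;
  flaggedBy: number;      // Reviewers who reported this issue.
  totalReviewers: number; // Reviewers who examined the code.
}

function consensusLevel(issue: ReviewedIssue): Consensus {
  const ratio = issue.flaggedBy / issue.totalReviewers;
  if (ratio === 1) return "UNANIMOUS";
  if (ratio >= 2 / 3) return "STRONG";   // e.g. 2/3 reviewers agree
  if (ratio >= 1 / 2) return "MAJORITY"; // e.g. 2/4 reviewers agree
  return "DIVERGENT";                    // single-reviewer opinion
}

const CONSENSUS_RANK: Consensus[] = ["UNANIMOUS", "STRONG", "MAJORITY", "DIVERGENT"];
const SEVERITY_RANK: Severity[] = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Sort by consensus first, then by severity (lower index = higher priority).
function prioritize(issues: ReviewedIssue[]): ReviewedIssue[] {
  return [...issues].sort((a, b) => {
    const byConsensus =
      CONSENSUS_RANK.indexOf(consensusLevel(a)) - CONSENSUS_RANK.indexOf(consensusLevel(b));
    if (byConsensus !== 0) return byConsensus;
    return SEVERITY_RANK.indexOf(a.severity) - SEVERITY_RANK.indexOf(b.severity);
  });
}
```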
---
### Pattern 5: Feedback Loop Implementation
**User Feedback Loop:**
```
Workflow: User Validation with Feedback
Step 1: Initial Implementation
Developer implements feature
Designer/Tester validates
Present to user for manual validation
Step 2: User Validation Gate (MANDATORY)
Present to user:
"Implementation complete. Please manually verify:
- Open app at http://localhost:3000
- Test feature: [specific instructions]
- Compare to design reference
Does it meet expectations? (Yes/No)"
Step 3a: User says YES
→ ✅ Feature approved
→ Generate final report
→ Mark workflow complete
Step 3b: User says NO
→ Collect specific feedback
Step 4: Collect Specific Feedback
Ask user: "Please describe the issues you found:"
User response:
"1. Button color is wrong (should be blue, not green)
2. Spacing is too tight between elements
3. Font size is too small"
Step 5: Extract Structured Feedback
Parse user feedback into structured issues:
Issue 1:
Component: Button
Problem: Color incorrect
Expected: Blue (#2563EB)
Actual: Green (#10B981)
Severity: MEDIUM
Issue 2:
Component: Container
Problem: Spacing too tight
Expected: 16px
Actual: 8px
Severity: MEDIUM
Issue 3:
Component: Text
Problem: Font size too small
Expected: 16px
Actual: 14px
Severity: LOW
Step 6: Launch Fixing Agent
Task: ui-developer
Prompt: "Fix user-reported issues:
1. Button color: Change from #10B981 to #2563EB
2. Container spacing: Increase from 8px to 16px
3. Text font size: Increase from 14px to 16px
User feedback: [user's exact words]"
Step 7: Re-validate
After fixes:
- Re-run designer validation
- Loop back to Step 2 (user validation)
Step 8: Max Feedback Rounds
Limit: 5 feedback rounds (prevent infinite loop)
If round > 5:
Escalate to human review
"Unable to meet user expectations after 5 rounds.
Manual intervention required."
```
**Feedback Round Tracking:**
```
Feedback Round History:
Round 1:
User Issues: Button color, spacing, font size
Fixes Applied: Updated all 3 issues
Result: Re-validate
Round 2:
User Issues: Border radius too large
Fixes Applied: Reduced border radius
Result: Re-validate
Round 3:
User Issues: None
Result: ✅ APPROVED
Total Rounds: 3/5
```
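A minimal TypeScript sketch of the round-limited loop, with hypothetical hooks standing in for the user validation gate and the fixing agent:
```
interface UserIssue {
  component: string;
  problem: string;
  expected: string;
  actual: string;
}

// Hypothetical hooks; in a real workflow these are the user gate and the ui-developer agent.
interface FeedbackHooks {
  validateWithUser(): Promise<{ approved: boolean; issues: UserIssue[] }>;
  applyFixes(issues: UserIssue[]): Promise<void>;
}

const MAX_FEEDBACK_ROUNDS = 5;

async function userFeedbackLoop(hooks: FeedbackHooks): Promise<"approved" | "escalated"> {
  for (let round = 1; round <= MAX_FEEDBACK_ROUNDS; round++) {
    const { approved, issues } = await hooks.validateWithUser();
    if (approved) return "approved";   // User accepted the implementation.
    await hooks.applyFixes(issues);    // Fix the structured issues, then re-validate.
  }
  return "escalated"; // 5 rounds without approval: manual intervention required.
}
```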
---
### Pattern 6: Test-Driven Development Loop
**When to Use:**
Use TDD loop **after implementing code, before code review**:
```
Workflow Phases:
Phase 1: Architecture Planning
Phase 2: Implementation
Phase 2.5: Test-Driven Development Loop ← THIS PATTERN
Phase 3: Code Review
Phase 4: User Acceptance
```
**The TDD Loop Pattern:**
```
Step 1: Write Tests First
Task: test-architect
Prompt: "Write comprehensive tests for authentication feature.
Requirements: [link to requirements]
Implementation: [link to code]"
Output: tests/auth.test.ts
Step 2: Run Tests
Bash: bun test tests/auth.test.ts
Capture output and exit code
Step 3: Check Test Results
If all tests pass:
→ ✅ TDD loop complete
→ Proceed to code review (Phase 3)
If tests fail:
→ Analyze failure (continue to Step 4)
Step 4: Analyze Test Failure
Task: test-architect
Prompt: "Analyze test failure output:
[test failure logs]
Determine root cause:
- TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation)
- IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior)
Provide detailed analysis."
test-architect returns:
verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE
analysis: Detailed explanation
recommendation: Specific fix needed
Step 5a: If TEST_ISSUE (test is wrong)
Task: test-architect
Prompt: "Fix test based on analysis:
[analysis from Step 4]"
After fix:
→ Re-run tests (back to Step 2)
→ Loop continues
Step 5b: If IMPLEMENTATION_ISSUE (code is wrong)
Provide structured feedback to developer:
Task: backend-developer
Prompt: "Fix implementation based on test failure:
Test Failure:
[failure output]
Root Cause:
[analysis from test-architect]
Recommended Fix:
[specific fix needed]"
After fix:
→ Re-run tests (back to Step 2)
→ Loop continues
Step 6: Max Iteration Limit
Limit: 10 iterations
Iteration tracking:
Iteration 1/10: 5 tests failed → Fix implementation
Iteration 2/10: 2 tests failed → Fix test (bad mock)
Iteration 3/10: All tests pass ✅
If iteration > 10:
Escalate to human review
"Unable to pass all tests after 10 iterations.
Manual debugging required."
```
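A minimal TypeScript sketch of the loop's control flow, with hypothetical hooks standing in for the Bash test run, the test-architect analysis, and the fixing agents:
```
type Verdict = "TEST_ISSUE" | "IMPLEMENTATION_ISSUE";

interface TestRun {
  passed: boolean;
  output: string; // Raw runner output, e.g. from `bun test`.
}

// Hypothetical hooks; in a real workflow these map to Bash and Task calls.
interface TddHooks {
  runTests(): Promise<TestRun>;
  analyzeFailure(output: string): Promise<{ verdict: Verdict; recommendation: string }>;
  fixTests(recommendation: string): Promise<void>;          // test-architect fixes the test
  fixImplementation(recommendation: string): Promise<void>; // developer fixes the code
}

async function tddLoop(hooks: TddHooks, maxIterations = 10): Promise<"passed" | "escalated"> {
  for (let i = 1; i <= maxIterations; i++) {
    const run = await hooks.runTests();
    if (run.passed) return "passed"; // Exit: proceed to code review.

    const { verdict, recommendation } = await hooks.analyzeFailure(run.output);
    if (verdict === "TEST_ISSUE") {
      await hooks.fixTests(recommendation);
    } else {
      await hooks.fixImplementation(recommendation);
    }
  }
  return "escalated"; // Max iterations reached: manual debugging required.
}
```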
**Example TDD Loop:**
```
Phase 2.5: Test-Driven Development Loop
Iteration 1:
Tests Run: 20 tests
Results: 5 failed, 15 passed
Failure: "JWT token validation fails with expired token"
Analysis: IMPLEMENTATION_ISSUE - Missing expiration check
Fix: Added expiration validation in TokenService
Re-run: Continue to Iteration 2
Iteration 2:
Tests Run: 20 tests
Results: 2 failed, 18 passed
Failure: "Mock database not reset between tests"
Analysis: TEST_ISSUE - Missing beforeEach cleanup
Fix: Added database reset in test setup
Re-run: Continue to Iteration 3
Iteration 3:
Tests Run: 20 tests
Results: All passed ✅
Result: TDD loop complete, proceed to code review
Total Iterations: 3/10
Duration: ~5 minutes
Benefits:
- Caught 2 bugs before code review
- Fixed 1 test quality issue
- All tests passing gives confidence in implementation
```
**Benefits of TDD Loop:**
```
Benefits:
1. Catch bugs early (before code review, not after)
2. Ensure test quality (test-architect fixes bad tests)
3. Automated quality assurance (no manual testing needed)
4. Fast feedback loop (seconds to run tests, not minutes)
5. Confidence in implementation (all tests passing)
Performance:
Traditional: Implement → Review → Find bugs → Fix → Re-review
Time: 30+ minutes, multiple review rounds
TDD Loop: Implement → Test → Fix → Test → Review (with confidence)
Time: 15 minutes, single review round (fewer issues)
```
---
## Integration with Other Skills
**quality-gates + multi-model-validation:**
```
Use Case: Cost approval before multi-model review
Step 1: Estimate costs (multi-model-validation)
Step 2: User approval gate (quality-gates)
If approved: Proceed with parallel execution
If rejected: Offer alternatives
Step 3: Execute review (multi-model-validation)
```
**quality-gates + multi-agent-coordination:**
```
Use Case: Iteration loop with designer validation
Step 1: Agent selection (multi-agent-coordination)
Select designer + ui-developer
Step 2: Iteration loop (quality-gates)
For i = 1 to 10:
- Run designer validation
- If PASS: Exit loop
- Else: Delegate to ui-developer for fixes
Step 3: User validation gate (quality-gates)
Mandatory manual approval
```
**quality-gates + error-recovery:**
```
Use Case: Test-driven loop with error recovery
Step 1: Run tests (quality-gates TDD pattern)
Step 2: If test execution fails (error-recovery)
- Syntax error → Fix and retry
- Framework crash → Notify user, skip TDD
Step 3: If tests pass (quality-gates)
- Proceed to code review
```
---
## Best Practices
**Do:**
- ✅ Set max iteration limits (prevent infinite loops)
- ✅ Define clear exit criteria (PASS, max iterations, user override)
- ✅ Track iteration history (document what happened)
- ✅ Show progress to user ("Iteration 3/10 complete")
- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW)
- ✅ Prioritize by consensus + severity
- ✅ Ask user approval for expensive operations
- ✅ Collect specific feedback (not vague complaints)
- ✅ Use TDD loop to catch bugs early
**Don't:**
- ❌ Create infinite loops (no exit criteria)
- ❌ Skip user validation gates (mandatory for UX)
- ❌ Ignore consensus (unanimous issues are real)
- ❌ Batch all severities together (prioritize CRITICAL)
- ❌ Proceed without approval for >$0.01 operations
- ❌ Collect vague feedback ("it's wrong" → what specifically?)
- ❌ Skip TDD loop (catches bugs before expensive review)
**Performance:**
- Iteration loops: 5-10 iterations typical, max 10-15 min
- TDD loop: 3-5 iterations typical, max 5-10 min
- User feedback: 1-3 rounds typical, max 5 rounds
---
## Examples
### Example 1: User Approval Gate for Multi-Model Review
**Scenario:** User requests multi-model review, costs $0.008
**Execution:**
```
Step 1: Estimate Costs
Input: 450 lines × 1.5 = 675 tokens per model
Output: 2000-4000 tokens per model
Total: 3 models × 3000 avg = 9000 output tokens
Cost: ~$0.008 ($0.005 - $0.010)
Step 2: Present Approval Gate
"Multi-model review will analyze 450 lines with 3 AI models:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
Estimated cost: $0.008 ($0.005 - $0.010)
Duration: ~5 minutes
Proceed? (Yes/No/Cancel)"
Step 3a: User says YES
→ Proceed with parallel execution
→ Track approval: log("User approved $0.008 cost")
Step 3b: User says NO
→ Offer alternatives:
1. Use only free Claude (no external models)
2. Use only 1 external model (reduce cost to $0.002)
3. Skip review entirely
→ Ask user to choose
Step 3c: User says CANCEL
→ Exit gracefully
→ Log: "User cancelled multi-model review"
→ Clean up temporary files
```
---
### Example 2: Designer Validation Iteration Loop
**Scenario:** UI implementation with automated iteration until PASS
**Execution:**
```
Iteration 1:
Task: designer
Prompt: "Validate navbar against Figma design"
Output: ai-docs/design-review-1.md
Assessment: NEEDS IMPROVEMENT
Issues:
- Button color: #3B82F6 (expected #2563EB)
- Spacing: 8px (expected 16px)
Task: ui-developer
Prompt: "Fix issues from ai-docs/design-review-1.md"
Changes: Updated button color, increased spacing
Result: Continue to Iteration 2
Iteration 2:
Task: designer
Prompt: "Re-validate navbar"
Output: ai-docs/design-review-2.md
Assessment: NEEDS IMPROVEMENT
Issues:
- Border radius: 8px (expected 4px)
Task: ui-developer
Prompt: "Fix border radius issue"
Changes: Reduced border radius to 4px
Result: Continue to Iteration 3
Iteration 3:
Task: designer
Prompt: "Re-validate navbar"
Output: ai-docs/design-review-3.md
Assessment: PASS ✓
Issues: None
Result: Exit loop (success)
Summary:
Total Iterations: 3/10
Duration: ~8 minutes
Automated Fixes: 3 issues resolved
Result: PASS, proceed to user validation
```
---
### Example 3: Test-Driven Development Loop
**Scenario:** Authentication implementation with TDD
**Execution:**
```
Phase 2.5: Test-Driven Development Loop
Iteration 1:
Task: test-architect
Prompt: "Write tests for authentication feature"
Output: tests/auth.test.ts (20 tests)
Bash: bun test tests/auth.test.ts
Result: 5 failed, 15 passed
Task: test-architect
Prompt: "Analyze test failures"
Verdict: IMPLEMENTATION_ISSUE
Analysis: "Missing JWT expiration validation"
Task: backend-developer
Prompt: "Add JWT expiration validation"
Changes: Updated TokenService.verify()
Bash: bun test tests/auth.test.ts
Result: Continue to Iteration 2
Iteration 2:
Bash: bun test tests/auth.test.ts
Result: 2 failed, 18 passed
Task: test-architect
Prompt: "Analyze test failures"
Verdict: TEST_ISSUE
Analysis: "Mock database not reset between tests"
Task: test-architect
Prompt: "Fix test setup"
Changes: Added beforeEach cleanup
Bash: bun test tests/auth.test.ts
Result: Continue to Iteration 3
Iteration 3:
Bash: bun test tests/auth.test.ts
Result: All 20 passed ✅
Result: TDD loop complete, proceed to code review
Summary:
Total Iterations: 3/10
Duration: ~5 minutes
Bugs Caught: 1 implementation bug, 1 test bug
Result: All tests passing, high confidence in code
```
---
## Troubleshooting
**Problem: Infinite iteration loop**
Cause: No exit criteria or max iteration limit
Solution: Always set max iterations (10 for automated, 5 for user feedback)
```
❌ Wrong:
while (true) {
if (review.assessment === "PASS") break;
fix();
}
✅ Correct:
for (let i = 1; i <= 10; i++) {
if (review.assessment === "PASS") break;
if (i === 10) escalateToUser();
fix();
}
```
---
**Problem: User approval skipped for expensive operation**
Cause: Missing approval gate
Solution: Always ask approval for costs >$0.01
```
❌ Wrong:
if (userRequestedMultiModel) {
executeReview();
}
✅ Correct:
if (userRequestedMultiModel) {
const cost = estimateCost();
if (cost > 0.01) {
const approved = await askUserApproval(cost);
if (!approved) return offerAlternatives();
}
executeReview();
}
```
---
**Problem: All issues treated equally**
Cause: No severity classification
Solution: Classify by severity, prioritize CRITICAL
```
❌ Wrong:
issues.forEach(issue => fix(issue));
✅ Correct:
const critical = issues.filter(i => i.severity === "CRITICAL");
const high = issues.filter(i => i.severity === "HIGH");
critical.forEach(issue => fix(issue)); // Fix critical first
high.forEach(issue => fix(issue)); // Then high
// MEDIUM and LOW deferred or skipped
```
---
## Summary
Quality gates ensure high-quality results through:
- **User approval gates** (cost, quality, final validation)
- **Iteration loops** (automated refinement, max 10 iterations)
- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW)
- **Consensus prioritization** (unanimous → strong → majority → divergent)
- **Feedback loops** (collect specific issues, fix, re-validate)
- **Test-driven development** (write tests, run, fix, repeat until pass)
Master these patterns and your workflows will consistently produce high-quality, validated results.
---
**Extracted From:**
- `/review` command (user approval for costs, consensus analysis)
- `/validate-ui` command (iteration loops, user validation gates, feedback collection)
- `/implement` command (PHASE 2.5 test-driven development loop)
- Multi-model review patterns (consensus-based prioritization)


@@ -0,0 +1,983 @@
---
name: todowrite-orchestration
description: Track progress in multi-phase workflows with TodoWrite. Use when orchestrating 5+ phase commands, managing iteration loops, tracking parallel tasks, or providing real-time progress visibility. Trigger keywords - "phase tracking", "progress", "workflow", "multi-step", "multi-phase", "todo", "tracking", "status".
version: 0.1.0
tags: [orchestration, todowrite, progress, tracking, workflow, multi-phase]
keywords: [phase-tracking, progress, workflow, multi-step, multi-phase, todo, tracking, status, visibility]
---
# TodoWrite Orchestration
**Version:** 1.0.0
**Purpose:** Patterns for using TodoWrite in complex multi-phase workflows
**Status:** Production Ready
## Overview
TodoWrite orchestration is the practice of using the TodoWrite tool to provide **real-time progress visibility** in complex multi-phase workflows. It transforms opaque "black box" workflows into transparent, trackable processes where users can see:
- What phase is currently executing
- How many phases remain
- Which tasks are pending, in-progress, or completed
- Overall progress percentage
- Iteration counts in loops
This skill provides battle-tested patterns for:
- **Phase initialization** (create complete task list before starting)
- **Task granularity** (how to break phases into trackable tasks)
- **Status transitions** (pending → in_progress → completed)
- **Real-time updates** (mark complete immediately, not batched)
- **Iteration tracking** (progress through loops)
- **Parallel task tracking** (multiple agents executing simultaneously)
TodoWrite orchestration is especially valuable for workflows with more than 5 phases or longer than 10 minutes, where users need progress feedback.
## Core Patterns
### Pattern 1: Phase Initialization
**Create TodoWrite List BEFORE Starting:**
Initialize TodoWrite as **step 0** of your workflow, before any actual work begins:
```
✅ CORRECT - Initialize First:
Step 0: Initialize TodoWrite
TodoWrite: Create task list
- PHASE 1: Gather user inputs
- PHASE 1: Validate inputs
- PHASE 2: Select AI models
- PHASE 2: Estimate costs
- PHASE 2: Get user approval
- PHASE 3: Launch parallel reviews
- PHASE 3: Wait for all reviews
- PHASE 4: Consolidate reviews
- PHASE 5: Present results
Step 1: Start actual work (PHASE 1)
Mark "PHASE 1: Gather user inputs" as in_progress
... do work ...
Mark "PHASE 1: Gather user inputs" as completed
Mark "PHASE 1: Validate inputs" as in_progress
... do work ...
❌ WRONG - Create During Workflow:
Step 1: Do some work
... work happens ...
TodoWrite: Create task "Did some work" (completed)
Step 2: Do more work
... work happens ...
TodoWrite: Create task "Did more work" (completed)
Problem: User has no visibility into upcoming phases
```
**List All Phases Upfront:**
When initializing, include **all phases** in the task list, not just the current phase:
```
✅ CORRECT - Complete Visibility:
TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs
[ ] PHASE 2: Architecture planning
[ ] PHASE 3: Implementation
[ ] PHASE 3: Run quality checks
[ ] PHASE 4: Code review
[ ] PHASE 5: User acceptance
[ ] PHASE 6: Generate report
User sees: "8 tasks total, 0 complete, Phase 1 starting"
❌ WRONG - Incremental Discovery:
TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs
(User thinks the workflow is 2 tasks, then is surprised by 6 more phases)
```
**Why Initialize First:**
1. **User expectation setting:** User knows workflow scope (8 phases, ~20 minutes)
2. **Progress visibility:** User can see % complete (3/8 = 37.5%)
3. **Time estimation:** User can estimate remaining time based on progress
4. **Transparency:** No hidden phases or surprises
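A minimal TypeScript sketch of step 0, using a hypothetical `todoWrite()` stand-in (the real TodoWrite tool is invoked as a tool call, but the shape of the list is the point):
```
type TodoStatus = "pending" | "in_progress" | "completed";

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// Hypothetical stand-in for the TodoWrite tool: it always receives the full list.
function todoWrite(todos: TodoItem[]): void {
  console.log(todos.map((t) => `[${t.status}] ${t.content}`).join("\n"));
}

// Step 0: list every phase upfront, all pending, before any work starts.
const phases = [
  "PHASE 1: Gather user inputs",
  "PHASE 1: Validate inputs",
  "PHASE 2: Select AI models",
  "PHASE 2: Estimate costs",
  "PHASE 2: Get user approval",
  "PHASE 3: Launch parallel reviews",
  "PHASE 3: Wait for all reviews",
  "PHASE 4: Consolidate reviews",
  "PHASE 5: Present results",
];

todoWrite(phases.map((content): TodoItem => ({ content, status: "pending" })));
// User immediately sees the full scope: 9 tasks, 0 complete, PHASE 1 starting next.
```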
---
### Pattern 2: Task Granularity Guidelines
**One Task Per Significant Operation:**
Each task should represent a **significant operation** (1-5 minutes of work):
```
✅ CORRECT - Significant Operations:
Tasks:
- PHASE 1: Ask user for inputs (30s)
- PHASE 2: Generate architecture plan (2 min)
- PHASE 3: Implement feature (5 min)
- PHASE 4: Run tests (1 min)
- PHASE 5: Code review (3 min)
Each task = meaningful unit of work
❌ WRONG - Too Granular:
Tasks:
- PHASE 1: Ask user question 1
- PHASE 1: Ask user question 2
- PHASE 1: Ask user question 3
- PHASE 2: Read file A
- PHASE 2: Read file B
- PHASE 2: Write file C
- ... (50 micro-tasks)
Problem: Too many updates, clutters user interface
```
**Multi-Step Phases: Break Into 2-3 Sub-Tasks:**
For complex phases (>5 minutes), break into 2-3 sub-tasks:
```
✅ CORRECT - Sub-Task Breakdown:
PHASE 3: Implementation (15 min total)
→ Sub-tasks:
- PHASE 3: Implement core logic (5 min)
- PHASE 3: Add error handling (3 min)
- PHASE 3: Write tests (7 min)
User sees progress within phase: "PHASE 3: 2/3 complete"
❌ WRONG - Single Monolithic Task:
PHASE 3: Implementation (15 min)
→ No sub-tasks
Problem: User sees "in_progress" for 15 min with no updates
```
**Avoid Too Many Tasks:**
Limit to **max 15-20 tasks** for readability:
```
✅ CORRECT - 12 Tasks (readable):
8-phase workflow:
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Fix issues
- PHASE 7: Re-review
- PHASE 8: Accept
Total: 12 tasks (clean, trackable)
❌ WRONG - 50 Tasks (overwhelming):
Every single action as separate task:
- Read file 1
- Read file 2
- Write file 3
- Run command 1
- ... (50 tasks)
Problem: User overwhelmed, can't see the forest for the trees
```
**Guideline by Workflow Duration:**
```
Workflow Duration → Task Count:
< 5 minutes: 3-5 tasks
5-15 minutes: 8-12 tasks
15-30 minutes: 12-18 tasks
> 30 minutes: 15-20 tasks (if more, group into phases)
Example:
5-minute workflow (3 phases):
- PHASE 1: Prepare
- PHASE 2: Execute
- PHASE 3: Present
Total: 3 tasks ✓
20-minute workflow (6 phases):
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Accept
Total: 10 tasks ✓
```
---
### Pattern 3: Status Transitions
**Exactly ONE Task In Progress at a Time:**
Maintain the invariant: **exactly one task in_progress** at any moment:
```
✅ CORRECT - One In-Progress:
State at time T1:
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[→] PHASE 3: Implement (in_progress) ← Only one
[ ] PHASE 4: Test (pending)
[ ] PHASE 5: Review (pending)
State at time T2 (after PHASE 3 completes):
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[✓] PHASE 3: Implement (completed)
[→] PHASE 4: Test (in_progress) ← Only one
[ ] PHASE 5: Review (pending)
❌ WRONG - Multiple In-Progress:
State:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Two in-progress?
[→] PHASE 3: Implement (in_progress) ← Confusing!
[ ] PHASE 4: Test (pending)
Problem: User confused about current phase
```
**Status Transition Sequence:**
```
Lifecycle of a Task:
1. Created: pending
(Task exists, not started yet)
2. Started: pending → in_progress
(Mark as in_progress when starting work)
3. Completed: in_progress → completed
(Mark as completed immediately after finishing)
4. Next task: Mark next task as in_progress
(Continue to next task)
Example Timeline:
T=0s: [→] Task 1 (in_progress), [ ] Task 2 (pending)
T=30s: [✓] Task 1 (completed), [→] Task 2 (in_progress)
T=60s: [✓] Task 1 (completed), [✓] Task 2 (completed)
```
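A minimal sketch of this lifecycle, reusing the same hypothetical `Task` shape (not the actual TodoWrite schema): a single helper completes the current task and starts the next, so the invariant holds after every update.
```typescript
// Minimal sketch only: hypothetical Task shape, not the real TodoWrite schema.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

// Complete the current task (if any) and start the next pending one,
// preserving the "exactly one in_progress" invariant.
function advance(tasks: Task[]): void {
  const current = tasks.find((t) => t.status === "in_progress");
  if (current) current.status = "completed"; // marked immediately, never batched
  const next = tasks.find((t) => t.status === "pending");
  if (next) next.status = "in_progress";     // at most one task in_progress
}
```
Each `advance()` call would correspond to one TodoWrite update, so a completion is visible the moment it happens rather than at the end of a phase.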
**NEVER Batch Completions:**
Mark tasks completed **immediately** after finishing, not at end of phase:
```
✅ CORRECT - Immediate Updates:
Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...
Mark "PHASE 1: Ask user" as completed ← Immediate
Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...
Mark "PHASE 1: Validate inputs" as completed ← Immediate
User sees real-time progress
❌ WRONG - Batched Updates:
Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...
Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...
(At end of PHASE 1, batch update both to completed)
Problem: User doesn't see progress for 50s, thinks workflow is stuck
```
---
### Pattern 4: Real-Time Progress Tracking
**Update TodoWrite As Work Progresses:**
TodoWrite should reflect **current state**, not past state:
```
✅ CORRECT - Real-Time Updates:
T=0s: Initialize TodoWrite (8 tasks, all pending)
T=5s: Mark "PHASE 1" as in_progress
T=35s: Mark "PHASE 1" as completed, "PHASE 2" as in_progress
T=90s: Mark "PHASE 2" as completed, "PHASE 3" as in_progress
...
User always sees accurate current state
❌ WRONG - Delayed Updates:
T=0s: Initialize TodoWrite
T=300s: Workflow completes
T=301s: Update all tasks to completed
Problem: No progress visibility for 5 minutes
```
**Add New Tasks If Discovered During Execution:**
If you discover additional work during execution, add new tasks:
```
Scenario: During implementation, realize refactoring needed
Initial TodoWrite:
[✓] PHASE 1: Plan
[→] PHASE 2: Implement
[ ] PHASE 3: Test
[ ] PHASE 4: Review
During PHASE 2, discover:
"Implementation requires refactoring legacy code"
Updated TodoWrite:
[✓] PHASE 1: Plan
[✓] PHASE 2: Implement core logic (completed)
[→] PHASE 2: Refactor legacy code (in_progress) ← New task added
[ ] PHASE 3: Test
[ ] PHASE 4: Review
User sees: "Additional work discovered: refactoring. Total now 5 tasks."
```
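A minimal sketch of adding discovered work, again assuming the hypothetical `Task` shape; `addDiscoveredTask` is an illustrative helper, not part of any real API.
```typescript
// Minimal sketch only: hypothetical Task shape, not the real TodoWrite schema.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

// Insert newly discovered work right after the current in_progress task so it
// is picked up next and the user sees the expanded scope immediately.
function addDiscoveredTask(tasks: Task[], content: string): void {
  const i = tasks.findIndex((t) => t.status === "in_progress");
  const insertAt = i === -1 ? tasks.length : i + 1;
  tasks.splice(insertAt, 0, { content, status: "pending" });
}

// Scenario above: addDiscoveredTask(tasks, "PHASE 2: Refactor legacy code");
```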
**User Can See Current Progress at Any Time:**
With real-time updates, user can check progress:
```
User checks at T=120s:
TodoWrite State:
[✓] PHASE 1: Ask user
[✓] PHASE 2: Plan architecture
[→] PHASE 3: Implement core logic (in_progress)
[ ] PHASE 3: Add error handling
[ ] PHASE 3: Write tests
[ ] PHASE 4: Code review
[ ] PHASE 5: Accept
User sees: "3/8 tasks complete (37.5%), currently implementing core logic"
```
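A minimal sketch of how such a summary could be derived from the task list, using the same hypothetical `Task` shape.
```typescript
// Minimal sketch only: hypothetical Task shape, not the real TodoWrite schema.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

// Build the "N/M tasks complete (X%), currently ..." line a user might see.
function progressSummary(tasks: Task[]): string {
  const done = tasks.filter((t) => t.status === "completed").length;
  const current = tasks.find((t) => t.status === "in_progress");
  const pct = tasks.length ? Math.round((done / tasks.length) * 100) : 0;
  const doing = current ? `, currently: ${current.content}` : "";
  return `${done}/${tasks.length} tasks complete (${pct}%)${doing}`;
}
```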
---
### Pattern 5: Iteration Loop Tracking
**Create Task Per Iteration:**
For iteration loops, create a task for each iteration:
```
✅ CORRECT - Iteration Tasks:
Design Validation Loop (max 10 iterations):
Initial TodoWrite:
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
... (create all 10 upfront)
Progress:
[✓] Iteration 1/10: Designer validation (NEEDS IMPROVEMENT)
[✓] Iteration 2/10: Designer validation (NEEDS IMPROVEMENT)
[→] Iteration 3/10: Designer validation (in_progress)
[ ] Iteration 4/10: Designer validation
...
User sees: "Iteration 3/10 in progress, 2 complete"
❌ WRONG - Single Loop Task:
TodoWrite:
[→] Design validation loop (in_progress)
Problem: User sees "in_progress" for 10 minutes, no iteration visibility
```
**Mark Iteration Complete When Done:**
```
Iteration Lifecycle:
Iteration 1:
Mark "Iteration 1/10" as in_progress
Run designer validation
If NEEDS IMPROVEMENT: Run developer fixes
Mark "Iteration 1/10" as completed
Iteration 2:
Mark "Iteration 2/10" as in_progress
Run designer validation
Mark "Iteration 2/10" as completed
If PASS: Exit loop early
Result: Loop exited after 2 iterations
[✓] Iteration 1/10 (completed)
[✓] Iteration 2/10 (completed)
[ ] Iteration 3/10 (not needed, loop exited)
...
User sees: "Loop completed in 2/10 iterations"
```
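A minimal sketch of this loop, where `runDesignerValidation` and `runDeveloperFixes` are hypothetical stand-ins for delegating to the designer and developer agents, and `Task` is the same illustrative shape as above.
```typescript
// Minimal sketch only: hypothetical Task shape and agent stubs.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

declare function runDesignerValidation(): Promise<{ pass: boolean; issues: string[] }>;
declare function runDeveloperFixes(issues: string[]): Promise<void>;

async function validationLoop(maxIterations = 10): Promise<boolean> {
  // Create every iteration task upfront so the user sees the full budget.
  const iterations: Task[] = Array.from({ length: maxIterations }, (_, i) => ({
    content: `Iteration ${i + 1}/${maxIterations}: Designer validation`,
    status: "pending" as Status,
  }));

  for (const task of iterations) {
    task.status = "in_progress";
    const result = await runDesignerValidation();
    if (!result.pass) await runDeveloperFixes(result.issues);
    task.status = "completed";       // mark each iteration as it finishes
    if (result.pass) return true;    // early exit; remaining tasks stay pending
  }
  return false;                      // max iterations reached: escalate to user
}
```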
**Track Total Iterations vs Max Limit:**
```
Iteration Progress:
Max: 10 iterations
Current: 5
TodoWrite State:
[✓] Iteration 1/10
[✓] Iteration 2/10
[✓] Iteration 3/10
[✓] Iteration 4/10
[→] Iteration 5/10
[ ] Iteration 6/10
...
User sees: "Iteration 5/10 (50% through max)"
Warning at Iteration 8:
"Iteration 8/10 - approaching max, may escalate to user if not PASS"
```
**Clear Progress Visibility:**
```
Iteration Loop with TodoWrite:
User Request: "Validate UI design"
TodoWrite:
[✓] PHASE 1: Gather design reference
[✓] Iteration 1/10: Designer validation (5 issues found)
[✓] Iteration 2/10: Designer validation (3 issues found)
[✓] Iteration 3/10: Designer validation (1 issue found)
[→] Iteration 4/10: Designer validation (in_progress)
[ ] Iteration 5/10: Designer validation
...
[ ] PHASE 3: User validation gate
User sees:
- 3 iterations completed, iteration 4 in progress (40% through max)
- Issues reducing each iteration (5 → 3 → 1)
- Progress toward PASS
```
---
### Pattern 6: Parallel Task Tracking
**Multiple Agents Executing Simultaneously:**
When running agents in parallel, track each separately:
```
✅ CORRECT - Separate Tasks for Parallel Agents:
Multi-Model Review (3 models in parallel):
TodoWrite:
[✓] PHASE 1: Prepare review context
[→] PHASE 2: Claude review (in_progress)
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews
Note: 3 tasks "in_progress" is OK for parallel execution
(Exception to "one in_progress" rule)
As models complete:
[✓] PHASE 1: Prepare review context
[✓] PHASE 2: Claude review (completed) ← First to finish
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews
User sees: "1/3 reviews complete, 2 in progress"
❌ WRONG - Single Task for Parallel Work:
TodoWrite:
[✓] PHASE 1: Prepare
[→] PHASE 2: Run 3 reviews (in_progress)
[ ] PHASE 3: Consolidate
Problem: No visibility into which reviews are complete
```
**Update As Each Agent Completes:**
```
Parallel Execution Timeline:
T=0s: Launch 3 reviews in parallel
[→] Claude review (in_progress)
[→] Grok review (in_progress)
[→] Gemini review (in_progress)
T=60s: Claude completes first
[✓] Claude review (completed)
[→] Grok review (in_progress)
[→] Gemini review (in_progress)
T=120s: Gemini completes
[✓] Claude review (completed)
[→] Grok review (in_progress)
[✓] Gemini review (completed)
T=180s: Grok completes
[✓] Claude review (completed)
[✓] Grok review (completed)
[✓] Gemini review (completed)
User sees real-time completion updates
```
**Progress Indicators During Long Parallel Tasks:**
```
For long-running parallel tasks (>2 minutes), show progress:
T=0s: "Launching 5 AI model reviews (estimated 5 minutes)..."
T=60s: "1/5 reviews complete..."
T=120s: "2/5 reviews complete..."
T=180s: "4/5 reviews complete, 1 in progress..."
T=240s: "All reviews complete! Consolidating results..."
TodoWrite mirrors this:
[✓] Claude review (1/5 complete)
[✓] Grok review (2/5 complete)
[→] Gemini review (in_progress)
[→] GPT-5 review (in_progress)
[→] DeepSeek review (in_progress)
```
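A minimal sketch of this parallel pattern; `runReview` is a hypothetical stand-in for launching one model's review, and `Task` is the same illustrative shape rather than the real TodoWrite schema.
```typescript
// Minimal sketch only: hypothetical Task shape and review stub.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

declare function runReview(model: string): Promise<string>;

async function parallelReviews(models: string[]): Promise<string[]> {
  // Exception to the invariant: all parallel tasks start in_progress together.
  const tasks: Task[] = models.map((m) => ({
    content: `PHASE 2: ${m} review`,
    status: "in_progress" as Status,
  }));

  let done = 0;
  return Promise.all(
    models.map(async (m, i) => {
      const review = await runReview(m);
      tasks[i].status = "completed";   // update as each model completes
      done += 1;
      console.log(`${done}/${models.length} reviews complete`);
      return review;
    }),
  );
}

// Example: parallelReviews(["Claude", "Grok", "Gemini", "GPT-5", "DeepSeek"]);
```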
---
## Integration with Other Skills
**todowrite-orchestration + multi-agent-coordination:**
```
Use Case: Multi-phase implementation workflow
Step 1: Initialize TodoWrite (todowrite-orchestration)
Create task list for all 8 phases
Step 2: Sequential Agent Delegation (multi-agent-coordination)
Phase 1: api-architect
Mark PHASE 1 as in_progress
Delegate to api-architect
Mark PHASE 1 as completed
Phase 2: backend-developer
Mark PHASE 2 as in_progress
Delegate to backend-developer
Mark PHASE 2 as completed
... continue for all phases
```
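A minimal sketch of this integration, where `delegate` is a hypothetical stand-in for handing a phase to a sub-agent (e.g. api-architect, backend-developer) and `Task` is the same illustrative shape used earlier.
```typescript
// Minimal sketch only: hypothetical Task shape and delegation stub.
type Status = "pending" | "in_progress" | "completed";
interface Task { content: string; status: Status; }

declare function delegate(agent: string): Promise<void>;

async function runPhases(phases: { label: string; agent: string }[]): Promise<void> {
  const tasks: Task[] = phases.map((p) => ({ content: p.label, status: "pending" as Status }));
  for (let i = 0; i < phases.length; i++) {
    tasks[i].status = "in_progress";   // announce the phase before delegating
    await delegate(phases[i].agent);
    tasks[i].status = "completed";     // real-time update before the next phase
  }
}
```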
**todowrite-orchestration + multi-model-validation:**
```
Use Case: Multi-model review with progress tracking
Step 1: Initialize TodoWrite (todowrite-orchestration)
[ ] PHASE 1: Prepare context
[ ] PHASE 2: Launch reviews (5 models)
[ ] PHASE 3: Consolidate results
Step 2: Parallel Execution (multi-model-validation)
Mark "PHASE 2: Launch reviews" as in_progress
Launch all 5 models simultaneously
As each completes: Update progress (1/5, 2/5, ...)
Mark "PHASE 2: Launch reviews" as completed
Step 3: Real-Time Visibility (todowrite-orchestration)
User sees: "PHASE 2: 3/5 reviews complete..."
```
**todowrite-orchestration + quality-gates:**
```
Use Case: Iteration loop with TodoWrite tracking
Step 1: Initialize TodoWrite (todowrite-orchestration)
[ ] Iteration 1/10
[ ] Iteration 2/10
...
Step 2: Iteration Loop (quality-gates)
For i = 1 to 10:
Mark "Iteration i/10" as in_progress
Run designer validation
Mark "Iteration i/10" as completed
If PASS: Exit loop
Step 3: Progress Visibility
User sees: "Iteration 5/10 complete, 5 remaining"
```
---
## Best Practices
**Do:**
- ✅ Initialize TodoWrite BEFORE starting work (step 0)
- ✅ List ALL phases upfront (user sees complete scope)
- ✅ Use 8-15 tasks for typical workflows (readable)
- ✅ Mark completed IMMEDIATELY after finishing (real-time)
- ✅ Keep exactly ONE task in_progress (except parallel tasks)
- ✅ Track iterations separately (Iteration 1/10, 2/10, ...)
- ✅ Update as work progresses (not batched at end)
- ✅ Add new tasks if discovered during execution
**Don't:**
- ❌ Create TodoWrite during workflow (initialize first)
- ❌ Hide phases from user (list all upfront)
- ❌ Create too many tasks (>20 overwhelms user)
- ❌ Batch completions at end of phase (update real-time)
- ❌ Leave multiple tasks in_progress (pick one)
- ❌ Use single task for loop (track iterations separately)
- ❌ Update only at start/end (update during execution)
**Performance:**
- TodoWrite overhead: <1s per update (negligible)
- User visibility benefit: Reduces perceived wait time 30-50%
- Workflow confidence: User knows progress, less likely to cancel
---
## Examples
### Example 1: 8-Phase Implementation Workflow
**Scenario:** Full-cycle implementation with TodoWrite tracking
**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Ask user for requirements
[ ] PHASE 2: Generate architecture plan
[ ] PHASE 3: Implement core logic
[ ] PHASE 3: Add error handling
[ ] PHASE 3: Write tests
[ ] PHASE 4: Run test suite
[ ] PHASE 5: Code review
[ ] PHASE 6: Fix review issues
[ ] PHASE 7: User acceptance
[ ] PHASE 8: Generate report
User sees: "10 tasks, 0 complete, Phase 1 starting..."
Step 1: PHASE 1
Mark "PHASE 1: Ask user" as in_progress
... gather requirements (30s) ...
Mark "PHASE 1: Ask user" as completed
User sees: "1/10 tasks complete (10%)"
Step 2: PHASE 2
Mark "PHASE 2: Architecture plan" as in_progress
... generate plan (2 min) ...
Mark "PHASE 2: Architecture plan" as completed
User sees: "2/10 tasks complete (20%)"
Step 3: PHASE 3 (3 sub-tasks)
Mark "PHASE 3: Implement core" as in_progress
... implement (3 min) ...
Mark "PHASE 3: Implement core" as completed
User sees: "3/10 tasks complete (30%)"
Mark "PHASE 3: Add error handling" as in_progress
... add error handling (2 min) ...
Mark "PHASE 3: Add error handling" as completed
User sees: "4/10 tasks complete (40%)"
Mark "PHASE 3: Write tests" as in_progress
... write tests (3 min) ...
Mark "PHASE 3: Write tests" as completed
User sees: "5/10 tasks complete (50%)"
... continue through all phases ...
Final State:
[✓] All 10 tasks completed
User sees: "10/10 tasks complete (100%). Workflow finished!"
Total Duration: ~15 minutes
User Experience: Continuous progress updates every 1-3 minutes
```
---
### Example 2: Iteration Loop with Progress Tracking
**Scenario:** Design validation with 10 max iterations
**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Gather design reference
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
[ ] Iteration 4/10: Designer validation
[ ] Iteration 5/10: Designer validation
... (10 iterations total)
[ ] PHASE 3: User validation gate
Step 1: PHASE 1
Mark "PHASE 1: Gather design" as in_progress
... gather design (20s) ...
Mark "PHASE 1: Gather design" as completed
Step 2: Iteration Loop
Iteration 1:
Mark "Iteration 1/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 5 issues"
Developer: Fix 5 issues
Mark "Iteration 1/10" as completed
User sees: "Iteration 1/10 complete, 5 issues fixed"
Iteration 2:
Mark "Iteration 2/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 3 issues"
Developer: Fix 3 issues
Mark "Iteration 2/10" as completed
User sees: "Iteration 2/10 complete, 3 issues fixed"
Iteration 3:
Mark "Iteration 3/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 1 issue"
Developer: Fix 1 issue
Mark "Iteration 3/10" as completed
User sees: "Iteration 3/10 complete, 1 issue fixed"
Iteration 4:
Mark "Iteration 4/10" as in_progress
Designer: "PASS ✓"
Mark "Iteration 4/10" as completed
Exit loop (early exit)
User sees: "Loop completed in 4/10 iterations"
Step 3: PHASE 3
Mark "PHASE 3: User validation" as in_progress
... user validates ...
Mark "PHASE 3: User validation" as completed
Final State:
[✓] PHASE 1: Gather design
[✓] Iteration 1/10 (5 issues fixed)
[✓] Iteration 2/10 (3 issues fixed)
[✓] Iteration 3/10 (1 issue fixed)
[✓] Iteration 4/10 (PASS)
[ ] Iteration 5/10 (not needed)
...
[✓] PHASE 3: User validation
User Experience: Clear iteration progress, early exit visible
```
---
### Example 3: Parallel Multi-Model Review
**Scenario:** 5 AI models reviewing code in parallel
**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Prepare review context
[ ] PHASE 2: Claude review
[ ] PHASE 2: Grok review
[ ] PHASE 2: Gemini review
[ ] PHASE 2: GPT-5 review
[ ] PHASE 2: DeepSeek review
[ ] PHASE 3: Consolidate reviews
[ ] PHASE 4: Present results
Step 1: PHASE 1
Mark "PHASE 1: Prepare context" as in_progress
... prepare (30s) ...
Mark "PHASE 1: Prepare context" as completed
Step 2: PHASE 2 (Parallel Execution)
Mark all 5 reviews as in_progress:
[→] Claude review
[→] Grok review
[→] Gemini review
[→] GPT-5 review
[→] DeepSeek review
Launch all 5 in parallel (4-Message Pattern)
As each completes:
T=60s: Claude completes
[✓] Claude review
User sees: "1/5 reviews complete"
T=90s: Gemini completes
[✓] Gemini review
User sees: "2/5 reviews complete"
T=120s: GPT-5 completes
[✓] GPT-5 review
User sees: "3/5 reviews complete"
T=150s: Grok completes
[✓] Grok review
User sees: "4/5 reviews complete"
T=180s: DeepSeek completes
[✓] DeepSeek review
User sees: "5/5 reviews complete!"
Step 3: PHASE 3
Mark "PHASE 3: Consolidate" as in_progress
... consolidate (30s) ...
Mark "PHASE 3: Consolidate" as completed
Step 4: PHASE 4
Mark "PHASE 4: Present results" as in_progress
... present (10s) ...
Mark "PHASE 4: Present results" as completed
Final State:
[✓] All 8 tasks completed
User sees: "Multi-model review complete in 3 minutes"
User Experience:
- Real-time progress as each model completes
- Clear visibility: "3/5 reviews complete"
- Reduces perceived wait time (user knows progress)
```
---
## Troubleshooting
**Problem: User thinks workflow is stuck**
Cause: No TodoWrite updates for >1 minute
Solution: Update TodoWrite more frequently, or add sub-tasks
```
❌ Wrong:
[→] PHASE 3: Implementation (in_progress for 10 minutes)
✅ Correct:
[✓] PHASE 3: Implement core logic (2 min)
[✓] PHASE 3: Add error handling (3 min)
[→] PHASE 3: Write tests (in_progress, 2 min so far)
User sees progress every 2-3 minutes
```
---
**Problem: Too many tasks (>20), overwhelming**
Cause: Too granular task breakdown
Solution: Group micro-tasks into larger operations
```
❌ Wrong (25 tasks):
[ ] Read file 1
[ ] Read file 2
[ ] Write file 3
... (25 micro-tasks)
✅ Correct (8 tasks):
[ ] PHASE 1: Gather inputs (includes reading files)
[ ] PHASE 2: Process data
... (8 significant operations)
```
---
**Problem: Multiple tasks "in_progress" (not parallel execution)**
Cause: Forgot to mark previous task as completed
Solution: Always mark completed before starting next
```
❌ Wrong:
[→] PHASE 1: Ask user (in_progress)
[→] PHASE 2: Plan (in_progress) ← Both in_progress?
✅ Correct:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Only one
```
---
## Summary
TodoWrite orchestration provides real-time progress visibility through:
- **Phase initialization** (create task list before starting)
- **Appropriate granularity** (8-15 tasks, significant operations)
- **Real-time updates** (mark completed immediately)
- **Exactly one in_progress** (except parallel execution)
- **Iteration tracking** (separate task per iteration)
- **Parallel task tracking** (update as each completes)
Master these patterns and users will always know:
- What's happening now
- What's coming next
- How much progress has been made
- How much remains
This transforms "black box" workflows into transparent, trackable processes.
---
**Extracted From:**
- `/review` command (10-task initialization, phase-based tracking)
- `/implement` command (8-phase workflow with sub-tasks)
- `/validate-ui` command (iteration tracking, user feedback rounds)
- All multi-phase orchestration workflows