Initial commit
13
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,13 @@
{
  "name": "orchestration",
  "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
  "version": "0.1.1",
  "author": {
    "name": "Jack Rudenko",
    "email": "i@madappgang.com",
    "company": "MadAppGang"
  },
  "skills": [
    "./skills"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# orchestration

Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.
61
plugin.lock.json
Normal file
@@ -0,0 +1,61 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:MadAppGang/claude-code:plugins/orchestration",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "ad90df36843224b97a17f14cfd5a207d4e053c67",
    "treeHash": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150",
    "generatedAt": "2025-11-28T10:12:05.859643Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "orchestration",
    "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
    "version": "0.1.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "215babb6dff86f8783d8e97d0a21546e2aaa3b055bc1cde5c4e16c6bf3d6c7a5"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "36414e18947889714f9d80576e01edaab8b3ffdf9efd44107e0f5fb42b0e2270"
      },
      {
        "path": "skills/todowrite-orchestration/SKILL.md",
        "sha256": "f681467a2eef99945f90b8f2b654c8c9713f4153afdff19a0c0b312d2f6084de"
      },
      {
        "path": "skills/quality-gates/SKILL.md",
        "sha256": "ba13c21d8e9f8abeb856bbec4a6ebc821e92dfe0857942797959087452b175c3"
      },
      {
        "path": "skills/error-recovery/SKILL.md",
        "sha256": "133564d1bc0d35a8c35074b089120fe7d7a757b71bdd6222a7a5c23e45f20aa3"
      },
      {
        "path": "skills/multi-agent-coordination/SKILL.md",
        "sha256": "9e0156350eb09447221898598611a5270921c31168e7698c4bd0d3bd0ced4616"
      },
      {
        "path": "skills/multi-model-validation/SKILL.md",
        "sha256": "9d5c46dfa531f911f4fcc4070fd6c039900bcdb440c997f7eac384001a1ba33e"
      }
    ],
    "dirSha256": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
1107
skills/error-recovery/SKILL.md
Normal file
File diff suppressed because it is too large
742
skills/multi-agent-coordination/SKILL.md
Normal file
@@ -0,0 +1,742 @@
---
name: multi-agent-coordination
description: Coordinate multiple agents in parallel or sequential workflows. Use when running agents simultaneously, delegating to sub-agents, switching between specialized agents, or managing agent selection. Trigger keywords - "parallel agents", "sequential workflow", "delegate", "multi-agent", "sub-agent", "agent switching", "task decomposition".
version: 0.1.0
tags: [orchestration, multi-agent, parallel, sequential, delegation, coordination]
keywords: [parallel, sequential, delegate, sub-agent, agent-switching, multi-agent, task-decomposition, coordination]
---

# Multi-Agent Coordination

**Version:** 1.0.0
**Purpose:** Patterns for coordinating multiple agents in complex workflows
**Status:** Production Ready

## Overview

Multi-agent coordination is the foundation of sophisticated Claude Code workflows. This skill provides battle-tested patterns for orchestrating multiple specialized agents to accomplish complex tasks that are beyond the capabilities of a single agent.

The key challenge in multi-agent systems is **dependencies**. Some tasks must execute sequentially (one agent's output feeds into another), while others can run in parallel (independent validations from different perspectives). Getting this right is the difference between a 5-minute workflow and a 15-minute one.

This skill teaches you:
- When to run agents in **parallel** vs **sequential** mode
- How to **select the right agent** for each task
- How to **delegate** to sub-agents without polluting context
- How to manage **context windows** across multiple agent calls

## Core Patterns

### Pattern 1: Sequential vs Parallel Execution

**When to Use Sequential:**

Use sequential execution when there are **dependencies** between agents:
- Agent B needs Agent A's output as input
- Workflow phases must complete in order (plan → implement → test → review)
- Each agent modifies shared state (same files)

**Example: Multi-Phase Implementation**

```
Phase 1: Architecture Planning
  Task: api-architect
  Output: ai-docs/architecture-plan.md
  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
  Input: Read ai-docs/architecture-plan.md
  Output: src/auth.ts, src/routes.ts
  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
  Input: Read src/auth.ts, src/routes.ts
  Output: tests/auth.test.ts
```

**When to Use Parallel:**

Use parallel execution when agents are **independent**:
- Multiple validation perspectives (designer + tester + reviewer)
- Multiple AI models reviewing the same code (Grok + Gemini + Claude)
- Multiple feature implementations in separate files

**Example: Multi-Perspective Validation**

```
Single Message with Multiple Task Calls:

Task: designer
  Prompt: Validate UI against Figma design
  Output: ai-docs/design-review.md
---
Task: ui-manual-tester
  Prompt: Test UI in browser for usability
  Output: ai-docs/testing-report.md
---
Task: senior-code-reviewer
  Prompt: Review code quality and patterns
  Output: ai-docs/code-review.md

All three execute simultaneously (3x speedup!)
Wait for all to complete, then consolidate results.
```

**The 4-Message Pattern for True Parallel Execution:**

This pattern is **CRITICAL** for achieving true parallelism:

```
Message 1: Preparation (Bash Only)
- Create workspace directories
- Validate inputs
- Write context files
- NO Task calls, NO TodoWrite

Message 2: Parallel Execution (Task Only)
- Launch ALL agents in a SINGLE message
- ONLY Task tool calls
- Each Task is independent
- All execute simultaneously

Message 3: Consolidation (Task Only)
- Launch consolidation agent
- Automatically triggered when all N agents complete

Message 4: Present Results
- Show the user the final consolidated results
- Include links to detailed reports
```

**Anti-Pattern: Mixing Tool Types Breaks Parallelism**

```
❌ WRONG - Executes Sequentially:
await TodoWrite({...});  // Tool 1
await Task({...});       // Tool 2 - waits for TodoWrite
await Bash({...});       // Tool 3 - waits for Task
await Task({...});       // Tool 4 - waits for Bash

✅ CORRECT - Executes in Parallel:
await Task({...});  // Task 1
await Task({...});  // Task 2
await Task({...});  // Task 3
// All execute simultaneously
```

**Why Mixing Fails:**

Claude Code sees different tool types and assumes there are dependencies between them, forcing sequential execution. Using a single tool type (all Task calls) signals that the operations are independent and can run in parallel.

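The contrast above can be sketched in plain TypeScript. `Task` here is a hypothetical async stub standing in for the real Task tool; the point is that independent calls should be started together and awaited as a group, not awaited one at a time:

```typescript
// Hypothetical stand-in for the Task tool: resolves after `ms` with a report path.
const Task = (agent: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`ai-docs/${agent}-report.md`), ms));

// Sequential: total time is the SUM of the three durations.
async function sequential(): Promise<string[]> {
  const a = await Task("designer", 30);
  const b = await Task("ui-manual-tester", 30);
  const c = await Task("senior-code-reviewer", 30);
  return [a, b, c];
}

// Parallel: all three start immediately; total time is the MAX duration.
async function parallel(): Promise<string[]> {
  return Promise.all([
    Task("designer", 30),
    Task("ui-manual-tester", 30),
    Task("senior-code-reviewer", 30),
  ]);
}
```

With three 30 ms tasks, `sequential` takes roughly 90 ms and `parallel` roughly 30 ms — the same 3x ratio the validation example claims.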
---

### Pattern 2: Agent Selection by Task Type

**Task Detection Logic:**

Intelligent workflows automatically detect the task type and select appropriate agents (check the mixed case first, since an API-only or UI-only branch would otherwise shadow it):

```
Task Type Detection:

IF request mentions both API and UI keywords:
  → Mixed workflow
  → Use all relevant agents from both categories
  → Coordinate between backend and frontend agents

ELSE IF request mentions "API", "endpoint", "backend", "database":
  → API-focused workflow
  → Use: api-architect, backend-developer, test-architect
  → Skip: designer, ui-developer (not relevant)

ELSE IF request mentions "UI", "component", "design", "Figma":
  → UI-focused workflow
  → Use: designer, ui-developer, ui-manual-tester
  → Optional: ui-developer-codex (external validation)

ELSE IF request mentions "test", "coverage", "bug":
  → Testing-focused workflow
  → Use: test-architect, ui-manual-tester
  → Optional: codebase-detective (for bug investigation)

ELSE IF request mentions "review", "validate", "feedback":
  → Review-focused workflow
  → Use: senior-code-reviewer, designer, ui-developer
  → Optional: external model reviewers
```

**Agent Capability Matrix:**

| Task Type | Primary Agent | Secondary Agent | Optional External |
|-----------|---------------|-----------------|-------------------|
| API Implementation | backend-developer | api-architect | - |
| UI Implementation | ui-developer | designer | ui-developer-codex |
| Testing | test-architect | ui-manual-tester | - |
| Code Review | senior-code-reviewer | - | codex-code-reviewer |
| Architecture Planning | api-architect OR frontend-architect | - | plan-reviewer |
| Bug Investigation | codebase-detective | test-architect | - |
| Design Validation | designer | ui-developer | designer-codex |

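The detection logic above can be written as a small, testable routing function. The keyword lists and agent names mirror this skill's tables; treat it as an illustrative sketch, not the exact matching the orchestrator performs:

```typescript
// Keyword-based task routing, mirroring the detection logic above.
// The mixed (API + UI) case is checked first so it isn't shadowed.
function selectAgents(request: string): string[] {
  const r = request.toLowerCase();
  const api = /\b(api|endpoint|backend|database)\b/.test(r);
  const ui = /\b(ui|component|design|figma)\b/.test(r);
  if (api && ui) {
    // Mixed workflow: coordinate backend and frontend agents.
    return ["api-architect", "backend-developer", "designer", "ui-developer"];
  }
  if (api) return ["api-architect", "backend-developer", "test-architect"];
  if (ui) return ["designer", "ui-developer", "ui-manual-tester"];
  if (/\b(test|coverage|bug)\b/.test(r)) return ["test-architect", "ui-manual-tester"];
  if (/\b(review|validate|feedback)\b/.test(r)) {
    return ["senior-code-reviewer", "designer", "ui-developer"];
  }
  return []; // Unknown task type: ask the user to clarify.
}
```

An empty result is the "Default: ask the user" branch from the Troubleshooting section, not an error.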
**Agent Switching Pattern:**

Some workflows benefit from **adaptive agent selection** based on context:

```
Example: UI Development with External Validation

Base Implementation:
  Task: ui-developer
  Prompt: Implement navbar component from design

User requests external validation:
  → Switch to ui-developer-codex OR add parallel ui-developer-codex
  → Run both: embedded ui-developer + external ui-developer-codex
  → Consolidate feedback from both

Scenario 1: User wants speed
  → Use ONLY ui-developer (embedded, fast)

Scenario 2: User wants highest quality
  → Use BOTH ui-developer AND ui-developer-codex (parallel)
  → Consensus analysis on feedback

Scenario 3: User is out of credits
  → Fall back to ui-developer only
  → Notify the user that external validation is unavailable
```

---

### Pattern 3: Sub-Agent Delegation

**File-Based Instructions (Context Isolation):**

When delegating to sub-agents, use **file-based instructions** to avoid context pollution:

```
✅ CORRECT - File-Based Delegation:

Step 1: Write instructions to file
  Write: ai-docs/architecture-instructions.md
  Content: "Design authentication system with JWT tokens..."

Step 2: Delegate to agent with file reference
  Task: api-architect
  Prompt: "Read instructions from ai-docs/architecture-instructions.md
           and create architecture plan."

Step 3: Agent reads file, does work, writes output
  Agent reads: ai-docs/architecture-instructions.md
  Agent writes: ai-docs/architecture-plan.md

Step 4: Agent returns brief summary ONLY
  Return: "Architecture plan complete. See ai-docs/architecture-plan.md"

Step 5: Orchestrator reads output file if needed
  Read: ai-docs/architecture-plan.md
  (Only if the orchestrator needs to process the output)
```

**Why File-Based?**

- **Avoids context pollution:** Long user requirements don't bloat the orchestrator's context
- **Reusable:** Multiple agents can read the same instruction file
- **Debuggable:** Files persist after the workflow completes
- **Clean separation:** Input file, output file; the orchestrator stays lightweight

**Anti-Pattern: Inline Delegation**

```
❌ WRONG - Context Pollution:

Task: api-architect
Prompt: "Design authentication system with:
  - JWT tokens with refresh token rotation
  - Email/password login with bcrypt hashing
  - OAuth2 integration with Google, GitHub
  - Rate limiting on login endpoint (5 attempts per 15 min)
  - Password reset flow with time-limited tokens
  - Email verification on signup
  - Role-based access control (admin, user, guest)
  - Session management with Redis
  - Security headers (CORS, CSP, HSTS)
  - ... (500 more lines of requirements)"

Problem: The orchestrator's context now contains 500+ lines of requirements
that are only relevant to the architect agent.
```

**Brief Summary Returns:**

Sub-agents should return **2-5 sentence summaries**, not full output:

```
✅ CORRECT - Brief Summary:
"Architecture plan complete. Designed 3-layer authentication:
JWT with refresh tokens, OAuth2 integration (Google/GitHub),
and Redis session management. See ai-docs/architecture-plan.md
for detailed component breakdown."

❌ WRONG - Full Output:
"Architecture plan:
[500 lines of detailed architecture documentation]
Components: AuthController, TokenService, OAuthService...
[another 500 lines]"
```

**Proxy Mode Invocation:**

For external AI models (Claudish), use the PROXY_MODE directive:

```
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
Prompt: "Review authentication implementation for security issues.
         Code context in ai-docs/code-review-context.md"

Agent Behavior:
1. Detects PROXY_MODE directive
2. Extracts model: x-ai/grok-code-fast-1
3. Extracts task: "Review authentication implementation..."
4. Executes: claudish --model x-ai/grok-code-fast-1 --stdin <<< "..."
5. Waits for full response (blocking execution)
6. Writes: ai-docs/grok-review.md (full detailed review)
7. Returns: "Grok review complete. Found 3 CRITICAL issues. See ai-docs/grok-review.md"
```

**Key: Blocking Execution**

External models MUST execute synchronously (blocking) so the agent waits for the full response:

```
✅ CORRECT - Blocking:
RESULT=$(claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT")
echo "$RESULT" > ai-docs/grok-review.md
echo "Review complete - see ai-docs/grok-review.md"

❌ WRONG - Background (returns before completion):
claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT" &
echo "Review started..."  # Agent returns immediately, review not done!
```

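In Node-flavored TypeScript the same blocking-vs-background distinction looks like this. As an assumption for illustration, `echo` stands in for the claudish invocation:

```typescript
import { execSync, spawn } from "node:child_process";

// ✅ Blocking: execSync returns only after the child process exits,
// so the full output is in hand before the agent reports completion.
const result = execSync('echo "review complete"', { encoding: "utf8" }).trim();

// ❌ Background: spawn returns immediately; the "review" is still running
// when the agent would report back to the orchestrator.
const child = spawn("echo", ["review started"]);
child.unref(); // detached — nothing waits for it
```

The fix is always the first form: capture the child's output synchronously, write it to the report file, then return the brief summary.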
---

### Pattern 4: Context Window Management

**When to Delegate:**

Delegate to sub-agents when:
- The task is self-contained (clear input → output)
- The output is large (architecture plan, test suite, review report)
- The task requires specialized expertise (designer, tester, reviewer)
- Multiple independent tasks can run in parallel

**When to Execute in Main Context:**

Execute in the main orchestrator when:
- The task is small (simple file edit, command execution)
- The output is brief (yes/no decision, status check)
- The task depends on orchestrator state (current phase, iteration count)
- Context pollution risk is low

**Context Size Estimation:**

**Note:** Token estimates below are approximations based on typical usage. Actual context consumption varies by skill complexity, Claude model version, and conversation history. Use these as guidelines, not exact measurements.

Estimate context usage to decide the delegation strategy:

```
Context Budget: ~200k tokens (Claude Sonnet 4.5 - actual varies by model)

Current context usage breakdown:
- System prompt: 10k tokens
- Skill content (5 skills): 10k tokens
- Command instructions: 5k tokens
- User request: 1k tokens
- Conversation history: 20k tokens
───────────────────────────────────
Total used: 46k tokens
Remaining: 154k tokens

Safe threshold for delegation: if a task will consume >30k tokens, delegate

Example: Architecture planning for a large system
- Requirements: 5k tokens
- Expected output: 20k tokens
- Total: 25k tokens
───────────────────────────────────
Decision: Delegate (keeps the orchestrator lightweight)
```

**Delegation Strategy by Context Size:**

| Task Output Size | Strategy |
|------------------|----------|
| < 1k tokens | Execute in orchestrator |
| 1k - 10k tokens | Delegate with summary return |
| 10k - 30k tokens | Delegate with file-based output |
| > 30k tokens | Multi-agent decomposition |

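A rough decision helper for the table above. The chars/4 token estimate is a common rule of thumb, not an exact tokenizer, and the thresholds are the ones from this skill:

```typescript
// Rough rule of thumb: ~4 characters per token for English prose.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

type Strategy =
  | "execute-in-orchestrator"
  | "delegate-with-summary"
  | "delegate-with-file-output"
  | "multi-agent-decomposition";

// Thresholds taken from the delegation-strategy table above.
function delegationStrategy(expectedOutputTokens: number): Strategy {
  if (expectedOutputTokens < 1_000) return "execute-in-orchestrator";
  if (expectedOutputTokens <= 10_000) return "delegate-with-summary";
  if (expectedOutputTokens <= 30_000) return "delegate-with-file-output";
  return "multi-agent-decomposition";
}
```

For the architecture-planning example above (5k requirements + 20k expected output = 25k tokens), this yields file-based delegation, matching the worked decision.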
**Example: Multi-Agent Decomposition**

```
User Request: "Implement complete e-commerce system"

This is >100k tokens if done by a single agent. Decompose:

Phase 1: Break into sub-systems
- Product catalog
- Shopping cart
- Checkout flow
- User authentication
- Order management
- Payment integration

Phase 2: Delegate each sub-system to a separate agent
  Task: backend-developer
    Instruction file: ai-docs/product-catalog-requirements.md
    Output file: ai-docs/product-catalog-implementation.md

  Task: backend-developer
    Instruction file: ai-docs/shopping-cart-requirements.md
    Output file: ai-docs/shopping-cart-implementation.md

  ... (6 parallel agent invocations)

Phase 3: Integration agent
  Task: backend-developer
  Instruction: "Integrate 6 sub-systems. Read output files:
                ai-docs/*-implementation.md"
  Output: ai-docs/integration-plan.md

Total context per agent: ~20k tokens (manageable)
vs. Single agent: 120k+ tokens (context overflow risk)
```

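The fan-out bookkeeping for Phase 2 can be sketched as a pure function, assuming the naming convention above (`ai-docs/<subsystem>-requirements.md` → `ai-docs/<subsystem>-implementation.md`):

```typescript
interface Delegation {
  agent: string;
  instructionFile: string;
  outputFile: string;
}

// Map each sub-system to a file-based delegation, following the
// ai-docs/<name>-requirements.md → ai-docs/<name>-implementation.md convention.
function planDecomposition(subsystems: string[]): Delegation[] {
  return subsystems.map((name) => {
    const slug = name.toLowerCase().replace(/\s+/g, "-");
    return {
      agent: "backend-developer",
      instructionFile: `ai-docs/${slug}-requirements.md`,
      outputFile: `ai-docs/${slug}-implementation.md`,
    };
  });
}
```

Each entry then becomes one Task call in the parallel message, and the integration agent reads the collected `outputFile` paths.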
---

## Integration with Other Skills

**multi-agent-coordination + multi-model-validation:**

```
Use Case: Code review with multiple AI models

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: Code review
- Select agents: senior-code-reviewer (embedded) + external models

Step 2: Parallel Execution (multi-model-validation)
- Follow the 4-Message Pattern
- Launch all reviewers simultaneously
- Wait for all to complete

Step 3: Consolidation (multi-model-validation)
- Auto-consolidate reviews
- Apply consensus analysis
```

**multi-agent-coordination + quality-gates:**

```
Use Case: Iterative UI validation

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: UI validation
- Select agents: designer, ui-developer

Step 2: Iteration Loop (quality-gates)
- Run designer validation
- If not PASS: delegate to ui-developer for fixes
- Loop until PASS or max iterations

Step 3: User Validation Gate (quality-gates)
- MANDATORY user approval
- Collect feedback if issues found
```

**multi-agent-coordination + todowrite-orchestration:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
- Create task list for all phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
- Phase 1: api-architect
- Phase 2: backend-developer (depends on Phase 1)
- Phase 3: test-architect (depends on Phase 2)
- Update TodoWrite after each phase
```

---

## Best Practices

**Do:**
- ✅ Use parallel execution for independent tasks (3-5x speedup)
- ✅ Use sequential execution when there are dependencies
- ✅ Use file-based instructions to avoid context pollution
- ✅ Return brief summaries (2-5 sentences) from sub-agents
- ✅ Select agents based on task type (API/UI/Testing/Review)
- ✅ Decompose large tasks into multiple sub-agent calls
- ✅ Estimate context usage before delegating

**Don't:**
- ❌ Mix tool types in parallel execution (breaks parallelism)
- ❌ Inline long instructions in Task prompts (context pollution)
- ❌ Return full output from sub-agents (use files instead)
- ❌ Use parallel execution for dependent tasks (wrong results)
- ❌ Use a single agent for >100k token tasks (context overflow)
- ❌ Forget to wait for all parallel tasks before consolidating

**Performance Tips:**
- Parallel execution: 3-5x faster than sequential (5 min vs 15 min)
- File-based delegation: saves 50-80% of context usage
- Agent switching: adapt to user preferences (speed vs quality)
- Context decomposition: enables tasks that would otherwise overflow

---

## Examples

### Example 1: Parallel Multi-Model Code Review

**Scenario:** User requests "Review my authentication code with Grok and Gemini"

**Agent Selection:**
- Task type: Code review
- Agents: senior-code-reviewer (embedded), external Grok, external Gemini

**Execution:**

```
Message 1: Preparation
- Write code context to ai-docs/code-review-context.md

Message 2: Parallel Execution (3 Task calls in a single message)
Task: senior-code-reviewer
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: google/gemini-2.5-flash
  Prompt: "Review ai-docs/code-review-context.md for security issues"

All 3 execute simultaneously (3x faster than sequential)

Message 3: Auto-Consolidation
Task: senior-code-reviewer
  Prompt: "Consolidate 3 reviews from:
    - ai-docs/claude-review.md
    - ai-docs/grok-review.md
    - ai-docs/gemini-review.md
    Prioritize by consensus."

Message 4: Present Results
"Review complete. 3 models analyzed your code.
Top 5 issues by consensus:
1. [UNANIMOUS] Missing input validation on login endpoint
2. [STRONG] SQL injection risk in user query
3. [MAJORITY] Weak password requirements
See ai-docs/consolidated-review.md for details."
```

**Result:** 5 minutes total (vs 15+ if sequential), consensus-based prioritization

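The consensus labels in the summary above can be computed mechanically. A simplified sketch, assuming each reviewer returns a flat list of issue titles; the skill's intermediate [STRONG] tier presumably also weighs severity, which is not reproduced here:

```typescript
// Label each issue by how many reviewers independently reported it.
function consensus(reviews: string[][]): Map<string, string> {
  const total = reviews.length;
  const counts = new Map<string, number>();
  for (const review of reviews) {
    for (const issue of new Set(review)) { // de-dup within one review
      counts.set(issue, (counts.get(issue) ?? 0) + 1);
    }
  }
  const labels = new Map<string, string>();
  for (const [issue, n] of counts) {
    if (n === total) labels.set(issue, "UNANIMOUS");
    else if (n > total / 2) labels.set(issue, "MAJORITY");
    else labels.set(issue, "MINORITY");
  }
  return labels;
}
```

Sorting issues by label (UNANIMOUS first) gives the prioritized list the consolidation agent presents.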
---

### Example 2: Sequential Multi-Phase Implementation

**Scenario:** User requests "Implement payment integration feature"

**Agent Selection:**
- Task type: API implementation
- Agents: api-architect → backend-developer → test-architect → senior-code-reviewer

**Execution:**

```
Phase 1: Architecture Planning
  Write: ai-docs/payment-requirements.md
    "Integrate Stripe payment processing with webhook support..."

  Task: api-architect
    Prompt: "Read ai-docs/payment-requirements.md
             Create architecture plan"
    Output: ai-docs/payment-architecture.md
    Return: "Architecture plan complete. Designed 3-layer payment system."

  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
    Prompt: "Read ai-docs/payment-architecture.md
             Implement payment integration"
    Output: src/payment.ts, src/webhooks.ts
    Return: "Payment integration implemented. 2 new files, 500 lines."

  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
    Prompt: "Write tests for src/payment.ts and src/webhooks.ts"
    Output: tests/payment.test.ts, tests/webhooks.test.ts
    Return: "Test suite complete. 20 tests covering payment flows."

  Wait for completion ✓

Phase 4: Code Review (depends on Phase 3)
  Task: senior-code-reviewer
    Prompt: "Review payment integration implementation"
    Output: ai-docs/payment-review.md
    Return: "Review complete. 2 MEDIUM issues found."

  Wait for completion ✓
```

**Result:** Sequential execution ensures each phase has correct inputs

---

### Example 3: Adaptive Agent Switching

**Scenario:** User requests "Validate navbar implementation" with optional external AI

**Agent Selection:**
- Task type: UI validation
- Base agent: designer
- Optional: designer-codex (if the user wants external validation)

**Execution:**

```
Step 1: Ask user preference
  "Do you want external AI validation? (Yes/No)"

Step 2a: If user says NO (speed mode)
  Task: designer
    Prompt: "Validate navbar against Figma design"
    Output: ai-docs/design-review.md
    Return: "Design validation complete. PASS with 2 minor suggestions."

Step 2b: If user says YES (quality mode)
  Message 1: Parallel Validation
    Task: designer
      Prompt: "Validate navbar against Figma design"
    ---
    Task: designer PROXY_MODE: design-review-codex
      Prompt: "Validate navbar against Figma design"

  Message 2: Consolidate
    Task: designer
      Prompt: "Consolidate 2 design reviews. Prioritize by consensus."
      Output: ai-docs/design-review-consolidated.md
      Return: "Consolidated review complete. Both agree on 1 CRITICAL issue."

Step 3: User validation
  Present the consolidated review to the user for approval
```

**Result:** Adaptive workflow based on user preference (speed vs quality)

---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**Problem: Parallel tasks executing sequentially**
|
||||||
|
|
||||||
|
Cause: Mixed tool types in same message
|
||||||
|
|
||||||
|
Solution: Use 4-Message Pattern with ONLY Task calls in Message 2
|
||||||
|
|
||||||
|
```
|
||||||
|
❌ Wrong:
|
||||||
|
await TodoWrite({...});
|
||||||
|
await Task({...});
|
||||||
|
await Task({...});
|
||||||
|
|
||||||
|
✅ Correct:
|
||||||
|
Message 1: await Bash({...}); (prep only)
|
||||||
|
Message 2: await Task({...}); await Task({...}); (parallel)
|
||||||
|
```

---

**Problem: Orchestrator context overflowing**

Cause: Inline instructions or full output returns

Solution: Use file-based delegation + brief summaries

```
❌ Wrong:
Task: agent
Prompt: "[1000 lines of inline requirements]"
Return: "[500 lines of full output]"

✅ Correct:
Write: ai-docs/requirements.md
Task: agent
Prompt: "Read ai-docs/requirements.md"
Return: "Complete. See ai-docs/output.md"
```

---

**Problem: Wrong agent selected for task**

Cause: Task type detection failed

Solution: Explicitly detect task type using keywords

```
Check user request for keywords:
- API/endpoint/backend → api-architect, backend-developer
- UI/component/design → designer, ui-developer
- test/coverage → test-architect
- review/validate → senior-code-reviewer

Default: Ask user to clarify task type
```
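
The keyword routing above can be sketched as a small helper. This is a hedged sketch, not a fixed API: the agent names mirror the table, `detectTaskType` is a hypothetical function name, and the substring matching is deliberately naive.

```typescript
// Hypothetical sketch of keyword-based task type detection.
// Agent names mirror the table above; real routing may be richer.
const ROUTES: Array<{ keywords: string[]; agents: string[] }> = [
  { keywords: ["api", "endpoint", "backend"], agents: ["api-architect", "backend-developer"] },
  { keywords: ["ui", "component", "design"], agents: ["designer", "ui-developer"] },
  { keywords: ["test", "coverage"], agents: ["test-architect"] },
  { keywords: ["review", "validate"], agents: ["senior-code-reviewer"] },
];

function detectTaskType(request: string): string[] | null {
  const text = request.toLowerCase();
  for (const route of ROUTES) {
    // First matching route wins; order ROUTES from most to least specific.
    if (route.keywords.some((k) => text.includes(k))) return route.agents;
  }
  return null; // default: ask user to clarify task type
}
```

A `null` result maps directly to the "ask user to clarify" default above.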

---

**Problem: Agent returns immediately before external model completes**

Cause: Background execution (non-blocking claudish call)

Solution: Use synchronous (blocking) execution

```
❌ Wrong:
claudish --model grok ... & (background, returns immediately)

✅ Correct:
RESULT=$(claudish --model grok ...) (blocks until complete)
```

---

## Summary

Multi-agent coordination is about choosing the right execution strategy:

- **Parallel** when tasks are independent (3-5x speedup)
- **Sequential** when tasks have dependencies (correct results)
- **File-based delegation** to avoid context pollution (50-80% savings)
- **Brief summaries** from sub-agents (clean orchestrator context)
- **Task type detection** for intelligent agent selection
- **Context decomposition** for large tasks (avoid overflow)

Master these patterns and you can orchestrate workflows of any complexity.

---

**Extracted From:**
- `/implement` command (task detection, sequential workflows)
- `/validate-ui` command (adaptive agent switching)
- `/review` command (parallel execution, 4-Message Pattern)
- `CLAUDE.md` Parallel Multi-Model Execution Protocol
1005 skills/multi-model-validation/SKILL.md Normal file
File diff suppressed because it is too large Load Diff
996 skills/quality-gates/SKILL.md Normal file
@@ -0,0 +1,996 @@
---
name: quality-gates
description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus".
version: 0.1.0
tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD]
keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval]
---

# Quality Gates

**Version:** 1.0.0
**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows
**Status:** Production Ready

## Overview

Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met.

This skill provides battle-tested patterns for:
- **User approval gates** (cost gates, quality gates, final acceptance)
- **Iteration loops** (automated refinement until quality threshold met)
- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW)
- **Multi-reviewer consensus** (unanimous vs majority agreement)
- **Feedback loops** (user reports issues → agent fixes → user validates)
- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat)

Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results.

## Core Patterns

### Pattern 1: User Approval Gates

**When to Ask for Approval:**

Use approval gates for:
- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring)
- **Quality gates:** Before proceeding to next phase (design validation before implementation)
- **Final validation:** Before completing workflow (user acceptance testing)
- **Irreversible operations:** Before destructive actions (delete files, database migrations)

**How to Present Approval:**

```
Good Approval Prompt:

"You selected 5 AI models for code review:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
- GPT-5 Codex (external, $0.004)
- DeepSeek Coder (external, $0.001)

Estimated total cost: $0.008 ($0.005 - $0.010)
Expected duration: ~5 minutes

Proceed with multi-model review? (Yes/No/Cancel)"

Why it works:
✓ Clear context (what will happen)
✓ Cost transparency (range, not single number)
✓ Time expectation (5 minutes)
✓ Multiple options (Yes/No/Cancel)
```

**Anti-Pattern: Vague Approval**

```
❌ Wrong:

"This will cost money. Proceed? (Yes/No)"

Why it fails:
✗ No cost details (how much?)
✗ No context (what will happen?)
✗ No alternatives (what if user says no?)
```

**Handling User Responses:**

```
User says YES:
→ Proceed with workflow
→ Track approval in logs
→ Continue to next step

User says NO:
→ Offer alternatives:
  1. Use fewer models (reduce cost)
  2. Use only free embedded Claude
  3. Skip this step entirely
  4. Cancel workflow
→ Ask user to choose alternative
→ Proceed based on choice

User says CANCEL:
→ Gracefully exit workflow
→ Save partial results (if any)
→ Log cancellation reason
→ Clean up temporary files
→ Notify user: "Workflow cancelled. Partial results saved to..."
```
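
The Yes/No/Cancel branching above can be expressed as a small dispatcher. A minimal sketch: `handleApproval` and the returned action strings are illustrative names, not part of any real API.

```typescript
// Hedged sketch: routing a user's approval response to a next action.
type Approval = "YES" | "NO" | "CANCEL";

function handleApproval(response: Approval): string {
  switch (response) {
    case "YES":
      return "proceed"; // track approval in logs, continue workflow
    case "NO":
      return "offer-alternatives"; // fewer models, free-only, skip, or cancel
    case "CANCEL":
      return "graceful-exit"; // save partial results, clean up temp files
  }
}
```

Because `Approval` is a closed union, TypeScript checks the switch is exhaustive: adding a fourth response forces a code change here.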

**Approval Bypasses (Advanced):**

For automated workflows, allow approval bypass:

```
Automated Workflow Mode:

If workflow is triggered by CI/CD or scheduled task:
→ Skip user approval gates
→ Use predefined defaults (e.g., max cost $0.10)
→ Log decisions for audit trail
→ Email report to stakeholders after completion

Example:
if (isAutomatedMode) {
  if (estimatedCost <= maxAutomatedCost) {
    log("Auto-approved: $0.008 <= $0.10 threshold");
    proceed();
  } else {
    log("Auto-rejected: estimated cost exceeds $0.10 threshold");
    notifyStakeholders("Cost exceeds automated threshold");
    abort();
  }
}
```

---

### Pattern 2: Iteration Loop Patterns

**Max Iteration Limits:**

Always set a **max iteration limit** to prevent infinite loops:

```
Typical Iteration Limits:

Automated quality loops: 10 iterations
- Designer validation → Developer fixes → Repeat
- Test failures → Developer fixes → Repeat

User feedback loops: 5 rounds
- User reports issues → Developer fixes → User validates → Repeat

Code review loops: 3 rounds
- Reviewer finds issues → Developer fixes → Re-review → Repeat

Multi-model consensus: 1 iteration (no loop)
- Parallel review → Consolidate → Present
```

**Exit Criteria:**

Define clear **exit criteria** for each loop type:

```
Loop Type: Design Validation

Exit Criteria (checked after each iteration):
1. Designer assessment = PASS → Exit loop (success)
2. Iteration count >= 10 → Exit loop (max iterations)
3. User manually approves → Exit loop (user override)
4. No changes made by developer → Exit loop (stuck, escalate)

Example:
for (let i = 1; i <= 10; i++) {
  const review = await designer.validate();

  if (review.assessment === "PASS") {
    log("Design validation passed on iteration " + i);
    break; // Success exit
  }

  if (i === 10) {
    log("Max iterations reached. Escalating to user validation.");
    break; // Max iterations exit
  }

  await developer.fix(review.issues);
}
```

**Progress Tracking:**

Show clear progress to user during iterations:

```
Iteration Loop Progress:

Iteration 1/10: Designer found 5 issues → Developer fixing...
Iteration 2/10: Designer found 3 issues → Developer fixing...
Iteration 3/10: Designer found 1 issue → Developer fixing...
Iteration 4/10: Designer assessment: PASS ✓

Loop completed in 4 iterations.
```

**Iteration History Documentation:**

Track what happened in each iteration:

```
Iteration History (ai-docs/iteration-history.md):

## Iteration 1
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Button color doesn't match design (#3B82F6 vs #2563EB)
- Spacing between elements too tight (8px vs 16px)
- Font size incorrect (14px vs 16px)
Developer Actions:
- Updated button color to #2563EB
- Increased spacing to 16px
- Changed font size to 16px

## Iteration 2
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Border radius too large (8px vs 4px)
Developer Actions:
- Reduced border radius to 4px

## Iteration 3
Designer Assessment: PASS ✓
Issues Found: None
Result: Design validation complete
```

---

### Pattern 3: Issue Severity Classification

**Severity Levels:**

Use 4-level severity classification:

```
CRITICAL - Must fix immediately
- Blocks core functionality
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss risk
- System crashes
- Build failures

Action: STOP workflow, fix immediately, re-validate

HIGH - Should fix soon
- Major bugs (incorrect behavior)
- Performance issues (>3s page load, memory leaks)
- Accessibility violations (keyboard navigation broken)
- User experience blockers

Action: Fix in current iteration, proceed after fix

MEDIUM - Should fix
- Minor bugs (edge cases, visual glitches)
- Code quality issues (duplication, complexity)
- Non-blocking performance issues
- Incomplete error handling

Action: Fix if time permits, or schedule for next iteration

LOW - Nice to have
- Code style inconsistencies
- Minor refactoring opportunities
- Documentation improvements
- Polish and optimization

Action: Log for future improvement, proceed without fixing
```

**Severity-Based Prioritization:**

```
Issue List (sorted by severity):

CRITICAL Issues (must fix all before proceeding):
1. SQL injection in user search endpoint
2. Missing authentication check on admin routes
3. Password stored in plaintext

HIGH Issues (fix before code review):
4. Memory leak in WebSocket connection
5. Missing error handling in payment flow
6. Accessibility: keyboard navigation broken

MEDIUM Issues (fix if time permits):
7. Code duplication in auth controllers
8. Inconsistent error messages
9. Missing JSDoc comments

LOW Issues (defer to future):
10. Variable naming inconsistency
11. Redundant type annotations
12. CSS could use more specificity

Action Plan:
- Fix CRITICAL (1-3) immediately → Re-run tests
- Fix HIGH (4-6) before code review
- Log MEDIUM (7-9) for next iteration
- Ignore LOW (10-12) for now
```
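
The action plan above amounts to sorting issues by severity rank and splitting them into "fix now" and "defer" buckets. A minimal sketch; the `Issue` shape and `prioritize` name are assumptions for illustration.

```typescript
// Hedged sketch: sort issues by severity, then split into fix-now vs defer.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
const RANK: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

interface Issue { title: string; severity: Severity; }

function prioritize(issues: Issue[]): { fixNow: Issue[]; defer: Issue[] } {
  // Copy before sorting so the caller's list is not mutated.
  const sorted = [...issues].sort((a, b) => RANK[a.severity] - RANK[b.severity]);
  return {
    fixNow: sorted.filter((i) => i.severity === "CRITICAL" || i.severity === "HIGH"),
    defer: sorted.filter((i) => i.severity === "MEDIUM" || i.severity === "LOW"),
  };
}
```

`fixNow` corresponds to "fix immediately / before code review"; `defer` to "log for next iteration / ignore for now".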

**Severity Escalation:**

Issues can escalate in severity based on context:

```
Context-Based Escalation:

Issue: "Missing error handling in payment flow"
Base Severity: MEDIUM (code quality issue)

Context 1: Development environment
→ Severity: MEDIUM (not user-facing yet)

Context 2: Production environment
→ Severity: HIGH (affects real users, money involved)

Context 3: Production + recent payment failures
→ Severity: CRITICAL (actively causing issues)

Rule: Escalate severity when:
- Issue affects production users
- Issue involves money/security/data
- Issue is currently causing failures
```
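
The escalation rule can be expressed as a small function. A hedged sketch: the context flags and `escalate` name are assumptions chosen to mirror the three contexts above.

```typescript
// Hedged sketch: context-based severity escalation.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Context {
  production: boolean;              // affects real users?
  involvesMoneyOrSecurity: boolean; // money / security / data at stake?
  activeFailures: boolean;          // currently causing failures?
}

function escalate(base: Severity, ctx: Context): Severity {
  if (ctx.production && ctx.activeFailures) return "CRITICAL"; // actively causing issues
  if (ctx.production && ctx.involvesMoneyOrSecurity) {
    return base === "CRITICAL" ? "CRITICAL" : "HIGH";          // never downgrade
  }
  return base; // development environment: keep base severity
}
```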

---

### Pattern 4: Multi-Reviewer Consensus

**Consensus Levels:**

When multiple reviewers evaluate the same code/design:

```
UNANIMOUS (100% agreement):
- ALL reviewers flagged this issue
- VERY HIGH confidence
- Highest priority (likely a real problem)

Example:
3/3 reviewers: "SQL injection in search endpoint"
→ UNANIMOUS consensus
→ CRITICAL priority (all agree it's critical)

STRONG CONSENSUS (67-99% agreement):
- MOST reviewers flagged this issue
- HIGH confidence
- High priority (probably a real problem)

Example:
2/3 reviewers: "Missing input validation"
→ STRONG consensus (67%)
→ HIGH priority

MAJORITY (50-66% agreement):
- HALF or more flagged this issue
- MEDIUM confidence
- Medium priority (worth investigating)

Example:
2/4 reviewers: "Code duplication in controllers"
→ MAJORITY consensus (50%)
→ MEDIUM priority

DIVERGENT (< 50% agreement):
- Only 1-2 reviewers flagged this issue
- LOW confidence
- Low priority (may be model-specific or false positive)

Example:
1/3 reviewers: "Variable naming could be better"
→ DIVERGENT (33%)
→ LOW priority
```

**Consensus-Based Prioritization:**

```
Prioritized Issue List (by consensus + severity):

1. [UNANIMOUS - CRITICAL] SQL injection in search
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

2. [UNANIMOUS - HIGH] Missing input validation
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

3. [STRONG - HIGH] Memory leak in WebSocket
   MOST reviewers agree: Claude, Grok (2/3)

4. [MAJORITY - MEDIUM] Code duplication
   HALF+ reviewers agree: Claude, Gemini (2/3)

5. [DIVERGENT - LOW] Variable naming
   SINGLE reviewer: Claude only (1/3)

Action:
- Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH)
- Fix issue 3 before review (strong consensus)
- Consider issue 4 (majority, but medium severity)
- Ignore issue 5 (divergent, likely false positive)
```
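
Mapping a flagged-by count onto the consensus bands is a one-liner worth pinning down, since the band edges matter. A minimal sketch (`classifyConsensus` is a hypothetical name; percentages are rounded so 2/3 lands on 67%, matching the examples above):

```typescript
// Hedged sketch: classify reviewer agreement into the consensus bands above.
function classifyConsensus(flagged: number, total: number): string {
  const pct = Math.round((flagged / total) * 100); // 2/3 → 67, 1/3 → 33
  if (pct === 100) return "UNANIMOUS";
  if (pct >= 67) return "STRONG";
  if (pct >= 50) return "MAJORITY";
  return "DIVERGENT";
}
```

Rounding is a deliberate choice here: without it, 2/3 evaluates to 66.67% and would fall into the MAJORITY band instead of STRONG.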

---

### Pattern 5: Feedback Loop Implementation

**User Feedback Loop:**

```
Workflow: User Validation with Feedback

Step 1: Initial Implementation
Developer implements feature
Designer/Tester validates
Present to user for manual validation

Step 2: User Validation Gate (MANDATORY)
Present to user:
"Implementation complete. Please manually verify:
- Open app at http://localhost:3000
- Test feature: [specific instructions]
- Compare to design reference

Does it meet expectations? (Yes/No)"

Step 3a: User says YES
→ ✅ Feature approved
→ Generate final report
→ Mark workflow complete

Step 3b: User says NO
→ Collect specific feedback

Step 4: Collect Specific Feedback
Ask user: "Please describe the issues you found:"

User response:
"1. Button color is wrong (should be blue, not green)
2. Spacing is too tight between elements
3. Font size is too small"

Step 5: Extract Structured Feedback
Parse user feedback into structured issues:

Issue 1:
Component: Button
Problem: Color incorrect
Expected: Blue (#2563EB)
Actual: Green (#10B981)
Severity: MEDIUM

Issue 2:
Component: Container
Problem: Spacing too tight
Expected: 16px
Actual: 8px
Severity: MEDIUM

Issue 3:
Component: Text
Problem: Font size too small
Expected: 16px
Actual: 14px
Severity: LOW

Step 6: Launch Fixing Agent
Task: ui-developer
Prompt: "Fix user-reported issues:

1. Button color: Change from #10B981 to #2563EB
2. Container spacing: Increase from 8px to 16px
3. Text font size: Increase from 14px to 16px

User feedback: [user's exact words]"

Step 7: Re-validate
After fixes:
- Re-run designer validation
- Loop back to Step 2 (user validation)

Step 8: Max Feedback Rounds
Limit: 5 feedback rounds (prevent infinite loop)

If round > 5:
Escalate to human review
"Unable to meet user expectations after 5 rounds.
Manual intervention required."
```
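
The round-limited loop in Steps 2-8 can be sketched as a small driver. A hedged sketch: `validate` and `fix` are stand-ins for the user-validation gate and the fixing agent above, and `feedbackLoop` is a hypothetical name.

```typescript
// Hedged sketch of the bounded user-feedback loop (max 5 rounds).
type Validate = () => Promise<string[]>;         // returns user-reported issues; [] when approved
type Fix = (issues: string[]) => Promise<void>;  // launch fixing agent on the reported issues

async function feedbackLoop(validate: Validate, fix: Fix, maxRounds = 5): Promise<boolean> {
  for (let round = 1; round <= maxRounds; round++) {
    const issues = await validate();   // Step 2: user validation gate
    if (issues.length === 0) return true; // ✅ approved, workflow complete
    await fix(issues);                 // Steps 4-7: fix, then loop back to validation
  }
  return false; // Step 8: escalate — manual intervention required
}
```

Returning `false` rather than throwing keeps the escalation decision with the orchestrator, which can then present the "manual intervention required" message.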

**Feedback Round Tracking:**

```
Feedback Round History:

Round 1:
User Issues: Button color, spacing, font size
Fixes Applied: Updated all 3 issues
Result: Re-validate

Round 2:
User Issues: Border radius too large
Fixes Applied: Reduced border radius
Result: Re-validate

Round 3:
User Issues: None
Result: ✅ APPROVED

Total Rounds: 3/5
```

---

### Pattern 6: Test-Driven Development Loop

**When to Use:**

Use TDD loop **after implementing code, before code review**:

```
Workflow Phases:

Phase 1: Architecture Planning
Phase 2: Implementation
Phase 2.5: Test-Driven Development Loop ← THIS PATTERN
Phase 3: Code Review
Phase 4: User Acceptance
```

**The TDD Loop Pattern:**

```
Step 1: Write Tests First
Task: test-architect
Prompt: "Write comprehensive tests for authentication feature.
Requirements: [link to requirements]
Implementation: [link to code]"
Output: tests/auth.test.ts

Step 2: Run Tests
Bash: bun test tests/auth.test.ts
Capture output and exit code

Step 3: Check Test Results
If all tests pass:
→ ✅ TDD loop complete
→ Proceed to code review (Phase 3)

If tests fail:
→ Analyze failure (continue to Step 4)

Step 4: Analyze Test Failure
Task: test-architect
Prompt: "Analyze test failure output:

[test failure logs]

Determine root cause:
- TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation)
- IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior)

Provide detailed analysis."

test-architect returns:
verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE
analysis: Detailed explanation
recommendation: Specific fix needed

Step 5a: If TEST_ISSUE (test is wrong)
Task: test-architect
Prompt: "Fix test based on analysis:
[analysis from Step 4]"

After fix:
→ Re-run tests (back to Step 2)
→ Loop continues

Step 5b: If IMPLEMENTATION_ISSUE (code is wrong)
Provide structured feedback to developer:

Task: backend-developer
Prompt: "Fix implementation based on test failure:

Test Failure:
[failure output]

Root Cause:
[analysis from test-architect]

Recommended Fix:
[specific fix needed]"

After fix:
→ Re-run tests (back to Step 2)
→ Loop continues

Step 6: Max Iteration Limit
Limit: 10 iterations

Iteration tracking:
Iteration 1/10: 5 tests failed → Fix implementation
Iteration 2/10: 2 tests failed → Fix test (bad mock)
Iteration 3/10: All tests pass ✅

If iteration > 10:
Escalate to human review
"Unable to pass all tests after 10 iterations.
Manual debugging required."
```
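
Steps 2-6 can be sketched as a driver that runs tests, branches on the test-architect's verdict, and stops at the iteration cap. A hedged sketch: the `Hooks` interface and `tddLoop` name are assumptions standing in for the agent Task calls above.

```typescript
// Hedged sketch of the TDD loop: run tests, branch on verdict, bounded at 10 iterations.
type Verdict = "TEST_ISSUE" | "IMPLEMENTATION_ISSUE";

interface Hooks {
  runTests: () => Promise<boolean>;       // Step 2: true = all tests pass
  analyzeFailure: () => Promise<Verdict>; // Step 4: test-architect's root-cause verdict
  fixTest: () => Promise<void>;           // Step 5a: test-architect fixes the test
  fixImplementation: () => Promise<void>; // Step 5b: developer fixes the code
}

async function tddLoop(h: Hooks, maxIterations = 10): Promise<boolean> {
  for (let i = 1; i <= maxIterations; i++) {
    if (await h.runTests()) return true;  // ✅ Step 3: proceed to code review
    const verdict = await h.analyzeFailure();
    if (verdict === "TEST_ISSUE") await h.fixTest();
    else await h.fixImplementation();
  }
  return false; // Step 6: escalate — manual debugging required
}
```

The verdict branch is what distinguishes this loop from a plain retry: a failing test is not automatically blamed on the implementation.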

**Example TDD Loop:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
Tests Run: 20 tests
Results: 5 failed, 15 passed
Failure: "JWT token validation fails with expired token"
Analysis: IMPLEMENTATION_ISSUE - Missing expiration check
Fix: Added expiration validation in TokenService
Re-run: Continue to Iteration 2

Iteration 2:
Tests Run: 20 tests
Results: 2 failed, 18 passed
Failure: "Mock database not reset between tests"
Analysis: TEST_ISSUE - Missing beforeEach cleanup
Fix: Added database reset in test setup
Re-run: Continue to Iteration 3

Iteration 3:
Tests Run: 20 tests
Results: All passed ✅
Result: TDD loop complete, proceed to code review

Total Iterations: 3/10
Duration: ~5 minutes
Benefits:
- Caught 2 bugs before code review
- Fixed 1 test quality issue
- All tests passing gives confidence in implementation
```

**Benefits of TDD Loop:**

```
Benefits:

1. Catch bugs early (before code review, not after)
2. Ensure test quality (test-architect fixes bad tests)
3. Automated quality assurance (no manual testing needed)
4. Fast feedback loop (seconds to run tests, not minutes)
5. Confidence in implementation (all tests passing)

Performance:
Traditional: Implement → Review → Find bugs → Fix → Re-review
Time: 30+ minutes, multiple review rounds

TDD Loop: Implement → Test → Fix → Test → Review (with confidence)
Time: 15 minutes, single review round (fewer issues)
```

---

## Integration with Other Skills

**quality-gates + multi-model-validation:**

```
Use Case: Cost approval before multi-model review

Step 1: Estimate costs (multi-model-validation)
Step 2: User approval gate (quality-gates)
  If approved: Proceed with parallel execution
  If rejected: Offer alternatives
Step 3: Execute review (multi-model-validation)
```

**quality-gates + multi-agent-coordination:**

```
Use Case: Iteration loop with designer validation

Step 1: Agent selection (multi-agent-coordination)
Select designer + ui-developer

Step 2: Iteration loop (quality-gates)
For i = 1 to 10:
- Run designer validation
- If PASS: Exit loop
- Else: Delegate to ui-developer for fixes

Step 3: User validation gate (quality-gates)
Mandatory manual approval
```

**quality-gates + error-recovery:**

```
Use Case: Test-driven loop with error recovery

Step 1: Run tests (quality-gates TDD pattern)
Step 2: If test execution fails (error-recovery)
- Syntax error → Fix and retry
- Framework crash → Notify user, skip TDD
Step 3: If tests pass (quality-gates)
- Proceed to code review
```

---

## Best Practices

**Do:**
- ✅ Set max iteration limits (prevent infinite loops)
- ✅ Define clear exit criteria (PASS, max iterations, user override)
- ✅ Track iteration history (document what happened)
- ✅ Show progress to user ("Iteration 3/10 complete")
- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW)
- ✅ Prioritize by consensus + severity
- ✅ Ask user approval for expensive operations
- ✅ Collect specific feedback (not vague complaints)
- ✅ Use TDD loop to catch bugs early

**Don't:**
- ❌ Create infinite loops (no exit criteria)
- ❌ Skip user validation gates (mandatory for UX)
- ❌ Ignore consensus (unanimous issues are real)
- ❌ Batch all severities together (prioritize CRITICAL)
- ❌ Proceed without approval for >$0.01 operations
- ❌ Collect vague feedback ("it's wrong" → what specifically?)
- ❌ Skip TDD loop (catches bugs before expensive review)

**Performance:**
- Iteration loops: 5-10 iterations typical, max 10-15 min
- TDD loop: 3-5 iterations typical, max 5-10 min
- User feedback: 1-3 rounds typical, max 5 rounds

---

## Examples

### Example 1: User Approval Gate for Multi-Model Review

**Scenario:** User requests multi-model review, costs $0.008

**Execution:**

```
Step 1: Estimate Costs
Input: 450 lines × 1.5 = 675 tokens per model
Output: 2000-4000 tokens per model
Total: 3 models × 3000 avg = 9000 output tokens
Cost: ~$0.008 ($0.005 - $0.010)

Step 2: Present Approval Gate
"Multi-model review will analyze 450 lines with 3 AI models:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)

Estimated cost: $0.008 ($0.005 - $0.010)
Duration: ~5 minutes

Proceed? (Yes/No/Cancel)"

Step 3a: User says YES
→ Proceed with parallel execution
→ Track approval: log("User approved $0.008 cost")

Step 3b: User says NO
→ Offer alternatives:
  1. Use only free Claude (no external models)
  2. Use only 1 external model (reduce cost to $0.002)
  3. Skip review entirely
→ Ask user to choose

Step 3c: User says CANCEL
→ Exit gracefully
→ Log: "User cancelled multi-model review"
→ Clean up temporary files
```

---

### Example 2: Designer Validation Iteration Loop

**Scenario:** UI implementation with automated iteration until PASS

**Execution:**

```
|
||||||
|
Iteration 1:
  Task: designer
  Prompt: "Validate navbar against Figma design"
  Output: ai-docs/design-review-1.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Button color: #3B82F6 (expected #2563EB)
  - Spacing: 8px (expected 16px)

  Task: ui-developer
  Prompt: "Fix issues from ai-docs/design-review-1.md"
  Changes: Updated button color, increased spacing

  Result: Continue to Iteration 2

Iteration 2:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-2.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Border radius: 8px (expected 4px)

  Task: ui-developer
  Prompt: "Fix border radius issue"
  Changes: Reduced border radius to 4px

  Result: Continue to Iteration 3

Iteration 3:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-3.md
  Assessment: PASS ✓
  Issues: None

  Result: Exit loop (success)

Summary:
  Total Iterations: 3/10
  Duration: ~8 minutes
  Automated Fixes: 3 issues resolved
  Result: PASS, proceed to user validation
```
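The validate-fix loop above can be sketched as follows. `runDesigner` and `runDeveloper` are hypothetical stand-ins for the designer and ui-developer Task delegations, injected as callbacks so the loop logic stays self-contained.

```typescript
interface Review {
  assessment: "PASS" | "NEEDS IMPROVEMENT";
  issues: string[];
}

// Runs designer validation, applies fixes, and repeats until PASS or the
// iteration cap is hit. Returns the iteration count on success, or -1 if the
// cap was reached without a PASS (caller should escalate to the user).
function validationLoop(
  runDesigner: () => Review,
  runDeveloper: (issues: string[]) => void,
  maxIterations = 10,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    const review = runDesigner();
    if (review.assessment === "PASS") return i; // exit loop early on success
    runDeveloper(review.issues); // fix the specific issues found this round
  }
  return -1; // automated refinement did not converge: escalate
}
```

With the reviews from Example 2 (two NEEDS IMPROVEMENT rounds, then PASS), the loop exits on iteration 3 of 10.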

---

### Example 3: Test-Driven Development Loop

**Scenario:** Authentication implementation with TDD

**Execution:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Task: test-architect
  Prompt: "Write tests for authentication feature"
  Output: tests/auth.test.ts (20 tests)

  Bash: bun test tests/auth.test.ts
  Result: 5 failed, 15 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: IMPLEMENTATION_ISSUE
  Analysis: "Missing JWT expiration validation"

  Task: backend-developer
  Prompt: "Add JWT expiration validation"
  Changes: Updated TokenService.verify()

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 2

Iteration 2:
  Bash: bun test tests/auth.test.ts
  Result: 2 failed, 18 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: TEST_ISSUE
  Analysis: "Mock database not reset between tests"

  Task: test-architect
  Prompt: "Fix test setup"
  Changes: Added beforeEach cleanup

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 3

Iteration 3:
  Bash: bun test tests/auth.test.ts
  Result: All 20 passed ✅

  Result: TDD loop complete, proceed to code review

Summary:
  Total Iterations: 3/10
  Duration: ~5 minutes
  Bugs Caught: 1 implementation bug, 1 test bug
  Result: All tests passing, high confidence in code
```
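A minimal driver for this loop might route each failure analysis to the right fixer. The `runTests` and `analyze` callbacks are hypothetical stand-ins for the Bash and test-architect Task steps; the routing of verdicts is the part the sketch demonstrates.

```typescript
type Verdict = "IMPLEMENTATION_ISSUE" | "TEST_ISSUE";

interface TestRun {
  failed: number;
  passed: number;
}

// Drives the TDD loop: run tests, classify failures, dispatch the fix to
// either the implementation or the test suite, repeat until green or capped.
function tddLoop(
  runTests: () => TestRun,
  analyze: () => Verdict,
  fixImplementation: () => void,
  fixTests: () => void,
  maxIterations = 10,
): boolean {
  for (let i = 1; i <= maxIterations; i++) {
    const run = runTests();
    if (run.failed === 0) return true; // all green: proceed to code review
    if (analyze() === "IMPLEMENTATION_ISSUE") {
      fixImplementation(); // e.g. add the missing JWT expiration validation
    } else {
      fixTests(); // e.g. add beforeEach cleanup for the mock database
    }
  }
  return false; // cap reached without green tests: escalate
}
```

The verdict split matters: blindly "fixing the code" when the test itself is broken (Iteration 2 above) would churn the implementation without ever turning the suite green.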

---

## Troubleshooting

**Problem: Infinite iteration loop**

Cause: No exit criteria or max iteration limit

Solution: Always set max iterations (10 for automated, 5 for user feedback)

```
❌ Wrong:
while (true) {
  if (review.assessment === "PASS") break;
  fix();
}

✅ Correct:
for (let i = 1; i <= 10; i++) {
  if (review.assessment === "PASS") break;
  if (i === 10) escalateToUser(); // hard cap: hand off instead of looping forever
  fix();
}
```

---

**Problem: User approval skipped for expensive operation**

Cause: Missing approval gate

Solution: Always ask approval for costs >$0.01

```
❌ Wrong:
if (userRequestedMultiModel) {
  executeReview();
}

✅ Correct:
if (userRequestedMultiModel) {
  const cost = estimateCost();
  if (cost > 0.01) {
    const approved = await askUserApproval(cost);
    if (!approved) return offerAlternatives();
  }
  executeReview();
}
```

---

**Problem: All issues treated equally**

Cause: No severity classification

Solution: Classify by severity, prioritize CRITICAL

```
❌ Wrong:
issues.forEach(issue => fix(issue));

✅ Correct:
const critical = issues.filter(i => i.severity === "CRITICAL");
const high = issues.filter(i => i.severity === "HIGH");

critical.forEach(issue => fix(issue)); // Fix critical first
high.forEach(issue => fix(issue));     // Then high
// MEDIUM and LOW deferred or skipped
```

---

## Summary

Quality gates ensure high-quality results through:

- **User approval gates** (cost, quality, final validation)
- **Iteration loops** (automated refinement, max 10 iterations)
- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW)
- **Consensus prioritization** (unanimous → strong → majority → divergent)
- **Feedback loops** (collect specific issues, fix, re-validate)
- **Test-driven development** (write tests, run, fix, repeat until pass)

Master these patterns and your workflows will consistently produce high-quality, validated results.

---

**Extracted From:**
- `/review` command (user approval for costs, consensus analysis)
- `/validate-ui` command (iteration loops, user validation gates, feedback collection)
- `/implement` command (PHASE 2.5 test-driven development loop)
- Multi-model review patterns (consensus-based prioritization)
983 skills/todowrite-orchestration/SKILL.md Normal file
@@ -0,0 +1,983 @@
---
name: todowrite-orchestration
description: Track progress in multi-phase workflows with TodoWrite. Use when orchestrating 5+ phase commands, managing iteration loops, tracking parallel tasks, or providing real-time progress visibility. Trigger keywords - "phase tracking", "progress", "workflow", "multi-step", "multi-phase", "todo", "tracking", "status".
version: 0.1.0
tags: [orchestration, todowrite, progress, tracking, workflow, multi-phase]
keywords: [phase-tracking, progress, workflow, multi-step, multi-phase, todo, tracking, status, visibility]
---

# TodoWrite Orchestration

**Version:** 1.0.0
**Purpose:** Patterns for using TodoWrite in complex multi-phase workflows
**Status:** Production Ready

## Overview

TodoWrite orchestration is the practice of using the TodoWrite tool to provide **real-time progress visibility** in complex multi-phase workflows. It transforms opaque "black box" workflows into transparent, trackable processes where users can see:

- What phase is currently executing
- How many phases remain
- Which tasks are pending, in-progress, or completed
- Overall progress percentage
- Iteration counts in loops

This skill provides battle-tested patterns for:

- **Phase initialization** (create complete task list before starting)
- **Task granularity** (how to break phases into trackable tasks)
- **Status transitions** (pending → in_progress → completed)
- **Real-time updates** (mark complete immediately, not batched)
- **Iteration tracking** (progress through loops)
- **Parallel task tracking** (multiple agents executing simultaneously)

TodoWrite orchestration is especially valuable for workflows with >5 phases or >10 minutes duration, where users need progress feedback.

## Core Patterns

### Pattern 1: Phase Initialization

**Create TodoWrite List BEFORE Starting:**

Initialize TodoWrite as **step 0** of your workflow, before any actual work begins:

```
✅ CORRECT - Initialize First:

Step 0: Initialize TodoWrite
  TodoWrite: Create task list
  - PHASE 1: Gather user inputs
  - PHASE 1: Validate inputs
  - PHASE 2: Select AI models
  - PHASE 2: Estimate costs
  - PHASE 2: Get user approval
  - PHASE 3: Launch parallel reviews
  - PHASE 3: Wait for all reviews
  - PHASE 4: Consolidate reviews
  - PHASE 5: Present results

Step 1: Start actual work (PHASE 1)
  Mark "PHASE 1: Gather user inputs" as in_progress
  ... do work ...
  Mark "PHASE 1: Gather user inputs" as completed
  Mark "PHASE 1: Validate inputs" as in_progress
  ... do work ...

❌ WRONG - Create During Workflow:

Step 1: Do some work
  ... work happens ...
  TodoWrite: Create task "Did some work" (completed)

Step 2: Do more work
  ... work happens ...
  TodoWrite: Create task "Did more work" (completed)

Problem: User has no visibility into upcoming phases
```

**List All Phases Upfront:**

When initializing, include **all phases** in the task list, not just the current phase:

```
✅ CORRECT - Complete Visibility:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs
[ ] PHASE 2: Architecture planning
[ ] PHASE 3: Implementation
[ ] PHASE 3: Run quality checks
[ ] PHASE 4: Code review
[ ] PHASE 5: User acceptance
[ ] PHASE 6: Generate report

User sees: "8 tasks total, 0 complete, Phase 1 starting"

❌ WRONG - Incremental Discovery:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs

(User thinks workflow is 2 tasks, then surprised by 6 more phases)
```
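Setting TodoWrite's actual schema aside, the initialize-first pattern can be sketched with a plain task list; the `Task` shape and the progress string are assumptions chosen to mirror the examples in this skill.

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Task {
  label: string;
  status: Status;
}

// Step 0 of the workflow: create the complete task list up front, all pending,
// so the user sees the full scope before any work starts.
function initTodoList(labels: string[]): Task[] {
  return labels.map(label => ({ label, status: "pending" }));
}

// Progress summary the user sees at any point, e.g. "3/8 tasks complete (38%)".
function progress(tasks: Task[]): string {
  const done = tasks.filter(t => t.status === "completed").length;
  const pct = Math.round((done / tasks.length) * 100);
  return `${done}/${tasks.length} tasks complete (${pct}%)`;
}

const tasks = initTodoList([
  "PHASE 1: Gather user inputs",
  "PHASE 1: Validate inputs",
  "PHASE 2: Architecture planning",
  "PHASE 3: Implementation",
]);
```

Because every phase exists from step 0, the denominator of the progress string never changes mid-workflow unless new work is explicitly discovered and added.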

**Why Initialize First:**

1. **User expectation setting:** User knows workflow scope (8 phases, ~20 minutes)
2. **Progress visibility:** User can see % complete (3/8 = 37.5%)
3. **Time estimation:** User can estimate remaining time based on progress
4. **Transparency:** No hidden phases or surprises

---

### Pattern 2: Task Granularity Guidelines

**One Task Per Significant Operation:**

Each task should represent a **significant operation** (1-5 minutes of work):

```
✅ CORRECT - Significant Operations:

Tasks:
- PHASE 1: Ask user for inputs (30s)
- PHASE 2: Generate architecture plan (2 min)
- PHASE 3: Implement feature (5 min)
- PHASE 4: Run tests (1 min)
- PHASE 5: Code review (3 min)

Each task = meaningful unit of work

❌ WRONG - Too Granular:

Tasks:
- PHASE 1: Ask user question 1
- PHASE 1: Ask user question 2
- PHASE 1: Ask user question 3
- PHASE 2: Read file A
- PHASE 2: Read file B
- PHASE 2: Write file C
- ... (50 micro-tasks)

Problem: Too many updates, clutters user interface
```

**Multi-Step Phases: Break Into 2-3 Sub-Tasks:**

For complex phases (>5 minutes), break into 2-3 sub-tasks:

```
✅ CORRECT - Sub-Task Breakdown:

PHASE 3: Implementation (15 min total)
→ Sub-tasks:
  - PHASE 3: Implement core logic (5 min)
  - PHASE 3: Add error handling (3 min)
  - PHASE 3: Write tests (7 min)

User sees progress within phase: "PHASE 3: 2/3 complete"

❌ WRONG - Single Monolithic Task:

PHASE 3: Implementation (15 min)
→ No sub-tasks

Problem: User sees "in_progress" for 15 min with no updates
```

**Avoid Too Many Tasks:**

Limit to **max 15-20 tasks** for readability:

```
✅ CORRECT - 12 Tasks (readable):

8-phase workflow:
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Fix issues
- PHASE 7: Re-review
- PHASE 8: Accept

Total: 12 tasks (clean, trackable)

❌ WRONG - 50 Tasks (overwhelming):

Every single action as separate task:
- Read file 1
- Read file 2
- Write file 3
- Run command 1
- ... (50 tasks)

Problem: User overwhelmed, can't see forest for trees
```

**Guideline by Workflow Duration:**

```
Workflow Duration → Task Count:

< 5 minutes:   3-5 tasks
5-15 minutes:  8-12 tasks
15-30 minutes: 12-18 tasks
> 30 minutes:  15-20 tasks (if more, group into phases)

Example:
5-minute workflow (3 phases):
- PHASE 1: Prepare
- PHASE 2: Execute
- PHASE 3: Present
Total: 3 tasks ✓

20-minute workflow (6 phases):
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Accept
Total: 10 tasks ✓
```

---

### Pattern 3: Status Transitions

**Exactly ONE Task In Progress at a Time:**

Maintain the invariant: **exactly one task in_progress** at any moment:

```
✅ CORRECT - One In-Progress:

State at time T1:
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[→] PHASE 3: Implement (in_progress) ← Only one
[ ] PHASE 4: Test (pending)
[ ] PHASE 5: Review (pending)

State at time T2 (after PHASE 3 completes):
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[✓] PHASE 3: Implement (completed)
[→] PHASE 4: Test (in_progress) ← Only one
[ ] PHASE 5: Review (pending)

❌ WRONG - Multiple In-Progress:

State:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Two in-progress?
[→] PHASE 3: Implement (in_progress) ← Confusing!
[ ] PHASE 4: Test (pending)

Problem: User confused about current phase
```

**Status Transition Sequence:**

```
Lifecycle of a Task:

1. Created: pending
   (Task exists, not started yet)

2. Started: pending → in_progress
   (Mark as in_progress when starting work)

3. Completed: in_progress → completed
   (Mark as completed immediately after finishing)

4. Next task: Mark next task as in_progress
   (Continue to next task)

Example Timeline:

T=0s:  [→] Task 1 (in_progress), [ ] Task 2 (pending)
T=30s: [✓] Task 1 (completed),   [→] Task 2 (in_progress)
T=60s: [✓] Task 1 (completed),   [✓] Task 2 (completed)
```
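The transition sequence above can be enforced with one small helper that completes the current task and starts the next in a single step; the task shape is a sketch-level assumption, as elsewhere in this skill.

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Task {
  label: string;
  status: Status;
}

// Completes the current in_progress task (if any) and starts the next pending
// one, preserving the invariant that at most one task is in_progress.
function advance(tasks: Task[]): void {
  const current = tasks.find(t => t.status === "in_progress");
  if (current) current.status = "completed"; // mark done immediately, not batched
  const next = tasks.find(t => t.status === "pending");
  if (next) next.status = "in_progress"; // exactly one task running at a time
}
```

Funneling every status change through one transition function is what makes the "exactly one in_progress" invariant hold by construction rather than by discipline.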

**NEVER Batch Completions:**

Mark tasks completed **immediately** after finishing, not at end of phase:

```
✅ CORRECT - Immediate Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...
Mark "PHASE 1: Ask user" as completed ← Immediate

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...
Mark "PHASE 1: Validate inputs" as completed ← Immediate

User sees real-time progress

❌ WRONG - Batched Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...

(At end of PHASE 1, batch update both to completed)

Problem: User doesn't see progress for 50s, thinks workflow is stuck
```

---

### Pattern 4: Real-Time Progress Tracking

**Update TodoWrite As Work Progresses:**

TodoWrite should reflect **current state**, not past state:

```
✅ CORRECT - Real-Time Updates:

T=0s:  Initialize TodoWrite (8 tasks, all pending)
T=5s:  Mark "PHASE 1" as in_progress
T=35s: Mark "PHASE 1" as completed, "PHASE 2" as in_progress
T=90s: Mark "PHASE 2" as completed, "PHASE 3" as in_progress
...

User always sees accurate current state

❌ WRONG - Delayed Updates:

T=0s:   Initialize TodoWrite
T=300s: Workflow completes
T=301s: Update all tasks to completed

Problem: No progress visibility for 5 minutes
```

**Add New Tasks If Discovered During Execution:**

If you discover additional work during execution, add new tasks:

```
Scenario: During implementation, realize refactoring needed

Initial TodoWrite:
[✓] PHASE 1: Plan
[→] PHASE 2: Implement
[ ] PHASE 3: Test
[ ] PHASE 4: Review

During PHASE 2, discover:
"Implementation requires refactoring legacy code"

Updated TodoWrite:
[✓] PHASE 1: Plan
[✓] PHASE 2: Implement core logic (completed)
[→] PHASE 2: Refactor legacy code (in_progress) ← New task added
[ ] PHASE 3: Test
[ ] PHASE 4: Review

User sees: "Additional work discovered: refactoring. Total now 5 tasks."
```

**User Can See Current Progress at Any Time:**

With real-time updates, user can check progress:

```
User checks at T=120s:

TodoWrite State:
[✓] PHASE 1: Ask user
[✓] PHASE 2: Plan architecture
[→] PHASE 3: Implement core logic (in_progress)
[ ] PHASE 3: Add error handling
[ ] PHASE 3: Write tests
[ ] PHASE 4: Code review
[ ] PHASE 5: Accept

User sees: "2/7 tasks complete, currently implementing core logic"
```

---

### Pattern 5: Iteration Loop Tracking

**Create Task Per Iteration:**

For iteration loops, create a task for each iteration:

```
✅ CORRECT - Iteration Tasks:

Design Validation Loop (max 10 iterations):

Initial TodoWrite:
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
... (create all 10 upfront)

Progress:
[✓] Iteration 1/10: Designer validation (NEEDS IMPROVEMENT)
[✓] Iteration 2/10: Designer validation (NEEDS IMPROVEMENT)
[→] Iteration 3/10: Designer validation (in_progress)
[ ] Iteration 4/10: Designer validation
...

User sees: "Iteration 3/10 in progress, 2 complete"

❌ WRONG - Single Loop Task:

TodoWrite:
[→] Design validation loop (in_progress)

Problem: User sees "in_progress" for 10 minutes, no iteration visibility
```
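Creating all iteration tasks up front reduces to a one-line label generator; the label format is taken from the examples above.

```typescript
// Creates one trackable task label per iteration up front, e.g.
// "Iteration 1/10: Designer validation" ... "Iteration 10/10: ...",
// so loop progress is visible before the first iteration even starts.
function iterationTasks(max: number, label: string): string[] {
  return Array.from({ length: max }, (_, i) => `Iteration ${i + 1}/${max}: ${label}`);
}
```

If the loop exits early (say, PASS on iteration 2), the remaining labels simply stay pending, which itself communicates "completed in 2/10 iterations".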

**Mark Iteration Complete When Done:**

```
Iteration Lifecycle:

Iteration 1:
  Mark "Iteration 1/10" as in_progress
  Run designer validation
  If NEEDS IMPROVEMENT: Run developer fixes
  Mark "Iteration 1/10" as completed

Iteration 2:
  Mark "Iteration 2/10" as in_progress
  Run designer validation
  If PASS: Exit loop early
  Mark "Iteration 2/10" as completed

Result: Loop exited after 2 iterations
[✓] Iteration 1/10 (completed)
[✓] Iteration 2/10 (completed)
[ ] Iteration 3/10 (not needed, loop exited)
...

User sees: "Loop completed in 2/10 iterations"
```

**Track Total Iterations vs Max Limit:**

```
Iteration Progress:

Max: 10 iterations
Current: 5

TodoWrite State:
[✓] Iteration 1/10
[✓] Iteration 2/10
[✓] Iteration 3/10
[✓] Iteration 4/10
[→] Iteration 5/10
[ ] Iteration 6/10
...

User sees: "Iteration 5/10 (50% through max)"

Warning at Iteration 8:
"Iteration 8/10 - approaching max, may escalate to user if not PASS"
```

**Clear Progress Visibility:**

```
Iteration Loop with TodoWrite:

User Request: "Validate UI design"

TodoWrite:
[✓] PHASE 1: Gather design reference
[✓] Iteration 1/10: Designer validation (5 issues found)
[✓] Iteration 2/10: Designer validation (3 issues found)
[✓] Iteration 3/10: Designer validation (1 issue found)
[→] Iteration 4/10: Designer validation (in_progress)
[ ] Iteration 5/10: Designer validation
...
[ ] PHASE 3: User validation gate

User sees:
- 3 iterations completed, iteration 4 in progress (40% through max)
- Issues reducing each iteration (5 → 3 → 1)
- Progress toward PASS
```

---

### Pattern 6: Parallel Task Tracking

**Multiple Agents Executing Simultaneously:**

When running agents in parallel, track each separately:

```
✅ CORRECT - Separate Tasks for Parallel Agents:

Multi-Model Review (3 models in parallel):

TodoWrite:
[✓] PHASE 1: Prepare review context
[→] PHASE 2: Claude review (in_progress)
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

Note: 3 tasks "in_progress" is OK for parallel execution
(Exception to "one in_progress" rule)

As models complete:
[✓] PHASE 1: Prepare review context
[✓] PHASE 2: Claude review (completed) ← First to finish
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

User sees: "1/3 reviews complete, 2 in progress"

❌ WRONG - Single Task for Parallel Work:

TodoWrite:
[✓] PHASE 1: Prepare
[→] PHASE 2: Run 3 reviews (in_progress)
[ ] PHASE 3: Consolidate

Problem: No visibility into which reviews are complete
```

**Update As Each Agent Completes:**

```
Parallel Execution Timeline:

T=0s: Launch 3 reviews in parallel
  [→] Claude review (in_progress)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=60s: Claude completes first
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=120s: Gemini completes
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [✓] Gemini review (completed)

T=180s: Grok completes
  [✓] Claude review (completed)
  [✓] Grok review (completed)
  [✓] Gemini review (completed)

User sees real-time completion updates
```
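The per-agent completion updates can be sketched with promises. `runReview` is a hypothetical stand-in for launching one model's review; the point of the sketch is that each completion updates status immediately rather than waiting for `Promise.all` to settle.

```typescript
type ReviewStatus = "in_progress" | "completed";

// Launches all reviews at once and marks each one completed as it finishes,
// so the user sees "1/3 complete", "2/3 complete", ... in real time.
async function trackParallel(
  names: string[],
  runReview: (name: string) => Promise<void>,
  onUpdate: (statuses: Map<string, ReviewStatus>) => void,
): Promise<void> {
  const statuses = new Map<string, ReviewStatus>(
    names.map((n): [string, ReviewStatus] => [n, "in_progress"]),
  );
  onUpdate(statuses); // all in_progress: the parallel exception to one-in-progress
  await Promise.all(
    names.map(async name => {
      await runReview(name);
      statuses.set(name, "completed"); // update immediately, not batched
      onUpdate(statuses);
    }),
  );
}
```

The same shape works for any fan-out step: the aggregate phase ("PHASE 2: Launch reviews") completes only after `Promise.all` resolves, while the per-agent tasks complete one by one.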

**Progress Indicators During Long Parallel Tasks:**

```
For long-running parallel tasks (>2 minutes), show progress:

T=0s:   "Launching 5 AI model reviews (estimated 5 minutes)..."
T=60s:  "1/5 reviews complete..."
T=120s: "2/5 reviews complete..."
T=180s: "4/5 reviews complete, 1 in progress..."
T=240s: "All reviews complete! Consolidating results..."

TodoWrite mirrors this:
[✓] Claude review (1/5 complete)
[✓] Grok review (2/5 complete)
[→] Gemini review (in_progress)
[→] GPT-5 review (in_progress)
[→] DeepSeek review (in_progress)
```

---

## Integration with Other Skills

**todowrite-orchestration + multi-agent-coordination:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
  Create task list for all 8 phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
  Phase 1: api-architect
    Mark PHASE 1 as in_progress
    Delegate to api-architect
    Mark PHASE 1 as completed

  Phase 2: backend-developer
    Mark PHASE 2 as in_progress
    Delegate to backend-developer
    Mark PHASE 2 as completed

  ... continue for all phases
```

**todowrite-orchestration + multi-model-validation:**

```
Use Case: Multi-model review with progress tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] PHASE 1: Prepare context
  [ ] PHASE 2: Launch reviews (5 models)
  [ ] PHASE 3: Consolidate results

Step 2: Parallel Execution (multi-model-validation)
  Mark "PHASE 2: Launch reviews" as in_progress
  Launch all 5 models simultaneously
  As each completes: Update progress (1/5, 2/5, ...)
  Mark "PHASE 2: Launch reviews" as completed

Step 3: Real-Time Visibility (todowrite-orchestration)
  User sees: "PHASE 2: 3/5 reviews complete..."
```

**todowrite-orchestration + quality-gates:**

```
Use Case: Iteration loop with TodoWrite tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] Iteration 1/10
  [ ] Iteration 2/10
  ...

Step 2: Iteration Loop (quality-gates)
  For i = 1 to 10:
    Mark "Iteration i/10" as in_progress
    Run designer validation
    If PASS: Exit loop
    Mark "Iteration i/10" as completed

Step 3: Progress Visibility
  User sees: "Iteration 5/10 complete, 5 remaining"
```

---

## Best Practices

**Do:**
- ✅ Initialize TodoWrite BEFORE starting work (step 0)
- ✅ List ALL phases upfront (user sees complete scope)
- ✅ Use 8-15 tasks for typical workflows (readable)
- ✅ Mark completed IMMEDIATELY after finishing (real-time)
- ✅ Keep exactly ONE task in_progress (except parallel tasks)
- ✅ Track iterations separately (Iteration 1/10, 2/10, ...)
- ✅ Update as work progresses (not batched at end)
- ✅ Add new tasks if discovered during execution

**Don't:**
- ❌ Create TodoWrite during workflow (initialize first)
- ❌ Hide phases from user (list all upfront)
- ❌ Create too many tasks (>20 overwhelms user)
- ❌ Batch completions at end of phase (update real-time)
- ❌ Leave multiple tasks in_progress (pick one)
- ❌ Use single task for loop (track iterations separately)
- ❌ Update only at start/end (update during execution)

**Performance:**
- TodoWrite overhead: <1s per update (negligible)
- User visibility benefit: Reduces perceived wait time 30-50%
- Workflow confidence: User knows progress, less likely to cancel

---

## Examples

### Example 1: 8-Phase Implementation Workflow

**Scenario:** Full-cycle implementation with TodoWrite tracking

**Execution:**

```
|
||||||
|
Step 0: Initialize TodoWrite
|
||||||
|
TodoWrite: Create task list
|
||||||
|
[ ] PHASE 1: Ask user for requirements
|
||||||
|
[ ] PHASE 2: Generate architecture plan
|
||||||
|
[ ] PHASE 3: Implement core logic
|
||||||
|
[ ] PHASE 3: Add error handling
|
||||||
|
[ ] PHASE 3: Write tests
|
||||||
|
[ ] PHASE 4: Run test suite
|
||||||
|
[ ] PHASE 5: Code review
|
||||||
|
[ ] PHASE 6: Fix review issues
|
||||||
|
[ ] PHASE 7: User acceptance
|
||||||
|
[ ] PHASE 8: Generate report
|
||||||
|
|
||||||
|
User sees: "10 tasks, 0 complete, Phase 1 starting..."
|
||||||
|
|
||||||
|
Step 1: PHASE 1
|
||||||
|
Mark "PHASE 1: Ask user" as in_progress
|
||||||
|
... gather requirements (30s) ...
|
||||||
|
Mark "PHASE 1: Ask user" as completed
|
||||||
|
User sees: "1/10 tasks complete (10%)"
|
||||||
|
|
||||||
|
Step 2: PHASE 2
|
||||||
|
Mark "PHASE 2: Architecture plan" as in_progress
|
||||||
|
... generate plan (2 min) ...
|
||||||
|
Mark "PHASE 2: Architecture plan" as completed
|
||||||
|
User sees: "2/10 tasks complete (20%)"
|
||||||
|
|
||||||
|
Step 3: PHASE 3 (3 sub-tasks)
|
||||||
|
Mark "PHASE 3: Implement core" as in_progress
|
||||||
|
... implement (3 min) ...
|
||||||
|
Mark "PHASE 3: Implement core" as completed
|
||||||
|
User sees: "3/10 tasks complete (30%)"
|
||||||
|
|
||||||
|
Mark "PHASE 3: Add error handling" as in_progress
|
||||||
|
... add error handling (2 min) ...
|
||||||
|
Mark "PHASE 3: Add error handling" as completed
|
||||||
|
User sees: "4/10 tasks complete (40%)"
|
||||||
|
|
||||||
|
Mark "PHASE 3: Write tests" as in_progress
|
||||||
|
... write tests (3 min) ...
|
||||||
|
Mark "PHASE 3: Write tests" as completed
|
||||||
|
User sees: "5/10 tasks complete (50%)"
|
||||||
|
|
||||||
|
... continue through all phases ...
|
||||||
|
|
||||||
|
Final State:
|
||||||
|
[✓] All 10 tasks completed
|
||||||
|
User sees: "10/10 tasks complete (100%). Workflow finished!"
|
||||||
|
|
||||||
|
Total Duration: ~15 minutes
|
||||||
|
User Experience: Continuous progress updates every 1-3 minutes
|
||||||
|
```
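The state discipline behind this trace (pending → in_progress → completed, exactly one task in_progress, progress reported after each completion) can be sketched as a small tracker. This is a hypothetical helper for illustration, not the actual TodoWrite tool; the class and method names are invented:

```python
# Minimal sketch of a TodoWrite-style task tracker (hypothetical helper,
# not the real TodoWrite tool). Enforces the "exactly one in_progress" rule.

class TaskTracker:
    def __init__(self, titles):
        # Each task moves: pending -> in_progress -> completed
        self.tasks = {t: "pending" for t in titles}

    def start(self, title):
        active = [t for t, s in self.tasks.items() if s == "in_progress"]
        if active:
            raise RuntimeError(f"Finish {active[0]!r} before starting {title!r}")
        self.tasks[title] = "in_progress"

    def complete(self, title):
        self.tasks[title] = "completed"

    def progress(self):
        done = sum(1 for s in self.tasks.values() if s == "completed")
        total = len(self.tasks)
        return f"{done}/{total} tasks complete ({done * 100 // total}%)"

tracker = TaskTracker(["PHASE 1: Ask user", "PHASE 2: Plan", "PHASE 3: Implement"])
tracker.start("PHASE 1: Ask user")
tracker.complete("PHASE 1: Ask user")
print(tracker.progress())  # 1/3 tasks complete (33%)
```

The `progress()` string mirrors the "N/10 tasks complete (X%)" updates the user sees after each phase above.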
---

### Example 2: Iteration Loop with Progress Tracking

**Scenario:** Design validation with 10 max iterations

**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Gather design reference
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
[ ] Iteration 4/10: Designer validation
[ ] Iteration 5/10: Designer validation
... (10 iterations total)
[ ] PHASE 3: User validation gate

Step 1: PHASE 1
Mark "PHASE 1: Gather design" as in_progress
... gather design (20s) ...
Mark "PHASE 1: Gather design" as completed

Step 2: Iteration Loop
Iteration 1:
Mark "Iteration 1/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 5 issues"
Developer: Fix 5 issues
Mark "Iteration 1/10" as completed
User sees: "Iteration 1/10 complete, 5 issues fixed"

Iteration 2:
Mark "Iteration 2/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 3 issues"
Developer: Fix 3 issues
Mark "Iteration 2/10" as completed
User sees: "Iteration 2/10 complete, 3 issues fixed"

Iteration 3:
Mark "Iteration 3/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 1 issue"
Developer: Fix 1 issue
Mark "Iteration 3/10" as completed
User sees: "Iteration 3/10 complete, 1 issue fixed"

Iteration 4:
Mark "Iteration 4/10" as in_progress
Designer: "PASS ✓"
Mark "Iteration 4/10" as completed
Exit loop (early exit)
User sees: "Loop completed in 4/10 iterations"

Step 3: PHASE 3
Mark "PHASE 3: User validation" as in_progress
... user validates ...
Mark "PHASE 3: User validation" as completed

Final State:
[✓] PHASE 1: Gather design
[✓] Iteration 1/10 (5 issues fixed)
[✓] Iteration 2/10 (3 issues fixed)
[✓] Iteration 3/10 (1 issue fixed)
[✓] Iteration 4/10 (PASS)
[ ] Iteration 5/10 (not needed)
...
[✓] PHASE 3: User validation

User Experience: Clear iteration progress, early exit visible
```
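The bounded loop with early exit above has a simple control-flow shape. A sketch under stated assumptions — `run_designer_review` is a stub standing in for the real designer agent, and the per-round issue counts are taken from the trace:

```python
# Sketch of the bounded validation loop with early exit.
# run_designer_review() is a stub standing in for the real designer agent.

MAX_ITERATIONS = 10

def run_designer_review(iteration):
    # Stub: issue count drops each round, passing (0 issues) on iteration 4.
    issues_by_round = {1: 5, 2: 3, 3: 1}
    return issues_by_round.get(iteration, 0)

def validation_loop():
    for i in range(1, MAX_ITERATIONS + 1):
        print(f"Iteration {i}/{MAX_ITERATIONS}: in_progress")
        issues = run_designer_review(i)
        if issues == 0:
            # Early exit: designer returned PASS
            print(f"Loop completed in {i}/{MAX_ITERATIONS} iterations")
            return i
        print(f"Iteration {i}/{MAX_ITERATIONS} complete, {issues} issues fixed")
    print(f"Max iterations reached ({MAX_ITERATIONS}); escalate to user")
    return MAX_ITERATIONS

validation_loop()  # exits early after 4 of 10 iterations
```

Pre-creating all 10 iteration tasks and leaving the unused ones pending (as in the Final State above) keeps the early exit visible to the user.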
---

### Example 3: Parallel Multi-Model Review

**Scenario:** 5 AI models reviewing code in parallel

**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Prepare review context
[ ] PHASE 2: Claude review
[ ] PHASE 2: Grok review
[ ] PHASE 2: Gemini review
[ ] PHASE 2: GPT-5 review
[ ] PHASE 2: DeepSeek review
[ ] PHASE 3: Consolidate reviews
[ ] PHASE 4: Present results

Step 1: PHASE 1
Mark "PHASE 1: Prepare context" as in_progress
... prepare (30s) ...
Mark "PHASE 1: Prepare context" as completed

Step 2: PHASE 2 (Parallel Execution)
Mark all 5 reviews as in_progress:
[→] Claude review
[→] Grok review
[→] Gemini review
[→] GPT-5 review
[→] DeepSeek review

Launch all 5 in parallel (4-Message Pattern)

As each completes:
T=60s: Claude completes
[✓] Claude review
User sees: "1/5 reviews complete"

T=90s: Gemini completes
[✓] Gemini review
User sees: "2/5 reviews complete"

T=120s: GPT-5 completes
[✓] GPT-5 review
User sees: "3/5 reviews complete"

T=150s: Grok completes
[✓] Grok review
User sees: "4/5 reviews complete"

T=180s: DeepSeek completes
[✓] DeepSeek review
User sees: "5/5 reviews complete!"

Step 3: PHASE 3
Mark "PHASE 3: Consolidate" as in_progress
... consolidate (30s) ...
Mark "PHASE 3: Consolidate" as completed

Step 4: PHASE 4
Mark "PHASE 4: Present results" as in_progress
... present (10s) ...
Mark "PHASE 4: Present results" as completed

Final State:
[✓] All 8 tasks completed
User sees: "Multi-model review complete in 3 minutes"

User Experience:
- Real-time progress as each model completes
- Clear visibility: "3/5 reviews complete"
- Reduces perceived wait time (user knows progress)
```
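"Update as each completes" maps naturally onto completion-order iteration over concurrent work. A minimal sketch using Python's `concurrent.futures` — the model names match the example, but the delays are scaled-down illustrative stand-ins for real model-call latency:

```python
# Sketch: report progress as each parallel review finishes.
# Delays are illustrative stand-ins for real model-call latency.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def review(model, delay):
    time.sleep(delay)  # stand-in for the actual model call
    return model

# Illustrative completion times (seconds, scaled down from the example)
models = {"Claude": 0.05, "Gemini": 0.10, "GPT-5": 0.15,
          "Grok": 0.20, "DeepSeek": 0.25}

completed = []
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    futures = [pool.submit(review, m, d) for m, d in models.items()]
    # as_completed yields futures in finish order, not submission order
    for fut in as_completed(futures):
        completed.append(fut.result())
        print(f"[✓] {fut.result()} review — {len(completed)}/{len(models)} reviews complete")
```

Iterating in completion order (rather than waiting on all futures at once) is what lets the user see "1/5 reviews complete" at T=60s instead of silence until T=180s.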
---

## Troubleshooting

**Problem: User thinks workflow is stuck**

Cause: No TodoWrite updates for >1 minute

Solution: Update TodoWrite more frequently, or add sub-tasks
```
❌ Wrong:
[→] PHASE 3: Implementation (in_progress for 10 minutes)

✅ Correct:
[✓] PHASE 3: Implement core logic (2 min)
[✓] PHASE 3: Add error handling (3 min)
[→] PHASE 3: Write tests (in_progress, 2 min so far)

User sees progress every 2-3 minutes
```
---

**Problem: Too many tasks (>20), overwhelming**

Cause: Too granular task breakdown

Solution: Group micro-tasks into larger operations
```
❌ Wrong (25 tasks):
[ ] Read file 1
[ ] Read file 2
[ ] Write file 3
... (25 micro-tasks)

✅ Correct (8 tasks):
[ ] PHASE 1: Gather inputs (includes reading files)
[ ] PHASE 2: Process data
... (8 significant operations)
```
---

**Problem: Multiple tasks "in_progress" (not parallel execution)**

Cause: Forgot to mark previous task as completed

Solution: Always mark completed before starting next
```
❌ Wrong:
[→] PHASE 1: Ask user (in_progress)
[→] PHASE 2: Plan (in_progress) ← Both in_progress?

✅ Correct:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Only one
```
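This anti-pattern is easy to detect mechanically. A sketch of an invariant check, assuming a TodoWrite-like task shape (a list of dicts with hypothetical `title` and `status` keys):

```python
# Sketch: detect the "multiple in_progress" anti-pattern in a task list.
# Assumes tasks are dicts with "title" and "status" keys (TodoWrite-like).

def check_single_in_progress(tasks, parallel_phase=False):
    active = [t["title"] for t in tasks if t["status"] == "in_progress"]
    if len(active) > 1 and not parallel_phase:
        return f"Anti-pattern: {len(active)} tasks in_progress: {active}"
    return "OK"

tasks = [
    {"title": "PHASE 1: Ask user", "status": "in_progress"},
    {"title": "PHASE 2: Plan", "status": "in_progress"},
]
print(check_single_in_progress(tasks))
# Anti-pattern: 2 tasks in_progress: ['PHASE 1: Ask user', 'PHASE 2: Plan']
```

The `parallel_phase` flag covers the one legitimate exception: a declared parallel-execution phase, as in Example 3.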
---

## Summary

TodoWrite orchestration provides real-time progress visibility through:

- **Phase initialization** (create task list before starting)
- **Appropriate granularity** (8-15 tasks, significant operations)
- **Real-time updates** (mark completed immediately)
- **Exactly one in_progress** (except parallel execution)
- **Iteration tracking** (separate task per iteration)
- **Parallel task tracking** (update as each completes)

Master these patterns and users will always know:

- What's happening now
- What's coming next
- How much progress has been made
- How much remains

This transforms "black box" workflows into transparent, trackable processes.

---

**Extracted From:**

- `/review` command (10-task initialization, phase-based tracking)
- `/implement` command (8-phase workflow with sub-tasks)
- `/validate-ui` command (iteration tracking, user feedback rounds)
- All multi-phase orchestration workflows