Initial commit
13
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,13 @@
{
  "name": "orchestration",
  "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
  "version": "0.1.1",
  "author": {
    "name": "Jack Rudenko",
    "email": "i@madappgang.com",
    "company": "MadAppGang"
  },
  "skills": [
    "./skills"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# orchestration

Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.
61
plugin.lock.json
Normal file
@@ -0,0 +1,61 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:MadAppGang/claude-code:plugins/orchestration",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "ad90df36843224b97a17f14cfd5a207d4e053c67",
    "treeHash": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150",
    "generatedAt": "2025-11-28T10:12:05.859643Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "orchestration",
    "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
    "version": "0.1.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "215babb6dff86f8783d8e97d0a21546e2aaa3b055bc1cde5c4e16c6bf3d6c7a5"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "36414e18947889714f9d80576e01edaab8b3ffdf9efd44107e0f5fb42b0e2270"
      },
      {
        "path": "skills/todowrite-orchestration/SKILL.md",
        "sha256": "f681467a2eef99945f90b8f2b654c8c9713f4153afdff19a0c0b312d2f6084de"
      },
      {
        "path": "skills/quality-gates/SKILL.md",
        "sha256": "ba13c21d8e9f8abeb856bbec4a6ebc821e92dfe0857942797959087452b175c3"
      },
      {
        "path": "skills/error-recovery/SKILL.md",
        "sha256": "133564d1bc0d35a8c35074b089120fe7d7a757b71bdd6222a7a5c23e45f20aa3"
      },
      {
        "path": "skills/multi-agent-coordination/SKILL.md",
        "sha256": "9e0156350eb09447221898598611a5270921c31168e7698c4bd0d3bd0ced4616"
      },
      {
        "path": "skills/multi-model-validation/SKILL.md",
        "sha256": "9d5c46dfa531f911f4fcc4070fd6c039900bcdb440c997f7eac384001a1ba33e"
      }
    ],
    "dirSha256": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
1107
skills/error-recovery/SKILL.md
Normal file
File diff suppressed because it is too large
742
skills/multi-agent-coordination/SKILL.md
Normal file
@@ -0,0 +1,742 @@
---
name: multi-agent-coordination
description: Coordinate multiple agents in parallel or sequential workflows. Use when running agents simultaneously, delegating to sub-agents, switching between specialized agents, or managing agent selection. Trigger keywords - "parallel agents", "sequential workflow", "delegate", "multi-agent", "sub-agent", "agent switching", "task decomposition".
version: 0.1.0
tags: [orchestration, multi-agent, parallel, sequential, delegation, coordination]
keywords: [parallel, sequential, delegate, sub-agent, agent-switching, multi-agent, task-decomposition, coordination]
---

# Multi-Agent Coordination

**Version:** 0.1.0
**Purpose:** Patterns for coordinating multiple agents in complex workflows
**Status:** Production Ready

## Overview

Multi-agent coordination is the foundation of sophisticated Claude Code workflows. This skill provides battle-tested patterns for orchestrating multiple specialized agents to accomplish complex tasks that are beyond the capabilities of a single agent.

The key challenge in multi-agent systems is **dependencies**. Some tasks must execute sequentially (one agent's output feeds into another), while others can run in parallel (independent validations from different perspectives). Getting this right is the difference between a 5-minute workflow and a 15-minute one.

This skill teaches you:
- When to run agents in **parallel** vs **sequential**
- How to **select the right agent** for each task
- How to **delegate** to sub-agents without polluting context
- How to manage **context windows** across multiple agent calls
## Core Patterns

### Pattern 1: Sequential vs Parallel Execution

**When to Use Sequential:**

Use sequential execution when there are **dependencies** between agents:
- Agent B needs Agent A's output as input
- Workflow phases must complete in order (plan → implement → test → review)
- Each agent modifies shared state (same files)

**Example: Multi-Phase Implementation**

```
Phase 1: Architecture Planning
  Task: api-architect
  Output: ai-docs/architecture-plan.md
  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
  Input: Read ai-docs/architecture-plan.md
  Output: src/auth.ts, src/routes.ts
  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
  Input: Read src/auth.ts, src/routes.ts
  Output: tests/auth.test.ts
```

**When to Use Parallel:**

Use parallel execution when agents are **independent**:
- Multiple validation perspectives (designer + tester + reviewer)
- Multiple AI models reviewing same code (Grok + Gemini + Claude)
- Multiple feature implementations in separate files

**Example: Multi-Perspective Validation**

```
Single Message with Multiple Task Calls:

Task: designer
  Prompt: Validate UI against Figma design
  Output: ai-docs/design-review.md
---
Task: ui-manual-tester
  Prompt: Test UI in browser for usability
  Output: ai-docs/testing-report.md
---
Task: senior-code-reviewer
  Prompt: Review code quality and patterns
  Output: ai-docs/code-review.md

All three execute simultaneously (3x speedup!)
Wait for all to complete, then consolidate results.
```
**The 4-Message Pattern for True Parallel Execution:**

This is **CRITICAL** for achieving true parallelism:

```
Message 1: Preparation (Bash Only)
  - Create workspace directories
  - Validate inputs
  - Write context files
  - NO Task calls, NO TodoWrite

Message 2: Parallel Execution (Task Only)
  - Launch ALL agents in SINGLE message
  - ONLY Task tool calls
  - Each Task is independent
  - All execute simultaneously

Message 3: Consolidation (Task Only)
  - Launch consolidation agent
  - Automatically triggered when N agents complete

Message 4: Present Results
  - Show user final consolidated results
  - Include links to detailed reports
```

**Anti-Pattern: Mixing Tool Types Breaks Parallelism**

```
❌ WRONG - Executes Sequentially:
await TodoWrite({...});  // Tool 1
await Task({...});       // Tool 2 - waits for TodoWrite
await Bash({...});       // Tool 3 - waits for Task
await Task({...});       // Tool 4 - waits for Bash

✅ CORRECT - Executes in Parallel:
await Task({...});  // Task 1
await Task({...});  // Task 2
await Task({...});  // Task 3
// All execute simultaneously
```

**Why Mixing Fails:**

Claude Code sees different tool types and assumes there are dependencies between them, forcing sequential execution. Using a single tool type (all Task calls) signals that the operations are independent and can run in parallel.
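For intuition, the same contrast can be expressed in plain TypeScript terms: sequential `await`s versus a single `Promise.all`. This is a minimal conceptual sketch, not the actual Task tool API; `runTask` is a hypothetical stand-in for launching an agent.

```typescript
// Hypothetical stand-in for the Task tool: launches an agent, resolves with its summary.
declare function runTask(agent: string, prompt: string): Promise<string>;

// Sequential: each agent starts only after the previous one finishes.
async function sequential(): Promise<string[]> {
  const plan = await runTask("api-architect", "Create architecture plan");
  const code = await runTask("backend-developer", "Implement from plan");
  return [plan, code];
}

// Parallel: all agents start at once; total time ≈ slowest agent, not the sum.
async function parallel(): Promise<string[]> {
  return Promise.all([
    runTask("designer", "Validate UI against Figma design"),
    runTask("ui-manual-tester", "Test UI in browser for usability"),
    runTask("senior-code-reviewer", "Review code quality and patterns"),
  ]);
}
```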
---

### Pattern 2: Agent Selection by Task Type

**Task Detection Logic:**

Intelligent workflows automatically detect task type and select appropriate agents:

```
Task Type Detection:

IF request mentions "API", "endpoint", "backend", "database":
  → API-focused workflow
  → Use: api-architect, backend-developer, test-architect
  → Skip: designer, ui-developer (not relevant)

ELSE IF request mentions "UI", "component", "design", "Figma":
  → UI-focused workflow
  → Use: designer, ui-developer, ui-manual-tester
  → Optional: ui-developer-codex (external validation)

ELSE IF request mentions both API and UI:
  → Mixed workflow
  → Use all relevant agents from both categories
  → Coordinate between backend and frontend agents

ELSE IF request mentions "test", "coverage", "bug":
  → Testing-focused workflow
  → Use: test-architect, ui-manual-tester
  → Optional: codebase-detective (for bug investigation)

ELSE IF request mentions "review", "validate", "feedback":
  → Review-focused workflow
  → Use: senior-code-reviewer, designer, ui-developer
  → Optional: external model reviewers
```
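A minimal sketch of this detection chain in TypeScript, assuming a simple keyword-matching approach; the function and keyword lists are illustrative, not part of the plugin:

```typescript
type TaskType = "api" | "ui" | "mixed" | "testing" | "review" | "unknown";

// Keyword lists mirror the detection chain above; extend as needed.
const API_KEYWORDS = ["api", "endpoint", "backend", "database"];
const UI_KEYWORDS = ["ui", "component", "design", "figma"];
const TEST_KEYWORDS = ["test", "coverage", "bug"];
const REVIEW_KEYWORDS = ["review", "validate", "feedback"];

function mentions(request: string, keywords: string[]): boolean {
  const lower = request.toLowerCase();
  return keywords.some((k) => lower.includes(k));
}

function detectTaskType(request: string): TaskType {
  const isApi = mentions(request, API_KEYWORDS);
  const isUi = mentions(request, UI_KEYWORDS);
  if (isApi && isUi) return "mixed";
  if (isApi) return "api";
  if (isUi) return "ui";
  if (mentions(request, TEST_KEYWORDS)) return "testing";
  if (mentions(request, REVIEW_KEYWORDS)) return "review";
  return "unknown"; // default: ask the user to clarify
}
```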
**Agent Capability Matrix:**

| Task Type | Primary Agent | Secondary Agent | Optional External |
|-----------|---------------|-----------------|-------------------|
| API Implementation | backend-developer | api-architect | - |
| UI Implementation | ui-developer | designer | ui-developer-codex |
| Testing | test-architect | ui-manual-tester | - |
| Code Review | senior-code-reviewer | - | codex-code-reviewer |
| Architecture Planning | api-architect OR frontend-architect | - | plan-reviewer |
| Bug Investigation | codebase-detective | test-architect | - |
| Design Validation | designer | ui-developer | designer-codex |

**Agent Switching Pattern:**

Some workflows benefit from **adaptive agent selection** based on context:

```
Example: UI Development with External Validation

Base Implementation:
  Task: ui-developer
  Prompt: Implement navbar component from design

User requests external validation:
  → Switch to ui-developer-codex OR add parallel ui-developer-codex
  → Run both: embedded ui-developer + external ui-developer-codex
  → Consolidate feedback from both

Scenario 1: User wants speed
  → Use ONLY ui-developer (embedded, fast)

Scenario 2: User wants highest quality
  → Use BOTH ui-developer AND ui-developer-codex (parallel)
  → Consensus analysis on feedback

Scenario 3: User is out of credits
  → Fall back to ui-developer only
  → Notify user that external validation is unavailable
```

---
### Pattern 3: Sub-Agent Delegation

**File-Based Instructions (Context Isolation):**

When delegating to sub-agents, use **file-based instructions** to avoid context pollution:

```
✅ CORRECT - File-Based Delegation:

Step 1: Write instructions to file
  Write: ai-docs/architecture-instructions.md
  Content: "Design authentication system with JWT tokens..."

Step 2: Delegate to agent with file reference
  Task: api-architect
  Prompt: "Read instructions from ai-docs/architecture-instructions.md
           and create architecture plan."

Step 3: Agent reads file, does work, writes output
  Agent reads: ai-docs/architecture-instructions.md
  Agent writes: ai-docs/architecture-plan.md

Step 4: Agent returns brief summary ONLY
  Return: "Architecture plan complete. See ai-docs/architecture-plan.md"

Step 5: Orchestrator reads output file if needed
  Read: ai-docs/architecture-plan.md
  (Only if orchestrator needs to process the output)
```

**Why File-Based?**

- **Avoids context pollution:** Long user requirements don't bloat orchestrator context
- **Reusable:** Multiple agents can read the same instruction file
- **Debuggable:** Files persist after the workflow completes
- **Clean separation:** Input file, output file; the orchestrator stays lightweight
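The five steps collapse into a single helper. A sketch, assuming Node's `fs` and the same hypothetical `runTask` stand-in for the Task tool used earlier:

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical stand-in for the Task tool.
declare function runTask(agent: string, prompt: string): Promise<string>;

// Delegate with file-based instructions: the orchestrator's context only ever
// holds the file paths and the agent's brief summary, never the full content.
async function delegateViaFile(
  agent: string,
  instructions: string,
  inputPath: string,
  outputPath: string,
): Promise<string> {
  writeFileSync(inputPath, instructions); // Step 1: write instructions to file
  // Steps 2-4: agent reads the file, works, writes output, returns a summary
  const summary = await runTask(
    agent,
    `Read instructions from ${inputPath} and write your output to ${outputPath}.`,
  );
  return summary; // Step 5: read outputPath only if the orchestrator needs it
}
```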
**Anti-Pattern: Inline Delegation**

```
❌ WRONG - Context Pollution:

Task: api-architect
Prompt: "Design authentication system with:
  - JWT tokens with refresh token rotation
  - Email/password login with bcrypt hashing
  - OAuth2 integration with Google, GitHub
  - Rate limiting on login endpoint (5 attempts per 15 min)
  - Password reset flow with time-limited tokens
  - Email verification on signup
  - Role-based access control (admin, user, guest)
  - Session management with Redis
  - Security headers (CORS, CSP, HSTS)
  - ... (500 more lines of requirements)"

Problem: Orchestrator's context now contains 500+ lines of requirements
that are only relevant to the architect agent.
```

**Brief Summary Returns:**

Sub-agents should return **2-5 sentence summaries**, not full output:

```
✅ CORRECT - Brief Summary:
"Architecture plan complete. Designed 3-layer authentication:
JWT with refresh tokens, OAuth2 integration (Google/GitHub),
and Redis session management. See ai-docs/architecture-plan.md
for detailed component breakdown."

❌ WRONG - Full Output:
"Architecture plan:
[500 lines of detailed architecture documentation]
Components: AuthController, TokenService, OAuthService...
[another 500 lines]"
```

**Proxy Mode Invocation:**

For external AI models (Claudish), use the PROXY_MODE directive:

```
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
Prompt: "Review authentication implementation for security issues.
         Code context in ai-docs/code-review-context.md"

Agent Behavior:
1. Detects PROXY_MODE directive
2. Extracts model: x-ai/grok-code-fast-1
3. Extracts task: "Review authentication implementation..."
4. Executes: claudish --model x-ai/grok-code-fast-1 --stdin <<< "..."
5. Waits for full response (blocking execution)
6. Writes: ai-docs/grok-review.md (full detailed review)
7. Returns: "Grok review complete. Found 3 CRITICAL issues. See ai-docs/grok-review.md"
```

**Key: Blocking Execution**

External models MUST execute synchronously (blocking) so the agent waits for the full response:

```
✅ CORRECT - Blocking:
RESULT=$(claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT")
echo "$RESULT" > ai-docs/grok-review.md
echo "Review complete - see ai-docs/grok-review.md"

❌ WRONG - Background (returns before completion):
claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT" &
echo "Review started..."  # Agent returns immediately, review not done!
```
---

### Pattern 4: Context Window Management

**When to Delegate:**

Delegate to sub-agents when:
- Task is self-contained (clear input → output)
- Output is large (architecture plan, test suite, review report)
- Task requires specialized expertise (designer, tester, reviewer)
- Multiple independent tasks can run in parallel

**When to Execute in Main Context:**

Execute in the main orchestrator when:
- Task is small (simple file edit, command execution)
- Output is brief (yes/no decision, status check)
- Task depends on orchestrator state (current phase, iteration count)
- Context pollution risk is low

**Context Size Estimation:**

**Note:** Token estimates below are approximations based on typical usage. Actual context consumption varies by skill complexity, Claude model version, and conversation history. Use these as guidelines, not exact measurements.

Estimate context usage to decide delegation strategy:

```
Context Budget: ~200k tokens (Claude Sonnet 4.5 - actual varies by model)

Current context usage breakdown:
- System prompt: 10k tokens
- Skill content (5 skills): 10k tokens
- Command instructions: 5k tokens
- User request: 1k tokens
- Conversation history: 20k tokens
───────────────────────────────────
Total used: 46k tokens
Remaining: 154k tokens

Safe threshold for delegation: If task will consume >30k tokens, delegate

Example: Architecture planning for large system
- Requirements: 5k tokens
- Expected output: 20k tokens
- Total: 25k tokens
───────────────────────────────────
Decision: Delegate (keeps orchestrator lightweight)
```

**Delegation Strategy by Context Size:**

| Task Output Size | Strategy |
|------------------|----------|
| < 1k tokens | Execute in orchestrator |
| 1k - 10k tokens | Delegate with summary return |
| 10k - 30k tokens | Delegate with file-based output |
| > 30k tokens | Multi-agent decomposition |
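The table maps directly to a small decision function. A sketch, with thresholds taken from the table above (the function name is illustrative):

```typescript
type Strategy =
  | "execute-in-orchestrator"
  | "delegate-with-summary"
  | "delegate-with-file-output"
  | "multi-agent-decomposition";

// Thresholds are the approximate guidelines from the table above.
function chooseDelegationStrategy(estimatedOutputTokens: number): Strategy {
  if (estimatedOutputTokens < 1_000) return "execute-in-orchestrator";
  if (estimatedOutputTokens <= 10_000) return "delegate-with-summary";
  if (estimatedOutputTokens <= 30_000) return "delegate-with-file-output";
  return "multi-agent-decomposition";
}

// Example: a ~25k-token architecture plan is delegated with file-based output.
chooseDelegationStrategy(25_000); // "delegate-with-file-output"
```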
**Example: Multi-Agent Decomposition**

```
User Request: "Implement complete e-commerce system"

This is >100k tokens if done by a single agent. Decompose:

Phase 1: Break into sub-systems
- Product catalog
- Shopping cart
- Checkout flow
- User authentication
- Order management
- Payment integration

Phase 2: Delegate each sub-system to a separate agent
  Task: backend-developer
    Instruction file: ai-docs/product-catalog-requirements.md
    Output file: ai-docs/product-catalog-implementation.md

  Task: backend-developer
    Instruction file: ai-docs/shopping-cart-requirements.md
    Output file: ai-docs/shopping-cart-implementation.md

  ... (6 parallel agent invocations)

Phase 3: Integration agent
  Task: backend-developer
  Instruction: "Integrate 6 sub-systems. Read output files:
               ai-docs/*-implementation.md"
  Output: ai-docs/integration-plan.md

Total context per agent: ~20k tokens (manageable)
vs. Single agent: 120k+ tokens (context overflow risk)
```
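Sketched in the same conceptual TypeScript terms as earlier: fan out one hypothetical `runTask` call per sub-system in parallel, then run a single integration agent once all outputs exist.

```typescript
// Hypothetical stand-in for the Task tool.
declare function runTask(agent: string, prompt: string): Promise<string>;

const subsystems = [
  "product-catalog", "shopping-cart", "checkout-flow",
  "user-authentication", "order-management", "payment-integration",
];

// Phase 2: fan out one agent per sub-system (all run in parallel),
// then Phase 3: a single integration agent reads all output files.
async function decompose(): Promise<string> {
  await Promise.all(
    subsystems.map((s) =>
      runTask(
        "backend-developer",
        `Read ai-docs/${s}-requirements.md and write ai-docs/${s}-implementation.md`,
      ),
    ),
  );
  return runTask(
    "backend-developer",
    "Integrate 6 sub-systems. Read output files: ai-docs/*-implementation.md",
  );
}
```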
---

## Integration with Other Skills

**multi-agent-coordination + multi-model-validation:**

```
Use Case: Code review with multiple AI models

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: Code review
- Select agents: senior-code-reviewer (embedded) + external models

Step 2: Parallel Execution (multi-model-validation)
- Follow 4-Message Pattern
- Launch all reviewers simultaneously
- Wait for all to complete

Step 3: Consolidation (multi-model-validation)
- Auto-consolidate reviews
- Apply consensus analysis
```

**multi-agent-coordination + quality-gates:**

```
Use Case: Iterative UI validation

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: UI validation
- Select agents: designer, ui-developer

Step 2: Iteration Loop (quality-gates)
- Run designer validation
- If not PASS: delegate to ui-developer for fixes
- Loop until PASS or max iterations

Step 3: User Validation Gate (quality-gates)
- MANDATORY user approval
- Collect feedback if issues found
```

**multi-agent-coordination + todowrite-orchestration:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
- Create task list for all phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
- Phase 1: api-architect
- Phase 2: backend-developer (depends on Phase 1)
- Phase 3: test-architect (depends on Phase 2)
- Update TodoWrite after each phase
```

---

## Best Practices

**Do:**
- ✅ Use parallel execution for independent tasks (3-5x speedup)
- ✅ Use sequential execution when there are dependencies
- ✅ Use file-based instructions to avoid context pollution
- ✅ Return brief summaries (2-5 sentences) from sub-agents
- ✅ Select agents based on task type (API/UI/Testing/Review)
- ✅ Decompose large tasks into multiple sub-agent calls
- ✅ Estimate context usage before delegating

**Don't:**
- ❌ Mix tool types in parallel execution (breaks parallelism)
- ❌ Inline long instructions in Task prompts (context pollution)
- ❌ Return full output from sub-agents (use files instead)
- ❌ Use parallel execution for dependent tasks (wrong results)
- ❌ Use a single agent for >100k token tasks (context overflow)
- ❌ Forget to wait for all parallel tasks before consolidating

**Performance Tips:**
- Parallel execution: 3-5x faster than sequential (5 min vs 15 min)
- File-based delegation: saves 50-80% of context usage
- Agent switching: adapt to user preferences (speed vs quality)
- Context decomposition: enables tasks that would otherwise overflow

---
## Examples

### Example 1: Parallel Multi-Model Code Review

**Scenario:** User requests "Review my authentication code with Grok and Gemini"

**Agent Selection:**
- Task type: Code review
- Agents: senior-code-reviewer (embedded), external Grok, external Gemini

**Execution:**

```
Message 1: Preparation
- Write code context to ai-docs/code-review-context.md

Message 2: Parallel Execution (3 Task calls in single message)
Task: senior-code-reviewer
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: google/gemini-2.5-flash
  Prompt: "Review ai-docs/code-review-context.md for security issues"

All 3 execute simultaneously (3x faster than sequential)

Message 3: Auto-Consolidation
Task: senior-code-reviewer
  Prompt: "Consolidate 3 reviews from:
  - ai-docs/claude-review.md
  - ai-docs/grok-review.md
  - ai-docs/gemini-review.md
  Prioritize by consensus."

Message 4: Present Results
"Review complete. 3 models analyzed your code.
Top 5 issues by consensus:
1. [UNANIMOUS] Missing input validation on login endpoint
2. [STRONG] SQL injection risk in user query
3. [MAJORITY] Weak password requirements
See ai-docs/consolidated-review.md for details."
```

**Result:** 5 minutes total (vs 15+ if sequential), consensus-based prioritization

---
### Example 2: Sequential Multi-Phase Implementation

**Scenario:** User requests "Implement payment integration feature"

**Agent Selection:**
- Task type: API implementation
- Agents: api-architect → backend-developer → test-architect → senior-code-reviewer

**Execution:**

```
Phase 1: Architecture Planning
  Write: ai-docs/payment-requirements.md
    "Integrate Stripe payment processing with webhook support..."

  Task: api-architect
    Prompt: "Read ai-docs/payment-requirements.md
             Create architecture plan"
    Output: ai-docs/payment-architecture.md
    Return: "Architecture plan complete. Designed 3-layer payment system."

  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
    Prompt: "Read ai-docs/payment-architecture.md
             Implement payment integration"
    Output: src/payment.ts, src/webhooks.ts
    Return: "Payment integration implemented. 2 new files, 500 lines."

  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
    Prompt: "Write tests for src/payment.ts and src/webhooks.ts"
    Output: tests/payment.test.ts, tests/webhooks.test.ts
    Return: "Test suite complete. 20 tests covering payment flows."

  Wait for completion ✓

Phase 4: Code Review (depends on Phase 3)
  Task: senior-code-reviewer
    Prompt: "Review payment integration implementation"
    Output: ai-docs/payment-review.md
    Return: "Review complete. 2 MEDIUM issues found."

  Wait for completion ✓
```

**Result:** Sequential execution ensures each phase has correct inputs

---

### Example 3: Adaptive Agent Switching

**Scenario:** User requests "Validate navbar implementation" with optional external AI

**Agent Selection:**
- Task type: UI validation
- Base agent: designer
- Optional: designer-codex (if user wants external validation)

**Execution:**

```
Step 1: Ask user preference
  "Do you want external AI validation? (Yes/No)"

Step 2a: If user says NO (speed mode)
  Task: designer
    Prompt: "Validate navbar against Figma design"
    Output: ai-docs/design-review.md
    Return: "Design validation complete. PASS with 2 minor suggestions."

Step 2b: If user says YES (quality mode)
  Message 1: Parallel Validation
    Task: designer
      Prompt: "Validate navbar against Figma design"
    ---
    Task: designer PROXY_MODE: design-review-codex
      Prompt: "Validate navbar against Figma design"

  Message 2: Consolidate
    Task: designer
      Prompt: "Consolidate 2 design reviews. Prioritize by consensus."
      Output: ai-docs/design-review-consolidated.md
      Return: "Consolidated review complete. Both agree on 1 CRITICAL issue."

Step 3: User validation
  Present consolidated review to user for approval
```

**Result:** Adaptive workflow based on user preference (speed vs quality)

---
## Troubleshooting

**Problem: Parallel tasks executing sequentially**

Cause: Mixed tool types in same message

Solution: Use the 4-Message Pattern with ONLY Task calls in Message 2

```
❌ Wrong:
await TodoWrite({...});
await Task({...});
await Task({...});

✅ Correct:
Message 1: await Bash({...});  (prep only)
Message 2: await Task({...}); await Task({...});  (parallel)
```

---

**Problem: Orchestrator context overflowing**

Cause: Inline instructions or full output returns

Solution: Use file-based delegation + brief summaries

```
❌ Wrong:
Task: agent
Prompt: "[1000 lines of inline requirements]"
Return: "[500 lines of full output]"

✅ Correct:
Write: ai-docs/requirements.md
Task: agent
Prompt: "Read ai-docs/requirements.md"
Return: "Complete. See ai-docs/output.md"
```

---

**Problem: Wrong agent selected for task**

Cause: Task type detection failed

Solution: Explicitly detect task type using keywords

```
Check user request for keywords:
- API/endpoint/backend → api-architect, backend-developer
- UI/component/design → designer, ui-developer
- test/coverage → test-architect
- review/validate → senior-code-reviewer

Default: Ask user to clarify task type
```

---

**Problem: Agent returns immediately before external model completes**

Cause: Background execution (non-blocking claudish call)

Solution: Use synchronous (blocking) execution

```
❌ Wrong:
claudish --model grok ... &     (background, returns immediately)

✅ Correct:
RESULT=$(claudish --model grok ...)  (blocks until complete)
```

---

## Summary

Multi-agent coordination is about choosing the right execution strategy:

- **Parallel** when tasks are independent (3-5x speedup)
- **Sequential** when tasks have dependencies (correct results)
- **File-based delegation** to avoid context pollution (50-80% savings)
- **Brief summaries** from sub-agents (clean orchestrator context)
- **Task type detection** for intelligent agent selection
- **Context decomposition** for large tasks (avoid overflow)

Master these patterns and you can orchestrate workflows of any complexity.

---

**Extracted From:**
- `/implement` command (task detection, sequential workflows)
- `/validate-ui` command (adaptive agent switching)
- `/review` command (parallel execution, 4-Message Pattern)
- `CLAUDE.md` Parallel Multi-Model Execution Protocol
1005
skills/multi-model-validation/SKILL.md
Normal file
File diff suppressed because it is too large
996
skills/quality-gates/SKILL.md
Normal file
@@ -0,0 +1,996 @@
---
name: quality-gates
description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus".
version: 0.1.0
tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD]
keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval]
---

# Quality Gates

**Version:** 0.1.0
**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows
**Status:** Production Ready

## Overview

Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met.

This skill provides battle-tested patterns for:
- **User approval gates** (cost gates, quality gates, final acceptance)
- **Iteration loops** (automated refinement until a quality threshold is met)
- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW)
- **Multi-reviewer consensus** (unanimous vs majority agreement)
- **Feedback loops** (user reports issues → agent fixes → user validates)
- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat)

Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results.

## Core Patterns
### Pattern 1: User Approval Gates

**When to Ask for Approval:**

Use approval gates for:
- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring)
- **Quality gates:** Before proceeding to the next phase (design validation before implementation)
- **Final validation:** Before completing a workflow (user acceptance testing)
- **Irreversible operations:** Before destructive actions (delete files, database migrations)

**How to Present Approval:**

```
Good Approval Prompt:

"You selected 5 AI models for code review:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
- GPT-5 Codex (external, $0.004)
- DeepSeek Coder (external, $0.001)

Estimated total cost: $0.008 ($0.005 - $0.010)
Expected duration: ~5 minutes

Proceed with multi-model review? (Yes/No/Cancel)"

Why it works:
✓ Clear context (what will happen)
✓ Cost transparency (range, not single number)
✓ Time expectation (5 minutes)
✓ Multiple options (Yes/No/Cancel)
```

**Anti-Pattern: Vague Approval**

```
❌ Wrong:

"This will cost money. Proceed? (Yes/No)"

Why it fails:
✗ No cost details (how much?)
✗ No context (what will happen?)
✗ No alternatives (what if user says no?)
```

**Handling User Responses:**

```
User says YES:
→ Proceed with workflow
→ Track approval in logs
→ Continue to next step

User says NO:
→ Offer alternatives:
  1. Use fewer models (reduce cost)
  2. Use only free embedded Claude
  3. Skip this step entirely
  4. Cancel workflow
→ Ask user to choose alternative
→ Proceed based on choice

User says CANCEL:
→ Gracefully exit workflow
→ Save partial results (if any)
→ Log cancellation reason
→ Clean up temporary files
→ Notify user: "Workflow cancelled. Partial results saved to..."
```
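The three response branches reduce to a small dispatcher. A sketch with illustrative types; the notes mirror the actions listed above:

```typescript
type ApprovalResponse = "yes" | "no" | "cancel";

interface GateOutcome {
  proceed: boolean;
  note: string;
}

// Maps the three user responses to the actions described above.
function handleApproval(response: ApprovalResponse): GateOutcome {
  switch (response) {
    case "yes":
      return { proceed: true, note: "Approval logged; continuing workflow." };
    case "no":
      // Offer cheaper alternatives instead of aborting outright.
      return {
        proceed: false,
        note: "Offer alternatives: fewer models, embedded-only, skip step, or cancel.",
      };
    case "cancel":
      return {
        proceed: false,
        note: "Workflow cancelled. Save partial results, log reason, clean up temp files.",
      };
  }
}
```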
**Approval Bypasses (Advanced):**

For automated workflows, allow approval bypass:

```
Automated Workflow Mode:

If workflow is triggered by CI/CD or a scheduled task:
→ Skip user approval gates
→ Use predefined defaults (e.g., max cost $0.10)
→ Log decisions for audit trail
→ Email report to stakeholders after completion

Example:
if (isAutomatedMode) {
  if (estimatedCost <= maxAutomatedCost) {
    log(`Auto-approved: $${estimatedCost} <= $${maxAutomatedCost} threshold`);
    proceed();
  } else {
    log(`Auto-rejected: $${estimatedCost} > $${maxAutomatedCost} threshold`);
    notifyStakeholders("Cost exceeds automated threshold");
    abort();
  }
}
```

---
### Pattern 2: Iteration Loop Patterns

**Max Iteration Limits:**

Always set a **max iteration limit** to prevent infinite loops:

```
Typical Iteration Limits:

Automated quality loops: 10 iterations
- Designer validation → Developer fixes → Repeat
- Test failures → Developer fixes → Repeat

User feedback loops: 5 rounds
- User reports issues → Developer fixes → User validates → Repeat

Code review loops: 3 rounds
- Reviewer finds issues → Developer fixes → Re-review → Repeat

Multi-model consensus: 1 iteration (no loop)
- Parallel review → Consolidate → Present
```

**Exit Criteria:**

Define clear **exit criteria** for each loop type:

```
Loop Type: Design Validation

Exit Criteria (checked after each iteration):
1. Designer assessment = PASS → Exit loop (success)
2. Iteration count >= 10 → Exit loop (max iterations)
3. User manually approves → Exit loop (user override)
4. No changes made by developer → Exit loop (stuck, escalate)

Example:
for (let i = 1; i <= 10; i++) {
  const review = await designer.validate();

  if (review.assessment === "PASS") {
    log("Design validation passed on iteration " + i);
    break; // Success exit
  }

  if (i === 10) {
    log("Max iterations reached. Escalating to user validation.");
    break; // Max iterations exit
  }

  await developer.fix(review.issues);
}
```

**Progress Tracking:**

Show clear progress to the user during iterations:

```
Iteration Loop Progress:

Iteration 1/10: Designer found 5 issues → Developer fixing...
Iteration 2/10: Designer found 3 issues → Developer fixing...
Iteration 3/10: Designer found 1 issue → Developer fixing...
Iteration 4/10: Designer assessment: PASS ✓

Loop completed in 4 iterations.
```

**Iteration History Documentation:**

Track what happened in each iteration:

```
Iteration History (ai-docs/iteration-history.md):

## Iteration 1
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Button color doesn't match design (#3B82F6 vs #2563EB)
- Spacing between elements too tight (8px vs 16px)
- Font size incorrect (14px vs 16px)
Developer Actions:
- Updated button color to #2563EB
- Increased spacing to 16px
- Changed font size to 16px

## Iteration 2
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Border radius too large (8px vs 4px)
Developer Actions:
- Reduced border radius to 4px

## Iteration 3
Designer Assessment: PASS ✓
Issues Found: None
Result: Design validation complete
```

---
### Pattern 3: Issue Severity Classification

**Severity Levels:**

Use a 4-level severity classification:

```
CRITICAL - Must fix immediately
- Blocks core functionality
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss risk
- System crashes
- Build failures

Action: STOP workflow, fix immediately, re-validate

HIGH - Should fix soon
- Major bugs (incorrect behavior)
- Performance issues (>3s page load, memory leaks)
- Accessibility violations (keyboard navigation broken)
- User experience blockers

Action: Fix in current iteration, proceed after fix

MEDIUM - Should fix
- Minor bugs (edge cases, visual glitches)
- Code quality issues (duplication, complexity)
- Non-blocking performance issues
- Incomplete error handling

Action: Fix if time permits, or schedule for next iteration

LOW - Nice to have
- Code style inconsistencies
- Minor refactoring opportunities
- Documentation improvements
- Polish and optimization

Action: Log for future improvement, proceed without fixing
```
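A sketch of the severity levels and their action policy as TypeScript, useful when sorting a mixed issue list; the names are illustrative:

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Action policy for each severity level, as defined above.
const SEVERITY_ACTIONS: Record<Severity, string> = {
  CRITICAL: "Stop workflow, fix immediately, re-validate",
  HIGH: "Fix in current iteration, proceed after fix",
  MEDIUM: "Fix if time permits, or schedule for next iteration",
  LOW: "Log for future improvement, proceed without fixing",
};

interface Issue {
  description: string;
  severity: Severity;
}

const SEVERITY_ORDER: Severity[] = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Sort issues so CRITICAL items surface first in the action plan.
function prioritize(issues: Issue[]): Issue[] {
  return [...issues].sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}
```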
**Severity-Based Prioritization:**

```
Issue List (sorted by severity):

CRITICAL Issues (must fix all before proceeding):
1. SQL injection in user search endpoint
2. Missing authentication check on admin routes
3. Password stored in plaintext

HIGH Issues (fix before code review):
4. Memory leak in WebSocket connection
5. Missing error handling in payment flow
6. Accessibility: keyboard navigation broken

MEDIUM Issues (fix if time permits):
7. Code duplication in auth controllers
8. Inconsistent error messages
9. Missing JSDoc comments

LOW Issues (defer to future):
10. Variable naming inconsistency
11. Redundant type annotations
12. CSS could use more specificity

Action Plan:
- Fix CRITICAL (1-3) immediately → Re-run tests
- Fix HIGH (4-6) before code review
- Log MEDIUM (7-9) for next iteration
- Ignore LOW (10-12) for now
```

**Severity Escalation:**

Issues can escalate in severity based on context:

```
Context-Based Escalation:

Issue: "Missing error handling in payment flow"
Base Severity: MEDIUM (code quality issue)

Context 1: Development environment
→ Severity: MEDIUM (not user-facing yet)

Context 2: Production environment
→ Severity: HIGH (affects real users, money involved)

Context 3: Production + recent payment failures
→ Severity: CRITICAL (actively causing issues)

Rule: Escalate severity when:
- Issue affects production users
- Issue involves money/security/data
- Issue is currently causing failures
```

---

### Pattern 4: Multi-Reviewer Consensus

**Consensus Levels:**

When multiple reviewers evaluate the same code/design:

```
UNANIMOUS (100% agreement):
- ALL reviewers flagged this issue
- VERY HIGH confidence
- Highest priority (likely a real problem)

Example:
3/3 reviewers: "SQL injection in search endpoint"
→ UNANIMOUS consensus
→ CRITICAL priority (all agree it's critical)

STRONG CONSENSUS (67-99% agreement):
- MOST reviewers flagged this issue
- HIGH confidence
- High priority (probably a real problem)

Example:
2/3 reviewers: "Missing input validation"
→ STRONG consensus (≈67%)
→ HIGH priority

MAJORITY (50-66% agreement):
- HALF or more flagged this issue
- MEDIUM confidence
- Medium priority (worth investigating)

Example:
2/4 reviewers: "Code duplication in controllers"
→ MAJORITY consensus (50%)
→ MEDIUM priority

DIVERGENT (< 50% agreement):
- Only 1-2 reviewers flagged this issue
- LOW confidence
- Low priority (may be model-specific or false positive)

Example:
1/3 reviewers: "Variable naming could be better"
→ DIVERGENT (≈33%)
→ LOW priority (one reviewer's opinion)
```
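The four bands reduce to a ratio check. A sketch; note that 2/3 ≈ 66.7% is treated as STRONG to match the examples above, so the boundary is expressed as `2/3` rather than `0.67`:

```typescript
type Consensus = "UNANIMOUS" | "STRONG" | "MAJORITY" | "DIVERGENT";

// Bands match the definitions above: 100%, 67-99%, 50-66%, <50%.
function consensusLevel(flaggedBy: number, totalReviewers: number): Consensus {
  const ratio = flaggedBy / totalReviewers;
  if (ratio === 1) return "UNANIMOUS";
  if (ratio >= 2 / 3) return "STRONG";
  if (ratio >= 0.5) return "MAJORITY";
  return "DIVERGENT";
}

consensusLevel(3, 3); // "UNANIMOUS"
consensusLevel(2, 3); // "STRONG" (≈67%)
consensusLevel(2, 4); // "MAJORITY" (50%)
consensusLevel(1, 3); // "DIVERGENT" (≈33%)
```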
**Consensus-Based Prioritization:**

```
Prioritized Issue List (by consensus + severity):

1. [UNANIMOUS - CRITICAL] SQL injection in search
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

2. [UNANIMOUS - HIGH] Missing input validation
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

3. [STRONG - HIGH] Memory leak in WebSocket
   MOST reviewers agree: Claude, Grok (2/3)

4. [STRONG - MEDIUM] Code duplication
   MOST reviewers agree: Claude, Gemini (2/3)

5. [DIVERGENT - LOW] Variable naming
   SINGLE reviewer: Claude only (1/3)

Action:
- Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH)
- Fix issue 3 before review (strong consensus)
- Consider issue 4 (strong consensus, but medium severity)
- Ignore issue 5 (divergent, likely false positive)
```

---
### Pattern 5: Feedback Loop Implementation

**User Feedback Loop:**

```
Workflow: User Validation with Feedback

Step 1: Initial Implementation
  Developer implements feature
  Designer/Tester validates
  Present to user for manual validation

Step 2: User Validation Gate (MANDATORY)
  Present to user:
  "Implementation complete. Please manually verify:
  - Open app at http://localhost:3000
  - Test feature: [specific instructions]
  - Compare to design reference

  Does it meet expectations? (Yes/No)"

Step 3a: User says YES
  → ✅ Feature approved
  → Generate final report
  → Mark workflow complete

Step 3b: User says NO
  → Collect specific feedback

Step 4: Collect Specific Feedback
  Ask user: "Please describe the issues you found:"

  User response:
  "1. Button color is wrong (should be blue, not green)
   2. Spacing is too tight between elements
   3. Font size is too small"

Step 5: Extract Structured Feedback
  Parse user feedback into structured issues:

  Issue 1:
    Component: Button
    Problem: Color incorrect
    Expected: Blue (#2563EB)
    Actual: Green (#10B981)
    Severity: MEDIUM

  Issue 2:
    Component: Container
    Problem: Spacing too tight
    Expected: 16px
    Actual: 8px
    Severity: MEDIUM

  Issue 3:
    Component: Text
    Problem: Font size too small
    Expected: 16px
    Actual: 14px
    Severity: LOW

Step 6: Launch Fixing Agent
  Task: ui-developer
  Prompt: "Fix user-reported issues:

  1. Button color: Change from #10B981 to #2563EB
  2. Container spacing: Increase from 8px to 16px
  3. Text font size: Increase from 14px to 16px

  User feedback: [user's exact words]"

Step 7: Re-validate
  After fixes:
  - Re-run designer validation
  - Loop back to Step 2 (user validation)

Step 8: Max Feedback Rounds
  Limit: 5 feedback rounds (prevent infinite loop)

  If round > 5:
    Escalate to human review
    "Unable to meet user expectations after 5 rounds.
     Manual intervention required."
```
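Step 5's structured extraction implies a small schema. A sketch of illustrative types for a feedback round:

```typescript
// Structured form of user feedback, as extracted in Step 5 above.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface FeedbackIssue {
  component: string; // e.g. "Button"
  problem: string;   // e.g. "Color incorrect"
  expected: string;  // e.g. "Blue (#2563EB)"
  actual: string;    // e.g. "Green (#10B981)"
  severity: Severity;
}

interface FeedbackRound {
  round: number;           // capped at 5 before escalating to human review
  issues: FeedbackIssue[]; // an empty array means the user approved
}

function isApproved(round: FeedbackRound): boolean {
  return round.issues.length === 0;
}
```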
**Feedback Round Tracking:**

```
Feedback Round History:

Round 1:
  User Issues: Button color, spacing, font size
  Fixes Applied: Updated all 3 issues
  Result: Re-validate

Round 2:
  User Issues: Border radius too large
  Fixes Applied: Reduced border radius
  Result: Re-validate

Round 3:
  User Issues: None
  Result: ✅ APPROVED

Total Rounds: 3/5
```

---
### Pattern 6: Test-Driven Development Loop

**When to Use:**

Use the TDD loop **after implementing code, before code review**:

```
Workflow Phases:

Phase 1: Architecture Planning
Phase 2: Implementation
Phase 2.5: Test-Driven Development Loop ← THIS PATTERN
Phase 3: Code Review
Phase 4: User Acceptance
```

**The TDD Loop Pattern:**

```
Step 1: Write Tests First
  Task: test-architect
  Prompt: "Write comprehensive tests for authentication feature.
           Requirements: [link to requirements]
           Implementation: [link to code]"
  Output: tests/auth.test.ts

Step 2: Run Tests
  Bash: bun test tests/auth.test.ts
  Capture output and exit code

Step 3: Check Test Results
  If all tests pass:
    → ✅ TDD loop complete
    → Proceed to code review (Phase 3)

  If tests fail:
    → Analyze failure (continue to Step 4)

Step 4: Analyze Test Failure
  Task: test-architect
  Prompt: "Analyze test failure output:

  [test failure logs]

  Determine root cause:
  - TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation)
  - IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior)

  Provide detailed analysis."

  test-architect returns:
    verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE
    analysis: Detailed explanation
    recommendation: Specific fix needed

Step 5a: If TEST_ISSUE (test is wrong)
  Task: test-architect
  Prompt: "Fix test based on analysis:
           [analysis from Step 4]"

  After fix:
    → Re-run tests (back to Step 2)
    → Loop continues

Step 5b: If IMPLEMENTATION_ISSUE (code is wrong)
  Provide structured feedback to developer:

  Task: backend-developer
  Prompt: "Fix implementation based on test failure:

  Test Failure:
  [failure output]

  Root Cause:
  [analysis from test-architect]

  Recommended Fix:
  [specific fix needed]"

  After fix:
    → Re-run tests (back to Step 2)
    → Loop continues

Step 6: Max Iteration Limit
  Limit: 10 iterations

  Iteration tracking:
    Iteration 1/10: 5 tests failed → Fix implementation
    Iteration 2/10: 2 tests failed → Fix test (bad mock)
    Iteration 3/10: All tests pass ✅

  If iteration > 10:
    Escalate to human review
    "Unable to pass all tests after 10 iterations.
     Manual debugging required."
```
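The whole loop can be summarized as a driver function. A sketch, again using the hypothetical `runTask` stand-in plus an assumed `runTests` helper wrapping `bun test`:

```typescript
// Hypothetical stand-ins: runTask launches an agent; runTests wraps `bun test`.
declare function runTask(agent: string, prompt: string): Promise<string>;
declare function runTests(): Promise<{ passed: boolean; output: string }>;

const MAX_ITERATIONS = 10;

// Drives Steps 2-6 above: run tests, route the failure to the right agent, repeat.
async function tddLoop(): Promise<boolean> {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const { passed, output } = await runTests();
    if (passed) return true; // exit: proceed to code review

    const verdict = await runTask(
      "test-architect",
      `Analyze test failure output and reply TEST_ISSUE or IMPLEMENTATION_ISSUE:\n${output}`,
    );
    if (verdict.includes("TEST_ISSUE")) {
      await runTask("test-architect", "Fix the test based on your analysis");
    } else {
      await runTask("backend-developer", `Fix implementation based on:\n${output}`);
    }
  }
  return false; // max iterations reached: escalate to manual debugging
}
```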
**Example TDD Loop:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Tests Run: 20 tests
  Results: 5 failed, 15 passed
  Failure: "JWT token validation fails with expired token"
  Analysis: IMPLEMENTATION_ISSUE - Missing expiration check
  Fix: Added expiration validation in TokenService
  Re-run: Continue to Iteration 2

Iteration 2:
  Tests Run: 20 tests
  Results: 2 failed, 18 passed
  Failure: "Mock database not reset between tests"
  Analysis: TEST_ISSUE - Missing beforeEach cleanup
  Fix: Added database reset in test setup
  Re-run: Continue to Iteration 3

Iteration 3:
  Tests Run: 20 tests
  Results: All passed ✅
  Result: TDD loop complete, proceed to code review

Total Iterations: 3/10
Duration: ~5 minutes
Benefits:
- Caught 2 bugs before code review
- Fixed 1 test quality issue
- All tests passing gives confidence in implementation
```

**Benefits of TDD Loop:**

```
Benefits:

1. Catch bugs early (before code review, not after)
2. Ensure test quality (test-architect fixes bad tests)
3. Automated quality assurance (no manual testing needed)
4. Fast feedback loop (seconds to run tests, not minutes)
5. Confidence in implementation (all tests passing)

Performance:
  Traditional: Implement → Review → Find bugs → Fix → Re-review
    Time: 30+ minutes, multiple review rounds

  TDD Loop: Implement → Test → Fix → Test → Review (with confidence)
    Time: 15 minutes, single review round (fewer issues)
```

---
## Integration with Other Skills

**quality-gates + multi-model-validation:**

```
Use Case: Cost approval before multi-model review

Step 1: Estimate costs (multi-model-validation)
Step 2: User approval gate (quality-gates)
  If approved: Proceed with parallel execution
  If rejected: Offer alternatives
Step 3: Execute review (multi-model-validation)
```

**quality-gates + multi-agent-coordination:**

```
Use Case: Iteration loop with designer validation

Step 1: Agent selection (multi-agent-coordination)
  Select designer + ui-developer

Step 2: Iteration loop (quality-gates)
  For i = 1 to 10:
  - Run designer validation
  - If PASS: Exit loop
  - Else: Delegate to ui-developer for fixes

Step 3: User validation gate (quality-gates)
  Mandatory manual approval
```

**quality-gates + error-recovery:**

```
Use Case: Test-driven loop with error recovery

Step 1: Run tests (quality-gates TDD pattern)
Step 2: If test execution fails (error-recovery)
  - Syntax error → Fix and retry
  - Framework crash → Notify user, skip TDD
Step 3: If tests pass (quality-gates)
  - Proceed to code review
```

---

## Best Practices

**Do:**
- ✅ Set max iteration limits (prevent infinite loops)
- ✅ Define clear exit criteria (PASS, max iterations, user override)
- ✅ Track iteration history (document what happened)
- ✅ Show progress to user ("Iteration 3/10 complete")
- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW)
- ✅ Prioritize by consensus + severity
- ✅ Ask user approval for expensive operations
- ✅ Collect specific feedback (not vague complaints)
- ✅ Use TDD loop to catch bugs early

**Don't:**
- ❌ Create infinite loops (no exit criteria)
- ❌ Skip user validation gates (mandatory for UX)
- ❌ Ignore consensus (unanimous issues are real)
- ❌ Batch all severities together (prioritize CRITICAL)
- ❌ Proceed without approval for >$0.01 operations
- ❌ Collect vague feedback ("it's wrong" → what specifically?)
- ❌ Skip TDD loop (catches bugs before expensive review)

**Performance:**
- Iteration loops: 5-10 iterations typical, max 10-15 min
- TDD loop: 3-5 iterations typical, max 5-10 min
- User feedback: 1-3 rounds typical, max 5 rounds

---
## Examples

### Example 1: User Approval Gate for Multi-Model Review

**Scenario:** User requests multi-model review, costs $0.008

**Execution:**

```
Step 1: Estimate Costs
  Input: 450 lines × 1.5 = 675 tokens per model
  Output: 2000-4000 tokens per model
  Total: 3 models × 3000 avg = 9000 output tokens
  Cost: ~$0.008 ($0.005 - $0.010)

Step 2: Present Approval Gate
  "Multi-model review will analyze 450 lines with 3 AI models:
  - Claude Sonnet (embedded, free)
  - Grok Code Fast (external, $0.002)
  - Gemini 2.5 Flash (external, $0.001)

  Estimated cost: $0.008 ($0.005 - $0.010)
  Duration: ~5 minutes

  Proceed? (Yes/No/Cancel)"

Step 3a: User says YES
  → Proceed with parallel execution
  → Track approval: log("User approved $0.008 cost")

Step 3b: User says NO
  → Offer alternatives:
    1. Use only free Claude (no external models)
    2. Use only 1 external model (reduce cost to $0.002)
    3. Skip review entirely
  → Ask user to choose

Step 3c: User says CANCEL
  → Exit gracefully
  → Log: "User cancelled multi-model review"
  → Clean up temporary files
```

---
### Example 2: Designer Validation Iteration Loop
|
||||
|
||||
**Scenario:** UI implementation with automated iteration until PASS
|
||||
|
||||
**Execution:**
|
||||
|
||||
```
|
||||
Iteration 1:
|
||||
Task: designer
|
||||
Prompt: "Validate navbar against Figma design"
|
||||
Output: ai-docs/design-review-1.md
|
||||
Assessment: NEEDS IMPROVEMENT
|
||||
Issues:
|
||||
- Button color: #3B82F6 (expected #2563EB)
|
||||
- Spacing: 8px (expected 16px)
|
||||
|
||||
Task: ui-developer
|
||||
Prompt: "Fix issues from ai-docs/design-review-1.md"
|
||||
Changes: Updated button color, increased spacing
|
||||
|
||||
Result: Continue to Iteration 2
|
||||
|
||||
Iteration 2:
|
||||
Task: designer
|
||||
Prompt: "Re-validate navbar"
|
||||
Output: ai-docs/design-review-2.md
|
||||
Assessment: NEEDS IMPROVEMENT
|
||||
Issues:
|
||||
- Border radius: 8px (expected 4px)
|
||||
|
||||
Task: ui-developer
|
||||
Prompt: "Fix border radius issue"
|
||||
Changes: Reduced border radius to 4px
|
||||
|
||||
Result: Continue to Iteration 3
|
||||
|
||||
Iteration 3:
|
||||
Task: designer
|
||||
Prompt: "Re-validate navbar"
|
||||
Output: ai-docs/design-review-3.md
|
||||
Assessment: PASS ✓
|
||||
Issues: None
|
||||
|
||||
Result: Exit loop (success)
|
||||
|
||||
Summary:
|
||||
Total Iterations: 3/10
|
||||
Duration: ~8 minutes
|
||||
Automated Fixes: 3 issues resolved
|
||||
Result: PASS, proceed to user validation
|
||||
```
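
The loop structure behind this example can be sketched in a few lines of TypeScript. `runDesignerValidation`, `runUiDeveloperFixes`, and `escalateToUser` are hypothetical stand-ins for the Task-tool delegations shown above:

```typescript
interface DesignReview {
  assessment: "PASS" | "NEEDS IMPROVEMENT";
  issues: string[];
  reportPath: string; // e.g. ai-docs/design-review-1.md
}

// Hypothetical agent delegations (stand-ins for Task: designer / ui-developer).
declare function runDesignerValidation(iteration: number): Promise<DesignReview>;
declare function runUiDeveloperFixes(reportPath: string): Promise<void>;
declare function escalateToUser(lastReview: DesignReview): void;

async function designValidationLoop(maxIterations = 10): Promise<boolean> {
  let last: DesignReview | undefined;
  for (let i = 1; i <= maxIterations; i++) {
    last = await runDesignerValidation(i);
    console.log(`Iteration ${i}/${maxIterations}: ${last.assessment}`);
    if (last.assessment === "PASS") return true; // exit loop, go to user gate
    await runUiDeveloperFixes(last.reportPath);  // delegate fixes, re-validate
  }
  if (last) escalateToUser(last); // max iterations hit without PASS
  return false;
}
```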

---

### Example 3: Test-Driven Development Loop

**Scenario:** Authentication implementation with TDD

**Execution:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Task: test-architect
  Prompt: "Write tests for authentication feature"
  Output: tests/auth.test.ts (20 tests)

  Bash: bun test tests/auth.test.ts
  Result: 5 failed, 15 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: IMPLEMENTATION_ISSUE
  Analysis: "Missing JWT expiration validation"

  Task: backend-developer
  Prompt: "Add JWT expiration validation"
  Changes: Updated TokenService.verify()

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 2

Iteration 2:
  Bash: bun test tests/auth.test.ts
  Result: 2 failed, 18 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: TEST_ISSUE
  Analysis: "Mock database not reset between tests"

  Task: test-architect
  Prompt: "Fix test setup"
  Changes: Added beforeEach cleanup

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 3

Iteration 3:
  Bash: bun test tests/auth.test.ts
  Result: All 20 passed ✅

  Result: TDD loop complete, proceed to code review

Summary:
  Total Iterations: 3/10
  Duration: ~5 minutes
  Bugs Caught: 1 implementation bug, 1 test bug
  Result: All tests passing, high confidence in code
```
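
One detail worth making explicit is the failure-analysis routing: an `IMPLEMENTATION_ISSUE` verdict goes to the developer agent, while a `TEST_ISSUE` verdict goes back to the test author. A hedged TypeScript sketch, with the agent helpers as hypothetical stand-ins:

```typescript
type Verdict = "IMPLEMENTATION_ISSUE" | "TEST_ISSUE";

interface FailureAnalysis { verdict: Verdict; analysis: string }

// Hypothetical stand-ins for the test-architect / backend-developer agents.
declare function analyzeFailures(testOutput: string): Promise<FailureAnalysis>;
declare function delegateToBackendDeveloper(analysis: string): Promise<void>;
declare function delegateToTestArchitect(analysis: string): Promise<void>;

async function routeFailure(testOutput: string): Promise<void> {
  const { verdict, analysis } = await analyzeFailures(testOutput);
  if (verdict === "IMPLEMENTATION_ISSUE") {
    // Code is wrong, tests are right: the implementer fixes the code.
    await delegateToBackendDeveloper(analysis);
  } else {
    // Tests themselves are broken (e.g. shared mock state): fix the tests.
    await delegateToTestArchitect(analysis);
  }
}
```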

---

## Troubleshooting

**Problem: Infinite iteration loop**

Cause: No exit criteria or max iteration limit

Solution: Always set max iterations (10 for automated, 5 for user feedback)

```
❌ Wrong:
while (true) {
  if (review.assessment === "PASS") break;
  fix();
}

✅ Correct:
for (let i = 1; i <= 10; i++) {
  const review = runValidation();
  if (review.assessment === "PASS") break;
  if (i === 10) { escalateToUser(); break; }
  fix();
}
```

---

**Problem: User approval skipped for expensive operation**

Cause: Missing approval gate

Solution: Always ask approval for costs >$0.01

```
❌ Wrong:
if (userRequestedMultiModel) {
  executeReview();
}

✅ Correct:
if (userRequestedMultiModel) {
  const cost = estimateCost();
  if (cost > 0.01) {
    const approved = await askUserApproval(cost);
    if (!approved) return offerAlternatives();
  }
  executeReview();
}
```

---

**Problem: All issues treated equally**

Cause: No severity classification

Solution: Classify by severity, prioritize CRITICAL

```
❌ Wrong:
issues.forEach(issue => fix(issue));

✅ Correct:
const critical = issues.filter(i => i.severity === "CRITICAL");
const high = issues.filter(i => i.severity === "HIGH");

critical.forEach(issue => fix(issue)); // Fix critical first
high.forEach(issue => fix(issue));     // Then high
// MEDIUM and LOW deferred or skipped
```

---

## Summary

Quality gates ensure high-quality results through:

- **User approval gates** (cost, quality, final validation)
- **Iteration loops** (automated refinement, max 10 iterations)
- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW)
- **Consensus prioritization** (unanimous → strong → majority → divergent)
- **Feedback loops** (collect specific issues, fix, re-validate)
- **Test-driven development** (write tests, run, fix, repeat until pass)

Master these patterns and your workflows will consistently produce high-quality, validated results.

---

**Extracted From:**
- `/review` command (user approval for costs, consensus analysis)
- `/validate-ui` command (iteration loops, user validation gates, feedback collection)
- `/implement` command (PHASE 2.5 test-driven development loop)
- Multi-model review patterns (consensus-based prioritization)
983
skills/todowrite-orchestration/SKILL.md
Normal file
@@ -0,0 +1,983 @@
---
name: todowrite-orchestration
description: Track progress in multi-phase workflows with TodoWrite. Use when orchestrating 5+ phase commands, managing iteration loops, tracking parallel tasks, or providing real-time progress visibility. Trigger keywords - "phase tracking", "progress", "workflow", "multi-step", "multi-phase", "todo", "tracking", "status".
version: 0.1.0
tags: [orchestration, todowrite, progress, tracking, workflow, multi-phase]
keywords: [phase-tracking, progress, workflow, multi-step, multi-phase, todo, tracking, status, visibility]
---

# TodoWrite Orchestration

**Version:** 0.1.0
**Purpose:** Patterns for using TodoWrite in complex multi-phase workflows
**Status:** Production Ready

## Overview

TodoWrite orchestration is the practice of using the TodoWrite tool to provide **real-time progress visibility** in complex multi-phase workflows. It transforms opaque "black box" workflows into transparent, trackable processes where users can see:

- What phase is currently executing
- How many phases remain
- Which tasks are pending, in-progress, or completed
- Overall progress percentage
- Iteration counts in loops

This skill provides battle-tested patterns for:
- **Phase initialization** (create complete task list before starting)
- **Task granularity** (how to break phases into trackable tasks)
- **Status transitions** (pending → in_progress → completed)
- **Real-time updates** (mark complete immediately, not batched)
- **Iteration tracking** (progress through loops)
- **Parallel task tracking** (multiple agents executing simultaneously)

TodoWrite orchestration is especially valuable for workflows with >5 phases or >10 minutes duration, where users need progress feedback.

## Core Patterns

### Pattern 1: Phase Initialization

**Create TodoWrite List BEFORE Starting:**

Initialize TodoWrite as **step 0** of your workflow, before any actual work begins:

```
✅ CORRECT - Initialize First:

Step 0: Initialize TodoWrite
  TodoWrite: Create task list
    - PHASE 1: Gather user inputs
    - PHASE 1: Validate inputs
    - PHASE 2: Select AI models
    - PHASE 2: Estimate costs
    - PHASE 2: Get user approval
    - PHASE 3: Launch parallel reviews
    - PHASE 3: Wait for all reviews
    - PHASE 4: Consolidate reviews
    - PHASE 5: Present results

Step 1: Start actual work (PHASE 1)
  Mark "PHASE 1: Gather user inputs" as in_progress
  ... do work ...
  Mark "PHASE 1: Gather user inputs" as completed
  Mark "PHASE 1: Validate inputs" as in_progress
  ... do work ...

❌ WRONG - Create During Workflow:

Step 1: Do some work
  ... work happens ...
  TodoWrite: Create task "Did some work" (completed)

Step 2: Do more work
  ... work happens ...
  TodoWrite: Create task "Did more work" (completed)

Problem: User has no visibility into upcoming phases
```

**List All Phases Upfront:**

When initializing, include **all phases** in the task list, not just the current phase:

```
✅ CORRECT - Complete Visibility:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs
[ ] PHASE 2: Architecture planning
[ ] PHASE 3: Implementation
[ ] PHASE 3: Run quality checks
[ ] PHASE 4: Code review
[ ] PHASE 5: User acceptance
[ ] PHASE 6: Generate report

User sees: "8 tasks total, 0 complete, Phase 1 starting"

❌ WRONG - Incremental Discovery:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs

(User thinks workflow is 2 tasks, then surprised by 6 more phases)
```

**Why Initialize First:**

1. **User expectation setting:** User knows workflow scope (8 phases, ~20 minutes)
2. **Progress visibility:** User can see % complete (3/8 = 37.5%)
3. **Time estimation:** User can estimate remaining time based on progress
4. **Transparency:** No hidden phases or surprises

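A minimal TypeScript sketch of the data shape behind this pattern; the `Todo` type and `initTodos` helper are illustrative, not the actual TodoWrite API:

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  content: string;
  status: TodoStatus;
}

// Step 0: build the complete task list before any work starts.
function initTodos(phases: string[]): Todo[] {
  return phases.map((content) => ({ content, status: "pending" }));
}

const todos = initTodos([
  "PHASE 1: Gather user inputs",
  "PHASE 1: Validate inputs",
  "PHASE 2: Architecture planning",
  "PHASE 3: Implementation",
  "PHASE 3: Run quality checks",
  "PHASE 4: Code review",
  "PHASE 5: User acceptance",
  "PHASE 6: Generate report",
]);

const done = todos.filter((t) => t.status === "completed").length;
console.log(`${done}/${todos.length} tasks complete`); // "0/8 tasks complete"
```
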
---

### Pattern 2: Task Granularity Guidelines

**One Task Per Significant Operation:**

Each task should represent a **significant operation** (1-5 minutes of work):

```
✅ CORRECT - Significant Operations:

Tasks:
- PHASE 1: Ask user for inputs (30s)
- PHASE 2: Generate architecture plan (2 min)
- PHASE 3: Implement feature (5 min)
- PHASE 4: Run tests (1 min)
- PHASE 5: Code review (3 min)

Each task = meaningful unit of work

❌ WRONG - Too Granular:

Tasks:
- PHASE 1: Ask user question 1
- PHASE 1: Ask user question 2
- PHASE 1: Ask user question 3
- PHASE 2: Read file A
- PHASE 2: Read file B
- PHASE 2: Write file C
- ... (50 micro-tasks)

Problem: Too many updates, clutters user interface
```

**Multi-Step Phases: Break Into 2-3 Sub-Tasks:**

For complex phases (>5 minutes), break into 2-3 sub-tasks:

```
✅ CORRECT - Sub-Task Breakdown:

PHASE 3: Implementation (15 min total)
→ Sub-tasks:
  - PHASE 3: Implement core logic (5 min)
  - PHASE 3: Add error handling (3 min)
  - PHASE 3: Write tests (7 min)

User sees progress within phase: "PHASE 3: 2/3 complete"

❌ WRONG - Single Monolithic Task:

PHASE 3: Implementation (15 min)
→ No sub-tasks

Problem: User sees "in_progress" for 15 min with no updates
```

**Avoid Too Many Tasks:**

Limit to **max 15-20 tasks** for readability:

```
✅ CORRECT - 12 Tasks (readable):

8-phase workflow:
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Fix issues
- PHASE 7: Re-review
- PHASE 8: Accept

Total: 12 tasks (clean, trackable)

❌ WRONG - 50 Tasks (overwhelming):

Every single action as separate task:
- Read file 1
- Read file 2
- Write file 3
- Run command 1
- ... (50 tasks)

Problem: User overwhelmed, can't see forest for trees
```

**Guideline by Workflow Duration:**

```
Workflow Duration → Task Count:

< 5 minutes: 3-5 tasks
5-15 minutes: 8-12 tasks
15-30 minutes: 12-18 tasks
> 30 minutes: 15-20 tasks (if more, group into phases)

Example:
5-minute workflow (3 phases):
- PHASE 1: Prepare
- PHASE 2: Execute
- PHASE 3: Present
Total: 3 tasks ✓

20-minute workflow (6 phases):
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Accept
Total: 10 tasks ✓
```

---

### Pattern 3: Status Transitions

**Exactly ONE Task In Progress at a Time:**

Maintain the invariant: **exactly one task in_progress** at any moment:

```
✅ CORRECT - One In-Progress:

State at time T1:
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[→] PHASE 3: Implement (in_progress) ← Only one
[ ] PHASE 4: Test (pending)
[ ] PHASE 5: Review (pending)

State at time T2 (after PHASE 3 completes):
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[✓] PHASE 3: Implement (completed)
[→] PHASE 4: Test (in_progress) ← Only one
[ ] PHASE 5: Review (pending)

❌ WRONG - Multiple In-Progress:

State:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Two in-progress?
[→] PHASE 3: Implement (in_progress) ← Confusing!
[ ] PHASE 4: Test (pending)

Problem: User confused about current phase
```

**Status Transition Sequence:**

```
Lifecycle of a Task:

1. Created: pending
   (Task exists, not started yet)

2. Started: pending → in_progress
   (Mark as in_progress when starting work)

3. Completed: in_progress → completed
   (Mark as completed immediately after finishing)

4. Next task: Mark next task as in_progress
   (Continue to next task)

Example Timeline:

T=0s: [→] Task 1 (in_progress), [ ] Task 2 (pending)
T=30s: [✓] Task 1 (completed), [→] Task 2 (in_progress)
T=60s: [✓] Task 1 (completed), [✓] Task 2 (completed)
```

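These transition rules can be made explicit with a small helper. This is an illustrative sketch over the hypothetical `Todo` type from the earlier example, not real TodoWrite internals:

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";
interface Todo { content: string; status: TodoStatus }

// Complete the current task and start the next one in a single step,
// preserving the "exactly one in_progress" invariant.
function advance(todos: Todo[]): void {
  const current = todos.find((t) => t.status === "in_progress");
  if (current) current.status = "completed"; // mark completed immediately
  const next = todos.find((t) => t.status === "pending");
  if (next) next.status = "in_progress";     // then start the next task
}

const todos: Todo[] = [
  { content: "PHASE 1: Ask user", status: "in_progress" },
  { content: "PHASE 2: Plan", status: "pending" },
  { content: "PHASE 3: Implement", status: "pending" },
];

advance(todos); // PHASE 1 -> completed, PHASE 2 -> in_progress
console.log(todos.filter((t) => t.status === "in_progress").length); // 1
```
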
**NEVER Batch Completions:**

Mark tasks completed **immediately** after finishing, not at end of phase:

```
✅ CORRECT - Immediate Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...
Mark "PHASE 1: Ask user" as completed ← Immediate

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...
Mark "PHASE 1: Validate inputs" as completed ← Immediate

User sees real-time progress

❌ WRONG - Batched Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...

(At end of PHASE 1, batch update both to completed)

Problem: User doesn't see progress for 50s, thinks workflow is stuck
```

---

### Pattern 4: Real-Time Progress Tracking

**Update TodoWrite As Work Progresses:**

TodoWrite should reflect **current state**, not past state:

```
✅ CORRECT - Real-Time Updates:

T=0s: Initialize TodoWrite (8 tasks, all pending)
T=5s: Mark "PHASE 1" as in_progress
T=35s: Mark "PHASE 1" as completed, "PHASE 2" as in_progress
T=90s: Mark "PHASE 2" as completed, "PHASE 3" as in_progress
...

User always sees accurate current state

❌ WRONG - Delayed Updates:

T=0s: Initialize TodoWrite
T=300s: Workflow completes
T=301s: Update all tasks to completed

Problem: No progress visibility for 5 minutes
```

**Add New Tasks If Discovered During Execution:**

If you discover additional work during execution, add new tasks:

```
Scenario: During implementation, realize refactoring needed

Initial TodoWrite:
[✓] PHASE 1: Plan
[→] PHASE 2: Implement
[ ] PHASE 3: Test
[ ] PHASE 4: Review

During PHASE 2, discover:
  "Implementation requires refactoring legacy code"

Updated TodoWrite:
[✓] PHASE 1: Plan
[✓] PHASE 2: Implement core logic (completed)
[→] PHASE 2: Refactor legacy code (in_progress) ← New task added
[ ] PHASE 3: Test
[ ] PHASE 4: Review

User sees: "Additional work discovered: refactoring. Total now 5 tasks."
```

**User Can See Current Progress at Any Time:**

With real-time updates, the user can check progress at any point:

```
User checks at T=120s:

TodoWrite State:
[✓] PHASE 1: Ask user
[✓] PHASE 1: Validate inputs
[✓] PHASE 2: Plan architecture
[→] PHASE 3: Implement core logic (in_progress)
[ ] PHASE 3: Add error handling
[ ] PHASE 3: Write tests
[ ] PHASE 4: Code review
[ ] PHASE 5: Accept

User sees: "3/8 tasks complete (37.5%), currently implementing core logic"
```

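A sketch of the two operations this pattern relies on, again over the illustrative `Todo` shape rather than the real TodoWrite API: inserting a newly discovered task next to the current one, and summarizing progress on demand:

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";
interface Todo { content: string; status: TodoStatus }

// Insert newly discovered work right after the task that revealed it.
function addDiscoveredTask(todos: Todo[], content: string): void {
  const i = todos.findIndex((t) => t.status === "in_progress");
  todos.splice(i + 1, 0, { content, status: "pending" });
}

// Progress summary the user can request at any time.
function progress(todos: Todo[]): string {
  const done = todos.filter((t) => t.status === "completed").length;
  const current = todos.find((t) => t.status === "in_progress");
  const pct = ((done / todos.length) * 100).toFixed(1);
  return `${done}/${todos.length} tasks complete (${pct}%)` +
    (current ? `, currently: ${current.content}` : "");
}

const todos: Todo[] = [
  { content: "PHASE 1: Plan", status: "completed" },
  { content: "PHASE 2: Implement", status: "in_progress" },
  { content: "PHASE 3: Test", status: "pending" },
];
addDiscoveredTask(todos, "PHASE 2: Refactor legacy code");
console.log(progress(todos)); // "1/4 tasks complete (25.0%), currently: PHASE 2: Implement"
```
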
---

### Pattern 5: Iteration Loop Tracking

**Create Task Per Iteration:**

For iteration loops, create a task for each iteration:

```
✅ CORRECT - Iteration Tasks:

Design Validation Loop (max 10 iterations):

Initial TodoWrite:
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
... (create all 10 upfront)

Progress:
[✓] Iteration 1/10: Designer validation (NEEDS IMPROVEMENT)
[✓] Iteration 2/10: Designer validation (NEEDS IMPROVEMENT)
[→] Iteration 3/10: Designer validation (in_progress)
[ ] Iteration 4/10: Designer validation
...

User sees: "Iteration 3/10 in progress, 2 complete"

❌ WRONG - Single Loop Task:

TodoWrite:
[→] Design validation loop (in_progress)

Problem: User sees "in_progress" for 10 minutes, no iteration visibility
```

**Mark Iteration Complete When Done:**

```
Iteration Lifecycle:

Iteration 1:
  Mark "Iteration 1/10" as in_progress
  Run designer validation
  If NEEDS IMPROVEMENT: Run developer fixes
  Mark "Iteration 1/10" as completed

Iteration 2:
  Mark "Iteration 2/10" as in_progress
  Run designer validation
  If PASS: Exit loop early
  Mark "Iteration 2/10" as completed

Result: Loop exited after 2 iterations
  [✓] Iteration 1/10 (completed)
  [✓] Iteration 2/10 (completed)
  [ ] Iteration 3/10 (not needed, loop exited)
  ...

User sees: "Loop completed in 2/10 iterations"
```

**Track Total Iterations vs Max Limit:**

```
Iteration Progress:

Max: 10 iterations
Current: 5

TodoWrite State:
[✓] Iteration 1/10
[✓] Iteration 2/10
[✓] Iteration 3/10
[✓] Iteration 4/10
[→] Iteration 5/10
[ ] Iteration 6/10
...

User sees: "Iteration 5/10 (50% through max)"

Warning at Iteration 8:
  "Iteration 8/10 - approaching max, may escalate to user if not PASS"
```

**Clear Progress Visibility:**

```
Iteration Loop with TodoWrite:

User Request: "Validate UI design"

TodoWrite:
[✓] PHASE 1: Gather design reference
[✓] Iteration 1/10: Designer validation (5 issues found)
[✓] Iteration 2/10: Designer validation (3 issues found)
[✓] Iteration 3/10: Designer validation (1 issue found)
[→] Iteration 4/10: Designer validation (in_progress)
[ ] Iteration 5/10: Designer validation
...
[ ] PHASE 3: User validation gate

User sees:
- 3 iterations completed, iteration 4 in progress (40% through max)
- Issues decreasing each iteration (5 → 3 → 1)
- Progress toward PASS
```

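Putting Pattern 5 together: a sketch of an iteration loop that pre-creates one illustrative todo per iteration, marks each as it runs, and exits early on PASS. `runDesignerValidation` is a hypothetical stand-in for the real agent call:

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";
interface Todo { content: string; status: TodoStatus }

declare function runDesignerValidation(): Promise<{ pass: boolean; issues: number }>;

async function trackedIterationLoop(maxIterations = 10): Promise<boolean> {
  // Create all iteration tasks upfront for full visibility.
  const todos: Todo[] = Array.from({ length: maxIterations }, (_, i) => ({
    content: `Iteration ${i + 1}/${maxIterations}: Designer validation`,
    status: "pending",
  }));

  for (let i = 0; i < maxIterations; i++) {
    todos[i].status = "in_progress";
    const result = await runDesignerValidation();
    todos[i].status = "completed"; // mark immediately, never batch
    console.log(`Iteration ${i + 1}/${maxIterations}: ` +
      (result.pass ? "PASS" : `${result.issues} issues found`));
    if (result.pass) return true; // early exit; remaining todos stay pending
    // (delegate fixes to ui-developer here before the next iteration)
  }
  return false; // max iterations reached -> escalate to user
}
```
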
---

### Pattern 6: Parallel Task Tracking

**Multiple Agents Executing Simultaneously:**

When running agents in parallel, track each separately:

```
✅ CORRECT - Separate Tasks for Parallel Agents:

Multi-Model Review (3 models in parallel):

TodoWrite:
[✓] PHASE 1: Prepare review context
[→] PHASE 2: Claude review (in_progress)
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

Note: 3 tasks "in_progress" is OK for parallel execution
(Exception to "one in_progress" rule)

As models complete:
[✓] PHASE 1: Prepare review context
[✓] PHASE 2: Claude review (completed) ← First to finish
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

User sees: "1/3 reviews complete, 2 in progress"

❌ WRONG - Single Task for Parallel Work:

TodoWrite:
[✓] PHASE 1: Prepare
[→] PHASE 2: Run 3 reviews (in_progress)
[ ] PHASE 3: Consolidate

Problem: No visibility into which reviews are complete
```

**Update As Each Agent Completes:**

```
Parallel Execution Timeline:

T=0s: Launch 3 reviews in parallel
  [→] Claude review (in_progress)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=60s: Claude completes first
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=120s: Gemini completes
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [✓] Gemini review (completed)

T=180s: Grok completes
  [✓] Claude review (completed)
  [✓] Grok review (completed)
  [✓] Gemini review (completed)

User sees real-time completion updates
```

**Progress Indicators During Long Parallel Tasks:**

```
For long-running parallel tasks (>2 minutes), show progress:

T=0s: "Launching 5 AI model reviews (estimated 5 minutes)..."
T=60s: "1/5 reviews complete..."
T=120s: "2/5 reviews complete..."
T=180s: "4/5 reviews complete, 1 in progress..."
T=240s: "All reviews complete! Consolidating results..."

TodoWrite mirrors this (snapshot at 2/5 complete):
[✓] Claude review (1/5 complete)
[✓] Grok review (2/5 complete)
[→] Gemini review (in_progress)
[→] GPT-5 review (in_progress)
[→] DeepSeek review (in_progress)
```

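A sketch of per-completion updates for parallel work using plain Promises; the model names and the `runReview` helper are hypothetical stand-ins for the real parallel launches:

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";
interface Todo { content: string; status: TodoStatus }

declare function runReview(model: string): Promise<string>; // resolves to report path

async function parallelReviews(models: string[]): Promise<void> {
  const todos: Todo[] = models.map((m) => ({
    content: `PHASE 2: ${m} review`,
    status: "in_progress", // all in_progress at once: the parallel exception
  }));

  let done = 0;
  // Attach a completion handler to each review instead of awaiting in order,
  // so the todo list updates the moment any model finishes.
  await Promise.all(
    models.map((m, i) =>
      runReview(m).then(() => {
        todos[i].status = "completed";
        done += 1;
        console.log(`${done}/${models.length} reviews complete`);
      })
    )
  );
  console.log("All reviews complete! Consolidating results...");
}

void parallelReviews(["Claude", "Grok", "Gemini", "GPT-5", "DeepSeek"]);
```
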
---

## Integration with Other Skills

**todowrite-orchestration + multi-agent-coordination:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
  Create task list for all 8 phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
  Phase 1: api-architect
    Mark PHASE 1 as in_progress
    Delegate to api-architect
    Mark PHASE 1 as completed

  Phase 2: backend-developer
    Mark PHASE 2 as in_progress
    Delegate to backend-developer
    Mark PHASE 2 as completed

  ... continue for all phases
```

**todowrite-orchestration + multi-model-validation:**

```
Use Case: Multi-model review with progress tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] PHASE 1: Prepare context
  [ ] PHASE 2: Launch reviews (5 models)
  [ ] PHASE 3: Consolidate results

Step 2: Parallel Execution (multi-model-validation)
  Mark "PHASE 2: Launch reviews" as in_progress
  Launch all 5 models simultaneously
  As each completes: Update progress (1/5, 2/5, ...)
  Mark "PHASE 2: Launch reviews" as completed

Step 3: Real-Time Visibility (todowrite-orchestration)
  User sees: "PHASE 2: 3/5 reviews complete..."
```

**todowrite-orchestration + quality-gates:**

```
Use Case: Iteration loop with TodoWrite tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] Iteration 1/10
  [ ] Iteration 2/10
  ...

Step 2: Iteration Loop (quality-gates)
  For i = 1 to 10:
    Mark "Iteration i/10" as in_progress
    Run designer validation
    If PASS: Exit loop
    Mark "Iteration i/10" as completed

Step 3: Progress Visibility
  User sees: "Iteration 5/10 complete, 5 remaining"
```

---

## Best Practices

**Do:**
- ✅ Initialize TodoWrite BEFORE starting work (step 0)
- ✅ List ALL phases upfront (user sees complete scope)
- ✅ Use 8-15 tasks for typical workflows (readable)
- ✅ Mark completed IMMEDIATELY after finishing (real-time)
- ✅ Keep exactly ONE task in_progress (except parallel tasks)
- ✅ Track iterations separately (Iteration 1/10, 2/10, ...)
- ✅ Update as work progresses (not batched at end)
- ✅ Add new tasks if discovered during execution

**Don't:**
- ❌ Create TodoWrite during workflow (initialize first)
- ❌ Hide phases from user (list all upfront)
- ❌ Create too many tasks (>20 overwhelms user)
- ❌ Batch completions at end of phase (update real-time)
- ❌ Leave multiple tasks in_progress (pick one)
- ❌ Use single task for loop (track iterations separately)
- ❌ Update only at start/end (update during execution)

**Performance:**
- TodoWrite overhead: <1s per update (negligible)
- User visibility benefit: Reduces perceived wait time 30-50%
- Workflow confidence: User knows progress, less likely to cancel

---

## Examples

### Example 1: 8-Phase Implementation Workflow

**Scenario:** Full-cycle implementation with TodoWrite tracking

**Execution:**

```
Step 0: Initialize TodoWrite
  TodoWrite: Create task list
    [ ] PHASE 1: Ask user for requirements
    [ ] PHASE 2: Generate architecture plan
    [ ] PHASE 3: Implement core logic
    [ ] PHASE 3: Add error handling
    [ ] PHASE 3: Write tests
    [ ] PHASE 4: Run test suite
    [ ] PHASE 5: Code review
    [ ] PHASE 6: Fix review issues
    [ ] PHASE 7: User acceptance
    [ ] PHASE 8: Generate report

  User sees: "10 tasks, 0 complete, Phase 1 starting..."

Step 1: PHASE 1
  Mark "PHASE 1: Ask user" as in_progress
  ... gather requirements (30s) ...
  Mark "PHASE 1: Ask user" as completed
  User sees: "1/10 tasks complete (10%)"

Step 2: PHASE 2
  Mark "PHASE 2: Architecture plan" as in_progress
  ... generate plan (2 min) ...
  Mark "PHASE 2: Architecture plan" as completed
  User sees: "2/10 tasks complete (20%)"

Step 3: PHASE 3 (3 sub-tasks)
  Mark "PHASE 3: Implement core" as in_progress
  ... implement (3 min) ...
  Mark "PHASE 3: Implement core" as completed
  User sees: "3/10 tasks complete (30%)"

  Mark "PHASE 3: Add error handling" as in_progress
  ... add error handling (2 min) ...
  Mark "PHASE 3: Add error handling" as completed
  User sees: "4/10 tasks complete (40%)"

  Mark "PHASE 3: Write tests" as in_progress
  ... write tests (3 min) ...
  Mark "PHASE 3: Write tests" as completed
  User sees: "5/10 tasks complete (50%)"

... continue through all phases ...

Final State:
  [✓] All 10 tasks completed
  User sees: "10/10 tasks complete (100%). Workflow finished!"

Total Duration: ~15 minutes
User Experience: Continuous progress updates every 1-3 minutes
```

---

### Example 2: Iteration Loop with Progress Tracking

**Scenario:** Design validation with 10 max iterations

**Execution:**

```
Step 0: Initialize TodoWrite
  TodoWrite: Create task list
    [ ] PHASE 1: Gather design reference
    [ ] Iteration 1/10: Designer validation
    [ ] Iteration 2/10: Designer validation
    [ ] Iteration 3/10: Designer validation
    [ ] Iteration 4/10: Designer validation
    [ ] Iteration 5/10: Designer validation
    ... (10 iterations total)
    [ ] PHASE 3: User validation gate

Step 1: PHASE 1
  Mark "PHASE 1: Gather design" as in_progress
  ... gather design (20s) ...
  Mark "PHASE 1: Gather design" as completed

Step 2: Iteration Loop
  Iteration 1:
    Mark "Iteration 1/10" as in_progress
    Designer: "NEEDS IMPROVEMENT - 5 issues"
    Developer: Fix 5 issues
    Mark "Iteration 1/10" as completed
    User sees: "Iteration 1/10 complete, 5 issues fixed"

  Iteration 2:
    Mark "Iteration 2/10" as in_progress
    Designer: "NEEDS IMPROVEMENT - 3 issues"
    Developer: Fix 3 issues
    Mark "Iteration 2/10" as completed
    User sees: "Iteration 2/10 complete, 3 issues fixed"

  Iteration 3:
    Mark "Iteration 3/10" as in_progress
    Designer: "NEEDS IMPROVEMENT - 1 issue"
    Developer: Fix 1 issue
    Mark "Iteration 3/10" as completed
    User sees: "Iteration 3/10 complete, 1 issue fixed"

  Iteration 4:
    Mark "Iteration 4/10" as in_progress
    Designer: "PASS ✓"
    Mark "Iteration 4/10" as completed
    Exit loop (early exit)
    User sees: "Loop completed in 4/10 iterations"

Step 3: PHASE 3
  Mark "PHASE 3: User validation" as in_progress
  ... user validates ...
  Mark "PHASE 3: User validation" as completed

Final State:
  [✓] PHASE 1: Gather design
  [✓] Iteration 1/10 (5 issues fixed)
  [✓] Iteration 2/10 (3 issues fixed)
  [✓] Iteration 3/10 (1 issue fixed)
  [✓] Iteration 4/10 (PASS)
  [ ] Iteration 5/10 (not needed)
  ...
  [✓] PHASE 3: User validation

User Experience: Clear iteration progress, early exit visible
```

---

### Example 3: Parallel Multi-Model Review

**Scenario:** 5 AI models reviewing code in parallel

**Execution:**

```
Step 0: Initialize TodoWrite
  TodoWrite: Create task list
    [ ] PHASE 1: Prepare review context
    [ ] PHASE 2: Claude review
    [ ] PHASE 2: Grok review
    [ ] PHASE 2: Gemini review
    [ ] PHASE 2: GPT-5 review
    [ ] PHASE 2: DeepSeek review
    [ ] PHASE 3: Consolidate reviews
    [ ] PHASE 4: Present results

Step 1: PHASE 1
  Mark "PHASE 1: Prepare context" as in_progress
  ... prepare (30s) ...
  Mark "PHASE 1: Prepare context" as completed

Step 2: PHASE 2 (Parallel Execution)
  Mark all 5 reviews as in_progress:
    [→] Claude review
    [→] Grok review
    [→] Gemini review
    [→] GPT-5 review
    [→] DeepSeek review

  Launch all 5 in parallel (4-Message Pattern)

  As each completes:
    T=60s: Claude completes
      [✓] Claude review
      User sees: "1/5 reviews complete"

    T=90s: Gemini completes
      [✓] Gemini review
      User sees: "2/5 reviews complete"

    T=120s: GPT-5 completes
      [✓] GPT-5 review
      User sees: "3/5 reviews complete"

    T=150s: Grok completes
      [✓] Grok review
      User sees: "4/5 reviews complete"

    T=180s: DeepSeek completes
      [✓] DeepSeek review
      User sees: "5/5 reviews complete!"

Step 3: PHASE 3
  Mark "PHASE 3: Consolidate" as in_progress
  ... consolidate (30s) ...
  Mark "PHASE 3: Consolidate" as completed

Step 4: PHASE 4
  Mark "PHASE 4: Present results" as in_progress
  ... present (10s) ...
  Mark "PHASE 4: Present results" as completed

Final State:
  [✓] All 8 tasks completed
  User sees: "Multi-model review complete in 3 minutes"

User Experience:
- Real-time progress as each model completes
- Clear visibility: "3/5 reviews complete"
- Reduces perceived wait time (user knows progress)
```

---

## Troubleshooting

**Problem: User thinks workflow is stuck**

Cause: No TodoWrite updates for >1 minute

Solution: Update TodoWrite more frequently, or add sub-tasks

```
❌ Wrong:
[→] PHASE 3: Implementation (in_progress for 10 minutes)

✅ Correct:
[✓] PHASE 3: Implement core logic (2 min)
[✓] PHASE 3: Add error handling (3 min)
[→] PHASE 3: Write tests (in_progress, 2 min so far)

User sees progress every 2-3 minutes
```

---

**Problem: Too many tasks (>20), overwhelming**

Cause: Too granular task breakdown

Solution: Group micro-tasks into larger operations

```
❌ Wrong (25 tasks):
[ ] Read file 1
[ ] Read file 2
[ ] Write file 3
... (25 micro-tasks)

✅ Correct (8 tasks):
[ ] PHASE 1: Gather inputs (includes reading files)
[ ] PHASE 2: Process data
... (8 significant operations)
```

---

**Problem: Multiple tasks "in_progress" (not parallel execution)**

Cause: Forgot to mark previous task as completed

Solution: Always mark completed before starting next

```
❌ Wrong:
[→] PHASE 1: Ask user (in_progress)
[→] PHASE 2: Plan (in_progress) ← Both in_progress?

✅ Correct:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Only one
```

---

## Summary

TodoWrite orchestration provides real-time progress visibility through:

- **Phase initialization** (create task list before starting)
- **Appropriate granularity** (8-15 tasks, significant operations)
- **Real-time updates** (mark completed immediately)
- **Exactly one in_progress** (except parallel execution)
- **Iteration tracking** (separate task per iteration)
- **Parallel task tracking** (update as each completes)

Master these patterns and users will always know:
- What's happening now
- What's coming next
- How much progress has been made
- How much remains

This transforms "black box" workflows into transparent, trackable processes.

---

**Extracted From:**
- `/review` command (10-task initialization, phase-based tracking)
- `/implement` command (8-phase workflow with sub-tasks)
- `/validate-ui` command (iteration tracking, user feedback rounds)
- All multi-phase orchestration workflows