Initial commit
13
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,13 @@
{
  "name": "orchestration",
  "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
  "version": "0.1.1",
  "author": {
    "name": "Jack Rudenko",
    "email": "i@madappgang.com",
    "company": "MadAppGang"
  },
  "skills": [
    "./skills"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# orchestration

Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.
61
plugin.lock.json
Normal file
@@ -0,0 +1,61 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:MadAppGang/claude-code:plugins/orchestration",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "ad90df36843224b97a17f14cfd5a207d4e053c67",
    "treeHash": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150",
    "generatedAt": "2025-11-28T10:12:05.859643Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "orchestration",
    "description": "Shared multi-agent coordination and workflow orchestration patterns for complex Claude Code workflows. Skills-only plugin providing proven patterns for parallel execution (3-5x speedup), multi-model validation (Grok/Gemini/GPT-5), quality gates, TDD loops, TodoWrite phase tracking, and comprehensive error recovery. Battle-tested patterns from 100+ days production use.",
    "version": "0.1.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "215babb6dff86f8783d8e97d0a21546e2aaa3b055bc1cde5c4e16c6bf3d6c7a5"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "36414e18947889714f9d80576e01edaab8b3ffdf9efd44107e0f5fb42b0e2270"
      },
      {
        "path": "skills/todowrite-orchestration/SKILL.md",
        "sha256": "f681467a2eef99945f90b8f2b654c8c9713f4153afdff19a0c0b312d2f6084de"
      },
      {
        "path": "skills/quality-gates/SKILL.md",
        "sha256": "ba13c21d8e9f8abeb856bbec4a6ebc821e92dfe0857942797959087452b175c3"
      },
      {
        "path": "skills/error-recovery/SKILL.md",
        "sha256": "133564d1bc0d35a8c35074b089120fe7d7a757b71bdd6222a7a5c23e45f20aa3"
      },
      {
        "path": "skills/multi-agent-coordination/SKILL.md",
        "sha256": "9e0156350eb09447221898598611a5270921c31168e7698c4bd0d3bd0ced4616"
      },
      {
        "path": "skills/multi-model-validation/SKILL.md",
        "sha256": "9d5c46dfa531f911f4fcc4070fd6c039900bcdb440c997f7eac384001a1ba33e"
      }
    ],
    "dirSha256": "811ec6920184f4235cc78d0b9ca0025fae96488caf35059ca1224e8d5cb24150"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
1107
skills/error-recovery/SKILL.md
Normal file
File diff suppressed because it is too large
742
skills/multi-agent-coordination/SKILL.md
Normal file
@@ -0,0 +1,742 @@
---
name: multi-agent-coordination
description: Coordinate multiple agents in parallel or sequential workflows. Use when running agents simultaneously, delegating to sub-agents, switching between specialized agents, or managing agent selection. Trigger keywords - "parallel agents", "sequential workflow", "delegate", "multi-agent", "sub-agent", "agent switching", "task decomposition".
version: 0.1.0
tags: [orchestration, multi-agent, parallel, sequential, delegation, coordination]
keywords: [parallel, sequential, delegate, sub-agent, agent-switching, multi-agent, task-decomposition, coordination]
---

# Multi-Agent Coordination

**Version:** 1.0.0
**Purpose:** Patterns for coordinating multiple agents in complex workflows
**Status:** Production Ready

## Overview

Multi-agent coordination is the foundation of sophisticated Claude Code workflows. This skill provides battle-tested patterns for orchestrating multiple specialized agents to accomplish complex tasks that are beyond the capabilities of a single agent.

The key challenge in multi-agent systems is **dependencies**. Some tasks must execute sequentially (one agent's output feeds into another), while others can run in parallel (independent validations from different perspectives). Getting this right is the difference between a 5-minute workflow and a 15-minute one.

This skill teaches you:
- When to run agents in **parallel** vs **sequential** mode
- How to **select the right agent** for each task
- How to **delegate** to sub-agents without polluting context
- How to manage **context windows** across multiple agent calls

## Core Patterns

### Pattern 1: Sequential vs Parallel Execution

**When to Use Sequential:**

Use sequential execution when there are **dependencies** between agents:
- Agent B needs Agent A's output as input
- Workflow phases must complete in order (plan → implement → test → review)
- Each agent modifies shared state (same files)

**Example: Multi-Phase Implementation**

```
Phase 1: Architecture Planning
  Task: api-architect
  Output: ai-docs/architecture-plan.md
  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
  Input: Read ai-docs/architecture-plan.md
  Output: src/auth.ts, src/routes.ts
  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
  Input: Read src/auth.ts, src/routes.ts
  Output: tests/auth.test.ts
```

**When to Use Parallel:**

Use parallel execution when agents are **independent**:
- Multiple validation perspectives (designer + tester + reviewer)
- Multiple AI models reviewing the same code (Grok + Gemini + Claude)
- Multiple feature implementations in separate files

**Example: Multi-Perspective Validation**

```
Single Message with Multiple Task Calls:

Task: designer
  Prompt: Validate UI against Figma design
  Output: ai-docs/design-review.md
---
Task: ui-manual-tester
  Prompt: Test UI in browser for usability
  Output: ai-docs/testing-report.md
---
Task: senior-code-reviewer
  Prompt: Review code quality and patterns
  Output: ai-docs/code-review.md

All three execute simultaneously (3x speedup!)
Wait for all to complete, then consolidate results.
```

**The 4-Message Pattern for True Parallel Execution:**

This pattern is **CRITICAL** for achieving true parallelism:

```
Message 1: Preparation (Bash Only)
- Create workspace directories
- Validate inputs
- Write context files
- NO Task calls, NO TodoWrite

Message 2: Parallel Execution (Task Only)
- Launch ALL agents in a SINGLE message
- ONLY Task tool calls
- Each Task is independent
- All execute simultaneously

Message 3: Consolidation (Task Only)
- Launch consolidation agent
- Automatically triggered when all N agents complete

Message 4: Present Results
- Show the user the final consolidated results
- Include links to detailed reports
```

**Anti-Pattern: Mixing Tool Types Breaks Parallelism**

```
❌ WRONG - Executes Sequentially:
await TodoWrite({...});  // Tool 1
await Task({...});       // Tool 2 - waits for TodoWrite
await Bash({...});       // Tool 3 - waits for Task
await Task({...});       // Tool 4 - waits for Bash

✅ CORRECT - Executes in Parallel:
await Task({...});  // Task 1
await Task({...});  // Task 2
await Task({...});  // Task 3
// All execute simultaneously
```

**Why Mixing Fails:**

Claude Code sees different tool types and assumes there are dependencies between them, forcing sequential execution. Using a single tool type (all Task calls) signals that the operations are independent and can run in parallel.

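The contrast above can be sketched in plain TypeScript. `Task` here is a hypothetical async stub standing in for the real Task tool; the point is that independent calls should be started together and awaited as a group, not awaited one at a time:

```typescript
// Hypothetical stand-in for the Task tool: resolves after `ms` with a report path.
const Task = (agent: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`ai-docs/${agent}-report.md`), ms));

// Sequential: total time is the SUM of the three durations.
async function sequential(): Promise<string[]> {
  const a = await Task("designer", 30);
  const b = await Task("ui-manual-tester", 30);
  const c = await Task("senior-code-reviewer", 30);
  return [a, b, c];
}

// Parallel: all three start immediately; total time is the MAX duration.
async function parallel(): Promise<string[]> {
  return Promise.all([
    Task("designer", 30),
    Task("ui-manual-tester", 30),
    Task("senior-code-reviewer", 30),
  ]);
}
```

With three 30 ms tasks, `sequential` takes roughly 90 ms and `parallel` roughly 30 ms — the same 3x ratio the validation example claims.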
---

### Pattern 2: Agent Selection by Task Type

**Task Detection Logic:**

Intelligent workflows automatically detect the task type and select appropriate agents (check the mixed case first, since an API-only or UI-only branch would otherwise shadow it):

```
Task Type Detection:

IF request mentions both API and UI keywords:
  → Mixed workflow
  → Use all relevant agents from both categories
  → Coordinate between backend and frontend agents

ELSE IF request mentions "API", "endpoint", "backend", "database":
  → API-focused workflow
  → Use: api-architect, backend-developer, test-architect
  → Skip: designer, ui-developer (not relevant)

ELSE IF request mentions "UI", "component", "design", "Figma":
  → UI-focused workflow
  → Use: designer, ui-developer, ui-manual-tester
  → Optional: ui-developer-codex (external validation)

ELSE IF request mentions "test", "coverage", "bug":
  → Testing-focused workflow
  → Use: test-architect, ui-manual-tester
  → Optional: codebase-detective (for bug investigation)

ELSE IF request mentions "review", "validate", "feedback":
  → Review-focused workflow
  → Use: senior-code-reviewer, designer, ui-developer
  → Optional: external model reviewers
```

**Agent Capability Matrix:**

| Task Type | Primary Agent | Secondary Agent | Optional External |
|-----------|---------------|-----------------|-------------------|
| API Implementation | backend-developer | api-architect | - |
| UI Implementation | ui-developer | designer | ui-developer-codex |
| Testing | test-architect | ui-manual-tester | - |
| Code Review | senior-code-reviewer | - | codex-code-reviewer |
| Architecture Planning | api-architect OR frontend-architect | - | plan-reviewer |
| Bug Investigation | codebase-detective | test-architect | - |
| Design Validation | designer | ui-developer | designer-codex |

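The detection logic above can be written as a small, testable routing function. The keyword lists and agent names mirror this skill's tables; treat it as an illustrative sketch, not the exact matching the orchestrator performs:

```typescript
// Keyword-based task routing, mirroring the detection logic above.
// The mixed (API + UI) case is checked first so it isn't shadowed.
function selectAgents(request: string): string[] {
  const r = request.toLowerCase();
  const api = /\b(api|endpoint|backend|database)\b/.test(r);
  const ui = /\b(ui|component|design|figma)\b/.test(r);
  if (api && ui) {
    // Mixed workflow: coordinate backend and frontend agents.
    return ["api-architect", "backend-developer", "designer", "ui-developer"];
  }
  if (api) return ["api-architect", "backend-developer", "test-architect"];
  if (ui) return ["designer", "ui-developer", "ui-manual-tester"];
  if (/\b(test|coverage|bug)\b/.test(r)) return ["test-architect", "ui-manual-tester"];
  if (/\b(review|validate|feedback)\b/.test(r)) {
    return ["senior-code-reviewer", "designer", "ui-developer"];
  }
  return []; // Unknown task type: ask the user to clarify.
}
```

An empty result is the "Default: ask the user" branch from the Troubleshooting section, not an error.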
**Agent Switching Pattern:**

Some workflows benefit from **adaptive agent selection** based on context:

```
Example: UI Development with External Validation

Base Implementation:
  Task: ui-developer
  Prompt: Implement navbar component from design

User requests external validation:
  → Switch to ui-developer-codex OR add parallel ui-developer-codex
  → Run both: embedded ui-developer + external ui-developer-codex
  → Consolidate feedback from both

Scenario 1: User wants speed
  → Use ONLY ui-developer (embedded, fast)

Scenario 2: User wants highest quality
  → Use BOTH ui-developer AND ui-developer-codex (parallel)
  → Consensus analysis on feedback

Scenario 3: User is out of credits
  → Fall back to ui-developer only
  → Notify the user that external validation is unavailable
```

---

### Pattern 3: Sub-Agent Delegation

**File-Based Instructions (Context Isolation):**

When delegating to sub-agents, use **file-based instructions** to avoid context pollution:

```
✅ CORRECT - File-Based Delegation:

Step 1: Write instructions to file
  Write: ai-docs/architecture-instructions.md
  Content: "Design authentication system with JWT tokens..."

Step 2: Delegate to agent with file reference
  Task: api-architect
  Prompt: "Read instructions from ai-docs/architecture-instructions.md
           and create architecture plan."

Step 3: Agent reads file, does work, writes output
  Agent reads: ai-docs/architecture-instructions.md
  Agent writes: ai-docs/architecture-plan.md

Step 4: Agent returns brief summary ONLY
  Return: "Architecture plan complete. See ai-docs/architecture-plan.md"

Step 5: Orchestrator reads output file if needed
  Read: ai-docs/architecture-plan.md
  (Only if the orchestrator needs to process the output)
```

**Why File-Based?**

- **Avoids context pollution:** Long user requirements don't bloat the orchestrator's context
- **Reusable:** Multiple agents can read the same instruction file
- **Debuggable:** Files persist after the workflow completes
- **Clean separation:** Input file, output file; the orchestrator stays lightweight

**Anti-Pattern: Inline Delegation**

```
❌ WRONG - Context Pollution:

Task: api-architect
Prompt: "Design authentication system with:
  - JWT tokens with refresh token rotation
  - Email/password login with bcrypt hashing
  - OAuth2 integration with Google, GitHub
  - Rate limiting on login endpoint (5 attempts per 15 min)
  - Password reset flow with time-limited tokens
  - Email verification on signup
  - Role-based access control (admin, user, guest)
  - Session management with Redis
  - Security headers (CORS, CSP, HSTS)
  - ... (500 more lines of requirements)"

Problem: The orchestrator's context now contains 500+ lines of requirements
that are only relevant to the architect agent.
```

**Brief Summary Returns:**

Sub-agents should return **2-5 sentence summaries**, not full output:

```
✅ CORRECT - Brief Summary:
"Architecture plan complete. Designed 3-layer authentication:
JWT with refresh tokens, OAuth2 integration (Google/GitHub),
and Redis session management. See ai-docs/architecture-plan.md
for detailed component breakdown."

❌ WRONG - Full Output:
"Architecture plan:
[500 lines of detailed architecture documentation]
Components: AuthController, TokenService, OAuthService...
[another 500 lines]"
```

**Proxy Mode Invocation:**

For external AI models (Claudish), use the PROXY_MODE directive:

```
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
Prompt: "Review authentication implementation for security issues.
         Code context in ai-docs/code-review-context.md"

Agent Behavior:
1. Detects PROXY_MODE directive
2. Extracts model: x-ai/grok-code-fast-1
3. Extracts task: "Review authentication implementation..."
4. Executes: claudish --model x-ai/grok-code-fast-1 --stdin <<< "..."
5. Waits for full response (blocking execution)
6. Writes: ai-docs/grok-review.md (full detailed review)
7. Returns: "Grok review complete. Found 3 CRITICAL issues. See ai-docs/grok-review.md"
```

**Key: Blocking Execution**

External models MUST execute synchronously (blocking) so the agent waits for the full response:

```
✅ CORRECT - Blocking:
RESULT=$(claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT")
echo "$RESULT" > ai-docs/grok-review.md
echo "Review complete - see ai-docs/grok-review.md"

❌ WRONG - Background (returns before completion):
claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT" &
echo "Review started..."  # Agent returns immediately, review not done!
```

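In Node-flavored TypeScript the same blocking-vs-background distinction looks like this. As an assumption for illustration, `echo` stands in for the claudish invocation:

```typescript
import { execSync, spawn } from "node:child_process";

// ✅ Blocking: execSync returns only after the child process exits,
// so the full output is in hand before the agent reports completion.
const result = execSync('echo "review complete"', { encoding: "utf8" }).trim();

// ❌ Background: spawn returns immediately; the "review" is still running
// when the agent would report back to the orchestrator.
const child = spawn("echo", ["review started"]);
child.unref(); // detached — nothing waits for it
```

The fix is always the first form: capture the child's output synchronously, write it to the report file, then return the brief summary.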
---

### Pattern 4: Context Window Management

**When to Delegate:**

Delegate to sub-agents when:
- The task is self-contained (clear input → output)
- The output is large (architecture plan, test suite, review report)
- The task requires specialized expertise (designer, tester, reviewer)
- Multiple independent tasks can run in parallel

**When to Execute in Main Context:**

Execute in the main orchestrator when:
- The task is small (simple file edit, command execution)
- The output is brief (yes/no decision, status check)
- The task depends on orchestrator state (current phase, iteration count)
- Context pollution risk is low

**Context Size Estimation:**

**Note:** Token estimates below are approximations based on typical usage. Actual context consumption varies by skill complexity, Claude model version, and conversation history. Use these as guidelines, not exact measurements.

Estimate context usage to decide the delegation strategy:

```
Context Budget: ~200k tokens (Claude Sonnet 4.5 - actual varies by model)

Current context usage breakdown:
- System prompt: 10k tokens
- Skill content (5 skills): 10k tokens
- Command instructions: 5k tokens
- User request: 1k tokens
- Conversation history: 20k tokens
───────────────────────────────────
Total used: 46k tokens
Remaining: 154k tokens

Safe threshold for delegation: if a task will consume >30k tokens, delegate

Example: Architecture planning for a large system
- Requirements: 5k tokens
- Expected output: 20k tokens
- Total: 25k tokens
───────────────────────────────────
Decision: Delegate (keeps the orchestrator lightweight)
```

**Delegation Strategy by Context Size:**

| Task Output Size | Strategy |
|------------------|----------|
| < 1k tokens | Execute in orchestrator |
| 1k - 10k tokens | Delegate with summary return |
| 10k - 30k tokens | Delegate with file-based output |
| > 30k tokens | Multi-agent decomposition |

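A rough decision helper for the table above. The chars/4 token estimate is a common rule of thumb, not an exact tokenizer, and the thresholds are the ones from this skill:

```typescript
// Rough rule of thumb: ~4 characters per token for English prose.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

type Strategy =
  | "execute-in-orchestrator"
  | "delegate-with-summary"
  | "delegate-with-file-output"
  | "multi-agent-decomposition";

// Thresholds taken from the delegation-strategy table above.
function delegationStrategy(expectedOutputTokens: number): Strategy {
  if (expectedOutputTokens < 1_000) return "execute-in-orchestrator";
  if (expectedOutputTokens <= 10_000) return "delegate-with-summary";
  if (expectedOutputTokens <= 30_000) return "delegate-with-file-output";
  return "multi-agent-decomposition";
}
```

For the architecture-planning example above (5k requirements + 20k expected output = 25k tokens), this yields file-based delegation, matching the worked decision.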
**Example: Multi-Agent Decomposition**

```
User Request: "Implement complete e-commerce system"

This is >100k tokens if done by a single agent. Decompose:

Phase 1: Break into sub-systems
- Product catalog
- Shopping cart
- Checkout flow
- User authentication
- Order management
- Payment integration

Phase 2: Delegate each sub-system to a separate agent
  Task: backend-developer
    Instruction file: ai-docs/product-catalog-requirements.md
    Output file: ai-docs/product-catalog-implementation.md

  Task: backend-developer
    Instruction file: ai-docs/shopping-cart-requirements.md
    Output file: ai-docs/shopping-cart-implementation.md

  ... (6 parallel agent invocations)

Phase 3: Integration agent
  Task: backend-developer
  Instruction: "Integrate 6 sub-systems. Read output files:
                ai-docs/*-implementation.md"
  Output: ai-docs/integration-plan.md

Total context per agent: ~20k tokens (manageable)
vs. Single agent: 120k+ tokens (context overflow risk)
```

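The fan-out bookkeeping for Phase 2 can be sketched as a pure function, assuming the naming convention above (`ai-docs/<subsystem>-requirements.md` → `ai-docs/<subsystem>-implementation.md`):

```typescript
interface Delegation {
  agent: string;
  instructionFile: string;
  outputFile: string;
}

// Map each sub-system to a file-based delegation, following the
// ai-docs/<name>-requirements.md → ai-docs/<name>-implementation.md convention.
function planDecomposition(subsystems: string[]): Delegation[] {
  return subsystems.map((name) => {
    const slug = name.toLowerCase().replace(/\s+/g, "-");
    return {
      agent: "backend-developer",
      instructionFile: `ai-docs/${slug}-requirements.md`,
      outputFile: `ai-docs/${slug}-implementation.md`,
    };
  });
}
```

Each entry then becomes one Task call in the parallel message, and the integration agent reads the collected `outputFile` paths.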
---

## Integration with Other Skills

**multi-agent-coordination + multi-model-validation:**

```
Use Case: Code review with multiple AI models

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: Code review
- Select agents: senior-code-reviewer (embedded) + external models

Step 2: Parallel Execution (multi-model-validation)
- Follow the 4-Message Pattern
- Launch all reviewers simultaneously
- Wait for all to complete

Step 3: Consolidation (multi-model-validation)
- Auto-consolidate reviews
- Apply consensus analysis
```

**multi-agent-coordination + quality-gates:**

```
Use Case: Iterative UI validation

Step 1: Agent Selection (multi-agent-coordination)
- Detect task type: UI validation
- Select agents: designer, ui-developer

Step 2: Iteration Loop (quality-gates)
- Run designer validation
- If not PASS: delegate to ui-developer for fixes
- Loop until PASS or max iterations

Step 3: User Validation Gate (quality-gates)
- MANDATORY user approval
- Collect feedback if issues found
```

**multi-agent-coordination + todowrite-orchestration:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
- Create task list for all phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
- Phase 1: api-architect
- Phase 2: backend-developer (depends on Phase 1)
- Phase 3: test-architect (depends on Phase 2)
- Update TodoWrite after each phase
```

---

## Best Practices

**Do:**
- ✅ Use parallel execution for independent tasks (3-5x speedup)
- ✅ Use sequential execution when there are dependencies
- ✅ Use file-based instructions to avoid context pollution
- ✅ Return brief summaries (2-5 sentences) from sub-agents
- ✅ Select agents based on task type (API/UI/Testing/Review)
- ✅ Decompose large tasks into multiple sub-agent calls
- ✅ Estimate context usage before delegating

**Don't:**
- ❌ Mix tool types in parallel execution (breaks parallelism)
- ❌ Inline long instructions in Task prompts (context pollution)
- ❌ Return full output from sub-agents (use files instead)
- ❌ Use parallel execution for dependent tasks (wrong results)
- ❌ Use a single agent for >100k token tasks (context overflow)
- ❌ Forget to wait for all parallel tasks before consolidating

**Performance Tips:**
- Parallel execution: 3-5x faster than sequential (5 min vs 15 min)
- File-based delegation: saves 50-80% of context usage
- Agent switching: adapt to user preferences (speed vs quality)
- Context decomposition: enables tasks that would otherwise overflow

---

## Examples

### Example 1: Parallel Multi-Model Code Review

**Scenario:** User requests "Review my authentication code with Grok and Gemini"

**Agent Selection:**
- Task type: Code review
- Agents: senior-code-reviewer (embedded), external Grok, external Gemini

**Execution:**

```
Message 1: Preparation
- Write code context to ai-docs/code-review-context.md

Message 2: Parallel Execution (3 Task calls in a single message)
Task: senior-code-reviewer
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
  Prompt: "Review ai-docs/code-review-context.md for security issues"
---
Task: codex-code-reviewer PROXY_MODE: google/gemini-2.5-flash
  Prompt: "Review ai-docs/code-review-context.md for security issues"

All 3 execute simultaneously (3x faster than sequential)

Message 3: Auto-Consolidation
Task: senior-code-reviewer
  Prompt: "Consolidate 3 reviews from:
    - ai-docs/claude-review.md
    - ai-docs/grok-review.md
    - ai-docs/gemini-review.md
    Prioritize by consensus."

Message 4: Present Results
"Review complete. 3 models analyzed your code.
Top 5 issues by consensus:
1. [UNANIMOUS] Missing input validation on login endpoint
2. [STRONG] SQL injection risk in user query
3. [MAJORITY] Weak password requirements
See ai-docs/consolidated-review.md for details."
```

**Result:** 5 minutes total (vs 15+ if sequential), consensus-based prioritization

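The consensus labels in the summary above can be computed mechanically. A simplified sketch, assuming each reviewer returns a flat list of issue titles; the skill's intermediate [STRONG] tier presumably also weighs severity, which is not reproduced here:

```typescript
// Label each issue by how many reviewers independently reported it.
function consensus(reviews: string[][]): Map<string, string> {
  const total = reviews.length;
  const counts = new Map<string, number>();
  for (const review of reviews) {
    for (const issue of new Set(review)) { // de-dup within one review
      counts.set(issue, (counts.get(issue) ?? 0) + 1);
    }
  }
  const labels = new Map<string, string>();
  for (const [issue, n] of counts) {
    if (n === total) labels.set(issue, "UNANIMOUS");
    else if (n > total / 2) labels.set(issue, "MAJORITY");
    else labels.set(issue, "MINORITY");
  }
  return labels;
}
```

Sorting issues by label (UNANIMOUS first) gives the prioritized list the consolidation agent presents.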
---

### Example 2: Sequential Multi-Phase Implementation

**Scenario:** User requests "Implement payment integration feature"

**Agent Selection:**
- Task type: API implementation
- Agents: api-architect → backend-developer → test-architect → senior-code-reviewer

**Execution:**

```
Phase 1: Architecture Planning
  Write: ai-docs/payment-requirements.md
    "Integrate Stripe payment processing with webhook support..."

  Task: api-architect
    Prompt: "Read ai-docs/payment-requirements.md
             Create architecture plan"
    Output: ai-docs/payment-architecture.md
    Return: "Architecture plan complete. Designed 3-layer payment system."

  Wait for completion ✓

Phase 2: Implementation (depends on Phase 1)
  Task: backend-developer
    Prompt: "Read ai-docs/payment-architecture.md
             Implement payment integration"
    Output: src/payment.ts, src/webhooks.ts
    Return: "Payment integration implemented. 2 new files, 500 lines."

  Wait for completion ✓

Phase 3: Testing (depends on Phase 2)
  Task: test-architect
    Prompt: "Write tests for src/payment.ts and src/webhooks.ts"
    Output: tests/payment.test.ts, tests/webhooks.test.ts
    Return: "Test suite complete. 20 tests covering payment flows."

  Wait for completion ✓

Phase 4: Code Review (depends on Phase 3)
  Task: senior-code-reviewer
    Prompt: "Review payment integration implementation"
    Output: ai-docs/payment-review.md
    Return: "Review complete. 2 MEDIUM issues found."

  Wait for completion ✓
```

**Result:** Sequential execution ensures each phase has correct inputs

---

### Example 3: Adaptive Agent Switching

**Scenario:** User requests "Validate navbar implementation" with optional external AI

**Agent Selection:**
- Task type: UI validation
- Base agent: designer
- Optional: designer-codex (if the user wants external validation)

**Execution:**

```
Step 1: Ask user preference
  "Do you want external AI validation? (Yes/No)"

Step 2a: If user says NO (speed mode)
  Task: designer
    Prompt: "Validate navbar against Figma design"
    Output: ai-docs/design-review.md
    Return: "Design validation complete. PASS with 2 minor suggestions."

Step 2b: If user says YES (quality mode)
  Message 1: Parallel Validation
    Task: designer
      Prompt: "Validate navbar against Figma design"
    ---
    Task: designer PROXY_MODE: design-review-codex
      Prompt: "Validate navbar against Figma design"

  Message 2: Consolidate
    Task: designer
      Prompt: "Consolidate 2 design reviews. Prioritize by consensus."
      Output: ai-docs/design-review-consolidated.md
      Return: "Consolidated review complete. Both agree on 1 CRITICAL issue."

Step 3: User validation
  Present the consolidated review to the user for approval
```

**Result:** Adaptive workflow based on user preference (speed vs quality)

---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**Problem: Parallel tasks executing sequentially**
|
||||||
|
|
||||||
|
Cause: Mixed tool types in same message
|
||||||
|
|
||||||
|
Solution: Use 4-Message Pattern with ONLY Task calls in Message 2
|
||||||
|
|
||||||
|
```
|
||||||
|
❌ Wrong:
|
||||||
|
await TodoWrite({...});
|
||||||
|
await Task({...});
|
||||||
|
await Task({...});
|
||||||
|
|
||||||
|
✅ Correct:
|
||||||
|
Message 1: await Bash({...}); (prep only)
|
||||||
|
Message 2: await Task({...}); await Task({...}); (parallel)
|
||||||
|
```

---

**Problem: Orchestrator context overflowing**

Cause: Inline instructions or full output returns

Solution: Use file-based delegation + brief summaries

```
❌ Wrong:
Task: agent
Prompt: "[1000 lines of inline requirements]"
Return: "[500 lines of full output]"

✅ Correct:
Write: ai-docs/requirements.md
Task: agent
Prompt: "Read ai-docs/requirements.md"
Return: "Complete. See ai-docs/output.md"
```

---

**Problem: Wrong agent selected for task**

Cause: Task type detection failed

Solution: Explicitly detect task type using keywords

```
Check user request for keywords:
- API/endpoint/backend → api-architect, backend-developer
- UI/component/design → designer, ui-developer
- test/coverage → test-architect
- review/validate → senior-code-reviewer

Default: Ask user to clarify task type
```
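
The keyword routing above can be sketched as a small helper. This is a hedged sketch, not a fixed API: the agent names mirror the table, `detectTaskType` is a hypothetical function name, and the substring matching is deliberately naive.

```typescript
// Hypothetical sketch of keyword-based task type detection.
// Agent names mirror the table above; real routing may be richer.
const ROUTES: Array<{ keywords: string[]; agents: string[] }> = [
  { keywords: ["api", "endpoint", "backend"], agents: ["api-architect", "backend-developer"] },
  { keywords: ["ui", "component", "design"], agents: ["designer", "ui-developer"] },
  { keywords: ["test", "coverage"], agents: ["test-architect"] },
  { keywords: ["review", "validate"], agents: ["senior-code-reviewer"] },
];

function detectTaskType(request: string): string[] | null {
  const text = request.toLowerCase();
  for (const route of ROUTES) {
    // First matching route wins; order ROUTES from most to least specific.
    if (route.keywords.some((k) => text.includes(k))) return route.agents;
  }
  return null; // default: ask user to clarify task type
}
```

A `null` result maps directly to the "ask user to clarify" default above.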

---

**Problem: Agent returns immediately before external model completes**

Cause: Background execution (non-blocking claudish call)

Solution: Use synchronous (blocking) execution

```
❌ Wrong:
claudish --model grok ... & (background, returns immediately)

✅ Correct:
RESULT=$(claudish --model grok ...) (blocks until complete)
```

---

## Summary

Multi-agent coordination is about choosing the right execution strategy:

- **Parallel** when tasks are independent (3-5x speedup)
- **Sequential** when tasks have dependencies (correct results)
- **File-based delegation** to avoid context pollution (50-80% savings)
- **Brief summaries** from sub-agents (clean orchestrator context)
- **Task type detection** for intelligent agent selection
- **Context decomposition** for large tasks (avoid overflow)

Master these patterns and you can orchestrate workflows of any complexity.

---

**Extracted From:**
- `/implement` command (task detection, sequential workflows)
- `/validate-ui` command (adaptive agent switching)
- `/review` command (parallel execution, 4-Message Pattern)
- `CLAUDE.md` Parallel Multi-Model Execution Protocol
1005 skills/multi-model-validation/SKILL.md Normal file
File diff suppressed because it is too large Load Diff
996 skills/quality-gates/SKILL.md Normal file
@@ -0,0 +1,996 @@
---
name: quality-gates
description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus".
version: 0.1.0
tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD]
keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval]
---

# Quality Gates

**Version:** 1.0.0
**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows
**Status:** Production Ready

## Overview

Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met.

This skill provides battle-tested patterns for:
- **User approval gates** (cost gates, quality gates, final acceptance)
- **Iteration loops** (automated refinement until quality threshold met)
- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW)
- **Multi-reviewer consensus** (unanimous vs majority agreement)
- **Feedback loops** (user reports issues → agent fixes → user validates)
- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat)

Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results.

## Core Patterns

### Pattern 1: User Approval Gates

**When to Ask for Approval:**

Use approval gates for:
- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring)
- **Quality gates:** Before proceeding to next phase (design validation before implementation)
- **Final validation:** Before completing workflow (user acceptance testing)
- **Irreversible operations:** Before destructive actions (delete files, database migrations)

**How to Present Approval:**

```
Good Approval Prompt:

"You selected 5 AI models for code review:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
- GPT-5 Codex (external, $0.004)
- DeepSeek Coder (external, $0.001)

Estimated total cost: $0.008 ($0.005 - $0.010)
Expected duration: ~5 minutes

Proceed with multi-model review? (Yes/No/Cancel)"

Why it works:
✓ Clear context (what will happen)
✓ Cost transparency (range, not single number)
✓ Time expectation (5 minutes)
✓ Multiple options (Yes/No/Cancel)
```

**Anti-Pattern: Vague Approval**

```
❌ Wrong:

"This will cost money. Proceed? (Yes/No)"

Why it fails:
✗ No cost details (how much?)
✗ No context (what will happen?)
✗ No alternatives (what if user says no?)
```

**Handling User Responses:**

```
User says YES:
→ Proceed with workflow
→ Track approval in logs
→ Continue to next step

User says NO:
→ Offer alternatives:
  1. Use fewer models (reduce cost)
  2. Use only free embedded Claude
  3. Skip this step entirely
  4. Cancel workflow
→ Ask user to choose alternative
→ Proceed based on choice

User says CANCEL:
→ Gracefully exit workflow
→ Save partial results (if any)
→ Log cancellation reason
→ Clean up temporary files
→ Notify user: "Workflow cancelled. Partial results saved to..."
```
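
The Yes/No/Cancel branching above can be expressed as a small dispatcher. A minimal sketch: `handleApproval` and the returned action strings are illustrative names, not part of any real API.

```typescript
// Hedged sketch: routing a user's approval response to a next action.
type Approval = "YES" | "NO" | "CANCEL";

function handleApproval(response: Approval): string {
  switch (response) {
    case "YES":
      return "proceed"; // track approval in logs, continue workflow
    case "NO":
      return "offer-alternatives"; // fewer models, free-only, skip, or cancel
    case "CANCEL":
      return "graceful-exit"; // save partial results, clean up temp files
  }
}
```

Because `Approval` is a closed union, TypeScript checks the switch is exhaustive: adding a fourth response forces a code change here.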

**Approval Bypasses (Advanced):**

For automated workflows, allow approval bypass:

```
Automated Workflow Mode:

If workflow is triggered by CI/CD or scheduled task:
→ Skip user approval gates
→ Use predefined defaults (e.g., max cost $0.10)
→ Log decisions for audit trail
→ Email report to stakeholders after completion

Example:
if (isAutomatedMode) {
  if (estimatedCost <= maxAutomatedCost) {
    log("Auto-approved: $0.008 <= $0.10 threshold");
    proceed();
  } else {
    log("Auto-rejected: estimated cost exceeds $0.10 threshold");
    notifyStakeholders("Cost exceeds automated threshold");
    abort();
  }
}
```

---

### Pattern 2: Iteration Loop Patterns

**Max Iteration Limits:**

Always set a **max iteration limit** to prevent infinite loops:

```
Typical Iteration Limits:

Automated quality loops: 10 iterations
- Designer validation → Developer fixes → Repeat
- Test failures → Developer fixes → Repeat

User feedback loops: 5 rounds
- User reports issues → Developer fixes → User validates → Repeat

Code review loops: 3 rounds
- Reviewer finds issues → Developer fixes → Re-review → Repeat

Multi-model consensus: 1 iteration (no loop)
- Parallel review → Consolidate → Present
```

**Exit Criteria:**

Define clear **exit criteria** for each loop type:

```
Loop Type: Design Validation

Exit Criteria (checked after each iteration):
1. Designer assessment = PASS → Exit loop (success)
2. Iteration count >= 10 → Exit loop (max iterations)
3. User manually approves → Exit loop (user override)
4. No changes made by developer → Exit loop (stuck, escalate)

Example:
for (let i = 1; i <= 10; i++) {
  const review = await designer.validate();

  if (review.assessment === "PASS") {
    log("Design validation passed on iteration " + i);
    break; // Success exit
  }

  if (i === 10) {
    log("Max iterations reached. Escalating to user validation.");
    break; // Max iterations exit
  }

  await developer.fix(review.issues);
}
```

**Progress Tracking:**

Show clear progress to user during iterations:

```
Iteration Loop Progress:

Iteration 1/10: Designer found 5 issues → Developer fixing...
Iteration 2/10: Designer found 3 issues → Developer fixing...
Iteration 3/10: Designer found 1 issue → Developer fixing...
Iteration 4/10: Designer assessment: PASS ✓

Loop completed in 4 iterations.
```

**Iteration History Documentation:**

Track what happened in each iteration:

```
Iteration History (ai-docs/iteration-history.md):

## Iteration 1
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Button color doesn't match design (#3B82F6 vs #2563EB)
- Spacing between elements too tight (8px vs 16px)
- Font size incorrect (14px vs 16px)
Developer Actions:
- Updated button color to #2563EB
- Increased spacing to 16px
- Changed font size to 16px

## Iteration 2
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Border radius too large (8px vs 4px)
Developer Actions:
- Reduced border radius to 4px

## Iteration 3
Designer Assessment: PASS ✓
Issues Found: None
Result: Design validation complete
```

---

### Pattern 3: Issue Severity Classification

**Severity Levels:**

Use 4-level severity classification:

```
CRITICAL - Must fix immediately
- Blocks core functionality
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss risk
- System crashes
- Build failures

Action: STOP workflow, fix immediately, re-validate

HIGH - Should fix soon
- Major bugs (incorrect behavior)
- Performance issues (>3s page load, memory leaks)
- Accessibility violations (keyboard navigation broken)
- User experience blockers

Action: Fix in current iteration, proceed after fix

MEDIUM - Should fix
- Minor bugs (edge cases, visual glitches)
- Code quality issues (duplication, complexity)
- Non-blocking performance issues
- Incomplete error handling

Action: Fix if time permits, or schedule for next iteration

LOW - Nice to have
- Code style inconsistencies
- Minor refactoring opportunities
- Documentation improvements
- Polish and optimization

Action: Log for future improvement, proceed without fixing
```

**Severity-Based Prioritization:**

```
Issue List (sorted by severity):

CRITICAL Issues (must fix all before proceeding):
1. SQL injection in user search endpoint
2. Missing authentication check on admin routes
3. Password stored in plaintext

HIGH Issues (fix before code review):
4. Memory leak in WebSocket connection
5. Missing error handling in payment flow
6. Accessibility: keyboard navigation broken

MEDIUM Issues (fix if time permits):
7. Code duplication in auth controllers
8. Inconsistent error messages
9. Missing JSDoc comments

LOW Issues (defer to future):
10. Variable naming inconsistency
11. Redundant type annotations
12. CSS could use more specificity

Action Plan:
- Fix CRITICAL (1-3) immediately → Re-run tests
- Fix HIGH (4-6) before code review
- Log MEDIUM (7-9) for next iteration
- Ignore LOW (10-12) for now
```
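
The action plan above amounts to sorting issues by severity rank and splitting them into "fix now" and "defer" buckets. A minimal sketch; the `Issue` shape and `prioritize` name are assumptions for illustration.

```typescript
// Hedged sketch: sort issues by severity, then split into fix-now vs defer.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
const RANK: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

interface Issue { title: string; severity: Severity; }

function prioritize(issues: Issue[]): { fixNow: Issue[]; defer: Issue[] } {
  // Copy before sorting so the caller's list is not mutated.
  const sorted = [...issues].sort((a, b) => RANK[a.severity] - RANK[b.severity]);
  return {
    fixNow: sorted.filter((i) => i.severity === "CRITICAL" || i.severity === "HIGH"),
    defer: sorted.filter((i) => i.severity === "MEDIUM" || i.severity === "LOW"),
  };
}
```

`fixNow` corresponds to "fix immediately / before code review"; `defer` to "log for next iteration / ignore for now".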

**Severity Escalation:**

Issues can escalate in severity based on context:

```
Context-Based Escalation:

Issue: "Missing error handling in payment flow"
Base Severity: MEDIUM (code quality issue)

Context 1: Development environment
→ Severity: MEDIUM (not user-facing yet)

Context 2: Production environment
→ Severity: HIGH (affects real users, money involved)

Context 3: Production + recent payment failures
→ Severity: CRITICAL (actively causing issues)

Rule: Escalate severity when:
- Issue affects production users
- Issue involves money/security/data
- Issue is currently causing failures
```
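
The escalation rule can be expressed as a small function. A hedged sketch: the context flags and `escalate` name are assumptions chosen to mirror the three contexts above.

```typescript
// Hedged sketch: context-based severity escalation.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Context {
  production: boolean;              // affects real users?
  involvesMoneyOrSecurity: boolean; // money / security / data at stake?
  activeFailures: boolean;          // currently causing failures?
}

function escalate(base: Severity, ctx: Context): Severity {
  if (ctx.production && ctx.activeFailures) return "CRITICAL"; // actively causing issues
  if (ctx.production && ctx.involvesMoneyOrSecurity) {
    return base === "CRITICAL" ? "CRITICAL" : "HIGH";          // never downgrade
  }
  return base; // development environment: keep base severity
}
```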

---

### Pattern 4: Multi-Reviewer Consensus

**Consensus Levels:**

When multiple reviewers evaluate the same code/design:

```
UNANIMOUS (100% agreement):
- ALL reviewers flagged this issue
- VERY HIGH confidence
- Highest priority (likely a real problem)

Example:
3/3 reviewers: "SQL injection in search endpoint"
→ UNANIMOUS consensus
→ CRITICAL priority (all agree it's critical)

STRONG CONSENSUS (67-99% agreement):
- MOST reviewers flagged this issue
- HIGH confidence
- High priority (probably a real problem)

Example:
2/3 reviewers: "Missing input validation"
→ STRONG consensus (67%)
→ HIGH priority

MAJORITY (50-66% agreement):
- HALF or more flagged this issue
- MEDIUM confidence
- Medium priority (worth investigating)

Example:
2/4 reviewers: "Code duplication in controllers"
→ MAJORITY consensus (50%)
→ MEDIUM priority

DIVERGENT (< 50% agreement):
- Only 1-2 reviewers flagged this issue
- LOW confidence
- Low priority (may be model-specific or false positive)

Example:
1/3 reviewers: "Variable naming could be better"
→ DIVERGENT (33%)
→ LOW priority
```

**Consensus-Based Prioritization:**

```
Prioritized Issue List (by consensus + severity):

1. [UNANIMOUS - CRITICAL] SQL injection in search
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

2. [UNANIMOUS - HIGH] Missing input validation
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

3. [STRONG - HIGH] Memory leak in WebSocket
   MOST reviewers agree: Claude, Grok (2/3)

4. [MAJORITY - MEDIUM] Code duplication
   HALF+ reviewers agree: Claude, Gemini (2/3)

5. [DIVERGENT - LOW] Variable naming
   SINGLE reviewer: Claude only (1/3)

Action:
- Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH)
- Fix issue 3 before review (strong consensus)
- Consider issue 4 (majority, but medium severity)
- Ignore issue 5 (divergent, likely false positive)
```
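
Mapping a flagged-by count onto the consensus bands is a one-liner worth pinning down, since the band edges matter. A minimal sketch (`classifyConsensus` is a hypothetical name; percentages are rounded so 2/3 lands on 67%, matching the examples above):

```typescript
// Hedged sketch: classify reviewer agreement into the consensus bands above.
function classifyConsensus(flagged: number, total: number): string {
  const pct = Math.round((flagged / total) * 100); // 2/3 → 67, 1/3 → 33
  if (pct === 100) return "UNANIMOUS";
  if (pct >= 67) return "STRONG";
  if (pct >= 50) return "MAJORITY";
  return "DIVERGENT";
}
```

Rounding is a deliberate choice here: without it, 2/3 evaluates to 66.67% and would fall into the MAJORITY band instead of STRONG.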

---

### Pattern 5: Feedback Loop Implementation

**User Feedback Loop:**

```
Workflow: User Validation with Feedback

Step 1: Initial Implementation
Developer implements feature
Designer/Tester validates
Present to user for manual validation

Step 2: User Validation Gate (MANDATORY)
Present to user:
"Implementation complete. Please manually verify:
- Open app at http://localhost:3000
- Test feature: [specific instructions]
- Compare to design reference

Does it meet expectations? (Yes/No)"

Step 3a: User says YES
→ ✅ Feature approved
→ Generate final report
→ Mark workflow complete

Step 3b: User says NO
→ Collect specific feedback

Step 4: Collect Specific Feedback
Ask user: "Please describe the issues you found:"

User response:
"1. Button color is wrong (should be blue, not green)
2. Spacing is too tight between elements
3. Font size is too small"

Step 5: Extract Structured Feedback
Parse user feedback into structured issues:

Issue 1:
Component: Button
Problem: Color incorrect
Expected: Blue (#2563EB)
Actual: Green (#10B981)
Severity: MEDIUM

Issue 2:
Component: Container
Problem: Spacing too tight
Expected: 16px
Actual: 8px
Severity: MEDIUM

Issue 3:
Component: Text
Problem: Font size too small
Expected: 16px
Actual: 14px
Severity: LOW

Step 6: Launch Fixing Agent
Task: ui-developer
Prompt: "Fix user-reported issues:

1. Button color: Change from #10B981 to #2563EB
2. Container spacing: Increase from 8px to 16px
3. Text font size: Increase from 14px to 16px

User feedback: [user's exact words]"

Step 7: Re-validate
After fixes:
- Re-run designer validation
- Loop back to Step 2 (user validation)

Step 8: Max Feedback Rounds
Limit: 5 feedback rounds (prevent infinite loop)

If round > 5:
Escalate to human review
"Unable to meet user expectations after 5 rounds.
Manual intervention required."
```
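
The round-limited loop in Steps 2-8 can be sketched as a small driver. A hedged sketch: `validate` and `fix` are stand-ins for the user-validation gate and the fixing agent above, and `feedbackLoop` is a hypothetical name.

```typescript
// Hedged sketch of the bounded user-feedback loop (max 5 rounds).
type Validate = () => Promise<string[]>;         // returns user-reported issues; [] when approved
type Fix = (issues: string[]) => Promise<void>;  // launch fixing agent on the reported issues

async function feedbackLoop(validate: Validate, fix: Fix, maxRounds = 5): Promise<boolean> {
  for (let round = 1; round <= maxRounds; round++) {
    const issues = await validate();   // Step 2: user validation gate
    if (issues.length === 0) return true; // ✅ approved, workflow complete
    await fix(issues);                 // Steps 4-7: fix, then loop back to validation
  }
  return false; // Step 8: escalate — manual intervention required
}
```

Returning `false` rather than throwing keeps the escalation decision with the orchestrator, which can then present the "manual intervention required" message.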

**Feedback Round Tracking:**

```
Feedback Round History:

Round 1:
User Issues: Button color, spacing, font size
Fixes Applied: Updated all 3 issues
Result: Re-validate

Round 2:
User Issues: Border radius too large
Fixes Applied: Reduced border radius
Result: Re-validate

Round 3:
User Issues: None
Result: ✅ APPROVED

Total Rounds: 3/5
```

---

### Pattern 6: Test-Driven Development Loop

**When to Use:**

Use TDD loop **after implementing code, before code review**:

```
Workflow Phases:

Phase 1: Architecture Planning
Phase 2: Implementation
Phase 2.5: Test-Driven Development Loop ← THIS PATTERN
Phase 3: Code Review
Phase 4: User Acceptance
```

**The TDD Loop Pattern:**

```
Step 1: Write Tests First
Task: test-architect
Prompt: "Write comprehensive tests for authentication feature.
Requirements: [link to requirements]
Implementation: [link to code]"
Output: tests/auth.test.ts

Step 2: Run Tests
Bash: bun test tests/auth.test.ts
Capture output and exit code

Step 3: Check Test Results
If all tests pass:
→ ✅ TDD loop complete
→ Proceed to code review (Phase 3)

If tests fail:
→ Analyze failure (continue to Step 4)

Step 4: Analyze Test Failure
Task: test-architect
Prompt: "Analyze test failure output:

[test failure logs]

Determine root cause:
- TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation)
- IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior)

Provide detailed analysis."

test-architect returns:
verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE
analysis: Detailed explanation
recommendation: Specific fix needed

Step 5a: If TEST_ISSUE (test is wrong)
Task: test-architect
Prompt: "Fix test based on analysis:
[analysis from Step 4]"

After fix:
→ Re-run tests (back to Step 2)
→ Loop continues

Step 5b: If IMPLEMENTATION_ISSUE (code is wrong)
Provide structured feedback to developer:

Task: backend-developer
Prompt: "Fix implementation based on test failure:

Test Failure:
[failure output]

Root Cause:
[analysis from test-architect]

Recommended Fix:
[specific fix needed]"

After fix:
→ Re-run tests (back to Step 2)
→ Loop continues

Step 6: Max Iteration Limit
Limit: 10 iterations

Iteration tracking:
Iteration 1/10: 5 tests failed → Fix implementation
Iteration 2/10: 2 tests failed → Fix test (bad mock)
Iteration 3/10: All tests pass ✅

If iteration > 10:
Escalate to human review
"Unable to pass all tests after 10 iterations.
Manual debugging required."
```
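
Steps 2-6 can be sketched as a driver that runs tests, branches on the test-architect's verdict, and stops at the iteration cap. A hedged sketch: the `Hooks` interface and `tddLoop` name are assumptions standing in for the agent Task calls above.

```typescript
// Hedged sketch of the TDD loop: run tests, branch on verdict, bounded at 10 iterations.
type Verdict = "TEST_ISSUE" | "IMPLEMENTATION_ISSUE";

interface Hooks {
  runTests: () => Promise<boolean>;       // Step 2: true = all tests pass
  analyzeFailure: () => Promise<Verdict>; // Step 4: test-architect's root-cause verdict
  fixTest: () => Promise<void>;           // Step 5a: test-architect fixes the test
  fixImplementation: () => Promise<void>; // Step 5b: developer fixes the code
}

async function tddLoop(h: Hooks, maxIterations = 10): Promise<boolean> {
  for (let i = 1; i <= maxIterations; i++) {
    if (await h.runTests()) return true;  // ✅ Step 3: proceed to code review
    const verdict = await h.analyzeFailure();
    if (verdict === "TEST_ISSUE") await h.fixTest();
    else await h.fixImplementation();
  }
  return false; // Step 6: escalate — manual debugging required
}
```

The verdict branch is what distinguishes this loop from a plain retry: a failing test is not automatically blamed on the implementation.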

**Example TDD Loop:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
Tests Run: 20 tests
Results: 5 failed, 15 passed
Failure: "JWT token validation fails with expired token"
Analysis: IMPLEMENTATION_ISSUE - Missing expiration check
Fix: Added expiration validation in TokenService
Re-run: Continue to Iteration 2

Iteration 2:
Tests Run: 20 tests
Results: 2 failed, 18 passed
Failure: "Mock database not reset between tests"
Analysis: TEST_ISSUE - Missing beforeEach cleanup
Fix: Added database reset in test setup
Re-run: Continue to Iteration 3

Iteration 3:
Tests Run: 20 tests
Results: All passed ✅
Result: TDD loop complete, proceed to code review

Total Iterations: 3/10
Duration: ~5 minutes
Benefits:
- Caught 2 bugs before code review
- Fixed 1 test quality issue
- All tests passing gives confidence in implementation
```

**Benefits of TDD Loop:**

```
Benefits:

1. Catch bugs early (before code review, not after)
2. Ensure test quality (test-architect fixes bad tests)
3. Automated quality assurance (no manual testing needed)
4. Fast feedback loop (seconds to run tests, not minutes)
5. Confidence in implementation (all tests passing)

Performance:
Traditional: Implement → Review → Find bugs → Fix → Re-review
Time: 30+ minutes, multiple review rounds

TDD Loop: Implement → Test → Fix → Test → Review (with confidence)
Time: 15 minutes, single review round (fewer issues)
```

---

## Integration with Other Skills

**quality-gates + multi-model-validation:**

```
Use Case: Cost approval before multi-model review

Step 1: Estimate costs (multi-model-validation)
Step 2: User approval gate (quality-gates)
  If approved: Proceed with parallel execution
  If rejected: Offer alternatives
Step 3: Execute review (multi-model-validation)
```

**quality-gates + multi-agent-coordination:**

```
Use Case: Iteration loop with designer validation

Step 1: Agent selection (multi-agent-coordination)
Select designer + ui-developer

Step 2: Iteration loop (quality-gates)
For i = 1 to 10:
- Run designer validation
- If PASS: Exit loop
- Else: Delegate to ui-developer for fixes

Step 3: User validation gate (quality-gates)
Mandatory manual approval
```

**quality-gates + error-recovery:**

```
Use Case: Test-driven loop with error recovery

Step 1: Run tests (quality-gates TDD pattern)
Step 2: If test execution fails (error-recovery)
- Syntax error → Fix and retry
- Framework crash → Notify user, skip TDD
Step 3: If tests pass (quality-gates)
- Proceed to code review
```

---

## Best Practices

**Do:**
- ✅ Set max iteration limits (prevent infinite loops)
- ✅ Define clear exit criteria (PASS, max iterations, user override)
- ✅ Track iteration history (document what happened)
- ✅ Show progress to user ("Iteration 3/10 complete")
- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW)
- ✅ Prioritize by consensus + severity
- ✅ Ask user approval for expensive operations
- ✅ Collect specific feedback (not vague complaints)
- ✅ Use TDD loop to catch bugs early

**Don't:**
- ❌ Create infinite loops (no exit criteria)
- ❌ Skip user validation gates (mandatory for UX)
- ❌ Ignore consensus (unanimous issues are real)
- ❌ Batch all severities together (prioritize CRITICAL)
- ❌ Proceed without approval for >$0.01 operations
- ❌ Collect vague feedback ("it's wrong" → what specifically?)
- ❌ Skip TDD loop (catches bugs before expensive review)

**Performance:**
- Iteration loops: 5-10 iterations typical, max 10-15 min
- TDD loop: 3-5 iterations typical, max 5-10 min
- User feedback: 1-3 rounds typical, max 5 rounds

---

## Examples

### Example 1: User Approval Gate for Multi-Model Review

**Scenario:** User requests multi-model review, costs $0.008

**Execution:**

```
Step 1: Estimate Costs
Input: 450 lines × 1.5 = 675 tokens per model
Output: 2000-4000 tokens per model
Total: 3 models × 3000 avg = 9000 output tokens
Cost: ~$0.008 ($0.005 - $0.010)

Step 2: Present Approval Gate
"Multi-model review will analyze 450 lines with 3 AI models:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)

Estimated cost: $0.008 ($0.005 - $0.010)
Duration: ~5 minutes

Proceed? (Yes/No/Cancel)"

Step 3a: User says YES
→ Proceed with parallel execution
→ Track approval: log("User approved $0.008 cost")

Step 3b: User says NO
→ Offer alternatives:
  1. Use only free Claude (no external models)
  2. Use only 1 external model (reduce cost to $0.002)
  3. Skip review entirely
→ Ask user to choose

Step 3c: User says CANCEL
→ Exit gracefully
→ Log: "User cancelled multi-model review"
→ Clean up temporary files
```

---

### Example 2: Designer Validation Iteration Loop

**Scenario:** UI implementation with automated iteration until PASS

**Execution:**

```
|
||||||
|
Iteration 1:
  Task: designer
  Prompt: "Validate navbar against Figma design"
  Output: ai-docs/design-review-1.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Button color: #3B82F6 (expected #2563EB)
  - Spacing: 8px (expected 16px)

  Task: ui-developer
  Prompt: "Fix issues from ai-docs/design-review-1.md"
  Changes: Updated button color, increased spacing

  Result: Continue to Iteration 2

Iteration 2:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-2.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Border radius: 8px (expected 4px)

  Task: ui-developer
  Prompt: "Fix border radius issue"
  Changes: Reduced border radius to 4px

  Result: Continue to Iteration 3

Iteration 3:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-3.md
  Assessment: PASS ✓
  Issues: None

  Result: Exit loop (success)

Summary:
  Total Iterations: 3/10
  Duration: ~8 minutes
  Automated Fixes: 3 issues resolved
  Result: PASS, proceed to user validation
```
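The validate-fix loop above can be sketched as follows. `runDesigner` and `runDeveloper` are hypothetical stand-ins for the designer and ui-developer Task delegations, injected as callbacks so the loop logic stays self-contained.

```typescript
interface Review {
  assessment: "PASS" | "NEEDS IMPROVEMENT";
  issues: string[];
}

// Runs designer validation, applies fixes, and repeats until PASS or the
// iteration cap is hit. Returns the iteration count on success, or -1 if the
// cap was reached without a PASS (caller should escalate to the user).
function validationLoop(
  runDesigner: () => Review,
  runDeveloper: (issues: string[]) => void,
  maxIterations = 10,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    const review = runDesigner();
    if (review.assessment === "PASS") return i; // exit loop early on success
    runDeveloper(review.issues); // fix the specific issues found this round
  }
  return -1; // automated refinement did not converge: escalate
}
```

With the reviews from Example 2 (two NEEDS IMPROVEMENT rounds, then PASS), the loop exits on iteration 3 of 10.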

---

### Example 3: Test-Driven Development Loop

**Scenario:** Authentication implementation with TDD

**Execution:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Task: test-architect
  Prompt: "Write tests for authentication feature"
  Output: tests/auth.test.ts (20 tests)

  Bash: bun test tests/auth.test.ts
  Result: 5 failed, 15 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: IMPLEMENTATION_ISSUE
  Analysis: "Missing JWT expiration validation"

  Task: backend-developer
  Prompt: "Add JWT expiration validation"
  Changes: Updated TokenService.verify()

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 2

Iteration 2:
  Bash: bun test tests/auth.test.ts
  Result: 2 failed, 18 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: TEST_ISSUE
  Analysis: "Mock database not reset between tests"

  Task: test-architect
  Prompt: "Fix test setup"
  Changes: Added beforeEach cleanup

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 3

Iteration 3:
  Bash: bun test tests/auth.test.ts
  Result: All 20 passed ✅

  Result: TDD loop complete, proceed to code review

Summary:
  Total Iterations: 3/10
  Duration: ~5 minutes
  Bugs Caught: 1 implementation bug, 1 test bug
  Result: All tests passing, high confidence in code
```
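A minimal driver for this loop might route each failure analysis to the right fixer. The `runTests` and `analyze` callbacks are hypothetical stand-ins for the Bash and test-architect Task steps; the routing of verdicts is the part the sketch demonstrates.

```typescript
type Verdict = "IMPLEMENTATION_ISSUE" | "TEST_ISSUE";

interface TestRun {
  failed: number;
  passed: number;
}

// Drives the TDD loop: run tests, classify failures, dispatch the fix to
// either the implementation or the test suite, repeat until green or capped.
function tddLoop(
  runTests: () => TestRun,
  analyze: () => Verdict,
  fixImplementation: () => void,
  fixTests: () => void,
  maxIterations = 10,
): boolean {
  for (let i = 1; i <= maxIterations; i++) {
    const run = runTests();
    if (run.failed === 0) return true; // all green: proceed to code review
    if (analyze() === "IMPLEMENTATION_ISSUE") {
      fixImplementation(); // e.g. add the missing JWT expiration validation
    } else {
      fixTests(); // e.g. add beforeEach cleanup for the mock database
    }
  }
  return false; // cap reached without green tests: escalate
}
```

The verdict split matters: blindly "fixing the code" when the test itself is broken (Iteration 2 above) would churn the implementation without ever turning the suite green.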

---

## Troubleshooting

**Problem: Infinite iteration loop**

Cause: No exit criteria or max iteration limit

Solution: Always set max iterations (10 for automated, 5 for user feedback)

```
❌ Wrong:
while (true) {
  if (review.assessment === "PASS") break;
  fix();
}

✅ Correct:
for (let i = 1; i <= 10; i++) {
  if (review.assessment === "PASS") break;
  if (i === 10) escalateToUser(); // hard cap: hand off instead of looping forever
  fix();
}
```

---

**Problem: User approval skipped for expensive operation**

Cause: Missing approval gate

Solution: Always ask approval for costs >$0.01

```
❌ Wrong:
if (userRequestedMultiModel) {
  executeReview();
}

✅ Correct:
if (userRequestedMultiModel) {
  const cost = estimateCost();
  if (cost > 0.01) {
    const approved = await askUserApproval(cost);
    if (!approved) return offerAlternatives();
  }
  executeReview();
}
```

---

**Problem: All issues treated equally**

Cause: No severity classification

Solution: Classify by severity, prioritize CRITICAL

```
❌ Wrong:
issues.forEach(issue => fix(issue));

✅ Correct:
const critical = issues.filter(i => i.severity === "CRITICAL");
const high = issues.filter(i => i.severity === "HIGH");

critical.forEach(issue => fix(issue)); // Fix critical first
high.forEach(issue => fix(issue));     // Then high
// MEDIUM and LOW deferred or skipped
```

---

## Summary

Quality gates ensure high-quality results through:

- **User approval gates** (cost, quality, final validation)
- **Iteration loops** (automated refinement, max 10 iterations)
- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW)
- **Consensus prioritization** (unanimous → strong → majority → divergent)
- **Feedback loops** (collect specific issues, fix, re-validate)
- **Test-driven development** (write tests, run, fix, repeat until pass)

Master these patterns and your workflows will consistently produce high-quality, validated results.

---

**Extracted From:**
- `/review` command (user approval for costs, consensus analysis)
- `/validate-ui` command (iteration loops, user validation gates, feedback collection)
- `/implement` command (PHASE 2.5 test-driven development loop)
- Multi-model review patterns (consensus-based prioritization)
983 skills/todowrite-orchestration/SKILL.md Normal file
@@ -0,0 +1,983 @@
---
name: todowrite-orchestration
description: Track progress in multi-phase workflows with TodoWrite. Use when orchestrating 5+ phase commands, managing iteration loops, tracking parallel tasks, or providing real-time progress visibility. Trigger keywords - "phase tracking", "progress", "workflow", "multi-step", "multi-phase", "todo", "tracking", "status".
version: 0.1.0
tags: [orchestration, todowrite, progress, tracking, workflow, multi-phase]
keywords: [phase-tracking, progress, workflow, multi-step, multi-phase, todo, tracking, status, visibility]
---

# TodoWrite Orchestration

**Version:** 1.0.0
**Purpose:** Patterns for using TodoWrite in complex multi-phase workflows
**Status:** Production Ready

## Overview

TodoWrite orchestration is the practice of using the TodoWrite tool to provide **real-time progress visibility** in complex multi-phase workflows. It transforms opaque "black box" workflows into transparent, trackable processes where users can see:

- What phase is currently executing
- How many phases remain
- Which tasks are pending, in-progress, or completed
- Overall progress percentage
- Iteration counts in loops

This skill provides battle-tested patterns for:

- **Phase initialization** (create complete task list before starting)
- **Task granularity** (how to break phases into trackable tasks)
- **Status transitions** (pending → in_progress → completed)
- **Real-time updates** (mark complete immediately, not batched)
- **Iteration tracking** (progress through loops)
- **Parallel task tracking** (multiple agents executing simultaneously)

TodoWrite orchestration is especially valuable for workflows with >5 phases or >10 minutes duration, where users need progress feedback.

## Core Patterns

### Pattern 1: Phase Initialization

**Create TodoWrite List BEFORE Starting:**

Initialize TodoWrite as **step 0** of your workflow, before any actual work begins:

```
✅ CORRECT - Initialize First:

Step 0: Initialize TodoWrite
  TodoWrite: Create task list
  - PHASE 1: Gather user inputs
  - PHASE 1: Validate inputs
  - PHASE 2: Select AI models
  - PHASE 2: Estimate costs
  - PHASE 2: Get user approval
  - PHASE 3: Launch parallel reviews
  - PHASE 3: Wait for all reviews
  - PHASE 4: Consolidate reviews
  - PHASE 5: Present results

Step 1: Start actual work (PHASE 1)
  Mark "PHASE 1: Gather user inputs" as in_progress
  ... do work ...
  Mark "PHASE 1: Gather user inputs" as completed
  Mark "PHASE 1: Validate inputs" as in_progress
  ... do work ...

❌ WRONG - Create During Workflow:

Step 1: Do some work
  ... work happens ...
  TodoWrite: Create task "Did some work" (completed)

Step 2: Do more work
  ... work happens ...
  TodoWrite: Create task "Did more work" (completed)

Problem: User has no visibility into upcoming phases
```

**List All Phases Upfront:**

When initializing, include **all phases** in the task list, not just the current phase:

```
✅ CORRECT - Complete Visibility:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs
[ ] PHASE 2: Architecture planning
[ ] PHASE 3: Implementation
[ ] PHASE 3: Run quality checks
[ ] PHASE 4: Code review
[ ] PHASE 5: User acceptance
[ ] PHASE 6: Generate report

User sees: "8 tasks total, 0 complete, Phase 1 starting"

❌ WRONG - Incremental Discovery:

TodoWrite Initial State:
[ ] PHASE 1: Gather user inputs
[ ] PHASE 1: Validate inputs

(User thinks workflow is 2 tasks, then surprised by 6 more phases)
```
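Setting TodoWrite's actual schema aside, the initialize-first pattern can be sketched with a plain task list; the `Task` shape and the progress string are assumptions chosen to mirror the examples in this skill.

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Task {
  label: string;
  status: Status;
}

// Step 0 of the workflow: create the complete task list up front, all pending,
// so the user sees the full scope before any work starts.
function initTodoList(labels: string[]): Task[] {
  return labels.map(label => ({ label, status: "pending" }));
}

// Progress summary the user sees at any point, e.g. "3/8 tasks complete (38%)".
function progress(tasks: Task[]): string {
  const done = tasks.filter(t => t.status === "completed").length;
  const pct = Math.round((done / tasks.length) * 100);
  return `${done}/${tasks.length} tasks complete (${pct}%)`;
}

const tasks = initTodoList([
  "PHASE 1: Gather user inputs",
  "PHASE 1: Validate inputs",
  "PHASE 2: Architecture planning",
  "PHASE 3: Implementation",
]);
```

Because every phase exists from step 0, the denominator of the progress string never changes mid-workflow unless new work is explicitly discovered and added.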

**Why Initialize First:**

1. **User expectation setting:** User knows workflow scope (8 phases, ~20 minutes)
2. **Progress visibility:** User can see % complete (3/8 = 37.5%)
3. **Time estimation:** User can estimate remaining time based on progress
4. **Transparency:** No hidden phases or surprises

---

### Pattern 2: Task Granularity Guidelines

**One Task Per Significant Operation:**

Each task should represent a **significant operation** (1-5 minutes of work):

```
✅ CORRECT - Significant Operations:

Tasks:
- PHASE 1: Ask user for inputs (30s)
- PHASE 2: Generate architecture plan (2 min)
- PHASE 3: Implement feature (5 min)
- PHASE 4: Run tests (1 min)
- PHASE 5: Code review (3 min)

Each task = meaningful unit of work

❌ WRONG - Too Granular:

Tasks:
- PHASE 1: Ask user question 1
- PHASE 1: Ask user question 2
- PHASE 1: Ask user question 3
- PHASE 2: Read file A
- PHASE 2: Read file B
- PHASE 2: Write file C
- ... (50 micro-tasks)

Problem: Too many updates, clutters user interface
```

**Multi-Step Phases: Break Into 2-3 Sub-Tasks:**

For complex phases (>5 minutes), break into 2-3 sub-tasks:

```
✅ CORRECT - Sub-Task Breakdown:

PHASE 3: Implementation (15 min total)
→ Sub-tasks:
  - PHASE 3: Implement core logic (5 min)
  - PHASE 3: Add error handling (3 min)
  - PHASE 3: Write tests (7 min)

User sees progress within phase: "PHASE 3: 2/3 complete"

❌ WRONG - Single Monolithic Task:

PHASE 3: Implementation (15 min)
→ No sub-tasks

Problem: User sees "in_progress" for 15 min with no updates
```

**Avoid Too Many Tasks:**

Limit to **max 15-20 tasks** for readability:

```
✅ CORRECT - 12 Tasks (readable):

8-phase workflow:
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Fix issues
- PHASE 7: Re-review
- PHASE 8: Accept

Total: 12 tasks (clean, trackable)

❌ WRONG - 50 Tasks (overwhelming):

Every single action as separate task:
- Read file 1
- Read file 2
- Write file 3
- Run command 1
- ... (50 tasks)

Problem: User overwhelmed, can't see forest for trees
```

**Guideline by Workflow Duration:**

```
Workflow Duration → Task Count:

< 5 minutes:   3-5 tasks
5-15 minutes:  8-12 tasks
15-30 minutes: 12-18 tasks
> 30 minutes:  15-20 tasks (if more, group into phases)

Example:
5-minute workflow (3 phases):
- PHASE 1: Prepare
- PHASE 2: Execute
- PHASE 3: Present
Total: 3 tasks ✓

20-minute workflow (6 phases):
- PHASE 1: Ask user
- PHASE 2: Plan (2 sub-tasks)
- PHASE 3: Implement (3 sub-tasks)
- PHASE 4: Test
- PHASE 5: Review (2 sub-tasks)
- PHASE 6: Accept
Total: 10 tasks ✓
```

---

### Pattern 3: Status Transitions

**Exactly ONE Task In Progress at a Time:**

Maintain the invariant: **exactly one task in_progress** at any moment:

```
✅ CORRECT - One In-Progress:

State at time T1:
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[→] PHASE 3: Implement (in_progress) ← Only one
[ ] PHASE 4: Test (pending)
[ ] PHASE 5: Review (pending)

State at time T2 (after PHASE 3 completes):
[✓] PHASE 1: Ask user (completed)
[✓] PHASE 2: Plan (completed)
[✓] PHASE 3: Implement (completed)
[→] PHASE 4: Test (in_progress) ← Only one
[ ] PHASE 5: Review (pending)

❌ WRONG - Multiple In-Progress:

State:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Two in-progress?
[→] PHASE 3: Implement (in_progress) ← Confusing!
[ ] PHASE 4: Test (pending)

Problem: User confused about current phase
```

**Status Transition Sequence:**

```
Lifecycle of a Task:

1. Created: pending
   (Task exists, not started yet)

2. Started: pending → in_progress
   (Mark as in_progress when starting work)

3. Completed: in_progress → completed
   (Mark as completed immediately after finishing)

4. Next task: Mark next task as in_progress
   (Continue to next task)

Example Timeline:

T=0s:  [→] Task 1 (in_progress), [ ] Task 2 (pending)
T=30s: [✓] Task 1 (completed),   [→] Task 2 (in_progress)
T=60s: [✓] Task 1 (completed),   [✓] Task 2 (completed)
```
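The transition sequence above can be enforced with one small helper that completes the current task and starts the next in a single step; the task shape is a sketch-level assumption, as elsewhere in this skill.

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Task {
  label: string;
  status: Status;
}

// Completes the current in_progress task (if any) and starts the next pending
// one, preserving the invariant that at most one task is in_progress.
function advance(tasks: Task[]): void {
  const current = tasks.find(t => t.status === "in_progress");
  if (current) current.status = "completed"; // mark done immediately, not batched
  const next = tasks.find(t => t.status === "pending");
  if (next) next.status = "in_progress"; // exactly one task running at a time
}
```

Funneling every status change through one transition function is what makes the "exactly one in_progress" invariant hold by construction rather than by discipline.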

**NEVER Batch Completions:**

Mark tasks completed **immediately** after finishing, not at end of phase:

```
✅ CORRECT - Immediate Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...
Mark "PHASE 1: Ask user" as completed ← Immediate

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...
Mark "PHASE 1: Validate inputs" as completed ← Immediate

User sees real-time progress

❌ WRONG - Batched Updates:

Mark "PHASE 1: Ask user" as in_progress
... do work (30s) ...

Mark "PHASE 1: Validate inputs" as in_progress
... do work (20s) ...

(At end of PHASE 1, batch update both to completed)

Problem: User doesn't see progress for 50s, thinks workflow is stuck
```

---

### Pattern 4: Real-Time Progress Tracking

**Update TodoWrite As Work Progresses:**

TodoWrite should reflect **current state**, not past state:

```
✅ CORRECT - Real-Time Updates:

T=0s:  Initialize TodoWrite (8 tasks, all pending)
T=5s:  Mark "PHASE 1" as in_progress
T=35s: Mark "PHASE 1" as completed, "PHASE 2" as in_progress
T=90s: Mark "PHASE 2" as completed, "PHASE 3" as in_progress
...

User always sees accurate current state

❌ WRONG - Delayed Updates:

T=0s:   Initialize TodoWrite
T=300s: Workflow completes
T=301s: Update all tasks to completed

Problem: No progress visibility for 5 minutes
```

**Add New Tasks If Discovered During Execution:**

If you discover additional work during execution, add new tasks:

```
Scenario: During implementation, realize refactoring needed

Initial TodoWrite:
[✓] PHASE 1: Plan
[→] PHASE 2: Implement
[ ] PHASE 3: Test
[ ] PHASE 4: Review

During PHASE 2, discover:
"Implementation requires refactoring legacy code"

Updated TodoWrite:
[✓] PHASE 1: Plan
[✓] PHASE 2: Implement core logic (completed)
[→] PHASE 2: Refactor legacy code (in_progress) ← New task added
[ ] PHASE 3: Test
[ ] PHASE 4: Review

User sees: "Additional work discovered: refactoring. Total now 5 tasks."
```

**User Can See Current Progress at Any Time:**

With real-time updates, user can check progress:

```
User checks at T=120s:

TodoWrite State:
[✓] PHASE 1: Ask user
[✓] PHASE 2: Plan architecture
[→] PHASE 3: Implement core logic (in_progress)
[ ] PHASE 3: Add error handling
[ ] PHASE 3: Write tests
[ ] PHASE 4: Code review
[ ] PHASE 5: Accept

User sees: "2/7 tasks complete, currently implementing core logic"
```

---

### Pattern 5: Iteration Loop Tracking

**Create Task Per Iteration:**

For iteration loops, create a task for each iteration:

```
✅ CORRECT - Iteration Tasks:

Design Validation Loop (max 10 iterations):

Initial TodoWrite:
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
... (create all 10 upfront)

Progress:
[✓] Iteration 1/10: Designer validation (NEEDS IMPROVEMENT)
[✓] Iteration 2/10: Designer validation (NEEDS IMPROVEMENT)
[→] Iteration 3/10: Designer validation (in_progress)
[ ] Iteration 4/10: Designer validation
...

User sees: "Iteration 3/10 in progress, 2 complete"

❌ WRONG - Single Loop Task:

TodoWrite:
[→] Design validation loop (in_progress)

Problem: User sees "in_progress" for 10 minutes, no iteration visibility
```
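Creating all iteration tasks up front reduces to a one-line label generator; the label format is taken from the examples above.

```typescript
// Creates one trackable task label per iteration up front, e.g.
// "Iteration 1/10: Designer validation" ... "Iteration 10/10: ...",
// so loop progress is visible before the first iteration even starts.
function iterationTasks(max: number, label: string): string[] {
  return Array.from({ length: max }, (_, i) => `Iteration ${i + 1}/${max}: ${label}`);
}
```

If the loop exits early (say, PASS on iteration 2), the remaining labels simply stay pending, which itself communicates "completed in 2/10 iterations".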

**Mark Iteration Complete When Done:**

```
Iteration Lifecycle:

Iteration 1:
  Mark "Iteration 1/10" as in_progress
  Run designer validation
  If NEEDS IMPROVEMENT: Run developer fixes
  Mark "Iteration 1/10" as completed

Iteration 2:
  Mark "Iteration 2/10" as in_progress
  Run designer validation
  If PASS: Exit loop early
  Mark "Iteration 2/10" as completed

Result: Loop exited after 2 iterations
[✓] Iteration 1/10 (completed)
[✓] Iteration 2/10 (completed)
[ ] Iteration 3/10 (not needed, loop exited)
...

User sees: "Loop completed in 2/10 iterations"
```

**Track Total Iterations vs Max Limit:**

```
Iteration Progress:

Max: 10 iterations
Current: 5

TodoWrite State:
[✓] Iteration 1/10
[✓] Iteration 2/10
[✓] Iteration 3/10
[✓] Iteration 4/10
[→] Iteration 5/10
[ ] Iteration 6/10
...

User sees: "Iteration 5/10 (50% through max)"

Warning at Iteration 8:
"Iteration 8/10 - approaching max, may escalate to user if not PASS"
```

**Clear Progress Visibility:**

```
Iteration Loop with TodoWrite:

User Request: "Validate UI design"

TodoWrite:
[✓] PHASE 1: Gather design reference
[✓] Iteration 1/10: Designer validation (5 issues found)
[✓] Iteration 2/10: Designer validation (3 issues found)
[✓] Iteration 3/10: Designer validation (1 issue found)
[→] Iteration 4/10: Designer validation (in_progress)
[ ] Iteration 5/10: Designer validation
...
[ ] PHASE 3: User validation gate

User sees:
- 3 iterations completed, iteration 4 in progress (40% through max)
- Issues reducing each iteration (5 → 3 → 1)
- Progress toward PASS
```

---

### Pattern 6: Parallel Task Tracking

**Multiple Agents Executing Simultaneously:**

When running agents in parallel, track each separately:

```
✅ CORRECT - Separate Tasks for Parallel Agents:

Multi-Model Review (3 models in parallel):

TodoWrite:
[✓] PHASE 1: Prepare review context
[→] PHASE 2: Claude review (in_progress)
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

Note: 3 tasks "in_progress" is OK for parallel execution
(Exception to "one in_progress" rule)

As models complete:
[✓] PHASE 1: Prepare review context
[✓] PHASE 2: Claude review (completed) ← First to finish
[→] PHASE 2: Grok review (in_progress)
[→] PHASE 2: Gemini review (in_progress)
[ ] PHASE 3: Consolidate reviews

User sees: "1/3 reviews complete, 2 in progress"

❌ WRONG - Single Task for Parallel Work:

TodoWrite:
[✓] PHASE 1: Prepare
[→] PHASE 2: Run 3 reviews (in_progress)
[ ] PHASE 3: Consolidate

Problem: No visibility into which reviews are complete
```

**Update As Each Agent Completes:**

```
Parallel Execution Timeline:

T=0s: Launch 3 reviews in parallel
  [→] Claude review (in_progress)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=60s: Claude completes first
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [→] Gemini review (in_progress)

T=120s: Gemini completes
  [✓] Claude review (completed)
  [→] Grok review (in_progress)
  [✓] Gemini review (completed)

T=180s: Grok completes
  [✓] Claude review (completed)
  [✓] Grok review (completed)
  [✓] Gemini review (completed)

User sees real-time completion updates
```
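The per-agent completion updates can be sketched with promises. `runReview` is a hypothetical stand-in for launching one model's review; the point of the sketch is that each completion updates status immediately rather than waiting for `Promise.all` to settle.

```typescript
type ReviewStatus = "in_progress" | "completed";

// Launches all reviews at once and marks each one completed as it finishes,
// so the user sees "1/3 complete", "2/3 complete", ... in real time.
async function trackParallel(
  names: string[],
  runReview: (name: string) => Promise<void>,
  onUpdate: (statuses: Map<string, ReviewStatus>) => void,
): Promise<void> {
  const statuses = new Map<string, ReviewStatus>(
    names.map((n): [string, ReviewStatus] => [n, "in_progress"]),
  );
  onUpdate(statuses); // all in_progress: the parallel exception to one-in-progress
  await Promise.all(
    names.map(async name => {
      await runReview(name);
      statuses.set(name, "completed"); // update immediately, not batched
      onUpdate(statuses);
    }),
  );
}
```

The same shape works for any fan-out step: the aggregate phase ("PHASE 2: Launch reviews") completes only after `Promise.all` resolves, while the per-agent tasks complete one by one.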

**Progress Indicators During Long Parallel Tasks:**

```
For long-running parallel tasks (>2 minutes), show progress:

T=0s:   "Launching 5 AI model reviews (estimated 5 minutes)..."
T=60s:  "1/5 reviews complete..."
T=120s: "2/5 reviews complete..."
T=180s: "4/5 reviews complete, 1 in progress..."
T=240s: "All reviews complete! Consolidating results..."

TodoWrite mirrors this:
[✓] Claude review (1/5 complete)
[✓] Grok review (2/5 complete)
[→] Gemini review (in_progress)
[→] GPT-5 review (in_progress)
[→] DeepSeek review (in_progress)
```

---

## Integration with Other Skills

**todowrite-orchestration + multi-agent-coordination:**

```
Use Case: Multi-phase implementation workflow

Step 1: Initialize TodoWrite (todowrite-orchestration)
  Create task list for all 8 phases

Step 2: Sequential Agent Delegation (multi-agent-coordination)
  Phase 1: api-architect
    Mark PHASE 1 as in_progress
    Delegate to api-architect
    Mark PHASE 1 as completed

  Phase 2: backend-developer
    Mark PHASE 2 as in_progress
    Delegate to backend-developer
    Mark PHASE 2 as completed

  ... continue for all phases
```

**todowrite-orchestration + multi-model-validation:**

```
Use Case: Multi-model review with progress tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] PHASE 1: Prepare context
  [ ] PHASE 2: Launch reviews (5 models)
  [ ] PHASE 3: Consolidate results

Step 2: Parallel Execution (multi-model-validation)
  Mark "PHASE 2: Launch reviews" as in_progress
  Launch all 5 models simultaneously
  As each completes: Update progress (1/5, 2/5, ...)
  Mark "PHASE 2: Launch reviews" as completed

Step 3: Real-Time Visibility (todowrite-orchestration)
  User sees: "PHASE 2: 3/5 reviews complete..."
```

**todowrite-orchestration + quality-gates:**

```
Use Case: Iteration loop with TodoWrite tracking

Step 1: Initialize TodoWrite (todowrite-orchestration)
  [ ] Iteration 1/10
  [ ] Iteration 2/10
  ...

Step 2: Iteration Loop (quality-gates)
  For i = 1 to 10:
    Mark "Iteration i/10" as in_progress
    Run designer validation
    If PASS: Exit loop
    Mark "Iteration i/10" as completed

Step 3: Progress Visibility
  User sees: "Iteration 5/10 complete, 5 remaining"
```

---

## Best Practices

**Do:**
- ✅ Initialize TodoWrite BEFORE starting work (step 0)
- ✅ List ALL phases upfront (user sees complete scope)
- ✅ Use 8-15 tasks for typical workflows (readable)
- ✅ Mark completed IMMEDIATELY after finishing (real-time)
- ✅ Keep exactly ONE task in_progress (except parallel tasks)
- ✅ Track iterations separately (Iteration 1/10, 2/10, ...)
- ✅ Update as work progresses (not batched at end)
- ✅ Add new tasks if discovered during execution

**Don't:**
- ❌ Create TodoWrite during workflow (initialize first)
- ❌ Hide phases from user (list all upfront)
- ❌ Create too many tasks (>20 overwhelms user)
- ❌ Batch completions at end of phase (update real-time)
- ❌ Leave multiple tasks in_progress (pick one)
- ❌ Use single task for loop (track iterations separately)
- ❌ Update only at start/end (update during execution)

**Performance:**
- TodoWrite overhead: <1s per update (negligible)
- User visibility benefit: Reduces perceived wait time 30-50%
- Workflow confidence: User knows progress, less likely to cancel

---

## Examples

### Example 1: 8-Phase Implementation Workflow

**Scenario:** Full-cycle implementation with TodoWrite tracking

**Execution:**

```
|
||||||
|
Step 0: Initialize TodoWrite
|
||||||
|
TodoWrite: Create task list
|
||||||
|
[ ] PHASE 1: Ask user for requirements
|
||||||
|
[ ] PHASE 2: Generate architecture plan
|
||||||
|
[ ] PHASE 3: Implement core logic
|
||||||
|
[ ] PHASE 3: Add error handling
|
||||||
|
[ ] PHASE 3: Write tests
|
||||||
|
[ ] PHASE 4: Run test suite
|
||||||
|
[ ] PHASE 5: Code review
|
||||||
|
[ ] PHASE 6: Fix review issues
|
||||||
|
[ ] PHASE 7: User acceptance
|
||||||
|
[ ] PHASE 8: Generate report
|
||||||
|
|
||||||
|
User sees: "10 tasks, 0 complete, Phase 1 starting..."
|
||||||
|
|
||||||
|
Step 1: PHASE 1
|
||||||
|
Mark "PHASE 1: Ask user" as in_progress
|
||||||
|
... gather requirements (30s) ...
|
||||||
|
Mark "PHASE 1: Ask user" as completed
|
||||||
|
User sees: "1/10 tasks complete (10%)"
|
||||||
|
|
||||||
|
Step 2: PHASE 2
|
||||||
|
Mark "PHASE 2: Architecture plan" as in_progress
|
||||||
|
... generate plan (2 min) ...
|
||||||
|
Mark "PHASE 2: Architecture plan" as completed
|
||||||
|
User sees: "2/10 tasks complete (20%)"
|
||||||
|
|
||||||
|
Step 3: PHASE 3 (3 sub-tasks)
|
||||||
|
Mark "PHASE 3: Implement core" as in_progress
|
||||||
|
... implement (3 min) ...
|
||||||
|
Mark "PHASE 3: Implement core" as completed
|
||||||
|
User sees: "3/10 tasks complete (30%)"
|
||||||
|
|
||||||
|
Mark "PHASE 3: Add error handling" as in_progress
|
||||||
|
... add error handling (2 min) ...
|
||||||
|
Mark "PHASE 3: Add error handling" as completed
|
||||||
|
User sees: "4/10 tasks complete (40%)"
|
||||||
|
|
||||||
|
Mark "PHASE 3: Write tests" as in_progress
|
||||||
|
... write tests (3 min) ...
|
||||||
|
Mark "PHASE 3: Write tests" as completed
|
||||||
|
User sees: "5/10 tasks complete (50%)"
|
||||||
|
|
||||||
|
... continue through all phases ...
|
||||||
|
|
||||||
|
Final State:
|
||||||
|
[✓] All 10 tasks completed
|
||||||
|
User sees: "10/10 tasks complete (100%). Workflow finished!"
|
||||||
|
|
||||||
|
Total Duration: ~15 minutes
|
||||||
|
User Experience: Continuous progress updates every 1-3 minutes
|
||||||
|
```
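The state discipline behind this trace (pending → in_progress → completed, exactly one task in_progress, progress reported after each completion) can be sketched as a small tracker. This is a hypothetical helper for illustration, not the actual TodoWrite tool; the class and method names are invented:

```python
# Minimal sketch of a TodoWrite-style task tracker (hypothetical helper,
# not the real TodoWrite tool). Enforces the "exactly one in_progress" rule.

class TaskTracker:
    def __init__(self, titles):
        # Each task moves: pending -> in_progress -> completed
        self.tasks = {t: "pending" for t in titles}

    def start(self, title):
        active = [t for t, s in self.tasks.items() if s == "in_progress"]
        if active:
            raise RuntimeError(f"Finish {active[0]!r} before starting {title!r}")
        self.tasks[title] = "in_progress"

    def complete(self, title):
        self.tasks[title] = "completed"

    def progress(self):
        done = sum(1 for s in self.tasks.values() if s == "completed")
        total = len(self.tasks)
        return f"{done}/{total} tasks complete ({done * 100 // total}%)"

tracker = TaskTracker(["PHASE 1: Ask user", "PHASE 2: Plan", "PHASE 3: Implement"])
tracker.start("PHASE 1: Ask user")
tracker.complete("PHASE 1: Ask user")
print(tracker.progress())  # 1/3 tasks complete (33%)
```

The `progress()` string mirrors the "N/10 tasks complete (X%)" updates the user sees after each phase above.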
---

### Example 2: Iteration Loop with Progress Tracking

**Scenario:** Design validation with 10 max iterations

**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Gather design reference
[ ] Iteration 1/10: Designer validation
[ ] Iteration 2/10: Designer validation
[ ] Iteration 3/10: Designer validation
[ ] Iteration 4/10: Designer validation
[ ] Iteration 5/10: Designer validation
... (10 iterations total)
[ ] PHASE 3: User validation gate

Step 1: PHASE 1
Mark "PHASE 1: Gather design" as in_progress
... gather design (20s) ...
Mark "PHASE 1: Gather design" as completed

Step 2: Iteration Loop
Iteration 1:
Mark "Iteration 1/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 5 issues"
Developer: Fix 5 issues
Mark "Iteration 1/10" as completed
User sees: "Iteration 1/10 complete, 5 issues fixed"

Iteration 2:
Mark "Iteration 2/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 3 issues"
Developer: Fix 3 issues
Mark "Iteration 2/10" as completed
User sees: "Iteration 2/10 complete, 3 issues fixed"

Iteration 3:
Mark "Iteration 3/10" as in_progress
Designer: "NEEDS IMPROVEMENT - 1 issue"
Developer: Fix 1 issue
Mark "Iteration 3/10" as completed
User sees: "Iteration 3/10 complete, 1 issue fixed"

Iteration 4:
Mark "Iteration 4/10" as in_progress
Designer: "PASS ✓"
Mark "Iteration 4/10" as completed
Exit loop (early exit)
User sees: "Loop completed in 4/10 iterations"

Step 3: PHASE 3
Mark "PHASE 3: User validation" as in_progress
... user validates ...
Mark "PHASE 3: User validation" as completed

Final State:
[✓] PHASE 1: Gather design
[✓] Iteration 1/10 (5 issues fixed)
[✓] Iteration 2/10 (3 issues fixed)
[✓] Iteration 3/10 (1 issue fixed)
[✓] Iteration 4/10 (PASS)
[ ] Iteration 5/10 (not needed)
...
[✓] PHASE 3: User validation

User Experience: Clear iteration progress, early exit visible
```
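The bounded loop with early exit above has a simple control-flow shape. A sketch under stated assumptions — `run_designer_review` is a stub standing in for the real designer agent, and the per-round issue counts are taken from the trace:

```python
# Sketch of the bounded validation loop with early exit.
# run_designer_review() is a stub standing in for the real designer agent.

MAX_ITERATIONS = 10

def run_designer_review(iteration):
    # Stub: issue count drops each round, passing (0 issues) on iteration 4.
    issues_by_round = {1: 5, 2: 3, 3: 1}
    return issues_by_round.get(iteration, 0)

def validation_loop():
    for i in range(1, MAX_ITERATIONS + 1):
        print(f"Iteration {i}/{MAX_ITERATIONS}: in_progress")
        issues = run_designer_review(i)
        if issues == 0:
            # Early exit: designer returned PASS
            print(f"Loop completed in {i}/{MAX_ITERATIONS} iterations")
            return i
        print(f"Iteration {i}/{MAX_ITERATIONS} complete, {issues} issues fixed")
    print(f"Max iterations reached ({MAX_ITERATIONS}); escalate to user")
    return MAX_ITERATIONS

validation_loop()  # exits early after 4 of 10 iterations
```

Pre-creating all 10 iteration tasks and leaving the unused ones pending (as in the Final State above) keeps the early exit visible to the user.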
---

### Example 3: Parallel Multi-Model Review

**Scenario:** 5 AI models reviewing code in parallel

**Execution:**
```
Step 0: Initialize TodoWrite
TodoWrite: Create task list
[ ] PHASE 1: Prepare review context
[ ] PHASE 2: Claude review
[ ] PHASE 2: Grok review
[ ] PHASE 2: Gemini review
[ ] PHASE 2: GPT-5 review
[ ] PHASE 2: DeepSeek review
[ ] PHASE 3: Consolidate reviews
[ ] PHASE 4: Present results

Step 1: PHASE 1
Mark "PHASE 1: Prepare context" as in_progress
... prepare (30s) ...
Mark "PHASE 1: Prepare context" as completed

Step 2: PHASE 2 (Parallel Execution)
Mark all 5 reviews as in_progress:
[→] Claude review
[→] Grok review
[→] Gemini review
[→] GPT-5 review
[→] DeepSeek review

Launch all 5 in parallel (4-Message Pattern)

As each completes:
T=60s: Claude completes
[✓] Claude review
User sees: "1/5 reviews complete"

T=90s: Gemini completes
[✓] Gemini review
User sees: "2/5 reviews complete"

T=120s: GPT-5 completes
[✓] GPT-5 review
User sees: "3/5 reviews complete"

T=150s: Grok completes
[✓] Grok review
User sees: "4/5 reviews complete"

T=180s: DeepSeek completes
[✓] DeepSeek review
User sees: "5/5 reviews complete!"

Step 3: PHASE 3
Mark "PHASE 3: Consolidate" as in_progress
... consolidate (30s) ...
Mark "PHASE 3: Consolidate" as completed

Step 4: PHASE 4
Mark "PHASE 4: Present results" as in_progress
... present (10s) ...
Mark "PHASE 4: Present results" as completed

Final State:
[✓] All 8 tasks completed
User sees: "Multi-model review complete in 3 minutes"

User Experience:
- Real-time progress as each model completes
- Clear visibility: "3/5 reviews complete"
- Reduces perceived wait time (user knows progress)
```
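"Update as each completes" maps naturally onto completion-order iteration over concurrent work. A minimal sketch using Python's `concurrent.futures` — the model names match the example, but the delays are scaled-down illustrative stand-ins for real model-call latency:

```python
# Sketch: report progress as each parallel review finishes.
# Delays are illustrative stand-ins for real model-call latency.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def review(model, delay):
    time.sleep(delay)  # stand-in for the actual model call
    return model

# Illustrative completion times (seconds, scaled down from the example)
models = {"Claude": 0.05, "Gemini": 0.10, "GPT-5": 0.15,
          "Grok": 0.20, "DeepSeek": 0.25}

completed = []
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    futures = [pool.submit(review, m, d) for m, d in models.items()]
    # as_completed yields futures in finish order, not submission order
    for fut in as_completed(futures):
        completed.append(fut.result())
        print(f"[✓] {fut.result()} review — {len(completed)}/{len(models)} reviews complete")
```

Iterating in completion order (rather than waiting on all futures at once) is what lets the user see "1/5 reviews complete" at T=60s instead of silence until T=180s.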
---

## Troubleshooting

**Problem: User thinks workflow is stuck**

Cause: No TodoWrite updates for >1 minute

Solution: Update TodoWrite more frequently, or add sub-tasks
```
❌ Wrong:
[→] PHASE 3: Implementation (in_progress for 10 minutes)

✅ Correct:
[✓] PHASE 3: Implement core logic (2 min)
[✓] PHASE 3: Add error handling (3 min)
[→] PHASE 3: Write tests (in_progress, 2 min so far)

User sees progress every 2-3 minutes
```
---

**Problem: Too many tasks (>20), overwhelming**

Cause: Too granular task breakdown

Solution: Group micro-tasks into larger operations
```
❌ Wrong (25 tasks):
[ ] Read file 1
[ ] Read file 2
[ ] Write file 3
... (25 micro-tasks)

✅ Correct (8 tasks):
[ ] PHASE 1: Gather inputs (includes reading files)
[ ] PHASE 2: Process data
... (8 significant operations)
```
---

**Problem: Multiple tasks "in_progress" (not parallel execution)**

Cause: Forgot to mark previous task as completed

Solution: Always mark completed before starting next
```
❌ Wrong:
[→] PHASE 1: Ask user (in_progress)
[→] PHASE 2: Plan (in_progress) ← Both in_progress?

✅ Correct:
[✓] PHASE 1: Ask user (completed)
[→] PHASE 2: Plan (in_progress) ← Only one
```
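This anti-pattern is easy to detect mechanically. A sketch of an invariant check, assuming a TodoWrite-like task shape (a list of dicts with hypothetical `title` and `status` keys):

```python
# Sketch: detect the "multiple in_progress" anti-pattern in a task list.
# Assumes tasks are dicts with "title" and "status" keys (TodoWrite-like).

def check_single_in_progress(tasks, parallel_phase=False):
    active = [t["title"] for t in tasks if t["status"] == "in_progress"]
    if len(active) > 1 and not parallel_phase:
        return f"Anti-pattern: {len(active)} tasks in_progress: {active}"
    return "OK"

tasks = [
    {"title": "PHASE 1: Ask user", "status": "in_progress"},
    {"title": "PHASE 2: Plan", "status": "in_progress"},
]
print(check_single_in_progress(tasks))
# Anti-pattern: 2 tasks in_progress: ['PHASE 1: Ask user', 'PHASE 2: Plan']
```

The `parallel_phase` flag covers the one legitimate exception: a declared parallel-execution phase, as in Example 3.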
---

## Summary

TodoWrite orchestration provides real-time progress visibility through:

- **Phase initialization** (create task list before starting)
- **Appropriate granularity** (8-15 tasks, significant operations)
- **Real-time updates** (mark completed immediately)
- **Exactly one in_progress** (except parallel execution)
- **Iteration tracking** (separate task per iteration)
- **Parallel task tracking** (update as each completes)

Master these patterns and users will always know:

- What's happening now
- What's coming next
- How much progress has been made
- How much remains

This transforms "black box" workflows into transparent, trackable processes.

---

**Extracted From:**

- `/review` command (10-task initialization, phase-based tracking)
- `/implement` command (8-phase workflow with sub-tasks)
- `/validate-ui` command (iteration tracking, user feedback rounds)
- All multi-phase orchestration workflows