---
name: quality-gates
description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus".
version: 0.1.0
tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD]
keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval]
---

# Quality Gates

**Version:** 1.0.0
**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows
**Status:** Production Ready

## Overview

Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met.

This skill provides battle-tested patterns for:

- **User approval gates** (cost gates, quality gates, final acceptance)
- **Iteration loops** (automated refinement until a quality threshold is met)
- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW)
- **Multi-reviewer consensus** (unanimous vs. majority agreement)
- **Feedback loops** (user reports issues → agent fixes → user validates)
- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat)

Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results.

## Core Patterns

### Pattern 1: User Approval Gates

**When to Ask for Approval:**

Use approval gates for:

- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring)
- **Quality gates:** Before proceeding to the next phase (design validation before implementation)
- **Final validation:** Before completing a workflow (user acceptance testing)
- **Irreversible operations:** Before destructive actions (deleting files, running database migrations)

**How to Present Approval:**

```
Good Approval Prompt:

"You selected 5 AI models for code review:
- Claude Sonnet (embedded, free)
- Grok Code Fast (external, $0.002)
- Gemini 2.5 Flash (external, $0.001)
- GPT-5 Codex (external, $0.004)
- DeepSeek Coder (external, $0.001)

Estimated total cost: $0.008 ($0.005 - $0.010)
Expected duration: ~5 minutes

Proceed with multi-model review? (Yes/No/Cancel)"

Why it works:
✓ Clear context (what will happen)
✓ Cost transparency (range, not single number)
✓ Time expectation (5 minutes)
✓ Multiple options (Yes/No/Cancel)
```

**Anti-Pattern: Vague Approval**

```
❌ Wrong:
"This will cost money. Proceed? (Yes/No)"

Why it fails:
✗ No cost details (how much?)
✗ No context (what will happen?)
✗ No alternatives (what if user says no?)
```

**Handling User Responses:**

```
User says YES:
→ Proceed with workflow
→ Track approval in logs
→ Continue to next step

User says NO:
→ Offer alternatives:
  1. Use fewer models (reduce cost)
  2. Use only free embedded Claude
  3. Skip this step entirely
  4. Cancel workflow
→ Ask user to choose alternative
→ Proceed based on choice

User says CANCEL:
→ Gracefully exit workflow
→ Save partial results (if any)
→ Log cancellation reason
→ Clean up temporary files
→ Notify user: "Workflow cancelled. Partial results saved to..."
```
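The Yes/No/Cancel branching above maps naturally onto a small gate function. A minimal sketch in TypeScript, assuming hypothetical helpers (`askUser` is stubbed here; a real gate would block on actual user input and use the orchestrator's own logging):

```typescript
// Approval-gate sketch. All helpers are hypothetical stand-ins,
// not part of any real orchestration API.
type ApprovalResponse = "yes" | "no" | "cancel";

interface CostEstimate {
  expected: number; // e.g. 0.008
  low: number;      // e.g. 0.005
  high: number;     // e.g. 0.010
}

// Hypothetical prompt helper: presents the gate and returns the user's choice.
async function askUser(prompt: string): Promise<ApprovalResponse> {
  console.log(prompt);
  return "yes"; // stubbed; a real gate would wait for user input
}

async function approvalGate(estimate: CostEstimate): Promise<boolean> {
  const response = await askUser(
    `Estimated total cost: $${estimate.expected} ($${estimate.low} - $${estimate.high})\n` +
    `Expected duration: ~5 minutes\n` +
    `Proceed with multi-model review? (Yes/No/Cancel)`
  );

  switch (response) {
    case "yes":
      console.log(`User approved $${estimate.expected} cost`); // track approval in logs
      return true;
    case "no":
      // Caller offers alternatives (fewer models, free Claude only, skip step).
      console.log("Offering alternatives before re-entering the gate");
      return false;
    default: // "cancel"
      console.log("Workflow cancelled; saving partial results, cleaning up temp files");
      return false;
  }
}
```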
**Approval Bypasses (Advanced):**

For automated workflows, allow approval bypass:

```
Automated Workflow Mode:

If workflow is triggered by CI/CD or scheduled task:
→ Skip user approval gates
→ Use predefined defaults (e.g., max cost $0.10)
→ Log decisions for audit trail
→ Email report to stakeholders after completion

Example:
if (isAutomatedMode) {
  if (estimatedCost <= maxAutomatedCost) {
    log(`Auto-approved: $${estimatedCost} <= $${maxAutomatedCost} threshold`);
    proceed();
  } else {
    log(`Auto-rejected: $${estimatedCost} > $${maxAutomatedCost} threshold`);
    notifyStakeholders("Cost exceeds automated threshold");
    abort();
  }
}
```

---

### Pattern 2: Iteration Loop Patterns

**Max Iteration Limits:**

Always set a **max iteration limit** to prevent infinite loops:

```
Typical Iteration Limits:

Automated quality loops: 10 iterations
- Designer validation → Developer fixes → Repeat
- Test failures → Developer fixes → Repeat

User feedback loops: 5 rounds
- User reports issues → Developer fixes → User validates → Repeat

Code review loops: 3 rounds
- Reviewer finds issues → Developer fixes → Re-review → Repeat

Multi-model consensus: 1 iteration (no loop)
- Parallel review → Consolidate → Present
```

**Exit Criteria:**

Define clear **exit criteria** for each loop type:

```
Loop Type: Design Validation

Exit Criteria (checked after each iteration):
1. Designer assessment = PASS → Exit loop (success)
2. Iteration count >= 10 → Exit loop (max iterations)
3. User manually approves → Exit loop (user override)
4. No changes made by developer → Exit loop (stuck, escalate)

Example:
for (let i = 1; i <= 10; i++) {
  const review = await designer.validate();
  if (review.assessment === "PASS") {
    log("Design validation passed on iteration " + i);
    break; // Success exit
  }
  if (i === 10) {
    log("Max iterations reached. Escalating to user validation.");
    break; // Max iterations exit
  }
  await developer.fix(review.issues);
}
```

**Progress Tracking:**

Show clear progress to the user during iterations:

```
Iteration Loop Progress:

Iteration 1/10: Designer found 5 issues → Developer fixing...
Iteration 2/10: Designer found 3 issues → Developer fixing...
Iteration 3/10: Designer found 1 issue → Developer fixing...
Iteration 4/10: Designer assessment: PASS ✓

Loop completed in 4 iterations.
```
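The simple for-loop above covers exit criteria 1 and 2 but not criterion 4 (no progress). A sketch that handles all of them, with progress logging, assuming hypothetical `validate`/`fix` callbacks standing in for Task-launched agents:

```typescript
// Iteration-loop sketch covering success, max-iterations, and stuck exits.
// Exit criterion 3 (manual user override) would be an external signal
// checked on each pass; it is omitted here for brevity.
interface Review {
  assessment: "PASS" | "NEEDS IMPROVEMENT";
  issues: string[];
}

interface FixResult {
  changesMade: boolean;
}

async function iterationLoop(
  validate: () => Promise<Review>,
  fix: (issues: string[]) => Promise<FixResult>,
  maxIterations = 10
): Promise<"pass" | "max-iterations" | "stuck"> {
  for (let i = 1; i <= maxIterations; i++) {
    const review = await validate();
    console.log(`Iteration ${i}/${maxIterations}: ${review.issues.length} issue(s) found`);

    if (review.assessment === "PASS") {
      return "pass"; // Exit criterion 1: success
    }
    if (i === maxIterations) {
      return "max-iterations"; // Exit criterion 2: escalate to user validation
    }

    const result = await fix(review.issues);
    if (!result.changesMade) {
      return "stuck"; // Exit criterion 4: no progress, escalate
    }
  }
  return "max-iterations"; // unreachable in practice; satisfies the type checker
}
```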
**Iteration History Documentation:**

Track what happened in each iteration:

```
Iteration History (ai-docs/iteration-history.md):

## Iteration 1
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Button color doesn't match design (#3B82F6 vs #2563EB)
- Spacing between elements too tight (8px vs 16px)
- Font size incorrect (14px vs 16px)

Developer Actions:
- Updated button color to #2563EB
- Increased spacing to 16px
- Changed font size to 16px

## Iteration 2
Designer Assessment: NEEDS IMPROVEMENT
Issues Found:
- Border radius too large (8px vs 4px)

Developer Actions:
- Reduced border radius to 4px

## Iteration 3
Designer Assessment: PASS ✓
Issues Found: None

Result: Design validation complete
```

---

### Pattern 3: Issue Severity Classification

**Severity Levels:**

Use 4-level severity classification:

```
CRITICAL - Must fix immediately
- Blocks core functionality
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss risk
- System crashes
- Build failures

Action: STOP workflow, fix immediately, re-validate

HIGH - Should fix soon
- Major bugs (incorrect behavior)
- Performance issues (>3s page load, memory leaks)
- Accessibility violations (keyboard navigation broken)
- User experience blockers

Action: Fix in current iteration, proceed after fix

MEDIUM - Should fix
- Minor bugs (edge cases, visual glitches)
- Code quality issues (duplication, complexity)
- Non-blocking performance issues
- Incomplete error handling

Action: Fix if time permits, or schedule for next iteration

LOW - Nice to have
- Code style inconsistencies
- Minor refactoring opportunities
- Documentation improvements
- Polish and optimization

Action: Log for future improvement, proceed without fixing
```

**Severity-Based Prioritization:**

```
Issue List (sorted by severity):

CRITICAL Issues (must fix all before proceeding):
1. SQL injection in user search endpoint
2. Missing authentication check on admin routes
3. Password stored in plaintext

HIGH Issues (fix before code review):
4. Memory leak in WebSocket connection
5. Missing error handling in payment flow
6. Accessibility: keyboard navigation broken

MEDIUM Issues (fix if time permits):
7. Code duplication in auth controllers
8. Inconsistent error messages
9. Missing JSDoc comments

LOW Issues (defer to future):
10. Variable naming inconsistency
11. Redundant type annotations
12. CSS could use more specificity

Action Plan:
- Fix CRITICAL (1-3) immediately → Re-run tests
- Fix HIGH (4-6) before code review
- Log MEDIUM (7-9) for next iteration
- Ignore LOW (10-12) for now
```
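That action plan can be expressed as a small triage routine. A sketch under assumed types (the `Issue` shape and `fix` callback are illustrative, not a real API):

```typescript
// Severity-triage sketch: fix CRITICAL immediately, HIGH this iteration,
// log MEDIUM for later, defer LOW.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Issue {
  id: number;
  description: string;
  severity: Severity;
}

function triage(issues: Issue[]): Record<Severity, Issue[]> {
  const buckets: Record<Severity, Issue[]> = {
    CRITICAL: [], HIGH: [], MEDIUM: [], LOW: [],
  };
  for (const issue of issues) buckets[issue.severity].push(issue);
  return buckets;
}

async function applyActionPlan(
  issues: Issue[],
  fix: (issue: Issue) => Promise<void>
): Promise<void> {
  const { CRITICAL, HIGH, MEDIUM, LOW } = triage(issues);

  for (const issue of CRITICAL) await fix(issue); // stop-the-line: fix before anything else
  // re-run tests here before proceeding

  for (const issue of HIGH) await fix(issue); // fix before code review

  MEDIUM.forEach((i) => console.log(`Deferred to next iteration: ${i.description}`));
  LOW.forEach((i) => console.log(`Logged for future improvement: ${i.description}`));
}
```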
**Severity Escalation:**

Issues can escalate in severity based on context:

```
Context-Based Escalation:

Issue: "Missing error handling in payment flow"
Base Severity: MEDIUM (code quality issue)

Context 1: Development environment
→ Severity: MEDIUM (not user-facing yet)

Context 2: Production environment
→ Severity: HIGH (affects real users, money involved)

Context 3: Production + recent payment failures
→ Severity: CRITICAL (actively causing issues)

Rule: Escalate severity when:
- Issue affects production users
- Issue involves money/security/data
- Issue is currently causing failures
```

---

### Pattern 4: Multi-Reviewer Consensus

**Consensus Levels:**

When multiple reviewers evaluate the same code/design:

```
UNANIMOUS (100% agreement):
- ALL reviewers flagged this issue
- VERY HIGH confidence
- Highest priority (likely a real problem)

Example:
3/3 reviewers: "SQL injection in search endpoint"
→ UNANIMOUS consensus
→ CRITICAL priority (all agree it's critical)

STRONG CONSENSUS (67-99% agreement):
- MOST reviewers flagged this issue
- HIGH confidence
- High priority (probably a real problem)

Example:
2/3 reviewers: "Missing input validation"
→ STRONG consensus (67%)
→ HIGH priority

MAJORITY (50-66% agreement):
- HALF or more flagged this issue
- MEDIUM confidence
- Medium priority (worth investigating)

Example:
2/4 reviewers: "Code duplication in controllers"
→ MAJORITY consensus (50%)
→ MEDIUM priority

(Note: with a 3-reviewer panel, 2/3 = 67% lands in the STRONG band;
MAJORITY only arises with larger panels.)

DIVERGENT (< 50% agreement):
- Only one reviewer (or a small minority) flagged this issue
- LOW confidence
- Low priority (may be model-specific or false positive)

Example:
1/3 reviewers: "Variable naming could be better"
→ DIVERGENT (33%)
→ LOW priority (one reviewer's opinion)
```

**Consensus-Based Prioritization:**

```
Prioritized Issue List (by consensus + severity):

1. [UNANIMOUS - CRITICAL] SQL injection in search
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

2. [UNANIMOUS - HIGH] Missing input validation
   ALL reviewers agree: Claude, Grok, Gemini (3/3)

3. [STRONG - HIGH] Memory leak in WebSocket
   MOST reviewers agree: Claude, Grok (2/3)

4. [STRONG - MEDIUM] Code duplication
   MOST reviewers agree: Claude, Gemini (2/3)

5. [DIVERGENT - LOW] Variable naming
   SINGLE reviewer: Claude only (1/3)

Action:
- Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH)
- Fix issue 3 before review (strong consensus)
- Consider issue 4 (strong consensus, but medium severity)
- Ignore issue 5 (divergent, likely false positive)
```
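Mapping agreement ratios to these levels is a one-liner per band. A sketch, with illustrative issue shapes (the `ReviewerIssue` type is an assumption, not a real review API):

```typescript
// Consensus-scoring sketch: maps agreement ratios to the levels above,
// then sorts so the strongest consensus surfaces first.
type ConsensusLevel = "UNANIMOUS" | "STRONG" | "MAJORITY" | "DIVERGENT";

function consensusLevel(flaggedBy: number, totalReviewers: number): ConsensusLevel {
  const ratio = flaggedBy / totalReviewers;
  if (ratio === 1) return "UNANIMOUS";  // 100% agreement
  if (ratio >= 2 / 3) return "STRONG";  // 67-99% (2/3 counts as STRONG)
  if (ratio >= 0.5) return "MAJORITY";  // 50-66%
  return "DIVERGENT";                   // < 50%
}

interface ReviewerIssue {
  description: string;
  reviewers: string[]; // which reviewers flagged it
}

function prioritizeByConsensus(issues: ReviewerIssue[], totalReviewers: number) {
  const rank: Record<ConsensusLevel, number> = {
    UNANIMOUS: 0, STRONG: 1, MAJORITY: 2, DIVERGENT: 3,
  };
  return issues
    .map((i) => ({ ...i, level: consensusLevel(i.reviewers.length, totalReviewers) }))
    .sort((a, b) => rank[a.level] - rank[b.level]);
}
```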
---

### Pattern 5: Feedback Loop Implementation

**User Feedback Loop:**

```
Workflow: User Validation with Feedback

Step 1: Initial Implementation
  Developer implements feature
  Designer/Tester validates
  Present to user for manual validation

Step 2: User Validation Gate (MANDATORY)
  Present to user:
  "Implementation complete. Please manually verify:
  - Open app at http://localhost:3000
  - Test feature: [specific instructions]
  - Compare to design reference

  Does it meet expectations? (Yes/No)"

Step 3a: User says YES
  → ✅ Feature approved
  → Generate final report
  → Mark workflow complete

Step 3b: User says NO
  → Collect specific feedback

Step 4: Collect Specific Feedback
  Ask user:
  "Please describe the issues you found:"

  User response:
  "1. Button color is wrong (should be blue, not green)
  2. Spacing is too tight between elements
  3. Font size is too small"

Step 5: Extract Structured Feedback
  Parse user feedback into structured issues:

  Issue 1:
    Component: Button
    Problem: Color incorrect
    Expected: Blue (#2563EB)
    Actual: Green (#10B981)
    Severity: MEDIUM

  Issue 2:
    Component: Container
    Problem: Spacing too tight
    Expected: 16px
    Actual: 8px
    Severity: MEDIUM

  Issue 3:
    Component: Text
    Problem: Font size too small
    Expected: 16px
    Actual: 14px
    Severity: LOW

Step 6: Launch Fixing Agent
  Task: ui-developer
  Prompt: "Fix user-reported issues:
  1. Button color: Change from #10B981 to #2563EB
  2. Container spacing: Increase from 8px to 16px
  3. Text font size: Increase from 14px to 16px

  User feedback: [user's exact words]"

Step 7: Re-validate
  After fixes:
  - Re-run designer validation
  - Loop back to Step 2 (user validation)

Step 8: Max Feedback Rounds
  Limit: 5 feedback rounds (prevent infinite loop)

  If round > 5:
    Escalate to human review
    "Unable to meet user expectations after 5 rounds.
    Manual intervention required."
```

**Feedback Round Tracking:**

```
Feedback Round History:

Round 1:
  User Issues: Button color, spacing, font size
  Fixes Applied: Updated all 3 issues
  Result: Re-validate

Round 2:
  User Issues: Border radius too large
  Fixes Applied: Reduced border radius
  Result: Re-validate

Round 3:
  User Issues: None
  Result: ✅ APPROVED

Total Rounds: 3/5
```
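The round-tracking loop above, with its 5-round escalation, can be driven by a short orchestrator. A minimal sketch, assuming hypothetical `askUserToValidate` and `fixIssues` callbacks standing in for the validation gate and fixing agent:

```typescript
// Feedback-round sketch: collect specific issues, fix, re-validate,
// and escalate after 5 rounds.
interface UserFeedback {
  approved: boolean;
  issues: string[]; // specific, structured complaints, never just "it's wrong"
}

async function feedbackLoop(
  askUserToValidate: () => Promise<UserFeedback>,
  fixIssues: (issues: string[]) => Promise<void>,
  maxRounds = 5
): Promise<boolean> {
  for (let round = 1; round <= maxRounds; round++) {
    const feedback = await askUserToValidate();
    if (feedback.approved) {
      console.log(`✅ APPROVED after ${round}/${maxRounds} round(s)`);
      return true;
    }
    console.log(`Round ${round}/${maxRounds}: ${feedback.issues.length} issue(s) reported`);
    await fixIssues(feedback.issues); // launch the fixing agent with the user's exact words
  }
  console.log("Unable to meet user expectations after 5 rounds. Manual intervention required.");
  return false;
}
```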
---

### Pattern 6: Test-Driven Development Loop

**When to Use:**

Use the TDD loop **after implementing code, before code review**:

```
Workflow Phases:

Phase 1: Architecture Planning
Phase 2: Implementation
Phase 2.5: Test-Driven Development Loop ← THIS PATTERN
Phase 3: Code Review
Phase 4: User Acceptance
```

**The TDD Loop Pattern:**

```
Step 1: Write Tests First
  Task: test-architect
  Prompt: "Write comprehensive tests for authentication feature.
  Requirements: [link to requirements]
  Implementation: [link to code]"

  Output: tests/auth.test.ts

Step 2: Run Tests
  Bash: bun test tests/auth.test.ts
  Capture output and exit code

Step 3: Check Test Results
  If all tests pass:
    → ✅ TDD loop complete
    → Proceed to code review (Phase 3)

  If tests fail:
    → Analyze failure (continue to Step 4)

Step 4: Analyze Test Failure
  Task: test-architect
  Prompt: "Analyze test failure output:
  [test failure logs]

  Determine root cause:
  - TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation)
  - IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior)

  Provide detailed analysis."

  test-architect returns:
    verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE
    analysis: Detailed explanation
    recommendation: Specific fix needed

Step 5a: If TEST_ISSUE (test is wrong)
  Task: test-architect
  Prompt: "Fix test based on analysis: [analysis from Step 4]"

  After fix:
  → Re-run tests (back to Step 2)
  → Loop continues

Step 5b: If IMPLEMENTATION_ISSUE (code is wrong)
  Provide structured feedback to developer:

  Task: backend-developer
  Prompt: "Fix implementation based on test failure:

  Test Failure: [failure output]
  Root Cause: [analysis from test-architect]
  Recommended Fix: [specific fix needed]"

  After fix:
  → Re-run tests (back to Step 2)
  → Loop continues

Step 6: Max Iteration Limit
  Limit: 10 iterations

  Iteration tracking:
  Iteration 1/10: 5 tests failed → Fix implementation
  Iteration 2/10: 2 tests failed → Fix test (bad mock)
  Iteration 3/10: All tests pass ✅

  If iteration > 10:
    Escalate to human review
    "Unable to pass all tests after 10 iterations.
    Manual debugging required."
```

**Example TDD Loop:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Tests Run: 20 tests
  Results: 5 failed, 15 passed
  Failure: "JWT token validation fails with expired token"
  Analysis: IMPLEMENTATION_ISSUE - Missing expiration check
  Fix: Added expiration validation in TokenService
  Re-run: Continue to Iteration 2

Iteration 2:
  Tests Run: 20 tests
  Results: 2 failed, 18 passed
  Failure: "Mock database not reset between tests"
  Analysis: TEST_ISSUE - Missing beforeEach cleanup
  Fix: Added database reset in test setup
  Re-run: Continue to Iteration 3

Iteration 3:
  Tests Run: 20 tests
  Results: All passed ✅

Result: TDD loop complete, proceed to code review

Total Iterations: 3/10
Duration: ~5 minutes

Benefits:
- Caught 2 bugs before code review
- Fixed 1 test quality issue
- All tests passing gives confidence in implementation
```

**Benefits of TDD Loop:**

```
Benefits:
1. Catch bugs early (before code review, not after)
2. Ensure test quality (test-architect fixes bad tests)
3. Automated quality assurance (no manual testing needed)
4. Fast feedback loop (seconds to run tests, not minutes)
5. Confidence in implementation (all tests passing)

Performance:
Traditional: Implement → Review → Find bugs → Fix → Re-review
Time: 30+ minutes, multiple review rounds

TDD Loop: Implement → Test → Fix → Test → Review (with confidence)
Time: 15 minutes, single review round (fewer issues)
```
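The whole run-analyze-route-fix cycle can be driven by one small function. A sketch under assumed interfaces (the runner, analyzer, and fixer callbacks are hypothetical stand-ins for the Bash step and the test-architect/backend-developer agents):

```typescript
// TDD-loop sketch: run tests, classify each failure as TEST_ISSUE or
// IMPLEMENTATION_ISSUE, route the fix, re-run, up to 10 iterations.
interface TestRun {
  passed: number;
  failed: number;
  output: string;
}

interface FailureAnalysis {
  verdict: "TEST_ISSUE" | "IMPLEMENTATION_ISSUE";
  recommendation: string;
}

async function tddLoop(
  runTests: () => Promise<TestRun>,                      // e.g. wraps `bun test`
  analyze: (output: string) => Promise<FailureAnalysis>, // test-architect
  fixTest: (rec: string) => Promise<void>,               // test-architect
  fixImplementation: (rec: string) => Promise<void>,     // backend-developer
  maxIterations = 10
): Promise<boolean> {
  for (let i = 1; i <= maxIterations; i++) {
    const run = await runTests();
    console.log(`Iteration ${i}/${maxIterations}: ${run.failed} failed, ${run.passed} passed`);

    if (run.failed === 0) return true; // proceed to code review

    const analysis = await analyze(run.output);
    if (analysis.verdict === "TEST_ISSUE") {
      await fixTest(analysis.recommendation);
    } else {
      await fixImplementation(analysis.recommendation);
    }
  }
  console.log("Unable to pass all tests after 10 iterations. Manual debugging required.");
  return false;
}
```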
---

## Integration with Other Skills

**quality-gates + multi-model-validation:**

```
Use Case: Cost approval before multi-model review

Step 1: Estimate costs (multi-model-validation)
Step 2: User approval gate (quality-gates)
  If approved: Proceed with parallel execution
  If rejected: Offer alternatives
Step 3: Execute review (multi-model-validation)
```

**quality-gates + multi-agent-coordination:**

```
Use Case: Iteration loop with designer validation

Step 1: Agent selection (multi-agent-coordination)
  Select designer + ui-developer
Step 2: Iteration loop (quality-gates)
  For i = 1 to 10:
  - Run designer validation
  - If PASS: Exit loop
  - Else: Delegate to ui-developer for fixes
Step 3: User validation gate (quality-gates)
  Mandatory manual approval
```

**quality-gates + error-recovery:**

```
Use Case: Test-driven loop with error recovery

Step 1: Run tests (quality-gates TDD pattern)
Step 2: If test execution fails (error-recovery)
  - Syntax error → Fix and retry
  - Framework crash → Notify user, skip TDD
Step 3: If tests pass (quality-gates)
  - Proceed to code review
```

---

## Best Practices

**Do:**
- ✅ Set max iteration limits (prevent infinite loops)
- ✅ Define clear exit criteria (PASS, max iterations, user override)
- ✅ Track iteration history (document what happened)
- ✅ Show progress to user ("Iteration 3/10 complete")
- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW)
- ✅ Prioritize by consensus + severity
- ✅ Ask user approval for expensive operations
- ✅ Collect specific feedback (not vague complaints)
- ✅ Use TDD loop to catch bugs early

**Don't:**
- ❌ Create infinite loops (no exit criteria)
- ❌ Skip user validation gates (mandatory for UX)
- ❌ Ignore consensus (unanimous issues are real)
- ❌ Batch all severities together (prioritize CRITICAL)
- ❌ Proceed without approval for >$0.01 operations
- ❌ Collect vague feedback ("it's wrong" → what specifically?)
- ❌ Skip TDD loop (catches bugs before expensive review)

**Performance:**
- Iteration loops: 5-10 iterations typical, max 10-15 min
- TDD loop: 3-5 iterations typical, max 5-10 min
- User feedback: 1-3 rounds typical, max 5 rounds

---

## Examples

### Example 1: User Approval Gate for Multi-Model Review

**Scenario:** User requests multi-model review; estimated cost $0.008

**Execution:**

```
Step 1: Estimate Costs
  Input: 450 lines × 1.5 = 675 tokens per model
  Output: 2000-4000 tokens per model
  Total: 3 models × 3000 avg = 9000 output tokens
  Cost: ~$0.008 ($0.005 - $0.010)

Step 2: Present Approval Gate
  "Multi-model review will analyze 450 lines with 3 AI models:
  - Claude Sonnet (embedded, free)
  - Grok Code Fast (external, $0.002)
  - Gemini 2.5 Flash (external, $0.001)

  Estimated cost: $0.008 ($0.005 - $0.010)
  Duration: ~5 minutes

  Proceed? (Yes/No/Cancel)"

Step 3a: User says YES
  → Proceed with parallel execution
  → Track approval: log("User approved $0.008 cost")

Step 3b: User says NO
  → Offer alternatives:
    1. Use only free Claude (no external models)
    2. Use only 1 external model (reduce cost to $0.002)
    3. Skip review entirely
  → Ask user to choose

Step 3c: User says CANCEL
  → Exit gracefully
  → Log: "User cancelled multi-model review"
  → Clean up temporary files
```

---

### Example 2: Designer Validation Iteration Loop

**Scenario:** UI implementation with automated iteration until PASS

**Execution:**

```
Iteration 1:
  Task: designer
  Prompt: "Validate navbar against Figma design"
  Output: ai-docs/design-review-1.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Button color: #3B82F6 (expected #2563EB)
  - Spacing: 8px (expected 16px)

  Task: ui-developer
  Prompt: "Fix issues from ai-docs/design-review-1.md"
  Changes: Updated button color, increased spacing

  Result: Continue to Iteration 2

Iteration 2:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-2.md
  Assessment: NEEDS IMPROVEMENT
  Issues:
  - Border radius: 8px (expected 4px)

  Task: ui-developer
  Prompt: "Fix border radius issue"
  Changes: Reduced border radius to 4px

  Result: Continue to Iteration 3

Iteration 3:
  Task: designer
  Prompt: "Re-validate navbar"
  Output: ai-docs/design-review-3.md
  Assessment: PASS ✓
  Issues: None

  Result: Exit loop (success)

Summary:
  Total Iterations: 3/10
  Duration: ~8 minutes
  Automated Fixes: 3 issues resolved
  Result: PASS, proceed to user validation
```

---

### Example 3: Test-Driven Development Loop

**Scenario:** Authentication implementation with TDD

**Execution:**

```
Phase 2.5: Test-Driven Development Loop

Iteration 1:
  Task: test-architect
  Prompt: "Write tests for authentication feature"
  Output: tests/auth.test.ts (20 tests)

  Bash: bun test tests/auth.test.ts
  Result: 5 failed, 15 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: IMPLEMENTATION_ISSUE
  Analysis: "Missing JWT expiration validation"

  Task: backend-developer
  Prompt: "Add JWT expiration validation"
  Changes: Updated TokenService.verify()

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 2

Iteration 2:
  Bash: bun test tests/auth.test.ts
  Result: 2 failed, 18 passed

  Task: test-architect
  Prompt: "Analyze test failures"
  Verdict: TEST_ISSUE
  Analysis: "Mock database not reset between tests"

  Task: test-architect
  Prompt: "Fix test setup"
  Changes: Added beforeEach cleanup

  Bash: bun test tests/auth.test.ts
  Result: Continue to Iteration 3

Iteration 3:
  Bash: bun test tests/auth.test.ts
  Result: All 20 passed ✅

Result: TDD loop complete, proceed to code review

Summary:
  Total Iterations: 3/10
  Duration: ~5 minutes
  Bugs Caught: 1 implementation bug, 1 test bug
  Result: All tests passing, high confidence in code
```
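The cost-estimation heuristic in Example 1 (1.5 tokens per line, 2000-4000 output tokens per model) can be captured in a small helper. A sketch: the per-token prices below are placeholder assumptions, not real provider rates.

```typescript
// Cost-estimation sketch for the approval gate in Example 1.
interface ModelPrice {
  name: string;
  inputPerToken: number;  // assumed $/token, not a real rate
  outputPerToken: number; // assumed $/token, not a real rate
}

function estimateReviewCost(lines: number, models: ModelPrice[]) {
  const inputTokens = Math.round(lines * 1.5); // heuristic from Example 1
  const outputLow = 2000;  // expected output range per model
  const outputHigh = 4000;

  let low = 0;
  let high = 0;
  for (const m of models) {
    low += inputTokens * m.inputPerToken + outputLow * m.outputPerToken;
    high += inputTokens * m.inputPerToken + outputHigh * m.outputPerToken;
  }
  return { low, high, expected: (low + high) / 2 };
}

// Usage: feed the estimate into the approval gate before executing the review.
const estimate = estimateReviewCost(450, [
  { name: "Grok Code Fast", inputPerToken: 1e-6, outputPerToken: 2e-6 },   // assumed rates
  { name: "Gemini 2.5 Flash", inputPerToken: 5e-7, outputPerToken: 1e-6 }, // assumed rates
]);
console.log(`Estimated cost: ~$${estimate.expected.toFixed(3)}`);
```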
---

## Troubleshooting

**Problem: Infinite iteration loop**

Cause: No exit criteria or max iteration limit
Solution: Always set max iterations (10 for automated, 5 for user feedback)

```
❌ Wrong:
while (true) {
  if (review.assessment === "PASS") break;
  fix();
}

✅ Correct:
for (let i = 1; i <= 10; i++) {
  const review = await validate(); // re-validate each iteration
  if (review.assessment === "PASS") break;
  if (i === 10) {
    escalateToUser();
    break;
  }
  await fix();
}
```

---

**Problem: User approval skipped for expensive operation**

Cause: Missing approval gate
Solution: Always ask approval for costs >$0.01

```
❌ Wrong:
if (userRequestedMultiModel) {
  executeReview();
}

✅ Correct:
if (userRequestedMultiModel) {
  const cost = estimateCost();
  if (cost > 0.01) {
    const approved = await askUserApproval(cost);
    if (!approved) return offerAlternatives();
  }
  executeReview();
}
```

---

**Problem: All issues treated equally**

Cause: No severity classification
Solution: Classify by severity, prioritize CRITICAL

```
❌ Wrong:
issues.forEach(issue => fix(issue));

✅ Correct:
const critical = issues.filter(i => i.severity === "CRITICAL");
const high = issues.filter(i => i.severity === "HIGH");

critical.forEach(issue => fix(issue)); // Fix critical first
high.forEach(issue => fix(issue));     // Then high
// MEDIUM and LOW deferred or skipped
```

---

## Summary

Quality gates ensure high-quality results through:

- **User approval gates** (cost, quality, final validation)
- **Iteration loops** (automated refinement, max 10 iterations)
- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW)
- **Consensus prioritization** (unanimous → strong → majority → divergent)
- **Feedback loops** (collect specific issues, fix, re-validate)
- **Test-driven development** (write tests, run, fix, repeat until pass)

Master these patterns and your workflows will consistently produce high-quality, validated results.

---

**Extracted From:**
- `/review` command (user approval for costs, consensus analysis)
- `/validate-ui` command (iteration loops, user validation gates, feedback collection)
- `/implement` command (PHASE 2.5 test-driven development loop)
- Multi-model review patterns (consensus-based prioritization)