--- name: Orchestration QA description: Quality assurance for orchestration workflows - validates Skills and Subagents follow documented patterns, tracks deviations, suggests improvements --- # Orchestration QA Skill ## Overview This skill provides quality assurance for Task Orchestrator workflows by validating that Skills and Subagents follow their documented patterns, detecting deviations, and suggesting continuous improvements. **Key Capabilities:** - **Interactive configuration** - User chooses which analyses to enable (token efficiency) - **Pre-execution validation** - Context capture, checkpoint setting - **Post-execution review** - Workflow adherence, output validation - **Specialized quality analysis** - Execution graphs, tag coverage, information density - **Efficiency analysis** - Token optimization, tool selection, parallelization - **Deviation reporting** - Structured findings with severity (ALERT/WARN/INFO) - **Pattern tracking** - Continuous improvement suggestions **Philosophy:** - ✅ **User-driven configuration** - Pay token costs only for analyses you want - ✅ **Observe and validate** - Never blocks execution - ✅ **Report transparently** - Clear severity levels (ALERT/WARN/INFO) - ✅ **Learn from patterns** - Track issues, suggest improvements - ✅ **Progressive loading** - Load only analysis needed for context - ❌ **Not a blocker** - Warns about issues, doesn't stop workflows - ❌ **Not auto-fix** - Asks user for decisions on deviations ## When to Use This Skill ### Interactive Configuration (FIRST TIME) **Trigger**: First time using orchestration-qa in a session, or when user wants to change settings **Action**: Ask user which analysis categories to enable (multiselect interface) **Output**: Configuration stored in session, used for all subsequent reviews **User Value**: Only pay token costs for analyses you actually want ### Session Initialization **Trigger**: After configuration, at start of orchestration session **Action**: Load knowledge bases (Skills, Subagents, routing config) based on enabled categories **Output**: Initialization status with active configuration, ready signal ### Pre-Execution Validation **Triggers**: - "Create feature for X" (before Feature Orchestration Skill or Feature Architect) - "Execute tasks" (before Task Orchestration Skill) - "Mark complete" (before Status Progression Skill) - Before launching any Skill or Subagent **Action**: Capture context, set validation checkpoints **Output**: Stored context for post-execution comparison ### Post-Execution Review **Triggers**: - After any Skill completes - After any Subagent returns - User asks: "Review quality", "Show QA results", "Any issues?" **Action**: Validate workflow adherence, analyze quality, detect deviations **Output**: Structured quality report with findings and recommendations ## Parameters ```typescript { phase: "init" | "pre" | "post" | "configure", // For pre/post phases entityType?: "feature-orchestration" | "task-orchestration" | "status-progression" | "dependency-analysis" | "feature-architect" | "planning-specialist" | "backend-engineer" | "frontend-developer" | "database-engineer" | "test-engineer" | "technical-writer" | "bug-triage-specialist", // For pre phase userInput?: string, // Original user request // For post phase entityOutput?: string, // Output from Skill/Subagent entityId?: string, // Feature/Task/Project ID (if applicable) // Optional verboseReporting?: boolean // Default: false (brief reports) } ``` ## Workflow ### Phase: configure (Interactive Configuration) - **ALWAYS RUN FIRST** **Purpose**: Let user choose which analysis categories to enable for the session **When**: Before init phase, or when user wants to change settings mid-session **Interactive Prompts**: Use AskUserQuestion to present configuration options: ```javascript AskUserQuestion({ questions: [ { question: "Which quality analysis categories would you like to enable for this session?", header: "QA Categories", multiSelect: true, options: [ { label: "Information Density", description: "Analyze task content quality, detect wasteful patterns, measure information-to-token ratio (Specialists only)" }, { label: "Execution Graphs", description: "Validate dependency graphs and parallel execution opportunities (Planning Specialist only)" }, { label: "Tag Coverage", description: "Check tag consistency and agent-mapping coverage (Planning Specialist & Feature Architect)" }, { label: "Token Optimization", description: "Identify token waste patterns (verbose output, unnecessary loading, redundant operations)" }, { label: "Tool Selection", description: "Verify optimal tool usage (overview vs get, search vs filtered query, bulk operations)" }, { label: "Routing Validation", description: "Detect Skills bypass violations (CRITICAL - status changes, feature creation, task execution)" }, { label: "Parallel Detection", description: "Find missed parallelization opportunities (independent tasks, batch operations)" } ] }, { question: "How detailed should QA reports be?", header: "Report Style", multiSelect: false, options: [ { label: "Brief", description: "Only show critical issues (ALERT level) - minimal token usage" }, { label: "Standard", description: "Show ALERT and WARN level issues with brief explanations" }, { label: "Detailed", description: "Show all issues (ALERT/WARN/INFO) with full analysis and recommendations" } ] } ] }) ``` **Default Configuration** (if user skips configuration): - ✅ Routing Validation (CRITICAL - always enabled) - ✅ Information Density (for specialists) - ❌ All other categories disabled - Report style: Standard **Configuration Storage**: Store user preferences in session state: ```javascript session.qaConfig = { enabled: { informationDensity: true/false, executionGraphs: true/false, tagCoverage: true/false, tokenOptimization: true/false, toolSelection: true/false, routingValidation: true, // Always true (CRITICAL) parallelDetection: true/false }, reportStyle: "brief" | "standard" | "detailed" } ``` **Token Cost**: ~200-300 tokens (one-time configuration) ### Phase: init (Session Initialization) **Purpose**: Load knowledge bases for validation throughout session **Steps**: 1. **If not configured**: Run configure phase first (interactive) 2. Read `initialization.md` for setup workflow 3. Glob `.claude/skills/*/SKILL.md` → extract Skills knowledge - Parse skill name, triggers, workflows, tools, token ranges 4. Glob `.claude/agents/task-orchestrator/*.md` → extract Subagents knowledge - Parse agent name, steps, critical patterns, output validation 5. Read `agent-mapping.yaml` → extract routing configuration 6. Initialize tracking state (deviations, patterns, improvements) 7. Report initialization status with active configuration **Output**: ```javascript { initialized: true, knowledgeBase: { skillsCount: 5, subagentsCount: 8, routingLoaded: true }, configuration: { enabled: ["Information Density", "Routing Validation"], disabled: ["Execution Graphs", "Tag Coverage", "Token Optimization", "Tool Selection", "Parallel Detection"], reportStyle: "standard" }, tracking: { sessionStart: timestamp, deviations: [], patterns: [] } } ``` **Token Cost**: ~800-1000 tokens (loads once per session) ### Phase: pre (Pre-Execution Validation) **Purpose**: Capture context and set validation checkpoints before launching **Steps**: 1. Read `pre-execution.md` for validation checklist 2. Identify entity type (Skill vs Subagent) 3. Capture original user input context 4. Set entity-specific validation checkpoints based on type: - **Skills**: Expected workflow steps, tool usage, token range - **Subagents**: Expected steps (8-9 steps), critical patterns, output format 5. Store context for post-execution comparison 6. Return ready signal **Context Captured**: - User's original request (full text) - Expected mode (PRD/Interactive/Quick for Feature Architect) - Entity type and anticipated complexity - Validation checkpoints to verify after execution **Output**: ```javascript { ready: true, contextCaptured: true, checkpoints: [ "Verify Skill assessed complexity correctly", "Verify templates discovered and applied", // ... entity-specific checkpoints ] } ``` **Token Cost**: ~400-600 tokens ### Phase: post (Post-Execution Review) **Purpose**: Validate workflow adherence, analyze quality, detect deviations **Steps**: #### 1. Load Post-Execution Workflow Read `post-execution.md` for review process #### 2. Determine Required Analyses Based on entity type AND user configuration: **Planning Specialist**: - Always: `post-execution.md` → core workflow validation - If `routingValidation` enabled: `routing-validation.md` → Skills usage check - If `executionGraphs` enabled: `graph-quality.md` → execution graph validation - If `tagCoverage` enabled: `tag-quality.md` → tag coverage analysis **Feature Architect**: - Always: `post-execution.md` → PRD extraction validation - Always: Compare output vs original user input - If `routingValidation` enabled: `routing-validation.md` → agent-mapping check - If `tagCoverage` enabled: `tag-quality.md` → tag consistency check **Implementation Specialists** (Backend, Frontend, Database, Test, Technical Writer): - Always: `post-execution.md` → lifecycle steps verification - If `routingValidation` enabled: `routing-validation.md` → Status Progression Skill usage - If `informationDensity` enabled: `task-content-quality.md` → content quality analysis - Always: Verify summary (300-500 chars), Files Changed section, test results **All Skills**: - Always: Read skill definition from knowledge base - Always: Verify expected workflow steps followed - Always: Check tool usage matches expected patterns - Always: Validate token range #### 3. Conditional Efficiency Analysis Based on user configuration: - If `tokenOptimization` enabled: Read `token-optimization.md` → identify token waste - If `toolSelection` enabled: Read `tool-selection.md` → verify optimal tool usage - If `parallelDetection` enabled: Read `parallel-detection.md` → find missed parallelization #### 4. Deviation Detection Compare actual execution against expected patterns: - **ALERT**: Critical violations (status bypass, cross-domain tasks, missing requirements) - **WARN**: Process issues (verbose output, skipped steps, suboptimal dependencies) - **INFO**: Observations (efficiency opportunities, quality patterns) #### 5. Reporting If deviations found: - Read `deviation-templates.md` → format report - Add to TodoWrite with appropriate severity - If ALERT: Report immediately to user with decision prompt - If WARN: Log for end-of-session summary - If INFO: Track for pattern analysis #### 6. Pattern Tracking Read `pattern-tracking.md` → continuous improvement: - Check for recurring issues (count >= 2 in session) - Suggest definition improvements if patterns detected - Track for session summary **Output**: ```javascript { workflowAdherence: "8/8 steps followed (100%)", expectedOutputs: "7/7 present", deviations: [ { severity: "ALERT", issue: "Cross-domain task detected", details: "Task mixes backend + frontend", recommendation: "Split into domain-isolated tasks" } ], analyses: { graphQuality: "95%", tagCoverage: "100%", tokenEfficiency: "85%" }, recommendations: [ "Update planning-specialist.md to enforce domain isolation", "Add validation checklist for cross-domain detection" ] } ``` **Token Cost**: - Basic validation: ~600-800 tokens - With specialized analysis (Planning Specialist): ~1500-2000 tokens - With efficiency analysis: +800-1200 tokens ## Progressive Loading Strategy **Optimization**: Load only the analysis docs needed based on entity type AND user configuration ### Configuration-Driven Loading **Core Loading** (always loaded regardless of config): - `post-execution.md` → base workflow validation - Skill/Subagent definition from knowledge base - Entity-specific mandatory checks (summary, files changed, etc.) **Conditional Loading** (based on user configuration): ```javascript // Planning Specialist if (config.routingValidation) → Read routing-validation.md if (config.executionGraphs) → Read graph-quality.md if (config.tagCoverage) → Read tag-quality.md // Feature Architect if (config.routingValidation) → Read routing-validation.md if (config.tagCoverage) → Read tag-quality.md // Implementation Specialists (Backend, Frontend, Database, Test, Technical Writer) if (config.routingValidation) → Read routing-validation.md if (config.informationDensity) → Read task-content-quality.md // All Entities if (config.tokenOptimization) → Read token-optimization.md if (config.toolSelection) → Read tool-selection.md if (config.parallelDetection) → Read parallel-detection.md // Reporting if (deviations.length > 0) → Read deviation-templates.md if (session.deviations.count >= 2) → Read pattern-tracking.md ``` ### Token Savings Examples **Example 1: User only wants Information Density feedback** - Configuration: Only "Information Density" enabled - Loaded for Backend Engineer: `post-execution.md` + `task-content-quality.md` = ~1,200 tokens - Skipped: `routing-validation.md`, `token-optimization.md`, `tool-selection.md`, `parallel-detection.md` = ~2,400 tokens saved - **Savings: 67% reduction** **Example 2: User wants minimal CRITICAL validation only** - Configuration: Only "Routing Validation" enabled - Loaded: `post-execution.md` + `routing-validation.md` = ~1,000 tokens - Skipped: All other analysis docs = ~3,500 tokens saved - **Savings: 78% reduction** **Example 3: User wants comprehensive Planning Specialist review** - Configuration: All categories enabled - Loaded: `post-execution.md` + `graph-quality.md` + `tag-quality.md` + `routing-validation.md` + efficiency docs = ~3,500 tokens - Skipped: None (comprehensive mode) - **Savings: 0% (full analysis)** ### Special Cases **Task Orchestration Skill**: - `parallel-detection.md` always loaded if enabled in config (core to this skill's purpose) **Status Progression Skill**: - `routing-validation.md` always loaded if enabled in config (CRITICAL - status bypass detection) ## Output Format ### Success (No Deviations) ```markdown ✅ **QA Review**: [Entity Name] Workflow adherence: 100% All quality checks passed. [If efficiency analysis enabled:] Token efficiency: 85% (identified 2 optimization opportunities) ``` ### Issues Found ```markdown ## QA Review: [Entity Name] **Workflow Adherence:** X/Y steps (Z%) ### ✅ Successes - [Success 1] - [Success 2] ### ⚠️ Issues Detected **🚨 ALERT**: [Critical issue] - Impact: [What this affects] - Found: [What was observed] - Expected: [What should have happened] - Recommendation: [How to fix] **⚠️ WARN**: [Process issue] - Found: [What was observed] - Expected: [What should have happened] ### 📋 Added to TodoWrite - Review [Entity]: [Issue description] - Improvement: [Suggestion] ### 🎯 Recommendations 1. [Most critical action] 2. [Secondary action] ### 💭 Decision Required [If user decision needed, present options] ``` ## Integration with Orchestrator **Recommended Pattern**: ```javascript // 1. FIRST TIME: Interactive configuration Use orchestration-qa skill (phase="configure") // Agent asks user which analysis categories to enable // User selects: "Information Density" + "Routing Validation" // Configuration stored in session // 2. Session initialization Use orchestration-qa skill (phase="init") // Returns: Initialized with [2] analysis categories enabled // 3. Before launching Feature Architect Use orchestration-qa skill ( phase="pre", entityType="feature-architect", userInput="[user's original request]" ) // 4. Launch Feature Architect Task(subagent_type="Feature Architect", prompt="...") // 5. After Feature Architect returns Use orchestration-qa skill ( phase="post", entityType="feature-architect", entityOutput="[subagent's response]", entityId="feature-uuid" ) // Only loads: post-execution.md + routing-validation.md (user config) // Skips: graph-quality.md, tag-quality.md, token-optimization.md (not enabled) // 6. Review QA findings, take action if needed ``` **Mid-Session Reconfiguration**: ```javascript // User: "I want to also track token optimization now" Use orchestration-qa skill (phase="configure") // Agent asks again, pre-selects current config // User adds "Token Optimization" to enabled categories // New config stored, affects all subsequent post-execution reviews ``` ## Supporting Documentation This skill uses progressive loading to minimize token usage. Supporting docs are read as needed: - **initialization.md** - Session setup workflow - **pre-execution.md** - Context capture and checkpoint setting - **post-execution.md** - Core review workflow for all entities - **graph-quality.md** - Planning Specialist: execution graph analysis - **tag-quality.md** - Planning Specialist: tag coverage validation - **task-content-quality.md** - Implementation Specialists: information density and wasteful pattern detection - **token-optimization.md** - Efficiency: identify token waste patterns - **tool-selection.md** - Efficiency: verify optimal tool usage - **parallel-detection.md** - Efficiency: find missed parallelization - **routing-validation.md** - Critical: Skills vs Direct tool violations - **deviation-templates.md** - User report formatting by severity - **pattern-tracking.md** - Continuous improvement tracking ## Token Efficiency **Current Trainer** (monolithic): ~20k-30k tokens always loaded **Orchestration QA Skill** (configuration-driven progressive loading): - Configure phase: ~200-300 tokens (one-time, interactive) - Init phase: ~1000 tokens (one-time per session) - Pre-execution: ~600 tokens (per entity) - Post-execution (varies by configuration): - **Minimal** (routing only): ~800-1000 tokens - **Standard** (info density + routing): ~1200-1500 tokens - **Planning Specialist** (graphs + tags + routing): ~2000-2500 tokens - **Comprehensive** (all categories): ~3500-4000 tokens **Configuration Impact Examples**: | User Configuration | Token Cost | vs Monolithic | vs Default | |-------------------|------------|---------------|------------| | Information Density only | ~1,200 tokens | 94% savings | 67% savings | | Routing Validation only | ~1,000 tokens | 95% savings | 78% savings | | Default (Info + Routing) | ~1,500 tokens | 93% savings | baseline | | Comprehensive (all enabled) | ~4,000 tokens | 80% savings | -167% | **Smart Defaults**: Most users only need Information Density + Routing Validation, achieving 93% token reduction while catching critical issues and wasteful content. ## Quality Metrics Track these metrics across sessions: - Workflow adherence percentage - Deviation count by severity (ALERT/WARN/INFO) - Pattern recurrence (same issue multiple times) - Definition improvement suggestions generated - Token efficiency of analyzed workflows ## Examples See `examples.md` for detailed usage scenarios including: - **Interactive configuration** - Choosing analysis categories - **Session initialization** - Loading knowledge bases with config - **Feature Architect validation** - PRD mode with selective analysis - **Planning Specialist review** - Graph + tag analysis (when enabled) - **Implementation Specialist review** - Information density tracking - **Status Progression enforcement** - Critical routing violations - **Mid-session reconfiguration** - Changing enabled categories - **Token efficiency comparisons** - Different configuration impacts