| name | description | delegates-to |
|---|---|---|
| debug:eval | Debug and evaluate performance issues with detailed diagnostics and fixes | autonomous-agent:orchestrator |
# Debugging Performance Evaluation
Measures AI debugging performance by analyzing and fixing real issues in the codebase.
## Usage

```
/debug:eval <target> [options]
```
### Options

```
--help          Show this help message
--verbose       Show detailed agent selection process
--dry-run       Preview actions without executing
--report-only   Generate report without fixing issues
--performance   Include detailed performance metrics
```
### Help Examples

```
# Show help
/debug:eval --help

# Debug with verbose output (shows agent selection)
/debug:eval dashboard --verbose

# Preview what would be fixed
/debug:eval data-validation --dry-run

# Generate report without fixing
/debug:eval performance-index --report-only
```
## How It Works

This command delegates to the orchestrator agent, which:

1. Analyzes the debugging request and determines the optimal approach
2. Selects appropriate specialized agents based on task type and complexity
3. May delegate to validation-controller for debugging-specific tasks:
   - Issue identification and root cause analysis
   - Systematic debugging methodology
   - Fix implementation with quality controls
4. Measures debugging performance using the comprehensive framework:
   - Quality Improvement Score (QIS)
   - Time Efficiency Score (TES)
   - Success Rate tracking
   - Regression detection
   - Overall Performance Index calculation
5. Generates a detailed performance report with metrics and improvements
## Agent Delegation Process

When using the `--verbose` flag, you'll see:

```
🔍 ORCHESTRATOR: Analyzing debugging request...
📋 ORCHESTRATOR: Task type identified: "dashboard debugging"
🎯 ORCHESTRATOR: Selecting agents: validation-controller, code-analyzer
🚀 VALIDATION-CONTROLLER: Beginning systematic analysis...
📊 CODE-ANALYZER: Analyzing code structure and patterns...
```
### Why Orchestrator Instead of Direct Validation-Controller?

- **Better Task Analysis:** Orchestrator considers context, complexity, and interdependencies
- **Multi-Agent Coordination:** Complex issues often require multiple specialized agents
- **Quality Assurance:** Orchestrator ensures final results meet quality standards (≥70/100)
- **Pattern Learning:** Successful approaches are stored for future optimization
## Available Targets

### dashboard

- **Issue:** Quality Score Timeline chart data inconsistency
- **Symptom:** Chart values change when switching time periods and returning
- **Root Cause:** `random.uniform()` without deterministic seeding in `dashboard.py:710-712`
- **Expected Fix:** Replace random generation with a deterministic seeded calculation
- **Complexity:** Medium (requires code modification and testing)
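As a minimal sketch of the expected fix pattern (the function name, value range, and seeding scheme below are illustrative assumptions, not the actual `dashboard.py` code):

```python
import random

def timeline_value(period: str, bucket: int,
                   low: float = 80.0, high: float = 100.0) -> float:
    """Deterministic replacement for an unseeded random.uniform() call.

    Seeding a local Random with the (period, bucket) pair makes the value
    stable across renders, so switching time periods and returning to the
    chart reproduces the same points.
    """
    rng = random.Random(f"{period}:{bucket}")  # same inputs -> same seed
    return rng.uniform(low, high)

# Repeated calls with the same inputs yield identical chart values:
assert timeline_value("30d", 3) == timeline_value("30d", 3)
```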
### performance-index

- **Issue:** AI Debugging Performance Index calculation accuracy
- **Symptom:** Potential discrepancies in performance measurements
- **Root Cause:** QIS formula implementation and regression penalty system
- **Expected Fix:** Validate and correct calculation methodology
- **Complexity:** High (requires framework validation)
### data-validation

- **Issue:** Data integrity across dashboard metrics
- **Symptom:** Inconsistent data between different charts
- **Root Cause:** Data processing and caching inconsistencies
- **Expected Fix:** Standardize data loading and processing
- **Complexity:** Medium (requires data pipeline analysis)
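A standardized loader might centralize reading and normalization behind a single cached function so every chart consumes the same processed snapshot. This is a hedged sketch; the file layout and field names (`records`, `timestamp`, `quality`) are assumptions:

```python
import functools
import json
from pathlib import Path

@functools.lru_cache(maxsize=None)
def load_metrics(snapshot_path: str) -> tuple:
    """Read and normalize one metrics snapshot exactly once.

    Caching on the path means all charts requesting the same snapshot
    receive identical processed data, instead of each chart re-reading
    and re-transforming the file on its own.
    """
    raw = json.loads(Path(snapshot_path).read_text())
    records = sorted(raw["records"], key=lambda r: r["timestamp"])
    return tuple((r["timestamp"], r["quality"]) for r in records)
```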
## Debugging Performance Framework

The evaluation uses the comprehensive debugging performance framework:

### Quality Improvement Score (QIS)

```
QIS = 0.6 × FinalQuality + 0.4 × GapClosedPct
```

Both inputs are on a 0-100 scale, so QIS is also 0-100.
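The framework does not define GapClosedPct formally; the sketch below assumes it is the percentage of the remaining quality gap (100 − initial) that the fix closed, which may differ from the framework's own bookkeeping:

```python
def quality_improvement_score(initial: float, final: float) -> float:
    """QIS = 0.6 * FinalQuality + 0.4 * GapClosedPct (both 0-100).

    GapClosedPct is assumed to be the share of the remaining gap
    (100 - initial) closed by the fix, expressed as a percentage.
    """
    gap = 100.0 - initial
    gap_closed_pct = 100.0 * (final - initial) / gap if gap > 0 else 100.0
    return 0.6 * final + 0.4 * gap_closed_pct

# An 85 -> 96 improvement closes ~73% of the gap, giving QIS ≈ 86.9
# under this reading.
print(round(quality_improvement_score(85, 96), 1))
```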
### Time Efficiency Score (TES)
- Measures speed of problem identification and resolution
- Accounts for task complexity and analysis depth
- Ideal debugging time: ~30 minutes per task
### Performance Index with Regression Penalty

```
PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty
Penalty = RegressionRate × 20
```
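Putting the pieces together, a sketch of the index calculation, assuming QIS, TES, and SR are on a 0-100 scale and RegressionRate is a fraction between 0 and 1 (the document does not state the units explicitly):

```python
def performance_index(qis: float, tes: float, success_rate: float,
                      regression_rate: float) -> float:
    """PI = 0.40*QIS + 0.35*TES + 0.25*SR - RegressionRate*20.

    regression_rate is assumed to be a fraction (0.0-1.0), so a 10%
    regression rate costs 2 points off the index.
    """
    penalty = regression_rate * 20.0
    return 0.40 * qis + 0.35 * tes + 0.25 * success_rate - penalty

# Example: strong quality gain, fast resolution, no regressions.
print(performance_index(qis=80.0, tes=90.0, success_rate=100.0,
                        regression_rate=0.0))  # 88.5
```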
## Skills Utilized

- `autonomous-agent:validation-standards` - Tool requirements and consistency checks
- `autonomous-agent:quality-standards` - Best practices and quality benchmarks
- `autonomous-agent:pattern-learning` - Historical debugging patterns and approaches
- `autonomous-agent:security-patterns` - Security-focused debugging methodology
## Expected Output

### Terminal Summary

```
🔍 DEBUGGING PERFORMANCE EVALUATION
Target: dashboard data inconsistency

📊 PERFORMANCE METRICS:
* Initial Quality: 85/100
* Final Quality: 96/100 (+11 points)
* QIS (Quality Improvement): 78.5/100
* Time Efficiency: 92/100
* Success Rate: 100%
* Regression Penalty: 0
* Performance Index: 87.2/100

⚡ DEBUGGING RESULTS:
[PASS] Root cause identified: random.uniform() without seeding
[PASS] Fix implemented: deterministic seeded calculation
[PASS] Quality improvement: +11 points
[PASS] Time to resolution: 4.2 minutes

📄 Full report: .claude/data/reports/debug-eval-dashboard-2025-10-24.md
⏱ Completed in 4.2 minutes
```
### Detailed Report

Located at: `.claude/data/reports/debug-eval-<target>-YYYY-MM-DD.md`

Comprehensive analysis including:

- Issue identification and root cause analysis
- Step-by-step debugging methodology
- Code changes and quality improvements
- Performance metrics breakdown
- Validation and testing results
- Recommendations for future improvements
## Integration with AI Debugging Performance Index

Each `/debug:eval` execution automatically:

- Records the debugging task in quality history (a hypothetical record is sketched after this list)
- Calculates QIS based on quality improvements made
- Measures time efficiency for problem resolution
- Updates model performance metrics
- Stores debugging patterns for future learning
- Updates AI Debugging Performance Index chart
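The stored record format is not specified; as a rough illustration, one quality-history entry might carry the fields below (all field names are hypothetical; the values are taken from the terminal summary above):

```python
# Hypothetical quality-history entry; every field name is illustrative.
record = {
    "target": "dashboard",
    "date": "2025-10-24",
    "initial_quality": 85,
    "final_quality": 96,
    "qis": 78.5,
    "tes": 92.0,
    "success_rate": 100.0,
    "regression_penalty": 0.0,
    "performance_index": 87.2,
    "report": ".claude/data/reports/debug-eval-dashboard-2025-10-24.md",
}
```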
## Examples

### Analyze Dashboard Data Inconsistency

```
/debug:eval dashboard
```

### Validate Performance Index Calculations

```
/debug:eval performance-index
```

### Comprehensive Data Validation

```
/debug:eval data-validation
```
## Benefits

**For Debugging Performance Measurement:**
- Real-world debugging scenarios with measurable outcomes
- Comprehensive performance metrics using established framework
- Quality improvement tracking over time
- Time efficiency analysis for different problem types
**For Code Quality:**
- Identifies and fixes actual issues in codebase
- Improves system reliability and data integrity
- Validates fixes with quality controls
- Documents debugging approaches for future reference
**For Learning System:**
- Builds database of debugging patterns and solutions
- Improves debugging efficiency over time
- Identifies most effective debugging approaches
- Tracks performance improvements across different problem types