---
name: debug:eval
description: Debug and evaluate performance issues with detailed diagnostics and fixes
delegates-to: autonomous-agent:orchestrator
---

# Debugging Performance Evaluation

Measures AI debugging performance by analyzing and fixing real issues in the codebase.

## Usage

```bash
/debug:eval <target> [options]
```

### Options

```bash
--help          Show this help message
--verbose       Show detailed agent selection process
--dry-run       Preview actions without executing
--report-only   Generate report without fixing issues
--performance   Include detailed performance metrics
```

### Help Examples

```bash
# Show help
/debug:eval --help

# Debug with verbose output (shows agent selection)
/debug:eval dashboard --verbose

# Preview what would be fixed
/debug:eval data-validation --dry-run

# Generate report without fixing
/debug:eval performance-index --report-only
```

## How It Works

This command delegates to the **orchestrator** agent, which does the following (a rough code sketch of the flow appears after the list):

1. **Analyzes the debugging request** and determines the optimal approach
2. **Selects appropriate specialized agents** based on task type and complexity
3. **May delegate to validation-controller** for debugging-specific tasks:
   - Issue identification and root cause analysis
   - Systematic debugging methodology
   - Fix implementation with quality controls
4. **Measures debugging performance** using the comprehensive framework:
   - Quality Improvement Score (QIS)
   - Time Efficiency Score (TES)
   - Success Rate tracking
   - Regression detection
   - Overall Performance Index calculation
5. **Generates a detailed performance report** with metrics and improvements
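
For orientation only, here is a minimal Python sketch of that flow. Every name in it is hypothetical; the real agent interfaces are internal to the orchestrator and not part of this command's contract.

```python
# Hypothetical illustration of the orchestration flow above -- not the
# project's actual agent API.
from dataclasses import dataclass

@dataclass
class DebugOutcome:
    initial_quality: float  # 0-100 quality score before the fix
    final_quality: float    # 0-100 quality score after the fix
    minutes_spent: float
    regression_rate: float  # fraction of runs that regressed (0-1)

def orchestrate_debug_eval(target: str, verbose: bool = False) -> DebugOutcome:
    # Steps 1-2: analyze the request and pick specialist agents.
    agents = ["validation-controller", "code-analyzer"]
    if verbose:
        print(f"ORCHESTRATOR: selecting agents: {', '.join(agents)}")
    # Step 3: the selected agents identify, analyze, and fix the issue here.
    # Steps 4-5: measure the result and hand it to the report generator
    # (the scoring formulas are given under "Debugging Performance Framework").
    return DebugOutcome(initial_quality=85.0, final_quality=96.0,
                        minutes_spent=4.2, regression_rate=0.0)
```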

### Agent Delegation Process

When using the `--verbose` flag, you'll see:

```
🔍 ORCHESTRATOR: Analyzing debugging request...
📋 ORCHESTRATOR: Task type identified: "dashboard debugging"
🎯 ORCHESTRATOR: Selecting agents: validation-controller, code-analyzer
🚀 VALIDATION-CONTROLLER: Beginning systematic analysis...
📊 CODE-ANALYZER: Analyzing code structure and patterns...
```

### Why Orchestrator Instead of Direct Validation-Controller?

- **Better Task Analysis**: Orchestrator considers context, complexity, and interdependencies
- **Multi-Agent Coordination**: Complex issues often require multiple specialized agents
- **Quality Assurance**: Orchestrator ensures final results meet quality standards (≥70/100)
- **Pattern Learning**: Successful approaches are stored for future optimization

## Available Targets

### `dashboard`
- **Issue**: Quality Score Timeline chart data inconsistency
- **Symptom**: Chart values change when switching time periods and returning
- **Root Cause**: `random.uniform()` without deterministic seeding in `dashboard.py:710-712`
- **Expected Fix**: Replace random generation with a deterministic seeded calculation (see the sketch after this list)
- **Complexity**: Medium (requires code modification and testing)
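
As an illustration of the fix described above, a minimal sketch. The function name and value range are hypothetical; only the unseeded `random.uniform()` pattern comes from the issue description.

```python
import hashlib
import random

# Buggy pattern: fresh randomness on every render, so the same data point
# gets a different value each time the chart is redrawn.
def quality_score_buggy(date_label: str) -> float:
    return random.uniform(70, 95)

# Deterministic fix: derive a stable seed from the data point's identity,
# so re-rendering the same period always reproduces the same value.
def quality_score_seeded(date_label: str) -> float:
    seed = int(hashlib.sha256(date_label.encode()).hexdigest(), 16) % (2**32)
    return random.Random(seed).uniform(70, 95)

# Same input, same output -- switching time periods and back is now stable.
assert quality_score_seeded("2025-10-24") == quality_score_seeded("2025-10-24")
```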

### `performance-index`
- **Issue**: AI Debugging Performance Index calculation accuracy
- **Symptom**: Potential discrepancies in performance measurements
- **Root Cause**: QIS formula implementation and regression penalty system
- **Expected Fix**: Validate and correct the calculation methodology
- **Complexity**: High (requires framework validation)

### `data-validation`
- **Issue**: Data integrity across dashboard metrics
- **Symptom**: Inconsistent data between different charts
- **Root Cause**: Data processing and caching inconsistencies
- **Expected Fix**: Standardize data loading and processing
- **Complexity**: Medium (requires data pipeline analysis)

## Debugging Performance Framework

The evaluation uses the comprehensive debugging performance framework:

### Quality Improvement Score (QIS)

```
QIS = 0.6 × FinalQuality + 0.4 × GapClosedPct
```

Where `GapClosedPct` is the percentage (0-100) of the initial quality gap closed by the fix.
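
A worked instance of the formula, with the gap-closed definition above assumed (variable names are illustrative):

```python
def qis(final_quality: float, gap_closed_pct: float) -> float:
    """Quality Improvement Score; both inputs on a 0-100 scale."""
    return 0.6 * final_quality + 0.4 * gap_closed_pct

# Example: quality goes 60 -> 90, closing 30 of the 40-point gap to 100.
gap_closed_pct = (90 - 60) / (100 - 60) * 100  # 75.0
print(qis(90, gap_closed_pct))                 # 0.6*90 + 0.4*75 = 84.0
```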

### Time Efficiency Score (TES)
- Measures speed of problem identification and resolution
- Accounts for task complexity and analysis depth
- Ideal debugging time: ~30 minutes per task (an illustrative scoring curve is sketched below)
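
The TES formula itself isn't given here; purely as an illustration consistent with the bullets above, one possible shape:

```python
def tes(minutes_spent: float, ideal_minutes: float = 30.0) -> float:
    """Hypothetical Time Efficiency Score on a 0-100 scale.

    Assumption for illustration only: full marks at or under the ideal
    time, then a linear decay. The real scoring also weights task
    complexity and analysis depth, which this sketch ignores.
    """
    if minutes_spent <= ideal_minutes:
        return 100.0
    return max(0.0, 100.0 - (minutes_spent - ideal_minutes))

print(tes(4.2))   # 100.0
print(tes(45.0))  # 85.0
```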

### Performance Index with Regression Penalty

```
PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty
```

Where `Penalty = RegressionRate × 20`.
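
Transcribing the formula directly (assuming, since the text doesn't pin this down, that SR is a 0-100 success rate and RegressionRate a 0-1 fraction, so the penalty tops out at 20 points):

```python
def performance_index(qis: float, tes: float, sr: float,
                      regression_rate: float) -> float:
    """PI = 0.40*QIS + 0.35*TES + 0.25*SR - RegressionRate*20."""
    penalty = regression_rate * 20  # assumed 0-1 rate -> 0-20 point penalty
    return 0.40 * qis + 0.35 * tes + 0.25 * sr - penalty

# Example: strong scores with a 10% regression rate.
print(performance_index(80.0, 90.0, 100.0, 0.1))  # 32.0 + 31.5 + 25.0 - 2.0 = 86.5
```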

## Skills Utilized

- **autonomous-agent:validation-standards** - Tool requirements and consistency checks
- **autonomous-agent:quality-standards** - Best practices and quality benchmarks
- **autonomous-agent:pattern-learning** - Historical debugging patterns and approaches
- **autonomous-agent:security-patterns** - Security-focused debugging methodology

## Expected Output

### Terminal Summary

```
🔍 DEBUGGING PERFORMANCE EVALUATION
Target: dashboard data inconsistency

📊 PERFORMANCE METRICS:
* Initial Quality: 85/100
* Final Quality: 96/100 (+11 points)
* QIS (Quality Improvement): 78.5/100
* Time Efficiency: 92/100
* Success Rate: 100%
* Regression Penalty: 0
* Performance Index: 87.2/100

⚡ DEBUGGING RESULTS:
[PASS] Root cause identified: random.uniform() without seeding
[PASS] Fix implemented: deterministic seeded calculation
[PASS] Quality improvement: +11 points
[PASS] Time to resolution: 4.2 minutes

📄 Full report: .claude/data/reports/debug-eval-dashboard-2025-10-24.md
⏱ Completed in 4.2 minutes
```

### Detailed Report

Located at: `.claude/data/reports/debug-eval-<target>-YYYY-MM-DD.md`

Comprehensive analysis including:
- Issue identification and root cause analysis
- Step-by-step debugging methodology
- Code changes and quality improvements
- Performance metrics breakdown
- Validation and testing results
- Recommendations for future improvements

## Integration with AI Debugging Performance Index

Each `/debug:eval` execution automatically:

1. Records the debugging task in quality history
2. Calculates QIS based on the quality improvements made
3. Measures time efficiency for problem resolution
4. Updates model performance metrics
5. Stores debugging patterns for future learning
6. Updates the AI Debugging Performance Index chart
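
Purely illustrative of the record-keeping steps above (the file location and schema here are hypothetical, not the project's actual storage format):

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical quality-history store; the real path and schema may differ.
HISTORY = Path(".claude/data/quality-history.json")

def record_debug_run(target: str, qis: float, tes: float, pi: float) -> None:
    entries = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    entries.append({"date": date.today().isoformat(), "target": target,
                    "qis": qis, "tes": tes, "performance_index": pi})
    HISTORY.write_text(json.dumps(entries, indent=2))
```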

## Examples

### Analyze Dashboard Data Inconsistency

```bash
/debug:eval dashboard
```

### Validate Performance Index Calculations

```bash
/debug:eval performance-index
```

### Comprehensive Data Validation

```bash
/debug:eval data-validation
```

## Benefits

**For Debugging Performance Measurement**:
- Real-world debugging scenarios with measurable outcomes
- Comprehensive performance metrics using the established framework
- Quality improvement tracking over time
- Time efficiency analysis for different problem types

**For Code Quality**:
- Identifies and fixes actual issues in the codebase
- Improves system reliability and data integrity
- Validates fixes with quality controls
- Documents debugging approaches for future reference

**For Learning System**:
- Builds a database of debugging patterns and solutions
- Improves debugging efficiency over time
- Identifies the most effective debugging approaches
- Tracks performance improvements across different problem types