---
name: debug:eval
description: Debug and evaluate performance issues with detailed diagnostics and fixes
delegates-to: autonomous-agent:orchestrator
---
# Debugging Performance Evaluation
Measures AI debugging performance by analyzing and fixing real issues in the codebase.
## Usage
```bash
/debug:eval <target> [options]
```
### Options
```bash
--help         Show this help message
--verbose      Show detailed agent selection process
--dry-run      Preview actions without executing
--report-only  Generate report without fixing issues
--performance  Include detailed performance metrics
```
### Help Examples
```bash
# Show help
/debug:eval --help
# Debug with verbose output (shows agent selection)
/debug:eval dashboard --verbose
# Preview what would be fixed
/debug:eval data-validation --dry-run
# Generate report without fixing
/debug:eval performance-index --report-only
```
## How It Works
This command delegates to the **orchestrator** agent, which:
1. **Analyzes the debugging request** and determines optimal approach
2. **Selects appropriate specialized agents** based on task type and complexity
3. **May delegate to validation-controller** for debugging-specific tasks:
- Issue identification and root cause analysis
- Systematic debugging methodology
- Fix implementation with quality controls
4. **Measures debugging performance** using the comprehensive framework:
- Quality Improvement Score (QIS)
- Time Efficiency Score (TES)
- Success Rate tracking
- Regression detection
- Overall Performance Index calculation
5. **Generates detailed performance report** with metrics and improvements
### Agent Delegation Process
When using the `--verbose` flag, you'll see output like:
```
🔍 ORCHESTRATOR: Analyzing debugging request...
📋 ORCHESTRATOR: Task type identified: "dashboard debugging"
🎯 ORCHESTRATOR: Selecting agents: validation-controller, code-analyzer
🚀 VALIDATION-CONTROLLER: Beginning systematic analysis...
📊 CODE-ANALYZER: Analyzing code structure and patterns...
```
### Why Orchestrator Instead of Direct Validation-Controller?
- **Better Task Analysis**: Orchestrator considers context, complexity, and interdependencies
- **Multi-Agent Coordination**: Complex issues often require multiple specialized agents
- **Quality Assurance**: Orchestrator ensures final results meet quality standards (≥70/100)
- **Pattern Learning**: Successful approaches are stored for future optimization
## Available Targets
### `dashboard`
- **Issue**: Quality Score Timeline chart data inconsistency
- **Symptom**: Chart values change when switching time periods and returning
- **Root Cause**: `random.uniform()` without deterministic seeding in `dashboard.py:710-712`
- **Expected Fix**: Replace random generation with a deterministic seeded calculation (sketched below)
- **Complexity**: Medium (requires code modification and testing)
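To illustrate the kind of fix expected, here is a minimal Python sketch. It assumes the chart jitters a base score with `random.uniform()`; the function name and signature are hypothetical, not the actual `dashboard.py` code:
```python
import random

def quality_score_point(date_key: str, base: float = 85.0) -> float:
    # Hypothetical helper: seed a private RNG from a stable key (e.g. the
    # data point's date) so the same chart value is regenerated identically
    # on every render.
    rng = random.Random(date_key)
    return base + rng.uniform(-2.0, 2.0)
```
Because the RNG is keyed to the data point rather than to global state, switching time periods and returning reproduces the same values.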
### `performance-index`
- **Issue**: AI Debugging Performance Index calculation accuracy
- **Symptom**: Potential discrepancies in performance measurements
- **Root Cause**: QIS formula implementation and regression penalty system
- **Expected Fix**: Validate and correct calculation methodology
- **Complexity**: High (requires framework validation)
### `data-validation`
- **Issue**: Data integrity across dashboard metrics
- **Symptom**: Inconsistent data between different charts
- **Root Cause**: Data processing and caching inconsistencies
- **Expected Fix**: Standardize data loading and processing
- **Complexity**: Medium (requires data pipeline analysis)
## Debugging Performance Framework
The evaluation uses the comprehensive debugging performance framework:
### Quality Improvement Score (QIS)
```
QIS = (0.6 × FinalQuality) + (0.4 × GapClosedPct)
```
Where GapClosedPct is the percentage of the quality gap closed by the fix, on a 0–100 scale
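For example, if a fix raises quality from 85 to 96, it closes 11 points of a 15-point gap (GapClosedPct ≈ 73.3), giving QIS = (0.6 × 96) + (0.4 × 73.3) ≈ 86.9.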
### Time Efficiency Score (TES)
- Measures speed of problem identification and resolution
- Accounts for task complexity and analysis depth
- Ideal debugging time: ~30 minutes per task
### Performance Index with Regression Penalty
```
PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty
```
Where Penalty = RegressionRate × 20
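A minimal Python sketch of the scoring arithmetic, following the QIS and PI formulas above. The framework's internal TES calculation is not spelled out here, so TES is taken as an input, and the function names are illustrative rather than the framework's actual API:
```python
def quality_improvement_score(initial_quality: float, final_quality: float) -> float:
    """QIS = (0.6 × FinalQuality) + (0.4 × GapClosedPct)."""
    gap = 100.0 - initial_quality
    gap_closed_pct = 100.0 * (final_quality - initial_quality) / gap if gap else 100.0
    return 0.6 * final_quality + 0.4 * gap_closed_pct

def performance_index(qis: float, tes: float, success_rate: float,
                      regression_rate: float) -> float:
    """PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty."""
    penalty = regression_rate * 20.0  # Penalty = RegressionRate × 20
    return 0.40 * qis + 0.35 * tes + 0.25 * success_rate - penalty

qis = quality_improvement_score(85, 96)  # ≈ 86.9
pi = performance_index(qis, tes=92.0, success_rate=100.0, regression_rate=0.0)
print(f"QIS={qis:.1f}, PI={pi:.1f}")
```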
## Skills Utilized
- **autonomous-agent:validation-standards** - Tool requirements and consistency checks
- **autonomous-agent:quality-standards** - Best practices and quality benchmarks
- **autonomous-agent:pattern-learning** - Historical debugging patterns and approaches
- **autonomous-agent:security-patterns** - Security-focused debugging methodology
## Expected Output
### Terminal Summary
```
🔍 DEBUGGING PERFORMANCE EVALUATION
Target: dashboard data inconsistency
📊 PERFORMANCE METRICS:
* Initial Quality: 85/100
* Final Quality: 96/100 (+11 points)
* QIS (Quality Improvement): 78.5/100
* Time Efficiency: 92/100
* Success Rate: 100%
* Regression Penalty: 0
* Performance Index: 87.2/100
⚡ DEBUGGING RESULTS:
[PASS] Root cause identified: random.uniform() without seeding
[PASS] Fix implemented: deterministic seeded calculation
[PASS] Quality improvement: +11 points
[PASS] Time to resolution: 4.2 minutes
📄 Full report: .claude/data/reports/debug-eval-dashboard-2025-10-24.md
⏱ Completed in 4.2 minutes
```
### Detailed Report
Located at: `.claude/data/reports/debug-eval-<target>-YYYY-MM-DD.md`
Comprehensive analysis including:
- Issue identification and root cause analysis
- Step-by-step debugging methodology
- Code changes and quality improvements
- Performance metrics breakdown
- Validation and testing results
- Recommendations for future improvements
## Integration with AI Debugging Performance Index
Each `/debug:eval` execution automatically:
1. Records debugging task in quality history
2. Calculates QIS based on quality improvements made
3. Measures time efficiency for problem resolution
4. Updates model performance metrics
5. Stores debugging patterns for future learning
6. Updates AI Debugging Performance Index chart
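As an illustration of what step 1 might store, here is a hypothetical shape for one quality-history entry; the field names are assumptions, not the framework's documented schema:
```python
# Hypothetical quality-history entry; field names are assumed, not the
# framework's documented schema.
debug_eval_record = {
    "target": "dashboard",
    "initial_quality": 85,
    "final_quality": 96,
    "qis": 86.9,
    "tes": 92.0,
    "success_rate": 100.0,
    "regression_rate": 0.0,
    "performance_index": 92.0,
    "duration_minutes": 4.2,
    "report": ".claude/data/reports/debug-eval-dashboard-2025-10-24.md",
}
```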
## Examples
### Analyze Dashboard Data Inconsistency
```bash
/debug:eval dashboard
```
### Validate Performance Index Calculations
```bash
/debug:eval performance-index
```
### Comprehensive Data Validation
```bash
/debug:eval data-validation
```
## Benefits
**For Debugging Performance Measurement**:
- Real-world debugging scenarios with measurable outcomes
- Comprehensive performance metrics using established framework
- Quality improvement tracking over time
- Time efficiency analysis for different problem types
**For Code Quality**:
- Identifies and fixes actual issues in codebase
- Improves system reliability and data integrity
- Validates fixes with quality controls
- Documents debugging approaches for future reference
**For Learning System**:
- Builds database of debugging patterns and solutions
- Improves debugging efficiency over time
- Identifies most effective debugging approaches
- Tracks performance improvements across different problem types