---
name: debug:eval
description: Debug and evaluate performance issues with detailed diagnostics and fixes
delegates-to: autonomous-agent:orchestrator
---
# Debugging Performance Evaluation
Measures AI debugging performance by analyzing and fixing real issues in the codebase.
## Usage
```bash
/debug:eval <target> [options]
```
### Options
```bash
--help         Show this help message
--verbose      Show detailed agent selection process
--dry-run      Preview actions without executing
--report-only  Generate report without fixing issues
--performance  Include detailed performance metrics
```
### Help Examples
```bash
# Show help
/debug:eval --help
# Debug with verbose output (shows agent selection)
/debug:eval dashboard --verbose
# Preview what would be fixed
/debug:eval data-validation --dry-run
# Generate report without fixing
/debug:eval performance-index --report-only
```
## How It Works
This command delegates to the **orchestrator** agent, which:
1. **Analyzes the debugging request** and determines optimal approach
2. **Selects appropriate specialized agents** based on task type and complexity
3. **May delegate to validation-controller** for debugging-specific tasks:
- Issue identification and root cause analysis
- Systematic debugging methodology
- Fix implementation with quality controls
4. **Measures debugging performance** using the comprehensive framework:
- Quality Improvement Score (QIS)
- Time Efficiency Score (TES)
- Success Rate tracking
- Regression detection
- Overall Performance Index calculation
5. **Generates detailed performance report** with metrics and improvements
### Agent Delegation Process
When using the `--verbose` flag, you'll see output like:
```
🔍 ORCHESTRATOR: Analyzing debugging request...
📋 ORCHESTRATOR: Task type identified: "dashboard debugging"
🎯 ORCHESTRATOR: Selecting agents: validation-controller, code-analyzer
🚀 VALIDATION-CONTROLLER: Beginning systematic analysis...
📊 CODE-ANALYZER: Analyzing code structure and patterns...
```
### Why Orchestrator Instead of Direct Validation-Controller?
- **Better Task Analysis**: Orchestrator considers context, complexity, and interdependencies
- **Multi-Agent Coordination**: Complex issues often require multiple specialized agents
- **Quality Assurance**: Orchestrator ensures final results meet quality standards (≥70/100)
- **Pattern Learning**: Successful approaches are stored for future optimization
## Available Targets
### `dashboard`
- **Issue**: Quality Score Timeline chart data inconsistency
- **Symptom**: Chart values change when switching time periods and returning
- **Root Cause**: `random.uniform()` without deterministic seeding in `dashboard.py:710-712`
- **Expected Fix**: Replace random generation with a deterministic seeded calculation (sketched below)
- **Complexity**: Medium (requires code modification and testing)
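To illustrate the kind of fix expected, here is a minimal Python sketch. It assumes the chart jitters a base score with `random.uniform()`; the function name and signature are hypothetical, not the actual `dashboard.py` code:
```python
import random

def quality_score_point(date_key: str, base: float = 85.0) -> float:
    # Hypothetical helper: seed a private RNG from a stable key (e.g. the
    # data point's date) so the same chart value is regenerated identically
    # on every render.
    rng = random.Random(date_key)
    return base + rng.uniform(-2.0, 2.0)
```
Because the RNG is keyed to the data point rather than to global state, switching time periods and returning reproduces the same values.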
### `performance-index`
- **Issue**: AI Debugging Performance Index calculation accuracy
- **Symptom**: Potential discrepancies in performance measurements
- **Root Cause**: QIS formula implementation and regression penalty system
- **Expected Fix**: Validate and correct calculation methodology
- **Complexity**: High (requires framework validation)
### `data-validation`
- **Issue**: Data integrity across dashboard metrics
- **Symptom**: Inconsistent data between different charts
- **Root Cause**: Data processing and caching inconsistencies
- **Expected Fix**: Standardize data loading and processing
- **Complexity**: Medium (requires data pipeline analysis)
## Debugging Performance Framework
The evaluation uses the comprehensive debugging performance framework:
### Quality Improvement Score (QIS)
```
QIS = (0.6 × FinalQuality) + (0.4 × GapClosedPct)
```
Where GapClosedPct is the percentage of the quality gap closed by the fix, on a 0–100 scale
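For example, if a fix raises quality from 85 to 96, it closes 11 points of a 15-point gap (GapClosedPct ≈ 73.3), giving QIS = (0.6 × 96) + (0.4 × 73.3) ≈ 86.9.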
### Time Efficiency Score (TES)
- Measures speed of problem identification and resolution
- Accounts for task complexity and analysis depth
- Ideal debugging time: ~30 minutes per task
### Performance Index with Regression Penalty
```
PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty
```
Where Penalty = RegressionRate × 20
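A minimal Python sketch of the scoring arithmetic, following the QIS and PI formulas above. The framework's internal TES calculation is not spelled out here, so TES is taken as an input, and the function names are illustrative rather than the framework's actual API:
```python
def quality_improvement_score(initial_quality: float, final_quality: float) -> float:
    """QIS = (0.6 × FinalQuality) + (0.4 × GapClosedPct)."""
    gap = 100.0 - initial_quality
    gap_closed_pct = 100.0 * (final_quality - initial_quality) / gap if gap else 100.0
    return 0.6 * final_quality + 0.4 * gap_closed_pct

def performance_index(qis: float, tes: float, success_rate: float,
                      regression_rate: float) -> float:
    """PI = (0.40 × QIS) + (0.35 × TES) + (0.25 × SR) − Penalty."""
    penalty = regression_rate * 20.0  # Penalty = RegressionRate × 20
    return 0.40 * qis + 0.35 * tes + 0.25 * success_rate - penalty

qis = quality_improvement_score(85, 96)  # ≈ 86.9
pi = performance_index(qis, tes=92.0, success_rate=100.0, regression_rate=0.0)
print(f"QIS={qis:.1f}, PI={pi:.1f}")
```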
## Skills Utilized
- **autonomous-agent:validation-standards** - Tool requirements and consistency checks
- **autonomous-agent:quality-standards** - Best practices and quality benchmarks
- **autonomous-agent:pattern-learning** - Historical debugging patterns and approaches
- **autonomous-agent:security-patterns** - Security-focused debugging methodology
## Expected Output
### Terminal Summary
```
🔍 DEBUGGING PERFORMANCE EVALUATION
Target: dashboard data inconsistency
📊 PERFORMANCE METRICS:
* Initial Quality: 85/100
* Final Quality: 96/100 (+11 points)
* QIS (Quality Improvement): 78.5/100
* Time Efficiency: 92/100
* Success Rate: 100%
* Regression Penalty: 0
* Performance Index: 87.2/100
⚡ DEBUGGING RESULTS:
[PASS] Root cause identified: random.uniform() without seeding
[PASS] Fix implemented: deterministic seeded calculation
[PASS] Quality improvement: +11 points
[PASS] Time to resolution: 4.2 minutes
📄 Full report: .claude/data/reports/debug-eval-dashboard-2025-10-24.md
⏱ Completed in 4.2 minutes
```
### Detailed Report
Located at: `.claude/data/reports/debug-eval-<target>-YYYY-MM-DD.md`
Comprehensive analysis including:
- Issue identification and root cause analysis
- Step-by-step debugging methodology
- Code changes and quality improvements
- Performance metrics breakdown
- Validation and testing results
- Recommendations for future improvements
## Integration with AI Debugging Performance Index
Each `/debug:eval` execution automatically:
1. Records debugging task in quality history
2. Calculates QIS based on quality improvements made
3. Measures time efficiency for problem resolution
4. Updates model performance metrics
5. Stores debugging patterns for future learning
6. Updates AI Debugging Performance Index chart
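As an illustration of what step 1 might store, here is a hypothetical shape for one quality-history entry; the field names are assumptions, not the framework's documented schema:
```python
# Hypothetical quality-history entry; field names are assumed, not the
# framework's documented schema.
debug_eval_record = {
    "target": "dashboard",
    "initial_quality": 85,
    "final_quality": 96,
    "qis": 86.9,
    "tes": 92.0,
    "success_rate": 100.0,
    "regression_rate": 0.0,
    "performance_index": 92.0,
    "duration_minutes": 4.2,
    "report": ".claude/data/reports/debug-eval-dashboard-2025-10-24.md",
}
```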
## Examples
### Analyze Dashboard Data Inconsistency
```bash
/debug:eval dashboard
```
### Validate Performance Index Calculations
```bash
/debug:eval performance-index
```
### Comprehensive Data Validation
```bash
/debug:eval data-validation
```
## Benefits
**For Debugging Performance Measurement**:
- Real-world debugging scenarios with measurable outcomes
- Comprehensive performance metrics using established framework
- Quality improvement tracking over time
- Time efficiency analysis for different problem types
**For Code Quality**:
- Identifies and fixes actual issues in codebase
- Improves system reliability and data integrity
- Validates fixes with quality controls
- Documents debugging approaches for future reference
**For Learning System**:
- Builds database of debugging patterns and solutions
- Improves debugging efficiency over time
- Identifies most effective debugging approaches
- Tracks performance improvements across different problem types