# Retrospective Validation Process
**Version**: 1.0
**Framework**: BAIME
**Purpose**: Validate methodologies against historical data post-creation
---
## Overview
Retrospective validation applies a newly created methodology to historical work to measure its effectiveness and identify gaps. The goal is to show that the methodology would have improved past outcomes had it been available.
---
## Validation Process
### Phase 1: Data Collection (15 min)
**Gather historical data**:
- Session history (JSONL files)
- Error logs and recovery attempts
- Time measurements
- Quality metrics
**Tools**:
```bash
# Query session data
query_tools --status=error
query_user_messages --pattern="error|fail|bug"
query_context --error-signature="..."
```
### Phase 2: Baseline Measurement (15 min)
**Measure pre-methodology state**:
- Error frequency by category
- Mean Time To Recovery (MTTR)
- Prevention opportunities missed
- Quality metrics
**Example**:
```markdown
## Baseline (Without Methodology)
**Errors**: 1336 total
**MTTR**: 11.25 min average
**Prevention**: 0% (no automation)
**Classification**: Ad-hoc, inconsistent
```
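**Baseline extraction sketch** (assumes the Phase 1 JSONL session logs; the `error_category` and `recovery_min` fields are hypothetical, so adjust to your schema):
```bash
#!/bin/bash
# Baseline metrics from historical JSONL session logs.
HISTORY_DIR=${1:-./sessions}

# Error frequency by category (hypothetical .error_category field)
cat "$HISTORY_DIR"/*.jsonl \
  | jq -r 'select(.error_category != null) | .error_category' \
  | sort | uniq -c | sort -rn

# Mean Time To Recovery in minutes (hypothetical .recovery_min field)
cat "$HISTORY_DIR"/*.jsonl \
  | jq -r 'select(.recovery_min != null) | .recovery_min' \
  | awk '{s += $1; n++} END {if (n) printf "MTTR: %.2f min over %d errors\n", s/n, n}'
```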
### Phase 3: Apply Methodology (30 min)
**Retrospectively apply patterns**:
1. Classify errors using new taxonomy
2. Identify which patterns would apply
3. Calculate time saved per pattern
4. Measure coverage improvement
**Example**:
```markdown
## With Error Recovery Methodology
**Classification**: 1275/1336 = 95.4% coverage
**Patterns Applied**: 10 recovery patterns
**Time Saved**: 8.25 min per error average
**Prevention**: 317 errors (23.7%) preventable
```
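**Classification sketch** (a rough way to estimate coverage; `patterns.txt` and `errors.txt` are hypothetical files, one extended regex per taxonomy category and one error message per line):
```bash
#!/bin/bash
# Retrospective classification: fraction of historical errors matched
# by at least one taxonomy pattern.
ERRORS=errors.txt        # one error message per line
PATTERNS=patterns.txt    # one extended regex per taxonomy category

total=$(wc -l < "$ERRORS")
classified=$(grep -cEf "$PATTERNS" "$ERRORS")

awk -v c="$classified" -v t="$total" \
  'BEGIN {printf "Coverage: %d/%d = %.1f%%\n", c, t, 100 * c / t}'
```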
### Phase 4: Calculate Impact (20 min)
**Metrics**:
```
Coverage = classified_errors / total_errors
Time_Saved = (MTTR_before - MTTR_after) × error_count
Prevention_Rate = preventable_errors / total_errors
ROI = time_saved / methodology_creation_time
```
**Example**:
```markdown
## Impact Analysis
**Coverage**: 95.4% (1275/1336)
**Time Saved**: 8.25 min × 1336 = 183.7 hours
**Prevention**: 23.7% (317 errors)
**ROI**: 183.7h saved / 5.75h invested = 31.9x
```
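**Calculation sketch** (the Phase 4 formulas applied to the example figures; every value is an input taken from the example, not measured here):
```bash
#!/bin/bash
# Phase 4 impact metrics, computed from the example's inputs.
TOTAL=1336; CLASSIFIED=1275; PREVENTABLE=317
MTTR_BEFORE=11.25; MTTR_AFTER=3   # minutes
METHOD_HOURS=5.75                 # methodology creation time

awk -v t="$TOTAL" -v c="$CLASSIFIED" -v p="$PREVENTABLE" \
    -v mb="$MTTR_BEFORE" -v ma="$MTTR_AFTER" -v mh="$METHOD_HOURS" 'BEGIN {
  saved_h = (mb - ma) * t / 60
  printf "Coverage:   %.1f%%\n", 100 * c / t     # 95.4%
  printf "Time saved: %.1f hours\n", saved_h     # 183.7
  printf "Prevention: %.1f%%\n", 100 * p / t     # 23.7%
  printf "ROI:        %.1fx\n", saved_h / mh     # 31.9x
}'
```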
### Phase 5: Gap Analysis (15 min)
**Identify remaining gaps**:
- Uncategorized errors (4.6%)
- Patterns needed for edge cases
- Automation opportunities
- Transferability limits
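**Gap query sketch** (invert the Phase 3 match to surface the uncategorized remainder for review; reuses the hypothetical `patterns.txt` and `errors.txt`):
```bash
# Surface errors no taxonomy pattern matches (candidates for new patterns)
grep -vEf patterns.txt errors.txt | sort | uniq -c | sort -rn | head -20
```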
---
## Confidence Scoring
**Formula**:
```
Confidence = 0.4 × coverage +
             0.3 × validation_sample_size +
             0.2 × pattern_consistency +
             0.1 × expert_review
Where:
- coverage = classified / total (0-1)
- validation_sample_size = min(validated/50, 1.0)
- pattern_consistency = successful_applications / total_applications
- expert_review = binary (0 or 1)
```
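**Scoring sketch** (the formula above as a shell function; all inputs are supplied by the caller):
```bash
#!/bin/bash
# confidence <coverage 0-1> <validated_count> <consistency 0-1> <expert 0|1>
confidence() {
  awk -v cov="$1" -v n="$2" -v cons="$3" -v expert="$4" 'BEGIN {
    sample = (n / 50 > 1) ? 1 : n / 50   # min(validated/50, 1.0)
    printf "%.2f\n", 0.4*cov + 0.3*sample + 0.2*cons + 0.1*expert
  }'
}

confidence 0.954 50 0.91 1   # prints 0.96
```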
**Thresholds**:
- Confidence ≥ 0.80: High confidence, production-ready
- Confidence 0.60-0.79: Medium confidence, needs refinement
- Confidence < 0.60: Low confidence, significant gaps
---
## Validation Criteria
**Methodology is validated if**:
1. Coverage ≥ 80% (methodology handles most cases)
2. Time savings ≥ 30% (significant efficiency gain)
3. Prevention ≥ 10% (automation provides value)
4. ROI ≥ 5x (worthwhile investment)
5. Transferability ≥ 70% (broadly applicable)
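**Criteria check sketch** (coverage, time savings, prevention, and ROI use the example's figures; `TRANSFERABILITY` is a placeholder to be measured on a second project):
```bash
#!/bin/bash
# Evaluate the five validation criteria; prints VALIDATED only if all pass.
COVERAGE=0.954; TIME_SAVINGS=0.73; PREVENTION=0.237; ROI=31.9
TRANSFERABILITY=0.70   # placeholder: measure on a second project

pass=1
check() {  # check <label> <value> <minimum>
  if awk -v v="$2" -v m="$3" 'BEGIN {exit !(v >= m)}'; then
    echo "PASS  $1 ($2 >= $3)"
  else
    echo "FAIL  $1 ($2 < $3)"; pass=0
  fi
}

check "Coverage"        "$COVERAGE"        0.80
check "Time savings"    "$TIME_SAVINGS"    0.30
check "Prevention"      "$PREVENTION"      0.10
check "ROI"             "$ROI"             5
check "Transferability" "$TRANSFERABILITY" 0.70

[ "$pass" -eq 1 ] && echo "VALIDATED"
```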
---
## Example: Error Recovery Validation
**Historical Data**: 1336 errors from 15 sessions
**Baseline**:
- MTTR: 11.25 min
- No systematic classification
- No prevention tools
**Post-Methodology** (retrospective):
- Coverage: 95.4% (13 categories)
- MTTR: 3 min (73% reduction)
- Prevention: 23.7% (3 automation tools)
- Time saved: 183.7 hours
- ROI: 31.9x
**Confidence Score**:
```
Confidence = 0.4 × 0.954 +
             0.3 × 1.0 +
             0.2 × 0.91 +
             0.1 × 1.0
           = 0.38 + 0.30 + 0.18 + 0.10
           = 0.96 (High confidence)
```
**Validation Result**: ✅ VALIDATED (all criteria met)
---
## Common Pitfalls
**❌ Selection Bias**: Only validating on "easy" cases
- Fix: Use complete dataset, include edge cases
**❌ Overfitting**: Methodology too specific to validation data
- Fix: Test transferability on different project
**❌ Optimistic Timing**: Assuming perfect pattern application
- Fix: Use realistic time estimates (1.2x typical)
**❌ Ignoring Learning Curve**: Assuming immediate proficiency
- Fix: Factor in 2-3 iterations to master patterns
---
## Automation Support
**Validation Script**:
```bash
#!/bin/bash
# scripts/validate-methodology.sh <methodology> <history_dir> <creation_hours>
set -euo pipefail

METHODOLOGY=$1
HISTORY_DIR=$2
METHODOLOGY_TIME=$3   # hours invested creating the methodology

# Extract baseline metrics (mean duration; note `avg` is not a standard tool)
baseline=$(query_tools --scope=session | jq -r '.[].duration' \
  | awk '{s += $1; n++} END {if (n) printf "%.2f", s/n}')

# Apply methodology patterns (project-specific helper functions)
coverage=$(classify_with_patterns "$METHODOLOGY" "$HISTORY_DIR")

# Calculate impact
time_saved=$(calculate_time_savings "$baseline" "$coverage")
prevention=$(calculate_prevention_rate "$METHODOLOGY")

# Generate report
echo "Coverage: $coverage"
echo "Time Saved: $time_saved"
echo "Prevention: $prevention"
echo "ROI: $(calculate_roi "$time_saved" "$METHODOLOGY_TIME")"
```
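**Example invocation** (arguments are illustrative; the third is methodology creation time in hours):
```bash
./scripts/validate-methodology.sh error-recovery.md ~/sessions 5.75
```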
---
**Source**: Bootstrap-003 Error Recovery Retrospective Validation
**Status**: Production-ready, 96% confidence score
**ROI**: 31.9x validated across 1336 historical errors