# Rapid Convergence Criteria - Detailed
**Purpose**: In-depth explanation of the 5 rapid convergence criteria
**Impact**: Understanding when 3-4 iterations are achievable
---
## Criterion 1: Clear Baseline Metrics ⭐ CRITICAL
### Definition
V_meta(s₀) ≥ 0.40 indicates that foundational work is strong enough to enable rapid progress.
### Mathematical Basis
```
ΔV_meta needed = 0.80 - V_meta(s₀)
If V_meta(s₀) = 0.40: Need +0.40 → 3-4 iterations achievable
If V_meta(s₀) = 0.10: Need +0.70 → 5-7 iterations required
```
**Assumption**: Average ΔV_meta per iteration ≈ 0.15-0.20
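A minimal sketch of this estimate (the helper name is illustrative; it uses the conservative end, 0.15, of the assumed per-iteration gain, which matches the iteration counts above):
```bash
# Estimate iterations needed to reach V_meta = 0.80 from a baseline.
# Uses the conservative end (0.15) of the assumed 0.15-0.20 gain per iteration.
estimate_iterations() {
  awk -v v0="$1" 'BEGIN {
    gap = 0.80 - v0
    if (gap <= 0) { print 0; exit }
    printf "%d\n", (gap / 0.15) + 0.999  # ceiling
  }'
}

estimate_iterations 0.40   # -> 3 (rapid convergence plausible)
estimate_iterations 0.10   # -> 5 (standard convergence)
```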
### What a Strong Baseline Looks Like
**Quantitative metrics exist**:
- Error rate, test coverage, build time
- Measurable via tools (not subjective)
- Baseline established in <2 hours
**Success criteria are clear**:
- Target values defined (e.g., <3% error rate)
- Thresholds for convergence known
- No ambiguity about "done"
**Initial taxonomy comprehensive**:
- 70-80% coverage in iteration 0
- 10-15 categories/patterns documented
- Most edge cases identified
### Examples
**✅ Bootstrap-003 (V_meta(s₀) = 0.48)**:
```
- 1,336 errors quantified via MCP query
- Error rate: 5.78% calculated automatically
- 10 error categories (79.1% coverage)
- Clear targets: <3% error rate, <2 min MTTR
- Result: 3 iterations
```
**❌ Bootstrap-002 (V_meta(s₀) = 0.04)**:
```
- Coverage: 72.1% (but no patterns documented)
- No clear test patterns identified
- Ambiguous "done" criteria
- Had to establish metrics first
- Result: 6 iterations
```
### Impact Analysis
| V_meta(s₀) | Iterations Needed | Hours | Reason |
|------------|-------------------|-------|--------|
| 0.60-0.80 | 2-3 | 6-10h | Minimal gap to 0.80 |
| 0.40-0.59 | 3-4 | 10-15h | Moderate gap |
| 0.20-0.39 | 4-6 | 15-25h | Large gap |
| 0.00-0.19 | 6-10 | 25-40h | Exploratory |
---
## Criterion 2: Focused Domain Scope ⭐ IMPORTANT
### Definition
The domain can be described in <3 sentences without ambiguity.
### Why This Matters
**Focused scope** → Less exploration → Faster convergence
**Broad scope** → More patterns needed → Slower convergence
### Quantifying Focus
**Metric**: Boundary clarity ratio
```
BCR = clear_boundaries / total_boundaries
Where boundaries = {in-scope, out-of-scope, edge cases}
```
**Target**: BCR ≥ 0.80 (80% of boundaries unambiguous)
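A minimal sketch of the calculation, assuming a checklist file where each line starts with `clear` or `ambiguous` (the file format is illustrative):
```bash
# Compute BCR from a boundary checklist, one boundary per line, e.g.:
#   clear     In-scope: all meta-cc errors
#   ambiguous Edge cases: multi-language support?
bcr() {
  awk '{ total++; if ($1 == "clear") clear++ }
       END { if (total) printf "BCR = %.2f (%d/%d)\n", clear / total, clear, total }' "$1"
}

bcr boundaries.txt   # e.g. BCR = 1.00 (3/3)
```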
### Examples
**✅ Focused (Bootstrap-003)**:
```
Domain: "Error detection, diagnosis, recovery, prevention for meta-cc"
Boundaries:
✅ In-scope: All meta-cc errors
✅ Out-of-scope: Infrastructure failures, user errors
✅ Edge cases: Cascading errors (handle as single category)
BCR = 3/3 = 1.0 (perfectly focused)
```
**❌ Broad (Bootstrap-002)**:
```
Domain: "Develop test strategy"
Boundaries:
⚠️ In-scope: Which tests? Unit? Integration? E2E?
⚠️ Out-of-scope: What about test infrastructure?
⚠️ Edge cases: Multi-language support? CI integration?
BCR = 0/3 = 0.00 (needs scoping work)
```
### Scoping Technique
**Step 1**: Write 1-sentence domain definition
**Step 2**: List 3-5 explicit in-scope items
**Step 3**: List 3-5 explicit out-of-scope items
**Step 4**: Define edge case handling
**Example**:
```markdown
## Domain: Error Recovery for Meta-CC
**In-Scope**:
- Error detection and classification
- Root cause diagnosis
- Recovery procedures
- Prevention automation
- MTTR reduction
**Out-of-Scope**:
- Infrastructure failures (Docker, network)
- User mistakes (misuse of CLI)
- Feature requests
- Performance optimization (unless error-related)
**Edge Cases**:
- Cascading errors: Treat as single error with multiple symptoms
- Intermittent errors: Require 3+ occurrences for pattern
- Error prevention: In-scope if automatable
```
---
## Criterion 3: Direct Validation ⭐ IMPORTANT
### Definition
The methodology can be validated without multi-context deployment.
### Validation Complexity Spectrum
**Level 1: Retrospective** (Fastest)
- Use historical data
- No deployment needed
- Example: 1,336 historical errors
**Level 2: Single-Context** (Fast)
- Test in one environment
- Minimal deployment
- Example: Validate on current project
**Level 3: Multi-Context** (Slow)
- Test across multiple projects/languages
- Significant deployment overhead
- Example: 3 project archetypes
**Level 4: Production** (Slowest)
- Real-world validation required
- Months of data collection
- Example: Monitor for 3-6 months
### Time Impact
| Validation Level | Overhead | Example Iterations Added |
|------------------|----------|--------------------------|
| Retrospective | 0h | +0 (Bootstrap-003) |
| Single-Context | 2-4h | +0 to +1 |
| Multi-Context | 6-12h | +2 to +3 (Bootstrap-002) |
| Production | Months | N/A (not rapid) |
### When Retrospective Validation Works
**Requirements**:
1. Historical data exists (session logs, error logs)
2. Data is representative of current/future work
3. Metrics can be calculated from historical data
4. Methodology can be applied retrospectively
**Example** (Bootstrap-003):
```
✅ 1,336 historical errors in session logs
✅ Representative of typical development work
✅ Can classify errors retrospectively
✅ Can measure prevention rate via replay
Result: Direct validation, 0 overhead
```
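A sketch of how such a baseline can be computed from logs, assuming one JSON record per line (`tool_calls.jsonl` and its layout are hypothetical; `errors.jsonl` matches the frequency-analysis example under Criterion 5 below):
```bash
# Retrospective baseline: error rate from historical logs.
errors=$(wc -l < errors.jsonl)
total=$(wc -l < tool_calls.jsonl)
awk -v e="$errors" -v t="$total" \
  'BEGIN { printf "Error rate: %.2f%% (%d/%d)\n", 100 * e / t, e, t }'
```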
---
## Criterion 4: Generic Agent Sufficiency 🟡 MODERATE
### Definition
Generic agents (data-analyst, doc-writer, coder) are sufficient for execution.
### Specialization Overhead
**Generic agents**: 0 overhead (use as-is)
**Specialized agents**: +1 to +2 iterations for design + testing
### When Specialization Adds Value
**10x+ speedup opportunity**:
- Example: coverage-analyzer (15 min → 30 sec = 30x)
- Example: test-generator (10 min → 1 min = 10x)
- Worth 1-2 iteration investment
**<5x speedup**:
- Use generic agents + simple scripts
- Not worth specialization overhead
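A minimal sketch of this decision rule (the function name is illustrative; times are in seconds):
```bash
# Decide whether building a specialized agent is worth the overhead,
# using the 10x speedup guideline above.
should_specialize() {
  awk -v m="$1" -v s="$2" 'BEGIN {
    speedup = m / s
    printf "%.1fx speedup -> %s\n", speedup,
      (speedup >= 10) ? "specialize (+1 iteration)" : "use generic agents"
  }'
}

should_specialize 900 30    # 15 min -> 30 sec: 30.0x -> specialize
should_specialize 300 120   # 5 min -> 2 min: 2.5x -> use generic agents
```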
### Examples
**✅ Generic Sufficient (Bootstrap-003)**:
```
Tasks:
- Analyze errors (generic data-analyst)
- Document taxonomy (generic doc-writer)
- Create validation scripts (generic coder)
Speedup from specialization: 2-3x (not worth it)
Result: 0 specialization overhead
```
**⚠️ Specialization Needed (Bootstrap-002)**:
```
Tasks:
- Coverage analysis (15 min → 30 sec = 30x with coverage-analyzer)
- Test generation (10 min → 1 min = 10x with test-generator)
Speedup: >10x for both
Investment: 1 iteration to design and test agents
Result: +1 iteration, but ROI positive overall
```
---
## Criterion 5: Early High-Impact Automation 🟡 MODERATE
### Definition
The top 3 automation opportunities are identified by iteration 1.
### Pareto Principle Application
**80/20 rule**: 20% of automations provide 80% of value
**Implication**: Identify top 3 early → rapid V_instance improvement
### Identification Signals
**High-frequency patterns**:
- Appears in >10% of cases
- Example: File-not-found (18.7% of errors)
**High-impact prevention**:
- Prevents >50% of pattern occurrences
- Example: validate-path.sh prevents 65.2%
**High ROI**:
- Time saved / time invested > 5x
- Example: validate-path.sh = 61x ROI
### Early Identification Techniques
**Frequency Analysis**:
```bash
# Count error types
jq -r '.error_type' errors.jsonl | sort | uniq -c | sort -rn
# Top 3 = high-frequency candidates
```
**Impact Estimation**:
```
If tool prevents X% of pattern Y:
- Pattern Y occurs N times
- Prevention: X% × N
- Impact: (X% × N) / total_errors
```
**ROI Calculation**:
```
Manual time: M min per occurrence
Tool investment: T hours
Expected uses: N
ROI = (M × N) / (T × 60)
```
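The two formulas combined into a runnable sketch (the function name and sample inputs are hypothetical):
```bash
# Score a candidate automation: impact on total errors, and ROI.
#   $1 prevention_pct  % of the pattern the tool prevents
#   $2 occurrences     times the pattern appears
#   $3 total_errors    all errors in the dataset
#   $4 manual_min      manual handling time per occurrence (minutes)
#   $5 invest_hours    time to build the tool (hours)
score_tool() {
  awk -v p="$1" -v n="$2" -v t="$3" -v m="$4" -v i="$5" 'BEGIN {
    prevented = (p / 100) * n
    printf "Impact: %.1f%% of all errors\n", 100 * prevented / t
    printf "ROI: %.1fx\n", (m * n) / (i * 60)
  }'
}

score_tool 65.2 250 1336 10 2   # Impact: 12.2%, ROI: 20.8x
```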
### Example (Bootstrap-003)
**Iteration 0 Analysis**:
```
Top 3 by frequency:
1. File-not-found: 250/1,336 = 18.7%
2. MCP errors: 228/1,336 = 17.1%
3. Build errors: 200/1,336 = 15.0%
Automation feasibility:
1. File-not-found: ✅ Path validation (high prevention %)
2. MCP errors: ❌ Infrastructure (low automation value)
3. Build errors: ⚠️ Language-specific (moderate value)
Selected:
1. validate-path.sh: 250 errors, 65.2% prevention, 61x ROI
2. check-file-size.sh: 84 errors, 100% prevention, 31.6x ROI
3. check-read-before-write.sh: 70 errors, 100% prevention, 26.2x ROI
Total impact: 317/1,336 = 23.7% error prevention
```
**Result**: Clear automation path from iteration 0
---
## Criteria Interaction Matrix
| Criterion 1 | Criterion 2 | Criterion 3 | Likely Iterations |
|-------------|-------------|-------------|-------------------|
| ✅ (≥0.40) | ✅ Focused | ✅ Direct | 3-4 ⚡ |
| ✅ (≥0.40) | ✅ Focused | ❌ Multi | 4-5 |
| ✅ (≥0.40) | ❌ Broad | ✅ Direct | 4-5 |
| ❌ (<0.40) | ✅ Focused | ✅ Direct | 5-6 |
| ❌ (<0.40) | ❌ Broad | ❌ Multi | 7-10 |
**Key Insight**: Criteria 1-3 are multiplicative. Missing any = slower convergence.
---
## Decision Tree
```
Start
├─ Can you achieve V_meta(s₀) ≥ 0.40?
│ YES → Continue
│ NO → Standard convergence (5-7 iterations)
├─ Is domain scope <3 sentences?
│ YES → Continue
│ NO → Refine scope first
├─ Can you validate without multi-context?
│ YES → Rapid convergence likely (3-4 iterations)
│ NO → Add +2 iterations for validation
└─ Generic agents sufficient?
YES → No overhead
NO → Add +1 iteration for specialization
```
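The tree can be walked mechanically; a minimal interactive sketch (prompt wording is illustrative):
```bash
# Walk the decision tree above, prompting y/n at each branch.
ask() { read -r -p "$1 [y/n] " a; [ "$a" = "y" ]; }

extra=""
ask "V_meta(s0) >= 0.40?"             || { echo "Standard convergence (5-7 iterations)"; exit 0; }
ask "Domain scope < 3 sentences?"     || { echo "Refine scope first"; exit 0; }
ask "Validate without multi-context?" || extra="$extra +2 (validation)"
ask "Generic agents sufficient?"      || extra="$extra +1 (specialization)"
echo "Expected iterations: 3-4$extra"
```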
---
**Source**: BAIME Rapid Convergence Criteria
**Validation**: 13 experiments, 85% prediction accuracy
**Critical Path**: Criteria 1-3 (must all be met for rapid convergence)