Initial commit

2025-11-30 09:07:22 +08:00
commit fab98d059b
179 changed files with 46209 additions and 0 deletions
--- a/skills/rapid-convergence/reference/criteria.md
+++ b/skills/rapid-convergence/reference/criteria.md
@@ -0,0 +1,378 @@
+# Rapid Convergence Criteria - Detailed
+
+**Purpose**: In-depth explanation of the 5 rapid convergence criteria
+**Impact**: Understanding when 3-4 iterations are achievable
+
+---
+
+## Criterion 1: Clear Baseline Metrics ⭐ CRITICAL
+
+### Definition
+
+A baseline of V_meta(s₀) ≥ 0.40 indicates that strong foundational work is already in place, which enables rapid progress.
+
+### Mathematical Basis
+
+```
+ΔV_meta needed = 0.80 - V_meta(s₀)
+
+If V_meta(s₀) = 0.40: Need +0.40 → 3-4 iterations achievable
+If V_meta(s₀) = 0.10: Need +0.70 → 5-7 iterations required
+```
+
+**Assumption**: Average ΔV_meta per iteration ≈ 0.15-0.20
+
+### What a Strong Baseline Looks Like
+
+**Quantitative metrics exist**:
+- Error rate, test coverage, build time
+- Measurable via tools (not subjective)
+- Baseline established in <2 hours
+
+**Success criteria are clear**:
+- Target values defined (e.g., <3% error rate)
+- Thresholds for convergence known
+- No ambiguity about "done"
+
+**Initial taxonomy is comprehensive**:
+- 70-80% coverage in iteration 0
+- 10-15 categories/patterns documented
+- Most edge cases identified
+
+### Examples
+
+**✅ Bootstrap-003 (V_meta(s₀) = 0.48)**:
+```
+- 1,336 errors quantified via MCP query
+- Error rate: 5.78% calculated automatically
+- 10 error categories (79.1% coverage)
+- Clear targets: <3% error rate, <2 min MTTR
+- Result: 3 iterations
+```
+
+**❌ Bootstrap-002 (V_meta(s₀) = 0.04)**:
+```
+- Coverage: 72.1% (but no patterns documented)
+- No clear test patterns identified
+- Ambiguous "done" criteria
+- Had to establish metrics first
+- Result: 6 iterations
+```
+
+### Impact Analysis
+
+| V_meta(s₀) | Iterations Needed | Hours | Reason |
+|------------|-------------------|-------|--------|
+| 0.60-0.80 | 2-3 | 6-10h | Minimal gap to 0.80 |
+| 0.40-0.59 | 3-4 | 10-15h | Moderate gap |
+| 0.20-0.39 | 4-6 | 15-25h | Large gap |
+| 0.00-0.19 | 6-10 | 25-40h | Exploratory |
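+
+The table above can be read as a simple range lookup. The sketch below is one hedged way to encode it; the names `Band`, `BANDS`, and `estimate_effort` are illustrative assumptions and not part of BAIME, and the rows are copied from the table as-is.
+
+```python
+from typing import NamedTuple
+
+
+class Band(NamedTuple):
+    lower: float      # inclusive lower bound on V_meta(s0)
+    iterations: str   # "Iterations Needed" column
+    hours: str        # "Hours" column
+    reason: str       # "Reason" column
+
+
+# Rows copied from the Impact Analysis table, highest band first.
+BANDS = [
+    Band(0.60, "2-3", "6-10h", "Minimal gap to 0.80"),
+    Band(0.40, "3-4", "10-15h", "Moderate gap"),
+    Band(0.20, "4-6", "15-25h", "Large gap"),
+    Band(0.00, "6-10", "25-40h", "Exploratory"),
+]
+
+
+def estimate_effort(v_meta_s0: float) -> Band:
+    """Return the table row whose range contains v_meta_s0."""
+    if not 0.0 <= v_meta_s0 <= 0.80:
+        raise ValueError("the table only covers 0.00 <= V_meta(s0) <= 0.80")
+    for band in BANDS:
+        if v_meta_s0 >= band.lower:
+            return band
+    raise AssertionError("unreachable: the last band starts at 0.00")
+
+
+print(estimate_effort(0.48).iterations)  # 3-4
+print(estimate_effort(0.04).iterations)  # 6-10
+```
+
+The two calls use the baselines reported for Bootstrap-003 and Bootstrap-002, whose actual results (3 and 6 iterations) fall inside the returned ranges.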
+
+---
+
+## Criterion 2: Focused Domain Scope ⭐ IMPORTANT
+
+### Definition
+
+The domain can be described in fewer than 3 sentences without ambiguity.
+
+### Why This Matters
+
+**Focused scope** → Less exploration → Faster convergence
+
+**Broad scope** → More patterns needed → Slower convergence
+
+### Quantifying Focus
+
+**Metric**: Boundary clarity ratio
+```
+BCR = clear_boundaries / total_boundaries
+
+Where boundaries = {in-scope, out-of-scope, edge cases}
+```
+
+**Target**: BCR ≥ 0.80 (80% of boundaries unambiguous)
+
+### Examples
+
+**✅ Focused (Bootstrap-003)**:
+```
+Domain: "Error detection, diagnosis, recovery, prevention for meta-cc"
+
+Boundaries:
+✅ In-scope: All meta-cc errors
+✅ Out-of-scope: Infrastructure failures, user errors
+✅ Edge cases: Cascading errors (handle as single category)
+
+BCR = 3/3 = 1.00 (perfectly focused)
+```
+
+**❌ Broad (Bootstrap-002)**:
+```
+Domain: "Develop test strategy"
+
+Boundaries:
+⚠️ In-scope: Which tests? Unit? Integration? E2E?
+⚠️ Out-of-scope: What about test infrastructure?
+⚠️ Edge cases: Multi-language support? CI integration?
+
+BCR = 0/3 = 0.00 (needs scoping work)
+```
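+
+The two BCR values above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `boundary_clarity_ratio` and the boolean encoding (True means the boundary is unambiguous) are assumptions made for illustration.
+
+```python
+BCR_TARGET = 0.80  # target stated above: 80% of boundaries unambiguous
+
+
+def boundary_clarity_ratio(boundaries: dict[str, bool]) -> float:
+    """BCR = clear_boundaries / total_boundaries."""
+    if not boundaries:
+        raise ValueError("at least one boundary is required")
+    clear = sum(1 for is_clear in boundaries.values() if is_clear)
+    return clear / len(boundaries)
+
+
+# True = boundary is unambiguous (assumed encoding).
+focused = {"in-scope": True, "out-of-scope": True, "edge cases": True}
+broad = {"in-scope": False, "out-of-scope": False, "edge cases": False}
+
+print(boundary_clarity_ratio(focused) >= BCR_TARGET)  # True
+print(boundary_clarity_ratio(broad) >= BCR_TARGET)    # False
+```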
+
+### Scoping Technique
+
+1. Write a 1-sentence domain definition
+2. List 3-5 explicit in-scope items
+3. List 3-5 explicit out-of-scope items
+4. Define edge case handling
+
+**Example**:
+```markdown
+## Domain: Error Recovery for Meta-CC
+
+**In-Scope**:
+- Error detection and classification
+- Root cause diagnosis
+- Recovery procedures
+- Prevention automation
+- MTTR reduction
+
+**Out-of-Scope**:
+- Infrastructure failures (Docker, network)
+- User mistakes (misuse of CLI)
+- Feature requests
+- Performance optimization (unless error-related)
+
+**Edge Cases**:
+- Cascading errors: Treat as single error with multiple symptoms
+- Intermittent errors: Require 3+ occurrences for pattern
+- Error prevention: In-scope if automatable
+```
+
+---
+
+## Criterion 3: Direct Validation ⭐ IMPORTANT
+
+### Definition
+
+The methodology can be validated without multi-context deployment.
+
+### Validation Complexity Spectrum
+
+**Level 1: Retrospective** (Fastest)
+- Use historical data
+- No deployment needed
+- Example: 1,336 historical errors
+
+**Level 2: Single-Context** (Fast)
+- Test in one environment
+- Minimal deployment
+- Example: Validate on current project
+
+**Level 3: Multi-Context** (Slow)
+- Test across multiple projects/languages
+- Significant deployment overhead
+- Example: 3 project archetypes
+
+**Level 4: Production** (Slowest)
+- Real-world validation required
+- Months of data collection
+- Example: Monitor for 3-6 months
+
+### Time Impact
+
+| Validation Level | Overhead | Example Iterations Added |
+|------------------|----------|--------------------------|
+| Retrospective | 0h | +0 (Bootstrap-003) |
+| Single-Context | 2-4h | +0 to +1 |
+| Multi-Context | 6-12h | +2 to +3 (Bootstrap-002) |
+| Production | Months | N/A (not rapid) |
+
+### When Retrospective Validation Works
+
+**Requirements**:
+1. Historical data exists (session logs, error logs)
+2. Data is representative of current/future work
+3. Metrics can be calculated from historical data
+4. Methodology can be applied retrospectively
+
+**Example** (Bootstrap-003):
+```
+✅ 1,336 historical errors in session logs
+✅ Representative of typical development work
+✅ Can classify errors retrospectively
+✅ Can measure prevention rate via replay
+
+Result: Direct validation, 0 overhead
+```
+
+---
+
+## Criterion 4: Generic Agent Sufficiency 🟡 MODERATE
+
+### Definition
+
+Generic agents (data-analyst, doc-writer, coder) are sufficient for execution.
+
+### Specialization Overhead
+
+- **Generic agents**: 0 overhead (use as-is)
+- **Specialized agents**: +1 to +2 iterations for design + testing
+
+### When Specialization Adds Value
+
+**10x+ speedup opportunity**:
+- Example: coverage-analyzer (15 min → 30 sec = 30x)
+- Example: test-generator (10 min → 1 min = 10x)
+- Worth 1-2 iteration investment
+
+**<5x speedup**:
+- Use generic agents + simple scripts
+- Not worth specialization overhead
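+
+The thresholds above can be sketched as a small helper. The function names are hypothetical, and the source does not say what to do between 5x and 10x, so this sketch reports that band as a judgment call instead of guessing.
+
+```python
+def speedup(before_seconds: float, after_seconds: float) -> float:
+    """Speedup factor of a specialized agent over the generic workflow."""
+    if before_seconds <= 0 or after_seconds <= 0:
+        raise ValueError("durations must be positive")
+    return before_seconds / after_seconds
+
+
+def specialization_advice(factor: float) -> str:
+    """Map a speedup factor to the guidance above."""
+    if factor >= 10:
+        return "specialize"      # worth a 1-2 iteration investment
+    if factor < 5:
+        return "generic"         # generic agents + simple scripts
+    return "judgment call"       # 5x-10x is not covered by the source
+
+
+# coverage-analyzer: 15 min -> 30 sec
+print(speedup(15 * 60, 30), specialization_advice(speedup(15 * 60, 30)))  # 30.0 specialize
+```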
+
+### Examples
+
+**✅ Generic Sufficient (Bootstrap-003)**:
+```
+Tasks:
+- Analyze errors (generic data-analyst)
+- Document taxonomy (generic doc-writer)
+- Create validation scripts (generic coder)
+
+Speedup from specialization: 2-3x (not worth it)
+Result: 0 specialization overhead
+```
+
+**⚠️ Specialization Needed (Bootstrap-002)**:
+```
+Tasks:
+- Coverage analysis (15 min → 30 sec = 30x with coverage-analyzer)
+- Test generation (10 min → 1 min = 10x with test-generator)
+
+Speedup: >10x for both
+Investment: 1 iteration to design and test agents
+Result: +1 iteration, but ROI positive overall
+```
+
+---
+
+## Criterion 5: Early High-Impact Automation 🟡 MODERATE
+
+### Definition
+
+The top 3 automation opportunities are identified by iteration 1.
+
+### Pareto Principle Application
+
+**80/20 rule**: 20% of automations provide 80% of value
+
+**Implication**: Identify top 3 early → rapid V_instance improvement
+
+### Identification Signals
+
+**High-frequency patterns**:
+- Appears in >10% of cases
+- Example: File-not-found (18.7% of errors)
+
+**High-impact prevention**:
+- Prevents >50% of pattern occurrences
+- Example: validate-path.sh prevents 65.2%
+
+**High ROI**:
+- Time saved / time invested > 5x
+- Example: validate-path.sh = 61x ROI
+
+### Early Identification Techniques
+
+**Frequency Analysis**:
+```bash
+# Count error types in a JSON Lines log, most frequent first.
+# The top 3 are the high-frequency automation candidates.
+jq -r '.error_type' errors.jsonl | sort | uniq -c | sort -rn | head -n 3
+```
+
+**Impact Estimation**:
+```
+If tool prevents X% of pattern Y:
+- Pattern Y occurs N times
+- Prevention: X% × N
+- Impact: (X% × N) / total_errors
+```
+
+**ROI Calculation**:
+```
+Manual time: M min per occurrence
+Tool investment: T hours
+Expected uses: N
+
+ROI = (M × N) / (T × 60)
+```
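+
+The impact and ROI formulas above translate directly into code. The sketch below is illustrative only: the function names are assumptions, and the input numbers in the usage lines are hypothetical, not measurements from any experiment.
+
+```python
+def prevention_impact(prevention_rate: float, pattern_count: int, total_errors: int) -> float:
+    """Impact = (X% x N) / total_errors, with X% given as a fraction (0.0-1.0)."""
+    if total_errors <= 0:
+        raise ValueError("total_errors must be positive")
+    return prevention_rate * pattern_count / total_errors
+
+
+def roi(manual_minutes: float, expected_uses: int, tool_hours: float) -> float:
+    """ROI = (M x N) / (T x 60): minutes saved per minute invested."""
+    if tool_hours <= 0:
+        raise ValueError("tool_hours must be positive")
+    return (manual_minutes * expected_uses) / (tool_hours * 60)
+
+
+# Hypothetical inputs: a tool that prevents 50% of a pattern seen 200 times
+# among 1,000 errors, took 2 hours to build, and saves 5 minutes per use.
+print(prevention_impact(0.5, 200, 1000))  # 0.1
+print(roi(5, 120, 2))                     # 5.0
+```
+
+Both functions keep the units of the formulas: minutes per occurrence for M and hours for T, which is why the denominator multiplies by 60.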
+
+### Example (Bootstrap-003)
+
+**Iteration 0 Analysis**:
+```
+Top 3 by frequency:
+1. File-not-found: 250/1,336 = 18.7%
+2. MCP errors: 228/1,336 = 17.1%
+3. Build errors: 200/1,336 = 15.0%
+
+Automation feasibility:
+1. File-not-found: ✅ Path validation (high prevention %)
+2. MCP errors: ❌ Infrastructure (low automation value)
+3. Build errors: ⚠️ Language-specific (moderate value)
+
+Selected:
+1. validate-path.sh: 250 errors, 65.2% prevention, 61x ROI
+2. check-file-size.sh: 84 errors, 100% prevention, 31.6x ROI
+3. check-read-before-write.sh: 70 errors, 100% prevention, 26.2x ROI
+
+Total impact: 317/1,336 = 23.7% error prevention
+```
+
+**Result**: Clear automation path from iteration 0
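+
+As a sanity check, the total impact figure can be recomputed from the three selected tools. This sketch assumes that each tool's prevented count is rounded to whole errors; the variable names are illustrative.
+
+```python
+TOTAL_ERRORS = 1336
+
+# (tool, errors matching the pattern, prevention rate) as listed under "Selected".
+SELECTED = [
+    ("validate-path.sh", 250, 0.652),
+    ("check-file-size.sh", 84, 1.00),
+    ("check-read-before-write.sh", 70, 1.00),
+]
+
+# Assumption: prevented errors are rounded to whole numbers per tool.
+prevented = sum(round(count * rate) for _, count, rate in SELECTED)
+share = prevented / TOTAL_ERRORS
+
+print(prevented, f"{share:.1%}")  # 317 23.7%
+```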
+
+---
+
+## Criteria Interaction Matrix
+
+| Criterion 1 | Criterion 2 | Criterion 3 | Likely Iterations |
+|-------------|-------------|-------------|-------------------|
+| ✅ (≥0.40) | ✅ Focused | ✅ Direct | 3-4 ⚡ |
+| ✅ (≥0.40) | ✅ Focused | ❌ Multi | 4-5 |
+| ✅ (≥0.40) | ❌ Broad | ✅ Direct | 4-5 |
+| ❌ (<0.40) | ✅ Focused | ✅ Direct | 5-6 |
+| ❌ (<0.40) | ❌ Broad | ❌ Multi | 7-10 |
+
+**Key Insight**: Criteria 1-3 compound. Missing any one adds roughly 1-2 iterations, and missing all three pushes the estimate to 7-10.
+
+---
+
+## Decision Tree
+
+```
+Start
+  │
+  ├─ Can you achieve V_meta(s₀) ≥ 0.40?
+  │    YES → Continue
+  │    NO → Standard convergence (5-7 iterations)
+  │
+  ├─ Is domain scope <3 sentences?
+  │    YES → Continue
+  │    NO → Refine scope first
+  │
+  ├─ Can you validate without multi-context?
+  │    YES → Rapid convergence likely (3-4 iterations)
+  │    NO → Add +2 iterations for validation
+  │
+  └─ Generic agents sufficient?
+       YES → No overhead
+       NO → Add +1 iteration for specialization
+```
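+
+The decision tree can also be expressed as a short function. This is a sketch under stated assumptions: the names `Assessment` and `assess` are hypothetical, the four inputs are yes/no answers supplied by the reader, and the iteration adjustments are the tree's own figures (+2 for validation, +1 for specialization).
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Assessment:
+    verdict: str               # leaf text taken from the decision tree
+    extra_iterations: int = 0  # sum of the "+N iterations" adjustments
+
+
+def assess(baseline_ok: bool, scope_ok: bool,
+           direct_validation: bool, generic_agents_ok: bool) -> Assessment:
+    """Walk the decision tree top to bottom and return its outcome."""
+    if not baseline_ok:          # V_meta(s0) < 0.40
+        return Assessment("Standard convergence (5-7 iterations)")
+    if not scope_ok:             # domain needs 3 or more sentences
+        return Assessment("Refine scope first")
+    if direct_validation:
+        result = Assessment("Rapid convergence likely (3-4 iterations)")
+    else:
+        result = Assessment("Add +2 iterations for validation", 2)
+    if not generic_agents_ok:
+        result.extra_iterations += 1  # specialization overhead
+    return result
+
+
+print(assess(True, True, True, True))
+```
+
+Returning early on the first two questions mirrors the tree: without a strong baseline or a focused scope, the later questions do not change the recommendation.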
+
+---
+
+**Source**: BAIME Rapid Convergence Criteria
+**Validation**: 13 experiments, 85% prediction accuracy
+**Critical Path**: Criteria 1-3 (must all be met for rapid convergence)