Rapid Convergence Criteria - Detailed

Purpose: In-depth explanation of 5 rapid convergence criteria
Impact: Understanding when 3-4 iterations are achievable


Criterion 1: Clear Baseline Metrics CRITICAL

Definition

V_meta(s₀) ≥ 0.40 indicates strong foundational work, which enables rapid progress.

Mathematical Basis

ΔV_meta needed = 0.80 - V_meta(s₀)

If V_meta(s₀) = 0.40: Need +0.40 → 3-4 iterations achievable
If V_meta(s₀) = 0.10: Need +0.70 → 5-7 iterations required

Assumption: Average ΔV_meta per iteration ≈ 0.15-0.20
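
As a quick sanity check, the two cases above can be computed directly; a minimal shell sketch, assuming the conservative end of the stated per-iteration gain (ΔV_meta ≈ 0.15):

```bash
# Estimate minimum iterations to reach V_meta = 0.80, assuming the
# conservative per-iteration gain of 0.15 stated above.
for v0 in 0.40 0.10; do
  awk -v v0="$v0" 'BEGIN {
    gap = 0.80 - v0
    n = gap / 0.15
    printf "V_meta(s0)=%.2f: gap=+%.2f -> >=%d iterations\n", v0, gap, int(n) + (n > int(n))
  }'
done
# V_meta(s0)=0.40: gap=+0.40 -> >=3 iterations
# V_meta(s0)=0.10: gap=+0.70 -> >=5 iterations
```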

What Strong Baseline Looks Like

Quantitative metrics exist:

  • Error rate, test coverage, build time
  • Measurable via tools (not subjective)
  • Baseline established in <2 hours

Success criteria are clear:

  • Target values defined (e.g., <3% error rate)
  • Thresholds for convergence known
  • No ambiguity about "done"

Initial taxonomy comprehensive:

  • 70-80% coverage in iteration 0
  • 10-15 categories/patterns documented
  • Most edge cases identified

Examples

Bootstrap-003 (V_meta(s₀) = 0.48):

- 1,336 errors quantified via MCP query
- Error rate: 5.78% calculated automatically
- 10 error categories (79.1% coverage)
- Clear targets: <3% error rate, <2 min MTTR
- Result: 3 iterations

Bootstrap-002 (V_meta(s₀) = 0.04):

- Coverage: 72.1% (but no patterns documented)
- No clear test patterns identified
- Ambiguous "done" criteria
- Had to establish metrics first
- Result: 6 iterations

Impact Analysis

| V_meta(s₀) | Iterations Needed | Hours  | Reason              |
|------------|-------------------|--------|---------------------|
| 0.60-0.80  | 2-3               | 6-10h  | Minimal gap to 0.80 |
| 0.40-0.59  | 3-4               | 10-15h | Moderate gap        |
| 0.20-0.39  | 4-6               | 15-25h | Large gap           |
| 0.00-0.19  | 6-10              | 25-40h | Exploratory         |

Criterion 2: Focused Domain Scope IMPORTANT

Definition

Domain described in <3 sentences without ambiguity.

Why This Matters

Focused scope → Less exploration → Faster convergence

Broad scope → More patterns needed → Slower convergence

Quantifying Focus

Metric: Boundary clarity ratio

BCR = clear_boundaries / total_boundaries

Where boundaries = {in-scope, out-of-scope, edge cases}

Target: BCR ≥ 0.80 (80% of boundaries unambiguous)
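
A minimal shell sketch of the BCR calculation; the per-boundary scores (1 = unambiguous, 0 = ambiguous) are hypothetical review outcomes, not data from any experiment:

```bash
# BCR = clear_boundaries / total_boundaries
# Scores below are hypothetical: in-scope, out-of-scope, edge cases.
scores=(1 1 0)
clear=0
for s in "${scores[@]}"; do clear=$((clear + s)); done
awk -v c="$clear" -v t="${#scores[@]}" 'BEGIN { printf "BCR = %.2f\n", c/t }'
# BCR = 0.67 -> below the 0.80 target, so scope needs refinement
```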

Examples

Focused (Bootstrap-003):

Domain: "Error detection, diagnosis, recovery, prevention for meta-cc"

Boundaries:
✅ In-scope: All meta-cc errors
✅ Out-of-scope: Infrastructure failures, user errors
✅ Edge cases: Cascading errors (handle as single category)

BCR = 3/3 = 1.0 (perfectly focused)

Broad (Bootstrap-002):

Domain: "Develop test strategy"

Boundaries:
⚠️ In-scope: Which tests? Unit? Integration? E2E?
⚠️ Out-of-scope: What about test infrastructure?
⚠️ Edge cases: Multi-language support? CI integration?

BCR = 0/3 = 0.00 (needs scoping work)

Scoping Technique

Step 1: Write 1-sentence domain definition
Step 2: List 3-5 explicit in-scope items
Step 3: List 3-5 explicit out-of-scope items
Step 4: Define edge case handling

Example:

## Domain: Error Recovery for Meta-CC

**In-Scope**:
- Error detection and classification
- Root cause diagnosis
- Recovery procedures
- Prevention automation
- MTTR reduction

**Out-of-Scope**:
- Infrastructure failures (Docker, network)
- User mistakes (misuse of CLI)
- Feature requests
- Performance optimization (unless error-related)

**Edge Cases**:
- Cascading errors: Treat as single error with multiple symptoms
- Intermittent errors: Require 3+ occurrences for pattern
- Error prevention: In-scope if automatable

Criterion 3: Direct Validation IMPORTANT

Definition

Can validate methodology without multi-context deployment.

Validation Complexity Spectrum

Level 1: Retrospective (Fastest)

  • Use historical data
  • No deployment needed
  • Example: 1,336 historical errors

Level 2: Single-Context (Fast)

  • Test in one environment
  • Minimal deployment
  • Example: Validate on current project

Level 3: Multi-Context (Slow)

  • Test across multiple projects/languages
  • Significant deployment overhead
  • Example: 3 project archetypes

Level 4: Production (Slowest)

  • Real-world validation required
  • Months of data collection
  • Example: Monitor for 3-6 months

Time Impact

| Validation Level | Overhead | Iterations Added | Example       |
|------------------|----------|------------------|---------------|
| Retrospective    | 0h       | +0               | Bootstrap-003 |
| Single-Context   | 2-4h     | +0 to +1         |               |
| Multi-Context    | 6-12h    | +2 to +3         | Bootstrap-002 |
| Production       | Months   | N/A (not rapid)  |               |

When Retrospective Validation Works

Requirements:

  1. Historical data exists (session logs, error logs)
  2. Data is representative of current/future work
  3. Metrics can be calculated from historical data
  4. Methodology can be applied retrospectively

Example (Bootstrap-003):

✅ 1,336 historical errors in session logs
✅ Representative of typical development work
✅ Can classify errors retrospectively
✅ Can measure prevention rate via replay

Result: Direct validation, 0 overhead
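
A sketch of what retrospective validation can look like in practice. The errors.jsonl field names and the validate-path.sh interface are illustrative assumptions, not the actual meta-cc tooling:

```bash
# Replay historical errors through a guard script and measure what
# fraction it would have prevented. Field names are assumed.
total=0
prevented=0
while IFS= read -r err; do
  total=$((total + 1))
  path=$(jq -r '.path // empty' <<<"$err")
  # Count as prevented if the guard rejects the path that caused the error.
  if [ -n "$path" ] && ! ./validate-path.sh "$path" >/dev/null 2>&1; then
    prevented=$((prevented + 1))
  fi
done < errors.jsonl
awk -v p="$prevented" -v t="$total" 'BEGIN { printf "prevention rate: %.1f%%\n", 100 * p / t }'
```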

Criterion 4: Generic Agent Sufficiency 🟡 MODERATE

Definition

Generic agents (data-analyst, doc-writer, coder) sufficient for execution.

Specialization Overhead

Generic agents: 0 overhead (use as-is)
Specialized agents: +1 to +2 iterations for design + testing

When Specialization Adds Value

10x+ speedup opportunity:

  • Example: coverage-analyzer (15 min → 30 sec = 30x)
  • Example: test-generator (10 min → 1 min = 10x)
  • Worth 1-2 iteration investment

<5x speedup:

  • Use generic agents + simple scripts
  • Not worth specialization overhead

Examples

Generic Sufficient (Bootstrap-003):

Tasks:
- Analyze errors (generic data-analyst)
- Document taxonomy (generic doc-writer)
- Create validation scripts (generic coder)

Speedup from specialization: 2-3x (not worth it)
Result: 0 specialization overhead

⚠️ Specialization Needed (Bootstrap-002):

Tasks:
- Coverage analysis (15 min → 30 sec = 30x with coverage-analyzer)
- Test generation (10 min → 1 min = 10x with test-generator)

Speedup: >10x for both
Investment: 1 iteration to design and test agents
Result: +1 iteration, but ROI positive overall

Criterion 5: Early High-Impact Automation 🟡 MODERATE

Definition

Top 3 automation opportunities identified by iteration 1.

Pareto Principle Application

80/20 rule: 20% of automations provide 80% of value

Implication: Identify top 3 early → rapid V_instance improvement

Identification Signals

High-frequency patterns:

  • Appears in >10% of cases
  • Example: File-not-found (18.7% of errors)

High-impact prevention:

  • Prevents >50% of pattern occurrences
  • Example: validate-path.sh prevents 65.2%

High ROI:

  • Time saved / time invested > 5x
  • Example: validate-path.sh = 61x ROI

Early Identification Techniques

Frequency Analysis:

```bash
# Count error types
jq -r '.error_type' errors.jsonl | sort | uniq -c | sort -rn

# Top 3 = high-frequency candidates
```

Impact Estimation:

If tool prevents X% of pattern Y:
- Pattern Y occurs N times
- Prevention: X% × N
- Impact: (X% × N) / total_errors
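
Plugging in numbers from the Bootstrap-003 example below (validate-path.sh prevents 65.2% of 250 file-not-found errors, out of 1,336 total):

```bash
awk 'BEGIN {
  prevented = 0.652 * 250                  # X% x N = ~163 errors
  printf "impact = %.1f%% of all errors\n", 100 * prevented / 1336
}'
# impact = 12.2% of all errors
```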

ROI Calculation:

Manual time: M min per occurrence
Tool investment: T hours
Expected uses: N

ROI = (M × N) / (T × 60)
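
A worked instance; M and T here are hypothetical values chosen to reproduce the 61x ROI cited for validate-path.sh, since the actual times are not given in this document:

```bash
awk 'BEGIN {
  M = 5      # manual minutes per occurrence (assumed)
  N = 250    # expected uses (file-not-found count)
  T = 0.34   # tool investment in hours (assumed)
  printf "ROI = %.0fx\n", (M * N) / (T * 60)
}'
# ROI = 61x
```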

Example (Bootstrap-003)

Iteration 0 Analysis:

Top 3 by frequency:
1. File-not-found: 250/1,336 = 18.7%
2. MCP errors: 228/1,336 = 17.1%
3. Build errors: 200/1,336 = 15.0%

Automation feasibility:
1. File-not-found: ✅ Path validation (high prevention %)
2. MCP errors: ❌ Infrastructure (low automation value)
3. Build errors: ⚠️ Language-specific (moderate value)

Selected:
1. validate-path.sh: 250 errors, 65.2% prevention, 61x ROI
2. check-file-size.sh: 84 errors, 100% prevention, 31.6x ROI
3. check-read-before-write.sh: 70 errors, 100% prevention, 26.2x ROI

Total impact: 317/1,336 = 23.7% error prevention

Result: Clear automation path from iteration 0


Criteria Interaction Matrix

| Criterion 1 | Criterion 2 | Criterion 3 | Likely Iterations |
|-------------|-------------|-------------|-------------------|
| ≥0.40       | Focused     | Direct      | 3-4               |
| ≥0.40       | Focused     | Multi       | 4-5               |
| ≥0.40       | Broad       | Direct      | 4-5               |
| <0.40       | Focused     | Direct      | 5-6               |
| <0.40       | Broad       | Multi       | 7-10              |

Key Insight: Criteria 1-3 are multiplicative. Missing any = slower convergence.


Decision Tree

Start
  │
  ├─ Can you achieve V_meta(s₀) ≥ 0.40?
  │    YES → Continue
  │    NO → Standard convergence (5-7 iterations)
  │
  ├─ Is domain scope <3 sentences?
  │    YES → Continue
  │    NO → Refine scope first
  │
  ├─ Can you validate without multi-context?
  │    YES → Rapid convergence likely (3-4 iterations)
  │    NO → Add +2 iterations for validation
  │
  └─ Generic agents sufficient?
       YES → No overhead
       NO → Add +1 iteration for specialization
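
The tree translates directly into a short interactive checklist; a minimal sketch, where the +2/+1 adjustments follow the annotations above:

```bash
#!/usr/bin/env bash
# Interactive version of the decision tree above.
ask() { read -r -p "$1 [y/n] " a; [ "$a" = "y" ]; }

iters=3
ask "Can you achieve V_meta(s0) >= 0.40?"     || { echo "Standard convergence (5-7 iterations)"; exit 0; }
ask "Is domain scope < 3 sentences?"          || { echo "Refine scope first"; exit 0; }
ask "Can you validate without multi-context?" || iters=$((iters + 2))
ask "Generic agents sufficient?"              || iters=$((iters + 1))
echo "Likely iterations: ~${iters}-$((iters + 1))"
```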

Source: BAIME Rapid Convergence Criteria
Validation: 13 experiments, 85% prediction accuracy
Critical Path: Criteria 1-3 (must all be met for rapid convergence)