Initial commit

2025-11-30 09:07:22 +08:00
commit fab98d059b
179 changed files with 46209 additions and 0 deletions
--- a/skills/methodology-bootstrapping/reference/convergence-criteria.md
+++ b/skills/methodology-bootstrapping/reference/convergence-criteria.md
@@ -0,0 +1,334 @@
+# Convergence Criteria
+
+**How to know when your methodology development is complete.**
+
+## Standard Dual Convergence
+
+The most common pattern (used in 6/8 experiments):
+
+### Criteria
+
+```
+Converged when ALL of:
+1. M_n == M_{n-1}        (Meta-Agent stable)
+2. A_n == A_{n-1}        (Agent set stable)
+3. V_instance(s_n) ≥ 0.80
+4. V_meta(s_n) ≥ 0.80
+5. Objectives complete
+6. ΔV < 0.02 for 2+ iterations (diminishing returns)
+```
+
+### Example: Bootstrap-009 (Observability)
+
+```
+Iteration 6:
+  V_instance(s₆) = 0.87 (target: 0.80) ✅
+  V_meta(s₆) = 0.83 (target: 0.80) ✅
+  M₆ == M₅ ✅
+  A₆ == A₅ ✅
+  Objectives: All 3 pillars implemented ✅
+  ΔV: 0.01 (< 0.02) ✅
+
+→ CONVERGED (Standard Dual Convergence)
+```
+
+**Use when**: Both task and methodology are equally important.
+
+---
+
+## Meta-Focused Convergence
+
+Alternative pattern when methodology is primary goal (used in 1/8 experiments):
+
+### Criteria
+
+```
+Converged when ALL of:
+1. M_n == M_{n-1}        (Meta-Agent stable)
+2. A_n == A_{n-1}        (Agent set stable)
+3. V_meta(s_n) ≥ 0.80    (Methodology excellent)
+4. V_instance(s_n) ≥ 0.55 (Instance practically sufficient)
+5. Instance gap is infrastructure, NOT methodology
+6. System stable for 2+ iterations
+```
+
+### Example: Bootstrap-011 (Knowledge Transfer)
+
+```
+Iteration 3:
+  V_instance(s₃) = 0.585 (practically sufficient)
+  V_meta(s₃) = 0.877 (excellent, +9.6% above target) ✅
+  M₃ == M₂ == M₁ ✅
+  A₃ == A₂ == A₁ ✅
+
+  Instance gap analysis:
+  - Missing: Knowledge graph, semantic search (infrastructure)
+  - Present: ALL 3 learning paths complete (methodology)
+  - Value: 3-8x onboarding speedup already achieved
+
+  Meta convergence:
+  - Completeness: 0.80 (all templates complete)
+  - Effectiveness: 0.95 (3-8x validated)
+  - Reusability: 0.88 (95%+ transferable)
+
+→ CONVERGED (Meta-Focused Convergence)
+```
+
+**Use when**:
+- Experiment explicitly prioritizes meta-objective
+- Instance gap is tooling/infrastructure, not methodology
+- Methodology has reached complete transferability (≥90%)
+- Further instance work would not improve methodology quality
+
+**Validation checklist**:
+- [ ] Primary objective is methodology (stated in README)
+- [ ] Instance gap is infrastructure (not methodology gaps)
+- [ ] V_meta_reusability ≥ 0.90
+- [ ] Practical value delivered (speedup demonstrated)
+
+---
+
+## Practical Convergence
+
+Alternative pattern when quality exceeds metrics (used in 1/8 experiments):
+
+### Criteria
+
+```
+Converged when ALL of:
+1. M_n == M_{n-1}        (Meta-Agent stable)
+2. A_n == A_{n-1}        (Agent set stable)
+3. V_instance + V_meta ≥ 1.60 (combined threshold)
+4. Quality evidence exceeds raw metric scores
+5. Justified partial criteria
+6. ΔV < 0.02 for 2+ iterations
+```
+
+### Example: Bootstrap-002 (Testing)
+
+```
+Iteration 5:
+  V_instance(s₅) = 0.848 (target: 0.80, +6% margin) ✅
+  V_meta(s₅) ≈ 0.85 (estimated)
+  Combined: 1.698 (> 1.60) ✅
+
+  Quality evidence:
+  - Coverage: 75% overall BUT 86-94% in core packages
+  - Sub-package excellence > aggregate metric
+  - Quality gates: 8/10 met consistently
+  - Test quality: Fixtures, mocks, zero flaky tests
+  - 15x speedup validated
+  - 89% methodology reusability
+
+  M₅ == M₄ ✅
+  A₅ == A₄ ✅
+  ΔV: 0.01 (< 0.02) ✅
+
+→ CONVERGED (Practical Convergence)
+```
+
+**Use when**:
+- Some components don't reach target but overall quality is excellent
+- Sub-system excellence compensates for aggregate metrics
+- Diminishing returns demonstrated
+- Honest assessment shows methodology complete
+
+**Validation checklist**:
+- [ ] Combined V_instance + V_meta ≥ 1.60
+- [ ] Quality evidence documented (not just metrics)
+- [ ] Honest gap analysis (no inflation)
+- [ ] Diminishing returns proven (ΔV trend)
+
+---
+
+## System Stability
+
+All convergence patterns require system stability:
+
+### Agent Set Stability (A_n == A_{n-1})
+
+**Stable when**:
+- Same agents used in iteration n and n-1
+- No new specialized agents created
+- No agent capabilities expanded
+
+**Example**:
+```
+Iteration 5: {coder, doc-writer, data-analyst, log-analyzer}
+Iteration 6: {coder, doc-writer, data-analyst, log-analyzer}
+→ A₆ == A₅ ✅ STABLE
+```
+
+### Meta-Agent Stability (M_n == M_{n-1})
+
+**Stable when**:
+- Same 5 capabilities in iteration n and n-1
+- No new coordination patterns
+- No Meta-Agent prompt evolution
+
+**Standard M₀ capabilities**:
+1. observe - Pattern observation
+2. plan - Iteration planning
+3. execute - Agent orchestration
+4. reflect - Value assessment
+5. evolve - System evolution
+
+**Finding**: M₀ was sufficient in ALL 8 experiments (no evolution needed)
+
+---
+
+## Diminishing Returns
+
+**Definition**: ΔV < epsilon for k consecutive iterations
+
+**Standard threshold**: epsilon = 0.02, k = 2
+
+**Calculation**:
+```
+ΔV_n = |V_total(s_n) - V_total(s_{n-1})|
+
+If ΔV_n < 0.02 AND ΔV_{n-1} < 0.02:
+  → Diminishing returns detected
+```
+
+**Example**:
+```
+Iteration 4: V_total = 0.82, ΔV = 0.05 (significant)
+Iteration 5: V_total = 0.84, ΔV = 0.02 (small)
+Iteration 6: V_total = 0.85, ΔV = 0.01 (small)
+→ Diminishing returns since Iteration 5
+```
+
+**Interpretation**:
+- Large ΔV (>0.05): Significant progress, continue
+- Medium ΔV (0.02-0.05): Steady progress, continue
+- Small ΔV (<0.02): Diminishing returns, consider converging
+
+---
+
+## Decision Tree
+
+```
+Start with iteration n:
+
+1. Calculate V_instance(s_n) and V_meta(s_n)
+
+2. Check system stability:
+   M_n == M_{n-1}? → YES/NO
+   A_n == A_{n-1}? → YES/NO
+
+   If NO to either → Continue iteration n+1
+
+3. Check convergence pattern:
+
+   Pattern A: Standard Dual Convergence
+   ├─ V_instance ≥ 0.80? → YES
+   ├─ V_meta ≥ 0.80? → YES
+   ├─ Objectives complete? → YES
+   ├─ ΔV < 0.02 for 2 iterations? → YES
+   └─→ CONVERGED ✅
+
+   Pattern B: Meta-Focused Convergence
+   ├─ V_meta ≥ 0.80? → YES
+   ├─ V_instance ≥ 0.55? → YES
+   ├─ Primary objective is methodology? → YES
+   ├─ Instance gap is infrastructure? → YES
+   ├─ V_meta_reusability ≥ 0.90? → YES
+   └─→ CONVERGED ✅
+
+   Pattern C: Practical Convergence
+   ├─ V_instance + V_meta ≥ 1.60? → YES
+   ├─ Quality evidence strong? → YES
+   ├─ Justified partial criteria? → YES
+   ├─ ΔV < 0.02 for 2 iterations? → YES
+   └─→ CONVERGED ✅
+
+4. If no pattern matches → Continue iteration n+1
+```
+
+---
+
+## Common Mistakes
+
+### Mistake 1: Premature Convergence
+
+**Symptom**: Declaring convergence before system stable
+
+**Example**:
+```
+Iteration 3:
+  V_instance = 0.82 ✅
+  V_meta = 0.81 ✅
+  BUT M₃ ≠ M₂ (new Meta-Agent capability added)
+
+→ NOT CONVERGED (system unstable)
+```
+
+**Fix**: Wait until M_n == M_{n-1} and A_n == A_{n-1}
+
+### Mistake 2: Inflated Values
+
+**Symptom**: V scores mysteriously jump to exactly 0.80
+
+**Example**:
+```
+Iteration 4: V_instance = 0.77
+Iteration 5: V_instance = 0.80 (claimed)
+BUT no substantial work done!
+```
+
+**Fix**: Honest assessment, gap enumeration, evidence-based scoring
+
+### Mistake 3: Moving Goalposts
+
+**Symptom**: Changing criteria mid-experiment
+
+**Example**:
+```
+Initial plan: V_instance ≥ 0.80
+Final state: V_instance = 0.65
+Conclusion: "Actually, 0.65 is sufficient" ❌ WRONG
+```
+
+**Fix**: Either reach 0.80 OR use Meta-Focused/Practical with explicit justification
+
+### Mistake 4: Ignoring System Instability
+
+**Symptom**: Declaring convergence while agents still evolving
+
+**Example**:
+```
+Iteration 5:
+  Both V scores ≥ 0.80 ✅
+  BUT new specialized agent created in Iteration 5
+  A₅ ≠ A₄
+
+→ NOT CONVERGED (agent set unstable)
+```
+
+**Fix**: Run Iteration 6 to confirm A₆ == A₅
+
+---
+
+## Convergence Prediction
+
+Based on 8 experiments, you can predict iteration count:
+
+**Base estimate**: 5 iterations
+
+**Adjustments**:
+- Well-defined domain: -2 iterations
+- Existing tools available: -1 iteration
+- High interdependency: +2 iterations
+- Novel patterns needed: +1 iteration
+- Large codebase scope: +1 iteration
+- Multiple competing goals: +1 iteration
+
+**Examples**:
+- Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
+- Observability: 5 + 0 + 1 = 6 → actual 6 ✓
+- Cross-Cutting: 5 + 2 + 1 = 8 → actual 8 ✓
+
+---
+
+**Next**: Read [dual-value-functions.md](dual-value-functions.md) for V_instance and V_meta calculation.