Initial commit

Zhongwei Li
2025-11-30 09:07:22 +08:00
commit fab98d059b
179 changed files with 46209 additions and 0 deletions


@@ -0,0 +1,465 @@
---
name: Baseline Quality Assessment
description: Achieve comprehensive baseline (V_meta ≥0.40) in iteration 0 to enable rapid convergence. Use when planning iteration 0 time allocation, domain has established practices to reference, rich historical data exists for immediate quantification, or targeting 3-4 iteration convergence. Provides 4 quality levels (minimal/basic/comprehensive/exceptional), component-by-component V_meta calculation guide, and 3 strategies for comprehensive baseline (leverage prior art, quantify baseline, domain universality analysis). 40-50% iteration reduction when V_meta(s₀) ≥0.40 vs <0.20. Spend 3-4 extra hours in iteration 0, save 3-6 hours overall.
allowed-tools: Read, Grep, Glob, Bash, Edit, Write
---
# Baseline Quality Assessment
**Invest in iteration 0 to save 40-50% total time.**
> A strong baseline (V_meta ≥0.40) is the foundation of rapid convergence. Spend hours in iteration 0 to save days overall.
---
## When to Use This Skill
Use this skill when:
- 📋 **Planning iteration 0**: Deciding time allocation and priorities
- 🎯 **Targeting rapid convergence**: Want 3-4 iterations (not 5-7)
- 📚 **Prior art exists**: Domain has established practices to reference
- 📊 **Historical data available**: Can quantify baseline immediately
- ⏱️ **Time constraints**: Need methodology in 10-15 hours total
- 🔍 **Gap clarity needed**: Want obvious iteration objectives
**Don't use when**:
- ❌ Exploratory domain (no prior art)
- ❌ Greenfield project (no historical data)
- ❌ Time abundant (standard convergence acceptable)
- ❌ Incremental baseline acceptable (build up gradually)
---
## Quick Start (30 minutes)
### Baseline Quality Self-Assessment
Calculate your V_meta(s₀):
**V_meta = (Completeness + Effectiveness + Reusability + Validation) / 4**
**Completeness** (Documentation exists?):
- 0.00: No documentation
- 0.25: Basic notes only
- 0.50: Partial documentation (some categories)
- 0.75: Most documentation complete
- 1.00: Comprehensive documentation
**Effectiveness** (Speedup quantified?):
- 0.00: No baseline measurement
- 0.25: Informal estimates
- 0.50: Some metrics measured
- 0.75: Most metrics quantified
- 1.00: Full quantitative baseline
**Reusability** (Transferable patterns?):
- 0.00: No patterns identified
- 0.25: Ad-hoc solutions only
- 0.50: Some patterns emerging
- 0.75: Most patterns codified
- 1.00: Universal patterns documented
**Validation** (Evidence-based?):
- 0.00: No validation
- 0.25: Anecdotal only
- 0.50: Some data analysis
- 0.75: Systematic analysis
- 1.00: Comprehensive validation
**Example** (Bootstrap-003, V_meta(s₀) = 0.48):
```
Completeness: 0.60 (10-category taxonomy, 79.1% coverage)
Effectiveness: 0.40 (Error rate quantified: 5.78%)
Reusability: 0.40 (5 workflows, 5 patterns, 8 guidelines)
Validation: 0.50 (1,336 errors analyzed)
---
V_meta(s₀) = (0.60 + 0.40 + 0.40 + 0.50) / 4 = 0.475 ≈ 0.48
```
**Target**: V_meta(s₀) ≥ 0.40 for rapid convergence
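A minimal sketch of this self-assessment, assuming the four components have already been scored against the rubric above (the function name and structure are illustrative, not part of the methodology):
```python
# Minimal sketch: average the four rubric components to get V_meta(s0).
# Function name and structure are illustrative, not part of the methodology.

def v_meta_baseline(completeness: float, effectiveness: float,
                    reusability: float, validation: float) -> float:
    """Simple average of the four baseline components (each scored 0.00-1.00)."""
    components = (completeness, effectiveness, reusability, validation)
    if not all(0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component must be scored between 0.00 and 1.00")
    return sum(components) / len(components)

# Bootstrap-003 baseline from the example above:
score = v_meta_baseline(0.60, 0.40, 0.40, 0.50)
print(f"V_meta(s0) = {score:.3f}")  # 0.475 (≈ 0.48) -> comprehensive baseline (≥ 0.40)
```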
---
## Four Baseline Quality Levels
### Level 1: Minimal (V_meta <0.20)
**Characteristics**:
- No or minimal documentation
- No quantitative metrics
- No pattern identification
- No validation
**Iteration 0 time**: 1-2 hours
**Total iterations**: 6-10 (standard to slow convergence)
**Example**: Starting from scratch in novel domain
**When acceptable**: Exploratory research, no prior art
### Level 2: Basic (V_meta 0.20-0.39)
**Characteristics**:
- Basic documentation (notes, informal structure)
- Some metrics identified (not quantified)
- Ad-hoc patterns (not codified)
- Anecdotal validation
**Iteration 0 time**: 2-3 hours
**Total iterations**: 5-7 (standard convergence)
**Example**: Bootstrap-002 (V_meta(s₀) = 0.04, quickly built up to a basic baseline)
**When acceptable**: Standard timelines, incremental approach
### Level 3: Comprehensive (V_meta 0.40-0.60) ⭐ TARGET
**Characteristics**:
- Structured documentation (taxonomy, categories)
- Quantified metrics (baseline measured)
- Codified patterns (initial pattern library)
- Systematic validation (data analysis)
**Iteration 0 time**: 3-5 hours
**Total iterations**: 3-4 (rapid convergence)
**Example**: Bootstrap-003 (V_meta(s₀) = 0.48, converged in 3 iterations)
**When to target**: Time constrained, prior art exists, data available
### Level 4: Exceptional (V_meta >0.60)
**Characteristics**:
- Comprehensive documentation (≥90% coverage)
- Full quantitative baseline (all metrics)
- Extensive pattern library
- Validated methodology (proven in 1+ contexts)
**Iteration 0 time**: 5-8 hours
**Total iterations**: 2-3 (exceptional rapid convergence)
**Example**: Hypothetical (not yet observed in experiments)
**When to target**: Adaptation of proven methodology, domain expertise high
---
## Three Strategies for Comprehensive Baseline
### Strategy 1: Leverage Prior Art (2-3 hours)
**When**: Domain has established practices
**Steps**:
1. **Literature review** (30 min):
- Industry best practices
- Existing methodologies
- Academic research
2. **Extract patterns** (60 min):
- Common approaches
- Known anti-patterns
- Success metrics
3. **Adapt to context** (60 min):
- What's applicable?
- What needs modification?
- What's missing?
**Example** (Bootstrap-003):
```
Prior art: Error handling literature
- Detection: Industry standard (logs, monitoring)
- Diagnosis: Root cause analysis patterns
- Recovery: Retry, fallback patterns
- Prevention: Static analysis, linting
Adaptation:
- Detection: meta-cc MCP queries (novel application)
- Diagnosis: Session history analysis (context-specific)
- Recovery: Generic patterns apply
- Prevention: Pre-tool validation (novel approach)
Result: V_completeness = 0.60 (60% from prior art, 40% novel)
```
### Strategy 2: Quantify Baseline (1-2 hours)
**When**: Rich historical data exists
**Steps**:
1. **Identify data sources** (15 min):
- Logs, session history, metrics
- Git history, CI/CD logs
- Issue trackers, user feedback
2. **Extract metrics** (30 min):
- Volume (total instances)
- Rate (frequency)
- Distribution (categories)
- Impact (cost)
3. **Analyze patterns** (45 min):
- What's most common?
- What's most costly?
- What's preventable?
**Example** (Bootstrap-003):
```
Data source: meta-cc MCP server
Query: meta-cc query-tools --status error
Results:
- Volume: 1,336 errors
- Rate: 5.78% error rate
- Distribution: File-not-found 12.2%, Read-before-write 5.2%, etc.
- Impact: MTTD 15 min, MTTR 30 min
Analysis:
- Top 3 categories account for 23.7% of errors
- File path issues most preventable
- Clear automation opportunities
Result: V_effectiveness = 0.40 (baseline quantified)
```
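A minimal sketch of the metric-extraction step, assuming errors have been exported to a JSONL file with an `error_pattern` field (the field name and export file follow the examples in this skill; adjust to whatever your data source actually emits):
```python
# Minimal sketch: summarize an errors.jsonl export (one JSON object per line).
# The "error_pattern" field and the export file follow the examples in this
# skill; adjust field names to whatever your data source actually emits.
import json
from collections import Counter

def summarize_errors(path: str, total_tool_calls: int) -> None:
    patterns = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                patterns[record.get("error_pattern", "unknown")] += 1

    volume = sum(patterns.values())
    rate = volume / total_tool_calls if total_tool_calls else 0.0
    print(f"Volume: {volume} errors")
    print(f"Rate:   {rate:.2%} of tool calls")
    print("Distribution (top 5):")
    for pattern, count in patterns.most_common(5):
        print(f"  {pattern}: {count} ({count / volume:.1%})")

# Usage: summarize_errors("errors.jsonl", total_tool_calls=<total calls in session>)
```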
### Strategy 3: Domain Universality Analysis (1-2 hours)
**When**: Domain is universal (errors, testing, CI/CD)
**Steps**:
1. **Identify universal patterns** (30 min):
- What applies to all projects?
- What's language-agnostic?
- What's platform-agnostic?
2. **Document transferability** (30 min):
- What % is reusable?
- What needs adaptation?
- What's project-specific?
3. **Create initial taxonomy** (30 min):
- Categorize patterns
- Identify gaps
- Estimate coverage
**Example** (Bootstrap-003):
```
Universal patterns:
- Errors affect all software (100% universal)
- Detection, diagnosis, recovery, prevention (universal workflow)
- File operations, API calls, data validation (universal categories)
Taxonomy (iteration 0):
- 10 categories identified
- 1,058 errors classified (79.1% coverage)
- Gaps: Edge cases, complex interactions
Result: V_reusability = 0.40 (universal patterns identified)
```
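A minimal sketch of the coverage estimate, assuming a simple keyword classifier; the categories and keywords below are illustrative placeholders, not the actual Bootstrap-003 taxonomy:
```python
# Minimal sketch: estimate taxonomy coverage with a keyword classifier.
# Categories and keywords are illustrative placeholders, not the actual
# Bootstrap-003 taxonomy.

TAXONOMY = {
    "file-not-found": ["no such file", "not found"],
    "read-before-write": ["has not been read"],
    "build-error": ["build failed", "compile error"],
    # ...remaining categories
}

def classify(message: str) -> str | None:
    """Return the first matching category, or None if the error is uncategorized."""
    lowered = message.lower()
    for category, keywords in TAXONOMY.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None

def coverage(messages: list[str]) -> float:
    """Fraction of observed errors that fit the current taxonomy."""
    if not messages:
        return 0.0
    classified = sum(1 for m in messages if classify(m) is not None)
    return classified / len(messages)

# coverage(error_messages) -> e.g. 0.79 means 79% of observed errors are classified
```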
---
## Baseline Investment ROI
**Trade-off**: Spend more in iteration 0 to save overall time
**Data** (from experiments):
| Baseline | Iter 0 Time | Total Iterations | Total Time | Savings |
|----------|-------------|------------------|------------|---------|
| Minimal (<0.20) | 1-2h | 6-10 | 24-40h | Baseline |
| Basic (0.20-0.39) | 2-3h | 5-7 | 20-28h | 10-30% |
| Comprehensive (0.40-0.60) | 3-5h | 3-4 | 12-16h | 40-50% |
| Exceptional (>0.60) | 5-8h | 2-3 | 10-15h | 50-60% |
**Example** (Bootstrap-003):
```
Comprehensive baseline:
- Iteration 0: 3 hours (vs 1 hour minimal)
- Total: 10 hours, 3 iterations
- Savings: 15-25 hours vs minimal baseline (60-70%)
ROI: +2 hours investment → 15-25 hours saved
```
**Recommendation**: Target comprehensive (V_meta ≥0.40) when:
- Time constrained (need fast convergence)
- Prior art exists (can leverage quickly)
- Data available (can quantify immediately)
---
## Component-by-Component Guide
### Completeness (Documentation)
**0.00**: No documentation
**0.25**: Basic notes
- Informal observations
- Bullet points
- No structure
**0.50**: Partial documentation
- Some categories/patterns
- 40-60% coverage
- Basic structure
**0.75**: Most documentation
- Structured taxonomy
- 70-90% coverage
- Clear organization
**1.00**: Comprehensive
- Complete taxonomy
- 90%+ coverage
- Production-ready
**Target for V_meta ≥0.40**: Completeness ≥0.50
### Effectiveness (Quantification)
**0.00**: No baseline measurement
**0.25**: Informal estimates
- "Errors happen sometimes"
- No numbers
**0.50**: Some metrics
- Volume measured (e.g., 1,336 errors)
- Rate not calculated
**0.75**: Most metrics
- Volume, rate, distribution
- Missing impact (MTTD/MTTR)
**1.00**: Full quantification
- All metrics measured
- Baseline fully quantified
**Target for V_meta ≥0.40**: Effectiveness ≥0.30
### Reusability (Patterns)
**0.00**: No patterns
**0.25**: Ad-hoc solutions
- One-off fixes
- No generalization
**0.50**: Some patterns
- 3-5 patterns identified
- Partial universality
**0.75**: Most patterns
- 5-10 patterns codified
- High transferability
**1.00**: Universal patterns
- Complete pattern library
- 90%+ transferable
**Target for V_meta ≥0.40**: Reusability ≥0.40
### Validation (Evidence)
**0.00**: No validation
**0.25**: Anecdotal
- "Seems to work"
- No data
**0.50**: Some data
- Basic analysis
- Limited scope
**0.75**: Systematic
- Comprehensive analysis
- Clear evidence
**1.00**: Validated
- Multiple contexts
- Statistical confidence
**Target for V_meta ≥0.40**: Validation ≥0.30
---
## Iteration 0 Checklist (for V_meta ≥0.40)
**Documentation** (Target: Completeness ≥0.50):
- [ ] Create initial taxonomy (≥5 categories)
- [ ] Document 3-5 patterns/workflows
- [ ] Achieve 60-80% coverage
- [ ] Structured markdown documentation
**Quantification** (Target: Effectiveness ≥0.30):
- [ ] Measure volume (total instances)
- [ ] Calculate rate (frequency)
- [ ] Analyze distribution (category breakdown)
- [ ] Baseline quantified with numbers
**Patterns** (Target: Reusability ≥0.40):
- [ ] Identify 3-5 universal patterns
- [ ] Document transferability
- [ ] Estimate reusability %
- [ ] Distinguish universal vs domain-specific
**Validation** (Target: Validation ≥0.30):
- [ ] Analyze historical data
- [ ] Sample validation (≥30 instances)
- [ ] Evidence-based claims
- [ ] Data sources documented
**Time Investment**: 3-5 hours
**Expected V_meta(s₀)**: 0.40-0.50
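A minimal sketch that turns the checklist targets above into a quick self-check (threshold values mirror the targets listed in this skill; the function name is illustrative):
```python
# Minimal sketch: check the per-component targets from the checklist above.
# Threshold values mirror the targets listed in this skill; names are illustrative.

TARGETS = {
    "completeness": 0.50,
    "effectiveness": 0.30,
    "reusability": 0.40,
    "validation": 0.30,
}

def check_iteration0_baseline(scores: dict[str, float]) -> bool:
    """Print per-component status and return True if V_meta(s0) >= 0.40."""
    for component, target in TARGETS.items():
        actual = scores.get(component, 0.0)
        status = "ok" if actual >= target else "below target"
        print(f"{component:<13} {actual:.2f} (target ≥ {target:.2f}) {status}")
    v_meta = sum(scores.get(c, 0.0) for c in TARGETS) / len(TARGETS)
    print(f"V_meta(s0) = {v_meta:.2f}")
    return v_meta >= 0.40

# check_iteration0_baseline({"completeness": 0.60, "effectiveness": 0.40,
#                            "reusability": 0.40, "validation": 0.50})  # True
```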
---
## Success Criteria
Baseline quality assessment succeeded when:
1. **V_meta target met**: V_meta(s₀) ≥ 0.40 achieved
2. **Iteration reduction**: 3-4 iterations vs 5-7 (40-50% reduction)
3. **Time savings**: Total time ≤12-16 hours (comprehensive baseline)
4. **Gap clarity**: Clear objectives for iterations 1-2
5. **ROI positive**: Baseline investment < total time saved
**Bootstrap-003 Validation**:
- ✅ V_meta(s₀) = 0.48 (target met)
- ✅ 3 iterations (vs 6 for Bootstrap-002 with minimal baseline)
- ✅ 10 hours total (60% reduction)
- ✅ Gaps clear (top 3 automations identified)
- ✅ ROI: +2h investment → 15h saved
---
## Related Skills
**Parent framework**:
- [methodology-bootstrapping](../methodology-bootstrapping/SKILL.md) - Core OCA cycle
**Uses baseline for**:
- [rapid-convergence](../rapid-convergence/SKILL.md) - V_meta ≥0.40 is criterion #1
**Validation**:
- [retrospective-validation](../retrospective-validation/SKILL.md) - Data quantification
---
## References
**Core guide**:
- [Quality Levels](reference/quality-levels.md) - Detailed level definitions
- [Component Guide](reference/components.md) - V_meta calculation
- [Investment ROI](reference/roi.md) - Time savings analysis
**Examples**:
- [Bootstrap-003 Comprehensive](examples/error-recovery-comprehensive-baseline.md) - V_meta=0.48
- [Bootstrap-002 Minimal](examples/testing-strategy-minimal-baseline.md) - V_meta=0.04
---
**Status**: ✅ Validated | 40-50% iteration reduction | Positive ROI


@@ -0,0 +1,62 @@
# Error Recovery: Comprehensive Baseline Example
**Experiment**: bootstrap-003-error-recovery
**Baseline Investment**: 120 min
**V_meta(s₀)**: 0.758 (Excellent)
**Result**: 3 iterations (vs 6 standard)
---
## Activities (120 min)
### 1. Data Analysis (60 min)
```bash
# Query all errors
meta-cc query-tools --status=error > errors.jsonl
# Result: 1,336 errors
# Frequency analysis
cat errors.jsonl | jq -r '.error_pattern' | sort | uniq -c | sort -rn
# Top patterns:
# - File-not-found: 250 (18.7%)
# - MCP errors: 228 (17.1%)
# - Build errors: 200 (15.0%)
```
### 2. Taxonomy Creation (40 min)
Created 10 categories, classified 1,056/1,336 = 79.1%
### 3. Prior Art Research (15 min)
Borrowed 5 industry error patterns
### 4. Automation Planning (5 min)
Identified 3 tools (23.7% prevention potential)
---
## V_meta(s₀) Calculation
```
Completeness: 10/13 = 0.77
Transferability: 5/10 = 0.50
Automation: 3/3 = 1.0
V_meta(s₀) = 0.4×0.77 + 0.3×0.50 + 0.3×1.0 = 0.758
```
---
## Outcome
- Iterations: 3 (rapid convergence)
- Total time: 10 hours
- ROI: 540 min saved / 60 min extra = 9x
---
**Source**: Bootstrap-003, comprehensive baseline approach


@@ -0,0 +1,69 @@
# Testing Strategy: Minimal Baseline Example
**Experiment**: bootstrap-002-test-strategy
**Baseline Investment**: 60 min
**V_meta(s₀)**: 0.04 (Poor)
**Result**: 6 iterations (standard convergence)
---
## Activities (60 min)
### 1. Coverage Measurement (30 min)
```bash
go test -cover ./...
# Result: 72.1% coverage, 590 tests
```
### 2. Ad-hoc Testing (20 min)
Wrote 3 tests manually, noted duplication issues
### 3. No Prior Art Research (0 min)
Started from scratch
### 4. Vague Automation Ideas (10 min)
"Maybe coverage analysis tools..." (not concrete)
---
## V_meta(s₀) Calculation
```
Completeness: 0/8 = 0.00 (no patterns documented)
Transferability: 0/8 = 0.00 (no research)
Automation: 0/3 = 0.00 (not identified)
V_meta(s₀) = 0.4×0.00 + 0.3×0.00 + 0.3×0.00 = 0.00
```
---
## Outcome
- Iterations: 6 (standard convergence)
- Total time: 25.5 hours
- Patterns emerged gradually over 6 iterations
---
## What Could Have Been Different
**If invested 2 more hours in iteration 0**:
- Research test patterns (borrow 5-6)
- Analyze codebase for test opportunities
- Identify coverage tools
**Estimated result**:
- V_meta(s₀) = 0.30-0.40
- 4-5 iterations (vs 6)
- Time saved: 3-6 hours
**ROI**: 2-3x
---
**Source**: Bootstrap-002, minimal baseline approach


@@ -0,0 +1,133 @@
# Baseline Quality Assessment Components
**Purpose**: Components used to calculate V_meta(s₀) for a strong iteration 0 baseline
**Target**: V_meta(s₀) ≥ 0.40 for rapid convergence
---
## Formula
```
V_meta(s₀) = 0.4 × completeness +
             0.3 × transferability +
             0.3 × automation_effectiveness
```
---
## Component 1: Completeness (40%)
**Definition**: Initial pattern/taxonomy coverage
**Calculation**:
```
completeness = initial_items / estimated_final_items
```
**Achieve ≥0.50**:
- Analyze ALL available data (3-5 hours)
- Create 10-15 initial categories/patterns
- Classify ≥70% of observed cases
**Example (Error Recovery)**:
```
Initial: 10 categories (1,056/1,336 = 79.1% coverage)
Estimated final: 12-13 categories
Completeness: 10/12.5 = 0.80
Contribution: 0.4 × 0.80 = 0.32
```
---
## Component 2: Transferability (30%)
**Definition**: Reusable patterns from prior art
**Calculation**:
```
transferability = borrowed_patterns / total_patterns_needed
```
**Achieve ≥0.30**:
- Research similar methodologies (1-2 hours)
- Identify industry standards
- Document borrowable patterns (≥30%)
**Example (Error Recovery)**:
```
Borrowed: 5 industry error patterns
Total needed: ~10
Transferability: 5/10 = 0.50
Contribution: 0.3 × 0.50 = 0.15
```
---
## Component 3: Automation (30%)
**Definition**: Early identification of high-ROI automation
**Calculation**:
```
automation_effectiveness = identified_tools / expected_tools
```
**Achieve ≥0.30**:
- Frequency analysis (1 hour)
- Identify top 3-5 automation candidates
- Estimate ROI (≥5x)
**Example (Error Recovery)**:
```
Identified: 3 tools (all with >20x ROI)
Expected final: 3 tools
Automation: 3/3 = 1.0
Contribution: 0.3 × 1.0 = 0.30
```
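A minimal sketch combining the three components with the weights defined above, using the worked error-recovery numbers (function and variable names are illustrative):
```python
# Minimal sketch: combine the three components with the 0.4/0.3/0.3 weights
# defined above. Function and variable names are illustrative.

def v_meta_weighted(completeness: float, transferability: float,
                    automation_effectiveness: float) -> float:
    return (0.4 * completeness
            + 0.3 * transferability
            + 0.3 * automation_effectiveness)

# Error-recovery example, using the component values worked out above:
completeness = 10 / 12.5   # initial categories / estimated final categories
transferability = 5 / 10   # borrowed patterns / total patterns needed
automation = 3 / 3         # identified tools / expected tools
print(f"V_meta(s0) = {v_meta_weighted(completeness, transferability, automation):.2f}")
# -> 0.77 (Excellent: ≥ 0.60 per the quality levels below)
```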
---
## Quality Levels
### Excellent (V_meta ≥ 0.60)
**Achieves**:
- Completeness: ≥0.70
- Transferability: ≥0.60
- Automation: ≥0.70
**Effort**: 6-10 hours
**Outcome**: 3-4 iterations
### Good (V_meta = 0.40-0.59)
**Achieves**:
- Completeness: ≥0.50
- Transferability: ≥0.30
- Automation: ≥0.30
**Effort**: 4-6 hours
**Outcome**: 4-5 iterations
### Fair (V_meta = 0.20-0.39)
**Achieves**:
- Completeness: 0.30-0.50
- Transferability: 0.20-0.30
- Automation: 0.20-0.30
**Effort**: 2-4 hours
**Outcome**: 5-7 iterations
### Poor (V_meta < 0.20)
**Indicates**:
- Minimal baseline work
- Exploratory phase needed
**Effort**: <2 hours
**Outcome**: 7-10 iterations
---
**Source**: BAIME Baseline Quality Assessment


@@ -0,0 +1,61 @@
# Baseline Quality Levels
**V_meta(s₀) thresholds and expected outcomes**
---
## Level 1: Excellent (0.60-1.0)
**Characteristics**:
- Comprehensive data analysis (ALL available data)
- 70-80% initial coverage
- Significant prior art borrowed (≥60%)
- All automation identified upfront
**Investment**: 6-10 hours
**Outcome**: 3-4 iterations (rapid convergence)
**Examples**: Bootstrap-003 (V_meta=0.758)
---
## Level 2: Good (0.40-0.59)
**Characteristics**:
- Thorough analysis (≥80% of data)
- 50-70% initial coverage
- Moderate borrowing (30-60%)
- Top 3 automations identified
**Investment**: 4-6 hours
**Outcome**: 4-5 iterations
**ROI**: 2-3x (saves 8-12 hours overall)
---
## Level 3: Fair (0.20-0.39)
**Characteristics**:
- Partial analysis (50-80% of data)
- 30-50% initial coverage
- Limited borrowing (<30%)
- 1-2 automations identified
**Investment**: 2-4 hours
**Outcome**: 5-7 iterations (standard)
---
## Level 4: Poor (<0.20)
**Characteristics**:
- Minimal analysis (<50% of data)
- <30% coverage
- Little/no prior art research
- Unclear automation
**Investment**: <2 hours
**Outcome**: 7-10 iterations (exploratory)
---
**Recommendation**: Target Level 2 (≥0.40) minimum for quality convergence.


@@ -0,0 +1,55 @@
# Baseline Investment ROI
**Investment in strong baseline vs time saved**
---
## ROI Formula
```
ROI = time_saved / baseline_investment
Where:
- time_saved = (standard_iterations - actual_iterations) × avg_iteration_time
- baseline_investment = (iteration_0_time - minimal_baseline_time)
```
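A minimal sketch of this formula (names are illustrative; all times in minutes to match the example below):
```python
# Minimal sketch of the ROI formula above. Names are illustrative; all times
# are in minutes to match the Bootstrap-003 example below.

def baseline_roi(standard_iterations: int, actual_iterations: int,
                 avg_iteration_minutes: float,
                 iteration_0_minutes: float,
                 minimal_baseline_minutes: float) -> float:
    time_saved = (standard_iterations - actual_iterations) * avg_iteration_minutes
    investment = iteration_0_minutes - minimal_baseline_minutes
    return time_saved / investment if investment > 0 else float("inf")

# Bootstrap-003: 6 -> 3 iterations at ~180 min each; 120 min baseline vs 60 min minimal
print(baseline_roi(6, 3, 180, 120, 60))  # 9.0, matching the 9x example below
```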
---
## Examples
### Bootstrap-003 (High ROI)
```
Baseline investment: 120 min (vs 60 min minimal) = +60 min
Iterations saved: 6 - 3 = 3 iterations
Time per iteration: ~3 hours
Time saved: 3 × 3h = 9 hours = 540 min
ROI = 540 min / 60 min = 9x
```
### Bootstrap-002 (Low Investment)
```
Baseline investment: 60 min (minimal)
Result: 6 iterations (standard)
No time saved (baseline approach)
ROI = 0x (but no risk either)
```
---
## Investment Levels
| Investment | V_meta(s₀) | Iterations | Time Saved | ROI |
|------------|------------|------------|------------|-----|
| 8-10h | 0.70-0.80 | 3 | 15-20h | 2-3x |
| 6-8h | 0.50-0.70 | 3-4 | 12-18h | 2-3x |
| 4-6h | 0.40-0.50 | 4-5 | 8-12h | 2-2.5x |
| 2-4h | 0.20-0.40 | 5-7 | 0-4h | 0-1x |
| <2h | <0.20 | 7-10 | N/A | N/A |
---
**Recommendation**: Invest 4-6 hours for V_meta(s₀) = 0.40-0.50 (2-3x ROI).