# BAIME Usage Guide
**BAIME (Bootstrapped AI Methodology Engineering)** - A systematic framework for developing and validating software engineering methodologies through observation, codification, and automation.
---
## Table of Contents
- [What is BAIME?](#what-is-baime)
- [When to Use BAIME](#when-to-use-baime)
- [Prerequisites](#prerequisites)
- [Core Concepts](#core-concepts)
- [Frequently Asked Questions](#frequently-asked-questions)
- [Quick Start](#quick-start)
- [Step-by-Step Workflow](#step-by-step-workflow)
- [Specialized Agents](#specialized-agents)
- [Practical Example](#practical-example)
- [Troubleshooting](#troubleshooting)
- [Next Steps](#next-steps)
---
## What is BAIME?
BAIME integrates three complementary methodologies optimized for LLM-based development:
1. **OCA Cycle** (Observe-Codify-Automate) - Core iterative framework
2. **Empirical Validation** - Scientific method and data-driven decisions
3. **Value Optimization** - Dual-layer value functions for quantitative evaluation
**Key Innovation**: BAIME treats methodology development like software development—with empirical observation, automated testing, continuous iteration, and quantitative metrics.
### Why BAIME?
**Problem**: Ad-hoc methodology development is slow, subjective, and hard to validate.
**Solution**: BAIME provides a systematic approach with:
- **Rapid convergence**: Typically 3-7 iterations, 6-15 hours
- **Empirical validation**: Data-driven evidence, not opinions
- **High transferability**: 70-95% reusable across projects
- **Proven results**: 100% success rate across 8 experiments, 10-50x speedup
### BAIME in Action
**Example Results**:
- **Testing Strategy**: 15x speedup, 89% transferability
- **CI/CD Pipeline**: 2.5-3.5x speedup, 91.7% pattern validation
- **Error Recovery**: 95.4% error coverage, 3 iterations
- **Documentation System**: 47% token cost reduction, 85% reduction in redundancy
- **Knowledge Transfer**: 3-8x onboarding speedup
---
## When to Use BAIME
### Use BAIME For
**Creating systematic methodologies** for:
- Testing strategies
- CI/CD pipelines
- Error handling patterns
- Observability systems
- Dependency management
- Documentation systems
- Knowledge transfer processes
- Technical debt management
- Cross-cutting concerns
**When you need**:
- Empirical validation with data
- Iterative methodology evolution
- Quantitative quality metrics
- Transferable best practices
- Rapid convergence (hours to days, not weeks)
### Don't Use BAIME For
- **One-time ad-hoc tasks** without reusability goals
- **Trivial processes** (<100 lines of code/docs)
- **Established standards** that fully solve your problem
---
## Prerequisites
### Required
1. **meta-cc plugin installed** and configured
- See [Installation Guide](installation.md)
- Verify: `/meta "show stats"` works
2. **Claude Code** environment
- Access to Task tool for subagent invocation
3. **Project with need for methodology**
- Have a specific domain in mind (testing, CI/CD, etc.)
- Able to measure current state (baseline)
### Recommended
- **Familiarity with meta-cc** basic features
- **Understanding of your domain** (e.g., if developing testing methodology, know testing basics)
- **Git repository** for tracking methodology evolution
---
## Core Concepts
### Understanding Value Functions
BAIME uses **dual-layer value functions** to measure quality at two independent levels:
#### V_instance: Domain-Specific Quality
Measures the quality of your specific deliverables:
- **Purpose**: Assess whether your domain work is high-quality
- **Examples**:
- Testing methodology: Test coverage percentage, test maintainability
- CI/CD pipeline: Build time, deployment success rate, quality gate coverage
- Documentation: Completeness, accuracy, usability
- **Characteristics**: Domain-dependent, specific to your work
#### V_meta: Methodology Quality
Measures the quality of the methodology itself:
- **Purpose**: Assess whether your methodology is reusable and effective
- **Components**:
- **Completeness**: All necessary patterns, templates, tools exist
- **Effectiveness**: Methodology improves quality and efficiency
- **Reusability**: Works across projects with minimal adaptation
- **Validation**: Empirically tested and proven effective
- **Characteristics**: Domain-independent, universal assessment
#### Convergence Requirement
**Both must reach ≥ 0.80** for methodology to be complete:
- V_instance ≥ 0.80: Domain work is production-ready
- V_meta ≥ 0.80: Methodology is reusable
- If only one converges, keep iterating
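To make the dual-threshold check concrete, here is a minimal sketch of how the two layers might be computed and compared. It assumes equal-weighted components and borrows component names and scores from the testing baseline shown later in this guide; real experiments define their own components and weights in ITERATION-PROMPTS.md.
```python
# Illustrative only: component names, scores, and equal weighting are assumptions.
# Each experiment defines its own value functions in ITERATION-PROMPTS.md.

def layer_score(components: dict[str, float]) -> float:
    """Combine component scores (each in [0, 1]) into a single layer score."""
    return sum(components.values()) / len(components)

v_instance = layer_score({
    "coverage": 0.40,         # e.g. 65% actual vs 80% target
    "quality": 0.25,          # ad-hoc, no systematic approach yet
    "maintainability": 0.40,  # some organization exists
})

v_meta = layer_score({
    "completeness": 0.20,   # capabilities still placeholders
    "effectiveness": 0.30,  # no proven patterns yet
    "reusability": 0.20,    # project-specific so far
    "validation": 0.30,     # baseline measurement only
})

# Convergence requires BOTH layers to reach the 0.80 threshold.
dual_threshold_met = v_instance >= 0.80 and v_meta >= 0.80
print(f"V_instance={v_instance:.2f}, V_meta={v_meta:.2f}, converged: {dual_threshold_met}")
```
With these baseline numbers the sketch prints V_instance=0.35 and V_meta=0.25, exactly the kind of honest low starting point Iteration 0 is expected to produce.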
---
### The OCA Cycle
Each iteration follows the **Observe-Codify-Automate** cycle:
```
Observe → Codify → Automate → Evaluate
   ↑                              ↓
   ← ← ← ← ← ← Iterate ← ← ← ← ← ←
```
#### Phase 1: Observe
**Goal**: Collect empirical data about current state
**Activities**:
- Read previous iteration results
- Measure baseline (Iteration 0) or current state
- Identify problems and patterns
- Gather evidence about what's working/not working
**Output**: Data artifacts documenting observations
#### Phase 2: Codify
**Goal**: Extract patterns and create reusable structures
**Activities**:
- Form strategy based on evidence
- Extract recurring patterns into documented forms
- Create templates for common structures
- Prioritize improvements based on impact
**Output**: Patterns, templates, strategy documentation
#### Phase 3: Automate
**Goal**: Build tools to improve efficiency and consistency
**Activities**:
- Create automation scripts (validators, generators, analyzers)
- Implement quality gates
- Build CI integration
- Execute planned improvements
**Output**: Working tools, improved deliverables
#### Phase 4: Evaluate
**Goal**: Measure progress and assess convergence
**Activities**:
- Calculate V_instance and V_meta scores
- Provide evidence for each component
- Identify remaining gaps
- Check convergence criteria
**Output**: Value scores, gap analysis, convergence decision
---
### Meta-Agent and Specialized Agents
#### Meta-Agent
The **meta-agent orchestrates** the entire BAIME process:
**Responsibilities**:
- Read lifecycle capabilities before each phase (fresh, no caching)
- Execute OCA cycle systematically
- Track system state evolution (M_n, A_n, s_n)
- Coordinate specialized agents when needed
- Make evidence-based evolution decisions
**Key Behavior**: Reads capabilities fresh each iteration to incorporate latest guidance
#### Specialized Agents
**Domain-specific executors** created when evidence shows need:
**When created**:
- Generic approach insufficient (demonstrated, not assumed)
- Task recurs 3+ times with similar structure
- Clear expected improvement from specialization
**Examples**:
- `test-generator`: Creates tests following validated patterns
- `validator-agent`: Checks deliverables against quality criteria
- `knowledge-extractor`: Transforms experiment into reusable methodology
**Key Principle**: Agents evolve based on retrospective evidence (not anticipatory design)
---
### Capabilities and System State
#### Capabilities
**Modular guidance files** for each OCA lifecycle phase:
- `capabilities/collect.md` - Data collection patterns
- `capabilities/strategy.md` - Strategy formation guidance
- `capabilities/execute.md` - Execution patterns
- `capabilities/evaluate.md` - Evaluation rubrics
- `capabilities/converge.md` - Convergence assessment
**Evolution**:
- Start empty (placeholders) in Iteration 0
- Evolve when patterns recur 2-3 times
- Based on retrospective evidence (not speculation)
- Read fresh each phase (no caching)
#### System State
**Tracked components** across iterations:
- **M_n**: Methodology components (capabilities, patterns, templates)
- **A_n**: Agent system (specialized agents)
- **s_n**: Current state (deliverables, artifacts, value scores)
- **V(s_n)**: Dual value functions (V_instance, V_meta)
**State transition**: s_{n-1} → s_n documents evolution
---
### Convergence Criteria
A methodology is **complete and production-ready** when all four conditions are met:
#### 1. Dual Threshold
- ✅ V_instance ≥ 0.80 (domain goals achieved)
- ✅ V_meta ≥ 0.80 (methodology quality high)
#### 2. System Stability
- ✅ M_n == M_{n-1} (no methodology changes)
- ✅ A_n == A_{n-1} (no agent evolution)
- ✅ Stable for 2+ consecutive iterations
#### 3. Objectives Complete
- ✅ All planned work finished
- ✅ No critical gaps remaining
#### 4. Diminishing Returns
- ✅ ΔV_instance < 0.02 for 2+ iterations
- ✅ ΔV_meta < 0.02 for 2+ iterations
**Note**: If system evolves (new agent/capability), stability clock resets. Evolution must be validated in next iteration before convergence.
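The four criteria can be checked mechanically from per-iteration snapshots. A minimal sketch follows, assuming hypothetical field names for what system-state.md and iteration-log.md track (real experiments record this in Markdown, not code):
```python
# Hypothetical snapshot structure; thresholds mirror the convergence criteria above.
from dataclasses import dataclass

@dataclass
class IterationSnapshot:
    v_instance: float
    v_meta: float
    methodology: frozenset       # M_n: capabilities, patterns, templates
    agents: frozenset            # A_n: specialized agents
    objectives_complete: bool

def is_converged(history: list[IterationSnapshot]) -> bool:
    """Check dual threshold, system stability, objectives, and diminishing returns."""
    if len(history) < 3:
        return False  # need 2+ stable iterations, so at least 3 data points
    cur, prev, prev2 = history[-1], history[-2], history[-3]

    dual_threshold = cur.v_instance >= 0.80 and cur.v_meta >= 0.80
    # Stability: no methodology or agent changes for 2+ consecutive iterations.
    stable = (cur.methodology == prev.methodology == prev2.methodology
              and cur.agents == prev.agents == prev2.agents)
    # Diminishing returns: ΔV < 0.02 for the last 2+ iterations, on both layers.
    diminishing = all(
        abs(a.v_instance - b.v_instance) < 0.02 and abs(a.v_meta - b.v_meta) < 0.02
        for a, b in ((cur, prev), (prev, prev2))
    )
    return dual_threshold and stable and cur.objectives_complete and diminishing
```
An evolution event (new agent or capability) changes `methodology` or `agents`, which makes the stability check fail and effectively resets the clock, matching the note above.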
---
## Frequently Asked Questions
### General Questions
#### What exactly is BAIME and how is it different from other methodologies?
BAIME (Bootstrapped AI Methodology Engineering) is a meta-methodology for developing domain-specific methodologies through empirical observation and iteration. Unlike traditional methodologies that are designed upfront, BAIME creates methodologies through practice:
- **Traditional approach**: Design methodology → Apply → Hope it works
- **BAIME approach**: Observe patterns → Extract methodology → Validate → Iterate
Key differentiators:
- Dual-layer value functions measure both deliverable quality AND methodology quality
- Evidence-driven evolution (not anticipatory design)
- Quantitative convergence criteria (≥0.80 thresholds)
- Specialized subagents for consistent execution
#### When should I use BAIME vs just following existing best practices?
**Use BAIME when**:
- No established methodology fully fits your domain
- You need methodology customized to your project constraints
- You want empirically validated patterns, not borrowed practices
- You need to measure and prove methodology effectiveness
**Use existing practices when**:
- Industry-standard methodology already solves your problem
- Team already trained on established framework
- Project timeline doesn't allow methodology development
- Problem domain is simple and well-understood
**Use both**: Start with BAIME to develop a baseline, then integrate proven external practices in later iterations.
#### How long does a typical BAIME experiment take?
**Typical timeline**:
- **Iteration 0** (Baseline): 2-4 hours
- **Iterations 1-N**: 3-6 hours each
- **Total**: 10-30 hours over 3-7 iterations
- **Knowledge extraction**: 2-4 hours post-convergence
**Time factors**:
- Domain complexity (testing < CI/CD < architecture)
- Baseline quality (higher baseline → fewer iterations)
- Team familiarity with BAIME (improves with practice)
- Automation investment (upfront cost, ongoing savings)
**ROI**: 10-50x speedup on future work justifies investment. A 20-hour methodology development that saves 10 hours per month pays off in month 2.
#### What if my value scores aren't improving between iterations?
**Diagnostic steps**:
1. **Check if addressing root problems**:
- Review problem identification from previous iteration
- Are you solving symptoms vs causes?
- Example: Low test coverage may be due to unclear testing strategy, not lack of tests
2. **Verify evidence quality**:
- Is data collection comprehensive?
- Are you making evidence-based decisions?
- Review data artifacts - do they support your strategy?
3. **Assess scope**:
- Trying to fix too many things?
- Focus on top 2-3 highest-impact problems
- Better to solve 2 problems well than 5 problems poorly
4. **Challenge your scoring**:
- Are scores honest (vs inflated)?
- Seek disconfirming evidence
- Compare against rubric, not "could be worse"
5. **Consider system evolution**:
- Do you need specialized agent for recurring complex task?
- Would new capability help structure repeated work?
- Evolution requires evidence of insufficiency (not speculation)
**If still stuck after 2-3 iterations**: Re-examine value function definitions. May need to adjust components or convergence targets.
### Usage Questions
#### Can I use BAIME for [specific domain]?
BAIME works for **any software engineering domain where**:
- ✅ You can measure quality objectively
- ✅ Patterns emerge from practice
- ✅ Work involves 100+ lines of code/docs
- ✅ Results will be reused (methodology has value)
**Proven domains** (8 successful experiments):
- Testing strategy
- CI/CD pipelines
- Error recovery
- Observability instrumentation
- Dependency management
- Documentation systems
- Knowledge transfer
- Technical debt management
**Untested but promising**:
- API design
- Database migration
- Performance optimization
- Security review processes
- Code review workflows
**Probably not suitable**:
- One-time tasks (no reusability)
- Trivial processes (<1 hour total work)
- Domains with perfect existing solutions
#### Do I need the meta-cc plugin to use BAIME?
**For full BAIME workflow**: Yes, meta-cc provides:
- Session history analysis (understanding past work)
- MCP tools for querying patterns
- Specialized subagents (iteration-executor, knowledge-extractor)
- `/meta` command for quick insights
**Without meta-cc**: You can still apply BAIME principles:
- Manual OCA cycle execution
- Self-tracked value functions
- Evidence collection through notes/logs
- Pattern extraction through reflection
**Recommendation**: Use meta-cc. The 5-minute installation saves hours of manual tracking and provides empirical data for better decisions.
#### How do I know when to create a specialized agent?
**Create specialized agent when** (all three conditions):
1. **Evidence of insufficiency**:
- Generic approach tried and struggled
- Task complexity consistently high
- Errors or quality issues recurring
2. **Pattern recurrence**:
- Task performed 3+ times across iterations
- Similar structure each time
- Clear enough to codify
3. **Expected improvement**:
- Can articulate what agent will do better
- Have evidence from past attempts
- Benefit justifies creation cost
**Don't create agent when**:
- Task only done 1-2 times (insufficient evidence)
- Generic approach working fine
- Speculation about future need (wait for evidence)
**Example**: In testing methodology, created `test-generator` agent after:
- Iteration 0-1: Manually wrote tests (worked but slow)
- Iteration 2: Pattern clear (fixture → arrange → act → assert)
- Iteration 3: Created agent, 3x speedup validated
### Technical Questions
#### What's the difference between capabilities and agents?
**Capabilities** (meta-agent lifecycle phases):
- **Purpose**: Guide meta-agent through OCA cycle phases
- **Content**: Patterns, guidelines, checklists for each phase
- **Location**: `capabilities/` directory (e.g., `capabilities/collect.md`)
- **Evolution**: Based on retrospective evidence (start as placeholders)
- **Example**: Strategy formation capability contains prioritization patterns
**Agents** (specialized executors):
- **Purpose**: Execute specific domain tasks
- **Content**: Domain expertise, task-specific workflows
- **Location**: `agents/` directory (e.g., `agents/test-generator.md`)
- **Evolution**: Created when evidence shows insufficiency
- **Example**: Test generator agent creates tests following patterns
**Analogy**:
- Capabilities = "How to think about the work" (meta-level)
- Agents = "How to do the work" (execution-level)
**Both**:
- Start as placeholders (empty files)
- Evolve based on evidence (not anticipatory design)
- Read fresh each time (no caching)
#### How do capabilities evolve during iterations?
**Evolution trigger**: Retrospective evidence of pattern recurrence
**Process**:
1. **Iteration 0-1**: Capabilities are placeholders (empty)
- Meta-agent works generically
- Patterns emerge during work
2. **Iteration 2-3**: Evidence accumulates
- Same problems recur
- Solutions follow similar patterns
- Decision points become predictable
3. **Evolution point**: When pattern recurs 2-3 times
- Extract pattern to relevant capability
- Document guidance based on what worked
- Add to capability file
4. **Validation**: Next iteration tests guidance
- Does following capability improve outcomes?
- Are value scores higher?
- Is work more efficient?
**Example**: In CI/CD methodology:
- Iteration 0-1: Strategy capability empty
- Iteration 2: Same prioritization pattern used twice (quality gates > performance > observability)
- Iteration 2 end: Extracted to `strategy.md` capability
- Iteration 3: Following capability saved 30 minutes of decision-making
**Key principle**: Capabilities codify what worked, not what might work.
### Convergence Questions
#### Can I stop before reaching 0.80 thresholds?
**Yes, but understand trade-offs**:
**Stop at V_instance < 0.80**:
- Deliverable is incomplete or lower quality
- May need significant rework for production use
- Methodology validation is weak
**Stop at V_meta < 0.80**:
- Methodology is not fully reusable
- Transferability to other projects questionable
- May be project-specific, not universal
**When early stopping is acceptable**:
- Proof of concept (showing BAIME works for domain)
- Time constraints (better to have 0.70 than nothing)
- Sufficient for current needs (will iterate later)
- Learning exercise (not production use)
**When to push for full convergence**:
- Production deliverable needed
- Methodology will be shared/reused
- Investment in convergence pays off quickly
- Demonstrating BAIME effectiveness
**Recommendation**: Aim for dual convergence. The final iterations often provide the highest-value insights.
#### What if iterations take longer than estimated?
**Common in early BAIME use**:
- First experiment: 20-40 hours (learning BAIME itself)
- Second experiment: 15-25 hours (familiar with process)
- Third+ experiment: 10-20 hours (efficient execution)
**Time optimization strategies**:
1. **Invest in baseline** (Iteration 0):
- 3-4 hours in Iteration 0 can save 6+ hours overall
- Higher V_meta_0 (≥0.40) enables rapid convergence
2. **Use specialized subagents**:
- iteration-executor saves 1-2 hours per iteration
- knowledge-extractor saves 4-6 hours post-convergence
3. **Time-box template creation**:
- Set a 1.5-hour limit per template
- Quality over quantity (3 excellent > 5 mediocre)
4. **Batch similar work**:
- Create all templates together (avoids context-switching overhead)
- Run all automation tools together (more efficient testing)
5. **Defer low-ROI items**:
- Visual aids can wait (2 hours for +0.03 impact)
- A second example can wait if the first validates the pattern
**If consistently over time**: Review your value function definitions. May be too ambitious for domain complexity.
---
## Quick Start
### 1. Define Your Domain
Choose the methodology you want to develop:
```
Examples:
- "Develop systematic testing strategy for Go projects"
- "Create CI/CD pipeline methodology with quality gates"
- "Build error recovery patterns for web services"
- "Establish documentation management system"
```
### 2. Establish Baseline
Measure current state in your domain:
```
# Example: Testing domain
- Current coverage: 65%
- Test approach: Ad-hoc
- No systematic patterns
- Estimated effort: High
# Example: CI/CD domain
- Build time: 5 minutes
- No quality gates
- Manual releases
- No smoke tests
```
### 3. Set Dual Goals
Define objectives for both layers:
**Instance Goal** (domain-specific):
- "Reach 80% test coverage with systematic strategy"
- "Reduce CI/CD build time to <2 minutes with quality gates"
**Meta Goal** (methodology):
- "Create reusable testing strategy with 85%+ transferability"
- "Develop CI/CD methodology applicable to any Go project"
### 4. Create Experiment Structure
```bash
# Create experiment directory
mkdir -p experiments/my-methodology
# Use iteration-prompt-designer subagent
# (See Specialized Agents section below)
```
### 5. Start Iteration 0
Execute baseline iteration using iteration-executor subagent.
---
## Step-by-Step Workflow
### Phase 0: Experiment Setup
**Goal**: Create experiment structure and iteration prompts
**Steps**:
1. **Create experiment directory**:
```bash
cd your-project
mkdir -p experiments/my-methodology-name
cd experiments/my-methodology-name
```
2. **Design iteration prompts** (use iteration-prompt-designer subagent):
```
User: "Design ITERATION-PROMPTS.md for [domain] methodology experiment"
Agent creates:
- ITERATION-PROMPTS.md (comprehensive iteration guidance)
- Architecture overview (meta-agent + agents)
- Value function definitions
- Baseline iteration steps
```
3. **Review and customize**:
- Adjust value function components for your domain
- Customize baseline iteration steps
- Set convergence targets
**Output**: `ITERATION-PROMPTS.md` ready for execution
---
### Phase 1: Iteration 0 (Baseline)
**Goal**: Establish baseline measurements and initial system state
**Steps**:
1. **Execute iteration** (use iteration-executor subagent):
```
User: "Execute Iteration 0 for [domain] methodology using iteration-executor"
```
2. **Iteration-executor will**:
- Create modular architecture (capabilities, agents, system state)
- Collect baseline data
- Create first deliverables (low quality expected)
- Calculate V_instance_0 and V_meta_0 (honest assessment)
- Identify problems and gaps
- Generate iteration-0.md documentation
3. **Review baseline results**:
```bash
# Check value scores
cat system-state.md
# Review iteration documentation
cat iteration-0.md
# Check identified problems
grep "Problems" system-state.md
```
**Expected Baseline**: V_instance: 0.20-0.40, V_meta: 0.15-0.30
**Key Principle**: Low scores are expected and acceptable. This is a measurement baseline, not the final product.
---
### Phase 2: Iterations 1-N (Evolution)
**Goal**: Iteratively improve both deliverables and methodology until convergence
**For Each Iteration**:
1. **Read system state**:
```bash
cat system-state.md # Current scores and problems
cat iteration-log.md # Iteration history
```
2. **Execute iteration** (use iteration-executor):
```
User: "Execute Iteration N for [domain] methodology using iteration-executor"
```
3. **Iteration-executor follows OCA cycle**:
**Observe**:
- Read all capabilities for methodology context
- Collect data on prioritized problems
- Gather evidence about current state
**Codify**:
- Form strategy based on evidence
- Plan specific improvements
- Set iteration targets
**Execute**:
- Create/improve deliverables
- Apply methodology patterns
- Document execution observations
**Evaluate**:
- Calculate V_instance_N and V_meta_N
- Provide evidence for each score component
- Identify remaining gaps
**Converge**:
- Check convergence criteria
- Extract patterns (if evidence supports)
- Update capabilities (if retrospective evidence shows gaps)
- Prioritize next iteration focus
4. **Review iteration results**:
```bash
cat iteration-N.md # Complete iteration documentation
cat system-state.md # Updated scores and state
cat iteration-log.md # Updated history
```
5. **Check convergence**:
- V_instance ≥ 0.80?
- V_meta ≥ 0.80?
- Both stable for 2+ iterations?
- If YES → Converged! Move to Phase 3
- If NO → Continue to next iteration
**Typical Iteration Count**: 3-7 iterations to convergence
---
### Phase 3: Knowledge Extraction (Post-Convergence)
**Goal**: Transform experiment artifacts into reusable methodology
**Steps**:
1. **Use knowledge-extractor subagent**:
```
User: "Extract methodology from [domain] experiment using knowledge-extractor"
```
2. **Knowledge-extractor creates**:
- Methodology guide (comprehensive documentation)
- Pattern library (extracted patterns)
- Template collection (reusable templates)
- Automation tools (scripts, validators)
- Best practices (principles discovered)
3. **Package as skill** (optional):
```bash
# Create skill structure
mkdir -p .claude/skills/my-methodology
# Copy extracted knowledge
cp -r patterns templates .claude/skills/my-methodology/
# Create SKILL.md
# (See knowledge-extractor output for template)
```
**Output**: Reusable methodology ready for other projects
---
## Specialized Agents
BAIME provides three specialized Claude Code subagents:
### iteration-prompt-designer
**Purpose**: Design comprehensive ITERATION-PROMPTS.md for your experiment
**When to use**: At experiment start, before Iteration 0
**Invocation**:
```
Use Task tool with subagent_type="iteration-prompt-designer"
Example:
"Design ITERATION-PROMPTS.md for CI/CD optimization methodology experiment"
```
**What it creates**:
- Modular meta-agent architecture definition
- Domain-specific value function design
- Baseline iteration (Iteration 0) detailed steps
- Subsequent iteration templates
- Evidence-driven evolution guidance
**Time saved**: 2-3 hours of setup work
---
### iteration-executor
**Purpose**: Execute iteration through complete OCA cycle
**When to use**: For each iteration (Iteration 0, 1, 2, ...)
**Invocation**:
```
Use Task tool with subagent_type="iteration-executor"
Example:
"Execute Iteration 2 of testing methodology experiment using iteration-executor"
```
**What it does**:
1. Reads previous iteration state
2. Reads all capability files (fresh, no caching)
3. Executes lifecycle phases:
- Data Collection (Observe)
- Strategy Formation (Codify)
- Work Execution (Automate)
- Evaluation (Calculate dual values)
- Convergence Check (Assess progress)
4. Generates complete iteration-N.md documentation
5. Updates system-state.md and iteration-log.md
**Benefits**:
- ✅ Consistent iteration structure
- ✅ Systematic value calculation (reduces bias)
- ✅ Proper convergence evaluation
- ✅ Complete artifact generation
- ✅ Structured execution vs ad-hoc
---
### knowledge-extractor
**Purpose**: Extract and transform converged experiment into reusable methodology
**When to use**: After experiment converges
**Invocation**:
```
Use Task tool with subagent_type="knowledge-extractor"
Example:
"Extract methodology from documentation-management experiment using knowledge-extractor"
```
**What it creates**:
- Methodology guide (user-facing documentation)
- Pattern library (validated patterns)
- Template collection (reusable templates)
- Automation tools (scripts, validators)
- Best practices guide (principles)
- Skill package (optional .claude/skills/ structure)
**Time saved**: 4-6 hours of knowledge organization work
---
## Practical Example
### Example: Developing Testing Methodology
**Domain**: Systematic testing strategy for Go projects
#### Step 1: Setup
```bash
# Create experiment
mkdir -p experiments/testing-methodology
cd experiments/testing-methodology
# Design iteration prompts
# (Use iteration-prompt-designer subagent)
```
Result: `ITERATION-PROMPTS.md` created with:
- Value functions for testing (coverage, quality, maintainability)
- Baseline iteration steps
- Testing-specific guidance
#### Step 2: Iteration 0 (Baseline)
```
User: "Execute Iteration 0 of testing methodology using iteration-executor"
```
**What happens**:
1. **Architecture created**:
```
testing-methodology/
├── capabilities/
│   ├── test-collect.md (placeholder)
│   ├── test-strategy.md (placeholder)
│   ├── test-execute.md (placeholder)
│   ├── test-evaluate.md (placeholder)
│   └── test-converge.md (placeholder)
├── agents/
│   ├── test-generator.md (placeholder)
│   └── test-validator.md (placeholder)
├── data/
├── patterns/
├── templates/
├── system-state.md
├── iteration-log.md
└── knowledge-index.md
```
2. **Data collected**:
```
data/current-testing-state.md:
- Current coverage: 65%
- Test approach: Ad-hoc unit tests
- No integration test strategy
- No TDD workflow
```
3. **First deliverable created**:
```
# Example: Basic test helper function
# Quality: Low (intentionally, for baseline)
```
4. **Baseline scores calculated**:
```
V_instance_0: 0.35
- Coverage: 0.40 (65% actual, target 80%)
- Quality: 0.25 (ad-hoc, no systematic approach)
- Maintainability: 0.40 (some organization)
V_meta_0: 0.25
- Completeness: 0.20 (capabilities empty)
- Effectiveness: 0.30 (no proven patterns yet)
- Reusability: 0.20 (project-specific so far)
- Validation: 0.30 (baseline measurement only)
```
5. **Problems identified**:
- No TDD workflow
- Coverage gaps unknown
- Test organization unclear
- No fixture patterns
**Output**: `iteration-0.md` with complete baseline documentation
#### Step 3: Iteration 1 (First Improvement)
```
User: "Execute Iteration 1 of testing methodology using iteration-executor"
```
**Focused on**: TDD workflow and coverage analysis
**Results**:
- Created TDD workflow pattern
- Built coverage gap analyzer tool (sketched below)
- Improved test organization
- V_instance_1: 0.55 (+0.20)
- V_meta_1: 0.45 (+0.20)
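A hedged sketch of what the coverage gap analyzer could look like. It assumes the analyzer post-processes the per-function report printed by `go tool cover -func=coverage.out`; the 80% threshold comes from the instance goal, and the structure is illustrative, not the actual tool built in the experiment.
```python
# Hypothetical coverage gap analyzer: lists functions below the coverage target.
# Assumes the per-function lines printed by `go tool cover -func=coverage.out`
# ("path/file.go:NN:", function name, "NN.N%"), plus a trailing "total:" line.
import sys

THRESHOLD = 80.0  # coverage target from the instance goal

def coverage_gaps(report_path: str, threshold: float = THRESHOLD):
    gaps = []
    with open(report_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3 or parts[0].startswith("total:"):
                continue  # skip malformed lines and the summary row
            location, func = parts[0], parts[1]
            try:
                percent = float(parts[-1].rstrip("%"))
            except ValueError:
                continue
            if percent < threshold:
                gaps.append((percent, func, location))
    return sorted(gaps)  # worst-covered functions first

if __name__ == "__main__":
    for percent, func, location in coverage_gaps(sys.argv[1]):
        print(f"{percent:5.1f}%  {func:30s} {location}")
```
Typical usage with standard Go tooling: `go test ./... -coverprofile=coverage.out`, then `go tool cover -func=coverage.out > func-coverage.txt`, then run the script on `func-coverage.txt`.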
#### Step 4: Iterations 2-3 (Evolution)
Continued iterations until:
- V_instance_3: 0.85
- V_meta_3: 0.83
- Both stable (no major changes in iteration 4)
**Convergence achieved!**
#### Step 5: Knowledge Extraction
```
User: "Extract methodology from testing-methodology experiment using knowledge-extractor"
```
**Created**:
- `methodology/testing-strategy.md` (comprehensive guide)
- 8 validated patterns
- 3 reusable templates
- Coverage analyzer tool
- Test generator script
**Result**: Reusable testing methodology ready for other Go projects
---
### Example 2: Developing Error Recovery Methodology
**Domain**: Systematic error handling and recovery patterns for software systems
**Why This Example**: Demonstrates BAIME applicability to a different domain (error handling vs testing), showing methodology transferability and universal OCA cycle pattern.
#### Step 1: Setup
```bash
# Create experiment
mkdir -p experiments/error-recovery
cd experiments/error-recovery
# Design iteration prompts
# (Use iteration-prompt-designer subagent)
```
Result: `ITERATION-PROMPTS.md` created with:
- Value functions for error recovery (coverage, diagnostic quality, recovery effectiveness)
- Error taxonomy definition
- Recovery pattern identification
#### Step 2: Iteration 0 (Baseline)
```
User: "Execute Iteration 0 of error-recovery methodology using iteration-executor"
```
**What happens**:
1. **Architecture created**:
```
error-recovery/
├── capabilities/
│   ├── error-collect.md (placeholder)
│   ├── error-strategy.md (placeholder)
│   ├── error-execute.md (placeholder)
│   ├── error-evaluate.md (placeholder)
│   └── error-converge.md (placeholder)
├── agents/
│   ├── error-analyzer.md (placeholder)
│   └── error-classifier.md (placeholder)
├── data/
├── patterns/
├── templates/
├── system-state.md
├── iteration-log.md
└── knowledge-index.md
```
2. **Data collected**:
```
data/error-analysis.md:
- Historical errors: 1,336 instances analyzed
- Error handling: Ad-hoc, inconsistent
- Recovery patterns: None documented
- MTTD/MTTR: High, no systematic diagnosis
```
3. **First deliverable created**:
```
# Initial error taxonomy (13 categories)
# Quality: Basic classification, no recovery patterns yet
```
4. **Baseline scores calculated**:
```
V_instance_0: 0.40
- Coverage: 0.50 (errors classified, not all types covered)
- Diagnostic Quality: 0.30 (basic categorization only)
- Recovery Effectiveness: 0.25 (no systematic recovery)
- Documentation: 0.55 (taxonomy exists)
V_meta_0: 0.30
- Completeness: 0.25 (taxonomy only, no workflows)
- Effectiveness: 0.35 (classification helpful but limited)
- Reusability: 0.25 (domain-specific so far)
- Validation: 0.35 (validated against 1,336 historical errors)
```
5. **Problems identified**:
- No systematic diagnosis workflow
- No recovery patterns
- No prevention guidelines
- Taxonomy incomplete (95.4% coverage, gaps exist)
**Output**: `iteration-0.md` with complete baseline documentation
**Key Difference from Testing Example**: Error Recovery started with rich historical data (1,336 errors), enabling retrospective validation from Iteration 0. This demonstrates how domain characteristics affect baseline quality (V_instance_0 = 0.40 vs Testing's 0.35).
#### Step 3: Iteration 1 (Diagnostic Workflows)
```
User: "Execute Iteration 1 of error-recovery methodology using iteration-executor"
```
**Focused on**: Creating diagnostic workflows and expanding taxonomy
**Results**:
- Created 8 diagnostic workflows (file operations, API calls, data validation, etc.)
- Expanded error taxonomy to 13 categories
- Added contextual logging patterns
- **V_instance_1: 0.62** (+0.22, significant jump due to workflow addition)
- **V_meta_1: 0.50** (+0.20, patterns emerging)
**Pattern Emerged**: Error diagnosis follows consistent structure:
1. Symptom identification
2. Context gathering
3. Root cause analysis
4. Solution selection
#### Step 4: Iteration 2 (Recovery Patterns and Prevention)
```
User: "Execute Iteration 2 of error-recovery methodology using iteration-executor"
```
**Focused on**: Recovery patterns, prevention guidelines, automation
**Results**:
- Documented 5 recovery patterns (retry, fallback, circuit breaker, graceful degradation, fail-fast; retry is sketched below)
- Created 8 prevention guidelines
- Built 3 automation tools (file path validation, read-before-write check, file size validation)
- **V_instance_2: 0.78** (+0.16, approaching convergence)
- **V_meta_2: 0.72** (+0.22, acceleration due to automation)
**Automation Impact**: Prevention tools covered 23.7% of historical errors, proving methodology effectiveness empirically.
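To illustrate one of the recovery patterns documented in this iteration, here is a minimal retry-with-exponential-backoff sketch. The retryable exception types, attempt count, and delays are illustrative assumptions; the experiment's pattern documentation defines when retry is appropriate versus fallback, circuit breaker, graceful degradation, or fail-fast.
```python
# Minimal retry pattern sketch: exponential backoff with jitter, fail-fast cap.
# Exception types, attempt count, and delays are illustrative assumptions.
import random
import time

def with_retry(operation, max_attempts=3, base_delay=0.5,
               retryable=(TimeoutError, ConnectionError)):
    """Run operation(); retry transient failures, fail fast on everything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error for diagnosis
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            time.sleep(delay)  # exponential backoff plus a little jitter
```
Non-retryable errors propagate immediately (fail-fast), while transient ones get a bounded number of attempts; a fallback or circuit breaker would wrap this function at the next layer up.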
#### Step 5: Iteration 3 (Convergence)
```
User: "Execute Iteration 3 of error-recovery methodology using iteration-executor"
```
**Focused on**: Final validation, cross-language transferability
**Results**:
- Validated patterns across 4 languages (Go, Python, JavaScript, Rust)
- Achieved 95.4% error coverage (1,274/1,336 historical errors)
- Transferability assessment: 85-90% universal patterns
- **V_instance_3: 0.83** (+0.05, exceeded threshold)
- **V_meta_3: 0.85** (+0.13, strong convergence)
**System Stability**: No capability or agent evolution needed (3 iterations stable) - generic OCA cycle sufficient.
**Convergence Status**: ✅ **CONVERGED**
- Both layers > 0.80 ✅
- System stable (M_3 == M_2, A_3 == A_2) ✅
- Objectives complete ✅
- Total time: ~10 hours over 3 iterations
#### Step 6: Knowledge Extraction
```
User: "Extract methodology from error-recovery experiment using knowledge-extractor"
```
**Created**:
- `methodology/error-recovery.md` (comprehensive 13-category taxonomy)
- 8 diagnostic workflows
- 5 recovery patterns
- 8 prevention guidelines
- 3 automation tools (file validation, read-before-write, size validation)
- Retrospective validation report (95.4% historical error coverage)
**Result**: Reusable error recovery methodology with 85-90% transferability across languages/platforms
**Transferability Evidence**:
- Core concepts: 100% universal (error taxonomy, diagnostic workflows)
- Recovery patterns: 95% universal (retry, fallback, circuit breaker work everywhere)
- Automation tools: 60% universal (concepts transfer, implementations vary by language)
---
### Comparing the Two Examples
| Aspect | Testing Methodology | Error Recovery Methodology |
|--------|---------------------|----------------------------|
| **Domain Complexity** | Medium (test strategies, patterns) | High (13 error categories, recovery patterns) |
| **Baseline Data** | Limited (current tests only) | Rich (1,336 historical errors) |
| **V_instance_0** | 0.35 | 0.40 (higher due to historical data) |
| **V_meta_0** | 0.25 | 0.30 (retrospective validation possible) |
| **Iterations to Converge** | 3-4 iterations | 3 iterations (rapid due to data richness) |
| **Total Time** | ~12 hours | ~10 hours (rich baseline enabled efficiency) |
| **Transferability** | 89% (Go projects) | 85-90% (universal, cross-language) |
| **Key Innovation** | TDD workflow, coverage analyzer | Error taxonomy, diagnostic workflows, prevention |
| **System Evolution** | Stable (no agent specialization) | Stable (no agent specialization) |
**Universal Lessons**:
1. **Rich baseline data accelerates convergence** (Error Recovery's 1,336 errors vs Testing's current state)
2. **OCA cycle works across domains** (same structure, different content)
3. **System stability is common** (both examples: no agent evolution needed)
4. **Retrospective validation is powerful** (Error Recovery: 95.4% coverage proves the methodology)
5. **Automation provides empirical evidence** (23.7% error prevention measurable)
**BAIME Transferability Confirmed**: Same methodology framework produced high-quality results in two distinct domains (testing vs error handling), demonstrating universal applicability.
---
## Troubleshooting
### Issue: Value scores not improving
**Symptoms**: V_instance or V_meta stuck or decreasing across iterations
**Example**:
```
Iteration 0: V_instance = 0.35, V_meta = 0.25
Iteration 1: V_instance = 0.37, V_meta = 0.28 (minimal progress)
Iteration 2: V_instance = 0.34, V_meta = 0.30 (instance decreased!)
```
**Diagnosis**:
**Root Cause 1: Solving symptoms, not problems**
```
❌ Problem identified: "Low test coverage"
❌ Solution attempted: "Write more tests"
❌ Result: Coverage increased but tests are brittle, hard to maintain
✅ Better problem: "No systematic testing strategy"
✅ Better solution: "Create TDD workflow, extract test patterns"
✅ Result: Fewer tests, but higher quality and more maintainable
```
**Root Cause 2: Strategy not evidence-based**
```
❌ Strategy: "Let's add integration tests because they seem useful"
❌ Evidence: None (speculation)
✅ Strategy: "Data shows 80% of bugs in API layer, add API tests"
✅ Evidence: Bug analysis from data/bug-analysis.md
```
**Root Cause 3: Scope too broad**
```
❌ Iteration 2 plan: Fix 7 problems (test coverage, CI/CD, docs, errors)
❌ Result: All partially done, none well done
✅ Iteration 2 plan: Fix top 2 problems (test strategy, coverage analysis)
✅ Result: Both fully solved, measurable improvement
```
**Solutions**:
1. **Re-examine problem identification**:
- Are you solving root causes or symptoms?
- Review data artifacts - do they support your problem statement?
- Ask "why" 3 times to find root cause
2. **Verify evidence quality**:
- Is data collection comprehensive?
- Do you have concrete measurements?
- Can you show before/after data?
3. **Narrow focus**:
- Address top 2-3 highest-impact problems only
- Better to solve 2 problems completely than 5 partially
- Defer lower-priority items to next iteration
4. **Re-evaluate strategy**:
- Is it based on data or assumptions?
- Review iteration-N-strategy.md for evidence
- Challenge each planned improvement: "What evidence supports this?"
---
### Issue: Methodology not transferable (low V_meta Reusability)
**Symptoms**: V_meta Reusability component < 0.60 after multiple iterations
**Example**:
```
Iteration 2 evaluation:
- Completeness: 0.70 ✅
- Effectiveness: 0.75 ✅
- Reusability: 0.45 ❌ (blocking convergence)
- Validation: 0.65 ✅
```
**Diagnosis**:
**Problem: Patterns too project-specific**
Before (Low Reusability):
```markdown
# Testing Pattern
1. Create test file in src/api/handlers/__tests__/
2. Import UserModel from "../../models/user"
3. Use Jest expect() assertions
4. Run with npm test
```
After (High Reusability):
```markdown
# Testing Pattern (Parameterized)
1. Create test file adjacent to source: {source_dir}/__tests__/{module}_test{ext}
2. Import module under test: {import_statement}
3. Use test framework assertion: {assertion_method}
4. Run with project test command: {test_command}
Adaptation guide:
- Go: {ext}=.go, {assertion_method}=testing.T methods
- JS: {ext}=.js, {assertion_method}=expect() or assert()
- Python: {ext}=.py, {assertion_method}=unittest assertions
```
**Problem: No abstraction of domain concepts**
Before:
```markdown
# CI/CD Pattern
- Install Go 1.21
- Run go test ./...
- Build with go build -o bin/app
- Check coverage is >80%
```
After (Abstracted):
```markdown
# CI/CD Quality Gate Pattern
Universal steps:
1. Install language runtime (version from project config)
2. Run test suite (project-specific command)
3. Build artifact (project-specific build process)
4. Verify quality threshold (configurable threshold)
Domain-specific implementations:
- Go: {runtime}=Go 1.21+, {test}=go test, {build}=go build
- Node: {runtime}=Node 18+, {test}=npm test, {build}=npm run build
- Python: {runtime}=Python 3.10+, {test}=pytest, {build}=python setup.py
```
**Solutions**:
1. **Extract universal patterns**:
- Identify what's essential vs project-specific
- Replace hardcoded values with parameters
- Document adaptation guide
2. **Create parameterized templates**:
- Use placeholders: {variable_name}
- Provide examples for 3+ different contexts
- Include "How to adapt" section
3. **Test across scenarios**:
- Apply pattern to different project in same domain
- Document what needed changing
- Refine pattern based on adaptation effort
4. **Add abstraction layers**:
- Layer 1: Universal principle (works anywhere)
- Layer 2: Domain-specific implementation (testing/CI/CD/etc)
- Layer 3: Tool-specific details (Jest/pytest/etc)
---
### Issue: Can't reach convergence (stuck at V ~0.70)
**Symptoms**: Multiple iterations without reaching 0.80
**Causes**:
- Unrealistic convergence targets
- Missing critical patterns
- Need specialized agent but using generic approach
**Solutions**:
1. Review value function definitions - are they appropriate?
2. Identify missing methodology components
3. Consider creating specialized agent if problem recurs
4. Re-assess convergence criteria - is 0.80 realistic for this domain?
---
### Issue: Too many iterations (>10)
**Symptoms**: Slow convergence, many iterations needed
**Causes**:
- Insufficient baseline (V_meta_0 < 0.20)
- Not extracting patterns early enough
- Too conservative improvements
**Solutions**:
1. Improve baseline iteration - invest more time in Iteration 0
2. Extract patterns when they recur (don't wait)
3. Make bolder improvements (test larger changes)
4. Use specialized agents earlier
---
### Issue: Premature convergence claims
**Symptoms**: Claiming convergence but quality obviously low
**Causes**:
- Inflated value scores (not honest assessment)
- Comparing to "could be worse" instead of rubrics
- Time pressure leading to rushed evaluation
**Solutions**:
1. Seek disconfirming evidence actively
2. Test deliverables thoroughly
3. Enumerate gaps explicitly
4. Challenge high scores with extra scrutiny
5. Remember: Honest assessment is more valuable than fast convergence
---
## Next Steps
### After Your First BAIME Experiment
1. **Review iteration documentation** - See what worked, what didn't
2. **Extract lessons learned** - Document insights about BAIME process
3. **Apply methodology** - Use created methodology in real work
4. **Share knowledge** - Package as skill or contribute back
### Advanced Topics
- **Baseline Quality Assessment** - Achieve comprehensive baseline (V_meta ≥ 0.40 in Iteration 0) for faster convergence
- **Rapid Convergence** - Techniques for 3-4 iteration methodology development
- **Agent Specialization** - When and how to create specialized agents
- **Retrospective Validation** - Validate methodology against historical data
- **Cross-Domain Transfer** - Apply methodology to different projects
See individual skills for detailed guidance:
- `baseline-quality-assessment`
- `rapid-convergence`
- `agent-prompt-evolution`
- `retrospective-validation`
### Further Reading
- **[Methodology Bootstrapping Skill](../../.claude/skills/methodology-bootstrapping/)** - Complete BAIME reference
- **[Empirical Methodology Development](../methodology/empirical-methodology-development.md)** - Theoretical foundation
- **[Bootstrapped Software Engineering](../methodology/bootstrapped-software-engineering.md)** - BAIME in depth
- **[Example Experiments](../../experiments/)** - Real BAIME experiments to study
### Getting Help
- **Check skill documentation**: `.claude/skills/methodology-bootstrapping/`
- **Review example experiments**: `experiments/bootstrap-*/`
- **Use @meta-coach**: Ask for workflow optimization guidance
- **Read iteration documentation**: See how past experiments evolved
---
## Summary
**BAIME provides**:
- ✅ Systematic framework for methodology development
- ✅ Empirical validation with data-driven decisions
- ✅ Dual-layer value functions for quality measurement
- ✅ Specialized agents for streamlined execution
- ✅ Proven results: 10-50x speedup, 70-95% transferability
**Key workflow**:
1. Define domain and dual goals
2. Design iteration prompts (iteration-prompt-designer)
3. Execute Iteration 0 baseline (iteration-executor)
4. Iterate until convergence (typically 3-7 iterations)
5. Extract knowledge (knowledge-extractor)
6. Apply methodology to real work
**Remember**:
- Start with clear domain and goals
- Low baseline scores are expected
- Honest assessment is crucial
- Evidence-driven evolution (not anticipatory design)
- Convergence requires both V_instance ≥ 0.80 AND V_meta ≥ 0.80
**Ready to start?** Choose your domain, set up your experiment, and begin with Iteration 0!
---
**Document Version**: 1.0 (Iteration 0 Baseline)
**Last Updated**: 2025-10-19
**Status**: Initial version - Will evolve based on user feedback