# BAIME Usage Guide

**BAIME (Bootstrapped AI Methodology Engineering)** - A systematic framework for developing and validating software engineering methodologies through observation, codification, and automation.

---

## Table of Contents

- [What is BAIME?](#what-is-baime)
- [When to Use BAIME](#when-to-use-baime)
- [Prerequisites](#prerequisites)
- [Core Concepts](#core-concepts)
- [Frequently Asked Questions](#frequently-asked-questions)
- [Quick Start](#quick-start)
- [Step-by-Step Workflow](#step-by-step-workflow)
- [Specialized Agents](#specialized-agents)
- [Practical Example](#practical-example)
- [Troubleshooting](#troubleshooting)
- [Next Steps](#next-steps)

---
## What is BAIME?

BAIME integrates three complementary methodologies optimized for LLM-based development:

1. **OCA Cycle** (Observe-Codify-Automate) - Core iterative framework
2. **Empirical Validation** - Scientific method and data-driven decisions
3. **Value Optimization** - Dual-layer value functions for quantitative evaluation

**Key Innovation**: BAIME treats methodology development like software development—with empirical observation, automated testing, continuous iteration, and quantitative metrics.

### Why BAIME?

**Problem**: Ad-hoc methodology development is slow, subjective, and hard to validate.

**Solution**: BAIME provides a systematic approach with:

- ✅ **Rapid convergence**: Typically 3-7 iterations, 6-15 hours
- ✅ **Empirical validation**: Data-driven evidence, not opinions
- ✅ **High transferability**: 70-95% reusable across projects
- ✅ **Proven results**: 100% success rate across 8 experiments, 10-50x speedup

### BAIME in Action

**Example Results**:

- **Testing Strategy**: 15x speedup, 89% transferability
- **CI/CD Pipeline**: 2.5-3.5x speedup, 91.7% pattern validation
- **Error Recovery**: 95.4% error coverage, 3 iterations
- **Documentation System**: 47% token cost reduction, 85% reduction in redundancy
- **Knowledge Transfer**: 3-8x onboarding speedup

---
## When to Use BAIME

### Use BAIME For

✅ **Creating systematic methodologies** for:

- Testing strategies
- CI/CD pipelines
- Error handling patterns
- Observability systems
- Dependency management
- Documentation systems
- Knowledge transfer processes
- Technical debt management
- Cross-cutting concerns

✅ **When you need**:

- Empirical validation with data
- Iterative methodology evolution
- Quantitative quality metrics
- Transferable best practices
- Rapid convergence (hours to days, not weeks)

### Don't Use BAIME For

❌ **One-time ad-hoc tasks** without reusability goals
❌ **Trivial processes** (<100 lines of code/docs)
❌ **Established standards** that fully solve your problem

---
## Prerequisites

### Required

1. **meta-cc plugin installed** and configured
   - See [Installation Guide](installation.md)
   - Verify: `/meta "show stats"` works

2. **Claude Code** environment
   - Access to the Task tool for subagent invocation

3. **Project with a need for methodology**
   - Have a specific domain in mind (testing, CI/CD, etc.)
   - Be able to measure the current state (baseline)

### Recommended

- **Familiarity with meta-cc** basic features
- **Understanding of your domain** (e.g., if developing a testing methodology, know testing basics)
- **Git repository** for tracking methodology evolution

---
## Core Concepts

### Understanding Value Functions

BAIME uses **dual-layer value functions** to measure quality at two independent levels:

#### V_instance: Domain-Specific Quality

Measures the quality of your specific deliverables:

- **Purpose**: Assess whether your domain work is high-quality
- **Examples**:
  - Testing methodology: Test coverage percentage, test maintainability
  - CI/CD pipeline: Build time, deployment success rate, quality gate coverage
  - Documentation: Completeness, accuracy, usability
- **Characteristics**: Domain-dependent, specific to your work

#### V_meta: Methodology Quality

Measures the quality of the methodology itself:

- **Purpose**: Assess whether your methodology is reusable and effective
- **Components**:
  - **Completeness**: All necessary patterns, templates, and tools exist
  - **Effectiveness**: Methodology improves quality and efficiency
  - **Reusability**: Works across projects with minimal adaptation
  - **Validation**: Empirically tested and proven effective
- **Characteristics**: Domain-independent, universal assessment

#### Convergence Requirement

**Both must reach ≥ 0.80** for the methodology to be complete:

- V_instance ≥ 0.80: Domain work is production-ready
- V_meta ≥ 0.80: Methodology is reusable
- If only one converges, keep iterating
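
To make the dual-layer scoring concrete, here is a minimal sketch (in Go, matching the guide's example domain) of how component scores could roll up into V_instance and V_meta and feed the dual-threshold check. The component names, equal weighting, and numbers are illustrative assumptions; each experiment defines its own components.

```go
package main

import "fmt"

// score averages component scores in [0.0, 1.0].
// Equal weighting is an assumption; an experiment may weight components.
func score(components map[string]float64) float64 {
	total := 0.0
	for _, v := range components {
		total += v
	}
	return total / float64(len(components))
}

func main() {
	// Hypothetical component scores for one iteration.
	vInstance := score(map[string]float64{
		"coverage": 0.85, "quality": 0.80, "maintainability": 0.82,
	})
	vMeta := score(map[string]float64{
		"completeness": 0.80, "effectiveness": 0.85,
		"reusability": 0.78, "validation": 0.82,
	})

	// Dual threshold: both layers must reach 0.80 independently.
	converged := vInstance >= 0.80 && vMeta >= 0.80
	fmt.Printf("V_instance=%.2f V_meta=%.2f both>=0.80: %v\n", vInstance, vMeta, converged)
}
```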
---
### The OCA Cycle

Each iteration follows the **Observe-Codify-Automate** cycle:

```
Observe → Codify → Automate → Evaluate
   ↑                              ↓
   ← ← ← ← ← ← Iterate ← ← ← ← ← ←
```

#### Phase 1: Observe

**Goal**: Collect empirical data about the current state

**Activities**:
- Read previous iteration results
- Measure baseline (Iteration 0) or current state
- Identify problems and patterns
- Gather evidence about what's working/not working

**Output**: Data artifacts documenting observations

#### Phase 2: Codify

**Goal**: Extract patterns and create reusable structures

**Activities**:
- Form strategy based on evidence
- Extract recurring patterns into documented forms
- Create templates for common structures
- Prioritize improvements based on impact

**Output**: Patterns, templates, strategy documentation

#### Phase 3: Automate

**Goal**: Build tools to improve efficiency and consistency

**Activities**:
- Create automation scripts (validators, generators, analyzers)
- Implement quality gates
- Build CI integration
- Execute planned improvements

**Output**: Working tools, improved deliverables

#### Phase 4: Evaluate

**Goal**: Measure progress and assess convergence

**Activities**:
- Calculate V_instance and V_meta scores
- Provide evidence for each component
- Identify remaining gaps
- Check convergence criteria

**Output**: Value scores, gap analysis, convergence decision

---
### Meta-Agent and Specialized Agents

#### Meta-Agent

The **meta-agent** orchestrates the entire BAIME process.

**Responsibilities**:
- Read lifecycle capabilities before each phase (fresh, no caching)
- Execute the OCA cycle systematically
- Track system state evolution (M_n, A_n, s_n)
- Coordinate specialized agents when needed
- Make evidence-based evolution decisions

**Key Behavior**: Reads capabilities fresh each iteration to incorporate the latest guidance

#### Specialized Agents

**Domain-specific executors** created when evidence shows need:

**When created**:
- Generic approach is insufficient (demonstrated, not assumed)
- Task recurs 3+ times with similar structure
- Clear expected improvement from specialization

**Examples**:
- `test-generator`: Creates tests following validated patterns
- `validator-agent`: Checks deliverables against quality criteria
- `knowledge-extractor`: Transforms an experiment into a reusable methodology

**Key Principle**: Agents evolve based on retrospective evidence (not anticipatory design)

---
### Capabilities and System State

#### Capabilities

**Modular guidance files** for each OCA lifecycle phase:

- `capabilities/collect.md` - Data collection patterns
- `capabilities/strategy.md` - Strategy formation guidance
- `capabilities/execute.md` - Execution patterns
- `capabilities/evaluate.md` - Evaluation rubrics
- `capabilities/converge.md` - Convergence assessment

**Evolution**:
- Start empty (placeholders) in Iteration 0
- Evolve when patterns recur 2-3 times
- Based on retrospective evidence (not speculation)
- Read fresh each phase (no caching)

#### System State

**Tracked components** across iterations:

- **M_n**: Methodology components (capabilities, patterns, templates)
- **A_n**: Agent system (specialized agents)
- **s_n**: Current state (deliverables, artifacts, value scores)
- **V(s_n)**: Dual value functions (V_instance, V_meta)

**State transition**: s_{n-1} → s_n documents the evolution between iterations

---
### Convergence Criteria

The methodology is **complete and production-ready** when all four conditions are met:

#### 1. Dual Threshold

- ✅ V_instance ≥ 0.80 (domain goals achieved)
- ✅ V_meta ≥ 0.80 (methodology quality high)

#### 2. System Stability

- ✅ M_n == M_{n-1} (no methodology changes)
- ✅ A_n == A_{n-1} (no agent evolution)
- ✅ Stable for 2+ consecutive iterations

#### 3. Objectives Complete

- ✅ All planned work finished
- ✅ No critical gaps remaining

#### 4. Diminishing Returns

- ✅ ΔV_instance < 0.02 for 2+ iterations
- ✅ ΔV_meta < 0.02 for 2+ iterations

**Note**: If the system evolves (a new agent or capability appears), the stability clock resets. Evolution must be validated in the next iteration before convergence.
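
A minimal sketch of the full convergence predicate, applied to the trailing iterations of an experiment's history. The struct fields and the use of the last three iterations to test "2+ consecutive" stability are illustrative assumptions.

```go
package main

import "fmt"

// Iteration records the state needed for a convergence check.
type Iteration struct {
	VInstance, VMeta float64
	Methodology      string // stand-in for comparing M_n across iterations
	Agents           string // stand-in for comparing A_n across iterations
	ObjectivesDone   bool
}

// converged evaluates the four criteria over the last three iterations:
// dual threshold, system stability, objectives complete, diminishing returns.
func converged(h []Iteration) bool {
	if len(h) < 3 {
		return false
	}
	a, b, c := h[len(h)-3], h[len(h)-2], h[len(h)-1]

	dualThreshold := c.VInstance >= 0.80 && c.VMeta >= 0.80
	stable := a.Methodology == b.Methodology && b.Methodology == c.Methodology &&
		a.Agents == b.Agents && b.Agents == c.Agents
	diminishing := b.VInstance-a.VInstance < 0.02 && c.VInstance-b.VInstance < 0.02 &&
		b.VMeta-a.VMeta < 0.02 && c.VMeta-b.VMeta < 0.02

	return dualThreshold && stable && c.ObjectivesDone && diminishing
}

func main() {
	history := []Iteration{
		{0.82, 0.81, "M3", "A2", true},
		{0.83, 0.82, "M3", "A2", true},
		{0.84, 0.83, "M3", "A2", true},
	}
	fmt.Println("converged:", converged(history)) // true
}
```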
---
## Frequently Asked Questions

### General Questions

#### What exactly is BAIME and how is it different from other methodologies?

BAIME (Bootstrapped AI Methodology Engineering) is a meta-methodology for developing domain-specific methodologies through empirical observation and iteration. Unlike traditional methodologies that are designed upfront, BAIME creates methodologies through practice:

- **Traditional approach**: Design methodology → Apply → Hope it works
- **BAIME approach**: Observe patterns → Extract methodology → Validate → Iterate

Key differentiators:

- Dual-layer value functions measure both deliverable quality AND methodology quality
- Evidence-driven evolution (not anticipatory design)
- Quantitative convergence criteria (≥0.80 thresholds)
- Specialized subagents for consistent execution

#### When should I use BAIME vs just following existing best practices?

**Use BAIME when**:
- No established methodology fully fits your domain
- You need a methodology customized to your project constraints
- You want empirically validated patterns, not borrowed practices
- You need to measure and prove methodology effectiveness

**Use existing practices when**:
- An industry-standard methodology already solves your problem
- The team is already trained on an established framework
- The project timeline doesn't allow methodology development
- The problem domain is simple and well understood

**Use both**: Start with BAIME to develop a baseline, then integrate proven external practices in later iterations.

#### How long does a typical BAIME experiment take?

**Typical timeline**:
- **Iteration 0** (Baseline): 2-4 hours
- **Iterations 1-N**: 3-6 hours each
- **Total**: 10-30 hours over 3-7 iterations
- **Knowledge extraction**: 2-4 hours post-convergence

**Time factors**:
- Domain complexity (testing < CI/CD < architecture)
- Baseline quality (higher baseline → fewer iterations)
- Team familiarity with BAIME (improves with practice)
- Automation investment (upfront cost, ongoing savings)

**ROI**: A 10-50x speedup on future work justifies the investment. A 20-hour methodology development that saves 10 hours per month pays off in month 2.

#### What if my value scores aren't improving between iterations?

**Diagnostic steps**:

1. **Check whether you are addressing root problems**:
   - Review the problem identification from the previous iteration
   - Are you solving symptoms rather than causes?
   - Example: Low test coverage may be due to an unclear testing strategy, not a lack of tests

2. **Verify evidence quality**:
   - Is data collection comprehensive?
   - Are you making evidence-based decisions?
   - Review data artifacts - do they support your strategy?

3. **Assess scope**:
   - Are you trying to fix too many things?
   - Focus on the top 2-3 highest-impact problems
   - Better to solve 2 problems well than 5 problems poorly

4. **Challenge your scoring**:
   - Are scores honest (vs inflated)?
   - Seek disconfirming evidence
   - Compare against the rubric, not "could be worse"

5. **Consider system evolution**:
   - Do you need a specialized agent for a recurring complex task?
   - Would a new capability help structure repeated work?
   - Evolution requires evidence of insufficiency (not speculation)

**If still stuck after 2-3 iterations**: Re-examine your value function definitions. You may need to adjust components or convergence targets.

### Usage Questions

#### Can I use BAIME for [specific domain]?

BAIME works for **any software engineering domain where**:

- ✅ You can measure quality objectively
- ✅ Patterns emerge from practice
- ✅ Work involves 100+ lines of code/docs
- ✅ Results will be reused (the methodology has value)

**Proven domains** (8 successful experiments):
- Testing strategy
- CI/CD pipelines
- Error recovery
- Observability instrumentation
- Dependency management
- Documentation systems
- Knowledge transfer
- Technical debt management

**Untested but promising**:
- API design
- Database migration
- Performance optimization
- Security review processes
- Code review workflows

**Probably not suitable**:
- One-time tasks (no reusability)
- Trivial processes (<1 hour total work)
- Domains with perfect existing solutions

#### Do I need the meta-cc plugin to use BAIME?

**For the full BAIME workflow**: Yes. meta-cc provides:
- Session history analysis (understanding past work)
- MCP tools for querying patterns
- Specialized subagents (iteration-executor, knowledge-extractor)
- `/meta` command for quick insights

**Without meta-cc**: You can still apply BAIME principles:
- Manual OCA cycle execution
- Self-tracked value functions
- Evidence collection through notes/logs
- Pattern extraction through reflection

**Recommendation**: Use meta-cc. The 5-minute installation saves hours of manual tracking and provides empirical data for better decisions.

#### How do I know when to create a specialized agent?

**Create a specialized agent when** (all three conditions hold):

1. **Evidence of insufficiency**:
   - Generic approach tried and struggled
   - Task complexity consistently high
   - Errors or quality issues recurring

2. **Pattern recurrence**:
   - Task performed 3+ times across iterations
   - Similar structure each time
   - Clear enough to codify

3. **Expected improvement**:
   - Can articulate what the agent will do better
   - Have evidence from past attempts
   - Benefit justifies creation cost

**Don't create an agent when**:
- Task only done 1-2 times (insufficient evidence)
- Generic approach working fine
- Speculation about future need (wait for evidence)

**Example**: In the testing methodology, the `test-generator` agent was created after (see the sketch below):
- Iteration 0-1: Manually wrote tests (worked but slow)
- Iteration 2: Pattern clear (fixture → arrange → act → assert)
- Iteration 3: Created agent, 3x speedup validated
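
The pattern the agent codifies can be illustrated with a short, hypothetical Go test: a fixture sets up shared cases, and each case is arranged, acted on, and asserted. The `Add` function and names are invented for illustration.

```go
package calc

import "testing"

// Add is a stand-in function under test.
func Add(a, b int) int { return a + b }

// TestAdd follows the fixture → arrange → act → assert structure
// that recurred often enough to justify a test-generator agent.
func TestAdd(t *testing.T) {
	// Fixture: shared table of cases.
	cases := []struct {
		name     string
		a, b     int
		expected int
	}{
		{"positive", 2, 3, 5},
		{"negative", -2, -3, -5},
		{"zero", 0, 0, 0},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// Arrange: inputs come from the table entry.
			// Act: call the unit under test.
			got := Add(tc.a, tc.b)
			// Assert: compare against the expected value.
			if got != tc.expected {
				t.Errorf("Add(%d, %d) = %d, want %d", tc.a, tc.b, got, tc.expected)
			}
		})
	}
}
```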
### Technical Questions

#### What's the difference between capabilities and agents?

**Capabilities** (meta-agent lifecycle phases):
- **Purpose**: Guide the meta-agent through OCA cycle phases
- **Content**: Patterns, guidelines, checklists for each phase
- **Location**: `capabilities/` directory (e.g., `capabilities/collect.md`)
- **Evolution**: Based on retrospective evidence (start as placeholders)
- **Example**: Strategy formation capability contains prioritization patterns

**Agents** (specialized executors):
- **Purpose**: Execute specific domain tasks
- **Content**: Domain expertise, task-specific workflows
- **Location**: `agents/` directory (e.g., `agents/test-generator.md`)
- **Evolution**: Created when evidence shows insufficiency
- **Example**: Test generator agent creates tests following patterns

**Analogy**:
- Capabilities = "How to think about the work" (meta-level)
- Agents = "How to do the work" (execution-level)

**Both**:
- Start as placeholders (empty files)
- Evolve based on evidence (not anticipatory design)
- Are read fresh each time (no caching)

#### How do capabilities evolve during iterations?

**Evolution trigger**: Retrospective evidence of pattern recurrence

**Process**:

1. **Iteration 0-1**: Capabilities are placeholders (empty)
   - Meta-agent works generically
   - Patterns emerge during work

2. **Iteration 2-3**: Evidence accumulates
   - Same problems recur
   - Solutions follow similar patterns
   - Decision points become predictable

3. **Evolution point**: When a pattern recurs 2-3 times
   - Extract the pattern to the relevant capability
   - Document guidance based on what worked
   - Add it to the capability file

4. **Validation**: The next iteration tests the guidance
   - Does following the capability improve outcomes?
   - Are value scores higher?
   - Is work more efficient?

**Example**: In the CI/CD methodology:
- Iteration 0-1: Strategy capability empty
- Iteration 2: Same prioritization pattern used twice (quality gates > performance > observability)
- Iteration 2 end: Extracted to `strategy.md` capability
- Iteration 3: Following the capability saved 30 minutes of decision-making

**Key principle**: Capabilities codify what worked, not what might work.

### Convergence Questions

#### Can I stop before reaching 0.80 thresholds?

**Yes, but understand the trade-offs**:

**Stopping at V_instance < 0.80**:
- Deliverable is incomplete or lower quality
- May need significant rework for production use
- Methodology validation is weak

**Stopping at V_meta < 0.80**:
- Methodology is not fully reusable
- Transferability to other projects is questionable
- May be project-specific, not universal

**When early stopping is acceptable**:
- Proof of concept (showing BAIME works for the domain)
- Time constraints (better to have 0.70 than nothing)
- Sufficient for current needs (will iterate later)
- Learning exercise (not production use)

**When to push for full convergence**:
- Production deliverable needed
- Methodology will be shared/reused
- Investment in convergence pays off quickly
- Demonstrating BAIME effectiveness

**Recommendation**: Aim for dual convergence. The final iterations often provide the highest-value insights.

#### What if iterations take longer than estimated?

**Common in early BAIME use**:
- First experiment: 20-40 hours (learning BAIME itself)
- Second experiment: 15-25 hours (familiar with the process)
- Third+ experiment: 10-20 hours (efficient execution)

**Time optimization strategies**:

1. **Invest in the baseline** (Iteration 0):
   - 3-4 hours in Iteration 0 can save 6+ hours overall
   - A higher V_meta_0 (≥0.40) enables rapid convergence

2. **Use specialized subagents**:
   - iteration-executor saves 1-2 hours per iteration
   - knowledge-extractor saves 4-6 hours post-convergence

3. **Time-box template creation**:
   - Set a 1.5-hour limit per template
   - Quality over quantity (3 excellent > 5 mediocre)

4. **Batch similar work**:
   - Create all templates together (avoids context-switching cost)
   - Run all automation tools together (testing efficiency)

5. **Defer low-ROI items**:
   - Visual aids can wait (2 hours for +0.03 impact)
   - Add a second example only if the first validates the pattern

**If consistently over time**: Review your value function definitions. They may be too ambitious for the domain complexity.

---
## Quick Start

### 1. Define Your Domain

Choose the methodology you want to develop:

```
Examples:
- "Develop systematic testing strategy for Go projects"
- "Create CI/CD pipeline methodology with quality gates"
- "Build error recovery patterns for web services"
- "Establish documentation management system"
```

### 2. Establish Baseline

Measure the current state in your domain:

```
# Example: Testing domain
- Current coverage: 65%
- Test approach: Ad-hoc
- No systematic patterns
- Estimated effort: High

# Example: CI/CD domain
- Build time: 5 minutes
- No quality gates
- Manual releases
- No smoke tests
```

### 3. Set Dual Goals

Define objectives for both layers:

**Instance Goal** (domain-specific):
- "Reach 80% test coverage with systematic strategy"
- "Reduce CI/CD build time to <2 minutes with quality gates"

**Meta Goal** (methodology):
- "Create reusable testing strategy with 85%+ transferability"
- "Develop CI/CD methodology applicable to any Go project"

### 4. Create Experiment Structure

```bash
# Create experiment directory
mkdir -p experiments/my-methodology

# Use iteration-prompt-designer subagent
# (See Specialized Agents section below)
```

### 5. Start Iteration 0

Execute the baseline iteration using the iteration-executor subagent.

---
## Step-by-Step Workflow

**Goal**: Create experiment structure and iteration prompts

**Steps**:

1. **Create experiment directory**:
   ```bash
   cd your-project
   mkdir -p experiments/my-methodology-name
   cd experiments/my-methodology-name
   ```

2. **Design iteration prompts** (use the iteration-prompt-designer subagent):
   ```
   User: "Design ITERATION-PROMPTS.md for [domain] methodology experiment"

   Agent creates:
   - ITERATION-PROMPTS.md (comprehensive iteration guidance)
   - Architecture overview (meta-agent + agents)
   - Value function definitions
   - Baseline iteration steps
   ```

3. **Review and customize**:
   - Adjust value function components for your domain
   - Customize baseline iteration steps
   - Set convergence targets

**Output**: `ITERATION-PROMPTS.md` ready for execution

---
### Phase 1: Iteration 0 (Baseline)

**Goal**: Establish baseline measurements and initial system state

**Steps**:

1. **Execute iteration** (use the iteration-executor subagent):
   ```
   User: "Execute Iteration 0 for [domain] methodology using iteration-executor"
   ```

2. **The iteration-executor will**:
   - Create modular architecture (capabilities, agents, system state)
   - Collect baseline data
   - Create first deliverables (low quality expected)
   - Calculate V_instance_0 and V_meta_0 (honest assessment)
   - Identify problems and gaps
   - Generate iteration-0.md documentation

3. **Review baseline results**:
   ```bash
   # Check value scores
   cat system-state.md

   # Review iteration documentation
   cat iteration-0.md

   # Check identified problems
   grep "Problems" system-state.md
   ```

**Expected Baseline**: V_instance: 0.20-0.40, V_meta: 0.15-0.30

**Key Principle**: Low scores are expected and acceptable. This is a measurement baseline, not the final product.
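
For orientation, here is a hypothetical sketch of what `system-state.md` might contain after Iteration 0; the exact structure is up to your experiment, but a "Problems" section is what makes the `grep` above useful.

```markdown
# System State - Iteration 0

## Value Scores
- V_instance_0: 0.35
- V_meta_0: 0.25

## System Components
- M_0: capabilities (placeholders), no patterns or templates yet
- A_0: no specialized agents

## Problems
1. No systematic strategy documented
2. Coverage gaps unknown
3. No reusable patterns extracted
```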
---
### Phase 2: Iterations 1-N (Evolution)

**Goal**: Iteratively improve both deliverables and methodology until convergence

**For Each Iteration**:

1. **Read system state**:
   ```bash
   cat system-state.md   # Current scores and problems
   cat iteration-log.md  # Iteration history
   ```

2. **Execute iteration** (use the iteration-executor):
   ```
   User: "Execute Iteration N for [domain] methodology using iteration-executor"
   ```

3. **The iteration-executor follows the OCA cycle**:

   **Observe**:
   - Read all capabilities for methodology context
   - Collect data on prioritized problems
   - Gather evidence about current state

   **Codify**:
   - Form strategy based on evidence
   - Plan specific improvements
   - Set iteration targets

   **Execute**:
   - Create/improve deliverables
   - Apply methodology patterns
   - Document execution observations

   **Evaluate**:
   - Calculate V_instance_N and V_meta_N
   - Provide evidence for each score component
   - Identify remaining gaps

   **Converge**:
   - Check convergence criteria
   - Extract patterns (if evidence supports)
   - Update capabilities (if retrospective evidence shows gaps)
   - Prioritize next iteration focus

4. **Review iteration results**:
   ```bash
   cat iteration-N.md   # Complete iteration documentation
   cat system-state.md  # Updated scores and state
   cat iteration-log.md # Updated history
   ```

5. **Check convergence**:
   - V_instance ≥ 0.80?
   - V_meta ≥ 0.80?
   - Both stable for 2+ iterations?
   - If YES → Converged! Move to Phase 3
   - If NO → Continue to the next iteration

**Typical Iteration Count**: 3-7 iterations to convergence

---
### Phase 3: Knowledge Extraction (Post-Convergence)

**Goal**: Transform experiment artifacts into a reusable methodology

**Steps**:

1. **Use the knowledge-extractor subagent**:
   ```
   User: "Extract methodology from [domain] experiment using knowledge-extractor"
   ```

2. **The knowledge-extractor creates**:
   - Methodology guide (comprehensive documentation)
   - Pattern library (extracted patterns)
   - Template collection (reusable templates)
   - Automation tools (scripts, validators)
   - Best practices (principles discovered)

3. **Package as a skill** (optional):
   ```bash
   # Create skill structure
   mkdir -p .claude/skills/my-methodology

   # Copy extracted knowledge
   cp -r patterns templates .claude/skills/my-methodology/

   # Create SKILL.md
   # (See knowledge-extractor output for template)
   ```

**Output**: Reusable methodology ready for other projects

---
## Specialized Agents

BAIME provides three specialized Claude Code subagents:

### iteration-prompt-designer

**Purpose**: Design a comprehensive ITERATION-PROMPTS.md for your experiment

**When to use**: At experiment start, before Iteration 0

**Invocation**:
```
Use Task tool with subagent_type="iteration-prompt-designer"

Example:
"Design ITERATION-PROMPTS.md for CI/CD optimization methodology experiment"
```

**What it creates**:
- Modular meta-agent architecture definition
- Domain-specific value function design
- Baseline iteration (Iteration 0) detailed steps
- Subsequent iteration templates
- Evidence-driven evolution guidance

**Time saved**: 2-3 hours of setup work

---

### iteration-executor

**Purpose**: Execute an iteration through the complete OCA cycle

**When to use**: For each iteration (Iteration 0, 1, 2, ...)

**Invocation**:
```
Use Task tool with subagent_type="iteration-executor"

Example:
"Execute Iteration 2 of testing methodology experiment using iteration-executor"
```

**What it does**:
1. Reads previous iteration state
2. Reads all capability files (fresh, no caching)
3. Executes lifecycle phases:
   - Data Collection (Observe)
   - Strategy Formation (Codify)
   - Work Execution (Automate)
   - Evaluation (Calculate dual values)
   - Convergence Check (Assess progress)
4. Generates complete iteration-N.md documentation
5. Updates system-state.md and iteration-log.md

**Benefits**:
- ✅ Consistent iteration structure
- ✅ Systematic value calculation (reduces bias)
- ✅ Proper convergence evaluation
- ✅ Complete artifact generation
- ✅ Structured execution vs ad-hoc

---

### knowledge-extractor

**Purpose**: Extract and transform a converged experiment into a reusable methodology

**When to use**: After the experiment converges

**Invocation**:
```
Use Task tool with subagent_type="knowledge-extractor"

Example:
"Extract methodology from documentation-management experiment using knowledge-extractor"
```

**What it creates**:
- Methodology guide (user-facing documentation)
- Pattern library (validated patterns)
- Template collection (reusable templates)
- Automation tools (scripts, validators)
- Best practices guide (principles)
- Skill package (optional .claude/skills/ structure)

**Time saved**: 4-6 hours of knowledge organization work

---
## Practical Example

### Example: Developing Testing Methodology

**Domain**: Systematic testing strategy for Go projects

#### Step 1: Setup

```bash
# Create experiment
mkdir -p experiments/testing-methodology
cd experiments/testing-methodology

# Design iteration prompts
# (Use iteration-prompt-designer subagent)
```

Result: `ITERATION-PROMPTS.md` created with:
- Value functions for testing (coverage, quality, maintainability)
- Baseline iteration steps
- Testing-specific guidance

#### Step 2: Iteration 0 (Baseline)

```
User: "Execute Iteration 0 of testing methodology using iteration-executor"
```

**What happens**:

1. **Architecture created**:
   ```
   testing-methodology/
   ├── capabilities/
   │   ├── test-collect.md (placeholder)
   │   ├── test-strategy.md (placeholder)
   │   ├── test-execute.md (placeholder)
   │   ├── test-evaluate.md (placeholder)
   │   └── test-converge.md (placeholder)
   ├── agents/
   │   ├── test-generator.md (placeholder)
   │   └── test-validator.md (placeholder)
   ├── data/
   ├── patterns/
   ├── templates/
   ├── system-state.md
   ├── iteration-log.md
   └── knowledge-index.md
   ```

2. **Data collected**:
   ```
   data/current-testing-state.md:
   - Current coverage: 65%
   - Test approach: Ad-hoc unit tests
   - No integration test strategy
   - No TDD workflow
   ```

3. **First deliverable created**:
   ```
   # Example: Basic test helper function
   # Quality: Low (intentionally, for baseline)
   ```

4. **Baseline scores calculated**:
   ```
   V_instance_0: 0.35
   - Coverage: 0.40 (65% actual, target 80%)
   - Quality: 0.25 (ad-hoc, no systematic approach)
   - Maintainability: 0.40 (some organization)

   V_meta_0: 0.25
   - Completeness: 0.20 (capabilities empty)
   - Effectiveness: 0.30 (no proven patterns yet)
   - Reusability: 0.20 (project-specific so far)
   - Validation: 0.30 (baseline measurement only)
   ```

5. **Problems identified**:
   - No TDD workflow
   - Coverage gaps unknown
   - Test organization unclear
   - No fixture patterns

**Output**: `iteration-0.md` with complete baseline documentation

#### Step 3: Iteration 1 (First Improvement)

```
User: "Execute Iteration 1 of testing methodology using iteration-executor"
```

**Focused on**: TDD workflow and coverage analysis

**Results** (a sketch of the coverage analyzer follows below):
- Created TDD workflow pattern
- Built coverage gap analyzer tool
- Improved test organization
- V_instance_1: 0.55 (+0.20)
- V_meta_1: 0.45 (+0.20)
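
A minimal sketch of what such a coverage gap analyzer could look like: it reads a standard Go cover profile (produced by `go test ./... -coverprofile=coverage.out`) and lists files below the 80% target. The file name and threshold are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("coverage.out")
	if err != nil {
		fmt.Fprintln(os.Stderr, "open profile:", err)
		os.Exit(1)
	}
	defer f.Close()

	type tally struct{ covered, total int }
	files := map[string]*tally{}

	// Profile lines look like: pkg/file.go:12.2,14.3 2 1
	// (block position, statement count, hit count).
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "mode:") {
			continue
		}
		parts := strings.Fields(line)
		if len(parts) != 3 {
			continue
		}
		file := strings.SplitN(parts[0], ":", 2)[0]
		stmts, _ := strconv.Atoi(parts[1])
		hits, _ := strconv.Atoi(parts[2])
		t := files[file]
		if t == nil {
			t = &tally{}
			files[file] = t
		}
		t.total += stmts
		if hits > 0 {
			t.covered += stmts
		}
	}

	// Report files under the coverage target.
	const target = 80.0
	for file, t := range files {
		if t.total == 0 {
			continue
		}
		pct := 100 * float64(t.covered) / float64(t.total)
		if pct < target {
			fmt.Printf("%-40s %5.1f%% (gap: %.1f%%)\n", file, pct, target-pct)
		}
	}
}
```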
#### Step 4: Iterations 2-3 (Evolution)

Continued iterations until:
- V_instance_3: 0.85
- V_meta_3: 0.83
- Both stable (no major changes in Iteration 4)

**Convergence achieved!**

#### Step 5: Knowledge Extraction

```
User: "Extract methodology from testing-methodology experiment using knowledge-extractor"
```

**Created**:
- `methodology/testing-strategy.md` (comprehensive guide)
- 8 validated patterns
- 3 reusable templates
- Coverage analyzer tool
- Test generator script

**Result**: Reusable testing methodology ready for other Go projects

---
### Example 2: Developing Error Recovery Methodology

**Domain**: Systematic error handling and recovery patterns for software systems

**Why This Example**: Demonstrates BAIME applicability to a different domain (error handling vs testing), showing methodology transferability and the universal OCA cycle pattern.

#### Step 1: Setup

```bash
# Create experiment
mkdir -p experiments/error-recovery
cd experiments/error-recovery

# Design iteration prompts
# (Use iteration-prompt-designer subagent)
```

Result: `ITERATION-PROMPTS.md` created with:
- Value functions for error recovery (coverage, diagnostic quality, recovery effectiveness)
- Error taxonomy definition
- Recovery pattern identification

#### Step 2: Iteration 0 (Baseline)

```
User: "Execute Iteration 0 of error-recovery methodology using iteration-executor"
```

**What happens**:

1. **Architecture created**:
   ```
   error-recovery/
   ├── capabilities/
   │   ├── error-collect.md (placeholder)
   │   ├── error-strategy.md (placeholder)
   │   ├── error-execute.md (placeholder)
   │   ├── error-evaluate.md (placeholder)
   │   └── error-converge.md (placeholder)
   ├── agents/
   │   ├── error-analyzer.md (placeholder)
   │   └── error-classifier.md (placeholder)
   ├── data/
   ├── patterns/
   ├── templates/
   ├── system-state.md
   ├── iteration-log.md
   └── knowledge-index.md
   ```

2. **Data collected**:
   ```
   data/error-analysis.md:
   - Historical errors: 1,336 instances analyzed
   - Error handling: Ad-hoc, inconsistent
   - Recovery patterns: None documented
   - MTTD/MTTR: High, no systematic diagnosis
   ```

3. **First deliverable created**:
   ```
   # Initial error taxonomy (13 categories)
   # Quality: Basic classification, no recovery patterns yet
   ```

4. **Baseline scores calculated**:
   ```
   V_instance_0: 0.40
   - Coverage: 0.50 (errors classified, not all types covered)
   - Diagnostic Quality: 0.30 (basic categorization only)
   - Recovery Effectiveness: 0.25 (no systematic recovery)
   - Documentation: 0.55 (taxonomy exists)

   V_meta_0: 0.30
   - Completeness: 0.25 (taxonomy only, no workflows)
   - Effectiveness: 0.35 (classification helpful but limited)
   - Reusability: 0.25 (domain-specific so far)
   - Validation: 0.35 (validated against 1,336 historical errors)
   ```

5. **Problems identified**:
   - No systematic diagnosis workflow
   - No recovery patterns
   - No prevention guidelines
   - Taxonomy incomplete (95.4% coverage, gaps exist)

**Output**: `iteration-0.md` with complete baseline documentation

**Key Difference from the Testing Example**: Error Recovery started with rich historical data (1,336 errors), enabling retrospective validation from Iteration 0. This demonstrates how domain characteristics affect baseline quality (V_instance_0 = 0.40 vs Testing's 0.35).

#### Step 3: Iteration 1 (Diagnostic Workflows)

```
User: "Execute Iteration 1 of error-recovery methodology using iteration-executor"
```

**Focused on**: Creating diagnostic workflows and expanding the taxonomy

**Results**:
- Created 8 diagnostic workflows (file operations, API calls, data validation, etc.)
- Expanded error taxonomy to 13 categories
- Added contextual logging patterns
- **V_instance_1: 0.62** (+0.22, significant jump due to workflow addition)
- **V_meta_1: 0.50** (+0.20, patterns emerging)

**Pattern Emerged**: Error diagnosis follows a consistent four-step structure (sketched below):

1. Symptom identification
2. Context gathering
3. Root cause analysis
4. Solution selection
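
This recurring structure is roughly what a diagnostic-workflow document encodes, and it can be captured as a small data type. The type and example workflow below are invented for illustration, not the experiment's actual artifacts.

```go
package main

import "fmt"

// DiagnosticWorkflow mirrors the four-step structure that emerged:
// symptom → context → root cause → solution.
type DiagnosticWorkflow struct {
	Category  string   // error taxonomy category
	Symptoms  []string // observable signals
	Context   []string // what to gather before analysis
	RootCause string   // analysis guidance
	Solutions []string // candidate fixes, in preference order
}

func main() {
	fileOps := DiagnosticWorkflow{
		Category:  "file-operations",
		Symptoms:  []string{"open/read errors", "unexpected empty output"},
		Context:   []string{"path and working directory", "permissions", "file size"},
		RootCause: "verify the path exists and was read before being written",
		Solutions: []string{"validate path before use", "read-before-write check"},
	}
	fmt.Printf("%s: %d symptoms, %d solutions\n",
		fileOps.Category, len(fileOps.Symptoms), len(fileOps.Solutions))
}
```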
#### Step 4: Iteration 2 (Recovery Patterns and Prevention)

```
User: "Execute Iteration 2 of error-recovery methodology using iteration-executor"
```

**Focused on**: Recovery patterns, prevention guidelines, automation

**Results** (a sketch of two recovery patterns follows below):
- Documented 5 recovery patterns (retry, fallback, circuit breaker, graceful degradation, fail-fast)
- Created 8 prevention guidelines
- Built 3 automation tools (file path validation, read-before-write check, file size validation)
- **V_instance_2: 0.78** (+0.16, approaching convergence)
- **V_meta_2: 0.72** (+0.22, acceleration due to automation)

**Automation Impact**: Prevention tools covered 23.7% of historical errors, proving methodology effectiveness empirically.
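
Two of the documented patterns, retry with backoff and fallback, can be sketched generically in Go. This is an illustrative sketch under assumed signatures, not the experiment's actual pattern library.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry runs op up to attempts times, doubling the delay between tries.
func retry(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

// fallback tries the primary operation, then the secondary one.
func fallback(primary, secondary func() (string, error)) (string, error) {
	if v, err := primary(); err == nil {
		return v, nil
	}
	return secondary()
}

func main() {
	// Retry a flaky operation (always fails here, for demonstration).
	err := retry(3, 10*time.Millisecond, func() error {
		return errors.New("transient failure")
	})
	fmt.Println("retry result:", err)

	// Fall back to a cached value when the live call fails.
	v, _ := fallback(
		func() (string, error) { return "", errors.New("service down") },
		func() (string, error) { return "cached value", nil },
	)
	fmt.Println("fallback result:", v)
}
```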
#### Step 5: Iteration 3 (Convergence)

```
User: "Execute Iteration 3 of error-recovery methodology using iteration-executor"
```

**Focused on**: Final validation, cross-language transferability

**Results**:
- Validated patterns across 4 languages (Go, Python, JavaScript, Rust)
- Achieved 95.4% error coverage (1,274/1,336 historical errors)
- Transferability assessment: 85-90% universal patterns
- **V_instance_3: 0.83** (+0.05, exceeded threshold)
- **V_meta_3: 0.85** (+0.13, strong convergence)

**System Stability**: No capability or agent evolution needed (3 iterations stable) - the generic OCA cycle was sufficient.

**Convergence Status**: ✅ **CONVERGED**
- Both layers > 0.80 ✅
- System stable (M_3 == M_2, A_3 == A_2) ✅
- Objectives complete ✅
- Total time: ~10 hours over 3 iterations

#### Step 6: Knowledge Extraction

```
User: "Extract methodology from error-recovery experiment using knowledge-extractor"
```

**Created**:
- `methodology/error-recovery.md` (comprehensive 13-category taxonomy)
- 8 diagnostic workflows
- 5 recovery patterns
- 8 prevention guidelines
- 3 automation tools (file validation, read-before-write, size validation)
- Retrospective validation report (95.4% historical error coverage)

**Result**: Reusable error recovery methodology with 85-90% transferability across languages/platforms

**Transferability Evidence**:
- Core concepts: 100% universal (error taxonomy, diagnostic workflows)
- Recovery patterns: 95% universal (retry, fallback, circuit breaker work everywhere)
- Automation tools: 60% universal (concepts transfer, implementations vary by language)

---

### Comparing the Two Examples

| Aspect | Testing Methodology | Error Recovery Methodology |
|--------|---------------------|----------------------------|
| **Domain Complexity** | Medium (test strategies, patterns) | High (13 error categories, recovery patterns) |
| **Baseline Data** | Limited (current tests only) | Rich (1,336 historical errors) |
| **V_instance_0** | 0.35 | 0.40 (higher due to historical data) |
| **V_meta_0** | 0.25 | 0.30 (retrospective validation possible) |
| **Iterations to Converge** | 3-4 iterations | 3 iterations (rapid due to data richness) |
| **Total Time** | ~12 hours | ~10 hours (rich baseline enabled efficiency) |
| **Transferability** | 89% (Go projects) | 85-90% (universal, cross-language) |
| **Key Innovation** | TDD workflow, coverage analyzer | Error taxonomy, diagnostic workflows, prevention |
| **System Evolution** | Stable (no agent specialization) | Stable (no agent specialization) |

**Universal Lessons**:

1. **Rich baseline data accelerates convergence** (Error Recovery's 1,336 errors vs Testing's current state)
2. **The OCA cycle works across domains** (same structure, different content)
3. **System stability is common** (both examples: no agent evolution needed)
4. **Retrospective validation is powerful** (Error Recovery: 95.4% coverage proves the methodology)
5. **Automation provides empirical evidence** (23.7% error prevention is measurable)

**BAIME Transferability Confirmed**: The same methodology framework produced high-quality results in two distinct domains (testing vs error handling), demonstrating universal applicability.

---
## Troubleshooting

### Issue: Value scores not improving

**Symptoms**: V_instance or V_meta stuck or decreasing across iterations

**Example**:
```
Iteration 0: V_instance = 0.35, V_meta = 0.25
Iteration 1: V_instance = 0.37, V_meta = 0.28 (minimal progress)
Iteration 2: V_instance = 0.34, V_meta = 0.30 (instance decreased!)
```

**Diagnosis**:

**Root Cause 1: Solving symptoms, not problems**
```
❌ Problem identified: "Low test coverage"
❌ Solution attempted: "Write more tests"
❌ Result: Coverage increased but tests are brittle, hard to maintain

✅ Better problem: "No systematic testing strategy"
✅ Better solution: "Create TDD workflow, extract test patterns"
✅ Result: Fewer tests, but higher quality and maintainable
```

**Root Cause 2: Strategy not evidence-based**
```
❌ Strategy: "Let's add integration tests because they seem useful"
❌ Evidence: None (speculation)

✅ Strategy: "Data shows 80% of bugs in API layer, add API tests"
✅ Evidence: Bug analysis from data/bug-analysis.md
```

**Root Cause 3: Scope too broad**
```
❌ Iteration 2 plan: Fix 7 problems (test coverage, CI/CD, docs, errors)
❌ Result: All partially done, none well done

✅ Iteration 2 plan: Fix top 2 problems (test strategy, coverage analysis)
✅ Result: Both fully solved, measurable improvement
```

**Solutions**:

1. **Re-examine problem identification**:
   - Are you solving root causes or symptoms?
   - Review data artifacts - do they support your problem statement?
   - Ask "why" 3 times to find the root cause

2. **Verify evidence quality**:
   - Is data collection comprehensive?
   - Do you have concrete measurements?
   - Can you show before/after data?

3. **Narrow focus**:
   - Address only the top 2-3 highest-impact problems
   - Better to solve 2 problems completely than 5 partially
   - Defer lower-priority items to the next iteration

4. **Re-evaluate strategy**:
   - Is it based on data or assumptions?
   - Review iteration-N-strategy.md for evidence
   - Challenge each planned improvement: "What evidence supports this?"

---
### Issue: Methodology not transferable (low V_meta Reusability)

**Symptoms**: V_meta Reusability component < 0.60 after multiple iterations

**Example**:
```
Iteration 2 evaluation:
- Completeness: 0.70 ✅
- Effectiveness: 0.75 ✅
- Reusability: 0.45 ❌ (blocking convergence)
- Validation: 0.65 ✅
```

**Diagnosis**:

**Problem: Patterns too project-specific**

Before (low reusability):
```markdown
# Testing Pattern
1. Create test file in src/api/handlers/__tests__/
2. Import UserModel from "../../models/user"
3. Use Jest expect() assertions
4. Run with npm test
```

After (high reusability):
```markdown
# Testing Pattern (Parameterized)
1. Create test file adjacent to source: {source_dir}/__tests__/{module}_test{ext}
2. Import module under test: {import_statement}
3. Use test framework assertion: {assertion_method}
4. Run with project test command: {test_command}

Adaptation guide:
- Go: {ext}=.go, {assertion_method}=testing.T methods
- JS: {ext}=.js, {assertion_method}=expect() or assert()
- Python: {ext}=.py, {assertion_method}=unittest assertions
```

**Problem: No abstraction of domain concepts**

Before:
```markdown
# CI/CD Pattern
- Install Go 1.21
- Run go test ./...
- Build with go build -o bin/app
- Check coverage is >80%
```

After (abstracted):
```markdown
# CI/CD Quality Gate Pattern

Universal steps:
1. Install language runtime (version from project config)
2. Run test suite (project-specific command)
3. Build artifact (project-specific build process)
4. Verify quality threshold (configurable threshold)

Domain-specific implementations:
- Go: {runtime}=Go 1.21+, {test}=go test, {build}=go build
- Node: {runtime}=Node 18+, {test}=npm test, {build}=npm run build
- Python: {runtime}=Python 3.10+, {test}=pytest, {build}=python setup.py
```

**Solutions**:

1. **Extract universal patterns**:
   - Identify what's essential vs project-specific
   - Replace hardcoded values with parameters
   - Document an adaptation guide

2. **Create parameterized templates**:
   - Use placeholders: {variable_name}
   - Provide examples for 3+ different contexts
   - Include a "How to adapt" section

3. **Test across scenarios**:
   - Apply the pattern to a different project in the same domain
   - Document what needed changing
   - Refine the pattern based on adaptation effort

4. **Add abstraction layers**:
   - Layer 1: Universal principle (works anywhere)
   - Layer 2: Domain-specific implementation (testing/CI/CD/etc.)
   - Layer 3: Tool-specific details (Jest/pytest/etc.)

---
### Issue: Can't reach convergence (stuck at V ~0.70)

**Symptoms**: Multiple iterations without reaching 0.80

**Causes**:
- Unrealistic convergence targets
- Missing critical patterns
- Need a specialized agent but using the generic approach

**Solutions**:
1. Review value function definitions - are they appropriate?
2. Identify missing methodology components
3. Consider creating a specialized agent if the problem recurs
4. Re-assess convergence criteria - is 0.80 realistic for this domain?

---

### Issue: Too many iterations (>10)

**Symptoms**: Slow convergence, many iterations needed

**Causes**:
- Insufficient baseline (V_meta_0 < 0.20)
- Not extracting patterns early enough
- Too-conservative improvements

**Solutions**:
1. Improve the baseline iteration - invest more time in Iteration 0
2. Extract patterns when they recur (don't wait)
3. Make bolder improvements (test larger changes)
4. Use specialized agents earlier

---

### Issue: Premature convergence claims

**Symptoms**: Claiming convergence while quality is obviously low

**Causes**:
- Inflated value scores (not honest assessment)
- Comparing to "could be worse" instead of rubrics
- Time pressure leading to rushed evaluation

**Solutions**:
1. Actively seek disconfirming evidence
2. Test deliverables thoroughly
3. Enumerate gaps explicitly
4. Challenge high scores with extra scrutiny
5. Remember: Honest assessment is more valuable than fast convergence

---
## Next Steps

### After Your First BAIME Experiment

1. **Review iteration documentation** - See what worked and what didn't
2. **Extract lessons learned** - Document insights about the BAIME process
3. **Apply the methodology** - Use the created methodology in real work
4. **Share knowledge** - Package as a skill or contribute back

### Advanced Topics

- **Baseline Quality Assessment** - Achieve a comprehensive baseline (V_meta ≥ 0.40 in Iteration 0) for faster convergence
- **Rapid Convergence** - Techniques for 3-4 iteration methodology development
- **Agent Specialization** - When and how to create specialized agents
- **Retrospective Validation** - Validate methodology against historical data
- **Cross-Domain Transfer** - Apply methodology to different projects

See individual skills for detailed guidance:

- `baseline-quality-assessment`
- `rapid-convergence`
- `agent-prompt-evolution`
- `retrospective-validation`

### Further Reading

- **[Methodology Bootstrapping Skill](../../.claude/skills/methodology-bootstrapping/)** - Complete BAIME reference
- **[Empirical Methodology Development](../methodology/empirical-methodology-development.md)** - Theoretical foundation
- **[Bootstrapped Software Engineering](../methodology/bootstrapped-software-engineering.md)** - BAIME in depth
- **[Example Experiments](../../experiments/)** - Real BAIME experiments to study

### Getting Help

- **Check skill documentation**: `.claude/skills/methodology-bootstrapping/`
- **Review example experiments**: `experiments/bootstrap-*/`
- **Use @meta-coach**: Ask for workflow optimization guidance
- **Read iteration documentation**: See how past experiments evolved

---
## Summary

**BAIME provides**:
- ✅ A systematic framework for methodology development
- ✅ Empirical validation with data-driven decisions
- ✅ Dual-layer value functions for quality measurement
- ✅ Specialized agents for streamlined execution
- ✅ Proven results: 10-50x speedup, 70-95% transferability

**Key workflow**:
1. Define domain and dual goals
2. Design iteration prompts (iteration-prompt-designer)
3. Execute Iteration 0 baseline (iteration-executor)
4. Iterate until convergence (typically 3-7 iterations)
5. Extract knowledge (knowledge-extractor)
6. Apply the methodology to real work

**Remember**:
- Start with a clear domain and goals
- Low baseline scores are expected
- Honest assessment is crucial
- Evidence-driven evolution (not anticipatory design)
- Convergence requires both V_instance ≥ 0.80 AND V_meta ≥ 0.80

**Ready to start?** Choose your domain, set up your experiment, and begin with Iteration 0!

---

**Document Version**: 1.0 (Iteration 0 Baseline)
**Last Updated**: 2025-10-19
**Status**: Initial version - will evolve based on user feedback