---
name: bootstrapped-se
description: Apply Bootstrapped Software Engineering (BSE) methodology to evolve project-specific development practices through systematic Observe-Codify-Automate cycles
keywords: bootstrapping, meta-methodology, OCA, observe, codify, automate, self-improvement, empirical, methodology-development
category: methodology
version: 1.0.0
based_on: docs/methodology/bootstrapped-software-engineering.md
transferability: 95%
effectiveness: 10-50x methodology development speedup
---
# Bootstrapped Software Engineering
**Evolve project-specific methodologies through systematic observation, codification, and automation.**
> The best methodologies are not **designed** but **evolved** through systematic observation, codification, and automation of successful practices.
---
## Core Insight
Traditional methodologies are theory-driven and static. **Bootstrapped Software Engineering (BSE)** enables development processes to:
1. **Observe** themselves through instrumentation and data collection
2. **Codify** discovered patterns into reusable methodologies
3. **Automate** methodology enforcement and validation
4. **Self-improve** by applying the methodology to its own evolution
### Three-Tuple Output
Every BSE process produces:
```
(O, Aₙ, Mₙ)
where:
O = Task output (code, documentation, system)
Aₙ = Converged agent set (reusable for similar tasks)
Mₙ = Converged meta-agent (transferable to new domains)
```
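To make the three-tuple concrete, here is a minimal Python sketch; the `BootstrapResult` class and its field names are illustrative placeholders, not part of any BSE tooling:

```python
from dataclasses import dataclass, field

@dataclass
class BootstrapResult:
    """Hypothetical container for the (O, A_n, M_n) three-tuple."""
    output: str                                       # O: task output (code, docs, system)
    agents: list[str] = field(default_factory=list)   # A_n: converged agent set
    meta_agent: str = "M0"                            # M_n: converged meta-agent

result = BootstrapResult(
    output="docs/methodology/role-based-documentation.md",
    agents=["coder", "data-analyst", "doc-writer"],
    meta_agent="M0",
)
print(result)
```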
---
## The OCA Framework
**Three-Phase Cycle**: Observe → Codify → Automate
### Phase 1: OBSERVE
**Instrument your development process to collect data**
**Tools**:
- Session history analysis (meta-cc)
- Git commit analysis
- Code metrics (coverage, complexity)
- Access pattern tracking
- Error rate monitoring
**Example** (from meta-cc):
```bash
# Analyze file access patterns
meta-cc query files --threshold 5
# Result: plan.md accessed 423 times (highest)
# Insight: Core reference document, needs optimization
```
**Output**: Empirical data about actual development patterns
### Phase 2: CODIFY
**Extract patterns and document as reusable methodologies**
**Process**:
1. **Pattern Recognition**: Identify recurring successful practices
2. **Hypothesis Formation**: Formulate testable claims
3. **Documentation**: Write methodology documents
4. **Validation**: Test methodology on real scenarios
**Example** (from meta-cc):
```markdown
# Discovered Pattern: Role-Based Documentation
Observation:
- plan.md: 423 accesses (Coordination role)
- CLAUDE.md: ~300 implicit loads (Entry Point role)
- features.md: 89 accesses (Reference role)
Methodology:
- Classify docs by actual access patterns
- Optimize high-access docs for token efficiency
- Create role-specific maintenance procedures
Validation:
- CLAUDE.md reduction: 607 → 278 lines (-54%)
- Token cost reduction: 47%
- Access efficiency: Maintained
```
**Output**: Documented methodology with empirical validation
### Phase 3: AUTOMATE
**Convert methodology into automated checks and tools**
**Automation Levels**:
1. **Detection**: Automated pattern detection
2. **Validation**: Check compliance with methodology
3. **Enforcement**: CI/CD integration, block violations
4. **Suggestion**: Automated fix recommendations
**Example** (from meta-cc):
```bash
# Automation: /meta doc-health capability
# Checks:
- Role classification compliance
- Token efficiency (lines < threshold)
- Cross-reference completeness
- Update frequency
# Actions:
- Flag oversized documents
- Suggest restructuring
- Validate role assignments
```
**Output**: Automated tools enforcing methodology
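As a sketch of what the Validation and Enforcement levels can look like, the following Python script flags documents that exceed role-based line limits and exits non-zero so a CI job can block the violation; the file names, role assignments, and thresholds are illustrative:

```python
#!/usr/bin/env python3
"""Illustrative doc-health check: flag docs that exceed role-based line limits."""
import sys
from pathlib import Path

# Hypothetical role assignments and thresholds; adapt to your project.
ROLE_LIMITS = {"entry_point": 300, "reference": 500}
DOC_ROLES = {
    "README.md": "entry_point",
    "CLAUDE.md": "entry_point",
    "docs/ARCHITECTURE.md": "reference",
}

def check_docs(root: Path) -> list[str]:
    violations = []
    for rel_path, role in DOC_ROLES.items():
        doc = root / rel_path
        if not doc.exists():
            continue
        lines = len(doc.read_text(encoding="utf-8").splitlines())
        limit = ROLE_LIMITS[role]
        if lines > limit:
            violations.append(f"{rel_path}: {lines} lines exceeds {role} limit of {limit}")
    return violations

if __name__ == "__main__":
    problems = check_docs(Path("."))
    for p in problems:
        print(f"VIOLATION: {p}")
    sys.exit(1 if problems else 0)  # non-zero exit lets CI block the change
```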
---
## Self-Referential Feedback Loop
The ultimate power of BSE: **Apply the methodology to improve itself**
```
Layer 0: Basic Functionality
→ Build tools (meta-cc CLI)
Layer 1: Self-Observation
→ Use tools on self (query own sessions)
→ Discovery: Usage patterns, bottlenecks
Layer 2: Pattern Recognition
→ Analyze data (R/E ratio, access density)
→ Discovery: Document roles, optimization opportunities
Layer 3: Methodology Extraction
→ Codify patterns (role-based-documentation.md)
→ Definition: Classification algorithm, maintenance procedures
Layer 4: Tool Automation
→ Implement checks (/meta doc-health)
→ Auto-validate: Methodology compliance
Layer 5: Continuous Evolution
→ Apply tools to self
→ Discover new patterns → Update methodology → Update tools
```
**This creates a closed loop**: Tools improve tools, methodologies optimize methodologies.
---
## Parameters
- **domain**: `documentation` | `testing` | `architecture` | `custom` (default: `custom`)
- **observation_period**: number of days/commits to analyze (default: auto-detect)
- **automation_level**: `detect` | `validate` | `enforce` | `suggest` (default: `validate`)
- **iteration_count**: number of OCA cycles (default: 3)
---
## Execution Flow
### Phase 1: Observation Setup
```
1. Identify observation targets
- Code metrics (LOC, complexity, coverage)
- Development patterns (commits, PRs, errors)
- Access patterns (file reads, searches)
- Quality metrics (test results, build time)
2. Install instrumentation
- meta-cc integration (session analysis)
- Git hooks (commit tracking)
- Coverage tracking
- CI/CD metrics
3. Collect baseline data
- Run for observation_period
- Generate initial reports
- Identify data gaps
```
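If a session-analysis tool such as meta-cc is not yet in place, a rough observation baseline can be pulled from git history alone. The sketch below counts how often each file changed during the observation period; the 30-day default and the use of churn as an access proxy are both assumptions to adjust:

```python
"""Illustrative baseline collection: count how often each file changes (git churn)."""
import subprocess
from collections import Counter

def file_churn(since: str = "30 days ago") -> Counter:
    # `git log --name-only` lists the changed files under each commit;
    # an empty --pretty format suppresses commit headers.
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in out.splitlines() if line.strip())

if __name__ == "__main__":
    churn = file_churn()
    for path, count in churn.most_common(10):
        print(f"{count:4d}  {path}")
```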
### Phase 2: Pattern Analysis
```
4. Analyze collected data
- Statistical analysis (frequencies, correlations)
- Pattern recognition (recurring behaviors)
- Anomaly detection (outliers, inefficiencies)
5. Formulate hypotheses
- "High-access docs should be < 300 lines"
- "Test coverage gaps correlate with bugs"
- "Batch remediation is 5x more efficient"
6. Validate hypotheses
- Historical data validation
- A/B testing if possible
- Expert review
```
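A hypothesis such as "high-access docs should be < 300 lines" can be checked mechanically once observation data exists. The sketch below uses hypothetical observation tuples and arbitrary thresholds purely to illustrate the shape of such a check:

```python
"""Illustrative hypothesis check: do high-access docs stay under a line budget?"""

# Hypothetical observation data: (file, access_count, line_count)
OBSERVATIONS = [
    ("README.md", 423, 1909),
    ("docs/ARCHITECTURE.md", 89, 456),
    ("docs/API.md", 234, 789),
]

ACCESS_THRESHOLD = 100   # "high-access" cutoff (assumption)
LINE_BUDGET = 300        # hypothesis: high-access docs should be < 300 lines

def test_hypothesis(observations):
    high_access = [o for o in observations if o[1] >= ACCESS_THRESHOLD]
    violations = [(f, lines) for f, _, lines in high_access if lines >= LINE_BUDGET]
    return len(high_access), violations

if __name__ == "__main__":
    total, violations = test_hypothesis(OBSERVATIONS)
    print(f"{len(violations)}/{total} high-access docs violate the {LINE_BUDGET}-line budget")
    for f, lines in violations:
        print(f"  candidate for optimization: {f} ({lines} lines)")
```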
### Phase 3: Codification
```
7. Document patterns
- Pattern name and description
- Context and applicability
- Implementation steps
- Validation criteria
- Examples and counter-examples
8. Create methodology
- Problem statement
- Solution approach
- Procedures and guidelines
- Metrics and validation
9. Peer review
- Team review
- Iterate based on feedback
```
### Phase 4: Automation
```
10. Design automation
- Detection: Identify when pattern applies
- Validation: Check compliance
- Enforcement: Prevent violations
- Suggestion: Recommend fixes
11. Implement tools
- Scripts (bash, Python)
- CI/CD integration
- IDE plugins
- Bot automation
12. Deploy and monitor
- Gradual rollout
- Collect usage data
- Measure effectiveness
```
### Phase 5: Evolution
```
13. Apply to self
- Use tools on development process
- Discover meta-patterns
- Optimize methodology
14. Iterate OCA cycle
- New observations → New patterns
- Refined hypotheses → Better validation
- Enhanced automation → Higher compliance
```
---
## Usage Examples
### Example 1: Documentation Optimization
```bash
# User: "Optimize project documentation"
bootstrapped-se domain=documentation
# Execution:
[OBSERVE Phase]
✓ Analyzing file access patterns (30 days)...
- README.md: 423 accesses (Entry Point)
- ARCHITECTURE.md: 89 accesses (Reference)
- API.md: 234 accesses (Reference)
✓ Measuring token efficiency...
- README.md: 1909 lines (HIGH - inefficient)
- ARCHITECTURE.md: 456 lines (OK)
- API.md: 789 lines (MEDIUM)
[CODIFY Phase]
✓ Pattern identified: Role-Based Documentation
- Entry Point docs: Should be < 300 lines
- Reference docs: Should be < 500 lines
- Specialized docs: Can be longer
✓ Methodology documented:
- Created: docs/methodology/role-based-documentation.md
- Includes: Classification algorithm, optimization guide
[AUTOMATE Phase]
✓ Automation created:
- Script: scripts/check-doc-health.sh
- CI check: .github/workflows/doc-validation.yml
- Capability: /meta doc-health
[EVOLUTION Phase]
✓ Applied to self:
- Optimized README.md: 1909 → 275 lines (-85%)
- Optimized CLAUDE.md: 607 → 278 lines (-54%)
- Token cost reduction: 47%
✓ Next iteration insights:
- New pattern: Cross-reference density
- New automation: Auto-linking checker
```
### Example 2: Testing Strategy Evolution
```bash
# User: "Develop systematic testing methodology"
bootstrapped-se domain=testing observation_period=60
# Execution:
[OBSERVE Phase]
✓ Analyzing test patterns (60 days)...
- Coverage: 75% (good)
- Test distribution: Uneven (80% unit, 15% integration, 5% e2e)
- Flaky tests: 12% failure rate
- Slowest tests: integration (avg 45s)
[CODIFY Phase]
✓ Patterns identified:
1. Coverage-driven development works
2. Integration tests need fixtures
3. Flaky tests have environmental dependencies
✓ Methodology created:
- Test strategy matrix (when to write what)
- Fixture patterns for integration tests
- Flaky test elimination procedures
[AUTOMATE Phase]
✓ Automation implemented:
- Coverage gate: Block if < 75%
- Fixture generator: Auto-create from examples
- Flaky detector: Flag unstable tests
✓ Results:
- Coverage: 75% → 86%
- Flaky rate: 12% → 2%
- Test time: 45s → 12s (with parallel execution)
```
### Example 3: Error Recovery Patterns
```bash
# User: "Extract error recovery methodology"
bootstrapped-se domain=custom
# Execution:
[OBSERVE Phase]
✓ Analyzing error patterns...
- Total errors: 423 (from sessions)
- Error rate: 6.06%
- Most common: Type errors (45%), Logic errors (30%), Deps (15%)
[CODIFY Phase]
✓ Pattern: Error Classification Taxonomy
- Categories: Type, Logic, Dependency, Integration, Infrastructure
- Recovery strategies per category
- Prevention guidelines
✓ Methodology: Systematic Error Recovery
- Detection: Error signature extraction
- Classification: Rule-based categorization
- Recovery: Strategy pattern application
- Prevention: Root cause analysis → Code patterns
[AUTOMATE Phase]
✓ Tools created:
- Error classifier (ML-based)
- Recovery strategy recommender
- Prevention linter (detect anti-patterns)
✓ CI/CD Integration:
- Auto-classify build failures
- Suggest recovery steps
- Track error trends
```
---
## Validated Outcomes
**From meta-cc project** (8 experiments, 95% transferability):
### Documentation Methodology
- **Observation**: 423 file access patterns analyzed
- **Codification**: Role-based documentation methodology
- **Automation**: /meta doc-health capability
- **Result**: 47% token cost reduction, maintained accessibility
### Testing Strategy
- **Observation**: 75% coverage, uneven distribution
- **Codification**: Coverage-driven gap closure
- **Automation**: CI coverage gates, fixture generators
- **Result**: 75% → 86% coverage, 15x speedup vs ad-hoc
### Error Recovery
- **Observation**: 6.06% error rate, 423 errors analyzed
- **Codification**: Error taxonomy, recovery patterns
- **Automation**: Error classifier, recovery recommender
- **Result**: 85% transferability, systematic recovery
### Dependency Health
- **Observation**: 7 vulnerabilities, 11 outdated deps
- **Codification**: 6 patterns (vulnerability, update, license, etc.)
- **Automation**: 3 scripts + CI/CD workflow
- **Result**: 6x speedup (9h → 1.5h), 88% transferability
### Observability
- **Observation**: 0 logs, 0 metrics, 0 traces (baseline)
- **Codification**: Three Pillars methodology (Logging + Metrics + Tracing)
- **Automation**: Code generators, instrumentation templates
- **Result**: 23-46x speedup, 90-95% transferability
---
## Transferability
**95% transferable** across domains and projects:
### What Transfers (95%+)
- OCA framework itself (universal process)
- Self-referential feedback loop pattern
- Observation → Pattern → Automation pipeline
- Empirical validation approach
- Continuous evolution mindset
### What Needs Adaptation (5%)
- Specific observation tools (meta-cc → custom tools)
- Domain-specific patterns (docs vs testing vs architecture)
- Automation implementation details (language, platform)
### Adaptation Effort
- **Same project, new domain**: 2-4 hours
- **New project, same domain**: 4-8 hours
- **New project, new domain**: 8-16 hours
---
## Prerequisites
### Tools Required
- **Session analysis**: meta-cc or equivalent
- **Git analysis**: Git installed, access to repository
- **Metrics collection**: Coverage tools, static analyzers
- **Automation**: CI/CD platform (GitHub Actions, GitLab CI, etc.)
### Skills Required
- Basic data analysis (statistics, pattern recognition)
- Methodology documentation
- Scripting (bash, Python, or equivalent)
- CI/CD configuration
---
## Implementation Guidance
### Start Small
```bash
# Week 1: Observe
- Install meta-cc
- Track file accesses for 1 week
- Collect simple metrics
# Week 2: Codify
- Analyze top 10 access patterns
- Document 1-2 simple patterns
- Get team feedback
# Week 3: Automate
- Create 1 simple validation script
- Add to CI/CD
- Monitor compliance
# Week 4: Iterate
- Apply tools to development
- Discover new patterns
- Refine methodology
```
### Scale Up
```bash
# Month 2: Expand domains
- Apply to testing
- Apply to architecture
- Cross-validate patterns
# Month 3: Deep automation
- Build sophisticated checkers
- Integrate with IDE
- Create dashboards
# Month 4: Evolution
- Meta-patterns emerge
- Methodology generator
- Cross-project application
```
---
## Theoretical Foundation
### The Convergence Theorem
**Conjecture**: For any domain D, there exists a methodology M* such that:
1. **M* is locally optimal** for D (cannot be significantly improved)
2. **M* can be reached through bootstrapping** (systematic self-improvement)
3. **Convergence speed increases** with each iteration (learning effect)
**Implication**: We can **automatically discover** optimal methodologies for any domain.
### Scientific Method Analogy
```
1. Observation = Instrumentation (meta-cc tools)
2. Hypothesis = "CLAUDE.md should be <300 lines"
3. Experiment = Implement constraint, measure effects
4. Data Collection = query-files, git log analysis
5. Analysis = Calculate R/E ratio, access density
6. Conclusion = "300-line limit effective: 47% reduction"
7. Publication = Codify as methodology document
8. Replication = Apply to other projects
```
---
## Success Criteria
| Metric | Target | Validation |
|--------|--------|------------|
| **Pattern Discovery** | ≥3 patterns per cycle | Documented patterns |
| **Methodology Quality** | Peer-reviewed | Team acceptance |
| **Automation Coverage** | ≥80% of patterns | CI integration |
| **Effectiveness** | ≥3x improvement | Before/after metrics |
| **Transferability** | ≥85% reusability | Cross-project validation |
---
## Domain Adaptation Guide
**Different domains have different complexity characteristics** that affect iteration count, agent needs, and convergence patterns. This guide helps predict and adapt to domain-specific challenges.
### Domain Complexity Classes
Based on 8 completed Bootstrap experiments, we've identified three complexity classes:
#### Simple Domains (3-4 iterations)
**Characteristics**:
- Well-defined problem space
- Clear success criteria
- Limited interdependencies
- Established best practices exist
- Straightforward automation
**Examples**:
- **Bootstrap-010 (Dependency Health)**: 3 iterations
- Clear goals: vulnerabilities, freshness, licenses
- Existing tools: govulncheck, go-licenses
- Straightforward automation: CI/CD scripts
- Converged fastest in series
- **Bootstrap-011 (Knowledge Transfer)**: 3-4 iterations
- Well-understood domain: onboarding paths
- Clear structure: Day-1, Week-1, Month-1
- Existing patterns: progressive disclosure
- High transferability (95%+)
**Adaptation Strategy**:
```markdown
Simple Domain Approach:
1. Start with generic agents only (coder, data-analyst, doc-writer)
2. Focus on automation (tools, scripts, CI)
3. Expect fast convergence (3-4 iterations)
4. Prioritize transferability (aim for 85%+)
5. Minimal agent specialization needed
```
**Expected Outcomes**:
- Iterations: 3-4
- Duration: 6-8 hours
- Specialized agents: 0-1
- Transferability: 85-95%
- V_instance: Often exceeds 0.80 significantly (e.g., 0.92)
#### Medium Complexity Domains (4-6 iterations)
**Characteristics**:
- Multiple dimensions to optimize
- Some ambiguity in success criteria
- Moderate interdependencies
- Require domain expertise
- Automation has nuances
**Examples**:
- **Bootstrap-001 (Documentation)**: 3 iterations (simple side of medium)
- Multiple roles to define
- Access patterns analysis needed
- Search infrastructure complexity
- 85% transferability
- **Bootstrap-002 (Testing)**: 5 iterations
- Coverage vs quality trade-offs
- Multiple test types (unit, integration, e2e)
- Fixture patterns discovery
- 89% transferability
- **Bootstrap-009 (Observability)**: 6 iterations
- Three pillars (logging, metrics, tracing)
- Performance vs verbosity trade-offs
- Integration complexity
- 90-95% transferability
**Adaptation Strategy**:
```markdown
Medium Domain Approach:
1. Start with generic agents, add 1-2 specialized as needed
2. Expect iterative refinement of value functions
3. Plan for 4-6 iterations
4. Balance instance and meta objectives equally
5. Document trade-offs explicitly
```
**Expected Outcomes**:
- Iterations: 4-6
- Duration: 8-12 hours
- Specialized agents: 1-3
- Transferability: 85-90%
- V_instance: Typically 0.80-0.87
#### Complex Domains (6-8+ iterations)
**Characteristics**:
- High interdependency
- Emergent patterns (not obvious upfront)
- Multiple competing objectives
- Requires novel agent capabilities
- Automation is sophisticated
**Examples**:
- **Bootstrap-013 (Cross-Cutting Concerns)**: 8 iterations
- Pattern extraction from existing code
- Convention definition ambiguity
- Automated enforcement complexity
- Large codebase scope (all modules)
- Longest experiment but highest ROI (16.7x)
- **Bootstrap-003 (Error Recovery)**: 5 iterations (complex side)
- Error taxonomy creation
- Root cause diagnosis
- Recovery strategy patterns
- 85% transferability
- **Bootstrap-012 (Technical Debt)**: 4 iterations (medium-complex)
- SQALE quantification
- Prioritization complexity
- Subjective vs objective debt
- 85% transferability
**Adaptation Strategy**:
```markdown
Complex Domain Approach:
1. Expect agent evolution throughout
2. Plan for 6-8+ iterations
3. Accept lower initial V values (baseline often <0.35)
4. Focus on one dimension per iteration
5. Create specialized agents proactively when gaps identified
6. Document emergent patterns as discovered
```
**Expected Outcomes**:
- Iterations: 6-8+
- Duration: 12-18 hours
- Specialized agents: 3-5
- Transferability: 70-85%
- V_instance: Hard-earned 0.80-0.85
- Largest single-iteration gains possible (e.g., +27.3% in Bootstrap-013 Iteration 7)
### Domain-Specific Considerations
#### Documentation-Heavy Domains
**Examples**: Documentation (001), Knowledge Transfer (011)
**Key Adaptations**:
- Prioritize clarity over completeness
- Role-based structuring
- Accessibility optimization
- Cross-referencing systems
**Success Indicators**:
- Access/line ratio > 1.0
- User satisfaction surveys
- Search effectiveness
#### Technical Implementation Domains
**Examples**: Observability (009), Dependency Health (010)
**Key Adaptations**:
- Performance overhead monitoring
- Automation-first approach
- Integration testing critical
- CI/CD pipeline emphasis
**Success Indicators**:
- Automated coverage %
- Performance impact < 10%
- CI/CD reliability
#### Quality/Analysis Domains
**Examples**: Testing (002), Error Recovery (003), Technical Debt (012)
**Key Adaptations**:
- Quantification frameworks essential
- Baseline measurement critical
- Before/after comparisons
- Statistical validation
**Success Indicators**:
- Coverage metrics
- Error rate reduction
- Time savings quantified
#### Systematic Enforcement Domains
**Examples**: Cross-Cutting Concerns (013), Code Review (008 planned)
**Key Adaptations**:
- Pattern extraction from existing code
- Linter/checker development
- Gradual enforcement rollout
- Exception handling
**Success Indicators**:
- Pattern consistency %
- Violation detection rate
- Developer adoption rate
### Predicting Iteration Count
Based on empirical data from 8 experiments:
```
Base estimate: 5 iterations
Adjust based on:
- Well-defined domain: -2 iterations
- Existing tools available: -1 iteration
- High interdependency: +2 iterations
- Novel patterns needed: +1 iteration
- Large codebase scope: +1 iteration
- Multiple competing goals: +1 iteration
Examples:
Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
Observability: 5 + 0 + 1 = 6 → actual 6 ✓
Cross-Cutting: 5 + 2 + 1 = 8 → actual 8 ✓
```
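The same adjustment table can be encoded directly. The sketch below reproduces the arithmetic above; the floor of 2 iterations is an added assumption, not part of the original estimate:

```python
"""Illustrative iteration-count estimator using the adjustments above."""

ADJUSTMENTS = {
    "well_defined_domain": -2,
    "existing_tools_available": -1,
    "high_interdependency": +2,
    "novel_patterns_needed": +1,
    "large_codebase_scope": +1,
    "multiple_competing_goals": +1,
}

def estimate_iterations(traits: set[str], base: int = 5) -> int:
    estimate = base + sum(delta for trait, delta in ADJUSTMENTS.items() if trait in traits)
    return max(estimate, 2)  # floor: even simple domains need a couple of cycles (assumption)

if __name__ == "__main__":
    print(estimate_iterations({"well_defined_domain", "existing_tools_available"}))  # dependency-health-like: 2
    print(estimate_iterations({"novel_patterns_needed"}))                            # observability-like: 6
    print(estimate_iterations({"high_interdependency", "novel_patterns_needed"}))    # cross-cutting-like: 8
```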
### Agent Specialization Prediction
```
Generic agents sufficient when:
- Domain has established patterns
- Clear best practices exist
- Automation is straightforward
→ Examples: Dependency Health, Knowledge Transfer
Specialized agents needed when:
- Novel pattern extraction required
- Domain-specific expertise needed
- Complex analysis algorithms
→ Examples: Observability (log-analyzer, metric-designer)
Cross-Cutting (pattern-extractor, convention-definer)
Rule of thumb:
- Simple domains: 0-1 specialized agents
- Medium domains: 1-3 specialized agents
- Complex domains: 3-5 specialized agents
```
### Meta-Agent Evolution Prediction
**Key finding from 8 experiments**: **M₀ was sufficient in ALL cases**
```
Meta-Agent M₀ capabilities (5):
1. observe: Pattern observation
2. plan: Iteration planning
3. execute: Agent orchestration
4. reflect: Value assessment
5. evolve: System evolution
No evolution needed because:
- M₀ capabilities cover full lifecycle
- Agent specialization handles domain gaps
- Modular design allows capability reuse
```
**When to evolve Meta-Agent** (theoretical, not yet observed):
- Novel coordination pattern needed
- Capability gap in lifecycle
- Cross-agent orchestration complexity
- New convergence pattern discovered
### Convergence Pattern Prediction
Based on domain characteristics:
**Standard Dual Convergence** (most common):
- Both V_instance and V_meta reach 0.80+
- Examples: Observability (009), Dependency Health (010), Technical Debt (012), Cross-Cutting (013)
- **Use when**: Both objectives equally important
**Meta-Focused Convergence**:
- V_meta reaches 0.80+, V_instance practically sufficient
- Example: Knowledge Transfer (011) - V_meta = 0.877, V_instance = 0.585
- **Use when**: Methodology is primary goal, instance is vehicle
**Practical Convergence**:
- Combined quality exceeds metrics, justified partial criteria
- Example: Testing (002) - V_instance = 0.848, quality > coverage %
- **Use when**: Quality evidence exceeds raw numbers
### Domain Transfer Considerations
**Transferability varies by domain abstraction**:
```
High (90-95%):
- Knowledge Transfer (95%+): Learning principles universal
- Observability (90-95%): Three Pillars apply everywhere
Medium-High (85-90%):
- Testing (89%): Test types similar across languages
- Dependency Health (88%): Package manager patterns similar
- Documentation (85%): Role-based structure universal
- Error Recovery (85%): Error taxonomy concepts transfer
- Technical Debt (85%): SQALE principles universal
Medium (70-85%):
- Cross-Cutting Concerns (70-80%): Language-specific patterns
- Refactoring (80% est.): Code smells vary by language
```
**Adaptation effort**:
```
Same language/ecosystem: 10-20% effort (adapt examples)
Similar language (Go→Rust): 30-40% effort (remap patterns)
Different paradigm (Go→JS): 50-60% effort (rethink patterns)
```
---
## Context Management for LLM Execution
λ(iteration, context_state) → (work_output, context_optimized) | context < limit:
**Context management is critical for LLM-based execution** where token limits constrain iteration depth and agent effectiveness.
### Context Allocation Protocol
```
context_allocation :: Phase → Percentage
context_allocation(phase) = match phase {
observation → 0.30, -- Data collection, pattern analysis
codification → 0.40, -- Documentation, methodology writing
automation → 0.20, -- Tool creation, CI integration
reflection → 0.10 -- Evaluation, planning
} where Σ = 1.0
```
**Rationale**: Across the 8 experiments, codification consumed the most context (methodology docs, agent definitions), followed by observation (data analysis), automation (code writing), and reflection (evaluation).
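Turning the allocation into concrete token budgets is a one-line calculation. In the sketch below, the 200,000-token limit is a placeholder for your model's actual context window:

```python
"""Illustrative context budget split across phases (percentages from the protocol above)."""

PHASE_ALLOCATION = {"observation": 0.30, "codification": 0.40, "automation": 0.20, "reflection": 0.10}

def phase_budgets(token_limit: int) -> dict[str, int]:
    assert abs(sum(PHASE_ALLOCATION.values()) - 1.0) < 1e-9  # Σ = 1.0
    return {phase: int(token_limit * share) for phase, share in PHASE_ALLOCATION.items()}

if __name__ == "__main__":
    # token_limit is a placeholder; substitute your model's context window.
    for phase, budget in phase_budgets(200_000).items():
        print(f"{phase:13s} {budget:7d} tokens")
```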
### Context Pressure Management
```
context_pressure :: State → Strategy
context_pressure(s) =
if usage(s) > 0.80 then overflow_protocol(s)
else if usage(s) > 0.50 then compression_protocol(s)
else standard_protocol(s)
```
### Overflow Protocol (Context >80%)
```
overflow_protocol :: State → Action
overflow_protocol(s) = prioritize(
serialize_to_disk: save(s.knowledge/*) ∧ compress(s.history),
reference_compression: link(files) ∧ ¬inline(content),
session_split: checkpoint(s) ∧ continue(s_{n+1}, fresh_context)
) where preserve_critical ∧ drop_redundant
```
**Actions**:
1. **Serialize to disk**: Save iteration state to `knowledge/` directory
2. **Reference compression**: Link to files instead of inlining content
3. **Session split**: Complete current phase, start new session for next iteration
**Example** (from Bootstrap-013, 8 iterations):
- Iteration 4: Context 85% → Serialized analysis to `knowledge/pattern-analysis.md`
- Iteration 5: Started fresh session, loaded serialized state via file references
- Result: Continued 4 more iterations without context overflow
### Compression Protocol (Context 50-80%)
```
compression_protocol :: State → Optimizations
compression_protocol(s) = apply(
deduplication: merge(similar_patterns) ∧ reference_once,
summarization: compress(historical_context) ∧ keep(structure),
lazy_loading: defer(load) ∧ fetch_on_demand
)
```
**Optimizations**:
1. **Deduplication**: Merge similar patterns, reference once
2. **Summarization**: Compress historical iterations while preserving structure
3. **Lazy loading**: Load agent definitions only when invoked
### Convergence Adjustment Under Context Pressure
```
convergence_adjustment :: (Context, V_i, V_m) → Threshold
convergence_adjustment(ctx, V_i, V_m) =
if usage(ctx) > 0.80 then
prefer(meta_focused) ∧ accept(V_i ≥ 0.55 ∧ V_m ≥ 0.80)
else if usage(ctx) > 0.50 then
standard_dual ∧ target(V_i ≥ 0.80 ∧ V_m ≥ 0.80)
else
extended_optimization ∧ pursue(V_i ≥ 0.90)
```
**Principle**: Under high context pressure, prioritize methodology quality (V_meta) over instance quality (V_instance), as methodology is more transferable and valuable long-term.
**Validation** (Bootstrap-011):
- Context pressure: High (95%+ transferability methodology)
- Converged with: V_meta = 0.877, V_instance = 0.585
- Pattern: Meta-Focused Convergence justified by context constraints
### Context Tracking Metrics
```
context_metrics :: State → Metrics
context_metrics(s) = {
usage_percentage: tokens_used / tokens_limit,
phase_distribution: {obs: 0.30, cod: 0.40, aut: 0.20, ref: 0.10},
compression_ratio: compressed_size / original_size,
session_splits: count(checkpoints)
}
```
Track these metrics to predict when intervention is needed.
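A minimal tracker only needs the usage percentage and the two thresholds above. The sketch below, with illustrative token counts, selects which protocol to apply:

```python
"""Illustrative context tracking: compute usage and select a management protocol."""

def context_usage(tokens_used: int, tokens_limit: int) -> float:
    return tokens_used / tokens_limit

def select_protocol(usage: float) -> str:
    # Thresholds mirror the pressure rules above: >80% overflow, >50% compression.
    if usage > 0.80:
        return "overflow_protocol"     # serialize to disk, compress references, split session
    if usage > 0.50:
        return "compression_protocol"  # deduplicate, summarize history, lazy-load agents
    return "standard_protocol"

if __name__ == "__main__":
    # Example numbers are illustrative; substitute real token counts from your runtime.
    usage = context_usage(tokens_used=170_000, tokens_limit=200_000)
    print(f"usage={usage:.0%} -> {select_protocol(usage)}")
```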
---
## Prompt Evolution Protocol
λ(agent, effectiveness_data) → agent' | ∀evolution: evidence_driven:
**Systematic prompt engineering** based on empirical effectiveness data, not intuition.
### Core Prompt Patterns
```
prompt_pattern :: Pattern → Template
prompt_pattern(p) = match p {
context_bounded:
"Process {input} in chunks of {size}. For each chunk: {analysis}. Aggregate: {synthesis}.",
tool_orchestrating:
"Execute: {tool_sequence}. For each result: {validation}. If {condition}: {fallback}.",
iterative_refinement:
"Attempt: {approach_1}. Assess: {criteria}. If insufficient: {approach_2}. Repeat until: {threshold}.",
evidence_accumulation:
"Hypothesis: {H}. Seek confirming: {C}. Seek disconfirming: {D}. Weight: {W}. Decide: {decision}."
}
```
**Usage**:
- **context_bounded**: When processing large datasets (e.g., log analysis, file scanning)
- **tool_orchestrating**: When coordinating multiple MCP tools (e.g., query cascade)
- **iterative_refinement**: When solution quality improves through iteration (e.g., optimization)
- **evidence_accumulation**: When validating hypotheses (e.g., pattern discovery)
### Prompt Effectiveness Measurement
```
prompt_effectiveness :: Prompt → Metrics
prompt_effectiveness(P) = measure(
convergence_contribution: ΔV_per_iteration,
token_efficiency: output_value / tokens_used,
error_rate: failures / total_invocations,
reusability: cross_domain_success_rate
) where empirical_data ∧ comparative_baseline
```
**Metrics**:
1. **Convergence contribution**: How much does agent improve V_instance or V_meta per iteration?
2. **Token efficiency**: Value delivered per token consumed (cost-effectiveness)
3. **Error rate**: Percentage of invocations that fail or produce invalid output
4. **Reusability**: Success rate when applied to different domains
**Example** (from Bootstrap-009):
- log-analyzer agent:
- ΔV_per_iteration: +0.12 average
- Token efficiency: 0.85 (high value, moderate tokens)
- Error rate: 3% (acceptable)
- Reusability: 90% (worked in 009, 010, 012)
- Result: Prompt kept, agent reused in subsequent experiments
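These metrics can be aggregated from per-invocation logs. The sketch below uses hypothetical records and treats token efficiency as ΔV per token, which is one possible proxy for "value delivered per token":

```python
"""Illustrative prompt effectiveness scoring from per-invocation logs."""
from statistics import mean

# Hypothetical invocation records for one agent prompt version.
INVOCATIONS = [
    {"delta_v": 0.10, "tokens": 4200, "failed": False, "cross_domain_ok": True},
    {"delta_v": 0.14, "tokens": 3800, "failed": False, "cross_domain_ok": True},
    {"delta_v": 0.12, "tokens": 5100, "failed": True,  "cross_domain_ok": False},
]

def prompt_effectiveness(invocations):
    return {
        "convergence_contribution": mean(i["delta_v"] for i in invocations),
        "token_efficiency": sum(i["delta_v"] for i in invocations) / sum(i["tokens"] for i in invocations),
        "error_rate": sum(i["failed"] for i in invocations) / len(invocations),
        "reusability": sum(i["cross_domain_ok"] for i in invocations) / len(invocations),
    }

if __name__ == "__main__":
    for metric, value in prompt_effectiveness(INVOCATIONS).items():
        print(f"{metric:25s} {value:.3g}")
```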
### Prompt Evolution Decision
```
prompt_evolution :: (P, Evidence) → P'
prompt_evolution(P, E) =
if improvement_demonstrated(E) ∧ generalization_validated(E) then
update(P → P') ∧ version(P'.version + 1) ∧ document(E.rationale)
else
maintain(P) ∧ log(evolution_rejected, E.reason)
where ¬premature_optimization ∧ n_samples ≥ 3
```
**Evolution criteria**:
1. **Improvement demonstrated**: Evidence shows measurable improvement (ΔV > 0.05 or error_rate < 50%)
2. **Generalization validated**: Works across ≥3 different scenarios
3. **n_samples ≥ 3**: Avoid overfitting to single case
**Example** (theoretical - prompt evolution not yet observed in 8 experiments):
```
Original prompt: "Analyze logs for errors."
Evidence: Error detection rate 67%, false positives 23%
Evolved prompt: "Analyze {logs} for errors. For each: classify(type, severity, context). Filter: severity >= {threshold}. Output: structured_json."
Evidence: Error detection rate 89%, false positives 8%
Decision: Evolution accepted (improvement demonstrated, validated across 3 log types)
```
### Agent Prompt Protocol
```
agent_prompt_protocol :: Agent → Execution
agent_prompt_protocol(A) = ∀invocation:
read(agents/{A.name}.md) ∧
extract(prompt_latest_version) ∧
apply(prompt) ∧
track(effectiveness) ∧
¬cache_prompt
```
**Critical**: Always read the agent definition fresh (no caching) to ensure the latest prompt version is used.
**Tracking**:
- Log each invocation: agent_name, prompt_version, input, output, success/failure
- Aggregate metrics: Calculate effectiveness scores periodically
- Trigger evolution: When n_samples ≥ 3 and improvement opportunity identified
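A minimal append-only log is enough to support this tracking. The sketch below writes one JSON line per invocation to a hypothetical `knowledge/agent-invocations.jsonl` file:

```python
"""Illustrative invocation logging for the agent prompt protocol."""
import json
import time
from pathlib import Path

LOG_PATH = Path("knowledge/agent-invocations.jsonl")  # hypothetical location

def log_invocation(agent_name: str, prompt_version: str, success: bool, notes: str = "") -> None:
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.time(),
        "agent": agent_name,
        "prompt_version": prompt_version,
        "success": success,
        "notes": notes,
    }
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    log_invocation("log-analyzer", "1.0.0", success=True, notes="pattern discovery run")
```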
---
## Relationship to Other Methodologies
**bootstrapped-se is the CORE framework** that integrates and extends two complementary methodologies.
### Relationship to empirical-methodology (Inclusion)
**bootstrapped-se INCLUDES AND EXTENDS empirical-methodology**:
```
empirical-methodology (5 phases):
Observe → Analyze → Codify → Automate → Evolve
bootstrapped-se (OCA cycle + extensions):
Observe ───────────→ Codify ────→ Automate
↑ ↓
└─────────────── Evolve ──────────┘
(Self-referential feedback loop)
```
**What bootstrapped-se adds beyond empirical-methodology**:
1. **Three-Tuple Output** (O, Aₙ, Mₙ) - Reusable artifacts at system level
2. **Agent Framework** - Specialized agents emerge from domain needs
3. **Meta-Agent System** - Modular capabilities for coordination
4. **Self-Referential Loop** - Framework applies to itself
5. **Formal Convergence** - System stability criteria (M_n == M_{n-1}, A_n == A_{n-1})
**When to use empirical-methodology explicitly**:
- Need detailed scientific method guidance
- Require fine-grained observation tool selection
- Want explicit separation of Analyze phase
**When to use bootstrapped-se**:
- **Always** - It's the core framework
- All Bootstrap experiments use bootstrapped-se as foundation
- Provides complete OCA cycle with agent system
### Relationship to value-optimization (Mutual Support)
**value-optimization PROVIDES QUANTIFICATION for bootstrapped-se**:
```
bootstrapped-se needs: value-optimization provides:
- Quality measurement → Dual-layer value functions
- Convergence detection → Formal convergence criteria
- Evolution decisions → ΔV calculations, trends
- Success validation → V_instance ≥ 0.80, V_meta ≥ 0.80
```
**bootstrapped-se ENABLES value-optimization**:
- OCA cycle generates state transitions (s_i → s_{i+1})
- Agent work produces V_instance improvements
- Meta-Agent work produces V_meta improvements
- Iteration framework implements optimization loop
**When to use value-optimization**:
- **Always with bootstrapped-se** - Provides evaluation framework
- Calculate V_instance and V_meta at every iteration
- Check convergence criteria formally
- Compare across experiments
**Integration**:
```
Every bootstrapped-se iteration:
1. Execute OCA cycle (Observe → Codify → Automate)
2. Calculate V(s_n) using value-optimization
3. Check convergence (system stable + dual threshold)
4. If not converged: Continue iteration
5. If converged: Generate (O, Aₙ, Mₙ)
```
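The integration above reduces to a short loop once the OCA cycle, the value functions, and the stability check are supplied as callables. This Python skeleton shows the control flow; the 0.80 thresholds come from value-optimization, while the cap of 10 iterations and the stub callables in the usage example are added assumptions:

```python
"""Illustrative skeleton of the bootstrapped-se iteration loop with value-optimization checks."""

V_THRESHOLD = 0.80
MAX_ITERATIONS = 10  # safety cap (assumption)

def run_bootstrap(run_oca_cycle, evaluate, system_stable):
    """run_oca_cycle() -> state; evaluate(state) -> (v_instance, v_meta); system_stable(state) -> bool."""
    state = None
    for n in range(1, MAX_ITERATIONS + 1):
        state = run_oca_cycle()                # Observe -> Codify -> Automate
        v_instance, v_meta = evaluate(state)   # value-optimization layer
        converged = system_stable(state) and v_instance >= V_THRESHOLD and v_meta >= V_THRESHOLD
        print(f"iteration {n}: V_instance={v_instance:.2f} V_meta={v_meta:.2f} converged={converged}")
        if converged:
            break
    return state  # on convergence, package (O, A_n, M_n) from this state

if __name__ == "__main__":
    # Stub callables for demonstration; replace with real agents and value functions.
    scores = iter([(0.55, 0.60), (0.72, 0.78), (0.84, 0.86)])
    run_bootstrap(
        run_oca_cycle=lambda: {},
        evaluate=lambda s: next(scores),
        system_stable=lambda s: True,
    )
```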
### Three-Methodology Integration
**Complete workflow** (as used in all Bootstrap experiments):
```
┌─ methodology-framework ─────────────────────┐
│ │
│ ┌─ bootstrapped-se (CORE) ───────────────┐ │
│ │ │ │
│ │ ┌─ empirical-methodology ──────────┐ │ │
│ │ │ │ │ │
│ │ │ Observe + Analyze │ │ │
│ │ │ Codify (with evidence) │ │ │
│ │ │ Automate (CI/CD) │ │ │
│ │ │ Evolve (self-referential) │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ Produce: (O, Aₙ, Mₙ) │ │
│ │ ↓ │ │
│ │ ┌─ value-optimization ──────────────┐ │ │
│ │ │ │ │ │
│ │ │ V_instance(s_n) = domain quality │ │ │
│ │ │ V_meta(s_n) = methodology quality│ │ │
│ │ │ │ │ │
│ │ │ Convergence check: │ │ │
│ │ │ - System stable? │ │ │
│ │ │ - Dual threshold met? │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────┘
```
**Usage Recommendation**:
- **Start here**: Read bootstrapped-se.md (this file)
- **Add evaluation**: Read value-optimization.md
- **Add rigor**: Read empirical-methodology.md (optional)
- **See integration**: Read bootstrapped-ai-methodology-engineering.md (BAIME framework)
---
## Related Skills
- **bootstrapped-ai-methodology-engineering**: Unified BAIME framework integrating all three methodologies
- **empirical-methodology**: Scientific foundation (included in bootstrapped-se)
- **value-optimization**: Quantitative evaluation framework (used by bootstrapped-se)
- **iteration-executor**: Implementation agent (coordinates bootstrapped-se execution)
---
## Knowledge Base
### Source Documentation
- **Core methodology**: `docs/methodology/bootstrapped-software-engineering.md`
- **Related**: `docs/methodology/empirical-methodology-development.md`
- **Examples**: `experiments/bootstrap-*/` (8 validated experiments)
### Key Concepts
- OCA Framework (Observe-Codify-Automate)
- Three-Tuple Output (O, Aₙ, Mₙ)
- Self-Referential Feedback Loop
- Convergence Theorem
- Meta-Methodology
---
## Version History
- **v1.0.0** (2025-10-18): Initial release
- Based on meta-cc methodology development
- 8 experiments validated (95% transferability)
- OCA framework with 5-layer feedback loop
- Empirical validation from 277 commits, 11 days
---
**Status**: ✅ Production-ready
**Validation**: 8 experiments (Bootstrap-001 to -013)
**Effectiveness**: 10-50x methodology development speedup
**Transferability**: 95% (framework universal, tools adaptable)