name: bootstrapped-se
description: Apply Bootstrapped Software Engineering (BSE) methodology to evolve project-specific development practices through systematic Observe-Codify-Automate cycles
keywords: bootstrapping, meta-methodology, OCA, observe, codify, automate, self-improvement, empirical, methodology-development
category: methodology
version: 1.0.0
based_on: docs/methodology/bootstrapped-software-engineering.md
transferability: 95%
effectiveness: 10-50x methodology development speedup

Bootstrapped Software Engineering

Evolve project-specific methodologies through systematic observation, codification, and automation.

The best methodologies are not designed but evolved through systematic observation, codification, and automation of successful practices.


Core Insight

Traditional methodologies are theory-driven and static. Bootstrapped Software Engineering (BSE) enables development processes to:

  1. Observe themselves through instrumentation and data collection
  2. Codify discovered patterns into reusable methodologies
  3. Automate methodology enforcement and validation
  4. Self-improve by applying the methodology to its own evolution

Three-Tuple Output

Every BSE process produces:

(O, Aₙ, Mₙ)

where:
  O  = Task output (code, documentation, system)
  Aₙ = Converged agent set (reusable for similar tasks)
  Mₙ = Converged meta-agent (transferable to new domains)
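
For concreteness, the tuple can be modeled as a plain data structure. A minimal sketch (Python); the class and field names are illustrative, not part of the BSE specification:

from dataclasses import dataclass
from typing import List

@dataclass
class Agent:
    """A reusable agent definition (prompt + metadata)."""
    name: str
    prompt_version: int = 1

@dataclass
class BSEResult:
    """Three-tuple output (O, A_n, M_n) of a converged BSE process."""
    output: str            # O: task output (code, documentation, system)
    agents: List[Agent]    # A_n: converged agent set, reusable for similar tasks
    meta_agent: Agent      # M_n: converged meta-agent, transferable to new domains

result = BSEResult(
    output="docs/methodology/role-based-documentation.md",
    agents=[Agent("doc-writer"), Agent("data-analyst")],
    meta_agent=Agent("meta-agent"),
)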

The OCA Framework

Three-Phase Cycle: Observe → Codify → Automate

Phase 1: OBSERVE

Instrument your development process to collect data

Tools:

  • Session history analysis (meta-cc)
  • Git commit analysis
  • Code metrics (coverage, complexity)
  • Access pattern tracking
  • Error rate monitoring

Example (from meta-cc):

# Analyze file access patterns
meta-cc query files --threshold 5

# Result: plan.md accessed 423 times (highest)
# Insight: Core reference document, needs optimization

Output: Empirical data about actual development patterns
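
If meta-cc is not available, access density can be approximated from version-control history. A minimal sketch (Python) using plain git log; note that commit counts reflect edits rather than the read accesses meta-cc measures, so treat the numbers as a rough proxy:

import subprocess
from collections import Counter

def file_change_frequency(since_days: int = 30, threshold: int = 5):
    """Count how often each file appears in commits over the observation period."""
    log = subprocess.run(
        ["git", "log", f"--since={since_days} days ago", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter(line.strip() for line in log.splitlines() if line.strip())
    return [(path, n) for path, n in counts.most_common() if n >= threshold]

if __name__ == "__main__":
    for path, n in file_change_frequency():
        print(f"{n:5d}  {path}")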

Phase 2: CODIFY

Extract patterns and document as reusable methodologies

Process:

  1. Pattern Recognition: Identify recurring successful practices
  2. Hypothesis Formation: Formulate testable claims
  3. Documentation: Write methodology documents
  4. Validation: Test methodology on real scenarios

Example (from meta-cc):

# Discovered Pattern: Role-Based Documentation

Observation:
  - plan.md: 423 accesses (Coordination role)
  - CLAUDE.md: ~300 implicit loads (Entry Point role)
  - features.md: 89 accesses (Reference role)

Methodology:
  - Classify docs by actual access patterns
  - Optimize high-access docs for token efficiency
  - Create role-specific maintenance procedures

Validation:
  - CLAUDE.md reduction: 607 → 278 lines (-54%)
  - Token cost reduction: 47%
  - Access efficiency: Maintained

Output: Documented methodology with empirical validation

Phase 3: AUTOMATE

Convert methodology into automated checks and tools

Automation Levels:

  1. Detection: Automated pattern detection
  2. Validation: Check compliance with methodology
  3. Enforcement: CI/CD integration, block violations
  4. Suggestion: Automated fix recommendations

Example (from meta-cc):

# Automation: /meta doc-health capability

# Checks:
- Role classification compliance
- Token efficiency (lines < threshold)
- Cross-reference completeness
- Update frequency

# Actions:
- Flag oversized documents
- Suggest restructuring
- Validate role assignments

Output: Automated tools enforcing methodology
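
The /meta doc-health capability is project-specific, but the core check is simple to reproduce. A minimal standalone sketch (Python); the role assignments and line thresholds below are illustrative assumptions, not the project's actual configuration:

from pathlib import Path

# Hypothetical role → maximum line count, per the role-based documentation pattern
THRESHOLDS = {"entry_point": 300, "reference": 500}
ROLES = {"README.md": "entry_point", "CLAUDE.md": "entry_point", "docs/ARCHITECTURE.md": "reference"}

def check_doc_health(root: str = ".") -> int:
    """Flag documents that exceed the line budget for their role. Returns violation count."""
    violations = 0
    for rel_path, role in ROLES.items():
        path = Path(root) / rel_path
        if not path.exists():
            continue
        lines = len(path.read_text(encoding="utf-8").splitlines())
        limit = THRESHOLDS[role]
        if lines > limit:
            print(f"FAIL {rel_path}: {lines} lines > {limit} ({role})")
            violations += 1
        else:
            print(f"OK   {rel_path}: {lines} lines ({role})")
    return violations

if __name__ == "__main__":
    raise SystemExit(1 if check_doc_health() else 0)

A script like this can back a CI gate (automation level: enforce) or simply emit warnings (level: validate).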


Self-Referential Feedback Loop

The ultimate power of BSE: Apply the methodology to improve itself

Layer 0: Basic Functionality
  → Build tools (meta-cc CLI)

Layer 1: Self-Observation
  → Use tools on self (query own sessions)
  → Discovery: Usage patterns, bottlenecks

Layer 2: Pattern Recognition
  → Analyze data (R/E ratio, access density)
  → Discovery: Document roles, optimization opportunities

Layer 3: Methodology Extraction
  → Codify patterns (role-based-documentation.md)
  → Definition: Classification algorithm, maintenance procedures

Layer 4: Tool Automation
  → Implement checks (/meta doc-health)
  → Auto-validate: Methodology compliance

Layer 5: Continuous Evolution
  → Apply tools to self
  → Discover new patterns → Update methodology → Update tools

This creates a closed loop: Tools improve tools, methodologies optimize methodologies.


Parameters

  • domain: documentation | testing | architecture | custom (default: custom)
  • observation_period: number of days/commits to analyze (default: auto-detect)
  • automation_level: detect | validate | enforce | suggest (default: validate)
  • iteration_count: number of OCA cycles (default: 3)

Execution Flow

Phase 1: Observation Setup

1. Identify observation targets
   - Code metrics (LOC, complexity, coverage)
   - Development patterns (commits, PRs, errors)
   - Access patterns (file reads, searches)
   - Quality metrics (test results, build time)

2. Install instrumentation
   - meta-cc integration (session analysis)
   - Git hooks (commit tracking)
   - Coverage tracking
   - CI/CD metrics

3. Collect baseline data
   - Run for observation_period
   - Generate initial reports
   - Identify data gaps

Phase 2: Pattern Analysis

4. Analyze collected data
   - Statistical analysis (frequencies, correlations)
   - Pattern recognition (recurring behaviors)
   - Anomaly detection (outliers, inefficiencies)

5. Formulate hypotheses
   - "High-access docs should be < 300 lines"
   - "Test coverage gaps correlate with bugs"
   - "Batch remediation is 5x more efficient"

6. Validate hypotheses
   - Historical data validation
   - A/B testing if possible
   - Expert review

Phase 3: Codification

7. Document patterns
   - Pattern name and description
   - Context and applicability
   - Implementation steps
   - Validation criteria
   - Examples and counter-examples

8. Create methodology
   - Problem statement
   - Solution approach
   - Procedures and guidelines
   - Metrics and validation

9. Peer review
   - Team review
   - Iterate based on feedback

Phase 4: Automation

10. Design automation
    - Detection: Identify when pattern applies
    - Validation: Check compliance
    - Enforcement: Prevent violations
    - Suggestion: Recommend fixes

11. Implement tools
    - Scripts (bash, Python)
    - CI/CD integration
    - IDE plugins
    - Bot automation

12. Deploy and monitor
    - Gradual rollout
    - Collect usage data
    - Measure effectiveness

Phase 5: Evolution

13. Apply to self
    - Use tools on development process
    - Discover meta-patterns
    - Optimize methodology

14. Iterate OCA cycle
    - New observations → New patterns
    - Refined hypotheses → Better validation
    - Enhanced automation → Higher compliance

Usage Examples

Example 1: Documentation Optimization

# User: "Optimize project documentation"
bootstrapped-se domain=documentation

# Execution:

[OBSERVE Phase]
✓ Analyzing file access patterns (30 days)...
  - README.md: 423 accesses (Entry Point)
  - ARCHITECTURE.md: 89 accesses (Reference)
  - API.md: 234 accesses (Reference)

✓ Measuring token efficiency...
  - README.md: 1909 lines (HIGH - inefficient)
  - ARCHITECTURE.md: 456 lines (OK)
  - API.md: 789 lines (MEDIUM)

[CODIFY Phase]
✓ Pattern identified: Role-Based Documentation
  - Entry Point docs: Should be < 300 lines
  - Reference docs: Should be < 500 lines
  - Specialized docs: Can be longer

✓ Methodology documented:
  - Created: docs/methodology/role-based-documentation.md
  - Includes: Classification algorithm, optimization guide

[AUTOMATE Phase]
✓ Automation created:
  - Script: scripts/check-doc-health.sh
  - CI check: .github/workflows/doc-validation.yml
  - Capability: /meta doc-health

[EVOLUTION Phase]
✓ Applied to self:
  - Optimized README.md: 1909 → 275 lines (-85%)
  - Optimized CLAUDE.md: 607 → 278 lines (-54%)
  - Token cost reduction: 47%

✓ Next iteration insights:
  - New pattern: Cross-reference density
  - New automation: Auto-linking checker

Example 2: Testing Strategy Evolution

# User: "Develop systematic testing methodology"
bootstrapped-se domain=testing observation_period=60

# Execution:

[OBSERVE Phase]
✓ Analyzing test patterns (60 days)...
  - Coverage: 75% (good)
  - Test distribution: Uneven (80% unit, 15% integration, 5% e2e)
  - Flaky tests: 12% failure rate
  - Slowest tests: integration (avg 45s)

[CODIFY Phase]
✓ Patterns identified:
  1. Coverage-driven development works
  2. Integration tests need fixtures
  3. Flaky tests have environmental dependencies

✓ Methodology created:
  - Test strategy matrix (when to write what)
  - Fixture patterns for integration tests
  - Flaky test elimination procedures

[AUTOMATE Phase]
✓ Automation implemented:
  - Coverage gate: Block if < 75%
  - Fixture generator: Auto-create from examples
  - Flaky detector: Flag unstable tests

✓ Results:
  - Coverage: 75% → 86%
  - Flaky rate: 12% → 2%
  - Test time: 45s → 12s (with parallel execution)

Example 3: Error Recovery Patterns

# User: "Extract error recovery methodology"
bootstrapped-se domain=custom

# Execution:

[OBSERVE Phase]
✓ Analyzing error patterns...
  - Total errors: 423 (from sessions)
  - Error rate: 6.06%
  - Most common: Type errors (45%), Logic errors (30%), Deps (15%)

[CODIFY Phase]
✓ Pattern: Error Classification Taxonomy
  - Categories: Type, Logic, Dependency, Integration, Infrastructure
  - Recovery strategies per category
  - Prevention guidelines

✓ Methodology: Systematic Error Recovery
  - Detection: Error signature extraction
  - Classification: Rule-based categorization
  - Recovery: Strategy pattern application
  - Prevention: Root cause analysis → Code patterns

[AUTOMATE Phase]
✓ Tools created:
  - Error classifier (ML-based)
  - Recovery strategy recommender
  - Prevention linter (detect anti-patterns)

✓ CI/CD Integration:
  - Auto-classify build failures
  - Suggest recovery steps
  - Track error trends

Validated Outcomes

From meta-cc project (8 experiments, 95% transferability):

Documentation Methodology

  • Observation: 423 file access patterns analyzed
  • Codification: Role-based documentation methodology
  • Automation: /meta doc-health capability
  • Result: 47% token cost reduction, maintained accessibility

Testing Strategy

  • Observation: 75% coverage, uneven distribution
  • Codification: Coverage-driven gap closure
  • Automation: CI coverage gates, fixture generators
  • Result: 75% → 86% coverage, 15x speedup vs ad-hoc

Error Recovery

  • Observation: 6.06% error rate, 423 errors analyzed
  • Codification: Error taxonomy, recovery patterns
  • Automation: Error classifier, recovery recommender
  • Result: 85% transferability, systematic recovery

Dependency Health

  • Observation: 7 vulnerabilities, 11 outdated deps
  • Codification: 6 patterns (vulnerability, update, license, etc.)
  • Automation: 3 scripts + CI/CD workflow
  • Result: 6x speedup (9h → 1.5h), 88% transferability

Observability

  • Observation: 0 logs, 0 metrics, 0 traces (baseline)
  • Codification: Three Pillars methodology (Logging + Metrics + Tracing)
  • Automation: Code generators, instrumentation templates
  • Result: 23-46x speedup, 90-95% transferability

Transferability

95% transferable across domains and projects:

What Transfers (95%+)

  • OCA framework itself (universal process)
  • Self-referential feedback loop pattern
  • Observation → Pattern → Automation pipeline
  • Empirical validation approach
  • Continuous evolution mindset

What Needs Adaptation (5%)

  • Specific observation tools (meta-cc → custom tools)
  • Domain-specific patterns (docs vs testing vs architecture)
  • Automation implementation details (language, platform)

Adaptation Effort

  • Same project, new domain: 2-4 hours
  • New project, same domain: 4-8 hours
  • New project, new domain: 8-16 hours

Prerequisites

Tools Required

  • Session analysis: meta-cc or equivalent
  • Git analysis: Git installed, access to repository
  • Metrics collection: Coverage tools, static analyzers
  • Automation: CI/CD platform (GitHub Actions, GitLab CI, etc.)

Skills Required

  • Basic data analysis (statistics, pattern recognition)
  • Methodology documentation
  • Scripting (bash, Python, or equivalent)
  • CI/CD configuration

Implementation Guidance

Start Small

# Week 1: Observe
- Install meta-cc
- Track file accesses for 1 week
- Collect simple metrics

# Week 2: Codify
- Analyze top 10 access patterns
- Document 1-2 simple patterns
- Get team feedback

# Week 3: Automate
- Create 1 simple validation script
- Add to CI/CD
- Monitor compliance

# Week 4: Iterate
- Apply tools to development
- Discover new patterns
- Refine methodology

Scale Up

# Month 2: Expand domains
- Apply to testing
- Apply to architecture
- Cross-validate patterns

# Month 3: Deep automation
- Build sophisticated checkers
- Integrate with IDE
- Create dashboards

# Month 4: Evolution
- Meta-patterns emerge
- Methodology generator
- Cross-project application

Theoretical Foundation

The Convergence Theorem

Conjecture: For any domain D, there exists a methodology M* such that:

  1. M* is locally optimal for D (cannot be significantly improved)
  2. M* can be reached through bootstrapping (systematic self-improvement)
  3. Convergence speed increases with each iteration (learning effect)

Implication: We can automatically discover optimal methodologies for any domain.

Scientific Method Analogy

1. Observation     = Instrumentation (meta-cc tools)
2. Hypothesis      = "CLAUDE.md should be <300 lines"
3. Experiment      = Implement constraint, measure effects
4. Data Collection = query-files, git log analysis
5. Analysis        = Calculate R/E ratio, access density
6. Conclusion      = "300-line limit effective: 47% reduction"
7. Publication     = Codify as methodology document
8. Replication     = Apply to other projects

Success Criteria

Metric              | Target                 | Validation
--------------------|------------------------|--------------------------
Pattern Discovery   | ≥3 patterns per cycle  | Documented patterns
Methodology Quality | Peer-reviewed          | Team acceptance
Automation Coverage | ≥80% of patterns       | CI integration
Effectiveness       | ≥3x improvement        | Before/after metrics
Transferability     | ≥85% reusability       | Cross-project validation

Domain Adaptation Guide

Different domains have different complexity characteristics that affect iteration count, agent needs, and convergence patterns. This guide helps predict and adapt to domain-specific challenges.

Domain Complexity Classes

Based on 8 completed Bootstrap experiments, we've identified three complexity classes:

Simple Domains (3-4 iterations)

Characteristics:

  • Well-defined problem space
  • Clear success criteria
  • Limited interdependencies
  • Established best practices exist
  • Straightforward automation

Examples:

  • Bootstrap-010 (Dependency Health): 3 iterations

    • Clear goals: vulnerabilities, freshness, licenses
    • Existing tools: govulncheck, go-licenses
    • Straightforward automation: CI/CD scripts
    • Converged fastest in series
  • Bootstrap-011 (Knowledge Transfer): 3-4 iterations

    • Well-understood domain: onboarding paths
    • Clear structure: Day-1, Week-1, Month-1
    • Existing patterns: progressive disclosure
    • High transferability (95%+)

Adaptation Strategy:

Simple Domain Approach:
1. Start with generic agents only (coder, data-analyst, doc-writer)
2. Focus on automation (tools, scripts, CI)
3. Expect fast convergence (3-4 iterations)
4. Prioritize transferability (aim for 85%+)
5. Minimal agent specialization needed

Expected Outcomes:

  • Iterations: 3-4
  • Duration: 6-8 hours
  • Specialized agents: 0-1
  • Transferability: 85-95%
  • V_instance: Often exceeds 0.80 significantly (e.g., 0.92)

Medium Complexity Domains (4-6 iterations)

Characteristics:

  • Multiple dimensions to optimize
  • Some ambiguity in success criteria
  • Moderate interdependencies
  • Require domain expertise
  • Automation has nuances

Examples:

  • Bootstrap-001 (Documentation): 3 iterations (simple side of medium)

    • Multiple roles to define
    • Access patterns analysis needed
    • Search infrastructure complexity
    • 85% transferability
  • Bootstrap-002 (Testing): 5 iterations

    • Coverage vs quality trade-offs
    • Multiple test types (unit, integration, e2e)
    • Fixture patterns discovery
    • 89% transferability
  • Bootstrap-009 (Observability): 6 iterations

    • Three pillars (logging, metrics, tracing)
    • Performance vs verbosity trade-offs
    • Integration complexity
    • 90-95% transferability

Adaptation Strategy:

Medium Domain Approach:
1. Start with generic agents, add 1-2 specialized as needed
2. Expect iterative refinement of value functions
3. Plan for 4-6 iterations
4. Balance instance and meta objectives equally
5. Document trade-offs explicitly

Expected Outcomes:

  • Iterations: 4-6
  • Duration: 8-12 hours
  • Specialized agents: 1-3
  • Transferability: 85-90%
  • V_instance: Typically 0.80-0.87

Complex Domains (6-8+ iterations)

Characteristics:

  • High interdependency
  • Emergent patterns (not obvious upfront)
  • Multiple competing objectives
  • Requires novel agent capabilities
  • Automation is sophisticated

Examples:

  • Bootstrap-013 (Cross-Cutting Concerns): 8 iterations

    • Pattern extraction from existing code
    • Convention definition ambiguity
    • Automated enforcement complexity
    • Large codebase scope (all modules)
    • Longest experiment but highest ROI (16.7x)
  • Bootstrap-003 (Error Recovery): 5 iterations (complex side)

    • Error taxonomy creation
    • Root cause diagnosis
    • Recovery strategy patterns
    • 85% transferability
  • Bootstrap-012 (Technical Debt): 4 iterations (medium-complex)

    • SQALE quantification
    • Prioritization complexity
    • Subjective vs objective debt
    • 85% transferability

Adaptation Strategy:

Complex Domain Approach:
1. Expect agent evolution throughout
2. Plan for 6-8+ iterations
3. Accept lower initial V values (baseline often <0.35)
4. Focus on one dimension per iteration
5. Create specialized agents proactively when gaps identified
6. Document emergent patterns as discovered

Expected Outcomes:

  • Iterations: 6-8+
  • Duration: 12-18 hours
  • Specialized agents: 3-5
  • Transferability: 70-85%
  • V_instance: Hard-earned 0.80-0.85
  • Largest single-iteration gains possible (e.g., +27.3% in Bootstrap-013 Iteration 7)

Domain-Specific Considerations

Documentation-Heavy Domains

Examples: Documentation (001), Knowledge Transfer (011)

Key Adaptations:

  • Prioritize clarity over completeness
  • Role-based structuring
  • Accessibility optimization
  • Cross-referencing systems

Success Indicators:

  • Access/line ratio > 1.0
  • User satisfaction surveys
  • Search effectiveness

Technical Implementation Domains

Examples: Observability (009), Dependency Health (010)

Key Adaptations:

  • Performance overhead monitoring
  • Automation-first approach
  • Integration testing critical
  • CI/CD pipeline emphasis

Success Indicators:

  • Automated coverage %
  • Performance impact < 10%
  • CI/CD reliability

Quality/Analysis Domains

Examples: Testing (002), Error Recovery (003), Technical Debt (012)

Key Adaptations:

  • Quantification frameworks essential
  • Baseline measurement critical
  • Before/after comparisons
  • Statistical validation

Success Indicators:

  • Coverage metrics
  • Error rate reduction
  • Time savings quantified

Systematic Enforcement Domains

Examples: Cross-Cutting Concerns (013), Code Review (008 planned)

Key Adaptations:

  • Pattern extraction from existing code
  • Linter/checker development
  • Gradual enforcement rollout
  • Exception handling

Success Indicators:

  • Pattern consistency %
  • Violation detection rate
  • Developer adoption rate

Predicting Iteration Count

Based on empirical data from 8 experiments:

Base estimate: 5 iterations

Adjust based on:
  - Well-defined domain:        -2 iterations
  - Existing tools available:   -1 iteration
  - High interdependency:       +2 iterations
  - Novel patterns needed:      +1 iteration
  - Large codebase scope:       +1 iteration
  - Multiple competing goals:   +1 iteration

Examples:
  Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
  Observability:     5 + 0 + 1 = 6 → actual 6 ✓
  Cross-Cutting:     5 + 2 + 1 = 8 → actual 8 ✓
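
This estimate can be written as a small scoring function. A minimal sketch (Python); the adjustments mirror the table above, and the function itself is illustrative rather than part of the methodology:

def predict_iterations(
    well_defined: bool = False,
    existing_tools: bool = False,
    high_interdependency: bool = False,
    novel_patterns: bool = False,
    large_scope: bool = False,
    competing_goals: bool = False,
) -> int:
    """Estimate the number of OCA iterations, starting from the empirical base of 5."""
    estimate = 5
    if well_defined:          estimate -= 2
    if existing_tools:        estimate -= 1
    if high_interdependency:  estimate += 2
    if novel_patterns:        estimate += 1
    if large_scope:           estimate += 1
    if competing_goals:       estimate += 1
    return estimate

# Dependency Health: well-defined + existing tools → 2 (actual: 3)
print(predict_iterations(well_defined=True, existing_tools=True))
# Cross-Cutting Concerns: high interdependency + novel patterns → 8 (actual: 8)
print(predict_iterations(high_interdependency=True, novel_patterns=True))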

Agent Specialization Prediction

Generic agents sufficient when:
  - Domain has established patterns
  - Clear best practices exist
  - Automation is straightforward
  → Examples: Dependency Health, Knowledge Transfer

Specialized agents needed when:
  - Novel pattern extraction required
  - Domain-specific expertise needed
  - Complex analysis algorithms
  → Examples: Observability (log-analyzer, metric-designer)
                Cross-Cutting (pattern-extractor, convention-definer)

Rule of thumb:
  - Simple domains: 0-1 specialized agents
  - Medium domains: 1-3 specialized agents
  - Complex domains: 3-5 specialized agents

Meta-Agent Evolution Prediction

Key finding from 8 experiments: M₀ was sufficient in ALL cases

Meta-Agent M₀ capabilities (5):
  1. observe: Pattern observation
  2. plan: Iteration planning
  3. execute: Agent orchestration
  4. reflect: Value assessment
  5. evolve: System evolution

No evolution needed because:
  - M₀ capabilities cover full lifecycle
  - Agent specialization handles domain gaps
  - Modular design allows capability reuse

When to evolve Meta-Agent (theoretical, not yet observed):

  • Novel coordination pattern needed
  • Capability gap in lifecycle
  • Cross-agent orchestration complexity
  • New convergence pattern discovered

Convergence Pattern Prediction

Based on domain characteristics:

Standard Dual Convergence (most common):

  • Both V_instance and V_meta reach 0.80+
  • Examples: Observability (009), Dependency Health (010), Technical Debt (012), Cross-Cutting (013)
  • Use when: Both objectives equally important

Meta-Focused Convergence:

  • V_meta reaches 0.80+, V_instance practically sufficient
  • Example: Knowledge Transfer (011) - V_meta = 0.877, V_instance = 0.585
  • Use when: Methodology is primary goal, instance is vehicle

Practical Convergence:

  • Combined quality exceeds metrics, justified partial criteria
  • Example: Testing (002) - V_instance = 0.848, quality > coverage %
  • Use when: Quality evidence exceeds raw numbers

Domain Transfer Considerations

Transferability varies by domain abstraction:

High (90-95%):
  - Knowledge Transfer (95%+): Learning principles universal
  - Observability (90-95%): Three Pillars apply everywhere

Medium-High (85-90%):
  - Testing (89%): Test types similar across languages
  - Dependency Health (88%): Package manager patterns similar
  - Documentation (85%): Role-based structure universal
  - Error Recovery (85%): Error taxonomy concepts transfer
  - Technical Debt (85%): SQALE principles universal

Medium (70-85%):
  - Cross-Cutting Concerns (70-80%): Language-specific patterns
  - Refactoring (80% est.): Code smells vary by language

Adaptation effort:

Same language/ecosystem:     10-20% effort (adapt examples)
Similar language (Go→Rust):  30-40% effort (remap patterns)
Different paradigm (Go→JS):  50-60% effort (rethink patterns)

Context Management for LLM Execution

λ(iteration, context_state) → (work_output, context_optimized) | context < limit:

Context management is critical for LLM-based execution where token limits constrain iteration depth and agent effectiveness.

Context Allocation Protocol

context_allocation :: Phase → Percentage
context_allocation(phase) = match phase {
  observation → 0.30,    -- Data collection, pattern analysis
  codification → 0.40,   -- Documentation, methodology writing
  automation → 0.20,     -- Tool creation, CI integration
  reflection → 0.10      -- Evaluation, planning
} where Σ = 1.0

Rationale: Based on 8 experiments, codification consumes the most context (methodology docs, agent definitions), followed by observation (data analysis), automation (code writing), and reflection (evaluation).
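
As a concrete illustration, the shares can be turned into per-phase token budgets. A minimal sketch (Python); the 200k token limit is a placeholder assumption, not a measured value:

CONTEXT_ALLOCATION = {
    "observation": 0.30,    # data collection, pattern analysis
    "codification": 0.40,   # documentation, methodology writing
    "automation": 0.20,     # tool creation, CI integration
    "reflection": 0.10,     # evaluation, planning
}

def phase_budgets(token_limit: int = 200_000) -> dict:
    """Split an assumed context window into per-phase token budgets (shares sum to 1.0)."""
    assert abs(sum(CONTEXT_ALLOCATION.values()) - 1.0) < 1e-9
    return {phase: int(share * token_limit) for phase, share in CONTEXT_ALLOCATION.items()}

print(phase_budgets())
# {'observation': 60000, 'codification': 80000, 'automation': 40000, 'reflection': 20000}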

Context Pressure Management

context_pressure :: State → Strategy
context_pressure(s) =
  if usage(s) > 0.80 then overflow_protocol(s)
  else if usage(s) > 0.50 then compression_protocol(s)
  else standard_protocol(s)

Overflow Protocol (Context >80%)

overflow_protocol :: State → Action
overflow_protocol(s) = prioritize(
  serialize_to_disk: save(s.knowledge/*) ∧ compress(s.history),
  reference_compression: link(files) ∧ ¬inline(content),
  session_split: checkpoint(s) ∧ continue(s_{n+1}, fresh_context)
) where preserve_critical ∧ drop_redundant

Actions:

  1. Serialize to disk: Save iteration state to knowledge/ directory
  2. Reference compression: Link to files instead of inlining content
  3. Session split: Complete current phase, start new session for next iteration

Example (from Bootstrap-013, 8 iterations):

  • Iteration 4: Context 85% → Serialized analysis to knowledge/pattern-analysis.md
  • Iteration 5: Started fresh session, loaded serialized state via file references
  • Result: Continued 4 more iterations without context overflow

Compression Protocol (Context 50-80%)

compression_protocol :: State → Optimizations
compression_protocol(s) = apply(
  deduplication: merge(similar_patterns) ∧ reference_once,
  summarization: compress(historical_context) ∧ keep(structure),
  lazy_loading: defer(load) ∧ fetch_on_demand
)

Optimizations:

  1. Deduplication: Merge similar patterns, reference once
  2. Summarization: Compress historical iterations while preserving structure
  3. Lazy loading: Load agent definitions only when invoked

Convergence Adjustment Under Context Pressure

convergence_adjustment :: (Context, V_i, V_m) → Threshold
convergence_adjustment(ctx, V_i, V_m) =
  if usage(ctx) > 0.80 then
    prefer(meta_focused) ∧ accept(V_i ≥ 0.55 ∧ V_m ≥ 0.80)
  else if usage(ctx) > 0.50 then
    standard_dual ∧ target(V_i ≥ 0.80 ∧ V_m ≥ 0.80)
  else
    extended_optimization ∧ pursue(V_i ≥ 0.90)

Principle: Under high context pressure, prioritize methodology quality (V_meta) over instance quality (V_instance), as methodology is more transferable and valuable long-term.

Validation (Bootstrap-011):

  • Context pressure: High (95%+ transferability methodology)
  • Converged with: V_meta = 0.877, V_instance = 0.585
  • Pattern: Meta-Focused Convergence justified by context constraints

Context Tracking Metrics

context_metrics :: State → Metrics
context_metrics(s) = {
  usage_percentage: tokens_used / tokens_limit,
  phase_distribution: {obs: 0.30, cod: 0.40, aut: 0.20, ref: 0.10},
  compression_ratio: compressed_size / original_size,
  session_splits: count(checkpoints)
}

Track these metrics to predict when intervention is needed.
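
A sketch of the same metrics as a tracking helper (Python); the 0.50/0.80 thresholds follow the pressure protocols above, everything else is illustrative:

from dataclasses import dataclass

@dataclass
class ContextMetrics:
    tokens_used: int
    tokens_limit: int
    compressed_size: int = 0
    original_size: int = 0
    session_splits: int = 0

    @property
    def usage_percentage(self) -> float:
        return self.tokens_used / self.tokens_limit

    @property
    def compression_ratio(self) -> float:
        return self.compressed_size / self.original_size if self.original_size else 1.0

    def strategy(self) -> str:
        """Map current usage onto the pressure-management protocols."""
        if self.usage_percentage > 0.80:
            return "overflow_protocol"
        if self.usage_percentage > 0.50:
            return "compression_protocol"
        return "standard_protocol"

m = ContextMetrics(tokens_used=170_000, tokens_limit=200_000, session_splits=1)
print(m.strategy())   # overflow_protocol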


Prompt Evolution Protocol

λ(agent, effectiveness_data) → agent' | ∀evolution: evidence_driven:

Systematic prompt engineering based on empirical effectiveness data, not intuition.

Core Prompt Patterns

prompt_pattern :: Pattern → Template
prompt_pattern(p) = match p {
  context_bounded:
    "Process {input} in chunks of {size}. For each chunk: {analysis}. Aggregate: {synthesis}.",

  tool_orchestrating:
    "Execute: {tool_sequence}. For each result: {validation}. If {condition}: {fallback}.",

  iterative_refinement:
    "Attempt: {approach_1}. Assess: {criteria}. If insufficient: {approach_2}. Repeat until: {threshold}.",

  evidence_accumulation:
    "Hypothesis: {H}. Seek confirming: {C}. Seek disconfirming: {D}. Weight: {W}. Decide: {decision}."
}

Usage:

  • context_bounded: When processing large datasets (e.g., log analysis, file scanning)
  • tool_orchestrating: When coordinating multiple MCP tools (e.g., query cascade)
  • iterative_refinement: When solution quality improves through iteration (e.g., optimization)
  • evidence_accumulation: When validating hypotheses (e.g., pattern discovery)

Prompt Effectiveness Measurement

prompt_effectiveness :: Prompt → Metrics
prompt_effectiveness(P) = measure(
  convergence_contribution: ΔV_per_iteration,
  token_efficiency: output_value / tokens_used,
  error_rate: failures / total_invocations,
  reusability: cross_domain_success_rate
) where empirical_data ∧ comparative_baseline

Metrics:

  1. Convergence contribution: How much does agent improve V_instance or V_meta per iteration?
  2. Token efficiency: Value delivered per token consumed (cost-effectiveness)
  3. Error rate: Percentage of invocations that fail or produce invalid output
  4. Reusability: Success rate when applied to different domains

Example (from Bootstrap-009):

  • log-analyzer agent:
    • ΔV_per_iteration: +0.12 average
    • Token efficiency: 0.85 (high value, moderate tokens)
    • Error rate: 3% (acceptable)
    • Reusability: 90% (worked in 009, 010, 012)
  • Result: Prompt kept, agent reused in subsequent experiments
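
A sketch of how these four metrics could be aggregated from invocation logs (Python); the record fields are assumptions consistent with the tracking protocol described below, and the sample values are made up for illustration:

from statistics import mean

def prompt_effectiveness(invocations: list) -> dict:
    """Aggregate effectiveness metrics from per-invocation records.

    Each record is assumed to carry: delta_v, tokens_used, output_value,
    success (bool), and cross_domain_success (bool).
    """
    n = len(invocations)
    return {
        "convergence_contribution": mean(r["delta_v"] for r in invocations),
        "token_efficiency": sum(r["output_value"] for r in invocations)
                            / max(sum(r["tokens_used"] for r in invocations), 1),
        "error_rate": sum(1 for r in invocations if not r["success"]) / n,
        "reusability": sum(1 for r in invocations if r["cross_domain_success"]) / n,
    }

logs = [
    {"delta_v": 0.12, "tokens_used": 4000, "output_value": 3400, "success": True,  "cross_domain_success": True},
    {"delta_v": 0.10, "tokens_used": 5000, "output_value": 4100, "success": True,  "cross_domain_success": True},
    {"delta_v": 0.14, "tokens_used": 4500, "output_value": 3900, "success": False, "cross_domain_success": False},
]
print(prompt_effectiveness(logs))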

Prompt Evolution Decision

prompt_evolution :: (P, Evidence) → P'
prompt_evolution(P, E) =
  if improvement_demonstrated(E) ∧ generalization_validated(E) then
    update(P → P') ∧ version(P'.version + 1) ∧ document(E.rationale)
  else
    maintain(P) ∧ log(evolution_rejected, E.reason)
  where ¬premature_optimization ∧ n_samples ≥ 3

Evolution criteria:

  1. Improvement demonstrated: Evidence shows measurable improvement (ΔV > 0.05 or error_rate < 50%)
  2. Generalization validated: Works across ≥3 different scenarios
  3. n_samples ≥ 3: Avoid overfitting to single case

Example (theoretical - prompt evolution not yet observed in 8 experiments):

Original prompt: "Analyze logs for errors."
Evidence: Error detection rate 67%, false positives 23%

Evolved prompt: "Analyze {logs} for errors. For each: classify(type, severity, context). Filter: severity >= {threshold}. Output: structured_json."
Evidence: Error detection rate 89%, false positives 8%

Decision: Evolution accepted (improvement demonstrated, validated across 3 log types)
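
The decision rule can be stated as a small guard function. A minimal sketch (Python); the ΔV > 0.05 and n ≥ 3 thresholds come from the criteria above, while the record fields and sample evidence are illustrative assumptions:

def should_evolve_prompt(samples: list, min_samples: int = 3, min_delta_v: float = 0.05) -> bool:
    """Accept a prompt evolution only when evidence is sufficient and generalizes.

    Each sample is assumed to record: delta_v (improvement of the candidate prompt
    over the current one) and scenario (a distinct usage context).
    """
    if len(samples) < min_samples:                   # avoid overfitting to a single case
        return False
    improvement = sum(s["delta_v"] for s in samples) / len(samples)
    scenarios = {s["scenario"] for s in samples}
    return improvement > min_delta_v and len(scenarios) >= 3   # generalization across ≥3 scenarios

evidence = [
    {"delta_v": 0.08, "scenario": "app-logs"},
    {"delta_v": 0.07, "scenario": "ci-logs"},
    {"delta_v": 0.09, "scenario": "test-logs"},
]
print(should_evolve_prompt(evidence))   # True → update prompt, bump version, document rationale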

Agent Prompt Protocol

agent_prompt_protocol :: Agent → Execution
agent_prompt_protocol(A) = ∀invocation:
  read(agents/{A.name}.md) ∧
  extract(prompt_latest_version) ∧
  apply(prompt) ∧
  track(effectiveness) ∧
  ¬cache_prompt

Critical: Always read the agent definition fresh (no caching) to ensure the latest prompt version is used.

Tracking:

  • Log each invocation: agent_name, prompt_version, input, output, success/failure
  • Aggregate metrics: Calculate effectiveness scores periodically
  • Trigger evolution: When n_samples ≥ 3 and improvement opportunity identified

Relationship to Other Methodologies

bootstrapped-se is the CORE framework that integrates and extends two complementary methodologies.

Relationship to empirical-methodology (Inclusion)

bootstrapped-se INCLUDES AND EXTENDS empirical-methodology:

empirical-methodology (5 phases):
  Observe → Analyze → Codify → Automate → Evolve

bootstrapped-se (OCA cycle + extensions):
  Observe ───────────→ Codify ────→ Automate
     ↑                                  ↓
     └─────────────── Evolve ──────────┘
     (Self-referential feedback loop)

What bootstrapped-se adds beyond empirical-methodology:

  1. Three-Tuple Output (O, Aₙ, Mₙ) - Reusable artifacts at system level
  2. Agent Framework - Specialized agents emerge from domain needs
  3. Meta-Agent System - Modular capabilities for coordination
  4. Self-Referential Loop - Framework applies to itself
  5. Formal Convergence - System stability criteria (M_n == M_{n-1}, A_n == A_{n-1})

When to use empirical-methodology explicitly:

  • Need detailed scientific method guidance
  • Require fine-grained observation tool selection
  • Want explicit separation of Analyze phase

When to use bootstrapped-se:

  • Always - It's the core framework
  • All Bootstrap experiments use bootstrapped-se as foundation
  • Provides complete OCA cycle with agent system

Relationship to value-optimization (Mutual Support)

value-optimization PROVIDES QUANTIFICATION for bootstrapped-se:

bootstrapped-se needs:          value-optimization provides:
- Quality measurement      →    Dual-layer value functions
- Convergence detection    →    Formal convergence criteria
- Evolution decisions      →    ΔV calculations, trends
- Success validation       →    V_instance ≥ 0.80, V_meta ≥ 0.80

bootstrapped-se ENABLES value-optimization:

  • OCA cycle generates state transitions (s_i → s_{i+1})
  • Agent work produces V_instance improvements
  • Meta-Agent work produces V_meta improvements
  • Iteration framework implements optimization loop

When to use value-optimization:

  • Always with bootstrapped-se - Provides evaluation framework
  • Calculate V_instance and V_meta at every iteration
  • Check convergence criteria formally
  • Compare across experiments

Integration:

Every bootstrapped-se iteration:
  1. Execute OCA cycle (Observe → Codify → Automate)
  2. Calculate V(s_n) using value-optimization
  3. Check convergence (system stable + dual threshold)
  4. If not converged: Continue iteration
  5. If converged: Generate (O, Aₙ, Mₙ)
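
A minimal sketch of that loop as control flow (Python); run_oca_cycle, v_instance, v_meta, and the state fields are placeholders for a project's own implementations, and 0.80 is the dual threshold from value-optimization:

def bootstrap(state, run_oca_cycle, v_instance, v_meta, max_iterations: int = 10, threshold: float = 0.80):
    """Iterate OCA cycles until the dual convergence criterion is met (or iterations run out)."""
    previous_agents, previous_meta = None, None
    for i in range(1, max_iterations + 1):
        state = run_oca_cycle(state)                       # Observe → Codify → Automate
        vi, vm = v_instance(state), v_meta(state)          # value-optimization layer
        system_stable = (state.agents == previous_agents and state.meta_agent == previous_meta)
        print(f"iteration {i}: V_instance={vi:.2f} V_meta={vm:.2f} stable={system_stable}")
        if system_stable and vi >= threshold and vm >= threshold:
            return state.output, state.agents, state.meta_agent   # (O, A_n, M_n)
        previous_agents, previous_meta = state.agents, state.meta_agent
    return state.output, state.agents, state.meta_agent           # best effort if not converged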

Three-Methodology Integration

Complete workflow (as used in all Bootstrap experiments):

┌─ methodology-framework ─────────────────────┐
│                                              │
│  ┌─ bootstrapped-se (CORE) ───────────────┐ │
│  │                                         │ │
│  │  ┌─ empirical-methodology ──────────┐  │ │
│  │  │                                   │  │ │
│  │  │  Observe + Analyze                │  │ │
│  │  │  Codify (with evidence)           │  │ │
│  │  │  Automate (CI/CD)                 │  │ │
│  │  │  Evolve (self-referential)        │  │ │
│  │  │                                   │  │ │
│  │  └───────────────────────────────────┘  │ │
│  │                 ↓                        │ │
│  │  Produce: (O, Aₙ, Mₙ)                   │ │
│  │                 ↓                        │ │
│  │  ┌─ value-optimization ──────────────┐  │ │
│  │  │                                   │  │ │
│  │  │  V_instance(s_n) = domain quality │  │ │
│  │  │  V_meta(s_n) = methodology quality│  │ │
│  │  │                                   │  │ │
│  │  │  Convergence check:               │  │ │
│  │  │  - System stable?                 │  │ │
│  │  │  - Dual threshold met?            │  │ │
│  │  │                                   │  │ │
│  │  └───────────────────────────────────┘  │ │
│  │                                         │ │
│  └─────────────────────────────────────────┘ │
│                                              │
└──────────────────────────────────────────────┘

Usage Recommendation:

  • Start here: Read bootstrapped-se.md (this file)
  • Add evaluation: Read value-optimization.md
  • Add rigor: Read empirical-methodology.md (optional)
  • See integration: Read bootstrapped-ai-methodology-engineering.md (BAIME framework)

Related skills and agents:

  • bootstrapped-ai-methodology-engineering: Unified BAIME framework integrating all three methodologies
  • empirical-methodology: Scientific foundation (included in bootstrapped-se)
  • value-optimization: Quantitative evaluation framework (used by bootstrapped-se)
  • iteration-executor: Implementation agent (coordinates bootstrapped-se execution)

Knowledge Base

Source Documentation

  • Core methodology: docs/methodology/bootstrapped-software-engineering.md
  • Related: docs/methodology/empirical-methodology-development.md
  • Examples: experiments/bootstrap-*/ (8 validated experiments)

Key Concepts

  • OCA Framework (Observe-Codify-Automate)
  • Three-Tuple Output (O, Aₙ, Mₙ)
  • Self-Referential Feedback Loop
  • Convergence Theorem
  • Meta-Methodology

Version History

  • v1.0.0 (2025-10-18): Initial release
    • Based on meta-cc methodology development
    • 8 experiments validated (95% transferability)
    • OCA framework with 5-layer feedback loop
    • Empirical validation from 277 commits, 11 days

Status: Production-ready
Validation: 8 experiments (Bootstrap-001 to -013)
Effectiveness: 10-50x methodology development speedup
Transferability: 95% (framework universal, tools adaptable)