name: bootstrapped-se
description: Apply Bootstrapped Software Engineering (BSE) methodology to evolve project-specific development practices through systematic Observe-Codify-Automate cycles
keywords: bootstrapping, meta-methodology, OCA, observe, codify, automate, self-improvement, empirical, methodology-development
category: methodology
version: 1.0.0
based_on: docs/methodology/bootstrapped-software-engineering.md
transferability: 95%
effectiveness: 10-50x methodology development speedup

Bootstrapped Software Engineering

Evolve project-specific methodologies through systematic observation, codification, and automation.

The best methodologies are not designed but evolved through systematic observation, codification, and automation of successful practices.


Core Insight

Traditional methodologies are theory-driven and static. Bootstrapped Software Engineering (BSE) enables development processes to:

  1. Observe themselves through instrumentation and data collection
  2. Codify discovered patterns into reusable methodologies
  3. Automate methodology enforcement and validation
  4. Self-improve by applying the methodology to its own evolution

Three-Tuple Output

Every BSE process produces:

(O, Aₙ, Mₙ)

where:
  O  = Task output (code, documentation, system)
  Aₙ = Converged agent set (reusable for similar tasks)
  Mₙ = Converged meta-agent (transferable to new domains)
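
For concreteness, the tuple can be modeled as a plain data structure. A minimal sketch (Python); the class and field names are illustrative, not part of the BSE specification:

from dataclasses import dataclass
from typing import List

@dataclass
class Agent:
    """A reusable agent definition (prompt + metadata)."""
    name: str
    prompt_version: int = 1

@dataclass
class BSEResult:
    """Three-tuple output (O, A_n, M_n) of a converged BSE process."""
    output: str            # O: task output (code, documentation, system)
    agents: List[Agent]    # A_n: converged agent set, reusable for similar tasks
    meta_agent: Agent      # M_n: converged meta-agent, transferable to new domains

result = BSEResult(
    output="docs/methodology/role-based-documentation.md",
    agents=[Agent("doc-writer"), Agent("data-analyst")],
    meta_agent=Agent("meta-agent"),
)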

The OCA Framework

Three-Phase Cycle: Observe → Codify → Automate

Phase 1: OBSERVE

Instrument your development process to collect data

Tools:

  • Session history analysis (meta-cc)
  • Git commit analysis
  • Code metrics (coverage, complexity)
  • Access pattern tracking
  • Error rate monitoring

Example (from meta-cc):

# Analyze file access patterns
meta-cc query files --threshold 5

# Result: plan.md accessed 423 times (highest)
# Insight: Core reference document, needs optimization

Output: Empirical data about actual development patterns
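
If meta-cc is not available, access density can be approximated from version-control history. A minimal sketch (Python) using plain git log; note that commit counts reflect edits rather than the read accesses meta-cc measures, so treat the numbers as a rough proxy:

import subprocess
from collections import Counter

def file_change_frequency(since_days: int = 30, threshold: int = 5):
    """Count how often each file appears in commits over the observation period."""
    log = subprocess.run(
        ["git", "log", f"--since={since_days} days ago", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter(line.strip() for line in log.splitlines() if line.strip())
    return [(path, n) for path, n in counts.most_common() if n >= threshold]

if __name__ == "__main__":
    for path, n in file_change_frequency():
        print(f"{n:5d}  {path}")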

Phase 2: CODIFY

Extract patterns and document as reusable methodologies

Process:

  1. Pattern Recognition: Identify recurring successful practices
  2. Hypothesis Formation: Formulate testable claims
  3. Documentation: Write methodology documents
  4. Validation: Test methodology on real scenarios

Example (from meta-cc):

# Discovered Pattern: Role-Based Documentation

Observation:
  - plan.md: 423 accesses (Coordination role)
  - CLAUDE.md: ~300 implicit loads (Entry Point role)
  - features.md: 89 accesses (Reference role)

Methodology:
  - Classify docs by actual access patterns
  - Optimize high-access docs for token efficiency
  - Create role-specific maintenance procedures

Validation:
  - CLAUDE.md reduction: 607 → 278 lines (-54%)
  - Token cost reduction: 47%
  - Access efficiency: Maintained

Output: Documented methodology with empirical validation

Phase 3: AUTOMATE

Convert methodology into automated checks and tools

Automation Levels:

  1. Detection: Automated pattern detection
  2. Validation: Check compliance with methodology
  3. Enforcement: CI/CD integration, block violations
  4. Suggestion: Automated fix recommendations

Example (from meta-cc):

# Automation: /meta doc-health capability

# Checks:
- Role classification compliance
- Token efficiency (lines < threshold)
- Cross-reference completeness
- Update frequency

# Actions:
- Flag oversized documents
- Suggest restructuring
- Validate role assignments

Output: Automated tools enforcing methodology
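
The /meta doc-health capability is project-specific, but the core check is simple to reproduce. A minimal standalone sketch (Python); the role assignments and line thresholds below are illustrative assumptions, not the project's actual configuration:

from pathlib import Path

# Hypothetical role → maximum line count, per the role-based documentation pattern
THRESHOLDS = {"entry_point": 300, "reference": 500}
ROLES = {"README.md": "entry_point", "CLAUDE.md": "entry_point", "docs/ARCHITECTURE.md": "reference"}

def check_doc_health(root: str = ".") -> int:
    """Flag documents that exceed the line budget for their role. Returns violation count."""
    violations = 0
    for rel_path, role in ROLES.items():
        path = Path(root) / rel_path
        if not path.exists():
            continue
        lines = len(path.read_text(encoding="utf-8").splitlines())
        limit = THRESHOLDS[role]
        if lines > limit:
            print(f"FAIL {rel_path}: {lines} lines > {limit} ({role})")
            violations += 1
        else:
            print(f"OK   {rel_path}: {lines} lines ({role})")
    return violations

if __name__ == "__main__":
    raise SystemExit(1 if check_doc_health() else 0)

A script like this can back a CI gate (automation level: enforce) or simply emit warnings (level: validate).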


Self-Referential Feedback Loop

The ultimate power of BSE: Apply the methodology to improve itself

Layer 0: Basic Functionality
  → Build tools (meta-cc CLI)

Layer 1: Self-Observation
  → Use tools on self (query own sessions)
  → Discovery: Usage patterns, bottlenecks

Layer 2: Pattern Recognition
  → Analyze data (R/E ratio, access density)
  → Discovery: Document roles, optimization opportunities

Layer 3: Methodology Extraction
  → Codify patterns (role-based-documentation.md)
  → Definition: Classification algorithm, maintenance procedures

Layer 4: Tool Automation
  → Implement checks (/meta doc-health)
  → Auto-validate: Methodology compliance

Layer 5: Continuous Evolution
  → Apply tools to self
  → Discover new patterns → Update methodology → Update tools

This creates a closed loop: Tools improve tools, methodologies optimize methodologies.


Parameters

  • domain: documentation | testing | architecture | custom (default: custom)
  • observation_period: number of days/commits to analyze (default: auto-detect)
  • automation_level: detect | validate | enforce | suggest (default: validate)
  • iteration_count: number of OCA cycles (default: 3)

Execution Flow

Phase 1: Observation Setup

1. Identify observation targets
   - Code metrics (LOC, complexity, coverage)
   - Development patterns (commits, PRs, errors)
   - Access patterns (file reads, searches)
   - Quality metrics (test results, build time)

2. Install instrumentation
   - meta-cc integration (session analysis)
   - Git hooks (commit tracking)
   - Coverage tracking
   - CI/CD metrics

3. Collect baseline data
   - Run for observation_period
   - Generate initial reports
   - Identify data gaps

Phase 2: Pattern Analysis

4. Analyze collected data
   - Statistical analysis (frequencies, correlations)
   - Pattern recognition (recurring behaviors)
   - Anomaly detection (outliers, inefficiencies)

5. Formulate hypotheses
   - "High-access docs should be < 300 lines"
   - "Test coverage gaps correlate with bugs"
   - "Batch remediation is 5x more efficient"

6. Validate hypotheses
   - Historical data validation
   - A/B testing if possible
   - Expert review

Phase 3: Codification

7. Document patterns
   - Pattern name and description
   - Context and applicability
   - Implementation steps
   - Validation criteria
   - Examples and counter-examples

8. Create methodology
   - Problem statement
   - Solution approach
   - Procedures and guidelines
   - Metrics and validation

9. Peer review
   - Team review
   - Iterate based on feedback

Phase 4: Automation

10. Design automation
    - Detection: Identify when pattern applies
    - Validation: Check compliance
    - Enforcement: Prevent violations
    - Suggestion: Recommend fixes

11. Implement tools
    - Scripts (bash, Python)
    - CI/CD integration
    - IDE plugins
    - Bot automation

12. Deploy and monitor
    - Gradual rollout
    - Collect usage data
    - Measure effectiveness

Phase 5: Evolution

13. Apply to self
    - Use tools on development process
    - Discover meta-patterns
    - Optimize methodology

14. Iterate OCA cycle
    - New observations → New patterns
    - Refined hypotheses → Better validation
    - Enhanced automation → Higher compliance

Usage Examples

Example 1: Documentation Optimization

# User: "Optimize project documentation"
bootstrapped-se domain=documentation

# Execution:

[OBSERVE Phase]
✓ Analyzing file access patterns (30 days)...
  - README.md: 423 accesses (Entry Point)
  - ARCHITECTURE.md: 89 accesses (Reference)
  - API.md: 234 accesses (Reference)

✓ Measuring token efficiency...
  - README.md: 1909 lines (HIGH - inefficient)
  - ARCHITECTURE.md: 456 lines (OK)
  - API.md: 789 lines (MEDIUM)

[CODIFY Phase]
✓ Pattern identified: Role-Based Documentation
  - Entry Point docs: Should be < 300 lines
  - Reference docs: Should be < 500 lines
  - Specialized docs: Can be longer

✓ Methodology documented:
  - Created: docs/methodology/role-based-documentation.md
  - Includes: Classification algorithm, optimization guide

[AUTOMATE Phase]
✓ Automation created:
  - Script: scripts/check-doc-health.sh
  - CI check: .github/workflows/doc-validation.yml
  - Capability: /meta doc-health

[EVOLUTION Phase]
✓ Applied to self:
  - Optimized README.md: 1909 → 275 lines (-85%)
  - Optimized CLAUDE.md: 607 → 278 lines (-54%)
  - Token cost reduction: 47%

✓ Next iteration insights:
  - New pattern: Cross-reference density
  - New automation: Auto-linking checker

Example 2: Testing Strategy Evolution

# User: "Develop systematic testing methodology"
bootstrapped-se domain=testing observation_period=60

# Execution:

[OBSERVE Phase]
✓ Analyzing test patterns (60 days)...
  - Coverage: 75% (good)
  - Test distribution: Uneven (80% unit, 15% integration, 5% e2e)
  - Flaky tests: 12% failure rate
  - Slowest tests: integration (avg 45s)

[CODIFY Phase]
✓ Patterns identified:
  1. Coverage-driven development works
  2. Integration tests need fixtures
  3. Flaky tests have environmental dependencies

✓ Methodology created:
  - Test strategy matrix (when to write what)
  - Fixture patterns for integration tests
  - Flaky test elimination procedures

[AUTOMATE Phase]
✓ Automation implemented:
  - Coverage gate: Block if < 75%
  - Fixture generator: Auto-create from examples
  - Flaky detector: Flag unstable tests

✓ Results:
  - Coverage: 75% → 86%
  - Flaky rate: 12% → 2%
  - Test time: 45s → 12s (with parallel execution)

Example 3: Error Recovery Patterns

# User: "Extract error recovery methodology"
bootstrapped-se domain=custom

# Execution:

[OBSERVE Phase]
✓ Analyzing error patterns...
  - Total errors: 423 (from sessions)
  - Error rate: 6.06%
  - Most common: Type errors (45%), Logic errors (30%), Deps (15%)

[CODIFY Phase]
✓ Pattern: Error Classification Taxonomy
  - Categories: Type, Logic, Dependency, Integration, Infrastructure
  - Recovery strategies per category
  - Prevention guidelines

✓ Methodology: Systematic Error Recovery
  - Detection: Error signature extraction
  - Classification: Rule-based categorization
  - Recovery: Strategy pattern application
  - Prevention: Root cause analysis → Code patterns

[AUTOMATE Phase]
✓ Tools created:
  - Error classifier (ML-based)
  - Recovery strategy recommender
  - Prevention linter (detect anti-patterns)

✓ CI/CD Integration:
  - Auto-classify build failures
  - Suggest recovery steps
  - Track error trends

Validated Outcomes

From meta-cc project (8 experiments, 95% transferability):

Documentation Methodology

  • Observation: 423 file access patterns analyzed
  • Codification: Role-based documentation methodology
  • Automation: /meta doc-health capability
  • Result: 47% token cost reduction, maintained accessibility

Testing Strategy

  • Observation: 75% coverage, uneven distribution
  • Codification: Coverage-driven gap closure
  • Automation: CI coverage gates, fixture generators
  • Result: 75% → 86% coverage, 15x speedup vs ad-hoc

Error Recovery

  • Observation: 6.06% error rate, 423 errors analyzed
  • Codification: Error taxonomy, recovery patterns
  • Automation: Error classifier, recovery recommender
  • Result: 85% transferability, systematic recovery

Dependency Health

  • Observation: 7 vulnerabilities, 11 outdated deps
  • Codification: 6 patterns (vulnerability, update, license, etc.)
  • Automation: 3 scripts + CI/CD workflow
  • Result: 6x speedup (9h → 1.5h), 88% transferability

Observability

  • Observation: 0 logs, 0 metrics, 0 traces (baseline)
  • Codification: Three Pillars methodology (Logging + Metrics + Tracing)
  • Automation: Code generators, instrumentation templates
  • Result: 23-46x speedup, 90-95% transferability

Transferability

95% transferable across domains and projects:

What Transfers (95%+)

  • OCA framework itself (universal process)
  • Self-referential feedback loop pattern
  • Observation → Pattern → Automation pipeline
  • Empirical validation approach
  • Continuous evolution mindset

What Needs Adaptation (5%)

  • Specific observation tools (meta-cc → custom tools)
  • Domain-specific patterns (docs vs testing vs architecture)
  • Automation implementation details (language, platform)

Adaptation Effort

  • Same project, new domain: 2-4 hours
  • New project, same domain: 4-8 hours
  • New project, new domain: 8-16 hours

Prerequisites

Tools Required

  • Session analysis: meta-cc or equivalent
  • Git analysis: Git installed, access to repository
  • Metrics collection: Coverage tools, static analyzers
  • Automation: CI/CD platform (GitHub Actions, GitLab CI, etc.)

Skills Required

  • Basic data analysis (statistics, pattern recognition)
  • Methodology documentation
  • Scripting (bash, Python, or equivalent)
  • CI/CD configuration

Implementation Guidance

Start Small

# Week 1: Observe
- Install meta-cc
- Track file accesses for 1 week
- Collect simple metrics

# Week 2: Codify
- Analyze top 10 access patterns
- Document 1-2 simple patterns
- Get team feedback

# Week 3: Automate
- Create 1 simple validation script
- Add to CI/CD
- Monitor compliance

# Week 4: Iterate
- Apply tools to development
- Discover new patterns
- Refine methodology

Scale Up

# Month 2: Expand domains
- Apply to testing
- Apply to architecture
- Cross-validate patterns

# Month 3: Deep automation
- Build sophisticated checkers
- Integrate with IDE
- Create dashboards

# Month 4: Evolution
- Meta-patterns emerge
- Methodology generator
- Cross-project application

Theoretical Foundation

The Convergence Theorem

Conjecture: For any domain D, there exists a methodology M* such that:

  1. M* is locally optimal for D (cannot be significantly improved)
  2. M* can be reached through bootstrapping (systematic self-improvement)
  3. Convergence speed increases with each iteration (learning effect)

Implication: We can automatically discover optimal methodologies for any domain.

Scientific Method Analogy

1. Observation     = Instrumentation (meta-cc tools)
2. Hypothesis      = "CLAUDE.md should be <300 lines"
3. Experiment      = Implement constraint, measure effects
4. Data Collection = query-files, git log analysis
5. Analysis        = Calculate R/E ratio, access density
6. Conclusion      = "300-line limit effective: 47% reduction"
7. Publication     = Codify as methodology document
8. Replication     = Apply to other projects

Success Criteria

Metric              | Target                 | Validation
--------------------|------------------------|--------------------------
Pattern Discovery   | ≥3 patterns per cycle  | Documented patterns
Methodology Quality | Peer-reviewed          | Team acceptance
Automation Coverage | ≥80% of patterns       | CI integration
Effectiveness       | ≥3x improvement        | Before/after metrics
Transferability     | ≥85% reusability       | Cross-project validation

Domain Adaptation Guide

Different domains have different complexity characteristics that affect iteration count, agent needs, and convergence patterns. This guide helps predict and adapt to domain-specific challenges.

Domain Complexity Classes

Based on 8 completed Bootstrap experiments, we've identified three complexity classes:

Simple Domains (3-4 iterations)

Characteristics:

  • Well-defined problem space
  • Clear success criteria
  • Limited interdependencies
  • Established best practices exist
  • Straightforward automation

Examples:

  • Bootstrap-010 (Dependency Health): 3 iterations

    • Clear goals: vulnerabilities, freshness, licenses
    • Existing tools: govulncheck, go-licenses
    • Straightforward automation: CI/CD scripts
    • Converged fastest in series
  • Bootstrap-011 (Knowledge Transfer): 3-4 iterations

    • Well-understood domain: onboarding paths
    • Clear structure: Day-1, Week-1, Month-1
    • Existing patterns: progressive disclosure
    • High transferability (95%+)

Adaptation Strategy:

Simple Domain Approach:
1. Start with generic agents only (coder, data-analyst, doc-writer)
2. Focus on automation (tools, scripts, CI)
3. Expect fast convergence (3-4 iterations)
4. Prioritize transferability (aim for 85%+)
5. Minimal agent specialization needed

Expected Outcomes:

  • Iterations: 3-4
  • Duration: 6-8 hours
  • Specialized agents: 0-1
  • Transferability: 85-95%
  • V_instance: Often exceeds 0.80 significantly (e.g., 0.92)

Medium Complexity Domains (4-6 iterations)

Characteristics:

  • Multiple dimensions to optimize
  • Some ambiguity in success criteria
  • Moderate interdependencies
  • Require domain expertise
  • Automation has nuances

Examples:

  • Bootstrap-001 (Documentation): 3 iterations (simple side of medium)

    • Multiple roles to define
    • Access patterns analysis needed
    • Search infrastructure complexity
    • 85% transferability
  • Bootstrap-002 (Testing): 5 iterations

    • Coverage vs quality trade-offs
    • Multiple test types (unit, integration, e2e)
    • Fixture patterns discovery
    • 89% transferability
  • Bootstrap-009 (Observability): 6 iterations

    • Three pillars (logging, metrics, tracing)
    • Performance vs verbosity trade-offs
    • Integration complexity
    • 90-95% transferability

Adaptation Strategy:

Medium Domain Approach:
1. Start with generic agents, add 1-2 specialized as needed
2. Expect iterative refinement of value functions
3. Plan for 4-6 iterations
4. Balance instance and meta objectives equally
5. Document trade-offs explicitly

Expected Outcomes:

  • Iterations: 4-6
  • Duration: 8-12 hours
  • Specialized agents: 1-3
  • Transferability: 85-90%
  • V_instance: Typically 0.80-0.87

Complex Domains (6-8+ iterations)

Characteristics:

  • High interdependency
  • Emergent patterns (not obvious upfront)
  • Multiple competing objectives
  • Requires novel agent capabilities
  • Automation is sophisticated

Examples:

  • Bootstrap-013 (Cross-Cutting Concerns): 8 iterations

    • Pattern extraction from existing code
    • Convention definition ambiguity
    • Automated enforcement complexity
    • Large codebase scope (all modules)
    • Longest experiment but highest ROI (16.7x)
  • Bootstrap-003 (Error Recovery): 5 iterations (complex side)

    • Error taxonomy creation
    • Root cause diagnosis
    • Recovery strategy patterns
    • 85% transferability
  • Bootstrap-012 (Technical Debt): 4 iterations (medium-complex)

    • SQALE quantification
    • Prioritization complexity
    • Subjective vs objective debt
    • 85% transferability

Adaptation Strategy:

Complex Domain Approach:
1. Expect agent evolution throughout
2. Plan for 6-8+ iterations
3. Accept lower initial V values (baseline often <0.35)
4. Focus on one dimension per iteration
5. Create specialized agents proactively when gaps identified
6. Document emergent patterns as discovered

Expected Outcomes:

  • Iterations: 6-8+
  • Duration: 12-18 hours
  • Specialized agents: 3-5
  • Transferability: 70-85%
  • V_instance: Hard-earned 0.80-0.85
  • Largest single-iteration gains possible (e.g., +27.3% in Bootstrap-013 Iteration 7)

Domain-Specific Considerations

Documentation-Heavy Domains

Examples: Documentation (001), Knowledge Transfer (011)

Key Adaptations:

  • Prioritize clarity over completeness
  • Role-based structuring
  • Accessibility optimization
  • Cross-referencing systems

Success Indicators:

  • Access/line ratio > 1.0
  • User satisfaction surveys
  • Search effectiveness

Technical Implementation Domains

Examples: Observability (009), Dependency Health (010)

Key Adaptations:

  • Performance overhead monitoring
  • Automation-first approach
  • Integration testing critical
  • CI/CD pipeline emphasis

Success Indicators:

  • Automated coverage %
  • Performance impact < 10%
  • CI/CD reliability

Quality/Analysis Domains

Examples: Testing (002), Error Recovery (003), Technical Debt (012)

Key Adaptations:

  • Quantification frameworks essential
  • Baseline measurement critical
  • Before/after comparisons
  • Statistical validation

Success Indicators:

  • Coverage metrics
  • Error rate reduction
  • Time savings quantified

Systematic Enforcement Domains

Examples: Cross-Cutting Concerns (013), Code Review (008 planned)

Key Adaptations:

  • Pattern extraction from existing code
  • Linter/checker development
  • Gradual enforcement rollout
  • Exception handling

Success Indicators:

  • Pattern consistency %
  • Violation detection rate
  • Developer adoption rate

Predicting Iteration Count

Based on empirical data from 8 experiments:

Base estimate: 5 iterations

Adjust based on:
  - Well-defined domain:        -2 iterations
  - Existing tools available:   -1 iteration
  - High interdependency:       +2 iterations
  - Novel patterns needed:      +1 iteration
  - Large codebase scope:       +1 iteration
  - Multiple competing goals:   +1 iteration

Examples:
  Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
  Observability:     5 + 0 + 1 = 6 → actual 6 ✓
  Cross-Cutting:     5 + 2 + 1 = 8 → actual 8 ✓
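
This estimate can be written as a small scoring function. A minimal sketch (Python); the adjustments mirror the table above, and the function itself is illustrative rather than part of the methodology:

def predict_iterations(
    well_defined: bool = False,
    existing_tools: bool = False,
    high_interdependency: bool = False,
    novel_patterns: bool = False,
    large_scope: bool = False,
    competing_goals: bool = False,
) -> int:
    """Estimate the number of OCA iterations, starting from the empirical base of 5."""
    estimate = 5
    if well_defined:          estimate -= 2
    if existing_tools:        estimate -= 1
    if high_interdependency:  estimate += 2
    if novel_patterns:        estimate += 1
    if large_scope:           estimate += 1
    if competing_goals:       estimate += 1
    return estimate

# Dependency Health: well-defined + existing tools → 2 (actual: 3)
print(predict_iterations(well_defined=True, existing_tools=True))
# Cross-Cutting Concerns: high interdependency + novel patterns → 8 (actual: 8)
print(predict_iterations(high_interdependency=True, novel_patterns=True))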

Agent Specialization Prediction

Generic agents sufficient when:
  - Domain has established patterns
  - Clear best practices exist
  - Automation is straightforward
  → Examples: Dependency Health, Knowledge Transfer

Specialized agents needed when:
  - Novel pattern extraction required
  - Domain-specific expertise needed
  - Complex analysis algorithms
  → Examples: Observability (log-analyzer, metric-designer)
                Cross-Cutting (pattern-extractor, convention-definer)

Rule of thumb:
  - Simple domains: 0-1 specialized agents
  - Medium domains: 1-3 specialized agents
  - Complex domains: 3-5 specialized agents

Meta-Agent Evolution Prediction

Key finding from 8 experiments: M₀ was sufficient in ALL cases

Meta-Agent M₀ capabilities (5):
  1. observe: Pattern observation
  2. plan: Iteration planning
  3. execute: Agent orchestration
  4. reflect: Value assessment
  5. evolve: System evolution

No evolution needed because:
  - M₀ capabilities cover full lifecycle
  - Agent specialization handles domain gaps
  - Modular design allows capability reuse

When to evolve Meta-Agent (theoretical, not yet observed):

  • Novel coordination pattern needed
  • Capability gap in lifecycle
  • Cross-agent orchestration complexity
  • New convergence pattern discovered

Convergence Pattern Prediction

Based on domain characteristics:

Standard Dual Convergence (most common):

  • Both V_instance and V_meta reach 0.80+
  • Examples: Observability (009), Dependency Health (010), Technical Debt (012), Cross-Cutting (013)
  • Use when: Both objectives equally important

Meta-Focused Convergence:

  • V_meta reaches 0.80+, V_instance practically sufficient
  • Example: Knowledge Transfer (011) - V_meta = 0.877, V_instance = 0.585
  • Use when: Methodology is primary goal, instance is vehicle

Practical Convergence:

  • Combined quality exceeds metrics, justified partial criteria
  • Example: Testing (002) - V_instance = 0.848, quality > coverage %
  • Use when: Quality evidence exceeds raw numbers

Domain Transfer Considerations

Transferability varies by domain abstraction:

High (90-95%):
  - Knowledge Transfer (95%+): Learning principles universal
  - Observability (90-95%): Three Pillars apply everywhere

Medium-High (85-90%):
  - Testing (89%): Test types similar across languages
  - Dependency Health (88%): Package manager patterns similar
  - Documentation (85%): Role-based structure universal
  - Error Recovery (85%): Error taxonomy concepts transfer
  - Technical Debt (85%): SQALE principles universal

Medium (70-85%):
  - Cross-Cutting Concerns (70-80%): Language-specific patterns
  - Refactoring (80% est.): Code smells vary by language

Adaptation effort:

Same language/ecosystem:     10-20% effort (adapt examples)
Similar language (Go→Rust):  30-40% effort (remap patterns)
Different paradigm (Go→JS):  50-60% effort (rethink patterns)

Context Management for LLM Execution

λ(iteration, context_state) → (work_output, context_optimized) | context < limit:

Context management is critical for LLM-based execution where token limits constrain iteration depth and agent effectiveness.

Context Allocation Protocol

context_allocation :: Phase → Percentage
context_allocation(phase) = match phase {
  observation → 0.30,    -- Data collection, pattern analysis
  codification → 0.40,   -- Documentation, methodology writing
  automation → 0.20,     -- Tool creation, CI integration
  reflection → 0.10      -- Evaluation, planning
} where Σ = 1.0

Rationale: Based on 8 experiments, codification consumes the most context (methodology docs, agent definitions), followed by observation (data analysis), automation (code writing), and reflection (evaluation).
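
As a concrete illustration, the shares can be turned into per-phase token budgets. A minimal sketch (Python); the 200k token limit is a placeholder assumption, not a measured value:

CONTEXT_ALLOCATION = {
    "observation": 0.30,    # data collection, pattern analysis
    "codification": 0.40,   # documentation, methodology writing
    "automation": 0.20,     # tool creation, CI integration
    "reflection": 0.10,     # evaluation, planning
}

def phase_budgets(token_limit: int = 200_000) -> dict:
    """Split an assumed context window into per-phase token budgets (shares sum to 1.0)."""
    assert abs(sum(CONTEXT_ALLOCATION.values()) - 1.0) < 1e-9
    return {phase: int(share * token_limit) for phase, share in CONTEXT_ALLOCATION.items()}

print(phase_budgets())
# {'observation': 60000, 'codification': 80000, 'automation': 40000, 'reflection': 20000}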

Context Pressure Management

context_pressure :: State → Strategy
context_pressure(s) =
  if usage(s) > 0.80 then overflow_protocol(s)
  else if usage(s) > 0.50 then compression_protocol(s)
  else standard_protocol(s)

Overflow Protocol (Context >80%)

overflow_protocol :: State → Action
overflow_protocol(s) = prioritize(
  serialize_to_disk: save(s.knowledge/*) ∧ compress(s.history),
  reference_compression: link(files) ∧ ¬inline(content),
  session_split: checkpoint(s) ∧ continue(s_{n+1}, fresh_context)
) where preserve_critical ∧ drop_redundant

Actions:

  1. Serialize to disk: Save iteration state to knowledge/ directory
  2. Reference compression: Link to files instead of inlining content
  3. Session split: Complete current phase, start new session for next iteration

Example (from Bootstrap-013, 8 iterations):

  • Iteration 4: Context 85% → Serialized analysis to knowledge/pattern-analysis.md
  • Iteration 5: Started fresh session, loaded serialized state via file references
  • Result: Continued 4 more iterations without context overflow

Compression Protocol (Context 50-80%)

compression_protocol :: State → Optimizations
compression_protocol(s) = apply(
  deduplication: merge(similar_patterns) ∧ reference_once,
  summarization: compress(historical_context) ∧ keep(structure),
  lazy_loading: defer(load) ∧ fetch_on_demand
)

Optimizations:

  1. Deduplication: Merge similar patterns, reference once
  2. Summarization: Compress historical iterations while preserving structure
  3. Lazy loading: Load agent definitions only when invoked

Convergence Adjustment Under Context Pressure

convergence_adjustment :: (Context, V_i, V_m) → Threshold
convergence_adjustment(ctx, V_i, V_m) =
  if usage(ctx) > 0.80 then
    prefer(meta_focused) ∧ accept(V_i ≥ 0.55 ∧ V_m ≥ 0.80)
  else if usage(ctx) > 0.50 then
    standard_dual ∧ target(V_i ≥ 0.80 ∧ V_m ≥ 0.80)
  else
    extended_optimization ∧ pursue(V_i ≥ 0.90)

Principle: Under high context pressure, prioritize methodology quality (V_meta) over instance quality (V_instance), as methodology is more transferable and valuable long-term.

Validation (Bootstrap-011):

  • Context pressure: High (95%+ transferability methodology)
  • Converged with: V_meta = 0.877, V_instance = 0.585
  • Pattern: Meta-Focused Convergence justified by context constraints

Context Tracking Metrics

context_metrics :: State → Metrics
context_metrics(s) = {
  usage_percentage: tokens_used / tokens_limit,
  phase_distribution: {obs: 0.30, cod: 0.40, aut: 0.20, ref: 0.10},
  compression_ratio: compressed_size / original_size,
  session_splits: count(checkpoints)
}

Track these metrics to predict when intervention is needed.
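
A sketch of the same metrics as a tracking helper (Python); the 0.50/0.80 thresholds follow the pressure protocols above, everything else is illustrative:

from dataclasses import dataclass

@dataclass
class ContextMetrics:
    tokens_used: int
    tokens_limit: int
    compressed_size: int = 0
    original_size: int = 0
    session_splits: int = 0

    @property
    def usage_percentage(self) -> float:
        return self.tokens_used / self.tokens_limit

    @property
    def compression_ratio(self) -> float:
        return self.compressed_size / self.original_size if self.original_size else 1.0

    def strategy(self) -> str:
        """Map current usage onto the pressure-management protocols."""
        if self.usage_percentage > 0.80:
            return "overflow_protocol"
        if self.usage_percentage > 0.50:
            return "compression_protocol"
        return "standard_protocol"

m = ContextMetrics(tokens_used=170_000, tokens_limit=200_000, session_splits=1)
print(m.strategy())   # overflow_protocol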


Prompt Evolution Protocol

λ(agent, effectiveness_data) → agent' | ∀evolution: evidence_driven:

Systematic prompt engineering based on empirical effectiveness data, not intuition.

Core Prompt Patterns

prompt_pattern :: Pattern → Template
prompt_pattern(p) = match p {
  context_bounded:
    "Process {input} in chunks of {size}. For each chunk: {analysis}. Aggregate: {synthesis}.",

  tool_orchestrating:
    "Execute: {tool_sequence}. For each result: {validation}. If {condition}: {fallback}.",

  iterative_refinement:
    "Attempt: {approach_1}. Assess: {criteria}. If insufficient: {approach_2}. Repeat until: {threshold}.",

  evidence_accumulation:
    "Hypothesis: {H}. Seek confirming: {C}. Seek disconfirming: {D}. Weight: {W}. Decide: {decision}."
}

Usage:

  • context_bounded: When processing large datasets (e.g., log analysis, file scanning)
  • tool_orchestrating: When coordinating multiple MCP tools (e.g., query cascade)
  • iterative_refinement: When solution quality improves through iteration (e.g., optimization)
  • evidence_accumulation: When validating hypotheses (e.g., pattern discovery)

Prompt Effectiveness Measurement

prompt_effectiveness :: Prompt → Metrics
prompt_effectiveness(P) = measure(
  convergence_contribution: ΔV_per_iteration,
  token_efficiency: output_value / tokens_used,
  error_rate: failures / total_invocations,
  reusability: cross_domain_success_rate
) where empirical_data ∧ comparative_baseline

Metrics:

  1. Convergence contribution: How much does agent improve V_instance or V_meta per iteration?
  2. Token efficiency: Value delivered per token consumed (cost-effectiveness)
  3. Error rate: Percentage of invocations that fail or produce invalid output
  4. Reusability: Success rate when applied to different domains

Example (from Bootstrap-009):

  • log-analyzer agent:
    • ΔV_per_iteration: +0.12 average
    • Token efficiency: 0.85 (high value, moderate tokens)
    • Error rate: 3% (acceptable)
    • Reusability: 90% (worked in 009, 010, 012)
  • Result: Prompt kept, agent reused in subsequent experiments
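
A sketch of how these four metrics could be aggregated from invocation logs (Python); the record fields are assumptions consistent with the tracking protocol described below, and the sample values are made up for illustration:

from statistics import mean

def prompt_effectiveness(invocations: list) -> dict:
    """Aggregate effectiveness metrics from per-invocation records.

    Each record is assumed to carry: delta_v, tokens_used, output_value,
    success (bool), and cross_domain_success (bool).
    """
    n = len(invocations)
    return {
        "convergence_contribution": mean(r["delta_v"] for r in invocations),
        "token_efficiency": sum(r["output_value"] for r in invocations)
                            / max(sum(r["tokens_used"] for r in invocations), 1),
        "error_rate": sum(1 for r in invocations if not r["success"]) / n,
        "reusability": sum(1 for r in invocations if r["cross_domain_success"]) / n,
    }

logs = [
    {"delta_v": 0.12, "tokens_used": 4000, "output_value": 3400, "success": True,  "cross_domain_success": True},
    {"delta_v": 0.10, "tokens_used": 5000, "output_value": 4100, "success": True,  "cross_domain_success": True},
    {"delta_v": 0.14, "tokens_used": 4500, "output_value": 3900, "success": False, "cross_domain_success": False},
]
print(prompt_effectiveness(logs))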

Prompt Evolution Decision

prompt_evolution :: (P, Evidence) → P'
prompt_evolution(P, E) =
  if improvement_demonstrated(E) ∧ generalization_validated(E) then
    update(P → P') ∧ version(P'.version + 1) ∧ document(E.rationale)
  else
    maintain(P) ∧ log(evolution_rejected, E.reason)
  where ¬premature_optimization ∧ n_samples ≥ 3

Evolution criteria:

  1. Improvement demonstrated: Evidence shows measurable improvement (ΔV > 0.05 or error_rate < 50%)
  2. Generalization validated: Works across ≥3 different scenarios
  3. n_samples ≥ 3: Avoid overfitting to single case

Example (theoretical - prompt evolution not yet observed in 8 experiments):

Original prompt: "Analyze logs for errors."
Evidence: Error detection rate 67%, false positives 23%

Evolved prompt: "Analyze {logs} for errors. For each: classify(type, severity, context). Filter: severity >= {threshold}. Output: structured_json."
Evidence: Error detection rate 89%, false positives 8%

Decision: Evolution accepted (improvement demonstrated, validated across 3 log types)
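
The decision rule can be stated as a small guard function. A minimal sketch (Python); the ΔV > 0.05 and n ≥ 3 thresholds come from the criteria above, while the record fields and sample evidence are illustrative assumptions:

def should_evolve_prompt(samples: list, min_samples: int = 3, min_delta_v: float = 0.05) -> bool:
    """Accept a prompt evolution only when evidence is sufficient and generalizes.

    Each sample is assumed to record: delta_v (improvement of the candidate prompt
    over the current one) and scenario (a distinct usage context).
    """
    if len(samples) < min_samples:                   # avoid overfitting to a single case
        return False
    improvement = sum(s["delta_v"] for s in samples) / len(samples)
    scenarios = {s["scenario"] for s in samples}
    return improvement > min_delta_v and len(scenarios) >= 3   # generalization across ≥3 scenarios

evidence = [
    {"delta_v": 0.08, "scenario": "app-logs"},
    {"delta_v": 0.07, "scenario": "ci-logs"},
    {"delta_v": 0.09, "scenario": "test-logs"},
]
print(should_evolve_prompt(evidence))   # True → update prompt, bump version, document rationale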

Agent Prompt Protocol

agent_prompt_protocol :: Agent → Execution
agent_prompt_protocol(A) = ∀invocation:
  read(agents/{A.name}.md) ∧
  extract(prompt_latest_version) ∧
  apply(prompt) ∧
  track(effectiveness) ∧
  ¬cache_prompt

Critical: Always read the agent definition fresh (no caching) to ensure the latest prompt version is used.

Tracking:

  • Log each invocation: agent_name, prompt_version, input, output, success/failure
  • Aggregate metrics: Calculate effectiveness scores periodically
  • Trigger evolution: When n_samples ≥ 3 and improvement opportunity identified

Relationship to Other Methodologies

bootstrapped-se is the CORE framework that integrates and extends two complementary methodologies.

Relationship to empirical-methodology (Inclusion)

bootstrapped-se INCLUDES AND EXTENDS empirical-methodology:

empirical-methodology (5 phases):
  Observe → Analyze → Codify → Automate → Evolve

bootstrapped-se (OCA cycle + extensions):
  Observe ───────────→ Codify ────→ Automate
     ↑                                  ↓
     └─────────────── Evolve ──────────┘
     (Self-referential feedback loop)

What bootstrapped-se adds beyond empirical-methodology:

  1. Three-Tuple Output (O, Aₙ, Mₙ) - Reusable artifacts at system level
  2. Agent Framework - Specialized agents emerge from domain needs
  3. Meta-Agent System - Modular capabilities for coordination
  4. Self-Referential Loop - Framework applies to itself
  5. Formal Convergence - System stability criteria (M_n == M_{n-1}, A_n == A_{n-1})

When to use empirical-methodology explicitly:

  • Need detailed scientific method guidance
  • Require fine-grained observation tool selection
  • Want explicit separation of Analyze phase

When to use bootstrapped-se:

  • Always - It's the core framework
  • All Bootstrap experiments use bootstrapped-se as foundation
  • Provides complete OCA cycle with agent system

Relationship to value-optimization (Mutual Support)

value-optimization PROVIDES QUANTIFICATION for bootstrapped-se:

bootstrapped-se needs:          value-optimization provides:
- Quality measurement      →    Dual-layer value functions
- Convergence detection    →    Formal convergence criteria
- Evolution decisions      →    ΔV calculations, trends
- Success validation       →    V_instance ≥ 0.80, V_meta ≥ 0.80

bootstrapped-se ENABLES value-optimization:

  • OCA cycle generates state transitions (s_i → s_{i+1})
  • Agent work produces V_instance improvements
  • Meta-Agent work produces V_meta improvements
  • Iteration framework implements optimization loop

When to use value-optimization:

  • Always with bootstrapped-se - Provides evaluation framework
  • Calculate V_instance and V_meta at every iteration
  • Check convergence criteria formally
  • Compare across experiments

Integration:

Every bootstrapped-se iteration:
  1. Execute OCA cycle (Observe → Codify → Automate)
  2. Calculate V(s_n) using value-optimization
  3. Check convergence (system stable + dual threshold)
  4. If not converged: Continue iteration
  5. If converged: Generate (O, Aₙ, Mₙ)
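
A minimal sketch of that loop as control flow (Python); run_oca_cycle, v_instance, v_meta, and the state fields are placeholders for a project's own implementations, and 0.80 is the dual threshold from value-optimization:

def bootstrap(state, run_oca_cycle, v_instance, v_meta, max_iterations: int = 10, threshold: float = 0.80):
    """Iterate OCA cycles until the dual convergence criterion is met (or iterations run out)."""
    previous_agents, previous_meta = None, None
    for i in range(1, max_iterations + 1):
        state = run_oca_cycle(state)                       # Observe → Codify → Automate
        vi, vm = v_instance(state), v_meta(state)          # value-optimization layer
        system_stable = (state.agents == previous_agents and state.meta_agent == previous_meta)
        print(f"iteration {i}: V_instance={vi:.2f} V_meta={vm:.2f} stable={system_stable}")
        if system_stable and vi >= threshold and vm >= threshold:
            return state.output, state.agents, state.meta_agent   # (O, A_n, M_n)
        previous_agents, previous_meta = state.agents, state.meta_agent
    return state.output, state.agents, state.meta_agent           # best effort if not converged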

Three-Methodology Integration

Complete workflow (as used in all Bootstrap experiments):

┌─ methodology-framework ─────────────────────┐
│                                              │
│  ┌─ bootstrapped-se (CORE) ───────────────┐ │
│  │                                         │ │
│  │  ┌─ empirical-methodology ──────────┐  │ │
│  │  │                                   │  │ │
│  │  │  Observe + Analyze                │  │ │
│  │  │  Codify (with evidence)           │  │ │
│  │  │  Automate (CI/CD)                 │  │ │
│  │  │  Evolve (self-referential)        │  │ │
│  │  │                                   │  │ │
│  │  └───────────────────────────────────┘  │ │
│  │                 ↓                        │ │
│  │  Produce: (O, Aₙ, Mₙ)                   │ │
│  │                 ↓                        │ │
│  │  ┌─ value-optimization ──────────────┐  │ │
│  │  │                                   │  │ │
│  │  │  V_instance(s_n) = domain quality │  │ │
│  │  │  V_meta(s_n) = methodology quality│  │ │
│  │  │                                   │  │ │
│  │  │  Convergence check:               │  │ │
│  │  │  - System stable?                 │  │ │
│  │  │  - Dual threshold met?            │  │ │
│  │  │                                   │  │ │
│  │  └───────────────────────────────────┘  │ │
│  │                                         │ │
│  └─────────────────────────────────────────┘ │
│                                              │
└──────────────────────────────────────────────┘

Usage Recommendation:

  • Start here: Read bootstrapped-se.md (this file)
  • Add evaluation: Read value-optimization.md
  • Add rigor: Read empirical-methodology.md (optional)
  • See integration: Read bootstrapped-ai-methodology-engineering.md (BAIME framework)

Related skills and agents:

  • bootstrapped-ai-methodology-engineering: Unified BAIME framework integrating all three methodologies
  • empirical-methodology: Scientific foundation (included in bootstrapped-se)
  • value-optimization: Quantitative evaluation framework (used by bootstrapped-se)
  • iteration-executor: Implementation agent (coordinates bootstrapped-se execution)

Knowledge Base

Source Documentation

  • Core methodology: docs/methodology/bootstrapped-software-engineering.md
  • Related: docs/methodology/empirical-methodology-development.md
  • Examples: experiments/bootstrap-*/ (8 validated experiments)

Key Concepts

  • OCA Framework (Observe-Codify-Automate)
  • Three-Tuple Output (O, Aₙ, Mₙ)
  • Self-Referential Feedback Loop
  • Convergence Theorem
  • Meta-Methodology

Version History

  • v1.0.0 (2025-10-18): Initial release
    • Based on meta-cc methodology development
    • 8 experiments validated (95% transferability)
    • OCA framework with 5-layer feedback loop
    • Empirical validation from 277 commits, 11 days

Status: Production-ready
Validation: 8 experiments (Bootstrap-001 to -013)
Effectiveness: 10-50x methodology development speedup
Transferability: 95% (framework universal, tools adaptable)