---
name: bootstrapped-se
description: Apply Bootstrapped Software Engineering (BSE) methodology to evolve project-specific development practices through systematic Observe-Codify-Automate cycles
keywords: bootstrapping, meta-methodology, OCA, observe, codify, automate, self-improvement, empirical, methodology-development
category: methodology
version: 1.0.0
based_on: docs/methodology/bootstrapped-software-engineering.md
transferability: 95%
effectiveness: 10-50x methodology development speedup
---

# Bootstrapped Software Engineering

**Evolve project-specific methodologies through systematic observation, codification, and automation.**

> The best methodologies are not **designed** but **evolved** through systematic observation, codification, and automation of successful practices.

---

## Core Insight

Traditional methodologies are theory-driven and static. **Bootstrapped Software Engineering (BSE)** enables development processes to:

1. **Observe** themselves through instrumentation and data collection
2. **Codify** discovered patterns into reusable methodologies
3. **Automate** methodology enforcement and validation
4. **Self-improve** by applying the methodology to its own evolution

### Three-Tuple Output

Every BSE process produces:

```
(O, Aₙ, Mₙ)

where:
  O  = Task output (code, documentation, system)
  Aₙ = Converged agent set (reusable for similar tasks)
  Mₙ = Converged meta-agent (transferable to new domains)
```

---

## The OCA Framework

**Three-Phase Cycle**: Observe → Codify → Automate

### Phase 1: OBSERVE

**Instrument your development process to collect data**

**Tools**:
- Session history analysis (meta-cc)
- Git commit analysis
- Code metrics (coverage, complexity)
- Access pattern tracking
- Error rate monitoring

**Example** (from meta-cc):

```bash
# Analyze file access patterns
meta-cc query files --threshold 5

# Result: plan.md accessed 423 times (highest)
# Insight: Core reference document, needs optimization
```

**Output**: Empirical data about actual development patterns

### Phase 2: CODIFY

**Extract patterns and document as reusable methodologies**

**Process**:
1. **Pattern Recognition**: Identify recurring successful practices
2. **Hypothesis Formation**: Formulate testable claims
3. **Documentation**: Write methodology documents
4. **Validation**: Test methodology on real scenarios

**Example** (from meta-cc):

```markdown
# Discovered Pattern: Role-Based Documentation

Observation:
- plan.md: 423 accesses (Coordination role)
- CLAUDE.md: ~300 implicit loads (Entry Point role)
- features.md: 89 accesses (Reference role)

Methodology:
- Classify docs by actual access patterns
- Optimize high-access docs for token efficiency
- Create role-specific maintenance procedures

Validation:
- CLAUDE.md reduction: 607 → 278 lines (-54%)
- Token cost reduction: 47%
- Access efficiency: Maintained
```

**Output**: Documented methodology with empirical validation

### Phase 3: AUTOMATE

**Convert methodology into automated checks and tools**

**Automation Levels**:
1. **Detection**: Automated pattern detection
2. **Validation**: Check compliance with methodology
3. **Enforcement**: CI/CD integration, block violations
4. **Suggestion**: Automated fix recommendations

**Example** (from meta-cc):

```bash
# Automation: /meta doc-health capability

# Checks:
#   - Role classification compliance
#   - Token efficiency (lines < threshold)
#   - Cross-reference completeness
#   - Update frequency

# Actions:
#   - Flag oversized documents
#   - Suggest restructuring
#   - Validate role assignments
```

**Output**: Automated tools enforcing methodology
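To make the validation level concrete, here is a minimal sketch of a doc-health check. It is a hypothetical stand-in for the `/meta doc-health` capability, not its actual implementation; the role assignments are hardcoded for illustration, and the thresholds are taken from the role-based documentation example in this skill.

```python
#!/usr/bin/env python3
"""Minimal doc-health checker sketch (validation level).

Hypothetical stand-in for /meta doc-health; a real tool would derive
roles from measured access patterns rather than a hardcoded map.
"""
import sys
from pathlib import Path

# Line budgets per role, from the examples in this skill:
# Entry Point docs < 300 lines, Reference docs < 500 lines.
THRESHOLDS = {"entry_point": 300, "reference": 500}

# Illustrative role assignments (assumption, not measured data).
ROLES = {"README.md": "entry_point", "ARCHITECTURE.md": "reference"}

def check_doc_health(root: str = ".") -> list[str]:
    """Flag documents that exceed the line budget for their role."""
    violations = []
    for name, role in ROLES.items():
        path = Path(root) / name
        if not path.exists():
            continue
        lines = len(path.read_text(encoding="utf-8").splitlines())
        limit = THRESHOLDS[role]
        if lines > limit:
            violations.append(f"{name}: {lines} lines > {limit} ({role})")
    return violations

if __name__ == "__main__":
    problems = check_doc_health()
    print("\n".join(problems) or "All documents within budget")
    sys.exit(1 if problems else 0)  # non-zero exit lets CI block merges
```

Wired into CI, the non-zero exit code is what upgrades this from detection to enforcement.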
---

## Self-Referential Feedback Loop

The ultimate power of BSE: **Apply the methodology to improve itself**

```
Layer 0: Basic Functionality
  → Build tools (meta-cc CLI)

Layer 1: Self-Observation
  → Use tools on self (query own sessions)
  → Discovery: Usage patterns, bottlenecks

Layer 2: Pattern Recognition
  → Analyze data (R/E ratio, access density)
  → Discovery: Document roles, optimization opportunities

Layer 3: Methodology Extraction
  → Codify patterns (role-based-documentation.md)
  → Definition: Classification algorithm, maintenance procedures

Layer 4: Tool Automation
  → Implement checks (/meta doc-health)
  → Auto-validate: Methodology compliance

Layer 5: Continuous Evolution
  → Apply tools to self → Discover new patterns
  → Update methodology → Update tools
```

**This creates a closed loop**: Tools improve tools, methodologies optimize methodologies.

---

## Parameters

- **domain**: `documentation` | `testing` | `architecture` | `custom` (default: `custom`)
- **observation_period**: number of days/commits to analyze (default: auto-detect)
- **automation_level**: `detect` | `validate` | `enforce` | `suggest` (default: `validate`)
- **iteration_count**: number of OCA cycles (default: 3)

---

## Execution Flow

### Phase 1: Observation Setup

```python
1. Identify observation targets
   - Code metrics (LOC, complexity, coverage)
   - Development patterns (commits, PRs, errors)
   - Access patterns (file reads, searches)
   - Quality metrics (test results, build time)

2. Install instrumentation
   - meta-cc integration (session analysis)
   - Git hooks (commit tracking)
   - Coverage tracking
   - CI/CD metrics

3. Collect baseline data
   - Run for observation_period
   - Generate initial reports
   - Identify data gaps
```

### Phase 2: Pattern Analysis

```python
4. Analyze collected data
   - Statistical analysis (frequencies, correlations)
   - Pattern recognition (recurring behaviors)
   - Anomaly detection (outliers, inefficiencies)

5. Formulate hypotheses
   - "High-access docs should be < 300 lines"
   - "Test coverage gaps correlate with bugs"
   - "Batch remediation is 5x more efficient"

6. Validate hypotheses
   - Historical data validation
   - A/B testing if possible
   - Expert review
```

### Phase 3: Codification

```python
7. Document patterns
   - Pattern name and description
   - Context and applicability
   - Implementation steps
   - Validation criteria
   - Examples and counter-examples

8. Create methodology
   - Problem statement
   - Solution approach
   - Procedures and guidelines
   - Metrics and validation

9. Peer review
   - Team review
   - Iterate based on feedback
```

### Phase 4: Automation

```python
10. Design automation
    - Detection: Identify when pattern applies
    - Validation: Check compliance
    - Enforcement: Prevent violations
    - Suggestion: Recommend fixes

11. Implement tools
    - Scripts (bash, Python)
    - CI/CD integration
    - IDE plugins
    - Bot automation

12. Deploy and monitor
    - Gradual rollout
    - Collect usage data
    - Measure effectiveness
```

### Phase 5: Evolution

```python
13. Apply to self
    - Use tools on development process
    - Discover meta-patterns
    - Optimize methodology

14. Iterate OCA cycle
    - New observations → New patterns
    - Refined hypotheses → Better validation
    - Enhanced automation → Higher compliance
```
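The five phases can be read as one driver loop. The following is a sketch under the assumption that observe/codify/automate arrive as callables; the phase names mirror the flow above, while the types and the elided agent evolution are illustrative only.

```python
"""Skeleton of the execution flow as one driver loop (a sketch,
not a prescribed implementation)."""
from typing import Any, Callable

def run_bse(observe: Callable[[], dict],
            codify: Callable[[dict], Any],
            automate: Callable[[Any], Any],
            iteration_count: int = 3) -> tuple[Any, list, dict]:
    """Drive Observe → Codify → Automate cycles, returning the
    three-tuple (O, Aₙ, Mₙ); agent evolution is not modeled here."""
    output: Any = None
    agents: list = []      # Aₙ: converged agent set (placeholder)
    meta_agent: dict = {}  # Mₙ: converged meta-agent (placeholder)
    for _ in range(iteration_count):
        data = observe()                # Phases 1-2: collect and analyze
        methodology = codify(data)      # Phase 3: document patterns
        output = automate(methodology)  # Phase 4: build checks and tools
        # Phase 5: the next observe() call sees the new tools,
        # which is what makes the loop self-referential.
    return output, agents, meta_agent
```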
---

## Usage Examples

### Example 1: Documentation Optimization

```bash
# User: "Optimize project documentation"
bootstrapped-se domain=documentation

# Execution:

[OBSERVE Phase]
✓ Analyzing file access patterns (30 days)...
  - README.md: 423 accesses (Entry Point)
  - ARCHITECTURE.md: 89 accesses (Reference)
  - API.md: 234 accesses (Reference)
✓ Measuring token efficiency...
  - README.md: 1909 lines (HIGH - inefficient)
  - ARCHITECTURE.md: 456 lines (OK)
  - API.md: 789 lines (MEDIUM)

[CODIFY Phase]
✓ Pattern identified: Role-Based Documentation
  - Entry Point docs: Should be < 300 lines
  - Reference docs: Should be < 500 lines
  - Specialized docs: Can be longer
✓ Methodology documented:
  - Created: docs/methodology/role-based-documentation.md
  - Includes: Classification algorithm, optimization guide

[AUTOMATE Phase]
✓ Automation created:
  - Script: scripts/check-doc-health.sh
  - CI check: .github/workflows/doc-validation.yml
  - Capability: /meta doc-health

[EVOLUTION Phase]
✓ Applied to self:
  - Optimized README.md: 1909 → 275 lines (-85%)
  - Optimized CLAUDE.md: 607 → 278 lines (-54%)
  - Token cost reduction: 47%
✓ Next iteration insights:
  - New pattern: Cross-reference density
  - New automation: Auto-linking checker
```

### Example 2: Testing Strategy Evolution

```bash
# User: "Develop systematic testing methodology"
bootstrapped-se domain=testing observation_period=60

# Execution:

[OBSERVE Phase]
✓ Analyzing test patterns (60 days)...
  - Coverage: 75% (good)
  - Test distribution: Uneven (80% unit, 15% integration, 5% e2e)
  - Flaky tests: 12% failure rate
  - Slowest tests: integration (avg 45s)

[CODIFY Phase]
✓ Patterns identified:
  1. Coverage-driven development works
  2. Integration tests need fixtures
  3. Flaky tests have environmental dependencies
✓ Methodology created:
  - Test strategy matrix (when to write what)
  - Fixture patterns for integration tests
  - Flaky test elimination procedures

[AUTOMATE Phase]
✓ Automation implemented:
  - Coverage gate: Block if < 75%
  - Fixture generator: Auto-create from examples
  - Flaky detector: Flag unstable tests

✓ Results:
  - Coverage: 75% → 86%
  - Flaky rate: 12% → 2%
  - Test time: 45s → 12s (with parallel execution)
```
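The "flaky detector" from Example 2 can be approximated with a few lines: a test is flaky when its recent history mixes passes and failures. This is a sketch; the run-history format and the 10% instability threshold are assumptions, not the tool the experiment actually built.

```python
"""Flaky-test detector sketch: flag tests that neither consistently
pass nor consistently fail across recent CI runs."""
from collections import defaultdict

def find_flaky(runs: list[tuple[str, bool]], threshold: float = 0.10) -> list[str]:
    """runs: (test_name, passed) pairs aggregated across CI runs.
    Flags tests whose failure rate sits between threshold and
    1 - threshold, i.e. unstable rather than simply broken."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [fails, runs]
    for name, passed in runs:
        totals[name][1] += 1
        totals[name][0] += 0 if passed else 1
    flaky = []
    for name, (fails, total) in totals.items():
        rate = fails / total
        if threshold <= rate <= 1 - threshold:
            flaky.append(f"{name}: {fails}/{total} failures ({rate:.0%})")
    return flaky

# Example: TestLogin fails intermittently, TestMath is stable.
history = [("TestLogin", True), ("TestLogin", False), ("TestLogin", True),
           ("TestMath", True), ("TestMath", True), ("TestMath", True)]
print(find_flaky(history))  # flags TestLogin only
```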
### Example 3: Error Recovery Patterns

```bash
# User: "Extract error recovery methodology"
bootstrapped-se domain=custom

# Execution:

[OBSERVE Phase]
✓ Analyzing error patterns...
  - Total errors: 423 (from sessions)
  - Error rate: 6.06%
  - Most common: Type errors (45%), Logic errors (30%), Deps (15%)

[CODIFY Phase]
✓ Pattern: Error Classification Taxonomy
  - Categories: Type, Logic, Dependency, Integration, Infrastructure
  - Recovery strategies per category
  - Prevention guidelines
✓ Methodology: Systematic Error Recovery
  - Detection: Error signature extraction
  - Classification: Rule-based categorization
  - Recovery: Strategy pattern application
  - Prevention: Root cause analysis → Code patterns

[AUTOMATE Phase]
✓ Tools created:
  - Error classifier (ML-based)
  - Recovery strategy recommender
  - Prevention linter (detect anti-patterns)
✓ CI/CD Integration:
  - Auto-classify build failures
  - Suggest recovery steps
  - Track error trends
```
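The example names an ML-based classifier; the taxonomy's "rule-based categorization" step, though, can be sketched directly. The regexes below are illustrative guesses rather than signatures from the meta-cc project.

```python
"""Rule-based sketch of an error classifier over the taxonomy above.
All patterns are illustrative assumptions."""
import re

# Category rules, first match wins; ordered roughly by specificity.
RULES = [
    ("Dependency",     re.compile(r"cannot find (package|module)|unresolved import")),
    ("Type",           re.compile(r"type (error|mismatch)|cannot use .* as ")),
    ("Integration",    re.compile(r"connection refused|timeout|HTTP 5\d\d")),
    ("Infrastructure", re.compile(r"disk full|out of memory|permission denied")),
]

def classify(message: str) -> str:
    """Map an error message to a taxonomy category. Logic is the
    fallback: logic errors rarely have a stable textual signature."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "Logic"

print(classify("main.go:12: cannot use x (type int) as string"))  # Type
print(classify("dial tcp 10.0.0.1:5432: connection refused"))     # Integration
```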
---

## Validated Outcomes

**From meta-cc project** (8 experiments, 95% transferability):

### Documentation Methodology
- **Observation**: 423 file access patterns analyzed
- **Codification**: Role-based documentation methodology
- **Automation**: /meta doc-health capability
- **Result**: 47% token cost reduction, maintained accessibility

### Testing Strategy
- **Observation**: 75% coverage, uneven distribution
- **Codification**: Coverage-driven gap closure
- **Automation**: CI coverage gates, fixture generators
- **Result**: 75% → 86% coverage, 15x speedup vs ad-hoc

### Error Recovery
- **Observation**: 6.06% error rate, 423 errors analyzed
- **Codification**: Error taxonomy, recovery patterns
- **Automation**: Error classifier, recovery recommender
- **Result**: 85% transferability, systematic recovery

### Dependency Health
- **Observation**: 7 vulnerabilities, 11 outdated deps
- **Codification**: 6 patterns (vulnerability, update, license, etc.)
- **Automation**: 3 scripts + CI/CD workflow
- **Result**: 6x speedup (9h → 1.5h), 88% transferability

### Observability
- **Observation**: 0 logs, 0 metrics, 0 traces (baseline)
- **Codification**: Three Pillars methodology (Logging + Metrics + Tracing)
- **Automation**: Code generators, instrumentation templates
- **Result**: 23-46x speedup, 90-95% transferability

---

## Transferability

**95% transferable** across domains and projects:

### What Transfers (95%+)
- OCA framework itself (universal process)
- Self-referential feedback loop pattern
- Observation → Pattern → Automation pipeline
- Empirical validation approach
- Continuous evolution mindset

### What Needs Adaptation (5%)
- Specific observation tools (meta-cc → custom tools)
- Domain-specific patterns (docs vs testing vs architecture)
- Automation implementation details (language, platform)

### Adaptation Effort
- **Same project, new domain**: 2-4 hours
- **New project, same domain**: 4-8 hours
- **New project, new domain**: 8-16 hours

---

## Prerequisites

### Tools Required
- **Session analysis**: meta-cc or equivalent
- **Git analysis**: Git installed, access to repository
- **Metrics collection**: Coverage tools, static analyzers
- **Automation**: CI/CD platform (GitHub Actions, GitLab CI, etc.)

### Skills Required
- Basic data analysis (statistics, pattern recognition)
- Methodology documentation
- Scripting (bash, Python, or equivalent)
- CI/CD configuration

---

## Implementation Guidance

### Start Small

```bash
# Week 1: Observe
- Install meta-cc
- Track file accesses for 1 week
- Collect simple metrics

# Week 2: Codify
- Analyze top 10 access patterns
- Document 1-2 simple patterns
- Get team feedback

# Week 3: Automate
- Create 1 simple validation script
- Add to CI/CD
- Monitor compliance

# Week 4: Iterate
- Apply tools to development
- Discover new patterns
- Refine methodology
```

### Scale Up

```bash
# Month 2: Expand domains
- Apply to testing
- Apply to architecture
- Cross-validate patterns

# Month 3: Deep automation
- Build sophisticated checkers
- Integrate with IDE
- Create dashboards

# Month 4: Evolution
- Meta-patterns emerge
- Methodology generator
- Cross-project application
```
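If meta-cc is not available for Week 1, git history is a rough proxy for file "heat". The sketch below uses only standard `git log --name-only`; the 7-day window matches the Start Small plan, and change frequency is of course a weaker signal than read-access counts.

```python
"""Week-1 observation sketch: approximate file access heat from
git commit history (a proxy, assuming no session analyzer is set up)."""
import subprocess
from collections import Counter

def file_change_frequency(days: int = 7, top: int = 10) -> list[tuple[str, int]]:
    """Count how often each file was touched in recent commits."""
    log = subprocess.run(
        ["git", "log", f"--since={days} days ago", "--name-only",
         "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    files = [line for line in log.splitlines() if line.strip()]
    return Counter(files).most_common(top)

if __name__ == "__main__":
    for path, count in file_change_frequency():
        print(f"{count:4d}  {path}")
```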
---

## Theoretical Foundation

### The Convergence Theorem

**Conjecture**: For any domain D, there exists a methodology M* such that:

1. **M* is locally optimal** for D (cannot be significantly improved)
2. **M* can be reached through bootstrapping** (systematic self-improvement)
3. **Convergence speed increases** with each iteration (learning effect)

**Implication**: We can **automatically discover** optimal methodologies for any domain.

### Scientific Method Analogy

```
1. Observation     = Instrumentation (meta-cc tools)
2. Hypothesis      = "CLAUDE.md should be <300 lines"
3. Experiment      = Implement constraint, measure effects
4. Data Collection = query-files, git log analysis
5. Analysis        = Calculate R/E ratio, access density
6. Conclusion      = "300-line limit effective: 47% reduction"
7. Publication     = Codify as methodology document
8. Replication     = Apply to other projects
```

---

## Success Criteria

| Metric | Target | Validation |
|--------|--------|------------|
| **Pattern Discovery** | ≥3 patterns per cycle | Documented patterns |
| **Methodology Quality** | Peer-reviewed | Team acceptance |
| **Automation Coverage** | ≥80% of patterns | CI integration |
| **Effectiveness** | ≥3x improvement | Before/after metrics |
| **Transferability** | ≥85% reusability | Cross-project validation |

---

## Domain Adaptation Guide

**Different domains have different complexity characteristics** that affect iteration count, agent needs, and convergence patterns. This guide helps predict and adapt to domain-specific challenges.

### Domain Complexity Classes

Based on 8 completed Bootstrap experiments, we've identified three complexity classes.

#### Simple Domains (3-4 iterations)

**Characteristics**:
- Well-defined problem space
- Clear success criteria
- Limited interdependencies
- Established best practices exist
- Straightforward automation

**Examples**:
- **Bootstrap-010 (Dependency Health)**: 3 iterations
  - Clear goals: vulnerabilities, freshness, licenses
  - Existing tools: govulncheck, go-licenses
  - Straightforward automation: CI/CD scripts
  - Converged fastest in series
- **Bootstrap-011 (Knowledge Transfer)**: 3-4 iterations
  - Well-understood domain: onboarding paths
  - Clear structure: Day-1, Week-1, Month-1
  - Existing patterns: progressive disclosure
  - High transferability (95%+)

**Adaptation Strategy**:

```markdown
Simple Domain Approach:
1. Start with generic agents only (coder, data-analyst, doc-writer)
2. Focus on automation (tools, scripts, CI)
3. Expect fast convergence (3-4 iterations)
4. Prioritize transferability (aim for 85%+)
5. Minimal agent specialization needed
```

**Expected Outcomes**:
- Iterations: 3-4
- Duration: 6-8 hours
- Specialized agents: 0-1
- Transferability: 85-95%
- V_instance: Often exceeds 0.80 significantly (e.g., 0.92)

#### Medium Complexity Domains (4-6 iterations)

**Characteristics**:
- Multiple dimensions to optimize
- Some ambiguity in success criteria
- Moderate interdependencies
- Require domain expertise
- Automation has nuances

**Examples**:
- **Bootstrap-001 (Documentation)**: 3 iterations (simple side of medium)
  - Multiple roles to define
  - Access patterns analysis needed
  - Search infrastructure complexity
  - 85% transferability
- **Bootstrap-002 (Testing)**: 5 iterations
  - Coverage vs quality trade-offs
  - Multiple test types (unit, integration, e2e)
  - Fixture patterns discovery
  - 89% transferability
- **Bootstrap-009 (Observability)**: 6 iterations
  - Three pillars (logging, metrics, tracing)
  - Performance vs verbosity trade-offs
  - Integration complexity
  - 90-95% transferability

**Adaptation Strategy**:

```markdown
Medium Domain Approach:
1. Start with generic agents, add 1-2 specialized as needed
2. Expect iterative refinement of value functions
3. Plan for 4-6 iterations
4. Balance instance and meta objectives equally
5. Document trade-offs explicitly
```

**Expected Outcomes**:
- Iterations: 4-6
- Duration: 8-12 hours
- Specialized agents: 1-3
- Transferability: 85-90%
- V_instance: Typically 0.80-0.87

#### Complex Domains (6-8+ iterations)

**Characteristics**:
- High interdependency
- Emergent patterns (not obvious upfront)
- Multiple competing objectives
- Requires novel agent capabilities
- Automation is sophisticated

**Examples**:
- **Bootstrap-013 (Cross-Cutting Concerns)**: 8 iterations
  - Pattern extraction from existing code
  - Convention definition ambiguity
  - Automated enforcement complexity
  - Large codebase scope (all modules)
  - Longest experiment but highest ROI (16.7x)
- **Bootstrap-003 (Error Recovery)**: 5 iterations (complex side)
  - Error taxonomy creation
  - Root cause diagnosis
  - Recovery strategy patterns
  - 85% transferability
- **Bootstrap-012 (Technical Debt)**: 4 iterations (medium-complex)
  - SQALE quantification
  - Prioritization complexity
  - Subjective vs objective debt
  - 85% transferability

**Adaptation Strategy**:

```markdown
Complex Domain Approach:
1. Expect agent evolution throughout
2. Plan for 6-8+ iterations
3. Accept lower initial V values (baseline often <0.35)
4. Focus on one dimension per iteration
5. Create specialized agents proactively when gaps identified
6. Document emergent patterns as discovered
```

**Expected Outcomes**:
- Iterations: 6-8+
- Duration: 12-18 hours
- Specialized agents: 3-5
- Transferability: 70-85%
- V_instance: Hard-earned 0.80-0.85
- Largest single-iteration gains possible (e.g., +27.3% in Bootstrap-013 Iteration 7)
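For planning purposes, the three classes' expected outcomes can be kept as data. The numbers below come straight from the Expected Outcomes lists above; the `ComplexityClass` structure itself is an illustrative convenience, not part of the methodology.

```python
"""Planning helper encoding the three complexity classes as data
(a sketch; values are the ranges stated above)."""
from dataclasses import dataclass

@dataclass(frozen=True)
class ComplexityClass:
    iterations: tuple[int, int]        # expected OCA iterations (min, max)
    duration_hours: tuple[int, int]    # expected wall-clock effort
    specialized_agents: tuple[int, int]
    transferability: tuple[int, int]   # percent

CLASSES = {
    "simple":  ComplexityClass((3, 4), (6, 8),   (0, 1), (85, 95)),
    "medium":  ComplexityClass((4, 6), (8, 12),  (1, 3), (85, 90)),
    "complex": ComplexityClass((6, 8), (12, 18), (3, 5), (70, 85)),
}

def plan(domain_class: str) -> str:
    c = CLASSES[domain_class]
    return (f"{domain_class}: expect {c.iterations[0]}-{c.iterations[1]} iterations, "
            f"{c.duration_hours[0]}-{c.duration_hours[1]}h, "
            f"{c.specialized_agents[0]}-{c.specialized_agents[1]} specialized agents")

print(plan("complex"))
```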
### Domain-Specific Considerations

#### Documentation-Heavy Domains

**Examples**: Documentation (001), Knowledge Transfer (011)

**Key Adaptations**:
- Prioritize clarity over completeness
- Role-based structuring
- Accessibility optimization
- Cross-referencing systems

**Success Indicators**:
- Access/line ratio > 1.0
- User satisfaction surveys
- Search effectiveness

#### Technical Implementation Domains

**Examples**: Observability (009), Dependency Health (010)

**Key Adaptations**:
- Performance overhead monitoring
- Automation-first approach
- Integration testing critical
- CI/CD pipeline emphasis

**Success Indicators**:
- Automated coverage %
- Performance impact < 10%
- CI/CD reliability

#### Quality/Analysis Domains

**Examples**: Testing (002), Error Recovery (003), Technical Debt (012)

**Key Adaptations**:
- Quantification frameworks essential
- Baseline measurement critical
- Before/after comparisons
- Statistical validation

**Success Indicators**:
- Coverage metrics
- Error rate reduction
- Time savings quantified

#### Systematic Enforcement Domains

**Examples**: Cross-Cutting Concerns (013), Code Review (008 planned)

**Key Adaptations**:
- Pattern extraction from existing code
- Linter/checker development
- Gradual enforcement rollout
- Exception handling

**Success Indicators**:
- Pattern consistency %
- Violation detection rate
- Developer adoption rate

### Predicting Iteration Count

Based on empirical data from 8 experiments:

```
Base estimate: 5 iterations

Adjust based on:
- Well-defined domain:       -2 iterations
- Existing tools available:  -1 iteration
- High interdependency:      +2 iterations
- Novel patterns needed:     +1 iteration
- Large codebase scope:      +1 iteration
- Multiple competing goals:  +1 iteration

Examples:
  Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
  Observability:     5 + 0 + 1 = 6 → actual 6 ✓
  Cross-Cutting:     5 + 2 + 1 = 8 → actual 8 ✓
```
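The heuristic translates directly into code. The factor names below are exactly the stated adjustments; nothing beyond the list above is encoded.

```python
"""Direct translation of the iteration-count heuristic above."""

ADJUSTMENTS = {
    "well_defined_domain":      -2,
    "existing_tools":           -1,
    "high_interdependency":     +2,
    "novel_patterns_needed":    +1,
    "large_codebase_scope":     +1,
    "multiple_competing_goals": +1,
}

def predict_iterations(**factors: bool) -> int:
    """Start at the base estimate of 5 and apply each active adjustment."""
    estimate = 5 + sum(delta for name, delta in ADJUSTMENTS.items()
                       if factors.get(name, False))
    return max(estimate, 1)  # at least one iteration

# The worked examples from the text:
print(predict_iterations(well_defined_domain=True, existing_tools=True))  # 2 (actual: 3)
print(predict_iterations(novel_patterns_needed=True))                     # 6 (actual: 6)
print(predict_iterations(high_interdependency=True,
                         novel_patterns_needed=True))                     # 8 (actual: 8)
```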
### Agent Specialization Prediction

```
Generic agents sufficient when:
- Domain has established patterns
- Clear best practices exist
- Automation is straightforward
→ Examples: Dependency Health, Knowledge Transfer

Specialized agents needed when:
- Novel pattern extraction required
- Domain-specific expertise needed
- Complex analysis algorithms
→ Examples: Observability (log-analyzer, metric-designer)
            Cross-Cutting (pattern-extractor, convention-definer)

Rule of thumb:
- Simple domains:  0-1 specialized agents
- Medium domains:  1-3 specialized agents
- Complex domains: 3-5 specialized agents
```

### Meta-Agent Evolution Prediction

**Key finding from 8 experiments**: **M₀ was sufficient in ALL cases**

```
Meta-Agent M₀ capabilities (5):
1. observe: Pattern observation
2. plan:    Iteration planning
3. execute: Agent orchestration
4. reflect: Value assessment
5. evolve:  System evolution

No evolution needed because:
- M₀ capabilities cover full lifecycle
- Agent specialization handles domain gaps
- Modular design allows capability reuse
```

**When to evolve Meta-Agent** (theoretical, not yet observed):
- Novel coordination pattern needed
- Capability gap in lifecycle
- Cross-agent orchestration complexity
- New convergence pattern discovered

### Convergence Pattern Prediction

Based on domain characteristics:

**Standard Dual Convergence** (most common):
- Both V_instance and V_meta reach 0.80+
- Examples: Observability (009), Dependency Health (010), Technical Debt (012), Cross-Cutting (013)
- **Use when**: Both objectives equally important

**Meta-Focused Convergence**:
- V_meta reaches 0.80+, V_instance practically sufficient
- Example: Knowledge Transfer (011) - V_meta = 0.877, V_instance = 0.585
- **Use when**: Methodology is primary goal, instance is vehicle

**Practical Convergence**:
- Combined quality exceeds metrics, justified partial criteria
- Example: Testing (002) - V_instance = 0.848, quality > coverage %
- **Use when**: Quality evidence exceeds raw numbers

### Domain Transfer Considerations

**Transferability varies by domain abstraction**:

```
High (90-95%):
- Knowledge Transfer (95%+): Learning principles universal
- Observability (90-95%): Three Pillars apply everywhere

Medium-High (85-90%):
- Testing (89%): Test types similar across languages
- Dependency Health (88%): Package manager patterns similar
- Documentation (85%): Role-based structure universal
- Error Recovery (85%): Error taxonomy concepts transfer
- Technical Debt (85%): SQALE principles universal

Medium (70-85%):
- Cross-Cutting Concerns (70-80%): Language-specific patterns
- Refactoring (80% est.): Code smells vary by language
```

**Adaptation effort**:

```
Same language/ecosystem:    10-20% effort (adapt examples)
Similar language (Go→Rust): 30-40% effort (remap patterns)
Different paradigm (Go→JS): 50-60% effort (rethink patterns)
```

---

## Context Management for LLM Execution

λ(iteration, context_state) → (work_output, context_optimized) | context < limit

**Context management is critical for LLM-based execution**, where token limits constrain iteration depth and agent effectiveness.

### Context Allocation Protocol

```
context_allocation :: Phase → Percentage
context_allocation(phase) = match phase {
  observation  → 0.30,  -- Data collection, pattern analysis
  codification → 0.40,  -- Documentation, methodology writing
  automation   → 0.20,  -- Tool creation, CI integration
  reflection   → 0.10   -- Evaluation, planning
} where Σ = 1.0
```

**Rationale**: Based on 8 experiments, codification consumes the most context (methodology docs, agent definitions), followed by observation (data analysis), automation (code writing), and reflection (evaluation).

### Context Pressure Management

```
context_pressure :: State → Strategy
context_pressure(s) =
  if usage(s) > 0.80 then overflow_protocol(s)
  else if usage(s) > 0.50 then compression_protocol(s)
  else standard_protocol(s)
```
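A plain-Python rendering of the dispatch makes the thresholds explicit. The 0.80 and 0.50 cut-offs are from the spec above; the strategy names stand in for the protocols detailed in the next sections.

```python
"""Python rendering of the context_pressure dispatch (sketch)."""

def context_pressure(tokens_used: int, tokens_limit: int) -> str:
    """Select a context strategy from current usage."""
    usage = tokens_used / tokens_limit
    if usage > 0.80:
        return "overflow_protocol"      # serialize, compress refs, split session
    if usage > 0.50:
        return "compression_protocol"   # dedupe, summarize, lazy-load
    return "standard_protocol"

assert context_pressure(170_000, 200_000) == "overflow_protocol"     # 85%
assert context_pressure(120_000, 200_000) == "compression_protocol"  # 60%
assert context_pressure(60_000, 200_000) == "standard_protocol"      # 30%
```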
### Overflow Protocol (Context >80%)

```
overflow_protocol :: State → Action
overflow_protocol(s) = prioritize(
  serialize_to_disk:     save(s.knowledge/*) ∧ compress(s.history),
  reference_compression: link(files) ∧ ¬inline(content),
  session_split:         checkpoint(s) ∧ continue(s_{n+1}, fresh_context)
) where preserve_critical ∧ drop_redundant
```

**Actions**:
1. **Serialize to disk**: Save iteration state to `knowledge/` directory
2. **Reference compression**: Link to files instead of inlining content
3. **Session split**: Complete current phase, start new session for next iteration

**Example** (from Bootstrap-013, 8 iterations):
- Iteration 4: Context 85% → Serialized analysis to `knowledge/pattern-analysis.md`
- Iteration 5: Started fresh session, loaded serialized state via file references
- Result: Continued 4 more iterations without context overflow

### Compression Protocol (Context 50-80%)

```
compression_protocol :: State → Optimizations
compression_protocol(s) = apply(
  deduplication: merge(similar_patterns) ∧ reference_once,
  summarization: compress(historical_context) ∧ keep(structure),
  lazy_loading:  defer(load) ∧ fetch_on_demand
)
```

**Optimizations**:
1. **Deduplication**: Merge similar patterns, reference once
2. **Summarization**: Compress historical iterations while preserving structure
3. **Lazy loading**: Load agent definitions only when invoked

### Convergence Adjustment Under Context Pressure

```
convergence_adjustment :: (Context, V_i, V_m) → Threshold
convergence_adjustment(ctx, V_i, V_m) =
  if usage(ctx) > 0.80 then prefer(meta_focused) ∧ accept(V_i ≥ 0.55 ∧ V_m ≥ 0.80)
  else if usage(ctx) > 0.50 then standard_dual ∧ target(V_i ≥ 0.80 ∧ V_m ≥ 0.80)
  else extended_optimization ∧ pursue(V_i ≥ 0.90)
```

**Principle**: Under high context pressure, prioritize methodology quality (V_meta) over instance quality (V_instance), as methodology is more transferable and valuable long-term.

**Validation** (Bootstrap-011):
- Context pressure: High (95%+ transferability methodology)
- Converged with: V_meta = 0.877, V_instance = 0.585
- Pattern: Meta-Focused Convergence justified by context constraints

### Context Tracking Metrics

```
context_metrics :: State → Metrics
context_metrics(s) = {
  usage_percentage:   tokens_used / tokens_limit,
  phase_distribution: {obs: 0.30, cod: 0.40, aut: 0.20, ref: 0.10},
  compression_ratio:  compressed_size / original_size,
  session_splits:     count(checkpoints)
}
```

Track these metrics to predict when intervention is needed.
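A minimal tracker for that record might look like the following; the field names mirror the `context_metrics` spec, while the token numbers in the demo are made up.

```python
"""Minimal tracker for the context_metrics record above (sketch)."""
from dataclasses import dataclass

@dataclass
class ContextTracker:
    tokens_limit: int
    tokens_used: int = 0
    original_size: int = 0     # bytes of raw historical context
    compressed_size: int = 0   # bytes after summarization
    session_splits: int = 0    # checkpoints taken so far

    def metrics(self) -> dict:
        return {
            "usage_percentage": self.tokens_used / self.tokens_limit,
            "compression_ratio": (self.compressed_size / self.original_size
                                  if self.original_size else 1.0),
            "session_splits": self.session_splits,
        }

tracker = ContextTracker(tokens_limit=200_000, tokens_used=164_000,
                         original_size=50_000, compressed_size=18_000)
print(tracker.metrics())  # usage 0.82 → overflow-protocol territory
```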
---

## Prompt Evolution Protocol

λ(agent, effectiveness_data) → agent' | ∀evolution: evidence_driven

**Systematic prompt engineering** based on empirical effectiveness data, not intuition.

### Core Prompt Patterns

```
prompt_pattern :: Pattern → Template
prompt_pattern(p) = match p {
  context_bounded:       "Process {input} in chunks of {size}. For each chunk: {analysis}. Aggregate: {synthesis}.",
  tool_orchestrating:    "Execute: {tool_sequence}. For each result: {validation}. If {condition}: {fallback}.",
  iterative_refinement:  "Attempt: {approach_1}. Assess: {criteria}. If insufficient: {approach_2}. Repeat until: {threshold}.",
  evidence_accumulation: "Hypothesis: {H}. Seek confirming: {C}. Seek disconfirming: {D}. Weight: {W}. Decide: {decision}."
}
```

**Usage**:
- **context_bounded**: When processing large datasets (e.g., log analysis, file scanning)
- **tool_orchestrating**: When coordinating multiple MCP tools (e.g., query cascade)
- **iterative_refinement**: When solution quality improves through iteration (e.g., optimization)
- **evidence_accumulation**: When validating hypotheses (e.g., pattern discovery)

### Prompt Effectiveness Measurement

```
prompt_effectiveness :: Prompt → Metrics
prompt_effectiveness(P) = measure(
  convergence_contribution: ΔV_per_iteration,
  token_efficiency:         output_value / tokens_used,
  error_rate:               failures / total_invocations,
  reusability:              cross_domain_success_rate
) where empirical_data ∧ comparative_baseline
```

**Metrics**:
1. **Convergence contribution**: How much does the agent improve V_instance or V_meta per iteration?
2. **Token efficiency**: Value delivered per token consumed (cost-effectiveness)
3. **Error rate**: Percentage of invocations that fail or produce invalid output
4. **Reusability**: Success rate when applied to different domains

**Example** (from Bootstrap-009):
- log-analyzer agent:
  - ΔV_per_iteration: +0.12 average
  - Token efficiency: 0.85 (high value, moderate tokens)
  - Error rate: 3% (acceptable)
  - Reusability: 90% (worked in 009, 010, 012)
- Result: Prompt kept, agent reused in subsequent experiments

### Prompt Evolution Decision

```
prompt_evolution :: (P, Evidence) → P'
prompt_evolution(P, E) =
  if improvement_demonstrated(E) ∧ generalization_validated(E)
  then update(P → P') ∧ version(P'.version + 1) ∧ document(E.rationale)
  else maintain(P) ∧ log(evolution_rejected, E.reason)
  where ¬premature_optimization ∧ n_samples ≥ 3
```

**Evolution criteria**:
1. **Improvement demonstrated**: Evidence shows measurable improvement (ΔV > 0.05 or error_rate < 50%)
2. **Generalization validated**: Works across ≥3 different scenarios
3. **n_samples ≥ 3**: Avoid overfitting to single case

**Example** (theoretical - prompt evolution not yet observed in 8 experiments):

```
Original prompt: "Analyze logs for errors."
Evidence: Error detection rate 67%, false positives 23%

Evolved prompt: "Analyze {logs} for errors. For each: classify(type,
severity, context). Filter: severity >= {threshold}. Output: structured_json."
Evidence: Error detection rate 89%, false positives 8%

Decision: Evolution accepted (improvement demonstrated, validated across 3 log types)
```
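The decision rule itself is small enough to sketch. The three thresholds (ΔV > 0.05, ≥3 scenarios, n ≥ 3 samples) come from the criteria above; the `Evidence` structure is an illustrative assumption, and only the ΔV branch of criterion 1 is encoded.

```python
"""Sketch of the prompt_evolution decision rule under the stated
criteria; the Evidence shape is assumed, not prescribed."""
from dataclasses import dataclass

@dataclass
class Evidence:
    delta_v: float          # measured ΔV attributable to the new prompt
    scenarios_passed: int   # distinct scenarios where it generalized
    n_samples: int          # total observations backing the evidence

def should_evolve(e: Evidence) -> bool:
    improvement = e.delta_v > 0.05        # ΔV branch of criterion 1 only
    generalization = e.scenarios_passed >= 3
    enough_data = e.n_samples >= 3        # guard against overfitting
    return improvement and generalization and enough_data

# The theoretical log-analysis example: large gain, validated on 3 log types.
print(should_evolve(Evidence(delta_v=0.12, scenarios_passed=3, n_samples=3)))  # True
print(should_evolve(Evidence(delta_v=0.02, scenarios_passed=1, n_samples=1)))  # False
```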
### Agent Prompt Protocol

```
agent_prompt_protocol :: Agent → Execution
agent_prompt_protocol(A) = ∀invocation:
  read(agents/{A.name}.md) ∧
  extract(prompt_latest_version) ∧
  apply(prompt) ∧
  track(effectiveness) ∧
  ¬cache_prompt
```

**Critical**: Always read the agent definition fresh (no caching) to ensure the latest prompt version is used.

**Tracking**:
- Log each invocation: agent_name, prompt_version, input, output, success/failure
- Aggregate metrics: Calculate effectiveness scores periodically
- Trigger evolution: When n_samples ≥ 3 and improvement opportunity identified

---

## Relationship to Other Methodologies

**bootstrapped-se is the CORE framework** that integrates and extends two complementary methodologies.

### Relationship to empirical-methodology (Inclusion)

**bootstrapped-se INCLUDES AND EXTENDS empirical-methodology**:

```
empirical-methodology (5 phases):
  Observe → Analyze → Codify → Automate → Evolve

bootstrapped-se (OCA cycle + extensions):
  Observe ───────────→ Codify ────→ Automate
     ↑                                  ↓
     └────────────── Evolve ────────────┘
  (Self-referential feedback loop)
```

**What bootstrapped-se adds beyond empirical-methodology**:
1. **Three-Tuple Output** (O, Aₙ, Mₙ) - Reusable artifacts at system level
2. **Agent Framework** - Specialized agents emerge from domain needs
3. **Meta-Agent System** - Modular capabilities for coordination
4. **Self-Referential Loop** - Framework applies to itself
5. **Formal Convergence** - System stability criteria (M_n == M_{n-1}, A_n == A_{n-1})

**When to use empirical-methodology explicitly**:
- Need detailed scientific method guidance
- Require fine-grained observation tool selection
- Want explicit separation of Analyze phase

**When to use bootstrapped-se**:
- **Always** - It's the core framework
- All Bootstrap experiments use bootstrapped-se as foundation
- Provides complete OCA cycle with agent system

### Relationship to value-optimization (Mutual Support)

**value-optimization PROVIDES QUANTIFICATION for bootstrapped-se**:

```
bootstrapped-se needs:        value-optimization provides:
- Quality measurement      →  Dual-layer value functions
- Convergence detection    →  Formal convergence criteria
- Evolution decisions      →  ΔV calculations, trends
- Success validation       →  V_instance ≥ 0.80, V_meta ≥ 0.80
```

**bootstrapped-se ENABLES value-optimization**:
- OCA cycle generates state transitions (s_i → s_{i+1})
- Agent work produces V_instance improvements
- Meta-Agent work produces V_meta improvements
- Iteration framework implements optimization loop

**When to use value-optimization**:
- **Always with bootstrapped-se** - Provides evaluation framework
- Calculate V_instance and V_meta at every iteration
- Check convergence criteria formally
- Compare across experiments

**Integration**:

```
Every bootstrapped-se iteration:
1. Execute OCA cycle (Observe → Codify → Automate)
2. Calculate V(s_n) using value-optimization
3. Check convergence (system stable + dual threshold)
4. If not converged: Continue iteration
5. If converged: Generate (O, Aₙ, Mₙ)
```
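Step 3's check combines the dual 0.80 thresholds from value-optimization with the system-stability criterion (M_n == M_{n-1}, A_n == A_{n-1}). A sketch, with the state representation assumed:

```python
"""Sketch of the convergence check in step 3 above."""

def converged(v_instance: float, v_meta: float,
              agents_n: set[str], agents_prev: set[str],
              meta_n: dict, meta_prev: dict,
              threshold: float = 0.80) -> bool:
    dual_threshold = v_instance >= threshold and v_meta >= threshold
    system_stable = agents_n == agents_prev and meta_n == meta_prev
    return dual_threshold and system_stable

# Standard dual convergence: values above threshold, no system changes.
print(converged(0.85, 0.88,
                {"coder", "data-analyst"}, {"coder", "data-analyst"},
                {"capabilities": 5}, {"capabilities": 5}))  # True
```

Note how Meta-Focused Convergence (as in Bootstrap-011) would relax the `v_instance` threshold rather than the stability test.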
### Three-Methodology Integration

**Complete workflow** (as used in all Bootstrap experiments):

```
┌─ methodology-framework ────────────────────────────┐
│                                                    │
│  ┌─ bootstrapped-se (CORE) ─────────────────────┐  │
│  │                                              │  │
│  │  ┌─ empirical-methodology ────────────────┐  │  │
│  │  │                                        │  │  │
│  │  │  Observe + Analyze                     │  │  │
│  │  │  Codify (with evidence)                │  │  │
│  │  │  Automate (CI/CD)                      │  │  │
│  │  │  Evolve (self-referential)             │  │  │
│  │  │                                        │  │  │
│  │  └────────────────────────────────────────┘  │  │
│  │                      ↓                       │  │
│  │            Produce: (O, Aₙ, Mₙ)              │  │
│  │                      ↓                       │  │
│  │  ┌─ value-optimization ───────────────────┐  │  │
│  │  │                                        │  │  │
│  │  │  V_instance(s_n) = domain quality      │  │  │
│  │  │  V_meta(s_n) = methodology quality     │  │  │
│  │  │                                        │  │  │
│  │  │  Convergence check:                    │  │  │
│  │  │  - System stable?                      │  │  │
│  │  │  - Dual threshold met?                 │  │  │
│  │  │                                        │  │  │
│  │  └────────────────────────────────────────┘  │  │
│  │                                              │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
└────────────────────────────────────────────────────┘
```

**Usage Recommendation**:
- **Start here**: Read bootstrapped-se.md (this file)
- **Add evaluation**: Read value-optimization.md
- **Add rigor**: Read empirical-methodology.md (optional)
- **See integration**: Read bootstrapped-ai-methodology-engineering.md (BAIME framework)

---

## Related Skills

- **bootstrapped-ai-methodology-engineering**: Unified BAIME framework integrating all three methodologies
- **empirical-methodology**: Scientific foundation (included in bootstrapped-se)
- **value-optimization**: Quantitative evaluation framework (used by bootstrapped-se)
- **iteration-executor**: Implementation agent (coordinates bootstrapped-se execution)

---

## Knowledge Base

### Source Documentation
- **Core methodology**: `docs/methodology/bootstrapped-software-engineering.md`
- **Related**: `docs/methodology/empirical-methodology-development.md`
- **Examples**: `experiments/bootstrap-*/` (8 validated experiments)

### Key Concepts
- OCA Framework (Observe-Codify-Automate)
- Three-Tuple Output (O, Aₙ, Mₙ)
- Self-Referential Feedback Loop
- Convergence Theorem
- Meta-Methodology

---

## Version History

- **v1.0.0** (2025-10-18): Initial release
  - Based on meta-cc methodology development
  - 8 experiments validated (95% transferability)
  - OCA framework with 5-layer feedback loop
  - Empirical validation from 277 commits, 11 days

---

**Status**: ✅ Production-ready
**Validation**: 8 experiments (Bootstrap-001 to -013)
**Effectiveness**: 10-50x methodology development speedup
**Transferability**: 95% (framework universal, tools adaptable)