---
name: bootstrapped-se
description: Apply Bootstrapped Software Engineering (BSE) methodology to evolve project-specific development practices through systematic Observe-Codify-Automate cycles
keywords: bootstrapping, meta-methodology, OCA, observe, codify, automate, self-improvement, empirical, methodology-development
category: methodology
version: 1.0.0
based_on: docs/methodology/bootstrapped-software-engineering.md
transferability: 95%
effectiveness: 10-50x methodology development speedup
---
# Bootstrapped Software Engineering
**Evolve project-specific methodologies through systematic observation, codification, and automation.**
> The best methodologies are not **designed** but **evolved** through systematic observation, codification, and automation of successful practices.
---
## Core Insight
Traditional methodologies are theory-driven and static. **Bootstrapped Software Engineering (BSE)** enables development processes to:
1. **Observe** themselves through instrumentation and data collection
2. **Codify** discovered patterns into reusable methodologies
3. **Automate** methodology enforcement and validation
4. **Self-improve** by applying the methodology to its own evolution
### Three-Tuple Output
Every BSE process produces:
```
(O, Aₙ, Mₙ)
where:
O = Task output (code, documentation, system)
Aₙ = Converged agent set (reusable for similar tasks)
Mₙ = Converged meta-agent (transferable to new domains)
```
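To make the three-tuple concrete, here is a minimal Python sketch; the `BootstrapResult` class and its field names are illustrative placeholders, not part of any BSE tooling:

```python
from dataclasses import dataclass, field

@dataclass
class BootstrapResult:
    """Hypothetical container for the (O, A_n, M_n) three-tuple."""
    output: str                                       # O: task output (code, docs, system)
    agents: list[str] = field(default_factory=list)   # A_n: converged agent set
    meta_agent: str = "M0"                            # M_n: converged meta-agent

result = BootstrapResult(
    output="docs/methodology/role-based-documentation.md",
    agents=["coder", "data-analyst", "doc-writer"],
    meta_agent="M0",
)
print(result)
```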
---
## The OCA Framework
**Three-Phase Cycle**: Observe → Codify → Automate
### Phase 1: OBSERVE
**Instrument your development process to collect data**
**Tools**:
- Session history analysis (meta-cc)
- Git commit analysis
- Code metrics (coverage, complexity)
- Access pattern tracking
- Error rate monitoring
**Example** (from meta-cc):
```bash
# Analyze file access patterns
meta-cc query files --threshold 5
# Result: plan.md accessed 423 times (highest)
# Insight: Core reference document, needs optimization
```
**Output**: Empirical data about actual development patterns
### Phase 2: CODIFY
**Extract patterns and document as reusable methodologies**
**Process**:
1. **Pattern Recognition**: Identify recurring successful practices
2. **Hypothesis Formation**: Formulate testable claims
3. **Documentation**: Write methodology documents
4. **Validation**: Test methodology on real scenarios
**Example** (from meta-cc):
```markdown
# Discovered Pattern: Role-Based Documentation
Observation:
- plan.md: 423 accesses (Coordination role)
- CLAUDE.md: ~300 implicit loads (Entry Point role)
- features.md: 89 accesses (Reference role)
Methodology:
- Classify docs by actual access patterns
- Optimize high-access docs for token efficiency
- Create role-specific maintenance procedures
Validation:
- CLAUDE.md reduction: 607 → 278 lines (-54%)
- Token cost reduction: 47%
- Access efficiency: Maintained
```
**Output**: Documented methodology with empirical validation
### Phase 3: AUTOMATE
**Convert methodology into automated checks and tools**
**Automation Levels**:
1. **Detection**: Automated pattern detection
2. **Validation**: Check compliance with methodology
3. **Enforcement**: CI/CD integration, block violations
4. **Suggestion**: Automated fix recommendations
**Example** (from meta-cc):
```bash
# Automation: /meta doc-health capability
# Checks:
- Role classification compliance
- Token efficiency (lines < threshold)
- Cross-reference completeness
- Update frequency
# Actions:
- Flag oversized documents
- Suggest restructuring
- Validate role assignments
```
**Output**: Automated tools enforcing methodology
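As a sketch of what the Validation and Enforcement levels can look like, the following Python script flags documents that exceed role-based line limits and exits non-zero so a CI job can block the violation; the file names, role assignments, and thresholds are illustrative:

```python
#!/usr/bin/env python3
"""Illustrative doc-health check: flag docs that exceed role-based line limits."""
import sys
from pathlib import Path

# Hypothetical role assignments and thresholds; adapt to your project.
ROLE_LIMITS = {"entry_point": 300, "reference": 500}
DOC_ROLES = {
    "README.md": "entry_point",
    "CLAUDE.md": "entry_point",
    "docs/ARCHITECTURE.md": "reference",
}

def check_docs(root: Path) -> list[str]:
    violations = []
    for rel_path, role in DOC_ROLES.items():
        doc = root / rel_path
        if not doc.exists():
            continue
        lines = len(doc.read_text(encoding="utf-8").splitlines())
        limit = ROLE_LIMITS[role]
        if lines > limit:
            violations.append(f"{rel_path}: {lines} lines exceeds {role} limit of {limit}")
    return violations

if __name__ == "__main__":
    problems = check_docs(Path("."))
    for p in problems:
        print(f"VIOLATION: {p}")
    sys.exit(1 if problems else 0)  # non-zero exit lets CI block the change
```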
---
## Self-Referential Feedback Loop
The ultimate power of BSE: **Apply the methodology to improve itself**
```
Layer 0: Basic Functionality
→ Build tools (meta-cc CLI)
Layer 1: Self-Observation
→ Use tools on self (query own sessions)
→ Discovery: Usage patterns, bottlenecks
Layer 2: Pattern Recognition
→ Analyze data (R/E ratio, access density)
→ Discovery: Document roles, optimization opportunities
Layer 3: Methodology Extraction
→ Codify patterns (role-based-documentation.md)
→ Definition: Classification algorithm, maintenance procedures
Layer 4: Tool Automation
→ Implement checks (/meta doc-health)
→ Auto-validate: Methodology compliance
Layer 5: Continuous Evolution
→ Apply tools to self
→ Discover new patterns → Update methodology → Update tools
```
**This creates a closed loop**: Tools improve tools, methodologies optimize methodologies.
---
## Parameters
- **domain**: `documentation` | `testing` | `architecture` | `custom` (default: `custom`)
- **observation_period**: number of days/commits to analyze (default: auto-detect)
- **automation_level**: `detect` | `validate` | `enforce` | `suggest` (default: `validate`)
- **iteration_count**: number of OCA cycles (default: 3)
---
## Execution Flow
### Phase 1: Observation Setup
```
1. Identify observation targets
- Code metrics (LOC, complexity, coverage)
- Development patterns (commits, PRs, errors)
- Access patterns (file reads, searches)
- Quality metrics (test results, build time)
2. Install instrumentation
- meta-cc integration (session analysis)
- Git hooks (commit tracking)
- Coverage tracking
- CI/CD metrics
3. Collect baseline data
- Run for observation_period
- Generate initial reports
- Identify data gaps
```
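If a session-analysis tool such as meta-cc is not yet in place, a rough observation baseline can be pulled from git history alone. The sketch below counts how often each file changed during the observation period; the 30-day default and the use of churn as an access proxy are both assumptions to adjust:

```python
"""Illustrative baseline collection: count how often each file changes (git churn)."""
import subprocess
from collections import Counter

def file_churn(since: str = "30 days ago") -> Counter:
    # `git log --name-only` lists the changed files under each commit;
    # an empty --pretty format suppresses commit headers.
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in out.splitlines() if line.strip())

if __name__ == "__main__":
    churn = file_churn()
    for path, count in churn.most_common(10):
        print(f"{count:4d}  {path}")
```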
### Phase 2: Pattern Analysis
```
4. Analyze collected data
- Statistical analysis (frequencies, correlations)
- Pattern recognition (recurring behaviors)
- Anomaly detection (outliers, inefficiencies)
5. Formulate hypotheses
- "High-access docs should be < 300 lines"
- "Test coverage gaps correlate with bugs"
- "Batch remediation is 5x more efficient"
6. Validate hypotheses
- Historical data validation
- A/B testing if possible
- Expert review
```
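A hypothesis such as "high-access docs should be < 300 lines" can be checked mechanically once observation data exists. The sketch below uses hypothetical observation tuples and arbitrary thresholds purely to illustrate the shape of such a check:

```python
"""Illustrative hypothesis check: do high-access docs stay under a line budget?"""

# Hypothetical observation data: (file, access_count, line_count)
OBSERVATIONS = [
    ("README.md", 423, 1909),
    ("docs/ARCHITECTURE.md", 89, 456),
    ("docs/API.md", 234, 789),
]

ACCESS_THRESHOLD = 100   # "high-access" cutoff (assumption)
LINE_BUDGET = 300        # hypothesis: high-access docs should be < 300 lines

def test_hypothesis(observations):
    high_access = [o for o in observations if o[1] >= ACCESS_THRESHOLD]
    violations = [(f, lines) for f, _, lines in high_access if lines >= LINE_BUDGET]
    return len(high_access), violations

if __name__ == "__main__":
    total, violations = test_hypothesis(OBSERVATIONS)
    print(f"{len(violations)}/{total} high-access docs violate the {LINE_BUDGET}-line budget")
    for f, lines in violations:
        print(f"  candidate for optimization: {f} ({lines} lines)")
```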
### Phase 3: Codification
```
7. Document patterns
- Pattern name and description
- Context and applicability
- Implementation steps
- Validation criteria
- Examples and counter-examples
8. Create methodology
- Problem statement
- Solution approach
- Procedures and guidelines
- Metrics and validation
9. Peer review
- Team review
- Iterate based on feedback
```
### Phase 4: Automation
```
10. Design automation
- Detection: Identify when pattern applies
- Validation: Check compliance
- Enforcement: Prevent violations
- Suggestion: Recommend fixes
11. Implement tools
- Scripts (bash, Python)
- CI/CD integration
- IDE plugins
- Bot automation
12. Deploy and monitor
- Gradual rollout
- Collect usage data
- Measure effectiveness
```
### Phase 5: Evolution
```
13. Apply to self
- Use tools on development process
- Discover meta-patterns
- Optimize methodology
14. Iterate OCA cycle
- New observations → New patterns
- Refined hypotheses → Better validation
- Enhanced automation → Higher compliance
```
---
## Usage Examples
### Example 1: Documentation Optimization
```bash
# User: "Optimize project documentation"
bootstrapped-se domain=documentation
# Execution:
[OBSERVE Phase]
✓ Analyzing file access patterns (30 days)...
- README.md: 423 accesses (Entry Point)
- ARCHITECTURE.md: 89 accesses (Reference)
- API.md: 234 accesses (Reference)
✓ Measuring token efficiency...
- README.md: 1909 lines (HIGH - inefficient)
- ARCHITECTURE.md: 456 lines (OK)
- API.md: 789 lines (MEDIUM)
[CODIFY Phase]
✓ Pattern identified: Role-Based Documentation
- Entry Point docs: Should be < 300 lines
- Reference docs: Should be < 500 lines
- Specialized docs: Can be longer
✓ Methodology documented:
- Created: docs/methodology/role-based-documentation.md
- Includes: Classification algorithm, optimization guide
[AUTOMATE Phase]
✓ Automation created:
- Script: scripts/check-doc-health.sh
- CI check: .github/workflows/doc-validation.yml
- Capability: /meta doc-health
[EVOLUTION Phase]
✓ Applied to self:
- Optimized README.md: 1909 → 275 lines (-85%)
- Optimized CLAUDE.md: 607 → 278 lines (-54%)
- Token cost reduction: 47%
✓ Next iteration insights:
- New pattern: Cross-reference density
- New automation: Auto-linking checker
```
### Example 2: Testing Strategy Evolution
```bash
# User: "Develop systematic testing methodology"
bootstrapped-se domain=testing observation_period=60
# Execution:
[OBSERVE Phase]
✓ Analyzing test patterns (60 days)...
- Coverage: 75% (good)
- Test distribution: Uneven (80% unit, 15% integration, 5% e2e)
- Flaky tests: 12% failure rate
- Slowest tests: integration (avg 45s)
[CODIFY Phase]
✓ Patterns identified:
1. Coverage-driven development works
2. Integration tests need fixtures
3. Flaky tests have environmental dependencies
✓ Methodology created:
- Test strategy matrix (when to write what)
- Fixture patterns for integration tests
- Flaky test elimination procedures
[AUTOMATE Phase]
✓ Automation implemented:
- Coverage gate: Block if < 75%
- Fixture generator: Auto-create from examples
- Flaky detector: Flag unstable tests
✓ Results:
- Coverage: 75% → 86%
- Flaky rate: 12% → 2%
- Test time: 45s → 12s (with parallel execution)
```
### Example 3: Error Recovery Patterns
```bash
# User: "Extract error recovery methodology"
bootstrapped-se domain=custom
# Execution:
[OBSERVE Phase]
✓ Analyzing error patterns...
- Total errors: 423 (from sessions)
- Error rate: 6.06%
- Most common: Type errors (45%), Logic errors (30%), Deps (15%)
[CODIFY Phase]
✓ Pattern: Error Classification Taxonomy
- Categories: Type, Logic, Dependency, Integration, Infrastructure
- Recovery strategies per category
- Prevention guidelines
✓ Methodology: Systematic Error Recovery
- Detection: Error signature extraction
- Classification: Rule-based categorization
- Recovery: Strategy pattern application
- Prevention: Root cause analysis → Code patterns
[AUTOMATE Phase]
✓ Tools created:
- Error classifier (ML-based)
- Recovery strategy recommender
- Prevention linter (detect anti-patterns)
✓ CI/CD Integration:
- Auto-classify build failures
- Suggest recovery steps
- Track error trends
```
---
## Validated Outcomes
**From meta-cc project** (8 experiments, 95% transferability):
### Documentation Methodology
- **Observation**: 423 file access patterns analyzed
- **Codification**: Role-based documentation methodology
- **Automation**: /meta doc-health capability
- **Result**: 47% token cost reduction, maintained accessibility
### Testing Strategy
- **Observation**: 75% coverage, uneven distribution
- **Codification**: Coverage-driven gap closure
- **Automation**: CI coverage gates, fixture generators
- **Result**: 75% → 86% coverage, 15x speedup vs ad-hoc
### Error Recovery
- **Observation**: 6.06% error rate, 423 errors analyzed
- **Codification**: Error taxonomy, recovery patterns
- **Automation**: Error classifier, recovery recommender
- **Result**: 85% transferability, systematic recovery
### Dependency Health
- **Observation**: 7 vulnerabilities, 11 outdated deps
- **Codification**: 6 patterns (vulnerability, update, license, etc.)
- **Automation**: 3 scripts + CI/CD workflow
- **Result**: 6x speedup (9h → 1.5h), 88% transferability
### Observability
- **Observation**: 0 logs, 0 metrics, 0 traces (baseline)
- **Codification**: Three Pillars methodology (Logging + Metrics + Tracing)
- **Automation**: Code generators, instrumentation templates
- **Result**: 23-46x speedup, 90-95% transferability
---
## Transferability
**95% transferable** across domains and projects:
### What Transfers (95%+)
- OCA framework itself (universal process)
- Self-referential feedback loop pattern
- Observation → Pattern → Automation pipeline
- Empirical validation approach
- Continuous evolution mindset
### What Needs Adaptation (5%)
- Specific observation tools (meta-cc → custom tools)
- Domain-specific patterns (docs vs testing vs architecture)
- Automation implementation details (language, platform)
### Adaptation Effort
- **Same project, new domain**: 2-4 hours
- **New project, same domain**: 4-8 hours
- **New project, new domain**: 8-16 hours
---
## Prerequisites
### Tools Required
- **Session analysis**: meta-cc or equivalent
- **Git analysis**: Git installed, access to repository
- **Metrics collection**: Coverage tools, static analyzers
- **Automation**: CI/CD platform (GitHub Actions, GitLab CI, etc.)
### Skills Required
- Basic data analysis (statistics, pattern recognition)
- Methodology documentation
- Scripting (bash, Python, or equivalent)
- CI/CD configuration
---
## Implementation Guidance
### Start Small
```bash
# Week 1: Observe
- Install meta-cc
- Track file accesses for 1 week
- Collect simple metrics
# Week 2: Codify
- Analyze top 10 access patterns
- Document 1-2 simple patterns
- Get team feedback
# Week 3: Automate
- Create 1 simple validation script
- Add to CI/CD
- Monitor compliance
# Week 4: Iterate
- Apply tools to development
- Discover new patterns
- Refine methodology
```
### Scale Up
```bash
# Month 2: Expand domains
- Apply to testing
- Apply to architecture
- Cross-validate patterns
# Month 3: Deep automation
- Build sophisticated checkers
- Integrate with IDE
- Create dashboards
# Month 4: Evolution
- Meta-patterns emerge
- Methodology generator
- Cross-project application
```
---
## Theoretical Foundation
### The Convergence Theorem
**Conjecture**: For any domain D, there exists a methodology M* such that:
1. **M* is locally optimal** for D (cannot be significantly improved)
2. **M* can be reached through bootstrapping** (systematic self-improvement)
3. **Convergence speed increases** with each iteration (learning effect)
**Implication**: We can **automatically discover** optimal methodologies for any domain.
### Scientific Method Analogy
```
1. Observation = Instrumentation (meta-cc tools)
2. Hypothesis = "CLAUDE.md should be <300 lines"
3. Experiment = Implement constraint, measure effects
4. Data Collection = query-files, git log analysis
5. Analysis = Calculate R/E ratio, access density
6. Conclusion = "300-line limit effective: 47% reduction"
7. Publication = Codify as methodology document
8. Replication = Apply to other projects
```
---
## Success Criteria
| Metric | Target | Validation |
|--------|--------|------------|
| **Pattern Discovery** | ≥3 patterns per cycle | Documented patterns |
| **Methodology Quality** | Peer-reviewed | Team acceptance |
| **Automation Coverage** | ≥80% of patterns | CI integration |
| **Effectiveness** | ≥3x improvement | Before/after metrics |
| **Transferability** | ≥85% reusability | Cross-project validation |
---
## Domain Adaptation Guide
**Different domains have different complexity characteristics** that affect iteration count, agent needs, and convergence patterns. This guide helps predict and adapt to domain-specific challenges.
### Domain Complexity Classes
Based on 8 completed Bootstrap experiments, we've identified three complexity classes:
#### Simple Domains (3-4 iterations)
**Characteristics**:
- Well-defined problem space
- Clear success criteria
- Limited interdependencies
- Established best practices exist
- Straightforward automation
**Examples**:
- **Bootstrap-010 (Dependency Health)**: 3 iterations
- Clear goals: vulnerabilities, freshness, licenses
- Existing tools: govulncheck, go-licenses
- Straightforward automation: CI/CD scripts
- Converged fastest in series
- **Bootstrap-011 (Knowledge Transfer)**: 3-4 iterations
- Well-understood domain: onboarding paths
- Clear structure: Day-1, Week-1, Month-1
- Existing patterns: progressive disclosure
- High transferability (95%+)
**Adaptation Strategy**:
```markdown
Simple Domain Approach:
1. Start with generic agents only (coder, data-analyst, doc-writer)
2. Focus on automation (tools, scripts, CI)
3. Expect fast convergence (3-4 iterations)
4. Prioritize transferability (aim for 85%+)
5. Minimal agent specialization needed
```
**Expected Outcomes**:
- Iterations: 3-4
- Duration: 6-8 hours
- Specialized agents: 0-1
- Transferability: 85-95%
- V_instance: Often exceeds 0.80 significantly (e.g., 0.92)
#### Medium Complexity Domains (4-6 iterations)
**Characteristics**:
- Multiple dimensions to optimize
- Some ambiguity in success criteria
- Moderate interdependencies
- Require domain expertise
- Automation has nuances
**Examples**:
- **Bootstrap-001 (Documentation)**: 3 iterations (simple side of medium)
- Multiple roles to define
- Access patterns analysis needed
- Search infrastructure complexity
- 85% transferability
- **Bootstrap-002 (Testing)**: 5 iterations
- Coverage vs quality trade-offs
- Multiple test types (unit, integration, e2e)
- Fixture patterns discovery
- 89% transferability
- **Bootstrap-009 (Observability)**: 6 iterations
- Three pillars (logging, metrics, tracing)
- Performance vs verbosity trade-offs
- Integration complexity
- 90-95% transferability
**Adaptation Strategy**:
```markdown
Medium Domain Approach:
1. Start with generic agents, add 1-2 specialized as needed
2. Expect iterative refinement of value functions
3. Plan for 4-6 iterations
4. Balance instance and meta objectives equally
5. Document trade-offs explicitly
```
**Expected Outcomes**:
- Iterations: 4-6
- Duration: 8-12 hours
- Specialized agents: 1-3
- Transferability: 85-90%
- V_instance: Typically 0.80-0.87
#### Complex Domains (6-8+ iterations)
**Characteristics**:
- High interdependency
- Emergent patterns (not obvious upfront)
- Multiple competing objectives
- Requires novel agent capabilities
- Automation is sophisticated
**Examples**:
- **Bootstrap-013 (Cross-Cutting Concerns)**: 8 iterations
- Pattern extraction from existing code
- Convention definition ambiguity
- Automated enforcement complexity
- Large codebase scope (all modules)
- Longest experiment but highest ROI (16.7x)
- **Bootstrap-003 (Error Recovery)**: 5 iterations (complex side)
- Error taxonomy creation
- Root cause diagnosis
- Recovery strategy patterns
- 85% transferability
- **Bootstrap-012 (Technical Debt)**: 4 iterations (medium-complex)
- SQALE quantification
- Prioritization complexity
- Subjective vs objective debt
- 85% transferability
**Adaptation Strategy**:
```markdown
Complex Domain Approach:
1. Expect agent evolution throughout
2. Plan for 6-8+ iterations
3. Accept lower initial V values (baseline often <0.35)
4. Focus on one dimension per iteration
5. Create specialized agents proactively when gaps identified
6. Document emergent patterns as discovered
```
**Expected Outcomes**:
- Iterations: 6-8+
- Duration: 12-18 hours
- Specialized agents: 3-5
- Transferability: 70-85%
- V_instance: Hard-earned 0.80-0.85
- Largest single-iteration gains possible (e.g., +27.3% in Bootstrap-013 Iteration 7)
### Domain-Specific Considerations
#### Documentation-Heavy Domains
**Examples**: Documentation (001), Knowledge Transfer (011)
**Key Adaptations**:
- Prioritize clarity over completeness
- Role-based structuring
- Accessibility optimization
- Cross-referencing systems
**Success Indicators**:
- Access/line ratio > 1.0
- User satisfaction surveys
- Search effectiveness
#### Technical Implementation Domains
**Examples**: Observability (009), Dependency Health (010)
**Key Adaptations**:
- Performance overhead monitoring
- Automation-first approach
- Integration testing critical
- CI/CD pipeline emphasis
**Success Indicators**:
- Automated coverage %
- Performance impact < 10%
- CI/CD reliability
#### Quality/Analysis Domains
**Examples**: Testing (002), Error Recovery (003), Technical Debt (012)
**Key Adaptations**:
- Quantification frameworks essential
- Baseline measurement critical
- Before/after comparisons
- Statistical validation
**Success Indicators**:
- Coverage metrics
- Error rate reduction
- Time savings quantified
#### Systematic Enforcement Domains
**Examples**: Cross-Cutting Concerns (013), Code Review (008 planned)
**Key Adaptations**:
- Pattern extraction from existing code
- Linter/checker development
- Gradual enforcement rollout
- Exception handling
**Success Indicators**:
- Pattern consistency %
- Violation detection rate
- Developer adoption rate
### Predicting Iteration Count
Based on empirical data from 8 experiments:
```
Base estimate: 5 iterations
Adjust based on:
- Well-defined domain: -2 iterations
- Existing tools available: -1 iteration
- High interdependency: +2 iterations
- Novel patterns needed: +1 iteration
- Large codebase scope: +1 iteration
- Multiple competing goals: +1 iteration
Examples:
Dependency Health: 5 - 2 - 1 = 2 → actual 3 ✓
Observability: 5 + 0 + 1 = 6 → actual 6 ✓
Cross-Cutting: 5 + 2 + 1 = 8 → actual 8 ✓
```
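The same adjustment table can be encoded directly. The sketch below reproduces the arithmetic above; the floor of 2 iterations is an added assumption, not part of the original estimate:

```python
"""Illustrative iteration-count estimator using the adjustments above."""

ADJUSTMENTS = {
    "well_defined_domain": -2,
    "existing_tools_available": -1,
    "high_interdependency": +2,
    "novel_patterns_needed": +1,
    "large_codebase_scope": +1,
    "multiple_competing_goals": +1,
}

def estimate_iterations(traits: set[str], base: int = 5) -> int:
    estimate = base + sum(delta for trait, delta in ADJUSTMENTS.items() if trait in traits)
    return max(estimate, 2)  # floor: even simple domains need a couple of cycles (assumption)

if __name__ == "__main__":
    print(estimate_iterations({"well_defined_domain", "existing_tools_available"}))  # dependency-health-like: 2
    print(estimate_iterations({"novel_patterns_needed"}))                            # observability-like: 6
    print(estimate_iterations({"high_interdependency", "novel_patterns_needed"}))    # cross-cutting-like: 8
```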
### Agent Specialization Prediction
```
Generic agents sufficient when:
- Domain has established patterns
- Clear best practices exist
- Automation is straightforward
→ Examples: Dependency Health, Knowledge Transfer
Specialized agents needed when:
- Novel pattern extraction required
- Domain-specific expertise needed
- Complex analysis algorithms
→ Examples: Observability (log-analyzer, metric-designer)
Cross-Cutting (pattern-extractor, convention-definer)
Rule of thumb:
- Simple domains: 0-1 specialized agents
- Medium domains: 1-3 specialized agents
- Complex domains: 3-5 specialized agents
```
### Meta-Agent Evolution Prediction
**Key finding from 8 experiments**: **M₀ was sufficient in ALL cases**
```
Meta-Agent M₀ capabilities (5):
1. observe: Pattern observation
2. plan: Iteration planning
3. execute: Agent orchestration
4. reflect: Value assessment
5. evolve: System evolution
No evolution needed because:
- M₀ capabilities cover full lifecycle
- Agent specialization handles domain gaps
- Modular design allows capability reuse
```
**When to evolve Meta-Agent** (theoretical, not yet observed):
- Novel coordination pattern needed
- Capability gap in lifecycle
- Cross-agent orchestration complexity
- New convergence pattern discovered
### Convergence Pattern Prediction
Based on domain characteristics:
**Standard Dual Convergence** (most common):
- Both V_instance and V_meta reach 0.80+
- Examples: Observability (009), Dependency Health (010), Technical Debt (012), Cross-Cutting (013)
- **Use when**: Both objectives equally important
**Meta-Focused Convergence**:
- V_meta reaches 0.80+, V_instance practically sufficient
- Example: Knowledge Transfer (011) - V_meta = 0.877, V_instance = 0.585
- **Use when**: Methodology is primary goal, instance is vehicle
**Practical Convergence**:
- Combined quality exceeds metrics, justified partial criteria
- Example: Testing (002) - V_instance = 0.848, quality > coverage %
- **Use when**: Quality evidence exceeds raw numbers
### Domain Transfer Considerations
**Transferability varies by domain abstraction**:
```
High (90-95%):
- Knowledge Transfer (95%+): Learning principles universal
- Observability (90-95%): Three Pillars apply everywhere
Medium-High (85-90%):
- Testing (89%): Test types similar across languages
- Dependency Health (88%): Package manager patterns similar
- Documentation (85%): Role-based structure universal
- Error Recovery (85%): Error taxonomy concepts transfer
- Technical Debt (85%): SQALE principles universal
Medium (70-85%):
- Cross-Cutting Concerns (70-80%): Language-specific patterns
- Refactoring (80% est.): Code smells vary by language
```
**Adaptation effort**:
```
Same language/ecosystem: 10-20% effort (adapt examples)
Similar language (Go→Rust): 30-40% effort (remap patterns)
Different paradigm (Go→JS): 50-60% effort (rethink patterns)
```
---
## Context Management for LLM Execution
λ(iteration, context_state) → (work_output, context_optimized) | context < limit:
**Context management is critical for LLM-based execution** where token limits constrain iteration depth and agent effectiveness.
### Context Allocation Protocol
```
context_allocation :: Phase → Percentage
context_allocation(phase) = match phase {
observation → 0.30, -- Data collection, pattern analysis
codification → 0.40, -- Documentation, methodology writing
automation → 0.20, -- Tool creation, CI integration
reflection → 0.10 -- Evaluation, planning
} where Σ = 1.0
```
**Rationale**: Across the 8 experiments, codification consumed the most context (methodology docs, agent definitions), followed by observation (data analysis), automation (code writing), and reflection (evaluation).
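Turning the allocation into concrete token budgets is a one-line calculation. In the sketch below, the 200,000-token limit is a placeholder for your model's actual context window:

```python
"""Illustrative context budget split across phases (percentages from the protocol above)."""

PHASE_ALLOCATION = {"observation": 0.30, "codification": 0.40, "automation": 0.20, "reflection": 0.10}

def phase_budgets(token_limit: int) -> dict[str, int]:
    assert abs(sum(PHASE_ALLOCATION.values()) - 1.0) < 1e-9  # Σ = 1.0
    return {phase: int(token_limit * share) for phase, share in PHASE_ALLOCATION.items()}

if __name__ == "__main__":
    # token_limit is a placeholder; substitute your model's context window.
    for phase, budget in phase_budgets(200_000).items():
        print(f"{phase:13s} {budget:7d} tokens")
```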
### Context Pressure Management
```
context_pressure :: State → Strategy
context_pressure(s) =
if usage(s) > 0.80 then overflow_protocol(s)
else if usage(s) > 0.50 then compression_protocol(s)
else standard_protocol(s)
```
### Overflow Protocol (Context >80%)
```
overflow_protocol :: State → Action
overflow_protocol(s) = prioritize(
serialize_to_disk: save(s.knowledge/*) ∧ compress(s.history),
reference_compression: link(files) ∧ ¬inline(content),
session_split: checkpoint(s) ∧ continue(s_{n+1}, fresh_context)
) where preserve_critical ∧ drop_redundant
```
**Actions**:
1. **Serialize to disk**: Save iteration state to `knowledge/` directory
2. **Reference compression**: Link to files instead of inlining content
3. **Session split**: Complete current phase, start new session for next iteration
**Example** (from Bootstrap-013, 8 iterations):
- Iteration 4: Context 85% → Serialized analysis to `knowledge/pattern-analysis.md`
- Iteration 5: Started fresh session, loaded serialized state via file references
- Result: Continued 4 more iterations without context overflow
### Compression Protocol (Context 50-80%)
```
compression_protocol :: State → Optimizations
compression_protocol(s) = apply(
deduplication: merge(similar_patterns) ∧ reference_once,
summarization: compress(historical_context) ∧ keep(structure),
lazy_loading: defer(load) ∧ fetch_on_demand
)
```
**Optimizations**:
1. **Deduplication**: Merge similar patterns, reference once
2. **Summarization**: Compress historical iterations while preserving structure
3. **Lazy loading**: Load agent definitions only when invoked
### Convergence Adjustment Under Context Pressure
```
convergence_adjustment :: (Context, V_i, V_m) → Threshold
convergence_adjustment(ctx, V_i, V_m) =
if usage(ctx) > 0.80 then
prefer(meta_focused) ∧ accept(V_i ≥ 0.55 ∧ V_m ≥ 0.80)
else if usage(ctx) > 0.50 then
standard_dual ∧ target(V_i ≥ 0.80 ∧ V_m ≥ 0.80)
else
extended_optimization ∧ pursue(V_i ≥ 0.90)
```
**Principle**: Under high context pressure, prioritize methodology quality (V_meta) over instance quality (V_instance), as methodology is more transferable and valuable long-term.
**Validation** (Bootstrap-011):
- Context pressure: High (95%+ transferability methodology)
- Converged with: V_meta = 0.877, V_instance = 0.585
- Pattern: Meta-Focused Convergence justified by context constraints
### Context Tracking Metrics
```
context_metrics :: State → Metrics
context_metrics(s) = {
usage_percentage: tokens_used / tokens_limit,
phase_distribution: {obs: 0.30, cod: 0.40, aut: 0.20, ref: 0.10},
compression_ratio: compressed_size / original_size,
session_splits: count(checkpoints)
}
```
Track these metrics to predict when intervention is needed.
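A minimal tracker only needs the usage percentage and the two thresholds above. The sketch below, with illustrative token counts, selects which protocol to apply:

```python
"""Illustrative context tracking: compute usage and select a management protocol."""

def context_usage(tokens_used: int, tokens_limit: int) -> float:
    return tokens_used / tokens_limit

def select_protocol(usage: float) -> str:
    # Thresholds mirror the pressure rules above: >80% overflow, >50% compression.
    if usage > 0.80:
        return "overflow_protocol"     # serialize to disk, compress references, split session
    if usage > 0.50:
        return "compression_protocol"  # deduplicate, summarize history, lazy-load agents
    return "standard_protocol"

if __name__ == "__main__":
    # Example numbers are illustrative; substitute real token counts from your runtime.
    usage = context_usage(tokens_used=170_000, tokens_limit=200_000)
    print(f"usage={usage:.0%} -> {select_protocol(usage)}")
```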
---
## Prompt Evolution Protocol
λ(agent, effectiveness_data) → agent' | ∀evolution: evidence_driven:
**Systematic prompt engineering** based on empirical effectiveness data, not intuition.
### Core Prompt Patterns
```
prompt_pattern :: Pattern → Template
prompt_pattern(p) = match p {
context_bounded:
"Process {input} in chunks of {size}. For each chunk: {analysis}. Aggregate: {synthesis}.",
tool_orchestrating:
"Execute: {tool_sequence}. For each result: {validation}. If {condition}: {fallback}.",
iterative_refinement:
"Attempt: {approach_1}. Assess: {criteria}. If insufficient: {approach_2}. Repeat until: {threshold}.",
evidence_accumulation:
"Hypothesis: {H}. Seek confirming: {C}. Seek disconfirming: {D}. Weight: {W}. Decide: {decision}."
}
```
**Usage**:
- **context_bounded**: When processing large datasets (e.g., log analysis, file scanning)
- **tool_orchestrating**: When coordinating multiple MCP tools (e.g., query cascade)
- **iterative_refinement**: When solution quality improves through iteration (e.g., optimization)
- **evidence_accumulation**: When validating hypotheses (e.g., pattern discovery)
### Prompt Effectiveness Measurement
```
prompt_effectiveness :: Prompt → Metrics
prompt_effectiveness(P) = measure(
convergence_contribution: ΔV_per_iteration,
token_efficiency: output_value / tokens_used,
error_rate: failures / total_invocations,
reusability: cross_domain_success_rate
) where empirical_data ∧ comparative_baseline
```
**Metrics**:
1. **Convergence contribution**: How much does agent improve V_instance or V_meta per iteration?
2. **Token efficiency**: Value delivered per token consumed (cost-effectiveness)
3. **Error rate**: Percentage of invocations that fail or produce invalid output
4. **Reusability**: Success rate when applied to different domains
**Example** (from Bootstrap-009):
- log-analyzer agent:
- ΔV_per_iteration: +0.12 average
- Token efficiency: 0.85 (high value, moderate tokens)
- Error rate: 3% (acceptable)
- Reusability: 90% (worked in 009, 010, 012)
- Result: Prompt kept, agent reused in subsequent experiments
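These metrics can be aggregated from per-invocation logs. The sketch below uses hypothetical records and treats token efficiency as ΔV per token, which is one possible proxy for "value delivered per token":

```python
"""Illustrative prompt effectiveness scoring from per-invocation logs."""
from statistics import mean

# Hypothetical invocation records for one agent prompt version.
INVOCATIONS = [
    {"delta_v": 0.10, "tokens": 4200, "failed": False, "cross_domain_ok": True},
    {"delta_v": 0.14, "tokens": 3800, "failed": False, "cross_domain_ok": True},
    {"delta_v": 0.12, "tokens": 5100, "failed": True,  "cross_domain_ok": False},
]

def prompt_effectiveness(invocations):
    return {
        "convergence_contribution": mean(i["delta_v"] for i in invocations),
        "token_efficiency": sum(i["delta_v"] for i in invocations) / sum(i["tokens"] for i in invocations),
        "error_rate": sum(i["failed"] for i in invocations) / len(invocations),
        "reusability": sum(i["cross_domain_ok"] for i in invocations) / len(invocations),
    }

if __name__ == "__main__":
    for metric, value in prompt_effectiveness(INVOCATIONS).items():
        print(f"{metric:25s} {value:.3g}")
```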
### Prompt Evolution Decision
```
prompt_evolution :: (P, Evidence) → P'
prompt_evolution(P, E) =
if improvement_demonstrated(E) ∧ generalization_validated(E) then
update(P → P') ∧ version(P'.version + 1) ∧ document(E.rationale)
else
maintain(P) ∧ log(evolution_rejected, E.reason)
where ¬premature_optimization ∧ n_samples ≥ 3
```
**Evolution criteria**:
1. **Improvement demonstrated**: Evidence shows measurable improvement (ΔV > 0.05 or error_rate < 50%)
2. **Generalization validated**: Works across ≥3 different scenarios
3. **n_samples ≥ 3**: Avoid overfitting to single case
**Example** (theoretical - prompt evolution not yet observed in 8 experiments):
```
Original prompt: "Analyze logs for errors."
Evidence: Error detection rate 67%, false positives 23%
Evolved prompt: "Analyze {logs} for errors. For each: classify(type, severity, context). Filter: severity >= {threshold}. Output: structured_json."
Evidence: Error detection rate 89%, false positives 8%
Decision: Evolution accepted (improvement demonstrated, validated across 3 log types)
```
### Agent Prompt Protocol
```
agent_prompt_protocol :: Agent → Execution
agent_prompt_protocol(A) = ∀invocation:
read(agents/{A.name}.md) ∧
extract(prompt_latest_version) ∧
apply(prompt) ∧
track(effectiveness) ∧
¬cache_prompt
```
**Critical**: Always read the agent definition fresh (no caching) to ensure the latest prompt version is used.
**Tracking**:
- Log each invocation: agent_name, prompt_version, input, output, success/failure
- Aggregate metrics: Calculate effectiveness scores periodically
- Trigger evolution: When n_samples ≥ 3 and improvement opportunity identified
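A minimal append-only log is enough to support this tracking. The sketch below writes one JSON line per invocation to a hypothetical `knowledge/agent-invocations.jsonl` file:

```python
"""Illustrative invocation logging for the agent prompt protocol."""
import json
import time
from pathlib import Path

LOG_PATH = Path("knowledge/agent-invocations.jsonl")  # hypothetical location

def log_invocation(agent_name: str, prompt_version: str, success: bool, notes: str = "") -> None:
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.time(),
        "agent": agent_name,
        "prompt_version": prompt_version,
        "success": success,
        "notes": notes,
    }
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    log_invocation("log-analyzer", "1.0.0", success=True, notes="pattern discovery run")
```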
---
## Relationship to Other Methodologies
**bootstrapped-se is the CORE framework** that integrates and extends two complementary methodologies.
### Relationship to empirical-methodology (Inclusion)
**bootstrapped-se INCLUDES AND EXTENDS empirical-methodology**:
```
empirical-methodology (5 phases):
Observe → Analyze → Codify → Automate → Evolve
bootstrapped-se (OCA cycle + extensions):
Observe ───────────→ Codify ────→ Automate
↑ ↓
└─────────────── Evolve ──────────┘
(Self-referential feedback loop)
```
**What bootstrapped-se adds beyond empirical-methodology**:
1. **Three-Tuple Output** (O, Aₙ, Mₙ) - Reusable artifacts at system level
2. **Agent Framework** - Specialized agents emerge from domain needs
3. **Meta-Agent System** - Modular capabilities for coordination
4. **Self-Referential Loop** - Framework applies to itself
5. **Formal Convergence** - System stability criteria (M_n == M_{n-1}, A_n == A_{n-1})
**When to use empirical-methodology explicitly**:
- Need detailed scientific method guidance
- Require fine-grained observation tool selection
- Want explicit separation of Analyze phase
**When to use bootstrapped-se**:
- **Always** - It's the core framework
- All Bootstrap experiments use bootstrapped-se as foundation
- Provides complete OCA cycle with agent system
### Relationship to value-optimization (Mutual Support)
**value-optimization PROVIDES QUANTIFICATION for bootstrapped-se**:
```
bootstrapped-se needs: value-optimization provides:
- Quality measurement → Dual-layer value functions
- Convergence detection → Formal convergence criteria
- Evolution decisions → ΔV calculations, trends
- Success validation → V_instance ≥ 0.80, V_meta ≥ 0.80
```
**bootstrapped-se ENABLES value-optimization**:
- OCA cycle generates state transitions (s_i → s_{i+1})
- Agent work produces V_instance improvements
- Meta-Agent work produces V_meta improvements
- Iteration framework implements optimization loop
**When to use value-optimization**:
- **Always with bootstrapped-se** - Provides evaluation framework
- Calculate V_instance and V_meta at every iteration
- Check convergence criteria formally
- Compare across experiments
**Integration**:
```
Every bootstrapped-se iteration:
1. Execute OCA cycle (Observe → Codify → Automate)
2. Calculate V(s_n) using value-optimization
3. Check convergence (system stable + dual threshold)
4. If not converged: Continue iteration
5. If converged: Generate (O, Aₙ, Mₙ)
```
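The integration above reduces to a short loop once the OCA cycle, the value functions, and the stability check are supplied as callables. This Python skeleton shows the control flow; the 0.80 thresholds come from value-optimization, while the cap of 10 iterations and the stub callables in the usage example are added assumptions:

```python
"""Illustrative skeleton of the bootstrapped-se iteration loop with value-optimization checks."""

V_THRESHOLD = 0.80
MAX_ITERATIONS = 10  # safety cap (assumption)

def run_bootstrap(run_oca_cycle, evaluate, system_stable):
    """run_oca_cycle() -> state; evaluate(state) -> (v_instance, v_meta); system_stable(state) -> bool."""
    state = None
    for n in range(1, MAX_ITERATIONS + 1):
        state = run_oca_cycle()                # Observe -> Codify -> Automate
        v_instance, v_meta = evaluate(state)   # value-optimization layer
        converged = system_stable(state) and v_instance >= V_THRESHOLD and v_meta >= V_THRESHOLD
        print(f"iteration {n}: V_instance={v_instance:.2f} V_meta={v_meta:.2f} converged={converged}")
        if converged:
            break
    return state  # on convergence, package (O, A_n, M_n) from this state

if __name__ == "__main__":
    # Stub callables for demonstration; replace with real agents and value functions.
    scores = iter([(0.55, 0.60), (0.72, 0.78), (0.84, 0.86)])
    run_bootstrap(
        run_oca_cycle=lambda: {},
        evaluate=lambda s: next(scores),
        system_stable=lambda s: True,
    )
```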
### Three-Methodology Integration
**Complete workflow** (as used in all Bootstrap experiments):
```
┌─ methodology-framework ─────────────────────┐
│ │
│ ┌─ bootstrapped-se (CORE) ───────────────┐ │
│ │ │ │
│ │ ┌─ empirical-methodology ──────────┐ │ │
│ │ │ │ │ │
│ │ │ Observe + Analyze │ │ │
│ │ │ Codify (with evidence) │ │ │
│ │ │ Automate (CI/CD) │ │ │
│ │ │ Evolve (self-referential) │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ Produce: (O, Aₙ, Mₙ) │ │
│ │ ↓ │ │
│ │ ┌─ value-optimization ──────────────┐ │ │
│ │ │ │ │ │
│ │ │ V_instance(s_n) = domain quality │ │ │
│ │ │ V_meta(s_n) = methodology quality│ │ │
│ │ │ │ │ │
│ │ │ Convergence check: │ │ │
│ │ │ - System stable? │ │ │
│ │ │ - Dual threshold met? │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────┘
```
**Usage Recommendation**:
- **Start here**: Read bootstrapped-se.md (this file)
- **Add evaluation**: Read value-optimization.md
- **Add rigor**: Read empirical-methodology.md (optional)
- **See integration**: Read bootstrapped-ai-methodology-engineering.md (BAIME framework)
---
## Related Skills
- **bootstrapped-ai-methodology-engineering**: Unified BAIME framework integrating all three methodologies
- **empirical-methodology**: Scientific foundation (included in bootstrapped-se)
- **value-optimization**: Quantitative evaluation framework (used by bootstrapped-se)
- **iteration-executor**: Implementation agent (coordinates bootstrapped-se execution)
---
## Knowledge Base
### Source Documentation
- **Core methodology**: `docs/methodology/bootstrapped-software-engineering.md`
- **Related**: `docs/methodology/empirical-methodology-development.md`
- **Examples**: `experiments/bootstrap-*/` (8 validated experiments)
### Key Concepts
- OCA Framework (Observe-Codify-Automate)
- Three-Tuple Output (O, Aₙ, Mₙ)
- Self-Referential Feedback Loop
- Convergence Theorem
- Meta-Methodology
---
## Version History
- **v1.0.0** (2025-10-18): Initial release
- Based on meta-cc methodology development
- 8 experiments validated (95% transferability)
- OCA framework with 5-layer feedback loop
- Empirical validation from 277 commits, 11 days
---
**Status**: ✅ Production-ready
**Validation**: 8 experiments (Bootstrap-001 to -013)
**Effectiveness**: 10-50x methodology development speedup
**Transferability**: 95% (framework universal, tools adaptable)