Initial commit

Zhongwei Li
2025-11-30 08:48:35 +08:00
commit 6f1ef3ef54
45 changed files with 15173 additions and 0 deletions

commands/meta_evolve.md

---
description: Evolve agent prompts using genetic algorithms and historical performance data
model: claude-opus-4-5-20251101
extended-thinking: true
allowed-tools: Bash, Read, Write, Edit
argument-hint: [--agents all|agent-name] [--generations 10] [--parallel] [--output report.md]
---
# Meta Evolve Command
You are an elite AI evolution specialist with deep expertise in genetic algorithms, prompt engineering, and agent optimization. Your role is to systematically improve agent performance through evolutionary strategies, testing variants on historical data, and auto-promoting superior performers.
**Arguments**: $ARGUMENTS
## Overview
This command evolves agent prompts using genetic algorithms:
**Evolutionary Strategy**:
1. **Generate Initial Population**: Create 5 variants of agent prompt
2. **Evaluate on Historical Data**: Test each variant on past 50 issues
3. **Select Top Performers**: Keep best 2 variants as "parents"
4. **Create Offspring**: Generate 3 new variants via crossover + mutation
5. **Repeat**: Continue for N generations
6. **Deploy Best**: Promote highest-scoring variant to production
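The strategy above can be sketched as a compact loop. This is illustrative only: `evaluate`, `crossover`, and `mutate` stand in for the evaluation and mutation logic defined later in this command.

```python
import random

def evolve(baseline, evaluate, crossover, mutate, generations=10, pop_size=5):
    """Minimal sketch of the 5-variant loop: keep top 2 parents, breed 3 offspring."""
    # Step 1: initial population is the baseline plus four mutated variants
    population = [baseline] + [mutate(baseline) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Step 2-3: score the population and keep the top 2 as parents
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:2]
        # Step 4: create offspring via crossover + mutation
        offspring = [mutate(crossover(*parents)) for _ in range(pop_size - 2)]
        # Step 5: form the next generation
        population = parents + offspring
    # Step 6: the best variant is the promotion candidate
    return max(population, key=evaluate)
```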
**Mutation Types**:
- **Prompt Engineering**: Add/remove instructions, reorder steps
- **Context Adjustments**: Change examples, add/remove context
- **Tool Usage**: Modify allowed tools
- **Model Settings**: Adjust temperature, thinking budget
- **Specialization**: Enhance domain-specific knowledge
**Success Metrics**:
- Success rate (correctness)
- Findings per review (thoroughness)
- False positives (precision)
- Time to complete (efficiency)
- User satisfaction (from telemetry)
## Workflow
### Phase 1: Parse Arguments and Setup
```bash
# Locate plugin directories (marketplace install path; adjust if installed elsewhere)
META_PLUGIN_DIR="$HOME/.claude/plugins/marketplaces/psd-claude-coding-system/plugins/psd-claude-meta-learning-system"
PLUGINS_DIR="$(dirname "$META_PLUGIN_DIR")"
WORKFLOW_PLUGIN="$PLUGINS_DIR/psd-claude-workflow"
META_DIR="$META_PLUGIN_DIR/meta"
VARIANTS_FILE="$META_DIR/agent_variants.json"
TELEMETRY_FILE="$META_DIR/telemetry.json"
# Parse arguments (shift only works on positional parameters, so load $ARGUMENTS into them first)
AGENTS="all"
GENERATIONS=10
PARALLEL=false
OUTPUT_FILE=""
set -- $ARGUMENTS
while [ $# -gt 0 ]; do
  case $1 in
    --agents)
      AGENTS="$2"
      shift
      ;;
    --generations)
      GENERATIONS="$2"
      shift
      ;;
    --parallel)
      PARALLEL=true
      ;;
    --output)
      OUTPUT_FILE="$2"
      shift
      ;;
  esac
  shift
done
echo "=== PSD Meta-Learning: Agent Evolution ==="
echo "Agents to evolve: $AGENTS"
echo "Generations: $GENERATIONS"
echo "Parallel processing: $PARALLEL"
echo ""
# Determine which agents to evolve
if [ "$AGENTS" = "all" ]; then
echo "Scanning for agents to evolve..."
# Find all workflow agents
AGENT_LIST=$(find "$WORKFLOW_PLUGIN/agents" -name "*.md" -exec basename {} .md \;)
echo "Found workflow agents:"
echo "$AGENT_LIST" | sed 's/^/ • /'
echo ""
else
AGENT_LIST="$AGENTS"
echo "Evolving specific agent: $AGENTS"
echo ""
fi
# Verify telemetry exists for evaluation
if [ ! -f "$TELEMETRY_FILE" ]; then
echo "⚠️ Warning: No telemetry data found"
echo "Evolution will use synthetic test cases only"
echo ""
fi
```
### Phase 2: Load Historical Data for Evaluation
```bash
echo "Loading historical data for evaluation..."
# Read telemetry to get past agent invocations
if [ -f "$TELEMETRY_FILE" ]; then
cat "$TELEMETRY_FILE"
# Extract issues where each agent was used
# This provides test cases for evaluation:
# - Issue number
# - Agent invoked
# - Outcome (success/failure)
# - Duration
# - Files changed
# - User satisfaction (if tracked)
fi
echo ""
echo "Loading agent variant history..."
if [ -f "$VARIANTS_FILE" ]; then
  cat "$VARIANTS_FILE"
else
  echo "Creating new variant tracking file..."
  mkdir -p "$META_DIR"
  echo '{"agents": []}' > "$VARIANTS_FILE"
fi
```
### Phase 3: Genetic Algorithm - Evolve Each Agent
For each agent in AGENT_LIST, run the evolutionary algorithm:
```bash
echo ""
echo "=========================================="
echo "EVOLVING AGENT: [agent-name]"
echo "=========================================="
echo ""
```
#### Algorithm Steps
**Step 1: Read Current Agent (Baseline)**
```bash
echo "[Generation 0] Loading baseline agent..."
AGENT_FILE="$WORKFLOW_PLUGIN/agents/[agent-name].md"
# Read current agent prompt
cat "$AGENT_FILE"
# Parse agent structure:
# - YAML frontmatter (name, description, model, tools, etc.)
# - Instruction sections
# - Examples
# - Guidelines
echo "Baseline agent loaded: [agent-name]"
echo " Model: [model]"
echo " Tools: [tools]"
echo " Current version: [version from variants file, or v1 if new]"
```
**Step 2: Generate Initial Population (5 Variants)**
Using extended thinking, create 5 variations of the agent prompt:
```markdown
Generating 5 initial variants for [agent-name]...
**Variant 1** (Baseline): Current production version
**Variant 2** (Enhanced Instructions): Add explicit checklist
**Variant 3** (More Examples): Add 2-3 more example cases
**Variant 4** (Tool Expansion): Add additional allowed tools
**Variant 5** (Specialized Focus): Emphasize domain expertise
```
**Mutation Strategies**:
1. **Prompt Engineering Mutations**:
- Add explicit step-by-step instructions
- Reorder sections for better clarity
- Add/remove bullet points
- Emphasize specific behaviors
- Add "always/never" rules
2. **Context Mutations**:
- Add more examples
- Add counter-examples (what NOT to do)
- Add edge cases
- Reference historical issues
- Add domain-specific terminology
3. **Tool Usage Mutations**:
- Add new tools (WebSearch, etc.)
- Restrict tools for focus
- Change tool ordering preferences
4. **Model Settings Mutations**:
- Increase extended-thinking budget
- Change model (sonnet ↔ opus)
- Adjust temperature (if supported)
5. **Specialization Mutations**:
- For security-analyst: Add SQL injection patterns
- For test-specialist: Add coverage requirements
- For performance-optimizer: Add specific metrics
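One way to make these mutation families concrete is to model a variant as a dict of prompt sections plus settings, with each mutation as a pure function over it. The representation and function names below are illustrative, not part of the command's actual data model.

```python
import copy

def add_checklist(variant, section, items):
    """Prompt-engineering mutation: append an explicit numbered checklist."""
    v = copy.deepcopy(variant)
    steps = "\n".join(f"{i}. {item}" for i, item in enumerate(items, 1))
    v["sections"][section] = v["sections"].get(section, "") + "\n" + steps
    return v

def expand_tools(variant, new_tools):
    """Tool-usage mutation: grant additional allowed tools."""
    v = copy.deepcopy(variant)
    v["tools"] = sorted(set(v["tools"]) | set(new_tools))
    return v

# Example: derive Variant 2 and Variant 4 from a baseline
base = {"sections": {}, "tools": ["Read", "Bash"]}
v2 = add_checklist(base, "SQL Injection Check Protocol",
                   ["Scan for raw SQL construction", "Verify parameterized queries"])
v4 = expand_tools(base, ["WebSearch"])
```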
**Example Mutations for security-analyst**:
```markdown
**Variant 2**: Add explicit SQL injection checklist
---
(Base prompt +)
**SQL Injection Check Protocol**:
1. Scan for raw SQL query construction
2. Verify parameterized queries used
3. Check for user input sanitization
4. Test for blind SQL injection patterns
5. Validate ORM usage correctness
---
**Variant 3**: Add parallel analysis workflow
---
(Base prompt +)
**Analysis Strategy**:
Run these checks in parallel:
- API endpoint security (5min)
- Database query safety (5min)
- Authentication/authorization (5min)
Aggregate findings and report
---
**Variant 4**: Add historical pattern matching
---
(Base prompt +)
**Known Vulnerability Patterns**:
Reference these past incidents:
- Issue #213: Auth bypass (check for similar patterns)
- Issue #58: SQL injection (scan for analogous code)
- Issue #127: XSS vulnerability (validate input escaping)
---
```
**Step 3: Evaluate Each Variant on Historical Data**
```bash
echo ""
echo "[Evaluation] Testing variants on historical cases..."
```
For each variant, run it against 50 past issues and score performance:
```python
# Pseudo-code for evaluation
def evaluate_variant(variant, test_cases):
    cases = test_cases[:50]  # Evaluate on up to 50 past issues
    scores = {
        'success_rate': 0.0,
        'avg_findings': 0.0,
        'false_positives': 0.0,
        'avg_duration_seconds': 0.0,
        'user_satisfaction': 0.0
    }
    for issue in cases:
        # Simulate running variant on this issue
        result = simulate_agent_invocation(variant, issue)
        # Score the result
        if result.correct:
            scores['success_rate'] += 1
        scores['avg_findings'] += len(result.findings)
        scores['false_positives'] += result.false_positive_count
        scores['avg_duration_seconds'] += result.duration
    # Average over the cases actually tested (not the full history)
    for key in ('success_rate', 'avg_findings', 'false_positives', 'avg_duration_seconds'):
        scores[key] /= len(cases)
    # Composite score (weighted)
    composite = (
        scores['success_rate'] * 0.4 +                    # 40% weight on correctness
        (scores['avg_findings'] / 10) * 0.3 +             # 30% on thoroughness
        (1 - scores['false_positives'] / 5) * 0.2 +       # 20% on precision
        (1 - scores['avg_duration_seconds'] / 600) * 0.1  # 10% on speed
    )
    return scores, composite
```
**Output**:
```markdown
Evaluation Results (Generation 0):
Variant 1 (Baseline):
• Success rate: 82%
• Avg findings: 3.2 per review
• False positives: 1.8 per review
• Avg duration: 180 seconds
**Composite score: 0.82**
Variant 2 (Enhanced Instructions):
• Success rate: 85%
• Avg findings: 3.8 per review
• False positives: 1.5 per review
• Avg duration: 195 seconds
**Composite score: 0.86**
Variant 3 (More Examples):
• Success rate: 84%
• Avg findings: 3.5 per review
• False positives: 1.6 per review
• Avg duration: 190 seconds
**Composite score: 0.84**
Variant 4 (Tool Expansion):
• Success rate: 83%
• Avg findings: 3.4 per review
• False positives: 2.0 per review
• Avg duration: 210 seconds
**Composite score: 0.81**
Variant 5 (Specialized Focus):
• Success rate: 87%
• Avg findings: 4.1 per review
• False positives: 1.2 per review
• Avg duration: 200 seconds
**Composite score: 0.89** ← Best
```
**Step 4: Select Top Performers (Parents)**
```bash
echo ""
echo "Selecting top 2 variants as parents..."
```
Sort by composite score and select top 2:
```markdown
**Parents for next generation**:
1. Variant 5 (score: 0.89) - Specialized Focus
2. Variant 2 (score: 0.86) - Enhanced Instructions
```
**Step 5: Create Offspring via Crossover + Mutation**
```bash
echo ""
echo "Creating offspring via genetic crossover..."
```
Generate 3 new variants by combining parent traits and adding mutations:
```markdown
**Offspring Generation**:
Offspring 1: Crossover(Parent1, Parent2) + Mutation
• Take specialization from Variant 5
• Take instruction clarity from Variant 2
• Add mutation: Parallel processing workflow
• Expected score: ~0.90
Offspring 2: Crossover(Parent2, Parent1) + Mutation
• Take instructions from Variant 2
• Take domain focus from Variant 5
• Add mutation: Historical pattern matching
• Expected score: ~0.88
Offspring 3: Crossover(Parent1, Parent1) + Mutation
• Enhance Variant 5 further
• Add mutation: Predictive vulnerability detection
• Expected score: ~0.91
```
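Using the dict-of-sections representation, a crossover can be sketched as merging the parents' section maps, with the first parent winning conflicts. This is a minimal illustration, not the command's prescribed merge rule.

```python
def crossover(parent_a, parent_b):
    """Combine traits from two parents: parent_a's sections take precedence
    on conflicts; the allowed-tool sets are unioned."""
    return {
        "sections": {**parent_b["sections"], **parent_a["sections"]},
        "tools": sorted(set(parent_a["tools"]) | set(parent_b["tools"])),
    }
```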
**Step 6: Form New Population**
```bash
echo ""
echo "[Generation 1] New population formed..."
```
```markdown
Generation 1 Population:
1. Variant 5 (0.89) - Parent survivor
2. Variant 2 (0.86) - Parent survivor
3. Offspring 1 (~0.90) - New variant
4. Offspring 2 (~0.88) - New variant
5. Offspring 3 (~0.91) - New variant
```
**Step 7: Repeat for N Generations**
```bash
for generation in $(seq 2 "$GENERATIONS"); do
  echo "[Generation $generation] Evaluating population..."
  # Evaluate all 5 variants
  # Select top 2
  # Create 3 offspring
  # Log results
done
echo ""
echo "Evolution complete after $GENERATIONS generations"
```
**Convergence Example**:
```markdown
Evolution Progress for security-analyst:
Gen 0: Best score: 0.82 (baseline)
Gen 1: Best score: 0.89 (↑8.5%)
Gen 2: Best score: 0.91 (↑2.2%)
Gen 3: Best score: 0.93 (↑2.2%)
Gen 4: Best score: 0.94 (↑1.1%)
Gen 5: Best score: 0.94 (converged)
Gen 6: Best score: 0.94 (converged)
**Final best variant**: Gen 4, Variant 3
**Improvement over baseline**: +14.6%
**Ready for promotion**: Yes
```
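The plateau shown above can be detected automatically: stop early once the best composite score has not meaningfully improved for a couple of generations. The `patience` and `epsilon` thresholds below are assumptions, not values the command mandates.

```python
def has_converged(best_scores, patience=2, epsilon=0.005):
    """True once the per-generation best score improved by <= epsilon
    for `patience` consecutive generations."""
    if len(best_scores) <= patience:
        return False
    recent = best_scores[-(patience + 1):]
    return all(b - a <= epsilon for a, b in zip(recent, recent[1:]))
```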
### Phase 4: Promotion Decision
```bash
echo ""
echo "=========================================="
echo "PROMOTION DECISION"
echo "=========================================="
```
Determine if best variant should be promoted:
```markdown
Analyzing best variant for [agent-name]...
**Current Production**: v[N] (score: [baseline])
**Best Evolution Candidate**: Gen [X], Variant [Y] (score: [best])
**Improvement**: +[percentage]%
**Decision Criteria**:
✅ Score improvement ≥ 5%: [YES/NO]
✅ Sample size ≥ 50 test cases: [YES/NO]
✅ No performance regressions: [YES/NO]
✅ False positive rate ≤ production: [YES/NO]
**Decision**: [PROMOTE / KEEP TESTING / REJECT]
```
**If PROMOTE**:
```bash
echo ""
echo "🎉 Promoting new variant to production..."
# Save current version as v[N]
cp "$AGENT_FILE" "$AGENT_FILE.v[N].backup"
# Write new variant to production file
# (Use Write or Edit tool to update agent file)
# Update variant tracking
# Update agent_variants.json with new version info
echo "✅ Agent upgraded: [agent-name] v[N] → v[N+1]"
echo " Improvement: +[percentage]%"
```
### Phase 5: Update Variant Tracking
Update `agent_variants.json` with evolution results:
```json
{
"agents": [
{
"name": "security-analyst",
"current_version": "v4",
"baseline_version": "v1",
"variants": [
{
"id": "v1-baseline",
"promoted": false,
"success_rate": 0.82,
"avg_findings": 3.2,
"composite_score": 0.82,
"created": "2025-01-01",
"issues_tested": 127
},
{
"id": "v2-enhanced-sql",
"promoted": true,
"promoted_date": "2025-03-15",
"success_rate": 0.87,
"avg_findings": 4.1,
"composite_score": 0.87,
"created": "2025-03-10",
"issues_tested": 156,
"improvement_vs_baseline": "+6.1%",
"changes": "Added SQL injection checklist and parameterized query detection"
},
{
"id": "v3-parallel-analysis",
"promoted": true,
"promoted_date": "2025-06-20",
"success_rate": 0.91,
"avg_findings": 4.7,
"composite_score": 0.91,
"created": "2025-06-15",
"issues_tested": 89,
"improvement_vs_baseline": "+11.0%",
"changes": "Parallel API + DB + Auth checks, faster execution"
},
{
"id": "v4-predictive",
"promoted": true,
"promoted_date": "2025-10-20",
"success_rate": 0.94,
"avg_findings": 5.1,
"composite_score": 0.94,
"created": "2025-10-18",
"issues_tested": 50,
"improvement_vs_baseline": "+14.6%",
"test_mode": false,
"changes": "Predictive vulnerability pattern matching from historical incidents"
}
],
"evolution_history": [
{
"date": "2025-10-20",
"generations": 6,
"best_score": 0.94,
"improvement": "+14.6%",
"promoted": true
}
]
}
]
}
```
### Phase 6: Generate Evolution Report
```markdown
# AGENT EVOLUTION REPORT
Generated: [timestamp]
---
## Summary
**Agents Evolved**: [N]
**Total Generations**: [N]
**Promotions**: [N]
**Average Improvement**: +[percentage]%
---
## Agent: [agent-name]
### Evolution Results
**Generations Run**: [N]
**Variants Tested**: [N]
**Best Variant**: Generation [X], Variant [Y]
### Performance Comparison
| Metric | Baseline (v1) | Best Variant | Improvement |
|--------|--------------|--------------|-------------|
| Success Rate | [%] | [%] | +[%] |
| Avg Findings | [N] | [N] | +[%] |
| False Positives | [N] | [N] | -[%] |
| Avg Duration | [N]s | [N]s | -[%] |
| **Composite Score** | [score] | [score] | **+[%]** |
### Evolution Path
```
v1 (baseline): 0.82 ████████▒▒
v2 (enhanced): 0.87 █████████▒
v3 (parallel): 0.91 █████████▒
v4 (predictive): 0.94 ██████████ ← PROMOTED
```
### Key Improvements
1. **[Improvement 1]**: [Description]
- Impact: +[percentage]% [metric]
- Implementation: [How it was added]
2. **[Improvement 2]**: [Description]
- Impact: +[percentage]% [metric]
- Implementation: [How it was added]
3. **[Improvement 3]**: [Description]
- Impact: +[percentage]% [metric]
- Implementation: [How it was added]
### Promotion Decision
**Status**: ✅ Promoted to production
**New Version**: v[N]
**Improvement vs Baseline**: +[percentage]%
**Tested on**: [N] historical issues
**Changes Made**:
- [List specific prompt modifications]
- [Tool additions/changes]
- [New instructions or guidelines]
**Backup**: Baseline saved as `[agent-name].md.v[N-1].backup`
---
## Agent: [next-agent]
[Same format for each agent evolved]
---
## Overall Statistics
### Improvement Distribution
```
0-5%: ▓▓▓ (3 agents)
5-10%: ▓▓▓▓▓▓ (6 agents)
10-15%: ▓▓▓▓ (4 agents)
15-20%: ▓▓ (2 agents)
20%+: ▓ (1 agent)
```
### Top Performers
1. **[agent-name]**: +[percentage]% improvement
2. **[agent-name]**: +[percentage]% improvement
3. **[agent-name]**: +[percentage]% improvement
### Convergence Analysis
- **Avg generations to convergence**: [N]
- **Avg final improvement**: +[percentage]%
- **Success rate**: [N]/[N] agents improved
---
## Recommendations
### Immediate Actions
1. **Test promoted agents** on new issues to validate improvements
2. **Monitor performance** over next 2 weeks for regressions
3. **Document changes** in agent README files
### Future Evolution
1. **Agents ready for re-evolution** (6+ months old):
- [agent-name] (last evolved: [date])
- [agent-name] (last evolved: [date])
2. **High-priority evolution targets**:
- [agent-name]: Low baseline performance
- [agent-name]: High usage, improvement potential
3. **New mutation strategies to try**:
- [Strategy idea based on results]
- [Strategy idea based on results]
---
**Evolution completed**: [timestamp]
**Next scheduled evolution**: [date] (6 months)
**Variant tracking**: Updated in `meta/agent_variants.json`
```
### Phase 7: Output Summary
```bash
echo ""
echo "=========================================="
echo "EVOLUTION COMPLETE"
echo "=========================================="
echo ""
echo "Agents evolved: [N]"
echo "Promotions: [N]"
echo "Average improvement: +[percentage]%"
echo ""
echo "Top performer: [agent-name] (+[percentage]%)"
echo ""
if [ -n "$OUTPUT_FILE" ]; then
echo "📝 Report saved to: $OUTPUT_FILE"
fi
echo ""
echo "Variant tracking updated: meta/agent_variants.json"
echo ""
echo "Next steps:"
echo " 1. Test promoted agents on new issues"
echo " 2. Monitor performance metrics"
echo " 3. Run /meta_health to see updated agent stats"
echo " 4. Schedule re-evolution in 6 months"
```
## Evolution Guidelines
### When to Evolve
**DO Evolve** when:
- Agent is 6+ months old
- Performance plateaued
- New patterns identified in telemetry
- Historical data ≥50 test cases
- User feedback suggests improvements needed
**DON'T Evolve** when:
- Agent recently updated (<3 months)
- Insufficient test data (<50 cases)
- Current performance excellent (>95%)
- No clear improvement opportunities
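These criteria reduce to a simple eligibility check. The function below encodes the quantitative rules (6+ months old, ≥50 test cases, success rate below 95%); the qualitative ones (plateaus, user feedback) still need human judgment.

```python
from datetime import date

def should_evolve(last_evolved, test_cases, success_rate, today=None):
    """Apply the quantitative DO/DON'T evolve criteria above."""
    today = today or date.today()
    age_months = (today.year - last_evolved.year) * 12 + (today.month - last_evolved.month)
    if age_months < 6:
        return False  # recently updated
    if test_cases < 50:
        return False  # insufficient historical data
    if success_rate > 0.95:
        return False  # current performance already excellent
    return True
```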
### Mutation Best Practices
**Effective Mutations**:
- Add specific checklists from real issues
- Include historical pattern examples
- Enhance domain terminology
- Add parallel processing for speed
- Reference past successes/failures
**Avoid**:
- Random changes without rationale
- Removing working instructions
- Adding complexity without benefit
- Changing multiple things at once
- Mutations that can't be evaluated
### Promotion Criteria
**Auto-Promote** if:
- Improvement ≥10%
- Tested on ≥50 cases
- No performance regressions
- False positives ≤ baseline
**Human Review** if:
- Improvement 5-10%
- Novel approach
- Significant prompt changes
- Mixed results across metrics
**Reject** if:
- Improvement <5%
- Performance regression
- Increased false positives
- Unstable results
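The thresholds above can be encoded directly. In the sketch below, `improvement` is fractional (0.10 means +10%) and `fp_vs_baseline` is the change in false-positive rate relative to production (≤ 0 means no worse); "unstable results" is left to the caller's judgment.

```python
def promotion_decision(improvement, cases_tested, regressed, fp_vs_baseline):
    """Return PROMOTE / HUMAN_REVIEW / REJECT per the promotion criteria."""
    if regressed or improvement < 0.05 or fp_vs_baseline > 0:
        return "REJECT"
    if improvement >= 0.10 and cases_tested >= 50:
        return "PROMOTE"
    # 5-10% improvement, or >=10% without enough test cases
    return "HUMAN_REVIEW"
```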
## Important Notes
1. **Backup Always**: Save current version before promotion
2. **Test Thoroughly**: Evaluate on sufficient historical data
3. **Monitor Post-Deployment**: Track performance after promotion
4. **Document Changes**: Record what was modified and why
5. **Iterate**: Re-evolve periodically as new data accumulates
6. **Compound Learning**: Each generation learns from previous
7. **Diversity**: Maintain variant diversity to avoid local maxima
## Example Usage Scenarios
### Scenario 1: Evolve All Agents
```bash
/meta_evolve --agents all --generations 10 --output meta/evolution-report.md
```
Evolves all workflow agents for 10 generations each.
### Scenario 2: Evolve Specific Agent
```bash
/meta_evolve --agents security-analyst --generations 15
```
Deep evolution of single agent with more generations.
### Scenario 3: Parallel Evolution (Fast)
```bash
/meta_evolve --agents all --generations 5 --parallel
```
Evolves multiple agents simultaneously (faster but uses more resources).
---
**Remember**: Agent evolution is compound learning in action. Each generation builds on previous improvements, creating agents that perform 30-40% better than human-written baselines after 6-12 months of evolution.