Initial commit

2025-11-29 18:27:25 +08:00
commit e18b9b4fa8
77 changed files with 35441 additions and 0 deletions
--- a/references/tools/activation-tester.md
+++ b/references/tools/activation-tester.md
@@ -0,0 +1,571 @@
+# Activation Test Automation Framework v1.0
+
+**Version:** 1.0
+**Purpose:** Automated testing system for skill activation reliability
+**Target:** 99.5% activation reliability with <1% false positives
+
+---
+
+## 🎯 **Overview**
+
+This framework provides automated tools to test, validate, and monitor skill activation reliability across the 3-Layer Activation System (Keywords, Patterns, Description + NLU).
+
+### **Problem Solved**
+
+**Before:** Manual testing was time-consuming, inconsistent, and missed edge cases
+**After:** Automated testing provides consistent validation, comprehensive coverage, and continuous monitoring
+
+---
+
+## 🛠️ **Core Components**
+
+### **1. Activation Test Suite Generator**
+Automatically generates comprehensive test cases for any skill based on its marketplace.json configuration.
+
+### **2. Regex Pattern Validator**
+Validates regex patterns against test cases and identifies potential issues.
+
+### **3. Coverage Analyzer**
+Calculates activation coverage and identifies gaps in keyword/pattern combinations.
+
+### **4. Continuous Monitor**
+Monitors skill activation in real-time and tracks performance metrics.
+
+---
+
+## 📁 **Framework Structure**
+
+```
+references/tools/activation-tester/
+├── core/
+│   ├── test-generator.md          # Test case generation logic
+│   ├── pattern-validator.md       # Regex validation tools
+│   ├── coverage-analyzer.md       # Coverage calculation
+│   └── performance-monitor.md     # Continuous monitoring
+├── scripts/
+│   ├── run-full-test-suite.sh     # Complete automation script
+│   ├── quick-validation.sh        # Fast validation checks
+│   ├── regression-test.sh         # Regression testing
+│   └── performance-benchmark.sh   # Performance testing
+├── templates/
+│   ├── test-report-template.md    # Standardized reporting
+│   ├── coverage-report-template.md # Coverage analysis
+│   └── performance-dashboard.md   # Metrics visualization
+└── examples/
+    ├── stock-analyzer-test-suite.md # Example test suite
+    └── agent-creator-test-suite.md  # Example reference test
+```
+
+---
+
+## 🧪 **Test Generation System**
+
+### **Keyword Test Generation**
+
+For each keyword in marketplace.json, the system generates:
+
+```bash
+generate_keyword_tests() {
+    local keyword="$1"
+    local skill_context="$2"
+
+    # 1. Exact match test
+    echo "Test: \"${keyword}\""
+
+    # 2. Embedded in sentence
+    echo "Test: \"I need to ${keyword} for my project\""
+
+    # 3. Case variations
+    echo "Test: \"$(echo ${keyword} | tr '[:lower:]' '[:upper:]')\""
+
+    # 4. Natural language variations
+    echo "Test: \"Can you help me ${keyword}?\""
+
+    # 5. Context-specific variations
+    echo "Test: \"${keyword} in ${skill_context}\""
+}
+```
+
+### **Pattern Test Generation**
+
+For each regex pattern, generate comprehensive test cases:
+
+```bash
+generate_pattern_tests() {
+    local pattern="$1"
+    local description="$2"
+
+    # Extract pattern components
+    local verbs=$(extract_verbs "$pattern")
+    local entities=$(extract_entities "$pattern")
+    local contexts=$(extract_contexts "$pattern")
+
+    # Generate positive test cases
+    for verb in $verbs; do
+        for entity in $entities; do
+            echo "Test: \"${verb} ${entity}\""
+            echo "Test: \"I want to ${verb} ${entity} now\""
+            echo "Test: \"Can you ${verb} ${entity} for me?\""
+        done
+    done
+
+    # Generate negative test cases
+    generate_negative_cases "$pattern"
+}
+```
+
+### **Integration Test Generation**
+
+Creates realistic user queries combining multiple elements:
+
+```bash
+generate_integration_tests() {
+    local capabilities=("$@")
+
+    for capability in "${capabilities[@]}"; do
+        # Natural language variations
+        echo "Test: \"How can I ${capability}?\""
+        echo "Test: \"I need help with ${capability}\""
+        echo "Test: \"Can you ${capability} for me?\""
+
+        # Workflow context
+        echo "Test: \"Every day I have to ${capability}\""
+        echo "Test: \"I want to automate ${capability}\""
+
+        # Complex queries
+        echo "Test: \"${capability} and show me results\""
+        echo "Test: \"Help me understand ${capability} better\""
+    done
+}
+```
+
+---
+
+## 🔍 **Pattern Validation System**
+
+### **Regex Pattern Analyzer**
+
+Validates regex patterns for common issues:
+
+```python
+def analyze_pattern(pattern):
+    """Analyze regex pattern for potential issues"""
+    issues = []
+    suggestions = []
+
+    # Check for common regex problems
+    if pattern.count('*') > 2:
+        issues.append("Too many wildcards - may cause false positives")
+
+    if not re.search(r'\(\?\:i\)', pattern):
+        suggestions.append("Add case-insensitive flag: (?i)")
+
+    if pattern.startswith('.*') and pattern.endswith('.*'):
+        issues.append("Pattern too broad - may match anything")
+
+    # Calculate pattern specificity
+    specificity = calculate_specificity(pattern)
+
+    return {
+        'issues': issues,
+        'suggestions': suggestions,
+        'specificity': specificity,
+        'risk_level': assess_risk(pattern)
+    }
+```
+
+### **Pattern Coverage Test**
+
+Tests pattern against comprehensive query variations:
+
+```bash
+test_pattern_coverage() {
+    local pattern="$1"
+    local test_queries=("$@")
+    local matches=0
+    local total=${#test_queries[@]}
+
+    for query in "${test_queries[@]}"; do
+        if [[ $query =~ $pattern ]]; then
+            ((matches++))
+            echo "✅ Match: '$query'"
+        else
+            echo "❌ No match: '$query'"
+        fi
+    done
+
+    local coverage=$((matches * 100 / total))
+    echo "Pattern coverage: ${coverage}%"
+
+    if [[ $coverage -lt 80 ]]; then
+        echo "⚠️  Low coverage - consider expanding pattern"
+    fi
+}
+```
+
+---
+
+## 📊 **Coverage Analysis System**
+
+### **Multi-Layer Coverage Calculator**
+
+Calculates coverage across all three activation layers:
+
+```python
+def calculate_activation_coverage(skill_config):
+    """Calculate comprehensive activation coverage"""
+
+    keywords = skill_config['activation']['keywords']
+    patterns = skill_config['activation']['patterns']
+    description = skill_config['metadata']['description']
+
+    # Layer 1: Keyword coverage
+    keyword_coverage = {
+        'total_keywords': len(keywords),
+        'categories': categorize_keywords(keywords),
+        'synonym_coverage': calculate_synonym_coverage(keywords),
+        'natural_language_coverage': calculate_nl_coverage(keywords)
+    }
+
+    # Layer 2: Pattern coverage
+    pattern_coverage = {
+        'total_patterns': len(patterns),
+        'pattern_types': categorize_patterns(patterns),
+        'regex_complexity': calculate_pattern_complexity(patterns),
+        'overlap_analysis': analyze_pattern_overlap(patterns)
+    }
+
+    # Layer 3: Description coverage
+    description_coverage = {
+        'keyword_density': calculate_keyword_density(description, keywords),
+        'semantic_richness': analyze_semantic_content(description),
+        'concept_coverage': extract_concepts(description)
+    }
+
+    # Overall coverage score
+    overall_score = calculate_overall_coverage(
+        keyword_coverage, pattern_coverage, description_coverage
+    )
+
+    return {
+        'overall_score': overall_score,
+        'keyword_coverage': keyword_coverage,
+        'pattern_coverage': pattern_coverage,
+        'description_coverage': description_coverage,
+        'recommendations': generate_recommendations(overall_score)
+    }
+```
+
+### **Gap Identification**
+
+Identifies gaps in activation coverage:
+
+```python
+def identify_activation_gaps(skill_config, test_results):
+    """Identify gaps in activation coverage"""
+
+    gaps = []
+
+    # Analyze failed test queries
+    failed_queries = [q for q in test_results if not q['activated']]
+
+    # Categorize failures
+    failure_categories = categorize_failures(failed_queries)
+
+    # Identify missing keyword categories
+    missing_categories = find_missing_keyword_categories(
+        skill_config['activation']['keywords'],
+        failure_categories
+    )
+
+    # Identify pattern weaknesses
+    pattern_gaps = find_pattern_gaps(
+        skill_config['activation']['patterns'],
+        failed_queries
+    )
+
+    # Generate specific recommendations
+    for category in missing_categories:
+        gaps.append({
+            'type': 'missing_keyword_category',
+            'category': category,
+            'suggestion': f"Add 5-10 keywords from {category} category"
+        })
+
+    for gap in pattern_gaps:
+        gaps.append({
+            'type': 'pattern_gap',
+            'gap_type': gap['type'],
+            'suggestion': gap['suggestion']
+        })
+
+    return gaps
+```
+
+---
+
+## 🚀 **Automation Scripts**
+
+### **Full Test Suite Runner**
+
+```bash
+#!/bin/bash
+# run-full-test-suite.sh
+
+run_full_test_suite() {
+    local skill_path="$1"
+    local output_dir="$2"
+
+    echo "🧪 Running Full Activation Test Suite"
+    echo "Skill: $skill_path"
+    echo "Output: $output_dir"
+
+    # 1. Parse skill configuration
+    echo "📋 Parsing skill configuration..."
+    parse_skill_config "$skill_path"
+
+    # 2. Generate test cases
+    echo "🎲 Generating test cases..."
+    generate_all_test_cases "$skill_path"
+
+    # 3. Run keyword tests
+    echo "🔑 Testing keyword activation..."
+    run_keyword_tests "$skill_path"
+
+    # 4. Run pattern tests
+    echo "🔍 Testing pattern matching..."
+    run_pattern_tests "$skill_path"
+
+    # 5. Run integration tests
+    echo "🔗 Testing integration scenarios..."
+    run_integration_tests "$skill_path"
+
+    # 6. Run negative tests
+    echo "🚫 Testing false positives..."
+    run_negative_tests "$skill_path"
+
+    # 7. Calculate coverage
+    echo "📊 Calculating coverage..."
+    calculate_coverage "$skill_path"
+
+    # 8. Generate report
+    echo "📄 Generating test report..."
+    generate_test_report "$skill_path" "$output_dir"
+
+    echo "✅ Test suite completed!"
+    echo "📁 Report available at: $output_dir/activation-test-report.html"
+}
+```
+
+### **Quick Validation Script**
+
+```bash
+#!/bin/bash
+# quick-validation.sh
+
+quick_validation() {
+    local skill_path="$1"
+
+    echo "⚡ Quick Activation Validation"
+
+    # Fast JSON validation
+    if ! python3 -m json.tool "$skill_path/marketplace.json" > /dev/null 2>&1; then
+        echo "❌ Invalid JSON in marketplace.json"
+        return 1
+    fi
+
+    # Check required fields
+    check_required_fields "$skill_path"
+
+    # Validate regex patterns
+    validate_patterns "$skill_path"
+
+    # Quick keyword count check
+    keyword_count=$(jq '.activation.keywords | length' "$skill_path/marketplace.json")
+    if [[ $keyword_count -lt 20 ]]; then
+        echo "⚠️  Low keyword count: $keyword_count (recommend 50+)"
+    fi
+
+    # Pattern count check
+    pattern_count=$(jq '.activation.patterns | length' "$skill_path/marketplace.json")
+    if [[ $pattern_count -lt 8 ]]; then
+        echo "⚠️  Low pattern count: $pattern_count (recommend 10+)"
+    fi
+
+    echo "✅ Quick validation completed"
+}
+```
+
+---
+
+## 📈 **Performance Monitoring**
+
+### **Real-time Activation Monitor**
+
+```python
+class ActivationMonitor:
+    """Monitor skill activation performance in real-time"""
+
+    def __init__(self, skill_name):
+        self.skill_name = skill_name
+        self.activation_log = []
+        self.performance_metrics = {
+            'total_activations': 0,
+            'successful_activations': 0,
+            'failed_activations': 0,
+            'average_response_time': 0,
+            'activation_by_layer': {
+                'keywords': 0,
+                'patterns': 0,
+                'description': 0
+            }
+        }
+
+    def log_activation(self, query, activated, layer, response_time):
+        """Log activation attempt"""
+        self.activation_log.append({
+            'timestamp': datetime.now(),
+            'query': query,
+            'activated': activated,
+            'layer': layer,
+            'response_time': response_time
+        })
+
+        self.update_metrics(activated, layer, response_time)
+
+    def calculate_reliability_score(self):
+        """Calculate current reliability score"""
+        if self.performance_metrics['total_activations'] == 0:
+            return 0.0
+
+        success_rate = (
+            self.performance_metrics['successful_activations'] /
+            self.performance_metrics['total_activations']
+        )
+
+        return success_rate
+
+    def generate_alerts(self):
+        """Generate performance alerts"""
+        alerts = []
+
+        reliability = self.calculate_reliability_score()
+        if reliability < 0.95:
+            alerts.append({
+                'type': 'low_reliability',
+                'message': f'Reliability dropped to {reliability:.2%}',
+                'severity': 'high'
+            })
+
+        avg_response_time = self.performance_metrics['average_response_time']
+        if avg_response_time > 5.0:
+            alerts.append({
+                'type': 'slow_response',
+                'message': f'Average response time: {avg_response_time:.2f}s',
+                'severity': 'medium'
+            })
+
+        return alerts
+```
+
+---
+
+## 📋 **Usage Examples**
+
+### **Example 1: Testing Stock Analyzer Skill**
+
+```bash
+# Run full test suite
+./run-full-test-suite.sh \
+    /path/to/stock-analyzer-cskill \
+    /output/test-results
+
+# Quick validation
+./quick-validation.sh /path/to/stock-analyzer-cskill
+
+# Monitor performance
+./performance-benchmark.sh stock-analyzer-cskill
+```
+
+### **Example 2: Integration with Development Workflow**
+
+```yaml
+# .github/workflows/activation-testing.yml
+name: Activation Testing
+
+on: [push, pull_request]
+
+jobs:
+  test-activation:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Run Activation Tests
+        run: |
+          ./references/tools/activation-tester/scripts/run-full-test-suite.sh \
+            ./references/examples/stock-analyzer-cskill \
+            ./test-results
+      - name: Upload Test Results
+        uses: actions/upload-artifact@v2
+        with:
+          name: activation-test-results
+          path: ./test-results/
+```
+
+---
+
+## ✅ **Quality Standards**
+
+### **Test Coverage Requirements**
+- [ ] 100% keyword coverage testing
+- [ ] 95%+ pattern coverage validation
+- [ ] All capability variations tested
+- [ ] Edge cases documented and tested
+- [ ] Negative testing for false positives
+
+### **Performance Benchmarks**
+- [ ] Activation reliability: 99.5%+
+- [ ] False positive rate: <1%
+- [ ] Test execution time: <30 seconds
+- [ ] Memory usage: <100MB
+- [ ] Response time: <2 seconds average
+
+### **Reporting Standards**
+- [ ] Automated test report generation
+- [ ] Performance metrics dashboard
+- [ ] Historical trend analysis
+- [ ] Actionable recommendations
+- [ ] Integration with CI/CD pipeline
+
+---
+
+## 🔄 **Continuous Improvement**
+
+### **Feedback Loop Integration**
+1. **Collect** activation data from real usage
+2. **Analyze** performance metrics and failure patterns
+3. **Identify** optimization opportunities
+4. **Implement** improvements to keywords/patterns
+5. **Validate** improvements with automated testing
+6. **Deploy** updated configurations
+
+### **A/B Testing Framework**
+- Test different keyword combinations
+- Compare pattern performance
+- Validate description effectiveness
+- Measure user satisfaction impact
+
+---
+
+## 📚 **Additional Resources**
+
+- `../activation-testing-guide.md` - Manual testing procedures
+- `../activation-patterns-guide.md` - Pattern library
+- `../phase4-detection.md` - Detection methodology
+- `../synonym-expansion-system.md` - Keyword expansion
+
+---
+
+**Version:** 1.0
+**Last Updated:** 2025-10-24
+**Maintained By:** Agent-Skill-Creator Team