gh-francyjglisboa-agent-ski…/references/tools/activation-tester.md

# Activation Test Automation Framework v1.0

**Version:** 1.0
**Purpose:** Automated testing system for skill activation reliability
**Target:** 99.5% activation reliability with <1% false positives

---

## 🎯 **Overview**

This framework provides automated tools to test, validate, and monitor skill activation reliability across the 3-Layer Activation System (Keywords, Patterns, Description + NLU).

### **Problem Solved**

**Before:** Manual testing was time-consuming, inconsistent, and missed edge cases
**After:** Automated testing provides consistent validation, comprehensive coverage, and continuous monitoring

---

## 🛠️ **Core Components**

### **1. Activation Test Suite Generator**
Automatically generates comprehensive test cases for any skill based on its marketplace.json configuration.

### **2. Regex Pattern Validator**
Validates regex patterns against test cases and identifies potential issues.

### **3. Coverage Analyzer**
Calculates activation coverage and identifies gaps in keyword/pattern combinations.

### **4. Continuous Monitor**
Monitors skill activation in real-time and tracks performance metrics.

---

## 📁 **Framework Structure**

```
references/tools/activation-tester/
├── core/
│   ├── test-generator.md          # Test case generation logic
│   ├── pattern-validator.md       # Regex validation tools
│   ├── coverage-analyzer.md       # Coverage calculation
│   └── performance-monitor.md     # Continuous monitoring
├── scripts/
│   ├── run-full-test-suite.sh     # Complete automation script
│   ├── quick-validation.sh        # Fast validation checks
│   ├── regression-test.sh         # Regression testing
│   └── performance-benchmark.sh   # Performance testing
├── templates/
│   ├── test-report-template.md    # Standardized reporting
│   ├── coverage-report-template.md # Coverage analysis
│   └── performance-dashboard.md   # Metrics visualization
└── examples/
    ├── stock-analyzer-test-suite.md # Example test suite
    └── agent-creator-test-suite.md  # Example reference test
```

---

## 🧪 **Test Generation System**

### **Keyword Test Generation**

For each keyword in marketplace.json, the system generates:

```bash
generate_keyword_tests() {
    local keyword="$1"
    local skill_context="$2"

    # 1. Exact match test
    echo "Test: \"${keyword}\""

    # 2. Embedded in sentence
    echo "Test: \"I need to ${keyword} for my project\""

    # 3. Case variations
    echo "Test: \"$(echo ${keyword} | tr '[:lower:]' '[:upper:]')\""

    # 4. Natural language variations
    echo "Test: \"Can you help me ${keyword}?\""

    # 5. Context-specific variations
    echo "Test: \"${keyword} in ${skill_context}\""
}
```

### **Pattern Test Generation**

For each regex pattern, generate comprehensive test cases:

```bash
generate_pattern_tests() {
    local pattern="$1"
    local description="$2"

    # Extract pattern components
    local verbs=$(extract_verbs "$pattern")
    local entities=$(extract_entities "$pattern")
    local contexts=$(extract_contexts "$pattern")

    # Generate positive test cases
    for verb in $verbs; do
        for entity in $entities; do
            echo "Test: \"${verb} ${entity}\""
            echo "Test: \"I want to ${verb} ${entity} now\""
            echo "Test: \"Can you ${verb} ${entity} for me?\""
        done
    done

    # Generate negative test cases
    generate_negative_cases "$pattern"
}
```

### **Integration Test Generation**

Creates realistic user queries combining multiple elements:

```bash
generate_integration_tests() {
    local capabilities=("$@")

    for capability in "${capabilities[@]}"; do
        # Natural language variations
        echo "Test: \"How can I ${capability}?\""
        echo "Test: \"I need help with ${capability}\""
        echo "Test: \"Can you ${capability} for me?\""

        # Workflow context
        echo "Test: \"Every day I have to ${capability}\""
        echo "Test: \"I want to automate ${capability}\""

        # Complex queries
        echo "Test: \"${capability} and show me results\""
        echo "Test: \"Help me understand ${capability} better\""
    done
}
```

---

## 🔍 **Pattern Validation System**

### **Regex Pattern Analyzer**

Validates regex patterns for common issues:

```python
def analyze_pattern(pattern):
    """Analyze regex pattern for potential issues"""
    issues = []
    suggestions = []

    # Check for common regex problems
    if pattern.count('*') > 2:
        issues.append("Too many wildcards - may cause false positives")

    if not re.search(r'\(\?\:i\)', pattern):
        suggestions.append("Add case-insensitive flag: (?i)")

    if pattern.startswith('.*') and pattern.endswith('.*'):
        issues.append("Pattern too broad - may match anything")

    # Calculate pattern specificity
    specificity = calculate_specificity(pattern)

    return {
        'issues': issues,
        'suggestions': suggestions,
        'specificity': specificity,
        'risk_level': assess_risk(pattern)
    }
```

### **Pattern Coverage Test**

Tests pattern against comprehensive query variations:

```bash
test_pattern_coverage() {
    local pattern="$1"
    local test_queries=("$@")
    local matches=0
    local total=${#test_queries[@]}

    for query in "${test_queries[@]}"; do
        if [[ $query =~ $pattern ]]; then
            ((matches++))
            echo "✅ Match: '$query'"
        else
            echo "❌ No match: '$query'"
        fi
    done

    local coverage=$((matches * 100 / total))
    echo "Pattern coverage: ${coverage}%"

    if [[ $coverage -lt 80 ]]; then
        echo "⚠️  Low coverage - consider expanding pattern"
    fi
}
```

---

## 📊 **Coverage Analysis System**

### **Multi-Layer Coverage Calculator**

Calculates coverage across all three activation layers:

```python
def calculate_activation_coverage(skill_config):
    """Calculate comprehensive activation coverage"""

    keywords = skill_config['activation']['keywords']
    patterns = skill_config['activation']['patterns']
    description = skill_config['metadata']['description']

    # Layer 1: Keyword coverage
    keyword_coverage = {
        'total_keywords': len(keywords),
        'categories': categorize_keywords(keywords),
        'synonym_coverage': calculate_synonym_coverage(keywords),
        'natural_language_coverage': calculate_nl_coverage(keywords)
    }

    # Layer 2: Pattern coverage
    pattern_coverage = {
        'total_patterns': len(patterns),
        'pattern_types': categorize_patterns(patterns),
        'regex_complexity': calculate_pattern_complexity(patterns),
        'overlap_analysis': analyze_pattern_overlap(patterns)
    }

    # Layer 3: Description coverage
    description_coverage = {
        'keyword_density': calculate_keyword_density(description, keywords),
        'semantic_richness': analyze_semantic_content(description),
        'concept_coverage': extract_concepts(description)
    }

    # Overall coverage score
    overall_score = calculate_overall_coverage(
        keyword_coverage, pattern_coverage, description_coverage
    )

    return {
        'overall_score': overall_score,
        'keyword_coverage': keyword_coverage,
        'pattern_coverage': pattern_coverage,
        'description_coverage': description_coverage,
        'recommendations': generate_recommendations(overall_score)
    }
```

### **Gap Identification**

Identifies gaps in activation coverage:

```python
def identify_activation_gaps(skill_config, test_results):
    """Identify gaps in activation coverage"""

    gaps = []

    # Analyze failed test queries
    failed_queries = [q for q in test_results if not q['activated']]

    # Categorize failures
    failure_categories = categorize_failures(failed_queries)

    # Identify missing keyword categories
    missing_categories = find_missing_keyword_categories(
        skill_config['activation']['keywords'],
        failure_categories
    )

    # Identify pattern weaknesses
    pattern_gaps = find_pattern_gaps(
        skill_config['activation']['patterns'],
        failed_queries
    )

    # Generate specific recommendations
    for category in missing_categories:
        gaps.append({
            'type': 'missing_keyword_category',
            'category': category,
            'suggestion': f"Add 5-10 keywords from {category} category"
        })

    for gap in pattern_gaps:
        gaps.append({
            'type': 'pattern_gap',
            'gap_type': gap['type'],
            'suggestion': gap['suggestion']
        })

    return gaps
```

---

## 🚀 **Automation Scripts**

### **Full Test Suite Runner**

```bash
#!/bin/bash
# run-full-test-suite.sh

run_full_test_suite() {
    local skill_path="$1"
    local output_dir="$2"

    echo "🧪 Running Full Activation Test Suite"
    echo "Skill: $skill_path"
    echo "Output: $output_dir"

    # 1. Parse skill configuration
    echo "📋 Parsing skill configuration..."
    parse_skill_config "$skill_path"

    # 2. Generate test cases
    echo "🎲 Generating test cases..."
    generate_all_test_cases "$skill_path"

    # 3. Run keyword tests
    echo "🔑 Testing keyword activation..."
    run_keyword_tests "$skill_path"

    # 4. Run pattern tests
    echo "🔍 Testing pattern matching..."
    run_pattern_tests "$skill_path"

    # 5. Run integration tests
    echo "🔗 Testing integration scenarios..."
    run_integration_tests "$skill_path"

    # 6. Run negative tests
    echo "🚫 Testing false positives..."
    run_negative_tests "$skill_path"

    # 7. Calculate coverage
    echo "📊 Calculating coverage..."
    calculate_coverage "$skill_path"

    # 8. Generate report
    echo "📄 Generating test report..."
    generate_test_report "$skill_path" "$output_dir"

    echo "✅ Test suite completed!"
    echo "📁 Report available at: $output_dir/activation-test-report.html"
}
```

### **Quick Validation Script**

```bash
#!/bin/bash
# quick-validation.sh

quick_validation() {
    local skill_path="$1"

    echo "⚡ Quick Activation Validation"

    # Fast JSON validation
    if ! python3 -m json.tool "$skill_path/marketplace.json" > /dev/null 2>&1; then
        echo "❌ Invalid JSON in marketplace.json"
        return 1
    fi

    # Check required fields
    check_required_fields "$skill_path"

    # Validate regex patterns
    validate_patterns "$skill_path"

    # Quick keyword count check
    keyword_count=$(jq '.activation.keywords | length' "$skill_path/marketplace.json")
    if [[ $keyword_count -lt 20 ]]; then
        echo "⚠️  Low keyword count: $keyword_count (recommend 50+)"
    fi

    # Pattern count check
    pattern_count=$(jq '.activation.patterns | length' "$skill_path/marketplace.json")
    if [[ $pattern_count -lt 8 ]]; then
        echo "⚠️  Low pattern count: $pattern_count (recommend 10+)"
    fi

    echo "✅ Quick validation completed"
}
```

---

## 📈 **Performance Monitoring**

### **Real-time Activation Monitor**

```python
class ActivationMonitor:
    """Monitor skill activation performance in real-time"""

    def __init__(self, skill_name):
        self.skill_name = skill_name
        self.activation_log = []
        self.performance_metrics = {
            'total_activations': 0,
            'successful_activations': 0,
            'failed_activations': 0,
            'average_response_time': 0,
            'activation_by_layer': {
                'keywords': 0,
                'patterns': 0,
                'description': 0
            }
        }

    def log_activation(self, query, activated, layer, response_time):
        """Log activation attempt"""
        self.activation_log.append({
            'timestamp': datetime.now(),
            'query': query,
            'activated': activated,
            'layer': layer,
            'response_time': response_time
        })

        self.update_metrics(activated, layer, response_time)

    def calculate_reliability_score(self):
        """Calculate current reliability score"""
        if self.performance_metrics['total_activations'] == 0:
            return 0.0

        success_rate = (
            self.performance_metrics['successful_activations'] /
            self.performance_metrics['total_activations']
        )

        return success_rate

    def generate_alerts(self):
        """Generate performance alerts"""
        alerts = []

        reliability = self.calculate_reliability_score()
        if reliability < 0.95:
            alerts.append({
                'type': 'low_reliability',
                'message': f'Reliability dropped to {reliability:.2%}',
                'severity': 'high'
            })

        avg_response_time = self.performance_metrics['average_response_time']
        if avg_response_time > 5.0:
            alerts.append({
                'type': 'slow_response',
                'message': f'Average response time: {avg_response_time:.2f}s',
                'severity': 'medium'
            })

        return alerts
```

---

## 📋 **Usage Examples**

### **Example 1: Testing Stock Analyzer Skill**

```bash
# Run full test suite
./run-full-test-suite.sh \
    /path/to/stock-analyzer-cskill \
    /output/test-results

# Quick validation
./quick-validation.sh /path/to/stock-analyzer-cskill

# Monitor performance
./performance-benchmark.sh stock-analyzer-cskill
```

### **Example 2: Integration with Development Workflow**

```yaml
# .github/workflows/activation-testing.yml
name: Activation Testing

on: [push, pull_request]

jobs:
  test-activation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run Activation Tests
        run: |
          ./references/tools/activation-tester/scripts/run-full-test-suite.sh \
            ./references/examples/stock-analyzer-cskill \
            ./test-results
      - name: Upload Test Results
        uses: actions/upload-artifact@v2
        with:
          name: activation-test-results
          path: ./test-results/
```

---

## ✅ **Quality Standards**

### **Test Coverage Requirements**
- [ ] 100% keyword coverage testing
- [ ] 95%+ pattern coverage validation
- [ ] All capability variations tested
- [ ] Edge cases documented and tested
- [ ] Negative testing for false positives

### **Performance Benchmarks**
- [ ] Activation reliability: 99.5%+
- [ ] False positive rate: <1%
- [ ] Test execution time: <30 seconds
- [ ] Memory usage: <100MB
- [ ] Response time: <2 seconds average

### **Reporting Standards**
- [ ] Automated test report generation
- [ ] Performance metrics dashboard
- [ ] Historical trend analysis
- [ ] Actionable recommendations
- [ ] Integration with CI/CD pipeline

---

## 🔄 **Continuous Improvement**

### **Feedback Loop Integration**
1. **Collect** activation data from real usage
2. **Analyze** performance metrics and failure patterns
3. **Identify** optimization opportunities
4. **Implement** improvements to keywords/patterns
5. **Validate** improvements with automated testing
6. **Deploy** updated configurations

### **A/B Testing Framework**
- Test different keyword combinations
- Compare pattern performance
- Validate description effectiveness
- Measure user satisfaction impact

---

## 📚 **Additional Resources**

- `../activation-testing-guide.md` - Manual testing procedures
- `../activation-patterns-guide.md` - Pattern library
- `../phase4-detection.md` - Detection methodology
- `../synonym-expansion-system.md` - Keyword expansion

---

**Version:** 1.0
**Last Updated:** 2025-10-24
**Maintained By:** Agent-Skill-Creator Team