# Activation Test Automation Framework v1.0 **Version:** 1.0 **Purpose:** Automated testing system for skill activation reliability **Target:** 99.5% activation reliability with <1% false positives --- ## ๐ŸŽฏ **Overview** This framework provides automated tools to test, validate, and monitor skill activation reliability across the 3-Layer Activation System (Keywords, Patterns, Description + NLU). ### **Problem Solved** **Before:** Manual testing was time-consuming, inconsistent, and missed edge cases **After:** Automated testing provides consistent validation, comprehensive coverage, and continuous monitoring --- ## ๐Ÿ› ๏ธ **Core Components** ### **1. Activation Test Suite Generator** Automatically generates comprehensive test cases for any skill based on its marketplace.json configuration. ### **2. Regex Pattern Validator** Validates regex patterns against test cases and identifies potential issues. ### **3. Coverage Analyzer** Calculates activation coverage and identifies gaps in keyword/pattern combinations. ### **4. Continuous Monitor** Monitors skill activation in real-time and tracks performance metrics. --- ## ๐Ÿ“ **Framework Structure** ``` references/tools/activation-tester/ โ”œโ”€โ”€ core/ โ”‚ โ”œโ”€โ”€ test-generator.md # Test case generation logic โ”‚ โ”œโ”€โ”€ pattern-validator.md # Regex validation tools โ”‚ โ”œโ”€โ”€ coverage-analyzer.md # Coverage calculation โ”‚ โ””โ”€โ”€ performance-monitor.md # Continuous monitoring โ”œโ”€โ”€ scripts/ โ”‚ โ”œโ”€โ”€ run-full-test-suite.sh # Complete automation script โ”‚ โ”œโ”€โ”€ quick-validation.sh # Fast validation checks โ”‚ โ”œโ”€โ”€ regression-test.sh # Regression testing โ”‚ โ””โ”€โ”€ performance-benchmark.sh # Performance testing โ”œโ”€โ”€ templates/ โ”‚ โ”œโ”€โ”€ test-report-template.md # Standardized reporting โ”‚ โ”œโ”€โ”€ coverage-report-template.md # Coverage analysis โ”‚ โ””โ”€โ”€ performance-dashboard.md # Metrics visualization โ””โ”€โ”€ examples/ โ”œโ”€โ”€ stock-analyzer-test-suite.md # Example test suite โ””โ”€โ”€ agent-creator-test-suite.md # Example reference test ``` --- ## ๐Ÿงช **Test Generation System** ### **Keyword Test Generation** For each keyword in marketplace.json, the system generates: ```bash generate_keyword_tests() { local keyword="$1" local skill_context="$2" # 1. Exact match test echo "Test: \"${keyword}\"" # 2. Embedded in sentence echo "Test: \"I need to ${keyword} for my project\"" # 3. Case variations echo "Test: \"$(echo ${keyword} | tr '[:lower:]' '[:upper:]')\"" # 4. Natural language variations echo "Test: \"Can you help me ${keyword}?\"" # 5. Context-specific variations echo "Test: \"${keyword} in ${skill_context}\"" } ``` ### **Pattern Test Generation** For each regex pattern, generate comprehensive test cases: ```bash generate_pattern_tests() { local pattern="$1" local description="$2" # Extract pattern components local verbs=$(extract_verbs "$pattern") local entities=$(extract_entities "$pattern") local contexts=$(extract_contexts "$pattern") # Generate positive test cases for verb in $verbs; do for entity in $entities; do echo "Test: \"${verb} ${entity}\"" echo "Test: \"I want to ${verb} ${entity} now\"" echo "Test: \"Can you ${verb} ${entity} for me?\"" done done # Generate negative test cases generate_negative_cases "$pattern" } ``` ### **Integration Test Generation** Creates realistic user queries combining multiple elements: ```bash generate_integration_tests() { local capabilities=("$@") for capability in "${capabilities[@]}"; do # Natural language variations echo "Test: \"How can I ${capability}?\"" echo "Test: \"I need help with ${capability}\"" echo "Test: \"Can you ${capability} for me?\"" # Workflow context echo "Test: \"Every day I have to ${capability}\"" echo "Test: \"I want to automate ${capability}\"" # Complex queries echo "Test: \"${capability} and show me results\"" echo "Test: \"Help me understand ${capability} better\"" done } ``` --- ## ๐Ÿ” **Pattern Validation System** ### **Regex Pattern Analyzer** Validates regex patterns for common issues: ```python def analyze_pattern(pattern): """Analyze regex pattern for potential issues""" issues = [] suggestions = [] # Check for common regex problems if pattern.count('*') > 2: issues.append("Too many wildcards - may cause false positives") if not re.search(r'\(\?\:i\)', pattern): suggestions.append("Add case-insensitive flag: (?i)") if pattern.startswith('.*') and pattern.endswith('.*'): issues.append("Pattern too broad - may match anything") # Calculate pattern specificity specificity = calculate_specificity(pattern) return { 'issues': issues, 'suggestions': suggestions, 'specificity': specificity, 'risk_level': assess_risk(pattern) } ``` ### **Pattern Coverage Test** Tests pattern against comprehensive query variations: ```bash test_pattern_coverage() { local pattern="$1" local test_queries=("$@") local matches=0 local total=${#test_queries[@]} for query in "${test_queries[@]}"; do if [[ $query =~ $pattern ]]; then ((matches++)) echo "โœ… Match: '$query'" else echo "โŒ No match: '$query'" fi done local coverage=$((matches * 100 / total)) echo "Pattern coverage: ${coverage}%" if [[ $coverage -lt 80 ]]; then echo "โš ๏ธ Low coverage - consider expanding pattern" fi } ``` --- ## ๐Ÿ“Š **Coverage Analysis System** ### **Multi-Layer Coverage Calculator** Calculates coverage across all three activation layers: ```python def calculate_activation_coverage(skill_config): """Calculate comprehensive activation coverage""" keywords = skill_config['activation']['keywords'] patterns = skill_config['activation']['patterns'] description = skill_config['metadata']['description'] # Layer 1: Keyword coverage keyword_coverage = { 'total_keywords': len(keywords), 'categories': categorize_keywords(keywords), 'synonym_coverage': calculate_synonym_coverage(keywords), 'natural_language_coverage': calculate_nl_coverage(keywords) } # Layer 2: Pattern coverage pattern_coverage = { 'total_patterns': len(patterns), 'pattern_types': categorize_patterns(patterns), 'regex_complexity': calculate_pattern_complexity(patterns), 'overlap_analysis': analyze_pattern_overlap(patterns) } # Layer 3: Description coverage description_coverage = { 'keyword_density': calculate_keyword_density(description, keywords), 'semantic_richness': analyze_semantic_content(description), 'concept_coverage': extract_concepts(description) } # Overall coverage score overall_score = calculate_overall_coverage( keyword_coverage, pattern_coverage, description_coverage ) return { 'overall_score': overall_score, 'keyword_coverage': keyword_coverage, 'pattern_coverage': pattern_coverage, 'description_coverage': description_coverage, 'recommendations': generate_recommendations(overall_score) } ``` ### **Gap Identification** Identifies gaps in activation coverage: ```python def identify_activation_gaps(skill_config, test_results): """Identify gaps in activation coverage""" gaps = [] # Analyze failed test queries failed_queries = [q for q in test_results if not q['activated']] # Categorize failures failure_categories = categorize_failures(failed_queries) # Identify missing keyword categories missing_categories = find_missing_keyword_categories( skill_config['activation']['keywords'], failure_categories ) # Identify pattern weaknesses pattern_gaps = find_pattern_gaps( skill_config['activation']['patterns'], failed_queries ) # Generate specific recommendations for category in missing_categories: gaps.append({ 'type': 'missing_keyword_category', 'category': category, 'suggestion': f"Add 5-10 keywords from {category} category" }) for gap in pattern_gaps: gaps.append({ 'type': 'pattern_gap', 'gap_type': gap['type'], 'suggestion': gap['suggestion'] }) return gaps ``` --- ## ๐Ÿš€ **Automation Scripts** ### **Full Test Suite Runner** ```bash #!/bin/bash # run-full-test-suite.sh run_full_test_suite() { local skill_path="$1" local output_dir="$2" echo "๐Ÿงช Running Full Activation Test Suite" echo "Skill: $skill_path" echo "Output: $output_dir" # 1. Parse skill configuration echo "๐Ÿ“‹ Parsing skill configuration..." parse_skill_config "$skill_path" # 2. Generate test cases echo "๐ŸŽฒ Generating test cases..." generate_all_test_cases "$skill_path" # 3. Run keyword tests echo "๐Ÿ”‘ Testing keyword activation..." run_keyword_tests "$skill_path" # 4. Run pattern tests echo "๐Ÿ” Testing pattern matching..." run_pattern_tests "$skill_path" # 5. Run integration tests echo "๐Ÿ”— Testing integration scenarios..." run_integration_tests "$skill_path" # 6. Run negative tests echo "๐Ÿšซ Testing false positives..." run_negative_tests "$skill_path" # 7. Calculate coverage echo "๐Ÿ“Š Calculating coverage..." calculate_coverage "$skill_path" # 8. Generate report echo "๐Ÿ“„ Generating test report..." generate_test_report "$skill_path" "$output_dir" echo "โœ… Test suite completed!" echo "๐Ÿ“ Report available at: $output_dir/activation-test-report.html" } ``` ### **Quick Validation Script** ```bash #!/bin/bash # quick-validation.sh quick_validation() { local skill_path="$1" echo "โšก Quick Activation Validation" # Fast JSON validation if ! python3 -m json.tool "$skill_path/marketplace.json" > /dev/null 2>&1; then echo "โŒ Invalid JSON in marketplace.json" return 1 fi # Check required fields check_required_fields "$skill_path" # Validate regex patterns validate_patterns "$skill_path" # Quick keyword count check keyword_count=$(jq '.activation.keywords | length' "$skill_path/marketplace.json") if [[ $keyword_count -lt 20 ]]; then echo "โš ๏ธ Low keyword count: $keyword_count (recommend 50+)" fi # Pattern count check pattern_count=$(jq '.activation.patterns | length' "$skill_path/marketplace.json") if [[ $pattern_count -lt 8 ]]; then echo "โš ๏ธ Low pattern count: $pattern_count (recommend 10+)" fi echo "โœ… Quick validation completed" } ``` --- ## ๐Ÿ“ˆ **Performance Monitoring** ### **Real-time Activation Monitor** ```python class ActivationMonitor: """Monitor skill activation performance in real-time""" def __init__(self, skill_name): self.skill_name = skill_name self.activation_log = [] self.performance_metrics = { 'total_activations': 0, 'successful_activations': 0, 'failed_activations': 0, 'average_response_time': 0, 'activation_by_layer': { 'keywords': 0, 'patterns': 0, 'description': 0 } } def log_activation(self, query, activated, layer, response_time): """Log activation attempt""" self.activation_log.append({ 'timestamp': datetime.now(), 'query': query, 'activated': activated, 'layer': layer, 'response_time': response_time }) self.update_metrics(activated, layer, response_time) def calculate_reliability_score(self): """Calculate current reliability score""" if self.performance_metrics['total_activations'] == 0: return 0.0 success_rate = ( self.performance_metrics['successful_activations'] / self.performance_metrics['total_activations'] ) return success_rate def generate_alerts(self): """Generate performance alerts""" alerts = [] reliability = self.calculate_reliability_score() if reliability < 0.95: alerts.append({ 'type': 'low_reliability', 'message': f'Reliability dropped to {reliability:.2%}', 'severity': 'high' }) avg_response_time = self.performance_metrics['average_response_time'] if avg_response_time > 5.0: alerts.append({ 'type': 'slow_response', 'message': f'Average response time: {avg_response_time:.2f}s', 'severity': 'medium' }) return alerts ``` --- ## ๐Ÿ“‹ **Usage Examples** ### **Example 1: Testing Stock Analyzer Skill** ```bash # Run full test suite ./run-full-test-suite.sh \ /path/to/stock-analyzer-cskill \ /output/test-results # Quick validation ./quick-validation.sh /path/to/stock-analyzer-cskill # Monitor performance ./performance-benchmark.sh stock-analyzer-cskill ``` ### **Example 2: Integration with Development Workflow** ```yaml # .github/workflows/activation-testing.yml name: Activation Testing on: [push, pull_request] jobs: test-activation: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Run Activation Tests run: | ./references/tools/activation-tester/scripts/run-full-test-suite.sh \ ./references/examples/stock-analyzer-cskill \ ./test-results - name: Upload Test Results uses: actions/upload-artifact@v2 with: name: activation-test-results path: ./test-results/ ``` --- ## โœ… **Quality Standards** ### **Test Coverage Requirements** - [ ] 100% keyword coverage testing - [ ] 95%+ pattern coverage validation - [ ] All capability variations tested - [ ] Edge cases documented and tested - [ ] Negative testing for false positives ### **Performance Benchmarks** - [ ] Activation reliability: 99.5%+ - [ ] False positive rate: <1% - [ ] Test execution time: <30 seconds - [ ] Memory usage: <100MB - [ ] Response time: <2 seconds average ### **Reporting Standards** - [ ] Automated test report generation - [ ] Performance metrics dashboard - [ ] Historical trend analysis - [ ] Actionable recommendations - [ ] Integration with CI/CD pipeline --- ## ๐Ÿ”„ **Continuous Improvement** ### **Feedback Loop Integration** 1. **Collect** activation data from real usage 2. **Analyze** performance metrics and failure patterns 3. **Identify** optimization opportunities 4. **Implement** improvements to keywords/patterns 5. **Validate** improvements with automated testing 6. **Deploy** updated configurations ### **A/B Testing Framework** - Test different keyword combinations - Compare pattern performance - Validate description effectiveness - Measure user satisfaction impact --- ## ๐Ÿ“š **Additional Resources** - `../activation-testing-guide.md` - Manual testing procedures - `../activation-patterns-guide.md` - Pattern library - `../phase4-detection.md` - Detection methodology - `../synonym-expansion-system.md` - Keyword expansion --- **Version:** 1.0 **Last Updated:** 2025-10-24 **Maintained By:** Agent-Skill-Creator Team