Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:27:28 +08:00
commit 8db9c44dd8
79 changed files with 37715 additions and 0 deletions

View File

@@ -0,0 +1,571 @@
# Activation Test Automation Framework v1.0
**Version:** 1.0
**Purpose:** Automated testing system for skill activation reliability
**Target:** 99.5% activation reliability with <1% false positives
---
## 🎯 **Overview**
This framework provides automated tools to test, validate, and monitor skill activation reliability across the 3-Layer Activation System (Keywords, Patterns, Description + NLU).
### **Problem Solved**
**Before:** Manual testing was time-consuming, inconsistent, and missed edge cases
**After:** Automated testing provides consistent validation, comprehensive coverage, and continuous monitoring
---
## 🛠️ **Core Components**
### **1. Activation Test Suite Generator**
Automatically generates comprehensive test cases for any skill based on its marketplace.json configuration.
### **2. Regex Pattern Validator**
Validates regex patterns against test cases and identifies potential issues.
### **3. Coverage Analyzer**
Calculates activation coverage and identifies gaps in keyword/pattern combinations.
### **4. Continuous Monitor**
Monitors skill activation in real-time and tracks performance metrics.
---
## 📁 **Framework Structure**
```
references/tools/activation-tester/
├── core/
│ ├── test-generator.md # Test case generation logic
│ ├── pattern-validator.md # Regex validation tools
│ ├── coverage-analyzer.md # Coverage calculation
│ └── performance-monitor.md # Continuous monitoring
├── scripts/
│ ├── run-full-test-suite.sh # Complete automation script
│ ├── quick-validation.sh # Fast validation checks
│ ├── regression-test.sh # Regression testing
│ └── performance-benchmark.sh # Performance testing
├── templates/
│ ├── test-report-template.md # Standardized reporting
│ ├── coverage-report-template.md # Coverage analysis
│ └── performance-dashboard.md # Metrics visualization
└── examples/
├── stock-analyzer-test-suite.md # Example test suite
└── agent-creator-test-suite.md # Example reference test
```
---
## 🧪 **Test Generation System**
### **Keyword Test Generation**
For each keyword in marketplace.json, the system generates:
```bash
generate_keyword_tests() {
local keyword="$1"
local skill_context="$2"
# 1. Exact match test
echo "Test: \"${keyword}\""
# 2. Embedded in sentence
echo "Test: \"I need to ${keyword} for my project\""
# 3. Case variations
echo "Test: \"$(echo ${keyword} | tr '[:lower:]' '[:upper:]')\""
# 4. Natural language variations
echo "Test: \"Can you help me ${keyword}?\""
# 5. Context-specific variations
echo "Test: \"${keyword} in ${skill_context}\""
}
```
### **Pattern Test Generation**
For each regex pattern, generate comprehensive test cases:
```bash
generate_pattern_tests() {
local pattern="$1"
local description="$2"
# Extract pattern components
local verbs=$(extract_verbs "$pattern")
local entities=$(extract_entities "$pattern")
local contexts=$(extract_contexts "$pattern")
# Generate positive test cases
for verb in $verbs; do
for entity in $entities; do
echo "Test: \"${verb} ${entity}\""
echo "Test: \"I want to ${verb} ${entity} now\""
echo "Test: \"Can you ${verb} ${entity} for me?\""
done
done
# Generate negative test cases
generate_negative_cases "$pattern"
}
```
### **Integration Test Generation**
Creates realistic user queries combining multiple elements:
```bash
generate_integration_tests() {
local capabilities=("$@")
for capability in "${capabilities[@]}"; do
# Natural language variations
echo "Test: \"How can I ${capability}?\""
echo "Test: \"I need help with ${capability}\""
echo "Test: \"Can you ${capability} for me?\""
# Workflow context
echo "Test: \"Every day I have to ${capability}\""
echo "Test: \"I want to automate ${capability}\""
# Complex queries
echo "Test: \"${capability} and show me results\""
echo "Test: \"Help me understand ${capability} better\""
done
}
```
---
## 🔍 **Pattern Validation System**
### **Regex Pattern Analyzer**
Validates regex patterns for common issues:
```python
def analyze_pattern(pattern):
"""Analyze regex pattern for potential issues"""
issues = []
suggestions = []
# Check for common regex problems
if pattern.count('*') > 2:
issues.append("Too many wildcards - may cause false positives")
if not re.search(r'\(\?\:i\)', pattern):
suggestions.append("Add case-insensitive flag: (?i)")
if pattern.startswith('.*') and pattern.endswith('.*'):
issues.append("Pattern too broad - may match anything")
# Calculate pattern specificity
specificity = calculate_specificity(pattern)
return {
'issues': issues,
'suggestions': suggestions,
'specificity': specificity,
'risk_level': assess_risk(pattern)
}
```
### **Pattern Coverage Test**
Tests pattern against comprehensive query variations:
```bash
test_pattern_coverage() {
local pattern="$1"
local test_queries=("$@")
local matches=0
local total=${#test_queries[@]}
for query in "${test_queries[@]}"; do
if [[ $query =~ $pattern ]]; then
((matches++))
echo "✅ Match: '$query'"
else
echo "❌ No match: '$query'"
fi
done
local coverage=$((matches * 100 / total))
echo "Pattern coverage: ${coverage}%"
if [[ $coverage -lt 80 ]]; then
echo "⚠️ Low coverage - consider expanding pattern"
fi
}
```
---
## 📊 **Coverage Analysis System**
### **Multi-Layer Coverage Calculator**
Calculates coverage across all three activation layers:
```python
def calculate_activation_coverage(skill_config):
"""Calculate comprehensive activation coverage"""
keywords = skill_config['activation']['keywords']
patterns = skill_config['activation']['patterns']
description = skill_config['metadata']['description']
# Layer 1: Keyword coverage
keyword_coverage = {
'total_keywords': len(keywords),
'categories': categorize_keywords(keywords),
'synonym_coverage': calculate_synonym_coverage(keywords),
'natural_language_coverage': calculate_nl_coverage(keywords)
}
# Layer 2: Pattern coverage
pattern_coverage = {
'total_patterns': len(patterns),
'pattern_types': categorize_patterns(patterns),
'regex_complexity': calculate_pattern_complexity(patterns),
'overlap_analysis': analyze_pattern_overlap(patterns)
}
# Layer 3: Description coverage
description_coverage = {
'keyword_density': calculate_keyword_density(description, keywords),
'semantic_richness': analyze_semantic_content(description),
'concept_coverage': extract_concepts(description)
}
# Overall coverage score
overall_score = calculate_overall_coverage(
keyword_coverage, pattern_coverage, description_coverage
)
return {
'overall_score': overall_score,
'keyword_coverage': keyword_coverage,
'pattern_coverage': pattern_coverage,
'description_coverage': description_coverage,
'recommendations': generate_recommendations(overall_score)
}
```
### **Gap Identification**
Identifies gaps in activation coverage:
```python
def identify_activation_gaps(skill_config, test_results):
"""Identify gaps in activation coverage"""
gaps = []
# Analyze failed test queries
failed_queries = [q for q in test_results if not q['activated']]
# Categorize failures
failure_categories = categorize_failures(failed_queries)
# Identify missing keyword categories
missing_categories = find_missing_keyword_categories(
skill_config['activation']['keywords'],
failure_categories
)
# Identify pattern weaknesses
pattern_gaps = find_pattern_gaps(
skill_config['activation']['patterns'],
failed_queries
)
# Generate specific recommendations
for category in missing_categories:
gaps.append({
'type': 'missing_keyword_category',
'category': category,
'suggestion': f"Add 5-10 keywords from {category} category"
})
for gap in pattern_gaps:
gaps.append({
'type': 'pattern_gap',
'gap_type': gap['type'],
'suggestion': gap['suggestion']
})
return gaps
```
---
## 🚀 **Automation Scripts**
### **Full Test Suite Runner**
```bash
#!/bin/bash
# run-full-test-suite.sh
run_full_test_suite() {
local skill_path="$1"
local output_dir="$2"
echo "🧪 Running Full Activation Test Suite"
echo "Skill: $skill_path"
echo "Output: $output_dir"
# 1. Parse skill configuration
echo "📋 Parsing skill configuration..."
parse_skill_config "$skill_path"
# 2. Generate test cases
echo "🎲 Generating test cases..."
generate_all_test_cases "$skill_path"
# 3. Run keyword tests
echo "🔑 Testing keyword activation..."
run_keyword_tests "$skill_path"
# 4. Run pattern tests
echo "🔍 Testing pattern matching..."
run_pattern_tests "$skill_path"
# 5. Run integration tests
echo "🔗 Testing integration scenarios..."
run_integration_tests "$skill_path"
# 6. Run negative tests
echo "🚫 Testing false positives..."
run_negative_tests "$skill_path"
# 7. Calculate coverage
echo "📊 Calculating coverage..."
calculate_coverage "$skill_path"
# 8. Generate report
echo "📄 Generating test report..."
generate_test_report "$skill_path" "$output_dir"
echo "✅ Test suite completed!"
echo "📁 Report available at: $output_dir/activation-test-report.html"
}
```
### **Quick Validation Script**
```bash
#!/bin/bash
# quick-validation.sh
quick_validation() {
local skill_path="$1"
echo "⚡ Quick Activation Validation"
# Fast JSON validation
if ! python3 -m json.tool "$skill_path/marketplace.json" > /dev/null 2>&1; then
echo "❌ Invalid JSON in marketplace.json"
return 1
fi
# Check required fields
check_required_fields "$skill_path"
# Validate regex patterns
validate_patterns "$skill_path"
# Quick keyword count check
keyword_count=$(jq '.activation.keywords | length' "$skill_path/marketplace.json")
if [[ $keyword_count -lt 20 ]]; then
echo "⚠️ Low keyword count: $keyword_count (recommend 50+)"
fi
# Pattern count check
pattern_count=$(jq '.activation.patterns | length' "$skill_path/marketplace.json")
if [[ $pattern_count -lt 8 ]]; then
echo "⚠️ Low pattern count: $pattern_count (recommend 10+)"
fi
echo "✅ Quick validation completed"
}
```
---
## 📈 **Performance Monitoring**
### **Real-time Activation Monitor**
```python
class ActivationMonitor:
"""Monitor skill activation performance in real-time"""
def __init__(self, skill_name):
self.skill_name = skill_name
self.activation_log = []
self.performance_metrics = {
'total_activations': 0,
'successful_activations': 0,
'failed_activations': 0,
'average_response_time': 0,
'activation_by_layer': {
'keywords': 0,
'patterns': 0,
'description': 0
}
}
def log_activation(self, query, activated, layer, response_time):
"""Log activation attempt"""
self.activation_log.append({
'timestamp': datetime.now(),
'query': query,
'activated': activated,
'layer': layer,
'response_time': response_time
})
self.update_metrics(activated, layer, response_time)
def calculate_reliability_score(self):
"""Calculate current reliability score"""
if self.performance_metrics['total_activations'] == 0:
return 0.0
success_rate = (
self.performance_metrics['successful_activations'] /
self.performance_metrics['total_activations']
)
return success_rate
def generate_alerts(self):
"""Generate performance alerts"""
alerts = []
reliability = self.calculate_reliability_score()
if reliability < 0.95:
alerts.append({
'type': 'low_reliability',
'message': f'Reliability dropped to {reliability:.2%}',
'severity': 'high'
})
avg_response_time = self.performance_metrics['average_response_time']
if avg_response_time > 5.0:
alerts.append({
'type': 'slow_response',
'message': f'Average response time: {avg_response_time:.2f}s',
'severity': 'medium'
})
return alerts
```
---
## 📋 **Usage Examples**
### **Example 1: Testing Stock Analyzer Skill**
```bash
# Run full test suite
./run-full-test-suite.sh \
/path/to/stock-analyzer-cskill \
/output/test-results
# Quick validation
./quick-validation.sh /path/to/stock-analyzer-cskill
# Monitor performance
./performance-benchmark.sh stock-analyzer-cskill
```
### **Example 2: Integration with Development Workflow**
```yaml
# .github/workflows/activation-testing.yml
name: Activation Testing
on: [push, pull_request]
jobs:
test-activation:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Run Activation Tests
run: |
./references/tools/activation-tester/scripts/run-full-test-suite.sh \
./references/examples/stock-analyzer-cskill \
./test-results
- name: Upload Test Results
uses: actions/upload-artifact@v2
with:
name: activation-test-results
path: ./test-results/
```
---
## ✅ **Quality Standards**
### **Test Coverage Requirements**
- [ ] 100% keyword coverage testing
- [ ] 95%+ pattern coverage validation
- [ ] All capability variations tested
- [ ] Edge cases documented and tested
- [ ] Negative testing for false positives
### **Performance Benchmarks**
- [ ] Activation reliability: 99.5%+
- [ ] False positive rate: <1%
- [ ] Test execution time: <30 seconds
- [ ] Memory usage: <100MB
- [ ] Response time: <2 seconds average
### **Reporting Standards**
- [ ] Automated test report generation
- [ ] Performance metrics dashboard
- [ ] Historical trend analysis
- [ ] Actionable recommendations
- [ ] Integration with CI/CD pipeline
---
## 🔄 **Continuous Improvement**
### **Feedback Loop Integration**
1. **Collect** activation data from real usage
2. **Analyze** performance metrics and failure patterns
3. **Identify** optimization opportunities
4. **Implement** improvements to keywords/patterns
5. **Validate** improvements with automated testing
6. **Deploy** updated configurations
### **A/B Testing Framework**
- Test different keyword combinations
- Compare pattern performance
- Validate description effectiveness
- Measure user satisfaction impact
---
## 📚 **Additional Resources**
- `../activation-testing-guide.md` - Manual testing procedures
- `../activation-patterns-guide.md` - Pattern library
- `../phase4-detection.md` - Detection methodology
- `../synonym-expansion-system.md` - Keyword expansion
---
**Version:** 1.0
**Last Updated:** 2025-10-24
**Maintained By:** Agent-Skill-Creator Team

View File

@@ -0,0 +1,651 @@
# Intent Analyzer Tools v1.0
**Version:** 1.0
**Purpose:** Development and testing tools for multi-intent detection system
**Target:** Validate intent detection with 95%+ accuracy
---
## 🛠️ **Intent Analysis Toolkit**
### **Core Tools**
1. **Intent Parser Validator** - Test intent parsing accuracy
2. **Intent Combination Analyzer** - Analyze intent compatibility
3. **Natural Language Intent Simulator** - Test complex queries
4. **Performance Benchmark Suite** - Measure detection performance
---
## 🔍 **Intent Parser Validator**
### **Usage**
```bash
# Basic intent parsing test
./intent-parser-validator.sh <skill-config> <test-query>
# Batch testing with query file
./intent-parser-validator.sh <skill-config> --batch <queries.txt>
# Full validation suite
./intent-parser-validator.sh <skill-config> --full-suite
```
### **Implementation**
```bash
#!/bin/bash
# intent-parser-validator.sh
validate_intent_parsing() {
local skill_config="$1"
local query="$2"
echo "🔍 Analyzing query: \"$query\""
# Extract intents using Python implementation
python3 << EOF
import json
import sys
sys.path.append('..')
# Load skill configuration
with open('$skill_config', 'r') as f:
config = json.load(f)
# Import intent parser (simplified implementation)
def parse_intent_simple(query):
"""Simplified intent parsing for validation"""
# Primary intent detection
primary_patterns = {
'analyze': ['analyze', 'examine', 'evaluate', 'study'],
'create': ['create', 'build', 'make', 'generate'],
'compare': ['compare', 'versus', 'vs', 'ranking'],
'monitor': ['monitor', 'track', 'watch', 'alert'],
'transform': ['convert', 'transform', 'change', 'turn']
}
# Secondary intent detection
secondary_patterns = {
'and_visualize': ['show', 'chart', 'graph', 'visualize'],
'and_save': ['save', 'export', 'download', 'store'],
'and_explain': ['explain', 'clarify', 'describe', 'detail']
}
query_lower = query.lower()
# Find primary intent
primary_intent = None
for intent, keywords in primary_patterns.items():
if any(keyword in query_lower for keyword in keywords):
primary_intent = intent
break
# Find secondary intents
secondary_intents = []
for intent, keywords in secondary_patterns.items():
if any(keyword in query_lower for keyword in keywords):
secondary_intents.append(intent)
return {
'primary_intent': primary_intent,
'secondary_intents': secondary_intents,
'confidence': 0.8 if primary_intent else 0.0,
'complexity': 'high' if len(secondary_intents) > 1 else 'medium' if secondary_intents else 'low'
}
# Parse the query
result = parse_intent_simple('$query')
print("Intent Analysis Results:")
print("=" * 30)
print(f"Primary Intent: {result['primary_intent']}")
print(f"Secondary Intents: {', '.join(result['secondary_intents'])}")
print(f"Confidence: {result['confidence']:.2f}")
print(f"Complexity: {result['complexity']}")
# Validate against skill capabilities
capabilities = config.get('capabilities', {})
supported_primary = capabilities.get('primary_intents', [])
supported_secondary = capabilities.get('secondary_intents', [])
validation_issues = []
if result['primary_intent'] not in supported_primary:
validation_issues.append(f"Primary intent '{result['primary_intent']}' not supported")
for sec_intent in result['secondary_intents']:
if sec_intent not in supported_secondary:
validation_issues.append(f"Secondary intent '{sec_intent}' not supported")
if validation_issues:
print("Validation Issues:")
for issue in validation_issues:
print(f" - {issue}")
else:
print("✅ All intents supported by skill")
EOF
}
```
---
## 🔄 **Intent Combination Analyzer**
### **Purpose**
Analyze compatibility and execution order of intent combinations.
### **Implementation**
```python
def analyze_intent_combination(primary_intent, secondary_intents, skill_config):
"""Analyze intent combination compatibility and execution plan"""
# Get supported combinations from skill config
supported_combinations = skill_config.get('intent_hierarchy', {}).get('intent_combinations', {})
# Check for exact combination match
combination_key = f"{primary_intent}_and_{'_and_'.join(secondary_intents)}"
if combination_key in supported_combinations:
return {
'supported': True,
'combination_type': 'predefined',
'execution_plan': supported_combinations[combination_key],
'confidence': 0.95
}
# Check for partial matches
for sec_intent in secondary_intents:
partial_key = f"{primary_intent}_and_{sec_intent}"
if partial_key in supported_combinations:
return {
'supported': True,
'combination_type': 'partial_match',
'execution_plan': supported_combinations[partial_key],
'additional_intents': [i for i in secondary_intents if i != sec_intent],
'confidence': 0.8
}
# Check if individual intents are supported
capabilities = skill_config.get('capabilities', {})
primary_supported = primary_intent in capabilities.get('primary_intents', [])
secondary_supported = all(intent in capabilities.get('secondary_intents', []) for intent in secondary_intents)
if primary_supported and secondary_supported:
return {
'supported': True,
'combination_type': 'dynamic',
'execution_plan': generate_dynamic_execution_plan(primary_intent, secondary_intents),
'confidence': 0.7
}
return {
'supported': False,
'reason': 'One or more intents not supported',
'fallback_intent': primary_intent if primary_supported else None
}
def generate_dynamic_execution_plan(primary_intent, secondary_intents):
"""Generate execution plan for non-predefined combinations"""
plan = {
'steps': [
{
'step': 1,
'intent': primary_intent,
'action': f'execute_{primary_intent}',
'dependencies': []
}
],
'parallel_steps': []
}
# Add secondary intents
for i, intent in enumerate(secondary_intents):
if can_execute_parallel(primary_intent, intent):
plan['parallel_steps'].append({
'step': f'parallel_{i}',
'intent': intent,
'action': f'execute_{intent}',
'dependencies': ['step_1']
})
else:
plan['steps'].append({
'step': len(plan['steps']) + 1,
'intent': intent,
'action': f'execute_{intent}',
'dependencies': [f'step_{len(plan["steps"])}']
})
return plan
def can_execute_parallel(primary_intent, secondary_intent):
"""Determine if intents can be executed in parallel"""
parallel_pairs = {
'analyze': ['and_visualize', 'and_save'],
'compare': ['and_visualize', 'and_explain'],
'monitor': ['and_alert', 'and_save']
}
return secondary_intent in parallel_pairs.get(primary_intent, [])
```
---
## 🗣️ **Natural Language Intent Simulator**
### **Purpose**
Generate and test natural language variations of intent combinations.
### **Implementation**
```python
class NaturalLanguageIntentSimulator:
"""Generate natural language variations for intent testing"""
def __init__(self):
self.templates = {
'single_intent': [
"I need to {intent} {entity}",
"Can you {intent} {entity}?",
"Please {intent} {entity}",
"Help me {intent} {entity}",
"{intent} {entity} for me"
],
'double_intent': [
"I need to {intent1} {entity} and {intent2} the results",
"Can you {intent1} {entity} and also {intent2}?",
"Please {intent1} {entity} and {intent2} everything",
"Help me {intent1} {entity} and {intent2} the output",
"{intent1} {entity} and then {intent2}"
],
'triple_intent': [
"I need to {intent1} {entity}, {intent2} the results, and {intent3}",
"Can you {intent1} {entity}, {intent2} it, and {intent3} everything?",
"Please {intent1} {entity}, {intent2} the analysis, and {intent3}",
"Help me {intent1} {entity}, {intent2} the data, and {intent3} the results"
]
}
self.intent_variations = {
'analyze': ['analyze', 'examine', 'evaluate', 'study', 'review', 'assess'],
'create': ['create', 'build', 'make', 'generate', 'develop', 'design'],
'compare': ['compare', 'comparison', 'versus', 'vs', 'rank', 'rating'],
'monitor': ['monitor', 'track', 'watch', 'observe', 'follow', 'keep an eye on'],
'transform': ['convert', 'transform', 'change', 'turn', 'format', 'structure']
}
self.secondary_variations = {
'and_visualize': ['show me', 'visualize', 'create a chart', 'graph', 'display'],
'and_save': ['save', 'export', 'download', 'store', 'keep', 'record'],
'and_explain': ['explain', 'describe', 'detail', 'clarify', 'break down']
}
self.entities = {
'finance': ['AAPL stock', 'MSFT shares', 'market data', 'portfolio performance', 'stock prices'],
'general': ['this data', 'the information', 'these results', 'the output', 'everything']
}
def generate_variations(self, primary_intent, secondary_intents=[], domain='finance'):
"""Generate natural language variations for intent combinations"""
variations = []
entity_list = self.entities[domain]
# Single intent variations
if not secondary_intents:
for template in self.templates['single_intent']:
for primary_verb in self.intent_variations.get(primary_intent, [primary_intent]):
for entity in entity_list[:3]: # Limit to avoid too many variations
query = template.format(intent=primary_verb, entity=entity)
variations.append({
'query': query,
'expected_intents': {
'primary': primary_intent,
'secondary': [],
'contextual': []
},
'complexity': 'low'
})
# Double intent variations
elif len(secondary_intents) == 1:
secondary_intent = secondary_intents[0]
for template in self.templates['double_intent']:
for primary_verb in self.intent_variations.get(primary_intent, [primary_intent]):
for secondary_verb in self.secondary_variations.get(secondary_intent, [secondary_intent.replace('and_', '')]):
for entity in entity_list[:2]:
query = template.format(
intent1=primary_verb,
intent2=secondary_verb,
entity=entity
)
variations.append({
'query': query,
'expected_intents': {
'primary': primary_intent,
'secondary': [secondary_intent],
'contextual': []
},
'complexity': 'medium'
})
# Triple intent variations
elif len(secondary_intents) >= 2:
for template in self.templates['triple_intent']:
for primary_verb in self.intent_variations.get(primary_intent, [primary_intent]):
for entity in entity_list[:2]:
secondary_verbs = [
self.secondary_variations.get(intent, [intent.replace('and_', '')])[0]
for intent in secondary_intents[:2]
]
query = template.format(
intent1=primary_verb,
intent2=secondary_verbs[0],
intent3=secondary_verbs[1],
entity=entity
)
variations.append({
'query': query,
'expected_intents': {
'primary': primary_intent,
'secondary': secondary_intents[:2],
'contextual': []
},
'complexity': 'high'
})
return variations
def generate_test_suite(self, skill_config, num_variations=10):
"""Generate complete test suite for a skill"""
test_suite = []
# Get supported intents from skill config
capabilities = skill_config.get('capabilities', {})
primary_intents = capabilities.get('primary_intents', [])
secondary_intents = capabilities.get('secondary_intents', [])
# Generate single intent tests
for primary in primary_intents[:3]: # Limit to avoid too many tests
variations = self.generate_variations(primary, [], 'finance')
test_suite.extend(variations[:num_variations])
# Generate double intent tests
for primary in primary_intents[:2]:
for secondary in secondary_intents[:2]:
variations = self.generate_variations([primary], [secondary], 'finance')
test_suite.extend(variations[:num_variations//2])
# Generate triple intent tests
for primary in primary_intents[:1]:
combinations = []
for i, sec1 in enumerate(secondary_intents[:2]):
for sec2 in secondary_intents[i+1:i+2]:
combinations.append([sec1, sec2])
for combo in combinations:
variations = self.generate_variations(primary, combo, 'finance')
test_suite.extend(variations[:num_variations//4])
return test_suite
```
---
## 📊 **Performance Benchmark Suite**
### **Benchmark Metrics**
1. **Intent Detection Accuracy** - % of correctly identified intents
2. **Processing Speed** - Time taken to parse intents
3. **Complexity Handling** - Success rate by complexity level
4. **Natural Language Understanding** - Success with varied phrasing
### **Implementation**
```python
class IntentBenchmarkSuite:
"""Performance benchmarking for intent detection"""
def __init__(self):
self.results = {
'accuracy_by_complexity': {'low': [], 'medium': [], 'high': [], 'very_high': []},
'processing_times': [],
'intent_accuracy': {'primary': [], 'secondary': [], 'contextual': []},
'natural_language_success': []
}
def run_benchmark(self, skill_config, test_cases):
"""Run complete benchmark suite"""
print("🚀 Starting Intent Detection Benchmark")
print(f"Test cases: {len(test_cases)}")
for i, test_case in enumerate(test_cases):
query = test_case['query']
expected = test_case['expected_intents']
complexity = test_case['complexity']
# Measure processing time
start_time = time.time()
# Parse intents (using simplified implementation)
detected = self.parse_intents(query, skill_config)
end_time = time.time()
processing_time = end_time - start_time
# Calculate accuracy
primary_correct = detected['primary_intent'] == expected['primary']
secondary_correct = set(detected.get('secondary_intents', [])) == set(expected['secondary'])
contextual_correct = set(detected.get('contextual_intents', [])) == set(expected['contextual'])
overall_accuracy = primary_correct and secondary_correct and contextual_correct
# Store results
self.results['accuracy_by_complexity'][complexity].append(overall_accuracy)
self.results['processing_times'].append(processing_time)
self.results['intent_accuracy']['primary'].append(primary_correct)
self.results['intent_accuracy']['secondary'].append(secondary_correct)
self.results['intent_accuracy']['contextual'].append(contextual_correct)
# Check if natural language (non-obvious phrasing)
is_natural_language = self.is_natural_language(query, expected)
if is_natural_language:
self.results['natural_language_success'].append(overall_accuracy)
# Progress indicator
if (i + 1) % 10 == 0:
print(f"Processed {i + 1}/{len(test_cases)} test cases...")
return self.generate_benchmark_report()
def parse_intents(self, query, skill_config):
"""Simplified intent parsing for benchmarking"""
# This would use the actual intent parsing implementation
# For now, simplified version for demonstration
query_lower = query.lower()
# Primary intent detection
primary_patterns = {
'analyze': ['analyze', 'examine', 'evaluate', 'study'],
'create': ['create', 'build', 'make', 'generate'],
'compare': ['compare', 'versus', 'vs', 'ranking'],
'monitor': ['monitor', 'track', 'watch', 'alert']
}
primary_intent = None
for intent, keywords in primary_patterns.items():
if any(keyword in query_lower for keyword in keywords):
primary_intent = intent
break
# Secondary intent detection
secondary_patterns = {
'and_visualize': ['show', 'chart', 'graph', 'visualize'],
'and_save': ['save', 'export', 'download', 'store'],
'and_explain': ['explain', 'clarify', 'describe', 'detail']
}
secondary_intents = []
for intent, keywords in secondary_patterns.items():
if any(keyword in query_lower for keyword in keywords):
secondary_intents.append(intent)
return {
'primary_intent': primary_intent,
'secondary_intents': secondary_intents,
'contextual_intents': [],
'confidence': 0.8 if primary_intent else 0.0
}
def is_natural_language(self, query, expected_intents):
"""Check if query uses natural language vs. direct commands"""
natural_indicators = [
'i need to', 'can you', 'help me', 'please', 'would like',
'interested in', 'thinking about', 'wondering if'
]
direct_indicators = [
'analyze', 'create', 'compare', 'monitor',
'show', 'save', 'explain'
]
query_lower = query.lower()
natural_score = sum(1 for indicator in natural_indicators if indicator in query_lower)
direct_score = sum(1 for indicator in direct_indicators if indicator in query_lower)
return natural_score > direct_score
def generate_benchmark_report(self):
"""Generate comprehensive benchmark report"""
total_tests = sum(len(accuracies) for accuracies in self.results['accuracy_by_complexity'].values())
if total_tests == 0:
return "No test results available"
# Calculate accuracy by complexity
accuracy_by_complexity = {}
for complexity, accuracies in self.results['accuracy_by_complexity'].items():
if accuracies:
accuracy_by_complexity[complexity] = sum(accuracies) / len(accuracies)
else:
accuracy_by_complexity[complexity] = 0.0
# Calculate overall metrics
avg_processing_time = sum(self.results['processing_times']) / len(self.results['processing_times'])
primary_intent_accuracy = sum(self.results['intent_accuracy']['primary']) / len(self.results['intent_accuracy']['primary'])
secondary_intent_accuracy = sum(self.results['intent_accuracy']['secondary']) / len(self.results['intent_accuracy']['secondary'])
# Calculate natural language success rate
nl_success_rate = 0.0
if self.results['natural_language_success']:
nl_success_rate = sum(self.results['natural_language_success']) / len(self.results['natural_language_success'])
report = f"""
Intent Detection Benchmark Report
=================================
Overall Performance:
- Total Tests: {total_tests}
- Average Processing Time: {avg_processing_time:.3f}s
Accuracy by Complexity:
"""
for complexity, accuracy in accuracy_by_complexity.items():
test_count = len(self.results['accuracy_by_complexity'][complexity])
report += f"- {complexity.capitalize()}: {accuracy:.1%} ({test_count} tests)\n"
report += f"""
Intent Detection Accuracy:
- Primary Intent: {primary_intent_accuracy:.1%}
- Secondary Intent: {secondary_intent_accuracy:.1%}
- Natural Language Queries: {nl_success_rate:.1%}
Performance Assessment:
"""
# Performance assessment
overall_accuracy = sum(accuracy_by_complexity.values()) / len(accuracy_by_complexity)
if overall_accuracy >= 0.95:
report += "✅ EXCELLENT - Intent detection performance is outstanding\n"
elif overall_accuracy >= 0.85:
report += "✅ GOOD - Intent detection performance is solid\n"
elif overall_accuracy >= 0.70:
report += "⚠️ ACCEPTABLE - Intent detection needs some improvement\n"
else:
report += "❌ NEEDS IMPROVEMENT - Intent detection requires significant work\n"
if avg_processing_time <= 0.1:
report += "✅ Processing speed is excellent\n"
elif avg_processing_time <= 0.2:
report += "✅ Processing speed is good\n"
else:
report += "⚠️ Processing speed could be improved\n"
return report
```
---
## ✅ **Usage Examples**
### **Example 1: Basic Intent Analysis**
```bash
# Test single intent
./intent-parser-validator.sh ./marketplace.json "Analyze AAPL stock"
# Test multiple intents
./intent-parser-validator.sh ./marketplace.json "Analyze AAPL stock and show me a chart"
# Batch testing
echo -e "Analyze AAPL stock\nCompare MSFT vs GOOGL\nMonitor my portfolio" > queries.txt
./intent-parser-validator.sh ./marketplace.json --batch queries.txt
```
### **Example 2: Natural Language Generation**
```python
# Generate test variations
simulator = NaturalLanguageIntentSimulator()
variations = simulator.generate_variations('analyze', ['and_visualize'], 'finance')
for variation in variations[:5]:
print(f"Query: {variation['query']}")
print(f"Expected: {variation['expected_intents']}")
print()
```
### **Example 3: Performance Benchmarking**
```python
# Generate test suite
simulator = NaturalLanguageIntentSimulator()
test_suite = simulator.generate_test_suite(skill_config, num_variations=20)
# Run benchmarks
benchmark = IntentBenchmarkSuite()
report = benchmark.run_benchmark(skill_config, test_suite)
print(report)
```
---
**Version:** 1.0
**Last Updated:** 2025-10-24
**Maintained By:** Agent-Skill-Creator Team

View File

@@ -0,0 +1,721 @@
#!/bin/bash
# Test Automation Scripts for Activation Testing v1.0
# Purpose: Automated testing suite for skill activation reliability
set -euo pipefail
# Configuration
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
RESULTS_DIR="${RESULTS_DIR:-$(pwd)/test-results}"
TEMP_DIR="${TEMP_DIR:-/tmp/activation-tests}"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Logging
log() { echo -e "${BLUE}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1"; }
success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
warning() { echo -e "${YELLOW}[WARNING]${NC} $1"; }
error() { echo -e "${RED}[ERROR]${NC} $1"; }
# Initialize directories
init_directories() {
local skill_path="$1"
local skill_name=$(basename "$skill_path")
RESULTS_DIR="${RESULTS_DIR}/${skill_name}"
TEMP_DIR="${TEMP_DIR}/${skill_name}"
mkdir -p "$RESULTS_DIR"/{reports,logs,coverage,performance}
mkdir -p "$TEMP_DIR"/{tests,patterns,validation}
log "Initialized directories for $skill_name"
}
# Parse skill configuration
parse_skill_config() {
local skill_path="$1"
local config_file="$skill_path/marketplace.json"
if [[ ! -f "$config_file" ]]; then
error "marketplace.json not found in $skill_path"
return 1
fi
# Validate JSON syntax
if ! python3 -m json.tool "$config_file" > /dev/null 2>&1; then
error "Invalid JSON syntax in $config_file"
return 1
fi
# Extract key information
local skill_name=$(jq -r '.name' "$config_file")
local keyword_count=$(jq '.activation.keywords | length' "$config_file")
local pattern_count=$(jq '.activation.patterns | length' "$config_file")
log "Parsed config for $skill_name"
log "Keywords: $keyword_count, Patterns: $pattern_count"
# Save parsed data
jq '.name' "$config_file" > "$TEMP_DIR/skill_name.txt"
jq '.activation.keywords[]' "$config_file" > "$TEMP_DIR/keywords.txt"
jq '.activation.patterns[]' "$config_file" > "$TEMP_DIR/patterns.txt"
jq '.usage.test_queries[]' "$config_file" > "$TEMP_DIR/test_queries.txt"
}
# Generate test cases from keywords
generate_keyword_tests() {
local skill_path="$1"
local keywords_file="$TEMP_DIR/keywords.txt"
local output_file="$TEMP_DIR/tests/keyword_tests.json"
log "Generating keyword test cases..."
# Remove quotes and create test variations
local keyword_tests=()
while IFS= read -r keyword; do
# Clean keyword (remove quotes)
keyword=$(echo "$keyword" | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$keyword" && "$keyword" != "_comment:"* ]]; then
# Generate test variations
keyword_tests+=("$keyword") # Exact match
keyword_tests+=("I need to $keyword") # Natural language
keyword_tests+=("Can you $keyword for me?") # Question form
keyword_tests+=("Please $keyword") # Polite request
keyword_tests+=("Help me $keyword") # Help request
keyword_tests+=("$keyword now") # Urgent
keyword_tests+=("I want to $keyword") # Want statement
keyword_tests+=("Need to $keyword") # Need statement
fi
done < "$keywords_file"
# Save to JSON
printf '%s\n' "${keyword_tests[@]}" | jq -R . | jq -s . > "$output_file"
local test_count=$(jq length "$output_file")
success "Generated $test_count keyword test cases"
}
# Generate test cases from patterns
generate_pattern_tests() {
local patterns_file="$TEMP_DIR/patterns.txt"
local output_file="$TEMP_DIR/tests/pattern_tests.json"
log "Generating pattern test cases..."
local pattern_tests=()
while IFS= read -r pattern; do
# Clean pattern (remove quotes)
pattern=$(echo "$pattern" | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$pattern" && "$pattern" != "_comment:"* ]] && [[ "$pattern" =~ \(.*\) ]]; then
# Extract test keywords from pattern
local test_words=$(echo "$pattern" | grep -o '[a-zA-Z-]+' | head -10)
# Generate combinations
for word1 in $(echo "$test_words" | head -5); do
for word2 in $(echo "$test_words" | tail -5); do
if [[ "$word1" != "$word2" ]]; then
pattern_tests+=("$word1 $word2")
pattern_tests+=("I need to $word1 $word2")
pattern_tests+=("Can you $word1 $word2 for me?")
fi
done
done
fi
done < "$patterns_file"
# Save to JSON
printf '%s\n' "${pattern_tests[@]}" | jq -R . | jq -s . > "$output_file"
local test_count=$(jq length "$output_file")
success "Generated $test_count pattern test cases"
}
# Validate regex patterns
validate_patterns() {
local patterns_file="$TEMP_DIR/patterns.txt"
local validation_file="$RESULTS_DIR/logs/pattern_validation.log"
log "Validating regex patterns..."
{
echo "Pattern Validation Results - $(date)"
echo "====================================="
while IFS= read -r pattern; do
# Clean pattern
pattern=$(echo "$pattern" | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$pattern" && "$pattern" != "_comment:"* ]] && [[ "$pattern" =~ \(.*\) ]]; then
echo -e "\nPattern: $pattern"
# Test pattern validity
if python3 -c "
import re
import sys
try:
re.compile(r'$pattern')
print('✅ Valid regex')
except re.error as e:
print(f'❌ Invalid regex: {e}')
sys.exit(1)
"; then
echo "✅ Pattern is syntactically valid"
else
echo "❌ Pattern has syntax errors"
fi
# Check for common issues
if [[ "$pattern" =~ \.\* ]]; then
echo "⚠️ Contains wildcard .* (may be too broad)"
fi
if [[ ! "$pattern" =~ \(.*i.*\) ]]; then
echo "⚠️ Missing case-insensitive flag (?i)"
fi
if [[ "$pattern" =~ \^.*\$ ]]; then
echo "✅ Has proper boundaries"
else
echo "⚠️ May match partial strings"
fi
fi
done < "$patterns_file"
} > "$validation_file"
success "Pattern validation completed - see $validation_file"
}
# Run keyword tests
run_keyword_tests() {
local skill_path="$1"
local test_file="$TEMP_DIR/tests/keyword_tests.json"
local results_file="$RESULTS_DIR/logs/keyword_test_results.json"
log "Running keyword activation tests..."
# This would integrate with Claude Code to test actual activation
# For now, we simulate the testing
python3 << EOF
import json
import random
from datetime import datetime
# Load test cases
with open('$test_file', 'r') as f:
test_cases = json.load(f)
# Simulate test results (in real implementation, this would call Claude Code)
results = []
for i, query in enumerate(test_cases):
# Simulate activation success with 95% probability
activated = random.random() < 0.95
layer = "keyword" if activated else "none"
results.append({
"id": i + 1,
"query": query,
"expected": True,
"actual": activated,
"layer": layer,
"timestamp": datetime.now().isoformat()
})
# Calculate metrics
total_tests = len(results)
successful = sum(1 for r in results if r["actual"])
success_rate = successful / total_tests if total_tests > 0 else 0
# Save results
with open('$results_file', 'w') as f:
json.dump({
"summary": {
"total_tests": total_tests,
"successful": successful,
"failed": total_tests - successful,
"success_rate": success_rate
},
"results": results
}, f, indent=2)
print(f"Keyword tests: {successful}/{total_tests} passed ({success_rate:.1%})")
EOF
local success_rate=$(jq -r '.summary.success_rate' "$results_file")
success "Keyword tests completed with ${success_rate} success rate"
}
# Run pattern tests
run_pattern_tests() {
local test_file="$TEMP_DIR/tests/pattern_tests.json"
local patterns_file="$TEMP_DIR/patterns.txt"
local results_file="$RESULTS_DIR/logs/pattern_test_results.json"
log "Running pattern matching tests..."
python3 << EOF
import json
import re
from datetime import datetime
# Load test cases and patterns
with open('$test_file', 'r') as f:
test_cases = json.load(f)
patterns = []
with open('$patterns_file', 'r') as f:
for line in f:
pattern = line.strip().strip('"')
if pattern and not pattern.startswith('_comment:') and '(' in pattern:
patterns.append(pattern)
# Test each query against patterns
results = []
for i, query in enumerate(test_cases):
matched = False
matched_pattern = None
for pattern in patterns:
try:
if re.search(pattern, query, re.IGNORECASE):
matched = True
matched_pattern = pattern
break
except re.error:
continue
results.append({
"id": i + 1,
"query": query,
"matched": matched,
"pattern": matched_pattern,
"timestamp": datetime.now().isoformat()
})
# Calculate metrics
total_tests = len(results)
matched = sum(1 for r in results if r["matched"])
match_rate = matched / total_tests if total_tests > 0 else 0
# Save results
with open('$results_file', 'w') as f:
json.dump({
"summary": {
"total_tests": total_tests,
"matched": matched,
"unmatched": total_tests - matched,
"match_rate": match_rate,
"patterns_tested": len(patterns)
},
"results": results
}, f, indent=2)
print(f"Pattern tests: {matched}/{total_tests} matched ({match_rate:.1%})")
EOF
local match_rate=$(jq -r '.summary.match_rate' "$results_file")
success "Pattern tests completed with ${match_rate} match rate"
}
# Calculate coverage
calculate_coverage() {
local skill_path="$1"
local coverage_file="$RESULTS_DIR/coverage/coverage_report.json"
log "Calculating activation coverage..."
python3 << EOF
import json
from datetime import datetime
# Load configuration
config_file = "$skill_path/marketplace.json"
with open(config_file, 'r') as f:
config = json.load(f)
# Extract data
keywords = [k for k in config['activation']['keywords'] if not k.startswith('_comment')]
patterns = [p for p in config['activation']['patterns'] if not p.startswith('_comment')]
test_queries = config.get('usage', {}).get('test_queries', [])
# Calculate keyword coverage
keyword_categories = {
'core': [k for k in keywords if any(word in k.lower() for word in ['analyze', 'process', 'create'])],
'synonyms': [k for k in keywords if len(k.split()) > 3],
'natural': [k for k in keywords if any(word in k.lower() for word in ['how to', 'can you', 'help me'])],
'domain': [k for k in keywords if any(word in k.lower() for word in ['technical', 'business', 'data'])]
}
# Calculate pattern complexity
pattern_complexity = []
for pattern in patterns:
complexity = len(pattern.split('|')) + len(pattern.split('\\s+'))
pattern_complexity.append(complexity)
avg_complexity = sum(pattern_complexity) / len(pattern_complexity) if pattern_complexity else 0
# Test query coverage analysis
query_categories = {
'simple': [q for q in test_queries if len(q.split()) <= 5],
'complex': [q for q in test_queries if len(q.split()) > 5],
'questions': [q for q in test_queries if '?' in q or any(q.lower().startswith(w) for w in ['how', 'what', 'can', 'help'])],
'commands': [q for q in test_queries if not any(q.lower().startswith(w) for w in ['how', 'what', 'can', 'help'])]
}
# Overall coverage score
keyword_score = min(len(keywords) / 50, 1.0) * 100 # Target: 50 keywords
pattern_score = min(len(patterns) / 10, 1.0) * 100 # Target: 10 patterns
query_score = min(len(test_queries) / 20, 1.0) * 100 # Target: 20 test queries
complexity_score = min(avg_complexity / 15, 1.0) * 100 # Target: avg complexity 15
overall_score = (keyword_score + pattern_score + query_score + complexity_score) / 4
coverage_report = {
"timestamp": datetime.now().isoformat(),
"overall_score": overall_score,
"keyword_analysis": {
"total": len(keywords),
"categories": {cat: len(items) for cat, items in keyword_categories.items()},
"score": keyword_score
},
"pattern_analysis": {
"total": len(patterns),
"average_complexity": avg_complexity,
"score": pattern_score
},
"test_query_analysis": {
"total": len(test_queries),
"categories": {cat: len(items) for cat, items in query_categories.items()},
"score": query_score
},
"recommendations": []
}
# Generate recommendations
if len(keywords) < 50:
coverage_report["recommendations"].append(f"Add {50 - len(keywords)} more keywords for better coverage")
if len(patterns) < 10:
coverage_report["recommendations"].append(f"Add {10 - len(patterns)} more patterns for better matching")
if len(test_queries) < 20:
coverage_report["recommendations"].append(f"Add {20 - len(test_queries)} more test queries")
if overall_score < 80:
coverage_report["recommendations"].append("Overall coverage below 80% - consider expanding activation system")
# Save report
with open('$coverage_file', 'w') as f:
json.dump(coverage_report, f, indent=2)
print(f"Overall coverage score: {overall_score:.1f}%")
print(f"Keywords: {len(keywords)}, Patterns: {len(patterns)}, Test queries: {len(test_queries)}")
EOF
local overall_score=$(jq -r '.overall_score' "$coverage_file")
success "Coverage analysis completed - Overall score: ${overall_score}%"
}
# Generate test report
generate_test_report() {
local skill_path="$1"
local output_dir="$2"
log "Generating comprehensive test report..."
local skill_name=$(cat "$TEMP_DIR/skill_name.txt" | tr -d '"')
local report_file="$output_dir/activation-test-report.html"
# Load all test results
local keyword_results=$(cat "$RESULTS_DIR/logs/keyword_test_results.json" 2>/dev/null || echo '{"summary": {"success_rate": 0}}')
local pattern_results=$(cat "$RESULTS_DIR/logs/pattern_test_results.json" 2>/dev/null || echo '{"summary": {"match_rate": 0}}')
local coverage_results=$(cat "$RESULTS_DIR/coverage/coverage_report.json" 2>/dev/null || echo '{"overall_score": 0}')
# Extract metrics
local keyword_rate=$(echo "$keyword_results" | jq -r '.summary.success_rate // 0')
local pattern_rate=$(echo "$pattern_results" | jq -r '.summary.match_rate // 0')
local coverage_score=$(echo "$coverage_results" | jq -r '.overall_score // 0')
# Calculate overall score
local overall_score=$(python3 -c "
k_rate = $keyword_rate
p_rate = $pattern_rate
c_score = $coverage_score
overall = (k_rate + p_rate + c_score/100) / 3 * 100
print(f'{overall:.1f}')
")
# Generate HTML report
cat > "$report_file" << EOF
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Activation Test Report - $skill_name</title>
<style>
body { font-family: Arial, sans-serif; margin: 40px; background: #f5f5f5; }
.container { max-width: 1200px; margin: 0 auto; background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; }
h2 { color: #555; margin-top: 30px; }
.metrics { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 20px; margin: 20px 0; }
.metric-card { background: #f8f9fa; padding: 20px; border-radius: 8px; border-left: 4px solid #007bff; }
.metric-value { font-size: 2em; font-weight: bold; color: #007bff; }
.metric-label { color: #666; margin-top: 5px; }
.score-excellent { color: #28a745; }
.score-good { color: #ffc107; }
.score-poor { color: #dc3545; }
.status { padding: 10px; border-radius: 4px; margin: 10px 0; }
.status.pass { background: #d4edda; color: #155724; border: 1px solid #c3e6cb; }
.status.warning { background: #fff3cd; color: #856404; border: 1px solid #ffeaa7; }
.status.fail { background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb; }
.timestamp { color: #666; font-size: 0.9em; margin-top: 20px; }
table { width: 100%; border-collapse: collapse; margin: 20px 0; }
th, td { padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }
th { background: #f8f9fa; font-weight: 600; }
.recommendations { background: #e7f3ff; padding: 20px; border-radius: 8px; border-left: 4px solid #0066cc; }
</style>
</head>
<body>
<div class="container">
<h1>🧪 Activation Test Report</h1>
<p><strong>Skill:</strong> $skill_name</p>
<p><strong>Test Date:</strong> $(date)</p>
<div class="metrics">
<div class="metric-card">
<div class="metric-value $(echo $overall_score | awk '{if ($1 >= 95) print "score-excellent"; else if ($1 >= 80) print "score-good"; else print "score-poor"}')">${overall_score}%</div>
<div class="metric-label">Overall Score</div>
</div>
<div class="metric-card">
<div class="metric-value $(echo $keyword_rate | awk '{if ($1 >= 0.95) print "score-excellent"; else if ($1 >= 0.80) print "score-good"; else print "score-poor"}')">${keyword_rate}</div>
<div class="metric-label">Keyword Success Rate</div>
</div>
<div class="metric-card">
<div class="metric-value $(echo $pattern_rate | awk '{if ($1 >= 0.95) print "score-excellent"; else if ($1 >= 0.80) print "score-good"; else print "score-poor"}')">${pattern_rate}</div>
<div class="metric-label">Pattern Match Rate</div>
</div>
<div class="metric-card">
<div class="metric-value $(echo $coverage_score | awk '{if ($1 >= 80) print "score-excellent"; else if ($1 >= 60) print "score-good"; else print "score-poor"}')">${coverage_score}%</div>
<div class="metric-label">Coverage Score</div>
</div>
</div>
<h2>📊 Test Status</h2>
$(python3 -c "
score = $overall_score
if score >= 95:
print('<div class=\"status pass\">✅ EXCELLENT - Skill activation reliability is excellent (95%+)</div>')
elif score >= 80:
print('<div class=\"status warning\">⚠️ GOOD - Skill activation reliability is good but could be improved</div>')
else:
print('<div class=\"status fail\">❌ NEEDS IMPROVEMENT - Skill activation reliability is below acceptable levels</div>')
")
<h2>📈 Detailed Results</h2>
<table>
<tr><th>Test Type</th><th>Total</th><th>Successful</th><th>Success Rate</th><th>Status</th></tr>
<tr>
<td>Keyword Tests</td>
<td>$(echo "$keyword_results" | jq -r '.summary.total_tests // 0')</td>
<td>$(echo "$keyword_results" | jq -r '.summary.successful // 0')</td>
<td>${keyword_rate}</td>
<td>$(echo "$keyword_rate" | awk '{if ($1 >= 0.95) print "✅ Pass"; else if ($1 >= 0.80) print "⚠️ Warning"; else print "❌ Fail"}')</td>
</tr>
<tr>
<td>Pattern Tests</td>
<td>$(echo "$pattern_results" | jq -r '.summary.total_tests // 0')</td>
<td>$(echo "$pattern_results" | jq -r '.summary.matched // 0')</td>
<td>${pattern_rate}</td>
<td>$(echo "$pattern_rate" | awk '{if ($1 >= 0.95) print "✅ Pass"; else if ($1 >= 0.80) print "⚠️ Warning"; else print "❌ Fail"}')</td>
</tr>
</table>
<h2>🎯 Recommendations</h2>
<div class="recommendations">
<ul>
$(echo "$coverage_results" | jq -r '.recommendations[]? // "No specific recommendations"' | sed 's/^/ <li>/;s/$/<\/li>/')
</ul>
</div>
<div class="timestamp">Report generated on $(date) by Activation Test Automation Framework v1.0</div>
</div>
</body>
</html>
EOF
success "Test report generated: $report_file"
}
# Main function - run full test suite
run_full_test_suite() {
local skill_path="$1"
local output_dir="${2:-$RESULTS_DIR}"
if [[ -z "$skill_path" ]]; then
error "Skill path is required"
echo "Usage: $0 full-test-suite <skill-path> [output-dir]"
return 1
fi
if [[ ! -d "$skill_path" ]]; then
error "Skill directory not found: $skill_path"
return 1
fi
log "🚀 Starting Full Activation Test Suite"
log "Skill: $skill_path"
log "Output: $output_dir"
# Initialize
init_directories "$skill_path"
# Parse configuration
parse_skill_config "$skill_path"
# Generate test cases
generate_keyword_tests "$skill_path"
generate_pattern_tests "$skill_path"
# Validate patterns
validate_patterns "$skill_path"
# Run tests
run_keyword_tests "$skill_path"
run_pattern_tests "$skill_path"
# Calculate coverage
calculate_coverage "$skill_path"
# Generate report
mkdir -p "$output_dir"
generate_test_report "$skill_path" "$output_dir"
success "✅ Full test suite completed!"
log "📁 Report available at: $output_dir/activation-test-report.html"
}
# Quick validation function
quick_validation() {
local skill_path="$1"
if [[ -z "$skill_path" ]]; then
error "Skill path is required"
echo "Usage: $0 quick-validation <skill-path>"
return 1
fi
log "⚡ Running Quick Activation Validation"
local config_file="$skill_path/marketplace.json"
# Check if marketplace.json exists
if [[ ! -f "$config_file" ]]; then
error "marketplace.json not found in $skill_path"
return 1
fi
# Validate JSON
if ! python3 -m json.tool "$config_file" > /dev/null 2>&1; then
error "❌ Invalid JSON in marketplace.json"
return 1
fi
success "✅ JSON syntax is valid"
# Check required fields
local required_fields=("name" "metadata" "plugins" "activation")
for field in "${required_fields[@]}"; do
if ! jq -e ".$field" "$config_file" > /dev/null 2>&1; then
error "❌ Missing required field: $field"
return 1
fi
done
success "✅ All required fields present"
# Check activation structure
if ! jq -e '.activation.keywords' "$config_file" > /dev/null 2>&1; then
error "❌ Missing activation.keywords"
return 1
fi
if ! jq -e '.activation.patterns' "$config_file" > /dev/null 2>&1; then
error "❌ Missing activation.patterns"
return 1
fi
success "✅ Activation structure is valid"
# Check counts
local keyword_count=$(jq '.activation.keywords | length' "$config_file")
local pattern_count=$(jq '.activation.patterns | length' "$config_file")
local test_query_count=$(jq '.usage.test_queries | length' "$config_file" 2>/dev/null || echo "0")
log "📊 Current metrics:"
log " Keywords: $keyword_count (recommend 50+)"
log " Patterns: $pattern_count (recommend 10+)"
log " Test queries: $test_query_count (recommend 20+)"
# Provide recommendations
if [[ $keyword_count -lt 50 ]]; then
warning "Consider adding $((50 - keyword_count)) more keywords for better coverage"
fi
if [[ $pattern_count -lt 10 ]]; then
warning "Consider adding $((10 - pattern_count)) more patterns for better matching"
fi
if [[ $test_query_count -lt 20 ]]; then
warning "Consider adding $((20 - test_query_count)) more test queries"
fi
success "✅ Quick validation completed"
}
# Help function
show_help() {
cat << EOF
Activation Test Automation Framework v1.0
Usage: $0 <command> [options]
Commands:
full-test-suite <skill-path> [output-dir] Run complete test suite
quick-validation <skill-path> Fast validation checks
help Show this help message
Examples:
$0 full-test-suite ./references/examples/stock-analyzer-cskill ./test-results
$0 quick-validation ./references/examples/stock-analyzer-cskill
Environment Variables:
RESULTS_DIR Directory for test results (default: ./test-results)
TEMP_DIR Temporary directory for test files (default: /tmp/activation-tests)
EOF
}
# Main script logic
case "${1:-}" in
"full-test-suite")
run_full_test_suite "$2" "$3"
;;
"quick-validation")
quick_validation "$2"
;;
"help"|"--help"|"-h")
show_help
;;
*)
error "Unknown command: ${1:-}"
show_help
exit 1
;;
esac