Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:27:28 +08:00
commit 8db9c44dd8
79 changed files with 37715 additions and 0 deletions

View File

@@ -0,0 +1,613 @@
# Activation Testing Guide
**Version:** 1.0
**Purpose:** Comprehensive guide for testing skill activation reliability
---
## Overview
This guide provides procedures, templates, and checklists for testing the 3-Layer Activation System to ensure skills activate correctly and reliably.
### Testing Philosophy
**Goal:** 95%+ activation reliability
**Approach:** Test each layer independently, then integration
**Metrics:**
- **True Positives:** Valid queries that correctly activate
- **True Negatives:** Invalid queries that correctly don't activate
- **False Positives:** Invalid queries that incorrectly activate
- **False Negatives:** Valid queries that fail to activate
**Target:** Zero false positives, <5% false negatives
---
## 🧪 Testing Methodology
### Phase 1: Layer 1 Testing (Keywords)
#### Objective
Verify that exact keyword phrases activate the skill.
#### Procedure
**Step 1:** List all keywords from marketplace.json
**Step 2:** Create test query for each keyword
**Step 3:** Test each query manually
**Step 4:** Document results
#### Template
```markdown
## Layer 1: Keywords Testing
**Keyword 1:** "create an agent for"
Test Queries:
1. "create an agent for processing invoices"
- ✅ Activated
- Via: Keyword match
2. "I want to create an agent for data analysis"
- ✅ Activated
- Via: Keyword match
3. "Create An Agent For automation" // Case variation
- ✅ Activated
- Via: Keyword match (case-insensitive)
**Keyword 2:** "automate workflow"
...
```
#### Pass Criteria
- [ ] 100% of keyword test queries activate
- [ ] Case-insensitive matching works
- [ ] Embedded keywords activate (keyword within longer query)
---
### Phase 2: Layer 2 Testing (Patterns)
#### Objective
Verify that regex patterns capture expected variations.
#### Procedure
**Step 1:** List all patterns from marketplace.json
**Step 2:** Create 5+ test queries per pattern
**Step 3:** Test pattern matching (can use regex tester)
**Step 4:** Test in Claude Code
**Step 5:** Document results
#### Template
```markdown
## Layer 2: Patterns Testing
**Pattern 1:** `(?i)(create|build)\s+(an?\s+)?agent\s+for`
Designed to Match:
- Verbs: create, build
- Optional article: a, an
- Entity: agent
- Connector: for
Test Queries:
1. "create an agent for automation"
- ✅ Matches pattern
- ✅ Activated in Claude Code
2. "build a agent for processing"
- ✅ Matches pattern
- ✅ Activated
3. "create agent for data" // No article
- ✅ Matches pattern
- ✅ Activated
4. "Build Agent For Tasks" // Different case
- ✅ Matches pattern
- ✅ Activated
5. "I want to create an agent for reporting" // Embedded
- ✅ Matches pattern
- ✅ Activated
Should NOT Match:
6. "agent creation guide"
- ❌ No action verb
- ❌ Correctly did not activate
7. "create something for automation"
- ❌ No "agent" keyword
- ❌ Correctly did not activate
```
#### Pass Criteria
- [ ] 100% of positive test queries match pattern
- [ ] 100% of positive queries activate in Claude Code
- [ ] 0% of negative queries match pattern
- [ ] Pattern is flexible (captures variations)
- [ ] Pattern is specific (no false positives)
---
### Phase 3: Layer 3 Testing (Description + NLU)
#### Objective
Verify that description helps Claude understand intent for edge cases.
#### Procedure
**Step 1:** Create queries that DON'T match keywords/patterns
**Step 2:** Verify these still activate via description understanding
**Step 3:** Document which queries activate
#### Template
```markdown
## Layer 3: Description + NLU Testing
**Queries that don't match Keywords or Patterns:**
1. "I keep doing this task manually, can you help automate it?"
- ❌ No keyword match
- ❌ No pattern match
- ✅ Should activate via description understanding
- Result: {activated/did not activate}
2. "This process is repetitive and takes hours daily"
- ❌ No keyword match
- ❌ No pattern match
- ✅ Should activate (describes repetitive workflow)
- Result: {activated/did not activate}
3. "Help me build something to handle this workflow"
- ❌ No exact keyword
- ⚠️ Might match pattern
- ✅ Should activate
- Result: {activated/did not activate}
```
#### Pass Criteria
- [ ] Edge case queries activate when appropriate
- [ ] Natural language variations work
- [ ] Description provides fallback coverage
---
### Phase 4: Integration Testing
#### Objective
Test complete system with real-world query variations.
#### Procedure
**Step 1:** Create 10+ realistic query variations per capability
**Step 2:** Test all queries in actual Claude Code environment
**Step 3:** Track activation success rate
**Step 4:** Identify gaps
#### Template
```markdown
## Integration Testing
**Capability:** Agent Creation
**Test Queries:**
| # | Query | Expected | Actual | Layer | Status |
|---|-------|----------|--------|-------|--------|
| 1 | "create an agent for PDFs" | Activate | Activated | Keyword | ✅ |
| 2 | "build automation for emails" | Activate | Activated | Pattern | ✅ |
| 3 | "daily I process invoices manually" | Activate | Activated | Desc | ✅ |
| 4 | "make agent for data entry" | Activate | Activated | Pattern | ✅ |
| 5 | "automate my workflow for reports" | Activate | Activated | Keyword | ✅ |
| 6 | "I need help with automation" | Activate | NOT activated | - | ❌ |
| 7 | "turn this into automated process" | Activate | Activated | Pattern | ✅ |
| 8 | "create skill for stock analysis" | Activate | Activated | Keyword | ✅ |
| 9 | "repeatedly doing this task" | Activate | Activated | Desc | ✅ |
| 10 | "can you help automate this?" | Activate | Activated | Desc | ✅ |
**Results:**
- Total queries: 10
- Activated correctly: 9
- Failed to activate: 1 (Query #6)
- Success rate: 90%
**Issues:**
- Query #6 too generic, needs more specific keywords
```
#### Pass Criteria
- [ ] 95%+ success rate
- [ ] All capability variations covered
- [ ] Realistic query phrasings tested
- [ ] Edge cases documented
---
### Phase 5: Negative Testing (False Positives)
#### Objective
Ensure skill does NOT activate for out-of-scope queries.
#### Procedure
**Step 1:** List out-of-scope use cases (when_not_to_use)
**Step 2:** Create queries for each
**Step 3:** Verify skill does NOT activate
**Step 4:** Document any false positives
#### Template
```markdown
## Negative Testing
**Out of Scope:** General programming questions
Test Queries (Should NOT Activate):
1. "How do I write a for loop in Python?"
- Result: Did not activate ✅
2. "What's the difference between list and tuple?"
- Result: Did not activate ✅
3. "Help me debug this code"
- Result: Did not activate ✅
**Out of Scope:** Using existing skills
Test Queries (Should NOT Activate):
4. "Run the invoice processor skill"
- Result: Did not activate ✅
5. "Show me existing agents"
- Result: Did not activate ✅
**Results:**
- Total negative queries: 5
- Correctly did not activate: 5
- False positives: 0
- Success rate: 100%
```
#### Pass Criteria
- [ ] 100% of out-of-scope queries do NOT activate
- [ ] Zero false positives
- [ ] when_not_to_use cases covered
---
## 📋 Complete Testing Checklist
### Pre-Testing Setup
- [ ] marketplace.json has activation section
- [ ] Keywords defined (10-15)
- [ ] Patterns defined (5-7)
- [ ] Description includes keywords
- [ ] when_to_use / when_not_to_use defined
- [ ] test_queries array populated
### Layer 1: Keywords
- [ ] All keywords tested individually
- [ ] Case-insensitive matching verified
- [ ] Embedded keywords work
- [ ] 100% activation rate
### Layer 2: Patterns
- [ ] Each pattern tested with 5+ queries
- [ ] Pattern matches verified (regex tester)
- [ ] Claude Code activation verified
- [ ] No false positives
- [ ] Flexible enough for variations
### Layer 3: Description
- [ ] Edge cases tested
- [ ] Natural language variations work
- [ ] Fallback coverage confirmed
### Integration
- [ ] 10+ realistic queries per capability tested
- [ ] 95%+ success rate achieved
- [ ] All capabilities covered
- [ ] Results documented
### Negative Testing
- [ ] Out-of-scope queries tested
- [ ] Zero false positives
- [ ] when_not_to_use cases verified
### Documentation
- [ ] Test results documented
- [ ] Issues logged
- [ ] Recommendations made
- [ ] marketplace.json updated if needed
---
## 📊 Test Report Template
```markdown
# Activation Test Report
**Skill Name:** {skill-name}
**Version:** {version}
**Test Date:** {date}
**Tested By:** {name}
**Environment:** Claude Code {version}
---
## Executive Summary
- **Overall Success Rate:** {X}%
- **Total Queries Tested:** {N}
- **True Positives:** {N}
- **True Negatives:** {N}
- **False Positives:** {N}
- **False Negatives:** {N}
---
## Layer 1: Keywords Testing
**Keywords Tested:** {count}
**Success Rate:** {X}%
### Results
| Keyword | Test Queries | Passed | Failed |
|---------|--------------|--------|--------|
| {keyword-1} | {N} | {N} | {N} |
| {keyword-2} | {N} | {N} | {N} |
**Issues:**
- {issue-1}
- {issue-2}
---
## Layer 2: Patterns Testing
**Patterns Tested:** {count}
**Success Rate:** {X}%
### Results
| Pattern | Test Queries | Passed | Failed |
|---------|--------------|--------|--------|
| {pattern-1} | {N} | {N} | {N} |
| {pattern-2} | {N} | {N} | {N} |
**Issues:**
- {issue-1}
- {issue-2}
---
## Layer 3: Description Testing
**Edge Cases Tested:** {count}
**Success Rate:** {X}%
**Results:**
- Activated via description: {N}
- Failed to activate: {N}
---
## Integration Testing
**Total Test Queries:** {count}
**Success Rate:** {X}%
**Breakdown by Capability:**
| Capability | Queries | Success | Rate |
|------------|---------|---------|------|
| {cap-1} | {N} | {N} | {X}% |
| {cap-2} | {N} | {N} | {X}% |
---
## Negative Testing
**Out-of-Scope Queries:** {count}
**False Positives:** {N}
**Success Rate:** {X}%
---
## Issues & Recommendations
### Critical Issues
1. {issue-description}
- Impact: {high/medium/low}
- Recommendation: {action}
### Minor Issues
1. {issue-description}
- Impact: {low}
- Recommendation: {action}
### Recommendations
1. {recommendation-1}
2. {recommendation-2}
---
## Conclusion
{Summary of test results and next steps}
**Status:** {PASS / NEEDS WORK / FAIL}
---
**Appendix A:** Full Test Query List
**Appendix B:** Failed Query Analysis
**Appendix C:** Updated marketplace.json (if changes needed)
```
---
## 🔄 Iterative Testing Process
### Step 1: Initial Test
- Run complete test suite
- Document results
- Identify failures
### Step 2: Analysis
- Analyze failed queries
- Determine root cause
- Plan fixes
### Step 3: Fix
- Update keywords/patterns/description
- Document changes
### Step 4: Retest
- Test only failed queries
- Verify fixes work
- Ensure no regressions
### Step 5: Full Regression Test
- Run complete test suite again
- Verify 95%+ success rate
- Document final results
---
## 🎯 Sample Test Suite
### Example: Agent Creation Skill
```markdown
## Test Suite: Agent Creation Skill
### Layer 1 Tests (Keywords)
**Keyword:** "create an agent for"
- ✅ "create an agent for processing PDFs"
- ✅ "I want to create an agent for automation"
- ✅ "Create An Agent For daily tasks"
**Keyword:** "automate workflow"
- ✅ "automate workflow for invoices"
- ✅ "need to automate workflow"
- ✅ "Automate Workflow handling"
[... more keywords]
### Layer 2 Tests (Patterns)
**Pattern:** `(?i)(create|build)\s+(an?\s+)?agent`
- ✅ "create an agent for X"
- ✅ "build a agent for Y"
- ✅ "create agent for Z"
- ✅ "Build Agent for tasks"
- ❌ "agent creation guide" (should not match)
[... more patterns]
### Integration Tests
**Capability:** Agent Creation
1. ✅ "create an agent for processing CSVs"
2. ✅ "build automation for email handling"
3. ✅ "automate this workflow: download, process, upload"
4. ✅ "every day I have to categorize files manually"
5. ✅ "turn this process into an automated agent"
6. ✅ "I need a skill for data extraction"
7. ✅ "daily workflow automation needed"
8. ✅ "repeatedly doing manual data entry"
9. ✅ "develop an agent to monitor APIs"
10. ✅ "make something to handle invoices automatically"
**Success Rate:** 10/10 = 100%
### Negative Tests
**Should NOT Activate:**
1. ✅ "How do I use an existing agent?" (did not activate)
2. ✅ "Explain what agents are" (did not activate)
3. ✅ "Debug this code" (did not activate)
4. ✅ "Write a Python function" (did not activate)
5. ✅ "Run the invoice agent" (did not activate)
**Success Rate:** 5/5 = 100%
```
---
## 📚 Additional Resources
- `phase4-detection.md` - Detection methodology
- `activation-patterns-guide.md` - Pattern library
- `activation-quality-checklist.md` - Quality standards
- `ACTIVATION_BEST_PRACTICES.md` - Best practices
---
## 🔧 Troubleshooting
### Issue: Low Success Rate (<90%)
**Diagnosis:**
1. Review failed queries
2. Check if keywords/patterns too narrow
3. Verify description includes key concepts
**Solution:**
1. Add more keyword variations
2. Broaden patterns slightly
3. Enhance description with synonyms
### Issue: False Positives
**Diagnosis:**
1. Review activated queries
2. Check if patterns too broad
3. Verify keywords not too generic
**Solution:**
1. Narrow patterns (add context requirements)
2. Use complete phrases for keywords
3. Add negative scope to description
### Issue: Inconsistent Activation
**Diagnosis:**
1. Test same query multiple times
2. Check for Claude Code updates
3. Verify marketplace.json structure
**Solution:**
1. Use all 3 layers (keywords + patterns + description)
2. Increase keyword/pattern coverage
3. Validate JSON syntax
---
**Version:** 1.0
**Last Updated:** 2025-10-23
**Maintained By:** Agent-Skill-Creator Team