# Activation Testing Guide **Version:** 1.0 **Purpose:** Comprehensive guide for testing skill activation reliability --- ## Overview This guide provides procedures, templates, and checklists for testing the 3-Layer Activation System to ensure skills activate correctly and reliably. ### Testing Philosophy **Goal:** 95%+ activation reliability **Approach:** Test each layer independently, then integration **Metrics:** - **True Positives:** Valid queries that correctly activate - **True Negatives:** Invalid queries that correctly don't activate - **False Positives:** Invalid queries that incorrectly activate - **False Negatives:** Valid queries that fail to activate **Target:** Zero false positives, <5% false negatives --- ## ๐Ÿงช Testing Methodology ### Phase 1: Layer 1 Testing (Keywords) #### Objective Verify that exact keyword phrases activate the skill. #### Procedure **Step 1:** List all keywords from marketplace.json **Step 2:** Create test query for each keyword **Step 3:** Test each query manually **Step 4:** Document results #### Template ```markdown ## Layer 1: Keywords Testing **Keyword 1:** "create an agent for" Test Queries: 1. "create an agent for processing invoices" - โœ… Activated - Via: Keyword match 2. "I want to create an agent for data analysis" - โœ… Activated - Via: Keyword match 3. "Create An Agent For automation" // Case variation - โœ… Activated - Via: Keyword match (case-insensitive) **Keyword 2:** "automate workflow" ... ``` #### Pass Criteria - [ ] 100% of keyword test queries activate - [ ] Case-insensitive matching works - [ ] Embedded keywords activate (keyword within longer query) --- ### Phase 2: Layer 2 Testing (Patterns) #### Objective Verify that regex patterns capture expected variations. #### Procedure **Step 1:** List all patterns from marketplace.json **Step 2:** Create 5+ test queries per pattern **Step 3:** Test pattern matching (can use regex tester) **Step 4:** Test in Claude Code **Step 5:** Document results #### Template ```markdown ## Layer 2: Patterns Testing **Pattern 1:** `(?i)(create|build)\s+(an?\s+)?agent\s+for` Designed to Match: - Verbs: create, build - Optional article: a, an - Entity: agent - Connector: for Test Queries: 1. "create an agent for automation" - โœ… Matches pattern - โœ… Activated in Claude Code 2. "build a agent for processing" - โœ… Matches pattern - โœ… Activated 3. "create agent for data" // No article - โœ… Matches pattern - โœ… Activated 4. "Build Agent For Tasks" // Different case - โœ… Matches pattern - โœ… Activated 5. "I want to create an agent for reporting" // Embedded - โœ… Matches pattern - โœ… Activated Should NOT Match: 6. "agent creation guide" - โŒ No action verb - โŒ Correctly did not activate 7. "create something for automation" - โŒ No "agent" keyword - โŒ Correctly did not activate ``` #### Pass Criteria - [ ] 100% of positive test queries match pattern - [ ] 100% of positive queries activate in Claude Code - [ ] 0% of negative queries match pattern - [ ] Pattern is flexible (captures variations) - [ ] Pattern is specific (no false positives) --- ### Phase 3: Layer 3 Testing (Description + NLU) #### Objective Verify that description helps Claude understand intent for edge cases. #### Procedure **Step 1:** Create queries that DON'T match keywords/patterns **Step 2:** Verify these still activate via description understanding **Step 3:** Document which queries activate #### Template ```markdown ## Layer 3: Description + NLU Testing **Queries that don't match Keywords or Patterns:** 1. "I keep doing this task manually, can you help automate it?" - โŒ No keyword match - โŒ No pattern match - โœ… Should activate via description understanding - Result: {activated/did not activate} 2. "This process is repetitive and takes hours daily" - โŒ No keyword match - โŒ No pattern match - โœ… Should activate (describes repetitive workflow) - Result: {activated/did not activate} 3. "Help me build something to handle this workflow" - โŒ No exact keyword - โš ๏ธ Might match pattern - โœ… Should activate - Result: {activated/did not activate} ``` #### Pass Criteria - [ ] Edge case queries activate when appropriate - [ ] Natural language variations work - [ ] Description provides fallback coverage --- ### Phase 4: Integration Testing #### Objective Test complete system with real-world query variations. #### Procedure **Step 1:** Create 10+ realistic query variations per capability **Step 2:** Test all queries in actual Claude Code environment **Step 3:** Track activation success rate **Step 4:** Identify gaps #### Template ```markdown ## Integration Testing **Capability:** Agent Creation **Test Queries:** | # | Query | Expected | Actual | Layer | Status | |---|-------|----------|--------|-------|--------| | 1 | "create an agent for PDFs" | Activate | Activated | Keyword | โœ… | | 2 | "build automation for emails" | Activate | Activated | Pattern | โœ… | | 3 | "daily I process invoices manually" | Activate | Activated | Desc | โœ… | | 4 | "make agent for data entry" | Activate | Activated | Pattern | โœ… | | 5 | "automate my workflow for reports" | Activate | Activated | Keyword | โœ… | | 6 | "I need help with automation" | Activate | NOT activated | - | โŒ | | 7 | "turn this into automated process" | Activate | Activated | Pattern | โœ… | | 8 | "create skill for stock analysis" | Activate | Activated | Keyword | โœ… | | 9 | "repeatedly doing this task" | Activate | Activated | Desc | โœ… | | 10 | "can you help automate this?" | Activate | Activated | Desc | โœ… | **Results:** - Total queries: 10 - Activated correctly: 9 - Failed to activate: 1 (Query #6) - Success rate: 90% **Issues:** - Query #6 too generic, needs more specific keywords ``` #### Pass Criteria - [ ] 95%+ success rate - [ ] All capability variations covered - [ ] Realistic query phrasings tested - [ ] Edge cases documented --- ### Phase 5: Negative Testing (False Positives) #### Objective Ensure skill does NOT activate for out-of-scope queries. #### Procedure **Step 1:** List out-of-scope use cases (when_not_to_use) **Step 2:** Create queries for each **Step 3:** Verify skill does NOT activate **Step 4:** Document any false positives #### Template ```markdown ## Negative Testing **Out of Scope:** General programming questions Test Queries (Should NOT Activate): 1. "How do I write a for loop in Python?" - Result: Did not activate โœ… 2. "What's the difference between list and tuple?" - Result: Did not activate โœ… 3. "Help me debug this code" - Result: Did not activate โœ… **Out of Scope:** Using existing skills Test Queries (Should NOT Activate): 4. "Run the invoice processor skill" - Result: Did not activate โœ… 5. "Show me existing agents" - Result: Did not activate โœ… **Results:** - Total negative queries: 5 - Correctly did not activate: 5 - False positives: 0 - Success rate: 100% ``` #### Pass Criteria - [ ] 100% of out-of-scope queries do NOT activate - [ ] Zero false positives - [ ] when_not_to_use cases covered --- ## ๐Ÿ“‹ Complete Testing Checklist ### Pre-Testing Setup - [ ] marketplace.json has activation section - [ ] Keywords defined (10-15) - [ ] Patterns defined (5-7) - [ ] Description includes keywords - [ ] when_to_use / when_not_to_use defined - [ ] test_queries array populated ### Layer 1: Keywords - [ ] All keywords tested individually - [ ] Case-insensitive matching verified - [ ] Embedded keywords work - [ ] 100% activation rate ### Layer 2: Patterns - [ ] Each pattern tested with 5+ queries - [ ] Pattern matches verified (regex tester) - [ ] Claude Code activation verified - [ ] No false positives - [ ] Flexible enough for variations ### Layer 3: Description - [ ] Edge cases tested - [ ] Natural language variations work - [ ] Fallback coverage confirmed ### Integration - [ ] 10+ realistic queries per capability tested - [ ] 95%+ success rate achieved - [ ] All capabilities covered - [ ] Results documented ### Negative Testing - [ ] Out-of-scope queries tested - [ ] Zero false positives - [ ] when_not_to_use cases verified ### Documentation - [ ] Test results documented - [ ] Issues logged - [ ] Recommendations made - [ ] marketplace.json updated if needed --- ## ๐Ÿ“Š Test Report Template ```markdown # Activation Test Report **Skill Name:** {skill-name} **Version:** {version} **Test Date:** {date} **Tested By:** {name} **Environment:** Claude Code {version} --- ## Executive Summary - **Overall Success Rate:** {X}% - **Total Queries Tested:** {N} - **True Positives:** {N} - **True Negatives:** {N} - **False Positives:** {N} - **False Negatives:** {N} --- ## Layer 1: Keywords Testing **Keywords Tested:** {count} **Success Rate:** {X}% ### Results | Keyword | Test Queries | Passed | Failed | |---------|--------------|--------|--------| | {keyword-1} | {N} | {N} | {N} | | {keyword-2} | {N} | {N} | {N} | **Issues:** - {issue-1} - {issue-2} --- ## Layer 2: Patterns Testing **Patterns Tested:** {count} **Success Rate:** {X}% ### Results | Pattern | Test Queries | Passed | Failed | |---------|--------------|--------|--------| | {pattern-1} | {N} | {N} | {N} | | {pattern-2} | {N} | {N} | {N} | **Issues:** - {issue-1} - {issue-2} --- ## Layer 3: Description Testing **Edge Cases Tested:** {count} **Success Rate:** {X}% **Results:** - Activated via description: {N} - Failed to activate: {N} --- ## Integration Testing **Total Test Queries:** {count} **Success Rate:** {X}% **Breakdown by Capability:** | Capability | Queries | Success | Rate | |------------|---------|---------|------| | {cap-1} | {N} | {N} | {X}% | | {cap-2} | {N} | {N} | {X}% | --- ## Negative Testing **Out-of-Scope Queries:** {count} **False Positives:** {N} **Success Rate:** {X}% --- ## Issues & Recommendations ### Critical Issues 1. {issue-description} - Impact: {high/medium/low} - Recommendation: {action} ### Minor Issues 1. {issue-description} - Impact: {low} - Recommendation: {action} ### Recommendations 1. {recommendation-1} 2. {recommendation-2} --- ## Conclusion {Summary of test results and next steps} **Status:** {PASS / NEEDS WORK / FAIL} --- **Appendix A:** Full Test Query List **Appendix B:** Failed Query Analysis **Appendix C:** Updated marketplace.json (if changes needed) ``` --- ## ๐Ÿ”„ Iterative Testing Process ### Step 1: Initial Test - Run complete test suite - Document results - Identify failures ### Step 2: Analysis - Analyze failed queries - Determine root cause - Plan fixes ### Step 3: Fix - Update keywords/patterns/description - Document changes ### Step 4: Retest - Test only failed queries - Verify fixes work - Ensure no regressions ### Step 5: Full Regression Test - Run complete test suite again - Verify 95%+ success rate - Document final results --- ## ๐ŸŽฏ Sample Test Suite ### Example: Agent Creation Skill ```markdown ## Test Suite: Agent Creation Skill ### Layer 1 Tests (Keywords) **Keyword:** "create an agent for" - โœ… "create an agent for processing PDFs" - โœ… "I want to create an agent for automation" - โœ… "Create An Agent For daily tasks" **Keyword:** "automate workflow" - โœ… "automate workflow for invoices" - โœ… "need to automate workflow" - โœ… "Automate Workflow handling" [... more keywords] ### Layer 2 Tests (Patterns) **Pattern:** `(?i)(create|build)\s+(an?\s+)?agent` - โœ… "create an agent for X" - โœ… "build a agent for Y" - โœ… "create agent for Z" - โœ… "Build Agent for tasks" - โŒ "agent creation guide" (should not match) [... more patterns] ### Integration Tests **Capability:** Agent Creation 1. โœ… "create an agent for processing CSVs" 2. โœ… "build automation for email handling" 3. โœ… "automate this workflow: download, process, upload" 4. โœ… "every day I have to categorize files manually" 5. โœ… "turn this process into an automated agent" 6. โœ… "I need a skill for data extraction" 7. โœ… "daily workflow automation needed" 8. โœ… "repeatedly doing manual data entry" 9. โœ… "develop an agent to monitor APIs" 10. โœ… "make something to handle invoices automatically" **Success Rate:** 10/10 = 100% ### Negative Tests **Should NOT Activate:** 1. โœ… "How do I use an existing agent?" (did not activate) 2. โœ… "Explain what agents are" (did not activate) 3. โœ… "Debug this code" (did not activate) 4. โœ… "Write a Python function" (did not activate) 5. โœ… "Run the invoice agent" (did not activate) **Success Rate:** 5/5 = 100% ``` --- ## ๐Ÿ“š Additional Resources - `phase4-detection.md` - Detection methodology - `activation-patterns-guide.md` - Pattern library - `activation-quality-checklist.md` - Quality standards - `ACTIVATION_BEST_PRACTICES.md` - Best practices --- ## ๐Ÿ”ง Troubleshooting ### Issue: Low Success Rate (<90%) **Diagnosis:** 1. Review failed queries 2. Check if keywords/patterns too narrow 3. Verify description includes key concepts **Solution:** 1. Add more keyword variations 2. Broaden patterns slightly 3. Enhance description with synonyms ### Issue: False Positives **Diagnosis:** 1. Review activated queries 2. Check if patterns too broad 3. Verify keywords not too generic **Solution:** 1. Narrow patterns (add context requirements) 2. Use complete phrases for keywords 3. Add negative scope to description ### Issue: Inconsistent Activation **Diagnosis:** 1. Test same query multiple times 2. Check for Claude Code updates 3. Verify marketplace.json structure **Solution:** 1. Use all 3 layers (keywords + patterns + description) 2. Increase keyword/pattern coverage 3. Validate JSON syntax --- **Version:** 1.0 **Last Updated:** 2025-10-23 **Maintained By:** Agent-Skill-Creator Team