Initial commit

2025-11-29 18:27:28 +08:00
commit 8db9c44dd8
79 changed files with 37715 additions and 0 deletions
--- a/skills/FrancyJGLisboa__agent-skill-creator/references/activation-testing-guide.md
+++ b/skills/FrancyJGLisboa__agent-skill-creator/references/activation-testing-guide.md
@@ -0,0 +1,613 @@
+# Activation Testing Guide
+
+**Version:** 1.0
+**Purpose:** Comprehensive guide for testing skill activation reliability
+
+---
+
+## Overview
+
+This guide provides procedures, templates, and checklists for testing the 3-Layer Activation System to ensure skills activate correctly and reliably.
+
+### Testing Philosophy
+
+**Goal:** 95%+ activation reliability
+
+**Approach:** Test each layer independently, then test all layers together (integration)
+
+**Metrics:**
+- **True Positives:** Valid queries that correctly activate
+- **True Negatives:** Invalid queries that correctly don't activate
+- **False Positives:** Invalid queries that incorrectly activate
+- **False Negatives:** Valid queries that fail to activate
+
+**Target:** Zero false positives, <5% false negatives
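+
+The metrics and target above reduce to simple counting once each test query is logged with its expected and observed outcome. A minimal sketch, assuming results are kept as `(should_activate, activated)` boolean pairs; the `summarize` helper is hypothetical and not part of the skill tooling:
+
+```python
+def summarize(results):
+    """Count TP/TN/FP/FN from (should_activate, activated) pairs."""
+    tp = sum(1 for want, got in results if want and got)
+    tn = sum(1 for want, got in results if not want and not got)
+    fp = sum(1 for want, got in results if not want and got)
+    fn = sum(1 for want, got in results if want and not got)
+    valid = tp + fn  # queries that should have activated
+    fn_rate = fn / valid if valid else 0.0
+    return {
+        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
+        "fn_rate": fn_rate,
+        # Target from this guide: zero false positives, <5% false negatives.
+        "meets_target": fp == 0 and fn_rate < 0.05,
+    }
+```
+
+Dividing false negatives by the number of valid queries (rather than by all queries) keeps the rate independent of how many negative tests were run.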
+
+---
+
+## 🧪 Testing Methodology
+
+### Phase 1: Layer 1 Testing (Keywords)
+
+#### Objective
+Verify that exact keyword phrases activate the skill.
+
+#### Procedure
+
+**Step 1:** List all keywords from marketplace.json
+
+**Step 2:** Create a test query for each keyword
+
+**Step 3:** Test each query manually
+
+**Step 4:** Document results
+
+#### Template
+
+```markdown
+## Layer 1: Keywords Testing
+
+**Keyword 1:** "create an agent for"
+
+Test Queries:
+1. "create an agent for processing invoices"
+   - ✅ Activated
+   - Via: Keyword match
+
+2. "I want to create an agent for data analysis"
+   - ✅ Activated
+   - Via: Keyword match
+
+3. "Create An Agent For automation"  // Case variation
+   - ✅ Activated
+   - Via: Keyword match (case-insensitive)
+
+**Keyword 2:** "automate workflow"
+...
+```
+
+#### Pass Criteria
+- [ ] 100% of keyword test queries activate
+- [ ] Case-insensitive matching works
+- [ ] Embedded keywords activate (keyword within longer query)
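+
+The three criteria above can be approximated locally before the manual run. This sketch assumes Layer 1 behaves like a case-insensitive substring match, which is an assumption about the matcher rather than documented behavior, so a local pass does not replace testing in Claude Code. `keyword_hits` is a hypothetical helper:
+
+```python
+def keyword_hits(query, keywords):
+    """Return the keywords found anywhere in the query, ignoring case."""
+    lowered = query.lower()
+    return [kw for kw in keywords if kw.lower() in lowered]
+
+
+keywords = ["create an agent for", "automate workflow"]
+queries = [
+    "create an agent for processing invoices",
+    "I want to create an agent for data analysis",  # embedded keyword
+    "Create An Agent For automation",               # case variation
+]
+for query in queries:
+    print(query, "->", keyword_hits(query, keywords))
+```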
+
+---
+
+### Phase 2: Layer 2 Testing (Patterns)
+
+#### Objective
+Verify that regex patterns capture expected variations.
+
+#### Procedure
+
+**Step 1:** List all patterns from marketplace.json
+
+**Step 2:** Create 5+ test queries per pattern
+
+**Step 3:** Test pattern matching (a regex tester can be used for this)
+
+**Step 4:** Test in Claude Code
+
+**Step 5:** Document results
+
+#### Template
+
+```markdown
+## Layer 2: Patterns Testing
+
+**Pattern 1:** `(?i)(create|build)\s+(an?\s+)?agent\s+for`
+
+Designed to Match:
+- Verbs: create, build
+- Optional article: a, an
+- Entity: agent
+- Connector: for
+
+Test Queries:
+1. "create an agent for automation"
+   - ✅ Matches pattern
+   - ✅ Activated in Claude Code
+
+2. "build a agent for processing"
+   - ✅ Matches pattern
+   - ✅ Activated
+
+3. "create agent for data"  // No article
+   - ✅ Matches pattern
+   - ✅ Activated
+
+4. "Build Agent For Tasks"  // Different case
+   - ✅ Matches pattern
+   - ✅ Activated
+
+5. "I want to create an agent for reporting"  // Embedded
+   - ✅ Matches pattern
+   - ✅ Activated
+
+Should NOT Match:
+6. "agent creation guide"
+   - ❌ No action verb
+   - ❌ Correctly did not activate
+
+7. "create something for automation"
+   - ❌ No "agent" keyword
+   - ❌ Correctly did not activate
+```
+
+#### Pass Criteria
+- [ ] 100% of positive test queries match pattern
+- [ ] 100% of positive queries activate in Claude Code
+- [ ] 0% of negative queries match pattern
+- [ ] Pattern is flexible (captures variations)
+- [ ] Pattern is specific (no false positives)
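+
+The regex-level criteria can be checked with a short script before moving to Claude Code. A sketch using Python's `re` module with the pattern and queries from the template above; it assumes the activation system's regex dialect behaves like Python's for this pattern, and `check_pattern` is a hypothetical helper:
+
+```python
+import re
+
+# Pattern copied from the template; the leading (?i) makes it case-insensitive.
+PATTERN = re.compile(r"(?i)(create|build)\s+(an?\s+)?agent\s+for")
+
+should_match = [
+    "create an agent for automation",
+    "build a agent for processing",
+    "create agent for data",
+    "Build Agent For Tasks",
+    "I want to create an agent for reporting",
+]
+should_not_match = [
+    "agent creation guide",
+    "create something for automation",
+]
+
+
+def check_pattern(pattern, positives, negatives):
+    """Return (missed positives, unexpected matches) for one pattern."""
+    missed = [q for q in positives if not pattern.search(q)]
+    false_hits = [q for q in negatives if pattern.search(q)]
+    return missed, false_hits
+
+
+missed, false_hits = check_pattern(PATTERN, should_match, should_not_match)
+print("missed:", missed)
+print("false hits:", false_hits)
+```
+
+`search` is used instead of `match` so that embedded phrases (query 5) are found anywhere in the query.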
+
+---
+
+### Phase 3: Layer 3 Testing (Description + NLU)
+
+#### Objective
+Verify that the description helps Claude understand intent in edge cases.
+
+#### Procedure
+
+**Step 1:** Create queries that DON'T match keywords/patterns
+
+**Step 2:** Verify these still activate via description understanding
+
+**Step 3:** Document which queries activate
+
+#### Template
+
+```markdown
+## Layer 3: Description + NLU Testing
+
+**Queries that don't match Keywords or Patterns:**
+
+1. "I keep doing this task manually, can you help automate it?"
+   - ❌ No keyword match
+   - ❌ No pattern match
+   - ✅ Should activate via description understanding
+   - Result: {activated/did not activate}
+
+2. "This process is repetitive and takes hours daily"
+   - ❌ No keyword match
+   - ❌ No pattern match
+   - ✅ Should activate (describes repetitive workflow)
+   - Result: {activated/did not activate}
+
+3. "Help me build something to handle this workflow"
+   - ❌ No exact keyword
+   - ⚠️ Might match pattern
+   - ✅ Should activate
+   - Result: {activated/did not activate}
+```
+
+#### Pass Criteria
+- [ ] Edge case queries activate when appropriate
+- [ ] Natural language variations work
+- [ ] Description provides fallback coverage
+
+---
+
+### Phase 4: Integration Testing
+
+#### Objective
+Test the complete system with real-world query variations.
+
+#### Procedure
+
+**Step 1:** Create 10+ realistic query variations per capability
+
+**Step 2:** Test all queries in an actual Claude Code environment
+
+**Step 3:** Track activation success rate
+
+**Step 4:** Identify gaps
+
+#### Template
+
+```markdown
+## Integration Testing
+
+**Capability:** Agent Creation
+
+**Test Queries:**
+
+| # | Query | Expected | Actual | Layer | Status |
+|---|-------|----------|--------|-------|--------|
+| 1 | "create an agent for PDFs" | Activate | Activated | Keyword | ✅ |
+| 2 | "build automation for emails" | Activate | Activated | Pattern | ✅ |
+| 3 | "daily I process invoices manually" | Activate | Activated | Desc | ✅ |
+| 4 | "make agent for data entry" | Activate | Activated | Pattern | ✅ |
+| 5 | "automate my workflow for reports" | Activate | Activated | Keyword | ✅ |
+| 6 | "I need help with automation" | Activate | NOT activated | - | ❌ |
+| 7 | "turn this into automated process" | Activate | Activated | Pattern | ✅ |
+| 8 | "create skill for stock analysis" | Activate | Activated | Keyword | ✅ |
+| 9 | "repeatedly doing this task" | Activate | Activated | Desc | ✅ |
+| 10 | "can you help automate this?" | Activate | Activated | Desc | ✅ |
+
+**Results:**
+- Total queries: 10
+- Activated correctly: 9
+- Failed to activate: 1 (Query #6)
+- Success rate: 90%
+
+**Issues:**
+- Query #6 is too generic; add more specific keywords
+```
+
+#### Pass Criteria
+- [ ] 95%+ success rate
+- [ ] All capability variations covered
+- [ ] Realistic query phrasings tested
+- [ ] Edge cases documented
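+
+If results are recorded in a table like the template above, the success rate can be computed from the table itself instead of by hand. A rough sketch, assuming the six-column layout shown in the template and no `|` characters inside queries; `parse_results_table` and `success_rate` are hypothetical helpers:
+
+```python
+def parse_results_table(markdown):
+    """Return one dict per data row of a pipe-delimited results table."""
+    lines = [ln.strip() for ln in markdown.splitlines()]
+    rows = [ln for ln in lines if ln.startswith("|")]
+    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
+    parsed = []
+    for line in rows[2:]:  # skip the header and separator rows
+        cells = [cell.strip() for cell in line.strip("|").split("|")]
+        parsed.append(dict(zip(header, cells)))
+    return parsed
+
+
+def success_rate(rows):
+    """Fraction of rows whose Status column is the pass marker."""
+    passed = sum(1 for row in rows if row["Status"] == "✅")
+    return passed / len(rows)
+
+
+def failed_queries(rows):
+    """Queries that did not pass, for the Issues list."""
+    return [row["Query"] for row in rows if row["Status"] != "✅"]
+```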
+
+---
+
+### Phase 5: Negative Testing (False Positives)
+
+#### Objective
+Ensure the skill does NOT activate for out-of-scope queries.
+
+#### Procedure
+
+**Step 1:** List out-of-scope use cases (when_not_to_use)
+
+**Step 2:** Create queries for each
+
+**Step 3:** Verify skill does NOT activate
+
+**Step 4:** Document any false positives
+
+#### Template
+
+```markdown
+## Negative Testing
+
+**Out of Scope:** General programming questions
+
+Test Queries (Should NOT Activate):
+1. "How do I write a for loop in Python?"
+   - Result: Did not activate ✅
+
+2. "What's the difference between list and tuple?"
+   - Result: Did not activate ✅
+
+3. "Help me debug this code"
+   - Result: Did not activate ✅
+
+**Out of Scope:** Using existing skills
+
+Test Queries (Should NOT Activate):
+4. "Run the invoice processor skill"
+   - Result: Did not activate ✅
+
+5. "Show me existing agents"
+   - Result: Did not activate ✅
+
+**Results:**
+- Total negative queries: 5
+- Correctly did not activate: 5
+- False positives: 0
+- Success rate: 100%
+```
+
+#### Pass Criteria
+- [ ] 100% of out-of-scope queries do NOT activate
+- [ ] Zero false positives
+- [ ] when_not_to_use cases covered
+
+---
+
+## 📋 Complete Testing Checklist
+
+### Pre-Testing Setup
+- [ ] marketplace.json has activation section
+- [ ] Keywords defined (10-15)
+- [ ] Patterns defined (5-7)
+- [ ] Description includes keywords
+- [ ] when_to_use / when_not_to_use defined
+- [ ] test_queries array populated
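+
+Parts of this setup checklist can be automated. The sketch below assumes that `keywords`, `patterns`, and `test_queries` sit inside an `activation` object in marketplace.json; that layout is an assumption made for illustration, so adjust the keys to match the real file. `check_setup` is a hypothetical helper, and pattern compilation uses Python's `re`, which may differ from the dialect used at activation time:
+
+```python
+import json
+import re
+
+
+def check_setup(config_text):
+    """Return a list of problems found in a marketplace.json string.
+
+    ASSUMPTION: keywords, patterns, and test_queries live under an
+    "activation" object. This layout is illustrative only.
+    """
+    try:
+        config = json.loads(config_text)
+    except json.JSONDecodeError as exc:
+        return [f"invalid JSON: {exc}"]
+    activation = config.get("activation") if isinstance(config, dict) else None
+    if not isinstance(activation, dict):
+        return ["missing activation section"]
+
+    problems = []
+    keywords = activation.get("keywords", [])
+    patterns = activation.get("patterns", [])
+    if not 10 <= len(keywords) <= 15:
+        problems.append(f"expected 10-15 keywords, found {len(keywords)}")
+    if not 5 <= len(patterns) <= 7:
+        problems.append(f"expected 5-7 patterns, found {len(patterns)}")
+    for pattern in patterns:
+        try:
+            re.compile(pattern)
+        except re.error as exc:
+            problems.append(f"pattern {pattern!r} does not compile: {exc}")
+    if not activation.get("test_queries"):
+        problems.append("test_queries is empty or missing")
+    return problems
+```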
+
+### Layer 1: Keywords
+- [ ] All keywords tested individually
+- [ ] Case-insensitive matching verified
+- [ ] Embedded keywords work
+- [ ] 100% activation rate
+
+### Layer 2: Patterns
+- [ ] Each pattern tested with 5+ queries
+- [ ] Pattern matches verified (regex tester)
+- [ ] Claude Code activation verified
+- [ ] No false positives
+- [ ] Flexible enough for variations
+
+### Layer 3: Description
+- [ ] Edge cases tested
+- [ ] Natural language variations work
+- [ ] Fallback coverage confirmed
+
+### Integration
+- [ ] 10+ realistic queries per capability tested
+- [ ] 95%+ success rate achieved
+- [ ] All capabilities covered
+- [ ] Results documented
+
+### Negative Testing
+- [ ] Out-of-scope queries tested
+- [ ] Zero false positives
+- [ ] when_not_to_use cases verified
+
+### Documentation
+- [ ] Test results documented
+- [ ] Issues logged
+- [ ] Recommendations made
+- [ ] marketplace.json updated if needed
+
+---
+
+## 📊 Test Report Template
+
+```markdown
+# Activation Test Report
+
+**Skill Name:** {skill-name}
+**Version:** {version}
+**Test Date:** {date}
+**Tested By:** {name}
+**Environment:** Claude Code {version}
+
+---
+
+## Executive Summary
+
+- **Overall Success Rate:** {X}%
+- **Total Queries Tested:** {N}
+- **True Positives:** {N}
+- **True Negatives:** {N}
+- **False Positives:** {N}
+- **False Negatives:** {N}
+
+---
+
+## Layer 1: Keywords Testing
+
+**Keywords Tested:** {count}
+**Success Rate:** {X}%
+
+### Results
+| Keyword | Test Queries | Passed | Failed |
+|---------|--------------|--------|--------|
+| {keyword-1} | {N} | {N} | {N} |
+| {keyword-2} | {N} | {N} | {N} |
+
+**Issues:**
+- {issue-1}
+- {issue-2}
+
+---
+
+## Layer 2: Patterns Testing
+
+**Patterns Tested:** {count}
+**Success Rate:** {X}%
+
+### Results
+| Pattern | Test Queries | Passed | Failed |
+|---------|--------------|--------|--------|
+| {pattern-1} | {N} | {N} | {N} |
+| {pattern-2} | {N} | {N} | {N} |
+
+**Issues:**
+- {issue-1}
+- {issue-2}
+
+---
+
+## Layer 3: Description Testing
+
+**Edge Cases Tested:** {count}
+**Success Rate:** {X}%
+
+**Results:**
+- Activated via description: {N}
+- Failed to activate: {N}
+
+---
+
+## Integration Testing
+
+**Total Test Queries:** {count}
+**Success Rate:** {X}%
+
+**Breakdown by Capability:**
+| Capability | Queries | Success | Rate |
+|------------|---------|---------|------|
+| {cap-1} | {N} | {N} | {X}% |
+| {cap-2} | {N} | {N} | {X}% |
+
+---
+
+## Negative Testing
+
+**Out-of-Scope Queries:** {count}
+**False Positives:** {N}
+**Success Rate:** {X}%
+
+---
+
+## Issues & Recommendations
+
+### Critical Issues
+1. {issue-description}
+   - Impact: {high/medium/low}
+   - Recommendation: {action}
+
+### Minor Issues
+1. {issue-description}
+   - Impact: {low}
+   - Recommendation: {action}
+
+### Recommendations
+1. {recommendation-1}
+2. {recommendation-2}
+
+---
+
+## Conclusion
+
+{Summary of test results and next steps}
+
+**Status:** {PASS / NEEDS WORK / FAIL}
+
+---
+
+**Appendix A:** Full Test Query List
+**Appendix B:** Failed Query Analysis
+**Appendix C:** Updated marketplace.json (if changes needed)
+```
+
+---
+
+## 🔄 Iterative Testing Process
+
+### Step 1: Initial Test
+- Run complete test suite
+- Document results
+- Identify failures
+
+### Step 2: Analysis
+- Analyze failed queries
+- Determine root cause
+- Plan fixes
+
+### Step 3: Fix
+- Update keywords/patterns/description
+- Document changes
+
+### Step 4: Retest
+- Re-run the queries that failed
+- Verify the fixes work
+- Spot-check related passing queries for regressions
+
+### Step 5: Full Regression Test
+- Run complete test suite again
+- Verify 95%+ success rate
+- Document final results
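+
+The comparison between the initial run and the regression run can be made explicit. A small sketch, assuming each run is stored as a dict that maps a query to `True` when its result matched the expectation; `find_regressions` and `find_fixes` are hypothetical helpers:
+
+```python
+def find_regressions(before, after):
+    """Queries that passed in the earlier run but fail or are missing now."""
+    return sorted(q for q, ok in before.items() if ok and not after.get(q, False))
+
+
+def find_fixes(before, after):
+    """Queries that failed in the earlier run and pass now."""
+    return sorted(q for q, ok in before.items() if not ok and after.get(q, False))
+
+
+initial = {
+    "create an agent for PDFs": True,
+    "I need help with automation": False,
+}
+retest = {
+    "create an agent for PDFs": True,
+    "I need help with automation": True,
+}
+print("regressions:", find_regressions(initial, retest))
+print("fixes:", find_fixes(initial, retest))
+```
+
+A query missing from the later run is treated as a regression so that a shrinking test suite cannot hide a failure.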
+
+---
+
+## 🎯 Sample Test Suite
+
+### Example: Agent Creation Skill
+
+```markdown
+## Test Suite: Agent Creation Skill
+
+### Layer 1 Tests (Keywords)
+
+**Keyword:** "create an agent for"
+- ✅ "create an agent for processing PDFs"
+- ✅ "I want to create an agent for automation"
+- ✅ "Create An Agent For daily tasks"
+
+**Keyword:** "automate workflow"
+- ✅ "automate workflow for invoices"
+- ✅ "need to automate workflow"
+- ✅ "Automate Workflow handling"
+
+[... more keywords]
+
+### Layer 2 Tests (Patterns)
+
+**Pattern:** `(?i)(create|build)\s+(an?\s+)?agent`
+- ✅ "create an agent for X"
+- ✅ "build a agent for Y"
+- ✅ "create agent for Z"
+- ✅ "Build Agent for tasks"
+- ❌ "agent creation guide" (should not match)
+
+[... more patterns]
+
+### Integration Tests
+
+**Capability:** Agent Creation
+1. ✅ "create an agent for processing CSVs"
+2. ✅ "build automation for email handling"
+3. ✅ "automate this workflow: download, process, upload"
+4. ✅ "every day I have to categorize files manually"
+5. ✅ "turn this process into an automated agent"
+6. ✅ "I need a skill for data extraction"
+7. ✅ "daily workflow automation needed"
+8. ✅ "repeatedly doing manual data entry"
+9. ✅ "develop an agent to monitor APIs"
+10. ✅ "make something to handle invoices automatically"
+
+**Success Rate:** 10/10 = 100%
+
+### Negative Tests
+
+**Should NOT Activate:**
+1. ✅ "How do I use an existing agent?" (did not activate)
+2. ✅ "Explain what agents are" (did not activate)
+3. ✅ "Debug this code" (did not activate)
+4. ✅ "Write a Python function" (did not activate)
+5. ✅ "Run the invoice agent" (did not activate)
+
+**Success Rate:** 5/5 = 100%
+```
+
+---
+
+## 📚 Additional Resources
+
+- `phase4-detection.md` - Detection methodology
+- `activation-patterns-guide.md` - Pattern library
+- `activation-quality-checklist.md` - Quality standards
+- `ACTIVATION_BEST_PRACTICES.md` - Best practices
+
+---
+
+## 🔧 Troubleshooting
+
+### Issue: Low Success Rate (<90%)
+
+**Diagnosis:**
+1. Review failed queries
+2. Check whether keywords/patterns are too narrow
+3. Verify description includes key concepts
+
+**Solution:**
+1. Add more keyword variations
+2. Broaden patterns slightly
+3. Enhance description with synonyms
+
+### Issue: False Positives
+
+**Diagnosis:**
+1. Review activated queries
+2. Check whether patterns are too broad
+3. Verify that keywords are not too generic
+
+**Solution:**
+1. Narrow patterns (add context requirements)
+2. Use complete phrases for keywords
+3. Add negative scope to description
+
+### Issue: Inconsistent Activation
+
+**Diagnosis:**
+1. Test same query multiple times
+2. Check for Claude Code updates
+3. Verify marketplace.json structure
+
+**Solution:**
+1. Use all 3 layers (keywords + patterns + description)
+2. Increase keyword/pattern coverage
+3. Validate JSON syntax
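+
+For the first diagnosis step, logging each repeated trial makes inconsistency measurable. A minimal sketch, assuming trials are recorded by hand as `(query, activated)` pairs; `flaky_queries` is a hypothetical helper:
+
+```python
+from collections import defaultdict
+
+
+def flaky_queries(trials):
+    """Map each query with mixed outcomes to its observed activation rate."""
+    outcomes = defaultdict(list)
+    for query, activated in trials:
+        outcomes[query].append(bool(activated))
+    return {
+        query: sum(results) / len(results)
+        for query, results in outcomes.items()
+        if len(set(results)) > 1  # both True and False were observed
+    }
+
+
+trials = [
+    ("can you help automate this?", True),
+    ("can you help automate this?", False),
+    ("create an agent for PDFs", True),
+    ("create an agent for PDFs", True),
+]
+print(flaky_queries(trials))
+```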
+
+---
+
+**Version:** 1.0
+**Last Updated:** 2025-10-23
+**Maintained By:** Agent-Skill-Creator Team