Files
gh-francyjglisboa-agent-ski…/references/activation-quality-checklist.md
2025-11-29 18:27:25 +08:00

340 lines
9.5 KiB
Markdown

# Activation Quality Checklist
**Version:** 1.0
**Purpose:** Ensure high-quality activation system for all created skills
---
## Overview
Use this checklist during Phase 4 (Detection) to ensure the skill has robust, reliable activation. **All items must be checked before proceeding to Phase 5.**
**Target:** 95%+ activation reliability with zero false positives
---
## ✅ Layer 1: Keywords Quality
### Quantity
- [ ] **Minimum 10 keywords defined**
- [ ] **Maximum 20 keywords** (more can dilute effectiveness)
- [ ] At least 3 categories covered (action, workflow, domain)
### Quality
- [ ] **All keywords are complete phrases** (not single words)
- [ ] No keywords shorter than 2 words
- [ ] **No overly generic keywords** (e.g., "data", "analysis" alone)
- [ ] Each keyword is unique and non-redundant
### Coverage
- [ ] Keywords cover main capability: {{capability-1}}
- [ ] Keywords cover secondary capability: {{capability-2}}
- [ ] Keywords cover tertiary capability: {{capability-3}}
- [ ] **At least 3 keywords per major capability**
### Specificity
- [ ] Keywords include action verbs (create, analyze, extract)
- [ ] Keywords include domain entities (agent, stock, crop)
- [ ] Keywords include context modifiers when appropriate
### Examples
- [ ] ✅ Good: "create an agent for"
- [ ] ✅ Good: "stock technical analysis"
- [ ] ✅ Good: "harvest progress data"
- [ ] ❌ Bad: "create" (single word)
- [ ] ❌ Bad: "data analysis" (too generic)
- [ ] ❌ Bad: "help me" (too vague)
---
## ✅ Layer 2: Patterns Quality
### Quantity
- [ ] **Minimum 5 patterns defined**
- [ ] **Maximum 10 patterns** (more can create conflicts)
- [ ] At least 3 pattern types covered (action, transformation, query)
### Structure
- [ ] **All patterns start with (?i)** for case-insensitivity
- [ ] All patterns include action verb group
- [ ] Patterns allow for flexible word order where appropriate
- [ ] **No patterns match single words only**
### Specificity vs Flexibility
- [ ] Patterns are specific enough (avoid false positives)
- [ ] Patterns are flexible enough (capture variations)
- [ ] Patterns require both verb AND entity/context
- [ ] **Tested each pattern independently**
### Quality Checks
- [ ] **Pattern 1: Action + Object pattern exists**
- Example: `(?i)(create|build)\s+(an?\s+)?agent\s+for`
- [ ] **Pattern 2: Domain-specific pattern exists**
- Example: `(?i)(analyze|monitor)\s+.*\s+(stock|crop)`
- [ ] **Pattern 3: Workflow pattern exists** (if applicable)
- Example: `(?i)(every day|daily)\s+I\s+(have to|need)`
- [ ] **Pattern 4: Transformation pattern exists** (if applicable)
- Example: `(?i)(convert|transform)\s+.*\s+into`
- [ ] Pattern 5-7: Additional patterns cover edge cases
### Testing
- [ ] **Each pattern tested with 5+ positive examples**
- [ ] Each pattern tested with 2+ negative examples
- [ ] No pattern has >20% false positive rate
- [ ] Combined patterns achieve >80% coverage
---
## ✅ Layer 3: Description Quality
### Content Requirements
- [ ] **60+ unique keywords included in description**
- [ ] All major capabilities explicitly mentioned
- [ ] **Each capability has synonyms** in parentheses
- [ ] Technology/API/data source names included
- [ ] 3-5 example use cases mentioned
### Structure
- [ ] Description starts with primary use case
- [ ] **"Activates for queries about:"** section included
- [ ] **"Does NOT activate for:"** section included
- [ ] Length is 300-500 characters (comprehensive but not excessive)
### Keyword Integration
- [ ] All Layer 1 keywords appear in description
- [ ] Domain-specific terms well-represented
- [ ] Action verbs prominently featured
- [ ] Geographic/temporal qualifiers included (if relevant)
### Clarity
- [ ] Description is readable and natural
- [ ] No keyword stuffing (keywords flow naturally)
- [ ] Technical terms explained where necessary
- [ ] **User can understand when to use skill**
---
## ✅ Usage Section Quality
### when_to_use
- [ ] **Minimum 5 use cases listed**
- [ ] Use cases are specific and actionable
- [ ] Use cases cover all major capabilities
- [ ] Use cases use natural language
### when_not_to_use
- [ ] **Minimum 3 counter-cases listed**
- [ ] Counter-cases prevent common false positives
- [ ] Counter-cases clearly distinguish from similar skills
- [ ] Each counter-case explains WHY not to use
### Example
- [ ] **Concrete example query provided**
- [ ] Example demonstrates typical usage
- [ ] Example would actually activate the skill
---
## ✅ Test Queries Quality
### Quantity
- [ ] **Minimum 10 test queries defined**
- [ ] At least 2 queries per major capability
- [ ] Mix of query types (direct, natural, edge cases)
### Coverage
- [ ] Tests cover Layer 1 (keywords)
- [ ] Tests cover Layer 2 (patterns)
- [ ] Tests cover Layer 3 (description/NLU)
- [ ] Tests cover all capabilities
- [ ] Tests include edge cases
### Quality
- [ ] Queries use natural language
- [ ] Queries are realistic user requests
- [ ] Queries vary in phrasing and structure
- [ ] **Each query documented with expected activation layer**
### Negative Tests
- [ ] **Minimum 3 negative test cases** (should NOT activate)
- [ ] Negative cases test counter-examples from when_not_to_use
- [ ] Negative cases documented separately
---
## ✅ Integration & Conflicts
### Conflict Check
- [ ] **Reviewed other existing skills in ecosystem**
- [ ] No keyword conflicts with other skills
- [ ] Patterns don't overlap significantly with other skills
- [ ] Clear differentiation from similar skills
### Priority
- [ ] Activation priority is appropriate
- [ ] More specific skills have higher priority if needed
- [ ] Domain-specific skills prioritized over general skills
---
## ✅ Documentation
### In marketplace.json
- [ ] **activation section complete**
- [ ] **usage section complete**
- [ ] **test_queries array populated**
- [ ] All JSON is valid (no syntax errors)
### In SKILL.md
- [ ] Keywords section included
- [ ] Activation examples (positive and negative)
- [ ] Use cases clearly documented
### In README.md
- [ ] **Activation section included** (see template)
- [ ] 10+ activation phrase examples
- [ ] Counter-examples documented
- [ ] Activation tips provided
---
## ✅ Testing Validation
### Layer Testing
- [ ] **Layer 1 (Keywords) tested individually**
- Pass rate: ___% (target: 100%)
- [ ] **Layer 2 (Patterns) tested individually**
- Pass rate: ___% (target: 100%)
- [ ] **Layer 3 (Description) tested with edge cases**
- Pass rate: ___% (target: 90%+)
### Integration Testing
- [ ] **All test_queries tested in Claude Code**
- Pass rate: ___% (target: 95%+)
- [ ] Negative tests verified (no false positives)
- Pass rate: ___% (target: 100%)
### Results
- [ ] **Overall success rate: ____%** (target: >=95%)
- [ ] **False positive rate: ____%** (target: 0%)
- [ ] **False negative rate: ____%** (target: <5%)
---
## ✅ Final Verification
### Pre-Deployment
- [ ] All above checklists completed
- [ ] Test report documented
- [ ] Issues identified and fixed
- [ ] **Activation success rate >= 95%**
### Documentation Complete
- [ ] marketplace.json reviewed and validated
- [ ] SKILL.md includes activation section
- [ ] README.md includes activation examples
- [ ] TESTING.md created (if complex skill)
### Sign-Off
- [ ] Creator reviewed activation system
- [ ] Test results satisfactory
- [ ] Ready for Phase 5 (Implementation)
---
## 📊 Scoring System
### Minimum Requirements
| Layer | Minimum Score | Target Score |
|-------|---------------|--------------|
| Keywords (Layer 1) | 10 keywords | 12-15 keywords |
| Patterns (Layer 2) | 5 patterns | 7 patterns |
| Description (Layer 3) | 300 chars, 60+ keywords | 400 chars, 80+ keywords |
| Test Queries | 10 queries | 15+ queries |
| Success Rate | 90% | 95%+ |
### Grading
**A (Excellent):** 95%+ success rate, all requirements met
**B (Good):** 90-94% success rate, most requirements met
**C (Acceptable):** 85-89% success rate, minimum requirements met
**F (Needs Work):** <85% success rate, requirements not met
**Only Grade A skills should proceed to implementation.**
---
## 🚨 Common Issues Checklist
### Issue: Low Activation Rate (<90%)
**Check:**
- [ ] Are keywords too specific/narrow?
- [ ] Are patterns too restrictive?
- [ ] Is description missing key concepts?
- [ ] Are test queries realistic?
### Issue: False Positives
**Check:**
- [ ] Are keywords too generic?
- [ ] Are patterns too broad?
- [ ] Is description unclear about scope?
- [ ] Are when_not_to_use cases defined?
### Issue: Inconsistent Activation
**Check:**
- [ ] Are all 3 layers properly configured?
- [ ] Is JSON syntax valid?
- [ ] Are patterns properly escaped?
- [ ] Has testing been thorough?
---
## 📝 Quick Reference
### Minimum Requirements Summary
**Must Have:**
- ✅ 10+ keywords (complete phrases)
- ✅ 5+ patterns (with verbs + entities)
- ✅ 300+ char description (60+ keywords)
- ✅ 5+ when_to_use cases
- ✅ 3+ when_not_to_use cases
- ✅ 10+ test queries
- ✅ 95%+ success rate
**Should Have:**
- ⭐ 15 keywords
- ⭐ 7 patterns
- ⭐ 400+ char description (80+ keywords)
- ⭐ 15+ test queries
- ⭐ 98%+ success rate
- ⭐ Zero false positives
---
## 📚 Additional Resources
- `phase4-detection.md` - Complete detection methodology
- `activation-patterns-guide.md` - Pattern library
- `activation-testing-guide.md` - Testing procedures
- `marketplace-robust-template.json` - Template with placeholders
- `README-activation-template.md` - README template
---
**Status:** ___ (In Progress / Complete)
**Reviewer:** ___
**Date:** ___
**Success Rate:** ___%
**Grade:** ___ (A / B / C / F)
---
**Version:** 1.0
**Last Updated:** 2025-10-23
**Maintained By:** Agent-Skill-Creator Team