# Anthropic Best Practices Checklist
Evaluation criteria for assessing Claude Skill quality based on official Anthropic guidelines.
## Purpose
Use this checklist to evaluate skills found on GitHub. Each criterion contributes to the overall quality score (0-10).
## Evaluation Criteria
### 1. Description Quality (Weight: 2.0)
**What to check:**
- [ ] Description is specific, not vague
- [ ] Includes what the skill does
- [ ] Includes when to use it (trigger conditions)
- [ ] Contains key terms users would mention
- [ ] Written in third person
- [ ] Under 1024 characters
- [ ] No XML tags
**Scoring:**
- 2.0: All criteria met, very clear and specific
- 1.5: Most criteria met, good clarity
- 1.0: Basic description, somewhat vague
- 0.5: Very vague or generic
- 0.0: Missing or completely unclear
**Examples:**
**Good (2.0):**
```yaml
description: Analyze Excel spreadsheets, create pivot tables, generate charts. Use when working with Excel files, spreadsheets, tabular data, or .xlsx files.
```
**Bad (0.5):**
```yaml
description: Helps with documents
```
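Several of these criteria (length, XML tags) are mechanically checkable. A minimal sketch in Python; `check_description` and the `VAGUE_PHRASES` heuristic are illustrative helpers, not part of any official tooling:
```python
import re

# Illustrative heuristic only; tune to taste.
VAGUE_PHRASES = ("helps with", "useful for", "various tasks")

def check_description(desc: str) -> list[str]:
    """Return violations of the mechanically checkable description criteria."""
    problems = []
    if not desc.strip():
        problems.append("description is missing")
    if len(desc) > 1024:
        problems.append(f"too long: {len(desc)} chars (limit 1024)")
    if re.search(r"</?[A-Za-z][^>]*>", desc):
        problems.append("contains XML tags")
    if any(phrase in desc.lower() for phrase in VAGUE_PHRASES):
        problems.append("sounds vague (heuristic match)")
    return problems

print(check_description("Helps with documents"))
# ['sounds vague (heuristic match)']
```
Criteria like "written in third person" or "contains key terms users would mention" still need human judgment.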
### 2. Name Convention (Weight: 0.5)
**What to check:**
- [ ] Uses lowercase letters, numbers, hyphens only
- [ ] Under 64 characters
- [ ] Follows naming pattern (gerund form preferred)
- [ ] Descriptive, not vague
- [ ] No reserved words ("anthropic", "claude")
**Scoring:**
- 0.5: Follows all conventions
- 0.25: Minor issues (e.g., not gerund but still clear)
- 0.0: Violates conventions or very vague
**Good:** `processing-pdfs`, `analyzing-spreadsheets`
**Bad:** `helper`, `utils`, `claude-tool`
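The format rules reduce to a regex plus a reserved-word check; a minimal sketch (the gerund-form preference is left to human judgment):
```python
import re

RESERVED_WORDS = ("anthropic", "claude")

def check_name(name: str) -> list[str]:
    """Return violations of the skill naming convention."""
    problems = []
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        problems.append("use only lowercase letters, numbers, and hyphens")
    if len(name) > 64:
        problems.append(f"too long: {len(name)} chars (limit 64)")
    if any(word in name for word in RESERVED_WORDS):
        problems.append("contains a reserved word")
    return problems

print(check_name("processing-pdfs"))  # []
print(check_name("claude-tool"))      # ['contains a reserved word']
```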
### 3. Conciseness (Weight: 1.5)
**What to check:**
- [ ] SKILL.md body under 500 lines
- [ ] No unnecessary explanations
- [ ] Assumes Claude's intelligence
- [ ] Gets to the point quickly
- [ ] Additional content in separate files if needed
**Scoring:**
- 1.5: Very concise, well-edited, <300 lines
- 1.0: Reasonable length, <500 lines
- 0.5: Long but not excessive, 500-800 lines
- 0.0: Very verbose, >800 lines
### 4. Progressive Disclosure (Weight: 1.0)
**What to check:**
- [ ] SKILL.md serves as overview/table of contents
- [ ] Additional details in separate files
- [ ] Clear references to other files
- [ ] Files organized by domain/feature
- [ ] No deeply nested references (max 1 level deep)
**Scoring:**
- 1.0: Excellent use of progressive disclosure
- 0.75: Good organization with some references
- 0.5: Some separation, could be better
- 0.25: All content in SKILL.md, no references
- 0.0: Poorly organized or deeply nested
### 5. Examples and Workflows (Weight: 1.0)
**What to check:**
- [ ] Has concrete examples (not abstract)
- [ ] Includes code snippets
- [ ] Shows input/output pairs
- [ ] Has clear workflows for complex tasks
- [ ] Examples use real patterns, not placeholders
**Scoring:**
- 1.0: Excellent examples and clear workflows
- 0.75: Good examples, some workflows
- 0.5: Basic examples, no workflows
- 0.25: Few or abstract examples
- 0.0: No examples
### 6. Appropriate Degree of Freedom (Weight: 0.5)
**What to check:**
- [ ] Instructions match task fragility
- [ ] High freedom for flexible tasks (text instructions)
- [ ] Low freedom for fragile tasks (specific scripts)
- [ ] Clear when to use exact commands vs adapt
**Scoring:**
- 0.5: Perfect match of freedom to task type
- 0.25: Reasonable but could be better
- 0.0: Inappropriate level (too rigid or too loose)
### 7. Dependencies Documentation (Weight: 0.5)
**What to check:**
- [ ] Required packages listed
- [ ] Installation instructions provided
- [ ] Dependencies verified as available
- [ ] No assumption of pre-installed packages
**Scoring:**
- 0.5: All dependencies documented and verified
- 0.25: Dependencies mentioned but not fully documented
- 0.0: Dependencies assumed or not mentioned
### 8. Structure and Organization (Weight: 1.0)
**What to check:**
- [ ] Clear section headings
- [ ] Logical flow of information
- [ ] Table of contents for long files
- [ ] Consistent formatting
- [ ] Unix-style paths (forward slashes)
**Scoring:**
- 1.0: Excellently organized
- 0.75: Well organized with minor issues
- 0.5: Basic organization
- 0.25: Poor organization
- 0.0: No clear structure
### 9. Error Handling (Weight: 0.5)
**What to check (for skills with scripts):**
- [ ] Scripts handle errors explicitly
- [ ] Clear error messages
- [ ] Fallback strategies provided
- [ ] Validation loops for critical operations
- [ ] No "voodoo constants"
**Scoring:**
- 0.5: Excellent error handling
- 0.25: Basic error handling
- 0.0: No error handling or punts to Claude
### 10. Avoids Anti-Patterns (Weight: 1.0)
**What to avoid:**
- [ ] Time-sensitive information
- [ ] Inconsistent terminology
- [ ] Windows-style paths
- [ ] Offering too many options without guidance
- [ ] Deeply nested references
- [ ] Vague or generic content
**Scoring:**
- 1.0: No anti-patterns
- 0.75: 1-2 minor anti-patterns
- 0.5: Multiple anti-patterns
- 0.0: Severe anti-patterns
### 11. Testing and Validation (Weight: 0.5)
**What to check:**
- [ ] Evidence of testing mentioned
- [ ] Evaluation examples provided
- [ ] Clear success criteria
- [ ] Feedback loops for quality
**Scoring:**
- 0.5: Clear testing approach
- 0.25: Some testing mentioned
- 0.0: No testing mentioned
## Scoring System
**Total possible: 10.0 points**
Each criterion is scored directly in weighted points (the rubric maxima above already reflect the weights), so the total is a straight sum:
```
quality_score = (
    description_score +             # max 2.0
    name_score +                    # max 0.5
    conciseness_score +             # max 1.5
    progressive_disclosure_score +  # max 1.0
    examples_score +                # max 1.0
    freedom_score +                 # max 0.5
    dependencies_score +            # max 0.5
    structure_score +               # max 1.0
    error_handling_score +          # max 0.5
    anti_patterns_score +           # max 1.0
    testing_score                   # max 0.5
)
```
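A worked example, plugging in hypothetical rubric values for an imagined skill (the numbers are illustrative only):
```python
# Hypothetical per-criterion scores, each taken from the rubrics above.
scores = {
    "description": 1.5,              # most criteria met, good clarity
    "name": 0.5,                     # follows all conventions
    "conciseness": 1.0,              # reasonable length, <500 lines
    "progressive_disclosure": 0.75,  # good organization, some references
    "examples": 0.75,                # good examples, some workflows
    "freedom": 0.5,                  # freedom matches task type
    "dependencies": 0.25,            # mentioned but not fully documented
    "structure": 0.75,               # well organized, minor issues
    "error_handling": 0.25,          # basic error handling
    "anti_patterns": 0.75,           # 1-2 minor anti-patterns
    "testing": 0.0,                  # no testing mentioned
}

quality_score = sum(scores.values())
print(quality_score)  # 7.0 -> "Good" tier
```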
## Quality Tiers
**Excellent (8.0-10.0):**
- Follows all best practices
- Clearly professional
- Ready for production use
- **Recommendation:** Strongly recommended
**Good (6.0-7.9):**
- Follows most best practices
- Minor improvements needed
- Usable but not perfect
- **Recommendation:** Recommended with minor notes
**Fair (4.0-5.9):**
- Follows some best practices
- Several improvements needed
- May work but needs review
- **Recommendation:** Consider with caution
**Poor (0.0-3.9):**
- Violates many best practices
- Significant issues
- High risk of problems
- **Recommendation:** Not recommended
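The tier boundaries reduce to a simple threshold lookup; a minimal sketch:
```python
def quality_tier(score: float) -> str:
    """Map a 0-10 quality score to its tier, per the boundaries above."""
    if score >= 8.0:
        return "Excellent"
    if score >= 6.0:
        return "Good"
    if score >= 4.0:
        return "Fair"
    return "Poor"

print(quality_tier(7.0))  # Good
```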
## Quick Evaluation Process
For rapid assessment during search:
1. **Read SKILL.md frontmatter** (30 sec)
- Check description quality (most important)
- Check name convention
2. **Scan SKILL.md body** (1-2 min)
- Check length (<500 lines?)
- Look for examples
- Check for references to other files
- Note any obvious anti-patterns
3. **Check file structure** (30 sec)
- Look for reference files
- Check for scripts/utilities
- Verify organization
4. **Calculate quick score** (30 sec)
- Focus on weighted criteria
- Estimate tier (Excellent/Good/Fair/Poor)
**Total time per skill: ~3-4 minutes**
## Automation Tips
When evaluating multiple skills:
```bash
# Check SKILL.md length (target: <500 lines)
wc -l SKILL.md

# Count reference files
find . -name "*.md" -not -name "SKILL.md" | wc -l

# Check for common anti-patterns (first-person or filler phrasing)
grep -i "claude can help\|I can help\|you can use" SKILL.md

# Flag Windows-style backslash paths (paths should use forward slashes)
grep -n '\\' SKILL.md

# Check description length (includes the "description:" prefix; limit is 1024 chars)
head -10 SKILL.md | grep -m 1 "description:" | wc -c
```
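To sweep these checks across many skills at once, the same idea fits in a short Python script. This assumes each skill lives in its own directory containing a SKILL.md; adjust the glob to the actual repository layout:
```python
from pathlib import Path

# Assumes one SKILL.md per skill directory; adjust the glob as needed.
for skill_md in sorted(Path(".").glob("*/SKILL.md")):
    skill_dir = skill_md.parent
    line_count = len(skill_md.read_text().splitlines())
    ref_files = [p for p in skill_dir.rglob("*.md") if p.name != "SKILL.md"]
    print(f"{skill_dir}: {line_count} lines, {len(ref_files)} reference files")
```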
## Reference
Based on official Anthropic documentation:
- [Agent Skills Overview](https://docs.anthropic.com/en/docs/agents-and-tools/agent-skills/overview)
- [Best Practices Guide](https://docs.anthropic.com/en/docs/agents-and-tools/agent-skills/best-practices)
- [Claude Code Skills](https://docs.anthropic.com/en/docs/claude-code/skills)
---
**Usage:** Use this checklist when evaluating skills found through skill-finder to provide quality scores and recommendations to users.