# Anthropic Best Practices Checklist

Evaluation criteria for assessing Claude Skill quality based on official Anthropic guidelines.

## Purpose

Use this checklist to evaluate skills found on GitHub. Each criterion contributes to the overall quality score (0-10).

## Evaluation Criteria

### 1. Description Quality (Weight: 2.0)

**What to check:**

- [ ] Description is specific, not vague
- [ ] Includes what the skill does
- [ ] Includes when to use it (trigger conditions)
- [ ] Contains key terms users would mention
- [ ] Written in third person
- [ ] Under 1024 characters
- [ ] No XML tags

**Scoring:**

- 2.0: All criteria met, very clear and specific
- 1.5: Most criteria met, good clarity
- 1.0: Basic description, somewhat vague
- 0.5: Very vague or generic
- 0.0: Missing or completely unclear

**Examples:**

**Good (2.0):**

```yaml
description: Analyze Excel spreadsheets, create pivot tables, generate charts. Use when working with Excel files, spreadsheets, tabular data, or .xlsx files.
```

**Bad (0.5):**

```yaml
description: Helps with documents
```

### 2. Name Convention (Weight: 0.5)

**What to check:**

- [ ] Uses lowercase letters, numbers, hyphens only
- [ ] Under 64 characters
- [ ] Follows naming pattern (gerund form preferred)
- [ ] Descriptive, not vague
- [ ] No reserved words ("anthropic", "claude")

**Scoring:**

- 0.5: Follows all conventions
- 0.25: Minor issues (e.g., not gerund but still clear)
- 0.0: Violates conventions or very vague

**Good:** `processing-pdfs`, `analyzing-spreadsheets`

**Bad:** `helper`, `utils`, `claude-tool`
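The mechanical parts of criteria 1 and 2 (length limits, allowed characters, reserved words, XML tags) can be verified automatically. A minimal sketch, assuming the skill uses standard YAML frontmatter and PyYAML is available; the function name and return format are illustrative, not part of any official tooling, and judgment calls such as specificity and third-person phrasing still need a human (or model) read:

```python
import re
import yaml  # PyYAML -- assumed installed (pip install pyyaml)

def check_frontmatter(skill_md_text: str) -> list[str]:
    """Return mechanical criterion 1/2 violations found in SKILL.md frontmatter."""
    issues = []
    match = re.match(r"^---\n(.*?)\n---", skill_md_text, re.DOTALL)
    if not match:
        return ["no YAML frontmatter found"]
    meta = yaml.safe_load(match.group(1)) or {}

    # Criterion 1: description (mechanical checks only)
    desc = meta.get("description") or ""
    if not desc:
        issues.append("description missing")
    elif len(desc) > 1024:
        issues.append(f"description too long ({len(desc)} > 1024 chars)")
    if re.search(r"</?[a-zA-Z][^>]*>", desc):
        issues.append("description contains XML tags")

    # Criterion 2: name convention
    name = meta.get("name") or ""
    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
        issues.append(f"name {name!r} violates lowercase/numbers/hyphens or 64-char limit")
    if any(word in name for word in ("anthropic", "claude")):
        issues.append(f"name {name!r} contains a reserved word")
    return issues
```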
### 3. Conciseness (Weight: 1.5)

**What to check:**

- [ ] SKILL.md body under 500 lines
- [ ] No unnecessary explanations
- [ ] Assumes Claude's intelligence
- [ ] Gets to the point quickly
- [ ] Additional content in separate files if needed

**Scoring:**

- 1.5: Very concise, well-edited, <300 lines
- 1.0: Reasonable length, <500 lines
- 0.5: Long but not excessive, 500-800 lines
- 0.0: Very verbose, >800 lines

### 4. Progressive Disclosure (Weight: 1.0)

**What to check:**

- [ ] SKILL.md serves as overview/table of contents
- [ ] Additional details in separate files
- [ ] Clear references to other files
- [ ] Files organized by domain/feature
- [ ] No deeply nested references (max 1 level deep)

**Scoring:**

- 1.0: Excellent use of progressive disclosure
- 0.75: Good organization with some references
- 0.5: Some separation, could be better
- 0.25: All content in SKILL.md, no references
- 0.0: Poorly organized or deeply nested

### 5. Examples and Workflows (Weight: 1.0)

**What to check:**

- [ ] Has concrete examples (not abstract)
- [ ] Includes code snippets
- [ ] Shows input/output pairs
- [ ] Has clear workflows for complex tasks
- [ ] Examples use real patterns, not placeholders

**Scoring:**

- 1.0: Excellent examples and clear workflows
- 0.75: Good examples, some workflows
- 0.5: Basic examples, no workflows
- 0.25: Few or abstract examples
- 0.0: No examples

### 6. Appropriate Degree of Freedom (Weight: 0.5)

**What to check:**

- [ ] Instructions match task fragility
- [ ] High freedom for flexible tasks (text instructions)
- [ ] Low freedom for fragile tasks (specific scripts)
- [ ] Clear when to use exact commands vs. adapt

**Scoring:**

- 0.5: Perfect match of freedom to task type
- 0.25: Reasonable but could be better
- 0.0: Inappropriate level (too rigid or too loose)

### 7. Dependencies Documentation (Weight: 0.5)

**What to check:**

- [ ] Required packages listed
- [ ] Installation instructions provided
- [ ] Dependencies verified as available
- [ ] No assumption of pre-installed packages

**Scoring:**

- 0.5: All dependencies documented and verified
- 0.25: Dependencies mentioned but not fully documented
- 0.0: Dependencies assumed or not mentioned

### 8. Structure and Organization (Weight: 1.0)

**What to check:**

- [ ] Clear section headings
- [ ] Logical flow of information
- [ ] Table of contents for long files
- [ ] Consistent formatting
- [ ] Unix-style paths (forward slashes)

**Scoring:**

- 1.0: Excellently organized
- 0.75: Well organized with minor issues
- 0.5: Basic organization
- 0.25: Poor organization
- 0.0: No clear structure

### 9. Error Handling (Weight: 0.5)

**What to check (for skills with scripts):**

- [ ] Scripts handle errors explicitly
- [ ] Clear error messages
- [ ] Fallback strategies provided
- [ ] Validation loops for critical operations
- [ ] No "voodoo constants"

**Scoring:**

- 0.5: Excellent error handling
- 0.25: Basic error handling
- 0.0: No error handling or punts to Claude

### 10. Avoids Anti-Patterns (Weight: 1.0)

**What to avoid:**

- [ ] Time-sensitive information
- [ ] Inconsistent terminology
- [ ] Windows-style paths
- [ ] Offering too many options without guidance
- [ ] Deeply nested references
- [ ] Vague or generic content

**Scoring:**

- 1.0: No anti-patterns
- 0.75: 1-2 minor anti-patterns
- 0.5: Multiple anti-patterns
- 0.0: Severe anti-patterns

### 11. Testing and Validation (Weight: 0.5)

**What to check:**

- [ ] Evidence of testing mentioned
- [ ] Evaluation examples provided
- [ ] Clear success criteria
- [ ] Feedback loops for quality

**Scoring:**

- 0.5: Clear testing approach
- 0.25: Some testing mentioned
- 0.0: No testing mentioned

## Scoring System

**Total possible: 10.0 points**

Each criterion's rubric above already incorporates its weight (its maximum equals the weight), so the total is a straight sum:

```
quality_score = (
    description_score              # max 2.0
    + name_score                   # max 0.5
    + conciseness_score            # max 1.5
    + progressive_disclosure_score # max 1.0
    + examples_score               # max 1.0
    + freedom_score                # max 0.5
    + dependencies_score           # max 0.5
    + structure_score              # max 1.0
    + error_handling_score         # max 0.5
    + anti_patterns_score          # max 1.0
    + testing_score                # max 0.5
)
```
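A runnable version of this calculation, including the tier cutoffs defined in the next section. A sketch only: the dictionary keys and function names are illustrative, not part of any official tooling:

```python
# Maximum score per criterion; rubric scores already include the weights.
MAX_SCORES = {
    "description": 2.0, "name": 0.5, "conciseness": 1.5,
    "progressive_disclosure": 1.0, "examples": 1.0, "freedom": 0.5,
    "dependencies": 0.5, "structure": 1.0, "error_handling": 0.5,
    "anti_patterns": 1.0, "testing": 0.5,
}  # maxima sum to 10.0

def quality_score(scores: dict[str, float]) -> float:
    """Sum per-criterion rubric scores, clamping each to its maximum."""
    return sum(min(scores.get(k, 0.0), mx) for k, mx in MAX_SCORES.items())

def tier(score: float) -> str:
    """Map a 0-10 score to the quality tiers defined below."""
    if score >= 8.0:
        return "Excellent"
    if score >= 6.0:
        return "Good"
    if score >= 4.0:
        return "Fair"
    return "Poor"

# Example: a strong skill with only basic examples and no testing evidence
scores = {
    "description": 2.0, "name": 0.5, "conciseness": 1.5,
    "progressive_disclosure": 1.0, "examples": 0.5, "freedom": 0.5,
    "dependencies": 0.5, "structure": 1.0, "error_handling": 0.5,
    "anti_patterns": 1.0, "testing": 0.0,
}
print(quality_score(scores), tier(quality_score(scores)))  # 9.0 Excellent
```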
## Quality Tiers

**Excellent (8.0-10.0):**

- Follows all best practices
- Clearly professional
- Ready for production use
- **Recommendation:** Strongly recommended

**Good (6.0-7.9):**

- Follows most best practices
- Minor improvements needed
- Usable but not perfect
- **Recommendation:** Recommended with minor notes

**Fair (4.0-5.9):**

- Follows some best practices
- Several improvements needed
- May work but needs review
- **Recommendation:** Consider with caution

**Poor (0.0-3.9):**

- Violates many best practices
- Significant issues
- High risk of problems
- **Recommendation:** Not recommended

## Quick Evaluation Process

For rapid assessment during search:

1. **Read SKILL.md frontmatter** (30 sec)
   - Check description quality (most important)
   - Check name convention
2. **Scan SKILL.md body** (1-2 min)
   - Check length (<500 lines?)
   - Look for examples
   - Check for references to other files
   - Note any obvious anti-patterns
3. **Check file structure** (30 sec)
   - Look for reference files
   - Check for scripts/utilities
   - Verify organization
4. **Calculate quick score** (30 sec)
   - Focus on weighted criteria
   - Estimate tier (Excellent/Good/Fair/Poor)

**Total time per skill: ~3-4 minutes**

## Automation Tips

When evaluating multiple skills:

```bash
# Check SKILL.md length
wc -l SKILL.md

# Count reference files
find . -name "*.md" -not -name "SKILL.md" | wc -l

# Flag first/second-person phrasing (descriptions should be third person)
grep -i "claude can help\|I can help\|you can use" SKILL.md

# Flag Windows-style paths (any backslash)
grep -n '\\' SKILL.md

# Check description length (single-line descriptions only)
head -10 SKILL.md | grep "description:" | wc -c
```

## Reference

Based on official Anthropic documentation:

- [Agent Skills Overview](https://docs.anthropic.com/en/docs/agents-and-tools/agent-skills/overview)
- [Best Practices Guide](https://docs.anthropic.com/en/docs/agents-and-tools/agent-skills/best-practices)
- [Claude Code Skills](https://docs.anthropic.com/en/docs/claude-code/skills)

---

**Usage:** Use this checklist when evaluating skills found through skill-finder to provide quality scores and recommendations to users.
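For batch triage across many repositories, the quick checks above can be combined into one script. A rough sketch: the `skills/` directory layout and all names are hypothetical, the length threshold follows the conciseness rubric, and the output is a pre-screen, not a substitute for scoring against the full checklist:

```python
import re
from pathlib import Path

def quick_triage(skill_dir: Path) -> dict:
    """Heuristic pre-screen of one skill repo; a triage aid, not a full score."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return {"dir": skill_dir.name, "flags": ["no SKILL.md"]}

    text = skill_md.read_text(encoding="utf-8")
    line_count = text.count("\n") + 1
    reference_files = [p for p in skill_dir.rglob("*.md") if p.name != "SKILL.md"]

    flags = []
    if not re.search(r"^description:", text, re.MULTILINE):
        flags.append("missing description")
    if line_count > 500:
        flags.append(f"body too long ({line_count} lines)")
    if "\\" in text:  # crude: also matches escapes inside code blocks
        flags.append("possible Windows-style paths")

    return {
        "dir": skill_dir.name,
        "lines": line_count,
        "reference_files": len(reference_files),  # hints at progressive disclosure
        "flags": flags,
    }

# Example: triage every immediate subdirectory of a hypothetical skills/ checkout
for d in sorted(Path("skills").iterdir()):
    if d.is_dir():
        print(quick_triage(d))
```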