Anthropic Best Practices Checklist

Evaluation criteria for assessing Claude Skill quality based on official Anthropic guidelines.

Purpose

Use this checklist to evaluate skills found on GitHub. Each criterion contributes to the overall quality score (0-10).

Evaluation Criteria

1. Description Quality (Weight: 2.0)

What to check:

  • Description is specific, not vague
  • Includes what the skill does
  • Includes when to use it (trigger conditions)
  • Contains key terms users would mention
  • Written in third person
  • Under 1024 characters
  • No XML tags

Scoring:

  • 2.0: All criteria met, very clear and specific
  • 1.5: Most criteria met, good clarity
  • 1.0: Basic description, somewhat vague
  • 0.5: Very vague or generic
  • 0.0: Missing or completely unclear

Examples:

Good (2.0):

description: Analyze Excel spreadsheets, create pivot tables, generate charts. Use when working with Excel files, spreadsheets, tabular data, or .xlsx files.

Bad (0.5):

description: Helps with documents
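The mechanically verifiable parts of the description criteria (presence, length, XML tags, trigger wording) can be linted automatically; specificity and third-person voice still need human judgment. A minimal sketch (the function name and messages are hypothetical, not part of any official tooling):

```python
import re

def check_description(desc: str) -> list[str]:
    """Return a list of checklist violations for a skill description.

    Covers only the mechanical rules: non-empty, under 1024 characters,
    no XML tags, and some trigger wording ("... when ...").
    """
    issues = []
    if not desc.strip():
        issues.append("missing description")
    if len(desc) > 1024:
        issues.append("over 1024 characters")
    if re.search(r"<[^>]+>", desc):
        issues.append("contains XML tags")
    if not re.search(r"\bwhen\b", desc, re.IGNORECASE):
        issues.append("no trigger conditions ('use when ...')")
    return issues

good = ("Analyze Excel spreadsheets, create pivot tables, generate charts. "
        "Use when working with Excel files, spreadsheets, tabular data, or .xlsx files.")
print(check_description(good))                    # []
print(check_description("Helps with documents"))  # flags missing trigger conditions
```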

2. Name Convention (Weight: 0.5)

What to check:

  • Uses lowercase letters, numbers, hyphens only
  • Under 64 characters
  • Follows naming pattern (gerund form preferred)
  • Descriptive, not vague
  • No reserved words ("anthropic", "claude")

Scoring:

  • 0.5: Follows all conventions
  • 0.25: Minor issues (e.g., not gerund but still clear)
  • 0.0: Violates conventions or very vague

Good: processing-pdfs, analyzing-spreadsheets
Bad: helper, utils, claude-tool
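Everything here except the gerund preference is mechanical, so a single regex check covers it. A sketch (the function name is hypothetical):

```python
import re

RESERVED = ("anthropic", "claude")

def valid_skill_name(name: str) -> bool:
    """Check the mechanical naming rules: lowercase letters, digits, and
    hyphens only (no leading/trailing/double hyphens), under 64 characters,
    no reserved words. Gerund form is stylistic and not checked here."""
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        return False
    if len(name) >= 64:
        return False
    return not any(word in name for word in RESERVED)

print(valid_skill_name("processing-pdfs"))  # True
print(valid_skill_name("claude-tool"))      # False (reserved word)
print(valid_skill_name("Helper"))           # False (uppercase)
```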

3. Conciseness (Weight: 1.5)

What to check:

  • SKILL.md body under 500 lines
  • No unnecessary explanations
  • Assumes Claude's intelligence
  • Gets to the point quickly
  • Additional content in separate files if needed

Scoring:

  • 1.5: Very concise, well-edited, <300 lines
  • 1.0: Reasonable length, <500 lines
  • 0.5: Long but not excessive, 500-800 lines
  • 0.0: Very verbose, >800 lines
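The line-count bands above translate directly into a lookup; note that the top band also requires the file to actually be well-edited, which a line count alone cannot confirm. A minimal sketch:

```python
def conciseness_score(line_count: int) -> float:
    """Map SKILL.md body length to the conciseness score bands above.
    A sub-300-line file earns 1.5 only if it is also well-edited."""
    if line_count < 300:
        return 1.5
    if line_count < 500:
        return 1.0
    if line_count <= 800:
        return 0.5
    return 0.0

print(conciseness_score(250))  # 1.5
print(conciseness_score(600))  # 0.5
```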

4. Progressive Disclosure (Weight: 1.0)

What to check:

  • SKILL.md serves as overview/table of contents
  • Additional details in separate files
  • Clear references to other files
  • Files organized by domain/feature
  • No deeply nested references (max 1 level deep)

Scoring:

  • 1.0: Excellent use of progressive disclosure
  • 0.75: Good organization with some references
  • 0.5: Some separation, could be better
  • 0.25: All content in SKILL.md, no references
  • 0.0: Poorly organized or deeply nested

5. Examples and Workflows (Weight: 1.0)

What to check:

  • Has concrete examples (not abstract)
  • Includes code snippets
  • Shows input/output pairs
  • Has clear workflows for complex tasks
  • Examples use real patterns, not placeholders

Scoring:

  • 1.0: Excellent examples and clear workflows
  • 0.75: Good examples, some workflows
  • 0.5: Basic examples, no workflows
  • 0.25: Few or abstract examples
  • 0.0: No examples

6. Appropriate Degree of Freedom (Weight: 0.5)

What to check:

  • Instructions match task fragility
  • High freedom for flexible tasks (text instructions)
  • Low freedom for fragile tasks (specific scripts)
  • Clear when to use exact commands vs adapt

Scoring:

  • 0.5: Perfect match of freedom to task type
  • 0.25: Reasonable but could be better
  • 0.0: Inappropriate level (too rigid or too loose)

7. Dependencies Documentation (Weight: 0.5)

What to check:

  • Required packages listed
  • Installation instructions provided
  • Dependencies verified as available
  • No assumption of pre-installed packages

Scoring:

  • 0.5: All dependencies documented and verified
  • 0.25: Dependencies mentioned but not fully documented
  • 0.0: Dependencies assumed or not mentioned

8. Structure and Organization (Weight: 1.0)

What to check:

  • Clear section headings
  • Logical flow of information
  • Table of contents for long files
  • Consistent formatting
  • Unix-style paths (forward slashes)

Scoring:

  • 1.0: Excellently organized
  • 0.75: Well organized with minor issues
  • 0.5: Basic organization
  • 0.25: Poor organization
  • 0.0: No clear structure

9. Error Handling (Weight: 0.5)

What to check (for skills with scripts):

  • Scripts handle errors explicitly
  • Clear error messages
  • Fallback strategies provided
  • Validation loops for critical operations
  • No "voodoo constants"

Scoring:

  • 0.5: Excellent error handling
  • 0.25: Basic error handling
  • 0.0: No error handling or punts to Claude

10. Avoids Anti-Patterns (Weight: 1.0)

What to avoid:

  • Time-sensitive information
  • Inconsistent terminology
  • Windows-style paths
  • Offering too many options without guidance
  • Deeply nested references
  • Vague or generic content

Scoring:

  • 1.0: No anti-patterns
  • 0.75: 1-2 minor anti-patterns
  • 0.5: Multiple anti-patterns
  • 0.0: Severe anti-patterns

11. Testing and Validation (Weight: 0.5)

What to check:

  • Evidence of testing mentioned
  • Evaluation examples provided
  • Clear success criteria
  • Feedback loops for quality

Scoring:

  • 0.5: Clear testing approach
  • 0.25: Some testing mentioned
  • 0.0: No testing mentioned

Scoring System

Total possible: 10.0 points

Each criterion's banded score already incorporates its weight (a full-marks band equals the criterion's weight), so the overall score is the straight sum of the eleven banded scores:

quality_score = (
  description_score +             # 0.0-2.0
  name_score +                    # 0.0-0.5
  conciseness_score +             # 0.0-1.5
  progressive_disclosure_score +  # 0.0-1.0
  examples_score +                # 0.0-1.0
  freedom_score +                 # 0.0-0.5
  dependencies_score +            # 0.0-0.5
  structure_score +               # 0.0-1.0
  error_handling_score +          # 0.0-0.5
  anti_patterns_score +           # 0.0-1.0
  testing_score                   # 0.0-0.5
)
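Because a full-marks band equals its criterion's weight, the eleven maximums sum to exactly 10.0, which makes a range check cheap. A minimal sketch of the calculation (the example scores are hypothetical):

```python
# Criterion weights from the checklist; a criterion's top band equals its weight.
WEIGHTS = {
    "description": 2.0, "name": 0.5, "conciseness": 1.5,
    "progressive_disclosure": 1.0, "examples": 1.0, "freedom": 0.5,
    "dependencies": 0.5, "structure": 1.0, "error_handling": 0.5,
    "anti_patterns": 1.0, "testing": 0.5,
}

def quality_score(scores: dict[str, float]) -> float:
    """Sum the banded per-criterion scores, rejecting any score outside
    its 0..weight range."""
    for name, value in scores.items():
        if not 0.0 <= value <= WEIGHTS[name]:
            raise ValueError(f"{name} score {value} outside 0..{WEIGHTS[name]}")
    return sum(scores.values())

# Hypothetical evaluation of a reasonably well-documented skill:
example = {
    "description": 2.0, "name": 0.5, "conciseness": 1.0,
    "progressive_disclosure": 0.75, "examples": 0.75, "freedom": 0.5,
    "dependencies": 0.25, "structure": 1.0, "error_handling": 0.25,
    "anti_patterns": 0.75, "testing": 0.25,
}
print(quality_score(example))  # 8.0
```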

Quality Tiers

Excellent (8.0-10.0):

  • Follows all best practices
  • Clearly professional
  • Ready for production use
  • Recommendation: Strongly recommended

Good (6.0-7.9):

  • Follows most best practices
  • Minor improvements needed
  • Usable but not perfect
  • Recommendation: Recommended with minor notes

Fair (4.0-5.9):

  • Follows some best practices
  • Several improvements needed
  • May work but needs review
  • Recommendation: Consider with caution

Poor (0.0-3.9):

  • Violates many best practices
  • Significant issues
  • High risk of problems
  • Recommendation: Not recommended
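The tier boundaries above map to a simple threshold function (a sketch; the labels follow the tiers as listed):

```python
def quality_tier(score: float) -> str:
    """Map a 0-10 quality score to its tier label."""
    if score >= 8.0:
        return "Excellent"
    if score >= 6.0:
        return "Good"
    if score >= 4.0:
        return "Fair"
    return "Poor"

print(quality_tier(8.0))  # Excellent
print(quality_tier(5.2))  # Fair
```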

Quick Evaluation Process

For rapid assessment during search:

  1. Read SKILL.md frontmatter (30 sec)

    • Check description quality (most important)
    • Check name convention
  2. Scan SKILL.md body (1-2 min)

    • Check length (<500 lines?)
    • Look for examples
    • Check for references to other files
    • Note any obvious anti-patterns
  3. Check file structure (30 sec)

    • Look for reference files
    • Check for scripts/utilities
    • Verify organization
  4. Calculate quick score (30 sec)

    • Focus on weighted criteria
    • Estimate tier (Excellent/Good/Fair/Poor)

Total time per skill: ~3-4 minutes

Automation Tips

When evaluating multiple skills:

# Check SKILL.md length
wc -l SKILL.md

# Count reference files
find . -name "*.md" -not -name "SKILL.md" | wc -l

# Check for common anti-patterns
grep -i "claude can help\|I can help\|you can use" SKILL.md

# Flag Windows-style backslash paths (the checklist requires forward slashes)
grep -n '\\' SKILL.md

# Check description length (counts the whole frontmatter line; the limit is 1024)
grep -m1 "^description:" SKILL.md | wc -c

Reference

Based on the official Anthropic skill-authoring documentation.


Usage: Use this checklist when evaluating skills found through skill-finder to provide quality scores and recommendations to users.