| name | description |
|---|---|
| skill-evaluator | Evaluate Claude Code skills against best practices including size, structure, examples, and prompt engineering quality. Provides comprehensive analysis with actionable suggestions. |
Claude Code Skill Evaluator
Systematically evaluate Claude Code skills for quality, compliance with best practices, and optimization opportunities. Provides a detailed assessment with actionable suggestions for improvement.
Table of Contents
- Instructions
- Important Guidelines
- Requirements
- Context & Standards
Instructions
1. Find Skill
Identify the skill in the directory passed to you, or, if no directory was passed, find all skills in the user's ~/.claude/skills/ directory. For each subdirectory (excluding hidden ones), verify that it contains a SKILL.md file.
Then:
- Present the list of available skills
- Ask which skill to evaluate (or accept a skill name as input)
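The discovery step above can be sketched as follows. This is a minimal illustration, not part of the skill itself: the function name `find_skills` is hypothetical, and it assumes each skill lives in its own subdirectory directly under the skills root.

```python
from pathlib import Path


def find_skills(root: Path) -> list[str]:
    """Return names of non-hidden subdirectories of `root` that contain SKILL.md.

    `find_skills` is a hypothetical helper name used only for illustration.
    """
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir()
        and not entry.name.startswith(".")  # skip hidden directories
        and (entry / "SKILL.md").is_file()  # a skill must ship a SKILL.md
    )
```

Calling `find_skills(Path.home() / ".claude" / "skills")` would produce the list to present to the user; an empty list means no valid skills were found.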
2. Read the Skill File
Once a skill is selected, read its SKILL.md file and extract:
- Frontmatter metadata (name, description)
- Total line count
- Word count
- Character count
- Structure and sections
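A rough sketch of the extraction step is shown below. It assumes the frontmatter is a block delimited by `---` lines containing simple `key: value` pairs, and that sections are marked with `#` headings; the helper name `read_skill` is hypothetical. A real YAML parser would be more robust for multi-line values.

```python
def read_skill(text: str) -> dict:
    """Split SKILL.md text into frontmatter fields and body metrics.

    Assumes `---`-delimited frontmatter with one `key: value` pair per line.
    If the closing `---` is missing, the whole text is treated as the body.
    """
    meta: dict[str, str] = {}
    body = text
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body = "\n".join(lines[i + 1:])
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    body_lines = body.splitlines()
    return {
        "name": meta.get("name", ""),
        "description": meta.get("description", ""),
        "lines": len(body_lines),
        "words": len(body.split()),
        "chars": len(body),
        # Heading detection is naive: `#` comments inside code fences also match.
        "headings": [l.lstrip("#").strip() for l in body_lines if l.startswith("#")],
    }
```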
3. Analyze Against Best Practices
Evaluate the skill across 8 dimensions:
Dimension 1: Size & Length
Guidelines:
- Body: Under 500 lines (hard maximum)
- Name: Maximum 64 characters
- Description: Maximum 1024 characters (200 char summary preferred)
- Table of Contents: Include if over 100 lines
Assessment:
- Count total lines in SKILL.md body
- Flag if over 500 lines
- Compliment if well-sized (ideal: 100-300 lines for medium skills)
- Check if TOC exists (expected for 100+ line skills)
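The size limits above lend themselves to a mechanical check. The sketch below encodes them as constants; the function name `check_size` and the way a table of contents is detected (a line reading "Table of Contents", with or without heading markers) are assumptions for illustration.

```python
import re

# Limits taken from the guidelines in this dimension.
MAX_BODY_LINES = 500
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024
TOC_THRESHOLD_LINES = 100

# Assumption: a TOC is introduced by a line reading "Table of Contents".
TOC_PATTERN = re.compile(r"^#*\s*table of contents\s*$", re.IGNORECASE | re.MULTILINE)


def check_size(name: str, description: str, body: str) -> list[str]:
    """Return a list of human-readable size findings (empty if all limits hold)."""
    findings = []
    line_count = len(body.splitlines())
    if line_count > MAX_BODY_LINES:
        findings.append(f"body has {line_count} lines (limit {MAX_BODY_LINES})")
    if len(name) > MAX_NAME_CHARS:
        findings.append(f"name has {len(name)} characters (limit {MAX_NAME_CHARS})")
    if len(description) > MAX_DESCRIPTION_CHARS:
        findings.append(
            f"description has {len(description)} characters (limit {MAX_DESCRIPTION_CHARS})"
        )
    if line_count > TOC_THRESHOLD_LINES and not TOC_PATTERN.search(body):
        findings.append(f"no table of contents in a {line_count}-line body")
    return findings
```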
Dimension 2: Scope Definition
Guidelines:
- Narrow focus (one skill = one capability)
- Clear boundary of what the skill does and doesn't do
- No scope creep (e.g., prefer a narrow "PDF form filling" skill over a broad "document processing" skill)
Assessment:
- Does the description clearly state what the skill does?
- Are there multiple conflicting capabilities within one skill?
- Is the boundary clear to a new user?
Dimension 3: Description Quality
Guidelines:
- Third-person voice (avoid "I can" or "you can")
- Include both WHAT and WHEN TO USE
- Specific, searchable terminology
- 200 character summary ideal
Assessment:
- Voice and tone appropriate?
- Discovery terms clear? (Would users search for these terms?)
- Is "when to use" explained?
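Parts of this assessment can be approximated with simple heuristics, as in the sketch below. The regular expressions and the function name `check_description` are illustrative assumptions, not an official rule set; a keyword match cannot judge whether the "when to use" guidance is actually helpful, so treat the output as a prompt for manual review.

```python
import re

# Flags the first- and second-person phrasings named in the guidelines.
PERSONAL_VOICE = re.compile(r"\b(?:I|you)\s+can\b", re.IGNORECASE)
# Assumption: "when to use" guidance usually contains one of these cues.
WHEN_HINT = re.compile(r"\b(?:when|whenever|if the user)\b", re.IGNORECASE)
SUMMARY_TARGET_CHARS = 200


def check_description(description: str) -> list[str]:
    """Return heuristic findings about a skill description."""
    findings = []
    if PERSONAL_VOICE.search(description):
        findings.append("uses first- or second-person voice")
    if not WHEN_HINT.search(description):
        findings.append("does not say when to use the skill")
    if len(description) > SUMMARY_TARGET_CHARS:
        findings.append(f"longer than the {SUMMARY_TARGET_CHARS}-character summary target")
    return findings
```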
Dimension 4: Structure & Organization
Guidelines:
- Clear section hierarchy (headings, subsections)
- Logical flow (progressive disclosure)
- Step-by-step instructions preferred for workflows
- Rules/constraints clearly stated
Assessment:
- Is structure logical?
- Can a user easily navigate?
- Are instructions sequential or scattered?
Dimension 5: Examples
Guidelines:
- Quality over quantity
- Typical: 2-3 examples for basic skills, more for format-heavy
- Concrete (not abstract)
- Show patterns and edge cases
Assessment:
- How many examples? (count them)
- Are examples concrete and realistic?
- Do they demonstrate key patterns?
- Are there enough to show variations?
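Counting examples can be partly automated if one assumes that examples appear as fenced code blocks, which will not hold for every skill (some show examples inline or as prose). Under that assumption, a minimal counter might look like this; `count_fenced_examples` is a hypothetical name.

```python
FENCE = "`" * 3  # the three-backtick fence marker


def count_fenced_examples(text: str) -> int:
    """Count fenced code blocks, treating each opening/closing pair as one example."""
    fence_lines = sum(1 for line in text.splitlines() if line.lstrip().startswith(FENCE))
    return fence_lines // 2  # an unclosed trailing fence is ignored
```

Whether the examples are concrete and realistic still requires reading them; the count only answers the first assessment question.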
Dimension 6: Anti-Pattern Detection
Red flags (check for these):
- ❌ Windows-style paths (should use forward slashes)
- ❌ Magic numbers without justification
- ❌ Inconsistent terminology (several synonyms for the same concept)
- ❌ Time-sensitive instructions (date-dependent)
- ❌ Deeply nested file references (over 2 levels)
- ❌ Vague descriptions (missing WHAT or WHEN)
- ❌ Scope creep (trying to do too much)
- ❌ No error handling or validation steps
- ❌ No user feedback loops (for complex workflows)
- ❌ Multiple conflicting approaches for same task
Assessment:
- Count violations
- Severity of each violation
- Impact on usability
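A few of these red flags are pattern-like enough to scan for line by line. The sketch below covers two of them, Windows-style paths and date-dependent wording; the regular expressions, labels, and the name `scan_anti_patterns` are assumptions for illustration and will produce false positives and negatives. The remaining red flags (scope creep, missing validation, conflicting approaches) need judgment rather than pattern matching.

```python
import re

# Hypothetical heuristics; each maps a label to a pattern checked per line.
ANTI_PATTERNS = {
    # A drive letter ("C:\") or a backslash-separated path ending in a file extension.
    "windows-style path": re.compile(r"[A-Za-z]:\\|[\w.-]+\\[\w-]+\.\w+"),
    # Phrases such as "as of 2024" or "until 2025".
    "time-sensitive wording": re.compile(
        r"\b(?:as of|before|after|until)\s+(?:19|20)\d{2}\b", re.IGNORECASE
    ),
}


def scan_anti_patterns(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for every heuristic match, in document order."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in ANTI_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

Returning line numbers keeps the later report evidence-based, since each violation can be cited with its location.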
Dimension 7: Prompt Engineering Quality
Guidelines:
- Imperative language (verb-first instructions)
- Explicit rules with clear boundaries
- Validation loops where appropriate (especially for destructive ops)
- Clear error handling
- Assumes user is intelligent (don't over-explain)
Assessment:
- Is language imperative?
- Are there validation steps?
- How clear are the rules?
- Is error handling explicit?
Dimension 8: Completeness
Guidelines:
- Requirements listed (what's needed to use the skill)
- Edge cases acknowledged
- Limitations stated where relevant
Assessment:
- Are prerequisites clear?
- Are limitations or edge cases mentioned?
- Is scope of responsibility clear?
4. Generate Comprehensive Evaluation Report
Create a detailed evaluation report with these components:
- Executive Summary: 1-2 paragraphs covering overall assessment, key strengths, and critical issues
- Metrics: Present line count, word count, character count, and guideline compliance assessment
- Dimensional Analysis: For each of the 8 dimensions:
  - Status indicator (✓ Pass / ⚠ Warning / ❌ Fail)
  - 1-2 sentence assessment explaining the rating
- Detected Issues: Organize by severity:
  - Critical Issues (must fix) - any ❌ Fail items with explanation
  - Warnings (should address) - any ⚠ Warning items with explanation
  - Observations (minor items worth noting)
- Comparative Analysis: Compare the skill against official skills repository patterns with examples and rationale
- Actionable Suggestions: Numbered list of specific improvements, prioritized by impact:
  - High Priority (do this first)
  - Medium Priority (nice to have)
  - Low Priority (optional refinements)

  Each suggestion should include concrete rationale, not vague guidance.
- Overall Assessment:
  - Professional verdict on production-readiness
  - Clear recommendation (Keep as-is / Minor tweaks / Significant refactor / Major restructure)
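One way to keep the report components consistent is to record each dimension's result in a small data structure and derive the issue lists from it. The sketch below is an assumption about how that could be organized; the class and function names are hypothetical, and only the three status labels come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "✓ Pass"
    WARNING = "⚠ Warning"
    FAIL = "❌ Fail"


@dataclass
class DimensionResult:
    dimension: str   # e.g. "Size & Length"
    status: Status
    assessment: str  # the 1-2 sentence explanation of the rating


def group_issues(results: list[DimensionResult]) -> dict[str, list[DimensionResult]]:
    """Sort failed and warned dimensions into the report's severity buckets."""
    return {
        "Critical Issues": [r for r in results if r.status is Status.FAIL],
        "Warnings": [r for r in results if r.status is Status.WARNING],
    }


def render_dimension(result: DimensionResult) -> str:
    """Format one line of the Dimensional Analysis section."""
    return f"{result.status.value} {result.dimension}: {result.assessment}"
```

Observations, the comparative analysis, and the final recommendation are left out because they depend on judgment that a status flag cannot capture.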
5. Deliver Report to User
Present the complete evaluation report to the user in a clear, formatted structure. Ensure:
- Status indicators are visible (✓ Pass / ⚠ Warning / ❌ Fail)
- Actionable suggestions are specific (not vague)
- Rationale is explained for each issue
- Prioritization is clear
Important Guidelines
- Be brutally honest: Point out real issues, don't sugarcoat
- Specific over vague: "The examples don't show error handling" not "examples could be better"
- Professional tone: Constructive criticism, not harsh
- Evidence-based: Reference specific lines or patterns from the skill
- Proportional feedback: Don't over-critique minor issues
- Future-focused: Suggest improvements rather than pass judgment
Requirements
- User has installed skills in ~/.claude/skills/
- Target skill has a valid SKILL.md file with frontmatter
- User accepts the detailed, honest evaluation
Context & Standards
This evaluator uses best practices from:
- Official Anthropic Claude Code Skills documentation
- Analysis of official skills repository patterns
- Professional technical writing standards
- Prompt engineering best practices for LLM interactions
All assessments are made relative to official guidelines, not arbitrary standards.