---
name: skill-evaluator
description: Evaluate Claude Code skills against best practices including size, structure, examples, and prompt engineering quality. Provides comprehensive analysis with actionable suggestions.
---

# Claude Code Skill Evaluator

Systematically evaluate Claude Code skills for quality, compliance with best practices, and optimization opportunities. Provides detailed assessment with actionable suggestions for improvement.

## Table of Contents

- [Instructions](#instructions)
  - [1. Find Skill](#1-find-skill)
  - [2. Read the Skill File](#2-read-the-skill-file)
  - [3. Analyze Against Best Practices](#3-analyze-against-best-practices)
    - [Dimension 1: Size & Length](#dimension-1-size--length)
    - [Dimension 2: Scope Definition](#dimension-2-scope-definition)
    - [Dimension 3: Description Quality](#dimension-3-description-quality)
    - [Dimension 4: Structure & Organization](#dimension-4-structure--organization)
    - [Dimension 5: Examples](#dimension-5-examples)
    - [Dimension 6: Anti-Pattern Detection](#dimension-6-anti-pattern-detection)
    - [Dimension 7: Prompt Engineering Quality](#dimension-7-prompt-engineering-quality)
    - [Dimension 8: Completeness](#dimension-8-completeness)
  - [4. Generate Comprehensive Evaluation Report](#4-generate-comprehensive-evaluation-report)
  - [5. Deliver Report to User](#5-deliver-report-to-user)
- [Important Guidelines](#important-guidelines)
- [Requirements](#requirements)
- [Context & Standards](#context--standards)

## Instructions

### 1. Find Skill

Identify the skill in the directory provided to you, or enumerate all skills in the user's `~/.claude/skills/` directory. For each directory (excluding hidden directories), verify it contains a `SKILL.md` file.

Present the user with:

- The list of available skills
- A prompt asking which skill to evaluate (or accept a skill name as input)

### 2. Read the Skill File

Once a skill is selected, read its `SKILL.md` file and extract:

- Frontmatter metadata (name, description)
- Total line count
- Word count
- Character count
- Structure and sections
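The discovery and extraction steps above can be sketched in code. Below is a minimal Python sketch, assuming YAML frontmatter delimited by `---` fences and PyYAML available; the function names and the returned dict shape are illustrative assumptions, not part of any official Claude Code API.

```python
from pathlib import Path

import yaml  # PyYAML, assumed to be installed

SKILLS_DIR = Path.home() / ".claude" / "skills"


def find_skills() -> list[Path]:
    """Step 1: list non-hidden skill directories containing a SKILL.md."""
    return [
        d for d in SKILLS_DIR.iterdir()
        if d.is_dir() and not d.name.startswith(".") and (d / "SKILL.md").exists()
    ]


def read_skill(skill_dir: Path) -> dict:
    """Step 2: extract frontmatter metadata plus basic size metrics."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    meta, body = {}, text
    if text.startswith("---"):
        _, front, body = text.split("---", 2)  # text between the two fences
        meta = yaml.safe_load(front) or {}
    return {
        "name": meta.get("name", ""),
        "description": meta.get("description", ""),
        "line_count": len(body.splitlines()),
        "word_count": len(body.split()),
        "char_count": len(body),
    }
```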
### 3. Analyze Against Best Practices

Evaluate the skill across **8 dimensions**:

#### Dimension 1: Size & Length

**Guidelines:**

- Body: Under 500 lines (hard maximum)
- Name: Maximum 64 characters
- Description: Maximum 1024 characters (200 char summary preferred)
- Table of Contents: Include if over 100 lines

**Assessment:**

- Count total lines in the SKILL.md body
- Flag if over 500 lines
- Compliment if well-sized (ideal: 100-300 lines for medium skills)
- Check if a TOC exists (expected for 100+ line skills)

#### Dimension 2: Scope Definition

**Guidelines:**

- Narrow focus (one skill = one capability)
- Clear boundary of what the skill does and doesn't do
- No scope creep (e.g., "document processing" → "PDF form filling")

**Assessment:**

- Does the description clearly state what the skill does?
- Are there multiple conflicting capabilities within one skill?
- Is the boundary clear to a new user?

#### Dimension 3: Description Quality

**Guidelines:**

- Third-person voice (avoid "I can" or "you can")
- Include both WHAT and WHEN TO USE
- Specific, searchable terminology
- 200 character summary ideal

**Assessment:**

- Voice and tone appropriate?
- Discovery terms clear? (Would users search for these terms?)
- Is "when to use" explained?

#### Dimension 4: Structure & Organization

**Guidelines:**

- Clear section hierarchy (headings, subsections)
- Logical flow (progressive disclosure)
- Step-by-step instructions preferred for workflows
- Rules/constraints clearly stated

**Assessment:**

- Is the structure logical?
- Can a user easily navigate?
- Are instructions sequential or scattered?

#### Dimension 5: Examples

**Guidelines:**

- Quality over quantity
- Typical: 2-3 examples for basic skills, more for format-heavy skills
- Concrete (not abstract)
- Show patterns and edge cases

**Assessment:**

- How many examples? (count them)
- Are examples concrete and realistic?
- Do they demonstrate key patterns?
- Are there enough to show variations?

#### Dimension 6: Anti-Pattern Detection

**Red flags (check for these):**

- ❌ Windows-style paths (should use forward slashes)
- ❌ Magic numbers without justification
- ❌ Vague terminology (inconsistent synonyms)
- ❌ Time-sensitive instructions (date-dependent)
- ❌ Deeply nested file references (over 2 levels)
- ❌ Vague descriptions (missing WHAT or WHEN)
- ❌ Scope creep (trying to do too much)
- ❌ No error handling or validation steps
- ❌ No user feedback loops (for complex workflows)
- ❌ Multiple conflicting approaches for the same task

**Assessment:**

- Count violations
- Severity of each violation
- Impact on usability

#### Dimension 7: Prompt Engineering Quality

**Guidelines:**

- Imperative language (verb-first instructions)
- Explicit rules with clear boundaries
- Validation loops where appropriate (especially for destructive operations)
- Clear error handling
- Assumes the user is intelligent (don't over-explain)

**Assessment:**

- Is the language imperative?
- Are there validation steps?
- How clear are the rules?
- Is error handling explicit?

#### Dimension 8: Completeness

**Guidelines:**

- Requirements listed (what's needed to use the skill)
- Edge cases acknowledged
- Limitations stated where relevant

**Assessment:**

- Are prerequisites clear?
- Are limitations or edge cases mentioned?
- Is the scope of responsibility clear?
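Continuing the sketch above, here is a hedged example of how one dimension might be scored using the metrics dict from `read_skill`. The numeric thresholds come from Dimension 1; the `(status, assessment)` return shape is an assumption for illustration, not a prescribed interface. The other seven dimensions are judgment calls better left to the model; only the size rules reduce cleanly to numeric checks.

```python
def check_size(metrics: dict) -> tuple[str, str]:
    """Dimension 1: score size and length against the guidelines above."""
    lines = metrics["line_count"]
    if lines > 500:
        return "❌ Fail", f"Body is {lines} lines; the hard maximum is 500."
    if len(metrics["name"]) > 64:
        return "❌ Fail", "Name exceeds the 64-character maximum."
    if len(metrics["description"]) > 1024:
        return "❌ Fail", "Description exceeds the 1024-character maximum."
    if lines > 300:
        return "⚠ Warning", f"Body is {lines} lines; 100-300 is ideal for medium skills."
    return "✓ Pass", f"Body is {lines} lines, within the size guidelines."
```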
### 4. Generate Comprehensive Evaluation Report

Create a detailed evaluation report with these components (a minimal rendering sketch appears at the end of this document):

1. **Executive Summary**: 1-2 paragraphs covering overall assessment, key strengths, and critical issues
2. **Metrics**: Present line count, word count, character count, and guideline compliance assessment
3. **Dimensional Analysis**: For each of the 8 dimensions:
   - Status indicator (✓ Pass / ⚠ Warning / ❌ Fail)
   - 1-2 sentence assessment explaining the rating
4. **Detected Issues**: Organize by severity:
   - Critical Issues (must fix): any ❌ Fail items, with explanation
   - Warnings (should address): any ⚠ Warning items, with explanation
   - Observations: minor items worth noting
5. **Comparative Analysis**: Compare the skill against official skills repository patterns, with examples and rationale
6. **Actionable Suggestions**: Numbered list of specific improvements, prioritized by impact:
   - High Priority (do this first)
   - Medium Priority (nice to have)
   - Low Priority (optional refinements)

   Each suggestion should include concrete rationale, not vague guidance.
7. **Overall Assessment**:
   - Professional verdict on production-readiness
   - Clear recommendation (Keep as-is / Minor tweaks / Significant refactor / Major restructure)

### 5. Deliver Report to User

Present the complete evaluation report to the user in a clear, formatted structure.

Ensure:

- Status indicators are visible (✓ Pass / ⚠ Warning / ❌ Fail)
- Actionable suggestions are specific (not vague)
- Rationale is explained for each issue
- Prioritization is clear

## Important Guidelines

- **Be brutally honest**: Point out real issues; don't sugarcoat
- **Specific over vague**: "The examples don't show error handling", not "examples could be better"
- **Professional tone**: Constructive criticism, not harshness
- **Evidence-based**: Reference specific lines or patterns from the skill
- **Proportional feedback**: Don't over-critique minor issues
- **Future-focused**: Suggest improvements rather than passing judgment

## Requirements

- User has skills installed in `~/.claude/skills/`
- Target skill has a valid `SKILL.md` file with frontmatter
- User accepts a detailed, honest evaluation

## Context & Standards

This evaluator applies best practices drawn from:

- Official Anthropic Claude Code Skills documentation
- Analysis of official skills repository patterns
- Professional technical writing standards
- Prompt engineering best practices for LLM interactions

All assessments are made against **official guidelines**, not arbitrary standards.
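As referenced in step 4, here is a minimal sketch of how the Dimensional Analysis section of the report might be rendered. `render_report` and the shape of `results` are illustrative assumptions, not a required format.

```python
def render_report(skill_name: str, results: dict[str, tuple[str, str]]) -> str:
    """Assemble the Dimensional Analysis section of the evaluation report.

    `results` maps dimension names to (status, assessment) pairs, e.g.
    {"Size & Length": ("✓ Pass", "Body is 142 lines, within guidelines.")}.
    """
    out = [f"# Evaluation Report: {skill_name}", "", "## Dimensional Analysis", ""]
    for dimension, (status, assessment) in results.items():
        out.append(f"- **{dimension}**: {status}. {assessment}")
    return "\n".join(out)
```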