name: skill-evaluator
description: Evaluate Claude Code skills against best practices including size, structure, examples, and prompt engineering quality. Provides comprehensive analysis with actionable suggestions.

Claude Code Skill Evaluator

Systematically evaluate Claude Code skills for quality, compliance with best practices, and optimization opportunities. Provides detailed assessment with actionable suggestions for improvement.

Table of Contents

  • Instructions
  • Important Guidelines
  • Requirements
  • Context & Standards

Instructions

1. Find Skill

Identify the skill in the directory provided to you, or find all skills in the user's ~/.claude/skills/ directory. For each directory (excluding hidden directories), verify it contains a SKILL.md file.

Present the user with:

  • A list of available skills
  • A prompt asking which skill to evaluate (or accept the skill name as input)
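The discovery step above can be sketched as follows (a minimal illustration, assuming Python and the ~/.claude/skills/ layout described here; `discover_skills` is a hypothetical helper name, not part of any official API):

```python
from pathlib import Path

def discover_skills(skills_dir: str = "~/.claude/skills") -> list[str]:
    """Return names of directories that contain a SKILL.md file."""
    root = Path(skills_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir()
        and not d.name.startswith(".")      # skip hidden directories
        and (d / "SKILL.md").is_file()      # only directories with a skill file
    )
```

Directories without a SKILL.md are silently skipped, matching the verification rule above.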

2. Read the Skill File

Once a skill is selected, read its SKILL.md file and extract:

  • Frontmatter metadata (name, description)
  • Total line count
  • Word count
  • Character count
  • Structure and sections
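A rough sketch of this extraction step (an illustrative helper, assuming a simple `---`-delimited YAML-style frontmatter with one `key: value` per line; `read_skill` is a hypothetical name):

```python
import re

def read_skill(text: str) -> dict:
    """Extract frontmatter fields and basic size metrics from SKILL.md text."""
    meta = {}
    body = text
    # Frontmatter is the block between the leading pair of --- delimiters.
    m = re.match(r"^---\n(.*?)\n---\n?", text, re.DOTALL)
    if m:
        body = text[m.end():]
        for line in m.group(1).splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return {
        "name": meta.get("name", ""),
        "description": meta.get("description", ""),
        "lines": len(body.splitlines()),
        "words": len(body.split()),
        "chars": len(body),
    }
```

A real frontmatter parser (e.g. a YAML library) would be more robust; this sketch only shows which fields and metrics the evaluation needs.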

3. Analyze Against Best Practices

Evaluate the skill across 8 dimensions:

Dimension 1: Size & Length

Guidelines:

  • Body: Under 500 lines (hard maximum)
  • Name: Maximum 64 characters
  • Description: Maximum 1024 characters (200 char summary preferred)
  • Table of Contents: Include if over 100 lines

Assessment:

  • Count total lines in SKILL.md body
  • Flag if over 500 lines
  • Commend if well-sized (ideal: 100-300 lines for medium skills)
  • Check if TOC exists (expected for 100+ line skills)
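The size checks above can be expressed directly against the stated limits (a minimal sketch; `check_size` is a hypothetical helper name):

```python
def check_size(name: str, description: str, body_lines: int) -> list[str]:
    """Flag violations of the size guidelines; empty list means compliant."""
    issues = []
    if body_lines > 500:
        issues.append(f"body is {body_lines} lines (hard maximum is 500)")
    if len(name) > 64:
        issues.append(f"name is {len(name)} chars (maximum is 64)")
    if len(description) > 1024:
        issues.append(f"description is {len(description)} chars (maximum is 1024)")
    elif len(description) > 200:
        issues.append("description exceeds the preferred 200-char summary")
    return issues
```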

Dimension 2: Scope Definition

Guidelines:

  • Narrow focus (one skill = one capability)
  • Clear boundary of what the skill does and doesn't do
  • No scope creep (prefer a narrow scope like "PDF form filling" over a broad one like "document processing")

Assessment:

  • Does the description clearly state what the skill does?
  • Are there multiple conflicting capabilities within one skill?
  • Is the boundary clear to a new user?

Dimension 3: Description Quality

Guidelines:

  • Third-person voice (avoid "I can" or "you can")
  • Include both WHAT and WHEN TO USE
  • Specific, searchable terminology
  • 200 character summary ideal

Assessment:

  • Voice and tone appropriate?
  • Discovery terms clear? (Would users search for these terms?)
  • Is "when to use" explained?
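The voice and "when to use" checks lend themselves to simple heuristics (an illustrative sketch only; the regexes are rough approximations of these guidelines, and `check_description` is a hypothetical helper name):

```python
import re

def check_description(description: str) -> list[str]:
    """Heuristic checks for voice and 'when to use' coverage."""
    issues = []
    # Third-person voice: flag common first/second-person constructions.
    if re.search(r"\b(I can|I will|you can|you will)\b", description, re.IGNORECASE):
        issues.append("uses first/second person; prefer third-person voice")
    # 'When to use': look for trigger words that introduce usage conditions.
    if not re.search(r"\b(use when|when|for)\b", description, re.IGNORECASE):
        issues.append("does not explain when to use the skill")
    return issues
```

Heuristics like these only surface candidates; the final judgment on tone and discoverability stays with the evaluator.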

Dimension 4: Structure & Organization

Guidelines:

  • Clear section hierarchy (headings, subsections)
  • Logical flow (progressive disclosure)
  • Step-by-step instructions preferred for workflows
  • Rules/constraints clearly stated

Assessment:

  • Is structure logical?
  • Can a user easily navigate?
  • Are instructions sequential or scattered?

Dimension 5: Examples

Guidelines:

  • Quality over quantity
  • Typical: 2-3 examples for basic skills, more for format-heavy skills
  • Concrete (not abstract)
  • Show patterns and edge cases

Assessment:

  • How many examples? (count them)
  • Are examples concrete and realistic?
  • Do they demonstrate key patterns?
  • Are there enough to show variations?
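Counting examples can be approximated by counting fenced code blocks (a rough proxy, since examples may also appear as prose; `count_examples` is a hypothetical helper name):

```python
import re

def count_examples(body: str) -> int:
    """Count fenced code blocks as a rough proxy for example count."""
    fences = re.findall(r"^```", body, re.MULTILINE)
    return len(fences) // 2   # one example = one opening fence + one closing fence
```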

Dimension 6: Anti-Pattern Detection

Red flags (check for these):

  • Windows-style paths (should use forward slashes)
  • Magic numbers without justification
  • Vague terminology (inconsistent synonyms)
  • Time-sensitive instructions (date-dependent)
  • Deeply nested file references (over 2 levels)
  • Vague descriptions (missing WHAT or WHEN)
  • Scope creep (trying to do too much)
  • No error handling or validation steps
  • No user feedback loops (for complex workflows)
  • Multiple conflicting approaches for same task
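Some of these red flags are mechanically detectable; others (scope creep, conflicting approaches) need judgment. A sketch of the mechanical part (the patterns shown are illustrative, not exhaustive, and `scan_anti_patterns` is a hypothetical helper name):

```python
import re

# Pattern label -> regex; each match is one red-flag occurrence.
ANTI_PATTERNS = {
    "Windows-style path": r"[A-Za-z]:\\[\w\\]",
    "time-sensitive wording": r"\b(as of|currently|this year|recently)\b",
}

def scan_anti_patterns(body: str) -> dict[str, int]:
    """Return a count of matches per red-flag pattern (zero counts omitted)."""
    hits = {}
    for label, pattern in ANTI_PATTERNS.items():
        count = len(re.findall(pattern, body, re.IGNORECASE))
        if count:
            hits[label] = count
    return hits
```

Each hit should still be verified in context before it is reported as a violation.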

Assessment:

  • Count violations
  • Severity of each violation
  • Impact on usability

Dimension 7: Prompt Engineering Quality

Guidelines:

  • Imperative language (verb-first instructions)
  • Explicit rules with clear boundaries
  • Validation loops where appropriate (especially for destructive ops)
  • Clear error handling
  • Assumes user is intelligent (don't over-explain)

Assessment:

  • Is language imperative?
  • Are there validation steps?
  • How clear are the rules?
  • Is error handling explicit?

Dimension 8: Completeness

Guidelines:

  • Requirements listed (what's needed to use the skill)
  • Edge cases acknowledged
  • Limitations stated where relevant

Assessment:

  • Are prerequisites clear?
  • Are limitations or edge cases mentioned?
  • Is scope of responsibility clear?

4. Generate Comprehensive Evaluation Report

Create a detailed evaluation report with these components:

  1. Executive Summary: 1-2 paragraphs covering overall assessment, key strengths, and critical issues

  2. Metrics: Present line count, word count, character count, and guideline compliance assessment

  3. Dimensional Analysis: For each of the 8 dimensions:

    • Status indicator (✓ Pass / ⚠ Warning / ✗ Fail)
    • 1-2 sentence assessment explaining the rating
  4. Detected Issues: Organize by severity:

    • Critical Issues (must fix) - any ✗ Fail items with explanation
    • Warnings (should address) - any ⚠ Warning items with explanation
    • Observations (minor items worth noting)
  5. Comparative Analysis: Compare the skill against official skills repository patterns with examples and rationale

  6. Actionable Suggestions: Numbered list of specific improvements, prioritized by impact:

    • High Priority (do this first)
    • Medium Priority (nice to have)
    • Low Priority (optional refinements)

    Each suggestion should include concrete rationale, not vague guidance.

  7. Overall Assessment:

    • Professional verdict on production-readiness
    • Clear recommendation (Keep as-is / Minor tweaks / Significant refactor / Major restructure)

5. Deliver Report to User

Present the complete evaluation report to the user in a clear, formatted structure. Ensure:

  • Status indicators are visible (✓ Pass / ⚠ Warning / ✗ Fail)
  • Actionable suggestions are specific (not vague)
  • Rationale is explained for each issue
  • Prioritization is clear

Important Guidelines

  • Be brutally honest: Point out real issues, don't sugarcoat
  • Specific over vague: "The examples don't show error handling" not "examples could be better"
  • Professional tone: Constructive criticism, not harsh
  • Evidence-based: Reference specific lines or patterns from the skill
  • Proportional feedback: Don't over-critique minor issues
  • Future-focused: Suggest improvements rather than passing judgment

Requirements

  • User has installed skills in ~/.claude/skills/
  • Target skill has a valid SKILL.md file with frontmatter
  • User accepts the detailed, honest evaluation

Context & Standards

This evaluator uses best practices from:

  • Official Anthropic Claude Code Skills documentation
  • Analysis of official skills repository patterns
  • Professional technical writing standards
  • Prompt engineering best practices for LLM interactions

All assessments are comparative to official guidelines, not arbitrary standards.