Initial commit

2025-11-30 08:37:35 +08:00
commit 86bde46ab1
4 changed files with 278 additions and 0 deletions
--- a/skills/skill-evaluator/SKILL.md
+++ b/skills/skill-evaluator/SKILL.md
@@ -0,0 +1,218 @@
+---
+name: skill-evaluator
+description: Evaluate Claude Code skills against best practices including size, structure, examples, and prompt engineering quality. Provides comprehensive analysis with actionable suggestions.
+---
+
+# Claude Code Skill Evaluator
+
+Systematically evaluate Claude Code skills for quality, compliance with best practices, and optimization opportunities. Provides detailed assessment with actionable suggestions for improvement.
+
+## Table of Contents
+
+- [Instructions](#instructions)
+  - [1. Find Skill](#1-find-skill)
+  - [2. Read the Skill File](#2-read-the-skill-file)
+  - [3. Analyze Against Best Practices](#3-analyze-against-best-practices)
+    - [Dimension 1: Size & Length](#dimension-1-size--length)
+    - [Dimension 2: Scope Definition](#dimension-2-scope-definition)
+    - [Dimension 3: Description Quality](#dimension-3-description-quality)
+    - [Dimension 4: Structure & Organization](#dimension-4-structure--organization)
+    - [Dimension 5: Examples](#dimension-5-examples)
+    - [Dimension 6: Anti-Pattern Detection](#dimension-6-anti-pattern-detection)
+    - [Dimension 7: Prompt Engineering Quality](#dimension-7-prompt-engineering-quality)
+    - [Dimension 8: Completeness](#dimension-8-completeness)
+  - [4. Generate Comprehensive Evaluation Report](#4-generate-comprehensive-evaluation-report)
+  - [5. Deliver Report to User](#5-deliver-report-to-user)
+- [Important Guidelines](#important-guidelines)
+- [Requirements](#requirements)
+- [Context & Standards](#context--standards)
+
+## Instructions
+
+### 1. Find Skill
+
+Identify the skill in the directory passed to you, or find all skills in the user's `~/.claude/skills/` directory. For each subdirectory (excluding hidden ones), verify that it contains a `SKILL.md` file.
+
+Then:
+- Present the list of available skills
+- Ask which skill to evaluate (or accept a skill name as input)
+
+### 2. Read the Skill File
+
+Once a skill is selected, read its `SKILL.md` file and extract:
+- Frontmatter metadata (name, description)
+- Total line count
+- Word count
+- Character count
+- Structure and sections
+
+### 3. Analyze Against Best Practices
+
+Evaluate the skill across **8 dimensions**:
+
+#### Dimension 1: Size & Length
+**Guidelines:**
+- Body: Under 500 lines (hard maximum)
+- Name: Maximum 64 characters
+- Description: Maximum 1024 characters (200 char summary preferred)
+- Table of Contents: Include if over 100 lines
+
+**Assessment:**
+- Count total lines in SKILL.md body
+- Flag if over 500 lines
+- Compliment if well-sized (ideal: 100-300 lines for medium skills)
+- Check if TOC exists (expected for 100+ line skills)
+
+#### Dimension 2: Scope Definition
+**Guidelines:**
+- Narrow focus (one skill = one capability)
+- Clear boundary of what the skill does and doesn't do
+- No scope creep (e.g., prefer "PDF form filling" over "document processing")
+
+**Assessment:**
+- Does the description clearly state what the skill does?
+- Are there multiple conflicting capabilities within one skill?
+- Is the boundary clear to a new user?
+
+#### Dimension 3: Description Quality
+**Guidelines:**
+- Third-person voice (avoid "I can" or "you can")
+- Include both WHAT and WHEN TO USE
+- Specific, searchable terminology
+- 200 character summary ideal
+
+**Assessment:**
+- Are voice and tone appropriate?
+- Are discovery terms clear? (Would users search for these terms?)
+- Is "when to use" explained?
+
+#### Dimension 4: Structure & Organization
+**Guidelines:**
+- Clear section hierarchy (headings, subsections)
+- Logical flow (progressive disclosure)
+- Step-by-step instructions preferred for workflows
+- Rules/constraints clearly stated
+
+**Assessment:**
+- Is structure logical?
+- Can a user easily navigate?
+- Are instructions sequential or scattered?
+
+#### Dimension 5: Examples
+**Guidelines:**
+- Quality over quantity
+- Typical: 2-3 examples for basic skills, more for format-heavy
+- Concrete (not abstract)
+- Show patterns and edge cases
+
+**Assessment:**
+- How many examples? (count them)
+- Are examples concrete and realistic?
+- Do they demonstrate key patterns?
+- Are there enough to show variations?
+
+#### Dimension 6: Anti-Pattern Detection
+**Red flags (check for these):**
+- ❌ Windows-style paths (should use forward slashes)
+- ❌ Magic numbers without justification
+- ❌ Inconsistent terminology (several synonyms for the same concept)
+- ❌ Time-sensitive instructions (date-dependent)
+- ❌ Deeply nested file references (over 2 levels)
+- ❌ Vague descriptions (missing WHAT or WHEN)
+- ❌ Scope creep (trying to do too much)
+- ❌ No error handling or validation steps
+- ❌ No user feedback loops (for complex workflows)
+- ❌ Multiple conflicting approaches for same task
+
+**Assessment:**
+- Count the violations
+- Rate the severity of each violation
+- Assess the impact on usability
+
+#### Dimension 7: Prompt Engineering Quality
+**Guidelines:**
+- Imperative language (verb-first instructions)
+- Explicit rules with clear boundaries
+- Validation loops where appropriate (especially for destructive ops)
+- Clear error handling
+- Assumes user is intelligent (don't over-explain)
+
+**Assessment:**
+- Is language imperative?
+- Are there validation steps?
+- How clear are the rules?
+- Is error handling explicit?
+
+#### Dimension 8: Completeness
+**Guidelines:**
+- Requirements listed (what's needed to use the skill)
+- Edge cases acknowledged
+- Limitations stated where relevant
+
+**Assessment:**
+- Are prerequisites clear?
+- Are limitations or edge cases mentioned?
+- Is scope of responsibility clear?
+
+### 4. Generate Comprehensive Evaluation Report
+
+Create a detailed evaluation report with these components:
+
+1. **Executive Summary**: 1-2 paragraphs covering overall assessment, key strengths, and critical issues
+
+2. **Metrics**: Present line count, word count, character count, and guideline compliance assessment
+
+3. **Dimensional Analysis**: For each of the 8 dimensions:
+   - Status indicator (✓ Pass / ⚠ Warning / ❌ Fail)
+   - 1-2 sentence assessment explaining the rating
+
+4. **Detected Issues**: Organize by severity:
+   - Critical Issues (must fix) - any ❌ Fail items with explanation
+   - Warnings (should address) - any ⚠ Warning items with explanation
+   - Observations (minor items worth noting)
+
+5. **Comparative Analysis**: Compare the skill against official skills repository patterns with examples and rationale
+
+6. **Actionable Suggestions**: Numbered list of specific improvements, prioritized by impact:
+   - High Priority (do this first)
+   - Medium Priority (nice to have)
+   - Low Priority (optional refinements)
+
+   Each suggestion should include concrete rationale, not vague guidance.
+
+7. **Overall Assessment**:
+   - Professional verdict on production-readiness
+   - Clear recommendation (Keep as-is / Minor tweaks / Significant refactor / Major restructure)
+
+### 5. Deliver Report to User
+
+Present the complete evaluation report to the user in a clear, formatted structure. Ensure:
+- Status indicators are visible (✓ Pass / ⚠ Warning / ❌ Fail)
+- Actionable suggestions are specific (not vague)
+- Rationale is explained for each issue
+- Prioritization is clear
+
+## Important Guidelines
+
+- **Be brutally honest**: Point out real issues; don't sugarcoat
+- **Specific over vague**: "The examples don't show error handling", not "examples could be better"
+- **Professional tone**: Keep criticism constructive, not harsh
+- **Evidence-based**: Reference specific lines or patterns from the skill
+- **Proportional feedback**: Don't over-critique minor issues
+- **Future-focused**: Suggest improvements rather than pass judgment
+
+## Requirements
+
+- Skills are installed in `~/.claude/skills/`, or a skill directory is passed explicitly
+- Target skill has a valid `SKILL.md` file with frontmatter
+- User accepts the detailed, honest evaluation
+
+## Context & Standards
+
+This evaluator uses best practices from:
+- Official Anthropic Claude Code Skills documentation
+- Analysis of official skills repository patterns
+- Professional technical writing standards
+- Prompt engineering best practices for LLM interactions
+
+All assessments are made relative to **official guidelines**, not arbitrary standards.
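
Steps 1 and 2 and the Dimension 1 checks in the skill above are mechanical enough to sketch in code. The Python below is only an illustration under stated assumptions, not part of the commit: the function names (`find_skills`, `split_frontmatter`, `size_report`) are hypothetical, the frontmatter parser handles only flat `key: value` pairs rather than full YAML, and the table-of-contents check is a simple heading heuristic. The numeric limits are the ones quoted in the skill text.

```python
from pathlib import Path

# Limits quoted from "Dimension 1: Size & Length" in the skill text.
MAX_BODY_LINES = 500
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024
TOC_THRESHOLD_LINES = 100


def find_skills(root: Path) -> list[Path]:
    """Return non-hidden subdirectories of root that contain a SKILL.md."""
    return sorted(
        d for d in root.iterdir()
        if d.is_dir() and not d.name.startswith(".") and (d / "SKILL.md").is_file()
    )


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the body.

    Assumption: only flat `key: value` pairs are handled, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    closing = [i for i in range(1, len(lines)) if lines[i].strip() == "---"]
    if not closing:
        return {}, text
    end = closing[0]
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def size_report(text: str) -> dict:
    """Collect the step 2 metrics and flag Dimension 1 violations."""
    meta, body = split_frontmatter(text)
    body_lines = len(body.splitlines())
    # Heuristic (assumption): a TOC is any heading mentioning "table of contents".
    has_toc = any(
        line.lstrip().startswith("#") and "table of contents" in line.lower()
        for line in body.splitlines()
    )
    name = meta.get("name", "")
    description = meta.get("description", "")

    flags = []
    if not name or not description:
        flags.append("frontmatter is missing name or description")
    if len(name) > MAX_NAME_CHARS:
        flags.append(f"name exceeds {MAX_NAME_CHARS} characters")
    if len(description) > MAX_DESCRIPTION_CHARS:
        flags.append(f"description exceeds {MAX_DESCRIPTION_CHARS} characters")
    if body_lines > MAX_BODY_LINES:
        flags.append(f"body exceeds {MAX_BODY_LINES} lines")
    if body_lines > TOC_THRESHOLD_LINES and not has_toc:
        flags.append(f"no table of contents for a body over {TOC_THRESHOLD_LINES} lines")

    return {
        "name": name,
        "total_lines": len(text.splitlines()),
        "words": len(text.split()),
        "characters": len(text),
        "body_lines": body_lines,
        "has_toc": has_toc,
        "flags": flags,
    }
```

A caller might run `find_skills(Path.home() / ".claude" / "skills")`, read each `SKILL.md`, and pass the text to `size_report`. The remaining dimensions (scope, description quality, examples, and so on) call for judgment and are left to the evaluator.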