# Quality Checklist - Detailed Criteria

Comprehensive quality criteria for skill review with examples and guidance.
## 1. Progressive Disclosure

**What to check**: Information is properly layered across metadata, instructions, and resources.

**Good example**:
````markdown
---
name: pdf-processor
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files.
---

# PDF Processor

## Quick Start

Use pdfplumber to extract text:

```python
import pdfplumber

with pdfplumber.open("doc.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```

For form filling, see forms.md. For advanced table extraction, see tables.md.
````
**Bad example**:

```markdown
# PDF Processor

## Complete API Reference
[500 lines of pdfplumber API documentation inline...]

## All Possible Workflows
[50 different use cases with full code...]

## Configuration Options
[Every configuration parameter explained...]
```
**Why it matters**: Skills should load incrementally. Metadata is always loaded (tiny), SKILL.md loads when triggered (small), references load as needed (can be large).

**Review questions**:
- Is SKILL.md under 5k tokens?
- Are detailed references offloaded to separate files?
- Does SKILL.md link to references instead of duplicating content?
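Some of these checks can be scripted. Below is a minimal sketch of a length check, assuming a hypothetical `skill-name/SKILL.md` path and a rough 4-characters-per-token heuristic (both are illustrative assumptions, not part of any standard tooling):

```python
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path

text = SKILL_MD.read_text(encoding="utf-8")
# Rough token estimate: ~4 characters per token. A heuristic, not an exact count.
approx_tokens = len(text) / 4
print(f"Approximate tokens: {approx_tokens:.0f}")
if approx_tokens > 5000:
    print("SKILL.md may exceed the 5k-token budget; consider offloading detail to references/.")
```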
---

## 2. Mental Model Shift

**What to check**: Skill is described as the canonical way, not a "new" or "recommended" feature.

**Good example**:
```markdown
# Session Registry

Use the session registry for automatic session tracking. This eliminates manual socket management.

## Standard Workflow

1. Create a session: `create-session.sh -n my-session`
2. Use the session: `safe-send.sh -s my-session -c "command"`
```
**Bad example**:

```markdown
# Session Registry (NEW!)

The session registry is a new recommended feature that you can optionally use instead of manual socket management.

## Two Approaches

### Approach 1: Manual Socket Management (Traditional)
[old way...]

### Approach 2: Session Registry (Recommended!)
[new way...]
```
**Why it matters**: Mental model shift means the feature becomes "the way" things are done, not an alternative. Documentation should reflect this confidence.

**Red flags**:
- "New feature" or "recommended approach"
- Side-by-side comparisons of old vs new
- Hedging language ("you might want to", "consider using")
- "Traditional" or "legacy" alongside "new"
**Review questions**:
- Does the documentation present this as THE way to do the task?
- Is the old/alternative approach relegated to a "Manual Alternative" section?
- Does the language convey confidence rather than optionality?
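A quick phrase scan catches most of these red flags. A minimal sketch, again assuming the hypothetical `skill-name/SKILL.md` path; the phrase list is illustrative, not exhaustive:

```python
import re
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path

# Phrases that suggest the skill is framed as optional rather than canonical.
RED_FLAGS = [
    r"new feature",
    r"recommended approach",
    r"you might want to",
    r"consider using",
    r"\btraditional\b",
    r"\blegacy\b",
]

text = SKILL_MD.read_text(encoding="utf-8")
for pattern in RED_FLAGS:
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        line_no = text.count("\n", 0, match.start()) + 1
        print(f"line {line_no}: possible hedging language: {match.group(0)!r}")
```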
---

## 3. Degree of Freedom

**What to check**: Instructions match the declared autonomy level (high/medium/low).

**High Freedom** (principles and heuristics):
```markdown
## Analyzing Code Quality

Review code for:
- Readability and maintainability
- Performance implications
- Security concerns
- Test coverage

Consider the project's context and constraints when making recommendations.
```
**Medium Freedom** (preferred patterns with parameters):

````markdown
## Creating Tests

Use pytest with this structure:

```python
def test_feature_name():
    # Arrange: Setup test data
    # Act: Execute the feature
    # Assert: Verify results
    ...
```

Adjust assertion strictness based on feature criticality.
````
**Low Freedom** (specific steps):

```markdown
## Deploying to Production

Execute exactly in order:
1. Run: `make test` (must pass 100%)
2. Run: `make build`
3. Tag release: `git tag v1.x.x`
4. Push: `git push origin v1.x.x`
5. Run: `./deploy.sh production`
6. Monitor: `tail -f /var/log/app.log` for 5 minutes
```
**Why it matters**: A mismatch creates confusion. High-freedom tasks shouldn't be over-specified; low-freedom tasks shouldn't be under-specified.
**Review questions**:
- Is the freedom level explicitly stated or clearly implied?
- Do instructions match that freedom level?
- Are fragile operations given low freedom with exact steps?
- Are creative/contextual tasks given high freedom?
---

## 4. SKILL.md Conciseness

**What to check**: SKILL.md is lean, actionable, and purpose-driven.

**Good example** (concise):
````markdown
# API Client

## Authentication

Set the `API_KEY` environment variable before making requests.

## Making Requests

```python
import os
import requests

response = requests.get(
    "https://api.example.com/data",
    headers={"Authorization": f"Bearer {os.getenv('API_KEY')}"},
)
```

For all endpoints and parameters, see api-reference.md.
````
**Bad example** (verbose):
```markdown
# API Client
## Introduction
This skill helps you interact with the Example API. The API provides various endpoints for data access and manipulation. Founded in 2020, Example Corp offers...
## Why Use This Skill
Benefits of using this skill include...
- Consistent authentication
- Error handling
- Rate limiting
[more marketing copy...]
## Prerequisites
Before you begin, make sure you have:
1. An API key (see below for how to obtain)
2. Python 3.7+ installed
3. requests library (can be installed via pip)
4. A stable internet connection
...
```
**Why it matters**: Context window is expensive. Every word should earn its place.

**Conciseness checklist**:
- ❌ Marketing language or lengthy introductions
- ❌ Redundant explanations of obvious concepts
- ❌ Walls of text that could be examples
- ✅ Direct, actionable instructions
- ✅ Minimal but representative examples
- ✅ Links to references for depth
**Review questions**:
- Could any section be condensed by 50% without losing clarity?
- Are there marketing phrases or fluff?
- Do examples replace explanations where possible?
- Is depth offloaded to references/?
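A per-section word count is a blunt but useful first pass on conciseness. A minimal sketch, assuming the hypothetical `skill-name/SKILL.md` path; the 300-word threshold is an arbitrary illustration, not a rule:

```python
import re
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path

text = SKILL_MD.read_text(encoding="utf-8")
# Split on level-2 headings and report a rough word count per section.
sections = re.split(r"^## ", text, flags=re.MULTILINE)
for section in sections[1:]:
    title, _, body = section.partition("\n")
    words = len(body.split())
    note = "  <- candidate for condensing or moving to references/" if words > 300 else ""
    print(f"{title.strip()}: {words} words{note}")
```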
---

## 5. Safety & Failure Handling

**What to check**: Guardrails for dangerous actions, clear failure modes, recovery steps.

**Good example**:
````markdown
## Deploying Changes

**⚠️ WARNING**: This deploys to production. Ensure tests pass before proceeding.

```bash
# Verify tests first
make test || { echo "Tests failed - aborting"; exit 1; }

# Deploy
./deploy.sh production
```

If deployment fails:
1. Check logs: `tail -f /var/log/deploy.log`
2. Rollback: `./deploy.sh rollback`
3. Verify: `curl https://api.example.com/health`

Rollback steps:

```bash
git revert HEAD
./deploy.sh production
```
````
**Bad example**:

```markdown
## Deploying Changes

Run: `./deploy.sh production`
```
**Why it matters**: Skills often perform critical or destructive operations. Users need to know what can go wrong and how to recover.

**Safety elements**:
- Warnings for destructive operations
- Validation steps before critical actions
- Failure modes documented
- Recovery procedures provided
- Assumptions stated explicitly
**Review questions**:
- Are dangerous operations flagged with warnings?
- Are there validation steps before destructive actions?
- Are failure scenarios documented?
- Are rollback/recovery steps provided?
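A rough scan for destructive commands that lack a nearby warning can support this review. A minimal sketch, assuming the hypothetical `skill-name/SKILL.md` path; the command patterns and the five-line lookback window are illustrative heuristics:

```python
import re
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path

# Commands that usually deserve a warning and a validation step beforehand.
DESTRUCTIVE = [r"rm -rf", r"deploy\.sh", r"DROP TABLE", r"git push --force"]

lines = SKILL_MD.read_text(encoding="utf-8").splitlines()
for i, line in enumerate(lines):
    if any(re.search(p, line, flags=re.IGNORECASE) for p in DESTRUCTIVE):
        context = "\n".join(lines[max(0, i - 5):i])  # look a few lines back
        if "WARNING" not in context and "⚠️" not in context:
            print(f"line {i + 1}: destructive command with no nearby warning: {line.strip()}")
```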
---

## 6. Resource Hygiene

**What to check**: References are current, minimal, discoverable, and properly linked.

**Good example**:
```
skill-name/
├── SKILL.md
└── references/
    ├── api-reference.md     (current, focused)
    ├── examples.md          (representative cases)
    └── troubleshooting.md   (common issues)
```

SKILL.md properly links:

```markdown
See [API Reference](references/api-reference.md) for all endpoints.
For common issues, check [Troubleshooting](references/troubleshooting.md).
```
**Bad example**:

```
skill-name/
├── SKILL.md
└── references/
    ├── docs.md            (duplicates SKILL.md)
    ├── api-v1.md          (outdated)
    ├── api-v2.md          (current but not clear)
    ├── examples-old.md    (deprecated)
    ├── examples-new.md    (current)
    ├── random-notes.md    (unclear purpose)
    └── README.md          (redundant)
```
**Resource hygiene checklist**:
- ✅ Each file has clear, unique purpose
- ✅ File names indicate content
- ✅ No duplicate information
- ✅ Links from SKILL.md resolve
- ✅ No outdated or deprecated content
- ✅ Secret handling documented if applicable
**Review questions**:
- Is each reference file's purpose clear from its name?
- Are all links from SKILL.md valid?
- Is there duplicate content between files?
- Are outdated resources removed?
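Link resolution is the easiest of these checks to automate. A minimal sketch, assuming a hypothetical `skill-name/` skill directory; it only covers relative markdown links:

```python
import re
from pathlib import Path

SKILL_DIR = Path("skill-name")  # hypothetical skill directory
text = (SKILL_DIR / "SKILL.md").read_text(encoding="utf-8")

# Find markdown links such as [API Reference](references/api-reference.md).
for label, target in re.findall(r"\[([^\]]+)\]\(([^)]+)\)", text):
    if target.startswith(("http://", "https://", "#")):
        continue  # skip external links and in-page anchors
    if not (SKILL_DIR / target).exists():
        print(f"Broken link: [{label}]({target})")
```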
---

## 7. Consistency & Clarity

**What to check**: Terminology consistent, flow logical, formatting readable.

**Good example**:
````markdown
# Database Migration Tool

## Running Migrations

Apply all pending migrations:

```bash
./migrate.sh apply
```

Rollback the last migration:

```bash
./migrate.sh rollback
```

## Migration Files

Create new migration:

```bash
./migrate.sh create add_users_table
```

This creates `migrations/001_add_users_table.sql`.
````
**Bad example**:

````markdown
# Database Migration Tool

## Executing Migrations

Run migrations using the migration runner:

```bash
./run-migrations.sh
```

## Reverting Changes

Undo schema modifications:

```bash
./rollback-db.sh
```

## Creating Migration Scripts

Generate new migration file:

```bash
./new-migration.sh
```
````
**Consistency issues** in bad example:
- Command names inconsistent (`./migrate.sh` vs `./run-migrations.sh`)
- Terminology varies ("migrations" vs "schema modifications")
- Section headings use different patterns
**Clarity checklist**:
- ✅ Consistent terminology throughout
- ✅ Logical section ordering
- ✅ Clear, unambiguous instructions
- ✅ Readable formatting and spacing
- ✅ No conflicting guidance
**Review questions**:
- Is the same concept called by the same name throughout?
- Do sections flow in logical order?
- Are commands/tools referenced consistently?
- Is formatting consistent?
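One narrow piece of this check can be automated: listing every script invocation so inconsistent names for the same tool stand out. A minimal sketch, assuming the hypothetical `skill-name/SKILL.md` path and that commands are written as `./something.sh`:

```python
import re
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path

text = SKILL_MD.read_text(encoding="utf-8")
# Collect shell script invocations referenced anywhere in the document.
scripts = sorted(set(re.findall(r"\./[\w./-]+\.sh", text)))
print("Scripts referenced:", scripts)
if len(scripts) > 1:
    print("Verify these are genuinely different tools, not inconsistent names for the same one.")
```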
---
## 8. Testing & Verification
**What to check**: Quick checks, expected outputs, or smoke tests included.
**Good example**:
````markdown
## Verification

Test the installation:

```bash
./health-check.sh
```

Expected output:

```
✓ API connection successful
✓ Database accessible
✓ Cache configured
All systems operational
```

Quick smoke test:

```bash
# Should return status 200
curl -I https://api.example.com/health
```
````
**Bad example**:

````markdown
## Usage

Run the tool:

```bash
./tool.sh
```
````
**Why it matters**: Users need to verify the skill works correctly and understand what success looks like.
**Testing elements**:
- **Smoke tests**: Quick checks that basic functionality works
- **Expected outputs**: What success looks like
- **Verification steps**: How to confirm it's working
- **Example runs**: Representative use cases with results
**Review questions**:
- Are there quick verification steps?
- Is expected output shown?
- Can users confirm the skill works?
- Are examples testable/reproducible?
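When a skill documents a smoke test and its expected output, the reviewer can replay it. A minimal sketch that runs the `./health-check.sh` command from the example above and looks for the documented success marker (both names come from that example, not from real tooling):

```python
import subprocess

# Run the documented health check and look for the success marker the docs promise.
result = subprocess.run(["./health-check.sh"], capture_output=True, text=True)

expected_marker = "All systems operational"
if result.returncode == 0 and expected_marker in result.stdout:
    print("Smoke test passed")
else:
    print("Smoke test failed")
    print(result.stdout)
    print(result.stderr)
```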
---
## 9. Ownership & Maintenance (Optional)
**What to check**: Known limitations documented. Version/maintainer metadata optional but recommended for team/public skills.
**Note**: Marketplace-level changelogs (changelogs/skill-name.md) are required per marketplace standards and provide versioning at the marketplace level. The Version section within SKILL.md itself is optional.
**When to include version metadata in SKILL.md**:
- Public marketplace skills (helps users track updates)
- Team-shared skills (clarifies who maintains it)
- Skills with frequent breaking changes (version tracking important)
**When to skip version metadata in SKILL.md**:
- Personal skills for individual use
- Experimental/prototype skills
- Skills where marketplace changelogs provide sufficient tracking
**Example with optional version metadata**:
```markdown
# API Integration Skill
**Version**: 1.2.0
**Maintainer**: DevTools Team (devtools@example.com)
## Known Limitations
- Rate limited to 100 requests/minute
- Large file uploads (>10MB) not supported
- Requires Python 3.8+
```
**Minimal example** (recommended for most skills):

```markdown
# Simple Helper Skill

## Known Limitations
- Works only on Linux/macOS
- Requires bash 4.0+
```
**Why it matters**: Known limitations help users understand constraints. Version metadata in SKILL.md is helpful for team coordination but optional.

**Core elements**:
- **Known Limitations** (recommended): Document constraints and requirements
- **Version** (optional): Current version number
- **Maintainer** (optional): Contact info for questions
- **Changelog** (optional in SKILL.md): Can reference marketplace changelog or references/
**Review questions**:
- Are known limitations or requirements documented?
- If version metadata is present, is it complete and current?
- If this is a team skill, is maintainer contact info provided?
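A small metadata probe answers the last two questions mechanically. A minimal sketch, assuming the hypothetical `skill-name/SKILL.md` path and the `**Version**:` / `**Maintainer**:` line format shown in the example above:

```python
import re
from pathlib import Path

SKILL_MD = Path("skill-name/SKILL.md")  # hypothetical path
text = SKILL_MD.read_text(encoding="utf-8")

# Report the optional metadata fields if they are declared.
for field in ("Version", "Maintainer"):
    match = re.search(rf"\*\*{field}\*\*:\s*(.+)", text)
    print(f"{field}: {match.group(1).strip() if match else '(not declared)'}")

print("Known Limitations section:",
      "present" if "## Known Limitations" in text else "missing")
```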
---

## 10. Tight Scope & Minimalism

**What to check**: Focused purpose, no feature creep, no overlapping functionality.

**Good example** (focused):
```markdown
# PDF Text Extractor

Extract text content from PDF files using pdfplumber.

## Supported Operations
- Extract text from single page
- Extract text from all pages
- Extract text with layout preservation

**Not covered** (use pdf-form-filler skill):
- Form filling
- PDF editing
```
**Bad example** (scope creep):

```markdown
# PDF Swiss Army Knife

Complete PDF toolkit for all your document needs!

## Features
- Text extraction
- Image extraction
- Form filling
- PDF editing
- PDF merging
- PDF splitting
- Watermarking
- OCR processing
- Compression
- Encryption
- Digital signatures
- Conversion to Word/Excel
- Email integration
- Cloud storage sync
```
**Why it matters**: Focused skills are easier to maintain, understand, and use. Feature creep dilutes the skill's purpose and increases complexity.

**Scope checklist**:
- ✅ Solves one focused job well
- ✅ Clear boundaries (what's in, what's out)
- ✅ No overlapping functionality with other skills
- ✅ No unrelated features
- ✅ Complexity matches the actual need
**Review questions**:
- Does the skill do one thing well?
- Are there unrelated features that should be separate skills?
- Does functionality overlap with existing skills?
- Is complexity justified by the use case?
---

## Using This Checklist

### Quick Review (5-10 minutes)

Scan for obvious issues:
- Check SKILL.md length (should be under 5k tokens)
- Verify progressive disclosure (links to references/)
- Look for mental model language ("new feature", "recommended")
- Check for safety warnings on destructive operations
- Verify examples are present and minimal
### Thorough Review (30-60 minutes)

Apply all 10 criteria systematically:
- Read SKILL.md completely
- Check frontmatter quality
- Verify each criterion with examples
- Review all reference files
- Test examples if possible
- Document findings in review report
### Common Review Patterns

**New Skill**:
- Focus on criteria 1, 2, 4, 10 (structure and scope)
- Verify progressive disclosure from the start
- Ensure mental model language is correct
**Updated Skill**:
- Focus on criteria 3, 7, 9 (consistency with changes)
- Check that updates didn't break existing patterns
- Verify changelog is updated
**Audit**:
- Apply all 10 criteria
- Compare against other skills for consistency
- Look for improvement opportunities