Initial commit

2025-11-29 18:17:56 +08:00
commit 7290dfdcfb
10 changed files with 2646 additions and 0 deletions
--- a/references/quality-checklist.md
+++ b/references/quality-checklist.md
@@ -0,0 +1,601 @@
+# Quality Checklist - Detailed Criteria
+
+Comprehensive quality criteria for skill review with examples and guidance.
+
+## 1. Progressive Disclosure
+
+**What to check**: Information is properly layered across metadata, instructions, and resources.
+
+**Good example**:
+````markdown
+---
+name: pdf-processor
+description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files.
+---
+
+# PDF Processor
+
+## Quick Start
+Use pdfplumber to extract text:
+```python
+import pdfplumber
+with pdfplumber.open("doc.pdf") as pdf:
+    text = pdf.pages[0].extract_text()
+```
+
+For form filling, see [forms.md](references/forms.md).
+For advanced table extraction, see [tables.md](references/tables.md).
+````
+
+**Bad example**:
+```markdown
+# PDF Processor
+
+## Complete API Reference
+[500 lines of pdfplumber API documentation inline...]
+
+## All Possible Workflows
+[50 different use cases with full code...]
+
+## Configuration Options
+[Every configuration parameter explained...]
+```
+
+**Why it matters**: Skills should load incrementally: metadata is always loaded (tiny), SKILL.md loads when the skill is triggered (small), and references load only as needed (can be large).
+
+**Review questions**:
+- Is SKILL.md under 5k tokens?
+- Are detailed references offloaded to separate files?
+- Does SKILL.md link to references instead of duplicating content?
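The 5k-token question above can be approximated without a tokenizer. The sketch below assumes roughly 4 characters per token, which is only a coarse heuristic for English text; the names `estimate_tokens` and `within_budget` are hypothetical helpers, not part of any skill tooling.

```python
import math

# Assumption: ~4 characters per token, a rough heuristic rather than a real tokenizer.
CHARS_PER_TOKEN = 4
TOKEN_BUDGET = 5_000  # the "under 5k tokens" threshold from the review questions


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def within_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True when the estimated token count stays under the budget."""
    return estimate_tokens(text) < budget
```

A reviewer might call `within_budget(Path("SKILL.md").read_text(encoding="utf-8"))` and treat a `False` result as a prompt to move detail into `references/`, not as an exact measurement.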
+
+---
+
+## 2. Mental Model Shift
+
+**What to check**: Skill is described as the canonical way, not a "new" or "recommended" feature.
+
+**Good example**:
+```markdown
+# Session Registry
+
+Use the session registry for automatic session tracking. This eliminates manual socket management.
+
+## Standard Workflow
+1. Create a session: `create-session.sh -n my-session`
+2. Use the session: `safe-send.sh -s my-session -c "command"`
+```
+
+**Bad example**:
+```markdown
+# Session Registry (NEW!)
+
+The session registry is a new recommended feature that you can optionally use instead of manual socket management.
+
+## Two Approaches
+### Approach 1: Manual Socket Management (Traditional)
+[old way...]
+
+### Approach 2: Session Registry (Recommended!)
+[new way...]
+```
+
+**Why it matters**: Mental model shift means the feature becomes "the way" things are done, not an alternative. Documentation should reflect this confidence.
+
+**Red flags**:
+- "New feature" or "recommended approach"
+- Side-by-side comparisons of old vs new
+- Hedging language ("you might want to", "consider using")
+- "Traditional" or "legacy" alongside "new"
+
+**Review questions**:
+- Does the documentation present this as THE way to do the task?
+- Is the old or alternative approach relegated to a "Manual Alternative" section?
+- Does language convey confidence rather than optionality?
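The red flags listed above lend themselves to a quick text scan. This is a minimal sketch, assuming a plain case-insensitive phrase match is good enough for a first pass; the phrase list is copied from the red flags, and `find_red_flags` is a hypothetical helper name.

```python
import re

# Phrases taken from the red-flag list; extend or trim as needed.
RED_FLAGS = [
    r"new feature",
    r"recommended approach",
    r"you might want to",
    r"consider using",
    r"\btraditional\b",
    r"\blegacy\b",
]
PATTERN = re.compile("|".join(RED_FLAGS), re.IGNORECASE)


def find_red_flags(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) pairs, with 1-indexed line numbers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits
```

A hit is only a hint: the phrase may be legitimate inside a "Manual Alternative" section, so each match still needs a human read.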
+
+---
+
+## 3. Degree of Freedom
+
+**What to check**: Instructions match the declared autonomy level (high/medium/low).
+
+**High Freedom** (principles and heuristics):
+```markdown
+## Analyzing Code Quality
+
+Review code for:
+- Readability and maintainability
+- Performance implications
+- Security concerns
+- Test coverage
+
+Consider the project's context and constraints when making recommendations.
+```
+
+**Medium Freedom** (preferred patterns with parameters):
+````markdown
+## Creating Tests
+
+Use pytest with this structure:
+```python
+def test_feature_name():
+    # Arrange: Set up test data
+    # Act: Execute the feature
+    # Assert: Verify results
+    ...
+```
+
+Adjust assertion strictness based on feature criticality.
+````
+
+**Low Freedom** (specific steps):
+```markdown
+## Deploying to Production
+
+Execute exactly in order:
+1. Run: `make test` (must pass 100%)
+2. Run: `make build`
+3. Tag release: `git tag v1.x.x`
+4. Push: `git push origin v1.x.x`
+5. Run: `./deploy.sh production`
+6. Monitor: `tail -f /var/log/app.log` for 5 minutes
+```
+
+**Why it matters**: A mismatch creates confusion. High-freedom tasks shouldn't be over-specified, and low-freedom tasks shouldn't be under-specified.
+
+**Review questions**:
+- Is the freedom level explicitly stated or clearly implied?
+- Do instructions match that freedom level?
+- Are fragile operations given low freedom with exact steps?
+- Are creative/contextual tasks given high freedom?
+
+---
+
+## 4. SKILL.md Conciseness
+
+**What to check**: SKILL.md is lean, actionable, and purpose-driven.
+
+**Good example** (concise):
+````markdown
+# API Client
+
+## Authentication
+Set `API_KEY` environment variable before making requests.
+
+## Making Requests
+```python
+import os
+import requests
+response = requests.get(
+    "https://api.example.com/data",
+    headers={"Authorization": f"Bearer {os.getenv('API_KEY')}"}
+)
+```
+
+For all endpoints and parameters, see [api-reference.md](references/api-reference.md).
+````
+
+**Bad example** (verbose):
+```markdown
+# API Client
+
+## Introduction
+This skill helps you interact with the Example API. The API provides various endpoints for data access and manipulation. Founded in 2020, Example Corp offers...
+
+## Why Use This Skill
+Benefits of using this skill include...
+- Consistent authentication
+- Error handling
+- Rate limiting
+[more marketing copy...]
+
+## Prerequisites
+Before you begin, make sure you have:
+1. An API key (see below for how to obtain)
+2. Python 3.7+ installed
+3. requests library (can be installed via pip)
+4. A stable internet connection
+...
+```
+
+**Why it matters**: Context window is expensive. Every word should earn its place.
+
+**Conciseness checklist**:
+- ❌ Marketing language or lengthy introductions
+- ❌ Redundant explanations of obvious concepts
+- ❌ Walls of text that could be examples
+- ✅ Direct, actionable instructions
+- ✅ Minimal but representative examples
+- ✅ Links to references for depth
+
+**Review questions**:
+- Could any section be condensed by 50% without losing clarity?
+- Are there marketing phrases or fluff?
+- Do examples replace explanations where possible?
+- Is depth offloaded to references/?
+
+---
+
+## 5. Safety & Failure Handling
+
+**What to check**: Guardrails for dangerous actions, clear failure modes, recovery steps.
+
+**Good example**:
+````markdown
+## Deploying Changes
+
+**⚠️ WARNING**: This deploys to production. Ensure tests pass before proceeding.
+
+```bash
+# Verify tests first
+make test || { echo "Tests failed - aborting"; exit 1; }
+
+# Deploy
+./deploy.sh production
+```
+
+**If deployment fails**:
+1. Check logs: `tail -f /var/log/deploy.log`
+2. Rollback: `./deploy.sh rollback`
+3. Verify: `curl https://api.example.com/health`
+
+**Rollback steps**:
+```bash
+git revert HEAD
+./deploy.sh production
+```
+````
+
+**Bad example**:
+```markdown
+## Deploying Changes
+
+Run: `./deploy.sh production`
+```
+
+**Why it matters**: Skills often perform critical or destructive operations. Users need to know what can go wrong and how to recover.
+
+**Safety elements**:
+- **Warnings** for destructive operations
+- **Validation** steps before critical actions
+- **Failure modes** documented
+- **Recovery procedures** provided
+- **Assumptions** stated explicitly
+
+**Review questions**:
+- Are dangerous operations flagged with warnings?
+- Are there validation steps before destructive actions?
+- Are failure scenarios documented?
+- Are rollback/recovery steps provided?
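The "validation before critical actions" element can also be expressed in Python when a skill ships a script rather than shell snippets. This is a sketch only: `run_guarded` is a hypothetical helper, and the commands passed to it are whatever the skill actually uses.

```python
import subprocess


def run_guarded(check: list[str], action: list[str]) -> bool:
    """Run action only if check exits with status 0; return True if both succeed."""
    if subprocess.run(check, check=False).returncode != 0:
        print("Validation failed - aborting")
        return False
    return subprocess.run(action, check=False).returncode == 0
```

With the commands from the good example, the call would look like `run_guarded(["make", "test"], ["./deploy.sh", "production"])`; a `False` result means the caller should move to the documented recovery steps.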
+
+---
+
+## 6. Resource Hygiene
+
+**What to check**: References are current, minimal, discoverable, and properly linked.
+
+**Good example**:
+```
+skill-name/
+├── SKILL.md
+└── references/
+    ├── api-reference.md (current, focused)
+    ├── examples.md (representative cases)
+    └── troubleshooting.md (common issues)
+```
+
+SKILL.md properly links:
+```markdown
+See [API Reference](references/api-reference.md) for all endpoints.
+For common issues, check [Troubleshooting](references/troubleshooting.md).
+```
+
+**Bad example**:
+```
+skill-name/
+├── SKILL.md
+└── references/
+    ├── docs.md (duplicates SKILL.md)
+    ├── api-v1.md (outdated)
+    ├── api-v2.md (current but not clear)
+    ├── examples-old.md (deprecated)
+    ├── examples-new.md (current)
+    ├── random-notes.md (unclear purpose)
+    └── README.md (redundant)
+```
+
+**Resource hygiene checklist**:
+- ✅ Each file has clear, unique purpose
+- ✅ File names indicate content
+- ✅ No duplicate information
+- ✅ Links from SKILL.md resolve
+- ✅ No outdated or deprecated content
+- ✅ Secret handling documented if applicable
+
+**Review questions**:
+- Is each reference file's purpose clear from its name?
+- Are all links from SKILL.md valid?
+- Is there duplicate content between files?
+- Are outdated resources removed?
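Whether every link from SKILL.md resolves can be checked mechanically. The sketch below assumes inline Markdown links of the form `[text](target)` with relative targets; reference-style links are not handled, links inside code fences are checked too, and `broken_links` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(skill_md: Path) -> list[str]:
    """Return relative link targets in skill_md that do not exist on disk."""
    text = skill_md.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # only local files are checked
        path = target.split("#", 1)[0]  # drop any anchor fragment
        if not (skill_md.parent / path).exists():
            missing.append(target)
    return missing
```

An empty list from `broken_links(Path("skill-name/SKILL.md"))` answers the link question; it says nothing about whether the linked files are current or duplicated.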
+
+---
+
+## 7. Consistency & Clarity
+
+**What to check**: Terminology consistent, flow logical, formatting readable.
+
+**Good example**:
+````markdown
+# Database Migration Tool
+
+## Running Migrations
+
+Apply all pending migrations:
+```bash
+./migrate.sh apply
+```
+
+Rollback the last migration:
+```bash
+./migrate.sh rollback
+```
+
+## Migration Files
+
+Create new migration:
+```bash
+./migrate.sh create add_users_table
+```
+
+This creates `migrations/001_add_users_table.sql`.
+````
+
+**Bad example**:
+````markdown
+# Database Migration Tool
+
+## Executing Migrations
+
+Run migrations using the migration runner:
+```bash
+./run-migrations.sh
+```
+
+## Reverting Changes
+
+Undo schema modifications:
+```bash
+./rollback-db.sh
+```
+
+## Creating Migration Scripts
+
+Generate new migration file:
+```bash
+./new-migration.sh
+```
+````
+
+**Consistency issues** in bad example:
+- Command names inconsistent (`./run-migrations.sh`, `./rollback-db.sh`, `./new-migration.sh` instead of one `./migrate.sh` with subcommands)
+- Terminology varies ("migrations" vs "schema modifications")
+- Section headings use different patterns
+
+**Clarity checklist**:
+- ✅ Consistent terminology throughout
+- ✅ Logical section ordering
+- ✅ Clear, unambiguous instructions
+- ✅ Readable formatting and spacing
+- ✅ No conflicting guidance
+
+**Review questions**:
+- Is the same concept called by the same name throughout?
+- Do sections flow in logical order?
+- Are commands/tools referenced consistently?
+- Is formatting consistent?
+
+---
+
+## 8. Testing & Verification
+
+**What to check**: Quick checks, expected outputs, or smoke tests included.
+
+**Good example**:
+````markdown
+## Verification
+
+Test the installation:
+```bash
+./health-check.sh
+```
+
+**Expected output**:
+```
+✓ API connection successful
+✓ Database accessible
+✓ Cache configured
+All systems operational
+```
+
+**Quick smoke test**:
+```bash
+# Should return status 200
+curl -I https://api.example.com/health
+```
+````
+
+**Bad example**:
+````markdown
+## Usage
+
+Run the tool:
+```bash
+./tool.sh
+```
+````
+
+**Why it matters**: Users need to verify the skill works correctly and understand what success looks like.
+
+**Testing elements**:
+- **Smoke tests**: Quick checks that basic functionality works
+- **Expected outputs**: What success looks like
+- **Verification steps**: How to confirm it's working
+- **Example runs**: Representative use cases with results
+
+**Review questions**:
+- Are there quick verification steps?
+- Is expected output shown?
+- Can users confirm the skill works?
+- Are examples testable/reproducible?
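Comparing a command's output against the documented expected output is one way to make an example reproducible. The sketch below assumes the expected lines appear verbatim on stdout and that a non-zero exit status means failure; `run_smoke_test` is a hypothetical helper name.

```python
import subprocess


def run_smoke_test(command: list[str], expected_lines: list[str]) -> list[str]:
    """Run command and return the expected lines missing from its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return expected_lines  # a failing command satisfies nothing
    actual = {line.strip() for line in result.stdout.splitlines()}
    return [line for line in expected_lines if line not in actual]
```

For the good example above, a call such as `run_smoke_test(["./health-check.sh"], ["All systems operational"])` would return an empty list on success and the unmet lines otherwise.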
+
+---
+
+## 9. Ownership & Maintenance (Optional)
+
+**What to check**: Known limitations documented. Version/maintainer metadata optional but recommended for team/public skills.
+
+**Note**: Marketplace-level changelogs (changelogs/skill-name.md) are required per marketplace standards and provide versioning at the marketplace level. The Version section within SKILL.md itself is optional.
+
+**When to include version metadata in SKILL.md**:
+- Public marketplace skills (helps users track updates)
+- Team-shared skills (clarifies who maintains it)
+- Skills with frequent breaking changes (version tracking important)
+
+**When to skip version metadata in SKILL.md**:
+- Personal skills for individual use
+- Experimental/prototype skills
+- Skills where marketplace changelogs provide sufficient tracking
+
+**Example with optional version metadata**:
+```markdown
+# API Integration Skill
+
+**Version**: 1.2.0
+**Maintainer**: DevTools Team (devtools@example.com)
+
+## Known Limitations
+
+- Rate limited to 100 requests/minute
+- Large file uploads (>10MB) not supported
+- Requires Python 3.8+
+```
+
+**Minimal example (recommended for most skills)**:
+```markdown
+# Simple Helper Skill
+
+## Known Limitations
+
+- Works only on Linux/macOS
+- Requires bash 4.0+
+```
+
+**Why it matters**: Known limitations help users understand constraints. Version metadata in SKILL.md is helpful for team coordination but optional.
+
+**Core elements**:
+- **Known Limitations** (recommended): Document constraints and requirements
+- **Version** (optional): Current version number
+- **Maintainer** (optional): Contact info for questions
+- **Changelog** (optional in SKILL.md): Can reference marketplace changelog or references/
+
+**Review questions**:
+- Are known limitations or requirements documented?
+- If version metadata is present, is it complete and current?
+- If this is a team skill, is maintainer contact info provided?
+
+---
+
+## 10. Tight Scope & Minimalism
+
+**What to check**: Focused purpose, no feature creep, no overlapping functionality.
+
+**Good example** (focused):
+```markdown
+# PDF Text Extractor
+
+Extract text content from PDF files using pdfplumber.
+
+## Supported Operations
+- Extract text from single page
+- Extract text from all pages
+- Extract text with layout preservation
+
+**Not covered** (use pdf-form-filler skill):
+- Form filling
+- PDF editing
+```
+
+**Bad example** (scope creep):
+```markdown
+# PDF Swiss Army Knife
+
+Complete PDF toolkit for all your document needs!
+
+## Features
+- Text extraction
+- Image extraction
+- Form filling
+- PDF editing
+- PDF merging
+- PDF splitting
+- Watermarking
+- OCR processing
+- Compression
+- Encryption
+- Digital signatures
+- Conversion to Word/Excel
+- Email integration
+- Cloud storage sync
+```
+
+**Why it matters**: Focused skills are easier to maintain, understand, and use. Feature creep dilutes the skill's purpose and increases complexity.
+
+**Scope checklist**:
+- ✅ Solves one focused job well
+- ✅ Clear boundaries (what's in, what's out)
+- ✅ No overlapping functionality with other skills
+- ✅ No unrelated features
+- ✅ Complexity matches the actual need
+
+**Review questions**:
+- Does the skill do one thing well?
+- Are there unrelated features that should be separate skills?
+- Does functionality overlap with existing skills?
+- Is complexity justified by the use case?
+
+---
+
+## Using This Checklist
+
+### Quick Review (5-10 minutes)
+
+Scan for obvious issues:
+1. Check SKILL.md length (should be under 5k tokens)
+2. Verify progressive disclosure (links to references/)
+3. Look for language that undermines the mental model shift ("new feature", "recommended")
+4. Check for safety warnings on destructive operations
+5. Verify examples are present and minimal
+
+### Thorough Review (30-60 minutes)
+
+Apply all 10 criteria systematically:
+1. Read SKILL.md completely
+2. Check frontmatter quality
+3. Verify each criterion with examples
+4. Review all reference files
+5. Test examples if possible
+6. Document findings in review report
+
+### Common Review Patterns
+
+**New Skill**:
+- Focus on criteria 1, 2, 4, 10 (structure and scope)
+- Verify progressive disclosure from the start
+- Ensure mental model language is correct
+
+**Updated Skill**:
+- Focus on criteria 3, 7, 9 (consistency with changes)
+- Check that updates didn't break existing patterns
+- Verify changelog is updated
+
+**Audit**:
+- Apply all 10 criteria
+- Compare against other skills for consistency
+- Look for improvement opportunities