zhongwei/gh-racurry-neat-little-package-plugins-box-factory

Files

Zhongwei Li 2b34b5aa74 Initial commit

2025-11-30 08:48:43 +08:00

25 KiB

Raw Blame History

name, description

name	description
box-factory-skill-design	Meta-skill that teaches how to design Claude Code skills following the Box Factory philosophy. Helps you understand when to create skills, how to structure them for low maintenance, and how to add value beyond documentation. Use when creating or reviewing skills.

Skill Design Skill

This meta-skill teaches you how to design excellent Claude Code skills. Skills are unique among Claude Code components - they provide progressive knowledge disclosure and interpretive guidance that loads when relevant.

Required Reading Before Creating Skills

Official documentation: Skills are part of the agent system but don't have dedicated documentation. Their purpose and structure are inferred from:

https://code.claude.com/docs/en/sub-agents.md - Skills mentioned as knowledge loaded when relevant
Existing skills in the wild - Examine high-quality skills for patterns

Core Understanding

Skills Are Progressive Knowledge Disclosure

Key insight: Skills solve the "you can't put everything in the system prompt" problem.

What this means:

Without skills: Important knowledge buried in long prompts or forgotten
With skills: Knowledge loads automatically when topics become relevant
Value proposition: Right information at the right time, without token bloat

Decision test: Does this information need to be available across multiple contexts, but not always?

Skills vs System Prompts vs CLAUDE.md

Skills are for:

Substantial procedural expertise (20+ lines of guidance)
Domain-specific knowledge needed sporadically
Interpretive frameworks that enhance understanding
Best practices that apply across multiple scenarios

System prompts are for:

Always-relevant instructions
Core behavior and personality
Universal constraints

CLAUDE.md is for:

Project-specific context
Repository structure
Team conventions
Always-loaded information

Knowledge Delta Filter (Critical Understanding)

THE MOST IMPORTANT PRINCIPLE: Skills should only document what Claude DOESN'T already know.

Why this matters: Claude's training includes extensive knowledge of common development tools, standard workflows, well-established frameworks, and general best practices. Skills that duplicate this base knowledge waste tokens and create maintenance burden without adding value.

Skills document the DELTA - the difference between Claude's base knowledge and what Claude needs to know for your specific context.

INCLUDE in skills (the delta):

✅ User-specific preferences and conventions

"This user wants commit messages terse, single-line, no emojis, no attribution"
"This team uses specific naming conventions not found in standard docs"
"This project requires custom workflow steps"
Example: User's preference for no "Generated with Claude Code" messages

✅ Edge cases and gotchas Claude would miss

"Pre-commit hooks that modify files require retry with --amend"
"This API has undocumented rate limiting behavior"
"File system paths need special escaping in this environment"
Example: Specific retry logic for linter hooks that auto-fix

✅ Decision frameworks for ambiguous situations

"When to use gh CLI vs GitHub MCP server in this project"
"Tool selection hierarchy when multiple options exist"
"Which pattern to prefer when standards conflict"
Example: Prefer gh CLI when available, fall back to MCP

✅ Things Claude gets wrong without guidance

"Claude invents unsupported frontmatter in slash commands"
"Claude uses deprecated syntax for Tool X without this guidance"
"Claude doesn't know about this project-specific integration pattern"
Example: Claude making up skills: git-workflow frontmatter that doesn't exist

✅ New or rapidly-changing technology (post-training)

Claude Code itself (released after training cutoff)
New framework versions with breaking changes
Emerging tools not well-represented in training data
Example: Claude Code plugin system specifics

✅ Integration patterns between tools (project-specific)

"How this project connects Tool A with Tool B"
"Custom workflow orchestration"
"Project-specific toolchain configuration"
Example: Using both gh CLI and GitHub MCP server in same plugin

EXCLUDE from skills (Claude already knows):

❌ Basic commands for well-known tools

Don't document: git status, git commit, git push, git diff
Don't document: npm install, pip install, docker run
Don't document: Standard CLI flags and options Claude knows
Claude learned this in training and doesn't need reminders

❌ Standard workflows Claude knows

Don't document: Basic git branching workflow
Don't document: Standard PR review process
Don't document: Common testing patterns
These are well-established practices in Claude's training

❌ General best practices (not project-specific)

Don't document: "Write clear commit messages"
Don't document: "Test your code before committing"
Don't document: "Use semantic versioning"
Claude already knows these principles

❌ Well-established patterns for common tools

Don't document: REST API design basics
Don't document: Standard design patterns (MVC, etc.)
Don't document: Common security practices Claude knows
Training data covers these extensively

Decision Test: Should This Be In A Skill?

Before including content in a skill, ask:

Would Claude get this wrong without the skill?
- Yes → Include it (fills a knowledge gap)
- No → Exclude it (redundant with training)
Is this specific to this user/project/context?
- Yes → Include it (contextual delta)
- No → Probably exclude it (general knowledge)
Is this well-documented in Claude's training data?
- No (new/custom/edge case) → Include it
- Yes (standard practice) → Exclude it
Would this information change Claude's behavior?
- Yes (corrects mistakes or fills gaps) → Include it
- No (Claude already behaves this way) → Exclude it

Example: git-workflow skill

❌ Bad (includes base knowledge):

480 lines including:
- How to use git status, git diff, git commit
- Basic branching operations
- Standard commit message formats
- Common git commands

95% redundant, 5% valuable

✅ Good (only includes delta):

~80 lines including:
- User's specific commit format preferences
- Edge case: pre-commit hook retry logic
- User requirement: no attribution text

100% valuable, focused on what Claude doesn't know

The Box Factory Philosophy for Skills

1. Low-Maintenance by Design

Defer to official documentation via WebFetch:

## Required Reading Before Creating Agents

Fetch these docs with WebFetch every time:

- **https://code.claude.com/docs/en/sub-agents.md** - Core specification

Why: Documentation changes; skills that defer stay valid.

Don't hardcode:

❌ "Available models: sonnet, opus, haiku"
❌ "Tools include: Read, Write, Edit, Bash, Glob, Grep"
❌ Specific syntax that may change

Do reference:

✅ "See model-config documentation for current options"
✅ "Refer to tools documentation for current capabilities"
✅ "Fetch official specification for syntax details"

2. Two-Layer Approach

Layer 1: Official Specification

What the docs explicitly say
Required fields and syntax
Official examples
Mark with headings: ## X (Official Specification)

Layer 2: Best Practices

What the docs don't emphasize
Common gotchas and anti-patterns
Interpretive guidance
Mark with headings: ## X (Best Practices)

Example:

## Frontmatter Fields (Official Specification)

The `description` field is optional and defaults to first line.

## Description Field Design (Best Practices)

Always include `description` even though it's optional - improves discoverability and Claude's ability to use the SlashCommand tool.

3. Evidence-Based Recommendations

All claims must be:

✅ Grounded in official documentation, OR ✅ Clearly marked as opinionated best practices, OR ✅ Based on documented common pitfalls

Avoid: ❌ Presenting opinions as official requirements ❌ Making unsupported claims about "best practices" ❌ Prescribing patterns not in documentation without labeling them

Skill Structure (Best Practices)

Frontmatter

---
name: skill-name
description: What this skill teaches and when to use it
---

Name: kebab-case identifier Description: Clear triggering conditions for when the skill loads

Content Organization

# Skill Name

[Opening paragraph explaining purpose and value]

## Required Reading Before [Task]

Fetch these docs with WebFetch every time:
- [Official doc URLs with descriptions]

## Core Understanding

[Fundamental concepts, architecture, philosophy]

## [Topic] (Official Specification)

[What the official docs explicitly state]

## [Topic] (Best Practices)

[Interpretive guidance, gotchas, patterns]

## Decision Framework

[When to use X vs Y, with clear criteria]

## Common Pitfalls

[Anti-patterns with why they fail and better approaches]

## Quality Checklist

[Validation items before finalizing]

## Documentation References

[Authoritative sources to fetch]

Content Quality Standards

Be specific and actionable:

✅ "Run pytest -v and parse output for failures"
❌ "Run tests and check for problems"

Distinguish official from opinionated:

✅ "Official docs say 'description is optional.' Best practice: always include it."
❌ "description is required" (when it's not)

Use examples effectively:

Show before/after
Explain what makes the "after" better
Mark issues with ❌ and improvements with ✅

Progressive disclosure:

Start with core concepts
Build to advanced features
Don't overwhelm with details upfront

When to Create Skills

Skill vs Agent vs Command

Use a Skill when:

Multiple contexts need the same knowledge
Substantial procedural expertise (not just 2-3 bullet points)
Progressive disclosure would save tokens
Teaching "how to think about" something
You want automatic loading when topics arise

Examples:

agent-design - Teaches how to design agents
api-documentation-standards - Formatting rules across projects
security-practices - Principles that apply broadly

Use an Agent when:

Need isolated context for complex work
Want autonomous delegation
Doing actual work (writing files, running tests)
Task-oriented, not knowledge-oriented

Examples:

test-runner - Executes tests and analyzes failures
code-reviewer - Performs analysis and provides feedback

Use a Command when:

User wants explicit trigger
Simple, deterministic operation
One-off action

Examples:

/deploy - User-triggered deployment
/create-component - File generation

Scope Guidelines (Best Practices)

Good skill scope:

Focused on single domain (API design, testing, security)
Self-contained knowledge
Clear boundaries
Composable with other skills

Bad skill scope:

"Everything about development" (too broad)
Overlaps heavily with another skill
Just 3-4 bullet points (put in CLAUDE.md instead)
Project-specific details (put in CLAUDE.md instead)

Common Pitfalls

Pitfall #1: Duplicating Official Documentation

Problem: Skill becomes outdated copy of docs

## Available Models

The following models are available:
- claude-sonnet-4-5-20250929
- claude-opus-4-20250514
- claude-3-5-haiku-20241022

Why it fails: Model names change, skill becomes outdated

Better:

## Model Selection

Fetch current model options from:
https://code.claude.com/docs/en/model-config.md

**Best practice:** Use haiku for simple tasks, sonnet for balanced work, opus for complex reasoning.

Pitfall #2: Hardcoding Version-Specific Details

Problem: Skill includes specifics that change

## Tool Permissions

Grant these tools to your agent:
- Read (for reading files)
- Write (for writing files)
- Edit (for editing files)

Why it fails: Tool list may expand, descriptions may change

Better:

## Tool Selection Philosophy

**Match tools to autonomous responsibilities:**

- If agent analyzes only → Read, Grep, Glob
- If agent writes code → Add Write, Edit
- If agent runs commands → Add Bash

Fetch current tool list from:
https://code.claude.com/docs/en/settings#tools-available-to-claude

Pitfall #3: Presenting Opinions as Official Requirements

Problem: Blurs the line between specs and best practices

## Agent Description Field

The description field MUST use strong directive language like "ALWAYS use when" to ensure proper delegation.

Why it fails: Official docs don't require this; it's a best practice opinion

Better:

## Description Field Design (Best Practices)

Official requirement: "Natural language explanation of when to invoke"

**Best practice:** Use specific triggering conditions and directive language to improve autonomous delegation. While not required, this pattern increases the likelihood of proper agent invocation.

Pitfall #4: Kitchen Sink Skills

Problem: One skill tries to cover everything

# Full-Stack Development Skill

This skill covers:
- Frontend frameworks (React, Vue, Angular)
- Backend APIs (Node, Python, Go, Rust)
- Databases (SQL, NoSQL)
- DevOps (Docker, K8s, CI/CD)
- Security best practices
- Testing strategies
...

Why it fails: Too broad, overwhelming, hard to maintain, loads unnecessarily

Better: Split into focused skills:

frontend-architecture
api-design
testing-strategy

Pitfall #5: No Clear Triggering Conditions

Problem: Description doesn't indicate when skill should load

---
name: api-standards
description: API documentation standards
---

Why it fails: Unclear when this skill is relevant

Better:

---
name: api-standards
description: Guidelines for designing and documenting REST APIs following team standards. Use when creating endpoints, reviewing API code, or writing API documentation.
---

Pitfall #6: Documenting Claude's Base Knowledge

Problem: Skill includes comprehensive documentation of tools and workflows Claude already knows from training, creating token waste and maintenance burden without adding value.

Bad example (hypothetical 480-line git-workflow skill):

---
name: git-workflow
description: Comprehensive git usage guide
---

# Git Workflow Skill

## Common Git Operations

**Checking Repository Status:**

```bash
git status  # Shows staged, unstaged, and untracked files

See detailed diff:

git diff  # Unstaged changes
git diff --staged  # Staged changes
git diff HEAD  # All changes

Commit Workflow:

# 1. Review changes
git status
git diff

# 2. Stage changes
git add .

# 3. Commit with message
git commit -m "fix: correct validation logic"

Branch Operations:

git checkout -b feature-name  # Create and switch
git switch main  # Switch to main
git branch  # List branches

[... 400 more lines documenting standard git commands, branching workflows, merge strategies, rebase operations, standard commit message formats, general best practices ...]


**Why it fails:**

- Claude already knows all standard git commands from training
- 95% of content is redundant with base knowledge
- Wastes tokens loading information Claude doesn't need
- Creates maintenance burden (skill needs updates when nothing actually changed)
- Obscures the 5% that's actually valuable (user-specific preferences)
- No behavioral change - Claude would do the same without this skill

**Better (focused 80-line version documenting only the delta):**

```markdown
---
name: git-workflow
description: User-specific git workflow preferences and edge case handling. Use when creating commits or handling pre-commit hook failures.
---

# Git Workflow Skill

This skill documents workflow preferences and edge cases specific to this user. For standard git knowledge, Claude relies on base training.

## Commit Message Requirements (User Preference)

**This user requires:**

- Terse, single-line format (max ~200 characters)
- No emojis or decorative elements
- **No attribution text** (no "Co-Authored-By:", no "Generated with Claude Code")

**Format pattern:** `<type>: <brief specific description>`

**Examples:**

fix: prevent race condition in session cleanup add: rate limiting middleware


**Avoid:**

❌ ✨ add: new feature (emoji) ❌ fix: thing

🤖 Generated with Claude Code (attribution this user doesn't want)


## Pre-Commit Hook Edge Case (Critical)

**Problem:** Pre-commit hooks modify files during commit, causing failure.

**Workflow:**

1. Attempt: `git commit -m "message"`
2. Hook modifies files (auto-format)
3. Commit FAILS (working directory changed)
4. Stage modifications: `git add .`
5. Retry ONCE: `git commit --amend --no-edit`

**Critical:** Only retry ONCE to avoid infinite loops.

## Quality Checklist

- ✓ Message is terse, single-line, no emojis, no attribution
- ✓ No secrets in staged files
- ✓ Prepared for potential hook retry

Key improvements:

✅ Went from 480 lines → 80 lines (83% reduction)
✅ Removed all standard git knowledge Claude already has
✅ Kept only user-specific preferences (commit format, no attribution)
✅ Kept only edge cases Claude would miss (pre-commit hook retry logic)
✅ 100% of content is valuable delta knowledge
✅ Skill actually changes Claude's behavior (would get these things wrong without it)

The delta principle: Skills should only contain knowledge that bridges the gap between what Claude knows and what Claude needs to know for this specific context.

Skill Quality Checklist

Before finalizing a skill:

Structure (based on successful patterns):

✓ Proper frontmatter with name and description
✓ Clear description indicating when skill loads
✓ Filename is SKILL.md (uppercase, not skill.md)
✓ Located in skills/[skill-name]/SKILL.md subdirectory
✓ Single H1 heading matching skill name
✓ Organized with clear H2/H3 hierarchy

Content quality:

✓ Includes "Required Reading" section with official doc URLs
✓ Distinguishes official specs from best practices
✓ Avoids hardcoding version-specific details
✓ Uses examples effectively (before/after, ❌/✅)
✓ Provides decision frameworks
✓ Includes common pitfalls section
✓ Has validation checklist
✓ Cites authoritative sources

Philosophy alignment:

✓ Defers to official docs via WebFetch instructions
✓ Two-layer approach (specs + guidance)
✓ Evidence-based recommendations
✓ Focused scope, not kitchen sink
✓ Interpretive, not duplicative
✓ Progressive disclosure structure

Example: High-Quality Skill Design

Before (hypothetical low-quality skill):

---
name: testing
description: Testing stuff
---

# Testing

Use pytest for Python testing.
Use jest for JavaScript testing.

Make sure to write good tests.

Issues:

❌ Vague description ("testing stuff")
❌ No structure or organization
❌ No official documentation references
❌ Hardcodes specific tools without context
❌ "Write good tests" is not actionable
❌ No decision framework or examples
❌ Too brief to warrant a skill (put in CLAUDE.md)

After (applying skill-design principles):

---
name: testing-strategy
description: Interpretive guidance for test-driven development, test design, and testing workflows. Use when writing tests, reviewing test coverage, or designing testing strategies.
---

# Testing Strategy Skill

This skill provides guidance for effective testing across languages and frameworks.

## Required Reading

Fetch language/framework-specific testing docs:

- **Python/pytest**: https://docs.pytest.org/
- **JavaScript/Jest**: https://jestjs.io/docs/getting-started
- **Go**: https://go.dev/doc/tutorial/add-a-test

## Core Testing Philosophy (Best Practices)

**The Testing Pyramid:**

- Many unit tests (fast, isolated, specific)
- Fewer integration tests (moderate speed, component interaction)
- Few end-to-end tests (slow, full system, critical paths)

**Why:** Balance coverage, speed, and maintenance burden.

## Test Design Principles (Best Practices)

**Arrange-Act-Assert pattern:**

```python
def test_user_registration():
    # Arrange: Set up test data
    user_data = {"email": "test@example.com", "password": "secure123"}

    # Act: Perform the action
    result = register_user(user_data)

    # Assert: Verify outcomes
    assert result.success is True
    assert result.user.email == "test@example.com"

Benefits:

Clear test structure
Easy to understand intent
Maintainable

When to Mock (Best Practices)

Mock when:

External services (APIs, databases)
Time-dependent operations
File system operations
Random/non-deterministic behavior

Don't mock when:

Testing integration between your components
Pure functions with no dependencies
Simple data transformations

Common Pitfalls

Pitfall #1: Testing Implementation Details

Problem: Tests break when refactoring even though behavior unchanged

# Bad: Tests internal structure
def test_user_service():
    service = UserService()
    assert service._internal_cache is not None  # Implementation detail

Better: Test behavior, not structure

# Good: Tests observable behavior
def test_user_service_caches_results():
    service = UserService()
    user1 = service.get_user(123)
    user2 = service.get_user(123)
    assert user1 is user2  # Behavior: caching works

Quality Checklist

✓ Test names clearly describe what's being tested
✓ One assertion concept per test
✓ Tests are independent (can run in any order)
✓ Mocks used appropriately (external dependencies only)
✓ Test data is representative
✓ Edge cases covered
✓ Fast execution (< 1s for unit tests)

Documentation References

Fetch framework-specific docs for syntax
Testing philosophies: https://martinfowler.com/articles/practical-test-pyramid.html


**Improvements:**

- ✅ Specific, actionable description
- ✅ Clear structure with progressive disclosure
- ✅ Defers to official docs for syntax
- ✅ Provides interpretive guidance (when to mock, testing pyramid)
- ✅ Concrete examples with explanations
- ✅ Common pitfalls with before/after
- ✅ Validation checklist
- ✅ Substantial enough to warrant a skill

## Creating Skills for Different Purposes

### Interpretive Guidance Skills (Like Box Factory's Design Skills)

**Purpose:** Help Claude understand how to apply official documentation

**Structure:**
- Fetch official docs first
- Explain what docs mean in practice
- Provide decision frameworks
- Include common gotchas
- Show anti-patterns

**Example:** `agent-design`, `slash-command-design`, `plugin-design`

### Domain Knowledge Skills

**Purpose:** Provide reusable expertise across projects

**Structure:**
- Define principles and philosophy
- Provide decision frameworks
- Include practical examples
- Show common patterns
- Reference external authoritative sources

**Example:** `api-standards`, `security-practices`, `testing-strategy`

### Procedural Skills

**Purpose:** Guide multi-step workflows

**Structure:**
- Clear step-by-step process
- Decision points and branching
- Success criteria
- Common failure modes
- Recovery strategies

**Example:** `deployment-workflow`, `incident-response`, `code-review-checklist`

## File Structure

Skills live in subdirectories within `skills/`:

plugin-name/ ├── skills/ │ ├── skill-one/ │ │ ├── SKILL.md # Required: Skill content │ │ └── helper.py # Optional: Supporting files │ └── skill-two/ │ └── SKILL.md


**Critical:** Filename must be `SKILL.md` (uppercase), not `skill.md`, `Skill.md`, or `skill.MD`

## Testing Skills

**How to verify a skill works:**

1. Use the Skill tool to invoke it
2. Check if it loads in appropriate contexts
3. Verify the guidance is helpful and accurate
4. Test that official doc references are current
5. Ensure examples run as shown

**Signs of a good skill:**

- Claude provides better answers in the skill's domain
- Reduces need to repeat context
- Catches common mistakes proactively
- Loads automatically when relevant

## Documentation References

Skills are part of the agent system but lightweight:

**Official documentation:**

- https://code.claude.com/docs/en/sub-agents.md - Mentions skills briefly

**Examples of excellent skills:**

- Examine Box Factory's design skills (agent-design, slash-command-design, plugin-design, hooks-design) for patterns
- Look for skills in well-maintained plugin marketplaces

**Philosophy resources:**

- Progressive disclosure principles
- Token-efficient context management
- Knowledge organization patterns

**Remember:** This meta-skill itself follows the principles it teaches - it defers to official docs, distinguishes specs from best practices, and provides interpretive guidance rather than duplication. This is the Box Factory way.

25 KiB Raw Blame History

Skill Design Skill

Required Reading Before Creating Skills

Core Understanding

Skills Are Progressive Knowledge Disclosure

Skills vs System Prompts vs CLAUDE.md

Knowledge Delta Filter (Critical Understanding)

The Box Factory Philosophy for Skills

1. Low-Maintenance by Design

2. Two-Layer Approach

3. Evidence-Based Recommendations

Skill Structure (Best Practices)

Frontmatter

Content Organization

Content Quality Standards

When to Create Skills

Skill vs Agent vs Command

Scope Guidelines (Best Practices)

Common Pitfalls

Pitfall #1: Duplicating Official Documentation

Pitfall #2: Hardcoding Version-Specific Details

Pitfall #3: Presenting Opinions as Official Requirements

Pitfall #4: Kitchen Sink Skills

Pitfall #5: No Clear Triggering Conditions

Pitfall #6: Documenting Claude's Base Knowledge

Skill Quality Checklist

Example: High-Quality Skill Design

When to Mock (Best Practices)

Common Pitfalls

Pitfall #1: Testing Implementation Details

Quality Checklist

Documentation References

25 KiB

Raw Blame History