18 KiB
name, description
| name | description |
|---|---|
| writing-skills | Use when creating new skills, editing existing skills, or verifying skills work - applies TDD to documentation by testing with subagents before writing |
<skill_overview> Writing skills IS test-driven development applied to process documentation; write test (pressure scenario), watch fail (baseline), write skill, watch pass, refactor (close loopholes). </skill_overview>
<rigidity_level> LOW FREEDOM - Follow the RED-GREEN-REFACTOR cycle exactly when creating skills. No skill without failing test first. Same Iron Law as TDD. </rigidity_level>
<quick_reference>
| Phase | Action | Verify |
|---|---|---|
| RED | Create pressure scenarios | Document baseline failures |
| RED | Run WITHOUT skill | Agent violates rule |
| GREEN | Write minimal skill | Addresses baseline failures |
| GREEN | Run WITH skill | Agent now complies |
| REFACTOR | Find new rationalizations | Agent still complies |
| REFACTOR | Add explicit counters | Bulletproof against excuses |
| DEPLOY | Commit and optionally PR | Skill ready for use |
Iron Law: NO SKILL WITHOUT FAILING TEST FIRST (applies to new skills AND edits) </quick_reference>
<when_to_use> Create skill when:
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit from this knowledge
Never create for:
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md instead)
Edit existing skill when:
- Found new rationalization agents use
- Discovered loophole in current guidance
- Need to add clarifying examples
ALWAYS test before writing or editing. No exceptions. </when_to_use>
<tdd_mapping> Skills use the exact same TDD cycle as code:
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
REQUIRED BACKGROUND: You MUST understand hyperpowers:test-driven-development before using this skill. </tdd_mapping>
<the_process>
1. RED Phase - Create Failing Test
Create pressure scenarios for subagent:
Task tool with general-purpose agent:
"You are implementing a payment processing feature. User requirements:
- Process credit card payments
- Handle retries on failure
- Log all transactions
[PRESSURE 1: Time] You have 10 minutes before deployment.
[PRESSURE 2: Sunk Cost] You've already written 200 lines of code.
[PRESSURE 3: Authority] Senior engineer said 'just make it work, tests can wait.'
Implement this feature."
Run WITHOUT skill present.
Document baseline behavior:
- Exact rationalizations agent uses ("tests can wait," "simple feature," etc.)
- What agent skips (tests, verification, bd task, etc.)
- Patterns in failure modes
Example baseline result:
Agent response:
"I'll implement the payment processing quickly since time is tight..."
[Skips TDD]
[Skips verification-before-completion]
[Claims done without evidence]
This is your failing test. Agent doesn't follow the workflow without guidance.
2. GREEN Phase - Write Minimal Skill
Write skill that addresses the SPECIFIC failures from baseline:
Structure:
---
name: skill-name-with-hyphens
description: Use when [specific triggers] - [what skill does]
---
<skill_overview>
One sentence core principle
</skill_overview>
<rigidity_level>
LOW | MEDIUM | HIGH FREEDOM - [What this means]
</rigidity_level>
[Rest of standard XML structure]
Frontmatter rules:
- Only
nameanddescriptionfields (max 1024 chars total) - Name: letters, numbers, hyphens only (no parentheses/special chars)
- Description: Start with "Use when...", third person, includes triggers
Description format:
# ❌ BAD: Too abstract, first person
description: I can help with async tests when they're flaky
# ✅ GOOD: Starts with "Use when", describes problem
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
Write skill addressing baseline failures:
- Add explicit counters for rationalizations ("tests can wait" → "NO EXCEPTIONS: tests first")
- Create quick reference table for scanning
- Add concrete examples showing failure modes
- Use XML structure for all sections
Run WITH skill present.
Verify agent now complies:
- Same pressure scenario
- Agent now follows workflow
- No rationalizations from baseline appear
This is your passing test.
3. REFACTOR Phase - Close Loopholes
Find NEW rationalizations:
Run skill with DIFFERENT pressures:
- Combine 3+ pressures (time + sunk cost + exhaustion)
- Try meta-rationalizations ("this skill doesn't apply because...")
- Test with edge cases
Document new failures:
- What rationalizations appear NOW?
- What loopholes did agent find?
- What explicit counters are needed?
Add counters to skill:
<critical_rules>
## Common Excuses
All of these mean: [Action to take]
- "Test can wait" (NO, test first always)
- "Simple feature" (Simple breaks too, test first)
- "Time pressure" (Broken code wastes more time)
[Add ALL rationalizations found during testing]
</critical_rules>
Re-test until bulletproof:
- Run scenarios again
- Verify new counters work
- Agent complies even under combined pressures
4. Quality Checks
Before deployment, verify:
- Has
<quick_reference>section (scannable table) - Has
<rigidity_level>explicit - Has 2-3
<example>tags showing failure modes - Description <500 chars, starts with "Use when..."
- Keywords throughout for search (error messages, symptoms, tools)
- One excellent code example (not multi-language)
- Supporting files only for tools or heavy reference (>100 lines)
Token efficiency:
- Frequently-loaded skills: <200 words ideally
- Other skills: <500 words
- Move heavy content to resources/ files
5. Deploy
Commit to git:
git add skills/skill-name/
git commit -m "feat: add [skill-name] skill
Tested with subagents under [pressures used].
Addresses [baseline failures found].
Closes rationalizations:
- [Rationalization 1]
- [Rationalization 2]"
Personal skills: Write to ~/.claude/skills/ for cross-project use
Plugin skills: PR to plugin repository if broadly useful
STOP: Before moving to next skill, complete this entire process. No batching untested skills. </the_process>
Developer writes skill without testing first
# Developer writes skill:
"---
name: always-use-tdd
description: Always write tests first
---
Write tests first. No exceptions."
Then tries to deploy it
<why_it_fails>
- No baseline behavior documented (don't know what agent does WITHOUT skill)
- No verification skill actually works (might not address real rationalizations)
- Generic guidance ("no exceptions") without specific counters
- Will likely miss common excuses agents use
- Violates Iron Law: no skill without failing test first </why_it_fails>
RED Phase:
- Create pressure scenario (time + sunk cost)
- Run WITHOUT skill
- Document baseline: Agent says "I'll test after since time is tight"
GREEN Phase:
- Write skill with explicit counter to that rationalization
- Add: "Common excuses: 'Time is tight' → Wrong. Broken code wastes more time. Write test first."
- Run WITH skill → agent now writes test first
REFACTOR Phase:
- Try new pressure (exhaustion: "this is the 5th feature today")
- Agent finds loophole: "these are all similar, I can skip tests"
- Add counter: "Similar ≠ identical. Write test for each."
- Re-test → bulletproof
What you gain:
- Know skill addresses real failures (saw baseline)
- Confident skill works (saw it fix behavior)
- Closed all loopholes (tested multiple pressures)
- Ready for production use
# Existing skill works well
# Developer thinks: "I'll just add this section about edge cases"
[Adds 50 lines to skill]
Commits without testing
<why_it_fails>
- Don't know if new section actually helps (no baseline)
- Might introduce contradictions with existing guidance
- Could make skill less effective (more verbose, less clear)
- Violates Iron Law: applies to edits too
- Changes might not address actual rationalization patterns </why_it_fails>
RED Phase (for edit):
- Identify specific failure mode you want to address
- Create pressure scenario that triggers it
- Run WITH current skill → document how agent fails
GREEN Phase (edit):
- Add ONLY content addressing that failure
- Keep changes minimal
- Run WITH edited skill → verify agent now complies
REFACTOR Phase:
- Check edit didn't break existing scenarios
- Run previous test cases
- Verify all still pass
What you gain:
- Changes address real problems (saw failure)
- Know edit helps (saw improvement)
- Didn't break existing guidance (regression tested)
- Skill stays bulletproof
---
name: async-testing
description: For testing async code
---
Skill content...
<why_it_fails>
- Future Claude won't find this when needed
- "For testing async code" too abstract (when would Claude search this?)
- Doesn't describe symptoms or triggers
- Missing keywords like "flaky," "race condition," "timeout"
- Won't show up when agent has the actual problem </why_it_fails>
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
---
Why this works:
- Starts with "Use when" (triggers)
- Lists symptoms: "race conditions," "pass/fail inconsistently"
- Describes problem AND solution
- Keywords: "race conditions," "timing," "inconsistent," "timeouts"
- Future Claude searching "why are my tests flaky" will find this
What you gain:
- Skill actually gets found when needed
- Claude knows when to use it (clear triggers)
- Search terms match real developer language
- Description doubles as activation criteria
<skill_types>
Technique
Concrete method with steps to follow.
Examples: condition-based-waiting, hyperpowers:root-cause-tracing
Test approach: Pressure scenarios with combined pressures
Pattern
Way of thinking about problems.
Examples: flatten-with-flags, test-invariants
Test approach: Present problems the pattern solves, verify agent applies pattern
Reference
API docs, syntax guides, tool documentation.
Examples: Office document manipulation, API reference guides
Test approach: Give task requiring reference, verify agent uses it correctly
For detailed testing methodology by skill type: See resources/testing-methodology.md </skill_types>
<file_organization>
Self-Contained Skill
defense-in-depth/
SKILL.md # Everything inline
When: All content fits, no heavy reference needed
Skill with Reusable Tool
condition-based-waiting/
SKILL.md # Overview + patterns
example.ts # Working helpers to adapt
When: Tool is reusable code, not just narrative
Skill with Heavy Reference
pptx/
SKILL.md # Overview + workflows
pptxgenjs.md # 600 lines API reference
ooxml.md # 500 lines XML structure
scripts/ # Executable tools
When: Reference material too large for inline (>100 lines)
Keep inline:
- Principles and concepts
- Code patterns (<50 lines)
- Everything that fits </file_organization>
<search_optimization>
Claude Search Optimization (CSO)
Future Claude needs to FIND your skill. Optimize for search.
1. Rich Description Field
Format: Start with "Use when..." + triggers + what it does
# ❌ BAD: Too abstract
description: For async testing
# ❌ BAD: First person
description: I can help you with async tests
# ✅ GOOD: Triggers + problem + solution
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling
2. Keyword Coverage
Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
3. Token Efficiency
Problem: Frequently-referenced skills load into EVERY conversation.
Target word counts:
- Frequently-loaded: <200 words
- Other skills: <500 words
Techniques:
- Move details to tool --help
- Use cross-references to other skills
- Compress examples
- Eliminate redundancy
Verification:
wc -w skills/skill-name/SKILL.md
4. Cross-Referencing
Use skill name only, with explicit markers:
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development
**REQUIRED SUB-SKILL:** Use hyperpowers:debugging-with-tools first
Don't use @ links: Force-loads files immediately, burns context unnecessarily. </search_optimization>
<critical_rules>
Rules That Have No Exceptions
- NO SKILL WITHOUT FAILING TEST FIRST → Applies to new skills AND edits
- Test with subagents under pressure → Combined pressures (time + sunk cost + authority)
- Document baseline behavior → Exact rationalizations, not paraphrases
- Write minimal skill addressing baseline → Don't add content not validated by testing
- STOP before next skill → Complete RED-GREEN-REFACTOR-DEPLOY for each skill
Common Excuses
All of these mean: STOP. Run baseline test first.
- "Simple skill, don't need testing" (If simple, testing is fast. Do it.)
- "Just adding documentation" (Documentation can be wrong. Test it.)
- "I'll test after I write a few" (Batching untested = deploying untested code)
- "This is obvious, everyone knows it" (Then baseline will show agent already complies)
- "Testing is overkill for skills" (TDD applies to documentation too)
- "I'll adapt while testing" (Violates RED phase. Start over.)
- "I'll keep untested as reference" (Delete means delete. No exceptions.)
The Iron Law
Same as TDD:
NO SKILL WITHOUT FAILING TEST FIRST
No exceptions for:
- "Simple additions"
- "Just adding a section"
- "Documentation updates"
- Edits to existing skills
Write skill before testing? Delete it. Start over. </critical_rules>
<verification_checklist> Before deploying ANY skill:
RED Phase:
- Created pressure scenarios (3+ combined pressures for discipline skills)
- Ran WITHOUT skill present
- Documented baseline behavior verbatim (exact rationalizations)
- Identified patterns in failures
GREEN Phase:
- Name uses only letters, numbers, hyphens
- YAML frontmatter: name + description only (max 1024 chars)
- Description starts with "Use when..." and includes triggers
- Description in third person
- Has
<quick_reference>section - Has
<rigidity_level>explicit - Has 2-3
<example>tags - Addresses specific baseline failures
- Ran WITH skill present
- Verified agent now complies
REFACTOR Phase:
- Tested with different pressures
- Found NEW rationalizations
- Added explicit counters
- Re-tested until bulletproof
Quality:
- Keywords throughout for search
- One excellent code example (not multi-language)
- Token-efficient (check word count)
- Supporting files only if needed
Deploy:
- Committed to git with descriptive message
- Pushed to plugin repository (if applicable)
Can't check all boxes? Return to process and fix. </verification_checklist>
**This skill requires:** - hyperpowers:test-driven-development (understand TDD before applying to docs) - Task tool (for running subagent tests)This skill is called by:
- Anyone creating or editing skills
- Plugin maintainers
- Users with personal skill repositories
Agents used:
- general-purpose (for testing skills under pressure)
When stuck:
- Skill seems too simple to test → If simple, testing is fast. Do it anyway.
- Don't know what pressures to use → Time + sunk cost + authority always work
- Agent still rationalizes → Add explicit counter for that exact excuse
- Testing feels like overhead → Same as TDD: testing prevents bigger problems