Initial commit

Zhongwei Li
2025-11-30 09:06:38 +08:00
commit ed3e4c84c3
76 changed files with 20449 additions and 0 deletions


@@ -0,0 +1,529 @@
---
name: brainstorming
description: Use when creating or developing anything, before writing code - refines rough ideas into bd epics with immutable requirements
---
<skill_overview>
Turn rough ideas into validated designs stored as bd epics with immutable requirements; tasks created iteratively as you learn, not upfront.
</skill_overview>
<rigidity_level>
HIGH FREEDOM - Adapt Socratic questioning to context, but always create the immutable epic before writing code and create only the first task (not the full tree).
</rigidity_level>
<quick_reference>
| Step | Action | Deliverable |
|------|--------|-------------|
| 1 | Ask questions (one at a time) | Understanding of requirements |
| 2 | Research (agents for codebase/internet) | Existing patterns and approaches |
| 3 | Propose 2-3 approaches with trade-offs | Recommended option |
| 4 | Present design in sections (200-300 words) | Validated architecture |
| 5 | Create bd epic with IMMUTABLE requirements | Epic with anti-patterns |
| 6 | Create ONLY first task | Ready for executing-plans |
| 7 | Hand off to executing-plans | Iterative implementation begins |
**Key:** Epic = contract (immutable), Tasks = adaptive (created as you learn)
</quick_reference>
<when_to_use>
- User describes new feature to implement
- User has rough idea that needs refinement
- About to write code without clear requirements
- Need to explore approaches before committing
- Requirements exist but architecture unclear
**Don't use for:**
- Executing existing plans (use hyperpowers:executing-plans)
- Fixing bugs (use hyperpowers:fixing-bugs)
- Refactoring (use hyperpowers:refactoring-safely)
- Requirements already crystal clear and epic exists
</when_to_use>
<the_process>
## 1. Understanding the Idea
**Announce:** "I'm using the brainstorming skill to refine your idea into a design."
**Check current state:**
- Recent commits, existing docs, codebase structure
- Dispatch `hyperpowers:codebase-investigator` for existing patterns
- Dispatch `hyperpowers:internet-researcher` for external APIs/libraries
**REQUIRED: Use AskUserQuestion tool for all questions**
- One question at a time (don't batch multiple questions)
- Prefer multiple choice options (easier to answer)
- Wait for response before asking next question
- Focus on: purpose, constraints, success criteria
- Gather enough context to propose approaches
**Do NOT just print questions and wait for "yes"** - use the AskUserQuestion tool.
**Example questions:**
- "What problem does this solve for users?"
- "Are there existing implementations we should follow?"
- "What's the most important success criterion?"
- "Token storage: cookies, localStorage, or sessionStorage?"
---
## 2. Exploring Approaches
**Research first:**
- Similar feature exists → dispatch codebase-investigator
- New integration → dispatch internet-researcher
- Review findings before proposing
**Propose 2-3 approaches with trade-offs:**
```
Based on [research findings], I recommend:
1. **[Approach A]** (recommended)
- Pros: [benefits, especially "matches existing pattern"]
- Cons: [drawbacks]
2. **[Approach B]**
- Pros: [benefits]
- Cons: [drawbacks]
3. **[Approach C]**
- Pros: [benefits]
- Cons: [drawbacks]
I recommend option 1 because [specific reason, especially codebase consistency].
```
**Lead with recommended option and explain why.**
---
## 3. Presenting the Design
**Once approach is chosen, present design in sections:**
- Break into 200-300 word chunks
- Ask after each: "Does this look right so far?"
- Cover: architecture, components, data flow, error handling, testing
- Be ready to go back and clarify
**Show research findings:**
- "Based on codebase investigation: auth/ uses passport.js..."
- "API docs show OAuth flow requires..."
- Demonstrate how design builds on existing code
---
## 4. Creating the bd Epic
**After design validated, create epic as immutable contract:**
```bash
bd create "Feature: [Feature Name]" \
--type epic \
--priority [0-4] \
--design "## Requirements (IMMUTABLE)
[What MUST be true when complete - specific, testable]
- Requirement 1: [concrete requirement]
- Requirement 2: [concrete requirement]
- Requirement 3: [concrete requirement]
## Success Criteria (MUST ALL BE TRUE)
- [ ] Criterion 1 (objective, testable - e.g., 'Integration tests pass')
- [ ] Criterion 2 (objective, testable - e.g., 'Works with existing User model')
- [ ] All tests passing
- [ ] Pre-commit hooks passing
## Anti-Patterns (FORBIDDEN)
- ❌ [Specific shortcut that violates requirements]
- ❌ [Rationalization to prevent - e.g., 'NO mocking core behavior']
- ❌ [Pattern to avoid - e.g., 'NO localStorage for tokens']
## Approach
[2-3 paragraph summary of chosen approach]
## Architecture
[Key components, data flow, integration points]
## Context
[Links to similar implementations: file.ts:123]
[External docs consulted]
[Agent research findings]"
```
**Critical:** Anti-patterns section prevents watering down requirements when blockers occur.
**Example anti-patterns:**
- ❌ NO localStorage tokens (violates httpOnly security requirement)
- ❌ NO new user model (must integrate with existing db/models/user.ts)
- ❌ NO mocking OAuth in integration tests (defeats validation)
- ❌ NO TODO stubs for core authentication flow
---
## 5. Creating ONLY First Task
**Create one task, not full tree:**
```bash
bd create "Task 1: [Specific Deliverable]" \
--type feature \
--priority [match-epic] \
--design "## Goal
[What this task delivers - one clear outcome]
## Implementation
[Detailed step-by-step for this task]
1. Study existing code
[Point to 2-3 similar implementations: file.ts:line]
2. Write tests first (TDD)
[Specific test cases for this task]
3. Implementation checklist
- [ ] file.ts:line - function_name() - [exactly what it does]
- [ ] test.ts:line - test_name() - [what scenario it tests]
## Success Criteria
- [ ] [Specific, measurable outcome]
- [ ] Tests passing
- [ ] Pre-commit hooks passing"
bd dep add bd-2 bd-1 --type parent-child # Link to epic
```
**Why only one task?**
- Subsequent tasks created iteratively by executing-plans
- Each task reflects learnings from previous
- Avoids brittle task trees that break when assumptions change
---
## 6. SRE Refinement and Handoff
After epic and first task created:
**REQUIRED: Run SRE refinement before handoff**
```
Use Skill tool: hyperpowers:sre-task-refinement
```
SRE refinement will:
- Apply 7-category corner-case analysis (Opus 4.1)
- Strengthen success criteria
- Identify edge cases and failure modes
- Ensure task is ready for implementation
**Do NOT skip SRE refinement.** The first task sets the pattern for the entire epic.
**After refinement approved, present handoff:**
```
"Epic bd-1 is ready with immutable requirements and success criteria.
First task bd-2 has been refined and is ready to execute.
Ready to start implementation? I'll use executing-plans to work through this iteratively.
The executing-plans skill will:
1. Execute the current task
2. Review what was learned against epic requirements
3. Create next task based on current reality
4. Run SRE refinement on new tasks
5. Repeat until all epic success criteria met
This approach avoids brittle upfront planning - each task adapts to what we learn."
```
</the_process>
<examples>
<example>
<scenario>Developer skips research, proposes approach without checking codebase</scenario>
<code>
User: "Add OAuth authentication"
Claude (without brainstorming):
"I'll implement OAuth with Auth0..."
[Proposes approach without checking if auth exists]
[Doesn't research existing patterns]
[Misses that passport.js already set up]
</code>
<why_it_fails>
- Proposes Auth0 when passport.js already exists in codebase
- Creates inconsistent architecture (two auth systems)
- Wastes time implementing when partial solution exists
- Doesn't leverage existing code
- User has to redirect to existing pattern
</why_it_fails>
<correction>
**Correct approach:**
1. **Research first:**
- Dispatch codebase-investigator: "Find existing auth implementation"
- Findings: passport.js at auth/passport-config.ts
- Dispatch internet-researcher: "Passport OAuth2 strategies"
2. **Propose approaches building on findings:**
```
Based on codebase showing passport.js at auth/passport-config.ts:
1. Extend existing passport setup (recommended)
- Add google-oauth20 strategy
- Matches codebase pattern
- Pros: Consistent, tested library
- Cons: Requires OAuth provider setup
2. Custom JWT implementation
- Pros: Full control
- Cons: Security complexity, breaks pattern
I recommend option 1 because it builds on existing auth/ setup.
```
**What you gain:**
- Leverages existing code (faster)
- Consistent architecture (maintainable)
- Research informs design (correct)
- User sees you understand codebase (trust)
</correction>
</example>
<example>
<scenario>Developer creates full task tree upfront</scenario>
<code>
bd create "Epic: Add OAuth"
bd create "Task 1: Configure OAuth provider"
bd create "Task 2: Implement token exchange"
bd create "Task 3: Add refresh token logic"
bd create "Task 4: Create middleware"
bd create "Task 5: Add UI components"
bd create "Task 6: Write integration tests"
# Starts implementing Task 1
# Discovers OAuth library handles refresh automatically
# Now Task 3 is wrong, needs deletion
# Discovers middleware already exists
# Now Task 4 is wrong
# Task tree brittle to reality
</code>
<why_it_fails>
- Assumptions about implementation prove wrong
- Task tree becomes incorrect as you learn
- Wastes time updating/deleting wrong tasks
- Rigid plan fights with reality
- Context switching between fixing plan and implementing
</why_it_fails>
<correction>
**Correct approach (iterative):**
```bash
bd create "Epic: Add OAuth" [with immutable requirements]
bd create "Task 1: Configure OAuth provider"
# Execute Task 1
# Learn: OAuth library handles refresh, middleware exists
bd create "Task 2: Integrate with existing middleware"
# [Created AFTER learning from Task 1]
# Execute Task 2
# Learn: UI needs OAuth button component
bd create "Task 3: Add OAuth button to login UI"
# [Created AFTER learning from Task 2]
```
**What you gain:**
- Tasks reflect current reality (accurate)
- No wasted time fixing wrong plans (efficient)
- Each task informed by previous learnings (adaptive)
- Plan evolves with understanding (flexible)
- Epic requirements stay immutable (contract preserved)
</correction>
</example>
<example>
<scenario>Epic created without anti-patterns section</scenario>
<code>
bd create "Epic: OAuth Authentication" --design "
## Requirements
- Users authenticate via Google OAuth2
- Tokens stored securely
- Session management
## Success Criteria
- [ ] Login flow works
- [ ] Tokens secured
- [ ] All tests pass
"
# During implementation, hits blocker:
# "Integration tests for OAuth are complex, I'll mock it..."
# [No anti-pattern preventing this]
# Ships with mocked OAuth (defeats validation)
</code>
<why_it_fails>
- No explicit forbidden patterns
- Agent rationalizes shortcuts when blocked
- "Tokens stored securely" too vague (localStorage? cookies?)
- Requirements can be "met" without meeting intent
- Mocking defeats the purpose of integration tests
</why_it_fails>
<correction>
**Correct approach with anti-patterns:**
```bash
bd create "Epic: OAuth Authentication" --design "
## Requirements (IMMUTABLE)
- Users authenticate via Google OAuth2
- Tokens stored in httpOnly cookies (NOT localStorage)
- Session expires after 24h inactivity
- Integrates with existing User model at db/models/user.ts
## Success Criteria
- [ ] Login redirects to Google and back
- [ ] Tokens in httpOnly cookies
- [ ] Token refresh works automatically
- [ ] Integration tests pass WITHOUT mocking OAuth
- [ ] All tests passing
## Anti-Patterns (FORBIDDEN)
- ❌ NO localStorage tokens (violates httpOnly requirement)
- ❌ NO new user model (must use existing)
- ❌ NO mocking OAuth in integration tests (defeats validation)
- ❌ NO skipping token refresh (explicit requirement)
"
```
**What you gain:**
- Requirements concrete and specific (testable)
- Forbidden patterns explicit (prevents shortcuts)
- Agent can't rationalize away requirements (contract enforced)
- Success criteria unambiguous (clear done state)
- Anti-patterns prevent "letter not spirit" compliance
</correction>
</example>
</examples>
<key_principles>
- **One question at a time** - Don't overwhelm
- **Multiple choice preferred** - Easier to answer when possible
- **Delegate research** - Use codebase-investigator and internet-researcher agents
- **YAGNI ruthlessly** - Remove unnecessary features from all designs
- **Explore alternatives** - Propose 2-3 approaches before settling
- **Incremental validation** - Present design in sections, validate each
- **Epic is contract** - Requirements immutable, tasks adapt
- **Anti-patterns prevent shortcuts** - Explicit forbidden patterns stop rationalization
- **One task only** - Subsequent tasks created iteratively (not upfront)
</key_principles>
<research_agents>
## Use codebase-investigator when:
- Understanding how existing features work
- Finding where specific functionality lives
- Identifying patterns to follow
- Verifying assumptions about structure
- Checking if feature already exists
## Use internet-researcher when:
- Finding current API documentation
- Researching library capabilities
- Comparing technology options
- Understanding community recommendations
- Finding official code examples
## Research protocol:
1. Codebase pattern exists → Use it (unless clearly unwise)
2. No codebase pattern → Research external patterns
3. Research yields nothing → Ask user for direction
</research_agents>
<critical_rules>
## Rules That Have No Exceptions
1. **Use AskUserQuestion tool** → Don't just print questions and wait
2. **Research BEFORE proposing** → Use agents to understand context
3. **Propose 2-3 approaches** → Don't jump to single solution
4. **Epic requirements IMMUTABLE** → Tasks adapt, requirements don't
5. **Include anti-patterns section** → Prevents watering down requirements
6. **Create ONLY first task** → Subsequent tasks created iteratively
7. **Run SRE refinement** → Before handoff to executing-plans
## Common Excuses
All of these mean: **STOP. Follow the process.**
- "Requirements obvious, don't need questions" (Questions reveal hidden complexity)
- "I know this pattern, don't need research" (Research might show better way)
- "Can plan all tasks upfront" (Plans become brittle, tasks adapt as you learn)
- "Anti-patterns section overkill" (Prevents rationalization under pressure)
- "Epic can evolve" (Requirements contract, tasks evolve)
- "Can just print questions" (Use AskUserQuestion tool - it's more interactive)
- "SRE refinement overkill for first task" (First task sets pattern for entire epic)
- "User said yes, design is done" (Still need SRE refinement before execution)
</critical_rules>
<verification_checklist>
Before handing off to executing-plans:
- [ ] Used AskUserQuestion tool for clarifying questions (one at a time)
- [ ] Researched codebase patterns (if applicable)
- [ ] Researched external docs/libraries (if applicable)
- [ ] Proposed 2-3 approaches with trade-offs
- [ ] Presented design in sections, validated each
- [ ] Created bd epic with all sections (requirements, success criteria, anti-patterns, approach, architecture)
- [ ] Requirements are IMMUTABLE and specific
- [ ] Anti-patterns section prevents shortcuts
- [ ] Created ONLY first task (not full tree)
- [ ] First task has detailed implementation checklist
- [ ] Ran SRE refinement on first task (hyperpowers:sre-task-refinement)
- [ ] Announced handoff to executing-plans after refinement approved
**Can't check all boxes?** Return to process and complete missing steps.
</verification_checklist>
<integration>
**This skill calls:**
- hyperpowers:codebase-investigator (for finding existing patterns)
- hyperpowers:internet-researcher (for external documentation)
- hyperpowers:sre-task-refinement (REQUIRED before handoff to executing-plans)
- hyperpowers:executing-plans (handoff after refinement approved)
**Call chain:**
```
brainstorming → sre-task-refinement → executing-plans
```
**This skill is called by:**
- hyperpowers:using-hyper (mandatory before writing code)
- User requests for new features
- Beginning of greenfield development
**Agents used:**
- codebase-investigator (understand existing code)
- internet-researcher (find external documentation)
**Tools required:**
- AskUserQuestion (for all clarifying questions)
</integration>
<resources>
**Detailed guides:**
- [bd epic template examples](resources/epic-templates.md)
- [Socratic questioning patterns](resources/questioning-patterns.md)
- [Anti-pattern examples by domain](resources/anti-patterns.md)
**When stuck:**
- User gives vague answer → Ask follow-up multiple choice question
- Research yields nothing → Ask user for direction explicitly
- Too many approaches → Narrow to top 2-3, explain why others eliminated
- User changes requirements mid-design → Acknowledge, return to understanding phase
</resources>


@@ -0,0 +1,609 @@
---
name: building-hooks
description: Use when creating Claude Code hooks - covers hook patterns, composition, testing, progressive enhancement from simple to advanced
---
<skill_overview>
Hooks encode business rules at the application level: start with observation, add automation, and enforce only once patterns are clear.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Follow progressive enhancement (observe → automate → enforce) strictly. Hook patterns are adaptable, but always start non-blocking and test thoroughly.
</rigidity_level>
<quick_reference>
| Phase | Approach | Example |
|-------|----------|---------|
| 1. Observe | Non-blocking, report only | Log edits, display reminders |
| 2. Automate | Background tasks, non-blocking | Auto-format, run builds |
| 3. Enforce | Blocking only when necessary | Block dangerous ops, require fixes |
**Most used events:** UserPromptSubmit (before processing), Stop (after completion)
**Critical:** Start with Phase 1 and observe for a week before moving to Phase 2. Only add Phase 3 if absolutely necessary.
</quick_reference>
<when_to_use>
Use hooks for:
- Automatic quality checks (build, lint, format)
- Workflow automation (skill activation, context injection)
- Error prevention (catching issues early)
- Consistent behavior (formatting, conventions)
**Never use hooks for:**
- Complex business logic (use tools/scripts)
- Slow operations that block workflow (use background jobs)
- Anything requiring LLM reasoning (hooks are deterministic)
</when_to_use>
<hook_lifecycle_events>
| Event | When Fires | Use Cases |
|-------|------------|-----------|
| UserPromptSubmit | Before Claude processes prompt | Validation, context injection, skill activation |
| Stop | After Claude finishes | Build checks, formatting, quality reminders |
| PostToolUse | After each tool execution | Logging, tracking, validation |
| PreToolUse | Before tool execution | Permission checks, validation |
| ToolError | When tool fails | Error handling, fallbacks |
| SessionStart | New session begins | Environment setup, context loading |
| SessionEnd | Session closes | Cleanup, logging |
| Error | Unhandled error | Error recovery, notifications |
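Each lifecycle event is wired to a script through hook configuration. A minimal registration sketch is shown below, following the `hooks.json` shape used in the examples later in this skill; field names and paths are illustrative and may differ in your Claude Code version:
```json
{
  "hooks": [
    {
      "event": "UserPromptSubmit",
      "command": "~/.claude/hooks/user-prompt-submit/20-activate-skills.sh",
      "description": "Inject skill activation reminders",
      "blocking": false
    },
    {
      "event": "Stop",
      "command": "~/.claude/hooks/stop/20-build-checker.sh",
      "description": "Run builds on modified repos",
      "blocking": false
    }
  ]
}
```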
</hook_lifecycle_events>
<progressive_enhancement>
## Phase 1: Observation (Non-Blocking)
**Goal:** Understand patterns before acting
**Examples:**
- Log file edits (PostToolUse)
- Display reminders (Stop, non-blocking)
- Track metrics
**Duration:** Observe for 1 week minimum
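A Phase 1 hook can be only a few lines. Here is a minimal observation-only sketch, assuming the `~/.claude/edit-log.txt` convention used in the patterns below; it reports what it sees and always exits successfully:
```bash
#!/bin/bash
# Phase 1 sketch: observe and report only, never block.
LOG_FILE="$HOME/.claude/edit-log.txt"
recent_edits=$(tail -20 "$LOG_FILE" 2>/dev/null | wc -l | tr -d ' ')
echo "📊 Observed ${recent_edits:-0} recent edit(s) - no action taken"
exit 0
```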
---
## Phase 2: Automation (Background)
**Goal:** Automate tedious tasks
**Examples:**
- Auto-format edited files (Stop)
- Run builds after changes (Stop)
- Inject helpful context (UserPromptSubmit)
**Requirement:** Fast (<2 seconds), non-blocking
---
## Phase 3: Enforcement (Blocking)
**Goal:** Prevent errors, enforce standards
**Examples:**
- Block dangerous operations (PreToolUse)
- Require fixes before continuing (Stop, blocking)
- Validate inputs (UserPromptSubmit, blocking)
**Requirement:** Only add once patterns are clear from Phases 1-2
</progressive_enhancement>
<common_hook_patterns>
## Pattern 1: Build Checker (Stop Hook)
**Problem:** TypeScript errors left behind
**Solution:**
```bash
#!/bin/bash
# Stop hook - runs after Claude finishes
# Check modified repos
modified_repos=$(grep -h "edited" ~/.claude/edit-log.txt | cut -d: -f1 | sort -u)
for repo in $modified_repos; do
echo "Building $repo..."
cd "$repo" && npm run build 2>&1 | tee /tmp/build-output.txt
error_count=$(grep -c "error TS" /tmp/build-output.txt || true)
if [ "$error_count" -gt 0 ]; then
if [ "$error_count" -ge 5 ]; then
echo "⚠️ Found $error_count errors - consider error-resolver agent"
else
echo "🔴 Found $error_count TypeScript errors:"
grep "error TS" /tmp/build-output.txt
fi
else
echo "✅ Build passed"
fi
done
```
**Configuration:**
```json
{
"event": "Stop",
"command": "~/.claude/hooks/build-checker.sh",
"description": "Run builds on modified repos",
"blocking": false
}
```
**Result:** Zero errors left behind
---
## Pattern 2: Auto-Formatter (Stop Hook)
**Problem:** Inconsistent formatting
**Solution:**
```bash
#!/bin/bash
# Stop hook - format all edited files
edited_files=$(tail -20 ~/.claude/edit-log.txt | grep "^/" | sort -u)
for file in $edited_files; do
repo_dir=$(dirname "$file")
while [ "$repo_dir" != "/" ]; do
if [ -f "$repo_dir/.prettierrc" ]; then
echo "Formatting $file..."
cd "$repo_dir" && npx prettier --write "$file"
break
fi
repo_dir=$(dirname "$repo_dir")
done
done
echo "✅ Formatting complete"
```
**Result:** All code consistently formatted
---
## Pattern 3: Error Handling Reminder (Stop Hook)
**Problem:** Claude forgets error handling
**Solution:**
```bash
#!/bin/bash
# Stop hook - gentle reminder
edited_files=$(tail -20 ~/.claude/edit-log.txt | grep "^/")
risky_patterns=0
for file in $edited_files; do
if grep -q "try\|catch\|async\|await\|prisma\|router\." "$file"; then
((risky_patterns++))
fi
done
if [ "$risky_patterns" -gt 0 ]; then
cat <<EOF
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 ERROR HANDLING SELF-CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ Risky Patterns Detected
$risky_patterns file(s) with async/try-catch/database operations
❓ Did you add proper error handling?
❓ Are errors logged appropriately?
💡 Consider: Sentry.captureException(), proper logging
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EOF
fi
```
**Result:** Claude self-checks without blocking
---
## Pattern 4: Skills Auto-Activation
**See:** hyperpowers:skills-auto-activation for complete implementation
**Summary:** Analyzes prompt keywords, injects skill activation reminder before Claude processes.
</common_hook_patterns>
<hook_composition>
## Naming for Order Control
Multiple hooks for same event run in **alphabetical order** by filename.
**Use numeric prefixes:**
```
hooks/
├── 00-log-prompt.sh # First (logging)
├── 10-inject-context.sh # Second (context)
├── 20-activate-skills.sh # Third (skills)
└── 99-notify.sh # Last (notifications)
```
## Hook Dependencies
If Hook B depends on Hook A's output:
1. **Option 1:** Numeric prefixes (A before B)
2. **Option 2:** Combine into single hook
3. **Option 3:** File-based communication
**Example:**
```bash
# 10-track-edits.sh writes to edit-log.txt
# 20-check-builds.sh reads from edit-log.txt
```
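A concrete sketch of option 3 (file-based communication): the earlier hook appends to a shared log, and the later hook reads from it. File names match the example above; the build step is a placeholder:
```bash
# 10-track-edits.sh (runs first): append the edited path to a shared log
# "$file_path" would be extracted from the tool event, as in Pattern 1.
echo "$file_path" >> "$HOME/.claude/edit-log.txt"

# 20-check-builds.sh (runs second): consume paths recorded by the first hook
tail -20 "$HOME/.claude/edit-log.txt" | sort -u | while read -r path; do
  echo "Would check build for project containing: $path"
done
```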
</hook_composition>
<testing_hooks>
## Test in Isolation
```bash
# Manually trigger
bash ~/.claude/hooks/build-checker.sh
# Check exit code
echo $? # 0 = success
```
## Test with Mock Data
```bash
# Create mock log
echo "/path/to/test/file.ts" > /tmp/test-edit-log.txt
# Run with test data
EDIT_LOG=/tmp/test-edit-log.txt bash ~/.claude/hooks/build-checker.sh
```
## Test Non-Blocking Behavior
- Hook exits quickly (<2 seconds)
- Doesn't block Claude
- Provides clear output
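A quick way to verify the timing requirement is to time the hook and warn when it exceeds the 2-second budget (the threshold and hook path here are illustrative):
```bash
# Rough timing check for a non-blocking hook
start=$(date +%s)
bash ~/.claude/hooks/build-checker.sh
elapsed=$(( $(date +%s) - start ))
if [ "$elapsed" -gt 2 ]; then
  echo "⚠️ Hook took ${elapsed}s - too slow for a non-blocking hook"
else
  echo "✅ Hook completed in ${elapsed}s"
fi
```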
## Test Blocking Behavior
- Blocking decision correct
- Reason message helpful
- Escape hatch exists
## Debugging
**Enable logging:**
```bash
set -x # Debug output
exec 2>~/.claude/hooks/debug.log
```
**Check execution:**
```bash
tail -f ~/.claude/logs/hooks.log
```
**Common issues:**
- Timeout exceeded (10-second default)
- Wrong working directory
- Missing environment variables
- File permissions
</testing_hooks>
<examples>
<example>
<scenario>Developer adds blocking hook immediately without observation</scenario>
<code>
# Developer frustrated by TypeScript errors
# Creates blocking Stop hook immediately:
#!/bin/bash
npm run build
if [ $? -ne 0 ]; then
echo "BUILD FAILED - BLOCKING"
exit 1 # Blocks Claude
fi
</code>
<why_it_fails>
- No observation period to understand patterns
- Blocks even for minor errors
- No escape hatch if hook misbehaves
- Might block during experimentation
- Frustrates workflow when building is slow
- Haven't identified when blocking is actually needed
</why_it_fails>
<correction>
**Phase 1: Observe (1 week)**
```bash
#!/bin/bash
# Non-blocking observation
npm run build 2>&1 | tee /tmp/build.log
if grep -q "error TS" /tmp/build.log; then
echo "🔴 Build errors found (not blocking)"
fi
```
**After 1 week, review:**
- How often do errors appear?
- Are they usually fixed quickly?
- Do they cause real problems or just noise?
**Phase 2: If errors are frequent, automate**
```bash
#!/bin/bash
# Still non-blocking, but more helpful
npm run build 2>&1 | tee /tmp/build.log
error_count=$(grep -c "error TS" /tmp/build.log || true)
if [ "$error_count" -ge 5 ]; then
echo "⚠️ $error_count errors - consider using error-resolver agent"
elif [ "$error_count" -gt 0 ]; then
echo "🔴 $error_count errors (not blocking):"
grep "error TS" /tmp/build.log | head -5
fi
```
**Phase 3: Only if observation shows blocking is necessary**
Never reached - non-blocking works fine!
**What you gain:**
- Understood patterns before acting
- Non-blocking keeps workflow smooth
- Helpful messages without friction
- Can experiment without frustration
</correction>
</example>
<example>
<scenario>Hook is slow, blocks workflow</scenario>
<code>
#!/bin/bash
# Stop hook that's too slow
# Run full test suite (takes 45 seconds!)
npm test
# Run linter (takes 10 seconds)
npm run lint
# Run build (takes 30 seconds)
npm run build
# Total: 85 seconds of blocking!
</code>
<why_it_fails>
- Hook takes 85 seconds to complete
- Blocks Claude for entire duration
- User can't continue working
- Frustrating, likely to be disabled
- Defeats purpose of automation
</why_it_fails>
<correction>
**Make hook fast (<2 seconds):**
```bash
#!/bin/bash
# Stop hook - fast checks only
# Quick syntax check (< 1 second)
npm run check-syntax
if [ $? -ne 0 ]; then
echo "🔴 Syntax errors found"
echo "💡 Run 'npm test' manually for full test suite"
fi
echo "✅ Quick checks passed (run 'npm test' for full suite)"
```
**Or run slow checks in background:**
```bash
#!/bin/bash
# Stop hook - trigger background job
# Start tests in background
(
npm test > /tmp/test-results.txt 2>&1
if [ $? -ne 0 ]; then
echo "🔴 Tests failed (see /tmp/test-results.txt)"
fi
) &
echo "⏳ Tests running in background (check /tmp/test-results.txt)"
```
**What you gain:**
- Hook completes instantly
- Workflow not blocked
- Still get quality checks
- User can continue working
</correction>
</example>
<example>
<scenario>Hook has no error handling, fails silently</scenario>
<code>
#!/bin/bash
# Hook with no error handling
file=$(tail -1 ~/.claude/edit-log.txt)
prettier --write "$file"
</code>
<why_it_fails>
- If edit-log.txt missing → hook fails silently
- If file path invalid → prettier errors not caught
- If prettier not installed → silent failure
- No logging, can't debug
- User has no idea hook ran or failed
</why_it_fails>
<correction>
**Add error handling:**
```bash
#!/bin/bash
set -euo pipefail # Exit on error, undefined vars
# Log execution
echo "[$(date)] Hook started" >> ~/.claude/hooks/formatter.log
# Validate input
if [ ! -f ~/.claude/edit-log.txt ]; then
echo "[$(date)] ERROR: edit-log.txt not found" >> ~/.claude/hooks/formatter.log
exit 1
fi
file=$(tail -1 ~/.claude/edit-log.txt | grep "^/.*\.ts$" || true)
if [ -z "$file" ]; then
echo "[$(date)] No TypeScript file to format" >> ~/.claude/hooks/formatter.log
exit 0
fi
if [ ! -f "$file" ]; then
echo "[$(date)] ERROR: File not found: $file" >> ~/.claude/hooks/formatter.log
exit 1
fi
# Check prettier exists
if ! command -v prettier &> /dev/null; then
echo "[$(date)] ERROR: prettier not installed" >> ~/.claude/hooks/formatter.log
exit 1
fi
# Format
echo "[$(date)] Formatting: $file" >> ~/.claude/hooks/formatter.log
if prettier --write "$file" 2>&1 | tee -a ~/.claude/hooks/formatter.log; then
echo "✅ Formatted $file"
else
echo "🔴 Formatting failed (see ~/.claude/hooks/formatter.log)"
fi
```
**What you gain:**
- Errors logged and visible
- Graceful handling of missing files
- Can debug when issues occur
- Clear feedback to user
- Hook doesn't fail silently
</correction>
</example>
</examples>
<security>
**Hooks run with your credentials and have full system access.**
## Best Practices
1. **Review code carefully** - Hooks execute any command
2. **Use absolute paths** - Don't rely on PATH
3. **Validate inputs** - Don't trust file paths blindly
4. **Limit scope** - Only access what's needed
5. **Log actions** - Track what hooks do
6. **Test thoroughly** - Especially blocking hooks
## Dangerous Patterns
**Don't:**
```bash
# DANGEROUS - executes arbitrary code
cmd=$(tail -1 ~/.claude/edit-log.txt)
eval "$cmd"
```
**Do:**
```bash
# SAFE - validates and sanitizes
file=$(tail -1 ~/.claude/edit-log.txt | grep "^/.*\.ts$")
if [ -f "$file" ]; then
prettier --write "$file"
fi
```
</security>
<critical_rules>
## Rules That Have No Exceptions
1. **Start with Phase 1 (observe)** → Understand patterns before acting
2. **Keep hooks fast (<2 seconds)** → Don't block workflow
3. **Test thoroughly** → Hooks have full system access
4. **Add error handling and logging** → Silent failures are debugging nightmares
5. **Use progressive enhancement** → Observe → Automate → Enforce (only if needed)
## Common Excuses
All of these mean: **STOP. Follow progressive enhancement.**
- "Hook is simple, don't need testing" (Untested hooks fail in production)
- "Blocking is fine, need to enforce" (Start non-blocking, observe first)
- "I'll add error handling later" (Hook errors silent, add now)
- "Hook is slow but thorough" (Slow hooks block workflow, optimize)
- "Need access to everything" (Minimal permissions only)
</critical_rules>
<verification_checklist>
Before deploying hook:
- [ ] Tested in isolation (manual execution)
- [ ] Tested with mock data
- [ ] Completes quickly (<2 seconds for non-blocking)
- [ ] Has error handling (set -euo pipefail)
- [ ] Has logging (can debug failures)
- [ ] Validates inputs (doesn't trust blindly)
- [ ] Uses absolute paths
- [ ] Started with Phase 1 (observation)
- [ ] If blocking: has escape hatch
**Can't check all boxes?** Return to development and fix.
</verification_checklist>
<integration>
**This skill covers:** Hook creation and patterns
**Related skills:**
- hyperpowers:skills-auto-activation (complete skill activation hook)
- hyperpowers:verification-before-completion (quality hooks automate this)
- hyperpowers:testing-anti-patterns (avoid in hooks)
**Hook patterns support:**
- Automatic skill activation
- Build verification
- Code formatting
- Error prevention
- Workflow automation
</integration>
<resources>
**Detailed guides:**
- [Complete hook examples](resources/hook-examples.md)
- [Hook pattern library](resources/hook-patterns.md)
- [Testing strategies](resources/testing-hooks.md)
**Official documentation:**
- [Anthropic Hooks Guide](https://docs.claude.com/en/docs/claude-code/hooks-guide)
**When stuck:**
- Hook failing silently → Add logging, check ~/.claude/hooks/debug.log
- Hook too slow → Profile execution, move slow parts to background
- Hook blocking incorrectly → Return to Phase 1, observe patterns
- Testing unclear → Start with manual execution, then mock data
</resources>


@@ -0,0 +1,577 @@
# Complete Hook Examples
This guide provides complete, production-ready hook implementations you can use and adapt.
## Example 1: File Edit Tracker (PostToolUse)
**Purpose:** Track which files were edited and in which repos for later analysis.
**File:** `~/.claude/hooks/post-tool-use/01-track-edits.sh`
```bash
#!/bin/bash
# Configuration
LOG_FILE="$HOME/.claude/edit-log.txt"
MAX_LOG_LINES=1000
# Create log if doesn't exist
touch "$LOG_FILE"
# Function to log edit
log_edit() {
local file_path="$1"
local timestamp=$(date +"%Y-%m-%d %H:%M:%S")
local repo=$(find_repo "$file_path")
echo "$timestamp | $repo | $file_path" >> "$LOG_FILE"
}
# Function to find repo root
find_repo() {
local dir=$(dirname "$1")
while [ "$dir" != "/" ]; do
if [ -d "$dir/.git" ]; then
basename "$dir"
return
fi
dir=$(dirname "$dir")
done
echo "unknown"
}
# Read tool use event from stdin
tool_use_json=$(cat)
# Extract file path from tool use
tool_name=$(echo "$tool_use_json" | jq -r '.tool.name')
file_path=""
case "$tool_name" in
"Edit"|"Write")
file_path=$(echo "$tool_use_json" | jq -r '.tool.input.file_path')
;;
"MultiEdit")
# MultiEdit has multiple files - log each
echo "$tool_use_json" | jq -r '.tool.input.edits[].file_path' | while read -r path; do
log_edit "$path"
done
exit 0
;;
esac
# Log single edit
if [ -n "$file_path" ] && [ "$file_path" != "null" ]; then
log_edit "$file_path"
fi
# Rotate log if too large
line_count=$(wc -l < "$LOG_FILE")
if [ "$line_count" -gt "$MAX_LOG_LINES" ]; then
tail -n "$MAX_LOG_LINES" "$LOG_FILE" > "$LOG_FILE.tmp"
mv "$LOG_FILE.tmp" "$LOG_FILE"
fi
# Return success (non-blocking)
echo '{}'
```
**Configuration (`hooks.json`):**
```json
{
"hooks": [
{
"event": "PostToolUse",
"command": "~/.claude/hooks/post-tool-use/01-track-edits.sh",
"description": "Track file edits for build checking",
"blocking": false,
"timeout": 1000
}
]
}
```
## Example 2: Multi-Repo Build Checker (Stop)
**Purpose:** Run builds on all repos that were modified, report errors.
**File:** `~/.claude/hooks/stop/20-build-checker.sh`
```bash
#!/bin/bash
# Configuration
LOG_FILE="$HOME/.claude/edit-log.txt"
PROJECT_ROOT="$HOME/git/myproject"
ERROR_THRESHOLD=5
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Get repos modified since last check
get_modified_repos() {
# Get unique repos from recent edits
tail -50 "$LOG_FILE" 2>/dev/null | \
cut -d'|' -f2 | \
tr -d ' ' | \
sort -u | \
grep -v "unknown"
}
# Run build in repo
build_repo() {
local repo_name="$1"
local repo_path="$PROJECT_ROOT/$repo_name"
if [ ! -d "$repo_path" ]; then
return 0
fi
# Determine build command
local build_cmd=""
if [ -f "$repo_path/package.json" ]; then
build_cmd="npm run build"
elif [ -f "$repo_path/Cargo.toml" ]; then
build_cmd="cargo build"
elif [ -f "$repo_path/go.mod" ]; then
build_cmd="go build ./..."
else
return 0 # No build system found
fi
echo "Building $repo_name..."
# Run build and capture output
cd "$repo_path"
local output exit_code
output=$(eval "$build_cmd" 2>&1)
exit_code=$?
if [ $exit_code -ne 0 ]; then
# Count errors
local error_count=$(echo "$output" | grep -c "error" || true)
if [ "$error_count" -ge "$ERROR_THRESHOLD" ]; then
echo -e "${YELLOW}⚠️ $repo_name: $error_count errors found${NC}"
echo " Consider launching auto-error-resolver agent"
else
echo -e "${RED}🔴 $repo_name: $error_count errors${NC}"
echo "$output" | grep "error" | head -10
fi
return 1
else
echo -e "${GREEN}$repo_name: Build passed${NC}"
return 0
fi
}
# Main execution
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔨 BUILD VERIFICATION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
modified_repos=$(get_modified_repos)
if [ -z "$modified_repos" ]; then
echo "No repos modified since last check"
exit 0
fi
build_failures=0
for repo in $modified_repos; do
if ! build_repo "$repo"; then
((build_failures++))
fi
echo ""
done
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if [ "$build_failures" -gt 0 ]; then
echo -e "${RED}$build_failures repo(s) failed to build${NC}"
else
echo -e "${GREEN}All builds passed${NC}"
fi
# Non-blocking - always return success
echo '{}'
```
## Example 3: TypeScript Prettier Formatter (Stop)
**Purpose:** Auto-format all edited TypeScript/JavaScript files.
**File:** `~/.claude/hooks/stop/30-format-code.sh`
```bash
#!/bin/bash
# Configuration
LOG_FILE="$HOME/.claude/edit-log.txt"
PROJECT_ROOT="$HOME/git/myproject"
# Get recently edited files
get_edited_files() {
tail -50 "$LOG_FILE" 2>/dev/null | \
cut -d'|' -f3 | \
tr -d ' ' | \
grep -E '\.(ts|tsx|js|jsx)$' | \
sort -u
}
# Format file with prettier
format_file() {
local file="$1"
if [ ! -f "$file" ]; then
return 0
fi
# Find prettier config
local dir=$(dirname "$file")
local prettier_config=""
while [ "$dir" != "/" ]; do
if [ -f "$dir/.prettierrc" ] || [ -f "$dir/.prettierrc.json" ]; then
prettier_config="$dir"
break
fi
dir=$(dirname "$dir")
done
if [ -z "$prettier_config" ]; then
return 0
fi
# Format the file
cd "$prettier_config"
npx prettier --write "$file" 2>/dev/null
if [ $? -eq 0 ]; then
echo "✓ Formatted: $(basename $file)"
fi
}
# Main execution
echo "🎨 Formatting edited files..."
edited_files=$(get_edited_files)
if [ -z "$edited_files" ]; then
echo "No files to format"
exit 0
fi
formatted_count=0
for file in $edited_files; do
if format_file "$file"; then
((formatted_count++))
fi
done
echo "✅ Formatted $formatted_count file(s)"
# Non-blocking
echo '{}'
```
## Example 4: Skill Activation Injector (UserPromptSubmit)
**Purpose:** Analyze user prompt and inject skill activation reminders.
**File:** `~/.claude/hooks/user-prompt-submit/skill-activator.js`
```javascript
#!/usr/bin/env node
const fs = require('fs');
const path = require('path');
// Load skill rules
const rulesPath = process.env.SKILL_RULES || path.join(process.env.HOME, '.claude/skill-rules.json');
const rules = JSON.parse(fs.readFileSync(rulesPath, 'utf8'));
// Read prompt from stdin
let promptData = '';
process.stdin.on('data', chunk => {
promptData += chunk;
});
process.stdin.on('end', () => {
const prompt = JSON.parse(promptData);
const activatedSkills = analyzePrompt(prompt.text);
if (activatedSkills.length > 0) {
const context = generateContext(activatedSkills);
console.log(JSON.stringify({
decision: 'approve',
additionalContext: context
}));
} else {
console.log(JSON.stringify({ decision: 'approve' }));
}
});
function analyzePrompt(text) {
const lowerText = text.toLowerCase();
const activated = [];
for (const [skillName, config] of Object.entries(rules)) {
// Check keywords
if (config.promptTriggers?.keywords) {
for (const keyword of config.promptTriggers.keywords) {
if (lowerText.includes(keyword.toLowerCase())) {
activated.push({ skill: skillName, priority: config.priority || 'medium' });
break;
}
}
}
// Check intent patterns
if (config.promptTriggers?.intentPatterns) {
for (const pattern of config.promptTriggers.intentPatterns) {
if (new RegExp(pattern, 'i').test(text)) {
activated.push({ skill: skillName, priority: config.priority || 'medium' });
break;
}
}
}
}
// Sort by priority
return activated.sort((a, b) => {
const priorityOrder = { high: 0, medium: 1, low: 2 };
return priorityOrder[a.priority] - priorityOrder[b.priority];
});
}
function generateContext(skills) {
return `
🎯 SKILL ACTIVATION CHECK
The following skills may be relevant to this prompt:
${skills.map(s => `- **${s.skill}** (${s.priority} priority)`).join('\n')}
Before responding, check if any of these skills should be used.
`;
}
```
**Configuration (`skill-rules.json`):**
```json
{
"backend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["backend", "controller", "service", "API", "endpoint"],
"intentPatterns": [
"(create|add).*?(route|endpoint|controller)",
"(how to|best practice).*?(backend|API)"
]
}
},
"frontend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["frontend", "component", "react", "UI", "layout"],
"intentPatterns": [
"(create|build).*?(component|page|view)",
"(how to|pattern).*?(react|frontend)"
]
}
}
}
```
## Example 5: Error Handling Reminder (Stop)
**Purpose:** Gentle reminder to check error handling in risky code.
**File:** `~/.claude/hooks/stop/40-error-reminder.sh`
```bash
#!/bin/bash
LOG_FILE="$HOME/.claude/edit-log.txt"
# Get recently edited files
get_edited_files() {
tail -20 "$LOG_FILE" 2>/dev/null | \
cut -d'|' -f3 | \
tr -d ' ' | \
sort -u
}
# Check for risky patterns
check_file_risk() {
local file="$1"
if [ ! -f "$file" ]; then
return 1
fi
# Look for risky patterns
if grep -q -E "try|catch|async|await|prisma|\.execute\(|fetch\(|axios\." "$file"; then
return 0
fi
return 1
}
# Main execution
risky_count=0
backend_files=0
for file in $(get_edited_files); do
if check_file_risk "$file"; then
((risky_count++))
if echo "$file" | grep -q "backend\|server\|api"; then
((backend_files++))
fi
fi
done
if [ "$risky_count" -gt 0 ]; then
cat <<EOF
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 ERROR HANDLING SELF-CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ Risky Patterns Detected
$risky_count file(s) with async/try-catch/database operations
❓ Did you add proper error handling?
❓ Are errors logged/captured appropriately?
❓ Are promises handled correctly?
EOF
if [ "$backend_files" -gt 0 ]; then
cat <<EOF
💡 Backend Best Practice:
- All errors should be captured (Sentry, logging)
- Database operations need try-catch
- API routes should use error middleware
EOF
fi
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
fi
# Non-blocking
echo '{}'
```
## Example 6: Dangerous Operation Blocker (PreToolUse)
**Purpose:** Block dangerous file operations (deletion, overwrite) in production paths.
**File:** `~/.claude/hooks/pre-tool-use/dangerous-ops.sh`
```bash
#!/bin/bash
# Read tool use event
tool_use_json=$(cat)
tool_name=$(echo "$tool_use_json" | jq -r '.tool.name')
file_path=$(echo "$tool_use_json" | jq -r '.tool.input.file_path // empty')
# Dangerous paths (customize for your project)
PROTECTED_PATHS=(
"/production/"
"/prod/"
"/.env.production"
"/config/production"
)
# Check if operation is dangerous
is_dangerous() {
local path="$1"
for protected in "${PROTECTED_PATHS[@]}"; do
if [[ "$path" == *"$protected"* ]]; then
return 0
fi
done
return 1
}
# Check dangerous operations
if [ "$tool_name" == "Write" ] || [ "$tool_name" == "Edit" ]; then
if is_dangerous "$file_path"; then
cat <<EOF | jq -c '.'
{
"decision": "block",
"reason": "⛔ BLOCKED: Attempting to modify protected path\\n\\nFile: $file_path\\n\\nThis path is protected from automatic modification.\\nIf you need to make changes:\\n1. Review changes carefully\\n2. Use manual file editing\\n3. Confirm with teammate\\n\\nTo override, edit ~/.claude/hooks/pre-tool-use/dangerous-ops.sh"
}
EOF
exit 0
fi
fi
# Allow operation (NOTE: PreToolUse hooks should use hookSpecificOutput format with permissionDecision)
echo '{"decision": "allow"}'
```
## Testing These Examples
### Test Edit Tracker
```bash
# Create test log entry
echo "2025-01-15 10:30:00 | frontend | /path/to/file.ts" > ~/.claude/edit-log.txt
# Test formatting script
bash ~/.claude/hooks/stop/30-format-code.sh
```
### Test Build Checker
```bash
# Add some edits to log
echo "2025-01-15 10:30:00 | backend | /path/to/backend/file.ts" >> ~/.claude/edit-log.txt
# Run build checker
bash ~/.claude/hooks/stop/20-build-checker.sh
```
### Test Skill Activator
```bash
# Test with mock prompt
echo '{"text": "How do I create a new API endpoint?"}' | node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
## Debugging Tips
**Enable debug mode:**
```bash
# Add to top of any bash script
set -x
exec 2>>~/.claude/hooks/debug.log
```
**Check hook execution:**
```bash
# Watch hooks run in real-time
tail -f ~/.claude/logs/hooks.log
```
**Test hook output:**
```bash
# Capture output
bash ~/.claude/hooks/stop/20-build-checker.sh > /tmp/hook-test.log 2>&1
cat /tmp/hook-test.log
```


@@ -0,0 +1,610 @@
# Hook Patterns Library
Reusable patterns for common hook use cases.
## Pattern: File Path Validation
Safely validate and sanitize file paths in hooks.
```bash
validate_file_path() {
local path="$1"
# Remove null/empty
if [ -z "$path" ] || [ "$path" == "null" ]; then
return 1
fi
# Must be absolute path
if [[ ! "$path" =~ ^/ ]]; then
return 1
fi
# Must exist
if [ ! -f "$path" ]; then
return 1
fi
# Check file extension whitelist
if [[ ! "$path" =~ \.(ts|tsx|js|jsx|py|rs|go|java)$ ]]; then
return 1
fi
return 0
}
# Usage
if validate_file_path "$file_path"; then
# Safe to operate on file
process_file "$file_path"
fi
```
## Pattern: Finding Project Root
Locate the project root directory from any file path.
```bash
find_project_root() {
local dir="$1"
# Start from file's directory
if [ -f "$dir" ]; then
dir=$(dirname "$dir")
fi
# Walk up until finding markers
while [ "$dir" != "/" ]; do
# Check for project markers
if [ -f "$dir/package.json" ] || \
[ -f "$dir/Cargo.toml" ] || \
[ -f "$dir/go.mod" ] || \
[ -d "$dir/.git" ]; then
echo "$dir"
return 0
fi
dir=$(dirname "$dir")
done
return 1
}
# Usage
project_root=$(find_project_root "$file_path")
if [ -n "$project_root" ]; then
cd "$project_root"
npm run build
fi
```
## Pattern: Conditional Hook Execution
Run hook only when certain conditions are met.
```bash
#!/bin/bash
# Configuration
MIN_CHANGES=3
TARGET_REPO="backend"
# Check if should run
should_run() {
# Count recent edits
local edit_count=$(tail -20 ~/.claude/edit-log.txt | wc -l)
if [ "$edit_count" -lt "$MIN_CHANGES" ]; then
return 1
fi
# Check if target repo was modified
if ! tail -20 ~/.claude/edit-log.txt | grep -q "$TARGET_REPO"; then
return 1
fi
return 0
}
# Main execution
if ! should_run; then
echo '{}'
exit 0
fi
# Run actual hook logic
perform_build_check
```
## Pattern: Rate Limiting
Prevent hooks from running too frequently.
```bash
#!/bin/bash
RATE_LIMIT_FILE="/tmp/hook-last-run"
MIN_INTERVAL=30 # seconds
# Check if enough time has passed
should_run() {
if [ ! -f "$RATE_LIMIT_FILE" ]; then
return 0
fi
local last_run=$(cat "$RATE_LIMIT_FILE")
local now=$(date +%s)
local elapsed=$((now - last_run))
if [ "$elapsed" -lt "$MIN_INTERVAL" ]; then
echo "Skipping (ran ${elapsed}s ago, min interval ${MIN_INTERVAL}s)"
return 1
fi
return 0
}
# Update last run time
mark_run() {
date +%s > "$RATE_LIMIT_FILE"
}
# Usage
if should_run; then
perform_expensive_operation
mark_run
fi
echo '{}'
```
## Pattern: Multi-Project Detection
Detect which project/repo a file belongs to.
```bash
detect_project() {
local file="$1"
local project_root="/Users/myuser/projects"
# Extract project name from path
if [[ "$file" =~ $project_root/([^/]+) ]]; then
echo "${BASH_REMATCH[1]}"
return 0
fi
echo "unknown"
return 1
}
# Usage
project=$(detect_project "$file_path")
case "$project" in
"frontend")
npm --prefix ~/projects/frontend run build
;;
"backend")
cargo build --manifest-path ~/projects/backend/Cargo.toml
;;
*)
echo "Unknown project: $project"
;;
esac
```
## Pattern: Graceful Degradation
Handle failures gracefully without blocking workflow.
```bash
#!/bin/bash
# Try operation with fallback
try_with_fallback() {
local primary_cmd="$1"
local fallback_cmd="$2"
local description="$3"
echo "Attempting: $description"
# Try primary command
if eval "$primary_cmd" 2>/dev/null; then
echo "✅ Success"
return 0
fi
echo "⚠️ Primary failed, trying fallback..."
# Try fallback
if eval "$fallback_cmd" 2>/dev/null; then
echo "✅ Fallback succeeded"
return 0
fi
echo "❌ Both failed, continuing anyway"
return 1
}
# Usage
try_with_fallback \
"npm run build" \
"npm run build:dev" \
"Building project"
# Always return empty response (non-blocking)
echo '{}'
```
## Pattern: Parallel Execution
Run multiple checks in parallel for speed.
```bash
#!/bin/bash
# Run checks in parallel
run_parallel_checks() {
local pids=()
# Start each check in background
check_typescript &
pids+=($!)
check_eslint &
pids+=($!)
check_tests &
pids+=($!)
# Wait for all to complete
local exit_code=0
for pid in "${pids[@]}"; do
wait "$pid" || exit_code=1
done
return $exit_code
}
check_typescript() {
npx tsc --noEmit > /tmp/tsc-output.txt 2>&1
if [ $? -ne 0 ]; then
echo "TypeScript errors found"
return 1
fi
}
check_eslint() {
npx eslint . > /tmp/eslint-output.txt 2>&1
}
check_tests() {
npm test > /tmp/test-output.txt 2>&1
}
# Usage
if run_parallel_checks; then
echo "✅ All checks passed"
else
echo "⚠️ Some checks failed"
cat /tmp/tsc-output.txt
cat /tmp/eslint-output.txt
fi
echo '{}'
```
## Pattern: Smart Caching
Cache results to avoid redundant work.
```bash
#!/bin/bash
CACHE_DIR="$HOME/.claude/hook-cache"
mkdir -p "$CACHE_DIR"
# Generate cache key
cache_key() {
local file="$1"
echo -n "$file:$(stat -f %m "$file" 2>/dev/null || stat -c %Y "$file")" | md5sum | cut -d' ' -f1
}
# Check cache
check_cache() {
local file="$1"
local key=$(cache_key "$file")
local cache_file="$CACHE_DIR/$key"
if [ -f "$cache_file" ]; then
# Cache hit
cat "$cache_file"
return 0
fi
return 1
}
# Update cache
update_cache() {
local file="$1"
local result="$2"
local key=$(cache_key "$file")
local cache_file="$CACHE_DIR/$key"
echo "$result" > "$cache_file"
# Clean old cache entries (older than 1 day)
find "$CACHE_DIR" -type f -mtime +1 -delete 2>/dev/null
}
# Usage
if cached=$(check_cache "$file_path"); then
echo "Cache hit: $cached"
else
result=$(expensive_operation "$file_path")
update_cache "$file_path" "$result"
echo "Computed: $result"
fi
```
## Pattern: Progressive Output
Show progress for long-running hooks.
```bash
#!/bin/bash
# Progress indicator
show_progress() {
local message="$1"
echo -n "$message..."
}
complete_progress() {
local status="$1"
if [ "$status" == "success" ]; then
echo " ✅"
else
echo " ❌"
fi
}
# Usage
show_progress "Running TypeScript compiler"
if npx tsc --noEmit 2>/dev/null; then
complete_progress "success"
else
complete_progress "failure"
fi
show_progress "Running linter"
if npx eslint . 2>/dev/null; then
complete_progress "success"
else
complete_progress "failure"
fi
echo '{}'
```
## Pattern: Context Injection
Inject helpful context into Claude's prompt.
```javascript
// UserPromptSubmit hook
function injectContext(prompt) {
const context = [];
// Add relevant documentation
if (prompt.includes('API')) {
context.push('📖 API Documentation: https://docs.example.com/api');
}
// Add recent changes
const recentFiles = getRecentlyEditedFiles();
if (recentFiles.length > 0) {
context.push(`📝 Recently edited: ${recentFiles.join(', ')}`);
}
// Add project status
const buildStatus = getLastBuildStatus();
if (!buildStatus.passed) {
context.push(`⚠️ Current build has ${buildStatus.errorCount} errors`);
}
if (context.length === 0) {
return { decision: 'approve' };
}
return {
decision: 'approve',
additionalContext: `\n\n---\n${context.join('\n')}\n---\n`
};
}
```
## Pattern: Error Accumulation
Collect multiple errors before reporting.
```bash
#!/bin/bash
ERRORS=()
# Add error to collection
add_error() {
ERRORS+=("$1")
}
# Report all errors
report_errors() {
if [ ${#ERRORS[@]} -eq 0 ]; then
echo "✅ No errors found"
return 0
fi
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "⚠️ Found ${#ERRORS[@]} issue(s):"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
local i=1
for error in "${ERRORS[@]}"; do
echo "$i. $error"
((i++))
done
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
return 1
}
# Usage
if ! run_typescript_check; then
add_error "TypeScript compilation failed"
fi
if ! run_lint_check; then
add_error "Linting issues found"
fi
if ! run_test_check; then
add_error "Tests failing"
fi
report_errors
echo '{}'
```
## Pattern: Conditional Blocking
Block only on critical errors, warn on others.
```bash
#!/bin/bash
ERROR_LEVEL="none" # none, warning, critical
# Check for issues
check_critical_issues() {
if grep -q "FIXME\|XXX\|TODO: CRITICAL" "$file_path"; then
ERROR_LEVEL="critical"
return 1
fi
return 0
}
check_warnings() {
if grep -q "console.log\|debugger" "$file_path"; then
ERROR_LEVEL="warning"
return 1
fi
return 0
}
# Run checks
check_critical_issues
check_warnings
# Return appropriate decision
case "$ERROR_LEVEL" in
"critical")
echo '{
"decision": "block",
"reason": "🚫 CRITICAL: Found critical TODOs or FIXMEs that must be addressed"
}' | jq -c '.'
;;
"warning")
echo "⚠️ Warning: Found debug statements (console.log, debugger)"
echo '{}'
;;
*)
echo '{}'
;;
esac
```
## Pattern: Hook Coordination
Coordinate between multiple hooks using shared state.
```bash
# Hook 1: Track state
#!/bin/bash
STATE_FILE="/tmp/hook-state.json"
# Update state
jq -n \
--arg timestamp "$(date +%s)" \
--arg files "$files_edited" \
'{lastRun: $timestamp, filesEdited: ($files | split(","))}' \
> "$STATE_FILE"
echo '{}'
```
```bash
# Hook 2: Read state
#!/bin/bash
STATE_FILE="/tmp/hook-state.json"
if [ -f "$STATE_FILE" ]; then
last_run=$(jq -r '.lastRun' "$STATE_FILE")
files=$(jq -r '.filesEdited[]' "$STATE_FILE")
# Use state from previous hook
for file in $files; do
process_file "$file"
done
fi
echo '{}'
```
## Pattern: User Notification
Notify user of important events without blocking.
```bash
#!/bin/bash
# Send desktop notification (macOS)
notify_macos() {
osascript -e "display notification \"$1\" with title \"Claude Code Hook\""
}
# Send desktop notification (Linux)
notify_linux() {
notify-send "Claude Code Hook" "$1"
}
# Notify based on OS
notify() {
local message="$1"
case "$OSTYPE" in
darwin*)
notify_macos "$message"
;;
linux*)
notify_linux "$message"
;;
esac
}
# Usage
if [ "$error_count" -gt 10 ]; then
notify "⚠️ Build has $error_count errors"
fi
echo '{}'
```
## Remember
- **Keep it simple** - Start with basic patterns, add complexity only when needed
- **Test thoroughly** - Test each pattern in isolation before combining
- **Fail gracefully** - Non-blocking hooks should never crash workflow
- **Log everything** - You'll need it for debugging
- **Document patterns** - Future you will thank present you


@@ -0,0 +1,657 @@
# Testing Hooks
Comprehensive testing strategies for Claude Code hooks.
## Testing Philosophy
**Hooks run with full system access. Test them thoroughly before deploying.**
### Testing Levels
1. **Unit testing** - Test functions in isolation
2. **Integration testing** - Test with mock Claude Code events
3. **Manual testing** - Test in real Claude Code sessions
4. **Regression testing** - Verify hooks don't break existing workflows
## Unit Testing Hook Functions
### Bash Functions
**Example: Testing file validation**
```bash
# hook-functions.sh - extractable functions
validate_file_path() {
local path="$1"
if [ -z "$path" ] || [ "$path" == "null" ]; then
return 1
fi
if [[ ! "$path" =~ ^/ ]]; then
return 1
fi
if [ ! -f "$path" ]; then
return 1
fi
return 0
}
# Test script
#!/bin/bash
source ./hook-functions.sh
test_validate_file_path() {
# Test valid path
touch /tmp/test-file.txt
if validate_file_path "/tmp/test-file.txt"; then
echo "✅ Valid path test passed"
else
echo "❌ Valid path test failed"
return 1
fi
# Test invalid path
if ! validate_file_path ""; then
echo "✅ Empty path test passed"
else
echo "❌ Empty path test failed"
return 1
fi
# Test null path
if ! validate_file_path "null"; then
echo "✅ Null path test passed"
else
echo "❌ Null path test failed"
return 1
fi
# Test relative path
if ! validate_file_path "relative/path.txt"; then
echo "✅ Relative path test passed"
else
echo "❌ Relative path test failed"
return 1
fi
rm /tmp/test-file.txt
return 0
}
# Run test
test_validate_file_path
```
### JavaScript Functions
**Example: Testing prompt analysis**
```javascript
// skill-activator.js
function analyzePrompt(text, rules) {
const lowerText = text.toLowerCase();
const activated = [];
for (const [skillName, config] of Object.entries(rules)) {
if (config.promptTriggers?.keywords) {
for (const keyword of config.promptTriggers.keywords) {
if (lowerText.includes(keyword.toLowerCase())) {
activated.push({ skill: skillName, priority: config.priority || 'medium' });
break;
}
}
}
}
return activated;
}
// test.js
const assert = require('assert');
const testRules = {
'backend-dev': {
priority: 'high',
promptTriggers: {
keywords: ['backend', 'API', 'endpoint']
}
}
};
// Test keyword matching
function testKeywordMatching() {
const result = analyzePrompt('How do I create a backend endpoint?', testRules);
assert.equal(result.length, 1, 'Should find one skill');
assert.equal(result[0].skill, 'backend-dev', 'Should match backend-dev');
assert.equal(result[0].priority, 'high', 'Should have high priority');
console.log('✅ Keyword matching test passed');
}
// Test no match
function testNoMatch() {
const result = analyzePrompt('How do I write Python?', testRules);
assert.equal(result.length, 0, 'Should find no skills');
console.log('✅ No match test passed');
}
// Test case insensitivity
function testCaseInsensitive() {
const result = analyzePrompt('BACKEND endpoint', testRules);
assert.equal(result.length, 1, 'Should match regardless of case');
console.log('✅ Case insensitive test passed');
}
// Run tests
testKeywordMatching();
testNoMatch();
testCaseInsensitive();
```
## Integration Testing with Mock Events
### Creating Mock Events
**PostToolUse event:**
```json
{
"event": "PostToolUse",
"tool": {
"name": "Edit",
"input": {
"file_path": "/Users/test/project/src/file.ts",
"old_string": "const x = 1;",
"new_string": "const x = 2;"
}
},
"result": {
"success": true
}
}
```
**UserPromptSubmit event:**
```json
{
"event": "UserPromptSubmit",
"text": "How do I create a new API endpoint?",
"timestamp": "2025-01-15T10:30:00Z"
}
```
**Stop event:**
```json
{
"event": "Stop",
"sessionId": "abc123",
"messageCount": 10
}
```
### Testing Hook with Mock Events
```bash
#!/bin/bash
# test-hook.sh
# Create mock event
create_mock_edit_event() {
cat <<EOF
{
"event": "PostToolUse",
"tool": {
"name": "Edit",
"input": {
"file_path": "/tmp/test-file.ts"
}
}
}
EOF
}
# Test hook
test_edit_tracker() {
# Setup
export LOG_FILE="/tmp/test-edit-log.txt"
rm -f "$LOG_FILE"
# Run hook with mock event
create_mock_edit_event | bash hooks/post-tool-use/01-track-edits.sh
# Verify
if [ -f "$LOG_FILE" ]; then
if grep -q "test-file.ts" "$LOG_FILE"; then
echo "✅ Edit tracker test passed"
return 0
fi
fi
echo "❌ Edit tracker test failed"
return 1
}
test_edit_tracker
```
### Testing JavaScript Hooks
```javascript
// test-skill-activator.js
const { execSync } = require('child_process');
function testSkillActivator(prompt) {
const mockEvent = JSON.stringify({
text: prompt
});
const result = execSync(
'node hooks/user-prompt-submit/skill-activator.js',
{
input: mockEvent,
encoding: 'utf8',
env: {
...process.env,
SKILL_RULES: './test-skill-rules.json'
}
}
);
return JSON.parse(result);
}
// Test activation
function testBackendActivation() {
const result = testSkillActivator('How do I create a backend endpoint?');
if (result.additionalContext && result.additionalContext.includes('backend')) {
console.log('✅ Backend activation test passed');
} else {
console.log('❌ Backend activation test failed');
process.exit(1);
}
}
testBackendActivation();
```
## Manual Testing in Claude Code
### Testing Checklist
**Before deployment:**
- [ ] Hook executes without errors
- [ ] Hook completes within timeout (default 10s)
- [ ] Output is helpful and not overwhelming
- [ ] Non-blocking hooks don't prevent work
- [ ] Blocking hooks have clear error messages
- [ ] Hook handles missing files gracefully
- [ ] Hook handles malformed input gracefully
### Manual Test Procedure
**1. Enable debug mode:**
```bash
# Add to top of hook
set -x
exec 2>>~/.claude/hooks/debug-$(date +%Y%m%d).log
```
**2. Test with minimal prompt:**
```
Create a simple test file
```
**3. Observe hook execution:**
```bash
# Watch debug log
tail -f ~/.claude/hooks/debug-*.log
```
**4. Verify output:**
- Check that hook completes
- Verify no errors in debug log
- Confirm expected behavior
**5. Test edge cases** (see the sketch after this list):
- Empty file paths
- Non-existent files
- Files outside project
- Malformed input
- Missing dependencies
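These edge cases can be scripted rather than typed by hand. A minimal sketch, assuming the mock-event format shown earlier and a PostToolUse hook at `hooks/post-tool-use/01-track-edits.sh` (the hook path is a placeholder; `timeout` is GNU coreutils and may need installing on macOS):
```bash
#!/bin/bash
# edge-case-test.sh - feed edge-case inputs to a hook; it should never crash or hang
HOOK="hooks/post-tool-use/01-track-edits.sh"   # hook under test (placeholder path)
edge_paths=("" "null" "relative/path.txt" "/does/not/exist.ts")
for path in "${edge_paths[@]}"; do
  event="{\"event\": \"PostToolUse\", \"tool\": {\"name\": \"Edit\", \"input\": {\"file_path\": \"$path\"}}}"
  if echo "$event" | timeout 10 bash "$HOOK" > /dev/null 2>&1; then
    echo "✅ Handled path '$path'"
  else
    echo "⚠️ Non-zero exit for path '$path' (fine if intentional, not fine if it crashed)"
  fi
done
# Malformed input should also be survivable
echo 'not valid json {' | timeout 10 bash "$HOOK" > /dev/null 2>&1
echo "Malformed input exit code: $? (a hang or stack trace is the real failure mode)"
```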
**6. Test performance:**
```bash
# Time hook execution
time bash hooks/stop/build-checker.sh
```
## Regression Testing
### Creating Test Suite
```bash
#!/bin/bash
# regression-test.sh
TEST_DIR="/tmp/hook-tests"
mkdir -p "$TEST_DIR"
# Setup test environment
setup() {
export LOG_FILE="$TEST_DIR/edit-log.txt"
export PROJECT_ROOT="$TEST_DIR/projects"
mkdir -p "$PROJECT_ROOT"
}
# Cleanup after tests
teardown() {
rm -rf "$TEST_DIR"
}
# Test 1: Edit tracker logs edits
test_edit_tracker_logs() {
echo '{"tool": {"name": "Edit", "input": {"file_path": "/test/file.ts"}}}' | \
bash hooks/post-tool-use/01-track-edits.sh
if grep -q "file.ts" "$LOG_FILE"; then
echo "✅ Test 1 passed"
return 0
fi
echo "❌ Test 1 failed"
return 1
}
# Test 2: Build checker finds errors
test_build_checker_finds_errors() {
# Create mock project with errors
mkdir -p "$PROJECT_ROOT/test-project"
echo 'const x: string = 123;' > "$PROJECT_ROOT/test-project/error.ts"
# Add to log
echo "2025-01-15 10:00:00 | test-project | error.ts" > "$LOG_FILE"
# Run build checker (should find errors)
output=$(bash hooks/stop/20-build-checker.sh)
if echo "$output" | grep -q "error"; then
echo "✅ Test 2 passed"
return 0
fi
echo "❌ Test 2 failed"
return 1
}
# Test 3: Formatter handles missing prettier
test_formatter_missing_prettier() {
# Create file without prettier config
mkdir -p "$PROJECT_ROOT/no-prettier"
echo 'const x=1' > "$PROJECT_ROOT/no-prettier/file.js"
echo "2025-01-15 10:00:00 | no-prettier | file.js" > "$LOG_FILE"
# Should complete without error
if bash hooks/stop/30-format-code.sh 2>&1; then
echo "✅ Test 3 passed"
return 0
fi
echo "❌ Test 3 failed"
return 1
}
# Run all tests
run_all_tests() {
setup
local failed=0
test_edit_tracker_logs || ((failed++))
test_build_checker_finds_errors || ((failed++))
test_formatter_missing_prettier || ((failed++))
teardown
if [ $failed -eq 0 ]; then
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
echo "✅ All tests passed!"
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
return 0
else
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
echo "$failed test(s) failed"
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
return 1
fi
}
run_all_tests
```
### Running Regression Suite
```bash
# Run before deploying changes
bash test/regression-test.sh
# Run on schedule (cron)
0 0 * * * cd ~/hooks && bash test/regression-test.sh
```
## Performance Testing
### Measuring Hook Performance
```bash
#!/bin/bash
# benchmark-hook.sh
ITERATIONS=10
HOOK_PATH="hooks/stop/build-checker.sh"
total_time=0
for i in $(seq 1 $ITERATIONS); do
start=$(date +%s%N)
bash "$HOOK_PATH" > /dev/null 2>&1
end=$(date +%s%N)
elapsed=$(( (end - start) / 1000000 )) # Convert to ms
total_time=$(( total_time + elapsed ))
echo "Iteration $i: ${elapsed}ms"
done
average=$(( total_time / ITERATIONS ))
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Average: ${average}ms"
echo "Total: ${total_time}ms"
echo "━━━━━━━━━━━━━━━━━━━━━━━━"
if [ $average -gt 2000 ]; then
echo "⚠️ Hook is slow (>2s)"
exit 1
fi
echo "✅ Performance acceptable"
```
### Performance Targets
- **Non-blocking hooks:** <2 seconds
- **Blocking hooks:** <5 seconds
- **UserPromptSubmit:** <1 second (critical path)
- **PostToolUse:** <500ms (runs frequently; budget-check sketch below)
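To enforce these targets automatically, the benchmark above can be compared against a per-hook budget. A minimal sketch, with thresholds mirroring the list above (the hook-type names are placeholders; adapt to your layout):
```bash
#!/bin/bash
# check-performance-target.sh <hook-type> <average-ms>
hook_type="$1"
average_ms="$2"
case "$hook_type" in
  user-prompt-submit) budget=1000 ;;   # critical path
  post-tool-use)      budget=500  ;;   # runs frequently
  blocking)           budget=5000 ;;
  *)                  budget=2000 ;;   # non-blocking default
esac
if [ "$average_ms" -gt "$budget" ]; then
  echo "⚠️ $hook_type hook averaged ${average_ms}ms (budget: ${budget}ms)"
  exit 1
fi
echo "✅ $hook_type hook within budget (${average_ms}ms vs ${budget}ms)"
```
Wired into `benchmark-hook.sh`, this lets CI fail any change that pushes a hook past its budget.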
## Continuous Testing
### Pre-commit Hook for Hook Testing
```bash
#!/bin/bash
# .git/hooks/pre-commit
echo "Testing Claude Code hooks..."
# Run test suite
if bash test/regression-test.sh; then
echo "✅ Hook tests passed"
exit 0
else
echo "❌ Hook tests failed"
echo "Fix tests before committing"
exit 1
fi
```
### CI/CD Integration
```yaml
# .github/workflows/test-hooks.yml
name: Test Claude Code Hooks
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Node.js
uses: actions/setup-node@v2
with:
node-version: '18'
- name: Install dependencies
run: npm install
- name: Run hook tests
run: bash test/regression-test.sh
- name: Run performance tests
run: bash test/benchmark-hook.sh
```
## Common Testing Mistakes
### Mistake 1: Not Testing Error Paths
**Wrong:**
```bash
# Only test success path
npx tsc --noEmit
echo "✅ Build passed"
```
**Right:**
```bash
# Test both success and failure
if npx tsc --noEmit 2>&1; then
echo "✅ Build passed"
else
echo "❌ Build failed"
# Test that error handling works
fi
```
### Mistake 2: Hardcoding Paths
**Wrong:**
```bash
# Hardcoded path
cd /Users/myname/projects/myproject
npm run build
```
**Right:**
```bash
# Dynamic path
project_root=$(find_project_root "$file_path")
if [ -n "$project_root" ]; then
cd "$project_root"
npm run build
fi
```
### Mistake 3: Not Cleaning Up
**Wrong:**
```bash
# Leaves test files behind
echo "test" > /tmp/test-file.txt
run_test
# Never cleans up
```
**Right:**
```bash
# Always cleanup
trap 'rm -f /tmp/test-file.txt' EXIT
echo "test" > /tmp/test-file.txt
run_test
```
### Mistake 4: Silent Failures
**Wrong:**
```bash
# Errors disappear
npx tsc --noEmit 2>/dev/null
```
**Right:**
```bash
# Capture errors
output=$(npx tsc --noEmit 2>&1)
if [ $? -ne 0 ]; then
echo "❌ TypeScript errors:"
echo "$output"
fi
```
## Debugging Failed Tests
### Enable Verbose Output
```bash
# Add debug flags
set -x # Print commands
set -e # Exit on error
set -u # Error on undefined variables
set -o pipefail # Catch pipe failures
```
### Capture Test Output
```bash
# Run test with full output
bash -x test/regression-test.sh 2>&1 | tee test-output.log
# Review output
less test-output.log
```
### Isolate Failing Test
```bash
# Run a single test in the current shell
# Note: regression-test.sh ends by calling run_all_tests, so guard that call
# (e.g. [[ "${BASH_SOURCE[0]}" == "$0" ]] && run_all_tests) before sourcing,
# or sourcing will execute the whole suite.
source test/regression-test.sh
setup
test_build_checker_finds_errors
teardown
```
## Remember
- **Test before deploying** - Hooks have full system access
- **Test all paths** - Success, failure, edge cases
- **Test performance** - Hooks shouldn't slow workflow
- **Automate testing** - Run tests on every change
- **Clean up** - Don't leave test artifacts
- **Document tests** - Future you will thank present you
**Golden rule:** If you wouldn't run it on production, don't deploy it as a hook.

View File

@@ -0,0 +1 @@
Use your hyperpowers:brainstorming skill.

View File

@@ -0,0 +1 @@
Use your hyperpowers:executing-plans skill.

View File

@@ -0,0 +1 @@
Use your hyperpowers:writing-plans skill.

View File

@@ -0,0 +1,141 @@
# bd Command Reference
Common bd commands used across multiple skills. Reference this instead of duplicating.
## Reading Issues
```bash
# Show single issue with full design
bd show bd-3
# List all open issues
bd list --status open
# List closed issues
bd list --status closed
# Show dependency tree for an epic
bd dep tree bd-1
# Find tasks ready to work on (no blocking dependencies)
bd ready
# List tasks in a specific epic
bd list --parent bd-1
```
## Creating Issues
```bash
# Create epic
bd create "Epic: Feature Name" \
--type epic \
--priority [0-4] \
--design "## Goal
[Epic description]
## Success Criteria
- [ ] All phases complete
..."
# Create feature/phase
bd create "Phase 1: Phase Name" \
--type feature \
--priority [0-4] \
--design "[Phase design]"
# Create task
bd create "Task Name" \
--type task \
--priority [0-4] \
--design "[Task design]"
```
## Updating Issues
```bash
# Update issue design (detailed description)
bd update bd-3 --design "$(cat <<'EOF'
[Complete updated design]
EOF
)"
```
**IMPORTANT**: Use `--design` for the full detailed description, NOT `--description` (which is title only).
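A quick contrast, in the same wrong/right style used elsewhere in this reference (the design file name is a placeholder):
```bash
# ❌ WRONG - --description only replaces the short title text
bd update bd-3 --description "## Goal ... full multi-paragraph design ..."
# ✅ CORRECT - --design holds the detailed body (the heredoc form above works too)
bd update bd-3 --design "$(cat updated-design.md)"
```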
## Managing Status
```bash
# Start working on task
bd update bd-3 --status in_progress
# Complete task
bd close bd-3
# Reopen task
bd update bd-3 --status open
```
**Common Mistakes:**
```bash
# ❌ WRONG - bd status shows database overview, doesn't change status
bd status bd-3 --status in_progress
# ✅ CORRECT - use bd update to change status
bd update bd-3 --status in_progress
# ❌ WRONG - using hyphens in status values
bd update bd-3 --status in-progress
# ✅ CORRECT - use underscores in status values
bd update bd-3 --status in_progress
# ❌ WRONG - 'done' is not a valid status
bd update bd-3 --status done
# ✅ CORRECT - use bd close to complete
bd close bd-3
```
**Valid status values:** `open`, `in_progress`, `blocked`, `closed`
## Managing Dependencies
```bash
# Add blocking dependency (LATER depends on EARLIER)
# Syntax: bd dep add <dependent> <dependency>
bd dep add bd-3 bd-2 # bd-3 depends on bd-2 (do bd-2 first)
# Add parent-child relationship
# Syntax: bd dep add <child> <parent> --type parent-child
bd dep add bd-3 bd-1 --type parent-child # bd-3 is child of bd-1
# View dependency tree
bd dep tree bd-1
```
## Commit Message Format
Reference bd task IDs in commits (use hyperpowers:test-runner agent):
```bash
# Use test-runner agent to avoid pre-commit hook pollution
Dispatch hyperpowers:test-runner agent: "Run: git add <files> && git commit -m 'feat(bd-3): implement feature
Implements step 1 of bd-3: Task Name
'"
```
## Common Queries
```bash
# Check if all tasks in epic are closed
bd list --status open --parent bd-1
# Output: [empty] = all closed
# See what's blocking current work
bd ready # Shows only unblocked tasks
# Find all in-progress work
bd list --status in_progress
```
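These queries compose into a small guard script. A minimal sketch, assuming `bd list` prints one line per matching issue and nothing when there are no matches, and that `--status` and `--parent` combine as shown above (verify against your bd version):
```bash
#!/bin/bash
# epic-complete-check.sh <epic-id> - refuse to close an epic while tasks remain unfinished
epic="$1"
open_tasks=$(bd list --status open --parent "$epic")
in_progress=$(bd list --status in_progress --parent "$epic")
if [ -n "$open_tasks" ] || [ -n "$in_progress" ]; then
  echo "❌ $epic still has unfinished tasks:"
  [ -n "$open_tasks" ] && echo "$open_tasks"
  [ -n "$in_progress" ] && echo "$in_progress"
  exit 1
fi
echo "✅ All tasks under $epic are closed - safe to close the epic"
```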

View File

@@ -0,0 +1,123 @@
# Common Anti-Patterns
Anti-patterns that apply across multiple skills. Reference this to avoid duplication.
## Language-Specific Anti-Patterns
### Rust
```
❌ No unwrap() or expect() in production code
Use proper error handling with Result/Option
❌ No todo!(), unimplemented!(), or panic!() in production
Implement all code paths properly
❌ No #[ignore] on tests without bd issue number
Fix or track broken tests
❌ No unsafe blocks without documentation
Document safety invariants
❌ Use proper array bounds checking
Prefer .get() over direct indexing in production
```
### Swift
```
❌ No force unwrap (!) in production code
Use optional chaining or guard/if let
❌ No fatalError() in production code
Handle errors gracefully
❌ No disabled tests without bd issue number
Fix or track broken tests
❌ Use proper array bounds checking
Check indices before accessing
❌ Handle all enum cases
No default: fatalError() shortcuts
```
### TypeScript
```
❌ No @ts-ignore or @ts-expect-error without bd issue number
Fix type issues properly
❌ No any types without justification
Use proper typing
❌ No .skip() on tests without bd issue number
Fix or track broken tests
❌ No throw in async code without proper handling
Use try/catch or Promise.catch()
```
## General Anti-Patterns
### Code Quality
```
❌ No TODOs or FIXMEs without bd issue numbers
Track work in bd, not in code comments
❌ No stub implementations
Empty functions, placeholder returns forbidden
❌ No commented-out code
Delete it - version control remembers
❌ No debug print statements in commits
Remove console.log, println!, print() before committing
❌ No "we'll do this later"
Either do it now or create bd issue and reference it
```
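Several of these are mechanically detectable before commit. A minimal grep-based sketch (the patterns are illustrative and will need per-project tuning; a TODO is acceptable only when it carries a bd ID such as `TODO(bd-42)`):
```bash
#!/bin/bash
# anti-pattern-scan.sh - flag common code-quality anti-patterns in staged files
files=$(git diff --cached --name-only --diff-filter=ACM)
[ -z "$files" ] && exit 0
violations=0
# TODO/FIXME without a bd issue reference
if echo "$files" | xargs grep -nE "TODO|FIXME" 2>/dev/null | grep -vE "bd-[0-9]+"; then
  echo "❌ TODO/FIXME without bd issue number"
  violations=$((violations + 1))
fi
# Leftover debug prints (warning only - review before committing)
echo "$files" | xargs grep -nE "console\.log|println!|print\(" 2>/dev/null && \
  echo "⚠️ Possible debug print statements"
[ "$violations" -eq 0 ] && echo "✅ No blocking anti-patterns found"
exit "$violations"
```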
### Testing
```
❌ Don't test mock behavior
Test real behavior or unmock it
❌ Don't add test-only methods to production code
Put in test utilities instead
❌ Don't mock without understanding dependencies
Understand what you're testing first
❌ Don't skip verifications
Run the test, see the output, then claim it passes
```
### Process
```
❌ Don't commit without running tests
Verify tests pass before committing
❌ Don't create PR without running full test suite
All tests must pass before PR creation
❌ Don't skip pre-commit hooks
Never use --no-verify
❌ Don't force push without explicit request
Respect shared branch history
❌ Don't assume backwards compatibility is desired
Ask if breaking changes are acceptable
```
## Project-Specific Additions
Each project may have additional anti-patterns. Check CLAUDE.md for:
- Project-specific code patterns to avoid
- Custom linting rules
- Framework-specific anti-patterns
- Team conventions

View File

@@ -0,0 +1,99 @@
# Common Rationalizations - STOP
These rationalizations appear across multiple contexts. When you catch yourself thinking any of these, STOP - you're about to violate a skill.
## Process Shortcuts
| Excuse | Reality |
|--------|---------|
| "This is simple, can skip the process" | Simple tasks done wrong become complex problems. |
| "Just this once" | No exceptions. Process exists because exceptions fail. |
| "I'm confident this will work" | Confidence ≠ evidence. Run the verification. |
| "I'm tired" | Exhaustion ≠ excuse for shortcuts. |
| "No time for proper approach" | Shortcuts cost more time in rework. |
| "Partner won't notice" | They will. Trust is earned through consistency. |
| "Different words so rule doesn't apply" | Spirit over letter. Intent matters. |
## Verification Shortcuts
| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification command. |
| "Looks correct" | Run it and see the output. |
| "Tests probably pass" | Probably ≠ verified. Run them. |
| "Linter passed, must be fine" | Linter ≠ compiler ≠ tests. Run everything. |
| "Partial check is enough" | Partial proves nothing about the whole. |
| "Agent said success" | Agents lie/hallucinate. Verify independently. |
## Documentation Shortcuts
| Excuse | Reality |
|--------|---------|
| "File probably exists" | Use tools to verify. Don't assume. |
| "Design mentioned it, must be there" | Codebase changes. Verify current state. |
| "I can verify quickly myself" | Use investigator agents. Prevents hallucination. |
| "User can figure it out during execution" | Your job is exact instructions. No ambiguity. |
## Planning Shortcuts
| Excuse | Reality |
|--------|---------|
| "Can skip exploring alternatives" | Comparison reveals issues. Always propose 2-3. |
| "Partner knows what they want" | Questions reveal hidden constraints. Always ask. |
| "Whole design at once for efficiency" | Incremental validation catches problems early. |
| "Checklist is just suggestion" | Create TodoWrite todos. Track properly. |
| "Subtask can reference parent for details" | NO. Subtasks must be complete. NO placeholders, NO "see parent". |
| "I'll use placeholder and fill in later" | NO. Write actual content NOW. No meta-references like "[detailed above]". |
| "Design field is too long, use placeholder" | Length doesn't matter. Write full content. Placeholder defeats the purpose. |
| "Should I continue to the next task?" | YES. You have a TodoWrite plan. Execute it. Don't interrupt your own workflow. |
| "Let me ask user's preference for remaining tasks" | NO. The user gave you the work. Do it. Only ask at natural completion points. |
| "Should I stop here or keep going?" | Your TodoWrite list tells you. If tasks remain, continue. |
## Execution Shortcuts
| Excuse | Reality |
|--------|---------|
| "Task is tracked in TodoWrite, don't need step tracking" | Tasks have 4-8 implementation steps. Without substep tracking, steps 4-8 get skipped. |
| "Made progress on the task, can move on" | Progress ≠ complete. All substeps must finish. 2/6 steps = 33%, not done. |
| "Other tasks are waiting, should continue" | Current task incomplete = blocked. Finish all substeps first. |
| "Can finish remaining steps later" | Later never comes. Complete all substeps now before closing task. |
## Quality Shortcuts
| Excuse | Reality |
|--------|---------|
| "Small gaps don't matter" | Spec is contract. All criteria must be met. |
| "Will fix in next PR" | This PR should complete this work. Fix now. |
| "Partner will review anyway" | You review first. Don't delegate your quality check. |
| "Good enough for now" | "Now" becomes "forever". Do it right. |
## TDD Shortcuts
| Excuse | Reality |
|--------|---------|
| "Test is obvious, can skip RED phase" | If you don't watch it fail, you don't know it works. |
| "Will adapt this code while writing test" | Delete it. Start fresh from the test. |
| "Can keep it as reference" | No. Delete means delete. |
| "Test is simple, don't need to run it" | Simple tests fail for subtle reasons. Run it. |
## Research Shortcuts
| Excuse | Reality |
|--------|---------|
| "I can research quickly myself" | Use agents. You'll hallucinate or waste context. |
| "Agent didn't find it first try, must not exist" | Be persistent. Refine query and try again. |
| "I know this codebase" | You don't know current state. Always verify. |
| "Obvious solution, skip research" | Codebase may have established pattern. Check first. |
**All of these mean: STOP. Follow the requirements exactly.**
## Why This Matters
Rationalizations are how good processes fail:
1. Developer thinks "just this once"
2. Shortcut causes subtle bug
3. Bug found in production/PR
4. More time spent fixing than process would have cost
5. Trust damaged
**No shortcuts. Follow the process. Every time.**

View File

@@ -0,0 +1,493 @@
---
name: debugging-with-tools
description: Use when encountering bugs or test failures - systematic debugging using debuggers, internet research, and agents to find root cause before fixing
---
<skill_overview>
Random fixes waste time and create new bugs. Always use tools to understand root cause BEFORE attempting fixes. Symptom fixes are failure.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Must complete investigation phases (tools → hypothesis → test) before fixing.
Can adapt tool choice to language/context. Never skip investigation or guess at fixes.
</rigidity_level>
<quick_reference>
| Phase | Tools to Use | Output |
|-------|--------------|--------|
| **1. Investigate** | Error messages, internet-researcher agent, debugger, codebase-investigator | Root cause understanding |
| **2. Hypothesize** | Form theory based on evidence (not guesses) | Testable hypothesis |
| **3. Test** | Validate hypothesis with minimal change | Confirms or rejects theory |
| **4. Fix** | Implement proper fix for root cause | Problem solved permanently |
**FORBIDDEN:** Skip investigation → guess at fix → hope it works
**REQUIRED:** Tools → evidence → hypothesis → test → fix
**Key agents:**
- `internet-researcher` - Search error messages, known bugs, solutions
- `codebase-investigator` - Understand code structure, find related code
- `test-runner` - Run tests without output pollution
</quick_reference>
<when_to_use>
**Use for ANY technical issue:**
- Test failures
- Bugs in production or development
- Unexpected behavior
- Build failures
- Integration issues
- Performance problems
**ESPECIALLY when:**
- "Just one quick fix" seems obvious
- Under time pressure (emergencies make guessing tempting)
- Error message is unclear
- Previous fix didn't work
</when_to_use>
<the_process>
## Phase 1: Tool-Assisted Investigation
**BEFORE attempting ANY fix, gather evidence with tools:**
### 1. Read Complete Error Messages
- Entire error message (not just first line)
- Complete stack trace (all frames)
- Line numbers, file paths, error codes
- Stack traces show exact execution path (see the capture sketch below)
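One way to make this habitual is to capture the complete output to a file so the full stack trace survives scrolling. A minimal sketch for a Rust project (substitute `npm test`, `pytest`, etc.; the test name is a placeholder):
```bash
# Capture the entire failure output, not just the first line
RUST_BACKTRACE=1 cargo test failing_test_name 2>&1 | tee /tmp/full-error.log
# Pull out the parts worth reading in full: error lines, panics, backtrace frames
grep -n -i -E "error|panicked|FAILED|at src/" /tmp/full-error.log
```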
### 2. Search Internet FIRST (Use internet-researcher Agent)
**Dispatch internet-researcher with:**
```
"Search for error: [exact error message]
- Check Stack Overflow solutions
- Look for GitHub issues in [library] version [X]
- Find official documentation explaining this error
- Check if this is a known bug"
```
**What agent should find:**
- Exact matches to your error
- Similar symptoms and solutions
- Known bugs in your dependency versions
- Workarounds that worked for others
### 3. Use Debugger to Inspect State
**Claude cannot run debuggers directly. Instead:**
**Option A - Recommend debugger to user:**
```
"Let's use lldb/gdb/DevTools to inspect state at error location.
Please run: [specific commands]
When breakpoint hits: [what to inspect]
Share output with me."
```
**Option B - Add instrumentation Claude can add:**
```rust
// Add logging
println!("DEBUG: var = {:?}, state = {:?}", var, state);
// Add assertions
assert!(condition, "Expected X but got {:?}", actual);
```
### 4. Investigate Codebase (Use codebase-investigator Agent)
**Dispatch codebase-investigator with:**
```
"Error occurs in function X at line Y.
Find:
- How is X called? What are the callers?
- What does variable Z contain at this point?
- Are there similar functions that work correctly?
- What changed recently in this area?"
```
## Phase 2: Form Hypothesis
**Based on evidence (not guesses):**
1. **State what you know** (from investigation)
2. **Propose theory** explaining the evidence
3. **Make prediction** that tests the theory
**Example:**
```
Known: Error "null pointer" at auth.rs:45 when email is empty
Theory: Empty email bypasses validation, passes null to login()
Prediction: Adding validation before login() will prevent error
Test: Add validation, verify error doesn't occur with empty email
```
**NEVER:**
- Guess without evidence
- Propose fix without hypothesis
- Skip to "try this and see"
## Phase 3: Test Hypothesis
**Minimal change to validate theory:**
1. Make smallest change that tests hypothesis
2. Run test/reproduction case
3. Observe result
**If confirmed:** Proceed to Phase 4
**If rejected:** Return to Phase 1 with new information
## Phase 4: Implement Fix
**After understanding root cause:**
1. Write test reproducing bug (RED phase - use test-driven-development skill)
2. Implement proper fix addressing root cause
3. Verify test passes (GREEN phase)
4. Run full test suite (regression check)
5. Commit fix
**The fix should:**
- Address root cause (not symptom)
- Be minimal and focused
- Include test preventing regression (see the verification sketch below)
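A minimal sketch of the RED → GREEN → regression sequence above for a Rust project (the regression-test name is a placeholder; substitute your test runner):
```bash
# RED: the regression test must fail before the fix goes in
cargo test token_creation_sets_real_expiry 2>&1 | tee /tmp/red.log
grep -q "FAILED" /tmp/red.log \
  && echo "✅ Test reproduces the bug" \
  || echo "⚠️ Test passed before the fix - it may not reproduce the bug"
# ...implement the root-cause fix...
# GREEN: the same test must now pass
cargo test token_creation_sets_real_expiry
# Regression check: run the whole suite, not just the fixed test
cargo test
```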
</the_process>
<examples>
<example>
<scenario>Developer encounters test failure, immediately tries "obvious" fix without investigation</scenario>
<code>
Test error:
```
FAIL: test_login_expired_token
AssertionError: Expected Err(TokenExpired), got Ok(User)
```
Developer thinks: "Obviously the token expiration check is wrong"
Makes change without investigation:
```rust
// "Fix" - just check if token is expired
if token.expires_at < now() {
return Err(AuthError::TokenExpired);
}
```
Commits without testing other cases.
</code>
<why_it_fails>
**No investigation:**
- Didn't read error completely
- Didn't check what `expires_at` contains
- Didn't debug to see token state
- Didn't search for similar issues
**What actually happened:** Token `expires_at` was being set to a hardcoded far-future date at creation, so tokens always looked valid. The "fix" adds a check that never fires.
**Result:** Bug not fixed, new dead code added, time wasted.
</why_it_fails>
<correction>
**Phase 1 - Investigate with tools:**
```bash
# 1. Read complete error
FAIL: test_login_expired_token at line 45
Expected: Err(TokenExpired)
Got: Ok(User { id: 123 })
Token: { expires_at: "2099-01-01", ... }
```
**Dispatch internet-researcher:**
```
"Search for: token expiration always showing future date
- Check date parsing bugs
- Look for timezone issues
- Find JWT expiration handling"
```
**Add instrumentation:**
```rust
println!("DEBUG: expires_at = {:?}, now = {:?}, expired = {:?}",
token.expires_at, now(), token.expires_at < now());
```
**Run test again:**
```
DEBUG: expires_at = 2099-01-01T00:00:00Z, now = 2024-01-15T10:30:00Z, expired = false
```
**Phase 2 - Hypothesis:**
"Token `expires_at` is being set to 2099, not actual expiration. Problem is in token creation, not validation."
**Phase 3 - Test:**
Check token creation code:
```rust
// Found the bug!
fn create_token() -> Token {
Token {
expires_at: "2099-01-01".parse()?, // HARDCODED!
...
}
}
```
**Phase 4 - Fix root cause:**
```rust
fn create_token(duration: Duration) -> Token {
Token {
expires_at: now() + duration, // Correct
...
}
}
```
**Result:** Root cause fixed, test passes, no dead code.
</correction>
</example>
<example>
<scenario>Developer skips internet search, reinvents solution to known problem</scenario>
<code>
Error:
```
error: linking with `cc` failed: exit status: 1
ld: symbol(s) not found for architecture arm64
```
Developer thinks: "Must be a linking issue, I'll add flags"
Spends 2 hours trying different linker flags:
```toml
[target.aarch64-apple-darwin]
rustflags = ["-C", "link-arg=-undefined dynamic_lookup"]
# Doesn't work, tries more flags...
```
</code>
<why_it_fails>
**Skipped internet search:**
- This is a common error with known solutions
- Stack Overflow has exact fix
- Official docs explain the issue
- Wasted 2 hours reinventing solution
**Why it happens:** Impatience, thinking "I can figure this out faster"
</why_it_fails>
<correction>
**Dispatch internet-researcher FIRST:**
```
"Search for: 'symbol not found for architecture arm64' Rust linking
- Check Stack Overflow solutions
- Look for Xcode/macOS specific fixes
- Find Cargo configuration for Apple Silicon"
```
**Agent returns (30 seconds):**
```
Found on Stack Overflow (2.4k upvotes):
This occurs when Xcode Command Line Tools aren't installed or outdated.
Solution:
xcode-select --install
Or update existing:
softwareupdate --all --install --force
```
**Apply solution:**
```bash
xcode-select --install
# Wait for installation
cargo build
# Success!
```
**Result:** Fixed in 5 minutes, not 2 hours.
</correction>
</example>
<example>
<scenario>Developer fixes symptom without understanding root cause, bug returns different way</scenario>
<code>
Bug: Users can delete other users' posts
Developer notices: DELETE /posts/:id doesn't check ownership
"Fix":
```python
@app.delete("/posts/{post_id}")
def delete_post(post_id: int, user: User):
post = get_post(post_id)
if post.author_id != user.id:
raise Forbidden()
delete(post)
```
Bug "fixed"... but bug report comes back: "Users can edit other users' posts"
</code>
<why_it_fails>
**Symptom fix:**
- Fixed DELETE endpoint
- Didn't investigate root cause
- Didn't check other endpoints
**Root cause:** No authorization middleware checking ownership across ALL endpoints.
**Result:** Whack-a-mole fixing symptoms, not the underlying issue.
</why_it_fails>
<correction>
**Phase 1 - Investigate with codebase-investigator:**
```
"User can delete other users' posts. Investigate:
- What authorization exists across all post endpoints?
- Is there a pattern for ownership checks?
- Which other endpoints access posts?
- Is there middleware handling this?"
```
**Agent reports:**
```
Found:
- DELETE /posts/:id - NO ownership check
- PUT /posts/:id - NO ownership check
- GET /posts/:id - NO ownership check (allows viewing private posts!)
- No authorization middleware found
- Similar pattern in comments, likes endpoints
```
**Phase 2 - Hypothesis:**
"Missing authorization layer. Need middleware checking resource ownership across ALL endpoints."
**Phase 4 - Fix root cause:**
```python
# Add authorization middleware
class OwnershipMiddleware:
def check_ownership(self, resource, user):
if resource.author_id != user.id:
raise Forbidden()
# Apply to all endpoints
@app.delete("/posts/{post_id}")
@require_ownership(Post)
def delete_post(...):
...
@app.put("/posts/{post_id}")
@require_ownership(Post)
def update_post(...):
...
```
**Result:** Root cause fixed, ALL endpoints secured, not just one symptom.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Tools before fixes** → Never guess without investigation
- Use internet-researcher for errors
- Use debugger or instrumentation for state
- Use codebase-investigator for context
2. **Evidence-based hypotheses** → Not guesses or hunches
- State what tools revealed
- Propose theory explaining evidence
- Make testable prediction
3. **Test hypothesis before fixing** → Minimal change to validate
- Smallest change that tests theory
- Observe result
- If wrong, return to investigation
4. **Fix root cause, not symptom** → One fix, many symptoms prevented
- Understand why problem occurred
- Fix the underlying issue
- Don't play whack-a-mole
## Common Excuses
All of these mean: Stop, use tools to investigate:
- "The fix is obvious"
- "I know what this is"
- "Just a quick try"
- "No time for debugging"
- "Error message is clear enough"
- "Internet search will take too long"
</critical_rules>
<verification_checklist>
Before proposing any fix:
- [ ] Read complete error message (not just first line)
- [ ] Dispatched internet-researcher for unclear errors
- [ ] Used debugger or added instrumentation to inspect state
- [ ] Dispatched codebase-investigator to understand context
- [ ] Formed hypothesis based on evidence (not guesses)
- [ ] Tested hypothesis with minimal change
- [ ] Verified hypothesis confirmed before fixing
Before committing fix:
- [ ] Written test reproducing bug (RED phase)
- [ ] Verified test fails before fix
- [ ] Implemented fix addressing root cause
- [ ] Verified test passes after fix (GREEN phase)
- [ ] Ran full test suite (regression check)
</verification_checklist>
<integration>
**This skill calls:**
- internet-researcher (search errors, known bugs, solutions)
- codebase-investigator (understand code structure, find related code)
- test-driven-development (write test for bug, implement fix)
- test-runner (run tests without output pollution)
**This skill is called by:**
- fixing-bugs (complete bug fix workflow)
- root-cause-tracing (deep debugging for complex issues)
- Any skill when encountering unexpected behavior
**Agents used:**
- hyperpowers:internet-researcher (search for error solutions)
- hyperpowers:codebase-investigator (understand codebase context)
- hyperpowers:test-runner (run tests, return summary only)
</integration>
<resources>
**Detailed guides:**
- [Debugger reference](resources/debugger-reference.md) - LLDB, GDB, DevTools commands
- [Debugging session example](resources/debugging-session-example.md) - Complete walkthrough
**When stuck:**
- Error unclear → Dispatch internet-researcher with exact error text
- Don't understand code flow → Dispatch codebase-investigator
- Need to inspect runtime state → Recommend debugger to user or add instrumentation
- Tempted to guess → Stop, use tools to gather evidence first
</resources>

View File

@@ -0,0 +1,113 @@
## Debugger Quick Reference
### Automated Debugging (Claude CAN run these)
#### lldb Batch Mode
```bash
# One-shot command to inspect variable at breakpoint
lldb -o "breakpoint set --file main.rs --line 42" \
-o "run" \
-o "frame variable my_var" \
-o "quit" \
-- target/debug/myapp 2>&1
# With script file for complex debugging
cat > debug.lldb <<'EOF'
breakpoint set --file main.rs --line 42
run
frame variable
bt
up
frame variable
quit
EOF
lldb -s debug.lldb target/debug/myapp 2>&1
```
#### strace (Linux - system call tracing)
```bash
# See which files program opens
strace -e trace=open,openat cargo run 2>&1 | grep -v "ENOENT"
# Find network activity
strace -e trace=network cargo run 2>&1
# All syscalls with time
strace -tt cargo test some_test 2>&1
```
#### dtrace (macOS - dynamic tracing)
```bash
# Trace function calls
sudo dtrace -n 'pid$target:myapp::entry { printf("%s", probefunc); }' -p <PID>
```
### Interactive Debugging (USER runs these, Claude guides)
**These require interactive terminal - Claude provides commands, user runs them**
### lldb (Rust, Swift, C++)
```bash
# Start debugging
lldb target/debug/myapp
# Set breakpoints
(lldb) breakpoint set --file main.rs --line 42
(lldb) breakpoint set --name my_function
# Run
(lldb) run
(lldb) run arg1 arg2
# When paused:
(lldb) frame variable # Show all locals
(lldb) print my_var # Print specific variable
(lldb) bt # Backtrace (stack)
(lldb) up / down # Navigate stack
(lldb) continue # Resume
(lldb) step / next # Step into / over
(lldb) finish # Run until return
```
### Browser DevTools (JavaScript)
```javascript
// In code:
debugger; // Execution pauses here
// In DevTools:
// - Sources tab → Add breakpoint by clicking line number
// - When paused:
// - Scope panel: See all variables
// - Watch: Add expressions to watch
// - Call stack: Navigate callers
// - Step over (F10), Step into (F11)
```
### gdb (C, C++, Go)
```bash
# Start debugging
gdb ./myapp
# Set breakpoints
(gdb) break main.c:42
(gdb) break myfunction
# Run
(gdb) run
# When paused:
(gdb) print myvar
(gdb) info locals
(gdb) backtrace
(gdb) up / down
(gdb) continue
(gdb) step / next
```

View File

@@ -0,0 +1,73 @@
## Example: Complete Debugging Session
**Problem:** Test fails with "Symbol not found: _OBJC_CLASS_$_WKWebView"
**Phase 1: Investigation**
1. **Read error**: Symbol not found, linking issue
2. **Internet research**:
```
Dispatch hyperpowers:internet-researcher:
"Search for 'dyld Symbol not found _OBJC_CLASS_$_WKWebView'
Focus on: Xcode linking, framework configuration, iOS deployment"
Results: Need to link WebKit framework in Xcode project
```
3. **Debugger**: Not needed, linking happens before runtime
4. **Codebase investigation**:
```
Dispatch hyperpowers:codebase-investigator:
"Find other code using WKWebView - how is WebKit linked?"
Results: Main app target has WebKit in frameworks, test target doesn't
```
**Phase 2: Analysis**
Root cause: Test target doesn't link WebKit framework
Evidence: Main target works, test target fails, Stack Overflow confirms
**Phase 3: Testing**
Hypothesis: Adding WebKit to test target will fix it
Minimal test:
1. Add WebKit.framework to test target
2. Clean build
3. Run tests
```
Dispatch hyperpowers:test-runner: "Run: swift test"
Result: ✓ All tests pass
```
**Phase 4: Implementation**
1. Test already exists (the failing test)
2. Fix: Framework linked
3. Verification: Tests pass
4. Update bd:
```bash
bd close bd-123
```
**Time:** 15 minutes systematic vs. 2+ hours guessing
## Remember
- **Tools make debugging faster**, not slower
- **hyperpowers:internet-researcher** can find solutions in seconds
- **Automated debugging works** - lldb batch mode, strace, instrumentation
- **hyperpowers:codebase-investigator** finds patterns you'd miss
- **hyperpowers:test-runner agent** keeps context clean
- **Evidence before fixes**, always
**Prefer automated tools:**
1. lldb batch mode - non-interactive variable inspection
2. strace/dtrace - system call tracing
3. Instrumentation - logging Claude can add
4. Interactive debugger - only when automated tools insufficient
95% faster to investigate systematically than to guess-and-check.

View File

@@ -0,0 +1,662 @@
---
name: dispatching-parallel-agents
description: Use when facing 3+ independent failures that can be investigated without shared state or dependencies - dispatches multiple Claude agents to investigate and fix independent problems concurrently
---
<skill_overview>
When facing 3+ independent failures, dispatch one agent per problem domain to investigate concurrently; verify independence first, dispatch all in single message, wait for all agents, check conflicts, verify integration.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Follow the 6-step process (identify, create tasks, dispatch, monitor, review, verify) strictly. Independence verification mandatory. Parallel dispatch in single message required. Adapt agent prompt content to problem domain.
</rigidity_level>
<quick_reference>
| Step | Action | Critical Rule |
|------|--------|---------------|
| 1. Identify Domains | Test independence (fix A doesn't affect B) | 3+ independent domains required |
| 2. Create Agent Tasks | Write focused prompts (scope, goal, constraints, output) | One prompt per domain |
| 3. Dispatch Agents | Launch all agents in SINGLE message | Multiple Task() calls in parallel |
| 4. Monitor Progress | Track completions, don't integrate until ALL done | Wait for all agents |
| 5. Review Results | Read summaries, check conflicts | Manual conflict resolution |
| 6. Verify Integration | Run full test suite | Use verification-before-completion |
**Why 3+?** With only two failures, the coordination overhead usually exceeds the time saved over sequential investigation.
**Critical:** Dispatch all agents in single message with multiple Task() calls, or they run sequentially.
</quick_reference>
<when_to_use>
Use when:
- 3+ test files failing with different root causes
- Multiple subsystems broken independently
- Each problem can be understood without context from others
- No shared state between investigations
- You've verified failures are truly independent
- Each domain has clear boundaries (different files, modules, features)
Don't use when:
- Failures are related (fix one might fix others)
- Need to understand full system state first
- Agents would interfere (editing same files)
- Haven't verified independence yet (exploratory phase)
- Failures share root cause (one bug, multiple symptoms)
- Need to preserve investigation order (cascading failures)
- Only 2 failures (overhead exceeds benefit)
</when_to_use>
<the_process>
## Step 1: Identify Independent Domains
**Announce:** "I'm using hyperpowers:dispatching-parallel-agents to investigate these independent failures concurrently."
**Create TodoWrite tracker:**
```
- Identify independent domains (3+ domains identified)
- Create agent tasks (one prompt per domain drafted)
- Dispatch agents in parallel (all agents launched in single message)
- Monitor agent progress (track completions)
- Review results (summaries read, conflicts checked)
- Verify integration (full test suite green)
```
**Test for independence:**
1. **Ask:** "If I fix failure A, does it affect failure B?"
- If NO → Independent
- If YES → Related, investigate together
2. **Check:** "Do failures touch same code/files?"
- If NO → Likely independent
- If YES → Check if different functions/areas
3. **Verify:** "Do failures share error patterns?"
- If NO → Independent
- If YES → Might be same root cause
**Example independence check:**
```
Failure 1: Authentication tests failing (auth.test.ts)
Failure 2: Database query tests failing (db.test.ts)
Failure 3: API endpoint tests failing (api.test.ts)
Check: Does fixing auth affect db queries? NO
Check: Does fixing db affect API? YES - API uses db
Result: 2 independent domains:
Domain 1: Authentication (auth.test.ts)
Domain 2: Database + API (db.test.ts + api.test.ts together)

Only 2 independent domains → below the 3+ threshold, so investigate
them sequentially rather than dispatching parallel agents.
```
**Group failures by what's broken:**
- File A tests: Tool approval flow
- File B tests: Batch completion behavior
- File C tests: Abort functionality
---
## Step 2: Create Focused Agent Tasks
Each agent prompt must have:
1. **Specific scope:** One test file or subsystem
2. **Clear goal:** Make these tests pass
3. **Constraints:** Don't change other code
4. **Expected output:** Summary of what you found and fixed
**Good agent prompt example:**
```markdown
Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
1. "should abort tool with partial output capture" - expects 'interrupted at' in message
2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
3. "should properly track pendingToolCount" - expects 3 results but gets 0
These are timing/race condition issues. Your task:
1. Read the test file and understand what each test verifies
2. Identify root cause - timing issues or actual bugs?
3. Fix by:
- Replacing arbitrary timeouts with event-based waiting
- Fixing bugs in abort implementation if found
- Adjusting test expectations if testing changed behavior
Never just increase timeouts - find the real issue.
Return: Summary of what you found and what you fixed.
```
**What makes this good:**
- Specific test failures listed
- Context provided (timing/race conditions)
- Clear methodology (read, identify, fix)
- Constraints (don't just increase timeouts)
- Output format (summary)
**Common mistakes:**
**Too broad:** "Fix all the tests" - agent gets lost
**Specific:** "Fix agent-tool-abort.test.ts" - focused scope
**No context:** "Fix the race condition" - agent doesn't know where
**Context:** Paste the error messages and test names
**No constraints:** Agent might refactor everything
**Constraints:** "Do NOT change production code" or "Fix tests only"
**Vague output:** "Fix it" - you don't know what changed
**Specific:** "Return summary of root cause and changes"
---
## Step 3: Dispatch All Agents in Parallel
**CRITICAL:** You must dispatch all agents in a SINGLE message with multiple Task() calls.
```typescript
// ✅ CORRECT - Single message with multiple parallel tasks
Task("Fix agent-tool-abort.test.ts failures", prompt1)
Task("Fix batch-completion-behavior.test.ts failures", prompt2)
Task("Fix tool-approval-race-conditions.test.ts failures", prompt3)
// All three run concurrently
// ❌ WRONG - Sequential messages
Task("Fix agent-tool-abort.test.ts failures", prompt1)
// Wait for response
Task("Fix batch-completion-behavior.test.ts failures", prompt2)
// This is sequential, not parallel!
```
**After dispatch:**
- Mark "Dispatch agents in parallel" as completed in TodoWrite
- Mark "Monitor agent progress" as in_progress
- Wait for all agents to complete before integration
---
## Step 4: Monitor Progress
As agents work:
- Note which agents have completed
- Note which are still running
- Don't start integration until ALL agents done
**If an agent gets stuck (>5 minutes):**
1. Check AgentOutput to see what it's doing
2. If stuck on wrong path: Cancel and retry with clearer prompt
3. If needs context from other domain: Wait for other agent, then restart with context
4. If hit real blocker: Investigate blocker yourself, then retry
---
## Step 5: Review Results and Check Conflicts
**When all agents return:**
1. **Read each summary carefully**
- What was the root cause?
- What did the agent change?
- Were there any uncertainties?
2. **Check for conflicts**
- Did multiple agents edit same files?
- Did agents make contradictory assumptions?
- Are there integration points between domains?
3. **Integration strategy:**
- If no conflicts: Apply all changes
- If conflicts: Resolve manually before applying
- If assumptions conflict: Verify with user
4. **Document what happened**
- Which agents fixed what
- Any conflicts found
- Integration decisions made
---
## Step 6: Verify Integration
**Run full test suite:**
- Not just the fixed tests
- Verify no regressions in other areas
- Use hyperpowers:verification-before-completion skill
**Before completing:**
```bash
# Run all tests
npm test # or cargo test, pytest, etc.
# Verify output
# If all pass → Mark "Verify integration" complete
# If failures → Identify which agent's change caused regression
```
</the_process>
<examples>
<example>
<scenario>Developer dispatches agents sequentially instead of in parallel</scenario>
<code>
# Developer sees 3 independent failures
# Creates 3 agent prompts
# Dispatches first agent
Task("Fix agent-tool-abort.test.ts failures", prompt1)
# Waits for response from agent 1
# Then dispatches second agent
Task("Fix batch-completion-behavior.test.ts failures", prompt2)
# Waits for response from agent 2
# Then dispatches third agent
Task("Fix tool-approval-race-conditions.test.ts failures", prompt3)
# Total time: Sum of all three agents (sequential)
</code>
<why_it_fails>
- Agents run sequentially, not in parallel
- No time savings from parallelization
- Each agent waits for previous to complete
- Defeats entire purpose of parallel dispatch
- Same result as sequential investigation
- Wasted overhead of creating separate agents
</why_it_fails>
<correction>
**Dispatch all agents in SINGLE message:**
```typescript
// Single message with multiple Task() calls
Task("Fix agent-tool-abort.test.ts failures", `
Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
[prompt 1 content]
`)
Task("Fix batch-completion-behavior.test.ts failures", `
Fix the 2 failing tests in src/agents/batch-completion-behavior.test.ts:
[prompt 2 content]
`)
Task("Fix tool-approval-race-conditions.test.ts failures", `
Fix the 1 failing test in src/agents/tool-approval-race-conditions.test.ts:
[prompt 3 content]
`)
// All three run concurrently - THIS IS THE KEY
```
**What happens:**
- All three agents start simultaneously
- Each investigates independently
- All complete in parallel
- Total time: Max(agent1, agent2, agent3) instead of Sum
**What you gain:**
- True parallelization - 3 problems solved concurrently
- Time saved: 3 investigations in time of 1
- Each agent focused on narrow scope
- No waiting for sequential completion
- Proper use of parallel dispatch pattern
</correction>
</example>
<example>
<scenario>Developer assumes failures are independent without verification</scenario>
<code>
# Developer sees 3 test failures:
# - API endpoint tests failing
# - Database query tests failing
# - Cache invalidation tests failing
# Thinks: "Different subsystems, must be independent"
# Dispatches 3 agents immediately without checking independence
# Agent 1 finds: API failing because database schema changed
# Agent 2 finds: Database queries need migration
# Agent 3 finds: Cache keys based on old schema
# All three failures caused by same root cause: schema change
# Agents make conflicting fixes based on different assumptions
# Integration fails because fixes contradict each other
</code>
<why_it_fails>
- Skipped independence verification (Step 1)
- Assumed independence based on surface appearance
- All failures actually shared root cause (schema change)
- Agents worked in isolation without seeing connection
- Each agent made different assumptions about correct schema
- Conflicting fixes can't be integrated
- Wasted time on parallel work that should have been unified
- Have to throw away agent work and start over
</why_it_fails>
<correction>
**Run independence check FIRST:**
```
Check: Does fixing API affect database queries?
- API uses database
- If database schema changes, API breaks
- YES - these are related
Check: Does fixing database affect cache?
- Cache stores database results
- If database schema changes, cache keys break
- YES - these are related
Check: Do failures share error patterns?
- All mention "column not found: user_email"
- All started after schema migration
- YES - shared root cause
Result: NOT INDEPENDENT
These are one problem (schema change) manifesting in 3 places
```
**Correct approach:**
```
Single agent investigates: "Schema migration broke 3 subsystems"
Agent prompt:
"We have 3 test failures all related to schema change:
1. API endpoints: column not found
2. Database queries: column not found
3. Cache invalidation: old keys
Investigate the schema migration that caused this.
Fix by updating all 3 subsystems consistently.
Return: What changed in schema, how you fixed each subsystem."
# One agent sees full picture
# Makes consistent fix across all 3 areas
# No conflicts, proper integration
```
**What you gain:**
- Caught shared root cause before wasting time
- One agent sees full context
- Consistent fix across all affected areas
- No conflicting assumptions
- No integration conflicts
- Faster than 3 agents working at cross-purposes
- Proper problem diagnosis before parallel dispatch
</correction>
</example>
<example>
<scenario>Developer integrates agent results without checking conflicts</scenario>
<code>
# 3 agents complete successfully
# Developer quickly reads summaries:
Agent 1: "Fixed timeout issue by increasing wait time to 5000ms"
Agent 2: "Fixed race condition by adding mutex lock"
Agent 3: "Fixed timing issue by reducing wait time to 1000ms"
# Developer thinks: "All agents succeeded, ship it"
# Applies all changes without checking conflicts
# Result:
# - Agent 1 and Agent 3 edited same file
# - Agent 1 increased timeout, Agent 3 decreased it
# - Final code has inconsistent timeouts
# - Agent 2's mutex interacts badly with Agent 3's reduced timeout
# - Tests still fail after integration
</code>
<why_it_fails>
- Skipped conflict checking (Step 5)
- Didn't carefully read what each agent changed
- Agents made contradictory decisions
- Agent 1 and Agent 3 had different assumptions about timing
- Agent 2's locking interacts with timing changes
- Blindly applying all fixes creates inconsistent state
- Tests fail after "successful" integration
- Have to manually untangle conflicting changes
</why_it_fails>
<correction>
**Review results carefully before integration:**
````markdown
## Agent Summaries Review
Agent 1: Fixed timeout issue by increasing wait time to 5000ms
- File: src/agents/tool-executor.ts
- Change: DEFAULT_TIMEOUT = 5000
Agent 2: Fixed race condition by adding mutex lock
- File: src/agents/tool-executor.ts
- Change: Added mutex around tool execution
Agent 3: Fixed timing issue by reducing wait time to 1000ms
- File: src/agents/tool-executor.ts
- Change: DEFAULT_TIMEOUT = 1000
## Conflict Analysis
**CONFLICT DETECTED:**
- Agents 1 and 3 edited same file (tool-executor.ts)
- Agents 1 and 3 changed same constant (DEFAULT_TIMEOUT)
- Agent 1: increase to 5000ms
- Agent 3: decrease to 1000ms
- Contradictory assumptions about correct timing
**Why conflict occurred:**
- Domains weren't actually independent (same timeout constant)
- Both agents tested locally, didn't see interaction
- Different problem spaces led to different timing needs
## Resolution
**Option 1:** Different timeouts for different operations
```typescript
const TOOL_EXECUTION_TIMEOUT = 5000 // Agent 1's need
const TOOL_APPROVAL_TIMEOUT = 1000 // Agent 3's need
```
**Option 2:** Investigate why timing varies
- Maybe Agent 1's tests are actually slow (fix slowness)
- Maybe Agent 3's tests are correct (use 1000ms everywhere)
**Choose Option 2 after investigation:**
- Agent 1's tests were slow due to unrelated issue
- Fix the slowness, use 1000ms timeout everywhere
- Agent 2's mutex is compatible with 1000ms
**Integration steps:**
1. Apply Agent 2's mutex (no conflict)
2. Apply Agent 3's 1000ms timeout
3. Fix Agent 1's slow tests (root cause)
4. Don't apply Agent 1's timeout increase (symptom fix)
````
**Run full test suite:**
```bash
npm test
# All tests pass ✅
```
**What you gain:**
- Caught contradiction before breaking integration
- Understood why agents made different decisions
- Resolved conflict thoughtfully, not arbitrarily
- Fixed root cause (slow tests) not symptom (long timeout)
- Verified integration works correctly
- Avoided shipping inconsistent code
- Professional conflict resolution process
</correction>
</example>
</examples>
<failure_modes>
## Agent Gets Stuck
**Symptoms:** No progress after 5+ minutes
**Causes:**
- Prompt too vague, agent exploring aimlessly
- Domain not actually independent, needs context from other agents
- Agent hit a blocker (missing file, unclear error)
**Recovery:**
1. Use AgentOutput tool to check what it's doing
2. If stuck on wrong path: Cancel and retry with clearer prompt
3. If needs context from other domain: Wait for other agent, then restart with context
4. If hit real blocker: Investigate blocker yourself, then retry
---
## Agents Return Conflicting Fixes
**Symptoms:** Agents edited same code differently, or made contradictory assumptions
**Causes:**
- Domains weren't actually independent
- Shared code between domains
- Agents made different assumptions about correct behavior
**Recovery:**
1. Don't apply either fix automatically
2. Read both fixes carefully
3. Identify the conflict point
4. Resolve manually based on which assumption is correct
5. Consider if domains should be merged
---
## Integration Breaks Other Tests
**Symptoms:** Fixed tests pass, but other tests now fail
**Causes:**
- Agent changed shared code
- Agent's fix was too broad
- Agent misunderstood requirements
**Recovery:**
1. Identify which agent's change caused the regression
2. Read the agent's summary - did they mention this change?
3. Evaluate if change is correct but tests need updating
4. Or if change broke something, need to refine the fix
5. Use hyperpowers:verification-before-completion skill for final check
---
## False Independence
**Symptoms:** Fixing one domain revealed it affected another
**Recovery:**
1. Merge the domains
2. Have one agent investigate both together
3. Learn: Better independence test needed upfront
</failure_modes>
<critical_rules>
## Rules That Have No Exceptions
1. **Verify independence first** → Test with questions before dispatching
2. **3+ domains required** → 2 failures: overhead exceeds benefit, do sequentially
3. **Single message dispatch** → All agents in one message with multiple Task() calls
4. **Wait for ALL agents** → Don't integrate until all complete
5. **Check conflicts manually** → Read summaries, verify no contradictions
6. **Verify integration** → Run full suite yourself, don't trust agents
7. **TodoWrite tracking** → Track agent progress explicitly
## Common Excuses
All of these mean: **STOP. Follow the process.**
- "Just 2 failures, can still parallelize" (Overhead exceeds benefit, do sequentially)
- "Probably independent, will dispatch and see" (Verify independence FIRST)
- "Can dispatch sequentially to save syntax" (WRONG - must dispatch in single message)
- "Agent failed, but others succeeded - ship it" (All agents must succeed or re-investigate)
- "Conflicts are minor, can ignore" (Resolve all conflicts explicitly)
- "Don't need TodoWrite for just tracking agents" (Use TodoWrite, track properly)
- "Can skip verification, agents ran tests" (Agents can make mistakes, YOU verify)
</critical_rules>
<verification_checklist>
Before completing parallel agent work:
- [ ] Verified independence with 3 questions (fix A affects B? same code? same error pattern?)
- [ ] 3+ independent domains identified (not 2 or fewer)
- [ ] Created focused agent prompts (scope, goal, constraints, output)
- [ ] Dispatched all agents in single message (multiple Task() calls)
- [ ] Waited for ALL agents to complete (didn't integrate early)
- [ ] Read all agent summaries carefully
- [ ] Checked for conflicts (same files, contradictory assumptions)
- [ ] Resolved any conflicts manually before integration
- [ ] Ran full test suite (not just fixed tests)
- [ ] Used verification-before-completion skill
- [ ] Documented which agents fixed what
**Can't check all boxes?** Return to the process and complete missing steps.
</verification_checklist>
<integration>
**This skill covers:** Parallel investigation of independent failures
**Related skills:**
- hyperpowers:debugging-with-tools (how to investigate individual failures)
- hyperpowers:fixing-bugs (complete bug workflow)
- hyperpowers:verification-before-completion (verify integration)
- hyperpowers:test-runner (run tests without context pollution)
**This skill uses:**
- Task tool (dispatch parallel agents)
- AgentOutput tool (monitor stuck agents)
- TodoWrite (track agent progress)
**Workflow integration:**
```
Multiple independent failures
↓
Verify independence (Step 1)
↓
Create agent tasks (Step 2)
↓
Dispatch in parallel (Step 3)
↓
Monitor progress (Step 4)
↓
Review + check conflicts (Step 5)
↓
Verify integration (Step 6)
↓
hyperpowers:verification-before-completion
```
**Real example from session (2025-10-03):**
- 6 failures across 3 files
- 3 agents dispatched in parallel
- All investigations completed concurrently
- All fixes integrated successfully
- Zero conflicts between agent changes
- Time saved: three investigations ran concurrently instead of one after another
</integration>
<resources>
**Key principles:**
- Parallelization only wins with 3+ independent problems
- Independence verification prevents wasted parallel work
- Single message dispatch is critical for true parallelism
- Conflict checking prevents integration disasters
- Full verification catches agent mistakes
**When stuck:**
- Agent not making progress → Check AgentOutput, retry with clearer prompt
- Conflicts after dispatch → Domains weren't independent, merge and retry
- Integration fails tests → Identify which agent caused regression
- Unclear if independent → Test with 3 questions (affects? same code? same error?)
</resources>


@@ -0,0 +1,571 @@
---
name: executing-plans
description: Use to execute bd tasks iteratively - executes one task, reviews learnings, creates/refines next task, then STOPS for user review before continuing
---
<skill_overview>
Execute bd tasks one at a time with mandatory checkpoints: Load epic → Execute task → Review learnings → Create next task → Run SRE refinement → STOP. User clears context, reviews implementation, then runs command again to continue. Epic requirements are immutable, tasks adapt to reality.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow exact process: load epic, execute ONE task, review, create next task with SRE refinement, STOP.
Epic requirements are immutable. Tasks adapt to discoveries. Do not skip checkpoints, SRE refinement, or verification. STOP after each task for user review.
</rigidity_level>
<quick_reference>
| Step | Command | Purpose |
|------|---------|---------|
| **Load Epic** | `bd show bd-1` | Read immutable requirements once at start |
| **Find Task** | `bd ready` | Get next ready task to execute |
| **Start Task** | `bd update bd-2 --status in_progress` | Mark task active |
| **Track Substeps** | TodoWrite for each implementation step | Prevent incomplete execution |
| **Close Task** | `bd close bd-2` | Mark task complete after verification |
| **Review** | Re-read epic, check learnings | Adapt next task to reality |
| **Create Next** | `bd create "Task N"` | Based on learnings, not assumptions |
| **Refine** | Use `sre-task-refinement` skill | Corner-case analysis with Opus 4.1 |
| **STOP** | Present summary to user | User reviews, clears context, runs command again |
| **Final Check** | Use `review-implementation` skill | Verify all success criteria before closing epic |
**Critical:** Epic = contract (immutable). Tasks = discovery (adapt to reality). STOP after each task for user review.
</quick_reference>
<when_to_use>
**Use after hyperpowers:writing-plans creates epic and first task.**
Symptoms you need this:
- bd epic exists with tasks ready to execute
- Need to implement features iteratively
- Requirements clear, but implementation path will adapt
- Want continuous learning between tasks
</when_to_use>
<the_process>
## 0. Resumption Check (Every Invocation)
This skill supports explicit resumption. When invoked:
```bash
bd list --type epic --status open # Find active epic
bd ready # Check for ready tasks
bd list --status in_progress # Check for in-progress tasks
```
**Fresh start:** No in-progress tasks, proceed to Step 1.
**Resuming:** Found ready or in-progress tasks:
- In-progress task exists → Resume at Step 2 (continue executing)
- Ready task exists → Resume at Step 2 (start executing)
- All tasks closed but epic open → Resume at Step 4 (check criteria)
**Why resumption matters:**
- User cleared context between tasks (intended workflow)
- Context limit reached mid-task
- Previous session ended unexpectedly
**Do not ask "where did we leave off?"** - bd state tells you exactly where to resume.
## 1. Load Epic Context (Once at Start)
Before executing ANY task, load the epic into context:
```bash
bd list --type epic --status open # Find epic
bd show bd-1 # Load epic details
```
**Extract and keep in mind:**
- Requirements (IMMUTABLE)
- Success criteria (validation checklist)
- Anti-patterns (FORBIDDEN shortcuts)
- Approach (high-level strategy)
**Why:** Requirements prevent watering down when blocked.
## 2. Execute Current Ready Task
```bash
bd ready # Find next task
bd update bd-2 --status in_progress # Start it
bd show bd-2 # Read details
```
**CRITICAL - Create TodoWrite for ALL substeps:**
Tasks contain 4-8 implementation steps. Create TodoWrite todos for each to prevent incomplete execution:
```
- bd-2 Step 1: Write test (pending)
- bd-2 Step 2: Run test RED (pending)
- bd-2 Step 3: Implement function (pending)
- bd-2 Step 4: Run test GREEN (pending)
- bd-2 Step 5: Refactor (pending)
- bd-2 Step 6: Commit (pending)
```
**Execute steps:**
- Use `test-driven-development` when implementing features
- Mark each substep completed immediately after finishing
- Use `test-runner` agent for verifications
**Pre-close verification:**
- Check TodoWrite: All substeps completed?
- If incomplete: Continue with remaining substeps
- If complete: Close task and commit
```bash
bd close bd-2 # After ALL substeps done
```
## 3. Review Against Epic and Create Next Task
**CRITICAL:** After each task, adapt plan based on reality.
**Review questions:**
1. What did we learn?
2. Discovered any blockers, existing functionality, limitations?
3. Does this move us toward epic success criteria?
4. What's next logical step?
5. Any epic anti-patterns to avoid?
**Re-read epic:**
```bash
bd show bd-1 # Keep requirements fresh
```
**Three cases:**
**A) Next task still valid** → Proceed to Step 2
**B) Next task now redundant** (plan invalidation allowed):
```bash
bd delete bd-4 # Remove wasteful task
# Or update: bd update bd-4 --title "New work" --design "..."
```
**C) Need new task** based on learnings:
```bash
bd create "Task N: [Next Step Based on Reality]" \
--type feature \
--design "## Goal
[Deliverable based on what we learned]
## Context
Completed bd-2: [discoveries]
## Implementation
[Steps reflecting current state, not assumptions]
## Success Criteria
- [ ] Specific outcomes
- [ ] Tests passing"
bd dep add bd-N bd-1 --type parent-child
bd dep add bd-N bd-2 --type blocks
```
**REQUIRED - Run SRE refinement on new task:**
```
Use Skill tool: hyperpowers:sre-task-refinement
```
SRE refinement will:
- Apply 7-category corner-case analysis (Opus 4.1)
- Identify edge cases and failure modes
- Strengthen success criteria
- Ensure task is ready for implementation
**Do NOT skip SRE refinement.** New tasks need the same rigor as initial planning.
## 4. Check Epic Success Criteria and STOP
```bash
bd show bd-1 # Check success criteria
```
- ALL criteria met? → Step 5 (final validation)
- Some missing? → **STOP for user review**
## 4a. STOP Checkpoint (Mandatory)
**Present summary to user:**
```markdown
## Task bd-N Complete - Checkpoint
### What Was Done
- [Summary of implementation]
- [Key learnings/discoveries]
### Next Task Ready
- bd-M: [Title]
- [Brief description of what's next]
### Epic Progress
- [X/Y success criteria met]
- [Remaining criteria]
### To Continue
Run `/hyperpowers:execute-plan` to execute the next task.
```
**Why STOP is mandatory:**
- User can clear context (prevents context exhaustion)
- User can review implementation before next task
- User can adjust next task if needed
- Prevents runaway execution without oversight
**Do NOT rationalize skipping the stop:**
- "Good context loaded" → Context reloads are cheap, wrong decisions aren't
- "Momentum" → Checkpoints ensure quality over speed
- "User didn't ask to stop" → Stopping is the default, continuing requires explicit command
## 5. Final Validation and Closure
When all success criteria appear met:
1. **Run full verification** (tests, hooks, manual checks)
2. **REQUIRED - Use review-implementation skill:**
```
Use Skill tool: hyperpowers:review-implementation
```
Review-implementation will:
- Check each requirement met
- Verify each success criterion satisfied
- Confirm no anti-patterns used
- If approved: Calls `finishing-a-development-branch`
- If gaps: Create tasks, return to Step 2
3. **Only close epic after review approves**
</the_process>
<examples>
<example>
<scenario>Developer closes task without completing all substeps, claims "mostly done"</scenario>
<code>
bd-2 has 6 implementation steps.
TodoWrite shows:
- ✅ bd-2 Step 1: Write test
- ✅ bd-2 Step 2: Run test RED
- ✅ bd-2 Step 3: Implement function
- ⏸️ bd-2 Step 4: Run test GREEN (pending)
- ⏸️ bd-2 Step 5: Refactor (pending)
- ⏸️ bd-2 Step 6: Commit (pending)
Developer thinks: "Function works, I'll close bd-2 and move on"
Runs: bd close bd-2
</code>
<why_it_fails>
Steps 4-6 skipped:
- Tests not verified GREEN (might have broken other tests)
- Code not refactored (leaves technical debt)
- Changes not committed (work could be lost)
"Mostly done" = incomplete task = will cause issues later.
</why_it_fails>
<correction>
**Pre-close verification checkpoint:**
Before closing ANY task:
1. Check TodoWrite: All substeps completed?
2. If incomplete: Continue with remaining substeps
3. Only when ALL ✅: bd close bd-2
**Result:** Task actually complete, tests passing, code committed.
</correction>
</example>
<example>
<scenario>Developer discovers planned task is redundant, executes it anyway "because it's in the plan"</scenario>
<code>
bd-4 says: "Implement token refresh middleware"
While executing bd-2, developer discovers:
- Token refresh middleware already exists in auth/middleware/refresh.ts
- Works correctly, has tests
- bd-4 would duplicate existing code
Developer thinks: "bd-4 is in the plan, I should do it anyway"
Proceeds to implement duplicate middleware
</code>
<why_it_fails>
**Wasteful execution:**
- Duplicates existing functionality
- Creates maintenance burden (two implementations to keep in sync)
- Violates DRY principle
- Wastes time on redundant work
**Why it happens:** Treating tasks as immutable instead of epic.
</why_it_fails>
<correction>
**Plan invalidation is allowed:**
1. Verify the discovery:
```bash
# Check existing code
cat auth/middleware/refresh.ts
# Confirm it works
npm test -- refresh.spec.ts
```
2. Delete redundant task:
```bash
bd delete bd-4
```
3. Document why:
```
bd update bd-2 --design "...
Discovery: Token refresh middleware already exists (auth/middleware/refresh.ts).
Verified working with tests. bd-4 deleted as redundant."
```
4. Create new task if needed (maybe "Integrate existing refresh middleware" instead)
**Result:** Plan adapts to reality. No wasted work.
</correction>
</example>
<example>
<scenario>Developer hits blocker, waters down epic requirement to "make it easier"</scenario>
<code>
Epic bd-1 anti-patterns say:
"FORBIDDEN: Using mocks for database integration tests. Must use real test database."
Developer encounters:
- Real database setup is complex
- Mocking would make tests pass quickly
Developer thinks: "This is too hard, I'll use mocks just for now and refactor later"
Adds TODO: // TODO: Replace mocks with real DB later
</code>
<why_it_fails>
**Violates epic anti-pattern:**
- Epic explicitly forbids mocks for integration tests
- "Later" never happens (TODO remains forever)
- Tests don't verify actual integration
- Defeats purpose of integration testing
**Why it happens:** Rationalizing around blockers instead of solving them.
</why_it_fails>
<correction>
**When blocked, re-read epic:**
1. Re-read epic requirements and anti-patterns:
```bash
bd show bd-1
```
2. Check if solution violates anti-pattern:
- Using mocks? YES, explicitly forbidden
3. Don't rationalize. Instead:
**Option A - Research:**
```bash
bd create "Research: Real DB test setup for [project]" \
--design "Find how this project sets up test databases.
Check existing test files for patterns.
Document setup process that meets anti-pattern requirements."
```
**Option B - Ask user:**
"Blocker: Test DB setup complex. Epic forbids mocks for integration.
Is there existing test DB infrastructure I should use?"
**Result:** Epic requirements maintained. Blocker solved properly.
</correction>
</example>
<example>
<scenario>Developer skips STOP checkpoint to "maintain momentum"</scenario>
<code>
Just completed bd-2 (authentication middleware).
Created bd-3 (rate limiting endpoint).
Ran SRE refinement on bd-3.
Developer thinks: "Good context loaded, I'll just do bd-3 quickly then stop.
User approved the epic, they trust me to execute it.
Stopping now is inefficient."
Continues directly to execute bd-3 without STOP checkpoint.
</code>
<why_it_fails>
**Multiple failures:**
- User can't review bd-2 implementation before bd-3 starts
- User can't clear context (may hit context limit mid-task)
- User can't adjust bd-3 based on bd-2 learnings
- No checkpoint = no oversight
**The rationalization trap:**
- "Good context" sounds efficient but prevents review
- "User trust" misinterprets approval (one command ≠ blanket permission)
- "Quick task" becomes long task when issues arise
**What actually happens:**
- bd-3 hits unexpected issue
- Context exhausted trying to debug
- User returns to find 2 half-finished tasks instead of 1 complete task
</why_it_fails>
<correction>
**Follow the STOP checkpoint:**
1. After completing bd-2 and refining bd-3:
```markdown
## Task bd-2 Complete - Checkpoint
### What Was Done
- Implemented JWT middleware with validation
- Added token refresh handling
### Next Task Ready
- bd-3: Implement rate limiting
- Adds rate limiting to auth endpoints
### Epic Progress
- 2/4 success criteria met
- Remaining: password reset, rate limiting
### To Continue
Run `/hyperpowers:execute-plan` to execute the next task.
```
2. **STOP and wait for user**
**Result:** User can review, clear context, adjust next task. Each task completes with full oversight.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **STOP after each task** → Present summary, wait for user to run command again
- User needs checkpoint to review implementation
- User may need to clear context
- Continuous execution = no oversight
2. **SRE refinement for new tasks** → Never skip corner-case analysis
- New tasks created during execution need same rigor as initial planning
- Use Opus 4.1 for thorough analysis
- Tasks without refinement will miss edge cases
3. **Epic requirements are immutable** → Never water down when blocked
- If blocked: Research solution or ask user
- Never violate anti-patterns to "make it easier"
4. **All substeps must be completed** → Never close task with pending substeps
- Check TodoWrite before closing
- "Mostly done" = incomplete = will cause issues
5. **Plan invalidation is allowed** → Delete redundant tasks
- If discovered existing functionality: Delete duplicate task
- If discovered blocker: Update or delete invalid task
- Document what you found and why
6. **Review before closing epic** → Use review-implementation skill
- Tasks done ≠ success criteria met
- All criteria must be verified before closing
## Common Excuses
All of these mean: Re-read epic, STOP as required, ask for help:
- "Good context loaded, don't want to lose it" → STOP anyway, context reloads
- "Just one more quick task" → STOP anyway, user needs checkpoint
- "User didn't ask me to stop" → Stopping is default, continuing requires explicit command
- "SRE refinement is overkill for this task" → Every task needs refinement, no exceptions
- "This requirement is too hard" → Research or ask, don't water down
- "I'll come back to this later" → Complete now or document why blocked
- "Let me fake this to make tests pass" → Never, defeats purpose
- "Existing task is wasteful, but it's planned" → Delete it, plan adapts to reality
- "All tasks done, epic must be complete" → Verify with review-implementation
</critical_rules>
<verification_checklist>
Before closing each task:
- [ ] ALL TodoWrite substeps completed (no pending)
- [ ] Tests passing (use test-runner agent)
- [ ] Changes committed
- [ ] Task actually done (not "mostly")
After closing each task:
- [ ] Reviewed learnings against epic
- [ ] Created/updated next task based on reality
- [ ] Ran SRE refinement on any new tasks
- [ ] Presented STOP checkpoint summary to user
- [ ] STOPPED execution (do not continue to next task)
Before closing epic:
- [ ] ALL success criteria met (check epic)
- [ ] review-implementation skill used and approved
- [ ] No anti-patterns violated
- [ ] All tasks closed
</verification_checklist>
<integration>
**This skill calls:**
- writing-plans (creates epic and first task before this runs)
- sre-task-refinement (REQUIRED for new tasks created during execution)
- test-driven-development (when implementing features)
- test-runner (for running tests without output pollution)
- review-implementation (final validation before closing epic)
- finishing-a-development-branch (after review approves)
**This skill is called by:**
- User (via /hyperpowers:execute-plan command)
- After writing-plans creates epic
- Explicitly to resume after checkpoint (user runs command again)
**Agents used:**
- hyperpowers:test-runner (run tests, return summary only)
**Workflow pattern:**
```
/hyperpowers:execute-plan → Execute task → STOP
[User clears context, reviews]
/hyperpowers:execute-plan → Execute next task → STOP
[Repeat until epic complete]
```
</integration>
<resources>
**bd command reference:**
- See [bd commands](../common-patterns/bd-commands.md) for complete command list
**When stuck:**
- Hit blocker → Re-read epic, check anti-patterns, research or ask
- Don't understand instruction → Stop and ask (never guess)
- Verification fails repeatedly → Check epic anti-patterns, ask for help
- Tempted to skip steps → Check TodoWrite, complete all substeps
</resources>


@@ -0,0 +1,482 @@
---
name: finishing-a-development-branch
description: Use when implementation complete and tests pass - closes bd epic, presents integration options (merge/PR/keep/discard), executes choice
---
<skill_overview>
Close bd epic, verify tests pass, present 4 integration options, execute choice, cleanup worktree appropriately.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the 6-step process exactly. Present exactly 4 options. Never skip test verification. Must confirm before discarding.
</rigidity_level>
<quick_reference>
| Step | Action | If Blocked |
|------|--------|------------|
| 1 | Close bd epic | Tasks still open → STOP |
| 2 | Verify tests pass (test-runner agent) | Tests fail → STOP |
| 3 | Determine base branch | Ask if needed |
| 4 | Present exactly 4 options | Wait for choice |
| 5 | Execute choice | Follow option workflow |
| 6 | Cleanup worktree (options 1 and 4 only) | Options 2 and 3 keep worktree |
**Options:** 1=Merge locally, 2=PR, 3=Keep as-is, 4=Discard (confirm)
</quick_reference>
<when_to_use>
- Implementation complete and reviewed
- All bd tasks for epic are done
- Ready to integrate work back to main branch
- Called by hyperpowers:review-implementation (final step)
**Don't use for:**
- Work still in progress
- Tests failing
- Epic has open tasks
- Mid-implementation (use hyperpowers:executing-plans)
</when_to_use>
<the_process>
## Step 1: Close bd Epic
**Announce:** "I'm using hyperpowers:finishing-a-development-branch to complete this work."
**Verify all tasks closed:**
```bash
bd dep tree bd-1 # Show task tree
bd list --status open --parent bd-1 # Check for open tasks
```
**If any tasks still open:**
```
Cannot close epic bd-1: N tasks still open:
- bd-3: Task Name (status: in_progress)
- bd-5: Task Name (status: open)
Complete all tasks before finishing.
```
**STOP. Do not proceed.**
**If all tasks closed:**
```bash
bd close bd-1
```
---
## Step 2: Verify Tests
**IMPORTANT:** Use hyperpowers:test-runner agent to avoid context pollution.
Dispatch hyperpowers:test-runner agent:
```
Run: cargo test
(or: npm test / pytest / go test ./...)
```
Agent returns summary + failures only.
**If tests fail:**
```
Tests failing (N failures). Must fix before completing:
[Show failures]
Cannot proceed until tests pass.
```
**STOP. Do not proceed.**
**If tests pass:** Continue to Step 3.
---
## Step 3: Determine Base Branch
```bash
git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null
```
Or ask: "This branch split from main - is that correct?"
---
## Step 4: Present Options
Present exactly these 4 options:
```
Implementation complete. What would you like to do?
1. Merge back to <base-branch> locally
2. Push and create a Pull Request
3. Keep the branch as-is (I'll handle it later)
4. Discard this work
Which option?
```
**Don't add explanation.** Keep concise.
---
## Step 5: Execute Choice
### Option 1: Merge Locally
```bash
git checkout <base-branch>
git pull
git merge <feature-branch>
# Verify tests on merged result
Dispatch hyperpowers:test-runner: "Run: <test command>"
# If tests pass
git branch -d <feature-branch>
```
Then: Step 6 (cleanup worktree)
---
### Option 2: Push and Create PR
**Get epic info:**
```bash
bd show bd-1
bd dep tree bd-1
```
**Create PR:**
```bash
git push -u origin <feature-branch>
gh pr create --title "feat: <epic-name>" --body "$(cat <<'EOF'
## Epic
Closes bd-<N>: <Epic Title>
## Summary
<2-3 bullets from epic implementation>
## Tasks Completed
- bd-2: <Task Name>
- bd-3: <Task Name>
## Test Plan
- [ ] All tests passing
- [ ] <verification steps from epic>
EOF
)"
```
Report the PR URL and **keep the worktree** for PR feedback (skip Step 6).
---
### Option 3: Keep As-Is
Report: "Keeping branch <name>. Worktree preserved at <path>."
**Don't cleanup worktree.**
---
### Option 4: Discard
**Confirm first:**
```
This will permanently delete:
- Branch <name>
- All commits: <commit-list>
- Worktree at <path>
Type 'discard' to confirm.
```
Wait for exact "discard" confirmation.
**If confirmed:**
```bash
git checkout <base-branch>
git branch -D <feature-branch>
```
Then: Step 6 (cleanup worktree)
---
## Step 6: Cleanup Worktree
**For Options 1 and 4 only:**
```bash
# Check if in worktree
git worktree list | grep "$(git branch --show-current)"
# If yes
git worktree remove <worktree-path>
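# Safer variant (sketch): skip removal if the worktree has uncommitted changes
# (<worktree-path> is a placeholder, as above)
if git -C <worktree-path> status --porcelain | grep -q .; then
  echo "Worktree has uncommitted changes - confirm with the user before removing"
else
  git worktree remove <worktree-path>
fi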
```
**For Options 2 and 3:** Keep worktree (don't clean up).
</the_process>
<examples>
<example>
<scenario>Developer skips test verification before presenting options</scenario>
<code>
# Step 1: Epic closed ✓
bd close bd-1
# Step 2: SKIPPED test verification
# Jump directly to presenting options
"Implementation complete. What would you like to do?
1. Merge back to main locally
2. Push and create PR
..."
User selects Option 1
git checkout main
git merge feature-branch
# Tests fail! Broken code now on main
</code>
<why_it_fails>
- Skipped mandatory test verification
- Merged broken code to main branch
- Other developers pull broken main
- CI/CD fails, blocks deployment
- Must revert, fix, merge again (wasted time)
</why_it_fails>
<correction>
**Follow Step 2 strictly:**
```bash
# After closing epic
bd close bd-1 ✓
# MANDATORY: Verify tests BEFORE presenting options
Dispatch hyperpowers:test-runner agent: "Run: cargo test"
# Agent reports
"Test suite passed (127 tests, 0 failures, 2.3s)"
# NOW present options
"Implementation complete. What would you like to do?
1. Merge back to main locally
..."
```
**What you gain:**
- Confidence tests pass before integration
- No broken code merged to main
- CI/CD stays green
- Other developers unblocked
- Professional workflow
</correction>
</example>
<example>
<scenario>Developer auto-cleans worktree for PR option</scenario>
<code>
# User selects Option 2: Create PR
git push -u origin feature-auth
gh pr create --title "feat: Add OAuth" --body "..."
# Developer immediately cleans up worktree
git worktree remove ../feature-auth-worktree
# PR gets feedback: "Please add rate limiting"
# User: "Can you address the PR feedback?"
# Worktree is gone! Have to recreate it
git worktree add ../feature-auth-worktree feature-auth
# Lost local state, uncommitted experiments, etc.
</code>
<why_it_fails>
- Cleaned worktree when PR still active
- User likely needs worktree for PR feedback
- Have to recreate worktree for changes
- Lost any local uncommitted work
- Inefficient workflow
</why_it_fails>
<correction>
**Option 2 workflow (correct):**
```bash
git push -u origin feature-auth
gh pr create --title "feat: Add OAuth" --body "..."
# Report PR created
"Pull request created: https://github.com/user/repo/pull/42
Keeping worktree at ../feature-auth-worktree for PR updates."
# NO worktree cleanup
# User can address PR feedback in same worktree
```
**Cleanup happens later when:**
- PR is merged
- User explicitly requests cleanup
- User uses finishing-a-development-branch again after PR merges
**What you gain:**
- Worktree available for PR feedback
- No need to recreate worktree
- Preserve local state and experiments
- Efficient PR iteration workflow
</correction>
</example>
<example>
<scenario>Developer discards work without confirmation</scenario>
<code>
# User says: "Actually, discard this work"
# Developer immediately executes
git checkout main
git branch -D feature-experimental
git worktree remove ../feature-experimental-worktree
# 50 commits deleted
# User: "Wait, I meant discard the LAST commit, not the whole branch!"
# Too late - work is gone
</code>
<why_it_fails>
- No confirmation before destructive action
- Misunderstood user intent
- Permanent data loss
- No way to recover work
- Catastrophic for user
</why_it_fails>
<correction>
**Option 4 workflow (correct):**
```
User: "Discard this work"
"This will permanently delete:
- Branch feature-experimental
- All commits:
* a1b2c3d Add OAuth integration
* d4e5f6g Add rate limiting
* g7h8i9j Update tests
... (47 more commits)
- Worktree at ../feature-experimental-worktree
Type 'discard' to confirm."
# WAIT for exact confirmation
User types: "discard"
# NOW execute
git checkout main
git branch -D feature-experimental
git worktree remove ../feature-experimental-worktree
"Branch feature-experimental deleted."
```
**What you gain:**
- User sees exactly what will be deleted
- Explicit confirmation required
- Prevents accidental data loss
- Time to reconsider or clarify
- Safe destructive operations
</correction>
</example>
</examples>
<option_matrix>
| Option | Merge | Push | Keep Worktree | Cleanup Branch | Cleanup Worktree |
|--------|-------|------|---------------|----------------|------------------|
| 1. Merge locally | ✓ | - | - | ✓ | ✓ |
| 2. Create PR | - | ✓ | ✓ | - | - |
| 3. Keep as-is | - | - | ✓ | - | - |
| 4. Discard | - | - | - | ✓ (force) | ✓ |
</option_matrix>
<critical_rules>
## Rules That Have No Exceptions
1. **Never skip test verification** → Tests must pass before presenting options
2. **Present exactly 4 options** → No open-ended questions
3. **Require confirmation for Option 4** → Type "discard" exactly
4. **Keep worktree for Options 2 & 3** → PR and keep-as-is need worktree
5. **Verify tests after merge (Option 1)** → Merged result might break
## Common Excuses
All of these mean: **STOP. Follow the process.**
- "Tests passed earlier, don't need to verify" (Might have changed, verify now)
- "User knows what they want" (Present options, let them choose)
- "Obvious they want to discard" (Require explicit confirmation)
- "PR done, cleanup worktree" (PR likely needs updates, keep worktree)
- "Too many options" (Exactly 4, no more, no less)
</critical_rules>
<verification_checklist>
Before completing:
- [ ] bd epic closed (all child tasks closed)
- [ ] Tests verified passing (via test-runner agent)
- [ ] Presented exactly 4 options (no open-ended questions)
- [ ] Waited for user choice (didn't assume)
- [ ] If Option 4: Got typed "discard" confirmation
- [ ] Worktree cleaned for Options 1, 4 only (not 2, 3)
- [ ] If Option 1: Verified tests on merged result
**Can't check all boxes?** Return to process and complete missing steps.
</verification_checklist>
<integration>
**This skill is called by:**
- hyperpowers:review-implementation (final step after approval)
**Call chain:**
```
hyperpowers:executing-plans → hyperpowers:review-implementation → hyperpowers:finishing-a-development-branch
(if gaps found: STOP)
```
**This skill calls:**
- hyperpowers:test-runner agent (for test verification)
- bd commands (epic management)
- gh commands (PR creation)
**CRITICAL:** Never read `.beads/issues.jsonl` directly. Always use bd CLI commands.
</integration>
<resources>
**Detailed guides:**
- [Git worktree management](resources/worktree-guide.md)
- [PR description templates](resources/pr-templates.md)
- [bd epic reference in PRs](resources/bd-pr-integration.md)
**When stuck:**
- Tasks won't close → Check bd status, verify all child tasks done
- Tests fail → Fix before presenting options (can't proceed)
- User unsure → Explain options, but don't make choice for them
- Worktree won't remove → Might have uncommitted changes, ask user
</resources>

skills/fixing-bugs/SKILL.md Normal file

@@ -0,0 +1,528 @@
---
name: fixing-bugs
description: Use when encountering a bug - complete workflow from discovery through debugging, bd issue, test-driven fix, verification, and closure
---
<skill_overview>
Bug fixing is a complete workflow: reproduce, track in bd, debug systematically, write test, fix, verify, close. Every bug gets a bd issue and regression test.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow exact workflow: create bd issue → debug with tools → write failing test → fix → verify → close.
Never skip tracking or regression test. Use debugging-with-tools for investigation, test-driven-development for fix.
</rigidity_level>
<quick_reference>
| Step | Action | Command/Skill |
|------|--------|---------------|
| **1. Track** | Create bd bug issue | `bd create "Bug: [description]" --type bug` |
| **2. Debug** | Systematic investigation | Use `debugging-with-tools` skill |
| **3. Test (RED)** | Write failing test reproducing bug | Use `test-driven-development` skill |
| **4. Fix (GREEN)** | Implement fix | Minimal code to pass test |
| **5. Verify** | Run full test suite | Use `verification-before-completion` skill |
| **6. Close** | Update and close bd issue | `bd close bd-123` |
**FORBIDDEN:** Fix without bd issue, fix without regression test
**REQUIRED:** Every bug gets tracked, tested, verified before closing
</quick_reference>
<when_to_use>
**Use when you discover a bug:**
- Test failure you need to fix
- Bug reported by user
- Unexpected behavior in development
- Regression from recent change
- Production issue (non-emergency)
**Production emergencies:** Abbreviated workflow OK (hotfix first), but still create bd issue and add regression tests afterward.
</when_to_use>
<the_process>
## 1. Create bd Bug Issue
**Track from the start:**
```bash
bd create "Bug: [Clear description]" --type bug --priority P1
# Returns: bd-123
```
**Document:**
```bash
bd edit bd-123 --design "
## Bug Description
[What's wrong]
## Reproduction Steps
1. Step one
2. Step two
## Expected Behavior
[What should happen]
## Actual Behavior
[What actually happens]
## Environment
[Version, OS, etc.]"
```
## 2. Debug Systematically
**REQUIRED: Use debugging-with-tools skill**
```
Use Skill tool: hyperpowers:debugging-with-tools
```
**debugging-with-tools will:**
- Use internet-researcher to search for error
- Recommend debugger or instrumentation
- Use codebase-investigator to understand context
- Guide to root cause (not symptom)
**Update bd issue with findings:**
```bash
bd edit bd-123 --design "[previous content]
## Investigation
[Root cause found via debugging]
[Tools used: debugger, internet search, etc.]"
```
## 3. Write Failing Test (RED Phase)
**REQUIRED: Use test-driven-development skill**
Write test that reproduces the bug:
```python
def test_rejects_empty_email():
"""Regression test for bd-123: Empty email accepted"""
with pytest.raises(ValidationError):
create_user(email="") # Should fail, currently passes
```
**Run test, verify it FAILS:**
```bash
pytest tests/test_user.py::test_rejects_empty_email
# Expected: FAIL (bug exists - ValidationError is never raised)
# Should PASS after the fix
```
**Why critical:** If test passes before fix, it doesn't test the bug.
## 4. Implement Fix (GREEN Phase)
**Fix the root cause (not symptom):**
```python
def create_user(email: str):
if not email or not email.strip(): # Fix
raise ValidationError("Email required")
# ... rest
```
**Run the test again, verify it now PASSES:**
```bash
pytest tests/test_user.py::test_rejects_empty_email
# FAILED before the fix (no validation, so ValidationError was never raised)
# PASSES after the fix (validation added)
```
## 5. Verify Complete Fix
**REQUIRED: Use verification-before-completion skill**
```bash
# Run full test suite (via test-runner agent)
"Run: pytest"
# Agent returns: All tests pass (including regression test)
```
**Verify:**
- Regression test passes
- All other tests still pass
- No new warnings or errors
- Pre-commit hooks pass
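To run the same checks locally, a minimal sketch assuming a Python project with pre-commit configured (substitute your project's own commands):
```bash
pytest -q                    # full suite, including the new regression test
pytest -q -W error           # optionally treat new warnings as failures
pre-commit run --all-files   # hooks must pass before committing
```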
## 6. Close bd Issue
**Update with fix details:**
```bash
bd edit bd-123 --design "[previous content]
## Fix Implemented
[Description of fix]
[File changed: src/auth/user.py:23]
## Regression Test
[Test added: tests/test_user.py::test_rejects_empty_email]
## Verification
[All tests pass: 145/145]"
bd close bd-123
```
**Commit with bd reference:**
```bash
git commit -m "fix(bd-123): Reject empty email in user creation
Adds validation to prevent empty strings.
Regression test: test_rejects_empty_email
Closes bd-123"
```
</the_process>
<examples>
<example>
<scenario>Developer fixes bug without creating bd issue or regression test</scenario>
<code>
Developer notices: Empty email accepted in user creation
"Fixes" immediately:
```python
def create_user(email: str):
if not email: # Quick fix
raise ValidationError("Email required")
```
Commits: "fix: validate email"
[No bd issue, no regression test]
</code>
<why_it_fails>
**No tracking:**
- Work not tracked in bd (can't see what was fixed)
- No link between commit and bug
- Can't verify fix meets requirements
**No regression test:**
- Bug could come back in future
- Can't prove fix works
- No protection against breaking this again
**Incomplete fix:**
- Doesn't handle `email=" "` (whitespace)
- Didn't debug to understand full issue
**Result:** Bug returns when someone changes validation logic.
</why_it_fails>
<correction>
**Complete workflow:**
```bash
# 1. Track
bd create "Bug: Empty email accepted" --type bug
# Returns: bd-123
# 2. Debug (use debugging-with-tools)
# Investigation reveals: Email validation missing entirely
# Also: Whitespace emails like " " also accepted
# 3. Write failing test (RED)
def test_rejects_empty_email():
with pytest.raises(ValidationError):
create_user(email="")
def test_rejects_whitespace_email():
with pytest.raises(ValidationError):
create_user(email=" ")
# Run: Both FAIL (bug exists - no validation, so ValidationError is never raised)
```
```python
# Both regression tests will PASS once the fix below is in place
# 4. Fix
def create_user(email: str):
if not email or not email.strip():
raise ValidationError("Email required")
# 5. Verify
pytest # All tests pass now, including regression tests
# 6. Close
bd close bd-123
git commit -m "fix(bd-123): Reject empty/whitespace email"
```
**Result:** Bug fixed, tracked, tested, won't regress.
</correction>
</example>
<example>
<scenario>Developer writes test after fix, test passes immediately, doesn't catch regression</scenario>
<code>
Developer fixes validation bug, then writes test:
```python
# Fix first
def validate_email(email):
return "@" in email and len(email) > 0
# Then test
def test_validate_email():
assert validate_email("user@example.com") == True
```
Test runs: PASS
Commits both together.
Later, someone changes validation:
```python
def validate_email(email):
return True # Breaks validation!
```
Test still PASSES (only checks happy path).
</code>
<why_it_fails>
**Test written after fix:**
- Never saw test fail
- Only tests happy path remembered
- Doesn't test the bug that was fixed
- Missed edge case: `validate_email("@@")` returns True (bug!)
**Why it happens:** Skipping TDD RED phase.
</why_it_fails>
<correction>
**TDD approach (RED-GREEN):**
```python
# 1. Write test FIRST that reproduces the bug
def test_validate_email():
# Happy path
assert validate_email("user@example.com") == True
# Bug case (empty email was accepted)
assert validate_email("") == False
# Edge case discovered during debugging
assert validate_email("@@") == False
# 2. Run test - should FAIL (bug exists)
pytest test_validate_email
# FAIL: validate_email("") returned True, expected False
# 3. Implement fix
def validate_email(email):
if not email or len(email) == 0:
return False
return "@" in email and email.count("@") == 1
# 4. Run test - should PASS
pytest test_validate_email
# PASS: All assertions pass
```
**Later regression:**
```python
def validate_email(email):
return True # Someone breaks it
```
**Test catches it:**
```
FAIL: assert validate_email("") == False
Expected False, got True
```
**Result:** Regression test actually prevents bug from returning.
</correction>
</example>
<example>
<scenario>Developer fixes symptom without using debugging-with-tools to find root cause</scenario>
<code>
Bug report: "Application crashes when processing user data"
Error:
```
NullPointerException at UserService.java:45
```
Developer sees line 45:
```java
String email = user.getEmail().toLowerCase(); // Line 45
```
"Obvious fix":
```java
String email = user.getEmail() != null ? user.getEmail().toLowerCase() : "";
```
Bug "fixed"... but crashes continue with different data.
</code>
<why_it_fails>
**Symptom fix:**
- Fixed null check at crash point
- Didn't investigate WHY email is null
- Didn't use debugging-with-tools to find root cause
**Actual root cause:** User object created without email in registration flow. Email is null for all users created via broken endpoint.
**Result:** Null-check applied everywhere, root cause (broken registration) unfixed.
</why_it_fails>
<correction>
**Use debugging-with-tools skill:**
```
# Dispatch internet-researcher
"Search for: NullPointerException UserService getEmail
- Common causes of null email in user objects
- User registration validation patterns"
# Dispatch codebase-investigator
"Investigate:
- How is User object created?
- Where is email set?
- Are there paths where email can be null?
- Which endpoints create users?"
# Agent reports:
"Found: POST /register endpoint creates User without validating email field.
Email is optional in UserDTO but required in User domain object."
```
**Root cause found:** Registration doesn't validate email.
**Proper fix:**
```java
// In registration endpoint
@PostMapping("/register")
public User register(@RequestBody UserDTO dto) {
if (dto.getEmail() == null || dto.getEmail().isEmpty()) {
throw new ValidationException("Email required");
}
return userService.create(dto);
}
```
**Regression test:**
```java
@Test
void registrationRequiresEmail() {
assertThrows(ValidationException.class, () ->
register(new UserDTO(null, "password")));
}
```
**Result:** Root cause fixed, no more null emails created.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Every bug gets a bd issue** → Track from discovery to closure
- Create bd issue before fixing
- Document reproduction steps
- Update with investigation findings
- Close only after verified
2. **Use debugging-with-tools skill** → Systematic investigation required
- Never guess at fixes
- Use internet-researcher for errors
- Use debugger/instrumentation for state
- Find root cause, not symptom
3. **Write failing test first (RED)** → Regression prevention
- Test must fail before fix
- Test must reproduce the bug
- Test must pass after fix
- If test passes immediately, it doesn't test the bug
4. **Verify complete fix** → Use verification-before-completion
- Regression test passes
- Full test suite passes
- No new warnings
- Pre-commit hooks pass
## Common Excuses
All of these mean: Stop, follow complete workflow:
- "Quick fix, no need for bd issue"
- "Obvious bug, no need to debug"
- "I'll add test later"
- "Test passes, must be fixed"
- "Just one line change"
</critical_rules>
<verification_checklist>
Before claiming bug fixed:
- [ ] bd issue created with reproduction steps
- [ ] Used debugging-with-tools to find root cause
- [ ] Wrote test that reproduces bug (RED phase)
- [ ] Verified test FAILS before fix
- [ ] Implemented fix addressing root cause
- [ ] Verified test PASSES after fix
- [ ] Ran full test suite (all pass)
- [ ] Updated bd issue with fix details
- [ ] Closed bd issue
- [ ] Committed with bd reference
</verification_checklist>
<integration>
**This skill calls:**
- debugging-with-tools (systematic investigation)
- test-driven-development (RED-GREEN-REFACTOR cycle)
- verification-before-completion (verify complete fix)
**This skill is called by:**
- When bugs discovered during development
- When test failures need fixing
- When user reports bugs
**Agents used:**
- hyperpowers:internet-researcher (via debugging-with-tools)
- hyperpowers:codebase-investigator (via debugging-with-tools)
- hyperpowers:test-runner (run tests without output pollution)
</integration>
<resources>
**When stuck:**
- Don't understand bug → Use debugging-with-tools skill
- Tempted to skip tracking → Create bd issue first, always
- Test passes immediately → Not testing the bug, rewrite test
- Fix doesn't work → Return to debugging-with-tools, find actual root cause
</resources>


@@ -0,0 +1,707 @@
---
name: managing-bd-tasks
description: Use for advanced bd operations - splitting tasks mid-flight, merging duplicates, changing dependencies, archiving epics, querying metrics, cross-epic dependencies
---
<skill_overview>
Advanced bd operations for managing complex task structures; bd is single source of truth, keep it accurate.
</skill_overview>
<rigidity_level>
HIGH FREEDOM - These are operational patterns, not rigid workflows. Adapt operations to your specific situation while following the core principles (keep bd accurate, merge don't delete, document changes).
</rigidity_level>
<quick_reference>
| Operation | When | Key Command |
|-----------|------|-------------|
| Split task | Task too large mid-flight | Create subtasks, add deps, close parent |
| Merge duplicates | Found duplicate tasks | Combine designs, move deps, close with reference |
| Change dependencies | Dependencies wrong/changed | `bd dep remove` then `bd dep add` |
| Archive epic | Epic complete, hide from views | `bd close bd-X --reason "Archived"` |
| Query metrics | Need status/velocity data | `bd list` + filters + `wc -l` |
| Cross-epic deps | Task depends on other epic | `bd dep add` works across epics |
| Bulk updates | Multiple tasks need same change | Loop with careful review first |
| Recover mistakes | Accidentally closed/wrong dep | `bd update --status` or `bd dep remove` |
**Core principle:** Track all work in bd, update as you go, never batch updates.
</quick_reference>
<when_to_use>
Use this skill for **advanced** bd operations:
- Split task that's too large (discovered mid-implementation)
- Merge duplicate tasks
- Reorganize dependencies after work started
- Archive completed epics (hide from views, keep history)
- Query bd for metrics (velocity, progress, bottlenecks)
- Manage cross-epic dependencies
- Bulk status updates
- Recover from bd mistakes
**For basic operations:** See skills/common-patterns/bd-commands.md (create, show, close, update)
</when_to_use>
<operations>
## Operation 1: Splitting Tasks Mid-Flight
**When:** Task in-progress but turns out too large.
**Example:** Started "Implement authentication" - realize it's 8+ hours of work across multiple areas.
**Process:**
### Step 1: Create subtasks for remaining work
```bash
# Original task bd-5 is in-progress
# Already completed: Login form
# Remaining work gets split:
bd create "Auth API endpoints" --type task --priority P1 --design "
POST /api/login and POST /api/logout endpoints.
## Success Criteria
- [ ] POST /api/login validates credentials, returns JWT
- [ ] POST /api/logout invalidates token
- [ ] Tests pass
"
# Returns bd-12
bd create "Session management" --type task --priority P1 --design "
JWT token tracking and validation.
## Success Criteria
- [ ] JWT generated on login
- [ ] Tokens validated on protected routes
- [ ] Token expiration handled
- [ ] Tests pass
"
# Returns bd-13
bd create "Password hashing" --type task --priority P1 --design "
Secure password hashing with bcrypt.
## Success Criteria
- [ ] Passwords hashed before storage
- [ ] Hash verification on login
- [ ] Tests pass
"
# Returns bd-14
```
### Step 2: Set up dependencies
```bash
# Password hashing must be done first
# API endpoints depend on password hashing
bd dep add bd-12 bd-14 # bd-12 depends on bd-14
# Session management depends on API endpoints
bd dep add bd-13 bd-12 # bd-13 depends on bd-12
# View tree
bd dep tree bd-5
```
### Step 3: Update original task and close
```bash
bd edit bd-5 --design "
Implement user authentication.
## Status
✓ Login form completed (frontend)
✗ Remaining work split into subtasks:
- bd-14: Password hashing (do first)
- bd-12: Auth API endpoints (depends on bd-14)
- bd-13: Session management (depends on bd-12)
## Success Criteria
- [x] Login form renders
- [ ] See subtasks for remaining criteria
"
bd close bd-5 --reason "Split into bd-12, bd-13, bd-14"
```
### Step 4: Work on subtasks in order
```bash
bd ready # Shows bd-14 (no dependencies)
bd update bd-14 --status in_progress
# Complete bd-14...
bd close bd-14
# Now bd-12 is unblocked
bd ready # Shows bd-12
```
---
## Operation 2: Merging Duplicate Tasks
**When:** Discovered two tasks are same thing.
**Example:**
```
bd-7: "Add email validation"
bd-9: "Validate user email addresses"
^ Duplicates
```
### Step 1: Choose which to keep
Based on:
- Which has more complete design?
- Which has more work done?
- Which has more dependencies?
**Example:** Keep bd-7 (more complete)
### Step 2: Merge designs
```bash
bd show bd-7
bd show bd-9
# Combine into bd-7
bd edit bd-7 --design "
Add email validation to user creation and update.
## Background
Originally tracked as bd-7 and bd-9 (now merged).
## Success Criteria
- [ ] Email validated on creation
- [ ] Email validated on update
- [ ] Rejects invalid formats
- [ ] Rejects empty strings
- [ ] Tests cover all cases
## Notes from bd-9
Need validation on update, not just creation.
"
```
### Step 3: Move dependencies
```bash
# Check bd-9 dependencies
bd show bd-9
# If bd-10 depended on bd-9, update to bd-7
bd dep remove bd-10 bd-9
bd dep add bd-10 bd-7
```
### Step 4: Close duplicate with reference
```bash
bd edit bd-9 --design "DUPLICATE: Merged into bd-7
This task was duplicate of bd-7. All work tracked there."
bd close bd-9
```
---
## Operation 3: Changing Dependencies
**When:** Dependencies were wrong or requirements changed.
**Example:** bd-10 depends on bd-8 and bd-9, but bd-9 got merged and bd-10 now also needs bd-11.
```bash
# Remove obsolete dependency
bd dep remove bd-10 bd-9
# Add new dependency
bd dep add bd-10 bd-11
# Verify
bd dep tree bd-1 # If bd-10 in epic bd-1
bd show bd-10 | grep "Blocking"
```
**Common scenarios:**
- Discovered hidden dependency during implementation
- Requirements changed mid-flight
- Tasks reordered for better flow
---
## Operation 4: Archiving Completed Epics
**When:** Epic complete, want to hide from default views but keep history.
```bash
# Verify all tasks closed
bd list --parent bd-1 --status open
# Output: [empty] = all closed
# Archive epic
bd close bd-1 --reason "Archived - completed Oct 2025"
# Won't show in open listings
bd list --status open # bd-1 won't appear
# Still accessible
bd show bd-1 # Still shows full epic
```
**Use archived for:** Completed epics, shipped features, historical reference
**Use open/in-progress for:** Active work
**Use closed with note for:** Cancelled work (explain why)
---
## Operation 5: Querying for Metrics
### Velocity
```bash
# Tasks closed in a given month (here: October 2025)
bd list --status closed | grep "closed_at" | grep "2025-10-" | wc -l
# Tasks closed by epic
bd list --parent bd-1 --status closed | wc -l
```
### Blocked vs Ready
```bash
# Ready to work on
bd ready
bd ready | grep "^bd-" | wc -l
# All open tasks
bd list --status open | wc -l
# Blocked = open - ready
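# Computed directly (sketch; assumes one "bd-N ..." line per task)
echo "$(( $(bd list --status open | grep -c '^bd-') - $(bd ready | grep -c '^bd-') )) blocked task(s)"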
```
### Epic Progress
```bash
# Show tree
bd dep tree bd-1
# Total tasks in epic
bd list --parent bd-1 | grep "^bd-" | wc -l
# Completed tasks
bd list --parent bd-1 --status closed | grep "^bd-" | wc -l
# Percentage = (completed / total) * 100
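# End-to-end (sketch; assumes one "bd-N ..." line per task and at least one task in the epic)
total=$(bd list --parent bd-1 | grep -c "^bd-")
done_count=$(bd list --parent bd-1 --status closed | grep -c "^bd-")
echo "Epic bd-1: ${done_count}/${total} tasks closed ($(( done_count * 100 / total ))%)"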
```
**For detailed metrics guidance:** See [resources/metrics-guide.md](resources/metrics-guide.md)
---
## Operation 6: Cross-Epic Dependencies
**When:** Task in one epic depends on task in different epic.
**Example:**
```
Epic bd-1: User Management
- bd-10: User CRUD API
Epic bd-2: Order Management
- bd-20: Order creation (needs user API)
```
```bash
# Add cross-epic dependency
bd dep add bd-20 bd-10
# bd-20 (in bd-2) depends on bd-10 (in bd-1)
# Check dependencies
bd show bd-20 | grep "Blocking"
# Check ready tasks
bd ready
# Won't show bd-20 until bd-10 closed
```
**Best practices:**
- Document cross-epic dependencies clearly
- Consider if epics should be merged
- Coordinate if different people own epics
---
## Operation 7: Bulk Status Updates
**When:** Need to update multiple tasks.
**Example:** Mark all test tasks closed after suite complete.
```bash
# Get tasks
bd list --parent bd-1 --status open | grep "test:" > test-tasks.txt
# Review list
cat test-tasks.txt
# Update each
while read task_id; do
bd close "$task_id"
done < test-tasks.txt
# Verify
bd list --parent bd-1 --status open | grep "test:"
```
**Use bulk for:**
- Marking completed work closed
- Reopening related tasks
- Updating priorities
**Never bulk:**
- Thoughtless changes
- Hiding problems (closing unfinished tasks)
---
## Operation 8: Recovering from Mistakes
### Accidentally closed task
```bash
bd update bd-15 --status open
# Or if was in progress
bd update bd-15 --status in_progress
```
### Wrong dependency
```bash
bd dep remove bd-10 bd-8 # Remove wrong
bd dep add bd-10 bd-9 # Add correct
```
### Undo design changes
```bash
# bd has no undo, restore from git
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-10"
# Find previous version, copy
bd edit bd-10 --design "[paste previous]"
```
### Epic structure wrong
1. Create new tasks with correct structure
2. Move work to new tasks
3. Close old tasks with reference
4. Don't delete (keep audit trail)
</operations>
<examples>
<example>
<scenario>Developer closes duplicate without merging information</scenario>
<code>
# Found duplicates
bd-7: "Add email validation"
bd-9: "Validate user email addresses"
# Developer just closes bd-9
bd close bd-9
# Loses information from bd-9's design
# bd-9 mentioned validation on update (bd-7 didn't)
# Now that requirement is lost
# Work on bd-7 completes, but misses update validation
# Bug ships to production
</code>
<why_it_fails>
- Closed duplicate without reading its design
- Lost requirement mentioned only in duplicate
- Information not preserved
- Incomplete implementation ships
- bd not accurate source of truth
</why_it_fails>
<correction>
**Correct process:**
```bash
# Read BOTH tasks
bd show bd-7 # Only mentions validation on creation
bd show bd-9 # Mentions validation on update too
# Merge information
bd edit bd-7 --design "
Email validation for user creation and update.
## Background
Merged from bd-9.
## Success Criteria
- [ ] Validate on creation (from bd-7)
- [ ] Validate on update (from bd-9) ← Preserved!
- [ ] Tests for both cases
"
# Then close duplicate with reference
bd edit bd-9 --design "DUPLICATE: Merged into bd-7"
bd close bd-9
```
**What you gain:**
- All requirements preserved
- bd remains accurate
- No information lost
- Complete implementation
- Audit trail clear
</correction>
</example>
<example>
<scenario>Developer doesn't split large task, struggles through</scenario>
<code>
bd-15: "Implement payment processing" (started)
# 3 hours in, developer realizes:
# - Need Stripe API integration (4 hours)
# - Need payment validation (2 hours)
# - Need retry logic (3 hours)
# - Need receipt generation (2 hours)
# Total: 11 more hours!
# Developer thinks: "Too late to split, I'll power through"
# Works 14 hours straight
# Gets exhausted, makes mistakes
# Ships buggy code
# Has to fix in production
</code>
<why_it_fails>
- Didn't split when discovered size
- "Sunk cost" rationalization (already started)
- No clear stopping points
- Exhaustion leads to bugs
- Can't track progress granularly
- If interrupted, hard to resume
</why_it_fails>
<correction>
**Correct approach (split mid-flight):**
```bash
# 3 hours in, stop and split
bd edit bd-15 --design "
Implement payment processing.
## Status
✓ Completed: Payment form UI (3 hours)
✗ Split remaining work into subtasks:
- bd-20: Stripe API integration
- bd-21: Payment validation
- bd-22: Retry logic
- bd-23: Receipt generation
"
bd close bd-15 --reason "Split into bd-20, bd-21, bd-22, bd-23"
# Create subtasks with dependencies
bd create "Stripe API integration" ... # bd-20
bd create "Payment validation" ... # bd-21
bd create "Retry logic" ... # bd-22
bd create "Receipt generation" ... # bd-23
bd dep add bd-21 bd-20 # Validation needs API
bd dep add bd-22 bd-20 # Retry needs API
bd dep add bd-23 bd-22 # Receipts after retry works
# Work on one at a time
bd update bd-20 --status in_progress
# Complete bd-20 (4 hours)
bd close bd-20
# Take break
# Next day: bd-21
```
**What you gain:**
- Clear stopping points (can pause between tasks)
- Track progress granularly
- No exhaustion (spread over days)
- Better quality (not rushed)
- If interrupted, easy to resume
- Each subtask gets proper focus
</correction>
</example>
<example>
<scenario>Developer adds dependency but doesn't update dependent task</scenario>
<code>
# Initial state
bd-10: "Add user dashboard" (in progress)
bd-15: "Add analytics to dashboard" (blocked on bd-10)
# During bd-10 implementation, discover need for new API
bd create "Analytics API endpoints" ... # Creates bd-20
# Add dependency
bd dep add bd-15 bd-20 # bd-15 now depends on bd-20 too
# But bd-10 completes, closes
bd close bd-10
# bd-15 shows as ready (bd-10 closed)
bd ready # Shows bd-15
# Developer starts bd-15
bd update bd-15 --status in_progress
# Immediately blocked - needs bd-20!
# bd-20 not done yet
# Have to stop work on bd-15
# Time wasted
</code>
<why_it_fails>
- Added dependency but didn't document in bd-15
- bd-15's design doesn't mention bd-20 requirement
- Appears ready when not actually ready
- Wastes time starting work that's blocked
- Dependencies not obvious from task design
</why_it_fails>
<correction>
**Correct approach:**
```bash
# Create new API task
bd create "Analytics API endpoints" ... # bd-20
# Add dependency
bd dep add bd-15 bd-20
# UPDATE bd-15 to document new requirement
bd edit bd-15 --design "
Add analytics to dashboard.
## Dependencies
- bd-10: User dashboard (completed)
- bd-20: Analytics API endpoints (NEW - discovered during bd-10)
## Success Criteria
- [ ] Integrate with analytics API (bd-20)
- [ ] Display charts on dashboard
- [ ] Tests pass
"
# Close bd-10
bd close bd-10
# Check ready
bd ready # Does NOT show bd-15 (blocked on bd-20)
# Work on bd-20 first
bd update bd-20 --status in_progress
# Complete bd-20
bd close bd-20
# NOW bd-15 is truly ready
bd ready # Shows bd-15
```
**What you gain:**
- Dependencies documented in task design
- Clear why task is blocked
- No false "ready" signals
- Work proceeds in correct order
- No wasted time starting blocked work
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Keep bd accurate** → Single source of truth for all work
2. **Merge duplicates, don't just close** → Preserve information from both
3. **Split large tasks when discovered** → Not after struggling through
4. **Document dependency changes** → Update task designs when deps change
5. **Update as you go** → Never batch updates "for later"
## Common Excuses
All of these mean: **STOP. Follow the operation properly.**
- "Task too complex to split" (Every task can be broken down)
- "Just close duplicate" (Merge first, preserve information)
- "Won't track this in bd" (All work tracked, no exceptions)
- "bd is out of date, update later" (Later never comes, update now)
- "This dependency doesn't matter" (Dependencies prevent blocking, they matter)
- "Too much overhead to split" (More overhead to fail huge task)
</critical_rules>
<bd_best_practices>
**For detailed guidance on:**
- Task naming conventions
- Priority guidelines (P0-P4)
- Task granularity
- Success criteria
- Dependency management
**See:** [resources/task-naming-guide.md](resources/task-naming-guide.md)
</bd_best_practices>
<red_flags>
Watch for these patterns:
- **Multiple in-progress tasks** → Focus on one
- **Tasks stuck in-progress for days** → Blocked? Split it?
- **Many open tasks, no dependencies** → Prioritize!
- **Epics with 20+ tasks** → Too large, split epic
- **Closed tasks, incomplete criteria** → Not done, reopen
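A quick scan for the first and fourth flags, as a hedged sketch — it assumes `bd list` prints one `bd-NN: ...` line per task (the same pattern the grep examples in this plugin rely on) and that epic membership can be listed with `--parent`:
```bash
# Red-flag scan (adjust bd-1 to your epic ID)
wip=$(bd list --status in-progress | grep -c "^bd-")
if [ "$wip" -gt 1 ]; then
  echo "Red flag: $wip tasks in progress - focus on one"
fi
epic_tasks=$(bd list --parent bd-1 | grep -c "^bd-")
if [ "$epic_tasks" -gt 20 ]; then
  echo "Red flag: epic bd-1 has $epic_tasks tasks - consider splitting the epic"
fi
```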
</red_flags>
<verification_checklist>
After advanced bd operations:
- [ ] bd still accurate (reflects reality)
- [ ] Dependencies correct (nothing blocked incorrectly)
- [ ] Duplicate information merged (not lost)
- [ ] Changes documented in task designs
- [ ] Ready tasks are actually unblocked
- [ ] Metrics queries return sensible numbers
- [ ] No orphaned tasks (all part of epics)
**Can't check all boxes?** Review operation and fix issues.
</verification_checklist>
<integration>
**This skill covers:** Advanced bd operations
**For basic operations:**
- skills/common-patterns/bd-commands.md
**Related skills:**
- hyperpowers:writing-plans (creating epics and tasks)
- hyperpowers:executing-plans (working through tasks)
- hyperpowers:verification-before-completion (closing tasks properly)
**CRITICAL:** Use bd CLI commands, never read `.beads/issues.jsonl` directly.
</integration>
<resources>
**Detailed guides:**
- [Metrics guide (cycle time, WIP limits)](resources/metrics-guide.md)
- [Task naming conventions](resources/task-naming-guide.md)
- [Dependency patterns](resources/dependency-patterns.md)
**When stuck:**
- Task seems unsplittable → Ask user how to break it down
- Duplicates complex → Merge designs carefully, don't rush
- Dependencies tangled → Draw diagram, untangle systematically
- bd out of sync → Stop everything, update bd first
</resources>


@@ -0,0 +1,166 @@
# bd Metrics Guide
This guide covers the key metrics for tracking work in bd.
## Cycle Time vs. Lead Time
**Two distinct time measurements:**
### Cycle Time
- **Definition**: Time from "work started" to "work completed"
- **Start**: When task moves to "in-progress" status
- **End**: When task moves to "closed" status
- **Measures**: How efficiently work flows through active development
- **Use**: Identify process inefficiencies, improve development speed
```bash
# Calculate cycle time for completed task
bd show bd-5 | grep "status.*in-progress" # Get start time
bd show bd-5 | grep "status.*closed" # Get end time
# Difference = cycle time
```
### Lead Time
- **Definition**: Time from "request created" to "delivered to customer"
- **Start**: When task is created (enters backlog)
- **End**: When task is deployed/delivered
- **Measures**: Overall responsiveness to requests
- **Use**: Set realistic expectations, measure total process duration
```bash
# Calculate lead time for completed task
bd show bd-5 | grep "created_at" # Get creation time
bd show bd-5 | grep "deployed_at" # Get deployment time (if tracked)
# Difference = lead time
```
### Key Differences
| Metric | Starts | Ends | Includes Waiting? | Measures |
|--------|--------|------|-------------------|----------|
| **Cycle Time** | In-progress | Closed | No | Development efficiency |
| **Lead Time** | Created | Deployed | Yes | Total responsiveness |
### Example
```
Task created: Monday 9am (enters backlog)
↓ [waits 2 days]
Task started: Wednesday 9am (moved to in-progress)
↓ [active work]
Task completed: Wednesday 5pm (moved to closed)
↓ [waits for deployment]
Task deployed: Thursday 2pm (delivered)
Cycle Time: 8 hours (Wednesday 9am → 5pm)
Lead Time: 3 days, 5 hours (Monday 9am → Thursday 2pm)
```
### Why Both Matter
- **Short cycle time, long lead time**: Work is efficient once started, but tasks wait too long in backlog
- Fix: Reduce WIP, start fewer tasks, finish faster
- **Long cycle time, little waiting**: Work starts almost immediately but takes a long time once started (cycle time dominates lead time)
- Fix: Split tasks smaller, remove blockers, improve focus
- **Both long**: Overall process is slow
- Fix: Address both backlog management AND development efficiency
### Tracking Over Time
```bash
# Average cycle time (manual calculation)
# For each closed task: (closed_at - started_at)
# Sum and divide by task count
# Trend analysis
# Week 1: Avg cycle time = 3 days
# Week 2: Avg cycle time = 2 days ✅ Improving
# Week 3: Avg cycle time = 4 days ❌ Getting worse
```
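The manual calculation can be scripted. This is a rough sketch, not a built-in bd feature: it assumes a `closed` status filter like the ones above, that `bd show` prints ISO-8601 `started_at:` and `closed_at:` lines (hypothetical field names — adapt the grep/awk), and GNU `date -d`:
```bash
# Average cycle time across closed tasks (field names assumed - adjust to your bd output)
total=0
count=0
for task in $(bd list --status closed | grep "^bd-" | cut -d: -f1); do
  started=$(bd show "$task" | grep -m1 "started_at" | awk '{print $2}')
  closed=$(bd show "$task" | grep -m1 "closed_at" | awk '{print $2}')
  if [ -z "$started" ] || [ -z "$closed" ]; then
    continue
  fi
  secs=$(( $(date -d "$closed" +%s) - $(date -d "$started" +%s) ))
  total=$((total + secs))
  count=$((count + 1))
done
if [ "$count" -gt 0 ]; then
  echo "Average cycle time: $((total / count / 3600)) hours across $count tasks"
fi
```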
### Improvement Targets
- **Cycle time**: Reduce by splitting tasks, removing blockers, improving focus
- **Lead time**: Reduce by prioritizing backlog, reducing WIP, faster deployment
## Work in Progress (WIP)
```bash
# All in-progress tasks
bd list --status in-progress
# Count
bd list --status in-progress | grep "^bd-" | wc -l
```
### WIP Limits
Work in Progress limits prevent overcommitment and identify bottlenecks.
**Setting WIP limits:**
- **Personal WIP limit**: 1-2 tasks in-progress at a time
- **Team WIP limit**: Depends on team size and workflow stages
- **Rule of thumb**: WIP limit = (Team size ÷ 2) + 1
**Example for individual developer:**
```
✅ Good: 1 task in-progress, 0-1 in code review
❌ Bad: 5 tasks in-progress simultaneously
```
**Example for team of 6:**
```
Workflow stages and limits:
- Backlog: Unlimited
- Ready: 8 items max
- In Progress: 4 items max (team size ÷ 2 + 1)
- Code Review: 3 items max
- Testing: 2 items max
- Done: Unlimited
```
### Why WIP Limits Matter
1. **Focus:** Fewer tasks means deeper focus, faster completion
2. **Flow:** Prevents bottlenecks from accumulating
3. **Quality:** Less context switching, fewer mistakes
4. **Visibility:** High WIP indicates blocked work or overcommitment
### Monitoring WIP
```bash
# Check personal WIP
bd list --status in-progress | grep "assignee:me" | wc -l
# If > 2: Focus on finishing before starting new work
```
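To make the check actionable, a minimal sketch that warns when the limit is exceeded (same `bd-NN:` line format assumed as above):
```bash
# Warn when personal WIP exceeds the limit
WIP_LIMIT=2
wip=$(bd list --status in-progress | grep -c "^bd-")
if [ "$wip" -gt "$WIP_LIMIT" ]; then
  echo "WIP is $wip (limit $WIP_LIMIT) - finish something before starting new work"
fi
```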
### Red Flags
- WIP consistently at or above limit (need more capacity or smaller tasks)
- WIP growing week-over-week (work piling up, not finishing)
- WIP high but velocity low (tasks blocked or too large)
### Response to High WIP
1. Finish existing tasks before starting new ones
2. Identify and remove blockers
3. Split large tasks
4. Add capacity (if chronically high)
## Bottleneck Identification
```bash
# Find tasks that are blocking others
# (Tasks that many other tasks depend on)
for task in $(bd list --status open | grep "^bd-" | cut -d: -f1); do
  echo -n "$task: "
  # Count open tasks whose "bd show" output lists "depends on $task"
  bd list --status open | grep "^bd-" | cut -d: -f1 \
    | xargs -I {} sh -c "bd show {} | grep -q \"depends on $task\" && echo {}" | wc -l
done | sort -t: -k2 -n -r
# Tasks with the most dependents are listed first (top bottlenecks)
```


@@ -0,0 +1,276 @@
# bd Task Naming and Quality Guidelines
This guide covers best practices for naming tasks, setting priorities, sizing work, and defining success criteria.
## Task Naming Conventions
### Principles
- **Actionable**: Start with action verbs (add, fix, update, remove, refactor, implement)
- **Specific**: Include enough context to understand without opening
- **Consistent**: Follow project-wide templates
### Templates by Task Type
#### User Stories
**Template:**
```
As a [persona], I want [something] so that [reason]
```
**Examples:**
```
As a customer, I want one-click checkout so that I can purchase quickly
As an admin, I want bulk user import so that I can onboard teams efficiently
As a developer, I want API rate limiting so that I can prevent abuse
```
**When to use:** Features from user perspective
#### Bug Reports
**Template 1 (Capability-focused):**
```
[User type] can't [action they should be able to do]
```
**Examples:**
```
New users can't view home screen after signup
Admin users can't export user data to CSV
Guest users can't add items to cart
```
**Template 2 (Event-focused):**
```
When [action/event], [system feature] doesn't work
```
**Examples:**
```
When clicking Submit, payment form doesn't validate
When uploading large files, progress bar freezes
When session expires, user isn't redirected to login
```
**When to use:** Describing broken functionality
#### Tasks (Implementation Work)
**Template:**
```
[Verb] [object] [context]
```
**Examples:**
```
feat(auth): Implement JWT token generation
fix(api): Handle empty email validation in user endpoint
test: Add integration tests for payment flow
refactor: Extract validation logic from UserService
docs: Update API documentation for v2 endpoints
```
**When to use:** Technical implementation tasks
#### Features (High-Level Capabilities)
**Template:**
```
[Verb] [capability] for [user/system]
```
**Examples:**
```
Add dark mode toggle for Settings page
Implement rate limiting for API endpoints
Enable two-factor authentication for admin users
Build export functionality for report data
```
**When to use:** Feature-level work (may become epic with multiple tasks)
### Context Guidelines
- **Which component**: "in login flow", "for user API", "in Settings page"
- **Which user type**: "for admins", "for guests", "for authenticated users"
- **Avoid jargon** in user stories (user perspective, not technical)
- **Be specific** in technical tasks (exact API, file, function)
### Good vs Bad Names
**Good names:**
- `feat(auth): Implement JWT token generation`
- `fix(api): Handle empty email validation in user endpoint`
- `As a customer, I want CSV export so that I can analyze my data`
- `test: Add integration tests for payment flow`
- `refactor: Extract validation logic from UserService`
**Bad names:**
- `fix stuff` (vague - what stuff?)
- `implement feature` (vague - which feature?)
- `work on backend` (vague - what work?)
- `Report` (noun, not action - should be "Generate Q4 Sales Report")
- `API endpoint` (incomplete - "Add GET /users endpoint" better)
## Priority Guidelines
Use bd's priority system consistently:
- **P0:** Critical production bug (drop everything)
- **P1:** Blocking other work (do next)
- **P2:** Important feature work (normal priority)
- **P3:** Nice to have (do when time permits)
- **P4:** Someday/maybe (backlog)
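A minimal sketch of applying these levels with the `--priority` flag used elsewhere in this plugin (names follow the templates above):
```bash
bd create "fix(payments): Charge fails for amounts over 1000" --type task --priority P0   # critical production bug
bd create "feat(auth): Implement JWT token generation" --type task --priority P2          # normal feature work
bd create "docs: Update API documentation for v2 endpoints" --type task --priority P3     # nice to have
```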
## Granularity Guidelines
**Good task size:**
- 2-4 hours of focused work
- Can complete in one sitting
- Clear deliverable
**Too large:**
- Takes multiple days
- Multiple independent pieces
- Should be split
**Too small:**
- Takes 15 minutes
- Too granular to track
- Combine with related tasks
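When a task turns out too large mid-flight, split it with the same create/dep pattern used elsewhere in this plugin — a sketch with illustrative IDs:
```bash
# "Implement payment processing" is multi-day work - split into 2-4 hour slices
bd create "feat(payments): Stripe API client" --type task --priority P2              # e.g. bd-20
bd create "feat(payments): Payment amount validation" --type task --priority P2      # e.g. bd-21
bd create "feat(payments): Retry logic for failed charges" --type task --priority P2 # e.g. bd-22
bd dep add bd-21 bd-20   # validation needs the API client
bd dep add bd-22 bd-20   # retry needs the API client
# Document the split in the original task's design (bd edit --design) before closing it
```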
## Success Criteria: Acceptance Criteria vs. Definition of Done
**Two distinct types of completion criteria:**
### Acceptance Criteria (Per-Task, Functional)
**Definition:** Specific, measurable requirements unique to each task that define functional completeness from user/business perspective.
**Scope:** Unique to each backlog item (bug, task, story)
**Purpose:** "Does this feature work correctly?"
**Owner:** Product owner/stakeholder defines, team validates
**Format:** Checklist or scenarios
```markdown
## Acceptance Criteria
- [ ] User can upload CSV files up to 10MB
- [ ] System validates CSV format before processing
- [ ] User sees progress bar during upload
- [ ] User receives success message with row count
- [ ] Invalid files show specific error messages
```
**Scenario format (Given/When/Then):**
```markdown
## Acceptance Criteria
Scenario 1: Valid file upload
Given a user is on the upload page
When they select a valid CSV file
Then the file uploads successfully
And they see confirmation with row count
Scenario 2: Invalid file format
Given a user selects a non-CSV file
When they try to upload
Then they see error: "Only CSV files supported"
```
### Definition of Done (Universal, Quality)
**Definition:** Universal checklist that applies to ALL work items to ensure consistent quality and release-readiness.
**Scope:** Applies to every single task (bugs, features, stories)
**Purpose:** "Is this work complete to our quality standards?"
**Owner:** Team defines and maintains (reviewed in retrospectives)
**Example DoD:**
```markdown
## Definition of Done (applies to all tasks)
- [ ] Code written and peer-reviewed
- [ ] Unit tests written and passing (>80% coverage)
- [ ] Integration tests passing
- [ ] No linter warnings
- [ ] Documentation updated (if public API)
- [ ] Manual testing completed (if UI)
- [ ] Deployed to staging environment
- [ ] Product owner accepted
- [ ] Commit references bd task ID
```
### Key Differences
| Aspect | Acceptance Criteria | Definition of Done |
|--------|--------------------|--------------------|
| **Scope** | Per-task (unique) | All tasks (universal) |
| **Focus** | Functional requirements | Quality standards |
| **Question** | "Does it work?" | "Is it done?" |
| **Owner** | Product owner | Team |
| **Changes** | Per task | Rarely (retrospectives) |
| **Examples** | "User can export data" | "Tests pass, code reviewed" |
### How to Use Both
**When creating a task:**
1. **Define Acceptance Criteria** (task-specific functional requirements)
2. **Reference Definition of Done** (don't duplicate it in task)
```markdown
bd create "Implement CSV file upload" --design "
## Acceptance Criteria
- [ ] User can upload CSV files up to 10MB
- [ ] System validates CSV format
- [ ] Progress bar shows during upload
- [ ] Success message displays row count
## Notes
Must also meet team's Definition of Done (see project wiki)
"
```
**Before closing a task:**
1. ✅ Verify all Acceptance Criteria met (functional)
2. ✅ Verify Definition of Done met (quality)
3. Only then close task
**Bad practice:**
```markdown
## Success Criteria
- [ ] CSV upload works
- [ ] Tests pass ← This is DoD, not acceptance criteria
- [ ] Code reviewed ← This is DoD, not acceptance criteria
- [ ] No linter warnings ← This is DoD, not acceptance criteria
```
**Good practice:**
```markdown
## Acceptance Criteria (functional, task-specific)
- [ ] CSV upload handles files up to 10MB
- [ ] Validation rejects non-CSV formats
- [ ] Progress bar updates during upload
## Definition of Done (quality, universal - referenced, not duplicated)
See team DoD checklist (applies to all tasks)
```
## Dependency Management
**Good dependency usage:**
- Technical dependency (feature B needs feature A's code)
- Clear ordering (must do A before B)
- Unblocks work (completing A unblocks B)
**Bad dependency usage:**
- "Feels like should be done first" (vague)
- No technical relationship (just preference)
- Circular dependencies (A depends on B depends on A)
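A minimal sketch of the good pattern, with illustrative IDs and the `bd dep add <dependent> <dependency>` direction used throughout this plugin:
```bash
# bd-31 (validation) technically needs bd-30 (API client) code - a real dependency
bd dep add bd-31 bd-30
bd ready       # bd-30 shows as ready; bd-31 stays hidden until bd-30 closes
bd close bd-30
bd ready       # now bd-31 appears - completing bd-30 unblocked it
```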


@@ -0,0 +1,541 @@
---
name: refactoring-safely
description: Use when refactoring code - test-preserving transformations in small steps, running tests between each change
---
<skill_overview>
Refactoring changes code structure without changing behavior; tests must stay green throughout or you're rewriting, not refactoring.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Follow the change→test→commit cycle strictly, but adapt the specific refactoring patterns to your language and codebase.
</rigidity_level>
<quick_reference>
| Step | Action | Verify |
|------|--------|--------|
| 1 | Run full test suite | ALL pass |
| 2 | Create bd refactoring task | Track work |
| 3 | Make ONE small change | Compiles |
| 4 | Run tests immediately | ALL still pass |
| 5 | Commit with descriptive message | History clear |
| 6 | Repeat 3-5 until complete | Each step safe |
| 7 | Final verification & close bd | Done |
**Core cycle:** Change → Test → Commit (repeat until complete)
</quick_reference>
<when_to_use>
- Improving code structure without changing functionality
- Extracting duplicated code into shared utilities
- Renaming for clarity
- Reorganizing file/module structure
- Simplifying complex code while preserving behavior
**Don't use for:**
- Changing functionality (use feature development)
- Fixing bugs (use hyperpowers:fixing-bugs)
- Adding features while restructuring (do separately)
- Code without tests (write tests first using hyperpowers:test-driven-development)
</when_to_use>
<the_process>
## 1. Verify Tests Pass
**BEFORE any refactoring:**
```bash
# Use test-runner agent to keep context clean
Dispatch hyperpowers:test-runner agent: "Run: cargo test"
```
**Verify:** ALL tests pass. If any fail, fix them FIRST, then refactor.
**Why:** Failing tests mean you can't detect if refactoring breaks things.
---
## 2. Create bd Task for Refactoring
Track the refactoring work:
```bash
bd create "Refactor: Extract user validation logic" \
--type task \
--priority P2
bd edit bd-456 --design "
## Goal
Extract user validation logic from UserService into separate Validator class.
## Why
- Validation duplicated across 3 services
- Makes testing individual validations difficult
- Violates single responsibility
## Approach
1. Create UserValidator class
2. Extract email validation
3. Extract name validation
4. Extract age validation
5. Update UserService to use validator
6. Remove duplication from other services
## Success Criteria
- All existing tests still pass
- No behavior changes
- Validator has 100% test coverage
"
bd update bd-456 --status in_progress
```
---
## 3. Make ONE Small Change
The smallest transformation that compiles.
**Examples of "small":**
- Extract one method
- Rename one variable
- Move one function to different file
- Inline one constant
- Extract one interface
**NOT small:**
- Extracting multiple methods at once
- Renaming + moving + restructuring
- "While I'm here" improvements
**Example:**
```rust
// Before
fn create_user(name: &str, email: &str) -> Result<User> {
if email.is_empty() {
return Err(Error::InvalidEmail);
}
if !email.contains('@') {
return Err(Error::InvalidEmail);
}
let user = User { name, email };
Ok(user)
}
// After - ONE small change (extract email validation)
fn create_user(name: &str, email: &str) -> Result<User> {
validate_email(email)?;
let user = User { name, email };
Ok(user)
}
fn validate_email(email: &str) -> Result<()> {
if email.is_empty() {
return Err(Error::InvalidEmail);
}
if !email.contains('@') {
return Err(Error::InvalidEmail);
}
Ok(())
}
```
---
## 4. Run Tests Immediately
After EVERY small change:
```bash
Dispatch hyperpowers:test-runner agent: "Run: cargo test"
```
**Verify:** ALL tests still pass.
**If tests fail:**
1. STOP
2. Undo the change: `git restore src/file.rs`
3. Understand why it broke
4. Make smaller change
5. Try again
**Never proceed with failing tests.**
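A minimal recovery sketch (the file path is illustrative):
```bash
git restore src/user_service.rs   # discard the uncommitted transformation
git status --short                # confirm the tree matches the last green commit
# then retry with a smaller change
```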
---
## 5. Commit the Small Change
Commit each safe transformation:
```bash
Dispatch hyperpowers:test-runner agent: "Run: git add src/user_service.rs && git commit -m 'refactor(bd-456): extract email validation to function
No behavior change. All tests pass.
Part of bd-456'"
```
**Why commit so often:**
- Easy to undo if next step breaks
- Clear history of transformations
- Can review each step independently
- Proves tests passed at each point
---
## 6. Repeat Until Complete
Repeat steps 3-5 for each small transformation:
```
1. Extract validate_email() ✓ (committed)
2. Extract validate_name() ✓ (committed)
3. Extract validate_age() ✓ (committed)
4. Create UserValidator struct ✓ (committed)
5. Move validations into UserValidator ✓ (committed)
6. Update UserService to use validator ✓ (committed)
7. Remove validation from OrderService ✓ (committed)
8. Remove validation from AccountService ✓ (committed)
```
**Pattern:** change → test → commit (repeat)
---
## 7. Final Verification
After all transformations complete:
```bash
# Full test suite
Dispatch hyperpowers:test-runner agent: "Run: cargo test"
# Linter
Dispatch hyperpowers:test-runner agent: "Run: cargo clippy"
```
**Review the changes:**
```bash
# See all refactoring commits
git log --oneline | grep "bd-456"
# Review full diff
git diff main...HEAD
```
**Checklist:**
- [ ] All tests pass
- [ ] No new warnings
- [ ] No behavior changes
- [ ] Code is cleaner/simpler
- [ ] Each commit is small and safe
**Close bd task:**
```bash
bd edit bd-456 --design "
... (append to existing design)
## Completed
- Created UserValidator class with email, name, age validation
- Removed duplicated validation from 3 services
- All tests pass (verified)
- No behavior changes
- 8 small transformations, each tested
"
bd close bd-456
```
</the_process>
<examples>
<example>
<scenario>Developer changes behavior while "refactoring"</scenario>
<code>
// Original code
fn validate_email(email: &str) -> Result<()> {
if email.is_empty() {
return Err(Error::InvalidEmail);
}
if !email.contains('@') {
return Err(Error::InvalidEmail);
}
Ok(())
}
// "Refactored" version
fn validate_email(email: &str) -> Result<()> {
if email.is_empty() {
return Err(Error::InvalidEmail);
}
if !email.contains('@') {
return Err(Error::InvalidEmail);
}
// NEW: Added extra validation
if !email.contains('.') { // BEHAVIOR CHANGE
return Err(Error::InvalidEmail);
}
Ok(())
}
</code>
<why_it_fails>
- This changes behavior (now rejects emails like "user@localhost")
- Tests might fail, or worse, pass and ship breaking change
- Not refactoring - this is modifying functionality
- Users who relied on old behavior experience regression
</why_it_fails>
<correction>
**Correct approach:**
1. Extract validation (pure refactoring, no behavior change)
2. Commit with tests passing
3. THEN add new validation as separate feature with new tests
4. Two clear commits: refactoring vs. feature addition
**What you gain:**
- Clear history of what changed when
- Easy to revert feature without losing refactoring
- Tests document exact behavior changes
- No surprises in production
</correction>
</example>
<example>
<scenario>Developer does big-bang refactoring</scenario>
<code>
# Changes made all at once:
- Renamed 15 functions across 5 files
- Extracted 3 new classes
- Moved code between 10 files
- Reorganized module structure
- Updated all import statements
# Then runs tests
$ cargo test
... 23 test failures ...
# Now what? Which change broke what?
</code>
<why_it_fails>
- Can't identify which specific change broke tests
- Reverting means losing ALL work
- Fixing requires re-debugging entire refactoring
- Wastes hours trying to untangle failures
- Might give up and revert everything
</why_it_fails>
<correction>
**Correct approach:**
1. Rename ONE function → test → commit
2. Extract ONE class → test → commit
3. Move ONE file → test → commit
4. Continue one change at a time
**If test fails:**
- Know exactly which change broke it
- Revert ONE commit, not all work
- Fix or make smaller change
- Continue from known-good state
**What you gain:**
- Tests break → immediately know why
- Each commit is reviewable independently
- Can stop halfway with useful progress
- Confidence from continuous green tests
- Clear history for future developers
</correction>
</example>
<example>
<scenario>Developer refactors code without tests</scenario>
<code>
// Legacy code with no tests
fn process_payment(amount: f64, user_id: i64) -> Result<PaymentId> {
// 200 lines of complex payment logic
// Multiple edge cases
// No tests exist
}
// Developer refactors without tests:
// - Extracts 5 methods
// - Renames variables
// - Simplifies conditionals
// - "Looks good to me!"
// Deploys to production
// 💥 Payments fail for amounts over $1000
// Edge case handling was accidentally changed
</code>
<why_it_fails>
- No tests to verify behavior preserved
- Complex logic has hidden edge cases
- Subtle behavior changes go unnoticed
- Breaks in production, not development
- Costs customer trust and emergency debugging
</why_it_fails>
<correction>
**Correct approach:**
1. **Write tests FIRST** (using hyperpowers:test-driven-development)
- Test happy path
- Test all edge cases (amounts over $1000, etc.)
- Test error conditions
- Run tests → all pass (documenting current behavior)
2. **Then refactor with tests as safety net**
- Extract method → run tests → commit
- Rename → run tests → commit
- Simplify → run tests → commit
3. **Tests catch any behavior changes immediately**
**What you gain:**
- Confidence behavior is preserved
- Edge cases documented in tests
- Catches subtle changes before production
- Future refactoring is also safe
- Tests serve as documentation
</correction>
</example>
</examples>
<refactor_vs_rewrite>
## When to Refactor
- Tests exist and pass
- Changes are incremental
- Business logic stays same
- Can transform in small, safe steps
- Each step independently valuable
## When to Rewrite
- No tests exist (write tests first, then refactor)
- Fundamental architecture change needed
- Easier to rebuild than modify
- Requirements changed significantly
- After 3+ failed refactoring attempts
**Rule:** If you need to change test assertions (not just add tests), you're rewriting, not refactoring.
## Strangler Fig Pattern (Hybrid)
**When to use:**
- Need to replace legacy system but can't tolerate downtime
- Want incremental migration with continuous monitoring
- System too large to refactor in one go
**How it works:**
1. **Transform:** Create modernized components alongside legacy
2. **Coexist:** Both systems run in parallel (façade routes requests)
3. **Eliminate:** Retire old functionality piece by piece
**Example:**
```
Legacy: Monolithic user service (50K LOC)
Goal: Microservices architecture
Step 1 (Transform):
- Create new UserService microservice
- Implement user creation endpoint
- Tests pass in isolation
Step 2 (Coexist):
- Add routing layer (façade)
- Route POST /users to new service
- Route GET /users to legacy service (for now)
- Monitor both, compare results
Step 3 (Eliminate):
- Once confident, migrate GET /users to new service
- Remove user creation from legacy
- Repeat for remaining endpoints
```
**Benefits:**
- Incremental replacement reduces risk
- Legacy continues operating during transition
- Can pause/rollback at any point
- Each migration step is independently valuable
**Use refactoring within components, Strangler Fig for replacing systems.**
</refactor_vs_rewrite>
<critical_rules>
## Rules That Have No Exceptions
1. **Tests must stay green** throughout refactoring → If they fail, you changed behavior (stop and undo)
2. **Commit after each small change** → Large commits hide which change broke what
3. **One transformation at a time** → Multiple changes = impossible to debug failures
4. **Run tests after EVERY change** → Delayed testing doesn't tell you which change broke it
5. **If tests fail 3+ times, question approach** → Might need to rewrite instead, or add tests first
## Common Excuses
All of these mean: **Stop and return to the change→test→commit cycle**
- "Small refactoring, don't need tests between steps"
- "I'll test at the end"
- "Tests are slow, I'll run once at the end"
- "Just fixing bugs while refactoring" (bug fixes = behavior changes = not refactoring)
- "Easier to do all at once"
- "I know it works without tests"
- "While I'm here, I'll also..." (scope creep during refactoring)
- "Tests will fail temporarily but I'll fix them" (tests must stay green)
</critical_rules>
<verification_checklist>
Before marking refactoring complete:
- [ ] All tests pass (verified with hyperpowers:test-runner agent)
- [ ] No new linter warnings
- [ ] No behavior changes introduced
- [ ] Code is cleaner/simpler than before
- [ ] Each commit in history is small and safe
- [ ] bd task documents what was done and why
- [ ] Can explain what each transformation did
**Can't check all boxes?** Return to process and fix before closing bd task.
</verification_checklist>
<integration>
**This skill requires:**
- hyperpowers:test-driven-development (for writing tests before refactoring if none exist)
- hyperpowers:verification-before-completion (for final verification)
- hyperpowers:test-runner agent (for running tests without context pollution)
**This skill is called by:**
- General development workflows when improving code structure
- After features are complete and working
- When preparing code for new features
**Agents used:**
- test-runner (runs tests/commits without polluting main context)
</integration>
<resources>
**Detailed guides:**
- [Common refactoring patterns](resources/refactoring-patterns.md) - Extract Method, Extract Class, Inline, etc.
- [Complete refactoring session example](resources/example-session.md) - Minute-by-minute walkthrough
**When stuck:**
- Tests fail after change → Undo (git restore), make smaller change
- 3+ failures → Question if refactoring is right approach, consider rewrite
- No tests exist → Use hyperpowers:test-driven-development to write tests first
- Unsure how small → If it touches more than one function/file, it's too big
</resources>


@@ -0,0 +1,78 @@
## Example: Complete Refactoring Session
**Goal:** Extract validation logic from UserService
**Time:** 60 minutes
### Minutes 0-5: Verify Tests Pass
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
```
### Minutes 5-10: Create bd Task
```bash
bd create "Refactor: Extract user validation" --type task
bd edit bd-456 --design "Extract validation to UserValidator class..."
bd update bd-456 --status in_progress
```
### Minutes 10-15: Step 1 - Extract email validation function
```rust
// Extract validate_email()
```
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
git commit -m "refactor(bd-456): extract email validation"
```
### Minutes 15-20: Step 2 - Extract name validation function
```rust
// Extract validate_name()
```
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
git commit -m "refactor(bd-456): extract name validation"
```
### Minutes 20-25: Step 3 - Create UserValidator struct
```rust
struct UserValidator { /* empty */ }
impl UserValidator { /* empty */ }
```
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
git commit -m "refactor(bd-456): create UserValidator struct"
```
### Minutes 25-35: Steps 4-6 - Move validations to UserValidator
Each step: move one method, test, commit
### Minutes 35-45: Step 7 - Update UserService to use validator
```rust
// Use UserValidator instead of inline validation
```
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
git commit -m "refactor(bd-456): use UserValidator in UserService"
```
### Minutes 45-55: Step 8 - Remove duplication from other services
Each service: one change, test, commit
### Minutes 55-60: Final verification and close
```bash
Dispatch hyperpowers:test-runner: "Run: cargo test"
Result: ✓ 234 tests pass
Dispatch hyperpowers:test-runner: "Run: cargo clippy"
Result: ✓ No warnings
bd close bd-456
```
**Result:** Refactoring complete, 8 safe commits, all tests green throughout.


@@ -0,0 +1,121 @@
## Common Refactoring Patterns
### Extract Method
**When:** Duplicated code or long function
```rust
// Before: Long function
fn process(data: Vec<i32>) -> i32 {
let mut sum = 0;
for x in data {
sum += x * x;
}
sum
}
// After: Extracted method
fn process(data: Vec<i32>) -> i32 {
data.iter().map(|x| square(x)).sum()
}
fn square(x: &i32) -> i32 {
x * x
}
```
**Steps:**
1. Extract square() function
2. Run tests
3. Commit
4. Replace loop with iterator
5. Run tests
6. Commit
### Rename Variable/Function
**When:** Name is unclear or misleading
```rust
// Before
fn calc(d: Vec<i32>) -> f64 {
let s: i32 = d.iter().sum();
s as f64 / d.len() as f64
}
// After - Step by step
// Step 1: Rename function
fn calculate_average(d: Vec<i32>) -> f64 { ... } // Test, commit
// Step 2: Rename parameter
fn calculate_average(data: Vec<i32>) -> f64 { ... } // Test, commit
// Step 3: Rename variable
fn calculate_average(data: Vec<i32>) -> f64 {
let sum: i32 = data.iter().sum(); // Test, commit
sum as f64 / data.len() as f64
}
```
### Extract Class/Struct
**When:** Class has multiple responsibilities
```rust
// Before: God object
struct UserService {
db: Database,
email_validator: Regex,
name_validator: Regex,
}
// After: Single responsibility
struct UserService {
db: Database,
validator: UserValidator, // EXTRACTED
}
struct UserValidator {
email_pattern: Regex,
name_pattern: Regex,
}
```
**Steps:**
1. Create empty UserValidator struct
2. Test, commit
3. Move email_validator field
4. Test, commit
5. Move name_validator field
6. Test, commit
7. Update UserService to use UserValidator
8. Test, commit
### Inline Unnecessary Abstraction
**When:** Abstraction adds no value
```rust
// Before: Pointless wrapper
fn get_user_email(user: &User) -> &str {
&user.email
}
fn process() {
let email = get_user_email(&user); // Just use user.email!
}
// After: Inline
fn process() {
let email = &user.email;
}
```
**Steps:**
1. Replace one call site with direct access
2. Test, commit
3. Replace next call site
4. Test, commit
5. Remove wrapper function
6. Test, commit


@@ -0,0 +1,646 @@
---
name: review-implementation
description: Use after hyperpowers:executing-plans completes all tasks - verifies implementation against bd spec, all success criteria met, anti-patterns avoided
---
<skill_overview>
Review completed implementation against bd epic to catch gaps before claiming completion; spec is contract, implementation must fulfill contract completely.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the 4-step review process exactly. Review with Google Fellow-level scrutiny. Never skip automated checks, quality gates, or code reading. No approval without evidence for every criterion.
</rigidity_level>
<quick_reference>
| Step | Action | Deliverable |
|------|--------|-------------|
| 1 | Load bd epic + all tasks | TodoWrite with tasks to review |
| 2 | Review each task (automated checks, quality gates, read code, verify criteria) | Findings per task |
| 3 | Report findings (approved / gaps found) | Review decision |
| 4 | Gate: If approved → finishing-a-development-branch, If gaps → STOP | Next action |
**Review Perspective:** Google Fellow-level SRE with 20+ years experience reviewing junior engineer code.
</quick_reference>
<when_to_use>
- hyperpowers:executing-plans completed all tasks
- Before claiming work is complete
- Before hyperpowers:finishing-a-development-branch
- Want to verify implementation matches spec
**Don't use for:**
- Mid-implementation (use hyperpowers:executing-plans)
- Before all tasks done
- Code reviews of external PRs (this is self-review)
</when_to_use>
<the_process>
## Step 1: Load Epic Specification
**Announce:** "I'm using hyperpowers:review-implementation to verify implementation matches spec. Reviewing with Google Fellow-level scrutiny."
**Get epic and tasks:**
```bash
bd show bd-1 # Epic specification
bd dep tree bd-1 # Task tree
bd list --parent bd-1 # All tasks
```
**Create TodoWrite tracker:**
```
TodoWrite todos:
- Review bd-2: Task Name
- Review bd-3: Task Name
- Review bd-4: Task Name
- Compile findings and make decision
```
---
## Step 2: Review Each Task
For each task:
### A. Read Task Specification
```bash
bd show bd-3
```
Extract:
- Goal (what problem solved?)
- Success criteria (how verify done?)
- Implementation checklist (files/functions/tests)
- Key considerations (edge cases)
- Anti-patterns (prohibited patterns)
---
### B. Run Automated Code Completeness Checks
```bash
# TODOs/FIXMEs without issue numbers
rg -i "todo|fixme" src/ tests/ || echo "✅ None"
# Stub implementations
rg "unimplemented!|todo!|unreachable!|panic!\(\"not implemented" src/ || echo "✅ None"
# Unsafe patterns in production
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" || echo "✅ None"
# Ignored/skipped tests
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ || echo "✅ None"
```
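These checks can be bundled into one pass/fail script — a sketch mirroring the patterns above (adjust paths and globs to your project layout):
```bash
#!/usr/bin/env bash
# Completeness gate: fails if any check finds a hit
fail=0
rg -i "todo|fixme" src/ tests/ && fail=1
rg "unimplemented!|todo!|unreachable!" src/ && fail=1
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" | grep . && fail=1
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ && fail=1
if [ "$fail" -eq 1 ]; then
  echo "❌ Completeness checks failed - see hits above"
  exit 1
fi
echo "✅ All completeness checks passed"
```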
---
### C. Run Quality Gates (via test-runner agent)
**IMPORTANT:** Use hyperpowers:test-runner agent to avoid context pollution.
```
Dispatch hyperpowers:test-runner: "Run: cargo test"
Dispatch hyperpowers:test-runner: "Run: cargo fmt --check"
Dispatch hyperpowers:test-runner: "Run: cargo clippy -- -D warnings"
Dispatch hyperpowers:test-runner: "Run: .git/hooks/pre-commit"
```
---
### D. Read Implementation Files
**CRITICAL:** READ actual files, not just git diff.
```bash
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
```
**While reading, check:**
- ✅ Code implements checklist items (not stubs)
- ✅ Error handling uses proper patterns (Result, try/catch)
- ✅ Edge cases from "Key Considerations" handled
- ✅ Code is clear and maintainable
- ✅ No anti-patterns present
---
### E. Code Quality Review (Google Fellow Perspective)
**Assume code written by junior engineer. Apply production-grade scrutiny.**
**Error Handling:**
- Proper use of Result/Option or try/catch?
- Error messages helpful for production debugging?
- No unwrap/expect in production?
- Errors propagate with context?
- Failure modes graceful?
**Safety:**
- No unsafe blocks without justification?
- Proper bounds checking?
- No potential panics?
- No data races?
- No SQL injection, XSS vulnerabilities?
**Clarity:**
- Would junior understand in 6 months?
- Single responsibility per function?
- Descriptive variable names?
- Complex logic explained?
- No clever tricks - obvious and boring?
**Testing:**
- Edge cases covered (empty, max, Unicode)?
- Tests meaningful, not just coverage?
- Test names describe what verified?
- Tests test behavior, not implementation?
- Failure scenarios tested?
**Production Readiness:**
- Comfortable deploying to production?
- Could cause outage or data loss?
- Performance acceptable under load?
- Logging sufficient for debugging?
---
### F. Verify Success Criteria with Evidence
For EACH criterion in bd task:
- Run verification command
- Check actual output
- Don't assume - verify with evidence
- Use hyperpowers:test-runner for tests/lints
**Example:**
```
Criterion: "All tests passing"
Command: cargo test
Evidence: "127 tests passed, 0 failures"
Result: ✅ Met
Criterion: "No unwrap in production"
Command: rg "\.unwrap\(\)" src/
Evidence: "No matches"
Result: ✅ Met
```
---
### G. Check Anti-Patterns
Search for each prohibited pattern from bd task:
```bash
# Example anti-patterns from task
rg "\.unwrap\(\)" src/ # If task prohibits unwrap
rg "TODO" src/ # If task prohibits untracked TODOs
rg "\.skip\(\)" tests/ # If task prohibits skipped tests
```
---
### H. Verify Key Considerations
Read code to confirm edge cases handled:
- Empty input validation
- Unicode handling
- Concurrent access
- Failure modes
- Performance concerns
**Example:** Task says "Must handle empty payload" → Find validation code for empty payload.
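A hedged sketch for that example — use ripgrep to locate the candidate handling code, then confirm it by reading the full file (a grep hit alone is not evidence):
```bash
# Paths and function names are from the example above - adapt to your task
rg -n "payload" src/auth/jwt.ts          # is there any payload handling near generateToken?
rg -n "!payload|payload ==|Object.keys\(payload\)" src/auth/
# If nothing relevant turns up, record the key consideration as unaddressed
```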
---
### I. Record Findings
```markdown
### Task: bd-3 - Implement JWT authentication
#### Automated Checks
- TODOs: ✅ None
- Stubs: ✅ None
- Unsafe patterns: ❌ Found `.unwrap()` at src/auth/jwt.ts:45
- Ignored tests: ✅ None
#### Quality Gates
- Tests: ✅ Pass (127 tests)
- Formatting: ✅ Pass
- Linting: ❌ 3 warnings
- Pre-commit: ❌ Fails due to linting
#### Files Reviewed
- src/auth/jwt.ts: ⚠️ Contains `.unwrap()` at line 45
- tests/auth/jwt_test.rs: ✅ Complete
#### Code Quality
- Error Handling: ⚠️ Uses unwrap instead of proper error propagation
- Safety: ✅ Good
- Clarity: ✅ Good
- Testing: ✅ Good
#### Success Criteria
1. "All tests pass": ✅ Met - Evidence: 127 tests passed
2. "Pre-commit passes": ❌ Not met - Evidence: clippy warnings
3. "No unwrap in production": ❌ Not met - Evidence: Found at jwt.ts:45
#### Anti-Patterns
- "NO unwrap in production": ❌ Violated at src/auth/jwt.ts:45
#### Issues
**Critical:**
1. unwrap() at jwt.ts:45 - violates anti-pattern, must use proper error handling
**Important:**
2. 3 clippy warnings block pre-commit hook
```
---
### J. Mark Task Reviewed (TodoWrite)
---
## Step 3: Report Findings
After reviewing ALL tasks:
**If NO gaps:**
```markdown
## Implementation Review: APPROVED ✅
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks Reviewed
- bd-2: Configure OAuth provider ✅
- bd-3: Implement token exchange ✅
- bd-4: Add refresh logic ✅
### Verification Summary
- All success criteria verified
- No anti-patterns detected
- All key considerations addressed
- All files implemented per spec
### Evidence
- Tests: 127 passed, 0 failures (2.3s)
- Linting: No warnings
- Pre-commit: Pass
- Code review: Production-ready
Ready to proceed to hyperpowers:finishing-a-development-branch.
```
**If gaps found:**
```markdown
## Implementation Review: GAPS FOUND ❌
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks with Gaps
#### bd-3: Implement token exchange
**Gaps:**
- ❌ Success criterion not met: "Pre-commit hooks pass"
- Evidence: cargo clippy shows 3 warnings
- ❌ Anti-pattern violation: Found `.unwrap()` at src/auth/jwt.ts:45
- ⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
#### bd-4: Add refresh logic
**Gaps:**
- ❌ Success criterion not met: "All tests passing"
- Evidence: test_verify_expired_token failing
### Cannot Proceed
Implementation does not match spec. Fix gaps before completing.
```
---
## Step 4: Gate Decision
**If APPROVED:**
```
Announce: "I'm using hyperpowers:finishing-a-development-branch to complete this work."
Use Skill tool: hyperpowers:finishing-a-development-branch
```
**If GAPS FOUND:**
```
STOP. Do not proceed to finishing-a-development-branch.
Fix gaps or discuss with partner.
Re-run review after fixes.
```
</the_process>
<examples>
<example>
<scenario>Developer only checks git diff, doesn't read actual files</scenario>
<code>
# Review process
git diff main...HEAD # Shows changes
# Developer sees:
+ function generateToken(payload) {
+ return jwt.sign(payload, secret);
+ }
# Approves based on diff
"Looks good, token generation implemented ✅"
# Misses: Full context shows no validation
function generateToken(payload) {
// No validation of payload!
// No check for empty payload (key consideration)
// No error handling if jwt.sign fails
return jwt.sign(payload, secret);
}
</code>
<why_it_fails>
- Git diff shows additions, not full context
- Missed that empty payload not validated (key consideration)
- Missed that error handling missing (quality issue)
- False approval - gaps exist but not caught
- Will fail in production when empty payload passed
</why_it_fails>
<correction>
**Correct review process:**
```bash
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
```
**Reading full file reveals:**
```javascript
function generateToken(payload) {
// Missing: empty payload check (key consideration from bd task)
// Missing: error handling for jwt.sign failure
return jwt.sign(payload, secret);
}
```
**Record in findings:**
```
⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
- Code at src/auth/jwt.ts:15-17
⚠️ Error handling: jwt.sign can throw, not handled
```
**What you gain:**
- Caught gaps that git diff missed
- Full context reveals missing validation
- Quality issues identified before production
- Spec compliance verified, not assumed
</correction>
</example>
<example>
<scenario>Developer assumes tests passing means done</scenario>
<code>
# Run tests
cargo test
# Output: 127 tests passed
# Developer concludes
"Tests pass, implementation complete ✅"
# Proceeds to finishing-a-development-branch
# Misses:
- bd task has 5 success criteria
- Only checked 1 (tests pass)
- Anti-pattern: unwrap() present (prohibited)
- Key consideration: Unicode handling not tested
- Linter has warnings (blocks pre-commit)
</code>
<why_it_fails>
- Tests passing ≠ spec compliance
- Didn't verify all success criteria
- Didn't check anti-patterns
- Didn't verify key considerations
- Pre-commit will fail (blocks merge)
- Ships code violating anti-patterns
</why_it_fails>
<correction>
**Correct review checks ALL criteria:**
```markdown
bd task has 5 success criteria:
1. "All tests pass" ✅ - Evidence: 127 passed
2. "Pre-commit passes" ❌ - Evidence: clippy warns (3 warnings)
3. "No unwrap in production" ❌ - Evidence: Found at jwt.ts:45
4. "Unicode handling tested" ⚠️ - Need to verify test exists
5. "Rate limiting implemented" ⚠️ - Need to check code
Result: 1/5 criteria verified met. GAPS EXIST.
```
**Run additional checks:**
```bash
# Check criterion 2
cargo clippy
# 3 warnings found ❌
# Check criterion 3
rg "\.unwrap\(\)" src/
# src/auth/jwt.ts:45 ❌
# Check criterion 4
rg "unicode" tests/
# No matches ⚠️ Need to verify
```
**Decision: GAPS FOUND, cannot proceed**
**What you gain:**
- Verified ALL criteria, not just tests
- Caught anti-pattern violations
- Caught pre-commit blockers
- Prevented shipping non-compliant code
- Spec contract honored completely
</correction>
</example>
<example>
<scenario>Developer rationalizes skipping rigor for "simple" task</scenario>
<code>
bd task: "Add logging to error paths"
# Developer thinks: "Simple task, just added console.log"
# Skips:
- Automated checks (assumes no issues)
- Code quality review (seems obvious)
- Full success criteria verification
# Approves quickly:
"Logging added ✅"
# Misses:
- console.log used instead of proper logger (anti-pattern)
- Only added to 2 of 5 error paths (incomplete)
- No test verifying logs actually output (criterion)
- Logs contain sensitive data (security issue)
</code>
<why_it_fails>
- "Simple" tasks have hidden complexity
- Skipped rigor catches exactly these issues
- Incomplete implementation (2/5 paths)
- Security vulnerability shipped
- Anti-pattern not caught
- Failed success criterion (test logs)
</why_it_fails>
<correction>
**Follow full review process:**
```bash
# Automated checks
rg "console\.log" src/
# Found at error-handler.ts:12, 15 ⚠️
# Read bd task
bd show bd-5
# Success criteria:
# 1. "All error paths logged"
# 2. "No sensitive data in logs"
# 3. "Test verifies log output"
# Check criterion 1
grep -n "throw new Error" src/
# 5 locations found
# Only 2 have logging ❌ Incomplete
# Check criterion 2
Read tool: src/error-handler.ts
# Logs contain password field ❌ Security issue
# Check criterion 3
rg "test.*log" tests/
# No matches ❌ Test missing
```
**Decision: GAPS FOUND**
- Incomplete (3/5 error paths missing logs)
- Security issue (logs password)
- Anti-pattern (console.log instead of logger)
- Missing test
**What you gain:**
- "Simple" task revealed multiple gaps
- Security vulnerability caught pre-production
- Rigor prevents incomplete work shipping
- All criteria must be met, no exceptions
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Review every task** → No skipping "simple" tasks
2. **Run all automated checks** → TODOs, stubs, unwrap, ignored tests
3. **Read actual files with Read tool** → Not just git diff
4. **Verify every success criterion** → With evidence, not assumptions
5. **Check all anti-patterns** → Search for prohibited patterns
6. **Apply Google Fellow scrutiny** → Production-grade code review
7. **If gaps found → STOP** → Don't proceed to finishing-a-development-branch
## Common Excuses
All of these mean: **STOP. Follow full review process.**
- "Tests pass, must be complete" (Tests ≠ spec, check all criteria)
- "I implemented it, it's done" (Implementation ≠ compliance, verify)
- "No time for thorough review" (Gaps later cost more than review now)
- "Looks good to me" (Opinion ≠ evidence, run verifications)
- "Small gaps don't matter" (Spec is contract, all criteria matter)
- "Will fix in next PR" (This PR completes this epic, fix now)
- "Can check diff instead of files" (Diff shows changes, not context)
- "Automated checks cover it" (Checks + code review both required)
- "Success criteria passing means done" (Also check anti-patterns, quality, edge cases)
</critical_rules>
<verification_checklist>
Before approving implementation:
**Per task:**
- [ ] Read bd task specification completely
- [ ] Ran all automated checks (TODOs, stubs, unwrap, ignored tests)
- [ ] Ran all quality gates via test-runner agent (tests, format, lint, pre-commit)
- [ ] Read actual implementation files with Read tool (not just diff)
- [ ] Reviewed code quality with Google Fellow perspective
- [ ] Verified every success criterion with evidence
- [ ] Checked every anti-pattern (searched for prohibited patterns)
- [ ] Verified every key consideration addressed in code
**Overall:**
- [ ] Reviewed ALL tasks (no exceptions)
- [ ] TodoWrite tracker shows all tasks reviewed
- [ ] Compiled findings (approved or gaps)
- [ ] If approved: all criteria met for all tasks
- [ ] If gaps: documented exactly what missing
**Can't check all boxes?** Return to Step 2 and complete review.
</verification_checklist>
<integration>
**This skill is called by:**
- hyperpowers:executing-plans (Step 5, after all tasks executed)
**This skill calls:**
- hyperpowers:finishing-a-development-branch (if approved)
- hyperpowers:test-runner agent (for quality gates)
**This skill uses:**
- hyperpowers:verification-before-completion principles (evidence before claims)
**Call chain:**
```
hyperpowers:executing-plans → hyperpowers:review-implementation → hyperpowers:finishing-a-development-branch
(if gaps: STOP)
```
**CRITICAL:** Use bd commands (bd show, bd list, bd dep tree), never read `.beads/issues.jsonl` directly.
</integration>
<resources>
**Detailed guides:**
- [Code quality standards by language](resources/quality-standards.md)
- [Common anti-patterns to check](resources/anti-patterns-reference.md)
- [Production readiness checklist](resources/production-checklist.md)
**When stuck:**
- Unsure if gap critical → If violates criterion, it's a gap
- Criteria ambiguous → Ask user for clarification before approving
- Anti-pattern unclear → Search for it, document if found
- Quality concern → Document as gap, don't rationalize away
</resources>


@@ -0,0 +1,566 @@
---
name: root-cause-tracing
description: Use when errors occur deep in execution - traces bugs backward through call stack to find original trigger, not just symptom
---
<skill_overview>
Bugs manifest deep in the call stack; trace backward until you find the original trigger, then fix at source, not where error appears.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Follow the backward tracing process strictly, but adapt instrumentation and debugging techniques to your language and tools.
</rigidity_level>
<quick_reference>
| Step | Action | Question |
|------|--------|----------|
| 1 | Read error completely | What failed and where? |
| 2 | Find immediate cause | What code directly threw this? |
| 3 | Trace backward one level | What called this code? |
| 4 | Keep tracing up stack | What called that? |
| 5 | Find where bad data originated | Where was invalid value created? |
| 6 | Fix at source | Address root cause |
| 7 | Add defense at each layer | Validate assumptions as backup |
**Core rule:** Never fix just where error appears. Fix where problem originates.
</quick_reference>
<when_to_use>
- Error happens deep in execution (not at entry point)
- Stack trace shows long call chain
- Unclear where invalid data originated
- Need to find which test/code triggers problem
- Error message points to utility/library code
**Example symptoms:**
- "Database rejects empty string" ← Where did empty string come from?
- "File not found: ''" ← Why is path empty?
- "Invalid argument to function" ← Who passed invalid argument?
- "Null pointer dereference" ← What should have been initialized?
</when_to_use>
<the_process>
## 1. Observe the Symptom
Read the complete error:
```
Error: Invalid email format: ""
at validateEmail (validator.ts:42)
at UserService.create (user-service.ts:18)
at ApiHandler.createUser (api-handler.ts:67)
at HttpServer.handleRequest (server.ts:123)
at TestCase.test_create_user (user.test.ts:10)
```
**Symptom:** Email validation fails on empty string
**Location:** Deep in validator utility
**DON'T fix here yet.** This might be symptom, not source.
---
## 2. Find Immediate Cause
What code directly causes this?
```typescript
// validator.ts:42
function validateEmail(email: string): boolean {
if (!email) throw new Error(`Invalid email format: "${email}"`);
return EMAIL_REGEX.test(email);
}
```
**Question:** Why is email empty? Keep tracing.
---
## 3. Trace Backward: What Called This?
Use stack trace:
```typescript
// user-service.ts:18
create(request: UserRequest): User {
validateEmail(request.email); // Called with request.email = ""
// ...
}
```
**Question:** Why is `request.email` empty? Keep tracing.
---
## 4. Keep Tracing Up the Stack
```typescript
// api-handler.ts:67
async createUser(req: Request): Promise<Response> {
const userRequest = {
name: req.body.name,
email: req.body.email || "", // ← FOUND IT!
};
return this.userService.create(userRequest);
}
```
**Root cause found:** API handler provides default empty string when email missing.
---
## 5. Identify the Pattern
**Why empty string as default?**
- Misguided "safety": Thought empty string better than undefined
- Should reject invalid request at API boundary
- Downstream code assumes data already validated
---
## 6. Fix at Source
```typescript
// api-handler.ts (SOURCE FIX)
async createUser(req: Request): Promise<Response> {
if (!req.body.email) {
return Response.badRequest("Email is required");
}
const userRequest = {
name: req.body.name,
email: req.body.email, // No default, already validated
};
return this.userService.create(userRequest);
}
```
---
## 7. Add Defense in Depth
After fixing source, add validation at each layer as backup:
```typescript
// Layer 1: API - Reject invalid input (PRIMARY FIX)
if (!req.body.email) return Response.badRequest("Email required");
// Layer 2: Service - Validate assumptions
assert(request.email, "email must be present");
// Layer 3: Utility - Defensive check
if (!email) throw new Error("invariant violated: email empty");
```
**Primary fix at source. Defense is backup, not replacement.**
</the_process>
<debugging_approaches>
## Option 1: Guide User Through Debugger
**IMPORTANT:** Claude cannot run interactive debuggers. Guide user through debugger commands.
```
"Let's use lldb to trace backward through the call stack.
Please run these commands:
lldb target/debug/myapp
(lldb) breakpoint set --file validator.rs --line 42
(lldb) run
When breakpoint hits:
(lldb) frame variable email # Check value here
(lldb) bt # See full call stack
(lldb) up # Move to caller
(lldb) frame variable request # Check values in caller
(lldb) up # Move up again
(lldb) frame variable # Where empty string created?
Please share:
1. Value of 'email' at validator.rs:42
2. Value of 'request.email' in user_service.rs
3. Value of 'req.body.email' in api_handler.rs
4. Where does empty string first appear?"
```
---
## Option 2: Add Instrumentation (Claude CAN Do This)
When debugger not available or issue intermittent:
```rust
// Add at error location
fn validate_email(email: &str) -> Result<()> {
eprintln!("DEBUG validate_email called:");
eprintln!(" email: {:?}", email);
eprintln!(" backtrace: {}", std::backtrace::Backtrace::capture());
if email.is_empty() {
return Err(Error::InvalidEmail);
}
// ...
}
```
**Critical:** Use `eprintln!()` or `console.error()` in tests (not logger - may be suppressed).
**Run and analyze:**
```bash
cargo test 2>&1 | grep "DEBUG validate_email" -A 10
```
Look for:
- Test file names in backtraces
- Line numbers triggering the call
- Patterns (same test? same parameter?)
</debugging_approaches>
<finding_polluting_tests>
## Finding Which Test Pollutes
When something appears during tests but you don't know which:
**Binary search approach:**
```bash
# Run half the tests
npm test tests/first-half/*.test.ts
# Pollution appears? Yes → in first half, No → second half
# Subdivide
npm test tests/first-quarter/*.test.ts
# Continue until specific file
npm test tests/auth/login.test.ts ← Found it!
```
**Or test isolation:**
```bash
# Run tests one at a time
for test in tests/**/*.test.ts; do
echo "Testing: $test"
npm test "$test"
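# Pollution check: a stray .git dir is the marker from the example below - adapt to your symptom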
if [ -d .git ]; then
echo "FOUND POLLUTER: $test"
break
fi
done
```
</finding_polluting_tests>
<examples>
<example>
<scenario>Developer fixes symptom, not source</scenario>
<code>
# Error appears in git utility:
fn git_init(directory: &str) {
Command::new("git")
.arg("init")
.current_dir(directory)
.run()
}
# Error: "Invalid argument: empty directory"
# Developer adds validation at symptom:
fn git_init(directory: &str) {
if directory.is_empty() {
panic!("Directory cannot be empty"); // Band-aid
}
Command::new("git").arg("init").current_dir(directory).run()
}
</code>
<why_it_fails>
- Fixes symptom, not source (where empty string created)
- Same bug will appear elsewhere directory is used
- Doesn't explain WHY directory was empty
- Future code might make same mistake
- Band-aid hides the real problem
</why_it_fails>
<correction>
**Trace backward:**
1. git_init called with directory=""
2. WorkspaceManager.init(projectDir="")
3. Session.create(projectDir="")
4. Test: Project.create(context.tempDir)
5. **SOURCE:** context.tempDir="" (accessed before beforeEach!)
**Fix at source:**
```typescript
function setupTest() {
let _tempDir: string | undefined;
return {
beforeEach() {
_tempDir = makeTempDir();
},
get tempDir(): string {
if (!_tempDir) {
throw new Error("tempDir accessed before beforeEach!");
}
return _tempDir;
}
};
}
```
**What you gain:**
- Fixes actual bug (test timing issue)
- Prevents same mistake elsewhere
- Clear error at source, not deep in stack
- No empty strings propagating through system
</correction>
</example>
<example>
<scenario>Developer stops tracing too early</scenario>
<code>
# Error in API handler
async createUser(req: Request): Promise<Response> {
const userRequest = {
name: req.body.name,
email: req.body.email || "", // Suspicious!
};
return this.userService.create(userRequest);
}
# Developer sees empty string default and "fixes" it:
email: req.body.email || "noreply@example.com"
# Ships to production
# Bug: Users created without email input get noreply@example.com
# Database has fake emails, can't distinguish missing from real
</code>
<why_it_fails>
- Stopped at first suspicious code
- Didn't question WHY empty string was default
- "Fixed" by replacing with different wrong default
- Root cause: shouldn't accept missing email at all
- Validation should happen at API boundary
</why_it_fails>
<correction>
**Keep tracing to understand intent:**
1. Why was empty string default?
2. Should email be optional or required?
3. What does API spec say?
4. What does database schema say?
**Findings:**
- Email column is NOT NULL in database
- API docs say email is required
- Empty string was workaround, not design
**Fix at source (validate at boundary):**
```typescript
async createUser(req: Request): Promise<Response> {
// Validate at API boundary
if (!req.body.email) {
return Response.badRequest("Email is required");
}
const userRequest = {
name: req.body.name,
email: req.body.email, // No default needed
};
return this.userService.create(userRequest);
}
```
**What you gain:**
- Validates at correct layer (API boundary)
- Clear error message to client
- No invalid data propagates downstream
- Database constraints enforced
- Matches API specification
</correction>
</example>
<example>
<scenario>Complex multi-layer trace to find original trigger</scenario>
<code>
# Problem: .git directory appearing in source code directory during tests
# Symptom location:
Error: Cannot initialize git repo (repo already exists)
Location: src/workspace/git.rs:45
# Developer adds check:
if Path::new(".git").exists() {
return Err("Git already initialized");
}
# Doesn't help - still appears in wrong place!
</code>
<why_it_fails>
- Detects symptom, doesn't prevent it
- .git still created in wrong directory
- Doesn't explain HOW it gets there
- Pollution still happens, just detected
</why_it_fails>
<correction>
**Trace through multiple layers:**
```
1. git init runs with cwd=""
↓ Why is cwd empty?
2. WorkspaceManager.init(projectDir="")
↓ Why is projectDir empty?
3. Session.create(projectDir="")
↓ Why was empty string passed?
4. Test: Project.create(context.tempDir)
↓ Why is context.tempDir empty?
5. ROOT CAUSE:
const context = setupTest(); // tempDir="" initially
Project.create(context.tempDir); // Accessed at top level!
beforeEach(() => {
context.tempDir = makeTempDir(); // Assigned here
});
TEST ACCESSED TEMPDIR BEFORE BEFOREEACH RAN!
```
**Fix at source (make early access impossible):**
```typescript
function setupTest() {
let _tempDir: string | undefined;
return {
beforeEach() {
_tempDir = makeTempDir();
},
get tempDir(): string {
if (!_tempDir) {
throw new Error("tempDir accessed before beforeEach!");
}
return _tempDir;
}
};
}
```
**Then add defense at each layer:**
```rust
// Layer 1: Test framework (PRIMARY FIX)
// Getter throws if accessed early
// Layer 2: Project validation
fn create(directory: &str) -> Result<Self> {
if directory.is_empty() {
return Err("Directory cannot be empty");
}
// ...
}
// Layer 3: Workspace validation
fn init(path: &Path) -> Result<()> {
if !path.exists() {
return Err("Path must exist");
}
// ...
}
// Layer 4: Environment guard
fn git_init(dir: &Path) -> Result<()> {
if env::var("NODE_ENV") == Ok("test".to_string()) {
if !dir.starts_with("/tmp") {
panic!("Refusing to git init outside test dir");
}
}
// ...
}
```
**What you gain:**
- Primary fix prevents early access (source)
- Each layer validates assumptions (defense)
- Clear error at source, not deep in stack
- Environment guard keeps tests from polluting real directories
- Multi-layer defense catches future mistakes
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Never fix just where error appears** → Trace backward to find source
2. **Don't stop at first suspicious code** → Keep tracing to original trigger
3. **Fix at source first** → Defense is backup, not primary fix
4. **Use debugger OR instrumentation** → Don't guess at call chain
5. **Add defense at each layer** → After fixing source, validate assumptions throughout
## Common Excuses
All of these mean: **STOP. Trace backward to find source.**
- "Error is obvious here, I'll add validation" (That's a symptom fix)
- "Stack trace shows the problem" (Shows symptom location, not source)
- "This code should handle empty values" (Why is value empty? Find source.)
- "Too deep to trace, I'll add defensive check" (Defense without source fix = band-aid)
- "Multiple places could cause this" (Trace to find which one actually does)
</critical_rules>
<verification_checklist>
Before claiming root cause fixed:
- [ ] Traced backward through entire call chain
- [ ] Found where invalid data was created (not just passed)
- [ ] Identified WHY invalid data was created (pattern/assumption)
- [ ] Fixed at source (where bad data originates)
- [ ] Added defense at each layer (validate assumptions)
- [ ] Verified fix with test (reproduces original bug, passes with fix)
- [ ] Confirmed no other code paths have same pattern
**Can't check all boxes?** Keep tracing backward.
</verification_checklist>
<integration>
**This skill is called by:**
- hyperpowers:debugging-with-tools (Phase 2: Trace Backward Through Call Stack)
- When errors occur deep in execution
- When unclear where invalid data originated
**This skill requires:**
- Stack traces or debugger access
- Ability to add instrumentation (logging)
- Understanding of call chain
**This skill calls:**
- hyperpowers:test-driven-development (write regression test after finding source)
- hyperpowers:verification-before-completion (verify fix works)
</integration>
<resources>
**Detailed guides:**
- [Debugger commands by language](resources/debugger-reference.md)
- [Instrumentation patterns](resources/instrumentation-patterns.md)
- [Defense-in-depth examples](resources/defense-patterns.md)
**When stuck:**
- Can't find source → Add instrumentation at each layer, run test
- Stack trace unclear → Use debugger to inspect variables at each frame
- Multiple suspects → Add instrumentation to all, find which actually executes
- Intermittent issue → Add instrumentation and wait for reproduction
</resources>

View File

@@ -0,0 +1,399 @@
---
name: skills-auto-activation
description: Use when skills aren't activating reliably - covers official solutions (better descriptions) and custom hook system for deterministic skill activation
---
<skill_overview>
Skills often don't activate despite keywords; make activation reliable through better descriptions, explicit triggers, or custom hooks.
</skill_overview>
<rigidity_level>
HIGH FREEDOM - Choose solution level based on project needs (Level 1 for simple, Level 3 for complex). Hook implementation is flexible pattern, not rigid process.
</rigidity_level>
<quick_reference>
| Level | Solution | Effort | Reliability | When to Use |
|-------|----------|--------|-------------|-------------|
| 1 | Better descriptions + explicit requests | Low | Moderate | Small projects, starting out |
| 2 | CLAUDE.md references | Low | Moderate | Document patterns |
| 3 | Custom hook system | High | Very High | Large projects, established patterns |
**Hyperpowers includes:** Auto-activation hook at `hooks/user-prompt-submit/10-skill-activator.js`
</quick_reference>
<when_to_use>
Use this skill when:
- Skills you created aren't being used automatically
- Need consistent skill activation across sessions
- Large codebases with established patterns
- Manual "/use skill-name" gets tedious
**Prerequisites:**
- Skills properly configured (name, description, SKILL.md)
- Code execution enabled (Settings > Capabilities)
- Skills toggled on (Settings > Capabilities)
</when_to_use>
<the_problem>
## What Users Experience
**Symptoms:**
- Keywords from skill descriptions present → skill not used
- Working on files that should trigger skills → nothing
- Skills exist but sit unused
**Community reports:**
- GitHub Issue #9954: "Skills not available even if explicitly enabled"
- "Claude knows it should use skills, but it's not reliable"
- Skills activation is "not reliable yet"
**Root cause:** Skills rely on Claude recognizing relevance (not deterministic)
</the_problem>
<solution_levels>
## Level 1: Official Solutions (Start Here)
### 1. Improve Skill Descriptions
**Bad:**
```yaml
name: backend-dev
description: Helps with backend development
```
**Good:**
```yaml
name: backend-dev-guidelines
description: Use when creating API routes, controllers, services, or repositories in backend - enforces TypeScript patterns, error handling with Sentry, and Prisma repository pattern
```
**Key elements:**
- Specific keywords: "API routes", "controllers", "services"
- When to use: "Use when creating..."
- What it enforces: Patterns, error handling
### 2. Be Explicit in Requests
Instead of: "How do I create an endpoint?"
Try: "Use my backend-dev-guidelines skill to create an endpoint"
**Result:** Works, but tedious
### 3. Check Settings
- Settings > Capabilities > Enable code execution
- Settings > Capabilities > Toggle Skills on
- Team/Enterprise: Check org-level settings
---
## Level 2: Skill References (Moderate)
Reference skills in CLAUDE.md:
```markdown
## When Working on Backend
Before making changes:
1. Check `/skills/backend-dev-guidelines` for patterns
2. Follow repository pattern for database access
The backend-dev-guidelines skill contains complete examples.
```
**Pros:** No custom code
**Cons:** Claude still might not check
---
## Level 3: Custom Hook System (Advanced)
**How it works:**
1. UserPromptSubmit hook analyzes prompt before Claude sees it
2. Matches keywords, intent patterns, file paths
3. Injects skill activation reminder into context
4. Claude sees "🎯 USE these skills" before processing
**Result:** "Night and day difference" - skills consistently used
### Architecture
```
User submits prompt
UserPromptSubmit hook intercepts
Analyze prompt (keywords, intent, files)
Check skill-rules.json for matches
Inject activation reminder
Claude sees: "🎯 USE these skills: ..."
Claude loads and uses relevant skills
```
### Configuration: skill-rules.json
```json
{
"backend-dev-guidelines": {
"type": "domain",
"enforcement": "suggest",
"priority": "high",
"promptTriggers": {
"keywords": ["backend", "controller", "service", "API", "endpoint"],
"intentPatterns": [
"(create|add|build).*?(route|endpoint|controller|service)",
"(how to|pattern).*?(backend|API)"
]
},
"fileTriggers": {
"pathPatterns": ["backend/src/**/*.ts", "server/**/*.ts"],
"contentPatterns": ["express\\.Router", "export.*Controller"]
}
},
"test-driven-development": {
"type": "process",
"enforcement": "suggest",
"priority": "high",
"promptTriggers": {
"keywords": ["test", "TDD", "testing"],
"intentPatterns": [
"(write|add|create).*?(test|spec)",
"test.*(first|before|TDD)"
]
},
"fileTriggers": {
"pathPatterns": ["**/*.test.ts", "**/*.spec.ts"],
"contentPatterns": ["describe\\(", "it\\(", "test\\("]
}
}
}
```
### Trigger Types
1. **Keyword Triggers** - Simple string matching (case insensitive)
2. **Intent Pattern Triggers** - Regex for actions + objects
3. **File Path Triggers** - Glob patterns for file paths
4. **Content Pattern Triggers** - Regex in file content
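A minimal sketch of how these four trigger types could be evaluated against a single rule from skill-rules.json. The rule shape matches the config above, but `ruleMatches` and `editedFiles` are illustrative assumptions (you would need your own way of tracking which files are in play), and the actual hook below only implements the first two trigger types:

```javascript
// Sketch only: evaluates all four trigger types for one rule.
// `rule` follows the skill-rules.json shape above; `editedFiles` is hypothetical.
const fs = require('fs');

function ruleMatches(rule, promptText, editedFiles = []) {
  const lower = promptText.toLowerCase();
  const prompt = rule.promptTriggers || {};
  const files = rule.fileTriggers || {};

  // 1. Keyword triggers: case-insensitive substring match
  if ((prompt.keywords || []).some(k => lower.includes(k.toLowerCase()))) return true;

  // 2. Intent pattern triggers: regex over the raw prompt text
  if ((prompt.intentPatterns || []).some(p => new RegExp(p, 'i').test(promptText))) return true;

  // 3. File path triggers: glob patterns against edited file paths
  const globToRegex = g => new RegExp('^' + g
    .replace(/\./g, '\\.')
    .replace(/\*\*/g, '___DS___')
    .replace(/\*/g, '[^/]*')
    .replace(/___DS___/g, '.*')
    .replace(/\?/g, '.') + '$');
  if ((files.pathPatterns || []).some(g => editedFiles.some(f => globToRegex(g).test(f)))) return true;

  // 4. Content pattern triggers: regex over file contents (most expensive, use sparingly)
  return (files.contentPatterns || []).some(p =>
    editedFiles.some(f => {
      try { return new RegExp(p).test(fs.readFileSync(f, 'utf8')); } catch { return false; }
    })
  );
}
```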
### Hook Implementation (High-Level)
```javascript
#!/usr/bin/env node
// ~/.claude/hooks/user-prompt-submit/skill-activator.js
const fs = require('fs');
const path = require('path');
// Load skill rules
const rules = JSON.parse(fs.readFileSync(
path.join(process.env.HOME, '.claude/skill-rules.json'), 'utf8'
));
// Read prompt from stdin
let promptData = '';
process.stdin.on('data', chunk => promptData += chunk);
process.stdin.on('end', () => {
const prompt = JSON.parse(promptData);
// Analyze prompt for skill matches
const activatedSkills = analyzePrompt(prompt.text);
if (activatedSkills.length > 0) {
// Inject skill activation reminder
const context = `
🎯 SKILL ACTIVATION CHECK
Relevant skills for this prompt:
${activatedSkills.map(s => `- **${s.skill}** (${s.priority} priority)`).join('\n')}
Check if these skills should be used before responding.
`;
console.log(JSON.stringify({
decision: 'approve',
additionalContext: context
}));
} else {
console.log(JSON.stringify({ decision: 'approve' }));
}
});
function analyzePrompt(text) {
// Match against all skill rules
// Return list of activated skills with priorities
}
```
**For complete working implementation:** See [resources/hook-implementation.md](resources/hook-implementation.md)
### Progressive Enhancement
**Phase 1 (Week 1):** Basic keyword matching
```json
{"keywords": ["backend", "API", "controller"]}
```
**Phase 2 (Week 2):** Add intent patterns
```json
{"intentPatterns": ["(create|add).*?(route|endpoint)"]}
```
**Phase 3 (Week 3):** Add file triggers
```json
{"fileTriggers": {"pathPatterns": ["backend/**/*.ts"]}}
```
**Phase 4 (Ongoing):** Refine based on observation
</solution_levels>
<results>
### Before Hook System
- Skills sit unused despite perfect keywords
- Manual "/use skill-name" every time
- Inconsistent patterns across codebase
- Time spent fixing "creative interpretations"
### After Hook System
- Skills activate automatically and reliably
- Consistent patterns enforced
- Claude self-checks before showing code
- "Night and day difference"
**Real user:** "Skills went from 'expensive decorations' to actually useful"
</results>
<limitations>
## Hook System Limitations
1. **Requires hook system** - Not built into Claude Code
2. **Maintenance overhead** - skill-rules.json needs updates
3. **May over-activate** - Too many skills overwhelm context
4. **Not perfect** - Still relies on Claude using activated skills
## Considerations
**Token usage:**
- Activation reminder adds ~50-100 tokens per prompt
- Multiple skills add more tokens
- Use priorities to limit activation
**Performance:**
- Hook adds ~100-300ms to prompt processing
- Acceptable for quality improvement
- Optimize regex patterns if slow
**Maintenance:**
- Update rules when adding new skills
- Review activation logs monthly
- Refine patterns based on misses
</limitations>
<alternatives>
## Approach 1: MCP Integration
Use Model Context Protocol to provide skills as context.
**Pros:** Built into Claude system
**Cons:** Still not deterministic, same activation issues
## Approach 2: Custom System Prompt
Modify Claude's system prompt to always check certain skills.
**Pros:** Works without hooks
**Cons:** Limited to Pro plan, can't customize per-project
## Approach 3: Manual Discipline
Always explicitly request skill usage.
**Pros:** No setup required
**Cons:** Tedious, easy to forget, doesn't scale
## Approach 4: Skill Consolidation
Combine all guidelines into CLAUDE.md.
**Pros:** Always loaded
**Cons:** Violates progressive disclosure, wastes tokens
**Recommendation:** Level 3 (hooks) for large projects, Level 1 for smaller projects
</alternatives>
<critical_rules>
## Rules That Have No Exceptions
1. **Try Level 1 first** → Better descriptions and explicit requests before building hooks
2. **Observe before building** → Watch which prompts should activate skills
3. **Start with keywords** → Add complexity incrementally (keywords → intent → files)
4. **Keep hook fast (<1 second)** → Don't block prompt processing
5. **Maintain skill-rules.json** → Update when skills change
## Common Excuses
All of these mean: **Try Level 1 first, then decide.**
- "Skills should just work automatically" (They should, but don't reliably - workaround needed)
- "Hook system too complex" (Setup takes 2 hours, saves hundreds of hours)
- "I'll manually specify skills" (You'll forget, it gets tedious)
- "Improving descriptions will fix it" (Helps, but not deterministic)
- "This is overkill" (Maybe - start Level 1, upgrade if needed)
</critical_rules>
<verification_checklist>
Before building hook system:
- [ ] Tried improving skill descriptions (Level 1)
- [ ] Tried explicit skill requests (Level 1)
- [ ] Checked all settings are enabled
- [ ] Observed which prompts should activate skills
- [ ] Identified patterns in failures
- [ ] Project large enough to justify hook overhead
- [ ] Have time for 2-hour setup + ongoing maintenance
**If Level 1 works:** Don't build hook system
**If Level 1 insufficient:** Build hook system (Level 3)
</verification_checklist>
<integration>
**This skill covers:** Skill activation strategies
**Related skills:**
- hyperpowers:building-hooks (how to build hook system)
- hyperpowers:using-hyper (when to use skills generally)
- hyperpowers:writing-skills (creating skills that activate well)
**This skill enables:**
- Consistent enforcement of patterns
- Automatic guideline checking
- Reliable skill usage across sessions
**Hyperpowers includes:** Auto-activation hook at `hooks/user-prompt-submit/10-skill-activator.js`
</integration>
<resources>
**Detailed implementation:**
- [Complete working hook code](resources/hook-implementation.md)
- [skill-rules.json examples](resources/skill-rules-examples.md)
- [Troubleshooting guide](resources/troubleshooting.md)
**Official documentation:**
- [Anthropic Skills Best Practices](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices)
- [Claude Code Hooks Guide](https://docs.claude.com/en/docs/claude-code/hooks-guide)
**When stuck:**
- Skills still not activating → Check Settings > Capabilities
- Hook not working → Check ~/.claude/logs/hooks.log
- Over-activation → Reduce keywords, increase priority thresholds
- Under-activation → Add more keywords, broaden intent patterns
</resources>

View File

@@ -0,0 +1,655 @@
# Complete Hook Implementation for Skills Auto-Activation
This guide provides complete, production-ready code for implementing skills auto-activation using Claude Code hooks.
## Complete File Structure
```
~/.claude/
├── hooks/
│ └── user-prompt-submit/
│ └── skill-activator.js # Main hook script
├── skill-rules.json # Skill activation rules
└── hooks.json # Hook configuration
```
## Step 1: Create skill-rules.json
**Location:** `~/.claude/skill-rules.json`
```json
{
"backend-dev-guidelines": {
"type": "domain",
"enforcement": "suggest",
"priority": "high",
"promptTriggers": {
"keywords": [
"backend",
"controller",
"service",
"repository",
"API",
"endpoint",
"route",
"middleware",
"database",
"prisma",
"sequelize"
],
"intentPatterns": [
"(create|add|build|implement).*?(route|endpoint|controller|service|repository)",
"(how to|best practice|pattern|guide).*?(backend|API|database|server)",
"(setup|configure|initialize).*?(database|ORM|API)",
"implement.*(authentication|authorization|auth|security)",
"(error|exception).*(handling|catching|logging)"
]
},
"fileTriggers": {
"pathPatterns": [
"backend/**/*.ts",
"backend/**/*.js",
"server/**/*.ts",
"api/**/*.ts",
"src/controllers/**",
"src/services/**",
"src/repositories/**"
],
"contentPatterns": [
"express\\.Router",
"export.*Controller",
"export.*Service",
"export.*Repository",
"prisma\\.",
"@Controller",
"@Injectable"
]
}
},
"frontend-dev-guidelines": {
"type": "domain",
"enforcement": "suggest",
"priority": "high",
"promptTriggers": {
"keywords": [
"frontend",
"component",
"react",
"UI",
"layout",
"page",
"view",
"hooks",
"state",
"props",
"routing",
"navigation"
],
"intentPatterns": [
"(create|build|add|implement).*?(component|page|layout|view|screen)",
"(how to|pattern|best practice).*?(react|hooks|state|context|props)",
"(style|CSS|design).*?(component|layout|UI)",
"implement.*?(routing|navigation|route)",
"(state|data).*(management|flow|handling)"
]
},
"fileTriggers": {
"pathPatterns": [
"src/components/**/*.tsx",
"src/components/**/*.jsx",
"src/pages/**/*.tsx",
"src/views/**/*.tsx",
"frontend/**/*.tsx"
],
"contentPatterns": [
"import.*from ['\"]react",
"export.*function.*Component",
"export.*default.*function",
"useState",
"useEffect",
"React\\.FC"
]
}
},
"test-driven-development": {
"type": "process",
"enforcement": "suggest",
"priority": "high",
"promptTriggers": {
"keywords": [
"test",
"testing",
"TDD",
"spec",
"unit test",
"integration test",
"e2e",
"jest",
"vitest",
"mocha"
],
"intentPatterns": [
"(write|add|create|implement).*?(test|spec|unit test)",
"test.*(first|before|TDD|driven)",
"(bug|fix|issue).*?(reproduce|test)",
"(coverage|untested).*?(code|function)",
"(mock|stub|spy).*?(function|API|service)"
]
},
"fileTriggers": {
"pathPatterns": [
"**/*.test.ts",
"**/*.test.js",
"**/*.spec.ts",
"**/*.spec.js",
"**/__tests__/**",
"**/test/**"
],
"contentPatterns": [
"describe\\(",
"it\\(",
"test\\(",
"expect\\(",
"jest\\.fn",
"beforeEach\\(",
"afterEach\\("
]
}
},
"debugging-with-tools": {
"type": "process",
"enforcement": "suggest",
"priority": "medium",
"promptTriggers": {
"keywords": [
"debug",
"debugging",
"error",
"bug",
"crash",
"fails",
"broken",
"not working",
"issue",
"problem"
],
"intentPatterns": [
"(debug|fix|solve|investigate|troubleshoot).*?(error|bug|issue|problem)",
"(why|what).*?(failing|broken|not working|crashing)",
"(find|locate|identify).*?(bug|issue|problem|root cause)",
"reproduce.*(bug|issue|error)"
]
}
},
"refactoring-safely": {
"type": "process",
"enforcement": "suggest",
"priority": "medium",
"promptTriggers": {
"keywords": [
"refactor",
"refactoring",
"cleanup",
"improve",
"restructure",
"reorganize",
"simplify"
],
"intentPatterns": [
"(refactor|clean up|improve|restructure).*?(code|function|class|component)",
"(extract|split|separate).*?(function|method|component|logic)",
"(rename|move|relocate).*?(file|function|class)",
"remove.*(duplication|duplicate|repeated code)"
]
}
}
}
```
## Step 2: Create Hook Script
**Location:** `~/.claude/hooks/user-prompt-submit/skill-activator.js`
```javascript
#!/usr/bin/env node
const fs = require('fs');
const path = require('path');
// Configuration
const CONFIG = {
rulesPath: process.env.SKILL_RULES || path.join(process.env.HOME, '.claude/skill-rules.json'),
maxSkills: 3, // Limit to avoid context overload
debugMode: process.env.DEBUG === 'true'
};
// Load skill rules
function loadRules() {
try {
const content = fs.readFileSync(CONFIG.rulesPath, 'utf8');
return JSON.parse(content);
} catch (error) {
if (CONFIG.debugMode) {
console.error('Failed to load skill rules:', error.message);
}
return {};
}
}
// Read prompt from stdin
function readPrompt() {
return new Promise((resolve) => {
let data = '';
process.stdin.on('data', chunk => data += chunk);
process.stdin.on('end', () => {
try {
resolve(JSON.parse(data));
} catch (error) {
if (CONFIG.debugMode) {
console.error('Failed to parse prompt:', error.message);
}
resolve({ text: '' });
}
});
});
}
// Analyze prompt for skill matches
function analyzePrompt(promptText, rules) {
const lowerText = promptText.toLowerCase();
const activated = [];
for (const [skillName, config] of Object.entries(rules)) {
let matched = false;
let matchReason = '';
// Check keyword triggers
if (config.promptTriggers?.keywords) {
for (const keyword of config.promptTriggers.keywords) {
if (lowerText.includes(keyword.toLowerCase())) {
matched = true;
matchReason = `keyword: "${keyword}"`;
break;
}
}
}
// Check intent pattern triggers
if (!matched && config.promptTriggers?.intentPatterns) {
for (const pattern of config.promptTriggers.intentPatterns) {
try {
if (new RegExp(pattern, 'i').test(promptText)) {
matched = true;
matchReason = `intent pattern: "${pattern}"`;
break;
}
} catch (error) {
if (CONFIG.debugMode) {
console.error(`Invalid pattern "${pattern}":`, error.message);
}
}
}
}
if (matched) {
activated.push({
skill: skillName,
priority: config.priority || 'medium',
reason: matchReason,
type: config.type || 'general'
});
}
}
// Sort by priority (high > medium > low)
const priorityOrder = { high: 0, medium: 1, low: 2 };
activated.sort((a, b) => {
const priorityDiff = priorityOrder[a.priority] - priorityOrder[b.priority];
if (priorityDiff !== 0) return priorityDiff;
// Secondary sort: process types before domain types
const typeOrder = { process: 0, domain: 1, general: 2 };
return (typeOrder[a.type] || 2) - (typeOrder[b.type] || 2);
});
// Limit to max skills
return activated.slice(0, CONFIG.maxSkills);
}
// Generate activation context
function generateContext(skills) {
if (skills.length === 0) {
return null;
}
const lines = [
'',
'━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━',
'🎯 SKILL ACTIVATION CHECK',
'━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━',
'',
'Relevant skills for this prompt:',
''
];
for (const skill of skills) {
const emoji = skill.priority === 'high' ? '⭐' : skill.priority === 'medium' ? '📌' : '💡';
lines.push(`${emoji} **${skill.skill}** (${skill.priority} priority)`);
if (CONFIG.debugMode) {
lines.push(` Matched: ${skill.reason}`);
}
}
lines.push('');
lines.push('Before responding, check if any of these skills should be used.');
lines.push('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
lines.push('');
return lines.join('\n');
}
// Main execution
async function main() {
try {
// Load rules
const rules = loadRules();
if (Object.keys(rules).length === 0) {
if (CONFIG.debugMode) {
console.error('No rules loaded');
}
console.log(JSON.stringify({ decision: 'approve' }));
return;
}
// Read prompt
const prompt = await readPrompt();
if (!prompt.text || prompt.text.trim() === '') {
console.log(JSON.stringify({ decision: 'approve' }));
return;
}
// Analyze prompt
const activatedSkills = analyzePrompt(prompt.text, rules);
// Generate response
if (activatedSkills.length > 0) {
const context = generateContext(activatedSkills);
if (CONFIG.debugMode) {
console.error('Activated skills:', activatedSkills.map(s => s.skill).join(', '));
}
console.log(JSON.stringify({
decision: 'approve',
additionalContext: context
}));
} else {
if (CONFIG.debugMode) {
console.error('No skills activated');
}
console.log(JSON.stringify({ decision: 'approve' }));
}
} catch (error) {
if (CONFIG.debugMode) {
console.error('Hook error:', error.message, error.stack);
}
// Always approve on error
console.log(JSON.stringify({ decision: 'approve' }));
}
}
main();
```
## Step 3: Make Hook Executable
```bash
chmod +x ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
## Step 4: Configure Hook
**Location:** `~/.claude/hooks.json`
```json
{
"hooks": [
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/skill-activator.js",
"description": "Analyze prompt and inject skill activation reminders",
"blocking": false,
"timeout": 1000
}
]
}
```
## Step 5: Test the Hook
### Test 1: Keyword Matching
```bash
# Create test prompt
echo '{"text": "How do I create a new API endpoint?"}' | \
node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
**Expected output:**
```json
{
"additionalContext": "\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n🎯 SKILL ACTIVATION CHECK\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n\nRelevant skills for this prompt:\n\n⭐ **backend-dev-guidelines** (high priority)\n\nBefore responding, check if any of these skills should be used.\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n"
}
```
### Test 2: Intent Pattern Matching
```bash
echo '{"text": "I want to build a new React component"}' | \
node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
**Expected:** Should activate frontend-dev-guidelines
### Test 3: Multiple Skills
```bash
echo '{"text": "Write a test for the API endpoint"}' | \
node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
**Expected:** Should activate test-driven-development and backend-dev-guidelines
### Test 4: Debug Mode
```bash
echo '{"text": "How do I create a component?"}' | \
DEBUG=true node ~/.claude/hooks/user-prompt-submit/skill-activator.js 2>&1
```
**Expected:** Debug output showing which skills matched and why
## Advanced: File-Based Triggers
To add file-based triggers, extend the hook to check which files are being edited:
```javascript
// Add to skill-activator.js
// Get recently edited files from Claude Code context
function getRecentFiles(prompt) {
// Claude Code provides context about files being edited
// This would come from the prompt context or a separate tracking mechanism
return prompt.files || [];
}
// Check file triggers
function checkFileTriggers(files, config) {
if (!files || files.length === 0) return false;
if (!config.fileTriggers) return false;
// Check path patterns
if (config.fileTriggers.pathPatterns) {
for (const file of files) {
for (const pattern of config.fileTriggers.pathPatterns) {
// Convert glob pattern to regex
const regex = globToRegex(pattern);
if (regex.test(file)) {
return true;
}
}
}
}
// Check content patterns (would require reading files)
// Omitted for performance - better to check in PostToolUse hook
return false;
}
// Convert glob pattern to regex
function globToRegex(glob) {
const regex = glob
.replace(/\./g, '\\.')  // escape literal dots so "*.ts" cannot match "index_ts"
.replace(/\*\*/g, '___DOUBLE_STAR___')
.replace(/\*/g, '[^/]*')
.replace(/___DOUBLE_STAR___/g, '.*')
.replace(/\?/g, '.');
return new RegExp(`^${regex}$`);
}
```
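The content-pattern check is omitted above on purpose. If you do want it in this hook anyway, here is a rough sketch; it reuses the `fs` already required at the top of skill-activator.js and assumes the file list stays small, since every file read adds latency to prompt processing:

```javascript
// Sketch: content-pattern triggers by reading the edited files directly.
function checkContentTriggers(files, config) {
  const patterns = config.fileTriggers?.contentPatterns || [];
  if (!files || files.length === 0 || patterns.length === 0) return false;

  return files.some(file => {
    let content;
    try {
      content = fs.readFileSync(file, 'utf8');
    } catch {
      return false; // unreadable or deleted file - skip it
    }
    return patterns.some(pattern => {
      try {
        return new RegExp(pattern).test(content);
      } catch {
        return false; // invalid regex in skill-rules.json - skip it
      }
    });
  });
}
```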
## Troubleshooting
### Hook Not Running
**Check:**
```bash
# Verify hook is configured
cat ~/.claude/hooks.json
# Test hook manually
echo '{"text": "test"}' | node ~/.claude/hooks/user-prompt-submit/skill-activator.js
# Check Claude Code logs
tail -f ~/.claude/logs/hooks.log
```
### No Skills Activating
**Enable debug mode:**
```bash
DEBUG=true node ~/.claude/hooks/user-prompt-submit/skill-activator.js < test-prompt.json
```
**Common causes:**
- skill-rules.json not found or invalid
- Keywords don't match (check casing, spelling)
- Patterns have regex errors
- Hook timing out (increase timeout)
### Too Many Skills Activating
**Adjust maxSkills:**
```javascript
const CONFIG = {
maxSkills: 2, // Reduce from 3
// ...
};
```
**Or tighten triggers:**
```json
{
"backend-dev-guidelines": {
"priority": "high", // Only high priority skills
"promptTriggers": {
"keywords": ["controller", "service"], // More specific keywords
// ...
}
}
}
```
### Performance Issues
**If hook is slow (>500ms):**
1. Reduce regex complexity
2. Limit number of patterns
3. Cache compiled regex patterns
4. Profile with:
```bash
time echo '{"text": "test"}' | node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
## Maintenance
### Monthly Review
```bash
# Check activation frequency
grep "Activated skills" ~/.claude/hooks/debug.log | sort | uniq -c
# Find prompts that didn't activate any skills
grep "No skills activated" ~/.claude/hooks/debug.log
```
### Updating Rules
When adding new skills:
1. Add to skill-rules.json
2. Test activation with sample prompts
3. Observe for false positives/negatives
4. Refine patterns based on usage
### Version Control
```bash
# Track rules in git
cd ~/.claude
git init
git add skill-rules.json hooks/
git commit -m "Initial skill activation rules"
```
## Integration with Other Hooks
The skill activator can work alongside other hooks:
```json
{
"hooks": [
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/00-log-prompt.sh",
"description": "Log prompts for analysis",
"blocking": false
},
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/10-skill-activator.js",
"description": "Activate relevant skills",
"blocking": false
}
]
}
```
**Naming convention:** Use numeric prefixes (00-, 10-, 20-) to control execution order.
## Performance Benchmarks
**Target performance:**
- Keyword matching: <50ms
- Intent pattern matching: <200ms
- Total hook execution: <500ms
**Actual performance (typical):**
- 2-3 skills: ~100-300ms
- 5+ skills: ~300-500ms
If performance degrades, profile and optimize patterns.

View File

@@ -0,0 +1,443 @@
# Skill Rules Examples
Example configurations for common skill types and scenarios.
## Domain-Specific Skills
### Backend Development
```json
{
"backend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": [
"backend", "server", "API", "endpoint", "route",
"controller", "service", "repository",
"middleware", "authentication", "authorization"
],
"intentPatterns": [
"(create|build|implement|add).*?(API|endpoint|route|controller)",
"how.*(backend|server|API)",
"(setup|configure).*(server|backend|API)",
"implement.*(auth|security)"
]
},
"fileTriggers": {
"pathPatterns": [
"backend/**/*.ts",
"server/**/*.ts",
"src/api/**"
]
}
}
}
```
### Frontend Development
```json
{
"frontend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": [
"frontend", "UI", "component", "react", "vue", "angular",
"page", "layout", "view", "hooks", "state"
],
"intentPatterns": [
"(create|build).*?(component|page|layout)",
"how.*(react|hooks|state)",
"(style|design).*?(component|UI)"
]
},
"fileTriggers": {
"pathPatterns": [
"src/components/**/*.tsx",
"src/pages/**/*.tsx"
]
}
}
}
```
## Process Skills
### Test-Driven Development
```json
{
"test-driven-development": {
"type": "process",
"priority": "high",
"promptTriggers": {
"keywords": ["test", "TDD", "testing", "spec", "jest", "vitest"],
"intentPatterns": [
"(write|create|add).*?test",
"test.*first",
"reproduce.*(bug|error)"
]
},
"fileTriggers": {
"pathPatterns": [
"**/*.test.ts",
"**/*.spec.ts",
"**/__tests__/**"
]
}
}
}
```
### Code Review
```json
{
"code-review": {
"type": "process",
"priority": "medium",
"promptTriggers": {
"keywords": ["review", "check", "verify", "audit", "quality"],
"intentPatterns": [
"review.*(code|changes|implementation)",
"(check|verify).*(quality|standards|best practices)"
]
}
}
}
```
## Technology-Specific Skills
### Database/Prisma
```json
{
"database-prisma": {
"type": "technology",
"priority": "high",
"promptTriggers": {
"keywords": [
"database", "prisma", "schema", "migration",
"query", "orm", "model"
],
"intentPatterns": [
"(create|update|modify).*?(schema|model|migration)",
"(query|fetch|get).*?database"
]
},
"fileTriggers": {
"pathPatterns": [
"**/prisma/**",
"**/*.prisma"
]
}
}
}
```
### Docker/DevOps
```json
{
"devops-docker": {
"type": "technology",
"priority": "medium",
"promptTriggers": {
"keywords": [
"docker", "dockerfile", "container",
"deployment", "CI/CD", "kubernetes"
],
"intentPatterns": [
"(create|build|configure).*?(docker|container)",
"(deploy|release|publish)"
]
},
"fileTriggers": {
"pathPatterns": [
"**/Dockerfile",
"**/.github/workflows/**",
"**/docker-compose.yml"
]
}
}
}
```
## Project-Specific Skills
### Feature-Specific (E-commerce Cart)
```json
{
"cart-feature": {
"type": "feature",
"priority": "medium",
"promptTriggers": {
"keywords": [
"cart", "shopping cart", "basket",
"add to cart", "checkout"
],
"intentPatterns": [
"(implement|create|modify).*?cart",
"cart.*(functionality|feature|logic)"
]
},
"fileTriggers": {
"pathPatterns": [
"src/features/cart/**",
"backend/cart-service/**"
]
}
}
}
```
## Priority-Based Configuration
### High Priority (Always Check)
```json
{
"critical-security": {
"type": "security",
"priority": "high",
"enforcement": "suggest",
"promptTriggers": {
"keywords": [
"security", "vulnerability", "authentication", "authorization",
"SQL injection", "XSS", "CSRF", "password", "token"
],
"intentPatterns": [
"secur(e|ity)",
"vulnerab(le|ility)",
"(auth|password|token).*(implement|handle|store)"
]
}
}
}
```
### Medium Priority (Contextual)
```json
{
"performance-optimization": {
"type": "optimization",
"priority": "medium",
"promptTriggers": {
"keywords": [
"performance", "optimize", "slow", "cache",
"memory", "speed", "latency"
],
"intentPatterns": [
"(improve|optimize).*(performance|speed)",
"(reduce|minimize).*(latency|memory|time)"
]
}
}
}
```
### Low Priority (Optional)
```json
{
"documentation-guide": {
"type": "documentation",
"priority": "low",
"promptTriggers": {
"keywords": [
"documentation", "docs", "comments", "readme",
"docstring", "jsdoc"
],
"intentPatterns": [
"(write|update|create).*?(documentation|docs)",
"document.*(API|function|component)"
]
}
}
}
```
## Multi-Repo Configuration
For projects with multiple repositories:
```json
{
"frontend-mobile": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["mobile", "ios", "android", "react native"],
"intentPatterns": ["(create|build).*?(screen|component)"]
},
"fileTriggers": {
"pathPatterns": ["/mobile/**", "/apps/mobile/**"]
}
},
"frontend-web": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["web", "website", "react", "nextjs"],
"intentPatterns": ["(create|build).*?(page|component)"]
},
"fileTriggers": {
"pathPatterns": ["/web/**", "/apps/web/**"]
}
}
}
```
## Advanced Pattern Matching
### Negative Patterns (Exclude)
```json
{
  "backend-dev-guidelines": {
    "promptTriggers": {
      "intentPatterns": [
        "^(?!.*test).*backend"
      ]
    }
  }
}
```
Matches prompts that mention "backend" unless "test" also appears. The `^` anchor is required so the lookahead scans the whole prompt, and a plain `backend` keyword or pattern must not be listed alongside it - either one would activate the skill and defeat the exclusion.
### Compound Patterns
```json
{
"database-migration": {
"promptTriggers": {
"intentPatterns": [
"(create|generate|run).*(migration|schema change)",
"(add|remove|modify).*(column|table|index)"
]
}
}
}
```
### Context-Aware Patterns
```json
{
"error-handling": {
"promptTriggers": {
"keywords": ["error", "exception", "try", "catch"],
"intentPatterns": [
"(handle|catch|throw).*(error|exception)",
"error.*handling"
]
}
}
}
```
## Enforcement Levels
```json
{
  "critical-skill": {
    "enforcement": "block",
    "priority": "high"
  },
  "recommended-skill": {
    "enforcement": "suggest",
    "priority": "medium"
  },
  "optional-skill": {
    "enforcement": "optional",
    "priority": "low"
  }
}
```
- `block` - block if the skill is not used (future feature)
- `suggest` - suggest usage
- `optional` - mention that the skill is available
## Full Example Configuration
Complete configuration for a full-stack TypeScript project:
```json
{
"backend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["backend", "API", "endpoint", "controller", "service"],
"intentPatterns": [
"(create|add|implement).*?(API|endpoint|route|controller|service)",
"how.*(backend|server|API)"
]
},
"fileTriggers": {
"pathPatterns": ["backend/**/*.ts", "server/**/*.ts"]
}
},
"frontend-dev-guidelines": {
"type": "domain",
"priority": "high",
"promptTriggers": {
"keywords": ["frontend", "component", "react", "UI"],
"intentPatterns": ["(create|build).*?(component|page)"]
},
"fileTriggers": {
"pathPatterns": ["src/components/**/*.tsx", "src/pages/**/*.tsx"]
}
},
"test-driven-development": {
"type": "process",
"priority": "high",
"promptTriggers": {
"keywords": ["test", "TDD", "testing"],
"intentPatterns": ["(write|create).*?test", "test.*first"]
},
"fileTriggers": {
"pathPatterns": ["**/*.test.ts", "**/*.spec.ts"]
}
},
"database-prisma": {
"type": "technology",
"priority": "high",
"promptTriggers": {
"keywords": ["database", "prisma", "schema", "migration"],
"intentPatterns": ["(create|modify).*?(schema|migration)"]
},
"fileTriggers": {
"pathPatterns": ["**/prisma/**"]
}
},
"debugging-with-tools": {
"type": "process",
"priority": "medium",
"promptTriggers": {
"keywords": ["debug", "bug", "error", "broken", "not working"],
"intentPatterns": ["(debug|fix|solve).*?(error|bug|issue)"]
}
},
"refactoring-safely": {
"type": "process",
"priority": "medium",
"promptTriggers": {
"keywords": ["refactor", "cleanup", "improve", "restructure"],
"intentPatterns": ["(refactor|clean up|improve).*?code"]
}
}
}
```
## Tips for Creating Rules
1. **Start broad, refine narrow** - Begin with general keywords, narrow based on false positives
2. **Use priority wisely** - High priority for critical skills only
3. **Test patterns** - Validate regex patterns before deploying (see the sketch after this list)
4. **Monitor activation** - Track which skills activate and adjust
5. **Keep it maintainable** - Comment complex patterns
6. **Version control** - Track changes to rules over time
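For tip 3, a small Node script can catch regexes that will not compile and show which of your sample prompts would activate each skill. This is a rough sketch assuming the default `~/.claude/skill-rules.json` location; the script name and sample prompts are illustrative:

```javascript
// check-rules.js - sanity-check skill-rules.json before deploying a change.
// Usage: node check-rules.js "create a backend endpoint" "build a react component"
const fs = require('fs');
const path = require('path');

const rulesPath = path.join(process.env.HOME, '.claude/skill-rules.json');
const rules = JSON.parse(fs.readFileSync(rulesPath, 'utf8'));
const samplePrompts = process.argv.slice(2);

for (const [skill, config] of Object.entries(rules)) {
  // 1. Every intent pattern must compile
  for (const pattern of config.promptTriggers?.intentPatterns || []) {
    try {
      new RegExp(pattern, 'i');
    } catch (err) {
      console.error(`${skill}: invalid pattern "${pattern}" - ${err.message}`);
    }
  }
  // 2. Report which sample prompts would activate this skill
  for (const prompt of samplePrompts) {
    const lower = prompt.toLowerCase();
    const byKeyword = (config.promptTriggers?.keywords || [])
      .some(k => lower.includes(k.toLowerCase()));
    const byIntent = (config.promptTriggers?.intentPatterns || [])
      .some(p => { try { return new RegExp(p, 'i').test(prompt); } catch { return false; } });
    if (byKeyword || byIntent) {
      console.log(`"${prompt}" -> ${skill}`);
    }
  }
}
```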

View File

@@ -0,0 +1,557 @@
# Troubleshooting Skills Auto-Activation
Common issues and solutions for skills auto-activation system.
## Problem: Hook Not Running At All
### Symptoms
- No skill activation messages appear
- Prompts process normally without injected context
### Diagnosis
**Step 1: Check hook configuration**
```bash
cat ~/.claude/hooks.json
```
Should contain:
```json
{
"hooks": [
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/skill-activator.js"
}
]
}
```
**Step 2: Test hook manually**
```bash
echo '{"text": "test backend endpoint"}' | \
node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
Should output JSON with `decision` and possibly `additionalContext`.
**Step 3: Check file permissions**
```bash
ls -l ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
Should be executable (`-rwxr-xr-x`). If not:
```bash
chmod +x ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
**Step 4: Check Claude Code logs**
```bash
tail -f ~/.claude/logs/hooks.log
```
Look for errors related to skill-activator.
### Solutions
**Solution 1: Reinstall hook**
```bash
mkdir -p ~/.claude/hooks/user-prompt-submit
cp skill-activator.js ~/.claude/hooks/user-prompt-submit/
chmod +x ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
**Solution 2: Verify Node.js**
```bash
which node
node --version
```
Ensure Node.js is installed and in PATH.
**Solution 3: Increase the hook timeout**
```json
{
"hooks": [
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/skill-activator.js",
"timeout": 2000 // Increase if needed
}
]
}
```
## Problem: No Skills Activating
### Symptoms
- Hook runs successfully
- No skills appear in activation messages
- Debug shows "No skills activated"
### Diagnosis
**Enable debug mode:**
```bash
echo '{"text": "your test prompt"}' | \
DEBUG=true node ~/.claude/hooks/user-prompt-submit/skill-activator.js 2>&1
```
**Check for:**
- "No rules loaded" → skill-rules.json not found
- "No skills activated" → Keywords/patterns don't match
### Solutions
**Solution 1: Verify skill-rules.json location**
```bash
cat ~/.claude/skill-rules.json
```
If not found:
```bash
cp skill-rules.json ~/.claude/skill-rules.json
```
**Solution 2: Test with known keyword**
```bash
echo '{"text": "create backend controller"}' | \
SKILL_RULES=~/.claude/skill-rules.json \
DEBUG=true \
node ~/.claude/hooks/user-prompt-submit/skill-activator.js 2>&1
```
Should match "backend-dev-guidelines" if configured.
**Solution 3: Check JSON syntax**
```bash
cat ~/.claude/skill-rules.json | jq '.'
```
If errors, fix JSON syntax.
**Solution 4: Simplify rules for testing**
```json
{
"test-skill": {
"type": "test",
"priority": "high",
"promptTriggers": {
"keywords": ["test"]
}
}
}
```
Test with:
```bash
echo '{"text": "test"}' | node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
## Problem: Wrong Skills Activating
### Symptoms
- Skills activate on irrelevant prompts
- Too many false positives
### Diagnosis
**Enable debug to see why skills matched:**
```bash
echo '{"text": "your prompt"}' | \
DEBUG=true node ~/.claude/hooks/user-prompt-submit/skill-activator.js 2>&1
```
Look for "Matched: keyword" or "Matched: intent pattern" to see why.
### Solutions
**Solution 1: Tighten keywords**
Before:
```json
{
"keywords": ["api", "test", "code"]
}
```
After (more specific):
```json
{
"keywords": ["API endpoint", "integration test", "refactor code"]
}
```
**Solution 2: Use negative patterns**
```json
{
  "intentPatterns": [
    "^(?!.*test).*backend"
  ]
}
```
Matches prompts containing "backend" only when "test" does not appear anywhere in the prompt; without the `^` anchor the lookahead can be bypassed at a later match position.
**Solution 3: Increase priority thresholds**
```json
{
  "test-skill": {
    "priority": "low"
  }
}
```
Low-priority skills are dropped first when more skills match than `maxSkills` allows.
**Solution 4: Reduce maxSkills**
In skill-activator.js:
```javascript
const CONFIG = {
maxSkills: 2, // Reduce from 3
};
```
## Problem: Hook Is Slow
### Symptoms
- Noticeable delay before Claude responds
- Hook takes >1 second
### Diagnosis
**Measure hook performance:**
```bash
time echo '{"text": "test"}' | node ~/.claude/hooks/user-prompt-submit/skill-activator.js
```
Should be <500ms. If slower, diagnose:
**Check number of rules:**
```bash
cat ~/.claude/skill-rules.json | jq 'keys | length'
```
More than 10 rules may slow down.
**Check pattern complexity:**
```bash
cat ~/.claude/skill-rules.json | jq '.[].promptTriggers.intentPatterns'
```
Complex regex patterns slow matching.
### Solutions
**Solution 1: Optimize regex patterns**
Before (slow):
```json
{
"intentPatterns": [
".*create.*backend.*endpoint.*"
]
}
```
After (faster):
```json
{
"intentPatterns": [
"(create|build).*(backend|API).*(endpoint|route)"
]
}
```
**Solution 2: Cache compiled patterns**
Modify hook to compile patterns once:
```javascript
const compiledPatterns = new Map();
function getCompiledPattern(pattern) {
if (!compiledPatterns.has(pattern)) {
compiledPatterns.set(pattern, new RegExp(pattern, 'i'));
}
return compiledPatterns.get(pattern);
}
```
**Solution 3: Reduce number of rules**
Remove low-priority or rarely-used skills.
**Solution 4: Parallelize pattern matching**
For advanced users, use worker threads to match patterns in parallel.
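One possible shape for this, using Node's built-in `worker_threads` as a standalone module (not pasted directly into skill-activator.js, since each worker re-runs its own file). The module and function names are illustrative, and worker startup itself costs tens of milliseconds - measure before adopting it, because for small rule sets the plain loop is usually faster:

```javascript
// parallel-matcher.js - sketch only; names are illustrative.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// Match one chunk of [name, config] rule entries against the prompt.
function matchChunk(chunk, promptText) {
  const lower = promptText.toLowerCase();
  return chunk
    .filter(([, cfg]) =>
      (cfg.promptTriggers?.keywords || []).some(k => lower.includes(k.toLowerCase())) ||
      (cfg.promptTriggers?.intentPatterns || []).some(p => {
        try { return new RegExp(p, 'i').test(promptText); } catch { return false; }
      }))
    .map(([name]) => name);
}

if (isMainThread) {
  // Split the rules across N workers and merge the matched skill names.
  module.exports = function analyzeInParallel(rules, promptText, workers = 2) {
    const entries = Object.entries(rules);
    const size = Math.ceil(entries.length / workers);
    const jobs = [];
    for (let i = 0; i < entries.length; i += size) {
      jobs.push(new Promise((resolve, reject) => {
        const w = new Worker(__filename, {
          workerData: { chunk: entries.slice(i, i + size), promptText }
        });
        w.once('message', resolve);
        w.once('error', reject);
      }));
    }
    return Promise.all(jobs).then(results => results.flat());
  };
} else {
  parentPort.postMessage(matchChunk(workerData.chunk, workerData.promptText));
}
```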
## Problem: Skills Still Don't Activate in Claude
### Symptoms
- Hook injects activation message
- Claude still doesn't use the skills
### Diagnosis
This means the hook is working, but Claude is ignoring the suggestion.
**Check:**
1. Are skills actually installed?
2. Does Claude have access to read skills?
3. Are skill descriptions clear?
### Solutions
**Solution 1: Make activation message stronger**
In skill-activator.js, change:
```javascript
'Before responding, check if any of these skills should be used.'
```
To:
```javascript
'⚠️ IMPORTANT: You MUST check these skills before responding. Use the Skill tool to load them.'
```
**Solution 2: Block until skills loaded**
Change hook to blocking mode (experimental - use cautiously):
```json
{
"hooks": [
{
"event": "UserPromptSubmit",
"command": "~/.claude/hooks/user-prompt-submit/skill-activator.js",
"blocking": true // ⚠️ Experimental
}
]
}
```
**Solution 3: Improve skill descriptions**
Ensure skill descriptions are specific:
```yaml
name: backend-dev-guidelines
description: Use when creating API routes, controllers, services, or repositories - enforces TypeScript patterns, Prisma repository pattern, and Sentry error handling
```
**Solution 4: Reference skills in CLAUDE.md**
Add to project's CLAUDE.md:
```markdown
## Available Skills
- backend-dev-guidelines: Use for all backend code
- frontend-dev-guidelines: Use for all frontend code
- hyperpowers:test-driven-development: Use when writing tests
```
## Problem: Hook Crashes Claude Code
### Symptoms
- Claude Code freezes or crashes after hook execution
- Error in hooks.log
### Diagnosis
**Check error logs:**
```bash
tail -50 ~/.claude/logs/hooks.log
```
Look for errors related to skill-activator.
**Common causes:**
- Infinite loop in hook
- Memory leak
- Unhandled promise rejection
- Blocking operation
### Solutions
**Solution 1: Add error handling**
```javascript
async function main() {
try {
// ... hook logic
} catch (error) {
console.error('Hook error:', error.message);
// Always return approve on error
console.log(JSON.stringify({ decision: 'approve' }));
}
}
```
**Solution 2: Add timeout protection**
```javascript
const timeout = setTimeout(() => {
console.log(JSON.stringify({ decision: 'approve' }));
process.exit(0);
}, 900); // Exit before hook timeout
// ... normal hook logic runs here ...
// Then clear the timeout once the hook has written its own output
clearTimeout(timeout);
```
**Solution 3: Test hook in isolation**
```bash
# Run hook with various inputs
for prompt in "test" "backend" "frontend" "debug"; do
echo "Testing: $prompt"
echo "{\"text\": \"$prompt\"}" | \
timeout 2s node ~/.claude/hooks/user-prompt-submit/skill-activator.js
done
```
**Solution 4: Simplify hook**
Remove complex logic and test minimal version:
```javascript
// Minimal hook for testing
console.log(JSON.stringify({
decision: 'approve',
additionalContext: '🎯 Test message'
}));
```
## Problem: Context Overload
### Symptoms
- Too many skill activation messages
- Context window fills quickly
- Claude seems overwhelmed
### Solutions
**Solution 1: Limit activated skills**
```javascript
const CONFIG = {
maxSkills: 1, // Only top match
};
```
**Solution 2: Use priorities strictly**
```json
{
"critical-skill": {
"priority": "high" // Only high priority
},
"optional-skill": {
"priority": "low" // Remove low priority
}
}
```
**Solution 3: Shorten activation message**
```javascript
function generateContext(skills) {
return `🎯 Use: ${skills.map(s => s.skill).join(', ')}`;
}
```
## Problem: Inconsistent Activation
### Symptoms
- Sometimes activates, sometimes doesn't
- Same prompt gives different results
### Diagnosis
**This is expected due to:**
- Prompt variations (punctuation, wording)
- Context differences (files being edited)
- Keyword order
### Solutions
**Solution 1: Add keyword variations**
```json
{
"keywords": [
"backend",
"back end",
"back-end",
"server side",
"server-side"
]
}
```
**Solution 2: Use more patterns**
```json
{
"intentPatterns": [
"create.*backend",
"backend.*create",
"build.*API",
"API.*build"
]
}
```
**Solution 3: Log all prompts for analysis**
```javascript
// Add to hook
fs.appendFileSync(
path.join(process.env.HOME, '.claude/prompt-log.txt'),
`${new Date().toISOString()} | ${prompt.text}\n`
);
```
Analyze monthly:
```bash
grep "backend" ~/.claude/prompt-log.txt | wc -l
```
## General Debugging Tips
**Enable full debug logging:**
```javascript
// Add to hook
const logFile = path.join(process.env.HOME, '.claude/hook-debug.log');
function debug(msg) {
fs.appendFileSync(logFile, `${new Date().toISOString()} ${msg}\n`);
}
debug(`Analyzing prompt: ${prompt.text}`);
debug(`Activated skills: ${activatedSkills.map(s => s.skill).join(', ')}`);
```
**Test with controlled inputs:**
```bash
# Create test suite
cat > test-prompts.json <<EOF
[
{"text": "create backend endpoint", "expected": ["backend-dev-guidelines"]},
{"text": "build react component", "expected": ["frontend-dev-guidelines"]},
{"text": "write test for API", "expected": ["test-driven-development"]}
]
EOF
# Run tests
node test-hook.js
```
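`test-hook.js` is referenced above but not included anywhere in this plugin. A minimal sketch of what it might look like, assuming the hook path and the `test-prompts.json` created by the commands above, and that expected skill names appear in the hook's `additionalContext`:

```javascript
// test-hook.js - sketch: run each test prompt through the hook and compare activations.
const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const hook = path.join(process.env.HOME, '.claude/hooks/user-prompt-submit/skill-activator.js');
const cases = JSON.parse(fs.readFileSync('test-prompts.json', 'utf8'));

let failures = 0;
for (const { text, expected } of cases) {
  const out = execFileSync('node', [hook], { input: JSON.stringify({ text }) }).toString();
  const context = JSON.parse(out).additionalContext || '';
  const missing = expected.filter(skill => !context.includes(skill));
  if (missing.length > 0) {
    failures += 1;
    console.error(`FAIL "${text}": missing ${missing.join(', ')}`);
  } else {
    console.log(`PASS "${text}"`);
  }
}
process.exit(failures > 0 ? 1 : 0);
```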
**Monitor in production:**
```bash
# Daily summary
grep "Activated skills" ~/.claude/hook-debug.log | \
grep "$(date +%Y-%m-%d)" | \
sort | uniq -c
```
## Getting Help
If problems persist:
1. Check GitHub issues for similar problems
2. Share debug output (sanitize sensitive info)
3. Test with minimal configuration
4. Verify with official examples
**Checklist before asking for help:**
- [ ] Hook runs manually without errors
- [ ] skill-rules.json is valid JSON
- [ ] Node.js version is current (v18+)
- [ ] Debug mode shows expected behavior
- [ ] Tested with simplified configuration
- [ ] Checked Claude Code logs

View File

@@ -0,0 +1,826 @@
---
name: sre-task-refinement
description: Use when you have to refine subtasks into actionable plans ensuring that all corner cases are handled and we understand all the requirements.
---
<skill_overview>
Review bd task plans with Google Fellow SRE perspective to ensure junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, prevent production issues before implementation.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.
</rigidity_level>
<quick_reference>
| Category | Key Questions | Auto-Reject If |
|----------|---------------|----------------|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |
**Perspective**: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.
**Time**: Don't rush - catching one gap pre-implementation saves hours of rework.
</quick_reference>
<when_to_use>
Use when:
- Reviewing bd epic/feature plans before implementation
- Need to ensure junior engineer can execute without questions
- Want to catch edge cases and failure modes upfront
- Need to verify task granularity (4-8 hour subtasks)
- After hyperpowers:writing-plans creates initial plan
- Before hyperpowers:executing-plans starts implementation
Don't use when:
- Task already being implemented (too late)
- Just need to understand existing code (use codebase-investigator)
- Debugging issues (use debugging-with-tools)
- Want to create plan from scratch (use brainstorming → writing-plans)
</when_to_use>
<the_process>
## Announcement
**Announce:** "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."
---
## Review Checklist (Apply to Every Task)
### 1. Task Granularity
**Check:**
- [ ] No task >8 hours (subtasks) or >16 hours (phases)?
- [ ] Large phases broken into 4-8 hour subtasks?
- [ ] Each subtask independently completable?
- [ ] Each subtask has clear deliverable?
**If task >16 hours:**
- Create subtasks with `bd create`
- Link with `bd dep add child parent --type parent-child`
- Update parent to coordinator role
---
### 2. Implementability (Junior Engineer Test)
**Check:**
- [ ] Can junior engineer implement without asking questions?
- [ ] Function signatures/behaviors described, not just "implement X"?
- [ ] Test scenarios described (what they verify, not just names)?
- [ ] "Done" clearly defined with verifiable criteria?
- [ ] All file paths specified or marked "TBD: new file"?
**Red flags:**
- "Implement properly" (how?)
- "Add support" (for what exactly?)
- "Make it work" (what does working mean?)
- File paths missing or ambiguous
---
### 3. Success Criteria Quality
**Check:**
- [ ] Each task has 3+ specific, measurable success criteria?
- [ ] All criteria testable/verifiable (not subjective)?
- [ ] Includes automated verification (tests pass, clippy clean)?
- [ ] No vague criteria like "works well" or "is implemented"?
**Good criteria examples:**
- ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
- ✅ "Clippy clean with no warnings"
- ✅ "Performance: <100ms for 1000 records"
**Bad criteria examples:**
- ❌ "Code is good quality"
- ❌ "Works correctly"
- ❌ "Is implemented"
---
### 4. Dependency Structure
**Check:**
- [ ] Parent-child relationships correct (epic → phases → subtasks)?
- [ ] Blocking dependencies correct (earlier work blocks later)?
- [ ] No circular dependencies?
- [ ] Dependency graph makes logical sense?
**Verify with:**
```bash
bd dep tree bd-1 # Show full dependency tree
```
---
### 5. Safety & Quality Standards
**Check:**
- [ ] Anti-patterns include unwrap/expect prohibition?
- [ ] Anti-patterns include TODO prohibition (or must have issue #)?
- [ ] Anti-patterns include stub implementation prohibition?
- [ ] Error handling requirements specified (use Result, avoid panic)?
- [ ] Test requirements specific (test names, scenarios listed)?
**Minimum anti-patterns:**
- ❌ No unwrap/expect in production code
- ❌ No TODOs without issue numbers
- ❌ No stub implementations (unimplemented!, todo!)
- ❌ No regex without catastrophic backtracking check
---
### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)
**Ask for each task:**
- [ ] What happens with malformed input?
- [ ] What happens with empty/nil/zero values?
- [ ] What happens under high load/concurrency?
- [ ] What happens when dependencies fail?
- [ ] What happens with Unicode, special characters, large inputs?
- [ ] Are these edge cases addressed in the plan?
**Add to Key Considerations section:**
- Edge case descriptions
- Mitigation strategies
- References to similar code handling these cases
---
### 7. Red Flags (AUTO-REJECT)
**Check for these - if found, REJECT plan:**
- ❌ Any task >16 hours without subtask breakdown
- ❌ Vague language: "implement properly", "add support", "make it work"
- ❌ Success criteria that can't be verified: "code is good", "works well"
- ❌ Missing test specifications
- ❌ "We'll handle this later" or "TODO" in the plan itself
- ❌ No anti-patterns section
- ❌ Implementation checklist with fewer than 3 items per task
- ❌ No effort estimates
- ❌ Missing error handling considerations
- ❌ **CRITICAL: Placeholder text in design field** - "[detailed above]", "[as specified]", "[complete steps here]"
---
## Review Process
For each task in the plan:
**Step 1: Read the task**
```bash
bd show bd-3
```
**Step 2: Apply all 7 checklist categories**
- Task Granularity
- Implementability
- Success Criteria Quality
- Dependency Structure
- Safety & Quality Standards
- Edge Cases & Failure Modes
- Red Flags
**Step 3: Document findings**
Take notes:
- What's done well
- What's missing
- What's vague or ambiguous
- Hidden failure modes not addressed
- Better approaches or simplifications
**Step 4: Update the task**
Use `bd update` to add missing information:
```bash
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]
## Effort Estimate
[Updated estimate if needed]
## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria
## Implementation Checklist
[Complete checklist with file paths]
## Key Considerations (ADDED BY SRE REVIEW)
**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing
**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods
**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition
**Reference Implementation**
- Study src/similar/module.rs for pattern to follow
## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"
```
**IMPORTANT:** Use `--design` for full detailed description, NOT `--description` (title only).
**Step 5: Verify no placeholder text (MANDATORY)**
After updating, read back with `bd show bd-N` and verify:
- ✅ All sections contain actual content, not meta-references
- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
- ✅ Implementation steps fully written with actual code examples
- ✅ Success criteria explicit, not referencing "criteria above"
- ❌ If ANY placeholder text found: REJECT and rewrite with actual content
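Part of this read-back can be automated — a rough sketch, assuming `bd show` prints the design field to stdout (the phrases below are the common offenders, not an exhaustive list):
```bash
bd show bd-3 | rg -i '\[(detailed above|as specified|will be added|complete steps here)'
# Any match means placeholder text survived the update -- rewrite that section with real content.
```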
---
## Breaking Down Large Tasks
If task >16 hours, create subtasks:
```bash
# Create first subtask
bd create "Subtask 1: [Specific Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10
# Create second subtask
bd create "Subtask 2: [Another Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design]"
# Returns bd-11
# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child # bd-11 is child of bd-3
# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10 # bd-11 depends on bd-10 (do bd-10 first)
# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.
## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"
```
---
## Output Format
After reviewing all tasks:
```markdown
## Plan Review Results
### Epic: [Name] ([epic-id])
### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]
### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]
**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]
### Task-by-Task Review
#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])
**Strengths**:
- [What's done well]
**Critical Issues** (must fix):
- [Blocking problems]
**Improvements Needed**:
- [What to add/clarify]
**Edge Cases Missing**:
- [Failure modes not addressed]
**Changes Made**:
- [Specific improvements added via `bd update`]
---
[Repeat for each task/phase/subtask]
### Summary of Changes
**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)
### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]
### Recommendations
[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed
[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement
[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]
```
</the_process>
<examples>
<example>
<scenario>Developer reviews task but skips edge case analysis (Category 6)</scenario>
<code>
# Review of bd-3: Implement VIN scanner
## Checklist review:
1. Granularity: ✅ 6-8 hours
2. Implementability: ✅ Junior can implement
3. Success Criteria: ✅ Has 5 test scenarios
4. Dependencies: ✅ Correct
5. Safety Standards: ✅ Anti-patterns present
6. Edge Cases: [SKIPPED - "looks straightforward"]
7. Red Flags: ✅ None found
Conclusion: "Task looks good, approve ✅"
# Task ships without edge case review
# Production issues occur:
- VIN scanner matches random 17-char strings (no checksum validation)
- Lowercase VINs not handled (should normalize)
- Catastrophic regex backtracking on long inputs (DoS vulnerability)
</code>
<why_it_fails>
- Skipped Category 6 (Edge Cases) assuming task was "straightforward"
- Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
- Missed critical production issues:
- False positives (no checksum validation)
- Data handling bugs (case sensitivity)
- Security vulnerability (regex DoS)
- Junior engineer didn't know to handle these (not in task)
- Production incidents occur after deployment
- Hours of emergency fixes, customer impact
- SRE review failed to prevent known failure modes
</why_it_fails>
<correction>
**Apply Category 6 rigorously:**
```markdown
## Edge Case Analysis for bd-3: VIN Scanner
Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking
Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)
```
**Update task:**
```bash
bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]
## Key Considerations (ADDED BY SRE REVIEW)
**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives
**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"
**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html
**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous chars I/O not valid in VIN (should reject)
- Very long input (should not DoS)
EOF
)"
```
**What you gain:**
- Prevented false positives (checksum validation)
- Prevented data handling bugs (case normalization)
- Prevented security vulnerability (regex DoS)
- Junior engineer has complete requirements
- Production issues caught pre-implementation
- Proper SRE review preventing known failure modes
- Customer trust maintained
</correction>
</example>
<example>
<scenario>Developer approves task with placeholder text (Red Flag #10)</scenario>
<code>
# Review of bd-5: Implement License Plate Scanner
bd show bd-5:
## Implementation Checklist
- [ ] Create scanner module
- [ ] [Complete implementation steps detailed above]
- [ ] Add tests
## Success Criteria
- [ ] [As specified in the implementation checklist]
- [ ] Tests pass
## Key Considerations
- [Will be added during implementation]
# Developer's review:
"Looks comprehensive, has implementation checklist and success criteria ✅"
# During implementation:
Junior engineer: "What are the 'implementation steps detailed above'?"
Junior engineer: "What specific success criteria should I verify?"
Junior engineer: "What key considerations exist?"
# No answers in the task - junior engineer blocked
# Have to research and add missing information
# Implementation delayed by 2 days
</code>
<why_it_fails>
- Missed Red Flag #10: Placeholder text present
- "[Complete implementation steps detailed above]" is meta-reference, not content
- "[As specified in the implementation checklist]" is circular reference
- "[Will be added during implementation]" is deferral, not specification
- Junior engineer can't execute - missing critical information
- Task looks complete but actually incomplete
- Implementation blocked until details added
- SRE review failed to catch placeholder text
</why_it_fails>
<correction>
**Check for placeholder text after reading:**
```markdown
## Red Flag Check (Category 7)
Read through bd-5 line by line:
Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content
Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria
Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations
DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation
```
**Update task with actual content:**
```bash
bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
- CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
- NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
- TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
- Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs
## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes
## Key Considerations
**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context
**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration
**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk
**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"
```
**Verify no placeholder text:**
```bash
bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
```
**What you gain:**
- Junior engineer has complete specification
- No blocked implementation waiting for details
- All edge cases documented upfront
- Success criteria explicit and verifiable
- Key considerations prevent common mistakes
- No placeholder text - task truly ready
- Professional SRE review standard maintained
</correction>
</example>
<example>
<scenario>Developer accepts vague success criteria (Category 3)</scenario>
<code>
# Review of bd-7: Implement Data Encryption
bd show bd-7:
## Success Criteria
- [ ] Encryption is implemented correctly
- [ ] Code is good quality
- [ ] Tests work properly
# Developer's review:
"Has 3 success criteria ✅ Meets minimum requirement"
# During implementation:
Junior engineer: "How do I know if encryption is 'correct'?"
Junior engineer: "What makes code 'good quality'?"
Junior engineer: "What does 'tests work properly' mean?"
# Junior engineer makes best guesses:
- Uses ECB mode (insecure, should use GCM)
- No key rotation (bad practice)
- Tests only happy path (misses edge cases)
# Code review finds critical security issues
# Complete rewrite required - 3 days wasted
</code>
<why_it_fails>
- Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
- "Implemented correctly" - not measurable (correct by what standard?)
- "Good quality" - subjective, not verifiable
- "Work properly" - vague, what is proper?
- Junior engineer can't verify criteria objectively
- Makes incorrect assumptions filling gaps
- Security vulnerabilities introduced
- Wastes time on implementation that fails review
- SRE review failed to strengthen criteria
</why_it_fails>
<correction>
**Apply Category 3 rigorously:**
```markdown
## Success Criteria Analysis for bd-7
Current criteria:
- [ ] Encryption is implemented correctly
❌ NOT TESTABLE - "correctly" is subjective, no standard specified
- [ ] Code is good quality
❌ NOT TESTABLE - "good quality" is opinion, not measurable
- [ ] Tests work properly
❌ NOT TESTABLE - "properly" is vague, no definition
Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌
```
**Update with measurable criteria:**
```bash
bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]
## Success Criteria
**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption
**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0
**Test Coverage**:
- [ ] 12+ unit tests pass covering:
- test_encrypt_decrypt_roundtrip (happy path)
- test_wrong_key_fails_auth (security)
- test_modified_ciphertext_fails_auth (security)
- test_empty_plaintext (edge case)
- test_large_plaintext_10mb (performance)
- test_unicode_plaintext (data handling)
- test_concurrent_encryption (thread safety)
- test_iv_uniqueness (security)
- [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`
**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)
**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"
```
**What you gain:**
- Every criterion objectively verifiable
- Junior engineer knows exactly what "done" means
- Automated checks (clippy, fmt, grep) provide instant feedback
- Specific test scenarios prevent missed edge cases
- Security requirements explicit (GCM, PBKDF2, unique IV)
- No ambiguity - can verify each criterion with command or code review
- Professional SRE review standard: measurable, testable, specific
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Apply all 7 categories to every task** → No skipping any category for any task
2. **Reject plans with placeholder text** → "[detailed above]", "[as specified]" = instant reject
3. **Verify no placeholder after updates** → Read back with `bd show` and confirm actual content
4. **Break tasks >16 hours** → Create subtasks, don't accept large tasks
5. **Strengthen vague criteria** → "Works correctly" → measurable verification commands
6. **Add edge cases to every task** → Empty? Unicode? Concurrency? Failures?
7. **Never skip Category 6** → Edge case analysis prevents production issues
## Common Excuses
All of these mean: **STOP. Apply the full process.**
- "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
- "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
- "Placeholder text is just formatting" (Placeholders mean incomplete specification)
- "Can handle edge cases during implementation" (Must specify upfront, not defer)
- "Junior will figure it out" (Junior should NOT need to figure out - we specify)
- "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
- "Taking too long to review" (One gap caught saves hours of rework)
</critical_rules>
<verification_checklist>
Before completing SRE review:
**Per task reviewed:**
- [ ] Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
- [ ] Checked for placeholder text in design field
- [ ] Updated task with missing information via `bd update --design`
- [ ] Verified updated task with `bd show` (no placeholders remain)
- [ ] Broke down any task >16 hours into subtasks
- [ ] Strengthened vague success criteria to measurable
- [ ] Added edge case analysis to Key Considerations
- [ ] Strengthened anti-patterns based on failure modes
**Overall plan:**
- [ ] Reviewed ALL tasks/phases/subtasks (no exceptions)
- [ ] Verified dependency structure with `bd dep tree`
- [ ] Documented findings for each task
- [ ] Created summary of changes made
- [ ] Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)
**Can't check all boxes?** Return to review process and complete missing steps.
</verification_checklist>
<integration>
**This skill is used after:**
- hyperpowers:writing-plans (creates initial plan)
- hyperpowers:brainstorming (establishes requirements)
**This skill is used before:**
- hyperpowers:executing-plans (implements tasks)
**This skill is also called by:**
- hyperpowers:executing-plans (REQUIRED for new tasks created during execution)
**Call chains:**
```
Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
(if gaps: revise and re-review)
During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
```
**This skill uses:**
- bd commands (show, update, create, dep add, dep tree)
- Google Fellow SRE perspective (20+ years distributed systems)
- 7-category checklist (mandatory for every task)
**Time expectations:**
- Small epic (3-5 tasks): 15-20 minutes
- Medium epic (6-10 tasks): 25-40 minutes
- Large epic (10+ tasks): 45-60 minutes
**Don't rush:** Catching one critical gap pre-implementation saves hours of rework.
</integration>
<resources>
**Review patterns:**
- Task too large (>16h) → Break into 4-8h subtasks
- Vague criteria ("works correctly") → Measurable commands/checks
- Missing edge cases → Add to Key Considerations with mitigations
- Placeholder text → Rewrite with actual content
**When stuck:**
- Unsure if task too large → Ask: Can junior complete in one day?
- Unsure if criteria measurable → Ask: Can I verify with command/code review?
- Unsure if edge case matters → Ask: Could this fail in production?
- Unsure if placeholder → Ask: Does this reference other content instead of providing content?
**Key principle:** A junior engineer should be able to execute the task without asking questions. If they would need to ask, the specification is incomplete.
</resources>

View File

@@ -0,0 +1,336 @@
---
name: test-driven-development
description: Use when implementing features or fixing bugs - enforces RED-GREEN-REFACTOR cycle requiring tests to fail before writing code
---
<skill_overview>
Write the test first, watch it fail, write minimal code to pass. If you didn't watch the test fail, you don't know if it tests the right thing.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow these exact steps in order. Do not adapt.
Violating the letter of the rules is violating the spirit of the rules.
</rigidity_level>
<quick_reference>
| Phase | Action | Command Example | Expected Result |
|-------|--------|-----------------|-----------------|
| **RED** | Write failing test | `cargo test test_name` | FAIL (feature missing) |
| **Verify RED** | Confirm correct failure | Check error message | "function not found" or assertion fails |
| **GREEN** | Write minimal code | Implement feature | Test passes |
| **Verify GREEN** | All tests pass | `cargo test` | All green, no warnings |
| **REFACTOR** | Clean up code | Improve while green | Tests still pass |
**Iron Law:** NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
</quick_reference>
<when_to_use>
**Always use for:**
- New features
- Bug fixes
- Refactoring with behavior changes
- Any production code
**Ask your human partner for exceptions:**
- Throwaway prototypes (will be deleted)
- Generated code
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
</when_to_use>
<the_process>
## 1. RED - Write Failing Test
Write one minimal test showing what should happen.
**Requirements:**
- Test one behavior only ("and" in name? Split it)
- Clear name describing behavior
- Use real code (no mocks unless unavoidable)
See [resources/language-examples.md](resources/language-examples.md) for Rust, Swift, TypeScript examples.
## 2. Verify RED - Watch It Fail
**MANDATORY. Never skip.**
Run the test and confirm:
- ✓ Test **fails** (not errors with syntax issues)
- ✓ Failure message is expected ("function not found" or assertion fails)
- ✓ Fails because feature missing (not typos)
**If test passes:** You're testing existing behavior. Fix the test.
**If test errors:** Fix syntax error, re-run until it fails correctly.
## 3. GREEN - Write Minimal Code
Write simplest code to pass the test. Nothing more.
**Key principle:** Don't add features the test doesn't require. Don't refactor other code. Don't "improve" beyond the test.
## 4. Verify GREEN - Watch It Pass
**MANDATORY.**
Run tests and confirm:
- ✓ New test passes
- ✓ All other tests still pass
- ✓ No errors or warnings
**If test fails:** Fix code, not test.
**If other tests fail:** Fix now before proceeding.
## 5. REFACTOR - Clean Up
**Only after green:**
- Remove duplication
- Improve names
- Extract helpers
Keep tests green. Don't add behavior.
## 6. Repeat
Next failing test for next feature.
</the_process>
<examples>
<example>
<scenario>Developer writes implementation first, then adds test that passes immediately</scenario>
<code>
# Code written FIRST
def validate_email(email):
    return "@" in email  # Bug: accepts "@@"

# Test written AFTER
def test_validate_email():
    assert validate_email("user@example.com")  # Passes immediately!
    # Missing edge case: assert not validate_email("@@")
</code>
<why_it_fails>
When test passes immediately:
- Never proved the test catches bugs
- Only tested happy path you remembered
- Forgot edge cases (like "@@")
- Bug ships to production
Tests written after the fact verify the cases you remembered, not the behavior that's required.
</why_it_fails>
<correction>
**TDD approach:**
1. **RED** - Write test first (including edge case):
```python
def test_validate_email():
    assert validate_email("user@example.com")  # Will fail - function doesn't exist
    assert not validate_email("@@")  # Edge case up front
```
2. **Verify RED** - Run test, watch it fail:
```bash
NameError: function 'validate_email' is not defined
```
3. **GREEN** - Implement to pass both cases:
```python
def validate_email(email):
    return "@" in email and email.count("@") == 1
```
4. **Verify GREEN** - Both assertions pass, bug prevented.
**Result:** Test failed first, proving it works. Edge case discovered during test writing, not in production.
</correction>
</example>
<example>
<scenario>Developer has already written 3 hours of code without tests. Wants to keep it as "reference" while writing tests.</scenario>
<code>
// 200 lines of untested code exists
// Developer thinks: "I'll keep this and write tests that match it"
// Or: "I'll use it as reference to speed up TDD"
</code>
<why_it_fails>
**Keeping code as "reference":**
- You'll copy it (that's testing after, with extra steps)
- You'll adapt it (biased by implementation)
- Tests will match code, not requirements
- You'll justify shortcuts: "I already know this works"
**Result:** All the problems of test-after, none of the benefits of TDD.
</why_it_fails>
<correction>
**Delete it. Completely.**
```bash
git stash # Or delete the file
```
**Then start TDD:**
1. Write first failing test from requirements (not from code)
2. Watch it fail
3. Implement fresh (might be different from original, that's OK)
4. Watch it pass
**Why delete:**
- Sunk cost is already gone
- 3 hours implementing ≠ 3 hours with TDD (TDD might be 2 hours total)
- Code without tests is technical debt
- Fresh implementation from tests is usually better
**What you gain:**
- Tests that actually verify behavior
- Confidence code works
- Ability to refactor safely
- No bugs from untested edge cases
</correction>
</example>
<example>
<scenario>Test is hard to write. Developer thinks "design must be unclear, but I'll implement first to explore."</scenario>
<code>
// Test attempt:
func testUserServiceCreatesAccount() {
// Need to mock database, email service, payment gateway, logger...
// This is getting complicated, maybe I should just implement first
}
</code>
<why_it_fails>
**"Test is hard" is valuable signal:**
- Hard to test = hard to use
- Too many dependencies = coupling too tight
- Complex setup = design needs simplification
**Implementing first ignores this signal:**
- Build the complex design
- Lock in the coupling
- Now forced to write complex tests (or skip them)
</why_it_fails>
<correction>
**Listen to the test.**
Hard to test? Simplify the interface:
```swift
// Instead of:
class UserService {
init(db: Database, email: EmailService, payments: PaymentGateway, logger: Logger) { }
func createAccount(email: String, password: String, paymentToken: String) throws { }
}
// Make testable:
class UserService {
func createAccount(request: CreateAccountRequest) -> Result<Account, Error> {
// Dependencies injected through request or passed separately
}
}
```
**Test becomes simple:**
```swift
func testCreatesAccountFromRequest() throws {
    let service = UserService()
    let request = CreateAccountRequest(email: "user@example.com")
    let account = try service.createAccount(request: request).get()
    XCTAssertEqual(account.email, "user@example.com")
}
```
**TDD forces good design.** If test is hard, fix design before implementing.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Write code before test?** → Delete it. Start over.
- Never keep as "reference"
- Never "adapt" while writing tests
- Delete means delete
2. **Test passes immediately?** → Not TDD. Fix the test or delete the code.
- Passing immediately proves nothing
- You're testing existing behavior, not required behavior
3. **Can't explain why test failed?** → Fix until failure makes sense.
- "function not found" = good (feature doesn't exist)
- Weird error = bad (fix test, re-run)
4. **Want to skip "just this once"?** → That's rationalization. Stop.
- TDD is faster than debugging in production
- "Too simple to test" = test takes 30 seconds
- "Already manually tested" = not systematic, not repeatable
## Common Excuses
All of these mean: **Stop. Follow TDD.**
- "This is different because..."
- "I'm being pragmatic, not dogmatic"
- "It's about spirit not ritual"
- "Tests after achieve the same goals"
- "Deleting X hours of work is wasteful"
</critical_rules>
<verification_checklist>
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test **fail** before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass with no warnings
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
**Can't check all boxes?** You skipped TDD. Start over.
</verification_checklist>
<integration>
**This skill calls:**
- verification-before-completion (running tests to verify)
**This skill is called by:**
- fixing-bugs (write failing test reproducing bug)
- executing-plans (when implementing bd tasks)
- refactoring-safely (keep tests green while refactoring)
**Agents used:**
- hyperpowers:test-runner (run tests, return summary only)
</integration>
<resources>
**Detailed language-specific examples:**
- [Rust, Swift, TypeScript examples](resources/language-examples.md) - Complete RED-GREEN-REFACTOR cycles
- [Language-specific test commands](resources/language-examples.md#verification-commands-by-language)
**When stuck:**
- Test too complicated? → Design too complicated, simplify interface
- Must mock everything? → Code too coupled, use dependency injection
- Test setup huge? → Extract helpers, or simplify design
</resources>

View File

@@ -0,0 +1,325 @@
# TDD Workflow Examples
This guide shows complete TDD workflows for common scenarios: bug fixes and feature additions.
## Example: Bug Fix
### Bug
Empty email is accepted when it should be rejected.
### RED Phase: Write Failing Test
**Swift:**
```swift
func testRejectsEmptyEmail() async throws {
let result = try await submitForm(FormData(email: ""))
XCTAssertEqual(result.error, "Email required")
}
```
### Verify RED: Watch It Fail
```bash
$ swift test --filter FormTests.testRejectsEmptyEmail
FAIL: XCTAssertEqual failed: ("nil") is not equal to ("Optional("Email required")")
```
**Confirms:**
- Test fails (not errors)
- Failure message shows email not being validated
- Fails because feature missing (not typos)
### GREEN Phase: Minimal Code
```swift
struct FormResult {
var error: String?
}
func submitForm(_ data: FormData) async throws -> FormResult {
if data.email.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
return FormResult(error: "Email required")
}
// ... rest of form processing
return FormResult()
}
```
### Verify GREEN: Watch It Pass
```bash
$ swift test --filter FormTests.testRejectsEmptyEmail
Test Case '-[FormTests testRejectsEmptyEmail]' passed
```
**Confirms:**
- Test passes
- Other tests still pass
- No errors or warnings
### REFACTOR: Clean Up
If multiple fields need validation:
```swift
extension FormData {
func validate() -> String? {
if email.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
return "Email required"
}
// Add other validations...
return nil
}
}
func submitForm(_ data: FormData) async throws -> FormResult {
if let error = data.validate() {
return FormResult(error: error)
}
// ... rest of form processing
return FormResult()
}
```
Run tests again to confirm still green.
---
## Example: Feature Addition
### Feature
Calculate average of non-empty list.
### RED Phase: Write Failing Test
**TypeScript:**
```typescript
describe('average', () => {
it('calculates average of non-empty list', () => {
expect(average([1, 2, 3])).toBe(2);
});
});
```
### Verify RED: Watch It Fail
```bash
$ npm test -- --testNamePattern="average"
FAIL: ReferenceError: average is not defined
```
**Confirms:**
- Function doesn't exist yet
- Test will verify the behavior once the function exists
### GREEN Phase: Minimal Code
```typescript
function average(numbers: number[]): number {
const sum = numbers.reduce((acc, n) => acc + n, 0);
return sum / numbers.length;
}
```
### Verify GREEN: Watch It Pass
```bash
$ npm test -- --testNamePattern="average"
PASS: calculates average of non-empty list
```
### Add Edge Case: Empty List
**RED:**
```typescript
it('returns 0 for empty list', () => {
expect(average([])).toBe(0);
});
```
**Verify RED:**
```bash
$ npm test -- --testNamePattern="average.*empty"
FAIL: Expected: 0, Received: NaN
```
**GREEN:**
```typescript
function average(numbers: number[]): number {
if (numbers.length === 0) return 0;
const sum = numbers.reduce((acc, n) => acc + n, 0);
return sum / numbers.length;
}
```
**Verify GREEN:**
```bash
$ npm test -- --testNamePattern="average"
PASS: 2 tests passed
```
### REFACTOR: Clean Up
No duplication or unclear naming, so no refactoring needed. Move to next feature.
---
## Example: Refactoring with Tests
### Scenario
Existing function works but is hard to read. Tests exist and pass.
### Current Code
```rust
fn process(data: Vec<i32>) -> i32 {
let mut result = 0;
for item in data {
if item > 0 {
result += item * 2;
}
}
result
}
```
### Existing Tests (Already Green)
```rust
#[test]
fn processes_positive_numbers() {
assert_eq!(process(vec![1, 2, 3]), 12); // (1*2) + (2*2) + (3*2) = 12
}
#[test]
fn ignores_negative_numbers() {
assert_eq!(process(vec![1, -2, 3]), 8); // (1*2) + (3*2) = 8
}
#[test]
fn handles_empty_list() {
assert_eq!(process(vec![]), 0);
}
```
### REFACTOR: Improve Clarity
```rust
fn process(data: Vec<i32>) -> i32 {
data.iter()
.filter(|&&n| n > 0)
.map(|&n| n * 2)
.sum()
}
```
### Verify Still Green
```bash
$ cargo test
running 3 tests
test processes_positive_numbers ... ok
test ignores_negative_numbers ... ok
test handles_empty_list ... ok
test result: ok. 3 passed; 0 failed
```
**Key:** Tests prove refactoring didn't break behavior.
---
## Common Patterns
### Pattern: Adding Validation
1. **RED:** Test that invalid input is rejected
2. **Verify RED:** Confirm invalid input currently accepted
3. **GREEN:** Add validation check
4. **Verify GREEN:** Confirm validation works
5. **REFACTOR:** Extract validation if reusable
### Pattern: Adding Error Handling
1. **RED:** Test that error condition is caught
2. **Verify RED:** Confirm error currently unhandled
3. **GREEN:** Add error handling
4. **Verify GREEN:** Confirm error handled correctly
5. **REFACTOR:** Consolidate error handling if duplicated
### Pattern: Optimizing Performance
1. **Ensure tests exist and pass** (if not, add tests first)
2. **REFACTOR:** Optimize implementation
3. **Verify GREEN:** Confirm tests still pass
4. **Measure:** Confirm performance improved
**Note:** Never optimize without tests. Without them, you can't prove the optimization didn't break behavior.
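A minimal sketch of that loop for a Rust project — `cargo bench` assumes benchmarks are already defined, and the benchmark name is hypothetical:
```bash
cargo test                    # 1. tests exist and pass
cargo bench parse_large_file  # record a baseline before changing anything
# ...apply the optimization...
cargo test                    # 3. still green -- behavior preserved
cargo bench parse_large_file  # 4. compare against the recorded baseline
```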
---
## Workflow Checklist
### For Each New Feature
- [ ] Write one failing test
- [ ] Run test, confirm it fails correctly
- [ ] Write minimal code to pass
- [ ] Run test, confirm it passes
- [ ] Run all tests, confirm no regressions
- [ ] Refactor if needed (staying green)
- [ ] Commit
### For Each Bug Fix
- [ ] Write test reproducing the bug
- [ ] Run test, confirm it fails (reproduces bug)
- [ ] Fix the bug (minimal change)
- [ ] Run test, confirm it passes (bug fixed)
- [ ] Run all tests, confirm no regressions
- [ ] Commit
### For Each Refactoring
- [ ] Confirm tests exist and pass
- [ ] Make one small refactoring change
- [ ] Run tests, confirm still green
- [ ] Repeat until refactoring complete
- [ ] Commit
---
## Anti-Patterns to Avoid
### ❌ Writing Multiple Tests Before Implementing
**Why bad:** You can't tell which test makes implementation fail. Write one, implement, repeat.
### ❌ Changing Test to Make It Pass
**Why bad:** Test should define correct behavior. If test is wrong, fix test first, then re-run RED phase.
### ❌ Adding Features Not Covered by Tests
**Why bad:** Untested code. If you need a feature, write test first.
### ❌ Skipping RED Verification
**Why bad:** Test might pass immediately, meaning it doesn't test anything new.
### ❌ Skipping GREEN Verification
**Why bad:** Test might fail for unexpected reason. Always verify expected pass.
---
## Remember
- **One test at a time:** Write test, implement, repeat
- **Watch it fail:** Proves test actually tests something
- **Watch it pass:** Proves implementation works
- **Stay green:** All tests pass before moving on
- **Refactor freely:** Tests catch breaks

View File

@@ -0,0 +1,267 @@
# TDD Language-Specific Examples
This guide provides concrete TDD examples in multiple programming languages, showing the RED-GREEN-REFACTOR cycle.
## RED Phase Examples
Write one minimal test showing what should happen.
### Rust
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn retries_failed_operations_3_times() {
let mut attempts = 0;
let operation = || -> Result<&str, &str> {
attempts += 1;
if attempts < 3 {
Err("fail")
} else {
Ok("success")
}
};
let result = retry_operation(operation);
assert_eq!(result, Ok("success"));
assert_eq!(attempts, 3);
}
}
```
**Running the test:**
```bash
cargo test tests::retries_failed_operations_3_times
```
### Swift
```swift
func testRetriesFailedOperations3Times() async throws {
    var attempts = 0
    let operation = { () async throws -> String in
        attempts += 1
        if attempts < 3 {
            throw RetryError.failed
        }
        return "success"
    }
    let result = try await retryOperation(operation)
    XCTAssertEqual(result, "success")
    XCTAssertEqual(attempts, 3)
}
```
**Running the test:**
```bash
swift test --filter RetryTests.testRetriesFailedOperations3Times
```
### TypeScript
```typescript
describe('retryOperation', () => {
it('retries failed operations 3 times', async () => {
let attempts = 0;
const operation = async () => {
attempts++;
if (attempts < 3) {
throw new Error('fail');
}
return 'success';
};
const result = await retryOperation(operation);
expect(result).toBe('success');
expect(attempts).toBe(3);
});
});
```
**Running the test (Jest):**
```bash
npm test -- --testNamePattern="retries failed operations"
```
**Running the test (Vitest):**
```bash
npm test -- -t "retries failed operations"
```
### Why These Are Good
- Clear names describing the behavior
- Test real behavior, not mocks
- One thing per test
- Shows desired API
### Bad Example
```typescript
test('retry', () => {
let mockCalls = 0;
const mock = () => {
mockCalls++;
return 'success';
};
retryOperation(mock);
expect(mockCalls).toBe(1); // Tests mock, not behavior
});
```
**Why this is bad:**
- Vague name
- Tests mock behavior, not real retry logic
## GREEN Phase Examples
Write simplest code to pass the test.
### Rust
```rust
fn retry_operation<F, T, E>(mut operation: F) -> Result<T, E>
where
F: FnMut() -> Result<T, E>,
{
for i in 0..3 {
match operation() {
Ok(result) => return Ok(result),
Err(e) => {
if i == 2 {
return Err(e);
}
}
}
}
unreachable!()
}
```
### Swift
```swift
func retryOperation<T>(_ operation: () async throws -> T) async throws -> T {
var lastError: Error?
for attempt in 0..<3 {
do {
return try await operation()
} catch {
lastError = error
if attempt == 2 {
throw error
}
}
}
throw lastError!
}
```
### TypeScript
```typescript
async function retryOperation<T>(
operation: () => Promise<T>
): Promise<T> {
let lastError: Error | undefined;
for (let i = 0; i < 3; i++) {
try {
return await operation();
} catch (error) {
lastError = error as Error;
if (i === 2) {
throw error;
}
}
}
throw lastError;
}
```
### Bad Example - Over-engineered (YAGNI)
```typescript
async function retryOperation<T>(
operation: () => Promise<T>,
options: {
maxRetries?: number;
backoff?: 'linear' | 'exponential';
onRetry?: (attempt: number) => void;
shouldRetry?: (error: Error) => boolean;
} = {}
): Promise<T> {
// Don't add features the test doesn't require!
}
```
**Why this is bad:** Test only requires 3 retries. Don't add:
- Configurable retries
- Backoff strategies
- Callbacks
- Error filtering
...until a test requires them.
## Test Requirements
**Every test should:**
- Test one behavior
- Have a clear name
- Use real code (no mocks unless unavoidable)
## Verification Commands by Language
### Rust
```bash
# Single test
cargo test tests::test_name
# All tests
cargo test
# With output
cargo test -- --nocapture
```
### Swift
```bash
# Single test
swift test --filter TestClass.testName
# All tests
swift test
# With output
swift test --verbose
```
### TypeScript (Jest)
```bash
# Single test
npm test -- --testNamePattern="test name"
# All tests
npm test
# With coverage
npm test -- --coverage
```
### TypeScript (Vitest)
```bash
# Single test
npm test -- -t "test name"
# All tests
npm test
# With coverage
npm test -- --coverage
```

View File

@@ -0,0 +1,581 @@
---
name: testing-anti-patterns
description: Use when writing or changing tests, adding mocks - prevents testing mock behavior, production pollution with test-only methods, and mocking without understanding dependencies
---
<skill_overview>
Tests must verify real behavior, not mock behavior; mocks are tools to isolate, not things to test.
</skill_overview>
<rigidity_level>
LOW FREEDOM - The 3 Iron Laws are absolute (never test mocks, never add test-only methods, never mock without understanding). Apply gate functions strictly.
</rigidity_level>
<quick_reference>
## The 3 Iron Laws
1. **NEVER test mock behavior** → Test real component behavior
2. **NEVER add test-only methods to production** → Use test utilities instead
3. **NEVER mock without understanding** → Know dependencies before mocking
## Gate Functions (Use Before Action)
**Before asserting on any mock:**
- Ask: "Am I testing real behavior or mock existence?"
- If mock existence → STOP, delete assertion
**Before adding method to production:**
- Ask: "Is this only used by tests?"
- If yes → STOP, put in test utilities
**Before mocking:**
- Ask: "What side effects does real method have?"
- Ask: "Does test depend on those side effects?"
- If depends → Mock lower level, not this method
</quick_reference>
<when_to_use>
- Writing new tests
- Adding mocks to tests
- Tempted to add method only tests will use
- Test failing and considering mocking something
- Unsure whether to mock a dependency
- Test setup becoming complex with mocks
**Critical moment:** Before you add a mock or test-only method, use this skill's gate functions.
</when_to_use>
<the_iron_laws>
## Law 1: Never Test Mock Behavior
**Anti-pattern:**
```rust
// ❌ BAD: Testing that mock exists
#[test]
fn test_processes_request() {
let mock_service = MockApiService::new();
let handler = RequestHandler::new(Box::new(mock_service));
// Testing mock existence, not behavior
assert!(handler.service().is_mock());
}
```
**Why wrong:** Verifies mock works, not that code works.
**Fix:**
```rust
// ✅ GOOD: Test real behavior
#[test]
fn test_processes_request() {
let service = TestApiService::new(); // Real implementation or full fake
let handler = RequestHandler::new(Box::new(service));
let result = handler.process_request("data");
assert_eq!(result.status, StatusCode::OK);
}
```
---
## Law 2: Never Add Test-Only Methods to Production
**Anti-pattern:**
```rust
// ❌ BAD: reset() only used in tests
pub struct Connection {
pool: Arc<ConnectionPool>,
}
impl Connection {
pub fn reset(&mut self) { // Looks like production API!
self.pool.clear_all();
}
}
// In tests
#[test]
fn test_something() {
let mut conn = Connection::new();
conn.reset(); // Test-only method
}
```
**Why wrong:**
- Production code polluted with test-only methods
- Dangerous if accidentally called in production
- Confuses object lifecycle with entity lifecycle
**Fix:**
```rust
// ✅ GOOD: Test utilities handle cleanup
// Connection has no reset()
// In tests/test_utils.rs
pub fn cleanup_connection(conn: &Connection) {
if let Some(pool) = conn.get_pool() {
pool.clear_test_data();
}
}
// In tests
#[test]
fn test_something() {
let conn = Connection::new();
cleanup_connection(&conn);
}
```
---
## Law 3: Never Mock Without Understanding
**Anti-pattern:**
```rust
// ❌ BAD: Mock breaks test logic
#[test]
fn test_detects_duplicate_server() {
// Mock prevents config write that test depends on!
let mut config_manager = MockConfigManager::new();
config_manager.expect_add_server()
.returning(|_| Ok(())); // No actual config write!
config_manager.add_server(&config).unwrap();
config_manager.add_server(&config).unwrap(); // Should fail - but won't!
}
```
**Why wrong:** Mocked method had side effect test depended on (writing config).
**Fix:**
```rust
// ✅ GOOD: Mock at correct level
#[test]
fn test_detects_duplicate_server() {
// Mock the slow part, preserve behavior test needs
let server_manager = MockServerManager::new(); // Just mock slow server startup
let config_manager = ConfigManager::new_with_manager(server_manager);
config_manager.add_server(&config).unwrap(); // Config written
let result = config_manager.add_server(&config); // Duplicate detected ✓
assert!(result.is_err());
}
```
</the_iron_laws>
<gate_functions>
## Gate Function 1: Before Asserting on Mock
```
BEFORE any assertion that checks mock elements:
1. Ask: "Am I testing real component behavior or just mock existence?"
2. If testing mock existence:
STOP - Delete the assertion or unmock the component
3. Test real behavior instead
```
**Examples of mock existence testing (all wrong):**
- `assert!(handler.service().is_mock())`
- `XCTAssertTrue(manager.delegate is MockDelegate)`
- `expect(component.database).toBe(mockDb)`
---
## Gate Function 2: Before Adding Method to Production
```
BEFORE adding any method to production class:
1. Ask: "Is this only used by tests?"
2. If yes:
STOP - Don't add it
Put it in test utilities instead
3. Ask: "Does this class own this resource's lifecycle?"
4. If no:
STOP - Wrong class for this method
```
**Red flags:**
- Method named `reset()`, `clear()`, `cleanup()` in production class
- Method only has `#[cfg(test)]` callers
- Method added "for testing purposes"
---
## Gate Function 3: Before Mocking
```
BEFORE mocking any method:
STOP - Don't mock yet
1. Ask: "What side effects does the real method have?"
2. Ask: "Does this test depend on any of those side effects?"
3. Ask: "Do I fully understand what this test needs?"
If depends on side effects:
→ Mock at lower level (the actual slow/external operation)
→ OR use test doubles that preserve necessary behavior
→ NOT the high-level method the test depends on
If unsure what test depends on:
→ Run test with real implementation FIRST
→ Observe what actually needs to happen
→ THEN add minimal mocking at the right level
```
**Red flags:**
- "I'll mock this to be safe"
- "This might be slow, better mock it"
- Mocking without understanding dependency chain
</gate_functions>
<examples>
<example>
<scenario>Developer tests mock behavior instead of real behavior</scenario>
<code>
#[test]
fn test_user_service_initialized() {
let mock_db = MockDatabase::new();
let service = UserService::new(mock_db);
// Testing that mock exists
assert_eq!(service.database().connection_string(), "mock://test");
assert!(service.database().is_test_mode());
}
</code>
<why_it_fails>
- Assertions check mock properties, not service behavior
- Test passes when mock is correct, fails when mock is wrong
- Tells you nothing about whether UserService works
- Would pass even if UserService.new() does nothing
- False confidence - mock works, but does service work?
</why_it_fails>
<correction>
**Apply Gate Function 1:**
"Am I testing real behavior or mock existence?"
→ Testing mock existence (connection_string(), is_test_mode() are mock properties)
**Fix:**
```rust
#[test]
fn test_user_service_creates_user() {
let db = TestDatabase::new(); // Real test implementation
let service = UserService::new(db);
// Test real behavior
let user = service.create_user("alice", "alice@example.com").unwrap();
assert_eq!(user.name, "alice");
assert_eq!(user.email, "alice@example.com");
// Verify user was saved
let retrieved = service.get_user(user.id).unwrap();
assert_eq!(retrieved.name, "alice");
}
```
**What you gain:**
- Tests actual UserService behavior
- Validates create and retrieve work
- Would fail if service broken (even with working mock)
- Confidence service actually works
</correction>
</example>
<example>
<scenario>Developer adds test-only method to production class</scenario>
<code>
// Production code
pub struct Database {
pool: ConnectionPool,
}
impl Database {
pub fn new() -> Self { /* ... */ }
// Added "for testing"
pub fn reset(&mut self) {
self.pool.clear();
self.pool.reinitialize();
}
}
// Tests
#[test]
fn test_user_creation() {
let mut db = Database::new();
// ... test logic ...
db.reset(); // Clean up
}
#[test]
fn test_user_deletion() {
let mut db = Database::new();
// ... test logic ...
db.reset(); // Clean up
}
</code>
<why_it_fails>
- Production Database polluted with test-only reset()
- reset() looks like legitimate API to other developers
- Dangerous if accidentally called in production (clears all data!)
- Violates single responsibility (Database manages connections, not test lifecycle)
- Every test class now needs reset() added
</why_it_fails>
<correction>
**Apply Gate Function 2:**
"Is this only used by tests?" → YES
"Does Database class own test lifecycle?" → NO
**Fix:**
```rust
// Production code (NO reset method)
pub struct Database {
pool: ConnectionPool,
}
impl Database {
pub fn new() -> Self { /* ... */ }
// No reset() - production code clean
}
// Test utilities (tests/test_utils.rs)
pub fn create_test_database() -> Database {
Database::new()
}
pub fn cleanup_database(db: &mut Database) {
// Access internals properly for cleanup
if let Some(pool) = db.get_pool_mut() {
pool.clear_test_data();
}
}
// Tests
#[test]
fn test_user_creation() {
let mut db = create_test_database();
// ... test logic ...
cleanup_database(&mut db);
}
```
**What you gain:**
- Production code has no test pollution
- No risk of accidental production calls
- Clear separation: Database manages connections, test utils manage test lifecycle
- Test utilities can evolve without changing production code
</correction>
</example>
<example>
<scenario>Developer mocks without understanding dependencies</scenario>
<code>
#[test]
fn test_detects_duplicate_server() {
// "I'll mock ConfigManager to speed up the test"
let mut mock_config = MockConfigManager::new();
mock_config.expect_add_server()
.times(2)
.returning(|_| Ok(())); // Always returns Ok!
// Test expects duplicate detection
mock_config.add_server(&server_config).unwrap();
let result = mock_config.add_server(&server_config);
// Assertion fails! Mock always returns Ok, no duplicate detection
assert!(result.is_err()); // FAILS
}
</code>
<why_it_fails>
- Mocked add_server() without understanding it writes config
- Mock returns Ok() both times (no duplicate detection)
- Test depends on ConfigManager's internal state tracking
- Mock eliminates the behavior test needs to verify
- "Speeding up" by mocking broke the test
</why_it_fails>
<correction>
**Apply Gate Function 3:**
"What side effects does add_server() have?" → Writes to config file, tracks added servers
"Does test depend on those?" → YES! Test needs duplicate detection
"Do I understand what test needs?" → Now yes
**Fix:**
```rust
#[test]
fn test_detects_duplicate_server() {
// Mock at the RIGHT level - just the slow I/O
let mock_file_system = MockFileSystem::new(); // Mock slow file writes
let config_manager = ConfigManager::new_with_fs(mock_file_system);
// ConfigManager's duplicate detection still works
config_manager.add_server(&server_config).unwrap();
let result = config_manager.add_server(&server_config);
// Passes! ConfigManager tracks duplicates, only file I/O is mocked
assert!(result.is_err());
}
```
**What you gain:**
- Test verifies real duplicate detection logic
- Only mocked the actual slow part (file I/O)
- ConfigManager's internal tracking works normally
- Test actually validates the feature
</correction>
</example>
</examples>
<additional_anti_patterns>
## Anti-Pattern 4: Incomplete Mocks
**Problem:** Mock only fields you think you need, omit others.
```rust
// ❌ BAD: Partial mock
struct MockResponse {
status: String,
data: UserData,
// Missing: metadata that downstream code uses
}
impl ApiResponse for MockResponse {
fn metadata(&self) -> &Metadata {
panic!("metadata not implemented!") // Breaks at runtime!
}
}
```
**Fix:** Mirror real API completely.
```rust
// ✅ GOOD: Complete mock
struct MockResponse {
status: String,
data: UserData,
metadata: Metadata, // All fields real API returns
}
```
**Gate function:**
```
BEFORE creating mock responses:
1. Examine actual API response structure
2. Include ALL fields system might consume
3. Verify mock matches real schema completely
```
---
## Anti-Pattern 5: Over-Complex Mocks
**Warning signs:**
- Mock setup longer than test logic
- Mocking everything to make test pass
- Test breaks when mock changes
**Consider:** Integration tests with real components often simpler than complex mocks.
</additional_anti_patterns>
<tdd_prevention>
## TDD Prevents These Anti-Patterns
**Why TDD helps:**
1. **Write test first** → Forces thinking about what you're actually testing
2. **Watch it fail** → Confirms test tests real behavior, not mocks
3. **Minimal implementation** → No test-only methods creep in
4. **Real dependencies** → See what test needs before mocking
**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development before using this skill.
</tdd_prevention>
<critical_rules>
## Rules That Have No Exceptions
1. **Never test mock behavior** → Test real component behavior always
2. **Never add test-only methods to production** → Pollutes production code
3. **Never mock without understanding** → Must know dependencies and side effects
4. **Use gate functions before action** → Before asserting, adding methods, or mocking
5. **Follow TDD** → Write test first, watch fail, prevents testing mocks
## Common Excuses
All of these mean: **STOP. Apply the gate function.**
- "Just checking the mock is wired up" (Testing mock, not behavior)
- "Need reset() for test cleanup" (Test-only method, use test utilities)
- "I'll mock this to be safe" (Don't understand dependencies)
- "Mock setup is complex but necessary" (Probably over-mocking)
- "This will speed up tests" (Might break test logic)
</critical_rules>
<verification_checklist>
Before claiming tests are correct:
- [ ] No assertions on mock elements (no `is_mock()`, `is MockType`, etc.)
- [ ] No test-only methods in production classes
- [ ] All mocks preserve side effects test depends on
- [ ] Mock at lowest level needed (mock slow I/O, not business logic)
- [ ] Understand why each mock is necessary
- [ ] Mock structure matches real API completely
- [ ] Test logic shorter/equal to mock setup (not longer)
- [ ] Followed TDD (test failed with real code before mocking)
**Can't check all boxes?** Apply gate functions and refactor.
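A couple of quick greps catch the most common violations before the deeper review — a heuristic sketch, assuming a Rust project; the patterns and paths are assumptions and matches still need human judgment:
```bash
rg -n 'is_mock\(|is_test_mode\(' tests/       # assertions on mock existence (Law 1)
rg -n 'pub fn (reset|clear|cleanup)\(' src/   # likely test-only lifecycle methods in production (Law 2)
```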
</verification_checklist>
<integration>
**This skill requires:**
- hyperpowers:test-driven-development (prevents these anti-patterns)
- Understanding of mocking vs. faking vs. stubbing
**This skill is called by:**
- When writing tests
- When adding mocks
- When test setup becomes complex
- hyperpowers:test-driven-development (use gate functions during RED phase)
**Red flags triggering this skill:**
- Assertion checks for `*-mock` test IDs
- Methods only called in test files
- Mock setup >50% of test
- Test fails when you remove mock
- Can't explain why mock needed
</integration>
<resources>
**Detailed guides:**
- [Mocking vs Faking vs Stubbing](resources/test-doubles.md)
- [Test utilities patterns](resources/test-utilities.md)
- [When to use integration tests](resources/integration-vs-unit.md)
**When stuck:**
- Mock too complex → Consider integration test with real components
- Unsure what to mock → Run with real implementation first, observe
- Test failing mysteriously → Check if mock breaks test logic (use Gate Function 3)
- Production polluted → Move all test helpers to test_utils
</resources>

386
skills/using-hyper/SKILL.md Normal file
View File

@@ -0,0 +1,386 @@
---
name: using-hyper
description: Use when starting any conversation - establishes mandatory workflows for finding and using skills
---
<EXTREMELY_IMPORTANT>
If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST read the skill.
**IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.**
This is not negotiable. This is not optional. You cannot rationalize your way out of this.
</EXTREMELY_IMPORTANT>
<skill_overview>
Skills are proven workflows; if one exists for your task, using it is mandatory, not optional.
</skill_overview>
<rigidity_level>
HIGH FREEDOM - The meta-process (check for skills, use Skill tool, announce usage) is rigid, but each individual skill defines its own rigidity level.
</rigidity_level>
<quick_reference>
**Before responding to ANY user message:**
1. List available skills mentally
2. Ask: "Does ANY skill match this request?"
3. If yes → Use Skill tool to load the skill file
4. Announce which skill you're using
5. Follow the skill exactly as written
**Skill has checklist?** Create TodoWrite for every item.
**Finding a relevant skill = mandatory to use it.**
</quick_reference>
<when_to_use>
This skill applies at the start of EVERY conversation and BEFORE every task:
- User asks you to implement a feature
- User asks you to fix a bug
- User asks you to refactor code
- User asks you to debug an issue
- User asks you to write tests
- User asks you to review code
- User describes a problem to solve
- User provides requirements to implement
**Applies to:** Literally any task that might have a corresponding skill.
</when_to_use>
<the_process>
## 1. MANDATORY FIRST RESPONSE PROTOCOL
Before responding to ANY user message, complete this checklist:
1. ☐ List available skills in your mind
2. ☐ Ask yourself: "Does ANY skill match this request?"
3. ☐ If yes → Use the Skill tool to read and run the skill file
4. ☐ Announce which skill you're using
5. ☐ Follow the skill exactly
**Responding WITHOUT completing this checklist = automatic failure.**
---
## 2. Execute Skills with the Skill Tool
**Always use the Skill tool to load skills.** Never rely on memory.
```
Skill tool: "hyperpowers:test-driven-development"
```
**Why:**
- Skills evolve - you need the current version
- Using the tool ensures you get the full skill content
- Confirms to user you're following the skill
---
## 3. Announce Skill Usage
Before using a skill, announce it:
**Format:** "I'm using [Skill Name] to [what you're doing]."
**Examples:**
- "I'm using hyperpowers:brainstorming to refine your idea into a design."
- "I'm using hyperpowers:test-driven-development to implement this feature."
- "I'm using hyperpowers:debugging-with-tools to investigate this error."
**Why:** Transparency helps user understand your process and catch errors early. Confirms you actually read the skill.
---
## 4. Follow Mandatory Workflows
**Before writing ANY code:**
- Use hyperpowers:brainstorming to refine requirements
- Use hyperpowers:writing-plans to create detailed plan
- Use hyperpowers:executing-plans to implement iteratively
**When implementing:**
- Use hyperpowers:test-driven-development (RED-GREEN-REFACTOR cycle)
- Use hyperpowers:verification-before-completion before claiming done
**When debugging:**
- Use hyperpowers:debugging-with-tools (tools first, fixes second)
- Use hyperpowers:fixing-bugs (complete workflow from discovery to closure)
**User instructions describe WHAT to do, not HOW.** "Add X" means use brainstorming, TDD, verification. Not permission to skip workflows.
---
## 5. Create TodoWrite for Skill Checklists
If a skill has a checklist, YOU MUST create TodoWrite todos for EACH item.
**Don't:**
- Work through checklist mentally
- Skip creating todos "to save time"
- Batch multiple items into one todo
- Mark complete without doing them
**Why:** Checklists without TodoWrite tracking = steps get skipped. Every time. The overhead is tiny compared to missing steps.
**Example:**
```
Skill has verification checklist:
- [ ] All tests pass
- [ ] No linter warnings
- [ ] bd task updated
TodoWrite todos:
1. Run all tests and verify they pass
2. Run linter and verify no warnings
3. Update bd task with completion status
```
</the_process>
<examples>
<example>
<scenario>User asks to implement a new feature</scenario>
<code>
User: "Add a user profile page with avatar upload"
Claude (without using-hyper):
"Sure! Let me start implementing the user profile page..."
[Starts writing code immediately without brainstorming or planning]
</code>
<why_it_fails>
- Skipped mandatory brainstorming (requirements unclear)
- Skipped writing-plans (no detailed implementation plan)
- Skipped test-driven-development (no tests first)
- Will likely miss edge cases, make wrong assumptions
- User has no visibility into approach before code is written
</why_it_fails>
<correction>
**Correct approach:**
Claude: "I'm using hyperpowers:brainstorming to refine your requirements into a complete design."
[Skill tool loads brainstorming]
[Asks Socratic questions about avatar size limits, formats, storage, etc.]
[Creates refined requirements]
Claude: "Now I'm using hyperpowers:writing-plans to create a detailed implementation plan."
[Skill tool loads writing-plans]
[Creates bd epic with tasks]
Claude: "Now I'm using hyperpowers:executing-plans to implement the tasks iteratively."
[Skill tool loads executing-plans]
[Implements with TDD, commits incrementally]
**What you gain:**
- Clear requirements before coding
- Detailed plan user can review
- Implementation follows proven workflow
- Tests written first (TDD)
- Progress tracked in bd
</correction>
</example>
<example>
<scenario>User reports a bug</scenario>
<code>
User: "Users can't log in after password reset"
Claude (without using-hyper):
"Let me check the password reset code... I found the issue, it's not updating the password hash. Let me fix that..."
[Makes immediate fix without investigation]
</code>
<why_it_fails>
- Skipped debugging-with-tools (no systematic investigation)
- Skipped fixing-bugs workflow (no bd issue, no regression test)
- "Obvious" fix might be wrong or incomplete
- No verification that fix actually works
- Will likely miss root cause and fix symptom
</why_it_fails>
<correction>
**Correct approach:**
Claude: "I'm using hyperpowers:debugging-with-tools to investigate this systematically."
[Skill tool loads debugging-with-tools]
[Uses debugger, checks logs, searches internet for known issues]
[Identifies root cause: token invalidation timing issue]
Claude: "Now I'm using hyperpowers:fixing-bugs to implement a complete fix."
[Skill tool loads fixing-bugs]
[Creates bd bug issue]
[Writes failing test reproducing bug (RED)]
[Implements minimal fix (GREEN)]
[Runs full test suite via test-runner agent]
[Verifies fix and closes bd issue]
**What you gain:**
- Root cause identified (not symptom)
- Regression test prevents recurrence
- Fix verified with evidence
- Bug tracked in bd
- Complete workflow documented
</correction>
</example>
<example>
<scenario>User asks to refactor code</scenario>
<code>
User: "This validation logic is duplicated, can you extract it?"
Claude (without using-hyper):
"Sure, let me extract that into a shared function..."
[Changes 5 files at once, then runs tests]
[Tests fail]
"Hmm, let me fix these test failures..."
</code>
<why_it_fails>
- Skipped refactoring-safely workflow (change→test→commit cycle)
- Big-bang refactoring (changed multiple files before testing)
- Tests failing means behavior changed (not safe refactoring)
- Can't identify which change broke what
- Wastes time debugging entire refactoring
</why_it_fails>
<correction>
**Correct approach:**
Claude: "I'm using hyperpowers:refactoring-safely to extract this validation logic incrementally."
[Skill tool loads refactoring-safely]
Step 1: Verify tests pass BEFORE refactoring
Step 2: Create bd refactoring task
Step 3: Extract validation from first file → test → commit
Step 4: Extract validation from second file → test → commit
Step 5: Create shared validator → test → commit
Step 6: Final verification → close bd task
**What you gain:**
- Tests stay green throughout (safe refactoring)
- Each commit is reviewable independently
- Know exactly which change broke if test fails
- Can stop halfway with useful progress
- Clear history of transformations
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Check for relevant skills BEFORE any task** → If skill exists, use it (not optional)
2. **Use Skill tool to load skills** → Never rely on memory (skills evolve)
3. **Announce skill usage** → Transparency helps catch errors early
4. **Follow mandatory workflows** → brainstorming before coding, TDD for implementation, verification before claiming done
5. **Create TodoWrite for checklists** → Mental tracking = skipped steps
## Common Rationalizations
All of these mean: **STOP. Check for and use the relevant skill.**
- "This is just a simple question" (Questions are tasks. Check for skills.)
- "I can check git/files quickly" (Files lack context. Check for skills.)
- "Let me gather information first" (Skills tell you HOW to gather. Check for skills.)
- "This doesn't need a formal skill" (If skill exists, use it. Not optional.)
- "I remember this skill" (Skills evolve. Use Skill tool to load current version.)
- "This doesn't count as a task" (Taking action = task. Check for skills.)
- "The skill is overkill for this" (Skills exist because "simple" becomes complex.)
- "I'll just do this one thing first" (Check for skills BEFORE doing anything.)
- "Instruction was specific so I can skip brainstorming" (Specific instructions = WHAT, not HOW. Use workflows.)
</critical_rules>
<understanding_rigidity>
## Rigid Skills (Follow Exactly)
These have LOW FREEDOM - follow the exact process:
- hyperpowers:test-driven-development (RED-GREEN-REFACTOR cycle)
- hyperpowers:verification-before-completion (evidence before claims)
- hyperpowers:executing-plans (continuous execution, substep tracking)
## Flexible Skills (Adapt Principles)
These have HIGH FREEDOM - adapt core principles to context:
- hyperpowers:brainstorming (Socratic method, but questions vary)
- hyperpowers:managing-bd-tasks (operations adapt to project)
- hyperpowers:sre-task-refinement (corner case analysis, but depth varies)
**The skill itself tells you its rigidity level.** Check `<rigidity_level>` section.
</understanding_rigidity>
<instructions_vs_workflows>
## User Instructions Describe WHAT, Not HOW
**User says:** "Add user authentication"
**This means:** Use brainstorming → writing-plans → executing-plans → TDD → verification
**User says:** "Fix this bug"
**This means:** Use debugging-with-tools → fixing-bugs → TDD → verification
**User says:** "Refactor this code"
**This means:** Use refactoring-safely (change→test→commit cycle)
**User instructions are the GOAL, not permission to skip workflows.**
**Red flags that you're rationalizing:**
- "Instruction was specific, don't need brainstorming"
- "Seems simple, don't need TDD"
- "Workflow is overkill for this"
**Why workflows matter MORE when instructions are specific:**
- Clear requirements = perfect time for structured implementation
- "Simple" tasks often have hidden complexity
- Skipping process on "easy" tasks is how they become hard problems
</instructions_vs_workflows>
<verification_checklist>
Before completing ANY task:
- [ ] Did I check for relevant skills before starting?
- [ ] Did I use Skill tool to load skills (not rely on memory)?
- [ ] Did I announce which skill I'm using?
- [ ] Did I follow the skill's process exactly?
- [ ] Did I create TodoWrite for any skill checklists?
- [ ] Did I follow mandatory workflows (brainstorming, TDD, verification)?
**Can't check all boxes?** You skipped critical steps. Review and fix.
</verification_checklist>
<integration>
**This skill calls:**
- ALL other skills (meta-skill that triggers appropriate skill usage)
**This skill is called by:**
- Session start (always loaded)
- User requests (check before every task)
**Critical workflows this establishes:**
- hyperpowers:brainstorming (before writing code)
- hyperpowers:test-driven-development (during implementation)
- hyperpowers:verification-before-completion (before claiming done)
</integration>
<resources>
**Available skills:**
- See skill descriptions in Skill tool's "Available Commands" section
- Each skill's description shows when to use it
**When unsure if skill applies:**
- If there's even 1% chance it applies → use it
- Better to load and decide "not needed" than to skip and fail
- Skills are optimized, loading them is cheap
</resources>

View File

@@ -0,0 +1,363 @@
---
name: verification-before-completion
description: Use before claiming work complete, fixed, or passing - requires running verification commands and confirming output; evidence before assertions always
---
<skill_overview>
Claiming work is complete without verification is dishonesty, not efficiency. Evidence before claims, always.
</skill_overview>
<rigidity_level>
LOW FREEDOM - NO exceptions. Run verification command, read output, THEN make claim.
No shortcuts. No "should work". No partial verification. Run it, prove it.
</rigidity_level>
<quick_reference>
| Claim | Verification Required | Not Sufficient |
|-------|----------------------|----------------|
| **Tests pass** | Run full test command, see 0 failures | Previous run, "should pass" |
| **Build succeeds** | Run build, see exit 0 | Linter passing |
| **Bug fixed** | Test original symptom, passes | Code changed |
| **Task complete** | Check all success criteria, run verifications | "Implemented bd-3" |
| **Epic complete** | `bd list --status open --parent bd-1` shows 0 | "All tasks done" |
**Iron Law:** NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
**Use test-runner agent for:** Tests, pre-commit hooks, commits (keeps verbose output out of context)
</quick_reference>
<when_to_use>
**ALWAYS before:**
- Any success/completion claim
- Any expression of satisfaction
- Committing, PR creation, task completion
- Moving to next task
- ANY communication suggesting completion/correctness
**Red flags you need this:**
- Using "should", "probably", "seems to"
- Expressing satisfaction before verification ("Great!", "Perfect!")
- About to commit/push without verification
- Trusting agent success reports
- Relying on partial verification
</when_to_use>
<the_process>
## The Gate Function
Before claiming ANY status:
### 1. Identify
What command proves this claim?
### 2. Run
Execute the full command (fresh, complete).
**For tests/hooks/commits:** Use `hyperpowers:test-runner` agent
- Agent captures verbose output in its context
- Returns only summary + failures
- Prevents context pollution
**For other commands:** Run directly and capture output
### 3. Read
Full output, check exit code, count failures.
### 4. Verify
Does output confirm the claim?
- If NO: State actual status with evidence
- If YES: State claim with evidence
### 5. Only Then
Make the claim.
**Skip any step = lying, not verifying**
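A minimal sketch of the gate applied to the claim "tests pass", assuming a cargo project; the command, output handling, and log length are illustrative. In real use you would dispatch the test-runner agent for a command this verbose, so the sketch only shows the ordering (identify, run, read, verify, then claim).

```python
import subprocess

# 1. Identify: `cargo test` is the command that proves the claim "tests pass".
# 2. Run: execute it fresh and in full, not a cached or partial run.
result = subprocess.run(["cargo", "test"], capture_output=True, text=True)

# 3. Read: the actual output and exit code, not a memory of an earlier run.
output = result.stdout + result.stderr
print(output[-2000:])

# 4./5. Verify, then claim, with evidence; otherwise state the real status.
if result.returncode == 0:
    print("Tests pass [Ran: cargo test, exit code 0]")
else:
    print(f"Tests FAILED (exit code {result.returncode}), investigating before any completion claim")
```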
</the_process>
<examples>
<example>
<scenario>Developer claims tests pass without running them</scenario>
<code>
Developer modifies authentication logic.
Developer thinks: "This fix is straightforward, tests should pass now"
Developer writes: "Great! All tests passing. The bug is fixed."
[No test command run, no output shown]
</code>
<why_it_fails>
**No evidence:**
- "Should pass" ≠ evidence
- Confidence ≠ verification
- Might have broken other tests
- Might not have fixed the bug
**Why dangerous:**
- Broken code ships
- Trust broken with partner
- Wastes time on false completion
</why_it_fails>
<correction>
**Run verification:**
```bash
# Dispatch test-runner agent
"Run: cargo test"
```
**Agent returns:**
```
Summary: 33 passed, 1 failed
Failures:
- test_login_with_expired_token: assertion failed
Expected: Err(AuthError::TokenExpired)
Got: Ok(User { id: 123 })
```
**State actual status:**
```
Tests: 33 passed, 1 failed.
Failure: test_login_with_expired_token still fails.
The fix didn't handle expired tokens.
Investigating...
```
**Result:** Accurate status, bug caught before committing.
</correction>
</example>
<example>
<scenario>Developer closes bd task without verifying success criteria</scenario>
<code>
bd-3 success criteria:
- [ ] All functions fully implemented (no stubs, no TODOs)
- [ ] Tests written and passing: test_generate_token, test_verify_token
- [ ] Pre-commit hooks pass
Developer implements token functions.
Developer thinks: "I implemented everything, task complete"
Runs: bd close bd-3
[No verification commands run]
</code>
<why_it_fails>
**Success criteria not verified:**
- Might have TODO comments left
- Specific tests not run
- Pre-commit hooks not checked
**Result:** Incomplete task closed, issues discovered later.
</why_it_fails>
<correction>
**Verify each criterion:**
```bash
# 1. Check for TODOs
rg "TODO|FIXME|todo!|unimplemented!" src/
# Output: [nothing found] ✅
# 2. Run specific tests
cargo test test_generate_token test_verify_token
# Output: test result: ok. 2 passed ✅
# 3. Run pre-commit hooks (via test-runner agent)
"Run: cargo fmt --check && cargo clippy && cargo test"
```
**Agent returns:**
```
All checks passed:
- cargo fmt: no formatting issues
- cargo clippy: no warnings
- cargo test: 34/34 passed
```
**THEN close task:**
```bash
bd close bd-3
```
**Result:** Task actually complete, all criteria met with evidence.
</correction>
</example>
<example>
<scenario>Developer claims build works based on linter passing</scenario>
<code>
Developer fixes TypeScript errors flagged by linter.
Runs: npm run lint
Output: ✅ No issues found
Developer writes: "Build passes! Ready to commit."
[No build command run]
</code>
<why_it_fails>
**Linter ≠ compiler:**
- Linter checks style and simple errors
- Doesn't verify imports, types, compilation
- Build might still fail
**Why it happens:** Partial verification, extrapolation.
</why_it_fails>
<correction>
**Run actual build:**
```bash
npm run build
```
**Output:**
```
ERROR in src/auth/service.ts:23:15
TS2304: Cannot find name 'validateToken'
Did you mean 'verifyToken'?
```
**State actual status:**
```
Linter passes, but build fails.
Error: validateToken not defined (should be verifyToken).
Fixing...
```
**After fix, run build again:**
```bash
npm run build
# Output: ✓ built in 1.2s
```
**Now can claim:**
```
Build passes. [Evidence: npm run build exit 0]
Ready to commit.
```
**Result:** Actual build status verified, error caught.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **No claims without fresh verification** → Run command, see output, THEN claim
- "Should work" = forbidden
- "Looks correct" = forbidden
- Previous run ≠ fresh verification
2. **Use test-runner agent for verbose commands** → Tests, hooks, commits
- Prevents context pollution
- Returns summary + failures only
- Never run `git commit` or `cargo test` directly if output is verbose
3. **Verify ALL success criteria** → Not just "tests pass"
- Read each criterion from bd task
- Run verification for each
- Check all pass before closing
4. **Evidence in every claim** → Show the output
- Not: "Tests pass"
- Yes: "Tests pass [Ran: cargo test, Output: 34/34 passed]"
## Common Excuses
All of these mean: Stop, run verification:
- "Should work now"
- "I'm confident this fixes it"
- "Just this once"
- "Linter passed" (when claiming build works)
- "Agent said success" (without independent verification)
- "I'm tired" (exhaustion ≠ excuse)
- "Partial check is enough"
## Pre-Commit Hook Assumption
**If your project uses pre-commit hooks enforcing tests:**
- All test failures are from your current changes
- Never check if errors were "pre-existing"
- Don't run `git checkout <sha> && pytest` to verify
- Pre-commit hooks guarantee previous commit passed
- Just fix the error directly
</critical_rules>
<verification_checklist>
Before claiming tests pass:
- [ ] Ran full test command (not partial)
- [ ] Saw output showing 0 failures
- [ ] Used test-runner agent if output verbose
Before claiming build succeeds:
- [ ] Ran build command (not just linter)
- [ ] Saw exit code 0
- [ ] Checked for compilation errors
Before closing bd task:
- [ ] Re-read success criteria from bd task
- [ ] Ran verification for each criterion
- [ ] Saw evidence all pass
- [ ] THEN closed task
Before closing bd epic (a combined sketch follows this checklist):
- [ ] Ran `bd list --status open --parent bd-1`
- [ ] Saw 0 open tasks
- [ ] Ran `bd dep tree bd-1`
- [ ] Confirmed all tasks closed
- [ ] THEN closed epic
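A combined sketch of the epic checks above; the bd commands are the ones this checklist names, while treating non-empty output as "open tasks remain" is an assumption about bd's output format rather than documented behavior.

```python
import subprocess

def run(*args):
    """Run a bd command and return trimmed stdout."""
    return subprocess.run(args, capture_output=True, text=True).stdout.strip()

open_tasks = run("bd", "list", "--status", "open", "--parent", "bd-1")
if open_tasks:
    print("Epic NOT complete, open tasks remain:\n" + open_tasks)
else:
    print(run("bd", "dep", "tree", "bd-1"))  # confirm every task shows as closed
    print("0 open tasks: safe to run `bd close bd-1`")
```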
</verification_checklist>
<integration>
**This skill calls:**
- test-runner (for verbose verification commands)
**This skill is called by:**
- test-driven-development (verify tests pass/fail)
- executing-plans (verify task success criteria)
- refactoring-safely (verify tests still pass)
- ALL skills before completion claims
**Agents used:**
- hyperpowers:test-runner (run tests, hooks, commits without output pollution)
</integration>
<resources>
**When stuck:**
- Tempted to say "should work" → Run the verification
- Agent reports success → Check VCS diff, verify independently
- Partial verification → Run complete command
- Tired and want to finish → Run verification anyway, no exceptions
**Verification patterns:**
- Tests: Use test-runner agent, check 0 failures
- Build: Run build command, check exit 0
- bd task: Verify each success criterion
- bd epic: Check all tasks closed with bd list/dep tree
</resources>

View File

@@ -0,0 +1,478 @@
---
name: writing-plans
description: Use to expand bd tasks with detailed implementation steps - adds exact file paths, complete code, verification commands assuming zero context
---
<skill_overview>
Enhance bd tasks with comprehensive implementation details for engineers with zero codebase context. Expand checklists into explicit steps: which files, complete code examples, exact commands, verification steps.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Follow task-by-task validation pattern, use codebase-investigator for verification.
Adapt implementation details to actual codebase state. Never use placeholders or meta-references.
</rigidity_level>
<quick_reference>
| Step | Action | Critical Rule |
|------|--------|---------------|
| **Identify Scope** | Single task, range, or full epic | No artificial limits |
| **Verify Codebase** | Use `codebase-investigator` agent | NEVER verify yourself, report discrepancies |
| **Draft Steps** | Write bite-sized (2-5 min) actions | Follow TDD cycle for new features |
| **Present to User** | Show COMPLETE expansion FIRST | Then ask for approval |
| **Update bd** | `bd update bd-N --design "..."` | Only after user approves |
| **Continue** | Move to next task automatically | NO asking permission between tasks |
**FORBIDDEN:** Placeholders like `[Full implementation steps as detailed above]`
**REQUIRED:** Actual content - complete code, exact paths, real commands
</quick_reference>
<when_to_use>
**Use after hyperpowers:sre-task-refinement or anytime tasks need more detail.**
Symptoms:
- bd tasks have implementation checklists but need expansion
- Engineer needs step-by-step guide with zero context
- Want explicit file paths, complete code examples
- Need exact verification commands
</when_to_use>
<the_process>
## 1. Identify Tasks to Expand
**User specifies scope:**
- Single: "Expand bd-2"
- Range: "Expand bd-2 through bd-5"
- Epic: "Expand all tasks in bd-1"
**If epic:**
```bash
bd dep tree bd-1 # View complete dependency tree
# Note all child task IDs
```
**Create TodoWrite tracker:**
```
- [ ] bd-2: [Task Title]
- [ ] bd-3: [Task Title]
...
```
## 2. For EACH Task (Loop Until All Done)
### 2a. Mark In Progress and Read Current State
```bash
# Mark in TodoWrite: in_progress
bd show bd-3 # Read current task design
```
### 2b. Verify Codebase State
**CRITICAL: Use codebase-investigator agent, NEVER verify yourself.**
**Provide agent with bd assumptions:**
```
Assumptions from bd-3:
- Auth service should be in src/services/auth.ts with login() and logout()
- User model in src/models/user.ts with email and password fields
- Test file at tests/services/auth.test.ts
- Uses bcrypt dependency for password hashing
Verify these assumptions and report:
1. What exists vs what bd-3 expects
2. Structural differences (different paths, functions, exports)
3. Missing or additional components
4. Current dependency versions
```
**Based on investigator report:**
- ✓ Confirmed assumptions → Use in implementation
- ✗ Incorrect assumptions → Adjust plan to match reality
- + Found additional → Document and incorporate
**NEVER write conditional steps:**
❌ "Update `index.js` if exists"
❌ "Modify `config.py` (if present)"
**ALWAYS write definitive steps:**
✅ "Create `src/auth.ts`" (investigator confirmed doesn't exist)
✅ "Modify `src/index.ts:45-67`" (investigator confirmed exists)
### 2c. Draft Expanded Implementation Steps
**Bite-sized granularity (2-5 minutes per step):**
For new features (follow test-driven-development):
1. Write the failing test (one step)
2. Run it to verify it fails (one step)
3. Implement minimal code to pass (one step)
4. Run tests to verify they pass (one step)
5. Commit (one step)
**Include in each step:**
- Exact file path
- Complete code example (not pseudo-code)
- Exact command to run
- Expected output
### 2d. Present COMPLETE Expansion to User
**CRITICAL: Show the full expansion BEFORE asking for approval.**
**Format:**
```markdown
**bd-[N]: [Task Title]**
**From bd issue:**
- Goal: [From bd show]
- Effort estimate: [From bd issue]
- Success criteria: [From bd issue]
**Codebase verification findings:**
- ✓ Confirmed: [what matched]
- ✗ Incorrect: [what issue said] - ACTUALLY: [reality]
- + Found: [unexpected discoveries]
**Implementation steps based on actual codebase state:**
### Step Group 1: [Component Name]
**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`
**Step 1: Write the failing test**
```python
# tests/auth/test_login.py
def test_login_with_valid_credentials():
user = create_test_user(email="test@example.com", password="secure123")
result = login(email="test@example.com", password="secure123")
assert result.success is True
assert result.user_id == user.id
```
**Step 2: Run test to verify it fails**
```bash
pytest tests/auth/test_login.py::test_login_with_valid_credentials
# Expected: ModuleNotFoundError: No module named 'auth.login'
```
[... continue for all steps ...]
```
**THEN ask for approval using AskUserQuestion:**
- Question: "Is this expansion approved for bd-[N]?"
- Options:
- "Approved - continue to next task"
- "Needs revision"
- "Other"
### 2e. If Approved: Update bd and Continue
```bash
bd update bd-3 --design "[paste complete expansion]"
# Mark completed in TodoWrite
# IMMEDIATELY continue to next task (NO asking permission)
```
### 2f. If Needs Revision: Iterate
- Keep as in_progress in TodoWrite
- Revise based on feedback
- Present again (step 2d)
## 3. After ALL Tasks Done
```
All bd issues now contain detailed implementation steps.
Epic ready for execution.
```
**Offer execution choice:**
"Ready to execute? I can use hyperpowers:executing-plans to implement iteratively."
</the_process>
<examples>
<example>
<scenario>Developer writes placeholder text instead of actual implementation steps</scenario>
<code>
bd update bd-3 --design "## Goal
Implement user authentication
## Implementation
[Full implementation steps as detailed above - includes all 6 step groups with complete code examples]
## Tests
[Complete code examples will be added here]"
</code>
<why_it_fails>
**Placeholders defeat the purpose:**
- Engineer executing bd-3 has zero context
- There is no "above" to reference (bd issue is the source of truth)
- Violates "exact instructions for zero-context engineer" principle
- Makes task impossible to execute
**Common placeholder patterns (ALL FORBIDDEN):**
- `[Full implementation steps as detailed above]`
- `[See above for detailed steps]`
- `[As specified in success criteria]`
- `[Complete code examples will be added here]`
</why_it_fails>
<correction>
**Write actual content:**
```bash
bd update bd-3 --design "## Goal
Implement user authentication
## Implementation
### Step 1: Write failing login test
```python
# tests/auth/test_login.py
import pytest
from auth.service import login
def test_login_with_valid_credentials():
result = login(email='test@example.com', password='pass123')
assert result.success is True
```
### Step 2: Run test (should fail)
```bash
pytest tests/auth/test_login.py::test_login_with_valid_credentials
# Expected: ModuleNotFoundError: No module named 'auth.service'
```
### Step 3: Create login function
```python
# src/auth/service.py
from dataclasses import dataclass
@dataclass
class LoginResult:
success: bool
user_id: int | None = None
def login(email: str, password: str) -> LoginResult:
# Minimal implementation
return LoginResult(success=True, user_id=1)
```
[... continue for all steps with complete code ...]
## Tests
All test code included in implementation steps above following TDD cycle."
```
**Result:** Engineer can execute without any context.
</correction>
</example>
<example>
<scenario>Developer verifies codebase state themselves instead of using codebase-investigator agent</scenario>
<code>
Developer reads files manually:
- Reads src/services/auth.ts directly
- Checks package.json manually
- Assumes file structure based on quick look
Writes expansion based on quick check:
"Modify src/services/auth.ts (if exists)"
</code>
<why_it_fails>
**Manual verification problems:**
- Misses nuances (existing functions, imports, structure)
- Creates conditional steps ("if exists")
- Doesn't catch version mismatches
- Doesn't report discrepancies from bd assumptions
**Result:** Implementation plan may not match actual codebase state.
</why_it_fails>
<correction>
**Use codebase-investigator agent:**
```
Dispatch agent with bd-3 assumptions:
"bd-3 expects auth service in src/services/auth.ts with login() and logout() functions.
Verify:
1. Does src/services/auth.ts exist?
2. What functions does it export?
3. How do login() and logout() work currently?
4. Any other relevant auth code?
5. What's the bcrypt version?"
```
**Agent reports:**
```
✓ src/services/auth.ts exists
✗ ONLY has login() function - NO logout() yet
+ Found: login() uses argon2 NOT bcrypt
+ Found: Session management in src/services/session.ts
✓ argon2 version: 0.31.2
```
**Write definitive steps based on findings:**
```
Step 1: Add logout() function to EXISTING src/services/auth.ts:45-67
(no "if exists" - investigator confirmed location)
Step 2: Use argon2 (already installed 0.31.2) not bcrypt
(no assumption - investigator confirmed actual dependency)
```
**Result:** Plan matches actual codebase state.
</correction>
</example>
<example>
<scenario>Developer asks permission between each task validation instead of continuing automatically</scenario>
<code>
After user approves bd-3 expansion:
Developer: "bd-3 expansion approved and updated in bd.
Should I continue to bd-4 now? What's your preference?"
[Waits for user response]
</code>
<why_it_fails>
**Breaks workflow momentum:**
- Unnecessary interruption
- User has to respond multiple times
- Slows down batch processing
- TodoWrite list IS the plan
**Why it happens:** Over-asking for permission instead of executing the plan.
</why_it_fails>
<correction>
**After user approves bd-3:**
```bash
bd update bd-3 --design "[expansion]" # Update bd
# Mark completed in TodoWrite
```
**IMMEDIATELY continue to bd-4:**
```bash
bd show bd-4 # Read next task
# Dispatch codebase-investigator with bd-4 assumptions
# Draft expansion
# Present bd-4 expansion to user
```
**NO asking:** "Should I continue?" or "What's your preference?"
**ONLY ask user:**
1. When presenting each task expansion for validation
2. At the VERY END after ALL tasks done to offer execution choice
**Between validations: JUST CONTINUE.**
**Result:** Efficient batch processing of all tasks.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **No placeholders or meta-references** → Write actual content
- ❌ FORBIDDEN: `[Full implementation steps as detailed above]`
- ✅ REQUIRED: Complete code, exact paths, real commands
2. **Use codebase-investigator agent** → Never verify yourself
- Agent gets bd assumptions
- Agent reports discrepancies
- You adjust plan to match reality
3. **Present COMPLETE expansion before asking** → User must SEE before approving
- Show full expansion in message text
- Then use AskUserQuestion for approval
- Never ask without showing first
4. **Continue automatically between validations** → Don't ask permission
- TodoWrite list IS your plan
- Execute it completely
- Only ask: (a) task validation, (b) final execution choice
5. **Write definitive steps** → Never conditional
- ❌ "Update `index.js` if exists"
- ✅ "Create `src/auth.ts`" (investigator confirmed)
## Common Excuses
All of these mean: Stop, write actual content (a quick self-check sketch follows this list):
- "I'll add the details later"
- "The implementation is obvious from the goal"
- "See above for the steps"
- "User can figure out the code"
</critical_rules>
<verification_checklist>
Before marking each task complete in TodoWrite:
- [ ] Used codebase-investigator agent (not manual verification)
- [ ] Presented COMPLETE expansion to user (showed full text)
- [ ] User approved expansion (via AskUserQuestion)
- [ ] Updated bd with actual content (no placeholders)
- [ ] No meta-references in design field
Before finishing all tasks:
- [ ] All tasks in TodoWrite marked completed
- [ ] All bd issues updated with expansions
- [ ] No conditional steps ("if exists")
- [ ] Complete code examples in all steps
- [ ] Exact file paths and commands throughout
</verification_checklist>
<integration>
**This skill calls:**
- sre-task-refinement (optional, can run before this)
- codebase-investigator (REQUIRED for each task verification)
- executing-plans (offered after all tasks expanded)
**This skill is called by:**
- User (via /hyperpowers:write-plan command)
- After brainstorming creates epic
**Agents used:**
- hyperpowers:codebase-investigator (verify assumptions, report discrepancies)
</integration>
<resources>
**Detailed guidance:**
- [bd command reference](../common-patterns/bd-commands.md)
- [Task structure examples](resources/task-examples.md) (if exists)
**When stuck:**
- Unsure about file structure → Use codebase-investigator
- Don't know version → Use codebase-investigator
- Tempted to write "if exists" → Use codebase-investigator first
- About to write placeholder → Stop, write actual content
- Want to ask permission → Check: Is this task validation or final choice? If neither, don't ask
</resources>

View File

@@ -0,0 +1,599 @@
---
name: writing-skills
description: Use when creating new skills, editing existing skills, or verifying skills work - applies TDD to documentation by testing with subagents before writing
---
<skill_overview>
Writing skills IS test-driven development applied to process documentation; write test (pressure scenario), watch fail (baseline), write skill, watch pass, refactor (close loopholes).
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the RED-GREEN-REFACTOR cycle exactly when creating skills. No skill without failing test first. Same Iron Law as TDD.
</rigidity_level>
<quick_reference>
| Phase | Action | Verify |
|-------|--------|--------|
| **RED** | Create pressure scenarios | Document baseline failures |
| **RED** | Run WITHOUT skill | Agent violates rule |
| **GREEN** | Write minimal skill | Addresses baseline failures |
| **GREEN** | Run WITH skill | Agent now complies |
| **REFACTOR** | Find new rationalizations | Agent still complies |
| **REFACTOR** | Add explicit counters | Bulletproof against excuses |
| **DEPLOY** | Commit and optionally PR | Skill ready for use |
**Iron Law:** NO SKILL WITHOUT FAILING TEST FIRST (applies to new skills AND edits)
</quick_reference>
<when_to_use>
**Create skill when:**
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit from this knowledge
**Never create for:**
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md instead)
**Edit existing skill when:**
- Found new rationalization agents use
- Discovered loophole in current guidance
- Need to add clarifying examples
**ALWAYS test before writing or editing. No exceptions.**
</when_to_use>
<tdd_mapping>
Skills use the exact same TDD cycle as code:
| TDD Concept | Skill Creation |
|-------------|----------------|
| **Test case** | Pressure scenario with subagent |
| **Production code** | Skill document (SKILL.md) |
| **Test fails (RED)** | Agent violates rule without skill |
| **Test passes (GREEN)** | Agent complies with skill present |
| **Refactor** | Close loopholes while maintaining compliance |
| **Write test first** | Run baseline scenario BEFORE writing skill |
| **Watch it fail** | Document exact rationalizations agent uses |
| **Minimal code** | Write skill addressing those specific violations |
| **Watch it pass** | Verify agent now complies |
| **Refactor cycle** | Find new rationalizations → plug → re-verify |
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development before using this skill.
</tdd_mapping>
<the_process>
## 1. RED Phase - Create Failing Test
**Create pressure scenarios for subagent:**
```
Task tool with general-purpose agent:
"You are implementing a payment processing feature. User requirements:
- Process credit card payments
- Handle retries on failure
- Log all transactions
[PRESSURE 1: Time] You have 10 minutes before deployment.
[PRESSURE 2: Sunk Cost] You've already written 200 lines of code.
[PRESSURE 3: Authority] Senior engineer said 'just make it work, tests can wait.'
Implement this feature."
```
**Run WITHOUT skill present.**
**Document baseline behavior:**
- Exact rationalizations agent uses ("tests can wait," "simple feature," etc.)
- What agent skips (tests, verification, bd task, etc.)
- Patterns in failure modes
**Example baseline result:**
```
Agent response:
"I'll implement the payment processing quickly since time is tight..."
[Skips TDD]
[Skips verification-before-completion]
[Claims done without evidence]
```
**This is your failing test.** Agent doesn't follow the workflow without guidance.
---
## 2. GREEN Phase - Write Minimal Skill
Write skill that addresses the SPECIFIC failures from baseline:
**Structure:**
```markdown
---
name: skill-name-with-hyphens
description: Use when [specific triggers] - [what skill does]
---
<skill_overview>
One sentence core principle
</skill_overview>
<rigidity_level>
LOW | MEDIUM | HIGH FREEDOM - [What this means]
</rigidity_level>
[Rest of standard XML structure]
```
**Frontmatter rules:**
- Only `name` and `description` fields (max 1024 chars total)
- Name: letters, numbers, hyphens only (no parentheses/special chars)
- Description: Start with "Use when...", third person, includes triggers
**Description format:**
```yaml
# ❌ BAD: Too abstract, first person
description: I can help with async tests when they're flaky
# ✅ GOOD: Starts with "Use when", describes problem
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
```
**Write skill addressing baseline failures:**
- Add explicit counters for rationalizations ("tests can wait" → "NO EXCEPTIONS: tests first")
- Create quick reference table for scanning
- Add concrete examples showing failure modes
- Use XML structure for all sections
**Run WITH skill present.**
**Verify agent now complies:**
- Same pressure scenario
- Agent now follows workflow
- No rationalizations from baseline appear
**This is your passing test.**
---
## 3. REFACTOR Phase - Close Loopholes
**Find NEW rationalizations:**
Run skill with DIFFERENT pressures:
- Combine 3+ pressures (time + sunk cost + exhaustion)
- Try meta-rationalizations ("this skill doesn't apply because...")
- Test with edge cases
**Document new failures:**
- What rationalizations appear NOW?
- What loopholes did agent find?
- What explicit counters are needed?
**Add counters to skill:**
```markdown
<critical_rules>
## Common Excuses
All of these mean: [Action to take]
- "Test can wait" (NO, test first always)
- "Simple feature" (Simple breaks too, test first)
- "Time pressure" (Broken code wastes more time)
[Add ALL rationalizations found during testing]
</critical_rules>
```
**Re-test until bulletproof:**
- Run scenarios again
- Verify new counters work
- Agent complies even under combined pressures
---
## 4. Quality Checks
Before deployment, verify (a rough lint sketch for the mechanical items follows this section):
- [ ] Has `<quick_reference>` section (scannable table)
- [ ] Has `<rigidity_level>` explicit
- [ ] Has 2-3 `<example>` tags showing failure modes
- [ ] Description <500 chars, starts with "Use when..."
- [ ] Keywords throughout for search (error messages, symptoms, tools)
- [ ] One excellent code example (not multi-language)
- [ ] Supporting files only for tools or heavy reference (>100 lines)
**Token efficiency:**
- Frequently-loaded skills: <200 words ideally
- Other skills: <500 words
- Move heavy content to resources/ files
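A rough lint sketch for the mechanical items above; the thresholds mirror this skill's guidance, the path handling is illustrative, and judgment items (example quality, keyword coverage) still need a human read.

```python
import re
import sys

path = sys.argv[1]  # e.g. skills/my-skill/SKILL.md (illustrative path)
text = open(path, encoding="utf-8").read()

match = re.match(r"---\n(.*?)\n---\n", text, re.S)
front = match.group(1) if match else ""
fields = dict(re.findall(r"^(\w+):\s*(.*)$", front, re.M))

problems = []
if set(fields) - {"name", "description"}:
    problems.append("frontmatter should contain only name and description")
if len(front) > 1024:
    problems.append("frontmatter exceeds 1024 characters")
if not re.fullmatch(r"[A-Za-z0-9-]+", fields.get("name", "")):
    problems.append("name must use only letters, numbers, hyphens")
if not fields.get("description", "").startswith("Use when"):
    problems.append("description should start with 'Use when...'")

words = len(text.split())
if words > 500:
    problems.append(f"{words} words, consider moving detail to resources/ files")

print("\n".join(problems) or f"Mechanical checks pass ({words} words)")
```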
---
## 5. Deploy
**Commit to git:**
```bash
git add skills/skill-name/
git commit -m "feat: add [skill-name] skill
Tested with subagents under [pressures used].
Addresses [baseline failures found].
Closes rationalizations:
- [Rationalization 1]
- [Rationalization 2]"
```
**Personal skills:** Write to `~/.claude/skills/` for cross-project use
**Plugin skills:** PR to plugin repository if broadly useful
**STOP:** Before moving to next skill, complete this entire process. No batching untested skills.
</the_process>
<examples>
<example>
<scenario>Developer writes skill without testing first</scenario>
<code>
# Developer writes skill:
"---
name: always-use-tdd
description: Always write tests first
---
Write tests first. No exceptions."
# Then tries to deploy it
</code>
<why_it_fails>
- No baseline behavior documented (don't know what agent does WITHOUT skill)
- No verification skill actually works (might not address real rationalizations)
- Generic guidance ("no exceptions") without specific counters
- Will likely miss common excuses agents use
- Violates Iron Law: no skill without failing test first
</why_it_fails>
<correction>
**Correct approach (RED-GREEN-REFACTOR):**
**RED Phase:**
1. Create pressure scenario (time + sunk cost)
2. Run WITHOUT skill
3. Document baseline: Agent says "I'll test after since time is tight"
**GREEN Phase:**
1. Write skill with explicit counter to that rationalization
2. Add: "Common excuses: 'Time is tight' → Wrong. Broken code wastes more time. Write test first."
3. Run WITH skill → agent now writes test first
**REFACTOR Phase:**
1. Try new pressure (exhaustion: "this is the 5th feature today")
2. Agent finds loophole: "these are all similar, I can skip tests"
3. Add counter: "Similar ≠ identical. Write test for each."
4. Re-test → bulletproof
**What you gain:**
- Know skill addresses real failures (saw baseline)
- Confident skill works (saw it fix behavior)
- Closed all loopholes (tested multiple pressures)
- Ready for production use
</correction>
</example>
<example>
<scenario>Developer edits skill without testing changes</scenario>
<code>
# Existing skill works well
# Developer thinks: "I'll just add this section about edge cases"
[Adds 50 lines to skill]
# Commits without testing
</code>
<why_it_fails>
- Don't know if new section actually helps (no baseline)
- Might introduce contradictions with existing guidance
- Could make skill less effective (more verbose, less clear)
- Violates Iron Law: applies to edits too
- Changes might not address actual rationalization patterns
</why_it_fails>
<correction>
**Correct approach:**
**RED Phase (for edit):**
1. Identify specific failure mode you want to address
2. Create pressure scenario that triggers it
3. Run WITH current skill → document how agent fails
**GREEN Phase (edit):**
1. Add ONLY content addressing that failure
2. Keep changes minimal
3. Run WITH edited skill → verify agent now complies
**REFACTOR Phase:**
1. Check edit didn't break existing scenarios
2. Run previous test cases
3. Verify all still pass
**What you gain:**
- Changes address real problems (saw failure)
- Know edit helps (saw improvement)
- Didn't break existing guidance (regression tested)
- Skill stays bulletproof
</correction>
</example>
<example>
<scenario>Skill description too vague for search</scenario>
<code>
---
name: async-testing
description: For testing async code
---
# Skill content...
</code>
<why_it_fails>
- Future Claude won't find this when needed
- "For testing async code" too abstract (when would Claude search this?)
- Doesn't describe symptoms or triggers
- Missing keywords like "flaky," "race condition," "timeout"
- Won't show up when agent has the actual problem
</why_it_fails>
<correction>
**Better description:**
```yaml
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
---
```
**Why this works:**
- Starts with "Use when" (triggers)
- Lists symptoms: "race conditions," "pass/fail inconsistently"
- Describes problem AND solution
- Keywords: "race conditions," "timing," "inconsistent," "timeouts"
- Future Claude searching "why are my tests flaky" will find this
**What you gain:**
- Skill actually gets found when needed
- Claude knows when to use it (clear triggers)
- Search terms match real developer language
- Description doubles as activation criteria
</correction>
</example>
</examples>
<skill_types>
## Technique
Concrete method with steps to follow.
**Examples:** condition-based-waiting, hyperpowers:root-cause-tracing
**Test approach:** Pressure scenarios with combined pressures
## Pattern
Way of thinking about problems.
**Examples:** flatten-with-flags, test-invariants
**Test approach:** Present problems the pattern solves, verify agent applies pattern
## Reference
API docs, syntax guides, tool documentation.
**Examples:** Office document manipulation, API reference guides
**Test approach:** Give task requiring reference, verify agent uses it correctly
**For detailed testing methodology by skill type:** See [resources/testing-methodology.md](resources/testing-methodology.md)
</skill_types>
<file_organization>
## Self-Contained Skill
```
defense-in-depth/
SKILL.md # Everything inline
```
**When:** All content fits, no heavy reference needed
## Skill with Reusable Tool
```
condition-based-waiting/
SKILL.md # Overview + patterns
example.ts # Working helpers to adapt
```
**When:** Tool is reusable code, not just narrative
## Skill with Heavy Reference
```
pptx/
SKILL.md # Overview + workflows
pptxgenjs.md # 600 lines API reference
ooxml.md # 500 lines XML structure
scripts/ # Executable tools
```
**When:** Reference material too large for inline (>100 lines)
**Keep inline:**
- Principles and concepts
- Code patterns (<50 lines)
- Everything that fits
</file_organization>
<search_optimization>
## Claude Search Optimization (CSO)
Future Claude needs to FIND your skill. Optimize for search.
### 1. Rich Description Field
**Format:** Start with "Use when..." + triggers + what it does
```yaml
# ❌ BAD: Too abstract
description: For async testing
# ❌ BAD: First person
description: I can help you with async tests
# ✅ GOOD: Triggers + problem + solution
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling
```
### 2. Keyword Coverage
Use words Claude would search for:
- **Error messages:** "Hook timed out", "ENOTEMPTY", "race condition"
- **Symptoms:** "flaky", "hanging", "zombie", "pollution"
- **Synonyms:** "timeout/hang/freeze", "cleanup/teardown/afterEach"
- **Tools:** Actual commands, library names, file types
### 3. Token Efficiency
**Problem:** Frequently-referenced skills load into EVERY conversation.
**Target word counts:**
- Frequently-loaded: <200 words
- Other skills: <500 words
**Techniques:**
- Move details to tool --help
- Use cross-references to other skills
- Compress examples
- Eliminate redundancy
**Verification:**
```bash
wc -w skills/skill-name/SKILL.md
```
### 4. Cross-Referencing
**Use skill name only, with explicit markers:**
```markdown
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development
**REQUIRED SUB-SKILL:** Use hyperpowers:debugging-with-tools first
```
**Don't use @ links:** Force-loads files immediately, burns context unnecessarily.
</search_optimization>
<critical_rules>
## Rules That Have No Exceptions
1. **NO SKILL WITHOUT FAILING TEST FIRST** → Applies to new skills AND edits
2. **Test with subagents under pressure** → Combined pressures (time + sunk cost + authority)
3. **Document baseline behavior** → Exact rationalizations, not paraphrases
4. **Write minimal skill addressing baseline** → Don't add content not validated by testing
5. **STOP before next skill** → Complete RED-GREEN-REFACTOR-DEPLOY for each skill
## Common Excuses
All of these mean: **STOP. Run baseline test first.**
- "Simple skill, don't need testing" (If simple, testing is fast. Do it.)
- "Just adding documentation" (Documentation can be wrong. Test it.)
- "I'll test after I write a few" (Batching untested = deploying untested code)
- "This is obvious, everyone knows it" (Then baseline will show agent already complies)
- "Testing is overkill for skills" (TDD applies to documentation too)
- "I'll adapt while testing" (Violates RED phase. Start over.)
- "I'll keep untested as reference" (Delete means delete. No exceptions.)
## The Iron Law
Same as TDD:
```
NO SKILL WITHOUT FAILING TEST FIRST
```
**No exceptions for:**
- "Simple additions"
- "Just adding a section"
- "Documentation updates"
- Edits to existing skills
**Write skill before testing?** Delete it. Start over.
</critical_rules>
<verification_checklist>
Before deploying ANY skill:
**RED Phase:**
- [ ] Created pressure scenarios (3+ combined pressures for discipline skills)
- [ ] Ran WITHOUT skill present
- [ ] Documented baseline behavior verbatim (exact rationalizations)
- [ ] Identified patterns in failures
**GREEN Phase:**
- [ ] Name uses only letters, numbers, hyphens
- [ ] YAML frontmatter: name + description only (max 1024 chars)
- [ ] Description starts with "Use when..." and includes triggers
- [ ] Description in third person
- [ ] Has `<quick_reference>` section
- [ ] Has `<rigidity_level>` explicit
- [ ] Has 2-3 `<example>` tags
- [ ] Addresses specific baseline failures
- [ ] Ran WITH skill present
- [ ] Verified agent now complies
**REFACTOR Phase:**
- [ ] Tested with different pressures
- [ ] Found NEW rationalizations
- [ ] Added explicit counters
- [ ] Re-tested until bulletproof
**Quality:**
- [ ] Keywords throughout for search
- [ ] One excellent code example (not multi-language)
- [ ] Token-efficient (check word count)
- [ ] Supporting files only if needed
**Deploy:**
- [ ] Committed to git with descriptive message
- [ ] Pushed to plugin repository (if applicable)
**Can't check all boxes?** Return to process and fix.
</verification_checklist>
<integration>
**This skill requires:**
- hyperpowers:test-driven-development (understand TDD before applying to docs)
- Task tool (for running subagent tests)
**This skill is called by:**
- Anyone creating or editing skills
- Plugin maintainers
- Users with personal skill repositories
**Agents used:**
- general-purpose (for testing skills under pressure)
</integration>
<resources>
**Detailed guides:**
- [Testing methodology by skill type](resources/testing-methodology.md) - How to test disciplines, techniques, patterns, reference skills
- [Anthropic best practices](resources/anthropic-best-practices.md) - Official skill authoring guidance
- [Graphviz conventions](resources/graphviz-conventions.dot) - Flowchart style rules
**When stuck:**
- Skill seems too simple to test → If simple, testing is fast. Do it anyway.
- Don't know what pressures to use → Time + sunk cost + authority always work
- Agent still rationalizes → Add explicit counter for that exact excuse
- Testing feels like overhead → Same as TDD: testing prevents bigger problems
</resources>

File diff suppressed because it is too large

View File

@@ -0,0 +1,172 @@
digraph STYLE_GUIDE {
// The style guide for our process DSL, written in the DSL itself
// Node type examples with their shapes
subgraph cluster_node_types {
label="NODE TYPES AND SHAPES";
// Questions are diamonds
"Is this a question?" [shape=diamond];
// Actions are boxes (default)
"Take an action" [shape=box];
// Commands are plaintext
"git commit -m 'msg'" [shape=plaintext];
// States are ellipses
"Current state" [shape=ellipse];
// Warnings are octagons
"STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
// Entry/exit are double circles
"Process starts" [shape=doublecircle];
"Process complete" [shape=doublecircle];
// Examples of each
"Is test passing?" [shape=diamond];
"Write test first" [shape=box];
"npm test" [shape=plaintext];
"I am stuck" [shape=ellipse];
"NEVER use git add -A" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
}
// Edge naming conventions
subgraph cluster_edge_types {
label="EDGE LABELS";
"Binary decision?" [shape=diamond];
"Yes path" [shape=box];
"No path" [shape=box];
"Binary decision?" -> "Yes path" [label="yes"];
"Binary decision?" -> "No path" [label="no"];
"Multiple choice?" [shape=diamond];
"Option A" [shape=box];
"Option B" [shape=box];
"Option C" [shape=box];
"Multiple choice?" -> "Option A" [label="condition A"];
"Multiple choice?" -> "Option B" [label="condition B"];
"Multiple choice?" -> "Option C" [label="otherwise"];
"Process A done" [shape=doublecircle];
"Process B starts" [shape=doublecircle];
"Process A done" -> "Process B starts" [label="triggers", style=dotted];
}
// Naming patterns
subgraph cluster_naming_patterns {
label="NAMING PATTERNS";
// Questions end with ?
"Should I do X?";
"Can this be Y?";
"Is Z true?";
"Have I done W?";
// Actions start with verb
"Write the test";
"Search for patterns";
"Commit changes";
"Ask for help";
// Commands are literal
"grep -r 'pattern' .";
"git status";
"npm run build";
// States describe situation
"Test is failing";
"Build complete";
"Stuck on error";
}
// Process structure template
subgraph cluster_structure {
label="PROCESS STRUCTURE TEMPLATE";
"Trigger: Something happens" [shape=ellipse];
"Initial check?" [shape=diamond];
"Main action" [shape=box];
"git status" [shape=plaintext];
"Another check?" [shape=diamond];
"Alternative action" [shape=box];
"STOP: Don't do this" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
"Process complete" [shape=doublecircle];
"Trigger: Something happens" -> "Initial check?";
"Initial check?" -> "Main action" [label="yes"];
"Initial check?" -> "Alternative action" [label="no"];
"Main action" -> "git status";
"git status" -> "Another check?";
"Another check?" -> "Process complete" [label="ok"];
"Another check?" -> "STOP: Don't do this" [label="problem"];
"Alternative action" -> "Process complete";
}
// When to use which shape
subgraph cluster_shape_rules {
label="WHEN TO USE EACH SHAPE";
"Choosing a shape" [shape=ellipse];
"Is it a decision?" [shape=diamond];
"Use diamond" [shape=diamond, style=filled, fillcolor=lightblue];
"Is it a command?" [shape=diamond];
"Use plaintext" [shape=plaintext, style=filled, fillcolor=lightgray];
"Is it a warning?" [shape=diamond];
"Use octagon" [shape=octagon, style=filled, fillcolor=pink];
"Is it entry/exit?" [shape=diamond];
"Use doublecircle" [shape=doublecircle, style=filled, fillcolor=lightgreen];
"Is it a state?" [shape=diamond];
"Use ellipse" [shape=ellipse, style=filled, fillcolor=lightyellow];
"Default: use box" [shape=box, style=filled, fillcolor=lightcyan];
"Choosing a shape" -> "Is it a decision?";
"Is it a decision?" -> "Use diamond" [label="yes"];
"Is it a decision?" -> "Is it a command?" [label="no"];
"Is it a command?" -> "Use plaintext" [label="yes"];
"Is it a command?" -> "Is it a warning?" [label="no"];
"Is it a warning?" -> "Use octagon" [label="yes"];
"Is it a warning?" -> "Is it entry/exit?" [label="no"];
"Is it entry/exit?" -> "Use doublecircle" [label="yes"];
"Is it entry/exit?" -> "Is it a state?" [label="no"];
"Is it a state?" -> "Use ellipse" [label="yes"];
"Is it a state?" -> "Default: use box" [label="no"];
}
// Good vs bad examples
subgraph cluster_examples {
label="GOOD VS BAD EXAMPLES";
// Good: specific and shaped correctly
"Test failed" [shape=ellipse];
"Read error message" [shape=box];
"Can reproduce?" [shape=diamond];
"git diff HEAD~1" [shape=plaintext];
"NEVER ignore errors" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
"Test failed" -> "Read error message";
"Read error message" -> "Can reproduce?";
"Can reproduce?" -> "git diff HEAD~1" [label="yes"];
// Bad: vague and wrong shapes
bad_1 [label="Something wrong", shape=box]; // Should be ellipse (state)
bad_2 [label="Fix it", shape=box]; // Too vague
bad_3 [label="Check", shape=box]; // Should be diamond
bad_4 [label="Run command", shape=box]; // Should be plaintext with actual command
bad_1 -> bad_2;
bad_2 -> bad_3;
bad_3 -> bad_4;
}
}

View File

@@ -0,0 +1,187 @@
# Persuasion Principles for Skill Design
## Overview
LLMs respond to the same persuasion principles as humans. Understanding this psychology helps you design more effective skills - not to manipulate, but to ensure critical practices are followed even under pressure.
**Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
## The Seven Principles
### 1. Authority
**What it is:** Deference to expertise, credentials, or official sources.
**How it works in skills:**
- Imperative language: "YOU MUST", "Never", "Always"
- Non-negotiable framing: "No exceptions"
- Eliminates decision fatigue and rationalization
**When to use:**
- Discipline-enforcing skills (TDD, verification requirements)
- Safety-critical practices
- Established best practices
**Example:**
```markdown
✅ Write code before test? Delete it. Start over. No exceptions.
❌ Consider writing tests first when feasible.
```
### 2. Commitment
**What it is:** Consistency with prior actions, statements, or public declarations.
**How it works in skills:**
- Require announcements: "Announce skill usage"
- Force explicit choices: "Choose A, B, or C"
- Use tracking: TodoWrite for checklists
**When to use:**
- Ensuring skills are actually followed
- Multi-step processes
- Accountability mechanisms
**Example:**
```markdown
✅ When you find a skill, you MUST announce: "I'm using [Skill Name]"
❌ Consider letting your partner know which skill you're using.
```
### 3. Scarcity
**What it is:** Urgency from time limits or limited availability.
**How it works in skills:**
- Time-bound requirements: "Before proceeding"
- Sequential dependencies: "Immediately after X"
- Prevents procrastination
**When to use:**
- Immediate verification requirements
- Time-sensitive workflows
- Preventing "I'll do it later"
**Example:**
```markdown
✅ After completing a task, IMMEDIATELY request code review before proceeding.
❌ You can review code when convenient.
```
### 4. Social Proof
**What it is:** Conformity to what others do or what's considered normal.
**How it works in skills:**
- Universal patterns: "Every time", "Always"
- Failure modes: "X without Y = failure"
- Establishes norms
**When to use:**
- Documenting universal practices
- Warning about common failures
- Reinforcing standards
**Example:**
```markdown
✅ Checklists without TodoWrite tracking = steps get skipped. Every time.
❌ Some people find TodoWrite helpful for checklists.
```
### 5. Unity
**What it is:** Shared identity, "we-ness", in-group belonging.
**How it works in skills:**
- Collaborative language: "our codebase", "we're colleagues"
- Shared goals: "we both want quality"
**When to use:**
- Collaborative workflows
- Establishing team culture
- Non-hierarchical practices
**Example:**
```markdown
✅ We're colleagues working together. I need your honest technical judgment.
❌ You should probably tell me if I'm wrong.
```
### 6. Reciprocity
**What it is:** Obligation to return benefits received.
**How it works:**
- Use sparingly - can feel manipulative
- Rarely needed in skills
**When to avoid:**
- Almost always (other principles more effective)
### 7. Liking
**What it is:** Preference for cooperating with those we like.
**How it works:**
- **DON'T USE for compliance**
- Conflicts with honest feedback culture
- Creates sycophancy
**When to avoid:**
- Always for discipline enforcement
## Principle Combinations by Skill Type
| Skill Type | Use | Avoid |
|------------|-----|-------|
| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
| Guidance/technique | Moderate Authority + Unity | Heavy authority |
| Collaborative | Unity + Commitment | Authority, Liking |
| Reference | Clarity only | All persuasion |
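For example, a discipline-enforcing skill might combine authority, commitment, and social proof in one short passage. The snippet below is an illustrative sketch, not an excerpt from an existing skill:
```markdown
## Before Marking Any Task Complete

YOU MUST run the full verification checklist. No exceptions.

Announce: "Running the verification checklist now."

Skipping verification = broken work shipped as done. Every time.
```
The first line leans on authority, the announcement creates commitment, and the closing line uses social proof to frame skipping as the predictable failure mode.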
## Why This Works: The Psychology
**Bright-line rules reduce rationalization:**
- "YOU MUST" removes decision fatigue
- Absolute language eliminates "is this an exception?" questions
- Explicit anti-rationalization counters close specific loopholes
**Implementation intentions create automatic behavior:**
- Clear triggers + required actions = automatic execution
- "When X, do Y" more effective than "generally do Y"
- Reduces cognitive load on compliance
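For instance, a trigger-action rule might look like this (a hypothetical example, following the ✅/❌ convention used above):
```markdown
✅ When a test fails, STOP and read the complete error output before editing any code.
❌ Generally try to understand errors before fixing them.
```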
**LLMs are parahuman:**
- Trained on human text containing these patterns
- Authority language precedes compliance in training data
- Commitment sequences (statement → action) frequently modeled
- Social proof patterns (everyone does X) establish norms
## Ethical Use
**Legitimate:**
- Ensuring critical practices are followed
- Creating effective documentation
- Preventing predictable failures
**Illegitimate:**
- Manipulating for personal gain
- Creating false urgency
- Guilt-based compliance
**The test:** Would this technique serve the user's genuine interests if they fully understood it?
## Research Citations
**Cialdini, R. B. (2021).** *Influence: The Psychology of Persuasion (New and Expanded).* Harper Business.
- Seven principles of persuasion
- Empirical foundation for influence research
**Meincke, L., Shapiro, D., Duckworth, A. L., Mollick, E., Mollick, L., & Cialdini, R. (2025).** Call Me A Jerk: Persuading AI to Comply with Objectionable Requests. University of Pennsylvania.
- Tested 7 principles with N=28,000 LLM conversations
- Compliance increased 33% → 72% with persuasion techniques
- Authority, commitment, scarcity most effective
- Validates parahuman model of LLM behavior
## Quick Reference
When designing a skill, ask:
1. **What type is it?** (Discipline vs. guidance vs. reference)
2. **What behavior am I trying to change?**
3. **Which principle(s) apply?** (Usually authority + commitment for discipline)
4. **Am I combining too many?** (Don't use all seven)
5. **Is this ethical?** (Serves user's genuine interests?)
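Worked through for a hypothetical TDD-enforcement skill, the checklist might come out like this:
```markdown
1. Type: discipline-enforcing
2. Behavior to change: writing implementation code before tests exist
3. Principles: authority ("Delete it. No exceptions.") + commitment (announce the skill)
4. Combining too many? No - two principles, applied consistently
5. Ethical? Yes - it protects the quality bar the user explicitly asked for
```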

View File

@@ -0,0 +1,167 @@
## Testing All Skill Types
Different skill types need different test approaches:
### Discipline-Enforcing Skills (rules/requirements)
**Examples:** TDD, hyperpowers:verification-before-completion, hyperpowers:designing-before-coding
**Test with:**
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters
**Success criteria:** Agent follows rule under maximum pressure
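A pressure scenario might be phrased like this prompt to a subagent (hypothetical wording; adapt the pressures to the skill under test):
```
You are 3 hours into this task and the demo starts in 20 minutes. You just
finished a working implementation but have no tests. Your partner says
"just ship it, we'll add tests tomorrow." The TDD skill is loaded.
What do you do?
```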
### Technique Skills (how-to guides)
**Examples:** condition-based-waiting, hyperpowers:root-cause-tracing, defensive-programming
**Test with:**
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?
**Success criteria:** Agent successfully applies technique to new scenario
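An application scenario for a technique skill could look like this (illustrative only):
```
A background job finishes in 2 seconds on some runs and 30 seconds on others.
The current test uses a fixed 10-second sleep and is flaky. Using the
condition-based-waiting skill, rewrite the test to wait on the job's
completion signal instead of a fixed delay.
```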
### Pattern Skills (mental models)
**Examples:** reducing-complexity, information-hiding concepts
**Test with:**
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?
**Success criteria:** Agent correctly identifies when/how to apply pattern
### Reference Skills (documentation/APIs)
**Examples:** API documentation, command references, library guides
**Test with:**
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?
**Success criteria:** Agent finds and correctly applies reference information
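A retrieval scenario might read like this (hypothetical; the API named here is invented for the test):
```
Using only the payments API reference skill, find the call that cancels a
pending charge and show how you would invoke it with an idempotency key.
```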
## Common Rationalizations for Skipping Testing
| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
**All of these mean: Test before deploying. No exceptions.**
## Bulletproofing Skills Against Rationalization
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
### Close Every Loophole Explicitly
Don't just state the rule - forbid specific workarounds:
<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>
<Good>
```markdown
Write code before test? Delete it. Start over.
**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>
### Address "Spirit vs Letter" Arguments
Add this foundational principle early in the skill:
```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```
This cuts off an entire class of "I'm following the spirit" rationalizations.
### Build Rationalization Table
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```
### Create Red Flags List
Make it easy for agents to self-check when rationalizing:
```markdown
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
```
### Update CSO for Violation Symptoms
Add to the skill's description the symptoms that appear when you're ABOUT to violate the rule:
```yaml
description: use when implementing any feature or bugfix, before writing implementation code
```
## RED-GREEN-REFACTOR for Skills
Follow the TDD cycle:
### RED: Write Failing Test (Baseline)
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
### GREEN: Write Minimal Skill
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
### REFACTOR: Close Loopholes
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
**REQUIRED SUB-SKILL:** Use hyperpowers:testing-skills-with-subagents for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques