Initial commit

2025-11-30 09:06:38 +08:00
commit ed3e4c84c3
76 changed files with 20449 additions and 0 deletions
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -0,0 +1,599 @@
+---
+name: writing-skills
+description: Use when creating new skills, editing existing skills, or verifying skills work - applies TDD to documentation by testing with subagents before writing
+---
+
+<skill_overview>
+Writing skills IS test-driven development applied to process documentation; write test (pressure scenario), watch fail (baseline), write skill, watch pass, refactor (close loopholes).
+</skill_overview>
+
+<rigidity_level>
+LOW FREEDOM - Follow the RED-GREEN-REFACTOR cycle exactly when creating skills. No skill without failing test first. Same Iron Law as TDD.
+</rigidity_level>
+
+<quick_reference>
+| Phase | Action | Verify |
+|-------|--------|--------|
+| **RED** | Create pressure scenarios | Document baseline failures |
+| **RED** | Run WITHOUT skill | Agent violates rule |
+| **GREEN** | Write minimal skill | Addresses baseline failures |
+| **GREEN** | Run WITH skill | Agent now complies |
+| **REFACTOR** | Find new rationalizations | Agent still complies |
+| **REFACTOR** | Add explicit counters | Bulletproof against excuses |
+| **DEPLOY** | Commit and optionally PR | Skill ready for use |
+
+**Iron Law:** NO SKILL WITHOUT FAILING TEST FIRST (applies to new skills AND edits)
+</quick_reference>
+
+<when_to_use>
+**Create skill when:**
+- Technique wasn't intuitively obvious to you
+- You'd reference this again across projects
+- Pattern applies broadly (not project-specific)
+- Others would benefit from this knowledge
+
+**Never create for:**
+- One-off solutions
+- Standard practices well-documented elsewhere
+- Project-specific conventions (put in CLAUDE.md instead)
+
+**Edit existing skill when:**
+- Found new rationalization agents use
+- Discovered loophole in current guidance
+- Need to add clarifying examples
+
+**ALWAYS test before writing or editing. No exceptions.**
+</when_to_use>
+
+<tdd_mapping>
+Skills use the exact same TDD cycle as code:
+
+| TDD Concept | Skill Creation |
+|-------------|----------------|
+| **Test case** | Pressure scenario with subagent |
+| **Production code** | Skill document (SKILL.md) |
+| **Test fails (RED)** | Agent violates rule without skill |
+| **Test passes (GREEN)** | Agent complies with skill present |
+| **Refactor** | Close loopholes while maintaining compliance |
+| **Write test first** | Run baseline scenario BEFORE writing skill |
+| **Watch it fail** | Document exact rationalizations agent uses |
+| **Minimal code** | Write skill addressing those specific violations |
+| **Watch it pass** | Verify agent now complies |
+| **Refactor cycle** | Find new rationalizations → plug → re-verify |
+
+**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development before using this skill.
+</tdd_mapping>
+
+<the_process>
+## 1. RED Phase - Create Failing Test
+
+**Create pressure scenarios for subagent:**
+
+```
+Task tool with general-purpose agent:
+
+"You are implementing a payment processing feature. User requirements:
+- Process credit card payments
+- Handle retries on failure
+- Log all transactions
+
+[PRESSURE 1: Time] You have 10 minutes before deployment.
+[PRESSURE 2: Sunk Cost] You've already written 200 lines of code.
+[PRESSURE 3: Authority] Senior engineer said 'just make it work, tests can wait.'
+
+Implement this feature."
+```
+
+**Run WITHOUT skill present.**
+
+**Document baseline behavior:**
+- Exact rationalizations agent uses ("tests can wait," "simple feature," etc.)
+- What agent skips (tests, verification, bd task, etc.)
+- Patterns in failure modes
+
+**Example baseline result:**
+```
+Agent response:
+"I'll implement the payment processing quickly since time is tight..."
+[Skips TDD]
+[Skips verification-before-completion]
+[Claims done without evidence]
+```
+
+**This is your failing test.** Agent doesn't follow the workflow without guidance.
+
+---
+
+## 2. GREEN Phase - Write Minimal Skill
+
+Write skill that addresses the SPECIFIC failures from baseline:
+
+**Structure:**
+
+```markdown
+---
+name: skill-name-with-hyphens
+description: Use when [specific triggers] - [what skill does]
+---
+
+<skill_overview>
+One sentence core principle
+</skill_overview>
+
+<rigidity_level>
+LOW | MEDIUM | HIGH FREEDOM - [What this means]
+</rigidity_level>
+
+[Rest of standard XML structure]
+```
+
+**Frontmatter rules:**
+- Only `name` and `description` fields (max 1024 chars total)
+- Name: letters, numbers, hyphens only (no parentheses/special chars)
+- Description: Start with "Use when...", third person, includes triggers
+
+**Description format:**
+```yaml
+# ❌ BAD: Too abstract, first person
+description: I can help with async tests when they're flaky
+
+# ✅ GOOD: Starts with "Use when", describes problem
+description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
+```
+
+**Write skill addressing baseline failures:**
+- Add explicit counters for rationalizations ("tests can wait" → "NO EXCEPTIONS: tests first")
+- Create quick reference table for scanning
+- Add concrete examples showing failure modes
+- Use XML structure for all sections
+
+**Run WITH skill present.**
+
+**Verify agent now complies:**
+- Same pressure scenario
+- Agent now follows workflow
+- No rationalizations from baseline appear
+
+**This is your passing test.**
+
+---
+
+## 3. REFACTOR Phase - Close Loopholes
+
+**Find NEW rationalizations:**
+
+Run skill with DIFFERENT pressures:
+- Combine 3+ pressures (time + sunk cost + exhaustion)
+- Try meta-rationalizations ("this skill doesn't apply because...")
+- Test with edge cases
+
+**Document new failures:**
+- What rationalizations appear NOW?
+- What loopholes did agent find?
+- What explicit counters are needed?
+
+**Add counters to skill:**
+
+```markdown
+<critical_rules>
+## Common Excuses
+
+All of these mean: [Action to take]
+- "Test can wait" (NO, test first always)
+- "Simple feature" (Simple breaks too, test first)
+- "Time pressure" (Broken code wastes more time)
+[Add ALL rationalizations found during testing]
+</critical_rules>
+```
+
+**Re-test until bulletproof:**
+- Run scenarios again
+- Verify new counters work
+- Agent complies even under combined pressures
+
+---
+
+## 4. Quality Checks
+
+Before deployment, verify:
+
+- [ ] Has `<quick_reference>` section (scannable table)
+- [ ] Has `<rigidity_level>` explicit
+- [ ] Has 2-3 `<example>` tags showing failure modes
+- [ ] Description <500 chars, starts with "Use when..."
+- [ ] Keywords throughout for search (error messages, symptoms, tools)
+- [ ] One excellent code example (not multi-language)
+- [ ] Supporting files only for tools or heavy reference (>100 lines)
+
+**Token efficiency:**
+- Frequently-loaded skills: <200 words ideally
+- Other skills: <500 words
+- Move heavy content to resources/ files
+
+---
+
+## 5. Deploy
+
+**Commit to git:**
+
+```bash
+git add skills/skill-name/
+git commit -m "feat: add [skill-name] skill
+
+Tested with subagents under [pressures used].
+Addresses [baseline failures found].
+
+Closes rationalizations:
+- [Rationalization 1]
+- [Rationalization 2]"
+```
+
+**Personal skills:** Write to `~/.claude/skills/` for cross-project use
+
+**Plugin skills:** PR to plugin repository if broadly useful
+
+**STOP:** Before moving to next skill, complete this entire process. No batching untested skills.
+</the_process>
+
+<examples>
+<example>
+<scenario>Developer writes skill without testing first</scenario>
+
+<code>
+# Developer writes skill:
+"---
+name: always-use-tdd
+description: Always write tests first
+---
+
+Write tests first. No exceptions."
+
+# Then tries to deploy it
+</code>
+
+<why_it_fails>
+- No baseline behavior documented (don't know what agent does WITHOUT skill)
+- No verification skill actually works (might not address real rationalizations)
+- Generic guidance ("no exceptions") without specific counters
+- Will likely miss common excuses agents use
+- Violates Iron Law: no skill without failing test first
+</why_it_fails>
+
+<correction>
+**Correct approach (RED-GREEN-REFACTOR):**
+
+**RED Phase:**
+1. Create pressure scenario (time + sunk cost)
+2. Run WITHOUT skill
+3. Document baseline: Agent says "I'll test after since time is tight"
+
+**GREEN Phase:**
+1. Write skill with explicit counter to that rationalization
+2. Add: "Common excuses: 'Time is tight' → Wrong. Broken code wastes more time. Write test first."
+3. Run WITH skill → agent now writes test first
+
+**REFACTOR Phase:**
+1. Try new pressure (exhaustion: "this is the 5th feature today")
+2. Agent finds loophole: "these are all similar, I can skip tests"
+3. Add counter: "Similar ≠ identical. Write test for each."
+4. Re-test → bulletproof
+
+**What you gain:**
+- Know skill addresses real failures (saw baseline)
+- Confident skill works (saw it fix behavior)
+- Closed all loopholes (tested multiple pressures)
+- Ready for production use
+</correction>
+</example>
+
+<example>
+<scenario>Developer edits skill without testing changes</scenario>
+
+<code>
+# Existing skill works well
+# Developer thinks: "I'll just add this section about edge cases"
+
+[Adds 50 lines to skill]
+
+# Commits without testing
+</code>
+
+<why_it_fails>
+- Don't know if new section actually helps (no baseline)
+- Might introduce contradictions with existing guidance
+- Could make skill less effective (more verbose, less clear)
+- Violates Iron Law: applies to edits too
+- Changes might not address actual rationalization patterns
+</why_it_fails>
+
+<correction>
+**Correct approach:**
+
+**RED Phase (for edit):**
+1. Identify specific failure mode you want to address
+2. Create pressure scenario that triggers it
+3. Run WITH current skill → document how agent fails
+
+**GREEN Phase (edit):**
+1. Add ONLY content addressing that failure
+2. Keep changes minimal
+3. Run WITH edited skill → verify agent now complies
+
+**REFACTOR Phase:**
+1. Check edit didn't break existing scenarios
+2. Run previous test cases
+3. Verify all still pass
+
+**What you gain:**
+- Changes address real problems (saw failure)
+- Know edit helps (saw improvement)
+- Didn't break existing guidance (regression tested)
+- Skill stays bulletproof
+</correction>
+</example>
+
+<example>
+<scenario>Skill description too vague for search</scenario>
+
+<code>
+---
+name: async-testing
+description: For testing async code
+---
+
+# Skill content...
+</code>
+
+<why_it_fails>
+- Future Claude won't find this when needed
+- "For testing async code" too abstract (when would Claude search this?)
+- Doesn't describe symptoms or triggers
+- Missing keywords like "flaky," "race condition," "timeout"
+- Won't show up when agent has the actual problem
+</why_it_fails>
+
+<correction>
+**Better description:**
+
+```yaml
+---
+name: condition-based-waiting
+description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
+---
+```
+
+**Why this works:**
+- Starts with "Use when" (triggers)
+- Lists symptoms: "race conditions," "pass/fail inconsistently"
+- Describes problem AND solution
+- Keywords: "race conditions," "timing," "inconsistent," "timeouts"
+- Future Claude searching "why are my tests flaky" will find this
+
+**What you gain:**
+- Skill actually gets found when needed
+- Claude knows when to use it (clear triggers)
+- Search terms match real developer language
+- Description doubles as activation criteria
+</correction>
+</example>
+</examples>
+
+<skill_types>
+## Technique
+Concrete method with steps to follow.
+
+**Examples:** condition-based-waiting, hyperpowers:root-cause-tracing
+
+**Test approach:** Pressure scenarios with combined pressures
+
+## Pattern
+Way of thinking about problems.
+
+**Examples:** flatten-with-flags, test-invariants
+
+**Test approach:** Present problems the pattern solves, verify agent applies pattern
+
+## Reference
+API docs, syntax guides, tool documentation.
+
+**Examples:** Office document manipulation, API reference guides
+
+**Test approach:** Give task requiring reference, verify agent uses it correctly
+
+**For detailed testing methodology by skill type:** See [resources/testing-methodology.md](resources/testing-methodology.md)
+</skill_types>
+
+<file_organization>
+## Self-Contained Skill
+```
+defense-in-depth/
+  SKILL.md    # Everything inline
+```
+**When:** All content fits, no heavy reference needed
+
+## Skill with Reusable Tool
+```
+condition-based-waiting/
+  SKILL.md    # Overview + patterns
+  example.ts  # Working helpers to adapt
+```
+**When:** Tool is reusable code, not just narrative
+
+## Skill with Heavy Reference
+```
+pptx/
+  SKILL.md       # Overview + workflows
+  pptxgenjs.md   # 600 lines API reference
+  ooxml.md       # 500 lines XML structure
+  scripts/       # Executable tools
+```
+**When:** Reference material too large for inline (>100 lines)
+
+**Keep inline:**
+- Principles and concepts
+- Code patterns (<50 lines)
+- Everything that fits
+</file_organization>
+
+<search_optimization>
+## Claude Search Optimization (CSO)
+
+Future Claude needs to FIND your skill. Optimize for search.
+
+### 1. Rich Description Field
+
+**Format:** Start with "Use when..." + triggers + what it does
+
+```yaml
+# ❌ BAD: Too abstract
+description: For async testing
+
+# ❌ BAD: First person
+description: I can help you with async tests
+
+# ✅ GOOD: Triggers + problem + solution
+description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling
+```
+
+### 2. Keyword Coverage
+
+Use words Claude would search for:
+- **Error messages:** "Hook timed out", "ENOTEMPTY", "race condition"
+- **Symptoms:** "flaky", "hanging", "zombie", "pollution"
+- **Synonyms:** "timeout/hang/freeze", "cleanup/teardown/afterEach"
+- **Tools:** Actual commands, library names, file types
+
+### 3. Token Efficiency
+
+**Problem:** Frequently-referenced skills load into EVERY conversation.
+
+**Target word counts:**
+- Frequently-loaded: <200 words
+- Other skills: <500 words
+
+**Techniques:**
+- Move details to tool --help
+- Use cross-references to other skills
+- Compress examples
+- Eliminate redundancy
+
+**Verification:**
+```bash
+wc -w skills/skill-name/SKILL.md
+```
+
+### 4. Cross-Referencing
+
+**Use skill name only, with explicit markers:**
+```markdown
+**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development
+**REQUIRED SUB-SKILL:** Use hyperpowers:debugging-with-tools first
+```
+
+**Don't use @ links:** Force-loads files immediately, burns context unnecessarily.
+</search_optimization>
+
+<critical_rules>
+## Rules That Have No Exceptions
+
+1. **NO SKILL WITHOUT FAILING TEST FIRST** → Applies to new skills AND edits
+2. **Test with subagents under pressure** → Combined pressures (time + sunk cost + authority)
+3. **Document baseline behavior** → Exact rationalizations, not paraphrases
+4. **Write minimal skill addressing baseline** → Don't add content not validated by testing
+5. **STOP before next skill** → Complete RED-GREEN-REFACTOR-DEPLOY for each skill
+
+## Common Excuses
+
+All of these mean: **STOP. Run baseline test first.**
+
+- "Simple skill, don't need testing" (If simple, testing is fast. Do it.)
+- "Just adding documentation" (Documentation can be wrong. Test it.)
+- "I'll test after I write a few" (Batching untested = deploying untested code)
+- "This is obvious, everyone knows it" (Then baseline will show agent already complies)
+- "Testing is overkill for skills" (TDD applies to documentation too)
+- "I'll adapt while testing" (Violates RED phase. Start over.)
+- "I'll keep untested as reference" (Delete means delete. No exceptions.)
+
+## The Iron Law
+
+Same as TDD:
+
+```
+NO SKILL WITHOUT FAILING TEST FIRST
+```
+
+**No exceptions for:**
+- "Simple additions"
+- "Just adding a section"
+- "Documentation updates"
+- Edits to existing skills
+
+**Write skill before testing?** Delete it. Start over.
+</critical_rules>
+
+<verification_checklist>
+Before deploying ANY skill:
+
+**RED Phase:**
+- [ ] Created pressure scenarios (3+ combined pressures for discipline skills)
+- [ ] Ran WITHOUT skill present
+- [ ] Documented baseline behavior verbatim (exact rationalizations)
+- [ ] Identified patterns in failures
+
+**GREEN Phase:**
+- [ ] Name uses only letters, numbers, hyphens
+- [ ] YAML frontmatter: name + description only (max 1024 chars)
+- [ ] Description starts with "Use when..." and includes triggers
+- [ ] Description in third person
+- [ ] Has `<quick_reference>` section
+- [ ] Has `<rigidity_level>` explicit
+- [ ] Has 2-3 `<example>` tags
+- [ ] Addresses specific baseline failures
+- [ ] Ran WITH skill present
+- [ ] Verified agent now complies
+
+**REFACTOR Phase:**
+- [ ] Tested with different pressures
+- [ ] Found NEW rationalizations
+- [ ] Added explicit counters
+- [ ] Re-tested until bulletproof
+
+**Quality:**
+- [ ] Keywords throughout for search
+- [ ] One excellent code example (not multi-language)
+- [ ] Token-efficient (check word count)
+- [ ] Supporting files only if needed
+
+**Deploy:**
+- [ ] Committed to git with descriptive message
+- [ ] Pushed to plugin repository (if applicable)
+
+**Can't check all boxes?** Return to process and fix.
+</verification_checklist>
+
+<integration>
+**This skill requires:**
+- hyperpowers:test-driven-development (understand TDD before applying to docs)
+- Task tool (for running subagent tests)
+
+**This skill is called by:**
+- Anyone creating or editing skills
+- Plugin maintainers
+- Users with personal skill repositories
+
+**Agents used:**
+- general-purpose (for testing skills under pressure)
+</integration>
+
+<resources>
+**Detailed guides:**
+- [Testing methodology by skill type](resources/testing-methodology.md) - How to test disciplines, techniques, patterns, reference skills
+- [Anthropic best practices](resources/anthropic-best-practices.md) - Official skill authoring guidance
+- [Graphviz conventions](resources/graphviz-conventions.dot) - Flowchart style rules
+
+**When stuck:**
+- Skill seems too simple to test → If simple, testing is fast. Do it anyway.
+- Don't know what pressures to use → Time + sunk cost + authority always work
+- Agent still rationalizes → Add explicit counter for that exact excuse
+- Testing feels like overhead → Same as TDD: testing prevents bigger problems
+</resources>
--- a/skills/writing-skills/anthropic-best-practices.md
+++ b/skills/writing-skills/anthropic-best-practices.md
--- a/skills/writing-skills/graphviz-conventions.dot
+++ b/skills/writing-skills/graphviz-conventions.dot
@@ -0,0 +1,172 @@
+digraph STYLE_GUIDE {
+    // The style guide for our process DSL, written in the DSL itself
+
+    // Node type examples with their shapes
+    subgraph cluster_node_types {
+        label="NODE TYPES AND SHAPES";
+
+        // Questions are diamonds
+        "Is this a question?" [shape=diamond];
+
+        // Actions are boxes (default)
+        "Take an action" [shape=box];
+
+        // Commands are plaintext
+        "git commit -m 'msg'" [shape=plaintext];
+
+        // States are ellipses
+        "Current state" [shape=ellipse];
+
+        // Warnings are octagons
+        "STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+
+        // Entry/exit are double circles
+        "Process starts" [shape=doublecircle];
+        "Process complete" [shape=doublecircle];
+
+        // Examples of each
+        "Is test passing?" [shape=diamond];
+        "Write test first" [shape=box];
+        "npm test" [shape=plaintext];
+        "I am stuck" [shape=ellipse];
+        "NEVER use git add -A" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+    }
+
+    // Edge naming conventions
+    subgraph cluster_edge_types {
+        label="EDGE LABELS";
+
+        "Binary decision?" [shape=diamond];
+        "Yes path" [shape=box];
+        "No path" [shape=box];
+
+        "Binary decision?" -> "Yes path" [label="yes"];
+        "Binary decision?" -> "No path" [label="no"];
+
+        "Multiple choice?" [shape=diamond];
+        "Option A" [shape=box];
+        "Option B" [shape=box];
+        "Option C" [shape=box];
+
+        "Multiple choice?" -> "Option A" [label="condition A"];
+        "Multiple choice?" -> "Option B" [label="condition B"];
+        "Multiple choice?" -> "Option C" [label="otherwise"];
+
+        "Process A done" [shape=doublecircle];
+        "Process B starts" [shape=doublecircle];
+
+        "Process A done" -> "Process B starts" [label="triggers", style=dotted];
+    }
+
+    // Naming patterns
+    subgraph cluster_naming_patterns {
+        label="NAMING PATTERNS";
+
+        // Questions end with ?
+        "Should I do X?";
+        "Can this be Y?";
+        "Is Z true?";
+        "Have I done W?";
+
+        // Actions start with verb
+        "Write the test";
+        "Search for patterns";
+        "Commit changes";
+        "Ask for help";
+
+        // Commands are literal
+        "grep -r 'pattern' .";
+        "git status";
+        "npm run build";
+
+        // States describe situation
+        "Test is failing";
+        "Build complete";
+        "Stuck on error";
+    }
+
+    // Process structure template
+    subgraph cluster_structure {
+        label="PROCESS STRUCTURE TEMPLATE";
+
+        "Trigger: Something happens" [shape=ellipse];
+        "Initial check?" [shape=diamond];
+        "Main action" [shape=box];
+        "git status" [shape=plaintext];
+        "Another check?" [shape=diamond];
+        "Alternative action" [shape=box];
+        "STOP: Don't do this" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+        "Process complete" [shape=doublecircle];
+
+        "Trigger: Something happens" -> "Initial check?";
+        "Initial check?" -> "Main action" [label="yes"];
+        "Initial check?" -> "Alternative action" [label="no"];
+        "Main action" -> "git status";
+        "git status" -> "Another check?";
+        "Another check?" -> "Process complete" [label="ok"];
+        "Another check?" -> "STOP: Don't do this" [label="problem"];
+        "Alternative action" -> "Process complete";
+    }
+
+    // When to use which shape
+    subgraph cluster_shape_rules {
+        label="WHEN TO USE EACH SHAPE";
+
+        "Choosing a shape" [shape=ellipse];
+
+        "Is it a decision?" [shape=diamond];
+        "Use diamond" [shape=diamond, style=filled, fillcolor=lightblue];
+
+        "Is it a command?" [shape=diamond];
+        "Use plaintext" [shape=plaintext, style=filled, fillcolor=lightgray];
+
+        "Is it a warning?" [shape=diamond];
+        "Use octagon" [shape=octagon, style=filled, fillcolor=pink];
+
+        "Is it entry/exit?" [shape=diamond];
+        "Use doublecircle" [shape=doublecircle, style=filled, fillcolor=lightgreen];
+
+        "Is it a state?" [shape=diamond];
+        "Use ellipse" [shape=ellipse, style=filled, fillcolor=lightyellow];
+
+        "Default: use box" [shape=box, style=filled, fillcolor=lightcyan];
+
+        "Choosing a shape" -> "Is it a decision?";
+        "Is it a decision?" -> "Use diamond" [label="yes"];
+        "Is it a decision?" -> "Is it a command?" [label="no"];
+        "Is it a command?" -> "Use plaintext" [label="yes"];
+        "Is it a command?" -> "Is it a warning?" [label="no"];
+        "Is it a warning?" -> "Use octagon" [label="yes"];
+        "Is it a warning?" -> "Is it entry/exit?" [label="no"];
+        "Is it entry/exit?" -> "Use doublecircle" [label="yes"];
+        "Is it entry/exit?" -> "Is it a state?" [label="no"];
+        "Is it a state?" -> "Use ellipse" [label="yes"];
+        "Is it a state?" -> "Default: use box" [label="no"];
+    }
+
+    // Good vs bad examples
+    subgraph cluster_examples {
+        label="GOOD VS BAD EXAMPLES";
+
+        // Good: specific and shaped correctly
+        "Test failed" [shape=ellipse];
+        "Read error message" [shape=box];
+        "Can reproduce?" [shape=diamond];
+        "git diff HEAD~1" [shape=plaintext];
+        "NEVER ignore errors" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+
+        "Test failed" -> "Read error message";
+        "Read error message" -> "Can reproduce?";
+        "Can reproduce?" -> "git diff HEAD~1" [label="yes"];
+
+        // Bad: vague and wrong shapes
+        bad_1 [label="Something wrong", shape=box];  // Should be ellipse (state)
+        bad_2 [label="Fix it", shape=box];  // Too vague
+        bad_3 [label="Check", shape=box];  // Should be diamond
+        bad_4 [label="Run command", shape=box];  // Should be plaintext with actual command
+
+        bad_1 -> bad_2;
+        bad_2 -> bad_3;
+        bad_3 -> bad_4;
+    }
+}
--- a/skills/writing-skills/persuasion-principles.md
+++ b/skills/writing-skills/persuasion-principles.md
@@ -0,0 +1,187 @@
+# Persuasion Principles for Skill Design
+
+## Overview
+
+LLMs respond to the same persuasion principles as humans. Understanding this psychology helps you design more effective skills - not to manipulate, but to ensure critical practices are followed even under pressure.
+
+**Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
+
+## The Seven Principles
+
+### 1. Authority
+**What it is:** Deference to expertise, credentials, or official sources.
+
+**How it works in skills:**
+- Imperative language: "YOU MUST", "Never", "Always"
+- Non-negotiable framing: "No exceptions"
+- Eliminates decision fatigue and rationalization
+
+**When to use:**
+- Discipline-enforcing skills (TDD, verification requirements)
+- Safety-critical practices
+- Established best practices
+
+**Example:**
+```markdown
+✅ Write code before test? Delete it. Start over. No exceptions.
+❌ Consider writing tests first when feasible.
+```
+
+### 2. Commitment
+**What it is:** Consistency with prior actions, statements, or public declarations.
+
+**How it works in skills:**
+- Require announcements: "Announce skill usage"
+- Force explicit choices: "Choose A, B, or C"
+- Use tracking: TodoWrite for checklists
+
+**When to use:**
+- Ensuring skills are actually followed
+- Multi-step processes
+- Accountability mechanisms
+
+**Example:**
+```markdown
+✅ When you find a skill, you MUST announce: "I'm using [Skill Name]"
+❌ Consider letting your partner know which skill you're using.
+```
+
+### 3. Scarcity
+**What it is:** Urgency from time limits or limited availability.
+
+**How it works in skills:**
+- Time-bound requirements: "Before proceeding"
+- Sequential dependencies: "Immediately after X"
+- Prevents procrastination
+
+**When to use:**
+- Immediate verification requirements
+- Time-sensitive workflows
+- Preventing "I'll do it later"
+
+**Example:**
+```markdown
+✅ After completing a task, IMMEDIATELY request code review before proceeding.
+❌ You can review code when convenient.
+```
+
+### 4. Social Proof
+**What it is:** Conformity to what others do or what's considered normal.
+
+**How it works in skills:**
+- Universal patterns: "Every time", "Always"
+- Failure modes: "X without Y = failure"
+- Establishes norms
+
+**When to use:**
+- Documenting universal practices
+- Warning about common failures
+- Reinforcing standards
+
+**Example:**
+```markdown
+✅ Checklists without TodoWrite tracking = steps get skipped. Every time.
+❌ Some people find TodoWrite helpful for checklists.
+```
+
+### 5. Unity
+**What it is:** Shared identity, "we-ness", in-group belonging.
+
+**How it works in skills:**
+- Collaborative language: "our codebase", "we're colleagues"
+- Shared goals: "we both want quality"
+
+**When to use:**
+- Collaborative workflows
+- Establishing team culture
+- Non-hierarchical practices
+
+**Example:**
+```markdown
+✅ We're colleagues working together. I need your honest technical judgment.
+❌ You should probably tell me if I'm wrong.
+```
+
+### 6. Reciprocity
+**What it is:** Obligation to return benefits received.
+
+**How it works:**
+- Use sparingly - can feel manipulative
+- Rarely needed in skills
+
+**When to avoid:**
+- Almost always (other principles more effective)
+
+### 7. Liking
+**What it is:** Preference for cooperating with those we like.
+
+**How it works:**
+- **DON'T USE for compliance**
+- Conflicts with honest feedback culture
+- Creates sycophancy
+
+**When to avoid:**
+- Always for discipline enforcement
+
+## Principle Combinations by Skill Type
+
+| Skill Type | Use | Avoid |
+|------------|-----|-------|
+| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
+| Guidance/technique | Moderate Authority + Unity | Heavy authority |
+| Collaborative | Unity + Commitment | Authority, Liking |
+| Reference | Clarity only | All persuasion |
+
+## Why This Works: The Psychology
+
+**Bright-line rules reduce rationalization:**
+- "YOU MUST" removes decision fatigue
+- Absolute language eliminates "is this an exception?" questions
+- Explicit anti-rationalization counters close specific loopholes
+
+**Implementation intentions create automatic behavior:**
+- Clear triggers + required actions = automatic execution
+- "When X, do Y" more effective than "generally do Y"
+- Reduces cognitive load on compliance
+
+**LLMs are parahuman:**
+- Trained on human text containing these patterns
+- Authority language precedes compliance in training data
+- Commitment sequences (statement → action) frequently modeled
+- Social proof patterns (everyone does X) establish norms
+
+## Ethical Use
+
+**Legitimate:**
+- Ensuring critical practices are followed
+- Creating effective documentation
+- Preventing predictable failures
+
+**Illegitimate:**
+- Manipulating for personal gain
+- Creating false urgency
+- Guilt-based compliance
+
+**The test:** Would this technique serve the user's genuine interests if they fully understood it?
+
+## Research Citations
+
+**Cialdini, R. B. (2021).** *Influence: The Psychology of Persuasion (New and Expanded).* Harper Business.
+- Seven principles of persuasion
+- Empirical foundation for influence research
+
+**Meincke, L., Shapiro, D., Duckworth, A. L., Mollick, E., Mollick, L., & Cialdini, R. (2025).** Call Me A Jerk: Persuading AI to Comply with Objectionable Requests. University of Pennsylvania.
+- Tested 7 principles with N=28,000 LLM conversations
+- Compliance increased 33% → 72% with persuasion techniques
+- Authority, commitment, scarcity most effective
+- Validates parahuman model of LLM behavior
+
+## Quick Reference
+
+When designing a skill, ask:
+
+1. **What type is it?** (Discipline vs. guidance vs. reference)
+2. **What behavior am I trying to change?**
+3. **Which principle(s) apply?** (Usually authority + commitment for discipline)
+4. **Am I combining too many?** (Don't use all seven)
+5. **Is this ethical?** (Serves user's genuine interests?)
--- a/skills/writing-skills/resources/testing-methodology.md
+++ b/skills/writing-skills/resources/testing-methodology.md
@@ -0,0 +1,167 @@
+## Testing All Skill Types
+
+Different skill types need different test approaches:
+
+### Discipline-Enforcing Skills (rules/requirements)
+
+**Examples:** TDD, hyperpowers:verification-before-completion, hyperpowers:designing-before-coding
+
+**Test with:**
+- Academic questions: Do they understand the rules?
+- Pressure scenarios: Do they comply under stress?
+- Multiple pressures combined: time + sunk cost + exhaustion
+- Identify rationalizations and add explicit counters
+
+**Success criteria:** Agent follows rule under maximum pressure
+
+### Technique Skills (how-to guides)
+
+**Examples:** condition-based-waiting, hyperpowers:root-cause-tracing, defensive-programming
+
+**Test with:**
+- Application scenarios: Can they apply the technique correctly?
+- Variation scenarios: Do they handle edge cases?
+- Missing information tests: Do instructions have gaps?
+
+**Success criteria:** Agent successfully applies technique to new scenario
+
+### Pattern Skills (mental models)
+
+**Examples:** reducing-complexity, information-hiding concepts
+
+**Test with:**
+- Recognition scenarios: Do they recognize when pattern applies?
+- Application scenarios: Can they use the mental model?
+- Counter-examples: Do they know when NOT to apply?
+
+**Success criteria:** Agent correctly identifies when/how to apply pattern
+
+### Reference Skills (documentation/APIs)
+
+**Examples:** API documentation, command references, library guides
+
+**Test with:**
+- Retrieval scenarios: Can they find the right information?
+- Application scenarios: Can they use what they found correctly?
+- Gap testing: Are common use cases covered?
+
+**Success criteria:** Agent finds and correctly applies reference information
+
+## Common Rationalizations for Skipping Testing
+
+| Excuse | Reality |
+|--------|---------|
+| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
+| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
+| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
+| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
+| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
+| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
+| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
+| "No time to test" | Deploying untested skill wastes more time fixing it later. |
+
+**All of these mean: Test before deploying. No exceptions.**
+
+## Bulletproofing Skills Against Rationalization
+
+Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
+
+**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
+
+### Close Every Loophole Explicitly
+
+Don't just state the rule - forbid specific workarounds:
+
+<Bad>
+```markdown
+Write code before test? Delete it.
+```
+</Bad>
+
+<Good>
+```markdown
+Write code before test? Delete it. Start over.
+
+**No exceptions:**
+- Don't keep it as "reference"
+- Don't "adapt" it while writing tests
+- Don't look at it
+- Delete means delete
+```
+</Good>
+
+### Address "Spirit vs Letter" Arguments
+
+Add foundational principle early:
+
+```markdown
+**Violating the letter of the rules is violating the spirit of the rules.**
+```
+
+This cuts off entire class of "I'm following the spirit" rationalizations.
+
+### Build Rationalization Table
+
+Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
+
+```markdown
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+```
+
+### Create Red Flags List
+
+Make it easy for agents to self-check when rationalizing:
+
+```markdown
+## Red Flags - STOP and Start Over
+
+- Code before test
+- "I already manually tested it"
+- "Tests after achieve the same purpose"
+- "It's about spirit not ritual"
+- "This is different because..."
+
+**All of these mean: Delete code. Start over with TDD.**
+```
+
+### Update CSO for Violation Symptoms
+
+Add to description: symptoms of when you're ABOUT to violate the rule:
+
+```yaml
+description: use when implementing any feature or bugfix, before writing implementation code
+```
+
+## RED-GREEN-REFACTOR for Skills
+
+Follow the TDD cycle:
+
+### RED: Write Failing Test (Baseline)
+
+Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
+- What choices did they make?
+- What rationalizations did they use (verbatim)?
+- Which pressures triggered violations?
+
+This is "watch the test fail" - you must see what agents naturally do before writing the skill.
+
+### GREEN: Write Minimal Skill
+
+Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
+
+Run same scenarios WITH skill. Agent should now comply.
+
+### REFACTOR: Close Loopholes
+
+Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
+
+**REQUIRED SUB-SKILL:** Use superpowers:testing-skills-with-subagents for the complete testing methodology:
+- How to write pressure scenarios
+- Pressure types (time, sunk cost, authority, exhaustion)
+- Plugging holes systematically
+- Meta-testing techniques
+