Initial commit
This commit is contained in:
167
skills/writing-skills/resources/testing-methodology.md
Normal file
167
skills/writing-skills/resources/testing-methodology.md
Normal file
@@ -0,0 +1,167 @@
|
||||
## Testing All Skill Types
|
||||
|
||||
Different skill types need different test approaches:
|
||||
|
||||
### Discipline-Enforcing Skills (rules/requirements)
|
||||
|
||||
**Examples:** TDD, hyperpowers:verification-before-completion, hyperpowers:designing-before-coding
|
||||
|
||||
**Test with:**
|
||||
- Academic questions: Do they understand the rules?
|
||||
- Pressure scenarios: Do they comply under stress?
|
||||
- Multiple pressures combined: time + sunk cost + exhaustion
|
||||
- Identify rationalizations and add explicit counters
|
||||
|
||||
**Success criteria:** Agent follows rule under maximum pressure
|
||||
|
||||
### Technique Skills (how-to guides)
|
||||
|
||||
**Examples:** condition-based-waiting, hyperpowers:root-cause-tracing, defensive-programming
|
||||
|
||||
**Test with:**
|
||||
- Application scenarios: Can they apply the technique correctly?
|
||||
- Variation scenarios: Do they handle edge cases?
|
||||
- Missing information tests: Do instructions have gaps?
|
||||
|
||||
**Success criteria:** Agent successfully applies technique to new scenario
|
||||
|
||||
### Pattern Skills (mental models)
|
||||
|
||||
**Examples:** reducing-complexity, information-hiding concepts
|
||||
|
||||
**Test with:**
|
||||
- Recognition scenarios: Do they recognize when pattern applies?
|
||||
- Application scenarios: Can they use the mental model?
|
||||
- Counter-examples: Do they know when NOT to apply?
|
||||
|
||||
**Success criteria:** Agent correctly identifies when/how to apply pattern
|
||||
|
||||
### Reference Skills (documentation/APIs)
|
||||
|
||||
**Examples:** API documentation, command references, library guides
|
||||
|
||||
**Test with:**
|
||||
- Retrieval scenarios: Can they find the right information?
|
||||
- Application scenarios: Can they use what they found correctly?
|
||||
- Gap testing: Are common use cases covered?
|
||||
|
||||
**Success criteria:** Agent finds and correctly applies reference information
|
||||
|
||||
## Common Rationalizations for Skipping Testing
|
||||
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
|
||||
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
|
||||
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
|
||||
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
|
||||
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
|
||||
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
|
||||
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
|
||||
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
|
||||
|
||||
**All of these mean: Test before deploying. No exceptions.**
|
||||
|
||||
## Bulletproofing Skills Against Rationalization
|
||||
|
||||
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
|
||||
|
||||
**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
|
||||
|
||||
### Close Every Loophole Explicitly
|
||||
|
||||
Don't just state the rule - forbid specific workarounds:
|
||||
|
||||
<Bad>
|
||||
```markdown
|
||||
Write code before test? Delete it.
|
||||
```
|
||||
</Bad>
|
||||
|
||||
<Good>
|
||||
```markdown
|
||||
Write code before test? Delete it. Start over.
|
||||
|
||||
**No exceptions:**
|
||||
- Don't keep it as "reference"
|
||||
- Don't "adapt" it while writing tests
|
||||
- Don't look at it
|
||||
- Delete means delete
|
||||
```
|
||||
</Good>
|
||||
|
||||
### Address "Spirit vs Letter" Arguments
|
||||
|
||||
Add foundational principle early:
|
||||
|
||||
```markdown
|
||||
**Violating the letter of the rules is violating the spirit of the rules.**
|
||||
```
|
||||
|
||||
This cuts off entire class of "I'm following the spirit" rationalizations.
|
||||
|
||||
### Build Rationalization Table
|
||||
|
||||
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
|
||||
|
||||
```markdown
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
|
||||
| "I'll test after" | Tests passing immediately prove nothing. |
|
||||
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
|
||||
```
|
||||
|
||||
### Create Red Flags List
|
||||
|
||||
Make it easy for agents to self-check when rationalizing:
|
||||
|
||||
```markdown
|
||||
## Red Flags - STOP and Start Over
|
||||
|
||||
- Code before test
|
||||
- "I already manually tested it"
|
||||
- "Tests after achieve the same purpose"
|
||||
- "It's about spirit not ritual"
|
||||
- "This is different because..."
|
||||
|
||||
**All of these mean: Delete code. Start over with TDD.**
|
||||
```
|
||||
|
||||
### Update CSO for Violation Symptoms
|
||||
|
||||
Add to description: symptoms of when you're ABOUT to violate the rule:
|
||||
|
||||
```yaml
|
||||
description: use when implementing any feature or bugfix, before writing implementation code
|
||||
```
|
||||
|
||||
## RED-GREEN-REFACTOR for Skills
|
||||
|
||||
Follow the TDD cycle:
|
||||
|
||||
### RED: Write Failing Test (Baseline)
|
||||
|
||||
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
|
||||
- What choices did they make?
|
||||
- What rationalizations did they use (verbatim)?
|
||||
- Which pressures triggered violations?
|
||||
|
||||
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
|
||||
|
||||
### GREEN: Write Minimal Skill
|
||||
|
||||
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
|
||||
|
||||
Run same scenarios WITH skill. Agent should now comply.
|
||||
|
||||
### REFACTOR: Close Loopholes
|
||||
|
||||
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
|
||||
|
||||
**REQUIRED SUB-SKILL:** Use superpowers:testing-skills-with-subagents for the complete testing methodology:
|
||||
- How to write pressure scenarios
|
||||
- Pressure types (time, sunk cost, authority, exhaustion)
|
||||
- Plugging holes systematically
|
||||
- Meta-testing techniques
|
||||
|
||||
Reference in New Issue
Block a user