---
name: test-driven-development
description: Use when implementing features or fixing bugs - enforces RED-GREEN-REFACTOR cycle requiring tests to fail before writing code
---
<skill_overview>
Write the test first, watch it fail, write minimal code to pass. If you didn't watch the test fail, you don't know if it tests the right thing.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow these exact steps in order. Do not adapt.
Violating the letter of the rules is violating the spirit of the rules.
</rigidity_level>
<quick_reference>
| Phase | Action | Command Example | Expected Result |
|-------|--------|-----------------|-----------------|
| **RED** | Write failing test | `cargo test test_name` | FAIL (feature missing) |
| **Verify RED** | Confirm correct failure | Check error message | "function not found" or assertion fails |
| **GREEN** | Write minimal code | Implement feature | Test passes |
| **Verify GREEN** | All tests pass | `cargo test` | All green, no warnings |
| **REFACTOR** | Clean up code | Improve while green | Tests still pass |

**Iron Law:** NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
</quick_reference>
<when_to_use>
**Always use for:**
- New features
- Bug fixes
- Refactoring with behavior changes
- Any production code
**Ask your human partner for exceptions:**
- Throwaway prototypes (will be deleted)
- Generated code
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
</when_to_use>
<the_process>
## 1. RED - Write Failing Test
Write one minimal test showing what should happen.
**Requirements:**
- Test one behavior only ("and" in name? Split it)
- Clear name describing behavior
- Use real code (no mocks unless unavoidable)
See [resources/language-examples.md](resources/language-examples.md) for Rust, Swift, TypeScript examples.
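A minimal RED sketch in Python (pytest assumed; `slugify` is a hypothetical feature that does not exist yet):
```python
# test_slugify.py - written BEFORE any implementation exists
def test_slugify_lowercases_and_hyphenates():
    # slugify is deliberately not imported or defined yet
    assert slugify("Hello World") == "hello-world"
```
One behavior, a name that describes it, no mocks.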
## 2. Verify RED - Watch It Fail
**MANDATORY. Never skip.**
Run the test and confirm:
- ✓ Test **fails** (an assertion failure or missing symbol, not a syntax error)
- ✓ Failure message is the one you expected ("function not found" or an assertion failure)
- ✓ It fails because the feature is missing, not because of a typo
**If test passes:** You're testing existing behavior. Fix the test.
**If test errors:** Fix syntax error, re-run until it fails correctly.
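With the hypothetical `slugify` test above, a correct RED run looks like this (pytest assumed, output abridged):
```bash
$ pytest test_slugify.py
FAILED test_slugify.py::test_slugify_lowercases_and_hyphenates - NameError: name 'slugify' is not defined
```
A `NameError` here is the right kind of failure: the feature is missing. A `SyntaxError` would be the wrong kind: fix the test and re-run.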
## 3. GREEN - Write Minimal Code
Write simplest code to pass the test. Nothing more.
**Key principle:** Don't add features the test doesn't require. Don't refactor other code. Don't "improve" beyond the test.
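Continuing the hypothetical `slugify` example, the minimal GREEN step implements exactly what the one failing test demands:
```python
# Just enough to pass the failing test - no Unicode handling,
# no punctuation stripping, because no test requires them yet
def slugify(text):
    return text.lower().replace(" ", "-")
```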
## 4. Verify GREEN - Watch It Pass
**MANDATORY.**
Run tests and confirm:
- ✓ New test passes
- ✓ All other tests still pass
- ✓ No errors or warnings
**If test fails:** Fix code, not test.
**If other tests fail:** Fix now before proceeding.
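A correct GREEN run is unambiguous (pytest assumed, output illustrative):
```bash
$ pytest -q
.                                                        [100%]
1 passed in 0.01s
```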
## 5. REFACTOR - Clean Up
**Only after green:**
- Remove duplication
- Improve names
- Extract helpers
Keep tests green. Don't add behavior.
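For example, a behavior-preserving cleanup of the hypothetical `slugify` from the GREEN step, with the test untouched and still passing:
```python
SEPARATOR = "-"

def slugify(text):
    # Same observable behavior as before; no test changes needed
    return text.lower().replace(" ", SEPARATOR)
```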
## 6. Repeat
Next failing test for next feature.
</the_process>
<examples>
<example>
<scenario>Developer writes implementation first, then adds test that passes immediately</scenario>
<code>
# Code written FIRST
def validate_email(email):
    return "@" in email  # Bug: accepts "@@"

# Test written AFTER
def test_validate_email():
    assert validate_email("user@example.com")  # Passes immediately!
    # Missing edge case: assert not validate_email("@@")
</code>
<why_it_fails>
When test passes immediately:
- Never proved the test catches bugs
- Only tested happy path you remembered
- Forgot edge cases (like "@@")
- Bug ships to production
Tests written after the fact verify the cases you remembered, not the behavior that was required.
</why_it_fails>
<correction>
**TDD approach:**
1. **RED** - Write test first (including edge case):
```python
def test_validate_email():
    assert validate_email("user@example.com")  # Will fail - function doesn't exist
    assert not validate_email("@@")  # Edge case up front
```
2. **Verify RED** - Run test, watch it fail:
```bash
NameError: name 'validate_email' is not defined
```
3. **GREEN** - Implement to pass both cases:
```python
def validate_email(email):
    return "@" in email and email.count("@") == 1
```
4. **Verify GREEN** - Both assertions pass, bug prevented.
**Result:** Test failed first, proving it works. Edge case discovered during test writing, not in production.
</correction>
</example>
<example>
<scenario>Developer has already written 3 hours of code without tests. Wants to keep it as "reference" while writing tests.</scenario>
<code>
// 200 lines of untested code exists
// Developer thinks: "I'll keep this and write tests that match it"
// Or: "I'll use it as reference to speed up TDD"
</code>
<why_it_fails>
**Keeping code as "reference":**
- You'll copy it (that's testing after, with extra steps)
- You'll adapt it (biased by implementation)
- Tests will match code, not requirements
- You'll justify shortcuts: "I already know this works"
**Result:** All the problems of test-after, none of the benefits of TDD.
</why_it_fails>
<correction>
**Delete it. Completely.**
```bash
git stash # Or delete the file
```
**Then start TDD:**
1. Write first failing test from requirements (not from code)
2. Watch it fail
3. Implement fresh (might be different from original, that's OK)
4. Watch it pass
**Why delete:**
- Sunk cost is already gone
- 3 hours implementing ≠ 3 hours with TDD (TDD might be 2 hours total)
- Code without tests is technical debt
- Fresh implementation from tests is usually better
**What you gain:**
- Tests that actually verify behavior
- Confidence code works
- Ability to refactor safely
- No bugs from untested edge cases
</correction>
</example>
<example>
<scenario>Test is hard to write. Developer thinks "design must be unclear, but I'll implement first to explore."</scenario>
<code>
// Test attempt:
func testUserServiceCreatesAccount() {
    // Need to mock database, email service, payment gateway, logger...
    // This is getting complicated, maybe I should just implement first
}
</code>
<why_it_fails>
**"Test is hard" is valuable signal:**
- Hard to test = hard to use
- Too many dependencies = coupling too tight
- Complex setup = design needs simplification
**Implementing first ignores this signal:**
- Build the complex design
- Lock in the coupling
- Now forced to write complex tests (or skip them)
</why_it_fails>
<correction>
**Listen to the test.**
Hard to test? Simplify the interface:
```swift
// Instead of:
class UserService {
    init(db: Database, email: EmailService, payments: PaymentGateway, logger: Logger) { }
    func createAccount(email: String, password: String, paymentToken: String) throws { }
}

// Make testable:
class UserService {
    func createAccount(request: CreateAccountRequest) -> Result<Account, Error> {
        // Dependencies injected through the request or passed separately
    }
}
```
**Test becomes simple:**
```swift
func testCreatesAccountFromRequest() throws {
    let service = UserService()
    let request = CreateAccountRequest(email: "user@example.com")
    let account = try service.createAccount(request: request).get()
    XCTAssertEqual(account.email, "user@example.com")
}
```
**TDD forces good design.** If the test is hard to write, fix the design before implementing.
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Write code before test?** → Delete it. Start over.
- Never keep as "reference"
- Never "adapt" while writing tests
- Delete means delete
2. **Test passes immediately?** → Not TDD. Fix the test or delete the code.
- Passing immediately proves nothing
- You're testing existing behavior, not required behavior
3. **Can't explain why test failed?** → Fix until failure makes sense.
- "function not found" = good (feature doesn't exist)
- Weird error = bad (fix test, re-run)
4. **Want to skip "just this once"?** → That's rationalization. Stop.
- TDD is faster than debugging in production
- "Too simple to test" = test takes 30 seconds
- "Already manually tested" = not systematic, not repeatable
## Common Excuses
If you catch yourself saying any of these, stop and follow TDD:
- "This is different because..."
- "I'm being pragmatic, not dogmatic"
- "It's about spirit not ritual"
- "Tests after achieve the same goals"
- "Deleting X hours of work is wasteful"
</critical_rules>
<verification_checklist>
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test **fail** before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass with no warnings
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
**Can't check all boxes?** You skipped TDD. Start over.
</verification_checklist>
<integration>
**This skill calls:**
- verification-before-completion (running tests to verify)
**This skill is called by:**
- fixing-bugs (write failing test reproducing bug)
- executing-plans (when implementing bd tasks)
- refactoring-safely (keep tests green while refactoring)
**Agents used:**
- hyperpowers:test-runner (run tests, return summary only)
</integration>
<resources>
**Detailed language-specific examples:**
- [Rust, Swift, TypeScript examples](resources/language-examples.md) - Complete RED-GREEN-REFACTOR cycles
- [Language-specific test commands](resources/language-examples.md#verification-commands-by-language)
**When stuck:**
- Test too complicated? → Design too complicated, simplify interface
- Must mock everything? → Code too coupled, use dependency injection (see the sketch below)
- Test setup huge? → Extract helpers, or simplify design
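For the dependency-injection case above, a minimal Python sketch (all names hypothetical): pass the collaborator in, and the test substitutes a trivial fake instead of a mock framework.
```python
from dataclasses import dataclass

@dataclass
class User:
    email: str

def notify(user, mailer):
    # The mailer is injected, so tests can pass anything with a send() method
    mailer.send(user.email, "Welcome!")

class FakeMailer:
    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))

def test_notify_sends_to_user_email():
    mailer = FakeMailer()
    notify(User(email="user@example.com"), mailer)
    assert mailer.sent == [("user@example.com", "Welcome!")]
```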
</resources>