---
name: tdd-workflow
description: Test-Driven Development workflow with session integration. Use when implementing features/bugfixes to enforce RED-GREEN-REFACTOR discipline. Integrates with session-management for enhanced TDD session tracking, checkpoints, and metrics.
---

# Test-Driven Development (TDD) Workflow

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**Exceptions (ask your human partner):**
- Throwaway prototypes
- Generated code
- Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

## Session-Management Integration

**Enhanced when session-management is active:**

If the session has "TDD" in its objective or uses the `--tdd` flag:
- Automatic RED-GREEN-REFACTOR checkpoint tracking
- Test coverage metrics in the session report
- TDD cycle enforcement (blocks skipping steps)
- Session handoff includes test statistics

**Works standalone without session-management** - all core TDD functionality available.

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

Implement fresh from tests. Period.

## Red-Green-Refactor

```
RED → Verify RED → GREEN → Verify GREEN → REFACTOR → Repeat
```

### RED - Write Failing Test

Write one minimal test showing what should happen.

**Good:**
```typescript
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
```
Clear name, tests real behavior, one thing.

**Bad:**
```typescript
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});
```
Vague name, tests the mock, not the code.

**Requirements:**
- One behavior
- Clear name
- Real code (no mocks unless unavoidable)

**Session Integration:** If session-management is active, I'll prompt for a checkpoint label.

### Verify RED - Watch It Fail

**MANDATORY. Never skip.**

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test fails (not errors)
- Failure message is expected
- Fails because the feature is missing (not typos)

**Test passes?** You're testing existing behavior. Fix the test.
**Test errors?** Fix the error, re-run until it fails correctly.

**Session Integration:** RED checkpoint automatically created with the test file path.

### GREEN - Minimal Code

Write the simplest code that passes the test.

**Good:**
```typescript
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}
```
Just enough to pass.

**Bad:**
```typescript
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;
    backoff?: 'linear' | 'exponential';
    onRetry?: (attempt: number) => void;
  }
): Promise<T> {
  // YAGNI - You Aren't Gonna Need It
}
```
Over-engineered.
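If configurable retries ever become a real requirement, they enter the same way every feature does: as a new failing test. A hypothetical next RED test under that assumption (the `maxRetries` option does not exist in the code above):

```typescript
test('respects a custom maxRetries option', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    throw new Error('fail');
  };

  // Assumed future API: an optional options argument, driven into existence by this test.
  await expect(retryOperation(operation, { maxRetries: 5 })).rejects.toThrow('fail');
  expect(attempts).toBe(5);
});
```

Until a test like this exists, the three-attempt version above is all the code you write.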
Don't add features, refactor other code, or "improve" beyond the test.

### Verify GREEN - Watch It Pass

**MANDATORY.**

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix the code, not the test.
**Other tests fail?** Fix them now.

**Session Integration:** GREEN checkpoint created, cycle count incremented.

### REFACTOR - Clean Up

After green only:
- Remove duplication
- Improve names
- Extract helpers

Keep tests green. Don't add behavior.

**Session Integration:** Optional REFACTOR checkpoint if significant changes.

### Repeat

Next failing test for the next feature.

## Session-Management Commands

When session-management is installed and active:

### Start TDD Session

```bash
# Explicit TDD session
python scripts/session.py start feature/auth --objective "Add auth (TDD)"

# Or with TDD flag
python scripts/session.py start feature/auth --tdd
```

Activates enhanced TDD mode with:
- Automatic checkpoint prompts
- Cycle tracking
- Test coverage monitoring
- Enforced verification steps

### TDD Checkpoints

```bash
# RED checkpoint (test written, verified failing)
python scripts/session.py checkpoint --label "red-user-validation" --tdd-phase RED

# GREEN checkpoint (minimal code, test passing)
python scripts/session.py checkpoint --label "green-user-validation" --tdd-phase GREEN

# REFACTOR checkpoint (optional, when refactoring)
python scripts/session.py checkpoint --label "refactor-extract-validator" --tdd-phase REFACTOR
```

### View TDD Metrics

```bash
# Session summary with TDD stats
python scripts/session.py status

# Shows:
# - RED-GREEN-REFACTOR cycles completed
# - Tests written vs passing
# - Average cycle time
# - Coverage trend
```

### End TDD Session

```bash
python scripts/session.py end --handoff
```

Generates a handoff including:
- Total TDD cycles completed
- Test files created
- Coverage metrics
- Discipline adherence (skipped verifications flagged)

## Good Tests

| Quality | Good | Bad |
|---------|------|-----|
| **Minimal** | One thing. "and" in name? Split it (see the sketch below). | `test('validates email and domain and whitespace')` |
| **Clear** | Name describes behavior | `test('test1')` |
| **Shows intent** | Demonstrates desired API | Obscures what code should do |
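A quick sketch of splitting the "and" test from the table into minimal tests. `validateEmail` and its return shape are assumed names for illustration, not an existing API:

```typescript
// One behavior per test: each can fail for exactly one reason.
test('rejects an email without a domain', () => {
  expect(validateEmail('user@').valid).toBe(false);
});

test('trims surrounding whitespace before validating', () => {
  expect(validateEmail('  user@example.com  ').valid).toBe(true);
});
```

Each test has a single reason to fail, so a red run points straight at the broken behavior.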
**"TDD is dogmatic, being pragmatic means adapting"** TDD IS pragmatic: - Finds bugs before commit (faster than debugging after) - Prevents regressions (tests catch breaks immediately) - Documents behavior (tests show how to use code) - Enables refactoring (change freely, tests catch breaks) "Pragmatic" shortcuts = debugging in production = slower. **"Tests after achieve the same goals - it's spirit not ritual"** No. Tests-after answer "What does this do?" Tests-first answer "What should this do?" Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones. Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't). 30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | | "Need to explore first" | Fine. Throw away exploration, start with TDD. | | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. | | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. | | "Existing code has no tests" | You're improving it. Add tests for existing code. | ## Red Flags - STOP and Start Over - Code before test - Test after implementation - Test passes immediately - Can't explain why test failed - Tests added "later" - Rationalizing "just this once" - "I already manually tested it" - "Tests after achieve the same purpose" - "It's about spirit not ritual" - "Keep as reference" or "adapt existing code" - "Already spent X hours, deleting is wasteful" - "TDD is dogmatic, I'm being pragmatic" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ## Example: Bug Fix **Bug:** Empty email accepted **RED** ```typescript test('rejects empty email', async () => { const result = await submitForm({ email: '' }); expect(result.error).toBe('Email required'); }); ``` **Verify RED** ```bash $ npm test FAIL: expected 'Email required', got undefined ``` **GREEN** ```typescript function submitForm(data: FormData) { if (!data.email?.trim()) { return { error: 'Email required' }; } // ... } ``` **Verify GREEN** ```bash $ npm test PASS ``` **REFACTOR** Extract validation for multiple fields if needed. ## Verification Checklist Before marking work complete: - [ ] Every new function/method has a test - [ ] Watched each test fail before implementing - [ ] Each test failed for expected reason (feature missing, not typo) - [ ] Wrote minimal code to pass each test - [ ] All tests pass - [ ] Output pristine (no errors, warnings) - [ ] Tests use real code (mocks only if unavoidable) - [ ] Edge cases and errors covered Can't check all boxes? You skipped TDD. Start over. ## When Stuck | Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. 
## Debugging Integration

Bug found? Write a failing test reproducing it. Follow the TDD cycle. The test proves the fix and prevents regression.

Never fix bugs without a test.

## Final Rule

```
Production code → test exists and failed first
Otherwise → not TDD
```

No exceptions without your human partner's permission.

## CCMP Plugin Integration

The TDD workflow **automatically integrates** with other CCMP plugins via `.ccmp/state.json`:

### Integration Detection (Modern API)

```python
from lib.ccmp_integration import is_session_active, is_tdd_mode

if is_session_active():
    # Enhanced TDD mode with session integration
    pass
```

**Legacy detection** (still supported):
1. `.git/sessions/<session>/` directory exists
2. Session config has a TDD objective or `--tdd` flag
3. `session.py` commands available

### With session-management 🔄

**Automatic activation:**
```bash
python scripts/session.py start feature/auth --tdd
# → TDD workflow detects via integration API
# → Enhanced mode activates automatically
```

**TDD metrics in sessions:**
- Cycles completed tracked in session state
- Session status shows TDD discipline score
- Handoffs include test coverage metrics

### With claude-context-manager 📚

**Test context updates:**

When TDD GREEN checkpoints succeed:
- Can trigger `tests/*/claude.md` updates
- Documents discovered test patterns
- Keeps test strategy current

**Integration example:**
```python
from lib.ccmp_integration import CCMPIntegration

integration = CCMPIntegration()
integration.update_state("tdd-workflow", {
    "active": True,
    "cycles_today": 5,
    "current_phase": "GREEN",
    "discipline_score": 100
})
```

**If detected:** Enhanced TDD mode with automatic checkpoints and metrics.
**If not detected:** Standard TDD guidance works perfectly standalone.

## Integration Benefits

**With session-management:**
- 📊 Track RED-GREEN-REFACTOR cycles in session metrics
- 📍 Automatic checkpoints at each TDD phase
- 📈 Test coverage trends across the session
- 🎯 TDD discipline enforcement (harder to skip steps)
- 📝 Rich handoff documents with test statistics
- ⏱️ Cycle time tracking and optimization insights

**Without session-management:**
- ✅ Full TDD guidance and discipline
- ✅ All core principles and workflows
- ✅ Verification checklists
- ✅ Common rationalization detection

Best of both worlds: works great alone, better together.