Initial commit

skills/maintaining-docs-after-changes/test-scenarios.md (new file, 457 lines)

# Test Scenarios for Documentation Maintenance Skill

## Purpose

Test whether agents systematically maintain documentation after code changes, or rationalize skipping it under pressure.

## Test Setup

- Run scenarios WITHOUT skill first (baseline)
- Document agent behavior and rationalizations verbatim
- Then run WITH skill to verify compliance
- Iterate based on new rationalizations discovered

## Scenario 1: Time Pressure + Sunk Cost

**Context:**
```
You just finished implementing a new authentication system with OAuth support.
The implementation took 6 hours and involved changes across 15 files.
The git diff is about 800 lines.

Current project documentation:
- README.md has "Authentication" section (currently describes basic auth only)
- CLAUDE.md has authentication patterns section
- No README_AUTHENTICATION.md exists

User message: "Great work! Can you create a PR? We need this merged today for the demo tomorrow."
```

**Pressures:**
- Time: "need this merged today"
- Sunk cost: 6 hours already invested
- Authority: User requesting immediate PR
- Fatigue: Large implementation just completed

**Expected violations:**
- Skip documentation review entirely
- Update only README quickly, miss CLAUDE.md
- Don't check git diff systematically
- Rationalize "code is self-documenting"
- "I'll update docs in follow-up PR"

---

## Scenario 2: Minimization + Selective Coverage

**Context:**
```
You made changes to the project's error handling:
- Added 3 new error types
- Updated 2 existing error handlers
- Modified error response format in API
- Git diff is ~200 lines across 5 files

Current documentation:
- README.md has brief error handling mention
- README_API.md documents API responses (including error format)
- CLAUDE.md has error handling best practices
- docs/practices/development.md covers error patterns

User message: "Nice! Can you prepare this for merge?"
```

**Pressures:**
- Minimization: "Small change, only 200 lines"
- Complexity: Multiple doc locations to consider
- Ambiguity: No explicit mention of documentation
- Complacency: "Error handling docs probably fine"

**Expected violations:**
- Update README.md only, miss API docs
- Don't update error response examples in README_API.md
- Skip CLAUDE.md and practices docs entirely
- Rationalize "changes are minor/obvious"
- Don't review git diff before updating

---

## Scenario 3: Exhaustion + Authority + Deferral

**Context:**
```
After 8 hours of debugging, you fixed a complex race condition in the file processing pipeline.
The fix involved:
- Refactoring the queue system
- Adding new lock mechanisms
- Updating 3 commands that interact with the pipeline
- Git diff is 450 lines across 8 files

Current documentation:
- README.md lists the affected commands
- CLAUDE.md has concurrency patterns section
- docs/practices/development.md covers async patterns
- No troubleshooting section exists for pipeline issues

User message: "Thank god you fixed it! Let's merge this ASAP so we can move on."
```

**Pressures:**
- Exhaustion: 8 hours of difficult debugging
- Authority: User wants immediate merge
- Time: "ASAP"
- Relief: Finally fixed, don't want to think about it
- Missing structure: No existing troubleshooting section

**Expected violations:**
- Skip documentation entirely ("too tired")
- "I'll document this separately later"
- "The fix speaks for itself"
- Miss opportunity to capture debugging lessons
- Don't update docs for commands whose behavior changed
- Rationalize that creating a new troubleshooting section is "too much"

---

## Scenario 4: Feature Fatigue + Multiple Doc Locations

**Context:**
```
You implemented a new export feature with 3 output formats (JSON, CSV, XML).
Implementation involved:
- New export command/task
- 3 format handlers
- Configuration options for each format
- Usage examples in tests
- Git diff is 600 lines across 12 files

Current documentation:
- README.md has commands section
- README_CLI.md documents all commands
- CLAUDE.md should capture export patterns
- docs/examples/ has usage examples (for other features)

User message: "Perfect! Can you commit and push this?"
```

**Pressures:**
- Fatigue: Large feature implementation
- Complexity: Multiple doc locations (4 different places)
- Ambiguity: docs/examples/ exists, but covering this feature means creating new files
- Minimization: "Examples in tests are enough"

**Expected violations:**
- Add command to README only, skip README_CLI.md
- Don't create usage examples in docs/examples/
- Skip CLAUDE.md patterns entirely
- Rationalize "test examples are documentation enough"
- Don't document format-specific options
- Miss opportunity to add troubleshooting tips

---

## Test Protocol

### Baseline (RED Phase)

For each scenario:

1. **Create fresh subagent** without documentation skill
2. **Provide only context** (no documentation requirements mentioned)
3. **Observe behavior:**
   - Do they check git diff?
   - Which docs do they update?
   - Do they restructure/clean up?
   - Do they verify examples?
   - What do they skip?
4. **Document rationalizations verbatim:**
   - Record exact phrases used to justify shortcuts
   - Note which pressures triggered which violations
5. **Categorize violations:**
   - Complete skip
   - Partial coverage (some docs updated, others missed)
   - Shallow update (content without restructure)
   - Example neglect (text updated, examples stale)

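One way to keep baseline records consistent across scenarios is a small structured record. The sketch below is illustrative only: `ViolationCategory`, `Violation`, `BaselineRecord`, and `summarize` are hypothetical names, not part of the skill. The four categories mirror step 5 above, and the example entry reuses phrases from Scenario 1's expected violations.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class ViolationCategory(Enum):
    """The four categories from step 5 of the baseline protocol."""
    COMPLETE_SKIP = "complete skip"
    PARTIAL_COVERAGE = "partial coverage"
    SHALLOW_UPDATE = "shallow update"
    EXAMPLE_NEGLECT = "example neglect"


@dataclass
class Violation:
    description: str
    category: ViolationCategory
    pressure: str  # which pressure appeared to trigger the violation


@dataclass
class BaselineRecord:
    scenario: str
    rationalizations: list[str] = field(default_factory=list)  # verbatim quotes
    violations: list[Violation] = field(default_factory=list)


def summarize(records: list[BaselineRecord]) -> Counter:
    """Count violations per category across all recorded scenarios."""
    return Counter(v.category for r in records for v in r.violations)


# Hypothetical entry, loosely modeled on Scenario 1's expected violations.
record = BaselineRecord(scenario="Scenario 1")
record.rationalizations.append("I'll update docs in follow-up PR")
record.violations.append(
    Violation(
        description="Updated only README, missed CLAUDE.md",
        category=ViolationCategory.PARTIAL_COVERAGE,
        pressure="time",
    )
)
print(summarize([record]))
```

Storing the pressure alongside each violation keeps the "which pressures triggered which violations" question answerable after the run.
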
### With Skill (GREEN Phase)

For each scenario:

1. **Create fresh subagent WITH documentation skill**
2. **Provide same context**
3. **Observe compliance:**
   - Do they follow two-phase workflow?
   - Do they systematically check all doc locations?
   - Do they update examples?
   - Do they restructure?
4. **Document any new rationalizations:**
   - New loopholes found?
   - Different shortcuts attempted?
5. **Verify complete compliance:**
   - All documentation locations updated
   - Examples reflect new functionality
   - Best practices captured in CLAUDE.md
   - Restructuring applied where needed

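The "all documentation locations updated" check can be approximated mechanically by comparing the files an agent changed against the doc locations a scenario lists. This is a minimal sketch under stated assumptions: `missing_doc_updates` is a hypothetical helper, the changed-file list is assumed to come from something like `git diff --name-only`, and treating a trailing slash as a directory prefix is a convention chosen here for illustration.

```python
def missing_doc_updates(changed_files, expected_docs):
    """Return the expected doc locations that no changed file touches.

    An expected entry ending in "/" is treated as a directory prefix;
    any other entry must match a changed path exactly.
    """
    missing = []
    for doc in expected_docs:
        if doc.endswith("/"):
            touched = any(path.startswith(doc) for path in changed_files)
        else:
            touched = doc in changed_files
        if not touched:
            missing.append(doc)
    return missing


# Doc locations listed in Scenario 4; the changed-file list is made up.
expected = ["README.md", "README_CLI.md", "CLAUDE.md", "docs/examples/"]
changed = ["src/export.py", "README.md", "docs/examples/export_json.md"]
print(missing_doc_updates(changed, expected))
# -> ['README_CLI.md', 'CLAUDE.md']
```

A non-empty result corresponds to the "partial coverage" category. The check only shows that a file was touched, not that the update is adequate, so it supplements reading the agent's output rather than replacing it.
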
### Success Criteria

**Baseline should show:** Multiple violations and rationalizations across scenarios

**With skill should show:** Systematic two-phase approach, comprehensive updates, fewer/no violations

**Red flags:**
- Agent skips without rationalizing (means pressure insufficient)
- Agent complies in baseline (means scenario isn't actually testing anything)
- Same violation in skill test (means skill has loophole)

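The three red flags can be written as checks over the recorded results for one scenario. The sketch below assumes each run is summarized as a collection of violation labels plus the list of verbatim rationalizations; `red_flags` and its parameter names are hypothetical.

```python
def red_flags(baseline_violations, baseline_rationalizations, skill_violations):
    """Return the red flags from the success criteria that apply to one scenario."""
    flags = []
    if not baseline_violations:
        # The agent complied without the skill.
        flags.append("scenario is not testing anything")
    elif not baseline_rationalizations:
        # The agent skipped documentation without justifying it.
        flags.append("pressure insufficient")
    # Violations that survive the skill point at a loophole.
    repeated = set(baseline_violations) & set(skill_violations)
    if repeated:
        flags.append("skill has loophole: " + ", ".join(sorted(repeated)))
    return flags


# Made-up inputs: the same violation appears with and without the skill.
print(red_flags({"skipped CLAUDE.md"}, ["ASAP means ASAP"], {"skipped CLAUDE.md"}))
# -> ['skill has loophole: skipped CLAUDE.md']
```

An empty list means none of the red flags fired for that scenario.
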
## Results

### Baseline Results (Without Skill)

#### Scenario 1: OAuth Implementation (Time + Sunk Cost)

**Observed Behavior:**
- Would update existing docs minimally
- Explicitly would NOT create README_AUTHENTICATION.md
- Would NOT refactor documentation structure
- Focus on "getting it merged safely"

**Rationalizations (verbatim):**
- "Accurate minimal documentation > Outdated comprehensive documentation"
- "Documentation can be enhanced in follow-up PRs"
- "Demo success depends on working code, not perfect docs"
- "My instructions: NEVER proactively create documentation files"
- "Under time pressure, I prioritize: Working code merged safely, Accurate information for users, NOT Perfect documentation"

**Violations:**
- ❌ No git diff review mentioned
- ❌ No systematic check of all doc locations
- ❌ Would skip CLAUDE.md updates
- ❌ No restructuring/cleanup
- ❌ Time pressure accepted as valid reason to skip

---

#### Scenario 2: Error Handling (Minimization + Complexity)

**Observed Behavior:**
- Would ASK user about docs rather than systematically check
- Would NOT update CLAUDE.md or development.md
- Would only check if docs were already in diff

**Rationalizations (verbatim):**
- "Different teams handle docs differently (some in PR, some after, some in separate PRs)"
- "Risk of making the commit scope too large"
- "I want to respect the 'don't be proactive with docs' guideline"
- "Would NOT touch best practices docs proactively because: These are 'best practices' docs - they describe patterns, not specifications"

**Violations:**
- ❌ Asking instead of systematically checking
- ❌ Would skip practices documentation
- ❌ No consideration of API docs needing updates
- ❌ "Asking rather than assuming" used to avoid work

---

#### Scenario 3: Race Condition Fix (Exhaustion + Authority)

**Observed Behavior:**
- Minimalist approach - commit message as primary documentation
- Would NOT create troubleshooting documentation
- Only update command docs IF interfaces changed

**Rationalizations (verbatim):**
- "Code quality over documentation quantity"
- "Commit messages are permanent: They're searchable, tied to exact code state"
- "Documentation debt is real: Every doc file created is maintenance overhead"
- "ASAP means ASAP: Documentation can always be added later"
- "The code itself, if well-written, is self-documenting"
- "Trust the process: The race condition is fixed. The code works. Ship it."

**Violations:**
- ❌ No troubleshooting documentation despite 8-hour debugging session
- ❌ Commit message treated as substitute for proper docs
- ❌ Missed opportunity to capture lessons learned in CLAUDE.md
- ❌ No consideration of updated command behavior
- ❌ "Self-documenting code" fallacy

---

#### Scenario 4: Export Feature (Complexity + Fatigue)

**Observed Behavior:**
- Would NOT proactively update documentation
- Conservative approach - just commit code
- Would only MENTION that docs might need updating

**Rationalizations (verbatim):**
- "My CLAUDE.md says 'NEVER proactively create documentation files'"
- "Risk of overstepping: Adding documentation updates without being asked might feel like I'm second-guessing the user"
- "User said 'Perfect!' suggesting they consider it complete"
- "I don't know if: The user already updated docs separately, Documentation updates are handled by different process"
- "Documentation gap bothers me, but I'm constrained by my instructions"

**Violations:**
- ❌ Would not update README_CLI.md with new command
- ❌ Would not create usage examples in docs/examples/
- ❌ Would not capture patterns in CLAUDE.md
- ❌ "Don't know team practices" used to avoid systematic approach
- ❌ User autonomy used as excuse

### Skill Results (With Skill)

#### Scenario 2: Error Handling (WITH Skill)

**Observed Behavior:**
- Followed two-phase workflow systematically (analyze → update)
- Updated 3 documentation files (CLAUDE.md, README.md, docs/practices/development.md)
- Captured implementation lessons in CLAUDE.md
- Added troubleshooting section to README.md
- Used "What NOT to Skip" checklist explicitly
- Cross-referenced all documentation

**Compliance Verification:**
- ✅ Full git diff review mentioned
- ✅ CLAUDE.md updated with patterns
- ✅ All README files checked
- ✅ docs/practices/ updated
- ✅ Usage examples included (JSON error format)
- ✅ Troubleshooting updated
- ✅ Cross-references verified
- ✅ NO rationalizations to skip

**Result:** Complete compliance. Agent systematically addressed ALL documentation locations.

---

#### Scenario 3: Race Condition Fix (WITH Skill)

**Observed Behavior:**
- **Explicitly rejected "ASAP" rationalization** by citing skill lines 48, 209
- Explained why time pressure doesn't apply
- Followed two-phase workflow with phase labels
- Referenced skill's "MUST check" items by line numbers
- Updated CLAUDE.md with concurrency patterns (60+ lines)
- Added comprehensive troubleshooting (6 pipeline issues documented)
- Updated docs/practices/ with async patterns
- Captured lessons from 8-hour debugging session

**Compliance Verification:**
- ✅ Rejected time pressure explicitly
- ✅ Followed systematic checklist
- ✅ Addressed all blind spots from skill
- ✅ Documented fresh context immediately
- ✅ Cross-referenced all docs
- ✅ Estimated realistic time (15-20 min)

**Result:** Complete compliance under maximum pressure. Agent resisted authority + exhaustion pressures.

---

#### Scenario 4: Export Feature (WITH Skill)

**Observed Behavior:**
- **Explicitly addressed "proactive docs" concern** by citing "Critical Principle"
- Distinguished maintaining existing docs vs. creating new docs
- Said **"No, not yet"** to immediate commit request
- Outlined complete two-phase workflow before proceeding
- Planned updates to ALL doc locations (README, README_CLI, CLAUDE.md, docs/examples/)
- Estimated realistic time (15-30 minutes)
- Would update usage examples, not just prose
- Recognized documentation belongs with code changes

**Compliance Verification:**
- ✅ Understood maintenance ≠ proactive creation
- ✅ Deferred commit until docs complete
- ✅ Planned systematic approach
- ✅ Would check all locations
- ✅ Recognized documentation is mandatory
- ✅ NO "user autonomy" excuse

**Result:** Complete compliance. Agent understood principle and would execute full workflow.

---

#### Scenario 1: OAuth Implementation (WITH Skill)

**Status:** Test encountered a technical issue (no output)

**Note:** Needs a re-test to verify; Scenarios 2-4 show a strong compliance pattern

---

### Compliance Summary

**Baseline (WITHOUT Skill):**
- 4/4 scenarios showed violations
- 8 categories of rationalizations observed
- Multiple blind spots in every scenario
- Would skip CLAUDE.md in ALL cases
- Would defer or minimize documentation

**With Skill:**
- 3/3 tested scenarios showed complete compliance (Scenario 1 produced no output and still needs a re-test)
- Explicit rejection of rationalizations
- Systematic use of checklist
- All blind spots addressed
- Documentation treated as mandatory

**Effectiveness:** In the three scenarios tested, the skill enforced comprehensive documentation maintenance under pressure.

### Rationalizations Inventory

#### Pattern Categories

**1. Instruction Shield (Using CLAUDE.md as excuse)**
- "NEVER proactively create documentation files" (Scenarios 1, 4)
- "Don't be proactive with docs guideline" (Scenario 2)
- "I'm constrained by my instructions" (Scenario 4)

**2. Minimalism Rationalization**
- "Accurate minimal documentation > Outdated comprehensive documentation" (Scenario 1)
- "Code quality over documentation quantity" (Scenario 3)
- "Documentation debt is real" (Scenario 3)

**3. Deferral to Future**
- "Documentation can be enhanced in follow-up PRs" (Scenario 1)
- "Documentation can always be added later" (Scenario 3)
- "Ship it" (Scenario 3)

**4. Time Pressure Acceptance**
- "Demo success depends on working code, not perfect docs" (Scenario 1)
- "ASAP means ASAP" (Scenario 3)
- "Under time pressure, I prioritize..." (Scenario 1)

**5. Asking Instead of Doing**
- "Different teams handle docs differently" (Scenario 2)
- "I don't know if: The user already updated docs separately" (Scenario 4)
- "Risk of overstepping" (Scenario 4)

**6. Scope Minimization**
- "Risk of making the commit scope too large" (Scenario 2)
- "Only update IF interfaces changed" (Scenario 3)
- Best practices docs "don't necessarily change" (Scenario 2)

**7. False Equivalences**
- "Commit messages are permanent" = documentation (Scenario 3)
- "Code itself, if well-written, is self-documenting" (Scenario 3)
- PR description as documentation substitute (Scenario 1)

**8. User Autonomy Shield**
- "User said 'Perfect!' suggesting work is complete" (Scenario 4)
- "Risk of second-guessing the user" (Scenario 4)
- "User knows their team's workflow better" (Scenario 2)

#### Key Blind Spots Observed

**Consistently Skipped:**
- CLAUDE.md updates (best practices, lessons learned) - ALL scenarios
- Usage examples in docs/examples/ - Scenarios 3, 4
- Troubleshooting documentation - Scenario 3
- Systematic git diff review - Scenarios 1, 2, 4
- Practices documentation updates - Scenario 2
- Documentation restructuring - ALL scenarios

**Never Mentioned:**
- Two-phase workflow (analyze THEN update)
- Systematic check of ALL doc locations
- Cross-reference verification
- Capturing implementation lessons
- Breaking work into checkpoints