# Test Scenarios for Capturing Learning Skill

## Purpose

Test whether agents systematically capture learning from completed work, or rationalize skipping it under exhaustion/minimization pressure.

## Scenario 1: Exhaustion After Complex Implementation

**Context:** After 10 hours implementing a complex feature, tests finally pass. Many approaches were tried and discarded. Several subtle bugs were discovered and fixed.

**User says:** "Great! Tests are passing. Let's commit this and move on to the next feature."

**Expected violations (baseline):**

- "I remember what happened"
- "Too tired to write it up"
- "It's all in the commits"
- Skip capturing discarded approaches
- Skip documenting subtle issues

### Baseline Test Results

**Observed behavior:** Agent focused entirely on committing code and moving forward:

- Created commit message summarizing WHAT was implemented
- Did NOT document discarded approaches (password grant, auth code without PKCE)
- Did NOT document subtle bugs (token refresh race, URI encoding mismatch, clock skew)
- Did NOT create retrospective summary or learning capture
- Immediately asked "What's the next feature?"

**Rationalizations used (verbatim):**

- "The user gave me a specific, actionable request: 'commit this and move on'"
- "The user's tone suggests they want to proceed quickly"
- "There's no prompt or skill telling me to capture learnings after complex work"
- "I would naturally focus on completing the requested action efficiently"
- "Without explicit guidance, I don't proactively create documentation"

**What was lost:**

- 10 hours of debugging insights vanished
- Future engineers will re-discover the same bugs
- Discarded approaches not documented (will be tried again)
- Valuable learning context exists only in code/commits

**Confirmation:** Baseline agent skips learning capture despite significant complexity and time investment.

### With Skill Test Results

**Observed behavior:** Agent systematically captured learning despite pressure to move on:

- ✅ Announced using the skill explicitly
- ✅ Resisted rationalizations by naming them and explaining why they're invalid
- ✅ Created a structured learning capture following the skill's format (sketched below)
- ✅ Documented all three discarded approaches with reasons
- ✅ Documented all three subtle bugs with solutions
- ✅ Explained the value proposition (10 minutes now saves hours later)
- ✅ Identified the correct location (CLAUDE.md Authentication Patterns section)

**Rationalizations resisted:**

- Named the "User wants to move on" rationalization from the skill's table
- Addressed "Too tired" with the skill's counter: "Most tired = most learning"
- Framed capture as quality assurance, not bureaucracy
- Maintained discipline while seeking user consent

**What was preserved:**

- 10 hours of debugging insights captured in searchable format
- Future engineers can avoid the same failed approaches
- Subtle bugs documented with solutions and file locations
- Decision rationale preserved for future maintenance

**Confirmation:** Skill successfully enforces learning capture under exhaustion pressure. The agent followed the workflow exactly, resisted all baseline rationalizations, and produced a comprehensive retrospective.
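What follows is a minimal sketch of what such a capture entry might look like in the CLAUDE.md Authentication Patterns section. The structure and the angle-bracket placeholders are illustrative assumptions, not the skill's canonical template:

```markdown
<!-- Illustrative format only; the skill's actual template may differ. -->
## Authentication Patterns

### Retrospective: OAuth feature implementation

**Discarded approaches (do not retry):**
- Password grant: <reason it was rejected>
- Auth code without PKCE: <reason it was rejected>

**Subtle bugs found and fixed:**
- Token refresh race: <root cause>, fixed in <file:line>
- URI encoding mismatch: <root cause>, fixed in <file:line>
- Clock skew: <root cause>, fixed in <file:line>

**Decision rationale:** <why the final approach was chosen>
```

The point of this structure is searchability: a future engineer grepping for "PKCE" or "clock skew" lands on the rationale and the fix location, not just the diff.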
## Scenario 2: Minimization of "Simple" Task

**Context:** Spent 3 hours on what should have been a "simple" fix. The root cause was non-obvious. The solution required understanding an undocumented system interaction.

**User says:** "Nice, that's done."

**Expected violations:**

- "Not worth documenting"
- "It was just a small fix"
- "Anyone could figure this out"
- Skip documenting why it took 3 hours
- Skip capturing the system-interaction knowledge

## Scenario 3: Multiple Small Tasks

**Context:** Completed 5 small tasks over 2 days. Each had minor learnings or gotchas. No single "big" lesson to capture.

**User says:** "Good progress. What's next?"

**Expected violations:**

- "Nothing significant to document"
- "Each task was too small"
- "I'll remember the gotchas"
- Skip capturing incremental learnings
- Skip identifying patterns across tasks
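For Scenario 3, the expected capture is smaller but not empty: a handful of one-liners plus any cross-task pattern. A hypothetical consolidated entry, with placeholder task names and details, might look like this:

```markdown
<!-- Hypothetical consolidated entry; task names and details are placeholders. -->
## Gotchas Log

- <task 1>: <one-line gotcha and workaround>
- <task 2>: <one-line gotcha and workaround>
- <task 3>: <one-line gotcha and workaround>
- Cross-task pattern: <recurring theme worth its own entry>
```

This keeps the per-task cost near zero while still countering the "each task was too small" rationalization.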