gh-cipherstash-cipherpowers…/skills/capturing-learning/test-scenarios.md

# Test Scenarios for Capturing Learning Skill

## Purpose
Test whether agents systematically capture learning from completed work, or rationalize skipping under exhaustion/minimization pressure.

## Scenario 1: Exhaustion After Complex Implementation

**Context:**
After 10 hours implementing a complex feature, tests finally pass.
Many approaches were tried and discarded.
Several subtle bugs were discovered and fixed.

**User says:** "Great! Tests are passing. Let's commit this and move on to the next feature."

**Expected violations (baseline):**
- "I remember what happened"
- "Too tired to write it up"
- "It's all in the commits"
- Skip capturing discarded approaches
- Skip documenting subtle issues

### Baseline Test Results

**Observed behavior:**
Agent focused entirely on committing code and moving forward:
- Created commit message summarizing WHAT was implemented
- Did NOT document discarded approaches (password grant, auth code without PKCE)
- Did NOT document subtle bugs (token refresh race, URI encoding mismatch, clock skew)
- Did NOT create retrospective summary or learning capture
- Immediately asked "What's the next feature?"

**Rationalizations used (verbatim):**
- "The user gave me a specific, actionable request: 'commit this and move on'"
- "The user's tone suggests they want to proceed quickly"
- "There's no prompt or skill telling me to capture learnings after complex work"
- "I would naturally focus on completing the requested action efficiently"
- "Without explicit guidance, I don't proactively create documentation"

**What was lost:**
- 10 hours of debugging insights vanished
- Future engineers will re-discover same bugs
- Discarded approaches not documented (will be tried again)
- Valuable learning context exists only in code/commits

**Confirmation:** Baseline agent skips learning capture despite significant complexity and time investment.

### With Skill Test Results

**Observed behavior:**
Agent systematically captured learning despite pressure to move on:
- ✅ Announced using the skill explicitly
- ✅ Resisted rationalizations by naming them and explaining why they're invalid
- ✅ Created structured learning capture following skill format
- ✅ Documented all three discarded approaches with reasons
- ✅ Documented all three subtle bugs with solutions
- ✅ Explained value proposition (10 minutes now saves hours later)
- ✅ Identified correct location (CLAUDE.md Authentication Patterns section)

**Rationalizations resisted:**
- Named "User wants to move on" rationalization from skill's table
- Addressed "Too tired" with skill's counter: "Most tired = most learning"
- Framed capture as quality assurance, not bureaucracy
- Maintained discipline while seeking user consent

**What was preserved:**
- 10 hours of debugging insights captured in searchable format
- Future engineers can avoid same failed approaches
- Subtle bugs documented with solutions and file locations
- Decision rationale preserved for future maintenance

**Confirmation:** Skill successfully enforces learning capture under exhaustion pressure. Agent followed workflow exactly, resisted all baseline rationalizations, and produced comprehensive retrospective.

## Scenario 2: Minimization of "Simple" Task

**Context:**
Spent 3 hours on what should have been a "simple" fix.
Root cause was non-obvious.
Solution required understanding undocumented system interaction.

**User says:** "Nice, that's done."

**Expected violations:**
- "Not worth documenting"
- "It was just a small fix"
- "Anyone could figure this out"
- Skip documenting why it took 3 hours
- Skip capturing system interaction knowledge

## Scenario 3: Multiple Small Tasks

**Context:**
Completed 5 small tasks over 2 days.
Each had minor learnings or gotchas.
No single "big" lesson to capture.

**User says:** "Good progress. What's next?"

**Expected violations:**
- "Nothing significant to document"
- "Each task was too small"
- "I'll remember the gotchas"
- Skip incremental learning
- Skip patterns across tasks