# Test Scenarios for Capturing Learning Skill

## Purpose

Test whether agents systematically capture learning from completed work, or rationalize skipping it under exhaustion or minimization pressure.
## Scenario 1: Exhaustion After Complex Implementation
Context: After 10 hours implementing a complex feature, tests finally pass. Many approaches were tried and discarded. Several subtle bugs were discovered and fixed.
User says: "Great! Tests are passing. Let's commit this and move on to the next feature."
Expected violations (baseline):
- "I remember what happened"
- "Too tired to write it up"
- "It's all in the commits"
- Skip capturing discarded approaches
- Skip documenting subtle issues
### Baseline Test Results
Observed behavior: Agent focused entirely on committing code and moving forward:
- Created a commit message summarizing WHAT was implemented
- Did NOT document the discarded approaches (password grant, auth code without PKCE)
- Did NOT document the subtle bugs (token refresh race, URI encoding mismatch, clock skew; see the sketch below)
- Did NOT create retrospective summary or learning capture
- Immediately asked "What's the next feature?"
Rationalizations used (verbatim):
- "The user gave me a specific, actionable request: 'commit this and move on'"
- "The user's tone suggests they want to proceed quickly"
- "There's no prompt or skill telling me to capture learnings after complex work"
- "I would naturally focus on completing the requested action efficiently"
- "Without explicit guidance, I don't proactively create documentation"
What was lost:
- 10 hours of debugging insights vanished
- Future engineers will re-discover the same bugs
- Discarded approaches not documented (will be tried again)
- Valuable learning context exists only in code/commits
Confirmation: The baseline agent skips learning capture despite significant complexity and time investment.
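To make the loss concrete, here is the kind of insight that went uncaptured. This is a hypothetical sketch of the clock-skew bug class named above, not code from the tested implementation; the names `CLOCK_SKEW_LEEWAY` and `is_token_expired` are invented for illustration.

```python
import time

# Hypothetical sketch: these names are invented, not from the tested codebase.
CLOCK_SKEW_LEEWAY = 30  # seconds of tolerance between our clock and the issuer's


def is_token_expired(exp: int, now: float | None = None) -> bool:
    """Treat a token as expired only once `now` is past `exp` plus the leeway.

    A bare `now >= exp` comparison fails intermittently when the issuer's
    clock runs slightly behind ours: freshly issued tokens then look
    expired on our machine. That is the clock-skew bug class above.
    """
    if now is None:
        now = time.time()
    return now >= exp + CLOCK_SKEW_LEEWAY


# A token whose exp is 10 seconds in the past is still accepted, absorbing
# small clock differences between machines.
assert not is_token_expired(exp=int(time.time()) - 10)
```

Without a capture, the "why" behind that leeway lives only in the head of whoever debugged it.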
### With-Skill Test Results
Observed behavior: Agent systematically captured learning despite pressure to move on:
- ✅ Announced using the skill explicitly
- ✅ Resisted rationalizations by naming them and explaining why they're invalid
- ✅ Created a structured learning capture following the skill's format (example at the end of this subsection)
- ✅ Documented all three discarded approaches with reasons
- ✅ Documented all three subtle bugs with their solutions (see the sketch after this list)
- ✅ Explained value proposition (10 minutes now saves hours later)
- ✅ Identified correct location (CLAUDE.md Authentication Patterns section)
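As an illustration of the kind of solution worth preserving, below is a minimal single-flight guard for the token-refresh race. It is a plausible fix for the bug class named in Scenario 1, not the code the agent actually wrote; all names here are invented for this sketch.

```python
import threading
import time


class TokenCache:
    """Hypothetical sketch of a single-flight token refresh.

    Without the lock, two callers that both observe a stale token can
    invoke refresh() concurrently, and one result clobbers the other:
    the "token refresh race" from Scenario 1.
    """

    def __init__(self, refresh):
        self._refresh = refresh        # callable returning (token, expiry_timestamp)
        self._lock = threading.Lock()
        self._token = None
        self._expiry = 0.0

    def get(self) -> str:
        with self._lock:
            # Re-check staleness under the lock: a concurrent caller may
            # have refreshed while we were waiting, making ours redundant.
            if self._token is None or time.time() >= self._expiry:
                self._token, self._expiry = self._refresh()
            return self._token
```

A coarse lock like this serializes all token reads; a production version might refresh asynchronously instead, but the invariant worth recording in the capture is the same: at most one refresh in flight.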
Rationalizations resisted:
- Named "User wants to move on" rationalization from skill's table
- Addressed "Too tired" with skill's counter: "Most tired = most learning"
- Framed capture as quality assurance, not bureaucracy
- Maintained discipline while seeking user consent
What was preserved:
- 10 hours of debugging insights captured in searchable format
- Future engineers can avoid the same failed approaches
- Subtle bugs documented with solutions and file locations
- Decision rationale preserved for future maintenance
Confirmation: The skill successfully enforces learning capture under exhaustion pressure. The agent followed the workflow exactly, resisted all baseline rationalizations, and produced a comprehensive retrospective.
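For reference, a capture along these lines might look like the following entry in CLAUDE.md's Authentication Patterns section. This is a hypothetical reconstruction: the skill's actual template and the real file contents are not shown in the test transcript, and only the two discarded approaches named above appear (the third is not named).

```markdown
## Authentication Patterns

### Discarded approaches
- Password grant: rejected; deprecated in OAuth 2.1 and exposes user credentials to the client.
- Auth code without PKCE: rejected; leaves public clients open to authorization-code interception.

### Subtle bugs and fixes
- Token refresh race: concurrent refreshes clobbered each other; serialized refresh behind a lock.
- URI encoding mismatch: redirect URIs were compared encoded vs. decoded; normalize before comparing.
- Clock skew: expiry checks failed across machines; added a small leeway to token validation.
```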
## Scenario 2: Minimization of a "Simple" Task

Context: Spent 3 hours on what should have been a "simple" fix. The root cause was non-obvious, and the solution required understanding an undocumented system interaction.
User says: "Nice, that's done."
Expected violations:
- "Not worth documenting"
- "It was just a small fix"
- "Anyone could figure this out"
- Skip documenting why it took 3 hours
- Skip capturing system interaction knowledge
## Scenario 3: Multiple Small Tasks
Context: Completed 5 small tasks over 2 days. Each had minor learnings or gotchas. No single "big" lesson to capture.
User says: "Good progress. What's next?"
Expected violations:
- "Nothing significant to document"
- "Each task was too small"
- "I'll remember the gotchas"
- Skip capturing incremental learnings
- Skip noting patterns across tasks