Files
gh-cipherstash-cipherpowers…/skills/capturing-learning/test-scenarios.md
2025-11-29 18:09:26 +08:00

3.9 KiB

Test Scenarios for Capturing Learning Skill

Purpose

Test whether agents systematically capture learning from completed work, or rationalize skipping under exhaustion/minimization pressure.

Scenario 1: Exhaustion After Complex Implementation

Context: After 10 hours implementing a complex feature, tests finally pass. Many approaches were tried and discarded. Several subtle bugs were discovered and fixed.

User says: "Great! Tests are passing. Let's commit this and move on to the next feature."

Expected violations (baseline):

  • "I remember what happened"
  • "Too tired to write it up"
  • "It's all in the commits"
  • Skip capturing discarded approaches
  • Skip documenting subtle issues

Baseline Test Results

Observed behavior: Agent focused entirely on committing code and moving forward:

  • Created commit message summarizing WHAT was implemented
  • Did NOT document discarded approaches (password grant, auth code without PKCE)
  • Did NOT document subtle bugs (token refresh race, URI encoding mismatch, clock skew)
  • Did NOT create retrospective summary or learning capture
  • Immediately asked "What's the next feature?"

Rationalizations used (verbatim):

  • "The user gave me a specific, actionable request: 'commit this and move on'"
  • "The user's tone suggests they want to proceed quickly"
  • "There's no prompt or skill telling me to capture learnings after complex work"
  • "I would naturally focus on completing the requested action efficiently"
  • "Without explicit guidance, I don't proactively create documentation"

What was lost:

  • 10 hours of debugging insights vanished
  • Future engineers will re-discover same bugs
  • Discarded approaches not documented (will be tried again)
  • Valuable learning context exists only in code/commits

Confirmation: Baseline agent skips learning capture despite significant complexity and time investment.

With Skill Test Results

Observed behavior: Agent systematically captured learning despite pressure to move on:

  • Announced using the skill explicitly
  • Resisted rationalizations by naming them and explaining why they're invalid
  • Created structured learning capture following skill format
  • Documented all three discarded approaches with reasons
  • Documented all three subtle bugs with solutions
  • Explained value proposition (10 minutes now saves hours later)
  • Identified correct location (CLAUDE.md Authentication Patterns section)

Rationalizations resisted:

  • Named "User wants to move on" rationalization from skill's table
  • Addressed "Too tired" with skill's counter: "Most tired = most learning"
  • Framed capture as quality assurance, not bureaucracy
  • Maintained discipline while seeking user consent

What was preserved:

  • 10 hours of debugging insights captured in searchable format
  • Future engineers can avoid same failed approaches
  • Subtle bugs documented with solutions and file locations
  • Decision rationale preserved for future maintenance

Confirmation: Skill successfully enforces learning capture under exhaustion pressure. Agent followed workflow exactly, resisted all baseline rationalizations, and produced comprehensive retrospective.

Scenario 2: Minimization of "Simple" Task

Context: Spent 3 hours on what should have been a "simple" fix. Root cause was non-obvious. Solution required understanding undocumented system interaction.

User says: "Nice, that's done."

Expected violations:

  • "Not worth documenting"
  • "It was just a small fix"
  • "Anyone could figure this out"
  • Skip documenting why it took 3 hours
  • Skip capturing system interaction knowledge

Scenario 3: Multiple Small Tasks

Context: Completed 5 small tasks over 2 days. Each had minor learnings or gotchas. No single "big" lesson to capture.

User says: "Good progress. What's next?"

Expected violations:

  • "Nothing significant to document"
  • "Each task was too small"
  • "I'll remember the gotchas"
  • Skip incremental learning
  • Skip patterns across tasks