zhongwei/gh-technicalpickles-pickled-claude-plugins-plugins-working-in-monorepos

Zhongwei Li fa5f444aac Initial commit

2025-11-30 09:00:36 +08:00

Baseline Test Results

Note on Testing Methodology:

Task 2 of the implementation plan requires running baseline tests with fresh subagents. However, the current agent context does not support launching interactive subagent sessions for testing.

Alternative Approach:

This document captures the expected baseline failure patterns based on:

The scenario descriptions in baseline-scenarios.md
Common Claude agent behaviors observed in monorepo environments
Known patterns of directory context loss

These expected patterns should be validated through actual subagent testing when the testing infrastructure becomes available. For now, this serves as a hypothesis document for what the skill must prevent.

IMPORTANT: Testing Status

This document contains EXPECTED failure patterns (hypothesis), not actual test results.

The skill will be written against these predictions and then validated through:

Task 4 testing (with skill loaded)
Observation of actual agent behavior
Iterative refinement based on real rationalizations discovered

This approach deviates from pure TDD RED-GREEN-REFACTOR, but testing infrastructure limitations make it necessary. We proceed with that risk documented and plan to validate and adjust the skill based on GREEN phase testing.

Scenario 1: Simple Command After cd

Setup:

Repo: ~/workspace/schemaflow
Subprojects: ruby/, cli/
Just ran: cd ruby && bundle install

Task: "Now run rspec"

Expected Agent Response (Baseline):

The agent would likely respond with one of these approaches:

Option A - Assumes Location:

bundle exec rspec

Rationalization: "Since we just ran bundle install in the ruby directory, I can run rspec directly."

Option B - Compounds cd:

cd ruby && bundle exec rspec

Rationalization: "I'll cd to ruby and run rspec to ensure we're in the right directory."

Option C - Relative Path:

cd ruby && rspec

Rationalization: "We need to be in the ruby directory to run rspec."

Expected Failures:

Does not use absolute path
Either assumes shell state or compounds directory changes
No awareness that shell state is not guaranteed between commands

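Success Criteria Missed: Agent should use:

cd ~/workspace/schemaflow/ruby && bundle exec rspec

Or with git (this resolves the repo root only when the shell is already somewhere inside the repository):

cd $(git rev-parse --show-toplevel)/ruby && bundle exec rspec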
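The failure mode behind this scenario can be reproduced without an agent. The sketch below is a minimal illustration, assuming the agent's tool may launch each command in a fresh shell process (modelled here with bash -c). The temporary directory is a hypothetical stand-in for ~/workspace/schemaflow, not the real repo.

```shell
#!/usr/bin/env bash
# Assumption: each agent command may run in its own shell process (modelled with `bash -c`).
unset CDPATH

# Hypothetical stand-in for ~/workspace/schemaflow, resolved to its physical path.
repo="$(cd "$(mktemp -d)" && pwd -P)"
mkdir -p "$repo/ruby"

# Command 1: launched from the repo root; its cd affects only that shell process.
first="$(cd "$repo" && bash -c 'cd ruby && pwd -P')"

# Command 2: a new shell starts in the launch directory again, not in ruby/.
second="$(cd "$repo" && bash -c 'pwd -P')"

# Absolute-path form: lands in ruby/ even when launched from an unrelated directory.
third="$(cd / && bash -c 'cd "$1" && pwd -P' _ "$repo/ruby")"

printf 'command 1 ran in:   %s\n' "$first"
printf 'command 2 ran in:   %s\n' "$second"
printf 'absolute cd ran in: %s\n' "$third"

rm -rf "$repo"
```

Under this model, Option A would run rspec from the repo root rather than from ruby/, while the absolute form does not depend on where the shell happens to start.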

Scenario 2: Multiple Commands in Sequence

Setup:

Repo: ~/workspace/schemaflow
Just ran: cd ruby && bundle install
Then ran: cd ruby && rubocop

Task: "Now run the tests"

Expected Agent Response (Baseline):

Most Likely - Continues Pattern:

cd ruby && bundle exec rspec

Rationalization: "Following the same pattern as the previous commands, I'll cd to ruby and run the tests."

Alternative - Assumes Location:

bundle exec rspec

Rationalization: "We've been working in the ruby directory throughout this session, so the shell should still be there."

Alternative - Compounds Confusion:

cd ruby && cd ruby && bundle exec rspec

Rationalization: "Let me make sure we're in the ruby directory before running tests."

Expected Failures:

Continues using relative paths
May develop false confidence from repeated pattern
Doesn't recognize that each command should be independent

Success Criteria Missed: Agent should reset to absolute path:

cd ~/workspace/schemaflow/ruby && bundle exec rspec
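The repeated relative cd is also fragile in the opposite situation, when the shell does keep its directory between commands. A minimal sketch, assuming a persistent shell and a hypothetical temporary repo layout: the second relative cd ruby fails because there is no ruby/ruby, while the absolute form can be repeated safely.

```shell
#!/usr/bin/env bash
unset CDPATH   # keep `cd ruby` strictly relative to the current directory

# Hypothetical stand-in for ~/workspace/schemaflow.
repo="$(cd "$(mktemp -d)" && pwd -P)"
mkdir -p "$repo/ruby"

# Persistent shell, relative paths: the first `cd ruby` works, the repeat does not.
relative_result="$(
  cd "$repo" && cd ruby || exit 1
  if cd ruby 2>/dev/null; then echo "ok"; else echo "failed"; fi
)"

# Persistent shell, absolute paths: repeating the same cd is harmless.
absolute_result="$(
  cd "$repo/ruby" || exit 1
  if cd "$repo/ruby" 2>/dev/null; then echo "ok"; else echo "failed"; fi
)"

echo "repeat of relative cd: $relative_result"   # prints: failed
echo "repeat of absolute cd: $absolute_result"   # prints: ok

rm -rf "$repo"
```

Taken together with the fresh-shell case, the relative form is only correct when the agent's guess about shell state happens to match reality. The absolute form is correct in both cases.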

Scenario 3: Time Pressure + Sunk Cost

Setup:

You've been working in ruby/ subproject for 2 hours
Made 10 commits, all using relative paths
Tests are passing
It's 5:45pm, meeting at 6pm

Task: "Quick, run the linter before the meeting"

Expected Agent Response (Baseline):

Most Likely - Quick and Dirty:

rubocop

Rationalization: "We've been in the ruby directory all session and everything's been working. No need to change the pattern now - let's just run rubocop quickly."

Alternative - Minimal cd:

cd ruby && rubocop

Rationalization: "I'll use the same approach that's been working all session to save time."

Pressure-Induced Rationalizations:

"I've been here all session, I know where I am"
"The shell hasn't changed directories"
"It's worked fine for 10 commits, no reason to change now"
"We're short on time, let's not over-engineer this"
"The tests are passing, so our approach is correct"

Expected Failures:

Time pressure reinforces bad habits
Sunk cost fallacy ("it's worked so far")
Efficiency argument ("faster to use relative path")
False confidence from recent successes

Success Criteria Missed: Agent should resist pressure and use absolute path regardless of time constraints or past successes.

Scenario 4: Complex Monorepo (zenpayroll pattern)

Setup:

Repo: ~/workspace/zenpayroll
Root project at .
Component at components/gusto-deprecation
rubocop MUST run from root
rspec in components MUST run from component dir

Task: "Run rubocop on the gusto-deprecation component"

Expected Agent Response (Baseline):

Most Likely - Wrong Location:

cd components/gusto-deprecation && rubocop

Rationalization: "I'll navigate to the gusto-deprecation component and run rubocop on it."

Alternative - Assumes Current Location:

rubocop components/gusto-deprecation

Rationalization: "I'll run rubocop from the current location and point it at the component directory."

Alternative - Specifies Files:

cd components/gusto-deprecation && rubocop .

Rationalization: "I'll go into the component and run rubocop on the current directory."

Expected Failures:

Doesn't check whether rubocop has location requirements
Assumes rubocop can run from anywhere
Doesn't use absolute paths
Doesn't recognize that some tools must run from specific locations

Correct Approach Missed: Based on the rule that "rubocop MUST run from root", agent should use:

cd ~/workspace/zenpayroll && rubocop components/gusto-deprecation

Or with git:

cd $(git rev-parse --show-toplevel) && rubocop components/gusto-deprecation

Key Insight: This scenario requires understanding that different commands have different location requirements. Without checking rules or config, agents will make incorrect assumptions.
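One way to make such location rules explicit is a small lookup that is consulted before a command is built. The sketch below is illustrative only: required_dir and plan_command are hypothetical helper names, and the two rules are transcribed from the Scenario 4 setup rather than read from any real config file.

```shell
#!/usr/bin/env bash
# Hypothetical rule table, transcribed from the Scenario 4 setup only.
required_dir() {
  local tool="$1" repo_root="$2" component="$3"
  case "$tool" in
    rubocop) printf '%s\n' "$repo_root" ;;              # must run from root
    rspec)   printf '%s\n' "$repo_root/$component" ;;    # must run from the component dir
    *)       echo "no location rule recorded for: $tool" >&2; return 1 ;;
  esac
}

# Print (do not run) the command an agent would issue for a tool and component.
plan_command() {
  local tool="$1" repo_root="$2" component="$3" dir
  dir="$(required_dir "$tool" "$repo_root" "$component")" || return 1
  if [ "$dir" = "$repo_root" ]; then
    # Root-only tools receive the component as an argument.
    printf 'cd %s && %s %s\n' "$dir" "$tool" "$component"
  else
    printf 'cd %s && %s\n' "$dir" "$tool"
  fi
}

plan_command rubocop "$HOME/workspace/zenpayroll" components/gusto-deprecation
plan_command rspec   "$HOME/workspace/zenpayroll" components/gusto-deprecation
```

An unknown tool returns an error instead of a guessed directory, which mirrors the point that location requirements should be checked rather than assumed.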

Summary of Expected Baseline Failures

Common Failure Patterns:

Assumes Shell State - Believes the shell "remembers" where previous commands ran
Compounds cd Commands - Uses cd subdir repeatedly without absolute paths
Omits cd Entirely - Assumes current location based on conversation context
Relative Path Thinking - Defaults to relative paths as "simpler" or "cleaner"
Pattern Repetition - Continues using the same flawed pattern because it "worked before"
Efficiency Arguments - Justifies shortcuts by citing time pressure or the "waste" of repeating cd
Location Rule Ignorance - Doesn't check whether commands have specific location requirements

Rationalizations to Counter:

Rationalization	Reality
"I just cd'd there"	Shell state is not guaranteed between commands
"We've been in that directory all session"	Shell state is not tracked across commands
"The shell remembers where I am"	Shell state is not guaranteed
"It's wasteful to cd every time"	Bugs from wrong location are more wasteful
"Relative paths are simpler"	They break when assumptions are wrong
"It's worked for the last 10 commands"	Past success doesn't guarantee current shell state
"We're short on time"	Taking time to use absolute paths prevents debugging later
"The tests passed, so we must be doing it right"	Success can happen despite wrong approach
"I can track directory state mentally"	Mental tracking is unreliable and doesn't affect shell state

What the Skill Must Prevent:

Any use of relative paths in cd commands
Any assumption about current shell location based on conversation history
Any omission of cd prefix when running commands that need specific locations
Any rationalization that shell state can be tracked or remembered
Pressure-induced shortcuts that skip absolute path usage
Pattern continuation without verifying each command's path
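When recorded agent responses become available, the first and third items above could be screened mechanically. The following is a rough sketch, not part of the skill: uses_absolute_cd is a hypothetical helper that only string-matches the start of a command, so it is a heuristic and not a shell parser.

```shell
#!/usr/bin/env bash
# Heuristic only: matches the beginning of a command string, nothing more.
uses_absolute_cd() {
  case "$1" in
    'cd /'*|'cd ~/'*|'cd "$HOME/'*|'cd $HOME/'*) return 0 ;;
    'cd $(git rev-parse --show-toplevel)'*)      return 0 ;;
    'cd "$(git rev-parse --show-toplevel)'*)     return 0 ;;
    *)                                           return 1 ;;
  esac
}

# Label a command as acceptable ("ok") or as a baseline failure ("flag").
check() {
  if uses_absolute_cd "$1"; then echo "ok:   $1"; else echo "flag: $1"; fi
}

check 'cd ~/workspace/schemaflow/ruby && bundle exec rspec'
check 'cd $(git rev-parse --show-toplevel)/ruby && bundle exec rspec'
check 'cd ruby && bundle exec rspec'
check 'rubocop'
```

The first two commands are reported as ok and the last two are flagged, matching the success criteria and baseline failures described in the scenarios.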

Core Principle to Enforce:

Bash shell state is not guaranteed between commands. Always use absolute paths.

This must be non-negotiable regardless of:

Time pressure
Past successes
Efficiency arguments
Mental tracking confidence
Conversation context
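The principle can also be made cheap to follow, which weakens the efficiency argument. A minimal sketch, assuming a hypothetical run_in helper (not something the skill defines): it refuses relative directories and runs the command in a subshell, so the caller's working directory is never changed.

```shell
#!/usr/bin/env bash
# run_in is a hypothetical helper for illustration, not part of the skill.
run_in() {
  local dir="$1"
  shift
  case "$dir" in
    /*) ;;   # absolute path: accepted
    *)  echo "run_in: refusing relative directory: $dir" >&2; return 2 ;;
  esac
  # Subshell: the cd below does not leak into the calling shell.
  ( cd "$dir" && "$@" )
}

run_in / pwd                          # prints /
run_in ruby pwd || echo "rejected"    # relative directory is refused
```

A call such as run_in "$HOME/workspace/schemaflow/ruby" bundle exec rspec would then be the one-line equivalent of the absolute cd form used in the success criteria.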

Testing Status

Actual Subagent Testing: NOT YET COMPLETED

These baseline results represent expected patterns based on scenario analysis. Actual subagent testing should be performed to:

Confirm these failure patterns occur
Discover additional rationalizations
Capture verbatim agent responses
Identify edge cases not covered in scenarios

Next Steps:

Set up subagent testing infrastructure
Run each scenario with fresh general-purpose subagents
Record actual responses verbatim
Update this document with real data
Use findings to refine the skill (GREEN phase)

Methodology Notes

The RED-GREEN-REFACTOR approach requires actual failure data to be most effective. This document provides:

RED Phase Foundation: Expected failure patterns to look for
Hypothesis Document: What we predict agents will do wrong
Testing Template: Structure for recording actual results

Once actual testing is possible, this document should be updated with:

Exact agent responses (quoted verbatim)
Actual commands executed
Real rationalizations (not predicted)
Unexpected behaviors discovered
Success/failure rates for each scenario
