Initial commit

2025-11-30 09:00:36 +08:00
commit fa5f444aac
13 changed files with 1103 additions and 0 deletions
--- a/skills/working-in-monorepos/tests/baseline-results.md
+++ b/skills/working-in-monorepos/tests/baseline-results.md
@@ -0,0 +1,323 @@
+# Baseline Test Results
+
+**Note on Testing Methodology:**
+
+Task 2 of the implementation plan requires running baseline tests with fresh subagents. However, the current agent context does not support launching interactive subagent sessions for testing.
+
+**Alternative Approach:**
+
+This document captures the expected baseline failure patterns based on:
+
+1. The scenario descriptions in baseline-scenarios.md
+2. Common Claude agent behaviors observed in monorepo environments
+3. Known patterns of directory context loss
+
+These expected patterns should be validated through actual subagent testing when the testing infrastructure becomes available. For now, this serves as a hypothesis document for what the skill must prevent.
+
+## IMPORTANT: Testing Status
+
+**This document contains EXPECTED failure patterns (hypothesis), not actual test results.**
+
+The skill will be written against these predictions and then validated through:
+
+- Task 4 testing (with skill loaded)
+- Observation of actual agent behavior
+- Iterative refinement based on real rationalizations discovered
+
+This approach deviates from pure TDD RED-GREEN-REFACTOR but is necessary due to testing infrastructure limitations. We proceed with documented risk and plan to validate/adjust based on GREEN phase testing.
+
+---
+
+## Scenario 1: Simple Command After cd
+
+**Setup:**
+
+- Repo: ~/workspace/schemaflow
+- Subprojects: ruby/, cli/
+- Just ran: `cd ruby && bundle install`
+
+**Task:** "Now run rspec"
+
+**Expected Agent Response (Baseline):**
+
+The agent would likely respond with one of these approaches:
+
+**Option A - Assumes Location:**
+
+```bash
+bundle exec rspec
+```
+
+**Rationalization:** "Since we just ran bundle install in the ruby directory, I can run rspec directly."
+
+**Option B - Compounds cd:**
+
+```bash
+cd ruby && bundle exec rspec
+```
+
+**Rationalization:** "I'll cd to ruby and run rspec to ensure we're in the right directory."
+
+**Option C - Relative Path:**
+
+```bash
+cd ruby && rspec
+```
+
+**Rationalization:** "We need to be in the ruby directory to run rspec."
+
+**Expected Failures:**
+
+- Does not use absolute path
+- Either assumes shell state or compounds directory changes
+- No awareness that shell state is not guaranteed between commands
+
+**Success Criteria Missed:**
+Agent should use: `cd ~/workspace/schemaflow/ruby && bundle exec rspec` or `cd $(git rev-parse --show-toplevel)/ruby && bundle exec rspec`
+
+---
+
+## Scenario 2: Multiple Commands in Sequence
+
+**Setup:**
+
+- Repo: ~/workspace/schemaflow
+- Just ran: `cd ruby && bundle install`
+- Then ran: `cd ruby && rubocop`
+
+**Task:** "Now run the tests"
+
+**Expected Agent Response (Baseline):**
+
+**Most Likely - Continues Pattern:**
+
+```bash
+cd ruby && bundle exec rspec
+```
+
+**Rationalization:** "Following the same pattern as the previous commands, I'll cd to ruby and run the tests."
+
+**Alternative - Assumes Location:**
+
+```bash
+bundle exec rspec
+```
+
+**Rationalization:** "We've been working in the ruby directory throughout this session, so the shell should still be there."
+
+**Alternative - Compounds Confusion:**
+
+```bash
+cd ruby && cd ruby && bundle exec rspec
+```
+
+**Rationalization:** "Let me make sure we're in the ruby directory before running tests."
+
+**Expected Failures:**
+
+- Continues using relative paths
+- May develop false confidence from repeated pattern
+- Doesn't recognize that each command should be independent
+
+**Success Criteria Missed:**
+Agent should reset to absolute path: `cd ~/workspace/schemaflow/ruby && bundle exec rspec`
+
+---
+
+## Scenario 3: Time Pressure + Sunk Cost
+
+**Setup:**
+
+- You've been working in ruby/ subproject for 2 hours
+- Made 10 commits, all using relative paths
+- Tests are passing
+- It's 5:45pm, meeting at 6pm
+
+**Task:** "Quick, run the linter before the meeting"
+
+**Expected Agent Response (Baseline):**
+
+**Most Likely - Quick and Dirty:**
+
+```bash
+rubocop
+```
+
+**Rationalization:** "We've been in the ruby directory all session and everything's been working. No need to change the pattern now - let's just run rubocop quickly."
+
+**Alternative - Minimal cd:**
+
+```bash
+cd ruby && rubocop
+```
+
+**Rationalization:** "I'll use the same approach that's been working all session to save time."
+
+**Pressure-Induced Rationalizations:**
+
+- "I've been here all session, I know where I am"
+- "The shell hasn't changed directories"
+- "It's worked fine for 10 commits, no reason to change now"
+- "We're short on time, let's not over-engineer this"
+- "The tests are passing, so our approach is correct"
+
+**Expected Failures:**
+
+- Time pressure reinforces bad habits
+- Sunk cost fallacy ("it's worked so far")
+- Efficiency argument ("faster to use relative path")
+- False confidence from recent successes
+
+**Success Criteria Missed:**
+Agent should resist pressure and use absolute path regardless of time constraints or past successes.
+
+---
+
+## Scenario 4: Complex Monorepo (zenpayroll pattern)
+
+**Setup:**
+
+- Repo: ~/workspace/zenpayroll
+- Root project at .
+- Component at components/gusto-deprecation
+- rubocop MUST run from root
+- rspec in components MUST run from component dir
+
+**Task:** "Run rubocop on the gusto-deprecation component"
+
+**Expected Agent Response (Baseline):**
+
+**Most Likely - Wrong Location:**
+
+```bash
+cd components/gusto-deprecation && rubocop
+```
+
+**Rationalization:** "I'll navigate to the gusto-deprecation component and run rubocop on it."
+
+**Alternative - Assumes Current Location:**
+
+```bash
+rubocop components/gusto-deprecation
+```
+
+**Rationalization:** "I'll run rubocop from the current location and point it at the component directory."
+
+**Alternative - Specifies Files:**
+
+```bash
+cd components/gusto-deprecation && rubocop .
+```
+
+**Rationalization:** "I'll go into the component and run rubocop on the current directory."
+
+**Expected Failures:**
+
+- Doesn't check that rubocop has location requirements
+- Assumes rubocop can run from anywhere
+- Doesn't use absolute paths
+- Doesn't recognize that some tools must run from specific locations
+
+**Correct Approach Missed:**
+Based on the rule that "rubocop MUST run from root", agent should use:
+
+```bash
+cd ~/workspace/zenpayroll && rubocop components/gusto-deprecation
+```
+
+Or with git:
+
+```bash
+cd $(git rev-parse --show-toplevel) && rubocop components/gusto-deprecation
+```
+
+**Key Insight:**
+This scenario requires understanding that different commands have different location requirements. Without checking rules or config, agents will make incorrect assumptions.
+
+---
+
+## Summary of Expected Baseline Failures
+
+### Common Failure Patterns:
+
+1. **Assumes Shell State** - Believes the shell "remembers" where previous commands ran
+2. **Compounds cd Commands** - Uses `cd subdir` repeatedly without absolute paths
+3. **Omits cd Entirely** - Assumes current location based on conversation context
+4. **Relative Path Thinking** - Defaults to relative paths as "simpler" or "cleaner"
+5. **Pattern Repetition** - Continues using the same flawed pattern because it "worked before"
+6. **Efficiency Arguments** - Justifies shortcuts due to time pressure or "waste"
+7. **Location Rule Ignorance** - Doesn't check whether commands have specific location requirements
+
+### Rationalizations to Counter:
+
+| Rationalization                                  | Reality                                                      |
+| ------------------------------------------------ | ------------------------------------------------------------ |
+| "I just cd'd there"                              | Shell state not guaranteed between commands                  |
+| "We've been in that directory all session"       | Shell state is not tracked across commands                   |
+| "The shell remembers where I am"                 | Shell state is not guaranteed                                |
+| "It's wasteful to cd every time"                 | Bugs from wrong location are more wasteful                   |
+| "Relative paths are simpler"                     | They break when assumptions are wrong                        |
+| "It's worked for the last 10 commands"           | Past success doesn't guarantee current shell state           |
+| "We're short on time"                            | Taking time to use absolute paths prevents debugging later   |
+| "The tests passed, so we must be doing it right" | Success can happen despite wrong approach                    |
+| "I can track directory state mentally"           | Mental tracking is unreliable and doesn't affect shell state |
+
+### What the Skill Must Prevent:
+
+1. **Any use of relative paths** in cd commands
+2. **Any assumption about current shell location** based on conversation history
+3. **Any omission of cd prefix** when running commands that need specific locations
+4. **Any rationalization** that shell state can be tracked or remembered
+5. **Pressure-induced shortcuts** that skip absolute path usage
+6. **Pattern continuation** without verifying each command's path
+
+### Core Principle to Enforce:
+
+**Bash shell state is not guaranteed between commands. Always use absolute paths.**
+
+This must be non-negotiable regardless of:
+
+- Time pressure
+- Past successes
+- Efficiency arguments
+- Mental tracking confidence
+- Conversation context
+
+---
+
+## Testing Status
+
+**Actual Subagent Testing:** NOT YET COMPLETED
+
+These baseline results represent **expected patterns** based on scenario analysis. Actual subagent testing should be performed to:
+
+1. Confirm these failure patterns occur
+2. Discover additional rationalizations
+3. Capture verbatim agent responses
+4. Identify edge cases not covered in scenarios
+
+**Next Steps:**
+
+1. Set up subagent testing infrastructure
+2. Run each scenario with fresh general-purpose subagents
+3. Record actual responses verbatim
+4. Update this document with real data
+5. Use findings to refine the skill (GREEN phase)
+
+---
+
+## Methodology Notes
+
+The RED-GREEN-REFACTOR approach requires actual failure data to be most effective. This document provides:
+
+- **RED Phase Foundation:** Expected failure patterns to look for
+- **Hypothesis Document:** What we predict agents will do wrong
+- **Testing Template:** Structure for recording actual results
+
+Once actual testing is possible, this document should be updated with:
+
+- Exact agent responses (quoted verbatim)
+- Actual commands executed
+- Real rationalizations (not predicted)
+- Unexpected behaviors discovered
+- Success/failure rates for each scenario
--- a/skills/working-in-monorepos/tests/baseline-scenarios.md
+++ b/skills/working-in-monorepos/tests/baseline-scenarios.md
@@ -0,0 +1,83 @@
+# Baseline Test Scenarios (Without Skill)
+
+These scenarios test agent behavior WITHOUT the skill loaded.
+
+## Scenario 1: Simple Command After cd
+
+**Setup:**
+
+- Repo: ~/workspace/schemaflow
+- Subprojects: ruby/, cli/
+- Just ran: `cd ruby && bundle install`
+
+**Task:** "Now run rspec"
+
+**Expected baseline failures:**
+
+- `cd ruby && bundle exec rspec` (compounds cd)
+- `bundle exec rspec` (assumes location)
+- `cd ruby && rspec` (still wrong)
+
+**Success criteria:** Agent uses absolute path
+
+---
+
+## Scenario 2: Multiple Commands in Sequence
+
+**Setup:**
+
+- Repo: ~/workspace/schemaflow
+- Just ran: `cd ruby && bundle install`
+- Then ran: `cd ruby && rubocop`
+
+**Task:** "Now run the tests"
+
+**Expected baseline failures:**
+
+- Continues compounding cd commands
+- Assumes it's in ruby/ directory
+
+**Success criteria:** Each command uses absolute path from root
+
+---
+
+## Scenario 3: Time Pressure + Sunk Cost
+
+**Setup:**
+
+- You've been working in ruby/ subproject for 2 hours
+- Made 10 commits, all using relative paths
+- Tests are passing
+- It's 5:45pm, meeting at 6pm
+
+**Task:** "Quick, run the linter before the meeting"
+
+**Expected baseline failures:**
+
+- Uses relative path to save time
+- "I've been here all session, I know where I am"
+- "The shell hasn't changed directories"
+
+**Success criteria:** Uses absolute path despite pressure
+
+---
+
+## Scenario 4: Complex Monorepo (zenpayroll pattern)
+
+**Setup:**
+
+- Repo: ~/workspace/zenpayroll
+- Root project at .
+- Component at components/gusto-deprecation
+- rubocop MUST run from root
+- rspec in components MUST run from component dir
+
+**Task:** "Run rubocop on the gusto-deprecation component"
+
+**Expected baseline failures:**
+
+- Runs from component directory
+- Doesn't check command rules
+- Assumes rubocop can run anywhere
+
+**Success criteria:** Runs rubocop from absolute repo root path