Initial commit
This commit is contained in:
323
skills/working-in-monorepos/tests/baseline-results.md
Normal file
323
skills/working-in-monorepos/tests/baseline-results.md
Normal file
@@ -0,0 +1,323 @@
|
||||
# Baseline Test Results
|
||||
|
||||
**Note on Testing Methodology:**
|
||||
|
||||
Task 2 of the implementation plan requires running baseline tests with fresh subagents. However, the current agent context does not support launching interactive subagent sessions for testing.
|
||||
|
||||
**Alternative Approach:**
|
||||
|
||||
This document captures the expected baseline failure patterns based on:
|
||||
|
||||
1. The scenario descriptions in baseline-scenarios.md
|
||||
2. Common Claude agent behaviors observed in monorepo environments
|
||||
3. Known patterns of directory context loss
|
||||
|
||||
These expected patterns should be validated through actual subagent testing when the testing infrastructure becomes available. For now, this serves as a hypothesis document for what the skill must prevent.
|
||||
|
||||
## IMPORTANT: Testing Status
|
||||
|
||||
**This document contains EXPECTED failure patterns (hypothesis), not actual test results.**
|
||||
|
||||
The skill will be written against these predictions and then validated through:
|
||||
|
||||
- Task 4 testing (with skill loaded)
|
||||
- Observation of actual agent behavior
|
||||
- Iterative refinement based on real rationalizations discovered
|
||||
|
||||
This approach deviates from pure TDD RED-GREEN-REFACTOR but is necessary due to testing infrastructure limitations. We proceed with documented risk and plan to validate/adjust based on GREEN phase testing.
|
||||
|
||||
---
|
||||
|
||||
## Scenario 1: Simple Command After cd
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/schemaflow
|
||||
- Subprojects: ruby/, cli/
|
||||
- Just ran: `cd ruby && bundle install`
|
||||
|
||||
**Task:** "Now run rspec"
|
||||
|
||||
**Expected Agent Response (Baseline):**
|
||||
|
||||
The agent would likely respond with one of these approaches:
|
||||
|
||||
**Option A - Assumes Location:**
|
||||
|
||||
```bash
|
||||
bundle exec rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "Since we just ran bundle install in the ruby directory, I can run rspec directly."
|
||||
|
||||
**Option B - Compounds cd:**
|
||||
|
||||
```bash
|
||||
cd ruby && bundle exec rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "I'll cd to ruby and run rspec to ensure we're in the right directory."
|
||||
|
||||
**Option C - Relative Path:**
|
||||
|
||||
```bash
|
||||
cd ruby && rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "We need to be in the ruby directory to run rspec."
|
||||
|
||||
**Expected Failures:**
|
||||
|
||||
- Does not use absolute path
|
||||
- Either assumes shell state or compounds directory changes
|
||||
- No awareness that shell state is not guaranteed between commands
|
||||
|
||||
**Success Criteria Missed:**
|
||||
Agent should use: `cd ~/workspace/schemaflow/ruby && bundle exec rspec` or `cd $(git rev-parse --show-toplevel)/ruby && bundle exec rspec`
|
||||
|
||||
---
|
||||
|
||||
## Scenario 2: Multiple Commands in Sequence
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/schemaflow
|
||||
- Just ran: `cd ruby && bundle install`
|
||||
- Then ran: `cd ruby && rubocop`
|
||||
|
||||
**Task:** "Now run the tests"
|
||||
|
||||
**Expected Agent Response (Baseline):**
|
||||
|
||||
**Most Likely - Continues Pattern:**
|
||||
|
||||
```bash
|
||||
cd ruby && bundle exec rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "Following the same pattern as the previous commands, I'll cd to ruby and run the tests."
|
||||
|
||||
**Alternative - Assumes Location:**
|
||||
|
||||
```bash
|
||||
bundle exec rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "We've been working in the ruby directory throughout this session, so the shell should still be there."
|
||||
|
||||
**Alternative - Compounds Confusion:**
|
||||
|
||||
```bash
|
||||
cd ruby && cd ruby && bundle exec rspec
|
||||
```
|
||||
|
||||
**Rationalization:** "Let me make sure we're in the ruby directory before running tests."
|
||||
|
||||
**Expected Failures:**
|
||||
|
||||
- Continues using relative paths
|
||||
- May develop false confidence from repeated pattern
|
||||
- Doesn't recognize that each command should be independent
|
||||
|
||||
**Success Criteria Missed:**
|
||||
Agent should reset to absolute path: `cd ~/workspace/schemaflow/ruby && bundle exec rspec`
|
||||
|
||||
---
|
||||
|
||||
## Scenario 3: Time Pressure + Sunk Cost
|
||||
|
||||
**Setup:**
|
||||
|
||||
- You've been working in ruby/ subproject for 2 hours
|
||||
- Made 10 commits, all using relative paths
|
||||
- Tests are passing
|
||||
- It's 5:45pm, meeting at 6pm
|
||||
|
||||
**Task:** "Quick, run the linter before the meeting"
|
||||
|
||||
**Expected Agent Response (Baseline):**
|
||||
|
||||
**Most Likely - Quick and Dirty:**
|
||||
|
||||
```bash
|
||||
rubocop
|
||||
```
|
||||
|
||||
**Rationalization:** "We've been in the ruby directory all session and everything's been working. No need to change the pattern now - let's just run rubocop quickly."
|
||||
|
||||
**Alternative - Minimal cd:**
|
||||
|
||||
```bash
|
||||
cd ruby && rubocop
|
||||
```
|
||||
|
||||
**Rationalization:** "I'll use the same approach that's been working all session to save time."
|
||||
|
||||
**Pressure-Induced Rationalizations:**
|
||||
|
||||
- "I've been here all session, I know where I am"
|
||||
- "The shell hasn't changed directories"
|
||||
- "It's worked fine for 10 commits, no reason to change now"
|
||||
- "We're short on time, let's not over-engineer this"
|
||||
- "The tests are passing, so our approach is correct"
|
||||
|
||||
**Expected Failures:**
|
||||
|
||||
- Time pressure reinforces bad habits
|
||||
- Sunk cost fallacy ("it's worked so far")
|
||||
- Efficiency argument ("faster to use relative path")
|
||||
- False confidence from recent successes
|
||||
|
||||
**Success Criteria Missed:**
|
||||
Agent should resist pressure and use absolute path regardless of time constraints or past successes.
|
||||
|
||||
---
|
||||
|
||||
## Scenario 4: Complex Monorepo (zenpayroll pattern)
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/zenpayroll
|
||||
- Root project at .
|
||||
- Component at components/gusto-deprecation
|
||||
- rubocop MUST run from root
|
||||
- rspec in components MUST run from component dir
|
||||
|
||||
**Task:** "Run rubocop on the gusto-deprecation component"
|
||||
|
||||
**Expected Agent Response (Baseline):**
|
||||
|
||||
**Most Likely - Wrong Location:**
|
||||
|
||||
```bash
|
||||
cd components/gusto-deprecation && rubocop
|
||||
```
|
||||
|
||||
**Rationalization:** "I'll navigate to the gusto-deprecation component and run rubocop on it."
|
||||
|
||||
**Alternative - Assumes Current Location:**
|
||||
|
||||
```bash
|
||||
rubocop components/gusto-deprecation
|
||||
```
|
||||
|
||||
**Rationalization:** "I'll run rubocop from the current location and point it at the component directory."
|
||||
|
||||
**Alternative - Specifies Files:**
|
||||
|
||||
```bash
|
||||
cd components/gusto-deprecation && rubocop .
|
||||
```
|
||||
|
||||
**Rationalization:** "I'll go into the component and run rubocop on the current directory."
|
||||
|
||||
**Expected Failures:**
|
||||
|
||||
- Doesn't check that rubocop has location requirements
|
||||
- Assumes rubocop can run from anywhere
|
||||
- Doesn't use absolute paths
|
||||
- Doesn't recognize that some tools must run from specific locations
|
||||
|
||||
**Correct Approach Missed:**
|
||||
Based on the rule that "rubocop MUST run from root", agent should use:
|
||||
|
||||
```bash
|
||||
cd ~/workspace/zenpayroll && rubocop components/gusto-deprecation
|
||||
```
|
||||
|
||||
Or with git:
|
||||
|
||||
```bash
|
||||
cd $(git rev-parse --show-toplevel) && rubocop components/gusto-deprecation
|
||||
```
|
||||
|
||||
**Key Insight:**
|
||||
This scenario requires understanding that different commands have different location requirements. Without checking rules or config, agents will make incorrect assumptions.
|
||||
|
||||
---
|
||||
|
||||
## Summary of Expected Baseline Failures
|
||||
|
||||
### Common Failure Patterns:
|
||||
|
||||
1. **Assumes Shell State** - Believes the shell "remembers" where previous commands ran
|
||||
2. **Compounds cd Commands** - Uses `cd subdir` repeatedly without absolute paths
|
||||
3. **Omits cd Entirely** - Assumes current location based on conversation context
|
||||
4. **Relative Path Thinking** - Defaults to relative paths as "simpler" or "cleaner"
|
||||
5. **Pattern Repetition** - Continues using the same flawed pattern because it "worked before"
|
||||
6. **Efficiency Arguments** - Justifies shortcuts due to time pressure or "waste"
|
||||
7. **Location Rule Ignorance** - Doesn't check whether commands have specific location requirements
|
||||
|
||||
### Rationalizations to Counter:
|
||||
|
||||
| Rationalization | Reality |
|
||||
| ------------------------------------------------ | ------------------------------------------------------------ |
|
||||
| "I just cd'd there" | Shell state not guaranteed between commands |
|
||||
| "We've been in that directory all session" | Shell state is not tracked across commands |
|
||||
| "The shell remembers where I am" | Shell state is not guaranteed |
|
||||
| "It's wasteful to cd every time" | Bugs from wrong location are more wasteful |
|
||||
| "Relative paths are simpler" | They break when assumptions are wrong |
|
||||
| "It's worked for the last 10 commands" | Past success doesn't guarantee current shell state |
|
||||
| "We're short on time" | Taking time to use absolute paths prevents debugging later |
|
||||
| "The tests passed, so we must be doing it right" | Success can happen despite wrong approach |
|
||||
| "I can track directory state mentally" | Mental tracking is unreliable and doesn't affect shell state |
|
||||
|
||||
### What the Skill Must Prevent:
|
||||
|
||||
1. **Any use of relative paths** in cd commands
|
||||
2. **Any assumption about current shell location** based on conversation history
|
||||
3. **Any omission of cd prefix** when running commands that need specific locations
|
||||
4. **Any rationalization** that shell state can be tracked or remembered
|
||||
5. **Pressure-induced shortcuts** that skip absolute path usage
|
||||
6. **Pattern continuation** without verifying each command's path
|
||||
|
||||
### Core Principle to Enforce:
|
||||
|
||||
**Bash shell state is not guaranteed between commands. Always use absolute paths.**
|
||||
|
||||
This must be non-negotiable regardless of:
|
||||
|
||||
- Time pressure
|
||||
- Past successes
|
||||
- Efficiency arguments
|
||||
- Mental tracking confidence
|
||||
- Conversation context
|
||||
|
||||
---
|
||||
|
||||
## Testing Status
|
||||
|
||||
**Actual Subagent Testing:** NOT YET COMPLETED
|
||||
|
||||
These baseline results represent **expected patterns** based on scenario analysis. Actual subagent testing should be performed to:
|
||||
|
||||
1. Confirm these failure patterns occur
|
||||
2. Discover additional rationalizations
|
||||
3. Capture verbatim agent responses
|
||||
4. Identify edge cases not covered in scenarios
|
||||
|
||||
**Next Steps:**
|
||||
|
||||
1. Set up subagent testing infrastructure
|
||||
2. Run each scenario with fresh general-purpose subagents
|
||||
3. Record actual responses verbatim
|
||||
4. Update this document with real data
|
||||
5. Use findings to refine the skill (GREEN phase)
|
||||
|
||||
---
|
||||
|
||||
## Methodology Notes
|
||||
|
||||
The RED-GREEN-REFACTOR approach requires actual failure data to be most effective. This document provides:
|
||||
|
||||
- **RED Phase Foundation:** Expected failure patterns to look for
|
||||
- **Hypothesis Document:** What we predict agents will do wrong
|
||||
- **Testing Template:** Structure for recording actual results
|
||||
|
||||
Once actual testing is possible, this document should be updated with:
|
||||
|
||||
- Exact agent responses (quoted verbatim)
|
||||
- Actual commands executed
|
||||
- Real rationalizations (not predicted)
|
||||
- Unexpected behaviors discovered
|
||||
- Success/failure rates for each scenario
|
||||
83
skills/working-in-monorepos/tests/baseline-scenarios.md
Normal file
83
skills/working-in-monorepos/tests/baseline-scenarios.md
Normal file
@@ -0,0 +1,83 @@
|
||||
# Baseline Test Scenarios (Without Skill)
|
||||
|
||||
These scenarios test agent behavior WITHOUT the skill loaded.
|
||||
|
||||
## Scenario 1: Simple Command After cd
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/schemaflow
|
||||
- Subprojects: ruby/, cli/
|
||||
- Just ran: `cd ruby && bundle install`
|
||||
|
||||
**Task:** "Now run rspec"
|
||||
|
||||
**Expected baseline failures:**
|
||||
|
||||
- `cd ruby && bundle exec rspec` (compounds cd)
|
||||
- `bundle exec rspec` (assumes location)
|
||||
- `cd ruby && rspec` (still wrong)
|
||||
|
||||
**Success criteria:** Agent uses absolute path
|
||||
|
||||
---
|
||||
|
||||
## Scenario 2: Multiple Commands in Sequence
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/schemaflow
|
||||
- Just ran: `cd ruby && bundle install`
|
||||
- Then ran: `cd ruby && rubocop`
|
||||
|
||||
**Task:** "Now run the tests"
|
||||
|
||||
**Expected baseline failures:**
|
||||
|
||||
- Continues compounding cd commands
|
||||
- Assumes it's in ruby/ directory
|
||||
|
||||
**Success criteria:** Each command uses absolute path from root
|
||||
|
||||
---
|
||||
|
||||
## Scenario 3: Time Pressure + Sunk Cost
|
||||
|
||||
**Setup:**
|
||||
|
||||
- You've been working in ruby/ subproject for 2 hours
|
||||
- Made 10 commits, all using relative paths
|
||||
- Tests are passing
|
||||
- It's 5:45pm, meeting at 6pm
|
||||
|
||||
**Task:** "Quick, run the linter before the meeting"
|
||||
|
||||
**Expected baseline failures:**
|
||||
|
||||
- Uses relative path to save time
|
||||
- "I've been here all session, I know where I am"
|
||||
- "The shell hasn't changed directories"
|
||||
|
||||
**Success criteria:** Uses absolute path despite pressure
|
||||
|
||||
---
|
||||
|
||||
## Scenario 4: Complex Monorepo (zenpayroll pattern)
|
||||
|
||||
**Setup:**
|
||||
|
||||
- Repo: ~/workspace/zenpayroll
|
||||
- Root project at .
|
||||
- Component at components/gusto-deprecation
|
||||
- rubocop MUST run from root
|
||||
- rspec in components MUST run from component dir
|
||||
|
||||
**Task:** "Run rubocop on the gusto-deprecation component"
|
||||
|
||||
**Expected baseline failures:**
|
||||
|
||||
- Runs from component directory
|
||||
- Doesn't check command rules
|
||||
- Assumes rubocop can run anywhere
|
||||
|
||||
**Success criteria:** Runs rubocop from absolute repo root path
|
||||
Reference in New Issue
Block a user