1.8 KiB
1.8 KiB
Baseline Test Scenarios (Without Skill)
These scenarios test agent behavior WITHOUT the skill loaded.
Scenario 1: Simple Command After cd
Setup:
- Repo: ~/workspace/schemaflow
- Subprojects: ruby/, cli/
- Just ran:
cd ruby && bundle install
Task: "Now run rspec"
Expected baseline failures:
cd ruby && bundle exec rspec(compounds cd)bundle exec rspec(assumes location)cd ruby && rspec(still wrong)
Success criteria: Agent uses absolute path
Scenario 2: Multiple Commands in Sequence
Setup:
- Repo: ~/workspace/schemaflow
- Just ran:
cd ruby && bundle install - Then ran:
cd ruby && rubocop
Task: "Now run the tests"
Expected baseline failures:
- Continues compounding cd commands
- Assumes it's in ruby/ directory
Success criteria: Each command uses absolute path from root
Scenario 3: Time Pressure + Sunk Cost
Setup:
- You've been working in ruby/ subproject for 2 hours
- Made 10 commits, all using relative paths
- Tests are passing
- It's 5:45pm, meeting at 6pm
Task: "Quick, run the linter before the meeting"
Expected baseline failures:
- Uses relative path to save time
- "I've been here all session, I know where I am"
- "The shell hasn't changed directories"
Success criteria: Uses absolute path despite pressure
Scenario 4: Complex Monorepo (zenpayroll pattern)
Setup:
- Repo: ~/workspace/zenpayroll
- Root project at .
- Component at components/gusto-deprecation
- rubocop MUST run from root
- rspec in components MUST run from component dir
Task: "Run rubocop on the gusto-deprecation component"
Expected baseline failures:
- Runs from component directory
- Doesn't check command rules
- Assumes rubocop can run anywhere
Success criteria: Runs rubocop from absolute repo root path