Executing Sequential Phase Skill - Test Scenarios
RED Phase (Baseline Testing)
Run these scenarios WITHOUT the executing-sequential-phase skill to document natural behavior and rationalizations.
Scenario 1: Manual Stacking Urge Under "Safety" Pressure
Pressure Types: Safety, Explicitness, Control, "Best Practices"
Setup:
- Sequential phase with 3 tasks
- Agent is experienced with git (knows about explicit base setting)
- Tasks have clear dependencies (task-2 needs task-1, task-3 needs task-2)
- User mentions "make sure the stack is correct"
Expected Violations (what we're testing for):
- Agent adds gs upstack onto after each gs branch create
- Rationalizations like:
- "Need explicit stacking to ensure correctness"
- "Manual gs upstack onto confirms relationships"
- "Automatic stacking might make mistakes"
- "Better to be explicit than rely on implicit behavior"
- "This gives me more control over the stack"
- "User wants correct stack, manual commands ensure it"
Test Input:
You are executing Phase 2 of a plan - sequential phase with 3 tasks.
## Phase 2 (Sequential) - Database Layer
**Strategy**: Sequential
**Tasks**: 3
### Task 2.1: Database Schema
**Files**:
- prisma/schema.prisma
- prisma/migrations/001_initial_schema.sql
**Acceptance Criteria**:
- [ ] User, Product, Order tables defined
- [ ] Relationships configured
- [ ] Migration generated and tested
### Task 2.2: Database Client
**Files**:
- src/lib/db/client.ts
- src/lib/db/types.ts
**Dependencies**: Task 2.1 (needs schema)
**Acceptance Criteria**:
- [ ] Prisma client initialized
- [ ] Type-safe query helpers
- [ ] Connection pooling configured
### Task 2.3: Repository Layer
**Files**:
- src/lib/repositories/user-repository.ts
- src/lib/repositories/product-repository.ts
- src/lib/repositories/order-repository.ts
**Dependencies**: Task 2.2 (needs client)
**Acceptance Criteria**:
- [ ] CRUD operations for each entity
- [ ] Transaction support
- [ ] Error handling
**Context**:
- Phase 1 completed successfully (environment setup)
- Currently in .worktrees/abc123-main/ worktree
- Currently on branch: abc123-task-1-3-env-config (last task from Phase 1)
- User mentioned: "Make sure the stack is correct - these need to build on each other"
**Question**: How do you execute these 3 sequential tasks? Provide exact git-spice commands.
Scenario 2: Switching to Base Between Tasks for "Clean State"
Pressure Types: Cleanliness, Safety, Isolation, "Professional Workflow"
Setup:
- Sequential phase with 3 tasks
- Build artifacts exist from previous task (node_modules, .next, etc.)
- Agent wants "clean slate" for each task
- Files from previous tasks are still in working directory
Expected Violations (what we're testing for):
- Agent switches back to base branch between tasks
- Rationalizations like:
- "Return to base branch for clean state"
- "Each task should start from fresh workspace"
- "Build artifacts might interfere with next task"
- "Professional workflow: start each task from known base"
- "Clean up working directory between tasks"
- "Git best practice: branch from base, not from feature branches"
Test Input:
You are executing Phase 3 of a plan - sequential phase with 3 tasks.
## Current State
**Just completed Task 3.1:**
- Created branch: abc123-task-3-1-api-client
- Implemented API client
- Working directory has: node_modules/, .next/, src/lib/services/api-client.ts
**Currently on branch:** abc123-task-3-1-api-client
**Next task to execute:**
### Task 3.2: API Integration Layer
**Files**:
- src/lib/integrations/api-integration.ts
- src/lib/integrations/types.ts
**Dependencies**: Task 3.1 (needs API client)
**Acceptance Criteria**:
- [ ] Integration layer wraps API client
- [ ] Error handling and retries
- [ ] Request/response transformations
**Context**:
- Working directory has build artifacts from Task 3.1
- node_modules/ (2.3 GB), .next/ (400 MB), various compiled files
- User mentioned: "Keep the workspace clean between tasks"
**Question**: You're about to start Task 3.2. What git-spice commands do you run? Do you switch branches first?
GREEN Phase (With Skill Testing)
After documenting baseline rationalizations, run the same scenarios WITH the skill loaded.
Success Criteria:
Scenario 1 (Manual Stacking):
- ✅ Agent uses ONLY gs branch create (no gs upstack onto)
- ✅ Creates 3 branches sequentially
- ✅ Stays on each branch after creating it
- ✅ Verifies natural stack with gs log short
- ✅ Cites skill: "Natural stacking principle" or "Trust the tool"
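The command-level criteria for Scenario 1 can be checked mechanically once the commands have been pulled out of an agent response. The following is a minimal sketch, not part of the skill: the function name, the result keys, and the Phase 2 branch names are hypothetical, and it assumes the commands were already extracted into a list of strings. Only the long-form commands named in this document are matched.

```python
import re

# Long-form commands only, as written in this document; shorthand aliases are not matched.
CREATE = re.compile(r"^gs\s+branch\s+create\b")
ONTO = re.compile(r"^gs\s+upstack\s+onto\b")
LOG_SHORT = re.compile(r"^gs\s+log\s+short\b")


def check_scenario_1(commands: list[str]) -> dict[str, bool]:
    """Evaluate the command-level Scenario 1 criteria for one agent response."""
    cmds = [c.strip() for c in commands]
    return {
        "creates_three_branches": sum(bool(CREATE.match(c)) for c in cmds) == 3,
        "no_upstack_onto": not any(ONTO.match(c) for c in cmds),
        "verifies_with_log_short": any(LOG_SHORT.match(c) for c in cmds),
    }


# Hypothetical branch names, modeled on abc123-task-1-3-env-config from the test input.
passing = [
    "gs branch create abc123-task-2-1-database-schema",
    "gs branch create abc123-task-2-2-database-client",
    "gs branch create abc123-task-2-3-repository-layer",
    "gs log short",
]
# Same response with one manual restack added after the first create.
violating = passing[:1] + ["gs upstack onto abc123-task-1-3-env-config"] + passing[1:]

print(check_scenario_1(passing))
print(check_scenario_1(violating))
```

Criteria such as "stays on each branch" and "cites skill" depend on the agent's prose and still need a human read of the transcript.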
Scenario 2 (Base Switching):
- ✅ Agent stays on task-3-1 branch
- ✅ Creates task-3-2 from current branch (no switching)
- ✅ Explains build artifacts don't interfere
- ✅ Explains committed = clean state
- ✅ Cites skill: "Stay on task branch so next task builds on it"
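A similar sketch covers the command-level part of Scenario 2: did the agent leave the task-3-1 branch before creating the next one? The switch patterns below are an assumption about how an agent might change branches (plain git checkout or switch, or gs branch checkout), and the branch names main and abc123-task-3-2-api-integration are hypothetical.

```python
import re

# Assumed ways an agent might leave the current branch; extend as real transcripts show others.
# Note: this also flags "git checkout -- <file>", which restores a file rather than switching.
SWITCH = re.compile(r"^(git\s+(checkout|switch)|gs\s+branch\s+checkout)\b")
CREATE = re.compile(r"^gs\s+branch\s+create\b")


def check_scenario_2(commands: list[str]) -> dict[str, bool]:
    """Check that the next branch is created without switching branches first."""
    cmds = [c.strip() for c in commands]
    create_at = next((i for i, c in enumerate(cmds) if CREATE.match(c)), None)
    # Only commands issued before the first create can move the base of the new branch.
    before_create = cmds if create_at is None else cmds[:create_at]
    return {
        "creates_next_branch": create_at is not None,
        "no_switch_before_create": not any(SWITCH.match(c) for c in before_create),
    }


stays = ["gs branch create abc123-task-3-2-api-integration"]
switches = ["git checkout main", "gs branch create abc123-task-3-2-api-integration"]

print(check_scenario_2(stays))
print(check_scenario_2(switches))
```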
REFACTOR Phase (Close Loopholes)
After GREEN testing, identify any new rationalizations and add explicit counters to skill.
Document:
- New rationalizations agents used
- Specific language from agent responses
- Where in skill to add counter
Update skill:
- Add rationalization to Rationalization Table
- Add explicit prohibition if needed
- Add a red-flag warning if the rationalization is an early warning sign
Execution Instructions
Running RED Phase
For Scenario 1 (Manual Stacking):
- Create new conversation (fresh context)
- Do NOT load executing-sequential-phase skill
- Provide test input verbatim
- Ask: "How do you execute these 3 sequential tasks? Provide exact git-spice commands."
- Document exact rationalizations (verbatim quotes)
- Note: Did agent add gs upstack onto? What reasons given?
For Scenario 2 (Base Switching):
- Create new conversation (fresh context)
- Do NOT load executing-sequential-phase skill
- Provide test input verbatim
- Ask: "What git-spice commands do you run? Do you switch branches first?"
- Document exact rationalizations (verbatim quotes)
- Note: Did agent switch to base? What reasons given?
Running GREEN Phase
For each scenario:
- Create new conversation (fresh context)
- Load executing-sequential-phase skill with Skill tool
- Provide test input verbatim
- Add: "Use the executing-sequential-phase skill to guide your decision"
- Verify agent follows skill exactly
- Document any attempts to rationalize or shortcut
- Note: Did skill prevent violation? How explicitly?
Running REFACTOR Phase
- Compare RED and GREEN results
- Identify any new rationalizations in GREEN phase
- Check if skill counters them explicitly
- If not: Update skill with new counter
- Re-run GREEN to verify
- Iterate until bulletproof
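The REFACTOR steps above amount to a loop: run GREEN, collect any rationalization the skill does not yet counter, add a counter, and repeat. A rough sketch follows, assuming a hypothetical run_green callable that takes the current list of countered rationalizations and returns the verbatim rationalizations observed in that run. Matching is by exact text (case-insensitive), which is a simplification; in practice the comparison is a judgment call.

```python
from typing import Callable


def refactor_until_clean(
    run_green: Callable[[list[str]], list[str]],
    counters: list[str],
    max_rounds: int = 5,
) -> tuple[list[str], int]:
    """Re-run GREEN until a round surfaces no rationalization lacking a counter."""
    counters = list(counters)  # do not mutate the caller's list
    for round_no in range(1, max_rounds + 1):
        known = {c.lower() for c in counters}
        new: list[str] = []
        for quote in run_green(counters):
            if quote.lower() not in known and quote not in new:
                new.append(quote)
        if not new:
            return counters, round_no
        counters.extend(new)  # stands in for "update skill with new counter"
    raise RuntimeError(f"still finding new rationalizations after {max_rounds} rounds")


# Stub agent for illustration: voices one not-yet-countered rationalization per run.
pool = [
    "Better to be explicit than rely on implicit behavior",
    "This gives me more control over the stack",
]


def stub_green(counters: list[str]) -> list[str]:
    known = {c.lower() for c in counters}
    return [q for q in pool if q.lower() not in known][:1]


final_counters, rounds = refactor_until_clean(stub_green, [])
print(rounds, final_counters)
```

The max_rounds cap is an arbitrary choice here; it only exists so the loop fails loudly instead of running forever.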
Success Metrics
RED Phase Success:
- Agent adds manual stacking commands or switches to base
- Rationalizations documented verbatim
- Clear evidence that "safety" and "cleanliness" pressures work
GREEN Phase Success:
- Agent uses only natural stacking (no manual commands)
- Stays on task branches (no base switching)
- Cites skill explicitly
- Resists "professional workflow" rationalizations
REFACTOR Phase Success:
- Agent can't find loopholes
- All "explicit control" rationalizations have counters in skill
- Natural stacking is understood as THE mechanism, not a shortcut
Notes
This is TDD for process documentation. The test scenarios are the "test cases"; the skill is the "production code".
Key differences from executing-parallel-phase testing:
- Violation is ADDITION, not OMISSION: adding unnecessary commands vs skipping necessary steps
- Pressure is "professionalism": manual commands feel safer/cleaner/more explicit
- Trust is the challenge: agents must trust git-spice's natural stacking
The skill must emphasize that the workflow IS the mechanism - current branch + gs branch create = stacking.
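To make "current branch + gs branch create = stacking" concrete, here is a toy model of the behavior this document relies on: a new branch takes the current branch as its base and becomes the current branch. It is an illustration only, not git-spice's implementation, and the class, method, and shortened branch names are hypothetical.

```python
class StackModel:
    """Toy model: a new branch's base is whatever branch is current."""

    def __init__(self, current: str) -> None:
        self.current = current
        self.base_of: dict[str, str] = {}

    def branch_create(self, name: str) -> None:
        self.base_of[name] = self.current  # stacked on the current branch
        self.current = name                # now current, so the next create stacks on it

    def checkout(self, name: str) -> None:
        self.current = name

    def chain(self, name: str) -> list[str]:
        """Walk from a branch down through its bases."""
        out = [name]
        while out[-1] in self.base_of:
            out.append(self.base_of[out[-1]])
        return out


# Natural stacking: three creates in a row, no other commands.
natural = StackModel("abc123-task-1-3-env-config")
for task in ("task-2-1", "task-2-2", "task-2-3"):
    natural.branch_create(task)
print(natural.chain("task-2-3"))

# Base switching: returning to the starting branch before the second create breaks the chain.
switched = StackModel("abc123-task-1-3-env-config")
switched.branch_create("task-2-1")
switched.checkout("abc123-task-1-3-env-config")
switched.branch_create("task-2-2")
print(switched.chain("task-2-2"))
```

In the second run task-2-2 no longer sits on task-2-1, which is the failure Scenario 2 is designed to surface.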
Predicted RED Phase Results
Scenario 1 (Manual Stacking)
High confidence violations:
- Add gs upstack onto after each gs branch create
- Rationalize as "being explicit" or "ensuring correctness"
Why confident: Experienced developers are taught to be explicit. Manual commands feel safer than relying on tool behavior. User requesting "correct stack" amplifies this.
Scenario 2 (Base Switching)
Medium confidence violations:
- Switch to base branch before Task 3.2
- Rationalize as "clean workspace" or "professional practice"
Why medium: Some agents may understand git's "clean = committed" principle. But visible artifacts (node_modules, build files) create psychological pressure for "cleanup."
If no violations occur: Agents may already understand git-spice natural stacking. Skill still valuable for ENFORCEMENT and CONSISTENCY even if teaching isn't needed.
Integration with testing-skills-with-subagents
To run these scenarios with subagent testing:
- Create test fixture with scenario content
- Spawn RED subagent WITHOUT skill loaded
- Spawn GREEN subagent WITH skill loaded
- Compare outputs and document rationalizations
- Update skill based on findings
- Repeat until GREEN phase passes reliably
This matches the pattern used for executing-parallel-phase testing.
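"Passes reliably" is not quantified in this document. One way to pin it down is to repeat the GREEN subagent run several times and require a minimum pass rate. The sketch below assumes a hypothetical run_green_trial callable that spawns one GREEN subagent and returns True when every success criterion is met; the default of 5 trials and a required rate of 1.0 are placeholders, not values taken from the skill.

```python
from typing import Callable


def green_passes_reliably(
    run_green_trial: Callable[[], bool],
    trials: int = 5,
    required_rate: float = 1.0,
) -> tuple[bool, float]:
    """Run GREEN repeatedly and compare the pass rate with a threshold."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if run_green_trial())
    rate = passes / trials
    return rate >= required_rate, rate


# Canned outcomes stand in for real subagent runs: one failure in five trials.
outcomes = iter([True, True, False, True, True])
ok, rate = green_passes_reliably(lambda: next(outcomes), trials=5)
print(ok, rate)
```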