Initial commit
463
skills/executing-sequential-phase/SKILL.md
Normal file
@@ -0,0 +1,463 @@
---
name: executing-sequential-phase
description: Use when orchestrating sequential phases in plan execution - executes tasks one-by-one in main worktree using git-spice natural stacking (NO manual upstack commands, NO worktree creation, tasks build on each other)
---

# Executing Sequential Phase

## Overview

**Sequential phases use natural git-spice stacking in the main worktree.**

Each task creates a branch with `gs branch create`, which automatically stacks on the current HEAD. No manual stacking operations needed.

**Critical distinction:** Sequential tasks BUILD ON each other. They need integration, not isolation.

## When to Use

Use this skill when the `execute` command encounters a phase marked "Sequential" in plan.md:
- ✅ Tasks must run in order (dependencies)
- ✅ Execute in existing `{runid}-main` worktree
- ✅ Trust natural stacking (no manual `gs upstack onto`)
- ✅ Stay on task branches (don't switch to base between tasks)

**Sequential phases never create new worktrees.** They share the existing `{runid}-main` worktree, where tasks build cumulatively.

## The Natural Stacking Principle

```
SEQUENTIAL PHASE = MAIN WORKTREE + NATURAL STACKING
```

**What natural stacking means:**
1. Start on base branch (or previous task's branch)
2. Create new branch with `gs branch create` → automatically stacks on current
3. Stay on that branch when done
4. Next task creates from there → automatically stacks on previous

**No manual commands needed.** The workflow IS the stacking.
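
As a rough sketch of the four steps above, the helper below prints the one command each task would need; it does not run anything. The function names are hypothetical, and the `{run-id}-task-{phase}-{task}-{short-name}` naming scheme is an assumption inferred from branch names such as `abc123-task-3-1-api-client`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds a task branch name in the assumed
# {run-id}-task-{phase}-{task}-{short-name} shape.
task_branch_name() {
  local run_id="$1" phase="$2" task="$3" short_name="$4"
  printf '%s-task-%s-%s-%s\n' "$run_id" "$phase" "$task" "$short_name"
}

# Prints (does not run) the single command each task needs. Because every
# task stays on its own branch, the next `gs branch create` stacks on it.
plan_natural_stack() {
  local run_id="$1" phase="$2"
  shift 2
  local task=1 short_name
  for short_name in "$@"; do
    printf 'gs branch create %s\n' \
      "$(task_branch_name "$run_id" "$phase" "$task" "$short_name")"
    task=$((task + 1))
  done
}

plan_natural_stack abc123 2 schema db-client repositories
```

The printed plan contains one `gs branch create` per task and nothing else, which is the point: the order of execution is what produces the stack.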

## The Process

**Announce:** "I'm using executing-sequential-phase to execute {N} tasks sequentially in Phase {phase-id}."

### Step 0: Verify Orchestrator Location

**MANDATORY: Verify orchestrator is in main repo root before any operations:**

```bash
REPO_ROOT=$(git rev-parse --show-toplevel)
CURRENT=$(pwd)

if [ "$CURRENT" != "$REPO_ROOT" ]; then
  echo "❌ Error: Orchestrator must run from main repo root"
  echo "Current: $CURRENT"
  echo "Expected: $REPO_ROOT"
  echo ""
  echo "Return to main repo: cd $REPO_ROOT"
  exit 1
fi

echo "✅ Orchestrator location verified: Main repo root"
```

**Why critical:**
- Orchestrator delegates work but never changes directory
- All operations use `git -C .worktrees/path` or `bash -c "cd path && cmd"`
- This assertion catches upstream drift immediately
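
The "never changes directory" rule can be illustrated with a small wrapper. This is a minimal sketch, not part of the skill: `run_in_dir` is a hypothetical name, and a throwaway temp directory stands in for the worktree.

```bash
#!/usr/bin/env bash
# Run a command inside a directory without changing the caller's directory.
# The parentheses create a subshell, so the `cd` is discarded when it exits.
run_in_dir() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}

# Demonstration with a throwaway directory standing in for a worktree.
START_DIR=$(pwd)
DEMO_DIR=$(mktemp -d)

run_in_dir "$DEMO_DIR" pwd             # runs inside the demo directory
echo "Orchestrator still in: $(pwd)"   # same directory as before the call

rmdir "$DEMO_DIR"
```

For git commands, `git -C <path> <command>` achieves the same effect without a subshell. One caveat that may be worth checking: inside a linked worktree, `git rev-parse --show-toplevel` reports that worktree's root, so the Step 0 comparison would also pass if the orchestrator sat exactly at a worktree root.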

### Step 1: Verify Setup and Base Branch

**First, verify we're on the correct base branch for this phase:**

```bash
# Get current branch in main worktree
CURRENT_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
EXPECTED_BASE="{expected-base-branch}"  # From plan: previous phase's last task, or {runid}-main for Phase 1

if [ "$CURRENT_BRANCH" != "$EXPECTED_BASE" ]; then
  echo "⚠️ WARNING: Phase {phase-id} starting from unexpected branch"
  echo "  Current: $CURRENT_BRANCH"
  echo "  Expected: $EXPECTED_BASE"
  echo ""
  echo "This means the previous phase ended on the wrong branch."
  echo "Possible causes:"
  echo "- Code review or quality checks switched branches"
  echo "- User manually checked out different branch"
  echo "- Resume from interrupted execution"
  echo ""
  echo "To fix:"
  echo "1. Verify previous phase completed: git log --oneline $EXPECTED_BASE"
  echo "2. Switch to correct base: cd .worktrees/{runid}-main && git checkout $EXPECTED_BASE"
  echo "3. Re-run /spectacular:execute"
  exit 1
fi

echo "✅ Phase {phase-id} starting from correct base: $CURRENT_BRANCH"
```

**Then check and install dependencies from main repo (orchestrator never cd's):**

```bash
# Check if dependencies installed in main worktree
if [ ! -d .worktrees/{runid}-main/node_modules ]; then
  echo "Installing dependencies in main worktree..."
  bash <<'EOF'
  cd .worktrees/{runid}-main
  {install-command}
  {postinstall-command}
EOF
fi
```

**Why heredoc:** The orchestrator stays in the main repo. `bash <<'EOF'` runs the commands in a child shell, so the `cd` ends with that shell and never affects the orchestrator.

**Why main worktree:** Sequential tasks were created during spec generation. All sequential phases share this worktree.

**Red flag:** "Create phase-specific worktree" - NO. Sequential = shared worktree.

### Step 1.5: Extract Phase Context (Before Dispatching)

**Before spawning subagents, extract phase boundaries from plan:**

The orchestrator already parsed the plan in execute.md Step 1. Extract:
- Current phase number and name
- Tasks in THIS phase (what TO implement)
- Tasks in LATER phases (what NOT to implement)

**Format for subagent context:**
```
PHASE CONTEXT:
- Phase {current-phase-id}/{total-phases}: {phase-name}
- This phase includes: Task {task-ids-in-this-phase}

LATER PHASES (DO NOT IMPLEMENT):
- Phase {next-phase}: {phase-name} - {task-summary}
- Phase {next+1}: {phase-name} - {task-summary}
...

If implementing work beyond this phase's tasks, STOP and report scope violation.
```

**Why critical:** Spec describes WHAT to build (entire feature). Plan describes HOW/WHEN (phase breakdown). Subagents need both to avoid scope creep.
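
One way to produce that context block is a small renderer. The sketch below is illustrative only: `render_phase_context` and its `phase|name|summary` argument format are hypothetical, and the example phase names are made up.

```bash
#!/usr/bin/env bash
# Hypothetical renderer for the PHASE CONTEXT block.
# Arguments: current phase id, total phases, phase name, task ids,
# then zero or more "phase|name|summary" entries for later phases.
render_phase_context() {
  local current="$1" total="$2" name="$3" task_ids="$4"
  shift 4
  printf 'PHASE CONTEXT:\n'
  printf -- '- Phase %s/%s: %s\n' "$current" "$total" "$name"
  printf -- '- This phase includes: Task %s\n' "$task_ids"
  printf '\nLATER PHASES (DO NOT IMPLEMENT):\n'
  local entry phase later_name summary
  for entry in "$@"; do
    # Split one "phase|name|summary" entry into its three fields.
    IFS='|' read -r phase later_name summary <<<"$entry"
    printf -- '- Phase %s: %s - %s\n' "$phase" "$later_name" "$summary"
  done
  printf "\nIf implementing work beyond this phase's tasks, STOP and report scope violation.\n"
}

render_phase_context 2 3 "Database Layer" "2.1, 2.2, 2.3" \
  "3|API Layer|client and integration"
```

Generating the block from the parsed plan, rather than writing it by hand per task, keeps the later-phase list identical for every subagent in the phase.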

### Step 2: Execute Tasks Sequentially

**For each task in order, spawn ONE subagent with embedded instructions:**

```
Task(Implement Task {task-id}: {task-name})

ROLE: Implement Task {task-id} in main worktree (sequential phase)

WORKTREE: .worktrees/{run-id}-main
CURRENT BRANCH: {current-branch}

TASK: {task-name}
FILES: {files-list}
ACCEPTANCE CRITERIA: {criteria}

PHASE BOUNDARIES:
===== PHASE BOUNDARIES - CRITICAL =====

Phase {current-phase-id}/{total-phases}: {phase-name}
This phase includes ONLY: Task {task-ids-in-this-phase}

DO NOT CREATE ANY FILES from later phases.

Later phases (DO NOT CREATE):
- Phase {next-phase}: {phase-name} - {task-summary}
  ❌ NO implementation files
  ❌ NO stub functions (even with TODOs)
  ❌ NO type definitions or interfaces
  ❌ NO test scaffolding or temporary code

If tempted to create ANY file from later phases, STOP.
"Not fully implemented" = violation.
"Just types/stubs/tests" = violation.
"Temporary/for testing" = violation.

==========================================

CONTEXT REFERENCES:
- Spec: specs/{run-id}-{feature-slug}/spec.md
- Constitution: docs/constitutions/current/
- Plan: specs/{run-id}-{feature-slug}/plan.md
- Worktree: .worktrees/{run-id}-main

INSTRUCTIONS:

1. Navigate to main worktree:
   cd .worktrees/{run-id}-main

2. Read constitution (if exists): docs/constitutions/current/

3. Read feature specification: specs/{run-id}-{feature-slug}/spec.md

   This provides:
   - WHAT to build (requirements, user flows)
   - WHY decisions were made (architecture rationale)
   - HOW features integrate (system boundaries)

   The spec is your source of truth for architectural decisions.
   Constitution tells you HOW to code. Spec tells you WHAT to build.

4. VERIFY PHASE SCOPE before implementing:
   - Read the PHASE BOUNDARIES section above
   - Confirm this task belongs to Phase {current-phase-id}
   - If tempted to implement later phase work, STOP
   - The plan exists for a reason - respect phase boundaries

5. Implement task following spec + constitution + phase boundaries

6. Run quality checks with exit code validation:

   **CRITICAL**: Use heredoc to prevent bash parsing errors:

   bash <<'EOF'
   npm test
   if [ $? -ne 0 ]; then
     echo "❌ Tests failed"
     exit 1
   fi

   npm run lint
   if [ $? -ne 0 ]; then
     echo "❌ Lint failed"
     exit 1
   fi

   npm run build
   if [ $? -ne 0 ]; then
     echo "❌ Build failed"
     exit 1
   fi
   EOF

   **Why heredoc**: Prevents parsing errors when commands are wrapped by orchestrator.

7. Create stacked branch using verification skill:

   Skill: phase-task-verification

   Parameters:
   - RUN_ID: {run-id}
   - TASK_ID: {phase}-{task}
   - TASK_NAME: {short-name}
   - COMMIT_MESSAGE: "[Task {phase}.{task}] {task-name}"
   - MODE: sequential

   The verification skill will:
   a) Stage changes with git add .
   b) Create branch with gs branch create
   c) Verify HEAD points to new branch
   d) Stay on branch (next task builds on it)

8. Report completion

CRITICAL:
- Work in .worktrees/{run-id}-main, NOT main repo
- Stay on your branch when done (next task builds on it)
- Do NOT create worktrees
- Do NOT use `gs upstack onto`
- Do NOT implement work from later phases (check PHASE BOUNDARIES above)
```

**Sequential dispatch:** Wait for each task to complete before starting next.
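
The dispatch rule amounts to a loop that stops at the first failure. The sketch below assumes a hypothetical `run_task` function standing in for "spawn one subagent and wait for it"; the real dispatch goes through the Task template above, not a shell function.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for dispatching one subagent; a real orchestrator
# would spawn the Task described above and wait for it to finish.
run_task() {
  echo "running task $1"
}

# Runs tasks strictly in order and stops at the first failure, so a later
# task never starts on top of a broken predecessor.
run_sequentially() {
  local task_id
  for task_id in "$@"; do
    if ! run_task "$task_id"; then
      echo "task $task_id failed; stopping phase" >&2
      return 1
    fi
  done
}

run_sequentially 2.1 2.2 2.3
```

Stopping early matters here because each task's branch is the base for the next one; continuing past a failure would stack later work on an incomplete branch.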

**Red flags:**
- "Dispatch all tasks in parallel" - NO. Sequential = one at a time.
- "Create task-specific worktrees" - NO. Sequential = shared worktree.
- "Spec mentions feature X, I'll implement it now" - NO. Check phase boundaries first.
- "I'll run git add myself" - NO. Let subagent use phase-task-verification skill.

### Step 3: Verify Natural Stack Formation

**After all tasks complete (verify from main repo):**

```bash
# Display and verify stack using bash subshell (orchestrator stays in main repo)
bash <<'EOF'
cd .worktrees/{runid}-main

echo "📋 Stack after sequential phase:"
gs log short
echo ""

# Verify stack integrity (each task has unique commit)
echo "🔍 Verifying stack integrity..."
TASK_BRANCHES=( {array-of-branch-names} )
STACK_VALID=1
declare -A SEEN_COMMITS

for BRANCH in "${TASK_BRANCHES[@]}"; do
  if ! git rev-parse --verify "$BRANCH" >/dev/null 2>&1; then
    echo "❌ ERROR: Branch '$BRANCH' not found"
    STACK_VALID=0
    break
  fi

  BRANCH_SHA=$(git rev-parse "$BRANCH")

  # Check if this commit SHA was already seen
  if [ -n "${SEEN_COMMITS[$BRANCH_SHA]}" ]; then
    echo "❌ ERROR: Stack integrity violation"
    echo "  Branch '$BRANCH' points to commit $BRANCH_SHA"
    echo "  But '${SEEN_COMMITS[$BRANCH_SHA]}' already points to that commit"
    echo ""
    echo "This means one task created no new commits."
    echo "Possible causes:"
    echo "- Task implementation had no changes"
    echo "- Quality checks blocked commit"
    echo "- gs branch create failed silently"
    STACK_VALID=0
    break
  fi

  SEEN_COMMITS[$BRANCH_SHA]="$BRANCH"
  echo "  ✓ $BRANCH @ $BRANCH_SHA"
done

if [ $STACK_VALID -eq 0 ]; then
  echo ""
  echo "❌ Stack verification FAILED"
  echo ""
  echo "To investigate:"
  echo "1. Check task branch commits: git log --oneline \$BRANCH"
  echo "2. Review subagent output for failed task"
  echo "3. Check for quality check failures (test/lint/build)"
  echo "4. Fix and re-run /spectacular:execute"
  exit 1
fi

echo "✅ Stack integrity verified - all tasks have unique commits"
EOF
```

**Each `gs branch create` automatically stacked on the previous task's branch.**

**Verification ensures:** Each task created a unique commit (no empty branches or duplicates).
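
The script above relies on `declare -A`, which needs bash 4 or newer. Where only an older bash is available (the stock macOS bash is 3.2), the same uniqueness check could be sketched without associative arrays. `find_duplicate_sha` is a hypothetical helper, and the branch names and SHAs in the example are made up.

```bash
#!/usr/bin/env bash
# Reads "branch sha" pairs on stdin and reports the first branch whose SHA
# was already seen. Uses a space-delimited string instead of `declare -A`.
find_duplicate_sha() {
  local seen=" " branch sha
  while read -r branch sha; do
    case "$seen" in
      *" $sha "*)
        echo "duplicate: $branch @ $sha"
        return 1
        ;;
    esac
    seen="$seen$sha "
  done
  return 0
}

# Example input; in the worktree the pairs could come from
# `git rev-parse "$BRANCH"` for each task branch.
printf '%s\n' 'task-2-1 aaa111' 'task-2-2 bbb222' | find_duplicate_sha && echo "stack ok"
```

This only detects branches that share a commit, the same property the original check verifies; it does not confirm the order of the stack, which `gs log short` shows.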

**Red flag:** "Run `gs upstack onto` to ensure stacking" - NO. Already stacked naturally.

### Step 4: Code Review (Binary Quality Gate)

**Check review frequency setting (from execute.md Step 1.7):**

```bash
REVIEW_FREQUENCY=${REVIEW_FREQUENCY:-per-phase}
```

**If REVIEW_FREQUENCY is "end-only" or "skip":**
```
Skipping per-phase code review (frequency: {REVIEW_FREQUENCY})
Phase {N} complete - proceeding to next phase
```
Mark phase complete and continue to next phase.

**If REVIEW_FREQUENCY is "optimize":**

Analyze the completed phase to decide if code review is needed:

**High-risk indicators (REVIEW REQUIRED):**
- Schema or migration changes
- Authentication/authorization logic
- External API integrations or webhooks
- Foundation phases (Phase 1-2 establishing patterns)
- 3+ parallel tasks (coordination complexity)
- New architectural patterns introduced
- Security-sensitive code (payment, PII, access control)
- Complex business logic with multiple edge cases
- Changes affecting multiple layers (database → API → UI)

**Low-risk indicators (SKIP REVIEW):**
- Pure UI component additions (no state/logic)
- Documentation or comment updates
- Test additions without implementation changes
- Refactoring with existing test coverage
- Isolated utility functions
- Configuration file updates (non-security)

**Analyze this phase:**
- Phase number: {N}
- Tasks completed: {task-list}
- Files modified: {file-list}
- Types of changes: {describe changes}

**Decision:**
If ANY high-risk indicator present → Proceed to code review below
If ONLY low-risk indicators → Skip review:
```
✓ Phase {N} assessed as low-risk - skipping review (optimize mode)
Reasoning: {brief explanation of why low-risk}
Phase {N} complete - proceeding to next phase
```
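
Part of the optimize-mode decision could be pre-screened from the list of modified files. The sketch below is a crude first pass under stated assumptions: `needs_review` is a hypothetical name, the path patterns are illustrative rather than project rules, and indicators such as "new architectural patterns" or "complex business logic" cannot be detected from paths at all.

```bash
#!/usr/bin/env bash
# Returns 0 (review required) if any changed path looks high-risk.
# The path patterns are illustrative assumptions, not project rules.
needs_review() {
  local path
  for path in "$@"; do
    case "$path" in
      *schema*|*migrations/*|*auth*|*webhook*|*payment*)
        echo "high-risk: $path"
        return 0
        ;;
    esac
  done
  return 1
}

if needs_review prisma/schema.prisma src/components/Button.tsx; then
  echo "proceed to code review"
else
  echo "low-risk, skip review (optimize mode)"
fi
```

A match here only forces a review; a miss does not prove the phase is low-risk, so the indicator lists above still need to be weighed by the orchestrator.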

**If REVIEW_FREQUENCY is "per-phase" OR optimize mode decided to review:**

Use `requesting-code-review` skill, then parse results STRICTLY.

**AUTONOMOUS EXECUTION:** Code review rejections trigger automatic fix loops, NOT user prompts. Do not ask the user what to do; the only escalation is the rejection limit in step 3.

1. **Dispatch code review:**
   ```
   Skill: requesting-code-review

   Context provided to reviewer:
   - WORKTREE: .worktrees/{runid}-main
   - PHASE: {phase-number}
   - TASKS: {task-list}
   - BASE_BRANCH: {base-branch-name}
   - SPEC: specs/{run-id}-{feature-slug}/spec.md
   - PLAN: specs/{run-id}-{feature-slug}/plan.md (for phase boundary validation)

   **CRITICAL - EXHAUSTIVE FIRST-PASS REVIEW:**

   This is your ONLY opportunity to find issues. Re-review is for verifying fixes, NOT discovering new problems.

   Check EVERYTHING in this single review:
   □ Implementation correctness - logic bugs, edge cases, error handling, race conditions
   □ Test correctness - expectations match actual behavior, coverage is complete, no false positives
   □ Cross-file consistency - logic coherent across all files, no contradictions
   □ Architectural soundness - follows patterns, proper separation of concerns, no coupling issues
   □ Scope adherence - implements ONLY Phase {phase-number} work, no later-phase implementations
   □ Constitution compliance - follows all project standards and conventions

   Find ALL issues NOW. If you catch yourself thinking "I'll check that in re-review" - STOP. Check it NOW.

   Binary verdict required: "Ready to merge? Yes" (only if EVERYTHING passes) or "Ready to merge? No" (list ALL issues found)
   ```

2. **Parse "Ready to merge?" field:**
   - **"Yes"** → APPROVED, continue to next phase
   - **"No"** or **"With fixes"** → REJECTED, dispatch fix subagent, go to step 3
   - **No output / missing field** → RETRY ONCE, if retry fails → STOP
   - **Soft language** → REJECTED, re-review required

3. **Re-review loop (if REJECTED):**
   - Track rejections (REJECTION_COUNT)
   - If count > 3: Escalate to user (architectural issues beyond subagent scope)
   - Dispatch fix subagent with:
     * Issues list (severity + file locations)
     * Context: constitution, spec, plan
     * Scope enforcement: If scope creep, implement LESS (roll back to phase scope)
     * Quality checks required
   - Re-review after fixes (return to step 1)
   - On approval: Announce completion with iteration count

**Critical:** Only "Ready to merge? Yes" allows proceeding. Any other result blocks the phase: a rejection enters the fix loop, and a missing verdict stops execution after one failed retry.
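
The strict parsing in step 2 can be sketched as a small classifier. This is an assumption-laden illustration: `parse_verdict` is a hypothetical name, and it assumes the verdict appears on one line as the last thing after "Ready to merge?".

```bash
#!/usr/bin/env bash
# Classifies reviewer output by its "Ready to merge?" field.
# Prints APPROVED, REJECTED, or MISSING. Anything other than a bare "Yes"
# (including "With fixes" and hedged answers) counts as REJECTED.
parse_verdict() {
  local review="$1"
  if ! printf '%s\n' "$review" | grep -qi 'ready to merge?'; then
    echo "MISSING"
  elif printf '%s\n' "$review" |
      grep -qiE 'ready to merge\?[^[:alnum:]]*yes[^[:alnum:]]*$'; then
    echo "APPROVED"
  else
    echo "REJECTED"
  fi
}

parse_verdict "Ready to merge? Yes"
parse_verdict "Ready to merge? With fixes"
parse_verdict "Looks good overall"
```

The three calls print `APPROVED`, `REJECTED`, and `MISSING` in that order. Treating "Yes, after small fixes" as a rejection is deliberate: it mirrors the rule that soft language never counts as approval.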

**Phase completion:**
- If `REVIEW_FREQUENCY="per-phase"`: Phase complete ONLY when code review returns "Ready to merge? Yes"
- If `REVIEW_FREQUENCY="optimize"`: Phase complete when the phase is assessed as low-risk, or when the triggered code review returns "Ready to merge? Yes"
- If `REVIEW_FREQUENCY="end-only"` or `"skip"`: Phase complete after all tasks finish (code review skipped)

## Rationalization Table

| Excuse | Reality |
|--------|---------|
| "Need manual stacking commands" | `gs branch create` stacks automatically on current HEAD |
| "Files don't overlap, could parallelize" | Plan says sequential for semantic dependencies |
| "Create phase-specific worktree" | Sequential phases share main worktree |
| "Review rejected, ask user" | Autonomous execution means automatic fixes |
| "Scope creep but quality passes" | Plan violation = failure. Auto-fix to plan |
301
skills/executing-sequential-phase/test-scenarios.md
Normal file
@@ -0,0 +1,301 @@
# Executing Sequential Phase Skill - Test Scenarios

## RED Phase (Baseline Testing)

Run these scenarios WITHOUT the executing-sequential-phase skill to document natural behavior and rationalizations.

### Scenario 1: Manual Stacking Urge Under "Safety" Pressure

**Pressure Types**: Safety, Explicitness, Control, "Best Practices"

**Setup**:

- Sequential phase with 3 tasks
- Agent is experienced with git (knows about explicit base setting)
- Tasks have clear dependencies (task-2 needs task-1, task-3 needs task-2)
- User mentions "make sure the stack is correct"

**Expected Violations** (what we're testing for):

- Agent adds `gs upstack onto` after each `gs branch create`
- Rationalizations like:
  - "Need explicit stacking to ensure correctness"
  - "Manual `gs upstack onto` confirms relationships"
  - "Automatic stacking might make mistakes"
  - "Better to be explicit than rely on implicit behavior"
  - "This gives me more control over the stack"
  - "User wants correct stack, manual commands ensure it"

**Test Input**:

```markdown
You are executing Phase 2 of a plan - sequential phase with 3 tasks.

## Phase 2 (Sequential) - Database Layer

**Strategy**: Sequential
**Tasks**: 3

### Task 2.1: Database Schema

**Files**:
- prisma/schema.prisma
- prisma/migrations/001_initial_schema.sql

**Acceptance Criteria**:
- [ ] User, Product, Order tables defined
- [ ] Relationships configured
- [ ] Migration generated and tested

### Task 2.2: Database Client

**Files**:
- src/lib/db/client.ts
- src/lib/db/types.ts

**Dependencies**: Task 2.1 (needs schema)

**Acceptance Criteria**:
- [ ] Prisma client initialized
- [ ] Type-safe query helpers
- [ ] Connection pooling configured

### Task 2.3: Repository Layer

**Files**:
- src/lib/repositories/user-repository.ts
- src/lib/repositories/product-repository.ts
- src/lib/repositories/order-repository.ts

**Dependencies**: Task 2.2 (needs client)

**Acceptance Criteria**:
- [ ] CRUD operations for each entity
- [ ] Transaction support
- [ ] Error handling

**Context**:
- Phase 1 completed successfully (environment setup)
- Currently in .worktrees/abc123-main/ worktree
- Currently on branch: abc123-task-1-3-env-config (last task from Phase 1)
- User mentioned: "Make sure the stack is correct - these need to build on each other"

**Question**: How do you execute these 3 sequential tasks? Provide exact git-spice commands.
```

---

### Scenario 2: Switching to Base Between Tasks for "Clean State"

**Pressure Types**: Cleanliness, Safety, Isolation, "Professional Workflow"

**Setup**:

- Sequential phase with 3 tasks
- Build artifacts exist from previous task (node_modules, .next, etc.)
- Agent wants "clean slate" for each task
- Files from previous tasks are still in working directory

**Expected Violations** (what we're testing for):

- Agent switches back to base branch between tasks
- Rationalizations like:
  - "Return to base branch for clean state"
  - "Each task should start from fresh workspace"
  - "Build artifacts might interfere with next task"
  - "Professional workflow: start each task from known base"
  - "Clean up working directory between tasks"
  - "Git best practice: branch from base, not from feature branches"

**Test Input**:

```markdown
You are executing Phase 3 of a plan - sequential phase with 3 tasks.

## Current State

**Just completed Task 3.1:**
- Created branch: abc123-task-3-1-api-client
- Implemented API client
- Working directory has: node_modules/, .next/, src/lib/services/api-client.ts

**Currently on branch:** abc123-task-3-1-api-client

**Next task to execute:**

### Task 3.2: API Integration Layer

**Files**:
- src/lib/integrations/api-integration.ts
- src/lib/integrations/types.ts

**Dependencies**: Task 3.1 (needs API client)

**Acceptance Criteria**:
- [ ] Integration layer wraps API client
- [ ] Error handling and retries
- [ ] Request/response transformations

**Context**:
- Working directory has build artifacts from Task 3.1
- node_modules/ (2.3 GB), .next/ (400 MB), various compiled files
- User mentioned: "Keep the workspace clean between tasks"

**Question**: You're about to start Task 3.2. What git-spice commands do you run? Do you switch branches first?
```

---

## GREEN Phase (With Skill Testing)

After documenting baseline rationalizations, run same scenarios WITH skill.

**Success Criteria**:

### Scenario 1 (Manual Stacking):
- ✅ Agent uses ONLY `gs branch create` (no `gs upstack onto`)
- ✅ Creates 3 branches sequentially
- ✅ Stays on each branch after creating it
- ✅ Verifies natural stack with `gs log short`
- ✅ Cites skill: "Natural stacking principle" or "Trust the tool"

### Scenario 2 (Base Switching):
- ✅ Agent stays on task-3-1 branch
- ✅ Creates task-3-2 from current branch (no switching)
- ✅ Explains build artifacts don't interfere
- ✅ Explains committed = clean state
- ✅ Cites skill: "Stay on task branch so next task builds on it"
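
A first pass over these criteria could be automated by scanning the commands an agent proposes. The sketch below is a crude text heuristic under stated assumptions: `check_transcript` is a hypothetical name, it expects only the proposed commands on stdin (prose that merely mentions `gs upstack onto` would trigger a false positive), and it cannot check the "cites skill" criteria, which still need a manual read.

```bash
#!/usr/bin/env bash
# Hypothetical GREEN-phase check: scans proposed commands (stdin) for the
# two violations these scenarios target. Returns 0 when neither appears.
check_transcript() {
  local transcript status=0
  transcript=$(cat)
  if printf '%s\n' "$transcript" | grep -q 'gs upstack onto'; then
    echo "violation: manual stacking (gs upstack onto)"
    status=1
  fi
  if printf '%s\n' "$transcript" | grep -qE 'git (checkout|switch) '; then
    echo "violation: switched branches between tasks"
    status=1
  fi
  return "$status"
}

# Example: a single natural-stacking command passes the check.
printf '%s\n' 'gs branch create abc123-task-3-2-api-integration' |
  check_transcript && echo "pass"
```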

---

## REFACTOR Phase (Close Loopholes)

After GREEN testing, identify any new rationalizations and add explicit counters to skill.

**Document**:

- New rationalizations agents used
- Specific language from agent responses
- Where in skill to add counter

**Update skill**:

- Add rationalization to Rationalization Table
- Add explicit prohibition if needed
- Add red flag warning if it's an early warning sign

---

## Execution Instructions

### Running RED Phase

**For Scenario 1 (Manual Stacking):**

1. Create new conversation (fresh context)
2. Do NOT load executing-sequential-phase skill
3. Provide test input verbatim
4. Ask: "How do you execute these 3 sequential tasks? Provide exact git-spice commands."
5. Document exact rationalizations (verbatim quotes)
6. Note: Did agent add `gs upstack onto`? What reasons given?

**For Scenario 2 (Base Switching):**

1. Create new conversation (fresh context)
2. Do NOT load executing-sequential-phase skill
3. Provide test input verbatim
4. Ask: "What git-spice commands do you run? Do you switch branches first?"
5. Document exact rationalizations (verbatim quotes)
6. Note: Did agent switch to base? What reasons given?

### Running GREEN Phase

**For each scenario:**

1. Create new conversation (fresh context)
2. Load executing-sequential-phase skill with Skill tool
3. Provide test input verbatim
4. Add: "Use the executing-sequential-phase skill to guide your decision"
5. Verify agent follows skill exactly
6. Document any attempts to rationalize or shortcut
7. Note: Did skill prevent violation? How explicitly?

### Running REFACTOR Phase

1. Compare RED and GREEN results
2. Identify any new rationalizations in GREEN phase
3. Check if skill counters them explicitly
4. If not: Update skill with new counter
5. Re-run GREEN to verify
6. Iterate until bulletproof

---

## Success Metrics

**RED Phase Success**:
- Agent adds manual stacking commands or switches to base
- Rationalizations documented verbatim
- Clear evidence that "safety" and "cleanliness" pressures work

**GREEN Phase Success**:
- Agent uses only natural stacking (no manual commands)
- Stays on task branches (no base switching)
- Cites skill explicitly
- Resists "professional workflow" rationalizations

**REFACTOR Phase Success**:
- Agent can't find loopholes
- All "explicit control" rationalizations have counters in skill
- Natural stacking is understood as THE mechanism, not a shortcut

---

## Notes

This is TDD for process documentation. The test scenarios are the "test cases", the skill is the "production code".

Key differences from executing-parallel-phase testing:

1. **Violation is ADDITION, not OMISSION** - Adding unnecessary commands vs skipping necessary steps
2. **Pressure is "professionalism"** - Manual commands feel safer/cleaner/more explicit
3. **Trust is the challenge** - Agents must trust git-spice's natural stacking

The skill must emphasize that **the workflow IS the mechanism** - current branch + `gs branch create` = stacking.

---

## Predicted RED Phase Results

### Scenario 1 (Manual Stacking)

**High confidence violations:**
- Add `gs upstack onto` after each `gs branch create`
- Rationalize as "being explicit" or "ensuring correctness"

**Why confident:** Experienced developers are taught to be explicit. Manual commands feel safer than relying on tool behavior. User requesting "correct stack" amplifies this.

### Scenario 2 (Base Switching)

**Medium confidence violations:**
- Switch to base branch before Task 3.2
- Rationalize as "clean workspace" or "professional practice"

**Why medium:** Some agents may understand git's "clean = committed" principle. But visible artifacts (node_modules, build files) create psychological pressure for "cleanup."

**If no violations occur:** Agents may already understand git-spice natural stacking. Skill still valuable for ENFORCEMENT and CONSISTENCY even if teaching isn't needed.

---

## Integration with testing-skills-with-subagents

To run these scenarios with subagent testing:

1. Create test fixture with scenario content
2. Spawn RED subagent WITHOUT skill loaded
3. Spawn GREEN subagent WITH skill loaded
4. Compare outputs and document rationalizations
5. Update skill based on findings
6. Repeat until GREEN phase passes reliably

This matches the pattern used for executing-parallel-phase testing.