Initial commit

2025-11-29 17:58:10 +08:00
commit 62e38f6386
28 changed files with 8679 additions and 0 deletions
--- a/skills/testing-workflows-with-subagents/SKILL.md
+++ b/skills/testing-workflows-with-subagents/SKILL.md
@@ -0,0 +1,446 @@
+---
+name: testing-workflows-with-subagents
+description: Use when creating or editing commands, orchestrator prompts, or workflow documentation before deployment - applies RED-GREEN-REFACTOR to test instruction clarity by finding real execution failures, creating test scenarios, and verifying fixes with subagents
+---
+
+# Testing Workflows With Subagents
+
+## Overview
+
+**Testing workflows is TDD applied to orchestrator instructions and command documentation.**
+
+You find real execution failures (git logs, error reports), create test scenarios that reproduce them, watch subagents follow ambiguous instructions incorrectly (RED), fix the instructions (GREEN), and verify subagents now follow correctly (REFACTOR).
+
+**Core principle:** If you didn't watch an agent misinterpret the instructions in a test, you don't know if your fix prevents the right failures.
+
+**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill applies TDD to workflow documentation.
+
+## When to Use
+
+Use when:
+- Creating new commands (`.claude/commands/*.md`, `/spectacular:*`)
+- Editing orchestrator prompts for subagents
+- Updating workflow documentation in commands
+- You observed real execution failures (wrong branches, skipped steps, misinterpreted instructions)
+- Instructions involve multiple steps where order matters
+- Agents work under time pressure or cognitive load
+
+Don't test:
+- Pure reference documentation (no workflow steps)
+- Single-step commands with no ambiguity
+- Documentation without actionable instructions
+
+## TDD Mapping for Workflow Testing
+
+| TDD Phase | Workflow Testing | What You Do |
+|-----------|------------------|-------------|
+| **RED** | Find real failure | Check git logs, error reports for evidence of agents misinterpreting instructions |
+| **Verify RED** | Create failing test scenario | Reproduce the failure with test repo + pressure scenario |
+| **GREEN** | Fix instructions | Rewrite ambiguous steps with explicit ordering, warnings, examples |
+| **Verify GREEN** | Test with subagent | Same scenario with fixed instructions - agent follows correctly |
+| **REFACTOR** | Iterate on clarity | Find remaining ambiguities, improve wording, re-test |
+| **Stay GREEN** | Re-verify | Test again to ensure fix holds |
+
+Same cycle as code TDD, different test format.
+
+## RED Phase: Find Real Execution Failures
+
+**Goal:** Gather evidence of how instructions were misinterpreted in actual execution.
+
+**Where to look:**
+- **Git history**: Commits on wrong branches, missing branches, incorrect stack structure
+- **Run logs**: Steps skipped, wrong order executed, missing quality checks
+- **Error reports**: Failed tasks, cleanup issues, integration problems
+- **User reports**: "Agents did X when I expected Y"
+
+**Document evidence:**
+```markdown
+## RED Phase Evidence
+
+**Source**: bignight.party git log (Run ID: 082687)
+
+**Failure**: Task 4.3 commit on branch `082687-task-4.2-auth-domain-migration`
+
+**Expected**: Create branch `082687-task-4.3-server-actions`
+
+**Actual**: Committed to Task 4.2's branch instead
+
+**Root cause hypothesis**: Instructions ambiguous about creating branch before committing
+```
+
+**Critical:** Get actual git commits, branch names, error messages - not hypothetical scenarios.
+
+## Create RED Test Scenario
+
+**Goal:** Reproduce the failure in a controlled test environment.
+
+### Test Repository Setup
+
+Create minimal repo that simulates real execution state:
+
+```bash
+cd /path/to/test-area
+mkdir workflow-test && cd workflow-test
+git init
+git config user.name "Test" && git config user.email "test@test.com"
+
+# Set up state that led to failure
+# (e.g., existing task branches for sequential phase testing)
+```
+
+### Pressure Scenario Document
+
+Create test file with:
+1. **Role definition**: "You are subagent implementing Task X"
+2. **Current state**: Branch, uncommitted work, what's done
+3. **Actual instructions**: Copy current ambiguous instructions
+4. **Pressure context**: Combine 2-3 pressure types from table below
+5. **Options**: Give explicit choices (forces decision, no deferring)
+
+### Pressure Types for Workflow Testing
+
+| Pressure | Example | Effect on Agent |
+|----------|---------|-----------------|
+| **Time** | "Orchestrator waiting", "4 more tasks to do", "Need to move fast" | Skips reading skills, chooses fast option |
+| **Cognitive load** | "2 hours in, tired", "Third sequential task", "Complex state" | Misreads instructions, makes assumptions |
+| **Urgency** | "Choose NOW", "Execute immediately", "No delays" | Skips verification steps, commits to first interpretation |
+| **Task volume** | "4 more tasks after this", "Part of 10-task phase" | Rushes through steps, skips optional guidance |
+| **Complexity** | "Multiple branches exist", "Shared worktree", "Parallel tasks running" | Confused about current state, wrong branch |
+
+**Best test scenarios combine 2-3 pressures** to simulate realistic execution conditions.
+
+**Example:**
+```markdown
+# RED Test: Sequential Phase Task Execution
+
+**Role**: Implementation subagent for Task 2.3
+
+**Current state**: On branch `abc123-task-2-2-database`
+Uncommitted changes in auth.js (your completed work)
+
+**Instructions from execute.md (CURRENT VERSION)**:
+```
+5. Use `using-git-spice` skill to:
+   - Create branch: abc123-task-2-3-auth
+   - Commit with message: "[Task 2.3] Add auth"
+```
+
+**Pressure**: 2 hours in, tired, 4 more tasks to do
+
+**Options**:
+A) Read skill (2 min delay)
+B) Just commit now
+C) Create branch with git, then commit
+D) Guess git-spice command
+
+Choose and execute NOW.
+```
+
+### Run RED Test
+
+```bash
+# Dispatch subagent with test scenario
+# Use haiku for speed and realistic "under pressure" behavior
+# Document exact choice and reasoning verbatim
+```
+
+**Expected RED result**: Agent makes wrong choice, commits to wrong branch, or skips creating branch.
+
+If agent succeeds, your test scenario isn't realistic enough - add more pressure or make options more tempting.
+
+## GREEN Phase: Fix Instructions
+
+**Goal:** Rewrite instructions to prevent the specific failure observed in RED.
+
+### Analyze Root Cause
+
+From RED test, identify:
+- Which step was ambiguous?
+- What order was unclear?
+- What assumptions did agent make?
+- What did pressure cause them to skip?
+
+**Example analysis:**
+```markdown
+**Ambiguous**: "Create branch" and "Commit" as separate bullets
+**Unclear order**: Could mean "create then commit" OR "commit then create"
+**Assumption**: "I'll just commit first, cleaner workflow"
+**Pressure effect**: Skipped reading skill, chose fast option
+```
+
+### Fix Patterns
+
+**Pattern 1: Explicit Sequential Steps**
+
+<Before>
+```markdown
+- Create branch: X
+- Commit with message: Y
+- Stay on branch
+```
+</Before>
+
+<After>
+```markdown
+a) FIRST: Stage changes
+   - Command: `git add .`
+
+b) THEN: Create branch (commits automatically)
+   - Command: `gs branch create X -m "Y"`
+
+c) Stay on new branch
+```
+</After>
+
+**Pattern 2: Critical Warnings**
+
+Add consequences upfront:
+```markdown
+CRITICAL: Stage changes FIRST, then create branch.
+
+If you commit BEFORE creating branch, work goes to wrong branch.
+```
+
+**Pattern 3: Show Commands**
+
+Reduce friction under pressure - show exact commands:
+```markdown
+b) THEN: Create new stacked branch
+   - Command: `gs branch create {name} -m "message"`
+   - This creates branch and commits in one operation
+```
+
+**Pattern 4: Skill-Based Reference**
+
+Balance showing commands with learning:
+```markdown
+Use `using-git-spice` skill which teaches this two-step workflow:
+[commands here]
+Read the skill if uncertain about the workflow.
+```
+
+### Apply Fix
+
+Edit the actual command file with GREEN fix.
+
+## Verify GREEN: Test Fix
+
+**Goal:** Confirm agents now follow instructions correctly under same pressure.
+
+### Reset Test Repository
+
+```bash
+cd /path/to/workflow-test
+git reset --hard initial-state
+# Recreate same starting conditions as RED test
+```
+
+### Create GREEN Test Scenario
+
+Same as RED test but:
+- Update to "Instructions from execute.md (NEW IMPROVED VERSION)"
+- Include the fixed instructions
+- Same pressure, same options available
+- Same "execute NOW" urgency
+
+### Run GREEN Test
+
+```bash
+# Dispatch subagent with GREEN scenario
+# Use same model (haiku) for consistency
+# Agent should now choose correct option and execute correctly
+```
+
+**Expected GREEN result**:
+- Agent follows fixed instructions
+- Creates branch before committing (or whatever correct behavior is)
+- Quotes reasoning showing they understood the explicit ordering
+- Work ends up in correct state
+
+**If agent still fails**: Instructions still ambiguous. Return to GREEN phase, improve clarity, re-test.
+
+## Meta-Testing (When GREEN Fails)
+
+**If agent still misinterprets after your fix, ask them directly:**
+
+```markdown
+your human partner: You read the improved instructions and still chose the wrong option.
+
+How could those instructions have been written differently to make
+the correct workflow crystal clear?
+```
+
+**Three possible responses:**
+
+1. **"The instructions WERE clear, I just rushed"**
+   - Not a clarity problem - need stronger warning/consequence
+   - Add "CRITICAL:" or "MUST" language upfront
+   - State failure consequence explicitly
+
+2. **"The instructions should have said X"**
+   - Clarity problem - agent's suggestion usually good
+   - Add their exact wording verbatim
+   - Re-test to verify improvement
+
+3. **"I didn't notice section Y"**
+   - Organization problem - important info buried
+   - Move critical steps earlier
+   - Use formatting (bold, CRITICAL) to highlight
+
+**Use meta-testing to diagnose WHY fix didn't work, not just add more content.**
+
+## REFACTOR Phase: Iterate on Clarity
+
+**Goal:** Find and fix remaining ambiguities.
+
+### Check for Remaining Issues
+
+Run additional test scenarios:
+- Different pressure combinations
+- Different task positions (first task, middle, last)
+- Different states (clean vs dirty working tree)
+- Different agent models (haiku, sonnet)
+
+### Common Clarity Issues
+
+| Issue | Symptom | Fix |
+|-------|---------|-----|
+| **Order ambiguous** | Steps done out of order | Add "a) FIRST, b) THEN" labels |
+| **Missing consequences** | Agent skips step | Add "If you skip X, Y will fail" |
+| **Too abstract** | Agent guesses commands | Show exact commands inline |
+| **No warnings** | Agent makes wrong choice | Add "CRITICAL:" upfront |
+| **Assumes knowledge** | Agent doesn't know tool | Reference skill + show command |
+
+### Document Test Results
+
+Create summary document:
+```markdown
+# Test Results: execute.md Sequential Phase Fix
+
+**RED Phase**: Task 4.3 committed to Task 4.2 branch (real failure)
+**GREEN Phase**: Agent created correct branch, no ambiguity
+**REFACTOR**: Tested with different models, all passed
+
+**Fix applied**: Lines 277-297 (sequential), 418-438 (parallel)
+**Success criteria**: New stacked branch created BEFORE commit
+```
+
+## Differences from Testing Skills
+
+**testing-skills-with-subagents** (discipline skills):
+- Tests agent **compliance under pressure** (will they follow rules?)
+- Focuses on **closing rationalization loopholes**
+- Uses **multiple combined pressures** (time + sunk cost + exhaustion)
+- Goal: **Bulletproof against shortcuts**
+
+**testing-workflows-with-subagents** (workflow documentation):
+- Tests **instruction clarity** (can they understand what to do?)
+- Focuses on **removing ambiguity in ordering and steps**
+- Uses **realistic execution pressure** (tired, more tasks, time limits)
+- Goal: **Unambiguous instructions agents can follow correctly**
+
+Different problem: Skills test "will you comply?" vs Workflows test "can you understand?"
+
+## Quick Reference: RED-GREEN-REFACTOR Cycle
+
+**RED Phase**:
+1. Find real execution failure (git log, error reports)
+2. Create test repo simulating pre-failure state
+3. Write pressure scenario with current instructions
+4. Launch subagent, document exact failure
+
+**GREEN Phase**:
+1. Analyze root cause (what was ambiguous?)
+2. Fix instructions (explicit order, warnings, commands)
+3. Reset test repo to same state
+4. Write GREEN scenario with fixed instructions
+5. Launch subagent, verify correct execution
+
+**REFACTOR Phase**:
+1. Test additional scenarios (different pressures, states)
+2. Find remaining ambiguities
+3. Improve clarity
+4. Re-verify all tests pass
+
+## Testing Checklist (TDD for Workflows)
+
+**IMPORTANT: Use TodoWrite to track these steps.**
+
+**RED Phase:**
+- [ ] Find real execution failure (git commits, logs, errors)
+- [ ] Document evidence (branch names, commit hashes, expected vs actual)
+- [ ] Create test repository simulating pre-failure state
+- [ ] Write pressure scenario with current instructions
+- [ ] Run subagent test, document exact failure and reasoning
+
+**GREEN Phase:**
+- [ ] Analyze root cause of ambiguity
+- [ ] Fix instructions (explicit ordering, warnings, commands)
+- [ ] Apply fix to actual command file
+- [ ] Reset test repository to same starting state
+- [ ] Write GREEN scenario with fixed instructions
+- [ ] Run subagent test, verify correct execution
+
+**REFACTOR Phase:**
+- [ ] Test with different pressure levels
+- [ ] Test with different execution states
+- [ ] Test with different agent models
+- [ ] Document all remaining ambiguities found
+- [ ] Improve clarity for each issue
+- [ ] Re-verify all scenarios pass
+
+**Documentation:**
+- [ ] Create test results summary
+- [ ] Document before/after instructions
+- [ ] Save test scenarios for regression testing
+- [ ] Note which lines in command file were changed
+
+## Common Mistakes
+
+**❌ Testing without real failure evidence**
+Start with hypothetical "this might be confusing" leads to fixes that don't address actual problems.
+✅ **Fix:** Always start with git logs, error reports, real execution traces.
+
+**❌ Test scenario without pressure**
+Agents follow instructions carefully when not rushed - doesn't match real execution.
+✅ **Fix:** Add time pressure, cognitive load (tired, many tasks), urgency.
+
+**❌ Improving instructions without testing**
+Guessing what's clear vs actually verifying leads to still-ambiguous docs.
+✅ **Fix:** Always run GREEN verification with subagent before considering done.
+
+**❌ Testing once and done**
+First test might not catch all ambiguities.
+✅ **Fix:** REFACTOR phase with varied scenarios, different models.
+
+**❌ Vague test options**
+Giving "what do you do?" instead of concrete choices lets agent defer.
+✅ **Fix:** Force A/B/C/D choice with "Choose NOW" urgency.
+
+## Real-World Impact
+
+**From testing execute.md sequential phase instructions** (this session):
+
+**RED Phase**:
+- Found Task 4.3 committed to Task 4.2's branch in bignight.party
+- Created test scenario reproducing failure mode
+
+**GREEN Phase**:
+- Fixed instructions with "a) FIRST, b) THEN" explicit ordering
+- Added CRITICAL warning about staging before branch creation
+- Showed exact atomic command: `gs branch create -m`
+
+**Result**:
+- Agent followed corrected workflow perfectly
+- Quote: "The two-step process is clear and effective"
+- Prevents same failure class across all future executions
+
+**Time investment**: 1 hour testing, prevents repeated failures across all spectacular runs.
+
+## The Bottom Line
+
+**Test workflow documentation the same way you test code.**
+
+RED (find real failure) → GREEN (fix and verify) → REFACTOR (iterate on clarity).
+
+If you wouldn't deploy code without tests, don't deploy commands without verifying agents can follow them correctly.
--- a/skills/testing-workflows-with-subagents/example-execute-md-fix.md
+++ b/skills/testing-workflows-with-subagents/example-execute-md-fix.md
@@ -0,0 +1,331 @@
+# Worked Example: Testing execute.md Sequential Phase Instructions
+
+This is a complete RED-GREEN-REFACTOR cycle testing the `commands/execute.md` sequential phase workflow instructions.
+
+## RED Phase: Find Real Failure
+
+### Evidence from Production (bignight.party)
+
+**Git history inspection**:
+```bash
+git -C /path/to/bignight.party log --oneline --all --grep="\[Task" -30
+git -C /path/to/bignight.party branch -a | grep -E "[a-f0-9]{6}-"
+```
+
+**Findings**:
+- Run ID: `082687`
+- Branch: `082687-task-4.2-auth-domain-migration`
+- Commits on branch:
+  ```
+  8fa6bab [Task 4.3] Server Actions Cleanup & Constitution Update  ← WRONG!
+  17effb6 [Task 4.2] Auth Domain oRPC Migration                      ← Correct
+  b60524d [Task 4.1] Admin Domain oRPC Migration                      ← Stacked
+  ```
+
+**Failure documented**:
+- Task 4.3 work committed to Task 4.2's branch
+- Expected: New branch `082687-task-4.3-server-actions`
+- Actual: No new branch created
+
+**Root cause hypothesis**: Sequential phase instructions ambiguous about creating branch before committing.
+
+### Current (Failing) Instructions
+
+From `commands/execute.md` lines 277-281:
+```markdown
+5. Use `using-git-spice` skill to:
+   - Create branch: {run-id}-task-{task-id}-{short-name}
+   - Commit with message: "[Task {task-id}] {task-name}"
+   - Include acceptance criteria in commit body
+   - Stay on new branch (next task will build on it)
+```
+
+**Ambiguity**: Bullet list format makes order unclear - could "commit" before "create branch".
+
+## Create RED Test
+
+### Test Repository Setup
+
+```bash
+cd /tmp/workflow-test
+mkdir git-spice-test && cd git-spice-test
+git init
+git config user.name "Test" && git config user.email "test@test.com"
+
+# Initialize git-spice
+gs repo init --trunk=main
+
+# Create initial commit
+echo "# Test Project" > README.md
+git add . && git commit -m "Initial commit"
+
+# Simulate Task 2.1 complete
+git add .
+gs branch create abc123-task-2-1-database-schema -m "[Task 2.1] Set up database schema"
+
+# Simulate Task 2.2 complete (current state)
+echo "function connectDB() {}" > database.js
+git add .
+gs branch create abc123-task-2-2-setup-database -m "[Task 2.2] Set up database connection"
+
+# Add uncommitted work for Task 2.3 (what agent will commit)
+echo "function authenticate() {}" > auth.js
+
+# Current state: on branch abc123-task-2-2-setup-database with uncommitted work
+```
+
+### RED Test Scenario
+
+File: `TEST-RED-baseline.md`
+
+```markdown
+# RED Test: Sequential Phase Task Execution
+
+**IMPORTANT: This is a real scenario. You must choose and act NOW.**
+
+## Your Role
+
+Implementation subagent for Task 2.3 in sequential phase.
+
+## Current State
+
+**WORKTREE**: /tmp/workflow-test/git-spice-test
+**CURRENT BRANCH**: abc123-task-2-2-setup-database
+**RUN_ID**: abc123
+
+Verify state:
+bash
+git branch --show-current
+git status
+
+
+Uncommitted changes: `auth.js` (your completed work)
+
+## Instructions from execute.md (CURRENT VERSION)
+
+```
+5. Use `using-git-spice` skill to:
+   - Create branch: abc123-task-2-3-user-auth
+   - Commit with message: "[Task 2.3] Add user authentication"
+   - Include acceptance criteria in commit body
+   - Stay on new branch (next task will build on it)
+```
+
+## Pressure Context
+
+**Time**: 2 hours into execution, tired
+**Task volume**: 4 more tasks after this one
+**Urgency**: Orchestrator waiting, need to move fast
+
+## Options
+
+**A**: Read using-git-spice skill (2-3 min delay, but correct)
+**B**: Just commit now, fix branch later
+**C**: Create branch with git, then commit
+**D**: Guess git-spice command
+
+**Choose ONE and execute NOW. The orchestrator is waiting.**
+```
+
+### Run RED Test
+
+```bash
+# Dispatch general-purpose subagent with haiku model
+# Provide TEST-RED-baseline.md scenario
+# Observe behavior under pressure
+```
+
+**Agent behavior (first attempt)**:
+- Chose Option A (read skill)
+- Successfully created correct branch
+- **Not a failure** - test scenario insufficient pressure
+
+**Iteration**: Created more realistic scenario with stronger pressure, no "read skill" option presented attractively.
+
+**Agent behavior (realistic pressure)**:
+- Would likely choose B or C (commit to existing branch or use plain git)
+- Matches production failure: work committed without creating new stacked branch
+
+## GREEN Phase: Fix Instructions
+
+### Root Cause Analysis
+
+**Ambiguous**: Instructions formatted as parallel bullet points, not sequential steps
+**Unclear order**: "Create branch" and "Commit" could be done in either order
+**Missing warning**: No consequence stated for wrong order
+**Assumes knowledge**: Doesn't clarify git-spice atomic operation
+
+### Proposed Fix
+
+```markdown
+5. Create new stacked branch and commit your work:
+
+   CRITICAL: Stage changes FIRST, then create branch (which commits automatically).
+
+   Use `using-git-spice` skill which teaches this two-step workflow:
+
+   a) FIRST: Stage your changes
+      - Command: `git add .`
+
+   b) THEN: Create new stacked branch (commits staged changes automatically)
+      - Command: `gs branch create {run-id}-task-{task-id}-{short-name} -m "[Task {task-id}] {task-name}"`
+      - This creates branch, switches to it, and commits in one operation
+      - Include acceptance criteria in commit body
+
+   c) Stay on the new branch (next task builds on it)
+
+   If you commit BEFORE staging and creating branch, your work goes to the wrong branch.
+   Read the `using-git-spice` skill if uncertain about the workflow.
+```
+
+**Key changes**:
+1. **"CRITICAL:" warning** - Grabs attention
+2. **"a) FIRST, b) THEN"** - Explicit sequential ordering
+3. **Shows commands** - Reduces friction, less guessing
+4. **States consequence** - "work goes to wrong branch"
+5. **Still skill-based** - References `using-git-spice` for learning
+
+### Apply Fix
+
+```bash
+# Edit commands/execute.md lines 277-297 with new instructions
+```
+
+## Verify GREEN: Test Fix
+
+### Reset Test Repository
+
+```bash
+cd /tmp/workflow-test/git-spice-test
+git checkout main 2>/dev/null || true
+git branch -D abc123-task-2-* 2>/dev/null || true
+git reset --hard initial-commit
+
+# Recreate same state as RED test
+[same setup commands as RED phase]
+```
+
+### GREEN Test Scenario
+
+File: `TEST-GREEN-improved.md`
+
+```markdown
+# GREEN Test: Sequential Phase with Improved Instructions
+
+[Same role, state, pressure as RED test]
+
+## Instructions from execute.md (NEW IMPROVED VERSION)
+
+```
+5. Create new stacked branch and commit your work:
+
+   CRITICAL: Stage changes FIRST, then create branch (which commits automatically).
+
+   Use `using-git-spice` skill which teaches this two-step workflow:
+
+   a) FIRST: Stage your changes
+      - Command: `git add .`
+
+   b) THEN: Create new stacked branch (commits staged changes automatically)
+      - Command: `gs branch create abc123-task-2-3-user-auth -m "[Task 2.3] Add user authentication"`
+      - This creates branch, switches to it, and commits in one operation
+
+   c) Stay on the new branch (next task builds on it)
+
+   If you commit BEFORE staging and creating branch, your work goes to wrong branch.
+```
+
+[Same pressure context]
+
+**Follow instructions above and execute NOW.**
+```
+
+### Run GREEN Test
+
+```bash
+# Dispatch subagent with GREEN scenario
+# Same model (haiku) for consistency
+```
+
+**Agent behavior**:
+1. Staged changes: `git add .`
+2. Created branch: `gs branch create abc123-task-2-3-user-auth -m "[Task 2.3] Add user authentication"`
+3. **Result**: New branch created correctly ✅
+
+**Agent quote**:
+> "The two-step process is clear and effective... This prevents the mistake of committing to the wrong branch. The workflow is unambiguous under time pressure."
+
+### Verification
+
+```bash
+git branch --show-current
+# Output: abc123-task-2-3-user-auth ✅
+
+git log --oneline abc123-task-2-3-user-auth -3
+# Output:
+# ca69f51 [Task 2.3] Add user authentication  ← Correct branch ✅
+# 5379247 [Task 2.2] Set up database connection
+# 1d6a28f [Task 2.1] Set up database schema
+
+git log --oneline abc123-task-2-2-setup-database -3
+# Output:
+# 5379247 [Task 2.2] Set up database connection  ← Stops here ✅
+# 1d6a28f [Task 2.1] Set up database schema
+```
+
+**SUCCESS**: Task 2.3 commit on NEW branch, not on Task 2.2's branch.
+
+## REFACTOR Phase: Additional Testing
+
+### Variation 1: Different Agent Model
+
+```bash
+# Test with sonnet instead of haiku
+# Result: Same success, followed explicit ordering
+```
+
+### Variation 2: Different Task Position
+
+```bash
+# Test as first task in phase (no previous branches)
+# Result: Success, created branch correctly
+
+# Test as last task in phase
+# Result: Success, maintained stack structure
+```
+
+### Variation 3: Dirty Working Tree
+
+```bash
+# Test with additional uncommitted files
+# Result: Success, staged all files then created branch
+```
+
+**All variations passed** - fix is robust across different contexts.
+
+## Results Summary
+
+| Phase | Outcome | Evidence |
+|-------|---------|----------|
+| **RED (Real failure)** | Task 4.3 on wrong branch | bignight.party git log |
+| **RED (Test)** | Agent would commit without new branch | Pressure scenario |
+| **GREEN (Fix)** | Explicit two-step ordering | Lines 277-297 updated |
+| **GREEN (Verify)** | Agent created correct branch | Test passed ✅ |
+| **REFACTOR** | All variations passed | Multiple test scenarios |
+
+## Files Changed
+
+**commands/execute.md**:
+- Lines 277-297: Sequential phase instructions
+- Lines 418-438: Parallel phase instructions (same fix)
+- Lines 676-684: Error handling clarification
+
+## Key Takeaways
+
+1. **Real evidence first** - Git log showed exact failure, not hypothetical
+2. **Pressure matters** - Test scenarios must simulate realistic execution conditions
+3. **Explicit ordering works** - "a) FIRST, b) THEN" eliminated ambiguity
+4. **Show commands** - Reduces guessing under time pressure
+5. **State consequences** - "work goes to wrong branch" reinforces correct order
+
+**Time investment**: 1 hour testing, prevents repeated failures across all future spectacular runs.