Initial commit

2025-11-29 17:58:10 +08:00
commit 62e38f6386
28 changed files with 8679 additions and 0 deletions
--- a/skills/decomposing-tasks/SKILL.md
+++ b/skills/decomposing-tasks/SKILL.md
@@ -0,0 +1,552 @@
+---
+name: decomposing-tasks
+description: Use when you have a complete feature spec and need to plan implementation - analyzes task dependencies, groups into sequential/parallel phases, validates task quality (no XL tasks, explicit files), and calculates parallelization time savings
+---
+
+# Task Decomposition
+
+Analyze a feature specification and decompose it into an execution-ready plan with automatic phase grouping based on file dependencies.
+
+**When to use:** After completing a feature spec, before implementation.
+
+**Announce:** "I'm using the Task Decomposition skill to create an execution plan."
+
+## Overview
+
+This skill transforms a feature specification into a structured implementation plan by:
+
+1. Extracting tasks from spec
+2. Analyzing file dependencies between tasks
+3. Grouping into phases (sequential or parallel)
+4. Validating task quality
+5. Outputting executable plan.md
+
+## PR-Sized Chunks Philosophy
+
+**Tasks should be PR-sized, thematically coherent units** - not mechanical file-by-file splits.
+
+**Think like a senior engineer:**
+
+- ❌ "Add schema" + "Install dependency" + "Add routes" (3 tiny tasks)
+- ✅ "Database Foundation" (schema + migration + dependencies as one unit)
+
+**Task chunking principles:**
+
+1. **Thematic Coherence** - Task represents a complete "thing"
+
+   - Complete subsystem (agent system with tools + config + types)
+   - Complete layer (all service methods for a feature)
+   - Complete feature slice (UI flow from form to preview to confirm)
+
+2. **Natural PR Size** - Reviewable in one sitting (3-7h)
+
+   - M (3-5h): Sweet spot for most tasks
+   - L (5-7h): Complex but coherent units (full UI layer, complete API surface)
+   - S (1-2h): Rare - only for truly standalone work
+
+3. **Logical Boundaries** - Clear separation points
+
+   - Layer boundaries (Models, Services, Actions, UI)
+   - Subsystem boundaries (Agent, Import Service, API)
+   - Feature boundaries (Auth, Import, Dashboard)
+
+4. **Stackable** - Dependencies flow cleanly
+   - Database → Logic → API → UI
+   - Foundation → Core → Integration
+
+**Good chunking examples:**
+
+```
+✅ GOOD: PR-sized, thematic chunks
+- Task 1: Database Foundation (M - 4h)
+  - Schema changes + migration + dependency install
+  - One coherent "foundation" PR
+
+- Task 2: Agent System (L - 6h)
+  - Agent config + tools + schemas + types
+  - Complete agent subsystem as a unit
+
+- Task 3: Import Service Layer (M - 4h)
+  - All service methods + business logic
+  - Clean layer boundary
+
+- Task 4: API Surface (L - 6h)
+  - Server actions + SSE route
+  - Complete API interface
+
+- Task 5: Import UI (L - 7h)
+  - All components + page + integration
+  - Complete user-facing feature
+
+Total: 5 tasks, 27h
+Each task is a reviewable PR that adds value
+```
+
+```
+❌ BAD: Too granular, mechanical splits
+- Task 1: Add schema fields (S - 2h)
+- Task 2: Create migration (S - 1h)
+- Task 3: Install dependency (S - 1h)
+- Task 4: Create agent config (M - 3h)
+- Task 5: Create fetch tool (S - 1h)
+- Task 6: Create schemas (S - 2h)
+- Task 7: Create service (M - 4h)
+- Task 8: Create actions (M - 3h)
+- Task 9: Create SSE route (M - 3h)
+- Task 10: Create form component (S - 2h)
+- Task 11: Create progress component (S - 2h)
+- Task 12: Create preview component (S - 2h)
+- Task 13: Add routes (S - 1h)
+- Task 14: Integrate components (S - 1h)
+
+Total: 14 tasks, 28h
+Too many tiny PRs, no coherent units
+```
+
+**Bundling heuristics:**
+
+If you're creating S tasks, ask:
+
+- Can this bundle with a related M task?
+- Does this complete a subsystem or layer?
+- Would a senior engineer create a separate PR for this?
+
+**Common bundling patterns:**
+
+- Schema + migration + dependencies → "Database Foundation"
+- Agent + tools + schemas → "Agent System"
+- Service + helper functions → "Service Layer"
+- Actions + API routes → "API Layer"
+- All UI components for a flow → "UI Layer"
+
+## The Process
+
+### Step 1: Read Spec and Extract/Design Tasks
+
+Read the spec file and extract tasks. The spec may provide tasks in two ways:
+
+**Option A: Spec has "Implementation Plan" section** (structured task breakdown)
+- Extract tasks directly from this section
+- Each task should have: ID, description, files, complexity, acceptance criteria
+
+**Option B: Spec has no "Implementation Plan"** (lean spec - requirements only)
+- Analyze the requirements and design task breakdown yourself
+- Look at: Functional Requirements, Architecture section, Files to Create/Modify
+- Design PR-sized chunks following the chunking philosophy above
+- Create tasks that implement all requirements
+
+For each task (extracted or designed), capture:
+
+- **Task ID** (from heading)
+- **Description** (what to implement)
+- **Files** (explicit paths from spec)
+- **Complexity** (S/M/L/XL - estimated hours)
+- **Acceptance Criteria** (checklist items)
+- **Implementation Steps** (detailed steps)
+
+**Example extraction:**
+
+```markdown
+Spec has:
+
+### Task 1: Database Schema
+
+**Complexity**: M (2-4h)
+**Files**:
+
+- prisma/schema.prisma
+- prisma/migrations/
+
+**Description**: Add VerificationToken model for Auth.js...
+
+**Acceptance**:
+
+- [ ] Model matches Auth.js spec
+- [ ] Migration runs cleanly
+
+Extract to:
+{
+id: "task-1-database-schema",
+description: "Add VerificationToken model",
+files: ["prisma/schema.prisma", "prisma/migrations/"],
+complexity: "M",
+estimated_hours: 3,
+acceptance_criteria: [...],
+steps: [...]
+}
+```
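The mapping from a spec's complexity line to `complexity` and `estimated_hours` can be sketched as below. This is a minimal illustration, not part of the skill: `parse_complexity` is a hypothetical helper, and taking the midpoint of an hour range is an assumption inferred from the example above, where `M (2-4h)` becomes `estimated_hours: 3`.

```python
import re


def parse_complexity(text: str) -> tuple[str, float]:
    """Split a string like "M (2-4h)" into a size label and an hour estimate.

    Assumption: a range is reduced to its midpoint; a single value is kept.
    """
    match = re.fullmatch(r"\s*(S|M|L|XL)\s*\((\d+)(?:-(\d+))?h\)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized complexity: {text!r}")
    label, low, high = match.groups()
    low_hours = float(low)
    high_hours = float(high) if high else low_hours
    return label, (low_hours + high_hours) / 2


label, hours = parse_complexity("M (2-4h)")
print(label, hours)
```

Raising on an unrecognized string, rather than guessing a size, keeps a malformed spec from slipping through to the quality checks unnoticed.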
+
+### Step 2: Validate Task Quality & Chunking
+
+For each task, check for quality issues:
+
+**CRITICAL (must fix):**
+
+- ❌ XL complexity (>8h) → Must split into M/L tasks
+- ❌ No files specified → Must add explicit file paths
+- ❌ No acceptance criteria → Must add 3-5 testable criteria
+- ❌ Wildcard patterns (`src/**/*.ts`) → Must use explicit paths
+- ❌ Too many S tasks (>30% of total) → Bundle into thematic M/L tasks
+
+**HIGH (strongly recommend):**
+
+- ⚠️ Standalone S task that could bundle with related work
+- ⚠️ L complexity (5-7h) → Verify it's a coherent unit, not an arbitrary split
+- ⚠️ >10 files → Likely too large, consider splitting by subsystem
+- ⚠️ <50 char description → Add more detail about what subsystem/layer this completes
+- ⚠️ <3 acceptance criteria → Add more specific criteria
+
+**Chunking validation:**
+
+- If task is S (1-2h), verify it's truly standalone:
+
+  - Can't be bundled with schema/migration/dependencies?
+  - Can't be bundled with related service/action/component?
+  - Would a senior engineer create a separate PR for this?
+
+- If >50% of tasks are S, that's a red flag:
+  - Likely too granular
+  - Missing thematic coherence
+  - Bundle related S tasks into M tasks
+
+**If CRITICAL issues found:**
+
+- STOP and report issues to user
+- User must update spec or adjust chunking
+- Re-run skill after fixes
+
+**If only HIGH issues:**
+
+- Report warnings
+- Offer to continue or fix
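The CRITICAL checks in this step could be automated roughly as follows. This is a sketch under stated assumptions, not the skill's implementation: the `Task` dataclass, its field names, and `critical_issues` are hypothetical, and the thresholds (XL above 8h, S at 2h or less, more than 30% S tasks) are taken from the lists above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record; the field names are illustrative only.
    id: str
    hours: float
    files: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)


def critical_issues(tasks: list[Task]) -> list[str]:
    """Return one message per CRITICAL violation; an empty list means none."""
    issues = []
    for task in tasks:
        if task.hours > 8:
            issues.append(f"{task.id}: XL complexity ({task.hours:g}h) - must split")
        if not task.files:
            issues.append(f"{task.id}: no files specified")
        if not task.acceptance_criteria:
            issues.append(f"{task.id}: no acceptance criteria")
        for path in task.files:
            if "*" in path:
                issues.append(f"{task.id}: wildcard pattern {path}")
    # Plan-level check: too many S tasks suggests mechanical, file-by-file splits.
    small = sum(1 for task in tasks if task.hours <= 2)
    if tasks and small / len(tasks) > 0.30:
        issues.append(f"{small}/{len(tasks)} tasks are S - bundle into M/L tasks")
    return issues
```

A non-empty result corresponds to the "STOP and report issues to user" branch; the HIGH warnings would need a second, non-blocking pass.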
+
+### Step 3: Analyze File Dependencies
+
+Build dependency graph by analyzing file overlaps:
+
+**Algorithm:**
+
+```
+For each task T1:
+  For each task T2 (where T2 appears after T1):
+    shared_files = intersection(T1.files, T2.files)
+
+    If shared_files is not empty:
+      T2.dependencies.add(T1.id)
+      T2.dependency_reason = "Shares files: {shared_files}"
+```
+
+**Example:**
+
+```
+Task 1: ["prisma/schema.prisma"]
+Task 2: ["src/lib/models/auth.ts"]
+Task 3: ["prisma/schema.prisma", "src/types/auth.ts"]
+
+Analysis:
+- Task 2: No dependencies (no shared files with Task 1)
+- Task 3: Depends on Task 1 (shares prisma/schema.prisma)
+```
+
+**Architectural dependencies:**
+Also add dependencies based on layer order:
+
+- Models → Services → Actions → UI
+- Database → Types → Logic → API → Components
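The file-overlap rule above can be sketched in Python as follows. This is a minimal illustration: `file_dependencies` is a hypothetical name, tasks are assumed to arrive in plan order, and only shared files are considered. Layer-order (architectural) dependencies would still have to be added separately.

```python
def file_dependencies(task_files: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Map each task ID to the earlier tasks it shares files with.

    The inner dict records the reason: dependency ID -> sorted shared files.
    """
    ids = list(task_files)  # dicts preserve insertion order, i.e. plan order
    deps: dict[str, dict[str, list[str]]] = {task_id: {} for task_id in ids}
    for index, earlier in enumerate(ids):
        for later in ids[index + 1:]:
            shared = sorted(set(task_files[earlier]) & set(task_files[later]))
            if shared:
                deps[later][earlier] = shared
    return deps


example = {
    "task-1": ["prisma/schema.prisma"],
    "task-2": ["src/lib/models/auth.ts"],
    "task-3": ["prisma/schema.prisma", "src/types/auth.ts"],
}
print(file_dependencies(example))
```

For the example in this step, only `task-3` gains a dependency (on `task-1`, through `prisma/schema.prisma`).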
+
+### Step 4: Group into Phases
+
+Group tasks into phases using dependency graph:
+
+**Phase grouping algorithm:**
+
+```
+1. Start with tasks that have no dependencies (roots)
+2. Group all independent roots into Phase 1
+3. Remove roots from graph
+4. Repeat until all tasks grouped
+
+For each phase:
+  - If all tasks independent: strategy = "parallel"
+  - If any dependencies exist: strategy = "sequential"
+```
+
+**Example:**
+
+```
+Tasks:
+- Task 1: [] (no deps)
+- Task 2: [] (no deps)
+- Task 3: [task-1, task-2]
+- Task 4: [task-3]
+
+Grouping:
+Phase 1: [Task 1, Task 2] - parallel (independent)
+Phase 2: [Task 3] - sequential (waits for Phase 1)
+Phase 3: [Task 4] - sequential (waits for Phase 2)
+```
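The grouping algorithm above amounts to peeling off dependency-free tasks layer by layer. Here is a minimal Python sketch: `group_into_phases` is a hypothetical name, and it assumes every dependency ID also appears as a key of the input.

```python
def group_into_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group task IDs into phases; each phase depends only on earlier phases."""
    remaining = {task: set(task_deps) for task, task_deps in deps.items()}
    phases: list[list[str]] = []
    while remaining:
        # Roots are tasks whose dependencies have all been scheduled already.
        roots = sorted(task for task, task_deps in remaining.items() if not task_deps)
        if not roots:
            raise ValueError(f"circular dependencies among: {sorted(remaining)}")
        phases.append(roots)
        for task in roots:
            del remaining[task]
        for task_deps in remaining.values():
            task_deps.difference_update(roots)
    return phases


example = {
    "task-1": set(),
    "task-2": set(),
    "task-3": {"task-1", "task-2"},
    "task-4": {"task-3"},
}
print(group_into_phases(example))
```

In the example above, a phase with several tasks is marked parallel and a single-task phase is marked sequential, so the strategy can be read off each phase's length.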
+
+### Step 5: Calculate Execution Estimates
+
+For each phase, calculate:
+
+- **Sequential time**: Sum of all task hours
+- **Parallel time**: Max of all task hours (if parallel strategy)
+- **Time savings**: Sequential - Parallel
+
+**Example:**
+
+```
+Phase 2 (parallel):
+- Task A: 3h
+- Task B: 2h
+- Task C: 4h
+
+Sequential: 3 + 2 + 4 = 9h
+Parallel: max(3, 2, 4) = 4h
+Savings: 9 - 4 = 5h (56% faster)
+```
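The per-phase arithmetic can be sketched as a small function. `phase_estimate` and its return keys are hypothetical names chosen for illustration.

```python
def phase_estimate(hours: list[float], strategy: str) -> dict[str, float]:
    """Compute sequential time, elapsed time, and savings for one phase."""
    sequential = sum(hours)
    # A parallel phase takes as long as its slowest task.
    elapsed = max(hours, default=0) if strategy == "parallel" else sequential
    savings = sequential - elapsed
    percent = round(100 * savings / sequential) if sequential else 0
    return {
        "sequential": sequential,
        "parallel": elapsed,
        "savings": savings,
        "percent": percent,
    }


print(phase_estimate([3, 2, 4], "parallel"))
```

Summing the `sequential` and `parallel` values over all phases gives the plan-level totals reported in the execution summary.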
+
+### Step 6: Generate plan.md
+
+Write plan to `{spec-directory}/plan.md`:
+
+**Template:**
+
+````markdown
+# Feature: {Feature Name} - Implementation Plan
+
+> **Generated by:** Task Decomposition skill
+> **From spec:** {spec-path}
+> **Created:** {date}
+
+## Execution Summary
+
+- **Total Tasks**: {count}
+- **Total Phases**: {count}
+- **Sequential Time**: {hours}h
+- **Parallel Time**: {hours}h
+- **Time Savings**: {hours}h ({percent}%)
+
+**Parallel Opportunities:**
+
+- Phase {id}: {task-count} tasks ({hours}h saved)
+
+---
+
+## Phase {N}: {Phase Name}
+
+**Strategy**: {sequential|parallel}
+**Reason**: {why this strategy}
+
+### Task {ID}: {Name}
+
+**Files**:
+
+- {file-path-1}
+- {file-path-2}
+
+**Complexity**: {S|M|L} ({hours}h)
+
+**Dependencies**: {[task-ids] or "None"}
+
+**Description**:
+{What to implement and why}
+
+**Implementation Steps**:
+
+1. {step-1}
+2. {step-2}
+3. {step-3}
+
+**Acceptance Criteria**:
+
+- [ ] {criterion-1}
+- [ ] {criterion-2}
+- [ ] {criterion-3}
+
+**Mandatory Patterns**:
+
+> **Constitution**: All code must follow @docs/constitutions/current/
+
+See architecture.md for layer boundaries and patterns.md for required patterns.
+
+**TDD**: Follow `test-driven-development` skill (write test first, watch fail, minimal code, watch pass)
+
+**Quality Gates**:
+
+```bash
+pnpm biome check --write .
+pnpm test {test-files}
+```
+
+---
+
+{Repeat for all tasks in all phases}
+
+````
+
+### Step 7: Report to User
+
+After generating plan:
+
+````markdown
+✅ Task Decomposition Complete
+
+**Plan Location**: specs/{run-id}-{feature-slug}/plan.md
+
+## Breakdown
+- Phases: {count}
+- Tasks: {count}
+- Complexity: {XL}: {n}, {L}: {n}, {M}: {n}, {S}: {n}
+
+## Execution Strategy
+- Sequential Phases: {count} ({tasks} tasks)
+- Parallel Phases: {count} ({tasks} tasks)
+
+## Time Estimates
+- Sequential Execution: {hours}h
+- With Parallelization: {hours}h
+- **Time Savings: {hours}h ({percent}% faster)**
+
+## Next Steps
+
+Review plan:
+```bash
+cat specs/{run-id}-{feature-slug}/plan.md
+```
+
+Execute plan:
+
+```bash
+/spectacular:execute @specs/{run-id}-{feature-slug}/plan.md
+```
+
+````
+
+## Quality Rules
+
+**Task Sizing (PR-focused):**
+- ⚠️ S (1-2h): Rare - only truly standalone work (e.g., config-only changes)
+  - Most S tasks should bundle into M
+  - Ask: "Would a senior engineer PR this alone?"
+- ✅ M (3-5h): Sweet spot - most tasks should be this size
+  - Complete subsystem, layer, or feature slice
+  - Reviewable in one sitting
+  - Thematically coherent unit
+- ✅ L (5-7h): Complex coherent units (use for major subsystems)
+  - Full UI layer with all components
+  - Complete API surface (actions + routes)
+  - Major feature integration
+- ❌ XL (>8h): NEVER - always split into M/L tasks
+
+**Chunking Standards:**
+- ❌ >30% S tasks is a red flag (too granular)
+- ✅ Most tasks should be M (60-80%)
+- ✅ Some L tasks for major units (10-30%)
+- ✅ Rare S tasks for truly standalone work (<10%)
+
+**File Specificity:**
+- ✅ `src/lib/models/auth.ts`
+- ✅ `src/components/auth/LoginForm.tsx`
+- ❌ `src/**/*.ts` (too vague)
+- ❌ `src/lib/models/` (specify exact files)
+
+**Acceptance Criteria:**
+- ✅ 3-5 specific, testable criteria
+- ✅ Quantifiable (tests pass, build succeeds, API returns 200)
+- ❌ Vague ("works well", "is good")
+- ❌ Too many (>7 - task is too large)
+
+**Dependencies:**
+- ✅ Minimal (only true blockers)
+- ✅ Explicit reasons (shares file X)
+- ❌ Circular dependencies
+- ❌ Over-constrained (everything depends on everything)
+
+## Error Handling
+
+### Spec Has Insufficient Information
+
+If spec has neither "Implementation Plan" nor enough detail to design tasks:
+
+```
+
+❌ Cannot decompose - spec lacks implementation details
+
+The spec must have either:
+- An "Implementation Plan" section with tasks, OR
+- Sufficient requirements and architecture details to design tasks
+
+Current spec has:
+- Functional Requirements: [YES/NO]
+- Architecture section: [YES/NO]
+- Files to create/modify: [YES/NO]
+
+Add more implementation details to the spec, then re-run:
+/spectacular:plan @specs/{run-id}-{feature-slug}/spec.md
+
+```
+
+### Critical Quality Issues
+
+If tasks have critical issues:
+
+```
+
+❌ Task Quality Issues - Cannot Generate Plan
+
+Critical Issues Found:
+
+- Task 3: XL complexity (12h) - must split
+- Task 5: No files specified
+- Task 7: No acceptance criteria
+
+Fix these issues in the spec, then re-run:
+/spectacular:plan @specs/{run-id}-{feature-slug}/spec.md
+
+```
+
+### Circular Dependencies
+
+If dependency graph has cycles:
+
+```
+
+❌ Circular Dependencies Detected
+
+Task A depends on Task B
+Task B depends on Task C
+Task C depends on Task A
+
+This is impossible to execute. Review task organization.
+
+````
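To produce a report like the one above, the cycle itself has to be located. The following is a minimal depth-first-search sketch: `find_cycle` is a hypothetical helper, and the input is assumed to map each task ID to the IDs it depends on.

```python
from typing import Optional


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of task IDs, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {task: WHITE for task in deps}
    path: list[str] = []

    def visit(task: str) -> Optional[list[str]]:
        state[task] = GREY
        path.append(task)
        for dep in deps.get(task, []):
            if state.get(dep, WHITE) == GREY:
                # Reached a task already on the current path: that is a cycle.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[task] = BLACK
        return None

    for task in deps:
        if state[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```

Each consecutive pair in the returned list reads as "X depends on Y", which maps directly onto the lines of the error message.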
+
+## Integration with Other Skills
+
+**Before:** Use `brainstorming` and `spec-feature` to create a complete spec
+
+**After:** Use the `/spectacular:execute` command to run the plan with `subagent-driven-development`
+
+**Pairs with:**
+- `subagent-driven-development` - Executes individual tasks
+- `finishing-a-development-branch` - Completes implementation
+
+## Project-Specific Configuration
+
+For projects with a constitution, reference it in every task:
+
+> **Constitution**: All tasks MUST follow @docs/constitutions/current/
+
+Every task must include:
+- Reference to constitution for architecture (layer boundaries, dependencies)
+- Reference to constitution for patterns (validation, state management, etc.)
+- Quality gates (linting, testing, building)
+
+**Quality gates:**
+```bash
+pnpm biome check --write .
+pnpm test
+````
--- a/skills/decomposing-tasks/test-scenarios.md
+++ b/skills/decomposing-tasks/test-scenarios.md
@@ -0,0 +1,414 @@
+# Task Decomposition Skill - Test Scenarios
+
+## RED Phase (Baseline Testing)
+
+Run these scenarios WITHOUT the decomposing-tasks skill to document natural behavior and rationalizations.
+
+### Scenario 1: XL Task Pressure
+
+**Pressure Types**: Time, Authority (spec says 12h)
+
+**Setup**:
+
+- Provide a spec with one task marked as "XL (10-12h)"
+- Task description is detailed and seems well-scoped
+- Ask agent to decompose into execution plan
+
+**Expected Violations** (what we're testing for):
+
+- Agent accepts XL task without splitting
+- Rationalizations like:
+  - "It's well-scoped so the estimate is reliable"
+  - "12h is borderline, we can proceed"
+  - "The spec author knows best"
+  - "Splitting would add coordination overhead"
+
+**Test Input**:
+
+```markdown
+# Feature: Admin Dashboard
+
+## Implementation Plan
+
+### Task 1: Complete Admin Dashboard
+
+**Complexity**: XL (12h)
+**Files**:
+
+- src/app/admin/page.tsx
+- src/app/admin/users/page.tsx
+- src/app/admin/categories/page.tsx
+- src/lib/services/admin-service.ts
+- src/lib/actions/admin-actions.ts
+
+**Description**: Build complete admin dashboard with user management, category management, and analytics overview.
+
+**Acceptance**:
+
+- [ ] Users table with edit/delete
+- [ ] Categories CRUD interface
+- [ ] Analytics dashboard with charts
+- [ ] All pages properly authenticated
+```
+
+### Scenario 2: Wildcard Pattern Pressure
+
+**Pressure Types**: Convenience, Sunk Cost (spec already written this way)
+
+**Setup**:
+
+- Spec uses wildcard patterns like `src/**/*.ts`
+- Patterns seem reasonable ("all TypeScript files")
+- Ask agent to decompose
+
+**Expected Violations**:
+
+- Agent keeps wildcard patterns
+- Rationalizations like:
+  - "The wildcard is clear enough"
+  - "We know what files we mean"
+  - "Being explicit would be tedious"
+  - "The spec is already written this way"
+
+**Test Input**:
+
+```markdown
+# Feature: Type Safety Refactor
+
+## Implementation Plan
+
+### Task 1: Update Type Definitions
+
+**Complexity**: M (3h)
+**Files**:
+
+- src/**/*.ts
+- types/**/*.d.ts
+
+**Description**: Update all TypeScript files to use strict mode...
+```
+
+### Scenario 3: False Independence Pressure
+
+**Pressure Types**: Optimism, Desired Outcome (want parallelization)
+
+**Setup**:
+
+- Two tasks that share a file
+- Tasks seem independent at first glance
+- User wants parallelization
+
+**Expected Violations**:
+
+- Agent marks tasks as parallel despite file overlap
+- Rationalizations like:
+  - "They modify different parts of the file"
+  - "We can merge the changes later"
+  - "The overlap is minimal"
+  - "Parallelization benefits outweigh coordination cost"
+
+**Test Input**:
+
+```markdown
+# Feature: Authentication System
+
+## Implementation Plan
+
+### Task 1: Magic Link Service
+
+**Complexity**: M (3h)
+**Files**:
+
+- src/lib/services/magic-link-service.ts
+- src/lib/models/auth.ts
+- src/types/auth.ts
+
+### Task 2: Session Management
+
+**Complexity**: M (3h)
+**Files**:
+
+- src/lib/services/session-service.ts
+- src/lib/models/auth.ts
+- src/types/auth.ts
+```
+
+### Scenario 4: Missing Acceptance Criteria Pressure
+
+**Pressure Types**: Laziness, "Good Enough" (task seems clear)
+
+**Setup**:
+
+- Task with only 1-2 vague acceptance criteria
+- Implementation steps are detailed
+- Task seems well-defined otherwise
+
+**Expected Violations**:
+
+- Agent proceeds without adding criteria
+- Rationalizations like:
+  - "The implementation steps are clear"
+  - "We can add criteria later if needed"
+  - "The existing criteria cover it"
+  - "Over-specifying is bureaucratic"
+
+**Test Input**:
+
+```markdown
+### Task 1: User Profile Page
+
+**Complexity**: M (3h)
+**Files**:
+
+- src/app/profile/page.tsx
+- src/lib/services/user-service.ts
+
+**Implementation Steps**:
+
+1. Create profile page component
+2. Add user data fetching
+3. Display user information
+4. Add edit button
+
+**Acceptance**:
+
+- [ ] Page displays user information
+```
+
+### Scenario 5: Architectural Dependency Omission
+
+**Pressure Types**: Oversight, Assumption (seems obvious)
+
+**Setup**:
+
+- Tasks that should have layer dependencies (Model → Service → Action)
+- File dependencies don't show it
+- Tasks modifying different files at each layer
+
+**Expected Violations**:
+
+- Agent doesn't add architectural dependencies
+- Marks independent files as parallel
+- Rationalizations like:
+  - "No file overlap, so they're independent"
+  - "Layer dependencies are implicit"
+  - "The agents will figure it out"
+
+**Test Input**:
+
+```markdown
+### Task 1: Pick Models
+
+**Files**: src/lib/models/pick.ts
+
+### Task 2: Pick Service
+
+**Files**: src/lib/services/pick-service.ts
+
+### Task 3: Pick Actions
+
+**Files**: src/lib/actions/pick-actions.ts
+```
+
+## GREEN Phase (With Skill Testing)
+
+After documenting baseline rationalizations, run same scenarios WITH skill.
+
+**Success Criteria**:
+
+- XL tasks get split or rejected
+- Wildcard patterns get flagged
+- File overlaps prevent parallelization
+- Missing criteria get caught
+- Architectural dependencies get added
+
+## REFACTOR Phase (Close Loopholes)
+
+After GREEN testing, identify any new rationalizations and add explicit counters to skill.
+
+**Document**:
+
+- New rationalizations agents used
+- Specific language from agent responses
+- Where in skill to add counter
+
+**Update skill**:
+
+- Add rationalization to table
+- Add explicit prohibition if needed
+- Add red flag if it's a warning sign
+
+## Execution Instructions
+
+### Running RED Phase
+
+1. Create test spec file: `specs/test-decomposing-tasks.md`
+2. Use Scenario 1 content
+3. Ask agent (WITHOUT loading skill): "Decompose this spec into an execution plan"
+4. Document exact rationalizations used (verbatim quotes)
+5. Repeat for each scenario
+6. Compile list of all rationalizations
+
+### Running GREEN Phase
+
+1. Same test spec files
+2. Ask agent (WITH skill loaded): "Use decomposing-tasks skill to create plan"
+3. Verify agent catches issues
+4. Document any new rationalizations
+5. Repeat for each scenario
+
+### Running REFACTOR Phase
+
+1. Review all new rationalizations from GREEN
+2. Update skill with explicit counters
+3. Re-run scenarios to verify
+4. Iterate until bulletproof
+
+## Success Metrics
+
+**RED Phase Success**: Agent violates rules, rationalizations documented
+**GREEN Phase Success**: Agent catches violations, follows rules
+**REFACTOR Phase Success**: Agent can't find loopholes, rules are explicit
+
+## Notes
+
+This is TDD for documentation. The test scenarios are the "test cases", the skill is the "production code".
+
+Same discipline applies:
+
+- Must see failures first (RED)
+- Then write minimal fix (GREEN)
+- Then iterate to close holes (REFACTOR)
+
+---
+
+## RED Phase Results (Executed: 2025-01-17)
+
+### Scenario 1 Results: XL Task Pressure ✅ AGENT CORRECTLY REJECTED
+
+**What the agent did:**
+
+- ✅ Would SPLIT the XL task, NOT accept it
+- ✅ Provided detailed reasoning about blocking risk, testing difficulty, code review burden
+- ✅ Suggested splitting into 6-8 tasks (2-3h each)
+- ✅ Actually estimated MORE time (16h vs 12h), indicating original was underestimated
+
+**Agent quote:**
+
+> "I would SPLIT it. I would not accept a 12-hour task as-is... A 12-hour task violates several fundamental principles of good task management... Industry standard is to keep tasks to 2-4 hours maximum."
+
+**Key insight:** Agent naturally understood XL tasks are problematic even WITHOUT skill guidance. No rationalization occurred.
+
+**Predicted incorrectly:** Expected agent to accept XL task with rationalizations. Agent made correct decision.
+
+---
+
+### Scenario 2 Results: Wildcard Pattern Pressure ✅ AGENT CORRECTLY REJECTED
+
+**What the agent did:**
+
+- ✅ Would NOT accept wildcard patterns for execution
+- ✅ Recognized need to glob/scan codebase first
+- ✅ Understood dependency analysis is impossible with wildcards
+- ✅ Identified spec as insufficient for execution
+
+**Agent quote:**
+
+> "I would NOT accept these wildcard patterns as-is for execution... Wildcard patterns are insufficient for execution planning because: Lack of specificity, No file discovery, Impossible dependency analysis, Poor task breakdown, No parallelization insight."
+
+**Key insight:** Agent naturally understood wildcards are problematic. There was no pressure-driven shortcut to overcome.
+
+**Predicted incorrectly:** Expected agent to keep wildcards with "good enough" rationalization. Agent made correct decision.
+
+---
+
+### Scenario 3 Results: False Independence ✅ AGENT CORRECTLY DETECTED DEPENDENCIES
+
+**What the agent did:**
+
+- ✅ Marked tasks as SEQUENTIAL, not parallel
+- ✅ Detected shared files (auth.ts, types)
+- ✅ Identified both logical AND file dependencies
+- ✅ Understood merge conflict risks
+
+**Agent quote:**
+
+> "I would mark these as SEQUENTIAL... The tasks have both logical dependencies and file modification conflicts... Yes, I noticed the critical overlap: Both tasks modify src/lib/models/auth.ts and src/types/auth.ts. This is a significant merge conflict risk."
+
+**Key insight:** Agent performed thorough dependency analysis without prompting. Considered both file overlaps AND logical flow.
+
+**Predicted incorrectly:** Expected agent to mark as parallel with optimistic rationalizations. Agent made correct decision.
+
+---
+
+### Scenario 4 Results: Missing Criteria ✅ AGENT CORRECTLY REQUIRED MORE
+
+**What the agent did:**
+
+- ✅ Said one criterion is NOT enough
+- ✅ Would require 9+ specific, testable criteria
+- ✅ Identified ambiguity and lack of testability
+- ✅ Explained why "done" would be subjective without better criteria
+
+**Agent quote:**
+
+> "No, one acceptance criterion is not enough... The single criterion 'Page displays user information' is far too vague... acceptance criteria should be testable and unambiguous. The current criterion fails both tests."
+
+**Key insight:** Agent naturally understood quality requirements for acceptance criteria. No rationalization about "good enough."
+
+**Predicted incorrectly:** Expected agent to accept vague criteria with "we'll figure it out" rationalization. Agent made correct decision.
+
+---
+
+### Scenario 5 Results: Architectural Dependencies ✅ AGENT CORRECTLY APPLIED LAYER ORDER
+
+**What the agent did:**
+
+- ✅ Marked tasks as SEQUENTIAL based on architecture
+- ✅ Explicitly read and referenced patterns.md
+- ✅ Understood Models → Services → Actions dependency chain
+- ✅ Recognized layer boundaries create hard import dependencies
+
+**Agent quote:**
+
+> "SEQUENTIAL - These tasks must run sequentially, not in parallel... The codebase enforces strict layer boundaries... Each layer depends on the layer below it: Actions MUST import services, Services MUST import models."
+
+**Key insight:** Agent proactively read architectural documentation and applied it correctly. Very thorough analysis.
+
+**Predicted incorrectly:** Expected agent to overlook architectural dependencies and focus only on file analysis. Agent made correct decision.
+
+---
+
+## RED Phase Summary
+
+**SURPRISING FINDING:** All 5 agents made CORRECT decisions even WITHOUT the skill.
+
+**This is fundamentally different from versioning-constitutions testing**, where agents failed all scenarios without skill guidance.
+
+**Why the difference?**
+
+1. **Task decomposition principles are well-known** - Industry best practices are clear (small tasks, explicit criteria, dependency analysis)
+2. **Agents have strong general knowledge** - These concepts are widely documented in software engineering literature
+3. **The problems are obvious** - XL tasks, wildcards, and missing criteria are clearly problematic
+4. **Architectural patterns were documented** - patterns.md provided explicit guidance that agents read
+
+**What does this mean for the skill?**
+
+The skill serves a different purpose than initially expected:
+
+1. **NOT teaching new concepts** - Agents already understand task decomposition principles
+2. **ENFORCING consistency** - Standardize HOW analysis is performed
+3. **PREVENTING pressure-driven shortcuts** - Guard against time pressure, authority pressure, or "good enough" thinking
+4. **PROVIDING algorithmic rigor** - Ensure dependency analysis follows consistent algorithm
+5. **STANDARDIZING output format** - Generate consistent plan.md structure
+
+**Skill value proposition shifts from:**
+
+- ❌ "Teaching agents how to decompose tasks" (they already know)
+- ✅ "Enforcing mandatory checks and consistent methodology" (prevent shortcuts)
+
+**Next steps:**
+
+- Run GREEN phase to verify skill provides value through consistency and enforcement
+- Focus testing on: Does skill make process MORE RIGOROUS and CONSISTENT?
+- Look for: Are there edge cases where agents might skip steps under pressure?
--- a/skills/executing-parallel-phase/SKILL.md
+++ b/skills/executing-parallel-phase/SKILL.md
@@ -0,0 +1,904 @@
+---
+name: executing-parallel-phase
+description: Use when orchestrating parallel phases in plan execution - creates isolated worktrees for concurrent task execution, installs dependencies, spawns parallel subagents, verifies completion, stacks branches linearly, and cleans up (mandatory for ALL parallel phases including N=1)
+---
+
+# Executing Parallel Phase
+
+## Overview
+
+**Parallel phases enable TRUE concurrent execution via isolated git worktrees**, not just logical independence.
+
+**Critical distinction:** Worktrees are not an optimization to prevent file conflicts. They're the ARCHITECTURE that enables multiple subagents to work simultaneously.
+
+## When to Use
+
+Use this skill when the `execute` command encounters a phase marked "Parallel" in plan.md:
+- ✅ Always use for N≥2 tasks
+- ✅ **Always use for N=1** (maintains architecture consistency)
+- ✅ Even when files don't overlap
+- ✅ Even under time pressure
+- ✅ Even with disk space pressure
+
+**Never skip worktrees for parallel phases.** No exceptions.
+
+## The Iron Law
+
+```
+PARALLEL PHASE = WORKTREES + SUBAGENTS
+```
+
+**Violations of this law:**
+- ❌ Execute in main worktree ("files don't overlap")
+- ❌ Skip worktrees for N=1 ("basically sequential")
+- ❌ Use sequential strategy ("simpler")
+
+**All of these destroy the parallel execution architecture.**
+
+## Rationalization Table
+
+**Predictable shortcuts you WILL be tempted to make. DO NOT make them.**
+
+| Temptation | Why It's Wrong | What To Do |
+|------------|----------------|------------|
+| "The spec is too long, I'll just read the task description" | Task = WHAT files + verification. Spec = WHY architecture + requirements. Missing spec → drift. | Read the full spec. It's 2-5 minutes that prevents hours of rework. |
+| "I already read the constitution, that's enough context" | Constitution = HOW to code. Spec = WHAT to build. Both needed for anchored implementation. | Read constitution AND spec, every time. |
+| "The acceptance criteria are clear, I don't need the spec" | Acceptance criteria = tests pass, files exist. Spec = user flow, business logic, edge cases. | Acceptance criteria verify implementation. Spec defines requirements. |
+| "I'm a subagent in a parallel phase, other tasks probably read the spec" | Each parallel subagent has isolated context. Other tasks' spec reading doesn't transfer. | Every subagent reads spec independently. No assumptions. |
+| "The spec doesn't exist / I can't find it" | If spec missing, STOP and report error. Never proceed without spec. | Check `specs/{run-id}-{feature-slug}/spec.md`. If missing, fail loudly. |
+| "I'll implement first, then check spec to verify" | Spec informs design decisions. Checking after implementation means rework. | Read spec BEFORE writing any code. |
+
+**If you find yourself thinking "I can skip the spec because..." - STOP. You're rationalizing. Read the spec.**
+
+## The Process
+
+**Announce:** "I'm using executing-parallel-phase to orchestrate {N} concurrent tasks in Phase {phase-id}."
+
+### Step 1: Pre-Conditions Verification (MANDATORY)
+
+**Before ANY worktree creation, verify the environment is correct:**
+
+```bash
+# Get main repo root
+REPO_ROOT=$(git rev-parse --show-toplevel)
+CURRENT=$(pwd)
+
+# Check 1: Verify orchestrator is in main repo root
+if [ "$CURRENT" != "$REPO_ROOT" ]; then
+  echo "❌ Error: Orchestrator must run from main repo root"
+  echo "Current: $CURRENT"
+  echo "Expected: $REPO_ROOT"
+  echo ""
+  echo "Return to main repo: cd $REPO_ROOT"
+  exit 1
+fi
+
+echo "✅ Orchestrator location verified: Main repo root"
+
+# Check 2: Verify main worktree exists
+if [ ! -d .worktrees/{runid}-main ]; then
+  echo "❌ Error: Main worktree not found at .worktrees/{runid}-main"
+  echo "Run /spectacular:spec first to create the workspace."
+  exit 1
+fi
+
+# Check 3: Verify main branch exists
+if ! git rev-parse --verify {runid}-main >/dev/null 2>&1; then
+  echo "❌ Error: Branch {runid}-main does not exist"
+  echo "Spec must be created before executing parallel phase."
+  exit 1
+fi
+
+# Check 4: Verify we're on correct base branch for this phase
+CURRENT_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+EXPECTED_BASE="{expected-base-branch}"  # From plan: previous phase's last task, or {runid}-main for Phase 1
+
+if [ "$CURRENT_BRANCH" != "$EXPECTED_BASE" ]; then
+  echo "❌ Error: Phase {phase-id} starting from unexpected branch"
+  echo "   Current: $CURRENT_BRANCH"
+  echo "   Expected: $EXPECTED_BASE"
+  echo ""
+  echo "Parallel phases must start from the correct base branch."
+  echo "All parallel tasks will stack onto: $CURRENT_BRANCH"
+  echo ""
+  echo "If $CURRENT_BRANCH is wrong, the entire phase will be misplaced in the stack."
+  echo ""
+  echo "To fix:"
+  echo "1. Verify previous phase completed: git log --oneline $EXPECTED_BASE"
+  echo "2. Switch to correct base: cd .worktrees/{runid}-main && git checkout $EXPECTED_BASE"
+  echo "3. Re-run /spectacular:execute"
+  exit 1
+fi
+
+echo "✅ Phase {phase-id} starting from correct base: $CURRENT_BRANCH"
+echo "✅ Pre-conditions verified - safe to create task worktrees"
+```
+
+**Why mandatory:**
+- Prevents nested worktrees from wrong location (9f92a8 regression)
+- Catches upstream drift (execute.md or other skill left orchestrator in wrong place)
+- Catches missing prerequisites before wasting time on worktree creation
+- Provides clear error messages for common setup issues
+
+**Red flag:** "Skip verification to save time" - NO. 20ms verification saves hours of debugging.
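
One caveat worth illustrating for Check 1: `git rev-parse --show-toplevel` reports the root of whichever worktree the shell is standing in, so on its own it cannot tell a linked worktree from the main checkout. A minimal sketch of an alternative lookup, assuming the first `worktree` record of `git worktree list --porcelain` is the main worktree. The helper name `main_worktree_from_porcelain` and the sample paths are hypothetical and not part of this skill:

```bash
# Hypothetical helper: print the path of the first "worktree" record
# read from stdin (the porcelain output of `git worktree list`).
main_worktree_from_porcelain() {
  local line
  while IFS= read -r line; do
    case "$line" in
      "worktree "*)
        printf '%s\n' "${line#worktree }"
        return 0
        ;;
    esac
  done
  return 1
}

# Sample porcelain text (illustrative paths only).
SAMPLE='worktree /repo
HEAD 1111111
branch refs/heads/main

worktree /repo/.worktrees/abc123-main
HEAD 2222222
branch refs/heads/abc123-main'

printf '%s\n' "$SAMPLE" | main_worktree_from_porcelain
```

Inside a real repository the same helper could be fed with `git worktree list --porcelain` and its result compared against the current directory.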
+
+### Step 1.5: Check for Existing Work (Resume Support)
+
+**Before creating worktrees, check if tasks are already complete:**
+
+```bash
+COMPLETED_TASKS=()
+PENDING_TASKS=()
+
+for TASK_ID in {task-ids}; do
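+  # Use pattern matching to find branch (short-name varies).
+  # for-each-ref avoids the "* " / "+ " markers that `git branch` prints for checked-out branches.
+  BRANCH_PATTERN="{runid}-task-{phase-id}-${TASK_ID}-"
+  BRANCH_NAME=$(git for-each-ref --format='%(refname:short)' "refs/heads/${BRANCH_PATTERN}*" | head -n1)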
+
+  if [ -n "$BRANCH_NAME" ]; then
+    echo "✓ Task ${TASK_ID} already complete: $BRANCH_NAME"
+    COMPLETED_TASKS+=("$TASK_ID")
+  else
+    PENDING_TASKS+=("$TASK_ID")
+  fi
+done
+
+if [ ${#PENDING_TASKS[@]} -eq 0 ]; then
+  echo "✅ All tasks already complete, skipping to stacking"
+  # Jump to Step 6 (Stacking)
+else
+  echo "📋 Resuming: ${#COMPLETED_TASKS[@]} complete, ${#PENDING_TASKS[@]} pending"
+  echo "Will execute tasks: ${PENDING_TASKS[*]}"
+fi
+```
+
+**Why check:** Enables resume after fixing failed tasks. Avoids re-executing successful tasks, which wastes time and can cause conflicts.
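
The resume check reduces to a prefix lookup over branch names. A minimal sketch, assuming the branch names are supplied one per line (for example from `git for-each-ref --format='%(refname:short)' refs/heads/`). The function names and the `abc123` run ID are illustrative only:

```bash
# Hypothetical helper: print the first branch name on stdin that starts
# with the given prefix; return 1 if none match.
first_branch_with_prefix() {
  local prefix="$1" name
  while IFS= read -r name; do
    case "$name" in
      "$prefix"*) printf '%s\n' "$name"; return 0 ;;
    esac
  done
  return 1
}

# Split task IDs into COMPLETED_TASKS / PENDING_TASKS.
# $1 = newline-separated branch list, $2 = run ID, $3 = phase ID, rest = task IDs
partition_tasks() {
  local branches="$1" run_id="$2" phase_id="$3" task_id
  shift 3
  COMPLETED_TASKS=()
  PENDING_TASKS=()
  for task_id in "$@"; do
    if printf '%s\n' "$branches" |
        first_branch_with_prefix "${run_id}-task-${phase_id}-${task_id}-" >/dev/null; then
      COMPLETED_TASKS+=("$task_id")
    else
      PENDING_TASKS+=("$task_id")
    fi
  done
}

BRANCHES='abc123-main
abc123-task-3-1-user-profile
abc123-task-3-3-checkout-flow'

partition_tasks "$BRANCHES" abc123 3 1 2 3
echo "complete: ${COMPLETED_TASKS[*]} | pending: ${PENDING_TASKS[*]}"
```

The trailing hyphen in the prefix matters: without it, task `1` would also match a branch created for task `10`.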
+
+**Red flags:**
+- "Always create all worktrees" - NO. Wastes resources on already-completed work.
+- "Trust orchestrator state" - NO. Branches are source of truth.
+
+### Step 2: Create Worktrees (BEFORE Subagents)
+
+**Create isolated worktree for EACH PENDING task (skip completed tasks):**
+
+```bash
+# Get base branch from main worktree
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+
+# Create worktrees only for pending tasks (from Step 1.5)
+for TASK_ID in "${PENDING_TASKS[@]}"; do
+  git worktree add ".worktrees/{runid}-task-${TASK_ID}" --detach "$BASE_BRANCH"
+  echo "✅ Created .worktrees/{runid}-task-${TASK_ID} (detached HEAD)"
+done
+
+# Verify all worktrees created
+git worktree list | grep "{runid}-task-"
+```
+
+**Verify creation succeeded:**
+
+```bash
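+# Check each pending task individually: counting every "{runid}-task-" worktree
+# would also count worktrees left over from a previous (resumed) run.
+MISSING=0
+for TASK_ID in "${PENDING_TASKS[@]}"; do
+  if [ ! -d ".worktrees/{runid}-task-${TASK_ID}" ]; then
+    echo "❌ Error: Worktree missing for task ${TASK_ID}"
+    MISSING=$((MISSING + 1))
+  fi
+done
+
+if [ "$MISSING" -ne 0 ]; then
+  echo "❌ Error: $MISSING of ${#PENDING_TASKS[@]} worktrees were not created"
+  exit 1
+fi
+
+echo "✅ Created ${#PENDING_TASKS[@]} worktrees for parallel execution"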
+```
+
+**Why --detach:** Git doesn't allow same branch in multiple worktrees. Detached HEAD enables parallel worktrees.
+
+**Red flags:**
+- "Only 1 task, skip worktrees" - NO. N=1 still uses architecture.
+- "Files don't overlap, skip isolation" - NO. Isolation is what enables parallelism; it is not merely a way to prevent conflicts.
+
+### Step 3: Install Dependencies Per Worktree
+
+**Each PENDING worktree needs its own dependencies (skip completed tasks):**
+
+```bash
+for TASK_ID in "${PENDING_TASKS[@]}"; do
+  if [ ! -d .worktrees/{runid}-task-${TASK_ID}/node_modules ]; then
+    bash -c "cd .worktrees/{runid}-task-${TASK_ID} && {install-command} && {postinstall-command}"
+  fi
+done
+```
+
+**Why per-worktree:** Isolated worktrees can't share node_modules.
+
+**Why bash -c:** Orchestrator stays in main repo. Subshell navigates to worktree and exits after commands complete.
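
The same containment can be observed with a plain subshell. A minimal sketch, assuming a throwaway directory stands in for a task worktree; `run_in_dir` is a hypothetical helper, not something the skill defines:

```bash
# Hypothetical helper: run a command inside a directory without changing
# the caller's working directory. The parentheses create a subshell, so
# the `cd` is discarded when the subshell exits.
run_in_dir() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}

START_DIR=$(pwd)
SCRATCH=$(mktemp -d)   # stand-in for a task worktree

run_in_dir "$SCRATCH" touch installed.marker

[ "$(pwd)" = "$START_DIR" ] && echo "orchestrator directory unchanged"
[ -f "$SCRATCH/installed.marker" ] && echo "command ran inside $SCRATCH"
```

If the `cd` fails, the `&&` prevents the command from running in the wrong directory, which is the failure mode the orchestrator most needs to avoid.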
+
+**Red flag:** "Share node_modules for efficiency" - Breaks isolation and causes race conditions.
+
+### Step 3.5: Extract Phase Context (Before Dispatching)
+
+**Before spawning subagents, extract phase boundaries from plan:**
+
+The orchestrator already parsed the plan in execute.md Step 1. Extract:
+- Current phase number and name
+- Tasks in THIS phase (what TO implement)
+- Tasks in LATER phases (what NOT to implement)
+
+**Format for subagent context:**
+```
+PHASE CONTEXT:
+- Phase {current-phase-id}/{total-phases}: {phase-name}
+- This phase includes: Task {task-ids-in-this-phase}
+
+LATER PHASES (DO NOT IMPLEMENT):
+- Phase {next-phase}: {phase-name} - {task-summary}
+- Phase {next+1}: {phase-name} - {task-summary}
+...
+
+If implementing work beyond this phase's tasks, STOP and report scope violation.
+```
+
+**Why critical:** Spec describes WHAT to build (entire feature). Plan describes HOW/WHEN (phase breakdown). Subagents need both to avoid scope creep.
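
The context block above can be assembled mechanically once the plan has been parsed. A minimal sketch, assuming the phase data is already available as plain strings; `render_phase_context` and the sample phase names are hypothetical:

```bash
# Hypothetical helper: render the PHASE CONTEXT block for a subagent prompt.
# $1 = current phase, $2 = total phases, $3 = phase name,
# $4 = task IDs in this phase, remaining args = one summary per later phase
render_phase_context() {
  local current="$1" total="$2" name="$3" tasks="$4" later
  shift 4
  printf 'PHASE CONTEXT:\n'
  printf -- '- Phase %s/%s: %s\n' "$current" "$total" "$name"
  printf -- '- This phase includes: Task %s\n' "$tasks"
  printf '\nLATER PHASES (DO NOT IMPLEMENT):\n'
  if [ "$#" -eq 0 ]; then
    printf -- '- (none: this is the final phase)\n'
  else
    for later in "$@"; do
      printf -- '- %s\n' "$later"
    done
  fi
  printf "\nIf implementing work beyond this phase's tasks, STOP and report scope violation.\n"
}

render_phase_context 2 3 "API Integration" "2.1" \
  "Phase 3: Checkout UI - consumes the API client"
```

Passing later phases as separate arguments keeps each summary on its own bullet, so the subagent sees an explicit list of work it must not start.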
+
+### Step 4: Dispatch Parallel Tasks
+
+**CRITICAL: Single message with multiple Task tool calls (true parallelism):**
+
+**Only dispatch for PENDING tasks** (from Step 1.5). Completed tasks already have branches and should not be re-executed.
+
+For each pending task, spawn subagent with embedded instructions (dispatch ALL in single message):
+```
+Task(Implement Task {task-id}: {task-name})
+
+ROLE: Implement Task {task-id} in isolated worktree (parallel phase)
+
+WORKTREE: .worktrees/{run-id}-task-{task-id}
+
+TASK: {task-name}
+FILES: {files-list}
+ACCEPTANCE CRITERIA: {criteria}
+
+PHASE BOUNDARIES:
+===== PHASE BOUNDARIES - CRITICAL =====
+
+Phase {current-phase-id}/{total-phases}: {phase-name}
+This phase includes ONLY: Task {task-ids-in-this-phase}
+
+DO NOT CREATE ANY FILES from later phases.
+
+Later phases (DO NOT CREATE):
+- Phase {next-phase}: {phase-name} - {task-summary}
+  ❌ NO implementation files
+  ❌ NO stub functions (even with TODOs)
+  ❌ NO type definitions or interfaces
+  ❌ NO test scaffolding or temporary code
+
+If tempted to create ANY file from later phases, STOP.
+"Not fully implemented" = violation.
+"Just types/stubs/tests" = violation.
+"Temporary/for testing" = violation.
+
+==========================================
+
+CONTEXT REFERENCES:
+- Spec: specs/{run-id}-{feature-slug}/spec.md
+- Constitution: docs/constitutions/current/
+- Plan: specs/{run-id}-{feature-slug}/plan.md
+- Worktree: .worktrees/{run-id}-task-{task-id}
+
+INSTRUCTIONS:
+
+1. Navigate to isolated worktree:
+   cd .worktrees/{run-id}-task-{task-id}
+
+2. Read constitution (if exists): docs/constitutions/current/
+
+3. Read feature specification: specs/{run-id}-{feature-slug}/spec.md
+
+   This provides:
+   - WHAT to build (requirements, user flows)
+   - WHY decisions were made (architecture rationale)
+   - HOW features integrate (system boundaries)
+
+   The spec is your source of truth for architectural decisions.
+   Constitution tells you HOW to code. Spec tells you WHAT to build.
+
+4. VERIFY PHASE SCOPE before implementing:
+   - Read the PHASE BOUNDARIES section above
+   - Confirm this task belongs to Phase {current-phase-id}
+   - If tempted to implement later phase work, STOP
+   - The plan exists for a reason - respect phase boundaries
+
+5. Implement task following spec + constitution + phase boundaries
+
+6. Run quality checks with exit code validation:
+
+   **CRITICAL**: Use heredoc to prevent bash parsing errors:
+
+   bash <<'EOF'
+   npm test
+   if [ $? -ne 0 ]; then
+     echo "❌ Tests failed"
+     exit 1
+   fi
+
+   npm run lint
+   if [ $? -ne 0 ]; then
+     echo "❌ Lint failed"
+     exit 1
+   fi
+
+   npm run build
+   if [ $? -ne 0 ]; then
+     echo "❌ Build failed"
+     exit 1
+   fi
+   EOF
+
+   **Why heredoc**: Prevents parsing errors when commands are wrapped by orchestrator.
+
+7. Create branch and detach HEAD using verification skill:
+
+   Skill: phase-task-verification
+
+   Parameters:
+   - RUN_ID: {run-id}
+   - TASK_ID: {phase}-{task}
+   - TASK_NAME: {short-name}
+   - COMMIT_MESSAGE: "[Task {phase}.{task}] {task-name}"
+   - MODE: parallel
+
+   The verification skill will:
+   a) Stage changes with git add .
+   b) Create branch with gs branch create
+   c) Detach HEAD with git switch --detach
+   d) Verify HEAD is detached (makes branch accessible in parent repo)
+
+8. Report completion
+
+CRITICAL:
+- Work in .worktrees/{run-id}-task-{task-id}, NOT main repo
+- Do NOT stay on branch - verification skill detaches HEAD
+- Do NOT create additional worktrees
+- Do NOT implement work from later phases (check PHASE BOUNDARIES above)
+```
+
+**Parallel dispatch:** All pending tasks dispatched in single message (true concurrency).
+
+**Red flags:**
+- "I'll just do it myself" - NO. Subagents provide fresh context.
+- "Execute sequentially in main worktree" - NO. Destroys parallelism.
+- "Spec mentions feature X, I'll implement it now" - NO. Check phase boundaries first.
+- "I'll run git add myself" - NO. Let subagent use phase-task-verification skill.
+
+### Step 5: Verify Completion (BEFORE Stacking)
+
+**Check ALL task branches exist AND have commits (includes both previously completed and newly created):**
+
+```bash
+COMPLETED_TASKS=()
+FAILED_TASKS=()
+
+# Get base commit to verify branches have new work
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+BASE_SHA=$(git rev-parse "$BASE_BRANCH")
+
+# Check ALL task IDs, not just pending - need to verify complete set exists
+for TASK_ID in {task-ids}; do
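+  # Use pattern matching to find branch (short-name varies).
+  # for-each-ref avoids the "* " / "+ " markers that `git branch` prints for checked-out branches.
+  BRANCH_PATTERN="{runid}-task-{phase-id}-${TASK_ID}-"
+  BRANCH_NAME=$(git for-each-ref --format='%(refname:short)' "refs/heads/${BRANCH_PATTERN}*" | head -n1)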
+
+  if [ -z "$BRANCH_NAME" ]; then
+    FAILED_TASKS+=("Task ${TASK_ID}: Branch not found")
+    continue
+  fi
+
+  # Verify branch has commits beyond base
+  BRANCH_SHA=$(git rev-parse "$BRANCH_NAME")
+  if [ "$BRANCH_SHA" = "$BASE_SHA" ]; then
+    FAILED_TASKS+=("Task ${TASK_ID}: Branch '$BRANCH_NAME' has no commits (still at base $BASE_SHA)")
+    continue
+  fi
+
+  COMPLETED_TASKS+=("Task ${TASK_ID}: $BRANCH_NAME @ $BRANCH_SHA")
+done
+
+if [ ${#FAILED_TASKS[@]} -gt 0 ]; then
+  echo "❌ Phase {phase-id} execution failed"
+  echo ""
+  echo "Completed tasks:"
+  for task in "${COMPLETED_TASKS[@]}"; do
+    echo "  ✅ $task"
+  done
+  echo ""
+  echo "Failed tasks:"
+  for task in "${FAILED_TASKS[@]}"; do
+    echo "  ❌ $task"
+  done
+  echo ""
+  echo "Common causes:"
+  echo "- Subagent failed to implement task (check output above)"
+  echo "- Quality checks blocked commit (test/lint/build failures)"
+  echo "- git add . found no changes (implementation missing)"
+  echo "- gs branch create failed (check git-spice errors)"
+  echo ""
+  echo "To resume:"
+  echo "1. Review subagent output above for failure details"
+  echo "2. Fix failed task(s) in .worktrees/{runid}-task-{task-id}"
+  echo "3. Run quality checks manually to verify fixes"
+  echo "4. Create branch manually: gs branch create {runid}-task-{phase-id}-{task-id}-{name} -m 'message'"
+  echo "5. Re-run /spectacular:execute to complete phase"
+  exit 1
+fi
+
+echo "✅ All {task-count} tasks completed with valid commits"
+```
+
+**Why verify:** Agents can fail. Quality checks can block commits. Verify branches exist before stacking.
+
+**Red flags:**
+- "Agents said success, skip check" - NO. Agent reports ≠ branch existence.
+- "Trust but don't verify" - NO. Verify preconditions.
+
+### Step 6: Stack Branches Linearly (BEFORE Cleanup)
+
+**Use loop-based algorithm for any N (orchestrator stays in main repo):**
+
+```bash
+# Stack branches in main worktree using heredoc (orchestrator doesn't cd)
+bash <<'EOF'
+cd .worktrees/{runid}-main
+
+# Get base branch (what parallel tasks should stack onto)
+BASE_BRANCH=$(git branch --show-current)
+
+# Ensure base branch is tracked before stacking onto it
+# (Sequential phases may have created branches without tracking)
+if ! gs branch track --show "$BASE_BRANCH" >/dev/null 2>&1; then
+  echo "⏺ Base branch not tracked yet, tracking now: $BASE_BRANCH"
+  git checkout "$BASE_BRANCH"
+  gs branch track
+fi
+
+TASK_BRANCHES=( {array-of-branch-names} )
+TASK_COUNT=${#TASK_BRANCHES[@]}
+
+# Handle N=1 edge case
+if [ $TASK_COUNT -eq 1 ]; then
+  git checkout "${TASK_BRANCHES[0]}"
+  gs branch track
+  gs upstack onto "$BASE_BRANCH"  # Explicitly set base for single parallel task
+else
+  # Handle N≥2
+  for i in "${!TASK_BRANCHES[@]}"; do
+    BRANCH="${TASK_BRANCHES[$i]}"
+
+    if [ $i -eq 0 ]; then
+      # First task: track + upstack onto base branch (from previous phase)
+      git checkout "$BRANCH"
+      gs branch track
+      gs upstack onto "$BASE_BRANCH"  # Connect to previous phase's work
+    else
+      # Subsequent: track + upstack onto previous
+      PREV_BRANCH="${TASK_BRANCHES[$((i-1))]}"
+      git checkout "$BRANCH"
+      gs branch track
+      gs upstack onto "$PREV_BRANCH"
+    fi
+  done
+fi
+
+# Leave main worktree on last branch for next phase continuity
+# Sequential phases will naturally stack on this branch
+
+# Display stack
+echo "📋 Stack after parallel phase:"
+gs log short
+echo ""
+
+# Verify stack correctness (catch duplicate commits)
+echo "🔍 Verifying stack integrity..."
+STACK_VALID=1
+declare -A SEEN_COMMITS  # associative array: requires bash 4+ (macOS ships bash 3.2 as /bin/bash)
+
+for BRANCH in "${TASK_BRANCHES[@]}"; do
+  BRANCH_SHA=$(git rev-parse "$BRANCH")
+
+  # Check if this commit SHA was already seen
+  if [ -n "${SEEN_COMMITS[$BRANCH_SHA]}" ]; then
+    echo "❌ ERROR: Stack integrity violation"
+    echo "   Branch '$BRANCH' points to commit $BRANCH_SHA"
+    echo "   But '${SEEN_COMMITS[$BRANCH_SHA]}' already points to that commit"
+    echo ""
+    echo "This means one of these branches has no unique commits."
+    echo "Possible causes:"
+    echo "- Subagent failed to commit work"
+    echo "- Quality checks blocked commit"
+    echo "- Branch creation succeeded but commit failed"
+    STACK_VALID=0
+    break
+  fi
+
+  SEEN_COMMITS[$BRANCH_SHA]="$BRANCH"
+  echo "  ✓ $BRANCH @ $BRANCH_SHA"
+done
+
+if [ $STACK_VALID -eq 0 ]; then
+  echo ""
+  echo "❌ Stack verification FAILED - preserving worktrees for debugging"
+  echo ""
+  echo "To investigate:"
+  echo "1. Check branch commits: git log --oneline $BRANCH"
+  echo "2. Check worktree state: ls -la .worktrees/"
+  echo "3. Review subagent output for failed task"
+  echo "4. Fix manually, then re-run /spectacular:execute"
+  exit 1
+fi
+
+echo "✅ Stack integrity verified - all branches have unique commits"
+EOF
+```
+
+**Why heredoc:** Orchestrator stays in main repo. Heredoc creates subshell that navigates to worktree and exits.
+
+**Why before cleanup:** Need worktrees accessible for debugging if stacking fails.
+
+**Why verify stack:** Catches duplicate commits (two branches pointing to the same SHA), which are a sign of missing work.
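
The integrity check boils down to "no two branch tips share a SHA". A minimal sketch of that rule using `awk` instead of a bash associative array, assuming the input arrives as `<branch> <sha>` lines; `find_duplicate_tip` and the sample SHAs are hypothetical:

```bash
# Hypothetical helper: read "<branch> <sha>" lines on stdin and report the
# first pair of branches that point at the same commit. Exit status is 1
# when a duplicate is found, 0 otherwise.
find_duplicate_tip() {
  awk '
    $2 in seen {
      printf "duplicate: %s and %s both point to %s\n", seen[$2], $1, $2
      dup = 1
      exit
    }
    { seen[$2] = $1 }
    END { exit (dup ? 1 : 0) }
  '
}

# In a real run each line could come from:
#   printf "%s %s\n" "$BRANCH" "$(git rev-parse "$BRANCH")"
printf '%s\n' \
  "abc123-task-3-1-user-profile 1111111" \
  "abc123-task-3-2-product-catalog 2222222" \
  "abc123-task-3-3-checkout-flow 1111111" |
  find_duplicate_tip || echo "stack integrity check failed"
```

Because the helper only reads text, it can be exercised without a repository, which makes the failure path easy to rehearse before a real stacking run.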
+
+**Red flag:** "Clean up first to free disk space" - NO. Stacking MUST happen first, and verification before cleanup.
+
+### Step 7: Clean Up Worktrees (AFTER Stacking)
+
+**IMPORTANT**: This step only runs if Step 5 verification and the Step 6 stack check both pass. If either fails, it exits with code 1, aborting the workflow, and the task worktrees are preserved for debugging.
+
+**Remove task worktrees:**
+
+```bash
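+for TASK_ID in {task-ids}; do
+  WORKTREE=".worktrees/{runid}-task-${TASK_ID}"
+  # Tasks completed in an earlier run may have no worktree left to remove
+  [ -d "$WORKTREE" ] && git worktree remove "$WORKTREE"
+done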
+
+# Verify cleanup
+git worktree list | grep "{runid}-task-"
+# Should be empty
+```
+
+**Why after stacking:** Branches must be stacked and verified before destroying evidence.
+
+**Why conditional**: Failed worktrees must be preserved so users can debug, fix, and manually create branches before resuming.
+
+### Step 8: Code Review (Binary Quality Gate)
+
+**Check review frequency setting (from execute.md Step 1.7):**
+
+```bash
+REVIEW_FREQUENCY=${REVIEW_FREQUENCY:-per-phase}
+```
+
+**If REVIEW_FREQUENCY is "end-only" or "skip":**
+```
+Skipping per-phase code review (frequency: {REVIEW_FREQUENCY})
+Phase {N} complete - proceeding to next phase
+```
+Mark phase complete and continue to next phase.
+
+**If REVIEW_FREQUENCY is "optimize":**
+
+Analyze the completed phase to decide if code review is needed:
+
+**High-risk indicators (REVIEW REQUIRED):**
+- Schema or migration changes
+- Authentication/authorization logic
+- External API integrations or webhooks
+- Foundation phases (Phase 1-2 establishing patterns)
+- 3+ parallel tasks (coordination complexity)
+- New architectural patterns introduced
+- Security-sensitive code (payment, PII, access control)
+- Complex business logic with multiple edge cases
+- Changes affecting multiple layers (database → API → UI)
+
+**Low-risk indicators (SKIP REVIEW):**
+- Pure UI component additions (no state/logic)
+- Documentation or comment updates
+- Test additions without implementation changes
+- Refactoring with existing test coverage
+- Isolated utility functions
+- Configuration file updates (non-security)
+
+**Analyze this phase:**
+- Phase number: {N}
+- Tasks completed in parallel: {task-list}
+- Files modified across tasks: {file-list}
+- Types of changes: {describe changes}
+
+**Decision:**
+If ANY high-risk indicator present → Proceed to code review below
+If ONLY low-risk indicators → Skip review:
+```
+✓ Phase {N} assessed as low-risk - skipping review (optimize mode)
+  Reasoning: {brief explanation of why low-risk}
+Phase {N} complete - proceeding to next phase
+```
+
+**If REVIEW_FREQUENCY is "per-phase" OR optimize mode decided to review:**
+
+Use `requesting-code-review` skill to call code-reviewer agent, then parse results STRICTLY:
+
+**CRITICAL - AUTONOMOUS EXECUTION (NO USER PROMPTS):**
+
+This is an automated execution workflow. Code review rejections trigger automatic fix loops, NOT user prompts.
+
+**NEVER ask user what to do, even if:**
+- Issues seem "architectural" or "require product decisions"
+- Scope creep with passing quality checks (implement less, not ask)
+- Multiple rejections (apply the escalation limit of 3 fix attempts instead of asking the user)
+- Uncertain how to fix (fix subagent figures it out with spec + constitution context)
+- Code works but violates plan (plan violation = failure, auto-fix to plan)
+
+**Autonomous execution means AUTONOMOUS.** User prompts break automation and violate this skill.
+
+1. **Dispatch code review:**
+   ```
+   Skill tool: requesting-code-review
+
+   Context provided to reviewer:
+   - WORKTREE: .worktrees/{runid}-main
+   - PHASE: {phase-number}
+   - TASKS: {task-list}
+   - BASE_BRANCH: {base-branch-name}
+   - SPEC: specs/{run-id}-{feature-slug}/spec.md
+   - PLAN: specs/{run-id}-{feature-slug}/plan.md (for phase boundary validation)
+
+   **CRITICAL - EXHAUSTIVE FIRST-PASS REVIEW:**
+
+   This is your ONLY opportunity to find issues. Re-review is for verifying fixes, NOT discovering new problems.
+
+   Check EVERYTHING in this single review:
+   □ Implementation correctness - logic bugs, edge cases, error handling, race conditions
+   □ Test correctness - expectations match actual behavior, coverage is complete, no false positives
+   □ Cross-file consistency - logic coherent across all files, no contradictions
+   □ Architectural soundness - follows patterns, proper separation of concerns, no coupling issues
+   □ Scope adherence - implements ONLY Phase {phase-number} work, no later-phase implementations
+   □ Constitution compliance - follows all project standards and conventions
+
+   Find ALL issues NOW. If you catch yourself thinking "I'll check that in re-review" - STOP. Check it NOW.
+
+   Binary verdict required: "Ready to merge? Yes" (only if EVERYTHING passes) or "Ready to merge? No" (list ALL issues found)
+   ```
+
+2. **Parse output using binary algorithm:**
+
+   Read the code review output and search for "Ready to merge?" field:
+
+   - ✅ **"Ready to merge? Yes"** → APPROVED
+     - Announce: "✅ Code review APPROVED - Phase {N} complete, proceeding"
+     - Continue to next phase
+
+   - ❌ **"Ready to merge? No"** → REJECTED
+     - STOP execution
+     - Report: "❌ Code review REJECTED - critical issues found"
+     - List all Critical and Important issues from review
+     - Dispatch fix subagent IMMEDIATELY (no user prompt, no questions)
+     - Go to step 5 (re-review after fixes)
+
+   - ❌ **"Ready to merge? With fixes"** → REJECTED
+     - STOP execution
+     - Report: "❌ Code review requires fixes before proceeding"
+     - List all issues from review
+     - Dispatch fix subagent IMMEDIATELY (no user prompt, no questions)
+     - Go to step 5 (re-review after fixes)
+
+   - ⚠️ **No output / empty response** → RETRY ONCE
+     - Warn: "⚠️ Code review returned no output - retrying once"
+     - This may be a transient issue (timeout, connection error)
+     - Go to step 3 (retry review)
+     - If retry ALSO has no output → FAILURE (go to step 4)
+
+   - ❌ **Soft language (e.g., "APPROVED WITH MINOR SUGGESTIONS")** → REJECTED
+     - STOP execution
+     - Report: "❌ Code review used soft language instead of binary verdict"
+     - Warn: "Binary gate requires explicit 'Ready to merge? Yes'"
+     - Go to step 3 (re-review)
+
+   - ⚠️ **Missing "Ready to merge?" field** → RETRY ONCE
+     - Warn: "⚠️ Code review output missing 'Ready to merge?' field - retrying once"
+     - This may be a transient issue (network glitch, model error)
+     - Go to step 3 (retry review)
+     - If retry ALSO missing field → FAILURE (go to step 4)
+
+3. **Retry review (if malformed output):**
+   - Dispatch `requesting-code-review` skill again with same parameters
+   - Parse retry output using step 2 binary algorithm
+   - If retry succeeds with "Ready to merge? Yes":
+     - Announce: "✅ Code review APPROVED (retry succeeded) - Phase {N} complete, proceeding"
+     - Continue to next phase
+   - If retry returns valid verdict (No/With fixes):
+     - Follow normal REJECTED flow (fix issues, re-review)
+   - If retry is ALSO malformed (no output, soft language, or missing "Ready to merge?" field):
+     - Go to step 4 (both attempts failed)
+
+4. **Both attempts malformed (FAILURE):**
+   - STOP execution immediately
+   - Report: "❌ Code review failed twice with malformed output"
+   - Display excerpts from both attempts for debugging
+   - Suggest: "Review agent may not be following template - check code-reviewer skill"
+   - DO NOT hallucinate issues from malformed text
+   - DO NOT dispatch fix subagents
+   - Fail execution
+
+5. **Re-review loop (if REJECTED with valid verdict):**
+
+   **Initialize iteration tracking:**
+   ```bash
+   REJECTION_COUNT=0
+   ```
+
+   **On each rejection:**
+   ```bash
+   REJECTION_COUNT=$((REJECTION_COUNT + 1))
+
+   # Check escalation limit
+   if [ $REJECTION_COUNT -gt 3 ]; then
+     echo "⚠️  Code review rejected $REJECTION_COUNT times"
+     echo ""
+     echo "Issues may require architectural changes beyond subagent scope."
+     echo "Reporting to user for manual intervention:"
+     echo ""
+     # Display all issues from latest review
+     # Suggest: Review architectural assumptions, may need spec revision
+     exit 1
+   fi
+
+   # Dispatch fix subagent
+   echo "🔧 Dispatching fix subagent to address issues (attempt $REJECTION_COUNT)..."
+
+   # Use Task tool to dispatch fix subagent:
+   Task(Fix Phase {N} code review issues)
+   Prompt: Fix the following issues found in Phase {N} code review:
+
+   {List all issues from review output with severity (Critical/Important/Minor) and file locations}
+
+   CONTEXT FOR FIXES:
+
+   1. Read constitution (if exists): docs/constitutions/current/
+
+   2. Read feature specification: specs/{run-id}-{feature-slug}/spec.md
+
+      The spec provides architectural context for fixes:
+      - WHY decisions were made (rationale for current implementation)
+      - HOW features should integrate (system boundaries)
+      - WHAT requirements must be met (acceptance criteria)
+
+   3. Read implementation plan: specs/{run-id}-{feature-slug}/plan.md
+
+      The plan provides phase boundaries and scope:
+      - WHEN to implement features (which phase owns what)
+      - WHAT tasks belong to Phase {N} (scope boundaries)
+      - WHAT tasks belong to later phases (do NOT implement)
+
+      **If scope creep detected (implemented work from later phases):**
+      - Roll back to Phase {N} scope ONLY
+      - Remove implementations that belong to later phases
+      - Keep ONLY the work defined in Phase {N} tasks
+      - The plan exists for a reason - respect phase boundaries
+
+   4. Apply fixes following spec + constitution + plan boundaries
+
+   CRITICAL: Work in .worktrees/{runid}-main
+   CRITICAL: Amend existing branch or add new commit (do NOT create new branch)
+   CRITICAL: Run all quality checks before completion (test, lint, build)
+   CRITICAL: Verify all issues resolved before reporting completion
+   CRITICAL: If scope creep, implement LESS not ask user what to keep
+
+   # After fix completes
+   echo "⏺ Re-reviewing Phase {N} after fixes (iteration $((REJECTION_COUNT + 1)))..."
+   # Return to step 1 (dispatch review again)
+   ```
+
+   **On approval after fixes:**
+   ```bash
+   echo "✅ Code review APPROVED (after $REJECTION_COUNT fix iteration(s)) - Phase {N} complete"
+   ```
+
+   **Escalation triggers:**
+   - After 3 failed fix attempts (i.e. on the 4th rejection, when `REJECTION_COUNT` exceeds 3): Stop and report to user
+   - Prevents infinite loops on unsolvable architectural problems
+   - User can review, adjust spec, or proceed manually
+
+**Critical:** Only "Ready to merge? Yes" allows proceeding. Everything else stops execution.
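
The binary gate in step 2 can be expressed as a small text classifier. A minimal sketch, assuming the verdict appears on a single plain-text line; `parse_verdict` is a hypothetical helper, and mapping soft language to `RETRY` is an interpretation of the "go to step 3" instruction rather than something the skill spells out:

```bash
# Hypothetical helper: map raw review text (stdin) to APPROVED, REJECTED,
# or RETRY. Anything other than an exact Yes / No / With fixes answer is
# treated as malformed.
parse_verdict() {
  local review verdict_line answer
  review=$(cat)
  # First line that carries the field; empty if absent or no output at all.
  verdict_line=$(printf '%s\n' "$review" | grep -m1 -F 'Ready to merge?' || true)
  if [ -z "$verdict_line" ]; then
    echo "RETRY"   # no output, soft language, or missing field
    return
  fi
  # Keep only the text after the field name, then normalize it.
  answer=${verdict_line#*"Ready to merge?"}
  answer=$(printf '%s' "$answer" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  case "$answer" in
    yes)          echo "APPROVED" ;;
    no|withfixes) echo "REJECTED" ;;
    *)            echo "RETRY" ;;   # e.g. "Yes, with minor suggestions"
  esac
}

printf 'Issues: none\nReady to merge? Yes\n' | parse_verdict      # prints APPROVED
printf 'APPROVED WITH MINOR SUGGESTIONS\n' | parse_verdict         # prints RETRY
```

Matching the normalized answer exactly, rather than searching for the word "yes" anywhere, is what keeps hedged approvals from slipping through the gate.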
+
+**Phase completion:**
+- If `REVIEW_FREQUENCY="per-phase"` (or `"optimize"` decided a review is required): Phase complete ONLY when:
+  - ✅ All branches created
+  - ✅ Linear stack verified
+  - ✅ Worktrees cleaned up
+  - ✅ Code review returns "Ready to merge? Yes"
+- If `REVIEW_FREQUENCY="end-only"` or `"skip"` (or `"optimize"` assessed the phase as low-risk): Phase complete when:
+  - ✅ All branches created
+  - ✅ Linear stack verified
+  - ✅ Worktrees cleaned up
+  - (Code review skipped)
+
+## Rationalization Table
+
+| Excuse | Reality |
+|--------|---------|
+| "Only 1 task, skip worktrees" | N=1 still uses parallel architecture. No special case. |
+| "Files don't overlap, skip isolation" | Worktrees are what enable parallelism; they are not merely a way to prevent conflicts. |
+| "Already spent 30min on setup" | Sunk cost fallacy. Worktrees ARE the parallel execution. |
+| "Simpler to execute sequentially" | Simplicity ≠ correctness. Parallel phase = worktrees. |
+| "Agents said success, skip verification" | Agent reports ≠ branch existence. Verify preconditions. |
+| "Disk space pressure, clean up first" | Stacking must happen before cleanup. No exceptions. |
+| "Git commands work from anywhere" | TRUE, but path resolution is CWD-relative. Verify location. |
+| "I'll just do it myself" | Subagents provide fresh context and true parallelism. |
+| "Worktrees are overhead" | Worktrees ARE the product. Parallelism is the value. |
+| "Review rejected, let me ask user what to do" | Autonomous execution means automatic fixes. No asking. |
+| "Issues are complex, user should decide" | Fix subagent handles complexity. That's the architecture. |
+| "Safer to get user input before fixing" | Re-review provides safety. Fix, review, repeat until clean. |
+| "Scope creep but quality passes, ask user to choose" | Plan violation = failure. Fix subagent removes extra scope automatically. |
+| "Work is done correctly, just ahead of schedule" | Phases exist for review isolation. Implement less, not merge early. |
+| "Spec mentions feature X, might as well implement now" | Spec = WHAT to build total. Plan = WHEN to build each piece. Check phase. |
+
+## Red Flags - STOP and Follow Process
+
+If you're thinking ANY of these, you're about to violate the skill:
+
+- "This is basically sequential with N=1"
+- "Files don't conflict, isolation unnecessary"
+- "Worktree creation takes too long"
+- "Already behind schedule, skip setup"
+- "Agents succeeded, no need to verify"
+- "Disk space warning, clean up now"
+- "Current directory looks right"
+- "Relative paths are cleaner"
+
+**All of these mean: STOP. Follow the process exactly.**
+
+## Common Mistakes
+
+### Mistake 1: Treating Parallel as "Logically Independent"
+
+**Wrong mental model:** "Parallel means tasks are independent, so I can execute them sequentially in one worktree."
+
+**Correct model:** "Parallel means tasks execute CONCURRENTLY via multiple subagents in isolated worktrees."
+
+**Impact:** Destroys parallelism. Turns 3-hour calendar time into 9-hour sequential execution.
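
The arithmetic behind that figure, as a sketch: sequential wall-clock time is the sum of the task durations, while parallel wall-clock time is bounded below by the longest task. The three 3-hour tasks are an assumption chosen to match the 3-hour versus 9-hour figure; worktree setup and stacking overhead are ignored, and the helper names are hypothetical:

```bash
# Hypothetical helpers: task durations are whole hours passed as arguments.
sequential_hours() {
  local total=0 h
  for h in "$@"; do
    total=$((total + h))
  done
  echo "$total"
}

parallel_hours() {
  local longest=0 h
  for h in "$@"; do
    if [ "$h" -gt "$longest" ]; then
      longest=$h
    fi
  done
  echo "$longest"
}

TASK_HOURS=(3 3 3)
SEQ=$(sequential_hours "${TASK_HOURS[@]}")
PAR=$(parallel_hours "${TASK_HOURS[@]}")
echo "sequential: ${SEQ}h, parallel: ${PAR}h, saved: $((SEQ - PAR))h"
```

With uneven tasks the saving shrinks toward the difference between the total and the longest task, which is why oversized tasks in a parallel phase erode the benefit.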
+
+### Mistake 2: Efficiency Optimization
+
+**Wrong mental model:** "Worktrees are overhead when files don't overlap."
+
+**Correct model:** "Worktrees are the architecture. Without them, no concurrent execution exists."
+
+**Impact:** Sequential execution disguised as parallel. No time savings.
+
+### Mistake 3: Cleanup Sequencing
+
+**Wrong mental model:** "Branches exist independently of worktrees, so cleanup order doesn't matter."
+
+**Correct model:** "Stacking before cleanup keeps the worktrees available for debugging if stacking fails, and lets integration tests run on the complete stack."
+
+**Impact:** Can't debug stacking failures. Premature cleanup destroys evidence.
+
+## Quick Reference
+
+**Mandatory sequence (no variations):**
+
+1. Verify location (main repo root)
+2. Create worktrees (every pending task, even when N=1)
+3. Install dependencies (per worktree)
+4. Spawn subagents (parallel dispatch)
+5. Verify branches exist (before stacking)
+6. Stack branches (before cleanup)
+7. Clean up worktrees (after stacking)
+8. Code review
+
+**Never skip. Never reorder. No exceptions.**
+
+## The Bottom Line
+
+**Parallel phases use worktrees.** Always. Even N=1. Even when files don't overlap. Even under pressure.
+
+If you're not creating worktrees, you're not executing parallel phases - you're executing sequential phases incorrectly labeled as parallel.
+
+The skill is the architecture. Follow it exactly.
--- a/skills/executing-parallel-phase/test-scenarios.md
+++ b/skills/executing-parallel-phase/test-scenarios.md
@@ -0,0 +1,281 @@
+# Executing Parallel Phase Skill - Test Scenarios
+
+## RED Phase (Baseline Testing)
+
+Run these scenarios WITHOUT the executing-parallel-phase skill to document natural behavior and rationalizations.
+
+### Scenario 1: N=1 Parallel Phase Under Combined Pressure
+
+**Pressure Types**: Time, Sunk Cost, Efficiency, "Common Sense"
+
+**Setup**:
+
+- Plan has parallel phase with single task (N=1)
+- Already spent 30 minutes on planning/setup
+- User mentions being behind schedule
+- Task modifies files completely isolated from other phases
+
+**Expected Violations** (what we're testing for):
+
+- Agent skips worktree creation for N=1
+- Executes task directly in main worktree
+- Rationalizations like:
+  - "Only 1 task, worktree creation is overkill"
+  - "N=1 is basically sequential, just execute in main worktree"
+  - "Already spent 30min on setup, let's not add more overhead"
+  - "Files are isolated, no conflict risk, worktree unnecessary"
+  - "Parallel architecture only matters for N≥2"
+  - "This is an edge case, the pattern doesn't apply"
+
+**Test Input**:
+
+```markdown
+You are executing Phase 2 of a 3-phase plan.
+
+## Phase 2 (Parallel) - API Integration
+
+**Strategy**: Parallel
+**Tasks**: 1
+
+### Task 2.1: Third-Party API Client
+
+**Files**:
+- src/lib/services/external-api-client.ts
+- src/lib/types/external-api.ts
+- tests/lib/services/external-api-client.test.ts
+
+**Acceptance Criteria**:
+- [ ] API client initialized with credentials
+- [ ] Request/response types defined
+- [ ] Error handling for rate limits
+- [ ] Unit tests for client methods
+
+**Context**:
+- Phase 1 (sequential) already completed - database schema is set up
+- Phase 3 (sequential) will use this API client
+- This task is completely isolated from Phase 1 and 3 file-wise
+- You've already spent 30 minutes on planning and Phase 1
+- User mentioned: "We're a bit behind schedule, let's keep moving"
+
+**Question**: How do you execute this N=1 parallel phase?
+```
+
+---
+
+### Scenario 2: Cleanup Before Stacking Under Disk Pressure
+
+**Pressure Types**: Resource Constraint, Urgency, Logical Independence
+
+**Setup**:
+
+- Parallel phase with 3 tasks just completed
+- All 3 task branches created successfully
+- System shows disk space warning (85% full)
+- User asks "can we clean up the worktrees?"
+- Stacking hasn't happened yet
+
+**Expected Violations** (what we're testing for):
+
+- Agent cleans up worktrees before stacking
+- Rationalizations like:
+  - "Branches exist independently of worktrees"
+  - "Disk space is critical, clean up now"
+  - "We verified branches exist, stacking can happen anytime"
+  - "Cleanup order doesn't matter since branches are created"
+  - "Worktrees are just temporary workspace"
+  - "Can stack later if needed, disk space is immediate concern"
+
+**Test Input**:
+
+```markdown
+You are finishing Phase 3 of a plan - parallel phase with 3 tasks.
+
+## Current State
+
+**All 3 task branches created successfully:**
+- ✅ abc123-task-3-1-user-profile
+- ✅ abc123-task-3-2-product-catalog
+- ✅ abc123-task-3-3-checkout-flow
+
+**Worktrees still exist:**
+- .worktrees/abc123-task-3-1/ (2.1 GB)
+- .worktrees/abc123-task-3-2/ (2.3 GB)
+- .worktrees/abc123-task-3-3/ (2.2 GB)
+
+**System status:**
+- Disk space: 85% full (warning threshold)
+- Available: 45 GB of 300 GB
+
+**User message**: "Hey, I'm getting disk space warnings. Can we clean up those task worktrees? They're taking up 6.6 GB."
+
+**Current step**: You've verified all branches exist. Next step in your plan was:
+1. Stack branches linearly
+2. Clean up worktrees
+
+**Question**: What do you do? Stack first or clean up first?
+```
+
+---
+
+## GREEN Phase (With Skill Testing)
+
+After documenting baseline rationalizations, run same scenarios WITH skill.
+
+**Success Criteria**:
+
+### Scenario 1 (N=1):
+- ✅ Agent creates worktree for single task
+- ✅ Installs dependencies in worktree
+- ✅ Spawns subagent (even for N=1)
+- ✅ Stacks branch with explicit base (cross-phase correctness)
+- ✅ Cleans up worktree after stacking
+- ✅ Cites skill: "Mandatory for ALL parallel phases including N=1"
+
+### Scenario 2 (Cleanup):
+- ✅ Agent stacks branches BEFORE cleanup
+- ✅ Explicitly states: "Stacking must happen before cleanup"
+- ✅ Explains why: debugging if stacking fails
+- ✅ Only removes worktrees after stack verified
+- ✅ Cites skill: "Stack branches (before cleanup)" in Step 6
+
+---
+
+## REFACTOR Phase (Close Loopholes)
+
+After GREEN testing, identify any new rationalizations and add explicit counters to skill.
+
+**Document**:
+
+- New rationalizations agents used
+- Specific language from agent responses
+- Where in skill to add counter
+
+**Update skill**:
+
+- Add rationalization to Rationalization Table
+- Add explicit prohibition if needed
+- Add a red-flag warning if it's an early warning sign
+
+---
+
+## Execution Instructions
+
+### Running RED Phase
+
+**For Scenario 1 (N=1):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load executing-parallel-phase skill
+3. Provide test input verbatim
+4. Ask: "How do you execute this N=1 parallel phase?"
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent skip worktrees? What reasons given?
+
+**For Scenario 2 (Cleanup):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load executing-parallel-phase skill
+3. Provide test input verbatim
+4. Ask: "What do you do? Stack first or clean up first?"
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent clean up before stacking? What reasons given?
+
+### Running GREEN Phase
+
+**For each scenario:**
+
+1. Create new conversation (fresh context)
+2. Load executing-parallel-phase skill with Skill tool
+3. Provide test input verbatim
+4. Add: "Use the executing-parallel-phase skill to guide your decision"
+5. Verify agent follows skill exactly
+6. Document any attempts to rationalize or shortcut
+7. Note: Did skill prevent violation? How explicitly?
+
+### Running REFACTOR Phase
+
+1. Compare RED and GREEN results
+2. Identify any new rationalizations in GREEN phase
+3. Check if skill counters them explicitly
+4. If not: Update skill with new counter
+5. Re-run GREEN to verify
+6. Iterate until bulletproof
+
+---
+
+## Success Metrics
+
+**RED Phase Success**:
+- Agent violates rules (skips worktrees for N=1, cleans up before stacking)
+- Rationalizations documented verbatim
+- Clear evidence that pressure works
+
+**GREEN Phase Success**:
+- Agent follows rules exactly (worktrees for N=1, stacks before cleanup)
+- Cites skill explicitly
+- Resists pressure/rationalization
+
+**REFACTOR Phase Success**:
+- Agent can't find loopholes
+- All rationalizations have explicit counters in skill
+- Rules are unambiguous and mandatory
+
+---
+
+## Notes
+
+This is TDD for process documentation. The test scenarios are the "test cases"; the skill is the "production code".
+
+Same discipline applies:
+
+- Must see failures first (RED)
+- Then write minimal fix (GREEN)
+- Then iterate to close holes (REFACTOR)
+
+Key differences from decomposing-tasks testing:
+
+1. **Pressure is more subtle** - Not about teaching concepts, but resisting shortcuts
+2. **Edge cases matter more** - N=1 and ordering are where violations happen
+3. **Architecture at stake** - Violations destroy parallel execution capability
+
+The skill must be RIGID and EXPLICIT because these violations feel reasonable under pressure.
+
+---
+
+## Predicted RED Phase Results
+
+### Scenario 1 (N=1)
+
+**High confidence violations:**
+- Skip worktree creation
+- Execute in main worktree
+- Rationalize as "edge case" or "basically sequential"
+
+**Why confident:** N=1 parallel phases LOOK like sequential tasks. The worktree overhead feels excessive. Sunk cost + time pressure make shortcuts tempting.
+
+### Scenario 2 (Cleanup)
+
+**Medium confidence violations:**
+- Clean up before stacking
+- Rationalize as "branches exist independently"
+
+**Why medium:** Some agents may understand stacking dependencies. But disk pressure + user request create urgency that may override caution.
+
+**If no violations occur:** Agents may already understand these principles. Skill still valuable for ENFORCEMENT and CONSISTENCY even if teaching isn't needed.
+
+---
+
+## Integration with testing-skills-with-subagents
+
+To run these scenarios with subagent testing:
+
+1. Create test fixture with scenario content
+2. Spawn RED subagent WITHOUT skill loaded
+3. Spawn GREEN subagent WITH skill loaded
+4. Compare outputs and document rationalizations
+5. Update skill based on findings
+6. Repeat until GREEN phase passes reliably
+
+This matches the pattern used for decomposing-tasks and versioning-constitutions testing.
--- a/skills/executing-sequential-phase/SKILL.md
+++ b/skills/executing-sequential-phase/SKILL.md
@@ -0,0 +1,463 @@
+---
+name: executing-sequential-phase
+description: Use when orchestrating sequential phases in plan execution - executes tasks one-by-one in main worktree using git-spice natural stacking (NO manual upstack commands, NO worktree creation, tasks build on each other)
+---
+
+# Executing Sequential Phase
+
+## Overview
+
+**Sequential phases use natural git-spice stacking in the main worktree.**
+
+Each task creates a branch with `gs branch create`, which automatically stacks on the current HEAD. No manual stacking operations needed.
+
+**Critical distinction:** Sequential tasks BUILD ON each other. They need integration, not isolation.
+
+## When to Use
+
+Use this skill when `execute` command encounters a phase marked "Sequential" in plan.md:
+- ✅ Tasks must run in order (dependencies)
+- ✅ Execute in existing `{runid}-main` worktree
+- ✅ Trust natural stacking (no manual `gs upstack onto`)
+- ✅ Stay on task branches (don't switch to base between tasks)
+
+**Sequential phases never create additional worktrees.** They share the existing `{runid}-main` worktree, where tasks build cumulatively.
+
+## The Natural Stacking Principle
+
+```
+SEQUENTIAL PHASE = MAIN WORKTREE + NATURAL STACKING
+```
+
+**What natural stacking means:**
+1. Start on base branch (or previous task's branch)
+2. Create new branch with `gs branch create` → automatically stacks on current
+3. Stay on that branch when done
+4. Next task creates from there → automatically stacks on previous
+
+**No manual commands needed.** The workflow IS the stacking.
+
+## The Process
+
+**Announce:** "I'm using executing-sequential-phase to execute {N} tasks sequentially in Phase {phase-id}."
+
+### Step 0: Verify Orchestrator Location
+
+**MANDATORY: Verify orchestrator is in main repo root before any operations:**
+
+```bash
+REPO_ROOT=$(git rev-parse --show-toplevel)
+CURRENT=$(pwd)
+
+if [ "$CURRENT" != "$REPO_ROOT" ]; then
+  echo "❌ Error: Orchestrator must run from main repo root"
+  echo "Current: $CURRENT"
+  echo "Expected: $REPO_ROOT"
+  echo ""
+  echo "Return to main repo: cd $REPO_ROOT"
+  exit 1
+fi
+
+echo "✅ Orchestrator location verified: Main repo root"
+```
+
+**Why critical:**
+- Orchestrator delegates work but never changes directory
+- All operations use `git -C .worktrees/path` or `bash -c "cd path && cmd"`
+- This assertion catches upstream drift immediately
+
+### Step 1: Verify Setup and Base Branch
+
+**First, verify we're on the correct base branch for this phase:**
+
+```bash
+# Get current branch in main worktree
+CURRENT_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+EXPECTED_BASE="{expected-base-branch}"  # From plan: previous phase's last task, or {runid}-main for Phase 1
+
+if [ "$CURRENT_BRANCH" != "$EXPECTED_BASE" ]; then
+  echo "⚠️  WARNING: Phase {phase-id} starting from unexpected branch"
+  echo "   Current: $CURRENT_BRANCH"
+  echo "   Expected: $EXPECTED_BASE"
+  echo ""
+  echo "This means the previous phase ended on the wrong branch."
+  echo "Possible causes:"
+  echo "- Code review or quality checks switched branches"
+  echo "- User manually checked out different branch"
+  echo "- Resume from interrupted execution"
+  echo ""
+  echo "To fix:"
+  echo "1. Verify previous phase completed: git log --oneline $EXPECTED_BASE"
+  echo "2. Switch to correct base: cd .worktrees/{runid}-main && git checkout $EXPECTED_BASE"
+  echo "3. Re-run /spectacular:execute"
+  exit 1
+fi
+
+echo "✅ Phase {phase-id} starting from correct base: $CURRENT_BRANCH"
+```
+
+**Then check and install dependencies from main repo (orchestrator never cd's):**
+
+```bash
+# Check if dependencies installed in main worktree
+if [ ! -d .worktrees/{runid}-main/node_modules ]; then
+  echo "Installing dependencies in main worktree..."
+  bash <<'EOF'
+cd .worktrees/{runid}-main
+{install-command}
+{postinstall-command}
+EOF
+fi
+```
+
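+**Why heredoc:** Orchestrator stays in main repo. `bash <<'EOF'` runs the commands in a child shell, so the `cd` never changes the orchestrator's working directory. The `EOF` terminator must start at the beginning of its line.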
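+
+A minimal sketch of that isolation, using `/tmp` as a stand-in directory (an assumption for illustration, not a path the skill uses): the `cd` happens inside the child `bash` process, so the caller's working directory is unchanged afterwards.
+
+```bash
+# Record where the caller (the "orchestrator") is before delegating.
+before=$(pwd)
+
+# The heredoc body runs in a child bash process.
+bash <<'EOF'
+cd /tmp
+echo "child shell is in: $(pwd)"
+EOF
+
+# Back in the caller: the working directory is the same as before.
+after=$(pwd)
+echo "caller is still in: $after"
+```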
+
+**Why main worktree:** The `{runid}-main` worktree was created during spec generation. All sequential phases share this worktree.
+
+**Red flag:** "Create phase-specific worktree" - NO. Sequential = shared worktree.
+
+### Step 1.5: Extract Phase Context (Before Dispatching)
+
+**Before spawning subagents, extract phase boundaries from plan:**
+
+The orchestrator already parsed the plan in execute.md Step 1. Extract:
+- Current phase number and name
+- Tasks in THIS phase (what TO implement)
+- Tasks in LATER phases (what NOT to implement)
+
+**Format for subagent context:**
+```
+PHASE CONTEXT:
+- Phase {current-phase-id}/{total-phases}: {phase-name}
+- This phase includes: Task {task-ids-in-this-phase}
+
+LATER PHASES (DO NOT IMPLEMENT):
+- Phase {next-phase}: {phase-name} - {task-summary}
+- Phase {next+1}: {phase-name} - {task-summary}
+...
+
+If implementing work beyond this phase's tasks, STOP and report scope violation.
+```
+
+**Why critical:** Spec describes WHAT to build (entire feature). Plan describes HOW/WHEN (phase breakdown). Subagents need both to avoid scope creep.
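+
+The context format above could be rendered by a small helper. This is only a sketch: `render_phase_context` is a hypothetical name, and the argument order (phase id, total phases, phase name, task ids, then one summary line per later phase) is an assumption for illustration.
+
+```bash
+# Hypothetical helper: print the PHASE CONTEXT block for a subagent prompt.
+# Args: current phase id, total phases, phase name, task ids in this phase,
+# followed by zero or more "later phase" summary lines.
+render_phase_context() {
+  local current="$1" total="$2" name="$3" tasks="$4"
+  shift 4
+  printf 'PHASE CONTEXT:\n'
+  printf -- '- Phase %s/%s: %s\n' "$current" "$total" "$name"
+  printf -- '- This phase includes: Task %s\n' "$tasks"
+  printf '\nLATER PHASES (DO NOT IMPLEMENT):\n'
+  local later
+  for later in "$@"; do
+    printf -- '- %s\n' "$later"
+  done
+  printf "\nIf implementing work beyond this phase's tasks, STOP and report scope violation.\n"
+}
+```
+
+The orchestrator would paste the helper's output into each subagent prompt, so every task in the phase sees the same boundaries.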
+
+### Step 2: Execute Tasks Sequentially
+
+**For each task in order, spawn ONE subagent with embedded instructions:**
+
+```
+Task(Implement Task {task-id}: {task-name})
+
+ROLE: Implement Task {task-id} in main worktree (sequential phase)
+
+WORKTREE: .worktrees/{run-id}-main
+CURRENT BRANCH: {current-branch}
+
+TASK: {task-name}
+FILES: {files-list}
+ACCEPTANCE CRITERIA: {criteria}
+
+PHASE BOUNDARIES:
+===== PHASE BOUNDARIES - CRITICAL =====
+
+Phase {current-phase-id}/{total-phases}: {phase-name}
+This phase includes ONLY: Task {task-ids-in-this-phase}
+
+DO NOT CREATE ANY FILES from later phases.
+
+Later phases (DO NOT CREATE):
+- Phase {next-phase}: {phase-name} - {task-summary}
+  ❌ NO implementation files
+  ❌ NO stub functions (even with TODOs)
+  ❌ NO type definitions or interfaces
+  ❌ NO test scaffolding or temporary code
+
+If tempted to create ANY file from later phases, STOP.
+"Not fully implemented" = violation.
+"Just types/stubs/tests" = violation.
+"Temporary/for testing" = violation.
+
+==========================================
+
+CONTEXT REFERENCES:
+- Spec: specs/{run-id}-{feature-slug}/spec.md
+- Constitution: docs/constitutions/current/
+- Plan: specs/{run-id}-{feature-slug}/plan.md
+- Worktree: .worktrees/{run-id}-main
+
+INSTRUCTIONS:
+
+1. Navigate to main worktree:
+   cd .worktrees/{run-id}-main
+
+2. Read constitution (if exists): docs/constitutions/current/
+
+3. Read feature specification: specs/{run-id}-{feature-slug}/spec.md
+
+   This provides:
+   - WHAT to build (requirements, user flows)
+   - WHY decisions were made (architecture rationale)
+   - HOW features integrate (system boundaries)
+
+   The spec is your source of truth for architectural decisions.
+   Constitution tells you HOW to code. Spec tells you WHAT to build.
+
+4. VERIFY PHASE SCOPE before implementing:
+   - Read the PHASE BOUNDARIES section above
+   - Confirm this task belongs to Phase {current-phase-id}
+   - If tempted to implement later phase work, STOP
+   - The plan exists for a reason - respect phase boundaries
+
+5. Implement task following spec + constitution + phase boundaries
+
+6. Run quality checks with exit code validation:
+
+   **CRITICAL**: Use heredoc to prevent bash parsing errors:
+
+   bash <<'EOF'
+   npm test
+   if [ $? -ne 0 ]; then
+     echo "❌ Tests failed"
+     exit 1
+   fi
+
+   npm run lint
+   if [ $? -ne 0 ]; then
+     echo "❌ Lint failed"
+     exit 1
+   fi
+
+   npm run build
+   if [ $? -ne 0 ]; then
+     echo "❌ Build failed"
+     exit 1
+   fi
+   EOF
+
+   **Why heredoc**: Prevents parsing errors when commands are wrapped by orchestrator.
+
+7. Create stacked branch using verification skill:
+
+   Skill: phase-task-verification
+
+   Parameters:
+   - RUN_ID: {run-id}
+   - TASK_ID: {phase}-{task}
+   - TASK_NAME: {short-name}
+   - COMMIT_MESSAGE: "[Task {phase}.{task}] {task-name}"
+   - MODE: sequential
+
+   The verification skill will:
+   a) Stage changes with git add .
+   b) Create branch with gs branch create
+   c) Verify HEAD points to new branch
+   d) Stay on branch (next task builds on it)
+
+8. Report completion
+
+CRITICAL:
+- Work in .worktrees/{run-id}-main, NOT main repo
+- Stay on your branch when done (next task builds on it)
+- Do NOT create worktrees
+- Do NOT use `gs upstack onto`
+- Do NOT implement work from later phases (check PHASE BOUNDARIES above)
+```
+
+**Sequential dispatch:** Wait for each task to complete before starting next.
+
+**Red flags:**
+- "Dispatch all tasks in parallel" - NO. Sequential = one at a time.
+- "Create task-specific worktrees" - NO. Sequential = shared worktree.
+- "Spec mentions feature X, I'll implement it now" - NO. Check phase boundaries first.
+- "I'll run git add myself" - NO. Let subagent use phase-task-verification skill.
+
+### Step 3: Verify Natural Stack Formation
+
+**After all tasks complete (verify from main repo):**
+
+```bash
+# Display and verify stack using bash subshell (orchestrator stays in main repo)
+bash <<'EOF'
+cd .worktrees/{runid}-main
+
+echo "📋 Stack after sequential phase:"
+gs log short
+echo ""
+
+# Verify stack integrity (each task has unique commit)
+echo "🔍 Verifying stack integrity..."
+TASK_BRANCHES=( {array-of-branch-names} )
+STACK_VALID=1
+declare -A SEEN_COMMITS
+
+for BRANCH in "${TASK_BRANCHES[@]}"; do
+  if ! git rev-parse --verify "$BRANCH" >/dev/null 2>&1; then
+    echo "❌ ERROR: Branch '$BRANCH' not found"
+    STACK_VALID=0
+    break
+  fi
+
+  BRANCH_SHA=$(git rev-parse "$BRANCH")
+
+  # Check if this commit SHA was already seen
+  if [ -n "${SEEN_COMMITS[$BRANCH_SHA]}" ]; then
+    echo "❌ ERROR: Stack integrity violation"
+    echo "   Branch '$BRANCH' points to commit $BRANCH_SHA"
+    echo "   But '${SEEN_COMMITS[$BRANCH_SHA]}' already points to that commit"
+    echo ""
+    echo "This means one task created no new commits."
+    echo "Possible causes:"
+    echo "- Task implementation had no changes"
+    echo "- Quality checks blocked commit"
+    echo "- gs branch create failed silently"
+    STACK_VALID=0
+    break
+  fi
+
+  SEEN_COMMITS[$BRANCH_SHA]="$BRANCH"
+  echo "  ✓ $BRANCH @ $BRANCH_SHA"
+done
+
+if [ $STACK_VALID -eq 0 ]; then
+  echo ""
+  echo "❌ Stack verification FAILED"
+  echo ""
+  echo "To investigate:"
+  echo "1. Check task branch commits: git log --oneline \$BRANCH"
+  echo "2. Review subagent output for failed task"
+  echo "3. Check for quality check failures (test/lint/build)"
+  echo "4. Fix and re-run /spectacular:execute"
+  exit 1
+fi
+
+echo "✅ Stack integrity verified - all tasks have unique commits"
+EOF
+```
+
+**Each `gs branch create` automatically stacked on the previous task's branch.**
+
+**Verification ensures:** Each task created a unique commit (no empty branches or duplicates).
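+
+The integrity check above relies on `declare -A`, which requires Bash 4.0 or newer. Where only an older Bash is available (macOS ships 3.2 as `/bin/bash`), the duplicate detection could be sketched with `sort` and `uniq` instead. `duplicate_shas` is a hypothetical helper name, and the "branch sha" input format is an assumption for illustration.
+
+```bash
+# Hypothetical helper: read "branch sha" pairs on stdin and print every
+# commit SHA that more than one branch points to (empty output = stack OK).
+duplicate_shas() {
+  awk '{ print $2 }' | sort | uniq -d
+}
+
+# Possible wiring, assuming TASK_BRANCHES is set as in the check above:
+#   for b in "${TASK_BRANCHES[@]}"; do
+#     echo "$b $(git rev-parse "$b")"
+#   done | duplicate_shas
+```
+
+Any output means at least two task branches share a commit, which is the same failure the associative-array version reports.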
+
+**Red flag:** "Run `gs upstack onto` to ensure stacking" - NO. Already stacked naturally.
+
+### Step 4: Code Review (Binary Quality Gate)
+
+**Check review frequency setting (from execute.md Step 1.7):**
+
+```bash
+REVIEW_FREQUENCY=${REVIEW_FREQUENCY:-per-phase}
+```
+
+**If REVIEW_FREQUENCY is "end-only" or "skip":**
+```
+Skipping per-phase code review (frequency: {REVIEW_FREQUENCY})
+Phase {N} complete - proceeding to next phase
+```
+Mark phase complete and continue to next phase.
+
+**If REVIEW_FREQUENCY is "optimize":**
+
+Analyze the completed phase to decide if code review is needed:
+
+**High-risk indicators (REVIEW REQUIRED):**
+- Schema or migration changes
+- Authentication/authorization logic
+- External API integrations or webhooks
+- Foundation phases (Phase 1-2 establishing patterns)
+- 3+ parallel tasks (coordination complexity)
+- New architectural patterns introduced
+- Security-sensitive code (payment, PII, access control)
+- Complex business logic with multiple edge cases
+- Changes affecting multiple layers (database → API → UI)
+
+**Low-risk indicators (SKIP REVIEW):**
+- Pure UI component additions (no state/logic)
+- Documentation or comment updates
+- Test additions without implementation changes
+- Refactoring with existing test coverage
+- Isolated utility functions
+- Configuration file updates (non-security)
+
+**Analyze this phase:**
+- Phase number: {N}
+- Tasks completed: {task-list}
+- Files modified: {file-list}
+- Types of changes: {describe changes}
+
+**Decision:**
+If ANY high-risk indicator present → Proceed to code review below
+If ONLY low-risk indicators → Skip review:
+```
+✓ Phase {N} assessed as low-risk - skipping review (optimize mode)
+  Reasoning: {brief explanation of why low-risk}
+Phase {N} complete - proceeding to next phase
+```
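+
+The frequency handling in this step amounts to a small dispatch. A sketch, assuming a hypothetical `review_action` helper and a yes/no `high_risk` flag that the orchestrator sets after checking the indicator lists; falling back to a review for unknown values is an assumption, not something the skill specifies.
+
+```bash
+# Hypothetical helper: decide whether to run the per-phase code review.
+# Args: review frequency (default: per-phase), high-risk flag "yes"/"no" (default: no).
+review_action() {
+  local frequency="${1:-per-phase}" high_risk="${2:-no}"
+  case "$frequency" in
+    end-only|skip) echo skip ;;
+    optimize)
+      # Any high-risk indicator forces a review; otherwise skip it.
+      if [ "$high_risk" = yes ]; then echo review; else echo skip; fi
+      ;;
+    per-phase) echo review ;;
+    *) echo review ;;   # assumption: unknown values take the safest path
+  esac
+}
+```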
+
+**If REVIEW_FREQUENCY is "per-phase" OR optimize mode decided to review:**
+
+Use `requesting-code-review` skill, then parse results STRICTLY.
+
+**AUTONOMOUS EXECUTION:** Code review rejections trigger automatic fix loops, NOT user prompts. Never ask user what to do; the only exception is the escalation after repeated rejections in step 3.
+
+1. **Dispatch code review:**
+   ```
+   Skill: requesting-code-review
+
+   Context provided to reviewer:
+   - WORKTREE: .worktrees/{runid}-main
+   - PHASE: {phase-number}
+   - TASKS: {task-list}
+   - BASE_BRANCH: {base-branch-name}
+   - SPEC: specs/{run-id}-{feature-slug}/spec.md
+   - PLAN: specs/{run-id}-{feature-slug}/plan.md (for phase boundary validation)
+
+   **CRITICAL - EXHAUSTIVE FIRST-PASS REVIEW:**
+
+   This is your ONLY opportunity to find issues. Re-review is for verifying fixes, NOT discovering new problems.
+
+   Check EVERYTHING in this single review:
+   □ Implementation correctness - logic bugs, edge cases, error handling, race conditions
+   □ Test correctness - expectations match actual behavior, coverage is complete, no false positives
+   □ Cross-file consistency - logic coherent across all files, no contradictions
+   □ Architectural soundness - follows patterns, proper separation of concerns, no coupling issues
+   □ Scope adherence - implements ONLY Phase {phase-number} work, no later-phase implementations
+   □ Constitution compliance - follows all project standards and conventions
+
+   Find ALL issues NOW. If you catch yourself thinking "I'll check that in re-review" - STOP. Check it NOW.
+
+   Binary verdict required: "Ready to merge? Yes" (only if EVERYTHING passes) or "Ready to merge? No" (list ALL issues found)
+   ```
+
+2. **Parse "Ready to merge?" field:**
+   - **"Yes"** → APPROVED, continue to next phase
+   - **"No"** or **"With fixes"** → REJECTED, dispatch fix subagent, go to step 3
+   - **No output / missing field** → RETRY ONCE, if retry fails → STOP
+   - **Soft language** (any verdict other than an explicit "Yes") → REJECTED, re-review required
+
+3. **Re-review loop (if REJECTED):**
+   - Track rejections (REJECTION_COUNT)
+   - If count > 3: Escalate to user (architectural issues beyond subagent scope)
+   - Dispatch fix subagent with:
+     * Issues list (severity + file locations)
+     * Context: constitution, spec, plan
+     * Scope enforcement: If scope creep, implement LESS (roll back to phase scope)
+     * Quality checks required
+   - Re-review after fixes (return to step 1)
+   - On approval: Announce completion with iteration count
+
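+**Critical:** Only "Ready to merge? Yes" allows proceeding to the next phase. Every other result blocks the phase: rejections enter the fix loop, and a missing verdict after one retry stops execution.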
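+
+The strict parsing in step 2 and the escalation rule in step 3 could look roughly like this. Both function names are hypothetical, and the sketch assumes the reviewer output contains a single line with the "Ready to merge?" field; the retry-once limit for a missing verdict is left to the caller.
+
+```bash
+# Hypothetical helper: classify reviewer output by its "Ready to merge?" field.
+# Prints APPROVED, REJECTED, or MISSING.
+parse_verdict() {
+  local review="$1" line answer
+  # Find the first line carrying the field; no such line means MISSING.
+  line=$(printf '%s\n' "$review" | grep -F -i -m1 'Ready to merge?') || { echo MISSING; return 0; }
+  # Keep the text after the question mark, drop markdown emphasis and colons,
+  # trim surrounding whitespace, and lowercase it.
+  answer=${line#*\?}
+  answer=$(printf '%s' "$answer" | tr -d '*:' \
+    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | tr '[:upper:]' '[:lower:]')
+  case "$answer" in
+    yes) echo APPROVED ;;
+    '')  echo MISSING ;;
+    *)   echo REJECTED ;;   # "No", "With fixes", and any soft language
+  esac
+}
+
+# Hypothetical helper: choose the next action from a verdict and the number of
+# rejections so far (escalate once the count exceeds 3).
+next_step() {
+  local verdict="$1" rejections="$2"
+  case "$verdict" in
+    APPROVED) echo continue ;;
+    MISSING)  echo retry-review ;;
+    *)        if [ "$rejections" -gt 3 ]; then echo escalate; else echo fix; fi ;;
+  esac
+}
+```
+
+Matching only an exact "yes" keeps the gate binary: a hedged answer such as "Yes, mostly" is classified as REJECTED.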
+
+**Phase completion:**
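+- If `REVIEW_FREQUENCY="per-phase"`: Phase complete ONLY when code review returns "Ready to merge? Yes"
+- If `REVIEW_FREQUENCY="optimize"`: Phase complete when the phase is assessed as low-risk (review skipped) or when code review returns "Ready to merge? Yes"
+- If `REVIEW_FREQUENCY="end-only"` or `"skip"`: Phase complete after all tasks finish (code review skipped)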
+
+## Rationalization Table
+
+| Excuse | Reality |
+|--------|---------|
+| "Need manual stacking commands" | `gs branch create` stacks automatically on current HEAD |
+| "Files don't overlap, could parallelize" | Plan says sequential for semantic dependencies |
+| "Create phase-specific worktree" | Sequential phases share main worktree |
+| "Review rejected, ask user" | Autonomous execution means automatic fixes |
+| "Scope creep but quality passes" | Plan violation = failure. Auto-fix to plan |
+
--- a/skills/executing-sequential-phase/test-scenarios.md
+++ b/skills/executing-sequential-phase/test-scenarios.md
@@ -0,0 +1,301 @@
+# Executing Sequential Phase Skill - Test Scenarios
+
+## RED Phase (Baseline Testing)
+
+Run these scenarios WITHOUT the executing-sequential-phase skill to document natural behavior and rationalizations.
+
+### Scenario 1: Manual Stacking Urge Under "Safety" Pressure
+
+**Pressure Types**: Safety, Explicitness, Control, "Best Practices"
+
+**Setup**:
+
+- Sequential phase with 3 tasks
+- Agent is experienced with git (knows about explicit base setting)
+- Tasks have clear dependencies (task-2 needs task-1, task-3 needs task-2)
+- User mentions "make sure the stack is correct"
+
+**Expected Violations** (what we're testing for):
+
+- Agent adds `gs upstack onto` after each `gs branch create`
+- Rationalizations like:
+  - "Need explicit stacking to ensure correctness"
+  - "Manual `gs upstack onto` confirms relationships"
+  - "Automatic stacking might make mistakes"
+  - "Better to be explicit than rely on implicit behavior"
+  - "This gives me more control over the stack"
+  - "User wants correct stack, manual commands ensure it"
+
+**Test Input**:
+
+```markdown
+You are executing Phase 2 of a plan - sequential phase with 3 tasks.
+
+## Phase 2 (Sequential) - Database Layer
+
+**Strategy**: Sequential
+**Tasks**: 3
+
+### Task 2.1: Database Schema
+
+**Files**:
+- prisma/schema.prisma
+- prisma/migrations/001_initial_schema.sql
+
+**Acceptance Criteria**:
+- [ ] User, Product, Order tables defined
+- [ ] Relationships configured
+- [ ] Migration generated and tested
+
+### Task 2.2: Database Client
+
+**Files**:
+- src/lib/db/client.ts
+- src/lib/db/types.ts
+
+**Dependencies**: Task 2.1 (needs schema)
+
+**Acceptance Criteria**:
+- [ ] Prisma client initialized
+- [ ] Type-safe query helpers
+- [ ] Connection pooling configured
+
+### Task 2.3: Repository Layer
+
+**Files**:
+- src/lib/repositories/user-repository.ts
+- src/lib/repositories/product-repository.ts
+- src/lib/repositories/order-repository.ts
+
+**Dependencies**: Task 2.2 (needs client)
+
+**Acceptance Criteria**:
+- [ ] CRUD operations for each entity
+- [ ] Transaction support
+- [ ] Error handling
+
+**Context**:
+- Phase 1 completed successfully (environment setup)
+- Currently in .worktrees/abc123-main/ worktree
+- Currently on branch: abc123-task-1-3-env-config (last task from Phase 1)
+- User mentioned: "Make sure the stack is correct - these need to build on each other"
+
+**Question**: How do you execute these 3 sequential tasks? Provide exact git-spice commands.
+```
+
+---
+
+### Scenario 2: Switching to Base Between Tasks for "Clean State"
+
+**Pressure Types**: Cleanliness, Safety, Isolation, "Professional Workflow"
+
+**Setup**:
+
+- Sequential phase with 3 tasks
+- Build artifacts exist from previous task (node_modules, .next, etc.)
+- Agent wants "clean slate" for each task
+- Files from previous tasks are still in working directory
+
+**Expected Violations** (what we're testing for):
+
+- Agent switches back to base branch between tasks
+- Rationalizations like:
+  - "Return to base branch for clean state"
+  - "Each task should start from fresh workspace"
+  - "Build artifacts might interfere with next task"
+  - "Professional workflow: start each task from known base"
+  - "Clean up working directory between tasks"
+  - "Git best practice: branch from base, not from feature branches"
+
+**Test Input**:
+
+```markdown
+You are executing Phase 3 of a plan - sequential phase with 3 tasks.
+
+## Current State
+
+**Just completed Task 3.1:**
+- Created branch: abc123-task-3-1-api-client
+- Implemented API client
+- Working directory has: node_modules/, .next/, src/lib/services/api-client.ts
+
+**Currently on branch:** abc123-task-3-1-api-client
+
+**Next task to execute:**
+
+### Task 3.2: API Integration Layer
+
+**Files**:
+- src/lib/integrations/api-integration.ts
+- src/lib/integrations/types.ts
+
+**Dependencies**: Task 3.1 (needs API client)
+
+**Acceptance Criteria**:
+- [ ] Integration layer wraps API client
+- [ ] Error handling and retries
+- [ ] Request/response transformations
+
+**Context**:
+- Working directory has build artifacts from Task 3.1
+- node_modules/ (2.3 GB), .next/ (400 MB), various compiled files
+- User mentioned: "Keep the workspace clean between tasks"
+
+**Question**: You're about to start Task 3.2. What git-spice commands do you run? Do you switch branches first?
+```
+
+---
+
+## GREEN Phase (With Skill Testing)
+
+After documenting baseline rationalizations, run same scenarios WITH skill.
+
+**Success Criteria**:
+
+### Scenario 1 (Manual Stacking):
+- ✅ Agent uses ONLY `gs branch create` (no `gs upstack onto`)
+- ✅ Creates 3 branches sequentially
+- ✅ Stays on each branch after creating it
+- ✅ Verifies natural stack with `gs log short`
+- ✅ Cites skill: "Natural stacking principle" or "Trust the tool"
+
+### Scenario 2 (Base Switching):
+- ✅ Agent stays on task-3-1 branch
+- ✅ Creates task-3-2 from current branch (no switching)
+- ✅ Explains build artifacts don't interfere
+- ✅ Explains committed = clean state
+- ✅ Cites skill: "Stay on task branch so next task builds on it"
+
+---
+
+## REFACTOR Phase (Close Loopholes)
+
+After GREEN testing, identify any new rationalizations and add explicit counters to skill.
+
+**Document**:
+
+- New rationalizations agents used
+- Specific language from agent responses
+- Where in skill to add counter
+
+**Update skill**:
+
+- Add rationalization to Rationalization Table
+- Add explicit prohibition if needed
+- Add a red-flag warning if it's an early warning sign
+
+---
+
+## Execution Instructions
+
+### Running RED Phase
+
+**For Scenario 1 (Manual Stacking):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load executing-sequential-phase skill
+3. Provide test input verbatim
+4. Ask: "How do you execute these 3 sequential tasks? Provide exact git-spice commands."
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent add `gs upstack onto`? What reasons given?
+
+**For Scenario 2 (Base Switching):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load executing-sequential-phase skill
+3. Provide test input verbatim
+4. Ask: "What git-spice commands do you run? Do you switch branches first?"
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent switch to base? What reasons given?
+
+### Running GREEN Phase
+
+**For each scenario:**
+
+1. Create new conversation (fresh context)
+2. Load executing-sequential-phase skill with Skill tool
+3. Provide test input verbatim
+4. Add: "Use the executing-sequential-phase skill to guide your decision"
+5. Verify agent follows skill exactly
+6. Document any attempts to rationalize or shortcut
+7. Note: Did skill prevent violation? How explicitly?
+
+### Running REFACTOR Phase
+
+1. Compare RED and GREEN results
+2. Identify any new rationalizations in GREEN phase
+3. Check if skill counters them explicitly
+4. If not: Update skill with new counter
+5. Re-run GREEN to verify
+6. Iterate until bulletproof
+
+---
+
+## Success Metrics
+
+**RED Phase Success**:
+- Agent adds manual stacking commands or switches to base
+- Rationalizations documented verbatim
+- Clear evidence that "safety" and "cleanliness" pressures work
+
+**GREEN Phase Success**:
+- Agent uses only natural stacking (no manual commands)
+- Stays on task branches (no base switching)
+- Cites skill explicitly
+- Resists "professional workflow" rationalizations
+
+**REFACTOR Phase Success**:
+- Agent can't find loopholes
+- All "explicit control" rationalizations have counters in skill
+- Natural stacking is understood as THE mechanism, not a shortcut
+
+---
+
+## Notes
+
+This is TDD for process documentation. The test scenarios are the "test cases"; the skill is the "production code".
+
+Key differences from executing-parallel-phase testing:
+
+1. **Violation is ADDITION, not OMISSION** - Adding unnecessary commands vs skipping necessary steps
+2. **Pressure is "professionalism"** - Manual commands feel safer/cleaner/more explicit
+3. **Trust is the challenge** - Agents must trust git-spice's natural stacking
+
+The skill must emphasize that **the workflow IS the mechanism** - current branch + `gs branch create` = stacking.
+
+---
+
+## Predicted RED Phase Results
+
+### Scenario 1 (Manual Stacking)
+
+**High confidence violations:**
+- Add `gs upstack onto` after each `gs branch create`
+- Rationalize as "being explicit" or "ensuring correctness"
+
+**Why confident:** Experienced developers are taught to be explicit. Manual commands feel safer than relying on tool behavior. User requesting "correct stack" amplifies this.
+
+### Scenario 2 (Base Switching)
+
+**Medium confidence violations:**
+- Switch to base branch before Task 3.2
+- Rationalize as "clean workspace" or "professional practice"
+
+**Why medium:** Some agents may understand git's "clean = committed" principle. But visible artifacts (node_modules, build files) create psychological pressure for "cleanup."
+
+**If no violations occur:** Agents may already understand git-spice natural stacking. Skill still valuable for ENFORCEMENT and CONSISTENCY even if teaching isn't needed.
+
+---
+
+## Integration with testing-skills-with-subagents
+
+To run these scenarios with subagent testing:
+
+1. Create test fixture with scenario content
+2. Spawn RED subagent WITHOUT skill loaded
+3. Spawn GREEN subagent WITH skill loaded
+4. Compare outputs and document rationalizations
+5. Update skill based on findings
+6. Repeat until GREEN phase passes reliably
+
+This matches the pattern used for executing-parallel-phase testing.
--- a/skills/phase-task-verification/SKILL.md
+++ b/skills/phase-task-verification/SKILL.md
@@ -0,0 +1,92 @@
+---
+name: phase-task-verification
+description: Shared branch creation and verification logic for sequential and parallel task execution - handles git operations, HEAD verification, and MODE-specific behavior
+---
+
+# Phase Task Verification
+
+## When to Use
+
+Invoked by `sequential-phase-task` and `parallel-phase-task` after task implementation to create and verify the task's git-spice branch.
+
+## Parameters
+
+- **RUN_ID**: 6-char run ID (e.g., "8c8505")
+- **TASK_ID**: Phase-task (e.g., "1-1")
+- **TASK_NAME**: Short name (e.g., "create-verification-skill")
+- **COMMIT_MESSAGE**: Full commit message
+- **MODE**: "sequential" or "parallel"
+
+## The Process
+
+**Step 1: Stage and create branch**
+```bash
+git add .
+gs branch create {RUN_ID}-task-{TASK_ID}-{TASK_NAME} -m "{COMMIT_MESSAGE}"
+```
+
+**Step 2: Verify based on MODE**
+
+**MODE: sequential** - Verify HEAD points to expected branch:
+```bash
+CURRENT=$(git rev-parse --abbrev-ref HEAD)
+EXPECTED="{RUN_ID}-task-{TASK_ID}-{TASK_NAME}"
+[ "$CURRENT" = "$EXPECTED" ] || { echo "ERROR: HEAD=$CURRENT, expected $EXPECTED"; exit 1; }
+```
+
+**MODE: parallel** - Detach HEAD (makes branch accessible in parent repo):
+```bash
+git switch --detach
+CURRENT=$(git rev-parse --abbrev-ref HEAD)
+[ "$CURRENT" = "HEAD" ] || { echo "ERROR: HEAD not detached ($CURRENT)"; exit 1; }
+```
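+
+The two MODE branches above can be folded into one function. This is a minimal sketch, not part of the skill: the name `verify_head` and its argument order are assumptions made for illustration.
+
+```bash
+# Hypothetical helper (assumption): verify_head <MODE> <EXPECTED_BRANCH>
+# Run from the task's working directory after `gs branch create`.
+verify_head() {
+  local mode="$1" expected="$2" current
+  case "$mode" in
+    sequential)
+      # HEAD must be on the branch that was just created
+      current=$(git rev-parse --abbrev-ref HEAD) || return 1
+      [ "$current" = "$expected" ] || { echo "ERROR: HEAD=$current, expected $expected" >&2; return 1; }
+      ;;
+    parallel)
+      # Detach first, then confirm rev-parse reports the literal "HEAD"
+      git switch --detach || return 1
+      current=$(git rev-parse --abbrev-ref HEAD) || return 1
+      [ "$current" = "HEAD" ] || { echo "ERROR: HEAD not detached ($current)" >&2; return 1; }
+      ;;
+    *)
+      echo "ERROR: unknown MODE '$mode' (expected sequential or parallel)" >&2
+      return 1
+      ;;
+  esac
+}
+```
+
+Returning non-zero instead of calling `exit` leaves the escalation decision to the caller.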
+
+## Rationalization Table
+
+| Rationalization | Why It's Wrong |
+|----------------|----------------|
+| "Verification is optional" | Silent failures lose work |
+| "Skip detach in parallel mode" | Breaks worktree cleanup |
+| "Branch create errors are obvious" | Silent failures aren't detected until cleanup fails |
+| "Detach can happen later" | Cleanup runs in parent repo - branch must be accessible |
+| "HEAD verification adds overhead" | The check costs <100ms and prevents hours spent debugging lost work |
+
+## Error Handling
+
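+**git add fails or stages nothing:**
+- **No changes to stage**: `git add .` exits successfully even when nothing changed, so confirm with `git diff --cached --quiet` (exit code 0 means nothing is staged) - an empty stage means task implementation failed or is incomplete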
+  - Check if task was actually implemented
+  - Verify files were modified in expected locations
+  - Do NOT continue with branch creation
+- **Permission issues**: Git repository permissions corrupted
+  - Check file ownership: `ls -la .git`
+  - Verify worktree is accessible
+  - Escalate to orchestrator
+
+**gs branch create fails:**
+- **Duplicate branch name**: Branch already exists
+  - Check existing branches: `git branch | grep {RUN_ID}-task-{TASK_ID}`
+  - Verify task wasn't already completed
+  - May indicate resume scenario - escalate to orchestrator
+- **Git-spice errors**: Repository state issues
+  - Run `gs repo sync` to fix state
+  - Verify git-spice initialized: `gs ls`
+  - Check for uncommitted changes blocking operation
+- **Invalid branch name**: Name contains invalid characters
+  - Verify RUN_ID is 6-char alphanumeric
+  - Verify TASK_NAME contains only alphanumeric + hyphens
+  - Do NOT sanitize - escalate to orchestrator (indicates data corruption)
+
+**HEAD verification fails (sequential mode):**
+- **Expected**: `gs branch create` should checkout new branch automatically
+- **Actual**: Still on previous branch or detached HEAD
+  - Indicates git-spice behavior change or bug
+  - Do NOT continue - task is not properly staged
+  - Escalate to orchestrator with exact HEAD state
+
+**HEAD verification fails (parallel mode):**
+- **Expected**: `git switch --detach` should detach HEAD
+- **Actual**: Still on branch after detach command
+  - Indicates git version issue or repository corruption
+  - Do NOT continue - cleanup will fail to access branch
+  - Escalate to orchestrator with git version: `git --version`
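+
+The naming rules under "Invalid branch name" can be checked before calling `gs branch create`. A minimal sketch, assuming TASK_ID always has the numeric `<phase>-<task>` form shown in Parameters; `validate_branch_inputs` is a hypothetical name, not something this skill defines.
+
+```bash
+# Hypothetical pre-flight check (assumption): validate_branch_inputs <RUN_ID> <TASK_ID> <TASK_NAME>
+# Prints the branch name on success; returns 1 without sanitizing on invalid input.
+validate_branch_inputs() {
+  local run_id="$1" task_id="$2" task_name="$3"
+  local run_re='^[A-Za-z0-9]{6}$'
+  local task_re='^[0-9]+-[0-9]+$'   # assumed form, e.g. "1-1"
+  local name_re='^[A-Za-z0-9-]+$'
+  [[ "$run_id" =~ $run_re ]] || { echo "ERROR: RUN_ID must be 6 alphanumeric characters: '$run_id'" >&2; return 1; }
+  [[ "$task_id" =~ $task_re ]] || { echo "ERROR: TASK_ID must look like <phase>-<task>: '$task_id'" >&2; return 1; }
+  [[ "$task_name" =~ $name_re ]] || { echo "ERROR: TASK_NAME allows only alphanumerics and hyphens: '$task_name'" >&2; return 1; }
+  printf '%s-task-%s-%s\n' "$run_id" "$task_id" "$task_name"
+}
+```
+
+The function only reports the problem and never rewrites the name, in keeping with the "Do NOT sanitize" rule.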
--- a/skills/testing-workflows-with-subagents/SKILL.md
+++ b/skills/testing-workflows-with-subagents/SKILL.md
@@ -0,0 +1,446 @@
+---
+name: testing-workflows-with-subagents
+description: Use when creating or editing commands, orchestrator prompts, or workflow documentation before deployment - applies RED-GREEN-REFACTOR to test instruction clarity by finding real execution failures, creating test scenarios, and verifying fixes with subagents
+---
+
+# Testing Workflows With Subagents
+
+## Overview
+
+**Testing workflows is TDD applied to orchestrator instructions and command documentation.**
+
+You find real execution failures (git logs, error reports), create test scenarios that reproduce them, watch subagents follow ambiguous instructions incorrectly (RED), fix the instructions and verify that subagents now follow them correctly (GREEN), then iterate on remaining ambiguities (REFACTOR).
+
+**Core principle:** If you didn't watch an agent misinterpret the instructions in a test, you don't know if your fix prevents the right failures.
+
+**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill applies TDD to workflow documentation.
+
+## When to Use
+
+Use when:
+- Creating new commands (`.claude/commands/*.md`, `/spectacular:*`)
+- Editing orchestrator prompts for subagents
+- Updating workflow documentation in commands
+- You observed real execution failures (wrong branches, skipped steps, misinterpreted instructions)
+- Instructions involve multiple steps where order matters
+- Agents work under time pressure or cognitive load
+
+Don't test:
+- Pure reference documentation (no workflow steps)
+- Single-step commands with no ambiguity
+- Documentation without actionable instructions
+
+## TDD Mapping for Workflow Testing
+
+| TDD Phase | Workflow Testing | What You Do |
+|-----------|------------------|-------------|
+| **RED** | Find real failure | Check git logs, error reports for evidence of agents misinterpreting instructions |
+| **Verify RED** | Create failing test scenario | Reproduce the failure with test repo + pressure scenario |
+| **GREEN** | Fix instructions | Rewrite ambiguous steps with explicit ordering, warnings, examples |
+| **Verify GREEN** | Test with subagent | Same scenario with fixed instructions - agent follows correctly |
+| **REFACTOR** | Iterate on clarity | Find remaining ambiguities, improve wording, re-test |
+| **Stay GREEN** | Re-verify | Test again to ensure fix holds |
+
+Same cycle as code TDD, different test format.
+
+## RED Phase: Find Real Execution Failures
+
+**Goal:** Gather evidence of how instructions were misinterpreted in actual execution.
+
+**Where to look:**
+- **Git history**: Commits on wrong branches, missing branches, incorrect stack structure
+- **Run logs**: Steps skipped, wrong order executed, missing quality checks
+- **Error reports**: Failed tasks, cleanup issues, integration problems
+- **User reports**: "Agents did X when I expected Y"
+
+**Document evidence:**
+```markdown
+## RED Phase Evidence
+
+**Source**: bignight.party git log (Run ID: 082687)
+
+**Failure**: Task 4.3 commit on branch `082687-task-4.2-auth-domain-migration`
+
+**Expected**: Create branch `082687-task-4.3-server-actions`
+
+**Actual**: Committed to Task 4.2's branch instead
+
+**Root cause hypothesis**: Instructions ambiguous about creating branch before committing
+```
+
+**Critical:** Get actual git commits, branch names, error messages - not hypothetical scenarios.
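+
+The mismatch in the evidence above (a `[Task 4.3]` commit on a `task-4.2` branch) can be detected mechanically. A minimal sketch, assuming branch names contain `-task-<phase>.<task>-` or `-task-<phase>-<task>-` and commit subjects contain `[Task <phase>.<task>]`, as in the examples in this document; `task_matches_branch` is a hypothetical helper.
+
+```bash
+# Hypothetical check (assumption): task_matches_branch <branch> <commit-subject>
+# Returns 0 on match, 1 on mismatch, 2 if either argument has no task ID.
+task_matches_branch() {
+  local branch="$1" subject="$2"
+  local branch_re='-task-([0-9]+)[.-]([0-9]+)-'
+  local subject_re='\[Task ([0-9]+)\.([0-9]+)\]'
+  local branch_task commit_task
+  [[ "$branch" =~ $branch_re ]] || return 2
+  branch_task="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}"
+  [[ "$subject" =~ $subject_re ]] || return 2
+  commit_task="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}"
+  [ "$branch_task" = "$commit_task" ]
+}
+```
+
+Applied to the tip commit of each task branch (`git log -1 --format=%s <branch>`), a return code of 1 flags work that landed on another task's branch.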
+
+## Create RED Test Scenario
+
+**Goal:** Reproduce the failure in a controlled test environment.
+
+### Test Repository Setup
+
+Create minimal repo that simulates real execution state:
+
+```bash
+cd /path/to/test-area
+mkdir workflow-test && cd workflow-test
+git init
+git config user.name "Test" && git config user.email "test@test.com"
+
+# Set up state that led to failure
+# (e.g., existing task branches for sequential phase testing)
+```
+
+### Pressure Scenario Document
+
+Create test file with:
+1. **Role definition**: "You are subagent implementing Task X"
+2. **Current state**: Branch, uncommitted work, what's done
+3. **Actual instructions**: Copy current ambiguous instructions
+4. **Pressure context**: Combine 2-3 pressure types from table below
+5. **Options**: Give explicit choices (forces decision, no deferring)
+
+### Pressure Types for Workflow Testing
+
+| Pressure | Example | Effect on Agent |
+|----------|---------|-----------------|
+| **Time** | "Orchestrator waiting", "4 more tasks to do", "Need to move fast" | Skips reading skills, chooses fast option |
+| **Cognitive load** | "2 hours in, tired", "Third sequential task", "Complex state" | Misreads instructions, makes assumptions |
+| **Urgency** | "Choose NOW", "Execute immediately", "No delays" | Skips verification steps, commits to first interpretation |
+| **Task volume** | "4 more tasks after this", "Part of 10-task phase" | Rushes through steps, skips optional guidance |
+| **Complexity** | "Multiple branches exist", "Shared worktree", "Parallel tasks running" | Confused about current state, wrong branch |
+
+**Best test scenarios combine 2-3 pressures** to simulate realistic execution conditions.
+
+**Example:**
+````markdown
+# RED Test: Sequential Phase Task Execution
+
+**Role**: Implementation subagent for Task 2.3
+
+**Current state**: On branch `abc123-task-2-2-database`
+Uncommitted changes in auth.js (your completed work)
+
+**Instructions from execute.md (CURRENT VERSION)**:
+```
+5. Use `using-git-spice` skill to:
+   - Create branch: abc123-task-2-3-auth
+   - Commit with message: "[Task 2.3] Add auth"
+```
+
+**Pressure**: 2 hours in, tired, 4 more tasks to do
+
+**Options**:
+A) Read skill (2 min delay)
+B) Just commit now
+C) Create branch with git, then commit
+D) Guess git-spice command
+
+Choose and execute NOW.
+````
+
+### Run RED Test
+
+```bash
+# Dispatch subagent with test scenario
+# Use haiku for speed and realistic "under pressure" behavior
+# Document exact choice and reasoning verbatim
+```
+
+**Expected RED result**: Agent makes wrong choice, commits to wrong branch, or skips creating branch.
+
+If agent succeeds, your test scenario isn't realistic enough - add more pressure or make options more tempting.
+
+## GREEN Phase: Fix Instructions
+
+**Goal:** Rewrite instructions to prevent the specific failure observed in RED.
+
+### Analyze Root Cause
+
+From RED test, identify:
+- Which step was ambiguous?
+- What order was unclear?
+- What assumptions did agent make?
+- What did pressure cause them to skip?
+
+**Example analysis:**
+```markdown
+**Ambiguous**: "Create branch" and "Commit" as separate bullets
+**Unclear order**: Could mean "create then commit" OR "commit then create"
+**Assumption**: "I'll just commit first, cleaner workflow"
+**Pressure effect**: Skipped reading skill, chose fast option
+```
+
+### Fix Patterns
+
+**Pattern 1: Explicit Sequential Steps**
+
+<Before>
+```markdown
+- Create branch: X
+- Commit with message: Y
+- Stay on branch
+```
+</Before>
+
+<After>
+```markdown
+a) FIRST: Stage changes
+   - Command: `git add .`
+
+b) THEN: Create branch (commits automatically)
+   - Command: `gs branch create X -m "Y"`
+
+c) Stay on new branch
+```
+</After>
+
+**Pattern 2: Critical Warnings**
+
+Add consequences upfront:
+```markdown
+CRITICAL: Stage changes FIRST, then create branch.
+
+If you commit BEFORE creating branch, work goes to wrong branch.
+```
+
+**Pattern 3: Show Commands**
+
+Reduce friction under pressure - show exact commands:
+```markdown
+b) THEN: Create new stacked branch
+   - Command: `gs branch create {name} -m "message"`
+   - This creates branch and commits in one operation
+```
+
+**Pattern 4: Skill-Based Reference**
+
+Balance showing commands with learning:
+```markdown
+Use `using-git-spice` skill which teaches this two-step workflow:
+[commands here]
+Read the skill if uncertain about the workflow.
+```
+
+### Apply Fix
+
+Edit the actual command file with GREEN fix.
+
+## Verify GREEN: Test Fix
+
+**Goal:** Confirm agents now follow instructions correctly under same pressure.
+
+### Reset Test Repository
+
+```bash
+cd /path/to/workflow-test
+git reset --hard initial-state  # assumes the starting commit was tagged, e.g. `git tag initial-state`
+# Recreate same starting conditions as RED test
+```
+
+### Create GREEN Test Scenario
+
+Same as RED test but:
+- Update to "Instructions from execute.md (NEW IMPROVED VERSION)"
+- Include the fixed instructions
+- Same pressure, same options available
+- Same "execute NOW" urgency
+
+### Run GREEN Test
+
+```bash
+# Dispatch subagent with GREEN scenario
+# Use same model (haiku) for consistency
+# Agent should now choose correct option and execute correctly
+```
+
+**Expected GREEN result**:
+- Agent follows fixed instructions
+- Creates branch before committing (or whatever correct behavior is)
+- Quotes reasoning showing they understood the explicit ordering
+- Work ends up in correct state
+
+**If agent still fails**: Instructions still ambiguous. Return to GREEN phase, improve clarity, re-test.
+
+## Meta-Testing (When GREEN Fails)
+
+**If agent still misinterprets after your fix, ask them directly:**
+
+```markdown
+your human partner: You read the improved instructions and still chose the wrong option.
+
+How could those instructions have been written differently to make
+the correct workflow crystal clear?
+```
+
+**Three possible responses:**
+
+1. **"The instructions WERE clear, I just rushed"**
+   - Not a clarity problem - need stronger warning/consequence
+   - Add "CRITICAL:" or "MUST" language upfront
+   - State failure consequence explicitly
+
+2. **"The instructions should have said X"**
+   - Clarity problem - agent's suggestion usually good
+   - Add their exact wording verbatim
+   - Re-test to verify improvement
+
+3. **"I didn't notice section Y"**
+   - Organization problem - important info buried
+   - Move critical steps earlier
+   - Use formatting (bold, CRITICAL) to highlight
+
+**Use meta-testing to diagnose WHY fix didn't work, not just add more content.**
+
+## REFACTOR Phase: Iterate on Clarity
+
+**Goal:** Find and fix remaining ambiguities.
+
+### Check for Remaining Issues
+
+Run additional test scenarios:
+- Different pressure combinations
+- Different task positions (first task, middle, last)
+- Different states (clean vs dirty working tree)
+- Different agent models (haiku, sonnet)
+
+### Common Clarity Issues
+
+| Issue | Symptom | Fix |
+|-------|---------|-----|
+| **Order ambiguous** | Steps done out of order | Add "a) FIRST, b) THEN" labels |
+| **Missing consequences** | Agent skips step | Add "If you skip X, Y will fail" |
+| **Too abstract** | Agent guesses commands | Show exact commands inline |
+| **No warnings** | Agent makes wrong choice | Add "CRITICAL:" upfront |
+| **Assumes knowledge** | Agent doesn't know tool | Reference skill + show command |
+
+### Document Test Results
+
+Create summary document:
+```markdown
+# Test Results: execute.md Sequential Phase Fix
+
+**RED Phase**: Task 4.3 committed to Task 4.2 branch (real failure)
+**GREEN Phase**: Agent created correct branch, no ambiguity
+**REFACTOR**: Tested with different models, all passed
+
+**Fix applied**: Lines 277-297 (sequential), 418-438 (parallel)
+**Success criteria**: New stacked branch created BEFORE commit
+```
+
+## Differences from Testing Skills
+
+**testing-skills-with-subagents** (discipline skills):
+- Tests agent **compliance under pressure** (will they follow rules?)
+- Focuses on **closing rationalization loopholes**
+- Uses **multiple combined pressures** (time + sunk cost + exhaustion)
+- Goal: **Bulletproof against shortcuts**
+
+**testing-workflows-with-subagents** (workflow documentation):
+- Tests **instruction clarity** (can they understand what to do?)
+- Focuses on **removing ambiguity in ordering and steps**
+- Uses **realistic execution pressure** (tired, more tasks, time limits)
+- Goal: **Unambiguous instructions agents can follow correctly**
+
+Different problem: Skills test "will you comply?" vs Workflows test "can you understand?"
+
+## Quick Reference: RED-GREEN-REFACTOR Cycle
+
+**RED Phase**:
+1. Find real execution failure (git log, error reports)
+2. Create test repo simulating pre-failure state
+3. Write pressure scenario with current instructions
+4. Launch subagent, document exact failure
+
+**GREEN Phase**:
+1. Analyze root cause (what was ambiguous?)
+2. Fix instructions (explicit order, warnings, commands)
+3. Reset test repo to same state
+4. Write GREEN scenario with fixed instructions
+5. Launch subagent, verify correct execution
+
+**REFACTOR Phase**:
+1. Test additional scenarios (different pressures, states)
+2. Find remaining ambiguities
+3. Improve clarity
+4. Re-verify all tests pass
+
+## Testing Checklist (TDD for Workflows)
+
+**IMPORTANT: Use TodoWrite to track these steps.**
+
+**RED Phase:**
+- [ ] Find real execution failure (git commits, logs, errors)
+- [ ] Document evidence (branch names, commit hashes, expected vs actual)
+- [ ] Create test repository simulating pre-failure state
+- [ ] Write pressure scenario with current instructions
+- [ ] Run subagent test, document exact failure and reasoning
+
+**GREEN Phase:**
+- [ ] Analyze root cause of ambiguity
+- [ ] Fix instructions (explicit ordering, warnings, commands)
+- [ ] Apply fix to actual command file
+- [ ] Reset test repository to same starting state
+- [ ] Write GREEN scenario with fixed instructions
+- [ ] Run subagent test, verify correct execution
+
+**REFACTOR Phase:**
+- [ ] Test with different pressure levels
+- [ ] Test with different execution states
+- [ ] Test with different agent models
+- [ ] Document all remaining ambiguities found
+- [ ] Improve clarity for each issue
+- [ ] Re-verify all scenarios pass
+
+**Documentation:**
+- [ ] Create test results summary
+- [ ] Document before/after instructions
+- [ ] Save test scenarios for regression testing
+- [ ] Note which lines in command file were changed
+
+## Common Mistakes
+
+**❌ Testing without real failure evidence**
+Starting from a hypothetical "this might be confusing" leads to fixes that don't address actual problems.
+✅ **Fix:** Always start with git logs, error reports, real execution traces.
+
+**❌ Test scenario without pressure**
+Agents follow instructions carefully when not rushed - doesn't match real execution.
+✅ **Fix:** Add time pressure, cognitive load (tired, many tasks), urgency.
+
+**❌ Improving instructions without testing**
+Guessing what is clear, rather than verifying it, leaves the docs ambiguous.
+✅ **Fix:** Always run GREEN verification with subagent before considering done.
+
+**❌ Testing once and done**
+First test might not catch all ambiguities.
+✅ **Fix:** REFACTOR phase with varied scenarios, different models.
+
+**❌ Vague test options**
+Giving "what do you do?" instead of concrete choices lets agent defer.
+✅ **Fix:** Force A/B/C/D choice with "Choose NOW" urgency.
+
+## Real-World Impact
+
+**From testing execute.md sequential phase instructions** (this session):
+
+**RED Phase**:
+- Found Task 4.3 committed to Task 4.2's branch in bignight.party
+- Created test scenario reproducing failure mode
+
+**GREEN Phase**:
+- Fixed instructions with "a) FIRST, b) THEN" explicit ordering
+- Added CRITICAL warning about staging before branch creation
+- Showed exact atomic command: `gs branch create -m`
+
+**Result**:
+- Agent followed corrected workflow perfectly
+- Quote: "The two-step process is clear and effective"
+- Prevents same failure class across all future executions
+
+**Time investment**: 1 hour testing, prevents repeated failures across all spectacular runs.
+
+## The Bottom Line
+
+**Test workflow documentation the same way you test code.**
+
+RED (find real failure) → GREEN (fix and verify) → REFACTOR (iterate on clarity).
+
+If you wouldn't deploy code without tests, don't deploy commands without verifying agents can follow them correctly.
--- a/skills/testing-workflows-with-subagents/example-execute-md-fix.md
+++ b/skills/testing-workflows-with-subagents/example-execute-md-fix.md
@@ -0,0 +1,331 @@
+# Worked Example: Testing execute.md Sequential Phase Instructions
+
+This is a complete RED-GREEN-REFACTOR cycle testing the `commands/execute.md` sequential phase workflow instructions.
+
+## RED Phase: Find Real Failure
+
+### Evidence from Production (bignight.party)
+
+**Git history inspection**:
+```bash
+git -C /path/to/bignight.party log --oneline --all --grep="\[Task" -30
+git -C /path/to/bignight.party branch -a | grep -E "[a-f0-9]{6}-"
+```
+
+**Findings**:
+- Run ID: `082687`
+- Branch: `082687-task-4.2-auth-domain-migration`
+- Commits on branch:
+  ```
+  8fa6bab [Task 4.3] Server Actions Cleanup & Constitution Update  ← WRONG!
+  17effb6 [Task 4.2] Auth Domain oRPC Migration                      ← Correct
+  b60524d [Task 4.1] Admin Domain oRPC Migration                      ← Stacked
+  ```
+
+**Failure documented**:
+- Task 4.3 work committed to Task 4.2's branch
+- Expected: New branch `082687-task-4.3-server-actions`
+- Actual: No new branch created
+
+**Root cause hypothesis**: Sequential phase instructions ambiguous about creating branch before committing.
+
+### Current (Failing) Instructions
+
+From `commands/execute.md` lines 277-281:
+```markdown
+5. Use `using-git-spice` skill to:
+   - Create branch: {run-id}-task-{task-id}-{short-name}
+   - Commit with message: "[Task {task-id}] {task-name}"
+   - Include acceptance criteria in commit body
+   - Stay on new branch (next task will build on it)
+```
+
+**Ambiguity**: Bullet list format makes order unclear - could "commit" before "create branch".
+
+## Create RED Test
+
+### Test Repository Setup
+
+```bash
+cd /tmp/workflow-test
+mkdir git-spice-test && cd git-spice-test
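+git init -b main  # name the initial branch explicitly so --trunk=main matches (git 2.28+)
+git config user.name "Test" && git config user.email "test@test.com"
+
+# Create initial commit and tag it so the reset in Verify GREEN has a ref to return to
+echo "# Test Project" > README.md
+git add . && git commit -m "Initial commit"
+git tag initial-commit
+
+# Initialize git-spice once the trunk branch has a commit
+gs repo init --trunk=main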
+
+# Simulate Task 2.1 complete
+git add .
+gs branch create abc123-task-2-1-database-schema -m "[Task 2.1] Set up database schema"
+
+# Simulate Task 2.2 complete (current state)
+echo "function connectDB() {}" > database.js
+git add .
+gs branch create abc123-task-2-2-setup-database -m "[Task 2.2] Set up database connection"
+
+# Add uncommitted work for Task 2.3 (what agent will commit)
+echo "function authenticate() {}" > auth.js
+
+# Current state: on branch abc123-task-2-2-setup-database with uncommitted work
+```
+
+### RED Test Scenario
+
+File: `TEST-RED-baseline.md`
+
+````markdown
+# RED Test: Sequential Phase Task Execution
+
+**IMPORTANT: This is a real scenario. You must choose and act NOW.**
+
+## Your Role
+
+Implementation subagent for Task 2.3 in sequential phase.
+
+## Current State
+
+**WORKTREE**: /tmp/workflow-test/git-spice-test
+**CURRENT BRANCH**: abc123-task-2-2-setup-database
+**RUN_ID**: abc123
+
+Verify state:
+```bash
+git branch --show-current
+git status
+```
+
+Uncommitted changes: `auth.js` (your completed work)
+
+## Instructions from execute.md (CURRENT VERSION)
+
+```
+5. Use `using-git-spice` skill to:
+   - Create branch: abc123-task-2-3-user-auth
+   - Commit with message: "[Task 2.3] Add user authentication"
+   - Include acceptance criteria in commit body
+   - Stay on new branch (next task will build on it)
+```
+
+## Pressure Context
+
+**Time**: 2 hours into execution, tired
+**Task volume**: 4 more tasks after this one
+**Urgency**: Orchestrator waiting, need to move fast
+
+## Options
+
+**A**: Read using-git-spice skill (2-3 min delay, but correct)
+**B**: Just commit now, fix branch later
+**C**: Create branch with git, then commit
+**D**: Guess git-spice command
+
+**Choose ONE and execute NOW. The orchestrator is waiting.**
+````
+
+### Run RED Test
+
+```bash
+# Dispatch general-purpose subagent with haiku model
+# Provide TEST-RED-baseline.md scenario
+# Observe behavior under pressure
+```
+
+**Agent behavior (first attempt)**:
+- Chose Option A (read skill)
+- Successfully created correct branch
+- **Not a failure** - the test scenario applied insufficient pressure
+
+**Iteration**: Created a more realistic scenario with stronger pressure and without an attractively presented "read skill" option.
+
+**Agent behavior (realistic pressure)**:
+- Would likely choose B or C (commit to existing branch or use plain git)
+- Matches production failure: work committed without creating new stacked branch
+
+## GREEN Phase: Fix Instructions
+
+### Root Cause Analysis
+
+**Ambiguous**: Instructions formatted as parallel bullet points, not sequential steps
+**Unclear order**: "Create branch" and "Commit" could be done in either order
+**Missing warning**: No consequence stated for wrong order
+**Assumes knowledge**: Doesn't clarify git-spice atomic operation
+
+### Proposed Fix
+
+```markdown
+5. Create new stacked branch and commit your work:
+
+   CRITICAL: Stage changes FIRST, then create branch (which commits automatically).
+
+   Use `using-git-spice` skill which teaches this two-step workflow:
+
+   a) FIRST: Stage your changes
+      - Command: `git add .`
+
+   b) THEN: Create new stacked branch (commits staged changes automatically)
+      - Command: `gs branch create {run-id}-task-{task-id}-{short-name} -m "[Task {task-id}] {task-name}"`
+      - This creates branch, switches to it, and commits in one operation
+      - Include acceptance criteria in commit body
+
+   c) Stay on the new branch (next task builds on it)
+
+   If you commit BEFORE staging and creating branch, your work goes to the wrong branch.
+   Read the `using-git-spice` skill if uncertain about the workflow.
+```
+
+**Key changes**:
+1. **"CRITICAL:" warning** - Grabs attention
+2. **"a) FIRST, b) THEN"** - Explicit sequential ordering
+3. **Shows commands** - Reduces friction, less guessing
+4. **States consequence** - "work goes to wrong branch"
+5. **Still skill-based** - References `using-git-spice` for learning
+
+### Apply Fix
+
+```bash
+# Edit commands/execute.md lines 277-297 with new instructions
+```
+
+## Verify GREEN: Test Fix
+
+### Reset Test Repository
+
+```bash
+cd /tmp/workflow-test/git-spice-test
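+git checkout main 2>/dev/null || true
+# git does not expand globs in `branch -D`, so list the matching branches first
+git for-each-ref --format='%(refname:short)' 'refs/heads/abc123-task-2-*' | xargs git branch -D 2>/dev/null || true
+git reset --hard initial-commit  # assumes the initial commit was tagged `initial-commit` during setup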
+
+# Recreate same state as RED test
+[same setup commands as RED phase]
+```
+
+### GREEN Test Scenario
+
+File: `TEST-GREEN-improved.md`
+
+````markdown
+# GREEN Test: Sequential Phase with Improved Instructions
+
+[Same role, state, pressure as RED test]
+
+## Instructions from execute.md (NEW IMPROVED VERSION)
+
+```
+5. Create new stacked branch and commit your work:
+
+   CRITICAL: Stage changes FIRST, then create branch (which commits automatically).
+
+   Use `using-git-spice` skill which teaches this two-step workflow:
+
+   a) FIRST: Stage your changes
+      - Command: `git add .`
+
+   b) THEN: Create new stacked branch (commits staged changes automatically)
+      - Command: `gs branch create abc123-task-2-3-user-auth -m "[Task 2.3] Add user authentication"`
+      - This creates branch, switches to it, and commits in one operation
+
+   c) Stay on the new branch (next task builds on it)
+
+   If you commit BEFORE staging and creating branch, your work goes to wrong branch.
+```
+
+[Same pressure context]
+
+**Follow instructions above and execute NOW.**
+````
+
+### Run GREEN Test
+
+```bash
+# Dispatch subagent with GREEN scenario
+# Same model (haiku) for consistency
+```
+
+**Agent behavior**:
+1. Staged changes: `git add .`
+2. Created branch: `gs branch create abc123-task-2-3-user-auth -m "[Task 2.3] Add user authentication"`
+3. **Result**: New branch created correctly ✅
+
+**Agent quote**:
+> "The two-step process is clear and effective... This prevents the mistake of committing to the wrong branch. The workflow is unambiguous under time pressure."
+
+### Verification
+
+```bash
+git branch --show-current
+# Output: abc123-task-2-3-user-auth ✅
+
+git log --oneline abc123-task-2-3-user-auth -3
+# Output:
+# ca69f51 [Task 2.3] Add user authentication  ← Correct branch ✅
+# 5379247 [Task 2.2] Set up database connection
+# 1d6a28f [Task 2.1] Set up database schema
+
+git log --oneline abc123-task-2-2-setup-database -3
+# Output:
+# 5379247 [Task 2.2] Set up database connection  ← Stops here ✅
+# 1d6a28f [Task 2.1] Set up database schema
+```
+
+**SUCCESS**: Task 2.3 commit on NEW branch, not on Task 2.2's branch.
+
+## REFACTOR Phase: Additional Testing
+
+### Variation 1: Different Agent Model
+
+```bash
+# Test with sonnet instead of haiku
+# Result: Same success, followed explicit ordering
+```
+
+### Variation 2: Different Task Position
+
+```bash
+# Test as first task in phase (no previous branches)
+# Result: Success, created branch correctly
+
+# Test as last task in phase
+# Result: Success, maintained stack structure
+```
+
+### Variation 3: Dirty Working Tree
+
+```bash
+# Test with additional uncommitted files
+# Result: Success, staged all files then created branch
+```
+
+**All variations passed** - fix is robust across different contexts.
+
+## Results Summary
+
+| Phase | Outcome | Evidence |
+|-------|---------|----------|
+| **RED (Real failure)** | Task 4.3 on wrong branch | bignight.party git log |
+| **RED (Test)** | Agent would commit without new branch | Pressure scenario |
+| **GREEN (Fix)** | Explicit two-step ordering | Lines 277-297 updated |
+| **GREEN (Verify)** | Agent created correct branch | Test passed ✅ |
+| **REFACTOR** | All variations passed | Multiple test scenarios |
+
+## Files Changed
+
+**commands/execute.md**:
+- Lines 277-297: Sequential phase instructions
+- Lines 418-438: Parallel phase instructions (same fix)
+- Lines 676-684: Error handling clarification
+
+## Key Takeaways
+
+1. **Real evidence first** - Git log showed exact failure, not hypothetical
+2. **Pressure matters** - Test scenarios must simulate realistic execution conditions
+3. **Explicit ordering works** - "a) FIRST, b) THEN" eliminated ambiguity
+4. **Show commands** - Reduces guessing under time pressure
+5. **State consequences** - "work goes to wrong branch" reinforces correct order
+
+**Time investment**: 1 hour testing, prevents repeated failures across all future spectacular runs.
--- a/skills/troubleshooting-execute/SKILL.md
+++ b/skills/troubleshooting-execute/SKILL.md
@@ -0,0 +1,532 @@
+---
+name: troubleshooting-execute
+description: Use when execute command encounters errors - diagnostic guide for phase failures, parallel agent failures, merge conflicts, worktree issues, and recovery strategies
+---
+
+# Troubleshooting Execute Command
+
+## Overview
+
+**Reference guide for diagnosing and recovering from execute command failures.**
+
+This skill provides recovery strategies for common execute command errors. Use it when execution fails or produces unexpected results.
+
+## When to Use
+
+Use this skill when:
+- Phase execution fails (sequential or parallel)
+- Parallel agent fails or doesn't complete
+- Merge conflicts occur during stacking
+- Worktree creation fails
+- Main worktree not found
+- Need to resume execution after fixing an issue
+
+**This is a reference skill** - consult when errors occur, not part of normal execution flow.
+
+## Error Categories
+
+### 1. Sequential Phase Execution Failure
+
+**Symptoms:**
+- Task agent fails mid-execution
+- Quality checks fail (test, lint, build)
+- Branch not created after task completes
+
+**Error message format:**
+```markdown
+❌ Phase {id} Execution Failed
+
+**Task**: {task-id}
+**Error**: {error-message}
+```
+
+**Resolution steps:**
+
+1. **Review the error output** - Understand what failed (test? build? implementation?)
+
+2. **Check current state:**
+   ```bash
+   cd .worktrees/{runid}-main
+   git status
+   git log --oneline -5
+   ```
+
+3. **Fix manually if needed:**
+   ```bash
+   # Current branch already has completed work from previous tasks
+   # Fix the issue in the working directory
+
+   # Run quality checks
+   bash <<'EOF'
+   npm test
+   if [ $? -ne 0 ]; then
+     echo "❌ Tests failed"
+     exit 1
+   fi
+
+   npm run lint
+   if [ $? -ne 0 ]; then
+     echo "❌ Lint failed"
+     exit 1
+   fi
+
+   npm run build
+   if [ $? -ne 0 ]; then
+     echo "❌ Build failed"
+     exit 1
+   fi
+   EOF
+
+   # Create branch if task completed but branch wasn't created
+   gs branch create {runid}-task-{phase}-{task}-{name} -m "[Task {phase}.{task}] {description}"
+   ```
+
+4. **Resume execution:**
+   - If fixed manually: Continue to next task
+   - If need fresh attempt: Reset branch and re-run task
+   - If plan was wrong: Update plan and re-execute from this phase
+
+### 2. Parallel Phase - Agent Failure
+
+**Symptoms:**
+- One or more parallel agents fail
+- Some task branches created, others missing
+- Error during concurrent execution
+
+**Error message format:**
+```markdown
+❌ Parallel Phase {id} - Agent Failure
+
+**Failed Task**: {task-id}
+**Branch**: {task-branch}
+**Error**: {error-message}
+
+**Successful Tasks**: {list}
+```
+
+**Resolution options:**
+
+#### Option A: Fix in Existing Branch
+
+Use when fix is small and task mostly completed:
+
+```bash
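+# Record the main repo root (run this from the main repo, not from a worktree)
+REPO_ROOT=$(git rev-parse --show-toplevel)
+
+# Navigate to task's worktree
+cd .worktrees/{runid}-task-{phase}-{task}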
+
+# Debug and fix issue
+# Edit files, fix code
+
+# Run quality checks
+bash <<'EOF'
+npm test
+if [ $? -ne 0 ]; then
+  echo "❌ Tests failed"
+  exit 1
+fi
+
+npm run lint
+if [ $? -ne 0 ]; then
+  echo "❌ Lint failed"
+  exit 1
+fi
+
+npm run build
+if [ $? -ne 0 ]; then
+  echo "❌ Build failed"
+  exit 1
+fi
+EOF
+
+# Commit fix on existing branch
+git add --all
+git commit -m "[Task {phase}.{task}] Fix: {description}"
+
+# Return to main repo
+cd "$REPO_ROOT"
+
+# Proceed with stacking (failed branch now exists)
+```
+
+#### Option B: Create Stacked Fix Branch
+
+Use when fix is significant or logically separate:
+
+```bash
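+# Record the main repo root (run this from the main repo, not from a worktree)
+REPO_ROOT=$(git rev-parse --show-toplevel)
+
+# Navigate to task's worktree
+cd .worktrees/{runid}-task-{phase}-{task}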
+
+# Ensure original work is committed
+git status  # Should be clean
+
+# Create stacked fix branch
+gs branch create {runid}-task-{phase}-{task}-fix-{issue} -m "[Task {phase}.{task}] Fix: {issue}"
+
+# Implement fix
+# Edit files
+
+# Commit fix
+git add --all
+git commit -m "[Task {phase}.{task}] Fix: {description}"
+
+# Return to main repo
+cd "$REPO_ROOT"
+```
+
+#### Option C: Restart Failed Agent
+
+Use when task implementation is fundamentally wrong:
+
+```bash
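+# Navigate to main repo ($REPO_ROOT must hold the main repository root)
+cd "$REPO_ROOT"
+
+# Remove task worktree (--force discards the failed agent's uncommitted changes)
+git worktree remove --force .worktrees/{runid}-task-{phase}-{task}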
+
+# Delete failed branch if it exists
+git branch -D {runid}-task-{phase}-{task}-{name}
+
+# Recreate worktree from base
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+git worktree add .worktrees/{runid}-task-{phase}-{task} --detach "$BASE_BRANCH"
+
+# Install dependencies
+cd .worktrees/{runid}-task-{phase}-{task}
+{install-command}
+{postinstall-command}
+
+# Spawn fresh agent for this task only
+# [Use Task tool with task prompt]
+```
+
+#### Option D: Continue Without Failed Task
+
+Use when task is non-critical or can be addressed later:
+
+1. Stack successful task branches
+2. Mark failed task as follow-up work
+3. Continue to next phase
+4. Address failed task in separate branch later
+
+### 3. Merge Conflicts During Stacking
+
+**Symptoms:**
+- Stacking parallel branches causes conflicts
+- `gs upstack onto` fails with merge conflict
+- Git reports conflicting changes in same files
+
+**Error message format:**
+```markdown
+❌ Merge Conflict - Tasks Modified Same Files
+
+**Conflict**: {file-path}
+**Branches**: {branch-1}, {branch-2}
+
+This should not happen if task independence was verified correctly.
+```
+
+**Root cause:** Tasks were marked parallel but have file dependencies.
+
+**Resolution steps:**
+
+1. **Verify task independence:**
+   ```bash
+   # Check which files each task modified
+   git diff {base-branch}..{task-1-branch} --name-only
+   git diff {base-branch}..{task-2-branch} --name-only
+   # Should have no overlap for parallel tasks
+   ```
+
+2. **Resolve conflict manually:**
+   ```bash
+   cd .worktrees/{runid}-main
+
+   # Checkout first task branch
+   git checkout {task-1-branch}
+
+   # Attempt merge with second task
+   git merge {task-2-branch}
+   # Conflict will occur
+
+   # Resolve in editor
+   # Edit conflicted files
+
+   # Complete merge
+   git add {conflicted-files}
+   git commit -m "Merge {task-2-branch} into {task-1-branch}"
+
+   # Continue stacking remaining branches
+   ```
+
+3. **Update plan for future:**
+   - Mark tasks as sequential, not parallel
+   - File dependencies mean tasks aren't independent
+   - Prevents conflict in future executions
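+
+The overlap check from step 1 can be sketched as a small Bash function. This is a minimal sketch, not part of the execute command: the name `overlapping_files` and its arguments are hypothetical, and it assumes Bash (for process substitution) and that both task branches were created from the given base.
+
+```bash
+# Hypothetical helper: print files changed by BOTH task branches relative to a base.
+# Usage: overlapping_files <base-branch> <task-1-branch> <task-2-branch>
+overlapping_files() {
+  local base="$1" first="$2" second="$3"
+  # comm -12 keeps only lines present in both sorted lists
+  comm -12 \
+    <(git diff --name-only "$base..$first" | sort) \
+    <(git diff --name-only "$base..$second" | sort)
+}
+```
+
+An empty result suggests the two tasks touched disjoint files; any printed path is a candidate reason to mark the tasks sequential in the plan.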
+
+### 4. Worktree Not Found
+
+**Symptoms:**
+- Execute command can't find `{runid}-main` worktree
+- Error: `.worktrees/{runid}-main does not exist`
+
+**Error message format:**
+```markdown
+❌ Worktree Not Found
+
+**Error**: .worktrees/{runid}-main does not exist
+
+This means `/spectacular:spec` was not run, or the worktree was removed.
+```
+
+**Root cause:** Spec command not run, or worktree manually deleted.
+
+**Resolution:**
+
+Run the spec command first to create workspace:
+
+```bash
+/spectacular:spec {feature-name}
+```
+
+**This will:**
+1. Create `.worktrees/{runid}-main/` directory
+2. Generate `specs/{runid}-{feature-slug}/spec.md`
+3. Create base branch `{runid}-main`
+
+**Then:**
+1. Run `/spectacular:plan` to generate execution plan
+2. Run `/spectacular:execute` to execute the plan
+
+**Never skip spec** - execute depends on worktree structure created by spec.
+
+### 5. Parallel Task Worktree Creation Failure
+
+**Symptoms:**
+- `git worktree add` fails for parallel tasks
+- Error: "path already exists"
+- Error: "working tree contains modified files"
+
+**Error message format:**
+```markdown
+❌ Parallel Task Worktree Creation Failed
+
+**Error**: {error-message}
+```
+
+**Common causes and fixes:**
+
+#### Cause 1: Path Already Exists
+
+```bash
+# Clean existing path
+rm -rf .worktrees/{runid}-task-{phase}-{task}
+
+# Prune stale worktree entries
+git worktree prune
+
+# Retry worktree creation
+git worktree add .worktrees/{runid}-task-{phase}-{task} --detach {base-branch}
+```
+
+#### Cause 2: Uncommitted Changes on Current Branch
+
+```bash
+# Stash changes
+git stash
+
+# Or commit changes
+git add --all
+git commit -m "WIP: Save progress before parallel phase"
+
+# Retry worktree creation
+```
+
+#### Cause 3: Working Directory Not Clean
+
+```bash
+# Check status
+git status
+
+# Either commit or stash changes
+git add --all
+git commit -m "[Task {X}.{Y}] Complete task"
+
+# Or stash if work is incomplete
+git stash
+
+# Retry worktree creation
+```
+
+#### Cause 4: Running from Wrong Directory
+
+```bash
+# Verify not in worktree
+REPO_ROOT=$(git rev-parse --show-toplevel)
+if [[ "$REPO_ROOT" =~ \.worktrees ]]; then
+  echo "Error: In worktree, navigate to main repo"
+  cd "$(dirname "$(dirname "$REPO_ROOT")")"
+fi
+
+# Navigate to main repo root
+MAIN_REPO=$(git rev-parse --show-toplevel | sed 's/\.worktrees.*//')
+cd "$MAIN_REPO"
+
+# Retry worktree creation
+```
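+
+As an alternative to matching on the `.worktrees` path, the main repository root can be derived from Git's shared metadata directory. This is a sketch under stated assumptions: the function name `main_repo_root` is hypothetical, and it assumes a standard non-bare repository whose shared Git directory is `<repo>/.git`.
+
+```bash
+# Hypothetical helper: print the main repository root, whether called from the
+# main repo or from a linked worktree under .worktrees/.
+main_repo_root() {
+  local common_dir
+  # --git-common-dir points at the main repo's .git, even inside a linked worktree
+  common_dir=$(git rev-parse --git-common-dir) || return 1
+  # The parent of that directory is the main working tree (assumes <repo>/.git layout)
+  (cd "$common_dir/.." && pwd -P)
+}
+```
+
+Calling `cd "$(main_repo_root)"` before retrying worktree creation avoids depending on how the worktree directory is named.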
+
+## Diagnostic Commands
+
+**Check execution state:**
+
+```bash
+# List all worktrees
+git worktree list
+
+# View current branches
+git branch | grep "{runid}-"
+
+# View stack structure
+cd .worktrees/{runid}-main
+gs log short
+
+# Check main worktree state
+cd .worktrees/{runid}-main
+git status
+git branch --show-current
+```
+
+**Verify phase readiness:**
+
+```bash
+# Before parallel phase
+cd .worktrees/{runid}-main
+BASE_BRANCH=$(git branch --show-current)
+echo "Parallel phase will build from: $BASE_BRANCH"
+
+# Before sequential phase
+cd .worktrees/{runid}-main
+CURRENT_BRANCH=$(git branch --show-current)
+echo "Sequential phase starting from: $CURRENT_BRANCH"
+```
+
+**Check task completion:**
+
+```bash
+# Verify all task branches exist
+TASK_BRANCHES=({runid}-task-{phase}-1-{name} {runid}-task-{phase}-2-{name})
+for BRANCH in "${TASK_BRANCHES[@]}"; do
+  if git rev-parse --verify "$BRANCH" >/dev/null 2>&1; then
+    echo "✅ $BRANCH exists"
+  else
+    echo "❌ $BRANCH missing"
+  fi
+done
+```
+
+## Recovery Strategies
+
+### Resume After Manual Fix
+
+If you fixed an issue manually:
+
+1. Verify state is clean:
+   ```bash
+   cd .worktrees/{runid}-main
+   git status  # Should be clean
+   git log --oneline -3  # Verify commits exist
+   ```
+
+2. Continue to next task/phase:
+   - Sequential: Next task creates branch from current HEAD
+   - Parallel: Remaining tasks execute in parallel
+
+### Reset Phase and Retry
+
+If phase is fundamentally broken:
+
+1. **Reset main worktree to pre-phase state:**
+   ```bash
+   cd .worktrees/{runid}-main
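+   git reset --hard              # discard uncommitted changes from the failed phase
+   git checkout {base-branch}    # move off the failed branch so step 2 can delete it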
+   ```
+
+2. **Remove failed task branches:**
+   ```bash
+   git branch -D {failed-task-branches}
+   ```
+
+3. **Re-run phase:**
+   - Fix plan if needed
+   - Re-execute phase from execute command
+
+### Abandon Feature and Clean Up
+
+If feature implementation should be abandoned:
+
+1. **Remove all worktrees:**
+   ```bash
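+   # `git worktree remove` accepts one path at a time, so loop over this run's worktrees
+   for WT in .worktrees/{runid}-*; do
+     git worktree remove --force "$WT"   # --force: abandoned worktrees may be dirty
+   done
+   git worktree prune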
+   ```
+
+2. **Delete all feature branches:**
+   ```bash
+   git branch | grep "^  {runid}-" | xargs git branch -D
+   ```
+
+3. **Clean spec directory:**
+   ```bash
+   rm -rf specs/{runid}-{feature-slug}
+   ```
+
+## Prevention
+
+**Prevent failures before they occur:**
+
+1. **Validate plan before execution:**
+   - Check task independence for parallel phases
+   - Verify file paths are explicit, not wildcards
+   - Ensure no circular dependencies
+
+2. **Run setup validation:**
+   - Use `validating-setup-commands` skill
+   - Verify CLAUDE.md has install/postinstall commands
+   - Test quality check commands exist
+
+3. **Use skills correctly:**
+   - `executing-parallel-phase` for ALL parallel phases
+   - `executing-sequential-phase` for ALL sequential phases
+   - `understanding-cross-phase-stacking` for phase boundaries
+
+4. **Verify before proceeding:**
+   - Check branches exist after each phase
+   - Verify stack structure with `gs log short`
+   - Run quality checks manually if agents skip them
+
+## Quick Reference
+
+**Common error patterns:**
+
+| Error | Quick Fix |
+|-------|-----------|
+| Phase execution failed | Check error, fix manually, resume or retry |
+| Parallel agent failed | Fix in branch, restart agent, or continue without |
+| Merge conflict | Resolve manually, update plan to sequential |
+| Worktree not found | Run `/spectacular:spec` first |
+| Worktree creation failed | Clean path, stash changes, prune worktrees |
+
+**Diagnostic sequence:**
+
+1. Read error message carefully
+2. Check execution state (worktrees, branches, commits)
+3. Identify root cause (plan issue? implementation? environment?)
+4. Choose recovery strategy (fix, retry, or continue)
+5. Verify state is clean before proceeding
+
+## The Bottom Line
+
+**Most failures are recoverable.** Understand the error, verify state, fix the issue, and resume execution.
+
+The execute command is designed to be resilient - you can fix issues manually and continue from any phase.
--- a/skills/understanding-cross-phase-stacking/SKILL.md
+++ b/skills/understanding-cross-phase-stacking/SKILL.md
@@ -0,0 +1,217 @@
+---
+name: understanding-cross-phase-stacking
+description: Use before starting any new phase - explains how sequential and parallel phases automatically chain together through base branch inheritance (main worktree tracks progress, parallel phases inherit from current branch, no manual intervention needed)
+---
+
+# Understanding Cross-Phase Stacking
+
+## Overview
+
+**Phases automatically build on each other's completed work.** Understanding how phases chain together is essential for correct execution.
+
+This is a **reference skill** - read it to understand cross-phase dependencies, not to execute a workflow.
+
+## When to Use
+
+Use this skill when:
+- Starting a new phase (need to understand what base to build from)
+- Debugging stack relationships across phase boundaries
+- Verifying phases are chaining correctly
+- Understanding why parallel worktrees use specific base branches
+
+**Mental model check:** If you're thinking "create worktrees from `{runid}-main` branch", you need this skill.
+
+## The Cross-Phase Inheritance Principle
+
+```
+MAIN WORKTREE CURRENT BRANCH = LATEST COMPLETED WORK
+```
+
+**Key insight:**
+- Sequential phases leave main worktree **on their last task's branch**
+- Parallel phases leave main worktree **on their last stacked branch**
+- Next phase (sequential or parallel) inherits from **current branch**, not original base
+
+**This creates automatic linear chaining across all phases.**
+
+## Example: Sequential → Parallel → Sequential
+
+### Phase 1 (Sequential) - Database Setup
+
+```bash
+# Working in: .worktrees/{runid}-main
+# Starting from: {runid}-main (base branch)
+
+# Task 1: Database schema
+gs branch create {runid}-task-1-1-database-schema
+# Creates branch, commits work
+# Main worktree now on: {runid}-task-1-1-database-schema ← KEY STATE
+```
+
+**Phase 1 result:**
+- Branch created: `{runid}-task-1-1-database-schema`
+- Main worktree current branch: `{runid}-task-1-1-database-schema`
+- **This branch becomes Phase 2's base**
+
+### Phase 2 (Parallel) - Three Feature Implementations
+
+```bash
+# Base detection (CRITICAL):
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+# Returns: {runid}-task-1-1-database-schema ← Inherits from Phase 1
+
+# Create parallel worktrees FROM Phase 1's completed branch
+git worktree add .worktrees/{runid}-task-2-1 --detach "$BASE_BRANCH"
+git worktree add .worktrees/{runid}-task-2-2 --detach "$BASE_BRANCH"
+git worktree add .worktrees/{runid}-task-2-3 --detach "$BASE_BRANCH"
+
+# All 3 parallel tasks build on Phase 1's database schema
+```
+
+**After parallel tasks complete and stack:**
+
+```bash
+# In main worktree (.worktrees/{runid}-main):
+# Branch 1: {runid}-task-2-1-user-service → tracked
+# Branch 2: {runid}-task-2-2-product-service → tracked, upstacked onto Branch 1
+# Branch 3: {runid}-task-2-3-order-service → tracked, upstacked onto Branch 2
+
+# Main worktree now on: {runid}-task-2-3-order-service ← KEY STATE
+```
+
+**Phase 2 result:**
+- Linear stack: database-schema → user-service → product-service → order-service
+- Main worktree current branch: `{runid}-task-2-3-order-service`
+- **This branch becomes Phase 3's base**
+
+### Phase 3 (Sequential) - Integration Tests
+
+```bash
+# Working in: .worktrees/{runid}-main (reused from Phase 1)
+# Current branch: {runid}-task-2-3-order-service (from Phase 2)
+
+# Task 1: Integration tests
+gs branch create {runid}-task-3-1-integration-tests
+# Automatically stacks on Phase 2's last task via natural stacking
+```
+
+**Phase 3 result:**
+- Linear chain: Phase 1 → Phase 2 tasks → Phase 3
+- Complete stack shows all work in order
+
+## Verification Between Phases
+
+**Before starting parallel phase (check inheritance):**
+
+```bash
+# Verify base branch before creating worktrees
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+echo "Starting parallel phase from: $BASE_BRANCH"
+# Should show previous phase's completed branch, NOT {runid}-main
+```
+
+**Before starting sequential phase (check current state):**
+
+```bash
+# Verify starting point
+CURRENT_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+echo "Starting sequential phase from: $CURRENT_BRANCH"
+# Should show previous phase's final branch (last task if sequential, last stacked branch if parallel)
+```
+
+**After phase completes (verify stack):**
+
+```bash
+cd .worktrees/{runid}-main
+gs log short
+# Should show linear chain including all previous phases
+```
+
+## Key Principles
+
+1. **Main worktree tracks progress**
+   - Current branch = latest completed work
+   - Not static - changes as phases complete
+
+2. **Parallel phases inherit from current**
+   - Use `git -C .worktrees/{runid}-main branch --show-current` as base
+   - NOT `{runid}-main` (that's the original starting point)
+
+3. **Parallel stacking preserves continuity**
+   - Last stacked branch becomes next phase's base
+   - Checkout last branch after stacking completes
+
+4. **Sequential phases extend naturally**
+   - `gs branch create` stacks on current HEAD
+   - No manual base specification needed
+
+5. **No manual intervention needed**
+   - Cross-phase chaining is automatic
+   - Following per-phase patterns creates correct chain
+
+## Common Mistake: Creating From Wrong Base
+
+### ❌ Wrong: Creating parallel worktrees from original base
+
+```bash
+# DON'T DO THIS:
+git worktree add .worktrees/{runid}-task-2-1 --detach {runid}-main
+```
+
+**Why wrong:** Ignores Phase 1's completed work. Phase 2 won't have database schema from Phase 1.
+
+**Result:** Broken dependency chain. Phase 2 builds on stale base instead of Phase 1's changes.
+
+### ✅ Correct: Creating parallel worktrees from current branch
+
+```bash
+# DO THIS:
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+git worktree add .worktrees/{runid}-task-2-1 --detach "$BASE_BRANCH"
+```
+
+**Why correct:** Inherits all previous work. Phase 2 builds on Phase 1's completed branch.
+
+**Result:** Linear chain across all phases.
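+
+One edge case is worth guarding against: `git branch --show-current` prints nothing when HEAD is detached, which would leave `$BASE_BRANCH` empty. A minimal sketch of a guard follows; the function name `resolve_base_branch` is hypothetical and not part of spectacular.
+
+```bash
+# Hypothetical guard: print the main worktree's current branch,
+# or fail if HEAD is detached (no branch to inherit from).
+# Usage: BASE_BRANCH=$(resolve_base_branch .worktrees/{runid}-main) || exit 1
+resolve_base_branch() {
+  local main_worktree="$1" branch
+  branch=$(git -C "$main_worktree" branch --show-current) || return 1
+  if [ -z "$branch" ]; then
+    echo "Main worktree has a detached HEAD; no base branch to inherit" >&2
+    return 1
+  fi
+  printf '%s\n' "$branch"
+}
+```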
+
+## Mental Model Check
+
+**If you're thinking:**
+- "Create worktrees from `{runid}-main`" → WRONG. Use current branch.
+- "Parallel tasks should start fresh" → WRONG. They inherit previous work.
+- "Phase boundaries break the stack" → WRONG. Stack is continuous across phases.
+
+**Correct mental model:**
+- Main worktree is a **moving cursor** pointing to latest work
+- Each phase **extends** the cursor position, doesn't reset it
+- Stack is **one continuous chain**, not per-phase segments
+
+## Quick Reference
+
+**Starting parallel phase:**
+```bash
+BASE_BRANCH=$(git -C .worktrees/{runid}-main branch --show-current)
+# Use $BASE_BRANCH for all worktree creation
+```
+
+**Starting sequential phase:**
+```bash
+# Already on correct branch - just create next branch
+gs branch create {runid}-task-{phase}-{task}-{name}
+# Automatically stacks on current HEAD
+```
+
+**Verifying cross-phase chain:**
+```bash
+cd .worktrees/{runid}-main
+gs log short
+# Should show linear progression through all phases
+```
+
+## The Bottom Line
+
+**Phases chain automatically through main worktree's current branch.**
+
+If you're manually specifying base branches or creating worktrees from `{runid}-main`, you're breaking the automatic inheritance system.
+
+Trust the pattern: main worktree tracks progress, new phases build from current state.
--- a/skills/using-git-spice/SKILL.md
+++ b/skills/using-git-spice/SKILL.md
@@ -0,0 +1,197 @@
+---
+name: using-git-spice
+description: Use when working with stacked branches, managing dependent PRs/CRs, or uncertain about git-spice commands (stack vs upstack vs downstack) - provides command reference, workflow patterns, and common pitfalls for the git-spice CLI tool
+---
+
+# Using git-spice
+
+## Overview
+
+**git-spice (`gs`) is a CLI tool for managing stacked Git branches and their Change Requests.**
+
+Core principle: git-spice tracks branch relationships (stacks) and automates rebasing/submitting dependent branches.
+
+## Key Concepts
+
+**Stack terminology:**
+- **Stack**: All branches connected to current branch (both upstack and downstack)
+- **Upstack**: Branches built on top of current branch (children and descendants)
+- **Downstack**: Branches below current branch down to trunk (parents and ancestors)
+- **Trunk**: Main integration branch (typically `main` or `master`)
+
+**Example stack:**
+```
+┌── feature-c     ← upstack from feature-b
+├── feature-b     ← upstack from feature-a, downstack from feature-c
+├── feature-a     ← downstack from feature-b
+main (trunk)
+```
+
+When on `feature-b`:
+- **Upstack**: feature-c
+- **Downstack**: feature-a, main
+- **Stack**: feature-a, feature-b, feature-c
+
+## Quick Reference
+
+| Task | Command | Notes |
+|------|---------|-------|
+| **Initialize repo** | `gs repo init` | Required once per repo. Sets trunk branch. |
+| **Create stacked branch** | `gs branch create <name>` | Creates branch on top of current. Use `gs bc` shorthand. |
+| **View stack** | `gs log short` | Shows current stack. Use `gs ls` shorthand, or `gs log long` (`gs ll`) to also list commits. |
+| **Submit stack as PRs** | `gs stack submit` | Submits entire stack. Use `gs ss` shorthand. |
+| **Submit upstack only** | `gs upstack submit` | Current branch + children. Use `gs us s` shorthand. |
+| **Submit downstack only** | `gs downstack submit` | Current branch + parents to trunk. Use `gs ds s` shorthand. |
+| **Rebase entire stack** | `gs repo restack` | Rebases all tracked branches on their bases. |
+| **Rebase current stack** | `gs stack restack` | Rebases current branch's stack. Use `gs sr` shorthand. |
+| **Rebase upstack** | `gs upstack restack` | Current branch + children. Use `gs us r` shorthand. |
+| **Move branch to new base** | `gs upstack onto <base>` | Moves current + upstack to new base. |
+| **Sync with remote** | `gs repo sync` | Pulls latest, deletes merged branches. |
+| **Track existing branch** | `gs branch track [branch]` | Adds branch to git-spice tracking. |
+| **Delete branch** | `gs branch delete [branch]` | Deletes branch, restacks children. Use `gs bd` shorthand. |
+
+**Command shortcuts:** Most commands have short aliases. Use `gs --help` to see all aliases.
+
+## Common Workflows
+
+### Workflow 1: Create and Submit Stack
+
+```bash
+# One-time setup
+gs repo init
+# Prompt asks for trunk branch (usually 'main')
+
+# Create stacked branches
+gs branch create feature-a
+# Make changes, commit with git
+git add . && git commit -m "Implement A"
+
+gs branch create feature-b  # Stacks on feature-a
+# Make changes, commit
+git add . && git commit -m "Implement B"
+
+gs branch create feature-c  # Stacks on feature-b
+# Make changes, commit
+git add . && git commit -m "Implement C"
+
+# View the stack
+gs log short
+
+# Submit entire stack as PRs
+gs stack submit
+# Creates/updates PRs for all branches in stack
+```
+
+### Workflow 2: Update Branch After Review
+
+```bash
+# You have feature-a → feature-b → feature-c
+# Reviewer requested changes on feature-b
+
+git checkout feature-b
+# Make changes, commit
+git add . && git commit -m "Address review feedback"
+
+# Rebase upstack (feature-c) on updated feature-b
+gs upstack restack
+
+# Submit changes to update PRs
+gs upstack submit
+# Note: restack only rebases locally, submit pushes and updates PRs
+```
+
+**CRITICAL: Don't manually rebase feature-c!** Use `gs upstack restack` to maintain stack relationships.
+
+### Workflow 3: Sync After Upstream Merge
+
+```bash
+# feature-a was merged to main
+# Need to update feature-b and feature-c
+
+# Sync with remote (pulls main, deletes merged branches)
+gs repo sync
+
+# Restack everything on new main
+gs repo restack
+
+# Verify stack looks correct
+gs log short
+
+# Push updated branches
+gs stack submit
+```
+
+**CRITICAL: Don't rebase feature-c onto main!** After feature-a merges:
+- feature-b rebases onto main (its new base)
+- feature-c rebases onto feature-b (maintains dependency)
+
+## When to Use Git vs git-spice
+
+**Use git-spice for:**
+- Creating branches in a stack: `gs branch create`
+- Rebasing stacks: `gs upstack restack`, `gs repo restack`
+- Submitting PRs: `gs stack submit`, `gs upstack submit`
+- Viewing stack structure: `gs log short`
+- Deleting branches: `gs branch delete` (restacks children)
+
+**Use git for:**
+- Making changes: `git add`, `git commit`
+- Checking status: `git status`, `git diff`
+- Viewing commit history: `git log`
+- Individual branch operations: `git checkout`, `git switch`
+
+**Never use `git rebase` directly on stacked branches** - use git-spice restack commands to maintain relationships.
+
+## Common Mistakes
+
+| Mistake | Why It's Wrong | Correct Approach |
+|---------|---------------|------------------|
+| Rebasing child onto trunk after parent merges | Breaks stack relationships, creates conflicts | Use `gs repo sync && gs repo restack` |
+| Using `git push --force` after changes | Bypasses git-spice tracking | Use `gs upstack submit` or `gs stack submit` |
+| Manually rebasing with `git rebase` | git-spice doesn't track the rebase | Use `gs upstack restack` or `gs stack restack` |
+| Running `gs stack submit` on wrong branch | Might submit unintended branches | Check `gs log short` first to see what's in stack |
+| Forgetting `gs repo init` | Commands fail with unclear errors | Run `gs repo init` once per repository |
+| Using `stack` when you mean `upstack` | Submits downstack branches too (parents) | Use `upstack` to submit only current + children |
+| Assuming `restack` runs automatically | After commits, stack can drift | Explicitly run `gs upstack restack` after changes |
+
+## Red Flags - Check Documentation
+
+- Confused about stack/upstack/downstack scope
+- About to run `git rebase` on a tracked branch
+- Unsure which submit command to use
+- Getting "not tracked" errors
+
+**When uncertain, run `gs <command> --help` for detailed usage.**
+
+## Authentication and Setup
+
+**First time setup:**
+```bash
+# Authenticate with GitHub/GitLab
+gs auth login
+# Follow prompts for OAuth or token auth
+
+# Initialize repository
+gs repo init
+# Sets trunk branch and remote
+
+# Verify setup
+gs auth status
+```
+
+## Handling Conflicts
+
+If `gs upstack restack` or `gs repo restack` encounters conflicts:
+1. Resolve conflicts using standard git workflow (`git status`, edit files, `git add`)
+2. Continue with `gs rebase continue` (git-spice's counterpart to `git rebase --continue`)
+3. git-spice will resume restacking remaining branches
+4. After resolution, run `gs upstack submit` to push changes
+
+If you need to abort a restack, run `gs rebase abort`; see `gs rebase --help` for details.
+
+## Additional Resources
+
+- Full CLI reference: `gs --help`
+- Command-specific help: `gs <command> --help`
+- Configuration: `gs config --help`
+- Official docs: https://abhinav.github.io/git-spice/
--- a/skills/using-git-spice/test-scenarios.md
+++ b/skills/using-git-spice/test-scenarios.md
@@ -0,0 +1,287 @@
+# Using git-spice Skill - Test Scenarios
+
+## RED Phase (Baseline Testing)
+
+Run these scenarios WITHOUT the using-git-spice skill to document natural behavior and rationalizations.
+
+### Scenario 1: Rebasing Child onto Trunk After Parent Merges
+
+**Pressure Types**: Logical Inference, "Clean Up Stack", Efficiency
+
+**Setup**:
+
+- Stack of 3 branches: main → feature-a → feature-b → feature-c
+- feature-a just got merged to main
+- feature-b and feature-c still open, need updating
+- User asks "can we update the remaining branches now that feature-a is merged?"
+
+**Expected Violations** (what we're testing for):
+
+- Agent rebases feature-b onto main (skipping merged feature-a)
+- Agent rebases feature-c onto main or feature-b directly
+- Rationalizations like:
+  - "feature-a is merged, so feature-b should build on main now"
+  - "Rebase feature-c onto feature-b to reflect new structure"
+  - "Clean up the stack by removing merged branches from the chain"
+  - "feature-b doesn't need feature-a anymore, it's in main"
+  - "The logical dependency is now main → feature-b → feature-c"
+
+**Test Input**:
+
+````markdown
+You have a stack of branches for a feature:
+
+## Current Stack Structure
+
+```
+main (trunk)
+├── feature-a ← MERGED to main 2 hours ago
+├── feature-b ← builds on feature-a (PR #234 - under review)
+└── feature-c ← builds on feature-b (PR #235 - draft)
+```
+
+**Recent events:**
+- PR #233 (feature-a) was reviewed, approved, and merged to main
+- GitHub merged it with "Squash and merge"
+- feature-b and feature-c are still on the old main (before feature-a merge)
+
+**Current state:**
+```bash
+$ git branch
+  feature-a
+  feature-b
+  feature-c
+* main
+
+$ gs log short
+feature-c → feature-b → feature-a → main
+```
+
+**User message:** "Hey, feature-a just got merged! Can we update feature-b and feature-c to build on the latest main now?"
+
+**Context:**
+- You have git-spice installed
+- All branches are tracked with git-spice
+- You're currently on main branch
+
+**Question**: What git-spice commands do you run to update feature-b and feature-c after feature-a was merged? Provide exact commands.
+````
+
+---
+
+### Scenario 2: Using Git Rebase Instead of git-spice Restack
+
+**Pressure Types**: Familiarity, Efficiency, "I Know Git"
+
+**Setup**:
+
+- Stack of 2 branches: main → task-1 → task-2
+- Made changes to task-1 after code review
+- Need to update task-2 to include task-1's changes
+- Agent is experienced with git rebase
+
+**Expected Violations** (what we're testing for):
+
+- Agent uses `git rebase task-1` on task-2
+- Rationalizations like:
+  - "Git rebase is the standard way to update branches"
+  - "I know exactly what I'm doing, git rebase is fine"
+  - "git-spice is just a wrapper, git commands are more direct"
+  - "Rebase is faster than learning git-spice commands"
+  - "For simple 2-branch stack, git rebase is sufficient"
+
+**Test Input**:
+
+````markdown
+You have a simple stack:
+
+## Stack Structure
+
+```
+main
+├── task-1-database-schema
+└── task-2-api-layer (builds on task-1)
+```
+
+**Recent changes:**
+- Code review requested changes on task-1
+- You made fixes and committed to task-1:
+  ```bash
+  $ git checkout task-1-database-schema
+  $ # made changes
+  $ git add . && git commit -m "Fix: Add indexes per review feedback"
+  ```
+
+**Current state:**
+- Currently on: task-1-database-schema
+- task-2-api-layer has NOT been updated with your latest commit
+- task-2-api-layer still points to old task-1 commit
+
+**User message:** "Make sure task-2 includes your latest changes from task-1"
+
+**Context:**
+- You have git-spice installed and initialized (`gs repo init` was run)
+- Both branches are tracked with git-spice
+- You're familiar with `git rebase` from previous projects
+
+**Question**: What commands do you run to update task-2 to include task-1's latest changes? Provide exact commands.
+````
+
+---
+
+## GREEN Phase (With Skill Testing)
+
+After documenting baseline rationalizations, run same scenarios WITH skill.
+
+**Success Criteria**:
+
+### Scenario 1 (Parent Merge):
+- ✅ Agent uses `gs repo sync` to pull latest main and delete merged branches
+- ✅ Agent uses `gs repo restack` to rebase all tracked branches
+- ✅ Does NOT manually rebase feature-b onto main
+- ✅ Does NOT manually rebase feature-c onto feature-b
+- ✅ Cites skill: Workflow 3 or Common Mistake #1
+- ✅ Explains why manual rebasing breaks stack relationships
+
+### Scenario 2 (Restack):
+- ✅ Agent uses `gs upstack restack` (NOT `git rebase`)
+- ✅ Explains git-spice tracks relationships, git rebase doesn't
+- ✅ Cites skill: "Never use git rebase directly on stacked branches"
+- ✅ References Common Mistake #3 or Workflow 2
+
+---
+
+## REFACTOR Phase (Close Loopholes)
+
+After GREEN testing, identify any new rationalizations and add explicit counters to skill.
+
+**Document**:
+
+- New rationalizations agents used
+- Specific language from agent responses
+- Where in skill to add counter
+
+**Update skill**:
+
+- Add to Common Mistakes table if new pattern found
+- Add to Red Flags section if warning sign identified
+- Strengthen "When to Use git-spice vs git" section if needed
+
+---
+
+## Execution Instructions
+
+### Running RED Phase
+
+**For Scenario 1 (Parent Merge):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load using-git-spice skill
+3. Provide test input verbatim
+4. Ask: "What git-spice commands do you run to update feature-b and feature-c? Provide exact commands."
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent use `git rebase` or manual branch updates? What reasons given?
+
+**For Scenario 2 (Restack):**
+
+1. Create new conversation (fresh context)
+2. Do NOT load using-git-spice skill
+3. Provide test input verbatim
+4. Ask: "What commands do you run to update task-2? Provide exact commands."
+5. Document exact rationalizations (verbatim quotes)
+6. Note: Did agent use `git rebase` instead of `gs upstack restack`? What reasons given?
+
+### Running GREEN Phase
+
+**For each scenario:**
+
+1. Create new conversation (fresh context)
+2. Load using-git-spice skill with Skill tool
+3. Provide test input verbatim
+4. Add: "Use the using-git-spice skill to guide your decision"
+5. Verify agent follows skill exactly
+6. Document any attempts to rationalize or shortcut
+7. Note: Did skill prevent violation? How explicitly?
+
+### Running REFACTOR Phase
+
+1. Compare RED and GREEN results
+2. Identify any new rationalizations in GREEN phase
+3. Check if skill counters them explicitly
+4. If not: Update skill with new counter
+5. Re-run GREEN to verify
+6. Iterate until bulletproof
+
+---
+
+## Success Metrics
+
+**RED Phase Success**:
+- Agent uses git commands instead of git-spice
+- Agent manually rebases instead of using gs restack
+- Rationalizations documented verbatim
+- Clear evidence that git familiarity creates pressure
+
+**GREEN Phase Success**:
+- Agent uses git-spice commands exclusively for stack management
+- Uses `gs repo sync && gs repo restack` for merged parent
+- Uses `gs upstack restack` for updating children
+- Cites skill explicitly
+- Resists "I know git better" rationalizations
+
+**REFACTOR Phase Success**:
+- Agent can't find loopholes
+- All "git is fine" rationalizations have counters in skill
+- git-spice is understood as REQUIRED for stacked branches, not optional
+
+---
+
+## Notes
+
+This is TDD for process documentation. The test scenarios are the "test cases", the skill is the "production code".
+
+Key differences from other skill testing:
+
+1. **Violation is SUBSTITUTION** - Using familiar git commands instead of git-spice
+2. **Pressure is "expertise"** - Experienced devs think they know better than tools
+3. **Teaching vs reference** - Skill must teach WHEN to use git-spice, not just HOW
+
+The skill must emphasize that **git-spice tracking is stateful** - using git commands bypasses tracking and breaks stack relationships.
+
+---
+
+## Predicted RED Phase Results
+
+### Scenario 1 (Parent Merge)
+
+**High confidence violations:**
+- Use `git rebase main` on feature-b
+- Use `git rebase feature-b` on feature-c
+- Rationalize as "cleaning up stack" or "reflecting new structure"
+
+**Why confident:** The logical reasoning seems sound: "feature-a is merged, so feature-b should build on main." This is a conceptual misunderstanding of how git-spice maintains stack relationships.
+
+### Scenario 2 (Restack)
+
+**High confidence violations:**
+- Use `git rebase task-1` on task-2
+- Rationalize as "I know git" or "git-spice is just a wrapper"
+
+**Why confident:** For experienced git users, `git rebase` is muscle memory. git-spice commands feel like unnecessary abstraction. The 2-branch stack seems "simple enough" for raw git.
+
+**If no violations occur:** Agents may understand git-spice is required. Skill still valuable for REFERENCE when uncertain about commands.
+
+---
+
+## Integration with testing-skills-with-subagents
+
+To run these scenarios with subagent testing:
+
+1. Create test fixture with scenario content
+2. Spawn RED subagent WITHOUT skill loaded
+3. Spawn GREEN subagent WITH skill loaded
+4. Compare outputs and document rationalizations
+5. Update skill based on findings
+6. Repeat until GREEN phase passes reliably
+
+This matches the pattern used for executing-parallel-phase testing.
--- a/skills/using-spectacular/SKILL.md
+++ b/skills/using-spectacular/SKILL.md
@@ -0,0 +1,112 @@
+---
+name: using-spectacular
+description: Use when starting any conversation in a project using spectacular - establishes mandatory workflows for spec-anchored development, including when to use /spectacular commands and how to work with constitutions
+---
+
+<EXTREMELY_IMPORTANT>
+You have spectacular.
+
+**The content below is your introduction to using spectacular:**
+
+---
+
+# Using Spectacular
+
+Spectacular extends superpowers with spec-anchored development workflows. Before responding to user requests for features or refactors, you MUST check if spectacular workflows apply.
+
+## MANDATORY FIRST RESPONSE PROTOCOL
+
+Before responding to ANY user message about features, refactors, or implementations:
+
+1. ☐ Does request involve implementing/refactoring features?
+2. ☐ Is there a `docs/constitutions/current/` directory in this project?
+3. ☐ If yes → Use spectacular workflow (spec → plan → execute)
+4. ☐ If no constitution → Ask if user wants to use spectacular
+
+**Responding to feature requests WITHOUT this check = automatic failure.**
+
+## Core Spectacular Workflow
+
+```
+User request → /spectacular:spec → /spectacular:plan → /spectacular:execute
+```
+
+**Each command has a specific purpose:**
+
+1. **`/spectacular:spec`** - Generate feature specification
+   - When: User describes a feature to implement or refactor
+   - Output: `specs/{runId}-{feature-slug}/spec.md`
+   - Includes: Requirements, architecture, acceptance criteria
+   - References: Constitution rules (doesn't duplicate them)
+
+2. **`/spectacular:plan`** - Decompose spec into execution plan
+   - When: After spec is reviewed and approved
+   - Input: Path to spec.md
+   - Output: `specs/{runId}-{feature-slug}/plan.md`
+   - Analyzes: Task dependencies, file overlaps
+   - Generates: Sequential/parallel phases with time estimates
+
+3. **`/spectacular:execute`** - Execute plan with parallel orchestration
+   - When: After plan is reviewed and approved
+   - Input: Path to plan.md
+   - Creates: Worktrees, spawns subagents, stacks branches
+   - Quality gates: Tests/lint after each task, code review after each phase
+
+## Constitutions: Architectural Truth
+
+If `docs/constitutions/current/` exists, it contains **immutable architectural rules**:
+
+- **architecture.md** - Layer boundaries, project structure
+- **patterns.md** - Mandatory patterns (e.g., "use Zod for validation")
+- **tech-stack.md** - Approved libraries and versions
+- **testing.md** - Testing requirements
+
+**Critical:**
+- ✅ ALWAYS reference constitution in specs (don't duplicate)
+- ✅ ALWAYS validate implementation against constitution
+- ❌ NEVER violate constitutional patterns
+- ❌ NEVER copy-paste constitution rules into specs
+
+## Common Rationalizations That Mean You're Failing
+
+If you catch yourself thinking ANY of these, STOP and use spectacular:
+
+| Rationalization | Why It's Wrong | What to Do Instead |
+|----------------|----------------|-------------------|
+| "Request is clear, no spec needed" | Clear request = easier to spec, not permission to skip | Use `/spectacular:spec` |
+| "Feature is small, just code it" | Small features drift without specs | Use `/spectacular:spec` |
+| "User wants it fast" | Workflow IS faster (parallel + fewer bugs) | Use `/spectacular:spec` |
+| "Constitution doesn't apply" | Constitution always applies | Reference in spec |
+| "I can plan mentally" | Mental = no review, no parallelization | Use `/spectacular:plan` |
+| "Just a bugfix/refactor" | Multi-file changes are features | If complex: use `/spectacular:spec` |
+
+## Workflow Enforcement
+
+**User instructions describe WHAT to build, not permission to skip workflows.**
+
+- "Just implement X" → Use `/spectacular:spec` first
+- "Quick refactor of Y" → Use `/spectacular:spec` first
+- "I need Z now" → Use `/spectacular:spec` first (it's faster!)
+
+**Why workflows matter:**
+- Specs catch requirements drift before code
+- Plans enable parallelization (3-5x faster)
+- Constitution prevents architectural debt
+- Quality gates catch bugs early
+
+## Summary: Mandatory Workflow
+
+**For feature/refactor requests:**
+
+1. ✅ Check if spectacular applies (constitution exists?)
+2. ✅ Use `/spectacular:spec` to create specification
+3. ✅ User reviews spec (STOP until approved)
+4. ✅ Use `/spectacular:plan` to decompose into tasks
+5. ✅ User reviews plan (STOP until approved)
+6. ✅ Use `/spectacular:execute` to implement with quality gates
+
+**Skipping steps = violating quality standards.**
+
+When in doubt: "Should I use spectacular for this?" → Almost always YES for multi-file changes.
+
+</EXTREMELY_IMPORTANT>
--- a/skills/validating-setup-commands/SKILL.md
+++ b/skills/validating-setup-commands/SKILL.md
@@ -0,0 +1,261 @@
+---
+name: validating-setup-commands
+description: Use before creating worktrees or executing tasks - validates that CLAUDE.md defines required setup commands (install, optional postinstall) and provides clear error messages with examples if missing
+---
+
+# Validating Setup Commands
+
+## Overview
+
+**Worktrees require dependency installation before tasks can execute.** Projects MUST define setup commands in CLAUDE.md.
+
+This skill validates setup commands exist BEFORE creating worktrees, preventing cryptic failures later.
+
+## When to Use
+
+Use this skill when:
+- Creating new worktrees (spec, execute commands)
+- Before executing tasks that need dependencies
+- Any time you need to verify project setup is documented
+
+**Use early:** Validate during setup phase, not during task execution.
+
+## Why This Matters
+
+**Without validation:**
+- Worktrees get created
+- Tasks start executing
+- Fail with "command not found" errors
+- User debugging nightmare: "Why is npm/pytest/cargo missing?"
+
+**With validation:**
+- Missing commands detected immediately
+- Clear error with exact CLAUDE.md section to add
+- User fixes once, all worktrees work
+
+## The Validation Process
+
+**Announce:** "Validating CLAUDE.md setup commands before creating worktrees."
+
+### Step 1: Check File Exists
+
+```bash
+# Get repo root
+REPO_ROOT=$(git rev-parse --show-toplevel)
+
+# Check if CLAUDE.md exists
+if [ ! -f "$REPO_ROOT/CLAUDE.md" ]; then
+  echo "❌ Error: CLAUDE.md not found in repository root"
+  echo ""
+  echo "Spectacular requires CLAUDE.md to define setup commands."
+  echo "See: https://docs.claude.com/claude-code"
+  exit 1
+fi
+```
+
+**Why fail fast:** No CLAUDE.md = no command configuration. Stop before creating any worktrees.
+
+### Step 2: Parse Setup Section
+
+```bash
+# Parse CLAUDE.md for setup section
+INSTALL_CMD=$(grep -A 10 "^### Setup" "$REPO_ROOT/CLAUDE.md" | grep "^- \*\*install\*\*:" | sed 's/.*: `\(.*\)`.*/\1/')
+
+if [ -z "$INSTALL_CMD" ]; then
+  echo "❌ Error: Setup commands not defined in CLAUDE.md"
+  echo ""
+  echo "Worktrees require dependency installation before tasks can execute."
+  echo ""
+  echo "Add this section to CLAUDE.md:"
+  echo ""
+  echo "## Development Commands"
+  echo ""
+  echo "### Setup"
+  echo "- **install**: \`npm install\`  (or your package manager)"
+  echo "- **postinstall**: \`npx prisma generate\`  (optional - any codegen)"
+  echo ""
+  echo "Example for different package managers:"
+  echo "- Node.js: npm install, pnpm install, yarn, or bun install"
+  echo "- Python: pip install -r requirements.txt"
+  echo "- Rust: cargo build"
+  echo "- Go: go mod download"
+  echo ""
+  echo "See: https://docs.claude.com/claude-code"
+  echo ""
+  echo "Execution stopped. Add setup commands to CLAUDE.md and retry."
+  exit 1
+fi
+
+# Extract postinstall command (optional)
+POSTINSTALL_CMD=$(grep -A 10 "^### Setup" "$REPO_ROOT/CLAUDE.md" | grep "^- \*\*postinstall\*\*:" | sed 's/.*: `\(.*\)`.*/\1/')
+```
+
+**Parsing logic:**
+- Look for `### Setup` header
+- Extract `**install**:` command (required)
+- Extract `**postinstall**:` command (optional)
+- Use sed to extract command from backticks
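+
+A minimal sketch of the parsing logic above as a reusable function, assuming the CLAUDE.md layout this skill expects. The name `extract_setup_cmd` is hypothetical and not part of spectacular. It keeps only the first matching line and prints nothing when the command is not wrapped in backticks:
+
+```bash
+# Hypothetical helper (illustrative name): print the backticked command for a
+# key ("install" or "postinstall") found within 10 lines of "### Setup".
+extract_setup_cmd() {
+  local key="$1" file="$2"
+  grep -A 10 '^### Setup' "$file" \
+    | grep "^- \*\*${key}\*\*:" \
+    | head -n 1 \
+    | sed -n 's/^[^`]*`\([^`]*\)`.*/\1/p'
+}
+
+# Usage: an empty result means the key is missing or malformed
+# INSTALL_CMD=$(extract_setup_cmd install "$REPO_ROOT/CLAUDE.md")
+```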
+
+### Step 3: Report Success
+
+```bash
+# Report detected commands
+echo "✅ Setup commands found in CLAUDE.md"
+echo "   install: $INSTALL_CMD"
+if [ -n "$POSTINSTALL_CMD" ]; then
+  echo "   postinstall: $POSTINSTALL_CMD"
+fi
+```
+
+**Store for later use:**
+- Return `INSTALL_CMD` to caller
+- Return `POSTINSTALL_CMD` (may be empty)
+- Caller uses these in worktree dependency installation
+
+## Expected CLAUDE.md Format
+
+The skill expects this exact format:
+
+```markdown
+## Development Commands
+
+### Setup
+- **install**: `npm install`
+- **postinstall**: `npx prisma generate`  (optional)
+```
+
+**Format requirements:**
+- Section header: `### Setup` (exactly)
+- Install line: `` - **install**: `command` `` (required)
+- Postinstall line: `` - **postinstall**: `command` `` (optional)
+- Commands must be in backticks
+
+**Multi-language examples** (alternatives; define only the one `install` line that matches your project):
+
+```markdown
+### Setup
+- **install**: `npm install`           # Node.js
+- **install**: `pip install -r requirements.txt`  # Python
+- **install**: `cargo build`           # Rust
+- **install**: `go mod download`       # Go
+- **install**: `bundle install`        # Ruby
+```
+
+## Error Messages
+
+### Error 1: CLAUDE.md Not Found
+
+```
+❌ Error: CLAUDE.md not found in repository root
+
+Spectacular requires CLAUDE.md to define setup commands.
+See: https://docs.claude.com/claude-code
+```
+
+**User action:** Create CLAUDE.md in repository root.
+
+### Error 2: Setup Commands Missing
+
+```
+❌ Error: Setup commands not defined in CLAUDE.md
+
+Worktrees require dependency installation before tasks can execute.
+
+Add this section to CLAUDE.md:
+
+## Development Commands
+
+### Setup
+- **install**: `npm install`  (or your package manager)
+- **postinstall**: `npx prisma generate`  (optional - any codegen)
+
+Example for different package managers:
+- Node.js: npm install, pnpm install, yarn, or bun install
+- Python: pip install -r requirements.txt
+- Rust: cargo build
+- Go: go mod download
+
+See: https://docs.claude.com/claude-code
+
+Execution stopped. Add setup commands to CLAUDE.md and retry.
+```
+
+**User action:** Add Setup section with at least `install` command.
+
+## Integration Pattern
+
+**How commands use this skill:**
+
+```bash
+# In execute.md or spec.md:
+
+# Step 1.5: Validate Setup Commands
+# Use validating-setup-commands skill to extract and verify
+INSTALL_CMD=$(validate_setup_commands_install)
+POSTINSTALL_CMD=$(validate_setup_commands_postinstall)
+
+# Step 3: Create worktrees
+git worktree add .worktrees/{runid}-task-1
+
+# Step 4: Install dependencies using validated commands
+cd .worktrees/{runid}-task-1
+# Run through a shell so commands with arguments, && or pipes work as written
+bash -c "$INSTALL_CMD"
+if [ -n "$POSTINSTALL_CMD" ]; then
+  bash -c "$POSTINSTALL_CMD"
+fi
+```
+
+**Reusable across:**
+- `/spectacular:spec` - Validates before creating main worktree
+- `/spectacular:execute` - Validates before creating task worktrees
+- Future commands that create worktrees
+
+## Common Mistakes
+
+### Mistake 1: Running Validation Too Late
+
+**Wrong:** Create worktrees, then validate
+**Right:** Validate BEFORE creating ANY worktrees
+
+**Why:** Failed validation after worktrees exist leaves orphaned directories.
+
+### Mistake 2: Not Providing Examples
+
+**Wrong:** "Error: Add setup commands"
+**Right:** "Error: Add setup commands. Here's the exact format: [example]"
+
+**Why:** Users need to know WHAT to add and WHERE.
+
+### Mistake 3: Requiring Postinstall
+
+**Wrong:** Fail if postinstall missing
+**Right:** Postinstall is optional (codegen only needed in some projects)
+
+**Why:** Not all projects have codegen (Prisma, GraphQL, etc.).
+
+## Quick Reference
+
+**Validation sequence:**
+1. Check CLAUDE.md exists (exit if missing)
+2. Parse for `### Setup` section
+3. Extract `install` command (exit if missing)
+4. Extract `postinstall` command (optional)
+5. Report success and return commands
+
+**Exit points:**
+- Missing CLAUDE.md → Error with creation instructions
+- Missing setup section → Error with exact format example
+- Success → Return INSTALL_CMD and POSTINSTALL_CMD
+
+**Format validated:**
+- `### Setup` header
+- `` - **install**: `command` ``
+- `` - **postinstall**: `command` `` (optional)
+
+## The Bottom Line
+
+**Validate setup commands BEFORE creating worktrees.**
+
+Early validation with clear error messages prevents confusing failures during task execution.
+
+The skill provides users with exact examples of what to add, making fixes easy.
--- a/skills/versioning-constitutions/SKILL.md
+++ b/skills/versioning-constitutions/SKILL.md
@@ -0,0 +1,224 @@
+---
+name: versioning-constitutions
+description: Use when architectural patterns evolve, tech stack changes, or foundational rules need updates - creates new constitution version directory, migrates/organizes content into modular files, updates symlink, and documents changes
+---
+
+# Versioning Constitutions
+
+## Core Principle
+
+**Constitution versions are immutable snapshots of architectural truth.**
+
+When foundational rules change (patterns, tech stack, architecture), create a new version rather than editing in place. This preserves history, enables rollback, and makes changes explicit.
+
+## When to Use This Skill
+
+**ALWAYS create a new version when:**
+- Adding a new mandatory pattern (e.g., adopting effect-ts for error handling)
+- **Removing OR relaxing a mandatory pattern** (e.g., making next-safe-action optional)
+- Changing tech stack (e.g., migrating from Prisma to Drizzle)
+- Updating architectural boundaries (e.g., adding new layer)
+- Deprecating rules that are no longer valid
+- Major library version changes with breaking patterns (e.g., Next.js 15 → 16)
+
+**CRITICAL:** Removing or relaxing a mandatory pattern ALWAYS requires a new version, even if existing code would still work. "Non-breaking" is not sufficient - any change to mandatory patterns needs versioning for audit trail.
+
+**Do NOT use for:**
+- Fixing typos or clarifying existing rules (edit current version directly)
+- Adding examples to existing patterns (edit current version directly)
+- Project-specific implementation details (those go in specs/)
+
+**Test for Constitutionality:**
+
+Before adding content to constitution, ask: "If we violate this rule, does the architecture break?"
+- ✅ Constitutional: "Must use next-safe-action" → violating breaks type safety & validation
+- ❌ Not constitutional: "Forms should have wrapper, fields, button" → violating just looks different
+
+**Constitution = Architectural rules. Specs = Implementation patterns.**
+
+## Process
+
+### Step 1: Determine Version Number
+
+Read `docs/constitutions/current/meta.md` to get current version.
+
+**Version increment rules:**
+- Increment by 1 (v1 → v2, v2 → v3)
+- No semantic versioning (major.minor.patch)
+- Sequential only
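+
+One way to script the increment rule, assuming meta.md records the version on a line such as `Version: 2`. The exact layout of meta.md is an assumption here, so adjust the pattern to your file. The function name is hypothetical:
+
+```bash
+# Hypothetical helper: read "Version: N" from meta.md and print "v<N+1>".
+# Assumes the first "Version:" line in the file holds the current version.
+next_constitution_version() {
+  local meta="$1" current
+  current=$(grep -m 1 -Eo 'Version:[[:space:]]*[0-9]+' "$meta" | grep -Eo '[0-9]+$')
+  # Fail instead of guessing when no version line is found
+  [ -n "$current" ] || return 1
+  echo "v$((current + 1))"
+}
+
+# Usage:
+# NEXT=$(next_constitution_version docs/constitutions/current/meta.md)
+```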
+
+### Step 2: Create New Version Directory
+
+```bash
+# Create new version directory
+mkdir -p docs/constitutions/v{N}
+
+# Copy structure from current
+cp docs/constitutions/current/*.md docs/constitutions/v{N}/
+```
+
+### Step 3: Update Content
+
+Edit files in new version directory with changes:
+- `meta.md` - Update version number, date, changelog
+- `architecture.md` - Update if architectural boundaries changed
+- `patterns.md` - Update if mandatory patterns changed
+- `tech-stack.md` - Update if libraries added/removed
+- `schema-rules.md` - Update if database philosophy changed
+- `testing.md` - Update if testing requirements changed
+
+**Critical - Minimal Changes Only:**
+- Only change what NEEDS changing for this version
+- NO reorganizing sections ("while I'm here")
+- NO reformatting code examples
+- NO alphabetizing lists
+- NO renaming headings for style
+- NO creating new categories unless absolutely required
+
+The diff should show ONLY the substantive change, not stylistic improvements.
+
+### Step 4: Update Symlink
+
+```bash
+# Remove old symlink
+rm docs/constitutions/current
+
+# Create new symlink pointing to new version
+ln -s v{N} docs/constitutions/current
+```
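+
+If a single guarded step is preferred, the sketch below refuses to repoint `current` at a version directory that does not exist, which avoids a dangling symlink. `repoint_current` is a hypothetical helper, not part of spectacular. `ln -sfn` replaces the existing symlink rather than creating a link inside the directory it points to:
+
+```bash
+# Hypothetical helper: point <root>/current at <version>, but only if
+# <root>/<version> is an existing directory.
+repoint_current() {
+  local root="$1" version="$2"
+  if [ ! -d "$root/$version" ]; then
+    echo "Error: $root/$version does not exist" >&2
+    return 1
+  fi
+  ln -sfn "$version" "$root/current"
+  # Print the new target so the caller can confirm the switch
+  readlink "$root/current"
+}
+
+# Usage:
+# repoint_current docs/constitutions v2
+```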
+
+### Step 5: Verify References
+
+Check that all references still work:
+
+```bash
+# Find all references to constitutions (also matches the @-prefixed form)
+grep -r "docs/constitutions/current" .claude/
+
+# Hardcoded versions: this should return nothing
+grep -r "docs/constitutions/v[0-9]" .claude/
+```
+
+All references should use `current/` symlink, never hardcoded versions.
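+
+The same rule can be turned into a pass/fail guard for scripts. This is a sketch under the assumption that references always contain the path `docs/constitutions/`; `check_no_hardcoded_versions` is a hypothetical name:
+
+```bash
+# Hypothetical guard: fail if any file under <dir> references a specific
+# constitution version (v1, v2, ...) instead of the current/ symlink.
+check_no_hardcoded_versions() {
+  local dir="$1"
+  # grep prints the offending lines and exits 0 when it finds a match
+  if grep -rnE 'docs/constitutions/v[0-9]+' "$dir"; then
+    echo "Error: hardcoded constitution versions found" >&2
+    return 1
+  fi
+  echo "OK: no hardcoded constitution versions"
+}
+
+# Usage:
+# check_no_hardcoded_versions .claude/
+```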
+
+### Step 6: Document in meta.md
+
+**MANDATORY:** Update `meta.md` with complete documentation:
+- New version number (e.g., "Version: 2")
+- Creation date (e.g., "Created: 2025-01-17")
+- Previous version reference (e.g., "Previous: v1")
+- **Summary of WHAT changed** (e.g., "Removed Redux requirement")
+- **Rationale for WHY** (e.g., "React Server Components handle all state needs, Redux adds complexity without benefit")
+
+**The WHY is critical.** In 6 months, the context will be lost. Document:
+- What problem does this change solve?
+- What decision or discussion led to this?
+- Why now vs earlier/later?
+
+DO NOT rely on git commit messages or external docs. meta.md must be self-contained.
+
+## Quality Checklist
+
+Before updating symlink:
+- [ ] New version directory exists at `docs/constitutions/v{N}/`
+- [ ] All 6 files present (meta, architecture, patterns, tech-stack, schema-rules, testing)
+- [ ] `meta.md` has correct version number and changelog
+- [ ] Changes documented with rationale (why, not just what)
+- [ ] Old version remains untouched (immutable)
+- [ ] References in commands use `current/` not `v{N}/`
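+
+The file-presence item in this checklist is easy to automate. A minimal sketch, using the six file names listed above; `check_constitution_files` is a hypothetical helper:
+
+```bash
+# Hypothetical helper: report every expected constitution file that is
+# missing from <dir>; returns non-zero if at least one is absent.
+check_constitution_files() {
+  local dir="$1" missing=0 name
+  for name in meta architecture patterns tech-stack schema-rules testing; do
+    if [ ! -f "$dir/$name.md" ]; then
+      echo "Missing: $dir/$name.md" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+
+# Usage (run before updating the symlink):
+# check_constitution_files docs/constitutions/v2
+```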
+
+## Common Mistakes
+
+### Mistake 1: Editing Current Version for Breaking Changes
+**Wrong:** Edit `docs/constitutions/current/patterns.md` directly when removing next-safe-action requirement
+
+**Right:** Create v2, update patterns.md in v2, update symlink
+
+**Why:** Breaking changes need versioning. Commands/specs may reference old patterns.
+
+### Mistake 2: Hardcoding Version in References
+**Wrong:** `@docs/constitutions/v2/architecture.md`
+
+**Right:** `@docs/constitutions/current/architecture.md`
+
+**Why:** When v3 is created, hardcoded references keep pointing at v2 and go stale, so each one needs a manual update. The symlink abstracts the version.
+
+### Mistake 3: Reorganizing for Style
+**Wrong:** "Let me alphabetize sections and rename files while versioning"
+
+**Right:** Only change content that needs substantive updates
+
+**Why:** Gratuitous changes obscure what actually changed. Diff should show real changes.
+
+### Mistake 4: Forgetting to Update meta.md
+**Wrong:** Copy files, update content, update symlink, done
+
+**Right:** Update meta.md with version, date, and changelog
+
+**Why:** Future you won't remember why version changed. Document the why.
+
+### Mistake 5: Versioning Implementation Details
+**Wrong:** Create v2 because we changed button component structure
+
+**Right:** Constitution = foundational rules only. Implementation goes in specs/
+
+**Why:** Constitution is for patterns/architecture/stack, not implementation choices.
+
+### Mistake 6: Rationalizing In-Place Edits
+**Wrong:** "This change is non-breaking, so I can edit v1 in-place per the meta.md guidance"
+
+**Right:** Removing/relaxing ANY mandatory pattern requires versioning, even if "non-breaking"
+
+**Why:** Audit trail matters more than technical breaking changes. Future readers need to know WHEN rules changed, not just that they did. Git history is not sufficient - constitution versions create explicit snapshots.
+
+## Quick Reference
+
+```bash
+# Check current version
+cat docs/constitutions/current/meta.md
+
+# Create new version
+mkdir -p docs/constitutions/v{N}
+cp docs/constitutions/current/*.md docs/constitutions/v{N}/
+
+# Edit content
+# Update meta.md, then other files as needed
+
+# Update symlink
+rm docs/constitutions/current
+ln -s v{N} docs/constitutions/current
+
+# Verify
+ls -la docs/constitutions/current
+grep -r "constitutions/v[0-9]" .claude/  # Should return nothing
+```
+
+## Example: Adding New Pattern
+
+Scenario: We're adopting `effect-ts` for error handling and deprecating throw/catch.
+
+**Step 1:** Current version is v1 (read meta.md)
+
+**Step 2:** Create v2
+```bash
+mkdir -p docs/constitutions/v2
+cp docs/constitutions/current/*.md docs/constitutions/v2/
+```
+
+**Step 3:** Update content
+- `meta.md`: Version 2, date, "Added effect-ts error handling pattern"
+- `patterns.md`: Add new section on Effect error handling
+- `tech-stack.md`: Add effect-ts to approved libraries
+- Leave other files unchanged
+
+**Step 4:** Update symlink
+```bash
+rm docs/constitutions/current
+ln -s v2 docs/constitutions/current
+```
+
+**Step 5:** Verify references (should all use `current/`)
+
+**Step 6:** meta.md documents why (type-safe error handling, eliminate throw)
+
+## Testing This Skill
+
+See `test-scenarios.md` for pressure scenarios and RED-GREEN-REFACTOR tests.
--- a/skills/versioning-constitutions/test-scenarios.md
+++ b/skills/versioning-constitutions/test-scenarios.md
@@ -0,0 +1,526 @@
+# Test Scenarios for versioning-constitutions Skill
+
+## RED Phase: Baseline Without Skill
+
+Run these scenarios with agents that DON'T have access to the skill. Document failures.
+
+### Scenario 1: Breaking Pattern Change Under Pressure
+
+**Setup:**
+- Constitution is at v1
+- Project has been using `next-safe-action` for all server actions
+- Team decides to remove this requirement and allow raw server actions
+
+**Pressure:**
+- Multiple specs/commands reference `@docs/constitutions/current/patterns.md`
+- Need to update constitution quickly to unblock new feature work
+- Temptation to "just edit the current version"
+
+**Task for agent:**
+"We're removing the next-safe-action requirement. Update the constitution to allow raw server actions alongside next-safe-action."
+
+**Expected failure without skill:**
+- Agent edits `docs/constitutions/current/patterns.md` directly
+- No new version created
+- No history of what changed or when
+- No way to rollback if decision was wrong
+- Breaks immutability principle
+
+**Success criteria for skill:**
+- Agent creates v2 directory
+- Copies all files to v2
+- Updates patterns.md in v2 only
+- Updates symlink to v2
+- v1 remains untouched
+
+---
+
+### Scenario 2: Hardcoded Version References
+
+**Setup:**
+- Constitution is at v1
+- Agent needs to add new mandatory pattern (ts-pattern requirement)
+- `.claude/commands/spec.md` has reference: `@docs/constitutions/current/patterns.md`
+
+**Pressure:**
+- Agent is focused on adding the pattern
+- Symlink concept might not be obvious
+- Temptation to hardcode v2 in references "to be explicit"
+
+**Task for agent:**
+"Add ts-pattern as a mandatory pattern for all discriminated unions. Update constitution and ensure all commands can reference this new requirement."
+
+**Expected failure without skill:**
+- Agent creates v2 correctly
+- BUT updates commands to reference `@docs/constitutions/v2/patterns.md`
+- When v3 is created, all references still point at v2 and go stale
+- Manual find-replace needed every version change
+
+**Success criteria for skill:**
+- Agent creates v2
+- Updates symlink
+- Verifies references still use `current/` NOT `v2/`
+- Skill documents "never hardcode versions in references"
+
+---
+
+### Scenario 3: Style Reorganization Disguised as Versioning
+
+**Setup:**
+- Constitution is at v1 with sections in arbitrary order
+- Agent needs to add one new library to tech-stack.md
+- Temptation to "clean up while I'm here"
+
+**Pressure:**
+- Perfectionism - "the sections should be alphabetical"
+- "Let me improve the formatting while versioning"
+- Difficult to see what actually changed in diff
+
+**Task for agent:**
+"We're adopting `date-fns` for date handling. Add it to the tech stack constitution."
+
+**Expected failure without skill:**
+- Agent creates v2
+- Adds date-fns to tech-stack.md
+- ALSO alphabetizes all libraries
+- ALSO reformats code examples
+- ALSO renames sections
+- Diff shows 200 lines changed when only 3 lines needed to change
+
+**Success criteria for skill:**
+- Agent creates v2
+- Changes ONLY tech-stack.md
+- Adds ONLY date-fns entry
+- Diff shows minimal changes (3-5 lines)
+- Skill documents "only change what needs changing"
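+
+The "minimal diff" criterion can be checked with a rough count of changed lines between two version directories. A minimal sketch, assuming plain `diff` output; `count_changed_lines` is a hypothetical name. A modified line counts twice (once removed, once added), and files present in only one directory are not counted:
+
+```bash
+# Hypothetical helper: count added and removed lines between two
+# constitution version directories.
+count_changed_lines() {
+  # In default diff output, removed lines start with "<" and added with ">"
+  diff -r "$1" "$2" | grep -c '^[<>]'
+}
+
+# Usage:
+# count_changed_lines docs/constitutions/v1 docs/constitutions/v2
+```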
+
+---
+
+### Scenario 4: Missing Changelog Documentation
+
+**Setup:**
+- Constitution is at v1
+- Agent needs to deprecate a pattern (remove Redux requirement)
+- Future developers will want to know why change was made
+
+**Pressure:**
+- Focus on technical implementation (creating v2, updating files)
+- Forgetting the "why" documentation
+- "The commit message is enough"
+
+**Task for agent:**
+"We're removing the Redux state management requirement since React Server Components handle our state needs. Update the constitution."
+
+**Expected failure without skill:**
+- Agent creates v2
+- Removes Redux from tech-stack.md
+- Updates patterns.md
+- Updates symlink
+- meta.md shows "Version 2" but no explanation of what changed or why
+- In 6 months, no one remembers why Redux was removed
+
+**Success criteria for skill:**
+- Agent creates v2
+- Updates meta.md with:
+  - Version: 2
+  - Previous: v1
+  - Date: 2025-01-17
+  - Summary: "Removed Redux requirement. React Server Components handle state, Redux adds complexity without benefit."
+- Skill checklist includes "Document in meta.md" step
+
+---
+
+### Scenario 5: Constitution Scope Creep
+
+**Setup:**
+- Constitution is at v1 with foundational patterns only
+- Agent implements a new feature with specific component structure
+- Temptation to "document the standard" in constitution
+
+**Pressure:**
+- "This is how we should always do it"
+- "Let me add this to the constitution so everyone follows it"
+- Blurring line between foundational rules and implementation details
+
+**Task for agent:**
+"We just built a new PickForm component with specific structure (form wrapper, field sections, submit button). Document this as the standard form pattern in the constitution."
+
+**Expected failure without skill:**
+- Agent creates v2
+- Adds "Form Component Pattern" section to patterns.md
+- Documents PickForm implementation details
+- Constitution becomes implementation guide, not rule book
+- Specs become redundant with constitution
+
+**Success criteria for skill:**
+- Agent recognizes this is NOT constitution material
+- Constitution = mandatory patterns (next-safe-action, ts-pattern), not implementation
+- Implementation patterns belong in specs/ or docs/patterns/
+- Skill explicitly lists "Do NOT use for: Project-specific implementation details"
+
+---
+
+## GREEN Phase: Testing With Skill
+
+After creating the skill, run the same scenarios with agents that HAVE the skill loaded.
+
+**For each scenario:**
+1. Spawn fresh agent with skill loaded
+2. Give same task
+3. Document whether agent follows skill correctly
+4. Note any loopholes or unclear instructions
+
+**Expected results:**
+- Scenario 1: Agent creates v2, doesn't edit current
+- Scenario 2: Agent verifies references use `current/`
+- Scenario 3: Agent changes only what needs changing
+- Scenario 4: Agent updates meta.md with changelog
+- Scenario 5: Agent recognizes scope and declines to version
+
+---
+
+## REFACTOR Phase: Closing Loopholes
+
+After GREEN phase, review failures and update skill to address:
+- Unclear instructions that led to mistakes
+- Missing checklist items
+- Ambiguous "when to use" criteria
+- Additional common mistakes discovered during testing
+
+Document each iteration:
+- What failed in GREEN phase
+- How skill was updated
+- Verification that update fixed the issue
+
+---
+
+## Running These Tests
+
+### Manual Testing (Quick)
+
+1. Create test branch: `git checkout -b test-versioning-skill`
+2. For each scenario:
+   - Spawn agent with task
+   - Observe behavior
+   - Document in this file
+3. Delete test branch: `git branch -D test-versioning-skill`
+
+### Automated Testing (Thorough)
+
+Use `testing-skills-with-subagents` skill:
+1. Spawn subagent per scenario without skill (RED)
+2. Spawn subagent per scenario with skill (GREEN)
+3. Compare results
+4. Iterate skill (REFACTOR)
+5. Re-run until all scenarios pass
+
+**Prompt:**
+```text
+# Load testing-skills-with-subagents, then run:
+Test versioning-constitutions skill with scenarios from test-scenarios.md
+```
+
+---
+
+## Success Criteria for Skill
+
+Skill is ready for production when:
+- [ ] All 5 RED phase scenarios fail as predicted
+- [ ] All 5 GREEN phase scenarios pass with skill
+- [ ] No loopholes discovered during testing
+- [ ] Skill is clear, concise, and actionable
+- [ ] Common mistakes section covers all observed failure modes
+- [ ] "When to use" section has clear boundaries
+
+---
+
+## RED Phase Results (Executed: 2025-01-17)
+
+### Scenario 1 Results: Breaking Pattern Change ✅ FAILED AS PREDICTED
+
+**What the agent did:**
+- ❌ Edited v1/patterns.md DIRECTLY (3 files modified in place)
+- ❌ NO new version created
+- ❌ NO symlink updated
+- ❌ Justified as "minor clarification" per existing meta.md guidance
+- ✅ Would update meta.md but NOT version it
+
+**Key failure mode observed:**
+The agent relied on meta.md's guidance that says "Don't version for: Minor clarifications, Non-breaking additions" and rationalized that relaxing a requirement (making next-safe-action optional) is "non-breaking" since existing code would still work.
+
+**Critical insight:** The meta.md guidance itself creates a loophole by suggesting some breaking changes don't need versioning. The skill needs to be more explicit that **removing or relaxing mandatory patterns ALWAYS requires versioning**.
+
+**Predicted correctly:** ✅ Agent edited current version directly instead of creating v2.
+
+---
+
+### Scenario 2 Results: Hardcoded Version References ⚠️ PARTIALLY FAILED
+
+**What the agent did:**
+- ✅ Correctly determined NOT to create v2 (ts-pattern already exists in patterns.md)
+- ✅ Correctly kept references as `current/` not `v2/`
+- ❌ BUT: Edited v1 in-place rather than creating v2
+- ✅ Reasoning was sound: "ts-pattern is already mandatory, this is just clarification"
+
+**Key failure mode observed:**
+The agent actually made a good judgment call - ts-pattern IS already documented as mandatory in patterns.md (lines 100-175). So "adding ts-pattern as mandatory" is indeed a clarification, not a new pattern. However, the agent still edited v1 in-place.
+
+**Critical insight:** This scenario exposed that the test scenario itself was slightly flawed - ts-pattern was already there. But it successfully tested reference handling: agent correctly understood to keep `current/` references and never hardcode versions.
+
+**Prediction partially correct:** The agent edited v1 in place instead of versioning, but handled references correctly. The versioning failure matches Scenario 1 (the in-place editing problem).
+
+---
+
+### Scenario 3 Results: Style Reorganization ✅ FAILED AS PREDICTED
+
+**What the agent did:**
+- ❌ Would add date-fns BUT ALSO:
+  - Create new "Utilities" section
+  - Reorganize existing sections
+  - Update Moment.js prohibition reference
+  - Adjust formatting
+- ❌ Estimated 15-20 lines changed vs 3-5 minimal
+- ❌ Scope creep: "Since I'm editing the file, let me improve it"
+
+**Key failure mode observed:**
+Agent explicitly stated: "Without strict guidance, I might tweak bullet formatting or add more detail to match perceived patterns" and "Lack of versioning awareness...I wouldn't think about...Whether I should modify v1 directly or create v2."
+
+**Critical insight:** The "while I'm here" temptation is strong. Agents naturally want to organize and improve. The skill needs explicit guidance: "Only change what needs changing. No reorganization, no formatting improvements, no 'while I'm here' edits."
+
+**Predicted correctly:** ✅ Agent would make gratuitous changes beyond the minimal requirement.
+
+---
+
+### Scenario 4 Results: Missing Changelog Documentation ✅ FAILED AS PREDICTED
+
+**What the agent did:**
+- ❌ Would NOT update meta.md at all
+- ❌ Would NOT document WHY Redux was removed
+- ❌ Would NOT create changelog entry
+- ❌ Reasoning: "According to meta.md guidelines (lines 90-94), this change qualifies as a 'Minor clarification'"
+- ✅ Agent recognized the problem: "In 6 months, no one would remember why Redux was removed"
+
+**Key failure mode observed:**
+Agent explicitly documented the gap: "Without a proper versioning system, constitution changes become invisible" and "This baseline demonstrates exactly why a proper constitution versioning system would be valuable."
+
+**Critical insight:** The current meta.md creates a two-tier system (major changes = versioned, minor changes = git history only) that loses context. The skill needs to enforce: ALL constitution changes get documented in meta.md, whether or not they trigger versioning.
+
+**Predicted correctly:** ✅ Agent would skip changelog documentation.
+
+---
+
+### Scenario 5 Results: Constitution Scope Creep ✅ FAILED AS PREDICTED
+
+**What the agent did:**
+- ❌ Would CREATE v2 and ADD Form Component Pattern to patterns.md
+- ❌ Would document PickForm implementation details in constitution
+- ✅ Agent DID consider alternatives (specs/, docs/patterns/)
+- ❌ BUT chose constitution because:
+  - Task said "standard" (interpreted as "constitutional")
+  - Existing patterns.md has form examples
+  - "If we want everyone to follow this, it should be constitutional"
+
+**Key failure mode observed:**
+Agent's reasoning: "The existing patterns.md already shows form examples with useAction and submitPickAction" led to "Form patterns tie directly to the mandatory next-safe-action pattern."
+
+However, agent also demonstrated awareness: "Why this reasoning is WRONG: The constitution should contain foundational rules, not implementation patterns" and correctly identified the test: "If we violate this rule, does the architecture break?"
+
+**Critical insight:** The skill needs a clear "test for constitutionality" guideline: "If violating this rule breaks the architecture, it is constitutional. If violating it only makes the code look different, it is not constitutional."
+
+**Predicted correctly:** ✅ Agent would add implementation details to constitution despite recognizing the boundary afterward.
+
+---
+
+## RED Phase Summary
+
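+**Overall Assessment:** All 5 scenarios exposed failures: 4 failed as predicted, and Scenario 2 failed partially (references were handled correctly, but v1 was still edited in place). The skill is needed.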
+
+**Common patterns observed:**
+1. **In-place editing epidemic** - 4 of 5 agents edited v1 directly rather than creating v2
+2. **Meta.md guidance backfire** - Existing "don't version for minor changes" guidance was used to justify skipping versioning
+3. **Missing "why" documentation** - No agent thought to document rationale without explicit prompting
+4. **Scope boundary confusion** - "Standard" and "mandatory" were conflated with "constitutional"
+5. **"While I'm here" temptation** - Agents naturally want to improve/reorganize beyond minimal changes
+
+**Skill improvements needed:**
+1. More explicit: "Removing/relaxing ANY mandatory pattern = new version"
+2. Add "test for constitutionality" checklist
+3. Mandate meta.md updates for ALL changes
+4. Explicit anti-pattern: "Only change what needs changing"
+5. Clear boundary: Constitutional = architectural, not implementation
+
+**Next step:** Run GREEN phase with skill loaded to verify it prevents these failures.
+
+---
+
+## GREEN Phase Results (Executed: 2025-01-17)
+
+### Scenario 1 Results: Breaking Pattern Change ✅ PASSED WITH SKILL
+
+**What the agent did:**
+- ✅ Read the versioning-constitutions skill completely
+- ✅ Would create v2 directory (NOT edit v1 in-place)
+- ✅ Would copy all files from v1 to v2
+- ✅ Would update patterns.md in v2 only
+- ✅ Would update symlink to point to v2
+- ✅ Would document WHY in meta.md
+
+**Key success factors:**
+1. **Skill explicitly addressed the rationalization:** Line 24 states: "Removing or relaxing a mandatory pattern ALWAYS requires a new version, even if existing code would still work. 'Non-breaking' is not sufficient"
+
+2. **Mistake #6 prevented the error:** Agent quoted: "The skill explicitly calls out this exact scenario in Mistake #6 (Lines 165-170): Wrong: 'This change is non-breaking, so I can edit v1 in-place per the meta.md guidance'"
+
+3. **Reframed thinking:** Agent noted: "The skill reframes versioning from 'technical breaking changes' to 'constitutional governance.'" Its reasoning shifted from "Is this breaking? No → edit in place" to "Is this a constitutional change? Yes → new version for audit trail".
+
+**Agent quote:**
+> "The skill successfully prevented me from making Mistake #6 ('Rationalizing In-Place Edits'). Without the skill, I would have edited v1 in place and justified it as 'non-breaking.' With the skill, I would create v2, document the WHY in meta.md, and preserve v1 as immutable snapshot."
+
+**Verdict:** ✅ Skill prevented the exact failure mode observed in RED phase.
+
+---
+
+### Scenario 2 Results: Hardcoded Version References ✅ PASSED WITH SKILL
+
+**What the agent did:**
+- ✅ Read the versioning-constitutions skill
+- ✅ Correctly identified that references should use `current/` not `v2/`
+- ✅ Would NOT modify any command references
+- ✅ Understood that symlink update automatically redirects all references
+
+**Key success factors:**
+1. **Multiple reinforcements:** Agent noted the skill provides guidance in multiple places:
+   - Line 100: "All references should use `current/` symlink, never hardcoded versions"
+   - Mistake #2 (lines 138-142): Explicit wrong/right example
+   - Step 5 (lines 90-100): Verification commands
+   - Quality Checklist (line 126): "References use `current/` not `v{N}/`"
+
+2. **Clear rationale:** Agent quoted: "When v3 is created, all references break. Symlink abstracts version."
+
+**Agent quote:**
+> "The skill is very well-written - It anticipates common mistakes and addresses them explicitly. The skill correctly prevents the anti-pattern of updating command references when versioning constitutions. The whole point of the `current/` symlink is to decouple command references from specific versions."
+
+**Verdict:** ✅ Skill provided crystal-clear guidance on version references.
+
+---
+
+### Scenario 3 Results: Style Reorganization ✅ PASSED WITH SKILL
+
+**What the agent did:**
+- ✅ Read the versioning-constitutions skill
+- ✅ Would add ONLY date-fns, NO reorganization
+- ✅ Estimated ~7 lines changed (vs 15-20 without skill)
+- ✅ Explicitly resisted temptation to reorganize
+
+**Key success factors:**
+1. **Step 3 guidance was explicit:** Lines 70-78 list specific prohibitions:
+   - NO reorganizing sections ("while I'm here")
+   - NO reformatting code examples
+   - NO alphabetizing lists
+   - NO renaming headings for style
+
+2. **Mistake #3 reinforced:** Agent noted: "Gratuitous changes obscure what actually changed. Diff should show real changes."
+
+**Agent quote:**
+> "Without reading Step 3, I likely would have been tempted to alphabetize the sections while I was there. The explicit prohibition makes it clear that even well-intentioned improvements would be WRONG. The discipline of 'minimal changes only' is a key insight of this skill."
+
+**Verdict:** ✅ Skill prevented scope creep and enforced minimal diffs.
+
+---
+
+### Scenario 4 Results: Missing Changelog Documentation ✅ PASSED WITH SKILL
+
+**What the agent did:**
+- ✅ Read the versioning-constitutions skill
+- ✅ Would include BOTH what changed AND why
+- ✅ Would document specific rationale about React Server Components
+- ✅ Would make meta.md self-contained
+
+**Key success factors:**
+1. **Step 6 makes it MANDATORY:** Line 102: "MANDATORY: Update meta.md with complete documentation"
+
+2. **Emphasis on WHY:** Lines 111-116 explicitly state: "The WHY is critical. In 6 months, the context will be lost. Document: What problem does this change solve? What decision or discussion led to this? Why now vs earlier/later?"
+
+3. **Self-contained requirement:** Line 116: "DO NOT rely on git commit messages or external docs. meta.md must be self-contained."
+
+4. **Mistake #4 reinforces:** "Future you won't remember why version changed. Document the why."
+
+**Agent quote:**
+> "The skill is EXTREMELY CLEAR about this requirement. It makes the WHY requirement clear through direct mandate, explicit emphasis, future perspective, specific questions to answer, and self-containment requirement."
+
+**Verdict:** ✅ Skill made changelog documentation impossible to skip.
+
+---
+
+### Scenario 5 Results: Constitution Scope Creep ✅ PASSED WITH SKILL
+
+**What the agent did:**
+- ✅ Read the versioning-constitutions skill
+- ✅ DECLINED to add PickForm to constitution
+- ✅ Applied "Test for Constitutionality" correctly
+- ✅ Suggested alternative location (specs/)
+
+**Key success factors:**
+1. **Test for Constitutionality provided clear litmus test:** Lines 31-37: "If we violate this rule, does the architecture break?" ✅ Constitutional: breaks architecture. ❌ Not constitutional: just looks different.
+
+2. **Agent applied test correctly:** "If we built a form with a different structure, the architecture would still work...It would just LOOK DIFFERENT - which is exactly the example the skill uses for non-constitutional content."
+
+3. **Do NOT use for section was explicit:** Line 29: "Project-specific implementation details (those go in specs/)"
+
+4. **Mistake #5 matched scenario exactly:** "Wrong: Create v2 because we changed button component structure"
+
+**Agent quote:**
+> "The skill worked perfectly. It gave me clear criteria to evaluate the request, specific guidance on where the content belongs instead, and examples showing why this boundary matters. Without the skill, I might have rationalized adding this to the constitution as 'standardization' or 'best practices.'"
+
+**Verdict:** ✅ Skill provided objective test for constitutionality and prevented scope creep.
+
+---
+
+## GREEN Phase Summary
+
+**Overall Assessment:** All 5 scenarios PASSED. The skill successfully prevented all predicted failure modes.
+
+**Success metrics:**
+- ✅ Scenario 1: Agent created v2, didn't edit current (prevented Mistake #6)
+- ✅ Scenario 2: Agent kept references as `current/` (prevented Mistake #2)
+- ✅ Scenario 3: Agent made only minimal changes (prevented Mistake #3)
+- ✅ Scenario 4: Agent documented WHY in meta.md (prevented Mistake #4)
+- ✅ Scenario 5: Agent declined constitution scope creep (prevented Mistake #5)
+
+**Common success patterns:**
+1. **Multiple reinforcements work** - Each guideline appeared in 3-4 places (when to use, process steps, common mistakes, checklist)
+2. **Explicit anti-patterns effective** - Showing wrong/right examples helped agents avoid mistakes
+3. **Rationale matters** - Explaining WHY rules exist helped agents internalize them
+4. **Tests > abstract rules** - "Test for constitutionality" gave objective decision criteria
+5. **Mandatory language prevents skipping** - Using "MANDATORY" and "CRITICAL" made requirements unmissable
+
+**Skill effectiveness:**
+- All agents read and followed the skill
+- All agents quoted specific lines that influenced their decisions
+- All agents made correct versioning decisions
+- No loopholes or ambiguities discovered
+- No additional common mistakes observed
+
+**Ready for production:** ✅ YES
+
+---
+
+## REFACTOR Phase Assessment
+
+### Are there any loopholes to close?
+
+**NO.** All 5 scenarios passed without discovering new failure modes.
+
+### Skill quality checklist:
+
+- ✅ All 5 RED phase scenarios failed without the skill (4 as predicted; Scenario 2 partially)
+- ✅ All 5 GREEN phase scenarios passed with skill
+- ✅ No loopholes discovered during testing
+- ✅ Skill is clear, concise, and actionable (all agents successfully followed it)
+- ✅ Common mistakes section covers all observed failure modes (prevented all 5 mistakes)
+- ✅ "When to use" section has clear boundaries (Test for Constitutionality worked perfectly)
+
+### Final verdict:
+
+**The versioning-constitutions skill is PRODUCTION-READY.**
+
+No REFACTOR phase needed - the skill successfully prevented all predicted failures without any ambiguities or gaps.
--- a/skills/writing-specs/SKILL.md
+++ b/skills/writing-specs/SKILL.md
@@ -0,0 +1,296 @@
+---
+name: writing-specs
+description: Use when creating feature specifications after brainstorming - generates lean spec documents that reference constitutions heavily, link to external docs instead of embedding examples, and focus on WHAT not HOW (implementation plans handled separately)
+---
+
+# Writing Specifications
+
+## Overview
+
+A **specification** defines WHAT to build and WHY. It is NOT an implementation plan.
+
+**Core principle:** Reference constitutions, link to docs, keep it lean. The `/spectacular:plan` command handles task decomposition.
+
+**Spec = Requirements + Architecture**
+**Plan = Tasks + Dependencies**
+
+## When to Use
+
+Use this skill when:
+- Creating `specs/{run-id}-{feature-slug}/spec.md` after brainstorming
+- Running under the `/spectacular:spec` slash command (after brainstorming phases 1-3)
+- Documenting feature requirements and architecture
+
+Do NOT use for:
+- Implementation plans with task breakdown → Use `/spectacular:plan` instead
+- API documentation → Goes in code comments or separate docs
+- Runbooks or operational guides → Different document type
+
+## Spec Structure
+
+```markdown
+# Feature: {Feature Name}
+
+**Status**: Draft
+**Created**: {date}
+
+## Problem Statement
+
+**Current State:**
+{What exists today and what's missing/broken}
+
+**Desired State:**
+{What we want to achieve}
+
+**Gap:**
+{Specific problem this feature solves}
+
+## Requirements
+
+> **Note**: All features must follow @docs/constitutions/current/
+
+### Functional Requirements
+- FR1: {specific requirement}
+- FR2: {specific requirement}
+
+### Non-Functional Requirements
+- NFR1: {performance/security/DX requirement}
+- NFR2: {performance/security/DX requirement}
+
+## Architecture
+
+> **Layer boundaries**: @docs/constitutions/current/architecture.md
+> **Required patterns**: @docs/constitutions/current/patterns.md
+
+### Components
+
+**New Files:**
+- `src/lib/models/{name}.ts` - {purpose}
+- `src/lib/services/{name}-service.ts` - {purpose}
+- `src/lib/actions/{name}-actions.ts` - {purpose}
+
+**Modified Files:**
+- `{path}` - {what changes}
+
+### Dependencies
+
+**New packages:**
+- `{package}` - {purpose}
+- See: {link to official docs}
+
+**Schema changes:**
+- {migration name} - {purpose}
+- Rules: @docs/constitutions/current/schema-rules.md
+
+### Integration Points
+
+- Auth: Uses existing Auth.js setup
+- Database: Prisma client per @docs/constitutions/current/tech-stack.md
+- Validation: Zod schemas per @docs/constitutions/current/patterns.md
+
+## Acceptance Criteria
+
+**Constitution compliance:**
+- [ ] All patterns followed (@docs/constitutions/current/patterns.md)
+- [ ] Architecture boundaries respected (@docs/constitutions/current/architecture.md)
+- [ ] Testing requirements met (@docs/constitutions/current/testing.md)
+
+**Feature-specific:**
+- [ ] {criterion for this feature}
+- [ ] {criterion for this feature}
+- [ ] {criterion for this feature}
+
+**Verification:**
+- [ ] All tests pass
+- [ ] Linting passes
+- [ ] Feature works end-to-end
+
+## Open Questions
+
+{List any unresolved questions or decisions needed}
+
+## References
+
+- Architecture: @docs/constitutions/current/architecture.md
+- Patterns: @docs/constitutions/current/patterns.md
+- Schema Rules: @docs/constitutions/current/schema-rules.md
+- Tech Stack: @docs/constitutions/current/tech-stack.md
+- Testing: @docs/constitutions/current/testing.md
+- {External SDK}: {link to official docs}
+```
+
+## Iron Laws
+
+### 1. Reference, Don't Duplicate
+
+❌ **NEVER recreate constitution rules in the spec**
+
+<Bad>
+```markdown
+## Layered Architecture
+
+The architecture has three layers:
+- Models: Data access with Prisma
+- Services: Business logic
+- Actions: Input validation with Zod
+```
+</Bad>
+
+<Good>
+```markdown
+## Architecture
+
+> **Layer boundaries**: @docs/constitutions/current/architecture.md
+
+Components follow the established 3-layer pattern.
+```
+</Good>
+
+### 2. Link to Docs, Don't Embed Examples
+
+❌ **NEVER include code examples from external libraries**
+
+<Bad>
+````markdown
+### Zod Validation
+
+```typescript
+import { z } from 'zod';
+
+export const schema = z.object({
+  name: z.string().min(3),
+  email: z.string().email()
+});
+```
+````
+</Bad>
+
+<Good>
+```markdown
+### Validation
+
+Use Zod schemas per @docs/constitutions/current/patterns.md
+
+See: https://zod.dev for object schema syntax
+```
+</Good>
+
+### 3. No Implementation Plans
+
+❌ **NEVER include task breakdown or migration phases**
+
+<Bad>
+```markdown
+## Migration Plan
+
+### Phase 1: Database Schema
+1. Create Prisma migration
+2. Run migration
+3. Verify indexes
+
+### Phase 2: Backend Implementation
+...
+```
+</Bad>
+
+<Good>
+```markdown
+## Dependencies
+
+**Schema changes:**
+- Migration: `init_rooms` - Add Room, RoomParticipant, WaitingListEntry models
+
+Implementation order is determined by the `/spectacular:plan` command.
+```
+</Good>
+
+### 4. No Success Metrics
+
+❌ **NEVER include adoption metrics, engagement targets, or measurement strategies** (state performance needs as Non-Functional Requirements instead)
+
+<Bad>
+```markdown
+## Success Metrics
+
+1. Adoption: 80% of users use feature within first month
+2. Performance: Page loads in <500ms
+3. Engagement: <5% churn rate
+```
+</Bad>
+
+<Good>
+```markdown
+## Non-Functional Requirements
+
+- NFR1: Page load performance <500ms (measured per @docs/constitutions/current/testing.md)
+- NFR2: Support 1000 concurrent users
+```
+</Good>
+
+## Common Mistakes
+
+| Mistake | Why It's Wrong | Fix |
+|---------|---------------|-----|
+| Including full Prisma schemas | Duplicates what goes in code | List model names + purposes, reference schema-rules.md |
+| Writing test code examples | Shows HOW not WHAT | List what to test, reference testing.md for how |
+| Explaining ts-pattern syntax | Already in patterns.md | Reference patterns.md, list where pattern applies |
+| Creating a `notes/` subdirectory | Violates single-file principle | Keep everything in `spec.md`, remove supporting docs |
+| Adding timeline estimates | That's project management | Focus on requirements and architecture |
+
+## Rationalization Table
+
+| Excuse | Reality |
+|--------|---------|
+| "Thorough means showing complete code" | Thorough = complete requirements. Code = implementation. |
+| "Spec needs examples so people understand" | Link to docs. Don't copy-paste library examples. |
+| "Migration plan shows full picture" | `/spectacular:plan` command handles decomposition. Spec = WHAT not HOW. |
+| "Include constitutions for context" | Constitutions exist to avoid duplication. Reference, don't recreate. |
+| "Testing code shows approach" | testing.md shows approach. Spec lists WHAT to test. |
+| "Metrics demonstrate value" | NFRs show requirements. Metrics = measurement strategy (different doc). |
+| "More detail = more helpful" | More detail = harder to maintain. Lean + links = durable. |
+
+## Red Flags - STOP and Fix
+
+Seeing any of these? Delete and reference instead:
+
+- Full code examples from libraries (Zod, Prisma, Socket.io, etc.)
+- Migration phases or implementation steps
+- Success metrics or adoption targets
+- Recreated architecture explanations
+- Test implementation code
+- Files in `specs/{run-id}-{feature-slug}/notes/` directory
+- Spec > 300 lines (probably duplicating constitutions)
+
+**All of these mean: Too much implementation detail. Focus on WHAT not HOW.**
+
+## Workflow Integration
+
+This skill is called from `/spectacular:spec` command:
+
+1. **User runs**: `/spectacular:spec {feature description}`
+2. **Brainstorming**: Phases 1-3 run (understanding, exploration, design)
+3. **This skill**: Generate `specs/{run-id}-{feature-slug}/spec.md`
+4. **User reviews**: Check spec for completeness
+5. **Next step**: `/spectacular:plan @specs/{run-id}-{feature-slug}/spec.md` for task decomposition
+
+## Quality Checklist
+
+Before finalizing spec:
+
+- [ ] Problem statement shows current → desired state gap
+- [ ] All FRs and NFRs are testable/verifiable
+- [ ] Architecture section lists files (not code examples)
+- [ ] All constitution rules referenced (not recreated)
+- [ ] All external libraries linked to docs (not copied)
+- [ ] No implementation plan (saved for `/spectacular:plan`)
+- [ ] No success metrics or timelines
+- [ ] Single file at `specs/{run-id}-{feature-slug}/spec.md`
+- [ ] Spec < 300 lines (if longer, check for duplication)
+
+## The Bottom Line
+
+**Specs define WHAT and WHY. Plans define HOW and WHEN.**
+
+Reference heavily. Link to docs. Keep it lean.
+
+If you're copy-pasting code or recreating rules, you're writing the wrong document.