Initial commit
552
skills/decomposing-tasks/SKILL.md
Normal file
@@ -0,0 +1,552 @@
|
||||
---
|
||||
name: decomposing-tasks
|
||||
description: Use when you have a complete feature spec and need to plan implementation - analyzes task dependencies, groups into sequential/parallel phases, validates task quality (no XL tasks, explicit files), and calculates parallelization time savings
|
||||
---
|
||||
|
||||
# Task Decomposition
|
||||
|
||||
Analyze a feature specification and decompose it into an execution-ready plan with automatic phase grouping based on file dependencies.
|
||||
|
||||
**When to use:** After completing a feature spec, before implementation.
|
||||
|
||||
**Announce:** "I'm using the Task Decomposition skill to create an execution plan."
|
||||
|
||||
## Overview
|
||||
|
||||
This skill transforms a feature specification into a structured implementation plan by:
|
||||
|
||||
1. Extracting tasks from spec
|
||||
2. Analyzing file dependencies between tasks
|
||||
3. Grouping into phases (sequential or parallel)
|
||||
4. Validating task quality
|
||||
5. Outputting executable plan.md
|
||||
|
||||
## PR-Sized Chunks Philosophy
|
||||
|
||||
**Tasks should be PR-sized, thematically coherent units** - not mechanical file-by-file splits.
|
||||
|
||||
**Think like a senior engineer:**
|
||||
|
||||
- ❌ "Add schema" + "Install dependency" + "Add routes" (3 tiny tasks)
|
||||
- ✅ "Database Foundation" (schema + migration + dependencies as one unit)
|
||||
|
||||
**Task chunking principles:**
|
||||
|
||||
1. **Thematic Coherence** - Task represents a complete "thing"
|
||||
|
||||
- Complete subsystem (agent system with tools + config + types)
|
||||
- Complete layer (all service methods for a feature)
|
||||
- Complete feature slice (UI flow from form to preview to confirm)
|
||||
|
||||
2. **Natural PR Size** - Reviewable in one sitting (3-7h)
|
||||
|
||||
- M (3-5h): Sweet spot for most tasks
|
||||
- L (5-7h): Complex but coherent units (full UI layer, complete API surface)
|
||||
- S (1-2h): Rare - only for truly standalone work
|
||||
|
||||
3. **Logical Boundaries** - Clear separation points
|
||||
|
||||
- Layer boundaries (Models, Services, Actions, UI)
|
||||
- Subsystem boundaries (Agent, Import Service, API)
|
||||
- Feature boundaries (Auth, Import, Dashboard)
|
||||
|
||||
4. **Stackable** - Dependencies flow cleanly
|
||||
- Database → Logic → API → UI
|
||||
- Foundation → Core → Integration
|
||||
|
||||
**Good chunking examples:**
|
||||
|
||||
```
|
||||
✅ GOOD: PR-sized, thematic chunks
|
||||
- Task 1: Database Foundation (M - 4h)
|
||||
- Schema changes + migration + dependency install
|
||||
- One coherent "foundation" PR
|
||||
|
||||
- Task 2: Agent System (L - 6h)
|
||||
- Agent config + tools + schemas + types
|
||||
- Complete agent subsystem as a unit
|
||||
|
||||
- Task 3: Import Service Layer (M - 4h)
|
||||
- All service methods + business logic
|
||||
- Clean layer boundary
|
||||
|
||||
- Task 4: API Surface (L - 6h)
|
||||
- Server actions + SSE route
|
||||
- Complete API interface
|
||||
|
||||
- Task 5: Import UI (L - 7h)
|
||||
- All components + page + integration
|
||||
- Complete user-facing feature
|
||||
|
||||
Total: 5 tasks, 27h
|
||||
Each task is a reviewable PR that adds value
|
||||
```
|
||||
|
||||
```
|
||||
❌ BAD: Too granular, mechanical splits
|
||||
- Task 1: Add schema fields (S - 2h)
|
||||
- Task 2: Create migration (S - 1h)
|
||||
- Task 3: Install dependency (S - 1h)
|
||||
- Task 4: Create agent config (M - 3h)
|
||||
- Task 5: Create fetch tool (S - 1h)
|
||||
- Task 6: Create schemas (S - 2h)
|
||||
- Task 7: Create service (M - 4h)
|
||||
- Task 8: Create actions (M - 3h)
|
||||
- Task 9: Create SSE route (M - 3h)
|
||||
- Task 10: Create form component (S - 2h)
|
||||
- Task 11: Create progress component (S - 2h)
|
||||
- Task 12: Create preview component (S - 2h)
|
||||
- Task 13: Add routes (S - 1h)
|
||||
- Task 14: Integrate components (S - 1h)
|
||||
|
||||
Total: 14 tasks, 28h
|
||||
Too many tiny PRs, no coherent units
|
||||
```
|
||||
|
||||
**Bundling heuristics:**
|
||||
|
||||
If you're creating S tasks, ask:
|
||||
|
||||
- Can this bundle with a related M task?
|
||||
- Does this complete a subsystem or layer?
|
||||
- Would a senior engineer create a separate PR for this?
|
||||
|
||||
**Common bundling patterns:**
|
||||
|
||||
- Schema + migration + dependencies → "Database Foundation"
|
||||
- Agent + tools + schemas → "Agent System"
|
||||
- Service + helper functions → "Service Layer"
|
||||
- Actions + API routes → "API Layer"
|
||||
- All UI components for a flow → "UI Layer"
|
||||
|
||||
## The Process
|
||||
|
||||
### Step 1: Read Spec and Extract/Design Tasks
|
||||
|
||||
Read the spec file and extract tasks. The spec may provide tasks in two ways:
|
||||
|
||||
**Option A: Spec has "Implementation Plan" section** (structured task breakdown)
|
||||
- Extract tasks directly from this section
|
||||
- Each task should have: ID, description, files, complexity, acceptance criteria
|
||||
|
||||
**Option B: Spec has no "Implementation Plan"** (lean spec - requirements only)
|
||||
- Analyze the requirements and design task breakdown yourself
|
||||
- Look at: Functional Requirements, Architecture section, Files to Create/Modify
|
||||
- Design PR-sized chunks following the chunking philosophy above
|
||||
- Create tasks that implement all requirements
|
||||
|
||||
For each task (extracted or designed), capture:
|
||||
|
||||
- **Task ID** (from heading)
|
||||
- **Description** (what to implement)
|
||||
- **Files** (explicit paths from spec)
|
||||
- **Complexity** (S/M/L/XL - estimated hours)
|
||||
- **Acceptance Criteria** (checklist items)
|
||||
- **Implementation Steps** (detailed steps)
|
||||
|
||||
**Example extraction:**
|
||||
|
||||
```markdown
|
||||
Spec has:
|
||||
|
||||
### Task 1: Database Schema
|
||||
|
||||
**Complexity**: M (2-4h)
|
||||
**Files**:
|
||||
|
||||
- prisma/schema.prisma
|
||||
- prisma/migrations/
|
||||
|
||||
**Description**: Add VerificationToken model for Auth.js...
|
||||
|
||||
**Acceptance**:
|
||||
|
||||
- [ ] Model matches Auth.js spec
|
||||
- [ ] Migration runs cleanly
|
||||
|
||||
Extract to:
|
||||
{
|
||||
id: "task-1-database-schema",
|
||||
description: "Add VerificationToken model",
|
||||
files: ["prisma/schema.prisma", "prisma/migrations/"],
|
||||
complexity: "M",
|
||||
estimated_hours: 3,
|
||||
acceptance_criteria: [...],
|
||||
steps: [...]
|
||||
}
|
||||
```
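As a rough illustration, the extracted record could be held in a small typed structure. This is a minimal sketch in Python, not part of the skill itself: the `Task` class, the `dependencies` field, and the `slugify_task_id` helper are hypothetical names introduced here, and the ID format is inferred from the single example above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One extracted task; field names follow the example record above."""
    id: str
    description: str
    files: list[str]
    complexity: str  # "S", "M", "L", or "XL"
    estimated_hours: float
    acceptance_criteria: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    # Hypothetical extra field, to be filled in by the dependency analysis.
    dependencies: set[str] = field(default_factory=set)


def slugify_task_id(number: int, title: str) -> str:
    """Build an ID such as "task-1-database-schema" from a spec heading."""
    # Replace every non-alphanumeric character with a space, then split.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(["task", str(number), *words])


task = Task(
    id=slugify_task_id(1, "Database Schema"),
    description="Add VerificationToken model",
    files=["prisma/schema.prisma", "prisma/migrations/"],
    complexity="M",
    estimated_hours=3,
)
```

Keeping `dependencies` on the record means the later steps can annotate tasks in place rather than maintaining a separate lookup.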
|
||||
|
||||
### Step 2: Validate Task Quality & Chunking
|
||||
|
||||
For each task, check for quality issues:
|
||||
|
||||
**CRITICAL (must fix):**
|
||||
|
||||
- ❌ XL complexity (>8h) → Must split into M/L tasks
|
||||
- ❌ No files specified → Must add explicit file paths
|
||||
- ❌ No acceptance criteria → Must add 3-5 testable criteria
|
||||
- ❌ Wildcard patterns (`src/**/*.ts`) → Must use explicit paths
|
||||
- ❌ Too many S tasks (>30% of total) → Bundle into thematic M/L tasks
|
||||
|
||||
**HIGH (strongly recommend):**
|
||||
|
||||
- ⚠️ Standalone S task that could bundle with related work
|
||||
- ⚠️ L complexity (5-7h) → Verify it's a coherent unit, not arbitrary split
|
||||
- ⚠️ >10 files → Likely too large, consider splitting by subsystem
|
||||
- ⚠️ <50 char description → Add more detail about what subsystem/layer this completes
|
||||
- ⚠️ <3 acceptance criteria → Add more specific criteria
|
||||
|
||||
**Chunking validation:**
|
||||
|
||||
- If task is S (1-2h), verify it's truly standalone:
|
||||
|
||||
- Can't be bundled with schema/migration/dependencies?
|
||||
- Can't be bundled with related service/action/component?
|
||||
- Would a senior engineer create a separate PR for this?
|
||||
|
||||
- If >30% of tasks are S, that's a red flag:
|
||||
- Likely too granular
|
||||
- Missing thematic coherence
|
||||
- Bundle related S tasks into M tasks
|
||||
|
||||
**If CRITICAL issues found:**
|
||||
|
||||
- STOP and report issues to user
|
||||
- User must update spec or adjust chunking
|
||||
- Re-run skill after fixes
|
||||
|
||||
**If only HIGH issues:**
|
||||
|
||||
- Report warnings
|
||||
- Offer to continue or fix
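The checks in this step can be sketched as a single pass over the task list. This is an illustrative Python sketch under stated assumptions: the `Task` shape and the `validate` function are hypothetical, a wildcard is detected simply by the presence of `*` in a path, and the S-task share uses the 30% CRITICAL threshold listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    description: str
    complexity: str  # "S", "M", "L", or "XL"
    files: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)


def validate(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Return (critical, high) issue lists for the Step 2 checks."""
    critical: list[str] = []
    high: list[str] = []
    for t in tasks:
        if t.complexity == "XL":
            critical.append(f"{t.id}: XL complexity - must split")
        if not t.files:
            critical.append(f"{t.id}: no files specified")
        if any("*" in path for path in t.files):
            critical.append(f"{t.id}: wildcard pattern in files")
        if not t.acceptance_criteria:
            critical.append(f"{t.id}: no acceptance criteria")
        elif len(t.acceptance_criteria) < 3:
            high.append(f"{t.id}: fewer than 3 acceptance criteria")
        if len(t.files) > 10:
            high.append(f"{t.id}: more than 10 files")
        if len(t.description) < 50:
            high.append(f"{t.id}: description under 50 characters")
    # Plan-level check: share of S tasks across the whole plan.
    if tasks and sum(t.complexity == "S" for t in tasks) / len(tasks) > 0.30:
        critical.append("plan: more than 30% of tasks are S - bundle them")
    return critical, high


tasks = [
    Task(
        id="task-1",
        description="Update all TypeScript files to use strict mode",
        complexity="M",
        files=["src/**/*.ts"],
        acceptance_criteria=["Build succeeds"],
    ),
]
critical, high = validate(tasks)
```

A non-empty `critical` list corresponds to the STOP branch; a non-empty `high` list alone corresponds to reporting warnings and offering to continue.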
|
||||
|
||||
### Step 3: Analyze File Dependencies
|
||||
|
||||
Build dependency graph by analyzing file overlaps:
|
||||
|
||||
**Algorithm:**
|
||||
|
||||
```
|
||||
For each task T1:
|
||||
For each task T2 (where T2 appears after T1):
|
||||
shared_files = intersection(T1.files, T2.files)
|
||||
|
||||
If shared_files is not empty:
|
||||
T2.dependencies.add(T1.id)
|
||||
T2.dependency_reason = "Shares files: {shared_files}"
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```
|
||||
Task 1: ["prisma/schema.prisma"]
|
||||
Task 2: ["src/lib/models/auth.ts"]
|
||||
Task 3: ["prisma/schema.prisma", "src/types/auth.ts"]
|
||||
|
||||
Analysis:
|
||||
- Task 2: No dependencies (no shared files with Task 1)
|
||||
- Task 3: Depends on Task 1 (shares prisma/schema.prisma)
|
||||
```
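The overlap rule above can be sketched in a few lines of Python. The `file_dependencies` name and the dictionary shapes are assumptions made for illustration; the only logic taken from the text is "a later task depends on any earlier task it shares a file with".

```python
from itertools import combinations


def file_dependencies(tasks):
    """tasks: {task_id: [file paths]} in spec order.

    Returns {task_id: {earlier_task_id: [shared files]}}.
    """
    deps = {task_id: {} for task_id in tasks}
    # combinations() keeps input order, so id1 always appears before id2.
    for (id1, files1), (id2, files2) in combinations(tasks.items(), 2):
        shared = set(files1) & set(files2)
        if shared:
            deps[id2][id1] = sorted(shared)
    return deps


deps = file_dependencies({
    "task-1": ["prisma/schema.prisma"],
    "task-2": ["src/lib/models/auth.ts"],
    "task-3": ["prisma/schema.prisma", "src/types/auth.ts"],
})
```

Storing the shared files alongside each dependency keeps the "Shares files: ..." reason available when the plan is written out.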
|
||||
|
||||
**Architectural dependencies:**
|
||||
Also add dependencies based on layer order:
|
||||
|
||||
- Models → Services → Actions → UI
|
||||
- Database → Types → Logic → API → Components
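One possible way to derive layer-order dependencies is to infer each task's layer from its file paths. The sketch below is a heuristic under loud assumptions: the path prefixes, the `layer_rank` and `layer_dependencies` names, and the rule "depend on every task in the nearest lower layer" are all hypothetical, and a real plan would likely restrict this to tasks within the same feature.

```python
LAYERS = ["models", "services", "actions", "ui"]  # lowest layer first

# Hypothetical path prefixes; adjust to the project's real layout.
PREFIX_TO_LAYER = {
    "src/lib/models/": "models",
    "src/lib/services/": "services",
    "src/lib/actions/": "actions",
    "src/components/": "ui",
    "src/app/": "ui",
}


def layer_rank(files):
    """Highest layer index touched by any of the files, or None if unknown."""
    ranks = [
        LAYERS.index(layer)
        for path in files
        for prefix, layer in PREFIX_TO_LAYER.items()
        if path.startswith(prefix)
    ]
    return max(ranks) if ranks else None


def layer_dependencies(tasks):
    """tasks: {task_id: [file paths]} -> {task_id: set of task_ids it depends on}."""
    ranks = {tid: layer_rank(files) for tid, files in tasks.items()}
    known = [r for r in ranks.values() if r is not None]
    deps = {tid: set() for tid in tasks}
    for tid, rank in ranks.items():
        lower = [r for r in known if rank is not None and r < rank]
        if lower:
            nearest = max(lower)  # closest layer below this task
            deps[tid] = {other for other, r in ranks.items() if r == nearest}
    return deps


deps = layer_dependencies({
    "task-1": ["src/lib/models/pick.ts"],
    "task-2": ["src/lib/services/pick-service.ts"],
    "task-3": ["src/lib/actions/pick-actions.ts"],
})
```

Here the three tasks share no files, yet the service task ends up depending on the model task and the action task on the service task.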
|
||||
|
||||
### Step 4: Group into Phases
|
||||
|
||||
Group tasks into phases using dependency graph:
|
||||
|
||||
**Phase grouping algorithm:**
|
||||
|
||||
```
|
||||
1. Start with tasks that have no dependencies (roots)
|
||||
2. Group all independent roots into Phase 1
|
||||
3. Remove roots from graph, so tasks that depended only on them become the new roots
4. Repeat with the new roots to form the next phase, until all tasks are grouped
|
||||
|
||||
For each phase:
|
||||
- If all tasks independent: strategy = "parallel"
|
||||
- If any dependencies exist: strategy = "sequential"
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```
|
||||
Tasks:
|
||||
- Task 1: [] (no deps)
|
||||
- Task 2: [] (no deps)
|
||||
- Task 3: [task-1, task-2]
|
||||
- Task 4: [task-3]
|
||||
|
||||
Grouping:
|
||||
Phase 1: [Task 1, Task 2] - parallel (independent)
|
||||
Phase 2: [Task 3] - sequential (waits for Phase 1)
|
||||
Phase 3: [Task 4] - sequential (waits for Phase 2)
|
||||
```
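The grouping loop can be sketched as repeated root removal (a layered topological sort). This Python sketch is illustrative only: `group_into_phases` is a hypothetical name, and labelling a single-task phase "sequential" and a multi-task phase "parallel" is an assumption drawn from the example above. If no roots remain while tasks are still ungrouped, the graph has a cycle (or a dependency on an unknown task), so the function raises instead of looping forever.

```python
def group_into_phases(dependencies):
    """dependencies: {task_id: set of task_ids it depends on}."""
    # Copy the sets so the caller's graph is not mutated.
    remaining = {tid: set(deps) for tid, deps in dependencies.items()}
    phases = []
    while remaining:
        roots = sorted(tid for tid, deps in remaining.items() if not deps)
        if not roots:
            raise ValueError(f"Circular dependencies among: {sorted(remaining)}")
        strategy = "parallel" if len(roots) > 1 else "sequential"
        phases.append({"tasks": roots, "strategy": strategy})
        for tid in roots:
            del remaining[tid]
        # Tasks that only waited on these roots become roots next round.
        for deps in remaining.values():
            deps.difference_update(roots)
    return phases


phases = group_into_phases({
    "task-1": set(),
    "task-2": set(),
    "task-3": {"task-1", "task-2"},
    "task-4": {"task-3"},
})
```

For the example graph this yields three phases: tasks 1 and 2 together, then task 3, then task 4.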
|
||||
|
||||
### Step 5: Calculate Execution Estimates
|
||||
|
||||
For each phase, calculate:
|
||||
|
||||
- **Sequential time**: Sum of all task hours
|
||||
- **Parallel time**: Max of all task hours for a parallel phase; same as sequential time for a sequential phase
|
||||
- **Time savings**: Sequential - Parallel
|
||||
|
||||
**Example:**
|
||||
|
||||
```
|
||||
Phase 2 (parallel):
|
||||
- Task A: 3h
|
||||
- Task B: 2h
|
||||
- Task C: 4h
|
||||
|
||||
Sequential: 3 + 2 + 4 = 9h
|
||||
Parallel: max(3, 2, 4) = 4h
|
||||
Savings: 9 - 4 = 5h (56% faster)
|
||||
```
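The same arithmetic, as a small Python sketch. The `phase_estimate` name and the returned keys are assumptions for illustration; the percentage is the saved time divided by the sequential time, rounded to a whole number.

```python
def phase_estimate(hours, strategy):
    """hours: list of task estimates in one phase; strategy: "parallel" or "sequential"."""
    sequential = sum(hours)
    # Only a parallel phase finishes in the time of its longest task.
    parallel = max(hours) if strategy == "parallel" and hours else sequential
    saved = sequential - parallel
    percent = round(100 * saved / sequential) if sequential else 0
    return {
        "sequential": sequential,
        "parallel": parallel,
        "saved": saved,
        "percent": percent,
    }


estimate = phase_estimate([3, 2, 4], "parallel")
```

Plan-level totals would then be the sums of each phase's `sequential` and `parallel` values, since phases themselves run one after another.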
|
||||
|
||||
### Step 6: Generate plan.md
|
||||
|
||||
Write plan to `{spec-directory}/plan.md`:
|
||||
|
||||
**Template:**
|
||||
|
||||
````markdown
|
||||
# Feature: {Feature Name} - Implementation Plan
|
||||
|
||||
> **Generated by:** Task Decomposition skill
|
||||
> **From spec:** {spec-path}
|
||||
> **Created:** {date}
|
||||
|
||||
## Execution Summary
|
||||
|
||||
- **Total Tasks**: {count}
|
||||
- **Total Phases**: {count}
|
||||
- **Sequential Time**: {hours}h
|
||||
- **Parallel Time**: {hours}h
|
||||
- **Time Savings**: {hours}h ({percent}%)
|
||||
|
||||
**Parallel Opportunities:**
|
||||
|
||||
- Phase {id}: {task-count} tasks ({hours}h saved)
|
||||
|
||||
---
|
||||
|
||||
## Phase {N}: {Phase Name}
|
||||
|
||||
**Strategy**: {sequential|parallel}
|
||||
**Reason**: {why this strategy}
|
||||
|
||||
### Task {ID}: {Name}
|
||||
|
||||
**Files**:
|
||||
|
||||
- {file-path-1}
|
||||
- {file-path-2}
|
||||
|
||||
**Complexity**: {S|M|L} ({hours}h)
|
||||
|
||||
**Dependencies**: {[task-ids] or "None"}
|
||||
|
||||
**Description**:
|
||||
{What to implement and why}
|
||||
|
||||
**Implementation Steps**:
|
||||
|
||||
1. {step-1}
|
||||
2. {step-2}
|
||||
3. {step-3}
|
||||
|
||||
**Acceptance Criteria**:
|
||||
|
||||
- [ ] {criterion-1}
|
||||
- [ ] {criterion-2}
|
||||
- [ ] {criterion-3}
|
||||
|
||||
**Mandatory Patterns**:
|
||||
|
||||
> **Constitution**: All code must follow @docs/constitutions/current/
|
||||
|
||||
See architecture.md for layer boundaries and patterns.md for required patterns.
|
||||
|
||||
**TDD**: Follow `test-driven-development` skill (write test first, watch fail, minimal code, watch pass)
|
||||
|
||||
**Quality Gates**:
|
||||
|
||||
```bash
|
||||
pnpm biome check --write .
|
||||
pnpm test {test-files}
|
||||
```
|
||||
---

{Repeat for all tasks in all phases}

````
|
||||
|
||||
### Step 7: Report to User
|
||||
|
||||
After generating plan:
|
||||
|
||||
````markdown
✅ Task Decomposition Complete

**Plan Location**: specs/{run-id}-{feature-slug}/plan.md

## Breakdown

- Phases: {count}
- Tasks: {count}
- Complexity: {XL}: {n}, {L}: {n}, {M}: {n}, {S}: {n}

## Execution Strategy

- Sequential Phases: {count} ({tasks} tasks)
- Parallel Phases: {count} ({tasks} tasks)

## Time Estimates

- Sequential Execution: {hours}h
- With Parallelization: {hours}h
- **Time Savings: {hours}h ({percent}% faster)**

## Next Steps

Review plan:

```bash
cat specs/{run-id}-{feature-slug}/plan.md
```

Execute plan:

```bash
/spectacular:execute @specs/{run-id}-{feature-slug}/plan.md
```
````
|
||||
|
||||
## Quality Rules
|
||||
|
||||
**Task Sizing (PR-focused):**
|
||||
- ⚠️ S (1-2h): Rare - only truly standalone work (e.g., config-only changes)
|
||||
- Most S tasks should bundle into M
|
||||
- Ask: "Would a senior engineer PR this alone?"
|
||||
- ✅ M (3-5h): Sweet spot - most tasks should be this size
|
||||
- Complete subsystem, layer, or feature slice
|
||||
- Reviewable in one sitting
|
||||
- Thematically coherent unit
|
||||
- ✅ L (5-7h): Complex coherent units (use for major subsystems)
|
||||
- Full UI layer with all components
|
||||
- Complete API surface (actions + routes)
|
||||
- Major feature integration
|
||||
- ❌ XL (>8h): NEVER - always split into M/L tasks
|
||||
|
||||
**Chunking Standards:**
|
||||
- ❌ >30% S tasks is a red flag (too granular)
|
||||
- ✅ Most tasks should be M (60-80%)
|
||||
- ✅ Some L tasks for major units (10-30%)
|
||||
- ✅ Rare S tasks for truly standalone work (<10%)
|
||||
|
||||
**File Specificity:**
|
||||
- ✅ `src/lib/models/auth.ts`
|
||||
- ✅ `src/components/auth/LoginForm.tsx`
|
||||
- ❌ `src/**/*.ts` (too vague)
|
||||
- ❌ `src/lib/models/` (specify exact files)
|
||||
|
||||
**Acceptance Criteria:**
|
||||
- ✅ 3-5 specific, testable criteria
|
||||
- ✅ Quantifiable (tests pass, build succeeds, API returns 200)
|
||||
- ❌ Vague ("works well", "is good")
|
||||
- ❌ Too many (>7 - task is too large)
|
||||
|
||||
**Dependencies:**
|
||||
- ✅ Minimal (only true blockers)
|
||||
- ✅ Explicit reasons (shares file X)
|
||||
- ❌ Circular dependencies
|
||||
- ❌ Over-constrained (everything depends on everything)
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Spec Has Insufficient Information
|
||||
|
||||
If spec has neither "Implementation Plan" nor enough detail to design tasks:
|
||||
|
||||
```
|
||||
|
||||
❌ Cannot decompose - spec lacks implementation details
|
||||
|
||||
The spec must have either:
|
||||
- An "Implementation Plan" section with tasks, OR
|
||||
- Sufficient requirements and architecture details to design tasks
|
||||
|
||||
Current spec has:
|
||||
- Functional Requirements: [YES/NO]
|
||||
- Architecture section: [YES/NO]
|
||||
- Files to create/modify: [YES/NO]
|
||||
|
||||
Add more implementation details to the spec, then re-run:
|
||||
/spectacular:plan @specs/{run-id}-{feature-slug}/spec.md
|
||||
|
||||
```
|
||||
|
||||
### Critical Quality Issues
|
||||
|
||||
If tasks have critical issues:
|
||||
|
||||
```
|
||||
|
||||
❌ Task Quality Issues - Cannot Generate Plan
|
||||
|
||||
Critical Issues Found:
|
||||
|
||||
- Task 3: XL complexity (12h) - must split
|
||||
- Task 5: No files specified
|
||||
- Task 7: No acceptance criteria
|
||||
|
||||
Fix these issues in the spec, then re-run:
|
||||
/spectacular:plan @specs/{run-id}-{feature-slug}/spec.md
|
||||
|
||||
```
|
||||
|
||||
### Circular Dependencies
|
||||
|
||||
If dependency graph has cycles:
|
||||
|
||||
```
|
||||
|
||||
❌ Circular Dependencies Detected
|
||||
|
||||
Task A depends on Task B
|
||||
Task B depends on Task C
|
||||
Task C depends on Task A
|
||||
|
||||
This is impossible to execute. Review task organization.
|
||||
|
||||
```
|
||||
|
||||
## Integration with Other Skills
|
||||
|
||||
**Before:** Use `brainstorming` and `spec-feature` to create complete spec
|
||||
|
||||
**After:** Use the `/spectacular:execute` command to run the plan with `subagent-driven-development`
|
||||
|
||||
**Pairs with:**
|
||||
- `subagent-driven-development` - Executes individual tasks
|
||||
- `finishing-a-development-branch` - Completes implementation
|
||||
|
||||
## Project-Specific Configuration
|
||||
|
||||
For projects with a constitution, reference it in every task:
|
||||
|
||||
> **Constitution**: All tasks MUST follow @docs/constitutions/current/
|
||||
|
||||
Every task must include:
|
||||
- Reference to constitution for architecture (layer boundaries, dependencies)
|
||||
- Reference to constitution for patterns (validation, state management, etc.)
|
||||
- Quality gates (linting, testing, building)
|
||||
|
||||
**Quality gates:**
|
||||
```bash
|
||||
pnpm biome check --write .
|
||||
pnpm test
|
||||
```
|
||||
414
skills/decomposing-tasks/test-scenarios.md
Normal file
@@ -0,0 +1,414 @@
|
||||
# Task Decomposition Skill - Test Scenarios
|
||||
|
||||
## RED Phase (Baseline Testing)
|
||||
|
||||
Run these scenarios WITHOUT the decomposing-tasks skill to document natural behavior and rationalizations.
|
||||
|
||||
### Scenario 1: XL Task Pressure
|
||||
|
||||
**Pressure Types**: Time, Authority (spec says 12h)
|
||||
|
||||
**Setup**:
|
||||
|
||||
- Provide a spec with one task marked as "XL (10-12h)"
|
||||
- Task description is detailed and seems well-scoped
|
||||
- Ask agent to decompose into execution plan
|
||||
|
||||
**Expected Violations** (what we're testing for):
|
||||
|
||||
- Agent accepts XL task without splitting
|
||||
- Rationalizations like:
|
||||
- "It's well-scoped so the estimate is reliable"
|
||||
- "12h is borderline, we can proceed"
|
||||
- "The spec author knows best"
|
||||
- "Splitting would add coordination overhead"
|
||||
|
||||
**Test Input**:
|
||||
|
||||
```markdown
|
||||
# Feature: Admin Dashboard
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Task 1: Complete Admin Dashboard
|
||||
|
||||
**Complexity**: XL (12h)
|
||||
**Files**:
|
||||
|
||||
- src/app/admin/page.tsx
|
||||
- src/app/admin/users/page.tsx
|
||||
- src/app/admin/categories/page.tsx
|
||||
- src/lib/services/admin-service.ts
|
||||
- src/lib/actions/admin-actions.ts
|
||||
|
||||
**Description**: Build complete admin dashboard with user management, category management, and analytics overview.
|
||||
|
||||
**Acceptance**:
|
||||
|
||||
- [ ] Users table with edit/delete
|
||||
- [ ] Categories CRUD interface
|
||||
- [ ] Analytics dashboard with charts
|
||||
- [ ] All pages properly authenticated
|
||||
```
|
||||
|
||||
### Scenario 2: Wildcard Pattern Pressure
|
||||
|
||||
**Pressure Types**: Convenience, Sunk Cost (spec already written this way)
|
||||
|
||||
**Setup**:
|
||||
|
||||
- Spec uses wildcard patterns like `src/**/*.ts`
|
||||
- Patterns seem reasonable ("all TypeScript files")
|
||||
- Ask agent to decompose
|
||||
|
||||
**Expected Violations**:
|
||||
|
||||
- Agent keeps wildcard patterns
|
||||
- Rationalizations like:
|
||||
- "The wildcard is clear enough"
|
||||
- "We know what files we mean"
|
||||
- "Being explicit would be tedious"
|
||||
- "The spec is already written this way"
|
||||
|
||||
**Test Input**:
|
||||
|
||||
```markdown
|
||||
# Feature: Type Safety Refactor
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Task 1: Update Type Definitions
|
||||
|
||||
**Complexity**: M (3h)
|
||||
**Files**:
|
||||
|
||||
- src/**/*.ts
|
||||
- types/**/*.d.ts
|
||||
|
||||
**Description**: Update all TypeScript files to use strict mode...
|
||||
```
|
||||
|
||||
### Scenario 3: False Independence Pressure
|
||||
|
||||
**Pressure Types**: Optimism, Desired Outcome (want parallelization)
|
||||
|
||||
**Setup**:
|
||||
|
||||
- Two tasks that share a file
|
||||
- Tasks seem independent at first glance
|
||||
- User wants parallelization
|
||||
|
||||
**Expected Violations**:
|
||||
|
||||
- Agent marks tasks as parallel despite file overlap
|
||||
- Rationalizations like:
|
||||
- "They modify different parts of the file"
|
||||
- "We can merge the changes later"
|
||||
- "The overlap is minimal"
|
||||
- "Parallelization benefits outweigh coordination cost"
|
||||
|
||||
**Test Input**:
|
||||
|
||||
```markdown
|
||||
# Feature: Authentication System
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Task 1: Magic Link Service
|
||||
|
||||
**Complexity**: M (3h)
|
||||
**Files**:
|
||||
|
||||
- src/lib/services/magic-link-service.ts
|
||||
- src/lib/models/auth.ts
|
||||
- src/types/auth.ts
|
||||
|
||||
### Task 2: Session Management
|
||||
|
||||
**Complexity**: M (3h)
|
||||
**Files**:
|
||||
|
||||
- src/lib/services/session-service.ts
|
||||
- src/lib/models/auth.ts
|
||||
- src/types/auth.ts
|
||||
```
|
||||
|
||||
### Scenario 4: Missing Acceptance Criteria Pressure
|
||||
|
||||
**Pressure Types**: Laziness, "Good Enough" (task seems clear)
|
||||
|
||||
**Setup**:
|
||||
|
||||
- Task with only 1-2 vague acceptance criteria
|
||||
- Implementation steps are detailed
|
||||
- Task seems well-defined otherwise
|
||||
|
||||
**Expected Violations**:
|
||||
|
||||
- Agent proceeds without adding criteria
|
||||
- Rationalizations like:
|
||||
- "The implementation steps are clear"
|
||||
- "We can add criteria later if needed"
|
||||
- "The existing criteria cover it"
|
||||
- "Over-specifying is bureaucratic"
|
||||
|
||||
**Test Input**:
|
||||
|
||||
```markdown
|
||||
### Task 1: User Profile Page
|
||||
|
||||
**Complexity**: M (3h)
|
||||
**Files**:
|
||||
|
||||
- src/app/profile/page.tsx
|
||||
- src/lib/services/user-service.ts
|
||||
|
||||
**Implementation Steps**:
|
||||
|
||||
1. Create profile page component
|
||||
2. Add user data fetching
|
||||
3. Display user information
|
||||
4. Add edit button
|
||||
|
||||
**Acceptance**:
|
||||
|
||||
- [ ] Page displays user information
|
||||
```
|
||||
|
||||
### Scenario 5: Architectural Dependency Omission
|
||||
|
||||
**Pressure Types**: Oversight, Assumption (seems obvious)
|
||||
|
||||
**Setup**:
|
||||
|
||||
- Tasks that should have layer dependencies (Model → Service → Action)
|
||||
- File dependencies don't show it
|
||||
- Tasks modifying different files at each layer
|
||||
|
||||
**Expected Violations**:
|
||||
|
||||
- Agent doesn't add architectural dependencies
|
||||
- Marks tasks with non-overlapping files as parallel
|
||||
- Rationalizations like:
|
||||
- "No file overlap, so they're independent"
|
||||
- "Layer dependencies are implicit"
|
||||
- "The agents will figure it out"
|
||||
|
||||
**Test Input**:
|
||||
|
||||
```markdown
|
||||
### Task 1: Pick Models
|
||||
|
||||
**Files**: src/lib/models/pick.ts
|
||||
|
||||
### Task 2: Pick Service
|
||||
|
||||
**Files**: src/lib/services/pick-service.ts
|
||||
|
||||
### Task 3: Pick Actions
|
||||
|
||||
**Files**: src/lib/actions/pick-actions.ts
|
||||
```
|
||||
|
||||
## GREEN Phase (With Skill Testing)
|
||||
|
||||
After documenting baseline rationalizations, run the same scenarios WITH the skill.
|
||||
|
||||
**Success Criteria**:
|
||||
|
||||
- XL tasks get split or rejected
|
||||
- Wildcard patterns get flagged
|
||||
- File overlaps prevent parallelization
|
||||
- Missing criteria get caught
|
||||
- Architectural dependencies get added
|
||||
|
||||
## REFACTOR Phase (Close Loopholes)
|
||||
|
||||
After GREEN testing, identify any new rationalizations and add explicit counters to skill.
|
||||
|
||||
**Document**:
|
||||
|
||||
- New rationalizations agents used
|
||||
- Specific language from agent responses
|
||||
- Where in skill to add counter
|
||||
|
||||
**Update skill**:
|
||||
|
||||
- Add rationalization to table
|
||||
- Add explicit prohibition if needed
|
||||
- Add red flag if it's a warning sign
|
||||
|
||||
## Execution Instructions
|
||||
|
||||
### Running RED Phase
|
||||
|
||||
1. Create test spec file: `specs/test-decomposing-tasks.md`
|
||||
2. Use Scenario 1 content
|
||||
3. Ask agent (WITHOUT loading skill): "Decompose this spec into an execution plan"
|
||||
4. Document exact rationalizations used (verbatim quotes)
|
||||
5. Repeat for each scenario
|
||||
6. Compile list of all rationalizations
|
||||
|
||||
### Running GREEN Phase
|
||||
|
||||
1. Same test spec files
|
||||
2. Ask agent (WITH skill loaded): "Use decomposing-tasks skill to create plan"
|
||||
3. Verify agent catches issues
|
||||
4. Document any new rationalizations
|
||||
5. Repeat for each scenario
|
||||
|
||||
### Running REFACTOR Phase
|
||||
|
||||
1. Review all new rationalizations from GREEN
|
||||
2. Update skill with explicit counters
|
||||
3. Re-run scenarios to verify
|
||||
4. Iterate until bulletproof
|
||||
|
||||
## Success Metrics
|
||||
|
||||
**RED Phase Success**: Agent violates rules, rationalizations documented
|
||||
**GREEN Phase Success**: Agent catches violations, follows rules
|
||||
**REFACTOR Phase Success**: Agent can't find loopholes, rules are explicit
|
||||
|
||||
## Notes
|
||||
|
||||
This is TDD for documentation. The test scenarios are the "test cases", the skill is the "production code".
|
||||
|
||||
Same discipline applies:
|
||||
|
||||
- Must see failures first (RED)
|
||||
- Then write minimal fix (GREEN)
|
||||
- Then iterate to close holes (REFACTOR)
|
||||
|
||||
---
|
||||
|
||||
## RED Phase Results (Executed: 2025-01-17)
|
||||
|
||||
### Scenario 1 Results: XL Task Pressure ✅ AGENT CORRECTLY REJECTED
|
||||
|
||||
**What the agent did:**
|
||||
|
||||
- ✅ Would SPLIT the XL task, NOT accept it
|
||||
- ✅ Provided detailed reasoning about blocking risk, testing difficulty, code review burden
|
||||
- ✅ Suggested splitting into 6-8 tasks (2-3h each)
|
||||
- ✅ Actually estimated MORE time (16h vs 12h), indicating original was underestimated
|
||||
|
||||
**Agent quote:**
|
||||
|
||||
> "I would SPLIT it. I would not accept a 12-hour task as-is... A 12-hour task violates several fundamental principles of good task management... Industry standard is to keep tasks to 2-4 hours maximum."
|
||||
|
||||
**Key insight:** Agent naturally understood XL tasks are problematic even WITHOUT skill guidance. No rationalization occurred.
|
||||
|
||||
**Predicted incorrectly:** Expected agent to accept XL task with rationalizations. Agent made correct decision.
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2 Results: Wildcard Pattern Pressure ✅ AGENT CORRECTLY REJECTED
|
||||
|
||||
**What the agent did:**
|
||||
|
||||
- ✅ Would NOT accept wildcard patterns for execution
|
||||
- ✅ Recognized need to glob/scan codebase first
|
||||
- ✅ Understood dependency analysis is impossible with wildcards
|
||||
- ✅ Identified spec as insufficient for execution
|
||||
|
||||
**Agent quote:**
|
||||
|
||||
> "I would NOT accept these wildcard patterns as-is for execution... Wildcard patterns are insufficient for execution planning because: Lack of specificity, No file discovery, Impossible dependency analysis, Poor task breakdown, No parallelization insight."
|
||||
|
||||
**Key insight:** Agent naturally understood wildcards are problematic. There was no pressure-driven rationalization to overcome.
|
||||
|
||||
**Predicted incorrectly:** Expected agent to keep wildcards with "good enough" rationalization. Agent made correct decision.
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3 Results: False Independence ✅ AGENT CORRECTLY DETECTED DEPENDENCIES
|
||||
|
||||
**What the agent did:**
|
||||
|
||||
- ✅ Marked tasks as SEQUENTIAL, not parallel
|
||||
- ✅ Detected shared files (auth.ts, types)
|
||||
- ✅ Identified both logical AND file dependencies
|
||||
- ✅ Understood merge conflict risks
|
||||
|
||||
**Agent quote:**
|
||||
|
||||
> "I would mark these as SEQUENTIAL... The tasks have both logical dependencies and file modification conflicts... Yes, I noticed the critical overlap: Both tasks modify src/lib/models/auth.ts and src/types/auth.ts. This is a significant merge conflict risk."
|
||||
|
||||
**Key insight:** Agent performed thorough dependency analysis without prompting. Considered both file overlaps AND logical flow.
|
||||
|
||||
**Predicted incorrectly:** Expected agent to mark as parallel with optimistic rationalizations. Agent made correct decision.
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4 Results: Missing Criteria ✅ AGENT CORRECTLY REQUIRED MORE
|
||||
|
||||
**What the agent did:**
|
||||
|
||||
- ✅ Said one criterion is NOT enough
|
||||
- ✅ Would require 9+ specific, testable criteria
|
||||
- ✅ Identified ambiguity and lack of testability
|
||||
- ✅ Explained why "done" would be subjective without better criteria
|
||||
|
||||
**Agent quote:**
|
||||
|
||||
> "No, one acceptance criterion is not enough... The single criterion 'Page displays user information' is far too vague... acceptance criteria should be testable and unambiguous. The current criterion fails both tests."
|
||||
|
||||
**Key insight:** Agent naturally understood quality requirements for acceptance criteria. No rationalization about "good enough."
|
||||
|
||||
**Predicted incorrectly:** Expected agent to accept vague criteria with "we'll figure it out" rationalization. Agent made correct decision.
|
||||
|
||||
---
|
||||
|
||||
### Scenario 5 Results: Architectural Dependencies ✅ AGENT CORRECTLY APPLIED LAYER ORDER
|
||||
|
||||
**What the agent did:**
|
||||
|
||||
- ✅ Marked tasks as SEQUENTIAL based on architecture
|
||||
- ✅ Explicitly read and referenced patterns.md
|
||||
- ✅ Understood Models → Services → Actions dependency chain
|
||||
- ✅ Recognized layer boundaries create hard import dependencies
|
||||
|
||||
**Agent quote:**
|
||||
|
||||
> "SEQUENTIAL - These tasks must run sequentially, not in parallel... The codebase enforces strict layer boundaries... Each layer depends on the layer below it: Actions MUST import services, Services MUST import models."
|
||||
|
||||
**Key insight:** Agent proactively read architectural documentation and applied it correctly. Very thorough analysis.
|
||||
|
||||
**Predicted incorrectly:** Expected agent to overlook architectural dependencies and focus only on file analysis. Agent made correct decision.
|
||||
|
||||
---
|
||||
|
||||
## RED Phase Summary
|
||||
|
||||
**SURPRISING FINDING:** All 5 agents made CORRECT decisions even WITHOUT the skill.
|
||||
|
||||
**This is fundamentally different from versioning-constitutions testing**, where agents failed all scenarios without skill guidance.
|
||||
|
||||
**Why the difference?**
|
||||
|
||||
1. **Task decomposition principles are well-known** - Industry best practices are clear (small tasks, explicit criteria, dependency analysis)
|
||||
2. **Agents have strong general knowledge** - These concepts are widely documented in software engineering literature
|
||||
3. **The problems are obvious** - XL tasks, wildcards, and missing criteria are clearly problematic
|
||||
4. **Architectural patterns were documented** - patterns.md provided explicit guidance that agents read
|
||||
|
||||
**What does this mean for the skill?**
|
||||
|
||||
The skill serves a different purpose than initially expected:
|
||||
|
||||
1. **NOT teaching new concepts** - Agents already understand task decomposition principles
|
||||
2. **ENFORCING consistency** - Standardize HOW analysis is performed
|
||||
3. **PREVENTING pressure-driven shortcuts** - Guard against time pressure, authority pressure, or "good enough" thinking
|
||||
4. **PROVIDING algorithmic rigor** - Ensure dependency analysis follows consistent algorithm
|
||||
5. **STANDARDIZING output format** - Generate consistent plan.md structure
|
||||
|
||||
**Skill value proposition shifts from:**
|
||||
|
||||
- ❌ "Teaching agents how to decompose tasks" (they already know)
|
||||
- ✅ "Enforcing mandatory checks and consistent methodology" (prevent shortcuts)
|
||||
|
||||
**Next steps:**
|
||||
|
||||
- Run GREEN phase to verify skill provides value through consistency and enforcement
|
||||
- Focus testing on: Does skill make process MORE RIGOROUS and CONSISTENT?
|
||||
- Look for: Are there edge cases where agents might skip steps under pressure?
|
||||