---
name: sre-task-refinement
description: Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.
---

<skill_overview> Review bd task plans from a Google Fellow SRE perspective to ensure a junior engineer can execute them without questions; catch edge cases, verify granularity, strengthen success criteria, and prevent production issues before implementation. </skill_overview>

<rigidity_level> LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps. </rigidity_level>

<quick_reference>

| Category | Key Questions | Auto-Reject If |
|---|---|---|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |

Perspective: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.

Time: Don't rush - catching one gap pre-implementation saves hours of rework. </quick_reference>

<when_to_use> Use when:

  • Reviewing bd epic/feature plans before implementation
  • Need to ensure junior engineer can execute without questions
  • Want to catch edge cases and failure modes upfront
  • Need to verify task granularity (4-8 hour subtasks)
  • After hyperpowers:writing-plans creates initial plan
  • Before hyperpowers:executing-plans starts implementation

Don't use when:

  • Task already being implemented (too late)
  • Just need to understand existing code (use codebase-investigator)
  • Debugging issues (use debugging-with-tools)
  • Want to create plan from scratch (use brainstorming → writing-plans) </when_to_use>

<the_process>

Announcement

Announce: "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."


Review Checklist (Apply to Every Task)

1. Task Granularity

Check:

  • No task >8 hours (subtasks) or >16 hours (phases)?
  • Large phases broken into 4-8 hour subtasks?
  • Each subtask independently completable?
  • Each subtask has clear deliverable?

If task >16 hours:

  • Create subtasks with bd create
  • Link with bd dep add child parent --type parent-child
  • Update parent to coordinator role
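In command form, that sequence looks like the sketch below (IDs are illustrative; the full worked example appears under Breaking Down Large Tasks later in this skill):

bd create "Subtask 1: [Specific Component]" --type task --priority 1 --design "..."   # returns e.g. bd-10
bd create "Subtask 2: [Another Component]" --type task --priority 1 --design "..."    # returns e.g. bd-11
bd dep add bd-10 bd-3 --type parent-child   # bd-10 is a child of oversized bd-3
bd dep add bd-11 bd-3 --type parent-child
bd update bd-3 --design "..."               # rewrite the parent as a coordinator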

2. Implementability (Junior Engineer Test)

Check:

  • Can junior engineer implement without asking questions?
  • Function signatures/behaviors described, not just "implement X"?
  • Test scenarios described (what they verify, not just names)?
  • "Done" clearly defined with verifiable criteria?
  • All file paths specified or marked "TBD: new file"?

Red flags:

  • "Implement properly" (how?)
  • "Add support" (for what exactly?)
  • "Make it work" (what does working mean?)
  • File paths missing or ambiguous

3. Success Criteria Quality

Check:

  • Each task has 3+ specific, measurable success criteria?
  • All criteria testable/verifiable (not subjective)?
  • Includes automated verification (tests pass, clippy clean)?
  • No vague criteria like "works well" or "is implemented"?

Good criteria examples:

  • "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
  • "Clippy clean with no warnings"
  • "Performance: <100ms for 1000 records"

Bad criteria examples:

  • "Code is good quality"
  • "Works correctly"
  • "Is implemented"

4. Dependency Structure

Check:

  • Parent-child relationships correct (epic → phases → subtasks)?
  • Blocking dependencies correct (earlier work blocks later)?
  • No circular dependencies?
  • Dependency graph makes logical sense?

Verify with:

bd dep tree bd-1  # Show full dependency tree

5. Safety & Quality Standards

Check:

  • Anti-patterns include unwrap/expect prohibition?
  • Anti-patterns include TODO prohibition (or must have issue #)?
  • Anti-patterns include stub implementation prohibition?
  • Error handling requirements specified (use Result, avoid panic)?
  • Test requirements specific (test names, scenarios listed)?

Minimum anti-patterns:

  • No unwrap/expect in production code
  • No TODOs without issue numbers
  • No stub implementations (unimplemented!, todo!)
  • No regex without catastrophic backtracking check
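Most of these can be verified mechanically before approving the plan; a sketch of the sweep (the first two rg patterns mirror the ones used in the encryption example later in this skill):

rg '\.unwrap\(\)|\.expect\(' src/     # should return no matches in production code
rg 'TODO' src/                        # every hit must carry an issue number
rg 'unimplemented!|todo!' src/        # no stub implementations
rg '\(\.\*\)\+|\(\.\+\)\+' src/       # crude scan for nested-quantifier regex risk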

6. Edge Cases & Failure Modes (Fellow SRE Perspective)

Ask for each task:

  • What happens with malformed input?
  • What happens with empty/nil/zero values?
  • What happens under high load/concurrency?
  • What happens when dependencies fail?
  • What happens with Unicode, special characters, large inputs?
  • Are these edge cases addressed in the plan?

Add to Key Considerations section:

  • Edge case descriptions
  • Mitigation strategies
  • References to similar code handling these cases

7. Red Flags (AUTO-REJECT)

Check for these - if found, REJECT plan:

  • Any task >16 hours without subtask breakdown
  • Vague language: "implement properly", "add support", "make it work"
  • Success criteria that can't be verified: "code is good", "works well"
  • Missing test specifications
  • "We'll handle this later" or "TODO" in the plan itself
  • No anti-patterns section
  • Implementation checklist with fewer than 3 items per task
  • No effort estimates
  • Missing error handling considerations
  • CRITICAL: Placeholder text in design field - "[detailed above]", "[as specified]", "[complete steps here]"

Review Process

For each task in the plan:

Step 1: Read the task

bd show bd-3

Step 2: Apply all 7 checklist categories

  • Task Granularity
  • Implementability
  • Success Criteria Quality
  • Dependency Structure
  • Safety & Quality Standards
  • Edge Cases & Failure Modes
  • Red Flags

Step 3: Document findings

Take notes:

  • What's done well
  • What's missing
  • What's vague or ambiguous
  • Hidden failure modes not addressed
  • Better approaches or simplifications

Step 4: Update the task

Use bd update to add missing information:

bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]

## Effort Estimate
[Updated estimate if needed]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria

## Implementation Checklist
[Complete checklist with file paths]

## Key Considerations (ADDED BY SRE REVIEW)

**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

**Reference Implementation**
- Study src/similar/module.rs for pattern to follow

## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"

IMPORTANT: Use --design for the full detailed description, NOT --description (which sets only the title).
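A minimal contrast of the two flags (the design file name is illustrative):

bd update bd-3 --description "Implement VIN scanner"    # sets only the one-line title
bd update bd-3 --design "$(cat refined-design.md)"      # carries the full multi-section body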

Step 5: Verify no placeholder text (MANDATORY)

After updating, read back with bd show bd-N and verify:

  • All sections contain actual content, not meta-references
  • No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
  • Implementation steps fully written with actual code examples
  • Success criteria explicit, not referencing "criteria above"
  • If ANY placeholder text found: REJECT and rewrite with actual content
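A quick mechanical sweep catches the most common placeholders before the manual read-through (a sketch; the bracketed phrases are the ones this skill flags):

bd show bd-3 | grep -inE '\[(detailed above|as specified|will be added|complete steps here)\]' \
  && echo "REJECT: placeholder text found" \
  || echo "No known placeholder phrases; still read the full design"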

Breaking Down Large Tasks

If task >16 hours, create subtasks:

# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"

Output Format

After reviewing all tasks:

## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added via `bd update`]

---

[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations

[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]

</the_process>

Developer reviews task but skips edge case analysis (Category 6)

# Review of bd-3: Implement VIN scanner

Checklist review:

  1. Granularity: ✅ 6-8 hours
  2. Implementability: ✅ Junior can implement
  3. Success Criteria: ✅ Has 5 test scenarios
  4. Dependencies: ✅ Correct
  5. Safety Standards: ✅ Anti-patterns present
  6. Edge Cases: [SKIPPED - "looks straightforward"]
  7. Red Flags: ✅ None found

Conclusion: "Task looks good, approve ✅"

Task ships without edge case review

Production issues occur:

  • VIN scanner matches random 17-char strings (no checksum validation)
  • Lowercase VINs not handled (should normalize)
  • Catastrophic regex backtracking on long inputs (DoS vulnerability)

<why_it_fails>

  • Skipped Category 6 (Edge Cases) assuming task was "straightforward"
  • Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
  • Missed critical production issues:
    • False positives (no checksum validation)
    • Data handling bugs (case sensitivity)
    • Security vulnerability (regex DoS)
  • Junior engineer didn't know to handle these (not in task)
  • Production incidents occur after deployment
  • Hours of emergency fixes, customer impact
  • SRE review failed to prevent known failure modes </why_it_fails>
**Apply Category 6 rigorously:**
## Edge Case Analysis for bd-3: VIN Scanner

Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking

Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)

Update task:

bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]

## Key Considerations (ADDED BY SRE REVIEW)

**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives

**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"

**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html

**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous chars I/O not valid in VIN (should reject)
- Very long input (should not DoS)
EOF
)"

What you gain:

  • Prevented false positives (checksum validation)
  • Prevented data handling bugs (case normalization)
  • Prevented security vulnerability (regex DoS)
  • Junior engineer has complete requirements
  • Production issues caught pre-implementation
  • Proper SRE review preventing known failure modes
  • Customer trust maintained
Developer approves task with placeholder text (Red Flag #10)

# Review of bd-5: Implement License Plate Scanner

bd show bd-5:

Implementation Checklist

  • Create scanner module
  • [Complete implementation steps detailed above]
  • Add tests

Success Criteria

  • [As specified in the implementation checklist]
  • Tests pass

Key Considerations

  • [Will be added during implementation]

Developer's review:

"Looks comprehensive, has implementation checklist and success criteria ✅"

During implementation:

Junior engineer: "What are the 'implementation steps detailed above'?" Junior engineer: "What specific success criteria should I verify?" Junior engineer: "What key considerations exist?"

No answers in the task - junior engineer blocked

Have to research and add missing information

Implementation delayed by 2 days

<why_it_fails>

  • Missed Red Flag #10: Placeholder text present
  • "[Complete implementation steps detailed above]" is meta-reference, not content
  • "[As specified in the implementation checklist]" is circular reference
  • "[Will be added during implementation]" is deferral, not specification
  • Junior engineer can't execute - missing critical information
  • Task looks complete but actually incomplete
  • Implementation blocked until details added
  • SRE review failed to catch placeholder text </why_it_fails>
**Check for placeholder text after reading:**
## Red Flag Check (Category 7)

Read through bd-5 line by line:

Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content

Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria

Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations

DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation

Update task with actual content:

bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
  - Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes

## Key Considerations

**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context

**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration

**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk

**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"

Verify no placeholder text:

bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation

What you gain:

  • Junior engineer has complete specification
  • No blocked implementation waiting for details
  • All edge cases documented upfront
  • Success criteria explicit and verifiable
  • Key considerations prevent common mistakes
  • No placeholder text - task truly ready
  • Professional SRE review standard maintained
Developer accepts vague success criteria (Category 3)

# Review of bd-7: Implement Data Encryption

bd show bd-7:

Success Criteria

  • Encryption is implemented correctly
  • Code is good quality
  • Tests work properly

Developer's review:

"Has 3 success criteria ✅ Meets minimum requirement"

During implementation:

Junior engineer: "How do I know if encryption is 'correct'?" Junior engineer: "What makes code 'good quality'?" Junior engineer: "What does 'tests work properly' mean?"

Junior engineer makes best guesses:

  • Uses ECB mode (insecure, should use GCM)
  • No key rotation (bad practice)
  • Tests only happy path (misses edge cases)

Code review finds critical security issues

Complete rewrite required - 3 days wasted

<why_it_fails>

  • Category 3 check: "All criteria testable/verifiable?" FAILED
  • "Implemented correctly" - not measurable (correct by what standard?)
  • "Good quality" - subjective, not verifiable
  • "Work properly" - vague, what is proper?
  • Junior engineer can't verify criteria objectively
  • Makes incorrect assumptions filling gaps
  • Security vulnerabilities introduced
  • Wastes time on implementation that fails review
  • SRE review failed to strengthen criteria </why_it_fails>
**Apply Category 3 rigorously:**
## Success Criteria Analysis for bd-7

Current criteria:
- [ ] Encryption is implemented correctly
  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified

- [ ] Code is good quality
  ❌ NOT TESTABLE - "good quality" is opinion, not measurable

- [ ] Tests work properly
  ❌ NOT TESTABLE - "properly" is vague, no definition

Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌

Update with measurable criteria:

bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]

## Success Criteria

**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption

**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0

**Test Coverage**:
- [ ] 12+ unit tests pass covering:
  - test_encrypt_decrypt_roundtrip (happy path)
  - test_wrong_key_fails_auth (security)
  - test_modified_ciphertext_fails_auth (security)
  - test_empty_plaintext (edge case)
  - test_large_plaintext_10mb (performance)
  - test_unicode_plaintext (data handling)
  - test_concurrent_encryption (thread safety)
  - test_iv_uniqueness (security)
  - [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`

**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)

**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"

What you gain:

  • Every criterion objectively verifiable
  • Junior engineer knows exactly what "done" means
  • Automated checks (clippy, fmt, grep) provide instant feedback
  • Specific test scenarios prevent missed edge cases
  • Security requirements explicit (GCM, PBKDF2, unique IV)
  • No ambiguity - can verify each criterion with command or code review
  • Professional SRE review standard: measurable, testable, specific

<critical_rules>

Rules That Have No Exceptions

  1. Apply all 7 categories to every task → No skipping any category for any task
  2. Reject plans with placeholder text → "[detailed above]", "[as specified]" = instant reject
  3. Verify no placeholder after updates → Read back with bd show and confirm actual content
  4. Break tasks >16 hours → Create subtasks, don't accept large tasks
  5. Strengthen vague criteria → "Works correctly" → measurable verification commands
  6. Add edge cases to every task → Empty? Unicode? Concurrency? Failures?
  7. Never skip Category 6 → Edge case analysis prevents production issues

Common Excuses

All of these mean: STOP. Apply the full process.

  • "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
  • "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
  • "Placeholder text is just formatting" (Placeholders mean incomplete specification)
  • "Can handle edge cases during implementation" (Must specify upfront, not defer)
  • "Junior will figure it out" (Junior should NOT need to figure out - we specify)
  • "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
  • "Taking too long to review" (One gap caught saves hours of rework) </critical_rules>

<verification_checklist> Before completing SRE review:

Per task reviewed:

  • Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
  • Checked for placeholder text in design field
  • Updated task with missing information via bd update --design
  • Verified updated task with bd show (no placeholders remain)
  • Broke down any task >16 hours into subtasks
  • Strengthened vague success criteria to measurable
  • Added edge case analysis to Key Considerations
  • Strengthened anti-patterns based on failure modes

Overall plan:

  • Reviewed ALL tasks/phases/subtasks (no exceptions)
  • Verified dependency structure with bd dep tree
  • Documented findings for each task
  • Created summary of changes made
  • Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)

Can't check all boxes? Return to review process and complete missing steps. </verification_checklist>

This skill is used after:

  • hyperpowers:writing-plans (creates initial plan)
  • hyperpowers:brainstorming (establishes requirements)

This skill is used before:

  • hyperpowers:executing-plans (implements tasks)

This skill is also called by:

  • hyperpowers:executing-plans (REQUIRED for new tasks created during execution)

Call chains:

Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
                                                    ↓
                                            (if gaps: revise and re-review)

During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint

This skill uses:

  • bd commands (show, update, create, dep add, dep tree)
  • Google Fellow SRE perspective (20+ years of distributed systems experience)
  • 7-category checklist (mandatory for every task)

Time expectations:

  • Small epic (3-5 tasks): 15-20 minutes
  • Medium epic (6-10 tasks): 25-40 minutes
  • Large epic (10+ tasks): 45-60 minutes

Don't rush: Catching one critical gap pre-implementation saves hours of rework.

Review patterns:

  • Task too large (>16h) → Break into 4-8h subtasks
  • Vague criteria ("works correctly") → Measurable commands/checks
  • Missing edge cases → Add to Key Considerations with mitigations
  • Placeholder text → Rewrite with actual content

When stuck:

  • Unsure if task too large → Ask: Can junior complete in one day?
  • Unsure if criteria measurable → Ask: Can I verify with command/code review?
  • Unsure if edge case matters → Ask: Could this fail in production?
  • Unsure if placeholder → Ask: Does this reference other content instead of providing content?

Key principle: Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.