gh-withzombies-hyperpowers/skills/sre-task-refinement/SKILL.md

---
name: sre-task-refinement
description: Use when you have to refine subtasks into actionable plans ensuring that all corner cases are handled and we understand all the requirements.
---

<skill_overview>
Review bd task plans with Google Fellow SRE perspective to ensure junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, prevent production issues before implementation.
</skill_overview>

<rigidity_level>
LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.
</rigidity_level>

<quick_reference>
| Category | Key Questions | Auto-Reject If |
|----------|---------------|----------------|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |

**Perspective**: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.

**Time**: Don't rush - catching one gap pre-implementation saves hours of rework.
</quick_reference>

<when_to_use>
Use when:
- Reviewing bd epic/feature plans before implementation
- Need to ensure junior engineer can execute without questions
- Want to catch edge cases and failure modes upfront
- Need to verify task granularity (4-8 hour subtasks)
- After hyperpowers:writing-plans creates initial plan
- Before hyperpowers:executing-plans starts implementation

Don't use when:
- Task already being implemented (too late)
- Just need to understand existing code (use codebase-investigator)
- Debugging issues (use debugging-with-tools)
- Want to create plan from scratch (use brainstorming → writing-plans)
</when_to_use>

<the_process>
## Announcement

**Announce:** "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."

---

## Review Checklist (Apply to Every Task)

### 1. Task Granularity

**Check:**
- [ ] No task >8 hours (subtasks) or >16 hours (phases)?
- [ ] Large phases broken into 4-8 hour subtasks?
- [ ] Each subtask independently completable?
- [ ] Each subtask has clear deliverable?

**If task >16 hours:**
- Create subtasks with `bd create`
- Link with `bd dep add child parent --type parent-child`
- Update parent to coordinator role

---

### 2. Implementability (Junior Engineer Test)

**Check:**
- [ ] Can junior engineer implement without asking questions?
- [ ] Function signatures/behaviors described, not just "implement X"?
- [ ] Test scenarios described (what they verify, not just names)?
- [ ] "Done" clearly defined with verifiable criteria?
- [ ] All file paths specified or marked "TBD: new file"?

**Red flags:**
- "Implement properly" (how?)
- "Add support" (for what exactly?)
- "Make it work" (what does working mean?)
- File paths missing or ambiguous

---

### 3. Success Criteria Quality

**Check:**
- [ ] Each task has 3+ specific, measurable success criteria?
- [ ] All criteria testable/verifiable (not subjective)?
- [ ] Includes automated verification (tests pass, clippy clean)?
- [ ] No vague criteria like "works well" or "is implemented"?

**Good criteria examples:**
- ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
- ✅ "Clippy clean with no warnings"
- ✅ "Performance: <100ms for 1000 records"

**Bad criteria examples:**
- ❌ "Code is good quality"
- ❌ "Works correctly"
- ❌ "Is implemented"

---

### 4. Dependency Structure

**Check:**
- [ ] Parent-child relationships correct (epic → phases → subtasks)?
- [ ] Blocking dependencies correct (earlier work blocks later)?
- [ ] No circular dependencies?
- [ ] Dependency graph makes logical sense?

**Verify with:**
```bash
bd dep tree bd-1  # Show full dependency tree
```

---

### 5. Safety & Quality Standards

**Check:**
- [ ] Anti-patterns include unwrap/expect prohibition?
- [ ] Anti-patterns include TODO prohibition (or must have issue #)?
- [ ] Anti-patterns include stub implementation prohibition?
- [ ] Error handling requirements specified (use Result, avoid panic)?
- [ ] Test requirements specific (test names, scenarios listed)?

**Minimum anti-patterns:**
- ❌ No unwrap/expect in production code
- ❌ No TODOs without issue numbers
- ❌ No stub implementations (unimplemented!, todo!)
- ❌ No regex without catastrophic backtracking check

---

### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)

**Ask for each task:**
- [ ] What happens with malformed input?
- [ ] What happens with empty/nil/zero values?
- [ ] What happens under high load/concurrency?
- [ ] What happens when dependencies fail?
- [ ] What happens with Unicode, special characters, large inputs?
- [ ] Are these edge cases addressed in the plan?

**Add to Key Considerations section:**
- Edge case descriptions
- Mitigation strategies
- References to similar code handling these cases

---

### 7. Red Flags (AUTO-REJECT)

**Check for these - if found, REJECT plan:**
- ❌ Any task >16 hours without subtask breakdown
- ❌ Vague language: "implement properly", "add support", "make it work"
- ❌ Success criteria that can't be verified: "code is good", "works well"
- ❌ Missing test specifications
- ❌ "We'll handle this later" or "TODO" in the plan itself
- ❌ No anti-patterns section
- ❌ Implementation checklist with fewer than 3 items per task
- ❌ No effort estimates
- ❌ Missing error handling considerations
- ❌ **CRITICAL: Placeholder text in design field** - "[detailed above]", "[as specified]", "[complete steps here]"

---

## Review Process

For each task in the plan:

**Step 1: Read the task**
```bash
bd show bd-3
```

**Step 2: Apply all 7 checklist categories**
- Task Granularity
- Implementability
- Success Criteria Quality
- Dependency Structure
- Safety & Quality Standards
- Edge Cases & Failure Modes
- Red Flags

**Step 3: Document findings**
Take notes:
- What's done well
- What's missing
- What's vague or ambiguous
- Hidden failure modes not addressed
- Better approaches or simplifications

**Step 4: Update the task**

Use `bd update` to add missing information:

```bash
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]

## Effort Estimate
[Updated estimate if needed]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria

## Implementation Checklist
[Complete checklist with file paths]

## Key Considerations (ADDED BY SRE REVIEW)

**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

**Reference Implementation**
- Study src/similar/module.rs for pattern to follow

## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"
```

**IMPORTANT:** Use `--design` for full detailed description, NOT `--description` (title only).

**Step 5: Verify no placeholder text (MANDATORY)**

After updating, read back with `bd show bd-N` and verify:
- ✅ All sections contain actual content, not meta-references
- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
- ✅ Implementation steps fully written with actual code examples
- ✅ Success criteria explicit, not referencing "criteria above"
- ❌ If ANY placeholder text found: REJECT and rewrite with actual content

---

## Breaking Down Large Tasks

If task >16 hours, create subtasks:

```bash
# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"
```

---

## Output Format

After reviewing all tasks:

```markdown
## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added via `bd update`]

---

[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations

[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]
```
</the_process>

<examples>
<example>
<scenario>Developer reviews task but skips edge case analysis (Category 6)</scenario>

<code>
# Review of bd-3: Implement VIN scanner

## Checklist review:
1. Granularity: ✅ 6-8 hours
2. Implementability: ✅ Junior can implement
3. Success Criteria: ✅ Has 5 test scenarios
4. Dependencies: ✅ Correct
5. Safety Standards: ✅ Anti-patterns present
6. Edge Cases: [SKIPPED - "looks straightforward"]
7. Red Flags: ✅ None found

Conclusion: "Task looks good, approve ✅"

# Task ships without edge case review
# Production issues occur:
- VIN scanner matches random 17-char strings (no checksum validation)
- Lowercase VINs not handled (should normalize)
- Catastrophic regex backtracking on long inputs (DoS vulnerability)
</code>

<why_it_fails>
- Skipped Category 6 (Edge Cases) assuming task was "straightforward"
- Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
- Missed critical production issues:
  - False positives (no checksum validation)
  - Data handling bugs (case sensitivity)
  - Security vulnerability (regex DoS)
- Junior engineer didn't know to handle these (not in task)
- Production incidents occur after deployment
- Hours of emergency fixes, customer impact
- SRE review failed to prevent known failure modes
</why_it_fails>

<correction>
**Apply Category 6 rigorously:**

```markdown
## Edge Case Analysis for bd-3: VIN Scanner

Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking

Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)
```

**Update task:**
```bash
bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]

## Key Considerations (ADDED BY SRE REVIEW)

**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives

**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"

**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html

**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous chars I/O not valid in VIN (should reject)
- Very long input (should not DoS)
EOF
)"
```

**What you gain:**
- Prevented false positives (checksum validation)
- Prevented data handling bugs (case normalization)
- Prevented security vulnerability (regex DoS)
- Junior engineer has complete requirements
- Production issues caught pre-implementation
- Proper SRE review preventing known failure modes
- Customer trust maintained
</correction>
</example>

<example>
<scenario>Developer approves task with placeholder text (Red Flag #10)</scenario>

<code>
# Review of bd-5: Implement License Plate Scanner

bd show bd-5:

## Implementation Checklist
- [ ] Create scanner module
- [ ] [Complete implementation steps detailed above]
- [ ] Add tests

## Success Criteria
- [ ] [As specified in the implementation checklist]
- [ ] Tests pass

## Key Considerations
- [Will be added during implementation]

# Developer's review:
"Looks comprehensive, has implementation checklist and success criteria ✅"

# During implementation:
Junior engineer: "What are the 'implementation steps detailed above'?"
Junior engineer: "What specific success criteria should I verify?"
Junior engineer: "What key considerations exist?"

# No answers in the task - junior engineer blocked
# Have to research and add missing information
# Implementation delayed by 2 days
</code>

<why_it_fails>
- Missed Red Flag #10: Placeholder text present
- "[Complete implementation steps detailed above]" is meta-reference, not content
- "[As specified in the implementation checklist]" is circular reference
- "[Will be added during implementation]" is deferral, not specification
- Junior engineer can't execute - missing critical information
- Task looks complete but actually incomplete
- Implementation blocked until details added
- SRE review failed to catch placeholder text
</why_it_fails>

<correction>
**Check for placeholder text after reading:**

```markdown
## Red Flag Check (Category 7)

Read through bd-5 line by line:

Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content

Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria

Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations

DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation
```

**Update task with actual content:**
```bash
bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
  - Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes

## Key Considerations

**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context

**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration

**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk

**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"
```

**Verify no placeholder text:**
```bash
bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
```

**What you gain:**
- Junior engineer has complete specification
- No blocked implementation waiting for details
- All edge cases documented upfront
- Success criteria explicit and verifiable
- Key considerations prevent common mistakes
- No placeholder text - task truly ready
- Professional SRE review standard maintained
</correction>
</example>

<example>
<scenario>Developer accepts vague success criteria (Category 3)</scenario>

<code>
# Review of bd-7: Implement Data Encryption

bd show bd-7:

## Success Criteria
- [ ] Encryption is implemented correctly
- [ ] Code is good quality
- [ ] Tests work properly

# Developer's review:
"Has 3 success criteria ✅ Meets minimum requirement"

# During implementation:
Junior engineer: "How do I know if encryption is 'correct'?"
Junior engineer: "What makes code 'good quality'?"
Junior engineer: "What does 'tests work properly' mean?"

# Junior engineer makes best guesses:
- Uses ECB mode (insecure, should use GCM)
- No key rotation (bad practice)
- Tests only happy path (misses edge cases)

# Code review finds critical security issues
# Complete rewrite required - 3 days wasted
</code>

<why_it_fails>
- Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
- "Implemented correctly" - not measurable (correct by what standard?)
- "Good quality" - subjective, not verifiable
- "Work properly" - vague, what is proper?
- Junior engineer can't verify criteria objectively
- Makes incorrect assumptions filling gaps
- Security vulnerabilities introduced
- Wastes time on implementation that fails review
- SRE review failed to strengthen criteria
</why_it_fails>

<correction>
**Apply Category 3 rigorously:**

```markdown
## Success Criteria Analysis for bd-7

Current criteria:
- [ ] Encryption is implemented correctly
  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified

- [ ] Code is good quality
  ❌ NOT TESTABLE - "good quality" is opinion, not measurable

- [ ] Tests work properly
  ❌ NOT TESTABLE - "properly" is vague, no definition

Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌
```

**Update with measurable criteria:**
```bash
bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]

## Success Criteria

**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption

**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0

**Test Coverage**:
- [ ] 12+ unit tests pass covering:
  - test_encrypt_decrypt_roundtrip (happy path)
  - test_wrong_key_fails_auth (security)
  - test_modified_ciphertext_fails_auth (security)
  - test_empty_plaintext (edge case)
  - test_large_plaintext_10mb (performance)
  - test_unicode_plaintext (data handling)
  - test_concurrent_encryption (thread safety)
  - test_iv_uniqueness (security)
  - [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`

**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)

**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"
```

**What you gain:**
- Every criterion objectively verifiable
- Junior engineer knows exactly what "done" means
- Automated checks (clippy, fmt, grep) provide instant feedback
- Specific test scenarios prevent missed edge cases
- Security requirements explicit (GCM, PBKDF2, unique IV)
- No ambiguity - can verify each criterion with command or code review
- Professional SRE review standard: measurable, testable, specific
</correction>
</example>
</examples>

<critical_rules>
## Rules That Have No Exceptions

1. **Apply all 7 categories to every task** → No skipping any category for any task
2. **Reject plans with placeholder text** → "[detailed above]", "[as specified]" = instant reject
3. **Verify no placeholder after updates** → Read back with `bd show` and confirm actual content
4. **Break tasks >16 hours** → Create subtasks, don't accept large tasks
5. **Strengthen vague criteria** → "Works correctly" → measurable verification commands
6. **Add edge cases to every task** → Empty? Unicode? Concurrency? Failures?
7. **Never skip Category 6** → Edge case analysis prevents production issues

## Common Excuses

All of these mean: **STOP. Apply the full process.**

- "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
- "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
- "Placeholder text is just formatting" (Placeholders mean incomplete specification)
- "Can handle edge cases during implementation" (Must specify upfront, not defer)
- "Junior will figure it out" (Junior should NOT need to figure out - we specify)
- "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
- "Taking too long to review" (One gap caught saves hours of rework)
</critical_rules>

<verification_checklist>
Before completing SRE review:

**Per task reviewed:**
- [ ] Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
- [ ] Checked for placeholder text in design field
- [ ] Updated task with missing information via `bd update --design`
- [ ] Verified updated task with `bd show` (no placeholders remain)
- [ ] Broke down any task >16 hours into subtasks
- [ ] Strengthened vague success criteria to measurable
- [ ] Added edge case analysis to Key Considerations
- [ ] Strengthened anti-patterns based on failure modes

**Overall plan:**
- [ ] Reviewed ALL tasks/phases/subtasks (no exceptions)
- [ ] Verified dependency structure with `bd dep tree`
- [ ] Documented findings for each task
- [ ] Created summary of changes made
- [ ] Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)

**Can't check all boxes?** Return to review process and complete missing steps.
</verification_checklist>

<integration>
**This skill is used after:**
- hyperpowers:writing-plans (creates initial plan)
- hyperpowers:brainstorming (establishes requirements)

**This skill is used before:**
- hyperpowers:executing-plans (implements tasks)

**This skill is also called by:**
- hyperpowers:executing-plans (REQUIRED for new tasks created during execution)

**Call chains:**
```
Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
                                                    ↓
                                            (if gaps: revise and re-review)

During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
```

**This skill uses:**
- bd commands (show, update, create, dep add, dep tree)
- Google Fellow SRE perspective (20+ years distributed systems)
- 7-category checklist (mandatory for every task)

**Time expectations:**
- Small epic (3-5 tasks): 15-20 minutes
- Medium epic (6-10 tasks): 25-40 minutes
- Large epic (10+ tasks): 45-60 minutes

**Don't rush:** Catching one critical gap pre-implementation saves hours of rework.
</integration>

<resources>
**Review patterns:**
- Task too large (>16h) → Break into 4-8h subtasks
- Vague criteria ("works correctly") → Measurable commands/checks
- Missing edge cases → Add to Key Considerations with mitigations
- Placeholder text → Rewrite with actual content

**When stuck:**
- Unsure if task too large → Ask: Can junior complete in one day?
- Unsure if criteria measurable → Ask: Can I verify with command/code review?
- Unsure if edge case matters → Ask: Could this fail in production?
- Unsure if placeholder → Ask: Does this reference other content instead of providing content?

**Key principle:** Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.
</resources>