Initial commit

2025-11-30 09:06:38 +08:00
commit ed3e4c84c3
76 changed files with 20449 additions and 0 deletions
--- a/skills/sre-task-refinement/SKILL.md
+++ b/skills/sre-task-refinement/SKILL.md
@@ -0,0 +1,826 @@
+---
+name: sre-task-refinement
+description: Use when refining subtasks into actionable plans, ensuring all corner cases are handled and all requirements are understood.
+---
+
+<skill_overview>
+Review bd task plans from a Google Fellow SRE perspective so that a junior engineer can execute them without questions: catch edge cases, verify granularity, strengthen criteria, and prevent production issues before implementation.
+</skill_overview>
+
+<rigidity_level>
+LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.
+</rigidity_level>
+
+<quick_reference>
+| Category | Key Questions | Auto-Reject If |
+|----------|---------------|----------------|
+| 1. Granularity | Tasks 4-8 hours? Phases ≤16 hours? | Any task >16h without breakdown |
+| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
+| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
+| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
+| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
+| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
+| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |
+
+**Perspective**: Google Fellow SRE with 20+ years of experience reviewing junior engineer designs.
+
+**Time**: Don't rush - catching one gap pre-implementation saves hours of rework.
+</quick_reference>
+
+<when_to_use>
+Use when:
+- Reviewing bd epic/feature plans before implementation
+- Need to ensure junior engineer can execute without questions
+- Want to catch edge cases and failure modes upfront
+- Need to verify task granularity (4-8 hour subtasks)
+- After hyperpowers:writing-plans creates initial plan
+- Before hyperpowers:executing-plans starts implementation
+
+Don't use when:
+- Task already being implemented (too late)
+- Just need to understand existing code (use codebase-investigator)
+- Debugging issues (use debugging-with-tools)
+- Want to create plan from scratch (use brainstorming → writing-plans)
+</when_to_use>
+
+<the_process>
+## Announcement
+
+**Announce:** "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."
+
+---
+
+## Review Checklist (Apply to Every Task)
+
+### 1. Task Granularity
+
+**Check:**
+- [ ] No task >8 hours (subtasks) or >16 hours (phases)?
+- [ ] Large phases broken into 4-8 hour subtasks?
+- [ ] Each subtask independently completable?
+- [ ] Each subtask has clear deliverable?
+
+**If task >16 hours:**
+- Create subtasks with `bd create`
+- Link with `bd dep add child parent --type parent-child`
+- Update parent to coordinator role
+
+---
+
+### 2. Implementability (Junior Engineer Test)
+
+**Check:**
+- [ ] Can junior engineer implement without asking questions?
+- [ ] Function signatures/behaviors described, not just "implement X"?
+- [ ] Test scenarios described (what they verify, not just names)?
+- [ ] "Done" clearly defined with verifiable criteria?
+- [ ] All file paths specified or marked "TBD: new file"?
+
+**Red flags:**
+- "Implement properly" (how?)
+- "Add support" (for what exactly?)
+- "Make it work" (what does working mean?)
+- File paths missing or ambiguous
+
+---
+
+### 3. Success Criteria Quality
+
+**Check:**
+- [ ] Each task has 3+ specific, measurable success criteria?
+- [ ] All criteria testable/verifiable (not subjective)?
+- [ ] Includes automated verification (tests pass, clippy clean)?
+- [ ] No vague criteria like "works well" or "is implemented"?
+
+**Good criteria examples:**
+- ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
+- ✅ "Clippy clean with no warnings"
+- ✅ "Performance: <100ms for 1000 records"
+
+**Bad criteria examples:**
+- ❌ "Code is good quality"
+- ❌ "Works correctly"
+- ❌ "Is implemented"
+
+---
+
+### 4. Dependency Structure
+
+**Check:**
+- [ ] Parent-child relationships correct (epic → phases → subtasks)?
+- [ ] Blocking dependencies correct (earlier work blocks later)?
+- [ ] No circular dependencies?
+- [ ] Dependency graph makes logical sense?
+
+**Verify with:**
+```bash
+bd dep tree bd-1  # Show full dependency tree
+```
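+
+**Optional cycle check:** `bd dep tree` shows the structure but leaves cycle spotting to the reader. A minimal sketch, assuming the blocking edges have been transcribed by hand into `prerequisite dependent` pairs (this edge-list format and the helper name `has_cycle` are illustrative assumptions, not bd output or bd commands), is to feed them to the standard `tsort` utility, which reports loops on stderr:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: reads "prerequisite dependent" pairs on stdin, one edge per line.
+# Succeeds (exit 0) when tsort complains about a loop, fails otherwise.
+has_cycle() {
+  local err
+  # tsort writes the sorted order to stdout and loop warnings to stderr;
+  # keep only stderr.
+  err=$(tsort 2>&1 >/dev/null)
+  [ -n "$err" ]
+}
+
+# bd-10 before bd-11, bd-11 before bd-12: prints "acyclic"
+printf '%s\n' 'bd-10 bd-11' 'bd-11 bd-12' | has_cycle && echo "cycle" || echo "acyclic"
+
+# Adding bd-12 before bd-10 closes the loop: prints "cycle"
+printf '%s\n' 'bd-10 bd-11' 'bd-11 bd-12' 'bd-12 bd-10' | has_cycle && echo "cycle" || echo "acyclic"
+```
+
+This only covers the edges you typed in, so it supplements the manual review of the tree rather than replacing it.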
+
+---
+
+### 5. Safety & Quality Standards
+
+**Check:**
+- [ ] Anti-patterns include unwrap/expect prohibition?
+- [ ] Anti-patterns include TODO prohibition (or must have issue #)?
+- [ ] Anti-patterns include stub implementation prohibition?
+- [ ] Error handling requirements specified (use Result, avoid panic)?
+- [ ] Test requirements specific (test names, scenarios listed)?
+
+**Minimum anti-patterns:**
+- ❌ No unwrap/expect in production code
+- ❌ No TODOs without issue numbers
+- ❌ No stub implementations (unimplemented!, todo!)
+- ❌ No regex without catastrophic backtracking check
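+
+Once code exists, the first three minimums can be spot-checked mechanically. The sketch below is an illustration under assumptions: the helper name `scan_antipatterns` and its patterns are hypothetical, and it uses plain `grep` so it runs without extra tools. It also reports hits inside test modules, so treat its output as candidates to review, not verdicts.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: list anti-pattern candidates under a source directory (default: src).
+scan_antipatterns() {
+  local dir=${1:-src}
+  # unwrap/expect calls and stub macros
+  grep -rn -E '\.unwrap\(\)|\.expect\(|unimplemented!|todo!' "$dir"
+  # TODO comments that do not reference an issue number such as #123
+  grep -rn 'TODO' "$dir" | grep -v -E '#[0-9]+'
+  return 0
+}
+
+# Usage: scan_antipatterns src
+```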
+
+---
+
+### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)
+
+**Ask for each task:**
+- [ ] What happens with malformed input?
+- [ ] What happens with empty/nil/zero values?
+- [ ] What happens under high load/concurrency?
+- [ ] What happens when dependencies fail?
+- [ ] What happens with Unicode, special characters, large inputs?
+- [ ] Are these edge cases addressed in the plan?
+
+**Add to Key Considerations section:**
+- Edge case descriptions
+- Mitigation strategies
+- References to similar code handling these cases
+
+---
+
+### 7. Red Flags (AUTO-REJECT)
+
+**Check for these - if found, REJECT plan:**
+- ❌ Any task >16 hours without subtask breakdown
+- ❌ Vague language: "implement properly", "add support", "make it work"
+- ❌ Success criteria that can't be verified: "code is good", "works well"
+- ❌ Missing test specifications
+- ❌ "We'll handle this later" or "TODO" in the plan itself
+- ❌ No anti-patterns section
+- ❌ Implementation checklist with fewer than 3 items per task
+- ❌ No effort estimates
+- ❌ Missing error handling considerations
+- ❌ **CRITICAL: Placeholder text in design field** - "[detailed above]", "[as specified]", "[complete steps here]"
+
+---
+
+## Review Process
+
+For each task in the plan:
+
+**Step 1: Read the task**
+```bash
+bd show bd-3
+```
+
+**Step 2: Apply all 7 checklist categories**
+- Task Granularity
+- Implementability
+- Success Criteria Quality
+- Dependency Structure
+- Safety & Quality Standards
+- Edge Cases & Failure Modes
+- Red Flags
+
+**Step 3: Document findings**
+Take notes:
+- What's done well
+- What's missing
+- What's vague or ambiguous
+- Hidden failure modes not addressed
+- Better approaches or simplifications
+
+**Step 4: Update the task**
+
+Use `bd update` to add missing information:
+
+```bash
+bd update bd-3 --design "$(cat <<'EOF'
+## Goal
+[Original goal, preserved]
+
+## Effort Estimate
+[Updated estimate if needed]
+
+## Success Criteria
+- [ ] Existing criteria
+- [ ] NEW: Added missing measurable criteria
+
+## Implementation Checklist
+[Complete checklist with file paths]
+
+## Key Considerations (ADDED BY SRE REVIEW)
+
+**Edge Case: Empty Input**
+- What happens when input is empty string?
+- MUST validate input length before processing
+
+**Edge Case: Unicode Handling**
+- What if string contains RTL or surrogate pairs?
+- Use proper Unicode-aware string methods
+
+**Performance Concern: Regex Backtracking**
+- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
+- MUST test with pathological inputs (e.g., 10000 'a's)
+- Use bounded repetition, or possessive quantifiers where the regex engine supports them (Rust's `regex` crate does not)
+
+**Reference Implementation**
+- Study src/similar/module.rs for pattern to follow
+
+## Anti-patterns
+[Original anti-patterns]
+- ❌ NEW: Specific anti-pattern for this task's risks
+EOF
+)"
+```
+
+**IMPORTANT:** Use `--design` for full detailed description, NOT `--description` (title only).
+
+**Step 5: Verify no placeholder text (MANDATORY)**
+
+After updating, read back with `bd show bd-N` and verify:
+- ✅ All sections contain actual content, not meta-references
+- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
+- ✅ Implementation steps fully written with actual code examples
+- ✅ Success criteria explicit, not referencing "criteria above"
+- ❌ If ANY placeholder text found: REJECT and rewrite with actual content
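+
+The read-back can be backed by a mechanical scan. A minimal sketch, assuming the design text is piped in on stdin; the helper name `find_placeholders` and the phrase list are illustrative assumptions, and a clean result does not replace reading the output yourself:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: prints offending lines and exits 0 if placeholder text is found,
+# exits 1 if the text is clean (grep's own convention).
+find_placeholders() {
+  # Bracketed meta-references such as "[detailed above]" or "[will be added ...]",
+  # plus deferrals such as TODO or "handle this later".
+  grep -n -i -E '\[[^]]*(detailed above|as specified|will be added|complete steps here)[^]]*\]|TODO|handle this later'
+}
+
+# Intended use against a real task: bd show bd-5 | find_placeholders
+printf '%s\n' \
+  '- [ ] Create scanner module' \
+  '- [ ] [Complete implementation steps detailed above]' \
+  '- [Will be added during implementation]' | find_placeholders \
+  && echo "REJECT: placeholder text found"
+```
+
+Empty checkboxes (`[ ]`) are not flagged because the pattern requires one of the listed phrases inside the brackets.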
+
+---
+
+## Breaking Down Large Tasks
+
+If task >16 hours, create subtasks:
+
+```bash
+# Create first subtask
+bd create "Subtask 1: [Specific Component]" \
+  --type task \
+  --priority 1 \
+  --design "[Complete subtask design with all 7 categories addressed]"
+# Returns bd-10
+
+# Create second subtask
+bd create "Subtask 2: [Another Component]" \
+  --type task \
+  --priority 1 \
+  --design "[Complete subtask design]"
+# Returns bd-11
+
+# Link subtasks to parent with parent-child relationship
+bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
+bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3
+
+# Add sequential dependencies if needed (LATER depends on EARLIER)
+bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)
+
+# Update parent to coordinator
+bd update bd-3 --design "$(cat <<'EOF'
+## Goal
+Coordinate implementation of [feature]. Broken into N subtasks.
+
+## Success Criteria
+- [ ] All N child subtasks closed
+- [ ] Integration tests pass
+- [ ] [High-level verification criteria]
+EOF
+)"
+```
+
+---
+
+## Output Format
+
+After reviewing all tasks:
+
+```markdown
+## Plan Review Results
+
+### Epic: [Name] ([epic-id])
+
+### Overall Assessment
+[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]
+
+### Dependency Structure Review
+[Output of `bd dep tree [epic-id]`]
+
+**Structure Quality**: [✅ Correct / ❌ Issues found]
+- [Comments on parent-child relationships]
+- [Comments on blocking dependencies]
+- [Comments on granularity]
+
+### Task-by-Task Review
+
+#### [Task Name] (bd-N)
+**Type**: [epic/feature/task]
+**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
+**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])
+
+**Strengths**:
+- [What's done well]
+
+**Critical Issues** (must fix):
+- [Blocking problems]
+
+**Improvements Needed**:
+- [What to add/clarify]
+
+**Edge Cases Missing**:
+- [Failure modes not addressed]
+
+**Changes Made**:
+- [Specific improvements added via `bd update`]
+
+---
+
+[Repeat for each task/phase/subtask]
+
+### Summary of Changes
+
+**Issues Updated**:
+- bd-3 - Added edge case handling for Unicode, regex backtracking risks
+- bd-5 - Broke into 5 subtasks (was 40 hours, now 5x8 hours)
+- bd-7 - Strengthened success criteria (added test names, verification commands)
+
+### Critical Gaps Across Plan
+1. [Pattern of missing items across multiple tasks]
+2. [Systemic issues in the plan]
+
+### Recommendations
+
+[If APPROVE]:
+✅ Plan is solid and ready for implementation.
+- All tasks are junior-engineer implementable
+- Dependency structure is correct
+- Edge cases and failure modes addressed
+
+[If NEEDS REVISION]:
+⚠️ Plan needs improvements before implementation:
+- [List major items that need addressing]
+- After changes, re-run hyperpowers:sre-task-refinement
+
+[If REJECT]:
+❌ Plan has fundamental issues and needs redesign:
+- [Critical problems]
+```
+</the_process>
+
+<examples>
+<example>
+<scenario>Developer reviews task but skips edge case analysis (Category 6)</scenario>
+
+<code>
+# Review of bd-3: Implement VIN scanner
+
+## Checklist review:
+1. Granularity: ✅ 6-8 hours
+2. Implementability: ✅ Junior can implement
+3. Success Criteria: ✅ Has 5 test scenarios
+4. Dependencies: ✅ Correct
+5. Safety Standards: ✅ Anti-patterns present
+6. Edge Cases: [SKIPPED - "looks straightforward"]
+7. Red Flags: ✅ None found
+
+Conclusion: "Task looks good, approve ✅"
+
+# Task ships without edge case review
+# Production issues occur:
+- VIN scanner matches random 17-char strings (no checksum validation)
+- Lowercase VINs not handled (should normalize)
+- Catastrophic regex backtracking on long inputs (DoS vulnerability)
+</code>
+
+<why_it_fails>
+- Skipped Category 6 (Edge Cases) assuming task was "straightforward"
+- Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
+- Missed critical production issues:
+  - False positives (no checksum validation)
+  - Data handling bugs (case sensitivity)
+  - Security vulnerability (regex DoS)
+- Junior engineer didn't know to handle these (not in task)
+- Production incidents occur after deployment
+- Hours of emergency fixes, customer impact
+- SRE review failed to prevent known failure modes
+</why_it_fails>
+
+<correction>
+**Apply Category 6 rigorously:**
+
+```markdown
+## Edge Case Analysis for bd-3: VIN Scanner
+
+Ask for EVERY task:
+- Malformed input? VIN has checksum - must validate, not just pattern match
+- Empty/nil? What if empty string passed?
+- Concurrency? Read-only scanner, no concurrency issues
+- Dependency failures? No external dependencies
+- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
+- Large inputs? Regex `.*` patterns can cause catastrophic backtracking
+
+Findings:
+❌ VIN checksum validation not mentioned (will match random strings)
+❌ Case normalization not mentioned (lowercase VINs exist)
+❌ Regex backtracking risk not mentioned (DoS vulnerability)
+```
+
+**Update task:**
+```bash
+bd update bd-3 --design "$(cat <<'EOF'
+[... original content ...]
+
+## Key Considerations (ADDED BY SRE REVIEW)
+
+**VIN Checksum Complexity**:
+- Check digit sits in position 9 (mandatory for North American VINs) and requires a transliteration table (letters → numbers)
+- Weighted sum algorithm with modulo 11 (remainder 10 is written as `X`)
+- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
+- MUST validate checksum, not just pattern - prevents false positives
+
+**Case Normalization**:
+- VINs can appear in lowercase
+- MUST normalize to uppercase before validation
+- Test with mixed case: "1hgbh41jxmn109186"
+
+**Regex Backtracking Risk**:
+- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
+- Test with pathological input: 10000 'X's followed by 16-char string
+- Use bounded repetition, or possessive quantifiers where the regex engine supports them (Rust's `regex` crate does not)
+- Reference: https://www.regular-expressions.info/catastrophic.html
+
+**Edge Cases to Test**:
+- Valid VIN with valid checksum (should match)
+- Valid pattern but invalid checksum (should NOT match)
+- Lowercase VIN (should normalize and validate)
+- Characters I, O and Q are not valid in a VIN (should reject)
+- Very long input (should not DoS)
+EOF
+)"
+```
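+
+The transliterate, weight, and modulo-11 steps named in the task can be sketched as follows. This is a minimal bash illustration of the commonly documented North American check-digit rule, not the scanner's implementation; the function names `vin_check_digit` and `vin_is_valid` are hypothetical.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: transliterate, weight, sum, take mod 11.
+# Prints the expected check character (0-9 or X) for a 17-character VIN;
+# returns 1 for a wrong length or a character that is not valid in a VIN.
+vin_check_digit() {
+  local weights=(8 7 6 5 4 3 2 10 0 9 8 7 6 5 4 3 2)
+  local letters="ABCDEFGHJKLMNPRSTUVWXYZ"   # I, O and Q never appear in a VIN
+  local values=(1 2 3 4 5 6 7 8 1 2 3 4 5 7 9 2 3 4 5 6 7 8 9)
+  local vin i ch v prefix sum=0
+  vin=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # case normalization
+  [ "${#vin}" -eq 17 ] || return 1
+  for ((i = 0; i < 17; i++)); do
+    ch=${vin:i:1}
+    case $ch in
+      [0-9]) v=$ch ;;
+      *)
+        [[ $letters == *"$ch"* ]] || return 1
+        prefix=${letters%%"$ch"*}        # text before $ch; its length is the table index
+        v=${values[${#prefix}]} ;;
+    esac
+    sum=$((sum + v * weights[i]))
+  done
+  if [ $((sum % 11)) -eq 10 ]; then echo X; else echo $((sum % 11)); fi
+}
+
+# A VIN is accepted only if position 9 matches the computed check character.
+vin_is_valid() {
+  local expected upper
+  expected=$(vin_check_digit "$1") || return 1
+  upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
+  [ "${upper:8:1}" = "$expected" ]
+}
+
+vin_is_valid "1hgbh41jxmn109186" && echo "valid"   # prints: valid
+```
+
+The mixed-case test VIN from the task passes here because the weighted sum is 340 and 340 mod 11 is 10, which is written as `X` in position 9.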
+
+**What you gain:**
+- Prevented false positives (checksum validation)
+- Prevented data handling bugs (case normalization)
+- Prevented security vulnerability (regex DoS)
+- Junior engineer has complete requirements
+- Production issues caught pre-implementation
+- Proper SRE review preventing known failure modes
+- Customer trust maintained
+</correction>
+</example>
+
+<example>
+<scenario>Developer approves task with placeholder text (Red Flag #10)</scenario>
+
+<code>
+# Review of bd-5: Implement License Plate Scanner
+
+bd show bd-5:
+
+## Implementation Checklist
+- [ ] Create scanner module
+- [ ] [Complete implementation steps detailed above]
+- [ ] Add tests
+
+## Success Criteria
+- [ ] [As specified in the implementation checklist]
+- [ ] Tests pass
+
+## Key Considerations
+- [Will be added during implementation]
+
+# Developer's review:
+"Looks comprehensive, has implementation checklist and success criteria ✅"
+
+# During implementation:
+Junior engineer: "What are the 'implementation steps detailed above'?"
+Junior engineer: "What specific success criteria should I verify?"
+Junior engineer: "What key considerations exist?"
+
+# No answers in the task - junior engineer blocked
+# Have to research and add missing information
+# Implementation delayed by 2 days
+</code>
+
+<why_it_fails>
+- Missed Red Flag #10: Placeholder text present
+- "[Complete implementation steps detailed above]" is meta-reference, not content
+- "[As specified in the implementation checklist]" is circular reference
+- "[Will be added during implementation]" is deferral, not specification
+- Junior engineer can't execute - missing critical information
+- Task looks complete but actually incomplete
+- Implementation blocked until details added
+- SRE review failed to catch placeholder text
+</why_it_fails>
+
+<correction>
+**Check for placeholder text after reading:**
+
+```markdown
+## Red Flag Check (Category 7)
+
+Read through bd-5 line by line:
+
+Line 15: "[Complete implementation steps detailed above]"
+❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content
+
+Line 22: "[As specified in the implementation checklist]"
+❌ PLACEHOLDER - Circular reference to another section, not explicit criteria
+
+Line 30: "[Will be added during implementation]"
+❌ PLACEHOLDER - Deferral to future, not actual considerations
+
+DECISION: REJECT ❌
+Reason: Contains placeholder text - task not ready for implementation
+```
+
+**Update task with actual content:**
+```bash
+bd update bd-5 --design "$(cat <<'EOF'
+## Implementation Checklist
+- [ ] Create src/scan/plugins/scanners/license_plate.rs
+- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
+- [ ] Add regex patterns for US states:
+  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
+  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
+  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
+  - Generic: `[A-Z0-9]{5,8}` (fallback)
+- [ ] Implement has_healthcare_context() check
+- [ ] Create test module with 8+ test cases
+- [ ] Register in src/scan/plugins/scanners/mod.rs
+
+## Success Criteria
+- [ ] Valid CA plate "1ABC123" detected in healthcare context
+- [ ] Valid NY plate "ABC1234" detected in healthcare context
+- [ ] Invalid plate "123" NOT detected (too short)
+- [ ] Valid plate NOT detected outside healthcare context
+- [ ] 8+ unit tests pass covering all patterns and edge cases
+- [ ] Clippy clean, no warnings
+- [ ] cargo test passes
+
+## Key Considerations
+
+**False Positive Risk**:
+- License plates are short and generic (5-8 chars)
+- MUST require healthcare context via has_healthcare_context()
+- Without context, will match random alphanumeric sequences
+- Test: Random string "ABC1234" should NOT match outside healthcare context
+
+**State Format Variations**:
+- 50 US states have different formats
+- Implement common formats (CA, NY, TX) + generic fallback
+- Document which formats supported in module docstring
+- Consider international plates in future iteration
+
+**Performance**:
+- Regex patterns are simple, no backtracking risk
+- Should process <1ms per chunk
+
+**Reference Implementation**:
+- Study src/scan/plugins/scanners/vehicle_identifier.rs
+- Follow same pattern: regex + context check + tests
+EOF
+)"
+```
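+
+The state patterns in the checklist can be spot-checked on the command line before any Rust is written. A minimal sketch, assuming whole-string matching is intended; the helper name `matches_state_plate` is hypothetical, and the generic fallback and the healthcare-context check are deliberately left out:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: does the whole string match the CA, NY or TX pattern from the checklist?
+# grep -x anchors the match to the full line, so a short token cannot match as a substring.
+matches_state_plate() {
+  printf '%s\n' "$1" | grep -q -x -E '[0-9][A-Z]{3}[0-9]{3}|[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}'
+}
+
+# Prints "match" for the first three plates and "no match" for 123.
+for plate in 1ABC123 ABC1234 123ABC 123; do
+  if matches_state_plate "$plate"; then echo "$plate: match"; else echo "$plate: no match"; fi
+done
+```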
+
+**Verify no placeholder text:**
+```bash
+bd show bd-5
+# Read entire output
+# Confirm: All sections have actual content
+# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
+# ✅ Task ready for implementation
+```
+
+**What you gain:**
+- Junior engineer has complete specification
+- No blocked implementation waiting for details
+- All edge cases documented upfront
+- Success criteria explicit and verifiable
+- Key considerations prevent common mistakes
+- No placeholder text - task truly ready
+- Professional SRE review standard maintained
+</correction>
+</example>
+
+<example>
+<scenario>Developer accepts vague success criteria (Category 3)</scenario>
+
+<code>
+# Review of bd-7: Implement Data Encryption
+
+bd show bd-7:
+
+## Success Criteria
+- [ ] Encryption is implemented correctly
+- [ ] Code is good quality
+- [ ] Tests work properly
+
+# Developer's review:
+"Has 3 success criteria ✅ Meets minimum requirement"
+
+# During implementation:
+Junior engineer: "How do I know if encryption is 'correct'?"
+Junior engineer: "What makes code 'good quality'?"
+Junior engineer: "What does 'tests work properly' mean?"
+
+# Junior engineer makes best guesses:
+- Uses ECB mode (insecure, should use GCM)
+- No key rotation (bad practice)
+- Tests only happy path (misses edge cases)
+
+# Code review finds critical security issues
+# Complete rewrite required - 3 days wasted
+</code>
+
+<why_it_fails>
+- Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
+- "Implemented correctly" - not measurable (correct by what standard?)
+- "Good quality" - subjective, not verifiable
+- "Work properly" - vague, what is proper?
+- Junior engineer can't verify criteria objectively
+- Makes incorrect assumptions filling gaps
+- Security vulnerabilities introduced
+- Wastes time on implementation that fails review
+- SRE review failed to strengthen criteria
+</why_it_fails>
+
+<correction>
+**Apply Category 3 rigorously:**
+
+```markdown
+## Success Criteria Analysis for bd-7
+
+Current criteria:
+- [ ] Encryption is implemented correctly
+  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified
+
+- [ ] Code is good quality
+  ❌ NOT TESTABLE - "good quality" is opinion, not measurable
+
+- [ ] Tests work properly
+  ❌ NOT TESTABLE - "properly" is vague, no definition
+
+Minimum requirement: 3+ specific, measurable, testable criteria
+Current: 0 testable criteria
+DECISION: REJECT ❌
+```
+
+**Update with measurable criteria:**
+```bash
+bd update bd-7 --design "$(cat <<'EOF'
+[... original content ...]
+
+## Success Criteria
+
+**Encryption Implementation**:
+- [ ] Uses AES-256-GCM mode (verified in code review)
+- [ ] Key derivation via PBKDF2 with at least 100,000 iterations (well above the 10,000 minimum suggested in NIST SP 800-63B)
+- [ ] Unique IV generated per encryption (crypto_random)
+- [ ] Authentication tag verified on decryption
+
+**Code Quality** (automated checks):
+- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
+- [ ] Rustfmt compliant: `cargo fmt --check`
+- [ ] No unwrap/expect in production code: `rg "\.unwrap\(\)|\.expect\(" src/` returns no matches outside `#[cfg(test)]` modules
+- [ ] No TODOs without issue numbers: `rg "TODO" src/ | rg -v "#[0-9]+"` returns no matches
+
+**Test Coverage**:
+- [ ] 8+ unit tests pass covering:
+  - test_encrypt_decrypt_roundtrip (happy path)
+  - test_wrong_key_fails_auth (security)
+  - test_modified_ciphertext_fails_auth (security)
+  - test_empty_plaintext (edge case)
+  - test_large_plaintext_10mb (performance)
+  - test_unicode_plaintext (data handling)
+  - test_concurrent_encryption (thread safety)
+  - test_iv_uniqueness (security)
+- [ ] All tests pass: `cargo test encryption`
+- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`
+
+**Documentation**:
+- [ ] Module docstring explains encryption scheme (AES-256-GCM)
+- [ ] Function docstrings include examples
+- [ ] Security considerations documented (key management, IV handling)
+
+**Security Review**:
+- [ ] No hardcoded keys or IVs (verified via grep)
+- [ ] Key zeroized after use (verified in code)
+- [ ] Constant-time comparison for auth tag (timing attack prevention)
+EOF
+)"
+```
+
+**What you gain:**
+- Every criterion objectively verifiable
+- Junior engineer knows exactly what "done" means
+- Automated checks (clippy, fmt, grep) provide instant feedback
+- Specific test scenarios prevent missed edge cases
+- Security requirements explicit (GCM, PBKDF2, unique IV)
+- No ambiguity - can verify each criterion with command or code review
+- Professional SRE review standard: measurable, testable, specific
+</correction>
+</example>
+</examples>
+
+<critical_rules>
+## Rules That Have No Exceptions
+
+1. **Apply all 7 categories to every task** → No skipping any category for any task
+2. **Reject plans with placeholder text** → "[detailed above]", "[as specified]" = instant reject
+3. **Verify no placeholder after updates** → Read back with `bd show` and confirm actual content
+4. **Break tasks >16 hours** → Create subtasks, don't accept large tasks
+5. **Strengthen vague criteria** → "Works correctly" → measurable verification commands
+6. **Add edge cases to every task** → Empty? Unicode? Concurrency? Failures?
+7. **Never skip Category 6** → Edge case analysis prevents production issues
+
+## Common Excuses
+
+All of these mean: **STOP. Apply the full process.**
+
+- "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
+- "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
+- "Placeholder text is just formatting" (Placeholders mean incomplete specification)
+- "Can handle edge cases during implementation" (Must specify upfront, not defer)
+- "Junior will figure it out" (Junior should NOT need to figure out - we specify)
+- "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
+- "Taking too long to review" (One gap caught saves hours of rework)
+</critical_rules>
+
+<verification_checklist>
+Before completing SRE review:
+
+**Per task reviewed:**
+- [ ] Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
+- [ ] Checked for placeholder text in design field
+- [ ] Updated task with missing information via `bd update --design`
+- [ ] Verified updated task with `bd show` (no placeholders remain)
+- [ ] Broke down any task >16 hours into subtasks
+- [ ] Strengthened vague success criteria to measurable
+- [ ] Added edge case analysis to Key Considerations
+- [ ] Strengthened anti-patterns based on failure modes
+
+**Overall plan:**
+- [ ] Reviewed ALL tasks/phases/subtasks (no exceptions)
+- [ ] Verified dependency structure with `bd dep tree`
+- [ ] Documented findings for each task
+- [ ] Created summary of changes made
+- [ ] Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)
+
+**Can't check all boxes?** Return to review process and complete missing steps.
+</verification_checklist>
+
+<integration>
+**This skill is used after:**
+- hyperpowers:writing-plans (creates initial plan)
+- hyperpowers:brainstorming (establishes requirements)
+
+**This skill is used before:**
+- hyperpowers:executing-plans (implements tasks)
+
+**This skill is also called by:**
+- hyperpowers:executing-plans (REQUIRED for new tasks created during execution)
+
+**Call chains:**
+```
+Initial planning:
+hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
+                                                    ↓
+                                            (if gaps: revise and re-review)
+
+During execution (for new tasks):
+hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
+```
+
+**This skill uses:**
+- bd commands (show, update, create, dep add, dep tree)
+- Google Fellow SRE perspective (20+ years distributed systems)
+- 7-category checklist (mandatory for every task)
+
+**Time expectations:**
+- Small epic (3-5 tasks): 15-20 minutes
+- Medium epic (6-10 tasks): 25-40 minutes
+- Large epic (11+ tasks): 45-60 minutes
+
+**Don't rush:** Catching one critical gap pre-implementation saves hours of rework.
+</integration>
+
+<resources>
+**Review patterns:**
+- Task too large (>16h) → Break into 4-8h subtasks
+- Vague criteria ("works correctly") → Measurable commands/checks
+- Missing edge cases → Add to Key Considerations with mitigations
+- Placeholder text → Rewrite with actual content
+
+**When stuck:**
+- Unsure if task too large → Ask: Can junior complete in one day?
+- Unsure if criteria measurable → Ask: Can I verify with command/code review?
+- Unsure if edge case matters → Ask: Could this fail in production?
+- Unsure if placeholder → Ask: Does this reference other content instead of providing content?
+
+**Key principle:** Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.
+</resources>