---
name: sre-task-refinement
description: Use when refining subtasks into actionable plans, ensuring all corner cases are handled and all requirements are understood.
---

Review bd task plans from a Google Fellow SRE perspective to ensure a junior engineer can execute them without questions: catch edge cases, verify granularity, strengthen criteria, and prevent production issues before implementation.

LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.

| Category | Key Questions | Auto-Reject If |
|----------|---------------|----------------|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |

**Perspective**: Google Fellow SRE with 20+ years of experience reviewing junior engineer designs.

**Time**: Don't rush - catching one gap pre-implementation saves hours of rework.

Use when:
- Reviewing bd epic/feature plans before implementation
- Need to ensure a junior engineer can execute without questions
- Want to catch edge cases and failure modes upfront
- Need to verify task granularity (4-8 hour subtasks)
- After hyperpowers:writing-plans creates the initial plan
- Before hyperpowers:executing-plans starts implementation

Don't use when:
- Task is already being implemented (too late)
- Just need to understand existing code (use codebase-investigator)
- Debugging issues (use debugging-with-tools)
- Want to create a plan from scratch (use brainstorming → writing-plans)

## Announcement

**Announce:** "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."

---

## Review Checklist (Apply to Every Task)

### 1. Task Granularity

**Check:**
- [ ] No task >8 hours (subtasks) or >16 hours (phases)?
- [ ] Large phases broken into 4-8 hour subtasks?
- [ ] Each subtask independently completable?
- [ ] Each subtask has a clear deliverable?

**If a task is >16 hours:**
- Create subtasks with `bd create`
- Link with `bd dep add child parent --type parent-child`
- Update the parent to a coordinator role

---

### 2. Implementability (Junior Engineer Test)

**Check:**
- [ ] Can a junior engineer implement this without asking questions?
- [ ] Function signatures/behaviors described, not just "implement X"?
- [ ] Test scenarios described (what they verify, not just names)?
- [ ] "Done" clearly defined with verifiable criteria?
- [ ] All file paths specified or marked "TBD: new file"?

**Red flags:**
- "Implement properly" (how?)
- "Add support" (for what exactly?)
- "Make it work" (what does working mean?)
- File paths missing or ambiguous
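For calibration, "function signatures/behaviors described, not just 'implement X'" means the task text reaches roughly the level of detail below. A minimal sketch - `normalize_vin` and `VinError` are illustrative names, not from any real plan:

```rust
/// Error cases the task must enumerate explicitly.
#[derive(Debug, PartialEq)]
enum VinError {
    BadLength, // not exactly 17 characters after trimming
    BadChar,   // contains I, O, Q, or a non-alphanumeric character
}

/// Normalizes a candidate VIN before validation. The task should spell out:
/// - uppercase ASCII letters ("1hgbh41jxmn109186" -> "1HGBH41JXMN109186")
/// - trim surrounding whitespace, never interior characters
/// - Err(VinError::BadLength) unless exactly 17 chars remain
/// - Err(VinError::BadChar) on I, O, Q, or non-alphanumerics
fn normalize_vin(raw: &str) -> Result<String, VinError> {
    let s = raw.trim().to_ascii_uppercase();
    if s.chars().count() != 17 {
        return Err(VinError::BadLength);
    }
    if s.chars().any(|c| matches!(c, 'I' | 'O' | 'Q') || !c.is_ascii_alphanumeric()) {
        return Err(VinError::BadChar);
    }
    Ok(s)
}
```

The doc comment, not the body, is what the reviewer checks for: a junior engineer should be able to write the body from the contract alone.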
---

### 3. Success Criteria Quality

**Check:**
- [ ] Each task has 3+ specific, measurable success criteria?
- [ ] All criteria testable/verifiable (not subjective)?
- [ ] Includes automated verification (tests pass, clippy clean)?
- [ ] No vague criteria like "works well" or "is implemented"?

**Good criteria examples:**
- ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
- ✅ "Clippy clean with no warnings"
- ✅ "Performance: <100ms for 1000 records"

**Bad criteria examples:**
- ❌ "Code is good quality"
- ❌ "Works correctly"
- ❌ "Is implemented"

---

### 4. Dependency Structure

**Check:**
- [ ] Parent-child relationships correct (epic → phases → subtasks)?
- [ ] Blocking dependencies correct (earlier work blocks later)?
- [ ] No circular dependencies?
- [ ] Dependency graph makes logical sense?

**Verify with:**
```bash
bd dep tree bd-1  # Show full dependency tree
```

---

### 5. Safety & Quality Standards

**Check:**
- [ ] Anti-patterns include unwrap/expect prohibition?
- [ ] Anti-patterns include TODO prohibition (or must have issue #)?
- [ ] Anti-patterns include stub implementation prohibition?
- [ ] Error handling requirements specified (use Result, avoid panic)?
- [ ] Test requirements specific (test names, scenarios listed)?

**Minimum anti-patterns:**
- ❌ No unwrap/expect in production code
- ❌ No TODOs without issue numbers
- ❌ No stub implementations (unimplemented!, todo!)
- ❌ No regex without a catastrophic backtracking check

---

### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)

**Ask for each task:**
- [ ] What happens with malformed input?
- [ ] What happens with empty/nil/zero values?
- [ ] What happens under high load/concurrency?
- [ ] What happens when dependencies fail?
- [ ] What happens with Unicode, special characters, large inputs?
- [ ] Are these edge cases addressed in the plan?

**Add to the Key Considerations section:**
- Edge case descriptions
- Mitigation strategies
- References to similar code handling these cases

---

### 7. Red Flags (AUTO-REJECT)

**Check for these - if found, REJECT the plan:**
- ❌ Any task >16 hours without subtask breakdown
- ❌ Vague language: "implement properly", "add support", "make it work"
- ❌ Success criteria that can't be verified: "code is good", "works well"
- ❌ Missing test specifications
- ❌ "We'll handle this later" or "TODO" in the plan itself
- ❌ No anti-patterns section
- ❌ Implementation checklist with fewer than 3 items per task
- ❌ No effort estimates
- ❌ Missing error handling considerations
- ❌ **CRITICAL: Placeholder text in the design field** - "[detailed above]", "[as specified]", "[complete steps here]"
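The placeholder red flag can be pre-screened mechanically before the mandatory line-by-line read. A small sketch - the marker list is illustrative and should grow as new placeholder styles appear:

```bash
# Pre-screen a task's design field for common placeholder markers.
# A clean result does NOT replace reading the full design.
if bd show bd-3 | grep -nE '\[(detailed above|as specified|will be added|complete steps here)\]'; then
  echo "REJECT: placeholder text found"
else
  echo "No known placeholder markers - continue manual review"
fi
```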
---

## Review Process

For each task in the plan:

**Step 1: Read the task**
```bash
bd show bd-3
```

**Step 2: Apply all 7 checklist categories**
- Task Granularity
- Implementability
- Success Criteria Quality
- Dependency Structure
- Safety & Quality Standards
- Edge Cases & Failure Modes
- Red Flags

**Step 3: Document findings**

Take notes:
- What's done well
- What's missing
- What's vague or ambiguous
- Hidden failure modes not addressed
- Better approaches or simplifications

**Step 4: Update the task**

Use `bd update` to add missing information:
```bash
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]

## Effort Estimate
[Updated estimate if needed]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria

## Implementation Checklist
[Complete checklist with file paths]

## Key Considerations (ADDED BY SRE REVIEW)

**Edge Case: Empty Input**
- What happens when the input is an empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if the string contains RTL text or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

**Reference Implementation**
- Study src/similar/module.rs for the pattern to follow

## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"
```

**IMPORTANT:** Use `--design` for the full detailed description, NOT `--description` (title only).

**Step 5: Verify no placeholder text (MANDATORY)**

After updating, read back with `bd show bd-N` and verify:
- ✅ All sections contain actual content, not meta-references
- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
- ✅ Implementation steps fully written with actual code examples
- ✅ Success criteria explicit, not referencing "criteria above"
- ❌ If ANY placeholder text is found: REJECT and rewrite with actual content
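The "Regex Backtracking" consideration added in Step 4 translates directly into a test. A minimal sketch - `scan` is a stand-in for the task's real entry point, and the time bound, not the exact pattern, is the point:

```rust
use std::time::{Duration, Instant};

// Stand-in for the scanner under review; replace with the task's real API.
fn scan(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}

#[test]
fn pathological_input_does_not_hang() {
    // 10,000 'a's followed by a near-match, as the consideration suggests.
    let mut input = "a".repeat(10_000);
    input.push_str("ABCDEFG123456789"); // 16 chars: one short of a 17-char match

    let start = Instant::now();
    let _ = scan(&input);
    // A linear-time scan finishes in milliseconds; catastrophic backtracking won't.
    assert!(start.elapsed() < Duration::from_secs(1));
}
```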
---

## Breaking Down Large Tasks

If a task is >16 hours, create subtasks:

```bash
# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"
```

---

## Output Format

After reviewing all tasks:

```markdown
## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added via `bd update`]

---
[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations
[If APPROVE]: ✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]: ⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]: ❌ Plan has fundamental issues and needs redesign:
- [Critical problems]
```

---

## Common Mistakes

### Mistake 1: Developer reviews the task but skips edge case analysis (Category 6)

**What happens:**

```markdown
# Review of bd-3: Implement VIN scanner

## Checklist review:
1. Granularity: ✅ 6-8 hours
2. Implementability: ✅ Junior can implement
3. Success Criteria: ✅ Has 5 test scenarios
4. Dependencies: ✅ Correct
5. Safety Standards: ✅ Anti-patterns present
6. Edge Cases: [SKIPPED - "looks straightforward"]
7. Red Flags: ✅ None found

Conclusion: "Task looks good, approve ✅"
```

The task ships without edge case review, and production issues follow:
- VIN scanner matches random 17-char strings (no checksum validation)
- Lowercase VINs not handled (should normalize)
- Catastrophic regex backtracking on long inputs (DoS vulnerability)

**Why the review failed:**
- Skipped Category 6 (Edge Cases), assuming the task was "straightforward"
- Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
- Missed critical production issues:
  - False positives (no checksum validation)
  - Data handling bugs (case sensitivity)
  - Security vulnerability (regex DoS)
- The junior engineer didn't know to handle these (not in the task)
- Production incidents occur after deployment
- Hours of emergency fixes, customer impact
- The SRE review failed to prevent known failure modes

**Instead, apply Category 6 rigorously:**

```markdown
## Edge Case Analysis for bd-3: VIN Scanner

Ask for EVERY task:
- Malformed input? VIN has a checksum - must validate, not just pattern match
- Empty/nil? What if an empty string is passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking

Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)
```
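The first finding is the costliest, so it is worth attaching a reference to the task. ISO 3779's check digit is a weighted sum modulo 11 over transliterated characters - a minimal sketch, to be verified against the spec before relying on it:

```rust
/// ISO 3779 check-digit validation (sketch; verify tables against the spec).
fn vin_checksum_ok(vin: &str) -> bool {
    // Transliteration: A=1..H=8, J=1..N=5, P=7, R=9, S=2..Z=9; digits as-is.
    fn value(c: char) -> Option<u32> {
        match c {
            '0'..='9' => Some(c as u32 - '0' as u32),
            'A'..='H' => Some(c as u32 - 'A' as u32 + 1),
            'J'..='N' => Some(c as u32 - 'J' as u32 + 1),
            'P' => Some(7),
            'R' => Some(9),
            'S'..='Z' => Some(c as u32 - 'S' as u32 + 2),
            _ => None, // I, O, Q and anything else are invalid in a VIN
        }
    }
    const WEIGHTS: [u32; 17] = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];
    let chars: Vec<char> = vin.chars().collect();
    if chars.len() != 17 {
        return false;
    }
    let mut sum = 0;
    for (i, &c) in chars.iter().enumerate() {
        match value(c) {
            Some(v) => sum += v * WEIGHTS[i],
            None => return false,
        }
    }
    // Position 9 (index 8) holds the check digit; remainder 10 is written 'X'.
    let expected = match sum % 11 {
        10 => 'X',
        d => char::from_digit(d, 10).unwrap(), // d < 10 here, so this can't panic
    };
    chars[8] == expected
}
```

Sanity check: the canonical example VIN "1HGBH41JXMN109186" (check digit X) should pass, and corrupting any single character should fail.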
**Update the task:**

```bash
bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]

## Key Considerations (ADDED BY SRE REVIEW)

**VIN Checksum Complexity**:
- ISO 3779 requires a transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate the checksum, not just the pattern - prevents false positives

**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"

**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by a 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html

**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous chars I/O not valid in VIN (should reject)
- Very long input (should not DoS)
EOF
)"
```

**What you gain:**
- Prevented false positives (checksum validation)
- Prevented data handling bugs (case normalization)
- Prevented security vulnerability (regex DoS)
- Junior engineer has complete requirements
- Production issues caught pre-implementation
- Proper SRE review preventing known failure modes
- Customer trust maintained
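The "Edge Cases to Test" list above maps one-to-one onto unit tests, which is exactly the level of specificity the design field should carry. Sketch skeletons, assuming a hypothetical `find_vins(&str) -> Vec<String>` API - they cannot run until the scanner exists:

```rust
// `find_vins` is the task's assumed entry point: returns validated VINs found.

#[test]
fn valid_vin_with_valid_checksum_matches() {
    assert_eq!(find_vins("VIN: 1HGBH41JXMN109186"), vec!["1HGBH41JXMN109186"]);
}

#[test]
fn valid_pattern_but_invalid_checksum_rejected() {
    // Same VIN with the check digit (9th char, 'X') corrupted to 'A'.
    assert!(find_vins("VIN: 1HGBH41JAMN109186").is_empty());
}

#[test]
fn lowercase_vin_normalized_then_validated() {
    assert_eq!(find_vins("vin: 1hgbh41jxmn109186"), vec!["1HGBH41JXMN109186"]);
}

#[test]
fn ambiguous_chars_rejected() {
    // 'I', 'O', 'Q' are not legal VIN characters.
    assert!(find_vins("1HGBH41JXMN10918I").is_empty());
}

#[test]
fn very_long_input_does_not_dos() {
    let input = "X".repeat(10_000);
    let _ = find_vins(&input); // must return promptly (see backtracking test)
}
```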
### Mistake 2: Developer approves a task with placeholder text (Red Flag #10)

**What happens:**

```markdown
# Review of bd-5: Implement License Plate Scanner

bd show bd-5:

## Implementation Checklist
- [ ] Create scanner module
- [ ] [Complete implementation steps detailed above]
- [ ] Add tests

## Success Criteria
- [ ] [As specified in the implementation checklist]
- [ ] Tests pass

## Key Considerations
- [Will be added during implementation]
```

Developer's review: "Looks comprehensive, has implementation checklist and success criteria ✅"

During implementation:
- Junior engineer: "What are the 'implementation steps detailed above'?"
- Junior engineer: "What specific success criteria should I verify?"
- Junior engineer: "What key considerations exist?"

No answers exist in the task. The junior engineer is blocked, has to research and add the missing information, and implementation is delayed by 2 days.

**Why the review failed:**
- Missed Red Flag #10: placeholder text present
- "[Complete implementation steps detailed above]" is a meta-reference, not content
- "[As specified in the implementation checklist]" is a circular reference
- "[Will be added during implementation]" is a deferral, not a specification
- The junior engineer can't execute - critical information is missing
- The task looks complete but is actually incomplete
- Implementation is blocked until details are added

**Instead, check for placeholder text after reading:**

```markdown
## Red Flag Check (Category 7)

Read through bd-5 line by line:

Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is a meta-reference, not actual content

Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria

Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to the future, not actual considerations

DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation
```

**Update the task with actual content:**

```bash
bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
  - Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes

## Key Considerations

**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context

**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats are supported in the module docstring
- Consider international plates in a future iteration

**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk

**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow the same pattern: regex + context check + tests
EOF
)"
```

**Verify no placeholder text:**

```bash
bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
```

**What you gain:**
- Junior engineer has a complete specification
- No blocked implementation waiting for details
- All edge cases documented upfront
- Success criteria explicit and verifiable
- Key considerations prevent common mistakes
- No placeholder text - the task is truly ready
- Professional SRE review standard maintained
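The updated checklist is now concrete enough to sketch. A minimal version of the context-gated matcher - `has_healthcare_context` and the plugin wiring are assumptions taken from the task text, not from a real crate:

```rust
use regex::Regex;

// Stub: the real context check lives elsewhere in the (hypothetical) codebase.
fn has_healthcare_context(_chunk: &str) -> bool {
    true
}

struct LicensePlateScanner {
    patterns: Vec<Regex>,
}

impl LicensePlateScanner {
    fn new() -> Self {
        let patterns = [
            r"\b[0-9][A-Z]{3}[0-9]{3}\b", // CA: 1ABC123
            r"\b[A-Z]{3}[0-9]{4}\b",      // NY/TX: ABC1234
            r"\b[0-9]{3}[A-Z]{3}\b",      // TX alternate: 123ABC
        ]
        .iter()
        .map(|p| Regex::new(p).expect("static pattern")) // sketch-only expect
        .collect();
        Self { patterns }
    }

    fn scan(&self, chunk: &str) -> Vec<String> {
        // Gate on context first: plates are short and generic, so matching
        // without healthcare context produces the false positives noted above.
        if !has_healthcare_context(chunk) {
            return Vec::new();
        }
        self.patterns
            .iter()
            .flat_map(|re| re.find_iter(chunk).map(|m| m.as_str().to_string()))
            .collect()
    }
}
```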
### Mistake 3: Developer accepts vague success criteria (Category 3)

**What happens:**

```markdown
# Review of bd-7: Implement Data Encryption

bd show bd-7:

## Success Criteria
- [ ] Encryption is implemented correctly
- [ ] Code is good quality
- [ ] Tests work properly
```

Developer's review: "Has 3 success criteria ✅ Meets minimum requirement"

During implementation:
- Junior engineer: "How do I know if encryption is 'correct'?"
- Junior engineer: "What makes code 'good quality'?"
- Junior engineer: "What does 'tests work properly' mean?"

The junior engineer makes best guesses:
- Uses ECB mode (insecure, should use GCM)
- No key rotation (bad practice)
- Tests only the happy path (misses edge cases)

Code review finds critical security issues. A complete rewrite is required - 3 days wasted.

**Why the review failed:**
- Category 3 check "All criteria testable/verifiable?" ❌ FAILED
- "Implemented correctly" - not measurable (correct by what standard?)
- "Good quality" - subjective, not verifiable
- "Work properly" - vague; what is proper?
- The junior engineer can't verify the criteria objectively
- Makes incorrect assumptions filling the gaps
- Security vulnerabilities introduced
- Time wasted on an implementation that fails review
- The SRE review failed to strengthen the criteria

**Instead, apply Category 3 rigorously:**

```markdown
## Success Criteria Analysis for bd-7

Current criteria:
- [ ] Encryption is implemented correctly
  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified
- [ ] Code is good quality
  ❌ NOT TESTABLE - "good quality" is opinion, not measurable
- [ ] Tests work properly
  ❌ NOT TESTABLE - "properly" is vague, no definition

Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria

DECISION: REJECT ❌
```

**Update with measurable criteria:**

```bash
bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]

## Success Criteria

**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption

**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0

**Test Coverage**:
- [ ] 12+ unit tests pass covering:
  - test_encrypt_decrypt_roundtrip (happy path)
  - test_wrong_key_fails_auth (security)
  - test_modified_ciphertext_fails_auth (security)
  - test_empty_plaintext (edge case)
  - test_large_plaintext_10mb (performance)
  - test_unicode_plaintext (data handling)
  - test_concurrent_encryption (thread safety)
  - test_iv_uniqueness (security)
  - [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`

**Documentation**:
- [ ] Module docstring explains the encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)

**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Keys zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"
```

**What you gain:**
- Every criterion objectively verifiable
- Junior engineer knows exactly what "done" means
- Automated checks (clippy, fmt, grep) provide instant feedback
- Specific test scenarios prevent missed edge cases
- Security requirements explicit (GCM, PBKDF2, unique IV)
- No ambiguity - each criterion can be verified with a command or code review
- Professional SRE review standard: measurable, testable, specific
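Two of the test criteria above translate almost directly into code. A hedged sketch using the `aes-gcm` crate, following its published 0.10 examples - treat the exact calls as an assumption to verify against the version in use:

```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm,
};

#[test]
fn encrypt_decrypt_roundtrip() {
    let key = Aes256Gcm::generate_key(OsRng);
    let cipher = Aes256Gcm::new(&key);
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng); // unique IV per message
    let ciphertext = cipher.encrypt(&nonce, b"patient record".as_ref()).unwrap();
    let plaintext = cipher.decrypt(&nonce, ciphertext.as_ref()).unwrap();
    assert_eq!(plaintext, b"patient record");
}

#[test]
fn modified_ciphertext_fails_auth() {
    let key = Aes256Gcm::generate_key(OsRng);
    let cipher = Aes256Gcm::new(&key);
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
    let mut ciphertext = cipher.encrypt(&nonce, b"patient record".as_ref()).unwrap();
    ciphertext[0] ^= 0xFF; // flip bits: GCM's auth tag must reject this
    assert!(cipher.decrypt(&nonce, ciphertext.as_ref()).is_err());
}
```

(`unwrap` is acceptable in tests; the anti-pattern rule targets production code.)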
---

## Rules That Have No Exceptions

1. **Apply all 7 categories to every task** → No skipping any category for any task
2. **Reject plans with placeholder text** → "[detailed above]", "[as specified]" = instant reject
3. **Verify no placeholder after updates** → Read back with `bd show` and confirm actual content
4. **Break down tasks >16 hours** → Create subtasks; don't accept large tasks
5. **Strengthen vague criteria** → "Works correctly" → measurable verification commands
6. **Add edge cases to every task** → Empty? Unicode? Concurrency? Failures?
7. **Never skip Category 6** → Edge case analysis prevents production issues

## Common Excuses

All of these mean: **STOP. Apply the full process.**

- "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
- "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
- "Placeholder text is just formatting" (Placeholders mean an incomplete specification)
- "Can handle edge cases during implementation" (Must specify upfront, not defer)
- "Junior will figure it out" (The junior should NOT need to figure it out - we specify)
- "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
- "Taking too long to review" (One gap caught saves hours of rework)

Before completing the SRE review:

**Per task reviewed:**
- [ ] Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
- [ ] Checked for placeholder text in the design field
- [ ] Updated the task with missing information via `bd update --design`
- [ ] Verified the updated task with `bd show` (no placeholders remain)
- [ ] Broke down any task >16 hours into subtasks
- [ ] Strengthened vague success criteria into measurable ones
- [ ] Added edge case analysis to Key Considerations
- [ ] Strengthened anti-patterns based on failure modes

**Overall plan:**
- [ ] Reviewed ALL tasks/phases/subtasks (no exceptions)
- [ ] Verified dependency structure with `bd dep tree`
- [ ] Documented findings for each task
- [ ] Created a summary of changes made
- [ ] Provided a clear recommendation (APPROVE/NEEDS REVISION/REJECT)

**Can't check all boxes?** Return to the review process and complete the missing steps.

**This skill is used after:**
- hyperpowers:writing-plans (creates the initial plan)
- hyperpowers:brainstorming (establishes requirements)

**This skill is used before:**
- hyperpowers:executing-plans (implements tasks)

**This skill is also called by:**
- hyperpowers:executing-plans (REQUIRED for new tasks created during execution)

**Call chains:**
```
Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
                                                        ↓ (if gaps: revise and re-review)

During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
```

**This skill uses:**
- bd commands (show, update, create, dep add, dep tree)
- Google Fellow SRE perspective (20+ years distributed systems)
- 7-category checklist (mandatory for every task)

**Time expectations:**
- Small epic (3-5 tasks): 15-20 minutes
- Medium epic (6-10 tasks): 25-40 minutes
- Large epic (10+ tasks): 45-60 minutes

**Don't rush:** Catching one critical gap pre-implementation saves hours of rework.
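Several of the per-task checklist items reduce to the same shell commands. One bundled sketch - it assumes a Rust workspace with ripgrep (built with PCRE2 for the lookahead), and the `TODO(#123)` issue-reference convention is an assumption:

```bash
#!/usr/bin/env bash
# Automated verification gates referenced throughout this skill.
set -euo pipefail

cargo clippy -- -D warnings   # clippy clean, warnings promoted to errors
cargo fmt --check             # formatting compliant
cargo test                    # all tests pass

if rg -q '\.unwrap\(\)|\.expect\(' src/; then
  echo "FAIL: unwrap/expect found in production code"; exit 1
fi
if rg -q --pcre2 'TODO(?!\(#[0-9]+\))' src/; then
  echo "FAIL: TODO without an issue number"; exit 1
fi
echo "All automated gates passed"
```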
**Review patterns:**
- Task too large (>16h) → Break into 4-8h subtasks
- Vague criteria ("works correctly") → Measurable commands/checks
- Missing edge cases → Add to Key Considerations with mitigations
- Placeholder text → Rewrite with actual content

**When stuck:**
- Unsure if a task is too large → Ask: Can a junior complete it in one day?
- Unsure if criteria are measurable → Ask: Can I verify this with a command or code review?
- Unsure if an edge case matters → Ask: Could this fail in production?
- Unsure if something is a placeholder → Ask: Does this reference other content instead of providing content?

**Key principle:** A junior engineer should be able to execute the task without asking questions. If they would need to ask, the specification is incomplete.