827 lines
26 KiB
Markdown
827 lines
26 KiB
Markdown
---
|
|
name: sre-task-refinement
|
|
description: Use when you have to refine subtasks into actionable plans ensuring that all corner cases are handled and we understand all the requirements.
|
|
---
|
|
|
|
<skill_overview>
|
|
Review bd task plans with Google Fellow SRE perspective to ensure junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, prevent production issues before implementation.
|
|
</skill_overview>
|
|
|
|
<rigidity_level>
|
|
LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.
|
|
</rigidity_level>
|
|
|
|
<quick_reference>
|
|
| Category | Key Questions | Auto-Reject If |
|
|
|----------|---------------|----------------|
|
|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
|
|
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
|
|
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
|
|
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
|
|
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
|
|
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
|
|
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |
|
|
|
|
**Perspective**: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.
|
|
|
|
**Time**: Don't rush - catching one gap pre-implementation saves hours of rework.
|
|
</quick_reference>
|
|
|
|
<when_to_use>
|
|
Use when:
|
|
- Reviewing bd epic/feature plans before implementation
|
|
- Need to ensure junior engineer can execute without questions
|
|
- Want to catch edge cases and failure modes upfront
|
|
- Need to verify task granularity (4-8 hour subtasks)
|
|
- After hyperpowers:writing-plans creates initial plan
|
|
- Before hyperpowers:executing-plans starts implementation
|
|
|
|
Don't use when:
|
|
- Task already being implemented (too late)
|
|
- Just need to understand existing code (use codebase-investigator)
|
|
- Debugging issues (use debugging-with-tools)
|
|
- Want to create plan from scratch (use brainstorming → writing-plans)
|
|
</when_to_use>
|
|
|
|
<the_process>
|
|
## Announcement
|
|
|
|
**Announce:** "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."
|
|
|
|
---
|
|
|
|
## Review Checklist (Apply to Every Task)
|
|
|
|
### 1. Task Granularity
|
|
|
|
**Check:**
|
|
- [ ] No task >8 hours (subtasks) or >16 hours (phases)?
|
|
- [ ] Large phases broken into 4-8 hour subtasks?
|
|
- [ ] Each subtask independently completable?
|
|
- [ ] Each subtask has clear deliverable?
|
|
|
|
**If task >16 hours:**
|
|
- Create subtasks with `bd create`
|
|
- Link with `bd dep add child parent --type parent-child`
|
|
- Update parent to coordinator role
|
|
|
|
---
|
|
|
|
### 2. Implementability (Junior Engineer Test)
|
|
|
|
**Check:**
|
|
- [ ] Can junior engineer implement without asking questions?
|
|
- [ ] Function signatures/behaviors described, not just "implement X"?
|
|
- [ ] Test scenarios described (what they verify, not just names)?
|
|
- [ ] "Done" clearly defined with verifiable criteria?
|
|
- [ ] All file paths specified or marked "TBD: new file"?
|
|
|
|
**Red flags:**
|
|
- "Implement properly" (how?)
|
|
- "Add support" (for what exactly?)
|
|
- "Make it work" (what does working mean?)
|
|
- File paths missing or ambiguous
|
|
|
|
---
|
|
|
|
### 3. Success Criteria Quality
|
|
|
|
**Check:**
|
|
- [ ] Each task has 3+ specific, measurable success criteria?
|
|
- [ ] All criteria testable/verifiable (not subjective)?
|
|
- [ ] Includes automated verification (tests pass, clippy clean)?
|
|
- [ ] No vague criteria like "works well" or "is implemented"?
|
|
|
|
**Good criteria examples:**
|
|
- ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
|
|
- ✅ "Clippy clean with no warnings"
|
|
- ✅ "Performance: <100ms for 1000 records"
|
|
|
|
**Bad criteria examples:**
|
|
- ❌ "Code is good quality"
|
|
- ❌ "Works correctly"
|
|
- ❌ "Is implemented"
|
|
|
|
---
|
|
|
|
### 4. Dependency Structure
|
|
|
|
**Check:**
|
|
- [ ] Parent-child relationships correct (epic → phases → subtasks)?
|
|
- [ ] Blocking dependencies correct (earlier work blocks later)?
|
|
- [ ] No circular dependencies?
|
|
- [ ] Dependency graph makes logical sense?
|
|
|
|
**Verify with:**
|
|
```bash
|
|
bd dep tree bd-1 # Show full dependency tree
|
|
```
|
|
|
|
---
|
|
|
|
### 5. Safety & Quality Standards
|
|
|
|
**Check:**
|
|
- [ ] Anti-patterns include unwrap/expect prohibition?
|
|
- [ ] Anti-patterns include TODO prohibition (or must have issue #)?
|
|
- [ ] Anti-patterns include stub implementation prohibition?
|
|
- [ ] Error handling requirements specified (use Result, avoid panic)?
|
|
- [ ] Test requirements specific (test names, scenarios listed)?
|
|
|
|
**Minimum anti-patterns:**
|
|
- ❌ No unwrap/expect in production code
|
|
- ❌ No TODOs without issue numbers
|
|
- ❌ No stub implementations (unimplemented!, todo!)
|
|
- ❌ No regex without catastrophic backtracking check
|
|
|
|
---
|
|
|
|
### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)
|
|
|
|
**Ask for each task:**
|
|
- [ ] What happens with malformed input?
|
|
- [ ] What happens with empty/nil/zero values?
|
|
- [ ] What happens under high load/concurrency?
|
|
- [ ] What happens when dependencies fail?
|
|
- [ ] What happens with Unicode, special characters, large inputs?
|
|
- [ ] Are these edge cases addressed in the plan?
|
|
|
|
**Add to Key Considerations section:**
|
|
- Edge case descriptions
|
|
- Mitigation strategies
|
|
- References to similar code handling these cases
|
|
|
|
---
|
|
|
|
### 7. Red Flags (AUTO-REJECT)
|
|
|
|
**Check for these - if found, REJECT plan:**
|
|
- ❌ Any task >16 hours without subtask breakdown
|
|
- ❌ Vague language: "implement properly", "add support", "make it work"
|
|
- ❌ Success criteria that can't be verified: "code is good", "works well"
|
|
- ❌ Missing test specifications
|
|
- ❌ "We'll handle this later" or "TODO" in the plan itself
|
|
- ❌ No anti-patterns section
|
|
- ❌ Implementation checklist with fewer than 3 items per task
|
|
- ❌ No effort estimates
|
|
- ❌ Missing error handling considerations
|
|
- ❌ **CRITICAL: Placeholder text in design field** - "[detailed above]", "[as specified]", "[complete steps here]"
|
|
|
|
---
|
|
|
|
## Review Process
|
|
|
|
For each task in the plan:
|
|
|
|
**Step 1: Read the task**
|
|
```bash
|
|
bd show bd-3
|
|
```
|
|
|
|
**Step 2: Apply all 7 checklist categories**
|
|
- Task Granularity
|
|
- Implementability
|
|
- Success Criteria Quality
|
|
- Dependency Structure
|
|
- Safety & Quality Standards
|
|
- Edge Cases & Failure Modes
|
|
- Red Flags
|
|
|
|
**Step 3: Document findings**
|
|
Take notes:
|
|
- What's done well
|
|
- What's missing
|
|
- What's vague or ambiguous
|
|
- Hidden failure modes not addressed
|
|
- Better approaches or simplifications
|
|
|
|
**Step 4: Update the task**
|
|
|
|
Use `bd update` to add missing information:
|
|
|
|
```bash
|
|
bd update bd-3 --design "$(cat <<'EOF'
|
|
## Goal
|
|
[Original goal, preserved]
|
|
|
|
## Effort Estimate
|
|
[Updated estimate if needed]
|
|
|
|
## Success Criteria
|
|
- [ ] Existing criteria
|
|
- [ ] NEW: Added missing measurable criteria
|
|
|
|
## Implementation Checklist
|
|
[Complete checklist with file paths]
|
|
|
|
## Key Considerations (ADDED BY SRE REVIEW)
|
|
|
|
**Edge Case: Empty Input**
|
|
- What happens when input is empty string?
|
|
- MUST validate input length before processing
|
|
|
|
**Edge Case: Unicode Handling**
|
|
- What if string contains RTL or surrogate pairs?
|
|
- Use proper Unicode-aware string methods
|
|
|
|
**Performance Concern: Regex Backtracking**
|
|
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
|
|
- MUST test with pathological inputs (e.g., 10000 'a's)
|
|
- Use possessive quantifiers or bounded repetition
|
|
|
|
**Reference Implementation**
|
|
- Study src/similar/module.rs for pattern to follow
|
|
|
|
## Anti-patterns
|
|
[Original anti-patterns]
|
|
- ❌ NEW: Specific anti-pattern for this task's risks
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
**IMPORTANT:** Use `--design` for full detailed description, NOT `--description` (title only).
|
|
|
|
**Step 5: Verify no placeholder text (MANDATORY)**
|
|
|
|
After updating, read back with `bd show bd-N` and verify:
|
|
- ✅ All sections contain actual content, not meta-references
|
|
- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
|
|
- ✅ Implementation steps fully written with actual code examples
|
|
- ✅ Success criteria explicit, not referencing "criteria above"
|
|
- ❌ If ANY placeholder text found: REJECT and rewrite with actual content
|
|
|
|
---
|
|
|
|
## Breaking Down Large Tasks
|
|
|
|
If task >16 hours, create subtasks:
|
|
|
|
```bash
|
|
# Create first subtask
|
|
bd create "Subtask 1: [Specific Component]" \
|
|
--type task \
|
|
--priority 1 \
|
|
--design "[Complete subtask design with all 7 categories addressed]"
|
|
# Returns bd-10
|
|
|
|
# Create second subtask
|
|
bd create "Subtask 2: [Another Component]" \
|
|
--type task \
|
|
--priority 1 \
|
|
--design "[Complete subtask design]"
|
|
# Returns bd-11
|
|
|
|
# Link subtasks to parent with parent-child relationship
|
|
bd dep add bd-10 bd-3 --type parent-child # bd-10 is child of bd-3
|
|
bd dep add bd-11 bd-3 --type parent-child # bd-11 is child of bd-3
|
|
|
|
# Add sequential dependencies if needed (LATER depends on EARLIER)
|
|
bd dep add bd-11 bd-10 # bd-11 depends on bd-10 (do bd-10 first)
|
|
|
|
# Update parent to coordinator
|
|
bd update bd-3 --design "$(cat <<'EOF'
|
|
## Goal
|
|
Coordinate implementation of [feature]. Broken into N subtasks.
|
|
|
|
## Success Criteria
|
|
- [ ] All N child subtasks closed
|
|
- [ ] Integration tests pass
|
|
- [ ] [High-level verification criteria]
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
---
|
|
|
|
## Output Format
|
|
|
|
After reviewing all tasks:
|
|
|
|
```markdown
|
|
## Plan Review Results
|
|
|
|
### Epic: [Name] ([epic-id])
|
|
|
|
### Overall Assessment
|
|
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]
|
|
|
|
### Dependency Structure Review
|
|
[Output of `bd dep tree [epic-id]`]
|
|
|
|
**Structure Quality**: [✅ Correct / ❌ Issues found]
|
|
- [Comments on parent-child relationships]
|
|
- [Comments on blocking dependencies]
|
|
- [Comments on granularity]
|
|
|
|
### Task-by-Task Review
|
|
|
|
#### [Task Name] (bd-N)
|
|
**Type**: [epic/feature/task]
|
|
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
|
|
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])
|
|
|
|
**Strengths**:
|
|
- [What's done well]
|
|
|
|
**Critical Issues** (must fix):
|
|
- [Blocking problems]
|
|
|
|
**Improvements Needed**:
|
|
- [What to add/clarify]
|
|
|
|
**Edge Cases Missing**:
|
|
- [Failure modes not addressed]
|
|
|
|
**Changes Made**:
|
|
- [Specific improvements added via `bd update`]
|
|
|
|
---
|
|
|
|
[Repeat for each task/phase/subtask]
|
|
|
|
### Summary of Changes
|
|
|
|
**Issues Updated**:
|
|
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
|
|
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
|
|
- bd-7 - Strengthened success criteria (added test names, verification commands)
|
|
|
|
### Critical Gaps Across Plan
|
|
1. [Pattern of missing items across multiple tasks]
|
|
2. [Systemic issues in the plan]
|
|
|
|
### Recommendations
|
|
|
|
[If APPROVE]:
|
|
✅ Plan is solid and ready for implementation.
|
|
- All tasks are junior-engineer implementable
|
|
- Dependency structure is correct
|
|
- Edge cases and failure modes addressed
|
|
|
|
[If NEEDS REVISION]:
|
|
⚠️ Plan needs improvements before implementation:
|
|
- [List major items that need addressing]
|
|
- After changes, re-run hyperpowers:sre-task-refinement
|
|
|
|
[If REJECT]:
|
|
❌ Plan has fundamental issues and needs redesign:
|
|
- [Critical problems]
|
|
```
|
|
</the_process>
|
|
|
|
<examples>
|
|
<example>
|
|
<scenario>Developer reviews task but skips edge case analysis (Category 6)</scenario>
|
|
|
|
<code>
|
|
# Review of bd-3: Implement VIN scanner
|
|
|
|
## Checklist review:
|
|
1. Granularity: ✅ 6-8 hours
|
|
2. Implementability: ✅ Junior can implement
|
|
3. Success Criteria: ✅ Has 5 test scenarios
|
|
4. Dependencies: ✅ Correct
|
|
5. Safety Standards: ✅ Anti-patterns present
|
|
6. Edge Cases: [SKIPPED - "looks straightforward"]
|
|
7. Red Flags: ✅ None found
|
|
|
|
Conclusion: "Task looks good, approve ✅"
|
|
|
|
# Task ships without edge case review
|
|
# Production issues occur:
|
|
- VIN scanner matches random 17-char strings (no checksum validation)
|
|
- Lowercase VINs not handled (should normalize)
|
|
- Catastrophic regex backtracking on long inputs (DoS vulnerability)
|
|
</code>
|
|
|
|
<why_it_fails>
|
|
- Skipped Category 6 (Edge Cases) assuming task was "straightforward"
|
|
- Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
|
|
- Missed critical production issues:
|
|
- False positives (no checksum validation)
|
|
- Data handling bugs (case sensitivity)
|
|
- Security vulnerability (regex DoS)
|
|
- Junior engineer didn't know to handle these (not in task)
|
|
- Production incidents occur after deployment
|
|
- Hours of emergency fixes, customer impact
|
|
- SRE review failed to prevent known failure modes
|
|
</why_it_fails>
|
|
|
|
<correction>
|
|
**Apply Category 6 rigorously:**
|
|
|
|
```markdown
|
|
## Edge Case Analysis for bd-3: VIN Scanner
|
|
|
|
Ask for EVERY task:
|
|
- Malformed input? VIN has checksum - must validate, not just pattern match
|
|
- Empty/nil? What if empty string passed?
|
|
- Concurrency? Read-only scanner, no concurrency issues
|
|
- Dependency failures? No external dependencies
|
|
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
|
|
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking
|
|
|
|
Findings:
|
|
❌ VIN checksum validation not mentioned (will match random strings)
|
|
❌ Case normalization not mentioned (lowercase VINs exist)
|
|
❌ Regex backtracking risk not mentioned (DoS vulnerability)
|
|
```
|
|
|
|
**Update task:**
|
|
```bash
|
|
bd update bd-3 --design "$(cat <<'EOF'
|
|
[... original content ...]
|
|
|
|
## Key Considerations (ADDED BY SRE REVIEW)
|
|
|
|
**VIN Checksum Complexity**:
|
|
- ISO 3779 requires transliteration table (letters → numbers)
|
|
- Weighted sum algorithm with modulo 11
|
|
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
|
|
- MUST validate checksum, not just pattern - prevents false positives
|
|
|
|
**Case Normalization**:
|
|
- VINs can appear in lowercase
|
|
- MUST normalize to uppercase before validation
|
|
- Test with mixed case: "1hgbh41jxmn109186"
|
|
|
|
**Regex Backtracking Risk**:
|
|
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
|
|
- Test with pathological input: 10000 'X's followed by 16-char string
|
|
- Use possessive quantifiers or bounded repetition
|
|
- Reference: https://www.regular-expressions.info/catastrophic.html
|
|
|
|
**Edge Cases to Test**:
|
|
- Valid VIN with valid checksum (should match)
|
|
- Valid pattern but invalid checksum (should NOT match)
|
|
- Lowercase VIN (should normalize and validate)
|
|
- Ambiguous chars I/O not valid in VIN (should reject)
|
|
- Very long input (should not DoS)
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
**What you gain:**
|
|
- Prevented false positives (checksum validation)
|
|
- Prevented data handling bugs (case normalization)
|
|
- Prevented security vulnerability (regex DoS)
|
|
- Junior engineer has complete requirements
|
|
- Production issues caught pre-implementation
|
|
- Proper SRE review preventing known failure modes
|
|
- Customer trust maintained
|
|
</correction>
|
|
</example>
|
|
|
|
<example>
|
|
<scenario>Developer approves task with placeholder text (Red Flag #10)</scenario>
|
|
|
|
<code>
|
|
# Review of bd-5: Implement License Plate Scanner
|
|
|
|
bd show bd-5:
|
|
|
|
## Implementation Checklist
|
|
- [ ] Create scanner module
|
|
- [ ] [Complete implementation steps detailed above]
|
|
- [ ] Add tests
|
|
|
|
## Success Criteria
|
|
- [ ] [As specified in the implementation checklist]
|
|
- [ ] Tests pass
|
|
|
|
## Key Considerations
|
|
- [Will be added during implementation]
|
|
|
|
# Developer's review:
|
|
"Looks comprehensive, has implementation checklist and success criteria ✅"
|
|
|
|
# During implementation:
|
|
Junior engineer: "What are the 'implementation steps detailed above'?"
|
|
Junior engineer: "What specific success criteria should I verify?"
|
|
Junior engineer: "What key considerations exist?"
|
|
|
|
# No answers in the task - junior engineer blocked
|
|
# Have to research and add missing information
|
|
# Implementation delayed by 2 days
|
|
</code>
|
|
|
|
<why_it_fails>
|
|
- Missed Red Flag #10: Placeholder text present
|
|
- "[Complete implementation steps detailed above]" is meta-reference, not content
|
|
- "[As specified in the implementation checklist]" is circular reference
|
|
- "[Will be added during implementation]" is deferral, not specification
|
|
- Junior engineer can't execute - missing critical information
|
|
- Task looks complete but actually incomplete
|
|
- Implementation blocked until details added
|
|
- SRE review failed to catch placeholder text
|
|
</why_it_fails>
|
|
|
|
<correction>
|
|
**Check for placeholder text after reading:**
|
|
|
|
```markdown
|
|
## Red Flag Check (Category 7)
|
|
|
|
Read through bd-5 line by line:
|
|
|
|
Line 15: "[Complete implementation steps detailed above]"
|
|
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content
|
|
|
|
Line 22: "[As specified in the implementation checklist]"
|
|
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria
|
|
|
|
Line 30: "[Will be added during implementation]"
|
|
❌ PLACEHOLDER - Deferral to future, not actual considerations
|
|
|
|
DECISION: REJECT ❌
|
|
Reason: Contains placeholder text - task not ready for implementation
|
|
```
|
|
|
|
**Update task with actual content:**
|
|
```bash
|
|
bd update bd-5 --design "$(cat <<'EOF'
|
|
## Implementation Checklist
|
|
- [ ] Create src/scan/plugins/scanners/license_plate.rs
|
|
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
|
|
- [ ] Add regex patterns for US states:
|
|
- CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
|
|
- NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
|
|
- TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
|
|
- Generic: `[A-Z0-9]{5,8}` (fallback)
|
|
- [ ] Implement has_healthcare_context() check
|
|
- [ ] Create test module with 8+ test cases
|
|
- [ ] Register in src/scan/plugins/scanners/mod.rs
|
|
|
|
## Success Criteria
|
|
- [ ] Valid CA plate "1ABC123" detected in healthcare context
|
|
- [ ] Valid NY plate "ABC1234" detected in healthcare context
|
|
- [ ] Invalid plate "123" NOT detected (too short)
|
|
- [ ] Valid plate NOT detected outside healthcare context
|
|
- [ ] 8+ unit tests pass covering all patterns and edge cases
|
|
- [ ] Clippy clean, no warnings
|
|
- [ ] cargo test passes
|
|
|
|
## Key Considerations
|
|
|
|
**False Positive Risk**:
|
|
- License plates are short and generic (5-8 chars)
|
|
- MUST require healthcare context via has_healthcare_context()
|
|
- Without context, will match random alphanumeric sequences
|
|
- Test: Random string "ABC1234" should NOT match outside healthcare context
|
|
|
|
**State Format Variations**:
|
|
- 50 US states have different formats
|
|
- Implement common formats (CA, NY, TX) + generic fallback
|
|
- Document which formats supported in module docstring
|
|
- Consider international plates in future iteration
|
|
|
|
**Performance**:
|
|
- Regex patterns are simple, no backtracking risk
|
|
- Should process <1ms per chunk
|
|
|
|
**Reference Implementation**:
|
|
- Study src/scan/plugins/scanners/vehicle_identifier.rs
|
|
- Follow same pattern: regex + context check + tests
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
**Verify no placeholder text:**
|
|
```bash
|
|
bd show bd-5
|
|
# Read entire output
|
|
# Confirm: All sections have actual content
|
|
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
|
|
# ✅ Task ready for implementation
|
|
```
|
|
|
|
**What you gain:**
|
|
- Junior engineer has complete specification
|
|
- No blocked implementation waiting for details
|
|
- All edge cases documented upfront
|
|
- Success criteria explicit and verifiable
|
|
- Key considerations prevent common mistakes
|
|
- No placeholder text - task truly ready
|
|
- Professional SRE review standard maintained
|
|
</correction>
|
|
</example>
|
|
|
|
<example>
|
|
<scenario>Developer accepts vague success criteria (Category 3)</scenario>
|
|
|
|
<code>
|
|
# Review of bd-7: Implement Data Encryption
|
|
|
|
bd show bd-7:
|
|
|
|
## Success Criteria
|
|
- [ ] Encryption is implemented correctly
|
|
- [ ] Code is good quality
|
|
- [ ] Tests work properly
|
|
|
|
# Developer's review:
|
|
"Has 3 success criteria ✅ Meets minimum requirement"
|
|
|
|
# During implementation:
|
|
Junior engineer: "How do I know if encryption is 'correct'?"
|
|
Junior engineer: "What makes code 'good quality'?"
|
|
Junior engineer: "What does 'tests work properly' mean?"
|
|
|
|
# Junior engineer makes best guesses:
|
|
- Uses ECB mode (insecure, should use GCM)
|
|
- No key rotation (bad practice)
|
|
- Tests only happy path (misses edge cases)
|
|
|
|
# Code review finds critical security issues
|
|
# Complete rewrite required - 3 days wasted
|
|
</code>
|
|
|
|
<why_it_fails>
|
|
- Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
|
|
- "Implemented correctly" - not measurable (correct by what standard?)
|
|
- "Good quality" - subjective, not verifiable
|
|
- "Work properly" - vague, what is proper?
|
|
- Junior engineer can't verify criteria objectively
|
|
- Makes incorrect assumptions filling gaps
|
|
- Security vulnerabilities introduced
|
|
- Wastes time on implementation that fails review
|
|
- SRE review failed to strengthen criteria
|
|
</why_it_fails>
|
|
|
|
<correction>
|
|
**Apply Category 3 rigorously:**
|
|
|
|
```markdown
|
|
## Success Criteria Analysis for bd-7
|
|
|
|
Current criteria:
|
|
- [ ] Encryption is implemented correctly
|
|
❌ NOT TESTABLE - "correctly" is subjective, no standard specified
|
|
|
|
- [ ] Code is good quality
|
|
❌ NOT TESTABLE - "good quality" is opinion, not measurable
|
|
|
|
- [ ] Tests work properly
|
|
❌ NOT TESTABLE - "properly" is vague, no definition
|
|
|
|
Minimum requirement: 3+ specific, measurable, testable criteria
|
|
Current: 0 testable criteria
|
|
DECISION: REJECT ❌
|
|
```
|
|
|
|
**Update with measurable criteria:**
|
|
```bash
|
|
bd update bd-7 --design "$(cat <<'EOF'
|
|
[... original content ...]
|
|
|
|
## Success Criteria
|
|
|
|
**Encryption Implementation**:
|
|
- [ ] Uses AES-256-GCM mode (verified in code review)
|
|
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
|
|
- [ ] Unique IV generated per encryption (crypto_random)
|
|
- [ ] Authentication tag verified on decryption
|
|
|
|
**Code Quality** (automated checks):
|
|
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
|
|
- [ ] Rustfmt compliant: `cargo fmt --check`
|
|
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
|
|
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0
|
|
|
|
**Test Coverage**:
|
|
- [ ] 12+ unit tests pass covering:
|
|
- test_encrypt_decrypt_roundtrip (happy path)
|
|
- test_wrong_key_fails_auth (security)
|
|
- test_modified_ciphertext_fails_auth (security)
|
|
- test_empty_plaintext (edge case)
|
|
- test_large_plaintext_10mb (performance)
|
|
- test_unicode_plaintext (data handling)
|
|
- test_concurrent_encryption (thread safety)
|
|
- test_iv_uniqueness (security)
|
|
- [4 more specific scenarios]
|
|
- [ ] All tests pass: `cargo test encryption`
|
|
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`
|
|
|
|
**Documentation**:
|
|
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
|
|
- [ ] Function docstrings include examples
|
|
- [ ] Security considerations documented (key management, IV handling)
|
|
|
|
**Security Review**:
|
|
- [ ] No hardcoded keys or IVs (verified via grep)
|
|
- [ ] Key zeroized after use (verified in code)
|
|
- [ ] Constant-time comparison for auth tag (timing attack prevention)
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
**What you gain:**
|
|
- Every criterion objectively verifiable
|
|
- Junior engineer knows exactly what "done" means
|
|
- Automated checks (clippy, fmt, grep) provide instant feedback
|
|
- Specific test scenarios prevent missed edge cases
|
|
- Security requirements explicit (GCM, PBKDF2, unique IV)
|
|
- No ambiguity - can verify each criterion with command or code review
|
|
- Professional SRE review standard: measurable, testable, specific
|
|
</correction>
|
|
</example>
|
|
</examples>
|
|
|
|
<critical_rules>
|
|
## Rules That Have No Exceptions
|
|
|
|
1. **Apply all 7 categories to every task** → No skipping any category for any task
|
|
2. **Reject plans with placeholder text** → "[detailed above]", "[as specified]" = instant reject
|
|
3. **Verify no placeholder after updates** → Read back with `bd show` and confirm actual content
|
|
4. **Break tasks >16 hours** → Create subtasks, don't accept large tasks
|
|
5. **Strengthen vague criteria** → "Works correctly" → measurable verification commands
|
|
6. **Add edge cases to every task** → Empty? Unicode? Concurrency? Failures?
|
|
7. **Never skip Category 6** → Edge case analysis prevents production issues
|
|
|
|
## Common Excuses
|
|
|
|
All of these mean: **STOP. Apply the full process.**
|
|
|
|
- "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
|
|
- "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
|
|
- "Placeholder text is just formatting" (Placeholders mean incomplete specification)
|
|
- "Can handle edge cases during implementation" (Must specify upfront, not defer)
|
|
- "Junior will figure it out" (Junior should NOT need to figure out - we specify)
|
|
- "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
|
|
- "Taking too long to review" (One gap caught saves hours of rework)
|
|
</critical_rules>
|
|
|
|
<verification_checklist>
|
|
Before completing SRE review:
|
|
|
|
**Per task reviewed:**
|
|
- [ ] Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
|
|
- [ ] Checked for placeholder text in design field
|
|
- [ ] Updated task with missing information via `bd update --design`
|
|
- [ ] Verified updated task with `bd show` (no placeholders remain)
|
|
- [ ] Broke down any task >16 hours into subtasks
|
|
- [ ] Strengthened vague success criteria to measurable
|
|
- [ ] Added edge case analysis to Key Considerations
|
|
- [ ] Strengthened anti-patterns based on failure modes
|
|
|
|
**Overall plan:**
|
|
- [ ] Reviewed ALL tasks/phases/subtasks (no exceptions)
|
|
- [ ] Verified dependency structure with `bd dep tree`
|
|
- [ ] Documented findings for each task
|
|
- [ ] Created summary of changes made
|
|
- [ ] Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)
|
|
|
|
**Can't check all boxes?** Return to review process and complete missing steps.
|
|
</verification_checklist>
|
|
|
|
<integration>
|
|
**This skill is used after:**
|
|
- hyperpowers:writing-plans (creates initial plan)
|
|
- hyperpowers:brainstorming (establishes requirements)
|
|
|
|
**This skill is used before:**
|
|
- hyperpowers:executing-plans (implements tasks)
|
|
|
|
**This skill is also called by:**
|
|
- hyperpowers:executing-plans (REQUIRED for new tasks created during execution)
|
|
|
|
**Call chains:**
|
|
```
|
|
Initial planning:
|
|
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
|
|
↓
|
|
(if gaps: revise and re-review)
|
|
|
|
During execution (for new tasks):
|
|
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
|
|
```
|
|
|
|
**This skill uses:**
|
|
- bd commands (show, update, create, dep add, dep tree)
|
|
- Google Fellow SRE perspective (20+ years distributed systems)
|
|
- 7-category checklist (mandatory for every task)
|
|
|
|
**Time expectations:**
|
|
- Small epic (3-5 tasks): 15-20 minutes
|
|
- Medium epic (6-10 tasks): 25-40 minutes
|
|
- Large epic (10+ tasks): 45-60 minutes
|
|
|
|
**Don't rush:** Catching one critical gap pre-implementation saves hours of rework.
|
|
</integration>
|
|
|
|
<resources>
|
|
**Review patterns:**
|
|
- Task too large (>16h) → Break into 4-8h subtasks
|
|
- Vague criteria ("works correctly") → Measurable commands/checks
|
|
- Missing edge cases → Add to Key Considerations with mitigations
|
|
- Placeholder text → Rewrite with actual content
|
|
|
|
**When stuck:**
|
|
- Unsure if task too large → Ask: Can junior complete in one day?
|
|
- Unsure if criteria measurable → Ask: Can I verify with command/code review?
|
|
- Unsure if edge case matters → Ask: Could this fail in production?
|
|
- Unsure if placeholder → Ask: Does this reference other content instead of providing content?
|
|
|
|
**Key principle:** Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.
|
|
</resources>
|