Initial commit
323
skills/validating-review-feedback/test-scenarios.md
Normal file
@@ -0,0 +1,323 @@
# Test Scenarios: Validating Review Feedback

## Baseline Test: Agent Misinterprets Review Recommendations

**Goal:** Prove that without gatekeeper, agents misinterpret code review feedback as permission to skip BLOCKING issues.

### Setup

**Mock plan:**
```markdown
# Orbital Mechanics Feature

## Task 1: Add ArrivalBurn state
- Implement state transition
- Calculate burn parameters

## Task 2: Add tests for ArrivalBurn
- Unit tests for state logic
- Integration tests for burn calculations
```

**Mock review (from code-review-agent):**
```markdown
# Code Review - Batch 2

## BLOCKING (Must Fix Before Merge)

### No Lambert recalculation at ArrivalBurn entry
The current implementation assumes Lambert solution from transfer is still valid when entering ArrivalBurn state. For long transfers, this can become stale.

**Recommendation:** Option B - Add documentation explaining why recalculation is skipped for this iteration. Include TODO for future enhancement.

### Missing unit tests for state transitions
No test coverage for ArrivalBurn state entry/exit logic.

## NON-BLOCKING (Can Be Deferred)

### Variable naming: 'data' is too generic
Consider renaming to 'burnParameters' for clarity.
```
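
Both tests depend on reading a review of this shape as a list of BLOCKING and NON-BLOCKING items. The following is a minimal sketch of that split, assuming the layout of the mock review above (`##` headings for severity, `###` headings for items). `ReviewItem` and `parse_review` are hypothetical names used for illustration; they are not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:  # hypothetical container, not defined by the skill
    severity: str  # "BLOCKING" or "NON-BLOCKING"
    title: str     # text of the `###` heading
    lines: list[str] = field(default_factory=list)

    @property
    def body(self) -> str:
        return "\n".join(self.lines).strip()


def parse_review(markdown: str) -> list[ReviewItem]:
    """Group `###` items under the `##` severity section that encloses them."""
    items: list[ReviewItem] = []
    severity = None  # set by the most recent `##` heading
    current = None   # item whose body lines are being collected
    for line in markdown.splitlines():
        if line.startswith("## "):
            heading = line[3:].strip()
            if heading.startswith("NON-BLOCKING"):
                severity = "NON-BLOCKING"
            elif heading.startswith("BLOCKING"):
                severity = "BLOCKING"
            else:
                severity = None
            current = None  # a new section closes the previous item
        elif line.startswith("### ") and severity:
            current = ReviewItem(severity, line[4:].strip())
            items.append(current)
        elif current is not None:
            current.lines.append(line)
    return items


# Abridged copy of the mock review, used only to exercise the sketch.
REVIEW = """\
# Code Review - Batch 2

## BLOCKING (Must Fix Before Merge)

### No Lambert recalculation at ArrivalBurn entry
For long transfers, this can become stale.

**Recommendation:** Option B - Add documentation. Include TODO.

### Missing unit tests for state transitions
No test coverage for ArrivalBurn state entry/exit logic.

## NON-BLOCKING (Can Be Deferred)

### Variable naming: 'data' is too generic
Consider renaming to 'burnParameters' for clarity.
"""

for item in parse_review(REVIEW):
    print(item.severity, "|", item.title)
```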

### Dispatch rust-agent WITHOUT gatekeeper

**Prompt:**
```
Fix all BLOCKING issues found in Batch 2 code review.

Review file: {path-to-review}
Plan file: {path-to-plan}
```

### Expected Agent Failure

**Agent reasoning (flawed):**
1. Sees "Recommendation: Option B"
2. Interprets as "skip recalculation, documentation is enough"
3. Thinks "review explained it, I don't need to add docs"
4. Reports: "✅ All blocking issues resolved"

**What agent actually fixes:**
- ✅ Missing unit tests (clearly actionable)
- ❌ Lambert recalculation (skipped entirely, no documentation added)

**Why this fails:**
- BLOCKING = must resolve (either implement OR document, not skip)
- Agent confused "solution suggestion" with "permission to ignore"
- No checkpoint to validate interpretation

### Success Criteria for Baseline

❌ Agent skips BLOCKING item entirely
❌ No user consultation on ambiguous recommendation
❌ Reports completion despite unresolved BLOCKING issue

**Baseline proves:** Without gatekeeper, agents misinterpret review feedback and skip BLOCKING issues when recommendations suggest alternatives.

---

## With-Skill Test: Gatekeeper Enforces Resolution

**Goal:** Verify gatekeeper prevents the baseline failure by forcing explicit categorization and user decisions.

### Same Setup (plan + review from baseline)

### Dispatch gatekeeper agent BEFORE rust-agent

**Prompt:**
```
Validate review feedback against plan.

Plan file: {path-to-plan}
Review file: {path-to-review}
Batch: 2
```

### Expected Gatekeeper Behavior

**Phase 1: Parse review**
- Extract 2 BLOCKING items
- Extract 1 NON-BLOCKING item

**Phase 2: Validate against plan**
- Item 1 (Lambert recalculation):
  - **Unclear**: Recommendation suggests documentation, but is that in scope?
  - Plan says "calculate burn parameters" but doesn't specify recalculation strategy
- Item 2 (missing tests):
  - **In-scope**: Task 2 explicitly requires tests

**Phase 3: Present misalignment to user**

Uses AskUserQuestion:
```
Question: "Should we address this BLOCKING issue in the current scope?"

BLOCKING Item: No Lambert recalculation at ArrivalBurn entry
Categorization: Unclear
Reasoning: Plan requires burn calculations but doesn't specify recalculation strategy. Review recommends documentation alternative.

Options:
- [FIX] Yes, fix now (add documentation as recommended)
- [WONTFIX] No, reject feedback
- [DEFERRED] Defer to follow-up
```
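
The question above is raised only for BLOCKING items that are not clearly in scope. A rough sketch of that routing rule, as the scenarios in this file exercise it, might look like the following. `Scope`, `route`, and the `"ASK_USER"` sentinel are hypothetical names; deciding which `Scope` applies is the gatekeeper's judgment and is deliberately not modeled here.

```python
from enum import Enum


class Scope(Enum):  # categorizations used in these scenarios
    IN_SCOPE = "In-scope"
    OUT_OF_SCOPE = "Out-of-scope"
    UNCLEAR = "Unclear"


def route(severity: str, scope: Scope) -> str:
    """Return a tag, or "ASK_USER" when the gatekeeper must not decide alone."""
    if severity == "NON-BLOCKING":
        return "[DEFERRED]"  # NON-BLOCKING items are deferred by default
    if scope is Scope.IN_SCOPE:
        return "[FIX]"       # the plan already requires this work
    # Unclear or out-of-scope BLOCKING items: only the user may choose
    # between [FIX], [WONTFIX] and [DEFERRED].
    return "ASK_USER"


print(route("BLOCKING", Scope.UNCLEAR))
print(route("BLOCKING", Scope.IN_SCOPE))
```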

**User chooses: [FIX]** (accepts recommendation to document)

**Phase 4: Annotate review**

Gatekeeper modifies review file:
```markdown
## BLOCKING (Must Fix Before Merge)

### [FIX] No Lambert recalculation at ArrivalBurn entry
The current implementation assumes Lambert solution from transfer is still valid when entering ArrivalBurn state. For long transfers, this can become stale.

**Recommendation:** Option B - Add documentation explaining why recalculation is skipped for this iteration. Include TODO for future enhancement.

(Gatekeeper: In-scope - user approved Option B documentation approach)

### [FIX] Missing unit tests for state transitions
No test coverage for ArrivalBurn state entry/exit logic.

(Gatekeeper: In-scope - Task 2 requires tests)

## NON-BLOCKING (Can Be Deferred)

(Gatekeeper: All NON-BLOCKING items deferred by default)

### [DEFERRED] Variable naming: 'data' is too generic
Consider renaming to 'burnParameters' for clarity.
```
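
The annotation step is mechanical once the decisions exist. Below is a small sketch of it, assuming decisions are keyed by the exact `###` heading text; `annotate` and the `decisions` mapping are hypothetical. Unlike the example above, this sketch attaches a note to every decided item, including NON-BLOCKING ones, rather than one note per section.

```python
def annotate(review: str, decisions: dict[str, tuple[str, str]]) -> str:
    """Prefix each decided `###` heading with its tag and append a note.

    `decisions` maps a heading title to (tag, note).
    """
    out: list[str] = []
    note = None  # note owed to the item currently being copied

    def flush() -> None:
        nonlocal note
        if note is None:
            return
        while out and out[-1] == "":
            out.pop()  # drop blank lines so the note sits right after the body
        out.extend(["", f"(Gatekeeper: {note})", ""])
        note = None

    for line in review.splitlines():
        if line.startswith("#"):
            flush()  # any new heading closes the previous item
        title = line[4:].strip() if line.startswith("### ") else None
        if title in decisions:
            tag, note = decisions[title]
            line = f"### {tag} {title}"
        out.append(line)
    flush()
    return "\n".join(out).rstrip() + "\n"


# Abridged sample, used only to exercise the sketch.
SAMPLE = """\
## BLOCKING (Must Fix Before Merge)

### Missing unit tests for state transitions
No test coverage for ArrivalBurn state entry/exit logic.

## NON-BLOCKING (Can Be Deferred)

### Variable naming: 'data' is too generic
Consider renaming to 'burnParameters' for clarity.
"""

DECISIONS = {
    "Missing unit tests for state transitions":
        ("[FIX]", "In-scope - Task 2 requires tests"),
    "Variable naming: 'data' is too generic":
        ("[DEFERRED]", "NON-BLOCKING, deferred by default"),
}

print(annotate(SAMPLE, DECISIONS))
```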

**Phase 5: Update plan**

No BLOCKING items were deferred (both are marked [FIX]), so the plan is left unchanged.

**Phase 6: Return summary**

```
Gatekeeper Validation Complete - Batch 2

BLOCKING Items:
- 2 marked [FIX] (Lambert documentation, unit tests)
- 0 marked [DEFERRED]
- 0 marked [WONTFIX]

NON-BLOCKING Items:
- 1 marked [DEFERRED] (variable naming)

Plan Status:
- Deferred items added: no
- Plan revision needed: no

Files Updated:
- Annotated review: {review-path}

Next Steps for Orchestrator:
Proceed to rust-agent with annotated review. Fix ONLY [FIX] items.
```
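
The counts in this summary can be derived directly from the annotated review instead of being tracked separately. Here is a sketch under the assumption that every decided item heading has the form `### [TAG] title`; `summarize` is a hypothetical helper, not something the skill defines.

```python
import re
from collections import Counter

# Matches a tagged item heading such as "### [FIX] Missing unit tests ..."
TAG = re.compile(r"^### \[(FIX|DEFERRED|WONTFIX)\] ")


def summarize(annotated: str) -> dict[str, Counter]:
    """Count tags per severity section of an annotated review."""
    counts = {"BLOCKING": Counter(), "NON-BLOCKING": Counter()}
    section = None
    for line in annotated.splitlines():
        if line.startswith("## NON-BLOCKING"):
            section = "NON-BLOCKING"
        elif line.startswith("## BLOCKING"):
            section = "BLOCKING"
        elif section and (m := TAG.match(line)):
            counts[section][m.group(1)] += 1
    return counts


# Headings only; bodies and gatekeeper notes are omitted for brevity.
ANNOTATED = """\
## BLOCKING (Must Fix Before Merge)
### [FIX] No Lambert recalculation at ArrivalBurn entry
### [FIX] Missing unit tests for state transitions
## NON-BLOCKING (Can Be Deferred)
### [DEFERRED] Variable naming: 'data' is too generic
"""

counts = summarize(ANNOTATED)
for tag in ("FIX", "DEFERRED", "WONTFIX"):
    print(f"- {counts['BLOCKING'][tag]} marked [{tag}]")
```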

### Dispatch rust-agent WITH annotated review

**Prompt:**
```
Fix ONLY items marked [FIX] in the annotated review.
Do NOT address items marked [DEFERRED] or [WONTFIX].

Review file: {path-to-annotated-review}
```
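
Because the tags are plain text, an orchestrator could also check the agent's completion report against them, which is the check the baseline run lacks. The sketch below assumes the agent reports the titles it resolved; `fix_items` and `unresolved` are hypothetical helpers for illustration.

```python
import re

# Captures the title of every heading tagged [FIX].
FIX_HEADING = re.compile(r"^### \[FIX\] (.+)$", re.MULTILINE)


def fix_items(annotated: str) -> list[str]:
    """Titles the agent is required to address, in document order."""
    return [title.strip() for title in FIX_HEADING.findall(annotated)]


def unresolved(annotated: str, reported_done: set[str]) -> list[str]:
    """[FIX] items the agent did not report as resolved."""
    return [t for t in fix_items(annotated) if t not in reported_done]


# Headings only; bodies and gatekeeper notes are omitted for brevity.
ANNOTATED = """\
## BLOCKING (Must Fix Before Merge)
### [FIX] No Lambert recalculation at ArrivalBurn entry
### [FIX] Missing unit tests for state transitions
## NON-BLOCKING (Can Be Deferred)
### [DEFERRED] Variable naming: 'data' is too generic
"""

# The baseline failure: tests were written, the Lambert item was skipped.
print(unresolved(ANNOTATED, {"Missing unit tests for state transitions"}))
```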

### Expected Agent Success

**Agent sees:**
- [FIX] Lambert recalculation → Add Option B documentation
- [FIX] Missing tests → Write unit tests
- [DEFERRED] Variable naming → SKIP

**Agent reasoning:**
1. Clear [FIX] tag = must address
2. Review includes "Option B documentation" recommendation
3. Implements: Add doc comment explaining no recalculation + TODO
4. Implements: Add unit tests
5. Reports: "✅ All [FIX] items resolved"

**What agent actually fixes:**
- ✅ Lambert recalculation (documentation added per Option B)
- ✅ Missing unit tests
- ⏭️ Variable naming (correctly skipped, marked [DEFERRED])

### Success Criteria

✅ Gatekeeper identifies unclear item (Lambert recalculation)
✅ Gatekeeper uses AskUserQuestion (not auto-deciding)
✅ User explicitly approves Option B approach
✅ Review annotated with [FIX] tags and clarifying notes
✅ rust-agent sees unambiguous instructions
✅ Both BLOCKING items resolved correctly

**With-skill proves:** Gatekeeper prevents misinterpretation by forcing explicit categorization and user validation of ambiguous feedback.

---

## Additional Test Scenario: Scope Creep Prevention

**Goal:** Verify gatekeeper blocks out-of-scope BLOCKING feedback from derailing plan.

### Setup

**Mock plan:**
```markdown
# Auth Feature Plan

## Task 1: Add basic username/password auth
- Login endpoint
- Password hashing
- Session creation

## Task 2: Add session validation middleware
- Check session on protected routes
- Return 401 if invalid
```

**Mock review:**
```markdown
# Code Review - 2025-10-19

## BLOCKING (Must Fix Before Merge)

### Security vulnerability: passwords stored in plain text
The current implementation stores passwords without hashing.

### SRP violation: auth handler does too much
The handleAuth function validates input, hashes passwords, creates sessions, and writes to DB. Should be split into separate functions.

### Missing tests for session validation
No test coverage for the middleware in Task 2.

## NON-BLOCKING (Can Be Deferred)

### Variable naming: 'data' is too generic
Consider renaming to 'userData' for clarity.
```

### Expected Gatekeeper Behavior

**Validation:**
- Security vulnerability → In-scope (Task 1 requires password hashing)
- SRP violation → **Out-of-scope** (plan doesn't mention code architecture refactoring)
- Missing tests → In-scope (they cover the Task 2 middleware)

**User question for SRP violation:**
```
BLOCKING Item: SRP violation: auth handler does too much
Categorization: Out-of-scope
Reasoning: Plan focuses on basic auth implementation. Architectural refactoring not mentioned in plan scope.

Options:
- [FIX] Yes, fix now
- [WONTFIX] No, reject feedback
- [DEFERRED] Defer to follow-up
```

**User chooses: [DEFERRED]**

**Annotated review:**
```markdown
## BLOCKING (Must Fix Before Merge)

### [FIX] Security vulnerability: passwords stored in plain text
...
(Gatekeeper: In-scope - Task 1 requires password hashing)

### [DEFERRED] SRP violation: auth handler does too much
...
(Gatekeeper: Out-of-scope - architectural refactoring not in current plan)

### [FIX] Missing tests for session validation
...
(Gatekeeper: In-scope - Task 2 requires tests)
```

**Plan updated with Deferred section:**
```markdown
---

## Deferred Items

### From Batch 1 Review (2025-10-19-review.md)
- **[DEFERRED]** SRP violation in auth handler
  - Source: Task 1
  - Severity: BLOCKING (architectural)
  - Reason: Out of scope for basic auth implementation
```
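
A section like the one above can be generated from the user's decisions and appended to the plan file. The following is one possible sketch; `Deferred` and `deferred_section` are hypothetical names, and the field layout simply mirrors the example. It returns an empty string when nothing was deferred, which corresponds to leaving the plan unchanged.

```python
from dataclasses import dataclass


@dataclass
class Deferred:  # hypothetical record of one deferred review item
    title: str
    source: str
    severity: str
    reason: str


def deferred_section(batch: int, review_file: str, items: list[Deferred]) -> str:
    """Render the Markdown block to append to the plan, or "" if nothing is deferred."""
    if not items:
        return ""
    lines = [
        "---",
        "",
        "## Deferred Items",
        "",
        f"### From Batch {batch} Review ({review_file})",
    ]
    for item in items:
        lines += [
            f"- **[DEFERRED]** {item.title}",
            f"  - Source: {item.source}",
            f"  - Severity: {item.severity}",
            f"  - Reason: {item.reason}",
        ]
    return "\n".join(lines) + "\n"


srp = Deferred(
    title="SRP violation in auth handler",
    source="Task 1",
    severity="BLOCKING (architectural)",
    reason="Out of scope for basic auth implementation",
)
print(deferred_section(1, "2025-10-19-review.md", [srp]))
```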

### Success Criteria

✅ Gatekeeper identifies SRP violation as out-of-scope
✅ User makes explicit decision to defer
✅ Deferred item tracked in plan
✅ rust-agent fixes only the 2 [FIX] items and skips the SRP violation
✅ Plan remains focused on original scope

**Proves:** Gatekeeper prevents scope creep by getting user validation before adding work beyond plan.