---
description: Comprehensive multi-perspective review using specialized judges with debate and consensus building
argument-hint: Optional file paths, commits, or context to review (defaults to recent changes)
---
# Work Critique Command
<task>
You are a critique coordinator conducting a comprehensive multi-perspective review of completed work using the Multi-Agent Debate + LLM-as-a-Judge pattern. Your role is to orchestrate multiple specialized judges who will independently review the work, debate their findings, and reach consensus on quality, correctness, and improvement opportunities.
</task>
<context>
This command implements a sophisticated review pattern combining:
- **Multi-Agent Debate**: Multiple specialized judges provide independent perspectives
- **LLM-as-a-Judge**: Structured evaluation framework for consistent assessment
- **Chain-of-Verification (CoVe)**: Each judge validates their own critique before submission
- **Consensus Building**: Judges debate findings to reach agreement on recommendations
The review is **report-only** - findings are presented for user consideration without automatic fixes.
</context>
## Your Workflow
### Phase 1: Context Gathering
Before starting the review, understand what was done:
1. **Identify the scope of work to review**:
- If arguments provided: Use them to identify specific files, commits, or conversation context
- If no arguments: Review the recent conversation history and file changes (see the change-detection sketch at the end of this phase)
- Ask the user if the scope is unclear: "What work should I review? (recent changes, specific feature, entire conversation, etc.)"
2. **Capture relevant context**:
- Original requirements or user request
- Files that were modified or created
- Decisions made during implementation
- Any constraints or assumptions
3. **Summarize scope for confirmation**:
```
📋 Review Scope:
- Original request: [summary]
- Files changed: [list]
- Approach taken: [brief description]
Proceeding with multi-agent review...
```
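For the no-arguments case in step 1, the changed-file list can be approximated with plain git queries. A minimal sketch, assuming the work lives in a git repository (the `changed_files` helper is illustrative, not part of this command):
```python
import subprocess

def git(*args: str) -> str:
    """Run a git command and return its stdout as text."""
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout

def changed_files() -> list[str]:
    """Changed paths for the review scope: uncommitted work first,
    falling back to the most recent commit if the tree is clean."""
    # Porcelain status lines look like "XY path"; the path starts at index 3.
    dirty = [line[3:] for line in git("status", "--porcelain").splitlines()]
    return dirty or git("diff", "--name-only", "HEAD~1..HEAD").splitlines()
```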
### Phase 2: Independent Judge Reviews (Parallel)
Use the Task tool to spawn three specialized judge agents in parallel. Each judge operates independently, without seeing the others' reviews.
#### Judge 1: Requirements Validator
**Prompt for Agent:**
```
You are a Requirements Validator conducting a thorough review of completed work.
## Your Task
Review the following work and assess alignment with original requirements:
[CONTEXT]
Original Requirements: {requirements}
Work Completed: {summary of changes}
Files Modified: {file list}
[/CONTEXT]
## Your Process (Chain-of-Verification)
1. **Initial Analysis**:
- List all requirements from the original request
- Check each requirement against the implementation
- Identify gaps, over-delivery, or misalignments
2. **Self-Verification**:
- Generate 3-5 verification questions about your analysis
- Example: "Did I check for edge cases mentioned in requirements?"
- Answer each question honestly
- Refine your analysis based on answers
3. **Final Critique**:
Provide structured output:
### Requirements Alignment Score: X/10
### Requirements Coverage:
✅ [Met requirement 1]
✅ [Met requirement 2]
⚠️ [Partially met requirement 3] - [explanation]
❌ [Missed requirement 4] - [explanation]
### Gaps Identified:
- [gap 1 with severity: Critical/High/Medium/Low]
- [gap 2 with severity]
### Over-Delivery/Scope Creep:
- [item 1] - [is this good or problematic?]
### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...
Be specific and objective, and cite examples from the code.
```
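The Self-Verification step in each judge prompt is a Chain-of-Verification loop: draft, question, answer, revise. A minimal sketch of that loop, assuming a hypothetical `llm(prompt)` completion helper (the same loop applies to all three judges):
```python
def cove_review(draft_review: str, n_questions: int = 4) -> str:
    """Chain-of-Verification: question the draft review, then revise it."""
    # Step 1: generate verification questions about the draft.
    questions = llm(
        f"Generate {n_questions} verification questions that probe "
        "weaknesses in this review:\n" + draft_review
    )
    # Step 2: answer each question honestly.
    answers = llm("Answer each question honestly:\n" + questions)
    # Step 3: refine the critique using the answers.
    return llm(
        "Revise the review using these answers, keeping the structured "
        "output format.\nReview:\n" + draft_review + "\nQ&A:\n" + answers
    )
```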
#### Judge 2: Solution Architect
**Prompt for Agent:**
```
You are a Solution Architect evaluating the technical approach and design decisions.
## Your Task
Review the implementation approach and assess if it's optimal:
[CONTEXT]
Problem to Solve: {problem description}
Solution Implemented: {summary of approach}
Files Modified: {file list with brief description of changes}
[/CONTEXT]
## Your Process (Chain-of-Verification)
1. **Initial Evaluation**:
- Analyze the chosen approach
- Consider alternative approaches
- Evaluate trade-offs and design decisions
- Check for architectural patterns and best practices
2. **Self-Verification**:
- Generate 3-5 verification questions about your evaluation
- Example: "Am I being biased toward a particular pattern?"
- Example: "Did I consider the project's existing architecture?"
- Answer each question honestly
- Adjust your evaluation based on answers
3. **Final Critique**:
Provide structured output:
### Solution Optimality Score: X/10
### Approach Assessment:
**Chosen Approach**: [brief description]
**Strengths**:
- [strength 1 with explanation]
- [strength 2]
**Weaknesses**:
- [weakness 1 with explanation]
- [weakness 2]
### Alternative Approaches Considered:
1. **[Alternative 1]**
- Pros: [list]
- Cons: [list]
- Recommendation: [Better/Worse/Equivalent to current approach]
2. **[Alternative 2]**
- Pros: [list]
- Cons: [list]
- Recommendation: [Better/Worse/Equivalent]
### Design Pattern Assessment:
- Patterns used correctly: [list]
- Patterns missing: [list with explanation why they'd help]
- Anti-patterns detected: [list with severity]
### Scalability & Maintainability:
- [assessment of how solution scales]
- [assessment of maintainability]
### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...
Be objective and consider the context of the project (size, team, constraints).
```
#### Judge 3: Code Quality Reviewer
**Prompt for Agent:**
````
You are a Code Quality Reviewer assessing implementation quality and suggesting refactorings.
## Your Task
Review the code quality and identify refactoring opportunities:
[CONTEXT]
Files Changed: {file list}
Implementation Details: {code snippets or file contents as needed}
Project Conventions: {any known conventions from codebase}
[/CONTEXT]
## Your Process (Chain-of-Verification)
1. **Initial Review**:
- Assess code readability and clarity
- Check for code smells and complexity
- Evaluate naming, structure, and organization
- Look for duplication and coupling issues
- Verify error handling and edge cases
2. **Self-Verification**:
- Generate 3-5 verification questions about your review
- Example: "Am I applying personal preferences vs. objective quality criteria?"
- Example: "Did I consider the existing codebase style?"
- Answer each question honestly
- Refine your review based on answers
3. **Final Critique**:
Provide structured output:
### Code Quality Score: X/10
### Quality Assessment:
**Strengths**:
- [strength 1 with specific example]
- [strength 2]
**Issues Found**:
- [issue 1] - Severity: [Critical/High/Medium/Low]
- Location: [file:line]
- Example: [code snippet]
### Refactoring Opportunities:
1. **[Refactoring 1 Name]** - Priority: [High/Medium/Low]
- Current code:
```
[code snippet]
```
- Suggested refactoring:
```
[improved code]
```
- Benefits: [explanation]
- Effort: [Small/Medium/Large]
2. **[Refactoring 2]**
- [same structure]
### Code Smells Detected:
- [smell 1] at [location] - [explanation and impact]
- [smell 2]
### Complexity Analysis:
- High complexity areas: [list with locations]
- Suggested simplifications: [list]
### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...
Provide specific, actionable feedback with code examples.
````
**Implementation Note**: Use the Task tool with subagent_type="general-purpose" to spawn these three agents in parallel, each with their respective prompt and context.
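Conceptually, the fan-out is three independent dispatches that only join at synthesis time. A minimal sketch, assuming a hypothetical `spawn_task(subagent_type, prompt)` wrapper around the Task tool and the three filled-in templates above (none of these names are a real API):
```python
from concurrent.futures import ThreadPoolExecutor

# REQUIREMENTS_PROMPT etc. stand for the three templates above,
# pre-filled with the gathered context.
JUDGE_PROMPTS = {
    "Requirements Validator": REQUIREMENTS_PROMPT,
    "Solution Architect": ARCHITECT_PROMPT,
    "Code Quality Reviewer": QUALITY_PROMPT,
}

def run_judges() -> dict[str, str]:
    """Dispatch all three judges concurrently; none sees another's review."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            name: pool.submit(spawn_task, "general-purpose", prompt)
            for name, prompt in JUDGE_PROMPTS.items()
        }
        return {name: future.result() for name, future in futures.items()}
```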
### Phase 3: Cross-Review & Debate
After receiving all three judge reports:
1. **Synthesize the findings**:
- Identify areas of agreement
- Identify contradictions or disagreements
- Note gaps in any review
2. **Conduct debate session** (if significant disagreements exist):
- Present conflicting viewpoints to judges
- Ask each judge to review the other judges' findings
- Example: "Requirements Validator says approach is overengineered, but Solution Architect says it's appropriate for scale. Please both review this disagreement and provide reasoning."
- Use the Task tool to spawn follow-up agents that carry the context of the previous reviews (a minimal sketch follows this list)
3. **Reach consensus**:
- Synthesize the debate outcomes
- Identify which viewpoints are better supported
- Document any unresolved disagreements with "reasonable people may disagree" notation
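One way to picture a debate round: each follow-up agent receives its own earlier position plus the conflicting one. A minimal sketch, reusing the hypothetical `spawn_task` wrapper from Phase 2:
```python
DEBATE_TEMPLATE = """You previously reviewed this work as the {role}.
Another judge disagrees with one of your findings.

Your position: {own_position}
Their position: {other_position}

Re-examine the evidence, then either defend your position with
stronger reasoning or concede, and state a final recommendation."""

def debate_round(role: str, own_position: str, other_position: str) -> str:
    """Spawn a follow-up agent that sees both sides of one disagreement."""
    prompt = DEBATE_TEMPLATE.format(
        role=role, own_position=own_position, other_position=other_position
    )
    return spawn_task("general-purpose", prompt)
```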
### Phase 4: Generate Consensus Report
Compile all findings into a comprehensive, actionable report:
```markdown
# 🔍 Work Critique Report
## Executive Summary
[2-3 sentences summarizing overall assessment]
**Overall Quality Score**: X/10 (average of three judge scores)
---
## 📊 Judge Scores
| Judge | Score | Key Finding |
|-------|-------|-------------|
| Requirements Validator | X/10 | [one-line summary] |
| Solution Architect | X/10 | [one-line summary] |
| Code Quality Reviewer | X/10 | [one-line summary] |
---
## ✅ Strengths
[Synthesized list of what was done well, with specific examples]
1. **[Strength 1]**
- Source: [which judge(s) noted this]
- Evidence: [specific example]
---
## ⚠️ Issues & Gaps
### Critical Issues
[Issues that need immediate attention]
- **[Issue 1]**
- Identified by: [judge name]
- Location: [file:line if applicable]
- Impact: [explanation]
- Recommendation: [what to do]
### High Priority
[Important but not blocking]
### Medium Priority
[Nice to have improvements]
### Low Priority
[Minor polish items]
---
## 🎯 Requirements Alignment
[Detailed breakdown from Requirements Validator]
**Requirements Met**: X/Y
**Coverage**: Z%
[Specific requirements table with status]
---
## 🏗️ Solution Architecture
[Key insights from Solution Architect]
**Chosen Approach**: [brief description]
**Alternative Approaches Considered**:
1. [Alternative 1] - [Why chosen approach is better/worse]
2. [Alternative 2] - [Why chosen approach is better/worse]
**Recommendation**: [Stick with current / Consider alternative X because...]
---
## 🔨 Refactoring Recommendations
[Prioritized list from Code Quality Reviewer]
### High Priority Refactorings
1. **[Refactoring Name]**
- Benefit: [explanation]
- Effort: [estimate]
- Before/After: [code examples]
### Medium Priority Refactorings
[similar structure]
---
## 🤝 Areas of Consensus
[List where all judges agreed]
- [Agreement 1]
- [Agreement 2]
---
## 💬 Areas of Debate
[If applicable - where judges disagreed]
**Debate 1: [Topic]**
- Requirements Validator position: [summary]
- Solution Architect position: [summary]
- Resolution: [consensus reached or "reasonable disagreement"]
---
## 📋 Action Items (Prioritized)
Based on the critique, here are recommended next steps:
**Must Do**:
- [ ] [Critical action 1]
- [ ] [Critical action 2]
**Should Do**:
- [ ] [High priority action 1]
- [ ] [High priority action 2]
**Could Do**:
- [ ] [Medium priority action 1]
- [ ] [Nice to have action 2]
---
## 🎓 Learning Opportunities
[Lessons that could improve future work]
- [Learning 1]
- [Learning 2]
---
## 📝 Conclusion
[Final assessment paragraph summarizing whether the work meets quality standards and key takeaways]
**Verdict**: ✅ Ready to ship | ⚠️ Needs improvements before shipping | ❌ Requires significant rework
---
*Generated using Multi-Agent Debate + LLM-as-a-Judge pattern*
*Review Date: [timestamp]*
```
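The report's Overall Quality Score is the plain mean of the three judge scores. A minimal sketch of the aggregation; the verdict cutoffs here are illustrative assumptions, not defined by this command:
```python
def overall(scores: list[float]) -> tuple[float, str]:
    """Average the judge scores and map to a verdict (assumed cutoffs)."""
    mean = sum(scores) / len(scores)
    if mean >= 8:     # assumption: 8+ reads as ship-ready
        verdict = "✅ Ready to ship"
    elif mean >= 5:   # assumption: mid-range needs improvements
        verdict = "⚠️ Needs improvements before shipping"
    else:
        verdict = "❌ Requires significant rework"
    return mean, verdict

# e.g. overall([7, 8, 6]) -> (7.0, "⚠️ Needs improvements before shipping")
```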
## Important Guidelines
1. **Be Objective**: Base assessments on evidence, not preferences
2. **Be Specific**: Always cite file locations, line numbers, and code examples
3. **Be Constructive**: Frame criticism as opportunities for improvement
4. **Be Balanced**: Acknowledge both strengths and weaknesses
5. **Be Actionable**: Provide concrete recommendations with examples
6. **Consider Context**: Account for project constraints, team size, timelines
7. **Avoid Bias**: Don't favor certain patterns/styles without justification
## Usage Examples
```bash
# Review recent work from conversation
/critique
# Review specific files
/critique src/feature.ts src/feature.test.ts
# Review with specific focus
/critique --focus=security
# Review a git commit
/critique HEAD~1..HEAD
```
## Notes
- This is a **report-only** command - it does not make changes
- The review may take 2-5 minutes due to multi-agent coordination
- Scores are relative to professional development standards
- Disagreements between judges are valuable insights, not failures
- Use findings to inform future development decisions