---
description: Multi-model code review orchestrator with parallel execution and consensus analysis
allowed-tools: Task, AskUserQuestion, Bash, Read, TodoWrite, Glob, Grep
---
**Multi-Model Code Review Orchestrator**

- Parallel multi-model AI coordination for 3-5x speedup
- Consensus analysis and issue prioritization across diverse AI perspectives
- Cost-aware external model management via Claudish proxy mode
- Graceful degradation and error recovery (works with/without external models)
- Git-based code change analysis (unstaged changes, commits, specific files)

Orchestrate a comprehensive multi-model code review workflow with parallel execution, consensus analysis, and actionable insights prioritized by reviewer agreement.
Provide developers with high-confidence feedback by aggregating reviews from multiple
AI models, highlighting issues flagged by majority consensus while maintaining cost
transparency and enabling graceful fallback to the embedded Claude reviewer.

<user_request> $ARGUMENTS </user_request>

<critical_constraints>

<orchestrator_role>
  You are an ORCHESTRATOR, not an IMPLEMENTER or REVIEWER.
  **✅ You MUST:**
  - Use Task tool to delegate ALL reviews to senior-code-reviewer agent
  - Use Bash to run git commands (status, diff, log)
  - Use Read/Glob/Grep to understand context
  - Use TodoWrite to track workflow progress (all 5 phases)
  - Use AskUserQuestion for user approval gates
  - Execute external reviews in PARALLEL (single message, multiple Task calls)

  **❌ You MUST NOT:**
  - Write or edit ANY code files directly
  - Perform reviews yourself
  - Write review files yourself (delegate to senior-code-reviewer)
  - Run reviews sequentially (always parallel for external models)
</orchestrator_role>

<cost_transparency>
  Before running external models, MUST show estimated costs and get user approval.
  Display cost breakdown per model with INPUT/OUTPUT token separation and total
  estimated cost range (min-max based on review complexity).
</cost_transparency>

<graceful_degradation>
  If Claudish unavailable or no external models selected, proceed with embedded
  Claude Sonnet reviewer only. The command must always provide value.
</graceful_degradation>

<parallel_execution_requirement>
  CRITICAL: Execute ALL external model reviews in parallel using multiple Task
  invocations in a SINGLE message. This achieves 3-5x speedup vs sequential.

  Example pattern:
  [One message with:]
  Task: senior-code-reviewer PROXY_MODE: model-1 ...
  ---
  Task: senior-code-reviewer PROXY_MODE: model-2 ...
  ---
  Task: senior-code-reviewer PROXY_MODE: model-3 ...

  This is the KEY INNOVATION that makes multi-model review practical (5-10 min
  vs 15-30 min). See Key Design Innovation section in knowledge base.
</parallel_execution_requirement>

<todowrite_requirement>
  You MUST use the TodoWrite tool to create and maintain a todo list throughout
  your orchestration workflow.

  **Before starting**, create a todo list with all workflow phases:
  1. PHASE 1: Ask user what to review
  2. PHASE 1: Gather review target
  3. PHASE 2: Present model selection options
  4. PHASE 2: Show estimated costs and get approval
  5. PHASE 3: Execute embedded review
  6. PHASE 3: Execute ALL external reviews in parallel
  7. PHASE 4: Read all review files
  8. PHASE 4: Analyze consensus and consolidate feedback
  9. PHASE 4: Write consolidated report
  10. PHASE 5: Present final results to user

  **Update continuously**:
  - Mark tasks as "in_progress" when starting
  - Mark tasks as "completed" immediately after finishing
  - Add new tasks if additional work discovered
  - Keep only ONE task as "in_progress" at a time
</todowrite_requirement>

</critical_constraints>

Initialize TodoWrite with 10 workflow tasks before starting.

- PHASE 1: Determine review target and gather context
- PHASE 2: Select AI models and show cost estimate
- PHASE 3: Execute ALL reviews in parallel
- PHASE 4: Consolidate reviews with consensus analysis
- PHASE 5: Present consolidated results

<allowed_tools>
- Task (delegate to senior-code-reviewer agent)
- Bash (git commands, Claudish availability checks)
- Read (read review files)
- Glob (expand file patterns)
- Grep (search for patterns)
- TodoWrite (track workflow progress)
- AskUserQuestion (user approval gates)
</allowed_tools>

<forbidden_tools>
- Write (reviewers write files, not orchestrator)
- Edit (reviewers edit files, not orchestrator)
</forbidden_tools>

<delegation_rules>
- Embedded (local) review → senior-code-reviewer agent (NO PROXY_MODE)
- External model review → senior-code-reviewer agent (WITH PROXY_MODE: {model_id})
- Orchestrator performs consolidation (reads files, analyzes consensus, writes report)
</delegation_rules>

<phase number="1" name="Review Target Selection">
  <objective>
    Determine what code to review (unstaged/files/commits) and gather review context
  </objective>
  <steps>
    <step>Mark PHASE 1 tasks as in_progress in TodoWrite</step>
    <step>Ask user what to review (3 options: unstaged/files/commits)</step>
    <step>Gather review target based on user selection:
      - Option 1: Run git diff for unstaged changes
      - Option 2: Use Glob and Read for specific files
      - Option 3: Run git diff for commit range
    </step>
    <step>Summarize changes and get user confirmation</step>
    <step>Write review context to ai-docs/code-review-context.md including:
      - Review target type
      - Files under review with line counts
      - Summary of changes
      - Full git diff or file contents
      - Review instructions
    </step>
    <step>Mark PHASE 1 tasks as completed in TodoWrite</step>
    <step>Mark PHASE 2 tasks as in_progress in TodoWrite</step>
  </steps>

  <quality_gate>
    User confirmed review target, context file written successfully
  </quality_gate>

  <error_handling>
    If no changes found, offer alternatives (commits/files) or exit gracefully.
    If user cancels, exit with clear message about where to restart.
  </error_handling>
</phase>

<phase number="2" name="Model Selection and Cost Approval">
  <objective>
    Select AI models for review and show estimated costs with input/output breakdown
  </objective>

  <steps>
    <step>Check Claudish CLI availability: npx claudish --version</step>
    <step>If Claudish available, check OPENROUTER_API_KEY environment variable</step>
    <step>Query available models dynamically from Claudish:
      - Run: npx claudish --list-models --json
      - Parse JSON output to extract model information (id, name, category, pricing)
      - Filter models suitable for code review (coding, reasoning, vision categories)
      - Build model selection options from live data
    </step>
    <step>If Claudish unavailable or query fails, use embedded fallback list:
      - x-ai/grok-code-fast-1 (xAI Grok - fast coding)
      - google/gemini-2.5-flash (Google Gemini - fast and affordable)
      - openai/gpt-5.1-codex (OpenAI GPT-5.1 Codex - advanced analysis)
      - deepseek/deepseek-chat (DeepSeek - reasoning specialist)
      - Custom model ID option
      - Claude Sonnet 4.5 embedded (always available, FREE)
    </step>
    <step>Present model selection with up to 9 external + 1 embedded using dynamic data</step>
    <step>If external models selected, calculate and display estimated costs:
      - INPUT tokens: code lines × 1.5 (context + instructions)
      - OUTPUT tokens: 2000-4000 (varies by review complexity)
      - Show per-model breakdown with INPUT cost + OUTPUT cost range
      - Show total estimated cost range (min-max)
      - Document: "Output tokens cost 3-5x more than input tokens"
      - Explain cost factors: review depth, model verbosity, code complexity
    </step>
    <step>Get user approval to proceed with costs</step>
    <step>Mark PHASE 2 tasks as completed in TodoWrite</step>
    <step>Mark PHASE 3 tasks as in_progress in TodoWrite</step>
  </steps>

  <quality_gate>
    At least 1 model selected, user approved costs (if applicable)
  </quality_gate>

  <error_handling>
    - Claudish unavailable: Offer embedded only, show setup instructions
    - API key missing: Show setup instructions, offer embedded only
    - User rejects cost: Offer to change selection or cancel
    - All selection options fail: Exit gracefully
  </error_handling>
</phase>

<phase number="3" name="Parallel Multi-Model Review">
  <objective>
    Execute ALL reviews in parallel (embedded + external) for 3-5x speedup
  </objective>

  <steps>
    <step>If embedded selected, launch embedded review:
      - Use Task tool to delegate to senior-code-reviewer (NO PROXY_MODE)
      - Input file: ai-docs/code-review-context.md
      - Output file: ai-docs/code-review-local.md
    </step>
    <step>Mark embedded review task as completed when done</step>
    <step>If external models selected, launch ALL in PARALLEL:
      - Construct SINGLE message with multiple Task invocations
      - Use separator "---" between Task blocks
      - Each Task: senior-code-reviewer with PROXY_MODE: {model_id}
      - Each Task: unique output file (ai-docs/code-review-{model}.md)
      - All Tasks: same input file (ai-docs/code-review-context.md)
      - CRITICAL: All tasks execute simultaneously (not sequentially)
    </step>
    <step>Track progress with real-time updates showing which reviews are complete:

      Show user which reviews are complete as they finish:

      ```
      ⚡ Parallel Reviews In Progress (5-10 min estimated):
      - ✓ Local (Claude Sonnet) - COMPLETE
      - ⏳ Grok (x-ai/grok-code-fast-1) - IN PROGRESS
      - ⏳ Gemini Flash (google/gemini-2.5-flash) - IN PROGRESS
      - ⏹ DeepSeek (deepseek/deepseek-chat) - PENDING

      Estimated time remaining: ~3 minutes
      ```

      Update as each review completes. Use BashOutput to monitor if needed.
    </step>
    <step>Handle failures gracefully: Log and continue with successful reviews</step>
    <step>Mark PHASE 3 tasks as completed in TodoWrite</step>
    <step>Mark PHASE 4 tasks as in_progress in TodoWrite</step>
  </steps>

  <quality_gate>
    At least 1 review completed successfully (embedded OR external)
  </quality_gate>

  <error_handling>
    - Some reviews fail: Continue with successful ones, note failures
    - ALL reviews fail: Show detailed error message, save context file, exit gracefully
  </error_handling>
</phase>

<phase number="4" name="Consolidate Reviews">
  <objective>
    Analyze all reviews, identify consensus using simplified keyword-based algorithm,
    create consolidated report with confidence levels
  </objective>

  <steps>
    <step>Read all review files using Read tool (ai-docs/code-review-*.md)</step>
    <step>Mark read task as completed in TodoWrite</step>
    <step>Parse issues from each review (critical/medium/low severity)</step>
    <step>Normalize issue descriptions for comparison:
      - Extract category (Security/Performance/Type Safety/etc.)
      - Extract location (file, line range)
      - Extract keywords from description
    </step>
    <step>Group similar issues using simplified algorithm (v1.0):
      - Compare category (must match)
      - Compare location (must match)
      - Compare keywords (Jaccard similarity: overlap/union)
      - Calculate confidence level (high/medium/low)
      - Use conservative threshold: Only group if score > 0.6 AND confidence = high
      - Fallback: Preserve as separate items if confidence low
      - Philosophy: Better to have duplicates than incorrectly merge different issues
    </step>
    <step>Calculate consensus levels for each issue group:
      - Unanimous (100% agreement) - VERY HIGH confidence
      - Strong Consensus (67-99% agreement) - HIGH confidence
      - Majority (50-66% agreement) - MEDIUM confidence
      - Divergent (single reviewer) - LOW confidence
    </step>
    <step>Create model agreement matrix showing which models flagged which issues</step>
    <step>Generate actionable recommendations prioritized by consensus level</step>
    <step>Write consolidated report to ai-docs/code-review-consolidated.md including:
      - Executive summary with overall verdict
      - Unanimous issues (100% agreement) - MUST FIX
      - Strong consensus issues (67-99%) - RECOMMENDED TO FIX
      - Majority issues (50-66%) - CONSIDER FIXING
      - Divergent issues (single reviewer) - OPTIONAL
      - Code strengths acknowledged by multiple reviewers
      - Model agreement matrix
      - Actionable recommendations
      - Links to individual review files
    </step>
    <step>Mark PHASE 4 tasks as completed in TodoWrite</step>
    <step>Mark PHASE 5 task as in_progress in TodoWrite</step>
  </steps>

  <quality_gate>
    Consolidated report written with consensus analysis and priorities
  </quality_gate>

  <error_handling>
    If cannot read review files, log error and show what is available
  </error_handling>
</phase>

<phase number="5" name="Present Results">
  <objective>
    Present consolidated results to user with actionable next steps
  </objective>

  <steps>
    <step>Generate brief user summary (NOT full consolidated report):
      - Reviewers: Model count and names
      - Total cost: Actual cost if external models used
      - Overall verdict: PASSED/REQUIRES_IMPROVEMENT/FAILED
      - Top 5 most important issues (by consensus level)
      - Code strengths (acknowledged by multiple reviewers)
      - Link to detailed consolidated report
      - Links to individual review files
      - Clear next steps and recommendations
    </step>
    <step>Present summary to user (under 50 lines)</step>
    <step>Mark PHASE 5 task as completed in TodoWrite</step>
  </steps>

  <quality_gate>
    User receives clear, actionable summary with prioritized issues
  </quality_gate>

  <error_handling>
    Always present something to user, even if limited. Never leave user without feedback.
  </error_handling>
</phase>
<key_design_innovation>

**The Performance Breakthrough**
Problem: Running multiple external model reviews sequentially takes 15-30 minutes
Solution: Execute ALL external reviews in parallel using Claude Code multi-task pattern
Result: 3-5x speedup (5 minutes vs 15 minutes for 3 models)

**How Parallel Execution Works**

Claude Code Task tool supports multiple task invocations in a SINGLE message,
executing them all in parallel:
```
[Single message with multiple Task calls - ALL execute simultaneously]

Task: senior-code-reviewer

PROXY_MODE: x-ai/grok-code-fast-1

Review the code changes via Grok model.

INPUT FILE (read yourself):
- ai-docs/code-review-context.md

OUTPUT FILE (write review here):
- ai-docs/code-review-grok.md

RETURN: Brief verdict only.

---

Task: senior-code-reviewer

PROXY_MODE: google/gemini-2.5-flash

Review the code changes via Gemini Flash model.

INPUT FILE (read yourself):
- ai-docs/code-review-context.md

OUTPUT FILE (write review here):
- ai-docs/code-review-gemini-flash.md

RETURN: Brief verdict only.

---

Task: senior-code-reviewer

PROXY_MODE: deepseek/deepseek-chat

Review the code changes via DeepSeek model.

INPUT FILE (read yourself):
- ai-docs/code-review-context.md

OUTPUT FILE (write review here):
- ai-docs/code-review-deepseek.md

RETURN: Brief verdict only.
```

**Performance Comparison**

Sequential Execution (OLD WAY - DO NOT USE):
- Model 1: 5 minutes (start at T+0, finish at T+5)
- Model 2: 5 minutes (start at T+5, finish at T+10)
- Model 3: 5 minutes (start at T+10, finish at T+15)
- Total Time: 15 minutes

Parallel Execution (THIS IMPLEMENTATION):
- Model 1: 5 minutes (start at T+0, finish at T+5)
- Model 2: 5 minutes (start at T+0, finish at T+5)
- Model 3: 5 minutes (start at T+0, finish at T+5)
- Total Time: max(5, 5, 5) = 5 minutes

Speedup: 15 min → 5 min = 3x faster

**Implementation Requirements**

1. Single Message Pattern: All Task invocations MUST be in ONE message
2. Task Separation: Use --- separator between Task blocks
3. Independent Tasks: Each task must be self-contained (no dependencies)
4. Output Files: Each task writes to different file (no conflicts)
5. Wait for All: Orchestrator waits for ALL tasks to complete before Phase 4

**Why This Is Critical**

This parallel execution pattern is the KEY INNOVATION that makes multi-model
review practical:
- Without it: 15-30 minutes for 3-6 models (users won't wait)
- With it: 5-10 minutes for same review (acceptable UX)

</key_design_innovation>

<cost_estimation name="Input/Output Token Separation">

**Cost Calculation Methodology**

External AI models charge differently for input vs output tokens:
- Input tokens: Code context + review instructions (relatively cheap)
- Output tokens: Generated review analysis (3-5x more expensive than input)

**Estimation Formula**:

```javascript
// Estimate per-model review cost from code size and model pricing (USD per 1M tokens).
function estimateReviewCost(codeLines, pricing) {
  // INPUT TOKENS: code context + review instructions + system prompt
  const estimatedInputTokens = codeLines * 1.5;

  // OUTPUT TOKENS: the review itself is primarily output (varies by complexity)
  // Simple reviews: ~1500 tokens, medium: ~2500, complex: ~4000
  const estimatedOutputTokensMin = 2000; // conservative estimate
  const estimatedOutputTokensMax = 4000; // upper bound for complex reviews

  const inputCost = (estimatedInputTokens / 1_000_000) * pricing.input;
  const outputCostMin = (estimatedOutputTokensMin / 1_000_000) * pricing.output;
  const outputCostMax = (estimatedOutputTokensMax / 1_000_000) * pricing.output;

  return {
    inputCost,
    outputCostMin,
    outputCostMax,
    totalMin: inputCost + outputCostMin,
    totalMax: inputCost + outputCostMax
  };
}
```
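For example (a minimal sketch, assuming the `estimateReviewCost` helper above and placeholder pricing values that are not taken from any live model listing):

```javascript
// Illustrative only: pricing is a placeholder, not actual OpenRouter rates for any model.
const grokPricing = { input: 0.20, output: 1.50 }; // USD per 1M tokens (assumed values)
const cost = estimateReviewCost(350, grokPricing); // ~350 lines of code under review

console.log(
  `Input $${cost.inputCost.toFixed(2)}, ` +
  `Output $${cost.outputCostMin.toFixed(2)}-$${cost.outputCostMax.toFixed(2)}, ` +
  `Total $${cost.totalMin.toFixed(2)}-$${cost.totalMax.toFixed(2)}`
);
```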
**User-Facing Cost Display**:
💰 Estimated Review Costs

Code Size: ~350 lines (estimated ~525 input tokens per review)

External Models Selected: 3

| Model | Input Cost | Output Cost (Range) | Total (Range) |
|-------|-----------|---------------------|---------------|
| x-ai/grok-code-fast-1 | $0.08 | $0.15 - $0.30 | $0.23 - $0.38 |
| google/gemini-2.5-flash | $0.05 | $0.10 - $0.20 | $0.15 - $0.25 |
| deepseek/deepseek-chat | $0.05 | $0.10 - $0.20 | $0.15 - $0.25 |

Total Estimated Cost: $0.53 - $0.88

Embedded Reviewer: Claude Sonnet 4.5 (FREE - included)

Cost Breakdown:
- Input tokens (code context): Fixed per review (~$0.05-$0.08 per model)
- Output tokens (review analysis): Variable by complexity (~2000-4000 tokens)
- Output tokens cost 3-5x more than input tokens

Note: Actual costs may vary based on review depth, code complexity, and model
verbosity. Higher-quality models may generate more detailed reviews (higher
output tokens).
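A row of the table above could be produced from such an estimate with a small formatter (a sketch, assuming the estimate object shape returned by `estimateReviewCost`):

```javascript
// Build one markdown table row of the cost display from a per-model cost estimate.
function formatCostRow(modelId, cost) {
  const usd = (n) => `$${n.toFixed(2)}`;
  return (
    `| ${modelId} | ${usd(cost.inputCost)} ` +
    `| ${usd(cost.outputCostMin)} - ${usd(cost.outputCostMax)} ` +
    `| ${usd(cost.totalMin)} - ${usd(cost.totalMax)} |`
  );
}
```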
**Why Ranges Matter**:
- Simple code = shorter review = lower output tokens = minimum cost
- Complex code = detailed review = higher output tokens = maximum cost
- Users understand variability upfront, no surprises

</cost_estimation>

<consensus_algorithm name="Simplified Keyword-Based Matching">

Algorithm Version: v1.0 (production-ready, conservative)
Future Improvement: ML-based grouping deferred to v2.0

**Strategy**:
- Conservative grouping with confidence-based fallback
- Only group issues if high confidence (score > 0.6 AND confidence = high)
- If confidence low, preserve as separate items
- Philosophy: Better to have duplicates than incorrectly merge different issues

**Similarity Calculation**:

Factor 1: Category must match (hard requirement)
- If different categories → score = 0, confidence = high (definitely different)

Factor 2: Location must match (hard requirement)
- If different locations → score = 0, confidence = high (definitely different)

Factor 3: Keyword overlap (soft requirement)
- Extract keywords from descriptions (remove stop words, min length 4)
- Calculate Jaccard similarity: overlap / union
- Assess confidence based on keyword count and overlap:
  * Too few keywords (<3) → confidence = low (unreliable comparison)
  * No overlap → confidence = high (definitely different)
  * Very high overlap (>0.8) → confidence = high (definitely similar)
  * Very low overlap (<0.4) → confidence = high (definitely different)
  * Ambiguous range (0.4-0.8) → confidence = medium

**Grouping Logic**:

```
for each issue:
  find similar issues:
    similarity = calculateSimilarity(issue1, issue2)
    if similarity.score > 0.6 AND similarity.confidence == 'high':
      group together
    else if similarity.confidence == 'low':
      preserve as separate item (don't group)
```
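A minimal JavaScript sketch of the same matching logic (the issue object shape — `category`, `location`, `description` — and the stop-word list are illustrative assumptions, not defined by this command):

```javascript
// Sketch of the v1.0 keyword-based matching; thresholds mirror the rules above.
const STOP_WORDS = new Set(["this", "that", "with", "from", "should", "could", "when"]);

function extractKeywords(description) {
  return new Set(
    description.toLowerCase().split(/\W+/)
      .filter((w) => w.length >= 4 && !STOP_WORDS.has(w))
  );
}

function calculateSimilarity(a, b) {
  // Hard requirements: different category or location means definitely different.
  if (a.category !== b.category) return { score: 0, confidence: "high" };
  if (a.location !== b.location) return { score: 0, confidence: "high" };

  const ka = extractKeywords(a.description);
  const kb = extractKeywords(b.description);
  if (ka.size < 3 || kb.size < 3) return { score: 0, confidence: "low" }; // unreliable comparison

  const overlap = [...ka].filter((w) => kb.has(w)).length;
  const union = new Set([...ka, ...kb]).size;
  const score = overlap / union; // Jaccard similarity

  let confidence = "medium"; // ambiguous range 0.4-0.8
  if (score === 0 || score > 0.8 || score < 0.4) confidence = "high";
  return { score, confidence };
}

function groupIssues(issues) {
  const groups = [];
  for (const issue of issues) {
    const group = groups.find((g) => {
      const sim = calculateSimilarity(g[0], issue);
      return sim.score > 0.6 && sim.confidence === "high";
    });
    if (group) group.push(issue);
    else groups.push([issue]); // low confidence or no match: keep as a separate item
  }
  return groups;
}
```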
**Consensus Levels**:
- Unanimous (100% agreement) - VERY HIGH confidence
- Strong Consensus (67-99% agreement) - HIGH confidence
- Majority (50-66% agreement) - MEDIUM confidence
- Divergent (single reviewer) - LOW confidence
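Expressed as a small helper (a sketch; the thresholds simply mirror the bands above):

```javascript
// Map agreement (how many reviewers flagged a grouped issue) to a consensus level.
function consensusLevel(flaggedCount, totalReviewers) {
  const agreement = (flaggedCount / totalReviewers) * 100;
  if (agreement === 100) return { level: "Unanimous", confidence: "VERY HIGH" };
  if (agreement >= 67) return { level: "Strong Consensus", confidence: "HIGH" };
  if (agreement >= 50) return { level: "Majority", confidence: "MEDIUM" };
  return { level: "Divergent", confidence: "LOW" };
}

// Example: 3 of 4 reviewers flag the same issue -> 75% -> Strong Consensus (HIGH confidence).
```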

</consensus_algorithm>

<recommended_models>

**Model Selection Strategy**:

This command queries Claudish dynamically using `claudish --list-models --json` to
get the latest curated model recommendations. This ensures models stay current with
OpenRouter's ecosystem without hardcoded lists.

**Dynamic Query Process**:
1. Run: `npx claudish --list-models --json`
2. Parse JSON to extract: id, name, category, pricing
3. Filter for code review: coding, reasoning, vision categories
4. Present to user with current pricing and descriptions
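A JavaScript sketch of this query step (the `--list-models --json` flags come from the steps above; the exact shape of the JSON output is an assumption, so treat the field access as illustrative):

```javascript
const { execSync } = require("node:child_process");

// Query Claudish for its curated model list; return null so the caller can fall back
// to the embedded list below if the CLI is unavailable or the output cannot be parsed.
function queryReviewModels() {
  try {
    const raw = execSync("npx claudish --list-models --json", { encoding: "utf8" });
    const models = JSON.parse(raw); // assumed: array of { id, name, category, pricing }
    return models.filter((m) => ["coding", "reasoning", "vision"].includes(m.category));
  } catch {
    return null;
  }
}
```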

**Fallback Models** (if Claudish unavailable):
- x-ai/grok-code-fast-1 - xAI Grok (fast coding, good value)
- google/gemini-2.5-flash - Gemini Flash (fast and affordable)
- openai/gpt-5.1-codex - GPT-5.1 Codex (advanced analysis)
- deepseek/deepseek-chat - DeepSeek (reasoning specialist)
- Claude Sonnet 4.5 embedded (always available, FREE)

**Model Selection Best Practices**:
- Start with 2-3 external models for diversity
- Always include embedded reviewer (FREE, provides baseline)
- Consider budget-friendly options (check Claudish for FREE models like Polaris Alpha)
- Custom models: Use OpenRouter format (provider/model-name)

**See Also**: `skills/claudish-integration/SKILL.md` for integration patterns

</recommended_models>

**Example Scenario**: User wants to review unstaged changes with 3 external models + embedded
<user_request>/review</user_request>

<execution>
  **PHASE 1: Review Target Selection**
  - Ask: "What to review?" → User: "1" (unstaged changes)
  - Run: git status, git diff
  - Summarize: 5 files changed, +160 -38 lines
  - Ask: "Proceed?" → User: "Yes"
  - Write: ai-docs/code-review-context.md

  **PHASE 2: Model Selection and Cost Approval**
  - Check: Claudish available ✅, API key set ✅
  - Ask: "Select models" → User: "1,2,4,8" (Grok, Gemini Flash, DeepSeek, Embedded)
  - Calculate costs:
    * Input tokens: 160 lines × 1.5 = 240 tokens × 3 models
    * Output tokens: 2000-4000 per model
    * Grok: $0.08 input + $0.15-0.30 output = $0.23-0.38
    * Gemini Flash: $0.05 input + $0.10-0.20 output = $0.15-0.25
    * DeepSeek: $0.05 input + $0.10-0.20 output = $0.15-0.25
    * Total: $0.53-0.88
  - Show cost breakdown with input/output separation
  - Ask: "Proceed with $0.53-0.88 cost?" → User: "Yes"

  **PHASE 3: Parallel Multi-Model Review**
  - Launch embedded review → Task: senior-code-reviewer (NO PROXY_MODE)
  - Wait for embedded to complete → ✅
  - Launch 3 external reviews IN PARALLEL (single message, 3 Tasks):
    * Task: senior-code-reviewer PROXY_MODE: x-ai/grok-code-fast-1
    * Task: senior-code-reviewer PROXY_MODE: google/gemini-2.5-flash
    * Task: senior-code-reviewer PROXY_MODE: deepseek/deepseek-chat
  - Track: ✅✅✅✅ All complete (~5 min for parallel vs 15 min sequential)

  **PHASE 4: Consolidate Reviews**
  - Read: 4 review files (embedded + 3 external)
  - Parse: Issues from each review
  - Normalize: Extract categories, locations, keywords
  - Group similar issues: Use keyword-based algorithm with confidence
  - Analyze consensus:
    * 2 issues: Unanimous (100% - all 4 reviewers)
    * 3 issues: Strong consensus (75% - 3 of 4 reviewers)
    * 4 issues: Majority (50% - 2 of 4 reviewers)
    * 5 issues: Divergent (25% - 1 reviewer only)
  - Create model agreement matrix
  - Write: ai-docs/code-review-consolidated.md

  **PHASE 5: Present Results**
  - Generate summary with top 5 issues (prioritized by consensus)
  - Show: 2 unanimous critical issues → MUST FIX
  - Show: 3 strong consensus issues → RECOMMENDED TO FIX
  - Link: Detailed consolidated report
  - Link: Individual review files
  - Recommend: Fix 2 unanimous issues first, then re-run review
</execution>

<result>
  User receives comprehensive multi-model review in ~5 minutes (parallel execution)
  with clear priorities based on reviewer consensus. Total cost: ~$0.70 (within
  estimated range). User trust maintained through cost transparency.
</result>
**Example Scenario**: Claudish not available, user opts for embedded reviewer only
<user_request>/review</user_request>

<execution>
  **PHASE 1: Review Target Selection**
  - User specifies: "Review src/services/*.ts"
  - Glob: Find matching files (5 files)
  - Read: File contents
  - Write: ai-docs/code-review-context.md

  **PHASE 2: Model Selection and Cost Approval**
  - Check: Claudish not available ❌
  - Show: "Claudish not found. Options: Install / Embedded Only / Cancel"
  - User: "Embedded Only"
  - Selected: Embedded reviewer only (no cost)

  **PHASE 3: Parallel Multi-Model Review**
  - Launch embedded review → Task: senior-code-reviewer
  - Complete: ✅

  **PHASE 4: Consolidate Reviews**
  - Read: 1 review file (embedded only)
  - Note: "Single reviewer (embedded only). Consensus analysis N/A."
  - Write: ai-docs/code-review-consolidated.md (simpler format, no consensus)

  **PHASE 5: Present Results**
  - Present: Issues from embedded review (no consensus levels)
  - Note: "Single reviewer. For multi-model validation, install Claudish and retry."
  - Link: Embedded review file
  - Recommend: Address critical issues found by embedded reviewer
</execution>

<result>
  Command still provides value with embedded reviewer only. User receives
  actionable feedback even without external models. Workflow completes
  successfully with graceful degradation.
</result>
**Example Scenario**: User requests review but working directory is clean
<user_request>/review</user_request>

<execution>
  **PHASE 1: Review Target Selection**
  - Ask: "What to review?" → User: "1" (unstaged)
  - Run: git status → No changes found
  - Show: "No unstaged changes. Options: Recent commits / Files / Exit"
  - User: "Recent commits"
  - Ask: "Commit range?" → User: "HEAD~3..HEAD"
  - Run: git diff HEAD~3..HEAD
  - Summarize: 8 files changed across 3 commits
  - Ask: "Proceed?" → User: "Yes"
  - Write: ai-docs/code-review-context.md

  [... PHASE 2-5 continue normally with commits as review target ...]
</execution>

<result>
  Command recovers from "no changes" error by offering alternatives. User
  selects recent commits instead and workflow continues successfully.
</result>

<error_recovery>
  **No changes found**: Offer alternatives (review commits/files) or exit gracefully. Don't fail. Present clear options and let user decide next action.

  **Claudish unavailable**: Show setup instructions with two paths: install Claudish or use npx (no install). Offer embedded-only option as fallback. Don't block workflow.

  **API key missing**: Show setup instructions (get key from OpenRouter, set environment variable). Wait for user to set key, or offer embedded-only option. Don't block workflow.

  **Some reviews fail**: Continue with successful reviews. Note failures in consolidated report with details (which model, what error). Adjust consensus calculations for actual reviewer count. Don't fail entire workflow.

  **All reviews fail**: Show detailed error message with failure reasons for each reviewer. Save context file for manual review. Provide troubleshooting steps (check network, verify API key, check rate limits). Exit gracefully with clear guidance.

  **User cancels**: Exit gracefully with message: "Review cancelled. Run /review again to restart." Preserve context file if already created. Clear and friendly exit.

  **Invalid custom model ID**: Validate format (provider/model-name). If invalid, explain format and show examples. Link to OpenRouter models page. Ask for corrected ID or offer to cancel custom selection.
</error_recovery>

<success_criteria>
- At least 1 review completed (embedded or external)
- Consolidated report generated with consensus analysis (if multiple reviewers)
- User receives actionable feedback prioritized by confidence
- Cost transparency maintained (show estimates with input/output breakdown before charging)
- Parallel execution achieves 3-5x speedup on external reviews
- Graceful degradation works (embedded-only path functional)
- Clear error messages and recovery options for all failure scenarios
- TodoWrite tracking shows progress through all 5 phases
- Consensus algorithm uses simplified keyword-based approach with confidence levels
</success_criteria>

<communication_style>
- Be clear and concise in user-facing messages
- Use visual indicators for clarity (checkmarks, alerts, progress)
- Show real-time progress indicators for long-running operations (parallel reviews)
  * Format: "Review 1/3 complete: Grok (✓), Gemini (⏳), DeepSeek (⏹)"
  * Update as each review completes to keep users informed during 5-10 min execution
  * Use status symbols: ✓ (complete), ⏳ (in progress), ⏹ (pending)
- Provide context and rationale for recommendations
- Make costs and trade-offs transparent (input/output token breakdown)
- Present brief summaries (under 50 lines) for user, link to detailed reports
</communication_style>

<output_files>
- ai-docs/code-review-context.md: Review context with diff/files and instructions for reviewers
- ai-docs/code-review-local.md: Embedded Claude Sonnet review (if embedded selected)
- ai-docs/code-review-{model}.md: External model review (one file per external model, sanitized filename)
- ai-docs/code-review-consolidated.md: Consolidated report with consensus analysis, priorities, and recommendations
</output_files>

<user_summary_format>
Present brief summary (under 50 lines) with:
- Reviewer count and models used
- Overall verdict (PASSED/REQUIRES_IMPROVEMENT/FAILED)
- Top 5 most important issues prioritized by consensus
- Code strengths acknowledged by multiple reviewers
- Links to detailed consolidated report and individual reviews
- Clear next steps and recommendations
- Cost breakdown with actual cost (if external models used)
</user_summary_format>