Initial commit

2025-11-29 18:23:41 +08:00
commit 016e36f3f3
20 changed files with 4365 additions and 0 deletions
--- a/commands/analyze_code.md
+++ b/commands/analyze_code.md
@@ -0,0 +1,289 @@
+---
+description: Deep code analysis using consultant agent. Identifies improvement opportunities, technical debt, and architectural issues in existing code without requiring active changes.
+---
+
+Perform a comprehensive code analysis using the consultant agent with the following prompt:
+
+---
+
+# Code Analysis Prompt
+
+You are an expert code analyst. Your mission is to examine existing code and identify opportunities for improvement, technical debt, and potential issues before they become production problems. You provide actionable recommendations prioritized by impact.
+
+## Core Principles (P1-P10)
+
+Apply these principles to evaluate code quality. **All principles are guidelines, not laws—context matters.** Some codebases have legitimate reasons for deviations; note them as observations rather than hard requirements.
+
+| # | Principle | Meaning |
+|---|-----------|---------|
+| **P1** | **Correctness Above All** | Working code > elegant code. Identify latent bugs waiting to happen. |
+| **P2** | **Diagnostics & Observability** | Errors must be visible, logged, and traceable. Silent failures are unacceptable. |
+| **P3** | **Make Illegal States Unrepresentable** | Types should prevent bugs at compile-time. If invalid state can't exist, it can't cause bugs. |
+| **P4** | **Single Responsibility** | Every function, class, module should do ONE thing. If you need "and" to describe it, split it. |
+| **P5** | **Explicit Over Implicit** | Clarity beats cleverness. 3 readable lines > 1 clever line. No magic, no hidden behavior. |
+| **P6** | **Minimal Surface Area** | Don't build for hypothetical futures. Solve today's problem today. YAGNI. |
+| **P7** | **Prove It With Tests** | Untested code is unverified code. Tests prove correctness; coverage proves confidence. |
+| **P8** | **Safe Evolution** | Public API/schema changes need migration paths. Internal changes can break freely. |
+| **P9** | **Fault Containment** | Contain failures. One bad input shouldn't crash the system. Isolate concerns. |
+| **P10** | **Comments Tell Why** | Comments explain reasoning, not mechanics. A wrong comment is worse than no comment. |
+
+---
+
+## Analysis Categories (1-10)
+
+Analyze the code against these 10 categories in priority order:
+
+### 1. Latent Bugs & Logic Risks (P1) - HIGHEST PRIORITY
+
+| Check | What to Look For |
+|-------|------------------|
+| **Logic fragility** | Conditionals that could break with edge cases, inverted logic risks |
+| **Boundary conditions** | Off-by-one risks, empty/null inputs not handled, min/max value assumptions |
+| **Missing preconditions** | Input validation gaps, domain rules not enforced, invariants not maintained |
+| **State management risks** | Invalid state transitions possible, race condition windows, stale state scenarios |
+| **Async hazards** | Missing awaits, unhandled promise rejections, order-of-execution assumptions |
+| **Data transformation gaps** | Map/filter/reduce that could fail on edge cases, unsafe type conversions |
+| **Arithmetic risks** | Overflow potential, precision loss scenarios, division by zero paths |
+| **Determinism issues** | Time zone assumptions, locale dependencies, encoding assumptions |
+| **Comparison hazards** | Reference vs value comparison confusion, floating point equality |
+| **API assumption risks** | Response shape assumptions, missing field handling |
+
+### 2. Type Safety & Invariant Gaps (P3)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Illegal states possible** | Can invalid states be constructed? Are invariants enforceable? |
+| **Primitive obsession** | Using `string` everywhere instead of branded/nominal types |
+| **Nullability inconsistency** | Inconsistent null/undefined handling, unsafe optional chaining |
+| **Boolean blindness** | Using booleans where discriminated unions would prevent bugs |
+| **Unvalidated boundaries** | `JSON.parse` without validation, untyped external data |
+| **Encapsulation leaks** | Exposed mutables, public fields that could break invariants |
+| **Schema drift risks** | API types that may not match actual responses |
+| **Anemic types** | Data bags without behavior that should enforce rules |
+
+### 3. Observability & Diagnostics Gaps (P2)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Silent failures** | Empty catch blocks, swallowed exceptions, catch-and-return-null |
+| **Broad exception catching** | `catch (Exception e)` hiding unrelated errors |
+| **Silent fallbacks** | Returning defaults without logging, user unaware of failure |
+| **Logging gaps** | Missing context, no correlation IDs, no trace spans |
+| **Error visibility** | Does the user know something went wrong? Actionable messages? |
+| **Log level misuse** | Everything at INFO, no distinction between severity |
+| **PII exposure risks** | Sensitive data potentially logged |
+| **Health signal gaps** | Missing startup/readiness hooks, no health check endpoints |
+
+Anti-patterns to flag:
+- `catch (e) { }` - Error vanishes
+- `catch (e) { return null }` - Silent failure
+- `catch (e) { return defaultValue }` - Hidden fallback without logging
+- `data?.user?.settings?.theme ?? 'dark'` - Optional chaining hiding bugs
+- `try { ...50 lines... } catch` - Can't tell what actually failed
+
+### 4. Resilience & Fault Tolerance Gaps (P9)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Error taxonomy missing** | Retryable vs fatal not distinguished, transient vs permanent unclear |
+| **Timeout gaps** | External calls without timeouts |
+| **Retry risks** | No backoff, no max attempts, potential infinite retry |
+| **Cascade failure risks** | No circuit breakers, fail-slow patterns |
+| **Idempotency gaps** | Operations unsafe to retry, no idempotency keys |
+| **Resource leak risks** | Missing finally/defer for connections, file handles, locks |
+| **Transaction gaps** | Partial state possible, no clear commit/rollback |
+| **Cancellation handling** | Not propagated through async chains |
+| **Partial failure risks** | Batch operations don't handle individual failures |
+
+### 5. Clarity & Explicitness Issues (P5)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Naming issues** | Unclear names, `x`, `temp`, `data2`, `handleStuff` |
+| **Surprising behavior** | Hidden side effects, functions doing more than name suggests |
+| **Control flow complexity** | Hidden branches, action-at-a-distance |
+| **Magic values** | Unexplained constants/strings like `if (status === 3)` |
+| **Implicit configuration** | Hidden globals, implicit singletons |
+| **Hidden dependencies** | Dependencies reached for via global state rather than passed in |
+| **Temporal coupling** | Must call A before B but not enforced |
+
+### 6. Modularity & Cohesion Issues (P4, P6)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Responsibility sprawl** | Multiple reasons to change, too many jobs per unit |
+| **God functions/classes** | 200+ lines, 10+ dependencies, too many responsibilities |
+| **Feature envy** | Function using another class's data more than its own |
+| **Abstraction level mixing** | SQL query next to UI formatting |
+| **Premature abstraction** | Generic helper for one use case |
+| **Over-engineering** | Factory factories, 5 layers of indirection, YAGNI violations |
+| **Tight coupling** | Changes ripple across modules |
+| **Nested complexity** | `a ? b ? c : d : e` - deep nesting obscuring logic |
+
+### 7. Test Quality & Coverage Gaps (P7)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Critical path gaps** | Happy path only, error paths untested |
+| **Boundary test gaps** | Edge cases, empty, null, zero, max values untested |
+| **Implementation coupling** | Tests that break on refactor even though behavior is unchanged |
+| **Missing negative cases** | Only success scenarios tested |
+| **Assertion weakness** | Not actually verifying outcomes, just running code |
+| **Flaky test risks** | Race conditions, timing dependencies |
+| **Test isolation issues** | Inter-test dependencies, order-dependent |
+| **Contract test gaps** | API responses not validated against schema |
+| **Error path test gaps** | What happens when X fails? |
+
+Coverage priority guide:
+- 9-10: Data mutations, money/finance, auth, state machines - MUST test
+- 7-8: Business logic branches, API contracts, error paths - SHOULD test
+- 5-6: Edge cases, boundaries, integration points - GOOD to test
+- 1-4: Trivial getters, simple pass-through - OPTIONAL
+
+### 8. Documentation & Comment Issues (P10)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Stale comments** | Don't match current code behavior |
+| **Misleading comments** | `// returns user` but returns `userId` |
+| **Missing "why"** | Complex logic without reasoning explanation |
+| **Redundant comments** | `i++ // increment i` - restating the obvious |
+| **TODO graveyard** | Ancient TODOs from years ago, never addressed |
+| **Commented-out code** | Dead code preserved "just in case" |
+| **Outdated examples** | Doc examples that no longer compile/work |
+
+Good comments explain:
+- WHY this non-obvious approach was chosen
+- CONSTRAINTS that must be maintained
+- WARNINGS about non-obvious gotchas
+- LINKS to specs/tickets for complex requirements
+
+### 9. Evolution & Maintainability Risks (P8)
+
+| Check | What to Look For |
+|-------|------------------|
+| **API evolution risks** | Hard to extend without breaking clients |
+| **Schema rigidity** | Difficult to migrate or evolve |
+| **Rollback difficulty** | Changes hard to undo safely |
+| **Version strategy gaps** | No clear path for evolution |
+| **Deprecation debt** | Old patterns still in use with no removal plan |
+| **Migration complexity** | Schema changes require complex migrations |
+| **Data integrity risks** | No validation on critical data paths |
+
+### 10. Security & Performance (Lower Priority)
+
+**Default to LOW severity unless it causes correctness/data loss/availability issues.**
+
+| Check | What to Look For |
+|-------|------------------|
+| **Auth gaps** | Missing auth checks on endpoints |
+| **Injection risks** | Unsanitized input in queries/commands |
+| **Secrets exposure** | Hardcoded keys, passwords in code |
+| **IDOR risks** | Can access other users' data by changing ID |
+| **Sensitive data logged** | PII in logs |
+| **N+1 queries** | Query in loop |
+| **Unbounded operations** | `findAll()` without limits, no pagination |
+| **Expensive work in loops** | Regex compilation or JSON parsing repeated on every iteration |
+
+**Escalation Rule**: Escalate to HIGH only if the security/performance issue causes:
+- Correctness failure (wrong data returned)
+- Data loss or corruption
+- Availability failure (system down)
+
+---
+
+## Domain Overlay: Prompt Engineering
+
+*Apply when analyzing AI/LLM prompts in code:*
+
+| Check | What to Look For |
+|-------|------------------|
+| **Clarity** | Is the prompt unambiguous? Clear instructions? |
+| **No Conflicts** | Do instructions contradict each other? |
+| **Code Integration** | Does prompt correctly reference code variables/data? |
+| **Variable Injection** | Are template variables properly escaped/validated? |
+| **Output Parsing** | Is expected format clear? Parser handles edge cases? |
+| **Error Handling** | What if model returns unexpected format? |
+| **Role Definition** | Is persona/role well-defined and consistent? |
+| **Structured Output** | JSON Schema/format constraints specified? |
+| **Determinism** | Temperature/sampling appropriate for use case? |
+| **Fallback Behavior** | What happens on API failure/timeout? |
+
+---
+
+## Recommendation Priority
+
+| Priority | Triggers | Suggested Action |
+|----------|----------|------------------|
+| **CRITICAL** | Latent bug likely to cause production incident; Data corruption risk; Silent failure hiding critical issues | Address immediately |
+| **HIGH** | Bug waiting to happen; Missing critical test coverage; Type allows invalid state | Address in current sprint |
+| **MEDIUM** | Technical debt accumulating; Maintainability degrading; Edge case gaps | Plan for upcoming work |
+| **LOW** | Minor improvements; Style consistency; Performance optimizations | Address opportunistically |
+| **INFO** | Observations; Positive patterns worth noting; Context for future work | No action needed |
+
+---
+
+## Output Format
+
+Structure your analysis as follows:
+
+```markdown
+## Executive Summary
+[2-3 sentences: overall code health assessment and key risk areas]
+
+## Health Scores
+
+| Category | Score | Notes |
+|----------|-------|-------|
+| Correctness Risk | X/10 | [Brief assessment] |
+| Type Safety | X/10 | [Brief assessment] |
+| Observability | X/10 | [Brief assessment] |
+| Test Coverage | X/10 | [Brief assessment] |
+| Maintainability | X/10 | [Brief assessment] |
+
+## Key Principle Gaps
+[List P1-P10 gaps with specific file:line references]
+
+## Recommendations by Priority
+
+### CRITICAL
+- **[Category]** `file.ts:123-145`
+  - **Issue**: [What's the risk]
+  - **Impact**: [Why it matters]
+  - **Recommendation**: [Specific improvement suggestion]
+
+### HIGH
+[Same format...]
+
+### MEDIUM
+[Same format...]
+
+### LOW / INFO
+[Same format...]
+
+## Technical Debt Inventory
+- [List accumulated debt items with rough effort estimates: S/M/L/XL]
+
+## Quick Wins
+- [List improvements with high impact and low effort]
+
+## Test Coverage Recommendations
+- Critical untested paths (priority 8-10): [List]
+- Suggested test additions: [List]
+
+## Architectural Observations
+[High-level patterns, structural issues, or evolution recommendations]
+
+## Strengths
+[What's done well - important for balance and preserving good patterns]
+```
+
+---
+
+*End of consultant prompt.*
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. The agent will gather the specified code files, append them to the prompt above, invoke the consultant CLI, and report findings.
+
+Specify target files or directories for analysis. Without specific targets, analyze the most critical code paths in the current working directory.
--- a/commands/ask-counsil.md
+++ b/commands/ask-counsil.md
@@ -0,0 +1,25 @@
+---
+description: Multi-model ensemble consultation. Invokes the consultant agent with one or more models in parallel. Defaults to 3 models (gpt-5-pro, gemini/gemini-3-pro-preview, claude-opus-4-5-20251101) for diverse perspectives.
+---
+
+Perform a consultation using the consultant agent with multiple models in parallel for ensemble diversity.
+
+## Default Models
+
+**CRITICAL: If the user does NOT explicitly specify model(s) in $ARGUMENTS, use ALL 3 default models:**
+
+- `gpt-5-pro`
+- `gemini/gemini-3-pro-preview`
+- `claude-opus-4-5-20251101`
+
+Only use different models if the user explicitly names them.
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. Pass the user's request below as the consultant prompt, specifying multi-model consultation with the default models above (unless user specified otherwise). The agent will handle parallel execution, polling, and output relay.
+
+---
+
+# Consultant Prompt
+
+$ARGUMENTS
--- a/commands/ask.md
+++ b/commands/ask.md
@@ -0,0 +1,21 @@
+---
+description: Single-model consultation. Sends a prompt to the consultant agent using one model. Defaults to gpt-5-pro if no model is specified.
+---
+
+Perform a consultation using the consultant agent with a single model.
+
+## Default Model
+
+If the user does NOT explicitly specify a model in $ARGUMENTS, use `gpt-5-pro`.
+
+Only use a different model if the user explicitly names one (e.g., "use claude-opus-4-5-20251101 to..." or "ask gemini/gemini-3-pro-preview about...").
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. Pass the user's request below as the consultant prompt, specifying single-model consultation defaulting to gpt-5-pro.
+
+---
+
+# Consultant Prompt
+
+$ARGUMENTS
--- a/commands/execplan.md
+++ b/commands/execplan.md
@@ -0,0 +1,35 @@
+---
+description: Create comprehensive execution plans using consultant agent for deep analysis and specification design.
+---
+
+Create a comprehensive execution plan using the consultant agent with the following prompt:
+
+---
+
+# Execution Plan Prompt
+
+## Planning Focus
+
+1. **Architecture**: How to integrate with existing systems
+2. **Implementation**: Step-by-step breakdown of work
+3. **Validation**: How to verify correctness at each step
+4. **Testing**: Comprehensive test strategy
+5. **Risk Mitigation**: Edge cases, rollback plan
+
+## Plan Quality
+
+Ensure the execution plan is:
+
+- **Detailed**: Specific files, functions, and code patterns
+- **Ordered**: Clear dependencies and sequencing
+- **Testable**: Each step has validation criteria
+- **Practical**: Implementable with current codebase
+- **Risk-Aware**: Identifies potential issues and mitigations
+
+---
+
+*End of consultant prompt.*
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. The agent will gather codebase context, append it to the prompt above, invoke the consultant CLI, and report the detailed plan.
--- a/commands/investigate-bug.md
+++ b/commands/investigate-bug.md
@@ -0,0 +1,33 @@
+---
+description: Deep bug investigation using consultant agent. Identifies root causes, traces execution flow, assesses blast radius, and provides fix suggestions.
+---
+
+Perform deep bug investigation using the consultant agent with the following prompt:
+
+---
+
+# Bug Investigation Prompt
+
+## Investigation Focus
+
+1. **Root Cause Identification**: What's actually broken and why
+2. **Execution Flow Tracing**: Path from trigger to failure
+3. **State Analysis**: Invalid states, race conditions, timing issues
+4. **Data Validation**: Input validation gaps, edge cases
+5. **Error Handling**: Missing error handlers, improper recovery
+
+## Severity Assessment
+
+- **CRITICAL**: Production down, data corruption, widespread impact
+- **HIGH**: Core functionality broken, major user impact
+- **MEDIUM**: Feature partially broken, workaround available
+- **LOW**: Minor issue, limited impact
+- **INFO**: Observation, potential issue, monitoring needed
+
+---
+
+*End of consultant prompt.*
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. The agent will gather symptoms, append them to the prompt above, invoke the consultant CLI, and report root cause analysis.
--- a/commands/review.md
+++ b/commands/review.md
@@ -0,0 +1,323 @@
+---
+description: Production-level PR review using consultant agent. Comprehensive 10-category framework focused on correctness and maintainability.
+---
+
+Perform a comprehensive code review using the consultant agent with the following prompt:
+
+---
+
+# Code Review Prompt
+
+You are an expert code reviewer. Your mission is to find bugs, logic errors, and maintainability issues before they reach production. You prioritize correctness and code clarity above all else.
+
+## Core Principles (P1-P10)
+
+Apply these principles in order of priority. **All principles are guidelines, not laws—the user's explicit intent always takes precedence.** If the user deliberately chose an approach that violates a principle, respect that decision and don't flag it as an issue.
+
+| # | Principle | Meaning |
+|---|-----------|---------|
+| **P1** | **Correctness Above All** | Working code > elegant code. A production bug is worse than ugly code that works. |
+| **P2** | **Diagnostics & Observability** | Errors must be visible, logged, and traceable. Silent failures are unacceptable. |
+| **P3** | **Make Illegal States Unrepresentable** | Types should prevent bugs at compile-time. If invalid state can't exist, it can't cause bugs. |
+| **P4** | **Single Responsibility** | Every function, class, module should do ONE thing. If you need "and" to describe it, split it. |
+| **P5** | **Explicit Over Implicit** | Clarity beats cleverness. 3 readable lines > 1 clever line. No magic, no hidden behavior. |
+| **P6** | **Minimal Surface Area** | Don't build for hypothetical futures. Solve today's problem today. YAGNI. |
+| **P7** | **Prove It With Tests** | Untested code is unverified code. Tests prove correctness; coverage proves confidence. |
+| **P8** | **Safe Evolution** | Public API/schema changes need migration paths. Internal changes can break freely. |
+| **P9** | **Fault Containment** | Contain failures. One bad input shouldn't crash the system. Isolate concerns. |
+| **P10** | **Comments Tell Why** | Comments explain reasoning, not mechanics. A wrong comment is worse than no comment. |
+
+### Reviewer Boundaries
+
+**Focus your energy on high-impact issues.** A review that flags 50 issues is less useful than one that flags 5 critical ones.
+
+| DO | DON'T |
+|----|-------|
+| Flag bugs that will cause production failures | Nitpick style when correctness issues exist |
+| Explain WHY something is wrong | Just say "this is wrong" |
+| Provide specific, actionable fixes | Suggest vague "refactoring" |
+| Acknowledge when code is good | Flag every possible improvement |
+| Scale depth to PR complexity | Apply full framework to 5-line changes |
+
+**When uncertain**: If your confidence that something is a bug is below 70%, note it as INFO with your reasoning rather than flagging it as HIGH.
+
+---
+
+## Review Depth Scaling
+
+Match review intensity to change scope:
+
+| PR Size | Focus | Skip |
+|---------|-------|------|
+| **Small** (<50 lines) | Categories 1-3 only (Correctness, Types, Diagnostics) | Deep architecture analysis |
+| **Medium** (50-300 lines) | Categories 1-6, scan 7-10 | Exhaustive edge case enumeration |
+| **Large** (300+ lines) | Full framework, prioritize blockers | Nothing—but timebox each category |
+
+**Single-file changes**: Focus on that file's correctness. Don't audit the entire codebase.
+**Multi-file changes**: Look for cross-cutting concerns and integration issues.
+
+---
+
+## Review Categories (1-10)
+
+Review the code against these 10 orthogonal categories in priority order:
+
+### 1. Correctness & Logic (P1) - HIGHEST PRIORITY
+
+| Check | What to Look For |
+|-------|------------------|
+| **Logic errors** | Wrong conditionals, operators, inverted logic, control flow bugs |
+| **Boundary conditions** | Off-by-one, empty/null inputs, min/max values, loop termination |
+| **Preconditions/postconditions** | Input validation, domain rules enforced, invariants maintained |
+| **State management** | Invalid state transitions, race conditions, stale state |
+| **Async correctness** | Missing awaits, unhandled promises, order-of-execution bugs |
+| **Data transformation** | Wrong map/filter/reduce logic, incorrect type conversions |
+| **Arithmetic** | Overflow, precision loss, division by zero, rounding errors |
+| **Determinism** | Time zone issues, locale bugs, encoding problems, unseeded randomness |
+| **Comparison bugs** | Reference vs value comparison, floating point equality |
+| **API contract violations** | Response shape mismatches, missing required fields |
+
+### 2. Type Safety & Invariants (P3)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Illegal states** | Can invalid states be constructed? Are invariants enforceable? |
+| **Primitive obsession** | Using `string` everywhere instead of branded/nominal types |
+| **Nullability** | Inconsistent null/undefined handling, unsafe optional chaining |
+| **Sum types** | Using booleans where discriminated unions would prevent bugs |
+| **Validation at boundaries** | `JSON.parse` without validation, untyped external data |
+| **Encapsulation** | Exposed mutables, public fields that break invariants |
+| **Schema contracts** | API types match actual responses, runtime validation |
+| **Anemic types** | Data bags without behavior that should enforce rules |
+
+### 3. Diagnostics & Observability (P2)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Silent failures** | Empty catch blocks, swallowed exceptions, catch-and-return-null |
+| **Broad exception catching** | `catch (Exception e)` hiding unrelated errors |
+| **Silent fallbacks** | Returning defaults without logging, user unaware of failure |
+| **Structured logging** | Context included, correlation IDs, trace spans |
+| **Error visibility** | Does the user know something went wrong? Actionable messages? |
+| **Log levels** | Appropriate severity, not everything INFO |
+| **PII redaction** | Sensitive data not logged |
+| **Health signals** | Startup/readiness hooks, health check endpoints |
+
+Anti-patterns to flag:
+- `catch (e) { }` - Error vanishes
+- `catch (e) { return null }` - Silent failure
+- `catch (e) { return defaultValue }` - Hidden fallback without logging
+- `data?.user?.settings?.theme ?? 'dark'` - Optional chaining hiding bugs
+- `try { ...50 lines... } catch` - Can't tell what actually failed
+
+### 4. Fault Semantics & Resilience (P9)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Error taxonomy** | Retryable vs fatal, transient vs permanent distinguished |
+| **Timeouts** | All external calls have timeouts |
+| **Retries** | Backoff with jitter, max attempts, no infinite retry |
+| **Circuit breakers** | Fail-fast on cascading failures |
+| **Idempotency** | Safe to retry operations, idempotency keys where needed |
+| **Resource cleanup** | finally/defer for connections, file handles, locks |
+| **Transaction integrity** | Commit or rollback, never partial state |
+| **Cancellation** | Propagated correctly through async chains |
+| **Partial failure handling** | Batch operations handle individual failures |
+
+### 5. Design Clarity & Explicitness (P5)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Naming** | Clear, descriptive names, not `x`, `temp`, `data2`, `handleStuff` |
+| **Predictable APIs** | No surprising side effects, functions do what name says |
+| **Control flow** | No hidden branches, explicit paths, no action-at-a-distance |
+| **Magic values** | Unexplained constants/strings like `if (status === 3)` |
+| **Configuration** | Explicit params over implicit globals, no hidden singletons |
+| **Dependencies** | Passed in, not reached for via global state |
+| **Temporal coupling** | Must call A before B? Is it enforced or just documented? |
+
+### 6. Modularity & Cohesion (P4, P6)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Single responsibility** | One reason to change, one job per unit |
+| **God functions/classes** | 200+ lines, 10+ dependencies, too many responsibilities |
+| **Feature envy** | Function uses another class's data more than its own |
+| **Mixed abstraction levels** | SQL query next to UI formatting |
+| **Premature abstraction** | Generic helper for one use case |
+| **Over-engineering** | Factory factories, 5 layers of indirection, YAGNI violations |
+| **Coupling** | Tight dependencies, changes ripple across modules |
+| **Nested ternaries** | `a ? b ? c : d : e` - prefer switch/if-else |
+
+### 7. Test Quality & Coverage (P7)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Critical path coverage** | Happy path AND error paths tested |
+| **Boundary tests** | Edge cases, empty, null, zero, max values |
+| **Implementation coupling** | Tests break on refactor even though behavior is unchanged |
+| **Missing negative cases** | Only happy path tested |
+| **Assertion quality** | Actually verifying outcomes, not just running code |
+| **Flaky tests** | Race conditions, timing dependencies |
+| **Test isolation** | No inter-test dependencies, order-independent |
+| **Contract tests** | API responses match expected schema |
+| **Missing error path tests** | What happens when X fails? |
+
+Coverage priority:
+- 9-10: Data mutations, money/finance, auth, state machines - MUST test
+- 7-8: Business logic branches, API contracts, error paths - SHOULD test
+- 5-6: Edge cases, boundaries, integration points - GOOD to test
+- 1-4: Trivial getters, simple pass-through - OPTIONAL
+
+### 8. Comment & Doc Correctness (P10)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Stale comments** | Don't match current code behavior |
+| **Lie comments** | `// returns user` but returns `userId` |
+| **Missing "why"** | Complex logic without reasoning explanation |
+| **Redundant comments** | `i++ // increment i` - restating the obvious |
+| **TODO graveyard** | Ancient TODOs from years ago, never addressed |
+| **Commented-out code** | Dead code preserved "just in case" |
+| **Outdated examples** | Doc examples that no longer compile/work |
+
+Good comments explain:
+- WHY this non-obvious approach was chosen
+- CONSTRAINTS that must be maintained
+- WARNINGS about non-obvious gotchas
+- LINKS to specs/tickets for complex requirements
+
+### 9. Data & API Evolution (P8)
+
+| Check | What to Look For |
+|-------|------------------|
+| **Backward compatibility** | Do existing clients still work? |
+| **Schema migrations** | Using expand-then-contract pattern? |
+| **Rollback plans** | Can we undo this change safely? |
+| **Versioning strategy** | How do we evolve this API? |
+| **Field deprecation** | Grace period before removal? |
+| **Index changes** | Online, non-blocking? Lock risks? |
+| **Data validation** | Backfills validated, integrity checked? |
+| **Breaking changes** | Adding required fields? Removing fields? Changing types? |
+
+### 10. Security & Performance (Lower Priority)
+
+**Default to LOW severity unless it causes correctness/data loss/availability failure.**
+
+| Check | What to Look For |
+|-------|------------------|
+| **Auth bypass** | Missing auth checks on endpoints |
+| **Injection** | Unsanitized input in queries/commands |
+| **Secrets exposure** | Hardcoded keys, passwords in code |
+| **IDOR** | Can access other users' data by changing ID |
+| **Sensitive data logged** | PII in logs |
+| **N+1 queries** | Query in loop |
+| **Unbounded operations** | `findAll()` without limits, no pagination |
+| **Expensive work in loops** | Regex compilation or JSON parsing repeated on every iteration |
+
+**Escalation Rule**: Escalate to HIGH/BLOCKER only if the security/performance issue causes:
+- Correctness failure (wrong data returned)
+- Data loss or corruption
+- Availability failure (system down)
+
+---
+
+## Confidence Calibration
+
+Express confidence in your findings:
+
+| Confidence | How to Express | Example |
+|------------|----------------|---------|
+| **>90%** | State directly as finding | "This will NPE when user is null" |
+| **70-90%** | Flag with reasoning | "This appears to have a race condition because X—verify concurrency model" |
+| **<70%** | Note as INFO/question | "Worth checking: could this timeout under load?" |
+
+**When you're unsure, say so.** A qualified observation is more valuable than false confidence.
+
+---
+
+## Domain Overlay: Prompt Engineering
+
+*Skip this section if the PR contains no LLM prompts or AI integrations.*
+
+When reviewing code that includes AI/LLM prompts:
+
+| Check | What to Look For |
+|-------|------------------|
+| **Clarity** | Is the prompt unambiguous? Clear instructions? |
+| **No Conflicts** | Do instructions contradict each other? |
+| **Code Integration** | Does prompt correctly reference code variables/data? |
+| **Variable Injection** | Are template variables properly escaped/validated? |
+| **Output Parsing** | Is expected format clear? Parser handles edge cases? |
+| **Error Handling** | What if model returns unexpected format? |
+| **Role Definition** | Is persona/role well-defined and consistent? |
+| **Structured Output** | JSON Schema/format constraints specified? |
+| **Determinism** | Temperature/sampling appropriate for use case? |
+| **Fallback Behavior** | What happens on API failure/timeout? |
+
+---
+
+## Severity Levels
+
+| Level | Triggers | Action |
+|-------|----------|--------|
+| **BLOCKER** | Logic bug causing wrong outcomes; Data corruption possible; Silent failure hiding critical error | MUST fix before merge |
+| **HIGH** | Bug that will manifest in prod; Missing critical test; Type allows invalid state | SHOULD fix before merge |
+| **MEDIUM** | Over-engineering; Stale comments; Edge case gaps; Maintainability debt | Fix soon / discuss |
+| **LOW** | Minor simplification; Style; Security/Performance (unless it causes a correctness, data loss, or availability failure) | Nice-to-have |
+| **INFO** | Observations; Positive patterns worth noting | FYI |
+
+---
+
+## Output Format
+
+Structure your review as follows:
+
+```markdown
+## Summary
+[1-2 sentences: overall assessment and risk level]
+
+## Principles Violated
+[List P1-P10 violations with specific file:line references]
+
+## Findings by Severity
+
+### BLOCKER
+- **[Category]** `file.ts:123-145`
+  - **Issue**: [What's wrong]
+  - **Impact**: [Why it matters]
+  - **Fix**: [Specific recommendation]
+
+**Example finding:**
+- **[Correctness]** `payment_processor.ts:89-94`
+  - **Issue**: `totalAmount` calculated before `discounts` array is populated, returning pre-discount total
+  - **Impact**: Customers charged full price even with valid discount codes (P1 violation)
+  - **Fix**: Move calculation to after `applyDiscounts()` call on line 87, or use reactive calculation
+
+### HIGH
+[Same format...]
+
+### MEDIUM
+[Same format...]
+
+### LOW / INFO
+[Same format...]
+
+## Prompt Engineering Review
+[If LLM prompts present: clarity, conflicts, code integration, parsing issues]
+
+## Test Coverage Assessment
+- Critical gaps (priority 8-10): [List]
+- Coverage quality: [Assessment]
+
+## Positive Observations
+[What's done well - important for balance]
+```
+
+---
+
+*End of consultant prompt.*
+
+## Implementation Note
+
+Use the Task tool with `subagent_type='consultant:consultant'`. The agent will gather diffs, append them to the prompt above, invoke the consultant CLI, and report findings.