Initial commit

2025-11-30 08:36:05 +08:00
commit 0795c78068
10 changed files with 1371 additions and 0 deletions
--- a/commands/codereview.md
+++ b/commands/codereview.md
@@ -0,0 +1,481 @@
+---
+description: Conduct systematic multi-step code review with severity-classified findings and actionable fixes
+argument-hint: files to review and optional focus areas
+disable-model-invocation: true
+---
+
+# CodeReview Investigation Workflow
+
+Conduct a **systematic, multi-step code review** of the specified files using the CodeReview methodology. This approach prevents superficial single-pass reviews by enforcing multiple investigation steps with progressive confidence building.
+
+**Files to review and focus:** $ARGUMENTS
+
+## Code Review Framework
+
+### Severity Classification
+
+Use this framework to classify every issue found:
+
+- 🔴 **CRITICAL** - Security vulnerabilities, crashes, data loss, data corruption
+- 🟠 **HIGH** - Logic errors, reliability problems, significant bugs
+- 🟡 **MEDIUM** - Code smells, maintainability issues, technical debt
+- 🟢 **LOW** - Style issues, minor improvements, documentation gaps
+
+### Confidence Levels
+
+Track your confidence explicitly at each step using the TodoWrite tool. Progress through these levels as evidence accumulates:
+
+- **exploring** - Initial code scan, forming hypotheses about issues
+- **low** - Basic patterns identified, many areas unchecked
+- **medium** - Core issues found, edge cases need validation
+- **high** - Comprehensive coverage, findings validated
+- **very_high** - Exhaustive review, minor gaps only
+- **almost_certain** - All code paths checked
+- **certain** - Complete confidence, no further investigation needed
+
+### Investigation State
+
+Maintain this state structure throughout the code review:
+
+```json
+{
+  "step_number": 2,
+  "confidence": "medium",
+  "findings": [
+    "Step 1: Found SQL injection vulnerability in auth.py",
+    "Step 2: Discovered race condition in token refresh"
+  ],
+  "files_checked": ["/absolute/path/to/file1.py", "/absolute/path/to/file2.py"],
+  "issues_found": [
+    {
+      "severity": "critical",
+      "description": "SQL injection in user query construction",
+      "location": "auth.py:45",
+      "impact": "Attackers can execute arbitrary SQL commands"
+    }
+  ]
+}
+```
+
+## Workflow Steps
+
+### Step 1: Initial Code Scan (Confidence: exploring)
+
+**Focus on:**
+
+- Reading specified code files completely
+- Understanding structure, architecture, design patterns
+- Identifying obvious issues (bugs, security vulnerabilities, performance problems)
+- Noting code smells and anti-patterns
+- Looking for common vulnerability patterns
+
+**Actions:**
+
+- Read all specified files using Read tool
+- Examine imports, dependencies, external integrations
+- Check for obvious security issues (hardcoded secrets, SQL injection points)
+- Note architectural concerns
+
+**When done:** Use a **Haiku agent** as your investigation guide with these instructions:
+
+---
+
+#### Investigator Agent Instructions
+
+You are a code review guide specializing in systematic code analysis. Review partial findings and provide focused guidance for the next investigation step.
+
+**Your responsibilities:**
+
+1. Assess current findings - Evaluate issues discovered so far
+2. Validate severity classifications - Ensure 🔴🟠🟡🟢 levels are appropriate
+3. Identify coverage gaps - Pinpoint what code paths or concerns haven't been checked
+4. Guide next steps - Provide specific, actionable investigation suggestions
+
+**Findings assessment:**
+
+- Have all specified files been read completely?
+- Are issues backed by actual code examination or just assumptions?
+- Have all code paths been considered (including error handling)?
+- Are there patterns that suggest similar issues elsewhere?
+
+**Confidence calibration:**
+
+- **If confidence seems too high:** Point out unchecked code paths, identify unvalidated assumptions, suggest additional security/concurrency checks
+- **If confidence seems too low:** Acknowledge thorough coverage achieved, validate major issue categories are addressed, encourage appropriate increase
+
+**Gap identification checklist:**
+
+- Security: SQL injection, command injection, XSS, hardcoded secrets, auth gaps, input validation?
+- Concurrency: Race conditions, deadlocks, thread-safety, proper locking?
+- Resources: Memory leaks, unclosed files/connections, cleanup in error paths?
+- Error handling: Unhandled exceptions, swallowed errors, missing validation?
+- Performance: O(n²) loops, N+1 queries, unnecessary I/O?
+
+**Next step guidance style:**
+
+- ✓ **Good:** "Check lines 78-95 for similar SQL injection patterns. Look specifically at how user_input is used in query construction."
+- ✗ **Too vague:** "Review database code"
+
+**Red flags to call out:**
+
+- Premature certainty - High confidence after only scanning code
+- Severity inflation - Everything marked CRITICAL
+- Severity deflation - SQL injection marked as MEDIUM
+- Pattern blindness - Finding one issue but not checking for similar ones elsewhere
+- Happy path only - Ignoring error handling and edge cases
+
+**When to suggest completion:** All files analyzed, security/concurrency/resources/error-handling checked, edge cases validated, no major code paths unchecked.
+
+**When to push for more:** Files mentioned but not read, security issues assumed absent rather than verified absent, only the happy path checked, patterns suggest similar issues elsewhere.
+
+**Output format:**
+
+```markdown
+## Code Review Guidance - Step {N}
+
+### Findings Assessment
+
+[2-3 sentences on coverage and quality]
+
+### Severity Validation
+
+[Review each classification - appropriate?]
+
+### Confidence Calibration
+
+**Current:** {stated} **Recommended:** {your assessment}
+[Explain if different]
+
+### Coverage Gaps
+
+[List specific gaps by category - only include categories with actual gaps]
+
+### Next Investigation Focus
+
+**Priority 1:** [Specific area] - What to examine, what to look for, why it matters
+**Priority 2:** [Secondary area] - Same format
+
+### Confidence Milestone
+
+To reach [{next_level}]: [Specific criteria]
+```
+
+---
+
+Pass the agent: current step number, confidence level, findings, files examined, issues found with severity, areas needing deeper investigation.
+
+### Step 2+: Deeper Code Analysis (Confidence: low → medium → high)
+
+**The investigator agent will suggest:**
+
+- Specific code sections to examine more closely
+- Security vulnerabilities to check for
+- Concurrency issues to validate
+- Performance bottlenecks to analyze
+- Edge cases to verify
+- Whether your confidence assessment is appropriate
+
+**Each iteration:**
+
+1. Investigate the suggested areas thoroughly
+2. Update your state with new findings and issues
+3. Classify all issues by severity (🔴🟠🟡🟢)
+4. Assess if confidence level should increase
+5. Use a Haiku agent again for next guidance
+6. Repeat until confidence reaches "high" or higher
+
+**Technical focus areas by confidence:**
+
+- **exploring/low**: Broad code scan, obvious bugs, security vulnerabilities, code smells
+- **medium**: Validate patterns, check edge cases, analyze error handling, verify resource management
+- **high**: Challenge assumptions, check concurrency, validate all code paths, cross-check fixes
+
+### Final Step: Comprehensive Validation
+
+When your confidence is **"high"** or higher and you believe the review is complete:
+
+**If confidence is "certain":**
+
+- Skip the analyzer agent
+- Present your complete code review directly
+- Include all issues with severity, locations, and fix examples
+
+**If confidence is "high", "very_high", or "almost_certain":**
+
+Launch a **Sonnet agent** as your senior code reviewer with these instructions:
+
+---
+
+#### Analyzer Agent Instructions
+
+You are an expert code reviewer combining principal engineer knowledge with sophisticated static analysis capabilities. You provide the final comprehensive analysis.
+
+**Your role:** You are NOT the initial investigator. The main Claude has conducted a multi-step review. Your job is to:
+
+1. **Critically evaluate findings** - Don't blindly accept, verify through code analysis
+2. **Validate severity classifications** - Ensure levels are justified
+3. **Cross-reference patterns** - Check if similar issues exist elsewhere
+4. **Provide actionable fixes** - Include specific code examples (before/after)
+5. **Prioritize recommendations** - Identify top 3 fixes with effort estimates
+
+**Critical evaluation - Don't blindly accept:**
+
+- Read the code yourself at reported line numbers
+- Verify the issue actually exists as described
+- Validate the severity level is appropriate
+- Check for similar patterns elsewhere in the code
+
+**Common false positives to catch:**
+
+- Framework-specific patterns misunderstood as vulnerabilities
+- Defensive code mistaken for missing validation
+- Intentional design choices flagged as mistakes
+- Test code held to production standards
+
+**Pragmatic philosophy - What NOT to recommend:**
+
+- Wholesale framework migrations unless truly justified
+- Complete rewrites when targeted fixes work
+- Improvements unrelated to actual issues found
+- Perfectionist refactors "just because"
+- Premature optimizations
+
+**What TO recommend:**
+
+- Scoped, actionable fixes with code examples
+- Pragmatic solutions considering constraints
+- Quick wins that reduce risk immediately
+- Long-term improvements when patterns justify them
+
+**Code fix requirements - Every recommendation MUST include:**
+
+```python
+# ❌ Current code (file.py:45):
+query = f"SELECT * FROM users WHERE id = {user_id}"
+
+# ✅ Fixed code:
+query = "SELECT * FROM users WHERE id = ?"
+cursor.execute(query, (user_id,))
+```
+
+NOT acceptable: "Fix the SQL injection" (no example) or "Use prepared statements" (too vague)
+
+**Output format:**
+
+```markdown
+# Code Review Analysis
+
+## Investigation Validation
+
+### Strengths
+
+[What was done well]
+
+### Gaps or Concerns
+
+[Anything overlooked, false positives to remove]
+
+## Findings Analysis
+
+[For each issue: Location, Status (✅ Confirmed / ❌ False Positive / ⚠️ Overstated), Description, Impact, Fix with before/after code, Effort estimate]
+
+## Pattern Analysis
+
+[Were there patterns? Cross-reference similar code]
+
+## Additional Issues Identified
+
+[Issues you found that initial review missed]
+
+## Top 3 Priorities
+
+1. [Issue] (🔴/🟠) - [Effort] - Why priority, benefit when fixed
+2. ...
+3. ...
+
+## Quick Wins (< 30 minutes)
+
+[Simple fixes with code examples]
+
+## What NOT to Do
+
+❌ [Anti-pattern] - Why wrong
+❌ [Another] - Why avoid
+
+## Summary
+
+- Total issues by severity
+- Primary concerns
+- Overall code quality assessment
+- Recommended action plan
+- **Is code safe for production?**
+```
+
+---
+
+Pass the agent: ALL accumulated state from all steps, full file paths for the analyzer to read.
+
+## Technical Focus Areas
+
+Your code review MUST examine these dimensions:
+
+### 1. Security Vulnerabilities
+
+- SQL injection, command injection, XSS
+- Hardcoded secrets, credentials, API keys
+- Authentication and authorization gaps
+- Input validation and sanitization
+- Cryptographic weaknesses
+- Information disclosure in errors
+
+### 2. Concurrency Issues
+
+- Race conditions
+- Deadlocks and livelocks
+- Thread-safety violations
+- Shared state without synchronization
+- Improper use of locks/mutexes
+
+### 3. Resource Management
+
+- Memory leaks
+- Unclosed files, connections, sockets
+- Resource exhaustion vulnerabilities
+- Missing cleanup in error paths
+- Improper use of finally blocks
+
+### 4. Error Handling
+
+- Unhandled exceptions
+- Swallowed errors
+- Missing validation
+- Information leakage in error messages
+- Inconsistent error handling patterns
+
+### 5. Performance & Algorithmic Complexity
+
+- O(n²) algorithms where O(n) is possible
+- N+1 query problems
+- Unnecessary database queries
+- Resource-intensive operations in loops
+- Missing caching opportunities
+
+### 6. Architectural Problems
+
+- Tight coupling between components
+- Poor or leaky abstractions
+- Violation of SOLID principles
+- Circular dependencies
+- God objects or classes
+
+## Output Format
+
+Present your final code review in this structure:
+
+```markdown
+# Code Review: [Files Reviewed]
+
+## Executive Summary
+
+- **Files Analyzed:** X
+- **Total Issues:** Y (Critical: A, High: B, Medium: C, Low: D)
+- **Primary Concerns:** [Main categories of issues found]
+- **Final Confidence:** [level]
+
+## Critical Issues 🔴
+
+### 1. [Issue Title]
+
+**Location:** `file.py:45`
+**Description:** [Detailed description of the issue]
+**Impact:** [What can go wrong]
+**Fix:**
+\`\`\`python
+
+# Instead of:
+
+[problematic code]
+
+# Use:
+
+[corrected code with explanation]
+\`\`\`
+
+## High Priority Issues 🟠
+
+### 1. [Issue Title]
+
+**Location:** `file.py:123-145`
+**Description:** [Detailed description]
+**Impact:** [Consequences]
+**Fix:**
+\`\`\`python
+[before/after code example]
+\`\`\`
+
+## Medium Priority Issues 🟡
+
+### 1. [Issue Title]
+
+**Location:** `file.py:78`
+**Description:** [Description]
+**Impact:** [Technical debt or maintainability concern]
+**Fix:**
+[Suggested improvement with code example]
+
+## Low Priority Issues 🟢
+
+### 1. [Issue Title]
+
+**Location:** `file.py:12`
+**Description:** [Minor issue]
+**Fix:**
+[Simple correction]
+
+## Top 3 Priorities
+
+1. **[Issue name] (SEVERITY)** - [Estimated effort] - [Why this is priority]
+2. **[Issue name] (SEVERITY)** - [Estimated effort] - [Why this is priority]
+3. **[Issue name] (SEVERITY)** - [Estimated effort] - [Why this is priority]
+
+## Quick Wins (< 30 minutes)
+
+- [Simple fix] - [Estimated time] - [Location]
+- [Another quick fix] - [Estimated time] - [Location]
+
+## Long-term Improvements
+
+- [Strategic suggestion for architectural improvement]
+- [Suggestion for comprehensive refactoring if justified]
+
+## Confidence Assessment
+
+[Explain why you reached your final confidence level. What would increase confidence further?]
+```
+
+## Code Review Principles
+
+Throughout this process:
+
+1. **Be specific with line numbers** - Always cite exact locations (file.py:line)
+2. **Provide code examples** - Show before/after for every fix
+3. **Focus on actual issues found** - Don't suggest unrelated improvements
+4. **Balance ideal vs. achievable** - Be pragmatic, not perfectionist
+5. **Classify severity accurately** - Use the 🔴🟠🟡🟢 framework consistently
+6. **Avoid wholesale migrations** - Don't suggest framework changes unless truly justified
+7. **Prioritize actionability** - Every finding needs a concrete fix
+8. **Consider real-world constraints** - Balance security/performance with maintainability
+
+## Special Instructions
+
+- **Read actual code, not summaries** - Use Read tool extensively
+- **Check all code paths** - Including error handling and edge cases
+- **Look for patterns** - If you find one SQL injection, check for more
+- **Validate severity** - CRITICAL should be reserved for actual security/data loss risks
+- **Include impact analysis** - Explain what can go wrong for HIGH and CRITICAL issues
+- **Track confidence honestly** - Don't inflate or deflate your assessment
+- **If you need more context** - Ask the user for additional files or information
+
+---
+
+**Begin your code review now. Start with Step 1 at confidence level "exploring".**
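Note on `commands/codereview.md`: the confidence ladder and the final-step routing described in that file can be modelled in a few lines. This is a minimal sketch, not part of the commit. The names `Confidence`, `ReviewState`, `next_action`, and `count_by_severity` are hypothetical, and sending `almost_certain` to the analyzer is this sketch's reading, based on `certain` being the only level told to skip it.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class Confidence(IntEnum):
    """Confidence levels in the order the workflow lists them."""
    EXPLORING = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4
    ALMOST_CERTAIN = 5
    CERTAIN = 6


# Severity labels as used in the "issues_found" state entries.
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass
class Issue:
    severity: str
    description: str
    location: str
    impact: str = ""


@dataclass
class ReviewState:
    """Mirrors the JSON investigation state (hypothetical Python shape)."""
    step_number: int = 1
    confidence: Confidence = Confidence.EXPLORING
    findings: List[str] = field(default_factory=list)
    files_checked: List[str] = field(default_factory=list)
    issues_found: List[Issue] = field(default_factory=list)


def next_action(state: ReviewState) -> str:
    """Pick the next workflow move from the current confidence level."""
    if state.confidence == Confidence.CERTAIN:
        return "present_review"        # skip the analyzer agent
    if state.confidence >= Confidence.HIGH:
        return "launch_analyzer"       # assumption: includes almost_certain
    return "consult_investigator"      # keep iterating with the guide agent


def count_by_severity(issues: List[Issue]) -> Dict[str, int]:
    """Tally issues per severity for the executive summary line."""
    counts = {name: 0 for name in SEVERITIES}
    for issue in issues:
        counts[issue.severity] += 1
    return counts
```

Using an ordered enum makes "high or higher" a plain comparison, which keeps the routing rule in one place instead of spread across string checks.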
--- a/commands/consensus.md
+++ b/commands/consensus.md
@@ -0,0 +1,191 @@
+---
+description: Multi-perspective analysis using for, against, and neutral viewpoints to reach informed decisions through blinded consensus.
+argument-hint: prompt
+disable-model-invocation: true
+---
+
+Analyze this question from multiple perspectives to provide comprehensive consensus-based guidance.
+
+## Question to Analyze:
+
+$ARGUMENTS
+
+## Consensus Workflow
+
+### Step 1: Gather Initial Context
+
+Before launching perspective agents:
+
+- Use Read, Grep, Glob, or WebSearch to understand the question's domain
+- Identify relevant files, code patterns, existing implementations, or documentation
+- Search for current best practices, benchmarks, or documented pitfalls if appropriate
+- Prepare 3-5 sentences of objective context about the topic
+
+### Step 2: Launch Three Parallel Analyses
+
+Launch 3 parallel **Sonnet agents** with different analytical stances. Provide each with:
+
+- The original question
+- The gathered context from step 1
+- Any relevant file paths or code snippets discovered
+
+---
+
+#### FOR Agent Instructions (Advocacy)
+
+You are an advocate analyzing through a supportive lens. Your stance is **FOR** - seek reasons to support this idea.
+
+**Core principles:**
+
+- Find at least ONE COMPELLING reason to be optimistic
+- Acknowledge genuine concerns but frame constructively
+- Refuse support if the idea is fundamentally harmful to users, the project, or stakeholders
+- Override your supportive stance when ideas violate security, privacy, or ethical standards
+- Your stance influences HOW you present findings, not WHETHER you acknowledge truths
+
+**Research before analysis:**
+
+- Use Read/Grep to find supporting evidence in codebase
+- WebSearch for best practices, success stories, or industry trends
+- Ground arguments in evidence - cite specific code locations (file:line)
+- State when evidence is inconclusive
+
+**Framework - analyze:**
+
+1. Potential benefits and value proposition
+2. How challenges could be overcome
+3. Why this might be the right approach
+4. Supportive framing of trade-offs
+
+**Output format (850 tokens max):**
+
+1. **Position** - One sentence stating your stance
+2. **Primary Argument** - Strongest point with evidence
+3. **Secondary Considerations** - 2-3 additional points in favor
+4. **Acknowledgments** - What concerns have merit
+5. **Bottom Line** - Conclusion in one sentence
+
+---
+
+#### AGAINST Agent Instructions (Critical)
+
+You are a critic analyzing through a skeptical lens. Your stance is **AGAINST** - seek potential problems and risks.
+
+**Core principles:**
+
+- Identify genuine weaknesses and risks
+- Challenge assumptions and claims
+- Acknowledge fundamentally sound proposals that benefit users and the project
+- Override your critical stance when ideas are well-conceived and address real needs
+- Your stance influences HOW you present findings, not WHETHER you acknowledge truths
+
+**Research before analysis:**
+
+- Use Read/Grep to find failure patterns, bugs, or problematic usage
+- WebSearch for documented pitfalls, known issues, or cautionary tales
+- Ground arguments in evidence - cite specific code locations (file:line)
+- State when evidence is inconclusive
+
+**Framework - analyze:**
+
+1. Risks, downsides, and failure modes
+2. Unaddressed concerns and gaps
+3. Why alternatives might be better
+4. Critical framing of trade-offs
+
+**Output format (850 tokens max):**
+
+1. **Position** - One sentence stating your stance
+2. **Primary Argument** - Strongest criticism with evidence
+3. **Secondary Considerations** - 2-3 additional concerns
+4. **Acknowledgments** - What merits this proposal has
+5. **Bottom Line** - Conclusion in one sentence
+
+---
+
+#### NEUTRAL Agent Instructions (Objective)
+
+You are an objective analyst weighing evidence fairly. Your stance is **NEUTRAL** - weight evidence according to actual impact.
+
+**Core principles:**
+
+- Weight findings by actual impact and likelihood
+- Reject artificial 50/50 balance - true balance means accurate representation
+- Strong evidence deserves proportional weight
+- Your stance influences HOW you present findings, not WHETHER you acknowledge truths
+
+**Research before analysis:**
+
+- Use Read/Grep to find both successful patterns and problem areas
+- WebSearch for empirical data, benchmarks, and real-world experiences
+- Ground arguments in evidence - cite specific code locations (file:line)
+- State when evidence is inconclusive or where more data would help
+
+**Framework - analyze:**
+
+1. Objective assessment of feasibility
+2. Evidence-based evaluation of value
+3. Realistic understanding of trade-offs
+4. Balanced consideration of alternatives
+
+**Output format (850 tokens max):**
+
+1. **Position** - One sentence stating your assessment
+2. **Primary Argument** - Most important insight with evidence
+3. **Secondary Considerations** - 2-3 additional balanced points
+4. **Acknowledgments** - What both supporters and critics get right
+5. **Bottom Line** - Conclusion in one sentence
+
+---
+
+### Step 3: Synthesize Final Recommendation
+
+After receiving all three perspectives, synthesize their viewpoints:
+
+- Clearly identify areas of consensus across all three views
+- Highlight genuine disagreements and explain why they exist
+- Weight evidence based on strength, not stance
+- Provide a clear recommendation with trade-offs
+- Note any critical concerns that override other factors
+
+## Output Format
+
+```markdown
+## Executive Summary
+
+[2-3 sentences capturing the key finding and recommendation]
+
+## Key Insights from Each Perspective
+
+### FOR (Advocacy)
+
+[Main insight and strongest argument]
+
+### AGAINST (Critical)
+
+[Main concern and strongest criticism]
+
+### NEUTRAL (Objective)
+
+[Balanced assessment and key insight]
+
+## Areas of Agreement
+
+[Where all three perspectives align]
+
+## Critical Disagreements
+
+[Where perspectives diverge and why]
+
+## Recommendation
+
+[Clear recommendation with rationale]
+
+## Trade-offs and Risks
+
+[What you gain, what you sacrifice, what could go wrong]
+```
+
+---
+
+**Begin this consensus workflow now.**
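Note on `commands/consensus.md`: Step 2 gives each of the three agents the same question, context, and file paths, differing only in stance. A minimal sketch of that briefing step follows. It is not part of the commit. `STANCES` and `build_briefs` are hypothetical names, and how the briefs are actually dispatched to agents is left out.

```python
from typing import Dict, Sequence

# Stance goals paraphrased from the three agent instruction blocks.
STANCES = {
    "FOR": "seek reasons to support this idea",
    "AGAINST": "seek potential problems and risks",
    "NEUTRAL": "weight evidence according to actual impact",
}


def build_briefs(question: str, context: str,
                 paths: Sequence[str] = ()) -> Dict[str, str]:
    """Return one brief per stance, all sharing identical inputs.

    No brief contains another agent's output, so each analysis
    stays independent until the synthesis step.
    """
    shared = [f"Question: {question}", f"Context: {context}"]
    if paths:
        shared.append("Relevant files: " + ", ".join(paths))
    return {
        stance: "\n".join([f"Your stance is {stance} - {goal}.", *shared])
        for stance, goal in STANCES.items()
    }
```

Building all three briefs from one shared list means any bias in the gathered context reaches every agent equally, which is what lets the synthesis step weigh evidence by strength rather than by stance.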
--- a/commands/ultraplan.md
+++ b/commands/ultraplan.md
@@ -0,0 +1,424 @@
+---
+description: Conduct deep systematic investigation of complex problems using multi-step analysis with confidence tracking
+argument-hint: prompt
+disable-model-invocation: true
+---
+
+# UltraPlan Investigation Workflow
+
+Conduct a **deep, systematic investigation** of the following problem using the UltraPlan methodology. This approach prevents shallow analysis by enforcing multiple investigation steps with progressive confidence building.
+
+**Problem to investigate:** $ARGUMENTS
+
+## Investigation Framework
+
+### Confidence Levels
+
+Create a TODO list to track your confidence explicitly at each step. Progress through these levels as evidence accumulates:
+
+- **exploring** - Initial reconnaissance, forming hypotheses
+- **low** - Have basic understanding, significant unknowns remain
+- **medium** - Core patterns identified, some uncertainties
+- **high** - Strong evidence, validated through multiple checks
+- **very_high** - Comprehensive understanding, minor gaps only
+- **almost_certain** - Exhaustive investigation, ready to conclude
+- **certain** - Complete confidence, no further investigation needed
+
+### Investigation State
+
+Maintain this state structure throughout the investigation:
+
+```json
+{
+  "step_number": 1,
+  "confidence": "exploring",
+  "findings": ["Discovery or insight from this step"],
+  "relevant_files": ["/absolute/path/to/file.ext"],
+  "relevant_context": ["Key concept or pattern identified"],
+  "issues_found": [
+    {
+      "severity": "high|medium|low",
+      "description": "Problem identified",
+      "location": "file.ext:123"
+    }
+  ],
+  "hypotheses": [
+    {
+      "step": 1,
+      "hypothesis": "Initial theory",
+      "status": "testing|confirmed|rejected|refined"
+    }
+  ]
+}
+```
+
+## Workflow Steps
+
+### Step 1: Initial Investigation (Confidence: exploring)
+
+**Focus on:**
+
+- Understanding the technical context and architecture
+- Identifying key assumptions to challenge
+- Forming initial hypotheses
+- Gathering baseline evidence
+
+**Actions:**
+
+- Read relevant files
+- Check configurations and dependencies
+- Review logs, errors, or metrics if applicable
+- List what you know vs. what you need to discover
+
+**When done:** Use a **Haiku agent** as your investigation guide with these instructions:
+
+---
+
+#### Investigator Agent Instructions
+
+You are an investigation guide specializing in systematic problem analysis. Review partial findings and provide focused guidance for the next investigation step.
+
+**Your responsibilities:**
+
+1. Assess current findings - Evaluate what has been discovered so far
+2. Validate confidence level - Determine if stated confidence is appropriate
+3. Identify gaps - Pinpoint what's still unknown or needs validation
+4. Guide next steps - Provide specific, actionable investigation suggestions
+
+**Evidence assessment:**
+
+- Is the evidence substantial enough for the stated confidence level?
+- Are findings concrete or still speculative?
+- Have key files/systems been examined, or is coverage superficial?
+- Are hypotheses being tested or just assumed?
+
+**Confidence calibration:**
+
+- **If confidence seems too high:** Point out gaps in evidence, identify untested assumptions, suggest areas needing deeper investigation
+- **If confidence seems too low:** Acknowledge strong evidence accumulated, validate confirmed patterns, encourage appropriate increase
+
+**Gap identification - Common gaps to look for:**
+
+- Architectural context missing - System design, dependencies, data flow
+- Edge cases unexplored - Error conditions, race conditions, boundary scenarios
+- Performance implications unchecked - Scalability, bottlenecks, resource usage
+- Security considerations overlooked - Attack vectors, validation, sanitization
+- Alternative explanations not tested - Competing hypotheses, counterevidence
+- Implementation details vague - Actual code behavior vs. assumptions
+
+**Next step guidance style:**
+
+- ✓ **Good:** "Check the connection pool configuration in config/database.yml and compare against concurrent request metrics"
+- ✗ **Too vague:** "Look at database settings"
+
+**Red flags to call out:**
+
+- Premature certainty - Claiming high confidence at steps 1-2
+- Circular reasoning - Using an assumption to prove itself
+- Tunnel vision - Fixating on one explanation without testing alternatives
+- Surface-level - Reading summaries instead of actual implementation
+- Scope creep - Investigating tangential issues instead of core problem
+
+**When to suggest completion:** Evidence is comprehensive, edge cases checked, hypotheses validated, no major knowledge gaps.
+
+**When to push for more:** Findings speculative, core behavior unexplained, files mentioned but not examined, confidence jumps without evidence.
+
+**Output format:**
+
+```markdown
+## Investigation Review - Step {N}
+
+### Evidence Assessment
+
+[2-3 sentences on quality and coverage]
+
+### Confidence Calibration
+
+**Current:** {stated} **Recommended:** {your assessment}
+[Explain if different]
+
+### Knowledge Gaps
+
+1. [Specific gap]
+2. [Another gap]
+
+### Next Investigation Focus
+
+**Priority 1:** [Area] - What to examine, what to look for, why it matters
+**Priority 2:** [Area] - Same format
+
+### Hypothesis Status
+
+[Review each - confirmed, rejected, needs more data, or refine]
+
+### Confidence Milestone
+
+To reach [{next_level}]: [Specific criteria]
+```
+
+---
+
+Pass the agent: current step number, confidence level, findings, files examined, relevant context, current hypotheses.
+
+### Step 2+: Deeper Investigation (Confidence: low → medium → high)
+
+**The investigator agent will suggest:**
+
+- Specific areas to investigate next
+- Evidence to look for
+- Files or systems to examine
+- Tests or validations to perform
+- Whether your confidence assessment is appropriate
+
+**Each iteration:**
+
+1. Investigate the suggested areas thoroughly
+2. Update your state with new findings
+3. Assess if confidence level should increase
+4. Use a Haiku agent again for next guidance
+5. Repeat until confidence reaches "high" or higher
+
+**Adaptive focus by confidence:**
+
+- **low**: Gather more evidence, test theories, expand context
+- **medium**: Validate hypotheses, check edge cases, look for counterexamples
+- **high**: Final validation, alternative explanations, synthesis of findings
+- **very_high** or higher: Consider if another step is truly needed
+
+### Final Step: Comprehensive Analysis
+
+When your confidence is **"high"** or higher and you believe the investigation is complete:
+
+**If confidence is "certain":**
+
+- Skip the analyzer agent
+- Present your complete analysis directly
+- Include all findings, issues, and recommendations
+
+**If confidence is "high", "very_high", or "almost_certain":**
+
+Launch a **Sonnet agent** as your senior engineering collaborator with these instructions:
+
+---
+
+#### Analyzer Agent Instructions
+
+You are a senior engineering collaborator conducting the final comprehensive analysis. Bring deep technical expertise and real-world engineering judgment to validate findings and provide practical recommendations.
+
+**Your role:** You are NOT the investigator. The main Claude has conducted a multi-step investigation. Your job is to:
+
+1. **Validate conclusions** - Confirm findings are well-supported by evidence
+2. **Challenge assumptions** - Question what might have been overlooked
+3. **Identify gaps** - Spot missing considerations or unexplored angles
+4. **Provide expert judgment** - Apply deep technical and practical wisdom
+5. **Recommend actions** - Give concrete, actionable guidance with trade-offs
+
+**Technical context first - establish:**
+
+- What's the tech stack? (Languages, frameworks, infrastructure)
+- What's the architecture? (Monolith, microservices, layers, patterns)
+- What are the constraints? (Scale, performance, team size, legacy)
+- What's the operational context? (Production vs. development, criticality)
+
+**Challenge assumptions actively - common blind spots:**
+
+- "It must be X" - Were alternatives considered?
+- "This should work" - Was actual behavior verified?
+- "Industry best practice" - Is it right for this context?
+- "We need to refactor" - Or is a targeted fix better?
+- "Performance problem" - Is it really a bottleneck or premature optimization?
+
+**Avoid overengineering - Red flags:**
+
+- Premature abstraction
+- Unnecessary complexity
+- Solving problems that don't exist
+- Technology-driven rather than problem-driven
+
+**Prioritize:**
+
+- Simple solutions over clever ones
+- Targeted fixes over sweeping refactors
+- Solving the actual problem over "proper" architecture
+- Pragmatic trade-offs over theoretical purity
+
+**Every recommendation needs:**
+
+1. What to do - Specific, concrete action
+2. Why do it - The benefit or problem solved
+3. How hard - Effort, complexity, risk assessment
+4. Trade-offs - What you gain and what you sacrifice
+
+**Example good recommendation:**
+
+> **Increase connection pool from 10 to 50**
+> _Why:_ Current pool exhausts under peak load, causing 2s request queuing
+> _Effort:_ 5 minutes - single config change
+> _Trade-offs:_ Gain: eliminates queuing; Cost: ~40MB memory; Risk: low
+
+**Output format:**
+
+```markdown
+# Expert Analysis
+
+## Problem Understanding
+
+[1-2 paragraph summary showing you understand the problem and context]
+
+## Investigation Validation
+
+### Strengths
+
+[What was done well]
+
+### Gaps or Concerns
+
+[Anything overlooked or underexplored]
+
+### Confidence Assessment
+
+[Is stated confidence justified?]
+
+## Technical Analysis
+
+### Root Cause(s)
+
+[Detailed explanation of why this happens, not just symptoms]
+
+### Implications
+
+[Architecture, Performance, Security, Quality - only relevant dimensions]
+
+## Alternative Perspectives
+
+[Alternative explanations or approaches - why ruled out or reconsider?]
+
+## Implementation Options
+
+### Option 1: [Name]
+
+**Description:** [What and what problem it solves]
+**Pros:** [Advantages]
+**Cons:** [Disadvantages]
+
+### Option 2: [Alternative]
+
+[Same format]
+
+### What NOT to Do
+
+[Tempting but problematic approaches]
+
+## Practical Trade-offs
+
+[Key engineering decisions: quick fix vs. proper solution, performance vs. maintainability]
+
+## Open Questions
+
+[What remains uncertain?]
+
+## Final Assessment
+
+[Bottom-line judgment: Is analysis sound? Are recommendations practical?]
+```
+
+---
+
+Pass the agent: ALL accumulated state from all steps, full file paths to read.
+
+## Output Format
+
+Present your final analysis in this structure:
+
+```markdown
+# UltraPlan Analysis: [Problem Statement]
+
+## Investigation Summary
+
+- **Total Steps:** X
+- **Files Analyzed:** Y
+- **Final Confidence:** [level]
+
+## Key Findings
+
+[Bulleted list of major discoveries, ordered by importance]
+
+## Issues Identified
+
+### High Severity
+
+- [Issue with location and impact]
+
+### Medium Severity
+
+- [Issue with location and impact]
+
+### Low Severity
+
+- [Issue with location and impact]
+
+## Root Causes
+
+[Analysis of underlying causes, not just symptoms]
+
+## Hypothesis Evolution
+
+1. **Step 1 (exploring):** [Initial theory] → [outcome]
+2. **Step 3 (medium):** [Refined theory] → [outcome]
+3. **Step 5 (high):** [Final validated understanding]
+
+## Implementation Options
+
+### Option 1: [Approach Name]
+
+**Description:** [What this approach does and what problem it solves]
+
+**Pros:**
+
+- [Key advantage 1]
+- [Key advantage 2]
+
+**Cons:**
+
+- [Key disadvantage or limitation 1]
+- [Key disadvantage or limitation 2]
+
+### Option 2: [Alternative Approach]
+
+[Same format]
+
+### What NOT to Do
+
+[Tempting but problematic approaches to avoid, with brief explanation]
+
+## Trade-offs & Practical Considerations
+
+[Real-world engineering decisions: performance vs. maintainability, quick fix vs. proper solution, risks and mitigations]
+
+## Confidence Assessment
+
+[Explain why you reached your final confidence level. What would increase confidence further? What uncertainties remain?]
+```
+
+## Investigation Principles
+
+Throughout this process:
+
+1. **Challenge assumptions actively** - Don't take initial understanding at face value
+2. **Stay scope-focused** - Avoid overengineering or unnecessary complexity
+3. **Be practical** - Consider real-world trade-offs and constraints
+4. **Seek counterevidence** - Look for data that contradicts your theories
+5. **Document evolution** - Track how your understanding changes
+6. **Know when to stop** - Not every problem needs "certain" confidence
+
+## Special Instructions
+
+- **Never rush to conclusions** - Each step should reveal new insights
+- **Track confidence honestly** - Don't inflate or deflate your assessment
+- **Include specifics** - Cite file paths with line numbers where relevant
+- **If you need more context** - Ask the user for additional information
+- **If stuck** - Use the investigator agent to get unstuck with a fresh perspective
+
+---
+
+**Begin your investigation now. Start with Step 1 at confidence level "exploring".**
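Note on `commands/ultraplan.md`: two pieces of that workflow are mechanical enough to sketch, namely the "premature certainty" red flag and the "Hypothesis Evolution" list. The sketch below is not part of the commit. The function names are hypothetical, and the rendered line omits the per-step confidence label because the hypothesis records in the state structure do not carry one.

```python
from typing import Dict, List

# Confidence levels in the order the workflow lists them.
LEVELS = ["exploring", "low", "medium", "high",
          "very_high", "almost_certain", "certain"]


def is_premature_certainty(step_number: int, confidence: str) -> bool:
    """Flag high-or-above confidence claimed at step 1 or 2."""
    return step_number <= 2 and LEVELS.index(confidence) >= LEVELS.index("high")


def hypothesis_evolution(hypotheses: List[Dict]) -> List[str]:
    """Render hypothesis records as numbered Markdown lines, ordered by step."""
    ordered = sorted(hypotheses, key=lambda h: h["step"])
    return [
        f"{i}. **Step {h['step']}:** {h['hypothesis']} → {h['status']}"
        for i, h in enumerate(ordered, start=1)
    ]
```

Sorting by `step` before rendering keeps the list chronological even if hypotheses were appended to the state out of order.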