Initial commit

Zhongwei Li
2025-11-29 17:54:49 +08:00
commit ee11345c5b
28 changed files with 6747 additions and 0 deletions

commands/evi.md

@@ -0,0 +1,669 @@
---
description: Evaluate GitHub issue quality and completeness for agent implementation
argument-hint: [issue number or URL]
---
# EVI - Evaluate Issue Quality
Evaluates a GitHub issue to ensure it contains everything needed for the agent implementation framework to succeed.
## Input
**Issue Number or URL:**
```
$ARGUMENTS
```
## Agent Framework Requirements
The issue must support all agents in the pipeline:
**issue-implementer** needs:
- Clear requirements and end state
- Files/components to create or modify
- Edge cases and error handling specs
- Testing expectations
**quality-checker** needs:
- Testable acceptance criteria matching automated checks
- Performance benchmarks (if applicable)
- Expected test outcomes
**security-checker** needs:
- Security considerations and requirements
- Authentication/authorization specs
- Data handling requirements
**doc-checker** needs:
- Documentation update requirements
- What needs README/wiki updates
- API documentation expectations
**review-orchestrator** needs:
- Clear pass/fail criteria
- Non-negotiable vs nice-to-have requirements
**issue-merger** needs:
- Unambiguous "done" definition
- Integration requirements
## Evaluation Criteria
### ✅ CRITICAL (Must Have)
**1. Clear Requirements**
- Exact end state specified
- Technical details: names, types, constraints, behavior
- Files/components to create or modify
- Validation rules and error handling
- Edge cases covered
**2. Testable Acceptance Criteria**
- Specific, measurable outcomes
- Aligns with automated checks (pytest, ESLint, TypeScript, build)
- Includes edge cases
- References quality checks: "All tests pass", "ESLint passes", "Build succeeds"
**3. Affected Components**
- Which files/modules to modify
- Which APIs/endpoints involved
- Which database tables affected
- Which UI components changed
**4. Testing Expectations**
- What tests need to be written
- What tests need to pass
- Performance benchmarks (if applicable)
- Integration test requirements
**5. Context & Why**
- Business value
- User impact
- Current limitation
- Why this matters now
### ⚠️ IMPORTANT (Should Have)
**6. Security Requirements**
- Authentication/authorization needs
- Data privacy considerations
- Input validation requirements
- Security vulnerabilities to avoid
**7. Documentation Requirements**
- What needs README updates
- What needs wiki/API docs
- Inline code comments expected
- FEATURE-LIST.md updates
**8. Error Handling**
- Expected error messages
- Error codes to return
- Fallback behavior
- User-facing error text
**9. Scope Boundaries**
- What IS included
- What is NOT included
- Out of scope items
- Future work references
### 💡 HELPFUL (Nice to Have)
**10. Performance Requirements** (OPTIONAL - only if specific concern)
- Response time limits
- Query performance expectations
- Scale requirements
- Load handling
- **Note:** Most issues don't need this. Build first, measure, optimize later.
**11. Related Issues**
- Dependencies (blocked by, depends on)
- Related work
- Follow-up issues planned
**12. Implementation Guidance**
- Problems agent needs to solve
- Existing patterns to follow
- Challenges to consider
- No prescriptive solutions (guide the agent toward the problem, don't prescribe the answer)
### ❌ RED FLAGS (Must NOT Have)
**13. Prescriptive Implementation**
- Complete function/component implementations (full solutions)
- Large blocks of working code (more than 15 lines)
- Complete SQL migration scripts
- Step-by-step implementation guide
- "Add this code to line X" with specific file/line fixes
**✅ OK to have:**
- Short code examples (< 10 lines): `{ error: 'message', code: 400 }`
- Type definitions: `{ email: string, category?: string }`
- Example API responses: `{ id: 5, status: 'active' }`
- Error message formats: `"Invalid email format"`
- Small syntax examples showing the shape/format
## Evaluation Process
### STEP 1: Fetch the Issue
```bash
ISSUE_NUM=$ARGUMENTS
# Fetch issue content
gh issue view $ISSUE_NUM --json title,body,labels > issue-data.json
TITLE=$(jq -r '.title' issue-data.json)
BODY=$(jq -r '.body' issue-data.json)
LABELS=$(jq -r '.labels[].name' issue-data.json)
echo "Evaluating Issue #${ISSUE_NUM}: ${TITLE}"
```
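The argument hint allows a full URL, but the snippet above consumes `$ARGUMENTS` as a bare number. A minimal normalization sketch, assuming a GitHub-style URL whose last path segment is the issue number (`gh issue view` accepts URLs directly, but later steps reuse `$ISSUE_NUM` as a number):
```bash
# Accept either "123" or "https://github.com/owner/repo/issues/123".
case "$ARGUMENTS" in
http*://*) ISSUE_NUM="${ARGUMENTS##*/}" ;;  # last path segment
*)         ISSUE_NUM="$ARGUMENTS" ;;
esac
# Guard: the result must be all digits.
[[ "$ISSUE_NUM" =~ ^[0-9]+$ ]] || { echo "❌ Invalid issue reference: $ARGUMENTS"; exit 1; }
```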
### STEP 2: Check Agent Framework Compatibility
Evaluate for each agent in the pipeline:
**For issue-implementer:**
```bash
# Check: Clear requirements present?
echo "$BODY" | grep -iE '(requirements|end state|target state)' || echo "⚠️ No clear requirements section"
# Check: Files/components specified?
echo "$BODY" | grep -iE '(files? (to )?(create|modify)|components? (to )?(create|modify|affected)|tables? (to )?(create|modify))' || echo "⚠️ No affected files/components specified"
# Check: Edge cases covered?
echo "$BODY" | grep -iE '(edge cases?|error handling|fallback|validation)' || echo "⚠️ No edge cases or error handling mentioned"
```
**For quality-checker:**
```bash
# Check: Acceptance criteria reference automated checks?
echo "$BODY" | grep -iE '(tests? pass|eslint|flake8|mypy|pytest|typescript|build succeeds|linting passes)' || echo "⚠️ Acceptance criteria don't reference automated quality checks"
# Note: Performance requirements are optional - don't check for them
```
**For security-checker:**
```bash
# Check: Security requirements present if handling sensitive data?
if echo "$BODY" | grep -iE '(password|token|secret|api.?key|auth|credential|email|private)'; then
echo "$BODY" | grep -iE '(security|authentication|authorization|encrypt|sanitize|validate|sql.?injection|xss)' || echo "⚠️ Handles sensitive data but no security requirements"
fi
```
**For doc-checker:**
```bash
# Check: Documentation requirements specified?
echo "$BODY" | grep -iE '(documentation|readme|wiki|api.?docs?|comments?|feature.?list)' || echo "⚠️ No documentation requirements specified"
```
**For review-orchestrator:**
```bash
# Check: Clear pass/fail criteria?
CRITERIA_COUNT=$(grep -c -e '- \[.\]' <<< "$BODY" || true)
if [ "$CRITERIA_COUNT" -lt 3 ]; then
echo "⚠️ Only $CRITERIA_COUNT acceptance criteria (need at least 3-5)"
fi
```
### STEP 3: Evaluate Testable Acceptance Criteria
Check whether the criteria align with automated checks (a heuristic scan for vague phrasing follows the lists below):
**Good criteria (matches quality-checker):**
- [x] "All pytest tests pass" → quality-checker runs pytest
- [x] "ESLint passes with no errors" → quality-checker runs ESLint
- [x] "TypeScript compilation succeeds" → quality-checker runs tsc
- [x] "Query completes in < 100ms" → measurable, testable
- [x] "Can insert 5 accounts per user" → specific, verifiable
**Bad criteria (vague/unmeasurable):**
- [ ] "Works correctly" → subjective
- [ ] "Code looks good" → subjective
- [ ] "Function created" → process-oriented, not outcome
- [ ] "Bug is fixed" → not specific enough
### STEP 4: Check for Affected Components
Does the issue specify the following? (A path-extraction sketch follows this list.)
- Which files to create?
- Which files to modify?
- Which APIs/endpoints involved?
- Which database tables affected?
- Which UI components changed?
- Which tests need updating?
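A rough way to surface which files an issue mentions is to grep for path-like tokens. A sketch; the extension list is an assumption to adapt per stack:
```bash
# Heuristic: list file paths mentioned in the issue body.
# The extension list is an assumption - adjust for your stack.
grep -oE '[A-Za-z0-9_./-]+\.(ts|tsx|js|py|sql|md|json|ya?ml)' <<< "$BODY" | sort -u
```
An empty result here is a strong hint that affected components are unspecified.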
### STEP 5: Evaluate Testing Expectations
Does issue specify:
- What tests to write (unit, integration, e2e)?
- What existing tests need to pass?
- What test coverage is expected?
- Performance test requirements?
### STEP 6: Check Security Requirements
**Security (if applicable):**
- Authentication/authorization requirements?
- Input validation specs?
- Data privacy considerations?
- Known vulnerabilities to avoid?
**Note on Performance:**
Performance requirements are OPTIONAL and NOT evaluated as a critical check. Most issues don't need explicit performance requirements - build first, measure, optimize later. Only flag as missing if:
- Issue specifically mentions performance as a concern
- Feature handles large-scale data (millions of records)
- There's a user-facing latency requirement
Otherwise, absence of performance requirements is fine.
### STEP 7: Check Documentation Requirements
Does issue specify:
- README updates needed?
- Wiki/API docs updates?
- Inline comments expected?
- FEATURE-LIST.md updates?
### STEP 8: Scan for Red Flags
```bash
# RED FLAG: Large code blocks (> 15 lines)
# Count lines in code blocks
CODE_BLOCKS=$(grep -E '```(typescript|javascript|python|sql|java|go|rust)' <<< "$BODY")
if [ -n "$CODE_BLOCKS" ]; then
# Check if any code block has > 15 lines
# (Simplified check - the awk sketch after this block counts lines per block)
echo "⚠️ Found code blocks - checking size..."
# Manual review needed: are these short examples or full implementations?
fi
# RED FLAG: Complete function implementations
grep -iE '((function|class)[[:space:]]+[[:alnum:]_]+.*\{|def[[:space:]]+[[:alnum:]_]+\(.*\):)' <<< "$BODY" && echo "🚨 Contains complete function/class implementations"
# RED FLAG: Prescriptive instructions with specific fixes
grep -iE '(add (this|the following) code to line [0-9]+|here is the implementation|use this exact code)' <<< "$BODY" && echo "🚨 Contains prescriptive code placement instructions"
# RED FLAG: Specific file/line references for bug fixes
grep -iE '(fix|change|modify|add).*(in|at|on) line [0-9]+' <<< "$BODY" && echo "🚨 Contains specific file/line fix locations"
# RED FLAG: Step-by-step implementation guide (not just planning)
grep -iE '(step [0-9]|first,.*second,.*third)' <<< "$BODY" | grep -iE '(write this code|add this function|implement as follows)' && echo "🚨 Contains step-by-step code implementation guide"
# OK: Short examples are fine
# These are NOT red flags:
# - Type definitions: { email: string }
# - Example responses: { id: 5, status: 'active' }
# - Error formats: "Invalid email"
# - Small syntax examples (< 10 lines)
```
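The simplified check above only detects that fenced code exists. A per-block line count is a short awk script; a minimal sketch, assuming well-formed triple-backtick fences (nested fences are not handled):
```bash
# Flag any fenced code block longer than 15 lines (the red-flag threshold).
awk '
/^```/ {
  if (in_block) {        # closing fence: report oversized block
    if (n > 15) printf "🚨 %d-line code block starting at line %d\n", n, start
    in_block = 0
  } else {               # opening fence
    in_block = 1; n = 0; start = NR
  }
  next
}
in_block { n++ }
' <<< "$BODY"
```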
## Output Format
````markdown
# Issue #${ISSUE_NUM} Evaluation Report
**Title:** ${TITLE}
## Agent Framework Compatibility: X/9 Checks Passed (Critical + Important)
**Ready for implementation?** [YES / NEEDS_WORK / NO]
**Note:** Performance, Related Issues, and Implementation Guidance are optional - not counted in score.
### ✅ Strengths (What's Good)
- [Specific strength 1]
- [Specific strength 2]
- [Specific strength 3]
### ⚠️ Needs Improvement (Fix Before Implementation)
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]
### ❌ Critical Issues (Blocks Agent Success)
- [Critical missing element / red flag]
- [Critical missing element / red flag]
### 🚨 Red Flags Found
- [Code snippet / prescriptive instruction found]
- [Specific file/line reference found]
## Agent-by-Agent Analysis
### issue-implementer Readiness: [READY / NEEDS_WORK / BLOCKED]
**Can the implementer succeed with this issue?**
**Has:**
- Clear requirements and end state
- Files/components to modify specified
- Edge cases covered
**Missing:**
- Error handling specifications
- Validation rules not detailed
**Impact:** [Description of how missing elements affect implementer]
### quality-checker Readiness: [READY / NEEDS_WORK / BLOCKED]
**Can the quality checker validate this?**
**Has:**
- Acceptance criteria reference "pytest passes"
- Performance benchmark: "< 100ms"
**Missing:**
- No reference to ESLint/TypeScript checks
- Test coverage expectations not specified
**Impact:** [Description of how missing elements affect quality validation]
### security-checker Readiness: [READY / NEEDS_WORK / N/A]
**Are security requirements clear?**
**Has:**
- Authentication requirements specified
- Input validation rules defined
**Missing:**
- No SQL injection prevention mentioned
- Data encryption requirements unclear
**Impact:** [Description of security gaps]
### doc-checker Readiness: [READY / NEEDS_WORK / BLOCKED]
**Are documentation expectations clear?**
**Has:**
- README update requirement specified
**Missing:**
- No API documentation mentioned
- FEATURE-LIST.md updates not specified
**Impact:** [Description of doc gaps]
### review-orchestrator Readiness: [READY / NEEDS_WORK / BLOCKED]
**Are pass/fail criteria clear?**
**Has:**
- 5 specific acceptance criteria
- Clear success conditions
**Missing:**
- Distinction between blocking vs non-blocking issues
- Performance criteria not measurable
**Impact:** [Description of review ambiguity]
## Recommendations
### High Priority (Fix Before Implementation)
1. [Specific actionable fix]
2. [Specific actionable fix]
3. [Specific actionable fix]
### Medium Priority (Improve Clarity)
1. [Specific suggestion]
2. [Specific suggestion]
### Low Priority (Nice to Have)
1. [Optional improvement]
2. [Optional improvement]
## Example Improvements
### Before (Current):
```markdown
[Quote problematic section from issue]
```
### After (Suggested):
```markdown
[Show how it should be written]
```
## Agent Framework Compatibility
**Will this issue work well with the agent framework?**
- ✅ YES - Issue is well-structured for agents
- ⚠️ MAYBE - Needs improvements but workable
- ❌ NO - Critical issues will confuse agents
**Specific concerns:**
- [Agent compatibility issue 1]
- [Agent compatibility issue 2]
## Quick Fixes
If you want to improve this issue, run:
```bash
# Add these sections
gh issue comment $ISSUE_NUM --body "## Implementation Guidance
To implement this, you will need to:
- [List problems to solve]
"
# Remove code snippets
# Edit the issue and remove code blocks showing implementation
gh issue edit $ISSUE_NUM
# Add acceptance criteria
gh issue comment $ISSUE_NUM --body "## Additional Acceptance Criteria
- [ ] [Specific testable criterion]
"
```
## Summary
**Ready for agent implementation?** [YES/NO/NEEDS_WORK]
**Confidence level:** [HIGH/MEDIUM/LOW]
**Estimated time to fix issues:** [X minutes]
````
## Agent Framework Compatibility Score
**13 Checks (Pass/Fail):**
### ✅ CRITICAL (Must Pass) - 5 checks
1. **Clear Requirements** - End state specified with technical details
2. **Testable Acceptance Criteria** - Aligns with automated checks (pytest, ESLint, etc.)
3. **Affected Components** - Files/modules to create or modify listed
4. **Testing Expectations** - What tests to write/pass specified
5. **Context** - Why, who, business value explained
### ⚠️ IMPORTANT (Should Pass) - 4 checks
6. **Security Requirements** - If handling sensitive data/auth
7. **Documentation Requirements** - README/wiki/API docs expectations
8. **Error Handling** - Error messages, codes, fallback behavior
9. **Scope Boundaries** - What IS and ISN'T included
### 💡 HELPFUL (Nice to Pass) - 3 checks
10. **Performance Requirements** - OPTIONAL, only if specific concern mentioned
11. **Related Issues** - Dependencies, related work
12. **Implementation Guidance** - Problems to solve (without prescribing HOW)
### ❌ RED FLAG (Must NOT Have) - 1 check
13. **No Prescriptive Implementation** - No full implementations, prescribed algorithms, or step-by-step code guides (short format examples are fine)
**Scoring (based on 9 critical/important checks):**
- **9/9 + No Red Flags**: Perfect - Ready for agents
- **7-8/9 + No Red Flags**: Excellent - Minor improvements
- **5-6/9 + No Red Flags**: Good - Some gaps but workable
- **3-4/9 + No Red Flags**: Needs work - Significant gaps
- **< 3/9 OR Red Flags Present**: Blocked - Must fix before implementation
**Note:** Performance, Related Issues, and Implementation Guidance are HELPFUL but not required.
**Ready for Implementation?** (see the sketch after this list)
- **YES**: 7+ checks passed, no red flags
- **NEEDS_WORK**: 5-6 checks passed OR minor red flags
- **NO**: < 5 checks passed OR major red flags (code snippets, step-by-step)
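The thresholds above map directly onto a small helper. A minimal sketch, using hypothetical variables: `PASSED` (critical/important checks passed, 0-9) plus `MINOR_FLAGS` and `MAJOR_FLAGS` (red-flag counts from STEP 8):
```bash
# Map check results to a readiness verdict.
if [ "$MAJOR_FLAGS" -gt 0 ] || [ "$PASSED" -lt 5 ]; then
VERDICT="NO"
elif [ "$MINOR_FLAGS" -gt 0 ] || [ "$PASSED" -lt 7 ]; then
VERDICT="NEEDS_WORK"
else
VERDICT="YES"
fi
echo "Ready for implementation? $VERDICT"
```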
## Examples
### Example 1: Excellent Issue (9/9 Checks Passed)
```
Issue #42: "Support multiple Google accounts per user"
✅ Agent Framework Compatibility: 9/9
**Ready for implementation?** YES
Strengths:
- Clear database requirements (tables, columns, constraints)
- Acceptance criteria: "pytest passes", "Build succeeds"
- Files specified: user_google_accounts table, modify user_google_tokens
- Testing: "Write tests for multi-account insertion"
- Security: "Validate email uniqueness per user"
- Documentation: "Update README with multi-account setup"
- Error handling: "Return 409 if duplicate email"
- Scope: "Account switching UI is out of scope (Issue #43)"
- Context: Explains why users need personal + work accounts
Agent-by-Agent:
✅ implementer: Has everything needed
✅ quality-checker: Criteria match automated checks
✅ security-checker: Security requirements clear
✅ doc-checker: Documentation expectations specified
✅ review-orchestrator: Clear pass/fail criteria
Note: No performance requirements specified - this is fine. Build first, optimize later if needed.
```
### Example 2: Needs Work (6/9 Checks Passed)
```
Issue #55: "Add email search functionality"
⚠️ Agent Framework Compatibility: 6/9
**Ready for implementation?** NEEDS_WORK
Strengths:
- Clear requirements: search endpoint, query parameters
- Context explains user need
- Affected components specified
Issues:
❌ Missing testing expectations (what tests to write?)
❌ Missing documentation requirements
❌ Missing error handling specs
❌ Vague acceptance criteria: "Search works correctly"
Agent-by-Agent:
✅ implementer: Has requirements
⚠️ quality-checker: Criteria too vague ("works correctly")
⚠️ security-checker: No SQL injection prevention mentioned
⚠️ doc-checker: No docs requirements
⚠️ review-orchestrator: Criteria not specific enough
Recommendations:
- Add: "pytest tests pass for search functionality"
- Add: "Update API documentation"
- Add: "Error handling: Return 400 if invalid query, 500 if database error"
- Replace "works correctly" with specific outcomes: "Returns matching emails", "Handles empty results"
Note: Performance requirements are optional - don't need to add unless there's a specific concern.
```
### Example 3: Blocked (2/9, Red Flags Present)
```
Issue #67: "Fix email parsing bug"
❌ Agent Framework Compatibility: 2/9
**Ready for implementation?** NO
Critical Issues:
🚨 Contains 30-line complete function implementation
🚨 Prescriptive: "Add this exact function to inbox.ts line 45"
🚨 Step-by-step: "First, create parseEmail(). Second, add validation. Third, handle errors."
❌ No clear requirements (what should parsing do?)
❌ No acceptance criteria
❌ No testing expectations
Agent-by-Agent:
❌ implementer: Doesn't need to think - solution provided
❌ quality-checker: Can't validate (no criteria)
❌ security-checker: Missing validation specs
❌ doc-checker: No docs requirements
❌ review-orchestrator: No pass/fail criteria
Must fix:
1. Remove complete function - describe WHAT parsing should do instead
- ✅ OK to keep: "Output format: `{ local: string, domain: string }`"
- ✅ OK to keep: "Error message: 'Invalid email format'"
- ❌ Remove: 30-line function implementation
2. Add requirements: input format, output format, error handling
3. Add acceptance criteria: "Parses valid emails", "Rejects invalid formats"
4. Specify testing: "Add unit tests for edge cases"
5. Remove line 45 reference - describe behavior instead
6. Convert step-by-step to "Implementation Guidance" section
```
### Example 4: Good Use of Code Examples (9/9)
```
Issue #78: "Add user profile endpoint"
✅ Agent Framework Compatibility: 9/9
**Ready for implementation?** YES
Requirements:
- Endpoint: POST /api/profile
- Request body: `{ name: string, bio?: string }` ← Short example (OK!)
- Response: `{ id: number, name: string, bio: string | null }` ← Type def (OK!)
- Error: `{ error: "Name required", code: 400 }` ← Error format (OK!)
✅ Good use of code examples:
- Shows shape/format without providing implementation
- Type definitions help clarify requirements
- Error format specifies exact user-facing text
- No function implementations
Agent-by-Agent:
✅ implementer: Clear requirements, no prescribed solution
✅ quality-checker: Testable criteria
✅ security-checker: Input validation specified
✅ doc-checker: API docs requirement present
```
## Integration with GHI Command
This command complements GHI:
- **GHI**: Creates new issues following best practices
- **EVI**: Evaluates existing issues against same standards
Use EVI to:
- Review issues written by other agents
- Audit issues before sending to implementation
- Train team on issue quality standards
- Ensure agent framework compatibility
## Notes
- Run EVI before implementing any issue
- Use it to review issues written by other agents or teammates
- Consider making EVI a required check in your workflow
- **7+ checks passed**: Agent-ready, implement immediately
- **5-6 checks passed**: Needs minor improvements, but workable
- **< 5 checks passed**: Must revise before implementation
- Look for red flags even in high-scoring issues
### What About Code Snippets?
**✅ Short examples are GOOD:**
- Type definitions: `interface User { id: number, name: string }`
- API responses: `{ success: true, data: {...} }`
- Error formats: `{ error: "message", code: 400 }`
- Small syntax examples (< 10 lines)
**❌ Large implementations are BAD:**
- Complete functions (15+ lines)
- Full component implementations
- Complete SQL migration scripts
- Working solutions that agents can copy-paste
**The key distinction:** Are you showing the shape/format (good) or providing the solution (bad)?