---
description: Evaluate GitHub issue quality and completeness for agent implementation
argument-hint: [issue number or URL]
---

# EVI - Evaluate Issue Quality

Evaluates a GitHub issue to ensure it contains everything needed for the agent implementation framework to succeed.

## Input

**Issue Number or URL:**

```
$ARGUMENTS
```

## Agent Framework Requirements

The issue must support all agents in the pipeline:

**issue-implementer** needs:
- Clear requirements and end state
- Files/components to create or modify
- Edge cases and error handling specs
- Testing expectations

**quality-checker** needs:
- Testable acceptance criteria matching automated checks
- Performance benchmarks (if applicable)
- Expected test outcomes

**security-checker** needs:
- Security considerations and requirements
- Authentication/authorization specs
- Data handling requirements

**doc-checker** needs:
- Documentation update requirements
- What needs README/wiki updates
- API documentation expectations

**review-orchestrator** needs:
- Clear pass/fail criteria
- Non-negotiable vs nice-to-have requirements

**issue-merger** needs:
- Unambiguous "done" definition
- Integration requirements

## Evaluation Criteria

### ✅ CRITICAL (Must Have)

**1. Clear Requirements**
- Exact end state specified
- Technical details: names, types, constraints, behavior
- Files/components to create or modify
- Validation rules and error handling
- Edge cases covered

**2. Testable Acceptance Criteria**
- Specific, measurable outcomes
- Align with automated checks (pytest, ESLint, TypeScript, build)
- Include edge cases
- Reference quality checks: "All tests pass", "ESLint passes", "Build succeeds"

**3. Affected Components**
- Which files/modules to modify
- Which APIs/endpoints are involved
- Which database tables are affected
- Which UI components change

**4. Testing Expectations**
- What tests need to be written
- What tests need to pass
- Performance benchmarks (if applicable)
- Integration test requirements

**5. Context & Why**
- Business value
- User impact
- Current limitation
- Why this matters now

### ⚠️ IMPORTANT (Should Have)

**6. Security Requirements**
- Authentication/authorization needs
- Data privacy considerations
- Input validation requirements
- Security vulnerabilities to avoid

**7. Documentation Requirements**
- What needs README updates
- What needs wiki/API docs
- Inline code comments expected
- FEATURE-LIST.md updates

**8. Error Handling**
- Expected error messages
- Error codes to return
- Fallback behavior
- User-facing error text

**9. Scope Boundaries**
- What IS included
- What is NOT included
- Out-of-scope items
- Future work references

### 💡 HELPFUL (Nice to Have)

**10. Performance Requirements** (OPTIONAL - only if there is a specific concern)
- Response time limits
- Query performance expectations
- Scale requirements
- Load handling
- **Note:** Most issues don't need this. Build first, measure, optimize later.

**11. Related Issues**
- Dependencies (blocked by, depends on)
- Related work
- Follow-up issues planned

**12. Implementation Guidance**
- Problems the agent needs to solve
- Existing patterns to follow
- Challenges to consider
- No prescriptive solutions (guides, doesn't prescribe)

### ❌ RED FLAGS (Must NOT Have)

**13. Prescriptive Implementation**
- Complete function/component implementations (full solutions)
- Large blocks of working code (> 10-15 lines)
- Complete SQL migration scripts
- Step-by-step implementation guides
- "Add this code to line X" with specific file/line fixes

**✅ OK to have:**
- Short code examples (< 10 lines): `{ error: 'message', code: 400 }`
- Type definitions: `{ email: string, category?: string }`
- Example API responses: `{ id: 5, status: 'active' }`
- Error message formats: `"Invalid email format"`
- Small syntax examples showing the shape/format

## Evaluation Process

### STEP 1: Fetch the Issue

```bash
ISSUE_NUM=$ARGUMENTS

# Fetch issue content
gh issue view "$ISSUE_NUM" --json title,body,labels > issue-data.json
TITLE=$(jq -r '.title' issue-data.json)
BODY=$(jq -r '.body' issue-data.json)
LABELS=$(jq -r '.labels[].name' issue-data.json)

echo "Evaluating Issue #${ISSUE_NUM}: ${TITLE}"
```

### STEP 2: Check Agent Framework Compatibility

Evaluate for each agent in the pipeline:

**For issue-implementer:**
```bash
# Check: Clear requirements present?
echo "$BODY" | grep -qiE '(requirements|end state|target state)' || echo "⚠️ No clear requirements section"

# Check: Files/components specified?
echo "$BODY" | grep -qiE '(files? (to )?(create|modify)|components? (to )?(create|modify|affected)|tables? (to )?(create|modify))' || echo "⚠️ No affected files/components specified"

# Check: Edge cases covered?
echo "$BODY" | grep -qiE '(edge cases?|error handling|fallback|validation)' || echo "⚠️ No edge cases or error handling mentioned"
```

**For quality-checker:**
```bash
# Check: Acceptance criteria reference automated checks?
echo "$BODY" | grep -qiE '(tests? pass|eslint|flake8|mypy|pytest|typescript|build succeeds|linting passes)' || echo "⚠️ Acceptance criteria don't reference automated quality checks"

# Note: Performance requirements are optional - don't check for them
```

**For security-checker:**
```bash
# Check: Security requirements present if handling sensitive data?
if echo "$BODY" | grep -qiE '(password|token|secret|api.?key|auth|credential|email|private)'; then
  echo "$BODY" | grep -qiE '(security|authentication|authorization|encrypt|sanitize|validate|sql.?injection|xss)' || echo "⚠️ Handles sensitive data but no security requirements"
fi
```

**For doc-checker:**
```bash
# Check: Documentation requirements specified?
echo "$BODY" | grep -qiE '(documentation|readme|wiki|api.?docs?|comments?|feature.?list)' || echo "⚠️ No documentation requirements specified"
```

**For review-orchestrator:**
```bash
# Check: Clear pass/fail criteria?
# grep -c always prints a count (0 when nothing matches), so no fallback is needed
CRITERIA_COUNT=$(echo "$BODY" | grep -c '- \[.\]')
if [ "$CRITERIA_COUNT" -lt 3 ]; then
  echo "⚠️ Only $CRITERIA_COUNT acceptance criteria (need at least 3-5)"
fi
```

### STEP 3: Evaluate Testable Acceptance Criteria

Check whether criteria align with automated checks:

**Good criteria (match quality-checker):**
- [x] "All pytest tests pass" → quality-checker runs pytest
- [x] "ESLint passes with no errors" → quality-checker runs ESLint
- [x] "TypeScript compilation succeeds" → quality-checker runs tsc
- [x] "Query completes in < 100ms" → measurable, testable
- [x] "Can insert 5 accounts per user" → specific, verifiable

**Bad criteria (vague/unmeasurable):**
- [ ] "Works correctly" → subjective
- [ ] "Code looks good" → subjective
- [ ] "Function created" → process-oriented, not an outcome
- [ ] "Bug is fixed" → not specific enough

### STEP 4: Check for Affected Components

Does the issue specify:
- Which files to create?
- Which files to modify?
- Which APIs/endpoints are involved?
- Which database tables are affected?
- Which UI components change?
- Which tests need updating?
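The vague-vs-testable distinction from STEP 3 can also be checked mechanically. A minimal sketch, assuming a hardcoded `BODY` stand-in (the real command populates it in STEP 1) and an illustrative, non-exhaustive phrase list:

```shell
#!/usr/bin/env bash
# Sketch: flag acceptance criteria that match known vague phrases.
# BODY is a stand-in here; STEP 1 fetches the real issue body.
BODY='- [ ] Works correctly
- [ ] All pytest tests pass
- [ ] Code looks good'

# Illustrative (not exhaustive) list of vague phrases from STEP 3
VAGUE='works correctly|looks good|bug is fixed|function created'

grep -iE "$VAGUE" <<< "$BODY" | sed 's/^/⚠️ vague: /'
VAGUE_COUNT=$(grep -ciE "$VAGUE" <<< "$BODY")
echo "Found $VAGUE_COUNT vague criteria"
```

Anything this flags should be rewritten as a measurable outcome, e.g. "Returns matching emails" instead of "works correctly".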
### STEP 5: Evaluate Testing Expectations

Does the issue specify:
- What tests to write (unit, integration, e2e)?
- What existing tests need to pass?
- What test coverage is expected?
- Performance test requirements?

### STEP 6: Check Security Requirements

**Security (if applicable):**
- Authentication/authorization requirements?
- Input validation specs?
- Data privacy considerations?
- Known vulnerabilities to avoid?

**Note on Performance:** Performance requirements are OPTIONAL and NOT evaluated as a critical check. Most issues don't need explicit performance requirements - build first, measure, optimize later. Only flag them as missing if:
- The issue specifically mentions performance as a concern
- The feature handles large-scale data (millions of records)
- There's a user-facing latency requirement

Otherwise, the absence of performance requirements is fine.

### STEP 7: Check Documentation Requirements

Does the issue specify:
- README updates needed?
- Wiki/API docs updates?
- Inline comments expected?
- FEATURE-LIST.md updates?

### STEP 8: Scan for Red Flags

````bash
# RED FLAG: Large code blocks (> 15 lines)
# Count the lines inside each fenced code block and flag oversized ones
MAX_BLOCK=$(awk '/^```/ { if (in_block) { if (n > max) max = n; in_block = 0 } else { in_block = 1; n = 0 }; next } in_block { n++ } END { print max + 0 }' <<< "$BODY")
if [ "$MAX_BLOCK" -gt 15 ]; then
  echo "🚨 Largest code block is $MAX_BLOCK lines (> 15) - likely a full implementation, not an example"
fi

# RED FLAG: Complete function implementations
grep -iE '(function|def|class).*\{' <<< "$BODY" && echo "🚨 Contains complete function/class implementations"

# RED FLAG: Prescriptive instructions with specific fixes
grep -iE '(add (this|the following) code to line [0-9]+|here is the implementation|use this exact code)' <<< "$BODY" && echo "🚨 Contains prescriptive code placement instructions"

# RED FLAG: Specific file/line references for bug fixes
grep -E '(fix|change|modify|add).*(in|at|on) line [0-9]+' <<< "$BODY" && echo "🚨 Contains specific file/line fix locations"

# RED FLAG: Step-by-step implementation guide (not just planning)
grep -iE '(step [0-9]|first,.*second,.*third)' <<< "$BODY" | grep -iE '(write this code|add this function|implement as follows)' && echo "🚨 Contains step-by-step code implementation guide"

# OK: Short examples are fine
# These are NOT red flags:
# - Type definitions: { email: string }
# - Example responses: { id: 5, status: 'active' }
# - Error formats: "Invalid email"
# - Small syntax examples (< 10 lines)
````

## Output Format

````markdown
# Issue #${ISSUE_NUM} Evaluation Report

**Title:** ${TITLE}

## Agent Framework Compatibility: X/9 Critical Checks Passed

**Ready for implementation?** [YES / NEEDS_WORK / NO]

**Note:** Performance, Related Issues, and Implementation Guidance are optional - not counted in the score.
### ✅ Strengths (What's Good)
- [Specific strength 1]
- [Specific strength 2]
- [Specific strength 3]

### ⚠️ Needs Improvement (Fix Before Implementation)
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]

### ❌ Critical Issues (Blocks Agent Success)
- [Critical missing element / red flag]
- [Critical missing element / red flag]

### 🚨 Red Flags Found
- [Code snippet / prescriptive instruction found]
- [Specific file/line reference found]

## Agent-by-Agent Analysis

### issue-implementer Readiness: [READY / NEEDS_WORK / BLOCKED]

**Can the implementer succeed with this issue?**

✅ **Has:**
- Clear requirements and end state
- Files/components to modify specified
- Edge cases covered

❌ **Missing:**
- Error handling specifications
- Validation rules not detailed

**Impact:** [Description of how missing elements affect the implementer]

### quality-checker Readiness: [READY / NEEDS_WORK / BLOCKED]

**Can the quality checker validate this?**

✅ **Has:**
- Acceptance criteria reference "pytest passes"
- Performance benchmark: "< 100ms"

❌ **Missing:**
- No reference to ESLint/TypeScript checks
- Test coverage expectations not specified

**Impact:** [Description of how missing elements affect quality validation]

### security-checker Readiness: [READY / NEEDS_WORK / N/A]

**Are security requirements clear?**

✅ **Has:**
- Authentication requirements specified
- Input validation rules defined

❌ **Missing:**
- No SQL injection prevention mentioned
- Data encryption requirements unclear

**Impact:** [Description of security gaps]

### doc-checker Readiness: [READY / NEEDS_WORK / BLOCKED]

**Are documentation expectations clear?**

✅ **Has:**
- README update requirement specified

❌ **Missing:**
- No API documentation mentioned
- FEATURE-LIST.md updates not specified

**Impact:** [Description of doc gaps]

### review-orchestrator Readiness: [READY / NEEDS_WORK / BLOCKED]

**Are pass/fail criteria clear?**

✅ **Has:**
- 5 specific acceptance criteria
- Clear success conditions

❌ **Missing:**
- Distinction between blocking and non-blocking issues
- Performance criteria not measurable

**Impact:** [Description of review ambiguity]

## Recommendations

### High Priority (Fix Before Implementation)
1. [Specific actionable fix]
2. [Specific actionable fix]
3. [Specific actionable fix]

### Medium Priority (Improve Clarity)
1. [Specific suggestion]
2. [Specific suggestion]

### Low Priority (Nice to Have)
1. [Optional improvement]
2. [Optional improvement]

## Example Improvements

### Before (Current):
```markdown
[Quote problematic section from issue]
```

### After (Suggested):
```markdown
[Show how it should be written]
```

## Agent Framework Compatibility

**Will this issue work well with the agent framework?**

- ✅ YES - Issue is well-structured for agents
- ⚠️ MAYBE - Needs improvements but workable
- ❌ NO - Critical issues will confuse agents

**Specific concerns:**
- [Agent compatibility issue 1]
- [Agent compatibility issue 2]

## Quick Fixes

If you want to improve this issue, run:

```bash
# Add these sections
gh issue comment "$ISSUE_NUM" --body "## Implementation Guidance

To implement this, you will need to:
- [List problems to solve]
"

# Remove code snippets
# Edit the issue and remove code blocks showing implementation
gh issue edit "$ISSUE_NUM"

# Add acceptance criteria
gh issue comment "$ISSUE_NUM" --body "## Additional Acceptance Criteria

- [ ] [Specific testable criterion]
"
```

## Summary

**Ready for agent implementation?** [YES/NO/NEEDS_WORK]

**Confidence level:** [HIGH/MEDIUM/LOW]

**Estimated time to fix issues:** [X minutes]
````

## Agent Framework Compatibility Score

**13 Checks (Pass/Fail):**

### ✅ CRITICAL (Must Pass) - 5 checks
1. **Clear Requirements** - End state specified with technical details
2. **Testable Acceptance Criteria** - Align with automated checks (pytest, ESLint, etc.)
3. **Affected Components** - Files/modules to create or modify listed
4. **Testing Expectations** - What tests to write/pass specified
5. **Context** - Why, who, business value explained

### ⚠️ IMPORTANT (Should Pass) - 4 checks
6. **Security Requirements** - If handling sensitive data/auth
7. **Documentation Requirements** - README/wiki/API docs expectations
8. **Error Handling** - Error messages, codes, fallback behavior
9. **Scope Boundaries** - What IS and ISN'T included

### 💡 HELPFUL (Nice to Pass) - 3 checks
10. **Performance Requirements** - OPTIONAL, only if a specific concern is mentioned
11. **Related Issues** - Dependencies, related work
12. **Implementation Guidance** - Problems to solve (without prescribing HOW)

### ❌ RED FLAG (Must NOT Have) - 1 check
13. **No Prescriptive Implementation** - No code snippets, algorithms, or step-by-step guides

**Scoring (based on the 9 critical/important checks):**
- **9/9 + no red flags**: Perfect - ready for agents
- **7-8/9 + no red flags**: Excellent - minor improvements
- **5-6/9 + no red flags**: Good - some gaps but workable
- **3-4/9 + no red flags**: Needs work - significant gaps
- **< 3/9 OR red flags present**: Blocked - must fix before implementation

**Note:** Performance, Related Issues, and Implementation Guidance are HELPFUL but not required.
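The scoring bands above can be turned into a small helper for the report generator. A minimal sketch, assuming a 0-9 score over the critical/important checks and a red-flag count from STEP 8; `score_band` is an illustrative name, not part of the EVI spec:

```shell
#!/usr/bin/env bash
# Sketch: map the 9-check score plus red-flag count to a scoring band.
# score_band is a hypothetical helper; bands mirror this guide's table.
score_band() {
  local score=$1 red_flags=$2
  if [ "$red_flags" -gt 0 ] || [ "$score" -lt 3 ]; then
    echo "Blocked"
  elif [ "$score" -le 4 ]; then
    echo "Needs work"
  elif [ "$score" -le 6 ]; then
    echo "Good"
  elif [ "$score" -le 8 ]; then
    echo "Excellent"
  else
    echo "Perfect"
  fi
}

score_band 9 0   # Perfect
score_band 6 0   # Good
score_band 8 1   # Blocked - red flags override the numeric score
```

Note that red flags dominate: a high score cannot rescue an issue that prescribes the implementation.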
**Ready for Implementation?**
- **YES**: 7+ checks passed, no red flags
- **NEEDS_WORK**: 5-6 checks passed OR minor red flags
- **NO**: < 5 checks passed OR major red flags (code snippets, step-by-step guides)

## Examples

### Example 1: Excellent Issue (9/9 Checks Passed)

```
Issue #42: "Support multiple Google accounts per user"

✅ Agent Framework Compatibility: 9/9
**Ready for implementation?** YES

Strengths:
- Clear database requirements (tables, columns, constraints)
- Acceptance criteria: "pytest passes", "Build succeeds"
- Files specified: user_google_accounts table, modify user_google_tokens
- Testing: "Write tests for multi-account insertion"
- Security: "Validate email uniqueness per user"
- Documentation: "Update README with multi-account setup"
- Error handling: "Return 409 if duplicate email"
- Scope: "Account switching UI is out of scope (Issue #43)"
- Context: Explains why users need personal + work accounts

Agent-by-Agent:
✅ implementer: Has everything needed
✅ quality-checker: Criteria match automated checks
✅ security-checker: Security requirements clear
✅ doc-checker: Documentation expectations specified
✅ review-orchestrator: Clear pass/fail criteria

Note: No performance requirements specified - this is fine.
Build first, optimize later if needed.
```

### Example 2: Needs Work (6/9 Checks Passed)

```
Issue #55: "Add email search functionality"

⚠️ Agent Framework Compatibility: 6/9
**Ready for implementation?** NEEDS_WORK

Strengths:
- Clear requirements: search endpoint, query parameters
- Context explains user need
- Affected components specified

Issues:
❌ Missing testing expectations (what tests to write?)
❌ Missing documentation requirements
❌ Missing error handling specs
❌ Vague acceptance criteria: "Search works correctly"

Agent-by-Agent:
✅ implementer: Has requirements
⚠️ quality-checker: Criteria too vague ("works correctly")
⚠️ security-checker: No SQL injection prevention mentioned
⚠️ doc-checker: No docs requirements
⚠️ review-orchestrator: Criteria not specific enough

Recommendations:
- Add: "pytest tests pass for search functionality"
- Add: "Update API documentation"
- Add: "Error handling: Return 400 if invalid query, 500 if database error"
- Replace "works correctly" with specific outcomes: "Returns matching emails", "Handles empty results"

Note: Performance requirements are optional - no need to add them unless there's a specific concern.
```

### Example 3: Blocked (2/9, Red Flags Present)

```
Issue #67: "Fix email parsing bug"

❌ Agent Framework Compatibility: 2/9
**Ready for implementation?** NO

Critical Issues:
🚨 Contains a 30-line complete function implementation
🚨 Prescriptive: "Add this exact function to inbox.ts line 45"
🚨 Step-by-step: "First, create parseEmail(). Second, add validation. Third, handle errors."
❌ No clear requirements (what should parsing do?)
❌ No acceptance criteria
❌ No testing expectations

Agent-by-Agent:
❌ implementer: Doesn't need to think - solution provided
❌ quality-checker: Can't validate (no criteria)
❌ security-checker: Missing validation specs
❌ doc-checker: No docs requirements
❌ review-orchestrator: No pass/fail criteria

Must fix:
1. Remove the complete function - describe WHAT parsing should do instead
   - ✅ OK to keep: "Output format: `{ local: string, domain: string }`"
   - ✅ OK to keep: "Error message: 'Invalid email format'"
   - ❌ Remove: the 30-line function implementation
2. Add requirements: input format, output format, error handling
3. Add acceptance criteria: "Parses valid emails", "Rejects invalid formats"
4. Specify testing: "Add unit tests for edge cases"
5. Remove the line 45 reference - describe behavior instead
6. Convert the step-by-step guide into an "Implementation Guidance" section
```

### Example 4: Good Use of Code Examples (9/9)

```
Issue #78: "Add user profile endpoint"

✅ Agent Framework Compatibility: 9/9
**Ready for implementation?** YES

Requirements:
- Endpoint: POST /api/profile
- Request body: `{ name: string, bio?: string }` ← Short example (OK!)
- Response: `{ id: number, name: string, bio: string | null }` ← Type def (OK!)
- Error: `{ error: "Name required", code: 400 }` ← Error format (OK!)

✅ Good use of code examples:
- Shows shape/format without providing implementation
- Type definitions help clarify requirements
- Error format specifies exact user-facing text
- No function implementations

Agent-by-Agent:
✅ implementer: Clear requirements, no prescribed solution
✅ quality-checker: Testable criteria
✅ security-checker: Input validation specified
✅ doc-checker: API docs requirement present
```

## Integration with GHI Command

This command complements GHI:
- **GHI**: Creates new issues following best practices
- **EVI**: Evaluates existing issues against the same standards

Use EVI to:
- Review issues written by other agents
- Audit issues before sending them to implementation
- Train the team on issue quality standards
- Ensure agent framework compatibility

## Notes

- Run EVI before implementing any issue
- Use it to review issues written by other agents or teammates
- Consider making EVI a required check in your workflow
- **7+ checks passed**: Agent-ready, implement immediately
- **5-6 checks passed**: Needs minor improvements, but workable
- **< 5 checks passed**: Must revise before implementation
- Look for red flags even in high-scoring issues

### What About Code Snippets?
**✅ Short examples are GOOD:**
- Type definitions: `interface User { id: number, name: string }`
- API responses: `{ success: true, data: {...} }`
- Error formats: `{ error: "message", code: 400 }`
- Small syntax examples (< 10 lines)

**❌ Large implementations are BAD:**
- Complete functions (15+ lines)
- Full component implementations
- Complete SQL migration scripts
- Working solutions that agents can copy-paste

**The key distinction:** Are you showing the shape/format (good), or providing the solution (bad)?
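The size side of that distinction can be applied mechanically when triaging a pasted snippet. A minimal sketch, using this guide's < 10 / > 15 line thresholds; `classify_snippet` is an illustrative helper, not part of the EVI spec:

```shell
#!/usr/bin/env bash
# Sketch: classify a snippet by line count (thresholds from this guide).
# classify_snippet is a hypothetical helper name.
classify_snippet() {
  local lines
  lines=$(wc -l <<< "$1" | tr -d ' ')
  if [ "$lines" -lt 10 ]; then
    echo "OK: short example ($lines lines)"
  elif [ "$lines" -le 15 ]; then
    echo "BORDERLINE: review manually ($lines lines)"
  else
    echo "RED FLAG: likely a full implementation ($lines lines)"
  fi
}

classify_snippet "{ error: 'message', code: 400 }"   # OK: 1 line
classify_snippet "$(seq 1 20)"                       # RED FLAG: 20 lines
```

Line count is only a proxy: a 3-line snippet that pastes the exact fix is still prescriptive, so the shape-vs-solution question always applies.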