
---
description: Evaluate GitHub issue quality and completeness for agent implementation
argument-hint: issue number or URL
---

# EVI - Evaluate Issue Quality

Evaluates a GitHub issue to ensure it contains everything needed for the agent implementation framework to succeed.

## Input

**Issue Number or URL:**

$ARGUMENTS

## Agent Framework Requirements

The issue must support all agents in the pipeline:

**issue-implementer** needs:

- Clear requirements and end state
- Files/components to create or modify
- Edge cases and error handling specs
- Testing expectations

**quality-checker** needs:

- Testable acceptance criteria matching automated checks
- Performance benchmarks (if applicable)
- Expected test outcomes

**security-checker** needs:

- Security considerations and requirements
- Authentication/authorization specs
- Data handling requirements

**doc-checker** needs:

- Documentation update requirements
- What needs README/wiki updates
- API documentation expectations

**review-orchestrator** needs:

- Clear pass/fail criteria
- Non-negotiable vs nice-to-have requirements

**issue-merger** needs:

- Unambiguous "done" definition
- Integration requirements

## Evaluation Criteria

### ✅ CRITICAL (Must Have)

**1. Clear Requirements**

- Exact end state specified
- Technical details: names, types, constraints, behavior
- Files/components to create or modify
- Validation rules and error handling
- Edge cases covered

**2. Testable Acceptance Criteria**

- Specific, measurable outcomes
- Align with automated checks (pytest, ESLint, TypeScript, build)
- Include edge cases
- Reference quality checks: "All tests pass", "ESLint passes", "Build succeeds"

**3. Affected Components**

- Which files/modules to modify
- Which APIs/endpoints involved
- Which database tables affected
- Which UI components changed

**4. Testing Expectations**

- What tests need to be written
- What tests need to pass
- Performance benchmarks (if applicable)
- Integration test requirements

**5. Context & Why**

- Business value
- User impact
- Current limitation
- Why this matters now

### ⚠️ IMPORTANT (Should Have)

**6. Security Requirements**

- Authentication/authorization needs
- Data privacy considerations
- Input validation requirements
- Security vulnerabilities to avoid

**7. Documentation Requirements**

- What needs README updates
- What needs wiki/API docs
- Inline code comments expected
- FEATURE-LIST.md updates

**8. Error Handling**

- Expected error messages
- Error codes to return
- Fallback behavior
- User-facing error text

**9. Scope Boundaries**

- What IS included
- What is NOT included
- Out of scope items
- Future work references

### 💡 HELPFUL (Nice to Have)

**10. Performance Requirements** (OPTIONAL - only if specific concern)

- Response time limits
- Query performance expectations
- Scale requirements
- Load handling
- Note: Most issues don't need this. Build first, measure, optimize later.

**11. Related Issues**

- Dependencies (blocked by, depends on)
- Related work
- Follow-up issues planned

**12. Implementation Guidance**

- Problems agent needs to solve
- Existing patterns to follow
- Challenges to consider
- No prescriptive solutions (guides, doesn't prescribe)

### ❌ RED FLAGS (Must NOT Have)

**13. Prescriptive Implementation**

- Complete function/component implementations (full solutions)
- Large blocks of working code (> 10-15 lines)
- Complete SQL migration scripts
- Step-by-step implementation guides
- "Add this code to line X" with specific file/line fixes

OK to have:

- Short code examples (< 10 lines): `{ error: 'message', code: 400 }`
- Type definitions: `{ email: string, category?: string }`
- Example API responses: `{ id: 5, status: 'active' }`
- Error message formats: `"Invalid email format"`
- Small syntax examples showing the shape/format

## Evaluation Process

### STEP 1: Fetch the Issue

```bash
ISSUE_NUM=$ARGUMENTS

# Fetch issue content (gh accepts an issue number or a URL)
gh issue view "$ISSUE_NUM" --json title,body,labels > issue-data.json

TITLE=$(jq -r '.title' issue-data.json)
BODY=$(jq -r '.body' issue-data.json)
LABELS=$(jq -r '.labels[].name' issue-data.json)

echo "Evaluating Issue #${ISSUE_NUM}: ${TITLE}"
```

### STEP 2: Check Agent Framework Compatibility

Evaluate for each agent in the pipeline:

**For issue-implementer:**

```bash
# Check: Clear requirements present?
echo "$BODY" | grep -iE '(requirements|end state|target state)' || echo "⚠️ No clear requirements section"

# Check: Files/components specified?
echo "$BODY" | grep -iE '(files? (to )?(create|modify)|components? (to )?(create|modify|affected)|tables? (to )?(create|modify))' || echo "⚠️ No affected files/components specified"

# Check: Edge cases covered?
echo "$BODY" | grep -iE '(edge cases?|error handling|fallback|validation)' || echo "⚠️ No edge cases or error handling mentioned"
```

**For quality-checker:**

```bash
# Check: Acceptance criteria reference automated checks?
echo "$BODY" | grep -iE '(tests? pass|eslint|flake8|mypy|pytest|typescript|build succeeds|linting passes)' || echo "⚠️ Acceptance criteria don't reference automated quality checks"

# Note: Performance requirements are optional - don't check for them
```

**For security-checker:**

```bash
# Check: Security requirements present if handling sensitive data?
if echo "$BODY" | grep -iqE '(password|token|secret|api.?key|auth|credential|email|private)'; then
  echo "$BODY" | grep -iE '(security|authentication|authorization|encrypt|sanitize|validate|sql.?injection|xss)' || echo "⚠️ Handles sensitive data but no security requirements"
fi
```

**For doc-checker:**

```bash
# Check: Documentation requirements specified?
echo "$BODY" | grep -iE '(documentation|readme|wiki|api.?docs?|comments?|feature.?list)' || echo "⚠️ No documentation requirements specified"
```

**For review-orchestrator:**

```bash
# Check: Clear pass/fail criteria?
# Note: grep -c prints 0 itself when nothing matches, so no "|| echo 0"
# fallback is needed (that fallback would produce a two-line value and
# break the numeric comparison below).
CRITERIA_COUNT=$(echo "$BODY" | grep -c '- \[.\]')
if [ "$CRITERIA_COUNT" -lt 3 ]; then
  echo "⚠️ Only $CRITERIA_COUNT acceptance criteria (need at least 3-5)"
fi
```
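
**For issue-merger:**

A sketch in the same style for issue-merger's needs listed above; the grep patterns here are illustrative assumptions, not established project conventions:

```bash
# Check: Unambiguous "done" definition?
echo "$BODY" | grep -iE '(definition of done|done when|complete when)' || echo "⚠️ No unambiguous 'done' definition for issue-merger"

# Check: Integration requirements mentioned?
echo "$BODY" | grep -iE '(integration|depends on|blocked by|merge requirements?)' || echo "⚠️ No integration requirements mentioned"
```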

### STEP 3: Evaluate Testable Acceptance Criteria

Check whether the criteria align with automated checks.

**Good criteria (match quality-checker):**

- "All pytest tests pass" → quality-checker runs pytest
- "ESLint passes with no errors" → quality-checker runs ESLint
- "TypeScript compilation succeeds" → quality-checker runs tsc
- "Query completes in < 100ms" → measurable, testable
- "Can insert 5 accounts per user" → specific, verifiable

**Bad criteria (vague/unmeasurable):**

- "Works correctly" → subjective
- "Code looks good" → subjective
- "Function created" → process-oriented, not an outcome
- "Bug is fixed" → not specific enough

### STEP 4: Check for Affected Components

Does issue specify:

- Which files to create?
- Which files to modify?
- Which APIs/endpoints involved?
- Which database tables affected?
- Which UI components changed?
- Which tests need updating?
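
A heuristic sketch of this checklist (assumes `$BODY` from STEP 1 and bash 4+ for associative arrays; the patterns are rough assumptions, and a silent category may simply not apply to the issue):

```bash
declare -A PATTERNS=(
  ["files/modules"]='files? (to )?(create|modify)|\.(ts|tsx|js|py|sql)'
  ["APIs/endpoints"]='endpoint|/api/|route'
  ["database tables"]='table|schema|migration|column'
  ["UI components"]='component|page|view|form'
  ["tests"]='tests?|spec|coverage'
)
for category in "${!PATTERNS[@]}"; do
  echo "$BODY" | grep -iqE "${PATTERNS[$category]}" \
    || echo "ℹ️ No mention of $category"
done
```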

### STEP 5: Evaluate Testing Expectations

Does issue specify:

- What tests to write (unit, integration, e2e)?
- What existing tests need to pass?
- What test coverage is expected?
- Performance test requirements?
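
A small sketch for this step (assumes `$BODY` from STEP 1; the coverage warning is informational since coverage targets are often legitimately omitted):

```bash
# Check: test types named?
echo "$BODY" | grep -iqE '(unit|integration|e2e|end.to.end).{0,10}tests?' \
  || echo "⚠️ No test types specified (unit/integration/e2e)"

# Check: existing tests referenced?
echo "$BODY" | grep -iqE '(existing tests|all tests pass|pytest|jest)' \
  || echo "⚠️ No statement about which tests must pass"

# Check: coverage expectation? (optional)
echo "$BODY" | grep -iqE 'coverage' || echo "ℹ️ No coverage expectation stated"
```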

### STEP 6: Check Security Requirements

Security (if applicable):

- Authentication/authorization requirements?
- Input validation specs?
- Data privacy considerations?
- Known vulnerabilities to avoid?

**Note on Performance:** Performance requirements are OPTIONAL and NOT evaluated as a critical check. Most issues don't need explicit performance requirements - build first, measure, optimize later. Only flag as missing if:

- Issue specifically mentions performance as a concern
- Feature handles large-scale data (millions of records)
- There's a user-facing latency requirement

Otherwise, absence of performance requirements is fine.
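
This policy can be encoded directly. A minimal sketch that only warns when the issue itself raises performance or scale (assumes `$BODY` from STEP 1; the target-pattern regex is a rough assumption):

```bash
if echo "$BODY" | grep -iqE '(performance|latency|slow|scale|millions of)'; then
  # The issue raises performance as a concern - expect a measurable target
  echo "$BODY" | grep -iqE '(< ?[0-9]+ ?(ms|s)|requests? per second|p(95|99))' \
    || echo "⚠️ Performance is a stated concern but no measurable target is given"
fi
# Otherwise: say nothing - absence of performance requirements is fine
```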

### STEP 7: Check Documentation Requirements

Does issue specify:

- README updates needed?
- Wiki/API docs updates?
- Inline comments expected?
- FEATURE-LIST.md updates?
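
A per-document sketch of this checklist (assumes `$BODY` from STEP 1; FEATURE-LIST.md is this project's convention - substitute your own doc targets):

```bash
for doc in "README" "wiki" "API doc" "FEATURE-LIST"; do
  echo "$BODY" | grep -iqF "$doc" || echo "ℹ️ No mention of: $doc"
done
```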

### STEP 8: Scan for Red Flags

```bash
# RED FLAG: Large code blocks (> 15 lines)
CODE_BLOCKS=$(grep -E '```(typescript|javascript|python|sql|java|go|rust)' <<< "$BODY")
if [ -n "$CODE_BLOCKS" ]; then
  # This is a simplified check - see the fuller sketch below for
  # counting lines per block.
  echo "⚠️ Found code blocks - checking size..."
  # Manual review needed: Are these short examples or full implementations?
fi

# RED FLAG: Complete function implementations
# (matches brace-style bodies and Python-style "def foo():" headers)
grep -iE '(function|def|class).*(\{|:)' <<< "$BODY" && echo "🚨 Contains complete function/class implementations"

# RED FLAG: Prescriptive instructions with specific fixes
grep -iE '(add (this|the following) code to line [0-9]+|here is the implementation|use this exact code)' <<< "$BODY" && echo "🚨 Contains prescriptive code placement instructions"

# RED FLAG: Specific file/line references for bug fixes
grep -E '(fix|change|modify|add).*(in|at|on) line [0-9]+' <<< "$BODY" && echo "🚨 Contains specific file/line fix locations"

# RED FLAG: Step-by-step implementation guide (not just planning)
grep -iE '(step [0-9]|first,.*second,.*third)' <<< "$BODY" | grep -iE '(write this code|add this function|implement as follows)' && echo "🚨 Contains step-by-step code implementation guide"

# OK: Short examples are fine. These are NOT red flags:
# - Type definitions: { email: string }
# - Example responses: { id: 5, status: 'active' }
# - Error formats: "Invalid email"
# - Small syntax examples (< 10 lines)
```
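
The fuller size check the comment above refers to - a sketch that counts lines per fenced block and reports the largest (assumes `$BODY` from STEP 1 and standard triple-backtick fences):

```bash
MAX_BLOCK=$(echo "$BODY" | awk '
  /^```/ {
    if (in_block) { if (n > max) max = n; in_block = 0 }
    else          { in_block = 1; n = 0 }
    next
  }
  in_block { n++ }
  END { print max + 0 }
')

if [ "$MAX_BLOCK" -gt 15 ]; then
  echo "🚨 Largest code block is $MAX_BLOCK lines (> 15) - likely a full implementation"
fi
```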

## Output Format

# Issue #${ISSUE_NUM} Evaluation Report

**Title:** ${TITLE}

## Agent Framework Compatibility: X/9 Checks Passed

**Ready for implementation?** [YES / NEEDS_WORK / NO]

**Note:** Performance, Related Issues, and Implementation Guidance are optional - not counted in score.

### ✅ Strengths (What's Good)
- [Specific strength 1]
- [Specific strength 2]
- [Specific strength 3]

### ⚠️ Needs Improvement (Fix Before Implementation)
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]
- [Missing element with specific impact on agent]

### ❌ Critical Issues (Blocks Agent Success)
- [Critical missing element / red flag]
- [Critical missing element / red flag]

### 🚨 Red Flags Found
- [Code snippet / prescriptive instruction found]
- [Specific file/line reference found]

## Agent-by-Agent Analysis

### issue-implementer Readiness: [READY / NEEDS_WORK / BLOCKED]
**Can the implementer succeed with this issue?**

✅ **Has:**
- Clear requirements and end state
- Files/components to modify specified
- Edge cases covered

❌ **Missing:**
- Error handling specifications
- Validation rules not detailed

**Impact:** [Description of how missing elements affect implementer]

### quality-checker Readiness: [READY / NEEDS_WORK / BLOCKED]
**Can the quality checker validate this?**

✅ **Has:**
- Acceptance criteria reference "pytest passes"
- Performance benchmark: "< 100ms"

❌ **Missing:**
- No reference to ESLint/TypeScript checks
- Test coverage expectations not specified

**Impact:** [Description of how missing elements affect quality validation]

### security-checker Readiness: [READY / NEEDS_WORK / N/A]
**Are security requirements clear?**

✅ **Has:**
- Authentication requirements specified
- Input validation rules defined

❌ **Missing:**
- No SQL injection prevention mentioned
- Data encryption requirements unclear

**Impact:** [Description of security gaps]

### doc-checker Readiness: [READY / NEEDS_WORK / BLOCKED]
**Are documentation expectations clear?**

✅ **Has:**
- README update requirement specified

❌ **Missing:**
- No API documentation mentioned
- FEATURE-LIST.md updates not specified

**Impact:** [Description of doc gaps]

### review-orchestrator Readiness: [READY / NEEDS_WORK / BLOCKED]
**Are pass/fail criteria clear?**

✅ **Has:**
- 5 specific acceptance criteria
- Clear success conditions

❌ **Missing:**
- Distinction between blocking vs non-blocking issues
- Performance criteria not measurable

**Impact:** [Description of review ambiguity]

## Recommendations

### High Priority (Fix Before Implementation)
1. [Specific actionable fix]
2. [Specific actionable fix]
3. [Specific actionable fix]

### Medium Priority (Improve Clarity)
1. [Specific suggestion]
2. [Specific suggestion]

### Low Priority (Nice to Have)
1. [Optional improvement]
2. [Optional improvement]

## Example Improvements

### Before (Current):

```markdown
[Quote problematic section from issue]
```

### After (Suggested):

```markdown
[Show how it should be written]
```

## Agent Framework Compatibility

**Will this issue work well with the agent framework?**

- ✅ YES - Issue is well-structured for agents
- ⚠️ MAYBE - Needs improvements but workable
- ❌ NO - Critical issues will confuse agents

**Specific concerns:**

- [Agent compatibility issue 1]
- [Agent compatibility issue 2]

## Quick Fixes

If you want to improve this issue, run:

```bash
# Add these sections
gh issue comment $ISSUE_NUM --body "## Implementation Guidance

To implement this, you will need to:
- [List problems to solve]
"

# Remove code snippets
# Edit the issue and remove code blocks showing implementation
gh issue edit $ISSUE_NUM

# Add acceptance criteria
gh issue comment $ISSUE_NUM --body "## Additional Acceptance Criteria
- [ ] [Specific testable criterion]
"
```

## Summary

**Ready for agent implementation?** [YES/NO/NEEDS_WORK]

**Confidence level:** [HIGH/MEDIUM/LOW]

**Estimated time to fix issues:** [X minutes]


## Agent Framework Compatibility Score

**13 Checks (Pass/Fail):**

### ✅ CRITICAL (Must Pass) - 5 checks
1. **Clear Requirements** - End state specified with technical details
2. **Testable Acceptance Criteria** - Aligns with automated checks (pytest, ESLint, etc.)
3. **Affected Components** - Files/modules to create or modify listed
4. **Testing Expectations** - What tests to write/pass specified
5. **Context** - Why, who, business value explained

### ⚠️ IMPORTANT (Should Pass) - 4 checks
6. **Security Requirements** - If handling sensitive data/auth
7. **Documentation Requirements** - README/wiki/API docs expectations
8. **Error Handling** - Error messages, codes, fallback behavior
9. **Scope Boundaries** - What IS and ISN'T included

### 💡 HELPFUL (Nice to Pass) - 3 checks
10. **Performance Requirements** - OPTIONAL, only if specific concern mentioned
11. **Related Issues** - Dependencies, related work
12. **Implementation Guidance** - Problems to solve (without prescribing HOW)

### ❌ RED FLAG (Must NOT Have) - 1 check
13. **No Prescriptive Implementation** - No full code solutions, prescribed algorithms, or step-by-step implementation guides (short format examples are OK)

**Scoring (based on 9 critical/important checks):**
- **9/9 + No Red Flags**: Perfect - Ready for agents
- **7-8/9 + No Red Flags**: Excellent - Minor improvements
- **5-6/9 + No Red Flags**: Good - Some gaps but workable
- **3-4/9 + No Red Flags**: Needs work - Significant gaps
- **< 3/9 OR Red Flags Present**: Blocked - Must fix before implementation

**Note:** Performance, Related Issues, and Implementation Guidance are HELPFUL but not required.

**Ready for Implementation?**
- **YES**: 7+ checks passed, no red flags
- **NEEDS_WORK**: 5-6 checks passed OR minor red flags
- **NO**: < 5 checks passed OR major red flags (code snippets, step-by-step)

## Examples

### Example 1: Excellent Issue (9/9 Checks Passed)

Issue #42: "Support multiple Google accounts per user"

✅ Agent Framework Compatibility: 9/9
Ready for implementation? YES

Strengths:

- Clear database requirements (tables, columns, constraints)
- Acceptance criteria: "pytest passes", "Build succeeds"
- Files specified: user_google_accounts table, modify user_google_tokens
- Testing: "Write tests for multi-account insertion"
- Security: "Validate email uniqueness per user"
- Documentation: "Update README with multi-account setup"
- Error handling: "Return 409 if duplicate email"
- Scope: "Account switching UI is out of scope (Issue #43)"
- Context: Explains why users need personal + work accounts

Agent-by-Agent:

- ✅ implementer: Has everything needed
- ✅ quality-checker: Criteria match automated checks
- ✅ security-checker: Security requirements clear
- ✅ doc-checker: Documentation expectations specified
- ✅ review-orchestrator: Clear pass/fail criteria

Note: No performance requirements specified - this is fine. Build first, optimize later if needed.


### Example 2: Needs Work (6/9 Checks Passed)

Issue #55: "Add email search functionality"

⚠️ Agent Framework Compatibility: 6/9
Ready for implementation? NEEDS_WORK

Strengths:

- Clear requirements: search endpoint, query parameters
- Context explains user need
- Affected components specified

Issues:

- Missing testing expectations (what tests to write?)
- Missing documentation requirements
- Missing error handling specs
- Vague acceptance criteria: "Search works correctly"

Agent-by-Agent:

- ✅ implementer: Has requirements
- ⚠️ quality-checker: Criteria too vague ("works correctly")
- ⚠️ security-checker: No SQL injection prevention mentioned
- ⚠️ doc-checker: No docs requirements
- ⚠️ review-orchestrator: Criteria not specific enough

Recommendations:

- Add: "pytest tests pass for search functionality"
- Add: "Update API documentation"
- Add: "Error handling: Return 400 if invalid query, 500 if database error"
- Replace "works correctly" with specific outcomes: "Returns matching emails", "Handles empty results"

Note: Performance requirements are optional - no need to add them unless there's a specific concern.


### Example 3: Blocked (2/9, Red Flags Present)

Issue #67: "Fix email parsing bug"

❌ Agent Framework Compatibility: 2/9
Ready for implementation? NO

Critical Issues:

- 🚨 Contains 30-line complete function implementation
- 🚨 Prescriptive: "Add this exact function to inbox.ts line 45"
- 🚨 Step-by-step: "First, create parseEmail(). Second, add validation. Third, handle errors."
- No clear requirements (what should parsing do?)
- No acceptance criteria
- No testing expectations

Agent-by-Agent:

- ❌ implementer: Doesn't need to think - solution provided
- ❌ quality-checker: Can't validate (no criteria)
- ❌ security-checker: Missing validation specs
- ❌ doc-checker: No docs requirements
- ❌ review-orchestrator: No pass/fail criteria

Must fix:

1. Remove complete function - describe WHAT parsing should do instead
   - OK to keep: "Output format: { local: string, domain: string }"
   - OK to keep: "Error message: 'Invalid email format'"
   - Remove: 30-line function implementation
2. Add requirements: input format, output format, error handling
3. Add acceptance criteria: "Parses valid emails", "Rejects invalid formats"
4. Specify testing: "Add unit tests for edge cases"
5. Remove line 45 reference - describe behavior instead
6. Convert step-by-step to "Implementation Guidance" section

### Example 4: Good Use of Code Examples (9/9)

Issue #78: "Add user profile endpoint"

✅ Agent Framework Compatibility: 9/9
Ready for implementation? YES

Requirements:

- Endpoint: POST /api/profile
- Request body: `{ name: string, bio?: string }` ← Short example (OK!)
- Response: `{ id: number, name: string, bio: string | null }` ← Type def (OK!)
- Error: `{ error: "Name required", code: 400 }` ← Error format (OK!)

Good use of code examples:

- Shows shape/format without providing implementation
- Type definitions help clarify requirements
- Error format specifies exact user-facing text
- No function implementations

Agent-by-Agent:

- ✅ implementer: Clear requirements, no prescribed solution
- ✅ quality-checker: Testable criteria
- ✅ security-checker: Input validation specified
- ✅ doc-checker: API docs requirement present


## Integration with GHI Command

This command complements GHI:
- **GHI**: Creates new issues following best practices
- **EVI**: Evaluates existing issues against same standards

Use EVI to:
- Review issues written by other agents
- Audit issues before sending to implementation
- Train team on issue quality standards
- Ensure agent framework compatibility

## Notes

- Run EVI before implementing any issue
- Use it to review issues written by other agents or teammates
- Consider making EVI a required check in your workflow
- **7+ checks passed**: Agent-ready, implement immediately
- **5-6 checks passed**: Needs minor improvements, but workable
- **< 5 checks passed**: Must revise before implementation
- Look for red flags even in high-scoring issues

### What About Code Snippets?

**✅ Short examples are GOOD:**
- Type definitions: `interface User { id: number, name: string }`
- API responses: `{ success: true, data: {...} }`
- Error formats: `{ error: "message", code: 400 }`
- Small syntax examples (< 10 lines)

**❌ Large implementations are BAD:**
- Complete functions (15+ lines)
- Full component implementations
- Complete SQL migration scripts
- Working solutions that agents can copy-paste

**The key distinction:** Are you showing the shape/format (good) or providing the solution (bad)?