238 lines
7.9 KiB
Markdown
238 lines
7.9 KiB
Markdown
---
|
|
title: GitHub Issue Analysis
|
|
description: Scoring rubrics and clustering methodology for analyzing GitHub issues
|
|
tags: [github, issues, scoring, clustering, analysis]
|
|
---
|
|
|
|
# GitHub Issue Analysis
|
|
|
|
## Metadata
|
|
|
|
**Purpose**: Analyze GitHub issues with multi-dimensional scoring and clustering
|
|
**Applies to**: Repository investigation and issue backlog analysis
|
|
**Version**: 1.0.0
|
|
|
|
---
|
|
|
|
## Instructions
|
|
|
|
### Issue Scoring Dimensions
|
|
|
|
Score each issue on a scale of 1-5 (low to high) across four dimensions:
|
|
|
|
#### 1. Risk Score (1-5)
|
|
Assess the potential negative impact if the issue is NOT addressed:
|
|
|
|
- **5 - Critical Risk**: Security vulnerability, data loss, production outage, compliance violation
|
|
- **4 - High Risk**: Major feature broken, significant performance degradation, user-blocking bug
|
|
- **3 - Medium Risk**: Moderate bug affecting some users, technical debt that will compound
|
|
- **2 - Low Risk**: Minor bug with workarounds, cosmetic issue, edge case problem
|
|
- **1 - Minimal Risk**: Nice-to-have improvement, suggestion, question
|
|
|
|
**Detection patterns**:
|
|
- Keywords: `security`, `vulnerability`, `crash`, `data loss`, `broken`, `urgent`, `blocker`
|
|
- Labels: `security`, `critical`, `bug`, `p0`, `blocker`
|
|
- Issue state: Open for >90 days with high comment activity
|
|
|
|
#### 2. Value Score (1-5)
|
|
Assess the positive impact if the issue IS addressed:
|
|
|
|
- **5 - Transformative**: Major feature enabling new use cases, significant UX improvement
|
|
- **4 - High Value**: Important enhancement improving core workflows, performance boost
|
|
- **3 - Medium Value**: Useful improvement benefiting many users, code quality enhancement
|
|
- **2 - Low Value**: Minor convenience, small optimization, documentation improvement
|
|
- **1 - Minimal Value**: Cosmetic change, personal preference, duplicate effort
|
|
|
|
**Detection patterns**:
|
|
- Keywords: `feature`, `enhancement`, `improve`, `optimize`, `speed up`, `user request`
|
|
- Labels: `enhancement`, `feature`, `performance`, `ux`
|
|
- Community engagement: High thumbs-up reactions, many comments requesting feature
|
|
|
|
#### 3. Ease Score (1-5)
|
|
Assess the effort required to implement (inverse scale - 5 is easiest):
|
|
|
|
- **5 - Trivial**: Documentation update, config change, simple one-liner fix
|
|
- **4 - Easy**: Small bug fix, minor refactor, straightforward enhancement (<1 day)
|
|
- **3 - Moderate**: Feature addition requiring new code, moderate refactor (1-3 days)
|
|
- **2 - Challenging**: Complex feature, architectural change, multiple dependencies (1-2 weeks)
|
|
- **1 - Very Hard**: Major refactor, breaking changes, requires research/prototyping (>2 weeks)
|
|
|
|
**Detection patterns**:
|
|
- Keywords: `easy`, `good first issue`, `help wanted`, `typo`, `documentation`
|
|
- Labels: `good-first-issue`, `easy`, `documentation`, `beginner-friendly`
|
|
- Code references: Issues with specific file/line references are often easier
|
|
|
|
#### 4. DocQuality Score (1-5)
|
|
Assess how well the issue is documented:
|
|
|
|
- **5 - Excellent**: Clear description, reproduction steps, expected behavior, context, examples
|
|
- **4 - Good**: Clear problem statement, some context, actionable
|
|
- **3 - Adequate**: Basic description, understandable but missing details
|
|
- **2 - Poor**: Vague description, missing context, requires clarification
|
|
- **1 - Very Poor**: Unclear, no details, cannot action without extensive follow-up
|
|
|
|
**Detection patterns**:
|
|
- Length: Longer issues (>500 chars) often have better documentation
|
|
- Structure: Issues with headers, code blocks, numbered steps
|
|
- Completeness: Includes version info, environment details, error messages
|
|
|
|
### Issue Clustering and Themes
|
|
|
|
Group related issues by detecting common themes:
|
|
|
|
#### Theme Detection Approach
|
|
|
|
1. **Keyword extraction**: Extract significant terms from title and body
|
|
2. **Label analysis**: Group by common labels
|
|
3. **Component detection**: Identify affected components/modules
|
|
4. **Similarity scoring**: Find issues with overlapping keywords
|
|
|
|
#### Common Theme Categories
|
|
|
|
- **Bug Clusters**: Similar error messages, same component, related failures
|
|
- **Feature Requests**: Related enhancements, common user needs
|
|
- **Performance Issues**: Speed, memory, scalability concerns
|
|
- **Documentation**: Missing docs, unclear guides, examples needed
|
|
- **Dependencies**: Third-party library issues, version conflicts
|
|
- **DevOps/Infrastructure**: CI/CD, deployment, build issues
|
|
- **Security**: Vulnerabilities, auth, permissions
|
|
|
|
#### Clustering Algorithm
|
|
|
|
```
|
|
For each issue:
|
|
1. Extract keywords from title and body (remove stopwords)
|
|
2. Extract labels
|
|
3. Count keyword overlaps with other issues
|
|
4. If overlap > threshold (e.g., 3+ keywords), mark as related
|
|
5. Group related issues into clusters
|
|
6. Name cluster by most common keywords
|
|
```
|
|
|
|
### Staleness Detection
|
|
|
|
**Definition**: Issues with no activity for ≥N days (default: 60)
|
|
|
|
**Activity indicators**:
|
|
- New comments
|
|
- Label changes
|
|
- Status updates
|
|
- Commits referencing the issue
|
|
|
|
**Staleness categories**:
|
|
- **Fresh**: Activity within last 30 days
|
|
- **Aging**: 30-60 days since last activity
|
|
- **Stale**: 60-90 days since last activity
|
|
- **Very Stale**: 90+ days since last activity
|
|
|
|
**Action recommendations**:
|
|
- **Very Stale**: Consider closing with explanation or ping assignees
|
|
- **Stale**: Review priority, add to backlog grooming
|
|
- **Aging**: Monitor for staleness
|
|
- **Fresh**: Active work, no action needed
|
|
|
|
### Issue Bucketing
|
|
|
|
Categorize issues into actionable buckets based on scores:
|
|
|
|
1. **High Value**: Value ≥ 4, Risk ≥ 3 → Priority work
|
|
2. **High Risk**: Risk ≥ 4 → Address urgently regardless of value
|
|
3. **Newcomer Friendly**: Ease ≥ 4, DocQuality ≥ 3 → Good first issues
|
|
4. **Automation Candidate**: Ease ≥ 4 AND at least 2 similar issues detected by clustering → Consider scripting/automation
|
|
5. **Needs Context**: DocQuality ≤ 2 → Request more information
|
|
6. **Stale**: No activity ≥ stale_days → Consider closing/archiving
|
|
|
|
---
|
|
|
|
## Resources
|
|
|
|
### Scoring Examples
|
|
|
|
#### Example 1: Security Vulnerability
|
|
```
|
|
Title: "SQL injection in login endpoint"
|
|
Risk: 5 (critical security issue)
|
|
Value: 5 (must fix)
|
|
Ease: 3 (moderate fix, requires testing)
|
|
DocQuality: 5 (clear reproduction steps, example payload)
|
|
Bucket: High Risk
|
|
```
|
|
|
|
#### Example 2: Documentation Typo
|
|
```
|
|
Title: "Fix typo in README.md - 'installtion' -> 'installation'"
|
|
Risk: 1 (minimal impact)
|
|
Value: 1 (minor improvement)
|
|
Ease: 5 (trivial one-character change)
|
|
DocQuality: 5 (exact location specified)
|
|
Bucket: Newcomer Friendly
|
|
```
|
|
|
|
#### Example 3: Feature Request
|
|
```
|
|
Title: "Add dark mode support"
|
|
Risk: 1 (no negative impact if not done)
|
|
Value: 4 (highly requested UX improvement)
|
|
Ease: 2 (significant UI/CSS work)
|
|
DocQuality: 3 (clear request but missing design specs)
|
|
Bucket: High Value
|
|
```
|
|
|
|
#### Example 4: Vague Bug Report
|
|
```
|
|
Title: "App doesn't work"
|
|
Risk: ? (unclear)
|
|
Value: ? (unclear)
|
|
Ease: ? (cannot assess)
|
|
DocQuality: 1 (no details whatsoever)
|
|
Bucket: Needs Context
|
|
```
|
|
|
|
### CLI Integration
|
|
|
|
The analysis uses GitHub CLI (`gh`) for read-only access:
|
|
|
|
```bash
|
|
# Fetch all open issues
|
|
gh issue list --state open --limit 1000 --json number,title,body,labels,createdAt,updatedAt,comments
|
|
|
|
# Fetch issues with specific labels
|
|
gh issue list --label bug --state open --json number,title,body,labels,createdAt,updatedAt
|
|
|
|
# Fetch issues since specific date
|
|
gh issue list --state all --search "updated:>2024-01-01"
|
|
```
|
|
|
|
### Output Format
|
|
|
|
**Console Output**: Structured dashboard with tables and buckets
|
|
**Markdown Output**: Detailed report with scores, themes, and recommendations
|
|
|
|
Example markdown structure:
|
|
```markdown
|
|
# GitHub Issue Analysis
|
|
|
|
## Summary
|
|
- Total issues analyzed: 47
|
|
- Stale issues: 12
|
|
- High risk issues: 3
|
|
- High value issues: 8
|
|
|
|
## Issue Buckets
|
|
|
|
### High Risk (3 issues)
|
|
- #42: SQL injection vulnerability [Risk:5, Value:5, Ease:3]
|
|
- #38: Data corruption in export [Risk:5, Value:4, Ease:2]
|
|
|
|
### High Value (8 issues)
|
|
...
|
|
|
|
## Themes
|
|
1. **Performance** (5 issues): #12, #23, #34, #45, #56
|
|
2. **Documentation** (8 issues): ...
|
|
|
|
## Stale Issues
|
|
- #15: Last activity 87 days ago
|
|
- #22: Last activity 92 days ago
|
|
```
|