Initial commit

Zhongwei Li
2025-11-30 08:45:53 +08:00
commit 74958112ad
11 changed files with 1882 additions and 0 deletions


@@ -0,0 +1,347 @@
---
name: Suggest Reviewers Helper
description: Git blame analysis helper for the suggest-reviewers command
---
# Suggest Reviewers Helper
This skill provides a Python helper script that analyzes git blame data for the `/git:suggest-reviewers` command. The script handles the complex task of identifying which lines were changed and who authored the original code.
## When to Use This Skill
Use this skill when implementing the `/git:suggest-reviewers` command. The helper script should be invoked during Step 3 of the command implementation (analyzing git blame for changed lines).
**DO NOT implement git blame analysis manually** - always use the provided `analyze_blame.py` script.
## Prerequisites
- Python 3.7 or higher (the script calls `subprocess.run` with `capture_output=True` and `text=True`, both added in 3.7)
- Git repository with commit history
- Git CLI available in PATH
## Helper Script: analyze_blame.py
The `analyze_blame.py` script automates the complex process of:
1. Parsing git diff output to identify specific line ranges that were modified
2. Running git blame on only the changed line ranges (not entire files)
3. Extracting and aggregating author information with statistics
4. Filtering out bot accounts and the current git user (`user.name` / `user.email`) automatically
### Usage
**For uncommitted changes:**
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py \
--mode uncommitted \
--file path/to/file1.go \
--file path/to/file2.py \
--output json
```
**For committed changes on a feature branch:**
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py \
--mode committed \
--base-branch main \
--file path/to/file1.go \
--file path/to/file2.py \
--output json
```
### Parameters
- `--mode`: Required. Either `uncommitted` or `committed`
  - `uncommitted`: Analyzes staged and unstaged changes against HEAD
  - `committed`: Analyzes committed changes against a base branch
- `--base-branch`: Required when mode is `committed`. The base branch to compare against (e.g., `main`, `master`)
- `--file`: Required; can be specified multiple times. Each file to analyze for blame information. Only changed files should be passed.
- `--output`: Output format. Default is `json`. Options:
  - `json`: Machine-readable JSON output
  - `text`: Human-readable text output
### Output Format (JSON)
```json
{
  "Author Name": {
    "line_count": 45,
    "most_recent_date": "2024-10-15T14:23:10",
    "files": ["file1.go", "file2.go"],
    "email": "author@example.com"
  },
  "Another Author": {
    "line_count": 23,
    "most_recent_date": "2024-09-20T09:15:33",
    "files": ["file3.py"],
    "email": "another@example.com"
  }
}
```
### Output Fields
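- `line_count`: Number of blamed lines attributed to this author. Blame runs on the pre-change lines that the diff modifies or deletes, plus up to five existing lines before each pure insertion
- `most_recent_date`: ISO 8601 timestamp (local time, no UTC offset) of their most recent contribution to the blamed code, or `null` if no timestamp was found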
- `files`: Array of files where this author has contributions in the changed lines
- `email`: Author's email address from git commits
### Bot Filtering
The script automatically filters out common bot accounts. Patterns are matched case-insensitively against the start of the author name:
- Any name containing `[bot]` (e.g., `dependabot[bot]`, `renovate[bot]`, `openshift-ci[bot]`)
- Names starting with `dependabot` or `openshift-bot`
- CI robots whose names start with `k8s-ci-robot` or `openshift-merge-robot`
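As an illustration of the matching rule, here is a minimal sketch that reuses the `BOT_PATTERNS` list from `analyze_blame.py`. The sample author names in the demo loop are made up for the example.

```python
import re

# Patterns copied from BOT_PATTERNS in analyze_blame.py.
BOT_PATTERNS = [
    r'.*\[bot\]',
    r'openshift-bot',
    r'k8s-ci-robot',
    r'openshift-merge-robot',
    r'openshift-ci\[bot\]',
    r'dependabot',
    r'renovate\[bot\]',
]


def is_bot(author: str) -> bool:
    """Return True if the author name matches any bot pattern.

    re.match anchors only at the start of the string, so every pattern
    behaves as a case-insensitive prefix test.
    """
    return any(re.match(p, author, re.IGNORECASE) for p in BOT_PATTERNS)


if __name__ == "__main__":
    # Hypothetical author names, used only to exercise the patterns.
    for name in ["dependabot[bot]", "k8s-ci-robot", "Renovate[BOT]", "Alice Developer"]:
        print(f"{name}: {is_bot(name)}")
```

To filter a custom bot, one option is to append its name (or a prefix pattern) to the list before running the analysis.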
## Implementation Steps
### Step 1: Collect changed files
Before invoking the script, collect the list of changed files based on the scenario:
**Uncommitted changes:**
```bash
# Get staged and unstaged files (deleted files excluded, duplicates removed)
files=$( {
  git diff --name-only --diff-filter=d HEAD
  git diff --name-only --diff-filter=d --cached
} | sort -u )
```
**Committed changes:**
```bash
# Get files changed from base branch
files=$(git diff --name-only --diff-filter=d "${base_branch}...HEAD")
```
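If the file collection is done from Python instead of the shell, the same idea can be sketched as below. The function names `changed_files` and `unique_in_order` are hypothetical helpers for this example and are not part of `analyze_blame.py`.

```python
import subprocess
from typing import Iterable, List


def changed_files(*diff_args: str) -> List[str]:
    """Run `git diff --name-only --diff-filter=d <diff_args>` and return the paths.

    Must be called from inside a git repository.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", *diff_args],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def unique_in_order(*lists: Iterable[str]) -> List[str]:
    """Concatenate path lists, dropping duplicates but keeping first-seen order."""
    seen = {}
    for paths in lists:
        for path in paths:
            seen.setdefault(path, None)
    return list(seen)
```

For uncommitted changes this could be called as `unique_in_order(changed_files("HEAD"), changed_files("--cached"))`; for committed changes, `changed_files("main...HEAD")` with the appropriate base branch.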
### Step 2: Invoke the script
Build the command with the appropriate mode and all changed files:
```bash
# Build the command as an array so each argument stays a single word
cmd=(python3 "${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py")

# Add mode
if [ "$has_uncommitted" = true ] || [ "$on_base_branch" = true ]; then
  cmd+=(--mode uncommitted)
else
  cmd+=(--mode committed --base-branch "$base_branch")
fi

# Add each file (relies on word splitting, so paths must not contain whitespace)
for file in $files; do
  cmd+=(--file "$file")
done

# Add output format
cmd+=(--output json)

# Execute and capture JSON output
blame_data=$("${cmd[@]}")
```
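The same command can be assembled as an argument list in Python, which avoids shell quoting entirely. This is a sketch only: `build_blame_command` is a hypothetical helper, and the script path in the demo is a placeholder.

```python
from typing import List, Optional, Sequence


def build_blame_command(
    script_path: str,
    mode: str,
    files: Sequence[str],
    base_branch: Optional[str] = None,
) -> List[str]:
    """Return the argv list for analyze_blame.py, validating the mode first."""
    if mode not in ("uncommitted", "committed"):
        raise ValueError(f"invalid mode: {mode!r}")
    if mode == "committed" and not base_branch:
        raise ValueError("base_branch is required for 'committed' mode")
    if not files:
        raise ValueError("at least one file is required")

    cmd = ["python3", script_path, "--mode", mode]
    if mode == "committed":
        cmd += ["--base-branch", base_branch]
    for path in files:
        # One --file flag per path; spaces in paths are safe in a list
        cmd += ["--file", path]
    cmd += ["--output", "json"]
    return cmd


if __name__ == "__main__":
    # Placeholder script path for demonstration.
    print(build_blame_command("analyze_blame.py", "committed", ["src/my file.go"], "main"))
```

The resulting list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`.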
### Step 3: Parse the output
The JSON output can be parsed using Python, jq, or any JSON parser:
```bash
# Example using jq to get top contributor
echo "$blame_data" | jq -r 'to_entries | sort_by(-.value.line_count) | .[0].key'

# Example using Python. The JSON is passed through the environment and the
# heredoc is quoted, so the shell does not interpolate into the Python source.
BLAME_DATA="$blame_data" python3 << 'EOF'
import json
import os

data = json.loads(os.environ["BLAME_DATA"])

# Sort by line count
sorted_authors = sorted(data.items(), key=lambda x: x[1]['line_count'], reverse=True)
for author, stats in sorted_authors:
    print(f"{author}: {stats['line_count']} lines, last modified {stats['most_recent_date']}")
EOF
```
### Step 4: Combine with OWNERS data
After getting blame data, merge it with OWNERS file information to produce the final ranked list of reviewers.
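One possible way to do the merge is sketched below. The scoring rule (line count plus a fixed bonus for owners) and the assumption that OWNERS entries use the same identifiers as blame author names are both hypothetical; the actual command may map names to usernames and weigh them differently.

```python
from typing import Dict, Iterable, List, Tuple


def rank_reviewers(
    blame_data: Dict[str, dict],
    owners: Iterable[str],
    owner_bonus: int = 10,
) -> List[Tuple[str, int]]:
    """Rank candidates by blame line count plus a bonus for listed owners.

    blame_data is the JSON object printed by analyze_blame.py; owners is an
    iterable of names taken from OWNERS files (assumed to match author names).
    """
    scores = {name: stats["line_count"] for name, stats in blame_data.items()}
    for owner in owners:
        # Owners without blame history still appear, with the bonus only
        scores[owner] = scores.get(owner, 0) + owner_bonus
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    blame = {"Alice": {"line_count": 45}, "Bob": {"line_count": 12}}
    print(rank_reviewers(blame, ["Bob", "Carol"]))
```

When the blame output is `{}`, the same function degrades to an OWNERS-only ranking.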
## Error Handling
### No changed files
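If no `--file` option is passed, `argparse` rejects the invocation: it prints a usage line followed by this error and exits with status 2:
```
analyze_blame.py: error: the following arguments are required: --file
```
**Resolution:** Ensure you've detected changed files correctly before invoking the script.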
### Invalid mode
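If an invalid mode is specified, `argparse` rejects it and exits with status 2. The error line begins as follows; the rest of the line lists the valid choices and its exact wording varies by Python version:
```
analyze_blame.py: error: argument --mode: invalid choice: 'invalid'
```
**Resolution:** Use either `--mode uncommitted` or `--mode committed`.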
### Missing base branch in committed mode
If `--mode committed` is used without `--base-branch`, the script prints this message to stderr and exits with status 1:
```
Error: --base-branch required for 'committed' mode
```
**Resolution:** Provide the base branch: `--base-branch main`
### File not in repository
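The script has no dedicated warning for untracked files. A file that yields no diff hunks produces no ranges and is skipped silently. If `git diff` itself exits with an error in `committed` mode, the script logs a message of this form to stderr and skips the file:
```
Error running diff for path/to/file: <subprocess error details>
```
**Resolution:** A skipped file can be safely ignored. If the error appears for every file, check that the base branch name is valid.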
### No blame data found
If git blame returns no data for any files:
```json
{}
```
**Resolution:** This can happen if:
- All changed files are newly created (no blame history)
- All changes are in binary files
- Git blame is unable to run
- Every blamed line was authored by a bot or by the current git user, both of which are excluded

In this case, fall back to OWNERS-only suggestions.
## Examples
### Example 1: Analyze uncommitted changes
```bash
$ python3 analyze_blame.py --mode uncommitted --file src/main.go --file src/utils.go --output json
{
  "Alice Developer": {
    "line_count": 45,
    "most_recent_date": "2024-10-15T14:23:10",
    "files": ["src/main.go", "src/utils.go"],
    "email": "alice@example.com"
  },
  "Bob Engineer": {
    "line_count": 12,
    "most_recent_date": "2024-09-20T09:15:33",
    "files": ["src/main.go"],
    "email": "bob@example.com"
  }
}
```
### Example 2: Analyze committed changes on feature branch
```bash
$ python3 analyze_blame.py --mode committed --base-branch main --file pkg/controller/manager.go --output json
{
  "Charlie Contributor": {
    "line_count": 78,
    "most_recent_date": "2024-10-01T11:42:55",
    "files": ["pkg/controller/manager.go"],
    "email": "charlie@example.com"
  }
}
```
### Example 3: Text output format
```bash
$ python3 analyze_blame.py --mode uncommitted --file README.md --output text
Authors of modified code (2 found):

Alice Developer <alice@example.com>
  Lines: 23
  Most recent: 2024-10-15T14:23:10
  Files: README.md

Bob Engineer <bob@example.com>
  Lines: 5
  Most recent: 2024-08-12T16:30:21
  Files: README.md
```
### Example 4: Multiple files with mixed results
```bash
$ python3 analyze_blame.py --mode committed --base-branch release-4.15 \
    --file vendor/k8s.io/client-go/kubernetes/clientset.go \
    --file pkg/controller/node.go \
    --file docs/README.md \
    --output json
{
  "Diana Developer": {
    "line_count": 156,
    "most_recent_date": "2024-09-28T13:15:42",
    "files": ["vendor/k8s.io/client-go/kubernetes/clientset.go", "pkg/controller/node.go"],
    "email": "diana@example.com"
  },
  "Eve Technical Writer": {
    "line_count": 34,
    "most_recent_date": "2024-10-10T10:22:18",
    "files": ["docs/README.md"],
    "email": "eve@example.com"
  }
}
```
## Technical Details
### How the script works
1. **Determine diff range**: Based on mode, calculates what to compare:
   - `uncommitted`: Compares the index and working directory against HEAD
   - `committed`: Compares HEAD against its merge base with the base branch (`base...HEAD`)
2. **Parse diff output**: Runs `git diff --unified=0` on each file passed with `--file` and reads the `@@` hunk headers to identify:
   - Which pre-change line ranges are being modified or deleted
   - For pure additions, up to five existing lines before the insertion point
   - Nothing for newly created files, which have no pre-change lines to blame
3. **Run git blame**: For each file and line range:
   - Runs `git blame -L start,end --line-porcelain <revision> -- file`, where the revision is HEAD (`uncommitted`) or the base branch (`committed`)
   - Parses porcelain format to extract author, email, and timestamp
   - Aggregates data across all blamed lines
4. **Filter and aggregate**:
   - Removes bot accounts and the current git user
   - Groups by author name
   - Counts total lines per author
   - Tracks most recent contribution date
   - Lists all files each author contributed to
5. **Output results**: Formats as JSON or text based on `--output` parameter
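The hunk-header parsing and range merging in step 2 can be sketched as follows. This is a simplified illustration modeled on `_extract_ranges_from_diff` and `_merge_ranges`: it keeps only old-side ranges with a non-zero count and leaves out the context handling for pure additions. The function names are chosen for this example.

```python
import re
from typing import List, Tuple

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@"
HUNK_RE = re.compile(r'^@@\s+-(\d+)(?:,(\d+))?\s+\+\d+(?:,\d+)?\s+@@')


def old_side_ranges(diff_text: str) -> List[Tuple[int, int]]:
    """Return (start, count) for the old side of each hunk that touches lines."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if not match:
            continue
        start = int(match.group(1))
        # An omitted count means the hunk covers exactly one line
        count = int(match.group(2)) if match.group(2) is not None else 1
        if start > 0 and count > 0:
            ranges.append((start, count))
    return ranges


def merge_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent (start, count) ranges."""
    merged: List[Tuple[int, int]] = []
    for start, count in sorted(ranges):
        if merged:
            last_start, last_count = merged[-1]
            last_end = last_start + last_count - 1
            if start <= last_end + 1:
                new_end = max(last_end, start + count - 1)
                merged[-1] = (last_start, new_end - last_start + 1)
                continue
        merged.append((start, count))
    return merged


if __name__ == "__main__":
    sample = "@@ -10,3 +10,4 @@\n@@ -13 +14 @@\n@@ -0,0 +1,5 @@\n@@ -40,2 +42,0 @@"
    print(merge_ranges(old_side_ranges(sample)))
```

Each merged `(start, count)` pair translates into one `git blame -L start,end` call with `end = start + count - 1`.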
### Performance considerations
- Only blames changed line ranges, not entire files (much faster for small changes to large files)
- Processes files sequentially, running separate `git diff` and `git blame` commands for each file and line range
- Binary files produce no `@@` hunks in `git diff`, so they yield no ranges and are skipped
## Limitations
- Does not handle file renames/moves (treats as delete + add)
- Bot filtering is based on common patterns; custom bots may not be filtered
- Requires git history; newly initialized repos may not have useful data
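- Does not consider commit message content or PR review history
- In `committed` mode the diff is taken from the merge base (`base...HEAD`) but blame runs against the tip of the base branch, so line numbers can drift if the base branch has advanced since the branch point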
## See Also
- Main command: `/git:suggest-reviewers` in `plugins/git/commands/suggest-reviewers.md`
- Git blame documentation: https://git-scm.com/docs/git-blame
- Git diff documentation: https://git-scm.com/docs/git-diff


@@ -0,0 +1,380 @@
#!/usr/bin/env python3
"""
Git Blame Analysis Helper for suggest-reviewers command.

This script helps identify the authors of code lines being modified in a PR,
aggregating git blame data to suggest the most relevant reviewers.

Usage:
    python analyze_blame.py --mode <uncommitted|committed> --file <filepath> [--base-branch <branch>]

Modes:
    uncommitted: Analyze uncommitted changes (compares against HEAD)
    committed: Analyze committed changes on feature branch (compares against base branch)
"""

import argparse
import json
import re
import subprocess
import sys
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple, Optional

class BlameAnalyzer:
    """Analyzes git blame for changed lines in files."""

    # Bot patterns to filter out
    BOT_PATTERNS = [
        r'.*\[bot\]',
        r'openshift-bot',
        r'k8s-ci-robot',
        r'openshift-merge-robot',
        r'openshift-ci\[bot\]',
        r'dependabot',
        r'renovate\[bot\]',
    ]

    def __init__(self, mode: str, base_branch: Optional[str] = None):
        """
        Initialize the analyzer.

        Args:
            mode: 'uncommitted' or 'committed'
            base_branch: Base branch for committed mode (e.g., 'main')
        """
        self.mode = mode
        self.base_branch = base_branch
        self.authors = defaultdict(lambda: {
            'line_count': 0,
            'most_recent_date': None,
            'files': set(),
            'email': None
        })

        if mode == 'committed' and not base_branch:
            raise ValueError("base_branch required for 'committed' mode")

        # Get current user to exclude from suggestions
        self.current_user_name = self._get_git_config('user.name')
        self.current_user_email = self._get_git_config('user.email')

    def _get_git_config(self, key: str) -> Optional[str]:
        """Get a git config value."""
        try:
            result = subprocess.run(
                ['git', 'config', '--get', key],
                capture_output=True,
                text=True,
                check=False
            )
            if result.returncode == 0:
                return result.stdout.strip()
        except Exception:
            pass
        return None

    def is_bot(self, author: str) -> bool:
        """Check if an author name matches bot patterns."""
        for pattern in self.BOT_PATTERNS:
            if re.match(pattern, author, re.IGNORECASE):
                return True
        return False

    def is_current_user(self, author: str, email: Optional[str]) -> bool:
        """Check if the author is the current user."""
        if self.current_user_name and author == self.current_user_name:
            return True
        if self.current_user_email and email and email == self.current_user_email:
            return True
        return False

    def parse_diff_ranges(self, file_path: str) -> List[Tuple[int, int]]:
        """
        Parse git diff output to extract changed line ranges.

        Returns:
            List of (start_line, line_count) tuples for changed ranges
        """
        ranges = []

        try:
            if self.mode == 'uncommitted':
                # Staged changes (index vs HEAD)
                diff_cmd = ['git', 'diff', '--cached', '--unified=0', '--', file_path]
                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=False)
                ranges.extend(self._extract_ranges_from_diff(result.stdout))

                # Working tree vs HEAD (covers staged and unstaged changes);
                # ranges that repeat the staged ones are merged below
                diff_cmd = ['git', 'diff', '--unified=0', 'HEAD', '--', file_path]
                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=False)
                ranges.extend(self._extract_ranges_from_diff(result.stdout))
            else:
                # Committed changes: compare against base branch
                diff_cmd = ['git', 'diff', '--unified=0', f'{self.base_branch}...HEAD', '--', file_path]
                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=True)
                ranges.extend(self._extract_ranges_from_diff(result.stdout))
        except subprocess.CalledProcessError as e:
            print(f"Error running diff for {file_path}: {e}", file=sys.stderr)
            return []

        # Deduplicate and merge overlapping ranges
        return self._merge_ranges(ranges)

    def _extract_ranges_from_diff(self, diff_output: str) -> List[Tuple[int, int]]:
        """
        Extract line ranges from diff @@ markers.

        Diff format: @@ -old_start,old_count +new_start,new_count @@
        We want the 'old' ranges (lines being replaced/modified in the base)

        For pure additions (count=0), we analyze context lines before the insertion
        point to find relevant code owners.
        """
        ranges = []

        # Match @@ -start[,count] +start[,count] @@
        pattern = r'^@@\s+-(\d+)(?:,(\d+))?\s+\+\d+(?:,\d+)?\s+@@'

        for line in diff_output.split('\n'):
            match = re.match(pattern, line)
            if match:
                start = int(match.group(1))
                count = int(match.group(2)) if match.group(2) else 1

                # start == 0 means the old file had no lines (new file)
                if start > 0:
                    if count > 0:
                        # Regular modification/deletion
                        ranges.append((start, count))
                    else:
                        # Pure addition (count=0): `start` is the last existing
                        # line before the insertion point, so blame up to 5
                        # lines ending at that line
                        context_start = max(1, start - 4)
                        context_count = start - context_start + 1
                        ranges.append((context_start, context_count))

        return ranges

    def _merge_ranges(self, ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
        """Merge overlapping line ranges."""
        if not ranges:
            return []

        # Sort by start line
        sorted_ranges = sorted(ranges, key=lambda x: x[0])
        merged = [sorted_ranges[0]]

        for start, count in sorted_ranges[1:]:
            last_start, last_count = merged[-1]
            last_end = last_start + last_count - 1
            current_end = start + count - 1

            # Check if ranges overlap or are adjacent
            if start <= last_end + 1:
                # Merge ranges
                new_end = max(last_end, current_end)
                new_count = new_end - last_start + 1
                merged[-1] = (last_start, new_count)
            else:
                merged.append((start, count))

        return merged

    def analyze_file(self, file_path: str) -> None:
        """
        Analyze git blame for a specific file.

        Args:
            file_path: Path to file relative to repo root
        """
        # Get changed line ranges
        ranges = self.parse_diff_ranges(file_path)
        if not ranges:
            return

        # Determine which revision to blame
        if self.mode == 'uncommitted':
            blame_target = 'HEAD'
        else:
            blame_target = self.base_branch

        # Run git blame on each range
        for start, count in ranges:
            end = start + count - 1
            self._blame_range(file_path, start, end, blame_target)

    def _blame_range(self, file_path: str, start: int, end: int, revision: str) -> None:
        """
        Run git blame on a specific line range and extract author data.

        Args:
            file_path: File to blame
            start: Start line number
            end: End line number
            revision: Git revision to blame (e.g., 'HEAD', 'main')
        """
        try:
            # --line-porcelain repeats the full commit metadata for every line,
            # so each blamed line can be attributed to its author directly
            blame_cmd = [
                'git', 'blame',
                '--line-porcelain',
                '-L', f'{start},{end}',
                revision,
                '--',
                file_path
            ]
            result = subprocess.run(blame_cmd, capture_output=True, text=True, check=True)
            self._parse_blame_output(result.stdout, file_path)
        except subprocess.CalledProcessError as e:
            print(f"Error running blame on {file_path}:{start}-{end}: {e}", file=sys.stderr)

    def _parse_blame_output(self, blame_output: str, file_path: str) -> None:
        """
        Parse git blame --line-porcelain output and aggregate author data.

        Per-line format:
            <commit-hash> <original-line> <final-line> [<num-lines>]
            author <author-name>
            author-mail <email>
            author-time <unix-timestamp>
            ...
            \t<line-content>
        """
        lines = blame_output.split('\n')
        i = 0

        while i < len(lines):
            line = lines[i]

            # Check if this is a commit header line
            if line and not line.startswith('\t'):
                parts = line.split()
                # <num-lines> is present only on the first line of a group,
                # so a header has either 3 or 4 fields
                if len(parts) >= 3 and len(parts[0]) == 40:  # Looks like a SHA
                    # Parse commit metadata
                    author = None
                    email = None
                    timestamp = None

                    # Look ahead for author info
                    j = i + 1
                    while j < len(lines) and not lines[j].startswith('\t'):
                        if lines[j].startswith('author '):
                            author = lines[j][7:]  # Remove 'author ' prefix
                        elif lines[j].startswith('author-mail '):
                            email = lines[j][12:].strip('<>')  # Remove 'author-mail ' and <>
                        elif lines[j].startswith('author-time '):
                            timestamp = int(lines[j][12:])
                        j += 1

                    # Update author data (exclude bots and current user)
                    if author and not self.is_bot(author) and not self.is_current_user(author, email):
                        author_date = datetime.fromtimestamp(timestamp) if timestamp else None

                        self.authors[author]['line_count'] += 1
                        self.authors[author]['files'].add(file_path)
                        self.authors[author]['email'] = email

                        # Track most recent contribution
                        if author_date:
                            current_recent = self.authors[author]['most_recent_date']
                            if current_recent is None or author_date > current_recent:
                                self.authors[author]['most_recent_date'] = author_date

                    i = j
                    continue

            i += 1

    def get_results(self) -> Dict:
        """
        Get aggregated results as a dictionary.

        Returns:
            Dictionary mapping author names to their statistics
        """
        results = {}
        for author, data in self.authors.items():
            results[author] = {
                'line_count': data['line_count'],
                'most_recent_date': data['most_recent_date'].isoformat() if data['most_recent_date'] else None,
                'files': sorted(list(data['files'])),
                'email': data['email']
            }
        return results


def main():
    parser = argparse.ArgumentParser(
        description='Analyze git blame for changed lines to identify code authors'
    )
    parser.add_argument(
        '--mode',
        choices=['uncommitted', 'committed'],
        required=True,
        help='Analysis mode: uncommitted (vs HEAD) or committed (vs base branch)'
    )
    parser.add_argument(
        '--file',
        required=True,
        action='append',
        dest='files',
        help='File(s) to analyze (can be specified multiple times)'
    )
    parser.add_argument(
        '--base-branch',
        help='Base branch for committed mode (e.g., main, master)'
    )
    parser.add_argument(
        '--output',
        choices=['json', 'text'],
        default='json',
        help='Output format (default: json)'
    )

    args = parser.parse_args()

    # Validate arguments
    if args.mode == 'committed' and not args.base_branch:
        print("Error: --base-branch required for 'committed' mode", file=sys.stderr)
        sys.exit(1)

    # Analyze files
    analyzer = BlameAnalyzer(mode=args.mode, base_branch=args.base_branch)
    for file_path in args.files:
        analyzer.analyze_file(file_path)

    # Output results
    results = analyzer.get_results()

    if args.output == 'json':
        print(json.dumps(results, indent=2))
    else:
        # Text output
        print(f"\nAuthors of modified code ({len(results)} found):\n")

        # Sort by line count
        sorted_authors = sorted(
            results.items(),
            key=lambda x: x[1]['line_count'],
            reverse=True
        )

        for author, data in sorted_authors:
            print(f"{author} <{data['email']}>")
            print(f"  Lines: {data['line_count']}")
            print(f"  Most recent: {data['most_recent_date'] or 'unknown'}")
            print(f"  Files: {', '.join(data['files'])}")
            print()


if __name__ == '__main__':
    main()