Initial commit

2025-11-30 08:45:53 +08:00
commit 74958112ad
11 changed files with 1882 additions and 0 deletions
--- a/skills/suggest-reviewers/SKILL.md
+++ b/skills/suggest-reviewers/SKILL.md
@@ -0,0 +1,347 @@
+---
+name: Suggest Reviewers Helper
+description: Git blame analysis helper for the suggest-reviewers command
+---
+
+# Suggest Reviewers Helper
+
+This skill provides a Python helper script that analyzes git blame data for the `/git:suggest-reviewers` command. The script handles the complex task of identifying which lines were changed and who authored the original code.
+
+## When to Use This Skill
+
+Use this skill when implementing the `/git:suggest-reviewers` command. The helper script should be invoked during Step 3 of the command implementation (analyzing git blame for changed lines).
+
+**DO NOT implement git blame analysis manually** - always use the provided `analyze_blame.py` script.
+
+## Prerequisites
+
+- Python 3.7 or higher (the script calls `subprocess.run` with `capture_output` and `text`, both added in 3.7)
+- Git repository with commit history
+- Git CLI available in PATH
+
+## Helper Script: analyze_blame.py
+
+The `analyze_blame.py` script automates the complex process of:
+1. Parsing git diff output to identify specific line ranges that were modified
+2. Running git blame on only the changed line ranges (not entire files)
+3. Extracting and aggregating author information with statistics
+4. Filtering out bot accounts and the current git user automatically
+
+### Usage
+
+**For uncommitted changes:**
+```bash
+python3 ${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py \
+  --mode uncommitted \
+  --file path/to/file1.go \
+  --file path/to/file2.py \
+  --output json
+```
+
+**For committed changes on a feature branch:**
+```bash
+python3 ${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py \
+  --mode committed \
+  --base-branch main \
+  --file path/to/file1.go \
+  --file path/to/file2.py \
+  --output json
+```
+
+### Parameters
+
+- `--mode`: Required. Either `uncommitted` or `committed`
+  - `uncommitted`: Analyzes unstaged/staged changes against HEAD
+  - `committed`: Analyzes committed changes against a base branch
+
+- `--base-branch`: Required when mode is `committed`. The base branch to compare against (e.g., `main`, `master`)
+
+- `--file`: Can be specified multiple times. Each file to analyze for blame information. Only changed files should be passed.
+
+- `--output`: Output format. Default is `json`. Options:
+  - `json`: Machine-readable JSON output
+  - `text`: Human-readable text output
+
+### Output Format (JSON)
+
+```json
+{
+  "Author Name": {
+    "line_count": 45,
+    "most_recent_date": "2024-10-15T14:23:10",
+    "files": ["file1.go", "file2.go"],
+    "email": "author@example.com"
+  },
+  "Another Author": {
+    "line_count": 23,
+    "most_recent_date": "2024-09-20T09:15:33",
+    "files": ["file3.py"],
+    "email": "another@example.com"
+  }
+}
+```
+
+### Output Fields
+
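+- `line_count`: Total number of blamed lines authored by this person (for pure additions, the lines just before the insertion point are blamed instead)
+- `most_recent_date`: ISO 8601 timestamp (local time, no UTC offset) of their most recent contribution to the changed code, or `null` if unknown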
+- `files`: Array of files where this author has contributions in the changed lines
+- `email`: Author's email address from git commits
+
+### Bot Filtering
+
+The script automatically filters out common bot accounts, as well as the current git user (matched on `user.name` or `user.email`):
+- GitHub bots (e.g., `dependabot[bot]`, `renovate[bot]`)
+- CI bots (e.g., `k8s-ci-robot`, `openshift-merge-robot`, `openshift-bot`)
+- Generic bot patterns (any name containing `[bot]`, matched case-insensitively)
+
+## Implementation Steps
+
+### Step 1: Collect changed files
+
+Before invoking the script, collect the list of changed files based on the scenario:
+
+**Uncommitted changes:**
+```bash
+# Get staged and unstaged files, de-duplicated (a file listed twice would be counted twice)
+files=$({ git diff --name-only --diff-filter=d HEAD;
+  git diff --name-only --diff-filter=d --cached; } | sort -u)
+```
+
+**Committed changes:**
+```bash
+# Get files changed from base branch
+files=$(git diff --name-only --diff-filter=d ${base_branch}...HEAD)
+```
+
+### Step 2: Invoke the script
+
+Build the command with the appropriate mode and all changed files:
+
+```bash
+# Start building the command
+cmd="python3 ${CLAUDE_PLUGIN_ROOT}/skills/suggest-reviewers/analyze_blame.py"
+
+# Add mode
+if [ "$has_uncommitted" = true ] || [ "$on_base_branch" = true ]; then
+  cmd="$cmd --mode uncommitted"
+else
+  cmd="$cmd --mode committed --base-branch $base_branch"
+fi
+
+# Add each file
+for file in $files; do
+  cmd="$cmd --file $file"
+done
+
+# Add output format
+cmd="$cmd --output json"
+
+# Execute and capture JSON output
+blame_data=$($cmd)
+```
+
+### Step 3: Parse the output
+
+The JSON output can be parsed using Python, jq, or any JSON parser:
+
+```bash
+# Example using jq to get top contributor
+echo "$blame_data" | jq -r 'to_entries | sort_by(-.value.line_count) | .[0].key'
+
+# Example using Python (JSON is passed through the environment to avoid shell-quoting problems)
+BLAME_DATA="$blame_data" python3 << 'EOF'
+import json
+import os
+
+data = json.loads(os.environ["BLAME_DATA"])
+
+# Sort by line count
+sorted_authors = sorted(data.items(), key=lambda x: x[1]['line_count'], reverse=True)
+
+for author, stats in sorted_authors:
+    print(f"{author}: {stats['line_count']} lines, last modified {stats['most_recent_date']}")
+EOF
+```
+
+### Step 4: Combine with OWNERS data
+
+After getting blame data, merge it with OWNERS file information to produce the final ranked list of reviewers.
+
+## Error Handling
+
+### No changed files
+
+If no files are passed to the script, argparse prints a usage line and exits with:
+```
+analyze_blame.py: error: the following arguments are required: --file
+```
+
+**Resolution:** Ensure you've detected changed files correctly before invoking the script.
+
+### Invalid mode
+
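+If an invalid mode is specified, argparse rejects it (exact wording varies slightly between Python versions):
+```
+analyze_blame.py: error: argument --mode: invalid choice: 'invalid' (choose from 'uncommitted', 'committed')
+```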
+
+**Resolution:** Use either `--mode uncommitted` or `--mode committed`.
+
+### Missing base branch in committed mode
+
+If `--mode committed` is used without `--base-branch`:
+```
+Error: --base-branch required for 'committed' mode
+```
+
+**Resolution:** Provide the base branch: `--base-branch main`
+
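+### File not in repository
+
+If a specified file is not tracked by git, `git diff` reports no changes for it and the script skips it without a message. If `git diff` itself fails in `committed` mode (for example, because the base branch does not exist), the script writes to stderr:
+```
+Error running diff for path/to/file: <details of the failed git command>
+```
+
+**Resolution:** The file is skipped and analysis continues with the remaining files. If every file fails, check the base branch name.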
+
+### No blame data found
+
+If git blame returns no data for any files:
+```json
+{}
+```
+
+**Resolution:** This can happen if:
+- All changed files are newly created (no blame history)
+- All changes are in binary files, or git blame is unable to run
+- All blamed lines were authored by bots or by the current git user
+
+In this case, fall back to OWNERS-only suggestions.
+
+## Examples
+
+### Example 1: Analyze uncommitted changes
+
+```bash
+$ python3 analyze_blame.py --mode uncommitted --file src/main.go --file src/utils.go --output json
+{
+  "Alice Developer": {
+    "line_count": 45,
+    "most_recent_date": "2024-10-15T14:23:10",
+    "files": ["src/main.go", "src/utils.go"],
+    "email": "alice@example.com"
+  },
+  "Bob Engineer": {
+    "line_count": 12,
+    "most_recent_date": "2024-09-20T09:15:33",
+    "files": ["src/main.go"],
+    "email": "bob@example.com"
+  }
+}
+```
+
+### Example 2: Analyze committed changes on feature branch
+
+```bash
+$ python3 analyze_blame.py --mode committed --base-branch main --file pkg/controller/manager.go --output json
+{
+  "Charlie Contributor": {
+    "line_count": 78,
+    "most_recent_date": "2024-10-01T11:42:55",
+    "files": ["pkg/controller/manager.go"],
+    "email": "charlie@example.com"
+  }
+}
+```
+
+### Example 3: Text output format
+
+```bash
+$ python3 analyze_blame.py --mode uncommitted --file README.md --output text
+
+Authors of modified code (2 found):
+
+Alice Developer <alice@example.com>
+  Lines: 23
+  Most recent: 2024-10-15T14:23:10
+  Files: README.md
+
+Bob Engineer <bob@example.com>
+  Lines: 5
+  Most recent: 2024-08-12T16:30:21
+  Files: README.md
+
+```
+
+### Example 4: Multiple files with mixed results
+
+```bash
+$ python3 analyze_blame.py --mode committed --base-branch release-4.15 \
+    --file vendor/k8s.io/client-go/kubernetes/clientset.go \
+    --file pkg/controller/node.go \
+    --file docs/README.md \
+    --output json
+{
+  "Diana Developer": {
+    "line_count": 156,
+    "most_recent_date": "2024-09-28T13:15:42",
+    "files": ["vendor/k8s.io/client-go/kubernetes/clientset.go", "pkg/controller/node.go"],
+    "email": "diana@example.com"
+  },
+  "Eve Technical Writer": {
+    "line_count": 34,
+    "most_recent_date": "2024-10-10T10:22:18",
+    "files": ["docs/README.md"],
+    "email": "eve@example.com"
+  }
+}
+```
+
+## Technical Details
+
+### How the script works
+
+1. **Determine diff range**: Based on mode, calculates what to compare:
+   - `uncommitted`: Compares working directory against HEAD
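+   - `committed`: Compares HEAD against its merge base with the base branch (`base...HEAD`)
+
+2. **Parse diff output**: Runs `git diff --unified=0` for each file and reads the `@@` hunk headers to identify:
+   - Which line ranges of the old version were modified or deleted
+   - For pure additions, up to 5 lines before the insertion point (blamed as context)
+   - Newly created files yield no ranges (there is no earlier version to blame)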
+
+3. **Run git blame**: For each file and line range:
+   - Runs `git blame --line-porcelain -L start,end <revision> -- file`, where `<revision>` is `HEAD` or the base branch
+   - Parses porcelain format to extract author, email, and timestamp
+   - Aggregates data across all changed lines
+
+4. **Filter and aggregate**:
+   - Removes bot accounts and the current git user
+   - Groups by author name
+   - Counts total lines per author
+   - Tracks most recent contribution date
+   - Lists all files each author contributed to
+
+5. **Output results**: Formats as JSON or text based on `--output` parameter
+
+### Performance considerations
+
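+- Only blames changed line ranges, not entire files (much faster for small changes to large files)
+- Processes files sequentially, running one `git blame` per changed range
+- Merges overlapping or adjacent ranges before blaming, so no line is blamed twice
+- Binary files produce no `@@` hunks, so they are skipped without a blame call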
+
+## Limitations
+
+- Does not handle file renames/moves (treats as delete + add)
+- Bot filtering is based on common patterns; custom bots may not be filtered
+- Requires git history; newly initialized repos may not have useful data
+- Does not consider commit message content or PR review history
+
+## See Also
+
+- Main command: `/git:suggest-reviewers` in `plugins/git/commands/suggest-reviewers.md`
+- Git blame documentation: https://git-scm.com/docs/git-blame
+- Git diff documentation: https://git-scm.com/docs/git-diff
--- a/skills/suggest-reviewers/analyze_blame.py
+++ b/skills/suggest-reviewers/analyze_blame.py
@@ -0,0 +1,380 @@
+#!/usr/bin/env python3
+"""
+Git Blame Analysis Helper for suggest-reviewers command.
+
+This script helps identify the authors of code lines being modified in a PR,
+aggregating git blame data to suggest the most relevant reviewers.
+
+Usage:
+    python3 analyze_blame.py --mode <uncommitted|committed> --file <filepath> [--base-branch <branch>] [--output <json|text>]
+
+Modes:
+    uncommitted: Analyze uncommitted changes (compares against HEAD)
+    committed:   Analyze committed changes on feature branch (compares against base branch)
+"""
+
+import argparse
+import json
+import re
+import subprocess
+import sys
+from collections import defaultdict
+from datetime import datetime
+from typing import Dict, List, Tuple, Optional
+
+
+class BlameAnalyzer:
+    """Analyzes git blame for changed lines in files."""
+
+    # Bot patterns to filter out
+    BOT_PATTERNS = [
+        r'.*\[bot\]',
+        r'openshift-bot',
+        r'k8s-ci-robot',
+        r'openshift-merge-robot',
+        r'openshift-ci\[bot\]',
+        r'dependabot',
+        r'renovate\[bot\]',
+    ]
+
+    def __init__(self, mode: str, base_branch: Optional[str] = None):
+        """
+        Initialize the analyzer.
+
+        Args:
+            mode: 'uncommitted' or 'committed'
+            base_branch: Base branch for committed mode (e.g., 'main')
+        """
+        self.mode = mode
+        self.base_branch = base_branch
+        self.authors = defaultdict(lambda: {
+            'line_count': 0,
+            'most_recent_date': None,
+            'files': set(),
+            'email': None
+        })
+
+        if mode == 'committed' and not base_branch:
+            raise ValueError("base_branch required for 'committed' mode")
+
+        # Get current user to exclude from suggestions
+        self.current_user_name = self._get_git_config('user.name')
+        self.current_user_email = self._get_git_config('user.email')
+
+    def _get_git_config(self, key: str) -> Optional[str]:
+        """Get a git config value."""
+        try:
+            result = subprocess.run(
+                ['git', 'config', '--get', key],
+                capture_output=True,
+                text=True,
+                check=False
+            )
+            if result.returncode == 0:
+                return result.stdout.strip()
+        except Exception:
+            pass
+        return None
+
+    def is_bot(self, author: str) -> bool:
+        """Check if an author name matches bot patterns."""
+        for pattern in self.BOT_PATTERNS:
+            if re.match(pattern, author, re.IGNORECASE):
+                return True
+        return False
+
+    def is_current_user(self, author: str, email: Optional[str]) -> bool:
+        """Check if the author is the current user."""
+        if self.current_user_name and author == self.current_user_name:
+            return True
+        if self.current_user_email and email and email == self.current_user_email:
+            return True
+        return False
+
+    def parse_diff_ranges(self, file_path: str) -> List[Tuple[int, int]]:
+        """
+        Parse git diff output to extract changed line ranges.
+
+        Returns:
+            List of (start_line, line_count) tuples for changed ranges
+        """
+        ranges = []
+
+        try:
+            if self.mode == 'uncommitted':
+                # Check staged changes (index vs HEAD)
+                diff_cmd = ['git', 'diff', '--cached', '--unified=0', '--', file_path]
+                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=False)
+                ranges.extend(self._extract_ranges_from_diff(result.stdout))
+
+                # Check all changes relative to HEAD (staged and unstaged); duplicates are merged below
+                diff_cmd = ['git', 'diff', 'HEAD', '--unified=0', '--', file_path]
+                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=False)
+                ranges.extend(self._extract_ranges_from_diff(result.stdout))
+            else:
+                # Committed changes: compare HEAD against its merge base with the base branch
+                diff_cmd = ['git', 'diff', f'{self.base_branch}...HEAD', '--unified=0', '--', file_path]
+                result = subprocess.run(diff_cmd, capture_output=True, text=True, check=True)
+                ranges.extend(self._extract_ranges_from_diff(result.stdout))
+
+        except subprocess.CalledProcessError as e:
+            print(f"Error running diff for {file_path}: {e}", file=sys.stderr)
+            return []
+
+        # Deduplicate and merge overlapping ranges
+        return self._merge_ranges(ranges)
+
+    def _extract_ranges_from_diff(self, diff_output: str) -> List[Tuple[int, int]]:
+        """
+        Extract line ranges from diff @@ markers.
+
+        Diff format: @@ -old_start,old_count +new_start,new_count @@
+        We want the 'old' ranges (lines being replaced/modified in the base)
+
+        For pure additions (count=0), we analyze context lines before the insertion
+        point to find relevant code owners.
+        """
+        ranges = []
+        # Match @@ -start[,count] +start[,count] @@
+        pattern = r'^@@\s+-(\d+)(?:,(\d+))?\s+\+\d+(?:,\d+)?\s+@@'
+
+        for line in diff_output.split('\n'):
+            match = re.match(pattern, line)
+            if match:
+                start = int(match.group(1))
+                count = int(match.group(2)) if match.group(2) else 1
+
+                if start > 0:
+                    if count > 0:
+                        # Regular modification/deletion
+                        ranges.append((start, count))
+                    else:
+                        # Pure addition (count=0): new lines are inserted after old line `start`
+                        # Look at up to 5 lines ending at the insertion point (inclusive)
+                        context_start = max(1, start - 4)
+                        context_count = start - context_start + 1
+                        if context_count > 0:
+                            ranges.append((context_start, context_count))
+
+        return ranges
+
+    def _merge_ranges(self, ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
+        """Merge overlapping line ranges."""
+        if not ranges:
+            return []
+
+        # Sort by start line
+        sorted_ranges = sorted(ranges, key=lambda x: x[0])
+        merged = [sorted_ranges[0]]
+
+        for start, count in sorted_ranges[1:]:
+            last_start, last_count = merged[-1]
+            last_end = last_start + last_count - 1
+            current_end = start + count - 1
+
+            # Check if ranges overlap or are adjacent
+            if start <= last_end + 1:
+                # Merge ranges
+                new_end = max(last_end, current_end)
+                new_count = new_end - last_start + 1
+                merged[-1] = (last_start, new_count)
+            else:
+                merged.append((start, count))
+
+        return merged
+
+    def analyze_file(self, file_path: str) -> None:
+        """
+        Analyze git blame for a specific file.
+
+        Args:
+            file_path: Path to file relative to repo root
+        """
+        # Get changed line ranges
+        ranges = self.parse_diff_ranges(file_path)
+
+        if not ranges:
+            return
+
+        # Determine which revision to blame
+        if self.mode == 'uncommitted':
+            blame_target = 'HEAD'
+        else:
+            blame_target = self.base_branch
+
+        # Run git blame on each range
+        for start, count in ranges:
+            end = start + count - 1
+            self._blame_range(file_path, start, end, blame_target)
+
+    def _blame_range(self, file_path: str, start: int, end: int, revision: str) -> None:
+        """
+        Run git blame on a specific line range and extract author data.
+
+        Args:
+            file_path: File to blame
+            start: Start line number
+            end: End line number
+            revision: Git revision to blame (e.g., 'HEAD', 'main')
+        """
+        try:
+            # Use line-porcelain format: full commit metadata is repeated for every line
+            blame_cmd = [
+                'git', 'blame',
+                '--line-porcelain',
+                '-L', f'{start},{end}',
+                revision,
+                '--',
+                file_path
+            ]
+
+            result = subprocess.run(blame_cmd, capture_output=True, text=True, check=True)
+            self._parse_blame_output(result.stdout, file_path)
+
+        except subprocess.CalledProcessError as e:
+            print(f"Error running blame on {file_path}:{start}-{end}: {e}", file=sys.stderr)
+
+    def _parse_blame_output(self, blame_output: str, file_path: str) -> None:
+        """
+        Parse git blame --line-porcelain output and aggregate author data.
+
+        Line-porcelain format (repeated for every blamed line):
+            <commit-hash> <original-line> <final-line> [<num-lines>]
+            author <author-name>
+            author-mail <email>
+            author-time <unix-timestamp>
+            ...
+            \t<line-content>
+        """
+        lines = blame_output.split('\n')
+        i = 0
+
+        while i < len(lines):
+            line = lines[i]
+
+            # Check if this is a commit header line
+            if line and not line.startswith('\t'):
+                parts = line.split()
+                if len(parts) >= 3 and len(parts[0]) == 40:  # Looks like a SHA; <num-lines> only starts a group
+                    # Parse commit metadata
+                    author = None
+                    email = None
+                    timestamp = None
+
+                    # Look ahead for author info
+                    j = i + 1
+                    while j < len(lines) and not lines[j].startswith('\t'):
+                        if lines[j].startswith('author '):
+                            author = lines[j][7:]  # Remove 'author ' prefix
+                        elif lines[j].startswith('author-mail '):
+                            email = lines[j][12:].strip('<>')  # Remove 'author-mail ' and <>
+                        elif lines[j].startswith('author-time '):
+                            timestamp = int(lines[j][12:])
+                        j += 1
+
+                    # Update author data (exclude bots and current user)
+                    if author and not self.is_bot(author) and not self.is_current_user(author, email):
+                        author_date = datetime.fromtimestamp(timestamp) if timestamp else None
+
+                        self.authors[author]['line_count'] += 1
+                        self.authors[author]['files'].add(file_path)
+                        self.authors[author]['email'] = email
+
+                        # Track most recent contribution
+                        if author_date:
+                            current_recent = self.authors[author]['most_recent_date']
+                            if current_recent is None or author_date > current_recent:
+                                self.authors[author]['most_recent_date'] = author_date
+
+                    i = j
+                    continue
+
+            i += 1
+
+    def get_results(self) -> Dict:
+        """
+        Get aggregated results as a dictionary.
+
+        Returns:
+            Dictionary mapping author names to their statistics
+        """
+        results = {}
+
+        for author, data in self.authors.items():
+            results[author] = {
+                'line_count': data['line_count'],
+                'most_recent_date': data['most_recent_date'].isoformat() if data['most_recent_date'] else None,
+                'files': sorted(list(data['files'])),
+                'email': data['email']
+            }
+
+        return results
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Analyze git blame for changed lines to identify code authors'
+    )
+    parser.add_argument(
+        '--mode',
+        choices=['uncommitted', 'committed'],
+        required=True,
+        help='Analysis mode: uncommitted (vs HEAD) or committed (vs base branch)'
+    )
+    parser.add_argument(
+        '--file',
+        required=True,
+        action='append',
+        dest='files',
+        help='File(s) to analyze (can be specified multiple times)'
+    )
+    parser.add_argument(
+        '--base-branch',
+        help='Base branch for committed mode (e.g., main, master)'
+    )
+    parser.add_argument(
+        '--output',
+        choices=['json', 'text'],
+        default='json',
+        help='Output format (default: json)'
+    )
+
+    args = parser.parse_args()
+
+    # Validate arguments
+    if args.mode == 'committed' and not args.base_branch:
+        print("Error: --base-branch required for 'committed' mode", file=sys.stderr)
+        sys.exit(1)
+
+    # Analyze files
+    analyzer = BlameAnalyzer(mode=args.mode, base_branch=args.base_branch)
+
+    for file_path in args.files:
+        analyzer.analyze_file(file_path)
+
+    # Output results
+    results = analyzer.get_results()
+
+    if args.output == 'json':
+        print(json.dumps(results, indent=2))
+    else:
+        # Text output
+        print(f"\nAuthors of modified code ({len(results)} found):\n")
+
+        # Sort by line count
+        sorted_authors = sorted(
+            results.items(),
+            key=lambda x: x[1]['line_count'],
+            reverse=True
+        )
+
+        for author, data in sorted_authors:
+            print(f"{author} <{data['email']}>")
+            print(f"  Lines: {data['line_count']}")
+            print(f"  Most recent: {data['most_recent_date'] or 'unknown'}")
+            print(f"  Files: {', '.join(data['files'])}")
+            print()
+
+
+if __name__ == '__main__':
+    main()