Initial commit

Zhongwei Li
2025-11-30 08:45:11 +08:00
commit 42c0b6ee81
16 changed files with 3608 additions and 0 deletions

---
name: get-git-diff
description: Examines git diffs between commits or branches with intelligent analysis. Provides unified diff format with comprehensive summaries including file statistics, rename detection, and merge commit handling. Outputs formatted diffs to /claudedocs for documentation and review purposes.
---
# Git Diff Analyzer
## ⚠️ MANDATORY COMPLIANCE ⚠️
**CRITICAL**: The 4-step workflow outlined in this document MUST be followed in exact order for EVERY diff analysis. Skipping steps or deviating from the procedure will result in incomplete analysis. This is non-negotiable.
## File Structure
- **SKILL.md** (this file): Main instructions and MANDATORY workflow
- **examples.md**: Usage scenarios with different diff types
- **../../context/git/**: Shared git context files
- `git_diff_reference.md`: Unified diff format reference and best practices
- `diff_patterns.md`: Common patterns to identify in code changes
- **../../memory/skills/get-git-diff/**: Project-specific diff analysis memory
- `{project-name}/`: Per-project diff patterns and insights
- **scripts/**:
- `README.md`: Complete documentation for all helper scripts
- `validate.sh`: Git repository and commit validation functions
- `commit_info.sh`: Commit metadata retrieval (hash, author, date, message)
- `diff_stats.sh`: Diff statistics and line count analysis
- `file_operations.sh`: File operation detection (add, modify, delete, rename)
- `utils.sh`: General utilities (branch detection, formatting, repo info)
- **templates/**:
- `output_template.md`: Standard output format template
## Analysis Focus Areas
Git diff analysis evaluates 7 critical dimensions:
1. **Change Scope**: Files affected, lines modified, overall impact radius
2. **Change Type**: Feature addition, bug fix, refactoring, configuration change
3. **Structural Changes**: File renames, moves, deletions, additions
4. **Risk Assessment**: Breaking changes, API modifications, database migrations
5. **Code Quality Impact**: Complexity changes, test coverage changes
6. **Merge Conflicts**: Merge commit analysis, conflict resolution patterns
7. **Performance Impact**: Algorithm changes, database query modifications, resource usage
**Note**: Analysis depth is summary-level, focusing on what changed and high-level impact.
---
## MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)
### STEP 1: Commit Identification (REQUIRED)
**YOU MUST:**
1. Check if commit hashes/branch names were provided in the triggering prompt
2. If NOT provided, ask the user with these options:
- **Option A**: Compare specific commit hashes (ask for two commit SHAs)
- **Option B**: Compare HEAD of current branch to main/master
- **Option C**: Compare two branch names
- **Option D**: Compare current changes to a specific commit
3. Validate that provided commits/branches exist in the repository
4. Use `git rev-parse` to verify and get full commit hashes
5. Get commit metadata (author, date, message) for both commits
**DO NOT PROCEED WITHOUT VALID COMMITS**
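The validation in steps 3–4 can be sketched as follows. This is a hedged illustration, not the skill's exact scripts: `validate_ref` is a hypothetical helper, and the throwaway repository exists only so the example runs anywhere.

```shell
# Sketch of Step 1 validation (items 3-4); "validate_ref" is a hypothetical helper.
set -euo pipefail

# Throwaway repo so the example is self-contained
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q --allow-empty -m "first"
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q --allow-empty -m "second"

validate_ref() {
  # --verify plus ^{commit} rejects refs that exist but do not point at commits
  git -C "$repo" rev-parse --verify --quiet "$1^{commit}" >/dev/null
}

result="invalid"
if validate_ref HEAD && validate_ref HEAD~1 && ! validate_ref no-such-ref; then
  result="refs ok"
fi
echo "$result"
rm -rf "$repo"
```

Only once both refs pass this check does the workflow move on to fetching metadata with `git log -1`.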
### STEP 2: Execute Git Diff with Special Handling (REQUIRED)
**YOU MUST:**
1. Execute `git diff [commit1]...[commit2]` to get the unified diff
2. Check for and handle special cases:
- **Large diffs** (>1000 lines): Warn user, offer to summarize only or proceed
- **Renamed files**: Use `git diff -M` to detect renames
- **Merge commits**: Use `git diff [commit]^...[commit]` for merge commit analysis
- **Binary files**: Note binary file changes separately
3. Get diff statistics with `git diff --stat`
4. Get file list with `git diff --name-status` to identify A/M/D/R operations
**DO NOT SKIP SPECIAL CASE DETECTION**
### STEP 3: Analyze and Summarize (REQUIRED)
**YOU MUST analyze and document**:
1. **Commit Metadata**:
- Commit hashes (full and short)
- Author and date for both commits
- Commit messages
- Number of commits between the two refs (if applicable)
2. **Change Statistics**:
- Total files changed
- Total insertions (+)
- Total deletions (-)
- Net change
3. **File Operations**:
- Added files (A)
- Modified files (M)
- Deleted files (D)
- Renamed files (R) - show old vs new
- Copied files (C)
4. **Change Categorization**:
- Group files by type (source code, tests, docs, config)
- Identify potential areas of impact
- Flag potentially risky changes
5. **Special Notes**:
- Merge commit indicator (if applicable)
- Large diff warning (if >1000 lines)
- Binary file changes
- Submodule changes
**DO NOT SKIP ANALYSIS**
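The "group files by type" step above can be approximated with a small pattern match. The path patterns below are illustrative assumptions, not the skill's exact categorization rules:

```shell
# Illustrative file categorization for Step 3, item 4.
# The glob patterns are assumptions; real projects may need different rules.
categorize_file() {
  case "$1" in
    tests/*|*/test_*|test_*)                     echo "tests" ;;
    docs/*|*.md|*.rst)                           echo "docs" ;;
    *.yml|*.yaml|*.toml|*.ini|requirements.txt)  echo "config" ;;
    *)                                           echo "source" ;;
  esac
}

categorize_file "tests/test_auth.py"   # tests
categorize_file "README.md"            # docs
categorize_file "src/api/router.py"    # source
```

Feeding `git diff --name-only` through a function like this yields the per-category counts used in the summary.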
### STEP 4: Generate Output & Update Project Memory (REQUIRED)
**YOU MUST:**
1. Use the template from `templates/output_template.md`
2. Create filename: `diff_{short_hash1}_{short_hash2}.md`
3. Include all components:
- Header with commit information
- Summary section with all statistics and analysis
- Full unified diff wrapped in markdown code blocks
4. Save to `/claudedocs/` directory
5. Confirm file was written successfully
**Output Format Requirements**:
- Unified diff must be in triple-backtick code blocks with `diff` language tag
- Summary must be in clear markdown sections
- File paths must use code formatting
- Statistics must be in tables or lists
- All sections must be clearly labeled
**DO NOT OMIT ANY REQUIRED SECTIONS**
**OPTIONAL: Update Project Memory**
If patterns emerge during analysis, consider storing insights in `../../memory/skills/get-git-diff/{project-name}/`:
- Common file change patterns
- Frequently modified areas
- Notable commit patterns or conventions
---
## Special Case Handling
### Large Diffs (>1000 lines)
When encountering large diffs:
1. Calculate total line count
2. Warn user: "This diff contains [N] lines across [M] files"
3. Ask user: "Would you like to proceed with full diff or summary only?"
4. If summary only:
- Include all metadata and statistics
- List all changed files with their line counts
- Omit the detailed unified diff
- Note: "Full diff omitted due to size. Use `git diff [hash1]...[hash2]` to view."
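The line-count check can be sketched by summing insertions and deletions from `git diff --numstat`. The scratch repository and numbers below are illustrative; the 1000-line threshold matches the workflow:

```shell
# Sketch of the large-diff check: total changed lines vs. a threshold.
# A scratch repo makes the numbers reproducible.
set -euo pipefail
repo=$(mktemp -d)
git -C "$repo" init -q
seq 1 5 > "$repo/data.txt"
git -C "$repo" add data.txt
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q -m "small file"
seq 1 2000 > "$repo/data.txt"
git -C "$repo" add data.txt
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q -m "grow file"

threshold=1000
# --numstat prints "insertions<TAB>deletions<TAB>path" per file
total=$(git -C "$repo" diff --numstat HEAD~1 HEAD \
  | awk '{ins += $1; del += $2} END {print ins + del}')
if [ "$total" -gt "$threshold" ]; then
  echo "Large diff: $total lines (threshold $threshold)"
fi
rm -rf "$repo"
```

Note that `--numstat` prints `-` for binary files, which this simple sum treats as zero.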
### Renamed/Moved Files
For file renames:
1. Use `git diff -M` flag to detect renames (default similarity index: 50%)
2. In summary, clearly show: `old/path/file.py → new/path/file.py`
3. Indicate if content was also modified: `R+M` (renamed and modified)
4. In unified diff, show rename header: `rename from/to`
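A minimal sketch of rename detection with `-M`, again using a scratch repository so the `R<similarity>` score is reproducible:

```shell
# Sketch of rename detection: --name-status with -M prints
# "R<similarity><TAB>old-path<TAB>new-path" for detected renames.
set -euo pipefail
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/src"
seq 1 50 > "$repo/src/utils.py"
git -C "$repo" add .
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q -m "initial layout"
mkdir -p "$repo/src/core"
git -C "$repo" mv src/utils.py src/core/utils.py
git -C "$repo" -c user.email=a@b.c -c user.name=tester commit -q -m "move utils into core"

line=$(git -C "$repo" diff -M --name-status HEAD~1 HEAD)
status=$(printf '%s' "$line" | cut -f1)
old=$(printf '%s' "$line" | cut -f2)
new=$(printf '%s' "$line" | cut -f3)
echo "$old → $new ($status)"
rm -rf "$repo"
```

A pure move scores `R100`; a rename with edits scores lower (e.g. `R085`), which is what the `R+M` notation above captures.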
### Merge Commits
For merge commits:
1. Detect with `git rev-list --merges`
2. Note in summary: "This is a merge commit"
3. Show both parent commits
4. Use `git diff [commit]^...[commit]` to show changes introduced by merge
5. Optionally offer to show diff against each parent separately
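Merge detection reduces to counting parents in `%P`; the steps above can be sketched as follows (the scratch repo and branch names are illustrative):

```shell
# Sketch of merge-commit detection: a merge commit has >1 parent in %P.
set -euo pipefail
repo=$(mktemp -d)
g() { git -C "$repo" -c user.email=a@b.c -c user.name=tester "$@"; }
g init -q
g commit -q --allow-empty -m "base"
g checkout -q -b feature
g commit -q --allow-empty -m "feature work"
g checkout -q -                       # back to the default branch
g commit -q --allow-empty -m "mainline work"
g merge -q --no-ff --no-edit feature  # histories diverged, so this is a true merge

parents=$(g log -1 --format=%P HEAD)
parent_count=$(printf '%s' "$parents" | wc -w | tr -d ' ')
if [ "$parent_count" -gt 1 ]; then
  echo "This is a merge commit ($parent_count parents)"
fi
rm -rf "$repo"
```

The captured `$parents` string then supplies both parent hashes for the summary and for the `[commit]^...[commit]` diff.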
---
## Compliance Checklist
Before completing ANY diff analysis, verify:
- [ ] Step 1: Commits identified and validated
- [ ] Step 2: Git diff executed with special case detection
- [ ] Step 3: Complete analysis with all statistics and categorization
- [ ] Step 4: Output generated in correct format and saved to /claudedocs
**FAILURE TO COMPLETE ALL STEPS INVALIDATES THE ANALYSIS**
---
## Output File Naming Convention
**Format**: `diff_{short1}_{short2}.md`
Where:
- `{short1}` = First 7 characters of first commit hash
- `{short2}` = First 7 characters of second commit hash
**Examples**:
- `diff_a1b2c3d_e4f5g6h.md` (commit to commit)
- `diff_main_feature-branch.md` (branch comparison, if hashes not available)
**Alternative for branches**: If comparing branch tips, you may use branch names if they're short and filesystem-safe.
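The convention can be sketched as a tiny formatter. The sanitization rule (replacing `/` with `-` so branch names stay filesystem-safe) is an assumption, not necessarily the skill's exact behavior:

```shell
# Sketch of the output filename convention; "/" -> "-" sanitization is assumed.
format_diff_filename() {
  local first="${1//\//-}"    # e.g. feature/x -> feature-x
  local second="${2//\//-}"
  printf 'diff_%s_%s.md' "$first" "$second"
}

format_diff_filename a1b2c3d e4f5g6h; echo        # diff_a1b2c3d_e4f5g6h.md
format_diff_filename main feature/new-api; echo   # diff_main_feature-new-api.md
```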
---
## Git Commands Reference
### Core Commands Used:
```bash
# Get commit info
git rev-parse [commit]
git log -1 --format="%H|%h|%an|%ae|%ad|%s" [commit]
# Generate diff
git diff [commit1]...[commit2]
git diff --stat [commit1]...[commit2]
git diff --name-status [commit1]...[commit2]
git diff -M [commit1]...[commit2] # Detect renames
# Special cases
git rev-list --merges [commit] # Check if merge commit
git diff [commit]^1..[commit] # Merge commit against first parent
```
---
## Further Reading
Refer to official documentation:
- **Git Documentation**:
- Git Diff: https://git-scm.com/docs/git-diff
- Diff Format: https://git-scm.com/docs/diff-format
- **Best Practices**:
- Pro Git Book: https://git-scm.com/book/en/v2
- Understanding Git Diff: https://git-scm.com/docs/git-diff#_generating_patches_with_p
---
## Version History
- v1.1.0 (2025-01-XX): Centralized context and project memory
- Context files moved to ../../context/git/
- Project-specific memory system in ../../memory/skills/get-git-diff/
- Optional memory updates for common patterns
- v1.0.0 (2025-11-13): Initial release
- Mandatory 4-step workflow
- Summary-level analysis with statistics
- Special handling for large diffs, renames, and merge commits
- Unified diff output to /claudedocs

# Git Diff Analyzer Examples
This file contains example scenarios demonstrating how to use the get-git-diff skill for various diff analysis tasks.
---
## Example 1: Comparing Two Specific Commits
### Scenario
Developer wants to see what changed between two feature commits.
### User Prompt
```
I need to see the diff between commits a1b2c3d and e4f5g6h
```
### Skill Execution
**Step 1 - Commit Identification**:
- Commits provided: `a1b2c3d` and `e4f5g6h`
- Validate: `git rev-parse a1b2c3d e4f5g6h`
- Get metadata: `git log -1 --format="%H|%h|%an|%ae|%ad|%s" [commit]`
**Step 2 - Execute Diff**:
```bash
git diff a1b2c3d...e4f5g6h
git diff --stat a1b2c3d...e4f5g6h
git diff --name-status a1b2c3d...e4f5g6h
```
**Step 3 - Analysis**:
```
Commit Metadata:
- From: a1b2c3d (John Doe, 2025-11-01, "Add user authentication")
- To: e4f5g6h (Jane Smith, 2025-11-13, "Refactor auth module")
- Commits between: 15
Statistics:
- Files changed: 8
- Insertions: 234
- Deletions: 156
- Net change: +78
File Operations:
- Modified: src/auth/login.py, src/auth/session.py, tests/test_auth.py
- Added: src/auth/jwt_handler.py, src/auth/validators.py
- Deleted: src/auth/legacy_auth.py
```
**Step 4 - Output**:
File: `/claudedocs/diff_a1b2c3d_e4f5g6h.md`
---
## Example 2: Comparing Current Branch to Main
### Scenario
Developer wants to review all changes in feature branch before creating a PR.
### User Prompt
```
Show me the diff for my feature branch compared to main
```
### Skill Response
Since no specific commits were provided, ask the user:
- Option A: Compare specific commit hashes
- Option B: Compare HEAD of current branch to main ← **User selects this**
- Option C: Compare two branch names
- Option D: Compare current changes to a specific commit
**Step 1 - Commit Identification**:
```bash
# Get current branch
git rev-parse --abbrev-ref HEAD # Returns: feature/new-api
# Get commit hashes
git rev-parse main # Returns: d7e8f9a...
git rev-parse HEAD # Returns: m3n4o5p...
```
**Step 2 - Execute Diff**:
```bash
git diff main...HEAD
git diff --stat main...HEAD
git diff --name-status main...HEAD
```
**Step 3 - Analysis**:
```
Comparing: main (d7e8f9a) → feature/new-api (m3n4o5p)
Statistics:
- Files changed: 12
- Insertions: 567
- Deletions: 89
- Net change: +478
File Operations:
- Added: src/api/v2/endpoints.py, src/api/v2/schemas.py, tests/api/test_v2.py
- Modified: src/api/router.py, src/main.py, README.md
- Deleted: (none)
Change Categorization:
- Source code: 8 files (src/api/*, src/main.py)
- Tests: 3 files (tests/api/*)
- Documentation: 1 file (README.md)
Impact Areas:
- API routing layer
- Request/response schemas
- Test coverage
```
**Step 4 - Output**:
File: `/claudedocs/diff_d7e8f9a_m3n4o5p.md`
---
## Example 3: Large Diff Handling
### Scenario
Comparing two commits with extensive changes (e.g., major refactoring or dependency update).
### User Prompt
```
Compare commit abc1234 to def5678
```
**Step 2 - Execute Diff**:
```bash
git diff --stat abc1234...def5678
# Output shows: 87 files changed, 15234 insertions(+), 12456 deletions(-)
```
**Large Diff Detected**: Total lines = 27,690
**Skill Response**:
⚠️ **Large Diff Warning**
This diff contains 27,690 lines across 87 files. This is a substantial change set.
Would you like to:
- **Option A**: Proceed with full diff (will be very long)
- **Option B**: Summary only (statistics and file list, no line-by-line diff)
**User selects**: Option B - Summary only
**Step 3 - Analysis**:
```
Commit Metadata:
- From: abc1234 (Team Lead, 2025-10-15, "Update dependencies to Python 3.12")
- To: def5678 (Team Lead, 2025-11-13, "Complete Python 3.12 migration")
Statistics:
- Files changed: 87
- Insertions: 15,234
- Deletions: 12,456
- Net change: +2,778
Top Changed Files:
1. requirements.txt (+45, -32)
2. src/core/utils.py (+234, -189)
3. tests/test_compatibility.py (+456, -123)
... (84 more files)
Change Categorization:
- Dependency files: 3 files
- Source code: 62 files
- Tests: 18 files
- Documentation: 4 files
Impact Assessment:
⚠️ Major refactoring detected
⚠️ Breaking changes likely
⚠️ Extensive test updates
```
**Step 4 - Output**:
File: `/claudedocs/diff_abc1234_def5678.md`
**Note in output**: "Full unified diff omitted due to size (27,690 lines). Use `git diff abc1234...def5678` to view complete diff."
---
## Example 4: Renamed Files Detection
### Scenario
Developer refactored project structure, moving files to new directories.
### User Prompt
```
Show me what changed in the refactoring commit x9y8z7w
```
**Step 1 - Commit Identification**:
- Comparing: x9y8z7w^ (parent) → x9y8z7w (the refactoring commit)
**Step 2 - Execute Diff with Rename Detection**:
```bash
git diff -M x9y8z7w^...x9y8z7w
git diff --name-status -M x9y8z7w^...x9y8z7w
```
Output:
```
R100 src/utils.py → src/core/utils.py
R100 src/helpers.py → src/core/helpers.py
R085 src/auth.py → src/auth/main.py
M src/main.py
A src/core/__init__.py
```
**Step 3 - Analysis**:
```
Commit Metadata:
- Commit: x9y8z7w (Developer, 2025-11-12, "Refactor: reorganize project structure")
Statistics:
- Files changed: 6
- Insertions: 45
- Deletions: 12
- Net change: +33
File Operations:
- Renamed (no changes):
• src/utils.py → src/core/utils.py
• src/helpers.py → src/core/helpers.py
- Renamed + Modified (85% similarity):
• src/auth.py → src/auth/main.py
- Modified:
• src/main.py (updated imports)
- Added:
• src/core/__init__.py (new package init)
Change Type: Refactoring
Risk Assessment: Medium (import paths changed, may break external references)
```
**Step 4 - Output**:
File: `/claudedocs/diff_w6v5u4t_x9y8z7w.md`
---
## Example 5: Merge Commit Analysis
### Scenario
Developer wants to understand what a merge commit introduced.
### User Prompt
```
Analyze the merge commit m1e2r3g4
```
**Step 1 - Commit Identification**:
```bash
git rev-list --merges m1e2r3g4^..m1e2r3g4
# Confirms: m1e2r3g4 is a merge commit
git log -1 --format="%P" m1e2r3g4
# Returns: a1b2c3d e4f5g6h (two parents)
```
**Skill Response**:
⚠️ **Merge Commit Detected**
This is a merge commit with two parents:
- Parent 1: a1b2c3d (main branch)
- Parent 2: e4f5g6h (feature branch)
**Step 2 - Execute Diff**:
```bash
# Show changes introduced by the merge (against first parent)
git diff m1e2r3g4^1...m1e2r3g4
git diff --stat m1e2r3g4^1...m1e2r3g4
```
**Step 3 - Analysis**:
```
Merge Commit Metadata:
- Commit: m1e2r3g4
- Author: John Doe
- Date: 2025-11-13
- Message: "Merge feature/user-profiles into main"
- Parent 1 (base): a1b2c3d (main)
- Parent 2 (merged): e4f5g6h (feature/user-profiles)
Statistics:
- Files changed: 15
- Insertions: 892
- Deletions: 134
- Net change: +758
Merged Branch Summary:
- Branch: feature/user-profiles
- Commits merged: 23
- Primary changes:
• New user profile management system
• Profile API endpoints
• Profile database models
• Associated tests
Change Categorization:
- Source code: 9 files (src/profiles/*, src/api/profiles.py)
- Database: 2 files (migrations/*, models/profile.py)
- Tests: 3 files (tests/profiles/*)
- Documentation: 1 file (docs/api.md)
Merge Notes:
✓ No merge conflicts (clean merge)
✓ All tests passing (based on commit message)
```
**Step 4 - Output**:
File: `/claudedocs/diff_a1b2c3d_m1e2r3g4.md`
Special note in file: "This is a merge commit analysis showing changes introduced by merging feature/user-profiles into main."
---
## Example 6: Comparing Branches by Name
### Scenario
Developer wants to see differences between two branches without knowing commit hashes.
### User Prompt
```
Compare the staging branch to the production branch
```
**Skill Response**:
No specific commits provided. Asking user for comparison target...
User selects: **Option C - Compare two branch names**
Please provide:
- First branch: **production**
- Second branch: **staging**
**Step 1 - Commit Identification**:
```bash
git rev-parse production # Returns: p1r2o3d...
git rev-parse staging # Returns: s4t5a6g...
git log --oneline production..staging
# Shows: 8 commits ahead
```
**Step 3 - Analysis**:
```
Comparing Branches:
- production (p1r2o3d): "Release v2.1.0" (2025-11-01)
- staging (s4t5a6g): "Add feature X" (2025-11-13)
- Commits ahead: 8
- Commits behind: 0
Statistics:
- Files changed: 24
- Insertions: 1,234
- Deletions: 456
- Net change: +778
File Operations:
- Added: 8 files (new features)
- Modified: 14 files (enhancements)
- Deleted: 2 files (deprecated code)
Impact Assessment:
✓ Ready for production deployment
✓ All features tested in staging
⚠️ Database migration required (see migrations/0023_add_feature_x.py)
```
**Step 4 - Output**:
File: `/claudedocs/diff_p1r2o3d_s4t5a6g.md`
---
## Example 7: Current Working Directory Changes
### Scenario
Developer has uncommitted changes and wants to see diff against last commit.
### User Prompt
```
Show me my current uncommitted changes
```
**Skill Response**:
No specific commits provided. Asking user...
User selects: **Option D - Compare current changes to a specific commit**
Please provide the commit to compare against (or press Enter for HEAD): **[User presses Enter]**
**Step 1 - Commit Identification**:
```bash
# Comparing working directory against HEAD
git rev-parse HEAD # Returns: h3e4a5d...
```
**Step 2 - Execute Diff**:
```bash
git diff HEAD
git diff --stat HEAD
git diff --name-status HEAD
```
**Step 3 - Analysis**:
```
Working Directory Changes:
- Base commit: h3e4a5d (HEAD, "Latest commit")
- Status: Uncommitted changes
Statistics:
- Files changed: 3
- Insertions: 67
- Deletions: 23
- Net change: +44
File Operations:
- Modified:
• src/api/endpoints.py (+45, -12)
• tests/test_api.py (+18, -8)
• README.md (+4, -3)
Change Type: Feature development (in progress)
Status: ⚠️ Uncommitted - not yet in version control
```
**Step 4 - Output**:
File: `/claudedocs/diff_h3e4a5d_working.md`
---
## Example 8: Binary Files in Diff
### Scenario
Diff includes binary file changes (images, PDFs, compiled files).
### User Prompt
```
Compare commit old123 to new456
```
**Step 2 - Execute Diff**:
```bash
git diff --stat old123...new456
git diff --name-status old123...new456
```
Output includes:
```
M docs/architecture.pdf
M static/images/logo.png
A static/images/hero.jpg
```
**Step 3 - Analysis**:
```
Statistics:
- Files changed: 15
- Text files: 12 (analyzed in diff)
- Binary files: 3 (listed separately)
Text File Changes:
- Insertions: 234
- Deletions: 156
- Net change: +78
Binary File Changes:
- Modified:
• docs/architecture.pdf (binary file changed)
• static/images/logo.png (binary file changed)
- Added:
• static/images/hero.jpg (new binary file)
Note: Binary file content changes not shown in diff.
Use `git show old123:path` and `git show new456:path` to extract binary files.
```
**Step 4 - Output**:
File includes note: "⚠️ This diff contains 3 binary files. Binary content not displayed. See summary for list of affected binary files."
---
## Summary of Use Cases
1. **Two specific commits** - Direct comparison with full metadata
2. **Branch to main** - Pre-PR review workflow
3. **Large diffs** - Summary-only option for massive changes
4. **Renamed files** - Detect and document file restructuring
5. **Merge commits** - Special handling with parent information
6. **Branch comparison** - Compare branch tips by name
7. **Working directory** - Review uncommitted changes
8. **Binary files** - Special notation for non-text files
## Best Practices
- Always validate commits exist before running diff
- For large diffs, offer summary option first
- Clearly indicate merge commits with special notation
- Show both old and new paths for renamed files
- Categorize changes by file type and impact area
- Provide actionable insights in the summary
- Save output with descriptive filenames
- Include enough metadata for audit trail

# Git Diff Helper Scripts
This directory contains bash utility scripts for git diff operations used by the get-git-diff skill.
## Scripts Overview
### validate.sh
Validation and verification functions.
**Functions:**
- `is_git_repo()` - Check if in a git repository
- `validate_commit <ref>` - Validate a commit reference
- `validate_commit_pair <ref1> <ref2>` - Validate two commits
**Usage:**
```bash
source validate.sh
if is_git_repo; then
  validate_commit "HEAD"
fi
```
### commit_info.sh
Commit information and metadata retrieval.
**Functions:**
- `get_commit_info <ref>` - Get full commit information
- `get_short_hash <ref>` - Get 7-character hash
- `is_merge_commit <ref>` - Check if merge commit
- `get_commits_between <ref1> <ref2>` - Count commits between refs
- `get_commit_message <ref>` - Get commit message
- `get_commit_author <ref>` - Get commit author
- `get_commit_date <ref>` - Get commit date
**Usage:**
```bash
source commit_info.sh
info=$(get_commit_info "HEAD")
IFS='|' read -r full short author email date message <<< "$info"
echo "Commit: $short - $message"
```
### diff_stats.sh
Diff statistics and analysis.
**Functions:**
- `get_diff_stats <ref1> <ref2>` - Get diff statistics
- `get_file_stats <ref1> <ref2>` - Get per-file statistics
- `is_large_diff <ref1> <ref2> [threshold]` - Check if diff is large
- `get_total_lines <ref1> <ref2>` - Get total lines changed
- `get_files_changed <ref1> <ref2>` - Get file count
**Usage:**
```bash
source diff_stats.sh
stats=$(get_diff_stats "HEAD^" "HEAD")
IFS=$'\t' read -r files ins del net <<< "$stats"
echo "Changed: $files files, +$ins -$del"
```
### file_operations.sh
File operation analysis (add, modify, delete, rename).
**Functions:**
- `get_file_operations <ref1> <ref2>` - Get file operations
- `count_file_operations <ref1> <ref2>` - Count operations by type
- `get_files_by_operation <ref1> <ref2> <type>` - Filter by operation (A/M/D/R)
- `get_renamed_files <ref1> <ref2>` - Get renames with similarity
- `get_binary_files <ref1> <ref2>` - Get binary files
- `count_binary_files <ref1> <ref2>` - Count binary files
- `categorize_files` - Categorize files by type (reads from stdin)
- `get_categorized_counts <ref1> <ref2>` - Get category counts
**Usage:**
```bash
source file_operations.sh
counts=$(count_file_operations "main" "feature-branch")
IFS=$'\t' read -r added modified deleted renamed copied <<< "$counts"
echo "Added: $added, Modified: $modified"
```
### utils.sh
General utility functions.
**Functions:**
- `get_current_branch()` - Get current branch name
- `get_default_branch()` - Get default branch (main/master)
- `format_diff_filename <hash1> <hash2>` - Format output filename
- `get_repo_root()` - Get repository root path
- `get_repo_name()` - Get repository name
- `get_remote_url [remote]` - Get remote URL
- `is_ancestor <ref1> <ref2>` - Check if ref1 is ancestor of ref2
- `get_common_ancestor <ref1> <ref2>` - Get merge base
- `format_timestamp [format]` - Format current timestamp
- `ensure_directory <path>` - Create directory if needed
- `get_git_config <key>` - Get git config value
- `is_working_tree_clean()` - Check for uncommitted changes
- `get_branches [type]` - List branches (local/remote/all)
- `ref_exists <ref>` - Check if ref exists
**Usage:**
```bash
source utils.sh
branch=$(get_current_branch)
filename=$(format_diff_filename "abc123" "def456")
```
## Running Scripts Directly
All scripts can be sourced for their functions or run directly for examples:
```bash
# Run directly for examples
./commit_info.sh HEAD
./diff_stats.sh HEAD^ HEAD
./file_operations.sh main feature-branch
./utils.sh
# Source for functions
source diff_stats.sh
if is_large_diff "HEAD^" "HEAD" 500; then
  echo "Large diff detected!"
fi
```
## Complete Example
```bash
#!/usr/bin/env bash
# Source all helper scripts
SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")"
source "${SCRIPT_DIR}/validate.sh"
source "${SCRIPT_DIR}/commit_info.sh"
source "${SCRIPT_DIR}/diff_stats.sh"
source "${SCRIPT_DIR}/file_operations.sh"
source "${SCRIPT_DIR}/utils.sh"

# Validate we're in a git repo
if ! is_git_repo; then
  echo "Error: Not in a git repository"
  exit 1
fi

# Get commits to compare
commit1="HEAD^"
commit2="HEAD"

# Validate commits
if ! validate_commit_pair "$commit1" "$commit2" >/dev/null; then
  echo "Error: Invalid commits"
  exit 1
fi

# Get commit info
info1=$(get_commit_info "$commit1")
info2=$(get_commit_info "$commit2")
IFS='|' read -r _ short1 _ _ _ msg1 <<< "$info1"
IFS='|' read -r _ short2 _ _ _ msg2 <<< "$info2"
echo "Comparing: $short1 → $short2"

# Get statistics
stats=$(get_diff_stats "$commit1" "$commit2")
IFS=$'\t' read -r files ins del net <<< "$stats"
echo "Files: $files"
echo "Changes: +$ins -$del (net: $net)"

# Check if large (capture the line count in the same call)
if total=$(is_large_diff "$commit1" "$commit2"); then
  echo "⚠ Large diff: $total lines"
fi

# File operations
ops=$(count_file_operations "$commit1" "$commit2")
IFS=$'\t' read -r added modified deleted renamed _ <<< "$ops"
echo "Added: $added, Modified: $modified"
if [[ $renamed -gt 0 ]]; then
  echo "Renamed files:"
  get_renamed_files "$commit1" "$commit2" | while IFS=$'\t' read -r sim old new; do
    echo "  $old → $new ($sim%)"
  done
fi

# Generate filename
filename=$(format_diff_filename "$short1" "$short2")
echo "Output: $filename"
```
## Error Handling
All scripts use `set -euo pipefail` for strict error handling:
- `-e`: Exit on error
- `-u`: Exit on undefined variable
- `-o pipefail`: Fail on pipe errors
When sourcing, you may want to disable this:
```bash
set +euo pipefail
source diff_stats.sh
set -euo pipefail
```
## Dependencies
- bash 4.0+
- git 2.0+
- Standard Unix utilities (awk, sed, cut, wc)
## Testing
Each script includes a main section that runs when executed directly:
```bash
# Test all scripts
./validate.sh HEAD
./commit_info.sh HEAD HEAD^
./diff_stats.sh HEAD^ HEAD
./file_operations.sh HEAD^ HEAD
./utils.sh
```
## Integration with Skill
These scripts are designed to be used by Claude Code when executing the get-git-diff skill. They provide reliable, composable functions for git diff analysis.
Example integration:
```bash
# In skill execution
source scripts/validate.sh
source scripts/diff_stats.sh
if validate_commit "$user_commit"; then
  stats=$(get_diff_stats "$user_commit" "HEAD")
  # Process stats...
fi
```

#!/usr/bin/env bash
#
# Git Commit Information Functions
#
# Functions for retrieving commit metadata and relationships.
#
# Author: Claude Code get-git-diff skill
# Version: 1.0.0
set -euo pipefail
#######################################
# Get detailed information about a commit
# Arguments:
# $1 - commit reference
# Outputs:
# Pipe-separated: full_hash|short_hash|author_name|author_email|date|message
#######################################
get_commit_info() {
  local commit_ref="$1"
  local format="%H|%h|%an|%ae|%ad|%s"
  git log -1 --format="${format}" "${commit_ref}" 2>/dev/null
}
#######################################
# Get short hash from commit reference
# Arguments:
# $1 - commit reference
# $2 - length (optional, default: 7)
# Outputs:
# Short hash
#######################################
get_short_hash() {
  local commit_ref="$1"
  local length="${2:-7}"
  git rev-parse --short="${length}" "${commit_ref}" 2>/dev/null
}
#######################################
# Check if a commit is a merge commit
# Arguments:
# $1 - commit reference
# Returns:
# 0 if merge commit, 1 if not
# Outputs:
# Parent hashes separated by spaces
#######################################
is_merge_commit() {
  local commit_ref="$1"
  local parents parent_count
  if parents=$(git log -1 --format="%P" "${commit_ref}" 2>/dev/null); then
    # Assignment kept separate from "local" so a failure is not masked under set -e
    parent_count=$(echo "${parents}" | wc -w)
    if [[ ${parent_count} -gt 1 ]]; then
      echo "${parents}"
      return 0
    fi
  fi
  return 1
}
#######################################
# Get number of commits between two refs
# Arguments:
# $1 - first commit (older)
# $2 - second commit (newer)
# Outputs:
# Number of commits between them
#######################################
get_commits_between() {
  local commit1="$1"
  local commit2="$2"
  git rev-list --count "${commit1}..${commit2}" 2>/dev/null || echo "0"
}
#######################################
# Get commit message (first line)
# Arguments:
# $1 - commit reference
# Outputs:
# Commit message subject line
#######################################
get_commit_message() {
  local commit_ref="$1"
  git log -1 --format="%s" "${commit_ref}" 2>/dev/null
}
#######################################
# Get commit author
# Arguments:
# $1 - commit reference
# Outputs:
# Author name and email
#######################################
get_commit_author() {
  local commit_ref="$1"
  git log -1 --format="%an <%ae>" "${commit_ref}" 2>/dev/null
}
#######################################
# Get commit date
# Arguments:
# $1 - commit reference
# Outputs:
# Commit date
#######################################
get_commit_date() {
  local commit_ref="$1"
  git log -1 --format="%ad" "${commit_ref}" 2>/dev/null
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  # Example usage
  if [[ $# -eq 0 ]]; then
    echo "Usage: $0 <commit-ref> [<commit-ref2>]"
    echo ""
    echo "Examples:"
    echo "  $0 HEAD"
    echo "  $0 HEAD HEAD^"
    exit 1
  fi
  commit1="$1"
  commit2="${2:-}"
  echo "=== Commit Info for ${commit1} ==="
  info=$(get_commit_info "${commit1}")
  IFS='|' read -r full short author email date message <<< "${info}"
  echo "Full hash: ${full}"
  echo "Short hash: ${short}"
  echo "Author: ${author} <${email}>"
  echo "Date: ${date}"
  echo "Message: ${message}"
  # Capture parents in the same call instead of invoking is_merge_commit twice
  if parents=$(is_merge_commit "${commit1}"); then
    echo "Merge commit: yes"
    echo "Parents: ${parents}"
  else
    echo "Merge commit: no"
  fi
  if [[ -n "${commit2}" ]]; then
    echo ""
    echo "=== Comparison: ${commit1} to ${commit2} ==="
    count=$(get_commits_between "${commit1}" "${commit2}")
    echo "Commits between: ${count}"
  fi
fi

#!/usr/bin/env bash
#
# Git Diff Statistics Functions
#
# Functions for calculating and analyzing diff statistics.
#
# Author: Claude Code get-git-diff skill
# Version: 1.0.0
set -euo pipefail
#######################################
# Get diff statistics between two commits
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Tab-separated: files_changed insertions deletions net_change
#######################################
get_diff_stats() {
  local commit1="$1"
  local commit2="$2"
  local shortstat
  if shortstat=$(git diff --shortstat "${commit1}...${commit2}" 2>/dev/null); then
    local files=0 insertions=0 deletions=0
    # Parse: "3 files changed, 165 insertions(+), 20 deletions(-)"
    if [[ ${shortstat} =~ ([0-9]+)\ files?\ changed ]]; then
      files="${BASH_REMATCH[1]}"
    fi
    if [[ ${shortstat} =~ ([0-9]+)\ insertions? ]]; then
      insertions="${BASH_REMATCH[1]}"
    fi
    if [[ ${shortstat} =~ ([0-9]+)\ deletions? ]]; then
      deletions="${BASH_REMATCH[1]}"
    fi
    local net_change=$((insertions - deletions))
    # Tab-separated, as documented, so callers can use IFS=$'\t' read and cut -f
    printf '%s\t%s\t%s\t%s\n' "${files}" "${insertions}" "${deletions}" "${net_change}"
  else
    printf '0\t0\t0\t0\n'
  fi
}
#######################################
# Get detailed per-file statistics
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Tab-separated: insertions deletions filename (one per line)
#######################################
get_file_stats() {
  local commit1="$1"
  local commit2="$2"
  git diff --numstat "${commit1}...${commit2}" 2>/dev/null
}
#######################################
# Check if diff is large (exceeds threshold)
# Arguments:
# $1 - first commit
# $2 - second commit
# $3 - threshold (optional, default: 1000)
# Returns:
# 0 if large, 1 if not
# Outputs:
# Total line count if large
#######################################
is_large_diff() {
local commit1="$1"
local commit2="$2"
local threshold="${3:-1000}"
local stats
if stats=$(get_diff_stats "${commit1}" "${commit2}"); then
local insertions deletions total_lines
insertions=$(echo "${stats}" | cut -f2)
deletions=$(echo "${stats}" | cut -f3)
total_lines=$((insertions + deletions))
if [[ ${total_lines} -gt ${threshold} ]]; then
echo "${total_lines}"
return 0
fi
fi
return 1
}
#######################################
# Get total line count in diff
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Total lines changed (insertions + deletions)
#######################################
get_total_lines() {
local commit1="$1"
local commit2="$2"
local stats
if stats=$(get_diff_stats "${commit1}" "${commit2}"); then
local insertions deletions
insertions=$(echo "${stats}" | cut -f2)
deletions=$(echo "${stats}" | cut -f3)
echo $((insertions + deletions))
else
echo "0"
fi
}
#######################################
# Get files changed count
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Number of files changed
#######################################
get_files_changed() {
local commit1="$1"
local commit2="$2"
local stats
if stats=$(get_diff_stats "${commit1}" "${commit2}"); then
echo "${stats}" | cut -f1
else
echo "0"
fi
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Example usage
if [[ $# -lt 2 ]]; then
echo "Usage: $0 <commit1> <commit2> [threshold]"
echo ""
echo "Examples:"
echo " $0 HEAD^ HEAD"
echo " $0 main feature-branch 500"
exit 1
fi
commit1="$1"
commit2="$2"
threshold="${3:-1000}"
echo "=== Diff Statistics: ${commit1} → ${commit2} ==="
stats=$(get_diff_stats "${commit1}" "${commit2}")
IFS=$'\t' read -r files ins del net <<< "${stats}"
echo "Files changed: ${files}"
echo "Insertions: +${ins}"
echo "Deletions: -${del}"
echo "Net change: ${net}"
echo "Total lines: $((ins + del))"
if total=$(is_large_diff "${commit1}" "${commit2}" "${threshold}"); then
echo "⚠ Large diff detected (${total} lines, threshold: ${threshold})"
else
echo "✓ Normal size diff (threshold: ${threshold})"
fi
echo ""
echo "=== Top 10 Changed Files ==="
get_file_stats "${commit1}" "${commit2}" | head -10 | while IFS=$'\t' read -r ins del file; do
if [[ "${ins}" == "-" && "${del}" == "-" ]]; then
echo " ${file} (binary)"
else
echo " ${file} (+${ins}, -${del})"
fi
done
fi

View File

@@ -0,0 +1,250 @@
#!/usr/bin/env bash
#
# Git File Operations Functions
#
# Functions for analyzing file operations in diffs (add, modify, delete, rename).
#
# Author: Claude Code get-git-diff skill
# Version: 1.0.0
set -euo pipefail
#######################################
# Get file operations (add, modify, delete, rename)
# Arguments:
# $1 - first commit
# $2 - second commit
# $3 - detect renames (optional, default: true)
# Outputs:
# File operations, one per line (status TAB path)
#######################################
get_file_operations() {
local commit1="$1"
local commit2="$2"
local detect_renames="${3:-true}"
# Build the command as an array so arguments stay word-safe
local -a cmd=(git diff --name-status)
if [[ "${detect_renames}" == "true" ]]; then
cmd+=(-M)
fi
"${cmd[@]}" "${commit1}...${commit2}" 2>/dev/null
}
#######################################
# Count file operations by type
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Tab-separated: added modified deleted renamed copied
#######################################
count_file_operations() {
local commit1="$1"
local commit2="$2"
local operations
operations=$(get_file_operations "${commit1}" "${commit2}")
local added=0 modified=0 deleted=0 renamed=0 copied=0
while IFS= read -r line; do
[[ -z "${line}" ]] && continue
local status="${line:0:1}"
# Use arithmetic assignment: ((x++)) exits nonzero under set -e when x is 0
case "${status}" in
A) added=$((added + 1)) ;;
M) modified=$((modified + 1)) ;;
D) deleted=$((deleted + 1)) ;;
R) renamed=$((renamed + 1)) ;;
C) copied=$((copied + 1)) ;;
esac
done <<< "${operations}"
# Emit real tabs: callers split this with IFS=$'\t'
printf '%s\t%s\t%s\t%s\t%s\n' "${added}" "${modified}" "${deleted}" "${renamed}" "${copied}"
}
#######################################
# Get files by operation type
# Arguments:
# $1 - first commit
# $2 - second commit
# $3 - operation type (A, M, D, R, C)
# Outputs:
# List of files, one per line
#######################################
get_files_by_operation() {
local commit1="$1"
local commit2="$2"
local operation_type="$3"
local operations
operations=$(get_file_operations "${commit1}" "${commit2}")
echo "${operations}" | awk -F'\t' -v op="${operation_type}" '$1 ~ "^"op {print $2}'
}
#######################################
# Get renamed files with old and new paths
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Tab-separated: similarity old_path new_path (one per line)
#######################################
get_renamed_files() {
local commit1="$1"
local commit2="$2"
local operations
operations=$(get_file_operations "${commit1}" "${commit2}" true)
echo "${operations}" | awk -F'\t' '
/^R[0-9]+/ {
similarity = substr($1, 2)
old_path = $2
new_path = $3
print similarity "\t" old_path "\t" new_path
}
'
}
#######################################
# Get binary files in diff
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# List of binary file paths, one per line
#######################################
get_binary_files() {
local commit1="$1"
local commit2="$2"
local numstat
if numstat=$(git diff --numstat "${commit1}...${commit2}" 2>/dev/null); then
echo "${numstat}" | awk -F'\t' '$1 == "-" && $2 == "-" {print $3}'
fi
}
#######################################
# Count binary files
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Number of binary files
#######################################
count_binary_files() {
local commit1="$1"
local commit2="$2"
get_binary_files "${commit1}" "${commit2}" | wc -l | tr -d ' '
}
#######################################
# Categorize files by type
# Arguments:
# Reads file paths from stdin, one per line
# Outputs:
# Tab-separated: source tests docs config database other
#######################################
categorize_files() {
local source=0 tests=0 docs=0 config=0 database=0 other=0
while IFS= read -r path; do
[[ -z "${path}" ]] && continue
# Categorize based on path and extension
# (arithmetic assignment avoids the set -e pitfall of ((x++)) at x=0)
if [[ "${path}" =~ test|tests/ ]]; then
tests=$((tests + 1))
elif [[ "${path}" =~ \.md$|doc ]]; then
docs=$((docs + 1))
elif [[ "${path}" =~ config|settings|\.env|\.ya?ml$|\.json$|\.toml$|\.ini$ ]]; then
config=$((config + 1))
elif [[ "${path}" =~ migration|schema|models ]]; then
database=$((database + 1))
elif [[ "${path}" =~ \.(py|js|ts|java|go|rs|cpp|c|rb|php|swift|kt)$ ]]; then
source=$((source + 1))
else
other=$((other + 1))
fi
done
# Emit real tabs: callers split this with IFS=$'\t'
printf '%s\t%s\t%s\t%s\t%s\t%s\n' "${source}" "${tests}" "${docs}" "${config}" "${database}" "${other}"
}
#######################################
# Get categorized file counts from diff
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Tab-separated category counts
#######################################
get_categorized_counts() {
local commit1="$1"
local commit2="$2"
get_file_operations "${commit1}" "${commit2}" | cut -f2 | categorize_files
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Example usage
if [[ $# -lt 2 ]]; then
echo "Usage: $0 <commit1> <commit2>"
echo ""
echo "Examples:"
echo " $0 HEAD^ HEAD"
echo " $0 main feature-branch"
exit 1
fi
commit1="$1"
commit2="$2"
echo "=== File Operations: ${commit1} → ${commit2} ==="
counts=$(count_file_operations "${commit1}" "${commit2}")
IFS=$'\t' read -r added modified deleted renamed copied <<< "${counts}"
echo "Added: ${added}"
echo "Modified: ${modified}"
echo "Deleted: ${deleted}"
echo "Renamed: ${renamed}"
echo "Copied: ${copied}"
binary_count=$(count_binary_files "${commit1}" "${commit2}")
echo "Binary: ${binary_count}"
if [[ ${added} -gt 0 ]]; then
echo ""
echo "=== Added Files ==="
get_files_by_operation "${commit1}" "${commit2}" "A" | head -10
if [[ ${added} -gt 10 ]]; then
echo " ... and $((added - 10)) more"
fi
fi
if [[ ${renamed} -gt 0 ]]; then
echo ""
echo "=== Renamed Files ==="
get_renamed_files "${commit1}" "${commit2}" | head -10 | while IFS=$'\t' read -r sim old new; do
echo " ${old} → ${new} (${sim}%)"
done
if [[ ${renamed} -gt 10 ]]; then
echo " ... and $((renamed - 10)) more"
fi
fi
echo ""
echo "=== File Categorization ==="
categories=$(get_categorized_counts "${commit1}" "${commit2}")
IFS=$'\t' read -r source tests docs config database other <<< "${categories}"
echo "Source code: ${source}"
echo "Tests: ${tests}"
echo "Documentation: ${docs}"
echo "Configuration: ${config}"
echo "Database: ${database}"
echo "Other: ${other}"
fi

View File

@@ -0,0 +1,250 @@
#!/usr/bin/env bash
#
# Git Utility Functions
#
# General utility functions for git operations and formatting.
#
# Author: Claude Code get-git-diff skill
# Version: 1.0.0
set -euo pipefail
#######################################
# Get current branch name
# Outputs:
# Branch name, or "HEAD" if detached
#######################################
get_current_branch() {
local branch
branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "HEAD")
echo "${branch}"
}
#######################################
# Get default branch (main or master)
# Outputs:
# Default branch name, or empty if not found
#######################################
get_default_branch() {
if git rev-parse --verify main >/dev/null 2>&1; then
echo "main"
elif git rev-parse --verify master >/dev/null 2>&1; then
echo "master"
else
echo ""
fi
}
#######################################
# Format diff filename
# Arguments:
# $1 - first commit short hash
# $2 - second commit short hash
# $3 - extension (optional, default: md)
# Outputs:
# Formatted filename like "diff_abc123_def456.md"
#######################################
format_diff_filename() {
local hash1="$1"
local hash2="$2"
local ext="${3:-md}"
echo "diff_${hash1}_${hash2}.${ext}"
}
#######################################
# Get repository root directory
# Outputs:
# Absolute path to repository root
#######################################
get_repo_root() {
git rev-parse --show-toplevel 2>/dev/null
}
#######################################
# Get repository name
# Outputs:
# Name of the repository (directory name)
#######################################
get_repo_name() {
local root
root=$(get_repo_root)
if [[ -n "${root}" ]]; then
basename "${root}"
fi
}
#######################################
# Get remote URL (if available)
# Arguments:
# $1 - remote name (optional, default: origin)
# Outputs:
# Remote URL or empty if not found
#######################################
get_remote_url() {
local remote="${1:-origin}"
git remote get-url "${remote}" 2>/dev/null || echo ""
}
#######################################
# Check if commit is reachable from another
# Arguments:
# $1 - ancestor commit
# $2 - descendant commit
# Returns:
# 0 if reachable, 1 if not
#######################################
is_ancestor() {
local ancestor="$1"
local descendant="$2"
git merge-base --is-ancestor "${ancestor}" "${descendant}" 2>/dev/null
}
#######################################
# Get common ancestor of two commits
# Arguments:
# $1 - first commit
# $2 - second commit
# Outputs:
# Common ancestor commit hash
#######################################
get_common_ancestor() {
local commit1="$1"
local commit2="$2"
git merge-base "${commit1}" "${commit2}" 2>/dev/null
}
#######################################
# Format timestamp
# Arguments:
# $1 - format (optional, default: "%Y-%m-%d %H:%M:%S")
# Outputs:
# Formatted current timestamp
#######################################
format_timestamp() {
local format="${1:-%Y-%m-%d %H:%M:%S}"
date "+${format}"
}
#######################################
# Create directory if it doesn't exist
# Arguments:
# $1 - directory path
# Returns:
# 0 if successful or already exists
#######################################
ensure_directory() {
local dir="$1"
if [[ ! -d "${dir}" ]]; then
mkdir -p "${dir}"
fi
}
#######################################
# Get git config value
# Arguments:
# $1 - config key (e.g., user.name)
# Outputs:
# Config value or empty if not set
#######################################
get_git_config() {
local key="$1"
git config --get "${key}" 2>/dev/null || echo ""
}
#######################################
# Check if working directory is clean
# Returns:
# 0 if clean, 1 if dirty
#######################################
is_working_tree_clean() {
git diff-index --quiet HEAD -- 2>/dev/null
}
#######################################
# Get list of all branches
# Arguments:
# $1 - type: "local", "remote", or "all" (default: local)
# Outputs:
# List of branches, one per line
#######################################
get_branches() {
local type="${1:-local}"
case "${type}" in
local)
git branch --format='%(refname:short)' 2>/dev/null
;;
remote)
git branch -r --format='%(refname:short)' 2>/dev/null
;;
all)
git branch -a --format='%(refname:short)' 2>/dev/null
;;
*)
echo "Error: Invalid branch type: ${type}" >&2
return 1
;;
esac
}
#######################################
# Check if ref exists
# Arguments:
# $1 - ref name (branch, tag, commit)
# Returns:
# 0 if exists, 1 if not
#######################################
ref_exists() {
local ref="$1"
git rev-parse --verify "${ref}" >/dev/null 2>&1
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Example usage
echo "=== Git Repository Utilities ==="
echo ""
echo "Current branch: $(get_current_branch)"
echo "Default branch: $(get_default_branch)"
echo "Repository root: $(get_repo_root)"
echo "Repository name: $(get_repo_name)"
remote_url=$(get_remote_url)
if [[ -n "${remote_url}" ]]; then
echo "Remote URL (origin): ${remote_url}"
fi
echo ""
echo "Git user name: $(get_git_config user.name)"
echo "Git user email: $(get_git_config user.email)"
echo ""
if is_working_tree_clean; then
echo "Working tree: clean"
else
echo "Working tree: dirty (uncommitted changes)"
fi
echo ""
echo "Example filename: $(format_diff_filename "abc123" "def456")"
echo "Current timestamp: $(format_timestamp)"
echo ""
echo "=== Local Branches ==="
get_branches local | head -5
local_count=$(get_branches local | wc -l)
if [[ ${local_count} -gt 5 ]]; then
echo " ... and $((local_count - 5)) more"
fi
fi

View File

@@ -0,0 +1,88 @@
#!/usr/bin/env bash
#
# Git Validation Functions
#
# Functions for validating git repositories and commit references.
#
# Author: Claude Code get-git-diff skill
# Version: 1.0.0
set -euo pipefail
#######################################
# Check if in a git repository
# Returns:
# 0 if in git repo, 1 if not
#######################################
is_git_repo() {
git rev-parse --is-inside-work-tree >/dev/null 2>&1
}
#######################################
# Validate that a commit reference exists
# Arguments:
# $1 - commit reference (hash, branch, tag)
# Returns:
# 0 if valid, 1 if invalid
# Outputs:
# Full commit hash if valid, error message to stderr if invalid
#######################################
validate_commit() {
local commit_ref="$1"
local full_hash
if full_hash=$(git rev-parse --verify "${commit_ref}" 2>/dev/null); then
echo "${full_hash}"
return 0
else
echo "Error: Invalid commit reference: ${commit_ref}" >&2
return 1
fi
}
#######################################
# Validate two commit references
# Arguments:
# $1 - first commit reference
# $2 - second commit reference
# Returns:
# 0 if both valid, 1 if either invalid
# Outputs:
# Tab-separated full hashes if valid
#######################################
validate_commit_pair() {
local commit1="$1"
local commit2="$2"
local hash1 hash2
if hash1=$(validate_commit "${commit1}") && hash2=$(validate_commit "${commit2}"); then
printf '%s\t%s\n' "${hash1}" "${hash2}"
return 0
else
return 1
fi
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Example usage
if [[ $# -eq 0 ]]; then
echo "Usage: $0 <commit-ref> [<commit-ref2>]"
echo ""
echo "Examples:"
echo " $0 HEAD"
echo " $0 abc123 def456"
exit 1
fi
if is_git_repo; then
if [[ $# -eq 1 ]]; then
validate_commit "$1"
else
validate_commit_pair "$1" "$2"
fi
else
echo "Error: Not in a git repository" >&2
exit 1
fi
fi

View File

@@ -0,0 +1,241 @@
# Git Diff: [SHORT_HASH_1] → [SHORT_HASH_2]
**Generated**: [YYYY-MM-DD HH:MM:SS]
**Skill**: get-git-diff v1.0.0
---
## Commit Information
### From Commit
- **Hash**: `[FULL_HASH_1]` ([SHORT_HASH_1])
- **Author**: [AUTHOR_NAME] <[AUTHOR_EMAIL]>
- **Date**: [COMMIT_DATE_1]
- **Message**: [COMMIT_MESSAGE_1]
### To Commit
- **Hash**: `[FULL_HASH_2]` ([SHORT_HASH_2])
- **Author**: [AUTHOR_NAME] <[AUTHOR_EMAIL]>
- **Date**: [COMMIT_DATE_2]
- **Message**: [COMMIT_MESSAGE_2]
### Comparison Details
- **Commits Between**: [N] commits
- **Branch Context**: [BRANCH_INFO] (if applicable)
- **Merge Commit**: [Yes/No] (if applicable)
---
## Summary
### Change Statistics
| Metric | Count |
|--------|-------|
| Files Changed | [N] |
| Insertions | [N] (+) |
| Deletions | [N] (-) |
| Net Change | [±N] |
### File Operations
#### Added Files ([N])
[If any files were added, list them here]
- `path/to/new_file.py`
- `path/to/another_file.js`
#### Modified Files ([N])
[If any files were modified, list them here]
- `src/module.py` (+[N], -[N])
- `tests/test_module.py` (+[N], -[N])
#### Deleted Files ([N])
[If any files were deleted, list them here]
- `old/deprecated_file.py`
#### Renamed Files ([N])
[If any files were renamed, list them here]
- `old/path/file.py` → `new/path/file.py` ([SIMILARITY]%)
#### Binary Files ([N])
[If any binary files changed, list them here]
- `static/images/logo.png` (binary)
- `docs/manual.pdf` (binary)
---
## Change Categorization
### By File Type
| Category | Files | Insertions | Deletions |
|----------|-------|------------|-----------|
| Source Code | [N] | [N] | [N] |
| Tests | [N] | [N] | [N] |
| Documentation | [N] | [N] | [N] |
| Configuration | [N] | [N] | [N] |
| Database | [N] | [N] | [N] |
| Other | [N] | [N] | [N] |
### Change Type Assessment
[Based on patterns identified]
- **Primary Type**: [Feature Addition / Bug Fix / Refactoring / etc.]
- **Impact Areas**:
- [Area 1 - e.g., Authentication system]
- [Area 2 - e.g., User API endpoints]
- [Area 3 - e.g., Test coverage]
### Risk Assessment
[Based on change patterns and areas affected]
- **Risk Level**: [Low / Medium / High / Critical]
- **Risk Factors**:
- [Factor 1 - e.g., Database schema changes]
- [Factor 2 - e.g., Breaking API changes]
- [Factor 3 - e.g., Security-sensitive code modified]
---
## Special Notes
[Include any special considerations or warnings]
### Large Diff Warning
[If applicable]
⚠️ **This diff contains [N] lines across [M] files.** This is a substantial change set.
[If summary only: "Full unified diff omitted due to size. Use `git diff [hash1]...[hash2]` to view complete diff."]
### Merge Commit
[If applicable]
⚠️ **This is a merge commit** merging [BRANCH_NAME] into [BASE_BRANCH].
- Parent 1: [HASH] ([BRANCH])
- Parent 2: [HASH] ([BRANCH])
### Binary Changes
[If applicable]
⚠️ **This diff includes [N] binary files.** Binary content is not displayed in the diff below.
### Renamed Files
[If applicable]
**[N] files were renamed or moved.** Old and new paths are shown above with similarity percentages.
---
## Detailed Diff
[If full diff is included, otherwise note that it was omitted]
```diff
[FULL_UNIFIED_DIFF_OUTPUT]
Example format:
diff --git a/src/module.py b/src/module.py
index abc123..def456 100644
--- a/src/module.py
+++ b/src/module.py
@@ -10,7 +10,8 @@ def my_function():
# Unchanged line
- old_line = "removed"
+ new_line = "added"
+ another_line = "also added"
# More context
diff --git a/tests/test_module.py b/tests/test_module.py
index 111222..333444 100644
--- a/tests/test_module.py
+++ b/tests/test_module.py
@@ -5,3 +5,6 @@ def test_my_function():
assert my_function() is not None
+
+def test_new_behavior():
+ assert new_line == "added"
```
---
## Recommendations
[Optional section with actionable recommendations based on diff analysis]
### Testing
- [ ] Verify all tests pass with these changes
- [ ] Add tests for any new functionality
- [ ] Check test coverage for modified areas
### Code Review Focus
- [ ] Review [specific area] for correctness
- [ ] Validate [specific concern] is addressed
- [ ] Ensure backward compatibility for [specific component]
### Deployment Considerations
- [ ] Update environment variables if config changed
- [ ] Run database migrations if schema changed
- [ ] Update documentation if API changed
- [ ] Coordinate with [team] if [specific area] changed
---
## Related Commands
```bash
# View this diff locally
git diff [SHORT_HASH_1]...[SHORT_HASH_2]
# View with file names only
git diff --name-status [SHORT_HASH_1]...[SHORT_HASH_2]
# View with statistics
git diff --stat [SHORT_HASH_1]...[SHORT_HASH_2]
# View specific file from this diff
git diff [SHORT_HASH_1]...[SHORT_HASH_2] -- path/to/file
# View commit messages in range
git log --oneline [SHORT_HASH_1]..[SHORT_HASH_2]
```
---
## Metadata
- **Analyzed by**: Claude Code get-git-diff skill v1.0.0
- **Analysis date**: [YYYY-MM-DD HH:MM:SS]
- **Repository**: [REPO_PATH or REPO_URL if available]
- **Output format**: Standard Unified Diff with Summary
---
<!--
Template Usage Instructions:
Replace all placeholders in [BRACKETS] with actual values:
- [SHORT_HASH_1/2] - 7-character commit hash
- [FULL_HASH_1/2] - Full 40-character commit hash
- [AUTHOR_NAME] - Commit author name
- [AUTHOR_EMAIL] - Commit author email
- [COMMIT_DATE_1/2] - Commit date
- [COMMIT_MESSAGE_1/2] - Commit message (first line)
- [N] - Numeric values (file counts, line counts, etc.)
- [BRANCH_INFO] - Branch context information
- [SIMILARITY] - Similarity percentage for renames
- [FULL_UNIFIED_DIFF_OUTPUT] - Complete git diff output
Conditional Sections:
- Include "Large Diff Warning" only if diff > 1000 lines
- Include "Merge Commit" only if commit is a merge
- Include "Binary Changes" only if binary files present
- Include "Renamed Files" only if renames detected
- Omit empty categories (e.g., if no files deleted, omit that section)
Formatting:
- Use proper markdown syntax
- Wrap code/paths in backticks: `path/to/file`
- Use tables for structured data
- Use bullet lists for items
- Use checkboxes [ ] for action items
- Use emoji/symbols sparingly: ⚠️ for warnings, ℹ️ for info
Output Location:
- Save to: /claudedocs/diff_{short_hash1}_{short_hash2}.md
- Ensure /claudedocs directory exists
- Use filesystem-safe characters only
-->

View File

@@ -0,0 +1,183 @@
---
name: python-code-review
description: Deep Python code review of changed files using git diff analysis. Focuses on production quality, security vulnerabilities, performance bottlenecks, architectural issues, and subtle bugs in code changes. Analyzes correctness, efficiency, scalability, and production readiness of modifications. Use for pull request reviews, commit reviews, security audits of changes, and pre-deployment validation. Supports Django, Flask, FastAPI, pandas, and ML frameworks.
---
# Python Code Review Expert
## ⚠️ MANDATORY COMPLIANCE ⚠️
**CRITICAL**: The 5-step workflow outlined in this document MUST be followed in exact order for EVERY code review. Skipping steps or deviating from the procedure will result in incomplete and unreliable reviews. This is non-negotiable.
## File Structure
- **SKILL.md** (this file): Main instructions and MANDATORY workflow
- **examples.md**: Review scenarios with before/after examples
- **../../context/python/**: Framework patterns and detection logic
- `context_detection.md`, `common_issues.md`, `{framework}_patterns.md`
- **../../context/security/**: Security guidelines and OWASP references
- `security_guidelines.md`, `owasp_python.md`
- **../../memory/skills/python-code-review/**: Project-specific memory storage
- `{project-name}/`: Per-project learned patterns and context
- **templates/**: `report_template.md`, `inline_comment_template.md`
## Review Focus Areas
Deep reviews evaluate 8 critical dimensions **in the changed code**:
1. **Production Quality**: Correctness, edge cases, error recovery, resilience
2. **Deep Bugs**: Race conditions, memory leaks, resource exhaustion, subtle logic errors
3. **Security**: Injection flaws, auth bypasses, insecure deserialization, data exposure
4. **Performance**: Algorithmic complexity, N+1 queries, memory inefficiency, I/O blocking
5. **Architecture**: Tight coupling, missing abstractions, SOLID violations, circular deps
6. **Reliability**: Transaction safety, error handling, resource leaks, idempotency
7. **Scalability**: Concurrency issues, connection pooling, pagination, unbounded consumption
8. **Testing**: Missing critical tests, inadequate edge case coverage
**Note**: Focus on substantive issues requiring human judgment, not style/formatting details. Reviews are performed on changed code only, using the `get-git-diff` skill to identify modifications.
---
## MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)
### ⚠️ STEP 1: Identify Changed Files via Git Diff (REQUIRED)
**YOU MUST:**
1. **Invoke the `get-git-diff` skill** to identify changed Python files
2. Ask clarifying questions to determine comparison scope:
- Which commits/branches to compare? (e.g., `HEAD^ vs HEAD`, `main vs feature-branch`)
- If not specified, default to comparing current changes against the default branch
- Use the diff output to extract the list of modified Python files (`.py` extension)
3. If no Python files were changed, inform the user and exit gracefully
4. Focus subsequent review ONLY on the files identified in the diff
**DO NOT PROCEED WITHOUT GIT DIFF ANALYSIS**
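The diff-scoped file listing in step 1 can be sketched with plain git. This is a minimal, self-contained demo against a throwaway repository; the file names and commit messages are invented for illustration:

```shell
#!/usr/bin/env bash
# Demo: list only the changed Python files between two refs.
set -euo pipefail
repo=$(mktemp -d)
cd "${repo}"
git init -q
git config user.email demo@example.com
git config user.name demo
printf 'x = 1\n' > app.py
printf 'notes\n' > README.md
git add -A && git commit -qm "first"
printf 'x = 2\n' > app.py
printf 'more notes\n' >> README.md
git add -A && git commit -qm "second"
# -M detects renames; the '*.py' pathspec restricts output to Python files
changed=$(git diff --name-only -M HEAD^...HEAD -- '*.py')
echo "${changed}"
```

The three-dot notation compares against the merge base, matching the convention used by the helper scripts in this plugin.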
### ⚠️ STEP 2: Load Project Memory & Context Detection (REQUIRED)
**YOU MUST:**
1. **CHECK PROJECT MEMORY FIRST**:
- Identify the project name from the repository root or ask the user
- Check `../../memory/skills/python-code-review/{project-name}/` for existing project memory
- If memory exists, read all files to understand previously learned patterns, frameworks, and project-specific context
- If no memory exists, you will create it later in this process
2. Analyze changed files' structure and imports
3. **READ** `../../context/python/context_detection.md` to identify framework
4. Determine which framework-specific patterns file(s) to load
5. Ask clarifying questions in Socratic format:
- What is the purpose of these changes?
- Specific concerns to focus on?
- Deployment environment?
- Any project-specific conventions or patterns to be aware of?
**DO NOT PROCEED WITHOUT COMPLETING THIS STEP**
### ⚠️ STEP 3: Read Pattern Files (REQUIRED)
**YOU MUST read these files based on context**:
1. **ALWAYS**: `../../context/python/common_issues.md` (universal anti-patterns and deep bugs)
2. **If Django detected**: `../../context/python/django_patterns.md`
3. **If Flask detected**: `../../context/python/flask_patterns.md`
4. **If FastAPI detected**: `../../context/python/fastapi_patterns.md`
5. **If data science detected**: `../../context/python/datascience_patterns.md`
6. **If ML detected**: `../../context/python/ml_patterns.md`
7. **For security reviews**: `../../context/security/security_guidelines.md` AND `../../context/security/owasp_python.md`
**Progressive loading**: Only read framework files when detected. Don't load all upfront.
**DO NOT SKIP PATTERN FILE READING**
### ⚠️ STEP 4: Deep Manual Review of Changed Code (REQUIRED)
**YOU MUST examine ONLY the changed code for ALL categories below**:
**Important**: While reviewing changed lines, consider the surrounding context to understand:
- How changes interact with existing code
- Whether changes introduce regressions
- Impact on callers and dependent code
- Whether the change addresses the root cause or masks symptoms
**Review Categories**:
**Production Readiness**: Edge cases, input validation, error recovery, resource cleanup, timeouts
**Deep Bugs**: Race conditions, memory leaks, off-by-one errors, unhandled exceptions, state corruption, infinite loops, integer overflow, timezone issues
**Architecture**: Tight coupling, missing abstractions, SOLID violations, global state, circular dependencies
**Security**: SQL/NoSQL/Command injection, auth bypasses, insecure deserialization, SSRF, XXE, crypto weaknesses, data exposure, missing rate limiting
**Performance**: O(n²) complexity, N+1 queries, memory leaks, blocking I/O in async, missing indexes, inefficient data structures, cache stampede
**Scalability**: Connection pool exhaustion, lock contention, deadlocks, missing pagination, unbounded consumption
**Reliability**: Transaction boundaries, data races, resource leaks, missing idempotency
**DO NOT SKIP ANY CATEGORY**
### ⚠️ STEP 5: Generate Output & Update Project Memory (REQUIRED)
**YOU MUST ask user for preferred output format**:
- **Option A**: Structured report (`templates/report_template.md`) → executive summary, categorized findings, action items → output to `claudedocs/`
- **Option B**: Inline comments (`templates/inline_comment_template.md`) → file:line feedback, PR-style
- **Option C (Default)**: Both formats
**DO NOT CHOOSE FORMAT WITHOUT USER INPUT**
**For EVERY issue in the output, YOU MUST provide**:
1. **Severity**: Critical / Important / Minor
2. **Category**: Security / Performance / Code Quality / Architecture / Reliability
3. **Description**: What is wrong and why it matters
4. **Fix**: Concrete code example with improvement
5. **Reference**: Link to PEP, OWASP, or framework docs
6. **File:line**: Exact location (e.g., `auth.py:142`)
**Format guidelines**:
- Explain WHY (not just what)
- Show HOW to fix with examples
- Be specific with file:line references
- Be balanced (acknowledge good patterns)
- Educate, don't criticize
**DO NOT PROVIDE INCOMPLETE RECOMMENDATIONS**
**After completing the review, UPDATE PROJECT MEMORY**:
Create or update files in `../../memory/skills/python-code-review/{project-name}/`:
1. **project_overview.md**: Framework, architecture patterns, deployment info
2. **common_patterns.md**: Project-specific coding patterns and conventions discovered
3. **known_issues.md**: Recurring issues or anti-patterns found in this project
4. **review_history.md**: Summary of reviews performed with dates and key findings
This memory will be consulted in future reviews to provide context-aware analysis.
---
## Compliance Checklist
Before completing ANY review, verify:
- [ ] Step 1: Git diff analyzed using `get-git-diff` skill and changed Python files identified
- [ ] Step 2: Project memory checked in `../../memory/skills/python-code-review/{project-name}/` and context detected
- [ ] Step 3: All relevant pattern files read from `../../context/python/` and `../../context/security/`
- [ ] Step 4: Manual review completed for ALL categories on changed code only
- [ ] Step 5: Output generated with all required fields AND project memory updated
**FAILURE TO COMPLETE ALL STEPS INVALIDATES THE REVIEW**
## Further Reading
Refer to the official documentation:
- **Python Standards**:
- Python PEPs: https://peps.python.org/
- OWASP Python Security: https://owasp.org/www-project-python-security/
- **Frameworks**:
- Django, Flask, FastAPI official documentation
- **Best Practices**:
- Real Python: https://realpython.com/
## Version History
- v2.1.0 (2025-11-14): Refactored to use centralized context and project-specific memory system
- Context files moved to `forge-plugin/context/python/` and `forge-plugin/context/security/`
- Project memory stored in `forge-plugin/memory/skills/python-code-review/{project-name}/`
- Added project memory loading and persistence in workflow
- v2.0.0 (2025-11-13): Changed to diff-based review using `get-git-diff` skill - reviews only changed code
- v1.1.0 (2025-11-13): Removed automated analysis and linting/formatting tools
- v1.0.0 (2025-11-13): Initial release

View File

@@ -0,0 +1,503 @@
# Python Code Review Examples
This file contains example code review scenarios demonstrating common issues and recommended fixes.
## Example 1: Security Vulnerability - SQL Injection
### Before (Vulnerable Code)
```python
# user_service.py:15
def get_user_by_email(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    cursor.execute(query)
    return cursor.fetchone()
```
### Review Comment
**Severity**: Critical
**Category**: Security
**File**: user_service.py:16
SQL injection vulnerability detected. User input is directly interpolated into SQL query, allowing attackers to execute arbitrary SQL commands.
**Attack example**:
```python
email = "'; DROP TABLE users; --"
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
```
### After (Fixed Code)
```python
# user_service.py:15
def get_user_by_email(email):
    query = "SELECT * FROM users WHERE email = %s"
    cursor.execute(query, (email,))
    return cursor.fetchone()
```
**Reference**: OWASP A03:2021 - Injection
---
## Example 2: Performance Issue - N+1 Query Problem (Django)
### Before (Inefficient Code)
```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.all()  # 1 query
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name  # N additional queries!
        })
    return JsonResponse(result, safe=False)
```
### Review Comment
**Severity**: Important
**Category**: Performance
**File**: views.py:48
N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.
### After (Optimized Code)
```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name
        })
    return JsonResponse(result, safe=False)
```
**Performance gain**: 101 queries → 1 query (100x improvement for 100 posts)
**Reference**: Django QuerySet optimization
---
## Example 3: Code Quality - Mutable Default Argument
### Before (Buggy Code)
```python
# utils.py:22
def add_item(item, items=[]):
    items.append(item)
    return items

# Usage that reveals the bug:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!
```
### Review Comment
**Severity**: Important
**Category**: Code Quality
**File**: utils.py:22
Mutable default argument antipattern. The default list `[]` is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.
### After (Fixed Code)
```python
# utils.py:22
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Now works correctly:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['b'] - CORRECT!
```
**Reference**: Common Python Gotchas
---
## Example 4: PEP 8 Compliance - Naming Conventions
### Before (Non-compliant Code)
```python
# data_processor.py:10
def CalculateUserAge(BirthDate):
    CurrentYear = 2025
    user_birth_year = BirthDate.year
    AGE = CurrentYear - user_birth_year
    return AGE
```
### Review Comment
**Severity**: Minor
**Category**: Style
**File**: data_processor.py:10-15
Multiple PEP 8 naming violations:
- Function name should be `snake_case`, not `PascalCase`
- Parameter name should be `snake_case`, not `PascalCase`
- Local variables should be lowercase, not mixed case or UPPERCASE
- UPPERCASE is reserved for constants
### After (Compliant Code)
```python
# data_processor.py:10
def calculate_user_age(birth_date):
    current_year = 2025
    user_birth_year = birth_date.year
    age = current_year - user_birth_year
    return age
```
**Reference**: PEP 8 - Naming Conventions
---
## Example 5: Best Practice - Context Manager for Resource Handling
### Before (Resource Leak Risk)
```python
# file_processor.py:30
def process_log_file(filepath):
    file = open(filepath, 'r')
    data = file.read()
    results = analyze(data)
    file.close()  # May not execute if analyze() raises exception
    return results
```
### Review Comment
**Severity**: Important
**Category**: Best Practices
**File**: file_processor.py:31
Missing context manager for file handling. If `analyze()` raises an exception, `file.close()` never executes, leaving the file handle open (resource leak).
### After (Safe Code)
```python
# file_processor.py:30
def process_log_file(filepath):
    with open(filepath, 'r') as file:
        data = file.read()
        results = analyze(data)
    # File automatically closed even if exception occurs
    return results
```
**Bonus improvement**:
```python
# Even better with pathlib
from pathlib import Path

def process_log_file(filepath):
    data = Path(filepath).read_text()
    return analyze(data)
```
**Reference**: PEP 343 - The "with" Statement
---
## Example 6: Security - Hardcoded Credentials
### Before (Security Risk)
```python
# config.py:5
DATABASE_CONFIG = {
    'host': 'prod-db.example.com',
    'user': 'admin',
    'password': 'SuperSecret123!',  # NEVER do this
    'database': 'production'
}
```
### Review Comment
**Severity**: Critical
**Category**: Security
**File**: config.py:8
Hardcoded credentials detected. Passwords in source code:
1. Are visible to anyone with repository access
2. Get committed to version control history
3. Can't be rotated without code changes
4. May be exposed in logs or error messages
### After (Secure Code)
```python
# config.py:5
import os

DATABASE_CONFIG = {
    'host': os.getenv('DB_HOST', 'localhost'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'production')
}

# Validate required environment variables
required_vars = ['DB_USER', 'DB_PASSWORD']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")
```
**Additional security**:
```bash
# Use environment files (not committed to git)
echo "DB_PASSWORD=..." > .env
echo ".env" >> .gitignore
```
**Reference**: OWASP A07:2021 - Identification and Authentication Failures
---
## Example 7: Performance - Pandas Optimization
### Before (Inefficient Code)
```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Anti-pattern: Iterating over DataFrame rows
    discounts = []
    for index, row in df.iterrows():
        if row['total'] > 100:
            discount = row['total'] * 0.1
        else:
            discount = 0
        discounts.append(discount)
    df['discount'] = discounts
    return df
```
### Review Comment
**Severity**: Important
**Category**: Performance
**File**: data_analysis.py:53
Using `iterrows()` on DataFrame - this is one of the slowest operations in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.
### After (Vectorized Code)
```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Vectorized operation - operates on entire column at once
    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
    return df

# Alternative using numpy where:
import numpy as np

def calculate_discounts(df):
    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
    return df
```
**Performance**: Vectorized operations use optimized C code, achieving 50-100x speedup on large datasets.
**Reference**: Pandas Performance Optimization
---
## Example 8: Testing - Missing Edge Cases
### Before (Incomplete Tests)
```python
# test_validators.py:15
def test_email_validation():
    assert is_valid_email('user@example.com') == True
    assert is_valid_email('invalid-email') == False
```
### Review Comment
**Severity**: Important
**Category**: Testing
**File**: test_validators.py:15
Email validation tests are insufficient. Missing edge cases:
- Empty string
- None value
- Email with special characters
- Multiple @ symbols
- Missing domain
- Whitespace handling
- Maximum length validation
### After (Comprehensive Tests)
```python
# test_validators.py:15
import pytest

@pytest.mark.parametrize('email,expected', [
    # Valid emails
    ('user@example.com', True),
    ('first.last@example.co.uk', True),
    ('user+tag@example.com', True),
    # Invalid emails
    ('invalid-email', False),
    ('', False),
    ('user@', False),
    ('user@@example.com', False),
    ('@example.com', False),
    ('user @example.com', False),
    ('a' * 256 + '@example.com', False),  # Too long
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

def test_email_validation_with_none():
    with pytest.raises(TypeError):
        is_valid_email(None)
```
**Reference**: Testing Best Practices
---
## Example 9: Architecture - Separation of Concerns (FastAPI)
### Before (Tightly Coupled Code)
```python
# main.py:25
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get('/users/{user_id}')
def get_user(user_id: int):
    # Business logic mixed with data access and presentation
    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()
    if user:
        return {'id': user[0], 'name': user[1], 'email': user[2]}
    return {'error': 'User not found'}
```
### Review Comment
**Severity**: Important
**Category**: Architecture
**File**: main.py:25-38
Multiple violations of separation of concerns:
1. Database connection logic in route handler
2. SQL injection vulnerability
3. Hardcoded credentials
4. No error handling
5. Manual dict construction
6. No dependency injection
### After (Layered Architecture)
```python
# models.py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# repositories.py
from sqlalchemy.orm import Session
from . import models

class UserRepository:
    # In a full app the SQLAlchemy ORM model would be a separate class from
    # the Pydantic response schema; both are named User here for brevity.
    def get_by_id(self, db: Session, user_id: int):
        return db.query(models.User).filter(models.User.id == user_id).first()

# main.py
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from . import models, database, repositories

app = FastAPI()
user_repo = repositories.UserRepository()

@app.get('/users/{user_id}', response_model=models.User)
def get_user(user_id: int, db: Session = Depends(database.get_db)):
    user = user_repo.get_by_id(db, user_id)
    if not user:
        raise HTTPException(status_code=404, detail='User not found')
    return user
```
**Benefits**:
- Clear separation of concerns
- Dependency injection
- Type safety with Pydantic
- SQL injection protection via ORM
- Reusable repository pattern
- Proper error handling
**Reference**: FastAPI Best Practices, Repository Pattern
---
## Summary of Common Issues
1. **Security**: SQL injection, XSS, hardcoded credentials, insecure cryptography
2. **Performance**: N+1 queries, inefficient loops, missing indexes, no caching
3. **Code Quality**: Mutable defaults, global state, poor naming, missing docstrings
4. **Style**: PEP 8 violations, inconsistent formatting, magic numbers
5. **Best Practices**: Missing context managers, no type hints, poor error handling
6. **Testing**: Insufficient coverage, missing edge cases, no integration tests
7. **Architecture**: Tight coupling, mixed concerns, no dependency injection
Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.

# Inline Code Review Comments Template
This template provides examples of inline PR-style comments for different types of issues.
---
## Critical Issues
### Security Vulnerability
**File**: `auth.py:45`
```python
# Current code
user = db.execute(f"SELECT * FROM users WHERE username = '{username}'")
```
**Issue**: SQL Injection Vulnerability
**Severity**: 🔴 Critical
**Description**:
User input is directly interpolated into the SQL query, allowing attackers to execute arbitrary SQL commands.
**Attack Vector**:
```python
username = "admin' OR '1'='1"
# Results in: SELECT * FROM users WHERE username = 'admin' OR '1'='1'
```
**Fix**:
```python
# Use parameterized queries
user = db.execute("SELECT * FROM users WHERE username = %s", (username,))
# Or use ORM
user = User.query.filter_by(username=username).first()
```
**Reference**: OWASP A03:2021 - Injection
---
### Data Corruption Risk
**File**: `payment.py:123`
```python
# Current code
order.amount -= discount
order.save()
payment.process(order.amount)
```
**Issue**: Race Condition in Payment Processing
**Severity**: 🔴 Critical
**Description**:
If two requests process the same order simultaneously, the discount could be applied twice, leading to incorrect payment amounts.
**Fix**:
```python
from django.db import transaction

@transaction.atomic
def process_payment(order_id, discount):
    order = Order.objects.select_for_update().get(id=order_id)
    order.amount -= discount
    order.save()
    payment.process(order.amount)
```
---
## Important Issues
### Performance Bottleneck
**File**: `views.py:67`
```python
# Current code
posts = Post.objects.all()
for post in posts:
    print(post.author.name)  # N+1 query problem
```
**Issue**: N+1 Query Problem
**Severity**: 🟡 Important
**Impact**: For 100 posts, this executes 101 database queries instead of 1.
**Performance**: ~1000ms → ~10ms (100x improvement)
**Fix**:
```python
posts = Post.objects.select_related('author').all()
for post in posts:
    print(post.author.name)  # No additional queries
```
---
### Type Safety Issue
**File**: `utils.py:234`
```python
def calculate_total(prices):
    return sum(prices) * 1.1
```
**Issue**: Missing Type Hints
**Severity**: 🟡 Important
**Description**:
Function lacks type hints, making it unclear what types are expected and returned. Could lead to runtime errors.
**Fix**:
```python
def calculate_total(prices: list[float]) -> float:
    """Calculate total with 10% tax.

    Args:
        prices: List of item prices

    Returns:
        Total amount including tax
    """
    return sum(prices) * 1.1
```
---
### Architectural Concern
**File**: `api.py:89`
```python
@app.route('/process')
def process_data():
    # 150 lines of business logic mixed with HTTP handling
    data = request.get_json()
    # ... lots of processing ...
    return jsonify(result)
```
**Issue**: Fat Controller / Missing Service Layer
**Severity**: 🟡 Important
**Impact**:
- Hard to test business logic
- Violates Single Responsibility Principle
- Difficult to reuse logic elsewhere
**Recommendation**:
```python
# services/data_processor.py
class DataProcessor:
    def process(self, data: dict) -> dict:
        # Business logic here
        return result

# api.py
@app.route('/process')
def process_data():
    data = request.get_json()
    processor = DataProcessor()
    result = processor.process(data)
    return jsonify(result)
```
---
## Minor Issues
### Code Smell
**File**: `helpers.py:45`
```python
def append_to_list(item, items=[]):  # Mutable default argument!
    items.append(item)
    return items
```
**Issue**: Mutable Default Argument
**Severity**: 🔵 Minor
**Bug**: Default list is shared between all function calls, causing unexpected behavior.
**Example**:
```python
list1 = append_to_list('a') # ['a']
list2 = append_to_list('b') # ['a', 'b'] - UNEXPECTED!
```
**Fix**:
```python
def append_to_list(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```
---
### Dead Code
**File**: `old_utils.py:123`
```python
def legacy_function():
    # This function is never called
    pass
```
**Issue**: Unused Code
**Severity**: 🔵 Minor
**Recommendation**: Remove to improve code maintainability and reduce cognitive load.
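Candidates for removal can often be found mechanically. Below is a minimal sketch (not part of the original template) using the stdlib `ast` module to flag module-level functions that are never referenced elsewhere in the same file; dedicated tools such as `vulture` handle dynamic references and cross-module usage far more thoroughly.

```python
import ast

def find_unreferenced_functions(source: str) -> set[str]:
    """Return names of module-level functions never loaded elsewhere in source."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return defined - used

src = """
def legacy_function():
    pass

def helper():
    return 1

print(helper())
"""
print(find_unreferenced_functions(src))  # {'legacy_function'}
```

Note the limitations: functions referenced via `getattr`, exported through `__all__`, or imported by other modules will be falsely flagged, so treat the result as a review hint, not a deletion list.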
---
### Complexity
**File**: `calculator.py:56`
```python
def complex_calculation(x, y, z, mode, options):
    # 50 lines with nested if/else
    # Cyclomatic complexity: 23 (Rank D)
    ...
```
**Issue**: High Cyclomatic Complexity
**Severity**: 🔵 Minor
**Impact**: Hard to understand, test, and maintain.
**Recommendation**: Refactor into smaller, focused functions:
```python
def complex_calculation(x, y, z, mode, options):
    if mode == 'simple':
        return _simple_calc(x, y, z)
    elif mode == 'advanced':
        return _advanced_calc(x, y, z, options)
    else:
        return _default_calc(x, y)

def _simple_calc(x, y, z):
    ...

def _advanced_calc(x, y, z, options):
    ...
```
---
## Testing Issues
### Missing Test Coverage
**File**: `payment.py:200`
```python
def process_refund(order_id, amount):
    # Critical business logic with no tests!
    order = Order.objects.get(id=order_id)
    order.refund(amount)
    send_notification(order.user, f"Refunded ${amount}")
```
**Issue**: Missing Tests for Critical Path
**Severity**: 🟡 Important
**Recommendation**: Add comprehensive tests:
```python
# tests/test_payment.py
def test_process_refund_success():
    order = create_test_order(amount=100)
    process_refund(order.id, 50)
    order.refresh_from_db()  # reload the updated amount from the database
    assert order.amount == 50
    assert_notification_sent(order.user)

def test_process_refund_exceeds_amount():
    order = create_test_order(amount=100)
    with pytest.raises(ValueError):
        process_refund(order.id, 150)

def test_process_refund_invalid_order():
    with pytest.raises(Order.DoesNotExist):
        process_refund(99999, 50)
```
---
## Information / Suggestions
### Opportunity for Optimization
**File**: `data_processor.py:78`
```python
results = []
for item in large_dataset:
    results.append(transform(item))
```
**Suggestion**: Use list comprehension or generator for better performance
```python
# List comprehension (if all results needed in memory)
results = [transform(item) for item in large_dataset]
# Generator (if processing one at a time)
results = (transform(item) for item in large_dataset)
```
---
### Modern Python Pattern
**File**: `file_handler.py:34`
```python
f = open('data.txt', 'r')
data = f.read()
f.close() # May not execute if read() raises exception
```
**Suggestion**: Use context manager
```python
with open('data.txt', 'r') as f:
    data = f.read()
# File automatically closed, even if exception occurs

# Or use pathlib
from pathlib import Path
data = Path('data.txt').read_text()
```
---
## Comment Format Guidelines
### Structure
```
**File**: path/to/file.py:line_number
[Code snippet if helpful]
**Issue**: Brief title
**Severity**: 🔴 Critical | 🟡 Important | 🔵 Minor | ⚪ Info
**Description**: Detailed explanation
**Impact/Why it matters**: Consequences
**Fix/Recommendation**: Concrete solution with code example
**Reference**: Links to docs, CVEs, etc. (if applicable)
```
### Severity Levels
- 🔴 **Critical**: Security vulnerabilities, data corruption, production failures
- 🟡 **Important**: Performance issues, type safety, architectural problems, missing tests
- 🔵 **Minor**: Code smells, complexity, dead code, minor bugs
- ⚪ **Info**: Suggestions, optimizations, style (only if blocking automation)
### Tone
- Be specific and actionable
- Explain the "why" not just the "what"
- Provide code examples
- Reference authoritative sources
- Acknowledge good code when present
- Be constructive, not critical

# Code Review Report: [Project Name]
**Date**: [YYYY-MM-DD]
**Reviewer**: Claude Code
**Scope**: [Brief description of what was reviewed]
---
## Executive Summary
**Overall Assessment**: [Excellent | Good | Fair | Needs Improvement | Critical Issues Found]
**Key Findings**:
- Critical Issues: [N]
- Important Issues: [N]
- Performance Concerns: [N]
- Security Vulnerabilities: [N]
**Recommendation**: [Summary recommendation - e.g., "Address critical security issues before deployment" or "Code is production-ready with minor improvements recommended"]
---
## Critical Issues
### 1. [Issue Title]
**Severity**: Critical
**Category**: [Security | Data Corruption | Production Failure]
**Location**: `file.py:123`
**Description**:
[Detailed description of the issue]
**Impact**:
[What could go wrong if not fixed]
**Recommendation**:
```python
# Before (vulnerable)
[problematic code]
# After (fixed)
[corrected code]
```
**References**:
- [CWE-XXX](link) or [OWASP](link) if applicable
---
### 2. [Next Critical Issue]
...
---
## Important Issues
### Performance Bottleneck: [Description]
**Location**: `file.py:456`
**Impact**: [e.g., "O(n²) complexity causes slowdown with large datasets"]
**Analysis**:
[Explanation of the performance issue]
**Recommendation**:
```python
# Current implementation (slow)
[current code]
# Optimized implementation
[improved code]
```
**Expected Improvement**: [e.g., "100x faster for 10,000 items"]
---
### Security Concern: [Description]
**Location**: `file.py:789`
**Severity**: Important
**Details**:
[Description of security concern]
**Fix**:
```python
[corrected code]
```
---
## Architecture and Design
### Concerns
1. **Tight Coupling**: [Description]
- Location: [files]
- Recommendation: [architectural improvement]
2. **Missing Abstractions**: [Description]
- Impact: [code duplication, hard to test, etc.]
- Recommendation: [refactoring suggestion]
### Positive Patterns
- [Well-implemented pattern 1]
- [Good design choice 2]
---
## Performance Analysis
### CPU Profiling Results
**Top Hotspots**:
1. `function_name()` in `file.py`: [X]ms cumulative ([Y]% of total)
2. [Next hotspot]
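Hotspot figures like those above can be produced with the stdlib `cProfile` and `pstats` modules; a minimal sketch (the profiled function is a placeholder, not part of the template):

```python
import cProfile
import io
import pstats

def slow_function():
    # Stand-in for a real hotspot
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Sort by cumulative time and keep the top 5 entries,
# mirroring the "Top Hotspots" list in the report
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```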
### Memory Usage
**Peak Memory**: [X] MB
**Concerns**:
- [Memory leak in function X]
- [Inefficient data structure in Y]
### Recommendations
1. [Specific performance improvement 1]
2. [Specific performance improvement 2]
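The peak-memory number reported above can be captured with the stdlib `tracemalloc` module; a minimal sketch (the allocation is illustrative):

```python
import tracemalloc

tracemalloc.start()
data = [bytes(1024) for _ in range(1000)]  # allocate roughly 1 MB
current, peak = tracemalloc.get_traced_memory()  # bytes currently held / peak
tracemalloc.stop()
print(f"Peak Memory: {peak / 1024 / 1024:.1f} MB")
```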
---
## Code Quality
### Complexity Analysis
**High Complexity Functions**:
- `function_name()` (file.py:123): Complexity 25 (Rank D)
- Recommendation: Refactor into smaller functions
### Dead Code
**Unused Code Found**:
- `unused_function()` in utils.py
- Variable `UNUSED_CONSTANT` in config.py
**Recommendation**: Remove to improve maintainability
---
## Testing
### Coverage Analysis
**Current Coverage**: [X]%
**Missing Critical Tests**:
1. Edge case: [description]
2. Error path: [description]
3. Integration test: [description]
### Test Quality Issues
- [Issue with existing tests]
- [Recommendation for improvement]
---
## Dependencies
### Vulnerable Dependencies
| Package | Current | Vulnerability | Fix |
|---------|---------|---------------|-----|
| package-name | 1.0.0 | CVE-XXXX-XXXX | Upgrade to 1.1.0 |
### Outdated Dependencies
- [List of significantly outdated packages]
---
## Minor Issues and Suggestions
### Style and Conventions
**Note**: These should be handled by automated tools (ruff, isort, basedpyright) in CI/CD.
- [Only list if blocking automated tool adoption]
### Documentation
- Missing docstrings: [list key functions]
- Unclear variable names: [examples]
---
## Positive Highlights
**Well-Implemented Features**:
1. [Good pattern or implementation 1]
2. [Good practice observed 2]
3. [Security measure properly implemented]
---
## Recommendations Priority Matrix
### Immediate (Before Deployment)
1. [ ] Fix SQL injection vulnerability (file.py:123)
2. [ ] Address race condition in payment processing (payment.py:456)
3. [ ] Fix memory leak in upload handler (upload.py:789)
### High Priority (This Sprint)
1. [ ] Optimize N+1 query in user list (views.py:234)
2. [ ] Add missing authentication check (api.py:567)
3. [ ] Implement error handling in critical path (processor.py:890)
### Medium Priority (Next Sprint)
1. [ ] Refactor high complexity functions
2. [ ] Add integration tests for payment flow
3. [ ] Update vulnerable dependencies
### Low Priority (Backlog)
1. [ ] Remove dead code
2. [ ] Improve documentation
3. [ ] Consider architectural refactoring for module X
---
## Automated Tool Results Summary
- **Ruff**: [N] issues found
- **Basedpyright**: [N] type errors
- **Bandit**: [N] security issues
- **Safety**: [N] vulnerable dependencies
- **Performance Profiler**: [Summary of findings]
**Detailed reports**: See `review_results/` directory
---
## Conclusion
[Overall assessment paragraph summarizing the review, key takeaways, and next steps]
**Approval Status**: [Approved | Approved with Conditions | Requires Changes | Blocked]
**Next Steps**:
1. [Action item 1]
2. [Action item 2]
3. [Action item 3]
---
**Review Conducted By**: Claude Code Python Review Skill
**Tools Used**: ruff, basedpyright, isort, bandit, safety, performance_profiler