266 lines
6.6 KiB
Markdown
266 lines
6.6 KiB
Markdown
---
|
|
title: Git History Analysis
|
|
description: Patterns for parsing git commits and PRs to infer changelog entries
|
|
tags: [git, commits, changelog, parsing]
|
|
---
|
|
|
|
# Git History Analysis
|
|
|
|
## Metadata
|
|
|
|
**Purpose**: Extract changelog entries from git history
|
|
**Version**: 1.0.0
|
|
|
|
---
|
|
|
|
## Git Log Commands
|
|
|
|
### Retrieve Commits in Range
|
|
|
|
**All commits since last tag**:
|
|
```bash
|
|
git log $(git describe --tags --abbrev=0)..HEAD --pretty=format:"%h|%s|%an|%ad" --date=short
|
|
```
|
|
|
|
**All commits since specific date**:
|
|
```bash
|
|
git log --since="2025-01-01" --pretty=format:"%h|%s|%an|%ad" --date=short
|
|
```
|
|
|
|
**All commits between two refs**:
|
|
```bash
|
|
git log v1.0.0..v2.0.0 --pretty=format:"%h|%s|%an|%ad" --date=short
|
|
```
|
|
|
|
**Format explained**:
|
|
- `%h` - Short commit hash
|
|
- `%s` - Subject/title
|
|
- `%an` - Author name
|
|
- `%ad` - Author date
|
|
- `|` - Delimiter for parsing
|
|
|
|
### Exclude Noise Commits
|
|
|
|
**Filter out merge commits**:
|
|
```bash
|
|
git log --no-merges
|
|
```
|
|
|
|
**Filter out version bump commits**:
|
|
```bash
|
|
git log --grep="^Bump version" --invert-grep
|
|
```
|
|
|
|
**Combined filters**:
|
|
```bash
|
|
git log --no-merges --grep="^Bump version\|^Release" --invert-grep
|
|
```
|
|
|
|
## Commit Message Parsing
|
|
|
|
### Conventional Commits
|
|
|
|
**Format**: `<type>(<scope>): <description>`
|
|
|
|
See "Changelog Section Mapping" table below for complete type-to-section mappings.
|
|
|
|
**Examples**:
|
|
```
|
|
feat(api): add customer export endpoint → Added
|
|
fix(auth): resolve timeout on login → Fixed
|
|
perf(db): optimize query performance → Changed
|
|
security(deps): update vulnerable packages → Security
|
|
```
|
|
|
|
**Parsing regex**: `^(feat|fix|docs|refactor|perf|test|chore|security)(\([^\)]+\))?: (.+)$`
|
|
|
|
### Non-Conventional Commits
|
|
|
|
**Keyword-based categorization**:
|
|
|
|
**Added indicators**:
|
|
- Starts with: "Add", "Implement", "Introduce", "Create"
|
|
- Contains: "new feature", "support for"
|
|
|
|
**Fixed indicators**:
|
|
- Starts with: "Fix", "Resolve", "Correct", "Patch"
|
|
- Contains: "bug", "issue", "error", "crash"
|
|
|
|
**Changed indicators**:
|
|
- Starts with: "Update", "Improve", "Enhance", "Optimize", "Refactor"
|
|
- Contains: "performance", "efficiency"
|
|
|
|
**Removed indicators**:
|
|
- Starts with: "Remove", "Delete", "Drop"
|
|
- Contains: "deprecated", "legacy"
|
|
|
|
**Security indicators**:
|
|
- Contains: "security", "vulnerability", "CVE-", "XSS", "injection"
|
|
|
|
**Example classification**:
|
|
```
|
|
"Add customer segmentation feature" → Added
|
|
"Fix memory leak in batch processing" → Fixed
|
|
"Update data validation logic" → Changed
|
|
"Remove deprecated API endpoints" → Removed
|
|
"Patch security vulnerability in auth" → Security
|
|
```
|
|
|
|
## PR Title Parsing
|
|
|
|
GitHub/GitLab PR titles often follow similar patterns:
|
|
|
|
**PR number extraction**:
|
|
```bash
|
|
gh pr list --state merged --limit 100 --json number,title,mergedAt
|
|
```
|
|
|
|
**Parse PR titles** using same rules as commit messages
|
|
|
|
**Prefer PR titles over commits** when available (cleaner, more user-facing)
|
|
|
|
## Extracting PR/Issue References
|
|
|
|
**From commit messages**:
|
|
- Pattern 1: `(#123)` at end of message
|
|
- Pattern 2: `#123` anywhere in message
|
|
- Pattern 3: `Merge pull request #123`
|
|
- Pattern 4: `Fixes #123`, `Closes #123`, `Resolves #123`
|
|
|
|
**From PR list**:
|
|
```bash
|
|
# Get PR number associated with commit
|
|
gh pr list --state merged --search "sha:COMMIT_HASH" --json number
|
|
```
|
|
|
|
**Link format in changelog**:
|
|
- Short form: `(#123)` for PR/issue number
|
|
- Full entries: `- Feature description (#123)`
|
|
- Multi-ref: `- Feature description (#123, #124)`
|
|
|
|
## Filtering Strategy
|
|
|
|
### Skip These Commits
|
|
|
|
**Merge commits**:
|
|
- Pattern: `^Merge (branch|pull request)`
|
|
|
|
**Version bumps**:
|
|
- Pattern: `^(Bump|Release|Version) (version )?[0-9]`
|
|
|
|
**Internal housekeeping**:
|
|
- Pattern: `^(chore|ci|test|docs):`
|
|
- Exception: Major docs overhauls might be "Added"
|
|
|
|
**WIP commits**:
|
|
- Pattern: `^WIP:`, `^temp:`, `^tmp:`
|
|
|
|
### Grouping and Deduplication
|
|
|
|
**Group by scope** (if using conventional commits):
|
|
```
|
|
feat(api): add export endpoint
|
|
feat(api): add import endpoint
|
|
→ Group under "API" subsection in Added
|
|
```
|
|
|
|
**Deduplicate similar commits**:
|
|
```
|
|
fix: typo in validation message
|
|
fix: another typo in error message
|
|
→ Combine as "Fixed typos in validation messages"
|
|
```
|
|
|
|
**Squash implementation details**:
|
|
```
|
|
feat: add customer export (part 1)
|
|
feat: add customer export (part 2)
|
|
feat: finalize customer export
|
|
→ Single entry: "Customer export feature"
|
|
```
|
|
|
|
## Changelog Section Mapping
|
|
|
|
**Type → Section mapping**:
|
|
|
|
| Commit Type | Changelog Section | Notes |
|
|
|-------------|-------------------|-------|
|
|
| `feat` | Added | New features |
|
|
| `fix` | Fixed | Bug fixes |
|
|
| `perf` | Changed | Performance improvements |
|
|
| `refactor` | Changed | Only if user-facing |
|
|
| `security` | Security | Security patches |
|
|
| `deprecate` | Deprecated | Features marked for removal |
|
|
| `remove` | Removed | Deleted features |
|
|
| `docs` | (Skip) | Unless major docs addition |
|
|
| `test` | (Skip) | Internal only |
|
|
| `chore` | (Skip) | Internal only |
|
|
| `ci` | (Skip) | Internal only |
|
|
|
|
## Extracting User-Facing Changes
|
|
|
|
**Focus on impact, not implementation**:
|
|
|
|
**Good**: "Added support for multi-year trend analysis"
|
|
**Bad**: "Refactored TrendAnalyzer class to support configurable date ranges"
|
|
|
|
**Transformation examples**:
|
|
|
|
```
|
|
Git: "refactor: extract validation logic to separate module"
|
|
→ Skip (internal refactoring)
|
|
|
|
Git: "feat: implement caching layer for API responses"
|
|
→ Added: "API response caching for improved performance"
|
|
|
|
Git: "fix: handle edge case in date parsing"
|
|
→ Fixed: "Corrected date parsing for edge cases"
|
|
|
|
Git: "perf(db): add indexes to customer table"
|
|
→ Changed: "Improved database query performance"
|
|
```
|
|
|
|
## Git Commands Summary
|
|
|
|
**Get commits since last version**:
|
|
```bash
|
|
git log $(git describe --tags --abbrev=0)..HEAD --no-merges --pretty=format:"%h|%s"
|
|
```
|
|
|
|
**Get all tags** (for version detection):
|
|
```bash
|
|
git tag --list --sort=-version:refname
|
|
```
|
|
|
|
**Get commit count** (for version increment hint):
|
|
```bash
|
|
git rev-list --count $(git describe --tags --abbrev=0)..HEAD
|
|
```
|
|
|
|
**Check if commit is a merge**:
|
|
```bash
|
|
git rev-list --parents -n 1 <commit-hash> | grep -q " .* .*"
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Prefer PR titles** over individual commits when available
|
|
2. **Group related changes** by feature or component
|
|
3. **Use user-facing language**, avoid technical jargon
|
|
4. **Skip internal commits** that don't affect users
|
|
5. **Combine granular commits** into cohesive changelog entries
|
|
6. **Link to issues/PRs** for detailed context
|
|
7. **Preserve intent**, even if wording changes
|
|
|
|
## Example Workflow
|
|
|
|
1. Get commits: `git log v1.0.0..HEAD --no-merges`
|
|
2. Parse each commit message
|
|
3. Classify by type (feat/fix/etc.)
|
|
4. Filter out internal commits
|
|
5. Group by scope or feature
|
|
6. Rephrase in user-facing language
|
|
7. Map to changelog sections
|
|
8. Deduplicate and clean up
|
|
9. Format as changelog entries
|