Initial commit

Zhongwei Li
2025-11-29 18:01:57 +08:00
commit 2e6b99fcf7
16 changed files with 4443 additions and 0 deletions


@@ -0,0 +1,317 @@
---
title: Commit Message Standards
description: Standard commit message format and quality expectations
tags: [git, commits, standards, conventions, conventional-commits]
---
# Commit Message Standards
## Metadata
**Purpose**: Define the standard commit message format and quality expectations
**Applies to**: All Git commits in projects
**Version**: 1.0.0
---
## Instructions
### Conventional Commits Format
All commits must follow the **Conventional Commits** specification:
```
<type>(<scope>): <subject>

[optional body]

[optional footer(s)]
```
### Required Elements
**Type** (required): Indicates the kind of change
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `refactor`: Code refactoring (no functional changes)
- `test`: Adding or updating tests
- `chore`: Maintenance tasks (dependencies, tooling)
- `perf`: Performance improvements
- `style`: Code style changes (formatting, whitespace)
- `ci`: CI/CD configuration changes
**Scope** (optional but recommended): Component or module affected
- Examples: `auth`, `api`, `ui`, `database`, `config`
- Keep lowercase and concise
**Subject** (required): Clear, concise description
- Use imperative mood ("add" not "added" or "adds")
- Start with lowercase letter
- No period at the end
- Aim for 50 characters or fewer (see Character Limits for the 72-character hard limit)
- Be specific about what changed
### Quality Criteria
**✅ Good commit messages**:
- Clearly communicate intent and context
- Answer: "What does this commit do and why?"
- Use specific, descriptive language
- Focus on the "why" not just the "what"
**❌ Prohibited patterns**:
- Vague messages: "fix bug", "update code", "changes"
- WIP commits: "wip", "work in progress", "temp"
- Generic messages: "fixes", "updates", "misc"
- All caps: "FIX BUG IN LOGIN"
- Emoji without clear text (text is primary, emoji optional)
### Character Limits
- **Subject line**: ≤50 characters (ideal), ≤72 characters (hard limit)
- **Body lines**: Wrap at 72 characters
- **Blank line**: Always separate subject from body
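The format and length rules above can be checked mechanically. The following is a minimal sketch, not a definitive linter: the function name `check_commit_message` and the wording of the messages are hypothetical, `revert` is added to the type list only because the Edge Cases section uses a `revert:` prefix, and the 72-character limit is applied to the whole first line.

```python
import re

# Types from "Required Elements"; `revert` is an assumption taken from the
# Edge Cases section, which uses a `revert:` prefix.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "refactor", "test",
    "chore", "perf", "style", "ci", "revert",
)

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"            # type
    r"(?:\((?P<scope>[^()]+)\))?"   # optional (scope)
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<subject>.+)$"           # colon, space, subject
)


def check_commit_message(message: str) -> list[str]:
    """Return a list of rule violations; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines:
        return ["message is empty"]
    header = lines[0]
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match <type>(<scope>): <subject>"]

    problems = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match['type']}")
    scope = match["scope"]
    if scope is not None and scope != scope.lower():
        problems.append("scope should be lowercase")
    subject = match["subject"]
    if subject[0].isupper():
        problems.append("subject should start with a lowercase letter")
    if subject.endswith("."):
        problems.append("subject should not end with a period")
    if len(header) > 72:
        problems.append("subject line exceeds the 72-character hard limit")
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject and body must be separated by a blank line")
    # Body starts on line 3 (line 2 is the blank separator).
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"body line {number} is longer than 72 characters")
    return problems
```

A message such as `fix bug` fails at the header check, while `feat(auth): add two-factor authentication support` returns an empty list. Qualities like "focus on the why" still need a human reviewer.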
### Issue Linking Policy
**When issue references are required**:
- `feat` - New features (REQUIRED)
- `fix` - Bug fixes (REQUIRED)
- Breaking changes (REQUIRED)
- `refactor` - Large refactorings (RECOMMENDED)
- `perf` - Performance improvements (RECOMMENDED)
**When issue references are optional**:
- `docs` - Documentation-only changes
- `chore` - Dependency updates, tooling
- `test` - Test-only changes
- `style` - Formatting-only changes
**Format in footer**:
- `Closes #123` - Closes an issue (for features, completed work)
- `Fixes #456` - Fixes a bug (for bug fixes)
- `Relates to #789` - Related but doesn't close (for partial work)
- `Refs #101, #102` - References multiple issues
**If no issue exists**:
- Create one first for features/bugs
- OR provide detailed context in commit body explaining why no issue
**Examples**:
```
feat(api): add customer export endpoint

Closes #234
```
```
fix(auth): resolve session timeout on mobile

Multiple users reported session timeouts after 5 minutes of inactivity
on mobile devices. Root cause was aggressive token expiration settings.

Fixes #456
```
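The linking policy can be expressed as a small check. This is a sketch under stated assumptions: `issue_reference_status` and its return strings are hypothetical, every `refactor` commit is treated as "recommended" because commit size is not visible from the message, and the "detailed context in the body instead of an issue" escape hatch is not evaluated.

```python
import re

REQUIRED_TYPES = {"feat", "fix"}
RECOMMENDED_TYPES = {"refactor", "perf"}

# Footer formats listed in the policy: Closes, Fixes, Relates to, Refs.
ISSUE_REF_RE = re.compile(
    r"^(?:Closes|Fixes|Relates to|Refs) #\d+(?:, #\d+)*$", re.MULTILINE
)
TYPE_RE = re.compile(r"^([a-z]+)(?:\([^()]+\))?(!)?:")


def issue_reference_status(message: str) -> str:
    """Return 'ok', 'missing-required', or 'missing-recommended'."""
    header = message.splitlines()[0] if message.strip() else ""
    type_match = TYPE_RE.match(header)
    commit_type = type_match.group(1) if type_match else ""
    # Breaking changes are marked with `!` or a BREAKING CHANGE: footer.
    breaking = bool(type_match and type_match.group(2)) or "BREAKING CHANGE:" in message

    if ISSUE_REF_RE.search(message):
        return "ok"
    if commit_type in REQUIRED_TYPES or breaking:
        return "missing-required"
    if commit_type in RECOMMENDED_TYPES:
        return "missing-recommended"
    return "ok"
```

For example, `feat(api): add customer export endpoint` without a footer yields `missing-required`, and the same message followed by a blank line and `Closes #234` yields `ok`.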
### Examples
**Good examples**:
```
feat(auth): add two-factor authentication support

Implements TOTP-based 2FA for user accounts to enhance security.
Users can enable 2FA in account settings and must verify with
authenticator app on subsequent logins.

Closes #123
```
```
fix(api): resolve race condition in order processing

Multiple concurrent requests were creating duplicate orders due to
non-atomic transaction handling. Added database-level locking to
ensure order creation is properly serialized.

Fixes #456
```
```
docs(readme): update installation instructions for Python 3.11
```
**Bad examples**:
```
❌ "fixed stuff"
Problem: Too vague, no context
❌ "update code"
Problem: Not specific, doesn't explain what or why
❌ "WIP: working on feature"
Problem: WIP commits should be rebased before merging
❌ "Added new feature and fixed several bugs and updated docs"
Problem: Multiple unrelated changes, should be separate commits
```
---
## Resources
### Detailed Type Guidelines
**feat** - New features or capabilities
- Adding a new API endpoint
- Implementing a new user-facing feature
- Adding support for a new data format
- Example: `feat(api): add bulk import endpoint for customer data`
**fix** - Bug fixes
- Resolving incorrect behavior
- Fixing crashes or errors
- Correcting data issues
- Example: `fix(validation): prevent negative values in quantity field`
**docs** - Documentation only
- README updates
- Code comments
- API documentation
- User guides
- Example: `docs(api): add examples for authentication endpoints`
**refactor** - Code restructuring
- Extracting functions or classes
- Renaming for clarity
- Reorganizing file structure
- No behavior changes
- Example: `refactor(services): extract common validation logic to utilities`
**test** - Test additions or updates
- Adding missing test coverage
- Updating tests for refactored code
- Fixing broken tests
- Example: `test(auth): add integration tests for 2FA flow`
**chore** - Maintenance tasks
- Dependency updates
- Build configuration
- Tooling changes
- Example: `chore(deps): update pandas to 2.0.0`
**perf** - Performance improvements
- Optimization changes
- Reducing memory usage
- Improving query performance
- Example: `perf(database): add index on user_id for faster lookups`
### Edge Cases
**Breaking changes**: Use `!` after type or `BREAKING CHANGE:` in footer
```
feat(api)!: change authentication endpoint to use OAuth2

BREAKING CHANGE: The /auth endpoint now requires OAuth2 tokens
instead of API keys. Update all clients to use the new flow.
```
**Merge commits**: Auto-generated, acceptable as-is
```
Merge pull request #123 from feature/new-auth
```
**Reverts**: Use `revert:` prefix
```
revert: feat(auth): add two-factor authentication support

This reverts commit abc123def456 due to production issues.
```
**Co-authors**: Add in footer
```
feat(analytics): add user behavior tracking

Co-authored-by: Jane Doe <jane@example.com>
Co-authored-by: John Smith <john@example.com>
```
### Integration with Tooling
**Git hooks**: Use a `commit-msg` hook to validate the format
```bash
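#!/bin/sh
# Save as .git/hooks/commit-msg and make it executable (chmod +x).
# Only the first line (the header) is validated.
header=$(head -n 1 "$1")
pattern="^(feat|fix|docs|refactor|test|chore|perf|style|ci|revert)(\(.+\))?!?: .+"
# Let auto-generated merge commits through unchanged.
case "$header" in
  Merge\ *) exit 0 ;;
esac
if ! printf '%s\n' "$header" | grep -qE "$pattern"; then
  echo "Error: Commit message must follow Conventional Commits format"
  echo "Format: <type>(<scope>): <subject>"
  exit 1
fi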
```
**Commit templates**: Configure Git to use a template
```bash
git config commit.template .gitmessage
```
`.gitmessage`:
```
# <type>(<scope>): <subject> (max 50 chars)
# |<---- Using a Maximum Of 50 Characters ---->|

# Explain why this change is being made
# |<---- Try To Limit Each Line to a Maximum Of 72 Characters ---->|

# Provide links or keys to any relevant tickets, articles or resources
# Example: Closes #23
# --- COMMIT END ---
# Type can be:
# feat (new feature)
# fix (bug fix)
# refactor (code change without feature/fix)
# docs (documentation changes)
# test (adding/updating tests)
# chore (maintenance)
# perf (performance improvement)
# style (formatting, missing semicolons)
# ci (CI/CD changes)
# --------------------
# Remember:
# - Use imperative mood ("add" not "added")
# - Subject starts with lowercase
# - No period at end of subject
# - Separate subject from body with blank line
# - Wrap body at 72 characters
# --------------------
```
### Personal Patterns
**Multi-repo changes**: Reference related commits
```
feat(shared): update data models for new customer schema

Related changes:
- api-service: abc123
- web-app: def456
```
**Research experiments**: Use `experiment` scope
```
feat(experiment): implement XGBoost model variant for A/B test
```
**Data pipeline changes**: Be specific about data impact
```
fix(pipeline): correct date parsing in customer import job

Previous logic incorrectly parsed ISO dates with timezone offsets,
causing records from 11pm-midnight UTC to be assigned wrong dates.
Affects ~500 records from 2024-01-15 to 2024-01-20.
```


@@ -0,0 +1,237 @@
---
title: GitHub Issue Analysis
description: Scoring rubrics and clustering methodology for analyzing GitHub issues
tags: [github, issues, scoring, clustering, analysis]
---
# GitHub Issue Analysis
## Metadata
**Purpose**: Analyze GitHub issues with multi-dimensional scoring and clustering
**Applies to**: Repository investigation and issue backlog analysis
**Version**: 1.0.0
---
## Instructions
### Issue Scoring Dimensions
Score each issue on a scale of 1-5 (low to high) across four dimensions:
#### 1. Risk Score (1-5)
Assess the potential negative impact if the issue is NOT addressed:
- **5 - Critical Risk**: Security vulnerability, data loss, production outage, compliance violation
- **4 - High Risk**: Major feature broken, significant performance degradation, user-blocking bug
- **3 - Medium Risk**: Moderate bug affecting some users, technical debt that will compound
- **2 - Low Risk**: Minor bug with workarounds, cosmetic issue, edge case problem
- **1 - Minimal Risk**: Nice-to-have improvement, suggestion, question
**Detection patterns**:
- Keywords: `security`, `vulnerability`, `crash`, `data loss`, `broken`, `urgent`, `blocker`
- Labels: `security`, `critical`, `bug`, `p0`, `blocker`
- Issue state: Open for >90 days with high comment activity
#### 2. Value Score (1-5)
Assess the positive impact if the issue IS addressed:
- **5 - Transformative**: Major feature enabling new use cases, significant UX improvement
- **4 - High Value**: Important enhancement improving core workflows, performance boost
- **3 - Medium Value**: Useful improvement benefiting many users, code quality enhancement
- **2 - Low Value**: Minor convenience, small optimization, documentation improvement
- **1 - Minimal Value**: Cosmetic change, personal preference, duplicate effort
**Detection patterns**:
- Keywords: `feature`, `enhancement`, `improve`, `optimize`, `speed up`, `user request`
- Labels: `enhancement`, `feature`, `performance`, `ux`
- Community engagement: High thumbs-up reactions, many comments requesting feature
#### 3. Ease Score (1-5)
Assess the effort required to implement (inverse scale - 5 is easiest):
- **5 - Trivial**: Documentation update, config change, simple one-liner fix
- **4 - Easy**: Small bug fix, minor refactor, straightforward enhancement (<1 day)
- **3 - Moderate**: Feature addition requiring new code, moderate refactor (1-3 days)
- **2 - Challenging**: Complex feature, architectural change, multiple dependencies (1-2 weeks)
- **1 - Very Hard**: Major refactor, breaking changes, requires research/prototyping (>2 weeks)
**Detection patterns**:
- Keywords: `easy`, `good first issue`, `help wanted`, `typo`, `documentation`
- Labels: `good-first-issue`, `easy`, `documentation`, `beginner-friendly`
- Code references: Issues with specific file/line references are often easier
#### 4. DocQuality Score (1-5)
Assess how well the issue is documented:
- **5 - Excellent**: Clear description, reproduction steps, expected behavior, context, examples
- **4 - Good**: Clear problem statement, some context, actionable
- **3 - Adequate**: Basic description, understandable but missing details
- **2 - Poor**: Vague description, missing context, requires clarification
- **1 - Very Poor**: Unclear, no details, cannot action without extensive follow-up
**Detection patterns**:
- Length: Longer issues (>500 chars) often have better documentation
- Structure: Issues with headers, code blocks, numbered steps
- Completeness: Includes version info, environment details, error messages
### Issue Clustering and Themes
Group related issues by detecting common themes:
#### Theme Detection Approach
1. **Keyword extraction**: Extract significant terms from title and body
2. **Label analysis**: Group by common labels
3. **Component detection**: Identify affected components/modules
4. **Similarity scoring**: Find issues with overlapping keywords
#### Common Theme Categories
- **Bug Clusters**: Similar error messages, same component, related failures
- **Feature Requests**: Related enhancements, common user needs
- **Performance Issues**: Speed, memory, scalability concerns
- **Documentation**: Missing docs, unclear guides, examples needed
- **Dependencies**: Third-party library issues, version conflicts
- **DevOps/Infrastructure**: CI/CD, deployment, build issues
- **Security**: Vulnerabilities, auth, permissions
#### Clustering Algorithm
```
For each issue:
  1. Extract keywords from title and body (remove stopwords)
  2. Extract labels
  3. Count keyword overlaps with other issues
  4. If overlap ≥ threshold (e.g., 3+ keywords), mark as related
Then:
  5. Group related issues into clusters
  6. Name each cluster by its most common keywords
```
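The clustering steps can be sketched in plain Python. This is an illustrative version under assumptions: it works on titles only, ignores labels, uses a tiny hypothetical stopword list, and merges related issues transitively with a union-find structure (so A-B and B-C land in one cluster even if A and C share few keywords).

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "when", "after", "not"}


def keywords(text: str) -> set[str]:
    """Lowercase word tokens longer than two characters, minus stopwords."""
    return {
        word for word in re.findall(r"[a-z0-9]+", text.lower())
        if len(word) > 2 and word not in STOPWORDS
    }


def cluster_issues(issues: dict[int, str], threshold: int = 3) -> list[dict]:
    """Group issues that share at least `threshold` keywords."""
    kw = {number: keywords(text) for number, text in issues.items()}
    parent = {number: number for number in issues}

    def find(node: int) -> int:
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    numbers = sorted(issues)
    for index, first in enumerate(numbers):
        for second in numbers[index + 1:]:
            if len(kw[first] & kw[second]) >= threshold:
                parent[find(second)] = find(first)

    groups: dict[int, list[int]] = {}
    for number in numbers:
        groups.setdefault(find(number), []).append(number)

    clusters = []
    for members in groups.values():
        if len(members) < 2:
            continue  # a single issue is not a cluster
        counts = Counter(word for number in members for word in kw[number])
        # Most common keywords first; ties broken alphabetically for stable names.
        top = sorted(counts, key=lambda word: (-counts[word], word))[:3]
        clusters.append({"name": " ".join(top), "issues": members})
    return clusters
```

Pairwise comparison is quadratic in the number of issues, which is usually acceptable for a backlog of a few hundred to a thousand issues.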
### Staleness Detection
**Definition**: Issues with no activity for ≥N days (default: 60)
**Activity indicators**:
- New comments
- Label changes
- Status updates
- Commits referencing the issue
**Staleness categories**:
- **Fresh**: Fewer than 30 days since last activity
- **Aging**: 30-59 days since last activity
- **Stale**: 60-89 days since last activity
- **Very Stale**: 90+ days since last activity
**Action recommendations**:
- **Very Stale**: Consider closing with explanation or ping assignees
- **Stale**: Review priority, add to backlog grooming
- **Aging**: Monitor for staleness
- **Fresh**: Active work, no action needed
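As a small sketch of the staleness categories, assuming the boundaries are half-open (exactly 30 days counts as Aging, 60 as Stale, 90 as Very Stale) and that the caller supplies timezone-aware timestamps; `staleness_category` is a hypothetical name:

```python
from datetime import datetime, timezone


def staleness_category(last_activity: datetime, now: datetime) -> str:
    """Map whole days since the last activity to a staleness category."""
    days = (now - last_activity).days
    if days < 30:
        return "Fresh"
    if days < 60:
        return "Aging"
    if days < 90:
        return "Stale"
    return "Very Stale"


# Usage: an issue last touched on 2024-01-05, evaluated on 2024-04-01.
reference = datetime(2024, 4, 1, tzinfo=timezone.utc)
category = staleness_category(datetime(2024, 1, 5, tzinfo=timezone.utc), reference)
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the classification reproducible in reports and tests.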
### Issue Bucketing
Categorize issues into actionable buckets based on scores:
1. **High Value**: Value ≥ 4, Risk ≥ 3 → Priority work
2. **High Risk**: Risk ≥ 4 → Address urgently regardless of value
3. **Newcomer Friendly**: Ease ≥ 4, DocQuality ≥ 3 → Good first issues
4. **Automation Candidate**: Ease ≥ 4 AND at least 2 similar issues detected by clustering → Consider scripting/automation
5. **Needs Context**: DocQuality ≤ 2 → Request more information
6. **Stale**: No activity ≥ stale_days → Consider closing/archiving
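The bucketing rules translate directly into a function. In this sketch the `IssueScores` dataclass and its field names are hypothetical, and an issue is allowed to land in several buckets at once, which the rules above do not rule out.

```python
from dataclasses import dataclass


@dataclass
class IssueScores:
    risk: int
    value: int
    ease: int
    doc_quality: int
    similar_issues: int = 0   # number of related issues found by clustering
    days_inactive: int = 0    # days since last activity


def buckets(scores: IssueScores, stale_days: int = 60) -> list[str]:
    """Return every bucket whose rule matches, in the order listed above."""
    result = []
    if scores.value >= 4 and scores.risk >= 3:
        result.append("High Value")
    if scores.risk >= 4:
        result.append("High Risk")
    if scores.ease >= 4 and scores.doc_quality >= 3:
        result.append("Newcomer Friendly")
    if scores.ease >= 4 and scores.similar_issues >= 2:
        result.append("Automation Candidate")
    if scores.doc_quality <= 2:
        result.append("Needs Context")
    if scores.days_inactive >= stale_days:
        result.append("Stale")
    return result
```

A critical security issue scored Risk 5, Value 5, Ease 3, DocQuality 5 matches both High Value and High Risk; a report would typically show it under the more urgent of the two.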
---
## Resources
### Scoring Examples
#### Example 1: Security Vulnerability
```
Title: "SQL injection in login endpoint"
Risk: 5 (critical security issue)
Value: 5 (must fix)
Ease: 3 (moderate fix, requires testing)
DocQuality: 5 (clear reproduction steps, example payload)
Bucket: High Risk
```
#### Example 2: Documentation Typo
```
Title: "Fix typo in README.md - 'installtion' -> 'installation'"
Risk: 1 (minimal impact)
Value: 1 (minor improvement)
Ease: 5 (trivial one-character change)
DocQuality: 5 (exact location specified)
Bucket: Newcomer Friendly
```
#### Example 3: Feature Request
```
Title: "Add dark mode support"
Risk: 1 (no negative impact if not done)
Value: 4 (highly requested UX improvement)
Ease: 2 (significant UI/CSS work)
DocQuality: 3 (clear request but missing design specs)
Bucket: High Value (by Value score; the bucketing rule also asks for Risk ≥ 3)
```
#### Example 4: Vague Bug Report
```
Title: "App doesn't work"
Risk: ? (unclear)
Value: ? (unclear)
Ease: ? (cannot assess)
DocQuality: 1 (no details whatsoever)
Bucket: Needs Context
```
### CLI Integration
The analysis uses GitHub CLI (`gh`) for read-only access:
```bash
# Fetch all open issues
gh issue list --state open --limit 1000 --json number,title,body,labels,createdAt,updatedAt,comments
# Fetch issues with specific labels
gh issue list --label bug --state open --json number,title,body,labels,createdAt,updatedAt
# Fetch issues since specific date
gh issue list --state all --search "updated:>2024-01-01"
```
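The JSON printed by `gh issue list --json ...` can be post-processed without extra dependencies. A minimal sketch, assuming the output contains at least the `number` and `updatedAt` fields and that timestamps are ISO 8601 strings ending in `Z`; `days_since_update` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone


def days_since_update(issues_json: str, now: datetime) -> dict[int, int]:
    """Map issue number to whole days since its `updatedAt` timestamp."""
    result = {}
    for issue in json.loads(issues_json):
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        updated = datetime.fromisoformat(issue["updatedAt"].replace("Z", "+00:00"))
        result[issue["number"]] = (now - updated).days
    return result


sample = '[{"number": 15, "updatedAt": "2024-01-05T12:00:00Z"}]'
ages = days_since_update(sample, datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc))
```

In practice the JSON text would come from running the `gh` command (for example through `subprocess.run` with `capture_output=True`) instead of the inline sample.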
### Output Format
**Console Output**: Structured dashboard with tables and buckets
**Markdown Output**: Detailed report with scores, themes, and recommendations
Example markdown structure:
```markdown
# GitHub Issue Analysis
## Summary
- Total issues analyzed: 47
- Stale issues: 12
- High risk issues: 3
- High value issues: 8
## Issue Buckets
### High Risk (3 issues)
- #42: SQL injection vulnerability [Risk:5, Value:5, Ease:3]
- #38: Data corruption in export [Risk:5, Value:4, Ease:2]
### High Value (8 issues)
...
## Themes
1. **Performance** (5 issues): #12, #23, #34, #45, #56
2. **Documentation** (8 issues): ...
## Stale Issues
- #15: Last activity 87 days ago
- #22: Last activity 92 days ago
```


@@ -0,0 +1,368 @@
---
title: GitHub Workflow Patterns
description: Branching strategies, merge practices, and GitHub collaboration patterns
tags: [github, workflow, branching, merging, collaboration, best-practices]
---
# GitHub Workflow Patterns
## Metadata
**Purpose**: Best practices for branching, merging, releases, and GitHub features
**Applies to**: All projects using GitHub
**Version**: 1.0.0
---
## Instructions
### Branch Naming Conventions
**Standard format**:
```
<type>/<short-description>
```
**Types**:
- `feature/` - New features
- `fix/` or `bugfix/` - Bug fixes
- `hotfix/` - Urgent production fixes
- `refactor/` - Code refactoring
- `docs/` - Documentation updates
- `experiment/` - Research experiments
- `chore/` - Maintenance tasks
**Examples**:
- ✅ `feature/add-customer-segmentation`
- ✅ `fix/resolve-auth-timeout`
- ✅ `hotfix/patch-security-vulnerability`
- ✅ `refactor/extract-validation-logic`
- ❌ `john-dev` (not descriptive)
- ❌ `temp-branch` (not meaningful)
- ❌ `FEATURE-123` (all caps, no description)
**Guidelines**:
- Use lowercase with hyphens
- Be descriptive but concise
- Include the ticket number if applicable, keeping the tracker's casing for the ID: `feature/PROJ-123-add-export`
- Avoid personal names or dates
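The naming convention can be enforced with a regular expression. A sketch, assuming that an optional uppercase ticket ID (such as `PROJ-123`) may precede an otherwise lowercase, hyphen-separated description; `is_valid_branch_name` is a hypothetical helper, and it cannot detect personal names or dates.

```python
import re

BRANCH_TYPES = (
    "feature", "fix", "bugfix", "hotfix",
    "refactor", "docs", "experiment", "chore",
)

BRANCH_RE = re.compile(
    r"^(?P<type>" + "|".join(BRANCH_TYPES) + r")/"
    r"(?:(?P<ticket>[A-Z][A-Z0-9]*-\d+)-)?"           # optional ticket ID, e.g. PROJ-123
    r"(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"     # lowercase words joined by hyphens
)


def is_valid_branch_name(name: str) -> bool:
    """Check a branch name against the <type>/<short-description> format."""
    return BRANCH_RE.match(name) is not None
```

Only the types listed above are accepted, so a branch such as `release/v1.2.0` would need `release` added to `BRANCH_TYPES` and a looser description pattern.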
### Merge Strategies
**Squash Merge** (recommended for most PRs)
- Combines all commits into single commit
- Clean, linear history
- Use when: Feature branch has messy commit history
```bash
# GitHub: "Squash and merge" button
# CLI:
git merge --squash feature/my-feature
git commit -m "feat: add new feature"
```
**Merge Commit** (preserve commit history)
- Keeps all individual commits
- Shows full development history
- Use when: Commits are clean and tell a story
```bash
# GitHub: "Create a merge commit" button
# CLI:
git merge --no-ff feature/my-feature
```
**Rebase** (update feature branch)
- Replay commits on top of target branch
- Linear history, no merge commits
- Use when: Updating feature branch with main
```bash
# Update feature branch with main
git checkout feature/my-feature
git rebase main
```
**Default**: Squash merge for cleaner history
### Pre-Commit Hook Requirements
**All projects should use pre-commit hooks for**:
1. **Code quality**:
- Linting (ruff, eslint, etc.)
- Formatting (black, prettier, etc.)
- Type checking (mypy, typescript)
2. **Security**:
- Secret detection (detect-secrets)
- Dependency scanning
3. **Git hygiene**:
- Commit message format validation
- Prevent commits to main/master
- Check for debugging code
**Setup**:
```bash
# Install pre-commit
pip install pre-commit
# Install hooks
pre-commit install
# Create an example .pre-commit-config.yaml (overwrites an existing file)
cat > .pre-commit-config.yaml <<'EOF'
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
      - id: check-merge-conflict
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff
EOF
```
### Draft PRs
**Use draft PRs for**:
- Work in progress
- Seeking early architectural feedback
- CI/CD pipeline testing
- Demonstrating approach
**Workflow**:
```
1. Create PR as draft
2. Add [WIP] or [Draft] prefix to title (optional)
3. Request feedback in comments
4. Mark "Ready for review" when complete
```
**Draft PR checklist before marking ready**:
- [ ] All tests passing
- [ ] Code complete (no TODOs for this PR)
- [ ] Description updated
- [ ] Self-review completed
### GitHub Actions Integration Points
**Common automation use cases**:
1. **PR validation**:
- Run tests on every PR
- Check code coverage
- Lint and format checks
- Security scans
2. **Branch protection**:
- Require status checks to pass
- Require reviews before merge
- Prevent force pushes to main
- Require signed commits
3. **Automated workflows**:
- Auto-label PRs based on files changed
- Auto-assign reviewers
- Post coverage reports
- Deploy to staging environments
**Example workflow** (`.github/workflows/pr-checks.yml`):
```yaml
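name: PR Checks
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install test dependencies
        # Add the project's own dependencies here as well
        run: pip install pytest pytest-cov
      - name: Run tests
        run: pytest
      - name: Check coverage
        run: pytest --cov --cov-fail-under=80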
```
### Protected Branches
**Main branch should be protected with**:
- Require PR before merging
- Require status checks to pass
- Require review from code owners
- Dismiss stale reviews on new commits
- Require linear history (optional)
- Prevent force pushes
- Prevent deletions
---
## Resources
### Complete Workflow Example
**Feature Development Workflow**:
```bash
# 1. Create and switch to feature branch
git checkout main && git pull origin main
git checkout -b feature/add-export-api
# 2. Make changes and commit
git add src/api/export.py
git commit -m "feat(api): add customer data export endpoint"
# 3. Push and create PR
git push -u origin feature/add-export-api
gh pr create --title "feat: add customer data export API" \
--body "Adds endpoint for exporting customer data as CSV/JSON"
# 4. Address review feedback, merge via GitHub UI (squash merge)
# 5. Clean up local branch
git checkout main && git pull origin main
git branch -d feature/add-export-api
```
**For hotfixes**: Use `hotfix/` branch type, label as `security,hotfix`, request expedited review
**For rebasing**: `git fetch origin && git rebase origin/main && git push --force-with-lease`
### Conflict Resolution
```bash
git status # Identify conflicted files
# Resolve conflicts in editor (look for <<<<<<<, =======, >>>>>>>)
git add <resolved-files>
git rebase --continue # or git merge --continue
git push --force-with-lease
```
**Prevention**: Keep PRs small, merge main frequently, use feature flags
### Release Workflows
**SemVer** (`vMAJOR.MINOR.PATCH`): For libraries/APIs - MAJOR=breaking, MINOR=features, PATCH=fixes
**CalVer** (`vYYYY.MM.MICRO`): For applications/pipelines with time-based releases
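To illustrate the SemVer rule (MAJOR=breaking, MINOR=features, PATCH=fixes), the sketch below derives the next version from commit headers. Tying the bump to Conventional Commits types is an assumption for illustration; `next_version` is a hypothetical helper that looks only at headers, so a `BREAKING CHANGE:` footer without `!` would be missed.

```python
import re

HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()]+\))?(?P<breaking>!)?: ")


def next_version(current: str, commit_headers: list[str]) -> str:
    """Return the next vMAJOR.MINOR.PATCH given the commits since `current`."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    level = 0  # 0 = no release needed, 1 = patch, 2 = minor, 3 = major
    for header in commit_headers:
        match = HEADER_RE.match(header)
        if match is None:
            continue
        if match["breaking"]:
            level = 3
        elif match["type"] == "feat":
            level = max(level, 2)
        elif match["type"] == "fix":
            level = max(level, 1)
    if level == 3:
        return f"v{major + 1}.0.0"
    if level == 2:
        return f"v{major}.{minor + 1}.0"
    if level == 1:
        return f"v{major}.{minor}.{patch + 1}"
    return f"v{major}.{minor}.{patch}"
```

With `v1.2.0` as the current version, a single `fix` commit gives `v1.2.1`, any `feat` commit gives `v1.3.0`, and a header containing `!` gives `v2.0.0`.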
**Release Process**:
```bash
git checkout -b release/v1.2.0
# Update version files and CHANGELOG.md, then commit and push
git commit -am "chore: release v1.2.0"
git push -u origin release/v1.2.0
gh pr create --title "chore: release v1.2.0"
# After merge:
git checkout main && git pull origin main
git tag -a v1.2.0 -m "Release v1.2.0" && git push origin v1.2.0
gh release create v1.2.0 --title "Version 1.2.0" --notes "Release notes..."
```
### Personal Patterns
**Multi-Repo Changes**
When changes span multiple repositories:
```markdown
## Related PRs
This change requires updates across multiple repos:
- [ ] API Service: username/repo#123
- [ ] Web App: username/repo#456
- [ ] Data Pipeline: username/repo#789
**Merge order**: API → Pipeline → Web App
**Deployment**: Coordinate deployment window
```
**Research Experiments**
Use `experiment/` branches for exploratory work:
```bash
git checkout -b experiment/test-xgboost-model
# Document outcomes in PR description
# Merge if successful, close if unsuccessful
# Keep experiment history for reference
```
**Data Pipeline Branches**
Include data validation in PR workflow:
```bash
# 1. Create feature branch with data change
# 2. Run validation queries in staging
# 3. Document data impact in PR
# 4. Get data team review
# 5. Plan backfill if needed
# 6. Merge and monitor data quality
```
### Common Git Operations
| Issue | Solution |
|-------|----------|
| **Committed to wrong branch** | `git reset --soft HEAD~1 && git stash && git checkout correct-branch && git stash pop && git commit` |
| **Undo last commit (keep changes)** | `git reset --soft HEAD~1` |
| **Cherry-pick to another branch** | `git checkout target-branch && git cherry-pick <commit-hash>` |
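| **Accidentally committed secrets** | **Rotate the secret immediately**, then remove it from history: amend the commit if it was never pushed; otherwise rewrite history (for example with `git filter-repo`) and `git push --force-with-lease` |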
### GitHub Features to Leverage
**Code Owners** (`.github/CODEOWNERS`):
```
# Auto-assign reviewers based on files changed (the last matching pattern wins)
*.py @data-team
src/api/* @backend-team
docs/* @documentation-team
```
**Issue Templates** (`.github/ISSUE_TEMPLATE/bug_report.md`):
```markdown
---
name: Bug Report
about: Report a bug
---
**Description**
Clear description of the bug
**Steps to Reproduce**
1. Step 1
2. Step 2
**Expected Behavior**
What should happen
**Actual Behavior**
What actually happened
```
**PR Templates** (`.github/pull_request_template.md`):
```markdown
## Purpose
Why is this change needed?
## Changes
- What was modified?
## Testing
- [ ] Tests added
- [ ] Manual testing completed
## Checklist
- [ ] Code follows style guidelines
- [ ] Documentation updated
```


@@ -0,0 +1,418 @@
---
title: Pull Request Quality Standards
description: PR sizing, scope, documentation, and review-readiness criteria
tags: [github, pull-requests, standards, code-review, collaboration]
---
# Pull Request Quality Standards
## Metadata
**Purpose**: Define what makes a good PR (size, focus, documentation)
**Applies to**: All pull requests in projects
**Version**: 1.0.0
---
## Instructions
### Single-Purpose Principle
**Each PR should do ONE thing well**
A PR should represent a single, cohesive change:
- One feature
- One bug fix
- One refactoring
- One set of related documentation updates
**❌ Multi-purpose PRs** (avoid these):
- Feature + unrelated bug fixes
- Refactoring + new functionality
- Multiple independent features
- Bug fixes + dependency updates + docs
**Why it matters**:
- Easier to review and understand
- Clearer rollback if issues arise
- Better Git history for debugging
- Faster review cycles
### PR Size Guidelines
**Target sizes** (lines of code changed):
- **Small**: < 200 LOC ✅ (ideal, ~1-2 hours review time)
- **Medium**: 200-500 LOC ⚠️ (acceptable, ~2-4 hours review time)
- **Large**: 500-1000 LOC ⚠️ (needs justification, ~1 day review time)
- **Too Large**: > 1000 LOC ❌ (should be split)
**File count**: Aim for < 15 files changed
**Exceptions** (large PRs acceptable when):
- Generated code (migrations, API clients)
- Vendored dependencies
- Large data files or configuration
- Initial project setup
- Bulk renaming/reformatting (use separate PR)
**When PR is too large**, split by:
- Logical components (backend → frontend)
- Sequential steps (refactor → feature → tests)
- Module/service boundaries
- Infrastructure vs application changes
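The size thresholds can be applied automatically, for example in a CI comment. A minimal sketch, assuming "lines of code changed" means additions plus deletions and that exactly 500 or 1000 lines falls into the smaller category; `classify_pr_size` and the note wording are hypothetical.

```python
def classify_pr_size(additions: int, deletions: int, files_changed: int) -> tuple[str, list[str]]:
    """Return the size label and any follow-up notes for a PR."""
    loc = additions + deletions
    if loc < 200:
        label = "Small"
    elif loc <= 500:
        label = "Medium"
    elif loc <= 1000:
        label = "Large"
    else:
        label = "Too Large"

    notes = []
    if label == "Large":
        notes.append("needs justification")
    if label == "Too Large":
        notes.append("should be split")
    if files_changed >= 15:
        notes.append("touches 15 or more files")
    return label, notes
```

The listed exceptions (generated code, vendored dependencies, initial setup) are not detected here, so a reviewer should still be able to override the result.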
### Title Format
**Use clear, descriptive titles**:
```
<type>: <clear description of change>
```
**Examples**:
- ✅ `feat: add customer segmentation API endpoint`
- ✅ `fix: resolve race condition in order processing`
- ✅ `refactor: extract validation logic to shared utilities`
- ❌ `updates` (too vague)
- ❌ `fix bug` (which bug?)
- ❌ `Feature/123` (not descriptive)
**Title should**:
- Start with lowercase (after type)
- Use imperative mood
- Be specific about what changed
- Avoid ticket numbers only (include description)
### Description Requirements
**Minimum requirements** for all PRs:
1. **Purpose**: Why is this change needed?
2. **Changes**: What was modified (high-level)?
3. **Testing**: How was this tested?
4. **Impact**: What's affected by this change?
5. **Issue Reference**: Link to related GitHub issue(s)
### Issue Linking Policy for PRs
**Issue references are REQUIRED for**:
- Feature PRs (new functionality)
- Bug fix PRs (resolving issues)
- Breaking change PRs
- Large refactorings (> 200 LOC)
**Issue references are RECOMMENDED for**:
- Performance improvements
- Smaller refactorings
- Significant test additions
**Issue references are OPTIONAL for**:
- Documentation-only updates
- Dependency updates (chores)
- Minor formatting/style fixes
**Format in PR description**:
- `Closes #123` - Closes an issue when PR merges
- `Fixes #456` - Fixes a bug (auto-closes issue)
- `Resolves #789` - Resolves an issue
- `Relates to #101` - Related but doesn't auto-close
- `Part of #102` - Part of larger epic/feature
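- Multiple issues: repeat the keyword for each one, e.g. `Closes #123, closes #456` (GitHub only auto-closes issues that have their own keyword)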
**Placement**: Include in Purpose section or at end of description
**Examples**:
```markdown
## Purpose
Add email validation to prevent registration with invalid addresses.
Closes #234
```
```markdown
## Purpose
Fix session timeout issue reported by multiple mobile users.
This addresses the root cause identified in investigation.
Fixes #456
```
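To see which issues a PR description would auto-close, the closing keywords can be extracted with a regular expression. This sketch covers same-repository references only (`#123`, not `owner/repo#123`), and `closing_issue_numbers` is a hypothetical helper. Each issue number must directly follow a keyword to be counted.

```python
import re

# Closing keywords recognized by GitHub: close(s/d), fix(es/ed), resolve(s/d).
CLOSING_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def closing_issue_numbers(description: str) -> list[int]:
    """Return the issue numbers that directly follow a closing keyword."""
    return [int(number) for number in CLOSING_RE.findall(description)]
```

`Relates to #101` and `Part of #102` produce no matches, which is the intended behavior for references that should not close the issue.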
**Template**:
```markdown
## Purpose
Brief explanation of why this change is needed.
## Changes
- High-level summary of what was modified
- Key technical decisions
- Any breaking changes
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Tested on [environment/platform]
## Screenshots (if UI changes)
[Add screenshots or recordings]
## Impact
- Who/what is affected?
- Any performance implications?
- Dependencies or follow-up work?
## Checklist
- [ ] Code follows style guidelines
- [ ] Tests added for new functionality
- [ ] Documentation updated
- [ ] No debugging code or console logs
```
### Multi-Purpose PR Detection
**Red flags** indicating a multi-purpose PR:
🚩 **Multiple unrelated file groups**:
```
Changes:
- src/auth/* (authentication changes)
- src/reports/* (reporting feature)
- src/database/* (schema updates)
```
→ Split into 3 PRs
🚩 **Commit history with mixed types**:
```
feat: add new dashboard
fix: resolve login issue
chore: update dependencies
docs: update README
```
→ Each should be separate PR
🚩 **Description uses "and" repeatedly**:
```
"This PR adds user authentication AND fixes the bug in reports
AND updates dependencies AND refactors the API"
```
→ Split into 4 PRs
🚩 **Changes span multiple services/repos**:
→ Consider separate PRs per service, or use stack of dependent PRs
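Some of these red flags can be approximated in code. The sketch below is a rough heuristic with assumed thresholds (three or more top-level file groups, three or more uses of "and"), and `multi_purpose_flags` is a hypothetical name. It will also flag legitimate combinations such as `feat` plus `test` commits, so treat the output as a prompt for a closer look rather than a verdict.

```python
import re
from pathlib import PurePosixPath

TYPE_RE = re.compile(r"^([a-z]+)(?:\([^()]+\))?!?: ")


def multi_purpose_flags(
    commit_headers: list[str], changed_files: list[str], description: str
) -> list[str]:
    """Return human-readable red flags suggesting a multi-purpose PR."""
    flags = []

    # Red flag: commit history with mixed types.
    types = set()
    for header in commit_headers:
        match = TYPE_RE.match(header)
        if match:
            types.add(match.group(1))
    if len(types) > 1:
        flags.append("mixed commit types: " + ", ".join(sorted(types)))

    # Red flag: unrelated file groups (grouped by the first two path parts).
    groups = {"/".join(PurePosixPath(path).parts[:2]) for path in changed_files}
    if len(groups) >= 3:
        flags.append("changes span " + str(len(groups)) + " file groups")

    # Red flag: description uses "and" repeatedly.
    if len(re.findall(r"\band\b", description, re.IGNORECASE)) >= 3:
        flags.append("description joins several changes with 'and'")
    return flags
```

A PR whose commits are all `feat`, whose files sit under one directory, and whose description covers a single change returns an empty list.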
### Draft PRs
Use draft PRs for:
- Work in progress (WIP)
- Seeking early feedback
- Sharing approach before completion
- Long-running feature branches
**Mark as draft when**:
- Tests aren't passing
- Code isn't ready for review
- Exploring an approach
**Convert to ready when**:
- All tests pass
- Code is complete
- Ready for full review
---
## Resources
### Size Refactoring Strategies
**Strategy 1: Layer-based splitting**
```
PR 1: Database schema changes
PR 2: Backend API implementation
PR 3: Frontend integration
PR 4: Tests and documentation
```
**Strategy 2: Feature-based splitting**
```
PR 1: Core feature (minimal viable)
PR 2: Additional options/configuration
PR 3: UI polish and edge cases
PR 4: Performance optimizations
```
**Strategy 3: Refactor-then-feature**
```
PR 1: Refactor existing code (no behavior change)
PR 2: Add new feature using refactored structure
```
**Strategy 4: Component isolation**
```
PR 1: Shared utilities (used by feature)
PR 2: Service A changes
PR 3: Service B changes
PR 4: Integration and tests
```
### Good PR Examples
**Example 1: Small, focused feature**
```
Title: feat: add email validation to user registration
Description:
## Purpose
Users were able to register with invalid email addresses, causing
delivery failures for password reset emails.
## Changes
- Added regex-based email validation in registration form
- Added backend validation in user creation endpoint
- Display user-friendly error for invalid emails
## Testing
- [x] Unit tests for validation logic
- [x] Integration tests for registration flow
- [x] Manual testing with various email formats
## Impact
- Affects new user registration only
- Existing users unaffected
- No database changes required
Files changed: 4 (+120, -15)
```
**Example 2: Well-scoped refactor**
```
Title: refactor: extract common validation logic to utilities
Description:
## Purpose
Validation logic was duplicated across 8 API endpoints, making
updates error-prone and inconsistent.
## Changes
- Created validation utilities module
- Extracted email, phone, date validation to shared functions
- Updated 8 endpoints to use shared validators
- No behavior changes (existing tests still pass)
## Testing
- [x] All existing unit tests pass unchanged
- [x] Added tests for new validation utilities
- [x] Manual regression testing on auth flows
## Impact
- All API endpoints using validation
- Easier to add new validation rules in future
- No user-facing changes
Files changed: 12 (+250, -180)
```
### PR Review Checklist
**For PR authors (self-review)**:
- [ ] Single, focused purpose
- [ ] Reasonable size (< 500 LOC if possible)
- [ ] Clear title and description
- [ ] All tests passing
- [ ] No commented-out code
- [ ] No console.log / debugging statements
- [ ] Documentation updated
- [ ] Breaking changes clearly marked
- [ ] Follow-up issues created for future work
**For reviewers**:
- [ ] Purpose is clear and justified
- [ ] Changes match description
- [ ] No scope creep or unrelated changes
- [ ] Code follows standards
- [ ] Tests are adequate
- [ ] No security issues
- [ ] Performance impact acceptable
- [ ] Documentation sufficient
### Breaking Change Communication
**If PR includes breaking changes**:
1. **Mark clearly in title**:
```
feat!: change authentication to OAuth2
```
2. **Add BREAKING CHANGE section**:
```markdown
## BREAKING CHANGE
The `/auth` endpoint now requires OAuth2 tokens instead of API keys.
**Migration guide**:
1. Register OAuth2 application
2. Update client to obtain tokens
3. Replace API key header with Bearer token
**Timeline**: Old authentication deprecated 2024-06-01, removed 2024-09-01
```
3. **Communicate to stakeholders**:
- Tag affected teams in PR
- Post in relevant Slack channels
- Update migration documentation
### Personal Patterns
**Research PRs**: Include experiment context
```markdown
## Research Context
- **Hypothesis**: XGBoost will improve prediction accuracy by 5%
- **Baseline**: Current model (Random Forest, 0.82 AUC)
- **Metrics**: AUC, precision, recall on validation set
- **Outcome**: Document results in PR comments
```
**Data pipeline PRs**: Include data impact
```markdown
## Data Impact
- **Affected tables**: customer_data, transactions
- **Estimated runtime**: +15 minutes per run
- **Backfill needed**: Yes, for records since 2024-01-01
- **Validation queries**: [link to SQL validation]
```
**Multi-repo PRs**: Link related PRs
```markdown
## Related PRs
- API service: #123
- Web app: #456
- Data pipeline: #789
**Merge order**: API → Pipeline → Web App
```
### Handling Large PRs (When Unavoidable)
If you must create a large PR:
1. **Provide detailed description** with section breakdown
2. **Add inline comments** explaining complex sections
3. **Create review guide**:
```markdown
## Review Guide
Recommended review order:
1. Start with `src/models/*.py` (core logic)
2. Then `src/api/*.py` (API changes)
3. Then `tests/*` (test coverage)
4. Finally `docs/*` (documentation)
Focus areas:
- Data validation logic in `validators.py`
- Performance of batch processing in `processor.py`
```
4. **Break into logical commits** that can be reviewed individually
5. **Offer to pair review** for complex sections


@@ -0,0 +1,392 @@
---
title: Pre-Commit Quality Standards
description: Project type detection, quality checks, and configuration templates for pre-commit validation
tags: [pre-commit, quality, linting, security, project-types, validation]
---
# Pre-Commit Quality Standards
## Metadata
**Purpose**: Define project type detection patterns, pre-commit quality checks, and standard pre-commit configurations for different project types
**Applies to**: Pre-commit validation commands and pre-commit setup workflows
**Version**: 1.0.0
---
## Instructions
### Project Type Detection
Use file system indicators to determine project type(s):
#### Python Projects
**Indicators**:
- `pyproject.toml` (modern Python packaging)
- `setup.py` (traditional Python packaging)
- `requirements.txt` or `requirements/*.txt`
- `Pipfile` (pipenv)
- Presence of `*.py` files in src/ or root
**Detection command**:
```bash
ls -a | grep -E '^(pyproject\.toml|setup\.py|requirements\.txt|Pipfile)$'
find . -maxdepth 2 -name "*.py" | head -1
```
#### Data Science Projects
**Indicators**:
- `*.ipynb` Jupyter notebooks
- Common data directories: `data/`, `notebooks/`, `models/`
- Data files: `*.csv`, `*.parquet`, `*.pkl`
- ML config files: `mlflow.yaml`, `dvc.yaml`
**Detection command**:
```bash
find . -name "*.ipynb" | head -1
ls -d data/ notebooks/ models/ 2>/dev/null
```
#### Plugin Marketplace Projects
**Indicators**:
- `.claude-plugin/` directory
- `.claude-plugin/plugin.json` or `.claude-plugin/marketplace.json`
- `commands/`, `skills/`, `agents/` directories
**Detection command**:
```bash
ls -d .claude-plugin/ 2>/dev/null
```
#### Mixed Projects
Projects may have multiple types (e.g., Python + Jupyter, Python + Marketplace). Report all detected types.
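The indicators above can also be checked from Python when a script needs the result as data. This is a minimal sketch, not a definitive implementation: the function name `detect_project_types` and the type labels it returns are assumptions, and it checks only a subset of the listed indicators.

```python
from pathlib import Path


def detect_project_types(root: str = ".") -> list[str]:
    """Return every project type whose indicators are found under root."""
    base = Path(root)
    types = []

    # Python: packaging files, requirements files, or *.py in root or src/
    python_markers = ("pyproject.toml", "setup.py", "requirements.txt", "Pipfile")
    if (
        any((base / name).is_file() for name in python_markers)
        or any(base.glob("requirements/*.txt"))
        or any(base.glob("*.py"))
        or any(base.glob("src/*.py"))
    ):
        types.append("python")

    # Data science: notebooks anywhere in the tree, or ML config files at the root
    ml_configs = ("mlflow.yaml", "dvc.yaml")
    if any(base.rglob("*.ipynb")) or any((base / name).is_file() for name in ml_configs):
        types.append("data-science")

    # Plugin marketplace: the .claude-plugin/ directory
    if (base / ".claude-plugin").is_dir():
        types.append("plugin-marketplace")

    return types
```

Returning a list rather than a single label keeps mixed projects visible, so a Python + Marketplace repository reports both types.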
### Universal Quality Checks
These checks apply to **all project types**:
#### 1. Commit Message Validation
Reference: `commit-message-standards` skill
- Format: `<type>(<scope>): <subject>`
- Length: Subject ≤50 characters
- Issue linking: Based on commit type
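The format and length rules can be sketched as a small validator. The type list comes from the commit message standards; the function name `check_commit_header`, the allowed scope characters, and the wording of the messages are assumptions made for illustration. Issue linking is not covered here.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "style", "ci")

# <type>(<scope>): <subject>, with an optional "!" marking a breaking change
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[a-z0-9-]+)\))?"  # scope character set is an assumption
    r"(?P<breaking>!)?: "
    r"(?P<subject>.+)$"
)


def check_commit_header(header: str) -> list[str]:
    """Return a list of problems found in the first line of a commit message."""
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match <type>(<scope>): <subject>"]

    problems = []
    subject = match.group("subject")
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if subject[0].isupper():
        problems.append("subject should start with a lowercase letter")
    if subject.endswith("."):
        problems.append("subject should not end with a period")
    return problems
```

An empty list means the header passed every check, which makes the function easy to call from a `commit-msg` hook.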
#### 2. Branch Naming Validation
Reference: `github-workflow-patterns` skill
- Format: `<type>/<description>`
- Valid types: feature, fix, hotfix, refactor, docs, experiment, chore
- Lowercase with hyphens
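A branch name check following these rules might look like the sketch below. The function name `is_valid_branch` is hypothetical, and allowing digits in the description (for issue numbers) is an assumption rather than something the rules state.

```python
import re

BRANCH_TYPES = ("feature", "fix", "hotfix", "refactor", "docs", "experiment", "chore")

# <type>/<description>: lowercase words (digits allowed by assumption) joined by single hyphens
BRANCH_RE = re.compile(
    r"(" + "|".join(BRANCH_TYPES) + r")/[a-z0-9]+(-[a-z0-9]+)*"
)


def is_valid_branch(name: str) -> bool:
    """Return True when the branch name matches <type>/<description>."""
    return BRANCH_RE.fullmatch(name) is not None
```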
#### 3. Secret Detection
Scan for common secret patterns:
```regex
# API Keys
(api[_-]?key)\s*[:=]\s*["']?[a-zA-Z0-9]{20,}
# AWS Keys
(aws[_-]?access[_-]?key[_-]?id|AKIA[0-9A-Z]{16})
# Private Keys
-----BEGIN (RSA |DSA |EC )?PRIVATE KEY-----
# Tokens
(github[_-]?token|gh[pous]_[a-zA-Z0-9]{36,})
(sk-[a-zA-Z0-9_-]{20,}) # OpenAI/Anthropic-style keys
# Passwords
(password|passwd|pwd)\s*[:=]\s*["'][^"']{8,}
# Generic secrets
(secret|token)\s*[:=]\s*["'][a-zA-Z0-9+/=]{20,}
```
**Check command**:
```bash
# Scan added lines only, case-insensitively
git diff --staged | grep -E '^\+' | grep -iE '(api[_-]?key|password|secret|token|aws|AKIA|sk-[a-zA-Z0-9])'
```
#### 4. Large File Detection
**Thresholds**:
- Warn: >5MB
- Fail: >10MB (unless using Git LFS)
**Check command**:
```bash
git diff --staged --name-only | python3 -c "
import sys, os
for line in sys.stdin:
    file = line.strip()
    if os.path.isfile(file):
        size = os.path.getsize(file)
        if size > 10*1024*1024:
            print(f'{file}: {size/1024/1024:.2f} MB')
"
```
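The command above reports only files over the 10MB failure threshold. A sketch that applies both thresholds could look like this; the function name `classify_file_size` and the `tracked_by_lfs` flag are assumptions, and how LFS tracking is determined is left to the caller.

```python
WARN_BYTES = 5 * 1024 * 1024   # warn above 5MB
FAIL_BYTES = 10 * 1024 * 1024  # fail above 10MB unless the file is in Git LFS


def classify_file_size(size_bytes: int, tracked_by_lfs: bool = False) -> str:
    """Map a file size to OK, WARN, or FAIL using the thresholds above."""
    if size_bytes > FAIL_BYTES and not tracked_by_lfs:
        return "FAIL"
    if size_bytes > WARN_BYTES:
        return "WARN"
    return "OK"
```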
#### 5. Merge Conflict Markers
```bash
# Added lines in a diff start with "+", so the marker follows that prefix
git diff --staged | grep -E '^\+(<{7}|={7}|>{7})( |$)'
```
#### 6. Trailing Whitespace
```bash
git diff --staged --check
```
#### 7. Direct Commits to Protected Branches
```bash
branch=$(git branch --show-current)
if [[ "$branch" == "main" || "$branch" == "master" ]]; then
  echo "ERROR: Direct commits to $branch not allowed" >&2
  exit 1
fi
```
### Project-Specific Quality Checks
#### Python Projects
**Syntax Validation**:
```bash
# Check Python syntax
for file in $(git diff --staged --name-only | grep '\.py$'); do
  python -m py_compile "$file"
done
```
**Common Issues to Detect**:
```bash
# Staged Python files, excluding deletions (empty when none are staged)
py_files=$(git diff --staged --name-only --diff-filter=ACM | grep '\.py$')
# Debug statements
[ -n "$py_files" ] && grep -nE 'import pdb|breakpoint\(\)' $py_files
# Print debugging (warn only, not fail)
[ -n "$py_files" ] && grep -n 'print(' $py_files
# Hardcoded paths (warn)
[ -n "$py_files" ] && grep -nE '(/Users/|/home/|C:\\)' $py_files
```
**Type Hints** (for production-tier projects):
```bash
# Check if type hints are present
grep -E ': (str|int|float|bool|List|Dict|Optional)' file.py
```
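The grep above only shows that some annotation exists somewhere in the file. For a per-function check, the standard `ast` module can be used, as in this sketch. The function name `functions_missing_hints` is hypothetical, and skipping `self` and `cls` is an assumption about what should count as missing.

```python
import ast


def functions_missing_hints(source: str) -> list[str]:
    """Return names of functions with an unannotated parameter or return type."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        unannotated = [
            p.arg for p in params
            if p.annotation is None and p.arg not in ("self", "cls")
        ]
        if unannotated or node.returns is None:
            missing.append(node.name)
    return missing
```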
#### Plugin Marketplace Projects
**JSON Validation**:
```bash
# Validate all plugin.json files
for file in $(find . -path "*/.claude-plugin/plugin.json"); do
  python -m json.tool "$file" >/dev/null || echo "Invalid JSON: $file"
done
# Validate marketplace.json
python -m json.tool .claude-plugin/marketplace.json >/dev/null
```
**Markdown Quality**:
```bash
# Check for trailing whitespace
grep -n '[[:space:]]$' *.md
# Check for proper heading hierarchy
# (H1 only once, H2-H6 properly nested)
```
**Plugin Structure Validation**:
- Required files: `.claude-plugin/plugin.json`
- Required fields in plugin.json: name, description, version, author
- Commands must have YAML frontmatter with description
- Skills should follow 3-tier structure (Metadata/Instructions/Resources)
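The required-field rule for `plugin.json` can be sketched as follows. The function name `missing_plugin_fields` is hypothetical, and treating empty values as missing is an assumption.

```python
import json

REQUIRED_FIELDS = ("name", "description", "version", "author")


def missing_plugin_fields(manifest_text: str) -> list[str]:
    """Return the required plugin.json fields that are absent or empty."""
    data = json.loads(manifest_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if not data.get(field)]
```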
#### Data Science Projects
**Notebook Checks**:
```bash
# Large notebooks (>5MB)
find . -name "*.ipynb" -size +5M
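# Notebooks with saved outputs (optional - some teams want outputs cleared).
# Assumes the default .ipynb layout, where a non-empty outputs array starts on a line ending in "[".
grep -l '"outputs": \[$' *.ipynb
# Notebooks with at least one executed cell (execution_count is null for cells never run)
grep -lE '"execution_count": [0-9]+' *.ipynb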
```
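Text matching on notebook files depends on how the JSON was serialized. Parsing the notebook avoids that, as in this sketch, which assumes the nbformat 4 layout with a top-level `cells` list. The function name `count_executed_cells` is hypothetical.

```python
import json


def count_executed_cells(notebook_text: str) -> int:
    """Count code cells that still carry outputs or an execution count."""
    notebook = json.loads(notebook_text)
    return sum(
        1
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        and (cell.get("outputs") or cell.get("execution_count") is not None)
    )
```

A result of zero suggests the notebook was cleared before staging.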
**Data File Checks**:
```bash
# Large data files that shouldn't be in git
find data/ -type f -size +10M 2>/dev/null
# Check if Git LFS is configured for data files
git lfs ls-files | grep -E '\.(csv|parquet|pkl|h5)$'
```
**Model File Checks**:
```bash
# Large model files
find models/ -type f -size +100M 2>/dev/null
```
---
## Resources
### Pre-Commit Configuration Templates
#### Python Project Template
```yaml
# .pre-commit-config.yaml for Python projects
repos:
  # Code formatting
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.11
  # Linting
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
  # Type checking
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies: [types-all]
  # Security and general checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-added-large-files
        args: ['--maxkb=10000']
      - id: check-merge-conflict
      - id: detect-private-key
  # Secret detection
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
```
#### Plugin Marketplace Project Template
```yaml
# .pre-commit-config.yaml for Plugin Marketplace projects
repos:
  # Markdown linting
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.38.0
    hooks:
      - id: markdownlint
        args: [--fix]
  # YAML validation
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: check-json
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-merge-conflict
  # JSON formatting
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier
        types_or: [json, yaml, markdown]
```
#### Data Science Project Template
```yaml
# .pre-commit-config.yaml for Data Science projects
repos:
  # Python formatting and linting (same as Python template)
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff
  # Jupyter notebook handling
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.7.1
    hooks:
      - id: nbqa-black
      - id: nbqa-ruff
  # Clear notebook outputs
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
  # General checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
        args: ['--maxkb=5000']
      - id: detect-private-key
```
### Secret Detection Patterns Reference
**High-confidence patterns** (should fail):
- AWS Access Key: `AKIA[0-9A-Z]{16}`
- GitHub Personal Access Token: `ghp_[a-zA-Z0-9]{36}`
- GitHub OAuth Token: `gho_[a-zA-Z0-9]{36}`
- Private Key: `-----BEGIN.*PRIVATE KEY-----`
- OpenAI/Anthropic-style key: `sk-[a-zA-Z0-9_-]{20,}`
**Medium-confidence patterns** (should warn):
- Generic API key assignment: `api_key\s*=\s*["'][^"']+["']`
- Password assignment: `password\s*=\s*["'][^"']+["']`
- Token assignment: `token\s*=\s*["'][^"']+["']`
**Low-confidence patterns** (informational only):
- Environment variable usage: `os.getenv('API_KEY')`
- Config references: `config['secret']`
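The confidence tiers can be applied to a staged diff as in the sketch below. It uses a subset of the patterns listed above, scans added lines only, and labels high-confidence hits `FAIL` and medium-confidence hits `WARN`. The function name `scan_diff` and the pattern labels are assumptions made for illustration.

```python
import re

# High-confidence patterns (should fail)
HIGH = {
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github-token": re.compile(r"gh[pous]_[a-zA-Z0-9]{36,}"),
    "private-key": re.compile(r"-----BEGIN.*PRIVATE KEY-----"),
}

# Medium-confidence patterns (should warn)
MEDIUM = {
    "api-key-assignment": re.compile(r"api_key\s*=\s*[\"'][^\"']+[\"']"),
    "password-assignment": re.compile(r"password\s*=\s*[\"'][^\"']+[\"']"),
    "token-assignment": re.compile(r"token\s*=\s*[\"'][^\"']+[\"']"),
}


def scan_diff(diff_text: str) -> list[tuple[str, str, int]]:
    """Return (severity, pattern name, diff line number) for each hit on an added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Skip removed lines, context lines, and the "+++ b/file" header
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for severity, patterns in (("FAIL", HIGH), ("WARN", MEDIUM)):
            for name, pattern in patterns.items():
                if pattern.search(line):
                    findings.append((severity, name, number))
    return findings
```

Feeding it the output of `git diff --staged` keeps secrets that are being removed from triggering a failure.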
### Quality Check Severity Levels
**FAIL (Blocking)**:
- Secrets detected (high confidence)
- Syntax errors
- Merge conflict markers
- Direct commits to main/master
- Files >10MB without Git LFS
**WARN (Non-blocking)**:
- Print statements in Python code
- Files >5MB
- Debug statements in non-test files
- Missing type hints (production tier only)
- Secrets detected (medium confidence)
**INFO (Informational)**:
- Pre-commit not configured
- Optional checks not run
- Secrets detected (low confidence)
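One way to turn these levels into a hook result is to exit non-zero only when a `FAIL` finding is present. This is a sketch under that assumption; the function name `summarize` and the summary format are hypothetical.

```python
from collections import Counter

SEVERITIES = ("FAIL", "WARN", "INFO")


def summarize(findings: list[tuple[str, str]]) -> tuple[int, str]:
    """Return (exit_code, one-line summary) for (severity, message) pairs."""
    counts = Counter(severity for severity, _ in findings)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    summary = ", ".join(f"{counts[level]} {level}" for level in SEVERITIES)
    # Only FAIL findings block the commit; WARN and INFO are reported but pass
    return (1 if counts["FAIL"] else 0), summary
```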