gh-bradleyboehmke-brads-mar…/skills/pre-commit-quality-standards.md
2025-11-29 18:01:57 +08:00

title: Pre-Commit Quality Standards
description: Project type detection, quality checks, and pre-commit configurations for pre-commit validation
tags: pre-commit, quality, linting, security, project-types, validation

Pre-Commit Quality Standards

Metadata

Purpose: Define project type detection patterns, pre-commit quality checks, and standard pre-commit configurations for different project types

Applies to: Pre-commit validation commands and pre-commit setup workflows

Version: 1.0.0


Instructions

Project Type Detection

Use file system indicators to determine project type(s):

Python Projects

Indicators:

  • pyproject.toml (modern Python packaging)
  • setup.py (traditional Python packaging)
  • requirements.txt or requirements/*.txt
  • Pipfile (pipenv)
  • Presence of *.py files in src/ or root

Detection command:

ls -a | grep -E '^(pyproject\.toml|setup\.py|requirements\.txt|Pipfile)$'
ls requirements/*.txt 2>/dev/null
find . -maxdepth 2 -name "*.py" | head -1

Data Science Projects

Indicators:

  • *.ipynb Jupyter notebooks
  • Common data directories: data/, notebooks/, models/
  • Data files: *.csv, *.parquet, *.pkl
  • ML config files: mlflow.yaml, dvc.yaml

Detection command:

find . -name "*.ipynb" | head -1
ls -d data/ notebooks/ models/ 2>/dev/null

Plugin Marketplace Projects

Indicators:

  • .claude-plugin/ directory
  • .claude-plugin/plugin.json or .claude-plugin/marketplace.json
  • commands/, skills/, agents/ directories

Detection command:

ls -d .claude-plugin/ 2>/dev/null

Mixed Projects

Projects may have multiple types (e.g., Python + Jupyter, Python + Marketplace). Report all detected types.
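The indicators above can also be checked from Python. The following is a minimal sketch, not a definitive implementation: the function name `detect_project_types` and the type labels it returns are assumptions made for illustration, and it checks only a subset of the listed indicators (it ignores data files such as *.csv).

```python
from pathlib import Path


def detect_project_types(root: str = ".") -> list[str]:
    """Return every project type whose indicators are present under root."""
    base = Path(root)
    types = []

    # Python: packaging files, or *.py files in the root or under src/
    packaging_files = ("pyproject.toml", "setup.py", "requirements.txt", "Pipfile")
    if (
        any((base / name).is_file() for name in packaging_files)
        or any(base.glob("requirements/*.txt"))
        or any(base.glob("*.py"))
        or any(base.glob("src/**/*.py"))
    ):
        types.append("python")

    # Data science: notebooks, conventional directories, or ML config files
    notebooks = (
        p for p in base.rglob("*.ipynb") if ".ipynb_checkpoints" not in p.parts
    )
    if (
        any(notebooks)
        or any((base / d).is_dir() for d in ("data", "notebooks", "models"))
        or any((base / f).is_file() for f in ("mlflow.yaml", "dvc.yaml"))
    ):
        types.append("data-science")

    # Plugin marketplace: the .claude-plugin/ directory
    if (base / ".claude-plugin").is_dir():
        types.append("plugin-marketplace")

    return types
```

Because each check appends independently, a mixed project yields more than one label, which matches the requirement to report all detected types.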

Universal Quality Checks

These checks apply to all project types:

1. Commit Message Validation

Reference: commit-message-standards skill

  • Format: <type>(<scope>): <subject>
  • Length: Subject ≤50 characters
  • Issue linking: Based on commit type
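As an illustration of the format and length rules, here is a hedged sketch of a subject-line validator. The authoritative rules live in the commit-message-standards skill, so several things here are assumptions: the type list `ASSUMED_TYPES`, treating the scope as optional, and measuring the 50-character limit on the `<subject>` part rather than the whole line.

```python
import re

# Hypothetical type list; the real one is defined by the commit-message-standards skill.
ASSUMED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# <type>(<scope>): <subject>, with the scope treated as optional (an assumption).
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?: (?P<subject>.+)$"
)


def validate_commit_subject(line, types=ASSUMED_TYPES, max_subject=50):
    """Return a list of problems found in the first line of a commit message."""
    match = SUBJECT_RE.match(line)
    if match is None:
        return ["expected format <type>(<scope>): <subject>"]
    problems = []
    if match["type"] not in types:
        problems.append(f"unknown type: {match['type']}")
    if len(match["subject"]) > max_subject:
        problems.append(
            f"subject is {len(match['subject'])} characters (max {max_subject})"
        )
    return problems
```

An empty list means the line passed; issue-linking rules depend on the commit type and are not covered by this sketch.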

2. Branch Naming Validation

Reference: github-workflow-patterns skill

  • Format: <type>/<description>
  • Valid types: feature, fix, hotfix, refactor, docs, experiment, chore
  • Lowercase with hyphens
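The three branch rules can be expressed as one regular expression plus a type lookup. This is a sketch under stated assumptions: the name `is_valid_branch_name` is hypothetical, and allowing digits in the description is an assumption, since the rule above only says "lowercase with hyphens".

```python
import re

# Valid types as listed in this section.
VALID_TYPES = ("feature", "fix", "hotfix", "refactor", "docs", "experiment", "chore")

# <type>/<description>: the description is lowercase words joined by single hyphens.
# Digits are allowed in the description (an assumption).
BRANCH_RE = re.compile(r"^(?P<type>[a-z]+)/(?P<description>[a-z0-9]+(-[a-z0-9]+)*)$")


def is_valid_branch_name(name: str) -> bool:
    """Return True if name follows <type>/<description> with a known type."""
    match = BRANCH_RE.match(name)
    return match is not None and match["type"] in VALID_TYPES
```

For example, `feature/add-login` passes, while `Feature/Add-Login` (uppercase) and `bugfix/add-login` (unknown type) do not.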

3. Secret Detection

Scan for common secret patterns:

# API Keys
api[_-]?key[\s:=]+["']?[a-zA-Z0-9]{20,}

# AWS Keys
(aws[_-]?access[_-]?key[_-]?id|AKIA[0-9A-Z]{16})

# Private Keys
-----BEGIN (RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----

# Tokens
(github[_-]?token|gh[pousr]_[a-zA-Z0-9]{36,})
# Legacy OpenAI-style keys; Anthropic keys (sk-ant-...) contain hyphens and need their own pattern
sk-[a-zA-Z0-9]{48}

# Passwords
(password|passwd|pwd)[\s:=]+["'][^"']{8,}

# Generic secrets
(secret|token)[\s:=]+["'][a-zA-Z0-9+/=]{20,}

Check command:

git diff --staged | grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -E '(api[_-]?key|password|secret|token|AWS|AKIA|sk-[a-zA-Z0-9])'
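Secret scanning should look only at lines being added, since a removed secret is not a new leak. A minimal sketch of extracting added lines from unified diff text (for example, the output of `git diff --staged`) follows; the helper name `added_lines` is hypothetical.

```python
def added_lines(diff_text: str) -> list[str]:
    """Return the content of lines added in a unified diff, without the leading '+'."""
    added = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # File headers (including '+++ b/path') follow until the next hunk
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added.append(line[1:])
    return added
```

Tracking hunk boundaries avoids mistaking the `+++ b/path` file header for added content.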

4. Large File Detection

Thresholds:

  • Warn: >5MB
  • Fail: >10MB (unless using Git LFS)

Check command:

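git diff --staged --name-only | python3 -c "
import os, sys
MB = 1024 * 1024
for line in sys.stdin:
    file = line.strip()
    # Deleted files are listed in the diff but no longer exist on disk
    if os.path.isfile(file):
        size = os.path.getsize(file)
        if size > 10 * MB:
            print(f'FAIL {file}: {size / MB:.2f} MB')
        elif size > 5 * MB:
            print(f'WARN {file}: {size / MB:.2f} MB')
"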

5. Merge Conflict Markers

# Added lines in a diff start with '+', so the marker follows that prefix
git diff --staged | grep -E '^\+(<{7}|={7}|>{7})( |$)'

6. Trailing Whitespace

git diff --staged --check

7. Direct Commits to Protected Branches

branch=$(git branch --show-current)
if [[ "$branch" == "main" || "$branch" == "master" ]]; then
  echo "ERROR: Direct commits to $branch not allowed" >&2
  exit 1
fi

Project-Specific Quality Checks

Python Projects

Syntax Validation:

# Check Python syntax of staged files (--diff-filter=d skips deleted files)
for file in $(git diff --staged --name-only --diff-filter=d -- '*.py'); do
  python -m py_compile "$file"
done

Common Issues to Detect:

# Staged Python files, excluding deletions
py_files=$(git diff --staged --name-only --diff-filter=d -- '*.py')

# Debug statements (skipped when nothing is staged, so grep never scans the whole tree)
[ -n "$py_files" ] && grep -nE 'import pdb|breakpoint\(\)' $py_files

# Print debugging (warn only, not fail)
[ -n "$py_files" ] && grep -n 'print(' $py_files

# Hardcoded paths (warn)
[ -n "$py_files" ] && grep -nE '(/Users/|/home/|C:\\)' $py_files

Type Hints (for production-tier projects):

# Check if type hints are present
grep -E ': (str|int|float|bool|List|Dict|Optional)' file.py

Plugin Marketplace Projects

JSON Validation:

# Validate all plugin.json files
for file in $(find . -path "*/.claude-plugin/plugin.json"); do
  python -m json.tool "$file" >/dev/null || echo "Invalid JSON: $file"
done

# Validate marketplace.json
python -m json.tool .claude-plugin/marketplace.json >/dev/null

Markdown Quality:

# Check for trailing whitespace ([[:space:]] is portable; \s is a GNU grep extension)
grep -n '[[:space:]]$' *.md

# Check for proper heading hierarchy
# (H1 only once, H2-H6 properly nested)

Plugin Structure Validation:

  • Required files: .claude-plugin/plugin.json
  • Required fields in plugin.json: name, description, version, author
  • Commands must have YAML frontmatter with description
  • Skills should follow 3-tier structure (Metadata/Instructions/Resources)
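The required-file and required-field rules can be checked together. Below is a minimal sketch, assuming the plugin root is passed in as a directory path; the function name `validate_plugin_manifest` is hypothetical, and the sketch checks only that each field is present, not that its value is well formed.

```python
import json
from pathlib import Path

# Required fields in plugin.json, as listed above.
REQUIRED_FIELDS = ("name", "description", "version", "author")


def validate_plugin_manifest(plugin_dir: str) -> list[str]:
    """Return a list of problems with <plugin_dir>/.claude-plugin/plugin.json."""
    manifest = Path(plugin_dir) / ".claude-plugin" / "plugin.json"
    if not manifest.is_file():
        return [f"missing required file: {manifest}"]
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON in {manifest}: {exc}"]
    if not isinstance(data, dict):
        return [f"{manifest} must contain a JSON object"]
    return [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in data
    ]
```

An empty list means the manifest exists, parses, and contains all four fields.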

Data Science Projects

Notebook Checks:

# Large notebooks (>5MB)
find . -name "*.ipynb" -size +5M

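# Check for outputs in notebooks (optional - some teams want outputs cleared)
# Matches an "outputs" array that opens without closing on the same line, i.e. a non-empty one
grep -lE '"outputs": \[[[:space:]]*$' *.ipynb

# Check for recorded execution counts ("execution_count": null means the cell was not run)
grep -lE '"execution_count": [0-9]+' *.ipynb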
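Text matching on notebook JSON depends on how the file was serialized. Parsing the notebook is more reliable; the sketch below assumes the nbformat 4 layout (a top-level `cells` list), and the function name `notebook_has_outputs` is hypothetical.

```python
import json


def notebook_has_outputs(path: str) -> bool:
    """Return True if any code cell has outputs or a recorded execution count."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # A cleared cell has "outputs": [] and "execution_count": null
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return True
    return False
```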

Data File Checks:

# Large data files that shouldn't be in git
find data/ -type f -size +10M 2>/dev/null

# Check if Git LFS is configured for data files
git lfs ls-files | grep -E '\.(csv|parquet|pkl|h5)$'

Model File Checks:

# Large model files
find models/ -type f -size +100M 2>/dev/null

Resources

Pre-Commit Configuration Templates

Python Project Template

# .pre-commit-config.yaml for Python projects
repos:
  # Code formatting
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.11

  # Linting
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

  # Type checking
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
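        # List only the stub packages the project needs (e.g. types-requests);
        # the types-all meta-package is unmaintained and may fail to install
        additional_dependencies: []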

  # Security and general checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-added-large-files
        args: ['--maxkb=10000']
      - id: check-merge-conflict
      - id: detect-private-key

  # Secret detection
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Plugin Marketplace Project Template

# .pre-commit-config.yaml for Plugin Marketplace projects
repos:
  # Markdown linting
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.38.0
    hooks:
      - id: markdownlint
        args: [--fix]

  # YAML validation
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: check-json
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-merge-conflict

  # JSON formatting
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier
        types_or: [json, yaml, markdown]

Data Science Project Template

# .pre-commit-config.yaml for Data Science projects
repos:
  # Python formatting and linting (same as Python template)
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff

  # Jupyter notebook handling
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.7.1
    hooks:
      - id: nbqa-black
      - id: nbqa-ruff

  # Clear notebook outputs
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout

  # General checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
        args: ['--maxkb=5000']
      - id: detect-private-key

Secret Detection Patterns Reference

High-confidence patterns (should fail):

  • AWS Access Key: AKIA[0-9A-Z]{16}
  • GitHub Personal Access Token: ghp_[a-zA-Z0-9]{36}
  • GitHub OAuth Token: gho_[a-zA-Z0-9]{36}
  • Private Key: -----BEGIN.*PRIVATE KEY-----
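  • OpenAI-style key (legacy format): sk-[a-zA-Z0-9]{48}. Anthropic keys start with sk-ant- and contain hyphens, so this pattern does not match them.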

Medium-confidence patterns (should warn):

  • Generic API key assignment: api_key\s*=\s*["'][^"']+["']
  • Password assignment: password\s*=\s*["'][^"']+["']
  • Token assignment: token\s*=\s*["'][^"']+["']

Low-confidence patterns (informational only):

  • Environment variable usage: os.getenv('API_KEY')
  • Config references: config['secret']
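To show how the confidence tiers might be applied, here is a hedged sketch that classifies a single line of text using the high- and medium-confidence patterns listed in this section. The names `PATTERNS` and `classify_line` are hypothetical, and the low-confidence tier is left out because it is informational only.

```python
import re

# Patterns taken from the high- and medium-confidence lists in this section.
PATTERNS = {
    "high": [
        r"AKIA[0-9A-Z]{16}",
        r"ghp_[a-zA-Z0-9]{36}",
        r"gho_[a-zA-Z0-9]{36}",
        r"-----BEGIN.*PRIVATE KEY-----",
        r"sk-[a-zA-Z0-9]{48}",
    ],
    "medium": [
        r"api_key\s*=\s*[\"'][^\"']+[\"']",
        r"password\s*=\s*[\"'][^\"']+[\"']",
        r"token\s*=\s*[\"'][^\"']+[\"']",
    ],
}


def classify_line(line: str):
    """Return 'high', 'medium', or None for the strongest tier that matches."""
    for level in ("high", "medium"):
        if any(re.search(pattern, line) for pattern in PATTERNS[level]):
            return level
    return None
```

Checking the high tier first means a line that matches both tiers is reported at the blocking level.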

Quality Check Severity Levels

FAIL (Blocking):

  • Secrets detected (high confidence)
  • Syntax errors
  • Merge conflict markers
  • Direct commits to main/master
  • Files >10MB without Git LFS

WARN (Non-blocking):

  • Print statements in Python code
  • Files >5MB
  • Debug statements in non-test files
  • Missing type hints (production tier only)
  • Secrets detected (medium confidence)

INFO (Informational):

  • Pre-commit not configured
  • Optional checks not run
  • Secrets detected (low confidence)
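One way a wrapper script might turn these levels into an exit status is sketched below. The `Severity` enum and `summarize` function are hypothetical names, and the only rule encoded is the one stated above: FAIL blocks the commit, WARN and INFO do not.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    WARN = 1
    FAIL = 2


def summarize(findings):
    """Print findings from most to least severe; return 1 if any is blocking, else 0.

    findings is a list of (Severity, message) tuples.
    """
    for severity, message in sorted(findings, key=lambda f: f[0], reverse=True):
        print(f"[{severity.name}] {message}")
    return 1 if any(sev == Severity.FAIL for sev, _ in findings) else 0
```

A hook could pass the return value to `sys.exit`, so that only FAIL findings stop the commit.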