# False Positives Management

Strategies for managing false positives in Gitleaks secret detection.

## Table of Contents

- [Understanding False Positives](#understanding-false-positives)
- [Allowlist Strategies](#allowlist-strategies)
- [Common False Positive Patterns](#common-false-positive-patterns)
- [Configuration Examples](#configuration-examples)
- [Best Practices](#best-practices)
- [Troubleshooting False Positives](#troubleshooting-false-positives)
- [False Positive vs Real Secret](#false-positive-vs-real-secret)

## Understanding False Positives

False positives occur when legitimate code patterns match secret detection rules.

### Categories of False Positives

1. **Example/Placeholder Values**: Documentation and examples using fake credentials
2. **Test Fixtures**: Test data with credential-like patterns
3. **Non-Secret Constants**: Configuration values that match patterns but aren't sensitive
4. **Generated Code**: Auto-generated code with high-entropy strings
5. **Comments and Documentation**: Explanatory text matching patterns

### Impact Assessment

Before allowlisting, verify that the finding is truly a false positive:

```bash
# Extract the flagged value
echo "api_key_here" | base64 --decode   # Check whether it decodes as valid base64
curl -H "Authorization: Bearer FLAGGED_VALUE" https://api.service.com/test  # Test if active

# Check git history for when added
git log -p --all -S "flagged_value"

# Review context around detection (COMMIT and PATH are placeholders)
git show COMMIT:PATH
```

## Allowlist Strategies

### 1. Path-Based Allowlisting

Exclude entire directories or file patterns:

```toml
[allowlist]
description = "Exclude test and documentation files"
paths = [
    '''test/.*''',           # All test directories
    '''tests/.*''',          # Alternative test directory name
    '''.*/fixtures/.*''',    # Test fixtures anywhere
    '''examples/.*''',       # Example code
    '''docs/.*''',           # Documentation
    '''.*\.md$''',           # Markdown files
    '''.*\.rst$''',          # ReStructuredText files
    '''.*_test\.go$''',      # Go test files
    '''.*\.test\.js$''',     # JavaScript test files
    '''.*\.spec\.ts$''',     # TypeScript spec files
]
```

### 2. Stopword Allowlisting

Filter out known placeholder values:

```toml
[allowlist]
description = "Common placeholder values"
stopwords = [
    "example",
    "placeholder",
    "your_api_key_here",
    "your_secret_here",
    "REPLACEME",
    "CHANGEME",
    "xxxxxx",
    "000000",
    "123456",
    "abcdef",
    "sample",
    "dummy",
    "fake",
    "test_key",
    "mock_token",
]
```

### 3. Commit-Based Allowlisting

Allowlist specific commits after manual verification:

```toml
[allowlist]
description = "Verified false positives"
commits = [
    "a1b2c3d4e5f6",  # Initial test fixtures - verified 2024-01-15
    "f6e5d4c3b2a1",  # Documentation examples - verified 2024-01-16
]
```

Add a comment explaining why each commit is allowlisted. The abbreviated hashes above are placeholders; use full commit SHAs in a real configuration.

### 4. Regex Allowlisting

Allowlist specific patterns:

```toml
[allowlist]
description = "Pattern-based allowlist"
# Regexes are tested against the extracted secret by default; target the full
# match so that patterns which include the key name can apply.
regexTarget = "match"
regexes = [
    '''example_api_key_[0-9]+''',          # Example keys with numeric suffix
    '''key\s*=\s*["']EXAMPLE["']''',       # Explicitly marked examples
    '''(?i)test_?password_?[0-9]*''',      # Test passwords
    '''(?i)dummy.*secret''',               # Dummy secrets
]
```

### 5. Rule-Specific Allowlisting

Create exceptions for specific rules only:

```toml
[[rules]]
id = "generic-api-key"
description = "Generic API Key"
regex = '''(?i)api_key\s*=\s*["']([a-zA-Z0-9]{32})["']'''

[rules.allowlist]
description = "Allow generic API key pattern in specific contexts"
paths = ['''config/defaults\.yaml''']
regexTarget = "match"  # The regex below includes the key name, not just the secret
regexes = ['''api_key\s*=\s*["']example''']
```

### 6. Global vs Rule Allowlists

Global allowlists take precedence over rule-specific ones:

```toml
# Global allowlist - highest precedence
[allowlist]
description = "Organization-wide exceptions"
paths = ['''vendor/''', '''node_modules/''']

# Rule-specific allowlist
[[rules]]
id = "custom-secret"

[rules.allowlist]
description = "Exceptions only for this rule"
paths = ['''config/template\.yml''']
```

## Common False Positive Patterns

### 1. Documentation Examples

**Problem**: README and documentation contain example credentials.

**Solution**:

```toml
[allowlist]
paths = [
    '''README\.md$''',
    '''CONTRIBUTING\.md$''',
    '''docs/.*\.md$''',
    '''.*\.example$''',      # .env.example files
    '''.*\.template$''',     # Template files
    '''.*\.sample$''',       # Sample configurations
]
stopwords = [
    "example.com",
    "user@example.org",
    "YOUR_API_KEY",
]
```

### 2. Test Fixtures

**Problem**: Test data contains credential-like strings for testing credential handling.

**Solution**:

```toml
[allowlist]
paths = [
    '''test/fixtures/.*''',
    '''spec/fixtures/.*''',
    '''.*/testdata/.*''',        # Go convention
    '''.*/mocks/.*''',
    '''cypress/fixtures/.*''',   # Cypress test data
]

# Or use inline comments in code
# password = "test_password_123"  # gitleaks:allow
```

### 3. Generated Code

**Problem**: Code generators produce high-entropy identifiers.

**Solution**:

```toml
[allowlist]
description = "Generated code"
paths = [
    '''.*\.pb\.go$''',         # Protocol buffer generated code
    '''.*_generated\..*''',    # Generated file marker
    '''node_modules/.*''',     # Dependencies
    '''vendor/.*''',           # Vendored dependencies
    '''dist/.*''',             # Build output
    '''build/.*''',
]
```

### 4. Configuration Templates

**Problem**: Config templates with placeholder values match patterns.

**Solution**:

```toml
[allowlist]
paths = [
    '''config/.*\.template''',
    '''templates/.*''',
    '''.*\.tpl$''',
    '''.*\.tmpl$''',
]
stopwords = [
    "REPLACE_WITH_YOUR",
    "CONFIGURE_ME",
    "SET_THIS_VALUE",
]
```

### 5. Base64 Encoded Strings

**Problem**: Non-secret base64 data flagged due to high entropy.

**Solution**:

```toml
# Increase entropy threshold to reduce false positives
[[rules]]
id = "high-entropy-base64"
regex = '''[a-zA-Z0-9+/]{40,}={0,2}'''
entropy = 5.5  # Raised threshold (for example, from 4.5); thresholds are set per rule
```

Or allowlist specific patterns:

```toml
[allowlist]
regexes = [
    '''data:image/[^;]+;base64,''',        # Base64 encoded images
    '''-----BEGIN CERTIFICATE-----''',     # Public certificates (not private keys)
]
```

### 6. Public Keys and Certificates

**Problem**: Public keys and certificates are flagged even though they are not secrets.

**Solution**:

```toml
[allowlist]
regexes = [
    '''-----BEGIN PUBLIC KEY-----''',
    '''-----BEGIN CERTIFICATE-----''',
    '''-----BEGIN X509 CERTIFICATE-----''',
]

# But DO NOT allowlist:
# -----BEGIN PRIVATE KEY-----
# -----BEGIN RSA PRIVATE KEY-----
```

### 7. UUIDs and Identifiers

**Problem**: UUIDs match high-entropy patterns.

**Solution**:

```toml
[allowlist]
# Anchored so that longer hex secrets containing such a substring are not allowlisted
regexes = [
    '''^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$''',  # UUID
    '''^[0-9a-f]{24}$''',  # MongoDB ObjectId
]
```

Or adjust entropy detection:

```toml
[[rules]]
id = "generic-high-entropy"
entropy = 6.0  # Only flag very high entropy
```

## Configuration Examples

### Minimal Configuration

Start with broad allowlists, refine over time:

```toml
title = "Minimal Gitleaks Configuration"

[extend]
useDefault = true  # Use all built-in rules

[allowlist]
description = "Broad allowlist for initial rollout"
paths = [
    '''test/.*''',
    '''.*\.md$''',
    '''vendor/.*''',
    '''node_modules/.*''',
]
stopwords = [
    "example",
    "test",
    "mock",
    "dummy",
]
```

### Strict Configuration

Keep the allowlist minimal and targeted so that every exception is individually verified:

```toml
title = "Strict Gitleaks Configuration"

[extend]
useDefault = true

[allowlist]
description = "Minimal allowlist - verify all exceptions"

# Only allow specific known false positives
paths = [
    '''docs/api-examples\.md''',       # API documentation with examples
    '''test/fixtures/auth\.json''',    # Authentication test fixtures
]

# Specific known placeholder values
stopwords = [
    "YOUR_API_KEY_HERE",
    "sk_test_example_key_123456789",
]

# Manually verified commits
commits = [
    "abc123def456",  # Test fixtures added - verified 2024-01-15 by security@company.com
]
```

### Balanced Configuration

Balance detection sensitivity with operational overhead:

```toml
title = "Balanced Gitleaks Configuration"

[extend]
useDefault = true

[allowlist]
description = "Balanced allowlist"

# Common non-secret paths
paths = [
    '''test/fixtures/.*''',
    '''spec/fixtures/.*''',
    '''.*\.md$''',
    '''docs/.*''',
    '''examples/.*''',
    '''vendor/.*''',
    '''node_modules/.*''',
]

# Common placeholders
stopwords = [
    "example",
    "placeholder",
    "your_key_here",
    "replace_me",
    "changeme",
    "test",
    "dummy",
    "mock",
]

# Public non-secrets
regexes = [
    '''-----BEGIN CERTIFICATE-----''',
    '''-----BEGIN PUBLIC KEY-----''',
    '''data:image/[^;]+;base64,''',
]
```

## Best Practices

### 1. Document Allowlist Decisions

Always add comments explaining why patterns are allowlisted:

```toml
[allowlist]
description = "Verified false positives - reviewed 2024-01-15"

# Test fixtures created during initial test suite development
# Contains only example credentials for testing credential validation
paths = ['''test/fixtures/credentials\.json''']

# Documentation examples using clearly fake values
# All examples prefixed with "example_" or "test_"
stopwords = ["example_", "test_"]
```

### 2. Regular Allowlist Review

Schedule periodic reviews:

```bash
#!/bin/bash
# review-allowlist.sh

echo "Gitleaks Allowlist Review"
echo "========================="
echo ""

# Show allowlist paths
echo "Allowlisted paths:"
grep -A 10 "^\[allowlist\]" .gitleaks.toml | grep "paths = "

# Show allowlisted commits
echo ""
echo "Allowlisted commits:"
grep -A 10 "^\[allowlist\]" .gitleaks.toml | grep "commits = "

# Check if commits still exist
# (May have been removed in history rewrite)
git rev-parse --verify abc123def456 2>/dev/null || echo "WARNING: Commit abc123def456 not found"
```

### 3. Use Inline Annotations Sparingly

For one-off false positives, use an inline comment on the same line as the flagged value:

```python
# This is a test password for unit tests only
TEST_PASSWORD = "test_password_123"  # gitleaks:allow
```

**Warning**: Overuse of inline annotations indicates a poorly tuned configuration.

### 4. Version Control Your Configuration

Track changes to `.gitleaks.toml`:

```bash
git log -p .gitleaks.toml

# See who allowlisted what and when
git blame .gitleaks.toml
```

### 5. Test Allowlist Changes

Before committing allowlist changes:

```bash
# Test configuration
gitleaks detect --config .gitleaks.toml -v

# Verify specific file is now allowed (--no-git scans the file itself, not git history)
gitleaks detect --config .gitleaks.toml --source test/fixtures/credentials.json --no-git

# Verify secret is still caught in production code
echo 'api_key = "sk_live_actual_key"' > /tmp/test_detection.py
gitleaks detect --config .gitleaks.toml --source /tmp/test_detection.py --no-git
```

### 6. Separate Allowlists by Environment

Use different configurations for different contexts:

```bash
# Strict config for production code (--no-git scans the directory contents, not git history)
gitleaks detect --config .gitleaks.strict.toml --source src/ --no-git

# Lenient config for test code
gitleaks detect --config .gitleaks.lenient.toml --source test/ --no-git
```

### 7. Monitor False Positive Rate

Track metrics over time:

```bash
# Total findings (run this baseline where no .gitleaks.toml is auto-loaded;
# without --config, Gitleaks falls back to one in the repo root)
gitleaks detect --report-format json --report-path /tmp/all.json 2>/dev/null
TOTAL=$(jq 'length' /tmp/all.json)

# Run with allowlist
gitleaks detect --config .gitleaks.toml --report-format json --report-path /tmp/filtered.json 2>/dev/null
AFTER_FILTER=$(jq 'length' /tmp/filtered.json)

# Calculate reduction
echo "False positive reduction: $((TOTAL - AFTER_FILTER)) / $TOTAL"
```

**Target**: < 10% false positive rate for good developer experience.

### 8. Security Review for New Allowlists

Require security team approval for:

- New allowlisted paths in `src/` or production code
- New allowlisted commits (verify manually first)
- Changes to rule-specific allowlists
- New stopwords that could mask real secrets

### 9. Avoid Overly Broad Patterns

**Bad** (too broad):

```toml
[allowlist]
paths = ['''.*''']              # Disables all detection!
stopwords = ["key", "secret"]   # Matches too many real secrets
```

**Good** (specific):

```toml
[allowlist]
paths = ['''test/unit/.*\.test\.js$''']       # Specific test directory
stopwords = ["example_key", "test_secret"]    # Specific placeholders
```

### 10. Escape Special Characters

When using regex patterns, escape properly:

```toml
[allowlist]
regexes = [
    '''api\.example\.com''',       # Literal dot
    '''config\[\'key\'\]''',       # Literal brackets and quotes
]
```

## Troubleshooting False Positives

### Issue: Can't Identify Source of False Positive

```bash
# Run with verbose output
gitleaks detect -v | grep "RuleID"

# Get detailed finding information (the report is written to --report-path)
gitleaks detect --report-format json --report-path findings.json
jq '.[] | {file: .File, line: .StartLine, rule: .RuleID}' findings.json

# View context around the first detection
FILE=$(jq -r '.[0].File' findings.json)
LINE=$(jq -r '.[0].StartLine' findings.json)
sed -n "$(( LINE > 5 ? LINE - 5 : 1 )),$(( LINE + 5 ))p" "$FILE"
```

### Issue: Allowlist Not Working

```bash
# Verify config is loaded
gitleaks detect --config .gitleaks.toml -v 2>&1 | grep "config"

# Check regex syntax (approximation: Gitleaks uses Go RE2 syntax, so constructs
# such as (?i) or \s may behave differently under grep -E)
echo "test_string" | grep -E 'your_regex_pattern'

# Test path matching
echo "test/fixtures/file.json" | grep -E 'test/fixtures/.*'
```

### Issue: Too Many False Positives

1. **Export findings**: `gitleaks detect --report-format json --report-path findings.json`
2. **Analyze patterns**: `jq -r '.[].File' findings.json | sort | uniq -c | sort -rn`
3. **Group by rule**: `jq -r '.[].RuleID' findings.json | sort | uniq -c | sort -rn`
4. **Create targeted allowlists** based on analysis

## False Positive vs Real Secret

When unsure, err on the side of caution:

| Indicator | False Positive | Real Secret |
|-----------|----------------|-------------|
| Location | Test/docs/examples | Production code |
| Pattern | "example", "test", "mock" | No such indicators |
| Entropy | Low/medium | High |
| Format | Incomplete/truncated | Complete/valid |
| Context | Educational comments | Functional code |
| Git history | Added in test commits | Added furtively |

**When in doubt**: Treat as real secret and investigate.
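
The location and pattern indicators in the table above can be turned into a rough first-pass triage over an exported report. The following is a minimal sketch, not part of Gitleaks itself: `PLACEHOLDER_WORDS`, `NON_PROD_PATH`, `triage`, and `summarize` are hypothetical names, and the word and path lists are illustrative assumptions. It reads the `File` and `RuleID` fields used in the `jq` commands earlier and additionally assumes each finding carries a `Secret` field, which may be absent or redacted in some reports.

```python
import json
import re
import sys
from collections import Counter

# Hypothetical indicator lists, loosely based on the table above.
PLACEHOLDER_WORDS = ("example", "test", "mock", "dummy", "placeholder")
NON_PROD_PATH = re.compile(r"(^|/)(tests?|docs|examples|fixtures)/")


def triage(finding):
    """Label one finding as 'likely-false-positive' or 'investigate'."""
    path = finding.get("File", "")
    # Assumption: the report includes the matched secret under "Secret".
    secret = finding.get("Secret", "").lower()
    in_non_prod_path = bool(NON_PROD_PATH.search(path))
    has_placeholder = any(word in secret for word in PLACEHOLDER_WORDS)
    # Require both indicators; a single weak signal is not enough.
    if in_non_prod_path and has_placeholder:
        return "likely-false-positive"
    return "investigate"


def summarize(findings):
    """Count findings per (rule ID, triage label) pair."""
    return Counter((f.get("RuleID", "unknown"), triage(f)) for f in findings)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python triage.py findings.json
    with open(sys.argv[1]) as fh:
        report = json.load(fh)
    for (rule, label), count in sorted(summarize(report).items()):
        print(f"{count:5d}  {rule:30s}  {label}")
```

Anything labeled `investigate` still needs manual review, and a `likely-false-positive` label is only a hint for where a targeted allowlist entry might be justified. Requiring both indicators keeps the sketch aligned with the advice to treat doubtful findings as real secrets.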