gh-agentsecops-secopsagentkit/skills/devsecops/secrets-gitleaks/references/false_positives.md

# False Positives Management

Strategies for managing false positives in Gitleaks secret detection.

## Table of Contents

- [Understanding False Positives](#understanding-false-positives)
- [Allowlist Strategies](#allowlist-strategies)
- [Common False Positive Patterns](#common-false-positive-patterns)
- [Configuration Examples](#configuration-examples)
- [Best Practices](#best-practices)

## Understanding False Positives

False positives occur when legitimate code patterns match secret detection rules.

### Categories of False Positives

1. **Example/Placeholder Values**: Documentation and examples using fake credentials
2. **Test Fixtures**: Test data with credential-like patterns
3. **Non-Secret Constants**: Configuration values that match patterns but aren't sensitive
4. **Generated Code**: Auto-generated code with high-entropy strings
5. **Comments and Documentation**: Explanatory text matching patterns

### Impact Assessment

Before allowlisting, verify it's truly a false positive:

```bash
# Extract the flagged value
echo "api_key_here" | base64  # Check if valid encoding
curl -H "Authorization: Bearer <token>" https://api.service.com/test  # Test if active

# Check git history for when added
git log -p --all -S "flagged_value"

# Review context around detection
git show <commit-sha>:<file-path>
```

## Allowlist Strategies

### 1. Path-Based Allowlisting

Exclude entire directories or file patterns:

```toml
[allowlist]
description = "Exclude test and documentation files"
paths = [
  '''test/.*''',              # All test directories
  '''tests/.*''',             # Alternative test directory name
  '''.*/fixtures/.*''',       # Test fixtures anywhere
  '''examples/.*''',          # Example code
  '''docs/.*''',              # Documentation
  '''.*\.md$''',              # Markdown files
  '''.*\.rst$''',             # ReStructuredText files
  '''.*_test\.go$''',         # Go test files
  '''.*\.test\.js$''',        # JavaScript test files
  '''.*\.spec\.ts$''',        # TypeScript spec files
]
```

### 2. Stopword Allowlisting

Filter out known placeholder values:

```toml
[allowlist]
description = "Common placeholder values"
stopwords = [
  "example",
  "placeholder",
  "your_api_key_here",
  "your_secret_here",
  "REPLACEME",
  "CHANGEME",
  "xxxxxx",
  "000000",
  "123456",
  "abcdef",
  "sample",
  "dummy",
  "fake",
  "test_key",
  "mock_token",
]
```

### 3. Commit-Based Allowlisting

Allowlist specific commits after manual verification:

```toml
[allowlist]
description = "Verified false positives"
commits = [
  "a1b2c3d4e5f6",  # Initial test fixtures - verified 2024-01-15
  "f6e5d4c3b2a1",  # Documentation examples - verified 2024-01-16
]
```

Add comment explaining why each commit is allowlisted.

### 4. Regex Allowlisting

Allowlist specific patterns:

```toml
[allowlist]
description = "Pattern-based allowlist"
regexes = [
  '''example_api_key_[0-9]+''',            # Example keys with numeric suffix
  '''key\s*=\s*["']EXAMPLE["']''',         # Explicitly marked examples
  '''(?i)test_?password_?[0-9]*''',        # Test passwords
  '''(?i)dummy.*secret''',                 # Dummy secrets
]
```

### 5. Rule-Specific Allowlisting

Create exceptions for specific rules only:

```toml
[[rules]]
id = "generic-api-key"
description = "Generic API Key"
regex = '''(?i)api_key\s*=\s*["']([a-zA-Z0-9]{32})["']'''

[rules.allowlist]
description = "Allow generic API key pattern in specific contexts"
paths = ['''config/defaults\.yaml''']
regexes = ['''api_key\s*=\s*["']example''']
```

### 6. Global vs Rule Allowlists

Global allowlists override rule-specific ones:

```toml
# Global allowlist - highest precedence
[allowlist]
description = "Organization-wide exceptions"
paths = ['''vendor/''', '''node_modules/''']

# Rule-specific allowlist
[[rules]]
id = "custom-secret"
[rules.allowlist]
description = "Exceptions only for this rule"
paths = ['''config/template\.yml''']
```

## Common False Positive Patterns

### 1. Documentation Examples

**Problem**: README and documentation contain example credentials.

**Solution**:
```toml
[allowlist]
paths = [
  '''README\.md$''',
  '''CONTRIBUTING\.md$''',
  '''docs/.*\.md$''',
  '''.*\.example$''',           # .env.example files
  '''.*\.template$''',          # Template files
  '''.*\.sample$''',            # Sample configurations
]

stopwords = [
  "example.com",
  "user@example.org",
  "YOUR_API_KEY",
]
```

### 2. Test Fixtures

**Problem**: Test data contains credential-like strings for testing credential handling.

**Solution**:
```toml
[allowlist]
paths = [
  '''test/fixtures/.*''',
  '''spec/fixtures/.*''',
  '''.*/testdata/.*''',         # Go convention
  '''.*/mocks/.*''',
  '''cypress/fixtures/.*''',    # Cypress test data
]

# Or use inline comments in code
# password = "test_password_123"  # gitleaks:allow
```

### 3. Generated Code

**Problem**: Code generators produce high-entropy identifiers.

**Solution**:
```toml
[allowlist]
description = "Generated code"
paths = [
  '''.*\.pb\.go$''',            # Protocol buffer generated code
  '''.*_generated\..*''',        # Generated file marker
  '''node_modules/.*''',         # Dependencies
  '''vendor/.*''',               # Vendored dependencies
  '''dist/.*''',                 # Build output
  '''build/.*''',
]
```

### 4. Configuration Templates

**Problem**: Config templates with placeholder values match patterns.

**Solution**:
```toml
[allowlist]
paths = [
  '''config/.*\.template''',
  '''templates/.*''',
  '''.*\.tpl$''',
  '''.*\.tmpl$''',
]

stopwords = [
  "REPLACE_WITH_YOUR",
  "CONFIGURE_ME",
  "SET_THIS_VALUE",
]
```

### 5. Base64 Encoded Strings

**Problem**: Non-secret base64 data flagged due to high entropy.

**Solution**:
```toml
# Increase entropy threshold to reduce false positives
[[rules]]
id = "high-entropy-base64"
regex = '''[a-zA-Z0-9+/]{40,}={0,2}'''
entropy = 5.5  # Increase from default 4.5
```

Or allowlist specific patterns:
```toml
[allowlist]
regexes = [
  '''data:image/[^;]+;base64,''',  # Base64 encoded images
  '''-----BEGIN CERTIFICATE-----''',  # Public certificates (not private keys)
]
```

### 6. Public Keys and Certificates

**Problem**: Public keys detected (which are not secrets).

**Solution**:
```toml
[allowlist]
regexes = [
  '''-----BEGIN PUBLIC KEY-----''',
  '''-----BEGIN CERTIFICATE-----''',
  '''-----BEGIN X509 CERTIFICATE-----''',
]

# But DO NOT allowlist:
# -----BEGIN PRIVATE KEY-----
# -----BEGIN RSA PRIVATE KEY-----
```

### 7. UUIDs and Identifiers

**Problem**: UUIDs match high-entropy patterns.

**Solution**:
```toml
[allowlist]
regexes = [
  '''[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}''',  # UUID
  '''[0-9a-f]{24}''',  # MongoDB ObjectId
]
```

Or adjust entropy detection:
```toml
[[rules]]
id = "generic-high-entropy"
entropy = 6.0  # Only flag very high entropy
```

## Configuration Examples

### Minimal Configuration

Start with broad allowlists, refine over time:

```toml
title = "Minimal Gitleaks Configuration"

[extend]
useDefault = true  # Use all built-in rules

[allowlist]
description = "Broad allowlist for initial rollout"
paths = [
  '''test/.*''',
  '''.*\.md$''',
  '''vendor/.*''',
  '''node_modules/.*''',
]

stopwords = [
  "example",
  "test",
  "mock",
  "dummy",
]
```

### Strict Configuration

Minimize false positives with targeted allowlists:

```toml
title = "Strict Gitleaks Configuration"

[extend]
useDefault = true

[allowlist]
description = "Minimal allowlist - verify all exceptions"

# Only allow specific known false positives
paths = [
  '''docs/api-examples\.md''',     # API documentation with examples
  '''test/fixtures/auth\.json''',  # Authentication test fixtures
]

# Specific known placeholder values
stopwords = [
  "YOUR_API_KEY_HERE",
  "sk_test_example_key_123456789",
]

# Manually verified commits
commits = [
  "abc123def456",  # Test fixtures added - verified 2024-01-15 by security@company.com
]
```

### Balanced Configuration

Balance detection sensitivity with operational overhead:

```toml
title = "Balanced Gitleaks Configuration"

[extend]
useDefault = true

[allowlist]
description = "Balanced allowlist"

# Common non-secret paths
paths = [
  '''test/fixtures/.*''',
  '''spec/fixtures/.*''',
  '''.*\.md$''',
  '''docs/.*''',
  '''examples/.*''',
  '''vendor/.*''',
  '''node_modules/.*''',
]

# Common placeholders
stopwords = [
  "example",
  "placeholder",
  "your_key_here",
  "replace_me",
  "changeme",
  "test",
  "dummy",
  "mock",
]

# Public non-secrets
regexes = [
  '''-----BEGIN CERTIFICATE-----''',
  '''-----BEGIN PUBLIC KEY-----''',
  '''data:image/[^;]+;base64,''',
]
```

## Best Practices

### 1. Document Allowlist Decisions

Always add comments explaining why patterns are allowlisted:

```toml
[allowlist]
description = "Verified false positives - reviewed 2024-01-15"

# Test fixtures created during initial test suite development
# Contains only example credentials for testing credential validation
paths = ['''test/fixtures/credentials\.json''']

# Documentation examples using clearly fake values
# All examples prefixed with "example_" or "test_"
stopwords = ["example_", "test_"]
```

### 2. Regular Allowlist Review

Schedule periodic reviews:

```bash
#!/bin/bash
# review-allowlist.sh

echo "Gitleaks Allowlist Review"
echo "========================="
echo ""

# Show allowlist paths
echo "Allowlisted paths:"
grep -A 10 "^\[allowlist\]" .gitleaks.toml | grep "paths = "

# Show allowlisted commits
echo ""
echo "Allowlisted commits:"
grep -A 10 "^\[allowlist\]" .gitleaks.toml | grep "commits = "

# Check if commits still exist
# (May have been removed in history rewrite)
git rev-parse --verify abc123def456 2>/dev/null || echo "WARNING: Commit abc123def456 not found"
```

### 3. Use Inline Annotations Sparingly

For one-off false positives, use inline comments:

```python
# This is a test password for unit tests only
# gitleaks:allow
TEST_PASSWORD = "test_password_123"
```

**Warning**: Overuse of inline annotations indicates poorly tuned configuration.

### 4. Version Control Your Configuration

Track changes to `.gitleaks.toml`:

```bash
git log -p .gitleaks.toml

# See who allowlisted what and when
git blame .gitleaks.toml
```

### 5. Test Allowlist Changes

Before committing allowlist changes:

```bash
# Test configuration
gitleaks detect --config .gitleaks.toml -v

# Verify specific file is now allowed
gitleaks detect --config .gitleaks.toml --source test/fixtures/credentials.json

# Verify secret is still caught in production code
echo 'api_key = "sk_live_actual_key"' > /tmp/test_detection.py
gitleaks detect --config .gitleaks.toml --source /tmp/test_detection.py --no-git
```

### 6. Separate Allowlists by Environment

Use different configurations for different contexts:

```bash
# Strict config for production code
gitleaks detect --config .gitleaks.strict.toml --source src/

# Lenient config for test code
gitleaks detect --config .gitleaks.lenient.toml --source test/
```

### 7. Monitor False Positive Rate

Track metrics over time:

```bash
# Total findings
TOTAL=$(gitleaks detect --report-format json 2>/dev/null | jq '. | length')

# Run with allowlist
AFTER_FILTER=$(gitleaks detect --config .gitleaks.toml --report-format json 2>/dev/null | jq '. | length')

# Calculate reduction
echo "False positive reduction: $(($TOTAL - $AFTER_FILTER)) / $TOTAL"
```

**Target**: < 10% false positive rate for good developer experience.

### 8. Security Review for New Allowlists

Require security team approval for:
- New allowlisted paths in `src/` or production code
- New allowlisted commits (verify manually first)
- Changes to rule-specific allowlists
- New stopwords that could mask real secrets

### 9. Avoid Overly Broad Patterns

**Bad** (too broad):
```toml
[allowlist]
paths = ['''.*''']  # Disables all detection!
stopwords = ["key", "secret"]  # Matches too many real secrets
```

**Good** (specific):
```toml
[allowlist]
paths = ['''test/unit/.*\.test\.js$''']  # Specific test directory
stopwords = ["example_key", "test_secret"]  # Specific placeholders
```

### 10. Escape Special Characters

When using regex patterns, escape properly:

```toml
[allowlist]
regexes = [
  '''api\.example\.com''',  # Literal dot
  '''config\[\'key\'\]''',  # Literal brackets and quotes
]
```

## Troubleshooting False Positives

### Issue: Can't Identify Source of False Positive

```bash
# Run with verbose output
gitleaks detect -v | grep "RuleID"

# Get detailed finding information
gitleaks detect --report-format json | jq '.[] | {file: .File, line: .StartLine, rule: .RuleID}'

# View context around detection
gitleaks detect --report-format json | jq -r '.[0] | .File, .StartLine' | xargs -I {} sh -c 'sed -n "{}-5,{}+5p" {}'
```

### Issue: Allowlist Not Working

```bash
# Verify config is loaded
gitleaks detect --config .gitleaks.toml -v 2>&1 | grep "config"

# Check regex syntax
echo "test_string" | grep -E 'your_regex_pattern'

# Test path matching
echo "test/fixtures/file.json" | grep -E 'test/fixtures/.*'
```

### Issue: Too Many False Positives

1. **Export findings**: `gitleaks detect --report-format json > findings.json`
2. **Analyze patterns**: `jq -r '.[].File' findings.json | sort | uniq -c | sort -rn`
3. **Group by rule**: `jq -r '.[].RuleID' findings.json | sort | uniq -c | sort -rn`
4. **Create targeted allowlists** based on analysis

## False Positive vs Real Secret

When unsure, err on the side of caution:

| Indicator | False Positive | Real Secret |
|-----------|----------------|-------------|
| Location | Test/docs/examples | Production code |
| Pattern | "example", "test", "mock" | No such indicators |
| Entropy | Low/medium | High |
| Format | Incomplete/truncated | Complete/valid |
| Context | Educational comments | Functional code |
| Git history | Added in test commits | Added furtively |

**When in doubt**: Treat as real secret and investigate.