---
name: pii-redaction
description: Automatically applies when logging sensitive data. Ensures PII (phone numbers, emails, IDs, payment data) is redacted in all logs and outputs for compliance.
---

# PII Redaction Enforcer

When you are writing code that logs or outputs:
- Phone numbers (mobile, landline)
- Email addresses
- User IDs / Subscriber IDs
- Payment tokens / Credit card numbers
- Social Security Numbers / Tax IDs
- Physical addresses
- Names (in sensitive contexts)
- IP addresses (in some jurisdictions)
- Any other personally identifiable information

**Always redact PII in logs, error messages, and debug output.**

## ✅ Correct Pattern

```python
import logging
import re

logger = logging.getLogger(__name__)

def redact_pii(value: str, show_last: int = 4) -> str:
    """
    Redact PII showing only last N characters.

    Args:
        value: String to redact
        show_last: Number of characters to show (default: 4)

    Returns:
        Redacted string (e.g., "***5678")
    """
    if not value or len(value) <= show_last:
        return "***"
    return f"***{value[-show_last:]}"

def redact_email(email: str) -> str:
    """
    Redact email preserving domain.

    Args:
        email: Email address to redact

    Returns:
        Redacted email (e.g., "u***r@example.com")
    """
    if not email or '@' not in email:
        return "***"

    local, domain = email.split('@', 1)
    if len(local) <= 2:
        return f"***@{domain}"

    return f"{local[0]}***{local[-1]}@{domain}"

# Usage in logging
logger.info(f"Processing user | subscriber_id={redact_pii(subscriber_id)}")
logger.info(f"Payment received | token={redact_pii(payment_token)} | user={redact_email(email)}")
logger.info(f"Order created | phone={redact_pii(phone_number)}")
```

**Output:**
```
Processing user | subscriber_id=***5678
Payment received | token=***2344 | user=u***r@example.com
Order created | phone=***1234
```

## ❌ Incorrect Pattern (PII Leak)

```python
# ❌ Full PII exposed
logger.info(f"Processing subscriber_id: {subscriber_id}")
logger.info(f"User email: {email}")
logger.info(f"Phone: {phone_number}")

# ❌ PII in error messages
raise ValueError(f"Invalid subscriber ID: {subscriber_id}")

# ⚠️ PII in API responses: acceptable for an authorized client,
# but any log entry for this error must use the redacted value
return {"error": f"User {email} not found"}

# ❌ Insufficient redaction
logger.info(f"Phone: {phone_number[:3]}****")  # Shows area code - still PII!
```

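The `raise ValueError(...)` case above can be handled the same way as a log line. A minimal sketch, assuming the `redact_pii` helper from the Correct Pattern section (repeated here so the block runs on its own); `validate_subscriber_id` and its digits-only rule are hypothetical:

```python
def redact_pii(value: str, show_last: int = 4) -> str:
    """Same helper as in the Correct Pattern section."""
    if not value or len(value) <= show_last:
        return "***"
    return f"***{value[-show_last:]}"


def validate_subscriber_id(subscriber_id: str) -> str:
    """Hypothetical validator; the digits-only rule is an assumption."""
    if not subscriber_id.isdigit():
        # Exception messages often end up in logs and tracebacks,
        # so the message carries only the redacted form.
        raise ValueError(f"Invalid subscriber ID: {redact_pii(subscriber_id)}")
    return subscriber_id


try:
    validate_subscriber_id("sub-12345678")
except ValueError as exc:
    message = str(exc)  # "Invalid subscriber ID: ***5678"
```

The caller still holds the full value if it needs it; only the text that travels with the exception is redacted.
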
## Redaction Helper Functions

```python
import logging
import re

logger = logging.getLogger(__name__)

class PIIRedactor:
    """Helper class for consistent PII redaction."""

    @staticmethod
    def phone(phone: str, show_last: int = 4) -> str:
        """Redact phone number."""
        # `or ''` keeps None from raising inside re.sub
        digits = re.sub(r'\D', '', phone or '')
        if len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def email(email: str) -> str:
        """Redact email preserving domain."""
        if not email or '@' not in email:
            return "***"

        local, domain = email.split('@', 1)
        if len(local) <= 2:
            return f"***@{domain}"

        return f"{local[0]}***{local[-1]}@{domain}"

    @staticmethod
    def card_number(card: str, show_last: int = 4) -> str:
        """Redact credit card number."""
        digits = re.sub(r'\D', '', card or '')
        if len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def ssn(ssn: str) -> str:
        """Redact SSN completely."""
        return "***-**-****"

    @staticmethod
    def address(address: str) -> str:
        """Redact street address, keep city/state."""
        # Example: "123 Main St, Boston, MA" -> "*** Main St, Boston, MA"
        if not address:
            return "***"
        parts = address.split(',')
        # Redact house number but keep street name
        parts[0] = re.sub(r'^\d+', '***', parts[0])
        return ','.join(parts)

    @staticmethod
    def generic(value: str, show_last: int = 4) -> str:
        """Generic redaction for any string."""
        if not value or len(value) <= show_last:
            return "***"
        return f"***{value[-show_last:]}"

# Usage
redactor = PIIRedactor()
logger.info(f"User phone: {redactor.phone('555-123-4567')}")
logger.info(f"User email: {redactor.email('john.doe@example.com')}")
logger.info(f"Card: {redactor.card_number('4111-1111-1111-1111')}")
```

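IP addresses appear in the list at the top of this document but have no helper in `PIIRedactor`. One possible approach, sketched here with the standard-library `ipaddress` module, is to keep only the network prefix. The `redact_ip` name and the /24 and /48 prefix lengths are assumptions, not requirements taken from any regulation:

```python
import ipaddress


def redact_ip(ip: str) -> str:
    """Hypothetical helper: keep only the network part of an IP address.

    Assumed policy: keep the /24 prefix for IPv4 and the /48 prefix for IPv6.
    """
    try:
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 48
        # strict=False lets us pass a host address and get its network back
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    except ValueError:
        # Not a parseable address: reveal nothing
        return "***"
    return f"{network.network_address}/{prefix}"


print(redact_ip("192.168.10.57"))        # 192.168.10.0/24
print(redact_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::/48
print(redact_ip("not-an-ip"))            # ***
```

Truncating to a prefix keeps coarse network information for debugging while dropping the host part; whether that is enough depends on your jurisdiction and threat model.
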
## When to Redact

**Always redact in:**
- ✅ Log statements (`logger.info`, `logger.debug`, `logger.error`)
- ✅ Audit logs
- ✅ Error messages (logs)
- ✅ Monitoring/observability data (OTEL, metrics)
- ✅ Debug output
- ✅ Stack traces (if they contain PII)

**Don't redact in:**
- ❌ Actual API calls (need full data)
- ❌ Database queries (need full data)
- ❌ Function parameters (internal use)
- ❌ Return values to authorized clients
- ❌ Encrypted storage
- ❌ Secure internal processing

**Context matters:**
```python
# ❌ Bad: Redacting in business logic
def send_email(email: str):
    send_to(redact_email(email))  # Wrong! Need real email to send!

# ✅ Good: Redacting in logs
def send_email(email: str):
    logger.info(f"Sending email to {redact_email(email)}")  # Log: redacted
    send_to(email)  # Actual operation: full email
```

## Structured Logging with Redaction

```python
import structlog

# PIIRedactor is the helper class from "Redaction Helper Functions" above.
# Each PII field maps to its own redactor, so an SSN is fully masked
# instead of exposing its last four digits through the generic helper.
PII_REDACTORS = {
    "email": PIIRedactor.email,
    "phone": PIIRedactor.phone,
    "ssn": PIIRedactor.ssn,
    "card_number": PIIRedactor.card_number,
    "subscriber_id": PIIRedactor.generic,
}

# Configure structlog with redaction
def redact_processor(logger, method_name, event_dict):
    """Processor to redact PII fields."""
    for field, redact in PII_REDACTORS.items():
        if field in event_dict:
            event_dict[field] = redact(event_dict[field])

    return event_dict

structlog.configure(
    processors=[
        redact_processor,
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()

# Usage - automatic redaction
logger.info("user_action", email="user@example.com", phone="555-1234")
# Output (key order may differ):
# {"email": "u***r@example.com", "phone": "***1234", "event": "user_action"}
```

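For code that uses the standard-library `logging` module rather than structlog, a `logging.Filter` can act as a similar safety net. The sketch below masks anything that looks like an email address in the rendered message. `EmailScrubFilter`, the logger name, and the regex are illustrative assumptions, and the pattern is intentionally simple:

```python
import logging
import re

# Deliberately simple pattern (an assumption), not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EmailScrubFilter(logging.Filter):
    """Hypothetical safety-net filter that masks emails in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so %-style args are covered too.
        message = record.getMessage()
        record.msg = EMAIL_RE.sub("***@***", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record; only its text changes


logger = logging.getLogger("pii_demo")
logger.addFilter(EmailScrubFilter())
logger.warning("Failed for %s", "john.doe@example.com")  # logs "Failed for ***@***"
```

A filter attached to a logger only sees records logged through that logger; attaching it to a handler covers every record the handler emits. It does not touch traceback text, so explicit redaction at the call site remains the primary defense.
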
## Compliance Considerations

**GDPR (EU):**
- Personal data must be protected
- Logs are considered data processing
- Must have legal basis for logging PII
- Right to be forgotten applies to logs

**CCPA (California):**
- Consumer personal information must be protected
- Includes identifiers, browsing history, biometric data

**PCI-DSS (Payment Cards):**
- Never log full card numbers (PAN)
- Never log CVV/CVV2
- Cardholder name is sensitive too
- Mask PAN when displayed (show last 4 only)

**HIPAA (Healthcare):**
- PHI (Protected Health Information) must be secured
- Includes names, addresses, dates, phone numbers
- Also includes medical record numbers and device identifiers

```python
from decimal import Decimal

# PCI-DSS compliant logging
# `payment_gateway` stands in for your payment client; `redact_pii` and
# `logger` come from the Correct Pattern section.
def process_payment(card_number: str, cvv: str, amount: Decimal):
    """
    Process payment transaction.

    Security Note:
        PCI-DSS compliant. Never logs full card number or CVV.
    """
    # ✅ Redact card number in logs
    # ✅ NEVER log CVV
    logger.info(
        f"Processing payment | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount}"
    )

    # Process with full data
    result = payment_gateway.charge(card_number, cvv, amount)

    # ✅ Redact in audit log
    logger.info(
        f"AUDIT: Payment processed | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount} | "
        f"status={result.status}"
    )

    return result
```

## Testing Redaction

```python
import logging

import pytest

# redact_pii and redact_email are the helpers from the Correct Pattern section.
logger = logging.getLogger(__name__)

def test_redact_phone():
    """Test phone redaction."""
    assert redact_pii("5551234567") == "***4567"
    assert redact_pii("555-123-4567") == "***4567"

def test_redact_email():
    """Test email redaction."""
    assert redact_email("user@example.com") == "u***r@example.com"
    assert redact_email("a@example.com") == "***@example.com"

def test_redact_empty():
    """Test empty string handling."""
    assert redact_pii("") == "***"
    assert redact_pii(None) == "***"

def test_log_redaction(caplog):
    """Test that logs contain redacted PII."""
    with caplog.at_level(logging.INFO):
        user_id = "user_12345678"
        logger.info(f"Processing user={redact_pii(user_id)}")

    assert "***5678" in caplog.text
    assert "user_12345678" not in caplog.text
```

## ❌ Anti-Patterns

```python
# ❌ Not redacting at all
logger.info(f"User {user_id} logged in from {ip_address}")

# ❌ Inconsistent redaction
logger.info(f"User: ***{user_id[-4:]}")  # Different format each time
logger.debug(f"User: {user_id[:4]}****")

# ❌ Insufficient redaction
logger.info(f"Email: {email[0]}****")  # Just first letter isn't enough

# ❌ Redacting too much (lost debuggability)
logger.info("User logged in")  # No identifier at all - can't debug!

# ✅ Better: Redact but keep debuggability
logger.info(f"User logged in | user_id={redact_pii(user_id)}")

# ❌ Forgetting error messages
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for {email}: {e}")  # PII in error log!

# ✅ Better
try:
    process_user(email)
except Exception as e:
    # The exception text can itself contain PII; check what process_user raises
    logger.error(f"Failed for {redact_email(email)}: {e}")
```

## Best Practices Checklist

- ✅ Create consistent redaction helper functions
- ✅ Redact all PII in logs automatically
- ✅ Show last 4 characters for debugging
- ✅ Preserve enough info for support/debugging
- ✅ Use structured logging with auto-redaction
- ✅ Document PII handling in Security Notes
- ✅ Test that redaction works correctly
- ✅ Train team on PII identification
- ✅ Regular audit of logs for PII leaks
- ✅ Consider legal requirements (GDPR, CCPA, PCI-DSS)

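The "regular audit" item can be partly automated. A rough sketch of a scanner that flags log lines containing unredacted values follows. `find_pii` and the three patterns are hypothetical and will produce false positives and misses on real data:

```python
import re
from typing import Iterable, Iterator, Tuple

# Illustrative patterns only (assumptions); tune them to your own data formats.
PII_PATTERNS = {
    # The lookbehind skips already-redacted emails such as "u***r@example.com"
    "email": re.compile(
        r"(?<![\w.%+*-])[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
    ),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_pii(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, pattern_name) for every suspected PII match."""
    for number, line in enumerate(lines, start=1):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                yield number, name


sample = [
    "Payment received | user=u***r@example.com | card=***1111",
    "Failed for john.doe@example.com",
    "card=4111-1111-1111-1111",
]
for number, name in find_pii(sample):
    print(f"line {number}: possible {name}")
```

Running something like this over a sample of production logs in CI or on a schedule gives an early signal when a new log statement bypasses the redaction helpers.
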
## Auto-Apply

When you see logging of PII, automatically:
1. Wrap with appropriate redaction function
2. Show last 4 characters for debugging
3. Add Security Note to docstring if handling PII
4. Use consistent redaction format
5. Test that PII is not in logs

## Related Skills

- structured-errors - Redact PII in error responses
- docstring-format - Document PII handling with Security Note
- tool-design-pattern - Redact PII in tool logs
- pytest-patterns - Test PII redaction