Initial commit
.claude/skills/pii-redaction/SKILL.md
@@ -0,0 +1,369 @@
---
name: pii-redaction
description: Automatically applies when logging sensitive data. Ensures PII (phone numbers, emails, IDs, payment data) is redacted in all logs and outputs for compliance.
---

# PII Redaction Enforcer

When you are writing code that logs or outputs:
- Phone numbers (mobile, landline)
- Email addresses
- User IDs / Subscriber IDs
- Payment tokens / Credit card numbers
- Social Security Numbers / Tax IDs
- Physical addresses
- Names (in sensitive contexts)
- IP addresses (in some jurisdictions)
- Any other personally identifiable information

**Always redact PII in logs, error messages, and debug output.**

## ✅ Correct Pattern

```python
import logging
import re

logger = logging.getLogger(__name__)

def redact_pii(value: str, show_last: int = 4) -> str:
    """
    Redact PII showing only last N characters.

    Args:
        value: String to redact
        show_last: Number of characters to show (default: 4)

    Returns:
        Redacted string (e.g., "***5678")
    """
    # show_last <= 0 must be rejected: value[-0:] would return the whole string.
    if not value or show_last <= 0 or len(value) <= show_last:
        return "***"
    return f"***{value[-show_last:]}"

def redact_email(email: str) -> str:
    """
    Redact email preserving domain.

    Args:
        email: Email address to redact

    Returns:
        Redacted email (e.g., "u***r@example.com")
    """
    if not email or '@' not in email:
        return "***"

    local, domain = email.split('@', 1)
    if len(local) <= 2:
        return f"***@{domain}"

    return f"{local[0]}***{local[-1]}@{domain}"

# Usage in logging
logger.info(f"Processing user | subscriber_id={redact_pii(subscriber_id)}")
logger.info(f"Payment received | token={redact_pii(payment_token)} | user={redact_email(email)}")
logger.info(f"Order created | phone={redact_pii(phone_number)}")
```

**Output:**
```
Processing user | subscriber_id=***5678
Payment received | token=***2344 | user=u***r@example.com
Order created | phone=***1234
```

## ❌ Incorrect Pattern (PII Leak)

```python
# ❌ Full PII exposed
logger.info(f"Processing subscriber_id: {subscriber_id}")
logger.info(f"User email: {email}")
logger.info(f"Phone: {phone_number}")

# ❌ PII in error messages
raise ValueError(f"Invalid subscriber ID: {subscriber_id}")

# ❌ PII in API responses copied into logs (redact in the log, not the response)
return {"error": f"User {email} not found"}  # OK in the response, but any log of it must redact the email

# ❌ Insufficient redaction
logger.info(f"Phone: {phone_number[:3]}****")  # Shows area code - still PII!
```

## Redaction Helper Functions

```python
import logging
import re

logger = logging.getLogger(__name__)

class PIIRedactor:
    """Helper class for consistent PII redaction."""

    @staticmethod
    def phone(phone: str, show_last: int = 4) -> str:
        """Redact phone number, keeping only the last digits."""
        digits = re.sub(r'\D', '', phone or '')
        # show_last <= 0 must be rejected: digits[-0:] would return every digit.
        if show_last <= 0 or len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def email(email: str) -> str:
        """Redact email preserving domain."""
        if not email or '@' not in email:
            return "***"

        local, domain = email.split('@', 1)
        if len(local) <= 2:
            return f"***@{domain}"

        return f"{local[0]}***{local[-1]}@{domain}"

    @staticmethod
    def card_number(card: str, show_last: int = 4) -> str:
        """Redact credit card number, keeping only the last digits."""
        digits = re.sub(r'\D', '', card or '')
        if show_last <= 0 or len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def ssn(ssn: str) -> str:
        """Redact SSN completely."""
        return "***-**-****"

    @staticmethod
    def address(address: str) -> str:
        """Redact the house number; keep street name, city, and state."""
        # Example: "123 Main St, Boston, MA" -> "*** Main St, Boston, MA"
        parts = address.split(',')
        # str.split always returns at least one element, so parts[0] is safe
        parts[0] = re.sub(r'^\d+', '***', parts[0])
        return ','.join(parts)

    @staticmethod
    def generic(value: str, show_last: int = 4) -> str:
        """Generic redaction for any string."""
        if not value or show_last <= 0 or len(value) <= show_last:
            return "***"
        return f"***{value[-show_last:]}"

# Usage
redactor = PIIRedactor()
logger.info(f"User phone: {redactor.phone('555-123-4567')}")
logger.info(f"User email: {redactor.email('john.doe@example.com')}")
logger.info(f"Card: {redactor.card_number('4111-1111-1111-1111')}")
```
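
The PII list at the top of this skill includes IP addresses, but `PIIRedactor` has no method for them. Below is a minimal sketch of one possible helper, built only on the standard-library `ipaddress` module. The name `redact_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions made for illustration; whether truncating an address is sufficient depends on your jurisdiction and policy.

```python
import ipaddress


def redact_ip(value: str) -> str:
    """Mask the host portion of an IP address (hypothetical helper).

    IPv4 keeps the first 24 bits and IPv6 the first 48 bits.
    Anything that does not parse as an IP address is fully redacted.
    """
    try:
        ip = ipaddress.ip_address(value.strip())
        prefix = 24 if ip.version == 4 else 48
        # strict=False makes ip_network() zero the host bits instead of raising.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    except (ValueError, AttributeError):
        # ValueError: not a valid address; AttributeError: value was None.
        return "***"
    return f"{network.network_address}/{prefix}"


print(redact_ip("203.0.113.42"))         # 203.0.113.0/24
print(redact_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::/48
print(redact_ip("not-an-ip"))            # ***
```

Keeping the network prefix preserves a coarse hint for debugging (for example, spotting traffic from one subnet) while dropping the host-specific bits.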

## When to Redact

**Always redact in:**
- ✅ Log statements (`logger.info`, `logger.debug`, `logger.error`)
- ✅ Audit logs
- ✅ Error messages (logs)
- ✅ Monitoring/observability data (OTEL, metrics)
- ✅ Debug output
- ✅ Stack traces (if they contain PII)

**Don't redact in:**
- ❌ Actual API calls (need full data)
- ❌ Database queries (need full data)
- ❌ Function parameters (internal use)
- ❌ Return values to authorized clients
- ❌ Encrypted storage
- ❌ Secure internal processing

**Context matters:**
```python
# ❌ Bad: Redacting in business logic
def send_email(email: str):
    send_to(redact_email(email))  # Wrong! Need real email to send!

# ✅ Good: Redacting in logs
def send_email(email: str):
    logger.info(f"Sending email to {redact_email(email)}")  # Log: redacted
    send_to(email)  # Actual operation: full email
```

## Structured Logging with Redaction

```python
import logging
import structlog

# Configure structlog with redaction (PIIRedactor is the helper class defined above)
def redact_processor(logger, method_name, event_dict):
    """Processor to redact PII fields."""
    pii_fields = ['email', 'phone', 'ssn', 'card_number', 'subscriber_id']

    for field in pii_fields:
        if field in event_dict:
            if field == 'email':
                event_dict[field] = PIIRedactor.email(event_dict[field])
            else:
                event_dict[field] = PIIRedactor.generic(event_dict[field])

    return event_dict

structlog.configure(
    processors=[
        redact_processor,
        structlog.processors.JSONRenderer()
    ]
)

logger = structlog.get_logger()

# Usage - automatic redaction
logger.info("user_action", email="user@example.com", phone="555-1234")
# Output: {"email": "u***r@example.com", "phone": "***1234", "event": "user_action"}
```
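
For code that uses the standard-library `logging` module rather than structlog, the same field-based idea can be sketched with a `logging.Filter` that rewrites attributes attached through `extra=`. The names `RedactExtraFilter`, `PII_FIELDS`, and `_mask` are hypothetical, and the field list simply mirrors the one in the processor above; treat this as one possible approach, not the skill's prescribed mechanism.

```python
import logging

# Assumed field names, mirroring the structlog processor above.
PII_FIELDS = ("email", "phone", "ssn", "card_number", "subscriber_id")


def _mask(value: object, show_last: int = 4) -> str:
    """Keep only the last `show_last` characters of the value."""
    text = "" if value is None else str(value)
    if show_last <= 0 or len(text) <= show_last:
        return "***"
    return f"***{text[-show_last:]}"


class RedactExtraFilter(logging.Filter):
    """Redact known PII attributes that callers attach via `extra=`."""

    def filter(self, record: logging.LogRecord) -> bool:
        for field in PII_FIELDS:
            if hasattr(record, field):
                setattr(record, field, _mask(getattr(record, field)))
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactExtraFilter())
    # This format string requires every record to carry a `phone` attribute.
    handler.setFormatter(logging.Formatter("%(message)s | phone=%(phone)s"))

    log = logging.getLogger("orders")
    log.addHandler(handler)
    log.warning("Order created", extra={"phone": "555-123-4567"})
    # Emits: Order created | phone=***4567
```

Attaching the filter to the handler rather than to a single logger means records from any logger that reaches this handler are rewritten before formatting. It only covers values passed as `extra=` fields; PII interpolated directly into the message string is not touched.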

## Compliance Considerations

**GDPR (EU):**
- Personal data must be protected
- Logs are considered data processing
- Must have legal basis for logging PII
- Right to be forgotten can apply to logs

**CCPA (California):**
- Consumer personal information must be protected
- Includes identifiers, browsing history, biometric data

**PCI-DSS (Payment Cards):**
- Never log full card numbers (PAN)
- Never log CVV/CVV2
- Cardholder name is sensitive too
- Mask PAN when displayed (show last 4 only)

**HIPAA (Healthcare):**
- PHI (Protected Health Information) must be secured
- Includes names, addresses, dates, phone numbers
- Medical record numbers, device identifiers

```python
from decimal import Decimal

# PCI-DSS compliant logging
# (logger, redact_pii, and payment_gateway are assumed to be defined elsewhere)
def process_payment(card_number: str, cvv: str, amount: Decimal):
    """
    Process payment transaction.

    Security Note:
        PCI-DSS compliant. Never logs full card number or CVV.
    """
    # ✅ Redact card number in logs
    logger.info(
        f"Processing payment | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount}"
        # ✅ NEVER log CVV
    )

    # Process with full data
    result = payment_gateway.charge(card_number, cvv, amount)

    # ✅ Redact in audit log
    logger.info(
        f"AUDIT: Payment processed | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount} | "
        f"status={result.status}"
    )

    return result
```

## Testing Redaction

```python
import logging

import pytest

# redact_pii and redact_email are the helpers from the Correct Pattern section
logger = logging.getLogger(__name__)

def test_redact_phone():
    """Test phone redaction."""
    assert redact_pii("5551234567") == "***4567"
    assert redact_pii("555-123-4567") == "***4567"

def test_redact_email():
    """Test email redaction."""
    assert redact_email("user@example.com") == "u***r@example.com"
    assert redact_email("a@example.com") == "***@example.com"

def test_redact_empty():
    """Test empty string handling."""
    assert redact_pii("") == "***"
    assert redact_pii(None) == "***"

def test_log_redaction(caplog):
    """Test that logs contain redacted PII."""
    with caplog.at_level(logging.INFO):
        user_id = "user_12345678"
        logger.info(f"Processing user={redact_pii(user_id)}")

    assert "***5678" in caplog.text
    assert "user_12345678" not in caplog.text
```

## ❌ Anti-Patterns

```python
# ❌ Not redacting at all
logger.info(f"User {user_id} logged in from {ip_address}")

# ❌ Inconsistent redaction
logger.info(f"User: ***{user_id[-4:]}")  # Different format each time
logger.debug(f"User: {user_id[:4]}****")

# ❌ Insufficient redaction
logger.info(f"Email: {email[0]}****")  # Just first letter isn't enough

# ❌ Redacting too much (lost debuggability)
logger.info("User logged in")  # No identifier at all - can't debug!

# ✅ Better: Redact but keep debuggability
logger.info(f"User logged in | user_id={redact_pii(user_id)}")

# ❌ Forgetting error messages
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for {email}: {e}")  # PII in error log!

# ✅ Better
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for {redact_email(email)}: {e}")
```
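
One gap in the last example: the exception text (`{e}`) can itself contain the email or a card number, so redacting only the named variable may not be enough. A minimal sketch of scrubbing free text before it is logged follows. The function name `scrub_text` and both regular expressions are illustrative assumptions; the patterns are deliberately simple and will miss some formats and may match some non-PII digit runs.

```python
import re

# Simplified email pattern: captures local part and domain separately.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
# Card-like runs: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def _mask_card(match: re.Match) -> str:
    """Replace a card-like match with *** plus its last four digits."""
    digits = re.sub(r"\D", "", match.group())
    return f"***{digits[-4:]}"


def scrub_text(text: str) -> str:
    """Redact emails and card-like numbers embedded in free text."""
    text = EMAIL_RE.sub(r"***@\2", text)  # keep the domain, drop the local part
    return CARD_RE.sub(_mask_card, text)


try:
    raise ValueError("No account for john.doe@example.com, card 4111-1111-1111-1111")
except ValueError as exc:
    print(scrub_text(str(exc)))
    # No account for ***@example.com, card ***1111
```

The output keeps the same `***` prefix and last-four convention used by the helpers above, so scrubbed messages stay consistent with explicitly redacted fields.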

## Best Practices Checklist

- ✅ Create consistent redaction helper functions
- ✅ Redact all PII in logs automatically
- ✅ Show last 4 characters for debugging
- ✅ Preserve enough info for support/debugging
- ✅ Use structured logging with auto-redaction
- ✅ Document PII handling in Security Notes
- ✅ Test that redaction works correctly
- ✅ Train team on PII identification
- ✅ Regular audit of logs for PII leaks
- ✅ Consider legal requirements (GDPR, CCPA, PCI-DSS)
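
The "regular audit of logs" item can be partly automated. The sketch below scans log lines for patterns that look like unredacted PII and reports the line number and kind of each hit. `find_pii_leaks` and `LEAK_PATTERNS` are hypothetical names, and the patterns are rough heuristics chosen for illustration; a real audit would need patterns tuned to your data and a review step for false positives.

```python
import re
from typing import Iterable, List, Tuple

LEAK_PATTERNS = {
    # The lookbehind skips already-redacted forms such as "u***r@example.com".
    "email": re.compile(
        r"(?<![*A-Za-z0-9._%+-])[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
    ),
    # US SSN layout: 3-2-4 digits separated by hyphens.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Card-like runs: 13-19 digits, optionally separated by spaces or hyphens.
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def find_pii_leaks(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, kind) for every line that matches a leak pattern."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for kind, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, kind))
    return findings


sample_log = [
    "Processing user | subscriber_id=***5678",
    "User email: john.doe@example.com",
    "Payment received | user=u***r@example.com",
    "SSN 123-45-6789 on file",
]
print(find_pii_leaks(sample_log))  # [(2, 'email'), (4, 'ssn')]
```

A check like this could run in CI against captured test logs, failing the build when the result is non-empty.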

## Auto-Apply

When you see logging of PII, automatically:
1. Wrap with appropriate redaction function
2. Show last 4 characters for debugging
3. Add Security Note to docstring if handling PII
4. Use consistent redaction format
5. Test that PII is not in logs

## Related Skills

- structured-errors - Redact PII in error responses
- docstring-format - Document PII handling with Security Note
- tool-design-pattern - Redact PII in tool logs
- pytest-patterns - Test PII redaction