Files
gh-ricardoroche-ricardos-cl…/.claude/skills/pii-redaction/SKILL.md
2025-11-30 08:51:46 +08:00

10 KiB

name, description
name description
pii-redaction Automatically applies when logging sensitive data. Ensures PII (phone numbers, emails, IDs, payment data) is redacted in all logs and outputs for compliance.

PII Redaction Enforcer

When you are writing code that logs or outputs:

  • Phone numbers (mobile, landline)
  • Email addresses
  • User IDs / Subscriber IDs
  • Payment tokens / Credit card numbers
  • Social Security Numbers / Tax IDs
  • Physical addresses
  • Names (in sensitive contexts)
  • IP addresses (in some jurisdictions)
  • Any other personally identifiable information

Always redact PII in logs, error messages, and debug output.

Correct Pattern

import logging
import re

logger = logging.getLogger(__name__)

def redact_pii(value: str, show_last: int = 4) -> str:
    """
    Redact PII showing only last N characters.

    Args:
        value: String to redact
        show_last: Number of characters to show (default: 4)

    Returns:
        Redacted string (e.g., "***5678")
    """
    if not value or len(value) <= show_last:
        return "***"
    return f"***{value[-show_last:]}"

def redact_email(email: str) -> str:
    """
    Redact email preserving domain.

    Args:
        email: Email address to redact

    Returns:
        Redacted email (e.g., "u***r@example.com")
    """
    if not email or '@' not in email:
        return "***"

    local, domain = email.split('@', 1)
    if len(local) <= 2:
        return f"***@{domain}"

    return f"{local[0]}***{local[-1]}@{domain}"

# Usage in logging
logger.info(f"Processing user | subscriber_id={redact_pii(subscriber_id)}")
logger.info(f"Payment received | token={redact_pii(payment_token)} | user={redact_email(email)}")
logger.info(f"Order created | phone={redact_pii(phone_number)}")

Output:

Processing user | subscriber_id=***5678
Payment received | token=***2344 | user=u***r@example.com
Order created | phone=***1234

Incorrect Pattern (PII Leak)

# ❌ Full PII exposed
logger.info(f"Processing subscriber_id: {subscriber_id}")
logger.info(f"User email: {email}")
logger.info(f"Phone: {phone_number}")

# ❌ PII in error messages
raise ValueError(f"Invalid subscriber ID: {subscriber_id}")

# ❌ PII in API responses (should only redact in logs)
return {"error": f"User {email} not found"}  # OK in response, but log should redact

# ❌ Insufficient redaction
logger.info(f"Phone: {phone_number[:3]}****")  # Shows area code - still PII!

Redaction Helper Functions

import re
from typing import Optional

class PIIRedactor:
    """Helper class for consistent PII redaction."""

    @staticmethod
    def phone(phone: str, show_last: int = 4) -> str:
        """Redact phone number."""
        digits = re.sub(r'\D', '', phone)
        if len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def email(email: str) -> str:
        """Redact email preserving domain."""
        if not email or '@' not in email:
            return "***"

        local, domain = email.split('@', 1)
        if len(local) <= 2:
            return f"***@{domain}"

        return f"{local[0]}***{local[-1]}@{domain}"

    @staticmethod
    def card_number(card: str, show_last: int = 4) -> str:
        """Redact credit card number."""
        digits = re.sub(r'\D', '', card)
        if len(digits) <= show_last:
            return "***"
        return f"***{digits[-show_last:]}"

    @staticmethod
    def ssn(ssn: str) -> str:
        """Redact SSN completely."""
        return "***-**-****"

    @staticmethod
    def address(address: str) -> str:
        """Redact street address, keep city/state."""
        # Example: "123 Main St, Boston, MA" -> "*** Main St, Boston, MA"
        parts = address.split(',')
        if parts:
            street = parts[0]
            # Redact house number but keep street name
            street_redacted = re.sub(r'^\d+', '***', street)
            parts[0] = street_redacted
        return ','.join(parts)

    @staticmethod
    def generic(value: str, show_last: int = 4) -> str:
        """Generic redaction for any string."""
        if not value or len(value) <= show_last:
            return "***"
        return f"***{value[-show_last:]}"

# Usage
redactor = PIIRedactor()
logger.info(f"User phone: {redactor.phone('555-123-4567')}")
logger.info(f"User email: {redactor.email('john.doe@example.com')}")
logger.info(f"Card: {redactor.card_number('4111-1111-1111-1111')}")

When to Redact

Always redact in:

  • Log statements (logger.info, logger.debug, logger.error)
  • Audit logs
  • Error messages (logs)
  • Monitoring/observability data (OTEL, metrics)
  • Debug output
  • Stack traces (if they contain PII)

Don't redact in:

  • Actual API calls (need full data)
  • Database queries (need full data)
  • Function parameters (internal use)
  • Return values to authorized clients
  • Encrypted storage
  • Secure internal processing

Context matters:

# ❌ Bad: Redacting in business logic
def send_email(email: str):
    send_to(redact_email(email))  # Wrong! Need real email to send!

# ✅ Good: Redacting in logs
def send_email(email: str):
    logger.info(f"Sending email to {redact_email(email)}")  # Log: redacted
    send_to(email)  # Actual operation: full email

Structured Logging with Redaction

import logging
import structlog

# Configure structlog with redaction
def redact_processor(logger, method_name, event_dict):
    """Processor to redact PII fields."""
    pii_fields = ['email', 'phone', 'ssn', 'card_number', 'subscriber_id']

    for field in pii_fields:
        if field in event_dict:
            if field == 'email':
                event_dict[field] = PIIRedactor.email(event_dict[field])
            else:
                event_dict[field] = PIIRedactor.generic(event_dict[field])

    return event_dict

structlog.configure(
    processors=[
        redact_processor,
        structlog.processors.JSONRenderer()
    ]
)

logger = structlog.get_logger()

# Usage - automatic redaction
logger.info("user_action", email="user@example.com", phone="555-1234")
# Output: {"event": "user_action", "email": "u***r@example.com", "phone": "***1234"}

Compliance Considerations

GDPR (EU):

  • Personal data must be protected
  • Logs are considered data processing
  • Must have legal basis for logging PII
  • Right to be forgotten applies to logs

CCPA (California):

  • Consumer personal information must be protected
  • Includes identifiers, browsing history, biometric data

PCI-DSS (Payment Cards):

  • Never log full card numbers (PAN)
  • Never log CVV/CVV2
  • Card holder name is sensitive too
  • Mask PAN when displayed (show last 4 only)

HIPAA (Healthcare):

  • PHI (Protected Health Information) must be secured
  • Includes names, addresses, dates, phone numbers
  • Medical record numbers, device identifiers
# PCI-DSS compliant logging
def process_payment(card_number: str, cvv: str, amount: Decimal):
    """
    Process payment transaction.

    Security Note:
        PCI-DSS compliant. Never logs full card number or CVV.
    """
    # ✅ Redact card number in logs
    logger.info(
        f"Processing payment | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount}"
        # ✅ NEVER log CVV
    )

    # Process with full data
    result = payment_gateway.charge(card_number, cvv, amount)

    # ✅ Redact in audit log
    logger.info(
        f"AUDIT: Payment processed | "
        f"card={redact_pii(card_number, show_last=4)} | "
        f"amount={amount} | "
        f"status={result.status}"
    )

    return result

Testing Redaction

import pytest

def test_redact_phone():
    """Test phone redaction."""
    assert redact_pii("5551234567") == "***4567"
    assert redact_pii("555-123-4567") == "***4567"

def test_redact_email():
    """Test email redaction."""
    assert redact_email("user@example.com") == "u***r@example.com"
    assert redact_email("a@example.com") == "***@example.com"

def test_redact_empty():
    """Test empty string handling."""
    assert redact_pii("") == "***"
    assert redact_pii(None) == "***"

def test_log_redaction(caplog):
    """Test that logs contain redacted PII."""
    with caplog.at_level(logging.INFO):
        user_id = "user_12345678"
        logger.info(f"Processing user={redact_pii(user_id)}")

    assert "***5678" in caplog.text
    assert "user_12345678" not in caplog.text

Anti-Patterns

# ❌ Not redacting at all
logger.info(f"User {user_id} logged in from {ip_address}")

# ❌ Inconsistent redaction
logger.info(f"User: ***{user_id[-4:]}")  # Different format each time
logger.debug(f"User: {user_id[:4]}****")

# ❌ Insufficient redaction
logger.info(f"Email: {email[0]}****")  # Just first letter isn't enough

# ❌ Redacting too much (lost debugability)
logger.info("User logged in")  # No identifier at all - can't debug!

# ✅ Better: Redact but keep debugability
logger.info(f"User logged in | user_id={redact_pii(user_id)}")

# ❌ Forgetting error messages
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for {email}: {e}")  # PII in error log!

# ✅ Better
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for {redact_email(email)}: {e}")

Best Practices Checklist

  • Create consistent redaction helper functions
  • Redact all PII in logs automatically
  • Show last 4 characters for debugging
  • Preserve enough info for support/debugging
  • Use structured logging with auto-redaction
  • Document PII handling in Security Notes
  • Test that redaction works correctly
  • Train team on PII identification
  • Regular audit of logs for PII leaks
  • Consider legal requirements (GDPR, CCPA, PCI-DSS)

Auto-Apply

When you see logging of PII, automatically:

  1. Wrap with appropriate redaction function
  2. Show last 4 characters for debugging
  3. Add Security Note to docstring if handling PII
  4. Use consistent redaction format
  5. Test that PII is not in logs
  • structured-errors - Redact PII in error responses
  • docstring-format - Document PII handling with Security Note
  • tool-design-pattern - Redact PII in tool logs
  • pytest-patterns - Test PII redaction