gh-greyhaven-ai-claude-code…/skills/smart-debugging/reference/fix-generation-patterns.md
2025-11-29 18:29:18 +08:00

Fix Generation Patterns

Comprehensive guide to generating, evaluating, and implementing fixes for software bugs.

Multiple Fix Options Strategy

Core Principle: Always generate 2-3 fix options with trade-off analysis.

Fix Option Template

**Option 1: [Name]** (e.g., Quick Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Option 2: [Name]** (e.g., Proper Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Option 3: [Name]** (e.g., Comprehensive Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Recommendation**: Option [X] because [reasoning]

See null-pointer-debug-example.md for a complete example of fix options.
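The template can also be filled in programmatically. The following is a minimal sketch, not part of the original skill: `FixOption` and `render_fix_options` are hypothetical names, and the only rules it enforces are the "2-3 options" principle and a valid recommendation number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixOption:
    """One candidate fix, mirroring the fields of the template above."""
    name: str
    implementation: str
    pros: str
    cons: str
    effort: str
    risk: str  # "Low", "Medium", or "High"


def render_fix_options(options: List[FixOption], recommended: int, reasoning: str) -> str:
    """Render 2-3 options plus a recommendation in the template's layout."""
    if not 2 <= len(options) <= 3:
        raise ValueError("expected 2-3 fix options")
    if not 1 <= recommended <= len(options):
        raise ValueError("recommended must be a valid option number")

    blocks = []
    for number, opt in enumerate(options, start=1):
        blocks.append(
            f"**Option {number}: {opt.name}**\n"
            f"**Implementation**: {opt.implementation}\n"
            f"**Pros**: {opt.pros}\n"
            f"**Cons**: {opt.cons}\n"
            f"**Effort**: {opt.effort}\n"
            f"**Risk**: {opt.risk}"
        )
    blocks.append(f"**Recommendation**: Option {recommended} because {reasoning}")
    return "\n\n".join(blocks)
```

Rejecting a single option up front keeps the trade-off analysis from being skipped when only one fix comes to mind.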

Quick Fix vs. Proper Fix

Decision Matrix

| Criteria | Quick Fix | Proper Fix |
|----------|-----------|------------|
| Urgency | Production down, immediate relief needed | Incident resolved, addressing root cause |
| Scope | Minimal changes, single file | Multiple files, architectural changes |
| Time | Minutes to hours | Hours to days |
| Testing | Manual verification | Full test coverage required |
| Risk | Low (minimal changes) | Medium (broader impact) |
| Longevity | Temporary patch | Permanent solution |

When to Use Quick Fix

  • Production incident - System is down, users impacted
  • Known workaround - Clear, safe mitigation exists
  • Low risk - Change is isolated and reversible
  • Follow-up planned - Proper fix scheduled for next sprint

Pattern: Quick fix now → Monitor → Proper fix later

When to Use Proper Fix

  • Root cause addressed - Not just treating symptoms
  • Proper testing - Comprehensive test coverage added
  • Type safety - Leverages static type checking
  • Prevention - Prevents entire class of similar bugs
  • Documentation - Code is self-documenting

Pattern: Understand root cause → Comprehensive fix → Prevent recurrence

Fix Priority Assessment

Priority Matrix

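| Severity | Frequency | Priority | Response Time |
|----------|-----------|----------|---------------|
| Critical | High | P0 | Immediate (< 1 hour) |
| Critical | Low | P1 | Same day |
| Major | High | P1 | Same day |
| Major | Low | P2 | This week |
| Minor | High | P2 | This week |
| Minor | Low | P3 | Next sprint |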

Severity Criteria:

  • Critical: Data loss, security breach, production down
  • Major: Degraded performance, incorrect results, feature broken
  • Minor: Edge case, cosmetic issue, rare error

Frequency Criteria:

  • High: Affects >10% of users or happens >10 times/day
  • Low: Affects <1% of users or happens occasionally
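The matrix and criteria above can be encoded as a lookup. This is a sketch under stated assumptions: the names `Severity`, `Frequency`, `classify_frequency`, and `assess_priority` are hypothetical, and because the criteria leave the band between 1% and 10% of users undefined, the sketch treats everything that is not High as Low.

```python
from enum import Enum
from typing import Dict, Tuple


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"


class Frequency(Enum):
    HIGH = "High"
    LOW = "Low"


# (severity, frequency) -> (priority, response time), copied from the matrix above.
PRIORITY_MATRIX: Dict[Tuple[Severity, Frequency], Tuple[str, str]] = {
    (Severity.CRITICAL, Frequency.HIGH): ("P0", "Immediate (< 1 hour)"),
    (Severity.CRITICAL, Frequency.LOW): ("P1", "Same day"),
    (Severity.MAJOR, Frequency.HIGH): ("P1", "Same day"),
    (Severity.MAJOR, Frequency.LOW): ("P2", "This week"),
    (Severity.MINOR, Frequency.HIGH): ("P2", "This week"),
    (Severity.MINOR, Frequency.LOW): ("P3", "Next sprint"),
}


def classify_frequency(affected_user_fraction: float, occurrences_per_day: float) -> Frequency:
    """High if >10% of users are affected or it happens >10 times/day.

    Assumption: anything else is Low, including the 1-10% band that the
    criteria above do not classify.
    """
    if affected_user_fraction > 0.10 or occurrences_per_day > 10:
        return Frequency.HIGH
    return Frequency.LOW


def assess_priority(severity: Severity, frequency: Frequency) -> Tuple[str, str]:
    """Return (priority, response time) for a bug."""
    return PRIORITY_MATRIX[(severity, frequency)]
```

For example, `assess_priority(Severity.MAJOR, classify_frequency(0.15, 0))` looks up the Major/High row of the matrix.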

Common Fix Patterns by Error Type

Null/Undefined Errors

Pattern 1: Null Check with Default

# Before
name = user.name  # AttributeError: 'NoneType' object has no attribute 'name'

# After
name = user.name if user else "Unknown"

Pattern 2: Raise Exception (API boundaries)

# Before
user = db.users.find_one(user_id)
return user.name  # AttributeError when no user is found

# After
user = db.users.find_one(user_id)
if user is None:
    raise HTTPException(404, "User not found")
return user.name

Type Errors

Pattern 1: Type Conversion with Validation

# Before
total = base_price + discount  # TypeError: int + str

# After
from pydantic import BaseModel

class PriceInput(BaseModel):
    base_price: int
    discount: int  # Automatic validation and conversion

input_data = PriceInput(**request_body)  # Validates types
total = input_data.base_price + input_data.discount

Database Errors

Pattern 1: Constraint Violations

# Before
db.add(user)
db.commit()  # IntegrityError: UNIQUE constraint failed

# After
from sqlalchemy.exc import IntegrityError

# Option A: Return error
try:
    db.add(user)
    db.commit()
except IntegrityError:
    db.rollback()
    raise HTTPException(409, "User with this email already exists")

# Option B: Upsert (use instead of Option A, not after it)
try:
    db.add(user)
    db.commit()
except IntegrityError:
    db.rollback()
    existing = db.query(User).filter_by(email=user.email).first()
    if existing:
        existing.name = user.name
        db.commit()
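To try the rollback-then-recover flow without a SQLAlchemy setup, here is a self-contained sketch using the standard-library `sqlite3` module. The table layout and the helper names `create_schema` and `insert_or_update_user` are assumptions for illustration only; the sketch shows the upsert variant of the pattern.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a users table with a UNIQUE constraint on email."""
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT NOT NULL)"
    )


def insert_or_update_user(conn: sqlite3.Connection, email: str, name: str) -> str:
    """Insert a user; on a UNIQUE violation, roll back and update the existing row."""
    try:
        conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        conn.commit()
        return "inserted"
    except sqlite3.IntegrityError:
        conn.rollback()  # Discard the failed transaction before issuing new statements
        conn.execute("UPDATE users SET name = ? WHERE email = ?", (name, email))
        conn.commit()
        return "updated"
```

Calling `insert_or_update_user` twice with the same email returns `"inserted"` and then `"updated"`, leaving a single row.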

Pattern 2: Connection Failures

# Before
engine = create_engine(DATABASE_URL)
connection = engine.connect()  # OperationalError: connection refused

# After
from sqlalchemy import create_engine
from tenacity import retry, stop_after_attempt, wait_exponential

engine = create_engine(DATABASE_URL)  # Lazy: no connection is opened here

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def get_connection():
    return engine.connect()  # Retried up to 3 times with exponential backoff

connection = get_connection()

API/Integration Errors

Pattern 1: Validation at Boundary

# Before
response = payment_api.create_charge(amount=order.total)
# Fails with 422 if amount < 50 (API minimum)

# After
# Pydantic v1 API; in v2 use field_validator and model_dump() instead
from pydantic import BaseModel, validator

class CreateChargeRequest(BaseModel):
    amount: int  # In cents

    @validator('amount')
    def amount_meets_minimum(cls, v):
        if v < 50:
            raise ValueError('Amount must be at least $0.50')
        return v

# Validate before API call
request = CreateChargeRequest(amount=order.total)  # Fails early
response = payment_api.create_charge(**request.dict())

Pattern 2: Retry with Backoff

# Before
response = httpx.get(api_url)  # Timeout occasionally

# After
import httpx
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(httpx.TimeoutException),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def fetch_with_retry(url: str):
    # Async version: call as `response = await fetch_with_retry(api_url)`
    async with httpx.AsyncClient(timeout=5.0) as client:
        return await client.get(url)
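The idea behind retry with backoff can be sketched with the standard library alone. This is an illustration of the technique, not tenacity's implementation: `backoff_delays` and `retry_with_backoff` are hypothetical helpers, and the delay formula (doubling per attempt, clamped between a minimum and a maximum) is an assumption chosen to resemble the settings used above.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, multiplier: float = 1.0,
                   min_delay: float = 2.0, max_delay: float = 10.0) -> List[float]:
    """Delays between attempts: multiplier * 2**n, clamped to [min_delay, max_delay]."""
    return [
        min(max(multiplier * 2 ** n, min_delay), max_delay)
        for n in range(attempts - 1)
    ]


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting. Only the listed exception types are retried, so unrelated bugs still fail fast.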

Performance Errors

Pattern 1: N+1 Query Fix

# Before (N+1 queries)
users = db.query(User).all()  # 1 query
for user in users:
    posts = db.query(Post).filter(Post.user_id == user.id).all()  # N queries

# After (Single query with join)
from sqlalchemy.orm import joinedload

users = db.query(User).options(
    joinedload(User.posts)
).all()  # 1 query with join; posts are then available as user.posts
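The difference in query count can be made visible with a small standard-library demonstration. This sketch uses `sqlite3` rather than SQLAlchemy, and the schema, sample rows, and function names are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with three users, one of whom has no posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
        """
    )
    return conn


def posts_n_plus_one(conn: sqlite3.Connection) -> Tuple[Dict[int, List[str]], int]:
    """One query for users, then one per user. Returns (posts by user, query count)."""
    queries = 1
    users = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
    result: Dict[int, List[str]] = {}
    for (user_id,) in users:
        rows = conn.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[user_id] = [title for (title,) in rows]
    return result, queries


def posts_single_join(conn: sqlite3.Connection) -> Tuple[Dict[int, List[str]], int]:
    """Same result from a single LEFT JOIN. Returns (posts by user, query count)."""
    rows = conn.execute(
        "SELECT u.id, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    ).fetchall()
    result: Dict[int, List[str]] = {}
    for user_id, title in rows:
        result.setdefault(user_id, [])
        if title is not None:  # LEFT JOIN yields NULL for users without posts
            result[user_id].append(title)
    return result, 1
```

Both functions return the same mapping, but the first issues one query per user on top of the initial one, so its cost grows with the number of users.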

Pattern 2: Caching

# Before
def get_user_profile(user_id: str):
    return db.query(User).filter_by(id=user_id).first()  # Every time

# After
from cachetools import TTLCache, cached

cache = TTLCache(maxsize=1000, ttl=300)  # 5 minute TTL

@cached(cache)
def get_user_profile(user_id: str):
    return db.query(User).filter_by(id=user_id).first()
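What a TTL cache does can be sketched in a few lines of standard-library Python. This is not how cachetools is implemented: `SimpleTTLCache` is a hypothetical class with no size limit and no thread safety, and the injectable `clock` exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class SimpleTTLCache:
    """Cache values for a fixed number of seconds, then reload them."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop one entry, for example after the underlying row changes."""
        self._entries.pop(key, None)
```

The trade-off is staleness: readers may see data up to `ttl_seconds` old unless writers call `invalidate`.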

Fix Validation Strategies

Validation Checklist

Before deploying fix:
- [ ] Fix addresses root cause (not just symptoms)
- [ ] Tests added to prevent recurrence
- [ ] Tests pass locally
- [ ] Code reviewed by peer
- [ ] No new linting/type errors
- [ ] Performance impact assessed
- [ ] Security implications reviewed
- [ ] Rollback plan documented
- [ ] Monitoring/alerts updated

Test-Driven Fix Approach

Pattern: Write failing test → Implement fix → Test passes

# Step 1: Write failing test
def test_get_user_with_invalid_id_returns_404():
    """Test that invalid user_id returns 404, not 500."""
    response = client.get("/users/invalid-id")
    assert response.status_code == 404
    assert "User not found" in response.json()["detail"]

# Step 2: Run test (should fail with current bug)
# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
# AssertionError: 500 != 404

# Step 3: Implement fix
@app.get("/users/{user_id}")
async def get_user(user_id: str):
    user = await db.users.find_one({"id": user_id})
    if user is None:
        raise HTTPException(404, "User not found")
    return user

# Step 4: Run test (should pass)
# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
# PASSED

Integration Testing

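# Test fix with realistic scenario
# Assumes an async-capable pytest plugin (e.g. pytest-asyncio) and a registered "integration" marker
import pytest

@pytest.mark.integration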
async def test_order_creation_with_negative_total():
    """Integration test: Ensure negative order total is rejected."""
    # Setup
    user = await create_test_user()

    # Attempt to create order with negative total
    response = await client.post("/orders", json={
        "user_id": user.id,
        "items": [],
        "total": -100  # Invalid
    })

    # Assert validation error
    assert response.status_code == 422
    assert "total must be positive" in response.json()["detail"]

    # Verify no order created in database
    orders = await db.orders.find({"user_id": user.id})
    assert len(orders) == 0

Refactoring Considerations

When to Refactor During Fix

Refactor if:

  • Fix requires understanding convoluted code
  • Code duplication prevents proper fix
  • Poor structure makes fix risky
  • Fix is part of larger architectural improvement

Don't refactor if:

  • Production incident needs immediate fix
  • Refactoring scope unclear
  • Tests insufficient to ensure safety
  • Refactoring can be done separately

Refactoring Patterns

Pattern 1: Extract Function

# Before (hard to fix null error)
def process_order(order_data):
    user = db.users.find_one(order_data["user_id"])
    if user.is_active and user.credits > 0:
        # 50 lines of order processing
        pass

# After (easier to add null check)
def process_order(order_data):
    user = get_validated_user(order_data["user_id"])
    process_order_for_user(user, order_data)

def get_validated_user(user_id: str) -> User:
    """Get user and validate they can place orders."""
    user = db.users.find_one(user_id)
    if user is None:
        raise HTTPException(404, "User not found")
    if not user.is_active:
        raise HTTPException(403, "User account inactive")
    if user.credits <= 0:
        raise HTTPException(402, "Insufficient credits")
    return user

Production Safety

Pre-Deployment Checklist

- [ ] Fix tested in staging environment
- [ ] Performance impact measured (CPU, memory, latency)
- [ ] Database migrations tested with production-sized data
- [ ] Feature flag available for gradual rollout
- [ ] Rollback procedure documented and tested
- [ ] Monitoring dashboard shows relevant metrics
- [ ] Alerts configured for fix-related failures
- [ ] On-call engineer briefed on deployment
- [ ] Communication sent to stakeholders

Gradual Rollout Pattern

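# Use feature flag for gradual rollout
# LaunchDarkly server-side Python SDK (package: launchdarkly-server-sdk), context-based API
import ldclient
from ldclient import Context
from ldclient.config import Config

ldclient.set_config(Config("sdk-key"))
ld_client = ldclient.get()

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    use_new_validation = ld_client.variation(
        "new-user-validation",
        Context.create(user_id),  # Evaluation context keyed by user ID
        default=False  # Returned if the flag cannot be evaluated
    )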

    if use_new_validation:
        # New fix with validation
        user = await get_validated_user(user_id)
    else:
        # Old code (fallback)
        user = await db.users.find_one(user_id)

    return user
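Where no feature-flag service is available, a percentage rollout can be approximated by hashing the user ID into a stable bucket. This is a hypothetical stand-in, not a description of how LaunchDarkly assigns users: `rollout_bucket` and `is_enabled` are illustrative names, and the percentage would have to come from configuration that can be changed without a deploy.

```python
import hashlib


def rollout_bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(flag_key, user_id) < rollout_percent
```

Because the bucket depends only on the flag key and user ID, a user who receives the fix at 10% keeps it when the rollout widens to 50%, and setting the percentage to 0 turns the fix off for everyone.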

Rollback Planning

Rollback Decision Criteria

Rollback immediately if:

  • Error rate spikes >5% above baseline
  • Critical functionality broken
  • Data corruption detected
  • Performance degrades >50%
  • Security vulnerability introduced

Monitor and investigate if:

  • Error rate increases <5%
  • Non-critical functionality affected
  • Performance degrades <20%
  • Edge cases failing
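These criteria can be turned into an automated check against post-deploy metrics. The sketch below rests on assumptions: `DeployHealth` and `rollback_decision` are hypothetical names, the error-rate threshold is read as percentage points above baseline, and regressions that fall between the two lists (for example, a 20-50% performance degradation) are treated as "investigate".

```python
from dataclasses import dataclass


@dataclass
class DeployHealth:
    """Post-deploy signals compared against the pre-deploy baseline."""
    error_rate_increase_pct: float  # Assumed: percentage points above baseline
    perf_degradation_pct: float     # Slowdown relative to baseline, in percent
    critical_functionality_broken: bool = False
    data_corruption: bool = False
    security_vulnerability: bool = False
    minor_issues_observed: bool = False  # Non-critical features or edge cases failing


def rollback_decision(health: DeployHealth) -> str:
    """Return "rollback", "investigate", or "ok" for a deployed fix."""
    if (
        health.critical_functionality_broken
        or health.data_corruption
        or health.security_vulnerability
        or health.error_rate_increase_pct > 5
        or health.perf_degradation_pct > 50
    ):
        return "rollback"
    # Assumption: any smaller regression is worth a look, including the
    # 20-50% performance band that the criteria above leave unspecified.
    if (
        health.error_rate_increase_pct > 0
        or health.perf_degradation_pct > 0
        or health.minor_issues_observed
    ):
        return "investigate"
    return "ok"
```

Checking the hard rollback conditions first means a deployment with data corruption is rolled back even if its error rate looks normal.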

Rollback Procedures

1. Application Code Rollback

# Git-based rollback
git revert <commit-hash>
git push origin main

# Or redeploy previous version
git checkout <previous-tag>
./deploy.sh

2. Database Migration Rollback

# Alembic (Python)
alembic downgrade -1

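# Drizzle (TypeScript)
# Note: drizzle-kit has no built-in "down" migration. `drizzle-kit drop` only
# removes a generated migration file; it does not undo schema changes that were
# already applied. To revert those, write and apply a new migration that reverses them.
bun run drizzle-kit drop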

3. Feature Flag Disable

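# Turn the flag off in the LaunchDarkly dashboard or via its API; no deploy is needed.
# Application code stays unchanged: the existing call then returns the "off" variation.
ld_client.variation("new-user-validation", context, default=False)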

4. Cache Invalidation

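# Clear cache after rollback
redis_client.flushdb()  # Clears every key in the current database
# Or selectively (DEL does not expand glob patterns, so iterate with SCAN)
for key in redis_client.scan_iter(match="user:*"):
    redis_client.delete(key)  # Clear user cache only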

Quick Reference

| Error Type | Primary Fix Pattern | Testing Strategy |
|------------|---------------------|------------------|
| Null/Undefined | Null check, optional chaining, raise exception | Unit test with None input |
| Type Mismatch | Pydantic validation, type guards | Unit test with wrong types |
| Database | Try/except with rollback, retries | Integration test with DB |
| API/Integration | Validation at boundary, retries | Mock API responses |
| Performance | Caching, query optimization | Performance benchmark test |

| Fix Type | When to Use | Risk Level |
|----------|-------------|------------|
| Quick Fix | Production incident | Low (isolated change) |
| Proper Fix | Root cause resolution | Medium (broader changes) |
| Comprehensive Fix | Prevention of entire class | Medium-High (architectural) |

Usage: When implementing a fix, generate 2-3 options with trade-offs, select the best option based on priority, validate it with tests, deploy with a gradual rollout, monitor closely, and document the rollback procedure.