gh-greyhaven-ai-claude-code…/skills/smart-debugging/reference/fix-generation-patterns.md

# Fix Generation Patterns

Comprehensive guide to generating, evaluating, and implementing fixes for software bugs.

## Multiple Fix Options Strategy

**Core Principle**: Always generate 2-3 fix options with trade-off analysis.

### Fix Option Template

```markdown
**Option 1: [Name]** (e.g., Quick Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Option 2: [Name]** (e.g., Proper Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Option 3: [Name]** (e.g., Comprehensive Fix)
**Implementation**: [What to change]
**Pros**: [Benefits]
**Cons**: [Drawbacks]
**Effort**: [Time estimate]
**Risk**: [Low/Medium/High]

**Recommendation**: Option [X] because [reasoning]
```

_See [null-pointer-debug-example.md](../examples/null-pointer-debug-example.md) for complete fix options example._

## Quick Fix vs. Proper Fix

### Decision Matrix

| Criteria | Quick Fix | Proper Fix |
|----------|-----------|------------|
| **Urgency** | Production down, immediate relief needed | Incident resolved, addressing root cause |
| **Scope** | Minimal changes, single file | Multiple files, architectural changes |
| **Time** | Minutes to hours | Hours to days |
| **Testing** | Manual verification | Full test coverage required |
| **Risk** | Low (minimal changes) | Medium (broader impact) |
| **Longevity** | Temporary patch | Permanent solution |

### When to Use Quick Fix

✅ **Production incident** - System is down, users impacted
✅ **Known workaround** - Clear, safe mitigation exists
✅ **Low risk** - Change is isolated and reversible
✅ **Follow-up planned** - Proper fix scheduled for next sprint

**Pattern**: Quick fix now → Monitor → Proper fix later

### When to Use Proper Fix

✅ **Root cause addressed** - Not just treating symptoms
✅ **Proper testing** - Comprehensive test coverage added
✅ **Type safety** - Leverages static type checking
✅ **Prevention** - Prevents entire class of similar bugs
✅ **Documentation** - Code is self-documenting

**Pattern**: Understand root cause → Comprehensive fix → Prevent recurrence

## Fix Priority Assessment

### Priority Matrix

| Severity | Frequency | Priority | Response Time |
|----------|-----------|----------|---------------|
| **Critical** | High | P0 | Immediate (< 1 hour) |
| **Critical** | Low | P1 | Same day |
| **Major** | High | P1 | Same day |
| **Major** | Low | P2 | This week |
| **Minor** | High | P2 | This week |
| **Minor** | Low | P3 | Next sprint |

**Severity Criteria**:
- **Critical**: Data loss, security breach, production down
- **Major**: Degraded performance, incorrect results, feature broken
- **Minor**: Edge case, cosmetic issue, rare error

**Frequency Criteria**:
- **High**: Affects >10% of users or happens >10 times/day
- **Low**: Affects <1% of users or happens occasionally

## Common Fix Patterns by Error Type

### Null/Undefined Errors

**Pattern 1: Null Check with Default**
```python
# Before
name = user.name  # NoneType error

# After
name = user.name if user else "Unknown"
```

**Pattern 2: Raise Exception** (API boundaries)
```python
# Before
user = db.users.find_one(user_id)
return user.name  # NoneType error

# After
user = db.users.find_one(user_id)
if user is None:
    raise HTTPException(404, "User not found")
return user.name
```

### Type Errors

**Pattern 1: Type Conversion with Validation**
```python
# Before
total = base_price + discount  # TypeError: int + str

# After
from pydantic import BaseModel

class PriceInput(BaseModel):
    base_price: int
    discount: int  # Automatic validation and conversion

input_data = PriceInput(**request_body)  # Validates types
total = input_data.base_price + input_data.discount
```

### Database Errors

**Pattern 1: Constraint Violations**
```python
# Before
db.add(user)
db.commit()  # IntegrityError: UNIQUE constraint failed

# After
from sqlalchemy.exc import IntegrityError

try:
    db.add(user)
    db.commit()
except IntegrityError:
    db.rollback()
    # Option A: Return error
    raise HTTPException(409, "User with this email already exists")
    # Option B: Upsert
    existing = db.query(User).filter_by(email=user.email).first()
    if existing:
        existing.name = user.name
        db.commit()
```

**Pattern 2: Connection Failures**
```python
# Before
engine = create_engine(DATABASE_URL)
connection = engine.connect()  # OperationalError: connection refused

# After
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def get_connection():
    engine = create_engine(DATABASE_URL)
    return engine.connect()

connection = get_connection()
```

### API/Integration Errors

**Pattern 1: Validation at Boundary**
```python
# Before
response = payment_api.create_charge(amount=order.total)
# Fails with 422 if amount < 50 (API minimum)

# After
class CreateChargeRequest(BaseModel):
    amount: int

    @validator('amount')
    def amount_meets_minimum(cls, v):
        if v < 50:
            raise ValueError('Amount must be at least $0.50')
        return v

# Validate before API call
request = CreateChargeRequest(amount=order.total)  # Fails early
response = payment_api.create_charge(**request.dict())
```

**Pattern 2: Retry with Backoff**
```python
# Before
response = httpx.get(api_url)  # Timeout occasionally

# After
from tenacity import retry, retry_if_exception_type, stop_after_attempt

@retry(
    retry=retry_if_exception_type(httpx.TimeoutException),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def fetch_with_retry(url: str):
    async with httpx.AsyncClient(timeout=5.0) as client:
        return await client.get(url)
```

### Performance Errors

**Pattern 1: N+1 Query Fix**
```python
# Before (N+1 queries)
users = db.query(User).all()  # 1 query
for user in users:
    posts = db.query(Post).filter(Post.user_id == user.id).all()  # N queries

# After (Single query with join)
users = db.query(User).options(
    joinedload(User.posts)
).all()  # 1 query with join
```

**Pattern 2: Caching**
```python
# Before
def get_user_profile(user_id: str):
    return db.query(User).filter_by(id=user_id).first()  # Every time

# After
from functools import lru_cache
from cachetools import TTLCache, cached

cache = TTLCache(maxsize=1000, ttl=300)  # 5 minute TTL

@cached(cache)
def get_user_profile(user_id: str):
    return db.query(User).filter_by(id=user_id).first()
```

## Fix Validation Strategies

### Validation Checklist

```markdown
Before deploying fix:
- [ ] Fix addresses root cause (not just symptoms)
- [ ] Tests added to prevent recurrence
- [ ] Tests pass locally
- [ ] Code reviewed by peer
- [ ] No new linting/type errors
- [ ] Performance impact assessed
- [ ] Security implications reviewed
- [ ] Rollback plan documented
- [ ] Monitoring/alerts updated
```

### Test-Driven Fix Approach

**Pattern**: Write failing test → Implement fix → Test passes

```python
# Step 1: Write failing test
def test_get_user_with_invalid_id_returns_404():
    """Test that invalid user_id returns 404, not 500."""
    response = client.get("/users/invalid-id")
    assert response.status_code == 404
    assert "User not found" in response.json()["detail"]

# Step 2: Run test (should fail with current bug)
# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
# AssertionError: 500 != 404

# Step 3: Implement fix
@app.get("/users/{user_id}")
async def get_user(user_id: str):
    user = await db.users.find_one({"id": user_id})
    if user is None:
        raise HTTPException(404, "User not found")
    return user

# Step 4: Run test (should pass)
# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
# PASSED
```

### Integration Testing

```python
# Test fix with realistic scenario
@pytest.mark.integration
async def test_order_creation_with_negative_total():
    """Integration test: Ensure negative order total is rejected."""
    # Setup
    user = await create_test_user()

    # Attempt to create order with negative total
    response = await client.post("/orders", json={
        "user_id": user.id,
        "items": [],
        "total": -100  # Invalid
    })

    # Assert validation error
    assert response.status_code == 422
    assert "total must be positive" in response.json()["detail"]

    # Verify no order created in database
    orders = await db.orders.find({"user_id": user.id})
    assert len(orders) == 0
```

## Refactoring Considerations

### When to Refactor During Fix

**Refactor if**:
✅ Fix requires understanding convoluted code
✅ Code duplication prevents proper fix
✅ Poor structure makes fix risky
✅ Fix is part of larger architectural improvement

**Don't refactor if**:
❌ Production incident needs immediate fix
❌ Refactoring scope unclear
❌ Tests insufficient to ensure safety
❌ Refactoring can be done separately

### Refactoring Patterns

**Pattern 1: Extract Function**
```python
# Before (hard to fix null error)
def process_order(order_data):
    user = db.users.find_one(order_data["user_id"])
    if user.is_active and user.credits > 0:
        # 50 lines of order processing
        pass

# After (easier to add null check)
def process_order(order_data):
    user = get_validated_user(order_data["user_id"])
    process_order_for_user(user, order_data)

def get_validated_user(user_id: str) -> User:
    """Get user and validate they can place orders."""
    user = db.users.find_one(user_id)
    if user is None:
        raise HTTPException(404, "User not found")
    if not user.is_active:
        raise HTTPException(403, "User account inactive")
    if user.credits <= 0:
        raise HTTPException(402, "Insufficient credits")
    return user
```

## Production Safety

### Pre-Deployment Checklist

```markdown
- [ ] Fix tested in staging environment
- [ ] Performance impact measured (CPU, memory, latency)
- [ ] Database migrations tested with production-sized data
- [ ] Feature flag available for gradual rollout
- [ ] Rollback procedure documented and tested
- [ ] Monitoring dashboard shows relevant metrics
- [ ] Alerts configured for fix-related failures
- [ ] On-call engineer briefed on deployment
- [ ] Communication sent to stakeholders
```

### Gradual Rollout Pattern

```python
# Use feature flag for gradual rollout
from launchdarkly import LDClient

ld_client = LDClient("sdk-key")

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    use_new_validation = ld_client.variation(
        "new-user-validation",
        {"key": user_id},
        default=False
    )

    if use_new_validation:
        # New fix with validation
        user = await get_validated_user(user_id)
    else:
        # Old code (fallback)
        user = await db.users.find_one(user_id)

    return user
```

## Rollback Planning

### Rollback Decision Criteria

**Rollback immediately if**:
- Error rate spikes >5% above baseline
- Critical functionality broken
- Data corruption detected
- Performance degrades >50%
- Security vulnerability introduced

**Monitor and investigate if**:
- Error rate increases <5%
- Non-critical functionality affected
- Performance degrades <20%
- Edge cases failing

### Rollback Procedures

**1. Application Code Rollback**
```bash
# Git-based rollback
git revert <commit-hash>
git push origin main

# Or redeploy previous version
git checkout <previous-tag>
./deploy.sh
```

**2. Database Migration Rollback**
```bash
# Alembic (Python)
alembic downgrade -1

# Drizzle (TypeScript)
bun run drizzle-kit drop --migration <migration-name>
```

**3. Feature Flag Disable**
```python
# Instantly disable via LaunchDarkly dashboard or API
ld_client.variation("new-user-validation", context, default=False)
```

**4. Cache Invalidation**
```python
# Clear cache after rollback
redis_client.flushdb()  # Clear all cache
# Or selectively
redis_client.delete("user:*")  # Clear user cache only
```

## Quick Reference

| Error Type | Primary Fix Pattern | Testing Strategy |
|------------|-------------------|------------------|
| **Null/Undefined** | Null check, optional chaining, raise exception | Unit test with None input |
| **Type Mismatch** | Pydantic validation, type guards | Unit test with wrong types |
| **Database** | Try/except with rollback, retries | Integration test with DB |
| **API/Integration** | Validation at boundary, retries | Mock API responses |
| **Performance** | Caching, query optimization | Performance benchmark test |

| Fix Type | When to Use | Risk Level |
|----------|-------------|------------|
| **Quick Fix** | Production incident | Low (isolated change) |
| **Proper Fix** | Root cause resolution | Medium (broader changes) |
| **Comprehensive Fix** | Prevention of entire class | Medium-High (architectural) |

---

**Usage**: When implementing fix, generate 2-3 options with trade-offs, select best option based on priority, validate with tests, deploy with gradual rollout, monitor closely, document rollback procedure.