Initial commit

2025-11-29 18:29:18 +08:00
commit 46dfc30864
25 changed files with 4683 additions and 0 deletions
--- a/skills/smart-debugging/reference/fix-generation-patterns.md
+++ b/skills/smart-debugging/reference/fix-generation-patterns.md
@@ -0,0 +1,483 @@
+# Fix Generation Patterns
+
+Comprehensive guide to generating, evaluating, and implementing fixes for software bugs.
+
+## Multiple Fix Options Strategy
+
+**Core Principle**: Always generate 2-3 fix options with trade-off analysis.
+
+### Fix Option Template
+
+```markdown
+**Option 1: [Name]** (e.g., Quick Fix)
+**Implementation**: [What to change]
+**Pros**: [Benefits]
+**Cons**: [Drawbacks]
+**Effort**: [Time estimate]
+**Risk**: [Low/Medium/High]
+
+**Option 2: [Name]** (e.g., Proper Fix)
+**Implementation**: [What to change]
+**Pros**: [Benefits]
+**Cons**: [Drawbacks]
+**Effort**: [Time estimate]
+**Risk**: [Low/Medium/High]
+
+**Option 3: [Name]** (e.g., Comprehensive Fix)
+**Implementation**: [What to change]
+**Pros**: [Benefits]
+**Cons**: [Drawbacks]
+**Effort**: [Time estimate]
+**Risk**: [Low/Medium/High]
+
+**Recommendation**: Option [X] because [reasoning]
+```
+
+_See [null-pointer-debug-example.md](../examples/null-pointer-debug-example.md) for complete fix options example._
+
+## Quick Fix vs. Proper Fix
+
+### Decision Matrix
+
+| Criteria | Quick Fix | Proper Fix |
+|----------|-----------|------------|
+| **Urgency** | Production down, immediate relief needed | Incident resolved, addressing root cause |
+| **Scope** | Minimal changes, single file | Multiple files, architectural changes |
+| **Time** | Minutes to hours | Hours to days |
+| **Testing** | Manual verification | Full test coverage required |
+| **Risk** | Low (minimal changes) | Medium (broader impact) |
+| **Longevity** | Temporary patch | Permanent solution |
+
+### When to Use Quick Fix
+
+✅ **Production incident** - System is down, users impacted
+✅ **Known workaround** - Clear, safe mitigation exists
+✅ **Low risk** - Change is isolated and reversible
+✅ **Follow-up planned** - Proper fix scheduled for next sprint
+
+**Pattern**: Quick fix now → Monitor → Proper fix later
+
+### When to Use Proper Fix
+
+✅ **Root cause addressed** - Not just treating symptoms
+✅ **Proper testing** - Comprehensive test coverage added
+✅ **Type safety** - Leverages static type checking
+✅ **Prevention** - Prevents entire class of similar bugs
+✅ **Documentation** - Code is self-documenting
+
+**Pattern**: Understand root cause → Comprehensive fix → Prevent recurrence
+
+## Fix Priority Assessment
+
+### Priority Matrix
+
+| Severity | Frequency | Priority | Response Time |
+|----------|-----------|----------|---------------|
+| **Critical** | High | P0 | Immediate (< 1 hour) |
+| **Critical** | Low | P1 | Same day |
+| **Major** | High | P1 | Same day |
+| **Major** | Low | P2 | This week |
+| **Minor** | High | P2 | This week |
+| **Minor** | Low | P3 | Next sprint |
+
+**Severity Criteria**:
+- **Critical**: Data loss, security breach, production down
+- **Major**: Degraded performance, incorrect results, feature broken
+- **Minor**: Edge case, cosmetic issue, rare error
+
+**Frequency Criteria**:
+- **High**: Affects >10% of users or happens >10 times/day
+- **Low**: Affects <1% of users or happens occasionally
+
+## Common Fix Patterns by Error Type
+
+### Null/Undefined Errors
+
+**Pattern 1: Null Check with Default**
+```python
+# Before
+name = user.name  # NoneType error
+
+# After
+name = user.name if user else "Unknown"
+```
+
+**Pattern 2: Raise Exception** (API boundaries)
+```python
+# Before
+user = db.users.find_one(user_id)
+return user.name  # NoneType error
+
+# After
+user = db.users.find_one(user_id)
+if user is None:
+    raise HTTPException(404, "User not found")
+return user.name
+```
+
+### Type Errors
+
+**Pattern 1: Type Conversion with Validation**
+```python
+# Before
+total = base_price + discount  # TypeError: int + str
+
+# After
+from pydantic import BaseModel
+
+class PriceInput(BaseModel):
+    base_price: int
+    discount: int  # Automatic validation and conversion
+
+input_data = PriceInput(**request_body)  # Validates types
+total = input_data.base_price + input_data.discount
+```
+
+### Database Errors
+
+**Pattern 1: Constraint Violations**
+```python
+# Before
+db.add(user)
+db.commit()  # IntegrityError: UNIQUE constraint failed
+
+# After
+from sqlalchemy.exc import IntegrityError
+
+try:
+    db.add(user)
+    db.commit()
+except IntegrityError:
+    db.rollback()
+    # Option A: Return error
+    raise HTTPException(409, "User with this email already exists")
+    # Option B: Upsert
+    existing = db.query(User).filter_by(email=user.email).first()
+    if existing:
+        existing.name = user.name
+        db.commit()
+```
+
+**Pattern 2: Connection Failures**
+```python
+# Before
+engine = create_engine(DATABASE_URL)
+connection = engine.connect()  # OperationalError: connection refused
+
+# After
+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=2, max=10)
+)
+def get_connection():
+    engine = create_engine(DATABASE_URL)
+    return engine.connect()
+
+connection = get_connection()
+```
+
+### API/Integration Errors
+
+**Pattern 1: Validation at Boundary**
+```python
+# Before
+response = payment_api.create_charge(amount=order.total)
+# Fails with 422 if amount < 50 (API minimum)
+
+# After
+class CreateChargeRequest(BaseModel):
+    amount: int
+
+    @validator('amount')
+    def amount_meets_minimum(cls, v):
+        if v < 50:
+            raise ValueError('Amount must be at least $0.50')
+        return v
+
+# Validate before API call
+request = CreateChargeRequest(amount=order.total)  # Fails early
+response = payment_api.create_charge(**request.dict())
+```
+
+**Pattern 2: Retry with Backoff**
+```python
+# Before
+response = httpx.get(api_url)  # Timeout occasionally
+
+# After
+from tenacity import retry, retry_if_exception_type, stop_after_attempt
+
+@retry(
+    retry=retry_if_exception_type(httpx.TimeoutException),
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=2, max=10)
+)
+async def fetch_with_retry(url: str):
+    async with httpx.AsyncClient(timeout=5.0) as client:
+        return await client.get(url)
+```
+
+### Performance Errors
+
+**Pattern 1: N+1 Query Fix**
+```python
+# Before (N+1 queries)
+users = db.query(User).all()  # 1 query
+for user in users:
+    posts = db.query(Post).filter(Post.user_id == user.id).all()  # N queries
+
+# After (Single query with join)
+users = db.query(User).options(
+    joinedload(User.posts)
+).all()  # 1 query with join
+```
+
+**Pattern 2: Caching**
+```python
+# Before
+def get_user_profile(user_id: str):
+    return db.query(User).filter_by(id=user_id).first()  # Every time
+
+# After
+from functools import lru_cache
+from cachetools import TTLCache, cached
+
+cache = TTLCache(maxsize=1000, ttl=300)  # 5 minute TTL
+
+@cached(cache)
+def get_user_profile(user_id: str):
+    return db.query(User).filter_by(id=user_id).first()
+```
+
+## Fix Validation Strategies
+
+### Validation Checklist
+
+```markdown
+Before deploying fix:
+- [ ] Fix addresses root cause (not just symptoms)
+- [ ] Tests added to prevent recurrence
+- [ ] Tests pass locally
+- [ ] Code reviewed by peer
+- [ ] No new linting/type errors
+- [ ] Performance impact assessed
+- [ ] Security implications reviewed
+- [ ] Rollback plan documented
+- [ ] Monitoring/alerts updated
+```
+
+### Test-Driven Fix Approach
+
+**Pattern**: Write failing test → Implement fix → Test passes
+
+```python
+# Step 1: Write failing test
+def test_get_user_with_invalid_id_returns_404():
+    """Test that invalid user_id returns 404, not 500."""
+    response = client.get("/users/invalid-id")
+    assert response.status_code == 404
+    assert "User not found" in response.json()["detail"]
+
+# Step 2: Run test (should fail with current bug)
+# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
+# AssertionError: 500 != 404
+
+# Step 3: Implement fix
+@app.get("/users/{user_id}")
+async def get_user(user_id: str):
+    user = await db.users.find_one({"id": user_id})
+    if user is None:
+        raise HTTPException(404, "User not found")
+    return user
+
+# Step 4: Run test (should pass)
+# pytest tests/test_users.py::test_get_user_with_invalid_id_returns_404
+# PASSED
+```
+
+### Integration Testing
+
+```python
+# Test fix with realistic scenario
+@pytest.mark.integration
+async def test_order_creation_with_negative_total():
+    """Integration test: Ensure negative order total is rejected."""
+    # Setup
+    user = await create_test_user()
+
+    # Attempt to create order with negative total
+    response = await client.post("/orders", json={
+        "user_id": user.id,
+        "items": [],
+        "total": -100  # Invalid
+    })
+
+    # Assert validation error
+    assert response.status_code == 422
+    assert "total must be positive" in response.json()["detail"]
+
+    # Verify no order created in database
+    orders = await db.orders.find({"user_id": user.id})
+    assert len(orders) == 0
+```
+
+## Refactoring Considerations
+
+### When to Refactor During Fix
+
+**Refactor if**:
+✅ Fix requires understanding convoluted code
+✅ Code duplication prevents proper fix
+✅ Poor structure makes fix risky
+✅ Fix is part of larger architectural improvement
+
+**Don't refactor if**:
+❌ Production incident needs immediate fix
+❌ Refactoring scope unclear
+❌ Tests insufficient to ensure safety
+❌ Refactoring can be done separately
+
+### Refactoring Patterns
+
+**Pattern 1: Extract Function**
+```python
+# Before (hard to fix null error)
+def process_order(order_data):
+    user = db.users.find_one(order_data["user_id"])
+    if user.is_active and user.credits > 0:
+        # 50 lines of order processing
+        pass
+
+# After (easier to add null check)
+def process_order(order_data):
+    user = get_validated_user(order_data["user_id"])
+    process_order_for_user(user, order_data)
+
+def get_validated_user(user_id: str) -> User:
+    """Get user and validate they can place orders."""
+    user = db.users.find_one(user_id)
+    if user is None:
+        raise HTTPException(404, "User not found")
+    if not user.is_active:
+        raise HTTPException(403, "User account inactive")
+    if user.credits <= 0:
+        raise HTTPException(402, "Insufficient credits")
+    return user
+```
+
+## Production Safety
+
+### Pre-Deployment Checklist
+
+```markdown
+- [ ] Fix tested in staging environment
+- [ ] Performance impact measured (CPU, memory, latency)
+- [ ] Database migrations tested with production-sized data
+- [ ] Feature flag available for gradual rollout
+- [ ] Rollback procedure documented and tested
+- [ ] Monitoring dashboard shows relevant metrics
+- [ ] Alerts configured for fix-related failures
+- [ ] On-call engineer briefed on deployment
+- [ ] Communication sent to stakeholders
+```
+
+### Gradual Rollout Pattern
+
+```python
+# Use feature flag for gradual rollout
+from launchdarkly import LDClient
+
+ld_client = LDClient("sdk-key")
+
+@app.get("/users/{user_id}")
+async def get_user(user_id: str):
+    use_new_validation = ld_client.variation(
+        "new-user-validation",
+        {"key": user_id},
+        default=False
+    )
+
+    if use_new_validation:
+        # New fix with validation
+        user = await get_validated_user(user_id)
+    else:
+        # Old code (fallback)
+        user = await db.users.find_one(user_id)
+
+    return user
+```
+
+## Rollback Planning
+
+### Rollback Decision Criteria
+
+**Rollback immediately if**:
+- Error rate spikes >5% above baseline
+- Critical functionality broken
+- Data corruption detected
+- Performance degrades >50%
+- Security vulnerability introduced
+
+**Monitor and investigate if**:
+- Error rate increases <5%
+- Non-critical functionality affected
+- Performance degrades <20%
+- Edge cases failing
+
+### Rollback Procedures
+
+**1. Application Code Rollback**
+```bash
+# Git-based rollback
+git revert <commit-hash>
+git push origin main
+
+# Or redeploy previous version
+git checkout <previous-tag>
+./deploy.sh
+```
+
+**2. Database Migration Rollback**
+```bash
+# Alembic (Python)
+alembic downgrade -1
+
+# Drizzle (TypeScript)
+bun run drizzle-kit drop --migration <migration-name>
+```
+
+**3. Feature Flag Disable**
+```python
+# Instantly disable via LaunchDarkly dashboard or API
+ld_client.variation("new-user-validation", context, default=False)
+```
+
+**4. Cache Invalidation**
+```python
+# Clear cache after rollback
+redis_client.flushdb()  # Clear all cache
+# Or selectively
+redis_client.delete("user:*")  # Clear user cache only
+```
+
+## Quick Reference
+
+| Error Type | Primary Fix Pattern | Testing Strategy |
+|------------|-------------------|------------------|
+| **Null/Undefined** | Null check, optional chaining, raise exception | Unit test with None input |
+| **Type Mismatch** | Pydantic validation, type guards | Unit test with wrong types |
+| **Database** | Try/except with rollback, retries | Integration test with DB |
+| **API/Integration** | Validation at boundary, retries | Mock API responses |
+| **Performance** | Caching, query optimization | Performance benchmark test |
+
+| Fix Type | When to Use | Risk Level |
+|----------|-------------|------------|
+| **Quick Fix** | Production incident | Low (isolated change) |
+| **Proper Fix** | Root cause resolution | Medium (broader changes) |
+| **Comprehensive Fix** | Prevention of entire class | Medium-High (architectural) |
+
+---
+
+**Usage**: When implementing fix, generate 2-3 options with trade-offs, select best option based on priority, validate with tests, deploy with gradual rollout, monitor closely, document rollback procedure.