gh-greyhaven-ai-claude-code…/skills/smart-debugging/examples/null-pointer-debug-example.md
2025-11-29 18:29:18 +08:00

Null Pointer Debug Example

Complete walkthrough of debugging a NoneType AttributeError using the smart-debug systematic methodology.

Error Encountered

  • Environment: Production
  • Severity: SEV2 (degraded service, user profile pages failing)
  • Frequency: 127 occurrences in the last 24 hours
  • First Occurrence: 2025-01-16 14:23:00 UTC

Error Message

AttributeError: 'NoneType' object has no attribute 'name'

User Report

"When I click on a user's profile after they've deleted their account, the page crashes with a 500 error instead of showing a 'User not found' message."

Phase 1: Triage (3 minutes)

Severity Assessment:

  • Not production down (SEV1)
  • Affects specific user workflow (profile viewing)
  • 127 occurrences = moderate frequency
  • Decision: SEV2 - Proceed with full smart-debug workflow

Error Category: Runtime Exception (NoneType error)

Phase 2: Stack Trace Analysis

Full Stack Trace

Traceback (most recent call last):
  File "/app/api/users.py", line 42, in get_user_profile
    return {"name": user.name, "email": user.email}
AttributeError: 'NoneType' object has no attribute 'name'

Pattern Match

  • Pattern: null_pointer
  • Indicators: 'NoneType' object has no attribute
  • Likely Cause: Accessing property on None value - check for null/undefined
  • Fix Template: Add null check before access
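The lookup above can be sketched as a small pattern catalogue. This is a minimal illustration, not smart-debug's actual implementation: the `ErrorPattern` class, the `PATTERNS` list, and `match_pattern()` are hypothetical names, and only the `null_pointer` entry is taken from this walkthrough.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorPattern:
    """One catalogue entry (hypothetical structure)."""
    name: str
    indicator: re.Pattern
    likely_cause: str
    fix_template: str


# Only the null_pointer entry comes from the walkthrough; a real catalogue
# would presumably hold many more patterns.
PATTERNS = [
    ErrorPattern(
        name="null_pointer",
        indicator=re.compile(r"'NoneType' object has no attribute"),
        likely_cause="Accessing property on None value",
        fix_template="Add null check before access",
    ),
]


def match_pattern(error_message: str) -> Optional[ErrorPattern]:
    """Return the first catalogued pattern whose indicator appears in the message."""
    for pattern in PATTERNS:
        if pattern.indicator.search(error_message):
            return pattern
    return None
```

Calling `match_pattern("AttributeError: 'NoneType' object has no attribute 'name'")` returns the `null_pointer` entry, while an unrelated message returns `None`.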

Phase 3: Code Inspection

Problematic Code (api/users.py:42)

@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
    """Get user profile information."""
    user = await db.users.find_one({"id": user_id})

    # Line 42 - THE PROBLEM
    return {
        "name": user.name,
        "email": user.email,
        "created_at": user.created_at
    }

Issue Identified: No null check! If find_one() returns None (user doesn't exist), accessing user.name causes AttributeError.

Root Cause

When a user deletes their account:

  1. User record is deleted from database
  2. But references to that user ID may still exist (links, mentions, etc.)
  3. When someone tries to view the deleted user's profile, find_one() returns None
  4. Code tries to access None.name → AttributeError
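The four steps above can be reproduced without a database or web framework. The sketch below uses a hypothetical in-memory `FakeUsers` class in place of `db.users`, and the email address is made up; it only demonstrates that a lookup returning None followed by attribute access produces the error seen in production.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    name: str
    email: str


class FakeUsers:
    """In-memory stand-in for db.users (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, User] = {}

    def insert_one(self, user_id: str, user: User) -> None:
        self._rows[user_id] = user

    def delete_one(self, user_id: str) -> None:
        self._rows.pop(user_id, None)

    def find_one(self, user_id: str) -> Optional[User]:
        # Like the real query, a missing record yields None rather than raising.
        return self._rows.get(user_id)


def reproduce() -> str:
    """Walk through the failure and return the resulting error message."""
    users = FakeUsers()
    users.insert_one("usr_deleted_123", User("Deleted User", "deleted@example.com"))
    users.delete_one("usr_deleted_123")       # step 1: record is removed
    user = users.find_one("usr_deleted_123")  # step 3: lookup returns None
    try:
        return user.name                      # step 4: attribute access on None
    except AttributeError as exc:
        return str(exc)
```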

Phase 4: Reproduce Locally

Test Case (Failing)

# tests/test_user_api.py
import pytest
from fastapi.testclient import TestClient

def test_get_deleted_user_profile(client: TestClient, db):
    """Test viewing profile of deleted user."""
    # Setup: Create and then delete a user
    user_id = "usr_deleted_123"
    db.users.insert_one({"id": user_id, "name": "Deleted User"})
    db.users.delete_one({"id": user_id})

    # Action: Try to get profile of deleted user
    response = client.get(f"/users/{user_id}/profile")

    # Expected: 404 Not Found, not 500 Internal Server Error
    assert response.status_code == 404
    assert response.json() == {"detail": f"User {user_id} not found"}

Run Test (Fails as Expected)

$ pytest tests/test_user_api.py::test_get_deleted_user_profile -v

tests/test_user_api.py::test_get_deleted_user_profile FAILED

E   assert 500 == 404
E    +  where 500 = <Response [500 Internal Server Error]>.status_code

Reproduction Successful - Test reliably reproduces the bug.

Phase 5: Fix Generation

Option 1: Quick Fix (Return Early)

@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
    """Get user profile information."""
    user = await db.users.find_one({"id": user_id})

    # Quick fix: Return early if user not found
    if user is None:
        raise HTTPException(status_code=404, detail=f"User {user_id} not found")

    return {
        "name": user.name,
        "email": user.email,
        "created_at": user.created_at
    }

Pros: Simple, fixes the immediate issue
Cons: Doesn't prevent similar issues elsewhere

Option 2: Robust Fix (Custom Exception)

# models/exceptions.py
class UserNotFoundError(Exception):
    """Raised when user is not found in database."""
    def __init__(self, user_id: str):
        self.user_id = user_id
        super().__init__(f"User {user_id} not found")

# api/users.py
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
    """Get user profile information."""
    user = await db.users.find_one({"id": user_id})

    if user is None:
        raise UserNotFoundError(user_id)

    return {
        "name": user.name,
        "email": user.email,
        "created_at": user.created_at
    }

# Global exception handler (registered on the FastAPI app)
from fastapi.responses import JSONResponse

@app.exception_handler(UserNotFoundError)
async def user_not_found_handler(request, exc):
    return JSONResponse(
        status_code=404,
        content={"detail": str(exc)}
    )

Pros: Reusable, type-safe, better error handling
Cons: More boilerplate
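To see the idea behind Option 2 without running a web server, the sketch below mimics the exception-to-response mapping in plain Python. The `HANDLERS` registry and `dispatch()` function are hypothetical stand-ins for what the framework's exception-handler registration does; they are not FastAPI APIs.

```python
from typing import Callable, Dict, Tuple, Type

Response = Tuple[int, dict]  # (status code, JSON body)


class UserNotFoundError(Exception):
    """Raised when user is not found in database."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        super().__init__(f"User {user_id} not found")


# Hypothetical registry playing the role of the framework's handler table.
HANDLERS: Dict[Type[Exception], Callable[[Exception], Response]] = {
    UserNotFoundError: lambda exc: (404, {"detail": str(exc)}),
}


def dispatch(endpoint: Callable[[], dict]) -> Response:
    """Run an endpoint and translate registered exceptions into responses."""
    try:
        return 200, endpoint()
    except Exception as exc:
        for exc_type, handler in HANDLERS.items():
            if isinstance(exc, exc_type):
                return handler(exc)
        # Anything unregistered falls through to a generic 500.
        return 500, {"detail": "Internal Server Error"}


def deleted_profile() -> dict:
    """Endpoint body that simulates looking up a deleted user."""
    raise UserNotFoundError("usr_deleted_123")
```

Here `dispatch(deleted_profile)` yields a 404 with the "not found" detail, while an unhandled AttributeError would still surface as a 500, which is the behavior the original bug exhibited.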

Option 3: Best Practice (Helper Function)

# services/user_service.py
async def get_user_or_404(user_id: str) -> User:
    """Get user by ID or raise 404."""
    user = await db.users.find_one({"id": user_id})
    if user is None:
        raise HTTPException(status_code=404, detail=f"User {user_id} not found")
    return user

# api/users.py
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
    """Get user profile information."""
    user = await get_user_or_404(user_id)  # Handles None case

    return {
        "name": user.name,
        "email": user.email,
        "created_at": user.created_at
    }

Pros: Reusable pattern, prevents similar bugs, clean API code
Cons: Slightly more abstraction

Selected: Option 3 - Best practice with reusable helper

Phase 6: Apply Fix

# services/user_service.py
from fastapi import HTTPException

async def get_user_or_404(user_id: str) -> User:
    """
    Get user by ID or raise 404 Not Found.

    Args:
        user_id: User ID to retrieve

    Returns:
        User object

    Raises:
        HTTPException: 404 if user not found

    Example:
        user = await get_user_or_404("usr_123")
    """
    user = await db.users.find_one({"id": user_id})
    if user is None:
        raise HTTPException(
            status_code=404,
            detail=f"User {user_id} not found"
        )
    return user

# api/users.py (FIXED)
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
    """Get user profile information."""
    user = await get_user_or_404(user_id)  # Null-safe!

    return {
        "name": user.name,
        "email": user.email,
        "created_at": user.created_at
    }

Phase 7: Verify Fix

Run Test Again (Should Pass)

$ pytest tests/test_user_api.py::test_get_deleted_user_profile -v

tests/test_user_api.py::test_get_deleted_user_profile PASSED ✅

Run Full Test Suite

$ pytest tests/ -v

======================== 127 passed in 4.23s ========================

All tests pass - No regressions introduced

Manual Verification

# Test in staging
$ curl -i https://api-staging.example.com/users/usr_deleted_123/profile

HTTP/1.1 404 Not Found
{
  "detail": "User usr_deleted_123 not found"
}

Fix verified - Returns proper 404 instead of 500

Phase 8: Root Cause Analysis (5 Whys)

Why 1: Why did the API return a 500 error? → The code tried to access user.name on None

Why 2: Why was user None? → The database query returned None (the user doesn't exist)

Why 3: Why didn't the code handle the None case? → There was no null check before accessing user properties

Why 4: Why was there no null check? → The developer assumed the user would always exist (an invalid assumption)

Why 5: Why was the invalid assumption made? → No code review guideline requires null checks for database queries

ROOT CAUSE: Missing null-safety pattern for database queries + no enforcement in code review

Phase 9: Prevention Strategy

Immediate Prevention

Unit test added (prevents this specific bug)

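def test_get_deleted_user_profile(client, db):
    """Regression test: deleted users must yield 404 (full setup shown in Phase 4)."""
    response = client.get("/users/usr_deleted_123/profile")
    assert response.status_code == 404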

Short-term Prevention

Integration test added (prevents similar bugs)

@pytest.mark.parametrize("endpoint", [
    "/users/{id}/profile",
    "/users/{id}/settings",
    "/users/{id}/posts"
])
def test_user_endpoints_return_404_for_deleted_users(client, db, endpoint):
    """All user endpoints should return 404 for deleted users."""
    user_id = create_and_delete_user(db)
    response = client.get(endpoint.format(id=user_id))
    assert response.status_code == 404
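The parametrized test relies on a `create_and_delete_user()` helper that the walkthrough does not show. One possible shape is sketched below, together with a hypothetical in-memory `FakeDB` so the helper can be exercised on its own; the real fixture and the ID format are assumptions.

```python
import uuid
from typing import Dict, List, Optional


class FakeCollection:
    """Minimal in-memory collection offering the three calls the tests use."""

    def __init__(self) -> None:
        self._docs: List[Dict] = []

    def insert_one(self, doc: Dict) -> None:
        self._docs.append(dict(doc))

    def delete_one(self, query: Dict) -> None:
        # Remove the first document matching every key in the query.
        for i, doc in enumerate(self._docs):
            if all(doc.get(k) == v for k, v in query.items()):
                del self._docs[i]
                return

    def find_one(self, query: Dict) -> Optional[Dict]:
        for doc in self._docs:
            if all(doc.get(k) == v for k, v in query.items()):
                return doc
        return None


class FakeDB:
    """Hypothetical stand-in for the `db` fixture."""

    def __init__(self) -> None:
        self.users = FakeCollection()


def create_and_delete_user(db) -> str:
    """Insert a throwaway user, delete it again, and return its now-dangling ID."""
    user_id = f"usr_{uuid.uuid4().hex[:8]}"
    db.users.insert_one({"id": user_id, "name": "Deleted User"})
    db.users.delete_one({"id": user_id})
    return user_id
```

Generating a fresh ID per call keeps the parametrized cases independent of each other.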

Long-term Prevention

Architecture change proposed: Introduce a generic get_or_404() pattern on a shared base service

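# services/base_service.py
from typing import Generic, TypeVar

from fastapi import HTTPException

T = TypeVar('T')

class BaseService(Generic[T]):
    """Base service with null-safe query methods.

    Subclasses must provide an async find_one(query) that returns T or None.
    """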

    async def get_or_404(
        self,
        resource_id: str,
        resource_type: str = "Resource"
    ) -> T:
        """Get resource by ID or raise 404."""
        resource = await self.find_one({"id": resource_id})
        if resource is None:
            raise HTTPException(
                status_code=404,
                detail=f"{resource_type} {resource_id} not found"
            )
        return resource

# Usage across all resources (each service is assumed to subclass BaseService)
user_service = UserService()
post_service = PostService()
comment_service = CommentService()

user = await user_service.get_or_404(user_id, "User")
post = await post_service.get_or_404(post_id, "Post")
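A framework-free version of the same pattern can be run directly to check the behavior. In this sketch `NotFound` stands in for the 404 HTTPException and `InMemoryService` replaces the database-backed service; both are hypothetical names used only for illustration, and `find_one()` takes an ID rather than a query dict to keep the example short.

```python
import asyncio
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class NotFound(Exception):
    """Stand-in for an HTTP 404 exception."""

    def __init__(self, detail: str):
        self.status_code = 404
        self.detail = detail
        super().__init__(detail)


class InMemoryService(Generic[T]):
    """Hypothetical service backed by a dict instead of a database."""

    def __init__(self, rows: Dict[str, T]):
        self._rows = rows

    async def find_one(self, resource_id: str) -> Optional[T]:
        return self._rows.get(resource_id)

    async def get_or_404(self, resource_id: str, resource_type: str = "Resource") -> T:
        """Return the resource, or raise NotFound so callers never see None."""
        resource = await self.find_one(resource_id)
        if resource is None:
            raise NotFound(f"{resource_type} {resource_id} not found")
        return resource


async def demo() -> str:
    users: InMemoryService[str] = InMemoryService({"usr_123": "Ada"})
    found = await users.get_or_404("usr_123", "User")
    try:
        await users.get_or_404("usr_deleted_123", "User")
    except NotFound as exc:
        return f"{found}; {exc.status_code}: {exc.detail}"
    return found
```

Because `get_or_404()` is annotated as returning `T` rather than `Optional[T]`, callers get a value that a type checker treats as non-None.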

Monitoring Added

Alert created (detects recurrence)

# prometheus/alerts/user_not_found.yml
groups:
  - name: user_api
    rules:
    - alert: HighUserNotFoundRate
      expr: |
        rate(http_requests_total{
          endpoint="/users/:id/profile",
          status_code="404"
        }[5m]) > 10
      for: 5m
      annotations:
        summary: "High rate of user not found errors"
        description: "{{ $value }} 404s/sec on user profile endpoint"
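The threshold in this rule is a per-second rate, so it is worth checking what it means in absolute numbers. The sketch below is a simplified approximation of the arithmetic: it takes the counter increase across the window and divides by the window length, ignoring the counter-reset handling and boundary extrapolation that Prometheus's `rate()` performs. The function names are illustrative.

```python
def per_second_rate(first_count: float, last_count: float, window_seconds: float) -> float:
    """Average per-second increase of a counter across a window.

    Simplification: ignores the counter resets and boundary extrapolation
    that Prometheus's rate() accounts for.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last_count - first_count) / window_seconds


def should_fire(first_count: float, last_count: float,
                window_seconds: float = 300, threshold: float = 10) -> bool:
    """True when the approximate rate over the window exceeds the threshold."""
    return per_second_rate(first_count, last_count, window_seconds) > threshold
```

Under this approximation, 3,600 responses with status 404 in five minutes is 12 per second and exceeds the threshold, while 3,000 is exactly 10 per second and does not. The `for: 5m` clause additionally requires the condition to hold continuously for five minutes before the alert fires.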

Documentation Updated

Runbook created

# Runbook: User Not Found Errors

## Symptom
404 errors when accessing user profiles

## Diagnosis
- Check if user was recently deleted
- Verify database replication lag
- Check for stale cache entries

## Resolution
- User deleted: Expected behavior
- Replication lag: Wait 30 seconds
- Stale cache: Clear user cache

## Prevention
Always use `get_user_or_404()` helper

Phase 10: Deploy & Monitor

Pre-Deployment Checklist

  • Fix tested in staging
  • No performance impact
  • Security review not needed (defensive fix)
  • Deployment plan created
  • Rollback plan ready

Deployment

# Deploy to staging
$ git push origin feature/fix-user-not-found
$ ./scripts/deploy-staging.sh

# Verify in staging (1 hour)
$ ./scripts/monitor-staging.sh --duration 1h

# Deploy to production (gradual rollout)
$ ./scripts/deploy-production.sh --canary 10%  # 10% traffic
$ sleep 600  # Monitor for 10 minutes
$ ./scripts/deploy-production.sh --canary 50%  # 50% traffic
$ sleep 600
$ ./scripts/deploy-production.sh --canary 100% # Full traffic
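The staged rollout above can also be expressed as a loop with an explicit health gate. This is a sketch under assumptions: the `deploy`, `is_healthy`, and `rollback` callables are hypothetical hooks (for example, wrappers around the deploy script and a metrics check), and the automatic rollback is an addition that the shell commands leave to a human.

```python
import time
from typing import Callable, Iterable


def gradual_rollout(
    deploy: Callable[[int], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
    steps: Iterable[int] = (10, 50, 100),
    soak_seconds: float = 600,
) -> bool:
    """Shift traffic step by step; roll back and stop on the first failed health check."""
    for percent in steps:
        deploy(percent)
        if percent < 100:
            time.sleep(soak_seconds)  # mirrors the `sleep 600` between stages
        if not is_healthy():
            rollback()
            return False
    return True
```

Passing `soak_seconds=0` makes the function convenient to exercise in a test without waiting.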

Post-Deployment Monitoring

1 Hour Post-Deploy:

# Check error logs
$ kubectl logs -l app=api --since=1h | grep "User.*not found"
# No unexpected errors ✅

# Check error rate
$ curl -G 'http://prometheus/api/v1/query' --data-urlencode 'query=rate(http_errors_total[1h])'
# No increase in error rate ✅

24 Hours Post-Deploy:

# Verify the 500 rate on the profile endpoint is zero (404s for deleted users are now expected)
$ curl -G 'http://prometheus/api/v1/query' --data-urlencode 'query=rate(http_requests_total{status_code="500",endpoint="/users/:id/profile"}[24h])'
# Result: 0 errors ✅

Summary

| Metric | Value |
|---|---|
| Time to Reproduce | 5 minutes |
| Time to Fix | 15 minutes |
| Time to Deploy | 30 minutes |
| Total Time | 50 minutes |
| Tests Added | 2 (unit + integration) |
| Prevention Strategies | 3 (tests, architecture, monitoring) |
| Recurrences | 0 (monitored for 1 week) |

Lessons Learned

What Went Well

  1. Clear stack trace made root cause obvious
  2. Test-driven debugging caught the issue immediately
  3. Helper function prevents similar bugs across codebase

What Could Be Improved

  1. Should have had null-safety pattern from the start
  2. Code review should catch missing null checks
  3. Static analysis could detect this pattern

Recommendations

  1. Add mypy or similar for null-safety checking
  2. Update code review checklist to include null-safety checks
  3. Create linter rule: "Database queries must use get_or_404 pattern"
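As an illustration of the first recommendation, annotating lookups as `Optional` lets a static type checker such as mypy report attribute access on a possibly-None value before the code ships. The sketch below is hypothetical: `find_user()`, the `_USERS` dict, and the sample data are stand-ins for the real query layer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    name: str
    email: str


# Made-up sample data standing in for the database.
_USERS: Dict[str, User] = {"usr_123": User("Ada", "ada@example.com")}


def find_user(user_id: str) -> Optional[User]:
    """Annotating the return as Optional tells the type checker None is possible."""
    return _USERS.get(user_id)


def display_name_unsafe(user_id: str) -> str:
    user = find_user(user_id)
    return user.name  # a type checker should flag this: user may be None


def display_name(user_id: str) -> str:
    user = find_user(user_id)
    if user is None:  # narrowing: below this line user is a User
        return "User not found"
    return user.name
```

At runtime `display_name_unsafe("usr_deleted_123")` raises the same AttributeError as the production bug, whereas `display_name()` handles the missing user explicitly.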

Bug Fixed: ✅ Tests Pass: ✅ Prevention Implemented: ✅ Production Stable: ✅