12 KiB
Null Pointer Debug Example
Complete walkthrough of debugging a NoneType AttributeError using smart-debug systematic methodology.
Error Encountered
Environment: Production Severity: SEV2 (Degraded service - user profile pages failing) Frequency: 127 occurrences in last 24 hours First Occurrence: 2025-01-16 14:23:00 UTC
Error Message
AttributeError: 'NoneType' object has no attribute 'name'
User Report
"When I click on a user's profile after they've deleted their account, the page crashes with a 500 error instead of showing a 'User not found' message."
Phase 1: Triage (3 minutes)
Severity Assessment:
- Not production down (SEV1)
- Affects specific user workflow (profile viewing)
- 127 occurrences = moderate frequency
- Decision: SEV2 - Proceed with full smart-debug workflow
Error Category: Runtime Exception (NoneType error)
Phase 2: Stack Trace Analysis
Full Stack Trace
Traceback (most recent call last):
File "/app/api/users.py", line 42, in get_user_profile
return {"name": user.name, "email": user.email}
File "/app/models/user.py", line 89, in name
return self._name.upper()
AttributeError: 'NoneType' object has no attribute 'name'
Pattern Match
Pattern: null_pointer
Indicators: 'NoneType' object has no attribute
Likely Cause: Accessing property on None value - check for null/undefined
Fix Template: Add null check before access
Phase 3: Code Inspection
Problematic Code (api/users.py:42)
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
"""Get user profile information."""
user = await db.users.find_one({"id": user_id})
# Line 42 - THE PROBLEM
return {
"name": user.name,
"email": user.email,
"created_at": user.created_at
}
Issue Identified: No null check! If find_one() returns None (user doesn't exist), accessing user.name causes AttributeError.
Root Cause
When a user deletes their account:
- User record is deleted from database
- But references to that user ID may still exist (links, mentions, etc.)
- When someone tries to view the deleted user's profile,
find_one()returnsNone - Code tries to access
None.name→ AttributeError
Phase 4: Reproduce Locally
Test Case (Failing)
# tests/test_user_api.py
import pytest
from fastapi.testclient import TestClient
def test_get_deleted_user_profile(client: TestClient, db):
"""Test viewing profile of deleted user."""
# Setup: Create and then delete a user
user_id = "usr_deleted_123"
db.users.insert_one({"id": user_id, "name": "Deleted User"})
db.users.delete_one({"id": user_id})
# Action: Try to get profile of deleted user
response = client.get(f"/users/{user_id}/profile")
# Expected: 404 Not Found, not 500 Internal Server Error
assert response.status_code == 404
assert response.json() == {"detail": f"User {user_id} not found"}
Run Test (Fails as Expected)
$ pytest tests/test_user_api.py::test_get_deleted_user_profile -v
tests/test_user_api.py::test_get_deleted_user_profile FAILED
E assert 500 == 404
E + where 500 = <Response [500 Internal Server Error]>.status_code
✅ Reproduction Successful - Test reliably reproduces the bug.
Phase 5: Fix Generation
Option 1: Quick Fix (Return Early)
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
"""Get user profile information."""
user = await db.users.find_one({"id": user_id})
# Quick fix: Return early if user not found
if user is None:
raise HTTPException(status_code=404, detail=f"User {user_id} not found")
return {
"name": user.name,
"email": user.email,
"created_at": user.created_at
}
Pros: Simple, fixes the immediate issue Cons: Doesn't prevent similar issues elsewhere
Option 2: Robust Fix (Custom Exception)
# models/exceptions.py
class UserNotFoundError(Exception):
"""Raised when user is not found in database."""
def __init__(self, user_id: str):
self.user_id = user_id
super().__init__(f"User {user_id} not found")
# api/users.py
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
"""Get user profile information."""
user = await db.users.find_one({"id": user_id})
if user is None:
raise UserNotFoundError(user_id)
return {
"name": user.name,
"email": user.email,
"created_at": user.created_at
}
# Global exception handler
@app.exception_handler(UserNotFoundError)
async def user_not_found_handler(request, exc):
return JSONResponse(
status_code=404,
content={"detail": str(exc)}
)
Pros: Reusable, type-safe, better error handling Cons: More boilerplate
Option 3: Best Practice (Helper Function)
# services/user_service.py
async def get_user_or_404(user_id: str) -> User:
"""Get user by ID or raise 404."""
user = await db.users.find_one({"id": user_id})
if user is None:
raise HTTPException(status_code=404, detail=f"User {user_id} not found")
return user
# api/users.py
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
"""Get user profile information."""
user = await get_user_or_404(user_id) # Handles None case
return {
"name": user.name,
"email": user.email,
"created_at": user.created_at
}
Pros: Reusable pattern, prevents similar bugs, clean API code Cons: Slightly more abstraction
Selected: Option 3 - Best practice with reusable helper
Phase 6: Apply Fix
# services/user_service.py
from fastapi import HTTPException
async def get_user_or_404(user_id: str) -> User:
"""
Get user by ID or raise 404 Not Found.
Args:
user_id: User ID to retrieve
Returns:
User object
Raises:
HTTPException: 404 if user not found
Example:
user = await get_user_or_404("usr_123")
"""
user = await db.users.find_one({"id": user_id})
if user is None:
raise HTTPException(
status_code=404,
detail=f"User {user_id} not found"
)
return user
# api/users.py (FIXED)
@router.get("/users/{user_id}/profile")
async def get_user_profile(user_id: str):
"""Get user profile information."""
user = await get_user_or_404(user_id) # Null-safe!
return {
"name": user.name,
"email": user.email,
"created_at": user.created_at
}
Phase 7: Verify Fix
Run Test Again (Should Pass)
$ pytest tests/test_user_api.py::test_get_deleted_user_profile -v
tests/test_user_api.py::test_get_deleted_user_profile PASSED ✅
Run Full Test Suite
$ pytest tests/ -v
======================== 127 passed in 4.23s ========================
✅ All tests pass - No regressions introduced
Manual Verification
# Test in staging
$ curl https://api-staging.example.com/users/usr_deleted_123/profile
HTTP/1.1 404 Not Found
{
"detail": "User usr_deleted_123 not found"
}
✅ Fix verified - Returns proper 404 instead of 500
Phase 8: Root Cause Analysis (5 Whys)
Why 1: Why did the API return 500 error?
→ Code tried to access user.name on None
Why 2: Why was user None?
→ Database query returned None (user doesn't exist)
Why 3: Why didn't code handle None case? → No null check before accessing user properties
Why 4: Why was there no null check? → Developer assumed user would always exist (invalid assumption)
Why 5: Why was invalid assumption made? → No code review guideline requiring null checks for database queries
ROOT CAUSE: Missing null-safety pattern for database queries + no enforcement in code review
Phase 9: Prevention Strategy
Immediate Prevention
✅ Unit test added (prevents this specific bug)
def test_get_deleted_user_profile(client, db):
# Test ensures 404 is returned for deleted users
pass
Short-term Prevention
✅ Integration test added (prevents similar bugs)
@pytest.mark.parametrize("endpoint", [
"/users/{id}/profile",
"/users/{id}/settings",
"/users/{id}/posts"
])
def test_user_endpoints_return_404_for_deleted_users(client, db, endpoint):
"""All user endpoints should return 404 for deleted users."""
user_id = create_and_delete_user(db)
response = client.get(endpoint.format(id=user_id))
assert response.status_code == 404
Long-term Prevention
✅ Architecture change proposed: Create get_resource_or_404() pattern
# services/base_service.py
from typing import TypeVar, Generic, Type
T = TypeVar('T')
class BaseService(Generic[T]):
"""Base service with null-safe query methods."""
async def get_or_404(
self,
resource_id: str,
resource_type: str = "Resource"
) -> T:
"""Get resource by ID or raise 404."""
resource = await self.find_one({"id": resource_id})
if resource is None:
raise HTTPException(
status_code=404,
detail=f"{resource_type} {resource_id} not found"
)
return resource
# Usage across all resources
user_service = UserService()
post_service = PostService()
comment_service = CommentService()
user = await user_service.get_or_404(user_id, "User")
post = await post_service.get_or_404(post_id, "Post")
Monitoring Added
✅ Alert created (detects recurrence)
# prometheus/alerts/user_not_found.yml
groups:
- name: user_api
rules:
- alert: HighUserNotFoundRate
expr: |
rate(http_requests_total{
endpoint="/users/:id/profile",
status_code="404"
}[5m]) > 10
for: 5m
annotations:
summary: "High rate of user not found errors"
description: "{{ $value }} 404s/sec on user profile endpoint"
Documentation Updated
✅ Runbook created
# Runbook: User Not Found Errors
## Symptom
404 errors when accessing user profiles
## Diagnosis
- Check if user was recently deleted
- Verify database replication lag
- Check for stale cache entries
## Resolution
- User deleted: Expected behavior
- Replication lag: Wait 30 seconds
- Stale cache: Clear user cache
## Prevention
Always use `get_user_or_404()` helper
Phase 10: Deploy & Monitor
Pre-Deployment Checklist
- Fix tested in staging
- No performance impact
- Security review not needed (defensive fix)
- Deployment plan created
- Rollback plan ready
Deployment
# Deploy to staging
$ git push origin feature/fix-user-not-found
$ ./scripts/deploy-staging.sh
# Verify in staging (1 hour)
$ ./scripts/monitor-staging.sh --duration 1h
# Deploy to production (gradual rollout)
$ ./scripts/deploy-production.sh --canary 10% # 10% traffic
$ sleep 600 # Monitor for 10 minutes
$ ./scripts/deploy-production.sh --canary 50% # 50% traffic
$ sleep 600
$ ./scripts/deploy-production.sh --canary 100% # Full traffic
Post-Deployment Monitoring
1 Hour Post-Deploy:
# Check error logs
$ kubectl logs -l app=api --since=1h | grep "User.*not found"
# No unexpected errors ✅
# Check error rate
$ curl prometheus/query?query='rate(http_errors_total[1h])'
# No increase in error rate ✅
24 Hours Post-Deploy:
# Verify user not found rate is zero
$ curl prometheus/query?query='rate(http_requests_total{status_code="404",endpoint="/users/:id/profile"}[24h])'
# Result: 0 errors ✅
Summary
| Metric | Value |
|---|---|
| Time to Reproduce | 5 minutes |
| Time to Fix | 15 minutes |
| Time to Deploy | 30 minutes |
| Total Time | 50 minutes |
| Tests Added | 2 (unit + integration) |
| Prevention Strategies | 3 (tests, architecture, monitoring) |
| Recurrences | 0 (monitored for 1 week) |
Lessons Learned
What Went Well
- Clear stack trace made root cause obvious
- Test-driven debugging caught the issue immediately
- Helper function prevents similar bugs across codebase
What Could Be Improved
- Should have had null-safety pattern from the start
- Code review should catch missing null checks
- Static analysis could detect this pattern
Recommendations
- Add
mypyor similar for null-safety checking - Update code review checklist to include null-safety checks
- Create linter rule: "Database queries must use
get_or_404pattern"
Bug Fixed: ✅ Tests Pass: ✅ Prevention Implemented: ✅ Production Stable: ✅