Database Connection Pool Memory Leaks
Detecting and fixing PostgreSQL connection pool leaks in FastAPI applications using connection monitoring and proper cleanup patterns.
Overview
Before Optimization:
- Active connections: 95/100 (pool exhausted)
- Connection timeouts: 15-20/min during peak
- Memory growth: 100MB/hour (unclosed connections)
- Service restarts: 3-4x/day
After Optimization:
- Active connections: 8-12/100 (healthy pool)
- Connection timeouts: 0/day
- Memory growth: 0MB/hour (stable)
- Service restarts: 0/month
Tools: asyncpg, SQLModel, psycopg3, pg_stat_activity, Prometheus
1. Connection Pool Architecture
Grey Haven Stack: PostgreSQL + SQLModel
Connection Pool Configuration:
```python
# database.py
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

# ❌ VULNERABLE: No max_overflow, no timeout
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,
    echo=True
)

# ✅ SECURE: Proper pool configuration
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,        # Core connections
    max_overflow=10,     # Max additional connections
    pool_timeout=30,     # Wait timeout (seconds)
    pool_recycle=3600,   # Recycle after 1 hour
    pool_pre_ping=True,  # Verify connection before use
    echo=False
)
```
Pool Health Monitoring:
```python
# monitoring.py
from prometheus_client import Gauge

# Prometheus metrics
db_pool_size = Gauge('db_pool_connections_total', 'Total pool size')
db_pool_active = Gauge('db_pool_connections_active', 'Active connections')
db_pool_idle = Gauge('db_pool_connections_idle', 'Idle connections')
db_pool_overflow = Gauge('db_pool_connections_overflow', 'Overflow connections')

def record_pool_metrics(engine):
    pool = engine.pool
    db_pool_size.set(pool.size())
    db_pool_active.set(pool.checkedout())
    db_pool_idle.set(pool.size() - pool.checkedout())
    db_pool_overflow.set(pool.overflow())
```
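These gauges only update when `record_pool_metrics` runs, so something has to call it on a schedule. A minimal sketch using a startup background task; the 15-second interval and task wiring are assumptions, not part of the original service:

```python
# main.py - sample pool metrics on a fixed interval (sketch; interval is an assumption)
import asyncio

from fastapi import FastAPI

from database import engine
from monitoring import record_pool_metrics

app = FastAPI()

async def pool_metrics_loop() -> None:
    # Refresh the Prometheus gauges every 15 seconds
    while True:
        record_pool_metrics(engine)
        await asyncio.sleep(15)

@app.on_event("startup")
async def start_pool_metrics() -> None:
    asyncio.create_task(pool_metrics_loop())
```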
2. Common Leak Pattern: Unclosed Connections
Vulnerable Code (Connection Leak)
```python
# api/orders.py (BEFORE)
from fastapi import APIRouter
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

@router.get("/orders")
async def get_orders():
    # ❌ LEAK: Connection never closed
    session = Session(engine)
    # An exception here means the session is never closed
    orders = session.exec(select(Order)).all()
    # Returning here also skips cleanup
    return orders
    # session.close() is unreachable after the return
    session.close()
```
What Happens:
- Every request acquires a connection from the pool
- An exception or early return prevents `session.close()` from running
- The connection remains in the "active" state
- The pool exhausts after 100 requests (pool_size=100)
- New requests time out waiting for a connection
Memory Impact:
Initial pool: 20 connections (40MB)
After 1 hour: 95 leaked connections (190MB)
After 6 hours: Pool exhausted + 100MB leaked memory
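This failure mode is easy to reproduce in isolation. A minimal sketch (hypothetical standalone script; the tiny pool settings are chosen to force the timeout quickly, and the connection string is a placeholder):

```python
# leak_demo.py - reproduce pool exhaustion with deliberately leaked connections
from sqlalchemy import create_engine
from sqlalchemy.exc import TimeoutError as PoolTimeoutError

engine = create_engine(
    "postgresql://user:pass@localhost/db",
    pool_size=2, max_overflow=0, pool_timeout=1,
)

# Check out every connection and never close them
leaked = [engine.connect() for _ in range(2)]

try:
    engine.connect()  # waits pool_timeout seconds, then gives up
except PoolTimeoutError as exc:
    print(f"Pool exhausted: {exc}")
```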
Fixed Code (Context Manager)
```python
# api/orders.py (AFTER)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

# ✅ Option 1: FastAPI dependency injection (recommended)
def get_session():
    """Session dependency with automatic cleanup"""
    with Session(engine) as session:
        yield session

@router.get("/orders")
async def get_orders(session: Session = Depends(get_session)):
    # Session automatically closed after the request
    orders = session.exec(select(Order)).all()
    return orders

# ✅ Option 2: Explicit context manager
@router.get("/orders-alt")
async def get_orders_alt():
    with Session(engine) as session:
        orders = session.exec(select(Order)).all()
        return orders
    # Session guaranteed to close (even on exception)
```
Why This Works:
- The context manager guarantees `session.close()` is called in `__exit__`
- Cleanup runs even if an exception is raised
- Cleanup runs even on an early return
- FastAPI's `Depends()` handles generator cleanup after the response (an explicit `try`/`finally` variation is sketched below)
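The generator dependency can also be written with an explicit `try`/`finally`, which gives you a hook for logging or instrumenting cleanup; a sketch equivalent to Option 1 above:

```python
# api/orders.py - same dependency with explicit cleanup (sketch)
from sqlmodel import Session
from database import engine

def get_session():
    session = Session(engine)
    try:
        yield session
    finally:
        session.close()  # runs even if the route handler raised
```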
3. Async Connection Leaks (asyncpg)
Vulnerable Async Pattern
```python
# api/analytics.py (BEFORE)
import datetime

import asyncpg
from fastapi import APIRouter

router = APIRouter()

@router.get("/analytics")
async def get_analytics(date: datetime.date):
    # ❌ LEAK: Connection never closed
    conn = await asyncpg.connect(
        user='postgres',
        password='secret',
        database='analytics'
    )
    # An exception here leaks the connection
    result = await conn.fetch('SELECT * FROM metrics WHERE date > $1', date)
    # An early return leaks the connection
    if not result:
        return []
    await conn.close()  # Skipped on the early-return path
    return result
```
Fixed Async Pattern
```python
# api/analytics.py (AFTER)
import datetime

import asyncpg
from fastapi import APIRouter
from contextlib import asynccontextmanager

router = APIRouter()

# ✅ Connection pool (shared across requests, created at startup)
pool: asyncpg.Pool = None

@asynccontextmanager
async def get_db_connection():
    """Async context manager for pooled connections"""
    conn = await pool.acquire()
    try:
        yield conn
    finally:
        await pool.release(conn)

@router.get("/analytics")
async def get_analytics(date: datetime.date):
    async with get_db_connection() as conn:
        result = await conn.fetch(
            'SELECT * FROM metrics WHERE date > $1',
            date
        )
        return result
    # Connection automatically released to the pool
```
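If you don't need the wrapper, `asyncpg.Pool.acquire()` can itself be used as an async context manager and releases the connection on exit; the route name below is illustrative:

```python
# Same guarantee without the helper: acquire() is an async context manager
@router.get("/analytics-direct")
async def get_analytics_direct(date: datetime.date):
    async with pool.acquire() as conn:
        return await conn.fetch('SELECT * FROM metrics WHERE date > $1', date)
```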
Pool Setup (application startup):
```python
# main.py
from fastapi import FastAPI
import asyncpg

app = FastAPI()

@app.on_event("startup")
async def startup():
    global pool
    pool = await asyncpg.create_pool(
        user='postgres',
        password='secret',
        database='analytics',
        min_size=10,  # Minimum connections kept open
        max_size=20,  # Maximum connections
        max_inactive_connection_lifetime=300  # Recycle idle connections after 5 min
    )

@app.on_event("shutdown")
async def shutdown():
    await pool.close()
```
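Recent FastAPI releases deprecate `@app.on_event` in favor of a lifespan handler; the same setup looks like this (a sketch, storing the pool on `app.state` instead of a module global):

```python
# main.py - lifespan-based pool management (sketch)
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.pool = await asyncpg.create_pool(
        user='postgres', password='secret', database='analytics',
        min_size=10, max_size=20,
    )
    yield
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)
```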
4. Transaction Leak Detection
Monitoring Active Connections
PostgreSQL Query:
```sql
-- Show active connections with details
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    state_change,
    NOW() - state_change AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
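When a leak is actively exhausting the pool, long-lived `idle in transaction` backends can be terminated as a stopgap while the code fix ships. Use with care; the 10-minute threshold below is an assumption:

```sql
-- Emergency stopgap: kill backends idle in a transaction for more than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '10 minutes'
  AND datname = current_database();
```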
Prometheus Metrics:
```python
# monitoring.py
from prometheus_client import Gauge
import asyncpg

db_connections_active = Gauge(
    'db_connections_active',
    'Active database connections',
    ['state']
)

async def monitor_connections(pool: asyncpg.Pool):
    """Sample PostgreSQL connection states (intended to run every 30 seconds)"""
    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT state, COUNT(*) AS count
            FROM pg_stat_activity
            WHERE datname = current_database()
            GROUP BY state
        """)
        for row in rows:
            # state is NULL for background workers; label them explicitly
            state = row['state'] or 'unknown'
            db_connections_active.labels(state=state).set(row['count'])
```
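For Prometheus to scrape these gauges, the application has to expose them; `prometheus_client` ships an ASGI app that can be mounted directly (the `/metrics` path is the conventional default):

```python
# main.py - expose gauges for Prometheus scraping
from prometheus_client import make_asgi_app

app.mount("/metrics", make_asgi_app())
```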
Grafana Alert (connection leak):
```yaml
alert: DatabaseConnectionLeak
expr: db_connections_active{state="active"} > 80
for: 5m
annotations:
  summary: "Potential connection leak ({{ $value }} active connections)"
  description: "Active connections have been above 80 for 5+ minutes"
```
5. Real-World Fix: FastAPI Order Service
Before (Connection Pool Exhaustion)
```python
# services/order_processor.py (BEFORE)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ❌ LEAK: Multiple sessions, some never closed
        session1 = Session(engine)
        order = session1.get(Order, order_id)
        if not order:
            # Early return = session1 leaked
            return None
        # ❌ LEAK: Second session
        session2 = Session(engine)
        items = session2.exec(
            select(OrderItem).where(OrderItem.order_id == order_id)
        ).all()
        # Exception here = both sessions leaked
        total = sum(item.price * item.quantity for item in items)
        order.total = total
        session1.commit()
        # Only session1 closed; session2 always leaked
        session1.close()
        return order
```
Metrics (Before):
Connection pool: 100 connections
Active connections after 1 hour: 95/100
Leaked connections: ~12/min
Memory growth: 100MB/hour
Pool exhaustion: Every 6-8 hours
After (Proper Resource Management)
```python
# services/order_processor.py (AFTER)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ✅ Single session, guaranteed cleanup
        with Session(engine) as session:
            # Query order
            order = session.get(Order, order_id)
            if not order:
                return None
            # Query items (same session)
            items = session.exec(
                select(OrderItem).where(OrderItem.order_id == order_id)
            ).all()
            # Calculate total
            total = sum(item.price * item.quantity for item in items)
            # Update order
            order.total = total
            session.add(order)
            session.commit()
            session.refresh(order)
            return order
        # Session automatically closed (even on exception)
```
Metrics (After):
Connection pool: 100 connections
Active connections: 8-12/100 (stable)
Leaked connections: 0/day
Memory growth: 0MB/hour
Pool exhaustion: Never (0 incidents/month)
6. Connection Pool Configuration Best Practices
Recommended Settings (Grey Haven Stack)
```python
# database.py - Production settings
import os

from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

# Connection string supplied by the environment
database_url = os.environ["DATABASE_URL"]

engine = create_engine(
    database_url,
    poolclass=QueuePool,
    pool_size=20,        # (workers * connections/worker) + buffer
    max_overflow=10,     # 50% of pool_size
    pool_timeout=30,     # Wait timeout (seconds)
    pool_recycle=3600,   # Recycle after 1 hour
    pool_pre_ping=True   # Health check before each use
)
```
Pool Size Formula: pool_size = (workers * conn_per_worker) + buffer
Example: (4 workers * 3 conn) + 8 buffer = 20
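The formula is simple enough to encode as a helper so staging and production configs stay consistent; a sketch (the helper name and default buffer of 8 are assumptions):

```python
def recommended_pool_size(workers: int, conn_per_worker: int, buffer: int = 8) -> int:
    """pool_size = (workers * conn_per_worker) + buffer"""
    return workers * conn_per_worker + buffer

assert recommended_pool_size(4, 3) == 20  # the worked example above
```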
7. Testing Connection Cleanup
Pytest Fixture for Connection Tracking
```python
# tests/conftest.py
import pytest
from sqlmodel import create_engine

@pytest.fixture
def engine():
    """Test engine that fails the test if connections leaked"""
    test_engine = create_engine(
        "postgresql://test:test@localhost/test_db", pool_size=5
    )
    initial_active = test_engine.pool.checkedout()
    yield test_engine
    final_active = test_engine.pool.checkedout()
    assert final_active == initial_active, \
        f"Leaked {final_active - initial_active} connections"

# tests/test_connection_leaks.py
import asyncio

import pytest

from api.orders import get_orders_alt  # no-argument handler from section 2

@pytest.mark.asyncio
async def test_no_connection_leak_under_load(engine):
    """Simulate 1000 concurrent requests"""
    initial = engine.pool.checkedout()
    tasks = [get_orders_alt() for _ in range(1000)]
    await asyncio.gather(*tasks)
    await asyncio.sleep(1)  # let in-flight sessions return to the pool
    assert engine.pool.checkedout() == initial, "Connection leak detected"
```
8. CI/CD Integration
```yaml
# .github/workflows/connection-leak-test.yml
name: Connection Leak Detection
on: [pull_request]
jobs:
  leak-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env: {POSTGRES_PASSWORD: test, POSTGRES_DB: test_db}
        ports: ['5432:5432']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/test_connection_leaks.py -v
```
9. Results and Impact
Before vs After Metrics
| Metric | Before | After | Impact |
|---|---|---|---|
| Active Connections | 95/100 (95%) | 8-12/100 (10%) | 85% reduction |
| Connection Timeouts | 15-20/min | 0/day | 100% eliminated |
| Memory Growth | 100MB/hour | 0MB/hour | 100% eliminated |
| Service Restarts | 3-4x/day | 0/month | 100% eliminated |
| Pool Wait Time (p95) | 5.2s | 0.01s | 99.8% faster |
Key Optimizations Applied
- Context Managers: Guaranteed connection cleanup (even on exceptions)
- FastAPI Dependencies: Automatic session lifecycle management
- Connection Pooling: Proper pool_size, max_overflow, pool_timeout
- Prometheus Monitoring: Real-time pool saturation metrics
- Load Testing: CI/CD checks for connection leaks
Related Documentation
- Node.js Leaks: nodejs-memory-leak.md
- Python Profiling: python-scalene-profiling.md
- Large Datasets: large-dataset-optimization.md
- Reference: ../reference/profiling-tools.md
Return to examples index