6.0 KiB
6.0 KiB
Playbook: Database Deadlock
Symptoms
- "Deadlock detected" errors in application
- API returning 500 errors
- Transactions timing out
- Database connection pool exhausted
- Monitoring alert: "Deadlock count >0"
Severity
- SEV2 if isolated to specific endpoint
- SEV1 if affecting all database operations
Diagnosis
Step 1: Confirm Deadlock (PostgreSQL)
-- Check for currently locked queries
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- Check deadlock log
SELECT * FROM pg_stat_database WHERE datname = 'your_database';
Step 2: Confirm Deadlock (MySQL)
-- Show InnoDB status (includes deadlock info)
SHOW ENGINE INNODB STATUS\G
-- Look for "LATEST DETECTED DEADLOCK" section
-- Shows which transactions were involved
Step 3: Identify Deadlock Pattern
Common Pattern 1: Lock Order Mismatch
Transaction A: Locks row 1, then row 2
Transaction B: Locks row 2, then row 1
→ DEADLOCK
Common Pattern 2: Gap Locks
Transaction A: SELECT ... FOR UPDATE WHERE id BETWEEN 1 AND 10
Transaction B: INSERT INTO table (id) VALUES (5)
→ DEADLOCK
Common Pattern 3: Foreign Key Deadlock
Transaction A: Updates parent table
Transaction B: Inserts into child table
→ DEADLOCK (foreign key check locks)
Mitigation
Immediate (Now - 5 min)
Option A: Kill Blocking Query (PostgreSQL)
-- Terminate blocking process
SELECT pg_terminate_backend(<blocking_pid>);
-- Verify deadlock cleared
SELECT count(*) FROM pg_locks WHERE NOT granted;
-- Should return 0
Option B: Kill Blocking Query (MySQL)
-- Show process list
SHOW PROCESSLIST;
-- Kill blocking query
KILL <process_id>;
Option C: Kill Idle Transactions (PostgreSQL)
-- Find idle transactions (>5 min)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < NOW() - INTERVAL '5 minutes';
-- Impact: Frees up locks
-- Risk: Low (transactions are idle)
Short-term (5 min - 1 hour)
Option A: Add Transaction Timeout (PostgreSQL)
-- Set statement timeout (30 seconds)
ALTER DATABASE your_database SET statement_timeout = '30s';
-- Or in application:
SET statement_timeout = '30s';
-- Impact: Prevents long-running transactions
-- Risk: Low (transactions should be fast)
Option B: Add Transaction Timeout (MySQL)
-- Set lock wait timeout
SET GLOBAL innodb_lock_wait_timeout = 30;
-- Impact: Transactions fail instead of waiting forever
-- Risk: Low (application should handle errors)
Option C: Fix Lock Order in Application
// BAD: Inconsistent lock order
async function transferMoney(fromId, toId, amount) {
await db.query('UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, fromId]);
await db.query('UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, toId]);
}
// GOOD: Consistent lock order
async function transferMoney(fromId, toId, amount) {
const firstId = Math.min(fromId, toId);
const secondId = Math.max(fromId, toId);
await db.query('UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, firstId]);
await db.query('UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, secondId]);
}
Long-term (1 hour+)
Option A: Reduce Transaction Scope
// BAD: Long transaction
BEGIN;
const user = await db.query('SELECT * FROM users WHERE id = ? FOR UPDATE', [userId]);
await sendEmail(user.email); // External call (slow!)
await db.query('UPDATE users SET last_email_sent = NOW() WHERE id = ?', [userId]);
COMMIT;
// GOOD: Short transaction
const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
await sendEmail(user.email); // Outside transaction
await db.query('UPDATE users SET last_email_sent = NOW() WHERE id = ?', [userId]);
Option B: Use Optimistic Locking
-- Add version column
ALTER TABLE accounts ADD COLUMN version INT DEFAULT 0;
-- Update with version check
UPDATE accounts
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = <current_version>;
-- If 0 rows updated, retry with new version
Option C: Review Isolation Level
-- PostgreSQL default: READ COMMITTED
-- Most cases: READ COMMITTED is fine
-- Rare cases: REPEATABLE READ or SERIALIZABLE
-- Lower isolation = less locking = fewer deadlocks
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
Escalation
Escalate to developer if:
- Application code causing deadlock
- Requires code refactoring
Escalate to DBA if:
- Database configuration issue
- Foreign key constraint problem
Prevention
- Always lock in same order
- Keep transactions short
- Use timeout (statement_timeout, lock_wait_timeout)
- Use optimistic locking when possible
- Add deadlock monitoring alert
- Review isolation level (lower = fewer deadlocks)
Related Runbooks
- 04-slow-api-response.md - If API slow due to deadlock
- ../modules/database-diagnostics.md - Database troubleshooting
Post-Incident
After resolving:
- Create post-mortem
- Identify which queries deadlocked
- Fix lock order in application code
- Add regression test
- Update this runbook if needed