# Playbook: Database Deadlock ## Symptoms - "Deadlock detected" errors in application - API returning 500 errors - Transactions timing out - Database connection pool exhausted - Monitoring alert: "Deadlock count >0" ## Severity - **SEV2** if isolated to specific endpoint - **SEV1** if affecting all database operations ## Diagnosis ### Step 1: Confirm Deadlock (PostgreSQL) ```sql -- Check for currently locked queries SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS blocking_statement FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.granted; -- Check deadlock log SELECT * FROM pg_stat_database WHERE datname = 'your_database'; ``` ### Step 2: Confirm Deadlock (MySQL) ```sql -- Show InnoDB status (includes deadlock info) SHOW ENGINE INNODB STATUS\G -- Look for "LATEST DETECTED DEADLOCK" section -- Shows which transactions were involved ``` --- ### Step 3: Identify Deadlock Pattern **Common Pattern 1: Lock Order Mismatch** ``` Transaction A: Locks row 1, then row 2 Transaction B: Locks row 2, then row 1 → DEADLOCK ``` **Common Pattern 2: Gap Locks** ``` Transaction A: SELECT ... FOR UPDATE WHERE id BETWEEN 1 AND 10 Transaction B: INSERT INTO table (id) VALUES (5) → DEADLOCK ``` **Common Pattern 3: Foreign Key Deadlock** ``` Transaction A: Updates parent table Transaction B: Inserts into child table → DEADLOCK (foreign key check locks) ``` --- ## Mitigation ### Immediate (Now - 5 min) **Option A: Kill Blocking Query** (PostgreSQL) ```sql -- Terminate blocking process SELECT pg_terminate_backend(); -- Verify deadlock cleared SELECT count(*) FROM pg_locks WHERE NOT granted; -- Should return 0 ``` **Option B: Kill Blocking Query** (MySQL) ```sql -- Show process list SHOW PROCESSLIST; -- Kill blocking query KILL ; ``` **Option C: Kill Idle Transactions** (PostgreSQL) ```sql -- Find idle transactions (>5 min) SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < NOW() - INTERVAL '5 minutes'; -- Impact: Frees up locks -- Risk: Low (transactions are idle) ``` --- ### Short-term (5 min - 1 hour) **Option A: Add Transaction Timeout** (PostgreSQL) ```sql -- Set statement timeout (30 seconds) ALTER DATABASE your_database SET statement_timeout = '30s'; -- Or in application: SET statement_timeout = '30s'; -- Impact: Prevents long-running transactions -- Risk: Low (transactions should be fast) ``` **Option B: Add Transaction Timeout** (MySQL) ```sql -- Set lock wait timeout SET GLOBAL innodb_lock_wait_timeout = 30; -- Impact: Transactions fail instead of waiting forever -- Risk: Low (application should handle errors) ``` **Option C: Fix Lock Order in Application** ```javascript // BAD: Inconsistent lock order async function transferMoney(fromId, toId, amount) { await db.query('UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, fromId]); await db.query('UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, toId]); } // GOOD: Consistent lock order async function transferMoney(fromId, toId, amount) { const firstId = Math.min(fromId, toId); const secondId = Math.max(fromId, toId); await db.query('UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, firstId]); await db.query('UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, secondId]); } ``` --- ### Long-term (1 hour+) **Option A: Reduce Transaction Scope** ```javascript // BAD: Long transaction BEGIN; const user = await db.query('SELECT * FROM users WHERE id = ? FOR UPDATE', [userId]); await sendEmail(user.email); // External call (slow!) await db.query('UPDATE users SET last_email_sent = NOW() WHERE id = ?', [userId]); COMMIT; // GOOD: Short transaction const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]); await sendEmail(user.email); // Outside transaction await db.query('UPDATE users SET last_email_sent = NOW() WHERE id = ?', [userId]); ``` **Option B: Use Optimistic Locking** ```sql -- Add version column ALTER TABLE accounts ADD COLUMN version INT DEFAULT 0; -- Update with version check UPDATE accounts SET balance = balance - 100, version = version + 1 WHERE id = 1 AND version = ; -- If 0 rows updated, retry with new version ``` **Option C: Review Isolation Level** ```sql -- PostgreSQL default: READ COMMITTED -- Most cases: READ COMMITTED is fine -- Rare cases: REPEATABLE READ or SERIALIZABLE -- Lower isolation = less locking = fewer deadlocks SET TRANSACTION ISOLATION LEVEL READ COMMITTED; ``` --- ## Escalation **Escalate to developer if**: - Application code causing deadlock - Requires code refactoring **Escalate to DBA if**: - Database configuration issue - Foreign key constraint problem --- ## Prevention - [ ] Always lock in same order - [ ] Keep transactions short - [ ] Use timeout (statement_timeout, lock_wait_timeout) - [ ] Use optimistic locking when possible - [ ] Add deadlock monitoring alert - [ ] Review isolation level (lower = fewer deadlocks) --- ## Related Runbooks - [04-slow-api-response.md](04-slow-api-response.md) - If API slow due to deadlock - [../modules/database-diagnostics.md](../modules/database-diagnostics.md) - Database troubleshooting --- ## Post-Incident After resolving: - [ ] Create post-mortem - [ ] Identify which queries deadlocked - [ ] Fix lock order in application code - [ ] Add regression test - [ ] Update this runbook if needed