Initial commit

Zhongwei Li
2025-11-30 08:18:40 +08:00
commit d6245b656d
8 changed files with 723 additions and 0 deletions

15
.claude-plugin/plugin.json Normal file

@@ -0,0 +1,15 @@
{
"name": "database-transaction-monitor",
"description": "Database plugin for database-transaction-monitor",
"version": "1.0.0",
"author": {
"name": "Claude Code Plugins",
"email": "[email protected]"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

3
README.md Normal file

@@ -0,0 +1,3 @@
# database-transaction-monitor
Database plugin for database-transaction-monitor

570
commands/transactions.md Normal file

@@ -0,0 +1,570 @@
---
description: Monitor database transactions with real-time alerting for performance and lock issues
shortcut: txn-monitor
---
# Database Transaction Monitor
Monitor database transaction performance, detect long-running transactions, identify lock contention, track rollback rates, and alert automatically on transaction anomalies to keep production databases healthy.
## When to Use This Command
Use `/txn-monitor` when you need to:
- Detect and kill long-running transactions blocking other queries
- Monitor lock wait times and identify deadlock patterns
- Track transaction rollback rates for error analysis
- Alert on isolation level anomalies (phantom reads, dirty reads)
- Analyze transaction throughput and latency trends
- Investigate application connection leak issues
DON'T use this when:
- Database has minimal transaction load (<100 TPS)
- All transactions complete within milliseconds
- Looking for query optimization (use query optimizer instead)
- Investigating data corruption (use audit logger instead)
## Design Decisions
This command implements **real-time transaction monitoring with automated alerting** because:
- Long-running transactions (>30s) block other queries and cause performance degradation
- Lock contention detection prevents cascade failures
- Rollback rate monitoring identifies application bugs early
- Automatic alerts reduce MTTR (Mean Time To Resolution)
- Historical trend analysis enables capacity planning
**Alternative considered: Periodic manual checks**
- No automated alerting on issues
- Relies on humans checking dashboards
- Slower incident response
- Recommended only for development environments
**Alternative considered: Database log parsing**
- Post-mortem analysis only
- No real-time alerts
- Requires custom log parsing logic
- Recommended for compliance/audit purposes
## Prerequisites
Before running this command:
1. Database monitoring permissions (pg_monitor role or PROCESS privilege)
2. Access to pg_stat_activity (PostgreSQL) or performance_schema (MySQL)
3. Alerting infrastructure (Slack, PagerDuty, email)
4. Monitoring data retention strategy (metrics database or time-series DB)
5. Runbook for common transaction issues
## Implementation Process
### Step 1: Enable Transaction Monitoring
Configure the database to track transaction statistics.
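A minimal pre-flight check is sketched below; it assumes psycopg2 and read access to pg_settings. The setting names are standard PostgreSQL parameters, while the connection string and recommended values are illustrative.
```python
# monitoring/check_settings.py - hypothetical pre-flight check
import psycopg2

# Settings that should be enabled before monitoring (standard PostgreSQL GUCs)
RECOMMENDED_SETTINGS = {
    "track_activities": "on",   # populates pg_stat_activity
    "track_counts": "on",       # populates pg_stat_database counters
    "log_lock_waits": "on",     # logs waits longer than deadlock_timeout
}

def check_monitoring_settings(conn_string: str) -> dict:
    """Return settings that differ from the recommended values."""
    mismatches = {}
    with psycopg2.connect(conn_string) as conn:
        with conn.cursor() as cur:
            for name, expected in RECOMMENDED_SETTINGS.items():
                cur.execute(
                    "SELECT setting FROM pg_settings WHERE name = %s", (name,)
                )
                row = cur.fetchone()
                if row and row[0] != expected:
                    mismatches[name] = {"current": row[0], "expected": expected}
    return mismatches

if __name__ == "__main__":
    print(check_monitoring_settings(
        "postgresql://monitor_user:password@localhost:5432/mydb"
    ))
```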
### Step 2: Build Real-Time Monitor
Create a monitoring script that polls transaction statistics every 5-10 seconds (see Example 1 below).
### Step 3: Define Alert Thresholds
Set thresholds for long-running transactions, lock waits, and rollback rates.
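One way to centralize these thresholds is a small configuration object, sketched below with the values from the Configuration Options section; the names are illustrative, not a published API.
```python
# monitoring/thresholds.py - hypothetical threshold configuration
from dataclasses import dataclass

@dataclass(frozen=True)
class AlertThresholds:
    long_transaction_seconds: int = 30        # warn above this duration
    auto_kill_seconds: int = 300              # terminate above this duration
    idle_in_transaction_seconds: int = 600    # terminate idle-in-transaction sessions
    rollback_rate_warning_percent: float = 5.0
    rollback_rate_critical_percent: float = 10.0
    blocked_warning_count: int = 10
    blocked_critical_count: int = 50

# OLTP defaults versus a looser analytics profile
PRODUCTION = AlertThresholds()
ANALYTICS = AlertThresholds(long_transaction_seconds=300, auto_kill_seconds=3600)
```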
### Step 4: Implement Automated Actions
Auto-kill transactions exceeding thresholds, or alert operators (see `kill_transaction` in Example 1).
### Step 5: Create Dashboards
Build Grafana dashboards for transaction metrics visualization.
## Output Format
The command generates:
- `monitoring/transaction_monitor.py` - Real-time transaction monitoring daemon
- `queries/transaction_analysis.sql` - Transaction health diagnostic queries
- `alerts/transaction_alerts.yml` - Prometheus alerting rules
- `dashboards/transaction_dashboard.json` - Grafana dashboard configuration
- `docs/transaction_runbook.md` - Incident response procedures
## Code Examples
### Example 1: PostgreSQL Real-Time Transaction Monitor
```python
# monitoring/postgres_transaction_monitor.py
import psycopg2
from psycopg2.extras import DictCursor
import time
import logging
from typing import Any, Dict, List, Optional
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class TransactionInfo:
"""Represents an active transaction."""
pid: int
username: str
database: str
application_name: str
client_addr: str
state: str
query: str
transaction_start: datetime
query_start: datetime
wait_event: Optional[str]
blocking_pids: List[int]
    def duration_seconds(self) -> float:
        # xact_start is timestamptz, so compare against an aware datetime
        return (datetime.now(timezone.utc) - self.transaction_start).total_seconds()
def to_dict(self) -> dict:
result = asdict(self)
result['transaction_start'] = self.transaction_start.isoformat()
result['query_start'] = self.query_start.isoformat()
result['duration_seconds'] = self.duration_seconds()
return result
class PostgreSQLTransactionMonitor:
"""Monitor PostgreSQL transactions in real-time."""
def __init__(
self,
connection_string: str,
long_transaction_threshold: int = 30,
check_interval: int = 10
):
self.conn_string = connection_string
self.long_transaction_threshold = long_transaction_threshold
self.check_interval = check_interval
self.stats = {
'total_transactions': 0,
'long_running_count': 0,
'blocked_count': 0,
'idle_in_transaction_count': 0
}
def connect(self):
return psycopg2.connect(self.conn_string, cursor_factory=DictCursor)
def get_active_transactions(self) -> List[TransactionInfo]:
"""Fetch all active transactions with blocking information."""
query = """
SELECT
a.pid,
a.usename,
a.datname AS database,
a.application_name,
a.client_addr::text,
a.state,
a.query,
a.xact_start AS transaction_start,
a.query_start,
a.wait_event,
array_agg(b.pid) FILTER (WHERE b.pid IS NOT NULL) AS blocking_pids
FROM pg_stat_activity a
LEFT JOIN pg_stat_activity b ON b.pid = ANY(pg_blocking_pids(a.pid))
WHERE a.pid != pg_backend_pid()
AND a.state != 'idle'
AND a.xact_start IS NOT NULL
GROUP BY a.pid, a.usename, a.datname, a.application_name,
a.client_addr, a.state, a.query, a.xact_start,
a.query_start, a.wait_event
ORDER BY a.xact_start;
"""
conn = self.connect()
try:
with conn.cursor() as cur:
cur.execute(query)
rows = cur.fetchall()
transactions = []
for row in rows:
txn = TransactionInfo(
pid=row['pid'],
username=row['usename'],
database=row['database'],
application_name=row['application_name'] or 'unknown',
client_addr=row['client_addr'] or 'local',
state=row['state'],
query=row['query'][:200], # Truncate long queries
transaction_start=row['transaction_start'],
query_start=row['query_start'],
wait_event=row['wait_event'],
blocking_pids=row['blocking_pids'] or []
)
transactions.append(txn)
return transactions
finally:
conn.close()
def find_long_running_transactions(
self,
transactions: List[TransactionInfo]
) -> List[TransactionInfo]:
"""Identify transactions exceeding threshold."""
return [
txn for txn in transactions
if txn.duration_seconds() > self.long_transaction_threshold
]
def find_blocked_transactions(
self,
transactions: List[TransactionInfo]
) -> List[TransactionInfo]:
"""Identify transactions waiting on locks."""
return [
txn for txn in transactions
if txn.blocking_pids and len(txn.blocking_pids) > 0
]
def find_idle_in_transaction(
self,
transactions: List[TransactionInfo]
) -> List[TransactionInfo]:
"""Find idle transactions holding locks."""
return [
txn for txn in transactions
if txn.state == 'idle in transaction'
and txn.duration_seconds() > 60
]
def kill_transaction(self, pid: int, reason: str) -> bool:
"""Terminate a transaction by PID."""
conn = self.connect()
try:
with conn.cursor() as cur:
cur.execute("SELECT pg_terminate_backend(%s)", (pid,))
success = cur.fetchone()[0]
if success:
logger.warning(f"Killed transaction PID {pid}: {reason}")
else:
logger.error(f"Failed to kill transaction PID {pid}")
return success
finally:
conn.close()
    def get_transaction_stats(self) -> Dict[str, Any]:
"""Get overall transaction statistics."""
conn = self.connect()
try:
with conn.cursor() as cur:
cur.execute("""
SELECT
(SELECT count(*) FROM pg_stat_activity WHERE state != 'idle') AS active_connections,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'idle in transaction') AS idle_in_txn,
(SELECT sum(xact_commit) FROM pg_stat_database) AS total_commits,
(SELECT sum(xact_rollback) FROM pg_stat_database) AS total_rollbacks,
(SELECT sum(conflicts) FROM pg_stat_database_conflicts) AS conflicts
""")
row = cur.fetchone()
total_txns = row['total_commits'] + row['total_rollbacks']
rollback_rate = (row['total_rollbacks'] / total_txns * 100) if total_txns > 0 else 0
return {
'active_connections': row['active_connections'],
'idle_in_transaction': row['idle_in_txn'],
'total_commits': row['total_commits'],
'total_rollbacks': row['total_rollbacks'],
'rollback_rate_percent': round(rollback_rate, 2),
'conflicts': row['conflicts']
}
finally:
conn.close()
    def alert(self, severity: str, message: str, details: Optional[dict] = None):
"""Send alert to monitoring system."""
log_func = {
'critical': logger.critical,
'warning': logger.warning,
'info': logger.info
}.get(severity, logger.info)
log_func(f"[{severity.upper()}] {message}")
if details:
logger.info(f"Details: {details}")
# Implement webhook/email/PagerDuty integration here
# Example: requests.post(webhook_url, json={'message': message, 'details': details})
def run_monitoring_loop(self):
"""Main monitoring loop."""
logger.info(f"Starting transaction monitoring (interval: {self.check_interval}s)")
while True:
try:
# Fetch active transactions
transactions = self.get_active_transactions()
self.stats['total_transactions'] = len(transactions)
# Check for long-running transactions
long_running = self.find_long_running_transactions(transactions)
if long_running:
self.stats['long_running_count'] = len(long_running)
for txn in long_running:
self.alert(
'warning',
f"Long-running transaction detected: PID {txn.pid}",
{
'duration': txn.duration_seconds(),
'database': txn.database,
'username': txn.username,
'query': txn.query
}
)
# Auto-kill if exceeds 5 minutes
if txn.duration_seconds() > 300:
self.kill_transaction(
txn.pid,
f"Exceeded 5 minute threshold ({txn.duration_seconds():.0f}s)"
)
# Check for blocked transactions
blocked = self.find_blocked_transactions(transactions)
if blocked:
self.stats['blocked_count'] = len(blocked)
for txn in blocked:
self.alert(
'warning',
f"Blocked transaction: PID {txn.pid}",
{
'blocking_pids': txn.blocking_pids,
'wait_event': txn.wait_event,
'duration': txn.duration_seconds()
}
)
# Check for idle in transaction
idle_txns = self.find_idle_in_transaction(transactions)
if idle_txns:
self.stats['idle_in_transaction_count'] = len(idle_txns)
for txn in idle_txns:
self.alert(
'warning',
f"Idle in transaction: PID {txn.pid}",
{
'duration': txn.duration_seconds(),
'application': txn.application_name
}
)
# Kill idle transactions after 10 minutes
if txn.duration_seconds() > 600:
self.kill_transaction(txn.pid, "Idle in transaction >10 minutes")
# Get overall stats
stats = self.get_transaction_stats()
# Alert on high rollback rate
if stats['rollback_rate_percent'] > 10:
self.alert(
'warning',
f"High transaction rollback rate: {stats['rollback_rate_percent']}%",
stats
)
# Log periodic status
logger.info(
f"Monitoring: {stats['active_connections']} active, "
f"{len(long_running)} long-running, "
f"{len(blocked)} blocked, "
f"{stats['rollback_rate_percent']}% rollback rate"
)
time.sleep(self.check_interval)
except KeyboardInterrupt:
logger.info("Monitoring stopped by user")
break
except Exception as e:
logger.error(f"Monitoring error: {e}")
time.sleep(self.check_interval)
# Usage
if __name__ == "__main__":
monitor = PostgreSQLTransactionMonitor(
connection_string="postgresql://monitor_user:password@localhost:5432/mydb",
long_transaction_threshold=30, # 30 seconds
check_interval=10 # Check every 10 seconds
)
monitor.run_monitoring_loop()
```
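The `alert` method above leaves the notification transport as a stub. A minimal Slack-style webhook integration might look like the sketch below; it assumes the `requests` package and an incoming-webhook URL, and would be adapted for PagerDuty or email.
```python
# monitoring/alert_webhook.py - hypothetical notification helper
from typing import Optional

import requests

def send_webhook_alert(
    webhook_url: str,
    severity: str,
    message: str,
    details: Optional[dict] = None,
    timeout: float = 5.0,
) -> bool:
    """Post an alert payload to a Slack-style incoming webhook."""
    text = f"[{severity.upper()}] {message}"
    if details:
        text += "\n" + "\n".join(f"{k}: {v}" for k, v in details.items())
    try:
        resp = requests.post(webhook_url, json={"text": text}, timeout=timeout)
        return resp.status_code < 300
    except requests.RequestException:
        return False
```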
### Example 2: Transaction Analysis Queries
```sql
-- PostgreSQL transaction health diagnostic queries
-- 1. Long-running transactions
SELECT
pid,
usename,
application_name,
client_addr,
NOW() - xact_start AS transaction_duration,
NOW() - query_start AS query_duration,
state,
LEFT(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND state != 'idle'
AND pid != pg_backend_pid()
ORDER BY xact_start;
-- 2. Blocking tree (which transactions are blocking others)
WITH RECURSIVE blocking_tree AS (
    -- Roots: sessions that block others but are not blocked themselves
    SELECT
        a.pid,
        a.usename,
        a.query,
        NULL::integer AS blocking_pid,
        1 AS level
    FROM pg_stat_activity a
    WHERE cardinality(pg_blocking_pids(a.pid)) = 0
    AND a.pid IN (
        SELECT unnest(pg_blocking_pids(c.pid))
        FROM pg_stat_activity c
    )
    UNION ALL
    -- Sessions blocked by a pid already in the tree
    SELECT
        a.pid,
        a.usename,
        a.query,
        bt.pid AS blocking_pid,
        bt.level + 1
    FROM blocking_tree bt
    JOIN pg_stat_activity a ON bt.pid = ANY(pg_blocking_pids(a.pid))
)
SELECT
    level,
    pid,
    usename,
    blocking_pid,
    LEFT(query, 50) AS query_snippet
FROM blocking_tree
ORDER BY level, pid;
-- 3. Transaction rollback rate by database
SELECT
datname,
xact_commit AS commits,
xact_rollback AS rollbacks,
ROUND(100.0 * xact_rollback / NULLIF(xact_commit + xact_rollback, 0), 2) AS rollback_rate_percent
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1', 'postgres')
ORDER BY rollback_rate_percent DESC;
-- 4. Idle in transaction connections
SELECT
pid,
usename,
application_name,
client_addr,
NOW() - state_change AS idle_duration,
state,
query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND pid != pg_backend_pid()
ORDER BY state_change;
-- 5. Current wait events grouped by type
SELECT
wait_event_type,
wait_event,
COUNT(*) AS waiting_count,
array_agg(DISTINCT pid) AS waiting_pids
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
AND state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY waiting_count DESC;
```
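These queries can be run ad hoc in psql, or wrapped in a small runner such as the sketch below, which executes diagnostic #1; it assumes psycopg2, and the connection string is a placeholder.
```python
# queries/run_diagnostics.py - hypothetical runner for diagnostic #1
import psycopg2

LONG_RUNNING_SQL = """
SELECT pid, usename, NOW() - xact_start AS transaction_duration,
       state, LEFT(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND state != 'idle'
  AND pid != pg_backend_pid()
ORDER BY xact_start;
"""

def main() -> None:
    with psycopg2.connect(
        "postgresql://monitor_user:password@localhost:5432/mydb"
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(LONG_RUNNING_SQL)
            for pid, user, duration, state, query in cur.fetchall():
                print(f"pid={pid} user={user} duration={duration} state={state}: {query}")

if __name__ == "__main__":
    main()
```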
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| "Permission denied for pg_stat_activity" | Insufficient monitoring privileges | Grant pg_monitor role or SELECT on pg_stat_activity |
| "Cannot terminate backend" | Trying to kill superuser connection | Use pg_cancel_backend or kill from OS level |
| "Connection pool exhausted" | Too many idle connections | Kill idle in transaction connections, increase pool size |
| "High rollback rate" | Application errors or constraint violations | Review application logs and fix bugs |
| "Lock wait timeout exceeded" | Deadlock or very long lock hold | Analyze blocking queries, implement timeouts |
## Configuration Options
**Monitoring Intervals**
- `check_interval`: 5-10 seconds for real-time alerting
- `long_transaction_threshold`: 30-60 seconds (production), 300s (analytics)
- `idle_in_transaction_timeout`: 600 seconds (10 minutes)
**Auto-Kill Thresholds**
- Long-running OLTP: 60-300 seconds
- Long-running analytics: 3600 seconds (1 hour)
- Idle in transaction: 600 seconds (10 minutes)
**Alert Thresholds**
- Rollback rate: >5% warning, >10% critical
- Blocked transactions: >10 warning, >50 critical
- Active connections: >80% of max_connections
## Best Practices
DO:
Set statement_timeout in application connection strings (see the sketch after these lists)
- Use connection pooling to limit total connections
- Implement transaction timeout in application code
- Monitor transaction throughput trends over time
- Kill idle in transaction connections automatically
- Track rollback reasons in application logs
DON'T:
- Leave transactions open while waiting for user input
- Hold locks during expensive operations (file I/O, network calls)
- Use long-running transactions in OLTP workloads
- Ignore idle in transaction connections (they hold locks)
- Set transaction timeouts too low (causes false positives)
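The `statement_timeout` recommendation above can be applied at connection time, as sketched below; `statement_timeout` and `idle_in_transaction_session_timeout` are standard PostgreSQL parameters (the latter since 9.6), while the values and connection string are illustrative.
```python
# app/db.py - hypothetical connection setup with server-side timeouts
import psycopg2

conn = psycopg2.connect(
    "postgresql://app_user:password@localhost:5432/mydb",
    # libpq "options" passes session-level GUCs at connect time
    options=(
        "-c statement_timeout=30000 "                    # 30s per statement
        "-c idle_in_transaction_session_timeout=600000"  # 10min idle-in-txn
    ),
)
```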
## Performance Considerations
- Monitoring adds <0.1% CPU overhead with 10-second intervals
- pg_stat_activity queries are lightweight (<1ms)
- Auto-killing transactions requires careful threshold tuning
- Historical metrics retention: 30 days (aggregated), 7 days (detailed)
- Consider read replicas for monitoring queries in high-load systems
## Related Commands
- `/database-deadlock-detector` - Detailed deadlock analysis
- `/database-health-monitor` - Overall database health metrics
- `/sql-query-optimizer` - Optimize slow queries causing lock contention
- `/database-connection-pooler` - Manage connection pool sizing
## Version History
- v1.0.0 (2024-10): Initial implementation with PostgreSQL real-time monitoring
- Planned v1.1.0: Add MySQL transaction monitoring and distributed transaction support

61
plugin.lock.json Normal file

@@ -0,0 +1,61 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/database/database-transaction-monitor",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "ae3d40391e224c96d732ff554066f96ad63964a1",
"treeHash": "fa80af828308e583ad409c7379ca6a608fab81f9495affd8926019f875da2358",
"generatedAt": "2025-11-28T10:18:22.258912Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "database-transaction-monitor",
"description": "Database plugin for database-transaction-monitor",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "47aa8c5629b3dd8a1e001926e26f78d94ad9046fcd081e020c8e924b50769eed"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "b1e51f890a4ed23ba7a0a12e0885fb6f53056ec88f428bc3043adf248e44ed2f"
},
{
"path": "commands/transactions.md",
"sha256": "84ce85e7a64ef081bbf79a599fb60828b2688912e7d2a77e05c7def6ed524edb"
},
{
"path": "skills/database-transaction-monitor/SKILL.md",
"sha256": "4bba37a1db1b20fea9aee4da97c27ad6685c6d48d9959cd36f117ea723ca101b"
},
{
"path": "skills/database-transaction-monitor/references/README.md",
"sha256": "e07410ca05e9c06a5344d58e72d75ee7c97eb2f7e11910131d1b87ae50c795d5"
},
{
"path": "skills/database-transaction-monitor/scripts/README.md",
"sha256": "c52f146abfbf0451bea5f52d5cb5edc803e4e9c7896114142b66a06c06f2acf6"
},
{
"path": "skills/database-transaction-monitor/assets/README.md",
"sha256": "77a9afc430ec87f16c213d63aceed84d8dacbbe45d594a288b4b57e79e798123"
}
],
"dirSha256": "fa80af828308e583ad409c7379ca6a608fab81f9495affd8926019f875da2358"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

52
skills/database-transaction-monitor/SKILL.md Normal file

@@ -0,0 +1,52 @@
---
name: monitoring-database-transactions
description: |
This skill enables Claude to monitor database transactions for performance and lock issues using the database-transaction-monitor plugin. It is triggered when the user requests transaction monitoring, lock detection, or rollback rate analysis for databases. Use this skill when the user mentions "monitor database transactions," "detect long-running transactions," "identify lock contention," "track rollback rates," or asks about "transaction anomalies." The skill leverages the `/txn-monitor` command to provide real-time alerts and insights into database health.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill empowers Claude to proactively monitor database transactions, identify performance bottlenecks like long-running queries and lock contention, and alert on anomalies such as high rollback rates. It provides insights into database health and helps prevent performance degradation.
## How It Works
1. **Activation**: The user's request triggers the `database-transaction-monitor` plugin.
2. **Transaction Monitoring**: The plugin executes the `/txn-monitor` command to initiate transaction monitoring.
3. **Alerting**: The plugin analyzes transaction data and generates alerts based on predefined thresholds for long-running transactions, lock wait times, and rollback rates.
## When to Use This Skill
This skill activates when you need to:
- Detect and kill long-running transactions blocking other queries.
- Monitor lock wait times and identify deadlock patterns.
- Track transaction rollback rates for error analysis.
## Examples
### Example 1: Detecting Long-Running Transactions
User request: "Find any long-running database transactions."
The skill will:
1. Activate the `database-transaction-monitor` plugin.
2. Execute the `/txn-monitor` command to identify transactions exceeding a predefined duration threshold.
### Example 2: Analyzing Lock Contention
User request: "Analyze database lock contention."
The skill will:
1. Activate the `database-transaction-monitor` plugin.
2. Execute the `/txn-monitor` command to monitor lock wait times and identify deadlock patterns.
## Best Practices
- **Threshold Configuration**: Configure appropriate thresholds for long-running transactions and lock wait times to minimize false positives.
- **Alerting Integration**: Integrate transaction alerts with existing monitoring systems for timely notification and response.
- **Regular Review**: Regularly review transaction monitoring data to identify trends and proactively address potential performance issues.
## Integration
This skill can be integrated with other monitoring and alerting tools to provide a comprehensive view of database health. It complements tools for query optimization and database schema design.

7
skills/database-transaction-monitor/assets/README.md Normal file

@@ -0,0 +1,7 @@
# Assets
Bundled resources for database-transaction-monitor skill
- [ ] transaction_monitoring_template.json: Template for configuring transaction monitoring parameters.
- [ ] lock_contention_report_template.md: Template for generating lock contention reports.
- [ ] rollback_rate_analysis_template.md: Template for rollback rate analysis reports.

8
skills/database-transaction-monitor/references/README.md Normal file

@@ -0,0 +1,8 @@
# References
Bundled resources for database-transaction-monitor skill
- [ ] database_schema.md: Documentation of the database schema for transaction monitoring.
- [ ] sql_best_practices.md: Best practices for writing efficient SQL queries.
- [ ] transaction_monitoring_guide.md: A comprehensive guide to database transaction monitoring concepts and techniques.
- [ ] supported_databases.md: List of supported database systems and their specific monitoring configurations.

7
skills/database-transaction-monitor/scripts/README.md Normal file

@@ -0,0 +1,7 @@
# Scripts
Bundled resources for database-transaction-monitor skill
- [ ] transaction_monitor.py: Script to execute database queries and analyze transaction data.
- [ ] lock_detector.py: Script to detect and report lock contention issues.
- [ ] rollback_analyzer.py: Script to analyze rollback rates and identify potential problems.