Files
gh-anton-abyzov-specweave-p…/commands/monitor-setup.md
2025-11-29 17:56:46 +08:00

97 lines
2.8 KiB
Markdown

---
name: specweave-kafka:monitor-setup
description: Set up comprehensive Kafka monitoring with Prometheus + Grafana. Configures JMX exporter, dashboards, and alerting rules.
---
# Set Up Kafka Monitoring
Configure comprehensive monitoring for your Kafka cluster using Prometheus and Grafana.
## What This Command Does
1. **JMX Exporter Setup**: Configure Prometheus JMX exporter for Kafka brokers
2. **Prometheus Configuration**: Add Kafka scrape targets
3. **Grafana Dashboards**: Install 5 pre-built dashboards
4. **Alerting Rules**: Configure 14 critical/high/warning alerts
5. **Verification**: Test metrics collection and dashboard access
## Interactive Workflow
I'll detect your environment and guide setup:
### Environment Detection
- **Kubernetes** (Strimzi/Confluent Operator) → Use PodMonitor
- **Docker Compose** → Add Prometheus + Grafana services
- **VM/Bare Metal** → Configure JMX exporter JAR
### Question 1: Where is Kafka running?
- Kubernetes (Strimzi)
- Docker Compose
- VMs/EC2 instances
### Question 2: Prometheus already installed?
- Yes → Just add Kafka scrape config
- No → Install Prometheus + Grafana stack
## Example Usage
```bash
# Start monitoring setup wizard
/specweave-kafka:monitor-setup
# I'll activate kafka-observability skill and:
# 1. Detect your environment
# 2. Configure JMX exporter (port 7071)
# 3. Set up Prometheus scraping
# 4. Install 5 Grafana dashboards
# 5. Configure 14 alerting rules
# 6. Verify metrics collection
```
## What Gets Configured
**JMX Exporter** (Kafka brokers):
- Metrics endpoint on port 7071
- 50+ critical Kafka metrics exported
- Broker, topic, consumer lag, JVM metrics
**Prometheus Scraping**:
```yaml
scrape_configs:
- job_name: 'kafka'
static_configs:
- targets: ['kafka-0:7071', 'kafka-1:7071', 'kafka-2:7071']
```
**5 Grafana Dashboards**:
1. **Cluster Overview** - Health, throughput, ISR changes
2. **Broker Metrics** - CPU, memory, network, request handlers
3. **Consumer Lag** - Lag per group/topic, offset tracking
4. **Topic Metrics** - Partition count, replication, log size
5. **JVM Metrics** - Heap, GC, threads, file descriptors
**14 Alerting Rules**:
- CRITICAL: Under-replicated partitions, offline partitions, no controller
- HIGH: Consumer lag, ISR shrinks, leader elections
- WARNING: CPU, memory, GC time, disk usage
## Prerequisites
- Kafka cluster running (self-hosted or K8s)
- Prometheus installed (or will be installed)
- Grafana installed (or will be installed)
## Post-Setup
After setup completes, I'll:
1. ✅ Provide Grafana URL and credentials
2. ✅ Show how to access dashboards
3. ✅ Explain critical alerts
4. ✅ Suggest testing alerts by stopping a broker
---
**Skills Activated**: kafka-observability
**Related Commands**: /specweave-kafka:deploy
**Dashboard Locations**: `plugins/specweave-kafka/monitoring/grafana/dashboards/`