97 lines
2.8 KiB
Markdown
97 lines
2.8 KiB
Markdown
---
|
|
name: specweave-kafka:monitor-setup
|
|
description: Set up comprehensive Kafka monitoring with Prometheus + Grafana. Configures JMX exporter, dashboards, and alerting rules.
|
|
---
|
|
|
|
# Set Up Kafka Monitoring
|
|
|
|
Configure comprehensive monitoring for your Kafka cluster using Prometheus and Grafana.
|
|
|
|
## What This Command Does
|
|
|
|
1. **JMX Exporter Setup**: Configure Prometheus JMX exporter for Kafka brokers
|
|
2. **Prometheus Configuration**: Add Kafka scrape targets
|
|
3. **Grafana Dashboards**: Install 5 pre-built dashboards
|
|
4. **Alerting Rules**: Configure 14 critical/high/warning alerts
|
|
5. **Verification**: Test metrics collection and dashboard access
|
|
|
|
## Interactive Workflow
|
|
|
|
I'll detect your environment and guide setup:
|
|
|
|
### Environment Detection
|
|
- **Kubernetes** (Strimzi/Confluent Operator) → Use PodMonitor
|
|
- **Docker Compose** → Add Prometheus + Grafana services
|
|
- **VM/Bare Metal** → Configure JMX exporter JAR
|
|
|
|
### Question 1: Where is Kafka running?
|
|
- Kubernetes (Strimzi)
|
|
- Docker Compose
|
|
- VMs/EC2 instances
|
|
|
|
### Question 2: Prometheus already installed?
|
|
- Yes → Just add Kafka scrape config
|
|
- No → Install Prometheus + Grafana stack
|
|
|
|
## Example Usage
|
|
|
|
```bash
|
|
# Start monitoring setup wizard
|
|
/specweave-kafka:monitor-setup
|
|
|
|
# I'll activate kafka-observability skill and:
|
|
# 1. Detect your environment
|
|
# 2. Configure JMX exporter (port 7071)
|
|
# 3. Set up Prometheus scraping
|
|
# 4. Install 5 Grafana dashboards
|
|
# 5. Configure 14 alerting rules
|
|
# 6. Verify metrics collection
|
|
```
|
|
|
|
## What Gets Configured
|
|
|
|
**JMX Exporter** (Kafka brokers):
|
|
- Metrics endpoint on port 7071
|
|
- 50+ critical Kafka metrics exported
|
|
- Broker, topic, consumer lag, JVM metrics
|
|
|
|
**Prometheus Scraping**:
|
|
```yaml
|
|
scrape_configs:
|
|
- job_name: 'kafka'
|
|
static_configs:
|
|
- targets: ['kafka-0:7071', 'kafka-1:7071', 'kafka-2:7071']
|
|
```
|
|
|
|
**5 Grafana Dashboards**:
|
|
1. **Cluster Overview** - Health, throughput, ISR changes
|
|
2. **Broker Metrics** - CPU, memory, network, request handlers
|
|
3. **Consumer Lag** - Lag per group/topic, offset tracking
|
|
4. **Topic Metrics** - Partition count, replication, log size
|
|
5. **JVM Metrics** - Heap, GC, threads, file descriptors
|
|
|
|
**14 Alerting Rules**:
|
|
- CRITICAL: Under-replicated partitions, offline partitions, no controller
|
|
- HIGH: Consumer lag, ISR shrinks, leader elections
|
|
- WARNING: CPU, memory, GC time, disk usage
|
|
|
|
## Prerequisites
|
|
|
|
- Kafka cluster running (self-hosted or K8s)
|
|
- Prometheus installed (or will be installed)
|
|
- Grafana installed (or will be installed)
|
|
|
|
## Post-Setup
|
|
|
|
After setup completes, I'll:
|
|
1. ✅ Provide Grafana URL and credentials
|
|
2. ✅ Show how to access dashboards
|
|
3. ✅ Explain critical alerts
|
|
4. ✅ Suggest testing alerts by stopping a broker
|
|
|
|
---
|
|
|
|
**Skills Activated**: kafka-observability
|
|
**Related Commands**: /specweave-kafka:deploy
|
|
**Dashboard Locations**: `plugins/specweave-kafka/monitoring/grafana/dashboards/`
|