---
name: specweave-kafka:monitor-setup
description: Set up comprehensive Kafka monitoring with Prometheus + Grafana. Configures JMX exporter, dashboards, and alerting rules.
---

# Set Up Kafka Monitoring

Configure comprehensive monitoring for your Kafka cluster using Prometheus and Grafana.

## What This Command Does

1. **JMX Exporter Setup**: Configure Prometheus JMX exporter for Kafka brokers
2. **Prometheus Configuration**: Add Kafka scrape targets
3. **Grafana Dashboards**: Install 5 pre-built dashboards
4. **Alerting Rules**: Configure 14 critical/high/warning alerts
5. **Verification**: Test metrics collection and dashboard access

## Interactive Workflow

I'll detect your environment and guide setup:

### Environment Detection
- **Kubernetes** (Strimzi/Confluent Operator) → Use PodMonitor
- **Docker Compose** → Add Prometheus + Grafana services
- **VM/Bare Metal** → Configure JMX exporter JAR

### Question 1: Where is Kafka running?
- Kubernetes (Strimzi)
- Docker Compose
- VMs/EC2 instances

### Question 2: Prometheus already installed?
- Yes → Just add Kafka scrape config
- No → Install Prometheus + Grafana stack

## Example Usage

```bash
# Start monitoring setup wizard
/specweave-kafka:monitor-setup

# I'll activate kafka-observability skill and:
# 1. Detect your environment
# 2. Configure JMX exporter (port 7071)
# 3. Set up Prometheus scraping
# 4. Install 5 Grafana dashboards
# 5. Configure 14 alerting rules
# 6. Verify metrics collection
```

## What Gets Configured

**JMX Exporter** (Kafka brokers):
- Metrics endpoint on port 7071
- 50+ critical Kafka metrics exported
- Broker, topic, consumer lag, JVM metrics

**Prometheus Scraping**:
```yaml
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka-0:7071', 'kafka-1:7071', 'kafka-2:7071']
```

**5 Grafana Dashboards**:
1. **Cluster Overview** - Health, throughput, ISR changes
2. **Broker Metrics** - CPU, memory, network, request handlers
3. **Consumer Lag** - Lag per group/topic, offset tracking
4. **Topic Metrics** - Partition count, replication, log size
5. **JVM Metrics** - Heap, GC, threads, file descriptors

**14 Alerting Rules**:
- CRITICAL: Under-replicated partitions, offline partitions, no controller
- HIGH: Consumer lag, ISR shrinks, leader elections
- WARNING: CPU, memory, GC time, disk usage

## Prerequisites

- Kafka cluster running (self-hosted or K8s)
- Prometheus installed (or will be installed)
- Grafana installed (or will be installed)

## Post-Setup

After setup completes, I'll:
1. ✅ Provide Grafana URL and credentials
2. ✅ Show how to access dashboards
3. ✅ Explain critical alerts
4. ✅ Suggest testing alerts by stopping a broker

---

**Skills Activated**: kafka-observability
**Related Commands**: /specweave-kafka:deploy
**Dashboard Locations**: `plugins/specweave-kafka/monitoring/grafana/dashboards/`