zhongwei/gh-anton-abyzov-specweave-plugins-specweave-kafka


Zhongwei Li 96a7ab295d Initial commit

2025-11-29 17:56:46 +08:00


name: specweave-kafka:monitor-setup
description: Set up comprehensive Kafka monitoring with Prometheus + Grafana. Configures the JMX exporter, dashboards, and alerting rules.

Set Up Kafka Monitoring

Configure comprehensive monitoring for your Kafka cluster using Prometheus and Grafana.

What This Command Does

JMX Exporter Setup: Configure the Prometheus JMX exporter on the Kafka brokers
Prometheus Configuration: Add Kafka scrape targets
Grafana Dashboards: Install 5 pre-built dashboards
Alerting Rules: Configure 14 alerts across critical, high, and warning severities
Verification: Test metrics collection and dashboard access

Interactive Workflow

I'll detect your environment and guide you through setup:

Environment Detection

Kubernetes (Strimzi/Confluent Operator) → Use a PodMonitor
Docker Compose → Add Prometheus + Grafana services
VMs/bare metal → Configure the JMX exporter agent JAR
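On Kubernetes with the Prometheus Operator, a PodMonitor tells Prometheus which pods to scrape. The sketch below follows the general shape of Strimzi's published example. The `kafka` namespace, the monitor name, and the `app: strimzi` label are assumptions, and so is the port name `tcp-prometheus` (Strimzi's examples use that name for their metrics port rather than 7071). Check all of them against your operator version and your Prometheus `podMonitorSelector`.

```yaml
# Sketch only: assumes the Prometheus Operator CRDs are installed and that
# Kafka runs in a namespace called "kafka" (hypothetical).
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-broker-metrics        # hypothetical name
  labels:
    app: strimzi                    # must match your Prometheus podMonitorSelector
spec:
  selector:
    matchExpressions:
      # Strimzi labels the pods it manages with strimzi.io/kind
      - key: strimzi.io/kind
        operator: In
        values: ["Kafka"]
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus          # assumed metrics port name; verify for your version
```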

Question 1: Where is Kafka running?

Kubernetes (Strimzi)
Docker Compose
VMs/EC2 instances
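With Strimzi, the broker-side JMX exporter is switched on in the `Kafka` custom resource instead of through JVM flags. A fragment to merge into an existing resource (v1beta2 API) might look like the following. The cluster name `my-cluster` and the ConfigMap `kafka-metrics` are assumptions, as is its key `kafka-metrics-config.yml`, which would hold the exporter rules.

```yaml
# Fragment to merge into an existing Kafka resource; listeners, storage,
# and other required fields are intentionally omitted here.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                          # hypothetical cluster name
spec:
  kafka:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics               # hypothetical ConfigMap
          key: kafka-metrics-config.yml     # key holding the exporter rules
```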

Question 2: Is Prometheus already installed?

Yes → Just add the Kafka scrape config
No → Install the Prometheus + Grafana stack
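For the "No" path on Docker Compose, a minimal Prometheus + Grafana stack could look like this sketch. The local file `./prometheus.yml`, the directory `./grafana/provisioning`, and the admin password are placeholders. The sketch also assumes the Kafka brokers share the same Compose network, so that hostnames such as `kafka-0` resolve from the Prometheus container.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      # Local config file containing the 'kafka' scrape job (placeholder path)
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    environment:
      # Placeholder password: change it before exposing Grafana
      GF_SECURITY_ADMIN_PASSWORD: change-me
    volumes:
      # Hypothetical directory holding datasource/dashboard provisioning files
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
```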

Example Usage

```
# Start the monitoring setup wizard
/specweave-kafka:monitor-setup

# I'll activate the kafka-observability skill and:
# 1. Detect your environment
# 2. Configure the JMX exporter (port 7071)
# 3. Set up Prometheus scraping
# 4. Install 5 Grafana dashboards
# 5. Configure 14 alerting rules
# 6. Verify metrics collection
```

What Gets Configured

JMX Exporter (Kafka brokers):

Metrics endpoint on port 7071
50+ critical Kafka metrics exported
Broker, topic, consumer lag, JVM metrics
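On VMs, the JMX exporter runs as a Java agent inside each broker. It is attached with a JVM argument of the form `-javaagent:<path-to-jar>=7071:<path-to-config>`, which is commonly passed through `KAFKA_OPTS`. The agent reads a YAML rules file. The minimal sketch below exports the gauges behind the three CRITICAL alerts listed further down; the file name and the resulting metric names are illustrative assumptions, not the plugin's shipped configuration.

```yaml
# Hypothetical kafka-jmx.yml for the Prometheus JMX exporter Java agent.
lowercaseOutputName: true           # lowercase the final metric names
rules:
  # Under-replicated partitions reported by this broker
  - pattern: 'kafka.server<type=ReplicaManager, name=UnderReplicatedPartitions><>Value'
    name: kafka_server_replicamanager_underreplicatedpartitions
    type: GAUGE
  # Offline partitions and the active-controller flag from the controller MBean
  - pattern: 'kafka.controller<type=KafkaController, name=(OfflinePartitionsCount|ActiveControllerCount)><>Value'
    name: kafka_controller_kafkacontroller_$1
    type: GAUGE
```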

Prometheus Scraping:

```yaml
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka-0:7071', 'kafka-1:7071', 'kafka-2:7071']
```
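A fuller `prometheus.yml` around the `kafka` job would usually also set the scrape interval and load an alert rules file. In this sketch, the rule-file path and the `alertmanager:9093` target are assumptions. The relabeling step is optional: it strips the exporter port so that the `instance` label reads `kafka-0` rather than `kafka-0:7071`.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/kafka-alerts.yml        # hypothetical rules file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']    # assumes an Alertmanager service

scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka-0:7071', 'kafka-1:7071', 'kafka-2:7071']
    relabel_configs:
      # Keep only the host part of host:port as the instance label
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
        replacement: '$1'
```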

5 Grafana Dashboards:

Cluster Overview - Health, throughput, ISR changes
Broker Metrics - CPU, memory, network, request handlers
Consumer Lag - Lag per group/topic, offset tracking
Topic Metrics - Partition count, replication, log size
JVM Metrics - Heap, GC, threads, file descriptors
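Grafana can load dashboard JSON files from disk through a provisioning file. This sketch assumes two things: the dashboard JSON has been copied or mounted to `/var/lib/grafana/dashboards/kafka` (a hypothetical path), and the file itself is placed under `/etc/grafana/provisioning/dashboards/`.

```yaml
# e.g. /etc/grafana/provisioning/dashboards/kafka.yml
apiVersion: 1
providers:
  - name: kafka-dashboards          # hypothetical provider name
    orgId: 1
    folder: Kafka                   # Grafana folder the dashboards appear in
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30       # how often Grafana rescans the directory
    options:
      path: /var/lib/grafana/dashboards/kafka
```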

14 Alerting Rules:

CRITICAL: Under-replicated partitions, offline partitions, no controller
HIGH: Consumer lag, ISR shrinks, leader elections
WARNING: CPU, memory, GC time, disk usage
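Prometheus alerting rules are YAML groups of PromQL expressions. The three CRITICAL conditions could be written roughly as follows. This assumes the exporter publishes the gauges under the names used here, which depend entirely on the JMX exporter rules in use. The `for` durations are illustrative choices, not the plugin's actual thresholds.

```yaml
groups:
  - name: kafka-critical            # hypothetical group name
    rules:
      - alert: KafkaUnderReplicatedPartitions
        # Any broker reporting under-replicated partitions
        expr: sum(kafka_server_replicamanager_underreplicatedpartitions) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} under-replicated partitions in the Kafka cluster"
      - alert: KafkaOfflinePartitions
        expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
        for: 1m
        labels:
          severity: critical
      - alert: KafkaNoActiveController
        # Exactly one broker should report ActiveControllerCount = 1
        expr: sum(kafka_controller_kafkacontroller_activecontrollercount) != 1
        for: 1m
        labels:
          severity: critical
```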

Prerequisites

Kafka cluster running (self-hosted or K8s)
Prometheus installed (or will be installed)
Grafana installed (or will be installed)
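Whether Grafana already exists or is installed during setup, the dashboards need a Prometheus data source. Here is a provisioning sketch that assumes Prometheus is reachable at `http://prometheus:9090` (a Compose-style service name; adjust it for your network). If the dashboard JSON refers to a data source by name or UID, that reference needs to match what is provisioned here.

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                   # Grafana's backend queries Prometheus
    url: http://prometheus:9090     # assumed address
    isDefault: true
```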

Post-Setup

After setup completes, I'll:

✅ Provide Grafana URL and credentials
✅ Show how to access dashboards
✅ Explain critical alerts
✅ Suggest testing alerts by stopping a broker
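Stopping one broker in a replicated cluster should trip the under-replicated-partitions alert. However, Prometheus only hands firing alerts to Alertmanager, and where they go from there depends on Alertmanager's routing. The following minimal sketch sends `severity="critical"` alerts to their own receiver. The receiver names and the webhook URL are placeholders.

```yaml
route:
  receiver: default                 # fallback for anything not matched below
  group_by: ['alertname']
  routes:
    - matchers:
        - severity="critical"
      receiver: kafka-oncall
receivers:
  - name: default                   # no integrations: notifications go nowhere
  - name: kafka-oncall
    webhook_configs:
      - url: 'http://alert-hook.example.internal/kafka'   # placeholder URL
```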

Skills Activated: kafka-observability
Related Commands: /specweave-kafka:deploy
Dashboard Locations: plugins/specweave-kafka/monitoring/grafana/dashboards/
