Initial commit

Zhongwei Li
2025-11-29 17:56:21 +08:00
commit 1d1e040e9d
10 changed files with 2114 additions and 0 deletions

# Kafka Connect Connector Deployment
Deploy and manage Kafka Connect connectors (Source/Sink).
## Task
You are an expert in Kafka Connect. Help users deploy source and sink connectors.
### Steps:
1. **Ask for Requirements**:
- Connector type: Source or Sink
- Connector class (JDBC, S3, Elasticsearch, etc.)
- Connection details
- Topic configuration
2. **Generate Connector Configuration**:
#### JDBC Source Connector (PostgreSQL):
```json
{
  "name": "postgres-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "postgres",
    "connection.password": "${file:/secrets.properties:db-password}",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "postgres-",
    "table.whitelist": "users,orders",
    "poll.interval.ms": "5000",
    "batch.max.rows": "1000",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
```
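The `${file:/secrets.properties:db-password}` placeholder above only resolves if the Connect worker has a config provider enabled. A minimal sketch of the worker-side settings, assuming the standard `FileConfigProvider` and that the secrets file lives at `/secrets.properties`:
```properties
# In the worker config (e.g. connect-distributed.properties), enable the file config provider
# so ${file:<path>:<key>} placeholders in connector configs are resolved at runtime.
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
```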
#### Elasticsearch Sink Connector:
```json
{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "users,orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true",
    "behavior.on.null.values": "delete",
    "behavior.on.malformed.documents": "warn",
    "max.buffered.records": "20000",
    "batch.size": "2000",
    "linger.ms": "1000",
    "max.in.flight.requests": "5",
    "retry.backoff.ms": "100",
    "max.retries": "10"
  }
}
```
#### S3 Sink Connector:
```json
{
  "name": "s3-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics": "events",
    "s3.bucket.name": "my-kafka-bucket",
    "s3.region": "us-east-1",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.interval.ms": "3600000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC"
  }
}
```
3. **Generate Deployment Scripts**:
#### Using REST API:
```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @connector-config.json
```
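If re-running the deployment should update an existing connector rather than fail, the REST API also supports create-or-update. A sketch (the file name `connector-config-only.json` is an assumption; it must contain only the inner `config` object, not the `{"name": ..., "config": ...}` wrapper):
```bash
# Create the connector if absent, otherwise update its configuration in place
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/config \
  -H "Content-Type: application/json" \
  -d @connector-config-only.json
```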
#### Using Confluent CLI:
```bash
confluent connect create \
  --config connector-config.json
```
#### Check Status:
```bash
curl http://localhost:8083/connectors/postgres-source-connector/status
# Expected response:
{
"name": "postgres-source-connector",
"connector": {"state": "RUNNING", "worker_id": "connect:8083"},
"tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}]
}
```
4. **Generate Monitoring Queries**:
```bash
# List all connectors
curl http://localhost:8083/connectors
# Get connector config
curl http://localhost:8083/connectors/postgres-source-connector/config
# Get connector metrics
curl http://localhost:8083/connectors/postgres-source-connector/status
# Restart connector
curl -X POST http://localhost:8083/connectors/postgres-source-connector/restart
# Pause connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/pause
# Resume connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/resume
```
5. **Best Practices**:
- Use secret management for credentials
- Configure appropriate error handling
- Set up monitoring and alerting
- Use SMT (Single Message Transforms) for data transformation
- Configure dead letter queues (see the sketch after this list)
- Set appropriate batch sizes and flush intervals
- Use time-based partitioning for sinks
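
The error-handling and dead-letter-queue bullets above map onto a handful of connector properties. A minimal sketch to merge into a sink connector's `config` block (the topic name `orders-dlq` and the replication factor are placeholders; DLQ routing applies to sink connectors only):
```json
{
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "errors.deadletterqueue.topic.name": "orders-dlq",
  "errors.deadletterqueue.topic.replication.factor": "3",
  "errors.deadletterqueue.context.headers.enable": "true"
}
```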
### Example Usage:
```
User: "Deploy PostgreSQL source connector for users table"
Result: Complete connector config + deployment scripts
```

commands/ksqldb-query.md

# ksqlDB Query Generator
Generate ksqlDB queries for stream processing.
## Task
You are a ksqlDB expert. Generate ksqlDB queries for stream processing, aggregations, and joins.
### Steps:
1. **Ask for Requirements**:
- Query type: Stream, Table, Join, Aggregation
- Source topics/streams
- Output requirements
2. **Generate ksqlDB Statements**:
#### Create Stream from Topic:
```sql
CREATE STREAM users_stream (
  id VARCHAR KEY,
  email VARCHAR,
  name VARCHAR,
  region VARCHAR,
  created_at BIGINT
) WITH (
  KAFKA_TOPIC='users',
  VALUE_FORMAT='JSON',
  TIMESTAMP='created_at'
);
```
#### Create Table (Materialized View):
```sql
CREATE TABLE user_counts AS
SELECT
  region,
  COUNT(*) AS user_count,
  COLLECT_LIST(name) AS user_names
FROM users_stream
GROUP BY region
EMIT CHANGES;
```
#### Stream-Stream Join:
```sql
CREATE STREAM orders_enriched AS
SELECT
  o.order_id,
  o.product_id,
  o.quantity,
  o.price,
  u.name AS customer_name,
  u.email AS customer_email,
  o.timestamp
FROM orders_stream o
INNER JOIN users_stream u
  WITHIN 1 HOUR
  ON o.user_id = u.id
EMIT CHANGES;
```
#### Windowed Aggregation:
```sql
-- Tumbling Window (5-minute windows)
CREATE TABLE sales_by_category_5min AS
SELECT
  category,
  WINDOWSTART AS window_start,
  WINDOWEND AS window_end,
  COUNT(*) AS order_count,
  SUM(amount) AS total_sales,
  AVG(amount) AS avg_sale,
  MAX(amount) AS max_sale
FROM orders_stream
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY category
EMIT CHANGES;

-- Hopping Window (5-min window, 1-min advance)
CREATE TABLE sales_hopping AS
SELECT
  category,
  WINDOWSTART AS window_start,
  COUNT(*) AS order_count
FROM orders_stream
WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE)
GROUP BY category
EMIT CHANGES;

-- Session Window (inactivity gap = 30 minutes)
CREATE TABLE user_sessions AS
SELECT
  user_id,
  WINDOWSTART AS session_start,
  WINDOWEND AS session_end,
  COUNT(*) AS event_count
FROM user_events_stream
WINDOW SESSION (30 MINUTES)
GROUP BY user_id
EMIT CHANGES;
```
#### Filtering and Transformation:
```sql
CREATE STREAM high_value_orders AS
SELECT
  order_id,
  user_id,
  amount,
  UCASE(status) AS status,
  CASE
    WHEN amount > 1000 THEN 'PREMIUM'
    WHEN amount > 500 THEN 'STANDARD'
    ELSE 'BASIC'
  END AS tier
FROM orders_stream
WHERE amount > 100
EMIT CHANGES;
```
#### Array and Map Operations:
```sql
CREATE STREAM processed_events AS
SELECT
  id,
  ARRAY_CONTAINS(tags, 'premium') AS is_premium,
  ARRAY_LENGTH(items) AS item_count,
  MAP_KEYS(metadata) AS meta_keys,
  metadata['source'] AS source
FROM events_stream
EMIT CHANGES;
```
3. **Generate Queries**:
```sql
-- Push query (continuous)
SELECT * FROM users_stream
WHERE region = 'US'
EMIT CHANGES;
-- Pull query (one-time, requires table)
SELECT * FROM user_counts
WHERE region = 'US';
```
4. **Generate Monitoring Commands**:
```sql
-- Show streams
SHOW STREAMS;
-- Describe stream
DESCRIBE users_stream;
-- Show queries
SHOW QUERIES;
-- Explain query
EXPLAIN query_id;
-- Terminate query
TERMINATE query_id;
```
5. **Best Practices**:
- Use appropriate window types for aggregations
- Set RETENTION (and a GRACE PERIOD) for windowed, stateful operations (see the sketch after this list)
- Use pull queries for point-in-time lookups
- Configure partitioning for joins
- Add error handling for UDFs
- Monitor query performance
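
A sketch of the retention and co-partitioning points above, reusing the stream and column names from the earlier examples (the window size, retention, and grace period values are placeholders):
```sql
-- Windowed table with explicit state retention and a grace period for late-arriving events
CREATE TABLE sales_by_category_hourly AS
SELECT
  category,
  WINDOWSTART AS window_start,
  SUM(amount) AS total_sales
FROM orders_stream
WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 7 DAYS, GRACE PERIOD 10 MINUTES)
GROUP BY category
EMIT CHANGES;

-- Repartition a stream on the join key so both sides of a join are co-partitioned
CREATE STREAM orders_by_user AS
SELECT *
FROM orders_stream
PARTITION BY user_id
EMIT CHANGES;
```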
### Example Usage:
```
User: "Calculate hourly sales by category"
Result: Complete ksqlDB window aggregation query
```

commands/schema-register.md

# Schema Registry Management
Manage Avro/JSON/Protobuf schemas in Confluent Schema Registry.
## Task
You are an expert in Confluent Schema Registry. Help users register, update, and manage schemas.
### Steps:
1. **Ask for Required Information**:
- Schema format: Avro, JSON Schema, or Protobuf
- Subject name (topic name or custom subject)
- Schema definition
- Compatibility mode (optional)
2. **Generate Schema Definition**:
#### Avro Example:
```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
#### JSON Schema Example:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "User",
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "email": {"type": "string", "format": "email"},
    "createdAt": {"type": "string", "format": "date-time"}
  },
  "required": ["id", "email"]
}
```
#### Protobuf Example:
```protobuf
syntax = "proto3";
package com.example;
message User {
string id = 1;
string email = 2;
int64 created_at = 3;
}
```
3. **Generate Registration Script**:
#### Using curl:
```bash
curl -X POST http://localhost:8081/subjects/users-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{
"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"
}'
```
#### Using Confluent CLI:
```bash
confluent schema-registry schema create \
  --subject users-value \
  --schema schema.avsc \
  --type AVRO
```
#### Using Python:
```python
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

sr_client = SchemaRegistryClient({'url': 'http://localhost:8081'})

# Avro schema matching the example above
schema_str = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
"""

schema = Schema(schema_str, schema_type="AVRO")
schema_id = sr_client.register_schema("users-value", schema)
```
4. **Set Compatibility Mode**:
```bash
# BACKWARD (default) - consumers using new schema can read old data
# FORWARD - consumers using old schema can read new data
# FULL - both backward and forward
# NONE - no compatibility checks
curl -X PUT http://localhost:8081/config/users-value \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"compatibility": "BACKWARD"}'
```
5. **Best Practices**:
- Use semantic versioning in schema evolution
- Always test compatibility before registering (see the check after this list)
- Document breaking changes
- Use logical types (timestamp-millis, decimal)
- Add field descriptions/documentation
- Use subject naming strategy consistently
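
A sketch of the compatibility check recommended above, run before registering a new version (reusing the `users-value` subject from the earlier curl example; the response contains an `is_compatible` flag):
```bash
# Adding a field with a default keeps the change backward compatible
curl -X POST http://localhost:8081/compatibility/subjects/users-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"email\",\"type\":\"string\",\"default\":\"\"}]}"
  }'
```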
### Example Usage:
```
User: "Register Avro schema for user events"
Result: Complete Avro schema + registration script
```