Initial commit

Zhongwei Li
2025-11-29 17:56:21 +08:00
commit 1d1e040e9d
10 changed files with 2114 additions and 0 deletions

# Kafka Connect Connector Deployment
Deploy and manage Kafka Connect connectors (Source/Sink).
## Task
You are an expert in Kafka Connect. Help users deploy source and sink connectors.
### Steps:
1. **Ask for Requirements**:
- Connector type: Source or Sink
- Connector class (JDBC, S3, Elasticsearch, etc.)
- Connection details
- Topic configuration
2. **Generate Connector Configuration**:
#### JDBC Source Connector (PostgreSQL):
```json
{
  "name": "postgres-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "postgres",
    "connection.password": "${file:/secrets.properties:db-password}",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "postgres-",
    "table.whitelist": "users,orders",
    "poll.interval.ms": "5000",
    "batch.max.rows": "1000",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
```
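The `${file:/secrets.properties:db-password}` placeholder above only resolves if the Connect worker has a config provider enabled. A minimal sketch of the worker-side settings, assuming the standard `FileConfigProvider` and that the secrets file lives at `/secrets.properties`:
```properties
# In the worker config (e.g. connect-distributed.properties), enable the file config provider
# so ${file:<path>:<key>} placeholders in connector configs are resolved at runtime.
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
```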
#### Elasticsearch Sink Connector:
```json
{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "users,orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true",
    "behavior.on.null.values": "delete",
    "behavior.on.malformed.documents": "warn",
    "max.buffered.records": "20000",
    "batch.size": "2000",
    "linger.ms": "1000",
    "max.in.flight.requests": "5",
    "retry.backoff.ms": "100",
    "max.retries": "10"
  }
}
```
#### S3 Sink Connector:
```json
{
  "name": "s3-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics": "events",
    "s3.bucket.name": "my-kafka-bucket",
    "s3.region": "us-east-1",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.interval.ms": "3600000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC"
  }
}
```
3. **Generate Deployment Scripts**:
#### Using REST API:
```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @connector-config.json
```
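If re-running the deployment should update an existing connector rather than fail, the REST API also supports create-or-update. A sketch (the file name `connector-config-only.json` is an assumption; it must contain only the inner `config` object, not the `{"name": ..., "config": ...}` wrapper):
```bash
# Create the connector if absent, otherwise update its configuration in place
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/config \
  -H "Content-Type: application/json" \
  -d @connector-config-only.json
```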
#### Using Confluent CLI:
```bash
confluent connect create \
  --config connector-config.json
```
#### Check Status:
```bash
curl http://localhost:8083/connectors/postgres-source-connector/status
# Expected response:
{
"name": "postgres-source-connector",
"connector": {"state": "RUNNING", "worker_id": "connect:8083"},
"tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}]
}
```
4. **Generate Monitoring Queries**:
```bash
# List all connectors
curl http://localhost:8083/connectors
# Get connector config
curl http://localhost:8083/connectors/postgres-source-connector/config
# Get connector metrics
curl http://localhost:8083/connectors/postgres-source-connector/status
# Restart connector
curl -X POST http://localhost:8083/connectors/postgres-source-connector/restart
# Pause connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/pause
# Resume connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/resume
```
5. **Best Practices**:
- Use secret management for credentials
- Configure appropriate error handling
- Set up monitoring and alerting
- Use SMT (Single Message Transforms) for data transformation
- Configure dead letter queues (see the sketch after this list)
- Set appropriate batch sizes and flush intervals
- Use time-based partitioning for sinks
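
The error-handling and dead-letter-queue bullets above map onto a handful of connector properties. A minimal sketch to merge into a sink connector's `config` block (the topic name `orders-dlq` and the replication factor are placeholders; DLQ routing applies to sink connectors only):
```json
{
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "errors.deadletterqueue.topic.name": "orders-dlq",
  "errors.deadletterqueue.topic.replication.factor": "3",
  "errors.deadletterqueue.context.headers.enable": "true"
}
```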
### Example Usage:
```
User: "Deploy PostgreSQL source connector for users table"
Result: Complete connector config + deployment scripts
```

commands/ksqldb-query.md

# ksqlDB Query Generator
Generate ksqlDB queries for stream processing.
## Task
You are a ksqlDB expert. Generate ksqlDB queries for stream processing, aggregations, and joins.
### Steps:
1. **Ask for Requirements**:
- Query type: Stream, Table, Join, Aggregation
- Source topics/streams
- Output requirements
2. **Generate ksqlDB Statements**:
#### Create Stream from Topic:
```sql
CREATE STREAM users_stream (
  id VARCHAR KEY,
  email VARCHAR,
  name VARCHAR,
  region VARCHAR,
  created_at BIGINT
) WITH (
  KAFKA_TOPIC='users',
  VALUE_FORMAT='JSON',
  TIMESTAMP='created_at'
);
```
#### Create Table (Materialized View):
```sql
CREATE TABLE user_counts AS
SELECT
  region,
  COUNT(*) AS user_count,
  COLLECT_LIST(name) AS user_names
FROM users_stream
GROUP BY region
EMIT CHANGES;
```
#### Stream-Stream Join:
```sql
CREATE STREAM orders_enriched AS
SELECT
  o.order_id,
  o.product_id,
  o.quantity,
  o.price,
  u.name AS customer_name,
  u.email AS customer_email,
  o.timestamp
FROM orders_stream o
INNER JOIN users_stream u
  WITHIN 1 HOUR
  ON o.user_id = u.id
EMIT CHANGES;
```
#### Windowed Aggregation:
```sql
-- Tumbling Window (5-minute windows)
CREATE TABLE sales_by_category_5min AS
SELECT
  category,
  WINDOWSTART AS window_start,
  WINDOWEND AS window_end,
  COUNT(*) AS order_count,
  SUM(amount) AS total_sales,
  AVG(amount) AS avg_sale,
  MAX(amount) AS max_sale
FROM orders_stream
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY category
EMIT CHANGES;

-- Hopping Window (5-min window, 1-min advance)
CREATE TABLE sales_hopping AS
SELECT
  category,
  WINDOWSTART AS window_start,
  COUNT(*) AS order_count
FROM orders_stream
WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE)
GROUP BY category
EMIT CHANGES;

-- Session Window (inactivity gap = 30 minutes)
CREATE TABLE user_sessions AS
SELECT
  user_id,
  WINDOWSTART AS session_start,
  WINDOWEND AS session_end,
  COUNT(*) AS event_count
FROM user_events_stream
WINDOW SESSION (30 MINUTES)
GROUP BY user_id
EMIT CHANGES;
```
#### Filtering and Transformation:
```sql
CREATE STREAM high_value_orders AS
SELECT
  order_id,
  user_id,
  amount,
  UCASE(status) AS status,
  CASE
    WHEN amount > 1000 THEN 'PREMIUM'
    WHEN amount > 500 THEN 'STANDARD'
    ELSE 'BASIC'
  END AS tier
FROM orders_stream
WHERE amount > 100
EMIT CHANGES;
```
#### Array and Map Operations:
```sql
CREATE STREAM processed_events AS
SELECT
  id,
  ARRAY_CONTAINS(tags, 'premium') AS is_premium,
  ARRAY_LENGTH(items) AS item_count,
  MAP_KEYS(metadata) AS meta_keys,
  metadata['source'] AS source
FROM events_stream
EMIT CHANGES;
```
3. **Generate Queries**:
```sql
-- Push query (continuous)
SELECT * FROM users_stream
WHERE region = 'US'
EMIT CHANGES;
-- Pull query (one-time, requires table)
SELECT * FROM user_counts
WHERE region = 'US';
```
4. **Generate Monitoring Commands**:
```sql
-- Show streams
SHOW STREAMS;
-- Describe stream
DESCRIBE users_stream;
-- Show queries
SHOW QUERIES;
-- Explain query
EXPLAIN query_id;
-- Terminate query
TERMINATE query_id;
```
5. **Best Practices**:
- Use appropriate window types for aggregations
- Set RETENTION (and a GRACE PERIOD) for windowed, stateful operations (see the sketch after this list)
- Use pull queries for point-in-time lookups
- Configure partitioning for joins
- Add error handling for UDFs
- Monitor query performance
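
A sketch of the retention and co-partitioning points above, reusing the stream and column names from the earlier examples (the window size, retention, and grace period values are placeholders):
```sql
-- Windowed table with explicit state retention and a grace period for late-arriving events
CREATE TABLE sales_by_category_hourly AS
SELECT
  category,
  WINDOWSTART AS window_start,
  SUM(amount) AS total_sales
FROM orders_stream
WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 7 DAYS, GRACE PERIOD 10 MINUTES)
GROUP BY category
EMIT CHANGES;

-- Repartition a stream on the join key so both sides of a join are co-partitioned
CREATE STREAM orders_by_user AS
SELECT *
FROM orders_stream
PARTITION BY user_id
EMIT CHANGES;
```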
### Example Usage:
```
User: "Calculate hourly sales by category"
Result: Complete ksqlDB window aggregation query
```

commands/schema-register.md

# Schema Registry Management
Manage Avro/JSON/Protobuf schemas in Confluent Schema Registry.
## Task
You are an expert in Confluent Schema Registry. Help users register, update, and manage schemas.
### Steps:
1. **Ask for Required Information**:
- Schema format: Avro, JSON Schema, or Protobuf
- Subject name (topic name or custom subject)
- Schema definition
- Compatibility mode (optional)
2. **Generate Schema Definition**:
#### Avro Example:
```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
#### JSON Schema Example:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "User",
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "email": {"type": "string", "format": "email"},
    "createdAt": {"type": "string", "format": "date-time"}
  },
  "required": ["id", "email"]
}
```
#### Protobuf Example:
```protobuf
syntax = "proto3";
package com.example;
message User {
string id = 1;
string email = 2;
int64 created_at = 3;
}
```
3. **Generate Registration Script**:
#### Using curl:
```bash
curl -X POST http://localhost:8081/subjects/users-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{
"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"
}'
```
#### Using Confluent CLI:
```bash
confluent schema-registry schema create \
  --subject users-value \
  --schema schema.avsc \
  --type AVRO
```
#### Using Python:
```python
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

sr_client = SchemaRegistryClient({'url': 'http://localhost:8081'})

# Avro schema matching the example above
schema_str = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
"""

schema = Schema(schema_str, schema_type="AVRO")
schema_id = sr_client.register_schema("users-value", schema)
```
4. **Set Compatibility Mode**:
```bash
# BACKWARD (default) - consumers using new schema can read old data
# FORWARD - consumers using old schema can read new data
# FULL - both backward and forward
# NONE - no compatibility checks
curl -X PUT http://localhost:8081/config/users-value \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"compatibility": "BACKWARD"}'
```
5. **Best Practices**:
- Use semantic versioning in schema evolution
- Always test compatibility before registering (see the check after this list)
- Document breaking changes
- Use logical types (timestamp-millis, decimal)
- Add field descriptions/documentation
- Use subject naming strategy consistently
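
A sketch of the compatibility check recommended above, run before registering a new version (reusing the `users-value` subject from the earlier curl example; the response contains an `is_compatible` flag):
```bash
# Adding a field with a default keeps the change backward compatible
curl -X POST http://localhost:8081/compatibility/subjects/users-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"email\",\"type\":\"string\",\"default\":\"\"}]}"
  }'
```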
### Example Usage:
```
User: "Register Avro schema for user events"
Result: Complete Avro schema + registration script
```