Initial commit

commands/connector-deploy.md (new file, 154 lines)

# Kafka Connect Connector Deployment

Deploy and manage Kafka Connect connectors (Source/Sink).

## Task

You are an expert in Kafka Connect. Help users deploy source and sink connectors.

### Steps:

1. **Ask for Requirements**:
   - Connector type: Source or Sink
   - Connector class (JDBC, S3, Elasticsearch, etc.)
   - Connection details
   - Topic configuration

2. **Generate Connector Configuration**:

#### JDBC Source Connector (PostgreSQL):
```json
{
  "name": "postgres-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "postgres",
    "connection.password": "${file:/secrets.properties:db-password}",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "postgres-",
    "table.whitelist": "users,orders",
    "poll.interval.ms": "5000",
    "batch.max.rows": "1000",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
```
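
Before creating a connector, the worker can check a candidate configuration through its `PUT /connector-plugins/<class>/config/validate` endpoint, which reports per-field errors without deploying anything. A minimal sketch, assuming the Connect worker runs on `localhost:8083` and `jq` is available:

```bash
# Validate the JDBC source config; error_count should be 0 before deploying.
# The body is the flat config map (including connector.class),
# not the {"name": ..., "config": ...} wrapper used when creating the connector.
curl -s -X PUT \
  http://localhost:8083/connector-plugins/io.confluent.connect.jdbc.JdbcSourceConnector/config/validate \
  -H "Content-Type: application/json" \
  -d '{
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "postgres",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "postgres-"
  }' | jq '.error_count'
```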

#### Elasticsearch Sink Connector:
```json
{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "users,orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true",
    "behavior.on.null.values": "delete",
    "behavior.on.malformed.documents": "warn",
    "max.buffered.records": "20000",
    "batch.size": "2000",
    "linger.ms": "1000",
    "max.in.flight.requests": "5",
    "retry.backoff.ms": "100",
    "max.retries": "10"
  }
}
```

#### S3 Sink Connector:
```json
{
  "name": "s3-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics": "events",
    "s3.bucket.name": "my-kafka-bucket",
    "s3.region": "us-east-1",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.interval.ms": "3600000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC"
  }
}
```

3. **Generate Deployment Scripts**:

#### Using REST API:
```bash
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @connector-config.json
```
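
To update a connector that already exists (or to create it idempotently), the worker also exposes `PUT /connectors/<name>/config`. A minimal sketch, assuming the flat config map lives in a separate file (hypothetically named `postgres-source-config.json`, without the `name`/`config` wrapper):

```bash
# Creates the connector if it does not exist, otherwise applies the new config in place.
curl -s -X PUT http://localhost:8083/connectors/postgres-source-connector/config \
  -H "Content-Type: application/json" \
  -d @postgres-source-config.json
```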

#### Using Confluent CLI:
```bash
confluent connect create \
  --config connector-config.json
```

#### Check Status:
```bash
curl http://localhost:8083/connectors/postgres-source-connector/status

# Expected response:
{
  "name": "postgres-source-connector",
  "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
  "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}]
}
```
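
For scripted deployments it is often useful to wait until the connector and all of its tasks report `RUNNING`. A minimal polling sketch, assuming `jq` is installed and the connector name from the example above:

```bash
# Poll the status endpoint until connector and tasks are RUNNING (or give up after ~2.5 minutes).
STATUS_URL="http://localhost:8083/connectors/postgres-source-connector/status"
for attempt in $(seq 1 30); do
  not_running=$(curl -s "$STATUS_URL" \
    | jq '[.connector.state, .tasks[].state] | map(select(. != "RUNNING")) | length')
  if [ "$not_running" = "0" ]; then
    echo "Connector is healthy"
    break
  fi
  echo "Attempt $attempt: still waiting..."
  sleep 5
done
```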

4. **Generate Monitoring Queries**:
```bash
# List all connectors
curl http://localhost:8083/connectors

# Get connector config
curl http://localhost:8083/connectors/postgres-source-connector/config

# Get connector status
curl http://localhost:8083/connectors/postgres-source-connector/status

# Restart connector
curl -X POST http://localhost:8083/connectors/postgres-source-connector/restart

# Pause connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/pause

# Resume connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/resume
```
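
When a task fails (for example after a transient database outage), individual tasks can be restarted without recreating the connector. A minimal sketch; the `includeTasks`/`onlyFailed` query parameters require a newer Connect worker (Kafka 3.0+), while the per-task endpoint also works on older versions:

```bash
# Restart only the failed connector/task instances (Kafka 3.0+ workers)
curl -X POST "http://localhost:8083/connectors/postgres-source-connector/restart?includeTasks=true&onlyFailed=true"

# Restart a specific task by id on older workers
curl -X POST http://localhost:8083/connectors/postgres-source-connector/tasks/0/restart
```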

5. **Best Practices**:
   - Use secret management for credentials
   - Configure appropriate error handling and retries
   - Set up monitoring and alerting
   - Use Single Message Transforms (SMTs) for lightweight, per-record transformations
   - Configure dead letter queues for sink connectors (see the sketch after this list)
   - Set appropriate batch sizes and flush intervals
   - Use time-based partitioning for sinks
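
For the dead-letter-queue item above: record-level error handling and DLQ routing are configured per sink connector through the `errors.*` properties. A minimal sketch that adds a DLQ to the Elasticsearch sink from step 2; the DLQ topic name is just an example, and the replication factor of 1 assumes a single-broker development cluster:

```bash
# Tolerate bad records and route them (with error context headers) to a DLQ topic
# instead of failing the task.
curl -s -X PUT http://localhost:8083/connectors/elasticsearch-sink-connector/config \
  -H "Content-Type: application/json" \
  -d '{
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "users,orders",
    "connection.url": "http://elasticsearch:9200",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dlq-elasticsearch-sink",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true"
  }'
```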

### Example Usage:

```
User: "Deploy PostgreSQL source connector for users table"
Result: Complete connector config + deployment scripts
```

commands/ksqldb-query.md (new file, 179 lines)

# ksqlDB Query Generator

Generate ksqlDB queries for stream processing.

## Task

You are a ksqlDB expert. Generate ksqlDB queries for stream processing, aggregations, and joins.

### Steps:

1. **Ask for Requirements**:
   - Query type: Stream, Table, Join, Aggregation
   - Source topics/streams
   - Output requirements

2. **Generate ksqlDB Statements**:

#### Create Stream from Topic:
```sql
CREATE STREAM users_stream (
  id VARCHAR KEY,
  email VARCHAR,
  name VARCHAR,
  region VARCHAR,
  created_at BIGINT
) WITH (
  KAFKA_TOPIC='users',
  VALUE_FORMAT='JSON',
  TIMESTAMP='created_at'
);
```
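
Statements like this can be submitted programmatically through the ksqlDB REST API (`POST /ksql`) as well as through the ksql CLI. A minimal sketch, assuming the server listens on `localhost:8088`:

```bash
# Submit the DDL statement over HTTP; streamsProperties sets per-request query properties.
curl -s -X POST http://localhost:8088/ksql \
  -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
  -d @- <<'EOF'
{
  "ksql": "CREATE STREAM users_stream (id VARCHAR KEY, email VARCHAR, name VARCHAR, region VARCHAR, created_at BIGINT) WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON', TIMESTAMP='created_at');",
  "streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}
}
EOF
```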

#### Create Table (Materialized View):
```sql
CREATE TABLE user_counts AS
SELECT
  region,
  COUNT(*) AS user_count,
  COLLECT_LIST(name) AS user_names
FROM users_stream
GROUP BY region
EMIT CHANGES;
```

#### Stream-Stream Join:
```sql
CREATE STREAM orders_enriched AS
SELECT
  o.order_id,
  o.product_id,
  o.quantity,
  o.price,
  u.name AS customer_name,
  u.email AS customer_email,
  o.timestamp
FROM orders_stream o
INNER JOIN users_stream u
  WITHIN 1 HOUR
  ON o.user_id = u.id
EMIT CHANGES;
```

#### Windowed Aggregation:
```sql
-- Tumbling Window (5-minute windows)
CREATE TABLE sales_by_category_5min AS
SELECT
  category,
  WINDOWSTART AS window_start,
  WINDOWEND AS window_end,
  COUNT(*) AS order_count,
  SUM(amount) AS total_sales,
  AVG(amount) AS avg_sale,
  MAX(amount) AS max_sale
FROM orders_stream
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY category
EMIT CHANGES;

-- Hopping Window (5-min window, 1-min advance)
CREATE TABLE sales_hopping AS
SELECT
  category,
  WINDOWSTART AS window_start,
  COUNT(*) AS order_count
FROM orders_stream
WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE)
GROUP BY category
EMIT CHANGES;

-- Session Window (inactivity gap = 30 minutes)
CREATE TABLE user_sessions AS
SELECT
  user_id,
  WINDOWSTART AS session_start,
  WINDOWEND AS session_end,
  COUNT(*) AS event_count
FROM user_events_stream
WINDOW SESSION (30 MINUTES)
GROUP BY user_id
EMIT CHANGES;
```

#### Filtering and Transformation:
```sql
CREATE STREAM high_value_orders AS
SELECT
  order_id,
  user_id,
  amount,
  UCASE(status) AS status,
  CASE
    WHEN amount > 1000 THEN 'PREMIUM'
    WHEN amount > 500 THEN 'STANDARD'
    ELSE 'BASIC'
  END AS tier
FROM orders_stream
WHERE amount > 100
EMIT CHANGES;
```

#### Array and Map Operations:
```sql
CREATE STREAM processed_events AS
SELECT
  id,
  ARRAY_CONTAINS(tags, 'premium') AS is_premium,
  ARRAY_LENGTH(items) AS item_count,
  MAP_KEYS(metadata) AS meta_keys,
  metadata['source'] AS source
FROM events_stream
EMIT CHANGES;
```

3. **Generate Queries**:

```sql
-- Push query (continuous)
SELECT * FROM users_stream
WHERE region = 'US'
EMIT CHANGES;

-- Pull query (one-time, requires table)
SELECT * FROM user_counts
WHERE region = 'US';
```
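
Queries can also be issued over HTTP against the ksqlDB `/query` endpoint; the response is streamed back as JSON rows. A minimal sketch for the pull query above, assuming the server listens on `localhost:8088`:

```bash
# One-time pull query against the materialized user_counts table.
curl -s -X POST http://localhost:8088/query \
  -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
  -d @- <<'EOF'
{
  "ksql": "SELECT * FROM user_counts WHERE region = 'US';",
  "streamsProperties": {}
}
EOF
```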

4. **Generate Monitoring Commands**:

```sql
-- Show streams
SHOW STREAMS;

-- Describe stream
DESCRIBE users_stream;

-- Show queries
SHOW QUERIES;

-- Explain query
EXPLAIN query_id;

-- Terminate query
TERMINATE query_id;
```
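
The same checks are available over the REST API, which is handy for dashboards and health probes. A minimal sketch, assuming the default listener on `localhost:8088`:

```bash
# Server version and cluster information
curl -s http://localhost:8088/info

# Per-component health
curl -s http://localhost:8088/healthcheck

# Admin statements such as SHOW QUERIES; go through the /ksql endpoint
curl -s -X POST http://localhost:8088/ksql \
  -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
  -d '{"ksql": "SHOW QUERIES;", "streamsProperties": {}}'
```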

5. **Best Practices**:
   - Use appropriate window types for aggregations
   - Set RETENTION for stateful operations
   - Use pull queries for point-in-time lookups
   - Configure partitioning for joins
   - Add error handling for UDFs
   - Monitor query performance

### Example Usage:

```
User: "Calculate hourly sales by category"
Result: Complete ksqlDB window aggregation query
```

commands/schema-register.md (new file, 123 lines)

# Schema Registry Management

Manage Avro/JSON/Protobuf schemas in Confluent Schema Registry.

## Task

You are an expert in Confluent Schema Registry. Help users register, update, and manage schemas.

### Steps:

1. **Ask for Required Information**:
   - Schema format: Avro, JSON Schema, or Protobuf
   - Subject name (topic name or custom subject)
   - Schema definition
   - Compatibility mode (optional)

2. **Generate Schema Definition**:

#### Avro Example:
```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

#### JSON Schema Example:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "User",
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "email": {"type": "string", "format": "email"},
    "createdAt": {"type": "string", "format": "date-time"}
  },
  "required": ["id", "email"]
}
```

#### Protobuf Example:
```protobuf
syntax = "proto3";

package com.example;

message User {
  string id = 1;
  string email = 2;
  int64 created_at = 3;
}
```
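
The JSON Schema and Protobuf definitions above are registered through the same versions endpoint shown in the next step; a `schemaType` field selects the format (it defaults to AVRO). A minimal sketch, using a hypothetical `users-json-value` subject:

```bash
# Register the JSON Schema variant; schemaType can be AVRO, JSON, or PROTOBUF.
curl -s -X POST http://localhost:8081/subjects/users-json-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"$schema\":\"http://json-schema.org/draft-07/schema#\",\"title\":\"User\",\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"string\"}},\"required\":[\"id\"]}"
  }'
```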

3. **Generate Registration Script**:

#### Using curl:
```bash
curl -X POST http://localhost:8081/subjects/users-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"
  }'
```
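
After registering, the subject and its versions can be inspected through the same REST API to confirm what was stored. A minimal sketch, assuming Schema Registry on `localhost:8081`:

```bash
# List all subjects
curl -s http://localhost:8081/subjects

# List versions for the subject, then fetch the latest one
curl -s http://localhost:8081/subjects/users-value/versions
curl -s http://localhost:8081/subjects/users-value/versions/latest

# Fetch a schema directly by its global id (1 is just an example id)
curl -s http://localhost:8081/schemas/ids/1
```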

#### Using Confluent CLI:
```bash
confluent schema-registry schema create \
  --subject users-value \
  --schema schema.avsc \
  --type AVRO
```

#### Using Python:
```python
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

sr_client = SchemaRegistryClient({'url': 'http://localhost:8081'})

schema_str = """
{
  "type": "record",
  "name": "User",
  "fields": [...]
}
"""

schema = Schema(schema_str, schema_type="AVRO")
schema_id = sr_client.register_schema("users-value", schema)
```

4. **Set Compatibility Mode**:
```bash
# BACKWARD (default) - consumers using new schema can read old data
# FORWARD - consumers using old schema can read new data
# FULL - both backward and forward
# NONE - no compatibility checks

curl -X PUT http://localhost:8081/config/users-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}'
```
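
To follow the "test compatibility before registering" practice below, Schema Registry exposes a compatibility-check endpoint that evaluates a candidate schema against an existing version without registering it. A minimal sketch, adding an optional `email` field to the schema registered earlier:

```bash
# Returns {"is_compatible": true|false} for the candidate schema vs. the latest version
curl -s -X POST http://localhost:8081/compatibility/subjects/users-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}"
  }'
```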

5. **Best Practices**:
   - Use semantic versioning in schema evolution
   - Always test compatibility before registering
   - Document breaking changes
   - Use logical types (timestamp-millis, decimal)
   - Add field descriptions/documentation
   - Use subject naming strategy consistently
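
Subjects and versions can also be cleaned up through the same REST API as part of schema lifecycle management; note that a permanent (hard) delete is only allowed after a soft delete. A minimal sketch:

```bash
# Soft-delete a single version of a subject
curl -s -X DELETE http://localhost:8081/subjects/users-value/versions/1

# Soft-delete the whole subject
curl -s -X DELETE http://localhost:8081/subjects/users-value

# Permanently delete the subject (only valid after the soft delete above)
curl -s -X DELETE "http://localhost:8081/subjects/users-value?permanent=true"
```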

### Example Usage:

```
User: "Register Avro schema for user events"
Result: Complete Avro schema + registration script
```