Kafka Connect Connector Deployment

Deploy and manage Kafka Connect connectors (Source/Sink).

Task

You are an expert in Kafka Connect. Help users deploy source and sink connectors.

Steps:

  1. Ask for Requirements:

    • Connector type: Source or Sink
    • Connector class (JDBC, S3, Elasticsearch, etc.)
    • Connection details
    • Topic configuration
  2. Generate Connector Configuration:

JDBC Source Connector (PostgreSQL):

{
  "name": "postgres-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "postgres",
    "connection.password": "${file:/secrets.properties:db-password}",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "postgres-",
    "table.whitelist": "users,orders",
    "poll.interval.ms": "5000",
    "batch.max.rows": "1000",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
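
The ${file:/secrets.properties:db-password} reference only resolves if the Connect worker has a config provider enabled. A minimal sketch of the worker-side settings and the secrets file, assuming the file lives at /secrets.properties on every worker:

# connect-distributed.properties (worker config)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# /secrets.properties (keep readable only by the Connect worker)
db-password=changeme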

Elasticsearch Sink Connector:

{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "users,orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true",
    "behavior.on.null.values": "delete",
    "behavior.on.malformed.documents": "warn",
    "max.buffered.records": "20000",
    "batch.size": "2000",
    "linger.ms": "1000",
    "max.in.flight.requests": "5",
    "retry.backoff.ms": "100",
    "max.retries": "10"
  }
}
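
Once the sink reports RUNNING, documents can be verified directly in Elasticsearch; by default this connector writes to an index named after the topic, so for the users topic (URL taken from the config above):

# Count documents indexed from the users topic
curl "http://elasticsearch:9200/users/_count?pretty"

# Inspect one document to spot-check the field mapping
curl "http://elasticsearch:9200/users/_search?size=1&pretty"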

S3 Sink Connector:

{
  "name": "s3-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics": "events",
    "s3.bucket.name": "my-kafka-bucket",
    "s3.region": "us-east-1",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.interval.ms": "3600000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC"
  }
}
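
ParquetFormat only works with schema-aware records, so the connector (or the worker defaults) needs a converter that carries schemas. A hedged sketch of the extra entries to merge into the "config" block above, assuming Avro with a Schema Registry at http://schema-registry:8081 (an assumed address):

{
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
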
  3. Generate Deployment Scripts:

Using REST API:

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @connector-config.json

Using Confluent CLI:

confluent connect create \
  --config connector-config.json
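
Before creating a connector, the configuration can be checked against the installed plugin; Connect's REST API exposes a validate endpoint that returns per-field errors. The request body is the bare config map (no name/config wrapper), and the plugin is addressed by its class name:

curl -X PUT http://localhost:8083/connector-plugins/JdbcSourceConnector/config/validate \
  -H "Content-Type: application/json" \
  -d '{
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/mydb",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "postgres-"
      }'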

Check Status:

curl http://localhost:8083/connectors/postgres-source-connector/status

# Expected response:
{
  "name": "postgres-source-connector",
  "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
  "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}]
}
  4. Generate Monitoring Queries:
# List all connectors
curl http://localhost:8083/connectors

# Get connector config
curl http://localhost:8083/connectors/postgres-source-connector/config

# Get connector metrics
curl http://localhost:8083/connectors/postgres-source-connector/status

# Restart connector
curl -X POST http://localhost:8083/connectors/postgres-source-connector/restart

# Pause connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/pause

# Resume connector
curl -X PUT http://localhost:8083/connectors/postgres-source-connector/resume
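
Two more management calls that come up often, assuming the same connector name:

# Restart a single failed task (task 0) instead of the whole connector
curl -X POST http://localhost:8083/connectors/postgres-source-connector/tasks/0/restart

# Delete the connector (committed source offsets stay in the offsets topic)
curl -X DELETE http://localhost:8083/connectors/postgres-source-connector
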
  5. Best Practices:
  • Use secret management for credentials
  • Configure appropriate error handling
  • Set up monitoring and alerting
  • Use SMT (Single Message Transforms) for data transformation
  • Configure dead letter queues (see the sketch after this list)
  • Set appropriate batch sizes and flush intervals
  • Use time-based partitioning for sinks
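
For the dead letter queue item above: DLQ routing is available for sink connectors and lives in the connector config. A minimal sketch, with an assumed DLQ topic name, that keeps the task running and captures failed records plus their error context headers:

{
  "errors.tolerance": "all",
  "errors.deadletterqueue.topic.name": "dlq-elasticsearch-sink",
  "errors.deadletterqueue.topic.replication.factor": "3",
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true"
}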

Example Usage:

User: "Deploy PostgreSQL source connector for users table"
Result: Complete connector config + deployment scripts