Initial commit

15  .claude-plugin/plugin.json  Normal file

@@ -0,0 +1,15 @@
{
  "name": "specweave-kafka-streams",
  "description": "Kafka Streams library integration for SpecWeave - Stream processing with Java/Kotlin, topology patterns, state stores, windowing, joins, and testing frameworks",
  "version": "0.24.0",
  "author": {
    "name": "SpecWeave Team",
    "url": "https://spec-weave.com"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}

3  README.md  Normal file

@@ -0,0 +1,3 @@
# specweave-kafka-streams

Kafka Streams library integration for SpecWeave - Stream processing with Java/Kotlin, topology patterns, state stores, windowing, joins, and testing frameworks

437  commands/topology.md  Normal file

@@ -0,0 +1,437 @@
---
name: specweave-kafka-streams:topology
description: Generate Kafka Streams topology code (Java/Kotlin) with KStream/KTable patterns. Creates stream processing applications with windowing, joins, state stores, and exactly-once semantics.
---

# Generate Kafka Streams Topology

Create production-ready Kafka Streams applications with best practices baked in.

## What This Command Does

1. **Select Pattern**: Choose a topology pattern (word count, enrichment, aggregation, etc.)
2. **Configure Topics**: Input/output topics and schemas
3. **Define Operations**: Filter, map, join, aggregate, window
4. **Generate Code**: Java or Kotlin implementation
5. **Add Tests**: Topology Test Driver unit tests
6. **Build Configuration**: Gradle/Maven, dependencies, configs

## Available Patterns

### Pattern 1: Stream Processing (Filter + Transform)
**Use Case**: Data cleansing and transformation

**Topology**:
```java
KStream<String, Event> events = builder.stream("raw-events");

KStream<String, ProcessedEvent> processed = events
    .filter((key, value) -> value.isValid())
    .mapValues(value -> ProcessedEvent.from(value)) // ProcessedEvent.from is a hypothetical converter
    .selectKey((key, value) -> value.getUserId());

processed.to("processed-events");
```

### Pattern 2: Stream-Table Join (Enrichment)
**Use Case**: Enrich events with reference data

**Topology**:
```java
// Users table (changelog stream)
KTable<Long, User> users = builder.table("users");

// Click events
KStream<Long, ClickEvent> clicks = builder.stream("clicks");

// Enrich clicks with user data
KStream<Long, EnrichedClick> enriched = clicks.leftJoin(
    users,
    (click, user) -> new EnrichedClick(
        click.getPage(),
        user != null ? user.getName() : "unknown",
        click.getTimestamp()
    )
);

enriched.to("enriched-clicks");
```

### Pattern 3: Windowed Aggregation
**Use Case**: Time-based metrics (counts, sums, averages)

**Topology**:
```java
KTable<Windowed<String>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count(Materialized.as("event-counts"));

counts.toStream()
    .map((windowedKey, count) -> {
        String key = windowedKey.key();
        Instant start = windowedKey.window().startTime();
        return KeyValue.pair(key, new WindowedCount(key, start, count));
    })
    .to("event-counts-output");
```

### Pattern 4: Stateful Deduplication
**Use Case**: Remove duplicate events within a time window

**Topology**:
```java
// "dedup-store" must be registered on the builder first (see sketch below)
KStream<String, Event> deduplicated = events
    .transformValues(
        () -> new DeduplicationTransformer(Duration.ofMinutes(10)),
        "dedup-store" // transformValues takes state store names, not Materialized
    );

deduplicated.to("unique-events");
```

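`DeduplicationTransformer` is not generated inline here; a minimal sketch of what it might look like, assuming a persistent key-value store registered on the builder as `dedup-store` that maps keys to last-seen timestamps (`Event` stands in for your record type, and duplicates come out as `null`, to be filtered downstream):

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch: drops events whose key was already seen within the window
public class DeduplicationTransformer implements ValueTransformerWithKey<String, Event, Event> {
    private final long windowMs;
    private KeyValueStore<String, Long> store;

    public DeduplicationTransformer(Duration window) {
        this.windowMs = window.toMillis();
    }

    @Override
    public void init(ProcessorContext context) {
        this.store = context.getStateStore("dedup-store");
    }

    @Override
    public Event transform(String key, Event value) {
        Long lastSeen = store.get(key);
        long now = System.currentTimeMillis();
        if (lastSeen != null && now - lastSeen < windowMs) {
            return null; // duplicate within the window
        }
        store.put(key, now);
        return value;
    }

    @Override
    public void close() {}
}
```

A downstream `.filter((key, value) -> value != null)` then removes the dropped records, mirroring the inline variant shown in the skill documentation.
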
## Example Usage

```bash
# Generate topology
/specweave-kafka-streams:topology

# I'll ask:
# 1. Language? (Java or Kotlin)
# 2. Pattern? (Filter/Transform, Join, Aggregation, Deduplication)
# 3. Input topic(s)?
# 4. Output topic(s)?
# 5. Windowing? (if aggregation)
# 6. State store? (if stateful)
# 7. Build tool? (Gradle or Maven)

# Then I'll generate:
# - src/main/java/MyApp.java (application code)
# - src/test/java/MyAppTest.java (unit tests)
# - build.gradle or pom.xml
# - application.properties
# - README.md with setup instructions
```

## Generated Files

**1. StreamsApplication.java**: Main topology
```java
package com.example.streams;

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

public class StreamsApplication {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
                  StreamsConfig.EXACTLY_ONCE_V2);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Topology: uppercase every non-null value
        KStream<String, String> input = builder.stream("input-topic");
        KStream<String, String> processed = input
            .filter((key, value) -> value != null)
            .mapValues(value -> value.toUpperCase());
        processed.to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

**2. StreamsApplicationTest.java**: Unit tests with Topology Test Driver
```java
package com.example.streams;

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.junit.jupiter.api.*;

public class StreamsApplicationTest {
    private TopologyTestDriver testDriver;
    private TestInputTopic<String, String> inputTopic;
    private TestOutputTopic<String, String> outputTopic;

    @BeforeEach
    public void setup() {
        StreamsBuilder builder = new StreamsBuilder();
        // Build the topology under test here
        // ...

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

        testDriver = new TopologyTestDriver(builder.build(), props);
        inputTopic = testDriver.createInputTopic("input-topic",
            Serdes.String().serializer(),
            Serdes.String().serializer());
        outputTopic = testDriver.createOutputTopic("output-topic",
            Serdes.String().deserializer(),
            Serdes.String().deserializer());
    }

    @Test
    public void testTransformation() {
        // Send test data
        inputTopic.pipeInput("key1", "hello");

        // Assert output
        KeyValue<String, String> output = outputTopic.readKeyValue();
        assertEquals("HELLO", output.value);
    }

    @AfterEach
    public void tearDown() {
        testDriver.close();
    }
}
```

**3. build.gradle**: Gradle build configuration
```groovy
plugins {
    id 'java'
    id 'application'
}

group = 'com.example'
version = '1.0.0'

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.apache.kafka:kafka-streams:3.6.0'
    implementation 'org.slf4j:slf4j-simple:2.0.9'

    testImplementation 'org.apache.kafka:kafka-streams-test-utils:3.6.0'
    testImplementation 'org.junit.jupiter:junit-jupiter:5.10.0'
}

application {
    mainClass = 'com.example.streams.StreamsApplication'
}

test {
    useJUnitPlatform()
}
```

**4. application.properties**: Runtime configuration
```properties
bootstrap.servers=localhost:9092
application.id=my-streams-app
processing.guarantee=exactly_once_v2
commit.interval.ms=100
cache.max.bytes.buffering=10485760
num.stream.threads=2
replication.factor=3
```

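The sample `main()` above hardcodes its configuration; a minimal sketch of how the generated app might load this file instead, assuming it sits in the working directory:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch: load runtime configuration from the generated properties file
Properties props = new Properties();
try (InputStream in = Files.newInputStream(Path.of("application.properties"))) {
    props.load(in);
}
```
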
**5. README.md**: Setup instructions
````markdown
# Kafka Streams Application

## Build

```bash
# Gradle
./gradlew build

# Maven
mvn clean package
```

## Run

```bash
# Gradle
./gradlew run

# Maven
mvn exec:java
```

## Test

```bash
# Unit tests
./gradlew test

# Integration tests (requires Kafka cluster)
./gradlew integrationTest
```

## Docker

```bash
# Build image
docker build -t my-streams-app .

# Run
docker run -e BOOTSTRAP_SERVERS=kafka:9092 my-streams-app
```
````

## Configuration Options

### Exactly-Once Semantics (EOS v2)
```java
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
          StreamsConfig.EXACTLY_ONCE_V2);
```

### Multiple Stream Threads
```java
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
```

### State Store Configuration
```java
StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("my-store"),
        Serdes.String(),
        Serdes.Long()
    )
    .withCachingEnabled()
    .withLoggingEnabled(Map.of("retention.ms", "86400000"));
```

### Custom Serdes
```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

// JSON Serde (using Jackson)
public class JsonSerde<T> implements Serde<T> {
    private final ObjectMapper mapper = new ObjectMapper();
    private final Class<T> type;

    public JsonSerde(Class<T> type) {
        this.type = type;
    }

    @Override
    public Serializer<T> serializer() {
        return (topic, data) -> {
            try {
                return mapper.writeValueAsBytes(data);
            } catch (Exception e) {
                throw new SerializationException(e);
            }
        };
    }

    @Override
    public Deserializer<T> deserializer() {
        return (topic, data) -> {
            try {
                return mapper.readValue(data, type);
            } catch (Exception e) {
                throw new SerializationException(e);
            }
        };
    }
}
```

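Wiring the serde in is then a matter of passing it wherever the topology reads or writes that type; a short usage sketch (`User` is a stand-in for your payload class):

```java
// Use the JSON serde explicitly when consuming and producing
Serde<User> userSerde = new JsonSerde<>(User.class);

KStream<String, User> users = builder.stream(
    "users",
    Consumed.with(Serdes.String(), userSerde)
);
users.to("users-out", Produced.with(Serdes.String(), userSerde));
```
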
## Testing Strategies

### 1. Unit Tests (Topology Test Driver)
```java
// No Kafka cluster required
TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);
```

### 2. Integration Tests (Embedded Kafka)
```java
// Illustrative only: EmbeddedKafkaExtension is a placeholder for whatever
// embedded-broker JUnit 5 extension your project uses
@ExtendWith(EmbeddedKafkaExtension.class)
public class IntegrationTest {
    @Test
    public void testWithRealKafka(EmbeddedKafka kafka) {
        // Runs against a real (embedded) Kafka cluster
    }
}
```

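If no embedded-broker extension is at hand, Testcontainers is one concrete alternative; a minimal sketch, assuming `org.testcontainers:kafka` on the test classpath and Docker available locally:

```java
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

// Spin up a disposable single-broker cluster for the test
KafkaContainer kafka =
    new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.5.0"));
kafka.start();

// Point the application under test at it
String bootstrapServers = kafka.getBootstrapServers();
```
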
### 3. Performance Tests (Load Testing)
```bash
# Generate test load
kafka-producer-perf-test.sh \
  --topic input-topic \
  --num-records 1000000 \
  --throughput 10000 \
  --record-size 1024 \
  --producer-props bootstrap.servers=localhost:9092
```

## Monitoring

### JMX Metrics
```java
// Record finer-grained metrics (JMX reporting itself is on by default)
props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");

// Plug in an external metrics reporter (e.g. Confluent's)
props.put("metric.reporters",
          "io.confluent.metrics.reporter.ConfluentMetricsReporter");
```

### Key Metrics to Monitor
- **Consumer Lag**: `kafka.consumer.fetch.manager.records.lag.max`
- **Processing Rate**: `kafka.streams.stream.task.process.rate`
- **State Store Size**: `kafka.streams.state.store.bytes.total`
- **Rebalance Frequency**: `kafka.streams.consumer.coordinator.rebalance.total`

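These can also be read in-process rather than over JMX; a minimal sketch using `KafkaStreams#metrics()` (the dotted names above are illustrative, so check `MetricName#group()`/`name()` for the exact identifiers your version exposes):

```java
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

// Dump all current metric values from a running KafkaStreams instance
Map<MetricName, ? extends Metric> metrics = streams.metrics();
metrics.forEach((name, metric) ->
    System.out.printf("%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
```
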
## Troubleshooting

### Issue 1: Rebalancing Too Frequently
**Solution**: Increase the consumer session timeout
```java
// session.timeout.ms is a consumer config (org.apache.kafka.clients.consumer.ConsumerConfig);
// pass it through with the consumer prefix
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 30000);
```

### Issue 2: State Store Too Large
**Solution**: Enable compaction, reduce retention
```java
storeBuilder.withLoggingEnabled(Map.of(
    "cleanup.policy", "compact",
    "retention.ms", "86400000"
));
```

### Issue 3: Slow Processing
**Solution**: Increase parallelism
```java
// More threads
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 8);
```

```bash
# More partitions (requires topic reconfiguration)
kafka-topics.sh --alter --topic input-topic --partitions 8
```

## Related Commands

- `/specweave-kafka:dev-env` - Set up local Kafka cluster for testing
- `/specweave-kafka:monitor-setup` - Configure Prometheus + Grafana monitoring

## Documentation

- **Kafka Streams Docs**: https://kafka.apache.org/documentation/streams/
- **Topology Patterns**: `.specweave/docs/public/guides/kafka-streams-patterns.md`
- **State Stores**: `.specweave/docs/public/guides/kafka-streams-state.md`
- **Testing Guide**: `.specweave/docs/public/guides/kafka-streams-testing.md`

---

**Plugin**: specweave-kafka-streams
**Version**: 1.0.0
**Status**: ✅ Production Ready

49  plugin.lock.json  Normal file

@@ -0,0 +1,49 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:anton-abyzov/specweave:plugins/specweave-kafka-streams",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "b3ba663ad1284d387f801e7e93fdb9805de2e390",
    "treeHash": "e1f429a5d1e8be3446af8477174213bfdd8056566bf41a72b69bc7ca208e38d2",
    "generatedAt": "2025-11-28T10:13:51.925634Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "specweave-kafka-streams",
    "description": "Kafka Streams library integration for SpecWeave - Stream processing with Java/Kotlin, topology patterns, state stores, windowing, joins, and testing frameworks",
    "version": "0.24.0"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "1904889d8eb5971d149f64cbffcde444867365525d75698bc275aebd79910b93"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "13c93052727765230b15347ee3207e16fd395a6e9af6adcd4dc0aa52cfd49d7d"
      },
      {
        "path": "commands/topology.md",
        "sha256": "78e2b7ea8aea466b55647bae98f96be4b22a1b3fdf2d95a8595c5d4bc1fc33c8"
      },
      {
        "path": "skills/kafka-streams-topology/SKILL.md",
        "sha256": "b9db3de04661dcfff703fc611a668e11420919fb9624581fb9a6c9ff0bd3193b"
      }
    ],
    "dirSha256": "e1f429a5d1e8be3446af8477174213bfdd8056566bf41a72b69bc7ca208e38d2"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}

539  skills/kafka-streams-topology/SKILL.md  Normal file

@@ -0,0 +1,539 @@
---
name: kafka-streams-topology
description: Kafka Streams topology design expert. Covers KStream vs KTable vs GlobalKTable, topology patterns, stream operations (filter, map, flatMap, branch), joins, windowing strategies, and exactly-once semantics. Activates for kafka streams topology, kstream, ktable, globalktable, stream operations, stream joins, windowing, exactly-once, topology design.
---

# Kafka Streams Topology Skill

Expert knowledge of the Kafka Streams library for building stream processing topologies in Java/Kotlin.

## What I Know

### Core Abstractions

**KStream** (Event Stream - Unbounded, Append-Only):
- Represents immutable event sequences
- Each record is an independent event
- Use for: Clickstreams, transactions, sensor readings

**KTable** (Changelog Stream - Latest State by Key):
- Represents mutable state (compacted topic)
- Updates override previous values (by key)
- Use for: User profiles, product catalog, account balances

**GlobalKTable** (Replicated Table - Available on All Instances):
- Full table replicated to every stream instance
- No partitioning (broadcast)
- Use for: Reference data (countries, products), lookups

**Key Differences**:
```java
// KStream: Every event is independent
KStream<Long, Click> clicks = builder.stream("clicks");
clicks.foreach((key, value) -> {
    System.out.println(value); // Prints every click event
});

// KTable: Latest value wins (by key)
KTable<Long, User> users = builder.table("users");
users.toStream().foreach((key, value) -> {
    System.out.println(value); // Prints only current user state
});

// GlobalKTable: Replicated to all instances (no partitioning)
GlobalKTable<Long, Product> products = builder.globalTable("products");
// Available for lookups on any instance (no repartitioning needed)
```

## When to Use This Skill

Activate me when you need help with:
- Topology design ("How to design a Kafka Streams topology?")
- KStream vs KTable ("When to use KStream vs KTable?")
- Stream operations ("Filter and transform events")
- Joins ("Join a KStream with a KTable")
- Windowing ("Tumbling vs hopping vs session windows")
- Exactly-once semantics ("Enable EOS")
- Topology optimization ("Optimize stream processing")

## Common Patterns

### Pattern 1: Filter and Transform

**Use Case**: Clean and enrich events

```java
StreamsBuilder builder = new StreamsBuilder();

// Input stream
KStream<Long, ClickEvent> clicks = builder.stream("clicks");

// Filter out bot clicks
KStream<Long, ClickEvent> humanClicks = clicks
    .filter((key, value) -> !value.isBot());

// Transform: Extract page from URL
KStream<Long, String> pages = humanClicks
    .mapValues(click -> extractPage(click.getUrl()));

// Write to output topic
pages.to("pages");
```

### Pattern 2: Branch by Condition

**Use Case**: Route events to different paths

```java
Map<String, KStream<Long, Order>> branches = orders
    .split(Named.as("order-"))
    .branch((key, order) -> order.getTotal() > 1000, Branched.as("high-value"))
    .branch((key, order) -> order.getTotal() > 100, Branched.as("medium-value"))
    .defaultBranch(Branched.as("low-value"));

// High-value orders → priority processing
branches.get("order-high-value").to("priority-orders");

// Low-value orders → standard processing
branches.get("order-low-value").to("standard-orders");
```

### Pattern 3: Enrich Stream with Table (Stream-Table Join)

**Use Case**: Add user details to click events

```java
// Users table (current state)
KTable<Long, User> users = builder.table("users");

// Clicks stream
KStream<Long, ClickEvent> clicks = builder.stream("clicks");

// Enrich clicks with user data (left join)
KStream<Long, EnrichedClick> enriched = clicks.leftJoin(
    users,
    (click, user) -> new EnrichedClick(
        click.getPage(),
        user != null ? user.getName() : "unknown",
        user != null ? user.getEmail() : "unknown"
    ),
    Joined.with(Serdes.Long(), clickSerde, userSerde)
);

enriched.to("enriched-clicks");
```

### Pattern 4: Aggregate with Windowing

**Use Case**: Count clicks per user, per 5-minute window

```java
KTable<Windowed<Long>, Long> clickCounts = clicks
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count(Materialized.as("click-counts-store"));

// Convert to stream for output
clickCounts.toStream()
    .map((windowedKey, count) -> {
        Long userId = windowedKey.key();
        Instant start = windowedKey.window().startTime();
        Instant end = windowedKey.window().endTime();
        return KeyValue.pair(userId, new WindowedCount(userId, start, end, count));
    })
    .to("click-counts");
```

### Pattern 5: Stateful Processing with State Store

**Use Case**: Detect duplicate events within 10 minutes

```java
// Define state store
StoreBuilder<KeyValueStore<Long, Long>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("dedup-store"),
        Serdes.Long(),
        Serdes.Long()
    );

builder.addStateStore(storeBuilder);

// Deduplicate events
KStream<Long, Event> deduplicated = events.transformValues(
    () -> new ValueTransformerWithKey<Long, Event, Event>() {
        private KeyValueStore<Long, Long> store;

        @Override
        public void init(ProcessorContext context) {
            this.store = context.getStateStore("dedup-store");
        }

        @Override
        public Event transform(Long key, Event value) {
            Long lastSeen = store.get(key);
            long now = System.currentTimeMillis();

            // Duplicate detected (within 10 minutes)
            if (lastSeen != null && (now - lastSeen) < 600_000) {
                return null; // Drop duplicate
            }

            // Not a duplicate; store timestamp
            store.put(key, now);
            return value;
        }

        @Override
        public void close() {}
    },
    "dedup-store"
).filter((key, value) -> value != null); // Remove nulls

deduplicated.to("unique-events");
```

## Join Types

### 1. Stream-Stream Join (Inner)

**Use Case**: Correlate related events within a time window

```java
// Page views and clicks within 10 minutes
KStream<Long, PageView> views = builder.stream("page-views");
KStream<Long, Click> clicks = builder.stream("clicks");

KStream<Long, ClickWithView> joined = clicks.join(
    views,
    (click, view) -> new ClickWithView(click, view),
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(10)),
    StreamJoined.with(Serdes.Long(), clickSerde, viewSerde)
);
```

### 2. Stream-Table Join (Left)

**Use Case**: Enrich events with current state

```java
// Add product details to order items
KTable<Long, Product> products = builder.table("products");
KStream<Long, OrderItem> items = builder.stream("order-items");

KStream<Long, EnrichedOrderItem> enriched = items.leftJoin(
    products,
    (item, product) -> new EnrichedOrderItem(
        item,
        product != null ? product.getName() : "Unknown",
        product != null ? product.getPrice() : 0.0
    )
);
```

### 3. Table-Table Join (Inner)

**Use Case**: Combine two tables (latest state)

```java
// Join users with their current shopping cart
KTable<Long, User> users = builder.table("users");
KTable<Long, Cart> carts = builder.table("shopping-carts");

KTable<Long, UserWithCart> joined = users.join(
    carts,
    (user, cart) -> new UserWithCart(user.getName(), cart.getTotal())
);
```

### 4. Stream-GlobalKTable Join

**Use Case**: Enrich with reference data (no repartitioning)

```java
// Add country details to user registrations
GlobalKTable<String, Country> countries = builder.globalTable("countries");
KStream<Long, UserRegistration> registrations = builder.stream("registrations");

KStream<Long, EnrichedRegistration> enriched = registrations.leftJoin(
    countries,
    (userId, registration) -> registration.getCountryCode(), // Key extractor
    (registration, country) -> new EnrichedRegistration(
        registration,
        country != null ? country.getName() : "Unknown"
    )
);
```

## Windowing Strategies

### Tumbling Windows (Non-Overlapping)

**Use Case**: Aggregate per fixed time period

```java
// Count events every 5 minutes
KTable<Windowed<Long>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();

// Windows: [0:00-0:05), [0:05-0:10), [0:10-0:15)
```

### Hopping Windows (Overlapping)

**Use Case**: Moving average or overlapping aggregates

```java
// Count events in 10-minute windows, advancing every 5 minutes
KTable<Windowed<Long>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeAndGrace(
        Duration.ofMinutes(10),  // window size
        Duration.ofMinutes(5)    // grace period for late records
    ).advanceBy(Duration.ofMinutes(5)))
    .count();

// Windows: [0:00-0:10), [0:05-0:15), [0:10-0:20)
```

### Session Windows (Event-Based)

**Use Case**: User sessions with inactivity gap

```java
// Session ends after 30 minutes of inactivity
KTable<Windowed<Long>, Long> sessionCounts = events
    .groupByKey()
    .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(30)))
    .count();
```

### Sliding Windows (Continuous)

**Use Case**: Anomaly detection over a sliding time window

```java
// Count events in any 1-minute sliding window (e.g. to flag >100 events/minute downstream)
KTable<Windowed<Long>, Long> slidingCounts = events
    .groupByKey()
    .windowedBy(SlidingWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(1)))
    .count();
```

## Best Practices

### 1. Partition Keys Correctly

✅ **DO**:
```java
// Repartition by user_id once, before aggregation
KStream<Long, Event> byUser = events
    .selectKey((key, value) -> value.getUserId());

// Now aggregation is efficient
KTable<Long, Long> userCounts = byUser
    .groupByKey()
    .count();
```

❌ **DON'T**:
```java
// WRONG: groupBy with a different key (triggers repartitioning on every use!)
KTable<Long, Long> userCounts = events
    .groupBy((key, value) -> value.getUserId())
    .count();
```

### 2. Use Appropriate Serdes

✅ **DO**:
```java
// Define a custom serde for complex types
Serde<User> userSerde = new JsonSerde<>(User.class);

KStream<Long, User> users = builder.stream(
    "users",
    Consumed.with(Serdes.Long(), userSerde)
);
```

❌ **DON'T**:
```java
// WRONG: no serde specified (falls back to the configured default serde!)
KStream<Long, User> users = builder.stream("users");
```

### 3. Enable Exactly-Once Semantics

✅ **DO**:
```java
Properties props = new Properties();
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
          StreamsConfig.EXACTLY_ONCE_V2); // EOS v2 (recommended)
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100); // Commit frequently
```

### 4. Use Materialized Stores for Queries

✅ **DO**:
```java
// Named store for interactive queries
KTable<Long, Long> counts = events
    .groupByKey()
    .count(Materialized.<Long, Long, KeyValueStore<Bytes, byte[]>>as("user-counts")
        .withKeySerde(Serdes.Long())
        .withValueSerde(Serdes.Long()));

// Query the store from a REST API
ReadOnlyKeyValueStore<Long, Long> store =
    streams.store(StoreQueryParameters.fromNameAndType(
        "user-counts",
        QueryableStoreTypes.keyValueStore()
    ));

Long count = store.get(userId);
```

## Topology Optimization

### 1. Combine Operations

**GOOD** (Single pass):
```java
// Assuming String values here
KStream<Long, String> result = events
    .filter((key, value) -> value != null)
    .mapValues(value -> value.toUpperCase())
    .filterNot((key, value) -> value.contains("test"));
```

**BAD** (Multiple intermediate topics):
```java
KStream<Long, String> valid = events.filter((key, value) -> value != null);
valid.to("valid-events"); // Unnecessary write

KStream<Long, String> fromValid = builder.stream("valid-events");
KStream<Long, String> upper = fromValid.mapValues(v -> v.toUpperCase());
```

### 2. Reuse KTables

**GOOD** (Shared table):
```java
KTable<Long, User> users = builder.table("users");

KStream<Long, EnrichedClick> enrichedClicks = clicks.leftJoin(users, ...);
KStream<Long, EnrichedOrder> enrichedOrders = orders.leftJoin(users, ...);
```

**BAD** (Duplicate tables):
```java
KTable<Long, User> users1 = builder.table("users");
KTable<Long, User> users2 = builder.table("users"); // Duplicate!
```

## Testing Topologies

### Topology Test Driver

```java
@Test
public void testClickFilter() {
    // Set up the topology
    StreamsBuilder builder = new StreamsBuilder();
    KStream<Long, Click> clicks = builder.stream("clicks");
    clicks.filter((key, value) -> !value.isBot())
          .to("human-clicks");

    Topology topology = builder.build();

    // Create test driver
    TopologyTestDriver testDriver = new TopologyTestDriver(topology);

    // Input topic
    TestInputTopic<Long, Click> inputTopic = testDriver.createInputTopic(
        "clicks",
        Serdes.Long().serializer(),
        clickSerde.serializer()
    );

    // Output topic
    TestOutputTopic<Long, Click> outputTopic = testDriver.createOutputTopic(
        "human-clicks",
        Serdes.Long().deserializer(),
        clickSerde.deserializer()
    );

    // Send test data
    inputTopic.pipeInput(1L, new Click(1L, "page1", false)); // Human
    inputTopic.pipeInput(2L, new Click(2L, "page2", true));  // Bot

    // Assert output
    List<Click> output = outputTopic.readValuesToList();
    assertEquals(1, output.size()); // Only the human click
    assertFalse(output.get(0).isBot());

    testDriver.close();
}
```

## Common Issues & Solutions

### Issue 1: StreamsException - Not Co-Partitioned

**Error**: Topics not co-partitioned for join

**Root Cause**: Joined streams/tables have different partition counts

**Solution**: Repartition to match:
```java
// Repartition so the stream matches the join partner's partition count
KStream<Long, Event> repartitioned = events
    .repartition(Repartitioned.with(Serdes.Long(), eventSerde)
        .withNumberOfPartitions(12)); // match the other side
```

### Issue 2: Out of Memory (Large State Store)

**Error**: Java heap space

**Root Cause**: State store too large, windowing not used

**Solution**: Add time-based cleanup:
```java
// Use windowing to limit state size
KTable<Windowed<Long>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeAndGrace(
        Duration.ofHours(24),  // Window size
        Duration.ofHours(1)    // Grace period
    ))
    .count();
```

### Issue 3: High Lag, Slow Processing

**Root Cause**: Blocking operations, inefficient transformations

**Solution**: Use async processing:
```java
// BAD: Blocking HTTP call
events.mapValues(value -> {
    return httpClient.get(value.getUrl()); // BLOCKS!
});

// GOOD: Async processing with state store
events.transformValues(() -> new AsyncEnricher());
```

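`AsyncEnricher` is left undefined above; a minimal sketch of one interpretation, a value transformer that caches lookups in a local map so most records avoid the blocking call (`Event.getUrl()`/`withDetail()` and `lookup()` are hypothetical; fully asynchronous I/O would need more machinery than this):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.processor.ProcessorContext;

// Sketch: amortize expensive lookups with an in-memory cache
public class AsyncEnricher implements ValueTransformer<Event, Event> {
    private final Map<String, String> cache = new HashMap<>();

    @Override
    public void init(ProcessorContext context) {}

    @Override
    public Event transform(Event value) {
        // Hit the remote service only on a cache miss
        String detail = cache.computeIfAbsent(value.getUrl(), this::lookup);
        return value.withDetail(detail); // hypothetical enriched copy
    }

    private String lookup(String url) {
        return "..."; // placeholder for the (now rare) blocking fetch
    }

    @Override
    public void close() {}
}
```
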
## References

- Kafka Streams Documentation: https://kafka.apache.org/documentation/streams/
- Kafka Streams Tutorial: https://kafka.apache.org/documentation/streams/tutorial
- Testing Guide: https://kafka.apache.org/documentation/streams/developer-guide/testing.html

---

**Invoke me when you need topology design, joins, windowing, or exactly-once semantics expertise!**