feat(kafka): add topic pattern subscription support#1002
Open
touhou09 wants to merge 3 commits intoArroyoSystems:masterfrom
Open
feat(kafka): add topic pattern subscription support#1002touhou09 wants to merge 3 commits intoArroyoSystems:masterfrom
touhou09 wants to merge 3 commits intoArroyoSystems:masterfrom
Conversation
Add support for subscribing to multiple Kafka topics using regex patterns.
This enables scenarios where a single Arroyo pipeline can consume from
multiple topics matching a pattern (e.g., `^logs-.*` or `^edge[0-9]+_raw$`).
Changes:
- Add `topic_pattern` option to Kafka source table configuration
- Implement pattern-based subscription using rdkafka's `subscribe()` method
- Update `KafkaState` to track offsets by (topic, partition) key for multi-topic support
- Add validation: `topic` and `topic_pattern` are mutually exclusive
- Add validation: `topic_pattern` only valid for source tables, not sinks
Usage in SQL:
```sql
CREATE TABLE multi_topic_source (
...
) WITH (
connector = 'kafka',
type = 'source',
topic_pattern = '^logs-.*',
...
);
```
When using `topic_pattern`:
- Kafka handles partition assignment via consumer group protocol
- New topics matching the pattern are automatically discovered
- Rebalancing happens automatically when topics are added/removed
Closes: #XXXX
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add support for subscribing to multiple Kafka topics using regex patterns.
This enables scenarios where a single Arroyo pipeline can consume from multiple topics matching a pattern (e.g.,
^logs-.*or^edge[0-9]+_raw$).Motivation
Currently, Arroyo's Kafka source connector only supports subscribing to a single explicit topic. In many real-world scenarios (e.g., multi-tenant systems, edge device data collection), data is partitioned across multiple topics with a common naming pattern. This PR adds the ability to subscribe to all topics matching a regex pattern using Kafka's native pattern subscription feature.
Changes
topic_patternoption to Kafka source table configurationsubscribe()methodKafkaStateto track offsets by(topic, partition)key for multi-topic supporttopicandtopic_patternare mutually exclusivetopic_patternonly valid for source tables, not sinksUsage Example
Behavior
When using
topic_pattern:topicmetadata field reflects the actual topic each message came fromBreaking Changes
None. The
topicfield remains the default and recommended option for single-topic subscriptions.Test Plan
Option<String>for topic)Related Issues
This addresses the need for multi-topic subscription in stream processing pipelines.