[CORE 15747] Partial solution to feature gating cloud topics#29817
[CORE 15747] Partial solution to feature gating cloud topics#29817piyushredpanda merged 4 commits intoredpanda-data:devfrom
Conversation
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
There was a problem hiding this comment.
Pull request overview
Adds a new cluster feature flag for “cloud topics” and wires feature-table awareness into Kafka CreateTopics validation so that redpanda.storage.mode=cloud is rejected unless the cloud_topics feature is active.
Changes:
- Introduces
features::feature::cloud_topicsand registers it in the feature schema +to_string_view. - Extends topic request validation plumbing to pass a
features::feature_table*into validators. - Gates
redpanda.storage.mode=cloudinstorage_mode_config_validatoronfeature::cloud_topicsbeing active, and updates impacted unit tests/build deps.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/kafka/server/tests/validator_tests.cc | Updates validator test callsites for new is_valid(..., feature_table*) signature. |
| src/v/kafka/server/tests/topic_utils_test.cc | Updates test validators to accept feature_table* to match new predicate shape. |
| src/v/kafka/server/handlers/topics/validators.h | Changes validator concept/signatures and adds feature-gating for cloud storage mode. |
| src/v/kafka/server/handlers/topics/topic_utils.h | Threads feature_table* through validate_requests_range predicate/validator execution. |
| src/v/kafka/server/handlers/create_topics.cc | Passes feature table (when available) into request validation. |
| src/v/kafka/server/BUILD | Adds //src/v/features dependency to support new includes. |
| src/v/features/feature_table.h | Adds cloud_topics feature bit and schema entry. |
| src/v/features/feature_table.cc | Adds stringification for feature::cloud_topics. |
You can also share your feedback on Copilot code review. Take the survey.
| case model::redpanda_storage_mode::cloud: | ||
| if ( | ||
| ft == nullptr | ||
| || !ft->is_active(features::feature::cloud_topics)) { | ||
| return false; | ||
| } | ||
| return config::shard_local_cfg().cloud_topics_enabled(); |
There was a problem hiding this comment.
The new cloud_topics feature gate logic in storage_mode_config_validator is not covered by tests (e.g., a create-topics request with redpanda.storage.mode=cloud should fail when the feature is inactive and succeed when it is active, assuming config allows it). Adding a focused unit/integration test would help prevent regressions in the gating behavior.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
WillemKauf
left a comment
There was a problem hiding this comment.
LGTM but I'll repeat once more publicly that we probably need to find a better way of gating our config system, potentially by wrapping a feature_table accessor within our config store, and adding a definition for gated_property which can transparently respect features being active or not.
Though this also may require changes to the feature_table a la #29707 (i.e. suppressing updates to features until all nodes are updated, and then restarted).
The reason for this being if we have a check for a cluster config at start-up (say to construct a service like cloud_topics_app), gated_property<bool> cloud_topics_enabled may return false (if the feature has not become active yet), but in the future (say at topic creation time), cloud_topics_enabled may return true once the feature is active (but behavior of the system is now undefined/hard to reason about/here be dragons).
😓
Retry command for Build#81776please wait until all jobs are finished before running the slash command |
Retry command for Build#81779please wait until all jobs are finished before running the slash command |
|
ugh so annoying |
Verify that cloud topic creation is rejected when the cloud_topics feature has not yet activated after upgrading from v25.3.x, and succeeds once the feature becomes active.
andrwng
left a comment
There was a problem hiding this comment.
Gating the subsystem startup is a bit more complicated (follow up)
Curious why this is desirable. It seems like this would make things much more complicated and would work against the mental model of "if the whole cluster is on 26.1 then we're ready for a cloud topic"
@andrwng yeh it's debatable if we do it or not. the bullet point is really there as a placeholder for that investigation/consideration. right now we have these goals:
On boot-up today we can control the construction of So maybe the answer here is that all |
Do we even have a use for
|
Backports Required
Release Notes