CORE-14944 ACLs: replace vector with chunked_vector#30368
CORE-14944 ACLs: replace vector with chunked_vector#30368pgellert wants to merge 8 commits intoredpanda-data:devfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces std::vector with chunked_vector in ACL-related paths to reduce large contiguous allocations and mitigate OOM/crash risk under memory fragmentation when many ACLs exist (CORE-14944).
Changes:
- Switched ACL binding/filter containers across security, cluster, kafka, pandaproxy, and cluster_link codepaths from
std::vectortochunked_vector. - Updated RPC/serde/ADL types and controller command handling to support move-only ACL command payloads (via deep-copy helpers).
- Updated tests/benches and Bazel deps to compile and exercise the updated ACL container types.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/security/tests/role_store_bench.cc | Uses chunked_vector for benchmark ACL bindings. |
| src/v/security/tests/authorizer_test.cc | Updates authorizer tests to use chunked_vector bindings/filters. |
| src/v/security/tests/BUILD | Adds chunked_vector dep for security tests. |
| src/v/security/authorizer.h | Changes public authorizer APIs to accept/return chunked_vector. |
| src/v/security/authorizer.cc | Implements updated authorizer APIs using chunked_vector. |
| src/v/security/audit/schemas/utils.h | Generalizes audit resource utilities toward ranges (used with chunked_vector). |
| src/v/security/audit/schemas/application_activity.h | Generalizes audit hashing/construction to accept ranges. |
| src/v/security/audit/audit_log_manager.h | Updates audit enqueue constraints to accept broader resource ranges. |
| src/v/security/acl_store.h | Updates ACL store APIs to chunked_vector and propagates through store operations. |
| src/v/security/acl.cc | Converts ACL removal/listing internals to chunked_vector. |
| src/v/pandaproxy/schema_registry/service.cc | Uses chunked_vector when creating schema-registry ACLs. |
| src/v/pandaproxy/schema_registry/handlers.cc | Uses/moves chunked_vector ACL bindings/filters in SR HTTP handlers. |
| src/v/kafka/server/tests/metadata_test.cc | Updates kafka server tests to use chunked_vector for ACL setup and results. |
| src/v/kafka/server/tests/fetch_plan_bench.cc | Updates fetch bench ACL bindings container. |
| src/v/kafka/server/tests/describe_user_scram_credentials_test.cc | Updates ACL setup/result handling to chunked_vector. |
| src/v/kafka/server/tests/consumer_groups_test.cc | Uses chunked_vector (incl. std::from_range) for ACL creation inputs. |
| src/v/kafka/server/tests/alter_user_scram_credentials_test.cc | Updates ACL setup/result handling to chunked_vector. |
| src/v/kafka/server/tests/BUILD | Adds chunked_vector dep for updated kafka server tests/benches. |
| src/v/kafka/server/server.cc | Uses chunked_vector for create-acls handler and copies for audit lambdas. |
| src/v/kafka/server/request_context.h | Broadens audit resource function constraint to accept ranges. |
| src/v/kafka/server/handlers/delete_acls.cc | Uses chunked_vector filters + copies for audit. |
| src/v/kafka/client/test/metadata.cc | Updates client test ACL containers/results to chunked_vector. |
| src/v/kafka/client/test/BUILD | Adds chunked_vector dep for kafka client tests. |
| src/v/cluster_link/tests/deps.h | Updates fake security service to accept/return chunked_vector. |
| src/v/cluster_link/tests/BUILD | Adds chunked_vector dep for cluster_link tests. |
| src/v/cluster_link/security_migrator.cc | Uses chunked_vector for ACL entries/bindings during migration. |
| src/v/cluster_link/deps.h | Updates security_service interface for chunked_vector ACL APIs. |
| src/v/cluster_link/deps.cc | Updates default security_service implementation signatures. |
| src/v/cluster_link/BUILD | Adds chunked_vector implementation dep. |
| src/v/cluster/types.cc | Switches create/delete ACL cmd ADL parsing to chunked_vector. |
| src/v/cluster/tests/serialization_rt_test.cc | Updates roundtrip helpers/tests for move-only ACL types + deep-copy helpers. |
| src/v/cluster/service.cc | Updates RPC service replies to return chunked_vector results. |
| src/v/cluster/security_types.h | Updates command/request/reply types to chunked_vector + adds explicit copy() helpers. |
| src/v/cluster/security_manager.cc | Dispatches commands across shards using deep copies (avoids moving same cmd repeatedly). |
| src/v/cluster/security_frontend.h | Updates frontend APIs to accept/return chunked_vector and adjusts docs. |
| src/v/cluster/security_frontend.cc | Implements chunked_vector APIs and adds helpers to build error result vectors. |
| src/v/cluster/commands.h | Adds copy_cmd + controller_command::copy() and adjusts deserializer moves. |
| src/v/cluster/cloud_metadata/cluster_recovery_backend.cc | Uses chunked_vector when restoring ACLs during recovery. |
Comments suppressed due to low confidence (1)
src/v/security/audit/schemas/utils.h:168
create_resource_detailsis templated onstd::ranges::sized_range, but it callsresources.size()/resources.begin()/resources.end(). Those member functions are not guaranteed for allsized_rangetypes (e.g.std::ranges::subrange), so this template will fail to instantiate for some valid inputs. Usestd::ranges::size(resources)andstd::ranges::begin/end(resources)(andstd::ranges::transformif desired), or tighten the constraints to require these members explicitly.
With many ACLs, the contiguous std::vector allocation can fail under fragmentation. chunked_vector allocates in fixed-size chunks.
Needed by the upcoming ACL chunked_vector migration where command data becomes move-only. copy_cmd<T> deep-copies via T::copy() if present, else the copy ctor; the .copy() check must come first because is_copy_constructible_v reports true even for types whose synthesized copy ctor only fails at instantiation.
Replace std::vector with chunked_vector through the delete-ACL path (remove_bindings, delete_acls_cmd_data, delete_acls_result, delete_acls_reply, the kafka delete_acls handler, and the security_frontend chain). delete_acls_result becomes move-only, requiring: - security_manager: per-shard dispatch deep-copies via controller_command::copy() to keep do_apply by value - security_frontend: a make_delete_acls_error_results helper replaces std::vector(n, val) and assign(n, val), neither provided by chunked_vector - commands.h: ADL deserialization moves the parsed key/value into the Cmd ctor since command data may be move-only - audit: create_resource_details and returns_auditable_resources are generalized to std::ranges::sized_range - serialization tests: compat_copy and the roundtrip_cmd helpers use copy_cmd to handle move-only commands directly, without per-type overloads or copyable/move-only branching
Mirror of the delete-path migration on the create side. Initializer-list construction is rewritten as push_back/emplace_back since chunked_vector's initializer_list ctor copies elements, which fails for move-only acl_binding contents.
Sized by input filter count, which scales with bulk-delete request size. The result is filled via reserve+emplace_back since chunked_vector lacks resize().
Mirror of the delete-path migration on the create side: switch the create_acls chain and create_acls_reply::results to chunked_vector. A make_create_acls_error_results helper replaces the std::vector(n, val) and assign(n, val) calls. Wire-compatible: serde encodes chunked_vector and std::vector identically.
|
force-push: rebase to dev to fix merge conflicts |
| std::vector<cluster::errc> results; | ||
| for (int i = 0, mi = random_generators::get_int(20); i < mi; i++) { | ||
| results.push_back(cluster::errc::invalid_node_operation); | ||
| } | ||
| cluster::create_acls_reply data; | ||
| roundtrip_test(data); | ||
| data.results = std::move(results); | ||
| roundtrip_test(std::move(data)); |
There was a problem hiding this comment.
nit, could we simplify:
cluster::create_acls_reply data;
for (int i = 0, mi = random_generators::get_int(20); i < mi; i++) {
data.results.push_back(cluster::errc::invalid_node_operation);
}
roundtrip_test(std::move(data));| [&ret, cmd = std::move(cmd), &service](int shard) mutable { | ||
| return do_apply(shard, cmd, service) | ||
| [&ret, &cmd, &service](int shard) { | ||
| return do_apply(shard, cmd.copy(), service) |
There was a problem hiding this comment.
Would it be useful to add a concept to constrain Cmd to have .copy()? Or could we use copy_cmd here?
| std::vector<delete_acls_result> results; | ||
| chunked_vector<delete_acls_result> results; | ||
|
|
||
| delete_acls_reply copy() const { |
There was a problem hiding this comment.
Not really in scope for this PR, but I wonder if it'd be useful to extract a chunked_vector::deep_copy or extend a copy_cmd with a chunked_vector overload (that recursively calls copy_cmd on each element). Maybe not
| , timeout(timeout) {} | ||
|
|
||
| create_acls_request copy() const { | ||
| return create_acls_request{data.copy(), timeout}; |
There was a problem hiding this comment.
This relies on create_acls_cmd_data::copy(), which isn't added until the next commit.
This PR replaces
std::vectorwithchunked_vectoracross the various places where we are storing ACLs in vectors and for types that are stored in vectors that scale with the number of ACLs in redpanda.The first commit addresses the specific crash that was described in the linked Jira ticket, while the rest go beyond and try to replace all ACL-related vectors with
chunked_vector.See the commit messages for implementation details.
Fixes https://redpandadata.atlassian.net/browse/CORE-14944
Backports Required
Release Notes
Bug Fixes