Skip to content

CORE-14944 ACLs: replace vector with chunked_vector#30368

Open
pgellert wants to merge 8 commits intoredpanda-data:devfrom
pgellert:fix/acl-vector
Open

CORE-14944 ACLs: replace vector with chunked_vector#30368
pgellert wants to merge 8 commits intoredpanda-data:devfrom
pgellert:fix/acl-vector

Conversation

@pgellert
Copy link
Copy Markdown
Contributor

@pgellert pgellert commented May 1, 2026

This PR replaces std::vector with chunked_vector across the various places where we are storing ACLs in vectors and for types that are stored in vectors that scale with the number of ACLs in redpanda.

The first commit addresses the specific crash that was described in the linked Jira ticket, while the rest go beyond and try to replace all ACL-related vectors with chunked_vector.

See the commit messages for implementation details.

Fixes https://redpandadata.atlassian.net/browse/CORE-14944

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

Bug Fixes

  • Avoid OOM under memory fragmentation when there are many ACLs.

@pgellert pgellert requested a review from a team May 1, 2026 12:52
@pgellert pgellert self-assigned this May 1, 2026
@pgellert pgellert requested review from a team, Copilot and nguyen-andrew and removed request for a team May 1, 2026 12:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces std::vector with chunked_vector in ACL-related paths to reduce large contiguous allocations and mitigate OOM/crash risk under memory fragmentation when many ACLs exist (CORE-14944).

Changes:

  • Switched ACL binding/filter containers across security, cluster, kafka, pandaproxy, and cluster_link codepaths from std::vector to chunked_vector.
  • Updated RPC/serde/ADL types and controller command handling to support move-only ACL command payloads (via deep-copy helpers).
  • Updated tests/benches and Bazel deps to compile and exercise the updated ACL container types.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/v/security/tests/role_store_bench.cc Uses chunked_vector for benchmark ACL bindings.
src/v/security/tests/authorizer_test.cc Updates authorizer tests to use chunked_vector bindings/filters.
src/v/security/tests/BUILD Adds chunked_vector dep for security tests.
src/v/security/authorizer.h Changes public authorizer APIs to accept/return chunked_vector.
src/v/security/authorizer.cc Implements updated authorizer APIs using chunked_vector.
src/v/security/audit/schemas/utils.h Generalizes audit resource utilities toward ranges (used with chunked_vector).
src/v/security/audit/schemas/application_activity.h Generalizes audit hashing/construction to accept ranges.
src/v/security/audit/audit_log_manager.h Updates audit enqueue constraints to accept broader resource ranges.
src/v/security/acl_store.h Updates ACL store APIs to chunked_vector and propagates through store operations.
src/v/security/acl.cc Converts ACL removal/listing internals to chunked_vector.
src/v/pandaproxy/schema_registry/service.cc Uses chunked_vector when creating schema-registry ACLs.
src/v/pandaproxy/schema_registry/handlers.cc Uses/moves chunked_vector ACL bindings/filters in SR HTTP handlers.
src/v/kafka/server/tests/metadata_test.cc Updates kafka server tests to use chunked_vector for ACL setup and results.
src/v/kafka/server/tests/fetch_plan_bench.cc Updates fetch bench ACL bindings container.
src/v/kafka/server/tests/describe_user_scram_credentials_test.cc Updates ACL setup/result handling to chunked_vector.
src/v/kafka/server/tests/consumer_groups_test.cc Uses chunked_vector (incl. std::from_range) for ACL creation inputs.
src/v/kafka/server/tests/alter_user_scram_credentials_test.cc Updates ACL setup/result handling to chunked_vector.
src/v/kafka/server/tests/BUILD Adds chunked_vector dep for updated kafka server tests/benches.
src/v/kafka/server/server.cc Uses chunked_vector for create-acls handler and copies for audit lambdas.
src/v/kafka/server/request_context.h Broadens audit resource function constraint to accept ranges.
src/v/kafka/server/handlers/delete_acls.cc Uses chunked_vector filters + copies for audit.
src/v/kafka/client/test/metadata.cc Updates client test ACL containers/results to chunked_vector.
src/v/kafka/client/test/BUILD Adds chunked_vector dep for kafka client tests.
src/v/cluster_link/tests/deps.h Updates fake security service to accept/return chunked_vector.
src/v/cluster_link/tests/BUILD Adds chunked_vector dep for cluster_link tests.
src/v/cluster_link/security_migrator.cc Uses chunked_vector for ACL entries/bindings during migration.
src/v/cluster_link/deps.h Updates security_service interface for chunked_vector ACL APIs.
src/v/cluster_link/deps.cc Updates default security_service implementation signatures.
src/v/cluster_link/BUILD Adds chunked_vector implementation dep.
src/v/cluster/types.cc Switches create/delete ACL cmd ADL parsing to chunked_vector.
src/v/cluster/tests/serialization_rt_test.cc Updates roundtrip helpers/tests for move-only ACL types + deep-copy helpers.
src/v/cluster/service.cc Updates RPC service replies to return chunked_vector results.
src/v/cluster/security_types.h Updates command/request/reply types to chunked_vector + adds explicit copy() helpers.
src/v/cluster/security_manager.cc Dispatches commands across shards using deep copies (avoids moving same cmd repeatedly).
src/v/cluster/security_frontend.h Updates frontend APIs to accept/return chunked_vector and adjusts docs.
src/v/cluster/security_frontend.cc Implements chunked_vector APIs and adds helpers to build error result vectors.
src/v/cluster/commands.h Adds copy_cmd + controller_command::copy() and adjusts deserializer moves.
src/v/cluster/cloud_metadata/cluster_recovery_backend.cc Uses chunked_vector when restoring ACLs during recovery.
Comments suppressed due to low confidence (1)

src/v/security/audit/schemas/utils.h:168

  • create_resource_details is templated on std::ranges::sized_range, but it calls resources.size() / resources.begin() / resources.end(). Those member functions are not guaranteed for all sized_range types (e.g. std::ranges::subrange), so this template will fail to instantiate for some valid inputs. Use std::ranges::size(resources) and std::ranges::begin/end(resources) (and std::ranges::transform if desired), or tighten the constraints to require these members explicitly.

Comment thread src/v/security/audit/schemas/application_activity.h
Comment thread src/v/pandaproxy/schema_registry/service.cc Outdated
pgellert added 8 commits May 1, 2026 16:30
With many ACLs, the contiguous std::vector allocation can fail under
fragmentation. chunked_vector allocates in fixed-size chunks.
Needed by the upcoming ACL chunked_vector migration where command data
becomes move-only. copy_cmd<T> deep-copies via T::copy() if present,
else the copy ctor; the .copy() check must come first because
is_copy_constructible_v reports true even for types whose synthesized
copy ctor only fails at instantiation.
Replace std::vector with chunked_vector through the delete-ACL path
(remove_bindings, delete_acls_cmd_data, delete_acls_result,
delete_acls_reply, the kafka delete_acls handler, and the
security_frontend chain).

delete_acls_result becomes move-only, requiring:
- security_manager: per-shard dispatch deep-copies via
  controller_command::copy() to keep do_apply by value
- security_frontend: a make_delete_acls_error_results helper replaces
  std::vector(n, val) and assign(n, val), neither provided by
  chunked_vector
- commands.h: ADL deserialization moves the parsed key/value into the
  Cmd ctor since command data may be move-only
- audit: create_resource_details and returns_auditable_resources are
  generalized to std::ranges::sized_range
- serialization tests: compat_copy and the roundtrip_cmd helpers use
  copy_cmd to handle move-only commands directly, without per-type
  overloads or copyable/move-only branching
Mirror of the delete-path migration on the create side.
Initializer-list construction is rewritten as push_back/emplace_back
since chunked_vector's initializer_list ctor copies elements, which
fails for move-only acl_binding contents.
Sized by input filter count, which scales with bulk-delete request
size. The result is filled via reserve+emplace_back since
chunked_vector lacks resize().
Mirror of the delete-path migration on the create side: switch the
create_acls chain and create_acls_reply::results to chunked_vector. A
make_create_acls_error_results helper replaces the std::vector(n, val)
and assign(n, val) calls.

Wire-compatible: serde encodes chunked_vector and std::vector
identically.
@pgellert
Copy link
Copy Markdown
Contributor Author

pgellert commented May 1, 2026

force-push: rebase to dev to fix merge conflicts

Comment on lines +1496 to +1502
std::vector<cluster::errc> results;
for (int i = 0, mi = random_generators::get_int(20); i < mi; i++) {
results.push_back(cluster::errc::invalid_node_operation);
}
cluster::create_acls_reply data;
roundtrip_test(data);
data.results = std::move(results);
roundtrip_test(std::move(data));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, could we simplify:

cluster::create_acls_reply data;
for (int i = 0, mi = random_generators::get_int(20); i < mi; i++) {
    data.results.push_back(cluster::errc::invalid_node_operation);
}
roundtrip_test(std::move(data));

[&ret, cmd = std::move(cmd), &service](int shard) mutable {
return do_apply(shard, cmd, service)
[&ret, &cmd, &service](int shard) {
return do_apply(shard, cmd.copy(), service)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to add a concept to constrain Cmd to have .copy()? Or could we use copy_cmd here?

std::vector<delete_acls_result> results;
chunked_vector<delete_acls_result> results;

delete_acls_reply copy() const {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really in scope for this PR, but I wonder if it'd be useful to extract a chunked_vector::deep_copy or extend a copy_cmd with a chunked_vector overload (that recursively calls copy_cmd on each element). Maybe not

, timeout(timeout) {}

create_acls_request copy() const {
return create_acls_request{data.copy(), timeout};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on create_acls_cmd_data::copy(), which isn't added until the next commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants