[DNM] User Reconciliation PR#29595

Closed
nguyen-andrew wants to merge 13 commits into redpanda-data:dev from nguyen-andrew:wip/sr/subject-query-param

@nguyen-andrew

DO NOT MERGE - THIS IS FOR A DEMO

Reconcile Cloud principals into Redpanda Core roles (OIDC / impersonation foundation)

Context

We’re moving authorization for Kafka API and Schema Registry API from Console into Redpanda Core to unlock user impersonation and restore end-to-end auditability. Today, Console proxies cluster operations through its service account, which breaks attribution and requires customers to maintain separate auth systems.

This PR implements the “user reconciliation” plumbing needed for Phase 2/3 of the RFC: turning Cloud identities (users + service accounts) into dataplane principals bound to dataplane roles in Redpanda Core, without persisting email PII in control plane operational storage.

What this PR changes

  1. Introduces/extends dataplane role binding reconciliation
    • Adds/updates logic to reconcile RoleBindings of type TYPE_DATAPLANE (cluster-scoped only) into the target cluster.
    • Maps CP “role memberships” → Redpanda Core role assignments:
      • role_name (e.g. __redpanda_cloud_admin_role or customer-defined role) →
      • principal (email) resolved at runtime →
      • role assignment in the cluster via rpk security role assign (or Admin API equivalent, depending on environment).

  2. Runtime principal resolution (Zero-PII CP DB)
    • Reconciler uses account_id as the durable identifier in control plane.
    • At reconcile-time, calls User Service / Backoffice API to resolve:
      • account_id → email (batch up to N, with retries/backoff).
    • Email values are used only transiently to materialize dataplane role assignments.
    • Control plane audit logging stores account_id only (no email), aligning with the RFC privacy requirements.
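
A minimal sketch of the batched account_id → email resolution with retries and exponential backoff described above. It is illustrative only: `resolve_emails`, `TransientLookupError`, the `lookup_batch` callable, the batch size, and the backoff constants are all hypothetical names and values, not the User Service / Backoffice API or this PR's code.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class TransientLookupError(Exception):
    """Hypothetical error type for a retryable lookup failure."""


def resolve_emails(
    account_ids: Iterable[str],
    lookup_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,        # stand-in for "N"; the real limit is not stated
    max_attempts: int = 3,        # assumed retry budget
    base_delay_s: float = 0.5,    # assumed backoff base
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (resolved account_id -> email, unresolved account_ids).

    Emails live only in the returned dict; nothing is persisted or logged here.
    """
    ids = list(dict.fromkeys(account_ids))  # de-duplicate, keep order
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []

    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        result: Optional[Dict[str, str]] = None
        for attempt in range(max_attempts):
            try:
                result = lookup_batch(batch)
                break
            except TransientLookupError:
                # Back off before the next attempt; give up after the last one.
                if attempt + 1 < max_attempts:
                    sleep(base_delay_s * (2 ** attempt))
        if result is None:
            unresolved.extend(batch)
            continue
        for account_id in batch:
            if account_id in result:
                resolved[account_id] = result[account_id]
            else:
                unresolved.append(account_id)

    return resolved, unresolved
```

Returning the unresolved IDs separately lets the caller decide how to treat a partial lookup (for example, by deferring deletions) rather than silently dropping those principals.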

  3. Reconciles “admin” role bootstrap for Cloud clusters
    • Ensures the built-in dataplane role exists:
      • __redpanda_cloud_admin_role
    • Ensures it has the wildcard “allow all” ACL set (idempotent).
    • Ensures principals with dataplane_cluster_superuser (or equivalent CP signal) are bound to that role.

  4. Adds observability + guardrails
    • Metrics:
      • principal resolution latency
      • reconciliation lag
      • reconcile failures by reason code (lookup failure, role missing, rpk failure, etc.)
    • Safety:
      • idempotent updates (re-run safe)
      • partial failure handling: if principal resolution fails, we do not delete existing bindings; we retry later.
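
One way to read the two safety properties above, as a sketch rather than the PR's actual logic: compute the role-assignment delta as set differences (so a re-run against an already-converged cluster is a no-op), and skip all deletions whenever any account_id failed to resolve. `plan_delta` and its argument shapes are hypothetical, and deferring every deletion on any unresolved ID is a deliberately conservative assumption.

```python
from typing import Dict, Set, Tuple

Membership = Tuple[str, str]  # (role_name, account_id) as stored in the control plane
Assignment = Tuple[str, str]  # (role_name, principal email) as seen in the cluster


def plan_delta(
    desired: Set[Membership],
    resolved: Dict[str, str],
    current: Set[Assignment],
) -> Tuple[Set[Assignment], Set[Assignment], Set[str]]:
    """Return (to_add, to_remove, unresolved_account_ids)."""
    unresolved = {account_id for _, account_id in desired if account_id not in resolved}
    want = {
        (role, resolved[account_id])
        for role, account_id in desired
        if account_id in resolved
    }
    to_add = want - current
    if unresolved:
        # An existing binding may belong to a principal we could not resolve,
        # so removing "extras" now could revoke valid access. Retry later.
        to_remove: Set[Assignment] = set()
    else:
        to_remove = current - want
    return to_add, to_remove, unresolved
```

Because the plan is derived from the difference between desired and current state, applying it and planning again yields empty add and remove sets.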

Why
• Enables the target state where Console forwards user OIDC tokens for Kafka + SR and Redpanda Core performs authorization.
• Restores compliance-critical auditability: cluster actions attributed to real user principals (with the known Admin API exception).
• Preserves privacy posture: no email addresses stored in CP operational DB.

How it works (high-level)
1. RoleBinding event / periodic sweep triggers reconcile for a cluster.
2. Reconciler fetches desired dataplane memberships (account_ids + role_names).
3. Batch-resolve account_ids → emails via User Service.
4. Apply role assignment delta in cluster:
   • create role if missing (for system roles) + apply ACLs
   • assign principals (emails) to roles
5. Emit metrics + write audit log with account_id + role_name + cluster_id.

API / Data model changes
• RoleBinding supports type = TYPE_DATAPLANE, validated to require:
  • scope.resource_type == CLUSTER
• (If included in this PR) Cluster spec / OIDC spec additions used by reconciliation:
  • OIDC configuration (discovery URL, audience, principal mapping)
  • role specs / system role naming convention (__redpanda_cloud* reserved)
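
The scope rule for TYPE_DATAPLANE bindings could be expressed roughly as below. This is a sketch under assumptions: the Python classes, the enum members other than TYPE_DATAPLANE and CLUSTER, and the `validate_role_binding` helper are hypothetical stand-ins for the real API types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BindingType(Enum):
    TYPE_UNSPECIFIED = 0   # hypothetical placeholder value
    TYPE_DATAPLANE = 1


class ResourceType(Enum):
    ORGANIZATION = "ORGANIZATION"   # hypothetical non-cluster scope
    CLUSTER = "CLUSTER"


@dataclass(frozen=True)
class Scope:
    resource_type: ResourceType
    resource_id: str


@dataclass(frozen=True)
class RoleBinding:
    type: BindingType
    role_name: str
    account_id: str
    scope: Scope


def validate_role_binding(binding: RoleBinding) -> List[str]:
    """Return a list of validation errors; an empty list means the binding is valid."""
    errors: List[str] = []
    if (
        binding.type is BindingType.TYPE_DATAPLANE
        and binding.scope.resource_type is not ResourceType.CLUSTER
    ):
        errors.append("TYPE_DATAPLANE bindings require scope.resource_type == CLUSTER")
    return errors
```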

Security / Privacy notes
• No changes to secrets handling.
• Email addresses are treated as PII:
  • not persisted in CP DB
  • not logged in CP audit logs
  • may exist in dataplane config/CRDs/ConfigMaps as permitted by the RFC

Dependencies / rollout considerations
• Blocked on per-listener SASL mechanism config in Redpanda Core (CD-403):
  • We must ensure OIDC (OAUTHBEARER) is enabled only on internal listeners.
• Schema Registry authz maturity: SR authorization requires Redpanda 25.2+ and is not enabled cloud-wide by default (tracked separately).

Update context_subject::from_string to handle context-only strings.
Rename is_default_context() to is_default_context_only() to better
reflect its behavior: it returns true only when the context is the
default context AND the subject is empty (context-only).

Add a new method is_non_default_context() that checks if a subject is
in a non-default context. This will be used in future changes to handle
subject query parameters.
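
For illustration, a Python sketch of the parsing and the two predicates. It mirrors the string forms used in this PR (":.ctx:", ":.ctx:sub", "sub", ":.:sub") but is not the C++ context_subject implementation. The `ContextSubject` class is hypothetical, and treating a context with no closing colon as context-only is an assumption.

```python
from dataclasses import dataclass

DEFAULT_CONTEXT = "."


@dataclass(frozen=True)
class ContextSubject:
    context: str
    subject: str

    @classmethod
    def from_string(cls, value: str) -> "ContextSubject":
        # A qualified value starts with ":." and the context runs up to the next ":".
        if value.startswith(":."):
            end = value.find(":", 1)
            if end == -1:
                # Assumption: no closing colon means a context-only string.
                return cls(value[1:], "")
            return cls(value[1:end], value[end + 1:])
        # Anything else is a bare subject in the default context.
        return cls(DEFAULT_CONTEXT, value)

    def is_default_context_only(self) -> bool:
        # True only for the default context with an empty subject.
        return self.context == DEFAULT_CONTEXT and self.subject == ""

    def is_non_default_context(self) -> bool:
        return self.context != DEFAULT_CONTEXT
```

Under this sketch, ":.:sub" and "sub" parse to the same value, which is why neither counts as default-context-only and neither is in a non-default context.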

Add an optional `subject` query parameter to control schema lookup
context and subject restriction. The parameter value is parsed using
context_subject::from_string(), which extracts a context substring and
a subject substring from the input.

Lookup behavior:
- No parameter: search default context without subject restriction
  (existing behavior)
- Context only (e.g., ":.ctx:"): search the specified context without
  subject restriction
- Qualified (e.g., ":.ctx:sub"): search the specified context,
  restricted to the subject substring
- Unqualified (e.g., "sub" or ":.:sub"): search the default context
  restricted to the subject substring; if not found, search all other
  contexts; if still not found, fall back to the default context
  without subject restriction

A subject parameter is "unqualified" if it resolves to the default
context, either implicitly (no context substring) or explicitly
(context substring is ".").
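
The lookup order can be sketched as an ordered list of (context, subject restriction) attempts, tried until one finds the schema. `lookup_plan` is a hypothetical helper, not code from this change. The message does not say whether the "all other contexts" step keeps the subject restriction, so this sketch assumes it does.

```python
from typing import List, Optional, Sequence, Tuple

DEFAULT_CONTEXT = "."

# (context to search, subject restriction or None for "no restriction")
Attempt = Tuple[str, Optional[str]]


def lookup_plan(
    param: Optional[Tuple[str, str]],
    all_contexts: Sequence[str],
) -> List[Attempt]:
    """Build the ordered lookup attempts for an already-parsed `subject` parameter.

    `param` is None when the query parameter is absent, else (context, subject).
    """
    if param is None:
        return [(DEFAULT_CONTEXT, None)]      # existing behavior
    context, subject = param
    if not subject:
        return [(context, None)]              # context only
    if context != DEFAULT_CONTEXT:
        return [(context, subject)]           # qualified
    # Unqualified: default context first, then the others, then the fallback.
    others = [c for c in all_contexts if c != DEFAULT_CONTEXT]
    return (
        [(DEFAULT_CONTEXT, subject)]
        + [(c, subject) for c in others]      # assumption: restriction is kept
        + [(DEFAULT_CONTEXT, None)]
    )
```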

Add a test for the `subject` query parameter on GET /schemas/ids/{id},
verifying that it correctly extracts the context for schema lookup.
Also update the test client to support the new parameter.