-
Notifications
You must be signed in to change notification settings - Fork 235
feat: eager service account SelfSubjectAccessReview
support when evaluating ResourceGraphDefinition
#646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
```yaml apiVersion: kro.run/v1alpha1 kind: ResourceGraphDefinition metadata: # ... spec: defaultServiceAccounts: '*': configmap-creator resources: - id: configmap template: apiVersion: v1 data: key: value kind: ConfigMap metadata: name: ${schema.spec.name}-configmap schema: apiVersion: v1alpha1 group: kro.run kind: ConfigMapService spec: name: string status: conditions: # ... - lastTransitionTime: "2025-08-22T21:22:15Z" message: |- specified service account system:serviceaccount:default:configmap-creator does not have permission to delete "configmap" (/v1, Resource=configmaps) in namespace default specified service account system:serviceaccount:default:configmap-creator does not have permission to patch "configmap" (/v1, Resource=configmaps) in namespace default specified service account system:serviceaccount:default:configmap-creator does not have permission to update "configmap" (/v1, Resource=configmaps) in namespace default specified service account system:serviceaccount:default:configmap-creator does not have permission to watch "configmap" (/v1, Resource=configmaps) in namespace default observedGeneration: 1 reason: InvalidResourceGraph status: "False" type: Ready state: Inactive ``` - Added preflight RBAC validation for every resource in a ResourceGraphDefinition (RGD). - Selects the service account per resource based on spec.defaultServiceAccounts (namespace-specific override first, then * catch-all). - Performs SelfSubjectAccessReview calls (create, update, delete, patch, watch) using impersonation of the selected service account. - Runs these SARs concurrently and aggregates denials into a single error. - Treats permission failures as “invalid resource graph,” marking the RGD Ready=False and surfacing a precise, multi-line message listing missing verbs per resource. - Wired the authorization client and impersonation into the shared client set (pkg/client/set.go) and extended the fake client to support it for tests. Added E2E tests: Namespaced and cluster-scoped permission checks that assert Ready=False with exact denial messages. A deletion flow test that applies an RGD, verifies, deletes it, and confirms the object is gone. ## Why it’s useful - Fast, explicit failure: Users get immediate, actionable feedback when service accounts lack required verbs, before any workloads are created. - Least-privilege safety: Encourages narrowly scoped RBAC by showing exactly which verbs are needed for each resource. - Better UX and debuggability: Aggregated, deterministic messages show all missing permissions in one place and propagate to the Ready condition. Signed-off-by: Jakob Möller <jakob.moeller@sap.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea and the usability gain introduced with this feature!
I suppose you decided against SelfSubjectRulesReview
since
- we wanted to actually fail in case of insufficient authorization (so, we cannot affort to have incomplete results, because otherwise we might wrongfully fail)
- we can handle a slight delay as we are not a UI
I figure with several large rgds, we'd quickly be doing quite a bunch of requests - especially at ramp up. Wouldn't we have to be concerned about rate limiting here (or is this less of a concern because each rgd might use a different user identity)?
for _, resource := range resources { | ||
selectedServiceAccount := selectServiceAccountForResource(accounts, resource) | ||
if err := verifyResourcePermissions(ctx, r.clientSet, selectedServiceAccount, resource); err != nil { | ||
errs = append(errs, err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to run this concurrently as well?
} | ||
|
||
// getServiceAccountUserName builds the impersonate service account user name. | ||
// The format of the user name is "system:serviceaccount:<serviceaccount>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The format of the user name is "system:serviceaccount:<serviceaccount>" | |
// The format of the user name is "system:serviceaccount:<namespace>:<serviceaccount>" |
nit
The main reason to not use
The PR is still in draft because it is a proof of concept created based on another issue discussion and it should be optimized much further before being considered for performance reasons or similar IMO |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jakobmoellerdev for taking a stab at this, i left a few "philosophical" questions below.
// make sure the service accounts configured in the resource graph definition have the required permissions. | ||
if err := r.reconcileResourceGraphDefinitionPermissions(ctx, processedRGD.Resources, rgd.Spec.DefaultServiceAccounts); err != nil { | ||
return nil, nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a special condition for this? i'm thinking that it shouldn't necessarily be Terminal (as in marking the RGD not Ready) - but more of an advisory condition informing the admins that one of their permissions set isn't configured correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing I think thats possible
review := &authv1.SelfSubjectAccessReview{ | ||
Spec: authv1.SelfSubjectAccessReviewSpec{ | ||
ResourceAttributes: &authv1.ResourceAttributes{ | ||
Namespace: namespace, | ||
Verb: verb, | ||
Group: gvr.Group, | ||
Version: gvr.Version, | ||
Resource: gvr.Resource, | ||
}, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns about the practical applicability of this permission checking approach. The implementation assumes static namespace values, but in reality, some/maybe-majority of kro use cases leverage CEL expressions for dynamic namespace resolution. Users commonly write templates with expressions like ${schema.spec.something + "-test"}
, ${resource.configMap.metadata.namespace}
, or even conditional logic that resolves to different namespaces based on runtime conditions, thinking ${somethingBoolean? a: b}
.
I'm wondering if we should consider limiting this validation to resources that don't specify a namespace field at all, since those inherit the "instance" namespace by default. At least for that subset of resources, we could provide meaningful validation. Alternatively, we could delegate permission checking to the instance reconcilers, but this introduces its own challenges. Sometimes we'll only be able to evaluate permissions at certain points in the DAG execution (dynamic variables point to other resources status?), and by the time we discover permission issues, we may have already created some resources. This somewhat defeats the purpose of "eager" validation.
I'm questioning whether this feature provides enough value in its current form. Perhaps we should either constrain this to a smaller well defined scope where it can actually work reliably, or rethink the approach entirely to handle dynamic values appropriately for namespaces (as not allowing dynamic namespace settings)? would love to hear your thoughts @fabianburth @jakobmoellerdev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm im wondering about this. Wouldn't the emulated namespace for the resource be the only thing that we care about? as in, whatever namespace
on the resource resolves to? And that is currenty the value im using here (im using processedRGD.Resources
).
But maybe im just wrong on this :D My assumption currently is that the namespace value im using here is the pre-evaluated one from the graph analysis.
Sometimes we'll only be able to evaluate permissions at certain points in the DAG execution (dynamic variables point to other resources status?), and by the time we discover permission issues, we may have already created some resources. This somewhat defeats the purpose of "eager" validation.
This is only true if KRO moves fully away from its graph evaluation. I see it like CEL itself. Sure there are some checks you could only do at runtime, but this should not be the norm. I think KRO should default to eager validation and to only fallback to runtime checks if nothing else works.
In an ideal world, KRO uses its graph to divide a topology into applicable "layers" in which each layer is validatable when all previous layers have been considered valid and applied as well.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Added preflight RBAC validation for every resource in a ResourceGraphDefinition (RGD).
Added E2E tests: Namespaced and cluster-scoped permission checks that assert Ready=False with exact denial messages. A deletion flow test that applies an RGD, verifies, deletes it, and confirms the object is gone.
Why it’s useful