Skip to content

Conversation

jakobmoellerdev
Copy link
Contributor

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  # ...
spec:
  defaultServiceAccounts:
    '*': configmap-creator
  resources:
  - id: configmap
    template:
      apiVersion: v1
      data:
        key: value
      kind: ConfigMap
      metadata:
        name: ${schema.spec.name}-configmap
  schema:
    apiVersion: v1alpha1
    group: kro.run
    kind: ConfigMapService
    spec:
      name: string
status:
  conditions:
  # ...
  - lastTransitionTime: "2025-08-22T21:22:15Z"
    message: |-
      specified service account system:serviceaccount:default:configmap-creator does not have permission to delete "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to patch "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to update "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to watch "configmap" (/v1, Resource=configmaps) in namespace default
    observedGeneration: 1
    reason: InvalidResourceGraph
    status: "False"
    type: Ready
  state: Inactive

Added preflight RBAC validation for every resource in a ResourceGraphDefinition (RGD).

  • Selects the service account per resource based on spec.defaultServiceAccounts (namespace-specific override first, then * catch-all).
  • Performs SelfSubjectAccessReview calls (create, update, delete, patch, watch) using impersonation of the selected service account.
  • Runs these SARs concurrently and aggregates denials into a single error.
  • Treats permission failures as “invalid resource graph,” marking the RGD Ready=False and surfacing a precise, multi-line message listing missing verbs per resource.
  • Wired the authorization client and impersonation into the shared client set (pkg/client/set.go) and extended the fake client to support it for tests.

Added E2E tests: Namespaced and cluster-scoped permission checks that assert Ready=False with exact denial messages. A deletion flow test that applies an RGD, verifies, deletes it, and confirms the object is gone.

Why it’s useful

  • Fast, explicit failure: Users get immediate, actionable feedback when service accounts lack required verbs, before any workloads are created.
  • Least-privilege safety: Encourages narrowly scoped RBAC by showing exactly which verbs are needed for each resource.
  • Better UX and debuggability: Aggregated, deterministic messages show all missing permissions in one place and propagate to the Ready condition.

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  # ...
spec:
  defaultServiceAccounts:
    '*': configmap-creator
  resources:
  - id: configmap
    template:
      apiVersion: v1
      data:
        key: value
      kind: ConfigMap
      metadata:
        name: ${schema.spec.name}-configmap
  schema:
    apiVersion: v1alpha1
    group: kro.run
    kind: ConfigMapService
    spec:
      name: string
status:
  conditions:
  # ...
  - lastTransitionTime: "2025-08-22T21:22:15Z"
    message: |-
      specified service account system:serviceaccount:default:configmap-creator does not have permission to delete "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to patch "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to update "configmap" (/v1, Resource=configmaps) in namespace default
      specified service account system:serviceaccount:default:configmap-creator does not have permission to watch "configmap" (/v1, Resource=configmaps) in namespace default
    observedGeneration: 1
    reason: InvalidResourceGraph
    status: "False"
    type: Ready
  state: Inactive
```

- Added preflight RBAC validation for every resource in a ResourceGraphDefinition (RGD).
  - Selects the service account per resource based on spec.defaultServiceAccounts (namespace-specific override first, then * catch-all).
  - Performs SelfSubjectAccessReview calls (create, update, delete, patch, watch) using impersonation of the selected service account.
  - Runs these SARs concurrently and aggregates denials into a single error.
  - Treats permission failures as “invalid resource graph,” marking the RGD Ready=False and surfacing a precise, multi-line message listing missing verbs per resource.
  - Wired the authorization client and impersonation into the shared client set (pkg/client/set.go) and extended the fake client to support it for tests.

Added E2E tests: Namespaced and cluster-scoped permission checks that assert Ready=False with exact denial messages.
A deletion flow test that applies an RGD, verifies, deletes it, and confirms the object is gone.

## Why it’s useful

- Fast, explicit failure: Users get immediate, actionable feedback when service accounts lack required verbs, before any workloads are created.
- Least-privilege safety: Encourages narrowly scoped RBAC by showing exactly which verbs are needed for each resource.
- Better UX and debuggability: Aggregated, deterministic messages show all missing permissions in one place and propagate to the Ready condition.

Signed-off-by: Jakob Möller <jakob.moeller@sap.com>
Copy link
Contributor

@fabianburth fabianburth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea and the usability gain introduced with this feature!

I suppose you decided against SelfSubjectRulesReview since

  • we wanted to actually fail in case of insufficient authorization (so, we cannot affort to have incomplete results, because otherwise we might wrongfully fail)
  • we can handle a slight delay as we are not a UI

I figure with several large rgds, we'd quickly be doing quite a bunch of requests - especially at ramp up. Wouldn't we have to be concerned about rate limiting here (or is this less of a concern because each rgd might use a different user identity)?

Comment on lines +208 to +213
for _, resource := range resources {
selectedServiceAccount := selectServiceAccountForResource(accounts, resource)
if err := verifyResourcePermissions(ctx, r.clientSet, selectedServiceAccount, resource); err != nil {
errs = append(errs, err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to run this concurrently as well?

}

// getServiceAccountUserName builds the impersonate service account user name.
// The format of the user name is "system:serviceaccount:<serviceaccount>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The format of the user name is "system:serviceaccount:<serviceaccount>"
// The format of the user name is "system:serviceaccount:<namespace>:<serviceaccount>"

nit

@jakobmoellerdev
Copy link
Contributor Author

The main reason to not use SelfSubjectRulesReview is because it is not supposed to be used as such. https://kubernetes.io/docs/reference/kubernetes-api/authorization-resources/self-subject-rules-review-v1/#SelfSubjectRulesReview:

The returned list of actions may be incomplete depending on the server's authorization mode, and any errors experienced during the evaluation. SelfSubjectRulesReview should be used by UIs to show/hide actions, or to quickly let an end user reason about their permissions. It should NOT Be used by external systems to drive authorization decisions as this raises confused deputy, cache lifetime/revocation, and correctness concerns. SubjectAccessReview, and LocalAccessReview are the correct way to defer authorization decisions to the API server.

The PR is still in draft because it is a proof of concept created based on another issue discussion and it should be optimized much further before being considered for performance reasons or similar IMO

@k8s-triage-robot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 12, 2025
Copy link
Member

@a-hilaly a-hilaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jakobmoellerdev for taking a stab at this, i left a few "philosophical" questions below.

Comment on lines +147 to +150
// make sure the service accounts configured in the resource graph definition have the required permissions.
if err := r.reconcileResourceGraphDefinitionPermissions(ctx, processedRGD.Resources, rgd.Spec.DefaultServiceAccounts); err != nil {
return nil, nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a special condition for this? i'm thinking that it shouldn't necessarily be Terminal (as in marking the RGD not Ready) - but more of an advisory condition informing the admins that one of their permissions set isn't configured correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing I think thats possible

Comment on lines +294 to +304
review := &authv1.SelfSubjectAccessReview{
Spec: authv1.SelfSubjectAccessReviewSpec{
ResourceAttributes: &authv1.ResourceAttributes{
Namespace: namespace,
Verb: verb,
Group: gvr.Group,
Version: gvr.Version,
Resource: gvr.Resource,
},
},
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about the practical applicability of this permission checking approach. The implementation assumes static namespace values, but in reality, some/maybe-majority of kro use cases leverage CEL expressions for dynamic namespace resolution. Users commonly write templates with expressions like ${schema.spec.something + "-test"}, ${resource.configMap.metadata.namespace}, or even conditional logic that resolves to different namespaces based on runtime conditions, thinking ${somethingBoolean? a: b}.

I'm wondering if we should consider limiting this validation to resources that don't specify a namespace field at all, since those inherit the "instance" namespace by default. At least for that subset of resources, we could provide meaningful validation. Alternatively, we could delegate permission checking to the instance reconcilers, but this introduces its own challenges. Sometimes we'll only be able to evaluate permissions at certain points in the DAG execution (dynamic variables point to other resources status?), and by the time we discover permission issues, we may have already created some resources. This somewhat defeats the purpose of "eager" validation.

I'm questioning whether this feature provides enough value in its current form. Perhaps we should either constrain this to a smaller well defined scope where it can actually work reliably, or rethink the approach entirely to handle dynamic values appropriately for namespaces (as not allowing dynamic namespace settings)? would love to hear your thoughts @fabianburth @jakobmoellerdev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm im wondering about this. Wouldn't the emulated namespace for the resource be the only thing that we care about? as in, whatever namespace on the resource resolves to? And that is currenty the value im using here (im using processedRGD.Resources).
But maybe im just wrong on this :D My assumption currently is that the namespace value im using here is the pre-evaluated one from the graph analysis.

Sometimes we'll only be able to evaluate permissions at certain points in the DAG execution (dynamic variables point to other resources status?), and by the time we discover permission issues, we may have already created some resources. This somewhat defeats the purpose of "eager" validation.

This is only true if KRO moves fully away from its graph evaluation. I see it like CEL itself. Sure there are some checks you could only do at runtime, but this should not be the norm. I think KRO should default to eager validation and to only fallback to runtime checks if nothing else works.

In an ideal world, KRO uses its graph to divide a topology into applicable "layers" in which each layer is validatable when all previous layers have been considered valid and applied as well.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants