Skip to content

Conversation

a-hilaly
Copy link
Member

@a-hilaly a-hilaly commented Sep 17, 2025

Introduces declarative resource collections to KRO, enabling dynamic
creation of multiple resources based on runtime data. This addresses the
fundamental limitation where each RGD resource block could only create a
single Kubernetes resource, requiring static definition of all resources
at design time.

This proposal adds the forEach field to RGD spec.resources, allowing
users to create N resources where N is determined at runtime from arrays,
maps, or discovered resources. The design maintains KRO's declarative
model and security guarantees while solving real world use cases like
creating resources per tenant, per namespace, or per availability zone.

While this proposal also covers related enhancements such as collection
support in externalRef, status field computations with collections, and
various integration points, the implementation will be split into phases.
The initial implementation will focus on core forEach functionality with
basic collections, followed by incremental additions of advanced features
like collection chaining and external resource discovery.

The enhancement is fully backward compatible, requiring no API version
changes or migration. Existing RGDs continue to work unchanged, and users
can adopt collections incrementally as needed. All changes are additive
and maintain KRO's existing reconciliation semantics.

Link for markdown: https://github.com/kubernetes-sigs/kro/blob/e39a08bea545769ca22271784e8737ef1f45b936/docs/design/proposals/collections.md

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 17, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-hilaly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 17, 2025
@a-hilaly a-hilaly changed the title design(kro): add declarative resource collections support design: add declarative resource collections support Sep 17, 2025
@a-hilaly a-hilaly changed the title design: add declarative resource collections support KREP-002: add declarative resource collections support Sep 17, 2025
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool proposal! looking forward to what others have to say on this!

create at runtime. If resource B depends on a collection resource A, the
dependency is at the collection level - B waits for all of A's resources to be
ready based on A's readyWhen condition. This maintains KRO's existing DAG
validation and cycle detection without modification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean it is not possible to wait for individual elements created by a collection as a result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct - dependencies work at the collection level, not individual items. A collection is treated as a single node in the DAG, so for the node to be ready, all "resources" (expanded resources) must satisfty the readyWhen condition.

This is by design to maintain the predictable DAG behaviour. If we allow dependencies on individual collection items the DAG structure would become dynamic - edges could appear/disappear as collection items are added/removed during reconciliation. This would break kro's static analysis and make dependency resolution unpredictable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. However that means we should probably write a disclaimer on this in the design. I think this is a good tradeoff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated the "readyWhen with Collections" section, L225

team: frontend
```

#### Discover resources using forEach semantics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is extertnalRef thus exclusive with forEach as well? I am assuming forEach would only be usable together with template right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEach would work with both template and externalRef. The mutual exclusion is between template and externalRef themselves (a resource can either create something or reference something, not both) but forEach enhances both


- Turn KRO into a general purpose programming language
- Replace specialized controllers that need complex state machines
- Provide workflow orchestration with conditional branching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have conditional branching with if like logic and I think its in theory also possible to define conditional runs by checking if forEach evaluates to no elements right? Are you referring to more complex conditions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring more to situation where items inside a collection are dependent resources that need conditional branching. For example create item of index 5 after index 3 reach a specific state and if item 0 exists.

When a resource uses forEach, special variables become available within CEL
expressions in the template:

- `each.item` - The current element being iterated over
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im wondering if we will keep adding more "introspection" fields on ones own structured data in a resource. if so, maybe it could make sense to structure this under self instead of each, because self can be used for other fields as well. WDYT?

- id: workers
  forEach: ${spec.workers} # ["alice", "bob", "charlie"]
  template:
    kind: Pod
    metadata:
      name: ${'worker-' + self.item} # worker-alice, worker-bob, worker-charlie
      labels:
        worker-name: ${self.item} # alice, bob, charlie
        worker-index: ${string(self.index)} # "0", "1", "2"
        total-workers: ${string(self.length)} # "3"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interresting, the more i think about it the more I do like the idea of using self or this in general (I think we should add them to https://github.com/kubernetes-sigs/kro/blob/main/pkg/graph/validation.go#L38-L65 ). I chose each because it clearly signals iteration context and to be very frank it's mainly inspired from terraform's for_each. I do really want to use self with readyWhen conditions, i'm not entirely sure for collections.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with self is the the potential confusion if kro adds other self-referential features in the future, self might mean the current resource being created, not the iteration (as in self.metadata.name vs self.item

Copy link
Contributor

@michaelhtm michaelhtm Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we instead allowed users to define a variable?
eg:

forEach: ${worker: spec.workers}

or maybe something similar to how we handle existing functions (.all(w, condition))
we can try to enforce a var: collection syntax only for forEach.
The same functionalities as each.item would turn into worker.item..
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me it is acceptable if self is used for more than the iterator values because it is effectively an inflection API. the only thing we would be careful on is which fields / keywords are available or blocked, but I think thats doable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you ever have nested forEach that could cause collisions for self?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelhtm That's an interesting suggestion! The challenge with parsing variable name directly from the forEach expression (like ${worker: spec.workers}) is that it conflicts with CEL's ternary operator syntax. In CEL : is a special token dedicated to expressions like condition ? x : y.
My worry is that trying to reliably distinguish between the two syntaxes would require a fragile prefix parsing that could break with comcplex expressions, we'd basically be extending CEL syntax in a non-standard way.

However your idea sparked a thought - we could use a CEL function apparoch:

forEach: ${iterate("worker", spec.workers)}

This would be a CEL compliant and allow custom variable names. We could leverage our ast inspector to detect those and only allow constant strings in the first argument of the function - and the function would return a special iterator context that kro recognizes. Thought this is a little unambiguis and might feed weird-ich.

We could also use a simpler syntax like:

forEach:
  iterateOver: ${spec.workers}
  var: "worker"

it's slightly more expicit and avoid parsing complexity and leaves room for future extensions. Thoughts on these ideas?

Copy link
Member Author

@a-hilaly a-hilaly Sep 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellistarn regarding nested collections - i didn't plan on adding native support for nested collections with a single resource block. However nested for loops is already achievable by composing multiple RGDs: one RGDx creates a collection, and a parent RGDy can forEach over that new type (generated by RGDx). For example:

# RGDx: creates pods per zone #######
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
spec:
  schema:
    kind: ZonePods
    spec: { zones: []string }
  resources:
    - id: pods
      forEach: ${spec.zones}
      template:
        kind: Pod
        # ...
# RGDy: creates ZonePods per region  
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
spec:
  schema:
    kind: RegionalApp
    spec: { regions: []string }
  resources:
    - id: regional
      forEach: ${spec.regions}
      template:
        kind: ZonePods
        # ...

That said your question make me think about natively supporting nested iterations - we could potentially support multiple iterators:

  forEach:
    - iterator: ${spec.regions}
      var: "var1"
    - iterator: ${spec.zones}
      var: "var2"
  template:
    kind: Pod
    metadata:
      name: ${'pod-' +var1.item + '-' +  var2.item}

This would create a cartesian product (us-east x zone-a = pod-us-east-zone-a etc...). Thought the semantics get tricky, what happens when one iterator is empty? I'm realizing this could be problematic we could endup evaluating name: ${iter1.item}-${iter2.item}-${iter3.item} evaluating to something like a-- which isn't a valid kubernetes name foe example. Though this feels like it shouldn't be considered a design issue but rather an implementation issue that users needs to handle.

@jakobmoellerdev ^ if we ever go this route i think the above is a good reason to keep the runtime emulation system and protect users from generating awkward/invalid kuberentes names/namespaces.

Copy link

@vast-raymond-page vast-raymond-page Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-hilaly iterator read redundant in the foreach context. Perhaps itemCollection or just collection for the list of items to iterator over. Instead of var perhaps just name. Also instead of item, element or entry are good alternatives if you need ideas.

  forEach:
    - collection: ${spec.regions}
      name: "region"
    - collection: ${spec.zones}
      name: "zone"
  template:
    kind: Pod
    metadata:
      name: ${'pod-' +region.entry+ '-' +  zone.entry}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vast-raymond-page I like this suggestion as well. It's more kro like.

- `each.index` - The index (for arrays)
- `each.key` - The key (for maps)
- `each.value` - The value (for maps)
- `each.length` - Total size of the collection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length breaks with the pattern you establish here because it is the only value you do not define on the individual item, but the collection, maybe worth separating out

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, was reading and going to say the same thing. It would seem more logical to have this be separate, or maybe introduce another special variable like collection, etc, if you want to provide properties of the collection itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely both right in here, i'll eliminate each.length. @cheeseandcereal +1 for the another special variable to expose such properties - i purposefully shrank the proposal to not talk about them, i think there is a bigger picture where users can access resources via resources[id] or resources[index]. Also related to what @jakobmoellerdev proposed above #679 (comment)

single resource block to create multiple resources based on collections. The
forEach field can iterate over:

- Arrays from the custom resource spec

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever iterate over maps, or even over structs? Can always be added later, but this struck me as a common pattern in other PLs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maps are support - i missed highlighting them clearly in the overview section.. The implementation in #632 handles map iteration.

By struct, do you mean iterating over object fields as key-value pairs? or iterating over an array of objects?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By struct, do you mean iterating over object fields as key-value pairs

Yes this. Kind of wild -- I'd probably hold off until we knew the use case better. I could imagine wanting to walk the YAML though.

```

The resource collection is considered ready only when ALL worker pods are
running. You can also use:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sugg: easier to read if all of the new aggregation functions are defined in a list. i.e. add all here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps all of the new cel functions we're supporting. Right now it's a bit difficult to distinguish functions that already exist from net-new ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Update the section to include all the supported functions/macros by default.


- `workers.any(w, condition)` - Ready when at least one instance matches
- `workers.filter(w, condition)` - Filter instances based on a condition
- Custom aggregations across the collection

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom aggregations would be any combination of CEL functions, for example:

  • workers.filter(w, w.status.ready).size() >= 10
  • workers.map(w, w.status.replicas).sum() >= 10
  • workers.filter(w, w.metadata.labels.env == "productino").all(w, w.status.ready)

totalWorkers: ${size(workerPods.items)}
runningWorkers: ${size(workerPods.items.filter(w, w.status.phase == 'Running'))}
allWorkersReady: ${workerPods.all(w, w.status.conditions.exists(c, c.type == 'Ready' && c.status == 'True'))}
workerNames: ${workerPods.items.map(w, w.metadata.name).join(',')}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we put any thought into shadowing? i.e., if you put items.map(item, ...). Item is being shadowed. I suppose we have validations in place for this? Another option is to standardize w with a keyword.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, kro tracks "local" variables properly regardless of the variable name. We have an AST inspector that tries to handle variable scoping and shadowing correctly: https://github.com/kubernetes-sigs/kro/blob/main/pkg/cel/ast/inspector.go#L311-L333

# collection resource can be referenced by its ID
paths:
| # kro will do it's magic making sure to replace this with the proper type.
${microservices.map(svc, {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cel expression re-evaluate on any change to any service? Maybe somewhat challenging. Would that lead users to optimize performance with something like forEach: ${spec.services.map(s, s.status.loadbalancer.ingress.hostname[0]). This would resolve the services in isolation on line 325 before doing the more expensive evaluation on line 341.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cel expression re-evaluate on any change to any service?

Correct to change to any service or every "resyncPeriod" duration.

This would resolve the services in isolation on line 325 before doing the more expensive evaluation on line 341.

Are you suggesting users could reference "static" values directly from the instance spec rather than from the dynamic service resources?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline -- I think this is essentially the equivalent of a compiler optimization. We can essentially look at subtrees of the DAG, detect changes at specific nodes, and only re-evaluate children within the DAG. For large DAGs, this could result in significant performance improvements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can consider using evaluation result storage as a later iteration. Kubernetes famously does not recompute cel results everytime, and we could think of something similar. What do you think?

@a-hilaly
Copy link
Member Author

@n3wscott BTW I called this one KREP-002 because technically you did write KREP-001 :)

@a-hilaly a-hilaly force-pushed the krep-2/collections branch 2 times, most recently from c0d54db to a1a0f15 Compare September 18, 2025 07:02
Introduces declarative resource collections to KRO, enabling dynamic
creation of multiple resources based on runtime data. This addresses the
fundamental limitation where each RGD resource block could only create a
single Kubernetes resource, requiring static definition of all resources
at design time.

This proposal adds the `forEach` field to RGD `spec.resources`, allowing
users to create N resources where N is determined at runtime from arrays,
maps, or discovered resources. The design maintains KRO's declarative
model and security guarantees while solving real world use cases like
creating resources per tenant, per namespace, or per availability zone.

While this proposal also covers related enhancements such as collection
support in `externalRef`, status field computations with collections, and
various integration points, the implementation will be split into phases.
The initial implementation will focus on core `forEach` functionality with
basic collections, followed by incremental additions of advanced features
like collection chaining and external resource discovery.

The enhancement is fully backward compatible, requiring no API version
changes or migration. Existing RGDs continue to work unchanged, and users
can adopt collections incrementally as needed. All changes are additive
and maintain KRO's existing reconciliation semantics.
Copy link

@brandonjbjelland brandonjbjelland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exceptional! Thanks for the hard work here.

Comment on lines +13 to +14
Currently, KRO has a fundamental limitation: each resource block in an RGD
creates exactly one resource. The number of resources must be known at design

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can't a CEL expression conditionally create resources today? If so, the following is technically more correct:

Suggested change
Currently, KRO has a fundamental limitation: each resource block in an RGD
creates exactly one resource. The number of resources must be known at design
Currently, KRO has a fundamental limitation: each resource block in an RGD
creates zero or one of each declared resource. The number of resources must be known at design

Comment on lines +192 to +195
As the collection changes (items added/removed), KRO automatically reconciles
the resources - creating new ones for added items and deleting resources for
removed items. Each resource maintains a stable identity based on its position
or key in the collection.
Copy link

@brandonjbjelland brandonjbjelland Sep 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question / Potential Risk:

For array-based forEach resources, what is the stateful key that tracks each resource? My read is that position or numeric index is used. If so, that makes resources fragile: any change in element order or array size could cause churn. This is the same problem Terraform had before introducing for_each.

Options I see:

  1. Set-based keys – Require arrays to track state via a guaranteed unique element (not index).
  2. Maps only – Drop array-based iteration entirely; force arrays to be converted to maps.
  3. Composite keys – Derive stable keys via a hash function or serialization utility (though duplicates like ["foo","foo"] still break).

Tradeoffs:

  • Option 1 burdens users with ensuring key uniqueness and detecting collisions.
  • Option 2 feels heavy-handed, but simplifies state stability and aligns with Terraform’s shift away from count.
  • Option 3 may rely on api-machinery helpers, but isn’t a silver bullet for duplicates and runtime uniqueness violations (eg when the array is based on an externalRef) would make for a bad user experience.

Terraform analogy: count is fragile and considered an anti-pattern except for conditional resource creation. Iteration should use for_each with a toset() when executed on a list to guarantee uniqueness. My concern: forEach on arrays risks re-creating that “count trap.”

Am I over-thinking this? Or is this already solved in kro?

a-hilaly added a commit to a-hilaly/kro that referenced this pull request Sep 21, 2025
After this conversation we'd like to book these two keywords just in case they
will be helpful for collections

kubernetes-sigs#679 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants