Skip to content

feat(resolver): add CRD informer-backed schema resolver with lazy extraction#1176

Open
a-hilaly wants to merge 4 commits into
kubernetes-sigs:mainfrom
a-hilaly:schema-caching-but-better
Open

feat(resolver): add CRD informer-backed schema resolver with lazy extraction#1176
a-hilaly wants to merge 4 commits into
kubernetes-sigs:mainfrom
a-hilaly:schema-caching-but-better

Conversation

@a-hilaly
Copy link
Copy Markdown
Member

Replace the TTL-only schema resolution with a three-layer chained resolver:
core (compiled-in) -> CRD (informer-backed) -> fallback (TTL-cached discovery)

The CRD resolver uses a custom GVK indexer on the CRD informer for O(1)
lookups and lazily extracts OpenAPI schemas on first access, caching them
until the CRD is updated or deleted. This gives event driven invalidation
without paying the memory cost of converting schemas for every CRD in the
cluster -- only CRDs that kro actually references are materialized.

Builder is a pure consumer that takes a SchemaResolver interface with no
lifecycle concerns. Resolver wiring, ordering, and lifecycle are owned by
cmd/controller/main.go. The CRD resolver takes a
CustomResourceDefinitionInformer directly rather than a clientset, and the
informer factory is registered with the controller manager for lifecycle
management.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-hilaly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 19, 2026
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch from 1d49aeb to 895a18e Compare March 19, 2026 23:45
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2026
@a-hilaly
Copy link
Copy Markdown
Member Author

/retest

@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch 2 times, most recently from 4f25b71 to c975bbe Compare March 20, 2026 01:10
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 20, 2026
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch from c975bbe to 2987ee4 Compare March 20, 2026 01:18
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 20, 2026
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch from 2987ee4 to 90a5f68 Compare March 20, 2026 02:03
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 20, 2026
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch 2 times, most recently from 5444a98 to e3f43b6 Compare March 20, 2026 06:43
Copy link
Copy Markdown
Member

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the general idea of a CRD informer to give us events on the schemas, I don't really like how its implemented here in a fully custom index informer. I would much prefer the normal controller runtime informer setup that is present in the other controllers. What was the main reasoning for choosing this way here?

Comment thread cmd/controller/main.go Outdated
}
crdInformerFactory := apiextensionsinformers.NewSharedInformerFactory(apiextensionsClientset, 0)
crdInformer := crdInformerFactory.Apiextensions().V1().CustomResourceDefinitions()
if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a complete custom implementation of basically a normal controller for CRDs, I think this should be unified to controller runtime

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I think we should restrict the informer to only watch for kro based CRDs to limit reach of the events

@a-hilaly
Copy link
Copy Markdown
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 20, 2026
…raction

  Replace the TTL-only schema resolution with a three-layer chained resolver:
  core (compiled-in) -> CRD (informer-backed) -> fallback (TTL-cached discovery).

  The CRD resolver uses a custom GVK indexer on the CRD informer for O(1)
  lookups and lazily extracts OpenAPI schemas on first access, caching them
  until the CRD is updated or deleted. This gives event-driven invalidation
  without paying the memory cost of converting schemas for every CRD in the
  cluster -- only CRDs that kro actually references are materialized.

  Builder is a pure consumer that takes a SchemaResolver interface with no
  lifecycle concerns. Resolver wiring, ordering, and lifecycle are owned by
  cmd/controller/main.go. The CRD resolver takes a
  CustomResourceDefinitionInformer directly rather than a clientset, and the
  informer factory is registered with the controller manager for lifecycle
  management.

  The cache starts empty with no onAdd handler. Schemas are populated on
  demand and evicted by onUpdate/onDelete handlers. onUpdate evicts GVKs
  from both old and new CRD objects to handle served-version changes.
  DefaultCoreResolver and DefaultFallbackResolver provide opinionated
  defaults while CRD resolver wiring remains explicit at the call site.
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch 2 times, most recently from 803cb4c to f635ee3 Compare March 20, 2026 11:13
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch from f635ee3 to 2398e2b Compare March 20, 2026 11:19
@a-hilaly a-hilaly force-pushed the schema-caching-but-better branch from 2398e2b to 098ccea Compare March 20, 2026 11:30
Copy link
Copy Markdown
Member

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better already!

// Used to evict schemas when a CRD is deleted (the object is already
// gone from the cache by the time the reconciler runs).
crdGVKs map[string][]schema.GroupVersionKind
cache cache.Cache
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can just pass the controller runtime client here

// called before the manager is started.
func (r *CRDSchemaResolver) SetupWithManager(mgr ctrl.Manager) error {
if err := mgr.GetCache().IndexField(
context.Background(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, pass cmd context

crd := &apiextensionsv1.CustomResourceDefinition{}
if err := r.cache.Get(ctx, client.ObjectKey{Name: req.Name}, crd); err != nil {
if apierrors.IsNotFound(err) {
// CRD deleted — evict using the reverse index.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move cache functions out so you dont have confusing lock paths


// Look up the CRD from the cache's GVK field index.
var crdList apiextensionsv1.CustomResourceDefinitionList
if err := r.cache.List(context.Background(), &crdList,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, needs different context

// injectKubeEnvelope adds the standard Kubernetes envelope fields (metadata,
// apiVersion, kind) to a CRD schema. CRD specs only contain spec/status —
// the API server injects the rest when serving aggregated OpenAPI.
func injectKubeEnvelope(s *spec.Schema, namespaced bool) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, should probably move to kroschema

}

// convertCRDSchema converts a CRD JSONSchemaProps to a kube-openapi spec.Schema.
func convertCRDSchema(jsonSchema *apiextensionsv1.JSONSchemaProps) (*spec.Schema, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: dont we already have a schema converter for this somewhere?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 22, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@a-hilaly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
presubmits-verify-codegen 098ccea link true /test presubmits-verify-codegen
presubmits-e2e-upgrade-tests 098ccea link true /test presubmits-e2e-upgrade-tests

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants