
🌱 Kubeconfig-Based Provider #45


Open
wants to merge 17 commits into main from the kubeconfig-based-provider branch

Conversation

@FourFifthsCode commented Jun 6, 2025

Follow up to @christensenjairus PR

This adds a kubeconfig-based multi-cluster provider. It reads secrets carrying a given label from the operator's namespace and engages the clusters those secrets point to.

Engaging and disengaging clusters happens in real time: the provider watches all secrets in the operator's namespace, or it can be configured to watch a specific namespace.

  • Added tests
  • Updated the example to follow the ConfigMap pattern
  • Added a simple hash to avoid re-engaging a cluster on secret updates that don't modify the kubeconfig data (a sketch follows below)

Related issue: #22
Original PR: #26
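
The hash check mentioned in the last bullet is roughly this shape (a minimal sketch; the function name, data key, and surrounding fields are illustrative rather than the exact names used in the PR):

```go
package kubeconfigprovider

import (
	"crypto/sha256"
	"encoding/hex"
)

// hashKubeconfig returns a stable fingerprint of the raw kubeconfig bytes so
// that secret updates which don't touch the kubeconfig (new labels,
// annotations, unrelated data keys) don't trigger a disengage/re-engage cycle.
func hashKubeconfig(kubeconfig []byte) string {
	sum := sha256.Sum256(kubeconfig)
	return hex.EncodeToString(sum[:])
}

// In the secret handler the flow is then roughly:
//
//	newHash := hashKubeconfig(secret.Data["kubeconfig"]) // data key assumed
//	if newHash == previousHash {
//	    return nil // nothing relevant changed, keep the running cluster
//	}
//	// otherwise stop the old cluster and engage one built from the new kubeconfig
```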

@k8s-ci-robot requested a review from skitt June 6, 2025 20:02
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: FourFifthsCode
Once this PR has been reviewed and has the lgtm label, please assign sttts for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


linux-foundation-easycla bot commented Jun 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot requested a review from sttts June 6, 2025 20:02
@k8s-ci-robot added the cncf-cla: no label (indicates the PR's author has not signed the CNCF CLA) on Jun 6, 2025
@k8s-ci-robot
Contributor

Welcome @FourFifthsCode!

It looks like this is your first PR to kubernetes-sigs/multicluster-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/multicluster-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Jun 6, 2025
@FourFifthsCode force-pushed the kubeconfig-based-provider branch from 4d8f5cb to 389fe13 on June 9, 2025 14:16
@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) and removed the cncf-cla: no label on Jun 9, 2025
@FourFifthsCode
Author

closes #26

@FourFifthsCode
Author

@sttts @embik @corentone does this address your concerns from #26?

mikenairn added a commit to Kuadrant/dns-operator that referenced this pull request Jun 13, 2025
* Adds a multi cluster controller capable of watching remote clusters
  and reconciling their DNSRecord resources
* Uses https://github.com/kubernetes-sigs/multicluster-runtime and
  kubeconfig provider from
kubernetes-sigs/multicluster-runtime#45

Signed-off-by: Michael Nairn <mnairn@redhat.com>
@embik
Member

embik commented Jun 13, 2025

Hi @FourFifthsCode, thanks for the PR, sorry for missing it. I'll assign myself to review it ASAP.

@embik self-requested a review June 13, 2025 15:45
@FourFifthsCode
Author

FourFifthsCode commented Jun 13, 2025

@embik no worries, thank you!

@corentone left a comment

Sorry, some questions may come from my ignorance of the details of multicluster-runtime. It does feel like a few of the things you do here could be handled by the runtime library instead of being reimplemented in the provider.

Command line options:
- `-c, --context`: Kubeconfig context to use (required)
- `--name`: Name for the secret (defaults to context name)
- `-n, --namespace`: Namespace to create the secret in (default: "default")


Do we really want to use the default namespace for credentials?
Maybe make it required?

Author

I think the default namespace is OK, but I did change the default service account to use a different name; I don't think we should use default for that.
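
For reference, the secret the script creates and the provider watches looks roughly like this when expressed in Go (a sketch; the label key and data key below are assumptions, not the provider's published constants):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// kubeconfigSecret sketches the shape of the secret the script creates and the
// provider watches: named after the kubeconfig context by default, placed in
// the namespace given by -n, labelled so the provider can discover it, and
// carrying the raw kubeconfig bytes.
func kubeconfigSecret(name, namespace string, kubeconfig []byte) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,      // defaults to the kubeconfig context name
			Namespace: namespace, // defaults to "default" per the flags above
			Labels: map[string]string{
				// assumed label key; the provider selects secrets by a label like this
				"multicluster-runtime.sigs.k8s.io/kubeconfig": "true",
			},
		},
		Data: map[string][]byte{
			"kubeconfig": kubeconfig, // assumed data key
		},
	}
}
```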


## How It Works

1. The kubeconfig provider watches for secrets with a specific label in a namespace


shouldn't it always be a secret within the same namespace as the controller?

Author

The namespace is configurable on the provider.

- Creates a new controller-runtime cluster
- Makes the cluster available to your controllers
3. Your controllers can access any cluster through the manager
4. RBAC rules ensure your operator has the necessary permissions in each cluster


Can you clarify where those rules are?

RBAC rules on the remote clusters ensure the operator's service account on the controller's cluster has the necessary permissions in the remote clusters.

Author

updated :)

- `-n, --namespace`: Namespace to create the secret in (default: "default")
- `-a, --service-account`: Service account name to use from the remote cluster (default: "default")

### 2. Customizing RBAC Rules


This step is not too clear to me. Who sets that? Also, what is the Binding?
And maybe we want to suggest a Role rather than a ClusterRole?

Author

Added the ability to create a Role or ClusterRole in the script.

}

// Create the provider first, then the manager with the provider
entryLog.Info("Creating provider")


Can we log providerOpts here?

Author

great idea

Author

I'm thinking of removing this log entry since the options will be logged in provider.Run anyway.


// handleSecret processes a secret containing kubeconfig data
func (p *Provider) handleSecret(ctx context.Context, secret *corev1.Secret, mgr mcmanager.Manager) error {
if secret == nil {


this could be a SecretToActiveCluster() func

}

log.Info("Cluster already exists, updating it")
if err := p.removeCluster(clusterName); err != nil {


Does the cluster need to be removed? Could that be dangerous?
It's going to shut down and restart the whole controller for this cluster.

Member

I don't think there is a clean way to update a cluster.Cluster (e.g. restart its underlying cache) and the change in the kubeconfig that is detected here could be anything (from a credentials refresh to a completely different server URL). Stopping controllers and re-starting them is probably the best way to make sure changes from the kubeconfig get reflected.

Author

Yeah, my initial feeling is that if a secret is deleted, I would expect reconciliation on the associated cluster to stop without having to restart a pod to re-initialize. If that means restarting the controller, I would be inclined to accept that trade-off. Are there any other implications or trade-offs I'm missing with controller restarts?


Assuming a kubeconfig is only establishing the pipe, is there a mode where we could swap the credentials without flushing the whole Cluster? We could have it assume everything is the same on the other end? (The side problem is that if the endpoint suddenly points to another cluster, it would be broken.)

@FourFifthsCode I do agree: if a secret is deleted we should NOT flush the whole MC-Controller. Clusters should be able to come and go.

One thing I'm slightly worried about with removing the cluster is whether the MC-Controller relies on that signal. Imagine a controller that looks at service endpoints, for example: when the cluster is removed and re-added, is there a risk that the controller removes the endpoints globally? I wouldn't consider a creds refresh safe if that's possible.

@embik are we limited by the current cluster.Cluster definition at this point? Maybe we need our own lightweight Cluster?

@embik (Member) commented Jun 17, 2025

> We could have it assume everything is the same on the other end? (The side problem is that if the endpoint suddenly points to another cluster, it would be broken.)

That's a big assumption I think we cannot guarantee, unfortunately. I would value correctness (i.e. temporarily drop the cluster from global endpoints) over potential breakage (the Kubernetes API endpoint changed and now the running controller is blind to changes because the cluster.Cluster and its active watches are totally broken).

> @embik are we limited by the current cluster.Cluster definition at this point? Maybe we need our own lightweight Cluster?

I don't think it's just cluster.Cluster; none of the controller-runtime types we build on are really meant to be restarted (other than "stop cluster, start cluster"). We can look at whether it would be possible for us to provide implementations of the interfaces that can be restarted, but I would argue that is out of scope for this particular PR.
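
For readers following the thread, the "stop, then re-create" flow being discussed comes down to something like the sketch below (the Provider fields, import paths, and mgr.Engage's exact signature are assumptions; error handling is trimmed):

```go
package kubeconfigprovider

import (
	"context"

	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/controller-runtime/pkg/cluster"

	mcmanager "sigs.k8s.io/multicluster-runtime/pkg/manager" // assumed import path
)

// replaceCluster sketches the "stop, then re-create" flow: cancel the old
// cluster's context, build a new cluster.Cluster from the updated kubeconfig,
// and engage the manager again. p.clusters, p.cancels and p.log are
// illustrative fields on the provider.
func (p *Provider) replaceCluster(ctx context.Context, mgr mcmanager.Manager, name string, kubeconfig []byte) error {
	// Stopping the old cluster tears down its cache and the controllers
	// watching it, as discussed above.
	if cancel, ok := p.cancels[name]; ok {
		cancel()
		delete(p.cancels, name)
		delete(p.clusters, name)
	}

	cfg, err := clientcmd.RESTConfigFromKubeConfig(kubeconfig)
	if err != nil {
		return err
	}
	cl, err := cluster.New(cfg)
	if err != nil {
		return err
	}

	clusterCtx, cancel := context.WithCancel(ctx)
	p.clusters[name] = cl
	p.cancels[name] = cancel

	// Start the new cluster's cache in the background.
	go func() {
		if err := cl.Start(clusterCtx); err != nil {
			p.log.Error(err, "cluster stopped with error", "cluster", name)
		}
	}()

	// Engage the manager so controllers start watching the new cluster.
	return mgr.Engage(clusterCtx, name, cl)
}
```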

return fmt.Errorf("failed to create cluster: %w", err)
}

// Copy indexers to avoid holding lock.


Why is this necessary?
I feel this would be common to all providers. Should we move such logic into the main runtime code?
I feel that the provider should only have to do
cl := cluster.New(config)
and then mgr.Engage(cl).

CC @embik

Author

This could well be an over-optimization; the only reason for copying the slice is to shorten the time the lock is held and reduce the chance of deadlocks.

Anything the main runtime code can do that providers then don't need to implement sounds great, as long as it doesn't impact the flexibility of the abstraction too much.


The part that worries me with the locks is that they are a bit scattered around: you have some in the top-level reconcile and some in the internal removeCluster.
I do have some love for the defer pattern:

  xx.Lock()
  defer xx.Unlock()

because I'm sure every error path is automatically covered, so there is no risk of a lock being left held.
Also, I think it'd be good to have locking at only a single level, either in the "inside funcs" or in the "top-level ones".

It's purely a readability recommendation; we could either:

  1. Lock more coarsely; for example, only lock the top-level Reconcile and IndexAllFields.
    1.1 Reconcile would lock RO until it has made a decision on what to do, then lock RW to proceed with the full edit.
    1.2 IndexField would grab a RW lock for the entirety of its update of the indexers.

  2. Hide the locking inside helper functions GetCluster, SetCluster, RemoveCluster, GetIndexers, SetIndexers, RemoveIndexers (and make the copy there). It would make the code a bit less readable, but the locking would be tucked away.

Another thought: is there a risk of removing indexed fields on a cluster that is being deleted? I wonder if the granular locks could create some weird half in-between situations (I locked for read but now I need to write, and things have changed?).
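
Option 2 above could look something like this minimal sketch (the Provider fields shown are illustrative, not necessarily the names in the PR):

```go
package kubeconfigprovider

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// Provider fields here are illustrative.
type Provider struct {
	lock     sync.RWMutex
	clusters map[string]cluster.Cluster
	cancels  map[string]context.CancelFunc
}

// getCluster, setCluster and deleteCluster keep all mutex handling in one
// place, so Reconcile and the other top-level funcs never lock directly.
func (p *Provider) getCluster(name string) (cluster.Cluster, bool) {
	p.lock.RLock()
	defer p.lock.RUnlock()
	cl, ok := p.clusters[name]
	return cl, ok
}

func (p *Provider) setCluster(name string, cl cluster.Cluster, cancel context.CancelFunc) {
	p.lock.Lock()
	defer p.lock.Unlock()
	p.clusters[name] = cl
	p.cancels[name] = cancel
}

func (p *Provider) deleteCluster(name string) {
	p.lock.Lock()
	defer p.lock.Unlock()
	if cancel, ok := p.cancels[name]; ok {
		cancel() // stop the cluster's cache and controllers
	}
	delete(p.cancels, name)
	delete(p.clusters, name)
}
```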

log.Info("Successfully added cluster")

// Engage the manager if provided
if mgr != nil {


Can the manager be nil? If the manager is nil, shouldn't it be an error?

Member

Theoretically speaking, you could just start the provider and call Get on it whenever you want to fetch a cluster. It would then not interact with the wider mc-runtime primitives, but it would still be functional in its own way. I'm inclined to allow such a use case for flexibility (maybe you want to use the provider in something other than a typical mc-runtime controller setup).


What are the use cases for providers outside the mc-runtime context? And why would they be in the mc-runtime repo?

Could we maybe have this provider (maybe others too?) use standard controller-runtime code then?
The whole provider here really looks like a controller (predicate, HandlerFunc, client and informer cache).

@FourFifthsCode (Author) commented Jun 16, 2025

That's a very interesting point; I wouldn't be opposed to using controller-runtime code at all if it covers most of the use case. I think I'll give that a try and see how far I get.

Member

What I meant was that you can use an mc-runtime provider (that usually would be used for mc-runtime controllers) without the full mc-runtime machinery if you have a problem to solve that mc-runtime is perhaps not flexible enough for. We were close to using that approach in kcp-dev/api-syncagent and I'm pretty confident others might also hit limitations of the mcManager in mc-runtime while wanting to build on top of the primitives offered in mc-runtime and controller-runtime.

It's not meant to say that some providers should be used without mc-runtime, but it gives implementers the flexibility to use them in other scenarios.

I don't think such providers can and/or should be implemented in standard controller-runtime code then; there isn't much of a difference here (providers don't use much mc-runtime-specific code anyway).
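
To make the "provider without the full mc-runtime machinery" idea concrete, standalone usage could be as simple as the sketch below (the constructor, Options, Run signature, and import path are assumptions based on this thread, not the PR's exact API):

```go
package main

import (
	"context"
	"fmt"

	// assumed import path and API for the provider added in this PR
	kubeconfigprovider "sigs.k8s.io/multicluster-runtime/providers/kubeconfig"
)

func useProviderStandalone(ctx context.Context) error {
	// Construct the provider; Options and its Namespace field are assumed names.
	p := kubeconfigprovider.New(kubeconfigprovider.Options{Namespace: "operators"})

	// Run it without a manager: nothing gets engaged, clusters are only served via Get.
	go func() {
		if err := p.Run(ctx, nil); err != nil {
			fmt.Println("provider stopped:", err)
		}
	}()

	cl, err := p.Get(ctx, "prod-us-east-1")
	if err != nil {
		return err // e.g. cluster not (yet) known because its secret hasn't been seen
	}
	_ = cl.GetClient() // use the remote cluster's cached client directly
	return nil
}
```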


// Start the cluster
go func() {
if err := cl.Start(clusterCtx); err != nil {


@embik why are the providers starting the clusters? Could it be made the responsibility of the runtime?
Then a provider could focus on cl = cluster.New(config) and mgr.Engage(cl),
and then cl.Delete or cl.Update?

Member

As hinted at in https://github.com/kubernetes-sigs/multicluster-runtime/pull/45/files#r2149539056, this makes providers functional in their own right if you want to use them outside of an mcmanager.

In addition, some provider implementations might construct cluster.Cluster in a special way; the kcp provider, for example, creates something called a scopedCluster that provides restricted access to a larger cache, and such a scoped cluster cannot be started: https://github.com/kcp-dev/multicluster-provider/blob/9c638a4a0047f86564e00bbbd527c386bfabc378/apiexport/cluster.go#L134

@sttts any further thoughts on this maybe?

…etInformer from struct

Also removed kubeconfig provider go.mod to simplify testing with `make test`

Disabled metrics server with manager options

Signed-off-by: Codey Jenkins <FourFifthsCode@users.noreply.github.com>
… watching secrets

Signed-off-by: Codey Jenkins <FourFifthsCode@users.noreply.github.com>
@FourFifthsCode
Author

FourFifthsCode commented Jun 16, 2025

@corentone, @embik I converted the provider to use a controller instead of an informer as suggested... great idea!

I'm using SetupWithManager instead of a Run function to set up the kubeconfig provider controller with the manager. Hopefully those who use kubebuilder will feel some familiarity with that. 😄

This also reduces some of the concurrency complexity from before and moves it into the controller-runtime logic instead. We no longer need an additional goroutine to run the provider.

Curious what you all think; we can always go back to the informer if you all feel better about that.
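
In wiring terms, the SetupWithManager approach described above would look roughly like this (a sketch; apart from SetupWithManager itself, the constructor, options, signatures, and import paths are assumptions):

```go
package main

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	// assumed import paths
	mcmanager "sigs.k8s.io/multicluster-runtime/pkg/manager"
	kubeconfigprovider "sigs.k8s.io/multicluster-runtime/providers/kubeconfig"
)

func run(ctx context.Context) error {
	provider := kubeconfigprovider.New(kubeconfigprovider.Options{}) // assumed constructor/options

	// Create the multicluster manager with the provider, then register the
	// provider's secret-watching controller the same way a kubebuilder
	// reconciler would be registered.
	mgr, err := mcmanager.New(ctrl.GetConfigOrDie(), provider, manager.Options{})
	if err != nil {
		return err
	}
	if err := provider.SetupWithManager(mgr); err != nil {
		return err
	}

	// ...set up your own multicluster reconcilers against mgr here...

	return mgr.Start(ctx)
}
```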

@corentone left a comment

thank you for making it a controller, it seems that made your life a lot easier!

I think my only question now is around locking; I think you covered all cases but I was slightly worried about readability. I think this piece is not really performance critical so we may be able to afford slightly coarser locks to improve readability and reduce bug potential?


@FourFifthsCode
Author

Yeah, I like the defer pattern on the locks a lot too for readability, and it's what we were mostly using before. I ran a code-analysis tool over it to look for race/deadlock conditions, and it made most of those suggestions. I don't have a strong opinion either way, but that was the reasoning behind them.

@FourFifthsCode
Author

I like the idea of refactoring into methods that handle more of the CRUD logic around the cluster lifecycle; I'll take a shot at that and see how it goes. It also feels a little closer to single responsibility. Probably worth a little more verbosity.

…ller responsibilities.

Improved readability in lock handling. Added more tests around kubeconfigs updating.

Signed-off-by: Codey Jenkins <FourFifthsCode@users.noreply.github.com>
@FourFifthsCode
Author

FourFifthsCode commented Jun 17, 2025

@corentone @embik

Refactored controller logic into smaller functions. I do like this direction better!

Also added a couple more tests around kubeconfig changing (update with new config, update with same config).

Removed the copy logic for now; I think it was an over-optimization, and removing it avoids those half in-between situations.

}

// removeCluster removes a cluster by name with write lock and cleanup
func (p *Provider) removeCluster(clusterName string) {


This method looks fine; two small questions:
1/ Does the order between removal from the clusters map and cancellation matter? Can stoppage fail or something?
2/ Should the removal wait for the cluster to fully stop? (Should there be some channel in the Start method to wait on, to know it has fully returned?) I'm not actually sure what Start on the Cluster actually does, so it likely doesn't matter.

Author

It looks like Start on the Cluster mostly just starts the cache. I explored using the WaitForCacheSync methods on the cache and followed them all the way back to the informer cache's waitForStarted func, which also just uses the context to watch for cancellation. So I don't think there is currently any mechanism that lets us wait; we can only trigger the cancel and report that the cluster no longer exists when it is fetched.

Member

> So I don't think there is currently any mechanism that lets us wait; we can only trigger the cancel and report that the cluster no longer exists when it is fetched.

That's correct as far as I can tell; we also added #32 to better address this.

Author

Also, the main place a cluster is fetched looks like it's through the multicluster manager's GetCluster func, which means that if the cluster doesn't exist, the consumer needs to handle that case. I did see that I wasn't returning ErrClusterNotFound; I updated Get() to return that error.
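
The updated Get presumably reads along these lines (a sketch; the package qualifier for ErrClusterNotFound and the provider's field names are assumptions):

```go
// Get returns a previously engaged cluster by name, or ErrClusterNotFound so
// that callers of the multicluster manager's GetCluster can tell "not (yet)
// engaged" apart from other failures.
func (p *Provider) Get(ctx context.Context, clusterName string) (cluster.Cluster, error) {
	p.lock.RLock()
	defer p.lock.RUnlock()
	if cl, ok := p.clusters[clusterName]; ok {
		return cl, nil
	}
	// Sentinel error mentioned above; the package path is assumed here.
	return nil, multicluster.ErrClusterNotFound
}
```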

…when cluster isn't found

Signed-off-by: Codey Jenkins <FourFifthsCode@users.noreply.github.com>
…ice account creation

Signed-off-by: Codey Jenkins <FourFifthsCode@users.noreply.github.com>
@FourFifthsCode
Author

Updated the script/README to include RBAC and service account creation. Hopefully this helps clarify usage and how things work in more detail.
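
As an illustration of the kind of access the script sets up on a remote cluster, a minimal read-only ClusterRole for the ConfigMap example might look like this (the name and resource list are illustrative; a namespaced Role works the same way):

```go
package main

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// remoteReadRole sketches the access the operator's service account needs on a
// remote cluster for the ConfigMap example. A real deployment would scope the
// rules to whatever its controllers actually reconcile.
func remoteReadRole() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "multicluster-kubeconfig-provider"}, // illustrative name
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
			Resources: []string{"configmaps"},
			Verbs:     []string{"get", "list", "watch"},
		}},
	}
}
```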
