Respect k3s experimental agentless server option #8289
base: master
Conversation
This is highly unlikely to be accepted.
The closest you'd probably ever get with RKE2 is starting up the kubelet and container runtime, but then leaving the kubelet disconnected from the apiserver so that all it handles is static pods. And I don't really get why you'd want to do that, as you're cutting out nothing except the ability to manage the server as a Kubernetes node. You might as well just taint the node, instead of breaking a bunch of other things.
I guess I'm not even sure what you're trying to accomplish here. Do you want to start up just the RKE2 supervisor API (the bit that is exposed on port 9345) without any of the control plane components running? If that's what you want, you might take a look at something I was tinkering with a while back: https://github.com/brandond/s8r - but note that this hasn't been updated for the recent reorganization of some packages in k3s.
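For anyone evaluating this, a quick way to see whether only the supervisor API is up is to probe port 9345 directly. A minimal sketch in Go follows; the `/ping` path and the skipped TLS verification are my assumptions for illustration, not something stated above.

```go
// probe_supervisor.go: check whether a supervisor API answers on port 9345.
// The /ping path and skipping TLS verification are assumptions for illustration.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// The supervisor typically serves a self-signed CA, so skip verification for this probe.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get("https://127.0.0.1:9345/ping")
	if err != nil {
		fmt.Fprintln(os.Stderr, "supervisor API not reachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("supervisor API responded with", resp.Status)
}
```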
If having ONLY the supervisor API isn't what you wanted, then perhaps this is closer to what you want: Although even this will have issues, as there are still things that will expect a Node reference to eventually become available. As noted in the K3s docs,
Thanks for the quick response @brandond. Yes, having only the Supervisor API started by the long-running server process. I think I want it for similar reasons I would in k3s, and similar to vcluster: to have a Node-less control plane that uses an external etcd (e.g. kine), has no root privileges on the machine it's running on, and starts quickly. Normal full-VM RKE2 Agent nodes would join to this. Granted, with this simple change, I have to orchestrate the apiserver myself. In my example, the flags could still be consumed from generated manifests, but the host bind mounts no longer make any sense. The example Gist has an ephemeral etcd just to be a self-contained example. I will attach a full agent node to it and see how that goes. It at least runs without errors and responds to kubectl requests with the linked gist example. In testing with Tilt, it's also just nice how fast I can iterate with it.
If you want a functional control-plane with RKE2, you'll need the kubelet and containerd. Is there any particular reason you're not just using K3s?
What I want is to run control planes as Pods without VMs. This already works well with other Kubernetes distros. I would like to make RKE2 available as well, for the reasons written on the tin, some of which are not claimed by k3s. And because RKE2 is the Kubernetes my specific internal product already provides to our users. Ideally, most parts of the RKE2 control plane (e.g. not DaemonSets) could run as unprivileged Pods in an orchestration cluster that are invisible to the resulting user cluster. I realize this is not supported and requires duplication of effort on my end; that's fine and expected. I opened this issue somewhat early in the evaluation phase to see how receptive you might be. I was actually pleasantly surprised how easy it was to get running without containerd, presumably because of the k3s inheritance and previous efforts there around vcluster.
What exactly do you mean by this? What does this look like, in practice?
There are no daemonsets. The RKE2 control-plane is composed entirely of static pods, using manifests created by the RKE2 supervisor process and executed by the kubelet.

While I would love to enable some variety of an agentless server, even in an unsupported capacity, we're unlikely to move forward with any approach that doesn't result in a working cluster. So, an agentless server using kine or external etcd, where RKE2 continues to manage the control-plane pods through containerd + kubelet, is something that we could enable. An agentless server where the supervisor comes up and does nothing until someone externally provides the control-plane components using some external automation is not something we'd be interested in.

Note that neither K3s nor RKE2 is architected to support heterogeneous clusters. They both expect that the correct distro's supervisor API is available, and that all nodes are running the same distro. If you're attempting to mix and match distros, or run RKE2 agents without RKE2 servers and supervisor controllers, things are unlikely to work well.

What you're talking about would probably require a custom executor that integrated with the host cluster's Kubernetes API to create pods alongside the RKE2 server pod, instead of relying on the kubelet + containerd to run the pods. As far as I know the kubelet cannot be run within an unprivileged pod.

https://github.com/rancher/rke2/tree/master/pkg/podexecutor
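To make the custom-executor idea concrete, here is a rough sketch of what such an implementation might look like: each control-plane start-up hook creates a Deployment in the host cluster via client-go instead of writing a static pod manifest for the kubelet. The `Executor` interface shown is a simplified stand-in for illustration, not the actual interface used by k3s/rke2.

```go
// Package hostclusterexecutor sketches an executor that runs control-plane
// components as Deployments in a host cluster instead of as static pods.
// The Executor interface below is a simplified stand-in, not the real one.
package hostclusterexecutor

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Executor is a hypothetical, trimmed-down version of the hooks a supervisor
// would call when it is time to start each control-plane component.
type Executor interface {
	APIServer(ctx context.Context, args []string) error
	Scheduler(ctx context.Context, args []string) error
	ControllerManager(ctx context.Context, args []string) error
}

type HostCluster struct {
	Client    kubernetes.Interface
	Namespace string
	Image     string
}

func (h *HostCluster) APIServer(ctx context.Context, args []string) error {
	return h.deploy(ctx, "kube-apiserver", args)
}

func (h *HostCluster) Scheduler(ctx context.Context, args []string) error {
	return h.deploy(ctx, "kube-scheduler", args)
}

func (h *HostCluster) ControllerManager(ctx context.Context, args []string) error {
	return h.deploy(ctx, "kube-controller-manager", args)
}

// deploy creates (or, in a real operator, would reconcile) a single-replica
// Deployment for one control-plane component in the host cluster.
func (h *HostCluster) deploy(ctx context.Context, name string, args []string) error {
	replicas := int32(1)
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: h.Namespace},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": name}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": name}},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:    name,
						Image:   h.Image,
						Command: append([]string{name}, args...),
					}},
				},
			},
		},
	}
	_, err := h.Client.AppsV1().Deployments(h.Namespace).Create(ctx, dep, metav1.CreateOptions{})
	return err
}
```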
K8s control planes run inside orchestration Kubernetes clusters as Pods.
I know; it was just an example. There are DaemonSets on a typical RKE2 control-plane Node. These aren't part of Pod-based control planes, which I suppose is obvious.
Clusters are homogeneous; it's just that the control plane is Node-less, is provided a datastore, and runs as Pods with minimal privileges rather than in VMs.
This is interesting and I could work towards this, but there are complexities like mounts that may be awkward to wire up unless it's end-user pluggable. These control planes are typical Kubernetes applications: they have no host mounts, and things like configs, certs and tokens are generated in an operator and then provided in typical k8s fashion, e.g. mounting from Secrets. Rotations are performed by rollout (Pod replacement). It's also a little weird for scheduling; I'm not sure you can elegantly express that the rke2 Pod is going to create an apiserver Pod next to it. Two containers in a Pod fit more naturally there, but you would generate that in advance. I'd be happy to even use ...
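As an illustration of the Secret-mounted pattern described above, here is a small sketch using the upstream API types in Go; the Secret names, image, and mount paths are made up for the example.

```go
// Package sketch illustrates mounting operator-generated Secrets (certs,
// tokens, config) into a control-plane container instead of host paths.
// Secret names, image, and mount paths are hypothetical.
package sketch

import corev1 "k8s.io/api/core/v1"

func controlPlanePodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		Volumes: []corev1.Volume{
			{
				Name: "serving-certs",
				VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: "rke2-serving-certs"},
				},
			},
			{
				Name: "server-config",
				VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: "rke2-server-config"},
				},
			},
		},
		Containers: []corev1.Container{{
			Name:  "rke2-server",
			Image: "example.registry/rke2-runtime:dev", // placeholder image
			VolumeMounts: []corev1.VolumeMount{
				{Name: "serving-certs", MountPath: "/var/lib/rancher/rke2/server/tls", ReadOnly: true},
				{Name: "server-config", MountPath: "/etc/rancher/rke2", ReadOnly: true},
			},
		}},
	}
}
```

Rotation in this model amounts to replacing the Secret contents and rolling the Pods, as described above.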
I'm tinkering with the outline of a new executor implementation that would do something like what you're describing. It would basically convert the RKE2 supervisor to a cluster operator that hosts the supervisor API and manages control-plane pod deployments. The current code for managing etcd is very closely tied to the idea of each supervisor managing a local etcd member, so I'm ignoring that and assuming that there is an existing etcd operator that could be deployed to manage that, and the cluster can simply be pointed at a service endpoint. You'd lose the ability to manage snapshots and such via the RKE2 CLI, but rewiring the current logic to work with StatefulSets or the like is way more involved than I want to get for a proof of concept.
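To sketch the "pointed at a service endpoint" part: the apiserver arguments could reference an etcd Service managed by an external operator rather than a supervisor-local member. The Service name, namespace, and certificate paths below are placeholders, not anything RKE2 generates today.

```go
// Package sketch builds apiserver datastore flags against an externally
// managed etcd Service instead of a supervisor-local member. Names, ports,
// and certificate paths are placeholders for illustration.
package sketch

import "fmt"

// apiserverEtcdArgs returns the flags the control-plane Deployment would
// pass to kube-apiserver when etcd is provided by an external operator.
func apiserverEtcdArgs(svc, namespace string) []string {
	endpoint := fmt.Sprintf("https://%s.%s.svc.cluster.local:2379", svc, namespace)
	return []string{
		"--etcd-servers=" + endpoint,
		// Client certs would come from a Secret mounted into the pod,
		// as in the Secret-mounting sketch above.
		"--etcd-certfile=/etc/kubernetes/pki/etcd/client.crt",
		"--etcd-keyfile=/etc/kubernetes/pki/etcd/client.key",
		"--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt",
	}
}
```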
An example would be a Pod-based cluster-api control plane provider. (I'm not sure Pod-based is anything more than an idea one might implement from cluster-api's perspective; it's not defined by types in the code, AFAIK.) At a glance, anything that is part of the "workload" cluster's steady operation or that listens to remote traffic, e.g. the Supervisor API, I'd want isolated from an operator. I'd want it to have little or no access to the management apiserver, and to keep concerns about eviction / scaling / versioning separate.
I've been poking at this a bit more and it's... more complicated than I initially anticipated. It is easy enough to get the supervisor to run the control-plane pods as Deployments in a host cluster; I have that working. Other than the previously mentioned difficulties around managing the etcd cluster pods, the supervisor API is proving somewhat difficult to break out from the apiserver. The control-plane pods normally run with host network, so the components can always find each other on the loopback address, and clients can rely on every apiserver IP also hosting the supervisor API on a different port. This will need to either be broken out on the k3s side, or the apiserver pods will need to proxy the supervisor port back to the supervisor pods. There are also complications around the agent tunnels that the apiserver uses to connect up to the kubelet.
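One way to read the "proxy the supervisor port back" option is a small sidecar in each apiserver pod that forwards 9345 to wherever the supervisor pods live. A minimal sketch, assuming a plain TCP passthrough (no TLS termination) is enough; the upstream Service name is a placeholder.

```go
// A sidecar-style TCP forwarder so every apiserver address also answers on
// the supervisor port (9345), even when the supervisor runs in separate pods.
// The upstream Service address is a placeholder.
package main

import (
	"io"
	"log"
	"net"
)

const (
	listenAddr   = ":9345"
	upstreamAddr = "rke2-supervisor.cluster-system.svc.cluster.local:9345" // placeholder Service
)

func main() {
	ln, err := net.Listen("tcp", listenAddr)
	if err != nil {
		log.Fatalf("listen %s: %v", listenAddr, err)
	}
	log.Printf("forwarding %s -> %s", listenAddr, upstreamAddr)
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Printf("accept: %v", err)
			continue
		}
		go forward(conn)
	}
}

// forward pipes a single client connection through to the supervisor Service.
// TLS is passed through untouched, so certificates stay a supervisor concern.
func forward(client net.Conn) {
	defer client.Close()
	upstream, err := net.Dial("tcp", upstreamAddr)
	if err != nil {
		log.Printf("dial %s: %v", upstreamAddr, err)
		return
	}
	defer upstream.Close()
	go io.Copy(upstream, client)
	io.Copy(client, upstream)
}
```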
I think in my own trials that's addressed by a couple things. Things that expect to talk on loopback go in the same Pod. A LoadBalancer Service doesn't work if addressing a specific Pod matters.
If things are going to be properly abstracted, they need to work behind a LoadBalancer or Ingress. I wouldn't personally settle for anything that required agents to be able to connect directly to one or more pods. It is unfortunate that the apiserver only supports setting a single value for its advertise-address flag, and ensures that only the advertised addresses of running apiservers are present in the kubernetes service endpoint list. Makes it hard to hide multiple apiservers behind a loadbalancer that may itself have a dynamic list of IPs.
Proposed Changes
The experimental k3s flag `--disable-agent` is passed through already; however, items that contact containerd directly in `rke2 server` do not respect this flag. Supporting this flag allows running `rke2 server` independently of local static pods. This might be useful for:

- `rke2 server` as a pod for quick testing

This is a hidden experimental feature. We could copy the same server configuration docs from k3s or keep them omitted.
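For context, the shape of the change is roughly the following sketch: paths in `rke2 server` that talk to containerd directly get gated on the existing flag. All names below are hypothetical stand-ins, not the actual rke2 code.

```go
// Package sketch illustrates the shape of the change: containerd-dependent
// setup in `rke2 server` is skipped when --disable-agent is set. All names
// are hypothetical stand-ins, not the actual rke2 functions.
package sketch

import "context"

type Config struct {
	DisableAgent             bool
	ContainerRuntimeEndpoint string
}

func setupServer(ctx context.Context, cfg *Config) error {
	if !cfg.DisableAgent {
		// These steps assume a local containerd socket and a kubelet
		// that will pick up static pod manifests.
		if err := waitForContainerd(ctx, cfg.ContainerRuntimeEndpoint); err != nil {
			return err
		}
		if err := stageStaticPods(ctx, cfg); err != nil {
			return err
		}
	}
	// The supervisor API and control-plane bootstrap run either way.
	return startSupervisor(ctx, cfg)
}

// Stubs so the sketch is self-contained; the real implementations live in rke2.
func waitForContainerd(ctx context.Context, endpoint string) error { return nil }
func stageStaticPods(ctx context.Context, cfg *Config) error       { return nil }
func startSupervisor(ctx context.Context, cfg *Config) error       { return nil }
```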
Types of Changes
Fix for hidden experimental feature.
Verification
I verified this change by applying the patch onto `v1.32.4-rke2r1` and running that build as a StatefulSet. Etcd and the apiserver were not rebuilt; the published images were used.

https://gist.github.com/glennpratt/6272c94db3093127a948a37c5a378a0e
Testing
This change is not currently covered by unit tests. Happy to add any tests desired, but I may need some assistance if it's beyond unit tests.
Linked Issues
User-Facing Change
This is a hidden feature, undocumented in rke2 but documented in k3s.
Further Comments