[Multi-k8s] Use different configuration for different k8s contexts #5353

Open
Michaelvll opened this issue Apr 24, 2025 · 0 comments

@Michaelvll
Collaborator

Discussed in #5352

Originally posted by SchKng April 24, 2025
Hi all,

I'm currently setting up SkyPilot for our org, and I'm running into some limitations with the way Kubernetes is handled at the config.yaml level.

Our setup

Here's a summary of our setup:

  • 3 kubernetes clusters
    • GKE (on GCP)
    • EKS (on AWS)
    • K3S (on-prem, deployed with sky local up)
  • Sky API server deployed remotely (multi-user teams)

Kubernetes context management

As far as I understand, Kubernetes is handled as a single cloud through the config.yaml file.
Each Kubernetes cluster has to be added to the allowed_contexts list in the kubernetes section of the config.

I'm going to take the example of the autoscaling configuration to illustrate my point.
If I want to enable the autoscaling feature, I can only do it at the "kubernetes" level in the config.yaml deployed in the API server:

kubernetes:
  allowed_contexts:
    - gke_context
    - eks_context
    - on_prem_context
  provision_timeout: 900
  autoscaler: gke

Let's say I want to force a job / cluster to run on our EKS cluster (--cloud k8s --region eks_context).
As it stands, given the previous config on the remote API server, provisioning will attempt to use the GKE autoscaler and fail.

Of course, I could override the configuration through the CLI or through my local config.yaml (as specified here).
However, as an admin of the whole thing, I'd like to be able to manage all of this within the config.yaml of the remote API server, rather than hope that users don't forget to change or override their config!
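For illustration, a user targeting the EKS cluster could work around the problem with a local config.yaml like the following (a sketch reusing the keys from the server config above; the exact client-side merge semantics depend on SkyPilot's config override behavior):

kubernetes:
  allowed_contexts:
    - eks_context
  autoscaler: none

This is exactly the kind of per-user workaround an admin would rather not depend on.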

Benefits & design proposal

This could also have other benefits, such as specifying a different service account / custom_metadata / ... for each k8s context: basically all of the kubernetes options offered in the config.yaml.

A proposal could be to have something like:

kubernetes:
  gke_context:
    autoscaler: gke
    provision_timeout: 600
    remote_identity: my-gke-service-account
  eks_context:
    autoscaler: xxx
    remote_identity: my-eks-service-account
  on_prem_context:
    autoscaler: none
    provision_timeout: 300

Please let me know if I missed something or if there's another way to do that.
Thanks a lot!

@Michaelvll Michaelvll added the P0 label Apr 24, 2025