
GCP: experiment nodepools per SIG #8004


Open
ameukam opened this issue Apr 17, 2025 · 7 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@ameukam
Member

ameukam commented Apr 17, 2025

Experiment to determine whether splitting prowjobs across different GKE nodepools allocated per SIG is viable.

SIG Node CI gave a +1 to participating in the experiment.

The experiment should run after 1.33 is released and before the 1.34 code freeze.

TODO:

cc @SergeyKanzhelev

/assign
/sig k8s-infra
/sig node
/priority backlog
/milestone v1.34

@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Apr 17, 2025
@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/node Categorizes an issue or PR as relevant to SIG Node. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 17, 2025
ameukam added a commit to ameukam/k8s.io that referenced this issue Apr 24, 2025
Related:
  - kubernetes#8004

Set up a dedicated nodepool with taints using an external Terraform module.
We want to evaluate running prowjobs on COS with newer machine types.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@BenTheElder
Member

Wait, why would we do nodepools per sig? What's the purpose?

Won't this give us worse bin-packing than sharing one node pool ...?

@BenTheElder
Member

cc @kubernetes/sig-k8s-infra-leads

@ameukam
Member Author

ameukam commented Apr 24, 2025

> Wait, why would we do nodepools per sig? What's the purpose?

I think splitting up prowjobs could help simplify maintenance and potentially improve performance. A per-SIG split looks like the simplest approach to me for this experiment. Also, we don't have to do it for all the SIGs; we can target only 2-3 SIGs. With more nodepools, we can have different instance types, disk types, etc.

> Won't this give us worse bin-packing than sharing one node pool ...?

I don't think so, at least not for the prowjobs owned by SIG Node.

@BenTheElder
Member

BenTheElder commented Apr 24, 2025

Doing a few SIGs in order to experiment with different node configs, and then consolidating on one afterwards, makes sense to me.

Permanently splitting wouldn't. The issue description could use more detail: as written, it sounds like an experiment towards permanently splitting.

@BenTheElder
Member

I still wouldn't frame them as per-SIG node pools, or even split them. We should just try the desired node pool configs on specific jobs, which will span SIGs.

@ameukam
Member Author

ameukam commented Apr 24, 2025

> I still wouldn't frame them as sig node pools. Or even split them, we should just try desired node pool configs on specific jobs which will span sigs

🤔 What are the cons of doing a permanent split?

@BenTheElder
Member

> 🤔 what are the cons of doing a permanent split ?

- worse bin packing / autoscaling
- more complex and confusing prowjob config
- poorer portability between clusters
- a false sense of per-SIG utilization (in reality, a lot of jobs are for many SIGs; maybe SIG Testing owns them or something, but they're producing test results for many SIGs)
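The bin-packing concern can be illustrated with a small first-fit-decreasing sketch. The CPU requests, the 8-CPU node size, and the `nodes_needed` helper are all hypothetical, chosen only to show the effect; the real kube-scheduler and cluster autoscaler are more sophisticated than this.

```python
def nodes_needed(requests, capacity):
    """Count nodes needed to place `requests` (CPU cores) via first-fit decreasing.

    Hypothetical helper: a simplified stand-in for scheduler/autoscaler behavior.
    """
    free = []  # remaining capacity on each node opened so far
    for req in sorted(requests, reverse=True):
        if req > capacity:
            raise ValueError(f"request {req} exceeds node capacity {capacity}")
        for i, room in enumerate(free):
            if room >= req:
                free[i] = room - req  # fits on an existing node
                break
        else:
            free.append(capacity - req)  # nothing fits: scale up by one node
    return len(free)


# Hypothetical per-job CPU requests, grouped by owning SIG.
jobs = {
    "sig-node": [6, 6],
    "sig-testing": [2, 2],
}
NODE_CPUS = 8  # hypothetical machine size

shared = nodes_needed([r for reqs in jobs.values() for r in reqs], NODE_CPUS)
split = sum(nodes_needed(reqs, NODE_CPUS) for reqs in jobs.values())
print(f"shared pool: {shared} nodes, per-SIG pools: {split} nodes")
```

In the shared pool, the small jobs fill the capacity left over on the large jobs' nodes (2 nodes total). With one pool per SIG, that leftover capacity is stranded and an extra node is needed (3 nodes total).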

ameukam added a commit to ameukam/test-infra that referenced this issue May 14, 2025
Related to:
  - kubernetes/k8s.io#8004

Use tolerations to schedule e2e-containerd prowjobs to a dedicated
nodepool added in kubernetes/k8s.io#8035.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
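The dedicated-nodepool approach relies on Kubernetes taints and tolerations: the nodepool carries a taint, and only pods with a matching toleration may be scheduled onto it. Below is a simplified sketch of the matching rule, modeled on the upstream `Toleration.ToleratesTaint` logic in `k8s.io/api/core/v1`. The `dedicated=sig-node:NoSchedule` taint is hypothetical and not necessarily what kubernetes/k8s.io#8035 configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""          # empty key matches any taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""       # empty effect matches any taint effect


def tolerates(tol, taint):
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    if tol.operator == "Exists":
        return True  # Exists ignores the taint value
    return False


def can_schedule(tolerations, taints):
    """A pod fits only if every hard (NoSchedule/NoExecute) taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical taint on the dedicated nodepool.
POOL_TAINTS = [Taint("dedicated", "sig-node", "NoSchedule")]

e2e_containerd = [Toleration("dedicated", "Equal", "sig-node", "NoSchedule")]
print(can_schedule(e2e_containerd, POOL_TAINTS))  # tolerated job
print(can_schedule([], POOL_TAINTS))              # job without tolerations
```

A toleration only permits scheduling onto the tainted pool; it does not require it. Steering the jobs onto that pool additionally needs a `nodeSelector` or node affinity.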
ameukam added a commit to ameukam/test-infra that referenced this issue May 17, 2025
Related to:
  - kubernetes/k8s.io#8004

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/test-infra that referenced this issue May 17, 2025
Related to:
  - kubernetes/k8s.io#8004

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>

3 participants