Request for a community owned GCP project for minikube #7414

Open
medyagh opened this issue Oct 15, 2024 · 28 comments
Labels
sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@medyagh
Member

medyagh commented Oct 15, 2024

Hello, minikube maintainer here. I would like to ask for a GCP project for minikube owned by the CNCF community. Our release/test infra currently lives in a Google-owned project, and we would like to explore migrating it to a CNCF-owned project. Is this the right place to ask for it?

related: kubernetes/test-infra#33654

medyagh added the sig/k8s-infra label Oct 15, 2024
@ameukam
Member

ameukam commented Oct 15, 2024

cc @BenTheElder @upodroid

@BenTheElder
Member

BenTheElder commented Oct 15, 2024

Can you outline some more detailed requirements so we can determine how best to provide them?

We generally try to work from "my project needs VMs for testing cgroups v2 which we cannot do locally in a CI container" => "well we have AWS credits, let's use EC2, make sure to use boskos to rent access" or "my project needs to host container images" => use registry.k8s.io (which is AWS+GCP; there are standardized docs for setting up image hosting here in this repo). We have to maintain balance across the budgets available to the project.

What infra we do provide we also set up here in git wherever possible (terraform, bash, etc.), so it's auditable and so others can chip in in the future, instead of just creating cloud project admins and having them create random resources. So we need to know what to spin up, exactly.

We have a lot of existing shared resources in the project for things like CI and release.
We have also been asking subprojects to, for example, use github to host binaries, to avoid digging a deeper dependency on vendor credits when there are reasonable alternatives.

@medyagh
Member Author

medyagh commented Oct 21, 2024

There are multiple aspects to it, since this is an 8-year-old infrastructure used for both test/release and for hosting live apps and released artifacts (binaries, tarballs, ISOs, Docker images, ...).
Currently there is no concrete plan for the design of the new infrastructure.

I agree that we would like to leverage GitHub binaries and GitHub Actions as much as possible where doable; some cases might not work, such as the minikube preload tarball images.

Currently the idea is to get a footprint in the community-owned infra and then try to move little by little, without disrupting the system or unrealistic over-capacity re-engineering.

The current requirements that come to mind:

  • GCS buckets (ISOs, Preload Tarballs, released binaries, json files, ...)
  • Compute Engine VMs (jenkins CI and test agents)
  • Cloud Run (to host minikube apps such as triage-party, gopoph-server)
  • Artifact Registry to host various images (in-house addons such as storage provisioner, kic base image,...)

This is a good list to start with, but not comprehensive.

The idea is to get a footprint in the new project and re-evaluate the path forward.

@BenTheElder
Member

BenTheElder commented Oct 22, 2024

Currently the idea is to get a footprint in the community-owned infra and then try to move little by little, without disrupting the system or unrealistic over-capacity re-engineering.

We have already engineered systems for e.g. hosting images though, and we do not want to dig a new unsustainable hole for these.

From the specific examples:

Artifact Registry to host various images (in-house addons such as storage provisioner, kic base image,...)

We do not want users consuming directly from any paid SaaS like this, it is a liability for the project (we have no flexibility to shift costs when utilization and funding shifts).

We shouldn't re-introduce this.

GCS buckets (ISOs, Preload Tarballs, released binaries, json files, ...)

See the above comment; also, these can be hosted on github at no cost?

Compute Engine VMs (jenkins CI and test agents)

Can we use our existing CI infra? We already have a lot of resources behind this and they're shared/pooled across the project. We care a lot about things like making sure that VMs get cleaned up when they're no longer in use.

At the scale that we're supporting, if every project runs custom unmonitored systems we can't keep track of the waste.
When subprojects rent an e2e project/account on prow and create a test cluster there, we have some assurances that whether or not the test itself is a good use of resources, the resources will not be forgotten to run indefinitely.
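
For illustration, a minimal Go sketch of the create-use-always-delete discipline a rented e2e project enforces, driving the gcloud CLI; the project, zone, and instance names below are placeholders, not real k8s-infra resources:

```go
// ephemeral_vm.go - a sketch only, not the actual k8s-infra tooling.
// Create a throwaway VM in a rented project, run a command over SSH,
// and always delete the VM afterwards, even if the test step fails.
package main

import (
	"log"
	"os/exec"
)

// run shells out and streams output to the log.
func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	return cmd.Run()
}

func main() {
	const (
		project  = "example-rented-project" // placeholder: would be lent by boskos
		zone     = "us-central1-b"
		instance = "minikube-e2e-tmp"
	)

	if err := run("gcloud", "compute", "instances", "create", instance,
		"--project", project, "--zone", zone); err != nil {
		log.Fatalf("create failed: %v", err)
	}
	// Cleanup runs even if the test command below fails, so the VM is never forgotten.
	defer func() {
		if err := run("gcloud", "compute", "instances", "delete", instance,
			"--project", project, "--zone", zone, "--quiet"); err != nil {
			log.Printf("cleanup failed (this would leak a VM): %v", err)
		}
	}()

	if err := run("gcloud", "compute", "ssh", instance,
		"--project", project, "--zone", zone,
		"--command", "echo hello from the test VM"); err != nil {
		log.Printf("test command failed: %v", err)
	}
}
```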


Currently there is no concrete plan for the design of the new infrastructure.

The idea is to get a footprint in the new project and re-evaluate the path forward.

That's just not how we run k8s infra though, it's not transparent or sustainable.

Everything we've lifted and shifted previously we've spun up as a new copy in k8s infra, with the specifics checked in, so others can read through, edit/PR, and otherwise take over in the future.

We haven't granted any subproject the ability to arbitrarily create cloud resources in a project because it's not accountable and it's not reproducible. Everything we're running can be traced back to e.g. https://github.com/kubernetes/k8s.io/tree/main/infra/gcp/terraform and is something the SIG (as steward) has agreed is reasonable to run (and we've always sought out the most effective answers; we've had to work hard to reach sustainable spend, up to and including things like working with SIG Scalability to evaluate their test workloads and adjust frequency and scheduling).

@BenTheElder
Member

cc @dims (chair) in addition to TLs (#7414 (comment))

@BenTheElder
Member

All of the infra we've migrated has been similarly old if not older, and it does take a lot of work, but I also think we really don't want to regress from all the effort we've put in so far and the ground rules we've established (such as not permitting non-community-owned accounts into our CI), which are all based on mitigating real issues we've experienced in the past.

It's really important that I or any of the other infra leads can quit and someone else can pick up the pieces without blockers, and that we keep an eye on sustainable spend and know what it is that we're funding and what the usage trends are.

@medyagh
Member Author

medyagh commented Oct 28, 2024

I understand, and I agree with leveraging GitHub as much as possible. Some of the artifacts can be hosted on GitHub, such as binaries, as part of the release assets.
However, some cannot, such as the preload tarballs, since they would need to be generated per Kubernetes version per container runtime, and they get generated after minikube is released; that would require a separate release tag or possibly a new Kubernetes project just for preload generation.
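
To illustrate the scale of that matrix, here is a small Go sketch: one artifact per (Kubernetes version, container runtime, architecture) combination, all produced after the minikube release itself is cut. The version set and filename pattern are illustrative assumptions, not minikube's actual naming scheme:

```go
// preload_matrix.go - a back-of-the-envelope sketch of the preload artifact
// matrix described above: one tarball per (Kubernetes version, container
// runtime, architecture). The filename pattern is illustrative only.
package main

import "fmt"

func main() {
	k8sVersions := []string{"v1.30.0", "v1.31.1", "v1.32.0"} // hypothetical set
	runtimes := []string{"docker", "containerd", "cri-o"}
	arches := []string{"amd64", "arm64"}

	count := 0
	for _, kv := range k8sVersions {
		for _, rt := range runtimes {
			for _, arch := range arches {
				fmt.Printf("preloaded-images-%s-%s-%s.tar.lz4\n", kv, rt, arch)
				count++
			}
		}
	}
	// 3 versions x 3 runtimes x 2 arches = 18 artifacts per refresh,
	// all generated after the minikube release itself is cut.
	fmt.Println("total artifacts:", count)
}
```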

There are also many jobs that build ISOs and KIC images per PR and push them to the PR; that would not be doable on free GitHub Actions machines, since building ISOs needs beefy machines.

Currently we have 80 internal automation jobs (not Dependabot) that bump new versions of ISO/image software and push a new ISO during off-peak hours (midnight); those would not be implementable using GitHub or GitHub Actions.

Also, as mentioned in my previous comment, we have multiple hosted pieces of software running for minikube that are essential to running the minikube project, currently deployed to Cloud Run.

@BenTheElder
Member

However, some cannot, such as the preload tarballs, since they would need to be generated per Kubernetes version per container runtime, and they get generated after minikube is released; that would require a separate release tag or possibly a new Kubernetes project just for preload generation.

The content contained in github releases is mutable, even after advertising a release publicly.

Are these "preload tarballs" essentially a set of container images? Because that sounds like if we host it we're going to have the registry.k8s.io egress problem duplicated. Per above it sounds like these are advertised directly from GCS buckets, which is not a cost-effective approach and not something we want to do again.

Cost effectiveness aside, it limits our ability to make decisions later about what resources to use for hosting as users become dependent on the buckets and make assumptions about them.

Again, we have an established process and common infra for container image hosting: https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io#managing-kubernetes-container-registries

@upodroid has been working on migrating the staging to artifact registry and may have some updates for the process but we don't have to block on that.

there are also many jobs that build ISOs and Kic Images Per PR and push to the PR, that would not be doable in Free github action machines, that would need beefy machines to build ISOs.

That's a distinct problem from where they're hosted though. The output of the jobs can be copied where we need it ...?

Also, as mentioned in my previous comment, we have multiple hosted pieces of software running for minikube that are essential to running the minikube project, currently deployed to Cloud Run.

ACK ... We still need an accounting of what exactly.

Should probably prioritize the most critical assets first.

@medyagh
Member Author

medyagh commented Nov 13, 2024

Are these "preload tarballs" essentially a set of container images?
The preloads are not images; they are essentially the filesystem, compressed, for a specific runtime / filesystem storage / Kubernetes version. That way both VM and container drivers can spin up quickly without having to load each image individually into the container runtime.
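
For a concrete picture, a hedged Go sketch of how such a preload could be consumed: the lz4-compressed tarball is expanded straight into the node's /var tree so the runtime's image store is already populated. The filename and paths are assumptions for illustration, not minikube's actual implementation:

```go
// preload_extract.go - an illustrative sketch (not minikube's actual code) of
// consuming a preload: the lz4-compressed tarball is expanded directly into the
// node's /var tree so the container runtime starts with its image store already
// populated, instead of pulling/loading images one by one.
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Placeholder filename; the real artifacts are keyed by Kubernetes
	// version, runtime, and architecture.
	preload := "preloaded-images-v1.31.1-containerd-amd64.tar.lz4"

	// GNU tar with an external lz4 decompressor, extracting into /var where
	// the runtime's storage (e.g. /var/lib/containerd) lives on the node.
	cmd := exec.Command("tar",
		"--use-compress-program=lz4",
		"-C", "/var",
		"-xf", preload)
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	if err := cmd.Run(); err != nil {
		log.Fatalf("extracting preload: %v", err)
	}
}
```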

@BenTheElder
Member

Ok, but we still have to sustainably host the egress if we're paying for it in k8s infra. We have an allocation for the core repos' binaries (we get a bandwidth budget that we negotiated based on that need), and we have registry.k8s.io.

We have to be careful with introducing content hosts because we have limited ability to cut usage and manage costs. We've been asking subprojects to use GitHub releases to host files. We probably would do this for Kubernetes too but we have a huge legacy around that and we receive an ongoing donation specifically for that problem.

@ameukam
Member

ameukam commented Nov 14, 2024

IMHO we should break down this migration project into different conversations. I definitely can't do a lift and shift for Minikube.
Can we start the CI migration and migrate away from Jenkins to Prow?

@ameukam
Member

ameukam commented Dec 6, 2024

@medyagh Any thoughts on my proposal?

@ameukam
Member

ameukam commented Feb 17, 2025

@medyagh kindly ping

@medyagh
Member Author

medyagh commented Apr 30, 2025

Hi @ameukam @BenTheElder, we do need a project to host the ISO images that are built per PR.
For example, in this PR a contributor bumped the CRI-O version in the ISO,

and the internal minikube GCP project built and "pushed" an ISO image to a GCS bucket. To not depend on an internal minikube project inside Google, we need an alternative for pushing large artifacts during the build and test process.

The easiest thing would be having a community-owned and managed GCP project for minikube to replicate the same process there, but if there is a way to push large ISO images per PR, I am open to using that infra if available.

Mind that building the ISO wouldn't work on GitHub Actions machines, and wouldn't work with GitHub Actions artifacts, due to its size and the compute power needed to build Linux from source.
kubernetes/minikube#20630 (comment)

@BenTheElder
Member

We have avoided pointing end users at any single SaaS to avoid being stuck with exploding bills and no way to migrate, so we'd have to set up something like dl.k8s.io.

We need to know how much bandwidth. dl.k8s.io involved a lengthy process to negotiate sufficient bandwidth from a CDN provider to host Kubernetes's binaries with some room for growth.

Can you run builds on GCB (we provide resources for this meant for building container images, which we have a shared host for) but upload them to github?
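
As a rough sketch of that flow (assuming a hypothetical staging project and a cloudbuild.yaml checked into the minikube repo, neither of which exists yet):

```go
// gcb_build.go - a minimal sketch of kicking off the heavy ISO build on Cloud
// Build rather than on a GitHub Actions runner. The project ID is a
// placeholder; the build steps, machine size, and where the output lands would
// be defined in cloudbuild.yaml, checked into the minikube repo.
package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("gcloud", "builds", "submit",
		"--project", "example-staging-project", // placeholder, not a real project
		"--config", "cloudbuild.yaml",
		".")
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	if err := cmd.Run(); err != nil {
		log.Fatalf("cloud build failed: %v", err)
	}
}
```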

@BenTheElder
Member

cc @kubernetes/sig-k8s-infra-leads

@medyagh
Member Author

medyagh commented May 1, 2025

The machines that build the ISO need to be beefy; the GitHub Actions timeouts and how long it takes make it impractical to use.

But after the ISO is built, that would be the PR's ISO, not the released ISO; we cannot use GitHub release assets for that.

We have added the ISO to GitHub release assets for released minikube as a failover to GCS (since GitHub assets are significantly slower),
but we do need a way to push the 400MB ISO after it is built on the PR, to be used for the HEAD minikube source until release.

@BenTheElder
Member

BenTheElder commented May 1, 2025

The machines that build the ISO need to be beefy; the GitHub Actions timeouts and how long it takes make it impractical to use.

I'm not suggesting to use GHA to build, please see above again. GCB offers large machines and we already have docs to get automated builds with GCB, however we would ask that you do NOT point end users to the "staging" buckets / GCR, those are supposed to be intermediate / internal only.

My ask is that we build in GCB and publish to github.

EDIT: GCB is also post-merge though, for security reasons.

@BenTheElder
Member

But after the ISO is built, that would be the PR's ISO, not the released ISO; we cannot use GitHub release assets for that.

Do we have to build this on every pull request before merge ...?

Even kubernetes/kubernetes has many artifacts we do not build on PRs; they are instead only reviewed, then built after merge, and then those builds can be adopted in a subsequent PR.

This is a trade off in resourcing and load and what we make available to ~arbitrary code pushed to PRs.

@medyagh
Member Author

medyagh commented May 1, 2025

Do we have to build this on every pull request before merge ...?

Not on all pull requests, but on all pull requests that change the ISO, yes; it is essential to build PRs that change the ISO and have them tested, so we do need something like a GCS bucket to store the build artifacts to be used until minikube is released.

@BenTheElder
Member

BenTheElder commented May 1, 2025

Not on all pull requests, but on all pull requests that change the ISO, yes; it is essential to build PRs that change the ISO and have them tested, so we do need something like a GCS bucket to store the build artifacts to be used until minikube is released.

If you don't release it directly to users (i.e. just because an ISO build is available doesn't mean minikube is using it yet), you can do what we do in other projects including kubernetes/kubernetes, which is:

  1. PR to change the ISO
  2. ISO is built and released (after merge)
  3. Adopt the ISO in the tooling via a second PR
  4. If the second PR's tests fail, iterate on 1-3 again instead of merging.

See for example:
kubernetes/kubernetes#127057
#7262

For this, we can offer you GCB and you can push to GCS as an intermediate location to then publish to github.
You cannot tell users to download from the staging GCS, because that pins us to uncapped bandwidth costs with a particular vendor, and we are rolling out garbage collection for staging locations to prevent accumulation of unused builds.
But you can push to staging and then promote it to github or similar.

The docs for staging GCB builds are in this repo.
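
A minimal sketch of that promote step, assuming a hypothetical staging bucket and release tag (placeholders, not real k8s-infra resources):

```go
// promote_iso.go - a sketch of the "push to staging, then promote" step
// described above: copy the ISO out of the intermediate staging bucket and
// attach it to a GitHub release so end users never download from the bucket
// itself. Bucket, tag, and file names are placeholders.
package main

import (
	"log"
	"os/exec"
)

func run(args ...string) {
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	if err := cmd.Run(); err != nil {
		log.Fatalf("%v: %v", args, err)
	}
}

func main() {
	// Staging object written by the post-merge GCB job (placeholder path).
	run("gsutil", "cp", "gs://example-staging-bucket/minikube-amd64.iso", ".")

	// Publish as a GitHub release asset; GitHub, not the staging bucket,
	// is what users (and the minikube tooling) download from.
	run("gh", "release", "upload", "v1.99.0-example", "minikube-amd64.iso",
		"--repo", "kubernetes/minikube", "--clobber")
}
```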

@medyagh
Member Author

medyagh commented May 1, 2025

We don't merge the ISO PRs until they can build, because there are more than 20 different types of automated ISO-software-bump PRs, plus manual ISO PRs, that need to first be proven to build. User laptops are not suitable for building the ISO, and contributors rely on our CI infra to build the ISO and also test it on all the different platforms before merging.

Anyway, that part of the mechanism is not what needs to be solved (the current trigger mechanism is: if the PR has an ok-to-build-iso from a maintainer, then it triggers the ISO job, which can be moved out of Jenkins to GCB or anything else). However, we do need a place to host the ISO once it is merged to HEAD.

And even after minikube is ready to be released, the current GitHub assets are very slow and will significantly affect the onboarding experience of minikube users.

So, to not affect the smooth onboarding experience for Kubernetes, I still need highly available storage for the released ISO other than GitHub release assets, which are only good for archive purposes.

@upodroid
Member

upodroid commented May 1, 2025

Hi @medyagh

I'm a fellow SIG Infra/Testing maintainer with several comments to share.

At a high level, I recommend you rewrite the Minikube CI and release pipelines to fully use Prow and release Minikube the same way we release other projects. We are unable to migrate the Jenkins CI used by Minikube that's hosted internally at Google, and in the unlikely event it disappears, the project will be at risk.

The kops project is quite similar to minikube and I can share how they do things:

  1. They have various Prow jobs that do unit tests, linting, and a few real e2e tests, e.g. gce: use typed ServiceAccount in IAM tasks kops#17379
  2. Once a PR is merged, they build kops and publish it to a staging image registry + staging blob location
  3. https://testgrid.k8s.io/kops-distros They run a large set of periodic tests using kops on real infra with various os/cni/plugin combinations
  4. When they cut releases, kops serves images from registry.k8s.io and blobs from GitHub releases. https://github.com/kubernetes/kops/releases/tag/v1.31.0 and Promote kOps 1.32.0-beta.1 images #7740
  5. Cluster API is another very large project similar to minikube that directs users to download from GitHub releases https://cluster-api.sigs.k8s.io/user/quick-start
  6. Some projects do use GitHub Actions and you can retain them but Prow gives you access to faster VMs and the ability to launch VMs in GCP for e2e testing.
  7. We want minikube to start using testgrid so we can track regressions and failures over time. It will help you detect when merged changes are causing regressions.

And even after minikube is ready to be released, the current GitHub assets are very slow and will significantly affect the onboarding experience of minikube users.

Downloading artifacts from GitHub releases is fast for all users and there are no rate limits for reasonable use cases.
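
For example, a short Go sketch of fetching a release asset directly from GitHub over the public download URL (the tag and asset name here are placeholders):

```go
// fetch_release_asset.go - a sketch of how users (or minikube's own tooling)
// could pull a released ISO straight from GitHub release assets. The tag and
// asset names are placeholders.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Placeholder tag/asset on the real minikube repo.
	url := "https://github.com/kubernetes/minikube/releases/download/v1.99.0-example/minikube-amd64.iso"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create("minikube-amd64.iso")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	n, err := io.Copy(out, resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("downloaded %d bytes\n", n)
}
```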

If this sounds good, let me know and I'll help you with the migration effort.

@medyagh
Member Author

medyagh commented May 1, 2025

I am open to migrating to Prow if it can support our testing/building. The minikube infra in Jenkins is 8-9 years old, and historically the reason it couldn't use Prow was that Prow didn't have nested virtualization to run VMs inside it, and I believe it is still true that Prow jobs all need to run inside a container (which won't work for the VM drivers of minikube).

Currently I am the only person at Google managing all the internal infra, and I would love to move as much of this infra as possible to community-owned infra without disruption, while also being realistic about how much work it needs. I prefer a super simple solution that doesn't cause more maintainer toil.

The minikube ISO has a few requirements:
1 - Be able to build the ISO per PR and push the ISO artifact so it can be merged to HEAD
(due to the complexity of the ISO and the time it takes to build, having an ISO PR merged without being built would cause massive conflicts and require massive reverts), so merging an ISO PR hoping it is good and building only after merge is not doable.
2 - Be able to host the ISO artifact at least until minikube is released

There are currently more than 20 ISO automation PRs plus manual ISO PRs from contributors; each ISO build takes 5+ hours on a beefy machine, and if this process gets entangled it will be extremely difficult to debug what caused an issue.

(Mind that I haven't discussed other aspects yet; the KIC base images are a similar story, but instead of publishing artifacts we publish OCI images. I would like to tackle one thing at a time though.)

I feel like this discussion would be better to have over a call; would you all be open to discussing it?

@upodroid
Member

upodroid commented May 1, 2025

I feel like this discussion would be better to have over a call; would you all be open to discussing it?

Let's discuss it further in a call.

I am open to migrating to Prow if it can support our testing/building. The minikube infra in Jenkins is 8-9 years old, and historically the reason it couldn't use Prow was that Prow didn't have nested virtualization to run VMs inside it, and I believe it is still true that Prow jobs all need to run inside a container (which won't work for the VM drivers of minikube).

We can do nested virtualisation in a K8s pod running in GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/nested-virtualization. You can also launch VMs with nested virt in GCE, SSH in, and run your tests against it (this is how we run node e2e tests and also real e2e testing on the cloud for Kubernetes).
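
As a rough sketch of the GCE option (project, zone, machine type, and instance name here are placeholders, not a recommendation):

```go
// nested_virt_vm.go - a sketch of the GCE-based option mentioned above: create
// a VM with nested virtualization enabled, then SSH in to run KVM-based
// minikube tests. Project, zone, and instance names are placeholders.
package main

import (
	"log"
	"os/exec"
)

func gcloud(args ...string) {
	cmd := exec.Command("gcloud", args...)
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	if err := cmd.Run(); err != nil {
		log.Fatalf("gcloud %v: %v", args, err)
	}
}

func main() {
	const (
		project = "example-boskos-project" // placeholder: lent by boskos in practice
		zone    = "us-central1-b"
		name    = "minikube-nested-virt"
	)

	gcloud("compute", "instances", "create", name,
		"--project", project, "--zone", zone,
		"--machine-type", "n2-standard-8",
		"--enable-nested-virtualization")

	// Verify KVM is available inside the VM before running the real suite.
	gcloud("compute", "ssh", name,
		"--project", project, "--zone", zone,
		"--command", "ls /dev/kvm")
}
```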

1 - Be able to build the ISO per PR and push the ISO artifact so it can be merged to HEAD

We can build ISOs in presubmits (usually via an optional job that is triggered by slash commands or changed directories) and rebuild them again on merge to the master branch.

I did see the images being hosted at gcr.io/k8s-minikube and they all need to be migrated to registry.k8s.io as part of the CI rewrite.

@medyagh
Member Author

medyagh commented May 1, 2025

Sounds good. SSHing into a GCE instance would still need a GCP project for minikube owned by the community (the purpose of this issue),
so I believe it is best to start with a GCP project for minikube owned by the community.

@upodroid
Member

upodroid commented May 1, 2025

SSHing into a GCE instance would still need a GCP project for minikube owned by the community (the purpose of this issue)

It doesn't, we have something called boskos and kubetest2 that lends you a GCP project to create ephemeral VMs in.

Have a look at this job as an example https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-node-e2e-containerd/1918007953990881280

I can explain the tooling when we meet.

@medyagh
Member Author

medyagh commented May 7, 2025

SSHing into a GCE instance would still need a GCP project for minikube owned by the community (the purpose of this issue)

It doesn't, we have something called boskos and kubetest2 that lends you a GCP project to create ephemeral VMs in.

Have a look at this job as an example https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-node-e2e-containerd/1918007953990881280

I can explain the tooling when we meet.

That looks very interesting! We might be able to leverage that; however, we still need somewhere to host the built ISO after the PR is merged (the cron job tests on HEAD will need to run against the HEAD minikube with the merged ISO, so if the PR gets merged using a temporary GCP project, the post-merge HEAD will have an older ISO and mess up our Gopogh flake test dashboard).
