Request for a community owned GCP project for minikube #7414
Comments
Can you outline some more detailed requirements so we can determine how best to provide them? We generally try to work from "my project needs VMs for testing cgroups v2, which we cannot do locally in a CI container" => "well, we have AWS credits, let's use EC2, and make sure to use boskos to rent access" or "my project needs to host container images" => use registry.k8s.io (which is AWS+GCP; there are standardized docs for setting up image hosting in this repo). We have to maintain balance across the budgets available to the project. Whatever infra we do provide, we also set up here in git wherever possible (terraform, bash, etc.), so it's auditable and so others can chip in in the future, instead of just creating cloud project admins and having them create random resources. So we need to know what to spin up, exactly. We have a lot of existing shared resources in the project for things like CI and release. |
There are multiple aspects to it, since this is 8-year-old infrastructure used for test and release and also for hosting live apps and released artifacts (binaries, tarballs, ISOs, Docker images, ...). I agree that we would like to leverage GitHub binaries and GitHub Actions as much as possible where doable; some cases might not work, such as minikube preload tarball images. Currently the idea is to get a footprint in the publicly owned infra and then move little by little, without disrupting the system or taking on re-engineering that is unrealistic for our capacity. The current requirements that come to mind:
This is a good list to start with, but it is not comprehensive. The idea is to get a footprint in the new project and re-evaluate the path forward. |
We have already engineered systems for e.g. hosting images though, and we do not want to dig a new unsustainable hole for these. From the specific examples:
We do not want users consuming directly from any paid SaaS like this, it is a liability for the project (we have no flexibility to shift costs when utilization and funding shifts). We shouldn't re-introduce this.
See above comment, also can be hosted on github at no cost?
Can we use our existing CI infra? We already have a lot of resources behind this and they're shared/pooled across the project. We care a lot about things like making sure that VMs get cleaned up when they're no longer in use. At the scale that we're supporting, if every project runs custom unmonitored systems we can't keep track of the waste.
That's just not how we run k8s infra though, it's not transparent or sustainable. Everything we've lifted and shifted previously we've spun up as a new copy in k8s infra, with the specifics checked in, so others can read through, edit/PR, and otherwise take over in the future. We haven't granted any subproject the ability to arbitrarily create cloud resources in a project because it's not accountable and it's not reproducible. Everything we're running can be traced back to e.g. https://github.com/kubernetes/k8s.io/tree/main/infra/gcp/terraform and the SIG (as steward) has agreed is reasonable to run (and always sought out the most effective answers; we've had to work hard to reach sustainable spend, up to and including things like working with SIG Scalability to evaluate their test workloads and adjust frequency and scheduling). |
cc @dims (chair) in addition to TLs (#7414 (comment)) |
All of the infra we've migrated has been similarly old if not older, and it does take a lot of work, but I also think we really don't want to regress from all the effort we've put in so far and the ground rules we've established (such as not permitting non-community-owned accounts into our CI), which are all based on mitigating real issues we've experienced in the past. It's really important that I or any of the other infra leads can quit and someone else can pick up the pieces without blockers, and that we keep an eye on sustainable spend and know what it is that we're funding and what the usage trends are. |
I understand, and I agree with leveraging GitHub as much as possible. Some of the artifacts, such as binaries, can be hosted on GitHub as part of the release assets. There are also many jobs that build ISOs and KIC images per PR and push them to the PR; that would not be doable on free GitHub Actions machines, since building ISOs needs beefy machines. Currently we have 80 internal automation jobs (not Dependabot) that bump to new versions of ISO/image software and push a new ISO during off-peak hours (midnight); those would not be implementable using GitHub or GitHub Actions. Also, as mentioned in my previous comment, we have multiple hosted software services running for minikube that are essential to running the minikube project, currently deployed to Cloud Run.
The content contained in GitHub releases is mutable, even after advertising a release publicly. Are these "preload tarballs" essentially a set of container images? Because it sounds like if we host them, we're going to duplicate the registry.k8s.io egress problem. Per above, it sounds like these are advertised directly from GCS buckets, which is not a cost-effective approach and not something we want to do again. (Cost effectiveness aside, it limits our ability to make decisions later about what resources to use for hosting, as users become dependent on the buckets and make assumptions about them.) Again, we have an established process and common infra for container image hosting: https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io#managing-kubernetes-container-registries @upodroid has been working on migrating the staging to Artifact Registry and may have some updates for the process, but we don't have to block on that.
That's a distinct problem from where they're hosted though. The output of the jobs can be copied where we need it ...?
ACK ... We still need an accounting of what exactly. Should probably prioritize the most critical assets first. |
|
Ok, but we still have to sustainably host the ingress if we're paying for it in k8s infra. We have an allocation for the core repos' binaries (we get a bandwidth budget that we negotiated based on that need), and we have registry.k8s.io. We have to be careful with introducing content hosts because we have limited ability to cut usage and manage costs. We've been asking subprojects to use GitHub releases to host files. We probably would do this for Kubernetes too, but we have a huge legacy around that and we receive an ongoing donation specifically for that problem. |
IMHO we should break down this migration project into different conversations. I can't definitively do a lift and shift for Minikube. |
@medyagh Any thoughts on my proposal? |
@medyagh kindly ping |
Hi @ameukam @BenTheElder, we do need a project to host the ISO images that are built per PR. Today an internal minikube GCP project builds each ISO image and pushes it to a GCS bucket. To stop depending on an internal minikube project inside Google, we need an alternative place to push large artifacts during the build and test process. The easiest thing would be a community-owned and community-managed GCP project for minikube where we replicate the same process, but if there is another way to push large ISO images per PR, I am open to using that infra if it is available. Mind that building the ISO wouldn't work on GitHub Actions machines or with GitHub Actions artifacts, due to the ISO's size and the compute power needed to build Linux from source. |
We have avoided pointing end users at any single SaaS, so that we don't get stuck with exploding bills and no way to migrate, so we'd have to set up something like dl.k8s.io. We need to know how much bandwidth is required. dl.k8s.io involved a lengthy process to negotiate sufficient bandwidth from a CDN provider to host Kubernetes's binaries with some room for growth. Can you run builds on GCB (we provide resources for this, meant for building container images, for which we have a shared host) but upload them to GitHub?
cc @kubernetes/sig-k8s-infra-leads |
The machines that build the ISO need to be beefy; GitHub Actions runs time out and take forever, which makes them impractical to use. And once the ISO is built, it is the PR's ISO, not the released ISO, so we cannot use GitHub release assets for that. We have added the ISO to the GitHub assets for released minikube as a failover to GCS (since GitHub assets are significantly slower). |
I'm not suggesting to use GHA to build, please see above again. GCB offers large machines and we already have docs to get automated builds with GCB; however, we would ask that you do NOT point end users to the "staging" buckets / GCR, as those are supposed to be intermediate / internal only. My ask is that we build in GCB and publish to GitHub. EDIT: GCB is also post-merge though, for security reasons. |
Do we have to build this on every pull request before merge ...? Even kubernetes/kubernetes has many artifacts we do not build on PRs and are instead only reviewed and then after merge built, and then those builds can be adopted in a subsequent PR. This is a trade off in resourcing and load and what we make available to ~arbitrary code pushed to PRs. |
Not on all pull requests, but on all pull requests that change the ISO, yes. It is essential to build and test PRs that change the ISO, so we do need something like a GCS bucket to store the build artifacts for use until minikube is released. |
If you don't release it directly to users (IE just because an ISO build is available doesn't mean minikube is using it yet), you can do what we do in other projects including kubernetes/kubernetes, which is:
See for example: For this, we can offer you GCB and you can push to GCS as an intermediate location to then publish to github. The docs for staging GCB builds are in this repo. |
We don't merge the ISO PRs until they can build, because there are more than 20 different types of automated ISO software bumps, plus manual ISO PRs, that first need to be proven buildable. End-user laptops are not suitable for building the ISO, and contributors rely on our CI infra to build the ISO and test it on all the different platforms before merging. Anyway, that part of the mechanism is not what needs to be solved: the current trigger mechanism is that if a PR has an ok-to-build-iso from a maintainer, it triggers the ISO job (which can be moved out of Jenkins to GCB or anything else). However, we do need a place to host the ISO once it is merged at HEAD, and even after minikube is ready to be released. The current GitHub assets are very slow and would significantly affect the onboarding experience of minikube users. So, to keep onboarding to Kubernetes smooth, I still need highly available storage for the released ISO other than GitHub release assets, which are only good for archival purposes. |
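The trigger gate described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not minikube's actual Jenkins logic: the `Marker` type, the `should_build_iso` function, and the maintainer set are hypothetical, and treating `ok-to-build-iso` as a marker with a recorded author is an assumption. Only the `ok-to-build-iso` name comes from the thread.

```python
from dataclasses import dataclass

ISO_MARKER = "ok-to-build-iso"  # name taken from the thread


@dataclass(frozen=True)
class Marker:
    """Hypothetical record of a marker applied to a PR."""
    name: str
    applied_by: str  # GitHub login of whoever applied it


def should_build_iso(markers: list[Marker], maintainers: set[str]) -> bool:
    """Return True only if a maintainer applied the ISO build marker."""
    return any(
        m.name == ISO_MARKER and m.applied_by in maintainers
        for m in markers
    )
```

The point of the gate is that the expensive build only runs on code a maintainer has looked at, regardless of whether the job itself lives in Jenkins, GCB, or elsewhere.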
Hi @medyagh I'm a fellow SIG Infra/Testing maintainer with several comments to share. At a high level, I recommend you rewrite the Minikube CI and release pipelines to fully use Prow and release Minikube the same way we release other projects. We are unable to migrate the Jenkins CI used by Minikube that's hosted internally at Google and in the unlikely event it disappears, the project will be at risk. The kops project is quite similar to minikube and I can share how they do things:
Downloading artifacts from GitHub releases is fast for all users and there are no rate limits for reasonable use cases. If this sounds good, let me know and I'll help you with the migration effort. |
I am open to migrating to Prow if it can support our testing and building. The minikube infra in Jenkins is 8-9 years old, and historically the reason we couldn't use Prow was that it didn't have nested virtualization for running VMs inside it. I believe it is still true that Prow jobs all need to run inside a container (which won't work for minikube's VM drivers). Currently I am the only person at Google managing all the internal infra, and I would love to move as much of it as possible to community-owned infra without disruption, while being realistic about how much work it needs. I prefer a super simple solution that doesn't take more of a toll on maintainers. The minikube ISO has a few requirements: there are currently more than 20 ISO automation PRs plus manual ISO PRs from contributors, each ISO build takes 5+ hours on a beefy machine, and if this process gets entangled it will be extremely difficult to debug what caused an issue. (Mind that I haven't discussed other aspects yet. KIC base images are a similar story, but instead of publishing artifacts we publish OCI images; I would like to tackle one at a time, though.) I feel like this discussion would be better done over a call. Would you be open to that? |
Let's discuss it further in a call.
We can do nested virtualisation in a K8s pod running in GKE. https://cloud.google.com/kubernetes-engine/docs/how-to/nested-virtualization. You can also launch VMs with nested virt in GCE, SSH in, and run your tests against them (this is how we run node e2e tests and also real e2e testing on the cloud for Kubernetes).
We can build ISOs in presubmits (usually via an optional job that is triggered by slash commands or changed directories) and rebuild them again on merge to the master branch. I did see the images being hosted at gcr.io/k8s-minikube, and they all need to be migrated to registry.k8s.io as part of the CI rewrite. |
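As a rough illustration of the "slash commands or changed directories" trigger mentioned above, here is a small Python sketch. It is similar in spirit to matching a regular expression against the changed file paths of a PR, but it is not Prow code: the path pattern `deploy/iso/`, the command `/test build-iso`, and the function name are all assumptions made for the example.

```python
import re

# Hypothetical pattern: the real ISO source paths are not given in the thread.
ISO_PATHS = re.compile(r"^deploy/iso/")
# Hypothetical slash command for an optional job.
TRIGGER = re.compile(r"^/test\s+build-iso\s*$", re.MULTILINE)


def should_run_iso_presubmit(changed_files, comments):
    """Run the job if an ISO path changed or someone asked for it explicitly."""
    path_hit = any(ISO_PATHS.search(path) for path in changed_files)
    command_hit = any(TRIGGER.search(body) for body in comments)
    return path_hit or command_hit
```

With a rule like this, PRs that never touch the ISO sources skip the multi-hour build entirely, while a reviewer can still request it by comment.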
Sounds good, SSHing into a GCE instance would still need a GCP Project for minikube owned by community (the purpose of this issue) |
It doesn't, we have something called boskos and kubetest2 that lends you a GCP project to create ephemeral VMs in. Have a look at this job as an example https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-node-e2e-containerd/1918007953990881280 I can explain the tooling when we meet. |
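The "lends you a GCP project" idea can be pictured as a lease on a shared pool. The sketch below is a simplified in-memory model in Python, written only to show the rent-use-return pattern; it is not the boskos or kubetest2 API, and the class, method, and state names are assumptions made for illustration.

```python
from contextlib import contextmanager


class ProjectPool:
    """Toy pool of shared projects; not the real boskos implementation."""

    def __init__(self, names):
        self._state = {name: "free" for name in names}

    def acquire(self):
        # Hand out the first free project and mark it as in use.
        for name, state in self._state.items():
            if state == "free":
                self._state[name] = "busy"
                return name
        raise RuntimeError("no free project available")

    def release(self, name):
        # Returned projects are marked dirty so leftovers get cleaned first.
        self._state[name] = "dirty"

    def clean(self):
        # Stand-in for a cleanup pass that deletes leftover VMs.
        for name, state in self._state.items():
            if state == "dirty":
                self._state[name] = "free"

    def state(self, name):
        return self._state[name]


@contextmanager
def leased_project(pool):
    """Lease a project for the duration of a test run, then return it."""
    name = pool.acquire()
    try:
        yield name
    finally:
        pool.release(name)
```

A job would wrap its VM creation and tests in `with leased_project(pool) as project:`, so the project goes back to the pool even if the tests fail, which is what keeps ephemeral VMs from piling up unmonitored.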
That looks very interesting! We might be able to leverage that. However, we still need somewhere to host the built ISO after the PR is merged: the cron job tests on HEAD need to run against HEAD minikube with the merged ISO, so if the PR gets merged using a temporary GCP project, HEAD after the merge will have an older ISO and mess up our Gopogh flake test dashboard.
Hello, minikube maintainer here. I would like to ask for a GCP project for minikube owned by the CNCF community. Our release and test infra is in a Google-owned project, and we would like to explore migrating it to a CNCF-owned project. Is this the right place to ask for it?
related: kubernetes/test-infra#33654