FederatedResourceQuota should be failover friendly #5179
Comments
First of all, thanks for bringing this up. I'm glad to enhance it with real-world use cases. Generally, the
I guess your idea is to let the user declare a total quota by FederatedResourceQuota for a specific namespace, and the quota can be shared across clusters.
Thanks for taking a look!
Pretty much, yes. In our use case, we have a controller that syncs a tenant's ResourceQuota on each member cluster to be equal to the tenant's limits (let's say 40 CPU and 50 GB). Each cluster will have an identical static ResourceQuota, so that one cluster can accommodate all of the tenant's workloads if necessary in the case of DR. But we want the FederatedResourceQuota to monitor the existing quota usage across all clusters and set a limit on the amount of resources that can be applied to the Karmada control plane. In the comments you linked, these two were most relevant:
Perhaps this would require some sort of admission webhook that would prevent resources from being applied if their total resource usage would go above the limits defined in the FederatedResourceQuota. This would mirror the way that ResourceQuotas work in K8s. The more difficult part would be determining when to replenish the quota (perhaps when a Work is deleted?).
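To make the webhook idea concrete, here is a minimal sketch of the admission check it would perform: sum the `status.used` of each member cluster's ResourceQuota, add the incoming request, and reject if the overall FederatedResourceQuota limit would be exceeded. This is an illustration only; the `ResourceList`, `totalUsed`, and `admit` names are hypothetical, and plain `int64` quantities stand in for `resource.Quantity`.

```go
package main

import "fmt"

// ResourceList maps a resource name ("cpu", "memory") to a quantity in base
// units (millicores for CPU, bytes for memory). A plain int64 is used instead
// of k8s resource.Quantity to keep the sketch self-contained.
type ResourceList map[string]int64

// totalUsed sums the status.used reported by each member cluster's ResourceQuota.
func totalUsed(perCluster []ResourceList) ResourceList {
	sum := ResourceList{}
	for _, used := range perCluster {
		for name, qty := range used {
			sum[name] += qty
		}
	}
	return sum
}

// admit rejects a request whose usage, added to the current federated total,
// would exceed the overall limits declared in the FederatedResourceQuota.
func admit(overall, used, requested ResourceList) error {
	for name, limit := range overall {
		if used[name]+requested[name] > limit {
			return fmt.Errorf("quota exceeded for %s: used %d + requested %d > limit %d",
				name, used[name], requested[name], limit)
		}
	}
	return nil
}

func main() {
	overall := ResourceList{"cpu": 40000, "memory": 50 << 30} // 40 CPU, 50 GiB
	used := totalUsed([]ResourceList{
		{"cpu": 12000, "memory": 10 << 30}, // cluster A
		{"cpu": 20000, "memory": 20 << 30}, // cluster B
	})
	fmt.Println(admit(overall, used, ResourceList{"cpu": 5000}))  // 32+5 CPU fits
	fmt.Println(admit(overall, used, ResourceList{"cpu": 10000})) // 32+10 > 40, rejected
}
```

The replenishment question maps onto the same structure: deleting a Work would subtract its usage from the per-cluster totals before the next `admit` call.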
Yes, exactly. In addition, the scheduler should also take the resource quota into account and avoid scheduling workloads to clusters that would exceed the limit. By the way, I might be slow to respond on this topic, as I want to pay more attention to #5116 and #5085 and the others we planned in the current release. But I'm interested and glad to have this discussion, and hope to keep this open and welcome other people to join.
That's alright! Apologies for all the issues that have been filed recently - one at a time. :)
What would you like to be added:
A way for the FederatedResourceQuota to monitor existing ResourceQuotas (without managing those ResourceQuotas) and impose resource limits on the user based on the sum of all currently used quota.
Why is this needed:
The existing FederatedResourceQuota mirrors the behavior of a typical Kubernetes ResourceQuota by imposing total resource limits in a multi-cluster setup. This is done by creating statically distributed ResourceQuotas across the specified member clusters, whose limits sum to the limits defined in the FederatedResourceQuota. This works if the user does not need to worry about DR events, which require backup resources dedicated for failover in the event of a disaster.
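For reference, the static distribution described above can be expressed with the existing FederatedResourceQuota API (to the best of my understanding of the `policy.karmada.io/v1alpha1` fields; the namespace, cluster names, and figures are illustrative):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: FederatedResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  overall:              # total limits across all member clusters
    cpu: "40"
    memory: 50Gi
  staticAssignments:    # a ResourceQuota with these hard limits is created
    - clusterName: cluster-a   # in each listed member cluster
      hard:
        cpu: "20"
        memory: 25Gi
    - clusterName: cluster-b
      hard:
        cpu: "20"
        memory: 25Gi
```

The limitation for our use case is exactly this split: each cluster only gets a fraction of the overall quota, so no single cluster can absorb all workloads during failover.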
In our case, since we are using Karmada for its failover feature, we would like clusters to have additional available quota for each namespace, so that in the event of a disaster all applications are able to be rescheduled:
In the diagram above, we can see that the total limits of the FederatedResourceQuota are 40/40 CPU and 50 GB / 50 GB memory. Individual clusters will have the same limit, so that in the case of a DR event, all workloads can be scheduled on one cluster.
Above, we see that during a failover all workloads from Cluster A will be migrated to Cluster B, where there will be enough available resources to schedule all required pods. With the existing statically defined ResourceQuotas, we cannot support this type of failover, because each cluster's quota is capped at its share of the total.
We've created this ticket to start a discussion on how best to address this limitation, and on whether this use case is valid.