
Commit 1a2f8e3

Proposal for federated resource quota enhancement
Signed-off-by: mszacillo <mszacillo@bloomberg.net>
1 parent 3b6c0e0 commit 1a2f8e3

5 files changed (+161 −0)
Lines changed: 138 additions & 0 deletions
---
title: Enhance existing FederatedResourceQuota API to support enforcing resource limits directly on the Karmada control-plane
authors:
- "@mszacillo"
reviewers:
- "@RainbowMango"
- "@XiShanYongYe-Chang"
approvers:
- "@RainbowMango"
- "@XiShanYongYe-Chang"

creation-date: 2025-03-02

---

# Enhance FederatedResourceQuota to enforce Resource Limits on Karmada Control Plane

## Summary

This proposal aims to enhance the existing FederatedResourceQuota API to impose namespaced resource limits directly at the Karmada control-plane level. With this feature, it will be easier for users of application and cluster failover to ensure that fallback clusters have enough resources to host all of a user's workloads. Ideally, this feature can be configured and toggled, which would give users more control over enforcing their namespace's resource limits.

## Motivation

The desire for this feature comes from the use of application and cluster failover. When failover is enabled, applications are expected to migrate between clusters, and it is up to us as platform owners to make sure each cluster has sufficient resources to host those applications. We believe controlling resource limits at the Karmada level will simplify resource quota management.

The existing FederatedResourceQuota only provides Karmada administrators with the ability to manage ResourceQuotas via static assignment on member clusters. This saves administrators some time by not requiring a PropagationPolicy for their ResourceQuotas.

However, this feature does not work well with application and cluster failover. Since static assignment of resource quotas requires users to subdivide their quota between clusters, each member cluster's resource quota will be less than the total quota allocated to the user. This means that during failover, the other member clusters will not have sufficient quota to host the failed-over applications.

### User Story

As a data platform owner, we host many different tenants across different namespaces. One of the biggest benefits of using Karmada to manage our users' Flink applications is the automated failover feature. But in order for failover to succeed, we need to carefully plan our cluster federation and Karmada setup to ensure that:
1. Fallback clusters have sufficient resources to host all necessary applications.
2. Users have imposed resource limits, so they cannot schedule applications that exceed their namespace's resource limits.

Given these requirements, let's assume we have a Karmada control-plane setup with application and cluster failover enabled. In order to impose namespaced resource limits, we use a FederatedResourceQuota with 40 CPU and 50 GB memory. Since static assignment is being used, each cluster gets a ResourceQuota of 20 CPU and 25 GB memory.

Eventually, all clusters are full and no more resources can be scheduled. Below we see 4 FlinkDeployments with varying CPU and memory requests, subdivided across two clusters, each filling its entire ResourceQuota:

![resource-quota-assignment](resources/resource-quota-assignment.png)

However, let's now assume there is a cluster failure which triggers a cluster failover. In this case, since the ResourceQuotas are statically assigned, the fallback cluster will not be able to schedule these applications because its ResourceQuota is already full. Jobs will be unable to fail over, and will have to wait until the original cluster comes back up. This is not acceptable.

![failover-example-1](resources/failover-example-1.png)

In the following image, we show what we would ideally like to support. The Karmada control plane limits the total resource usage for the user's namespace, but does not statically assign ResourceQuotas. Both member clusters have identical ResourceQuotas of 40 CPU and 50 GB memory, so when Member Cluster 1 fails, Member Cluster 2 will have enough space to host FlinkDeployments A and B:

![failover-example-2](resources/failover-example-2.png)

Could we support dynamic assignment of FederatedResourceQuota? Potentially yes, but there are some drawbacks to that approach:
1. Each time an application fails over, the FederatedResourceQuota would need to check that the feasible clusters have enough quota and, if not, rebalance the resource quotas before scheduling work. This adds complexity to the scheduling step and would increase end-to-end failover latency.
2. Additionally, in bad cases, applications could fail over frequently, resulting in frequent ResourceQuota updates and a lot of churn on the Karmada control plane and member clusters.

Instead, we would like to enhance the existing FederatedResourceQuota API so that it can enforce resource limits directly on the Karmada control plane and allow users to configure their cluster federation to be ready for failovers.

### Goals

- Enhance the FederatedResourceQuota API to support enforcing namespaced resource limits at the Karmada control-plane level by using `Overall`
- Make this feature configurable and toggleable

### Non-Goals

- Support for dynamic resource quota allocation is outside the scope of this proposal.

## Proposal

1. The FederatedResourceQuota API should enforce namespaced overall resource limits.
   - The FederatedResourceQuota status will be updated whenever resources are applied against the relevant namespace.
   - We are considering including a resource selector scope for the quota.
2. A custom controller will be responsible for updating the overall resource usage.
3. A validating webhook (or admission controller) will block users from applying or updating resources that would exceed the total resource allowance.

## API Changes

### FederatedResourceQuota API

There will be no changes to the existing FederatedResourceQuota API definition, but we will redefine how `Overall` is currently used. We would like `Overall` to represent the resource limits imposed by the quota, and have Karmada keep track of these limits when managing applications applied to the namespace.

```go
// FederatedResourceQuotaSpec defines the desired hard limits to enforce on the Karmada namespace.
type FederatedResourceQuotaSpec struct {
    // Overall is the set of desired hard limits for each named resource.
    // If Overall is set, the FederatedResourceQuota will impose limits directly on the Karmada control-plane.
    // +required
    Overall corev1.ResourceList `json:"overall"`
}
```
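
For illustration, the quota from the user story could be constructed as below. This is only a sketch: the object name `user-quota`, the namespace `user-ns`, and the use of `50Gi` for the 50 GB figure are assumptions made for the example.

```go
package main

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    policyv1alpha1 "github.com/karmada-io/karmada/pkg/apis/policy/v1alpha1"
)

// exampleQuota builds the FederatedResourceQuota from the user story: the
// user-ns namespace is capped at 40 CPU and 50Gi of memory overall, with no
// static assignment per member cluster.
func exampleQuota() *policyv1alpha1.FederatedResourceQuota {
    return &policyv1alpha1.FederatedResourceQuota{
        ObjectMeta: metav1.ObjectMeta{Name: "user-quota", Namespace: "user-ns"},
        Spec: policyv1alpha1.FederatedResourceQuotaSpec{
            Overall: corev1.ResourceList{
                corev1.ResourceCPU:    resource.MustParse("40"),
                corev1.ResourceMemory: resource.MustParse("50Gi"),
            },
        },
    }
}
```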

### ResourceBinding Change

The ResourceBindingSpec will include a FederatedResourceQuota reference, which will be populated in one of three ways:

1. If there is a FederatedResourceQuota in the namespace, add it to the created binding spec by default.
2. If there is no FederatedResourceQuota, leave the reference empty.
3. If the FederatedResourceQuota has a scope, determine whether the resource matches the scope, and add the reference to the binding only if it does.

```go
// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
    // ...

    // FederatedResourceQuota represents the name of the quota that will be used for this resource.
    // +optional
    FederatedResourceQuota string `json:"federatedResourceQuota,omitempty"`

    // ...
}
```
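
To make the population rules concrete, here is a minimal sketch; the function `quotaNameFor` and the `matchesScope` helper are hypothetical names, and the scope check itself is still only under consideration.

```go
package main

import (
    policyv1alpha1 "github.com/karmada-io/karmada/pkg/apis/policy/v1alpha1"
    workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// quotaNameFor returns the FederatedResourceQuota name to record on a newly
// created ResourceBinding, following the three rules above.
func quotaNameFor(quotas []policyv1alpha1.FederatedResourceQuota, rb *workv1alpha2.ResourceBinding) string {
    for _, q := range quotas {
        if q.Namespace != rb.Namespace {
            continue // only quotas in the binding's namespace apply
        }
        if matchesScope(&q, rb) {
            return q.Name
        }
    }
    return "" // rule 2: no quota in the namespace, leave the reference empty
}

// matchesScope is a hypothetical placeholder for the proposed resource
// selector scope; with no scope configured, every resource matches.
func matchesScope(q *policyv1alpha1.FederatedResourceQuota, rb *workv1alpha2.ResourceBinding) bool {
    return true
}
```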

## Design Details

### Controller Change

**Reconciliation**: The controller will reconcile whenever a ResourceBinding is created, updated, or deleted. It will only reconcile if the binding in question has a pointer to a FederatedResourceQuota.

**Reconcile Logic**: When reconciling, the controller will fetch the list of ResourceBindings in the namespace and add up their resource requirements. The existing implementation fetches all ResourceBindings in the namespace; however, this could be improved by calculating only the delta of the applied resource rather than recomputing the entire resource footprint of the namespace. A sketch of the summation step appears below.
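
This sketch assumes each binding's usage can be derived from `Spec.ReplicaRequirements.ResourceRequest` multiplied by `Spec.Replicas`:

```go
package main

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"

    workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// totalUsage adds up the resource requirements of the given ResourceBindings.
func totalUsage(bindings []workv1alpha2.ResourceBinding) corev1.ResourceList {
    total := corev1.ResourceList{}
    for _, rb := range bindings {
        if rb.Spec.ReplicaRequirements == nil {
            continue
        }
        for name, perReplica := range rb.Spec.ReplicaRequirements.ResourceRequest {
            // Scale the per-replica request by the replica count.
            use := resource.NewMilliQuantity(
                perReplica.MilliValue()*int64(rb.Spec.Replicas), perReplica.Format)
            cur := total[name]
            cur.Add(*use)
            total[name] = cur
        }
    }
    return total
}
```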

In order to calculate the delta efficiently, we'd need to introduce a ResourceBinding cache maintained by the controller.

**Internal RB Cache**

The cache would be populated by fetching all ResourceBindings during initialization, and would then be maintained during the controller's reconcile loops. In the case of a pod crash or restart, the cache would need to be repopulated, but having the cache prevents the controller from needing to fetch all ResourceBindings on every reconcile.

During reconcile, cache updates would occur when:
1. The reconciled ResourceBinding is not present in the cache.
2. The reconciled ResourceBinding has a spec change and should be updated in the cache.

The resource usage delta would then be computed by comparing the reconciled ResourceBinding's requirements against its cached entry, as sketched below.
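
A sketch of what that could look like, assuming a controller-local map keyed by `namespace/name`; the type and method names are hypothetical.

```go
package main

import (
    "sync"

    corev1 "k8s.io/api/core/v1"
)

// rbCache is a hypothetical controller-local cache of each binding's last
// known resource usage, keyed by "namespace/name".
type rbCache struct {
    mu    sync.Mutex
    usage map[string]corev1.ResourceList
}

// delta returns newUsage minus the cached usage for key, then updates the
// cache, so the quota status can be adjusted incrementally. Handling of
// resources that disappear from the binding is omitted for brevity.
func (c *rbCache) delta(key string, newUsage corev1.ResourceList) corev1.ResourceList {
    c.mu.Lock()
    defer c.mu.Unlock()
    old := c.usage[key]
    d := corev1.ResourceList{}
    for name, q := range newUsage {
        dq := q.DeepCopy()
        if prev, ok := old[name]; ok {
            dq.Sub(prev)
        }
        d[name] = dq
    }
    c.usage[key] = newUsage
    return d
}
```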

### Scheduler Change

Note: Since the controller is listening to ResourceBindings, the FederatedResourceQuota usage will be calculated after the ResourceBinding has been created or updated.

If a user bulk-applies many resources at once, it is possible for the user to exceed the quota's limits. In this case, we should also check that the quota is honored before deciding to schedule the resource to a member cluster.

### Admission Webhook

As part of this change, we will introduce a new validating webhook:

1. The new validating webhook will watch all types of resources, or at least all native workloads (e.g. Deployments) and supported CRDs (e.g. FlinkDeployments). The webhook will use Karmada's `ResourceInterpreter#GetReplicas` to calculate the predicted delta in resource usage for the quota. If the applied resource would exceed the limit, the webhook will deny the request.
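
The core admission check could look like the following sketch; it leaves aside how the predicted usage of the incoming object is derived (the proposal suggests `ResourceInterpreter#GetReplicas`), and the function name is hypothetical.

```go
package main

import (
    corev1 "k8s.io/api/core/v1"
)

// exceedsQuota reports whether adding `requested` (the predicted usage of the
// incoming object) to `used` (the usage tracked in the quota status) would
// exceed any limit declared in `overall` (the quota's Overall field). The
// webhook would deny the request, naming the offending resource, when this
// returns true.
func exceedsQuota(requested, used, overall corev1.ResourceList) (corev1.ResourceName, bool) {
    for name, limit := range overall {
        sum := used[name].DeepCopy()
        if req, ok := requested[name]; ok {
            sum.Add(req)
        }
        if sum.Cmp(limit) > 0 {
            return name, true
        }
    }
    return "", false
}
```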
Lines changed: 23 additions & 0 deletions
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-flinkdeployment
  namespace: user-ns
spec:
  failover:
    application:
      decisionConditions:
        tolerationSeconds: 150
      purgeMode: Immediately
      statePreservation:
        rules:
          - aliasLabelName: resourcebinding.karmada.io/failover-jobid
            jsonPath: '{ .jobStatus.jobId }'
...
