  - [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Design Details](#design-details)
- - [Assumptions](#assumptions)
- - [Identifying Zones](#identifying-zones)
- - [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
  - [Configuration](#configuration)
- - [Interoperability](#interoperability)
- - [Feature Gate](#feature-gate)
+ - [Interoperability](#interoperability)
+ - [Feature Gate](#feature-gate)
  - [API](#api)
  - [Future API Expansion](#future-api-expansion)
  - [Kube-Proxy](#kube-proxy)
  - [EndpointSlice Controller](#endpointslice-controller)
+ - [Heuristics](#heuristics)
+ - [Proportional CPU Heuristic](#proportional-cpu-heuristic)
+ - [Assumptions](#assumptions)
+ - [Identifying Zones](#identifying-zones)
+ - [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
  - [Example](#example)
  - [Overload](#overload)
  - [Handling Node Updates](#handling-node-updates)
+ - [Additional Heuristics](#additional-heuristics)
  - [Future Expansion](#future-expansion)
  - [Test Plan](#test-plan)
  - [Unit tests](#unit-tests)
@@ -94,6 +97,7 @@ Kubernetes clusters are increasingly deployed in multi-zone environments.
  Network traffic is routed randomly to any endpoint matching a Service. Some
  users might want the traffic to stay in the same zone for the following
  reasons:
+
  - Cost savings: Keeping traffic within a zone can limit cross-zone networking
    costs.
  - Performance: Traffic within a zone usually has less latency and bandwidth
@@ -125,10 +129,19 @@ for most use cases.
  - Ensuring that Pods are distributed evenly across zones.

  ## Proposal
+ This KEP describes two related concepts:
+
+ 1. A way to express the heuristic you'd like to use for Topology Aware Routing.
+ 2. A new Hints field in EndpointSlices that can be used to enable certain
+    topology heuristics.

- When this feature is enabled, the EndpointSlice controller will be updated to
- provide hints for each endpoint. These hints will initially be limited to a
- single zone per-endpoint. Kube-Proxy will then use these hints to filter the
+ For now, the only proposed heuristic relies on hints, so these concepts are
+ closely tied. It is important to note that this may not be the case for future
+ heuristics.
+
+ When a heuristic that depends on Hints is chosen, the EndpointSlice controller
+ will populate hints for each endpoint. These hints will initially be limited to
+ a single zone per-endpoint. Kube-Proxy will then use these hints to filter the
  endpoints they should route to.

  For example, for a Service with 3 endpoints, the EndpointSlice controller may
@@ -178,43 +191,16 @@ with a new Service annotation.
  ## Design Details

- ### Assumptions
-
- - Incoming traffic is proportional to the number of allocatable CPU cores in a
-   zone. Although this is an imperfect metric, it is the best available way of
-   predicting how much traffic will be received in a zone. If we are unable to
-   derive the number of allocatable cores in a zone we will fall back to the
-   number of nodes in that zone.
- - Service capacity is proportional to the number of endpoints in a zone. This
-   assumes that each endpoint has equivalent capacity. Although this is not
-   always true, it usually is. We can explore ways to deal with variable capacity
-   endpoints in the future.
-
- ### Identifying Zones
-
- The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
- label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
- updated to read the same information to identify which zone it is running in.
-
- ### Excluding Control Plane Nodes
-
- Any Nodes with the following labels (set to any value) will be excluded when
- calculating allocatable cores in a zone:
-
- * `node-role.kubernetes.io/control-plane`
- * `node-role.kubernetes.io/master`
-

  ### Configuration

- A new `service.kubernetes.io/topology-aware-routing` annotation can be used to
- enable or disable Topology Aware Routing (and by extension, hints) for a
- Service. This may be set to "Auto" or "Disabled". Any other value is treated as
- "Disabled".
+ A new `service.kubernetes.io/topology-mode` annotation can be used to enable or
+ disable Topology Aware Routing heuristics for a Service.

  The previous `service.kubernetes.io/topology-aware-hints` annotation will
- continue to be supported as a means of configuring this feature.
+ continue to be supported as a means of configuring this feature for both "Auto"
+ and "Disabled" values. New values will only be supported by the new annotation.

- #### Interoperability
+ ### Interoperability

  Topology hints will be ignored if the TopologyKeys field has at least one entry.
  This field is deprecated and will be removed soon.
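The annotation handling described under Configuration and Interoperability can be illustrated with a short Go sketch. It is not the EndpointSlice controller's code: `effectiveTopologyMode` and `hintsApply` are hypothetical names, and two behaviors the text leaves open are assumptions here: the new `topology-mode` annotation takes precedence when both annotations are set, and unrecognized values are treated as "Disabled" (mirroring the rule in the removed text). The recognized values `Auto` and `ProportionalByCore` come from the EndpointSlice Controller section.

```go
package main

import "fmt"

// Annotation keys named in this KEP.
const (
	topologyModeAnnotation = "service.kubernetes.io/topology-mode"
	legacyHintsAnnotation  = "service.kubernetes.io/topology-aware-hints"
)

// effectiveTopologyMode returns the heuristic a Service asks for, or
// "Disabled". Assumptions (not specified by the KEP): the new annotation
// wins when both are set, and unrecognized values disable the feature.
func effectiveTopologyMode(annotations map[string]string) string {
	if mode, ok := annotations[topologyModeAnnotation]; ok {
		switch mode {
		case "Auto", "ProportionalByCore":
			return mode
		default:
			return "Disabled"
		}
	}
	// The legacy annotation is only honored for "Auto" and "Disabled";
	// newer values such as "ProportionalByCore" are not accepted there.
	if annotations[legacyHintsAnnotation] == "Auto" {
		return "Auto"
	}
	return "Disabled"
}

// hintsApply reports whether hints should be used: a heuristic must be
// selected and the deprecated TopologyKeys field must be empty.
func hintsApply(annotations map[string]string, topologyKeys []string) bool {
	return len(topologyKeys) == 0 && effectiveTopologyMode(annotations) != "Disabled"
}

func main() {
	svc := map[string]string{legacyHintsAnnotation: "Auto"}
	// Prints "Auto": the legacy annotation still enables the feature.
	fmt.Println(effectiveTopologyMode(svc))
	// Prints "false": a non-empty TopologyKeys field overrides hints.
	fmt.Println(hintsApply(svc, []string{"kubernetes.io/hostname"}))
}
```

Restricting the legacy annotation to "Auto" and "Disabled" keeps existing Services working while routing any new heuristic names through the new annotation only.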
@@ -225,7 +211,7 @@ topology was enabled, external traffic would be routed using the
  ExternalTrafficPolicy configuration while internal traffic would be routed with
  topology.

- #### Feature Gate
+ ### Feature Gate

  This functionality will be guarded by the `TopologyAwareHints` feature gate.
  This gate also interacts with 2 other feature gates:
@@ -290,7 +276,6 @@ conditions are true:
  - Kube-Proxy is able to determine the zone it is running within (likely based
    on node labels).
- - The annotation is set to `Auto`.
  - At least one endpoint for the Service has a hint pointing to the zone
    Kube-Proxy is running within.
  - All endpoints for the Service have zone hints.
@@ -304,17 +289,56 @@ and disabled states. Without this fallback, endpoints could easily get
  overloaded as hints were being added or removed from some EndpointSlices but
  had not yet propagated to all of them.

+ Note: Some future heuristics may not rely on hints and could instead be
+ implemented directly by kube-proxy.
+
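The kube-proxy conditions and the fallback described above can be sketched in Go as follows. This is a minimal sketch, not kube-proxy's implementation: `Endpoint` is a hypothetical, simplified stand-in for an EndpointSlice endpoint (the real API stores zone hints under `hints.forZones`), and `filterByZoneHints` is an illustrative name.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified EndpointSlice endpoint: its address
// and the zones hinted for it (empty when the endpoint has no hints).
type Endpoint struct {
	Address   string
	HintZones []string
}

// filterByZoneHints returns the endpoints a proxy in localZone would route
// to. It filters only when the local zone is known, every endpoint has zone
// hints, and at least one hint matches localZone; otherwise it falls back
// to all endpoints.
func filterByZoneHints(endpoints []Endpoint, localZone string) []Endpoint {
	if localZone == "" {
		return endpoints // zone unknown: no filtering
	}
	var local []Endpoint
	for _, ep := range endpoints {
		if len(ep.HintZones) == 0 {
			return endpoints // some endpoint lacks hints: fall back
		}
		for _, z := range ep.HintZones {
			if z == localZone {
				local = append(local, ep)
				break
			}
		}
	}
	if len(local) == 0 {
		return endpoints // no endpoint hinted for this zone: fall back
	}
	return local
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.0.1", HintZones: []string{"zone-a"}},
		{Address: "10.0.0.2", HintZones: []string{"zone-b"}},
		{Address: "10.0.0.3", HintZones: []string{"zone-a"}},
	}
	fmt.Println(len(filterByZoneHints(eps, "zone-a"))) // 2
	fmt.Println(len(filterByZoneHints(eps, "zone-c"))) // 3 (fallback)
}
```

Falling back to every endpoint, rather than to none, is what keeps traffic flowing while hints are still propagating across EndpointSlices.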
  ### EndpointSlice Controller

  When the `TopologyAwareHints` feature gate is enabled and the annotation is set
- to `Auto` for a Service, the EndpointSlice controller will add hints to
- EndpointSlices. These hints will indicate where an endpoint should be consumed
- by proxy implementations to enable topology aware routing.
+ to `Auto` or `ProportionalByCore` for a Service, the EndpointSlice controller
+ will add hints to EndpointSlices. These hints will indicate where an endpoint
+ should be consumed by proxy implementations to enable topology aware routing.
+
+ ## Heuristics
+
+ This KEP starts with the following heuristics:
+
+ | Heuristic Name | Description |
+ |-|-|
+ | Auto | EndpointSlice controller and/or underlying dataplane can choose the heuristic used. |
+ | ProportionalByCore | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
+
+ In the future, additional heuristics may be added. Until that point, "Auto" will
+ be the only configurable value. In most clusters, that will translate to
+ `ProportionalByCore` unless the underlying dataplane has a better approach
+ available.
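The ProportionalByCore description can be made concrete with a small calculation: a zone's target share of endpoints equals its share of allocatable cores. The Go sketch below uses a hypothetical function name and stops at fractional targets; it does not show how the controller rounds those targets, reassigns endpoints between zones, or applies the overload threshold.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedEndpointsPerZone returns the (fractional) number of endpoints each
// zone should receive under a ProportionalByCore-style allocation:
// endpoints * zoneCores / totalCores.
func expectedEndpointsPerZone(coresByZone map[string]float64, totalEndpoints int) map[string]float64 {
	var totalCores float64
	for _, c := range coresByZone {
		totalCores += c
	}
	expected := make(map[string]float64, len(coresByZone))
	if totalCores == 0 {
		return expected // no capacity information: no targets
	}
	for zone, c := range coresByZone {
		expected[zone] = float64(totalEndpoints) * c / totalCores
	}
	return expected
}

func main() {
	// 40 cores in total; zone-a holds half of them.
	cores := map[string]float64{"zone-a": 20, "zone-b": 10, "zone-c": 10}
	expected := expectedEndpointsPerZone(cores, 8)

	zones := make([]string, 0, len(expected))
	for z := range expected {
		zones = append(zones, z)
	}
	sort.Strings(zones) // stable output order
	for _, z := range zones {
		fmt.Printf("%s: %.1f\n", z, expected[z])
	}
	// Output:
	// zone-a: 4.0
	// zone-b: 2.0
	// zone-c: 2.0
}
```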
- The EndpointSlice controller will determine how many endpoints should be
- available for each zone based on the proportion of CPU cores in each zone. If
- it is not possible to determine the number CPU cores, 1 core per node will be
- assumed for calculations.
+ ### Proportional CPU Heuristic
+ #### Assumptions
+
+ - Incoming traffic is proportional to the number of allocatable CPU cores in a
+   zone. Although this is an imperfect metric, it is the best available way of
+   predicting how much traffic will be received in a zone. If we are unable to
+   derive the number of allocatable cores in a zone we will fall back to the
+   number of nodes in that zone.
+ - Service capacity is proportional to the number of endpoints in a zone. This
+   assumes that each endpoint has equivalent capacity. Although this is not
+   always true, it usually is. We can explore ways to deal with variable capacity
+   endpoints in the future.
+
+ #### Identifying Zones
+
+ The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
+ label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
+ updated to read the same information to identify which zone it is running in.
+
+ #### Excluding Control Plane Nodes
+
+ Any Nodes with the following labels (set to any value) will be excluded when
+ calculating allocatable cores in a zone:
+
+ * `node-role.kubernetes.io/control-plane`
+ * `node-role.kubernetes.io/master`
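The zone identification, control plane exclusion, and fallback rules above can be combined into a per-zone core count. In this Go sketch, `Node` is a hypothetical simplified type rather than the real Node API object, and the fallback is applied per Node (a Node with unknown allocatable CPU counts as one core); when no Node in a zone reports CPU this equals the Node count the assumption describes, but the KEP does not pin down the granularity. Skipping Nodes without a zone label is also an assumption.

```go
package main

import "fmt"

const zoneLabel = "topology.kubernetes.io/zone"

// Node is a hypothetical, simplified view of a Kubernetes Node: its labels
// and its allocatable CPU in cores (0 when it could not be determined).
type Node struct {
	Labels           map[string]string
	AllocatableCores float64
}

// controlPlaneLabels exclude a Node when present, regardless of value.
var controlPlaneLabels = []string{
	"node-role.kubernetes.io/control-plane",
	"node-role.kubernetes.io/master",
}

func isControlPlane(n Node) bool {
	for _, l := range controlPlaneLabels {
		if _, ok := n.Labels[l]; ok {
			return true
		}
	}
	return false
}

// coresByZone sums allocatable cores per zone, skipping control plane Nodes
// and Nodes without a zone label. A Node with unknown CPU counts as 1 core.
func coresByZone(nodes []Node) map[string]float64 {
	totals := map[string]float64{}
	for _, n := range nodes {
		zone, ok := n.Labels[zoneLabel]
		if !ok || zone == "" || isControlPlane(n) {
			continue
		}
		if n.AllocatableCores > 0 {
			totals[zone] += n.AllocatableCores
		} else {
			totals[zone]++ // fallback: one core per Node
		}
	}
	return totals
}

func main() {
	nodes := []Node{
		{Labels: map[string]string{zoneLabel: "zone-a"}, AllocatableCores: 4},
		{Labels: map[string]string{zoneLabel: "zone-a", "node-role.kubernetes.io/control-plane": ""}, AllocatableCores: 8},
		{Labels: map[string]string{zoneLabel: "zone-b"}},
	}
	fmt.Println(coresByZone(nodes)) // map[zone-a:4 zone-b:1]
}
```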
  #### Example
@@ -369,12 +393,20 @@ of the following scenarios:
  2. A new Node results in a Service that is able to achieve an endpoint
     distribution below 20% for the first time.

+ ### Additional Heuristics
+ To enable additional heuristics to be added in the future, we will:
+
+ 1. Remove the requirement in kube-proxy that the hints annotation must be set to
+    a known value on the associated Service before the values of EndpointSlice
+    hints will be considered.
+ 2. Ensure the EndpointSlice controller TopologyCache provides an interface that
+    simplifies adding additional heuristics in the future.
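One possible shape for the extensible TopologyCache mentioned in item 2 is a small interface plus a registry keyed by annotation value. Everything in this Go sketch is hypothetical (`Heuristic`, `ZoneStats`, `register`, `lookup`); it only illustrates that adding a heuristic could mean adding one type, and it maps "Auto" to `ProportionalByCore` as the Heuristics section describes for most clusters.

```go
package main

import "fmt"

// ZoneStats is a hypothetical per-zone summary a TopologyCache might keep.
type ZoneStats struct {
	AllocatableCores float64
	Nodes            int
}

// Heuristic is a hypothetical plug-in interface: each heuristic names itself
// (the annotation value) and returns a relative weight for every zone, which
// the controller would then turn into per-zone endpoint targets.
type Heuristic interface {
	Name() string
	ZoneWeights(stats map[string]ZoneStats) map[string]float64
}

type proportionalByCore struct{}

func (proportionalByCore) Name() string { return "ProportionalByCore" }

// ZoneWeights weights each zone by its allocatable CPU cores.
func (proportionalByCore) ZoneWeights(stats map[string]ZoneStats) map[string]float64 {
	w := make(map[string]float64, len(stats))
	for zone, s := range stats {
		w[zone] = s.AllocatableCores
	}
	return w
}

var heuristics = map[string]Heuristic{}

// register makes a heuristic selectable by name; adding a heuristic only
// requires a new type and one register call.
func register(h Heuristic) { heuristics[h.Name()] = h }

// lookup resolves an annotation value, mapping "Auto" to ProportionalByCore.
func lookup(mode string) (Heuristic, bool) {
	if mode == "Auto" {
		mode = "ProportionalByCore"
	}
	h, ok := heuristics[mode]
	return h, ok
}

func init() { register(proportionalByCore{}) }

func main() {
	h, ok := lookup("Auto")
	fmt.Println(ok, h.Name()) // true ProportionalByCore
	w := h.ZoneWeights(map[string]ZoneStats{"zone-a": {AllocatableCores: 12, Nodes: 3}})
	fmt.Println(w["zone-a"]) // 12
}
```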
+

  ### Future Expansion

  In the future we may expand this functionality if needed. This could include:

- - A new `RequireZone` algorithm that would keep endpoints in EndpointSlices for
-   the same zone they are in.
+ - As described above, additional heuristics may be added in the future.
  - A new option to specify a minimum threshold for the `Auto` (PreferZone)
    approach.
  - Support for region based hints.
@@ -467,6 +499,16 @@ EndpointSliceSyncs = metrics.NewCounterVec(
  	[]string{"result"}, // either "success" or "failure"
  )

+ // EndpointSliceEndpointsWithHints tracks the number of endpoints that have hints assigned.
+ EndpointSliceEndpointsWithHints = metrics.NewGaugeVec(
+ 	&metrics.GaugeOpts{
+ 		Subsystem:      EndpointSliceSubsystem,
+ 		Name:           "endpoints_with_hints",
+ 		Help:           "Number of endpoints that have hints assigned",
+ 		StabilityLevel: metrics.ALPHA,
+ 	},
+ 	[]string{"result"}, // either "Auto" or "SameZone"
+ )
  ```

  ### Events
@@ -490,7 +532,7 @@ feature.
  #### Sample Events

- | Type | Reason | Message |
+ | Type | Reason | Message |
  |-|-|-|
  | Normal | TopologyAwareRoutingEnabled | Topology Aware Routing has been enabled |
  | Normal | TopologyAwareRoutingDisabled | Topology Aware Routing configuration was removed |
@@ -532,11 +574,17 @@ completeness.
    disabled.
  - Ensure that existing Topology Hints e2e test runs as a presubmit if any code
    changes in kube-proxy or the EndpointSlice controller.
- - Topology Hints e2e tests will graduate to conformance tests.
  - Autoscaling and Scheduling SIGs have a plan to provide zone aware autoscaling
    (and scheduling) that allows users to proportionally distribute endpoints
    across zones.

+ **Note on Conformance Tests:**
+ Conformance tests are intentionally out of scope for this KEP. We want to give
+ underlying dataplanes the flexibility to provide improved topology aware routing
+ options. As the name suggests, "hints" can be useful when implementing topology
+ aware routing, but we do not want them to be considered a strict requirement.
+
  ### Version Skew Strategy
  This KEP requires updates to both the EndpointSlice Controller and kube-proxy.
  Thus there could be two potential version skew scenarios:
@@ -559,6 +607,7 @@ enabled even if the annotation has been set on the Service.
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: TopologyAwareHints
    - Components depending on the feature gate:
+     - kube-apiserver
      - kube-controller-manager
      - kube-proxy
@@ -575,13 +624,14 @@ enabled even if the annotation has been set on the Service.
    EndpointSlices for Services that have this feature enabled.

  * **Are there any tests for feature enablement/disablement?**
- Per Service enablement and disablement is covered in depth by unit tests. As a
- prerequisite for graduation to GA, we will also add the following:
-
- - Test coverage in EndpointSlice strategy to ensure that the Hints field is
-   dropped when the feature gate is not enabled.
- - Test coverage in EndpointSlice controller for the transition from enabled to
-   disabled.
+ Enablement is covered by a variety of tests:
+
+ * Per Service enablement and disablement in EndpointSlice Controller. [(Unit
+   Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/controller/endpointslice/reconciler_test.go#L1641-L1907)
+ * Hints field is dropped when feature gate is off. [(Strategy Unit
+   Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/registry/discovery/endpointslice/strategy_test.go)
+ * TODO before GA: Test coverage in EndpointSlice controller for the transition
+   from enabled to disabled.

  ### Rollout, Upgrade and Rollback Planning