Commit a26d7bd

Merge pull request kubernetes#3765 from robscott/topology-hints-1-27-updates
KEP-2433 Topology Aware Hints: Adding SameZone heuristic and other tweaks
2 parents 33f7b95 + 1b4fccd commit a26d7bd

3 files changed: 113 additions and 65 deletions

keps/prod-readiness/sig-network/2433.yaml

Lines changed: 0 additions & 2 deletions
@@ -3,5 +3,3 @@ alpha:
33
approver: "@wojtek-t"
44
beta:
55
approver: "@wojtek-t"
6-
stable:
7-
approver: "@wojtek-t"

keps/sig-network/2433-topology-aware-hints/README.md

Lines changed: 110 additions & 60 deletions
@@ -8,19 +8,22 @@
88
- [Proposal](#proposal)
99
- [Risks and Mitigations](#risks-and-mitigations)
1010
- [Design Details](#design-details)
11-
- [Assumptions](#assumptions)
12-
- [Identifying Zones](#identifying-zones)
13-
- [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
1411
- [Configuration](#configuration)
15-
- [Interoperability](#interoperability)
16-
- [Feature Gate](#feature-gate)
12+
- [Interoperability](#interoperability)
13+
- [Feature Gate](#feature-gate)
1714
- [API](#api)
1815
- [Future API Expansion](#future-api-expansion)
1916
- [Kube-Proxy](#kube-proxy)
2017
- [EndpointSlice Controller](#endpointslice-controller)
18+
- [Heuristics](#heuristics)
19+
- [Proportional CPU Heuristic](#proportional-cpu-heuristic)
20+
- [Assumptions](#assumptions)
21+
- [Identifying Zones](#identifying-zones)
22+
- [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
2123
- [Example](#example)
2224
- [Overload](#overload)
2325
- [Handling Node Updates](#handling-node-updates)
26+
- [Additional Heuristics](#additional-heuristics)
2427
- [Future Expansion](#future-expansion)
2528
- [Test Plan](#test-plan)
2629
- [Unit tests](#unit-tests)
@@ -94,6 +97,7 @@ Kubernetes clusters are increasingly deployed in multi-zone environments.
9497
Network traffic is routed randomly to any endpoint matching a Service. Some
9598
users might want the traffic to stay in the same zone for the following
9699
reasons:
100+
97101
- Cost savings: Keeping traffic within a zone can limit cross-zone networking
98102
costs.
99103
- Performance: Traffic within a zone usually has less latency and bandwidth
@@ -125,10 +129,19 @@ for most use cases.
125129
- Ensuring that Pods are distributed evenly across zones.
126130

127131
## Proposal
132+
This KEP describes two related concepts:
133+
134+
1. A way to express the heuristic you'd like to use for Topology Aware Routing.
135+
2. A new Hints field in EndpointSlices that can be used to enable certain
136+
topology heuristics.
128137

129-
When this feature is enabled, the EndpointSlice controller will be updated to
130-
provide hints for each endpoint. These hints will initially be limited to a
131-
single zone per-endpoint. Kube-Proxy will then use these hints to filter the
138+
For now, the only heuristic proposed relies on hints so these concepts are
139+
closely tied. Note that this may not be the case for future
140+
heuristics.
141+
142+
When a heuristic that depends on Hints is chosen, the EndpointSlice controller
143+
will populate hints for each endpoint. These hints will initially be limited to
144+
a single zone per-endpoint. Kube-Proxy will then use these hints to filter the
132145
endpoints they should route to.
133146

134147
For example, for a Service with 3 endpoints, the EndpointSlice controller may
@@ -178,43 +191,16 @@ with a new Service annotation.
178191

179192
## Design Details
180193

181-
### Assumptions
182-
183-
- Incoming traffic is proportional to the number of allocatable CPU cores in a
184-
zone. Although this is an imperfect metric, it is the best available way of
185-
predicting how much traffic will be received in a zone. If we are unable to
186-
derive the number of allocatable cores in a zone we will fall back to the
187-
number of nodes in that zone.
188-
- Service capacity is proportional to the number of endpoints in a zone. This
189-
assumes that each endpoint has equivalent capacity. Although this is not
190-
always true, it usually is. We can explore ways to deal with variable capacity
191-
endpoints in the future.
192-
193-
### Identifying Zones
194-
195-
The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
196-
label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
197-
updated to read the same information to identify which zone it is running in.
198-
199-
### Excluding Control Plane Nodes
200-
201-
Any Nodes with the following labels (set to any value) will be excluded when
202-
calculating allocatable cores in a zone:
203-
204-
* `node-role.kubernetes.io/control-plane`
205-
* `node-role.kubernetes.io/master`
206-
207194
### Configuration
208195

209-
A new `service.kubernetes.io/topology-aware-routing` annotation can be used to
210-
enable or disable Topology Aware Routing (and by extension, hints) for a
211-
Service. This may be set to "Auto" or "Disabled". Any other value is treated as
212-
"Disabled".
196+
A new `service.kubernetes.io/topology-mode` annotation can be used to enable or
197+
disable Topology Aware Routing heuristics for a Service.
213198

214199
The previous `service.kubernetes.io/topology-aware-hints` annotation will
215-
continue to be supported as a means of configuring this feature.
200+
continue to be supported as a means of configuring this feature for both "Auto"
201+
and "Disabled" values. New values will only be supported by the new annotation.
216202

217-
#### Interoperability
203+
### Interoperability
218204

219205
Topology hints will be ignored if the TopologyKeys field has at least one entry.
220206
This field is deprecated and will be removed soon.
@@ -225,7 +211,7 @@ topology was enabled, external traffic would be routed using the
225211
ExternalTrafficPolicy configuration while internal traffic would be routed with
226212
topology.
227213

228-
#### Feature Gate
214+
### Feature Gate
229215

230216
This functionality will be guarded by the `TopologyAwareHints` feature gate.
231217
This gate also interacts with 2 other feature gates:
@@ -290,7 +276,6 @@ conditions are true:
290276

291277
- Kube-Proxy is able to determine the zone it is running within (likely based
292278
on node labels).
293-
- The annotation is set to `Auto`.
294279
- At least one endpoint for the Service has a hint pointing to the zone
295280
Kube-Proxy is running within.
296281
- All endpoints for the Service have zone hints.
@@ -304,17 +289,56 @@ and disabled states. Without this fallback, endpoints could easily get
304289
overloaded as hints were being added or removed from some EndpointSlices but
305290
had not yet propagated to all of them.
306291

292+
Note: Some future heuristics may not rely on hints and could instead be
293+
implemented directly by kube-proxy.
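
The hint-based filtering conditions listed above can be illustrated with a minimal Go sketch. This is not the kube-proxy implementation: the `Endpoint` type and the `filterByZoneHints` function are hypothetical names introduced here, and the sketch assumes hints are a plain list of zone names per endpoint.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of an EndpointSlice endpoint.
type Endpoint struct {
	Address   string
	ZoneHints []string // zones the endpoint is hinted for; empty means no hints
}

// filterByZoneHints returns the endpoints hinted for nodeZone. It falls back
// to the full list when the zone is unknown, when any endpoint lacks hints,
// or when no endpoint is hinted for nodeZone.
func filterByZoneHints(endpoints []Endpoint, nodeZone string) []Endpoint {
	if nodeZone == "" {
		return endpoints
	}
	var filtered []Endpoint
	for _, ep := range endpoints {
		if len(ep.ZoneHints) == 0 {
			// Hints may still be propagating, so do not filter at all.
			return endpoints
		}
		for _, zone := range ep.ZoneHints {
			if zone == nodeZone {
				filtered = append(filtered, ep)
				break
			}
		}
	}
	if len(filtered) == 0 {
		return endpoints
	}
	return filtered
}

func main() {
	endpoints := []Endpoint{
		{Address: "10.0.0.1", ZoneHints: []string{"zone-a"}},
		{Address: "10.0.0.2", ZoneHints: []string{"zone-b"}},
		{Address: "10.0.0.3", ZoneHints: []string{"zone-a"}},
	}
	for _, ep := range filterByZoneHints(endpoints, "zone-a") {
		fmt.Println(ep.Address)
	}
}
```

Returning the unfiltered list in every fallback case mirrors the intent described above: a partially hinted Service should behave as if hints were disabled rather than overload a subset of endpoints.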
294+
307295
### EndpointSlice Controller
308296

309297
When the `TopologyAwareHints` feature gate is enabled and the annotation is set
310-
to `Auto` for a Service, the EndpointSlice controller will add hints to
311-
EndpointSlices. These hints will indicate where an endpoint should be consumed
312-
by proxy implementations to enable topology aware routing.
298+
to `Auto` or `ProportionalByCore` for a Service, the EndpointSlice controller
299+
will add hints to EndpointSlices. These hints will indicate where an endpoint
300+
should be consumed by proxy implementations to enable topology aware routing.
301+
302+
## Heuristics
303+
304+
This KEP starts with the following heuristics:
305+
306+
| Heuristic Name | Description |
307+
|-|-|
308+
| Auto | EndpointSlice controller and/or underlying dataplane can choose the heuristic used. |
309+
| ProportionalByCore | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
310+
311+
In the future, additional heuristics may be added. Until that point, "Auto" will
312+
be the only configurable value. In most clusters, that will translate to
313+
`ProportionalByCore` unless the underlying dataplane has a better approach
314+
available.
313315

314-
The EndpointSlice controller will determine how many endpoints should be
315-
available for each zone based on the proportion of CPU cores in each zone. If
316-
it is not possible to determine the number of CPU cores, 1 core per node will be
317-
assumed for calculations.
316+
### Proportional CPU Heuristic
317+
#### Assumptions
318+
319+
- Incoming traffic is proportional to the number of allocatable CPU cores in a
320+
zone. Although this is an imperfect metric, it is the best available way of
321+
predicting how much traffic will be received in a zone. If we are unable to
322+
derive the number of allocatable cores in a zone we will fall back to the
323+
number of nodes in that zone.
324+
- Service capacity is proportional to the number of endpoints in a zone. This
325+
assumes that each endpoint has equivalent capacity. Although this is not
326+
always true, it usually is. We can explore ways to deal with variable capacity
327+
endpoints in the future.
328+
329+
#### Identifying Zones
330+
331+
The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
332+
label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
333+
updated to read the same information to identify which zone it is running in.
334+
335+
#### Excluding Control Plane Nodes
336+
337+
Any Nodes with the following labels (set to any value) will be excluded when
338+
calculating allocatable cores in a zone:
339+
340+
* `node-role.kubernetes.io/control-plane`
341+
* `node-role.kubernetes.io/master`
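
As a rough illustration of the assumptions and exclusions above, the following Go sketch computes each zone's share of allocatable cores. It is a simplified sketch, not the EndpointSlice controller's code: the `Node` type and the `zoneRatios` function are hypothetical, and falling back to node counts for all zones as soon as any node's cores are unknown is an assumption made here for brevity.

```go
package main

import "fmt"

const zoneLabel = "topology.kubernetes.io/zone"

var controlPlaneLabels = []string{
	"node-role.kubernetes.io/control-plane",
	"node-role.kubernetes.io/master",
}

// Node is a hypothetical, simplified stand-in for a Kubernetes Node.
type Node struct {
	Labels           map[string]string
	AllocatableCores float64 // 0 means the value could not be derived
}

// isControlPlane reports whether the node carries a control plane label,
// regardless of the label's value.
func isControlPlane(n Node) bool {
	for _, label := range controlPlaneLabels {
		if _, ok := n.Labels[label]; ok {
			return true
		}
	}
	return false
}

// zoneRatios returns each zone's share of allocatable cores, skipping
// control plane nodes and nodes without a zone label. If any counted node
// has unknown cores, it falls back to node counts (assumption of this sketch).
func zoneRatios(nodes []Node) map[string]float64 {
	cores := map[string]float64{}
	counts := map[string]float64{}
	coresKnown := true
	for _, n := range nodes {
		zone, ok := n.Labels[zoneLabel]
		if !ok || isControlPlane(n) {
			continue
		}
		counts[zone]++
		cores[zone] += n.AllocatableCores
		if n.AllocatableCores <= 0 {
			coresKnown = false
		}
	}
	weights := cores
	if !coresKnown {
		weights = counts
	}
	total := 0.0
	for _, w := range weights {
		total += w
	}
	ratios := map[string]float64{}
	for zone, w := range weights {
		ratios[zone] = w / total
	}
	return ratios
}

func main() {
	nodes := []Node{
		{Labels: map[string]string{zoneLabel: "zone-a"}, AllocatableCores: 6},
		{Labels: map[string]string{zoneLabel: "zone-b"}, AllocatableCores: 2},
	}
	ratios := zoneRatios(nodes)
	fmt.Println(ratios["zone-a"], ratios["zone-b"])
}
```

Multiplying each ratio by the Service's total endpoint count gives the number of endpoints a zone would be expected to receive under this heuristic.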
318342

319343
#### Example
320344

@@ -369,12 +393,20 @@ of the following scenarios:
369393
2. A new Node results in a Service that is able to achieve an endpoint
370394
distribution below 20% for the first time.
371395

396+
### Additional Heuristics
397+
To enable additional heuristics to be added in the future, we will:
398+
399+
1. Remove the requirement in kube-proxy that the hints annotation must be set to
400+
a known value on the associated Service before the values of EndpointSlice
401+
hints will be considered.
402+
2. Ensure the EndpointSlice controller TopologyCache provides an interface that
403+
simplifies adding additional heuristics in the future.
404+
372405
### Future Expansion
373406

374407
In the future we may expand this functionality if needed. This could include:
375408

376-
- A new `RequireZone` algorithm that would keep endpoints in EndpointSlices for
377-
the same zone they are in.
409+
- As described above, additional heuristics may be added in the future.
378410
- A new option to specify a minimum threshold for the `Auto` (PreferZone)
379411
approach.
380412
- Support for region based hints.
@@ -467,6 +499,16 @@ EndpointSliceSyncs = metrics.NewCounterVec(
467499
[]string{"result"}, // either "success" or "failure"
468500
)
469501
502+
// EndpointSliceEndpointsWithHints tracks the number of endpoints that have hints assigned.
503+
EndpointSliceEndpointsWithHints = metrics.NewGaugeVec(
504+
&metrics.GaugeOpts{
505+
Subsystem: EndpointSliceSubsystem,
506+
Name: "endpoints_with_hints",
507+
Help: "Number of endpoints that have hints assigned",
508+
StabilityLevel: metrics.ALPHA,
509+
},
510+
[]string{"result"}, // either "Auto" or "SameZone"
511+
)
470512
```
471513

472514
### Events
@@ -490,7 +532,7 @@ feature.
490532

491533
#### Sample Events
492534

493-
| Type | Reason | Message |
535+
| Type | Reason | Message |
494536
|-|-|-|
495537
| Normal | TopologyAwareRoutingEnabled | Topology Aware Routing has been enabled |
496538
| Normal | TopologyAwareRoutingDisabled | Topology Aware Routing configuration was removed |
@@ -532,11 +574,17 @@ completeness.
532574
disabled.
533575
- Ensure that existing Topology Hints e2e test runs as a presubmit if any code
534576
changes in kube-proxy or the EndpointSlice controller.
535-
- Topology Hints e2e tests will graduate to conformance tests.
536577
- Autoscaling and Scheduling SIGs have a plan to provide zone aware autoscaling
537578
(and scheduling) that allows users to proportionally distribute endpoints
538579
across zones.
539580

581+
**Note on Conformance Tests:**
582+
It's worth noting that conformance tests are intentionally out of scope for this
583+
KEP. We want to provide flexibility for underlying dataplanes to provide
584+
improved topology aware routing options. As the name suggests, "hints" can be
585+
useful when implementing topology aware routing, but we do not want them to be
586+
considered a strict requirement.
587+
540588
### Version Skew Strategy
541589
This KEP requires updates to both the EndpointSlice Controller and kube-proxy.
542590
Thus there could be two potential version skew scenarios:
@@ -559,6 +607,7 @@ enabled even if the annotation has been set on the Service.
559607
- [x] Feature gate (also fill in values in `kep.yaml`)
560608
- Feature gate name: TopologyAwareHints
561609
- Components depending on the feature gate:
610+
- kube-apiserver
562611
- kube-controller-manager
563612
- kube-proxy
564613

@@ -575,13 +624,14 @@ enabled even if the annotation has been set on the Service.
575624
EndpointSlices for Services that have this feature enabled.
576625

577626
* **Are there any tests for feature enablement/disablement?**
578-
Per Service enablement and disablement is covered in depth by unit tests. As a
579-
prerequisite for graduation to GA, we will also add the following:
580-
581-
- Test coverage in EndpointSlice strategy to ensure that the Hints field is
582-
dropped when the feature gate is not enabled.
583-
- Test coverage in EndpointSlice controller for the transition from enabled to
584-
disabled.
627+
Enablement is covered by a variety of tests:
628+
629+
* Per Service enablement and disablement in EndpointSlice Controller. [(Unit
630+
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/controller/endpointslice/reconciler_test.go#L1641-L1907)
631+
* Hints field is dropped when feature gate is off. [(Strategy Unit
632+
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/registry/discovery/endpointslice/strategy_test.go)
633+
* TODO before GA: Test coverage in EndpointSlice controller for the transition
634+
from enabled to disabled.
585635

586636
### Rollout, Upgrade and Rollback Planning
587637

keps/sig-network/2433-topology-aware-hints/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -23,18 +23,18 @@ replaces:
2323
- "github.com/kubernetes/enhancements/tree/master/keps/sig-network/536-topology-aware-routing"
2424

2525
# The target maturity stage in the current dev cycle for this KEP.
26-
stage: stable
26+
stage: beta
2727

2828
# The most recent milestone for which work toward delivery of this KEP has been
2929
# done. This can be the current (upcoming) milestone, if it is being actively
3030
# worked on.
31-
latest-milestone: "v1.26"
31+
latest-milestone: "v1.27"
3232

3333
# The milestone at which this feature was, or is targeted to be, at each stage.
3434
milestone:
3535
alpha: "v1.21"
3636
beta: "v1.23"
37-
stable: "v1.26"
37+
stable: "v1.28"
3838

3939
# The following PRR answers are required at alpha release
4040
# List the feature gate name and the components for which it must be enabled
