A76-ring-hash-improvements.md (29 changes: 23 additions & 6 deletions)
@@ -157,11 +157,22 @@ if !using_random_hash {
return ring[first_index].picker->Pick(...);
```

This behavior ensures that a single RPC does not cause more than one endpoint to
exit `IDLE` state at a time, and that a request missing the header does not
incur extra latency in the common case where there is already at least one
endpoint in `READY` state. It converges to picking a random endpoint, since each
request may eventually cause a random endpoint to go from `IDLE` to `READY`.
This behavior ensures that a single RPC does not cause more than two endpoint to
**Member:**
Here we say that it's bounded to 2, but I think the problem described in the next paragraph means that it's actually not bounded at all. It's conceivably possible to have a case where all endpoints are in IDLE state and we get a bunch of picks on the current picker in parallel, where each pick could trigger a connection attempt on a different IDLE endpoint before the picker gets updated.

If this is indeed completely unbounded, then I think it would make sense for us to use an atomic for picker_has_endpoint_in_connecting_state and do a CAS loop to set it, so that we can guarantee that we'll trigger only one connection attempt at a time.

@ejona86, @dfawley, thoughts?
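
A minimal sketch of the atomic approach suggested here, with illustrative names (`Picker`, `TryRequestConnection`) rather than actual gRPC code: a compare-and-swap on the shared flag lets exactly one of the racing picks start a connection attempt, no matter how many picks run in parallel on the same picker.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Illustrative stand-in for a picker shared by many concurrent pick calls.
struct Picker {
  // Replaces a plain picker_has_endpoint_in_connecting_state bool with an
  // atomic, so concurrent picks cannot all observe "false" and each start
  // their own connection attempt.
  std::atomic<bool> connection_attempt_requested{false};

  // Returns true for exactly one caller; everyone else just queues the pick.
  bool TryRequestConnection() {
    bool expected = false;
    return connection_attempt_requested.compare_exchange_strong(expected, true);
  }
};

int main() {
  Picker picker;
  std::atomic<int> attempts{0};
  std::vector<std::thread> picks;
  for (int i = 0; i < 100; ++i) {  // 100 picks racing on the same picker
    picks.emplace_back([&] {
      if (picker.TryRequestConnection()) attempts.fetch_add(1);
    });
  }
  for (auto& t : picks) t.join();
  std::printf("connection attempts started: %d\n", attempts.load());  // always 1
}
```

(A single compare-and-swap is enough for a set-once flag; a full CAS loop would only matter if the flag could also be reset while picks are in flight.)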

**Member:**
I've added a test in C-core that proves that the number of simultaneous connection attempts is indeed unbounded, and proves that using an atomic fixes this problem:

grpc/grpc#39150

**Member:**
> bunch of picks on the current picker in parallel

That means you have multiple RPCs. The text here is that a single RPC can't cause a bunch of connections. I thought we've always accepted that multiple RPCs cause more aggressive connection creation, and can even consider that good.

**Member:**
Oh, that's a good point. We are actually okay with starting a separate connection attempt for each RPC based on its hash -- we do the same thing in the xDS case, where the random hash for a given RPC is stable across multiple pick attempts for that RPC. (In the non-xDS case, we will actually inhibit those multiple connection attempts once the picker gets updated due to picker_has_endpoint_in_connecting_state, but that's probably fine.)

Given that, I think we can stick with the current implementation without using an atomic. However, let's change the paragraph below to clarify that we can still start multiple simultaneous connection attempts for different RPCs, just not for a single RPC.

**Contributor Author:**
Sounds good, I've updated the paragraph. How do you all feel about the second question from the PR description:

> If we queued the pick because there was no endpoint in READY state and there were endpoints in IDLE or CONNECTING state, then if a queued pick is triggered when all endpoints in CONNECTING have transitioned to either READY or TRANSIENT_FAILURE, we may end up triggering a second connection attempt on the next pick for an RPC that already triggered one.

I ran into this when asserting that a single RPC only wakes up a single endpoint, when there are $n$ idle endpoints:

  1. The picker randomly chooses an endpoint in IDLE state and triggers a first connection attempt to that backend. The pick is queued.
  2. The picker is updated when the endpoint transitions to CONNECTING state; no new connection is triggered because picker_has_endpoint_in_connecting_state is true. There is no endpoint in READY state. The pick is queued again.
  3. The picker is updated when the endpoint transitions to READY state. Since there is no other endpoint in CONNECTING, if the random endpoint is not the one in READY state (probability $\frac{n-1}{n}$), then a second connection attempt is triggered.

Just want to make sure you're OK with this.
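
In this scenario, with all $n$ endpoints initially `IDLE`, the first attempt in step 1 always happens and the second attempt in step 3 happens with probability $\frac{n-1}{n}$, so a single RPC wakes up at most two endpoints (matching the updated paragraph above), with expected value

$$1 + \frac{n-1}{n} < 2.$$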

**Member:**
Suggested change:
- This behavior ensures that a single RPC does not cause more than two endpoint to
+ This behavior ensures that a single RPC does not cause more than two endpoints to

**Contributor Author:**
Done.

exit `IDLE` state, and that a request missing the header does not incur extra
latency in the common case where there is already at least one endpoint in
`READY` state. It converges to picking a random endpoint if the header is not
set, since each request may cause one or two random endpoints to go from `IDLE`
to `READY`:
- a first connection attempt if it randomly picks an `IDLE` endpoint and no
endpoint is currently `CONNECTING`
- a second connection attempt if the pick was queued and an endpoint
transitioned from `CONNECTING` to `READY`, but no other endpoint is currently
`CONNECTING`.

Because pickers generated with `picker_has_endpoint_in_connecting_state` set to
false may be used by multiple requests until an endpoint transitions to
`CONNECTING` and the picker is updated, we can still start multiple
simultaneous connection attempts for different RPCs.
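
As a simplified illustration of the behavior described above for a request without the hash header, the sketch below uses one ring entry per endpoint and illustrative names (`RingEntry`, `TriggerConnectionAttempt`); it is not the actual picker code from this gRFC. The pick scans the ring from a random position, completes on the first `READY` endpoint it finds, and wakes up at most one `IDLE` endpoint per pick attempt when the picker was built with `picker_has_endpoint_in_connecting_state` set to false.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Simplified model: one ring entry per endpoint, holding that endpoint's state.
enum class State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE };
struct RingEntry { State state; };

enum class PickResult { COMPLETE, QUEUE };

struct RandomHashPicker {
  std::vector<RingEntry> ring;
  // Snapshot taken when this picker was built; it is not updated while the
  // picker is in use, which is why different RPCs can still race to start
  // their own connection attempts before the next picker update.
  bool picker_has_endpoint_in_connecting_state = false;

  PickResult Pick(std::mt19937& rng) {
    if (ring.empty()) return PickResult::QUEUE;
    // No request hash header: start from a uniformly random ring position.
    std::uniform_int_distribution<size_t> dist(0, ring.size() - 1);
    const size_t start = dist(rng);
    bool requested_connection = picker_has_endpoint_in_connecting_state;
    for (size_t i = 0; i < ring.size(); ++i) {
      RingEntry& entry = ring[(start + i) % ring.size()];
      // Common case: some endpoint is already READY, so the pick completes
      // without adding latency.
      if (entry.state == State::READY) return PickResult::COMPLETE;
      // Wake up at most one IDLE endpoint per pick attempt.
      if (entry.state == State::IDLE && !requested_connection) {
        TriggerConnectionAttempt(entry);
        requested_connection = true;
      }
    }
    return PickResult::QUEUE;  // wait for the next picker update
  }

  static void TriggerConnectionAttempt(RingEntry& entry) {
    entry.state = State::CONNECTING;  // stand-in for kicking off a connection
  }
};
```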

### Explicitly setting the endpoint hash key

@@ -178,6 +189,10 @@ endpoint attribute to the value of [LbEndpoint.Metadata][LbEndpoint.Metadata]
`envoy.lb` `hash_key` field, as described in [Envoy's documentation for the ring
hash load balancer][envoy-ringhash].

If multiple endpoints have the same `hash_key`, then which endpoint is chosen is
**Member:**
I will note that this problem is not strictly new in this gRFC, because it's conceivably possible for there to be hash collisions even in the original design. But I agree that the fact that we're allowing hash keys to be specified manually makes it more likely, so I'm fine with noting this here.

**Contributor Author:**
Done.

unspecified, and the implementation is free to choose any of the endpoint with a
**Member:**
Suggested change:
- unspecified, and the implementation is free to choose any of the endpoint with a
+ unspecified, and the implementation is free to choose any of the endpoints with a

**Contributor Author:**
Done.

matching `hash_key`.

### Temporary environment variable protection

Explicitly setting the request hash key will be gated by the
@@ -230,7 +245,9 @@ considered the following alternative solutions:

## Implementation

Implemented in C-core in https://github.com/grpc/grpc/pull/38312.
Implemented in:
- C-core: https://github.com/grpc/grpc/pull/38312
- Go: https://github.com/grpc/grpc-go/pull/8159

[A42]: A42-xds-ring-hash-lb-policy.md
[A61]: A61-IPv4-IPv6-dualstack-backends.md