A76 update: Clarify behaviors discovered during implementation #489
The review comments below refer to this hunk in the A76 gRFC:

````diff
@@ -157,11 +157,22 @@ if !using_random_hash {
       return ring[first_index].picker->Pick(...);
 ```
-This behavior ensures that a single RPC does not cause more than one endpoint to
-exit `IDLE` state at a time, and that a request missing the header does not
-incur extra latency in the common case where there is already at least one
-endpoint in `READY` state. It converges to picking a random endpoint, since each
-request may eventually cause a random endpoint to go from `IDLE` to `READY`.
+This behavior ensures that a single RPC does not cause more than two endpoint to
````
Suggested change:

```diff
-This behavior ensures that a single RPC does not cause more than two endpoint to
+This behavior ensures that a single RPC does not cause more than two endpoints to
```
Done.
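As a rough sketch of the pick behavior the changed paragraph describes (simplified types and names, not the gRFC's actual pseudocode): if some endpoint is already `READY`, the pick completes with no extra latency; otherwise at most one `IDLE` endpoint is woken up per picker, and further picks simply queue.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-ins for the picker's view of the ring; not gRPC types.
enum class State { kIdle, kConnecting, kReady };
enum class PickResult { kComplete, kQueue };

struct RingEntry {
  State state = State::kIdle;
};

PickResult PickWithRandomHash(std::vector<RingEntry>& ring,
                              bool& picker_has_endpoint_in_connecting_state) {
  // Common case: some endpoint is already READY, so the pick completes
  // immediately and the request sees no extra latency.
  for (RingEntry& entry : ring) {
    if (entry.state == State::kReady) return PickResult::kComplete;
  }
  // Otherwise, wake up at most one IDLE endpoint per picker: once the flag is
  // set, later picks through this picker just queue, which is what keeps a
  // single RPC from kicking off many connection attempts.
  if (!picker_has_endpoint_in_connecting_state) {
    for (RingEntry& entry : ring) {
      if (entry.state == State::kIdle) {
        entry.state = State::kConnecting;  // Stand-in for triggering a connect.
        picker_has_endpoint_in_connecting_state = true;
        break;
      }
    }
  }
  return PickResult::kQueue;  // Wait for the next picker update.
}
```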
I will note that this problem is not strictly new in this gRFC, because it's conceivably possible for there to be hash collisions even in the original design. But I agree that the fact that we're allowing hash keys to be specified manually makes it more likely, so I'm fine with noting this here.
done.
Suggested change:

```diff
-unspecified, and the implementation is free to choose any of the endpoint with a
+unspecified, and the implementation is free to choose any of the endpoints with a
```
Done.
Here we say that it's bounded to 2, but I think the problem described in the next paragraph means that it's actually not bounded at all. It's conceivably possible to have a case where all endpoints are in IDLE state and we get a bunch of picks on the current picker in parallel, where each pick could trigger a connection attempt on a different IDLE endpoint before the picker gets updated.
If this is indeed completely unbounded, then I think it would make sense for us to use an atomic for `picker_has_endpoint_in_connecting_state` and do a CAS loop to set it, so that we can guarantee that we'll trigger only one connection attempt at a time. @ejona86, @dfawley, thoughts?
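To make that suggestion concrete, here is a minimal sketch of what the atomic approach could look like; the class and method names are illustrative, not the actual gRPC internals. For a boolean flag a single compare-and-swap is enough: exactly one concurrent pick wins it.

```cpp
#include <atomic>

// Rough sketch only: the picker holds an atomic flag, and only the pick that
// flips it from false to true triggers a connection attempt.
class Picker {
 public:
  void MaybeTriggerConnectionAttempt() {
    bool expected = false;
    // Exactly one concurrent pick wins this compare-and-swap; every other
    // pick sees the flag already set and simply queues.
    if (has_endpoint_in_connecting_state_.compare_exchange_strong(
            expected, true, std::memory_order_acq_rel)) {
      TriggerConnectionAttempt();  // Placeholder for kicking an IDLE endpoint.
    }
  }

 private:
  void TriggerConnectionAttempt() {}  // Stub so the sketch is self-contained.
  std::atomic<bool> has_endpoint_in_connecting_state_{false};
};
```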
I've added a test in C-core that proves that the number of simultaneous connection attempts is indeed unbounded, and proves that using an atomic fixes this problem:
grpc/grpc#39150
That means you have multiple RPCs. The text here is that a single RPC can't cause a bunch of connections. I thought we've always accepted that multiple RPCs cause more aggressive connection creation, and can even consider that good.
Oh, that's a good point. We are actually okay with starting a separate connection attempt for each RPC based on its hash -- we do the same thing in the xDS case, where the random hash for a given RPC is stable across multiple pick attempts for that RPC. (In the non-xDS case, we will actually inhibit those multiple connection attempts once the picker gets updated due to `picker_has_endpoint_in_connecting_state`, but that's probably fine.)

Given that, I think we can stick with the current implementation without using an atomic. However, let's change the paragraph below to clarify that we can still start multiple simultaneous connection attempts for different RPCs, just not for a single RPC.
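As a rough illustration of the "stable random hash per RPC" point above, a sketch along these lines (hypothetical names, not the actual gRPC API) shows how repeated pick attempts for one call could reuse the same hash:

```cpp
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical per-call state; real gRPC implementations track this
// differently, this is only to illustrate the idea.
struct CallPickState {
  std::optional<uint64_t> random_hash;
};

uint64_t GetStableRandomHash(CallPickState& call) {
  if (!call.random_hash.has_value()) {
    // Generate the hash once, on the first pick attempt for this call.
    static thread_local std::mt19937_64 rng{std::random_device{}()};
    call.random_hash = rng();
  }
  // Every later pick attempt for the same call reuses the same value, so
  // retries of this RPC keep targeting the same ring position.
  return *call.random_hash;
}
```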
Sounds good, I've updated the paragraph. How do you all feel about the second question from the PR description:
I ran into this when asserting that a single RPC only wakes up a single endpoint, when there are $n$ idle endpoints:

1. The first pick selects a random endpoint in `IDLE` state and triggers a first connection attempt to that backend. The pick is queued.
2. The chosen endpoint is now in `Connecting` state, so no new connection is triggered because `picker_has_endpoint_in_connecting_state` is true. There is no endpoint in `READY` state. The pick is queued again.
3. The connection attempt succeeds and that endpoint moves to `Ready` state. Since there is no other endpoint in `Connecting`, if the random endpoint is not the one in `READY` state (probability …), a connection attempt is triggered on a second `IDLE` endpoint.

Just want to make sure you're OK with this.
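Assuming the truncated probability above refers to a uniform random choice over the $n$ endpoints, exactly one of which is `READY`, it would work out to

$$P(\text{random endpoint is not the READY one}) = \frac{n-1}{n},$$

which approaches 1 as $n$ grows, so in this scenario the single RPC would usually end up waking a second endpoint.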