-
Notifications
You must be signed in to change notification settings - Fork 5k
Description
Summary
The implementation of the System.ClientModel
retry policy will strictly honor a Retry-After
header returned regardless of the length it specifies. (src)
Though the retry and overall network request is governed by a cancellation token, an extreme Retry-After
value can cause an unexpected delay that strongly resembles a client hang to callers. For example, in OpenAI #404, the service returns a Retry-After
of more than 114 hours (~4.75 DAYS). It seems inappropriate to unconditionally honor this.
Proposed change
I'd propose that when evaluating a Retry-After
header, the policy applies the following logic:
-
Calculate the delay for the current retry.
-
If the
Retry-After
duration exceeds that duration, "consume" another retry. -
If the
Retry-After
duration exceeds the total duration from those retry attempts, repeat. -
If the
Retry-After
duration is less or equal to the total duration from consumed retry attempts, adjust the retry counter to match and apply the delay. -
If all retries were consumed and the
Retry-After
duration exceeds the total duration from all configured retries, do not retry. Just bubble the original exception.
Why am I proposing this?
-
The current behavior for an extreme
Retry-After
is unexpected and confusing. -
User studies have shown that a small number of callers make use of cancellation tokens.
-
If the client knows the retry would fail after a period as long as all of the configured retries, it is better to return control to the caller and allow them to evaluate whether or not to attempt the operation again. Applying an unreasonably long retry delay impacts the caller's responsiveness to its users.