[BUG] Strange timeouts when interacting with Azure OpenAI Regional PTU

### Library name and version

SemanticKernel 1.56.0

### Describe the bug

![Image](https://github.com/user-attachments/assets/11db072c-ea2d-4ca3-9c4b-99adf9e9cabc)

We are short circuiting long running calls (past 5 seconds, used to be past 10) to our model deployment. A majority of the calls will land within 0-2 seconds, however some never come back. Strangely, they fit _mostly_ within a certain request length bucket, of 4000-5000. It feels like this might be some weird hot path / sharding strategy based on token size or request length behind the scenes, but I'm not sure if the SDK is doing something strange with the requests or retrying behind the scenes as 2000~ tokens on a 4o-mini PTU instance should be really, really quick.

The SemanticKernel team pointed me here also. A support ticket has been raised async.

### Expected behavior

Responses within 0-2 seconds.

### Actual behavior

Responses exceeding 10 seconds.

### Reproduction Steps

Stand up a regional PTU for 4o-mini and hit it with a request of a length that coincides with the bucket outlined above.

### Environment

- .NET 9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BUG] Strange timeouts when interacting with Azure OpenAI Regional PTU #50809

Library name and version

Describe the bug

Expected behavior

Actual behavior

Reproduction Steps

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] Strange timeouts when interacting with Azure OpenAI Regional PTU #50809

Description

Library name and version

Describe the bug

Expected behavior

Actual behavior

Reproduction Steps

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions