[BUG] Strange timeouts when interacting with Azure OpenAI Regional PTU #50809

@dominic-codespoti

Description
Library name and version

SemanticKernel 1.56.0

Describe the bug

We are short-circuiting long-running calls (past 5 seconds; it used to be past 10) to our model deployment. The majority of calls land within 0-2 seconds, but some never come back. Strangely, the stragglers mostly fall within a particular request-length bucket of 4,000-5,000. It feels like this might be some hot-path / sharding strategy based on token count or request length behind the scenes, but I'm not sure whether the SDK is doing something strange with the requests or retrying behind the scenes, as ~2,000 tokens on a 4o-mini PTU instance should be really, really quick.

The Semantic Kernel team also pointed me here. A support ticket has been raised in parallel.

Expected behavior

Responses within 0-2 seconds.

Actual behavior

Responses exceeding 10 seconds.

Reproduction Steps

Stand up a regional PTU for 4o-mini and hit it with a request of a length that coincides with the bucket outlined above.
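For context, the short-circuit described above might look like the following sketch, assuming an Azure.AI.OpenAI 2.x client; the endpoint, key variable, and deployment name are placeholders, and the prompt length is only an approximation of the problematic bucket.

```csharp
using System;
using System.ClientModel;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Azure.AI.OpenAI;
using OpenAI.Chat;

class Repro
{
    static async Task Main()
    {
        // Placeholder endpoint/key/deployment -- substitute your regional PTU deployment.
        var client = new AzureOpenAIClient(
            new Uri("https://<your-resource>.openai.azure.com/"),
            new ApiKeyCredential(Environment.GetEnvironmentVariable("AOAI_KEY")!));
        ChatClient chat = client.GetChatClient("gpt-4o-mini");

        // A request sized to land in the 4,000-5,000 length bucket described above.
        string prompt = new string('a', 4500);

        // Short-circuit any call that runs past 5 seconds, mirroring the report.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        var sw = Stopwatch.StartNew();
        try
        {
            ChatCompletion completion = await chat.CompleteChatAsync(
                new ChatMessage[] { new UserChatMessage(prompt) },
                cancellationToken: cts.Token);
            Console.WriteLine($"OK in {sw.ElapsedMilliseconds} ms");
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine($"Timed out after {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

Run in a loop, most iterations should print an "OK" line in the low hundreds of milliseconds, while the affected requests hit the 5-second cancellation.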

Environment

  • .NET 9

Metadata

Assignees

No one assigned

Labels

  • Client — This issue is related to a non-management package
  • OpenAI
  • Service Attention — Workflow: This issue is the responsibility of the Azure service team
  • customer-reported — Issues that are reported by GitHub users external to the Azure organization
  • needs-team-attention — Workflow: This issue needs attention from the Azure service team or SDK team
  • question — The issue doesn't require a change to the product in order to be resolved
