Fix: emit streaming usage once at final chunk for OpenAI/Azure #13386
Motivation and Context
For non‑streaming chat completions, the OpenAI / Azure OpenAI connector already logs token usage via `LogUsage`, which allows downstream systems to track prompt/completion/total tokens and costs.
For streaming chat completions, however, the .NET OpenAI connector currently never logs usage, even when the OpenAI API is called with `stream_options: { "include_usage": true }` and sends a final usage‑only chunk. As a result, consumers that rely on these metrics (e.g., for cost and token‑usage tracking) see no usage data at all for streaming calls, while non‑streaming calls behave as expected.
This PR fixes that gap and aligns the .NET connector’s behavior with both the OpenAI API semantics and the existing Python Semantic Kernel fix for streaming usage reporting.
Description
OpenAI’s chat completions API, when used with `stream_options: { "include_usage": true }`, emits:
- normal streaming chunks with `choices` populated and `usage == null`, and
- a final chunk where `choices` is empty and `usage` contains the final token counts (see the trimmed example below).
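For illustration, a trimmed final chunk has roughly this shape (the field values here are made up; the structure follows OpenAI's documented `stream_options` behavior):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 42,
    "total_tokens": 59
  }
}
```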
The non‑streaming path in the .NET OpenAI connector already calls `LogUsage(chatCompletion.Usage)`. The streaming path, however, did not call `LogUsage` at all, so usage was never reported for streaming completions.
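For comparison, the existing non‑streaming logging amounts to this single call (paraphrased; `chatCompletion` is the completed `ChatCompletion` response):

```csharp
// Non-streaming path (already in place): log usage from the completed response.
this.LogUsage(chatCompletion.Usage);
```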
This change updates `ClientCore.ChatCompletion.GetStreamingChatMessageContentsAsync` as follows (a sketch follows the list):
- Introduces a `ChatTokenUsage? finalUsage` local variable alongside the existing streaming state.
- On each `StreamingChatCompletionUpdate`, if `chatCompletionUpdate.Usage` is non‑null, overwrites `finalUsage` with that value.
- After the streaming loop completes, if `finalUsage` is not null, calls `this.LogUsage(finalUsage)` once.
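A minimal sketch of the resulting method body, assuming the update stream is named `response` (the real method carries additional streaming and function‑calling state that is unchanged by this PR):

```csharp
// using OpenAI.Chat; — provides ChatTokenUsage and StreamingChatCompletionUpdate.
ChatTokenUsage? finalUsage = null;

await foreach (StreamingChatCompletionUpdate chatCompletionUpdate in response.ConfigureAwait(false))
{
    // Usage is null on ordinary content chunks and non-null only on the final
    // usage-only chunk, so keeping the last non-null value captures the totals.
    if (chatCompletionUpdate.Usage is not null)
    {
        finalUsage = chatCompletionUpdate.Usage;
    }

    // ... existing per-chunk processing and yield of StreamingChatMessageContent ...
}

// Emit a single, consolidated usage event, mirroring the non-streaming path.
if (finalUsage is not null)
{
    this.LogUsage(finalUsage);
}
```

Overwriting rather than accumulating is deliberate: the API reports the complete token counts in the single final usage‑only chunk, so the last non‑null `Usage` already holds the totals.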
There are no changes to the public API surface and no behavior changes for the non‑streaming path. The fix is intentionally minimal and makes the streaming path emit a single, consolidated usage event, consistent with:
- the non‑streaming behavior in .NET,
- the OpenAI API’s final usage‑only chunk semantics, and
- the previously merged Python Semantic Kernel fix for streaming usage.
Contribution Checklist
- [x] The code builds clean without any errors or warnings
- [x] The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
- [x] All unit tests pass
- [x] I didn't break anyone today 😄