Fix: emit streaming usage once at final chunk for OpenAI/Azure #13386
Motivation and Context
For non‑streaming chat completions, the OpenAI / Azure OpenAI connector already logs token usage via `LogUsage`, which allows downstream systems to track prompt/completion/total tokens and costs.
For streaming chat completions, however, the .NET OpenAI connector currently never logs usage, even when the OpenAI API is called with `stream_options: { "include_usage": true }` and sends a final usage‑only chunk. As a result, consumers that rely on these metrics (e.g., for cost and token‑usage tracking) see no usage data at all for streaming calls, while non‑streaming calls behave as expected.
This PR fixes that gap and aligns the .NET connector’s behavior with both the OpenAI API semantics and the existing Python Semantic Kernel fix for streaming usage reporting.
Description
OpenAI’s chat completions API, when used with `stream_options: { "include_usage": true }`, emits:
- normal streaming chunks with `choices` populated and `usage == null`, and
- a final chunk where `choices` is empty and `usage` contains the final token counts (see the trimmed example below).
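For illustration, a trimmed final chunk has roughly this shape (the field values here are made up; the structure follows OpenAI's documented `stream_options` behavior):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 42,
    "total_tokens": 59
  }
}
```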
The non‑streaming path in the .NET OpenAI connector already calls `LogUsage(chatCompletion.Usage)`. The streaming path, however, did not call `LogUsage` at all, so usage was never reported for streaming completions.
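For comparison, the existing non‑streaming logging amounts to this single call (paraphrased; `chatCompletion` is the completed `ChatCompletion` response):

```csharp
// Non-streaming path (already in place): log usage from the completed response.
this.LogUsage(chatCompletion.Usage);
```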
This change updates `ClientCore.ChatCompletion.GetStreamingChatMessageContentsAsync` as follows (a sketch follows the list):
- Introduces a `ChatTokenUsage? finalUsage` local variable alongside the existing streaming state.
- On each `StreamingChatCompletionUpdate`, if `chatCompletionUpdate.Usage` is non‑null, overwrites `finalUsage` with that value.
- After the streaming loop completes, if `finalUsage` is not null, calls `this.LogUsage(finalUsage)` once.
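A minimal sketch of the resulting method body, assuming the update stream is named `response` (the real method carries additional streaming and function‑calling state that is unchanged by this PR):

```csharp
// using OpenAI.Chat; — provides ChatTokenUsage and StreamingChatCompletionUpdate.
ChatTokenUsage? finalUsage = null;

await foreach (StreamingChatCompletionUpdate chatCompletionUpdate in response.ConfigureAwait(false))
{
    // Usage is null on ordinary content chunks and non-null only on the final
    // usage-only chunk, so keeping the last non-null value captures the totals.
    if (chatCompletionUpdate.Usage is not null)
    {
        finalUsage = chatCompletionUpdate.Usage;
    }

    // ... existing per-chunk processing and yield of StreamingChatMessageContent ...
}

// Emit a single, consolidated usage event, mirroring the non-streaming path.
if (finalUsage is not null)
{
    this.LogUsage(finalUsage);
}
```

Overwriting rather than accumulating is deliberate: the API reports the complete token counts in the single final usage‑only chunk, so the last non‑null `Usage` already holds the totals.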
There are no changes to the public API surface and no behavior changes for the non‑streaming path. The fix is intentionally minimal and makes the streaming path emit a single, consolidated usage event, consistent with:
- the non‑streaming behavior in .NET,
- the OpenAI API’s final usage‑only chunk semantics, and
- the previously merged Python Semantic Kernel fix for streaming usage.
Contribution Checklist
- [x] The code builds clean without any errors or warnings
- [x] The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
- [x] All unit tests pass
- [x] I didn't break anyone today 😄