
Conversation

@Cozmopolit

Motivation and Context
For non‑streaming chat completions, the OpenAI / Azure OpenAI connector already logs token usage via LogUsage, which allows downstream systems to track prompt/completion/total tokens and costs.

For streaming chat completions, however, the .NET OpenAI connector currently never logs usage, even when the OpenAI API is called with stream_options: { "include_usage": true } and sends a final usage‑only chunk. As a result, consumers that rely on these metrics (e.g., for cost and token‑usage tracking) see no usage data at all for streaming calls, while non‑streaming calls behave as expected.

This PR fixes that gap and aligns the .NET connector’s behavior with both the OpenAI API semantics and the existing Python Semantic Kernel fix for streaming usage reporting.

Description
OpenAI’s chat completions API, when used with stream_options: { "include_usage": true }, emits:

- normal streaming chunks with choices populated and usage == null, and
- a final chunk where choices is empty and usage contains the final token counts.
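
For illustration, the final usage-only chunk has roughly this shape (the id and token counts below are placeholder values):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145
  }
}
```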
The non‑streaming path in the .NET OpenAI connector already calls LogUsage(chatCompletion.Usage). The streaming path, however, did not call LogUsage at all, so usage was never reported for streaming completions.

This change updates ClientCore.ChatCompletion.GetStreamingChatMessageContentsAsync as follows (see the sketch after this list):

- Introduces a ChatTokenUsage? finalUsage local variable alongside the existing streaming state.
- On each StreamingChatCompletionUpdate, if chatCompletionUpdate.Usage is non‑null, it overwrites finalUsage with that value.
- After the streaming loop completes, if finalUsage is not null, it calls this.LogUsage(finalUsage) once.
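
A minimal sketch of the shape of the change, with the connector internals elided. The wrapper method, its parameters, and the logUsage delegate below are illustrative stand-ins; in the connector the logic sits directly inside GetStreamingChatMessageContentsAsync and calls the existing ClientCore.LogUsage helper:

```csharp
using System;
using System.Collections.Generic;
using OpenAI.Chat;

// Illustrative sketch only: the real change lives inside
// ClientCore.ChatCompletion.GetStreamingChatMessageContentsAsync.
internal static class StreamingUsageSketch
{
    public static async IAsyncEnumerable<StreamingChatCompletionUpdate> StreamAndLogUsageAsync(
        IAsyncEnumerable<StreamingChatCompletionUpdate> updates,
        Action<ChatTokenUsage> logUsage) // stand-in for this.LogUsage
    {
        ChatTokenUsage? finalUsage = null;

        await foreach (StreamingChatCompletionUpdate update in updates)
        {
            // With stream_options.include_usage enabled, only the final
            // usage-only chunk carries a non-null Usage; the last non-null
            // value wins.
            if (update.Usage is not null)
            {
                finalUsage = update.Usage;
            }

            yield return update; // existing per-chunk handling continues as before
        }

        // Emit a single, consolidated usage event once the stream completes,
        // mirroring LogUsage(chatCompletion.Usage) on the non-streaming path.
        if (finalUsage is not null)
        {
            logUsage(finalUsage);
        }
    }
}
```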
There are no changes to the public API surface and no behavior changes for the non‑streaming path. The fix is intentionally minimal and makes the streaming path emit a single, consolidated usage event, consistent with:

- the non‑streaming behavior in .NET,
- the OpenAI API’s final usage‑only chunk semantics, and
- the previously merged Python Semantic Kernel fix for streaming usage.

Contribution Checklist
- [x] The code builds clean without any errors or warnings
- [x] The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
- [x] All unit tests pass
- [x] I didn't break anyone today 😄

@markwallace-microsoft
Member

@Cozmopolit thanks for the contribution, the team will take a look and provide feedback
