feat: add latency tracking and enhanced token usage details #665
Conversation
📝 Walkthrough

This PR instruments Vertex provider calls with latency tracking, adds detailed token usage breakdowns (prompt/completion subfields), introduces bidirectional usage conversion helpers, extends Gemini usage extraction, enhances pricing fallback logic for Vertex/model formats, and surfaces token detail fields in logging and UI.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant VertexProvider as Vertex Provider
    participant HTTP as HTTP Client
    Note over Client,VertexProvider: Chat/Embedding request flow with latency instrumentation
    Client->>VertexProvider: send Chat/Embedding request
    VertexProvider->>HTTP: perform HTTP call (startTime recorded)
    HTTP-->>VertexProvider: response
    Note over VertexProvider: compute latency = now - startTime<br/>populate ExtraFields.Latency (ms)
    VertexProvider-->>Client: return response with ExtraFields.Latency
```
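The instrumentation pattern above is simple to reproduce. Below is a minimal, self-contained Go sketch of the same idea; `ExtraFields`, `Response`, and `doWithLatency` are simplified stand-ins for illustration, not Bifrost's actual types:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Simplified stand-ins for the provider's response types; only the
// latency-related shape described in the PR is reproduced here.
type ExtraFields struct {
	Provider       string
	RequestType    string
	ModelRequested string
	Latency        int64 // milliseconds
}

type Response struct {
	ExtraFields ExtraFields
}

// doWithLatency measures only the HTTP call itself, matching the
// measurement window described in the review.
func doWithLatency(client *http.Client, req *http.Request) (*Response, error) {
	start := time.Now()
	httpResp, err := client.Do(req)
	latency := time.Since(start)
	if err != nil {
		return nil, err
	}
	defer httpResp.Body.Close()

	resp := &Response{}
	// Set fields incrementally instead of overwriting the whole struct,
	// as a later review comment recommends for the Claude branch.
	resp.ExtraFields.Provider = "vertex"
	resp.ExtraFields.RequestType = "chat"
	resp.ExtraFields.Latency = latency.Milliseconds()
	return resp, nil
}

func main() {
	req, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	resp, err := doWithLatency(http.DefaultClient, req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("latency: %dms\n", resp.ExtraFields.Latency)
}
```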
```mermaid
sequenceDiagram
    participant Caller
    participant Pricing as CalculateCostFromUsage
    participant PrimaryLookup
    participant VertexLookup
    participant ChatCompFallback
    Caller->>Pricing: requestType, model, provider
    Pricing->>PrimaryLookup: try primary lookup (e.g., Gemini)
    alt Primary found
        PrimaryLookup-->>Pricing: pricing
    else Primary not found
        PrimaryLookup-->>Pricing: not found
        Pricing->>VertexLookup: try Vertex for same model (or stripped model if provider/ prefix)
        alt Vertex found
            VertexLookup-->>Pricing: pricing
        else Vertex not found
            VertexLookup-->>Pricing: not found
            alt requestType is chat-related
                Pricing->>ChatCompFallback: try chat-completion pricing
                ChatCompFallback-->>Pricing: pricing or not found
            end
        end
    end
    Pricing-->>Caller: final pricing or default
```
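For reference, here is a runnable Go sketch of the layered lookup shown in the diagram. The map-based store, `makeKey` format, and request-type strings are assumptions made for the example; the real `framework/pricing` code differs in detail:

```go
package main

import (
	"fmt"
	"strings"
)

type Pricing struct{ InputPerToken, OutputPerToken float64 }

// makeKey is an illustrative key format, not the framework's actual one.
func makeKey(model, provider, requestType string) string {
	return model + "|" + provider + "|" + requestType
}

func getPricing(data map[string]Pricing, model, provider, requestType string) (Pricing, bool) {
	// 1. Primary lookup: exact model + provider + request type.
	if p, ok := data[makeKey(model, provider, requestType)]; ok {
		return p, true
	}
	// 2. Vertex fallback: same model, and the model with any "provider/"
	// prefix stripped, keyed under the vertex provider.
	if p, ok := data[makeKey(model, "vertex", requestType)]; ok {
		return p, true
	}
	if i := strings.Index(model, "/"); i >= 0 {
		if p, ok := data[makeKey(model[i+1:], "vertex", requestType)]; ok {
			return p, true
		}
	}
	// 3. Chat-completion fallback for responses-style requests.
	if requestType == "responses" || requestType == "responses_stream" {
		if p, ok := data[makeKey(model, provider, "chat")]; ok {
			return p, true
		}
	}
	return Pricing{}, false // caller applies a default
}

func main() {
	data := map[string]Pricing{
		makeKey("gemini-2.0-flash", "vertex", "chat"): {0.1, 0.4},
	}
	p, ok := getPricing(data, "google/gemini-2.0-flash", "gemini", "chat")
	fmt.Println(p, ok) // found via the vertex fallback with stripped prefix
}
```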
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/schemas/providers/cohere/chat.go (1)

291-310: Move CachedTokens and TotalTokens computation outside the Tokens nil check.

CachedTokens and Tokens are independent fields in the Cohere response. Currently, CachedTokens is gated inside the `if response.Usage.Tokens != nil` block, causing data loss when Tokens is nil; TotalTokens computation is likewise gated and stays at 0. The responses.go file correctly accesses CachedTokens at the top level (lines 107 and 648), showing the inconsistency. Apply the suggested fix to move both checks outside and compute TotalTokens unconditionally.
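A minimal sketch of the suggested restructuring, using simplified stand-in types rather than the real Cohere schema (`CohereUsage`, `LLMUsage`, and `convertUsage` are illustrative names):

```go
package main

import "fmt"

type CohereTokens struct{ InputTokens, OutputTokens int }

type CohereUsage struct {
	Tokens       *CohereTokens
	CachedTokens *int
}

type LLMUsage struct {
	PromptTokens, CompletionTokens, TotalTokens, CachedTokens int
}

func convertUsage(u CohereUsage) LLMUsage {
	var out LLMUsage
	if u.Tokens != nil {
		out.PromptTokens = u.Tokens.InputTokens
		out.CompletionTokens = u.Tokens.OutputTokens
	}
	// CachedTokens is independent of Tokens, so read it outside the nil check...
	if u.CachedTokens != nil {
		out.CachedTokens = *u.CachedTokens
	}
	// ...and compute TotalTokens unconditionally.
	out.TotalTokens = out.PromptTokens + out.CompletionTokens
	return out
}

func main() {
	cached := 42
	// CachedTokens survives even though Tokens is nil.
	fmt.Printf("%+v\n", convertUsage(CohereUsage{CachedTokens: &cached}))
}
```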
core/schemas/providers/gemini/utils.go (1)

361-372: Nil dereference risk: message.Content may be nil.

convertBifrostMessagesToGemini reads message.Content without a nil check, which will panic for tool-only or assistant-only messages.

```diff
-	// Handle content
-	if message.Content.ContentStr != nil && *message.Content.ContentStr != "" {
-		parts = append(parts, &CustomPart{
-			Text: *message.Content.ContentStr,
-		})
-	} else if message.Content.ContentBlocks != nil {
-		for _, block := range message.Content.ContentBlocks {
+	// Handle content
+	if message.Content != nil {
+		if message.Content.ContentStr != nil && *message.Content.ContentStr != "" {
+			parts = append(parts, &CustomPart{
+				Text: *message.Content.ContentStr,
+			})
+		} else if message.Content.ContentBlocks != nil {
+			for _, block := range message.Content.ContentBlocks {
 			if block.Text != nil {
 				parts = append(parts, &CustomPart{
 					Text: *block.Text,
 				})
 			}
-		}
+			}
+		}
 	}
```

plugins/logging/main.go (1)
303-307: Code clarity issue: latency is persisted, but via inconsistent paths.

The review concern is partially valid. While `updateStreamingLogEntry` doesn't receive a latency parameter (line 353), it does persist latency via `streamResponse.Data.Latency` (operations.go:170). However, `logMsg.Latency` (captured at lines 304-306) remains unused in the streaming path; it's dead code. The non-streaming path (line 321) explicitly passes latency to `updateLogEntry`, creating an asymmetrical and confusing pattern.

Recommendation: either pass `logMsg.Latency` to `updateStreamingLogEntry` to match the non-streaming pattern, or remove the orphaned `logMsg.Latency` assignment and rely solely on `streamResponse.Data.Latency`. This removes ambiguity and ensures consistent latency handling across both paths.
🧹 Nitpick comments (6)
framework/pricing/main.go (2)
497-550: Clarify log wording and consider reducing chat fallback duplication.

The verification confirms your observations:
- Three chat fallback patterns exist at lines 509, 529, and 540
- Lines 509 and 529 contain identical log text ("secondary lookup failed, trying vertex provider for the same model in chat completion")
- Line 540's wording "trying chat provider" is indeed misleading and should clarify it means chat-completion pricing
The suggested log text tweaks are sound and improve clarity:
- Line 509 & 529: Change to "secondary lookup failed, trying chat-completion pricing for the same model with vertex provider"
- Line 540: Change to "primary lookup failed, trying chat-completion pricing for the same model"
The optional refactoring suggestion to extract a helper function is valid—the triplication of this pattern (check ResponsesRequest/ResponsesStreamRequest, then attempt ChatCompletionRequest lookup) creates maintenance risk. However, this remains optional given the minimal scope.
493-505: Normalize provider at start of getPricing for consistent keying.

The verification confirms the inconsistency: `populateModelPool` (line 433) normalizes provider before keying, but `getPricing` uses the raw provider parameter directly in `makeKey` (line 497). This creates fragility: if pricingData keys use normalized providers, lookups fail with non-canonical inputs like "vertex_ai" or mixed-case variants, forcing the current fallback logic.

Normalizing provider at the function entry would align with the `populateModelPool` pattern and ensure consistent keying across all lookup locations (lines 497, 502, 520, 543).
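A sketch of what entry-point normalization could look like; `normalizeProvider` and its alias table are stand-ins for whatever `populateModelPool` actually uses:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeProvider is illustrative; the real normalization rules live in
// framework/pricing and may differ.
func normalizeProvider(p string) string {
	p = strings.ToLower(strings.TrimSpace(p))
	switch p {
	case "vertex_ai", "vertex-ai":
		return "vertex"
	}
	return p
}

func main() {
	// With normalization at the top of getPricing, all lookup sites key on
	// the same canonical provider string.
	fmt.Println(normalizeProvider("Vertex_AI")) // vertex
}
```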
populateModelPoolpattern and ensure consistent keying across all three lookup locations (lines 497, 502, 520, 543).core/providers/vertex.go (1)
498-505: Guard against double-prefix in embedding endpoint.

Vertex AI's embedding API expects the MODEL_ID to be just the trailing model identifier (e.g., "gemini-embedding-001", "textembedding-gecko@001"), not a fully qualified path like "publishers/google/models/textembedding-gecko@001". The current code at lines 498-505 in core/providers/vertex.go directly interpolates `request.Model` into the URL without sanitization.

There is no direct evidence that callers (such as the semanticcache plugin, which constructs a BifrostEmbeddingRequest) currently pass fully qualified model names, but the concern is valid from a defensive programming perspective: some Vertex AI documentation examples show the full resource name being used (e.g., "projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/text-embedding-004"), so callers or configuration sources could plausibly supply qualified names.

The proposed sanitization is reasonable defensive programming. Without evidence that the issue currently manifests, this is a preventive refactor rather than a critical bug fix, but it would guard against URL malformation if unexpected input formats are ever introduced.
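A runnable sketch of that sanitization, assuming the defensive rule is simply "keep the final path segment"; `modelID` is a hypothetical helper, not code from vertex.go:

```go
package main

import (
	"fmt"
	"strings"
)

// modelID keeps only the trailing MODEL_ID if a caller passes a fully
// qualified resource name; bare identifiers pass through unchanged.
func modelID(model string) string {
	if i := strings.LastIndex(model, "/"); i >= 0 {
		return model[i+1:]
	}
	return model
}

func main() {
	fmt.Println(modelID("publishers/google/models/textembedding-gecko@001")) // textembedding-gecko@001
	fmt.Println(modelID("gemini-embedding-001"))                             // gemini-embedding-001
}
```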
core/schemas/providers/gemini/transcription.go (1)
100-106: Always populate usage, even if no text extracted.

Currently, Usage is only set when text is present. Populate Usage whenever metadata exists so logs/costing remain accurate for empty/edge transcripts.
```diff
-	if textContent != "" {
-		bifrostResp.Text = textContent
-		bifrostResp.Task = schemas.Ptr("transcribe")
-
-		// Set usage information
-		bifrostResp.Usage = &schemas.TranscriptionUsage{
-			Type:         "tokens",
-			InputTokens:  &inputTokens,
-			OutputTokens: &outputTokens,
-			TotalTokens:  &totalTokens,
-		}
-	}
+	if textContent != "" {
+		bifrostResp.Text = textContent
+		bifrostResp.Task = schemas.Ptr("transcribe")
+	}
+	// Set usage information regardless of text presence
+	bifrostResp.Usage = &schemas.TranscriptionUsage{
+		Type:         "tokens",
+		InputTokens:  &inputTokens,
+		OutputTokens: &outputTokens,
+		TotalTokens:  &totalTokens,
+	}
```

plugins/logging/main.go (1)
454-458: Ensure pooled UpdateLogData is zeroed before reuse.

When returning UpdateLogData to the pool, clear all fields (including nested pointers/slices) to avoid leaking prior state across requests. Can you confirm `putUpdateLogData(...)` nils/zeroes every field?
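For illustration, a generic reset-before-Put pattern that answers this kind of question; `UpdateLogData`'s real fields differ, but assigning a fresh zero value clears nested pointers and slices in one step:

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-in; the plugin's real UpdateLogData has more fields.
type UpdateLogData struct {
	Latency *int64
	Tags    []string
}

var pool = sync.Pool{New: func() any { return new(UpdateLogData) }}

// putUpdateLogData zeroes every field before returning the value to the
// pool, so no prior state leaks across requests.
func putUpdateLogData(d *UpdateLogData) {
	*d = UpdateLogData{} // clears pointers and slices in one assignment
	pool.Put(d)
}

func main() {
	d := pool.Get().(*UpdateLogData)
	v := int64(120)
	d.Latency, d.Tags = &v, []string{"chat"}
	putUpdateLogData(d)
	fmt.Printf("%+v\n", pool.Get().(*UpdateLogData)) // zeroed either way
}
```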
core/schemas/providers/gemini/utils.go (1)
480-521: Standardize MP3 MIME type to `audio/mpeg`.

detectAudioMimeType returns `audio/mp3` in several branches, while normalizeAudioMIMEType maps MP3 to `audio/mpeg`. Prefer `audio/mpeg` everywhere for compatibility.

```diff
-	return "audio/mp3"
+	return "audio/mpeg"
@@
-	return "audio/mp3"
+	return "audio/mpeg"
@@
-	return "audio/mp3"
+	return "audio/mpeg"
@@
-	return "audio/mp3"
+	return "audio/mpeg"
```

Also applies to: 539-556
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- core/providers/vertex.go (9 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/mux.go (4 hunks)
- core/schemas/providers/anthropic/chat.go (3 hunks)
- core/schemas/providers/cohere/chat.go (1 hunks)
- core/schemas/providers/cohere/types.go (1 hunks)
- core/schemas/providers/gemini/chat.go (3 hunks)
- core/schemas/providers/gemini/transcription.go (1 hunks)
- core/schemas/providers/gemini/utils.go (1 hunks)
- framework/changelog.md (1 hunks)
- framework/pricing/main.go (3 hunks)
- plugins/logging/main.go (1 hunks)
- ui/app/logs/views/logDetailsSheet.tsx (1 hunks)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (18)
core/schemas/providers/cohere/types.go (1)
505-510: CohereStreamMessage field alignment — no functional change.

Tags and optionality unchanged. Good to merge.
framework/pricing/main.go (1)
306-312: Useful debug log.

Good addition; helps trace pricing resolution.
core/providers/vertex.go (2)
249-274: Latency instrumentation — good coverage.

Start/stop around client.Do and propagation into ExtraFields for both Claude and non-Claude paths looks correct.
Also applies to: 318-324, 339-344
600-606: Embedding latency propagation — LGTM.

ExtraFields.Provider/ModelRequested/RequestType/Latency set consistently; raw response preserved behind flag.
core/schemas/chatcompletions.go (1)
548-565: Incomplete token detail propagation in Responses API conversions.

The new token detail fields are defined but only partially propagated through type conversions. Several fields defined in `ChatCompletionTokensDetails` and `ChatPromptTokensDetails` are not being mapped through the conversion functions in `core/schemas/mux.go`: lines 634-637 and 659-662 propagate only `CachedTokens` and `ReasoningTokens`, but skip `AudioTokens`, `AcceptedPredictionTokens`, and `RejectedPredictionTokens`, even though these fields are populated by provider integrations (Gemini, Anthropic, Cohere). That data is therefore lost when converting `BifrostLLMUsage` to the Responses API format. Verify whether the Responses API specification intentionally excludes these fields, or whether the conversion logic needs to be extended to map all detail fields.

core/schemas/providers/gemini/transcription.go (1)
80-81: Signature update handled correctly.

Accepting 5 returns and discarding the last two is appropriate since TranscriptionUsage doesn't carry usage detail fields.

plugins/logging/main.go (2)
plugins/logging/main.go (2)
390-390: Good move to centralized conversion.Using ToBifrostLLMUsage() keeps Responses usage mapping consistent with the new detail fields.
121-124: No issues found.The repository targets Go 1.24, as specified in
plugins/logging/go.mod. This satisfies the Go 1.22+ requirement for thefor range 1000syntax introduced in Go 1.22, so the code is compatible and will compile correctly.framework/changelog.md (1)
4-5: Changelog entry reads well.

Entry clearly states the new vertex-provider/model pricing lookup support.
core/schemas/providers/gemini/utils.go (1)
156-166: Usage extraction extended correctly.

New cachedTokens and reasoningTokens mapping from UsageMetadata looks correct and keeps old fields intact.
core/schemas/providers/gemini/chat.go (2)
338-389: Usage detail propagation looks correct.

CachedTokens -> PromptTokensDetails and ReasoningTokens -> CompletionTokensDetails are set as expected.
471-484: Reverse mapping back to Gemini metadata is consistent.

Details are written back to CachedContentTokenCount and ThoughtsTokenCount when present.
ui/app/logs/views/logDetailsSheet.tsx (1)
154-185: LGTM: Completion token details rendering is correct.

The conditional rendering of `completion_tokens_details` properly checks for the existence of each field before displaying it, avoiding unnecessary UI clutter.

core/schemas/providers/anthropic/chat.go (2)
354-361: Verify: Aggregating cache creation and read tokens loses pricing granularity.

Line 356 sums `CacheCreationInputTokens` and `CacheReadInputTokens` into a single `CachedTokens` field. Anthropic typically prices these differently (cache creation is more expensive than cache reads), so this aggregation loses important cost breakdown information.

Please confirm whether:
- This aggregation is acceptable for cost tracking and observability purposes
- The UI and downstream systems don't need separate visibility into cache creation vs. cache read costs
If granular cache cost tracking is important, consider adding separate fields to `ChatPromptTokensDetails` for cache creation and cache read tokens, as sketched below.
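One possible shape for such fields; `CacheCreationTokens` and `CacheReadTokens` are hypothetical names that do not exist in `ChatPromptTokensDetails` today:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical extension sketch: keeps Anthropic's two cache prices
// separable instead of collapsing them into one aggregate.
type ChatPromptTokensDetails struct {
	CachedTokens        int `json:"cached_tokens"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"` // hypothetical
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`     // hypothetical
}

func main() {
	d := ChatPromptTokensDetails{
		CachedTokens:        150, // aggregate kept for backward compatibility
		CacheCreationTokens: 100,
		CacheReadTokens:     50,
	}
	b, _ := json.Marshal(d)
	fmt.Println(string(b))
}
```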
621-624: Verify: Function has no internal callers in the codebase.

During verification, `ToAnthropicChatCompletionResponse` was found to have no call sites anywhere in the repository; only its definition exists. This means either:
- The function is part of a public/exported API used by external code (outside this repository)
- The function is unused/dead code
If this is a public API, external callers could encounter the issue where all `CachedTokens` are mapped to `CacheReadInputTokens` only, potentially resulting in incorrect cost estimates if they rely on this conversion. Confirm whether this function is intentionally exposed as a public API or if it should be removed.

core/schemas/mux.go (3)
643-666: Verify: Reverse token detail conversion has the same incompleteness.

The `ToBifrostLLMUsage` method has the same limitation as `ToResponsesResponseUsage`: it only maps `CachedTokens` and `ReasoningTokens`, potentially missing `AudioTokens`, `AcceptedPredictionTokens`, and `RejectedPredictionTokens`.

This is the reverse direction of the previous comment. Ensure both conversion methods map all available token detail fields to avoid data loss in bidirectional conversions.
858-858: LGTM: Clean refactoring to use conversion methods.

The replacement of inline usage construction with calls to `ToResponsesResponseUsage()` and `ToBifrostLLMUsage()` improves code maintainability and consistency. The conversion methods centralize the mapping logic, making future updates easier.

Also applies to: 904-904, 1013-1013
618-641: Target type definitions are incomplete; fields cannot be mapped.

The conversion function correctly maps all available fields in `ResponsesResponseInputTokens` and `ResponsesResponseOutputTokens`. However, the source structs contain fields that the target types do not define:

- `ChatPromptTokensDetails.AudioTokens` → no corresponding field in `ResponsesResponseInputTokens`
- `ChatCompletionTokensDetails.AudioTokens` → no corresponding field in `ResponsesResponseOutputTokens`
- `ChatCompletionTokensDetails.AcceptedPredictionTokens` → no corresponding field in `ResponsesResponseOutputTokens`
- `ChatCompletionTokensDetails.RejectedPredictionTokens` → no corresponding field in `ResponsesResponseOutputTokens`

If these fields should be exposed in the Responses API format, extend `ResponsesResponseInputTokens` and `ResponsesResponseOutputTokens` (core/schemas/responses.go, lines 248-254) to include them. Otherwise, the current conversion is complete as-is.
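If the extension route is chosen, it could look roughly like the sketch below; the added fields are hypothetical until the Responses API specification confirms they belong there:

```go
package main

import "fmt"

// Sketch of extending the target structs so the conversion can carry all
// detail fields. Field sets are simplified; only the additions matter here.
type ResponsesResponseInputTokens struct {
	CachedTokens int
	AudioTokens  int // hypothetical addition
}

type ResponsesResponseOutputTokens struct {
	ReasoningTokens          int
	AudioTokens              int // hypothetical addition
	AcceptedPredictionTokens int // hypothetical addition
	RejectedPredictionTokens int // hypothetical addition
}

func main() {
	out := ResponsesResponseOutputTokens{ReasoningTokens: 12, AudioTokens: 3}
	fmt.Printf("%+v\n", out)
}
```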
427f5c2 to fbdd429
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)
318-323: Overwriting ExtraFields can drop fields set by converters; set fields incrementally.

Align with the non-Claude branch and avoid clobbering existing ExtraFields.
```diff
-	response.ExtraFields = schemas.BifrostResponseExtraFields{
-		RequestType:    schemas.ChatCompletionRequest,
-		Provider:       schemas.Vertex,
-		ModelRequested: request.Model,
-		Latency:        latency.Milliseconds(),
-	}
+	response.ExtraFields.RequestType = schemas.ChatCompletionRequest
+	response.ExtraFields.Provider = schemas.Vertex
+	response.ExtraFields.ModelRequested = request.Model
+	response.ExtraFields.Latency = latency.Milliseconds()
```
🧹 Nitpick comments (4)
framework/changelog.md (1)
4-5: Changelog could mention latency + token detail surfaces added.

Add entries for Vertex latency reporting and token usage detail fields to help downstream consumers track the change scope.
framework/pricing/main.go (1)
500-515: Unify provider constants and reduce duplicate fallback code.
- Use string(schemas.Vertex) instead of hard-coded "vertex" to avoid drift.
- Consolidate repeated “responses→chat” fallbacks into a helper to keep behavior consistent and logging messages accurate.
Apply minimally:
```diff
-	pricing, ok = pm.pricingData[makeKey(model, "vertex", normalizeRequestType(requestType))]
+	pricing, ok = pm.pricingData[makeKey(model, string(schemas.Vertex), normalizeRequestType(requestType))]
...
-	pricing, ok = pm.pricingData[makeKey(modelWithoutProvider, "vertex", normalizeRequestType(requestType))]
+	pricing, ok = pm.pricingData[makeKey(modelWithoutProvider, string(schemas.Vertex), normalizeRequestType(requestType))]
...
-	pm.logger.Debug("primary lookup failed, trying chat provider for the same model in chat completion")
+	pm.logger.Debug("primary lookup failed, trying chat pricing for same model/provider in chat completion")
```

Also applies to: 517-535, 539-545
core/providers/vertex.go (1)
267-269: Aggressive client eviction on generic network errors.

Evicting the pooled client for any non-timeout error may churn tokens/transport on transient blips. Consider evicting only on 401/403 and keeping the client for retryable network errors.
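A sketch of the narrower policy, assuming eviction is keyed off the response status; `shouldEvict` is a hypothetical helper, not the provider's actual cache logic:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// shouldEvict drops the pooled client only on auth failures, keeping it
// across transient network errors so tokens/transport are not churned.
func shouldEvict(resp *http.Response, err error) bool {
	if err != nil {
		return false // transient network error: keep the client
	}
	return resp.StatusCode == http.StatusUnauthorized ||
		resp.StatusCode == http.StatusForbidden
}

func main() {
	fmt.Println(shouldEvict(nil, errors.New("connection reset")))  // false
	fmt.Println(shouldEvict(&http.Response{StatusCode: 401}, nil)) // true
}
```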
core/schemas/providers/gemini/chat.go (1)
338-389: Good: propagate cached/reasoning token details.

Optional: only set detail pointers when counts > 0 to avoid empty objects in JSON.
```diff
-	PromptTokensDetails: &schemas.ChatPromptTokensDetails{CachedTokens: cachedTokens},
+	PromptTokensDetails: func() *schemas.ChatPromptTokensDetails {
+		if cachedTokens > 0 {
+			return &schemas.ChatPromptTokensDetails{CachedTokens: cachedTokens}
+		}
+		return nil
+	}(),
-	CompletionTokensDetails: &schemas.ChatCompletionTokensDetails{ReasoningTokens: reasoningTokens},
+	CompletionTokensDetails: func() *schemas.ChatCompletionTokensDetails {
+		if reasoningTokens > 0 {
+			return &schemas.ChatCompletionTokensDetails{ReasoningTokens: reasoningTokens}
+		}
+		return nil
+	}(),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- core/providers/vertex.go (9 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/mux.go (4 hunks)
- core/schemas/providers/anthropic/chat.go (3 hunks)
- core/schemas/providers/cohere/chat.go (1 hunks)
- core/schemas/providers/cohere/types.go (1 hunks)
- core/schemas/providers/gemini/chat.go (3 hunks)
- core/schemas/providers/gemini/transcription.go (1 hunks)
- core/schemas/providers/gemini/utils.go (1 hunks)
- framework/changelog.md (1 hunks)
- framework/pricing/main.go (3 hunks)
- plugins/logging/main.go (1 hunks)
- ui/app/logs/views/logDetailsSheet.tsx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- plugins/logging/main.go
- core/schemas/providers/cohere/types.go
- core/schemas/providers/anthropic/chat.go
- core/schemas/providers/cohere/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/providers/vertex.go (5)

- plugins/mocker/main.go (1): Latency (125-129)
- core/schemas/provider.go (1): ErrProviderJSONMarshaling (26-26)
- core/schemas/bifrost.go (3): Vertex (40-40), RequestType (79-79), EmbeddingRequest (88-88)
- core/schemas/account.go (1): VertexKeyConfig (29-33)
- transports/bifrost-http/handlers/inference.go (1): EmbeddingRequest (202-206)
framework/pricing/main.go (1)

- core/schemas/bifrost.go (4): ResponsesRequest (86-86), ResponsesStreamRequest (87-87), ChatCompletionRequest (84-84), Vertex (40-40)
core/schemas/mux.go (2)

- core/schemas/chatcompletions.go (3): BifrostLLMUsage (547-553), ChatPromptTokensDetails (555-558), ChatCompletionTokensDetails (560-565)
- core/schemas/responses.go (3): ResponsesResponseUsage (240-246), ResponsesResponseInputTokens (248-250), ResponsesResponseOutputTokens (252-254)
core/schemas/providers/gemini/utils.go (1)

- core/schemas/providers/gemini/types.go (1): GenerateContentResponse (1273-1289)
core/schemas/providers/gemini/chat.go (3)

- core/schemas/chatcompletions.go (2): ChatPromptTokensDetails (555-558), ChatCompletionTokensDetails (560-565)
- ui/lib/types/logs.ts (1): CompletionTokensDetails (184-189)
- plugins/mocker/main.go (1): Usage (138-142)
core/schemas/chatcompletions.go (1)

- ui/lib/types/logs.ts (1): CompletionTokensDetails (184-189)
ui/app/logs/views/logDetailsSheet.tsx (1)

- ui/app/logs/views/logEntryDetailsView.tsx (1): LogEntryDetailsView (15-47)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (8)
framework/pricing/main.go (1)
306-312: Helpful debug log.

Good addition; aids tracing pricing paths without changing behavior.
core/providers/vertex.go (1)
600-605: Latency propagation for embeddings looks good.

Accurate measurement window (just the HTTP call) and consistent ExtraFields population.
core/schemas/providers/gemini/transcription.go (1)
80-81: Signature update correctly handled.

Unpacking new return values with discards is fine; no behavior change.
core/schemas/providers/gemini/chat.go (1)
478-483: Reverse mapping to Gemini usage is correct.

Fields map 1:1 with Gemini UsageMetadata; looks solid.
core/schemas/chatcompletions.go (1)
546-553: Frontend types updated.

`ui/lib/types/logs.ts` already defines `prompt_tokens_details` and `completion_tokens_details`, matching the API additions.

core/schemas/providers/gemini/utils.go (1)
156-166: Signature widening verified.

transcription.go and chat.go both use the updated 5-tuple; no stale callers remain.

core/schemas/mux.go (2)
858-859: Good refactor: centralize usage mapping via helper.

Switching to cr.Usage.ToResponsesResponseUsage() reduces duplication and keeps future token-field changes localized.
904-905: Good refactor: symmetric helper for Responses → Chat usage.

Keeps conversions bidirectional and consistent.
b0c2195 to 21c78bf

2f89e37 to badb1db

…y calculation in vertex

badb1db to 8fd0545
Summary
Added latency tracking and enhanced token usage details for LLM responses, providing more granular metrics for performance monitoring and cost analysis.
Changes
- Extended `BifrostLLMUsage` structure with detailed token usage fields for both prompt and completion tokens

Type of change
Affected areas
How to test
Test the latency tracking and enhanced token usage details with various providers:
Verify that latency is reported in milliseconds in the response and that detailed token usage information is displayed in the UI when available.
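As a concrete (illustrative) check, something like the following Go snippet can exercise the gateway; the URL, endpoint path, and JSON field names here are assumptions for the sketch, not Bifrost's documented API:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model": "vertex/gemini-2.0-flash",
		"messages": []map[string]string{
			{"role": "user", "content": "ping"},
		},
	})
	// Gateway address and path are placeholders for a local deployment.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var out struct {
		ExtraFields struct {
			Latency int64 `json:"latency"` // expected in milliseconds
		} `json:"extra_fields"`
		Usage struct {
			PromptTokensDetails     map[string]int `json:"prompt_tokens_details"`
			CompletionTokensDetails map[string]int `json:"completion_tokens_details"`
		} `json:"usage"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("latency: %dms, details: %v / %v\n",
		out.ExtraFields.Latency,
		out.Usage.PromptTokensDetails,
		out.Usage.CompletionTokensDetails)
}
```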
Breaking changes
Related issues
N/A
Security considerations
No security implications as this PR only enhances metrics reporting.
Checklist
- I have read `docs/contributing/README.md` and followed the guidelines