
Conversation

@Pratham-Mishra04
Collaborator

Summary

Added latency tracking and enhanced token usage details for LLM responses, providing more granular metrics for performance monitoring and cost analysis.

Changes

  • Added latency tracking to Vertex provider responses, measuring and reporting API call duration in milliseconds
  • Enhanced BifrostLLMUsage structure with detailed token usage fields for both prompt and completion tokens
  • Added support for specialized token types like cached tokens, reasoning tokens, audio tokens, and prediction tokens
  • Implemented conversion methods between different response formats to preserve token usage details
  • Updated UI to display detailed token usage information when available
  • Added support for vertex provider/model format in pricing lookup
  • Removed commented-out code related to response pooling in Vertex provider

Type of change

  • Feature
  • Refactor

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test the latency tracking and enhanced token usage details with various providers:

# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i
pnpm test
pnpm build

Verify that latency is reported in milliseconds in the response and that detailed token usage information is displayed in the UI when available.

Breaking changes

  • Yes
  • No

Related issues

N/A

Security considerations

No security implications as this PR only enhances metrics reporting.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

Collaborator Author

Pratham-Mishra04 commented Oct 22, 2025

@Pratham-Mishra04 Pratham-Mishra04 marked this pull request as ready for review October 22, 2025 17:10
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Oct 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Oct 22, 2025

Warning

Rate limit exceeded

@Pratham-Mishra04 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 14 minutes and 50 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between fbdd429 and 8fd0545.

📒 Files selected for processing (17)
  • core/providers/gemini.go (4 hunks)
  • core/providers/vertex.go (9 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/mux.go (4 hunks)
  • core/schemas/providers/anthropic/chat.go (3 hunks)
  • core/schemas/providers/cohere/chat.go (1 hunks)
  • core/schemas/providers/cohere/types.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (3 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/types.go (1 hunks)
  • core/schemas/providers/gemini/utils.go (1 hunks)
  • core/schemas/providers/openai/chat.go (0 hunks)
  • core/schemas/responses.go (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/pricing/main.go (3 hunks)
  • plugins/logging/main.go (1 hunks)
  • ui/app/logs/views/logDetailsSheet.tsx (1 hunks)
📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Request latency tracking now included in all API responses
    • Detailed token usage breakdown showing cached tokens, audio tokens, reasoning tokens, and prediction tokens
    • Enhanced token metrics visibility in log viewer
  • Improvements

    • Improved pricing lookup support for Vertex provider/model formats

Walkthrough

This PR instruments Vertex provider calls with latency, adds detailed token usage breakdowns (prompt/completion subfields), introduces bidirectional usage conversion helpers, extends Gemini usage extraction, enhances pricing fallback logic for Vertex/model formats, and surfaces token detail fields in logging and UI.

Changes

  • Core Provider: Vertex (core/providers/vertex.go): Adds time-based latency measurement around HTTP calls; populates ExtraFields.Latency for chat and embedding responses; consolidates the embedding flow inline and uses request.Model to build the embedding endpoint; retains client-pool removal behavior on auth failures.
  • Core Schemas — Definitions (core/schemas/chatcompletions.go): Adds ChatPromptTokensDetails and ChatCompletionTokensDetails; extends BifrostLLMUsage with optional PromptTokensDetails and CompletionTokensDetails fields to capture cached/audio/reasoning/prediction token breakdowns.
  • Core Schemas — Conversions (core/schemas/mux.go): Adds ToResponsesResponseUsage() and ToBifrostLLMUsage() helper methods and refactors conversion call sites to use them, centralizing usage mapping logic.
  • Provider Integrations — Anthropic (core/schemas/providers/anthropic/chat.go): Populates PromptTokensDetails.CachedTokens from Anthropic cache fields and propagates cached token counts back in the reverse conversion.
  • Provider Integrations — Cohere (core/schemas/providers/cohere/chat.go, core/schemas/providers/cohere/types.go): Adds CachedTokens handling in ToBifrostChatResponse() (populates PromptTokensDetails); minor struct formatting changes in Cohere types.
  • Provider Integrations — Gemini (core/schemas/providers/gemini/utils.go, core/schemas/providers/gemini/chat.go, core/schemas/providers/gemini/transcription.go): extractUsageMetadata() now returns five ints (adds cachedTokens and reasoningTokens); populates PromptTokensDetails and CompletionTokensDetails from Gemini usage metadata; updates call sites to handle the expanded return arity.
  • Logging / Plugins (plugins/logging/main.go): Replaces inline construction of BifrostLLMUsage with a ToBifrostLLMUsage() call on ResponsesResponse.Usage in PostHook.
  • Pricing (framework/pricing/main.go): Adds extended fallback logic and debug logs to the pricing lookup: try Vertex for Gemini misses, strip the provider prefix from Vertex-form models, and attempt a chat-completion pricing fallback when appropriate.
  • UI (ui/app/logs/views/logDetailsSheet.tsx): Conditionally renders the new token detail fields: prompt_tokens_details (cached, audio) and completion_tokens_details (reasoning, audio, accepted/rejected prediction tokens).
  • Changelog (framework/changelog.md): Adds a feature entry for Vertex provider/model format support in the pricing lookup.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VertexProvider as Vertex Provider
    participant HTTP as HTTP Client
    Note over Client,VertexProvider: Chat/Embedding request flow with latency instrumentation
    Client->>VertexProvider: send Chat/Embedding request
    VertexProvider->>HTTP: perform HTTP call (startTime recorded)
    HTTP-->>VertexProvider: response
    Note over VertexProvider: compute latency = now - startTime\npopulate ExtraFields.Latency (ms)
    VertexProvider-->>Client: return response with ExtraFields.Latency
sequenceDiagram
    participant Caller
    participant Pricing as CalculateCostFromUsage
    participant PrimaryLookup
    participant VertexLookup
    participant ChatCompFallback

    Caller->>Pricing: requestType, model, provider
    Pricing->>PrimaryLookup: try primary lookup (e.g., Gemini)
    alt Primary found
        PrimaryLookup-->>Pricing: pricing
    else Primary not found
        PrimaryLookup-->>Pricing: not found
        Pricing->>VertexLookup: try Vertex for same model (or stripped model if provider/ prefix)
        alt Vertex found
            VertexLookup-->>Pricing: pricing
        else Vertex not found
            VertexLookup-->>Pricing: not found
            alt requestType is chat-related
                Pricing->>ChatCompFallback: try chat-completion pricing
                ChatCompFallback-->>Pricing: pricing or not found
            end
        end
    end
    Pricing-->>Caller: final pricing or default

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped along the token trail,
Timed requests and counts set sail.
Cached and reasoning, audio too,
Latency ticks and pricing grew.
A tiny rabbit cheers — new fields in view! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title succinctly identifies the two main contributions of the PR (adding latency tracking and enhancing token usage details), which align clearly with the core implementation changes around Vertex provider latency measurement and expanded token usage fields. It is concise and specific without extraneous information.
  • Description Check: ✅ Passed. The PR description follows the repository template by providing a clear summary, detailed changes list, type of change, affected areas, test instructions, breaking changes, related issues, security considerations, and a comprehensive checklist. The only optional template section not included is screenshots for UI changes, which is non-critical but recommended for Next.js UI updates.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/schemas/providers/cohere/chat.go (1)

291-310: Move CachedTokens and TotalTokens computation outside the Tokens nil check.

CachedTokens and Tokens are independent fields in the Cohere response. Currently, CachedTokens is gated inside the if response.Usage.Tokens != nil block, causing data loss when Tokens is nil. TotalTokens computation is also gated, staying at 0. The responses.go file correctly accesses CachedTokens at the top level (lines 107 and 648), showing the inconsistency. Apply the suggested fix to move both checks outside and compute TotalTokens unconditionally.

core/schemas/providers/gemini/utils.go (1)

361-372: Nil dereference risk: message.Content may be nil.

convertBifrostMessagesToGemini reads message.Content without a nil check, which will panic for tool-only or assistant-only messages.

-        // Handle content
-        if message.Content.ContentStr != nil && *message.Content.ContentStr != "" {
-            parts = append(parts, &CustomPart{
-                Text: *message.Content.ContentStr,
-            })
-        } else if message.Content.ContentBlocks != nil {
-            for _, block := range message.Content.ContentBlocks {
+        // Handle content
+        if message.Content != nil {
+            if message.Content.ContentStr != nil && *message.Content.ContentStr != "" {
+                parts = append(parts, &CustomPart{
+                    Text: *message.Content.ContentStr,
+                })
+            } else if message.Content.ContentBlocks != nil {
+                for _, block := range message.Content.ContentBlocks {
                     if block.Text != nil {
                         parts = append(parts, &CustomPart{
                             Text: *block.Text,
                         })
                     }
-            }
+                }
+            }
         }
plugins/logging/main.go (1)

303-307: Code clarity issue: latency is persisted but via inconsistent paths.

The review concern is partially valid. While updateStreamingLogEntry doesn't receive a latency parameter (line 353), it does persist latency via streamResponse.Data.Latency (operations.go:170). However, logMsg.Latency (captured at lines 304-306) remains unused in the streaming path—it's dead code. The non-streaming path (line 321) explicitly passes latency to updateLogEntry, creating an asymmetrical and confusing pattern.

Recommendation: Either pass logMsg.Latency to updateStreamingLogEntry to match the non-streaming pattern, or remove the orphaned logMsg.Latency assignment and rely solely on streamResponse.Data.Latency. This removes ambiguity and ensures consistent latency handling across both paths.

🧹 Nitpick comments (6)
framework/pricing/main.go (2)

497-550: Clarify log wording and consider reducing chat fallback duplication.

The verification confirms your observations:

  • Three chat fallback patterns exist at lines 509, 529, and 540
  • Lines 509 and 529 contain identical log text ("secondary lookup failed, trying vertex provider for the same model in chat completion")
  • Line 540's wording "trying chat provider" is indeed misleading and should clarify it means chat-completion pricing

The suggested log text tweaks are sound and improve clarity:

  • Line 509 & 529: Change to "secondary lookup failed, trying chat-completion pricing for the same model with vertex provider"
  • Line 540: Change to "primary lookup failed, trying chat-completion pricing for the same model"

The optional refactoring suggestion to extract a helper function is valid—the triplication of this pattern (check ResponsesRequest/ResponsesStreamRequest, then attempt ChatCompletionRequest lookup) creates maintenance risk. However, this remains optional given the minimal scope.


493-505: Normalize provider at start of getPricing for consistent keying.

The verification confirms the inconsistency: populateModelPool (line 433) normalizes provider before keying, but getPricing uses the raw provider parameter directly in makeKey (line 497). This creates fragility—if pricingData keys use normalized providers, lookups fail with non-canonical inputs like "vertex_ai" or mixed-case variants, forcing the current fallback logic.

Normalizing provider at the function entry would align with the populateModelPool pattern and ensure consistent keying across all three lookup locations (lines 497, 502, 520, 543).

core/providers/vertex.go (1)

498-505: Guard against double-prefix in embedding endpoint.

Based on my verification, Vertex AI's embedding API requires the MODEL_ID to be just the trailing model identifier (e.g., "gemini-embedding-001", "textembedding-gecko@001"), not a fully qualified path like "publishers/google/models/textembedding-gecko@001". The current code at lines 498-505 in core/providers/vertex.go directly interpolates request.Model into the URL without sanitization.

While the initial script output did not reveal explicit evidence that callers are passing fully qualified model names, the concern is valid from a defensive programming perspective. Some Vertex AI documentation examples show the full resource name being used (e.g., "projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/text-embedding-004"), which suggests that callers or configuration sources could potentially pass qualified names.

The proposed sanitization fix is reasonable defensive programming. However, without evidence that this issue is currently manifesting in the codebase, this is a preventive refactor rather than a critical bug fix. The suggested code changes are appropriate and would guard against URL malformation if unexpected input formats are ever introduced.

core/schemas/providers/gemini/transcription.go (1)

100-106: Always populate usage, even if no text extracted.

Currently, Usage is only set when text is present. Populate Usage whenever metadata exists so logs/costing remain accurate for empty/edge transcripts.

-            if textContent != "" {
-                bifrostResp.Text = textContent
-                bifrostResp.Task = schemas.Ptr("transcribe")
-
-                // Set usage information
-                bifrostResp.Usage = &schemas.TranscriptionUsage{
-                    Type:         "tokens",
-                    InputTokens:  &inputTokens,
-                    OutputTokens: &outputTokens,
-                    TotalTokens:  &totalTokens,
-                }
-            }
+            if textContent != "" {
+                bifrostResp.Text = textContent
+                bifrostResp.Task = schemas.Ptr("transcribe")
+            }
+            // Set usage information regardless of text presence
+            bifrostResp.Usage = &schemas.TranscriptionUsage{
+                Type:         "tokens",
+                InputTokens:  &inputTokens,
+                OutputTokens: &outputTokens,
+                TotalTokens:  &totalTokens,
+            }
plugins/logging/main.go (1)

454-458: Ensure pooled UpdateLogData is zeroed before reuse.

When returning UpdateLogData to the pool, clear all fields (including nested pointers/slices) to avoid leaking prior state across requests.

Can you confirm putUpdateLogData(...) nils/zeroes every field?

core/schemas/providers/gemini/utils.go (1)

480-521: Standardize MP3 MIME type to audio/mpeg.

detectAudioMimeType returns audio/mp3 in several branches, while normalizeAudioMIMEType maps MP3 to audio/mpeg. Prefer audio/mpeg everywhere for compatibility.

-        return "audio/mp3"
+        return "audio/mpeg"
@@
-        return "audio/mp3"
+        return "audio/mpeg"
@@
-        return "audio/mp3"
+        return "audio/mpeg"
@@
-    return "audio/mp3"
+    return "audio/mpeg"

Also applies to: 539-556

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0c2195 and 427f5c2.

📒 Files selected for processing (13)
  • core/providers/vertex.go (9 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/mux.go (4 hunks)
  • core/schemas/providers/anthropic/chat.go (3 hunks)
  • core/schemas/providers/cohere/chat.go (1 hunks)
  • core/schemas/providers/cohere/types.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (3 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/utils.go (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/pricing/main.go (3 hunks)
  • plugins/logging/main.go (1 hunks)
  • ui/app/logs/views/logDetailsSheet.tsx (1 hunks)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (18)
core/schemas/providers/cohere/types.go (1)

505-510: CohereStreamMessage field alignment — no functional change.

Tags and optionality unchanged. Good to merge.

framework/pricing/main.go (1)

306-312: Useful debug log.

Good addition; helps trace pricing resolution.

core/providers/vertex.go (2)

249-274: Latency instrumentation — good coverage.

Start/stop around client.Do and propagation into ExtraFields for both Claude and non-Claude paths looks correct.

Also applies to: 318-324, 339-344


600-606: Embedding latency propagation — LGTM.

ExtraFields.Provider/ModelRequested/RequestType/Latency set consistently; raw response preserved behind flag.

core/schemas/chatcompletions.go (1)

548-565: Incomplete token detail propagation in Responses API conversions.

Several fields defined in ChatCompletionTokensDetails and ChatPromptTokensDetails are not mapped through the conversion functions in core/schemas/mux.go. Lines 634–637 and 659–662 propagate only CachedTokens and ReasoningTokens, skipping AudioTokens, AcceptedPredictionTokens, and RejectedPredictionTokens, even though these fields are defined in the structs and populated by provider integrations (Gemini, Anthropic, Cohere). Data populated by those providers is therefore lost when converting to Responses API responses. Verify whether the Responses API specification intentionally excludes these fields, or whether the conversion logic needs to be extended to map all detail fields.

core/schemas/providers/gemini/transcription.go (1)

80-81: Signature update handled correctly.

Accepting 5 returns and discarding the last two is appropriate since TranscriptionUsage doesn’t carry usage detail fields.

plugins/logging/main.go (2)

390-390: Good move to centralized conversion.

Using ToBifrostLLMUsage() keeps Responses usage mapping consistent with the new detail fields.


121-124: No issues found.

The repository targets Go 1.24, as specified in plugins/logging/go.mod. This satisfies the Go 1.22+ requirement for the for range 1000 syntax introduced in Go 1.22, so the code is compatible and will compile correctly.

framework/changelog.md (1)

4-5: Changelog entry reads well.

Entry clearly states the new vertex-provider/model pricing lookup support.

core/schemas/providers/gemini/utils.go (1)

156-166: Usage extraction extended correctly.

New cachedTokens and reasoningTokens mapping from UsageMetadata looks correct and keeps old fields intact.

core/schemas/providers/gemini/chat.go (2)

338-389: Usage detail propagation looks correct.

CachedTokens -> PromptTokensDetails and ReasoningTokens -> CompletionTokensDetails are set as expected.


471-484: Reverse mapping back to Gemini metadata is consistent.

Details are written back to CachedContentTokenCount and ThoughtsTokenCount when present.

ui/app/logs/views/logDetailsSheet.tsx (1)

154-185: LGTM: Completion token details rendering is correct.

The conditional rendering of completion_tokens_details properly checks for the existence of each field before displaying it, avoiding unnecessary UI clutter.

core/schemas/providers/anthropic/chat.go (2)

354-361: Verify: Aggregating cache creation and read tokens loses pricing granularity.

Line 356 sums CacheCreationInputTokens and CacheReadInputTokens into a single CachedTokens field. Anthropic typically prices these differently (cache creation is more expensive than cache reads), so this aggregation loses important cost breakdown information.

Please confirm whether:

  1. This aggregation is acceptable for cost tracking and observability purposes
  2. The UI and downstream systems don't need separate visibility into cache creation vs. cache read costs

If granular cache cost tracking is important, consider adding separate fields to ChatPromptTokensDetails for cache creation and cache read tokens.


621-624: Verify: Function has no internal callers in the codebase.

During verification, ToAnthropicChatCompletionResponse was found to have no call sites anywhere in the repository—only its definition exists. This means either:

  1. The function is part of a public/exported API used by external code (outside this repository)
  2. The function is unused/dead code

If this is a public API, external callers could encounter the issue where all CachedTokens are mapped to CacheReadInputTokens only, potentially resulting in incorrect cost estimates if they rely on this conversion. Confirm whether this function is intentionally exposed as a public API or if it should be removed.

core/schemas/mux.go (3)

643-666: Verify: Reverse token detail conversion has the same incompleteness.

The ToBifrostLLMUsage method has the same limitation as ToResponsesResponseUsage - it only maps CachedTokens and ReasoningTokens, potentially missing AudioTokens, AcceptedPredictionTokens, and RejectedPredictionTokens.

This is the reverse direction of the previous comment. Ensure both conversion methods map all available token detail fields to avoid data loss in bidirectional conversions.


858-858: LGTM: Clean refactoring to use conversion methods.

The replacement of inline usage construction with calls to ToResponsesResponseUsage() and ToBifrostLLMUsage() improves code maintainability and consistency. The conversion methods centralize the mapping logic, making future updates easier.

Also applies to: 904-904, 1013-1013


618-641: Target type definitions are incomplete; fields cannot be mapped.

The conversion function correctly maps all available fields in ResponsesResponseInputTokens and ResponsesResponseOutputTokens. However, the source structs contain fields that the target types do not define:

  • ChatPromptTokensDetails.AudioTokens → no corresponding field in ResponsesResponseInputTokens
  • ChatCompletionTokensDetails.AudioTokens → no corresponding field in ResponsesResponseOutputTokens
  • ChatCompletionTokensDetails.AcceptedPredictionTokens → no corresponding field in ResponsesResponseOutputTokens
  • ChatCompletionTokensDetails.RejectedPredictionTokens → no corresponding field in ResponsesResponseOutputTokens

If these fields should be exposed in the Responses API format, extend ResponsesResponseInputTokens and ResponsesResponseOutputTokens (core/schemas/responses.go, lines 248-254) to include them. Otherwise, the current conversion is complete as-is.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-22-feat_enhacements_token_details_for_chat_completions_and_added_latency_calculation_in_vertex branch from 427f5c2 to fbdd429 Compare October 22, 2025 20:18
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)

318-323: Overwriting ExtraFields can drop fields set by converters; set fields incrementally.

Align with the non-Claude branch and avoid clobbering existing ExtraFields.

- response.ExtraFields = schemas.BifrostResponseExtraFields{
-   RequestType:    schemas.ChatCompletionRequest,
-   Provider:       schemas.Vertex,
-   ModelRequested: request.Model,
-   Latency:        latency.Milliseconds(),
- }
+ response.ExtraFields.RequestType = schemas.ChatCompletionRequest
+ response.ExtraFields.Provider = schemas.Vertex
+ response.ExtraFields.ModelRequested = request.Model
+ response.ExtraFields.Latency = latency.Milliseconds()
🧹 Nitpick comments (4)
framework/changelog.md (1)

4-5: Changelog could mention latency + token detail surfaces added.

Add entries for Vertex latency reporting and token usage detail fields to help downstream consumers track the change scope.

framework/pricing/main.go (1)

500-515: Unify provider constants and reduce duplicate fallback code.

  • Use string(schemas.Vertex) instead of hard-coded "vertex" to avoid drift.
  • Consolidate repeated “responses→chat” fallbacks into a helper to keep behavior consistent and logging messages accurate.

Apply minimally:

- pricing, ok = pm.pricingData[makeKey(model, "vertex", normalizeRequestType(requestType))]
+ pricing, ok = pm.pricingData[makeKey(model, string(schemas.Vertex), normalizeRequestType(requestType))]
...
- pricing, ok = pm.pricingData[makeKey(modelWithoutProvider, "vertex", normalizeRequestType(requestType))]
+ pricing, ok = pm.pricingData[makeKey(modelWithoutProvider, string(schemas.Vertex), normalizeRequestType(requestType))]
...
- pm.logger.Debug("primary lookup failed, trying chat provider for the same model in chat completion")
+ pm.logger.Debug("primary lookup failed, trying chat pricing for same model/provider in chat completion")

Also applies to: 517-535, 539-545

core/providers/vertex.go (1)

267-269: Aggressive client eviction on generic network errors.

Evicting the pooled client for any non-timeout error may churn tokens/transport on transient blips. Consider evicting only on 401/403 and keeping the client for retryable network errors.

core/schemas/providers/gemini/chat.go (1)

338-389: Good: propagate cached/reasoning token details.

Optional: only set detail pointers when counts > 0 to avoid empty objects in JSON.

- PromptTokensDetails: &schemas.ChatPromptTokensDetails{ CachedTokens: cachedTokens },
+ PromptTokensDetails: func() *schemas.ChatPromptTokensDetails {
+   if cachedTokens > 0 { return &schemas.ChatPromptTokensDetails{ CachedTokens: cachedTokens } }
+   return nil
+ }(),
- CompletionTokensDetails: &schemas.ChatCompletionTokensDetails{ ReasoningTokens: reasoningTokens },
+ CompletionTokensDetails: func() *schemas.ChatCompletionTokensDetails {
+   if reasoningTokens > 0 { return &schemas.ChatCompletionTokensDetails{ ReasoningTokens: reasoningTokens } }
+   return nil
+ }(),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 427f5c2 and fbdd429.

📒 Files selected for processing (13)
  • core/providers/vertex.go (9 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/mux.go (4 hunks)
  • core/schemas/providers/anthropic/chat.go (3 hunks)
  • core/schemas/providers/cohere/chat.go (1 hunks)
  • core/schemas/providers/cohere/types.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (3 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/utils.go (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/pricing/main.go (3 hunks)
  • plugins/logging/main.go (1 hunks)
  • ui/app/logs/views/logDetailsSheet.tsx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • plugins/logging/main.go
  • core/schemas/providers/cohere/types.go
  • core/schemas/providers/anthropic/chat.go
  • core/schemas/providers/cohere/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/providers/vertex.go (5)
plugins/mocker/main.go (1)
  • Latency (125-129)
core/schemas/provider.go (1)
  • ErrProviderJSONMarshaling (26-26)
core/schemas/bifrost.go (3)
  • Vertex (40-40)
  • RequestType (79-79)
  • EmbeddingRequest (88-88)
core/schemas/account.go (1)
  • VertexKeyConfig (29-33)
transports/bifrost-http/handlers/inference.go (1)
  • EmbeddingRequest (202-206)
framework/pricing/main.go (1)
core/schemas/bifrost.go (4)
  • ResponsesRequest (86-86)
  • ResponsesStreamRequest (87-87)
  • ChatCompletionRequest (84-84)
  • Vertex (40-40)
core/schemas/mux.go (2)
core/schemas/chatcompletions.go (3)
  • BifrostLLMUsage (547-553)
  • ChatPromptTokensDetails (555-558)
  • ChatCompletionTokensDetails (560-565)
core/schemas/responses.go (3)
  • ResponsesResponseUsage (240-246)
  • ResponsesResponseInputTokens (248-250)
  • ResponsesResponseOutputTokens (252-254)
core/schemas/providers/gemini/utils.go (1)
core/schemas/providers/gemini/types.go (1)
  • GenerateContentResponse (1273-1289)
core/schemas/providers/gemini/chat.go (3)
core/schemas/chatcompletions.go (2)
  • ChatPromptTokensDetails (555-558)
  • ChatCompletionTokensDetails (560-565)
ui/lib/types/logs.ts (1)
  • CompletionTokensDetails (184-189)
plugins/mocker/main.go (1)
  • Usage (138-142)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • CompletionTokensDetails (184-189)
ui/app/logs/views/logDetailsSheet.tsx (1)
ui/app/logs/views/logEntryDetailsView.tsx (1)
  • LogEntryDetailsView (15-47)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (8)
framework/pricing/main.go (1)

306-312: Helpful debug log.

Good addition; aids tracing pricing paths without changing behavior.

core/providers/vertex.go (1)

600-605: Latency propagation for embeddings looks good.

Accurate measurement window (just the HTTP call) and consistent ExtraFields population.

core/schemas/providers/gemini/transcription.go (1)

80-81: Signature update correctly handled.

Unpacking new return values with discards is fine; no behavior change.

core/schemas/providers/gemini/chat.go (1)

478-483: Reverse mapping to Gemini usage is correct.

Fields map 1:1 with Gemini UsageMetadata; looks solid.

core/schemas/chatcompletions.go (1)

546-553: Frontend types updated: ui/lib/types/logs.ts already defines prompt_tokens_details and completion_tokens_details, matching the API additions.

core/schemas/providers/gemini/utils.go (1)

156-166: Signature widening verified
transcription.go and chat.go both use the updated 5-tuple; no stale callers remain.

core/schemas/mux.go (2)

858-859: Good refactor: centralize usage mapping via helper.

Switching to cr.Usage.ToResponsesResponseUsage() reduces duplication and keeps future token-field changes localized.


904-905: Good refactor: symmetric helper for Responses → Chat usage.

Keeps conversions bidirectional and consistent.

@akshaydeo akshaydeo mentioned this pull request Oct 23, 2025
18 tasks
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-22-docs_documentation_fixes_and_updates branch from b0c2195 to 21c78bf Compare October 23, 2025 09:54
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-22-feat_enhacements_token_details_for_chat_completions_and_added_latency_calculation_in_vertex branch 2 times, most recently from 2f89e37 to badb1db Compare October 23, 2025 10:46
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-22-feat_enhacements_token_details_for_chat_completions_and_added_latency_calculation_in_vertex branch from badb1db to 8fd0545 Compare October 23, 2025 11:03
Contributor

akshaydeo commented Oct 23, 2025

Merge activity

  • Oct 23, 7:48 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Oct 23, 7:49 PM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo changed the base branch from 10-22-docs_documentation_fixes_and_updates to graphite-base/665 October 23, 2025 19:48
@akshaydeo akshaydeo changed the base branch from graphite-base/665 to main October 23, 2025 19:48
@akshaydeo akshaydeo merged commit 304a3c6 into main Oct 23, 2025
2 of 3 checks passed
@akshaydeo akshaydeo deleted the 10-22-feat_enhacements_token_details_for_chat_completions_and_added_latency_calculation_in_vertex branch October 23, 2025 19:49