-
Couldn't load subscription status.
- Fork 84
feat: add retry count tracking and metrics for Bifrost requests #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add retry count tracking and metrics for Bifrost requests #472
Conversation
Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
📝 WalkthroughSummary by CodeRabbit
WalkthroughStores final retry attempts in each request's context after Bifrost's retry loop, exposes that count via an Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant H as HTTP Handler
participant B as Bifrost Core
participant U as Upstream
participant T as Telemetry
C->>H: Request
H->>B: Execute request
rect rgba(220,235,255,0.35)
note right of B: Retry loop (attempts)
B->>U: Send attempt n
U-->>B: Response / Error
alt retry needed
B->>U: Retry (n+1)
end
end
B->>B: Store attempts in req.Context (RetryCount)
B-->>H: Result + ctx
H->>T: PostHook(ctx)
T->>T: Read RetryCount -> Inc bifrost_retries_total
H->>H: Set header x-bf-retries (if >=0)
H-->>C: Response (with x-bf-retries)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/bifrost.go(1 hunks)core/schemas/bifrost.go(1 hunks)docs/features/telemetry.mdx(3 hunks)plugins/telemetry/main.go(2 hunks)plugins/telemetry/setup.go(2 hunks)transports/bifrost-http/handlers/completions.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
transports/bifrost-http/handlers/completions.go (1)
core/schemas/bifrost.go (1)
BifrostContextKeyRetryCount(108-108)
core/bifrost.go (1)
core/schemas/bifrost.go (1)
BifrostContextKeyRetryCount(108-108)
plugins/telemetry/main.go (1)
core/schemas/bifrost.go (1)
BifrostContextKeyRetryCount(108-108)
🔇 Additional comments (7)
plugins/telemetry/main.go (1)
176-179: LGTM! Retry metric collection is properly implemented.The retry count retrieval and metric increment logic follows the same pattern as other metrics, using the context key and only recording when retries > 0.
core/bifrost.go (1)
1237-1238: LGTM! Context storage of retry count enables downstream telemetry.The retry count is correctly stored in the context after the retry loop completes, making it accessible to both HTTP headers and telemetry plugins.
docs/features/telemetry.mdx (3)
52-52: LGTM! New retry metric is properly documented.The documentation correctly describes the new retry metric and its usage.
61-61: LGTM! Clear explanation of retry count recording.The label definition correctly clarifies that retries are recorded as metric values, not labels.
129-141: LGTM! Comprehensive retry analysis examples provided.The PromQL examples cover the most useful retry monitoring patterns: rate tracking, retry ratios, and high retry detection.
plugins/telemetry/setup.go (2)
52-54: New counter declared — OK.Consistent with existing metric patterns. No issues with the declaration itself.
176-184: No call sites found — confirm metric is incremented by retry count (Add), not by 1. Searched forbifrostRetriesTotaland only found its declaration in plugins/telemetry/setup.go; no increment calls located. If/where it’s incremented, use bifrostRetriesTotal.WithLabelValues(...).Add(float64(retries)) so the metric measures total retries performed.
Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
transports/bifrost-http/handlers/completions.go (2)
430-433: Use enhanced helper with resp/err; ensures header even if context missing.Set from resp/err first, then context.
- if r := getRetryCount(bifrostCtx); r >= 0 { + if r := getRetryCount(bifrostCtx, resp, bifrostErr); r >= 0 { ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r)) }
581-584: Same fix here: pass resp/err to helper.Keeps behavior consistent across request types.
- if r := getRetryCount(bifrostCtx); r >= 0 { + if r := getRetryCount(bifrostCtx, resp, bifrostErr); r >= 0 { ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r)) }
🧹 Nitpick comments (2)
transports/bifrost-http/handlers/completions.go (2)
604-608: Remove duplicate header set.Header was already set at Lines 581–584; redundant writes add noise.
- // // getting (handlers) - if r := getRetryCount(bifrostCtx); r >= 0 { - ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r)) - } + // (removed duplicate x-bf-retries header; already set earlier)
718-721: Update helper usage here as well (streaming transcription).Pass nils; helper will fall back to context.
- if r := getRetryCount(bifrostCtx); r >= 0 { + if r := getRetryCount(bifrostCtx, nil, nil); r >= 0 { ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r)) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
plugins/telemetry/main.go(3 hunks)transports/bifrost-http/handlers/completions.go(5 hunks)transports/bifrost-http/handlers/utils.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/telemetry/main.go
🧰 Additional context used
🧬 Code graph analysis (1)
transports/bifrost-http/handlers/utils.go (1)
core/schemas/bifrost.go (1)
BifrostContextKeyRetryCount(108-108)
🔇 Additional comments (2)
transports/bifrost-http/handlers/utils.go (1)
5-7: OK to import context.No concerns.
transports/bifrost-http/handlers/completions.go (1)
336-342: x-bf-retries header present on all error paths — resolved.
Verified header writes at lines 431, 582, 606, 683 and 719; SendBifrostError/SendSSEError and the handleStreamingSpeech path are covered.
|
Hello @sureshkrishnan-v, thanks for the PR, it looks good but it has one issue in the flow - the context passed in the request worker is passed by value and not reference, so any updates there will not be reflected in the ctx outside in handleRequest function which actually returns back the ctx. This is a fundamental issue in the ctx passing and its already fixed in #449 so lets wait till that PR is merged and then we can update and merge this one as well :) |
|
Also can you please maintain the PR template -> https://github.com/maximhq/bifrost/blob/main/.github/pull_request_template.md |
|
@Pratham-Mishra04 yeah sure |
Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
| append(bifrostDefaultLabels, labels...), | ||
| ) | ||
|
|
||
| bifrostRetriesTotal = promauto.NewCounterVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have retries count as a Histogram metric? would be much better for analytics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will add retries count as histogram metric
|
Hey @sureshkrishnan-v #449 is now merged so you can sync and test your changes. Plus it'll also be helpful if you could send back number of retries in Error, Response extra fields instead of in ctx so that we can save and log it in frontend as well. |
Summary
Currently, there’s no way to know how many retries were performed before a request succeeded or failed. This PR adds retry count tracking and exposes it to consumers.
Changes
Populated retry counts in requestWorker after retry loops.
Exposed retry count via x-bf-retries HTTP header in the gateway.
Saved retry count in context for telemetry and plugin access.
Type of change
Feature
Affected areas
Core (Go)
Transports (HTTP)
Plugins
Docs
How to test
Run go test ./... to verify core functionality.
Send a request through a provider that triggers retries.
Check Retries in BifrostResponse or BifrostError.
Check x-bf-retries HTTP header.
Related issues
Closes #459