Skip to content

Conversation

@sureshkrishnan-v
Copy link

@sureshkrishnan-v sureshkrishnan-v commented Sep 15, 2025

Summary

Currently, there’s no way to know how many retries were performed before a request succeeded or failed. This PR adds retry count tracking and exposes it to consumers.

Changes

Populated retry counts in requestWorker after retry loops.

Exposed retry count via x-bf-retries HTTP header in the gateway.

Saved retry count in context for telemetry and plugin access.

Type of change

Feature

Affected areas

Core (Go)

Transports (HTTP)

Plugins

Docs

How to test

Run go test ./... to verify core functionality.

Send a request through a provider that triggers retries.

Check Retries in BifrostResponse or BifrostError.

Check x-bf-retries HTTP header.

Related issues

Closes #459

Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 15, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Exposes retry count via x-bf-retries HTTP response header for both streaming and non-streaming responses.
    • Adds Prometheus metric bifrost_retries_total to track total retry attempts, with standard labels (provider, model, method, and custom labels).
  • Documentation

    • Updates telemetry docs with the new retry metric, definition of retries, and PromQL examples for retry rate, retries-per-request ratio, and high-retry detection.

Walkthrough

Stores final retry attempts in each request's context after Bifrost's retry loop, exposes that count via an x-bf-retries HTTP response header across streaming and non-streaming handlers, records retries into a new Prometheus counter bifrost_retries_total, and documents the metric and example PromQL queries. No public API signatures changed.

Changes

Cohort / File(s) Summary of changes
Core: Retry propagation
core/bifrost.go, core/schemas/bifrost.go
After the retry loop, write final retry attempts into the request context using a new key BifrostContextKeyRetryCount ("bifrost-retry-count"). No control-flow or signature changes.
HTTP Transport: Response header & utils
transports/bifrost-http/handlers/completions.go, transports/bifrost-http/handlers/utils.go
Add getRetryCount(ctx *context.Context) int helper; set x-bf-retries response header when retry count ≥ 0 across streaming and non-streaming chat, transcription, and speech handlers.
Telemetry: Metrics and plugin
plugins/telemetry/setup.go, plugins/telemetry/main.go, docs/features/telemetry.mdx
Add Prometheus counter bifrost_retries_total and plugin field to record retries read from context; update telemetry docs with metric description, labels, and PromQL examples (retry rate, retries-per-request ratio, high-retry detection).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant H as HTTP Handler
    participant B as Bifrost Core
    participant U as Upstream
    participant T as Telemetry

    C->>H: Request
    H->>B: Execute request
    rect rgba(220,235,255,0.35)
      note right of B: Retry loop (attempts)
      B->>U: Send attempt n
      U-->>B: Response / Error
      alt retry needed
        B->>U: Retry (n+1)
      end
    end
    B->>B: Store attempts in req.Context (RetryCount)
    B-->>H: Result + ctx
    H->>T: PostHook(ctx)
    T->>T: Read RetryCount -> Inc bifrost_retries_total
    H->>H: Set header x-bf-retries (if >=0)
    H-->>C: Response (with x-bf-retries)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I counted hops between the clouds tonight,
Each retry a small and steady bite.
Headers hum the number shown,
Metrics tally every thrown,
A rabbit notes the path alight — 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Linked Issues Check ⚠️ Warning Against the linked issue [#459], the PR implements several required pieces (a context key for retry count, an HTTP header x-bf-retries, telemetry metrics and docs) but it does not appear to add the requested Retries field to BifrostResponse and BifrostError, and there is a reported context-propagation problem (context is mutated inside requestWorker but may not be visible to outer handlers) that could prevent transports/plugins from observing the retry count; tests validating the response fields, header, and metric behavior are also missing. To meet issue [#459], add a Retries int field to BifrostResponse and BifrostError (and populate it at the response-return path), ensure the retry count is propagated correctly to the returned/outer context or attached directly to the response (do not rely on a local request.Context mutation that won't be visible), change the helper to accept context.Context (not *context.Context) and add unit/integration tests that assert the header and metric are emitted as expected; coordinate with the fix referenced by the reviewer (PR #449) if needed before merging.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: add retry count tracking and metrics for Bifrost requests" concisely and accurately summarizes the primary change (adding retry-count tracking and telemetry exposure) and is specific, readable, and aligned with the changes in the PR.
Out of Scope Changes Check ✅ Passed All modifications are focused on implementing retry-count propagation, telemetry, HTTP header exposure, and documentation; there are no unrelated feature changes introduced. A couple of risky or stylistic items (adding an exported PrometheusPlugin field and using a pointer-to-context helper) are within the telemetry scope but should be reviewed for API impact and idiomatic usage.
Description Check ✅ Passed The PR description includes the main required sections from the repository template (Summary, Changes, Type of change, Affected areas, How to test, and Related issues) and gives actionable testing steps and a linked issue, but it omits the formal checklist, the explicit breaking-changes checkbox, and a brief security considerations section and does not use the template's checkbox markup consistently.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3e9e49 and 20d4249.

📒 Files selected for processing (1)
  • transports/bifrost-http/handlers/utils.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • transports/bifrost-http/handlers/utils.go

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6170cd and e7d7080.

📒 Files selected for processing (6)
  • core/bifrost.go (1 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • docs/features/telemetry.mdx (3 hunks)
  • plugins/telemetry/main.go (2 hunks)
  • plugins/telemetry/setup.go (2 hunks)
  • transports/bifrost-http/handlers/completions.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
transports/bifrost-http/handlers/completions.go (1)
core/schemas/bifrost.go (1)
  • BifrostContextKeyRetryCount (108-108)
core/bifrost.go (1)
core/schemas/bifrost.go (1)
  • BifrostContextKeyRetryCount (108-108)
plugins/telemetry/main.go (1)
core/schemas/bifrost.go (1)
  • BifrostContextKeyRetryCount (108-108)
🔇 Additional comments (7)
plugins/telemetry/main.go (1)

176-179: LGTM! Retry metric collection is properly implemented.

The retry count retrieval and metric increment logic follows the same pattern as other metrics, using the context key and only recording when retries > 0.

core/bifrost.go (1)

1237-1238: LGTM! Context storage of retry count enables downstream telemetry.

The retry count is correctly stored in the context after the retry loop completes, making it accessible to both HTTP headers and telemetry plugins.

docs/features/telemetry.mdx (3)

52-52: LGTM! New retry metric is properly documented.

The documentation correctly describes the new retry metric and its usage.


61-61: LGTM! Clear explanation of retry count recording.

The label definition correctly clarifies that retries are recorded as metric values, not labels.


129-141: LGTM! Comprehensive retry analysis examples provided.

The PromQL examples cover the most useful retry monitoring patterns: rate tracking, retry ratios, and high retry detection.

plugins/telemetry/setup.go (2)

52-54: New counter declared — OK.

Consistent with existing metric patterns. No issues with the declaration itself.


176-184: No call sites found — confirm metric is incremented by retry count (Add), not by 1. Searched for bifrostRetriesTotal and only found its declaration in plugins/telemetry/setup.go; no increment calls located. If/where it’s incremented, use bifrostRetriesTotal.WithLabelValues(...).Add(float64(retries)) so the metric measures total retries performed.

Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
transports/bifrost-http/handlers/completions.go (2)

430-433: Use enhanced helper with resp/err; ensures header even if context missing.

Set from resp/err first, then context.

-	if r := getRetryCount(bifrostCtx); r >= 0 {
+	if r := getRetryCount(bifrostCtx, resp, bifrostErr); r >= 0 {
 		ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r))
 	}

581-584: Same fix here: pass resp/err to helper.

Keeps behavior consistent across request types.

-	if r := getRetryCount(bifrostCtx); r >= 0 {
+	if r := getRetryCount(bifrostCtx, resp, bifrostErr); r >= 0 {
 		ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r))
 	}
🧹 Nitpick comments (2)
transports/bifrost-http/handlers/completions.go (2)

604-608: Remove duplicate header set.

Header was already set at Lines 581–584; redundant writes add noise.

-	// // getting (handlers)
-	if r := getRetryCount(bifrostCtx); r >= 0 {
-		ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r))
-	}
+	// (removed duplicate x-bf-retries header; already set earlier)

718-721: Update helper usage here as well (streaming transcription).

Pass nils; helper will fall back to context.

-	if r := getRetryCount(bifrostCtx); r >= 0 {
+	if r := getRetryCount(bifrostCtx, nil, nil); r >= 0 {
 		ctx.Response.Header.Set("x-bf-retries", strconv.Itoa(r))
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7d7080 and c3e9e49.

📒 Files selected for processing (3)
  • plugins/telemetry/main.go (3 hunks)
  • transports/bifrost-http/handlers/completions.go (5 hunks)
  • transports/bifrost-http/handlers/utils.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/telemetry/main.go
🧰 Additional context used
🧬 Code graph analysis (1)
transports/bifrost-http/handlers/utils.go (1)
core/schemas/bifrost.go (1)
  • BifrostContextKeyRetryCount (108-108)
🔇 Additional comments (2)
transports/bifrost-http/handlers/utils.go (1)

5-7: OK to import context.

No concerns.

transports/bifrost-http/handlers/completions.go (1)

336-342: x-bf-retries header present on all error paths — resolved.
Verified header writes at lines 431, 582, 606, 683 and 719; SendBifrostError/SendSSEError and the handleStreamingSpeech path are covered.

@Pratham-Mishra04
Copy link
Collaborator

Hello @sureshkrishnan-v, thanks for the PR, it looks good but it has one issue in the flow - the context passed in the request worker is passed by value and not reference, so any updates there will not be reflected in the ctx outside in handleRequest function which actually returns back the ctx.

This is a fundamental issue in the ctx passing and its already fixed in #449 so lets wait till that PR is merged and then we can update and merge this one as well :)

@Pratham-Mishra04
Copy link
Collaborator

Also can you please maintain the PR template -> https://github.com/maximhq/bifrost/blob/main/.github/pull_request_template.md

@sureshkrishnan-v
Copy link
Author

@Pratham-Mishra04 yeah sure

Signed-off-by: suresh krishnan v <sureshkrishnanv@outlook.com>
append(bifrostDefaultLabels, labels...),
)

bifrostRetriesTotal = promauto.NewCounterVec(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have retries count as a Histogram metric? would be much better for analytics

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will add retries count as histogram metric

@Pratham-Mishra04
Copy link
Collaborator

Hey @sureshkrishnan-v #449 is now merged so you can sync and test your changes. Plus it'll also be helpful if you could send back number of retries in Error, Response extra fields instead of in ctx so that we can save and log it in frontend as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Propagate retry info in response

2 participants