
@TejasGhatte TejasGhatte commented Oct 23, 2025

Summary

Increased API timeout values and added model listing tests for Anthropic, Google, and OpenAI integrations.

Changes

  • Increased default timeout from 30 to 120 seconds in Anthropic and OpenAI client configurations
  • Updated timeout values in streaming content collection functions from 30 to 120 seconds
  • Added null check for Anthropic response content before checking its length
  • Added new test cases to verify model listing functionality for all three providers:
    • test_14_list_models for Anthropic
    • test_15_list_models for Google
    • test_31_list_models for OpenAI
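
The three tests share one shape; a minimal sketch of that pattern follows. The stub client below is hypothetical, standing in for the real SDK clients (`anthropic_client`, `openai_client`, etc.) so the sketch runs without provider credentials:

```python
from types import SimpleNamespace

def list_models_test_pattern(client, limit=5):
    """Shared shape of the three new tests: request a fixed page size
    and assert a deterministic, non-null result."""
    response = client.models.list(limit=limit)
    assert response.data is not None
    assert len(response.data) == limit
    return [m.id for m in response.data]

# Hypothetical stub standing in for a provider SDK client.
class _StubModels:
    def list(self, limit):
        return SimpleNamespace(
            data=[SimpleNamespace(id=f"model-{i}") for i in range(limit)]
        )

stub_client = SimpleNamespace(models=_StubModels())
print(list_models_test_pattern(stub_client))  # prints the five stub model ids
```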

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Run the integration tests for the affected providers:

# Run Anthropic integration tests
pytest tests/integrations/tests/integrations/test_anthropic.py -v

# Run Google integration tests
pytest tests/integrations/tests/integrations/test_google.py -v

# Run OpenAI integration tests
pytest tests/integrations/tests/integrations/test_openai.py -v

Breaking changes

  • Yes
  • No

Related issues

Addresses timeout issues with API calls and adds test coverage for model listing functionality.

Security considerations

No security implications.

Checklist

  • I added/updated tests where appropriate
  • I verified the CI pipeline passes locally if applicable

@TejasGhatte TejasGhatte mentioned this pull request Oct 23, 2025

TejasGhatte commented Oct 23, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.


coderabbitai bot commented Oct 23, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added model listing capability across multiple AI providers.
    • Improved streaming response parsing for robust content extraction.
  • Bug Fixes

    • Enhanced image loading with User-Agent headers and improved error handling.
    • Increased timeout values for streaming operations and API calls for improved reliability.
    • Refined error propagation and validation logic.

Walkthrough

Updated integration tests for Anthropic, Google, and OpenAI (timeouts increased, new model-listing tests, streaming parsing hardened); provider mappers for Anthropic and Gemini now emit and assemble streaming tool/function-call blocks; test utilities loosen some error validations and adjust streaming assumptions.

Changes

Cohort / File(s) Summary
Anthropic integration tests
tests/integrations/tests/integrations/test_anthropic.py
Increased client and streaming timeouts (30→120s), added content truthiness guard in final response check, changed invalid-role test to expect a non-empty response instead of exception, added test_14_list_models.
Google integration tests & helpers
tests/integrations/tests/integrations/test_google.py
load_image_from_url adds User-Agent and 30s timeout and raises on bad status; streaming parsing made tolerant (candidates→content→parts→text with fallback); lowered minimum content length; removed debug prints; added extract_google_function_calls(response: Any) -> List[Dict[str, Any]]; added test_15_list_models.
OpenAI integration tests
tests/integrations/tests/integrations/test_openai.py
Increased OpenAI client timeout (30→120s); extended streaming/transcription timeouts to 120s; added minor error logging; added test_31_list_models.
Provider response mapping — Anthropic
core/schemas/providers/anthropic/responses.go
Streaming mapping updated to emit ContentBlockStart for function-call/tool-use outputs, populate ContentBlock ID/Name and index from tool messages/parts, handle ContentPartAdded accordingly, remove placeholder delta on OutputItemDone, and ensure non-streaming content slices are empty (not nil).
Provider response mapping — Gemini (chat)
core/schemas/providers/gemini/chat.go
Add streaming support: handle ChatStreamResponseChoice.Delta by deriving role (Delta.Role or "model"), append delta text to Parts, translate Delta.ToolCalls into FunctionCall parts (parse arguments JSON), and assemble streaming candidate content while preserving non-streaming fallback.
Test utilities
tests/integrations/tests/utils/common.py
Relaxed OpenAI error assertions (removed top-level type and event_id checks and nested event_id checks); removed Anthropic streaming fallback branch that assumed a text_delta when no type specified.
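
The tolerant streaming parse described for test_google.py above can be sketched as follows; attribute names assume the Google GenAI response shape (candidates → content → parts → text), with `SimpleNamespace` stubs in place of real chunks:

```python
from types import SimpleNamespace
from typing import Optional

def extract_chunk_text(chunk) -> Optional[str]:
    """Walk candidates -> content -> parts -> text, tolerating any
    missing level, and fall back to a direct .text attribute."""
    candidates = getattr(chunk, "candidates", None)
    if candidates:
        content = getattr(candidates[0], "content", None)
        parts = getattr(content, "parts", None) if content else None
        if parts:
            texts = [p.text for p in parts if getattr(p, "text", None)]
            if texts:
                return "".join(texts)
    # Fallback for chunks that expose text directly.
    return getattr(chunk, "text", None)

# Stubs mimicking a nested streaming chunk and a flat one.
nested = SimpleNamespace(
    candidates=[SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text="hello")]))]
)
flat = SimpleNamespace(text="hello")
```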

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Provider as Provider (stream)
  participant Mapper as Response Mapper
  participant Consumer as App/Tests

  Note right of Provider #bfe9ff: streaming chunks / output items
  Provider->>Mapper: send chunk (delta / output item)
  alt chunk signals tool/function-call
    Mapper->>Mapper: emit ContentBlockStart(type=ToolUse, id?, name?)
    Mapper->>Consumer: ContentPart (ToolUse) with index
  else chunk contains text/parts
    Mapper->>Mapper: append Text Part (index-aware)
    Mapper->>Consumer: ContentPart (Text) with index
  end
  Note right of Mapper #e8f5e9: completion emitted without placeholder deltas
  Mapper->>Consumer: Completion event (no artificial delta)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through streams and model lists,

Tool calls tucked in tidy bits.
Timeouts stretched to catch the flow,
Parsers gentle, errors low.
The rabbit smiles — tests all go.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title Check — ❓ Inconclusive: The PR title "fix: integration test cases" is vague and generic; it could apply to almost any integration-test change. While it is technically related to the changeset (all changes touch integration test files), it conveys nothing about the actual work — the 30s→120s timeout increases, the three new model-listing tests, or the improved error handling. A developer scanning the commit history would not grasp the primary purpose of these changes from the title alone.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: The description follows the provided template, covering Summary, Changes, Type of change, Affected areas, How to test, Breaking changes, Related issues, Security considerations, and Checklist, with concrete detail on the timeout changes and the new per-provider tests. The "Related issues" section gives context rather than explicit "Closes #123" links, a minor deviation that does not obscure the PR's purpose.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.

@TejasGhatte TejasGhatte marked this pull request as ready for review October 23, 2025 14:13

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8618bf8 and 4fedb68.

📒 Files selected for processing (3)
  • tests/integrations/tests/integrations/test_anthropic.py (4 hunks)
  • tests/integrations/tests/integrations/test_google.py (1 hunks)
  • tests/integrations/tests/integrations/test_openai.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (1)
  • collect_streaming_content (783-865)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (1)
  • collect_streaming_content (783-865)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1386-1397)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py

1058-1058: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

589-589: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_google.py

506-506: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Graphite / mergeability_check (×7)
🔇 Additional comments (8)
tests/integrations/tests/integrations/test_anthropic.py (3)

78-78: Timeout increase looks good.

The increased timeout (30→120 seconds) for the Anthropic client is appropriate for integration tests and aligns with similar changes across OpenAI and other providers.


486-486: Good defensive check for None content.

Adding the null check before evaluating length prevents potential AttributeError if final_response.content is None.
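
The guard's shape, as a sketch (not the actual test line; the stub response object is illustrative):

```python
from types import SimpleNamespace

def has_nonempty_content(final_response) -> bool:
    # Check for None first so len() is never applied to a missing body.
    return final_response.content is not None and len(final_response.content) > 0
```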


564-564: Streaming timeout increases are appropriate.

The extended timeouts for streaming operations are consistent with the client configuration and help prevent premature timeouts during integration tests.

Also applies to: 582-582

tests/integrations/tests/integrations/test_openai.py (3)

166-166: Timeout increase is appropriate.

The increased timeout (30→120 seconds) aligns with the timeout changes across all provider integration tests and helps prevent premature timeouts.


473-473: Streaming timeout increases look good.

The extended timeouts for both basic streaming and tool-based streaming are consistent with the overall timeout strategy.

Also applies to: 491-491


573-573: Transcription streaming timeout increase is reasonable.

Audio transcription streaming may require additional processing time, making the 120-second timeout appropriate.

tests/integrations/tests/integrations/test_google.py (2)

505-510: Well-implemented model listing test.

This test correctly includes the @skip_if_no_api_key("google") decorator and uses a consistent approach with the Anthropic test (limiting to 5 results and asserting exactly 5). Good work!

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.


514-535: Well-structured helper function with proper error handling.

The extract_google_function_calls helper follows the pattern established in other provider test files and includes appropriate defensive checks and error handling.

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 4fedb68 to ed5714f Compare October 23, 2025 16:03

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/schemas/providers/anthropic/responses.go (3)

328-336: Bug: tool_use start uses ToolUseID instead of ID (breaks call/result association).

For content_block_start of a tool_use, the identifier is in ContentBlock.ID. ToolUseID is only present on tool_result to reference the earlier tool_use. Using ToolUseID here will produce nil IDs and prevent correlating tool_result with its call.

Apply this diff:

 case AnthropicContentBlockTypeToolUse:
   // This is a function call starting - create function call message

   item := &schemas.ResponsesMessage{
-    ID:   chunk.ContentBlock.ToolUseID,
+    ID:   chunk.ContentBlock.ID,
     Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall),
     ResponsesToolMessage: &schemas.ResponsesToolMessage{
-      CallID:    chunk.ContentBlock.ToolUseID,
+      CallID:    chunk.ContentBlock.ID,
       Name:      chunk.ContentBlock.Name,
       Arguments: schemas.Ptr(""), // Arguments will be filled by deltas
     },
   }

373-383: Use Arguments field for function-call JSON deltas (mapper mismatch).

ToAnthropicResponsesStreamResponse reads bifrostResp.Arguments for FunctionCallArgumentsDelta, but here we set Delta. Populate Arguments to keep round‑trip symmetry and avoid dropped deltas.

Apply this diff:

 case AnthropicStreamDeltaTypeInputJSON:
   // Function call arguments delta
   if chunk.Delta.PartialJSON != nil && *chunk.Delta.PartialJSON != "" {
     return &schemas.BifrostResponsesStreamResponse{
       Type:           schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDelta,
       SequenceNumber: sequenceNumber,
       OutputIndex:    schemas.Ptr(0),
       ContentIndex:   chunk.Index,
-      Delta:          chunk.Delta.PartialJSON,
+      Arguments:      chunk.Delta.PartialJSON,
     }, nil, false
   }

318-346: Fix field mapping errors and populate Arguments field in FunctionCallArgumentsDelta events.

Two critical issues require correction:

  1. Lines 329, 332 (responses.go): Within case AnthropicContentBlockTypeToolUse, use chunk.ContentBlock.ID instead of chunk.ContentBlock.ToolUseID. Per the type definitions, ID is the tool_use identifier, while ToolUseID is for tool_result blocks.

  2. Line 381 (responses.go): When handling FunctionCallArgumentsDelta, populate the Arguments field instead of Delta. The struct defines Arguments *string for this purpose, and other parts of the codebase (mux.go line 915) correctly set streamResp.Arguments. Change Delta: chunk.Delta.PartialJSON to Arguments: chunk.Delta.PartialJSON.

  3. Line 702 (chat.go): Same fix as #1—use chunk.ContentBlock.ID instead of chunk.ContentBlock.ToolUseID within the tool_use case.

♻️ Duplicate comments (2)
tests/integrations/tests/integrations/test_anthropic.py (1)

589-594: Missing @skip_if_no_api_key decorator.

This test should include the @skip_if_no_api_key("anthropic") decorator to gracefully skip when the API key is unavailable, consistent with all other tests in this file.

Apply this diff:

+    @skip_if_no_api_key("anthropic")
     def test_14_list_models(self, anthropic_client, test_config):

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (1)

1058-1064: Missing @skip_if_no_api_key decorator and inconsistent test approach.

Two issues to address:

  1. Like other test methods, this should use @skip_if_no_api_key("openai") to gracefully skip when the API key is unavailable.

  2. This test's approach differs from the Anthropic and Google equivalents:

    • Anthropic (line 592): models.list(limit=5) → asserts len(response.data) == 5
    • Google (line 517): models.list(config={"page_size": 5}) → asserts len(response) == 5
    • OpenAI (line 1062): models.list() → asserts len(response.data) > 0

For consistency and determinism, align with the other providers.

Apply this diff:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

🧹 Nitpick comments (7)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider using proper logging instead of print.

While this debug output can be helpful for troubleshooting, consider using Python's logging module for consistency with production code practices.

Apply this diff if you want to use proper logging:

-        print(error)
+        import logging
+        logging.debug(f"Error from invalid role test: {error}")

Alternatively, if the print was added only for temporary debugging, consider removing it before merging.

core/schemas/providers/anthropic/responses.go (4)

471-507: SSE conformance: ensure index for tool_use starts and include empty content on message_start.

  • content_block_start events should include an index; default to 0 if absent.
  • message_start payload should include an empty content array to mirror Anthropic’s schema.

Apply this diff:

 case schemas.ResponsesStreamResponseTypeOutputItemAdded:
   // Check if this is a function call (tool use) message
   if bifrostResp.Item != nil && bifrostResp.Item.Type != nil && *bifrostResp.Item.Type == schemas.ResponsesMessageTypeFunctionCall {
     // Convert function call to tool_use content_block_start event
     streamResp.Type = AnthropicStreamEventTypeContentBlockStart
-    if bifrostResp.ContentIndex != nil {
-      streamResp.Index = bifrostResp.ContentIndex
-    }
+    if bifrostResp.ContentIndex != nil {
+      streamResp.Index = bifrostResp.ContentIndex
+    } else {
+      zero := 0
+      streamResp.Index = &zero
+    }

     contentBlock := &AnthropicContentBlock{
       Type: AnthropicContentBlockTypeToolUse,
     }

     if bifrostResp.Item.ResponsesToolMessage != nil {
       if bifrostResp.Item.ResponsesToolMessage.CallID != nil {
         contentBlock.ID = bifrostResp.Item.ResponsesToolMessage.CallID
       }
       if bifrostResp.Item.ResponsesToolMessage.Name != nil {
         contentBlock.Name = bifrostResp.Item.ResponsesToolMessage.Name
       }
     }

     streamResp.ContentBlock = contentBlock
   } else {
     // Regular message start event
     streamResp.Type = AnthropicStreamEventTypeMessageStart
     if bifrostResp.Item != nil {
       // Create message start event
       streamMessage := &AnthropicMessageResponse{
         Type: "message",
-        Role: string(schemas.ResponsesInputMessageRoleAssistant),
+        Role: string(schemas.ResponsesInputMessageRoleAssistant),
+        Content: []AnthropicContentBlock{},
       }
       if bifrostResp.Item.ID != nil {
         streamMessage.ID = *bifrostResp.Item.ID
       }
       streamResp.Message = streamMessage
     }
   }

1265-1279: Guard against nil msg.Content before dereference.

Several branches dereference msg.Content without a prior nil check, risking a panic for messages lacking content.

Apply this diff:

-      case schemas.ResponsesMessageTypeMessage:
-        // Regular text message
-        if msg.Content.ContentStr != nil {
+      case schemas.ResponsesMessageTypeMessage:
+        // Regular text message
+        if msg.Content != nil && msg.Content.ContentStr != nil {
           contentBlocks = append(contentBlocks, AnthropicContentBlock{
             Type: "text",
             Text: msg.Content.ContentStr,
           })
-        } else if msg.Content.ContentBlocks != nil {
+        } else if msg.Content != nil && msg.Content.ContentBlocks != nil {
           // Convert content blocks
           for _, block := range msg.Content.ContentBlocks {
             anthropicBlock := convertContentBlockToAnthropic(block)
             if anthropicBlock != nil {
               contentBlocks = append(contentBlocks, *anthropicBlock)
             }
           }
         }

509-596: Optional: expand ContentPartAdded mapping to support images when streaming.

Currently handles only text blocks. If Anthropic emits image content blocks mid‑stream, mirror them here for completeness.


416-425: Minor: include stop_reason in message_delta when known.

If upstream carries stop_reason in the Bifrost event, add it to the message_delta for better parity. Safe to defer.

core/schemas/providers/gemini/chat.go (2)

497-501: Normalize non‑streaming assistant role to Gemini's 'model'.

Gemini Content.Role allows only 'user' or 'model'. Mapping 'assistant' → 'model' avoids invalid role values.

-          candidate.Content = &Content{
-            Parts: parts,
-            Role:  string(choice.ChatNonStreamResponseChoice.Message.Role),
-          }
+          role := string(choice.ChatNonStreamResponseChoice.Message.Role)
+          if role == string(schemas.ChatMessageRoleAssistant) {
+            role = "model"
+          }
+          candidate.Content = &Content{
+            Parts: parts,
+            Role:  role,
+          }

Please confirm Gemini consumption points expect only 'user'|'model'; adjust if there’s any downstream relying on 'assistant'.


418-462: Reduce duplication: extract shared tool‑call → Part conversion.

Both branches build FunctionCall Parts from tool calls. Consider a small helper:

  • toFunctionCallPart(idPtr, fnNamePtr, argsJSON string) (*Part, bool)

This trims repeated JSON parsing and ID/name wiring and centralizes error handling.

I can draft the helper and update both branches if you want.

Also applies to: 463-501

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 4fedb68 and ed5714f.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
  • ResponsesInputMessageRoleAssistant (304-304)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Graphite / mergeability_check (×3)
🔇 Additional comments (13)
tests/integrations/tests/integrations/test_anthropic.py (4)

78-78: LGTM: Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds for the Anthropic client is appropriate for integration tests and aligns with the PR objective to fix timeout issues.


486-486: LGTM: Defensive null check prevents potential AttributeError.

The added null check appropriately handles the case where Anthropic returns empty content when tool results are sufficient, as documented in the surrounding comment.


564-564: LGTM: Streaming timeout increases improve test reliability.

The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with the PR's objective to address timeout issues in integration tests.

Also applies to: 582-582


540-550: Based on my verification, I found a critical inconsistency: the test behavior contradicts both the Bifrost codebase and the pattern across other provider integrations.

Key Findings:

  1. Inconsistent Test Patterns: While test_anthropic.py expects a successful response without exceptions, test_openai.py, test_litellm.py, and test_google.py all use pytest.raises(Exception) for the same INVALID_ROLE_MESSAGES test data.

  2. No Role Normalization in Bifrost: Examination of the Bifrost Anthropic provider code shows direct pass-through of roles without validation or normalization:

    • Role: schemas.ChatMessageRole(msg.Role) (line 54 in core/schemas/providers/anthropic/chat.go)
    • No filtering, validation, or normalization logic present
  3. Misleading Comment: The comment "bifrost handles invalid roles internally" is not supported by the Bifrost codebase. The code performs no special handling for invalid roles like "tester".

The test is likely incorrect and should be aligned with the other provider tests. The Anthropic API (like other LLM APIs) rejects invalid message roles; Bifrost does not normalize them as the comment suggests.

tests/integrations/tests/integrations/test_openai.py (3)

166-166: LGTM: Timeout increase improves integration test resilience.

The timeout increase from 60 to 120 seconds aligns with the PR objective to address timeout issues in integration tests.


474-474: LGTM: Streaming timeout increases address test stability.

The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with similar changes across other provider tests.

Also applies to: 492-492


574-574: LGTM: Transcription streaming timeout increase is appropriate.

The timeout increase from 60 to 120 seconds for transcription streaming is reasonable given that transcription operations can be more time-intensive.

tests/integrations/tests/integrations/test_google.py (4)

147-152: Excellent defensive improvements to image loading.

The additions are all best practices:

  • User-Agent header prevents 403 errors from bot detection
  • Timeout prevents indefinite hanging
  • raise_for_status() ensures HTTP errors are caught early

These changes improve test reliability significantly.
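
A stdlib sketch of those three safeguards — the real helper uses requests; urllib is substituted here so the sketch is dependency-free, and the User-Agent string is illustrative, not the one in the test:

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (compatible; integration-test-fetcher)"  # illustrative value

def build_image_request(url: str) -> urllib.request.Request:
    # User-Agent header: avoids 403s from servers that reject bare clients.
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def load_image_from_url(url: str, timeout: int = 30) -> bytes:
    # The timeout prevents indefinite hangs; urlopen raises HTTPError on
    # bad status codes, the stdlib analogue of response.raise_for_status().
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as resp:
        return resp.read()
```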


469-483: LGTM: Improved streaming content extraction with proper fallback.

The enhanced logic properly navigates Google GenAI's nested streaming structure (candidates → content → parts → text) with a fallback to direct chunk.text access for compatibility. The reduced minimum content length from 10 to 5 appears to reflect observed streaming behavior.


514-519: LGTM: Well-implemented model listing test.

This test correctly includes the @skip_if_no_api_key decorator and uses a deterministic approach with page_size: 5 and an exact count assertion, consistent with best practices.

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.


523-544: Clarify intent for unused helper function.

The extract_google_function_calls helper is well-structured with proper defensive programming, but verification confirms it's not called anywhere in the codebase. Existing tests at lines 236-239 and 316-319 access response.function_calls directly instead.

Either refactor the existing tests to use this helper for consistency, or remove it if it's not intended for future use or external consumption.
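
A sketch of what such a defensive helper typically looks like; attribute names (`function_calls`, `name`, `args`) are assumed from the Google GenAI response shape, and the stub is hypothetical:

```python
from types import SimpleNamespace
from typing import Any, Dict, List

def extract_function_calls(response: Any) -> List[Dict[str, Any]]:
    """Defensively pull function calls out of a response, tolerating
    missing attributes, empty lists, and absent args."""
    calls: List[Dict[str, Any]] = []
    for fc in getattr(response, "function_calls", None) or []:
        name = getattr(fc, "name", None)
        if name:
            calls.append({"name": name, "args": dict(getattr(fc, "args", None) or {})})
    return calls

# Hypothetical stub mimicking a response carrying one function call.
stub = SimpleNamespace(
    function_calls=[SimpleNamespace(name="get_weather", args={"city": "Pune"})]
)
```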

core/schemas/providers/anthropic/responses.go (1)

266-268: Good fix: ensure JSON emits [] instead of null for content.

Setting Content to an empty slice avoids nulls and aligns better with clients expecting arrays. LGTM.

core/schemas/providers/gemini/chat.go (1)

425-453: Remove the proposed diff; the nil deref risk does not exist, but role normalization is needed only in the non-streaming path.

The review comment incorrectly identifies a nil pointer dereference risk. In the code, toolCall.Function is an embedded value field (ChatAssistantMessageToolCallFunction), not a pointer, so it cannot be nil—accessing toolCall.Function.Arguments and toolCall.Function.Name is always safe.

However, there is a legitimate issue in the non-streaming path at line 499: the code outputs string(choice.ChatNonStreamResponseChoice.Message.Role) directly, which will be "assistant" for ChatMessageRoleAssistant. Since Gemini expects roles to map "assistant" to "model", only line 499 needs normalization. The streaming path already defaults to "model" and does not require the suggested guard. The json.Unmarshal errors being silently ignored is a minor quality issue (not critical for streaming partial JSON).

Likely an incorrect or invalid review comment.

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from ed5714f to 21dd884 Compare October 23, 2025 18:10
@TejasGhatte TejasGhatte force-pushed the 10-17-feat_added_list_models_request branch from 8618bf8 to 40d6d8f Compare October 23, 2025 18:10

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

1059-1064: Missing decorator and non-deterministic assertion.

Two issues need addressing:

  1. Missing @skip_if_no_api_key("openai") decorator: All other tests in this suite use this decorator to gracefully skip when the API key is unavailable. This test should follow the same pattern.

  2. Non-deterministic assertion: The test uses assert len(response.data) > 0, which makes the test non-deterministic. For consistency with the Anthropic (line 592-594) and Google (line 517-519) tests, use a fixed limit for deterministic validation.

Apply this diff:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The OpenAI Python SDK does support the limit parameter for models.list() (default 20, valid range 1-100). The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider using structured logging instead of print statements.

While printing the error aids debugging, consider using Python's logging module for better control and formatting in test output.

-        print(error)
+        import logging
+        logging.debug(f"Error in test_12_error_handling_invalid_roles: {error}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5714f and 21dd884.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (2)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Graphite / mergeability_check (repeated 7×)
🔇 Additional comments (12)
core/schemas/providers/anthropic/responses.go (1)

266-268: LGTM: Consistent empty slice initialization.

Setting Content to an empty slice instead of leaving it nil ensures consistent JSON serialization (empty array [] instead of null). This is a good practice for API responses.
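The serialization difference called out above can be shown with a minimal standalone sketch (the struct here is illustrative, not the project's actual schema): Go's encoding/json marshals a nil slice as `null` but an empty slice as `[]`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resp is an illustrative stand-in for a response struct
// with a slice-valued Content field.
type resp struct {
	Content []string `json:"content"`
}

// marshal returns the JSON encoding of r as a string.
func marshal(r resp) string {
	b, _ := json.Marshal(r)
	return string(b)
}

func main() {
	fmt.Println(marshal(resp{}))                    // nil slice -> {"content":null}
	fmt.Println(marshal(resp{Content: []string{}})) // empty slice -> {"content":[]}
}
```

This is why initializing the field to an empty slice gives API clients a stable `[]` instead of `null`.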

tests/integrations/tests/integrations/test_anthropic.py (5)

78-78: LGTM! Timeout increase addresses integration test reliability.

The 4x timeout increase aligns with the PR's goal of fixing integration test timeout issues and is consistently applied across all provider tests.


486-491: LGTM! Proper null-safety for known Anthropic behavior.

The null check correctly handles cases where Anthropic returns empty content after tool results, which is documented as valid behavior.


540-550: LGTM! Test correctly reflects bifrost's error handling behavior.

The test now validates that bifrost handles invalid roles internally rather than propagating errors to the provider. This is a significant behavioral change from the original test expectation but aligns with bifrost's role as a translation/normalization layer.


564-564: LGTM! Streaming timeout increases improve test reliability.

The 4x timeout increase for streaming operations is consistent with the broader timeout adjustments in this PR and should reduce flaky test failures.

Also applies to: 582-582


589-594: LGTM! Model listing test is well-structured and deterministic.

The test correctly uses the @skip_if_no_api_key decorator and requests a fixed number of models (5) for deterministic validation, consistent with the Google provider's test approach.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (2)

166-166: LGTM! Timeout increase improves test reliability.

Consistent with the timeout adjustments across all provider tests in this PR.


474-474: LGTM! Streaming timeout increases address test flakiness.

The timeout increases across all streaming scenarios (chat, tools, transcription) should improve test reliability for slower network conditions or provider throttling.

Also applies to: 492-492, 574-574

tests/integrations/tests/integrations/test_google.py (4)

147-152: LGTM! Robust image loading with proper error handling.

The improvements add three important safety measures:

  1. User-Agent header prevents 403 errors from servers that block default clients
  2. 30-second timeout prevents indefinite hangs
  3. raise_for_status() ensures HTTP errors are caught early

469-479: LGTM! More robust streaming text extraction.

The improved extraction logic properly traverses the Google GenAI response structure (candidates → content → parts → text) with defensive attribute checks, while maintaining a fallback for compatibility. This should handle edge cases better than the previous implementation.


483-483: Relaxed content threshold—verify this is appropriate.

The minimum content length was reduced from 10 to 5 characters. While this makes the test more tolerant of short responses, 5 characters is quite minimal. Ensure this threshold still validates meaningful streaming content and won't pass on truncated or error responses.


514-519: LGTM! Deterministic model listing test with proper decorator.

The test correctly uses the @skip_if_no_api_key decorator and requests a fixed page size for deterministic validation, consistent with the Anthropic test approach. The Google SDK uses config={"page_size": 5} syntax, which is appropriate for this provider.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider removing debug print statement.

The print(error) statement appears to be for debugging purposes. While helpful during development, consider removing it or replacing it with proper logging for production test code.

Apply this diff if you want to remove the debug statement:

         # Verify the error is properly caught and contains role-related information
         error = exc_info.value
-        print(error)
         assert_valid_error_response(error, "tester")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5714f and 21dd884.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (3)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
ui/lib/types/logs.ts (1)
  • Function (139-144)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (2)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Graphite / mergeability_check (repeated 16×)
🔇 Additional comments (14)
tests/integrations/tests/integrations/test_anthropic.py (5)

78-78: LGTM! Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds aligns with the PR objectives to fix integration test timeout issues.


486-491: Good defensive null check.

This properly handles the case where Anthropic returns an empty content array when a tool result is sufficient, preventing potential AttributeErrors.


564-564: LGTM! Extended streaming timeouts.

The timeout increases from 30 to 120 seconds for streaming operations align with the PR objectives and ensure tests don't fail prematurely on slower network connections or API responses.

Also applies to: 582-582


589-594: LGTM! Model listing test added with proper decorator.

The test correctly uses the @skip_if_no_api_key decorator and validates that exactly 5 models are returned when limited to 5. This aligns with similar tests in other provider integrations.


540-550: Test behavior is inconsistent with other provider tests and the claim about Bifrost is unverified.

The Anthropic test_12 does not follow the established pattern used across OpenAI, Google, and LiteLLM integrations. While all other provider tests expect exceptions for invalid roles and call assert_valid_error_response(), the Anthropic test expects a successful response without validation. Additionally:

  • The test does not use convert_to_anthropic_messages() like other tests in the same file
  • The comment claims "bifrost handles invalid roles internally" but this is not verified against Bifrost's actual implementation
  • The same INVALID_ROLE_MESSAGES fixture with role="tester" is expected to raise exceptions in all other providers

Either verify that Bifrost's Anthropic integration has documented, provider-specific role handling that differs from OpenAI/Google/LiteLLM, or align the test with the established error-handling pattern used elsewhere.

tests/integrations/tests/integrations/test_openai.py (3)

166-166: LGTM! Timeout increase improves test reliability.

The timeout increase from 30 to 120 seconds addresses timeout issues mentioned in the PR objectives.


474-474: LGTM! Extended streaming timeouts.

The timeout increases from 30 to 120 seconds for streaming operations are consistent with changes in other provider tests and address the PR's timeout-related objectives.

Also applies to: 492-492, 574-574


1059-1064: Missing decorator and inconsistent test pattern.

This test has two issues:

  1. Missing @skip_if_no_api_key("openai") decorator – Unlike all other test methods in this file, this test lacks the decorator to skip when the API key is unavailable.

  2. Inconsistent assertion pattern – This test uses assert len(response.data) > 0 while the equivalent tests in other providers use deterministic assertions with a fixed limit:

    • Anthropic (line 592): models.list(limit=5)assert len(response.data) == 5
    • Google (line 517): models.list(config={"page_size": 5})assert len(response) == 5

The past review thread indicates you mentioned "list is not supported in openai sdk," but the OpenAI Python SDK does support the limit parameter for models.list().

Apply this diff to align with other provider tests:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.

core/schemas/providers/anthropic/responses.go (2)

266-268: Good defensive initialization.

Setting Content to an empty slice instead of leaving it nil prevents potential nil pointer dereferences in downstream code that expects a slice.


471-507: LGTM! Improved function call handling in streaming responses.

The code now correctly differentiates between function call messages and regular messages in the OutputItemAdded event:

  • Function calls emit a ContentBlockStart event with a ToolUse content block
  • Regular messages emit a MessageStart event with an assistant role

This aligns with Anthropic's streaming event model and ensures proper handling of tool-use blocks.
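The branching described above can be sketched generically. All names here (`outputItem`, `eventFor`, the event strings) are illustrative placeholders, not Bifrost's or Anthropic's actual identifiers; the point is only the dispatch: tool calls open a `tool_use` content block, everything else starts an assistant message.

```go
package main

import "fmt"

// outputItem is a hypothetical stand-in for an item added to the
// response stream; only the fields needed for dispatch are modeled.
type outputItem struct {
	IsFunctionCall bool
	CallID, Name   string
}

// eventFor picks the stream event emitted for a newly added output item.
func eventFor(item outputItem) string {
	if item.IsFunctionCall {
		// Function calls open a tool_use content block.
		return fmt.Sprintf("content_block_start(tool_use id=%s name=%s)", item.CallID, item.Name)
	}
	// Plain text output starts a new assistant message.
	return "message_start(role=assistant)"
}

func main() {
	fmt.Println(eventFor(outputItem{IsFunctionCall: true, CallID: "c1", Name: "get_weather"}))
	fmt.Println(eventFor(outputItem{}))
}
```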

tests/integrations/tests/integrations/test_google.py (4)

147-152: Excellent improvements to image loading reliability.

The additions address common issues when fetching remote images:

  • User-Agent header prevents 403 errors from servers like Wikipedia
  • timeout=30 prevents indefinite hangs
  • raise_for_status() ensures bad HTTP responses are caught early

469-483: LGTM! Robust streaming content extraction.

The updated parsing correctly navigates Google GenAI's nested streaming structure (candidates -> content -> parts -> text) with a compatibility fallback to chunk.text. The reduced minimum content length assertion (>5 vs >10) is more realistic for streaming scenarios.


514-519: LGTM! Model listing test properly implemented.

The test correctly includes the @skip_if_no_api_key decorator and uses page_size=5 to request exactly 5 models, then asserts the count matches. This is consistent with the Anthropic provider's test pattern.


523-544: LGTM! Helper function with proper error handling.

The new extract_google_function_calls helper safely extracts function call metadata with:

  • Proper type checking using hasattr
  • Exception handling for AttributeError and TypeError
  • Warning messages for debugging

@TejasGhatte TejasGhatte changed the base branch from 10-17-feat_added_list_models_request to graphite-base/671 October 23, 2025 18:47
@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 21dd884 to d1ad6c4 Compare October 23, 2025 18:49
@TejasGhatte TejasGhatte changed the base branch from graphite-base/671 to 10-17-feat_added_list_models_request October 23, 2025 18:49
@TejasGhatte TejasGhatte changed the base branch from 10-17-feat_added_list_models_request to graphite-base/671 October 23, 2025 18:59
@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from d1ad6c4 to 06eadf7 Compare October 24, 2025 04:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)

486-491: Fix critical logic error in tool call argument parsing.

Both the if and else branches set argsMap to an empty map, which discards successfully parsed arguments. This breaks tool call argument handling for non-streaming responses.

Apply this diff to fix the logic:

 					argsMap := make(map[string]interface{})
 					if toolCall.Function.Arguments != "" {
 						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+							// Keep argsMap empty on error
 							argsMap = map[string]interface{}{}
-						} else {
-							argsMap = map[string]interface{}{}
 						}
 					}
♻️ Duplicate comments (1)
core/schemas/providers/gemini/chat.go (1)

443-460: Handle unmarshaling errors for tool call arguments.

This issue was already identified in previous reviews. Line 447 calls json.Unmarshal without checking the error, which can silently produce empty argsMap if the JSON is malformed.

🧹 Nitpick comments (4)
tests/integrations/tests/integrations/test_google.py (3)

515-515: Remove unused parameter.

The test_config parameter is not used in this test method.

As per static analysis hints.

Apply this diff:

-    def test_15_list_models(self, google_client, test_config):
+    def test_15_list_models(self, google_client):

517-519: Assertion may be too strict.

The assertion assert len(response) == 5 assumes the API will return exactly 5 models. If fewer models are available or if the API behavior changes, this test will fail. Consider using assert len(response) <= 5 or assert len(response) > 0 instead.

Apply this diff:

         response = google_client.models.list(config={"page_size": 5})
         assert response is not None
-        assert len(response) == 5
+        assert len(response) > 0, "Should return at least one model"
+        assert len(response) <= 5, "Should not exceed requested page_size"

523-544: Well-implemented helper function with good error handling.

The function correctly extracts function calls with proper type checking and tolerant error handling. The defensive programming approach is appropriate for parsing potentially variable response structures.

One minor suggestion: consider using Python's logging module instead of print for the warning message to allow better control over test output verbosity.

Optional improvement:

+import logging
+
+logger = logging.getLogger(__name__)
+
 def extract_google_function_calls(response: Any) -> List[Dict[str, Any]]:
     """Extract function calls from Google GenAI response format with proper type checking"""
     function_calls = []
 
     # Type check for Google GenAI response
     if not hasattr(response, "function_calls") or not response.function_calls:
         return function_calls
 
     for fc in response.function_calls:
         if hasattr(fc, "name") and hasattr(fc, "args"):
             try:
                 function_calls.append(
                     {
                         "name": fc.name,
                         "arguments": dict(fc.args) if fc.args else {},
                     }
                 )
             except (AttributeError, TypeError) as e:
-                print(f"Warning: Failed to extract Google function call: {e}")
+                logger.warning(f"Failed to extract Google function call: {e}")
                 continue
 
     return function_calls
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Remove debug print statement.

This debug print statement should be removed or replaced with proper logging using pytest's built-in output capture.

Apply this diff to remove the debug statement:

-        print(error)
         assert_valid_error_response(error, "tester")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21dd884 and 06eadf7.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (9-9)
  • Content (868-876)
  • Part (882-906)
  • FunctionCall (978-988)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (272-272)
  • ResponsesToolMessage (435-455)
  • ResponsesInputMessageRoleAssistant (308-308)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: Graphite / mergeability_check (repeated 34×)
🔇 Additional comments (12)
tests/integrations/tests/integrations/test_google.py (2)

469-479: Well-implemented streaming content extraction.

The enhanced parsing logic correctly handles Google GenAI's nested streaming response structure (candidates → content → parts → text) with proper attribute checks. The fallback to chunk.text maintains backward compatibility.


147-152: ****

The review comment misinterprets the timeout changes. The codebase appropriately uses different timeout values for different operation types: timeout=120 is used for streaming content collection in test_openai.py and test_anthropic.py, while timeout=30 is used for simple HTTP GET requests like image fetching in load_image_from_url. A 30-second timeout for a direct image fetch is reasonable and consistent with the PR's objectives of increasing timeouts specifically for longer-running streaming operations. The code change does not require adjustment.

Likely an incorrect or invalid review comment.

tests/integrations/tests/integrations/test_openai.py (4)

166-166: Timeout increase looks good.

The increase from 30 to 120 seconds is appropriate for integration tests and consistent with the PR's objective to address timeout issues.


474-474: Streaming timeout increases look good.

The increased timeouts for streaming operations are appropriate for integration tests and align with the PR's objective to address timeout issues.

Also applies to: 492-492


574-574: Transcription streaming timeout increase looks good.

The increased timeout is appropriate for transcription operations which can take longer than chat completions.


1058-1064: Critical: Missing decorator and inconsistent test approach.

Despite the previous review feedback and your comment about adding the skip decorator, the @skip_if_no_api_key("openai") decorator is still missing from this test method. This will cause the test to fail with an error (rather than skip gracefully) when the API key is unavailable in CI/CD environments.

Additionally, contrary to your comment that "list is not supported in openai sdk," the OpenAI Python SDK does support the limit parameter for models.list(). The previous review included web search results confirming this, showing that the method accepts a limit parameter with a valid range of 1–100 (default 20).

For consistency with the Anthropic test (line 591: models.list(limit=5)) and Google test (line 508: models.list(config={"page_size": 5})), this test should use a deterministic assertion rather than just checking > 0.

Apply this diff to add the decorator and align with other provider tests:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

This ensures:

  • The test is skipped when no API key is available (consistent with all other tests)
  • The test is deterministic by requesting exactly 5 models (consistent with test_14_list_models for Anthropic and test_15_list_models for Google)

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.

tests/integrations/tests/integrations/test_anthropic.py (4)

78-78: LGTM! Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds aligns with the PR objectives to address timeout issues in integration tests. This is consistently applied across streaming tests as well.


486-491: LGTM! Defensive null check for Anthropic response content.

The added null check properly handles cases where Anthropic returns empty content when the tool result is sufficient, preventing potential AttributeError exceptions.


540-550: LGTM! Test updated to reflect bifrost's internal error handling.

The test correctly validates that bifrost handles invalid roles internally without raising exceptions, which is the expected behavior per the inline comment.


589-623: LGTM! Comprehensive model listing test with pagination.

The new test properly validates model listing functionality with pagination parameters (after_id, before_id). The test includes appropriate guards for pagination edge cases and follows the established patterns in the test suite.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

core/schemas/providers/anthropic/responses.go (2)

266-268: LGTM! Defensive initialization ensures consistent response format.

Setting Content to an empty slice when no content blocks exist ensures consistent JSON serialization (empty array vs null), which is good practice for API responses.


471-507: LGTM! Proper differentiation of function calls from regular messages.

The refactored logic correctly handles function calls by emitting ContentBlockStart events with tool_use type, while regular messages continue to emit MessageStart events. The extraction of tool message details (CallID, Name) is handled appropriately.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)

487-491: Critical: Tool call arguments are always empty (non-streaming path).

Lines 487-491 contain inverted logic that discards parsed tool call arguments. Both the error and success branches set argsMap to an empty map, meaning:

  • If unmarshaling succeeds, the parsed data is immediately overwritten with an empty map
  • If unmarshaling fails, an empty map is used (and the error is silently ignored)

This breaks all tool/function calling in non-streaming responses.

Apply this diff to fix the logic:

 					argsMap := make(map[string]interface{})
 					if toolCall.Function.Arguments != "" {
-						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
-							argsMap = map[string]interface{}{}
-						} else {
-							argsMap = map[string]interface{}{}
-						}
+						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+							// Log the error or handle appropriately
+							// argsMap remains empty on error
+						}
+						// On success, argsMap already contains the parsed data
 					}
♻️ Duplicate comments (2)
core/schemas/providers/gemini/chat.go (1)

447-447: Missing error handling for JSON unmarshaling (streaming path).

Line 447 calls json.Unmarshal without checking the returned error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty, potentially passing incorrect tool call arguments downstream.

tests/integrations/tests/integrations/test_openai.py (1)

1059-1064: Make model listing deterministic; request exactly one page.

Use a page size to avoid environment-dependent counts and align with Anthropic/Google tests.

OpenAI list methods support a limit parameter for pagination; using it here makes the test stable.

-        response = openai_client.models.list()
-        assert response.data is not None
-        assert len(response.data) > 0
+        response = openai_client.models.list(limit=5)
+        assert response.data is not None
+        assert 0 < len(response.data) <= 5
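Why the tolerant bound matters can be checked without a live API. This stub (the `_FakeModels` class is invented purely for illustration of an OpenAI-compatible `models` resource) shows that `0 < len(data) <= limit` stays stable while an exact count does not:

```python
from types import SimpleNamespace

class _FakeModels:
    """Stand-in for an OpenAI-compatible models resource (hypothetical)."""
    def __init__(self, available):
        self._available = available

    def list(self, limit=None):
        # A real backend never returns more than `limit`, but may return fewer.
        data = self._available if limit is None else self._available[:limit]
        return SimpleNamespace(data=data)

models = _FakeModels([f"model-{i}" for i in range(3)])  # fewer than the limit
response = models.list(limit=5)
# The tolerant assertion passes even when the backend has < limit models,
# whereas asserting an exact count of 5 would fail in this environment.
assert 0 < len(response.data) <= 5
```

The same reasoning applies to the Google `page_size` test below: bound the count rather than pin it.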
🧹 Nitpick comments (2)
tests/integrations/tests/integrations/test_openai.py (1)

456-460: Avoid noisy prints in tests.

Use assertion messages or logging instead of print(error) to keep CI output clean.

-        print(error)
+        # Optionally log at debug if needed:
+        # import logging; logging.getLogger(__name__).debug("OpenAI error: %s", error)
tests/integrations/tests/integrations/test_google.py (1)

515-519: Make the page-size assertion tolerant.

models.list(config={"page_size": 5}) is correct, but some environments may return fewer than 5 items. Prefer <= 5 and > 0 to avoid flakiness.

-        assert response is not None
-        assert len(response) == 5
+        assert response is not None
+        assert 0 < len(response) <= 5
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21dd884 and 06eadf7.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/schemas/providers/anthropic/responses.go
🧰 Additional context used
🧬 Code graph analysis (4)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (9-9)
  • Content (868-876)
  • Part (882-906)
  • FunctionCall (978-988)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (42)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (9)
tests/integrations/tests/integrations/test_openai.py (3)

163-168: LGTM: increased client timeout to 120s.

Matches the PR’s reliability goal and uses the config default cleanly.


472-476: LGTM: extend streaming collector timeouts to 120s.

Reduces test flakiness on slower runs.

Also applies to: 490-494


571-576: LGTM: transcription streaming timeout to 120s.

tests/integrations/tests/integrations/test_anthropic.py (4)

75-81: LGTM: increased client timeout to 120s.


484-491: Good defensive check on empty content.

Prevents None/empty content access during tool-result-only paths.
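The guard in question is simple but worth pinning down: in Python a single truthiness check covers both the None and the empty-list case. The response shape here is a hypothetical stand-in for the Anthropic response object:

```python
from types import SimpleNamespace

def has_text_content(response) -> bool:
    """True only when content is present and non-empty."""
    content = getattr(response, "content", None)
    # bool() rejects both None and [] in one check, so len() is never
    # called on a missing content field.
    return bool(content)

assert not has_text_content(SimpleNamespace(content=None))
assert not has_text_content(SimpleNamespace(content=[]))
assert has_text_content(SimpleNamespace(content=[{"type": "text", "text": "hi"}]))
```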


561-566: LGTM: extend streaming collector timeouts to 120s.

Also applies to: 581-584


589-624: Nice, thorough pagination test.

Covers limit, paging forward/backward, and pagination metadata. Tolerant <= avoids flakiness.

tests/integrations/tests/integrations/test_google.py (2)

147-153: LGTM: hardened image fetch (UA, 30s timeout, raise_for_status).


467-484: LGTM: robust streaming parsing and assertion.

Traverses candidates→content→parts safely; relaxed length avoids brittle failures.

