fix: integration test cases #671
base: graphite-base/671
Conversation
📝 Walkthrough

Summary by CodeRabbit

Walkthrough: Updated integration tests for Anthropic, Google, and OpenAI (timeouts increased, new model-listing tests, streaming parsing hardened); provider mappers for Anthropic and Gemini now emit and assemble streaming tool/function-call blocks; test utilities loosen some error validations and adjust streaming assumptions.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Provider as Provider (stream)
    participant Mapper as Response Mapper
    participant Consumer as App/Tests
    Note right of Provider #bfe9ff: streaming chunks / output items
    Provider->>Mapper: send chunk (delta / output item)
    alt chunk signals tool/function-call
        Mapper->>Mapper: emit ContentBlockStart(type=ToolUse, id?, name?)
        Mapper->>Consumer: ContentPart (ToolUse) with index
    else chunk contains text/parts
        Mapper->>Mapper: append Text Part (index-aware)
        Mapper->>Consumer: ContentPart (Text) with index
    end
    Note right of Mapper #e8f5e9: completion emitted without placeholder deltas
    Mapper->>Consumer: Completion event (no artificial delta)
```
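The branching above can be sketched as a minimal chunk-to-event mapper. All field and event names below are illustrative stand-ins, not Bifrost's actual schema:

```python
def map_chunk(chunk: dict) -> dict:
    """Toy sketch of the streaming flow in the diagram above.

    Tool/function-call chunks become a ContentBlockStart-style event of type
    ToolUse; anything else is appended as an index-aware text part.
    All keys and event names are illustrative, not Bifrost's real types.
    """
    if chunk.get("type") == "tool_call":
        return {
            "event": "ContentBlockStart",
            "index": chunk.get("index", 0),
            "block": {"type": "ToolUse", "id": chunk.get("id"), "name": chunk.get("name")},
        }
    return {
        "event": "ContentPart",
        "index": chunk.get("index", 0),
        "block": {"type": "Text", "text": chunk.get("text", "")},
    }
```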
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ 1 check inconclusive, ✅ 2 checks passed.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- tests/integrations/tests/integrations/test_anthropic.py (4 hunks)
- tests/integrations/tests/integrations/test_google.py (1 hunks)
- tests/integrations/tests/integrations/test_openai.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (1)
collect_streaming_content(783-865)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (1)
collect_streaming_content(783-865)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
skip_if_no_api_key(1386-1397)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py
1058-1058: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
589-589: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_google.py
506-506: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Graphite / mergeability_check (×7)
🔇 Additional comments (8)
tests/integrations/tests/integrations/test_anthropic.py (3)
78-78: Timeout increase looks good. The increased timeout (30→120 seconds) for the Anthropic client is appropriate for integration tests and aligns with similar changes across OpenAI and other providers.

486-486: Good defensive check for None content. Adding the null check before evaluating length prevents a potential AttributeError if `final_response.content` is None.

564-564: Streaming timeout increases are appropriate. The extended timeouts for streaming operations are consistent with the client configuration and help prevent premature timeouts during integration tests.

Also applies to: 582-582
tests/integrations/tests/integrations/test_openai.py (3)
166-166: Timeout increase is appropriate. The increased timeout (30→120 seconds) aligns with the timeout changes across all provider integration tests and helps prevent premature timeouts.

473-473: Streaming timeout increases look good. The extended timeouts for both basic streaming and tool-based streaming are consistent with the overall timeout strategy.

Also applies to: 491-491

573-573: Transcription streaming timeout increase is reasonable. Audio transcription streaming may require additional processing time, making the 120-second timeout appropriate.
tests/integrations/tests/integrations/test_google.py (2)
505-510: Well-implemented model listing test. This test correctly includes the `@skip_if_no_api_key("google")` decorator and uses a consistent approach with the Anthropic test (limiting to 5 results and asserting exactly 5). Good work!

Note: The static analysis hint about unused `test_config` is a false positive—it's a standard pytest fixture pattern.

514-535: Well-structured helper function with proper error handling. The `extract_google_function_calls` helper follows the pattern established in other provider test files and includes appropriate defensive checks and error handling.
4fedb68 to ed5714f (force-pushed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/schemas/providers/anthropic/responses.go (3)
328-336: Bug: tool_use start uses ToolUseID instead of ID (breaks call/result association).

For content_block_start of a tool_use, the identifier is in `ContentBlock.ID`. `ToolUseID` is only present on tool_result to reference the earlier tool_use. Using `ToolUseID` here will produce nil IDs and prevent correlating tool_result with its call.

Apply this diff:

```diff
 case AnthropicContentBlockTypeToolUse:
 	// This is a function call starting - create function call message
 	item := &schemas.ResponsesMessage{
-		ID:   chunk.ContentBlock.ToolUseID,
+		ID:   chunk.ContentBlock.ID,
 		Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall),
 		ResponsesToolMessage: &schemas.ResponsesToolMessage{
-			CallID:    chunk.ContentBlock.ToolUseID,
+			CallID:    chunk.ContentBlock.ID,
 			Name:      chunk.ContentBlock.Name,
 			Arguments: schemas.Ptr(""), // Arguments will be filled by deltas
 		},
 	}
```
373-383: Use Arguments field for function-call JSON deltas (mapper mismatch).

`ToAnthropicResponsesStreamResponse` reads `bifrostResp.Arguments` for FunctionCallArgumentsDelta, but here we set `Delta`. Populate `Arguments` to keep round‑trip symmetry and avoid dropped deltas.

Apply this diff:

```diff
 case AnthropicStreamDeltaTypeInputJSON:
 	// Function call arguments delta
 	if chunk.Delta.PartialJSON != nil && *chunk.Delta.PartialJSON != "" {
 		return &schemas.BifrostResponsesStreamResponse{
 			Type:           schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDelta,
 			SequenceNumber: sequenceNumber,
 			OutputIndex:    schemas.Ptr(0),
 			ContentIndex:   chunk.Index,
-			Delta:          chunk.Delta.PartialJSON,
+			Arguments:      chunk.Delta.PartialJSON,
 		}, nil, false
 	}
```
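For context, a consumer reassembles these partial-JSON argument fragments by concatenating them and parsing once the stream completes. A minimal sketch in plain Python (independent of the Go mapper):

```python
import json

def assemble_tool_arguments(deltas):
    """Concatenate streamed partial-JSON argument deltas (the PartialJSON
    fragments above) and parse the completed document.

    Raises ValueError if the stream ended mid-object, since a truncated
    fragment sequence is not valid JSON.
    """
    buf = "".join(d for d in deltas if d)
    if not buf:
        return {}
    try:
        return json.loads(buf)
    except json.JSONDecodeError as exc:
        raise ValueError(f"incomplete tool arguments: {buf!r}") from exc
```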
318-346: Fix field mapping errors and populate Arguments field in FunctionCallArgumentsDelta events.

Two critical issues require correction:

1. Lines 329, 332 (responses.go): Within `case AnthropicContentBlockTypeToolUse`, use `chunk.ContentBlock.ID` instead of `chunk.ContentBlock.ToolUseID`. Per the type definitions, `ID` is the tool_use identifier, while `ToolUseID` is for tool_result blocks.
2. Line 381 (responses.go): When handling `FunctionCallArgumentsDelta`, populate the `Arguments` field instead of `Delta`. The struct defines `Arguments *string` for this purpose, and other parts of the codebase (mux.go line 915) correctly set `streamResp.Arguments`. Change `Delta: chunk.Delta.PartialJSON` to `Arguments: chunk.Delta.PartialJSON`.
3. Line 702 (chat.go): Same fix as #1—use `chunk.ContentBlock.ID` instead of `chunk.ContentBlock.ToolUseID` within the tool_use case.
♻️ Duplicate comments (2)

tests/integrations/tests/integrations/test_anthropic.py (1)

589-594: Missing `@skip_if_no_api_key` decorator.

This test should include the `@skip_if_no_api_key("anthropic")` decorator to gracefully skip when the API key is unavailable, consistent with all other tests in this file. Apply this diff:

```diff
+    @skip_if_no_api_key("anthropic")
     def test_14_list_models(self, anthropic_client, test_config):
```

Note: The static analysis hint about unused `test_config` is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (1)

1058-1064: Missing `@skip_if_no_api_key` decorator and inconsistent test approach.

Two issues to address:

1. Like other test methods, this should use `@skip_if_no_api_key("openai")` to gracefully skip when the API key is unavailable.
2. This test's approach differs from the Anthropic and Google equivalents:
   - Anthropic (line 592): `models.list(limit=5)` → asserts `len(response.data) == 5`
   - Google (line 517): `models.list(config={"page_size": 5})` → asserts `len(response) == 5`
   - OpenAI (line 1062): `models.list()` → asserts `len(response.data) > 0`

For consistency and determinism, align with the other providers. Apply this diff:

```diff
+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5
```

Note: The static analysis hint about unused `test_config` is a false positive—it's a standard pytest fixture pattern.
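For context, a decorator like the `skip_if_no_api_key` referenced throughout these comments can be sketched as follows. The real implementation lives in tests/utils/common.py; the `<PROVIDER>_API_KEY` env-var convention here is an assumption:

```python
import functools
import os

import pytest

def skip_if_no_api_key(provider: str):
    """Skip the wrapped test when the provider's API key env var is unset.

    Assumes keys follow the <PROVIDER>_API_KEY naming convention, which may
    differ from the project's actual helper.
    """
    env_var = f"{provider.upper()}_API_KEY"

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not os.environ.get(env_var):
                pytest.skip(f"{env_var} not set; skipping {fn.__name__}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```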
🧹 Nitpick comments (7)
tests/integrations/tests/integrations/test_openai.py (1)
458-458: Consider using proper logging instead of print.While this debug output can be helpful for troubleshooting, consider using Python's
loggingmodule for consistency with production code practices.Apply this diff if you want to use proper logging:
- print(error) + import logging + logging.debug(f"Error from invalid role test: {error}")Alternatively, if the print was added only for temporary debugging, consider removing it before merging.
core/schemas/providers/anthropic/responses.go (4)
471-507: SSE conformance: ensure index for tool_use starts and include empty content on message_start.

- content_block_start events should include an index; default to 0 if absent.
- message_start payload should include an empty content array to mirror Anthropic's schema.

Apply this diff:

```diff
 case schemas.ResponsesStreamResponseTypeOutputItemAdded:
 	// Check if this is a function call (tool use) message
 	if bifrostResp.Item != nil && bifrostResp.Item.Type != nil && *bifrostResp.Item.Type == schemas.ResponsesMessageTypeFunctionCall {
 		// Convert function call to tool_use content_block_start event
 		streamResp.Type = AnthropicStreamEventTypeContentBlockStart
-		if bifrostResp.ContentIndex != nil {
-			streamResp.Index = bifrostResp.ContentIndex
-		}
+		if bifrostResp.ContentIndex != nil {
+			streamResp.Index = bifrostResp.ContentIndex
+		} else {
+			zero := 0
+			streamResp.Index = &zero
+		}
 		contentBlock := &AnthropicContentBlock{
 			Type: AnthropicContentBlockTypeToolUse,
 		}
 		if bifrostResp.Item.ResponsesToolMessage != nil {
 			if bifrostResp.Item.ResponsesToolMessage.CallID != nil {
 				contentBlock.ID = bifrostResp.Item.ResponsesToolMessage.CallID
 			}
 			if bifrostResp.Item.ResponsesToolMessage.Name != nil {
 				contentBlock.Name = bifrostResp.Item.ResponsesToolMessage.Name
 			}
 		}
 		streamResp.ContentBlock = contentBlock
 	} else {
 		// Regular message start event
 		streamResp.Type = AnthropicStreamEventTypeMessageStart
 		if bifrostResp.Item != nil {
 			// Create message start event
 			streamMessage := &AnthropicMessageResponse{
 				Type: "message",
-				Role: string(schemas.ResponsesInputMessageRoleAssistant),
+				Role:    string(schemas.ResponsesInputMessageRoleAssistant),
+				Content: []AnthropicContentBlock{},
 			}
 			if bifrostResp.Item.ID != nil {
 				streamMessage.ID = *bifrostResp.Item.ID
 			}
 			streamResp.Message = streamMessage
 		}
 	}
```
1265-1279: Guard against nil msg.Content before dereference.

Several branches dereference `msg.Content` without a prior nil check, risking a panic for messages lacking content. Apply this diff:

```diff
-	case schemas.ResponsesMessageTypeMessage:
-		// Regular text message
-		if msg.Content.ContentStr != nil {
+	case schemas.ResponsesMessageTypeMessage:
+		// Regular text message
+		if msg.Content != nil && msg.Content.ContentStr != nil {
 			contentBlocks = append(contentBlocks, AnthropicContentBlock{
 				Type: "text",
 				Text: msg.Content.ContentStr,
 			})
-		} else if msg.Content.ContentBlocks != nil {
+		} else if msg.Content != nil && msg.Content.ContentBlocks != nil {
 			// Convert content blocks
 			for _, block := range msg.Content.ContentBlocks {
 				anthropicBlock := convertContentBlockToAnthropic(block)
 				if anthropicBlock != nil {
 					contentBlocks = append(contentBlocks, *anthropicBlock)
 				}
 			}
 		}
```
509-596: Optional: expand ContentPartAdded mapping to support images when streaming. Currently handles only text blocks. If Anthropic emits image content blocks mid‑stream, mirror them here for completeness.

416-425: Minor: include stop_reason in message_delta when known. If upstream carries stop_reason in the Bifrost event, add it to the message_delta for better parity. Safe to defer.
core/schemas/providers/gemini/chat.go (2)
497-501: Normalize non‑streaming assistant role to Gemini's 'model'.

Gemini Content.Role allows only 'user' or 'model'. Mapping 'assistant' → 'model' avoids invalid role values.

```diff
-	candidate.Content = &Content{
-		Parts: parts,
-		Role:  string(choice.ChatNonStreamResponseChoice.Message.Role),
-	}
+	role := string(choice.ChatNonStreamResponseChoice.Message.Role)
+	if role == string(schemas.ChatMessageRoleAssistant) {
+		role = "model"
+	}
+	candidate.Content = &Content{
+		Parts: parts,
+		Role:  role,
+	}
```

Please confirm Gemini consumption points expect only 'user'|'model'; adjust if there's any downstream relying on 'assistant'.
418-462: Reduce duplication: extract shared tool‑call → Part conversion.

Both branches build FunctionCall Parts from tool calls. Consider a small helper:

`toFunctionCallPart(idPtr, fnNamePtr, argsJSON string) (*Part, bool)`
This trims repeated JSON parsing and ID/name wiring and centralizes error handling.
I can draft the helper and update both branches if you want.
Also applies to: 463-501
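For illustration, the proposed helper's logic can be sketched in Python. The dict shape below stands in for Gemini's Go `Part`/`FunctionCall` types and is an assumption, not the real schema:

```python
import json

def to_function_call_part(call_id, name, args_json):
    """Sketch of the shared tool-call -> FunctionCall-part conversion the
    review proposes. Returns (part, ok); ok is False when the name is
    missing or the arguments JSON does not parse, centralizing the error
    handling that both branches currently duplicate."""
    if not name:
        return None, False
    try:
        args = json.loads(args_json) if args_json else {}
    except json.JSONDecodeError:
        return None, False
    return {"function_call": {"id": call_id, "name": name, "args": args}}, True
```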
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- core/schemas/providers/anthropic/responses.go (2 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
- tests/integrations/tests/integrations/test_google.py (3 hunks)
- tests/integrations/tests/integrations/test_openai.py (6 hunks)
- tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
- core/schemas/chatcompletions.go (2): ChatStreamResponseChoice (517-519), ChatNonStreamResponseChoice (511-514)
- core/schemas/providers/gemini/types.go (4): Role (12-12), Content (949-957), Part (982-1006), FunctionCall (1078-1088)

core/schemas/providers/anthropic/responses.go (2)
- core/schemas/providers/anthropic/types.go (5): AnthropicContentBlock (129-139), AnthropicStreamEventTypeContentBlockStart (232-232), AnthropicContentBlockTypeToolUse (123-123), AnthropicStreamEventTypeMessageStart (230-230), AnthropicMessageResponse (194-203)
- core/schemas/responses.go (3): ResponsesMessageTypeFunctionCall (268-268), ResponsesToolMessage (431-451), ResponsesInputMessageRoleAssistant (304-304)

tests/integrations/tests/integrations/test_google.py (1)
- tests/integrations/tests/utils/common.py (1): skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_openai.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_anthropic.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py
515-515: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_openai.py
1060-1060: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
590-590: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Graphite / mergeability_check (×3)
🔇 Additional comments (13)
tests/integrations/tests/integrations/test_anthropic.py (4)
78-78: LGTM: Timeout increase addresses integration test stability. The timeout increase from 30 to 120 seconds for the Anthropic client is appropriate for integration tests and aligns with the PR objective to fix timeout issues.

486-486: LGTM: Defensive null check prevents potential AttributeError. The added null check appropriately handles the case where Anthropic returns empty content when tool results are sufficient, as documented in the surrounding comment.

564-564: LGTM: Streaming timeout increases improve test reliability. The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with the PR's objective to address timeout issues in integration tests.

Also applies to: 582-582
540-550: Based on my verification, I found a critical inconsistency: the test behavior contradicts both the Bifrost codebase and the pattern across other provider integrations.

Key findings:

1. Inconsistent test patterns: While `test_anthropic.py` expects a successful response without exceptions, `test_openai.py`, `test_litellm.py`, and `test_google.py` all use `pytest.raises(Exception)` for the same `INVALID_ROLE_MESSAGES` test data.
2. No role normalization in Bifrost: Examination of the Bifrost Anthropic provider code shows direct pass-through of roles without validation or normalization: `Role: schemas.ChatMessageRole(msg.Role)` (line 54 in `core/schemas/providers/anthropic/chat.go`). No filtering, validation, or normalization logic is present.
3. Misleading comment: The comment "bifrost handles invalid roles internally" is not supported by the Bifrost codebase. The code performs no special handling for invalid roles like "tester".

The test is likely incorrect and should be aligned with the other provider tests. The Anthropic API (like other LLM APIs) rejects invalid message roles; Bifrost does not normalize them as the comment suggests.
tests/integrations/tests/integrations/test_openai.py (3)
166-166: LGTM: Timeout increase improves integration test resilience. The timeout increase from 60 to 120 seconds aligns with the PR objective to address timeout issues in integration tests.

474-474: LGTM: Streaming timeout increases address test stability. The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with similar changes across other provider tests.

Also applies to: 492-492

574-574: LGTM: Transcription streaming timeout increase is appropriate. The timeout increase from 60 to 120 seconds for transcription streaming is reasonable given that transcription operations can be more time-intensive.
tests/integrations/tests/integrations/test_google.py (4)
147-152: Excellent defensive improvements to image loading. The additions are all best practices:

- User-Agent header prevents 403 errors from bot detection
- Timeout prevents indefinite hanging
- `raise_for_status()` ensures HTTP errors are caught early

These changes improve test reliability significantly.
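Combined, the pattern looks roughly like the helper below. This is a sketch using requests; the User-Agent string and the injectable `http` parameter are illustrative, not from the test file:

```python
import requests

def fetch_image_bytes(url, timeout=30, http=requests):
    """Download image bytes defensively: browser-like User-Agent (avoids
    bot-detection 403s), hard timeout (no indefinite hangs), and
    raise_for_status() so HTTP errors fail fast instead of yielding
    garbage bytes. `http` is injectable so tests can stub the network."""
    resp = http.get(
        url,
        headers={"User-Agent": "Mozilla/5.0 (integration-test)"},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.content
```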
469-483: LGTM: Improved streaming content extraction with proper fallback. The enhanced logic properly navigates Google GenAI's nested streaming structure (candidates → content → parts → text) with a fallback to direct `chunk.text` access for compatibility. The reduced minimum content length from 10 to 5 appears to reflect observed streaming behavior.
514-519: LGTM: Well-implemented model listing test. This test correctly includes the `@skip_if_no_api_key` decorator and uses a deterministic approach with `page_size: 5` and an exact count assertion, consistent with best practices.

Note: The static analysis hint about unused `test_config` is a false positive—it's a standard pytest fixture pattern.
523-544: Clarify intent for unused helper function. The `extract_google_function_calls` helper is well-structured with proper defensive programming, but verification confirms it's not called anywhere in the codebase. Existing tests at lines 236-239 and 316-319 access `response.function_calls` directly instead. Either refactor the existing tests to use this helper for consistency, or remove it if it's not intended for future use or external consumption.
core/schemas/providers/anthropic/responses.go (1)
266-268: Good fix: ensure JSON emits [] instead of null for content. Setting `Content` to an empty slice avoids nulls and aligns better with clients expecting arrays. LGTM.
core/schemas/providers/gemini/chat.go (1)
425-453: Remove the proposed diff; the nil deref risk does not exist, but role normalization is needed only in the non-streaming path.

The review comment incorrectly identifies a nil pointer dereference risk. In the code, `toolCall.Function` is an embedded value field (`ChatAssistantMessageToolCallFunction`), not a pointer, so it cannot be nil; accessing `toolCall.Function.Arguments` and `toolCall.Function.Name` is always safe.

However, there is a legitimate issue in the non-streaming path at line 499: the code outputs `string(choice.ChatNonStreamResponseChoice.Message.Role)` directly, which will be "assistant" for `ChatMessageRoleAssistant`. Since Gemini expects "assistant" to map to "model", only line 499 needs normalization. The streaming path already defaults to "model" and does not require the suggested guard. The `json.Unmarshal` errors being silently ignored are a minor quality issue (not critical for streaming partial JSON).
Likely an incorrect or invalid review comment.
ed5714f to 21dd884 (force-pushed)
8618bf8 to 40d6d8f (force-pushed)
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/integrations/tests/integrations/test_openai.py (1)
1059-1064: Missing decorator and non-deterministic assertion.

Two issues need addressing:

1. Missing `@skip_if_no_api_key("openai")` decorator: All other tests in this suite use this decorator to gracefully skip when the API key is unavailable. This test should follow the same pattern.
2. Non-deterministic assertion: The test uses `assert len(response.data) > 0`, which makes the test non-deterministic. For consistency with the Anthropic (lines 592-594) and Google (lines 517-519) tests, use a fixed limit for deterministic validation.

Apply this diff:

```diff
+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5
```

Note: The OpenAI Python SDK does support the `limit` parameter for `models.list()` (default 20, valid range 1-100). The static analysis warning about unused `test_config` is a false positive—it's a standard pytest fixture pattern.
🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider using structured logging instead of print statements.

While printing the error aids debugging, consider using Python's `logging` module for better control and formatting in test output.

```diff
-        print(error)
+        import logging
+        logging.debug(f"Error in test_12_error_handling_invalid_roles: {error}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- core/schemas/providers/anthropic/responses.go (2 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
- tests/integrations/tests/integrations/test_google.py (3 hunks)
- tests/integrations/tests/integrations/test_openai.py (6 hunks)
- tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
- core/schemas/chatcompletions.go (2): ChatStreamResponseChoice (517-519), ChatNonStreamResponseChoice (511-514)
- core/schemas/providers/gemini/types.go (4): Role (12-12), Content (949-957), Part (982-1006), FunctionCall (1078-1088)

core/schemas/providers/anthropic/responses.go (2)
- core/schemas/providers/anthropic/types.go (5): AnthropicContentBlock (129-139), AnthropicStreamEventTypeContentBlockStart (232-232), AnthropicContentBlockTypeToolUse (123-123), AnthropicStreamEventTypeMessageStart (230-230), AnthropicMessageResponse (194-203)
- core/schemas/responses.go (2): ResponsesMessageTypeFunctionCall (268-268), ResponsesToolMessage (431-451)

tests/integrations/tests/integrations/test_openai.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_google.py (1)
- tests/integrations/tests/utils/common.py (1): skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_anthropic.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py
1060-1060: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_google.py
515-515: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
590-590: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Graphite / mergeability_check (×7)
🔇 Additional comments (12)
core/schemas/providers/anthropic/responses.go (1)

266-268: LGTM: Consistent empty slice initialization. Setting `Content` to an empty slice instead of leaving it nil ensures consistent JSON serialization (empty array `[]` instead of `null`). This is a good practice for API responses.

tests/integrations/tests/integrations/test_anthropic.py (5)
78-78: LGTM! Timeout increase addresses integration test reliability. The 4x timeout increase aligns with the PR's goal of fixing integration test timeout issues and is consistently applied across all provider tests.

486-491: LGTM! Proper null-safety for known Anthropic behavior. The null check correctly handles cases where Anthropic returns empty content after tool results, which is documented as valid behavior.

540-550: LGTM! Test correctly reflects bifrost's error handling behavior. The test now validates that bifrost handles invalid roles internally rather than propagating errors to the provider. This is a significant behavioral change from the original test expectation but aligns with bifrost's role as a translation/normalization layer.

564-564: LGTM! Streaming timeout increases improve test reliability. The 4x timeout increase for streaming operations is consistent with the broader timeout adjustments in this PR and should reduce flaky test failures.

Also applies to: 582-582
589-594: LGTM! Model listing test is well-structured and deterministic. The test correctly uses the `@skip_if_no_api_key` decorator and requests a fixed number of models (5) for deterministic validation, consistent with the Google provider's test approach.

Note: The static analysis warning about unused `test_config` is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (2)
166-166: LGTM! Timeout increase improves test reliability. Consistent with the timeout adjustments across all provider tests in this PR.

474-474: LGTM! Streaming timeout increases address test flakiness. The timeout increases across all streaming scenarios (chat, tools, transcription) should improve test reliability for slower network conditions or provider throttling.

Also applies to: 492-492, 574-574
tests/integrations/tests/integrations/test_google.py (4)
147-152: LGTM! Robust image loading with proper error handling. The improvements add three important safety measures:

- User-Agent header prevents 403 errors from servers that block default clients
- 30-second timeout prevents indefinite hangs
- `raise_for_status()` ensures HTTP errors are caught early
469-479: LGTM! More robust streaming text extraction. The improved extraction logic properly traverses the Google GenAI response structure (candidates → content → parts → text) with defensive attribute checks, while maintaining a fallback for compatibility. This should handle edge cases better than the previous implementation.
483-483: Relaxed content threshold—verify this is appropriate. The minimum content length was reduced from 10 to 5 characters. While this makes the test more tolerant of short responses, 5 characters is quite minimal. Ensure this threshold still validates meaningful streaming content and won't pass on truncated or error responses.
514-519: LGTM! Deterministic model listing test with proper decorator. The test correctly uses the `@skip_if_no_api_key` decorator and requests a fixed page size for deterministic validation, consistent with the Anthropic test approach. The Google SDK uses `config={"page_size": 5}` syntax, which is appropriate for this provider.

Note: The static analysis warning about unused `test_config` is a false positive—it's a standard pytest fixture pattern.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)
458-458: Consider removing debug print statement.

The `print(error)` statement appears to be for debugging purposes. While helpful during development, consider removing it or replacing it with proper logging for production test code. Apply this diff if you want to remove the debug statement:

```diff
     # Verify the error is properly caught and contains role-related information
     error = exc_info.value
-    print(error)
     assert_valid_error_response(error, "tester")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- core/schemas/providers/anthropic/responses.go (2 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
- tests/integrations/tests/integrations/test_google.py (3 hunks)
- tests/integrations/tests/integrations/test_openai.py (6 hunks)
- tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (3)
- core/schemas/chatcompletions.go (2): ChatStreamResponseChoice (517-519), ChatNonStreamResponseChoice (511-514)
- core/schemas/providers/gemini/types.go (4): Role (12-12), Content (949-957), Part (982-1006), FunctionCall (1078-1088)
- ui/lib/types/logs.ts (1): Function (139-144)

core/schemas/providers/anthropic/responses.go (2)
- core/schemas/providers/anthropic/types.go (5): AnthropicContentBlock (129-139), AnthropicStreamEventTypeContentBlockStart (232-232), AnthropicContentBlockTypeToolUse (123-123), AnthropicStreamEventTypeMessageStart (230-230), AnthropicMessageResponse (194-203)
- core/schemas/responses.go (2): ResponsesMessageTypeFunctionCall (268-268), ResponsesToolMessage (431-451)

tests/integrations/tests/integrations/test_google.py (1)
- tests/integrations/tests/utils/common.py (1): skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_anthropic.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)

tests/integrations/tests/integrations/test_openai.py (1)
- tests/integrations/tests/utils/common.py (2): collect_streaming_content (763-845), skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py
515-515: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
590-590: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_openai.py
1060-1060: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Graphite / mergeability_check (×16)
🔇 Additional comments (14)
tests/integrations/tests/integrations/test_anthropic.py (5)
78-78: LGTM! Timeout increase addresses integration test stability. The timeout increase from 30 to 120 seconds aligns with the PR objectives to fix integration test timeout issues.

486-491: Good defensive null check. This properly handles the case where Anthropic returns an empty content array when a tool result is sufficient, preventing potential AttributeErrors.

564-564: LGTM! Extended streaming timeouts. The timeout increases from 30 to 120 seconds for streaming operations align with the PR objectives and ensure tests don't fail prematurely on slower network connections or API responses.
Also applies to: 582-582
589-594: LGTM! Model listing test added with proper decorator. The test correctly uses the `@skip_if_no_api_key` decorator and validates that exactly 5 models are returned when limited to 5. This aligns with similar tests in other provider integrations.
540-550: Test behavior is inconsistent with other provider tests and the claim about Bifrost is unverified. The Anthropic test_12 does not follow the established pattern used across OpenAI, Google, and LiteLLM integrations. While all other provider tests expect exceptions for invalid roles and call `assert_valid_error_response()`, the Anthropic test expects a successful response without validation. Additionally:

- The test does not use `convert_to_anthropic_messages()` like other tests in the same file
- The comment claims "bifrost handles invalid roles internally" but this is not verified against Bifrost's actual implementation
- The same `INVALID_ROLE_MESSAGES` fixture with role="tester" is expected to raise exceptions in all other providers

Either verify that Bifrost's Anthropic integration has documented, provider-specific role handling that differs from OpenAI/Google/LiteLLM, or align the test with the established error-handling pattern used elsewhere.
tests/integrations/tests/integrations/test_openai.py (3)
166-166: LGTM! Timeout increase improves test reliability. The timeout increase from 30 to 120 seconds addresses timeout issues mentioned in the PR objectives.

474-474: LGTM! Extended streaming timeouts. The timeout increases from 30 to 120 seconds for streaming operations are consistent with changes in other provider tests and address the PR's timeout-related objectives.

Also applies to: 492-492, 574-574
1059-1064: Missing decorator and inconsistent test pattern. This test has two issues:

1. Missing `@skip_if_no_api_key("openai")` decorator: unlike all other test methods in this file, this test lacks the decorator to skip when the API key is unavailable.
2. Inconsistent assertion pattern: this test uses `assert len(response.data) > 0` while the equivalent tests in other providers use deterministic assertions with a fixed limit:
   - Anthropic (line 592): `models.list(limit=5)` → `assert len(response.data) == 5`
   - Google (line 517): `models.list(config={"page_size": 5})` → `assert len(response) == 5`

The past review thread indicates you mentioned "list is not supported in openai sdk," but the OpenAI Python SDK does support the `limit` parameter for `models.list()`.

Apply this diff to align with other provider tests:

```diff
+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5
```

Note: The static analysis hint about unused `test_config` is a false positive; it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.
core/schemas/providers/anthropic/responses.go (2)
266-268: Good defensive initialization. Setting `Content` to an empty slice instead of leaving it nil prevents potential nil pointer dereferences in downstream code that expects a slice.
471-507: LGTM! Improved function call handling in streaming responses. The code now correctly differentiates between function call messages and regular messages in the `OutputItemAdded` event:

- Function calls emit a `ContentBlockStart` event with a `ToolUse` content block
- Regular messages emit a `MessageStart` event with an assistant role

This aligns with Anthropic's streaming event model and ensures proper handling of tool-use blocks.
tests/integrations/tests/integrations/test_google.py (4)
147-152: Excellent improvements to image loading reliability. The additions address common issues when fetching remote images:

- A User-Agent header prevents 403 errors from servers like Wikipedia
- `timeout=30` prevents indefinite hangs
- `raise_for_status()` ensures bad HTTP responses are caught early
469-483: LGTM! Robust streaming content extraction. The updated parsing correctly navigates Google GenAI's nested streaming structure (candidates -> content -> parts -> text) with a compatibility fallback to `chunk.text`. The reduced minimum content length assertion (>5 vs >10) is more realistic for streaming scenarios.
514-519: LGTM! Model listing test properly implemented. The test correctly includes the `@skip_if_no_api_key` decorator and uses `page_size=5` to request exactly 5 models, then asserts the count matches. This is consistent with the Anthropic provider's test pattern.
523-544: LGTM! Helper function with proper error handling. The new `extract_google_function_calls` helper safely extracts function call metadata with:

- Proper type checking using `hasattr`
- Exception handling for `AttributeError` and `TypeError`
- Warning messages for debugging
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)
486-491: Fix critical logic error in tool call argument parsing. Both the `if` and `else` branches set `argsMap` to an empty map, which discards successfully parsed arguments. This breaks tool call argument handling for non-streaming responses. Apply this diff to fix the logic:

```diff
 argsMap := make(map[string]interface{})
 if toolCall.Function.Arguments != "" {
 	if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+		// Keep argsMap empty on error
 		argsMap = map[string]interface{}{}
-	} else {
-		argsMap = map[string]interface{}{}
 	}
 }
```
♻️ Duplicate comments (1)
core/schemas/providers/gemini/chat.go (1)
443-460: Handle unmarshaling errors for tool call arguments. This issue was already identified in previous reviews. Line 447 calls `json.Unmarshal` without checking the error, which can silently produce an empty `argsMap` if the JSON is malformed.
🧹 Nitpick comments (4)
tests/integrations/tests/integrations/test_google.py (3)
515-515: Remove unused parameter. The `test_config` parameter is not used in this test method. As per static analysis hints. Apply this diff:

```diff
-    def test_15_list_models(self, google_client, test_config):
+    def test_15_list_models(self, google_client):
```
517-519: Assertion may be too strict. The assertion `assert len(response) == 5` assumes the API will return exactly 5 models. If fewer models are available or if the API behavior changes, this test will fail. Consider using `assert len(response) <= 5` or `assert len(response) > 0` instead. Apply this diff:

```diff
         response = google_client.models.list(config={"page_size": 5})
         assert response is not None
-        assert len(response) == 5
+        assert len(response) > 0, "Should return at least one model"
+        assert len(response) <= 5, "Should not exceed requested page_size"
```
523-544: Well-implemented helper function with good error handling. The function correctly extracts function calls with proper type checking and tolerant error handling. The defensive programming approach is appropriate for parsing potentially variable response structures.

One minor suggestion: consider using Python's `logging` module instead of `print`. Optional improvement:

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 def extract_google_function_calls(response: Any) -> List[Dict[str, Any]]:
     """Extract function calls from Google GenAI response format with proper type checking"""
     function_calls = []

     # Type check for Google GenAI response
     if not hasattr(response, "function_calls") or not response.function_calls:
         return function_calls

     for fc in response.function_calls:
         if hasattr(fc, "name") and hasattr(fc, "args"):
             try:
                 function_calls.append(
                     {
                         "name": fc.name,
                         "arguments": dict(fc.args) if fc.args else {},
                     }
                 )
             except (AttributeError, TypeError) as e:
-                print(f"Warning: Failed to extract Google function call: {e}")
+                logger.warning(f"Failed to extract Google function call: {e}")
                 continue

     return function_calls
```

tests/integrations/tests/integrations/test_openai.py (1)
458-458: Remove debug print statement. This debug print statement should be removed or replaced with proper logging using pytest's built-in output capture. Apply this diff to remove the debug statement:

```diff
-        print(error)
         assert_valid_error_response(error, "tester")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `core/schemas/providers/anthropic/responses.go` (2 hunks)
- `core/schemas/providers/gemini/chat.go` (1 hunks)
- `tests/integrations/tests/integrations/test_anthropic.py` (5 hunks)
- `tests/integrations/tests/integrations/test_google.py` (3 hunks)
- `tests/integrations/tests/integrations/test_openai.py` (6 hunks)
- `tests/integrations/tests/utils/common.py` (0 hunks)
💤 Files with no reviewable changes (1)
- tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/integrations/tests/integrations/test_google.py (1)
- `tests/integrations/tests/utils/common.py` (1): `skip_if_no_api_key` (1366-1377)

core/schemas/providers/gemini/chat.go (2)
- `core/schemas/chatcompletions.go` (2): `ChatStreamResponseChoice` (518-520), `ChatNonStreamResponseChoice` (512-515)
- `core/schemas/providers/gemini/types.go` (4): `Role` (9-9), `Content` (868-876), `Part` (882-906), `FunctionCall` (978-988)

tests/integrations/tests/integrations/test_openai.py (1)
- `tests/integrations/tests/utils/common.py` (2): `collect_streaming_content` (763-845), `skip_if_no_api_key` (1366-1377)

tests/integrations/tests/integrations/test_anthropic.py (1)
- `tests/integrations/tests/utils/common.py` (2): `collect_streaming_content` (763-845), `skip_if_no_api_key` (1366-1377)

core/schemas/providers/anthropic/responses.go (2)
- `core/schemas/providers/anthropic/types.go` (5): `AnthropicContentBlock` (129-139), `AnthropicStreamEventTypeContentBlockStart` (232-232), `AnthropicContentBlockTypeToolUse` (123-123), `AnthropicStreamEventTypeMessageStart` (230-230), `AnthropicMessageResponse` (194-203)
- `core/schemas/responses.go` (3): `ResponsesMessageTypeFunctionCall` (272-272), `ResponsesToolMessage` (435-455), `ResponsesInputMessageRoleAssistant` (308-308)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py
515-515: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_openai.py
1060-1060: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
590-590: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
- GitHub Check: Graphite / mergeability_check (×34)
🔇 Additional comments (12)
tests/integrations/tests/integrations/test_google.py (2)
469-479: Well-implemented streaming content extraction. The enhanced parsing logic correctly handles Google GenAI's nested streaming response structure (candidates → content → parts → text) with proper attribute checks. The fallback to `chunk.text` maintains backward compatibility.
147-152: The review comment misinterprets the timeout changes. The codebase appropriately uses different timeout values for different operation types: `timeout=120` is used for streaming content collection in `test_openai.py` and `test_anthropic.py`, while `timeout=30` is used for simple HTTP GET requests like image fetching in `load_image_from_url`. A 30-second timeout for a direct image fetch is reasonable and consistent with the PR's objectives of increasing timeouts specifically for longer-running streaming operations. The code change does not require adjustment.

Likely an incorrect or invalid review comment.
tests/integrations/tests/integrations/test_openai.py (4)
166-166: Timeout increase looks good. The increase from 30 to 120 seconds is appropriate for integration tests and consistent with the PR's objective to address timeout issues.

474-474: Streaming timeout increases look good. The increased timeouts for streaming operations are appropriate for integration tests and align with the PR's objective to address timeout issues.

Also applies to: 492-492

574-574: Transcription streaming timeout increase looks good. The increased timeout is appropriate for transcription operations which can take longer than chat completions.
1058-1064: Critical: Missing decorator and inconsistent test approach. Despite the previous review feedback and your comment about adding the skip decorator, the `@skip_if_no_api_key("openai")` decorator is still missing from this test method. This will cause the test to fail with an error (rather than skip gracefully) when the API key is unavailable in CI/CD environments.

Additionally, contrary to your comment that "list is not supported in openai sdk," the OpenAI Python SDK does support the `limit` parameter for `models.list()`. The previous review included web search results confirming this, showing that the method accepts a `limit` parameter with a valid range of 1–100 (default 20).

For consistency with the Anthropic test (line 591: `models.list(limit=5)`) and Google test (line 508: `models.list(config={"page_size": 5})`), this test should use a deterministic assertion rather than just checking `> 0`.

Apply this diff to add the decorator and align with other provider tests:

```diff
+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5
```

This ensures:
- The test is skipped when no API key is available (consistent with all other tests)
- The test is deterministic by requesting exactly 5 models (consistent with test_14_list_models for Anthropic and test_15_list_models for Google)

Note: The static analysis hint about unused `test_config` is a false positive; it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.
tests/integrations/tests/integrations/test_anthropic.py (4)
78-78: LGTM! Timeout increase addresses integration test stability. The timeout increase from 30 to 120 seconds aligns with the PR objectives to address timeout issues in integration tests. This is consistently applied across streaming tests as well.

486-491: LGTM! Defensive null check for Anthropic response content. The added null check properly handles cases where Anthropic returns empty content when the tool result is sufficient, preventing potential AttributeError exceptions.

540-550: LGTM! Test updated to reflect bifrost's internal error handling. The test correctly validates that bifrost handles invalid roles internally without raising exceptions, which is the expected behavior per the inline comment.
589-623: LGTM! Comprehensive model listing test with pagination. The new test properly validates model listing functionality with pagination parameters (`after_id`, `before_id`). The test includes appropriate guards for pagination edge cases and follows the established patterns in the test suite.

Note: The static analysis warning about unused `test_config` is a false positive; it's a standard pytest fixture pattern.

core/schemas/providers/anthropic/responses.go (2)
266-268: LGTM! Defensive initialization ensures consistent response format. Setting `Content` to an empty slice when no content blocks exist ensures consistent JSON serialization (empty array vs null), which is good practice for API responses.

471-507: LGTM! Proper differentiation of function calls from regular messages. The refactored logic correctly handles function calls by emitting `ContentBlockStart` events with tool_use type, while regular messages continue to emit `MessageStart` events. The extraction of tool message details (CallID, Name) is handled appropriately.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)
487-491: Critical: Tool call arguments are always empty (non-streaming path). Lines 487-491 contain inverted logic that discards parsed tool call arguments. Both the error and success branches set `argsMap` to an empty map, meaning:

- If unmarshaling succeeds, the parsed data is immediately overwritten with an empty map
- If unmarshaling fails, an empty map is used (and the error is silently ignored)

This breaks all tool/function calling in non-streaming responses.

Apply this diff to fix the logic:

```diff
 argsMap := make(map[string]interface{})
 if toolCall.Function.Arguments != "" {
-	if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
-		argsMap = map[string]interface{}{}
-	} else {
-		argsMap = map[string]interface{}{}
-	}
+	if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+		// Log the error or handle appropriately
+		// argsMap remains empty on error
+	}
+	// On success, argsMap already contains the parsed data
 }
```
♻️ Duplicate comments (2)
core/schemas/providers/gemini/chat.go (1)
447-447: Missing error handling for JSON unmarshaling (streaming path). This issue was already identified in previous reviews. Line 447 calls `json.Unmarshal` without checking the returned error. If `toolCall.Function.Arguments` contains malformed JSON, the error is silently ignored and `argsMap` remains empty, potentially passing incorrect tool call arguments downstream.
1059-1064: Make model listing deterministic; request exactly one page. Use a page size to avoid environment-dependent counts and align with Anthropic/Google tests. OpenAI list methods support a `limit` parameter for pagination; using it here makes the test stable. (github.com)

```diff
-    response = openai_client.models.list()
-    assert response.data is not None
-    assert len(response.data) > 0
+    response = openai_client.models.list(limit=5)
+    assert response.data is not None
+    assert 0 < len(response.data) <= 5
```
🧹 Nitpick comments (2)
tests/integrations/tests/integrations/test_openai.py (1)
456-460: Avoid noisy prints in tests.Use assertion messages or logging instead of
print(error)to keep CI output clean.- print(error) + # Optionally log at debug if needed: + # import logging; logging.getLogger(__name__).debug("OpenAI error: %s", error)tests/integrations/tests/integrations/test_google.py (1)
515-519: Make the page-size assertion tolerant. `models.list(config={"page_size": 5})` is correct, but some environments may return fewer than 5 items. Prefer `<= 5` and `> 0` to avoid flakiness. (github.com)

```diff
-    assert response is not None
-    assert len(response) == 5
+    assert response is not None
+    assert 0 < len(response) <= 5
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `core/schemas/providers/anthropic/responses.go` (2 hunks)
- `core/schemas/providers/gemini/chat.go` (1 hunks)
- `tests/integrations/tests/integrations/test_anthropic.py` (5 hunks)
- `tests/integrations/tests/integrations/test_google.py` (3 hunks)
- `tests/integrations/tests/integrations/test_openai.py` (6 hunks)
- `tests/integrations/tests/utils/common.py` (0 hunks)
💤 Files with no reviewable changes (1)
- tests/integrations/tests/utils/common.py
🚧 Files skipped from review as they are similar to previous changes (1)
- core/schemas/providers/anthropic/responses.go
🧰 Additional context used
🧬 Code graph analysis (4)
tests/integrations/tests/integrations/test_google.py (1)
- `tests/integrations/tests/utils/common.py` (1): `skip_if_no_api_key` (1366-1377)

tests/integrations/tests/integrations/test_anthropic.py (1)
- `tests/integrations/tests/utils/common.py` (2): `collect_streaming_content` (763-845), `skip_if_no_api_key` (1366-1377)

core/schemas/providers/gemini/chat.go (2)
- `core/schemas/chatcompletions.go` (2): `ChatStreamResponseChoice` (518-520), `ChatNonStreamResponseChoice` (512-515)
- `core/schemas/providers/gemini/types.go` (4): `Role` (9-9), `Content` (868-876), `Part` (882-906), `FunctionCall` (978-988)

tests/integrations/tests/integrations/test_openai.py (1)
- `tests/integrations/tests/utils/common.py` (2): `collect_streaming_content` (763-845), `skip_if_no_api_key` (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py
515-515: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_anthropic.py
590-590: Unused method argument: test_config
(ARG002)
tests/integrations/tests/integrations/test_openai.py
1060-1060: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (42)
- GitHub Check: Graphite / mergeability_check (×42)
🔇 Additional comments (9)
tests/integrations/tests/integrations/test_openai.py (3)
163-168: LGTM: increased client timeout to 120s. Matches the PR's reliability goal and uses the config default cleanly.

472-476: LGTM: extend streaming collector timeouts to 120s. Reduces test flakiness on slower runs.

Also applies to: 490-494

571-576: LGTM: transcription streaming timeout to 120s.

tests/integrations/tests/integrations/test_anthropic.py (4)

75-81: LGTM: increased client timeout to 120s.

484-491: Good defensive check on empty content. Prevents None/empty content access during tool-result-only paths.

561-566: LGTM: extend streaming collector timeouts to 120s. Also applies to: 581-584

589-624: Nice, thorough pagination test. Covers limit, paging forward/backward, and pagination metadata. Tolerant `<=` avoids flakiness.

tests/integrations/tests/integrations/test_google.py (2)
147-153: LGTM: hardened image fetch (UA, 30s timeout, raise_for_status).
467-484: LGTM: robust streaming parsing and assertion. Traverses candidates→content→parts safely; relaxed length avoids brittle failures.

Summary
Increased API timeout values and added model listing tests for Anthropic, Google, and OpenAI integrations.
Changes
- Added `test_14_list_models` for Anthropic
- Added `test_15_list_models` for Google
- Added `test_31_list_models` for OpenAI

Type of change
Affected areas
How to test
Run the integration tests for the affected providers:
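All three new list-model tests share the same shape; below is a minimal self-contained sketch using a hypothetical stand-in client (the real tests use the provider SDKs and the `skip_if_no_api_key` decorator):

```python
from types import SimpleNamespace

def check_list_models(client, limit=5):
    """Request a bounded page of models and validate the count, as the new tests do."""
    response = client.models.list(limit=limit)
    assert response.data is not None
    assert 0 < len(response.data) <= limit
    return list(response.data)

# Stand-in client for illustration only; not a real provider SDK
class _FakeModels:
    def list(self, limit):
        return SimpleNamespace(data=[f"model-{i}" for i in range(limit)])

fake_client = SimpleNamespace(models=_FakeModels())
```

The bounded `0 < len <= limit` check is the tolerant variant suggested in review; the merged tests assert an exact count of 5 where the provider guarantees it.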
Breaking changes
Related issues
Addresses timeout issues with API calls and adds test coverage for model listing functionality.
Security considerations
No security implications.
Checklist