fix: support image input with sealion v4 #808
Conversation
📝 Walkthrough
The PR updates the LLMSpec for LargeLanguageModels.sea_lion_v4_gemma_3_27b_it in daras_ai_v2/language_model.py. Specifically, it sets is_vision_model=True and supports_json=True, and omits the is_thinking_model argument (which implicitly defaults to False). All other fields (label, model_id, llm_api, context_window, max_output_tokens) remain unchanged.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
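A rough sketch of what the updated spec entry might look like; the field names follow the walkthrough above, but the literal values for label, llm_api, context_window, and max_output_tokens are placeholders, not the actual code:

```python
sea_lion_v4_gemma_3_27b_it = LLMSpec(
    label="SEA-LION v4 (Gemma 3 27B IT)",              # unchanged (placeholder text)
    model_id="aisingapore/Gemma-SEA-LION-v4-27B-IT",   # unchanged
    llm_api=LLMApis.openai_compatible,                 # unchanged (placeholder name)
    context_window=128_000,                            # unchanged (placeholder value)
    max_output_tokens=8_192,                           # unchanged (placeholder value)
    is_vision_model=True,   # new in this PR
    supports_json=True,     # new in this PR
    # is_thinking_model is omitted and defaults to False
)
```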
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
daras_ai_v2/language_model.py (1)
1061-1068: pop_entry_images leaves invalid parts; remove the image parts entirely. Currently it pops the image_url key but keeps {"type": "image_url"} parts, producing invalid payloads for non-vision models.
Apply:
```diff
 def pop_entry_images(entry: ConversationEntry) -> list[str]:
     contents = entry.get("content") or ""
     if isinstance(contents, str):
         return []
-    return list(
-        filter(None, (part.pop("image_url", {}).get("url") for part in contents)),
-    )
+    removed: list[str] = []
+    new_parts = []
+    for part in contents:
+        url = (part.get("image_url") or {}).get("url")
+        if part.get("type") == "image_url" and url:
+            removed.append(url)
+            # drop the whole image part
+            continue
+        new_parts.append(part)
+    entry["content"] = new_parts
+    return removed
```
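For illustration, with the revised function an image part is removed from the entry in place instead of being left as an empty {"type": "image_url"} stub (example data only):

```python
entry = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
    ],
}

urls = pop_entry_images(entry)
# urls == ["https://example.com/cat.jpg"]
# entry["content"] == [{"type": "text", "text": "What is in this picture?"}]
```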
🧹 Nitpick comments (2)
daras_ai_v2/language_model.py (2)
2417-2433: Don't echo raw image URLs into the text content. Leaking image URLs into the assistant-visible text is redundant and a privacy risk; the image_url parts already carry them.
Apply:
```diff
 def format_chat_entry(
 @@
-    parts = []
-    if input_images:
-        parts.append(f"Image URLs: {', '.join(input_images)}")
+    parts = []
 @@
-    parts.append(content_text)
+    parts.append(content_text)
 @@
-    if input_images:
+    if input_images:
         content = [
             {"type": "image_url", "image_url": {"url": url}}
             for url in input_images
         ] + [
             {"type": "text", "text": content},
         ]
```
Optional: if you want to keep a hint, embed alt text instead of URLs.
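For illustration, after this change an entry for a vision model carries the image only as structured parts, roughly (example data only):

```python
content = [
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
    {"type": "text", "text": "What is in this picture?"},
]
```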
1666-1668: Optional fallback if SEA-LION rejects response_format. If the SEA-LION endpoint 400s on response_format, consider catching that and retrying without JSON to avoid hard failures for callers that request JSON.
I can draft a small wrapper that inspects OpenAIError (status 400, message contains "response_format") and transparently retries without response_format.
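A minimal sketch of such a wrapper, assuming the openai v1 client; the function name and error matching are illustrative, not existing code in language_model.py:

```python
import openai


def chat_completion_with_json_fallback(client: openai.OpenAI, **kwargs):
    """Create a chat completion; if the server rejects response_format, retry without it."""
    try:
        return client.chat.completions.create(**kwargs)
    except openai.BadRequestError as e:
        # fall back only when the 400 is specifically about response_format
        if kwargs.get("response_format") and "response_format" in str(e):
            kwargs.pop("response_format")
            return client.chat.completions.create(**kwargs)
        raise
```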
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
daras_ai_v2/language_model.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-12T08:22:19.003Z
Learnt from: nikochiko
PR: GooeyAI/gooey-server#768
File: daras_ai_v2/language_model.py:124-126
Timestamp: 2025-08-12T08:22:19.003Z
Learning: GPT-5 Chat (gpt-5-chat-latest) has different token limits than other GPT-5 models: 128,000 context window and 16,384 max output tokens, as confirmed by Azure documentation and actual API behavior, despite general GPT-5 documentation suggesting higher limits.
Applied to files:
daras_ai_v2/language_model.py
📚 Learning: 2025-08-12T08:22:19.003Z
Learnt from: nikochiko
PR: GooeyAI/gooey-server#768
File: daras_ai_v2/language_model.py:124-126
Timestamp: 2025-08-12T08:22:19.003Z
Learning: GPT-5 Chat (gpt-5-chat-latest) has specific limits of 128,000 context window and 16,384 max output tokens according to Azure documentation and confirmed by API testing, which differ from the general GPT-5 documentation that mentions larger limits for other GPT-5 models.
Applied to files:
daras_ai_v2/language_model.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (1)
daras_ai_v2/language_model.py (1)
99-101: Vision + JSON enablement for SEA-LION v4; manual API smoke test required. The smoke test failed locally because SEA_LION_API_KEY was not set. Run the curl from the original comment with SEA_LION_API_KEY set and attach the response (or otherwise confirm) that the OpenAI-compatible SEA-LION endpoint accepts message content with image_url parts and response_format={"type": "json_object"} for aisingapore/Gemma-SEA-LION-v4-27B-IT.
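A minimal sketch of such a smoke test; the endpoint URL, prompt, and image URL are assumptions, not the exact curl from the earlier comment:

```sh
curl -s https://api.sea-lion.ai/v1/chat/completions \
  -H "Authorization: Bearer $SEA_LION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
    "response_format": {"type": "json_object"},
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image as JSON with keys \"objects\" and \"colors\"."},
          {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
        ]
      }
    ]
  }'
```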
Q/A checklist
How to check import time?
You can visualize this using tuna:
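For example (module and log-file names are placeholders):

```sh
python3 -X importtime -c 'import daras_ai_v2.language_model' 2> import.log && tuna import.log
```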
To measure import time for a specific library:
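For example, timing a single import (pandas is just an example library):

```sh
time python3 -c 'import pandas'
```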
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
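For example, a minimal sketch (not code from this repo):

```python
def load_sales_report(path: str):
    # pandas is imported lazily so importing this module stays fast
    # when the function is never called
    import pandas as pd

    return pd.read_csv(path)
```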
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.