feat: add support for SwissAI Apertus LLM #800
Conversation
📝 Walkthrough
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks (2 passed, 1 warning)
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (4)
daras_ai_v2/settings.py (1)
514-514: PUBLICAI_API_KEY added — please document and validate missing-key behavior
Looks good. Please add PUBLICAI_API_KEY to your env samples/ops docs and confirm that selecting a swiss-ai/* model without this key surfaces a clear error to the user (not just a 401 from the provider).
I can open a docs patch and add a startup check that warns if PUBLICAI_API_KEY is empty.
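For reference, a minimal sketch of what that startup check could look like (assumes Django settings and stdlib logging; the function name and where it gets called from are placeholders, not part of this PR):

```python
import logging
import warnings

from django.conf import settings

logger = logging.getLogger(__name__)


def warn_if_publicai_key_missing() -> None:
    # Hypothetical startup hook: surface a clear warning up front
    # instead of an opaque 401 from the provider at request time.
    if not getattr(settings, "PUBLICAI_API_KEY", ""):
        msg = "PUBLICAI_API_KEY is not set; swiss-ai/* (Apertus) models will fail at request time."
        warnings.warn(msg, RuntimeWarning)
        logger.warning(msg)
```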
daras_ai_v2/language_model.py (3)
83-91: New model spec: verify real token limits and JSON/tool support
Spec wiring looks correct. Per your past guidance, please confirm context_window=65,536 and max_output_tokens=4,096 from actual API behavior (not docs), and whether JSON mode is supported. If JSON is supported via PublicAI, set supports_json=True to enable response_format.
I can add an integration check that probes max tokens and JSON mode and auto-adjusts the spec.
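A rough sketch of such a probe, using the OpenAI SDK against the endpoint added in this PR (the model id, the 4,096 output limit, and JSON-mode support are assumptions to verify, not confirmed behavior):

```python
import os

from openai import OpenAI

# Assumed OpenAI-compatible endpoint and model id from this PR
client = OpenAI(base_url="https://api.publicai.co/v1", api_key=os.environ["PUBLICAI_API_KEY"])
MODEL = "swiss-ai/apertus-70b-instruct"

# Probe the advertised output limit: an error here suggests max_output_tokens is set too high
resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hi."}],
    max_tokens=4096,
)
print("usage:", resp.usage)

# Probe JSON mode: a rejection here means supports_json should stay False
try:
    client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": 'Reply with {"ok": true} as JSON.'}],
        response_format={"type": "json_object"},
    )
    print("JSON mode: accepted")
except Exception as exc:
    print("JSON mode: rejected:", exc)
```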
1656-1659: Disable tools with a visible warning (don’t fail silently)
Silently nulling tools can confuse callers. Log a warning when tools are provided, so it’s obvious they were ignored for this model.
Apply within this hunk:
```diff
 if model == LargeLanguageModels.apertus_70b_instruct:
-    # Swiss AI Apertus model doesn't support tool calling
-    tools = None
+    # Swiss AI Apertus model doesn't support tool calling
+    if tools:
+        logger.warning("Tools are not supported for %s; disabling tool calls.", model.name)
+    tools = None
```
1934-1941: PublicAI routing: double-check base_url and feature parity (streaming, usage, penalties)
Client wiring looks fine. Please verify:
- base_url "https://api.publicai.co/v1" is correct for OpenAI-compatible chat completions (and supports streaming).
- Usage tokens are returned in OpenAI shape (so record_openai_llm_usage works).
- Frequency/presence penalties are accepted or safely ignored.
I can add a small probe script to hit chat.completions with stream=True and confirm response headers/usage.
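A sketch of that probe (the model id and the stream_options flag are assumptions; the provider may ignore or reject either, which is exactly what the probe would reveal):

```python
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.publicai.co/v1", api_key=os.environ["PUBLICAI_API_KEY"])

stream = client.chat.completions.create(
    model="swiss-ai/apertus-70b-instruct",  # assumed model id
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
    stream_options={"include_usage": True},  # OpenAI-style; verify PublicAI honors it
    frequency_penalty=0.1,
    presence_penalty=0.1,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:  # final chunk carries usage when include_usage is honored
        print("\nusage:", chunk.usage)
```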
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- daras_ai_v2/language_model.py (3 hunks)
- daras_ai_v2/settings.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nikochiko
PR: GooeyAI/gooey-server#768
File: daras_ai_v2/language_model.py:124-126
Timestamp: 2025-08-12T08:22:19.003Z
Learning: When setting token limits for language models in daras_ai_v2/language_model.py, prioritize actual API behavior and platform-specific documentation over general model documentation to ensure consistent UX and avoid API errors.
Learnt from: nikochiko
PR: GooeyAI/gooey-server#768
File: daras_ai_v2/language_model.py:124-126
Timestamp: 2025-08-12T08:22:19.003Z
Learning: When determining language model token limits in daras_ai_v2/language_model.py, prioritize actual API behavior and direct testing over general documentation, as different model variants within the same family may have different practical limits.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
Actionable comments posted: 0
🧹 Nitpick comments (2)
usage_costs/models.py (1)
66-69: Optional: keep IntegerChoices ordered for readability
Placing aks = 5 after publicai = 14 is a bit jarring when scanning. Consider grouping/sorting by id:

```diff
 class ModelProvider(models.IntegerChoices):
     openai = 1, "OpenAI"
     google = 2, "Google"
     together_ai = 3, "TogetherAI"
     azure_openai = 4, "Azure OpenAI"
+    aks = 5, "Azure Kubernetes Service"
     anthropic = 6, "Anthropic"
     groq = 7, "groq"
     fireworks = 8, "Fireworks AI"
     mistral = 9, "Mistral AI"
     sarvam = 10, "sarvam.ai"
     fal_ai = 11, "fal.ai"
     twilio = 12, "Twilio"
     sea_lion = 13, "sea-lion.ai"
     publicai = 14, "PublicAI"
-
-    aks = 5, "Azure Kubernetes Service"
```

usage_costs/migrations/0035_alter_modelpricing_model_name_and_more.py (1)
1-14: Ruff RUF012 on migrations: prefer per-file ignore
Auto-generated migrations trigger the RUF012 (ClassVar) lint. Recommend ignoring migrations to avoid churn.
Add to your Ruff config:

```diff
+[tool.ruff.lint.per-file-ignores]
+"**/migrations/*.py" = ["RUF012"]
```

I can raise a tiny PR to add this to pyproject.toml if you want.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- scripts/init_llm_pricing.py (1 hunks)
- usage_costs/migrations/0035_alter_modelpricing_model_name_and_more.py (1 hunks)
- usage_costs/models.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/init_llm_pricing.py (2)
- daras_ai_v2/language_model.py (1): LargeLanguageModels (82-1025)
- usage_costs/models.py (1): ModelProvider (53-68)
usage_costs/migrations/0035_alter_modelpricing_model_name_and_more.py (2)
- usage_costs/migrations/0033_alter_modelpricing_model_name_and_more.py (1): Migration (6-297)
- usage_costs/migrations/0032_alter_modelpricing_model_name_alter_modelpricing_sku.py (1): Migration (6-282)
🪛 Ruff (0.12.2)
usage_costs/migrations/0035_alter_modelpricing_model_name_and_more.py
8-13: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
15-320: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (5)
usage_costs/models.py (1)
53-69: Add PublicAI provider enum — looks correct
Value 14 is unique and matches the migration and pricing script usage.
scripts/init_llm_pricing.py (2)
16-18: Good move: notes migrated from comment to the notes field
Storing this in the DB instead of an inline comment improves traceability.
20-30: Verify PublicAI pricing & update entry
- PublicAI lists swiss‑ai/apertus‑70b‑instruct and notes it’s free during “Swiss AI Weeks” (September 2025); no per‑token pricing published as of 2025-09-11.
- Action: do not keep the hardcoded unit_cost_input/unit_cost_output (0.25/2 per 1M) unless you can cite published pricing — either remove or mark them as estimates with an “as of 2025-09-11” note; update pricing_url to a real pricing/docs page (current value points to API endpoints). Confirm with the provider and update the entry.
File: scripts/init_llm_pricing.py Lines: 20-30
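For illustration, one hedged way the entry could carry that caveat until pricing is published (field names mirror those cited in this review; the actual helper and kwargs in scripts/init_llm_pricing.py may differ):

```python
# Hypothetical shape: keep the numbers only if explicitly labeled as estimates
apertus_pricing = dict(
    model_name="apertus_70b_instruct",
    unit_cost_input=0.25,  # per 1M tokens; estimate as of 2025-09-11, not published
    unit_cost_output=2,    # per 1M tokens; estimate as of 2025-09-11, not published
    pricing_url="",        # TODO: link a real PublicAI pricing/docs page, not the API endpoint
    notes="Estimated pricing as of 2025-09-11; free during Swiss AI Weeks (Sep 2025).",
)
```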
usage_costs/migrations/0035_alter_modelpricing_model_name_and_more.py (2)
20-26: Apertus model added to choices — consistent with model enum
The "apertus_70b_instruct" display label matches the enum label; migration looks good.
300-315: Provider choices include PublicAI (14) — aligned with models.py
The new provider id matches the enum; no data-mapping concerns.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
daras_ai_v2/text_splitter.py (1)
58-63: Per-thread cache bug: encoder cached for first model only
threadlocal.enc stores a single encoder; subsequent calls with a different model reuse the wrong encoder, producing incorrect lengths and splits.
Apply per-model caching with a safe fallback:
```diff
-    try:
-        enc = threadlocal.enc
-    except AttributeError:
-        enc = tiktoken.encoding_for_model(model)
-        threadlocal.enc = enc
+    try:
+        encoders = threadlocal.encoders
+    except AttributeError:
+        encoders = threadlocal.encoders = {}
+    enc = encoders.get(model)
+    if enc is None:
+        try:
+            enc = tiktoken.encoding_for_model(model)
+        except Exception:
+            # Fallbacks for older tiktoken or unknown model IDs
+            try:
+                enc = tiktoken.get_encoding("o200k_base")
+            except Exception:
+                enc = tiktoken.get_encoding("cl100k_base")
+        encoders[model] = enc
```
🧹 Nitpick comments (1)
daras_ai_v2/text_splitter.py (1)
44-57: Optional: hoist prefixes to a module constant
Slight readability/perf win; avoids rebuilding the list on every call and centralizes future additions.
Example:
```diff
+MODEL_PREFIXES_MAP_TO_GPT4O = (
+    "o3-", "gpt-4.5-", "gpt-4.1-", "sarvam-", "claude-", "google/", "aisingapore/", "swiss-ai/",
+)
 def default_length_function(text: str, model: str = "gpt-4") -> int:
-    if any(
-        model.startswith(prefix)
-        for prefix in [
-            "o3-",
-            "gpt-4.5-",
-            "gpt-4.1-",
-            "sarvam-",
-            "claude-",
-            "google/",
-            "aisingapore/",
-            "swiss-ai/",
-        ]
-    ):
+    if any(model.startswith(prefix) for prefix in MODEL_PREFIXES_MAP_TO_GPT4O):
         model = "gpt-4o"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- daras_ai_v2/text_splitter.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (1)
daras_ai_v2/text_splitter.py (1)
44-57: Verify tokenizer parity for 'swiss-ai/' → 'gpt-4o' mapping
Local tiktoken check failed (ModuleNotFoundError). Run the provided tiktoken snippet locally (pip install tiktoken) to compare o200k_base vs cl100k_base on representative text and confirm whether SwissAI/Apertus-70B uses a Llama-style tokenizer; if it does, map 'swiss-ai/' to cl100k_base or a provider-specific fallback instead of gpt-4o. File: daras_ai_v2/text_splitter.py:44-57.
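A minimal version of that check (the sample text is a placeholder; swap in representative multilingual content from the app):

```python
import tiktoken

# Placeholder text; replace with real document samples before drawing conclusions
sample = "Apertus ist ein mehrsprachiges Schweizer Sprachmodell. " * 50

for name in ("o200k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(sample))} tokens")

# If both counts diverge significantly from Apertus's own (Llama-style) tokenizer,
# map the "swiss-ai/" prefix to cl100k_base or a provider-specific fallback instead of gpt-4o.
```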
Q/A checklist
How to check import time?
You can visualize this using tuna:
To measure import time for a specific library:
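For example, one simple stdlib-only way to time a single library's import (a sketch; the library name is a placeholder, and the original snippet from this checklist is not reproduced here):

```python
import importlib
import sys
import time


def measure_import_time(module_name: str) -> float:
    # Drop any cached copies so we measure a cold import
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


print(f"pandas: {measure_import_time('pandas'):.3f}s")  # placeholder library
```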
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
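For example (a sketch; the function and library are placeholders):

```python
# Before: paid on every process start, even when the feature is never used
# import pandas as pd


def export_report(rows: list[dict]) -> str:
    # After: deferred import; the cost is only paid the first time this runs
    import pandas as pd

    return pd.DataFrame(rows).to_csv(index=False)
```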
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.