Conversation

@nikochiko nikochiko commented Sep 16, 2025

Q/A checklist

  • I have tested my UI changes on mobile and they look acceptable
  • I have tested changes to the workflows in both the API and the UI
  • I have done a code review of my changes and looked at each line of the diff + the references of each function I have changed
  • My changes have not increased the import time of the server
How to check import time?

time python -c 'import server'

You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd  # deferred: the import cost is paid on first call, not at server startup
    ...
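When several functions need the same heavy library, the pattern can be wrapped in a small cached helper (a sketch; `lazy_import` is an illustrative name, not something in this repo):

```python
import functools
import importlib


@functools.cache
def lazy_import(name: str):
    # The first call pays the import cost; later calls return the cached module.
    return importlib.import_module(name)


def my_function(data: dict) -> str:
    json = lazy_import("json")  # resolved at call time, not at server startup
    return json.dumps(data)
```

Since `importlib.import_module` already consults `sys.modules`, the cache mainly keeps call sites cheap and uniform.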

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

coderabbitai bot commented Sep 16, 2025

📝 Walkthrough

  • daras_ai_v2/settings.py: added HF_TOKEN configuration, sourced from the environment.
  • daras_ai_v2/text_to_speech_settings_widgets.py: introduced MMS TTS support with a new TextToSpeechProviders enum member and selector wiring via mms_tts_selector and mms_tts_language_options (cached).
  • recipes/TextToSpeech.py: added MMS_TTS support with a new TextToSpeechSettings field mms_tts_language and a Modal-based execution path that validates the language, calls a remote Modal function, and uploads WAV audio.
  • daras_ai_v2/mms_tts.py (new module): defines MMS_TTS_SUPPORTED_LANGUAGES, a Modal app/image/volume, and public functions load_pipe(language) and run_mms_tts(language, text), which instantiate and run HuggingFace MMS pipelines and return WAV bytes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • devxpy

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title Check — ✅ Passed: the title "feat: add support for MMS TTS models via Modal" succinctly and accurately describes the primary change in the diff (adding MMS TTS support with Modal integration across runtime, settings, and UI) and is clear and appropriately scoped for reviewers.
  • Description Check — ✅ Passed: the PR description follows the repository template, including the Q/A checklist (all items checked), the import-time guidance inside a details block, and the legal boilerplate.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mms-tts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
daras_ai_v2/settings.py (1)

521-522: Hugging Face token: set standard env var and avoid leaking into templates

  • Mirror HF token to HUGGING_FACE_HUB_TOKEN so huggingface_hub & CLIs pick it up.
  • Caution: all settings are injected into Jinja globals (Line 112-115), which makes HF_TOKEN visible to templates. Consider whitelisting non‑secret values.

Apply this diff:

-HF_TOKEN = config("HF_TOKEN", "")
+HF_TOKEN = config("HF_TOKEN", default="")
+if HF_TOKEN and "HUGGING_FACE_HUB_TOKEN" not in os.environ:
+    os.environ["HUGGING_FACE_HUB_TOKEN"] = HF_TOKEN

Outside this hunk (for your consideration), restrict what’s exposed to Jinja:

# replace settings=globals() with a safe subset, e.g.:
SAFE_SETTINGS_FOR_TEMPLATES = {
    "APP_BASE_URL": APP_BASE_URL,
    "DOCS_URL": DOCS_URL,
    # ... non-secret fields only
}
templates.env.globals.update(dict(humanize=humanize, datetime=datetime, settings=SAFE_SETTINGS_FOR_TEMPLATES))
🧹 Nitpick comments (3)
recipes/TextToSpeech.py (1)

68-69: Add field doc for clarity

Document expected code scheme (ISO‑639‑3).

Apply this diff:

-    mms_tts_language: str = "eng"
+    mms_tts_language: str = Field("eng", description="ISO‑639‑3 code for MMS TTS")
daras_ai_v2/text_to_speech_settings_widgets.py (2)

204-214: Selectbox: pass keys and sort for stable UX

Pass the keys view explicitly and sort by display label for determinism.

Apply this diff:

-def mms_tts_selector():
-    options = mms_tts_language_options()
+def mms_tts_selector():
+    options = mms_tts_language_options()
+    lang_codes = sorted(options.keys(), key=lambda c: options[c].lower())
     gui.selectbox(
         label="""
         ###### MMS TTS Language
         """,
         key="mms_tts_language",
-        format_func=lambda lang: options[lang],
-        options=options,
+        format_func=options.__getitem__,
+        options=lang_codes,
     )

216-225: Options builder: add fallback if langcodes is missing

Avoid runtime failure if langcodes isn’t installed; degrade to returning codes.

Apply this diff:

 @redis_cache_decorator(ex=settings.REDIS_MODELS_CACHE_EXPIRY)
 def mms_tts_language_options():
-    import langcodes
-    from daras_ai_v2.mms_tts import MMS_TTS_SUPPORTED_LANGUAGES
-
-    result = {}
-    for lang in MMS_TTS_SUPPORTED_LANGUAGES:
-        result[lang] = langcodes.Language.get(lang).display_name()
-    return result
+    from daras_ai_v2.mms_tts import MMS_TTS_SUPPORTED_LANGUAGES
+    try:
+        import langcodes
+    except Exception:
+        return {lang: lang for lang in MMS_TTS_SUPPORTED_LANGUAGES}
+    return {
+        lang: langcodes.Language.get(lang).display_name()
+        for lang in MMS_TTS_SUPPORTED_LANGUAGES
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b6c847 and 729aa3f.

📒 Files selected for processing (3)
  • daras_ai_v2/settings.py (1 hunks)
  • daras_ai_v2/text_to_speech_settings_widgets.py (3 hunks)
  • recipes/TextToSpeech.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
daras_ai_v2/text_to_speech_settings_widgets.py (1)
daras_ai_v2/redis_cache.py (1)
  • redis_cache_decorator (23-51)
recipes/TextToSpeech.py (3)
daras_ai_v2/text_to_speech_settings_widgets.py (1)
  • TextToSpeechProviders (69-77)
daras_ai_v2/exceptions.py (1)
  • UserError (58-65)
daras_ai/image_input.py (1)
  • upload_file_from_bytes (59-66)
🪛 Ruff (0.12.2)
recipes/TextToSpeech.py

422-422: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (3)
daras_ai_v2/text_to_speech_settings_widgets.py (2)

77-78: Enum addition LGTM

Provider wired into the enum with a clear label.


174-176: Selector dispatch LGTM

Correctly routes MMS TTS to its selector.

recipes/TextToSpeech.py (1)

414-431: MMS TTS — guard modal.enable_output(); don't add local ImportError handling

  • Do not add a local try/except ImportError for modal — modal is imported at module top (recipes/TextToSpeech.py line 7) and used elsewhere (e.g., recipes/Functions.py), so it's a repo-level dependency.
  • Change only the enable_output usage: at recipes/TextToSpeech.py (around lines 424–426) wrap modal.enable_output() behind a settings.DEBUG check; in production call run_mms_tts.remote(...) without enable_output(). Keep all other behavior unchanged.

Likely an incorrect or invalid review comment.


import gooey_gui as gui
import requests
import modal

⚠️ Potential issue

Lazy‑load modal (import-time + optional dependency)

Top-level import of modal increases server import time and breaks environments without modal installed. Import it only when MMS TTS is selected.

Apply this diff:

-import modal

See follow-up change in the MMS_TTS case below.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import modal
🤖 Prompt for AI Agents
In recipes/TextToSpeech.py around line 7, the top-level "import modal" makes
modal a required import at module import time and slows or breaks environments
without modal; remove the top-level import and instead lazy-import modal inside
the branch that handles the MMS TTS backend (i.e., import modal only when
MMS_TTS is selected), wrap the import in a try/except to raise a clear error if
modal is missing, and mirror this lazy-load pattern in the follow-up MMS_TTS
case so modal is only imported/used when that backend is actually chosen.

coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
recipes/TextToSpeech.py (2)

7-7: Lazy‑load Modal to avoid import‑time failures and keep server import fast

Top‑level import modal makes Modal a hard dependency and slows/breaks environments without it. Import Modal only inside the MMS_TTS branch.

Apply this diff:

- import modal

414-431: Gate Modal usage and surface friendly errors; add progress message

Handle missing Modal, failed function lookup, and remote call errors gracefully; also mirror other providers’ “Uploading…” yield.

Apply this diff:

             case TextToSpeechProviders.MMS_TTS:
-                from daras_ai_v2.mms_tts import (
-                    MMS_TTS_SUPPORTED_LANGUAGES,
-                    app as modal_app,
-                )
+                try:
+                    import modal
+                except Exception as e:
+                    raise UserError("MMS TTS requires the 'modal' package.") from e
+                from daras_ai_v2.mms_tts import (
+                    MMS_TTS_SUPPORTED_LANGUAGES,
+                    app as modal_app,
+                )

                 language = state.get("mms_tts_language", "eng")
                 if language not in MMS_TTS_SUPPORTED_LANGUAGES:
                     raise UserError(f"Unsupported language: {language}")

-                run_mms_tts = modal.Function.lookup(modal_app.name, "run_mms_tts")
-                with modal.enable_output():
-                    audio = run_mms_tts.remote(language=language, text=text)
+                try:
+                    run_mms_tts = modal.Function.lookup(modal_app.name, "run_mms_tts")
+                except Exception as e:
+                    raise UserError("MMS TTS backend is unavailable. Please deploy daras_ai_v2/mms_tts.py.") from e
+                try:
+                    with modal.enable_output():
+                        audio = run_mms_tts.remote(language=language, text=text)
+                except Exception as e:
+                    raise UserError("MMS TTS synthesis failed.") from e

-                state["audio_url"] = upload_file_from_bytes(
+                yield "Uploading Audio file..."
+                state["audio_url"] = upload_file_from_bytes(
                     filename="output.wav", data=audio, content_type="audio/wav"
                 )
🧹 Nitpick comments (3)
recipes/TextToSpeech.py (1)

68-69: Preseed default in sane_defaults for consistency

Add mms_tts_language: "eng" to sane_defaults so the UI/session state is prefilled without relying on model defaults.
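A minimal illustration of that suggestion (a sketch only: it assumes `sane_defaults` is a plain dict of per-field defaults merged into session state; the merge loop and the session-state contents shown here are stand-ins, not the actual framework code):

```python
# Hypothetical excerpt: sane_defaults maps settings fields to their defaults.
sane_defaults = dict(
    # ... existing defaults ...
    mms_tts_language="eng",  # ISO-639-3 code, matching the model default
)

# Stand-in for the framework's prefill: defaults fill only missing keys.
session_state = {"tts_provider": "MMS_TTS"}  # illustrative existing state
for key, value in sane_defaults.items():
    session_state.setdefault(key, value)
```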

daras_ai_v2/mms_tts.py (2)

108-119: Reuse the CUDA check and pass a single computed device

Minor tidy‑up; avoids double checks and keeps intent clear.

Apply this diff:

-    has_cuda = torch.cuda.is_available()
-    if has_cuda:
-        print("Using GPU")
-    else:
-        print("GPU not available, using CPU")
+    has_cuda = torch.cuda.is_available()
+    print("Using GPU" if has_cuda else "Using CPU")
+    device = 0 if has_cuda else -1
@@
-        device=0 if torch.cuda.is_available() else -1,
+        device=device,

142-144: Guard against shape variance in pipeline output (optional)

Some TTS pipelines return audio with/without batch dim. Consider squeezing before write.

Proposed change:

-    scipy.io.wavfile.write(b, rate=output["sampling_rate"], data=output["audio"][0])
+    audio = output["audio"]
+    try:
+        audio = audio[0]  # handle batch dim if present
+    except Exception:
+        pass
+    scipy.io.wavfile.write(b, rate=output["sampling_rate"], data=audio)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 729aa3f and 0be9263.

📒 Files selected for processing (4)
  • daras_ai_v2/mms_tts.py (1 hunks)
  • daras_ai_v2/settings.py (1 hunks)
  • daras_ai_v2/text_to_speech_settings_widgets.py (3 hunks)
  • recipes/TextToSpeech.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • daras_ai_v2/text_to_speech_settings_widgets.py
  • daras_ai_v2/settings.py
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/TextToSpeech.py (4)
daras_ai_v2/text_to_speech_settings_widgets.py (1)
  • TextToSpeechProviders (69-77)
daras_ai_v2/exceptions.py (1)
  • UserError (58-65)
daras_ai_v2/mms_tts.py (1)
  • run_mms_tts (131-144)
daras_ai/image_input.py (1)
  • upload_file_from_bytes (59-66)
🪛 Ruff (0.12.2)
recipes/TextToSpeech.py

422-422: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (3.10.12, 1.8.3)

Comment on lines +91 to +101
image = (
modal.Image.debian_slim()
.pip_install(
"transformers~=4.44",
"huggingface_hub[hf_transfer]~=0.34.4",
"torch~=2.5.1",
"scipy~=1.11",
"python-decouple~=3.6",
)
.env({"HF_HUB_CACHE": cache_dir, "HF_TOKEN": config("HF_TOKEN", cast=str)})
)

⚠️ Potential issue

Don’t crash when HF_TOKEN is unset; only inject when present (+ enable hf_transfer)

config("HF_TOKEN", cast=str) raises if the var isn’t set. Build env_vars conditionally and enable faster downloads.

Apply this diff:

-image = (
-    modal.Image.debian_slim()
-    .pip_install(
-        "transformers~=4.44",
-        "huggingface_hub[hf_transfer]~=0.34.4",
-        "torch~=2.5.1",
-        "scipy~=1.11",
-        "python-decouple~=3.6",
-    )
-    .env({"HF_HUB_CACHE": cache_dir, "HF_TOKEN": config("HF_TOKEN", cast=str)})
-)
+env_vars = {
+    "HF_HUB_CACHE": cache_dir,
+    "HF_HUB_ENABLE_HF_TRANSFER": "1",
+}
+hf_token = config("HF_TOKEN", default=None, cast=str)
+if hf_token:
+    env_vars["HF_TOKEN"] = hf_token
+image = (
+    modal.Image.debian_slim()
+    .pip_install(
+        "transformers~=4.44",
+        "huggingface_hub[hf_transfer]~=0.34.4",
+        "torch~=2.5.1",
+        "scipy~=1.11",
+        "python-decouple~=3.6",
+    )
+    .env(env_vars)
+)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In daras_ai_v2/mms_tts.py around lines 91-101, avoid calling config("HF_TOKEN")
directly (which raises if unset); instead build an env_vars dict before
constructing the image, only add "HF_TOKEN" when it's present (e.g., from
os.getenv or config with a default), keep "HF_HUB_CACHE" always, and pass that
env_vars into modal.Image.env(...) so the build won’t crash when HF_TOKEN is
missing; also ensure the pip_install uses huggingface_hub[hf_transfer] (already
present) to enable faster downloads.
