fix(infra): lazy load and don't warm up model server models #5527
Conversation
Greptile Overview
Summary
This PR implements a comprehensive memory optimization strategy for the Onyx model server by introducing lazy loading for heavy ML libraries and disabling model warm-up by default. The changes address a significant memory usage issue where model server containers were consuming 900 MB (index) to 1.6 GB (inference) of RAM, reducing this to approximately 220 MB each when RAG functionality isn't actively used. The optimization works through two main mechanisms:
- Lazy Loading: Heavy ML libraries (`litellm`, `transformers`, `sentence_transformers`, `setfit`) are no longer imported at module level. Instead, they use `TYPE_CHECKING` imports for type hints and runtime imports only when the functionality is actually needed. This prevents the memory overhead of loading these libraries during application startup (see the sketch just after this list).
- Skip Model Warm-up: A new `SKIP_WARM_UP` configuration flag (defaulting to `true`) prevents models from being pre-loaded and warmed up at startup unless explicitly enabled. This eliminates the upfront memory cost of model initialization when users don't immediately need ML capabilities.
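A minimal sketch of the `TYPE_CHECKING` lazy-import pattern described above; the names (`get_embedding_model`, `_model`) are illustrative, not the exact Onyx code:

```python
# Type-only import plus deferred runtime import: the heavy library is never
# loaded at module import time, only on first actual use.
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to type checkers only; never executed at runtime.
    from sentence_transformers import SentenceTransformer

_model: SentenceTransformer | None = None


def get_embedding_model(model_name: str) -> SentenceTransformer:
    """Construct the heavy model on first use, paying the import cost lazily."""
    global _model
    if _model is None:
        # Runtime import happens only when an embedding is actually requested.
        from sentence_transformers import SentenceTransformer

        _model = SentenceTransformer(model_name)
    return _model
```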
Key architectural changes include:
- Moving `SKIP_WARM_UP` from `app_configs` to `shared_configs` for broader accessibility
- Creating a new `configure_litellm()` singleton pattern in `onyx/llm/get_litellm.py` for lazy litellm initialization (sketched below)
- Implementing lazy imports across multiple modules, including chat LLM, image generation, search NLP models, and model server components
- Enhancing the lazy import checking script with more sophisticated per-module ignore patterns
- Updating tests to accommodate the new lazy loading patterns
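A hedged sketch of what a `configure_litellm()` singleton could look like; the actual `onyx/llm/get_litellm.py` may differ in both settings and structure:

```python
# Configure litellm exactly once, on first use rather than at import time.
import threading

_configured = False
_lock = threading.Lock()


def configure_litellm() -> None:
    """Idempotently apply litellm settings the first time any caller needs them."""
    global _configured
    if _configured:
        return
    with _lock:
        if _configured:  # double-checked locking for thread safety
            return
        import litellm  # heavy import deferred until a caller actually needs it

        litellm.drop_params = True  # illustrative setting, not the PR's exact config
        _configured = True
```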
The changes maintain full backward compatibility while making memory-efficient behavior the default. This is particularly beneficial for containerized deployments where not all instances may need immediate ML capabilities, allowing for more efficient resource utilization and better scalability.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| backend/shared_configs/configs.py | 5/5 | Added SKIP_WARM_UP configuration flag defaulting to true to control model warm-up behavior (parsing sketched below) |
| backend/onyx/llm/get_litellm.py | 4/5 | New file implementing a singleton lazy-loading pattern for litellm configuration |
| backend/onyx/llm/chat_llm.py | 4/5 | Implemented comprehensive lazy loading for litellm with the configure_litellm() pattern |
| backend/model_server/main.py | 4/5 | Added conditional model warm-up logic based on the SKIP_WARM_UP flag, with proper logging |
| backend/model_server/encoders.py | 5/5 | Moved heavy ML library imports to a lazy-loading pattern using TYPE_CHECKING |
| backend/model_server/custom_models.py | 4/5 | Implemented lazy loading for the transformers and setfit libraries |
| backend/model_server/onyx_torch_model.py | 4/5 | Added lazy loading for DistilBERT model components |
| backend/scripts/check_lazy_imports.py | 4/5 | Enhanced the lazy import checker with per-module ignore patterns and better configuration |
| backend/onyx/natural_language_processing/search_nlp_models.py | 5/5 | Added a SKIP_WARM_UP guard to the cross-encoder warm-up function |
| backend/onyx/configs/app_configs.py | 5/5 | Removed SKIP_WARM_UP configuration (moved to shared_configs) |
| backend/onyx/tools/tool_implementations/images/image_generation_tool.py | 5/5 | Moved the litellm image generation import to a lazy-loading pattern |
| backend/onyx/secondary_llm_flows/starter_message_creation.py | 5/5 | Implemented lazy loading for litellm utilities |
| backend/onyx/agents/agent_search/shared_graph_utils/llm.py | 5/5 | Added lazy loading for litellm utility functions |
| backend/tests/unit/scripts/test_check_lazy_imports.py | 5/5 | Updated tests to match the new lazy import checker API with LazyImportSettings |
| backend/tests/unit/onyx/llm/test_chat_llm.py | 5/5 | Updated mock patch paths to reflect the new lazy-loading implementation |
| backend/tests/external_dependency_unit/full_setup.py | 5/5 | Updated to use shared_configs for the SKIP_WARM_UP configuration |
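For reference, one plausible shape for the SKIP_WARM_UP flag added in shared_configs/configs.py (a sketch; the PR's exact parsing may differ):

```python
import os

# Defaults to true: warm-up is skipped unless the operator explicitly disables it.
SKIP_WARM_UP = os.environ.get("SKIP_WARM_UP", "true").lower() == "true"
```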
Confidence score: 4/5
- This PR is safe to merge with good confidence in the memory optimization benefits and architectural improvements
- Score reflects thorough implementation of lazy loading patterns with proper error handling and backward compatibility
- Pay close attention to backend/scripts/check_lazy_imports.py for potential path matching issues and backend/onyx/llm/chat_llm.py for litellm configuration complexity
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant MainServer as Model Server Main
    participant CustomModels as Custom Models Module
    participant Encoders as Encoders Module
    participant Libraries as ML Libraries (transformers, setfit, etc.)

    User->>MainServer: Start Model Server
    MainServer->>MainServer: Initialize FastAPI app
    MainServer->>MainServer: Check SKIP_WARM_UP=true (new default)
    Note over MainServer: Skip warm-up to save memory
    MainServer->>MainServer: Set lifespan handlers
    MainServer-->>User: Server ready (~220MB RAM)

    User->>CustomModels: Request connector classification
    CustomModels->>CustomModels: get_connector_classifier_tokenizer()
    Note over CustomModels: Lazy import: from transformers import AutoTokenizer
    CustomModels->>Libraries: Import transformers only when needed
    Libraries-->>CustomModels: Tokenizer instance
    CustomModels->>CustomModels: get_local_connector_classifier()
    CustomModels-->>User: Classification response

    User->>CustomModels: Request content classification
    CustomModels->>CustomModels: get_local_information_content_model()
    Note over CustomModels: Lazy import: from setfit import SetFitModel
    CustomModels->>Libraries: Import setfit only when needed
    Libraries-->>CustomModels: SetFitModel instance
    CustomModels-->>User: Content classification response

    User->>Encoders: Request embedding
    Encoders->>Encoders: get_embedding_model()
    Note over Encoders: Lazy import: from sentence_transformers import SentenceTransformer
    Encoders->>Libraries: Import sentence_transformers only when needed
    Libraries-->>Encoders: Model instance
    Encoders-->>User: Embedding response
```
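The conditional warm-up step in the diagram could look roughly like the following, assuming a FastAPI lifespan handler; `warm_up_models` and the inlined flag are illustrative placeholders, not the PR's actual `model_server/main.py`:

```python
# Skip model warm-up by default so startup memory stays low; models load
# lazily on the first request instead.
from contextlib import asynccontextmanager

from fastapi import FastAPI

SKIP_WARM_UP = True  # would come from shared_configs.configs in the real code


def warm_up_models() -> None:
    """Placeholder: would load and exercise embedding/reranking models once."""


@asynccontextmanager
async def lifespan(app: FastAPI):
    if SKIP_WARM_UP:
        # New default: no upfront model loading, keeping the container small.
        print("SKIP_WARM_UP is set; models will load lazily on first request")
    else:
        warm_up_models()
    yield


app = FastAPI(lifespan=lifespan)
```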
16 files reviewed, 3 comments
5 issues found across 16 files
Prompt for AI agents (all 5 issues)

```xml
Understand the root cause of the following 5 issues and fix them.

<file name="backend/tests/external_dependency_unit/full_setup.py">
  <violation number="1" location="backend/tests/external_dependency_unit/full_setup.py:51">
    Mutating SKIP_WARM_UP at runtime may not affect modules that imported it by value,
    leading to inconsistent skip-warmup behavior based on import order.
  </violation>
</file>

<file name="backend/onyx/llm/chat_llm.py">
  <violation number="1" location="backend/onyx/llm/chat_llm.py:455">
    Importing exceptions inside the except block can raise ImportError and mask the
    original error; import these once before the try or guard the import separately.
  </violation>
</file>

<file name="backend/shared_configs/configs.py">
  <violation number="1" location="backend/shared_configs/configs.py:11">
    SKIP_WARM_UP appears unused across the codebase, so this change has no effect.
    According to linked Linear issue DAN-2656, skipping model warmup is required;
    ensure this config is actually consumed by the model server startup logic.
  </violation>
</file>

<file name="backend/scripts/check_lazy_imports.py">
  <violation number="1" location="backend/scripts/check_lazy_imports.py:20">
    Rule violated: Use Pydantic over dataclasses. Use Pydantic BaseModel instead of
    @dataclass for new models. Replace LazyImportSettings with a Pydantic model and
    remove @dataclass.
  </violation>
  <violation number="2" location="backend/scripts/check_lazy_imports.py:163">
    Path comparison uses OS-specific separators; forward-slash ignore patterns won't
    match on Windows. Normalize to POSIX before comparing.
  </violation>
</file>
```
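Two of the flagged issues have conventional fixes; a hedged sketch of each (illustrative code assuming the repo's module layout, not the PR's actual implementation):

```python
from pathlib import Path

import shared_configs.configs as shared_configs  # assumes the repo's module layout


# full_setup.py issue: `from shared_configs.configs import SKIP_WARM_UP` copies the
# value at import time, so later runtime mutation is invisible to that module.
# Reading the attribute through the module object always sees the current value.
def should_warm_up() -> bool:
    return not shared_configs.SKIP_WARM_UP


# check_lazy_imports.py issue: forward-slash ignore patterns fail against Windows
# paths; normalizing to POSIX separators before comparing makes them portable.
def matches_ignore_pattern(file_path: str, pattern: str) -> bool:
    return Path(file_path).as_posix().startswith(pattern)
```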
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
Description
Lazy load heavy model server libraries and skip model warm-up so that memory doesn't increase if the user never uses RAG.
Container RAM sizes go from:
- index model server: 900 MB
- inference model server: 1.6 GB
to ~220 MB each (roughly 700 MB saved by lazy-loading packages; the additional savings on the inference model server come from not warming up).
https://linear.app/danswer/issue/DAN-2656/lazy-load-ml-libraries-model-server
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Additional Options