ModelZoo is a system for managing and serving local AI models. It provides a flexible framework for discovering, launching, and managing language, vision, and image-generation models using different runtimes and environments.
ZooKeeper is the entry point of ModelZoo. It's a Flask application that:
- Loads configuration from a YAML file.
- Instantiates the configured zoos and runtimes.
- Provides a web-based user interface to:
  - List available models.
  - Launch models with specific runtimes and configurations.
  - Manage running models (view logs, stop models).
- Embeds a proxy server (`proxy.py`) that forwards requests to the appropriate running model:
  - Supports the OpenAI protocol for text, chat and multi-modal completions (see the chat example after this list).
  - Supports the A1111 protocol for image generation.
- Keeps track of model launch history:
  - Number of times a model has been launched and the last launch time (to sort models by most frequently used).
  - Last used environment and parameters (pre-fills launch configurations based on previous usage for a smoother experience).
- Peers with instances of itself on other nodes to create distributed setups.
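Once a chat model is running, any OpenAI-compatible client can talk to it through the proxy. Below is a minimal sketch using Python's `requests`; it assumes the proxy is reachable on the ZooKeeper port (3333) and exposes the standard `/v1/chat/completions` route, and `my-model` is a placeholder for a model id discovered by one of your zoos:

```python
# Minimal sketch: chat completion through the ZooKeeper proxy.
# Assumptions: the proxy is reachable on the ZooKeeper port (3333) at the
# standard OpenAI route; "my-model" is a placeholder model id.
import requests

resp = requests.post(
    "http://localhost:3333/v1/chat/completions",
    json={
        "model": "my-model",  # hypothetical id; use one your zoo discovered
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```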
- Clone the repository.
- Install dependencies: `pip install -r requirements.txt`
- Create a `config.yaml` configuration file.
- Run the ZooKeeper application: `python ./main.py --config config.yaml`
- Open the ZooKeeper web interface (listening at http://0.0.0.0:3333/ by default) to view, launch and manage models.
ModelZoo is composed of several key components:
- Zoos: Discovery systems that catalog models.
- Models: Data objects representing models.
- Runtimes: Backends that can serve models in specific environments.
- Environments: Named GPU configurations (sets of environment variables).
- EnvironmentSet: A collection of environments combined for model execution.
- ZooKeeper: The web application that interacts with zoos, uses runtimes to spawn models, maintains the launch history, and hosts the proxy.
- Proxy: A hybrid OpenAI-compatible (text+multimodal) and A1111-compatible (image) proxy server.
- ModelHistory: A ZooKeeper component that tracks model launch history, including frequency of use and last used configurations.
- Peers: Instances of ZooKeeper running on other hosts.
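The relationships are roughly: zoos produce models, runtimes consume them under a chosen environment, and ZooKeeper ties everything together behind the proxy. The sketch below is purely illustrative; the class and method signatures are hypothetical, not ModelZoo's actual API:

```python
# Illustrative only -- hypothetical signatures showing how the pieces relate;
# not ModelZoo's actual classes.
from dataclasses import dataclass


@dataclass
class Model:
    model_id: str      # e.g. a file path or a remote API model name
    model_format: str  # e.g. "gguf", "exl2", "litellm"


class Zoo:
    def catalog(self) -> list[Model]:
        """Discover and return the models this zoo knows about."""
        raise NotImplementedError


class Runtime:
    def launch(self, model: Model, env_vars: dict[str, str]) -> None:
        """Spawn a server process for `model` with `env_vars` applied."""
        raise NotImplementedError
```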
ModelZoo (in practice, ZooKeeper) is configured using a YAML file that defines:
- Zoos to be instantiated and their configurations.
- Runtimes to be made available.
- Predefined environments.
- Remote peers for distributed model management.
```yaml
zoos:
  - name: SSD
    class: FolderZoo
    params:
      path: /mnt/ssd0

runtimes:
  - name: LlamaRuntime
    class: LlamaRuntime
    params:
      bin_path: /home/mike/work/llama.cpp/llama-server

envs:
  - name: "P40/0"
    vars:
      CUDA_VISIBLE_DEVICES: 0
  - name: "P40/1"
    vars:
      CUDA_VISIBLE_DEVICES: 1

peers:
  - host: another-host
    port: 3333
```
This example assumes you have some `*.gguf` files under `/mnt/ssd0`, a compiled llama.cpp server binary at the specified path, and a second instance of ModelZoo running on `another-host`.
Zoos are responsible for discovering and cataloging models.
Note that the `name` field is optional and defaults to `class` if not provided, but naming your zoos is strongly encouraged.
The system supports different types of zoos:
- FolderZoo: Discovers models in a specified filesystem folder (see the discovery sketch after this list).
  - Parameters:
    - `path` (str): Path to the folder containing models
  - Example:

    ```yaml
    - name: LocalModels
      class: FolderZoo
      params:
        path: /path/to/models
    ```
- StaticZoo: Returns a predefined list of models.
  - Parameters:
    - `models` (List[Dict]): List of model dictionaries
  - Example:

    ```yaml
    - name: PredefinedModels
      class: StaticZoo
      params:
        models:
          - model_id: chatgpt
            model_name: ChatGPT
            model_format: litellm
    ```
- OpenAIZoo: Fetches models from an OpenAI-compatible API.
  - Parameters:
    - `api_url` (str): Base URL of the OpenAI-compatible API
    - `api_key` (str, optional): API key for authentication
    - `api_key_env` (str, optional): Environment variable name containing the API key
    - `cache` (bool): Whether to cache the model list (default: True)
    - `models` (List[str], optional): Optional list of models to override API exploration
  - Example:

    ```yaml
    - name: OpenAIModels
      class: OpenAIZoo
      params:
        api_url: https://api.openai.com/v1
        api_key_env: OPENAI_API_KEY
        cache: true
    ```
- OllamaZoo: Discovers models from a local or remote Ollama instance.
  - Parameters:
    - `api_url` (str): Base URL of the Ollama API (default: http://localhost:11434)
  - Example:

    ```yaml
    - name: LocalOllama
      class: OllamaZoo
      params:
        api_url: http://localhost:11434
    ```
Each zoo type is designed to accommodate different model discovery and management needs, allowing for flexibility in how models are sourced and cataloged within the ModelZoo system.
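To make the discovery idea concrete, here is a hedged sketch of FolderZoo-style discovery; the function and the exact behaviour (walking a folder for `*.gguf` files) are illustrative, and the dictionary keys mirror the StaticZoo example above:

```python
# Illustrative sketch of FolderZoo-style discovery; ModelZoo's actual
# implementation may differ. Walks a folder and catalogs *.gguf files.
from pathlib import Path


def discover_gguf(path: str) -> list[dict]:
    return [
        {"model_id": str(p), "model_name": p.stem, "model_format": "gguf"}
        for p in sorted(Path(path).rglob("*.gguf"))
    ]


print(discover_gguf("/mnt/ssd0"))  # path from the configuration example above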
Runtimes are responsible for serving models. The `name` field is optional and defaults to `class` if not provided.
- LlamaRuntime: For serving GGUF models with llama-server
  - Compatible model formats: gguf
  - Parameters:
    - `bin_path` (str): Path to the llama.cpp server binary
  - Example:

    ```yaml
    - name: LlamaRuntime
      class: LlamaRuntime
      params:
        bin_path: /path/to/llama-server
    ```
- LlamaSrbRuntime: For serving GGUF models with llama-srb-api
  - Compatible model formats: gguf
  - Parameters:
    - `script_path` (str): Path to the llama-srb-api api.py script
  - Example:

    ```yaml
    - class: LlamaSrbRuntime
      params:
        script_path: /path/to/llama-srb-api/api.py
    ```
- KoboldCppRuntime: For serving GGUF models using KoboldCpp
  - Compatible model formats: gguf
  - Parameters:
    - `bin_path` (str): Path to the KoboldCpp binary
  - Example:

    ```yaml
    - name: KoboldCppRuntime
      class: KoboldCppRuntime
      params:
        bin_path: /path/to/koboldcpp
    ```
- TabbyRuntime: For serving GPTQ and EXL2 models using TabbyAPI
  - Compatible model formats: gptq, exl2
  - Parameters:
    - `script_path` (str): Path to the TabbyAPI start.sh script
  - Example:

    ```yaml
    - name: TabbyRuntime
      class: TabbyRuntime
      params:
        script_path: /path/to/tabby_api/start.sh
    ```
- LiteLLMRuntime: For proxying models using LiteLLM
  - Compatible model formats: litellm (all formats supported by LiteLLM, including OpenAI, Azure, Anthropic, and various open-source models)
  - Parameters:
    - `bin_path` (str, optional): Path to the LiteLLM binary (default: "litellm")
  - Example:

    ```yaml
    - name: LiteLLMRuntime
      class: LiteLLMRuntime
      params:
        bin_path: litellm
    ```
- SDServerRuntime: For serving Stable Diffusion models using stable-diffusion.cpp
  - Compatible model formats: kcppt
  - Parameters:
    - `bin_path` (str): Path to the sd-server binary
  - Example:

    ```yaml
    - name: SDServerRuntime
      class: SDServerRuntime
      params:
        bin_path: /path/to/sd-server
    ```
  - Runtime parameters (see the image-generation sketch below):
    - `sampler_name`: Sampling method (Euler, Euler A, Heun, DPM2, DPM++, LCM)
    - `cfg_scale`: CFG scale for guidance (default: 1.0)
    - `steps`: Number of sampling steps (default: 1)
    - `extra_args`: Additional command line arguments
Each runtime defines compatible model formats and configurable parameters. When launching a model, you can specify additional runtime-specific parameters as needed. The choice of runtime depends on the model format and the specific features required for your use case.
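As an illustration of the image path, a model launched with SDServerRuntime can be driven through the proxy's A1111-compatible endpoint. The sketch below assumes the proxy serves the standard A1111 `/sdapi/v1/txt2img` route on the ZooKeeper port (3333); the request fields mirror the SDServerRuntime runtime parameters listed above:

```python
# Sketch: A1111-style image generation through the proxy.
# Assumption: the proxy exposes the standard A1111 txt2img route on port 3333.
import base64

import requests

resp = requests.post(
    "http://localhost:3333/sdapi/v1/txt2img",
    json={
        "prompt": "a watercolor fox in a forest",
        "sampler_name": "Euler",  # runtime parameters from the list above
        "cfg_scale": 1.0,
        "steps": 4,
    },
)
# A1111-compatible servers return images as base64-encoded strings.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```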
Environments are configurations for running models, typically including environment variables like `CUDA_VISIBLE_DEVICES`.
Example:
```yaml
envs:
  - name: "RTX3090"
    vars:
      CUDA_VISIBLE_DEVICES: 0
  - name: "P40"
    vars:
      CUDA_VISIBLE_DEVICES: 1
```
Multiple environments may be predefined in the configuration file, and multiple environments can be selected when launching a model (any conflicting values will be merged with a comma, as sketched below).
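A tiny sketch of that merge rule (illustrative, not ModelZoo's actual code):

```python
# Illustrative sketch of the comma-merge rule for conflicting variables
# across selected environments; not ModelZoo's actual code.
def merge_envs(envs: list[dict[str, str]]) -> dict[str, str]:
    merged: dict[str, str] = {}
    for env in envs:
        for key, value in env.items():
            merged[key] = f"{merged[key]},{value}" if key in merged else str(value)
    return merged


# Selecting both "P40/0" and "P40/1" from the configuration example:
print(merge_envs([{"CUDA_VISIBLE_DEVICES": "0"}, {"CUDA_VISIBLE_DEVICES": "1"}]))
# -> {'CUDA_VISIBLE_DEVICES': '0,1'}  (both GPUs visible to the launched model)
```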
The remote models feature allows you to connect multiple ModelZoo instances and view the running models on remote peers. To configure remote peers:
- Add a `peers` section to your configuration file.
- For each peer, specify the `host` and `port` where the remote ModelZoo instance is running.
Example:
```yaml
peers:
  - host: falcon
    port: 3333
```
The web interface will display the status and running models of each configured peer, allowing you to manage a distributed setup of ModelZoo instances.