
Commit ca84e6a

Download and start hf (#3293)
1 parent 58d6739 commit ca84e6a

21 files changed: +552 −305 lines changed

Dockerfile.redhat

Lines changed: 2 additions & 0 deletions
```diff
@@ -389,6 +389,8 @@ RUN if [ -f /usr/bin/dnf ] ; then export DNF_TOOL=dnf ; echo -e "max_parallel_do
 ENV LD_LIBRARY_PATH=/ovms/lib
 
 COPY --from=pkg /ovms_release /ovms
+# hadolint ignore=SC2114
+RUN mv /ovms/bin/git* /usr/bin
 COPY --from=pkg /usr/local/lib/python3.*/site-packages/jinja2 /ovms/python_deps/jinja2
 COPY --from=pkg /usr/local/lib/python3.*/site-packages/jinja2-3.1.6.dist-info /ovms/python_deps/jinja2-3.1.6.dist-info
 COPY --from=pkg /usr/local/lib64/python3.*/site-packages/MarkupSafe-3.0.2.dist-info /ovms/python_deps/MarkupSafe-3.0.2.dist-info
```

Dockerfile.ubuntu

Lines changed: 2 additions & 0 deletions
```diff
@@ -407,6 +407,8 @@ RUN apt-get update ; \
 useradd --home-dir /home/ovms --create-home --uid 5000 --gid 5000 --groups 39,44 --shell /bin/bash --skel /dev/null ovms
 
 COPY --from=pkg /ovms_release /ovms
+# hadolint ignore=SC2114
+RUN mv /ovms/bin/git* /usr/bin
 COPY --from=build /usr/local/lib/python3.*/dist-packages/MarkupSafe-3.0.2.dist-info /ovms/python_deps/MarkupSafe-3.0.2.dist-info
 COPY --from=build /usr/local/lib/python3.*/dist-packages/jinja2 /ovms/python_deps/jinja2
 COPY --from=build /usr/local/lib/python3.*/dist-packages/jinja2-3.1.6.dist-info /ovms/python_deps/jinja2-3.1.6.dist-info
```

create_package.sh

Lines changed: 1 addition & 0 deletions
```diff
@@ -86,6 +86,7 @@ patchelf --debug --set-rpath '$ORIGIN' /ovms_release/lib/lib*plugin.so
 if [ -f /ovms_release/lib/libopenvino_nvidia_gpu_plugin.so ] && [ "$BASE_OS" != "redhat" ]; then patchelf --replace-needed libcutensor.so.1 /usr/lib/x86_64-linux-gnu/libcutensor/11/libcutensor.so.1 /ovms_release/lib/libopenvino_nvidia_gpu_plugin.so ; fi
 
 cp -P /usr/bin/git-lfs .
+cp -P /usr/bin/git .
 
 cd /ovms
 cp -v /ovms/release_files/LICENSE /ovms_release/
```

docs/llm/quickstart.md

Lines changed: 15 additions & 34 deletions
````diff
@@ -6,53 +6,34 @@ It is [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5
 ## Requirements
 - Linux or Windows 11
 - Docker Engine or `ovms` binary package [installed](../deploying_server_baremetal.md)
-- Intel iGPU or ARC GPU
+- Intel iGPU or ARC GPU
 
 ## Deployment Steps
-
-### 1. Install Python dependencies:
-```console
-pip3 install huggingface_hub jinja2
-```
-
-### 2. Download and Prepare the Model:
-Using `export_model.py` script, download the OpenVINO model and prepare models repository including all configuration required for deployment with OpenVINO Model Server. For details, see [Exporting GEN AI Models](../../demos/common/export_models/README.md).
-
-```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-mkdir models
-python export_model.py text_generation --source_model OpenVINO/Phi-3.5-mini-instruct-int4-ov --model_repository_path models --target_device GPU --cache 2
-```
-LLM engine parameters will be defined inside the `graph.pbtxt` file.
-
-> **Note:** The users in China need to set environment variable `HF_ENDPOINT="https://hf-mirror.com"` before running the export script to connect to the HF Hub.
-
-> **Note:** If you want to export models outside of the `OpenVINO` organization in HuggingFace, you need to install additional Python dependencies:
-> ```console
-> pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/1/demos/common/export_models/requirements.txt
-> ```
 
-### 3. Deploy the Model
+### 1. Deploy the Model
 ::::{tab-set}
 
 :::{tab-item} With Docker
 **Required:** Docker Engine installed
 
 ```bash
-docker run -d --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render*) --rm -p 8000:8000 -v $(pwd)/models:/models:ro openvino/model_server:latest-gpu --rest_port 8000 --model_name Phi-3.5-mini-instruct --model_path /models/OpenVINO/Phi-3.5-mini-instruct-int4-ov
+docker run -d --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render*) --rm -p 8000:8000 -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --source_model OpenVINO/Phi-3.5-mini-instruct-int4-ov --model_repository_path models --rest_port 8000 --target_device GPU --cache_size 2
 ```
 :::
 
 :::{tab-item} On Baremetal Host
 **Required:** OpenVINO Model Server package - see [deployment instructions](../deploying_server_baremetal.md) for details.
 
 ```bat
-ovms --rest_port 8000 --model_name Phi-3.5-mini-instruct --model_path models/OpenVINO/Phi-3.5-mini-instruct-int4-ov
+ovms.exe --source_model OpenVINO/Phi-3.5-mini-instruct-int4-ov --model_repository_path models --rest_port 8000 --target_device GPU --cache_size 2
 ```
 :::
 ::::
 
-### 4. Check Model Readiness
+The first run of the command will download https://huggingface.co/OpenVINO/Phi-3.5-mini-instruct-int4-ov to the models/OpenVINO/Phi-3.5-mini-instruct-int4-ov directory and start serving it with ovms.
+Consecutive runs of the command will detect that the model already exists and start serving it.
+
+### 2. Check Model Readiness
 
 Wait for the model to load. You can check the status with a simple command:
 
@@ -63,7 +44,7 @@ curl http://localhost:8000/v1/config
 :::{dropdown} Expected Response
 ```json
 {
-  "Phi-3.5-mini-instruct": {
+  "OpenVINO/Phi-3.5-mini-instruct-int4-ov": {
     "model_version_status": [
       {
         "version": "1",
@@ -79,7 +60,7 @@ curl http://localhost:8000/v1/config
 ```
 :::
 
-### 5. Run Generation
+### 3. Run Generation
 
 ::::{tab-set}
 
@@ -88,7 +69,7 @@ curl http://localhost:8000/v1/config
 curl -s http://localhost:8000/v3/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "Phi-3.5-mini-instruct",
+    "model": "OpenVINO/Phi-3.5-mini-instruct-int4-ov",
     "max_tokens": 30,
     "temperature": 0,
     "stream": false,
@@ -107,12 +88,12 @@ Windows Powershell
 (Invoke-WebRequest -Uri "http://localhost:8000/v3/chat/completions" `
   -Method POST `
   -Headers @{ "Content-Type" = "application/json" } `
-  -Body '{"model": "Phi-3.5-mini-instruct", "max_tokens": 30, "temperature": 0, "stream": false, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What are the 3 main tourist attractions in Paris?"}]}').Content
+  -Body '{"model": "OpenVINO/Phi-3.5-mini-instruct-int4-ov", "max_tokens": 30, "temperature": 0, "stream": false, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What are the 3 main tourist attractions in Paris?"}]}').Content
 ```
 
 Windows Command Prompt
 ```bat
-curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"Phi-3.5-mini-instruct\", \"max_tokens\": 30, \"temperature\": 0, \"stream\": false, \"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What are the 3 main tourist attractions in Paris?\"}]}"
+curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"OpenVINO/Phi-3.5-mini-instruct-int4-ov\", \"max_tokens\": 30, \"temperature\": 0, \"stream\": false, \"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What are the 3 main tourist attractions in Paris?\"}]}"
 ```
 :::
 
@@ -133,7 +114,7 @@ curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/
     }
   ],
   "created": 1744716414,
-  "model": "Phi-3.5-mini-instruct",
+  "model": "OpenVINO/Phi-3.5-mini-instruct-int4-ov",
   "object": "chat.completion",
   "usage": {
     "prompt_tokens": 24,
@@ -161,7 +142,7 @@ client = OpenAI(
 )
 
 stream = client.chat.completions.create(
-    model="Phi-3.5-mini-instruct",
+    model="OpenVINO/Phi-3.5-mini-instruct-int4-ov",
    messages=[{"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the 3 main tourist attractions in Paris?"}
    ],
````
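The reworked quickstart relies on the pull-and-start mode this commit introduces: `--source_model` plus `--model_repository_path` make the server download the model on the first run and reuse the local copy afterwards. Below is a minimal C++ sketch of that decision flow; the paths, messages, and placeholder download step are illustrative assumptions, not the actual OVMS implementation.

```cpp
// Illustrative sketch of the pull-and-start flow described above.
// The download step is a stub; the real server fetches from Hugging Face.
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

int main() {
    const std::string sourceModel = "OpenVINO/Phi-3.5-mini-instruct-int4-ov";
    const fs::path repositoryPath = "models";                 // --model_repository_path
    const fs::path localCopy = repositoryPath / sourceModel;  // models/OpenVINO/...

    if (!fs::exists(localCopy)) {
        // First run: the model is not present yet, so pull it.
        std::cout << "Downloading " << sourceModel << " to " << localCopy << "\n";
        fs::create_directories(localCopy);  // stand-in for the real download
    } else {
        // Consecutive runs: the existing local copy is served as-is.
        std::cout << "Reusing existing model at " << localCopy << "\n";
    }
    std::cout << "Serving " << sourceModel << "\n";
    return 0;
}
```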

src/BUILD

Lines changed: 1 addition & 0 deletions
```diff
@@ -341,6 +341,7 @@ cc_library(
         "@com_github_jarro2783_cxxopts//:cxxopts",
         "libovms_server_settings",
         "libovms_version",
+        "libovmsfilesystem",
         "//src/graph_export:graph_export_types",
         "//src/graph_export:graph_cli_parser",
         "//src/graph_export:rerank_graph_cli_parser",
```

src/capi_frontend/server_settings.hpp

Lines changed: 4 additions & 3 deletions
```diff
@@ -17,6 +17,7 @@
 #include <cstdint>
 #include <optional>
 #include <string>
+#include <variant>
 #include <vector>
 
 #include "../graph_export/graph_export_types.hpp"
@@ -61,14 +62,14 @@ struct RerankGraphSettingsImpl {
 };
 
 struct HFSettingsImpl {
+    std::string targetDevice = "CPU";
     std::string sourceModel = "";
     std::string downloadPath = "";
     bool pullHfModelMode = false;
+    bool pullHfAndStartModelMode = false;
     bool overwriteModels = false;
     ExportType task = text_generation;
-    TextGenGraphSettingsImpl graphSettings;
-    RerankGraphSettingsImpl rerankGraphSettings;
-    EmbeddingsGraphSettingsImpl embeddingsGraphSettings;
+    std::variant<TextGenGraphSettingsImpl, RerankGraphSettingsImpl, EmbeddingsGraphSettingsImpl> graphSettings;
 };
 
 struct ServerSettingsImpl {
```
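Collapsing the three task-specific members into a single `std::variant` means exactly one settings type is active per `HFSettingsImpl`, and consumers must dispatch on it explicitly. A minimal sketch of that dispatch with `std::visit` follows; the empty struct bodies are placeholders standing in for the real structs above, not the actual OVMS code.

```cpp
// Hedged sketch: placeholder structs stand in for the real settings types.
#include <iostream>
#include <type_traits>
#include <variant>

struct TextGenGraphSettingsImpl {};
struct RerankGraphSettingsImpl {};
struct EmbeddingsGraphSettingsImpl {};

using GraphSettings = std::variant<TextGenGraphSettingsImpl,
                                   RerankGraphSettingsImpl,
                                   EmbeddingsGraphSettingsImpl>;

// std::visit invokes the lambda with whichever alternative is active.
void describe(const GraphSettings& settings) {
    std::visit([](const auto& s) {
        using T = std::decay_t<decltype(s)>;
        if constexpr (std::is_same_v<T, TextGenGraphSettingsImpl>)
            std::cout << "text generation graph settings\n";
        else if constexpr (std::is_same_v<T, RerankGraphSettingsImpl>)
            std::cout << "rerank graph settings\n";
        else
            std::cout << "embeddings graph settings\n";
    }, settings);
}

int main() {
    GraphSettings settings = TextGenGraphSettingsImpl{};  // matches task = text_generation
    describe(settings);  // prints: text generation graph settings
    settings = RerankGraphSettingsImpl{};
    describe(settings);  // prints: rerank graph settings
}
```

One practical effect of this layout: reading the wrong task's settings becomes a type error (or a `std::bad_variant_access`) instead of silently consulting a stale sibling struct.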
