docs/llm/quickstart.md
15 additions & 34 deletions
@@ -6,53 +6,34 @@ It is [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5
 ## Requirements
 - Linux or Windows 11
 - Docker Engine or `ovms` binary package [installed](../deploying_server_baremetal.md)
-- Intel iGPU or ARC GPU
+- Intel iGPU or ARC GPU
 
 ## Deployment Steps
-
-### 1. Install Python dependencies:
-```console
-pip3 install huggingface_hub jinja2
-```
-
-### 2. Download and Prepare the Model:
-Using the `export_model.py` script, download the OpenVINO model and prepare the models repository, including all configuration required for deployment with OpenVINO Model Server. For details, see [Exporting GEN AI Models](../../demos/common/export_models/README.md).
LLM engine parameters will be defined inside the `graph.pbtxt` file.
-
-> **Note:** Users in China need to set the environment variable `HF_ENDPOINT="https://hf-mirror.com"` before running the export script to connect to the HF Hub.
-
-> **Note:** If you want to export models outside of the `OpenVINO` organization on Hugging Face, you need to install additional Python dependencies:
The first run of the command downloads https://huggingface.co/OpenVINO/Phi-3.5-mini-instruct-int4-ov to the `models/OpenVINO/Phi-3.5-mini-instruct-int4-ov` directory and starts serving it with `ovms`.
+Subsequent runs of the command check that the model exists and start serving it.
+
+### 2. Check Model Readiness
 
 Wait for the model to load. You can check the status with a simple command:
--Body '{"model": "Phi-3.5-mini-instruct", "max_tokens": 30, "temperature": 0, "stream": false, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What are the 3 main tourist attractions in Paris?"}]}').Content
+-Body '{"model": "OpenVINO/Phi-3.5-mini-instruct-int4-ov", "max_tokens": 30, "temperature": 0, "stream": false, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What are the 3 main tourist attractions in Paris?"}]}').Content
 ```
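The same chat completions request can also be sent from a script. The following is a minimal Python sketch using only the standard library. It assumes the server is reachable at `localhost:8000`, as in the commands in this quickstart, and that the response follows the OpenAI-style chat completions layout. The helper names `build_chat_payload`, `send_chat_request`, and `main` are hypothetical and are not part of OpenVINO Model Server.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl example in this quickstart.
ENDPOINT = "http://localhost:8000/v3/chat/completions"
# Model name used by the updated request examples.
MODEL = "OpenVINO/Phi-3.5-mini-instruct-int4-ov"


def build_chat_payload(model: str, user_prompt: str, max_tokens: int = 30) -> dict:
    """Build the same JSON body that the PowerShell and curl examples send."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": 0,
        "stream": False,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }


def send_chat_request(payload: dict, endpoint: str = ENDPOINT, timeout: float = 60.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    """Send the sample prompt; requires a running server, so it is not called on import."""
    payload = build_chat_payload(MODEL, "What are the 3 main tourist attractions in Paris?")
    reply = send_chat_request(payload)
    # Assumes an OpenAI-style response body; adjust if the layout differs.
    print(reply["choices"][0]["message"]["content"])
```

Call `main()` once the model reports as loaded. Building the payload in a separate function keeps the request body inspectable without a running server.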
 
 Windows Command Prompt
 ```bat
-curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"Phi-3.5-mini-instruct\", \"max_tokens\": 30, \"temperature\": 0, \"stream\": false, \"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What are the 3 main tourist attractions in Paris?\"}]}"
+curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"OpenVINO/Phi-3.5-mini-instruct-int4-ov\", \"max_tokens\": 30, \"temperature\": 0, \"stream\": false, \"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What are the 3 main tourist attractions in Paris?\"}]}"