[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
- **LATEST RELEASE: You are currently on the main branch which tracks
- under-development progress towards the next release. The current release branch
- is [r24.01](https://github.com/triton-inference-server/vllm_backend/tree/r24.01)
- and which corresponds to the 24.01 container release on
- [NVIDIA GPU Cloud (NGC)](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver).**
-
# vLLM Backend
The Triton backend for [vLLM](https://github.com/vllm-project/vllm)
@@ -81,7 +75,14 @@ script.
A sample command to build a Triton Server container with all options enabled is shown below. Feel free to customize flags according to your needs.
+ Please use the [NGC registry](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags)
+ to get the latest version of the Triton vLLM container, which corresponds to the
+ latest YY.MM (year.month) [Triton release](https://github.com/triton-inference-server/server/releases).
+
+
```
+ # YY.MM is the version of Triton.
+ export TRITON_CONTAINER_VERSION=<YY.MM>
./build.py -v --enable-logging
--enable-stats
--enable-tracing
@@ -96,9 +97,9 @@ A sample command to build a Triton Server container with all options enabled is
--endpoint=grpc
--endpoint=sagemaker
--endpoint=vertex-ai
- --upstream-container-version=24.01
- --backend=python:r24.01
- --backend=vllm:r24.01
+ --upstream-container-version=${TRITON_CONTAINER_VERSION}
+ --backend=python:r${TRITON_CONTAINER_VERSION}
+ --backend=vllm:r${TRITON_CONTAINER_VERSION}
```
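
As a concrete sketch of these instructions: the `24.01` value below is only an illustrative tag (it is the release mentioned in the note above), so substitute the latest YY.MM tag from the NGC registry; the flag list is trimmed to a subset of the options shown above, and backslash line continuations are added for copy-paste convenience.

```
# Illustrative only: substitute the latest YY.MM tag from the NGC registry.
export TRITON_CONTAINER_VERSION=24.01

./build.py -v --enable-logging \
    --enable-stats \
    --enable-tracing \
    --endpoint=grpc \
    --upstream-container-version=${TRITON_CONTAINER_VERSION} \
    --backend=python:r${TRITON_CONTAINER_VERSION} \
    --backend=vllm:r${TRITON_CONTAINER_VERSION}
```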
### Option 3. Add the vLLM Backend to the Default Triton Container