Commit 2a1691a

Removed the note regarding 24.05 being in development (#44)
1 parent: f77614e

1 file changed: 0 additions, 3 deletions

docs/llama_multi_lora_tutorial.md

@@ -61,9 +61,6 @@ sudo docker run --gpus all -it --net=host -p 8001:8001 --shm-size=12G \
 Triton's vLLM container has been introduced starting from 23.10 release, and `multi-lora` experimental support was added in vLLM v0.3.0 release.
 
 > Docker image version `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3` or higher version is strongly recommended.
-
-> [!IMPORTANT]
-> 24.05 release is still under active development, and relevant NGC containers are not available at this time.
 ---
 
 For **pre-24.05 containers**, the docker images didn't support multi-lora feature, so you need to replace that provided in the container `/opt/tritonserver/backends/vllm/model.py` with the most up to date version. Just follow this command:
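The hunk ends at the sentence introducing that replacement command, so the command itself falls outside the diff context. As a hedged sketch only (the `wget` invocation, raw URL, and branch are assumptions, not part of this commit), overwriting the container's copy with the latest `model.py` from the `triton-inference-server/vllm_backend` repository could look like:

```bash
# Hypothetical sketch: the raw URL and branch are assumptions, not shown in
# this commit. Overwrite the model.py shipped in a pre-24.05 container with
# the current version from the vLLM backend repository.
wget -O /opt/tritonserver/backends/vllm/model.py \
  https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
```

This would be run inside the started container (or baked into a derived image) before launching `tritonserver`.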

0 commit comments