Description
I'm trying to deploy Qwen 1.5B on Vertex AI Endpoints. The deployment crashes for Qwen 1.5B, while Qwen 7B deploys perfectly fine, using the same Hugging Face TRL configuration (other than the base model) to train both. Note that training and local inference work fine for both 1.5B and 7B (a local-inference sketch follows the requirements list below). The container I'm using is us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-4.ubuntu2204.py311. My requirements.txt for the training / local-inference setup is as follows:
accelerate==1.4.0
deepspeed==0.16.3
importlib-metadata==8.6.1
transformers==4.49.0
trl @ git+https://github.com/huggingface/trl@v0.15.1
protobuf==5.29.3
sentencepiece==0.2.0
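
For reference, local inference on the fine-tuned checkpoints is nothing exotic; a standard transformers load-and-generate along these lines works (the checkpoint path here is a placeholder for my actual output directory, not a real path):

```python
# Minimal local-inference sanity check (sketch; "./qwen-1.5b-finetuned"
# is a placeholder for the fine-tuned checkpoint directory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen-1.5b-finetuned"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate, pinned above
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```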
Crash logs from the container referenced above are attached:
aiplatform_endpoints_crash.log
Container environment variables:
serving_container_environment_variables={
"NUM_SHARD": "1",
"MAX_INPUT_TOKENS": "512",
"MAX_TOTAL_TOKENS": "1024",
"MAX_BATCH_PREFILL_TOKENS": "1512",
"CUDA_LAUNCH_BLOCKING": "1", # Debug for Qwen 1.5B
"TORCH_USE_CUDA_DSA": "1", # Debug for Qwen 1.5B
}
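
For completeness, the upload/deploy call looks roughly like this (a sketch using the google-cloud-aiplatform SDK; the project, region, artifact URI, and machine/accelerator types are placeholders, not my exact values):

```python
# Sketch of the Vertex AI upload/deploy flow (google-cloud-aiplatform SDK).
# Project, location, artifact URI, and machine/accelerator types are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

model = aiplatform.Model.upload(
    display_name="qwen-1_5b-tgi",
    serving_container_image_uri=(
        "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
        "huggingface-text-generation-inference-cu124.2-4.ubuntu2204.py311"
    ),
    artifact_uri="gs://my-bucket/qwen-1.5b-finetuned",  # placeholder
    serving_container_environment_variables={
        "NUM_SHARD": "1",
        "MAX_INPUT_TOKENS": "512",
        "MAX_TOTAL_TOKENS": "1024",
        "MAX_BATCH_PREFILL_TOKENS": "1512",
        "CUDA_LAUNCH_BLOCKING": "1",  # debug for Qwen 1.5B
        "TORCH_USE_CUDA_DSA": "1",    # debug for Qwen 1.5B
    },
)

endpoint = model.deploy(
    machine_type="g2-standard-8",   # placeholder
    accelerator_type="NVIDIA_L4",   # placeholder
    accelerator_count=1,
)
```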
I wonder whether there's some version mismatch between the training and serving containers, or whether TGI 2.4.0 is simply too old/buggy, since the latest release of text-generation-inference appears to be 3.1.0. Is there a newer container I can try?