Describe the bug
When running inference with InternVL2, I found that vLLM and lmdeploy produce inconsistent results. After investigation, it seems that the issue is caused by a template mismatch between the two frameworks.
lmdeploy uses its built-in prompt template.
vLLM relies on the chat_template defined in the model's tokenizer configuration file.
In testing, I noticed that vLLM’s template does not include the following prompt section:
<|im_start|>system
你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫 InternVL,是一个有用无害的人工智能助手。<|im_end|>
(Translation of the system prompt: "You are InternVL, a multimodal large model developed jointly by Shanghai AI Laboratory and SenseTime; you are a helpful and harmless AI assistant.")
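One way to verify which template vLLM actually applies is to render the chat_template stored with the model via transformers. This is a minimal sketch, assuming transformers is installed and MODEL_PATH points to the same OpenGVLab/InternVL2-2B checkpoint used in the commands below:

from transformers import AutoTokenizer

# Placeholder path; substitute the same ${MODEL_PATH} used in the serve commands below.
MODEL_PATH = "OpenGVLab/InternVL2-2B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# A plain text-only turn is enough to see which system prompt the template injects.
messages = [{"role": "user", "content": "Who are you?"}]

rendered = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(rendered)
# If the printed prompt lacks the <|im_start|>system ... <|im_end|> block quoted above,
# vLLM will serve requests without InternVL's default system prompt.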
Reproduction
vllm:
vllm serve ${MODEL_PATH} \
    --enforce-eager \
    --trust-remote-code \
    --gpu-memory-utilization 0.6 \
    --port 8000
lmdeploy:
lmdeploy serve api_server ${MODEL_PATH} \
    --model-name ${MODEL_NAME} \
    --server-port 8000 \
    --tp 1
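To make the inconsistency easy to reproduce, the same request can be sent to both OpenAI-compatible endpoints and the replies compared. A minimal sketch, assuming the openai Python package is installed and the two servers are reachable on the URLs below (both commands above bind port 8000, so one server would need a different port; 8001 is a placeholder):

from openai import OpenAI

# Assumed endpoints for the two servers started above.
ENDPOINTS = {
    "vllm": "http://localhost:8000/v1",
    "lmdeploy": "http://localhost:8001/v1",
}

# An identity question makes a missing system prompt easy to spot in the reply.
messages = [{"role": "user", "content": "Who are you?"}]

for name, base_url in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    model_id = client.models.list().data[0].id  # use whatever model id the server reports
    resp = client.chat.completions.create(model=model_id, messages=messages, temperature=0)
    print(f"[{name}] {resp.choices[0].message.content}")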
Environment
GPU: H20
vllm: 0.6.3
lmdeploy: 0.9.2
model: OpenGVLab/InternVL2-2B
Error traceback
No error is raised; the two backends simply return different outputs for the same request.