
[Bug] Template inconsistency causes different results between vLLM and lmdeploy when using InternVL2 #1208

@Sugar-zsg

Description


Checklist

  • 1. I have searched related issues but could not get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submit lacks the corresponding environment info and a minimal reproducible demo, it will be difficult for us to reproduce and resolve it, which reduces the likelihood of receiving feedback.

Describe the bug

When running inference with InternVL2, I found that vLLM and lmdeploy produce inconsistent results. After investigation, the difference appears to be caused by a chat-template mismatch between the two frameworks.

lmdeploy uses its built-in prompt template.

vLLM relies on the chat_template defined in the model's configuration file.

In testing, I noticed that the template vLLM applies does not include the following system-prompt section:

<|im_start|>system
你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫 InternVL,是一个有用无害的人工智能助手。<|im_end|>

(Translation: "You are the 书生 multimodal large model developed jointly by Shanghai AI Laboratory and SenseTime; your English name is InternVL. You are a helpful and harmless AI assistant.")
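To check what vLLM actually renders, one can apply the model's bundled chat template directly with Hugging Face transformers and look for this system prompt in the output. The snippet below is a minimal sketch: it assumes vLLM falls back to the chat_template stored in the model's tokenizer config, and it uses the OpenGVLab/InternVL2-2B checkpoint listed in the environment section.

# Sketch: render the chat template shipped with the model and check
# whether the default system prompt above is included.
from transformers import AutoTokenizer

MODEL_PATH = "OpenGVLab/InternVL2-2B"  # same model as in Environment below

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

messages = [{"role": "user", "content": "Describe this image."}]

# tokenize=False returns the rendered prompt string, which can be
# compared against the prompt lmdeploy builds from its built-in template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# If the printed prompt starts directly with "<|im_start|>user", the
# default system prompt is indeed missing from this template.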

Reproduction

vllm:
vllm serve ${MODEL_PATH} \
    --enforce-eager \
    --trust-remote-code \
    --gpu-memory-utilization 0.6 \
    --port 8000

lmdeploy:
lmdeploy serve api_server ${MODEL_PATH} \
    --model-name ${MODEL_NAME} \
    --server-port 8000 \
    --tp 1
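One way to take the template difference out of the comparison, assuming both servers expose the OpenAI-compatible chat API on the ports above, is to pass the system prompt explicitly in the request so that both frameworks render the same prefix. This is only a sketch of a workaround, not a fix; the base URL, API key, and served model name are placeholders to adjust per server.

# Sketch: send an explicit system message so vLLM's rendered prompt
# matches lmdeploy's built-in template. Run once against each server,
# changing only base_url, and compare the outputs.
from openai import OpenAI

SYSTEM_PROMPT = (
    "你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,"
    "英文名叫 InternVL,是一个有用无害的人工智能助手。"
)

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

response = client.chat.completions.create(
    model="InternVL2-2B",  # placeholder; use the name reported by /v1/models
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Describe this image."},
    ],
    temperature=0.0,  # near-greedy decoding makes the comparison fairer
)
print(response.choices[0].message.content)

Running the same script against both endpoints, changing only base_url and the model name, should show whether the remaining output difference goes away once the system prompt is identical.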

Environment

GPU: H20
vllm: 0.6.3
lmdeploy: 0.9.2
model: OpenGVLab/InternVL2-2B

Error traceback
