[Serve.llm] The LLM serve APIs don't work on some VLMs like OpenGVLab/InternVL2_5-1B-MPO #52594
Labels: bug, community-backlog, P0
What happened + What you expected to happen
There is a problem with how the chat template is resolved for vision-language models when comparing vLLM's OpenAI server with the Ray Serve LLM APIs.
The issue appears to be rooted in ad-hoc preprocessing that the OpenAI server performs on prompts before passing them to the engine, which seems to be missing from the LLM engine request submission path in Ray Serve LLM.
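For reference, the kind of request that exercises this path against either OpenAI-compatible endpoint looks roughly like the following; the endpoint URL, API key, and image URL are illustrative placeholders, not taken from the original report.

```python
from openai import OpenAI

# Point the client at whichever server is being tested
# (vLLM's OpenAI server or the Ray Serve LLM deployment).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

response = client.chat.completions.create(
    model="OpenGVLab/InternVL2_5-1B-MPO",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The structured `image_url` content part is what triggers the model-specific chat-template handling being compared here.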
Versions / Dependencies
N/A
Reproduction script
Compare the following vLLM command with the corresponding Serve LLM deployment:

vLLM code

Serve LLM code
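As a minimal sketch of such a comparison (the snippets referenced above are placeholders): the vLLM side can be brought up with the CLI, e.g. `vllm serve OpenGVLab/InternVL2_5-1B-MPO --trust-remote-code`, while a comparable Ray Serve LLM deployment might look like the following. The model id comes from the title; all other values (replica counts, engine kwargs) are illustrative assumptions.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Illustrative config -- values are assumptions, not from the original report.
llm_config = LLMConfig(
    model_loading_config=dict(model_id="OpenGVLab/InternVL2_5-1B-MPO"),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
    # Forwarded to the underlying vLLM engine.
    engine_kwargs=dict(trust_remote_code=True),
)

# Build an OpenAI-compatible Serve app backed by the vLLM engine and run it.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Sending the same multimodal chat request to both endpoints is what surfaces the difference described below.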
For example, for this particular model, here is the difference in the conversation that `tokenizer.apply_chat_template()` gets applied to.

On vLLM: the image is replaced with the model's image placeholder tag. This is done in the OpenAI server logic, and the resulting conversation is then passed into the tokenizer's chat template.

On Serve LLM: this substitution is not applied, so the conversation reaching the chat template still contains the raw image content.
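A rough illustration of that difference, assuming `<image>` is the placeholder this model's template expects (that exact tag is an assumption, not quoted from the original logs):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "OpenGVLab/InternVL2_5-1B-MPO", trust_remote_code=True
)

# Shape of the conversation on the vLLM OpenAI-server path: the image content
# part has been folded into the text as a placeholder tag before templating.
conversation_openai_server = [
    {"role": "user", "content": "<image>\nDescribe this image."},
]

# Shape of the conversation on the Serve LLM path: the raw OpenAI-style
# structured content, with no placeholder substitution applied.
conversation_serve_llm = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    },
]

for name, conv in [
    ("openai server path", conversation_openai_server),
    ("serve llm path", conversation_serve_llm),
]:
    try:
        print(f"--- {name} ---")
        print(tokenizer.apply_chat_template(conv, tokenize=False))
    except Exception as exc:
        # The raw structured form may not be accepted by the template at all.
        print(f"failed: {exc}")
```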
Issue Severity
High: It blocks me from completing my task.