[New Model]: Support ColQwen2VL #19381

### The model to consider.

ColQwen2VL is a vision language model for efficient document retrieval, built on Qwen2VL and described in the paper "ColPali: Efficient Document Retrieval with Vision Language Models". It generates embeddings rather than text outputs, which makes it suitable for document retrieval applications.
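To illustrate what "embeddings rather than text outputs" means for retrieval, here is a minimal sketch of late-interaction (MaxSim) scoring, which is how ColPali-style models typically compare a query with a page. It assumes the model emits one vector per query token and one per image patch; the function names `maxsim_score` and `rank_pages` are hypothetical and are not part of vLLM or Transformers.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """For each query vector, take its best-matching page vector, then sum the maxima."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vectors, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (made up for illustration): two query token vectors, two pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
page_b = [[1.0, 0.0]]              # matches only the first query token
ranking = rank_pages(query, [page_a, page_b])
```

Because each input yields a variable-length set of vectors rather than a single pooled vector, a serving implementation would presumably need to return per-token embeddings and leave scoring to the caller or to a separate scoring step.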

It has been supported in HF Transformers since https://github.com/huggingface/transformers/pull/35778

An initial attempt to support the model was posted in https://github.com/vllm-project/vllm/pull/14291, but it was made before the HF definition was finalized, so it grew out of date.

### The closest model vllm already supports.

Qwen2VL is the base model, so supporting ColQwen2VL should mostly be a matter of wrapping that backbone.
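As a rough sketch of what "wrapping that backbone" could look like, the snippet below takes per-token hidden states from a backbone, applies a linear projection, and L2-normalizes each vector. This is an assumption based on how ColPali-style models are usually described, not the actual vLLM or Transformers implementation; `RetrievalWrapper`, `backbone`, and `projection` are hypothetical names, and the backbone here is a stand-in callable rather than a real Qwen2VL model.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


class RetrievalWrapper:
    """Hypothetical wrapper: backbone hidden states -> projected, normalized embeddings."""

    def __init__(self, backbone: Callable[[str], Matrix], projection: Matrix) -> None:
        self.backbone = backbone      # assumed to return one hidden vector per token
        self.projection = projection  # shape: [embed_dim][hidden_dim]

    def embed(self, inputs: str) -> Matrix:
        hidden_states = self.backbone(inputs)
        embeddings: Matrix = []
        for hidden in hidden_states:
            # Linear projection into the (smaller) retrieval embedding space.
            projected = [sum(w * x for w, x in zip(row, hidden)) for row in self.projection]
            # L2-normalize so that dot products behave like cosine similarities.
            norm = math.sqrt(sum(v * v for v in projected)) or 1.0
            embeddings.append([v / norm for v in projected])
        return embeddings


def fake_backbone(_: str) -> Matrix:
    """Stand-in for the real backbone: returns one 3-dimensional hidden state."""
    return [[3.0, 4.0, 0.0]]


wrapper = RetrievalWrapper(fake_backbone, projection=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
output = wrapper.embed("example input")
```

The point of the sketch is only that the generation head is replaced by a small embedding head; the multimodal input processing would be inherited from the existing Qwen2VL support.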

### What's your difficulty of supporting the model you want?

See the previous attempt: https://github.com/vllm-project/vllm/pull/14291

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
