Open
Labels
multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models)
Description
The model to consider.
ColQwen2VL is an efficient document retrieval vision language model based on Qwen2VL, as described in the paper "ColPali: Efficient Document Retrieval with Vision Language Models". The model is designed to generate embeddings rather than text outputs, making it suitable for document retrieval applications.
Support for the model was added to HF Transformers in huggingface/transformers#35778.
An initial attempt to support the model was posted in #14291, but it was made before the HF definition was finalized, so it is now out of date.
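For context on what "embeddings rather than text outputs" means for retrieval, here is a minimal sketch of the late-interaction (MaxSim) scoring described in the ColPali paper. It assumes each query and each document page is already encoded as a matrix of per-token embeddings; the function names `maxsim_score` and `rank_documents` are hypothetical and are not part of vLLM or HF Transformers.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score between one query and one document.

    query_emb: (n_query_tokens, dim) multi-vector query embedding.
    doc_emb:   (n_doc_tokens, dim) multi-vector document embedding.
    For each query token, take the best-matching document token
    (max dot product), then sum those maxima over the query tokens.
    """
    sim = query_emb @ doc_emb.T  # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())


def rank_documents(query_emb: np.ndarray, doc_embs: list) -> list:
    """Return document indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


# Toy example with hand-written 2-dimensional embeddings (illustrative only).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # matches both query tokens
doc_b = np.array([[1.0, 0.0]])              # matches only the first query token
ranking = rank_documents(query, [doc_b, doc_a])
```

Because scoring needs the full per-token matrix rather than a single pooled vector, a serving integration would presumably have to return one embedding per token instead of one per request.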
The closest model vllm already supports.
Qwen2VL is the base model, so the implementation is mostly a wrapper around that backbone.
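To illustrate what "wrapping the backbone" could look like, here is a rough sketch under stated assumptions: the backbone's final hidden states are passed through a linear projection to a smaller embedding dimension, L2-normalized per token, and masked at padding positions, as in the ColPali design. The function name, argument names, and shapes are hypothetical; this is not the HF or vLLM implementation.

```python
import numpy as np


def project_to_multivector(
    hidden_states: np.ndarray,   # (seq_len, hidden_dim), assumed backbone output
    proj_weight: np.ndarray,     # (hidden_dim, embed_dim), assumed projection head
    attention_mask: np.ndarray,  # (seq_len,), 1 for real tokens, 0 for padding
) -> np.ndarray:
    """Turn backbone hidden states into per-token retrieval embeddings."""
    emb = hidden_states @ proj_weight
    # L2-normalize each token embedding; the clip avoids division by zero.
    norms = np.linalg.norm(emb, axis=-1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)
    # Zero out padding positions so they cannot contribute to scoring.
    return emb * attention_mask[:, None]


# Toy inputs: identity projection, second position is padding.
hidden = np.array([[3.0, 4.0], [5.0, 12.0]])
out = project_to_multivector(hidden, np.eye(2), np.array([1.0, 0.0]))
```

The sketch uses NumPy only to keep it self-contained; a real integration would reuse the existing Qwen2VL modules and add the projection head on top.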
What's your difficulty of supporting the model you want?
See previous attempt #14291
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Status
Abandoned