[Model] Update pooling model interface #21058
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request refactors the pooling model interface by changing pooler from a method to a Pooler instance. The changes are consistent across the codebase and improve the interface design. I've identified a critical type inconsistency in the new Pooler abstract base class and a related issue with ClassifierPooler's inheritance that should be addressed to ensure type safety and interface correctness.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Thanks for the cleanup!
Wait a moment: can we rename pooler? How about pooling, or anything except pooler. I planned to discuss this issue in a later PR, but it seems that discussing it here now is the best choice.
Why can't this be named Pooler? As long as weights are mapped properly, this shouldn't matter, right?
I think we can wait for @noooop to show why the pooler name would cause problems but otherwise this PR LGTM.
In principle, we should not use the same name for things that are often used together, as this can easily cause confusion. There is also a specific issue: #20930 can automatically handle the model's default_pooling_type, after which as_embedding_model can theoretically handle BertModel (and other BERT-like models) correctly. However, running as_embedding_model(BertModel) will not succeed, because there is a name conflict between BertModel.pooler and VllmModelForPooling.pooler (both before and after this PR). Therefore, we still need to wrap BertModel with BertEmbeddingModel. (vllm/vllm/model_executor/models/bert.py, line 395 at 01513a3)
In other words, if we rename VllmModelForPooling.pooler, we would no longer need wrappers like BertEmbeddingModel; as_embedding_model would support all models.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Could you clarify a bit? Should |
And isn't this also a problem before this PR? Since both are named |
BertModel has a pooler module:

```python
class BertModel(nn.Module, SupportsQuant):
    packed_modules_mapping = {"qkv_proj": ["query", "key", "value"]}

    def __init__(self,
                 *,
                 vllm_config: VllmConfig,
                 prefix: str = "",
                 embedding_class: type = BertEmbedding,
                 add_pooling_layer: bool = False):
        super().__init__()
        config = vllm_config.model_config.hf_config
        self.embeddings = embedding_class(config)
        self.encoder = BertEncoder(vllm_config=vllm_config,
                                   prefix=f"{prefix}.encoder")
        self.pooler = BertPooler(config) if add_pooling_layer else None  # <- here
```

Previously, BertModel.pooler conflicted with the ModelForPooling.pooler method:

```python
class ModelForPooling(orig_cls, VllmModelForPooling):
    ...

    def pooler(
        self,
        hidden_states: torch.Tensor,
        pooling_metadata: PoolingMetadata,
    ) -> PoolerOutput:
        return self._pooler(hidden_states, pooling_metadata)
```

Now it conflicts with the Pooler instance:

```python
class ModelForPooling(orig_cls, VllmModelForPooling):
    ...

    def __init__(self, ...):  # signature elided in the original comment
        ...
        if not getattr(self, "pooler", None):
            self._init_pooler(vllm_config, prefix=prefix)
        ...
```

In other words, if we rename VllmModelForPooling.pooler, we would no longer need wrappers like BertEmbeddingModel; as_embedding_model would support BertModel.
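The name conflict described above can be demonstrated with a minimal sketch. The class names are borrowed from the discussion, but the bodies are simplified stand-ins, not vLLM's actual implementations:

```python
class BertPooler:
    """Stand-in for BERT's own pooling submodule (an nn.Module in vLLM)."""

class VllmPooler:
    """Stand-in for the vLLM Pooler instance the wrapper wants to install."""

class BertModelSketch:
    def __init__(self, add_pooling_layer: bool = True) -> None:
        # BertModel claims the "pooler" attribute name for its own submodule.
        self.pooler = BertPooler() if add_pooling_layer else None

class ModelForPoolingSketch(BertModelSketch):
    def __init__(self) -> None:
        super().__init__()
        # Mirrors the guard quoted above: only install the vLLM Pooler
        # if the attribute is not already taken.
        if not getattr(self, "pooler", None):
            self.pooler = VllmPooler()

model = ModelForPoolingSketch()
# BertModel's own submodule wins the name, so the wrapper never installs
# the vLLM Pooler: this is why BertEmbeddingModel is still needed.
is_bert_pooler = isinstance(model.pooler, BertPooler)
```

Renaming the interface attribute (e.g. to something other than pooler) would remove the collision, which is the point being argued here.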
It feels weird.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Is anyone able to repro the CI failure in language models test? It passes for me locally...
I was suspecting that somehow |
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Yeah, it seems to happen for me on Python 3.11 but not on Python 3.10 for whatever reason... fixing now
In my case it fails and |
Can you pull the latest commit and try again? See if it works now.
Instead of using |
It works now.
```python
class ModelForPooling(orig_cls, VllmModelForPooling):
    is_pooling_model = True
```
Isn't this achieved already by deriving VllmModelForPooling? In the docs we also say that all pooling models implement VllmModelForPooling, but not all do. Could this be a cause of confusion?
VllmModelForPooling is only an interface; we don't explicitly derive from it, just like generative models don't explicitly derive from VllmModelForTextGeneration.
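The "interface without explicit derivation" pattern mentioned here can be sketched with a runtime-checkable Protocol. The names below are illustrative stand-ins, not vLLM's actual definitions:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsPooling(Protocol):
    """Checked structurally (by attribute presence), not by inheritance."""
    is_pooling_model: bool

class MyPoolingModel:
    # Satisfies the interface without deriving from it, which is how
    # the is_pooling_model flag is used in the snippet above.
    is_pooling_model = True

model = MyPoolingModel()
# runtime_checkable protocols make isinstance check for the attribute,
# not the class hierarchy, so this passes despite no subclassing.
ok = isinstance(model, SupportsPooling)
```

This is why setting a class attribute like is_pooling_model = True is enough, even though the model class never lists the interface as a base.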
```python
return SimplePooler.from_config(resolved_config)
```

```python
def get_pooling_params(self, task: PoolingTask) -> Optional[PoolingParams]:
```
What is the intended use of get_pooling_params()? Will it get called from serving_embedding.py somehow?
It will be called by:
- LLMEngine (and its async version) to validate that the request is supported by the model.
- The model runner, in order to get information such as use_cross_encoder and logits_processing_needs_token_ids.
The task will be set by our code at the API level.
For example:
- Score API: We set task="score"
- LLMEngine: Call get_pooling_params with the task to see if it's supported
- Model runner: Call get_pooling_params to pass use_cross_encoder to the pooler

This abstraction lets each model define how to handle each task, instead of having static logic at the API level.
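A minimal sketch of the dispatch flow described in this thread. The shapes of PoolingTask and PoolingParams follow the discussion but are simplified assumptions, not vLLM's actual definitions:

```python
from dataclasses import dataclass
from typing import Literal, Optional

PoolingTask = Literal["embed", "classify", "score"]

@dataclass
class PoolingParams:
    use_cross_encoder: bool = False

class ExamplePooler:
    def get_pooling_params(self, task: PoolingTask) -> Optional[PoolingParams]:
        # Each model's pooler decides which tasks it supports and what
        # information the model runner needs for them.
        if task == "score":
            return PoolingParams(use_cross_encoder=True)
        if task == "embed":
            return PoolingParams()
        return None  # task not supported by this model

# Engine-side validation, as described above: a None return means
# the request's task is rejected before it reaches the model runner.
pooler = ExamplePooler()
score_params = pooler.get_pooling_params("score")
embed_params = pooler.get_pooling_params("embed")
classify_params = pooler.get_pooling_params("classify")
```

The model runner would then read fields like use_cross_encoder off the returned params instead of hard-coding per-API logic.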
Yeah, this is good; we're starting to accumulate too much logic at the entrypoint level.

Just to understand the last detail: is EmbeddingCompletionRequest.to_pooling_params() going to be replaced with something like EmbeddingCompletionRequest.to_pooling_task()?
No, since we still have some parameters (e.g. dimensions) that need to be forwarded. I will add a task attribute to PoolingParams so that the task can be set in to_pooling_params.
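A hedged sketch of what the proposed change might look like. The task attribute and the request class below are assumptions based on this comment, not the merged implementation:

```python
from dataclasses import dataclass
from typing import Literal, Optional

PoolingTask = Literal["embed", "classify", "score"]

@dataclass
class PoolingParams:
    dimensions: Optional[int] = None    # still forwarded from the request
    task: Optional[PoolingTask] = None  # proposed: set by the API layer

@dataclass
class EmbeddingRequestSketch:  # stand-in for EmbeddingCompletionRequest
    dimensions: Optional[int] = None

    def to_pooling_params(self) -> PoolingParams:
        # The task is fixed by the endpoint; the rest comes from the request.
        return PoolingParams(dimensions=self.dimensions, task="embed")

params = EmbeddingRequestSketch(dimensions=256).to_pooling_params()
```

This keeps to_pooling_params as the single conversion point while still carrying the task downstream, which is why a separate to_pooling_task method isn't needed.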
@noooop and @DarkLight1337, the wrapper classes are useful as a place for custom logic that is needed to handle the idiosyncrasies of BERT-like models. For example, in #19988 I'm using the RobertaEmbeddingModel class to fix the position ids. I moved the logic out of the RobertaEmbedding class because I couldn't find a way to make it work with cuda graphs.
…ecture

Update JinaVLForEmbedding to align with PR vllm-project#21058's pooling model interface:
- Add is_pooling_model = True class attribute
- Create JinaVLPooler class inheriting from Pooler base class
- Move vision-aware pooling logic into JinaVLPooler
- Implement get_pooling_params method returning PoolingParams() for "embed" task
- Replace pooler method with pooler attribute
- Add required imports: PoolingTask, PoolingParams, assert_never

The JinaVLPooler maintains the sophisticated vision-text pooling behavior while conforming to the new architecture requirements.

Signed-off-by: Sigrid Jin (Sionic AI) <sigrid@sionic.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Essential Elements of an Effective PR Description Checklist

- Update supported_models.md and examples for a new model.

Purpose

Update the VllmModelForPooling interface so that pooler now must be a Pooler instance instead of a method. This enables the model runner to directly fetch information from the Pooler instance in subsequent PRs.

cc @maxdebayser @noooop

Test Plan

The existing tests should pass

Test Result

(Optional) Documentation Update
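The before/after shape of the interface change described in the Purpose can be sketched as follows. This is a simplified illustration with placeholder pooling logic, not vLLM's actual classes:

```python
class Pooler:
    """Simplified stand-in for vLLM's Pooler base class."""

    def __call__(self, hidden_states: list[float]) -> float:
        # Mean pooling as a placeholder for real pooling logic.
        return sum(hidden_states) / len(hidden_states)

# Before this PR: pooler was a method defined on the model, so the
# engine could only call it, not inspect it.
class OldStyleModel:
    def pooler(self, hidden_states: list[float]) -> float:
        return sum(hidden_states) / len(hidden_states)

# After this PR: pooler is a Pooler instance attribute, so the model
# runner can fetch information from it directly (e.g. pooling params).
class NewStyleModel:
    is_pooling_model = True

    def __init__(self) -> None:
        self.pooler = Pooler()

model = NewStyleModel()
result = model.pooler([1.0, 2.0, 3.0])  # mean of the inputs: 2.0
```

Both styles produce the same pooled output here; the difference is that the instance form gives the runner an object to query.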