Try LITE-style MLP over token interaction matrix #242
Replies: 2 comments
-
Nice! This is something we've been thinking about, also for classification. You would have two separate models, a query model and a document model, and those are optimized jointly, so that's pretty similar. I do question the necessity of actually computing the token interaction itself, since Model2Vec/Potion token vectors are static (context-independent) and the sentence embedding is just the mean of those token vectors, so much of the token-level signal is already present in the mean.
So, for example: "the earth is green" and the token "green" by itself already have a high similarity, simply because the vector of "green" directly participates in the mean, without any intermediate transformations. But yes, this is something we were thinking of, just by concatenating the means and sending those through an MLP. What do you think?
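For concreteness, here's a minimal sketch of what that concat-of-means head could look like (PyTorch; the `ConcatMeanScorer` name, hidden size, and shapes are placeholders, not anything that exists in Model2Vec):

```python
import torch
import torch.nn as nn


class ConcatMeanScorer(nn.Module):
    """MLP over concat(mean_Q, mean_D): scores a query-document pair
    from the concatenation of their mean (sentence) embeddings."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, mean_q: torch.Tensor, mean_d: torch.Tensor) -> torch.Tensor:
        # mean_q, mean_d: (batch, dim) mean-pooled embeddings, e.g. from Potion
        return self.mlp(torch.cat([mean_q, mean_d], dim=-1)).squeeze(-1)
```

With Potion the mean embeddings would come straight from `StaticModel.from_pretrained("minishlab/potion-base-8M").encode(...)`, so only the small MLP (and optionally the two jointly trained embedding models) would need training.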
-
This is a fantastic insight, thank you! Your explanation of how Model2Vec/Potion's context-independent token vectors and mean embeddings shape the approach to interaction modeling is really helpful. The MLP(concat(mean_Q, mean_D)) idea is compelling and plays to the strengths you described; I can see it as a strong path forward.

You've given me a lot to think about regarding whether the full token interaction matrix is needed when the base token vectors are static. It raises an interesting question: what is the trade-off between the computational cost of building and processing the similarity matrix S and the finer-grained alignment signals it might offer a downstream MLP, compared to just using the mean embeddings? It would be very interesting to compare these two MLP-based late-interaction approaches with Potion as the embedder (perhaps with jointly optimized query and document models). It seems like a neat way to harness Potion's efficiency for more complex ranking tasks. Thanks again for your time and thoughts!
-
Hi! 👋
I've been exploring the paper from Google Research:
Efficient Document Ranking with Learnable Late Interactions (LITE).
In it, they compute a token-wise similarity matrix between query and document embeddings (post-transformer), then pass that through a small MLP to predict a relevance score. This gives near cross-encoder performance at dual-encoder efficiency.
Given that Model2Vec embedding models (especially Potion) capture ~90% of transformer performance while being lightweight and fast, I wonder:
Would it be worth experimenting with a LITE-style architecture using Model2Vec as the embedding model?
The idea would be:
1. Encode query and document tokens with Potion's static token embeddings.
2. Compute the token-wise similarity matrix between the two.
3. Pass that matrix through a small MLP that outputs a relevance score.
If Potion encodes enough token-level semantics, this could yield a fast and strong late-interaction model, and it might even generalize well across model families!
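For illustration, here is a minimal sketch of that pipeline on top of static token vectors (PyTorch; the class name and the fixed `len_q`/`len_d` padding are assumptions, and flattening S into one MLP is a simplification of the more structured MLPs described in the LITE paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenInteractionScorer(nn.Module):
    """Simplified LITE-style head: build the cosine similarity matrix S
    between query and document token vectors, then score it with a small MLP.

    Assumes token sequences are padded/truncated to fixed lengths
    (len_q, len_d) so S has a fixed shape."""

    def __init__(self, len_q: int, len_d: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(len_q * len_d, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tok_q: torch.Tensor, tok_d: torch.Tensor) -> torch.Tensor:
        # tok_q: (batch, len_q, dim), tok_d: (batch, len_d, dim)
        # static token vectors, e.g. rows of the Potion token embedding table
        tok_q = F.normalize(tok_q, dim=-1)
        tok_d = F.normalize(tok_d, dim=-1)
        s = torch.bmm(tok_q, tok_d.transpose(1, 2))  # S: (batch, len_q, len_d)
        return self.mlp(s.flatten(1)).squeeze(-1)
```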