Try LITE-style MLP over token interaction matrix #242
Replies: 2 comments
-
Nice! This is something we've been thinking about, also for classification. You would have two separate models, a query model and a document model, and those are optimized jointly, so that's pretty similar. I do question the necessity of actually computing the token interaction itself, since Model2Vec/Potion token vectors are static (context-independent) and the sentence embedding is just the mean of those token vectors, so much of the token-level signal is already present in the mean.
So, for example: "the earth is green" and the token "green" by itself already have a high similarity, simply because the vector of "green" directly participates in the mean, without any intermediate transformations. But yes, this is something we were thinking of, just by concatenating the means and sending those through an MLP. What do you think?
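For concreteness, here's a minimal sketch of what that concat-of-means head could look like (PyTorch; the `ConcatMeanScorer` name, hidden size, and shapes are placeholders, not anything that exists in Model2Vec):

```python
import torch
import torch.nn as nn


class ConcatMeanScorer(nn.Module):
    """MLP over concat(mean_Q, mean_D): scores a query-document pair
    from the concatenation of their mean (sentence) embeddings."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, mean_q: torch.Tensor, mean_d: torch.Tensor) -> torch.Tensor:
        # mean_q, mean_d: (batch, dim) mean-pooled embeddings, e.g. from Potion
        return self.mlp(torch.cat([mean_q, mean_d], dim=-1)).squeeze(-1)
```

With Potion the mean embeddings would come straight from `StaticModel.from_pretrained("minishlab/potion-base-8M").encode(...)`, so only the small MLP (and optionally the two jointly trained embedding models) would need training.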
-
This is a fantastic insight, thank you! Your explanation of how Model2Vec/Potion's context-independent token vectors and mean embeddings shape the approach to interaction modeling is really helpful. The MLP(concat(mean_Q, mean_D)) idea is compelling and plays to the strengths you described; I can see it as a strong path forward.

You've given me a lot to think about regarding whether the full token interaction matrix is needed when the base token vectors are static. It raises an interesting question: what is the trade-off between the computational cost of building and processing the similarity matrix S and the finer-grained alignment signals it might offer a downstream MLP, compared to just using the mean embeddings? It would be very interesting to compare these two MLP-based late-interaction approaches with Potion as the embedder (perhaps with jointly optimized query and document models). It seems like a neat way to harness Potion's efficiency for more complex ranking tasks. Thanks again for your time and thoughts!
-
Hi! 👋
I've been exploring the paper from Google Research:
Efficient Document Ranking with Learnable Late Interactions (LITE).
In it, they compute a token-wise similarity matrix between query and document embeddings (post-transformer), then pass that through a small MLP to predict a relevance score. This gives near cross-encoder performance at dual-encoder efficiency.
Given that Model2Vec embedding models (especially Potion) capture ~90% of transformer performance while being lightweight and fast, I wonder:
Would it be worth experimenting with a LITE-style architecture using Model2Vec as the embedding model?
The idea would be:
1. Encode query and document tokens with Potion's static token embeddings.
2. Compute the token-wise similarity matrix between the two.
3. Pass that matrix through a small MLP that outputs a relevance score.
If Potion encodes enough token-level semantics, this could yield a fast and strong late-interaction model, and it might even generalize well across model families!
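For illustration, here is a minimal sketch of that pipeline on top of static token vectors (PyTorch; the class name and the fixed `len_q`/`len_d` padding are assumptions, and flattening S into one MLP is a simplification of the more structured MLPs described in the LITE paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenInteractionScorer(nn.Module):
    """Simplified LITE-style head: build the cosine similarity matrix S
    between query and document token vectors, then score it with a small MLP.

    Assumes token sequences are padded/truncated to fixed lengths
    (len_q, len_d) so S has a fixed shape."""

    def __init__(self, len_q: int, len_d: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(len_q * len_d, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tok_q: torch.Tensor, tok_d: torch.Tensor) -> torch.Tensor:
        # tok_q: (batch, len_q, dim), tok_d: (batch, len_d, dim)
        # static token vectors, e.g. rows of the Potion token embedding table
        tok_q = F.normalize(tok_q, dim=-1)
        tok_d = F.normalize(tok_d, dim=-1)
        s = torch.bmm(tok_q, tok_d.transpose(1, 2))  # S: (batch, len_q, len_d)
        return self.mlp(s.flatten(1)).squeeze(-1)
```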