Hi authors, thanks for the amazing work!

I am a little confused: in your paper [Transformers are SSMs] you say 'Note the analogy to standard attention architectures, where X, B, C correspond to the Q, K, V projections that are created in parallel.' But in your code [mamba2_simple.py], the comments say
# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality
Which is correct?
Thanks
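For reference, here is a minimal sketch (not the repository's code) of the SSD quadratic "dual" form Y = (L ∘ C Bᵀ) X from the paper, in the scalar-identity case. Comparing it with attention Y = (mask ∘ Q Kᵀ) V is what the code comment's ordering (X ↔ V, B ↔ K, C ↔ Q) refers to; the function name and the per-step decay a are illustrative.

```python
# Minimal sketch of the SSD quadratic ("attention-like") form from the paper,
# assuming the scalar-identity case. Not the authors' implementation.
import torch

def ssd_quadratic_form(X, B, C, a):
    """X: (T, P) value-like, B: (T, N) key-like, C: (T, N) query-like,
    a: (T,) per-step decay in (0, 1]. Names are illustrative."""
    # L[t, s] = a_{s+1} * ... * a_t for s <= t, else 0 (1-semiseparable causal mask)
    log_a = torch.log(a)
    cum = torch.cumsum(log_a, dim=0)
    L = torch.tril(torch.exp(cum[:, None] - cum[None, :]))
    # Attention-like scores: C sits in the query position, B in the key position
    scores = (C @ B.T) * L            # (T, T)
    # X sits in the value position
    return scores @ X                 # (T, P)

# Toy usage
T, N, P = 8, 4, 16
X, B, C = torch.randn(T, P), torch.randn(T, N), torch.randn(T, N)
a = torch.rand(T) * 0.5 + 0.5
Y = ssd_quadratic_form(X, B, C, a)
print(Y.shape)  # torch.Size([8, 16])
```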