X,B,C correspond to Q,K,V or to V,K,Q #705

Open
MichaelWangGo opened this issue Mar 11, 2025 · 3 comments

Comments

@MichaelWangGo

Hi authors, thanks for the amazing work!

I am a little confused: in your paper [Transformers are SSMs], you say, 'Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.' But in your code [mamba2_simple.py], the comments say

# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality

Which one is correct?

Thanks

@tridao
Collaborator

tridao commented Mar 11, 2025

X <-> V
B <-> K
C <-> Q
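
To make the mapping concrete, here is a minimal sketch of the quadratic "attention" form of the SSD duality from the paper. This is illustrative only, not the repo's code; the shapes T, N, P and the variable names are assumed, and the state decay is dropped (the A = 0 case), so the mask L is just a causal lower-triangular matrix of ones. With C in the Q slot, B in the K slot, and X in the V slot, the masked matrix form Y = (L ∘ C Bᵀ) X reproduces the linear recurrence exactly:

```python
# Minimal sketch of the SSM/attention duality (illustrative only; not
# the code in mamba2_simple.py). Shapes T, N, P are hypothetical.
import torch

T, N, P = 6, 4, 8          # seq length, state dim, head dim (assumed)
X = torch.randn(T, P)      # input branch   -> V in the attention analogy
B = torch.randn(T, N)      # input proj     -> K
C = torch.randn(T, N)      # output proj    -> Q

# Quadratic form: Y[t] = sum_{s<=t} (C_t . B_s) X_s, i.e. (L * C B^T) X,
# which matches (Q K^T) V up to the causal mask and the missing softmax.
L_mask = torch.tril(torch.ones(T, T))
Y_attn = (L_mask * (C @ B.T)) @ X

# Equivalent linear recurrence (the SSM view with A = 0):
# state h_t = h_{t-1} + B_t X_t^T, readout y_t = C_t^T h_t.
h = torch.zeros(N, P)
Y_rec = []
for t in range(T):
    h = h + torch.outer(B[t], X[t])   # rank-1 state update
    Y_rec.append(C[t] @ h)            # (N,) @ (N, P) -> (P,)
Y_rec = torch.stack(Y_rec)

assert torch.allclose(Y_attn, Y_rec, atol=1e-5)
```

Reading the two forms side by side is what fixes the roles: C is contracted against past B's the way queries are contracted against keys, and X is what gets mixed, like values.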

@albertfgu
Contributor

> Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.

Where in the paper does it say this? That would be a mistake on our end.

@MichaelWangGo
Author

> > Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.
>
> Where in the paper does it say this? That would be a mistake on our end.

Thanks for the reply.

It says here:

[Image: screenshot of the paper passage containing the quoted sentence]
