Hi authors, thanks for the amazing work!

I am a little confused: in your paper [Transformers are SSMs] you say 'Note the analogy to standard attention architectures, where X, B, C correspond to the Q, K, V projections that are created in parallel.' But in your code [mamba2_simple.py], the comments say
# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality
Which is correct?
Thanks
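For reference, here is a minimal sketch (not the repository's code) of the SSD quadratic "dual" form Y = (L ∘ C Bᵀ) X from the paper, in the scalar-identity case. Comparing it with attention Y = (mask ∘ Q Kᵀ) V is what the code comment's ordering (X ↔ V, B ↔ K, C ↔ Q) refers to; the function name and the per-step decay a are illustrative.

```python
# Minimal sketch of the SSD quadratic ("attention-like") form from the paper,
# assuming the scalar-identity case. Not the authors' implementation.
import torch

def ssd_quadratic_form(X, B, C, a):
    """X: (T, P) value-like, B: (T, N) key-like, C: (T, N) query-like,
    a: (T,) per-step decay in (0, 1]. Names are illustrative."""
    # L[t, s] = a_{s+1} * ... * a_t for s <= t, else 0 (1-semiseparable causal mask)
    log_a = torch.log(a)
    cum = torch.cumsum(log_a, dim=0)
    L = torch.tril(torch.exp(cum[:, None] - cum[None, :]))
    # Attention-like scores: C sits in the query position, B in the key position
    scores = (C @ B.T) * L            # (T, T)
    # X sits in the value position
    return scores @ X                 # (T, P)

# Toy usage
T, N, P = 8, 4, 16
X, B, C = torch.randn(T, P), torch.randn(T, N), torch.randn(T, N)
a = torch.rand(T) * 0.5 + 0.5
Y = ssd_quadratic_form(X, B, C, a)
print(Y.shape)  # torch.Size([8, 16])
```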