@@ -3,8 +3,11 @@
![Mamba](assets/selection.png "Selective State Space")
> **Mamba: Linear-Time Sequence Modeling with Selective State Spaces**\
> Albert Gu*, Tri Dao*\
- > Paper: https://arxiv.org/abs/2312.00752\
- > **Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality**\
+ > Paper: https://arxiv.org/abs/2312.00752
+
+ ![Mamba-2](assets/ssd_algorithm.png "State Space Dual Model")
+ > **Transformers are SSMs: Generalized Models and Efficient Algorithms**\
+ > **Through Structured State Space Duality**\
> Tri Dao*, Albert Gu*\
> Paper: https://arxiv.org/abs/2405.21060

@@ -63,6 +66,8 @@ y = model(x)
assert y.shape == x.shape
```

+ ### Mamba-2
+
The Mamba-2 block is implemented at [modules/mamba2.py](mamba_ssm/modules/mamba2.py).

A simpler version is at [modules/mamba2_simple.py](mamba_ssm/modules/mamba2_simple.py)
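
For orientation, the `Mamba2` block is used the same way as the Mamba block whose example tail appears in the hunk above: construct it with a model dimension and run it over a `(batch, length, dim)` tensor. A minimal sketch follows; the specific hyperparameter values (`d_state=64`, `d_conv=4`, `expand=2`) are illustrative assumptions, not taken from this diff.

```python
# Minimal usage sketch of the Mamba-2 block; hyperparameter values are assumed.
import torch
from mamba_ssm import Mamba2

batch, length, dim = 2, 64, 256
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba2(
    d_model=dim,  # model dimension
    d_state=64,   # SSM state expansion factor (assumed)
    d_conv=4,     # local convolution width (assumed)
    expand=2,     # block expansion factor (assumed)
).to("cuda")
y = model(x)  # maps (batch, length, dim) -> (batch, length, dim)
assert y.shape == x.shape
```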
@@ -81,6 +86,11 @@ y = model(x)
assert y.shape == x.shape
```

+ #### SSD
+
+ A minimal version of the inner SSD module (Listing 1 from the Mamba-2 paper), with conversion between "discrete" and "continuous" SSM versions,
+ is at [modules/ssd_minimal.py](mamba_ssm/modules/ssd_minimal.py).
+
### Mamba Language Model

Finally, we provide an example of a complete language model: a deep sequence model backbone (with repeating Mamba blocks) + language model head.
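
The diff does not show the interface of `ssd_minimal.py`. As a rough sketch, assuming the Listing 1 conventions from the Mamba-2 paper (tensors shaped `(batch, length, heads, ...)`, a scalar decay per head and timestep, and a chunk length that divides the sequence length), a call to the discrete-time version might look like the following; the function name, signature, and return values are assumptions:

```python
# Rough sketch of the minimal SSD module's discrete-time entry point; the
# name, signature, shapes, and return values are assumptions based on
# Listing 1 of the Mamba-2 paper, not on this diff.
import torch
from mamba_ssm.modules.ssd_minimal import ssd_minimal_discrete  # assumed name

batch, length, n_heads, d_head, d_state = 2, 256, 4, 64, 16
X = torch.randn(batch, length, n_heads, d_head)   # inputs
A = -torch.rand(batch, length, n_heads)           # per-head log decay, already discretized (negative)
B = torch.randn(batch, length, n_heads, d_state)  # input-expansion projections
C = torch.randn(batch, length, n_heads, d_state)  # output-contraction projections

# block_len is the chunk size of the blocked algorithm and must divide `length`.
Y, final_state = ssd_minimal_discrete(X, A, B, C, block_len=64)
assert Y.shape == X.shape
```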
@@ -205,6 +215,7 @@ If you use this codebase, or otherwise find our work valuable, please cite Mamba
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
+
@inproceedings{mamba2,
  title={Transformers are {SSM}s: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
  author={Dao, Tri and Gu, Albert},