We know that flash attention supports `cu_seqlens`, which removes padding from variable-length inputs in a batch so that only the real tokens are stored. This is useful for improving computational efficiency when packing multiple short sequences.

Does Mamba have a similar mechanism, such as support for variable-length input or `cu_seqlens`, like flash attention does?
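
To make the question concrete, here is a minimal sketch of what I mean by `cu_seqlens`: the cumulative sequence lengths that mark where each sequence starts and ends inside one packed, padding-free buffer. The helper names `pack_sequences` and `unpack_sequences` are hypothetical, written only for illustration; this is plain Python, not the flash attention or Mamba API.

```python
from itertools import accumulate


def pack_sequences(seqs):
    """Concatenate variable-length sequences without padding.

    Returns the packed token list and the cumulative sequence lengths.
    (Hypothetical helper for illustration only.)
    """
    lengths = [len(s) for s in seqs]
    # cu_seqlens[i] is the offset where sequence i starts in the packed
    # buffer; the final entry equals the total number of tokens.
    cu_seqlens = [0] + list(accumulate(lengths))
    packed = [tok for s in seqs for tok in s]
    return packed, cu_seqlens


def unpack_sequences(packed, cu_seqlens):
    """Recover the original sequences from the packed buffer."""
    return [packed[start:end]
            for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


batch = [[11, 12, 13], [21], [31, 32]]
packed, cu_seqlens = pack_sequences(batch)
print(packed)      # [11, 12, 13, 21, 31, 32]
print(cu_seqlens)  # [0, 3, 4, 6]
```

With padding, this batch would occupy 3 x 3 = 9 slots; packed, it occupies 6. What I am asking is whether Mamba can take such boundaries into account so that state does not carry over from one packed sequence into the next.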