Does Mamba support variable-length input or cu_seqlens like FlashAttention? #180

Open
@zigzagcai

We know that FlashAttention supports cu_seqlens (cumulative sequence lengths), which removes padding from variable-length inputs in a batch so that only the real tokens are stored. This improves computational efficiency when packing multiple short sequences into one batch.

So, does Mamba have a similar mechanism for variable-length input, such as cu_seqlens in FlashAttention?
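
To make the question concrete, here is a minimal sketch of the packing layout I mean. It is plain Python rather than any real library call, and the helper names `pack_sequences` and `unpack_sequences` are hypothetical, chosen only for illustration. The sequences are concatenated into one flat list, and cu_seqlens holds the cumulative start offsets, beginning at 0 and ending at the total token count.

```python
from itertools import accumulate


def pack_sequences(seqs):
    """Concatenate variable-length sequences without padding.

    Returns the flat token list and the cumulative offsets (cu_seqlens),
    which has len(seqs) + 1 entries: sequence i occupies
    packed[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    lengths = [len(s) for s in seqs]
    cu_seqlens = [0] + list(accumulate(lengths))
    packed = [tok for s in seqs for tok in s]
    return packed, cu_seqlens


def unpack_sequences(packed, cu_seqlens):
    """Recover the original sequences from the packed layout."""
    return [packed[a:b] for a, b in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three sequences of lengths 3, 1, and 2 (hypothetical token ids).
batch = [[11, 12, 13], [21], [31, 32]]
packed, cu_seqlens = pack_sequences(batch)

print(packed)      # [11, 12, 13, 21, 31, 32]
print(cu_seqlens)  # [0, 3, 4, 6]
```

A padded batch here would need 3 x 3 = 9 slots, while the packed layout stores only 6 tokens. For a recurrent model, a packed layout like this would presumably also require that state not carry over across the boundaries given by cu_seqlens, which is part of why I am asking whether such an interface exists.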
