feat: Initial state support for Mamba SSM (1) #488

mzusman · 2024-07-24T09:47:40Z

Add chunked prefill / use initial state capability to Mamba ssm ( Mamba 1 ) , Done it by prepending the last forward pass state to the FWD pass kernel and read the data accordingly .

Latency is not affected. ( benchmark script shows similar latencies between this PR and main - 130ms )
Added tests that check correctness when running on chunks.

Limitations:

Applied only for selective scan fwd pass ( bwd pass is not supported )

This PR enables efficient Speculative decoding, prefix caching and prefill chunking.

FIX #233 #473 #258 #101

daphneOdera-618 · 2024-09-02T17:21:34Z

@mzusman I've noticed you made changes to files in the csrc directory, but I'm having trouble getting these changes to take effect in my environment. Could you please tell me the exact instructions to rebuild and install the mamba_ssm package so the changes are applied? It seems I always get the original package using pip install .Thank you!

mzusman · 2024-09-03T07:28:14Z

@daphneOdera-618 Yeah, the default setup.py behaviour is to download the upstream's wheel upon "installing", What you would need to do to force build is to add MAMBA_FORCE_BUILD=TRUE pip install .

jploski · 2025-06-09T14:03:33Z

Unfortunately, this PR changes the API for selective_scan_cuda.fwd in an incompatible way. The same API is also invoked in MambaInnerFn.forward besides of SelectiveScanFn.forward, leading to runtime errors in code which uses MambaInnerFn (e.g. the Mamba implementation found in the transformers library while running in vanilla training mode without cache_params).

I think MambaInnerFn.forward could be modified to use the new API version, but I don't know how to produce the prerequisite additional empty vector (x) from what is available in MambaInnerFn.fowrard.

jploski · 2025-06-09T14:18:36Z

Unfortunately, this PR changes the API for selective_scan_cuda.fwd in an incompatible way. The same API is also invoked in MambaInnerFn.forward besides of SelectiveScanFn.forward, leading to runtime errors in code which uses MambaInnerFn (e.g. the Mamba implementation found in the transformers library while running in vanilla training mode without cache_params).

I think MambaInnerFn.forward could be modified to use the new API version, but I don't know how to produce the prerequisite additional empty vector (x) from what is available in MambaInnerFn.fowrard.

Since conv1d_out in MambeInnerFn seems to play the same role as u in SelectiveScanFn, adding this hack in place of the original invocation of selective_scan_cuda.fwd seems to work:

        u = conv1d_out
        n_chunks = int((u.shape[-1] + 2048 - 1) / 2048)
        _x = torch.zeros(
            (u.shape[0], u.shape[1], n_chunks, int(A.shape[1] * 2),),
            device=u.device,
            dtype=torch.float32,
            requires_grad=u.requires_grad
        )
        _x[:, :, 0, 0::2] = 1
#        if prev_state is not None:
#            _x[:, :, 0, 1::2].copy_(prev_state)
        out, scan_intermediates, out_z = selective_scan_cuda.fwd(u, delta, A, B, C, D, z, delta_bias, delta_softplus, _x)

mzusman added 6 commits July 23, 2024 03:31

Working, need to clean up

23f42a1

Clean up

26a76ff

Add chunked mamba tests

028fe7b

Test chunked vs not chuned

59f20cb

Comments

643dfbf

Add assert in backward pass

84a2ec8

mzusman changed the title ~~feat: Chunked prefill for Mamba SSM (1)~~ feat: Initial state support for Mamba SSM (1) Jul 24, 2024

mzusman mentioned this pull request Aug 19, 2024

[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM vllm-project/vllm#7651

Merged

Hprairie mentioned this pull request Dec 12, 2024

Adding Initial Value Support to Selective Scan Forward Kernel #285

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Initial state support for Mamba SSM (1) #488

feat: Initial state support for Mamba SSM (1) #488

Uh oh!

mzusman commented Jul 24, 2024 •

edited

Loading

Uh oh!

daphneOdera-618 commented Sep 2, 2024 •

edited

Loading

Uh oh!

mzusman commented Sep 3, 2024

Uh oh!

jploski commented Jun 9, 2025

Uh oh!

jploski commented Jun 9, 2025

Uh oh!

Uh oh!

feat: Initial state support for Mamba SSM (1) #488

Are you sure you want to change the base?

feat: Initial state support for Mamba SSM (1) #488

Uh oh!

Conversation

mzusman commented Jul 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

daphneOdera-618 commented Sep 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mzusman commented Sep 3, 2024

Uh oh!

jploski commented Jun 9, 2025

Uh oh!

jploski commented Jun 9, 2025

Uh oh!

Uh oh!

mzusman commented Jul 24, 2024 •

edited

Loading

daphneOdera-618 commented Sep 2, 2024 •

edited

Loading