State-space language models such as Mamba match Transformer quality while enabling linear-complexity inference, yet they still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) approach to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends readily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, making SparseSSM the current state-of-the-art pruning algorithm for Mamba-based LLMs.
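For intuition, below is a minimal sketch of the kind of OBS-style saliency score described above, assuming a diagonal (SparseGPT-style) Hessian approximation accumulated from calibration activations at every time step. Tensor names and shapes are illustrative placeholders, not the repository's actual API.

```python
import torch

def obs_saliency_over_time(A_diag, hessian_diag_per_step):
    """Illustrative OBS-style saliency for a diagonal, time-shared SSM matrix.

    A_diag:                (d_inner, d_state) diagonal transition parameters.
    hessian_diag_per_step: (T, d_inner, d_state) diagonal Hessian estimates,
                           one per time step (e.g., squared calibration
                           activations). Pruning the lowest-scoring entries
                           first minimizes the second-order estimate of the
                           layer-output perturbation.
    """
    # The parameter is shared over all T steps, so aggregate curvature over time.
    h_agg = hessian_diag_per_step.sum(dim=0).clamp_min(1e-8)
    # Classic OBS saliency with a diagonal Hessian: w^2 * H_ii / 2.
    return 0.5 * A_diag.pow(2) * h_agg

# Toy usage with random tensors standing in for real calibration statistics.
d_inner, d_state, T = 8, 4, 16
A = torch.randn(d_inner, d_state)
H = torch.rand(T, d_inner, d_state)
scores = obs_saliency_over_time(A, H)
# Keep roughly the top 50% most salient entries.
mask = scores >= scores.flatten().kthvalue(scores.numel() // 2).values
A_pruned = A * mask
```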
```bash
git clone https://github.com/CFinTech/SparseSSM
cd SparseSSM
pip install -r requirements.txt
```

The data for calibration can be downloaded here.
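If you prefer to draw the calibration samples yourself, the sketch below pulls `--nsamples` random sequences from the Hugging Face `wikitext-2-raw-v1` training split. The tokenizer name and sequence length are assumptions for illustration, not necessarily the settings used by this repository.

```python
# Hypothetical calibration loader; dataset and tokenizer names are assumptions.
import random
from datasets import load_dataset
from transformers import AutoTokenizer

def get_wikitext2_calibration(nsamples=64, seqlen=2048, seed=0,
                              tokenizer_name="EleutherAI/gpt-neox-20b"):
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    enc = tok("\n\n".join(data["text"]), return_tensors="pt")
    random.seed(seed)
    samples = []
    for _ in range(nsamples):
        start = random.randint(0, enc.input_ids.shape[1] - seqlen - 1)
        samples.append(enc.input_ids[:, start:start + seqlen])
    return samples
```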
To prune the SSM module, you can run the following command:
```bash
CUDA_VISIBLE_DEVICES=${your_gpu_id} python main.py \
    path/to/your/model wikitext2 \
    --experiment_name your_experiment_name \
    --method "sparsessm_dev" \
    --save path/to/pruned_model \
    --sparsity 0.5 \
    --nsamples 64 \
    --minlayer 0 \
    --maxlayer 100 \
    --prune_A True \
    --do_prune \
    --eval_zero_shot \
    --log_wandb
```
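To make `--sparsity 0.5` concrete, the sketch below shows how a per-tensor saliency score can be turned into an unstructured mask, together with a 2:4-style semi-structured variant of the kind the paper mentions. It is illustrative only and not the script's internal code path.

```python
import torch

def unstructured_mask(scores, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the lowest saliency."""
    k = int(scores.numel() * sparsity)
    thresh = scores.flatten().kthvalue(k).values
    return scores > thresh

def semi_structured_mask(scores, n=2, m=4):
    """N:M mask: within every group of `m` weights, keep the `n` most salient."""
    flat = scores.reshape(-1, m)
    keep = torch.zeros_like(flat, dtype=torch.bool)
    keep.scatter_(1, flat.topk(n, dim=1).indices, True)
    return keep.reshape(scores.shape)

# Example: prune a toy weight tensor using magnitude-squared saliency.
W = torch.randn(16, 64)
scores = W.pow(2)
W_pruned = W * unstructured_mask(scores, sparsity=0.5)
```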
*Figure: Illustration of SparseSSM. The first row depicts the evolution of the diagonal parameter matrix.*

*Figure: Performance analysis for one-shot unstructured pruning of SSM modules in Mamba models.*
- This source code is derived from the well-known PyTorch reimplementation of SparseGPT and from mamba-minimal.
- We use Mamba checkpoints to test our method.
- The README file is inspired by LLM-pruner.
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Kaiwen Tuo and Huan Wang},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025},
}
```


