Hi,
I noticed that when the fast path is disabled for Mamba2, the model's perplexity increases. I tested a 130M Mamba2 model: WikiText perplexity measured with EleutherAI's lm-evaluation-harness is 31 when using the fast path and ~38 when not using it (torch_forward). Do you know why this happens?
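For context on how large this gap is: since perplexity is the exponential of the mean per-token negative log-likelihood, the reported 31 vs. ~38 figures can be converted into an average per-token log-likelihood difference. This is a small back-of-the-envelope calculation using only the numbers above, not a claim about the cause:

```python
import math

# Perplexities reported above for the two code paths.
ppl_fast = 31.0   # fast path (fused kernels)
ppl_slow = 38.0   # torch_forward fallback

# PPL = exp(mean NLL), so mean NLL = log(PPL).
nll_fast = math.log(ppl_fast)
nll_slow = math.log(ppl_slow)

# Extra negative log-likelihood per token implied by the slower path.
delta_nll = nll_slow - nll_fast
print(f"{delta_nll:.4f} nats/token")  # ~0.20 nats/token
```

A systematic shift of ~0.2 nats per token is far larger than ordinary floating-point noise between two equivalent implementations, which suggests the two paths are not numerically equivalent (e.g. a differing state/discretization computation) rather than just accumulating rounding error.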