
add mamba_chunk_scan_combined and mamba_split_conv1d_scan_combined tests #670


Open · wants to merge 6 commits into main

Conversation

garrett361

This PR adds correctness tests for mamba_chunk_scan_combined and mamba_split_conv1d_scan_combined, which seemed to be missing. Forwards and backwards are tested against their reference implementations. Correctness when providing seq_idx is also tested.
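For context, the forward-correctness pattern such tests follow can be sketched as below. This is a minimal NumPy sketch with hypothetical stand-in functions (`fused_fn`, `reference_fn` are illustrative names, not the real `mamba_ssm` API, which needs a GPU): run the fused implementation and the reference on the same inputs and compare with explicit tolerances.

```python
import numpy as np

def reference_fn(x):
    # Hypothetical reference: a plain float64 cumulative sum, standing in
    # for the pure-PyTorch reference implementation of the fused kernel.
    return np.cumsum(x.astype(np.float64), axis=-1)

def fused_fn(x):
    # Hypothetical "fused kernel" stand-in: the same computation in
    # float32, so it accumulates rounding error like a real kernel would.
    return np.cumsum(x, axis=-1)

def test_forward_matches_reference(rtol=1e-3, atol=1e-2):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 256)).astype(np.float32)
    out = fused_fn(x)
    ref = reference_fn(x).astype(np.float32)
    # Explicit tolerances: float32 reductions cannot match a float64
    # reference bit-for-bit, only to within accumulated rounding error.
    assert np.allclose(out, ref, rtol=rtol, atol=atol), \
        f"max abs diff: {np.abs(out - ref).max()}"

test_forward_matches_reference()
```

The backward tests follow the same shape, comparing gradients instead of outputs.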

@garrett361
Author

@tridao I know the kernels inside mamba_chunk_scan_combined and mamba_split_conv1d_scan_combined are individually tested, but I thought it would be worth adding these more end-to-end tests. Thoughts?

@peterbjorgensen

Any idea why the tolerances need to be that high?
Those tolerances seem very high for float32.
It is probably related to #683 #571

@garrett361
Author

Yes, concerningly high, at least for the backwards pass, where some tests need tol = 1e-1 and/or are sensitive to the random seed.

My first suspicion was that the issue lies in the tests rather than in the kernels, but I haven't found any problems yet. And since the forwards tests pass at reasonable-ish 1e-2/1e-3 levels, any error would have to be fairly subtle.

I have also found some non-determinism with the backwards passes for the D grads. Haven't posted about it yet; will try to today.
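A simple way to surface that kind of non-determinism is to run the same backward computation twice on identical inputs and compare bitwise, not with allclose. A hedged sketch with a deterministic NumPy stand-in (a real check would call the kernel's .backward() twice on a GPU):

```python
import numpy as np

def backward_stand_in(x):
    # Deterministic stand-in for a backward pass (derivative of tanh here).
    # In a real check this would be the kernel's .backward() on a GPU.
    t = np.tanh(x)
    return 1.0 - t ** 2

x = np.random.default_rng(0).standard_normal((8, 64)).astype(np.float32)
g1 = backward_stand_in(x)
g2 = backward_stand_in(x)

# Bitwise comparison, not allclose: any nonzero difference between two
# identical runs flags non-determinism.
assert np.array_equal(g1, g2), f"max abs diff: {np.abs(g1 - g2).max()}"
```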

@garrett361
Author

Also, this is relevant: non-determinism is expected in the backwards pass due to atomic adds, apparently.
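That matches how floating-point atomics behave: float32 addition is not associative, so when atomic adds land in a different order across runs, the accumulated value changes. A small self-contained illustration:

```python
import numpy as np

# float32 addition is not associative: tiny terms added one at a time into
# an already-large accumulator vanish entirely, while summing them first
# preserves them. Atomic adds can land in a different order on every run,
# so accumulated gradients can legitimately differ between runs.
vals = np.array([1.0] + [1e-8] * 10_000, dtype=np.float32)

def seq_sum(a):
    acc = np.float32(0.0)
    for v in a:
        acc = np.float32(acc + v)
    return acc

big_first = seq_sum(vals)          # 1.0 first: every 1e-8 rounds away
small_first = seq_sum(vals[::-1])  # tiny terms first: they survive

print(big_first, small_first)  # differ by roughly 1e-4
```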

@karannb

karannb commented Apr 15, 2025

Any idea why the tolerances need to be that high? Those tolerances seem very high for float32. It is probably related to #683 #571

Hi, thanks for mentioning this. I posted a solution for my case in #571; you might want to check that. I was able to get tolerances down to 1e-8 for all gradients and outputs.
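As a general illustration (not necessarily what the #571 fix does), the reachable tolerance tracks the accumulation dtype: the same order-dependent sum that moves by roughly 1e-4 in float32 is stable to roughly 1e-12 in float64, which is the regime where 1e-8 comparisons become realistic.

```python
import numpy as np

def seq_sum(a):
    # Sequential accumulation in the array's own dtype.
    acc = a.dtype.type(0.0)
    for v in a:
        acc = a.dtype.type(acc + v)
    return acc

vals = [1.0] + [1e-8] * 10_000
diff32 = abs(seq_sum(np.array(vals, dtype=np.float32))
             - seq_sum(np.array(vals[::-1], dtype=np.float32)))
diff64 = abs(seq_sum(np.array(vals, dtype=np.float64))
             - seq_sum(np.array(vals[::-1], dtype=np.float64)))

print(diff32)  # order-dependent by roughly 1e-4 in float32
print(diff64)  # order-independent to roughly 1e-12 or better in float64
```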
