[mxfp8 moe training] add support for bias parm in _to_mxfp8_then_scaled_grouped_mm by danielvegamyhre · Pull Request #4386 · pytorch/ao

danielvegamyhre · 2026-05-12T18:01:25Z

Fixes #4341

Summary

The public torch.nn.functional.scaled_grouped_mm API supports a bias parameter that the private torch._scaled_grouped_mm API does not.
torchao's _to_mxfp8_then_scaled_grouped_mm was built before the public scaled_grouped_mm API existed, so it does not accept bias.
Users of the public scaled_grouped_mm API are now reporting errors caused by this parameter mismatch; see #4341, "MXFP8 MoE Training: Bias parameter mismatch between PyTorch's grouped_mm and TorchAO's _quantize_then_scaled_grouped_mm causes unexpected keyword argument error".
This PR adds support for bias.
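
For readers unfamiliar with the failure mode, here is a minimal pure-Python sketch of the mismatch and of one way an optional bias can be threaded through a grouped matmul. It is not the torchao implementation: the function names, the list-of-lists matrices, the cumulative-end-offset meaning of offs, and the "one bias row per group" layout are all assumptions made for illustration, and no quantization is modeled.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, standing in for a real GEMM kernel."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_mm_no_bias(a: Matrix, b_groups: List[Matrix], offs: List[int]) -> Matrix:
    """Hypothetical wrapper written before `bias` existed.

    Assumption: offs[g] is the cumulative end row of group g in `a`.
    """
    out: Matrix = []
    start = 0
    for b, end in zip(b_groups, offs):
        out.extend(matmul(a[start:end], b))
        start = end
    return out


def grouped_mm_with_bias(
    a: Matrix,
    b_groups: List[Matrix],
    offs: List[int],
    bias: Optional[Matrix] = None,
) -> Matrix:
    """Same computation, plus an optional bias.

    Assumption: bias[g] is a single row added to every output row of group g.
    """
    out = grouped_mm_no_bias(a, b_groups, offs)
    if bias is None:
        return out
    start = 0
    for g, end in enumerate(offs):
        for r in range(start, end):
            out[r] = [v + bv for v, bv in zip(out[r], bias[g])]
        start = end
    return out


a = [[1, 2], [3, 4], [5, 6]]             # 3 token rows, hidden size 2
b_groups = [[[1, 0], [0, 1]],            # group 0 weight: identity
            [[2, 0], [0, 2]]]            # group 1 weight: 2 * identity
offs = [1, 3]                            # group 0 owns row 0, group 1 owns rows 1-2
bias = [[10, 10], [100, 100]]            # one bias row per group (assumed layout)

# A caller that passes `bias` to the older signature hits the reported error class.
try:
    grouped_mm_no_bias(a, b_groups, offs, bias=bias)
except TypeError as exc:
    mismatch_error = str(exc)

with_bias = grouped_mm_with_bias(a, b_groups, offs, bias=bias)
print(mismatch_error)
print(with_bias)  # [[11, 12], [106, 108], [110, 112]]
```

Making bias optional with a default of None keeps existing callers working while allowing code written against a bias-aware signature to pass the argument through.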

pytorch-bot · 2026-05-12T18:01:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4386

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull jobs on OSDC in pull requests shadow mode

❌ 7 New Failures

As of commit 1a8a442 with merge base 6529fca:

NEW FAILURES - The following jobs have failed:

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio mslk --index-url https://download.... / linux-job (gh)
RuntimeError: Command docker exec -t 9f589518d42a2331df282bf87d6dde1ba576027f9450128c0e03720f7605900b /exec failed with exit code 2
Run Regression Tests / test (CPU 2.8, linux.4xlarge, torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytor... / linux-job (gh)
RuntimeError: Command docker exec -t 8c5171569c096002e40fa0f8eacd2e845353b91b3e1741e190a0ab4dc4b03ff0 /exec failed with exit code 2
Run Regression Tests / test (CPU 2.9, linux.4xlarge, torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytor... / linux-job (gh)
RuntimeError: Command docker exec -t aaef53a85c55e1acd3616cb58bd3d72a0c618f80a497e0d0967a394dcd847c32 /exec failed with exit code 2
Run Regression Tests / test (CUDA 2.10, linux.g5.12xlarge.nvidia.gpu, torch==2.10.0 torchvision==0.25.0, cuda, 12.6) / linux-job (gh)
RuntimeError: Command docker exec -t 391383c6edeaa357f8a34980e8a61f2718f961c0077a84a94283a798969c5767 /exec failed with exit code 1
Run Regression Tests / test (CUDA 2.8, linux.g5.12xlarge.nvidia.gpu, torch==2.8.0 torchvision==0.23.0, cuda, 12.6) / linux-job (gh)
RuntimeError: Command docker exec -t a4d494ecf6a9a2a8cd91c9c43013e9fcdfd5106f58d5f9e3624cf76436fe4c95 /exec failed with exit code 2
Run Regression Tests / test (CUDA 2.9, linux.g5.12xlarge.nvidia.gpu, torch==2.9.1 torchvision==0.24.1, cuda, 12.6) / linux-job (gh)
RuntimeError: Command docker exec -t 4866c04b5afe42fb8ba24dd71ee87028c5675b365854df048316d9d3c2baa8bf /exec failed with exit code 2
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch torchvision --index-url htt... / linux-job (gh)
RuntimeError: Command docker exec -t 64e56bb426fd74409b0e817a4012103488e7f185f5e32542a69c1020c6626ee2 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre added 3 commits April 29, 2026 14:22

work on supporting bias in mxfp8 grouped mm

0a99b57

tweak output_dtype param

93ade49

update tests

1a8a442

danielvegamyhre requested review from jerryzh168 and vkuzo as code owners May 12, 2026 18:01

danielvegamyhre added the mx, module: training (quantize_ api training flow), and moe labels May 12, 2026

meta-cla Bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) May 12, 2026

danielvegamyhre wants to merge 3 commits into main from bias428
