Conversation

jiawenliu64 (Member)
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1910

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100 (a reference sketch of the wgrad computation follows below)
- Optimize performance of pretraining MoE shapes on H100
- Support --total-K in quantize_bench for wgrad (see the total_K sketch below)

Differential Revision: D82325651
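
For context on the first bullet: in MoE wgrad, each expert's weight gradient is dW_e = X_eᵀ · dY_e over the tokens routed to that expert, and a grouped GEMM fuses these variable-size GEMMs into a single kernel launch. Below is a minimal pure-PyTorch reference for what such a kernel computes; it is only a semantic sketch under assumed shapes and names, not FBGEMM's CUTLASS implementation.

```python
# Semantic sketch of grouped-GEMM wgrad. The PR dispatches one CUTLASS
# kernel; this Python loop only illustrates the math it computes.
import torch

def grouped_wgrad_reference(x, dy, m_sizes):
    """Per-expert weight gradient: dW_e = X_e^T @ dY_e.

    x:       [total_M, K] activations for all experts, concatenated by group
    dy:      [total_M, N] output gradients, same row grouping as x
    m_sizes: tokens routed to each expert (sums to total_M)
    Returns a [num_experts, K, N] stack of weight gradients.
    """
    grads, start = [], 0
    for m in m_sizes:
        x_e = x[start : start + m]    # [m, K] rows for this expert
        dy_e = dy[start : start + m]  # [m, N]
        grads.append(x_e.T @ dy_e)    # [K, N] wgrad for this expert
        start += m
    return torch.stack(grads)

# Tiny smoke test with made-up sizes.
x = torch.randn(6, 4)
dy = torch.randn(6, 3)
dw = grouped_wgrad_reference(x, dy, m_sizes=[2, 1, 3])
print(dw.shape)  # torch.Size([3, 4, 3])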
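On the third bullet: in wgrad the per-expert token count is the contraction dimension of each GEMM, so group sizes vary along K rather than M; presumably that is why quantize_bench gains a --total-K option giving the summed contraction size across groups. A hedged sketch of one plausible way to split it (an even partition; the actual quantize_bench logic may differ):

```python
# Assumption: total_K is partitioned roughly evenly across groups, with the
# remainder going to the first groups. Purely illustrative, not the real
# quantize_bench behavior.
def split_total_k(total_k: int, num_groups: int) -> list[int]:
    base, rem = divmod(total_k, num_groups)
    return [base + (1 if g < rem else 0) for g in range(num_groups)]

print(split_total_k(8192, 5))  # [1639, 1639, 1638, 1638, 1638]
```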


netlify bot commented Sep 17, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | b5c9a1a |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68caf1a7165a4e0008dbe9b5 |
| 😎 Deploy Preview | https://deploy-preview-4886--pytorch-fbgemm-docs.netlify.app |

meta-cla bot added the `cla signed` label Sep 17, 2025
facebook-github-bot (Contributor)

@jiawenliu64 has exported this pull request. If you are a Meta employee, you can view the originating diff in D82325651.

jiawenliu64 added a commit to jiawenliu64/FBGEMM that referenced this pull request Sep 17, 2025
…ytorch#4886)

Summary:
Pull Request resolved: pytorch#4886

X-link: facebookresearch/FBGEMM#1910

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining MoE shapes on H100
- Support total_K in quantize_bench for wgrad

Differential Revision: D82325651

jiawenliu64 added a commit to jiawenliu64/FBGEMM that referenced this pull request Sep 17, 2025

…ytorch#4886)

Summary:
Pull Request resolved: pytorch#4886

X-link: facebookresearch/FBGEMM#1910

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining MoE shapes on H100
- Support total_K in quantize_bench for wgrad

Reviewed By: q10

Differential Revision: D82325651

facebook-github-bot (Contributor)

This pull request has been merged in 57c2293.

facebook-github-bot (Contributor)

This pull request has been reverted by 53f9e51.
