jiawenliu64
Member

Summary:
The preshuffled BF16I4 batched GEMM is much faster than the non-preshuffled BF16I4 batched GEMM, especially for memory-bound shapes. It will be used to integrate fbgemm with other OSS systems.

Add a BF16I4 preshuffled batched GEMM to unblock the integration, as requested.

Differential Revision: D77311295

Josh Fromm and others added 2 commits June 24, 2025 10:32
Summary:
Some integrations of fbgemm kernels with OSS systems such as vLLM would be simpler if preshuffled tensors could be sliced. Prior to this diff, there were two blockers to doing that:
- Scales were required to be contiguous. This is easily addressed by setting the stride argument more carefully.
- Shuffled tensors have a non-trivial layout. We add a Python helper function for slicing int4 shuffled tensors. Notably, it involves some data copying that I believe is unavoidable. Hopefully it only needs to be done during model setup.
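
The second point can be illustrated with a minimal pure-Python sketch. It is a toy, not the helper added in this diff: the tile size, the permutation, and every function name below are hypothetical and do not reflect the actual fbgemm layout. It only shows why slicing a packed, shuffled int4 buffer generally means rebuilding the buffer rather than taking a view.

```python
# Toy illustration only. TILE and the permutation are assumptions made for
# this sketch; they are NOT the layout used by the fbgemm kernels.
TILE = 8  # number of int4 values per shuffled tile (assumed)


def shuffle_row(row):
    """Within each TILE-sized group, place even positions first, then odd."""
    out = []
    for i in range(0, len(row), TILE):
        tile = row[i:i + TILE]
        out.extend(tile[0::2] + tile[1::2])
    return out


def unshuffle_row(row):
    """Inverse of shuffle_row; len(row) must be a multiple of TILE."""
    out = []
    half = TILE // 2
    for i in range(0, len(row), TILE):
        tile = row[i:i + TILE]
        logical = [0] * TILE
        logical[0::2] = tile[:half]
        logical[1::2] = tile[half:]
        out.extend(logical)
    return out


def pack_int4(row):
    """Pack pairs of 4-bit values (0..15) into bytes, low nibble first."""
    return bytes(
        (row[i] & 0xF) | ((row[i + 1] & 0xF) << 4) for i in range(0, len(row), 2)
    )


def unpack_int4(buf):
    """Inverse of pack_int4."""
    out = []
    for b in buf:
        out.extend((b & 0xF, b >> 4))
    return out


def slice_shuffled(packed_rows, start, stop):
    """Slice logical columns [start, stop) out of shuffled, packed rows.

    The slice length must be a multiple of TILE. Because `start` need not be
    tile-aligned, each row is unshuffled, sliced, and reshuffled, so the
    result is a fresh buffer (a copy), not a view of the input.
    """
    assert (stop - start) % TILE == 0
    result = []
    for buf in packed_rows:
        logical = unshuffle_row(unpack_int4(buf))
        result.append(pack_int4(shuffle_row(logical[start:stop])))
    return result


rows = [list(range(16)), list(range(15, -1, -1))]
packed = [pack_int4(shuffle_row(r)) for r in rows]
sliced = slice_shuffled(packed, 4, 12)
```

Unshuffling `sliced` recovers the logical columns 4 through 11 of each row. The copy in `slice_shuffled` is the toy analogue of the data movement described above, which is why such slicing is best confined to model setup.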

Differential Revision: D77239566
Summary:
The preshuffled BF16I4 batched GEMM is much faster than the non-preshuffled BF16I4 batched GEMM, especially for memory-bound shapes. It will be used to integrate fbgemm with other OSS systems.

Add a BF16I4 preshuffled batched GEMM to unblock the integration, as requested.

Differential Revision: D77311295
netlify bot commented Jun 25, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit f46c9e0
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/685c2c7b9f0cdb000850de59
😎 Deploy Preview https://deploy-preview-4399--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77311295

@facebook-github-bot
Contributor

This pull request has been merged in b56c580.
