
Store NVFP4 block scales in swizzled layout on tensor #2438


Open. drisspg wants to merge 1 commit into base: main

drisspg (Contributor) commented Jun 24, 2025

Stacked PRs:


Store NVFP4 block scales in swizzled layout on tensor
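
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what "swizzling" a block-scale matrix can look like. It is not taken from this PR's diff. It assumes the 128x4 scale-factor tiling described in NVIDIA's cuBLAS block-scaling documentation, and `ceil_div` and `swizzle_scales` are hypothetical names used only for illustration.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def swizzle_scales(scales):
    """Rearrange a 2D list of block scales into a flat, tiled layout.

    Assumed layout (not confirmed by this PR): the matrix is padded to
    multiples of 128 rows and 4 columns, split into 128x4 tiles stored
    one after another, and inside each tile the element at (r, c) goes
    to offset (r % 32) * 16 + (r // 32) * 4 + c.
    """
    rows, cols = len(scales), len(scales[0])
    n_row_tiles = ceil_div(rows, 128)
    n_col_tiles = ceil_div(cols, 4)
    tile_size = 128 * 4
    # Padding positions stay zero.
    out = [0] * (n_row_tiles * n_col_tiles * tile_size)
    for r in range(rows):
        for c in range(cols):
            tile = (r // 128) * n_col_tiles + (c // 4)
            rr, cc = r % 128, c % 4
            offset = (rr % 32) * 16 + (rr // 32) * 4 + cc
            out[tile * tile_size + offset] = scales[r][c]
    return out
```

If the scales are kept on the tensor in this form, as the title suggests, a rearrangement like this would presumably run once at quantization time instead of before every matmul.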

For Llama 3 70B sizes (no TP) with 1024 tokens: 15% E2E speedup

In eager Before: https://fburl.com/7w3j6b1q

nvfp4 Runtime: 2436.98 μs per iteration

In eager After: https://fburl.com/s7ggvm94

nvfp4 Runtime: 2356.77 μs per iteration

In compile

Before: https://fburl.com/1gvfjjlu

nvfp4 Runtime: 576.14 μs per iteration

After: https://fburl.com/usp1xelj

nvfp4 Runtime: 486.69 μs per iteration
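
As a quick sanity check on the numbers above, the relative improvement can be computed directly from the reported per-iteration runtimes. The helper name `speedup_percent` is hypothetical. The 15% figure quoted at the top appears to correspond to the compile measurements rather than the eager ones.

```python
def speedup_percent(before_us, after_us):
    """Percentage reduction in per-iteration runtime."""
    return (before_us - after_us) / before_us * 100.0


# Runtimes in microseconds per iteration, copied from the description above.
runtimes_us = {
    "eager": (2436.98, 2356.77),
    "compile": (576.14, 486.69),
}

for mode, (before, after) in runtimes_us.items():
    pct = speedup_percent(before, after)
    ratio = before / after
    print(f"{mode}: {pct:.1f}% less time per iteration ({ratio:.2f}x)")
```

This works out to roughly 3.3% in eager and 15.5% under compile.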

Throughput: 47.12 requests/s, 19998.00 total tokens/s, 9635.87 output tokens/s
Total num prompt tokens:  225190
Total num output tokens:  209407

@drisspg drisspg force-pushed the drisspg/stack/80 branch from 2621b5c to a27abbc Compare June 24, 2025 23:00
drisspg added a commit that referenced this pull request Jun 24, 2025
stack-info: PR: #2438, branch: drisspg/stack/80

pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2438

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 5c059bf with merge base faf788a:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
@drisspg drisspg changed the title Store NVFP4 block scales in swwizzled layout on tensor Store NVFP4 block scales in swizzled layout on tensor Jun 24, 2025
@drisspg drisspg added mx topic: not user facing Use this tag if you don't want this PR to show up in release notes labels Jun 24, 2025
drisspg added a commit that referenced this pull request Jun 24, 2025
stack-info: PR: #2438, branch: drisspg/stack/80
@drisspg drisspg force-pushed the drisspg/stack/80 branch from a27abbc to 9d539f3 Compare June 24, 2025 23:17
@drisspg drisspg changed the title Store NVFP4 block scales in swizzled layout on tensor Store NVFP4 block scales in swwizzled layout on tensor Jun 24, 2025
drisspg added a commit that referenced this pull request Jun 24, 2025
stack-info: PR: #2438, branch: drisspg/stack/80
@drisspg drisspg force-pushed the drisspg/stack/80 branch from 9d539f3 to e53b456 Compare June 24, 2025 23:51
stack-info: PR: #2438, branch: drisspg/stack/80
@drisspg drisspg force-pushed the drisspg/stack/80 branch from e53b456 to 5c059bf Compare June 24, 2025 23:57
@drisspg drisspg mentioned this pull request Jun 25, 2025
Labels: CLA Signed, mx, topic: not user facing