Support tuning cache for Cutlass FP8 GEMM #4301
Conversation
This pull request was exported from Phabricator. Differential Revision: D75541025
Summary:
Pull Request resolved: pytorch#4301
X-link: facebookresearch/FBGEMM#1377

This diff adds support for the tuning cache to the kernel. There should be no performance changes to the existing heuristics.

- Refactored the kernel dispatch logic to return the kernel function instead of invoking it directly, which removes some duplication of the kernel invoke (see the sketch below).
- The next diff in this stack, D75820688, will add the new kernels, to keep this review easier.
- Note that there are some issues with adding the new kernels: this kernel actually compiles 12 variants for each configuration (see D75820688 for more context). So for now the new kernels won't be added in D75820688, but the kernel can still be onboarded to auto-tuning in case someone wants to compile them locally. D75820688 will be revisited later.

Reviewed By: q10, jiawenliu64

Differential Revision: D75541025
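The dispatch refactor described above is easiest to see in code. Below is a minimal, self-contained C++ sketch of the pattern, assuming hypothetical names (`GemmArgs`, `dispatch_fp8_gemm`, and the stub tile variants); it is not FBGEMM's actual internal API, only an illustration of returning the kernel function so the invoke exists in one place.

```cpp
// Sketch only: illustrative names, not FBGEMM's real dispatch code.
#include <cstdint>
#include <cstdio>

struct GemmArgs {
  const void* XQ; // FP8 activations
  const void* WQ; // FP8 weights
  void* out;      // BF16 output
  int64_t M, N, K;
};

using KernelFn = void (*)(const GemmArgs&);

// Stub tile-configuration variants standing in for the compiled
// CUTLASS instantiations.
void gemm_64x128x128(const GemmArgs& /*args*/)  { std::printf("64x128 tile\n"); }
void gemm_128x128x128(const GemmArgs& /*args*/) { std::printf("128x128 tile\n"); }

// Before the refactor, each heuristic branch launched its kernel
// directly, duplicating the launch code; now dispatch only selects.
KernelFn dispatch_fp8_gemm(int64_t M, int64_t N, int64_t K) {
  if (M <= 64) {
    return gemm_64x128x128; // small-M heuristic (illustrative threshold)
  }
  return gemm_128x128x128; // default tile config
}

int main() {
  GemmArgs args{nullptr, nullptr, nullptr, /*M=*/32, /*N=*/4096, /*K=*/4096};
  dispatch_fp8_gemm(args.M, args.N, args.K)(args); // single invoke site
}
```

A side benefit of this shape is that a tuning cache can reuse the same selection point: on the tuned path, the dispatch function consults the cache instead of the shape heuristic, and the single invoke site stays unchanged.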
Differential Revision: D75540999
Differential Revision: D75541013
Differential Revision: D75806957
This pull request has been merged in 4c9313f.
Summary:
This diff adds support for the tuning cache to the kernel. There should be no performance changes to the existing heuristics.
Reviewed By: q10, jiawenliu64
Differential Revision: D75541025
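For readers unfamiliar with how a tuning cache differs from the fixed heuristics, the sketch below shows the generic lookup-or-benchmark pattern such a cache implements: on a miss for a given problem shape, time every registered candidate kernel and remember the fastest. This is a simplified CPU-side illustration with hypothetical names, not FBGEMM's actual tuning-cache implementation; a real version would synchronize the GPU and average several timed runs per candidate.

```cpp
// Sketch only: generic lookup-or-benchmark tuning cache.
#include <chrono>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using KernelFn = void (*)(int64_t M, int64_t N, int64_t K);
using ShapeKey = std::tuple<int64_t, int64_t, int64_t>;

KernelFn tune(const std::vector<KernelFn>& candidates,
              int64_t M, int64_t N, int64_t K) {
  static std::map<ShapeKey, KernelFn> cache; // persists across calls
  ShapeKey key{M, N, K};
  if (auto it = cache.find(key); it != cache.end()) {
    return it->second; // hit: reuse the previously tuned kernel
  }
  KernelFn best = nullptr;
  auto best_time = std::chrono::duration<double>::max();
  for (KernelFn k : candidates) {
    auto t0 = std::chrono::steady_clock::now();
    k(M, N, K); // real code would sync the GPU and average several runs
    auto dt = std::chrono::steady_clock::now() - t0;
    if (dt < best_time) { best_time = dt; best = k; }
  }
  cache[key] = best; // miss: remember the fastest candidate
  return best;
}

// Stub candidates standing in for compiled CUTLASS variants.
void k1(int64_t, int64_t, int64_t) {}
void k2(int64_t, int64_t, int64_t) {}

int main() {
  KernelFn best = tune({k1, k2}, 32, 4096, 4096); // first call benchmarks
  best(32, 4096, 4096);                           // later calls hit the cache
}
```

This also explains why the 12-variants-per-configuration compile cost noted in the summary matters: every variant onboarded to auto-tuning is another candidate that must be compiled, even though only the cached winner runs after the first call.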