Conversation

@cthi (Contributor) commented Jun 9, 2025

Summary:
This diff adds support for the tuning cache to the kernel. There should be no performance changes to the existing heuristics.

  • I refactored the kernel dispatch logic to return the kernel function instead of invoking it inline, which removes some duplication of the kernel invocation; see the sketch after this list.
  • The next diff in this stack, D75820688, will add the new kernels, to make this review easier.
    • Note that we are having some issues adding the new kernels: I found that this kernel actually compiles 12 variants for each configuration (see D75820688 for more context). So for now we won't add the new kernels in D75820688, but we can still onboard the kernel to auto-tuning in case someone wants to compile them locally. Will revisit D75820688 later.
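
To illustrate the two points above, here is a minimal C++ sketch of the pattern: dispatch returns the selected kernel rather than invoking it in each branch, and a tuning cache remembers the selection per problem shape. All names here (`Kernel`, `dispatch`, `kernel_64x128`, `kernel_128x128`, `gemm`) are hypothetical stand-ins, not the actual FBGEMM API.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical kernel signature; real FBGEMM kernels take tensors and launch
// on the GPU. Only the dispatch/caching pattern is illustrated here.
using Kernel = std::function<void(const float*, const float*, float*,
                                  int, int, int)>;

// Stand-ins for two tile-shape variants of the kernel.
void kernel_64x128(const float*, const float*, float*, int, int, int) {}
void kernel_128x128(const float*, const float*, float*, int, int, int) {}

// Dispatch returns the selected kernel instead of invoking it inside each
// branch, so the single call site in gemm() owns the invocation.
Kernel dispatch(int M, int N, int K) {
  static std::unordered_map<std::string, Kernel> tuning_cache;
  const std::string key = std::to_string(M) + "_" + std::to_string(N) + "_" +
      std::to_string(K);

  // Tuning-cache hit: reuse the kernel previously selected for this shape.
  if (auto it = tuning_cache.find(key); it != tuning_cache.end()) {
    return it->second;
  }

  // Cache miss: fall back to the existing heuristic (unchanged, so there
  // should be no performance change), and remember the choice.
  Kernel k = (M <= 64) ? Kernel(kernel_64x128) : Kernel(kernel_128x128);
  tuning_cache.emplace(key, k);
  return k;
}

void gemm(const float* A, const float* B, float* C, int M, int N, int K) {
  // One invocation site replaces a duplicated invoke in every branch.
  dispatch(M, N, K)(A, B, C, M, N, K);
}
```

In the real diff the cached entry would presumably be filled by auto-tuning over the compiled kernel variants rather than by the heuristic, but the structure is the same: select once, invoke in one place.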

Reviewed By: q10, jiawenliu64

Differential Revision: D75541025

netlify bot commented Jun 9, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 026366c
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/684c96b3c9c81d00082893ec
😎 Deploy Preview: https://deploy-preview-4301--pytorch-fbgemm-docs.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D75541025

cthi force-pushed the export-D75541025 branch from 8f609f6 to ccbdff4 on June 12, 2025 20:38
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 12, 2025
cthi force-pushed the export-D75541025 branch from ccbdff4 to 050d494 on June 12, 2025 20:45
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 12, 2025
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 12, 2025
cthi force-pushed the export-D75541025 branch from 050d494 to 943b754 on June 12, 2025 21:58
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 12, 2025
cthi force-pushed the export-D75541025 branch from 943b754 to 6c5535c on June 12, 2025 22:08
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 12, 2025
cthi force-pushed the export-D75541025 branch from 6c5535c to 83d2886 on June 12, 2025 22:25
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from 83d2886 to 042892a on June 12, 2025 22:38
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from 042892a to f761d19 on June 13, 2025 03:36
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from f761d19 to 6aa693f on June 13, 2025 03:40
cthi force-pushed the export-D75541025 branch from 6aa693f to b28acf4 on June 13, 2025 14:35
cthi force-pushed the export-D75541025 branch from 731129c to 8aea124 on June 13, 2025 17:19
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from 8aea124 to e08e8d3 on June 13, 2025 19:02
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from e08e8d3 to 136a6af on June 13, 2025 19:10
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from 136a6af to ed34cfe on June 13, 2025 20:14
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from ed34cfe to e80f9ee on June 13, 2025 20:54
cthi added a commit to cthi/FBGEMM-1 that referenced this pull request Jun 13, 2025
cthi force-pushed the export-D75541025 branch from e80f9ee to 2dca156 on June 13, 2025 21:15
@facebook-github-bot (Contributor)

This pull request has been merged in 4c9313f.
