Yurlungur
Collaborator

PR Summary

In this PR I try to optionally enable template instantiations and relocatable device code for the most commonly used templates at the "bottom" of the template hierarchy, so that they don't have to be recompiled in as many places.
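For readers unfamiliar with the technique, here is a minimal sketch of how explicit template instantiation can reduce repeated compilation. It assumes that "template instantiations" above refers to C++ explicit instantiation; the names `IdealGas` and `SumPressure` are hypothetical and not taken from this PR, and the relocatable device code part (a build-system setting) is not shown.

```cpp
#include <cstddef>

// ---- Would normally live in a header (hypothetical names throughout) ----

// A tiny stand-in for a "bottom of the hierarchy" template.
template <typename Real>
struct IdealGas {
  Real gm1; // gamma - 1
  Real cv;  // specific heat at constant volume
  Real PressureFromDensityTemperature(Real rho, Real temp) const {
    return gm1 * rho * cv * temp;
  }
};

// A template that many translation units would otherwise instantiate
// implicitly, each paying the compile cost again.
template <typename EOS, typename Real>
Real SumPressure(const EOS &eos, const Real *rho, const Real *temp,
                 std::size_t n) {
  Real total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += eos.PressureFromDensityTemperature(rho[i], temp[i]);
  }
  return total;
}

// Explicit instantiation declaration: tells every file that includes the
// header NOT to instantiate this specialization itself.
extern template double SumPressure<IdealGas<double>, double>(
    const IdealGas<double> &, const double *, const double *, std::size_t);

// ---- Would normally live in exactly one .cpp file ----

// Explicit instantiation definition: the specialization is compiled once,
// here, and other translation units link against it.
template double SumPressure<IdealGas<double>, double>(
    const IdealGas<double> &, const double *, const double *, std::size_t);
```

In a real project the declaration and the definition would sit in separate files; they are shown together here only so the sketch compiles as a single unit. Making the instantiation optional, as described above, would presumably be handled by a build option or preprocessor guard around the `extern template` line, which is an assumption on my part.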

I am unsure whether this will actually help with clang + GPU build times, but it's worth a shot. @rbberger, please experiment and let me know if it helps. Note that I also combined this PR with #475.

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file.
  • LANL employees: make sure tests pass both on the github CI and on the Darwin CI

If preparing for a new release, in addition please check the following:

  • Update the version in cmake.
  • Move the changes in the CHANGELOG.md file under a new header for the new release, and reset the categories.
  • Ensure that any when='@main' dependencies are updated to the release version in the package.py

@Yurlungur requested a review from rbberger on March 4, 2025, 18:56
@Yurlungur
Collaborator Author

OK. Tests should now pass on this branch. Compile times have also improved by a factor of 2.5 on my laptop with this version. I'm not totally happy with how this is written, however, so I may try to clean it up a bit.
