WeiqunZhang
Member

No description provided.

@WeiqunZhang WeiqunZhang marked this pull request as draft September 29, 2025 00:42
@WeiqunZhang
Member Author

We might want to use tiling for the CPU implementation.

@WeiqunZhang
Member Author

WeiqunZhang commented Sep 29, 2025

On my machine, the new CPU version with cache blocking is 4x faster than the original version without cache blocking.
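For anyone reading along who has not used cache blocking before, here is a minimal sketch of the idea. It assumes a plain 2D row-major transpose, which is only a guess based on the branch name. It is not the code from this PR and does not use any AMReX API; the function names `transpose_naive` and `transpose_tiled` and the default tile size of 32 are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration only, not the AMReX implementation.
// Both matrices are stored row-major: in is rows x cols, out is cols x rows.

// Straightforward transpose. Reads of `in` are contiguous, but writes to
// `out` jump by `rows` elements each step, so large matrices miss the cache.
void transpose_naive(const std::vector<double>& in, std::vector<double>& out,
                     std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            out[j * rows + i] = in[i * cols + j];
        }
    }
}

// Cache-blocked (tiled) transpose. The index space is split into
// tile x tile blocks so that the parts of `in` and `out` touched by one
// block can stay in cache while the block is processed.
// `tile` must be positive; 32 is an arbitrary starting point to tune.
void transpose_tiled(const std::vector<double>& in, std::vector<double>& out,
                     std::size_t rows, std::size_t cols, std::size_t tile = 32)
{
    for (std::size_t ii = 0; ii < rows; ii += tile) {
        const std::size_t imax = std::min(ii + tile, rows); // clip the last partial tile
        for (std::size_t jj = 0; jj < cols; jj += tile) {
            const std::size_t jmax = std::min(jj + tile, cols);
            for (std::size_t i = ii; i < imax; ++i) {
                for (std::size_t j = jj; j < jmax; ++j) {
                    out[j * rows + i] = in[i * cols + j];
                }
            }
        }
    }
}
```

Both functions produce the same result; only the traversal order differs. The best tile size, and how much speedup tiling gives, depends on the cache sizes and the element type, so it should be measured on the target machine.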

@WeiqunZhang WeiqunZhang marked this pull request as ready for review September 29, 2025 01:47
@WeiqunZhang WeiqunZhang merged commit 0e4fb2f into AMReX-Codes:development Sep 30, 2025
75 checks passed
@WeiqunZhang WeiqunZhang deleted the transpose branch September 30, 2025 17:14