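About the transpose implementation: there are two kernel functions here. Aren't they duplicates? When I expand the transpose computation, the code looks the same. You have only swapped global_x and global_y, but the expanded computation is essentially identical.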
Also, __global__ void mat_transpose_f32x4_shared_col2row2d_kernel and every function below it give incorrect results in the tests. Is some operation being done wrong?
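You can think of it this way: col2row launches one thread per element of the input matrix, while row2col launches one thread per element of the output matrix.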
col2row
row2col
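To make the distinction concrete, here is a minimal sketch of the two thread mappings. The kernel and helper names (`transpose_by_input`, `transpose_by_output`, `run_transpose`) are hypothetical and are not the repository's actual code; both matrices are assumed to be row-major. The results are identical, but the memory access pattern differs: mapping threads to the input gives contiguous reads and strided writes, while mapping threads to the output gives strided reads and contiguous writes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch: one thread per INPUT element (the "col2row" view).
// x is rows x cols, y is cols x rows, both row-major.
__global__ void transpose_by_input(const float *x, float *y, int rows, int cols) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;  // linear index into x
  if (idx < rows * cols) {
    int r = idx / cols;
    int c = idx % cols;
    y[c * rows + r] = x[idx];  // contiguous read, strided write
  }
}

// Hypothetical sketch: one thread per OUTPUT element (the "row2col" view).
__global__ void transpose_by_output(const float *x, float *y, int rows, int cols) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;  // linear index into y
  if (idx < rows * cols) {
    int c = idx / rows;  // row of y, i.e. column of x
    int r = idx % rows;  // column of y, i.e. row of x
    y[idx] = x[r * cols + c];  // strided read, contiguous write
  }
}

// Host helper (assumed for illustration): runs one of the two kernels
// and returns the transposed matrix. Error checking is omitted for brevity.
std::vector<float> run_transpose(const std::vector<float> &h_x, int rows, int cols,
                                 bool by_input) {
  size_t n = static_cast<size_t>(rows) * cols;
  size_t bytes = n * sizeof(float);
  float *d_x = nullptr, *d_y = nullptr;
  cudaMalloc(&d_x, bytes);
  cudaMalloc(&d_y, bytes);
  cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

  int threads = 256;
  int blocks = static_cast<int>((n + threads - 1) / threads);
  if (by_input) {
    transpose_by_input<<<blocks, threads>>>(d_x, d_y, rows, cols);
  } else {
    transpose_by_output<<<blocks, threads>>>(d_x, d_y, rows, cols);
  }

  std::vector<float> h_y(n);
  cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(d_x);
  cudaFree(d_y);
  return h_y;
}
```

Calling `run_transpose(x, rows, cols, true)` and `run_transpose(x, rows, cols, false)` on the same input should return the same vector, which is why the two kernels look alike once the index arithmetic is expanded.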
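The code was written rather hastily at the time, but it does basically follow the approach described in the previous comment. As for the incorrect test results, my tests on a 3090 still don't seem to show any problem, but the issue may be device-dependent. Please provide more detailed information, or try adjusting the range of values for M and N.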
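One way to collect the details requested above is to compare a kernel's output against a CPU reference and report the device name, M, N, and the first mismatching index. The sketch below is host-side only; `transpose_ref`, `first_mismatch`, and `print_device_name` are hypothetical helpers, not part of the repository. Trying sizes where M or N is not a multiple of 4 is only a guess at where a float4-based kernel might differ; it is not something confirmed in this thread.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

// CPU reference (hypothetical helper): x is M x N, result is N x M, row-major.
std::vector<float> transpose_ref(const std::vector<float> &x, int M, int N) {
  std::vector<float> y(x.size());
  for (int r = 0; r < M; ++r) {
    for (int c = 0; c < N; ++c) {
      y[static_cast<size_t>(c) * M + r] = x[static_cast<size_t>(r) * N + c];
    }
  }
  return y;
}

// Returns the index of the first element that differs by more than tol,
// or -1 if the two vectors match (a size mismatch counts as index 0).
long first_mismatch(const std::vector<float> &got, const std::vector<float> &want,
                    float tol = 0.0f) {
  if (got.size() != want.size()) return 0;
  for (size_t i = 0; i < got.size(); ++i) {
    if (std::fabs(got[i] - want[i]) > tol) return static_cast<long>(i);
  }
  return -1;
}

// Prints the name of device 0 so it can be included in a bug report.
void print_device_name() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess) {
    std::printf("device: %s\n", prop.name);
  }
}
```

A report that includes the device name, the exact M and N, and the first mismatching index makes a device-specific failure much easier to reproduce.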
bear-zd