This repository implements and benchmarks different matrix transpose algorithms. Be sure to check out the corresponding blog post.
The data folder contains the original benchmark data from the tested architectures that was used in the experimental analyses.
The lib folder contains C functions shared by all tested algorithms, along with the corresponding header file.
The src folder contains C files with the different algorithms for matrix transposition.
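As a point of reference for what the algorithms in src compute, here is a minimal sketch of a naive out-of-place transpose. It is not copied from the repository: the function name transpose_naive, the double element type, and the row-major square layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from src): naive out-of-place transpose
 * of an n x n matrix stored in row-major order. Element (i, j) of src
 * ends up at position (j, i) of dst. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    assert(src != dst); /* this sketch does not support in-place use */

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Reads walk src row by row; writes stride through dst by n. */
            dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The strided writes in the inner loop are the usual reason a naive transpose uses the cache poorly on large matrices, which is the kind of effect a benchmark of alternative algorithms can expose.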
After cloning the repository you can run
make
This compiles the files in src, runs the benchmarks, and stores the results in the stats folder (which the Makefile creates). Warning: the benchmarks can take a long time to finish!
If you only want to compile the files in src, run
make compile
This compiles all C files in src and stores the resulting binaries in the (newly created) bin folder without starting the benchmarks. Compiled binaries follow the naming convention <ALGORITHM>-<OPTIMIZATION LEVEL>.
The correctness of the provided implementations can be verified by running the compiled binaries in 'debug mode'. After compilation you can run
./bin/<BINARY> <MATRIX SIZE> --debug
For example,
./bin/naive-0 2 --debug
This should output a randomly initialized matrix with dimension 2^2 and the corresponding transposed matrix. Additionally, the execution time and the effective bandwidth are displayed.
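The quantities shown in debug mode can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the helper names are hypothetical, the size argument is assumed to be an exponent k that yields a 2^k x 2^k matrix, and effective bandwidth is assumed to follow the common definition of bytes read plus bytes written, divided by the elapsed time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: side length for a size argument k, assuming the
 * matrix is 2^k x 2^k. */
size_t matrix_dim(unsigned k)
{
    return (size_t)1 << k;
}

/* Hypothetical helper: returns 1 if t is the transpose of the n x n
 * row-major matrix m, and 0 otherwise. */
int is_transpose(const double *m, const double *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (t[j * n + i] != m[i * n + j]) {
                return 0;
            }
        }
    }
    return 1;
}

/* Assumed definition of effective bandwidth in GB/s: every element is
 * read once and written once, so 2 * n * n * elem_size bytes are moved. */
double effective_bandwidth_gbs(size_t n, size_t elem_size, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)n * (double)n * (double)elem_size;
    return bytes / seconds / 1e9;
}
```

Under these assumptions, a 1024 x 1024 matrix of 8-byte elements transposed in one millisecond corresponds to roughly 16.8 GB/s.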