v0.8.1

@chewxy chewxy released this 22 Jan 03:37
· 119 commits to master since this release
2ab2564

v0.8.1 sees the built-in transpose function use a different algorithm to perform in-place transposition.

Prior to this version, the transpose used a cycle-chasing algorithm, which turned out to have poor cache locality. It has been replaced with an algorithm that allocates a new temporary array. The transpose operation is then simply an iterative copy into the temporary array, after which the data is copied from the temporary array back to the original array.
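The copy-based approach described above can be sketched roughly as follows. This is a minimal illustration for a 2-D, row-major matrix only, not the library's actual implementation; the function name `transposeViaTemp` and its parameters are hypothetical and chosen purely for this example.

```go
package main

import "fmt"

// transposeViaTemp is a hypothetical helper (not part of the library's API).
// It transposes a rows x cols row-major matrix stored in data by copying the
// elements into a temporary array in transposed order, then copying the
// temporary array back over the original.
func transposeViaTemp(data []float64, rows, cols int) {
	if len(data) != rows*cols {
		panic("transposeViaTemp: len(data) != rows*cols")
	}
	// Allocate the temporary array that will hold the transposed layout.
	tmp := make([]float64, len(data))
	// Element (i, j) of the source becomes element (j, i) of the result,
	// which has cols rows and rows columns.
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			tmp[j*rows+i] = data[i*cols+j]
		}
	}
	// Copy the result back so the caller's slice holds the transposed data.
	copy(data, tmp)
}

func main() {
	m := []float64{1, 2, 3, 4, 5, 6} // 2x3 matrix: [[1 2 3] [4 5 6]]
	transposeViaTemp(m, 2, 3)
	fmt.Println(m) // [1 4 2 5 3 6], i.e. the 3x2 matrix [[1 4] [2 5] [3 6]]
}
```

The trade-off is an extra allocation the size of the data in exchange for simple, sequential reads of the source array, where cycle-chasing follows permutation cycles that jump around memory.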

v0.8.1 also sees an improvement contributed by @stuartcarnie to the FlatIterator structure. Here are the benchmark results:

benchmark                     old ns/op     new ns/op     delta
BenchmarkComplicatedGet-8     228778        199737        -12.69%

benchmark                     old allocs     new allocs     delta
BenchmarkComplicatedGet-8     2              2              +0.00%

benchmark                     old bytes     new bytes     delta
BenchmarkComplicatedGet-8     112           112           +0.00%