Releases: gorgonia/tensor
Bugfix: `ShallowClone` had a subtle bug where not all the properties of `Dense` were cloned.
v0.9.4 Fixes a bug in `ShallowClone` (#56)
Pointer arithmetic fixed
Thanks to the new `checkptr` feature in Go tip, a number of pointer arithmetic issues were found and fixed.
No API changes
Bugfix: axis decrement in aggregation functions
@bdleitner fixed a bug where axes in the along array were only ever decremented by 1 to account for prior reductions, no matter how many reductions had occurred.
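To make the fix concrete, here is a minimal sketch of the axis bookkeeping involved. It is not the package's code: `adjustAxes` is a hypothetical helper, and it assumes the axes are distinct and that each reduction removes exactly one dimension.

```go
package main

import (
	"fmt"
	"sort"
)

// adjustAxes is a hypothetical helper, not part of the tensor package.
// Given axes expressed against the original shape, it returns the axis to use
// at each successive reduction step. It assumes the axes are distinct and that
// every reduction removes one dimension.
func adjustAxes(along []int) []int {
	sorted := append([]int(nil), along...) // copy so the caller's slice is untouched
	sort.Ints(sorted)

	adjusted := make([]int, len(sorted))
	for i, axis := range sorted {
		// By step i, i lower-numbered dimensions have already been removed,
		// so the axis shifts down by i (not by a fixed 1).
		adjusted[i] = axis - i
	}
	return adjusted
}

func main() {
	fmt.Println(adjustAxes([]int{0, 1, 2})) // [0 0 0]
	fmt.Println(adjustAxes([]int{1, 3}))    // [1 2]
}
```

With a fixed decrement of 1, the third step for axes `0, 1, 2` would use axis 1 rather than 0, which matches the kind of error described above.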
v0.9.1
v0.9.0-beta
v0.9.0
The changes made in this PR are aimed at better supporting v0.9.0 of Gorgonia itself. Along the way there are some new features and optimizations, as well as some bug fixes.
The majority of the work in supporting v0.9.0 of Gorgonia is to shore up the underlying architecture to support CUDA related engines. This means moving more things to rely on `Engine` while keeping the engine interface overheads low. Additionally this also means better support for column major data layouts.
- Heavier reliance on `Engine` for most functions. This allows for extensibility on the data structure.
- Long standing bugbear - concepts of `RowVec` and `ColVec` have been removed (thanks to @matdodgson)
  - Touch points: `ap.go`, `iterator.go`, `iterator_mult.go`. `shape.go` and the tests that were correct prior to this change have semantic meaning changes too.
  - POTENTIAL TECH DEBT: `iterator_mult.go` - the solution of filling with ones is a little too dodgy for my liking. The alternative would be to change `BroadcastStrides`, which will change even more things (`Concat`, `Stack` etc)
- Optimization:
  - `AP` has been depointerized in `*Dense` (thanks to @docmerlin). This reduces some amount of GC pointer chasing, but not all
  - allocation is slightly improved. (`(array).fromSliceOrArrayer`, `(array).fix()` and `(array).forcefix()` are part of the improvement around the logic of allocating data.)
- Bug fixes:
  - Fixes subtle errors in linear algebra functions. The result is a slightly longer function but easier to reason about.
  - Fixes some subtle bugs in `Concat` - see also gorgonia/gorgonia#218
  - Fixed some small bugs with regards to `SampleIndex` that only show up when the slices have extreme lengths. This API should have been deprecated 2 years ago, but eh... it touched a lot of external projects.
- API changes:
  - `Diag` is made available. Relies heavily on an `Engine`'s implementation
  - `NewFlatIterator` is unexported.
  - `NewAP` is unexported. `MakeAP` is used instead.
  - `(Tensor).DataOrder()` is added to the definition of what a `Tensor` is.
  - `(Shape).IsScalarEquiv()` is a new method. This corresponds to the change of semantics of what a `Shape` should be.
  - `(Shape).CalcStrides()` is exported now. This enables users to correctly calculate strides that are consistent with what the package expects.
  - `(Shape).CalcStridesColMajor()` is exported as the method to calculate the strides of a Col-Major `*Dense`.
- New Interfaces:
  - `NonStdEngine` is an `Engine` that does not allocate using the default allocator. This allows for embedding a `DefaultEngine` while overriding the allocation behaviour.
  - `Diager` - any engine that can return a tensor that only contains the diagonal values of the input
  - `NaNChecker` and `InfChecker` - engines that can check a tensor for NaN and Inf
- New Features:
  - New Subpackages:
    - `native` is a subpackage that essentially gives users a native, Go-based iterator. Basically the ability to go from a `*Dense` to a `[][]T` or `[][][]T` without extra allocations (for the data). This was pulled into `master` earlier, but as of v0.9.0, the generic version is available too.
- Semantic Changes:
  - `Shape` has semantic changes regarding whether or not a shape is scalar. A scalar shape is defined to be `Shape{}` or `Shape{1}` only. Formerly, `Shape{1,1}` was also considered to be scalar. Now it is considered to be `ScalarEquivalent` (along with `Shape{1, 1, .... , 1}`)
  - A `Dtype` that is orderable is also now comparable for equality. If `RegisterOrd` is called with a new `Dtype`, it is also automatically registered as `Eq`.
- Cosmetic Changes:
  - README has been updated to point to correct doc pages
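The `Shape` changes listed above (scalar versus scalar-equivalent shapes, and the stride calculations for row-major and column-major layouts) can be sketched roughly as follows. This is an illustration only: the `shape` type and its methods are hypothetical stand-ins written for this example and do not call the package. Strides are counted in elements, and treating scalars as scalar-equivalent is an assumption.

```go
package main

import "fmt"

// shape is a hypothetical stand-in for the package's Shape type.
type shape []int

// isScalar follows the definition in the notes: only shape{} and shape{1}.
func (s shape) isScalar() bool {
	return len(s) == 0 || (len(s) == 1 && s[0] == 1)
}

// isScalarEquiv reports whether every dimension is 1, e.g. shape{1, 1}.
// Assumption: scalar shapes also count as scalar-equivalent.
func (s shape) isScalarEquiv() bool {
	for _, d := range s {
		if d != 1 {
			return false
		}
	}
	return true
}

// rowMajorStrides: the last dimension varies fastest.
func (s shape) rowMajorStrides() []int {
	strides := make([]int, len(s))
	stride := 1
	for i := len(s) - 1; i >= 0; i-- {
		strides[i] = stride
		stride *= s[i]
	}
	return strides
}

// colMajorStrides: the first dimension varies fastest.
func (s shape) colMajorStrides() []int {
	strides := make([]int, len(s))
	stride := 1
	for i := 0; i < len(s); i++ {
		strides[i] = stride
		stride *= s[i]
	}
	return strides
}

func main() {
	fmt.Println(shape{}.isScalar(), shape{1}.isScalar(), shape{1, 1}.isScalar()) // true true false
	fmt.Println(shape{1, 1}.isScalarEquiv(), shape{2, 1}.isScalarEquiv())        // true false

	s := shape{2, 3, 4}
	fmt.Println(s.rowMajorStrides()) // [12 4 1]
	fmt.Println(s.colMajorStrides()) // [1 2 6]
}
```

The package's own `CalcStrides` and `CalcStridesColMajor` may treat scalars and vectors specially; the sketch only shows the general rule for each layout.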
v0.8.1
v0.8.1 sees the built-in transpose function use a different algorithm to perform in-place transpose.
Prior to this version the transpose used a cycle-chasing algorithm, which turned out to have poor cache locality. The solution is to replace it with one that allocates a new temporary array. The transpose operation is then simply an iterative copy into the new array, after which the data is copied from the temporary array back to the original array.
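As a rough illustration of the temporary-array approach, here is a minimal sketch for the two-dimensional, row-major case. `transposeViaTemp` is a hypothetical function written for this example; the package's transpose handles arbitrary dimensions and axis permutations, which this sketch does not attempt.

```go
package main

import "fmt"

// transposeViaTemp is a hypothetical helper, not the package's implementation.
// It transposes a rows x cols row-major matrix stored in data by copying every
// element into a temporary buffer at its transposed position, then copying the
// buffer back over the original storage.
func transposeViaTemp(data []float64, rows, cols int) {
	if rows*cols != len(data) {
		panic("dimensions do not match data length")
	}
	tmp := make([]float64, len(data)) // the one extra allocation
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			// Element (r, c) of the source becomes element (c, r) of the result.
			tmp[c*rows+r] = data[r*cols+c]
		}
	}
	copy(data, tmp)
}

func main() {
	m := []float64{1, 2, 3, 4, 5, 6} // 2x3: [[1 2 3] [4 5 6]]
	transposeViaTemp(m, 2, 3)
	fmt.Println(m) // [1 4 2 5 3 6], i.e. 3x2: [[1 4] [2 5] [3 6]]
}
```

The trade-off is one extra buffer the size of the data in exchange for a simple sequential read of the source, rather than following permutation cycles through memory.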
v0.8.1 also sees an improvement contributed by @stuartcarnie on the `FlatIterator` structure. Here are the benchmark results.
| benchmark | old ns/op | new ns/op | delta |
|---|---|---|---|
| BenchmarkComplicatedGet-8 | 228778 | 199737 | -12.69% |

| benchmark | old allocs | new allocs | delta |
|---|---|---|---|
| BenchmarkComplicatedGet-8 | 2 | 2 | +0.00% |

| benchmark | old bytes | new bytes | delta |
|---|---|---|---|
| BenchmarkComplicatedGet-8 | 112 | 112 | +0.00% |