-
Notifications
You must be signed in to change notification settings - Fork 21
WIP: Allow missing elements when constructing ConcatDiskArrays #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
One additional thing I added here (which might be removed again) is a generic I started this process here by making If this is problematic we can revert this and fix the test to run on an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the readblock! fallback makes total sense, especially with the warning for devs. My main comment is to use a fill
value rather than missing
. In most of my own uses the fill would be zeros or the missingval of the tif file in the rest of the array
|
||
function othersize(x, id) | ||
return (x[1:(id-1)]..., x[(id+1):end]...) | ||
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}) | |
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}; fill=missing) |
return (x[1:(id-1)]..., x[(id+1):end]...) | ||
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}) | ||
et = Base.nonmissingtype(eltype(arrays)) | ||
T = Union{Missing,eltype(et)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T = Union{Missing,eltype(et)} | |
T = promotetype(typeof(fill), eltype(et)) |
Would be good to all arbitrary fill values, like 0.0
M = ndims(eltype(arrays)) | ||
_ConcatDiskArray(arrays, T, Val(N), Val(M)) | ||
end | ||
function ConcatDiskArray(arrays::AbstractArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function ConcatDiskArray(arrays::AbstractArray) | |
function ConcatDiskArray(arrays::AbstractArray; fill=missing) |
src/cat.jl
Outdated
N = ndims(arrays) | ||
M, T = foldl(arrays, init=(-1, Union{})) do (M, T), a | ||
if ismissing(a) | ||
(M, promote_type(Missing, T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(M, promote_type(Missing, T)) | |
(M, promote_type(fill, T)) |
@@ -18,64 +22,87 @@ struct ConcatDiskArray{T,N,P,C,HC} <: AbstractDiskArray{T,N} | |||
chunks::C | |||
haschunks::HC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haschunks::HC | |
haschunks::HC | |
fill::F |
src/cat.jl
Outdated
aout[outer_range...] = a.parents[I][array_range...] | ||
vout = view(aout, outer_range...) | ||
if ismissing(I) | ||
vout .= missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vout .= missing | |
vout .= a.fill |
data = view(aout, outer_range...) | ||
if ismissing(I) | ||
if !all(ismissing, data) | ||
@warn "Trying to write data to missing array tile, skipping write" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just error?
function readblock!(a::AbstractArray, aout, r...) | ||
# Fallback implementation for arrays that are not DiskArrays | ||
if isdisk(a) | ||
@warn "Using fallback readblock! for array $(typeof(a)). This should not happen but there should be a custom implementation." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice to have this warning
I tried it for my Sentinel-1 forest change use case and with missing it works but with zeros of the eltype of my other cubes it fails with the following error: ulia> allarrs = [haskey(idx_to_fname,(x,y)) ? parent(Cube(idx_to_fname[(x,y)])) : zero(t) for x in 33:3:57, y in 36:-3:6]
9×11 Matrix{Any}:
0x00 0x00 … 0x00
0x00 0x00 0x00
0x00 0x00 0x00
0x00 0x00 0x00
0x00 0x00 0x00
0x00 0x00 … 0x00
0x00 0x00 0x00
ZArray{UInt8} of size 15000 x 15000 ZArray{UInt8} of size 15000 x 15000 ZArray{UInt8} of size 15000 x 15000
ZArray{UInt8} of size 15000 x 15000 ZArray{UInt8} of size 15000 x 15000 ZArray{UInt8} of size 15000 x 15000
julia> diskarray_merged = DiskArrays.ConcatDiskArray(allarrs)
ERROR: ArgumentError: All arrays to concatenate must have equal ndims
Stacktrace:
[1] (::DiskArrays.var"#125#126")(::Tuple{Int64, DataType}, a::ZArray{UInt8, 2, Zarr.BloscCompressor, DirectoryStore})
@ DiskArrays /mnt/felix1/worldmap/dev/DiskArrays/src/cat.jl:39
[2] BottomRF
@ ./reduce.jl:86 [inlined]
[3] _foldl_impl(op::Base.BottomRF{DiskArrays.var"#125#126"}, init::Tuple{Int64, Core.TypeofBottom}, itr::Matrix{Any})
@ Base ./reduce.jl:62
[4] foldl_impl
@ ./reduce.jl:48 [inlined]
[5] mapfoldl_impl
@ ./reduce.jl:44 [inlined]
[6] mapfoldl
@ ./reduce.jl:175 [inlined]
[7] foldl
@ ./reduce.jl:198 [inlined]
[8] infer_eltypes
@ /mnt/felix1/worldmap/dev/DiskArrays/src/cat.jl:35 [inlined]
[9] DiskArrays.ConcatDiskArray(arrays::Matrix{Any})
@ DiskArrays /mnt/felix1/worldmap/dev/DiskArrays/src/cat.jl:56
[10] top-level scope
@ REPL[49]:1 This comes from the special casing of missing in infer_eltypes |
I added fill with arbitrary values in #256 |
#256) * Add test case for zero fill concatdiskarray * Enable arbitrary single values in concatDiskArray * Update src/cat.jl Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com> * Update src/cat.jl Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com> * Apply review Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com> * Wrap fill values in MissingTile for ConcatDiskArray This introduces a MissingTile type to wrap the values for ConcatDiskArray. This allows to use vectors as elements in the fillvalue. --------- Co-authored-by: Felix Cremer <felix.cremer@bgc-jena.mpg.de> Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
@felixcremer something went wrong in the tests when merging your branch and rebasing. Could you maybe check what is going on in the tests? |
We need to add this line at the runtests.jl file at line 980: struct TestArray2{T,N} <: AbstractArray{T,N} end This should fix the macro test |
I just realized, that this doesn't capture all use cases that I have for ChunkedFillArray. julia> a = rand(10,20)
10×20 Matrix{Float64}:
0.539125 0.367817 0.226616 0.417871 0.646191 0.876692 0.846301 … 0.334754 0.335576 0.205217 0.86204 0.090585 0.256855 0.965874
0.843005 0.264758 0.425721 0.237675 0.0207835 0.865324 0.803832 0.0676902 0.0511545 0.646324 0.421283 0.135919 0.138599 0.546225
0.610425 0.997422 0.131995 0.684199 0.409742 0.851482 0.00807563 0.7234 0.41646 0.334982 0.879538 0.608926 0.0638256 0.651027
0.875439 0.0417532 0.532343 0.972112 0.442585 0.0428194 0.870938 0.100127 0.754773 0.210889 0.667935 0.140189 0.532685 0.43808
0.645182 0.773129 0.385898 0.784275 0.942378 0.167264 0.351001 0.278384 0.21907 0.528196 0.0323599 0.8652 0.929417 0.719093
0.361373 0.773568 0.035367 0.826952 0.803518 0.599254 0.131155 … 0.661012 0.194536 0.842141 0.949948 0.515462 0.61466 0.388635
0.141028 0.0480575 0.474805 0.0807357 0.610856 0.976086 0.967459 0.161204 0.130386 0.6826 0.16455 0.0957621 0.542587 0.694546
0.868268 0.0415871 0.0128882 0.816408 0.556889 0.289149 0.70118 0.124271 0.175424 0.524135 0.125642 0.000570321 0.204595 0.157297
0.566457 0.31898 0.287166 0.311565 0.423148 0.225144 0.172287 0.642905 0.683538 0.65991 0.189106 0.0347848 0.31928 0.96579
0.216138 0.0772371 0.581787 0.224558 0.190489 0.921081 0.43512 0.456608 0.0183954 0.675053 0.104787 0.120345 0.464202 0.823329
julia> [a, 1 a,1]
ERROR: ParseError:
# Error @ REPL[14]:1:7
[a, 1 a,1]
# └─┘ ── Expected `]`
Stacktrace:
[1] top-level scope
@ none:1
julia> r1 = [MissingTile(0), MissingTile(0)]
2-element Vector{MissingTile{Int64}}:
MissingTile{Int64}(0)
MissingTile{Int64}(0)
julia> r2 = [a, a]
2-element Vector{Matrix{Float64}}:
[0.5391252836305531 0.36781671324764775 … 0.25685458230643265 0.965874106936992; 0.843004616065691 0.2647580317611108 … 0.13859908306278845 0.5462248541622374; … ; 0.5664565229318287 0.3189804021036051 … 0.3192800293310547 0.9657901568862917; 0.21613796850187095 0.07723710825843488 … 0.46420197169408306 0.823329303818388]
[0.5391252836305531 0.36781671324764775 … 0.25685458230643265 0.965874106936992; 0.843004616065691 0.2647580317611108 … 0.13859908306278845 0.5462248541622374; … ; 0.5664565229318287 0.3189804021036051 … 0.3192800293310547 0.9657901568862917; 0.21613796850187095 0.07723710825843488 … 0.46420197169408306 0.823329303818388]
julia> arrays = cat(r1, r2, dims=2)
2×2 Matrix{Any}:
MissingTile{Int64}(0) … [0.539125 0.367817 … 0.256855 0.965874; 0.843005 0.264758 … 0.138599 0.546225; … ; 0.566457 0.31898 … 0.31928 0.96579; 0.216138 0.0772371 … 0.464202 0.823329]
MissingTile{Int64}(0) [0.539125 0.367817 … 0.256855 0.965874; 0.843005 0.264758 … 0.138599 0.546225; … ; 0.566457 0.31898 … 0.31928 0.96579; 0.216138 0.0772371 … 0.464202 0.823329]
julia> DiskArrays.ConcatDiskArray(arrays)
(N, M) = (2, 2)
20×21 DiskArrays.ConcatDiskArray{Float64, 2, Matrix{Any}, DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.IrregularChunks}}, DiskArrays.Chunked{DiskArrays.SubRanges{DiskArrays.NoStepRange}}, 2}
Chunked: (
[10, 10]
[1, 20]
)
And this is rightfully put into a 20 by 21 shape instead of the 20 by 40 shape I was hoping for. I am not sure, whether we can do a better interface for that than just using ChunkedFillArray but this is a caveat that should be documented somewhere. |
This PR contains a more efficient construction of
ConcatDiskArray
s and fixes #248 . In addition, it is now allowed that the array of arrays containsmissing
entries, which means that these represent tiles that contains only missings. These missing values are never allocated but will be inserted on the fly inside thereadblock!
machinery. This might make constructing mosaics of tiles less cumbersome and no need for tricks like rechunking FillArrays or similar (like in meggart/DiskArrayTools.jl#37) just to represent missing values.