Commit 14edc59

Fix columntable materialization on stored schema (#360)
Fixes #357. For a stored schema, the schema's type is `Schema{nothing, nothing}`, which usually indicates a table with many columns. Some table implementations, however, like ARFFFiles.jl, may choose to explicitly store _all_ schemas, even for very narrow tables. We already have a generated branch that checks a specialization threshold in the known-schema case, so the fix is fairly straightforward: check whether the stored schema's column count actually exceeds the threshold, and throw only if it does.

Users should be aware that `Tables.columntable` isn't a perfect table implementation that is always expected to work. It was originally meant as just a test implementation that then turned out to be fairly convenient for REPL use. Note also that generating a named tuple of columns from a stored schema can't be particularly efficient, since it necessarily has to generate the `NamedTuple` type at runtime.
1 parent 722ffce commit 14edc59

2 files changed: +19 -3 lines changed
src/namedtuples.jl

Lines changed: 8 additions & 3 deletions
@@ -176,9 +176,14 @@ function columntable(sch::Schema{names, types}, cols) where {names, types}
     end
 end
 
-# extremely large tables
-columntable(sch::Schema{nothing, nothing}, cols) =
-    throw(ArgumentError("input table too wide ($(length(sch.names)) columns) to convert to `NamedTuple` of `AbstractVector`s"))
+# extremely large tables or schema explicitly stored
+function columntable(sch::Schema{nothing, nothing}, cols)
+    nms = sch.names
+    if nms !== nothing && length(nms) > SPECIALIZATION_THRESHOLD
+        throw(ArgumentError("input table too wide ($(length(nms)) columns) to convert to `NamedTuple` of `AbstractVector`s"))
+    end
+    return NamedTuple{Tuple(map(Symbol, nms))}(Tuple(getarray(getcolumn(cols, nms[i])) for i = 1:length(nms)))
+end
 
 # unknown schema case
 columntable(::Nothing, cols) =
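
The stored-schema branch has to assemble the `NamedTuple` type from names that are only known at runtime. The following is a minimal, standalone sketch of that idea in plain Julia, without Tables.jl. The names `WIDTH_LIMIT` and `runtime_columntable` are hypothetical and used only for illustration, and the limit of 4 is an arbitrary stand-in, not the package's actual `SPECIALIZATION_THRESHOLD` value.

```julia
# Hypothetical width limit for this sketch; not the package's actual value.
const WIDTH_LIMIT = 4

# Build a NamedTuple of columns from names that are only known at runtime.
function runtime_columntable(names::Vector{Symbol}, columns::Vector)
    length(names) == length(columns) ||
        throw(ArgumentError("expected one column per name"))
    # Reject only tables that are actually too wide, not every runtime schema.
    if length(names) > WIDTH_LIMIT
        throw(ArgumentError("input table too wide ($(length(names)) columns)"))
    end
    # The NamedTuple type is assembled here from runtime values, so callers
    # cannot be specialized on the result type ahead of time.
    return NamedTuple{Tuple(names)}(Tuple(columns))
end

nt = runtime_columntable([:a, :b], Any[1:3, [4.0, 5.0, 6.0]])
```

Because the result type depends on runtime values, each call pays for constructing the type, which is the inefficiency the commit message warns about.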

test/runtests.jl

Lines changed: 11 additions & 0 deletions
@@ -1040,3 +1040,14 @@ end
     @test DataAPI.nrow(Tables.dictrowtable([(a=1, b=2), (a=3, b=4), (a=5, b=6)])) == 3
     @test DataAPI.ncol(Tables.dictrowtable([(a=1, b=2), (a=3, b=4), (a=5, b=6)])) == 2
 end
+
+@testset "#357" begin
+    dct = Tables.dictcolumntable((a=1:3, b=4.0:6.0, c=["7", "8", "9"]))
+    sch = Tables.schema(dct)
+    sch = Tables.Schema(sch.names, sch.types, stored=true)
+    dct = Tables.DictColumnTable(sch, getfield(dct, :values))
+    nt = Tables.columntable(dct)
+    @test nt.a == [1, 2, 3]
+    @test nt.b == [4.0, 5.0, 6.0]
+    @test nt.c == ["7", "8", "9"]
+end
