| 1 | +# Implementing the Interface (i.e. becoming a Tables.jl source) |
| 2 | + |
| 3 | +Now that we've seen how one _uses_ the Tables.jl interface, let's walk through how to implement it; i.e. how can I |
| 4 | +make my custom type valid for Tables.jl consumers? |
| 5 | + |
| 6 | +For a type `MyTable`, the interface to becoming a proper table is straightforward: |
| 7 | + |
| 8 | +| Required Methods | Default Definition | Brief Description | |
| 9 | +|----------------------------------------------|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------| |
| 10 | +| `Tables.istable(::Type{MyTable})` | | Declare that your table type implements the interface | |
| 11 | +| **One of:** | | | |
| 12 | +| `Tables.rowaccess(::Type{MyTable})` | | Declare that your table type defines a `Tables.rows(::MyTable)` method | |
| 13 | +| `Tables.rows(x::MyTable)` | | Return a `Tables.AbstractRow`-compatible iterator from your table | |
| 14 | +| **Or:** | | | |
| 15 | +| `Tables.columnaccess(::Type{MyTable})` | | Declare that your table type defines a `Tables.columns(::MyTable)` method | |
| 16 | +| `Tables.columns(x::MyTable)` | | Return a `Tables.AbstractColumns`-compatible object from your table | |
| 17 | +| **Optional methods** | | | |
| 18 | +| `Tables.schema(x::MyTable)` | `Tables.schema(x) = nothing` | Return a [`Tables.Schema`](@ref) object from your `Tables.AbstractRow` iterator or `Tables.AbstractColumns` object; or `nothing` for unknown schema | |
| 19 | +| `Tables.materializer(::Type{MyTable})` | `Tables.columntable` | Declare a "materializer" sink function for your table type that can construct an instance of your type from any Tables.jl input | |
| 20 | +| `Tables.subset(x::MyTable, inds; viewhint)` | | Return a row or a sub-table of the original table | |
| 21 | +| `DataAPI.nrow(x::MyTable)` | | Return number of rows of table `x` | |
| 22 | +| `DataAPI.ncol(x::MyTable)` | | Return number of columns of table `x` | |
| 23 | + |
| 24 | +Based on whether your table type has defined `Tables.rows` or `Tables.columns`, you then ensure that the `Tables.AbstractRow` iterator |
| 25 | +or `Tables.AbstractColumns` object satisfies the respective interface. |
| 26 | + |
| 27 | +As an additional source of documentation, see [this Discourse post](https://discourse.julialang.org/t/struggling-to-implement-tables-jl-interface-for-vector-mystruct/42318/7?u=quinnj), which walks through making a row-oriented table in detail. |
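|  | + |
|  | +To make the table of methods above concrete, here is a minimal sketch of a column-oriented source. `ColumnStore` and its fields are hypothetical names used only for illustration, and the sketch assumes that returning a `NamedTuple` of `Vector`s from `Tables.columns` is acceptable for your type. |
|  | + |
|  | +```julia |
|  | +using Tables |
|  | + |
|  | +# hypothetical column-oriented type, for illustration only |
|  | +struct ColumnStore |
|  | +    names::Vector{Symbol} |
|  | +    data::Vector{Vector{Float64}} |
|  | +end |
|  | + |
|  | +# declare that ColumnStore is a table with column access |
|  | +Tables.istable(::Type{ColumnStore}) = true |
|  | +Tables.columnaccess(::Type{ColumnStore}) = true |
|  | +# a NamedTuple of Vectors is already a valid `Tables.AbstractColumns`-compatible object |
|  | +Tables.columns(t::ColumnStore) = NamedTuple{Tuple(t.names)}(Tuple(t.data)) |
|  | + |
|  | +store = ColumnStore([:a, :b], [[1.0, 2.0], [3.0, 4.0]]) |
|  | +ct = Tables.columntable(store)  # consume the source as a column table |
|  | +rt = Tables.rowtable(store)     # or as a row table, via the generic row fallback |
|  | +``` |
|  | + |
|  | +Because only the column side is defined here, row consumers rely on Tables.jl's generic fallback for iterating rows over a columns object. |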
| 28 | + |
| 29 | +## `Tables.AbstractRow` |
| 30 | + |
| 31 | +```@docs; canonical = false |
| 32 | +Tables.AbstractRow |
| 33 | +``` |
| 34 | + |
| 35 | +## `Tables.AbstractColumns` |
| 36 | + |
| 37 | +```@docs; canonical = false |
| 38 | +Tables.AbstractColumns |
| 39 | +``` |
| 40 | + |
| 41 | +## Implementation Example |
| 42 | +As an extended example, let's take a look at some code defined in Tables.jl for treating `AbstractVecOrMat`s as tables. |
| 43 | + |
| 44 | +First, we define a special `MatrixTable` type that will wrap an `AbstractVecOrMat`, and allow easy overloading for the |
| 45 | +Tables.jl interface. |
| 46 | + |
| 47 | +```julia |
| 48 | +struct MatrixTable{T <: AbstractVecOrMat} <: Tables.AbstractColumns |
| 49 | + names::Vector{Symbol} |
| 50 | + lookup::Dict{Symbol, Int} |
| 51 | + matrix::T |
| 52 | +end |
| 53 | +# declare that MatrixTable is a table |
| 54 | +Tables.istable(::Type{<:MatrixTable}) = true |
| 55 | +# getter methods to avoid getproperty clash |
| 56 | +names(m::MatrixTable) = getfield(m, :names) |
| 57 | +matrix(m::MatrixTable) = getfield(m, :matrix) |
| 58 | +lookup(m::MatrixTable) = getfield(m, :lookup) |
| 59 | +# schema is column names and types |
| 60 | +Tables.schema(m::MatrixTable{T}) where {T} = Tables.Schema(names(m), fill(eltype(T), size(matrix(m), 2))) |
| 61 | +``` |
| 62 | + |
| 63 | +Here we defined `Tables.istable` for all `MatrixTable` types, signaling that they implement the Tables.jl interface. |
| 64 | +We also defined [`Tables.schema`](@ref) from the column names we stored; since an `AbstractVecOrMat` has a single |
| 65 | +`eltype`, we repeat it for each column (the call to `fill`). Note that defining [`Tables.schema`](@ref) is optional on tables; by default, `nothing` |
| 66 | +is returned and Tables.jl consumers should account for both known and unknown schema cases. Returning a schema when possible lets consumers |
| 67 | +generate more efficient code, because they can know the types of all columns up front (provided the number of |
| 68 | +columns isn't too large). |
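|  | + |
|  | +As a rough sketch of what "accounting for both cases" can look like on the consumer side, the hypothetical helper `column_types` below uses the schema when one is available and otherwise falls back to inspecting each column. The helper name and its return shape are assumptions made for this illustration, not part of Tables.jl. |
|  | + |
|  | +```julia |
|  | +using Tables |
|  | + |
|  | +# hypothetical consumer helper: pair each column name with its element type |
|  | +function column_types(tbl) |
|  | +    cols = Tables.columns(tbl) |
|  | +    sch = Tables.schema(cols) |
|  | +    if sch === nothing |
|  | +        # unknown schema: discover names and types from the columns themselves |
|  | +        return [nm => eltype(Tables.getcolumn(cols, nm)) for nm in Tables.columnnames(cols)] |
|  | +    else |
|  | +        # known schema: names and types are available up front |
|  | +        return [nm => T for (nm, T) in zip(sch.names, sch.types)] |
|  | +    end |
|  | +end |
|  | + |
|  | +types = column_types((a=[1, 2], b=["x", "y"])) |
|  | +``` |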
| 69 | + |
| 70 | +Now, in this example, we're actually going to have `MatrixTable` implement _both_ `Tables.rows` and `Tables.columns` |
| 71 | +methods itself, i.e. it's going to return itself from those functions, so here's first how we make our `MatrixTable` a |
| 72 | +valid `Tables.AbstractColumns` object: |
| 73 | + |
| 74 | +```julia |
| 75 | +# column interface |
| 76 | +Tables.columnaccess(::Type{<:MatrixTable}) = true |
| 77 | +Tables.columns(m::MatrixTable) = m |
| 78 | +# required Tables.AbstractColumns object methods |
| 79 | +Tables.getcolumn(m::MatrixTable, ::Type{T}, col::Int, nm::Symbol) where {T} = matrix(m)[:, col] |
| 80 | +Tables.getcolumn(m::MatrixTable, nm::Symbol) = matrix(m)[:, lookup(m)[nm]] |
| 81 | +Tables.getcolumn(m::MatrixTable, i::Int) = matrix(m)[:, i] |
| 82 | +Tables.columnnames(m::MatrixTable) = names(m) |
| 83 | +``` |
| 84 | + |
| 85 | +We define `columnaccess` for our type, then `columns` just returns the `MatrixTable` itself, and then we define |
| 86 | +the three `getcolumn` methods and `columnnames`. Note the use of a `lookup` `Dict` that maps column name to column index |
| 87 | +so we can figure out which column to return from the matrix. We're also storing the column names in our `names` field |
| 88 | +so the `columnnames` implementation is trivial. And that's it! It can now be written out to a CSV file, |
| 89 | +stored in a SQLite or other database, converted to a DataFrame or JuliaDB table, etc. Pretty fun. |
| 90 | + |
| 91 | +And now for the `Tables.rows` implementation: |
| 92 | +```julia |
| 93 | +# declare that any MatrixTable defines its own `Tables.rows` method |
| 94 | +Tables.rowaccess(::Type{<:MatrixTable}) = true |
| 95 | +# just return itself, which means MatrixTable must iterate `Tables.AbstractRow`-compatible objects |
| 96 | +Tables.rows(m::MatrixTable) = m |
| 97 | +# the iteration interface, at a minimum, requires `eltype`, `length`, and `iterate` |
| 98 | +# for `MatrixTable` `eltype`, we're going to provide a custom row type |
| 99 | +Base.eltype(m::MatrixTable{T}) where {T} = MatrixRow{T} |
| 100 | +Base.length(m::MatrixTable) = size(matrix(m), 1) |
| 101 | + |
| 102 | +Base.iterate(m::MatrixTable, st=1) = st > length(m) ? nothing : (MatrixRow(st, m), st + 1) |
| 103 | + |
| 104 | +# a custom row type; acts as a "view" into a row of an AbstractVecOrMat |
| 105 | +struct MatrixRow{T} <: Tables.AbstractRow |
| 106 | + row::Int |
| 107 | + source::MatrixTable{T} |
| 108 | +end |
| 109 | +# required `Tables.AbstractRow` interface methods (same as for `Tables.AbstractColumns` object before) |
| 110 | +# but this time, on our custom row type |
| 111 | +Tables.getcolumn(m::MatrixRow, ::Type, col::Int, nm::Symbol) = |
| 112 | +    getfield(getfield(m, :source), :matrix)[getfield(m, :row), col] |
| 113 | +Tables.getcolumn(m::MatrixRow, i::Int) = |
| 114 | +    getfield(getfield(m, :source), :matrix)[getfield(m, :row), i] |
| 115 | +Tables.getcolumn(m::MatrixRow, nm::Symbol) = |
| 116 | +    getfield(getfield(m, :source), :matrix)[getfield(m, :row), getfield(getfield(m, :source), :lookup)[nm]] |
| 117 | +Tables.columnnames(m::MatrixRow) = names(getfield(m, :source)) |
| 118 | +``` |
| 119 | +Here we start by defining `Tables.rowaccess` and `Tables.rows`, and then the iteration interface methods, |
| 120 | +since we declared that a `MatrixTable` itself is an iterator of `Tables.AbstractRow`-compatible objects. For `eltype`, |
| 121 | +we say that a `MatrixTable` iterates our own custom row type, `MatrixRow`. `MatrixRow` subtypes |
| 122 | +`Tables.AbstractRow`, which provides interface implementations for several useful behaviors (indexing, |
| 123 | +iteration, property-access, etc.); essentially it makes our custom `MatrixRow` type more convenient to work with. |
| 124 | + |
| 125 | +Implementing the `Tables.AbstractRow` interface is straightforward, and very similar to our implementation |
| 126 | +of `Tables.AbstractColumns` previously (i.e. the same methods for `getcolumn` and `columnnames`). |
| 127 | + |
| 128 | +And that's it. Our `MatrixTable` type is now a fully fledged, valid Tables.jl source and can be used throughout |
| 129 | +the ecosystem. Now, this is obviously not a lot of code; but then again, the actual Tables.jl interface |
| 130 | +implementations tend to be fairly simple, given the other behaviors that are already defined for table types |
| 131 | +(i.e. table types tend to already have a `getcolumn`-like function defined). |
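|  | + |
|  | +As a short usage sketch, the snippet below wraps a matrix with `Tables.table` (the constructor shipped with Tables.jl for this `MatrixTable` type) and then consumes it both column-wise and row-wise. The `header` keyword and the column names `:x` and `:y` are choices made for this illustration. |
|  | + |
|  | +```julia |
|  | +using Tables |
|  | + |
|  | +m = [1 2; 3 4; 5 6] |
|  | +# wrap the matrix, supplying column names instead of the default Column1, Column2 |
|  | +tbl = Tables.table(m; header=[:x, :y]) |
|  | + |
|  | +# column access: materialize as a NamedTuple of Vectors |
|  | +cols = Tables.columntable(tbl) |
|  | +# row access: each row supports property access by column name |
|  | +sums = [row.x + row.y for row in Tables.rows(tbl)] |
|  | +``` |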
| 132 | + |
| 133 | +## `Tables.isrowtable` |
| 134 | + |
| 135 | +One option for certain table types is to define `Tables.isrowtable` to automatically satisfy the Tables.jl interface. |
| 136 | +This can be convenient for "natural" table types that already iterate rows. |
| 137 | +```@docs; canonical = false |
| 138 | +Tables.isrowtable |
| 139 | +``` |
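|  | + |
|  | +A minimal sketch of this approach follows, assuming a type whose iteration already yields row-like objects (here, `NamedTuple`s) and which has a known length. `Readings` and the alias `Reading` are hypothetical names used only for illustration. |
|  | + |
|  | +```julia |
|  | +using Tables |
|  | + |
|  | +# hypothetical row type and container, for illustration only |
|  | +const Reading = NamedTuple{(:t, :v), Tuple{Int, Float64}} |
|  | +struct Readings |
|  | +    data::Vector{Reading} |
|  | +end |
|  | + |
|  | +# the type already iterates rows, so the iteration interface is all we add |
|  | +Base.iterate(r::Readings, st=1) = st > length(r.data) ? nothing : (r.data[st], st + 1) |
|  | +Base.length(r::Readings) = length(r.data) |
|  | +Base.eltype(::Type{Readings}) = Reading |
|  | + |
|  | +# declare it a row table; no explicit `istable`, `rowaccess`, or `rows` definitions |
|  | +Tables.isrowtable(::Type{Readings}) = true |
|  | + |
|  | +readings = Readings([(t=1, v=0.5), (t=2, v=1.5)]) |
|  | +ct = Tables.columntable(readings) |
|  | +``` |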
| 140 | + |
| 141 | +## Testing Tables.jl Implementations |
| 142 | + |
| 143 | +One question that comes up is what the best strategies are for testing a Tables.jl implementation. Continuing with |
| 144 | +our `MatrixTable` example, let's see some useful ways to test that things are working as expected. |
| 145 | + |
| 146 | +```julia |
| 147 | +mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"] |
| 148 | +``` |
| 149 | + |
| 150 | +First, we define a matrix literal whose three columns hold values of different types (integers, floats, and strings). |
| 151 | + |
| 152 | +```julia |
| 153 | +# first, create a MatrixTable from our matrix input |
| 154 | +mattbl = Tables.table(mat) |
| 155 | +# test that the MatrixTable `istable` |
| 156 | +@test Tables.istable(typeof(mattbl)) |
| 157 | +# test that it defines row access |
| 158 | +@test Tables.rowaccess(typeof(mattbl)) |
| 159 | +@test Tables.rows(mattbl) === mattbl |
| 160 | +# test that it defines column access |
| 161 | +@test Tables.columnaccess(typeof(mattbl)) |
| 162 | +@test Tables.columns(mattbl) === mattbl |
| 163 | +# test that we can access the first "column" of our matrix table by column name |
| 164 | +@test mattbl.Column1 == [1,2,3] |
| 165 | +# test our `Tables.AbstractColumns` interface methods |
| 166 | +@test Tables.getcolumn(mattbl, :Column1) == [1,2,3] |
| 167 | +@test Tables.getcolumn(mattbl, 1) == [1,2,3] |
| 168 | +@test Tables.columnnames(mattbl) == [:Column1, :Column2, :Column3] |
| 169 | +# now let's iterate our MatrixTable to get our first MatrixRow |
| 170 | +matrow = first(mattbl) |
| 171 | +@test eltype(mattbl) == typeof(matrow) |
| 172 | +# now we can test our `Tables.AbstractRow` interface methods on our MatrixRow |
| 173 | +@test matrow.Column1 == 1 |
| 174 | +@test Tables.getcolumn(matrow, :Column1) == 1 |
| 175 | +@test Tables.getcolumn(matrow, 1) == 1 |
| 176 | +@test propertynames(mattbl) == propertynames(matrow) == [:Column1, :Column2, :Column3] |
| 177 | +``` |
| 178 | + |
| 179 | +So, it looks like our `MatrixTable` type is looking good. It's doing everything we'd expect with regard to accessing |
| 180 | +its rows or columns via the Tables.jl API methods. Testing a table source like this is fairly straightforward since |
| 181 | +we're really just testing that our interface methods are doing what we expect them to do. |
| 182 | + |
| 183 | +Now, while we didn't go over a "sink" function for matrices in our walkthrough, there does indeed exist a `Tables.matrix` function that allows converting any table input source into a plain Julia `Matrix` object. |
| 184 | + |
| 185 | +Having both Tables.jl "source" and "sink" implementations (i.e. a type that is a Tables.jl-compatible source, |
| 186 | +as well as a way to _consume_ other tables), allows us to do some additional "round trip" testing: |
| 187 | + |
| 188 | +```julia |
| 189 | +rt = [(a=1, b=4.0, c="7"), (a=2, b=5.0, c="8"), (a=3, b=6.0, c="9")] |
| 190 | +ct = (a=[1,2,3], b=[4.0, 5.0, 6.0]) |
| 191 | +``` |
| 192 | + |
| 193 | +In addition to our `mat` object earlier, we can define a couple of simple "tables"; in this case `rt` is a kind of default "row table" as a `Vector` of `NamedTuple`s, while `ct` is a default "column table" as a `NamedTuple` of `Vector`s. Notice that they contain mostly the same data as our matrix literal earlier, yet in slightly different storage formats. These default "row" and "column" tables are supported by default in Tables.jl due to their natural table representations, and hence can be excellent tools in testing table integrations. |
| 194 | + |
| 195 | +```julia |
| 196 | +# let's turn our row table into a plain Julia Matrix object |
| 197 | +mat = Tables.matrix(rt) |
| 198 | +# test that our matrix came out like we expected |
| 199 | +@test mat[:, 1] == [1, 2, 3] |
| 200 | +@test size(mat) == (3, 3) |
| 201 | +@test eltype(mat) == Any |
| 202 | +# so we successfully consumed a row-oriented table, |
| 203 | +# now let's try with a column-oriented table |
| 204 | +mat2 = Tables.matrix(ct) |
| 205 | +@test eltype(mat2) == Float64 |
| 206 | +@test mat2[:, 1] == ct.a |
| 207 | + |
| 208 | +# now let's take our matrix input, and make a column table out of it |
| 209 | +tbl = Tables.table(mat) |> Tables.columntable |
| 210 | +@test keys(tbl) == (:Column1, :Column2, :Column3) |
| 211 | +@test tbl.Column1 == [1, 2, 3] |
| 212 | +# and same for a row table |
| 213 | +tbl2 = Tables.table(mat2) |> Tables.rowtable |
| 214 | +@test length(tbl2) == 3 |
| 215 | +@test map(x->x.Column1, tbl2) == [1.0, 2.0, 3.0] |
| 216 | +``` |