Commit 9910493

use record type names to drive version declaration, rather than identifiers
1 parent 2fe38fc commit 9910493

5 files changed: +211 -163 lines changed

docs/src/faq.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -12,6 +12,6 @@ Technically, Legolas.jl's core `@schema`/`@version` functionality is agnostic to
1212

1313
Otherwise, with regards to (de)serialization-specific functionality, Beacon has put effort into ensuring Legolas.jl works well with [Arrow.jl](https://github.com/JuliaData/Arrow.jl) "by default" simply because we're heavy users of the Arrow format. There's nothing stopping users from composing the package with [JSON3.jl](https://github.com/quinnj/JSON3.jl) or other packages.
1414

15-
## Why are Legolas.jl's generated record types defined the way that they are? For example, why is the version number hardcoded
15+
## Why are Legolas.jl's generated record types defined the way that they are? For example, why is the version number hardcoded in the type name?
1616

1717
Many of Legolas' current choices on this front stem from refactoring efforts undertaken as part of [this pull request](https://github.com/beacon-biosignals/Legolas.jl/pull/54), and directly resulted from a [design mini-investigation](https://gist.github.com/jrevels/fdfe939109bee23566d425440b7c759e) associated with those efforts.

docs/src/schema-concepts.md

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,8 @@ Schema authors should follow the below conventions when choosing the name of a n
1919

2020
1. Include a namespace. For example, assuming the schema is defined in a package Foo.jl, `foo.automobile` is good, `automobile` is bad.
2121
2. Prefer singular over plural. For example, `foo.automobile` is good, `foo.automobiles` is bad.
22-
3. Don't "overqualify" the schema name with ancestor-derived information. For example, `bar.automobile@1>foo.automobile@1` is good, `baz.supercar@1>bar.automobile@1` is good, `bar.foo.automobile@1>foo.automobile@1` is bad, `baz.automobile.supercar@1>bar.automobile@1` is bad.
22+
3. Don't "overqualify" a schema name with ancestor-derived information that is better captured by the fully qualified identifier of a specific schema version. For example,
23+
`bar.automobile` should be preferred over `bar.foo.automobile`, since `bar.automobile@1>foo.automobile@1` is preferable to `bar.foo.automobile@1>foo.automobile@1`. Similarly, `baz.supercar` should be preferred over `baz.automobile.supercar`, since `baz.supercar@1>bar.automobile@1` is preferable to `baz.automobile.supercar@1>bar.automobile@1`.
2324

2425
## Schema Versioning: You Break It, You Bump It
2526

@@ -33,7 +34,7 @@ For example, a schema author must introduce a new schema version for any of the
3334
- An existing required field's type restriction is tightened.
3435
- An existing required field is renamed.
3536

36-
One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@version("my-schema@1", ...)` and `@version("my-schema@2", ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema version and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments.
37+
One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@version(FooV1, ...)` and `@version(FooV2, ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema version and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments.
3738

3839
Note that it is preferable to avoid introducing new versions of an existing schema, if possible, in order to minimize code/data churn for downstream producers/consumers. Thus, authors should prefer conservative field type restrictions from the get-go. Remember: loosening a field type restriction is not a breaking change, but tightening one is.
3940
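
The "loosening is fine, tightening is breaking" rule stated in the versioning section can be illustrated with plain Julia subtype checks. The sketch below is only an illustration under that assumption: `constraint_change` and `requires_version_bump` are hypothetical helpers, not part of Legolas.jl, and they consider a single field's type constraint in isolation.

```julia
# Hypothetical helpers (not part of Legolas.jl): classify a change to a
# required field's type constraint using plain subtype checks.
function constraint_change(old::Type, new::Type)
    old == new && return :unchanged
    old <: new && return :loosened    # every previously valid value is still valid
    new <: old && return :tightened   # some previously valid values are now rejected
    return :incompatible              # neither constraint contains the other
end

# Under the rule above, only an unchanged or loosened constraint avoids a new version.
requires_version_bump(old::Type, new::Type) = !(old <: new)

constraint_change(Int, Real)               # :loosened
constraint_change(AbstractVector, Vector)  # :tightened
constraint_change(String, Int)             # :incompatible
```

This is why the note above recommends conservative type restrictions from the start: a field declared as `Real` can later accept any numeric subtype without a bump, while narrowing `Real` to `Int` cannot be done in place.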

examples/tour.jl

Lines changed: 59 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -18,25 +18,41 @@ using Legolas: @schema, @version, complies_with, find_violation, validate
1818
# let's start the tour by declaring a new Legolas schema via the `@schema` macro.
1919

2020
# Here, we declare a new schema named `example.foo`, specifying that Legolas should
21-
# use the prefix `Foo` whenever it generates `example.foo`-related type definitions:
21+
# use the prefix `Foo` for all `example.foo`-related type definitions:
2222
@schema "example.foo" Foo
2323

2424
# The above schema declaration provides the necessary scaffolding to start declaring
2525
# new *versions* of the `example.foo` schema. Schema version declarations specify the
2626
# set of required fields that a given table (or row) must contain in order to comply
2727
# with that schema version. Let's use the `@version` macro to declare an initial
2828
# version of the `example.foo` schema with some required fields:
29-
@version "example.foo@1" begin
29+
@version FooV1 begin
3030
a::Real
3131
b::String
3232
c
3333
d::AbstractVector
3434
end
3535

36-
# Behind the scenes, this `@version` declaration automatically generated some type definitions
37-
# and overloaded a bunch of useful Legolas methods with respect to `example.foo@1`. One of the
38-
# types it generated is `FooSchemaV1`, an alias for `Legolas.SchemaVersion`:
39-
@test FooSchemaV1() == Legolas.SchemaVersion("example.foo", 1)
36+
# In the above declaration, the symbol `FooV1` can be broken into the prefix `Foo` (as
37+
# specified in `example.foo`'s `@schema` declaration) and `1`, the integer that identifies
38+
# this particular version of the `example.foo` schema. The `@version` macro requires this
39+
# symbol to always follow this format (`$(prefix)V$(integer)`), because it generates two
40+
# special types that match it. For example, our `@version` declaration above generated:
41+
#
42+
# - `FooV1`: A special subtype of `Tables.AbstractRow` whose fields match the corresponding
43+
# schema version's declared required fields.
44+
# - `FooV1SchemaVersion`: An alias for `Legolas.SchemaVersion` that matches the corresponding
45+
# schema version.
46+
47+
# Let's first examine `FooV1SchemaVersion`:
48+
@test Legolas.SchemaVersion("example.foo", 1) == FooV1SchemaVersion()
49+
@test Legolas.SchemaVersion("example.foo", 1) isa FooV1SchemaVersion
50+
@test "example.foo@1" == Legolas.identifier(FooV1SchemaVersion())
51+
52+
# As you can see, Legolas' Julia-agnostic identifier for this schema version is `example.foo@1`.
53+
# To avoid confusion throughout this tour, we'll use this Julia-agnostic identifier to refer to
54+
# individual schema versions in the abstract sense, while we'll use the relevant `SchemaVersion`
55+
# aliases to specifically refer to the types that represent schema versions in Julia.
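
To make the `$(prefix)V$(integer)` naming requirement described above concrete, here is a minimal sketch of how such a symbol could be split into its two parts. `split_record_type_name` is a hypothetical helper written for illustration; it is not Legolas.jl's implementation, and the regular expression is an assumption about what counts as a well-formed name.

```julia
# Hypothetical helper (not part of Legolas.jl): split a record type name of the
# form `$(prefix)V$(integer)` into its prefix and version integer.
function split_record_type_name(name::Symbol)
    m = match(r"^(.+)V(\d+)$", String(name))
    m === nothing && throw(ArgumentError("expected `\$(prefix)V\$(integer)`, got `$name`"))
    return (prefix=Symbol(m.captures[1]), version=parse(Int, m.captures[2]))
end

split_record_type_name(:FooV1)         # (prefix = :Foo, version = 1)
split_record_type_name(:ChildParamV1)  # (prefix = :ChildParam, version = 1)
```

A name without a trailing `V$(integer)` suffix, such as `:Foo`, raises an `ArgumentError` in this sketch.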
4056

4157
#####
4258
##### `Tables.Schema` Compliance/Validation
@@ -53,28 +69,28 @@ for s in [Tables.Schema((:a, :b, :c, :d), (Real, String, Any, AbstractVector)),
5369
Tables.Schema((:a, :b, :d), (Int, String, Vector)), # Fields whose declared type constraints are `>:Missing` may be elided entirely.
5470
Tables.Schema((:a, :x, :b, :y, :d), (Int, Any, String, Any, Vector))] # Non-required fields may also be present.
5571
# if `complies_with` finds a violation, it returns `false`; returns `true` otherwise
56-
@test complies_with(s, FooSchemaV1())
72+
@test complies_with(s, FooV1SchemaVersion())
5773

5874
# if `validate` finds a violation, it throws an error indicating the violation;
5975
# returns `nothing` otherwise
60-
@test validate(s, FooSchemaV1()) isa Nothing
76+
@test validate(s, FooV1SchemaVersion()) isa Nothing
6177

6278
# if `find_violation` finds a violation, it returns a tuple indicating the relevant
6379
# field and its violation; returns `nothing` otherwise
64-
@test isnothing(find_violation(s, FooSchemaV1()))
80+
@test isnothing(find_violation(s, FooV1SchemaVersion()))
6581
end
6682

6783
# ...while the below `Tables.Schema`s do not:
6884

6985
s = Tables.Schema((:a, :c, :d), (Int, Float64, Vector)) # The required non-`>:Missing` field `b::String` is not present.
70-
@test !complies_with(s, FooSchemaV1())
71-
@test_throws ArgumentError validate(s, FooSchemaV1())
72-
@test isequal(find_violation(s, FooSchemaV1()), :b => missing)
86+
@test !complies_with(s, FooV1SchemaVersion())
87+
@test_throws ArgumentError validate(s, FooV1SchemaVersion())
88+
@test isequal(find_violation(s, FooV1SchemaVersion()), :b => missing)
7389

7490
s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of required field `d::AbstractVector` is not `<:AbstractVector`.
75-
@test !complies_with(s, FooSchemaV1())
76-
@test_throws ArgumentError validate(s, FooSchemaV1())
77-
@test isequal(find_violation(s, FooSchemaV1()), :d => Any)
91+
@test !complies_with(s, FooV1SchemaVersion())
92+
@test_throws ArgumentError validate(s, FooV1SchemaVersion())
93+
@test isequal(find_violation(s, FooV1SchemaVersion()), :d => Any)
7894

7995
# The expectations that characterize Legolas' particular notion of "schematic compliance" - requiring the
8096
# presence of pre-specified declared fields, assuming non-present fields to be implicitly `missing`, and allowing
@@ -87,12 +103,14 @@ s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of r
87103
#####
88104
##### Legolas-Generated Record Types
89105
#####
90-
# In addition to `FooSchemaV1`, `example.foo@1`'s `@version` declaration also generated a new type,
91-
# `FooV1 <: Tables.AbstractRow`, whose fields are guaranteed to match all the fields required by
92-
# `example.foo@1`. We refer to such Legolas-generated types as "Legolas record types" (see
93-
# https://en.wikipedia.org/wiki/Record_(computer_science)).
94106

95-
# Legolas record type constructors accept keyword arguments or `Tables.AbstractRow`-compliant values:
107+
# As mentioned in this tour's introduction, `FooV1` is a subtype of `Tables.AbstractRow` whose fields are guaranteed to
108+
# match all the fields required by `example.foo@1`. We refer to such Legolas-generated types as "record types" (see
109+
# https://en.wikipedia.org/wiki/Record_(computer_science)). These record types are direct subtypes of
110+
# `Legolas.AbstractRecord`, which is, itself, a subtype of `Tables.AbstractRow`:
111+
@test FooV1 <: Legolas.AbstractRecord <: Tables.AbstractRow
112+
113+
# Record type constructors accept keyword arguments or `Tables.AbstractRow`-compliant values:
96114
fields = (a=1.0, b="hi", c=π, d=[1, 2, 3])
97115
@test NamedTuple(FooV1(; fields...)) == fields
98116
@test NamedTuple(FooV1(fields)) == fields
@@ -129,7 +147,7 @@ foo = FooV1(; a=1.0, b="hi", d=[1, 2, 3])
129147
# any such assignments, so let's declare a new schema version `example.bar@1` that does:
130148
@schema "example.bar" Bar
131149

132-
@version "example.bar@1" begin
150+
@version BarV1 begin
133151
x::Union{Int8,Missing} = ismissing(x) ? x : Int8(clamp(x, -128, 127))
134152
y::String = string(y)
135153
z::String = ismissing(z) ? string(y, '_', x) : z
@@ -177,7 +195,7 @@ const GLOBAL_STATE = Ref(0)
177195

178196
@schema "example.bad" Bad
179197

180-
@version "example.bad@1" begin
198+
@version BadV1 begin
181199
x::Int = x + 1
182200
y = (GLOBAL_STATE[] += y; GLOBAL_STATE[])
183201
end
@@ -198,7 +216,7 @@ fields = (x=1, y=1)
198216
# as an "extension" of `example.bar@1`:
199217
@schema "example.baz" Baz
200218

201-
@version "example.baz@1 > example.bar@1" begin
219+
@version BazV1 > BarV1 begin
202220
x::Int8
203221
z::String
204222
k::Int64 = ismissing(k) ? length(z) : k
@@ -211,14 +229,14 @@ end
211229
# For a given Legolas schema version extension to be valid, all `Tables.Schema`s that comply with the child
212230
# must comply with the parent, but the reverse need not be true. We can check a schema version's required fields
213231
# and their type constraints via `Legolas.required_fields`. Based on these outputs, it is a worthwhile exercise
214-
# to confirm for yourself that `BazSchemaV1` is a valid extension of `BarSchemaV1` under the aforementioned rule:
215-
@test Legolas.required_fields(BarSchemaV1()) == (x=Union{Missing,Int8}, y=String, z=String)
216-
@test Legolas.required_fields(BazSchemaV1()) == (x=Int8, y=String, z=String, k=Int64)
232+
# to confirm for yourself that `BazV1SchemaVersion` is a valid extension of `BarV1SchemaVersion` under the aforementioned rule:
233+
@test Legolas.required_fields(BarV1SchemaVersion()) == (x=Union{Missing,Int8}, y=String, z=String)
234+
@test Legolas.required_fields(BazV1SchemaVersion()) == (x=Int8, y=String, z=String, k=Int64)
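
The exercise suggested above can be approximated mechanically. The following sketch is a simplified check under stated assumptions, not Legolas.jl's actual validation logic: it treats the required fields as plain `NamedTuple`s of types (copied by hand from the tests above) and asks whether every field the parent requires is constrained by the child to a subtype of the parent's constraint. `is_plausible_extension` and the `broken` tuple are hypothetical.

```julia
# Hypothetical, simplified check (not Legolas.jl's implementation): the child
# must constrain every parent-required field to a subtype of the parent's constraint.
function is_plausible_extension(child::NamedTuple, parent::NamedTuple)
    return all(keys(parent)) do name
        haskey(child, name) && child[name] <: parent[name]
    end
end

bar = (x=Union{Missing,Int8}, y=String, z=String)
baz = (x=Int8, y=String, z=String, k=Int64)
broken = (x=Any, y=String, z=String)  # assumed shape of a child that declares `x::Any`

is_plausible_extension(baz, bar)     # true
is_plausible_extension(broken, bar)  # false
```

The `broken` case fails because `Any` is not a subtype of `Union{Missing,Int8}`, which mirrors the reason given below for why declaring `x::Any` in an extension of `example.bar@1` is rejected.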
217235

218236
# As a counterexample, the following is invalid, because the declaration of `x::Any` would allow for `x`
219237
# values that are disallowed by the parent schema version `example.bar@1`:
220238
@schema "example.broken" Broken
221-
@test_throws Legolas.SchemaVersionDeclarationError @version "example.broken@1 > example.bar@1" begin x::Any end
239+
@test_throws Legolas.SchemaVersionDeclarationError @version BrokenV1 > BarV1 begin x::Any end
222240

223241
# Record type constructors generated for extension schema versions will apply the parent's field
224242
# assignments before applying the child's field assignments. Notice how `BazV1` applies the
@@ -248,18 +266,18 @@ end
248266
##### Schema Versioning
249267
#####
250268

251-
# Throughout this tour, all `@version` declarations have used the version number `1`, and thus every generated
252-
# record type and `SchemaVersion` alias has had the suffix `V1`. As you might guess, you can declare more than
253-
# a single version of any given schema, and the generated types' suffix will always match the version integer:
269+
# Throughout this tour, all `@version` declarations have used the version number `1`. As you might guess, you can
270+
# declare more than a single version of any given schema. Here's an example using the `example.foo` schema we defined
271+
# earlier:
254272

255-
@version "example.foo@2" begin
273+
@version FooV2 begin
256274
a::Float64
257275
b::String
258276
c::Int
259277
d::Vector
260278
end
261279

262-
@test FooSchemaV2() == Legolas.SchemaVersion("example.foo", 2)
280+
@test FooV2SchemaVersion() == Legolas.SchemaVersion("example.foo", 2)
263281

264282
fields = (a=1.0, b="b", c=3, d=[1,2,3])
265283
@test NamedTuple(FooV2(fields)) == fields
@@ -279,7 +297,7 @@ fields = (a=1.0, b="b", c=3, d=[1,2,3])
279297

280298
@schema "example.param" Param
281299

282-
@version "example.param@1" begin
300+
@version ParamV1 begin
283301
a::Int
284302
b::(<:Real)
285303
c
@@ -297,7 +315,7 @@ end
297315

298316
@schema "example.child-param" ChildParam
299317

300-
@version "example.child-param@1 > example.param@1" begin
318+
@version ChildParamV1 > ParamV1 begin
301319
c::(<:Union{Real,String})
302320
d::(<:Union{Real,Missing})
303321
e
@@ -330,24 +348,24 @@ table_isequal(a, b) = isequal(Legolas.materialize(a), Legolas.materialize(b))
330348
# key whose value is `Legolas.schema_identifier(schema)`. This field enables consumers of the table to
331349
# perform automated (or manual) schema discovery/evolution/validation.
332350
io = IOBuffer()
333-
Legolas.write(io, table, BazSchemaV1())
351+
Legolas.write(io, table, BazV1SchemaVersion())
334352
t = Arrow.Table(seekstart(io))
335353
@test Arrow.getmetadata(t) == Dict("legolas_schema_qualified" => "example.baz@1>example.bar@1")
336354
@test table_isequal(t, Arrow.Table(Arrow.tobuffer(table)))
337-
@test table_isequal(t, Arrow.Table(Legolas.tobuffer(table, BazSchemaV1()))) # `Legolas.tobuffer` is analogous to `Arrow.tobuffer`
355+
@test table_isequal(t, Arrow.Table(Legolas.tobuffer(table, BazV1SchemaVersion()))) # `Legolas.tobuffer` is analogous to `Arrow.tobuffer`
338356

339357
# Similarly, Legolas provides `Legolas.read(src)`, which wraps `Arrow.Table(src)`, but
340358
# validates the deserialized `Arrow.Table` against its declared schema version before
341359
# returning it:
342-
@test table_isequal(Legolas.read(Legolas.tobuffer(table, BazSchemaV1())), t)
360+
@test table_isequal(Legolas.read(Legolas.tobuffer(table, BazV1SchemaVersion())), t)
343361
msg = """
344362
could not extract valid `Legolas.SchemaVersion` from the `Arrow.Table` read
345363
via `Legolas.read`; is it missing the expected custom metadata and/or the
346364
expected \"legolas_schema_qualified\" field?
347365
"""
348366
@test_throws ArgumentError(msg) Legolas.read(Arrow.tobuffer(table))
349367
invalid = [Tables.rowmerge(row; k=string(row.k)) for row in table]
350-
invalid_but_has_metadata = Arrow.tobuffer(invalid; metadata=("legolas_schema_qualified" => Legolas.identifier(BazSchemaV1()),))
368+
invalid_but_has_metadata = Arrow.tobuffer(invalid; metadata=("legolas_schema_qualified" => Legolas.identifier(BazV1SchemaVersion()),))
351369
@test_throws ArgumentError("field `k` has unexpected type; expected <:Int64, found String") Legolas.read(invalid_but_has_metadata)
352370

353371
# A note about one additional benefit of `Legolas.read`/`Legolas.write`: Unlike their Arrow.jl counterparts,
@@ -364,7 +382,7 @@ invalid_but_has_metadata = Arrow.tobuffer(invalid; metadata=("legolas_schema_qua
364382

365383
@schema "example.portable" Portable
366384

367-
@version "example.portable@1" begin
385+
@version PortableV1 begin
368386
id::UUID = UUID(id)
369387
end
370388

@@ -380,8 +398,8 @@ end
380398
# since its UUID conversion behavior (and the corresponding type constraint) may be useful for validated construction.
381399

382400
# Luckily, it turns out that Legolas is actually smart enough to natively support this by default:
383-
@test complies_with(Tables.Schema((:id,), (UUID,)), PortableSchemaV1())
384-
@test complies_with(Tables.Schema((:id,), (UInt128,)), PortableSchemaV1())
401+
@test complies_with(Tables.Schema((:id,), (UUID,)), PortableV1SchemaVersion())
402+
@test complies_with(Tables.Schema((:id,), (UInt128,)), PortableV1SchemaVersion())
385403

386404
# How is this possible? Well, when Legolas checks whether a given field `f::T` matches a required field `f::F`, it doesn't
387405
# directly check that `T <: F`; instead, it checks that `T <: Legolas.accepted_field_type(sv, F)` where `sv` is the relevant
