-
Notifications
You must be signed in to change notification settings - Fork 24
load_geojson
: Several improvements, especially around empty propert…
#549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: draft
Are you sure you want to change the base?
Conversation
3e04b16
to
8a303fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine to me, but I'd rather wait for a feedback from @soxofaan since I'm not using this process so often.
proposals/load_geojson.json
Outdated
@@ -18,7 +18,7 @@ | |||
}, | |||
{ | |||
"name": "properties", | |||
"description": "A list of properties from the GeoJSON file to construct an additional dimension from. A new dimension with the name `properties` and type `other` is created if at least one property is provided. Only applies for GeoJSON Features and FeatureCollections. Missing values are generally set to no-data (`null`).\n\nDepending on the number of properties provided, the process creates the dimension differently:\n\n- Single property with scalar values: A single dimension label with the name of the property and a single value per geometry.\n- Single property of type array: The dimension labels correspond to the array indices. There are as many values and labels per geometry as there are for the largest array.\n- Multiple properties with scalar values: The dimension labels correspond to the property names. There are as many values and labels per geometry as there are properties provided here.", | |||
"description": "A list of properties from the GeoJSON file to construct an additional dimension from. A new dimension with the name `properties` and type `other` is created. If no property is provided, a label with value `id` is created and all values are set to a no-data value. Missing values are generally set to a no-data value.\n\nDepending on the number of properties provided, the process creates the dimension differently:\n\n- Single property with scalar values: A single dimension label with the name of the property and a single value per geometry.\n- Single property of type array: The dimension labels correspond to the array indices. There are as many values and labels per geometry as there are for the largest array.\n- Multiple properties with scalar values: The dimension labels correspond to the property names. There are as many values and labels per geometry as there are properties provided here.\n\nAlthough mixing data types of values is generally allowed, certain combinations of data type may be unsupported by some implementations. It is recommended to keep the data types homogeneous.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial suggestion (unfortunately the diff-view does not seem to work properly on these long lines):
"description": "A list of properties from the GeoJSON file to construct an additional dimension from. A new dimension with the name `properties` and type `other` is created. If no property is provided, a label with value `id` is created and all values are set to a no-data value. Missing values are generally set to a no-data value.\n\nDepending on the number of properties provided, the process creates the dimension differently:\n\n- Single property with scalar values: A single dimension label with the name of the property and a single value per geometry.\n- Single property of type array: The dimension labels correspond to the array indices. There are as many values and labels per geometry as there are for the largest array.\n- Multiple properties with scalar values: The dimension labels correspond to the property names. There are as many values and labels per geometry as there are properties provided here.\n\nAlthough mixing data types of values is generally allowed, certain combinations of data type may be unsupported by some implementations. It is recommended to keep the data types homogeneous.", | |
"description": "A list of properties from the GeoJSON data to construct an additional dimension from. A new dimension with the name `properties` and type `other` is created. Missing values are generally set to a no-data value.\n\nDepending on the number of properties provided, the process creates the dimension differently:\n\n- Empty `properties` list: A single dimension label with value `id` is created and all values are set to a no-data value.\n\n- Single property with scalar values: A single dimension label with the name of the property and a single value per geometry.\n- Single property of type array: The dimension labels correspond to the array indices. There are as many values and labels per geometry as there are for the largest array.\n- Multiple properties with scalar values: The dimension labels correspond to the property names. There are as many values and labels per geometry as there are properties provided here.\n\nMissing values are generally set to a no-data value.\n\nAlthough mixing data types of values is generally allowed, certain combinations of data type may be unsupported by some implementations. It is recommended to keep the data types homogeneous.", |
Some additional notes:
- about the
Empty properties list: ... id is created and all values are set to a no-data value
: we could also set auto-increment values, which makes this case (which is the default by the way) quite a bit more useful I would say - about
Single property with scalar values/of type array
: I'm a bit hesitant about involving the type of the property value in the behavior of this process (e.g. to unroll arrays into separate "columns"). I guess this is to be more user friendly, but to me it feels a bit too magic and it undermines consistency and predictability. - Slightly related: what happens if multiple properties are listed, but some of them have array values: would that be an error?
What would feel more strict to me is to allow the properties
argument not only be an array of strings like it is now, but also a single string; and maybe even null
, which could then be the default:
properties
arg isnull
(default) -> labelid
with no-data (or auto-increment) valuesproperties
arg is array (must be non-empty) -> values are expected to be scalar. No automatic unrolling: to be decided if it is allowed to have arrays as pixel values or to raise error here.properties
arg is string -> values are expected to be an array to be unrolled (and raise error when they are scalar)
I think that will make the API more clean and predictable, because it is more explicit about assumptions and it avoids brittle value sniffing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about the Empty properties list: ... id is created and all values are set to a no-data value: we could also set auto-increment values, which makes this case (which is the default by the way) quite a bit more useful I would say
I think the intention was to not change the features, so I'd be hesistant to create something new here.
A simple load_geojson => save_result would add data, which was not there before.
about Single property with scalar values/of type array: I'm a bit hesitant about involving the type of the property value in the behavior of this process (e.g. to unroll arrays into separate "columns"). I guess this is to be more user friendly, but to me it feels a bit too magic and it undermines consistency and predictability.
What would happen in case of arrays then? openEO datacubes only consist of scalars, so we wouldn't be able to load any property that is of type array if we remove the "magic".
Slightly related: what happens if multiple properties are listed, but some of them have array values: would that be an error?
I guess that should result in an error, indeed. Or we create something like "property.index".
What would feel more strict to me is to allow the properties argument not only be an array of strings like it is now, but also a single string; and maybe even null, which could then be the default:
I don't understand this. Why is this more strict? My impression would be that this is less strict (and a rather verbose schema). A single string is equivalent to an array with one string and null is equivalent to an empty array. I don't get it. The same rules that apply could also apply to the array schema, especially as that would still be allowed unless we set minItems 2 in the schena for the array. I think I'd prefer keeping the input data type as array only.
526ed6d
to
d51ffba
Compare
…ies parameter #448 Co-authored-by: Stefaan Lippens <soxofaan@users.noreply.github.com>
d51ffba
to
63ea54d
Compare
load_geojson
:properties
parameter is emptyCloses #448