Merged
2 changes: 1 addition & 1 deletion cookbook/sections/data-consumers.adoc
@@ -1,6 +1,6 @@
== Recipes for data consumers

=== Search the Global Discovery Catalogue
=== Searching the Global Discovery Catalogue

The https://wmo-im.github.io/wis2-guide/guide/wis2-guide-APPROVED.html#_2_4_4_global_discovery_catalogue[Global Discovery Catalogue (GDC)] allows for a wide range of query predicates to search for data in WIS2 as per the OGC API - Records - Part 1: Core specification.
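
As a rough illustration of such query predicates, the sketch below builds an OGC API - Records items URL from the `q`, `bbox`, `datetime` and `limit` query parameters. The host `gdc.example.org`, the collection name `wis2-discovery-metadata` and the helper `build_gdc_query` are assumptions made for this example; check the GDC instance you are using for its actual endpoint.

.Building a GDC query URL (illustrative sketch)
[source,python]
----
from urllib.parse import urlencode


def build_gdc_query(collection_url, q=None, bbox=None, datetime_range=None, limit=10):
    """Build an items query URL (hypothetical helper, no network access)."""
    params = {}
    if q:
        params["q"] = q  # full-text search terms
    if bbox:
        # bbox is (min_lon, min_lat, max_lon, max_lat)
        params["bbox"] = ",".join(str(value) for value in bbox)
    if datetime_range:
        params["datetime"] = datetime_range  # e.g. "2024-01-01T00:00:00Z/.."
    params["limit"] = limit
    return f"{collection_url.rstrip('/')}/items?{urlencode(params)}"


# assumed endpoint, for illustration only
GDC_COLLECTION = "https://gdc.example.org/collections/wis2-discovery-metadata"

url = build_gdc_query(GDC_COLLECTION, q="ocean", bbox=(-10, 35, 5, 45), limit=5)
print(url)
----

The resulting URL can then be fetched with any HTTP client, such as `curl`.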

297 changes: 147 additions & 150 deletions cookbook/sections/data-publishers.adoc
@@ -1,6 +1,6 @@
== Recipes for data publishers

=== Validate a WIS2 Notification Message
=== Validating a WIS2 Notification Message

A https://wmo-im.github.io/wis2-notification-message[WIS2 Notification Message] provides a http://schemas.wmo.int/wnm[JSON Schema] which can be used by any programming language that supports JSON and JSON Schema validation.

@@ -17,7 +17,7 @@ curl -O http://schemas.wmo.int/wnm/1.0.0/schemas/wis2-notification-message-bundl
check-jsonschema --schemafile wis2-notification-message-bundled.json /path/to/my/wnm.json
----

The https://github.com/wmo-im/pywis-pubsub[pywis-pubsub] tool provides a test suite to validate a message against the WNM specification requirements, as well as a Python API for application integration. Consult the pywis-pubsub README on GitHub for more information/examples.
The https://github.com/World-Meteorological-Organization/pywis-pubsub[pywis-pubsub] tool provides a test suite to validate a message against the WNM specification requirements, as well as a Python API for application integration. Consult the pywis-pubsub README on GitHub for more information/examples.

.Using pywis-pubsub
[source,bash]
@@ -35,7 +35,7 @@ pywis-pubsub ets validate /path/to/file.json
pywis-pubsub ets validate https://example.org/path/to/file.json
----

=== Publish a WIS2 Notification Message with access control
=== Publishing a WIS2 Notification Message with access control

Recommended data in WIS2 may be open or access controlled. For data publication with access control implications, WNM provides a `security` object as part of a link object. The `security` object is defined using https://swagger.io/specification/#security-scheme-object[OpenAPI Security Scheme definitions].

@@ -82,9 +82,150 @@ Note:

Of course, always ensure your WNM is valid (see <<_validating_a_wis2_notification_message,Validating a WIS2 Notification Message>> for more information).

=== Validate a WMO Core Metadata Profile record
=== Publishing a WIS2 Notification Message with embedded data

The [pywcmp](https://github.com/wmo-im/pywcmp) tool provides a test suite to validate a message against the WCMP2 specification requirements, as well as a Python API for application integration. Consult the pywcmp README on GitHub for more information/examples.
A https://wmo-im.github.io/wis2-notification-message[WIS2 Notification Message] provides the ability to embed data as part of a message (see the https://wmo-im.github.io/wis2-notification-message/standard/wis2-notification-message-STABLE.html#_1_15_properties_content[Properties / Content] section of the specification).

Embedding data gives users direct access to the data without needing to download the file via the notification link (see the https://wmo-im.github.io/wis2-notification-message/standard/wis2-notification-message-STABLE.html#_1_16_links[Links] section of the specification).

In some cases, a data publisher may also not wish to manage infrastructure, such as an HTTP or FTP server, for the data link to point to.

When the data publisher decides to use the embedded data feature without implementing an actionable link that provides access to the same data, the data publisher will:

- use the `properties.content` object to provide the data
- use `properties.cache=false` to instruct WIS2 Global Cache Services to not cache the data
- provide a link object whose `href` results in HTTP 204 (No Content)

To provide an HTTP 204 link, there are a couple of options:

**Option 1: configure your web server to provide an endpoint that always returns 204**:

.Nginx example configuration
[source,console]
----
# always return HTTP 204 when accessing /no-content-here
location /no-content-here {
    return 204;
}
----

.Apache example configuration
[source,console]
----
# always return HTTP 204 when accessing /no-content-here
Redirect 204 /no-content-here
----

which then would result in the link object as follows:

[source,json]
----
{
  "href": "https://example.org/no-content-here",
  "rel": "canonical"
}
----

**Option 2: use a service that provides HTTP responses as a service**. One such service is the https://http.codes[HTTP Response API], which then would result in the link object as follows:

[source,json]
----
{
  "href": "https://http.codes/204",
  "rel": "canonical"
}
----

.Embedded data example:
[source,json]
----
{
  "id": "31e9d66a-cd83-4174-9429-b932f1abe1be",
  "conformsTo": [
    "http://wis.wmo.int/spec/wnm/1/conf/core"
  ],
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [
      6.146255135536194,
      46.223296618227444
    ]
  },
  "properties": {
    "pubtime": "2022-03-20T04:50:18Z",
    "datetime": "2022-03-20T04:45:00Z",
    "integrity": {
      "method": "sha512",
      "value": "A2KNxvks...S8qfSCw=="
    },
    "cache": false,
    "data_id": "dataset/123/data-granule/UANT01_CWAO_200445___15103.bufr4",
    "metadata_id": "urn:wmo:md:ca-eccc-msc:observations.swob",
    "content": {
      "encoding": "utf-8",
      "value": "encoded bytes from the file",
      "size": 457
    }
  },
  "links": [
    {
      "href": "https://http.codes/204",
      "rel": "canonical"
    }
  ]
}
----
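
A message like the one above could be assembled programmatically. The following is a minimal sketch rather than a reference implementation: the helper `build_embedded_wnm` and its parameters are hypothetical, and it assumes `base64` as the content encoding, a base64-encoded SHA-512 digest as the integrity value, and `size` as the byte length of the original (unencoded) data. Verify these against the WNM specification before relying on them.

.Building an embedded data message (illustrative sketch)
[source,python]
----
import base64
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_embedded_wnm(data, data_id, metadata_id, lon, lat, obs_time, no_content_url):
    """Return a WNM-like dict with the data embedded in properties.content."""
    pubtime = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    digest = hashlib.sha512(data).digest()
    return {
        "id": str(uuid.uuid4()),
        "conformsTo": ["http://wis.wmo.int/spec/wnm/1/conf/core"],
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "pubtime": pubtime,
            "datetime": obs_time,
            "integrity": {
                "method": "sha512",
                # assumption: the digest is carried as base64 text
                "value": base64.b64encode(digest).decode("ascii"),
            },
            # instruct Global Cache Services not to cache the data
            "cache": False,
            "data_id": data_id,
            "metadata_id": metadata_id,
            "content": {
                "encoding": "base64",  # assumption: base64 for binary payloads
                "value": base64.b64encode(data).decode("ascii"),
                "size": len(data),
            },
        },
        # link whose href is expected to answer with HTTP 204 (No Content)
        "links": [{"href": no_content_url, "rel": "canonical"}],
    }


message = build_embedded_wnm(
    data=b"hello",
    data_id="dataset/123/data-granule/example.bufr4",  # hypothetical identifier
    metadata_id="urn:wmo:md:ca-eccc-msc:observations.swob",
    lon=6.146255135536194,
    lat=46.223296618227444,
    obs_time="2022-03-20T04:45:00Z",
    no_content_url="https://example.org/no-content-here",
)
print(json.dumps(message, indent=2))
----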

For cases of publishing recommended data, the same approaches/example can be used, without the need to set `properties.cache`.

=== Publishing a WIS2 Notification Message for resource deletion

A https://wmo-im.github.io/wis2-notification-message[WIS2 Notification Message] provides the ability to publish notifications for new, updated or deleted data and metadata (see the https://wmo-im.github.io/wis2-notification-message/standard/wis2-notification-message-STABLE.html#_1_16_links[Links] section of the specification).

Similar to the <<_publishing_a_wis2_notification_message_with_embedded_data,embedded data Recipe>>, for data or metadata deletions (core or recommended data), a data publisher may not wish to manage infrastructure, such as an HTTP or FTP server, for the data or metadata link to point to.

In this case, a similar strategy can be used to provide an HTTP 204 No Content link, as per the <<_publishing_a_wis2_notification_message_with_embedded_data,embedded data Recipe>>, along with setting `rel=deletion` in the link object.

.Resource deletion example:
[source,json]
----
{
  "id": "31e9d66a-cd83-4174-9429-b932f1abe1be",
  "conformsTo": [
    "http://wis.wmo.int/spec/wnm/1/conf/core"
  ],
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [
      6.146255135536194,
      46.223296618227444
    ]
  },
  "properties": {
    "pubtime": "2022-03-20T04:50:18Z",
    "datetime": "2022-03-20T04:45:00Z",
    "integrity": {
      "method": "sha512",
      "value": "A2KNxvks...S8qfSCw=="
    },
    "cache": false,
    "data_id": "dataset/123/data-granule/UANT01_CWAO_200445___15103.bufr4",
    "metadata_id": "urn:wmo:md:ca-eccc-msc:observations.swob"
  },
  "links": [
    {
      "href": "https://http.codes/204",
      "rel": "deletion"
    }
  ]
}
----
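
On the receiving side, a client might inspect a message before acting on it. The sketch below is one possible approach, not behaviour mandated by the specification: the helper `classify_notification` and its return labels are hypothetical, and it relies only on the `rel=deletion` link relation and the `properties.content` object described in these recipes.

.Classifying a notification (illustrative sketch)
[source,python]
----
def classify_notification(message):
    """Return 'deletion', 'embedded' or 'download' for a WNM-like dict."""
    links = message.get("links", [])
    # a link with rel=deletion signals that the resource was removed
    if any(link.get("rel") == "deletion" for link in links):
        return "deletion"
    # embedded data is carried in properties.content
    if "content" in message.get("properties", {}):
        return "embedded"
    # otherwise the data has to be fetched via the link href
    return "download"


deletion = {"properties": {}, "links": [{"href": "https://http.codes/204", "rel": "deletion"}]}
embedded = {
    "properties": {"content": {"encoding": "utf-8", "value": "abc", "size": 3}},
    "links": [{"href": "https://http.codes/204", "rel": "canonical"}],
}
print(classify_notification(deletion), classify_notification(embedded))
----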

=== Validating a WMO Core Metadata Profile record

The https://github.com/World-Meteorological-Organization/pywcmp[pywcmp] tool provides a test suite to validate a record against the WCMP2 specification requirements, as well as a Python API for application integration. Consult the pywcmp README on GitHub for more information/examples.

.Using pywcmp
[source,bash]
@@ -119,7 +260,7 @@ image::images/data-publishers-wcmp2-validate-response.png[WIS2 GDC online valida

A response will be provided with validation results.

=== Advertise client side filters for data subscriptions in WCMP2 and WNM
=== Advertising client side filters for data subscriptions in WCMP2 and WNM

A key concept of a WCMP2 record is "actionable links"; this means being able to access a dataset or data granule
without any further interactions. For real-time data, a WCMP2 record provides linkages to the WIS2 Global Broker
@@ -305,147 +446,3 @@ def run():
if __name__ == '__main__':
run()
----

=== Defining and proposing topics for WMO Earth system disciplines

The https://wmo-im.github.io/wis2-topic-hierarchy/standard/wis2-topic-hierarchy-STABLE.html[WIS2 Topic Hierarchy] describes the topic structure and levels to be used when publishing WIS2 notification messages.

The purpose of this document is to provide guidelines to domain experts so that the Topic Hierarchy definition is consistent and useful to address the needs of WIS2 users.

==== Core levels

Core topics are defined in the first 7 levels and address all Earth system disciplines in a consistent manner.

.Core topic examples:
* `cache/a/wis2/ma-marocmeteo/core/data/weather`
* `origin/a/wis2/de-dwd/core/data/ocean`
* `cache/a/wis2/jp-jma/core/data/weather`

The Earth system disciplines https://codes.wmo.int/wis/topic-hierarchy/_earth-system-discipline[defined] by WTH are as follows:

* `atmospheric-composition`
* `climate`
* `cryosphere`
* `hydrology`
* `ocean`
* `space-weather`
* `weather`

Topics within each Earth system discipline are then defined by domain experts, reviewed, approved, and published by WMO.

==== Additional Earth domain specific topic levels

===== What is the use of the Topic Hierarchy?

The definition of the Topic Hierarchy is heavily linked to the use of Publish-Subscribe (Pub/Sub) protocols, here MQTT[S], in WIS2. Users, when subscribing to the Global Broker, can decide which notification messages they wish to receive.

===== MQTT wildcards

For example, connecting to a Global Broker and subscribing to:

`pass:[cache/a/wis2/+/core/data/weather/surface-weather-observation/synop]`

users will receive a notification when a new synop is made available from *any* centre-id publishing messages on WIS2. The `+` character is a single level wildcard for MQTT subscription.

A user can also choose to subscribe to:

`pass:[cache/a/wis2/de-dwd/core/data/weather/#]`

In this case, users will receive notification messages from the `de-dwd` centre-id for *all* weather data. The `#` character is a multiple level wildcard for MQTT subscription, and can only be used at the end of a topic subscription.

By using `+` and `#` (the two defined wildcards in the MQTT[S] protocol), users can extend their subscription and receive all messages they need.
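
To make the wildcard semantics concrete, the sketch below re-implements the matching rule a broker applies to a subscription. The function `topic_matches` is a hypothetical helper written for illustration (it ignores special cases such as topics beginning with `$`); real clients leave this matching to the broker.

.MQTT wildcard matching (illustrative sketch)
[source,python]
----
def topic_matches(subscription, topic):
    """Return True if an MQTT subscription filter matches a concrete topic."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for index, sub_level in enumerate(sub_levels):
        if sub_level == "#":
            # multi-level wildcard: only valid as the last level,
            # matches the parent topic and everything below it
            return index == len(sub_levels) - 1
        if index >= len(topic_levels):
            return False  # subscription is deeper than the topic
        if sub_level != "+" and sub_level != topic_levels[index]:
            return False  # literal level differs ("+" matches any one level)
    return len(sub_levels) == len(topic_levels)


# "+" matches exactly one level (any centre-id here)
print(topic_matches("cache/a/wis2/+/core/data/weather",
                    "cache/a/wis2/jp-jma/core/data/weather"))
# "#" matches all remaining levels, but only for the de-dwd centre-id
print(topic_matches("cache/a/wis2/de-dwd/core/data/weather/#",
                    "cache/a/wis2/jp-jma/core/data/weather/aviation"))
----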

The purpose of additional topic hierarchy levels is to allow users to get the messages they need and to be more specific in their request.

==== Dos and don'ts when defining Earth system discipline topics

*Define only a _small number_ of additional levels*

According to the MQTT[S] protocol specification, a topic can be up to 65,535 bytes long. This means that the number of levels can be extremely large. However, as explained in https://www.emqx.com/en/blog/advanced-features-of-mqtt-topics:

_Try not to use more topic levels “just because I can”. For example, `my-home/room1/data` is a better choice than `my/home/room1/data`._

By default, some MQTT brokers are configured to accept a maximum of 10 levels. This can be changed in the configuration of the broker, however, this limit shows that a usable, practical topic structure should not be too deep (i.e. with a large number of levels).

For WIS2, and considering the various Earth system disciplines, a limit of *4* sublevels seems appropriate.
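
A proposed topic can be checked against this guideline mechanically. The sketch below assumes the 7 core levels and the suggested limit of 4 sublevels described above; the names `CORE_LEVELS`, `MAX_SUBLEVELS`, `count_sublevels` and `within_depth_limit` are hypothetical.

.Checking topic depth (illustrative sketch)
[source,python]
----
CORE_LEVELS = 7    # core levels, e.g. cache/a/wis2/de-dwd/core/data/weather
MAX_SUBLEVELS = 4  # suggested limit for discipline-specific sublevels


def count_sublevels(topic):
    """Number of levels below the 7 core levels (0 if there are none)."""
    return max(len(topic.split("/")) - CORE_LEVELS, 0)


def within_depth_limit(topic):
    """True if the topic stays within the suggested sublevel limit."""
    return count_sublevels(topic) <= MAX_SUBLEVELS


proposed = "cache/a/wis2/de-dwd/core/data/weather/surface-based-observations/synop"
print(count_sublevels(proposed), within_depth_limit(proposed))
----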

*Do not use the topic as a metadata record*

When defining the topic, experts must focus on the needs of users. The purpose of the WIS2 Topic Hierarchy is *not* to describe as precisely as possible what data users will obtain if they decide to subscribe and then download. Rather, it is _only_ to provide a filtering mechanism so that users will not be flooded by WIS2 Notification Messages that may not be useful for them.

In WIS2, all datasets *must* be described using the https://wmo-im.github.io/wcmp2/standard/wcmp2-STABLE.html[WMO Core Metadata Profile version 2 (WCMP2)]. Users will be able to discover the data they need by searching the WIS2 catalogue (via search engines, directly, or from portals and applications). The topic hierarchy information will be part of the metadata record for data which provides real-time notifications of publication.

*Do not allow locally defined sublevels outside the `experimental` topic*

Each Earth system discipline provides an `experimental` topic. For example, for `weather`, the first additional levels in the domain topic hierarchy are:

[cols="1,1"]
|===
|Name|Description
|`advisories-warnings`
|Advisories and warnings

|`aviation`
|Aviation

|`prediction`
|Data sets produced by quantitative algorithms, such as numerical or statistical prediction models, describing the past, present and future meteorological states

|`space-based-observations`
|Space based observations

|`surface-based-observations`
|Surface based observations
|`experimental`
|Experimental topics
|===

As the name suggests, `experimental` allows for defining additional topics for tests and experiments. This is not meant to be used for operational data exchange. It should only be used for testing purposes.

With the exclusion of the `experimental` topic level, the Topic Hierarchy must be _fully_ defined for each Earth system discipline.

In some situations, it might be tempting for a data producer to use additional topic levels to further restrict the number of messages received by users.

This is *forbidden*.

In each Earth system discipline community, all WIS Centres will be able to use the entire topic hierarchy of the domain if they provide data corresponding to each topic. A WIS Centre will not be allowed to add additional sublevels or undefined levels within the Topic Hierarchy for its own needs.

Modification of the Topic Hierarchy will be possible by using the WMO fast-track approval process.

*Consider users' needs and prevent complex wildcard subscriptions*

The purpose of the WIS2 Topic Hierarchy is to inform users about the availability of new data. In WIS2, obtaining data will start, in most cases, by configuring one or more subscriptions to topics, as defined in the associated WCMP2 discovery metadata records, so that users will receive notifications when new data is available.

The Topic Hierarchy should be defined so that users will not need to configure a very large number of different subscriptions to get the data they are interested in.

Each level in the WIS2 Topic Hierarchy should be seen as a "logical" group (such as the Earth system disciplines `weather` and `ocean`, or `synop` under `surface-based-observations`).

In addition, wildcard subscriptions (using `+` and `#` as described above) are "expensive" for brokers to manage.

For example, a topic hierarchy resulting in users subscribing to:

`pass:[cache/a/wis2/+/core/data/ocean/+/some/+/thing/+/else/#]`

should be avoided. A subscription to the following topic:

`pass:[cache/a/wis2/+/core/data/ocean/some/thing/else/#]`

is much more effective for both the client and the broker.

If it is likely that most users will use wildcards for particular topic levels, then either removing that level altogether or moving that level to the end of the topic hierarchy is also more efficient for clients and producers.

If most users end up subscribing to:

`pass:[cache/a/wis2/+/core/data/ocean/+/thing/+/#]`

then, the Topic Hierarchy could be reconsidered, so that the above subscription can be replaced by:

`pass:[cache/a/wis2/+/core/data/ocean/thing/#]`

Reordering the levels of topics and potentially reducing the number of sublevels makes the topic hierarchy simpler and more efficient.

*Facilitate client side filtering*

Notification messages are small pieces of information. MQTT[S] brokers and clients are able to handle a very large number of messages. In that sense, potentially receiving too many messages is not a problem. However, downloading data, depending on its size, might be slower and less efficient. If, for a particular dataset, the geometry information available in the notification message is not sufficient to allow client-side filtering before download, it is suggested to provide additional information in the `properties` object of the notification message so that users can decide _before_ downloading if the data in this particular message is useful for them.

See <<_advertise_client_side_filters_for_data_subscriptions_in_wcmp2_and_wnm>> for more information on client side filtering.
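
The geometry-based filtering described above can be sketched as follows. This is a simplified illustration: `point_in_bbox` and `should_download` are hypothetical helpers, only `Point` geometries are handled, and bounding boxes crossing the antimeridian are not considered.

.Client-side filtering on geometry (illustrative sketch)
[source,python]
----
def point_in_bbox(lon, lat, bbox):
    """bbox is (min_lon, min_lat, max_lon, max_lat) in degrees."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def should_download(message, bbox):
    """Decide from the notification alone whether the data is of interest."""
    geometry = message.get("geometry")
    if not geometry or geometry.get("type") != "Point":
        # no usable geometry: cannot filter, so download to be safe
        return True
    lon, lat = geometry["coordinates"][:2]
    return point_in_bbox(lon, lat, bbox)


notification = {
    "geometry": {"type": "Point", "coordinates": [6.146255135536194, 46.223296618227444]}
}
area_of_interest = (5.0, 45.0, 11.0, 48.0)  # example bounding box
print(should_download(notification, area_of_interest))
----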