= Sensor Centres in WIS 2.0
:toc: macro
:sectnums: all
:version: 0.1
:author: Rémy Giraud
:email: remy@giraud.me
:revnumber: 0.1
:revdate: 16.02.2025

<<<

Introduction::

This document, _Sensor Centres in WIS 2.0_, explains the *concept* of Sensor Centres in WIS 2.0 and gives some examples of such tools
that are currently available or could be added if volunteer WIS Members are willing to work on them.

As described in the Manual on WIS, Volume II, WIS 2.0, as a collective IT system, needs monitoring. Unlike a typical integrated IT system under
a single management authority, WIS 2.0 could not be monitored with a tool like Zabbix, which is by nature fairly intrusive toward the systems it monitors.
Therefore, a solution where each component of the system provides information on its own behaviour was preferred.

It was decided to use OpenMetrics, together with two open source tools: Prometheus to collect the metrics and Grafana to visualize them.

At the moment, a set of metrics has been defined for three Global Services (Global Broker, Global Cache and Global Discovery Catalogue).
Each instance of these Global Services must provide the agreed metrics.

The Global Monitor then collects the metrics and presents them in a set of Grafana dashboards.

This monitoring architecture is, by design, extensible.

When new metrics are defined, they can be collected and visualized in the same way.

To go beyond the metrics defined for the Global Services, the concept of Sensor Centre has been introduced.

A Sensor Centre relies on the WIS 2.0 Global Services and potentially on WIS2 Nodes. It can implement further processing on any part of the WIS 2.0 architecture, for example:

. receiving Notification Messages
. downloading files
. processing the content of the files
. ...

The result of this processing is a specific set of _new_ metrics, which will in turn be collected either by the Global Monitor
or by additional monitoring systems for statistical analysis, visualization, etc.

It must be noted, though, that the WIS 2.0 ecosystem *does not* provide any Sensor Centre.

It provides the tooling to implement Sensor Centres by taking advantage of the WIS 2.0 monitoring solution. It is the responsibility of
a community relying on WIS 2.0 for its data exchange to implement and, eventually, deploy Sensor Centre(s) to monitor whatever it considers appropriate for its operations.

image:2025-02-22T10-53-08-434Z.png[]

The rest of this document provides examples of Sensor Centres.

Comparing the behaviour of Global Caches::

In WIS 2.0, each Global Cache is independent of the other Global Caches. According to the WIS 2.0 specification of the Global Cache (see https://wmo-im.github.io/wis2-guide/guide/wis2-guide-APPROVED.html#_2_7_4_1_technical_considerations):

`Global Caches will operate independently of one another. Each Global Cache will hold a full copy of the cache – although there may be small differences between the various Global Caches as data availability notification messages propagate through WIS to each one. There is no formal synchronization between Global Caches.`

It is therefore interesting to verify, for example:

. the average delay for the Global Caches to cache the data made available by WIS2 Nodes
. whether all files published as _core data_ by WIS2 Nodes (with `cache: true` in the Notification Message) are effectively available in all Global Caches
. whether some files are missed by some Global Caches or, more critically, by all Global Caches

Such metrics would provide useful information to identify problems, to help Global Caches fix them, and to define KPIs that could be used to objectively measure the effective performance of each Global Cache.

They could also be used to detect anomalies at the WIS2 Node, such as the same `data_id` being reused too frequently.

It is agreed to call this type of Sensor Centre `sensor-global-cache`.
The full centre-id is therefore composed of the 2-letter country code, the name of the centre and `sensor-global-cache`, separated by hyphens.

For example, for such a centre operated by Météo-France, the centre-id would be `fr-meteofrance-sensor-global-cache`.
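
The naming convention can be captured in a small helper. This is a sketch only. The function name `sensor_centre_id` is hypothetical. It also assumes a lowercase 2-letter country code and centre-id components made of lowercase ASCII letters and digits joined by single hyphens.

```python
import re

# Assumption: lowercase 2-letter country code; other components are lowercase
# ASCII letters and digits, optionally joined by single hyphens.
_COUNTRY = re.compile(r"[a-z]{2}")
_COMPONENT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def sensor_centre_id(country_code: str, centre_name: str, sensor_type: str) -> str:
    """Build a Sensor Centre centre-id such as 'fr-meteofrance-sensor-global-cache'.

    sensor_type is the part after 'sensor-', e.g. 'global-cache' or 'global-broker'.
    """
    if not _COUNTRY.fullmatch(country_code):
        raise ValueError(f"expected a 2-letter lowercase country code, got {country_code!r}")
    for part in (centre_name, sensor_type):
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid centre-id component: {part!r}")
    return f"{country_code}-{centre_name}-sensor-{sensor_type}"


print(sensor_centre_id("fr", "meteofrance", "global-cache"))
# fr-meteofrance-sensor-global-cache
```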

A list of metrics has been defined for this Sensor Centre (see https://github.com/wmo-im/wis2-metric-hierarchy/blob/main/metrics/sgc.csv):

[cols="3*", options="header"]
|=============================================================================================================================================================
| Name | Labels | Description
| wmo_wis2_sgc_cache_delay_seconds | globalcache,centre_id,report_by | Delay between origin and cache message
| wmo_wis2_sgc_messages_cached_total | globalcache,centre_id,report_by | Number of data files cached for centre_id
| wmo_wis2_sgc_messages_cached_delay_total | globalcache,centre_id,report_by | Number of data files cached for centre_id within defined delay (120s 300s 600s)
| wmo_wis2_sgc_messages_published_total | centre_id,report_by | Number of cacheable data files published
| wmo_wis2_sgc_messages_missed_total | globalcache,centre_id,report_by | Number of cacheable data not in global cache
| wmo_wis2_sgc_messages_missed_all_total | centre_id,report_by | Number of cacheable data not in any cache
|=============================================================================================================================================================

The processing of this Sensor Centre is as follows:

- Subscribe to a given `origin/a/wis2/...` topic on at least one Global Broker; it could be `#` or a particular centre-id
- Subscribe to a given `cache/a/wis2/...` topic on at least one Global Cache; it could be `#` or a particular centre-id. The subscription must be made on the broker of the Global Cache (unlike normal subscriptions, which are made only on a Global Broker)
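
Both subscriptions rely on MQTT topic filters. In these filters, `#` matches any number of trailing levels and `+` matches exactly one level. The sketch below uses only the Python standard library. It shows how a Sensor Centre could decide which of its subscriptions an incoming topic belongs to. In practice, the MQTT client library does this matching. The filter values and the names `topic_matches`, `SUBSCRIPTIONS` and `route` are illustrative.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` under MQTT wildcard rules."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # multi-level wildcard: matches the parent level and everything below it
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # without '#', the filter and the topic must have the same number of levels
    return len(filter_levels) == len(topic_levels)


# Illustrative subscriptions: every centre on the Global Broker, one centre on a Global Cache.
SUBSCRIPTIONS = {
    "origin": "origin/a/wis2/#",
    "cache": "cache/a/wis2/fr-meteofrance/#",
}


def route(topic: str):
    """Return the name of the first subscription matching `topic`, or None."""
    for name, topic_filter in SUBSCRIPTIONS.items():
        if topic_matches(topic_filter, topic):
            return name
    return None


print(route("cache/a/wis2/fr-meteofrance/data/core/weather/surface-based-observations/synop"))
# cache
```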

For the subscription to the Global Broker only:

- Discard the Notification Message if it is `recommended` data or if `cache: false`, as the data will not be cached
- Detect any duplicate `data_id` not including a `rel: update` link within a period of at least X hours
- Store the time when the message was received
- (optional) Store the full Notification Message; this can be useful to analyse systematic issues
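
The Global Broker side can be sketched as follows. This is a minimal, in-memory sketch, not a reference implementation. The class and method names are hypothetical, and the duplicate window (the "X hours" above) is a parameter. The sketch assumes the WIS2 Topic Hierarchy layout `origin/a/wis2/<centre-id>/data/<core|recommended>/...`. It also assumes a Notification Message parsed into a dictionary, whose optional `properties.cache` defaults to true.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OriginRecord:
    received_at: float               # time the message was received from the Global Broker
    centre_id: str                   # centre-id taken from the topic
    message: Optional[dict] = None   # optional full copy, to analyse systematic issues


@dataclass
class OriginTracker:
    duplicate_window_s: float        # the "X hours" of the text, in seconds
    keep_messages: bool = False
    records: dict = field(default_factory=dict)      # data_id -> OriginRecord
    duplicates: list = field(default_factory=list)   # data_ids reused without rel=update

    def on_origin(self, topic: str, message: dict, now: Optional[float] = None) -> bool:
        """Handle one Notification Message from a Global Broker; return True if it is tracked."""
        now = time.time() if now is None else now
        levels = topic.split("/")
        # Assumed layout: origin/a/wis2/{centre-id}/data/{core|recommended}/...
        if len(levels) < 6 or levels[4] != "data" or levels[5] != "core":
            return False                       # recommended data (or metadata) is not cached
        props = message.get("properties", {})
        if props.get("cache", True) is False:  # 'cache' is optional and defaults to true
            return False
        data_id = props["data_id"]
        is_update = any(link.get("rel") == "update" for link in message.get("links", []))
        previous = self.records.get(data_id)
        if (previous is not None and not is_update
                and now - previous.received_at < self.duplicate_window_s):
            self.duplicates.append(data_id)
        # The latest reception time is kept as the reference for the cache delay.
        self.records[data_id] = OriginRecord(now, levels[3], message if self.keep_messages else None)
        return True
```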

For each subscription to the various Global Caches:

- For each Notification Message and using `data_id`, compute the difference between the time the message was received on `origin` and the time the same `data_id` was received on `cache`: Time~Cache~ - Time~Origin~
- Update the `wmo_wis2_sgc_cache_delay_seconds` metric with this value
- Compare this value with the three thresholds defined in the metric table above. Increase `wmo_wis2_sgc_messages_cached_delay_total` by 1, using the threshold as a label (less than 120s, less than 300s, less than 600s)
- If no Notification Message for the `data_id` is received after the highest threshold (here 600s), increase `wmo_wis2_sgc_messages_missed_total` by 1
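
The per-cache computation can be sketched as below. The class name and the counter layout are hypothetical. Label tuples stand in for real metric labels, and `report_by` is left out for brevity. One interpretation choice is made explicitly: the thresholds are treated as cumulative buckets, so a 100s delay counts for 120s, 300s and 600s. This follows the same convention as Prometheus histogram `le` buckets. If only the smallest matching threshold should be counted, add a `break` after the first increment.

```python
from collections import Counter

THRESHOLDS_S = (120, 300, 600)  # thresholds from the metric table


class CacheDelayRecorder:
    def __init__(self, origin_times: dict):
        self.origin_times = origin_times   # data_id -> time received on origin
        self.last_delay = {}               # (globalcache, centre_id) -> wmo_wis2_sgc_cache_delay_seconds
        self.cached_total = Counter()      # (globalcache, centre_id) -> wmo_wis2_sgc_messages_cached_total
        self.cached_within = Counter()     # (globalcache, centre_id, threshold) -> ..._cached_delay_total

    def on_cache(self, globalcache: str, centre_id: str, data_id: str, received_at: float):
        """Handle one Notification Message from a Global Cache; return the delay or None."""
        origin_time = self.origin_times.get(data_id)
        if origin_time is None:
            return None                    # not seen (or filtered out) on the origin side
        delay = received_at - origin_time  # Time_Cache - Time_Origin
        self.last_delay[(globalcache, centre_id)] = delay
        self.cached_total[(globalcache, centre_id)] += 1
        # Cumulative buckets: every threshold above the delay is incremented.
        for threshold in THRESHOLDS_S:
            if delay < threshold:
                self.cached_within[(globalcache, centre_id, threshold)] += 1
        return delay
```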

If no Global Cache has cached the data, increase `wmo_wis2_sgc_messages_missed_all_total` by 1.
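
The two "missed" counters need a way to decide that a `data_id` will no longer show up. One possible approach is sketched below with hypothetical names. Each tracked `data_id` stays pending until the highest threshold has elapsed, and is then swept. Every subscribed Global Cache that did not deliver it counts a miss. An empty set of deliveries also counts towards `wmo_wis2_sgc_messages_missed_all_total`. `sweep` is assumed to be called periodically, for example once per minute.

```python
from collections import Counter

MAX_WAIT_S = 600  # highest threshold from the metric table


class MissedDetector:
    def __init__(self, global_caches):
        self.global_caches = frozenset(global_caches)  # Global Caches this Sensor Centre subscribes to
        self.pending = {}                 # data_id -> (centre_id, origin_time, caches that delivered it)
        self.missed = Counter()           # (globalcache, centre_id) -> wmo_wis2_sgc_messages_missed_total
        self.missed_all = Counter()       # centre_id -> wmo_wis2_sgc_messages_missed_all_total

    def on_origin(self, data_id: str, centre_id: str, origin_time: float) -> None:
        self.pending[data_id] = (centre_id, origin_time, set())

    def on_cache(self, globalcache: str, data_id: str, received_at: float) -> None:
        entry = self.pending.get(data_id)
        # Deliveries arriving after the waiting period are not credited to the cache.
        if entry is not None and received_at - entry[1] <= MAX_WAIT_S:
            entry[2].add(globalcache)

    def sweep(self, now: float) -> None:
        """Close every data_id whose waiting period has elapsed and update the counters."""
        expired = [d for d, (_, t0, _) in self.pending.items() if now - t0 >= MAX_WAIT_S]
        for data_id in expired:
            centre_id, _, seen = self.pending.pop(data_id)
            for globalcache in self.global_caches - seen:
                self.missed[(globalcache, centre_id)] += 1
            if not seen:
                self.missed_all[centre_id] += 1
```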

All the metrics must be exposed for scraping by the Global Monitor.
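
If desired, and in order to further analyse the situation, the origin Notification Message can be published on the topic `monitor/a/wis2/<sensor-centre-id>/<originator-centre-id>`, where `<sensor-centre-id>` is the centre-id of the Sensor Centre and `<originator-centre-id>` is the centre-id of the originator of the message.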
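
Building the `monitor` topic from the origin topic is a matter of string handling. In the sketch below, the function name is hypothetical. The payload is simply the original message re-serialized as JSON. The returned pair is meant to be passed to the publish call of whatever MQTT client the Sensor Centre uses.

```python
import json


def monitor_publication(sensor_centre_id: str, origin_topic: str, message: dict) -> tuple:
    """Return (topic, payload) for re-publishing an origin Notification Message on 'monitor'."""
    levels = origin_topic.split("/")
    # Assumed layout: origin/a/wis2/{centre-id of the originator}/...
    if len(levels) < 4 or levels[:3] != ["origin", "a", "wis2"]:
        raise ValueError(f"not a WIS2 origin topic: {origin_topic!r}")
    originator_centre_id = levels[3]
    topic = f"monitor/a/wis2/{sensor_centre_id}/{originator_centre_id}"
    return topic, json.dumps(message)


print(monitor_publication("fr-meteofrance-sensor-global-cache",
                          "origin/a/wis2/de-dwd/data/core/weather", {"id": "1"}))
# ('monitor/a/wis2/fr-meteofrance-sensor-global-cache/de-dwd', '{"id": "1"}')
```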

Comparing the behaviour of Global Brokers::

By design, all Notification Messages must be available on all Global Brokers, either after being received directly from the source centre-id or indirectly from another Global Broker.

During the validation tests run in autumn 2024, it was checked that, for a (small) given number of Notification Messages, all Global Brokers were behaving as expected.

However, as a complement or as a way to detect anomalies, it could be useful to verify, using operational Notification Messages, that all Notification Messages are effectively available on all Global Brokers.

It is expected that the Global Brokers will be _almost_ in sync, and that the delay before the same `id` is available on all Global Brokers will be less than 15 seconds.

This type of Sensor Centre can be called `sensor-global-broker`.
The full centre-id will therefore be composed of the 2-letter country code, the name of the centre and `sensor-global-broker`, separated by hyphens.

[cols="3*", options="header"]
|=============================================================================================================================================================
| Name | Labels | Description
| wmo_wis2_sgb_missed_total | globalbroker,centre_id,report_by | Number of Notification Messages missed by the Global Broker
|=============================================================================================================================================================

_to be further expanded_

The processing of this Sensor Centre is as follows:

- Subscribe to `origin/a/wis2/...` and `cache/a/wis2/...` (it could be `#` or a particular centre-id) on all Global Brokers
- For each `id` received, check that the same `id` is received from all Global Brokers within the 15s time window
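
The comparison can be sketched as a time-windowed tracker keyed by message `id`. The class and method names are hypothetical. The 15s window comes from the text above, and `sweep` is assumed to be called periodically. When a Global Broker has not delivered an `id` within the window, it increments a counter standing in for `wmo_wis2_sgb_missed_total`. That counter is labelled with the Global Broker and with the centre-id taken from the topic.

```python
from collections import Counter

WINDOW_S = 15.0  # expected maximum spread between Global Brokers


class BrokerComparator:
    def __init__(self, global_brokers):
        self.global_brokers = frozenset(global_brokers)
        self.first_seen = {}      # id -> time it was first received from any Global Broker
        self.seen_by = {}         # id -> Global Brokers that delivered it within the window
        self.centre_of = {}       # id -> centre-id taken from the topic
        self.closed = set()       # ids already evaluated (should be pruned in a real deployment)
        self.missed = Counter()   # (globalbroker, centre_id) -> wmo_wis2_sgb_missed_total

    def on_message(self, globalbroker: str, topic: str, message_id: str, received_at: float) -> None:
        if message_id in self.closed:
            return                # late delivery of an id that has already been evaluated
        if message_id not in self.first_seen:
            self.first_seen[message_id] = received_at
            self.seen_by[message_id] = set()
            self.centre_of[message_id] = topic.split("/")[3]  # {origin|cache}/a/wis2/{centre-id}/...
        if received_at - self.first_seen[message_id] <= WINDOW_S:
            self.seen_by[message_id].add(globalbroker)

    def sweep(self, now: float) -> None:
        """Evaluate every id whose window has closed."""
        expired = [i for i, t0 in self.first_seen.items() if now - t0 > WINDOW_S]
        for message_id in expired:
            del self.first_seen[message_id]
            seen = self.seen_by.pop(message_id)
            centre_id = self.centre_of.pop(message_id)
            self.closed.add(message_id)
            for globalbroker in self.global_brokers - seen:
                self.missed[(globalbroker, centre_id)] += 1
```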

Conclusion::

This document presents the concept of Sensor Centre and provides two examples of such tools.

Obviously, many more types of Sensor Centre can be designed.

Each community within WIS 2.0 can design Sensor Centres tailored to its needs.

The approach will always be similar:

. Discuss the opportunity of developing a Sensor Centre to assess how the centre-ids providing the data or the Global Services are performing, or anything else relying on WIS 2.0 to address the needs of the community
. Agree on a list of metrics that can be implemented to perform the assessment
. Register the list of metrics in the WMO metrics repository https://github.com/wmo-im/wis2-metric-hierarchy/
. Develop the Sensor Centre
. Operate one or more instances of the Sensor Centre
. Register the centre-id(s) of the Sensor Centre(s) in the WMO Register
. Ensure that the metrics are correctly scraped by the Global Monitor
. Provide the Grafana dashboard that the Global Monitor will host

For items 7 and 8 above, it is also possible to use another Monitor Centre if the community prefers.
