Description
Expected Behavior
Metrics collected by a single execution of a check plugin using --metric_format influxdb should be rolled up into a single Influx measurement.
Current Behavior
Metrics collected by a single execution generate multiple Influx measurements, recorded with the same time but varying key fields.
Context
Recently I've been working with the metrics-aggregate.rb check plugin from the sensu-plugins-sensu collection. In doing so, I've learned a bit about how this library generates measurements for InfluxDB line protocol, and I think we need to make a change.
After installing the plugin collection, I ran the following to collect measurements of check results being reported into aggregates. In this example I have four checks feeding results into one aggregate named "procs":
/opt/sensu/embedded/bin/metrics-aggregate.rb --metric_format influxdb
sensu.aggregates,aggregate=procs clients=1 1545929125
sensu.aggregates,aggregate=procs checks=4 1545929125
sensu.aggregates,aggregate=procs ok=4 1545929125
sensu.aggregates,aggregate=procs warning=0 1545929125
sensu.aggregates,aggregate=procs critical=0 1545929125
sensu.aggregates,aggregate=procs unknown=0 1545929125
sensu.aggregates,aggregate=procs total=4 1545929125
sensu.aggregates,aggregate=procs stale=0 1545929125
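For comparison, here is a minimal sketch of the roll-up described under Expected Behavior. It is not the sensu-plugin implementation; `roll_up` is a hypothetical helper, and it assumes simple line protocol with no escaped spaces or commas. It groups lines that share the same measurement, tag set and timestamp, and joins their fields into one line.

```python
def roll_up(lines):
    """Merge line-protocol lines sharing measurement, tag set and timestamp.

    Hypothetical helper for illustration only. Assumes each line has exactly
    three space-separated parts and contains no escaped spaces or commas.
    """
    grouped = {}  # (series_key, timestamp) -> list of "field=value" strings
    for line in lines:
        line = line.strip()
        if not line:
            continue
        series_key, fields, timestamp = line.split(" ")
        grouped.setdefault((series_key, timestamp), []).extend(fields.split(","))
    # Dicts keep insertion order, so fields appear in the order they were seen.
    return [
        f"{series_key} {','.join(fields)} {timestamp}"
        for (series_key, timestamp), fields in grouped.items()
    ]


plugin_output = """\
sensu.aggregates,aggregate=procs clients=1 1545929125
sensu.aggregates,aggregate=procs checks=4 1545929125
sensu.aggregates,aggregate=procs ok=4 1545929125
sensu.aggregates,aggregate=procs warning=0 1545929125
sensu.aggregates,aggregate=procs critical=0 1545929125
sensu.aggregates,aggregate=procs unknown=0 1545929125
sensu.aggregates,aggregate=procs total=4 1545929125
sensu.aggregates,aggregate=procs stale=0 1545929125
"""

rolled = roll_up(plugin_output.splitlines())
print(rolled[0])
# sensu.aggregates,aggregate=procs clients=1,checks=4,ok=4,warning=0,critical=0,unknown=0,total=4,stale=0 1545929125
```

With the eight lines above as input, the result is a single line carrying all eight fields, which is the shape I would expect the plugin to emit.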
After traversing my Sensu pipeline, this output is accepted by InfluxDB and recorded. I can query the results using the influx CLI:
> SELECT * FROM "sensu.aggregates"
name: sensu.aggregates
time aggregate checks clients critical critical_1 host ok ok_1 stale stale_1 total total_1 unknown unknown_1 value warning warning_1
---- --------- ------ ------- -------- ---------- ---- -- ---- ----- ------- ----- ------- ------- --------- ----- ------- ---------
1545929125000000000 procs 4 sensu-enterprise-poc 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs 0 sensu-enterprise-poc 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs sensu-enterprise-poc 4 0
1545929125000000000 procs 1 sensu-enterprise-poc 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs sensu-enterprise-poc 4 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs sensu-enterprise-poc 4 0
1545929125000000000 procs 0 sensu-enterprise-poc 0
1545929125000000000 procs 4 sensu-enterprise-poc 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs 1 sensu-enterprise-poc 0
1545929125000000000 procs sensu-enterprise-poc 0 0
1545929125000000000 procs sensu-enterprise-poc 4 0
You can see that although the time is the same for each of these measurements, each one has a value for only a single dimension of the aggregate being measured. That is to say, one line has a value of 4 for the ok key field, and another may have a value of 1 for the clients field, but none of the measurements have values for multiple key fields.
On the face of it this looks less than ideal, but in reality I think this makes the measurements rather useless. The effect becomes obvious when one attempts to perform basic math across these multiple measurements.
In my case I want to use these measurements to provide a value to a single stat panel in a Grafana dashboard. In theory, I should be able to use an Influx query like this one to return the percentage of ok checks:
SELECT (ok / total) * 100 AS "calculated_percentage" FROM "sensu.aggregates" WHERE "aggregate" = 'procs'
But this query returns an empty response. I believe this is because of a known limitation in InfluxDB which prevents mathematics across measurements.
If I manually insert a measurement which has values for both the ok and total key fields, the query works as expected:
> insert sensu.aggregates,aggregate=procs ok=3,warning=1,critical=0,unknown=0,stale=0,total=4 1545144411148540258
> SELECT (ok / total) * 100 AS "calculated_percentage" FROM "sensu.aggregates" WHERE "aggregate" = 'procs'
name: sensu.aggregates
time calculated_percentage
---- ---------------------
1545144411148540258 75
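The difference between the two cases can be illustrated with a small model. This is only a sketch of row-wise evaluation, not how InfluxDB is implemented: each record is a dict of its fields, and the hypothetical `percent_ok` helper can only compute a result for a record that carries both ok and total.

```python
def percent_ok(points):
    """Compute (ok / total) * 100 for each record that has both fields.

    Simplified model for illustration: records missing either field, or with
    a total of zero, produce no result.
    """
    results = []
    for point in points:
        ok, total = point.get("ok"), point.get("total")
        if ok is None or not total:
            continue
        results.append(ok / total * 100)
    return results


# One record per field, as the plugin currently emits them.
split_points = [
    {"clients": 1}, {"checks": 4}, {"ok": 4}, {"warning": 0},
    {"critical": 0}, {"unknown": 0}, {"total": 4}, {"stale": 0},
]

# One record carrying every field, like the manual insert above.
merged_points = [
    {"ok": 3, "warning": 1, "critical": 0, "unknown": 0, "stale": 0, "total": 4},
]

print(percent_ok(split_points))   # []
print(percent_ok(merged_points))  # [75.0]
```

No record in the split form has both operands, so nothing is returned, while the merged record yields the same 75 as the manual insert.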
Because of the limitation the current approach creates, I've had to send these measurements to Graphite instead, where I was able to use the asPercent function across the recorded measurements to get the needed single stat.
Environment
- Sensu Core 1.6.2
- sensu-plugin 2.7.0
- sensu-plugins-sensu pre-release build with "Update aggregate plugins for Sensu 1.x" (sensu-plugins-sensu#73)