feat(mongodb): alert when compaction is needed #2433
@francoisferrand Heuristics are up for discussion -- I chose a mix of:
francoisferrand
left a comment
- not sure about thresholds/computation: are these relevant defaults?
- not sure we should merge in 2.15, or target 2.16 and get time to "preview" this and avoid shipping alerts which would lead to support calls...
Add --collector.dbstatsfreestorage to mongodb_exporter's extraArgs so
the dbstats response's freeStorageSize / indexFreeStorageSize /
totalFreeStorageSize fields are surfaced as top-level Prometheus series
(mongodb_dbstats_freeStorageSize{database, rs_nm, ...} etc.).
These sub-collectors are not bundled into the catch-all options the
exporter exposes (--collect-all and similar shortcuts); they have to be
opted into explicitly. Since the chart no longer uses --collect-all
anyway (dropped in 8414833 for ZENKO-5281), each individual collector
we want has to be named in extraArgs — which is already how dbstats,
diagnosticdata, replicasetstatus, and topmetrics are wired up. This
just adds dbstatsfreestorage to that list.
Verified on a live Artesca cluster (exporter 0.40.0): without this flag
the freeStorageSize fields only appear as part of the per-host
mongodb_dbstats_raw_<host>_freeStorageSize series — clunky for alerting
queries. With the flag they appear cleanly as top-level series with
{database, rs_nm, ...} labels.
This unblocks the MongoDbCompactionNeeded alert added in the following
commit, which needs totalFreeStorageSize at top level to express the
compaction-pressure heuristic.
Issue: ZENKO-5293
Keeping on 2.15 for now as 2.16 doesn't exist.
Intentionally set to 30% by default (conservative); up for discussion.
Add MongoDbCompactionNeeded Prometheus rule that fires when a MongoDB
database has accumulated reclaimable storage exceeding 30% of the
underlying filesystem capacity:

    totalFreeStorageSize > 0.3 * fsTotalSize
    for: 1h

Per-(pod, database) granularity, severity warning. The threshold is
exposed as the compactionFreeStorageRatioThreshold x-input.

Expressing it as a fraction of fsTotalSize lets it scale across cluster
sizes: a 100 GB filesystem fires around 30 GB reclaimable; a 10 TB
filesystem fires around 3 TB.

Companion fixture covers a needs-compaction DB and a healthy DB sharing
the same filesystem (the first crosses the threshold, the second
doesn't).

Issue: ZENKO-5293
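The scaling described in the commit message above can be checked with a small Python sketch. It only mirrors the comparison `totalFreeStorageSize > 0.3 * fsTotalSize`; the function name `compaction_needed` and the decimal `GB`/`TB` constants are assumptions made for illustration, and the `for: 1h` hold time of the real rule is not modelled.

```python
# Illustrative sketch only: names and units below are assumptions, not chart code.
GB = 10**9
TB = 10**12


def compaction_needed(total_free_storage_size: int,
                      fs_total_size: int,
                      ratio_threshold: float = 0.3) -> bool:
    """Mirror `totalFreeStorageSize > ratio * fsTotalSize` (ignores `for: 1h`)."""
    return total_free_storage_size > ratio_threshold * fs_total_size


# 100 GB filesystem: the bar sits at about 30 GB of reclaimable space.
print(compaction_needed(31 * GB, 100 * GB))
print(compaction_needed(29 * GB, 100 * GB))

# 10 TB filesystem: the same ratio moves the bar to about 3 TB.
print(compaction_needed(31 * GB, 10 * TB))
print(compaction_needed(4 * TB, 10 * TB))
```

Because the threshold is a ratio, the same 31 GB of reclaimable space crosses the bar on the small filesystem but not on the large one.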
/approve
I have successfully merged the changeset of this pull request
Please check the status of the associated issue ZENKO-5293. Goodbye delthas.
Follow-up to ZENKO-5285 (PR #2431), which bundled two alerts in its description and only shipped the first (createIndexes-failed). This adds the second — a fragmentation / compaction-needed signal — that @DarkIsDude explicitly asked be filed as a follow-up.
Two commits, deliberately split
- "mongodb: enable dbstatsfreestorage collector in exporter" is a one-line values.yaml change adding --collector.dbstatsfreestorage to metrics.extraArgs. The chart's exporter no longer uses --collect-all (dropped in 8414833 for ZENKO-5281), so each sub-collector is opted into individually in extraArgs (which already lists dbstats, diagnosticdata, replicasetstatus, topmetrics). Without dbstatsfreestorage, freeStorageSize / totalFreeStorageSize only appear as part of the clunky per-host mongodb_dbstats_raw_<host>_* expansion instead of as top-level mongodb_dbstats_* series.
- "mongodb: alert when compaction is needed" is the alert proper.

The alert
Per-(pod, database) granularity. Severity warning. Threshold exposed as the compactionFreeStorageRatioThreshold x-input (default 0.3).
Expressing the threshold as a fraction of filesystem capacity (not of the DB's own storage) lets it scale across cluster sizes: a 100 GB filesystem fires around 30 GB reclaimable; a 10 TB filesystem fires around 3 TB.
Heuristic notes
A previous draft combined three legs (FS-pressure + fragmentation ratio + absolute floor). @francoisferrand's review pointed out that the fragmentation-ratio leg added noise without much signal and the FS-pressure leg could mask real waste on under-used disks. The single fs-scaled threshold above folds both concerns into one condition.
Per-DB granularity means the alert tells you pod={{ "{{ $labels.pod }}" }} and database={{ "{{ $labels.database }}" }} but does not identify the specific collection. To find that, an operator runs collStats on the alerting DB. The exporter has a --collector.collstats flag that could expose per-collection visibility, but at Artesca-scale cardinality (thousands of buckets per DB) that's expensive, so it is deferred.

Safety against missing / zero values
- Missing series for a (pod, database) tuple → PromQL produces an empty result → no alert.
- totalFreeStorageSize = 0 (no fragmentation) → 0 > 0.3 * fsTotalSize is always false.
- fsTotalSize = 0 would degenerate (0.3 * 0 = 0, then any positive freeStorage would fire), but it's physically impossible for an attached PVC.

Follow-up (not in this PR)
Matching compactionFreeStorageRatioThreshold config option in ZKOP, to expose the knob to ops. Not filed as a ticket yet; will track when this lands.

Related
Issue: ZENKO-5293
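
The cases listed under "Safety against missing / zero values" can be walked through with a small Python sketch. It is a rough model, not the rule itself: `alert_fires` is a hypothetical helper, and `None` stands in for a series that is absent for a given (pod, database) tuple, in which case a PromQL comparison between the two series returns no sample for that tuple.

```python
from typing import Optional


def alert_fires(total_free: Optional[float],
                fs_total: Optional[float],
                ratio: float = 0.3) -> bool:
    """Hypothetical model of the alert condition for one (pod, database) tuple."""
    # A missing series on either side leaves nothing to compare: no alert.
    if total_free is None or fs_total is None:
        return False
    return total_free > ratio * fs_total


# Missing series for the tuple: empty result, so no alert.
print(alert_fires(None, 100e9))
# No fragmentation: 0 > 0.3 * fsTotalSize is false for any capacity.
print(alert_fires(0, 100e9))
# Degenerate fsTotalSize = 0: any positive free storage would fire.
print(alert_fires(1, 0))
```

Only the last case misbehaves, and it needs a zero-capacity filesystem, which the description above rules out for an attached PVC.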