Description
probe (module-scoped) failure reason metric
Current Situation
Currently, blackbox_exporter exposes general metrics such as probe_success and probe_duration_seconds that apply to all modules. In addition, individual probers such as the http prober expose their own metrics, like probe_http_status_code, which help monitor the availability and performance of HTTP endpoints. However, when a probe fails, pinpointing the exact cause is difficult without manually inspecting the blackbox_exporter logs or trying to reproduce the error, which is not always possible.
Proposal
To address this, we propose adding a new metric, probe_($MODULE)_failure_reason, to blackbox_exporter. It would expose more detailed information about why a probe failed through a label named "reason" with descriptive, enumerable values such as "dns-resolution-error", "http-timeout", or "ssl-certificate-validation-failed". Today, these causes can only be inferred from the logged errors.
Benefits
Introducing the probe_($MODULE)_failure_reason metric would significantly improve troubleshooting. In most cases, users could identify the root cause of a probe failure without inspecting logs or running additional tests. The metric would also make it possible to set up alerts and notifications tailored to specific failure scenarios.
Contribution
We believe the probe_($MODULE)_failure_reason metric would be a valuable enhancement to blackbox_exporter, improving its usability and effectiveness, and we would be happy to help develop it and give feedback on its implementation.
Thank you for considering our proposal. If the maintainers are open to this direction, we’d love to contribute the functionality to blackbox_exporter.