Hello Glen,
First of all, thanks so much for your work!
I noticed that your check_raid.pl plugin does not seem to trigger a warning or critical state when a spare drive has failed. In my case, the server used to run on two hard disks in a RAID 1 configuration, with a third hard disk configured as a hot spare. When I looked at the iLO logs last night, I saw that the hard disk in bay 1 had failed and the hot spare in bay 3 had been activated some time ago.
I would have expected the check_raid.pl plugin to raise some sort of alert when any hard disk fails, which is why I am filing this bug report. I don't mind which state it is (warning or critical), but a failed device needs to trigger an action; that is the reason I use the plugin. I read CONTRIBUTING.md and I hope that all relevant details are included in this bug report.
Output of check_raid -d:
# /usr/lib/nagios/plugins/check_raid.pl -d
check_raid 4.0.10
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport
DEBUG EXEC: /sbin/dmsetup status --noflush at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /proc/mdstat at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller all show status at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller slot=0 logicaldrive all show at /usr/lib/nagios/plugins/check_raid.pl line 503.
OK: ssacli:[Smart Array P440ar[OK]: Array A(OK)[LUN1:OK]]
Output of each command from check_raid -d:
/sbin/ssacli controller all show status
Smart Array P440ar in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
/sbin/ssacli controller slot=0 logicaldrive all show
Smart Array P440ar in Slot 0 (Embedded)
Array A
logicaldrive 1 (558.88 GB, RAID 1, OK)
However, check_raid does not detect the failed hot spare drive, even though ssacli reports it:
/sbin/ssacli ctrl slot=0 pd all show status
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
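To illustrate the behaviour I would expect, here is a minimal Python sketch that parses the physical drive status output above and maps it to a Nagios state. It is not how check_raid.pl is implemented; the function names `parse_pd_status` and `nagios_state` are made up for this example, the regular expression only assumes the line format shown above, and choosing WARNING rather than CRITICAL is an arbitrary policy choice.

```python
import re

# Assumed line format, taken from the ssacli output in this report:
#   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
PD_LINE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\):\s*(.+?)\s*$")


def parse_pd_status(output):
    """Return one dict per physical drive line (hypothetical helper)."""
    drives = []
    for line in output.splitlines():
        match = PD_LINE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a drive line
        drive_id, details, status = match.groups()
        fields = [field.strip() for field in details.split(",")]
        drives.append({
            "id": drive_id,
            "spare": "spare" in fields,
            "status": status,
        })
    return drives


def nagios_state(drives):
    """Map drive states to a (exit_code, message) pair (hypothetical helper).

    Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if not drives:
        return 3, "UNKNOWN: no physical drives found in output"
    problems = []
    for drive in drives:
        if drive["status"] != "OK":
            label = drive["id"] + (" (spare)" if drive["spare"] else "")
            problems.append(label + ": " + drive["status"])
    if not problems:
        return 0, "OK: all physical drives report OK"
    # WARNING is an assumption; CRITICAL would be just as reasonable.
    return 1, "WARNING: " + ", ".join(problems)


SAMPLE = """\
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
"""

if __name__ == "__main__":
    code, message = nagios_state(parse_pd_status(SAMPLE))
    print(message)
```

With the sample output from my server, this flags drive 1I:3:1 as a failed spare and returns a non-OK state, which is the kind of result I was hoping to get from the plugin.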
Additional environment details:
- Debian 12 Bookworm
- HPE DL360 Gen9 with a P440ar RAID controller + BBU
Thanks and best wishes!