Failed hot spare drive is not detected / HPE ssacli #229

@crocodileneptune

Hello Glen,

First of all, thanks so much for your work!

I noticed that your check_raid.pl plugin doesn't seem to raise a warning or critical state when a spare drive has failed. In my case, the server used to run on two hard disks in a RAID 1 configuration, with a third hard disk configured as a hot spare. When I looked at the iLO logs last night, I saw that the hard disk in bay 1 had failed and the hot spare in bay 3 had been activated some time ago.

I would have expected the check_raid.pl plugin to raise some sort of warning whenever a hard disk fails, which is why I am filing this bug report. I don't mind which state is reported (warning or critical), but a failed device needs to trigger an action, which is the reason I use the plugin. I have read CONTRIBUTING.md and hope that all relevant details are included in this bug report.

Output of check_raid -d:

# /usr/lib/nagios/plugins/check_raid.pl -d
check_raid 4.0.10
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport

DEBUG EXEC: /sbin/dmsetup status --noflush at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /proc/mdstat at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller all show status at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller slot=0 logicaldrive all show at /usr/lib/nagios/plugins/check_raid.pl line 503.
OK: ssacli:[Smart Array P440ar[OK]: Array A(OK)[LUN1:OK]]

Output of each command from check_raid -d

/sbin/ssacli controller all show status

Smart Array P440ar in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK

/sbin/ssacli controller slot=0 logicaldrive all show

Smart Array P440ar in Slot 0 (Embedded)

   Array A

      logicaldrive 1 (558.88 GB, RAID 1, OK)
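
To illustrate why the result is OK, here is a minimal Python sketch that reads the logical drive listing above. It is not the plugin's actual code (check_raid.pl is written in Perl); the regular expression and the function name `parse_logical_drives` are my own assumptions for illustration. The point is that the logical drive view contains no failure indicator once the spare has taken over.

```python
import re

# Sample copied from the "logicaldrive all show" output in this report.
LOGICALDRIVE_OUTPUT = """
Smart Array P440ar in Slot 0 (Embedded)

   Array A

      logicaldrive 1 (558.88 GB, RAID 1, OK)
"""

# Assumed line shape: "logicaldrive <n> (<size>, <raid level>, <status>)".
LD_RE = re.compile(
    r"^\s*logicaldrive\s+(\d+)\s+\((.+),\s*(RAID [^,]+),\s*([^)]+)\)\s*$"
)


def parse_logical_drives(text):
    """Return one dict per logical drive line found in the text."""
    drives = []
    for line in text.splitlines():
        match = LD_RE.match(line)
        if match:
            drives.append({
                "id": int(match.group(1)),
                "size": match.group(2).strip(),
                "raid": match.group(3).strip(),
                "status": match.group(4).strip(),
            })
    return drives


if __name__ == "__main__":
    for drive in parse_logical_drives(LOGICALDRIVE_OUTPUT):
        print(drive["id"], drive["raid"], drive["status"])
```

With only this input, every status is OK, so a check based on the controller and logical drive views alone cannot see the failed disk.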

However, the failed hot spare drive is not detected by check_raid, even though ssacli reports it:

/sbin/ssacli ctrl slot=0 pd all show status

   physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
   physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
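
As a rough sketch of the behaviour I would expect, the physical drive listing above could be evaluated like this. This is Python for illustration only, not a patch for the Perl plugin. The function names, the rule "any status other than OK counts as a problem", and the mapping of a failed spare to WARNING and a failed non-spare drive to CRITICAL are all assumptions of mine; the exit codes 0, 1 and 2 are the usual Nagios OK, WARNING and CRITICAL values.

```python
import re

# Sample copied from the "pd all show status" output in this report.
PD_OUTPUT = """
   physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
   physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
"""

# Assumed line shape: "physicaldrive <id> (<attr>, <attr>, ...): <status>".
PD_RE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\):\s*(.+?)\s*$")

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_physical_drives(text):
    """Return one dict per physical drive line found in the text."""
    drives = []
    for line in text.splitlines():
        match = PD_RE.match(line)
        if not match:
            continue
        attrs = [attr.strip() for attr in match.group(2).split(",")]
        drives.append({
            "id": match.group(1),
            "spare": "spare" in attrs,
            "status": match.group(3),
        })
    return drives


def evaluate(drives):
    """Map drive states to (exit code, message).

    Assumption: a failed spare is WARNING, any other non-OK drive is CRITICAL.
    """
    failed = [d for d in drives if d["status"] != "OK"]
    if not failed:
        return OK, "OK: all physical drives OK"
    code = CRITICAL if any(not d["spare"] for d in failed) else WARNING
    label = "CRITICAL" if code == CRITICAL else "WARNING"
    parts = []
    for d in failed:
        suffix = " (spare)" if d["spare"] else ""
        parts.append(d["id"] + suffix + ": " + d["status"])
    return code, label + ": " + ", ".join(parts)


if __name__ == "__main__":
    exit_code, message = evaluate(parse_physical_drives(PD_OUTPUT))
    print(message)
    raise SystemExit(exit_code)
```

Either severity would be fine for me; what matters is that the state is no longer OK while a drive is reported as Failed.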

Additional environment details:

  • Debian 12 Bookworm
  • HPE DL360 Gen9 with a P440ar raid controller + BBU

Thanks and best wishes!
