Delay initialization of secondary RAID arrays? #673

@n5ke

When you choose to create and install onto a software RAID array through the installer, the system handles early assembly of the root device in the initramfs by passing the kernel option rd.auto in the GRUB boot entries.

This, at least in some cases, causes any secondary RAID arrays present to end up in a failed state. My guess is that this is caused by certain drives (in my case, Kioxia CD8-V) needing some time to initialise properly. The result is something like this:

            Version : 1.0
      Creation Time : Fri Jan 24 15:53:25 2025
         Raid Level : raid10
      Used Dev Size : 18446744073709551615
       Raid Devices : 6
      Total Devices : 6
        Persistence : Superblock is persistent
 
        Update Time : Fri Jan 24 16:32:09 2025
              State : active, FAILED, Not Started
     Active Devices : 6
    Working Devices : 6
     Failed Devices : 0
      Spare Devices : 0
 
             Layout : far=2
         Chunk Size : 128K
 
 Consistency Policy : unknown
 
               Name : xx-xcp-01:0  (local to host xx-xcp-01)
               UUID : xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
             Events : 227
 
     Number   Major   Minor   RaidDevice State
        -       0        0        0      removed
        -       0        0        1      removed
        -       0        0        2      removed
        -       0        0        3      removed
        -       0        0        4      removed
        -       0        0        5      removed
 
        -     259        9        1      sync   /dev/nvme6n1
        -     259        7        0      sync   /dev/nvme2n1
        -     259        8        5      sync   /dev/nvme5n1
        -     259        6        3      sync   /dev/nvme1n1
        -     259       10        2      sync   /dev/nvme4n1
        -     259       11        4      sync   /dev/nvme7n1

Instructing the kernel to assemble only the OS array avoids this issue:
rd.auto rd.md.uuid=<UUID-of-boot-array>
(the UUID can be retrieved with mdadm --detail /dev/md127)
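
As a rough illustration of the workaround above, the UUID can be pulled out of the `mdadm --detail` text and turned into the kernel arguments. This is only a sketch: the function names `extract_md_uuid` and `build_kernel_args` are made up for this example, and it assumes the `UUID : ...` line looks like the one in the output shown earlier.

```python
import re

# Matches the "UUID : <value>" line as printed by `mdadm --detail`
# (assumed format, based on the output pasted in this report).
_UUID_LINE = re.compile(r"^\s*UUID\s*:\s*(\S+)", re.MULTILINE)


def extract_md_uuid(detail_text: str) -> str:
    """Return the array UUID found in `mdadm --detail` output."""
    match = _UUID_LINE.search(detail_text)
    if match is None:
        raise ValueError("no UUID line found in mdadm --detail output")
    return match.group(1)


def build_kernel_args(uuid: str) -> str:
    """Build the kernel options that restrict early assembly to one array."""
    return f"rd.auto rd.md.uuid={uuid}"
```

The input text would come from running mdadm --detail /dev/md127 as root (for example via `subprocess.run(..., capture_output=True, text=True)`); how the resulting string gets into the GRUB entries is left open here, since that depends on where XCP-ng generates them.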

All secondary arrays then assemble automatically at a later stage, without causing failures on the SRs (storage repositories) defined on them and with no further configuration needed.
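
To check after a reboot whether a secondary array came up cleanly, the `State` line of `mdadm --detail` can be inspected for the `FAILED` and `Not Started` flags seen in the output above. A minimal sketch, with hypothetical helper names and assuming the state is printed as a comma-separated list as in that output:

```python
import re

# Matches the array-level "State : ..." line. The device table header
# ("Number Major Minor RaidDevice State") does not start with "State",
# so it is not picked up by this pattern.
_STATE_LINE = re.compile(r"^\s*State\s*:\s*(.+)$", re.MULTILINE)


def array_state_flags(detail_text: str) -> list[str]:
    """Return the comma-separated state flags from `mdadm --detail` output."""
    match = _STATE_LINE.search(detail_text)
    if match is None:
        raise ValueError("no State line found in mdadm --detail output")
    return [flag.strip() for flag in match.group(1).split(",")]


def is_failed(detail_text: str) -> bool:
    """True if the array reports FAILED or Not Started, as in this report."""
    flags = array_state_flags(detail_text)
    return "FAILED" in flags or "Not Started" in flags
```

For the output pasted in this report, `array_state_flags` would yield `active`, `FAILED` and `Not Started`, so `is_failed` returns `True`.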

Ideally this would be addressed upstream, in the kernel or the drive firmware, but it will be harder to identify and fix there. Since the only reason to assemble an array this early is that it is needed to boot (because it contains the root filesystem), I think it would be a good idea for the XCP-ng installation/update scripts (or the kernel or other RPM package scripts, if it's handled there) to include the above command-line option.
