Skip to content

Conversation

Millefeuille42
Copy link

No description provided.

@Millefeuille42 Millefeuille42 self-assigned this Mar 5, 2025
MarkSymsCtx and others added 27 commits March 20, 2025 11:23
Signed-off-by: Mark Syms <mark.syms@cloud.com>
… After Disabling and Enabling Multipath

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng github organisation.
Originally-by: Ronan Abhamon <ronan.abhamon@vates.fr>

This version obtained through merge in
ff1bf65:

 git restore -SW -s ydi/forks/2.30.7/xfs drivers/EXTSR.py
 mv drivers/EXTSR.py drivers/XFSSR.py
 git restore -SW drivers/EXTSR.py

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Some important points:

- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).

- Encrypted VDI are supported with key_hash attribute (not tested, experimental).

- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless
devices are not necessarily created on other hosts. So if a resource definition exists without
local device path, we ask it to LINSTOR. Wait 5s for symlink creation when a new volume
is created => 5s is is purely arbitrary, but this guarantees that we do not try to access the
volume if the symlink has not yet been created by the udev rule.

- Can change the provisioning using the device config 'provisioning' param.

- We can only increase volume size (See: LINBIT/linstor-server#66),
it would be great if we could shrink volumes to limit the space used by the snapshots.

- Inflate/Deflate can only be executed on the master host, a linstor-manager plugin is present
to do this from slaves. The same plugin is used to open LINSTOR ports + start controller.

- Use a `total_allocated_volume_size` method to have a good idea of the reserved memory
Why? Because `physical_free_size` is computed using the LVM used size, in the case of thick provisioning it's ok,
but when thin provisioning is choosen LVM returns only the allocated size using the used block count. So this method
solves this problem, it takes the fixed virtual volume size of each node to compute the required size to store the
volume data.

- Call vhd-util on remote hosts using the linstor-manager when necessary, i.e. vhd-util is called to get vhd info,
the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that
contains the DRBD data or a remote disk if the DRBD device is diskless.

- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must have no
errors. So a LinstorVhdUtil wrapper is now used to bypass DRBD layer when
VDIs are loaded.

- Refresh PhyLink when unpause in called on DRBD devices:
We must always recreate the symlink to ensure we have
the right info. Why? Because if the volume UUID is changed in
LINSTOR the symlink is not directly updated. When live leaf
coalesce is executed we have these steps:
"A" -> "OLD_A"
"B" -> "A"
Without symlink update the previous "A" path is reused instead of
"B" path. Note: "A", "B" and "OLD_A" are UUIDs.

- Since linstor python modules are not present on every XCP-ng host,
module imports are protected by try.. except... blocks.

- Provide a linstor-monitor daemon to check master changes
- Check if "create" doesn't succeed without zfs packages
- Check if "scan" failed if the path is not mounted (not a ZFS mountpoint)
Co-authored-by: Piotr Robert Konopelko <piotr.konopelko@moosefs.pro>
Signed-off-by: Aleksander Wieliczko <aleksander.wieliczko@moosefs.pro>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir
used during SR creation is unmounted at the end of the `create` call (and also
when a PBD is unplugged) in `detach` block.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
A sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdir with the SR UUID, the new behavior
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True.
Existing SRs  are not modified and continue to use the folder that was given at
SR creation, directly, without looking for a subdirectory.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Ensure all shared drivers are imported in `_is_open` definition to register
them in the driver list. Otherwise this function always fails with a SRUnknownType exception.

Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`.
Same for CephFS with: `serverpath`. (NFS driver is directly patched to ensure there is no usage of
the `serverpath` param because its value is equal to None.)

`location` param is required to use ZFS, to be more precise, in the parent class: `FileSR`.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
SR_CACHING offers the capacity to use IntelliCache, but this
feature is only available using NFS SR.

For more details, the implementation of `_setup_cache` in blktap2.py
uses only an instance of NFSFileVDI for the shared target.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
* `except` syntax fixes
* drop `has_key()` usage
* drop `filter()` usage (but drop their silly `list(x.keys())` wrappings)
* drop `map()` usage
* use `int` not `long`
* use `items()` not `iteritems()`

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
…store

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Guided by futurize's "old_div" use

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
PROBE_MOUNTPOINT in a some drivers is a relative path, which is resolved
using MOUNT_BASE at probe time, but CephFS, GlusterFS and MooseFS it is
set on driver load to an absolute path, and this requires MOUNT_BASE to be
looking like a path component.

```
drivers/CephFSSR.py:69: in <module>
    PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = <MagicMock name='mock.MOUNT_BASE' id='140396863897728'>, p = ('probe',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/lib64/python3.6/posixpath.py:80: TypeError
```

Note this same idiom is also used in upstream SMBFS, although that does not
appear to cause any problem with the tests.

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
(coverage 7.2.5)

Without these changes many warns/errors are emitted:
- "assertEquals" is deprecated, "assertEqual" must be used instead
- mocked objects in "setUp" method like "cleanup.IPCFlag" cannot be repatched
  at the level of the test functions, otherwise tests are aborted,
  this is the  behavior of coverage version 7.2.5

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
The probe method is not implemented so we
shouldn't advertise it.

Signed-off-by: BenjiReis <benjamin.reis@vates.fr>
Impacted drivers: LINSTOR, MooseFS and ZFS.
- Ignore all linstor.* members during coverage,
  the module is not installed in github runner.
- Use mock from unittest, the old one is not found now.
- Remove useless return from LinstorSR scan method.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: BenjiReis <benjamin.reis@vates.fr>
)

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
A SR inheriting from a EXTSR allowing to use a 4KiB blocksize device as
SR.
Create a 512 bytes loop device on top of a 4KiB device then give it to
EXTSR code.
It uses the same device-config as a normal local SR, i.e.
`device-config:device=/dev/nvme0n1`

After creation, the driver find the device under the VG to identify the
correct disk.
It means that creating the SR with a non-stable disk identifier is
doable and it will work as EXTSR would by ignoring the device-config
after creation.
Identifying the correct disk by using LVM infos.

The VG is created using a different prefix name from EXTSR.
It is `XSLocalLargeBlock-<SR UUID>`.

The SR artificially limits the creation to disk not being 512b.
It will throw an error if a disk whose blocksize is 512 is given.

We currently don't support multi devices, it fails at the EXTSR creation.
We added an error to explicitly say that multi devices SR is not supported
on the driver.
Before that, it would make another error:
```
Error code: SR_BACKEND_FAILURE_77
Error parameters: , Logical Volume group creation failed,
```
Sometimes the pvremove from EXTSR using the loop device fails.
In this case, we need to remove the real device from PV list ourself in
the error handling.

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Millefeuille42 and others added 2 commits June 24, 2025 11:06
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Millefeuille <millefeuille42@proton.me>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Millefeuille42 and others added 6 commits June 25, 2025 15:18
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Co-authored-by: Damien Thenot <damien.thenot@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
@Millefeuille42 Millefeuille42 force-pushed the mlr-533-cache-controller-uri-in-a-file branch from 4b663cb to 363372d Compare June 26, 2025 09:39
Comment on lines 228 to 235
def get_cached_controller_uri(ctx=None):
try:
with ctx if ctx else shared_reader(CONTROLLER_CACHE_PATH) as f:
return f.read().strip()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to read controller URI cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH,e))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_cached_controller_uri(ctx=None):
try:
with ctx if ctx else shared_reader(CONTROLLER_CACHE_PATH) as f:
return f.read().strip()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to read controller URI cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH,e))
def _read_controller_uri_from_file(f):
try:
return f.read().strip()
except Exception as e:
util.SMlog('Unable to read controller URI cache file at `{}`: {}'.format(CONTROLLER_CACHE_PATH, e))
def read_controller_uri_cache():
try:
with shared_reader(CONTROLLER_CACHE_PATH) as f:
return _read_controller_uri_from_file(f)
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to read controller URI cache file at `{}`: {}'.format(CONTROLLER_CACHE_PATH, e))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 4f84f23

Comment on lines 238 to 246
def delete_controller_uri_cache(ctx=None):
try:
with ctx if ctx else excl_writer(CONTROLLER_CACHE_PATH) as f:
f.seek(0)
f.truncate()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to delete uri cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH, e))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def delete_controller_uri_cache(ctx=None):
try:
with ctx if ctx else excl_writer(CONTROLLER_CACHE_PATH) as f:
f.seek(0)
f.truncate()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to delete uri cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH, e))
def delete_controller_uri_cache():
try:
with excl_writer(CONTROLLER_CACHE_PATH) as f:
f.seek(0)
f.truncate()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to delete URI cache file at `{}`: {}'.format(CONTROLLER_CACHE_PATH, e))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 4f84f23

Comment on lines 249 to 279
def write_controller_uri_cache(uri, ctx=None):
try:
if not os.path.exists(CONTROLLER_CACHE_DIRECTORY):
os.makedirs(CONTROLLER_CACHE_DIRECTORY)
os.chmod(CONTROLLER_CACHE_DIRECTORY, 0o700)
with ctx if ctx else excl_writer(CONTROLLER_CACHE_PATH) as f:
f.seek(0)
f.write(uri)
f.truncate()
except FileNotFoundError:
pass
except Exception as e:
util.SMlog('Unable to write URI cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH, e))


def build_controller_uri_cache():
with excl_writer(CONTROLLER_CACHE_PATH) as f:
uri = get_cached_controller_uri(contextlib.nullcontext(f))
if uri:
return uri
uri = _get_controller_uri()
if not uri:
for retries in range(9):
time.sleep(1)
uri = _get_controller_uri()
if uri:
break

retries += 1
if retries >= 10:
break
time.sleep(1)
if uri:
write_controller_uri_cache(uri, contextlib.nullcontext(f))
return uri
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot add suggestion due to github limitation but:

def build_controller_uri_cache():
    uri = ''
    try:
        with excl_writer(CONTROLLER_CACHE_PATH) as f:
            uri = _read_controller_uri_from_file(f)
            if uri:
                return uri
            uri = _get_controller_uri()
            if not uri:
                for retries in range(9):
                    time.sleep(1)
                    uri = _get_controller_uri()
                    if uri:
                        break
            if uri:
                f.seek(0)
                f.write(uri)
                f.truncate()
    except FileNotFoundError:
        if os.path.exists(CONTROLLER_CACHE_DIRECTORY):
            raise
        os.makedirs(CONTROLLER_CACHE_DIRECTORY)
        os.chmod(CONTROLLER_CACHE_DIRECTORY, 0o700)
        return build_controller_uri_cache()
    except Exception as e:
        util.SMlog('Unable to write URI cache file at `{}` : {}'.format(CONTROLLER_CACHE_PATH, e))

    return uri

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 1f6ffbb

Comment on lines 282 to 286
def get_controller_uri():
uri = get_cached_controller_uri()
if not uri:
uri = build_controller_uri_cache()
return uri
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_controller_uri():
uri = get_cached_controller_uri()
if not uri:
uri = build_controller_uri_cache()
return uri
def get_controller_uri():
uri = read_controller_uri_cache()
if not uri:
uri = build_controller_uri_cache()
return uri

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 4f84f23

:param function logger: Function to log messages.
:param int attempt_count: Number of attempts to join the controller.
"""
uri = get_cached_controller_uri()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uri = get_cached_controller_uri()
uri = read_controller_uri_cache()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 4f84f23

Millefeuille42 and others added 5 commits June 26, 2025 14:41
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
…che files

Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
@Millefeuille42 Millefeuille42 requested a review from Wescoeur June 30, 2025 08:18
@Wescoeur Wescoeur force-pushed the 3.2.12-8.3 branch 3 times, most recently from e7801da to ca5b52d Compare August 28, 2025 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants