Skip to content

Conversation

@JRRudy1
Copy link

@JRRudy1 JRRudy1 commented Jul 14, 2024

This PR builds on PyO3#431 to migrate to PyO3 0.22.0 without needing to activate the problematic py-clone feature.

Background

PyO3 recently released version 0.22.0 which includes many new features, but also includes the next step in deprecating the gil-refs API by putting it behind the opt-in gil-refs feature flag. @bschoenmaeckers's PR went long way towards updating numpy for the migration by mirroring the gil-refs deprecation and feature-gating the methods that depend on it, but @juntyr raised the valid concerns about its unconditional activation of PyO3's py-clone feature flag.

The problem with py-clone

The PyO3 0.22.0 release includes PyO3#4095's removal of the ability to clone Py<T> while the GIL is not held, which was implemented by removing its Clone impl and requiring users to either convert to Bound<T> before cloning or call the clone_ref method instead. The opt-in py-clone feature flag was added that restores the Clone implementation, but a panic is triggered if a clone is attempted while the GIL is not held. Enabling this feature may be convenient and fairly harmless in some use cases but can be hugely problematic in others, and we should not unconditionally impose the risk of panicking clones on all numpy users.

Why not just add our own py-clone feature?

This was suggested in PyO3#431, but has a major problem related to the Element impl for PyObject (aka Py<PyAny>). The problem is that Clone is a supertrait of Element, so PyObject losing its Clone impl when the py-clone feature is disabled means its Element impl must be disabled as well. So if we made the feature opt-in as was done by PyO3, then by default we would lose all support for object arrays which would be a huge regression. Making the feature opt-out instead is also a bad option, so we need another way.

Could we implement Element for Bound<PyAny> instead?

No. Unfortunately Bound<T> does not (and can not) implement Send, which is another requirement for implementing Element. So retaining the Element impl for Py<T> is our only option for supporting object arrays without major reworks.

Eliminating our dependence on py-clone

The sad thing about this issue is that it really shouldn't be a problem in the context of numpy -- all of our API's require the GIL anyways, so it should always be safe to clone the PyObjects in our arrays. But that that doesn't help the fact that we still need a way to clone elements (and to_owned slices and array views), and we were relying on the Element trait's Clone requirement to do so. This PR solves the dilemma by replacing the Clone supertrait with a new trait PyClone, which can be safely implemented for PyObject and all existing Element impls.

The new PyClone trait

The new PyClone trait is a weaker form of the standard Clone trait that represents types which can be cloned while the GIL is held. Philosophically (though not in practice), this includes all Clone types that don't care about the GIL, as well as types like Py<T> that can only be cloned when it is held.

The trait has the following (simplified) definition:

pub trait PyClone: Sized {
    /// Create a clone of the value while the GIL is guaranteed to be held.
    fn py_clone(&self, py: Python<'_>) -> Self;
}

Any type that implements Clone can trivially implement PyClone by ignoring the py argument and simply delegating to its clone method. Meanwhile, PyObject can implement this trait by using the provided Python token to call Py::clone_ref. The actualy trait also defines a couple of methods (not shown above) for cloning collections, which have default implementations that types can override if there is a more efficient way to do so then simply mapping the py_clone method to each item.

Adding a trait like this to PyO3 was discussed in PyO3#4133, but it is not clear if/when this would be implemented. If such a trait ends up getting added to PyO3 we can later transition to using it, but I don't think we should hold up numpy's migration to 0.22.0 waiting for that to happen.

Drawbacks

Ideally we would provide a blanket impl of PyClone for all T: Clone and then a specific impl for PyObject, which would let us avoid the breaking change for anyone with custom Element impls that will now need to add a PyClone impl as well. Unfortunately, this is not allowed due to possible impl overlaps since the compiler asserts that PyO3 restoring the Clone impl for Py<T> should not be a breaking change. If PyClone trait was instead defined in the pyo3 crate then they could provide these impls, but it is unlikely that they would (as discussed in PyO3#4133) since it would prevent them defining necessary generic impls like Option<T: PyClone>.

So, this will inevitably need to be a breaking change for downstream crates defining custom Element impls for their types. Luckily it extremely rare for crates to do this, and adding the PyClone impl is absolutely trivial in comparison to all the other changes needed to migrate away from the gil-refs API in the PyO3 0.21/0.22 updates anyways.

Patching numpy internals

As mentioned before, all of the numpy API's already require access to a Python token to prove that the GIL is held, so most of the necessary updates simply require replacing item.clone() calls with item.py_clone(py). The only exceptions are the calls to <[T] as ToOwned>::to_owned and ArrayView::to_owned (both of which require T: Clone), which now delegate to PyClone's provided methods discussed above. Delegating to these collection-specific methods is used instead of simply mapping the py_clone method to avoid losing out on the optimized to_owned implementations for Clone types.

Tasks

  • Deprecate the gil-refs API
    • Add gil-refs feature guarding all usages of PyO3's gil-refs API (completed by @bschoenmaeckers)
    • Update doctests and examples to use the bound API instead of gil-refs
  • Eliminate py-clone feature dependency
    • Add PyClone trait to replace Clone as a supertrait of Element
    • Add PyClone impls for all internal Element types
    • Update numpy internals to use PyClone instead of Clone
  • Documentation
    • Document the PyClone trait
    • Update the Element docs to reference the newly required supertrait
    • Create changelog and/or migration guide
  • Add tests -- the existing tests cover the changes pretty well, but there may be PyClone-specific tests worth adding.

JRRudy1 added 17 commits July 12, 2024 15:24
…ment`.

This trait can be trivially implemented for `Clone` types, but can also be implemented for types such as `PyObject` that can only be safely cloned while the GIL is held.
… for `Clone` types.

The `impl_element_scalar` macro has been updated to invoke this new macro, and an explicit invocation was added for the types in the `datetime` module.
The impls were provided explicitly instead of using the `impl_py_clone` macro since making the macro flexible enough to handle the `const` param adds more complexity than it's worth.
This is the motivating impl for the changes, since `PyObject` can only be safely cloned while the GIL is held.
This is necessary in order to continue supporting object arrays after the update to PyO3 0.22.0 since `Py<T>` no longer implements `Clone` unless the 'py-clone' feature is enabled (which would be problematic for `numpy` to enable).

This is a breaking change for 2 main reasons:
1. Custom `Element` impls will now need to provide a `PyClone` impl as well, since providing a blanket impl for all `T: Clone` is not possible.
2. A type parameter `T: Element` will no longer have an implicit `Clone` bound, so code relying on it will need to update the bounds to `T: Element + Clone`.
… token and call the `py_clone` method instead of `Clone`.

Updated usages accordingly, which is trivial since the GIL is already held everywhere it is called.
…ad of `Clone` methods.

This change is trivial since the GIL is already held everywhere that the clones occur, with the minor technicality that the GIL is not guaranteed to be held in the definition of the `PyArrayMethods` trait, but only in the actual impl block for `Bound<PyArray`. As such, several method implementations had to be moved to the impl block instead of being default implementations in the trait definition, but this is not a breaking change since the trait is `Sealed` and we control the only impl.
…ttribute.

This is necessary since the `pyo3::pyobject_native_type_base` macro only implements `Debug` with the "gil-refs" feature since the 0.22.0 update. The `Debug` impls are no longer relevant anyway because `&PyArray`, `&PyArrayDescr`, etc. can only be obtained through the gil-refs API's.
@JRRudy1 JRRudy1 closed this Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant