Skip to content

🏷️ sparse: overhauled stubs for diags_array, eye_array, and their matrix equivalents #733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jul 18, 2025

Conversation

JulVandenBroeck
Copy link
Contributor

Hello!

I improved support for the different sparse formats in diags, spdiags, diags_array, identity, eye, and eye_array.

Additionally, I made the stubs of thediags_array function more versatile, so that it supports Python sequences a little better (see the extra tests). To achieve this I needed the optype.numpy.AnyFloat64DType, but without the None option, which I reimplemented in the _construct.pyi file.
Is there already such a type? Alternatively, is it too late to remove None and alter all uses to include | None? I saw that this is the case in most places already.

Let me know!

Kind regards,
Jul

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

Is there already such a type? Alternatively, is it too late to remove None and alter all uses to include | None? I saw that this is the case in most places already.

That | None stuff has been bothering me for a while now. Changing it will be backwards incompatible, but we can probably get away with that if we put in in a new 0.11.0 release. I'll try do to do that today.

@JulVandenBroeck JulVandenBroeck force-pushed the improve-sparse-construct branch from 1770af8 to df6a50d Compare July 9, 2025 14:15
@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 9, 2025

Ok, good to hear that you share the same sentiment.
I accidentally pushed a VSCode setting, sorry about that. Should be good to go now :)

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 9, 2025

Ah, one more thing. I struggled with the following problem:

I wanted to support an input of type list[int | float] or list[list[int | float]] (bool gives an error in diags_array), so that it yields an output of type dia_array[np.float64], which is always the case even for lists with only integers.
However, as soon as there is one complex number in any of the lists, I want to output dia_array[np.complex128]. The problem is that int and float are treated as subtypes of complex, and I was unable to fix this without getting errors about overlapping overloads.

Any idea on how to elegantly solve this issue?
Right now I simply return dia_array[Any], until something better is found.

Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty awesome 🤯


I really wish we could do something like def f[T: str](x: T) -> Result[T]: ... with

case type Result["a"] = A
case type Result["b"] = B
...

but the typing committee didn't seem very enthusiastic when I and others have proposed similar ideas like this before (even though it could directly replace overloads in a DRY and easier-to-specify way) 🤷🏻

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

I wanted to support an input of type list[int | float] or list[list[int | float]] (bool gives an error in diags_array), so that it yields an output of type dia_array[np.float64], which is always the case even for lists with only integers.
However, as soon as there is one complex number in any of the lists, I want to output dia_array[np.complex128]. The problem is that int and float are treated as subtypes of complex, and I was unable to fix this without getting errors about overlapping overloads.

I think the issue is the invariance of list. The standard workaround for that is to use the covariant collections.abc.Sequence, but that's not always appropriate. Alternatively, you could use a "free" T = TypeVar("T", int, float) or something, that you only use in list[T] (but I didn't try this out myself yet, so I'm not 100% certain it will work).

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

I accidentally pushed a VSCode setting, sorry about that. Should be good to go now :)

If you think it can help other VSCode users (like me) as well, feel free to open a separate PR for it.

@JulVandenBroeck
Copy link
Contributor Author

I think the issue is the invariance of list. The standard workaround for that is to use the covariant collections.abc.Sequence, but that's not always appropriate. Alternatively, you could use a "free" T = TypeVar("T", int, float) or something, that you only use in list[T] (but I didn't try this out myself yet, so I'm not 100% certain it will work).

I used list in my comment as a concrete example, but during my testing I did use collections.abc.Sequence, still giving the aforementioned errors. Given the complexity and the already large amount of overloads, I'll leave it like this. Most users will probably work with numpy arrays anyways :)

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 9, 2025

I see that you merged the change to onp.ToFloat64DType. Should I update my branch and remove my custom type in this PR too?

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

I see that you merged the change to onp.ToFloat64DType. Should I update my branch and remove my custom type in this PR too?

I'll open a separate PR that you can rebase onto after it's merged

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

I see that you merged the change to onp.ToFloat64DType. Should I update my branch and remove my custom type in this PR too?

I'll open a separate PR that you can rebase onto after it's merged

#737 has been merged :)

@JulVandenBroeck JulVandenBroeck force-pushed the improve-sparse-construct branch from 44c98b8 to 215b980 Compare July 9, 2025 20:05
@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 10, 2025

I think I found a way to differentiate between float-type and complex-type sequences.

That being said, I get other errors that I can't really explain and are leading me to believe there might be a bug in basedpyright, but I want to check with you to be certain.
What is the best way to share some code that gives errors?

Maybe some context:
1D numpy arrays form no problem, however all of the sudden inputs of type np.ndarray[tuple[int, int], np.dtype[...]] are not matched to onp.CanArray2D[_SCT] anymore (I think they were before though?).
However, when I create a dummy function in the same file to test the matching, suddenly the type checker seems to recognize its mistake and all errors get resolved. Very strange behavior, but again not sure how I can demonstrate it.

@jorenham
Copy link
Member

I think I found a way to differentiate between float-type and complex-type sequences.

I forgot to mention it earlier, but you can use the optype.JustFloat and optype.JustComplex for that (docs). So e.g. Sequence[op.JustComplex] accepts [1j] but rejects [1.0].

What is the best way to share some code that gives errors?

If you can make it reproducible without scipy-stubs, then https://basedpyright.com/ is a nice way. Otherwise, you could push it to a new branch on your fork.

I get other errors that I can't really explain and are leading me to believe there might be a bug in basedpyright, but I want to check with you to be certain.

Coincidentally, I just ran into a very strange case of pyright-specific "spooky action at a distance" myself: #741 (comment)
Perhaps they're related 🤷🏻?

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 10, 2025

I forgot to mention it earlier, but you can use the optype.JustFloat and optype.JustComplex for that (docs). So e.g. Sequence[op.JustComplex] accepts [1j] but rejects [1.0].

I want to try to enable the input of mixed arrays like [[1, 0], [0.5, 1j]] and I think I found a way by using _PSCT as a match and np.number[_64Bit, float | _PSCT] as output, where _PSCT is a type variable bound to int | float | complex. This returns a type conform to np.float64 if the arrays contain only int and float, and to np.complex128 if they contain a mix of int, float, and complex. I'm at the limit of my knowledge here, so it might be too hacky.

If you can make it reproducible without scipy-stubs, then https://basedpyright.com/ is a nice way. Otherwise, you could push it to a new branch on your fork.

I'll try that tomorrow :)

Coincidentally, I just ran into a very strange case of pyright-specific "spooky action at a distance" myself: #741 (comment) Perhaps they're related 🤷🏻?

Seems different, since it's happening in the same file as the tests are in, but you never know...

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 14, 2025

Another issue: mypy complains about _ToArray1D2D requiring two type aliases (first _PSCT, then _SCT), while pyright accepts only _ToArray1D2D[_SCT] or _ToArray1D2D[_SCT, _PSCT], so the other way around... Weird stuff going on with pyright.

@JulVandenBroeck
Copy link
Contributor Author

I fixed it by writing the type out, instead of the alias.

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 14, 2025

As can be seen from the commit message, the Python list typings are causing too many headaches. I noticed different behavior between mypy and pyright for lists like [1.0, 1j, 1]. pyright gives list[int | float | complex], while mypy gives list[Any], so I gave up and just return a sparse array of either np.float64 or np.complex128.

All tests pass now, but the local issue withpyright tests remains and in VSCode these errors are still reported.

@JulVandenBroeck JulVandenBroeck requested a review from jorenham July 15, 2025 22:21
@jorenham
Copy link
Member

I'll have another look at this tomorrow, sorry for the delay

@jorenham
Copy link
Member

I think we're nearing the end of this PR 😅

I think so too

@JulVandenBroeck
Copy link
Contributor Author

 Error: No overloads for "diags" match the provided arguments (reportCallIssue)
/home/runner/work/scipy-stubs/scipy-stubs/tests/sparse/test_construct.pyi:67:13 - error: "assert_type" mismatch: expected "bsr_matrix[floating[_32Bit]]" but received "Unknown" (reportAssertTypeFailure)
Error: "assert_type" mismatch: expected "bsr_matrix[floating[_32Bit]]" but received "Unknown" (reportAssertTypeFailure)
/home/runner/work/scipy-stubs/scipy-stubs/tests/sparse/test_construct.pyi:67:43 - error: Argument of type "Literal['bsr']" cannot be assigned to parameter "format" of type "_FmtDIA | None" in function "diags"
  Type "Literal['bsr']" is not assignable to type "_FmtDIA | None"
    "Literal['bsr']" is not assignable to "None"
    "Literal['bsr']" is not assignable to type "_FmtDIA" (reportArgumentType)

I have no clue why this error is now triggered... Why is overload 2 not checked, which has format: _FmtBSR? I'm going to bed... 😅

@JulVandenBroeck
Copy link
Contributor Author

That being said, I restructured all functions in _construct.pyi so that overloads are bundled in groups in which all argument types are the same, except for the format. This is more visually pleasing, and also easier to wrap my mind around. I slightly tweaked some overloads as well, since I found an error in the stubs for random and rand (random_state is not an argument anymore).

@JulVandenBroeck
Copy link
Contributor Author

I'll have to work on the other comments later, since I'm going on vacation!

@jorenham
Copy link
Member

That being said, I restructured all functions in _construct.pyi so that overloads are bundled in groups in which all argument types are the same, except for the format. This is more visually pleasing, and also easier to wrap my mind around. I slightly tweaked some overloads as well, since I found an error in the stubs for random and rand (random_state is not an argument anymore).

So the pyright false positives weren't there before the big shuffle?

I'll have to work on the other comments later, since I'm going on vacation!

Ah good to know; I'll go ahead and release 1.16.0.3 then. Have fun!

@jorenham
Copy link
Member

jorenham commented Jul 17, 2025

(random_state is not an argument anymore).

The @_transition_to_rng decorator adds it :)

See #648 and scipy/scipy#23216 for the nasty details surrounding it

@JulVandenBroeck
Copy link
Contributor Author

So the pyright false positives weren't there before the big shuffle?

Nope... But they seem to be very sensitive to slight tweaks.

Ah good to know; I'll go ahead and release 1.16.0.3 then. Have fun!

Thanks! I kinda figured you were waiting for this PR to release the next version haha, sorry for the delay. On the other hand, I still want to fix the other functions in _construct.pyi as well, since I'm getting errors with "block_diag" in my code, where I use specific formats. It will be nice to combine all changes into a new release then.

@jorenham
Copy link
Member

Nope... But they seem to be very sensitive to slight tweaks.

It must be a quantum bug: I've seen it causing spooky action at a distance before, and this sounds like a case of decoherence. So putting it in the (dilution) fridge sounds like a good idea to me 👌🏻.

@jorenham
Copy link
Member

jorenham commented Jul 17, 2025

I still want to fix the other functions in _construct.pyi as well, since I'm getting errors with "block_diag" in my code, where I use specific formats. It will be nice to combine all changes into a new release then.

Then I'd prefer that in a PR, so that it shows up nicely in the release notes. It'll make it easier to review as well 😇.

@JulVandenBroeck
Copy link
Contributor Author

Turns out I'm just an idiot (accidentally removed some overloads during refactoring + added some * and / tokens that were out of place, because I learned about them approx. last week for the first time)! Happy ending :p
The additional unit tests are for later...

@jorenham
Copy link
Member

because I learned about them approx. last week for the first time

That's quite impressive then

The additional unit tests are for later...

So then I shouldn't merge this yet :p ?

@jorenham jorenham requested a review from Copilot July 18, 2025 10:37
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request significantly overhauls the type stubs for sparse matrix/array construction functions in SciPy's sparse module. The changes improve type safety by providing more precise return types based on the format parameter and enhance support for different sparse formats.

Key changes include:

  • Comprehensive overload definitions for all sparse formats ("bsr", "coo", "csc", "csr", "dia", "dok", "lil") across multiple functions
  • Enhanced type annotations for diags_array to better support Python sequences with complex number handling
  • Systematic refactoring of function signatures to use consistent type aliases and parameter patterns

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
scipy-stubs/sparse/_construct.pyi Major overhaul of function stubs with comprehensive format-specific overloads and improved type aliases
tests/sparse/test_construct.pyi Extensive test coverage additions for all sparse formats and edge cases, including complex number handling

Comment on lines 41 to 47
_Numeric: TypeAlias = np.bool_ | npc.number

_T = TypeVar("_T")
_SCT = TypeVar("_SCT", bound=_Numeric, default=Any)
_SCT0 = TypeVar("_SCT0", bound=_Numeric)
_ShapeT = TypeVar("_ShapeT", bound=tuple[int, *tuple[int, ...]], default=tuple[Any, ...])

Copy link
Preview

Copilot AI Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _Numeric type alias is redefined when it was already defined below. The order should be consistent with the existing codebase pattern.

Suggested change
_Numeric: TypeAlias = np.bool_ | npc.number
_T = TypeVar("_T")
_SCT = TypeVar("_SCT", bound=_Numeric, default=Any)
_SCT0 = TypeVar("_SCT0", bound=_Numeric)
_ShapeT = TypeVar("_ShapeT", bound=tuple[int, *tuple[int, ...]], default=tuple[Any, ...])
_T = TypeVar("_T")
_SCT = TypeVar("_SCT", bound=_Numeric, default=Any)
_SCT0 = TypeVar("_SCT0", bound=_Numeric)
_ShapeT = TypeVar("_ShapeT", bound=tuple[int, *tuple[int, ...]], default=tuple[Any, ...])
_Numeric: TypeAlias = np.bool_ | npc.number

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this even mean @jorenham? Seems counterintuitive to put a variable under its use cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In .pyi that's not a problem; forward references are fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda see how it can be cleaner to keep the typevars and type aliases separated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to ignore it though

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 18, 2025

So then I shouldn't merge this yet :p ?

You can! I don't know how pressing these unit tests are. Maybe it can be marked as a TODO with an issue?

@JulVandenBroeck
Copy link
Contributor Author

JulVandenBroeck commented Jul 18, 2025

Mr. Copilot found a missing comma! The robot did forget how to count (there are only 56 overloads for diags_array...)

@jorenham
Copy link
Member

Maybe it can be marked as a TODO with an issue?

An inline # TODO(julvandenbroeck): ... or an issue are both fine for that, as far as I'm concerned

Mr. Copilot found a missing comma! The robot did forget how to count (there are only 56 overloads for diags_array...)

Haha, that's exactly the thing you'd expect AI's to be good at; counting toothpicks overloads.

@jorenham
Copy link
Member

Anyway, let me know when I can press the button

@JulVandenBroeck
Copy link
Contributor Author

I added TODOs and moved _Numeric to the other TypeAliases. I'm leaving now, so if all tests pass you can merge 😄

@jorenham jorenham added this to the 1.16.0.3 milestone Jul 18, 2025
@jorenham jorenham merged commit 2f03182 into scipy:master Jul 18, 2025
16 checks passed
@jorenham
Copy link
Member

Thanks Jul!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants