
🏷️ added additional overloads and improved existing stubs of block-related sparse construction functions #775


Merged
merged 15 commits into scipy:master from improved-block-functions on Aug 3, 2025

Conversation

JulVandenBroeck
Contributor

I added all formats for the final functions in _construct.pyi.

Let me know if you spot any errors! 😄

@JulVandenBroeck
Contributor Author

Looks like I've got some mypy errors. I'll fix them later.

@JulVandenBroeck
Contributor Author

Most errors are related to mypy not recognizing [[<sparse array>], [None]] as type list[list[<sparse array> | None]]; instead, the type is inferred as list[object]. It does work for pyright, though.
Removing the None entries from the test cases is a simple workaround.

There is still an error left that I have to investigate.
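For reference, this is a minimal sketch of the kind of expression that trips mypy up; the names are illustrative, not the actual test fixtures:

```python
from scipy.sparse import csr_array

csr_arr = csr_array([[1, 2], [3, 4]])

# The block-construction stubs receive nested lists like this one. pyright
# keeps the element types separate, so the argument stays compatible with a
# Sequence[Sequence[csr_array | None]]-style parameter; mypy joins the
# element types and infers list[object], which no longer matches the
# intended overload.
blocks = [[csr_arr], [None]]
```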

@JulVandenBroeck
Contributor Author

Okay, the last error is basically the same issue. [[csr_arr], [dok_arr]] is recognized as list[object] again. I can also just remove this test case.

@JulVandenBroeck
Contributor Author

JulVandenBroeck commented Jul 30, 2025

mypy < pyright? 👀

@jorenham
Member

jorenham commented Jul 30, 2025

mypy < pyright? 👀

In my experience that's indeed often the case, but as we've recently seen, pyright has its issues too. I have high hopes for ty, though. That said, mypy has improved a lot over the last year, which is certainly appreciated.

@jorenham
Member

Most errors are related to mypy not recognizing [[<sparse array>], [None]] as type list[list[<sparse array> | None]]; instead, the type is inferred as list[object]. It does work for pyright, though. Removing the None entries from the test cases is a simple workaround.

There is still an error left that I have to investigate.

That's probably a classic case of "join vs union", which the (based)pyright docs explain quite nicely: https://docs.basedpyright.com/v1.18.2/usage/mypy-comparison/#unions-vs-joins

I believe that mypy intends to address this class of issues though: python/mypy#12056.
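Stripped of the sparse-specific details, the difference boils down to something like this (a minimal illustration, not code from this PR):

```python
from typing import assert_type

# Without an expected type to guide inference, the two checkers disagree on
# the element type of a heterogeneous list literal:
x = [1, "a"]

assert_type(x, list[int | str])  # accepted by pyright, which infers the union
# mypy infers list[object] (the *join* of int and str) instead, so the same
# assertion fails there with an [assert-type] error.
```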

@jorenham
Member

Okay, the last error is basically the same issue. [[csr_arr], [dok_arr]] is recognized as list[object] again. I can also just remove this test case.

You could also leave it in with a # type: ignore[assert-type], and a comment explaining why mypy is behaving incorrectly (i.e. join vs union).
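Something along these lines, presumably; block_array and its coo_array return type are used here only as an illustration, and the real test case may target a different function:

```python
from typing import assert_type

from scipy.sparse import block_array, coo_array, csr_array, dok_array

csr_arr = csr_array([[1.0, 0.0], [0.0, 2.0]])
dok_arr = dok_array((2, 2))

# mypy joins the nested element types to `object` (join vs union, see the
# basedpyright docs linked above), so it can't match the sparse-array
# overload; keep the test case and suppress just that one error.
assert_type(block_array([[csr_arr], [dok_arr]]), coo_array)  # type: ignore[assert-type]
```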

@JulVandenBroeck
Contributor Author

Ok, I like that approach. I'll restore the problematic tests

Member

@jorenham jorenham left a comment


Does the block_diag return type (matrix vs array) still depend on the input, or did that maybe change in the meantime?

@JulVandenBroeck
Contributor Author

JulVandenBroeck commented Jul 30, 2025

Does the block_diag return type (matrix vs array) still depend on the input, or did that maybe change in the meantime?

The documentation states the following:

If at least one input is a sparse array, the output is a sparse array. Otherwise the output is a sparse matrix.

So I thought it would be most accurate to create an overload for all possible combinations of matrices, numpy arrays, and sequences, while sparse arrays are handled in a separate overload. I haven't found a way to get the right behavior for "a list with at least one element of type x"; see also my comment here: #733 (comment).
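Schematically, the idea looks like the stub-style sketch below; the parameter and return types are simplified placeholders rather than the actual scipy-stubs aliases:

```python
from collections.abc import Sequence
from typing import overload

import numpy as np
import numpy.typing as npt

from scipy.sparse import coo_array, coo_matrix, sparray, spmatrix

# "All inputs are sparse arrays" -> sparse array out. A true "at least one
# sparse array" rule can't be expressed, so mixed inputs fall through to the
# matrix overload below.
@overload
def block_diag(
    mats: Sequence[sparray], format: str | None = None, dtype: npt.DTypeLike | None = None
) -> coo_array: ...
@overload
def block_diag(
    mats: Sequence[spmatrix | npt.NDArray[np.generic] | Sequence[float]],
    format: str | None = None,
    dtype: npt.DTypeLike | None = None,
) -> coo_matrix: ...
```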

@jorenham
Member

I haven't found a way to get the right behavior for "a list with at least one element of type x"

Yea, I don't think that's something we can express in the Python type system. I suppose you could brute-force-engineer something like that for tuples, but that wouldn't be very pretty.
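For illustration, the tuple brute force could look something like this, and it indeed gets ugly fast (the alias and return types are placeholders):

```python
from typing import TypeAlias, overload

import numpy as np
import numpy.typing as npt

from scipy.sparse import coo_array, sparray, spmatrix

_Other: TypeAlias = spmatrix | npt.NDArray[np.generic]

# One overload per position where the sparse array may appear, for every
# tuple length you want to support -- the combinations pile up quickly.
@overload
def block_diag(mats: tuple[sparray]) -> coo_array: ...
@overload
def block_diag(mats: tuple[sparray, sparray | _Other]) -> coo_array: ...
@overload
def block_diag(mats: tuple[_Other, sparray]) -> coo_array: ...
# ... and so on for lengths 3, 4, ...
```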

@JulVandenBroeck
Contributor Author

I did try creating a Protocol that contains __getitem__ returning x | y, but that approach only works for (based)pyright, again due to the union vs. join behavior.
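Roughly, the attempt looked like this; it's a reconstruction with placeholder names, not the exact code I tried:

```python
from typing import Protocol

from scipy.sparse import coo_array, sparray, spmatrix


class _CanGetBlock(Protocol):
    def __getitem__(self, index: int, /) -> sparray | spmatrix: ...


# pyright infers a mixed literal like [csr_arr, coo_mat] as a list of the
# union of its element types, whose __getitem__ return is assignable to
# sparray | spmatrix, so it satisfies the protocol. mypy joins the elements
# to list[object], whose __getitem__ returns object, so the match fails.
def block_diag(mats: _CanGetBlock) -> coo_array: ...
```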

@JulVandenBroeck JulVandenBroeck requested a review from jorenham July 30, 2025 13:39
@jorenham
Member

jorenham commented Jul 30, 2025

#777 and #778 seem to have caused some merge conflicts

@JulVandenBroeck
Contributor Author

I think I did something wrong with git... Hopefully it didn't screw anything up.

@JulVandenBroeck
Contributor Author

Looks good, but np.int32 gets interpreted as np.signedinteger[Any] in some Python/NumPy combinations. This no longer seems to occur from NumPy 2.2 onwards.

@jorenham
Member

I think I did something wrong with git... Hopefully it didn't screw anything up.

Hmm yea, those commits don't look right. Did you by any chance rebase and then git push instead of git push -f?

@JulVandenBroeck JulVandenBroeck force-pushed the improved-block-functions branch from 779f04a to cfa2252 Compare July 31, 2025 17:11
@JulVandenBroeck
Contributor Author

JulVandenBroeck commented Jul 31, 2025

Hmm yea, those commits don't look right. Did you by any chance rebase and then git push instead of git push -f?

I think you nailed it, but now I'm in too deep apparently 😆.
Will it cause problems with the merge? Otherwise, I'll make a new branch and corresponding PR...

@jorenham
Member

Hmm yea, those commits don't look right. Did you by any chance rebase and then git push instead of git push -f?

I think you nailed it, but now I'm in too deep apparently 😆.
Will it cause problems with the merge? Otherwise, I'll make a new branch and corresponding PR...

Haha no worries, I'll squash merge them

@jorenham jorenham self-requested a review August 2, 2025 15:56
@jorenham
Member

jorenham commented Aug 2, 2025

Hmm, I just compared the timing results of uv run basedpyright --stats --verbose scipy-stubs/sparse, and this change increases the analysis time of scipy-stubs/sparse/_construct.pyi from 1038ms to 1316ms (+27%). Similarly, it increases the analysis time of tests/sparse/test_construct.pyi from 847ms to 964ms (+14%). There's little variation when repeating the measurements (±25ms at most), so this looks like a genuine performance regression rather than measurement noise.

I don't think the test slowdown is any reason for concern, especially considering the tests that were added. But the +300ms for _construct.pyi itself can be pretty annoying for DX: it sends my laptop fan into helicopter mode for a couple of seconds each time I edit _construct.pyi with the basedpyright extension enabled, which wasn't the case before.

For context, these are the top 10 slowest analysis times per file in scipy-stubs:

1314ms: scipy-stubs/sparse/_construct.pyi
 919ms: scipy-stubs/signal/_signaltools.pyi
 460ms: scipy-stubs/linalg/_basic.pyi
 279ms: scipy-stubs/sparse/_base.pyi
 278ms: scipy-stubs/special/_ufuncs.pyi
 252ms: scipy-stubs/stats/_distribution_infrastructure.pyi
 227ms: scipy-stubs/sparse/linalg/_dsolve/linsolve.pyi
 221ms: scipy-stubs/sparse/_dok.pyi
 197ms: scipy-stubs/stats/_entropy.pyi
 190ms: scipy-stubs/ndimage/_filters.pyi
 190ms: scipy-stubs/special/_logsumexp.pyi

So even before this change, _construct.pyi was already the bottleneck, and this increases its bottlenecknessitude by +25% or so.

I'm also worried that this will affect scipy-stubs users who use scipy.sparse a lot. These kinds of things tend to snowball into additional seconds or minutes of CI runtime, and that can be pretty costly (and very annoying in general).


I'll see if I can figure out an easy way to improve the analysis time. Sometimes it's as simple as tweaking a single type alias or swapping some overloads, so let's hope that's also the case here. I suspect it has to do with the overload-overlap detection algorithm, which has exponential time complexity in the number of overloads and the number of protocol members of the input types. So it might help to replace a couple of onp.CanArrayND with onp.CanArray. But to be honest, optimizing for type-checker performance isn't something I have much experience with, so I might very well be wrong here 🤷🏻.

Anyway, it's just something to keep in mind.


It would be pretty cool to have a script that warns about performance regressions in CI 🤔. I guess I'll open an issue for it.

@jorenham
Member

jorenham commented Aug 2, 2025

Ok, so there are two easy ways to bring down the analysis time a bit:

  1. Set _ToArray1D2D: TypeAlias = onp.CanArray[tuple[int] | tuple[int, int], np.dtype[_SCT]] | Seq[_SCT | Seq[_SCT]], which shaves off ~170 ms or so.
  2. Create a new _ToDType: TypeAlias = type[complex | _Numeric] | np.dtype[_Numeric] | str, and replace all npt.DTypeLike with _ToDType, for an improvement of ~120 ms.

Since these changes aren't directly related to this PR, I first thought I'd create a separate PR for this. But fix 2. in particular would've caused nasty merge conflicts. And even though it's not very clean, it's fine by me to put these performance tweaks in this PR. Those two should be enough to counteract the +300 ms :)
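Spelled out, those two tweaks look roughly like this. The helper names _SCT, _Numeric, and Seq live elsewhere in _construct.pyi, so plausible placeholder definitions are included here:

```python
from collections.abc import Sequence as Seq
from typing import Any, TypeAlias, TypeVar

import numpy as np
import optype.numpy as onp

_SCT = TypeVar("_SCT", bound=np.generic)          # placeholder bound
_Numeric: TypeAlias = np.number[Any] | np.bool_   # placeholder definition

# 1. array-likes with a 1-D or 2-D shape, or a (nested) sequence of scalars
_ToArray1D2D: TypeAlias = (
    onp.CanArray[tuple[int] | tuple[int, int], np.dtype[_SCT]] | Seq[_SCT | Seq[_SCT]]
)

# 2. a narrower replacement for npt.DTypeLike
_ToDType: TypeAlias = type[complex | _Numeric] | np.dtype[_Numeric] | str
```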

@JulVandenBroeck
Contributor Author

Thanks for the detailed analysis!

It would be pretty cool to have a script that warns about performance regressions in CI 🤔. I guess I'll open an issue for it.

Indeed; I didn't notice this slowdown, so this warning would be quite nice!

Since these changes aren't directly related to this PR, I first thought I'd create a separate PR for this. But fix 2. in particular would've caused nasty merge conflicts. And even though it's not very clean, it's fine by me to put these performance tweaks in this PR. Those two should be enough to counteract the +300 ms :)

Sounds good! I'll start implementing these suggestions soon :)

@jorenham jorenham added this to the 1.16.1.1 milestone Aug 3, 2025
@jorenham jorenham merged commit 5b09c4c into scipy:master Aug 3, 2025
18 checks passed
@jorenham
Member

jorenham commented Aug 3, 2025

Ok great; thanks Jul!

@JulVandenBroeck JulVandenBroeck deleted the improved-block-functions branch August 6, 2025 09:09