-
Notifications
You must be signed in to change notification settings - Fork 208
[ENH] Use np.argpartition
for efficient top-k selection instead of np.argsort
#2805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for contributing to
|
np.argpartition
for efficient top-k selection instead of np.argsort
np.argpartition
for efficient top-k selection instead of np.argsort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test are failing due to an issue with the implementation, currently, sorted_index
is only keeping the first k
indexes of dist_profile
with the smallest distance. When not allowing trivial matches, it is possible that the top-k
matches are not included in the first k
indexes.
The original issue mention "adjust the algorithms, notably for the successive call to argpartition in cases where trivial matches should be avoided."
The idea is to try to find these k
best match in the first argpartition, and look into another k
argpartition, until you find k
matches or checked all values of dist profile
.
Am I clear or is this a bit confusion ?
Sorry for the earlier misunderstanding. Our main goal is to find |
No problem, this is quite a challenging topic. Yes I think you got it. We want to output |
Thanks for the changes, I'll run a benchmark to compare performance before making the decision, but it looks OK at first glance. |
Bit busy until 15th of june, i'll take a look at it after, thx 🙏 |
Okay, no problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally had a change to benchmark it, I confirm that this reduces runtime for large inputs, and the difference is negligible for small ones. Great job, thanks again !
Reference Issues/PRs
Fixes: #2713
What does this implement/fix? Explain your changes.
This PR replaces
np.argsort
withnp.argpartition
in cases where only the top-k smallest elements are needed, improving performance on large arrays.Does your contribution introduce a new dependency? If yes, which one?
Any other comments?
PR checklist
For all contributions
For new estimators and functions
__maintainer__
at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.For developers with write access