Add Hamerly's accelerated K-means algorithm to linfa-clustering#439

Open
mrsanor wants to merge 3 commits into rust-ml:master from mrsanor:kmeans-hamerly

@mrsanor mrsanor commented Apr 21, 2026

Implement Hamerly's triangle-inequality optimization as an alternative to Lloyd's algorithm for K-means clustering, with the goal of faster fitting. For each observation, the algorithm maintains an upper bound on the distance to its assigned centroid and a lower bound on the distance to the second-closest centroid, and skips centroid comparisons that cannot change the assignment. It yields the same results as Lloyd but with significantly fewer distance computations when clusters are well separated.
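To make the bound-based skipping concrete, here is a minimal, self-contained sketch of one Hamerly-style assignment step on 2-D points. It is not the code from this PR: the names `Point`, `Bounds`, `full_scan`, `init_bounds`, `shift_bounds` and `assign_step` are hypothetical, and the lower-bound update uses the largest centroid movement for every point, which is a slightly more conservative simplification of the published algorithm.

```rust
// Illustrative sketch only: all names here are hypothetical, not the PR's code.
type Point = [f64; 2];

fn dist(a: &Point, b: &Point) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Lloyd-style brute-force scan over all centroids.
/// Returns (closest index, closest distance, second-closest distance).
fn full_scan(x: &Point, centroids: &[Point]) -> (usize, f64, f64) {
    let (mut best, mut d1, mut d2) = (0, f64::INFINITY, f64::INFINITY);
    for (j, c) in centroids.iter().enumerate() {
        let d = dist(x, c);
        if d < d1 {
            d2 = d1;
            d1 = d;
            best = j;
        } else if d < d2 {
            d2 = d;
        }
    }
    (best, d1, d2)
}

/// Per-observation state: assignment, upper bound on the distance to the
/// assigned centroid, lower bound on the distance to any other centroid.
struct Bounds {
    assign: Vec<usize>,
    upper: Vec<f64>,
    lower: Vec<f64>,
}

/// Initialise exact bounds with one full scan per point.
fn init_bounds(points: &[Point], centroids: &[Point]) -> Bounds {
    let mut b = Bounds { assign: vec![], upper: vec![], lower: vec![] };
    for x in points {
        let (a, d1, d2) = full_scan(x, centroids);
        b.assign.push(a);
        b.upper.push(d1);
        b.lower.push(d2);
    }
    b
}

/// After centroids move from `old` to `new`, loosen the bounds so they stay
/// valid: the upper bound grows by the assigned centroid's movement, the
/// lower bound shrinks by the largest movement (conservative simplification).
fn shift_bounds(b: &mut Bounds, old: &[Point], new: &[Point]) {
    let moved: Vec<f64> = old.iter().zip(new).map(|(o, n)| dist(o, n)).collect();
    let max_move = moved.iter().cloned().fold(0.0, f64::max);
    for i in 0..b.assign.len() {
        b.upper[i] += moved[b.assign[i]];
        b.lower[i] -= max_move;
    }
}

/// One assignment step. Returns how many points needed a full scan.
fn assign_step(points: &[Point], centroids: &[Point], b: &mut Bounds) -> usize {
    // half_gap[j]: half the distance from centroid j to its nearest other centroid.
    let half_gap: Vec<f64> = centroids
        .iter()
        .enumerate()
        .map(|(j, c)| {
            centroids
                .iter()
                .enumerate()
                .filter(|(k, _)| *k != j)
                .map(|(_, o)| dist(c, o))
                .fold(f64::INFINITY, f64::min)
                / 2.0
        })
        .collect();
    let mut scans = 0;
    for (i, x) in points.iter().enumerate() {
        let threshold = half_gap[b.assign[i]].max(b.lower[i]);
        if b.upper[i] <= threshold {
            continue; // the bounds prove the assignment cannot change
        }
        // Tighten the upper bound with one exact distance, then re-test.
        b.upper[i] = dist(x, &centroids[b.assign[i]]);
        if b.upper[i] <= threshold {
            continue;
        }
        // Bounds are inconclusive: fall back to the full scan.
        let (a, d1, d2) = full_scan(x, centroids);
        b.assign[i] = a;
        b.upper[i] = d1;
        b.lower[i] = d2;
        scans += 1;
    }
    scans
}
```

In this sketch, a caller would run `init_bounds` once, then alternate a centroid update, `shift_bounds`, and `assign_step`. With well-separated clusters most points pass the first bound test, so `assign_step` performs few or no full scans while producing the same assignments as `full_scan` would.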

Key changes:

  • Add the new Hamerly K-means algorithm (it uses the same m_k-means centroid calculation as Lloyd)
  • Add KMeansAlgorithm enum (Lloyd | Hamerly) and .algorithm() builder method
  • Reject Hamerly for incremental fit_with
  • Comprehensive tests
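As a rough illustration of the API shape these changes describe, the sketch below mocks an algorithm enum, a builder method, and a check that refuses Hamerly for incremental fitting. It is a standalone mock, not the linfa-clustering types: `Params`, `check_incremental`, the error message, and the choice of Lloyd as the default are all assumptions made for the example. The rejection is presumably needed because the per-observation bounds are only meaningful when every iteration sees the same observations, which is not the case for incremental batches.

```rust
// Hypothetical stand-ins for illustration; not the linfa-clustering types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum KMeansAlgorithm {
    // Assumption: Lloyd remains the default so existing users are unaffected.
    #[default]
    Lloyd,
    Hamerly,
}

#[derive(Debug, Default)]
struct Params {
    algorithm: KMeansAlgorithm,
}

impl Params {
    /// Builder-style setter, mirroring the `.algorithm()` method named above.
    fn algorithm(mut self, algorithm: KMeansAlgorithm) -> Self {
        self.algorithm = algorithm;
        self
    }

    /// Hypothetical validation run before an incremental fit.
    fn check_incremental(&self) -> Result<(), String> {
        match self.algorithm {
            KMeansAlgorithm::Lloyd => Ok(()),
            KMeansAlgorithm::Hamerly => {
                Err("Hamerly is not supported for incremental fitting".to_string())
            }
        }
    }
}
```

With this mock, `Params::default().algorithm(KMeansAlgorithm::Hamerly).check_incremental()` returns an `Err`, while the default Lloyd configuration passes the check.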

Here are the benchmarks comparing Lloyd and Hamerly:

[Screenshot: k_means Summary, Criterion.rs (2026-04-21 at 15-26-35)]

@mrsanor mrsanor changed the title feat(linfa-clustering): Add Hamerly's accelerated K-means algorithm Add Hamerly's accelerated K-means algorithm to linfa-clustering Apr 21, 2026
relf (Member) left a comment

Thanks for this contribution, which looks good to me. Could you also rebase your PR on the current master? (Some new lints have been fixed.)

Comment thread algorithms/linfa-clustering/benches/k_means.rs Outdated
mrsanor added 2 commits May 7, 2026 10:35
Implement Hamerly's triangle-inequality optimization as an alternative to Lloyd's algorithm for K-means clustering. For each observation, the algorithm maintains upper and lower distance bounds and skips centroid comparisons that cannot change the assignment. It yields the same results as Lloyd but with significantly fewer distance computations when clusters are well separated.

Key changes:
- The new Hamerly K-means algorithm
- Add KMeansAlgorithm enum (Lloyd | Hamerly) and .algorithm() builder method
- Reject Hamerly for incremental fit_with
- Comprehensive tests

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.54%. Comparing base (8167a62) to head (617383d).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
+ Coverage   76.98%   77.54%   +0.55%     
==========================================
  Files         106      106              
  Lines        7405     7579     +174     
==========================================
+ Hits         5701     5877     +176     
+ Misses       1704     1702       -2     

mrsanor commented May 7, 2026

Here is the new report after the benchmark order change:

[Screenshot: k_means Summary, Criterion.rs (2026-05-07 at 10-56-53)]
