Skip to content

Change MetricFamily dictionary key from 32 to 64 bits hash to reduce collisions #342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025

Conversation

ogxd
Copy link
Contributor

@ogxd ogxd commented Mar 18, 2025

Context

The MetricFamily uses a 32-bit hashes of the labels as keys to disambiguate series. This can be an issue as collisions are very common on such a narrow space.

According to the birthday paradox, and considering a good hash function (which HashCode.Combine is), there is a 50% chance of collisions for 77163 32-bit hashes. In our case, that translate to as many chances to have two series colliding, resulting in metrics data to be mixed up, which can be very nasty to debug.

To prove this point, here is a test that shows that collisions do happen:

[Fact]
public async Task TestCollisions()
{
    const int seriesCount = 77163;
    string output = await CollectionTestHelper.CollectAsync(factory => {
        var counter = factory.CreateCounter("test", "with help text", ("label1", "label2"));

        ulong c = 0;
        string GetNextUniqueString()
        {
            return c++.ToString();
        }

        for (int i = 0; i < seriesCount; i++)
        {
            counter.WithLabels((GetNextUniqueString(), GetNextUniqueString())).Inc(5.5);
        }
    });

    // Name + description + seriesCount samples + 1 empty line
    Assert.Equal(seriesCount + 3, output.Split('\n').Length);
}

This test does fail about 50% of the time.

Solution

A solution would be to compare labels when checking for equality. This however would require some changes due to the support of pre-.NET6 because of the missing ITuple interface, requiring more usage of compiled expression, which would complexify the code.

Instead, I propose to increase the hash width to 64-bit. This can be easily done by replacing the Int32 by a struct containing two Int32: one that remains used as a key, as we do today, but a second one to then check the equality, effectively reducing the odds of collision by a factor 77163, if I'm correct. That brings us to having to generate 5954128569 series to have a 50% chance of collisions, which to be goes from "quite likely to have collisions" to "very unlikely".

Of course, the two generated hashes need to be different, that's why I introduced a seed, which was quite an easy addition given the current code.

This PR presents this solution.

Performance

I ran the CounterUsage benchmark on my Macbook pro M1:

Method Mean Error StdDev Median Min Max Gen0 Allocated
LabelledCreation Before 32.58 ns 0.180 ns 0.150 ns 32.30 ns 32.81 ns 32.56 ns 0.0063 40 B
LabelledCreation After 54.77 ns 0.210 ns 0.175 ns 54.70 ns 54.55 ns 55.22 ns 0.0063 40 B

It seems this changes does have a small performance cost, which I think is acceptable considering the issue this addresses. If this is a concern, an improvement could be to generate a single 64-bit hash instead, but that would be require access to such function, which I don't think we have pre .NET 6 without a third party library.

@ogxd ogxd requested a review from phnx47 as a code owner March 18, 2025 21:29
@phnx47 phnx47 self-assigned this Mar 19, 2025
Copy link

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.40%. Comparing base (657c476) to head (a86aa94).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/Prometheus.Client/MetricFamily.cs 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #342      +/-   ##
==========================================
- Coverage   88.41%   88.40%   -0.01%     
==========================================
  Files          48       48              
  Lines        1752     1760       +8     
  Branches      233      234       +1     
==========================================
+ Hits         1549     1556       +7     
  Misses        154      154              
- Partials       49       50       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@phnx47
Copy link
Member

phnx47 commented Mar 19, 2025

@ogxd Thank you! I'll need some time to prepare the next v6 release.

Regarding performance, for .NET 6 or greater, can use preprocessor directives like this:

#if NET6_0_OR_GREATER
    // Code for .NET 6 or greater
#else
    // Fallback code for earlier versions
#endif

@phnx47 phnx47 merged commit 85c5d33 into prom-client-net:main Mar 19, 2025
6 checks passed
@ogxd
Copy link
Contributor Author

ogxd commented Mar 19, 2025

Thanks for the quick review 😄

Regarding performance, for .NET 6 or greater, can use preprocessor directives like this

Indeed, I'll see if we can leverage this for further optimization.

I have another (unrelated) question: I'm looking for being able to have a time to live for unused metrics. This feature is not available in this library but does exists in the other implementation.
Since the method RemoveLabelled exists and because MetricFamily is already inherently thread-safe, would you be open to such change? I'd like to submit a PR implementing this feature. I think it should be fairly easy to implement.

Also, do you have a rough ETA for the v6 release? Or even a pre-release or release candidate (rc) package?

Thank you for the work you put into this project!

@phnx47
Copy link
Member

phnx47 commented Mar 19, 2025

I'm happy to merge new PRs.

All breaking changes have already been merged into the main branch. I’ll try to finalize everything this month for v6. The latest code from the main branch can be installed via GitHub Packages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants