Distinct on a Value Type doesn't seem to work in all cases #389

@CyberBotX

I am using the FatValueType from NetFabric's own LINQ benchmarks project (https://github.com/NetFabric/LinqBenchmarks), except I've made it a readonly record struct instead of just a struct, so it basically becomes the following:

public readonly record struct FatValueType
{
	public readonly int Value0 { get; }
	public readonly long Value1 { get; }
	public readonly long Value2 { get; }
	public readonly long Value3 { get; }
	public readonly long Value4 { get; }
	public readonly long Value5 { get; }
	public readonly long Value6 { get; }
	public readonly long Value7 { get; }

	public FatValueType(int value)
	{
		this.Value0 = value;
		this.Value1 = value;
		this.Value2 = value;
		this.Value3 = value;
		this.Value4 = value;
		this.Value5 = value;
		this.Value6 = value;
		this.Value7 = value;
	}

	public readonly bool IsEven() => (this.Value0 & 0x01) == 0;

	public static FatValueType operator +(in FatValueType left, in FatValueType right) => new(left.Value0 + right.Value0);

	public static FatValueType operator *(in FatValueType left, int right) => new(left.Value0 * right);
}

I'm using this in my own set of LINQ benchmarks. Even though EqualityComparer<FatValueType>.Default.GetHashCode() returns the same value for two identical instances of this value type, calling Distinct on an array that contains 4 copies of each of 100 distinct values makes Hyperlinq return 162 values instead of 100. The first 100 values are the first 100 from the original source; the following 62 are the ones where Value0 is between 1 and 63, except for 32.
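For reference, the baseline can be checked with the BCL alone. The sketch below is only an illustration: `SmallValueType` is a hypothetical two-field stand-in for FatValueType, `EqualityBaseline` is a made-up helper name, and the source array assumes the 100 values are repeated four times in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed stand-in for FatValueType: two members instead of eight.
public readonly record struct SmallValueType(int Value0, long Value1)
{
    public SmallValueType(int value) : this(value, value) { }
}

public static class EqualityBaseline
{
    // True when two separately constructed instances compare equal
    // and produce the same hash code through the default comparer.
    public static bool DefaultComparerAgrees(int value)
    {
        var comparer = EqualityComparer<SmallValueType>.Default;
        var a = new SmallValueType(value);
        var b = new SmallValueType(value);
        return comparer.Equals(a, b) && comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }

    // Builds 4 copies of each of 100 values (0..99 repeated four times)
    // and counts what System.Linq's Distinct keeps.
    public static int DistinctCountWithSystemLinq()
    {
        var source = Enumerable.Repeat(Enumerable.Range(0, 100), 4)
            .SelectMany(range => range)
            .Select(i => new SmallValueType(i))
            .ToArray();
        return source.Distinct().Count();
    }
}
```

With System.Linq this should keep 100 values, which is the count expected from Hyperlinq as well.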

Tracing the code in a debugger suggests that Hyperlinq's Set<T> implementation might be at fault. I am not sure why it fails in this case, but after the first 100 items have been added, the 101st item (where Value0 is 0) is correctly found in the set, while the 102nd item (where Value0 is 1) is not.

The simplest way I have found to reproduce the problem, without knowing how to fix it, is the following:

Enumerable.Range(0, 20).Select(i => new FatValueType(i)).Concat(Enumerable.Range(1, 10).Select(i => new FatValueType(i))).AsValueEnumerable().Distinct()

This should return a set of 20 values (0 through 19), but it instead returns all 30 values of the original enumerable.
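As a point of comparison, the same sequence can be run through System.Linq's Distinct and through HashSet<T>. This is a sketch using only BCL calls; `ReproValue` is a hypothetical trimmed stand-in for FatValueType and `ReproBaseline` is a made-up helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed stand-in for FatValueType.
public readonly record struct ReproValue(int Value0, long Value1);

public static class ReproBaseline
{
    // Same shape as the repro: values 0..19 followed by values 1..10 (30 items).
    public static IEnumerable<ReproValue> BuildSource() =>
        Enumerable.Range(0, 20).Select(i => new ReproValue(i, i))
            .Concat(Enumerable.Range(1, 10).Select(i => new ReproValue(i, i)));

    // Count of distinct items according to System.Linq.
    public static int SystemLinqDistinctCount() => BuildSource().Distinct().Count();

    // Count of distinct items according to the BCL HashSet<T>.
    public static int HashSetCount() => new HashSet<ReproValue>(BuildSource()).Count;
}
```

Both counts should be 20, which is the result the Hyperlinq expression is expected to match.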

This problem does not seem to affect primitive types such as int: if the Select calls are removed from the above, it returns only 20 values. It does seem to affect reference types too, such as the FatReferenceType that is also in NetFabric's LINQ benchmarks project. I wrote an IEqualityComparer<T> class for FatReferenceType that compares its field1 value, and it returns true for the 1st and 21st values in the above (with FatValueType replaced by FatReferenceType and an instance of the comparer passed to Distinct). Even so, the above returns 30 values instead of 20.
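A comparer of the kind described might look roughly like the sketch below. The names are assumptions: `SampleReferenceType` is a hypothetical one-member stand-in for FatReferenceType, and `Value0` is an illustrative member name rather than the actual field from the benchmarks project.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FatReferenceType; only one member is modelled.
public sealed class SampleReferenceType
{
    public int Value0 { get; }
    public SampleReferenceType(int value) => Value0 = value;
}

// Treats two instances as equal when their Value0 members match.
public sealed class Value0Comparer : IEqualityComparer<SampleReferenceType>
{
    public bool Equals(SampleReferenceType? x, SampleReferenceType? y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.Value0 == y.Value0);

    // Must agree with Equals: equal Value0 implies equal hash code.
    public int GetHashCode(SampleReferenceType obj) => obj.Value0.GetHashCode();
}

public static class ReferenceTypeBaseline
{
    // Values 0..19 followed by 1..10, deduplicated by System.Linq with the comparer.
    public static int SystemLinqDistinctCount() =>
        Enumerable.Range(0, 20).Select(i => new SampleReferenceType(i))
            .Concat(Enumerable.Range(1, 10).Select(i => new SampleReferenceType(i)))
            .Distinct(new Value0Comparer())
            .Count();
}
```

System.Linq's Distinct should keep 20 items with this comparer, so a count of 30 from another implementation would point at its set logic and not at the comparer.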

If I had to venture a guess as to why it is failing, it could be because of how Set<T> in Hyperlinq handles resizing itself, but I can't say for sure.
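To illustrate the kind of step where a resize bug could hide, here is a minimal chained hash set. It is not Hyperlinq's Set<T>, and `TinySet<T>` and all of its members are made up for this sketch. The point is only that growing the table requires rebuilding every bucket chain against the new size, and that the bucket index must be recomputed after the resize.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a tiny chained hash set, not Hyperlinq's implementation.
public sealed class TinySet<T>
{
    private struct Slot { public int HashCode; public int Next; public T Value; }

    private int[] buckets = new int[4];  // bucket -> (slot index + 1); 0 means empty
    private Slot[] slots = new Slot[4];
    private int count;
    private readonly IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

    public int Count => count;

    // Returns true if the value was added, false if it was already present.
    public bool Add(T value)
    {
        int hash = value is null ? 0 : comparer.GetHashCode(value) & 0x7FFFFFFF;

        // Walk the chain of the bucket this hash maps to.
        for (int i = buckets[hash % buckets.Length] - 1; i >= 0; i = slots[i].Next)
        {
            if (slots[i].HashCode == hash && comparer.Equals(slots[i].Value, value))
                return false;
        }

        if (count == slots.Length)
            Resize();

        // Recomputed on purpose: Resize may have changed buckets.Length.
        int bucket = hash % buckets.Length;
        slots[count] = new Slot { HashCode = hash, Value = value, Next = buckets[bucket] - 1 };
        buckets[bucket] = ++count;
        return true;
    }

    private void Resize()
    {
        int newSize = checked(count * 2 + 1);
        var newBuckets = new int[newSize];
        var newSlots = new Slot[newSize];
        Array.Copy(slots, newSlots, count);

        // Every existing slot is relinked using the new table size.
        for (int i = 0; i < count; i++)
        {
            int bucket = newSlots[i].HashCode % newSize;
            newSlots[i].Next = newBuckets[bucket] - 1;
            newBuckets[bucket] = i + 1;
        }

        buckets = newBuckets;
        slots = newSlots;
    }
}
```

If the relinking loop or the recomputed bucket index were skipped, items added before a resize could become unreachable by lookup, and later duplicates would be accepted as new. Whether that is what happens in Hyperlinq is not established here.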
