Skip to content

Add fast path for single value in VALUES aggregator #130510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jul 2, 2025

This change introduces a fast path for the VALUES aggregator in the single-value case. For the first value seen in each group, we add it the new big array without touching the hash. For subsequent values, if they are the same as the current value, we skip them; if they differ, we trigger the slow path and add them to the hash. This optimization speeds up VALUES when the number of groups is large and most groups have only one value.

Before:

Benchmark                      (dataType)  (groups)  Mode  Cnt      Score       Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    3    177.756 ±     2.111  ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    3    126.174 ±     0.431  ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    3  66920.144 ± 53588.490  ms/op

After:

Benchmark                      (dataType)  (groups)  Mode  Cnt      Score      Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    3    180.269 ±    4.019  ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    3    107.051 ±    3.149  ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    3  26277.863 ± 7214.319  ms/op

@dnhatn dnhatn force-pushed the single-value-path-for-values branch 2 times, most recently from 61f3ad6 to c1d8eb7 Compare July 3, 2025 04:21
@dnhatn dnhatn force-pushed the single-value-path-for-values branch from c1d8eb7 to 5860c6e Compare July 3, 2025 04:29
@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn requested a review from nik9000 July 3, 2025 04:53
@dnhatn dnhatn marked this pull request as ready for review July 3, 2025 04:53
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of collecting the first values into only the firstValues array. So we only go to the hash if there's more than one value. I'll have another look in the morning. I think it's correct, but I'll need another scan to make sure.

@@ -90,6 +91,14 @@ public static void combineIntermediate(GroupingState state, int groupId, BytesRe

public static void combineStates(GroupingState current, int currentGroupId, GroupingState state, int statePosition) {
BytesRef scratch = new BytesRef();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch can go below the quick bail outs.

for (int s = 0; s < selected.getPositionCount(); s++) {
int group = selected.getInt(s);
int count = -selectedCounts[group];
selectedCounts[group] = total;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the counts any more - it's the counts in values, which is one less than the actual number of values. That's tricky. I'm not entirely sure what's correct here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants