Add optimized path for intermediate values aggregator #131390

dnhatn · 2025-07-16T20:23:11Z

Similar to #127849, this change adds an optimized path to the VALUES aggregator that takes advantage of ordinal blocks in intermediate input pages. Below are the micro-benchmark results.

Before:

// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt       Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2       0.382          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     112.293          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  113182.908          ms/op

After:

// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt      Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2      0.378          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     34.410          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  64654.830          ms/op

1K groups: 112.3 ms -> 34.4 ms
1M groups: 113.2 s -> 64.7 s
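Expressed as speedups (before / after, both in ms/op from the two tables above), the gain grows from negligible with a single group to roughly 3x at 1K groups:

```latex
\[
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}:\qquad
\frac{0.382}{0.378} \approx 1.01 \;(1\ \text{group}),\qquad
\frac{112.293}{34.410} \approx 3.26 \;(10^{3}\ \text{groups}),\qquad
\frac{113182.908}{64654.830} \approx 1.75 \;(10^{6}\ \text{groups})
\]
```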

More to come with #130510

Relates #127849
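The core idea can be sketched without the compute engine's types. This is a minimal sketch under hypothetical assumptions, not the code in this PR: `String` stands in for `BytesRef`, each intermediate position carries exactly one value (real VALUES intermediate input can be multivalued), and JDK collections stand in for the engine's hash tables. The generic path hashes the value bytes at every position; the ordinal path hashes each dictionary entry once and reuses the resulting ID for every position that references it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified VALUES grouping state. Not the Elasticsearch code:
 * String stands in for BytesRef and JDK collections stand in for the engine's
 * hash tables. Each input position carries exactly one value.
 */
final class ValuesOrdinalSketch {
    private final Map<String, Integer> valueIds = new HashMap<>(); // value bytes -> value id
    private final List<String> values = new ArrayList<>();         // value id -> value bytes
    private final Set<Long> pairs = new LinkedHashSet<>();          // (groupId, valueId) packed into a long
    private int hashCalls = 0;                                      // number of value-bytes hashes performed

    /** Hashes the value bytes and returns a stable id for them. */
    private int intern(String value) {
        hashCalls++;
        Integer id = valueIds.get(value);
        if (id == null) {
            id = values.size();
            valueIds.put(value, id);
            values.add(value);
        }
        return id;
    }

    private void addPair(int groupId, int valueId) {
        pairs.add(((long) groupId << 32) | (valueId & 0xFFFFFFFFL));
    }

    /** Generic path: hash the bytes at every position. */
    void addPerPosition(int[] groupIds, int[] ordinals, String[] dict) {
        for (int p = 0; p < groupIds.length; p++) {
            addPair(groupIds[p], intern(dict[ordinals[p]]));
        }
    }

    /** Ordinal path: hash each dictionary entry once, then reuse its id per position. */
    void addWithOrdinals(int[] groupIds, int[] ordinals, String[] dict) {
        int[] dictToValueId = new int[dict.length];
        for (int d = 0; d < dict.length; d++) {
            dictToValueId[d] = intern(dict[d]);
        }
        for (int p = 0; p < groupIds.length; p++) {
            addPair(groupIds[p], dictToValueId[ordinals[p]]);
        }
    }

    int hashCalls() {
        return hashCalls;
    }

    /** Decodes the collected pairs as "group:value" strings for comparison. */
    Set<String> decodedPairs() {
        Set<String> out = new LinkedHashSet<>();
        for (long packed : pairs) {
            out.add((int) (packed >>> 32) + ":" + values.get((int) packed));
        }
        return out;
    }
}
```

For example, with 6 positions whose ordinals point into a 2-entry dictionary, both paths collect the same distinct (group, value) pairs, but the generic path hashes value bytes 6 times while the ordinal path hashes them only twice.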

elasticsearchmachine · 2025-07-17T05:01:46Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine · 2025-07-17T05:34:18Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

...l/compute/gen/src/main/java/org/elasticsearch/compute/gen/GroupingAggregatorImplementer.java

nik9000

I just scanned it, but I approve of the general approach of letting aggs optimize their intermediate join and think VALUES is the right place to do it. I'm not sure if you did it right, but I think @idegtiarenko is checking this more closely.

ivancea

LGTM

...l/compute/gen/src/main/java/org/elasticsearch/compute/gen/GroupingAggregatorImplementer.java

ivancea · 2025-07-21T12:03:23Z

...l/compute/src/main/java/org/elasticsearch/compute/aggregation/ValuesBytesRefAggregators.java

+                ordinals = asOrdinals.getOrdinalsBlock();
+            }
+        }
+        if (dict != null && dict.getPositionCount() < groupIds.getPositionCount()) {


Should this use OrdinalBytesRefBlock.isDense(), or is the logic unrelated?
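One plausible reading of the guard `dict.getPositionCount() < groupIds.getPositionCount()` in the snippet above is a cost check: the ordinal path hashes every dictionary entry once (used or not), while the generic path hashes once per position, so the ordinal path only pays off when the dictionary is smaller. The sketch below models just that check; the class and method names are hypothetical, and it does not reproduce `OrdinalBytesRefBlock.isDense()`, whose definition is not shown here.

```java
/**
 * Hypothetical cost model for the guard
 * "dict.getPositionCount() < groupIds.getPositionCount()".
 * Counts how many value byte sequences each strategy would hash.
 */
final class OrdinalGuardSketch {
    /** Generic path: one hash per position, however often values repeat. */
    static int perPositionHashes(int positionCount) {
        return positionCount;
    }

    /** Ordinal path: one hash per dictionary entry, whether or not it is referenced. */
    static int ordinalPathHashes(int dictionarySize) {
        return dictionarySize;
    }

    /** Take the ordinal path only when it hashes strictly fewer values. */
    static boolean useOrdinalPath(int dictionarySize, int positionCount) {
        return ordinalPathHashes(dictionarySize) < perPositionHashes(positionCount);
    }
}
```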

dnhatn · 2025-07-21T19:13:50Z

Thanks friends!

There are two bugs introduced in #130510 and #131390 that affect the VALUES aggregator. The randomized tests did not cover these edge cases:

1. When reading values from the firstValues array, the check should be firstValues.size() <= group instead of firstValues.size() < group. Triggering this case requires injecting nulls together with repeated values (to simulate ordinals).
2. We incorrectly added positionOffset when reading the group ID. Triggering this case requires generating more groups so that the input is chunked.

Relates #130510
Relates #131390
Closes #131878
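A minimal sketch of the two corrected checks, under hypothetical assumptions: firstValues is modeled as an int array whose capacity may exceed its logical size (so slot `size` can hold stale data instead of throwing), and a chunk's group IDs are indexed from 0 while its values are read from the full page at `groupPosition + positionOffset`. The class and method names are illustrative only; this is not the ValuesBytesRefAggregator code.

```java
/**
 * Hypothetical illustration of the two fixes; not the actual
 * ValuesBytesRefAggregator code.
 */
final class ValuesFixSketch {
    static final int NO_VALUE = -1;

    /**
     * Returns the first value recorded for a group, or NO_VALUE if none.
     * Only slots [0, size) are valid; the backing array may be larger and
     * its extra slots may hold stale data.
     */
    static int firstValue(int[] firstValues, int size, int group) {
        // Fixed check: group == size is out of range too. The buggy
        // "size < group" would fall through and read a stale slot.
        if (size <= group) {
            return NO_VALUE;
        }
        return firstValues[group];
    }

    /**
     * Reads one chunk of (groupId, value) pairs. Group IDs are local to the
     * chunk, while values are read from the full page, so positionOffset is
     * applied only to the page position.
     */
    static int[][] readChunk(int positionOffset, int[] chunkGroupIds, int[] pageValues) {
        int[][] pairs = new int[chunkGroupIds.length][];
        for (int groupPosition = 0; groupPosition < chunkGroupIds.length; groupPosition++) {
            int groupId = chunkGroupIds[groupPosition]; // fixed: no positionOffset here
            int value = pageValues[groupPosition + positionOffset];
            pairs[groupPosition] = new int[] {groupId, value};
        }
        return pairs;
    }
}
```

With a logical size of 2, asking for group 2 returns NO_VALUE instead of whatever stale data sits in slot 2, and a chunk starting at page position 2 pairs its first group ID with page value 2 rather than shifting the group lookup.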

elasticsearchmachine added the v9.2.0 label Jul 16, 2025

dnhatn force-pushed the values-partial-input branch from d336a28 to f854f10 July 17, 2025 00:43

dnhatn changed the title from "Add prepareProcessIntermediateInputPage" to "Add optimized path for intermediate values aggregator" Jul 17, 2025

dnhatn force-pushed the values-partial-input branch from f854f10 to 42b033f July 17, 2025 03:14

dnhatn added :Analytics/ES|QL AKA ESQL >enhancement labels Jul 17, 2025

dnhatn added 2 commits July 16, 2025 22:02

Add optimized path for intermediate values aggregator (efd20d0)

Update docs/changelog/131390.yaml (747d56e)

dnhatn force-pushed the values-partial-input branch from 3f51173 to 747d56e July 17, 2025 05:03

extra check (a25e26e)

dnhatn requested review from nik9000 and ivancea July 17, 2025 05:33

dnhatn marked this pull request as ready for review July 17, 2025 05:33

elasticsearchmachine added the Team:Analytics label Jul 17, 2025

dnhatn added 3 commits July 16, 2025 22:37

Merge remote-tracking branch 'elastic/main' into values-partial-input (30b0455)

fix dense (87bc050)

Merge remote-tracking branch 'elastic/main' into values-partial-input (90f892c)

dnhatn requested a review from idegtiarenko July 18, 2025 06:46

idegtiarenko reviewed Jul 18, 2025

...l/compute/gen/src/main/java/org/elasticsearch/compute/gen/GroupingAggregatorImplementer.java (outdated, resolved)

idegtiarenko reviewed Jul 18, 2025

...l/compute/gen/src/main/java/org/elasticsearch/compute/gen/GroupingAggregatorImplementer.java (resolved)

nik9000 approved these changes Jul 18, 2025

ivancea approved these changes Jul 21, 2025

dnhatn added 4 commits July 21, 2025 09:17

format (8df8fb8)

streams (faf0252)

Merge remote-tracking branch 'elastic/main' into values-partial-input (5c4d0a8)

fix after merges (6845928)

dnhatn merged commit 2564379 into elastic:main Jul 21, 2025
33 checks passed

dnhatn deleted the values-partial-input branch July 21, 2025 19:14

dnhatn mentioned this pull request Jul 28, 2025

Fix off by one in ValuesBytesRefAggregator #132032

Merged
