-
Notifications
You must be signed in to change notification settings - Fork 2k
[BUG] reduced performance on GlobalOrdinalsStringTermsAggregator
#17979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
So @msfroh @sandeshkr419 am I completely off here? |
@bruce-hong-glean specifically what regressions are you seeing? Are your queries slower? |
@harshavamsi I think I understand the problem pointed out now. Basically when trying to pre-compute using
Now, for cases when we are not successfully pre-computing (return
So I think re-computing |
I'm also curious about the regression, if we can reproduce, maybe we will find some issue for loading global ordinal. Theoretically I would expect it to be built only once for a shard, unless there are updates and refreshes coming in. |
I raised a fix for this. Something like https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/noaa/operations/default.json#L253 Once the CI finishes, will try running benchmark comparisons on it. |
I was looking into benchmark results of 2.18 vs 2.19 since the refactoring was done in 2.19. ![]() ![]() Orange/brown line is 2.18 and trailing green line is 2.19 - I didn't see any regression in the 2 queries where the sub-aggregation is present which should fail pre-compute and use regular LeafCollector inplementation:
Let me know if you have a better way to verify the fix. Else will try to do some benchmarking setup locally. |
Unfortunately I don't have a better way to verify immediately either. We've seen this happen on a subset of customer deployments, so I could build an image and post screenshots of the changes in profiles and latency stats here in ~2 weeks (following our release schedule) if that helps. |
Describe the bug
After refactoring aggregation precomputations to a unified API, there are now (almost) consecutive calls to
tryPrecomputeAggregationForLeaf
andgetLeafCollector
.When running aggregations with
GlobalOrdinalsStringTermsAggregator
, this callsvaluesSource.globalOrdinalsValues(ctx);
twice, which we've seen to be expensive on several clusters.My question is: can this value be reused across aggregation computations? As far as I can tell, it's not modified between the two calls, but if it's dangerous to do so would it make sense to at least move
inside the block:
to minimize recomputations?
Related component
Search:Aggregations
To Reproduce
Execute a an aggregations query that uses
GlobalOrdinalsStringTermsAggregator
with expensiveglobalOrdinalsValues
computation.Expected behavior
Ideally only compute
globalOrdinalsValues
when it's used and at most once.Additional Details
Screenshots

Host/Environment (please complete the following information):
The text was updated successfully, but these errors were encountered: