
[BUG] reduced performance on GlobalOrdinalsStringTermsAggregator #17979

Open

bruce-hong-glean opened this issue Apr 17, 2025 · 7 comments · May be fixed by #18059
Labels: bug, Search:Aggregations

Comments

@bruce-hong-glean

Describe the bug

After refactoring aggregation precomputations to a unified API, there are now (almost) consecutive calls to tryPrecomputeAggregationForLeaf and getLeafCollector.

When running aggregations with GlobalOrdinalsStringTermsAggregator, this calls valuesSource.globalOrdinalsValues(ctx) twice per leaf, which we've seen to be expensive on several clusters.

My question is: can this value be reused across the two calls? As far as I can tell, it's not modified between them, but if reuse is dangerous, would it make sense to at least move

SortedSetDocValues globalOrds = valuesSource.globalOrdinalsValues(ctx);

inside the block:

if (collectionStrategy instanceof DenseGlobalOrds
            && this.resultStrategy instanceof StandardTermsResults
            && subAggregators.length == 0)

to minimize recomputations?
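
For concreteness, here is a minimal sketch of that second option, assuming the 2.19 method shape quoted elsewhere in this thread (illustrative only, not a tested patch):

    @Override
    protected boolean tryPrecomputeAggregationForLeaf(LeafReaderContext ctx) throws IOException {
        // Only pay for global ordinals once we know the cheap path applies.
        if (collectionStrategy instanceof DenseGlobalOrds
            && this.resultStrategy instanceof StandardTermsResults
            && subAggregators.length == 0) {
            SortedSetDocValues globalOrds = valuesSource.globalOrdinalsValues(ctx);
            // ... attempt the pre-computation; return true on success ...
        }
        return false;
    }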

Related component

Search:Aggregations

To Reproduce

Execute an aggregation query that uses GlobalOrdinalsStringTermsAggregator with an expensive globalOrdinalsValues computation.

Expected behavior

Ideally, compute globalOrdinalsValues only when it is actually used, and at most once per leaf.

Additional Details

Screenshots

(profiler stack-trace screenshot attached in the original issue)

Host/Environment (please complete the following information):

  • OS: Linux
  • Version: 2.19.1
@harshavamsi
Contributor

So tryPrecomputeAggregationForLeaf runs per leaf segment and reports whether that leaf can be computed without running an expensive collection. The getLeafCollector of GlobalOrdinalsStringTermsAggregator only runs when we cannot pre-compute. So I think what the stack trace is pointing to is two leaves, where one is pre-computed and the other is not. This is intended behavior, unless I'm mistaken.
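
Roughly, the per-leaf flow I mean is the following (a simplified sketch; the real driver lives in the aggregator framework, and everything outside the two method names is illustrative):

    // Illustrative driver loop, not actual OpenSearch code.
    for (LeafReaderContext leaf : leaves) {
        if (aggregator.tryPrecomputeAggregationForLeaf(leaf)) {
            continue; // this leaf was answered cheaply, skip doc-by-doc collection
        }
        // Fall back to the expensive path for this leaf.
        LeafBucketCollector collector = aggregator.getLeafCollector(leaf, sub);
        // ... collect matching documents on this leaf via the collector ...
    }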

@msfroh @sandeshkr419 am I completely off here?

@harshavamsi
Contributor

@bruce-hong-glean specifically what regressions are you seeing? Are your queries slower?

@sandeshkr419
Contributor

sandeshkr419 commented Apr 17, 2025

@harshavamsi I think I understand the problem pointed out now.
Thanks @bruce-hong-glean for reporting it.

Basically, when trying to pre-compute using tryPrecomputeAggregationForLeaf(...), we compute globalOrds once:

    @Override
    protected boolean tryPrecomputeAggregationForLeaf(LeafReaderContext ctx) throws IOException {
        SortedSetDocValues globalOrds = valuesSource.globalOrdinalsValues(ctx);
        // ...
        return false;
    }

Now, in cases where pre-computation does not succeed (return false), we fall back to the default implementation of getLeafCollector(...), where we compute globalOrds again:

    @Override
    public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException {
        SortedSetDocValues globalOrds = valuesSource.globalOrdinalsValues(ctx);
        collectionStrategy.globalOrdsReady(globalOrds);
        // ...
    }

So re-computing globalOrds is what we can avoid, since it is expensive. Before #16733, everything resided in getLeafCollector(), so we computed globalOrds only once.
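
One possible shape for a fix (a hypothetical sketch only; the field and helper names are illustrative, and the actual change may differ) is to memoize the per-leaf lookup so both methods share one computation:

    // Hypothetical memoization: share a single globalOrdinalsValues(ctx)
    // lookup between tryPrecomputeAggregationForLeaf and getLeafCollector.
    private LeafReaderContext lastCtx;
    private SortedSetDocValues lastGlobalOrds;

    private SortedSetDocValues globalOrds(LeafReaderContext ctx) throws IOException {
        if (ctx != lastCtx) { // reference equality: one cached lookup per leaf
            lastGlobalOrds = valuesSource.globalOrdinalsValues(ctx);
            lastCtx = ctx;
        }
        return lastGlobalOrds;
    }

One caveat: DocValues are forward-only iterators, so reuse is only safe as long as the pre-compute attempt does not advance the same instance that the fallback collector will iterate.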

@bowenlan-amzn
Member

I'm also curious about the regression; if we can reproduce it, maybe we will find an issue with loading global ordinals. Theoretically I would expect them to be built only once per shard, unless updates and refreshes are coming in.

sandeshkr419 linked a pull request Apr 23, 2025 that will close this issue
@sandeshkr419
Contributor

@bruce-hong-glean

I raised a fix for this.
For testing, a non-match-all query with a keyword terms aggregation should see slight improvements.

Something like https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/noaa/operations/default.json#L253. Once the CI finishes, I will try running benchmark comparisons on it.

@sandeshkr419
Contributor

sandeshkr419 commented Apr 23, 2025

@bruce-hong-glean

I was looking into benchmark results of 2.18 vs 2.19 since the refactoring was done in 2.19.

(two benchmark comparison graphs attached in the original comment)

The orange/brown line is 2.18 and the trailing green line is 2.19. I didn't see any regression in the two queries where a sub-aggregation is present, which should fail pre-compute and use the regular LeafCollector implementation:

    {
      "name": "keyword-terms-min",
      "operation-type": "search",
      "body": {
        "size": 0,
        "aggs": {
          "station": {
            "terms": {
              "field": "station.id",
              "size": 500
            },
            "aggs": {
              "tmin": {
                "min": {
                  "field": "TMIN"
                }
              }
            }
          }
        }
      }
    },
    {
      "name": "keyword-terms-low-cardinality-min",
      "operation-type": "search",
      "body": {
        "size": 0,
        "aggs": {
          "country": {
            "terms": {
              "field": "station.country",
              "size": 200
            },
            "aggs": {
              "tmin": {
                "min": {
                  "field": "TMIN"
                }
              }
            }
          }
        }
      }
    },

Let me know if you have a better way to verify the fix. Otherwise, I will try to set up some benchmarking locally.

@bruce-hong-glean
Author

bruce-hong-glean commented Apr 23, 2025

Unfortunately I don't have a better way to verify immediately either. We've seen this happen on a subset of customer deployments, so I could build an image and post screenshots of the changes in profiles and latency stats here in ~2 weeks (following our release schedule) if that helps.
