ESQL: Split large pages on load sometimes (#131053) #132036
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This adds support for splitting
Page
s of large values when loading from single segment, non-descending hits. This is hottest code path as it's how we load data for aggregation. So! We had to make very very very sure this doesn't slow down the fast path of loading doc values.Caveat - this only defends against loading large values via the row-by-row load mechanism that we use for stored fields and _source. That covers the most common kinds of large values - mostly
text
and geo fields. If we need to split further on docs values, we'll have to invent something for them specifically. For now, just row-by-row.This works by flipping the order in which we load row-by-row and column-at-a-time values. Previously we loaded all column-at-a-time values first because that was simpler. Then we loaded all of the row-by-row values. Now we save the column-at-a-time values and instead load row-by-row until the
Page
's estimated size is larger than a "jumbo" size which defaults to a megabyte.Once we load enough rows that we estimate the page is "jumbo", we then stop loading rows. The Page will look like this:
Then we chop the page to the last row:
Then fill in the column-at-a-time columns:
And then we return that
Page
. On the nextDriver
iteration we start from where we left off.