Commit 6f2578e

ESQL: Split large pages on load sometimes (#131053) (#132036)
This adds support for splitting `Page`s of large values when loading from single segment, non-descending hits. This is the hottest code path as it's how we load data for aggregation. So! We had to make very very very sure this doesn't slow down the fast path of loading doc values.

Caveat: this only defends against loading large values via the row-by-row load mechanism that we use for stored fields and `_source`. That covers the most common kinds of large values, mostly `text` and geo fields. If we need to split further on doc values, we'll have to invent something for them specifically. For now, just row-by-row.

This works by flipping the order in which we load row-by-row and column-at-a-time values. Previously we loaded all column-at-a-time values first because that was simpler. Then we loaded all of the row-by-row values. Now we hold the column-at-a-time values back and load row-by-row first, until the `Page`'s estimated size is larger than a "jumbo" size which defaults to a megabyte. Once we load enough rows that we estimate the page is "jumbo", we stop loading rows. The `Page` will look like this:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        | <-- after loading this row
|      |     |      |      |        |     we crossed to "jumbo" size
|      |     |      |      |        |
|      |     |      |      |        |
|      |     |      |      |        | <-- these rows are entirely empty
|      |     |      |      |        |
|      |     |      |      |        |
```

Then we chop the page to the last loaded row:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
```

Then fill in the column-at-a-time columns:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |   1 | XXXX |   11 |    1.0 |
| XXXX |   2 | XXXX |   22 |   -2.0 |
| XXXX |   3 | XXXX |   33 |    1e9 |
| XXXX |   4 | XXXX |   44 |    913 |
| XXXX |   5 | XXXX |   55 | 0.1234 |
| XXXX |   6 | XXXX |   66 | 3.1415 |
```

And then we return *that* `Page`. On the next `Driver` iteration we start from where we left off.
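The load order described above can be sketched in a few lines of plain Java. This is a minimal illustration and not the `ValuesSourceReaderOperator` code: the class `JumboPageSketch`, its methods, and the one-byte-per-character size estimate are all assumptions made for the example. It loads the row-by-row column until the estimate crosses the threshold, keeps only those rows, and then fills the column-at-a-time column for the same row range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: hypothetical names, not the real operator. */
public class JumboPageSketch {

    /**
     * Loads row-by-row values starting at {@code start} and stops right after
     * the row that pushes the estimated size past {@code jumboBytes}.
     * Returns the exclusive end row. At least one row is always loaded.
     */
    static int loadRowByRow(List<String> rowValues, int start, long jumboBytes, List<String> out) {
        long estimated = 0;
        int row = start;
        while (row < rowValues.size()) {
            String v = rowValues.get(row);
            out.add(v);
            estimated += v.length(); // crude estimate (assumption): one byte per char
            row++;
            if (estimated > jumboBytes) {
                break; // crossed to "jumbo" size: stop loading rows
            }
        }
        return row;
    }

    /** Fills the column-at-a-time values only for the rows that were kept. */
    static int[] loadColumnAtATime(int[] column, int start, int end) {
        return Arrays.copyOfRange(column, start, end);
    }

    /** Splits the input into pages and returns the row count of each page. */
    static List<Integer> pageSizes(List<String> text, int[] ints, long jumboBytes) {
        List<Integer> sizes = new ArrayList<>();
        int start = 0;
        while (start < text.size()) {
            List<String> txt = new ArrayList<>();
            int end = loadRowByRow(text, start, jumboBytes, txt);
            int[] col = loadColumnAtATime(ints, start, end);
            sizes.add(col.length); // txt.size() == col.length for every page
            start = end; // the next iteration starts where this one left off
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> text = Collections.nCopies(5, "x".repeat(400));
        int[] ints = { 1, 2, 3, 4, 5 };
        System.out.println(pageSizes(text, ints, 1000));
    }
}
```

With five 400-character values and a 1000-byte threshold, the third row crosses the threshold, so the sketch produces one page of three rows followed by one page of two rows.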
1 parent c057ec1 commit 6f2578e

48 files changed, +732 -234 lines changed

benchmarks/README.md

Lines changed: 2 additions & 3 deletions
@@ -152,11 +152,10 @@ exit
 Grab the async profiler from https://github.com/jvm-profiling-tools/async-profiler
 and run `prof async` like so:
 ```
-gradlew -p benchmarks/ run --args 'LongKeyedBucketOrdsBenchmark.multiBucket -prof "async:libPath=/home/nik9000/Downloads/async-profiler-3.0-29ee888-linux-x64/lib/libasyncProfiler.so;dir=/tmp/prof;output=flamegraph"'
+gradlew -p benchmarks/ run --args 'LongKeyedBucketOrdsBenchmark.multiBucket -prof "async:libPath=/home/nik9000/Downloads/async-profiler-4.0-linux-x64/lib/libasyncProfiler.so;dir=/tmp/prof;output=flamegraph"'
 ```
 
-Note: As of January 2025 the latest release of async profiler doesn't work
-with our JDK but the nightly is fine.
+Note: As of July 2025 the 4.0 release of the async profiler works well.
 
 If you are on Mac, this'll warn you that you downloaded the shared library from
 the internet. You'll need to go to settings and allow it to run.

benchmarks/src/main/java/org/elasticsearch/benchmark/compute/operator/ValuesSourceReaderBenchmark.java

Lines changed: 7 additions & 1 deletion
@@ -24,8 +24,10 @@
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.NumericUtils;
 import org.elasticsearch.common.breaker.NoopCircuitBreaker;
+import org.elasticsearch.common.logging.LogConfigurator;
 import org.elasticsearch.common.lucene.Lucene;
 import org.elasticsearch.common.settings.Settings;
+import org.elasticsearch.common.unit.ByteSizeValue;
 import org.elasticsearch.common.util.BigArrays;
 import org.elasticsearch.compute.data.BlockFactory;
 import org.elasticsearch.compute.data.BytesRefBlock;
@@ -84,10 +86,13 @@
 @State(Scope.Thread)
 @Fork(1)
 public class ValuesSourceReaderBenchmark {
+    static {
+        LogConfigurator.configureESLogging();
+    }
+
     private static final int BLOCK_LENGTH = 16 * 1024;
     private static final int INDEX_SIZE = 10 * BLOCK_LENGTH;
     private static final int COMMIT_INTERVAL = 500;
-    private static final BigArrays BIG_ARRAYS = BigArrays.NON_RECYCLING_INSTANCE;
     private static final BlockFactory blockFactory = BlockFactory.getInstance(
         new NoopCircuitBreaker("noop"),
         BigArrays.NON_RECYCLING_INSTANCE
@@ -296,6 +301,7 @@ private static BlockLoader numericBlockLoader(WhereAndBaseName w, NumberFieldMap
     public void benchmark() {
         ValuesSourceReaderOperator op = new ValuesSourceReaderOperator(
             blockFactory,
+            ByteSizeValue.ofMb(1).getBytes(),
             fields(name),
             List.of(new ValuesSourceReaderOperator.ShardContext(reader, () -> {
                 throw new UnsupportedOperationException("can't load _source here");

docs/changelog/131053.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 131053
+summary: Split large pages on load sometimes
+area: ES|QL
+type: bug
+issues: []

server/src/main/java/org/elasticsearch/index/mapper/AbstractShapeGeometryFieldMapper.java

Lines changed: 3 additions & 3 deletions
@@ -98,11 +98,11 @@ protected void writeExtent(BlockLoader.IntBuilder builder, Extent extent) {
         public BlockLoader.AllReader reader(LeafReaderContext context) throws IOException {
             return new BlockLoader.AllReader() {
                 @Override
-                public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs) throws IOException {
+                public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
                     var binaryDocValues = context.reader().getBinaryDocValues(fieldName);
                     var reader = new GeometryDocValueReader();
-                    try (var builder = factory.ints(docs.count())) {
-                        for (int i = 0; i < docs.count(); i++) {
+                    try (var builder = factory.ints(docs.count() - offset)) {
+                        for (int i = offset; i < docs.count(); i++) {
                             read(binaryDocValues, docs.get(i), reader, builder);
                         }
                         return builder.build();

server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java

Lines changed: 43 additions & 43 deletions
@@ -123,10 +123,10 @@ private static class SingletonLongs extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.LongBuilder builder = factory.longsFromDocValues(docs.count())) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.LongBuilder builder = factory.longsFromDocValues(docs.count() - offset)) {
                 int lastDoc = -1;
-                for (int i = 0; i < docs.count(); i++) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < lastDoc) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -172,9 +172,9 @@ private static class Longs extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.LongBuilder builder = factory.longsFromDocValues(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.LongBuilder builder = factory.longsFromDocValues(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < this.docID) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -258,10 +258,10 @@ private static class SingletonInts extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.IntBuilder builder = factory.intsFromDocValues(docs.count())) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.IntBuilder builder = factory.intsFromDocValues(docs.count() - offset)) {
                 int lastDoc = -1;
-                for (int i = 0; i < docs.count(); i++) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < lastDoc) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -307,9 +307,9 @@ private static class Ints extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.IntBuilder builder = factory.intsFromDocValues(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.IntBuilder builder = factory.intsFromDocValues(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < this.docID) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -407,10 +407,10 @@ private static class SingletonDoubles extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count())) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count() - offset)) {
                 int lastDoc = -1;
-                for (int i = 0; i < docs.count(); i++) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < lastDoc) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -460,9 +460,9 @@ private static class Doubles extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < this.docID) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -541,10 +541,10 @@ private static class DenseVectorValuesBlockReader extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
             // Doubles from doc values ensures that the values are in order
-            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
-                for (int i = 0; i < docs.count(); i++) {
+            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count() - offset, dimensions)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < floatVectorValues.docID()) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -642,19 +642,19 @@ private BlockLoader.Block readSingleDoc(BlockFactory factory, int docId) throws
             if (ordinals.advanceExact(docId)) {
                 BytesRef v = ordinals.lookupOrd(ordinals.ordValue());
                 // the returned BytesRef can be reused
-                return factory.constantBytes(BytesRef.deepCopyOf(v));
+                return factory.constantBytes(BytesRef.deepCopyOf(v), 1);
             } else {
-                return factory.constantNulls();
+                return factory.constantNulls(1);
             }
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            if (docs.count() == 1) {
-                return readSingleDoc(factory, docs.get(0));
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            if (docs.count() - offset == 1) {
+                return readSingleDoc(factory, docs.get(offset));
             }
-            try (BlockLoader.SingletonOrdinalsBuilder builder = factory.singletonOrdinalsBuilder(ordinals, docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+            try (var builder = factory.singletonOrdinalsBuilder(ordinals, docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < ordinals.docID()) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -697,9 +697,9 @@ private static class Ordinals extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BytesRefBuilder builder = factory.bytesRefsFromDocValues(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BytesRefBuilder builder = factory.bytesRefsFromDocValues(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < ordinals.docID()) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -777,9 +777,9 @@ private static class BytesRefsFromBinary extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.BytesRefBuilder builder = factory.bytesRefs(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.BytesRefBuilder builder = factory.bytesRefs(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < docID) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -876,9 +876,9 @@ private static class DenseVectorFromBinary extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count() - offset, dimensions)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < docID) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -960,10 +960,10 @@ private static class SingletonBooleans extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.BooleanBuilder builder = factory.booleansFromDocValues(docs.count())) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.BooleanBuilder builder = factory.booleansFromDocValues(docs.count() - offset)) {
                 int lastDoc = -1;
-                for (int i = 0; i < docs.count(); i++) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < lastDoc) {
                         throw new IllegalStateException("docs within same block must be in order");
@@ -1009,9 +1009,9 @@ private static class Booleans extends BlockDocValuesReader {
         }
 
         @Override
-        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
-            try (BlockLoader.BooleanBuilder builder = factory.booleansFromDocValues(docs.count())) {
-                for (int i = 0; i < docs.count(); i++) {
+        public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            try (BlockLoader.BooleanBuilder builder = factory.booleansFromDocValues(docs.count() - offset)) {
+                for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
                     if (doc < this.docID) {
                         throw new IllegalStateException("docs within same block must be in order");
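Every reader in this file changes in the same way: the builder is sized to `docs.count() - offset` and the loop starts at `offset` instead of `0`. The sketch below isolates that pattern with plain arrays. `OffsetReadSketch` and the `valueForDoc` lookup are hypothetical stand-ins for the real `Docs` and doc values types, and the reading that `offset` marks the first row not yet returned in an earlier `Page` is an inference from the commit message, not something the diff states.

```java
import java.util.function.IntToLongFunction;

/** Illustrative only: mirrors the shape of the changed read(factory, docs, offset) loops. */
public class OffsetReadSketch {

    /**
     * Reads values for docs[offset..] only. The result holds
     * docs.length - offset entries, one per remaining doc.
     */
    static long[] read(int[] docs, int offset, IntToLongFunction valueForDoc) {
        long[] block = new long[docs.length - offset];
        int lastDoc = -1;
        for (int i = offset; i < docs.length; i++) {
            int doc = docs[i];
            if (doc < lastDoc) {
                throw new IllegalStateException("docs within same block must be in order");
            }
            // Positions in the output are relative to offset, not to the start of docs.
            block[i - offset] = valueForDoc.applyAsLong(doc);
            lastDoc = doc;
        }
        return block;
    }

    public static void main(String[] args) {
        int[] docs = { 2, 5, 7, 9 };
        long[] tail = read(docs, 2, doc -> doc * 10L);
        System.out.println(java.util.Arrays.toString(tail));
    }
}
```

Passing `offset = 0` reproduces the old behavior of reading every doc, which is why existing callers only need to add a `0` argument.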

server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java

Lines changed: 9 additions & 9 deletions
@@ -43,7 +43,7 @@ interface ColumnAtATimeReader extends Reader {
         /**
          * Reads the values of all documents in {@code docs}.
          */
-        BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException;
+        BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException;
     }
 
     interface RowStrideReader extends Reader {
@@ -149,8 +149,8 @@ public String toString() {
      */
     class ConstantNullsReader implements AllReader {
         @Override
-        public Block read(BlockFactory factory, Docs docs) throws IOException {
-            return factory.constantNulls();
+        public Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+            return factory.constantNulls(docs.count() - offset);
         }
 
         @Override
@@ -183,8 +183,8 @@ public Builder builder(BlockFactory factory, int expectedCount) {
             public ColumnAtATimeReader columnAtATimeReader(LeafReaderContext context) {
                 return new ColumnAtATimeReader() {
                     @Override
-                    public Block read(BlockFactory factory, Docs docs) {
-                        return factory.constantBytes(value);
+                    public Block read(BlockFactory factory, Docs docs, int offset) {
+                        return factory.constantBytes(value, docs.count() - offset);
                     }
 
                     @Override
@@ -261,8 +261,8 @@ public ColumnAtATimeReader columnAtATimeReader(LeafReaderContext context) throws
             }
             return new ColumnAtATimeReader() {
                 @Override
-                public Block read(BlockFactory factory, Docs docs) throws IOException {
-                    return reader.read(factory, docs);
+                public Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
+                    return reader.read(factory, docs, offset);
                 }
 
                 @Override
@@ -408,13 +408,13 @@ interface BlockFactory {
         /**
          * Build a block that contains only {@code null}.
          */
-        Block constantNulls();
+        Block constantNulls(int count);
 
         /**
          * Build a block that contains {@code value} repeated
          * {@code size} times.
          */
-        Block constantBytes(BytesRef value);
+        Block constantBytes(BytesRef value, int count);
 
         /**
          * Build a reader for reading keyword ordinals.
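The factory change above makes constant blocks carry an explicit length: `constantNulls(int count)` and `constantBytes(BytesRef value, int count)`. A rough sketch of why that matters follows. `ConstantBlockSketch` and its nested `ConstantBlock` are hypothetical and use `String` in place of `BytesRef`. The point is only that a constant column must report `docs.count() - offset` positions so it lines up with the other columns of a page that starts partway through the docs.

```java
/** Illustrative only: a stand-in for constant blocks that know their own length. */
public class ConstantBlockSketch {

    /** Hypothetical constant block: one value (or null) repeated positionCount times. */
    static final class ConstantBlock {
        final String value; // null stands in for a constant-null block
        final int positionCount;

        ConstantBlock(String value, int positionCount) {
            if (positionCount < 0) {
                throw new IllegalArgumentException("negative position count: " + positionCount);
            }
            this.value = value;
            this.positionCount = positionCount;
        }
    }

    static ConstantBlock constantNulls(int count) {
        return new ConstantBlock(null, count);
    }

    static ConstantBlock constantBytes(String value, int count) {
        return new ConstantBlock(value, count);
    }

    /** Mirrors the changed readers: size the block to the docs at and after offset. */
    static ConstantBlock read(String value, int docsCount, int offset) {
        int count = docsCount - offset;
        return value == null ? constantNulls(count) : constantBytes(value, count);
    }

    public static void main(String[] args) {
        ConstantBlock block = read("v", 10, 4);
        System.out.println(block.value + " x " + block.positionCount);
    }
}
```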

server/src/main/java/org/elasticsearch/index/mapper/BooleanScriptBlockDocValuesReader.java

Lines changed: 3 additions & 3 deletions
@@ -49,10 +49,10 @@ public int docId() {
     }
 
     @Override
-    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs) throws IOException {
+    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
         // Note that we don't emit falses before trues so we conform to the doc values contract and can use booleansFromDocValues
-        try (BlockLoader.BooleanBuilder builder = factory.booleans(docs.count())) {
-            for (int i = 0; i < docs.count(); i++) {
+        try (BlockLoader.BooleanBuilder builder = factory.booleans(docs.count() - offset)) {
+            for (int i = offset; i < docs.count(); i++) {
                 read(docs.get(i), builder);
             }
             return builder.build();

server/src/main/java/org/elasticsearch/index/mapper/DateScriptBlockDocValuesReader.java

Lines changed: 3 additions & 3 deletions
@@ -49,10 +49,10 @@ public int docId() {
     }
 
     @Override
-    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs) throws IOException {
+    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
         // Note that we don't sort the values sort, so we can't use factory.longsFromDocValues
-        try (BlockLoader.LongBuilder builder = factory.longs(docs.count())) {
-            for (int i = 0; i < docs.count(); i++) {
+        try (BlockLoader.LongBuilder builder = factory.longs(docs.count() - offset)) {
+            for (int i = offset; i < docs.count(); i++) {
                 read(docs.get(i), builder);
             }
             return builder.build();

server/src/main/java/org/elasticsearch/index/mapper/DoubleScriptBlockDocValuesReader.java

Lines changed: 3 additions & 3 deletions
@@ -49,10 +49,10 @@ public int docId() {
     }
 
     @Override
-    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs) throws IOException {
+    public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
         // Note that we don't sort the values sort, so we can't use factory.doublesFromDocValues
-        try (BlockLoader.DoubleBuilder builder = factory.doubles(docs.count())) {
-            for (int i = 0; i < docs.count(); i++) {
+        try (BlockLoader.DoubleBuilder builder = factory.doubles(docs.count() - offset)) {
+            for (int i = offset; i < docs.count(); i++) {
                 read(docs.get(i), builder);
             }
             return builder.build();
