
Commit c0374c2

Reduce quantization optimization steps at ivf query time (#130493)
Since we are quantizing per posting-list centroid, I think we can get away with fewer optimization iterations. Dropping from 5 to 2 reduces latency when hitting many centroids, with no recall impact (at least on my data sets).

baseline:

```
index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count  QPS     recall  visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec  ivf         100      2.43         0.00              0.00           411.52  0.91    23766.65
```

candidate:

```
index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count  QPS     recall  visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec  ivf         100      1.84         0.00              0.00           543.48  0.91    23766.65
```

Here is a more extreme case (many segments):

baseline:

```
index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count  QPS     recall  visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec  ivf         100      36.10        0.00              0.00           27.70   0.87    364480.37
```

candidate:

```
index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count  QPS     recall  visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec  ivf         100      24.94        0.00              0.00           40.10   0.87    364480.37
```

Need to test against more data sets, but this is a nice improvement.
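As a quick check on the baseline and candidate numbers above: recall and `visited` are identical within each pair, so the gain is not coming from scanning fewer vectors, and the QPS column matches 1000 / latency(ms) in every row. The relative improvements work out as follows (plain arithmetic on the reported values, nothing more):

```latex
\begin{aligned}
\text{latency reduction (first run)} &= \frac{2.43 - 1.84}{2.43} \approx 24.3\% \\
\text{latency reduction (many segments)} &= \frac{36.10 - 24.94}{36.10} \approx 30.9\% \\
\text{QPS ratio} &= \frac{543.48}{411.52} \approx 1.32, \qquad \frac{40.10}{27.70} \approx 1.45 \\
\text{QPS} &\approx \frac{1000}{\text{latency (ms)}}: \quad \frac{1000}{2.43} \approx 411.52, \qquad \frac{1000}{24.94} \approx 40.10
\end{aligned}
```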
1 parent 7fac8ff commit c0374c2


1 file changed, +2 -1 lines changed


server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsReader.java

Lines changed: 2 additions & 1 deletion
```diff
@@ -33,6 +33,7 @@
 import static org.apache.lucene.index.VectorSimilarityFunction.MAXIMUM_INNER_PRODUCT;
 import static org.elasticsearch.index.codec.vectors.BQSpaceUtils.transposeHalfByte;
 import static org.elasticsearch.index.codec.vectors.BQVectorUtils.discretize;
+import static org.elasticsearch.index.codec.vectors.OptimizedScalarQuantizer.DEFAULT_LAMBDA;
 import static org.elasticsearch.simdvec.ES91OSQVectorsScorer.BULK_SIZE;

 /**
@@ -211,7 +212,7 @@ private static class MemorySegmentPostingsVisitor implements PostingVisitor {
 quantizedQueryScratch = new byte[QUERY_BITS * discretizedDimensions / 8];
 quantizedByteLength = discretizedDimensions / 8 + (Float.BYTES * 3) + Short.BYTES;
 quantizedVectorByteSize = (discretizedDimensions / 8);
-quantizer = new OptimizedScalarQuantizer(fieldInfo.getVectorSimilarityFunction());
+quantizer = new OptimizedScalarQuantizer(fieldInfo.getVectorSimilarityFunction(), DEFAULT_LAMBDA, 1);
 osqVectorsScorer = ESVectorUtil.getES91OSQVectorsScorer(indexInput, fieldInfo.getVectorDimension());
 }

```
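The new constructor call passes `DEFAULT_LAMBDA` plus a small integer that appears to cap the number of optimization iterations the quantizer runs. To illustrate what one such iteration involves, here is a simplified, hypothetical sketch of optimized scalar quantization interval refinement: each step snaps every component to a grid over `[a, b]`, then solves a 2x2 least-squares system for a better interval under an anisotropic loss weighted by `lambda`. The class `IntervalOptimizerSketch`, its methods, the `[min, max]` starting interval, and the sample vector are all assumptions for illustration; this is not the `OptimizedScalarQuantizer` source.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of OSQ-style interval refinement (NOT the Elasticsearch/Lucene code).
 * Each iteration fixes the grid slot of every component, then solves a 2x2 least-squares
 * system for the interval [a, b] minimizing an anisotropic loss. `iters` caps the steps.
 */
public final class IntervalOptimizerSketch {

    /** Loss: (1 - lambda) * dot(x, x - xq)^2 / |x|^2 + lambda * |x - xq|^2. */
    static double loss(float[] x, float a, float b, int points, float lambda) {
        double norm2 = 0, xe = 0, e = 0;
        double step = (b - a) / (points - 1.0);
        for (float xi : x) {
            double clamped = Math.min(Math.max(xi, a), b);
            double xq = a + step * Math.round((clamped - a) / step); // quantize, then dequantize
            norm2 += (double) xi * xi;
            xe += xi * (xi - xq);
            e += (xi - xq) * (xi - xq);
        }
        return (1.0 - lambda) * xe * xe / norm2 + lambda * e;
    }

    /** Refines [a0, b0] for at most `iters` steps; stops early once the loss stops improving. */
    static float[] optimize(float[] x, float a0, float b0, int bits, float lambda, int iters) {
        int points = 1 << bits;
        double norm2 = 0;
        for (float xi : x) norm2 += (double) xi * xi;
        if (norm2 == 0) return new float[] {a0, b0}; // all-zero input: nothing to optimize
        double scale = (1.0 - lambda) / norm2;
        float a = a0, b = b0;
        double best = loss(x, a, b, points, lambda);
        for (int it = 0; it < iters; it++) {
            // Pass 1: fixed grid assignment s_i in [0, 1]; accumulate the normal-equation terms.
            double daa = 0, dab = 0, dbb = 0, dax = 0, dbx = 0;
            for (float xi : x) {
                double clamped = Math.min(Math.max(xi, a), b);
                double s = Math.round((clamped - a) * (points - 1) / (b - a)) / (points - 1.0);
                daa += (1 - s) * (1 - s);
                dab += (1 - s) * s;
                dbb += s * s;
                dax += xi * (1 - s);
                dbx += xi * s;
            }
            double m0 = scale * dax * dax + lambda * daa;
            double m1 = scale * dax * dbx + lambda * dab;
            double m2 = scale * dbx * dbx + lambda * dbb;
            double det = m0 * m2 - m1 * m1;
            if (det == 0) break; // degenerate system: keep the current interval
            float aNew = (float) ((m2 * dax - m1 * dbx) / det);
            float bNew = (float) ((m0 * dbx - m1 * dax) / det);
            if (!(aNew < bNew)) break; // reject an empty or inverted interval
            // Pass 2: only accept the candidate interval if it lowers the loss.
            double newLoss = loss(x, aNew, bNew, points, lambda);
            if (!(newLoss < best)) break;
            a = aNew;
            b = bNew;
            best = newLoss;
        }
        return new float[] {a, b};
    }

    public static void main(String[] args) {
        // Hypothetical residual (vector minus centroid), quantized to 4 bits.
        float[] residual = {0.9f, -0.4f, 0.1f, -1.2f, 0.6f, 0.05f, -0.3f, 1.1f};
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float v : residual) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        for (int iters : new int[] {0, 1, 5}) {
            float[] ab = optimize(residual, min, max, 4, 0.1f, iters);
            System.out.printf("iters=%d interval=%s loss=%.6f%n",
                iters, Arrays.toString(ab), loss(residual, ab[0], ab[1], 16, 0.1f));
        }
    }
}
```

In this sketch every iteration makes two full passes over the vector (one to build the system, one to score the candidate), so per-vector cost grows roughly linearly with the cap. If the query is quantized once per visited centroid, that cost is paid once per posting list, which is consistent with the commit's observation that the savings show up when many centroids are hit.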