Deserialize hits on demand #8270
Hi @gao-artur, just to clarify: are you using the new v8 client or NEST? NEST is deprecated and won't get any feature updates. Otherwise, your feature request reminds me of this planned feature ( That specific API could be implemented in a way that allows streaming the responses. It's usually hard to implement streaming with JSON responses, as there is no guaranteed order of fields and Would this work for you?
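Streaming only becomes straightforward once the reader is positioned at a plain JSON array. A minimal sketch (illustrative only, not client code; assumes .NET 6+ where `JsonSerializer.DeserializeAsyncEnumerable` exists) showing that `System.Text.Json` can already materialize array elements on demand:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class Doc
{
    public int Id { get; set; }
}

public static class StreamingDemo
{
    public static async Task Main()
    {
        // Stands in for the "hits" array portion of a search response.
        var json = "[{\"Id\":1},{\"Id\":2},{\"Id\":3}]";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));

        // Each element is deserialized lazily as the stream is consumed;
        // no collection holding all documents is ever allocated.
        await foreach (Doc? doc in JsonSerializer.DeserializeAsyncEnumerable<Doc>(stream))
        {
            Console.WriteLine(doc!.Id);
        }
    }
}
```

Note that this works only because the whole payload is one array; in a full search response the `hits.hits` array may appear before or after other fields, which is exactly the field-order problem mentioned above.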
Hey @flobernd,
This is fine and expected.
This will work for this specific API, but in my case I also need the
Hi @gao-artur,
This sadly is not possible due to the mentioned "random" field order in the JSON response. If the With PIT it should be possible to fire an initial query that returns the
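The PIT-based flow described above might look roughly like this. Every helper name here (`OpenPitAsync`, `SearchPageAsync`, `ClosePitAsync`, `Process`) is a hypothetical placeholder standing in for the real v8 client calls, not an actual API name:

```csharp
// Hedged sketch of PIT + search_after pagination over a frozen index view.
// All helpers below are hypothetical placeholders, not real client APIs.
string pitId = await OpenPitAsync("my-index", keepAlive: "2m");
try
{
    object[]? searchAfter = null; // sort values of the last hit on the previous page
    while (true)
    {
        // The first page also carries hits.total for the whole result set.
        var page = await SearchPageAsync(pitId, searchAfter, size: 1000);
        if (page.Hits.Count == 0)
            break;

        foreach (var hit in page.Hits)
            Process(hit.Source);

        // Resume after the last hit; requires a deterministic sort
        // (Elasticsearch adds an implicit _shard_doc tiebreaker with PIT).
        searchAfter = page.Hits[^1].Sort;
    }
}
finally
{
    await ClosePitAsync(pitId); // release the point-in-time
}
```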
I see, you are right. Then your proposal to only stream hits works great for me.
Going to close this issue in favor of the existing proposal and hope to find time to work on this soon!
Hey. We need to process a large number of documents, sometimes tens of millions. For that, we retrieve the first batch, do some processing, then retrieve the next batch, and so on.

The problem is that these documents are pretty large. Multiply their size by the batch size, and you find that the deserialization process wastes a lot of memory.

Unfortunately, NEST deserializes `Hits` into an `IReadOnlyCollection`, which means all the documents are deserialized at once. You have to allocate a large array (most likely on the LOH) to hold all these objects, and they can't be GC'ed until you finish processing them and retrieve the next batch. Even during deserialization of the next batch, the previous batch won't be GC'ed unless you set the previous response reference to `null` before starting the next search.

We would like an API that deserializes documents one by one during `Hits` enumeration. Changing `Hits` to `IEnumerable`/`IAsyncEnumerable` is probably too large a breaking change, but maybe a new property for this purpose, something like `LazyHits`, could work? `Hits` could then read from `LazyHits` and cache the result in memory to avoid deserializing twice.

This approach would solve all the problems at once: