DPR create_embeddings/update_embeddings FAISS is so slow! #5641
Unanswered
shahad2099 asked this question in Questions
Replies: 1 comment 4 replies
Hi @shahad2099 happy to answer your questions! 🙂
Hello everyone,
I hope you're all doing well. I am using the Haystack framework to build a retriever. After training DPR, I wanted to use FAISS as my vector database. However, creating or updating the embeddings is very slow: I have 3 million short documents (100 words per document), and embedding only 17% of them took almost 5 hours and 38 minutes. This is incredibly frustrating.
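
To put those numbers in perspective, here is a quick back-of-the-envelope extrapolation. It is only a sketch: it assumes the throughput stays constant for the rest of the run, which has not been measured, and the variable names are purely illustrative.

```python
# Extrapolate the full embedding time from the partial run described above.
# Assumption: throughput stays constant for the remaining documents.
TOTAL_DOCS = 3_000_000                 # documents in the corpus
FRACTION_DONE = 0.17                   # share embedded so far
ELAPSED_SECONDS = 5 * 3600 + 38 * 60   # 5 hours 38 minutes

docs_done = TOTAL_DOCS * FRACTION_DONE
throughput = docs_done / ELAPSED_SECONDS                # documents per second
estimated_total_hours = ELAPSED_SECONDS / FRACTION_DONE / 3600
remaining_hours = estimated_total_hours - ELAPSED_SECONDS / 3600

print(f"Documents embedded so far: {docs_done:,.0f}")
print(f"Throughput: {throughput:.1f} docs/s")
print(f"Estimated total time: {estimated_total_hours:.1f} h")
print(f"Estimated time remaining: {remaining_hours:.1f} h")
```

Under that assumption, the run processes roughly 25 documents per second and would need about 33 hours to cover all 3 million documents.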

What should I do? Are there any optimizations that I should implement in the code?
I also have some simple questions:
1. If I want to add more data, do I have to store all the documents again and recreate the embeddings from scratch, or can I add the new documents and create embeddings only for those?
2. In your opinion, what is the most efficient vector database for a retriever?
3. Does the update_embeddings function use GPUs or multi-threading?
Many thanks to all of you!