How to test performance of dense retrieval model #5555
Replies: 3 comments
-
@bogdankostic do you have any ideas to help?
-
The performance measures displayed at the end of DPR training are in-batch metrics. This means that, for each batch of samples in your development set, and for each question in each batch, we check whether the model can identify the relevant passage (positive label) among all passages in the batch (hard-negative labels and in-batch negatives). The final metrics are the averages over all batches and depend heavily on the batch size you set when training the model. These metrics are useful for quickly checking whether training works as expected (i.e., the model is converging) and for comparing different hyperparameter values. However, the best practice for evaluating retrieval models is to check their performance against a large collection of documents. I'd recommend reviewing the Evaluation Guide in our documentation and our Evaluation Tutorial.
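For reference, here is a minimal sketch of such a collection-level evaluation, assuming Haystack 1.x. The document collection, the saved-model path (`saved_models/my_dpr`), and the (question, gold document ID) pairs are placeholders you would replace with your own data:

```python
# Minimal sketch: evaluate a trained DPR model against a full document
# collection by computing recall@k manually. Assumes Haystack 1.x; the
# documents, model path, and eval pairs below are placeholders.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import DensePassageRetriever

# Index the document collection you want to search over.
document_store = InMemoryDocumentStore(embedding_dim=768)
docs = [
    {"content": "Berlin is the capital of Germany.", "meta": {"doc_id": "d1"}},
    {"content": "Paris is the capital of France.", "meta": {"doc_id": "d2"}},
    # ... your full collection ...
]
document_store.write_documents(docs)

# Load your trained DPR model and embed all documents with it.
retriever = DensePassageRetriever.load(
    load_dir="saved_models/my_dpr",  # placeholder path to your trained model
    document_store=document_store,
)
document_store.update_embeddings(retriever)

# (question, id of the gold/positive document) pairs from your dev/test set.
eval_pairs = [
    ("What is the capital of Germany?", "d1"),
    ("What is the capital of France?", "d2"),
]

top_k = 10
hits = 0
for question, gold_id in eval_pairs:
    retrieved = retriever.retrieve(query=question, top_k=top_k)
    if any(d.meta.get("doc_id") == gold_id for d in retrieved):
        hits += 1

print(f"recall@{top_k}: {hits / len(eval_pairs):.3f}")
```

Computing recall@k this way against the full index is usually more informative than the in-batch numbers, because the model has to find the gold passage among every document in the store rather than only among the passages that happen to share its batch.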
-
I see, thank you for your help!
-
Hey,
I'd like to train my own dense retrieval model on my own dataset. As a starting point, however, I tried to replicate the results of the Haystack example. At the end, it reports several DPR performance measures for the trained model. It looks like this:

[screenshot of the reported performance measures]
However, the example does not say how these measures are obtained. Is there a simple way to compute them given the trained model and the data, or do I have to implement this myself?
Thank you very much in advance!
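For context, the measures referred to here are the ones printed at the end of DPR training when dev/test files are supplied. A minimal training sketch that produces such a report, assuming Haystack 1.x and placeholder file and directory names, might look like this:

```python
# Minimal sketch of a DPR training run whose dev/test metrics are the
# measures discussed above. Assumes Haystack 1.x; all paths and file names
# are placeholders for your own DPR-format training data.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import DensePassageRetriever

retriever = DensePassageRetriever(
    document_store=InMemoryDocumentStore(),
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    max_seq_len_query=64,
    max_seq_len_passage=256,
)

# The dev/test files use the DPR training-data format (question, positive
# passages, hard negatives). The metrics reported at the end of training
# are the in-batch metrics computed on these held-out files.
retriever.train(
    data_dir="data/dpr_training",   # placeholder path
    train_filename="train.json",    # placeholder file names
    dev_filename="dev.json",
    test_filename="dev.json",
    n_epochs=1,
    batch_size=16,
    num_positives=1,
    num_hard_negatives=1,
    embed_title=True,
    evaluate_every=3000,
    save_dir="saved_models/my_dpr",  # placeholder save directory
)
```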