Hi, I am trying to understand how you combined the hard negative loss `L_s` with the in-batch random negative loss `L_r`. In the paper, `L_r` is scaled by an `alpha` hyperparameter, but there is no mention of the value of `alpha` you used in the experiments.
Following `star/train.py`, I found the `RobertaDot_InBatch` model, whose `forward` function calls the `inbatch_train` method.
At the end of the `inbatch_train` method (line 182), I found

```python
return ((first_loss + second_loss) / (first_num + second_num),)
```

which is different from the combined loss proposed in the paper (Eq. 13). Am I missing something?
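For context, here is a minimal sketch of the two combinations I am comparing. `first_loss`, `second_loss`, `first_num`, and `second_num` are the variables from `inbatch_train`; the `alpha`-weighted form is only my reading of Eq. 13 and is an assumption, since `alpha` does not appear anywhere in the code:

```python
# What line 182 returns: one pooled average over all negative pairs,
# hard and in-batch alike, with no explicit alpha weighting.
def combined_loss_repo(first_loss, second_loss, first_num, second_num):
    return (first_loss + second_loss) / (first_num + second_num)

# My reading of Eq. 13 (an assumption, not the confirmed implementation):
# average each term separately, then scale the in-batch random negative
# loss L_r by alpha.
def combined_loss_paper(first_loss, second_loss, first_num, second_num, alpha=1.0):
    hard_loss = first_loss / first_num      # hard negative loss L_s
    rand_loss = second_loss / second_num    # in-batch random negative loss L_r
    return hard_loss + alpha * rand_loss
```

If I read it correctly, the pooled average implicitly weights `L_r` by `second_num / (first_num + second_num)` rather than by a fixed `alpha`, which is what prompted the question.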
Also, for each query in the batch, did you consider all possible in-batch random negatives, or just one?
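To make this second question concrete, here is a sketch of what I mean by using all in-batch random negatives: every other passage in the batch serves as a negative for each query. This is the generic formulation, not necessarily your implementation; `q_emb` and `p_emb` are hypothetical names:

```python
import torch
import torch.nn.functional as F

def all_inbatch_negatives_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: (B, d) query embeddings; p_emb: (B, d) embeddings of each
    query's positive passage. Off-diagonal entries of the score matrix
    act as the in-batch random negatives."""
    scores = q_emb @ p_emb.T                                    # (B, B) similarity matrix
    labels = torch.arange(q_emb.size(0), device=q_emb.device)   # diagonal = positives
    return F.cross_entropy(scores, labels)
```

By contrast, "just one" would mean sampling a single off-diagonal passage per query instead of using the full (B, B) matrix.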
Thanks in advance!