I have the following issue, which is really odd and affects the evaluation of the neural models.
I build my data using the auto preparer, and I came to realize that when I try to make predictions on the test set, some document-query pairs are duplicated.
I am not sure why this is happening; my first guess was that it is done to fill up the missing examples up to the batch size, but this does not seem to be the case.
Here's most of my code:
Now, it seems that the duplicates are created through the dataset builder, but I don't understand why.
Even more odd is the fact that those predictions have different scores for the same document-query pairs, and the scores are not even always close to each other, so this can't be some rounding error. This is very weird: how is it possible that, without re-training the model, I can get such different predictions for the same query-document pairs at inference time?
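To make the problem concrete, here is a minimal pandas sketch with purely hypothetical ids and scores that shows the kind of duplication I am seeing (the column names id_left and id_right just mirror how MatchZoo's DataPack relation labels queries and documents):

```python
import pandas as pd

# Hypothetical illustration, not the actual data: one row per predicted
# instance, with the query id, document id, and the model's score.
preds = pd.DataFrame({
    'id_left':  ['Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
    'id_right': ['D1', 'D1', 'D7', 'D7', 'D9'],
    'score':    [0.91, 0.34, 0.12, 0.80, 0.55],
})

# How often does each query-document pair appear, and how far apart are its scores?
dup_stats = (
    preds.groupby(['id_left', 'id_right'])['score']
         .agg(['count', 'min', 'max'])
         .query('count > 1')
)
print(dup_stats)
```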
In this replicated example the model used was KNRM, but I think this happens with other models too.

Hi @littlewine, there are indeed three kinds of datapack organization, i.e., point-wise, pair-wise, and list-wise. For training, you can choose whichever one matches the loss function; in testing, however, you should not organize the datapack pair-wise, since that adds duplicate instances to fill the batch size.
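A minimal sketch of that suggestion, assuming the MatchZoo-py tutorial-style API (the names preprocessor and test_raw are placeholders for whatever mz.auto.prepare and the data loading step produced, not code from this issue): build the evaluation dataset explicitly in point-wise mode instead of reusing the pair-wise dataset builder.

```python
import matchzoo as mz

# Sketch only: `preprocessor` is assumed to come from mz.auto.prepare,
# and `test_raw` is the raw test DataPack.
test_processed = preprocessor.transform(test_raw)

# Point-wise mode keeps exactly one instance per query-document pair,
# so nothing is duplicated in order to build training pairs.
testset = mz.dataloader.Dataset(
    data_pack=test_processed,
    mode='point',
)
padding_callback = mz.models.KNRM.get_default_padding_callback()
testloader = mz.dataloader.DataLoader(
    testset,
    stage='dev',              # evaluation stage: no shuffling or resampling
    callback=padding_callback,
)
```

Running predictions or evaluation over this testloader should then yield exactly one score per query-document pair, and the duplicated rows with diverging scores should disappear.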