Duplicated negative samples for a user exist #72

Open
swyo opened this issue May 12, 2021 · 0 comments

swyo commented May 12, 2021

Hello, @hexiangnan

First of all, thank you for sharing your code so that others can review it.

I reviewed your code and explored the preprocessed data in the Data folder.

I found something strange: duplicated negative samples exist for a user.

Section 4.1 (Evaluation Protocols) of the paper contains the following sentence:

we followed the common strategy [6, 21] that randomly samples 100 items that are not interacted by the user, ranking the test item among the 100 items.

Although you mentioned replacement for negative sampling, I think it is more reasonable to draw the negative samples without replacement for each user.

Otherwise, the NDCG on the test set can be overestimated.
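
Sampling without replacement could look like the sketch below. It assumes item ids are the integers `0 .. num_items - 1` and that `interacted` is the set of items the user has interacted with; the function name and both parameters are hypothetical and do not come from the repository.

```python
import random

def sample_negatives(num_items, interacted, n_neg=100, seed=None):
    """Draw n_neg distinct items the user has not interacted with."""
    rng = random.Random(seed)
    # Candidate pool: every item id that is not in the user's history.
    candidates = [i for i in range(num_items) if i not in interacted]
    if len(candidates) < n_neg:
        raise ValueError("not enough non-interacted items to sample from")
    # random.sample draws without replacement, so no item can repeat.
    return rng.sample(candidates, n_neg)

# Hypothetical toy call: 1000 items, a user who interacted with three of them.
negatives = sample_negatives(num_items=1000, interacted={1, 5, 11}, n_neg=100, seed=0)
```

Because the pool is built once and sampled without replacement, every returned list has exactly `n_neg` distinct items.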

For example, the following scenario can happen.

If the negative samples contain duplicated items, the recommended list can also contain duplicated items.

# Suppose this is the top-10 recommended list for one positive and 99 negative samples drawn with replacement.
recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
gt = [11]
ndcg(recs, gt)

The ndcg above returns 1 / log2(1 + 2), because the first occurrence of item 11 is at rank 2.

This NDCG is not reasonable because item 11 was sampled 3 times, which means other items lose their chance to be recommended.
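
The `ndcg` function is not shown in this issue. A minimal sketch that reproduces the value above, assuming a single ground-truth item and a score of 1 / log2(rank + 1) at the first hit (the name `ndcg_at_k` is hypothetical), might be:

```python
import math

def ndcg_at_k(recs, gt_item, k=10):
    """NDCG@k for one ground-truth item: 1 / log2(rank + 1) at the first hit, else 0."""
    for idx, item in enumerate(recs[:k]):
        if item == gt_item:
            # idx is 0-based, so the 1-based rank is idx + 1.
            return 1.0 / math.log2(idx + 2)
    return 0.0

recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
score = ndcg_at_k(recs, 11)   # first hit at rank 2
distinct = len(set(recs))     # number of distinct items occupying the 10 slots
```

Here `score` equals 1 / log2(3), and `distinct` shows that the ten slots hold fewer than ten different items.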

Summary

Generally, a recommended list contains distinct items.
However, your test negative samples contain duplicated items for a user.

Please check with the following code, which reproduces the unreasonable behavior.

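# Assumes each batch from test_loader holds the candidate items of a single user.
for uid, iid, label in test_loader:
  items = iid.tolist() if hasattr(iid, "tolist") else list(iid)  # tensors hash by identity, so convert first
  assert len(set(items)) == len(items)  # fails when a user's candidates contain duplicates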
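
The same check can be run on the preprocessed file directly, without a data loader. The sketch below assumes each line has the form `(user,item)` followed by tab-separated negative item ids; that format and the function name `find_duplicate_negatives` are assumptions, so adjust the parsing if the actual file differs.

```python
from collections import Counter

def find_duplicate_negatives(lines):
    """Return {user_id: {item_id: count}} for users whose negatives repeat."""
    report = {}
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumed key format: "(user,item)"; keep only the user id.
        user_id = int(fields[0].strip("()").split(",")[0])
        counts = Counter(int(x) for x in fields[1:])
        dups = {item: c for item, c in counts.items() if c > 1}
        if dups:
            report[user_id] = dups
    return report

# Hypothetical two-line sample: user 0 has item 7 twice, user 1 has no duplicates.
sample = ["(0,25)\t7\t8\t7\t9", "(1,133)\t3\t4\t5\t6"]
report = find_duplicate_negatives(sample)
```

For a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.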
@swyo swyo changed the title Duplicated negative samples for a user exists Duplicated negative samples for a user exist May 12, 2021