Unclear how to interpret the relevance document #2

Open
lkurlandski opened this issue Sep 18, 2021 · 5 comments

Comments

@lkurlandski

Could use some documentation about what athome4.qrel.sample actually contains.

@nims11
Contributor

nims11 commented Sep 19, 2021

The qrel file is the standard qrel format used by TREC: https://trec.nist.gov/data/qrels_eng/
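
For anyone who wants to load the file, here is a minimal sketch of a parser. It assumes each non-empty line holds four whitespace-separated fields (topic, iteration, document ID, relevance), as described on the TREC page. The function name `parse_qrels` is made up for illustration and is not part of this repo; the two sample rows are copied from a later comment in this thread.

```python
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Build {topic: {doc_id: relevance}} from TREC-style qrel lines."""
    qrels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        topic, _iteration, doc_id, relevance = parts
        # Keep doc_id as a string so leading zeros (e.g. "008416") survive.
        qrels.setdefault(topic, {})[doc_id] = int(relevance)
    return qrels


sample_rows = ["401 0 088668 2", "401 0 008416 1"]
qrels = parse_qrels(sample_rows)
print(qrels)
```

To read the real file, something like `parse_qrels(open("athome4.qrel.sample"))` should work, assuming the file sits in the current directory.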

@lkurlandski
Author

lkurlandski commented Sep 20, 2021

Okay, I thought it was something along those lines. However, not every document in athome4_sample.tgz is represented in athome4.qrel.sample. For example, 000018, 000022, 000055, 000093, and 000094 are in athome4_sample.tgz but not in athome4.qrel.sample.

I'm guessing this is because "Documents not occurring in the qrels file were not judged by the human assessor and are assumed to be irrelevant in the evaluations used in TREC". Just want to make sure I'm not missing something obvious.
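
A rough sketch of that convention, assuming the qrels for one topic are already loaded into a dict of document ID to relevance. The helper names `unjudged` and `relevance_or_default` and the small `judged` and `collection` values are hypothetical; the document IDs are ones mentioned in this thread.

```python
from typing import Dict, Iterable, List


def unjudged(doc_ids: Iterable[str], judged: Dict[str, int]) -> List[str]:
    """Return the document IDs that have no entry in the qrels, sorted."""
    return sorted(set(doc_ids) - set(judged))


def relevance_or_default(doc_id: str, judged: Dict[str, int], default: int = 0) -> int:
    """Look up a judgment, treating unjudged documents as irrelevant (0)."""
    return judged.get(doc_id, default)


# Hypothetical stand-ins for the qrel entries and the extracted archive.
judged = {"088668": 2, "008416": 1}
collection = ["000018", "000022", "008416", "088668"]

missing = unjudged(collection, judged)
print(missing)
print(relevance_or_default("000018", judged))
```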

Nice repo, and thank you for your help.

@nims11
Contributor

nims11 commented Sep 20, 2021 via email

@lkurlandski
Author

Great, thank you!

@Kaotic3

Kaotic3 commented Dec 7, 2023

I see this is all very old now, but there are a couple of things to note.

Firstly, the QREL document does not conform to the QREL standard: it has 2 in the relevancy column, and I can't really see why that would be.

Secondly, the QREL document is inaccurate. Here is a demonstration of that; I include a few surrounding entries so you can see the 2 as well.

401 0 088668 2
401 0 008416 1
401 0 019276 2
401 0 020205 2
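
To see which values actually occur in the relevance column, a quick tally is enough. This is only a sketch over the four rows quoted above, assuming the relevance is the fourth whitespace-separated field; `tally_relevance` is a made-up helper name.

```python
from collections import Counter

rows = """\
401 0 088668 2
401 0 008416 1
401 0 019276 2
401 0 020205 2
"""


def tally_relevance(text: str) -> Counter:
    """Count how often each relevance value appears in qrel-formatted text."""
    counts: Counter = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:  # ignore blank or malformed lines
            counts[int(parts[3])] += 1
    return counts


counts = tally_relevance(rows)
print(sorted(counts.items()))
```

Running the same function over the whole of athome4.qrel.sample would show how many entries carry each label.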

401 0 008416 1: this entry has a 1, indicating relevance to Topic 401, which is the Olympic Bid topic.

The document is about a student using campus websites inappropriately. It also mentions the individual holding an "Olympics sports day" for kids in the community, but that is not a bid for the Olympics.

It doesn't discuss a bid for the Olympics; it has nothing to do with that at all.

But it is marked 1. It should be marked 0.

I'm only putting this here so that people like me who stumble across it know that it may be useful for some things, but it is not a reliable dataset to benchmark against.
