Reproduction logs #2606

Merged
merged 1 commit into from
Sep 21, 2024
7 changes: 4 additions & 3 deletions docs/experiments-msmarco-passage.md
@@ -100,7 +100,7 @@ bin/run.sh io.anserini.index.IndexCollection \
-input collections/msmarco-passage/collection_jsonl \
-index indexes/msmarco-passage/lucene-index-msmarco \
-generator DefaultLuceneDocumentGenerator \
-threads 9 -storePositions -storeDocvectors -storeRaw
```
For Windows:
```bash
@@ -206,7 +206,7 @@ Since the first column indicates the `qid`, it means that the file contains rank

## Evaluation

Finally, we can evaluate the retrieved documents using the official MS MARCO evaluation script:

```bash
python tools/scripts/msmarco/msmarco_passage_eval.py \
@@ -244,7 +244,7 @@ We take the average of the scores across all queries (6980 in this case), and we
You can find this run on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/MSMARCO-Passage-Ranking-Submissions/leaderboard/) as the entry named "BM25 (Lucene8, tuned)", dated 2019/06/26.
So you've just reproduced (part of) a leaderboard submission!
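
For intuition, the averaging behind MRR@10 can be sketched in a few lines of Python. This is not the official `msmarco_passage_eval.py` script. The function names and toy data are hypothetical, and the sketch assumes that a query with no relevant passage in its top 10 contributes a score of 0 and that the mean is taken over every query that has judgments.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, docid in enumerate(ranked[:k], start=1):
        if docid in relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average the per-query reciprocal ranks over all judged queries (assumption)."""
    scores = [reciprocal_rank_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data (hypothetical): three queries with one relevant passage each.
toy_runs = {
    "q1": ["a", "b", "c"],  # first relevant passage at rank 2
    "q2": ["d", "e", "f"],  # no relevant passage retrieved
    "q3": ["g", "h", "i"],  # first relevant passage at rank 1
}
toy_qrels = {"q1": {"b"}, "q2": {"z"}, "q3": {"g"}}

print(mrr_at_k(toy_runs, toy_qrels))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```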

We can also use the official [TREC](https://trec.nist.gov/) evaluation tool, `trec_eval`, to compute metrics other than MRR@10.
For that we first need to convert runs and qrels files to the TREC format:

```bash
@@ -525,3 +525,4 @@ The BM25 run with default parameters `k1=0.9`, `b=0.4` roughly corresponds to th
+ Results reproduced by [@r-aya](https://github.com/r-aya) on 2024-09-07 (commit [`4319f89`](https://github.com/castorini/anserini/commit/4319f89472c4dd3359482f041dbcaee5202d8dd2))
+ Results reproduced by [@Amirkia1998](https://github.com/Amirkia1998) on 2024-09-20 (commit [`9e0cd5b`](https://github.com/castorini/anserini/commit/204bc87ef66e689773549ff804377eae20d5d7ce))
+ Results reproduced by [@CCarolD](https://github.com/CCarolD) on 2024-09-20 (commit [`2cb5d13`](https://github.com/castorini/anserini/commit/2cb5d1377862d49f70fa60cc452e96b31d815b94))
+ Results reproduced by [@pjyi2147](https://github.com/pjyi2147) on 2024-09-20 (commit [`2cb5d13`](https://github.com/castorini/anserini/commit/2cb5d1377862d49f70fa60cc452e96b31d815b94))
13 changes: 7 additions & 6 deletions docs/start-here.md
@@ -112,7 +112,7 @@ It simply means: of the top 10 documents, what fraction are relevant according t
For a query, if five of them are relevant, you get a score of 0.5; if nine of them are relevant, you get a score of 0.9.
You compute P@10 per query, and then average across all queries.
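
As a minimal sketch of that computation (not the implementation used by any particular evaluation tool), P@10 can be written as below. The function names and the toy runs are hypothetical, chosen to mirror the "five relevant" and "nine relevant" cases described above.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    hits = sum(1 for docid in ranked[:k] if docid in relevant)
    return hits / k


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Compute P@k per query, then average across all queries in the run."""
    scores = [precision_at_k(ranked, qrels.get(qid, set()), k) for qid, ranked in runs.items()]
    return sum(scores) / len(scores)


# Toy data (hypothetical): two queries with ten retrieved documents each.
toy_runs = {
    "q1": [f"d{i}" for i in range(10)],
    "q2": [f"e{i}" for i in range(10)],
}
toy_qrels = {
    "q1": {f"d{i}" for i in range(5)},  # five of the top 10 are relevant
    "q2": {f"e{i}" for i in range(9)},  # nine of the top 10 are relevant
}

for qid, ranked in toy_runs.items():
    print(qid, precision_at_k(ranked, toy_qrels[qid]))  # q1 0.5, then q2 0.9
print(round(mean_precision_at_k(toy_runs, toy_qrels), 3))  # 0.7
```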

Information retrieval researchers have dozens of metrics, but a detailed explanation of each isn't important right now...
just recognize that _all_ metrics are imperfect, but they try to capture different aspects of the quality of a ranked list in terms of containing relevant documents.
For nearly all metrics, though, higher is better.

@@ -200,7 +200,7 @@ Look inside a file to see the json format we use.
The entire collection is now something like this:

```bash
$ wc collections/msmarco-passage/collection_jsonl/*
1000000 58716381 374524070 collections/msmarco-passage/collection_jsonl/docs00.json
1000000 59072018 377845773 collections/msmarco-passage/collection_jsonl/docs01.json
1000000 58895092 375856044 collections/msmarco-passage/collection_jsonl/docs02.json
@@ -217,7 +217,7 @@ As an aside, data munging along these lines is a very common data preparation op
Collections rarely come in _exactly_ the format that your tools expect, so you'll be frequently writing lots of small scripts that munge data to convert from one format to another.

Similarly, we'll also have to do a bit of data munging of the queries and the qrels.
We're going to retain only the queries that are in the qrels file:

```bash
python tools/scripts/msmarco/filter_queries.py \
@@ -252,7 +252,7 @@ These queries are taken from Bing search logs, so they're "realistic" web querie
Okay, let's now cross-reference the `qid` with the relevance judgments, i.e., the qrels file:

```bash
$ grep 1048585 collections/msmarco-passage/qrels.dev.small.tsv
1048585 0 7187158 1
```
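
The qrels line above follows the usual TREC layout of four whitespace-separated fields: query id, an unused column, document (passage) id, and a relevance grade. A minimal sketch of loading such lines into a lookup table follows. The `parse_qrels` helper is hypothetical and is not part of Anserini, and it assumes any grade above 0 counts as relevant.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_qrels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each qid to the set of docids judged relevant (grade > 0)."""
    qrels = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _unused, docid, grade = parts
        if int(grade) > 0:
            qrels[qid].add(docid)
    return dict(qrels)


# The single judgment shown in the grep output above.
sample = ["1048585\t0\t7187158\t1"]
print(parse_qrels(sample))  # {'1048585': {'7187158'}}
```

To read the real file, the same function can be given an open file handle, for example `parse_qrels(open("collections/msmarco-passage/qrels.dev.small.tsv"))`.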

@@ -282,7 +282,7 @@ Well, we've just seen that there are 6980 training queries.
For those, we have 7437 relevance judgments:

```bash
$ wc collections/msmarco-passage/qrels.dev.small.tsv
7437 29748 143300 collections/msmarco-passage/qrels.dev.small.tsv
```
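
A quick sanity check on the `wc` numbers above, written as a small Python sketch (the variable names are only illustrative): the word count divided by the line count gives the number of fields per judgment, and the line count divided by the number of queries gives the average number of judgments per query.

```python
# Figures taken from the wc output above and the query count in the text.
num_queries = 6980
num_lines, num_words, num_bytes = 7437, 29748, 143300

fields_per_line = num_words / num_lines        # each judgment has 4 fields
judgments_per_query = num_lines / num_queries  # slightly more than 1 per query

print(fields_per_line)               # 4.0
print(f"{judgments_per_query:.3f}")  # 1.065
```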

@@ -295,7 +295,7 @@ This is just looking at the development set.
Now let's look at the training set:

```bash
$ wc collections/msmarco-passage/qrels.train.tsv
532761 2131044 10589532 collections/msmarco-passage/qrels.train.tsv
```

@@ -409,3 +409,4 @@ If you think this guide can be improved in any way (e.g., you caught a typo or t
+ Results reproduced by [@r-aya](https://github.com/r-aya) on 2024-09-07 (commit [`4319f89`](https://github.com/castorini/anserini/commit/4319f89472c4dd3359482f041dbcaee5202d8dd2))
+ Results reproduced by [@Amirkia1998](https://github.com/Amirkia1998) on 2024-09-20 (commit [`9e0cd5b`](https://github.com/castorini/anserini/commit/204bc87ef66e689773549ff804377eae20d5d7ce))
+ Results reproduced by [@CCarolD](https://github.com/CCarolD) on 2024-09-20 (commit [`2cb5d13`](https://github.com/castorini/anserini/commit/2cb5d1377862d49f70fa60cc452e96b31d815b94))
+ Results reproduced by [@pjyi2147](https://github.com/pjyi2147) on 2024-09-20 (commit [`2cb5d13`](https://github.com/castorini/anserini/commit/2cb5d1377862d49f70fa60cc452e96b31d815b94))