Pyserini: Reproducing Vector PRF Results

This guide provides instructions for reproducing the Vector PRF results from the following work, on all datasets and dense retrieval (DR) models available in Pyserini:

Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

Starting with v0.12.0, you can reproduce these results directly from the Pyserini PyPI package. Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature. See package installation notes for more details.

Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS). However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective. Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
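When comparing your own scores with the ones reported here, a small tolerance check may be more robust than comparing rounded strings, because a difference in the fifth decimal place can occasionally flip the fourth digit after rounding. The helper below is a minimal sketch; the name `scores_agree` and the default tolerance of `1e-4` are assumptions made for illustration and are not part of Pyserini.

```python
def scores_agree(reported: float, reproduced: float, tol: float = 1e-4) -> bool:
    """Return True when two metric values differ by at most `tol`.

    `tol` defaults to 1e-4 (an assumed value), i.e. one unit in the
    fourth decimal place, matching the precision used in this guide.
    """
    return abs(reported - reproduced) <= tol


# A fifth-digit difference is treated as a match ...
print(scores_agree(0.42474, 0.42476))
# ... while a difference in the third decimal place is not.
print(scores_agree(0.4247, 0.4257))
```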

Summary

The tables below summarize the Vector PRF results for all available models and datasets in Pyserini:

Passage Ranking Datasets

TREC DL 2019 Passage

| Model | Method | MAP | nDCG@10 | nDCG@100 | Recall@1000 |
|:------|:-------|----:|--------:|---------:|------------:|
| ANCE | Original | 0.3710 | 0.6452 | 0.5540 | 0.7554 |
| ANCE | Average PRF 3 | 0.4247 | 0.6532 | 0.5937 | 0.7739 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.4211 | 0.6539 | 0.5928 | 0.7825 |
| TCT-ColBERT V1 | Original | 0.3906 | 0.6700 | 0.5730 | 0.7916 |
| TCT-ColBERT V1 | Average PRF 3 | 0.4336 | 0.6639 | 0.6119 | 0.8230 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.4463 | 0.6875 | 0.6143 | 0.8393 |
| TCT-ColBERT V2 HN+ | Original | 0.4469 | 0.7204 | 0.6318 | 0.8261 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.4879 | 0.7312 | 0.6719 | 0.8586 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.4883 | 0.7111 | 0.6684 | 0.8694 |
| DistilBERT KD | Original | 0.4053 | 0.6994 | 0.5765 | 0.7653 |
| DistilBERT KD | Average PRF 3 | 0.4575 | 0.7096 | 0.6217 | 0.7939 |
| DistilBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.4548 | 0.7052 | 0.6189 | 0.8049 |
| DistilBERT Balanced | Original | 0.4590 | 0.7210 | 0.6360 | 0.8406 |
| DistilBERT Balanced | Average PRF 3 | 0.4856 | 0.7190 | 0.6526 | 0.8515 |
| DistilBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.4974 | 0.7231 | 0.6684 | 0.8775 |
| SBERT | Original | 0.4060 | 0.6930 | 0.5985 | 0.7872 |
| SBERT | Average PRF 3 | 0.4354 | 0.7001 | 0.6149 | 0.7937 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4371 | 0.6952 | 0.6149 | 0.7941 |
| ADORE | Original | 0.4188 | 0.6832 | 0.5946 | 0.7759 |
| ADORE | Average PRF 3 | 0.4672 | 0.6958 | 0.6263 | 0.7890 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4629 | 0.7021 | 0.6325 | 0.7950 |

TREC DL 2020 Passage

| Model | Method | MAP | nDCG@10 | nDCG@100 | Recall@1000 |
|:------|:-------|----:|--------:|---------:|------------:|
| ANCE | Original | 0.4076 | 0.6458 | 0.5679 | 0.7764 |
| ANCE | Average PRF 3 | 0.4325 | 0.6573 | 0.5793 | 0.7909 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.4315 | 0.6471 | 0.5800 | 0.7957 |
| TCT-ColBERT V1 | Original | 0.4290 | 0.6678 | 0.5826 | 0.8181 |
| TCT-ColBERT V1 | Average PRF 3 | 0.4725 | 0.6957 | 0.6101 | 0.8667 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.4625 | 0.6945 | 0.6056 | 0.8576 |
| TCT-ColBERT V2 HN+ | Original | 0.4754 | 0.6882 | 0.6206 | 0.8429 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.4811 | 0.6836 | 0.6228 | 0.8579 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.4860 | 0.6804 | 0.6254 | 0.8518 |
| DistilBERT KD | Original | 0.4159 | 0.6447 | 0.5728 | 0.7953 |
| DistilBERT KD | Average PRF 3 | 0.4214 | 0.6316 | 0.5755 | 0.8403 |
| DistilBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.4145 | 0.6289 | 0.5760 | 0.8433 |
| DistilBERT Balanced | Original | 0.4698 | 0.6854 | 0.6346 | 0.8727 |
| DistilBERT Balanced | Average PRF 3 | 0.4887 | 0.7086 | 0.6449 | 0.9030 |
| DistilBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.4879 | 0.7083 | 0.6470 | 0.8926 |
| SBERT | Original | 0.4124 | 0.6344 | 0.5734 | 0.7937 |
| SBERT | Average PRF 3 | 0.4258 | 0.6412 | 0.5781 | 0.8169 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4342 | 0.6559 | 0.5851 | 0.8226 |
| ADORE | Original | 0.4418 | 0.6655 | 0.5949 | 0.8151 |
| ADORE | Average PRF 3 | 0.4706 | 0.7086 | 0.6176 | 0.8323 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4760 | 0.7019 | 0.6193 | 0.8251 |

MS MARCO Passage V1

PRF does not perform well with sparse relevance judgments such as those in MS MARCO; the results here are included only for completeness.

| Model | Method | MAP | nDCG@100 | Recall@1000 | MRR@10 |
|:------|:-------|----:|---------:|------------:|-------:|
| ANCE | Original | 0.3362 | 0.4457 | 0.9587 | 0.3302 |
| ANCE | Average PRF 3 | 0.3133 | 0.4247 | 0.9490 | 0.3073 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.3115 | 0.4250 | 0.9545 | 0.3048 |
| TCT-ColBERT V1 | Original | 0.3416 | 0.4514 | 0.9640 | 0.3350 |
| TCT-ColBERT V1 | Average PRF 3 | 0.2882 | 0.4014 | 0.9452 | 0.2816 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.2809 | 0.3988 | 0.9543 | 0.2740 |
| TCT-ColBERT V2 HN+ | Original | 0.3644 | 0.4750 | 0.9695 | 0.3590 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.3183 | 0.4325 | 0.9585 | 0.2995 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.3190 | 0.4360 | 0.9659 | 0.2933 |
| DistilBERT KD | Original | 0.3309 | 0.4391 | 0.9553 | 0.3250 |
| DistilBERT KD | Average PRF 3 | 0.2830 | 0.3940 | 0.9325 | 0.2470 |
| DistilBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.2787 | 0.3937 | 0.9432 | 0.2716 |
| DistilBERT Balanced | Original | 0.3515 | 0.4651 | 0.9771 | 0.3443 |
| DistilBERT Balanced | Average PRF 3 | 0.2979 | 0.4151 | 0.9613 | 0.2630 |
| DistilBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.2969 | 0.4178 | 0.9702 | 0.2897 |
| SBERT | Original | 0.3373 | 0.4453 | 0.9558 | 0.3314 |
| SBERT | Average PRF 3 | 0.3094 | 0.4183 | 0.9446 | 0.3035 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.3034 | 0.4157 | 0.9529 | 0.2974 |
| ADORE | Original | 0.3523 | 0.4637 | 0.9688 | 0.3466 |
| ADORE | Average PRF 3 | 0.3188 | 0.4330 | 0.9583 | 0.3127 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.3209 | 0.4376 | 0.9669 | 0.3145 |

Reproducing Results

To reproduce Average Vector PRF with different models, use the same command with different parameter values:

$ python -m pyserini.search.faiss --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg
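
As a rough illustration of what `--prf-method avg` with `--prf-depth 3` computes, the sketch below forms the new query as the element-wise mean of the original query embedding and the embeddings of the top-ranked passages, following the description of Average PRF in the cited paper. The function name `average_prf` and the toy 2-dimensional vectors are assumptions made for illustration; this is not Pyserini's own implementation.

```python
from typing import List, Sequence


def average_prf(query: Sequence[float],
                passages: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the query embedding and the feedback passage embeddings."""
    vectors = [query, *passages]
    dim = len(query)
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimensionality")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy embeddings: one query and three feedback passages (PRF depth 3).
q = [1.0, 0.0]
top3 = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
new_q = average_prf(q, top3)
print(new_q)  # [0.5, 0.5]
```

The refined query is then used for a second retrieval pass against the same index.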

To reproduce Rocchio Vector PRF with different models, use a similar command with the Rocchio weights added:

$ python -m pyserini.search.faiss --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.rocchio_prf5_a0.4_b0.6.trec \
    --prf-depth 5 \
    --prf-method rocchio \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6
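
The Rocchio variant can be sketched in the same way. The version below assumes the common positive-feedback-only form of the update, in which the new query is `alpha` times the original query embedding plus `beta` times the centroid of the feedback passage embeddings. The function name `rocchio_prf` and the toy vectors are hypothetical, and Pyserini's implementation may differ in details such as negative feedback.

```python
from typing import List, Sequence


def rocchio_prf(query: Sequence[float],
                passages: Sequence[Sequence[float]],
                alpha: float = 0.4,
                beta: float = 0.6) -> List[float]:
    """alpha * query + beta * centroid(feedback passages), element-wise."""
    if not passages:
        raise ValueError("at least one feedback passage is required")
    dim = len(query)
    if any(len(p) != dim for p in passages):
        raise ValueError("all embeddings must have the same dimensionality")
    centroid = [sum(p[i] for p in passages) / len(passages) for i in range(dim)]
    return [alpha * query[i] + beta * centroid[i] for i in range(dim)]


# Toy embeddings: one query and five identical feedback passages (PRF depth 5).
q = [1.0, 0.0]
top5 = [[0.0, 1.0]] * 5
new_q = rocchio_prf(q, top5, alpha=0.4, beta=0.6)
print(new_q)  # [0.4, 0.6]
```

With `alpha` 0.4 and `beta` 0.6, the feedback passages contribute more to the refined query than the original query does.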

The values of --topics, --index, and --encoder depend on the model and dataset. Because Pyserini ships with all of these datasets, switching to a different dataset or model only requires passing different values:

--topics:
    TREC DL 2019 Passage: dl19-passage
    TREC DL 2020 Passage: dl20
    MS MARCO Passage V1: msmarco-passage-dev-subset

--index:
    ANCE index with MS MARCO V1 passage collection: msmarco-passage-ance-bf
    TCT-ColBERT V1 index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-bf
    TCT-ColBERT V2 HN+ index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-v2-hnp-bf
    DistilBERT KD index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-margin_mse-T2-bf
    DistilBERT Balanced index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-tas_b-b256-bf
    SBERT index with MS MARCO V1 passage collection: msmarco-passage-sbert-bf

Note: TREC DL 2019, TREC DL 2020, and MS MARCO Passage V1 use the same passage collection, so each model uses the same index for all three datasets.

--encoder:
    ANCE: castorini/ance-msmarco-passage
    TCT-ColBERT V1: castorini/tct_colbert-msmarco
    TCT-ColBERT V2 HN+: castorini/tct_colbert-v2-hnp-msmarco
    DistilBERT KD: sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco
    DistilBERT Balanced: sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco
    SBERT: sentence-transformers/msmarco-distilbert-base-v3

Note: If you have pre-computed queries available, you can replace --encoder with --encoded-queries and pass the path to your pre-computed query file, which avoids encoding queries on the fly. For example, Pyserini provides pre-computed ANCE queries for MS MARCO Passage V1, so instead of --encoder castorini/ance-msmarco-passage, you can use --encoded-queries ance-msmarco-passage-dev-subset. For the ADORE model, only --encoded-queries can be used, because on-the-fly encoding is not available.
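
The parameter lists above can be collected into lookup tables so that a command is assembled from a dataset and model name. The sketch below is one possible way to do that; the short dictionary keys, the function name `build_command`, and the output file naming are assumptions for illustration, while the topic, index, and encoder values are copied from the lists above and the module name follows the `pyserini.search.faiss` examples in this guide. ADORE is left out because it requires --encoded-queries.

```python
import shlex
from typing import Dict, List, Tuple

# Dataset label (hypothetical key) -> value for --topics.
TOPICS: Dict[str, str] = {
    "dl19-passage": "dl19-passage",
    "dl20-passage": "dl20",
    "msmarco-passage": "msmarco-passage-dev-subset",
}

# Model label (hypothetical key) -> (value for --index, value for --encoder).
MODELS: Dict[str, Tuple[str, str]] = {
    "ance": ("msmarco-passage-ance-bf",
             "castorini/ance-msmarco-passage"),
    "tct_colbert": ("msmarco-passage-tct_colbert-bf",
                    "castorini/tct_colbert-msmarco"),
    "tct_colbert-v2-hnp": ("msmarco-passage-tct_colbert-v2-hnp-bf",
                           "castorini/tct_colbert-v2-hnp-msmarco"),
    "distilbert-kd": ("msmarco-passage-distilbert-dot-margin_mse-T2-bf",
                      "sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco"),
    "distilbert-balanced": ("msmarco-passage-distilbert-dot-tas_b-b256-bf",
                            "sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco"),
    "sbert": ("msmarco-passage-sbert-bf",
              "sentence-transformers/msmarco-distilbert-base-v3"),
}


def build_command(dataset: str, model: str, method: str, depth: int,
                  alpha: float = 0.4, beta: float = 0.6) -> List[str]:
    """Assemble the argument list for one Vector PRF run."""
    if method not in ("avg", "rocchio"):
        raise ValueError("method must be 'avg' or 'rocchio'")
    index, encoder = MODELS[model]
    if method == "avg":
        tag = f"average_prf{depth}"
    else:
        tag = f"rocchio_prf{depth}_a{alpha}_b{beta}"
    cmd = [
        "python", "-m", "pyserini.search.faiss",
        "--topics", TOPICS[dataset],
        "--index", index,
        "--encoder", encoder,
        "--batch-size", "64",
        "--threads", "12",
        "--output", f"runs/run.{model}.{dataset}.{tag}.trec",
        "--prf-depth", str(depth),
        "--prf-method", method,
    ]
    if method == "rocchio":
        cmd += ["--rocchio-alpha", str(alpha), "--rocchio-beta", str(beta)]
    return cmd


# Print a shell-ready command for ANCE Average PRF 3 on TREC DL 2019.
print(shlex.join(build_command("dl19-passage", "ance", "avg", 3)))
```

Returning a list rather than a string keeps each argument separate, so the result can be passed to `subprocess.run` without further quoting.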

With these parameters, you can reproduce any of the results above. For example, to reproduce TREC DL 2019 Passage with ANCE Average Vector PRF 3, the command is:

$ python -m pyserini.search.faiss --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg

To reproduce TREC DL 2019 Passage with ANCE Rocchio Vector PRF 5 Alpha 0.4 Beta 0.6, the command will be:

$ python -m pyserini.search.faiss --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.rocchio_prf5_a0.4_b0.6.trec \
    --prf-method rocchio \
    --prf-depth 5 \
    --rocchio-topk 5 \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6

To evaluate, we use the trec_eval tool built into Pyserini.

For TREC DL 2019, use this command to evaluate your run file:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl19-passage runs/run.ance.dl19-passage.average_prf3.trec
map                 all     0.4247
ndcg_cut_100        all     0.5937
recall_1000         all     0.7739

The qrels file is available in Pyserini, so simply replace runs/run.ance.dl19-passage.average_prf3.trec with the path to your own run file to check your reproduced results.

Similarly, for TREC DL 2020:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl20-passage runs/run.ance.dl20-passage.average_prf3.trec
map                 all     0.4325
ndcg_cut_100        all     0.5793
recall_1000         all     0.7909

The qrels file is also available in Pyserini, so simply replace runs/run.ance.dl20-passage.average_prf3.trec with the path to your own run file to check your reproduced results.

For MS MARCO Passage V1, omit the -l 2 option, because its relevance judgments are binary:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 msmarco-passage-dev-subset runs/run.ance.msmarco-passage.average_prf3.trec
map                 all     0.3133
ndcg_cut_100        all     0.4247
recall_1000         all     0.9490

The qrels file is again available in Pyserini, so simply replace runs/run.ance.msmarco-passage.average_prf3.trec with the path to your own run file to check your reproduced results.
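
If you want to compare the evaluation output with the tables programmatically, the three-column lines printed above can be parsed into a dictionary. The helper below is a small sketch that assumes the `metric all value` layout shown in the sample outputs; the function name `parse_trec_eval` is hypothetical and not part of Pyserini.

```python
from typing import Dict


def parse_trec_eval(output: str) -> Dict[str, float]:
    """Map metric names to values for lines of the form '<metric> all <value>'."""
    scores: Dict[str, float] = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip anything that is not a three-column summary line.
        if len(parts) != 3 or parts[1] != "all":
            continue
        try:
            scores[parts[0]] = float(parts[2])
        except ValueError:
            continue  # non-numeric value, e.g. a run identifier
    return scores


# Sample output copied from the TREC DL 2019 evaluation above.
sample = """\
map                 all     0.4247
ndcg_cut_100        all     0.5937
recall_1000         all     0.7739
"""
scores = parse_trec_eval(sample)
print(scores["map"], scores["ndcg_cut_100"], scores["recall_1000"])
```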

Reproduction Log