- Download either the training or the evaluation input query JSON file. These files can be found under `data/treccastweb/2019/data` if you cloned the submodules for this repo. Save the path to a variable (see the example after this list):

  ```
  export input_query_json=data/treccastweb/2019/data
  ```
- Download the answer (qrels) file for training or evaluation. The training answer file is found under `data/treccastweb/2019/data`. Save its path to a variable as well, e.g. `export qrel=<path to answer file>`, since the evaluation command below reads `$qrel`.
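For example, to run on the training topics, the variables might be set as follows. The exact file names below come from the `treccastweb` submodule and are an assumption here; check your local checkout.

```
# Assumed file names from the treccastweb submodule -- adjust to your checkout
export input_query_json=data/treccastweb/2019/data/training/train_topics_v1.0.json
export qrel=data/treccastweb/2019/data/training/train_topics_mod.qrel
```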
The following command is for HQE, but you can also run other CQR methods using `t5` or `fusion` instead of `hqe` as the input to the `--experiment` flag. Running the command for the first time will download the CAsT 2019 index (or whatever index is specified for the `--sparse_index` flag). It is also possible to supply a path to a local directory containing the index.
```
python -m experiments.run_retrieval \
    --experiment hqe \
    --hits 1000 \
    --sparse_index cast2019 \
    --qid_queries $input_query_json \
    --output ./output/hqe_bm25
```
The experiment will output the retrieval results at the specified location in TSV format. By default, this will perform retrieval using only BM25, but you can add the `--rerank` flag to further rerank these results using BERT. For other command line arguments, see `run_retrieval.py`.
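For instance, a reranked HQE run might look like the following (the output path `./output/hqe_bm25_bert` is just an illustrative choice):

```
python -m experiments.run_retrieval \
    --experiment hqe \
    --hits 1000 \
    --sparse_index cast2019 \
    --qid_queries $input_query_json \
    --output ./output/hqe_bm25_bert \
    --rerank
```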
Convert the TSV file from above to TREC format, then use the `trec_eval` tool to evaluate the results in terms of Recall@1000, mAP, and NDCG@1 and NDCG@3.
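One way to do the conversion is with Pyserini's run converter, assuming the TSV output follows the MS MARCO run format (query ID, document ID, rank) that the converter expects:

```
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
    --input ./output/hqe_bm25 \
    --output ./output/hqe_bm25.trec
```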
```
python -m pyserini.eval.trec_eval -c -mndcg_cut.3,1 -mrecall.1000 -mmap $qrel ./output/hqe_bm25.trec
```
Results for the CAsT 2019 evaluation dataset are provided below. The results may differ slightly from the numbers reported in the paper due to implementation differences across Hugging Face Transformers and spaCy versions. As of writing, we use `spacy==2.2.4` with the English model `en_core_web_sm==2.2.5`, and `transformers==4.0.0`.
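One way to pin these versions in a pip-based environment is shown below; the model download URL follows spaCy's standard release pattern, so treat this as a sketch rather than the project's official setup.

```
pip install spacy==2.2.4 transformers==4.0.0
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
```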
|             | HQE BM25 | HQE BM25 + BERT | T5 BM25 | T5 BM25 + BERT | Fusion BM25 | Fusion BM25 + BERT |
| ----------- | -------- | --------------- | ------- | -------------- | ----------- | ------------------ |
| mAP         | 0.2109   | 0.3058          | 0.2250  | 0.3555         | 0.2584      | 0.3739             |
| Recall@1000 | 0.7322   | 0.7322          | 0.7392  | 0.7392         | 0.8028      | 0.8028             |
| NDCG@1      | 0.2640   | 0.4745          | 0.2842  | 0.5751         | 0.3353      | 0.5838             |
| NDCG@3      | 0.2606   | 0.4798          | 0.2954  | 0.5464         | 0.3247      | 0.5640             |
- Results reproduced by @saileshnankani on 2021-05-07 (commit `3847d15`) (Fusion BM25)
- Results reproduced by @ArthurChen189 on 2021-05-07 (commit `ef3a271`) (Fusion BM25)
- Results reproduced by @andrewyguo on 2021-05-07 (commit `79f89dc`) (Fusion BM25)