- Download either the training or the evaluation input query JSON file. These files can be found under `data/treccastweb/2019/data` if you cloned the submodules for this repo. Save the path to a variable (see the example after this list):

  ```
  export input_query_json=data/treccastweb/2019/data
  ```
- Download the answer (qrels) file for training or evaluation. The training answer file is found under `data/treccastweb/2019/data`. Save its path to a variable as well, e.g. `export qrel=<path to answer file>`, since the evaluation command below reads `$qrel`.
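For example, to run on the training topics, the variables might be set as follows. The exact file names below come from the `treccastweb` submodule and are an assumption here; check your local checkout.

```
# Assumed file names from the treccastweb submodule -- adjust to your checkout
export input_query_json=data/treccastweb/2019/data/training/train_topics_v1.0.json
export qrel=data/treccastweb/2019/data/training/train_topics_mod.qrel
```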
The following command is for HQE, but you can also run other CQR methods using `t5` or `fusion` instead of `hqe` as the input to the `--experiment` flag. Running the command for the first time will download the CAsT 2019 index (or whatever index is specified for the `--sparse_index` flag). It is also possible to supply a path to a local directory containing the index.
```
python -m experiments.run_retrieval \
    --experiment hqe \
    --hits 1000 \
    --sparse_index cast2019 \
    --qid_queries $input_query_json \
    --output ./output/hqe_bm25
```
The experiment will output the retrieval results at the specified location in TSV format. By default, this will perform retrieval using only BM25, but you can add the `--rerank` flag to further rerank these results using BERT. For other command line arguments, see `run_retrieval.py`.
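For instance, a reranked HQE run might look like the following (the output path `./output/hqe_bm25_bert` is just an illustrative choice):

```
python -m experiments.run_retrieval \
    --experiment hqe \
    --hits 1000 \
    --sparse_index cast2019 \
    --qid_queries $input_query_json \
    --output ./output/hqe_bm25_bert \
    --rerank
```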
Convert the TSV file from above to TREC format, then use the `trec_eval` tool to evaluate the results in terms of Recall@1000, mAP, and NDCG@1 and NDCG@3.
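One way to do the conversion is with Pyserini's run converter, assuming the TSV output follows the MS MARCO run format (query ID, document ID, rank) that the converter expects:

```
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
    --input ./output/hqe_bm25 \
    --output ./output/hqe_bm25.trec
```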
```
python -m pyserini.eval.trec_eval -c -mndcg_cut.3,1 -mrecall.1000 -mmap $qrel ./output/hqe_bm25.trec
```
Results for the CAsT 2019 evaluation dataset are provided below. The results may differ slightly from the numbers reported in the paper due to implementation differences across Hugging Face Transformers and spaCy versions. As of writing, we use `spacy==2.2.4` with the English model `en_core_web_sm==2.2.5`, and `transformers==4.0.0`.
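One way to pin these versions in a pip-based environment is shown below; the model download URL follows spaCy's standard release pattern, so treat this as a sketch rather than the project's official setup.

```
pip install spacy==2.2.4 transformers==4.0.0
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
```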
|             | HQE BM25 | HQE BM25 + BERT | T5 BM25 | T5 BM25 + BERT | Fusion BM25 | Fusion BM25 + BERT |
| ----------- | -------- | --------------- | ------- | -------------- | ----------- | ------------------ |
| mAP         | 0.2109   | 0.3058          | 0.2250  | 0.3555         | 0.2584      | 0.3739             |
| Recall@1000 | 0.7322   | 0.7322          | 0.7392  | 0.7392         | 0.8028      | 0.8028             |
| NDCG@1      | 0.2640   | 0.4745          | 0.2842  | 0.5751         | 0.3353      | 0.5838             |
| NDCG@3      | 0.2606   | 0.4798          | 0.2954  | 0.5464         | 0.3247      | 0.5640             |
- Results reproduced by @saileshnankani on 2021-05-07 (commit `3847d15`) (Fusion BM25)
- Results reproduced by @ArthurChen189 on 2021-05-07 (commit `ef3a271`) (Fusion BM25)
- Results reproduced by @andrewyguo on 2021-05-07 (commit `79f89dc`) (Fusion BM25)