Fine-tune on MS-MARCO Passage Ranking Task

Download MS-MARCO Collections

bash get_data.sh

Fine-tune on MS-MARCO

Finetuning on MS-MARCO dataset involves a two-stage pipeline

s1: BM25 negs
s2: Mined negatives from s1

These pipelines are bootstraped in train_dpr_msmarco.sh. The pre-trained checkpoint on MS-MARCO Passage Corpus is released in bowdpr/bowdpr_marco. Assume the checkpoints are already placed in examples/results/$MODEL_NAME/model (You can set $MODEL_NAME to any name as you wish), please execute the fine-tuning pipelines by just run:

bash train_dpr_msmarco.sh $MODEL_NAME

Directly Test a Retreiver

We have released the fine-tuned MS-MARCO retriever to Huggingface. Please execute the following script to test the retrieval performances.

# Save the scores of retrieval results to this folder. Change to any temporary folder as you wish
mkdir -p results/msmarco
bash test_dpr_msmarco.sh bowdpr/bowdpr_marco_ft results/msmarco

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Fine-tune on MS-MARCO Passage Ranking Task

Download MS-MARCO Collections