Codebase for the paper "Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval". We develop an encoder-only pre-training scheme for dense retrieval, named Bag-of-Word Prediction, which directly compresses lexical information into dense representations.
Please install Faiss by following its installation guidelines. Then you can set up the environment by cloning this repo and running the following command.
pip install -e .
We have released multiple models pre-trained with Bag-of-Word Prediction.
Model | Description |
---|---|
bowdpr/bowdpr_wiki | Pre-trained Model on Wikipedia and BookCorpus. |
bowdpr/bowdpr_marco | Pre-trained Model on MS-MARCO Passages. |
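The released checkpoints can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch, assuming the models are BERT-compatible encoders hosted on the Hub under the names listed above:

```python
from transformers import AutoModel, AutoTokenizer

# Hub name taken from the table above; assumes a BERT-compatible encoder checkpoint.
model_name = "bowdpr/bowdpr_marco"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a passage and take the [CLS] hidden state as its dense representation.
inputs = tokenizer("A sample passage about dense retrieval.", return_tensors="pt")
outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)
```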
The released fine-tuned retrievers are listed as follows.
Model | Description |
---|---|
bowdpr/bowdpr_marco_ft | Retriever initialized from bowdpr/bowdpr_marco and fine-tuned on MS-MARCO. |
bowdpr/bowdpr_wiki_nqft | Retriever initialized from bowdpr/bowdpr_wiki and fine-tuned on Natural Questions. |
bowdpr/bowdpr_wiki_triviaft | Retriever initialized from bowdpr/bowdpr_wiki and fine-tuned on TriviaQA. |
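A fine-tuned retriever can be used to score query–passage pairs. The sketch below assumes [CLS] pooling and dot-product similarity, which is the common setup for DPR-style retrievers; please check the fine-tuning scripts for the exact pooling and normalization used in this repo:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bowdpr/bowdpr_marco_ft"  # fine-tuned retriever from the table above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def encode(texts):
    # [CLS] pooling is assumed here; the repo's scripts define the actual pooling.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0]

queries = encode(["what is dense passage retrieval"])
passages = encode([
    "Dense passage retrieval encodes queries and passages into vectors.",
    "An unrelated passage about cooking pasta.",
])
scores = queries @ passages.T  # dot-product relevance scores, shape (1, 2)
print(scores)
```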
Our model achieves a considerable pre-training speedup compared to previous MAE-style pre-training methods. The speed test is conducted with a batch size of 64, a maximum sequence length of 512, and 8 dataloader workers.
Model | Architecture | Data Process Complexity | Data Process Time (s) | Additional Decoder Complexity | Additional Decoder GPU Time (s) | Training Speed (samples/s) | Speed Degradation
---|---|---|---|---|---|---|---
Pure MLM Pre-train | Encoder | O(n) | 0.0476 | - | - | 269.708 | - |
Auto-Encoding | Encoder-Decoder | O(n) | 0.0940 | O(n^2) | 0.0013 | 222.658 | 17.4% |
Auto-Regression | Encoder-Decoder | O(n) | 0.0636 | O(n^2) | 0.0030 | 215.136 | 20.2% |
Enhanced Decoding | Encoder-Decoder | O(n^2) | 5.6261 | O(n^2) | 0.0012 | 85.797 | 68.2% |
BoW Prediction | Encoder | O(n) | 0.0533 | - | 0.0002 | 266.359 | 1.2% |
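For intuition on why no decoder is needed, the objective can be sketched as predicting the bag of words of the input directly from the [CLS] representation through a vocabulary-sized projection. The snippet below is an illustrative sketch only; the target construction and normalization are assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def bow_prediction_loss(cls_hidden, input_ids, vocab_projection, pad_token_id=0):
    """Illustrative bag-of-words prediction loss: recover which vocabulary
    tokens occur in the input from the [CLS] representation alone."""
    logits = vocab_projection(cls_hidden)                     # (batch, vocab_size)
    bag = torch.zeros_like(logits)
    mask = input_ids.ne(pad_token_id)
    bag.scatter_(1, input_ids.masked_fill(~mask, pad_token_id), 1.0)  # multi-hot bag of words
    bag[:, pad_token_id] = 0.0                                # ignore padding token
    bag = bag / bag.sum(dim=-1, keepdim=True).clamp(min=1.0)  # normalize to a distribution
    return -(bag * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

In MAE-style methods this reconstruction signal comes from an auxiliary decoder; here it is a single projection on the encoder side, which is what removes the decoder cost in the table above.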
Our model achieves state-of-the-art retrieval performance on multiple retrieval benchmarks, without using any special masking, context sampling, data augmentation, or task-ensembling techniques.
Model | MS-MARCO MRR@10 | MS-MARCO R@50 | MS-MARCO R@1k | NQ R@5 | NQ R@20 | NQ R@100 | TriviaQA R@5 | TriviaQA R@20 | TriviaQA R@100
---|---|---|---|---|---|---|---|---|---
RetroMAE | 39.3 | 87.0 | 98.5 | 74.4 | 84.4 | 89.4 | 78.9 | 84.5 | 88.0 |
BoW Prediction | 40.1 | 88.7 | 98.9 | 75.3 | 84.6 | 90.4 | 79.4 | 84.9 | 88.0 |
Please refer to the examples below to reproduce our work; a minimal Faiss retrieval sketch follows the list.
- Pre-training on Wikipedia & BookCorpus or MS-MARCO Passages
- Fine-tuning on MS-MARCO Passage Ranking Task
- Fine-tuning on Natural Questions or TriviaQA
- Fine-tuning on BEIR
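Once a corpus has been encoded, retrieval is typically performed with a Faiss index (installed above). The following is a minimal sketch using a flat inner-product index; the repository's evaluation scripts may shard the index or use GPU search instead:

```python
import faiss
import numpy as np

# Placeholder embeddings; in practice these come from the retriever's encoder.
passage_embeddings = np.random.rand(1000, 768).astype("float32")
query_embeddings = np.random.rand(4, 768).astype("float32")

index = faiss.IndexFlatIP(passage_embeddings.shape[1])  # exact inner-product search
index.add(passage_embeddings)

scores, passage_ids = index.search(query_embeddings, 100)  # top-100 passages per query
```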
If you encounter any bugs or have questions, please feel free to open an issue or contact me.
If you are interested in our work, please consider citing our paper.
@misc{ma2024bow_pred,
title={Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval},
author={Guangyuan Ma and Xing Wu and Zijia Lin and Songlin Hu},
year={2024},
eprint={2401.11248},
archivePrefix={arXiv},
primaryClass={cs.IR}
}