Request for Dense Indexed Data #1745
-
Hello, this is Delaram, an MSc student at the University of Windsor, Canada. I am currently researching query reformulation, specifically focusing on the robust04, dbpedia, clueweb09b, antique, and gov2 datasets using sparse retrieval methods. However, as I progress in my work, I need to use dense retrieval methods, which require their own version of indexed data. Unfortunately, I do not have the indexes for dense retrieval, nor do I have the document corpora to build them; I only have the sparse indexes.

Given your expertise and the remarkable work you've undertaken on dense retrieval methods, I was wondering if you or your team have generated dense indexes for the above datasets. If so, I would be grateful if you could share them with me. Access to dense indexes would be incredibly beneficial for my ongoing research.

Please let me know if you would be open to this, or if you have any specific requirements or conditions regarding the sharing of dense indexes. Your assistance and generosity would be greatly appreciated.
-
Hi @DelaramRajaei, thanks for your interest, and apologies for the late response; just getting to this now.

All the pre-built indexes that are available are already integrated into our reproducibility guides: https://github.com/castorini/pyserini/#%EF%B8%8F-reproducibility

If they're not listed there, then we haven't built them yet, so you'll have to do it yourself. All transformer models are available on Hugging Face. Our onboarding path walks you through the process of encoding a corpus: https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md

Hope this helps!
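For anyone landing on this thread: the "encode a corpus yourself" step can be sketched roughly as below using Pyserini's `pyserini.encode` module. This is a hedged sketch, not the exact invocation from the guide; the corpus path, output path, and the choice of `castorini/tct_colbert-v2-hnp-msmarco` as the encoder are placeholders/assumptions, so consult the linked onboarding docs for the authoritative command and flags.

```shell
# Assumes a JSONL corpus with one {"id": ..., "contents": ...} document per line.
# Encodes each document with a dense encoder from Hugging Face and writes the
# vectors out as a FAISS index that dense retrieval can search directly.
python -m pyserini.encode \
  input   --corpus collection/docs.jsonl \
          --fields text \
  output  --embeddings indexes/dense-index \
          --to-faiss \
  encoder --encoder castorini/tct_colbert-v2-hnp-msmarco \
          --fields text
```

Once built, the index can be searched from Python, e.g. with `FaissSearcher('indexes/dense-index', query_encoder)` from `pyserini.search.faiss`, using a query encoder that matches the document encoder.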