Lucene index of the MS MARCO V1 segmented document corpus, with doc2query-T5 expansions.
This index was generated on 2022/02/01 at Anserini commit 9ea315 on orca with the following command:
target/appassembler/bin/IndexCollection -collection JsonCollection \
-generator DefaultLuceneDocumentGenerator -threads 16 \
-input /store/collections/msmarco/msmarco-doc-segmented-docTTTTTquery/ \
-index indexes/lucene-index.msmarco-v1-doc-segmented-d2q-t5.20220201.9ea315/ \
-optimize
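Note that this index stores term frequencies only (the command above passes none of -storePositions, -storeDocvectors, or -storeRaw). It therefore supports bag-of-words queries, but not phrase queries, which need term positions, or relevance feedback, which needs document vectors. The raw document text is not stored either, so it cannot be fetched from the index.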
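To illustrate why a frequency-only index cannot answer phrase queries, here is a minimal Python sketch of a toy inverted index. It is not Anserini or Lucene code: the documents, the `tf_index` structure, and the `bag_of_words_score` helper are all hypothetical, and the score is a plain sum of term frequencies standing in for a real ranking function such as BM25.

```python
from collections import Counter

# Toy documents (hypothetical, not taken from MS MARCO).
docs = {
    "d1": "new york city",
    "d2": "york is a new city",
}

# Frequency-only postings: term -> {doc_id: term frequency}.
# Term positions are deliberately not recorded.
tf_index = {}
for doc_id, text in docs.items():
    for term, freq in Counter(text.split()).items():
        tf_index.setdefault(term, {})[doc_id] = freq


def bag_of_words_score(query):
    """Sum the term frequencies of the query terms for each document."""
    scores = Counter()
    for term in query.split():
        for doc_id, freq in tf_index.get(term, {}).items():
            scores[doc_id] += freq
    return dict(scores)


print(bag_of_words_score("new york"))  # {'d1': 2, 'd2': 2}
```

Both documents get the same score, although only d1 contains "new york" as a contiguous phrase. Without stored positions the index has no information that could tell the two apart, which is the limitation described above.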