msmarco-v1-doc-segmented-d2q-t5

Lucene index of the MS MARCO V1 segmented document corpus, with doc2query-T5 expansions.

This index was generated on 2022/02/01 at Anserini commit 9ea315 on orca with the following command:

target/appassembler/bin/IndexCollection -collection JsonCollection \
  -generator DefaultLuceneDocumentGenerator -threads 16 \
  -input /store/collections/msmarco/msmarco-doc-segmented-docTTTTTquery/ \
  -index indexes/lucene-index.msmarco-v1-doc-segmented-d2q-t5.20220201.9ea315/ \
  -optimize

Note that this index stores term frequencies only. It supports bag-of-words queries, but not phrase queries or relevance feedback. In addition, the raw document text is not stored, so it cannot be fetched from the index.
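
For readers who need the features this index lacks, the following is a minimal sketch of how a fuller index might be built. It reuses the command above and assumes that the `-storePositions`, `-storeDocvectors`, and `-storeRaw` options of `IndexCollection` are available at this Anserini commit. The output directory `INDEX_DIR` is a hypothetical name chosen for illustration, not a published index. The script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: the IndexCollection invocation from this README plus three
# store options. INDEX_DIR is a hypothetical output path (an assumption),
# not the name of any released index.
INDEX_DIR="indexes/lucene-index.msmarco-v1-doc-segmented-d2q-t5.full/"

# -storePositions  : keep term positions (needed for phrase queries)
# -storeDocvectors : keep document vectors (needed for relevance feedback)
# -storeRaw        : keep the raw document text (needed to fetch it later)
CMD="target/appassembler/bin/IndexCollection -collection JsonCollection \
  -generator DefaultLuceneDocumentGenerator -threads 16 \
  -input /store/collections/msmarco/msmarco-doc-segmented-docTTTTTquery/ \
  -index $INDEX_DIR \
  -storePositions -storeDocvectors -storeRaw \
  -optimize"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
else
  echo "$CMD"
fi
```

An index built this way would be considerably larger on disk than the term-frequency-only index described here, which is the trade-off behind the minimal configuration.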