diff --git a/README.md b/README.md
index ab96899832..007e5cced5 100644
--- a/README.md
+++ b/README.md
@@ -10,8 +10,8 @@ Anserini
 Anserini is a toolkit for reproducible information retrieval research.
 By building on Lucene, we aim to bridge the gap between academic information retrieval research and the practice of building real-world search applications.
 Among other goals, our effort aims to be [the opposite of this](http://phdcomics.com/comics/archive.php?comicid=1689).[*](docs/reproducibility.md)
-Anserini grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016).
-See [Yang et al. (SIGIR 2017)](https://dl.acm.org/authorize?N47337) and [Yang et al. (JDIQ 2018)](https://dl.acm.org/citation.cfm?doid=3289400.3239571) for overviews.
+Anserini grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://link.springer.com/chapter/10.1007/978-3-319-30671-1_30) (Lin et al., ECIR 2016).
+See [Yang et al. (SIGIR 2017)](https://dl.acm.org/doi/10.1145/3077136.3080721) and [Yang et al. (JDIQ 2018)](https://dl.acm.org/doi/10.1145/3239571) for overviews.
 
 ## 🎬 Getting Started
 
@@ -64,29 +64,29 @@ See individual pages for details!
### MS MARCO V1 Passage Regressions -| | dev | DL19 | DL20 | -|---------------------------------------------|:------------------------------------------------------------------------:|:---------------------------------------------------------------------:|:---------------------------------------------------------------------:| -| **Unsupervised Sparse Lexical** | | | | -| BoW baselines | [+](docs/regressions-msmarco-passage.md) | [+](docs/regressions-dl19-passage.md) | [+](docs/regressions-dl20-passage.md) | -| Quantized BM25 | [✓](docs/regressions-msmarco-passage-bm25-b8.md) | [✓](docs/regressions-dl19-passage-bm25-b8.md) | [✓](docs/regressions-dl20-passage-bm25-b8.md) | -| WP baselines | [+](docs/regressions-msmarco-passage-wp.md) | [+](docs/regressions-dl19-passage-wp.md) | [+](docs/regressions-dl20-passage-wp.md) | -| Huggingface WP baselines | [+](docs/regressions-msmarco-passage-hgf-wp.md) | [+](docs/regressions-dl19-passage-hgf-wp.md) | [+](docs/regressions-dl20-passage-hgf-wp.md) | -| doc2query | [+](docs/regressions-msmarco-passage-doc2query.md) | | | -| doc2query-T5 | [+](docs/regressions-msmarco-passage-docTTTTTquery.md) | [+](docs/regressions-dl19-passage-docTTTTTquery.md) | [+](docs/regressions-dl20-passage-docTTTTTquery.md) | -| **Learned Sparse Lexical (uniCOIL family)** | | | | -| uniCOIL noexp | [✓](docs/regressions-msmarco-passage-unicoil-noexp.md) | [✓](docs/regressions-dl19-passage-unicoil-noexp.md) | [✓](docs/regressions-dl20-passage-unicoil-noexp.md) | -| uniCOIL with doc2query-T5 | [✓](docs/regressions-msmarco-passage-unicoil.md) | [✓](docs/regressions-dl19-passage-unicoil.md) | [✓](docs/regressions-dl20-passage-unicoil.md) | -| uniCOIL with TILDE | [✓](docs/regressions-msmarco-passage-unicoil-tilde-expansion.md) | | | -| **Learned Sparse Lexical (other)** | | | | -| DeepImpact | [✓](docs/regressions-msmarco-passage-deepimpact.md) | | | -| SPLADEv2 | [✓](docs/regressions-msmarco-passage-distill-splade-max.md) | | | -| SPLADE-distill 
CoCodenser-medium | [✓](docs/regressions-msmarco-passage-splade-distil-cocodenser-medium.md) | [✓](docs/regressions-dl19-passage-splade-distil-cocodenser-medium.md) | [✓](docs/regressions-dl20-passage-splade-distil-cocodenser-medium.md) | -| SPLADE++ CoCondenser-EnsembleDistil | [✓](docs/regressions-msmarco-passage-splade-pp-ed.md) | [✓](docs/regressions-dl19-passage-splade-pp-ed.md) | [✓](docs/regressions-dl20-passage-splade-pp-ed.md) | -| SPLADE++ CoCondenser-EnsembleDistil (ONNX) | [✓](docs/regressions-msmarco-passage-splade-pp-ed-onnx.md) | [✓](docs/regressions-dl19-passage-splade-pp-ed-onnx.md) | [✓](docs/regressions-dl20-passage-splade-pp-ed-onnx.md) | -| SPLADE++ CoCondenser-SelfDistil | [✓](docs/regressions-msmarco-passage-splade-pp-sd.md) | [✓](docs/regressions-dl19-passage-splade-pp-sd.md) | [✓](docs/regressions-dl20-passage-splade-pp-sd.md) | -| SPLADE++ CoCondenser-SelfDistil (ONNX) | [✓](docs/regressions-msmarco-passage-splade-pp-sd-onnx.md) | [✓](docs/regressions-dl19-passage-splade-pp-sd-onnx.md) | [✓](docs/regressions-dl20-passage-splade-pp-sd-onnx.md) | -| **Learned Dense** | | | | -| cosDPR-distil | [✓](docs/regressions-msmarco-passage-cos-dpr-distil.md) | | | | +| | dev | DL19 | DL20 | +|---------------------------------------------|:------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------:| +| **Unsupervised Sparse Lexical** | | | | +| BoW baselines | [+](docs/regressions/regressions-msmarco-passage.md) | [+](docs/regressions/regressions-dl19-passage.md) | [+](docs/regressions/regressions-dl20-passage.md) | +| Quantized BM25 | [✓](docs/regressions/regressions-msmarco-passage-bm25-b8.md) | [✓](docs/regressions/regressions-dl19-passage-bm25-b8.md) | [✓](docs/regressions/regressions-dl20-passage-bm25-b8.md) | +| WP baselines | 
[+](docs/regressions/regressions-msmarco-passage-wp.md) | [+](docs/regressions/regressions-dl19-passage-wp.md) | [+](docs/regressions/regressions-dl20-passage-wp.md) | +| Huggingface WP baselines | [+](docs/regressions/regressions-msmarco-passage-hgf-wp.md) | [+](docs/regressions/regressions-dl19-passage-hgf-wp.md) | [+](docs/regressions/regressions-dl20-passage-hgf-wp.md) | +| doc2query | [+](docs/regressions/regressions-msmarco-passage-doc2query.md) | | | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-passage-docTTTTTquery.md) | [+](docs/regressions/regressions-dl19-passage-docTTTTTquery.md) | [+](docs/regressions/regressions-dl20-passage-docTTTTTquery.md) | +| **Learned Sparse Lexical (uniCOIL family)** | | | | +| uniCOIL noexp | [✓](docs/regressions/regressions-msmarco-passage-unicoil-noexp.md) | [✓](docs/regressions/regressions-dl19-passage-unicoil-noexp.md) | [✓](docs/regressions/regressions-dl20-passage-unicoil-noexp.md) | +| uniCOIL with doc2query-T5 | [✓](docs/regressions/regressions-msmarco-passage-unicoil.md) | [✓](docs/regressions/regressions-dl19-passage-unicoil.md) | [✓](docs/regressions/regressions-dl20-passage-unicoil.md) | +| uniCOIL with TILDE | [✓](docs/regressions/regressions-msmarco-passage-unicoil-tilde-expansion.md) | | | +| **Learned Sparse Lexical (other)** | | | | +| DeepImpact | [✓](docs/regressions/regressions-msmarco-passage-deepimpact.md) | | | +| SPLADEv2 | [✓](docs/regressions/regressions-msmarco-passage-distill-splade-max.md) | | | +| SPLADE-distill CoCodenser-medium | [✓](docs/regressions/regressions-msmarco-passage-splade-distil-cocodenser-medium.md) | [✓](docs/regressions/regressions-dl19-passage-splade-distil-cocodenser-medium.md) | [✓](docs/regressions/regressions-dl20-passage-splade-distil-cocodenser-medium.md) | +| SPLADE++ CoCondenser-EnsembleDistil | [✓](docs/regressions/regressions-msmarco-passage-splade-pp-ed.md) | [✓](docs/regressions/regressions-dl19-passage-splade-pp-ed.md) | 
[✓](docs/regressions/regressions-dl20-passage-splade-pp-ed.md) | +| SPLADE++ CoCondenser-EnsembleDistil (ONNX) | [✓](docs/regressions/regressions-msmarco-passage-splade-pp-ed-onnx.md) | [✓](docs/regressions/regressions-dl19-passage-splade-pp-ed-onnx.md) | [✓](docs/regressions/regressions-dl20-passage-splade-pp-ed-onnx.md) | +| SPLADE++ CoCondenser-SelfDistil | [✓](docs/regressions/regressions-msmarco-passage-splade-pp-sd.md) | [✓](docs/regressions/regressions-dl19-passage-splade-pp-sd.md) | [✓](docs/regressions/regressions-dl20-passage-splade-pp-sd.md) | +| SPLADE++ CoCondenser-SelfDistil (ONNX) | [✓](docs/regressions/regressions-msmarco-passage-splade-pp-sd-onnx.md) | [✓](docs/regressions/regressions-dl19-passage-splade-pp-sd-onnx.md) | [✓](docs/regressions/regressions-dl20-passage-splade-pp-sd-onnx.md) | +| **Learned Dense** | | | | +| cosDPR-distil | [✓](docs/regressions/regressions-msmarco-passage-cos-dpr-distil.md) | | | | ### Available Corpora for Download @@ -109,20 +109,20 @@ See individual pages for details! 
### MS MARCO V1 Document Regressions -| | dev | DL19 | DL20 | -|-----------------------------------------------------------------------------------------------|:------------------------------------------------------------:|:---------------------------------------------------------:|:---------------------------------------------------------:| +| | dev | DL19 | DL20 | +|-----------------------------------------------------------------------------------------------|:------------------------------------------------------------------------:|:---------------------------------------------------------------------:|:---------------------------------------------------------------------:| | **Unsupervised Lexical, Complete Doc**[*](docs/experiments-msmarco-doc-doc2query-details.md) | -| BoW baselines | [+](docs/regressions-msmarco-doc.md) | [+](docs/regressions-dl19-doc.md) | [+](docs/regressions-dl20-doc.md) | -| WP baselines | [+](docs/regressions-msmarco-doc-wp.md) | [+](docs/regressions-dl19-doc-wp.md) | [+](docs/regressions-dl20-doc-wp.md) | -| Huggingface WP baselines | [+](docs/regressions-msmarco-doc-hgf-wp.md) | [+](docs/regressions-dl19-doc-hgf-wp.md) | [+](docs/regressions-dl20-doc-hgf-wp.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-doc-docTTTTTquery.md) | [+](docs/regressions-dl19-doc-docTTTTTquery.md) | [+](docs/regressions-dl20-doc-docTTTTTquery.md) | +| BoW baselines | [+](docs/regressions/regressions-msmarco-doc.md) | [+](docs/regressions/regressions-dl19-doc.md) | [+](docs/regressions/regressions-dl20-doc.md) | +| WP baselines | [+](docs/regressions/regressions-msmarco-doc-wp.md) | [+](docs/regressions/regressions-dl19-doc-wp.md) | [+](docs/regressions/regressions-dl20-doc-wp.md) | +| Huggingface WP baselines | [+](docs/regressions/regressions-msmarco-doc-hgf-wp.md) | [+](docs/regressions/regressions-dl19-doc-hgf-wp.md) | [+](docs/regressions/regressions-dl20-doc-hgf-wp.md) | +| doc2query-T5 | 
[+](docs/regressions/regressions-msmarco-doc-docTTTTTquery.md) | [+](docs/regressions/regressions-dl19-doc-docTTTTTquery.md) | [+](docs/regressions/regressions-dl20-doc-docTTTTTquery.md) | | **Unsupervised Lexical, Segmented Doc**[*](docs/experiments-msmarco-doc-doc2query-details.md) | -| BoW baselines | [+](docs/regressions-msmarco-doc-segmented.md) | [+](docs/regressions-dl19-doc-segmented.md) | [+](docs/regressions-dl20-doc-segmented.md) | -| WP baselines | [+](docs/regressions-msmarco-doc-segmented-wp.md) | [+](docs/regressions-dl19-doc-segmented-wp.md) | [+](docs/regressions-dl20-doc-segmented-wp.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-doc-segmented-docTTTTTquery.md) | [+](docs/regressions-dl19-doc-segmented-docTTTTTquery.md) | [+](docs/regressions-dl20-doc-segmented-docTTTTTquery.md) | +| BoW baselines | [+](docs/regressions/regressions-msmarco-doc-segmented.md) | [+](docs/regressions/regressions-dl19-doc-segmented.md) | [+](docs/regressions/regressions-dl20-doc-segmented.md) | +| WP baselines | [+](docs/regressions/regressions-msmarco-doc-segmented-wp.md) | [+](docs/regressions/regressions-dl19-doc-segmented-wp.md) | [+](docs/regressions/regressions-dl20-doc-segmented-wp.md) | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-doc-segmented-docTTTTTquery.md) | [+](docs/regressions/regressions-dl19-doc-segmented-docTTTTTquery.md) | [+](docs/regressions/regressions-dl20-doc-segmented-docTTTTTquery.md) | | **Learned Sparse Lexical** | -| uniCOIL noexp | [✓](docs/regressions-msmarco-doc-segmented-unicoil-noexp.md) | [✓](docs/regressions-dl19-doc-segmented-unicoil-noexp.md) | [✓](docs/regressions-dl20-doc-segmented-unicoil-noexp.md) | -| uniCOIL with doc2query-T5 | [✓](docs/regressions-msmarco-doc-segmented-unicoil.md) | [✓](docs/regressions-dl19-doc-segmented-unicoil.md) | [✓](docs/regressions-dl20-doc-segmented-unicoil.md) | +| uniCOIL noexp | [✓](docs/regressions/regressions-msmarco-doc-segmented-unicoil-noexp.md) | 
[✓](docs/regressions/regressions-dl19-doc-segmented-unicoil-noexp.md) | [✓](docs/regressions/regressions-dl20-doc-segmented-unicoil-noexp.md) | +| uniCOIL with doc2query-T5 | [✓](docs/regressions/regressions-msmarco-doc-segmented-unicoil.md) | [✓](docs/regressions/regressions-dl19-doc-segmented-unicoil.md) | [✓](docs/regressions/regressions-dl20-doc-segmented-unicoil.md) | ### Available Corpora for Download @@ -137,19 +137,19 @@ See individual pages for details! ### MS MARCO V2 Passage Regressions -| | dev | DL21 | DL22 | -|--------------------------------------------|:---------------------------------------------------------------:|:---------------------------------------------------------:|:---------------------------------------------------------:| +| | dev | DL21 | DL22 | +|--------------------------------------------|:---------------------------------------------------------------------------:|:---------------------------------------------------------------------:|:---------------------------------------------------------------------:| | **Unsupervised Lexical, Original Corpus** | -| baselines | [+](docs/regressions-msmarco-v2-passage.md) | [+](docs/regressions-dl21-passage.md) | [+](docs/regressions-dl22-passage.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-v2-passage-d2q-t5.md) | [+](docs/regressions-dl21-passage-d2q-t5.md) | [+](docs/regressions-dl22-passage-d2q-t5.md) | +| baselines | [+](docs/regressions/regressions-msmarco-v2-passage.md) | [+](docs/regressions/regressions-dl21-passage.md) | [+](docs/regressions/regressions-dl22-passage.md) | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-d2q-t5.md) | | **Unsupervised Lexical, Augmented Corpus** | -| baselines | [+](docs/regressions-msmarco-v2-passage-augmented.md) | [+](docs/regressions-dl21-passage-augmented.md) | 
[+](docs/regressions-dl22-passage-augmented.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-v2-passage-augmented-d2q-t5.md) | [+](docs/regressions-dl21-passage-augmented-d2q-t5.md) | [+](docs/regressions-dl22-passage-augmented-d2q-t5.md) | +| baselines | [+](docs/regressions/regressions-msmarco-v2-passage-augmented.md) | [+](docs/regressions/regressions-dl21-passage-augmented.md) | [+](docs/regressions/regressions-dl22-passage-augmented.md) | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md) | | **Learned Sparse Lexical** | -| uniCOIL noexp zero-shot | [✓](docs/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md) | [✓](docs/regressions-dl21-passage-unicoil-noexp-0shot.md) | [✓](docs/regressions-dl22-passage-unicoil-noexp-0shot.md) | -| uniCOIL with doc2query-T5 zero-shot | [✓](docs/regressions-msmarco-v2-passage-unicoil-0shot.md) | [✓](docs/regressions-dl21-passage-unicoil-0shot.md) | [✓](docs/regressions-dl22-passage-unicoil-0shot.md) | -| SPLADE++ CoCondenser-EnsembleDistil | [✓](docs/regressions-msmarco-v2-passage-splade-pp-ed.md) | [✓](docs/regressions-dl21-passage-splade-pp-ed.md) | [✓](docs/regressions-dl22-passage-splade-pp-ed.md) | -| SPLADE++ CoCondenser-SelfDistil | [✓](docs/regressions-msmarco-v2-passage-splade-pp-sd.md) | [✓](docs/regressions-dl21-passage-splade-pp-sd.md) | [✓](docs/regressions-dl22-passage-splade-pp-sd.md) | +| uniCOIL noexp zero-shot | [✓](docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md) | [✓](docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md) | [✓](docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md) | +| uniCOIL with doc2query-T5 zero-shot | [✓](docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md) | [✓](docs/regressions/regressions-dl21-passage-unicoil-0shot.md) | 
[✓](docs/regressions/regressions-dl22-passage-unicoil-0shot.md) | +| SPLADE++ CoCondenser-EnsembleDistil | [✓](docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md) | [✓](docs/regressions/regressions-dl21-passage-splade-pp-ed.md) | [✓](docs/regressions/regressions-dl22-passage-splade-pp-ed.md) | +| SPLADE++ CoCondenser-SelfDistil | [✓](docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md) | [✓](docs/regressions/regressions-dl21-passage-splade-pp-sd.md) | [✓](docs/regressions/regressions-dl22-passage-splade-pp-sd.md) | ### Available Corpora for Download @@ -166,17 +166,17 @@ See individual pages for details! ### MS MARCO V2 Document Regressions -| | dev | DL21 | -|-----------------------------------------|:------------------------------------------------------------------------:|:------------------------------------------------------------------:| +| | dev | DL21 | +|-----------------------------------------|:------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------:| | **Unsupervised Lexical, Complete Doc** | -| baselines | [+](docs/regressions-msmarco-v2-doc.md) | [+](docs/regressions-dl21-doc.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-v2-doc-d2q-t5.md) | [+](docs/regressions-dl21-doc-d2q-t5.md) | +| baselines | [+](docs/regressions/regressions-msmarco-v2-doc.md) | [+](docs/regressions/regressions-dl21-doc.md) | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-d2q-t5.md) | | **Unsupervised Lexical, Segmented Doc** | -| baselines | [+](docs/regressions-msmarco-v2-doc-segmented.md) | [+](docs/regressions-dl21-doc-segmented.md) | -| doc2query-T5 | [+](docs/regressions-msmarco-v2-doc-segmented-d2q-t5.md) | [+](docs/regressions-dl21-doc-segmented-d2q-t5.md) | +| baselines | [+](docs/regressions/regressions-msmarco-v2-doc-segmented.md) | 
[+](docs/regressions/regressions-dl21-doc-segmented.md) | +| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md) | | **Learned Sparse Lexical** | -| uniCOIL noexp zero-shot | [✓](docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md) | [✓](docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md) | -| uniCOIL with doc2query-T5 zero-shot | [✓](docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md) | [✓](docs/regressions-dl21-doc-segmented-unicoil-0shot-v2.md) | +| uniCOIL noexp zero-shot | [✓](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md) | [✓](docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md) | +| uniCOIL with doc2query-T5 zero-shot | [✓](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md) | [✓](docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md) | ### Available Corpora for Download @@ -196,37 +196,37 @@ See individual pages for details! 
+ UCx = uniCOIL (noexp) + SPLADE = SPLADE-distill CoCodenser-medium -| Corpus | flat | flat-wp | multifield | UCx | SPLADE | -|--------|:----:|:-------:|:----------:|:------:|:------:| -| TREC-COVID | [+](docs/regressions-beir-v1.0.0-trec-covid-flat.md) | [+](docs/regressions-beir-v1.0.0-trec-covid-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-trec-covid-multifield.md) | [+](docs/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md) | -| BioASQ | [+](docs/regressions-beir-v1.0.0-bioasq-flat.md) | [+](docs/regressions-beir-v1.0.0-bioasq-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-bioasq-multifield.md) | [+](docs/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md) | -| NFCorpus | [+](docs/regressions-beir-v1.0.0-nfcorpus-flat.md) | [+](docs/regressions-beir-v1.0.0-nfcorpus-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-nfcorpus-multifield.md) | [+](docs/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md) | -| NQ | [+](docs/regressions-beir-v1.0.0-nq-flat.md) | [+](docs/regressions-beir-v1.0.0-nq-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-nq-multifield.md) | [+](docs/regressions-beir-v1.0.0-nq-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md) | -| HotpotQA | [+](docs/regressions-beir-v1.0.0-hotpotqa-flat.md) | [+](docs/regressions-beir-v1.0.0-hotpotqa-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-hotpotqa-multifield.md) | [+](docs/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md) | -| FiQA-2018 | [+](docs/regressions-beir-v1.0.0-fiqa-flat.md) | [+](docs/regressions-beir-v1.0.0-fiqa-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-fiqa-multifield.md) | [+](docs/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md) | 
[+](docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md) | -| Signal-1M(RT) | [+](docs/regressions-beir-v1.0.0-signal1m-flat.md) | [+](docs/regressions-beir-v1.0.0-signal1m-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-signal1m-multifield.md) | [+](docs/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md) | -| TREC-NEWS | [+](docs/regressions-beir-v1.0.0-trec-news-flat.md) | [+](docs/regressions-beir-v1.0.0-trec-news-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-trec-news-multifield.md) | [+](docs/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md) | -| Robust04 | [+](docs/regressions-beir-v1.0.0-robust04-flat.md) | [+](docs/regressions-beir-v1.0.0-robust04-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-robust04-multifield.md) | [+](docs/regressions-beir-v1.0.0-robust04-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md) | -| ArguAna | [+](docs/regressions-beir-v1.0.0-arguana-flat.md) | [+](docs/regressions-beir-v1.0.0-arguana-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-arguana-multifield.md) | [+](docs/regressions-beir-v1.0.0-arguana-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md) | -| Touche2020 | [+](docs/regressions-beir-v1.0.0-webis-touche2020-flat.md) | [+](docs/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-webis-touche2020-multifield.md) | [+](docs/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md) | -| CQADupStack-Android | [+](docs/regressions-beir-v1.0.0-cqadupstack-android-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-android-multifield.md) | 
[+](docs/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md) | -| CQADupStack-English | [+](docs/regressions-beir-v1.0.0-cqadupstack-english-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-english-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md) | -| CQADupStack-Gaming | [+](docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md) | -| CQADupStack-Gis | [+](docs/regressions-beir-v1.0.0-cqadupstack-gis-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md) | -| CQADupStack-Mathematica | [+](docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md) | -| CQADupStack-Physics | [+](docs/regressions-beir-v1.0.0-cqadupstack-physics-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md) | 
[+](docs/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md) | -| CQADupStack-Programmers | [+](docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md) | -| CQADupStack-Stats | [+](docs/regressions-beir-v1.0.0-cqadupstack-stats-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md) | -| CQADupStack-Tex | [+](docs/regressions-beir-v1.0.0-cqadupstack-tex-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md) | -| CQADupStack-Unix | [+](docs/regressions-beir-v1.0.0-cqadupstack-unix-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md) | -| CQADupStack-Webmasters | [+](docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md) | 
[+](docs/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md) | -| CQADupStack-Wordpress | [+](docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md) | -| Quora | [+](docs/regressions-beir-v1.0.0-quora-flat.md) | [+](docs/regressions-beir-v1.0.0-quora-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-quora-multifield.md) | [+](docs/regressions-beir-v1.0.0-quora-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md) | -| DBPedia | [+](docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md) | [+](docs/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-dbpedia-entity-multifield.md) | [+](docs/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md) | -| SCIDOCS | [+](docs/regressions-beir-v1.0.0-scidocs-flat.md) | [+](docs/regressions-beir-v1.0.0-scidocs-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-scidocs-multifield.md) | [+](docs/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md) | -| FEVER | [+](docs/regressions-beir-v1.0.0-fever-flat.md) | [+](docs/regressions-beir-v1.0.0-fever-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-fever-multifield.md) | [+](docs/regressions-beir-v1.0.0-fever-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md) | -| Climate-FEVER | [+](docs/regressions-beir-v1.0.0-climate-fever-flat.md) | [+](docs/regressions-beir-v1.0.0-climate-fever-flat-wp.md) | 
[+](docs/regressions-beir-v1.0.0-climate-fever-multifield.md) | [+](docs/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md) | -| SciFact | [+](docs/regressions-beir-v1.0.0-scifact-flat.md) | [+](docs/regressions-beir-v1.0.0-scifact-flat-wp.md) | [+](docs/regressions-beir-v1.0.0-scifact-multifield.md) | [+](docs/regressions-beir-v1.0.0-scifact-unicoil-noexp.md) | [+](docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md) | +| Corpus | flat | flat-wp | multifield | UCx | SPLADE | +|-------------------------|:-----------------------------------------------------------------------------:|:--------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------:| +| TREC-COVID | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md) | +| BioASQ | [+](docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md) | +| NFCorpus | [+](docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md) | +| NQ | [+](docs/regressions/regressions-beir-v1.0.0-nq-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-nq-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md) | +| HotpotQA | [+](docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md) | +| FiQA-2018 | [+](docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md) | +| Signal-1M(RT) | [+](docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md) | +| TREC-NEWS | [+](docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md) | +| Robust04 | [+](docs/regressions/regressions-beir-v1.0.0-robust04-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md) | +| ArguAna | [+](docs/regressions/regressions-beir-v1.0.0-arguana-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md) | +| Touche2020 | [+](docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md) | +| CQADupStack-Android | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md) | +| CQADupStack-English | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md) | +| CQADupStack-Gaming | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md) | +| CQADupStack-Gis | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md) | +| CQADupStack-Mathematica | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md) | +| CQADupStack-Physics | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md) | +| CQADupStack-Programmers | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md) | +| CQADupStack-Stats | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md) | +| CQADupStack-Tex | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md) | +| CQADupStack-Unix | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md) | +| CQADupStack-Webmasters | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md) | +| CQADupStack-Wordpress | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md) | +| Quora | [+](docs/regressions/regressions-beir-v1.0.0-quora-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-quora-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md) | +| DBPedia | [+](docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md) | +| SCIDOCS | [+](docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md) | +| FEVER | 
[+](docs/regressions/regressions-beir-v1.0.0-fever-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-fever-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md) | +| Climate-FEVER | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md) | +| SciFact | [+](docs/regressions/regressions-beir-v1.0.0-scifact-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md) |
@@ -234,19 +234,19 @@ See individual pages for details! ### Cross-lingual and Multi-lingual Regressions -+ Regressions for Mr. TyDi (v1.1) baselines: [ar](docs/regressions-mrtydi-v1.1-ar.md), [bn](docs/regressions-mrtydi-v1.1-bn.md), [en](docs/regressions-mrtydi-v1.1-en.md), [fi](docs/regressions-mrtydi-v1.1-fi.md), [id](docs/regressions-mrtydi-v1.1-id.md), [ja](docs/regressions-mrtydi-v1.1-ja.md), [ko](docs/regressions-mrtydi-v1.1-ko.md), [ru](docs/regressions-mrtydi-v1.1-ru.md), [sw](docs/regressions-mrtydi-v1.1-sw.md), [te](docs/regressions-mrtydi-v1.1-te.md), [th](docs/regressions-mrtydi-v1.1-th.md) -+ Regressions for MIRACL (v1.0) baselines: [ar](docs/regressions-miracl-v1.0-ar.md), [bn](docs/regressions-miracl-v1.0-bn.md), [en](docs/regressions-miracl-v1.0-en.md), [es](docs/regressions-miracl-v1.0-es.md), [fa](docs/regressions-miracl-v1.0-fa.md), [fi](docs/regressions-miracl-v1.0-fi.md), [fr](docs/regressions-miracl-v1.0-fr.md), [hi](docs/regressions-miracl-v1.0-hi.md), [id](docs/regressions-miracl-v1.0-id.md), [ja](docs/regressions-miracl-v1.0-ja.md), [ko](docs/regressions-miracl-v1.0-ko.md), [ru](docs/regressions-miracl-v1.0-ru.md), [sw](docs/regressions-miracl-v1.0-sw.md), [te](docs/regressions-miracl-v1.0-te.md), [th](docs/regressions-miracl-v1.0-th.md), [zh](docs/regressions-miracl-v1.0-zh.md) -+ Regressions for TREC 2022 NeuCLIR Track BM25 (query translation): [Persian](docs/regressions-neuclir22-fa-qt.md), [Russian](docs/regressions-neuclir22-ru-qt.md), [Chinese](docs/regressions-neuclir22-zh-qt.md) -+ Regressions for TREC 2022 NeuCLIR Track BM25 (document translation): [Persian](docs/regressions-neuclir22-fa-dt.md), [Russian](docs/regressions-neuclir22-ru-dt.md), [Chinese](docs/regressions-neuclir22-zh-dt.md) -+ Regressions for TREC 2022 NeuCLIR Track SPLADE (query translation): [Persian](docs/regressions-neuclir22-fa-qt-splade.md), [Russian](docs/regressions-neuclir22-ru-qt-splade.md), [Chinese](docs/regressions-neuclir22-zh-qt-splade.md) -+ 
Regressions for TREC 2022 NeuCLIR Track SPLADE (document translation): [Persian](docs/regressions-neuclir22-fa-dt-splade.md), [Russian](docs/regressions-neuclir22-ru-dt-splade.md), [Chinese](docs/regressions-neuclir22-zh-dt-splade.md) -+ Regressions for HC4 (v1.0) baselines on HC4 corpora: [Persian](docs/regressions-hc4-v1.0-fa.md), [Russian](docs/regressions-hc4-v1.0-ru.md), [Chinese](docs/regressions-hc4-v1.0-zh.md) -+ Regressions for HC4 (v1.0) baselines on original NeuCLIR22 corpora: [Persian](docs/regressions-hc4-neuclir22-fa.md), [Russian](docs/regressions-hc4-neuclir22-ru.md), [Chinese](docs/regressions-hc4-neuclir22-zh.md) -+ Regressions for HC4 (v1.0) baselines on translated NeuCLIR22 corpora: [Persian](docs/regressions-hc4-neuclir22-fa-en.md), [Russian](docs/regressions-hc4-neuclir22-ru-en.md), [Chinese](docs/regressions-hc4-neuclir22-zh-en.md) -+ Regressions for [NTCIR-8 ACLIA (IR4QA subtask, Monolingual Chinese)](docs/regressions-ntcir8-zh.md) -+ Regressions for [CLEF 2006 Monolingual French](docs/regressions-clef06-fr.md) -+ Regressions for [TREC 2002 Monolingual Arabic](docs/regressions-trec02-ar.md) -+ Regressions for FIRE 2012: [Monolingual Bengali](docs/regressions-fire12-bn.md), [Monolingual Hindi](docs/regressions-fire12-hi.md), [Monolingual English](docs/regressions-fire12-en.md) ++ Regressions for Mr. 
TyDi (v1.1) baselines: [ar](docs/regressions/regressions-mrtydi-v1.1-ar.md), [bn](docs/regressions/regressions-mrtydi-v1.1-bn.md), [en](docs/regressions/regressions-mrtydi-v1.1-en.md), [fi](docs/regressions/regressions-mrtydi-v1.1-fi.md), [id](docs/regressions/regressions-mrtydi-v1.1-id.md), [ja](docs/regressions/regressions-mrtydi-v1.1-ja.md), [ko](docs/regressions/regressions-mrtydi-v1.1-ko.md), [ru](docs/regressions/regressions-mrtydi-v1.1-ru.md), [sw](docs/regressions/regressions-mrtydi-v1.1-sw.md), [te](docs/regressions/regressions-mrtydi-v1.1-te.md), [th](docs/regressions/regressions-mrtydi-v1.1-th.md) ++ Regressions for MIRACL (v1.0) baselines: [ar](docs/regressions/regressions-miracl-v1.0-ar.md), [bn](docs/regressions/regressions-miracl-v1.0-bn.md), [en](docs/regressions/regressions-miracl-v1.0-en.md), [es](docs/regressions/regressions-miracl-v1.0-es.md), [fa](docs/regressions/regressions-miracl-v1.0-fa.md), [fi](docs/regressions/regressions-miracl-v1.0-fi.md), [fr](docs/regressions/regressions-miracl-v1.0-fr.md), [hi](docs/regressions/regressions-miracl-v1.0-hi.md), [id](docs/regressions/regressions-miracl-v1.0-id.md), [ja](docs/regressions/regressions-miracl-v1.0-ja.md), [ko](docs/regressions/regressions-miracl-v1.0-ko.md), [ru](docs/regressions/regressions-miracl-v1.0-ru.md), [sw](docs/regressions/regressions-miracl-v1.0-sw.md), [te](docs/regressions/regressions-miracl-v1.0-te.md), [th](docs/regressions/regressions-miracl-v1.0-th.md), [zh](docs/regressions/regressions-miracl-v1.0-zh.md) ++ Regressions for TREC 2022 NeuCLIR Track BM25 (query translation): [Persian](docs/regressions/regressions-neuclir22-fa-qt.md), [Russian](docs/regressions/regressions-neuclir22-ru-qt.md), [Chinese](docs/regressions/regressions-neuclir22-zh-qt.md) ++ Regressions for TREC 2022 NeuCLIR Track BM25 (document translation): [Persian](docs/regressions/regressions-neuclir22-fa-dt.md), [Russian](docs/regressions/regressions-neuclir22-ru-dt.md), 
[Chinese](docs/regressions/regressions-neuclir22-zh-dt.md) ++ Regressions for TREC 2022 NeuCLIR Track SPLADE (query translation): [Persian](docs/regressions/regressions-neuclir22-fa-qt-splade.md), [Russian](docs/regressions/regressions-neuclir22-ru-qt-splade.md), [Chinese](docs/regressions/regressions-neuclir22-zh-qt-splade.md) ++ Regressions for TREC 2022 NeuCLIR Track SPLADE (document translation): [Persian](docs/regressions/regressions-neuclir22-fa-dt-splade.md), [Russian](docs/regressions/regressions-neuclir22-ru-dt-splade.md), [Chinese](docs/regressions/regressions-neuclir22-zh-dt-splade.md) ++ Regressions for HC4 (v1.0) baselines on HC4 corpora: [Persian](docs/regressions/regressions-hc4-v1.0-fa.md), [Russian](docs/regressions/regressions-hc4-v1.0-ru.md), [Chinese](docs/regressions/regressions-hc4-v1.0-zh.md) ++ Regressions for HC4 (v1.0) baselines on original NeuCLIR22 corpora: [Persian](docs/regressions/regressions-hc4-neuclir22-fa.md), [Russian](docs/regressions/regressions-hc4-neuclir22-ru.md), [Chinese](docs/regressions/regressions-hc4-neuclir22-zh.md) ++ Regressions for HC4 (v1.0) baselines on translated NeuCLIR22 corpora: [Persian](docs/regressions/regressions-hc4-neuclir22-fa-en.md), [Russian](docs/regressions/regressions-hc4-neuclir22-ru-en.md), [Chinese](docs/regressions/regressions-hc4-neuclir22-zh-en.md) ++ Regressions for [NTCIR-8 ACLIA (IR4QA subtask, Monolingual Chinese)](docs/regressions/regressions-ntcir8-zh.md) ++ Regressions for [CLEF 2006 Monolingual French](docs/regressions/regressions-clef06-fr.md) ++ Regressions for [TREC 2002 Monolingual Arabic](docs/regressions/regressions-trec02-ar.md) ++ Regressions for FIRE 2012: [Monolingual Bengali](docs/regressions/regressions-fire12-bn.md), [Monolingual Hindi](docs/regressions/regressions-fire12-hi.md), [Monolingual English](docs/regressions/regressions-fire12-en.md)
@@ -254,15 +254,15 @@ See individual pages for details! ### Other Regressions -+ Regressions for [Disks 1 & 2 (TREC 1-3)](docs/regressions-disk12.md), [Disks 4 & 5 (TREC 7-8, Robust04)](docs/regressions-disk45.md), [AQUAINT (Robust05)](docs/regressions-robust05.md) -+ Regressions for [the New York Times Corpus (Core17)](docs/regressions-core17.md), [the Washington Post Corpus (Core18)](docs/regressions-core18.md) -+ Regressions for [Wt10g](docs/regressions-wt10g.md), [Gov2](docs/regressions-gov2.md) -+ Regressions for [ClueWeb09 (Category B)](docs/regressions-cw09b.md), [ClueWeb12-B13](docs/regressions-cw12b13.md), [ClueWeb12](docs/regressions-cw12.md) -+ Regressions for [Tweets2011 (MB11 & MB12)](docs/regressions-mb11.md), [Tweets2013 (MB13 & MB14)](docs/regressions-mb13.md) -+ Regressions for Complex Answer Retrieval (CAR17): [v1.5](docs/regressions-car17v1.5.md), [v2.0](docs/regressions-car17v2.0.md), [v2.0 with doc2query](docs/regressions-car17v2.0-doc2query.md) -+ Regressions for TREC News Tracks (Background Linking Task): [2018](docs/regressions-backgroundlinking18.md), [2019](docs/regressions-backgroundlinking19.md), [2020](docs/regressions-backgroundlinking20.md) -+ Regressions for [FEVER Fact Verification](docs/regressions-fever.md) -+ Regressions for DPR Wikipedia QA baselines: [100-word splits](docs/regressions-wikipedia-dpr-100w-bm25.md), [6/3 sliding window sentences](docs/regressions-wiki-all-6-3-tamber-bm25.md) ++ Regressions for [Disks 1 & 2 (TREC 1-3)](docs/regressions/regressions-disk12.md), [Disks 4 & 5 (TREC 7-8, Robust04)](docs/regressions/regressions-disk45.md), [AQUAINT (Robust05)](docs/regressions/regressions-robust05.md) ++ Regressions for [the New York Times Corpus (Core17)](docs/regressions/regressions-core17.md), [the Washington Post Corpus (Core18)](docs/regressions/regressions-core18.md) ++ Regressions for [Wt10g](docs/regressions/regressions-wt10g.md), [Gov2](docs/regressions/regressions-gov2.md) ++ Regressions for [ClueWeb09 
(Category B)](docs/regressions/regressions-cw09b.md), [ClueWeb12-B13](docs/regressions/regressions-cw12b13.md), [ClueWeb12](docs/regressions/regressions-cw12.md) ++ Regressions for [Tweets2011 (MB11 & MB12)](docs/regressions/regressions-mb11.md), [Tweets2013 (MB13 & MB14)](docs/regressions/regressions-mb13.md) ++ Regressions for Complex Answer Retrieval (CAR17): [v1.5](docs/regressions/regressions-car17v1.5.md), [v2.0](docs/regressions/regressions-car17v2.0.md), [v2.0 with doc2query](docs/regressions/regressions-car17v2.0-doc2query.md) ++ Regressions for TREC News Tracks (Background Linking Task): [2018](docs/regressions/regressions-backgroundlinking18.md), [2019](docs/regressions/regressions-backgroundlinking19.md), [2020](docs/regressions/regressions-backgroundlinking20.md) ++ Regressions for [FEVER Fact Verification](docs/regressions/regressions-fever.md) ++ Regressions for DPR Wikipedia QA baselines: [100-word splits](docs/regressions/regressions-wikipedia-dpr-100w-bm25.md), [6/3 sliding window sentences](docs/regressions/regressions-wiki-all-6-3-tamber-bm25.md)
@@ -305,7 +305,7 @@ For the most part, manual copying and pasting of commands into a shell is requir ## 🙋 How Can I Contribute? If you've found Anserini to be helpful, we have a simple request for you to contribute back. -In the course of [reproducing](docs/reproducibility.md) baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the page for Disks 4 & 5](docs/regressions-disk45.md). +In the course of [reproducing](docs/reproducibility.md) baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the page for Disks 4 & 5](docs/regressions/regressions-disk45.md). Reproducibility is important to us, and we'd like to know about successes as well as failures. Since the regression documentation is auto-generated, pull requests should be sent against the [raw templates](https://github.com/castorini/anserini/tree/master/src/main/resources/docgen/templates). Then the regression documentation can be generated using the [`bin/build.sh`](bin/build.sh) script. @@ -377,9 +377,9 @@ To reproducible old results from Lucene 7.6, use [v0.5.1](https://github.com/cas ## ✨ References -+ Jimmy Lin, Matt Crane, Andrew Trotman, Jamie Callan, Ishan Chattopadhyaya, John Foley, Grant Ingersoll, Craig Macdonald, Sebastiano Vigna. [Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge.](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) _ECIR 2016_. -+ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Enabling the Use of Lucene for Information Retrieval Research.](https://dl.acm.org/authorize?N47337) _SIGIR 2017_. -+ Peilin Yang, Hui Fang, and Jimmy Lin. 
[Anserini: Reproducible Ranking Baselines Using Lucene.](https://dl.acm.org/citation.cfm?doid=3289400.3239571) _Journal of Data and Information Quality_, 10(4), Article 16, 2018. ++ Jimmy Lin, Matt Crane, Andrew Trotman, Jamie Callan, Ishan Chattopadhyaya, John Foley, Grant Ingersoll, Craig Macdonald, Sebastiano Vigna. [Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge.](https://link.springer.com/chapter/10.1007/978-3-319-30671-1_30) _ECIR 2016_. ++ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Enabling the Use of Lucene for Information Retrieval Research.](https://dl.acm.org/doi/10.1145/3077136.3080721) _SIGIR 2017_. ++ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Reproducible Ranking Baselines Using Lucene.](https://dl.acm.org/doi/10.1145/3239571) _Journal of Data and Information Quality_, 10(4), Article 16, 2018. ## 🙏 Acknowledgments diff --git a/docs/elastirini.md b/docs/elastirini.md index 4ffa87972f..00756c859a 100644 --- a/docs/elastirini.md +++ b/docs/elastirini.md @@ -38,10 +38,10 @@ If you want to install Kibana, it's just another distribution to unpack and a si ## Indexing and Retrieval: Robust04 Once we have a local instance of Elasticsearch up and running, we can index using Elasticsearch through Elastirini. -In this example, we reproduce experiments on [Robust04](regressions-disk45.md). +In this example, we reproduce experiments on Robust04. First, let's create the index in Elasticsearch. -We define the schema and the ranking function (BM25) using [this config](../src/main/resources/elasticsearch/index-config.robust04.json): +We define the schema and the ranking function (BM25) using the config at `src/main/resources/elasticsearch/index-config.robust04.json`: ```bash cat src/main/resources/elasticsearch/index-config.robust04.json \ @@ -87,8 +87,8 @@ P_30 all 0.3102 ## Indexing and Retrieval: Core18 -We can reproduce the [TREC Washington Post Corpus](regressions-core18.md) results in a similar way. 
-First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.core18.json): +We can reproduce the TREC Washington Post Corpus results in a similar way. +First, set up the proper schema using the config at `src/main/resources/elasticsearch/index-config.core18.json`: ```bash cat src/main/resources/elasticsearch/index-config.core18.json \ @@ -133,8 +133,8 @@ P_30 all 0.3573 ## Indexing and Retrieval: MS MARCO Passage -We can reproduce the [BM25 Baselines on MS MARCO (Passage)](experiments-msmarco-passage.md) results in a similar way. -First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-passage.json): +We can reproduce the BM25 Baselines on MS MARCO (Passage) results in a similar way. +First, set up the proper schema using the config at `src/main/resources/elasticsearch/index-config.msmarco-passage.json`: ```bash cat src/main/resources/elasticsearch/index-config.msmarco-passage.json \ @@ -179,8 +179,8 @@ recall_1000 all 0.8573 ## Indexing and Retrieval: MS MARCO Document -We can reproduce the [BM25 Baselines on MS MARCO (Doc)](experiments-msmarco-doc.md) results in a similar way. -First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-doc.json): +We can reproduce the BM25 Baselines on MS MARCO (Doc) results in a similar way. 
+First, set up the proper schema using the config at `src/main/resources/elasticsearch/index-config.msmarco-doc.json`: ```bash cat src/main/resources/elasticsearch/index-config.msmarco-doc.json \ @@ -227,7 +227,7 @@ recall_1000 all 0.8856 ## Elasticsearch Integration Test -We have an end-to-end integration testing script `run_es_regression.py` for [Robust04](regressions-disk45.md), [Core18](regressions-core18.md), [MS MARCO passage](regressions-msmarco-passage.md) and [MS MARCO document](regressions-msmarco-doc.md): +We have an end-to-end integration testing script `run_es_regression.py` for Robust04, Core18, MS MARCO passage and MS MARCO document: ```bash # Check if Elasticsearch server is on @@ -256,16 +256,16 @@ For the `collection` meta-parameter, use `robust04`, `core18`, `msmarco-passage` ## Reproduction Log[*](reproducibility.md) -+ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-disk45.md) -+ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-disk45.md) -+ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for [MS MARCO Passage](regressions-msmarco-passage.md), [Robust04](regressions-disk45.md) and [Core18](regressions-core18.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py) -+ Results reproduced by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit 
[`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for [MS MARCO Passage](regressions-msmarco-passage.md) -+ Results reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for [MS MARCO Passage](regressions-msmarco-passage.md), [MS MARCO Document](experiments-msmarco-doc.md) -+ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage](regressions-msmarco-passage.md) -+ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04](regressions-disk45.md), [Core18](regressions-core18.md), and [MS MARCO Passage](regressions-msmarco-passage.md) ++ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both MS MARCO Passage and Robust04 ++ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both MS MARCO Passage and Robust04 ++ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for MS MARCO Passage, Robust04 and Core18 using end-to-end `run_es_regression` ++ Results reproduced by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit [`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for MS MARCO Passage ++ Results reproduced by 
[@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for MS MARCO Passage, MS MARCO Document ++ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for MS MARCO Passage ++ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for Robust04, Core18, and MS MARCO Passage + Results reproduced by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755b`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections -+ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md) -+ Results reproduced by [@tyao-t](https://github.com/tyao-t) on 2022-01-13 (commit [`06fb4f9`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786)) for [MS MARCO Passage](regressions-msmarco-passage.md) and [MS MARCO Document](regressions-msmarco-doc.md) -+ Results reproduced by [@d1shs0ap](https://github.com/d1shs0ap) on 2022-01-21 (commit [`a81299e`](https://github.com/castorini/anserini/commit/a81299e59eff24512d635e0d49fba6e373286469)) for [MS MARCO Document](regressions-msmarco-doc.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py) ++ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for MS MARCO Passage ++ Results reproduced by 
[@tyao-t](https://github.com/tyao-t) on 2022-01-13 (commit [`06fb4f9`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786)) for MS MARCO Passage and MS MARCO Document ++ Results reproduced by [@d1shs0ap](https://github.com/d1shs0ap) on 2022-01-21 (commit [`a81299e`](https://github.com/castorini/anserini/commit/a81299e59eff24512d635e0d49fba6e373286469)) for MS MARCO Document using end-to-end `run_es_regression` + Results reproduced by [@lintool](https://github.com/lintool) on 2022-03-21 (commit [`3d1fc34`](https://github.com/castorini/anserini/commit/3d1fc3457b993832b4682c0482b26d8271d02ec6)) for all collections + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-31 (commit [`2a0cb16`](https://github.com/castorini/anserini/commit/2a0cb16829b347e38801b9972b349de498dadf03)) (v0.14.4) for all collections diff --git a/docs/regressions-backgroundlinking18.md b/docs/regressions/regressions-backgroundlinking18.md similarity index 93% rename from docs/regressions-backgroundlinking18.md rename to docs/regressions/regressions-backgroundlinking18.md index 040ad4102b..ad53b02368 100644 --- a/docs/regressions-backgroundlinking18.md +++ b/docs/regressions/regressions-backgroundlinking18.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the background linking task in the [TREC 2018 News Track](http://trec-news.org/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking18.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/backgroundlinking18.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/backgroundlinking18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-backgroundlinking19.md b/docs/regressions/regressions-backgroundlinking19.md similarity index 93% rename from docs/regressions-backgroundlinking19.md rename to docs/regressions/regressions-backgroundlinking19.md index d59d7290e6..201a8d195d 100644 --- a/docs/regressions-backgroundlinking19.md +++ b/docs/regressions/regressions-backgroundlinking19.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the background linking task in the [TREC 2019 News Track](http://trec-news.org/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking19.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking19.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/backgroundlinking19.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/backgroundlinking19.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-backgroundlinking20.md b/docs/regressions/regressions-backgroundlinking20.md similarity index 93% rename from docs/regressions-backgroundlinking20.md rename to docs/regressions/regressions-backgroundlinking20.md index f90e61a398..3e82631ea9 100644 --- a/docs/regressions-backgroundlinking20.md +++ b/docs/regressions/regressions-backgroundlinking20.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the background linking task in the [TREC 2020 News Track](http://trec-news.org/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/backgroundlinking20.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-arguana-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-arguana-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md index b23bdd5877..63197dec7a 100644 --- a/docs/regressions-beir-v1.0.0-arguana-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — ArguA These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-arguana-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-arguana-flat.md b/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-arguana-flat.md rename to docs/regressions/regressions-beir-v1.0.0-arguana-flat.md index c523e71b5f..a7cbcd9dbe 100644 --- a/docs/regressions-beir-v1.0.0-arguana-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — ArguAna](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-arguana-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-arguana-multifield.md b/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-arguana-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md index b111acfed7..affdb1beee 100644 --- a/docs/regressions-beir-v1.0.0-arguana-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — ArguA These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-arguana-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md index 5ea241bd0c..c911486244 100644 --- a/docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — ArguAna](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-arguana-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): ArguAna | 0.9950 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-arguana-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-arguana-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md index 814c947b24..f975ab0c14 100644 --- a/docs/regressions-beir-v1.0.0-arguana-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana-unicoil-noexp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-arguana-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-bioasq-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-bioasq-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md index 4b0bfd4538..addcc88402 100644 --- a/docs/regressions-beir-v1.0.0-bioasq-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — BioAS These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-bioasq-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-bioasq-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-bioasq-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-bioasq-flat.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-bioasq-flat.md rename to docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md index e7a140d5ff..31d7f89081 100644 --- a/docs/regressions-beir-v1.0.0-bioasq-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — BioASQ](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-bioasq-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-bioasq-flat.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-bioasq-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-bioasq-multifield.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-bioasq-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md index d1cf5916d7..4795cfe049 100644 --- a/docs/regressions-beir-v1.0.0-bioasq-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — BioAS These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-bioasq-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-bioasq-multifield.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-bioasq-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md index d28fe8715d..b7d26f0fb6 100644 --- a/docs/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — BioASQ](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-bioasq-splade_distil_cocodenser_medium/` should p The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): BioASQ | 0.8904 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md index b59b3c1849..72fdd6409c 100644 --- a/docs/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-bioasq-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-bioasq-unicoil-noexp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-bioasq-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-climate-fever-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-climate-fever-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md index 2a7c903f98..a745ff00a8 100644 --- a/docs/regressions-beir-v1.0.0-climate-fever-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Clima These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-climate-fever-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-climate-fever-flat.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-climate-fever-flat.md rename to docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md index 3414adebd5..11768beeee 100644 --- a/docs/regressions-beir-v1.0.0-climate-fever-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Climate-FEVER](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-climate-fever-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-climate-fever-multifield.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-climate-fever-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md index 913a39511b..b21a32c271 100644 --- a/docs/regressions-beir-v1.0.0-climate-fever-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Clima These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-multifield.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-climate-fever-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md index d8df368013..23a9becc89 100644 --- a/docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — Climate-FEVER](http://beir.ai/). 
The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-climate-fever-splade_distil_cocodenser_medium/` s The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,416,593 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Climate-FEVER | 0.7084 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md index 159edd91d0..9310c4e6e8 100644 --- a/docs/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-climate-fever-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md index fe8288c277..0310ae957c 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-android-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-android-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-android-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md index 3b3d751bb7..d3e609b43e 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-android-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-android](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-android-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-android-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-android-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md index e09e5104c1..d4eccafeec 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-android-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. 
At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-android-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md index 362756bf0c..adf7fec4c3 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-android](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-android-splade_distil_cocodenser_medi The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-android | 0.9035 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md index 2391e22348..308e13a8aa 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-android-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-android-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md index 846774acbf..808c0f635c 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-english-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-english-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-english-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md index c1b8acce11..f4dd574c90 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-english-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-english](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-english-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-english-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-english-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md index 9d332b3125..350e191340 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-english-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-multifield.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-english-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md index 522062ff8d..5f941ef1f5 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-english](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-english-splade_distil_cocodenser_medi The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-english | 0.8346 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md index 146282e6f3..227b7febdd 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-english-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md index e3eff9ebe4..44b08b39f0 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gaming-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md index ecbe2d72cb..c9e3cd50e7 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-gaming](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gaming-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md index 7aaad9bb2e..8c4a2008ea 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-multifield.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gaming-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md index eba6276f09..c90880ff36 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-gaming](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-gaming-splade_distil_cocodenser_mediu The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-gaming | 0.9253 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md index 89c584ecb5..8d67a5bb36 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gaming-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md index c22ee68b26..6247693b35 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gis-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gis-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gis-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md index 999c2063c9..6cd7105b3d 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gis-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-gis](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-flat.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gis-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md index 6ab3769e0c..cc5d35c8a2 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gis-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md index 90d9816d3b..c232474211 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-gis](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-gis-splade_distil_cocodenser_medium/` The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-gis | 0.8385 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md index febfd0a7eb..32b7a82db2 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-gis-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md index 3bafebcfb5..d4fbc0ebd1 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-mathematica-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md index 50f12baf99..060f1bfce3 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-mathematica](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-mathematica-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md index 3a4b0a4410..f87268e287 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-mathematica-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md index 61a20c5e38..21d4906dd6 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-mathematica](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-mathematica-splade_distil_cocodenser_ The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-mathematica | 0.7848 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md index f8e6ad94f6..e7f094b22c 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md index 939ff10392..0f24413e21 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-physics-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-physics-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-physics-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md index e08f9cadb9..84ac6dc82b 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-physics-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-physics](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-physics-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md index b23d619607..4980e923d7 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. 
At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-physics-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md index 708409c6a3..2248f8f50c 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-physics](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-physics-splade_distil_cocodenser_medi The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-physics | 0.8931 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md index ea51798df2..d69cb88bc5 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-physics-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md index b401ae8866..44d2012d3d 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-programmers-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md index a70772a1b8..9bca89a3c9 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-programmers](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-programmers-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md index 5914648d60..b5bad4d553 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-multifield.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-programmers-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md index 08e3b930b0..b0612e0f24 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-programmers](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-programmers-splade_distil_cocodenser_ The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-programmers | 0.8451 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md index 53b43c38af..d9cbb6b026 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-programmers-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md index ba737966f5..1df9ef5fe5 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-stats-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-stats-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-stats-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md index 1b01f8f1e6..4ba693e58d 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-stats-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-stats](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-stats-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md index 1c776bd440..ff8ff5bf20 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-multifield.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-stats-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md index 20f5693b2c..8cc6bfe8a6 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-stats](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-stats-splade_distil_cocodenser_medium The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-stats | 0.7823 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md index 21d1abe5a7..00f5166834 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-stats-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md index ec750c9212..453a597192 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-flat-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-tex-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-tex-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-tex-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md index 0741b30ae3..e29127165a 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-tex-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-tex](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-flat.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-tex-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md index 9d5005434e..695b885c6f 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-tex-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md index aad2382767..ee5609f80a 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-tex](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-tex-splade_distil_cocodenser_medium/` The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-tex | 0.7372 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md index 1b64e9494c..a246040edb 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-tex-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md index fea668b74a..fcb07e3dbe 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-unix-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-unix-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-unix-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md index ae2f52f453..463d703b1c 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-unix-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-unix](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-unix-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md index 17252bd3eb..a4d9561846 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. 
At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-unix-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md index 4cf18b27fa..fb3d578cbc 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-unix](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-unix-splade_distil_cocodenser_medium/ The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-unix | 0.8225 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md index 8433eb3684..4f3ec0d53b 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-unix-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md index 59f5787750..7087309d1b 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-webmasters-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md index b10b309f8f..513bd74513 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-webmasters](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-flat.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-webmasters-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md index cc19993b03..6cb4726616 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-webmasters-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md index 084ac61365..e18e702115 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-webmasters](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-webmasters-splade_distil_cocodenser_m The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-webmasters | 0.8767 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md index 331cb39c7e..3006d93230 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md index dfc4609fd1..d320ac8f98 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-wordpress-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md index 9e2ee8812a..de03f016f8 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADupStack-wordpress](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-wordpress-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md index 947deb21c9..08a2576291 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — CQADu These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-wordpress-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md index 8296bf386b..473b740ea7 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — CQADupStack-wordpress](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-cqadupstack-wordpress-splade_distil_cocodenser_me The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-wordpress | 0.8036 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md index 351085c675..d0d40eb47d 100644 --- a/docs/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md index c86ef11f25..ba94e811e1 100644 --- a/docs/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — DBPed These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-flat-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-dbpedia-entity-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md rename to docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md index 73bd60316c..0d74397008 100644 --- a/docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — DBPedia](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-flat.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-dbpedia-entity-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-dbpedia-entity-multifield.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-dbpedia-entity-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md index f819a0efde..d86b32c33d 100644 --- a/docs/regressions-beir-v1.0.0-dbpedia-entity-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — DBPed These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-dbpedia-entity-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md index c8f57145a5..30b999b798 100644 --- a/docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — DBPedia](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-dbpedia-entity-splade_distil_cocodenser_medium/` The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 4,635,922 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): DBPedia | 0.7774 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md index 929b3043c9..ae0cd97a61 100644 --- a/docs/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-dbpedia-entity-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-dbpedia-entity-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fever-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fever-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md index c4b551c222..24f9e77abb 100644 --- a/docs/regressions-beir-v1.0.0-fever-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — FEVER These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fever-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fever-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fever-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fever-flat.md b/docs/regressions/regressions-beir-v1.0.0-fever-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fever-flat.md rename to docs/regressions/regressions-beir-v1.0.0-fever-flat.md index 4cbb3963a6..9675f76966 100644 --- a/docs/regressions-beir-v1.0.0-fever-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — FEVER](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fever-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fever-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fever-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fever-multifield.md b/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fever-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-fever-multifield.md index e31424c39a..068932f2af 100644 --- a/docs/regressions-beir-v1.0.0-fever-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — FEVER These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fever-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fever-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fever-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md index f5107895dd..30217a3204 100644 --- a/docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — FEVER](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fever-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fever-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-fever-splade_distil_cocodenser_medium/` should po The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,416,568 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): FEVER | 0.9751 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-fever-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fever-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md index 6c76ba996e..902a1f3b19 100644 --- a/docs/regressions-beir-v1.0.0-fever-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fever-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fever-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fever-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fiqa-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fiqa-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md index 0cc03a0612..1c49d7792e 100644 --- a/docs/regressions-beir-v1.0.0-fiqa-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — FiQA- These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fiqa-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fiqa-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fiqa-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fiqa-flat.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fiqa-flat.md rename to docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md index e559240bd2..078d996b3a 100644 --- a/docs/regressions-beir-v1.0.0-fiqa-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — FiQA-2018](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fiqa-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fiqa-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fiqa-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fiqa-multifield.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fiqa-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md index 9304b41dde..c2c66ff133 100644 --- a/docs/regressions-beir-v1.0.0-fiqa-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — fiqa] These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fiqa-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fiqa-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fiqa-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md index 83f84c97e0..b341e947d5 100644 --- a/docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — FiQA-2018](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-fiqa-splade_distil_cocodenser_medium/` should poi The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 57,638 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): FiQA-2018 | 0.8323 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md index 19a1bfd968..e3d8857e29 100644 --- a/docs/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-fiqa-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-fiqa-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-fiqa-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-hotpotqa-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-hotpotqa-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md index 3461ca071c..44191cb4ee 100644 --- a/docs/regressions-beir-v1.0.0-hotpotqa-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Hotpo These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-hotpotqa-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-hotpotqa-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-hotpotqa-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-hotpotqa-flat.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-hotpotqa-flat.md rename to docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md index a472247fc9..ceefbccdfb 100644 --- a/docs/regressions-beir-v1.0.0-hotpotqa-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — HotpotQA](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-hotpotqa-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-hotpotqa-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-hotpotqa-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-hotpotqa-multifield.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-hotpotqa-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md index 2f3b0842c1..e7078a941e 100644 --- a/docs/regressions-beir-v1.0.0-hotpotqa-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Hotpo These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-hotpotqa-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-hotpotqa-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-hotpotqa-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md index ebc7e436ce..37dd282470 100644 --- a/docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — HotpotQA](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-hotpotqa-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,233,329 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): HotpotQA | 0.8945 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md index 0cce5ebabb..b2df25a461 100644 --- a/docs/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-hotpotqa-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-hotpotqa-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-hotpotqa-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nfcorpus-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nfcorpus-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md index f1db6e2b63..97d0edd863 100644 --- a/docs/regressions-beir-v1.0.0-nfcorpus-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NFCor These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nfcorpus-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nfcorpus-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nfcorpus-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nfcorpus-flat.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nfcorpus-flat.md rename to docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md index 4d76fa86cd..a64587f879 100644 --- a/docs/regressions-beir-v1.0.0-nfcorpus-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NFCorpus](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nfcorpus-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nfcorpus-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nfcorpus-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nfcorpus-multifield.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nfcorpus-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md index ad2c374a84..385d5411d0 100644 --- a/docs/regressions-beir-v1.0.0-nfcorpus-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NFCor These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nfcorpus-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nfcorpus-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nfcorpus-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md index 2212f4f319..3a09d6f7c8 100644 --- a/docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — NFCorpus](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-nfcorpus-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 3,633 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): NFCorpus | 0.5694 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md index 38016997eb..856bfa2460 100644 --- a/docs/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nfcorpus-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nfcorpus-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nfcorpus-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nq-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nq-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md index ba7a6902bd..6c6dbf243b 100644 --- a/docs/regressions-beir-v1.0.0-nq-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NQ](h These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nq-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nq-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nq-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nq-flat.md b/docs/regressions/regressions-beir-v1.0.0-nq-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nq-flat.md rename to docs/regressions/regressions-beir-v1.0.0-nq-flat.md index 14c3863381..7aa8738fe2 100644 --- a/docs/regressions-beir-v1.0.0-nq-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NQ](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nq-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nq-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nq-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nq-multifield.md b/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nq-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-nq-multifield.md index d9e7239e85..4990f88f89 100644 --- a/docs/regressions-beir-v1.0.0-nq-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — NQ](h These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nq-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nq-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nq-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md index 0e25f508be..c7db2610c7 100644 --- a/docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — NQ](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nq-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nq-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-nq-splade_distil_cocodenser_medium/` should point The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 2,681,468 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): NQ | 0.9812 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-nq-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-nq-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md index 9432f4424c..df0b0f39ea 100644 --- a/docs/regressions-beir-v1.0.0-nq-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-nq-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-nq-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-nq-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-quora-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-quora-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md index cbe6670c6d..1fc0449b3c 100644 --- a/docs/regressions-beir-v1.0.0-quora-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Quora These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-quora-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-quora-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-quora-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-quora-flat.md b/docs/regressions/regressions-beir-v1.0.0-quora-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-quora-flat.md rename to docs/regressions/regressions-beir-v1.0.0-quora-flat.md index 207ca65daa..eb0d27e43d 100644 --- a/docs/regressions-beir-v1.0.0-quora-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Quora](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-quora-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-quora-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-quora-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-quora-multifield.md b/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-quora-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-quora-multifield.md index d5c76f3563..6ca7c7c7ee 100644 --- a/docs/regressions-beir-v1.0.0-quora-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Quora These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-quora-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-quora-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-quora-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md index 0875e672df..8650bea3d2 100644 --- a/docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — Quora](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-quora-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-quora-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-quora-splade_distil_cocodenser_medium/` should po The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 522,931 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Quora | 0.9979 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-quora-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-quora-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md index fb9fffa922..fa6fcb040c 100644 --- a/docs/regressions-beir-v1.0.0-quora-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-quora-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-quora-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-quora-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-robust04-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-robust04-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md index bffda9781f..4ea23d7350 100644 --- a/docs/regressions-beir-v1.0.0-robust04-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Robus These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-robust04-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-robust04-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-robust04-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-robust04-flat.md b/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-robust04-flat.md rename to docs/regressions/regressions-beir-v1.0.0-robust04-flat.md index 317133ebf8..12e46a5de0 100644 --- a/docs/regressions-beir-v1.0.0-robust04-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Robust04](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-robust04-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-robust04-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-robust04-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-robust04-multifield.md b/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-robust04-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md index 6eb9497d4e..5663ba0a2d 100644 --- a/docs/regressions-beir-v1.0.0-robust04-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Robus These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-robust04-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-robust04-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-robust04-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md index 675b6c46e4..1a4f7968e2 100644 --- a/docs/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — Robust04](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-robust04-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Robust04 | 0.6099 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-robust04-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-robust04-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md index 3e8462f8d2..cec9e5c22c 100644 --- a/docs/regressions-beir-v1.0.0-robust04-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-robust04-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-robust04-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-robust04-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scidocs-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scidocs-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md index 2a1c9a3f7d..c6100084a5 100644 --- a/docs/regressions-beir-v1.0.0-scidocs-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SCIDO These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scidocs-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scidocs-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scidocs-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scidocs-flat.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scidocs-flat.md rename to docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md index da1b49f775..2e89a9eecc 100644 --- a/docs/regressions-beir-v1.0.0-scidocs-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SCIDOCS](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scidocs-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scidocs-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scidocs-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scidocs-multifield.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scidocs-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md index 3119c3a43c..3b8c3fa443 100644 --- a/docs/regressions-beir-v1.0.0-scidocs-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SCIDO These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scidocs-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scidocs-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scidocs-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md index 8cb17093a1..40e8ace329 100644 --- a/docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — SCIDOCS](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-scidocs-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 25,657 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): SCIDOCS | 0.5891 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md index dc7834be7d..e60c7b8b64 100644 --- a/docs/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scidocs-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scidocs-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scidocs-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scifact-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scifact-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md index 26a29c7dfd..ee9e6b6ca0 100644 --- a/docs/regressions-beir-v1.0.0-scifact-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SciFa These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scifact-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scifact-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scifact-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scifact-flat.md b/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scifact-flat.md rename to docs/regressions/regressions-beir-v1.0.0-scifact-flat.md index 033820bf46..954544305e 100644 --- a/docs/regressions-beir-v1.0.0-scifact-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SciFact](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scifact-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scifact-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scifact-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scifact-multifield.md b/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scifact-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md index 985809f785..bf07a8bb2d 100644 --- a/docs/regressions-beir-v1.0.0-scifact-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — SciFa These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scifact-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scifact-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scifact-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md index cb6bfce243..4e5c9084a2 100644 --- a/docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — SciFact](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-scifact-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,183 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): SciFact | 0.9767 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-scifact-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-scifact-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md index b8a7a1e5fa..6069a44903 100644 --- a/docs/regressions-beir-v1.0.0-scifact-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-scifact-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-scifact-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-scifact-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-signal1m-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-signal1m-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md index c787e55d08..b53e765103 100644 --- a/docs/regressions-beir-v1.0.0-signal1m-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Signa These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-signal1m-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-signal1m-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-signal1m-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-signal1m-flat.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-signal1m-flat.md rename to docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md index a8bb081aba..bd77f9891a 100644 --- a/docs/regressions-beir-v1.0.0-signal1m-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Signal-1M](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-signal1m-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-signal1m-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-signal1m-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-signal1m-multifield.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-signal1m-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md index 467b0e1104..598efc3d56 100644 --- a/docs/regressions-beir-v1.0.0-signal1m-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Signa These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-signal1m-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-signal1m-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-signal1m-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md index b4ab7ea900..37d0ce375f 100644 --- a/docs/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — Signal-1M](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-signal1m-splade_distil_cocodenser_medium/` should The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Signal-1M | 0.5514 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md index e87d1db6c6..cd1ea691a6 100644 --- a/docs/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-signal1m-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-signal1m-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-signal1m-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-covid-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-covid-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md index 45c944970a..667f32b6b9 100644 --- a/docs/regressions-beir-v1.0.0-trec-covid-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC- These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-covid-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-covid-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-covid-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-covid-flat.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-covid-flat.md rename to docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md index 1044afc984..bef636d839 100644 --- a/docs/regressions-beir-v1.0.0-trec-covid-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC-COVID](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-covid-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-covid-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-covid-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-covid-multifield.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-covid-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md index 98477edbd6..7d928f0b31 100644 --- a/docs/regressions-beir-v1.0.0-trec-covid-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC- These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-covid-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-covid-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-covid-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md index ea75f0756d..dbb453e005 100644 --- a/docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — TREC-COVID](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-trec-covid-splade_distil_cocodenser_medium/` shou The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 171,332 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): TREC-COVID | 0.4433 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md index 05ea0d06c8..747029c197 100644 --- a/docs/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-covid-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-covid-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-covid-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-news-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-news-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md index 93f1c3cacf..cc71ac3a0c 100644 --- a/docs/regressions-beir-v1.0.0-trec-news-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC- These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-news-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-news-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-news-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-news-flat.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-news-flat.md rename to docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md index 468df04070..7a6ef70a55 100644 --- a/docs/regressions-beir-v1.0.0-trec-news-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC-NEWS](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-news-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-news-flat.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-news-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-news-multifield.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-news-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md index 0d75dda90c..fed56f82fe 100644 --- a/docs/regressions-beir-v1.0.0-trec-news-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — TREC- These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-news-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-news-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-news-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md index 99d9d21e3b..918b8568fc 100644 --- a/docs/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — TREC-NEWS](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-trec-news-splade_distil_cocodenser_medium/` shoul The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): TREC-NEWS | 0.6977 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md index a819afd0ee..de0a61cb0d 100644 --- a/docs/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-trec-news-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-trec-news-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-trec-news-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md rename to docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md index 7676e73073..735b4b2721 100644 --- a/docs/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Webis These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. All the documents and queries are pre-tokenized with `bert-base-uncased` tokenizer. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-webis-touche2020-flat-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-webis-touche2020-flat-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-webis-touche2020-flat-wp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-webis-touche2020-flat.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md similarity index 91% rename from docs/regressions-beir-v1.0.0-webis-touche2020-flat.md rename to docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md index 7056f150b6..6d79cd0d6f 100644 --- a/docs/regressions-beir-v1.0.0-webis-touche2020-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Webis-Touche2020](http://beir.ai/). These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-webis-touche2020-flat.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-webis-touche2020-flat.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-webis-touche2020-flat & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-webis-touche2020-multifield.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md similarity index 91% rename from docs/regressions-beir-v1.0.0-webis-touche2020-multifield.md rename to docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md index 13664ef645..8473a7af01 100644 --- a/docs/regressions-beir-v1.0.0-webis-touche2020-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [BEIR (v1.0.0) — Webis These experiments index the "title" and "text" fields in corpus separately. At retrieval time, a query is issued across both fields (equally weighted). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-webis-touche2020-multifield.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-webis-touche2020-multifield.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -27,7 +27,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-webis-touche2020-multifield & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md similarity index 89% rename from docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md index 5f4da02c6d..9bf249e040 100644 --- a/docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using SPLADE-distil CoCodenser Medium on [BEIR (v1.0.0) — Webis-Touche2020](http://beir.ai/). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -59,7 +59,7 @@ The path `/path/to/beir-v1.0.0-webis-touche2020-splade_distil_cocodenser_medium/ The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 382,545 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -97,6 +97,6 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Webis-Touche2020 | 0.8116 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md similarity index 91% rename from docs/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md rename to docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md index 9eecbf7c0f..7d52c502d1 100644 --- a/docs/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-webis-touche2020-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-webis-touche2020-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.beir-v1.0.0-webis-touche2020-unicoil-noexp & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-car17v1.5.md b/docs/regressions/regressions-car17v1.5.md similarity index 95% rename from docs/regressions-car17v1.5.md rename to docs/regressions/regressions-car17v1.5.md index fcf5de5d56..4633706b39 100644 --- a/docs/regressions-car17v1.5.md +++ b/docs/regressions/regressions-car17v1.5.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v1.5). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v1.5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v1.5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/car17v1.5.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/car17v1.5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-car17v2.0-doc2query.md b/docs/regressions/regressions-car17v2.0-doc2query.md similarity index 94% rename from docs/regressions-car17v2.0-doc2query.md rename to docs/regressions/regressions-car17v2.0-doc2query.md index e56a6a10f7..b72fe09967 100644 --- a/docs/regressions-car17v2.0-doc2query.md +++ b/docs/regressions/regressions-car17v2.0-doc2query.md @@ -7,10 +7,10 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret > Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho. [Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) _arxiv:1904.08375_ These experiments are integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-doc2query.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0-doc2query.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/car17v2.0-doc2query.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/car17v2.0-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,9 +32,9 @@ target/appassembler/bin/IndexCollection \ >& logs/log.car-paragraphCorpus.v2.0-doc2query & ``` -The directory `/path/to/car17v2.0-doc2query` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described in [this page](experiments-doc2query.md). +The directory `/path/to/car17v2.0-doc2query` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described in [this page](../../docs/experiments-doc2query.md). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-car17v2.0.md b/docs/regressions/regressions-car17v2.0.md similarity index 95% rename from docs/regressions-car17v2.0.md rename to docs/regressions/regressions-car17v2.0.md index 647a39fba4..7bdbf8388e 100644 --- a/docs/regressions-car17v2.0.md +++ b/docs/regressions/regressions-car17v2.0.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v2.0). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/car17v2.0.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/car17v2.0.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-clef06-fr.md b/docs/regressions/regressions-clef06-fr.md similarity index 93% rename from docs/regressions-clef06-fr.md rename to docs/regressions/regressions-clef06-fr.md index 9fddd94cdc..e660e54b00 100644 --- a/docs/regressions-clef06-fr.md +++ b/docs/regressions/regressions-clef06-fr.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for monolingual French document retrieval as part of the [CLEF 2006 Multilingual Document Retrieval (Ad Hoc) Track](http://www.clef-initiative.eu/edition/clef2006). Associated data can be found on the [CLEF test suites pages](http://www.clef-initiative.eu/dataset/corpus). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/clef06-fr.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/clef06-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/clef06-fr.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/clef06-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ The collection comprises news articles from ATS (SDA) and Le Monde totaling 177, Since the original distribution is in a format that's slightly different from standard TREC collections, we used a [preprocessing script](../src/main/python/clir/document_preprocess.py) to convert the collection into Anserini's JSON line format (we also applied a bit of light data cleaning using a script that has been lost; if you have problems reproducing our results, get in touch directly). The directory `/path/to/clef06-fr/` should point to the location of the processed collection. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-core17.md b/docs/regressions/regressions-core17.md similarity index 94% rename from docs/regressions-core17.md rename to docs/regressions/regressions-core17.md index b6d7f37400..b1c3fb6865 100644 --- a/docs/regressions-core17.md +++ b/docs/regressions/regressions-core17.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the TREC 2017 Common Core Track, which uses the [New York Times Annotated Corpus](https://catalog.ldc.upenn.edu/LDC2008T19). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core17.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core17.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/core17.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/core17.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/nyt_corpus/` should be the root directory of the [New York Times Annotated Corpus](https://catalog.ldc.upenn.edu/LDC2008T19), i.e., `ls /path/to/nyt_corpus/` should bring up a bunch of subdirectories, `1987` to `2007`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -111,7 +111,7 @@ With the above commands, you should be able to reproduce the following results: | **P30** | **BM25** | **+RM3** | **+Ax** | **QL** | **+RM3** | **+Ax** | | [TREC 2017 Common Core Track Topics](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.core17.txt)| 0.4293 | 0.5027 | 0.4940 | 0.4467 | 0.4827 | 0.4893 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) + Results reproduced by [@tteofili](https://github.com/tteofili) on 2019-01-27 (commit [`951090`](https://github.com/castorini/Anserini/commit/951090b66230040f037dde46534d896416467337)) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2019-09-07 (commit [`61f6f20`](https://github.com/castorini/anserini/commit/61f6f20ff6872484966ea1badcdcdcebf1eea852)) diff --git a/docs/regressions-core18.md b/docs/regressions/regressions-core18.md similarity index 95% rename from docs/regressions-core18.md rename to docs/regressions/regressions-core18.md index d20dc0ca58..b95e0f9356 100644 --- a/docs/regressions-core18.md +++ 
b/docs/regressions/regressions-core18.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the TREC 2018 Common Core Track, which uses the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core18.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/core18.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/core18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -111,7 +111,7 @@ With the above commands, you should be able to reproduce the following results: | **P30** | **BM25** | **+RM3** | **+Ax** | **QL** | **+RM3** | **+Ax** | | [TREC 2018 Common Core Track Topics](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.core18.txt)| 0.3573 | 0.4167 | 0.3947 | 0.3653 | 0.4007 | 0.4013 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) + Results reproduced by [@andrewyates](https://github.com/andrewyates) on 2018-11-30 (commit [`c1aac5`](https://github.com/castorini/Anserini/commit/c1aac5e353e2ab77db3e7106cb4c017a09ce0fe9)) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2019-09-07 (commit [`61f6f20`](https://github.com/castorini/anserini/commit/61f6f20ff6872484966ea1badcdcdcebf1eea852)) diff --git a/docs/regressions-cw09b.md b/docs/regressions/regressions-cw09b.md similarity index 98% rename from docs/regressions-cw09b.md rename to docs/regressions/regressions-cw09b.md index d296af6887..20517daf87 100644 --- a/docs/regressions-cw09b.md +++ b/docs/regressions/regressions-cw09b.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Web Tracks from TREC 2009 to 2012 using the [ClueWeb09 (Category B) collection](http://lemurproject.org/clueweb09.php/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw09b.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw09b.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/cw09b.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/cw09b.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/ClueWeb09b` should be the root directory of the [ClueWeb09 (Category B) collection](http://lemurproject.org/clueweb09.php/), i.e., `ls /path/to/ClueWeb09b` should bring up a bunch of subdirectories, `en0000` to `enwp03`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-cw12.md b/docs/regressions/regressions-cw12.md similarity index 96% rename from docs/regressions-cw12.md rename to docs/regressions/regressions-cw12.md index 7e2d95d8f4..526206ac83 100644 --- a/docs/regressions-cw12.md +++ b/docs/regressions/regressions-cw12.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Web Tracks from TREC 2013 and 2014 using the (full) [ClueWeb12 collection](http://lemurproject.org/clueweb12.php/). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/cw12.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/cw12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/cw12/` should be the root directory of the (full) [ClueWeb12 collection](http://lemurproject.org/clueweb12.php/), i.e., `/path/to/cw12/` should contain `Disk1`, `Disk2`, `Disk3`, `Disk4`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-cw12b13.md b/docs/regressions/regressions-cw12b13.md similarity index 97% rename from docs/regressions-cw12b13.md rename to docs/regressions/regressions-cw12b13.md index 578ad16da8..76d010f085 100644 --- a/docs/regressions-cw12b13.md +++ b/docs/regressions/regressions-cw12b13.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Web Tracks from TREC 2013 and 2014 using the [ClueWeb12-B13 collection](http://lemurproject.org/clueweb12/ClueWeb12-CreateB13.php). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12b13.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12b13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/cw12b13.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/cw12b13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/cw12-b13/` should be the root directory of the [ClueWeb12-B13 collection](http://lemurproject.org/clueweb12/ClueWeb12-CreateB13.php), i.e., `/path/to/cw12-b13/` should bring up a bunch of subdirectories, `ClueWeb12_00` to `ClueWeb12_18`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -174,6 +174,6 @@ With the above commands, you should be able to reproduce the following results: | [TREC 2013 Web Track (Topics 201-250)](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.web.201-250.txt)| 0.0838 | 0.0745 | 0.0951 | 0.0767 | 0.0548 | 0.0761 | | [TREC 2014 Web Track (Topics 251-300)](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.web.251-300.txt)| 0.1198 | 0.1064 | 0.0936 | 0.1091 | 0.0926 | 0.0911 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) * Results reproduced by [@matthew-z](https://github.com/matthew-z) on 2019-04-14 (commit [`abaa4c8`](https://github.com/castorini/Anserini/commit/abaa4c8e7cb50e8e4a3677377716f609b7859538))[*](https://github.com/castorini/Anserini/pull/590)[!](https://github.com/castorini/Anserini/issues/592) diff --git a/docs/regressions-disk12.md b/docs/regressions/regressions-disk12.md similarity index 97% rename from docs/regressions-disk12.md rename to docs/regressions/regressions-disk12.md index ad2b28678f..eba5ebb565 
100644 --- a/docs/regressions-disk12.md +++ b/docs/regressions/regressions-disk12.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for ad hoc topics from TREC 1-3, which use [TIPSTER Disks 1 & 2](https://catalog.ldc.upenn.edu/LDC93T3A). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/disk12.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/disk12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/disk12.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/disk12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/disk12/` should be the root directory of [TIPSTER Disks 1 & 2](https://catalog.ldc.upenn.edu/LDC93T3A), i.e., `ls /path/to/disk12/` should bring up subdirectories like `doe`, `wsj`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-disk45.md b/docs/regressions/regressions-disk45.md similarity index 98% rename from docs/regressions-disk45.md rename to docs/regressions/regressions-disk45.md index c5c6e0e18d..82e067ef9a 100644 --- a/docs/regressions-disk45.md +++ b/docs/regressions/regressions-disk45.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for ad hoc topics from TREC 7-8, which use [TREC Disks 4 & 5](https://trec.nist.gov/data/cd45/index.html). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/disk45.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/disk45.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/disk45.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/disk45.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/disk45/` should be the root directory of [TREC Disks 4 & 5](https://trec.nist.gov/data/cd45/index.html); inside each there should be subdirectories like `ft`, `fr94`. Note that Anserini ignores the `cr` folder when indexing, which is the standard configuration. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -293,9 +293,7 @@ With the above commands, you should be able to reproduce the following results: | [TREC-8 Ad Hoc Topics](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.adhoc.401-450.txt)| 0.3560 | 0.3753 | 0.3707 | 0.3713 | 0.3753 | 0.3480 | 0.3713 | 0.3640 | 0.3660 | 0.3500 | | [TREC 2004 Robust Track Topics](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels/topics.robust04.txt)| 0.3102 | 0.3349 | 0.3364 | 0.3378 | 0.3333 | 0.3079 | 0.3244 | 0.3237 | 0.3246 | 0.3229 | -## Reproduction Log[*](reproducibility.md) - -(Prior to the addition of TREC 7/8 topics) +## Reproduction Log[*](../../docs/reproducibility.md) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2018-12-18 (commit [`a15235`](https://github.com/castorini/Anserini/commit/a152359435ac6ae694b39f561343bba5eed8fdc9)) + Results reproduced by [@kelvin-jiang](https://github.com/kelvin-jiang) on 2019-09-08 (commit [`a1892ae`](https://github.com/castorini/anserini/commit/a1892aec726efe55111a7bc501ab0914afab3a30)) diff --git a/docs/regressions-dl19-doc-ca.md b/docs/regressions/regressions-dl19-doc-ca.md similarity index 91% rename from docs/regressions-dl19-doc-ca.md rename to docs/regressions/regressions-dl19-doc-ca.md index a1c694a475..3e6a52a3f9 100644 --- a/docs/regressions-dl19-doc-ca.md +++ b/docs/regressions/regressions-dl19-doc-ca.md @@ -7,8 +7,8 @@ Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** wit Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-ca.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-docTTTTTquery.md b/docs/regressions/regressions-dl19-doc-docTTTTTquery.md similarity index 95% rename from docs/regressions-dl19-doc-docTTTTTquery.md rename to docs/regressions/regressions-dl19-doc-docTTTTTquery.md index e8fe21693c..b32671e48d 100644 --- a/docs/regressions-dl19-doc-docTTTTTquery.md +++ b/docs/regressions/regressions-dl19-doc-docTTTTTquery.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). 
Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -14,10 +14,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -42,9 +42,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-hgf-wp.md b/docs/regressions/regressions-dl19-doc-hgf-wp.md similarity index 91% rename from docs/regressions-dl19-doc-hgf-wp.md rename to docs/regressions/regressions-dl19-doc-hgf-wp.md index b1506b76ee..a6ef83bdc7 100644 --- a/docs/regressions-dl19-doc-hgf-wp.md +++ b/docs/regressions/regressions-dl19-doc-hgf-wp.md @@ -8,8 +8,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-hgf-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-hgf-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,9 +32,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-segmented-ca.md b/docs/regressions/regressions-dl19-doc-segmented-ca.md similarity index 91% rename from docs/regressions-dl19-doc-segmented-ca.md rename to docs/regressions/regressions-dl19-doc-segmented-ca.md index af6c547579..e79d1eef6b 100644 --- a/docs/regressions-dl19-doc-segmented-ca.md +++ b/docs/regressions/regressions-dl19-doc-segmented-ca.md @@ -5,17 +5,17 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). 
+For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing + **Expansion Condition:** none In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. 
@@ -40,9 +40,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-segmented-docTTTTTquery.md b/docs/regressions/regressions-dl19-doc-segmented-docTTTTTquery.md similarity index 95% rename from docs/regressions-dl19-doc-segmented-docTTTTTquery.md rename to docs/regressions/regressions-dl19-doc-segmented-docTTTTTquery.md index 0835db7b69..370a2cd530 100644 --- a/docs/regressions-dl19-doc-segmented-docTTTTTquery.md +++ b/docs/regressions/regressions-dl19-doc-segmented-docTTTTTquery.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -15,10 +15,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. 
@@ -43,9 +43,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-segmented-unicoil-noexp.md b/docs/regressions/regressions-dl19-doc-segmented-unicoil-noexp.md similarity index 94% rename from docs/regressions-dl19-doc-segmented-unicoil-noexp.md rename to docs/regressions/regressions-dl19-doc-segmented-unicoil-noexp.md index e03afed790..725f33afbe 100644 --- a/docs/regressions-dl19-doc-segmented-unicoil-noexp.md +++ b/docs/regressions/regressions-dl19-doc-segmented-unicoil-noexp.md @@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper. However, the model is the same, applied to the MS MARCO _segmented_ document corpus (without any expansions). Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil-noexp/` should point to th The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -150,9 +150,9 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@mayankanand007](https://github.com/mayankanand007) on 2022-02-28 (commit [`950d7fd`](https://github.com/castorini/anserini/commit/950d7fd88dbb87f39e9c1f6ccf9e41cbb6f04f36)) diff --git a/docs/regressions-dl19-doc-segmented-unicoil.md b/docs/regressions/regressions-dl19-doc-segmented-unicoil.md similarity index 94% rename from docs/regressions-dl19-doc-segmented-unicoil.md rename to docs/regressions/regressions-dl19-doc-segmented-unicoil.md index 113668b25a..01cef7c44f 100644 --- a/docs/regressions-dl19-doc-segmented-unicoil.md +++ b/docs/regressions/regressions-dl19-doc-segmented-unicoil.md @@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper. However, the model is the same, applied to the MS MARCO _segmented_ document corpus (with doc2query-T5 expansions). Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil/` should point to the corp The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -150,9 +150,9 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@mayankanand007](https://github.com/mayankanand007) on 2022-02-28 (commit [`950d7fd`](https://github.com/castorini/anserini/commit/950d7fd88dbb87f39e9c1f6ccf9e41cbb6f04f36)) diff --git a/docs/regressions-dl19-doc-segmented-wp.md b/docs/regressions/regressions-dl19-doc-segmented-wp.md similarity index 92% rename from docs/regressions-dl19-doc-segmented-wp.md rename to docs/regressions/regressions-dl19-doc-segmented-wp.md index 5c52f3b58e..fd923b8794 100644 --- a/docs/regressions-dl19-doc-segmented-wp.md +++ b/docs/regressions/regressions-dl19-doc-segmented-wp.md @@ -9,8 +9,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -33,9 +33,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-segmented.md b/docs/regressions/regressions-dl19-doc-segmented.md similarity index 97% rename from docs/regressions-dl19-doc-segmented.md rename to docs/regressions/regressions-dl19-doc-segmented.md index e1e11c6fe8..e09a6d8828 100644 --- a/docs/regressions-dl19-doc-segmented.md +++ b/docs/regressions/regressions-dl19-doc-segmented.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -15,10 +15,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-segmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. 
@@ -43,9 +43,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc-wp.md b/docs/regressions/regressions-dl19-doc-wp.md similarity index 91% rename from docs/regressions-dl19-doc-wp.md rename to docs/regressions/regressions-dl19-doc-wp.md index 53454a0e2f..f3f810f8db 100644 --- a/docs/regressions-dl19-doc-wp.md +++ b/docs/regressions/regressions-dl19-doc-wp.md @@ -8,8 +8,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,9 +32,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-doc.md b/docs/regressions/regressions-dl19-doc.md similarity index 97% rename from docs/regressions-dl19-doc.md rename to docs/regressions/regressions-dl19-doc.md index 687a8add8d..ab0f36b364 100644 --- a/docs/regressions-dl19-doc.md +++ b/docs/regressions/regressions-dl19-doc.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -14,10 +14,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-doc.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -42,9 +42,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. 
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -294,7 +294,7 @@ Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](../../docs/experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl19-passage-bm25-b8.md b/docs/regressions/regressions-dl19-passage-bm25-b8.md similarity index 90% rename from docs/regressions-dl19-passage-bm25-b8.md rename to docs/regressions/regressions-dl19-passage-bm25-b8.md index efb359b43a..725bff5f76 100644 --- a/docs/regressions-dl19-passage-bm25-b8.md +++ b/docs/regressions/regressions-dl19-passage-bm25-b8.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html). 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-bm25-b8.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-bm25-b8.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -57,7 +57,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-bm25-b8/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -99,8 +99,8 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **BM25 (default parameters, quantized 8 bits)**| | [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.7639 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-dl19-passage-ca.md b/docs/regressions/regressions-dl19-passage-ca.md similarity index 92% rename from docs/regressions-dl19-passage-ca.md rename to docs/regressions/regressions-dl19-passage-ca.md index e233077d3f..541a0b22e1 100644 --- a/docs/regressions-dl19-passage-ca.md +++ b/docs/regressions/regressions-dl19-passage-ca.md @@ -6,10 +6,10 @@ This page describes baseline experiments, integrated into Anserini's regression Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -33,7 +33,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl19-passage-docTTTTTquery.md b/docs/regressions/regressions-dl19-passage-docTTTTTquery.md similarity index 97% rename from docs/regressions-dl19-passage-docTTTTTquery.md rename to docs/regressions/regressions-dl19-passage-docTTTTTquery.md index dc22c7e8a7..acd70039b9 100644 --- a/docs/regressions-dl19-passage-docTTTTTquery.md +++ b/docs/regressions/regressions-dl19-passage-docTTTTTquery.md @@ -6,10 +6,10 @@ This page describes document expansion experiments, integrated into Anserini's r These experiments take advantage of [docTTTTTquery](http://doc2query.ai/) (also called doc2query-T5) expansions. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection. [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -211,7 +211,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](../../docs/experiments-msmarco-passage.md). + The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned via grid search to optimize recall@1000 directly _on the expanded passages_ using the MS MARCO passage sparse judgments (in 2020/12). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. 
diff --git a/docs/regressions-dl19-passage-hgf-wp.md b/docs/regressions/regressions-dl19-passage-hgf-wp.md similarity index 92% rename from docs/regressions-dl19-passage-hgf-wp.md rename to docs/regressions/regressions-dl19-passage-hgf-wp.md index d00e7a8c86..4c406e9507 100644 --- a/docs/regressions-dl19-passage-hgf-wp.md +++ b/docs/regressions/regressions-dl19-passage-hgf-wp.md @@ -7,10 +7,10 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-hgf-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-hgf-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-passage-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-dl19-passage-splade-distil-cocodenser-medium.md similarity index 93% rename from docs/regressions-dl19-passage-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-dl19-passage-splade-distil-cocodenser-medium.md index 9e94c3a861..bae6ce0cc3 100644 --- a/docs/regressions-dl19-passage-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-dl19-passage-splade-distil-cocodenser-medium.md @@ -6,10 +6,10 @@ This page describes regression experiments, integrated into Anserini's regressio The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -64,7 +64,7 @@ The path `/path/to/msmarco-passage-splade_distil_cocodenser_medium/` should poin The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -134,8 +134,8 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). 
The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-dl19-passage-splade-pp-ed-onnx.md b/docs/regressions/regressions-dl19-passage-splade-pp-ed-onnx.md similarity index 93% rename from docs/regressions-dl19-passage-splade-pp-ed-onnx.md rename to docs/regressions/regressions-dl19-passage-splade-pp-ed-onnx.md index 528f9910e1..a7a27079ee 100644 --- a/docs/regressions-dl19-passage-splade-pp-ed-onnx.md +++ b/docs/regressions/regressions-dl19-passage-splade-pp-ed-onnx.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-splade-pp-ed-onnx.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-splade-pp-ed-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-06-01 (commit [`70ea75`](https://github.com/castorini/anserini/commit/70ea75314ba570001eb68134f2185b55f6c66044)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-dl19-passage-splade-pp-ed.md b/docs/regressions/regressions-dl19-passage-splade-pp-ed.md similarity index 93% rename from docs/regressions-dl19-passage-splade-pp-ed.md rename to docs/regressions/regressions-dl19-passage-splade-pp-ed.md index 588a50d444..56ad9a325c 100644 --- a/docs/regressions-dl19-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-dl19-passage-splade-pp-ed.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) \ No newline at end of file diff --git a/docs/regressions-dl19-passage-splade-pp-sd-onnx.md b/docs/regressions/regressions-dl19-passage-splade-pp-sd-onnx.md similarity index 93% rename from docs/regressions-dl19-passage-splade-pp-sd-onnx.md rename to docs/regressions/regressions-dl19-passage-splade-pp-sd-onnx.md index 74c2b6f721..4e2212a4dc 100644 --- a/docs/regressions-dl19-passage-splade-pp-sd-onnx.md +++ b/docs/regressions/regressions-dl19-passage-splade-pp-sd-onnx.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-splade-pp-sd-onnx.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-splade-pp-sd-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). 
-## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-06-01 (commit [`70ea75`](https://github.com/castorini/anserini/commit/70ea75314ba570001eb68134f2185b55f6c66044)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-dl19-passage-splade-pp-sd.md b/docs/regressions/regressions-dl19-passage-splade-pp-sd.md similarity index 93% rename from docs/regressions-dl19-passage-splade-pp-sd.md rename to docs/regressions/regressions-dl19-passage-splade-pp-sd.md index 76132418b7..4a6e7b53bb 100644 --- a/docs/regressions-dl19-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-dl19-passage-splade-pp-sd.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-splade-pp-sd.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-dl19-passage-unicoil-noexp.md b/docs/regressions/regressions-dl19-passage-unicoil-noexp.md similarity index 93% rename from docs/regressions-dl19-passage-unicoil-noexp.md rename to docs/regressions/regressions-dl19-passage-unicoil-noexp.md index 41e6ed5443..5d2fb2c3d5 100644 --- a/docs/regressions-dl19-passage-unicoil-noexp.md +++ b/docs/regressions/regressions-dl19-passage-unicoil-noexp.md @@ -11,10 +11,10 @@ The experiments on this page are not actually reported in the paper. Here, a variant model without expansion is used. 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -69,7 +69,7 @@ The path `/path/to/msmarco-passage-unicoil-noexp/` should point to the corpus do The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,9 +139,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@mayankanand007](https://github.com/mayankanand007) on 2022-02-28 (commit [`950d7fd`](https://github.com/castorini/anserini/commit/950d7fd88dbb87f39e9c1f6ccf9e41cbb6f04f36)) diff --git a/docs/regressions-dl19-passage-unicoil.md b/docs/regressions/regressions-dl19-passage-unicoil.md similarity index 93% rename from docs/regressions-dl19-passage-unicoil.md rename to docs/regressions/regressions-dl19-passage-unicoil.md index ea0fe9985b..4e8060a570 100644 --- a/docs/regressions-dl19-passage-unicoil.md +++ b/docs/regressions/regressions-dl19-passage-unicoil.md @@ -11,10 +11,10 @@ The experiments on this page are not actually reported in the paper. However, the model is the same. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -69,7 +69,7 @@ The path `/path/to/msmarco-passage-unicoil/` should point to the corpus download The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,9 +139,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl19-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@mayankanand007](https://github.com/mayankanand007) on 2022-02-28 (commit [`950d7fd`](https://github.com/castorini/anserini/commit/950d7fd88dbb87f39e9c1f6ccf9e41cbb6f04f36)) diff --git a/docs/regressions-dl19-passage-wp.md b/docs/regressions/regressions-dl19-passage-wp.md similarity index 92% rename from docs/regressions-dl19-passage-wp.md rename to docs/regressions/regressions-dl19-passage-wp.md index d294e837d7..0314ae06be 100644 --- a/docs/regressions-dl19-passage-wp.md +++ b/docs/regressions/regressions-dl19-passage-wp.md @@ -7,10 +7,10 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl19-passage.md b/docs/regressions/regressions-dl19-passage.md similarity index 97% rename from docs/regressions-dl19-passage.md rename to docs/regressions/regressions-dl19-passage.md index 24f8b90482..c21e03610e 100644 --- a/docs/regressions-dl19-passage.md +++ b/docs/regressions/regressions-dl19-passage.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl19-passage.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl19-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](../../docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -210,7 +210,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. 
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](../../docs/experiments-msmarco-passage.md). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl20-doc-ca.md b/docs/regressions/regressions-dl20-doc-ca.md similarity index 91% rename from docs/regressions-dl20-doc-ca.md rename to docs/regressions/regressions-dl20-doc-ca.md index b8d6e64a46..31193f5979 100644 --- a/docs/regressions-dl20-doc-ca.md +++ b/docs/regressions/regressions-dl20-doc-ca.md @@ -7,8 +7,8 @@ Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** wit Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-docTTTTTquery.md b/docs/regressions/regressions-dl20-doc-docTTTTTquery.md similarity index 95% rename from docs/regressions-dl20-doc-docTTTTTquery.md rename to docs/regressions/regressions-dl20-doc-docTTTTTquery.md index 115c61780b..70b7c4549a 100644 --- a/docs/regressions-dl20-doc-docTTTTTquery.md +++ b/docs/regressions/regressions-dl20-doc-docTTTTTquery.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -14,10 +14,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -42,9 +42,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. 
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-hgf-wp.md b/docs/regressions/regressions-dl20-doc-hgf-wp.md similarity index 91% rename from docs/regressions-dl20-doc-hgf-wp.md rename to docs/regressions/regressions-dl20-doc-hgf-wp.md index 29c34fd6c2..fc95b3a49e 100644 --- a/docs/regressions-dl20-doc-hgf-wp.md +++ b/docs/regressions/regressions-dl20-doc-hgf-wp.md @@ -8,8 +8,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-hgf-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-hgf-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,9 +32,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-segmented-ca.md b/docs/regressions/regressions-dl20-doc-segmented-ca.md similarity index 91% rename from docs/regressions-dl20-doc-segmented-ca.md rename to docs/regressions/regressions-dl20-doc-segmented-ca.md index 6438fcfb4c..a9abf540f9 100644 --- a/docs/regressions-dl20-doc-segmented-ca.md +++ b/docs/regressions/regressions-dl20-doc-segmented-ca.md @@ -5,17 +5,17 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
+ **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing + **Expansion Condition:** none In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -40,9 +40,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. 
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-segmented-docTTTTTquery.md b/docs/regressions/regressions-dl20-doc-segmented-docTTTTTquery.md similarity index 95% rename from docs/regressions-dl20-doc-segmented-docTTTTTquery.md rename to docs/regressions/regressions-dl20-doc-segmented-docTTTTTquery.md index 1ed5af96ce..fed3dd6dff 100644 --- a/docs/regressions-dl20-doc-segmented-docTTTTTquery.md +++ b/docs/regressions/regressions-dl20-doc-segmented-docTTTTTquery.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -15,10 +15,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. 
In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -43,9 +43,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-segmented-unicoil-noexp.md b/docs/regressions/regressions-dl20-doc-segmented-unicoil-noexp.md similarity index 94% rename from docs/regressions-dl20-doc-segmented-unicoil-noexp.md rename to docs/regressions/regressions-dl20-doc-segmented-unicoil-noexp.md index baa7cb8cd4..065eb12c6c 100644 --- a/docs/regressions-dl20-doc-segmented-unicoil-noexp.md +++ b/docs/regressions/regressions-dl20-doc-segmented-unicoil-noexp.md @@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper. However, the model is the same, applied to the MS MARCO _segmented_ document corpus (without any expansions). Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil-noexp/` should point to th The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -150,9 +150,9 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl20-doc-segmented-unicoil.md b/docs/regressions/regressions-dl20-doc-segmented-unicoil.md similarity index 94% rename from docs/regressions-dl20-doc-segmented-unicoil.md rename to docs/regressions/regressions-dl20-doc-segmented-unicoil.md index 896345db97..bb659d7b73 100644 --- a/docs/regressions-dl20-doc-segmented-unicoil.md +++ b/docs/regressions/regressions-dl20-doc-segmented-unicoil.md @@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper. However, the model is the same, applied to the MS MARCO _segmented_ document corpus (with doc2query-T5 expansions). Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil/` should point to the corp The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -150,9 +150,9 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl20-doc-segmented-wp.md b/docs/regressions/regressions-dl20-doc-segmented-wp.md similarity index 92% rename from docs/regressions-dl20-doc-segmented-wp.md rename to docs/regressions/regressions-dl20-doc-segmented-wp.md index ae9a26203f..93dcb4181d 100644 --- a/docs/regressions-dl20-doc-segmented-wp.md +++ b/docs/regressions/regressions-dl20-doc-segmented-wp.md @@ -9,8 +9,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -33,9 +33,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-segmented.md b/docs/regressions/regressions-dl20-doc-segmented.md similarity index 97% rename from docs/regressions-dl20-doc-segmented.md rename to docs/regressions/regressions-dl20-doc-segmented.md index c22207723a..fd5419c46b 100644 --- a/docs/regressions-dl20-doc-segmented.md +++ b/docs/regressions/regressions-dl20-doc-segmented.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -15,10 +15,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-segmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-segmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. 
@@ -43,9 +43,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc-wp.md b/docs/regressions/regressions-dl20-doc-wp.md similarity index 91% rename from docs/regressions-dl20-doc-wp.md rename to docs/regressions/regressions-dl20-doc-wp.md index 18e4dc71ce..0099a067c0 100644 --- a/docs/regressions-dl20-doc-wp.md +++ b/docs/regressions/regressions-dl20-doc-wp.md @@ -8,8 +8,8 @@ In general, effectiveness is lower than with "standard" Lucene tokenization for Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,9 +32,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-doc.md b/docs/regressions/regressions-dl20-doc.md similarity index 97% rename from docs/regressions-dl20-doc.md rename to docs/regressions/regressions-dl20-doc.md index aae3d38010..bd78024568 100644 --- a/docs/regressions-dl20-doc.md +++ b/docs/regressions/regressions-dl20-doc.md @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -14,10 +14,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-doc.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -42,9 +42,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. 
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -294,7 +294,7 @@ Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](../../docs/experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl20-passage-bm25-b8.md b/docs/regressions/regressions-dl20-passage-bm25-b8.md similarity index 90% rename from docs/regressions-dl20-passage-bm25-b8.md rename to docs/regressions/regressions-dl20-passage-bm25-b8.md index 733215be26..932179c406 100644 --- a/docs/regressions-dl20-passage-bm25-b8.md +++ b/docs/regressions/regressions-dl20-passage-bm25-b8.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2020.html). 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-bm25-b8.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-bm25-b8.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -57,7 +57,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-bm25-b8/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -99,8 +99,8 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **BM25 (default parameters, quantized 8 bits)**| | [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.8119 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-dl20-passage-ca.md b/docs/regressions/regressions-dl20-passage-ca.md similarity index 92% rename from docs/regressions-dl20-passage-ca.md rename to docs/regressions/regressions-dl20-passage-ca.md index 1c8179477c..f4cfb12404 100644 --- a/docs/regressions-dl20-passage-ca.md +++ b/docs/regressions/regressions-dl20-passage-ca.md @@ -6,10 +6,10 @@ This page describes baseline experiments, integrated into Anserini's regression Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -33,7 +33,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl20-passage-docTTTTTquery.md b/docs/regressions/regressions-dl20-passage-docTTTTTquery.md similarity index 97% rename from docs/regressions-dl20-passage-docTTTTTquery.md rename to docs/regressions/regressions-dl20-passage-docTTTTTquery.md index 8e5e566c71..b81fc19317 100644 --- a/docs/regressions-dl20-passage-docTTTTTquery.md +++ b/docs/regressions/regressions-dl20-passage-docTTTTTquery.md @@ -6,10 +6,10 @@ This page describes document expansion experiments, integrated into Anserini's r These experiments take advantage of [docTTTTTquery](http://doc2query.ai/) (also called doc2query-T5) expansions. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection. [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -211,7 +211,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](../../docs/experiments-msmarco-passage.md). + The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned via grid search to optimize recall@1000 directly _on the expanded passages_ using the MS MARCO passage sparse judgments (in 2020/12). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. 
diff --git a/docs/regressions-dl20-passage-hgf-wp.md b/docs/regressions/regressions-dl20-passage-hgf-wp.md similarity index 92% rename from docs/regressions-dl20-passage-hgf-wp.md rename to docs/regressions/regressions-dl20-passage-hgf-wp.md index d6b35fc44c..dc0d8e77cf 100644 --- a/docs/regressions-dl20-passage-hgf-wp.md +++ b/docs/regressions/regressions-dl20-passage-hgf-wp.md @@ -7,10 +7,10 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-hgf-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-hgf-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-passage-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-dl20-passage-splade-distil-cocodenser-medium.md similarity index 93% rename from docs/regressions-dl20-passage-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-dl20-passage-splade-distil-cocodenser-medium.md index 87ab79d8e7..f0c984a5e2 100644 --- a/docs/regressions-dl20-passage-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-dl20-passage-splade-distil-cocodenser-medium.md @@ -6,10 +6,10 @@ This page describes regression experiments, integrated into Anserini's regressio The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -64,7 +64,7 @@ The path `/path/to/msmarco-passage-splade_distil_cocodenser_medium/` should poin The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -134,8 +134,8 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). 
The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-dl20-passage-splade-pp-ed-onnx.md b/docs/regressions/regressions-dl20-passage-splade-pp-ed-onnx.md similarity index 93% rename from docs/regressions-dl20-passage-splade-pp-ed-onnx.md rename to docs/regressions/regressions-dl20-passage-splade-pp-ed-onnx.md index ac24e02902..ad239b07f0 100644 --- a/docs/regressions-dl20-passage-splade-pp-ed-onnx.md +++ b/docs/regressions/regressions-dl20-passage-splade-pp-ed-onnx.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-splade-pp-ed-onnx.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-splade-pp-ed-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-06-01 (commit [`70ea75`](https://github.com/castorini/anserini/commit/70ea75314ba570001eb68134f2185b55f6c66044)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-dl20-passage-splade-pp-ed.md b/docs/regressions/regressions-dl20-passage-splade-pp-ed.md similarity index 93% rename from docs/regressions-dl20-passage-splade-pp-ed.md rename to docs/regressions/regressions-dl20-passage-splade-pp-ed.md index 1f729b2f92..bc2c037bb4 100644 --- a/docs/regressions-dl20-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-dl20-passage-splade-pp-ed.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) \ No newline at end of file diff --git a/docs/regressions-dl20-passage-splade-pp-sd-onnx.md b/docs/regressions/regressions-dl20-passage-splade-pp-sd-onnx.md similarity index 93% rename from docs/regressions-dl20-passage-splade-pp-sd-onnx.md rename to docs/regressions/regressions-dl20-passage-splade-pp-sd-onnx.md index 12679aef53..b6dbcef3c0 100644 --- a/docs/regressions-dl20-passage-splade-pp-sd-onnx.md +++ b/docs/regressions/regressions-dl20-passage-splade-pp-sd-onnx.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-splade-pp-sd-onnx.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-splade-pp-sd-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). 
-## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-06-01 (commit [`70ea75`](https://github.com/castorini/anserini/commit/70ea75314ba570001eb68134f2185b55f6c66044)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-dl20-passage-splade-pp-sd.md b/docs/regressions/regressions-dl20-passage-splade-pp-sd.md similarity index 93% rename from docs/regressions-dl20-passage-splade-pp-sd.md rename to docs/regressions/regressions-dl20-passage-splade-pp-sd.md index a47dac9952..59fd108ffd 100644 --- a/docs/regressions-dl20-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-dl20-passage-splade-pp-sd.md @@ -9,10 +9,10 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-splade-pp-sd.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,7 +66,7 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -136,9 +136,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) \ No newline at end of file diff --git a/docs/regressions-dl20-passage-unicoil-noexp.md b/docs/regressions/regressions-dl20-passage-unicoil-noexp.md similarity index 93% rename from docs/regressions-dl20-passage-unicoil-noexp.md rename to docs/regressions/regressions-dl20-passage-unicoil-noexp.md index 1abf8ce33b..a7b603c8ad 100644 --- a/docs/regressions-dl20-passage-unicoil-noexp.md +++ b/docs/regressions/regressions-dl20-passage-unicoil-noexp.md @@ -11,10 +11,10 @@ The experiments on this page are not actually reported in the paper. Here, a variant model without expansion is used. 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-unicoil-noexp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -69,7 +69,7 @@ The path `/path/to/msmarco-passage-unicoil-noexp/` should point to the corpus do The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,9 +139,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2102.07662). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl20-passage-unicoil.md b/docs/regressions/regressions-dl20-passage-unicoil.md similarity index 93% rename from docs/regressions-dl20-passage-unicoil.md rename to docs/regressions/regressions-dl20-passage-unicoil.md index 5096ff775a..08b7d805ac 100644 --- a/docs/regressions-dl20-passage-unicoil.md +++ b/docs/regressions/regressions-dl20-passage-unicoil.md @@ -11,10 +11,10 @@ The experiments on this page are not actually reported in the paper. However, the model is the same. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -69,7 +69,7 @@ The path `/path/to/msmarco-passage-unicoil/` should point to the corpus download The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,9 +139,9 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2102.07662). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl20-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl20-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-02-25 (commit [`7472d86`](https://github.com/castorini/anserini/commit/7472d862c7311bc8bbd30655c940d6396e27c223)) + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl20-passage-wp.md b/docs/regressions/regressions-dl20-passage-wp.md similarity index 92% rename from docs/regressions-dl20-passage-wp.md rename to docs/regressions/regressions-dl20-passage-wp.md index 9efd73824c..c4a669d906 100644 --- a/docs/regressions-dl20-passage-wp.md +++ b/docs/regressions/regressions-dl20-passage-wp.md @@ -7,10 +7,10 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,7 +34,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl20-passage.md b/docs/regressions/regressions-dl20-passage.md similarity index 97% rename from docs/regressions-dl20-passage.md rename to docs/regressions/regressions-dl20-passage.md index c0feb1f846..de15762936 100644 --- a/docs/regressions-dl20-passage.md +++ b/docs/regressions/regressions-dl20-passage.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl20-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl20-passage.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl20-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](../../docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -210,7 +210,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. 
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](../../docs/experiments-msmarco-passage.md). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl21-doc-d2q-t5.md b/docs/regressions/regressions-dl21-doc-d2q-t5.md similarity index 94% rename from docs/regressions-dl21-doc-d2q-t5.md rename to docs/regressions/regressions-dl21-doc-d2q-t5.md index e04f18e9a4..ae6f946e08 100644 --- a/docs/regressions-dl21-doc-d2q-t5.md +++ b/docs/regressions/regressions-dl21-doc-d2q-t5.md @@ -5,15 +5,15 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document collection (with doc2query-T5 expansions). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: + **Indexing Condition:** each document in the MS MARCO V2 document collection is treated as a unit of indexing + **Expansion Condition:** doc2query-T5 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-d2q-t5.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,9 +36,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl21-doc-segmented-d2q-t5.md b/docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md similarity index 94% rename from docs/regressions-dl21-doc-segmented-d2q-t5.md rename to docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md index 6f48dfebb0..4fe37bd328 100644 --- a/docs/regressions-dl21-doc-segmented-d2q-t5.md +++ b/docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md @@ -5,15 +5,15 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document collection (with doc2query-T5 expansions). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: + **Indexing Condition:** each segment in the MS MARCO V2 _segmented_ document collection is treated as a unit of indexing + **Expansion Condition:** doc2query-T5 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented-d2q-t5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,9 +36,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl21-doc-segmented-unicoil-0shot-v2.md b/docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md similarity index 94% rename from docs/regressions-dl21-doc-segmented-unicoil-0shot-v2.md rename to docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md index 0d7fbb4743..82f49c14da 100644 --- a/docs/regressions-dl21-doc-segmented-unicoil-0shot-v2.md +++ b/docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md @@ -16,10 +16,10 @@ The segment-only encoding results are deprecated and kept around primarily for a You probably don't want to use them. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented-unicoil-0shot-v2.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-unicoil-0shot-v2.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -80,7 +80,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-0shot-v2/` should point to t The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -151,8 +151,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-doc-segmented-unicoil-0shot.md b/docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot.md similarity index 92% rename from docs/regressions-dl21-doc-segmented-unicoil-0shot.md rename to docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot.md index ce91e9cb63..04caf1033f 100644 --- a/docs/regressions-dl21-doc-segmented-unicoil-0shot.md +++ b/docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot.md @@ -15,10 +15,10 @@ This regression captures segment-only encoding and is kept around primarily for The version that uses title/segment encoding can be found [here](regressions-dl21-doc-segmented-unicoil-0shot-v2.md). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented-unicoil-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-unicoil-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -79,7 +79,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-0shot/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -126,8 +126,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md b/docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md similarity index 94% rename from docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md rename to docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md index 7c96d58abb..1386d9238a 100644 --- a/docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md +++ b/docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md @@ -16,10 +16,10 @@ The segment-only encoding results are deprecated and kept around primarily for a You probably don't want to use them. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented-unicoil-noexp-0shot-v2.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-unicoil-noexp-0shot-v2.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -80,7 +80,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2/` should poin The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -151,8 +151,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md b/docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md similarity index 92% rename from docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md rename to docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md index b0fa98bc15..ce879b072e 100644 --- a/docs/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md +++ b/docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot.md @@ -15,10 +15,10 @@ This regression captures segment-only encoding and is kept around primarily for The version that uses title/segment encoding can be found [here](regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented-unicoil-noexp-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-unicoil-noexp-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -79,7 +79,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-noexp-0shot/` should point t The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -126,8 +126,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-doc-segmented.md b/docs/regressions/regressions-dl21-doc-segmented.md similarity index 94% rename from docs/regressions-dl21-doc-segmented.md rename to docs/regressions/regressions-dl21-doc-segmented.md index 46cb451850..14e0ff5e27 100644 --- a/docs/regressions-dl21-doc-segmented.md +++ b/docs/regressions/regressions-dl21-doc-segmented.md @@ -5,15 +5,15 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document collection. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: + **Indexing Condition:** each segment in the MS MARCO V2 _segmented_ document collection is treated as a unit of indexing + **Expansion Condition:** none -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc-segmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,9 +36,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl21-doc.md b/docs/regressions/regressions-dl21-doc.md similarity index 94% rename from docs/regressions-dl21-doc.md rename to docs/regressions/regressions-dl21-doc.md index c33f195ff1..962bb601b6 100644 --- a/docs/regressions-dl21-doc.md +++ b/docs/regressions/regressions-dl21-doc.md @@ -5,15 +5,15 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document collection. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: + **Indexing Condition:** each document in the MS MARCO V2 document collection is treated as a unit of indexing + **Expansion Condition:** none -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-doc.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,9 +36,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl21-passage-augmented-d2q-t5.md b/docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md similarity index 94% rename from docs/regressions-dl21-passage-augmented-d2q-t5.md rename to docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md index 0d66ba4984..e969f18987 100644 --- a/docs/regressions-dl21-passage-augmented-d2q-t5.md +++ b/docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md @@ -5,10 +5,10 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). 
+For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-augmented-d2q-t5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-augmented-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl21-passage-augmented.md b/docs/regressions/regressions-dl21-passage-augmented.md similarity index 94% rename from docs/regressions-dl21-passage-augmented.md rename to docs/regressions/regressions-dl21-passage-augmented.md index 85d94e9ae8..d887c5a481 100644 --- a/docs/regressions-dl21-passage-augmented.md +++ b/docs/regressions/regressions-dl21-passage-augmented.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-augmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-augmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl21-passage-d2q-t5.md b/docs/regressions/regressions-dl21-passage-d2q-t5.md similarity index 94% rename from docs/regressions-dl21-passage-d2q-t5.md rename to docs/regressions/regressions-dl21-passage-d2q-t5.md index 931491ffcf..05c586105c 100644 --- a/docs/regressions-dl21-passage-d2q-t5.md +++ b/docs/regressions/regressions-dl21-passage-d2q-t5.md @@ -5,10 +5,10 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-d2q-t5.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl21-passage-splade-pp-ed.md b/docs/regressions/regressions-dl21-passage-splade-pp-ed.md similarity index 94% rename from docs/regressions-dl21-passage-splade-pp-ed.md rename to docs/regressions/regressions-dl21-passage-splade-pp-ed.md index 817f90971d..b0e1c97460 100644 --- a/docs/regressions-dl21-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-dl21-passage-splade-pp-ed.md @@ -8,10 +8,10 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-EnsembleDistil](https: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 
[From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -65,7 +65,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-ed/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,6 +136,6 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-EnsembleDistil**| **+RM3** | **+Rocchio**| | [DL21 (Passage)](https://microsoft.github.io/msmarco/TREC-Deep-Learning) | 0.8586 | 0.8705 | 0.8964 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
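Reviewer note: every hunk in this change rewrites relative links because the pages move one directory deeper, from `docs/` to `docs/regressions/`. As a minimal sketch of how the rewritten links could be checked, the snippet below resolves a link against the page that contains it and compares the old and new targets. The helper `resolve_link` is a hypothetical name introduced here for illustration and is not part of Anserini; the paths are copied from the hunks above.

```python
import posixpath


def resolve_link(page_path: str, link: str) -> str:
    """Resolve a relative Markdown link against the page that contains it.

    Hypothetical helper for illustration only; not part of Anserini.
    """
    page_dir = posixpath.dirname(page_path)
    return posixpath.normpath(posixpath.join(page_dir, link))


# Old location of the page and the link it used before the move.
old_yaml = resolve_link(
    "docs/regressions-dl21-passage-splade-pp-ed.md",
    "../src/main/resources/regression/dl21-passage-splade-pp-ed.yaml",
)

# New location of the page and the rewritten link.
new_yaml = resolve_link(
    "docs/regressions/regressions-dl21-passage-splade-pp-ed.md",
    "../../src/main/resources/regression/dl21-passage-splade-pp-ed.yaml",
)

# The same check for a link that points at another page under docs/.
old_doc = resolve_link(
    "docs/regressions-dl21-passage-splade-pp-ed.md",
    "common-indexing-options.md",
)
new_doc = resolve_link(
    "docs/regressions/regressions-dl21-passage-splade-pp-ed.md",
    "../../docs/common-indexing-options.md",
)

print(old_yaml == new_yaml, old_doc == new_doc)
```

Both comparisons should agree if the move preserved the link targets. Note that `../../docs/common-indexing-options.md` climbs to the repository root and descends into `docs/` again, so it resolves to the same file as the shorter `../common-indexing-options.md` would.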
diff --git a/docs/regressions-dl21-passage-splade-pp-sd.md b/docs/regressions/regressions-dl21-passage-splade-pp-sd.md similarity index 94% rename from docs/regressions-dl21-passage-splade-pp-sd.md rename to docs/regressions/regressions-dl21-passage-splade-pp-sd.md index 30c92add28..f2ade9c756 100644 --- a/docs/regressions-dl21-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-dl21-passage-splade-pp-sd.md @@ -8,10 +8,10 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-SelfDistil](https://hu > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-splade-pp-sd.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -65,7 +65,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-sd/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -136,6 +136,6 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-SelfDistil**| **+RM3** | **+Rocchio**| | [DL21 (Passage)](https://microsoft.github.io/msmarco/TREC-Deep-Learning) | 0.8525 | 0.8655 | 0.8827 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-dl21-passage-unicoil-0shot.md b/docs/regressions/regressions-dl21-passage-unicoil-0shot.md similarity index 94% rename from docs/regressions-dl21-passage-unicoil-0shot.md rename to docs/regressions/regressions-dl21-passage-unicoil-0shot.md index fa821c8e5d..82370c5e97 100644 --- a/docs/regressions-dl21-passage-unicoil-0shot.md +++ b/docs/regressions/regressions-dl21-passage-unicoil-0shot.md @@ -10,10 +10,10 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-unicoil-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-unicoil-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -74,7 +74,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-0shot/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -148,8 +148,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `d_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-passage-unicoil-noexp-0shot.md b/docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md similarity index 94% rename from docs/regressions-dl21-passage-unicoil-noexp-0shot.md rename to docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md index 8e76a6ce4e..22d5d38beb 100644 --- a/docs/regressions-dl21-passage-unicoil-noexp-0shot.md +++ b/docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md @@ -10,10 +10,10 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage-unicoil-noexp-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage-unicoil-noexp-0shot.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -74,7 +74,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-noexp-0shot/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -148,8 +148,8 @@ With the above commands, you should be able to reproduce the following results: This run roughly corresponds to run `d_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-dl21-passage.md b/docs/regressions/regressions-dl21-passage.md similarity index 94% rename from docs/regressions-dl21-passage.md rename to docs/regressions/regressions-dl21-passage.md index e669d7c3b7..7c06872872 100644 --- a/docs/regressions-dl21-passage.md +++ b/docs/regressions/regressions-dl21-passage.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl21-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl21-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-passage.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl21-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl22-passage-augmented-d2q-t5.md b/docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md similarity index 95% rename from docs/regressions-dl22-passage-augmented-d2q-t5.md rename to docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md index a3ebbd5234..8f48735b4c 100644 --- a/docs/regressions-dl22-passage-augmented-d2q-t5.md +++ b/docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md @@ -5,10 +5,10 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-augmented-d2q-t5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-augmented-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl22-passage-augmented.md b/docs/regressions/regressions-dl22-passage-augmented.md similarity index 95% rename from docs/regressions-dl22-passage-augmented.md rename to docs/regressions/regressions-dl22-passage-augmented.md index 0bf657dbb5..2ee3f32c12 100644 --- a/docs/regressions-dl22-passage-augmented.md +++ b/docs/regressions/regressions-dl22-passage-augmented.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-augmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-augmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl22-passage-d2q-t5.md b/docs/regressions/regressions-dl22-passage-d2q-t5.md similarity index 95% rename from docs/regressions-dl22-passage-d2q-t5.md rename to docs/regressions/regressions-dl22-passage-d2q-t5.md index 1a46bf4c52..6531b3e380 100644 --- a/docs/regressions-dl22-passage-d2q-t5.md +++ b/docs/regressions/regressions-dl22-passage-d2q-t5.md @@ -5,10 +5,10 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-d2q-t5.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl22-passage-splade-pp-ed.md b/docs/regressions/regressions-dl22-passage-splade-pp-ed.md similarity index 94% rename from docs/regressions-dl22-passage-splade-pp-ed.md rename to docs/regressions/regressions-dl22-passage-splade-pp-ed.md index d55bd078a0..a4bbfada21 100644 --- a/docs/regressions-dl22-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-dl22-passage-splade-pp-ed.md @@ -8,10 +8,10 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-EnsembleDistil](https: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 
[From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -65,7 +65,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-ed/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,6 +139,6 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-EnsembleDistil**| **+RM3** | **+Rocchio**| | [DL22 (Passage)](https://microsoft.github.io/msmarco/TREC-Deep-Learning) | 0.6629 | 0.6367 | 0.6778 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-dl22-passage-splade-pp-sd.md b/docs/regressions/regressions-dl22-passage-splade-pp-sd.md similarity index 94% rename from docs/regressions-dl22-passage-splade-pp-sd.md rename to docs/regressions/regressions-dl22-passage-splade-pp-sd.md index 282d7adc37..d410568d2e 100644 --- a/docs/regressions-dl22-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-dl22-passage-splade-pp-sd.md @@ -8,10 +8,10 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-SelfDistil](https://hu > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-splade-pp-sd.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -65,7 +65,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-sd/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -139,6 +139,6 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-SelfDistil**| **+RM3** | **+Rocchio**| | [DL22 (Passage)](https://microsoft.github.io/msmarco/TREC-Deep-Learning) | 0.6551 | 0.6350 | 0.6696 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-dl22-passage-unicoil-0shot.md b/docs/regressions/regressions-dl22-passage-unicoil-0shot.md similarity index 96% rename from docs/regressions-dl22-passage-unicoil-0shot.md rename to docs/regressions/regressions-dl22-passage-unicoil-0shot.md index 5cf7db9f65..07aa285c5b 100644 --- a/docs/regressions-dl22-passage-unicoil-0shot.md +++ b/docs/regressions/regressions-dl22-passage-unicoil-0shot.md @@ -10,10 +10,10 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-unicoil-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-unicoil-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -74,7 +74,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-0shot/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-dl22-passage-unicoil-noexp-0shot.md b/docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md similarity index 96% rename from docs/regressions-dl22-passage-unicoil-noexp-0shot.md rename to docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md index ae77adb71b..310a135c6f 100644 --- a/docs/regressions-dl22-passage-unicoil-noexp-0shot.md +++ b/docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md @@ -10,10 +10,10 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage-unicoil-noexp-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage-unicoil-noexp-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -74,7 +74,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-noexp-0shot/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-dl22-passage.md b/docs/regressions/regressions-dl22-passage.md similarity index 95% rename from docs/regressions-dl22-passage.md rename to docs/regressions/regressions-dl22-passage.md index fad36db80a..426bafdbe5 100644 --- a/docs/regressions-dl22-passage.md +++ b/docs/regressions/regressions-dl22-passage.md @@ -5,10 +5,10 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl22-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl22-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl22-passage.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/dl22-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-fever.md b/docs/regressions/regressions-fever.md similarity index 89% rename from docs/regressions-fever.md rename to docs/regressions/regressions-fever.md index c1abbad8f4..8252ae83bc 100644 --- a/docs/regressions-fever.md +++ b/docs/regressions/regressions-fever.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for the [FEVER fact verification task](https://fever.ai/), which is integrated into Anserini's regression testing framework. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/fever.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/fever.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/fever.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/fever.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -25,9 +25,9 @@ target/appassembler/bin/IndexCollection \ >& logs/log.fever & ``` -The directory `/path/to/fever` should be a directory containing the expanded document collection; see [this link](../docs/experiments-fever.md) for how to prepare this collection. +The directory `/path/to/fever` should be a directory containing the expanded document collection; see [this page](../../docs/experiments-fever.md) for how to prepare this collection. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-fire12-bn.md b/docs/regressions/regressions-fire12-bn.md similarity index 92% rename from docs/regressions-fire12-bn.md rename to docs/regressions/regressions-fire12-bn.md index cafea8db93..4b3732f4ba 100644 --- a/docs/regressions-fire12-bn.md +++ b/docs/regressions/regressions-fire12-bn.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [FIRE 2012 ad hoc retrieval (Monolingual Bengali)](https://www.isical.ac.in/~fire/2012/adhoc.html). The document collection can be found in [FIRE data page](http://fire.irsi.res.in/fire/static/data). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/fire12-bn.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/fire12-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/fire12-bn.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/fire12-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/fire12-bn/` should be a directory containing the collection, containing `bn_ABP` and `bn_BDNews24` directories. There should be 500,122 documents in total. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-fire12-en.md b/docs/regressions/regressions-fire12-en.md similarity index 92% rename from docs/regressions-fire12-en.md rename to docs/regressions/regressions-fire12-en.md index cf32a7fe8b..a24224aa0c 100644 --- a/docs/regressions-fire12-en.md +++ b/docs/regressions/regressions-fire12-en.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [FIRE 2012 ad hoc retrieval (Monolingual English)](https://www.isical.ac.in/~fire/2012/adhoc.html). The document collection can be found in [FIRE data page](http://fire.irsi.res.in/fire/static/data). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/fire12-en.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/fire12-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/fire12-en.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/fire12-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/fire12-en/` should be a directory containing the collection, containing `en_BDNews24` and `en_TheTelegraph_2001-2010` directories. There should be 392,577 documents in total. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-fire12-hi.md b/docs/regressions/regressions-fire12-hi.md similarity index 92% rename from docs/regressions-fire12-hi.md rename to docs/regressions/regressions-fire12-hi.md index 7193fdf7e0..c9fb43f662 100644 --- a/docs/regressions-fire12-hi.md +++ b/docs/regressions/regressions-fire12-hi.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [FIRE 2012 ad hoc retrieval (Monolingual Hindi)](https://www.isical.ac.in/~fire/2012/adhoc.html). The document collection can be found in [FIRE data page](http://fire.irsi.res.in/fire/static/data). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/fire12-hi.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/fire12-hi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/fire12-hi.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/fire12-hi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/fire12-hi/` should be a directory containing the collection, containing `hi_AmarUjala` and `hi_NavbharatTimes` directories. There should be 331,599 documents in total. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-gov2.md b/docs/regressions/regressions-gov2.md similarity index 97% rename from docs/regressions-gov2.md rename to docs/regressions/regressions-gov2.md index b8a46cc776..3b83c63d10 100644 --- a/docs/regressions-gov2.md +++ b/docs/regressions/regressions-gov2.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Terabyte Tracks from TREC 2004 to 2006, which uses the [Gov2 collection](http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/gov2.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/gov2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/gov2.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/gov2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/gov2/` should be the root directory of the [Gov2 collection](http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm), i.e., `ls /path/to/gov2/` should bring up a bunch of subdirectories, `GX000` to `GX272`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-hc4-neuclir22-fa-en.md b/docs/regressions/regressions-hc4-neuclir22-fa-en.md similarity index 96% rename from docs/regressions-hc4-neuclir22-fa-en.md rename to docs/regressions/regressions-hc4-neuclir22-fa-en.md index 06c943ab98..b9a988ebd0 100644 --- a/docs/regressions-hc4-neuclir22-fa-en.md +++ b/docs/regressions/regressions-hc4-neuclir22-fa-en.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Persian topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in English and the corpus is in English (automatically translated by the organizers using Sockeye). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-fa-en.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-fa-en.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -41,7 +41,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -173,8 +173,8 @@ With the above commands, you should be able to reproduce the following results: The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-neuclir22-fa.md b/docs/regressions/regressions-hc4-neuclir22-fa.md similarity index 96% rename from docs/regressions-hc4-neuclir22-fa.md rename to docs/regressions/regressions-hc4-neuclir22-fa.md index 8488518ffe..749e7099c4 100644 --- a/docs/regressions-hc4-neuclir22-fa.md +++ b/docs/regressions/regressions-hc4-neuclir22-fa.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Persian topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in Persian (human translations) and the corpus is in Persian. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-fa.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-fa.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -41,7 +41,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -173,8 +173,8 @@ With the above commands, you should be able to reproduce the following results: The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-fa.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-fa.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-neuclir22-ru-en.md b/docs/regressions/regressions-hc4-neuclir22-ru-en.md similarity index 96% rename from docs/regressions-hc4-neuclir22-ru-en.md rename to docs/regressions/regressions-hc4-neuclir22-ru-en.md index 9282679d20..b0b28311e5 100644 --- a/docs/regressions-hc4-neuclir22-ru-en.md +++ b/docs/regressions/regressions-hc4-neuclir22-ru-en.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Russian topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in English and the corpus is in English (automatically translated by the organizers using Sockeye). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-ru-en.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-ru-en.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -42,7 +42,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -172,8 +172,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Russian): test-topic description](https://github.com/hltcoe/HC4) | 0.6632 | 0.6866 | 0.6721 | | [HC4 (Russian): test-topic description+title](https://github.com/hltcoe/HC4) | 0.6783 | 0.7089 | 0.7427 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-neuclir22-ru.md b/docs/regressions/regressions-hc4-neuclir22-ru.md similarity index 96% rename from docs/regressions-hc4-neuclir22-ru.md rename to docs/regressions/regressions-hc4-neuclir22-ru.md index 2550308907..ee3a5eacb2 100644 --- a/docs/regressions-hc4-neuclir22-ru.md +++ b/docs/regressions/regressions-hc4-neuclir22-ru.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Russian topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in Russian (human translations) and the corpus is in Russian. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-ru.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-ru.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -42,7 +42,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -172,8 +172,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Russian): test-topic description](https://github.com/hltcoe/HC4) | 0.6640 | 0.5408 | 0.6407 | | [HC4 (Russian): test-topic description+title](https://github.com/hltcoe/HC4) | 0.6667 | 0.6254 | 0.6810 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-ru.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-ru.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-neuclir22-zh-en.md b/docs/regressions/regressions-hc4-neuclir22-zh-en.md similarity index 96% rename from docs/regressions-hc4-neuclir22-zh-en.md rename to docs/regressions/regressions-hc4-neuclir22-zh-en.md index 471cdd29ca..2bbbc4ffd9 100644 --- a/docs/regressions-hc4-neuclir22-zh-en.md +++ b/docs/regressions/regressions-hc4-neuclir22-zh-en.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Chinese topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in English and the corpus is in English (automatically translated by the organizers using Sockeye). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-zh-en.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-zh-en.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -41,7 +41,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -171,8 +171,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Chinese): test-topic description](https://github.com/hltcoe/HC4) | 0.5573 | 0.6077 | 0.6142 | | [HC4 (Chinese): test-topic description+title](https://github.com/hltcoe/HC4) | 0.6182 | 0.6482 | 0.6516 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-neuclir22-zh.md b/docs/regressions/regressions-hc4-neuclir22-zh.md similarity index 96% rename from docs/regressions-hc4-neuclir22-zh.md rename to docs/regressions/regressions-hc4-neuclir22-zh.md index 015c9604d1..40c7edcd34 100644 --- a/docs/regressions-hc4-neuclir22-zh.md +++ b/docs/regressions/regressions-hc4-neuclir22-zh.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) Chinese topics]( The HC4 qrels have been filtered down to include only those in the intersection of the HC4 and NeuCLIR22 corpora. To be clear, the queries are in Chinese (human translations) and the corpus is in Chinese. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-neuclir22-zh.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-neuclir22-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-neuclir22-zh.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -41,7 +41,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -171,8 +171,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Chinese): test-topic description](https://github.com/hltcoe/HC4) | 0.3565 | 0.2407 | 0.3858 | | [HC4 (Chinese): test-topic description+title](https://github.com/hltcoe/HC4) | 0.4442 | 0.2811 | 0.4259 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-neuclir22-zh.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-neuclir22-zh.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-v1.0-fa.md b/docs/regressions/regressions-hc4-v1.0-fa.md similarity index 97% rename from docs/regressions-hc4-v1.0-fa.md rename to docs/regressions/regressions-hc4-v1.0-fa.md index 51ed308b81..2b7bfcf52a 100644 --- a/docs/regressions-hc4-v1.0-fa.md +++ b/docs/regressions/regressions-hc4-v1.0-fa.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) — Persian](https://github.com/hltcoe/HC4) ([paper](https://arxiv.org/pdf/2201.09992.pdf)). To be clear, the queries are in Persian (human translations) and the corpus is in Persian. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-v1.0-fa.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-v1.0-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-v1.0-fa.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-v1.0-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -274,8 +274,8 @@ With the above commands, you should be able to reproduce the following results: The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-v1.0-fa.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-v1.0-fa.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-v1.0-ru.md b/docs/regressions/regressions-hc4-v1.0-ru.md similarity index 97% rename from docs/regressions-hc4-v1.0-ru.md rename to docs/regressions/regressions-hc4-v1.0-ru.md index 0e83090b0c..a0f7dd857e 100644 --- a/docs/regressions-hc4-v1.0-ru.md +++ b/docs/regressions/regressions-hc4-v1.0-ru.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) — Russian](https://github.com/hltcoe/HC4) ([paper](https://arxiv.org/pdf/2201.09992.pdf)). To be clear, the queries are in Russian (human translations) and the corpus is in Russian. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-v1.0-ru.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-v1.0-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-v1.0-ru.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-v1.0-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -41,7 +41,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -273,8 +273,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Russian): test-topic description](https://github.com/hltcoe/HC4) | 0.7355 | 0.6530 | 0.7680 | | [HC4 (Russian): test-topic description+title](https://github.com/hltcoe/HC4) | 0.7721 | 0.7335 | 0.8271 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-v1.0-ru.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-v1.0-ru.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-hc4-v1.0-zh.md b/docs/regressions/regressions-hc4-v1.0-zh.md similarity index 97% rename from docs/regressions-hc4-v1.0-zh.md rename to docs/regressions/regressions-hc4-v1.0-zh.md index 8c2cd1daaa..2af1f33b4b 100644 --- a/docs/regressions-hc4-v1.0-zh.md +++ b/docs/regressions/regressions-hc4-v1.0-zh.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [HC4 (v1.0) — Chinese](https://github.com/hltcoe/HC4) ([paper](https://arxiv.org/pdf/2201.09992.pdf)). To be clear, the queries are in Chinese (human translations) and the corpus is in Chinese. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-v1.0-zh.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-v1.0-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/hc4-v1.0-zh.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/hc4-v1.0-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -272,8 +272,8 @@ With the above commands, you should be able to reproduce the following results: | [HC4 (Chinese): test-topic description](https://github.com/hltcoe/HC4) | 0.6358 | 0.4914 | 0.6481 | | [HC4 (Chinese): test-topic description+title](https://github.com/hltcoe/HC4) | 0.7100 | 0.5979 | 0.7074 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/hc4-v1.0-zh.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/hc4-v1.0-zh.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-13 (commit [`500e87`](https://github.com/castorini/anserini/commit/500e872d594a86cbf01adae644479f74a4b4af2d)) diff --git a/docs/regressions-mb11.md b/docs/regressions/regressions-mb11.md similarity index 97% rename from docs/regressions-mb11.md rename to docs/regressions/regressions-mb11.md index 4693469017..737062fe1d 100644 --- a/docs/regressions-mb11.md +++ b/docs/regressions/regressions-mb11.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Microblog Tracks from TREC 2011 and 2012 using the Tweets2011 collection. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mb11.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mb11.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mb11.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mb11.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,7 +36,7 @@ More available indexing options: * `-tweet.maxId`: the max tweet Id for indexing. Tweet Ids that are larger (when being parsed to Long type) than this value will NOT be indexed, default `LONG.MAX_VALUE` * `-tweet.deletedIdsFile`: a file that contains deleted tweetIds, one per line. these tweeets won't be indexed -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-mb13.md b/docs/regressions/regressions-mb13.md similarity index 97% rename from docs/regressions-mb13.md rename to docs/regressions/regressions-mb13.md index f7cff8b332..9aecbdce7e 100644 --- a/docs/regressions-mb13.md +++ b/docs/regressions/regressions-mb13.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the Microblog Tracks from TREC 2013 and 2014 using the Tweets2013 collection. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mb13.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mb13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mb13.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mb13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -36,7 +36,7 @@ More available indexing options: * `-tweet.maxId`: the max tweet Id for indexing. Tweet Ids that are larger (when being parsed to Long type) than this value will NOT be indexed, default `LONG.MAX_VALUE` * `-tweet.deletedIdsFile`: a file that contains deleted tweetIds, one per line. these tweeets won't be indexed -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ar-aca.md b/docs/regressions/regressions-miracl-v1.0-ar-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-ar-aca.md rename to docs/regressions/regressions-miracl-v1.0-ar-aca.md index e6ef9487f2..b2a105600c 100644 --- a/docs/regressions-miracl-v1.0-ar-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-ar-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Arabic](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ar-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ar-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ar.md b/docs/regressions/regressions-miracl-v1.0-ar.md similarity index 89% rename from docs/regressions-miracl-v1.0-ar.md rename to docs/regressions/regressions-miracl-v1.0-ar.md index 4350de8ad0..e8d8687310 100644 --- a/docs/regressions-miracl-v1.0-ar.md +++ b/docs/regressions/regressions-miracl-v1.0-ar.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Arabic](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ar.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ar.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-bn-aca.md b/docs/regressions/regressions-miracl-v1.0-bn-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-bn-aca.md rename to docs/regressions/regressions-miracl-v1.0-bn-aca.md index bbf56c1061..19631e59bd 100644 --- a/docs/regressions-miracl-v1.0-bn-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-bn-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Bengali](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-bn-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-bn-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-bn.md b/docs/regressions/regressions-miracl-v1.0-bn.md similarity index 89% rename from docs/regressions-miracl-v1.0-bn.md rename to docs/regressions/regressions-miracl-v1.0-bn.md index 464b023040..a7f70455fc 100644 --- a/docs/regressions-miracl-v1.0-bn.md +++ b/docs/regressions/regressions-miracl-v1.0-bn.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Bengali](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-bn.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-bn.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-en-aca.md b/docs/regressions/regressions-miracl-v1.0-en-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-en-aca.md rename to docs/regressions/regressions-miracl-v1.0-en-aca.md index 7b48a4355e..986d0e5836 100644 --- a/docs/regressions-miracl-v1.0-en-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-en-aca.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — English](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-en-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-en-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-en-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-en-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-en.md b/docs/regressions/regressions-miracl-v1.0-en.md similarity index 89% rename from docs/regressions-miracl-v1.0-en.md rename to docs/regressions/regressions-miracl-v1.0-en.md index 010160a520..f08cde17e0 100644 --- a/docs/regressions-miracl-v1.0-en.md +++ b/docs/regressions/regressions-miracl-v1.0-en.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — English](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-en.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-en.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-es-aca.md b/docs/regressions/regressions-miracl-v1.0-es-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-es-aca.md rename to docs/regressions/regressions-miracl-v1.0-es-aca.md index 8245006d38..4cf2a8f6dd 100644 --- a/docs/regressions-miracl-v1.0-es-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-es-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Spanish](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-es-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-es-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-es-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-es-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-es.md b/docs/regressions/regressions-miracl-v1.0-es.md similarity index 89% rename from docs/regressions-miracl-v1.0-es.md rename to docs/regressions/regressions-miracl-v1.0-es.md index 8d7c8bd6e8..cffc92ca8b 100644 --- a/docs/regressions-miracl-v1.0-es.md +++ b/docs/regressions/regressions-miracl-v1.0-es.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Spanish](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-es.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-es.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-es.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-es.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fa-aca.md b/docs/regressions/regressions-miracl-v1.0-fa-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-fa-aca.md rename to docs/regressions/regressions-miracl-v1.0-fa-aca.md index a7e5109057..d3fd4537a1 100644 --- a/docs/regressions-miracl-v1.0-fa-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-fa-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Persian](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fa-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fa-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fa.md b/docs/regressions/regressions-miracl-v1.0-fa.md similarity index 89% rename from docs/regressions-miracl-v1.0-fa.md rename to docs/regressions/regressions-miracl-v1.0-fa.md index 5a97a833ff..674a4bcae7 100644 --- a/docs/regressions-miracl-v1.0-fa.md +++ b/docs/regressions/regressions-miracl-v1.0-fa.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Persian](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fa.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fa.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fi-aca.md b/docs/regressions/regressions-miracl-v1.0-fi-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-fi-aca.md rename to docs/regressions/regressions-miracl-v1.0-fi-aca.md index 87f3ed8ca2..937f2f0650 100644 --- a/docs/regressions-miracl-v1.0-fi-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-fi-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Finnish](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fi-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fi-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fi.md b/docs/regressions/regressions-miracl-v1.0-fi.md similarity index 89% rename from docs/regressions-miracl-v1.0-fi.md rename to docs/regressions/regressions-miracl-v1.0-fi.md index b185d1043a..3256bfce15 100644 --- a/docs/regressions-miracl-v1.0-fi.md +++ b/docs/regressions/regressions-miracl-v1.0-fi.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Finnish](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fi.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fi.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fr-aca.md b/docs/regressions/regressions-miracl-v1.0-fr-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-fr-aca.md rename to docs/regressions/regressions-miracl-v1.0-fr-aca.md index 177ae54b38..041fc7afd2 100644 --- a/docs/regressions-miracl-v1.0-fr-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-fr-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — French](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fr-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fr-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-fr.md b/docs/regressions/regressions-miracl-v1.0-fr.md similarity index 89% rename from docs/regressions-miracl-v1.0-fr.md rename to docs/regressions/regressions-miracl-v1.0-fr.md index a0952133b6..998a606170 100644 --- a/docs/regressions-miracl-v1.0-fr.md +++ b/docs/regressions/regressions-miracl-v1.0-fr.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — French](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-fr.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-fr.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-hi-aca.md b/docs/regressions/regressions-miracl-v1.0-hi-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-hi-aca.md rename to docs/regressions/regressions-miracl-v1.0-hi-aca.md index 312971fd13..419939deb3 100644 --- a/docs/regressions-miracl-v1.0-hi-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-hi-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Hindi](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-hi-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-hi-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-hi.md b/docs/regressions/regressions-miracl-v1.0-hi.md similarity index 89% rename from docs/regressions-miracl-v1.0-hi.md rename to docs/regressions/regressions-miracl-v1.0-hi.md index 08264f68d4..383df69b2f 100644 --- a/docs/regressions-miracl-v1.0-hi.md +++ b/docs/regressions/regressions-miracl-v1.0-hi.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Hindi](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-hi.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-hi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-hi.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-hi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-id-aca.md b/docs/regressions/regressions-miracl-v1.0-id-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-id-aca.md rename to docs/regressions/regressions-miracl-v1.0-id-aca.md index 9119dec9b6..043dbac525 100644 --- a/docs/regressions-miracl-v1.0-id-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-id-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Indonesian](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-id-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-id-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-id-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-id-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-id.md b/docs/regressions/regressions-miracl-v1.0-id.md similarity index 89% rename from docs/regressions-miracl-v1.0-id.md rename to docs/regressions/regressions-miracl-v1.0-id.md index f18f8817c3..41cc08705a 100644 --- a/docs/regressions-miracl-v1.0-id.md +++ b/docs/regressions/regressions-miracl-v1.0-id.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Indonesian](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-id.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-id.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-id.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-id.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ja-aca.md b/docs/regressions/regressions-miracl-v1.0-ja-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-ja-aca.md rename to docs/regressions/regressions-miracl-v1.0-ja-aca.md index a4f614c0b6..be267a0565 100644 --- a/docs/regressions-miracl-v1.0-ja-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-ja-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Japanese](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ja-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ja-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ja.md b/docs/regressions/regressions-miracl-v1.0-ja.md similarity index 89% rename from docs/regressions-miracl-v1.0-ja.md rename to docs/regressions/regressions-miracl-v1.0-ja.md index cb4c82fbaa..5024c8108d 100644 --- a/docs/regressions-miracl-v1.0-ja.md +++ b/docs/regressions/regressions-miracl-v1.0-ja.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Japanese](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ja.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ja.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ja.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ja.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ko-aca.md b/docs/regressions/regressions-miracl-v1.0-ko-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-ko-aca.md rename to docs/regressions/regressions-miracl-v1.0-ko-aca.md index ebc51d898d..f9a38ba673 100644 --- a/docs/regressions-miracl-v1.0-ko-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-ko-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Korean](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ko-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ko-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ko.md b/docs/regressions/regressions-miracl-v1.0-ko.md similarity index 89% rename from docs/regressions-miracl-v1.0-ko.md rename to docs/regressions/regressions-miracl-v1.0-ko.md index 72bfa1b3e2..d419b53024 100644 --- a/docs/regressions-miracl-v1.0-ko.md +++ b/docs/regressions/regressions-miracl-v1.0-ko.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Korean](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ko.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ko.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ko.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ko.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ru-aca.md b/docs/regressions/regressions-miracl-v1.0-ru-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-ru-aca.md rename to docs/regressions/regressions-miracl-v1.0-ru-aca.md index 7e312191fe..41c7d8d185 100644 --- a/docs/regressions-miracl-v1.0-ru-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-ru-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Russian](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ru-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ru-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-ru.md b/docs/regressions/regressions-miracl-v1.0-ru.md similarity index 89% rename from docs/regressions-miracl-v1.0-ru.md rename to docs/regressions/regressions-miracl-v1.0-ru.md index 711ba0c065..1558891615 100644 --- a/docs/regressions-miracl-v1.0-ru.md +++ b/docs/regressions/regressions-miracl-v1.0-ru.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Russian](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-ru.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-ru.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-sw-aca.md b/docs/regressions/regressions-miracl-v1.0-sw-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-sw-aca.md rename to docs/regressions/regressions-miracl-v1.0-sw-aca.md index f4d7b749af..a3af03e24e 100644 --- a/docs/regressions-miracl-v1.0-sw-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-sw-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Swahili](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-sw-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-sw-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-sw.md b/docs/regressions/regressions-miracl-v1.0-sw.md similarity index 89% rename from docs/regressions-miracl-v1.0-sw.md rename to docs/regressions/regressions-miracl-v1.0-sw.md index 75341e7e6f..0627413438 100644 --- a/docs/regressions-miracl-v1.0-sw.md +++ b/docs/regressions/regressions-miracl-v1.0-sw.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Swahili](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-sw.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-sw.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-sw.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-sw.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-te-aca.md b/docs/regressions/regressions-miracl-v1.0-te-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-te-aca.md rename to docs/regressions/regressions-miracl-v1.0-te-aca.md index 34a8503391..7b88a6a527 100644 --- a/docs/regressions-miracl-v1.0-te-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-te-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Telugu](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-te-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-te-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-te-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-te-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-te.md b/docs/regressions/regressions-miracl-v1.0-te.md similarity index 89% rename from docs/regressions-miracl-v1.0-te.md rename to docs/regressions/regressions-miracl-v1.0-te.md index 1b70ff1d2e..ffff55a3c3 100644 --- a/docs/regressions-miracl-v1.0-te.md +++ b/docs/regressions/regressions-miracl-v1.0-te.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Telugu](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-te.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-te.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-te.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-te.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-th-aca.md b/docs/regressions/regressions-miracl-v1.0-th-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-th-aca.md rename to docs/regressions/regressions-miracl-v1.0-th-aca.md index 19f4de74ce..0a371b7386 100644 --- a/docs/regressions-miracl-v1.0-th-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-th-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Thai](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-th-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-th-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-th-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-th-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-th.md b/docs/regressions/regressions-miracl-v1.0-th.md similarity index 89% rename from docs/regressions-miracl-v1.0-th.md rename to docs/regressions/regressions-miracl-v1.0-th.md index 907f1b8ecf..74e4e2ffde 100644 --- a/docs/regressions-miracl-v1.0-th.md +++ b/docs/regressions/regressions-miracl-v1.0-th.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Thai](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-th.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-th.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-th.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-th.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-zh-aca.md b/docs/regressions/regressions-miracl-v1.0-zh-aca.md similarity index 89% rename from docs/regressions-miracl-v1.0-zh-aca.md rename to docs/regressions/regressions-miracl-v1.0-zh-aca.md index 3e7e4eb5e5..e454b14191 100644 --- a/docs/regressions-miracl-v1.0-zh-aca.md +++ b/docs/regressions/regressions-miracl-v1.0-zh-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Chinese](https://github.com/project-miracl/miracl) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-zh-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-zh-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-miracl-v1.0-zh.md b/docs/regressions/regressions-miracl-v1.0-zh.md similarity index 89% rename from docs/regressions-miracl-v1.0-zh.md rename to docs/regressions/regressions-miracl-v1.0-zh.md index eb9f879590..f2c2fe5696 100644 --- a/docs/regressions-miracl-v1.0-zh.md +++ b/docs/regressions/regressions-miracl-v1.0-zh.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [MIRACL (v1.0) — Chinese](https://github.com/project-miracl/miracl). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/miracl-v1.0-zh.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/miracl-v1.0-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/miracl-v1.0-zh.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/miracl-v1.0-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ar-aca.md b/docs/regressions/regressions-mrtydi-v1.1-ar-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-ar-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-ar-aca.md index ba06067742..dd143bfae8 100644 --- a/docs/regressions-mrtydi-v1.1-ar-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ar-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Arabic](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ar-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ar-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ar.md b/docs/regressions/regressions-mrtydi-v1.1-ar.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-ar.md rename to docs/regressions/regressions-mrtydi-v1.1-ar.md index a5fcb8523a..6821c4144f 100644 --- a/docs/regressions-mrtydi-v1.1-ar.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ar.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Arabic](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ar.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ar.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-bn-aca.md b/docs/regressions/regressions-mrtydi-v1.1-bn-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-bn-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-bn-aca.md index 0ff17ec9d9..e80fdeeac3 100644 --- a/docs/regressions-mrtydi-v1.1-bn-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-bn-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Bengali](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-bn-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-bn-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-bn.md b/docs/regressions/regressions-mrtydi-v1.1-bn.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-bn.md rename to docs/regressions/regressions-mrtydi-v1.1-bn.md index 21f8f09bf7..bc84a274c1 100644 --- a/docs/regressions-mrtydi-v1.1-bn.md +++ b/docs/regressions/regressions-mrtydi-v1.1-bn.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Bengali](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-bn.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-bn.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-bn.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-en-aca.md b/docs/regressions/regressions-mrtydi-v1.1-en-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-en-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-en-aca.md index a823cf9c5b..63ab786d5b 100644 --- a/docs/regressions-mrtydi-v1.1-en-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-en-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — English](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-en-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-en-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-en.md b/docs/regressions/regressions-mrtydi-v1.1-en.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-en.md rename to docs/regressions/regressions-mrtydi-v1.1-en.md index 89af119638..51276c37d1 100644 --- a/docs/regressions-mrtydi-v1.1-en.md +++ b/docs/regressions/regressions-mrtydi-v1.1-en.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — English](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-en.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-en.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-en.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-fi-aca.md b/docs/regressions/regressions-mrtydi-v1.1-fi-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-fi-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-fi-aca.md index efa68b6b32..8c5d566fbe 100644 --- a/docs/regressions-mrtydi-v1.1-fi-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-fi-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Finnish](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-fi-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-fi-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-fi.md b/docs/regressions/regressions-mrtydi-v1.1-fi.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-fi.md rename to docs/regressions/regressions-mrtydi-v1.1-fi.md index 9d91b87e9f..e1bcbc15c8 100644 --- a/docs/regressions-mrtydi-v1.1-fi.md +++ b/docs/regressions/regressions-mrtydi-v1.1-fi.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Finnish](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-fi.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-fi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-fi.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-fi.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-id-aca.md b/docs/regressions/regressions-mrtydi-v1.1-id-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-id-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-id-aca.md index 54feece942..509d03d926 100644 --- a/docs/regressions-mrtydi-v1.1-id-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-id-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Indonesian](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-id-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-id-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-id.md b/docs/regressions/regressions-mrtydi-v1.1-id.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-id.md rename to docs/regressions/regressions-mrtydi-v1.1-id.md index ba873a1337..daeedaa4b7 100644 --- a/docs/regressions-mrtydi-v1.1-id.md +++ b/docs/regressions/regressions-mrtydi-v1.1-id.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Indonesian](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-id.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-id.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-id.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-id.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ja-aca.md b/docs/regressions/regressions-mrtydi-v1.1-ja-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-ja-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-ja-aca.md index b20d8a72e6..f18c76f702 100644 --- a/docs/regressions-mrtydi-v1.1-ja-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ja-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Japanese](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ja-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ja-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ja.md b/docs/regressions/regressions-mrtydi-v1.1-ja.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-ja.md rename to docs/regressions/regressions-mrtydi-v1.1-ja.md index 3a7a2701f9..a1c67bcce4 100644 --- a/docs/regressions-mrtydi-v1.1-ja.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ja.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Japanese](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ja.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ja.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ja.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ja.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ko-aca.md b/docs/regressions/regressions-mrtydi-v1.1-ko-aca.md similarity index 93% rename from docs/regressions-mrtydi-v1.1-ko-aca.md rename to docs/regressions/regressions-mrtydi-v1.1-ko-aca.md index fb881ee717..6495187ad2 100644 --- a/docs/regressions-mrtydi-v1.1-ko-aca.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ko-aca.md @@ -4,8 +4,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Korean](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ko-aca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ko-aca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-mrtydi-v1.1-ko.md b/docs/regressions/regressions-mrtydi-v1.1-ko.md similarity index 92% rename from docs/regressions-mrtydi-v1.1-ko.md rename to docs/regressions/regressions-mrtydi-v1.1-ko.md index 7b2515c112..e1030d8814 100644 --- a/docs/regressions-mrtydi-v1.1-ko.md +++ b/docs/regressions/regressions-mrtydi-v1.1-ko.md @@ -2,8 +2,8 @@ This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Korean](https://github.com/castorini/mr.tydi). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ko.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ko.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ko.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ko.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-ru-aca.md b/docs/regressions/regressions-mrtydi-v1.1-ru-aca.md
similarity index 93%
rename from docs/regressions-mrtydi-v1.1-ru-aca.md
rename to docs/regressions/regressions-mrtydi-v1.1-ru-aca.md
index eb3be79c56..c91c2c3225 100644
--- a/docs/regressions-mrtydi-v1.1-ru-aca.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-ru-aca.md
@@ -4,8 +4,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Russian](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ru-aca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ru-aca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-ru.md b/docs/regressions/regressions-mrtydi-v1.1-ru.md
similarity index 92%
rename from docs/regressions-mrtydi-v1.1-ru.md
rename to docs/regressions/regressions-mrtydi-v1.1-ru.md
index d6f6e5cfb8..91b26dd682 100644
--- a/docs/regressions-mrtydi-v1.1-ru.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-ru.md
@@ -2,8 +2,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Russian](https://github.com/castorini/mr.tydi).
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-ru.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-ru.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-ru.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-sw-aca.md b/docs/regressions/regressions-mrtydi-v1.1-sw-aca.md
similarity index 93%
rename from docs/regressions-mrtydi-v1.1-sw-aca.md
rename to docs/regressions/regressions-mrtydi-v1.1-sw-aca.md
index ff98751d93..1c4a9fd4ad 100644
--- a/docs/regressions-mrtydi-v1.1-sw-aca.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-sw-aca.md
@@ -4,8 +4,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Swahili](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-sw-aca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-sw-aca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-sw.md b/docs/regressions/regressions-mrtydi-v1.1-sw.md
similarity index 92%
rename from docs/regressions-mrtydi-v1.1-sw.md
rename to docs/regressions/regressions-mrtydi-v1.1-sw.md
index 2f117ba731..35690ee565 100644
--- a/docs/regressions-mrtydi-v1.1-sw.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-sw.md
@@ -2,8 +2,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Swahili](https://github.com/castorini/mr.tydi).
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-sw.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-sw.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-sw.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-sw.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-te-aca.md b/docs/regressions/regressions-mrtydi-v1.1-te-aca.md
similarity index 93%
rename from docs/regressions-mrtydi-v1.1-te-aca.md
rename to docs/regressions/regressions-mrtydi-v1.1-te-aca.md
index 86f1614737..2735d94990 100644
--- a/docs/regressions-mrtydi-v1.1-te-aca.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-te-aca.md
@@ -4,8 +4,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Telugu](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-te-aca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-te-aca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-te.md b/docs/regressions/regressions-mrtydi-v1.1-te.md
similarity index 92%
rename from docs/regressions-mrtydi-v1.1-te.md
rename to docs/regressions/regressions-mrtydi-v1.1-te.md
index 3e0eab4c67..bf39293cc4 100644
--- a/docs/regressions-mrtydi-v1.1-te.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-te.md
@@ -2,8 +2,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Telugu](https://github.com/castorini/mr.tydi).
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-te.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-te.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-te.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-te.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-th-aca.md b/docs/regressions/regressions-mrtydi-v1.1-th-aca.md
similarity index 93%
rename from docs/regressions-mrtydi-v1.1-th-aca.md
rename to docs/regressions/regressions-mrtydi-v1.1-th-aca.md
index cbd103ea08..c2343b1cca 100644
--- a/docs/regressions-mrtydi-v1.1-th-aca.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-th-aca.md
@@ -4,8 +4,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Thai](https://github.com/castorini/mr.tydi) using `AutoCompositeAnalyzer`.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-th-aca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-th-aca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-mrtydi-v1.1-th.md b/docs/regressions/regressions-mrtydi-v1.1-th.md
similarity index 92%
rename from docs/regressions-mrtydi-v1.1-th.md
rename to docs/regressions/regressions-mrtydi-v1.1-th.md
index a30e646376..76b48cb6a8 100644
--- a/docs/regressions-mrtydi-v1.1-th.md
+++ b/docs/regressions/regressions-mrtydi-v1.1-th.md
@@ -2,8 +2,8 @@
 This page documents BM25 regression experiments for [Mr. TyDi (v1.1) — Thai](https://github.com/castorini/mr.tydi).
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/mrtydi-v1.1-th.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/mrtydi-v1.1-th.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/mrtydi-v1.1-th.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/mrtydi-v1.1-th.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -26,7 +26,7 @@ target/appassembler/bin/IndexCollection \ ```
 See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-msmarco-doc-ca.md b/docs/regressions/regressions-msmarco-doc-ca.md
similarity index 91%
rename from docs/regressions-msmarco-doc-ca.md
rename to docs/regressions/regressions-msmarco-doc-ca.md
index 492da8b1a0..153683c489 100644
--- a/docs/regressions-msmarco-doc-ca.md
+++ b/docs/regressions/regressions-msmarco-doc-ca.md
@@ -5,8 +5,8 @@
 This page documents regression experiments on the [MS MARCO document ranking task](https://github.com/microsoft/MSMARCO-Document-Ranking), which is integrated into Anserini's regression testing framework.
 Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased).
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-ca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-ca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -29,9 +29,9 @@ target/appassembler/bin/IndexCollection \ ```
 The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-msmarco-doc-docTTTTTquery.md b/docs/regressions/regressions-msmarco-doc-docTTTTTquery.md
similarity index 93%
rename from docs/regressions-msmarco-doc-docTTTTTquery.md
rename to docs/regressions/regressions-msmarco-doc-docTTTTTquery.md
index 59d2b31457..cc69c9affd 100644
--- a/docs/regressions-msmarco-doc-docTTTTTquery.md
+++ b/docs/regressions/regressions-msmarco-doc-docTTTTTquery.md
@@ -10,10 +10,10 @@ Note that there are four different bag-of-words regression conditions for this t
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -38,9 +38,9 @@ target/appassembler/bin/IndexCollection \ ```
 The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
@@ -101,7 +101,7 @@ Explanation of settings:
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](../../docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 ## Additional Implementation Details
diff --git a/docs/regressions-msmarco-doc-hgf-wp.md b/docs/regressions/regressions-msmarco-doc-hgf-wp.md
similarity index 93%
rename from docs/regressions-msmarco-doc-hgf-wp.md
rename to docs/regressions/regressions-msmarco-doc-hgf-wp.md
index 178227ce72..d6d1824788 100644
--- a/docs/regressions-msmarco-doc-hgf-wp.md
+++ b/docs/regressions/regressions-msmarco-doc-hgf-wp.md
@@ -7,8 +7,8 @@ Here we are using **WordPiece tokenization** (i.e., from BERT) with the followin
 In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-hgf-wp.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-hgf-wp.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -32,7 +32,7 @@ target/appassembler/bin/IndexCollection \
 The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-msmarco-doc-segmented-ca.md b/docs/regressions/regressions-msmarco-doc-segmented-ca.md
similarity index 91%
rename from docs/regressions-msmarco-doc-segmented-ca.md
rename to docs/regressions/regressions-msmarco-doc-segmented-ca.md
index 29fe567aad..b24413db67 100644
--- a/docs/regressions-msmarco-doc-segmented-ca.md
+++ b/docs/regressions/regressions-msmarco-doc-segmented-ca.md
@@ -5,17 +5,17 @@
 This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html).
 Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast).
-For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md).
+For additional instructions on working with MS MARCO document collection, refer to [this page](../../docs/experiments-msmarco-doc.md).
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** none
 In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented-ca.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented-ca.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -40,9 +40,9 @@ target/appassembler/bin/IndexCollection \ ```
 The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
diff --git a/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md b/docs/regressions/regressions-msmarco-doc-segmented-docTTTTTquery.md
similarity index 93%
rename from docs/regressions-msmarco-doc-segmented-docTTTTTquery.md
rename to docs/regressions/regressions-msmarco-doc-segmented-docTTTTTquery.md
index c687bfa7e8..1286055c93 100644
--- a/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md
+++ b/docs/regressions/regressions-msmarco-doc-segmented-docTTTTTquery.md
@@ -11,10 +11,10 @@ Note that there are four different bag-of-words regression conditions for this t
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.
 In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -39,9 +39,9 @@ target/appassembler/bin/IndexCollection \ ```
 The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
@@ -103,7 +103,7 @@ In these runs, we are retrieving the top 1000 hits for each query and using `tre
 Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](../../docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 The MaxP passage retrieval functionality is available in `SearchCollection`.
 To generate an MS MARCO submission with the BM25 default parameters, corresponding to "BM25 (default)" above:
diff --git a/docs/regressions-msmarco-doc-segmented-unicoil-noexp.md b/docs/regressions/regressions-msmarco-doc-segmented-unicoil-noexp.md
similarity index 93%
rename from docs/regressions-msmarco-doc-segmented-unicoil-noexp.md
rename to docs/regressions/regressions-msmarco-doc-segmented-unicoil-noexp.md
index b9de71508d..0e54ed5905 100644
--- a/docs/regressions-msmarco-doc-segmented-unicoil-noexp.md
+++ b/docs/regressions/regressions-msmarco-doc-segmented-unicoil-noexp.md
@@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper.
 However, the model is the same, applied to the MS MARCO _segmented_ document corpus (without any expansions).
 Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking.
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented-unicoil-noexp.yaml).
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented-unicoil-noexp.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
@@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil-noexp/` should point to th
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 20,545,677 documents.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md).
 ## Retrieval
@@ -119,9 +119,9 @@ Because of tie-breaking effects, we get slightly different results:
 | `-hits 10000 -selectMaxPassage.hits 100` | 0.3409 | 0.3409 | 0.8639 | - | 0.3410112121151749 |
 | `-hits 1000 -selectMaxPassage.hits 100` | 0.3409 | 0.3409 | 0.8639 | - | 0.3410112121151749 |
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](../../docs/reproducibility.md)
-To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation.
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation.
+ Results reproduced by [@lintool](https://github.com/lintool) on 2021-06-28 (commit [`1550683`](https://github.com/castorini/anserini/commit/1550683e41cefe89b7e67c0a5f0e147bc70dfcda)) + Results reproduced by [@JMMackenzie](https://github.com/JMMackenzie) on 2021-07-02 (commit [`e4c5127`](https://github.com/castorini/anserini/commit/e4c51278d375ebad9aa2bf9bde66cab32260d6b4)) diff --git a/docs/regressions-msmarco-doc-segmented-unicoil.md b/docs/regressions/regressions-msmarco-doc-segmented-unicoil.md similarity index 94% rename from docs/regressions-msmarco-doc-segmented-unicoil.md rename to docs/regressions/regressions-msmarco-doc-segmented-unicoil.md index af1089fffa..51af407557 100644 --- a/docs/regressions-msmarco-doc-segmented-unicoil.md +++ b/docs/regressions/regressions-msmarco-doc-segmented-unicoil.md @@ -11,8 +11,8 @@ The experiments on this page are not actually reported in the paper. However, the model is the same, applied to the MS MARCO _segmented_ document corpus (with doc2query-T5 expansions). Retrieval uses MaxP technique, where we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -67,7 +67,7 @@ The directory `/path/to/msmarco-doc-segmented-unicoil/` should point to the corp The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -150,9 +150,9 @@ Because of tie-breaking effects, we get slightly different results: | `-hits 10000 -selectMaxPassage.hits 100` | 0.3531 | 0.3531 | 0.8860 | - | 0.352997702662614 | | `-hits 1000 -selectMaxPassage.hits 100` | 0.3531 | 0.3531 | 0.8860 | - | 0.352997702662614 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2021-06-28 (commit [`1550683`](https://github.com/castorini/anserini/commit/1550683e41cefe89b7e67c0a5f0e147bc70dfcda)) + Results reproduced by [@JMMackenzie](https://github.com/JMMackenzie) on 2021-07-02 (commit [`e4c5127`](https://github.com/castorini/anserini/commit/e4c51278d375ebad9aa2bf9bde66cab32260d6b4)) diff --git a/docs/regressions-msmarco-doc-segmented-wp.md b/docs/regressions/regressions-msmarco-doc-segmented-wp.md similarity index 91% rename from docs/regressions-msmarco-doc-segmented-wp.md rename to docs/regressions/regressions-msmarco-doc-segmented-wp.md index 4258890e82..6bff0d5618 100644 --- a/docs/regressions-msmarco-doc-segmented-wp.md +++ b/docs/regressions/regressions-msmarco-doc-segmented-wp.md @@ -7,8 +7,8 @@ Here we are using **WordPiece tokenization** (i.e., from BERT) on passages from At retrieval time, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented-wp.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,9 +31,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-doc-segmented.md b/docs/regressions/regressions-msmarco-doc-segmented.md similarity index 93% rename from docs/regressions-msmarco-doc-segmented.md rename to docs/regressions/regressions-msmarco-doc-segmented.md index 32337d9035..8ee9c58c0a 100644 --- a/docs/regressions-msmarco-doc-segmented.md +++ b/docs/regressions/regressions-msmarco-doc-segmented.md @@ -11,10 +11,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. In the passage (i.e., segment) indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-segmented.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-segmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -39,9 +39,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
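The MaxP technique described for this page can be sketched in a few lines: group passage hits by their parent document, keep the best passage score per document, and rank documents by that score. This is an illustration only, not Anserini's `SearchCollection` implementation; the function name, the `docid#segment` passage id layout, and the tie-breaking rule are assumptions made for the example.

```python
def max_passage(passage_hits, doc_hits=1000, delimiter="#"):
    """Collapse ranked passage hits into a document ranking using MaxP.

    passage_hits: iterable of (passage_id, score) pairs, where passage_id is
    assumed to look like "<docid><delimiter><segment>" (hypothetical layout).
    """
    best = {}
    for passage_id, score in passage_hits:
        doc_id = passage_id.split(delimiter, 1)[0]
        # Keep only the highest-scoring passage for each document.
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Sort by descending score; ties are broken by doc id (an assumption).
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:doc_hits]


# Toy input: two passages from D1, one each from D2 and D3.
hits = [("D1#0", 2.0), ("D2#3", 5.0), ("D1#4", 7.5), ("D3#1", 5.0)]
top_docs = max_passage(hits, doc_hits=2)
```

Because several passages collapse into one document, more passages than documents have to be fetched up front, which is why the passage hit count and the document hit count are separate settings. Ties such as D2 and D3 in the toy input have to be broken somehow, and the choice of rule can shift results slightly.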
## Retrieval @@ -103,7 +103,7 @@ In these runs, we are retrieving the top 1000 hits for each query and using `tre Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP. This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval. Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query. -See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard. +See [this page](../../docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard. The MaxP passage retrieval functionality is available in `SearchCollection`. To generate an MS MARCO submission with the BM25 default parameters, corresponding to "BM25 (default)" above: diff --git a/docs/regressions-msmarco-doc-wp.md b/docs/regressions/regressions-msmarco-doc-wp.md similarity index 91% rename from docs/regressions-msmarco-doc-wp.md rename to docs/regressions/regressions-msmarco-doc-wp.md index 5432fe80c6..14e11505ce 100644 --- a/docs/regressions-msmarco-doc-wp.md +++ b/docs/regressions/regressions-msmarco-doc-wp.md @@ -6,8 +6,8 @@ This page documents regression experiments on the [MS MARCO document ranking tas Here we are using **WordPiece tokenization** (i.e., from BERT) on the entire document. In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-wp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,9 +30,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-doc.md b/docs/regressions/regressions-msmarco-doc.md similarity index 93% rename from docs/regressions-msmarco-doc.md rename to docs/regressions/regressions-msmarco-doc.md index 954b5f7b3e..7fc7802316 100644 --- a/docs/regressions-msmarco-doc.md +++ b/docs/regressions/regressions-msmarco-doc.md @@ -10,10 +10,10 @@ Note that there are four different bag-of-words regression conditions for this t All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery), in the context of doc2query-T5. 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-doc.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](../../docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -38,9 +38,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](../../docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -109,14 +109,14 @@ Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](../../docs/experiments-msmarco-doc.md) additional details. -See [this page](experiments-msmarco-doc.md) for more details on tuning. +See [this page](../../docs/experiments-msmarco-doc.md) for more details on tuning. In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits. This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval. Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query. -See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard. +See [this page](../../docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard. 
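To make the role of `k1` and `b` in the settings above concrete, here is a sketch of a textbook BM25 term score. It is not claimed to match Lucene's implementation detail for detail, and the function name and all numeric inputs other than the `k1`/`b` pairs are hypothetical.

```python
import math

# The three (k1, b) settings named in the explanation above.
SETTINGS = {
    "default": (0.9, 0.4),
    "tuned": (3.44, 0.87),
    "tuned2": (4.46, 0.82),
}


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document."""
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly document length is normalized;
    # k1 controls how quickly repeated occurrences of a term saturate.
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Hypothetical collection statistics, for illustration only.
stats = dict(avg_doc_len=1000.0, n_docs=1_000_000, doc_freq=1_000)

for name, (k1, b) in SETTINGS.items():
    short_doc = bm25_term_score(tf=3, doc_len=500, k1=k1, b=b, **stats)
    long_doc = bm25_term_score(tf=3, doc_len=4000, k1=k1, b=b, **stats)
    print(f"{name}: short={short_doc:.4f} long={long_doc:.4f}")
```

In this form, a larger `b` penalizes long documents more heavily, and a larger `k1` lets additional term occurrences keep adding to the score for longer before saturating.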
## Additional Implementation Details diff --git a/docs/regressions-msmarco-passage-bm25-b8.md b/docs/regressions/regressions-msmarco-passage-bm25-b8.md similarity index 88% rename from docs/regressions-msmarco-passage-bm25-b8.md rename to docs/regressions/regressions-msmarco-passage-bm25-b8.md index 4af77f2e31..b436fd24b1 100644 --- a/docs/regressions-msmarco-passage-bm25-b8.md +++ b/docs/regressions/regressions-msmarco-passage-bm25-b8.md @@ -3,10 +3,10 @@ **Models**: BM25 with quantized weights (8 bits) This page documents regression experiments on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), which is integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-passage.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-bm25-b8.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-bm25-b8.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -55,12 +55,12 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-bm25-b8/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -96,8 +96,8 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **BM25 (default parameters, quantized 8 bits)**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.8562 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-msmarco-passage-ca.md b/docs/regressions/regressions-msmarco-passage-ca.md similarity index 91% rename from docs/regressions-msmarco-passage-ca.md rename to docs/regressions/regressions-msmarco-passage-ca.md index 8da43f7565..7d95d37bdd 100644 --- a/docs/regressions-msmarco-passage-ca.md +++ b/docs/regressions/regressions-msmarco-passage-ca.md @@ -5,8 +5,8 @@ This page documents regression experiments on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), which is integrated into Anserini's regression testing framework. Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-ca.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-ca.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-ca.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,12 +30,12 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: diff --git a/docs/regressions-msmarco-passage-cos-dpr-distil.md b/docs/regressions/regressions-msmarco-passage-cos-dpr-distil.md similarity index 89% rename from docs/regressions-msmarco-passage-cos-dpr-distil.md rename to docs/regressions/regressions-msmarco-passage-cos-dpr-distil.md index a377ac302b..0208be17fa 100644 --- a/docs/regressions-msmarco-passage-cos-dpr-distil.md +++ b/docs/regressions/regressions-msmarco-passage-cos-dpr-distil.md @@ -8,8 +8,8 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-cos-dpr-distil.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-cos-dpr-distil.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -62,12 +62,12 @@ The path `/path/to/msmarco-passage-cos-dpr-distil/` should point to the corpus d Upon completion, we should have an index with 8,841,823 documents. - + ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows using HNSW indexes: @@ -106,8 +106,8 @@ With the above commands, you should be able to reproduce the following results: Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run. Nevertheless, scores are generally stable to the third digit after the decimal point. 
-## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@yilinjz](https://github.com/yilinjz) on 2023-09-01 (commit [`4ae518b`](https://github.com/castorini/anserini/commit/4ae518bb284ebcba0b273a473bc8774735cb7d19)) \ No newline at end of file diff --git a/docs/regressions-msmarco-passage-deepimpact.md b/docs/regressions/regressions-msmarco-passage-deepimpact.md similarity index 92% rename from docs/regressions-msmarco-passage-deepimpact.md rename to docs/regressions/regressions-msmarco-passage-deepimpact.md index 11fff02c59..56a55a8a87 100644 --- a/docs/regressions-msmarco-passage-deepimpact.md +++ b/docs/regressions/regressions-msmarco-passage-deepimpact.md @@ -7,8 +7,8 @@ The DeepImpact model is described in the following paper: > Antonio Mallia, Omar Khattab, Nicola Tonellotto, and Torsten Suel. [Learning Passage Impacts for Inverted Indexes.](https://dl.acm.org/doi/10.1145/3404835.3463030) _SIGIR 2021_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-deepimpact.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-deepimpact.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -62,12 +62,12 @@ The path `/path/to/msmarco-passage-deepimpact/` should point to the corpus downl The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -127,9 +127,9 @@ QueriesRanked: 6980 The final evaluation metric is very close to the one reported in the paper (0.326). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) and run `bin/build.sh` to rebuild the documentation. 
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@MXueguang](https://github.com/MXueguang) on 2021-06-17 (commit [`ff618db`](https://github.com/castorini/anserini/commit/ff618dbf87feee0ad75dc42c72a361c05984097d)) + Results reproduced by [@JMMackenzie](https://github.com/jmmackenzie) on 2021-06-22 (commit [`4904341`](https://github.com/castorini/anserini/commit/490434172a035b6eade8c17771aed83cc7f5d996)) diff --git a/docs/regressions-msmarco-passage-distill-splade-max.md b/docs/regressions/regressions-msmarco-passage-distill-splade-max.md similarity index 92% rename from docs/regressions-msmarco-passage-distill-splade-max.md rename to docs/regressions/regressions-msmarco-passage-distill-splade-max.md index 5c5839d976..d28ff0291b 100644 --- a/docs/regressions-msmarco-passage-distill-splade-max.md +++ b/docs/regressions/regressions-msmarco-passage-distill-splade-max.md @@ -7,8 +7,8 @@ The DistilSPLADE-max model is described in the following paper: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. [SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval.](https://arxiv.org/abs/2109.10086) _arXiv:2109.10086_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-distill-splade-max.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-distill-splade-max.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-distill-splade-max/` should point to the corp The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -128,9 +128,9 @@ QueriesRanked: 6980 This corresponds to the effectiveness reported in the paper. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template) and run `bin/build.sh` to rebuild the documentation. 
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@jmmackenzie](https://github.com/jmmackenzie) on 2021-10-15 (commit [`52b76f6`](https://github.com/castorini/anserini/commit/52b76f63b163036e8fad1a6e1b10b431b4ddd06c)) + Results reproduced by [@justram](https://github.com/justram) on 2022-03-02 (commit [`41b64d9`](https://github.com/castorini/anserini/commit/41b65d9fcb82d787faf4ca937f81faca82ead8c2)) diff --git a/docs/regressions-msmarco-passage-doc2query.md b/docs/regressions/regressions-msmarco-passage-doc2query.md similarity index 92% rename from docs/regressions-msmarco-passage-doc2query.md rename to docs/regressions/regressions-msmarco-passage-doc2query.md index 06960deb20..6f02845ff9 100644 --- a/docs/regressions-msmarco-passage-doc2query.md +++ b/docs/regressions/regressions-msmarco-passage-doc2query.md @@ -7,10 +7,10 @@ This page documents regression experiments on the [MS MARCO passage ranking task > Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. [Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) arXiv:1904.08375, 2019. These experiments are integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-doc2query.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-doc2query.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-doc2query.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -33,14 +33,14 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-passage-doc2query` should be a directory containing `jsonl` files containing the expanded passage collection. -[This page](experiments-doc2query.md) explains how to perform this data preparation. +[This page](../../docs/experiments-doc2query.md) explains how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -91,7 +91,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. 
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](../../docs/experiments-msmarco-passage.md). ## Additional Implementation Details diff --git a/docs/regressions-msmarco-passage-docTTTTTquery.md b/docs/regressions/regressions-msmarco-passage-docTTTTTquery.md similarity index 95% rename from docs/regressions-msmarco-passage-docTTTTTquery.md rename to docs/regressions/regressions-msmarco-passage-docTTTTTquery.md index 6bd66d91ee..8045c517fc 100644 --- a/docs/regressions-msmarco-passage-docTTTTTquery.md +++ b/docs/regressions/regressions-msmarco-passage-docTTTTTquery.md @@ -8,8 +8,8 @@ This page documents regression experiments on the [MS MARCO passage ranking task These experiments are integrated into Anserini's regression testing framework. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -34,12 +34,12 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection. [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -102,7 +102,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](../../docs/experiments-msmarco-passage.md). + The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned to optimize for recall@1000 directly _on the expanded passages_ (in 2020/12); this is the configuration reported in the Lin et al. (SIGIR 2021) Pyserini paper. 
## Additional Implementation Details diff --git a/docs/regressions-msmarco-passage-hgf-wp.md b/docs/regressions/regressions-msmarco-passage-hgf-wp.md similarity index 91% rename from docs/regressions-msmarco-passage-hgf-wp.md rename to docs/regressions/regressions-msmarco-passage-hgf-wp.md index e3b40c832f..d966562785 100644 --- a/docs/regressions-msmarco-passage-hgf-wp.md +++ b/docs/regressions/regressions-msmarco-passage-hgf-wp.md @@ -7,8 +7,8 @@ Here we are using **WordPiece tokenization** (i.e., from BERT) with the followin In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-hgf-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-hgf-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,12 +32,12 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: diff --git a/docs/regressions-msmarco-passage-splade-distil-cocodenser-medium.md b/docs/regressions/regressions-msmarco-passage-splade-distil-cocodenser-medium.md similarity index 90% rename from docs/regressions-msmarco-passage-splade-distil-cocodenser-medium.md rename to docs/regressions/regressions-msmarco-passage-splade-distil-cocodenser-medium.md index de09847b80..39c78dd901 100644 --- a/docs/regressions-msmarco-passage-splade-distil-cocodenser-medium.md +++ b/docs/regressions/regressions-msmarco-passage-splade-distil-cocodenser-medium.md @@ -5,8 +5,8 @@ This page describes regression experiments, integrated into Anserini's regression testing framework, using the SPLADE-distil CoCodenser Medium model on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking). The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-splade-distil-cocodenser-medium.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-splade-distil-cocodenser-medium.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -61,12 +61,12 @@ The path `/path/to/msmarco-passage-splade_distil_cocodenser_medium/` should poin The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. 
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -102,8 +102,8 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE-distill CoCodenser Medium**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9817 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-14 (commit [`dc07344`](https://github.com/castorini/anserini/commit/dc073447c8a0c07b53d979c49bf1e2e018200508)) diff --git a/docs/regressions-msmarco-passage-splade-pp-ed-onnx.md b/docs/regressions/regressions-msmarco-passage-splade-pp-ed-onnx.md similarity index 90% rename from docs/regressions-msmarco-passage-splade-pp-ed-onnx.md rename to docs/regressions/regressions-msmarco-passage-splade-pp-ed-onnx.md index a5ff2e962f..a338515672 100644 --- a/docs/regressions-msmarco-passage-splade-pp-ed-onnx.md +++ b/docs/regressions/regressions-msmarco-passage-splade-pp-ed-onnx.md @@ -8,8 +8,8 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-splade-pp-ed-onnx.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-splade-pp-ed-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. 
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -104,9 +104,9 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-EnsembleDistil**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9831 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-05-31 (commit [`a403a2`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-msmarco-passage-splade-pp-ed.md b/docs/regressions/regressions-msmarco-passage-splade-pp-ed.md similarity index 90% rename from docs/regressions-msmarco-passage-splade-pp-ed.md rename to docs/regressions/regressions-msmarco-passage-splade-pp-ed.md index b8c4f98daf..2f78c4c7a6 100644 --- a/docs/regressions-msmarco-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-msmarco-passage-splade-pp-ed.md @@ -8,8 +8,8 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-splade-pp-ed/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. 
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -104,9 +104,9 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-EnsembleDistil**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9831 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) \ No newline at end of file diff --git a/docs/regressions-msmarco-passage-splade-pp-sd-onnx.md b/docs/regressions/regressions-msmarco-passage-splade-pp-sd-onnx.md similarity index 90% rename from docs/regressions-msmarco-passage-splade-pp-sd-onnx.md rename to docs/regressions/regressions-msmarco-passage-splade-pp-sd-onnx.md index dd513ab06b..10d22f3775 100644 --- a/docs/regressions-msmarco-passage-splade-pp-sd-onnx.md +++ b/docs/regressions/regressions-msmarco-passage-splade-pp-sd-onnx.md @@ -8,8 +8,8 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-splade-pp-sd-onnx.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-splade-pp-sd-onnx.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -104,9 +104,9 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-SelfDistil**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9846 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. 
+To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@cadurosar](https://github.com/cadurosar) on 2023-05-31 (commit [`a403a2`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-msmarco-passage-splade-pp-sd.md b/docs/regressions/regressions-msmarco-passage-splade-pp-sd.md similarity index 90% rename from docs/regressions-msmarco-passage-splade-pp-sd.md rename to docs/regressions/regressions-msmarco-passage-splade-pp-sd.md index 7093fcb9fa..6c2e7ade9d 100644 --- a/docs/regressions-msmarco-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-msmarco-passage-splade-pp-sd.md @@ -8,8 +8,8 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-splade-pp-sd.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-splade-pp-sd/` should point to the corpus dow The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -104,9 +104,9 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **SPLADE++ CoCondenser-SelfDistil**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9846 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@justram](https://github.com/justram) on 2023-03-08 (commit [`03f95a8`](https://github.com/castorini/anserini/commit/03f95a8e1ae09ab09efe046bfcbd3a4cdda691b4)) + Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2023-06-01 (commit [`a403a2a`](https://github.com/castorini/anserini/commit/a403a2a44af9322c7a2dbdb5240180a62398ab06)) diff --git a/docs/regressions-msmarco-passage-unicoil-noexp.md b/docs/regressions/regressions-msmarco-passage-unicoil-noexp.md similarity index 91% rename from docs/regressions-msmarco-passage-unicoil-noexp.md rename to docs/regressions/regressions-msmarco-passage-unicoil-noexp.md index df0cfcb5d7..eee979caaa 100644 --- a/docs/regressions-msmarco-passage-unicoil-noexp.md +++ b/docs/regressions/regressions-msmarco-passage-unicoil-noexp.md @@ -10,8 +10,8 @@ The uniCOIL model is described in the following paper: The experiments on this page are not actually reported in the paper. Here, a variant model without expansion is used. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil-noexp.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-unicoil-noexp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -66,12 +66,12 @@ The path `/path/to/msmarco-passage-unicoil-noexp/` should point to the corpus do The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. 
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -107,9 +107,9 @@ With the above commands, you should be able to reproduce the following results: | **R@1000** | **uniCOIL (no expansions)**| | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9239 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@lintool](https://github.com/lintool) on 2021-06-28 (commit [`1550683`](https://github.com/castorini/anserini/commit/1550683e41cefe89b7e67c0a5f0e147bc70dfcda)) + Results reproduced by [@JMMackenzie](https://github.com/JMMackenzie) on 2021-07-02 (commit [`e4c5127`](https://github.com/castorini/anserini/commit/e4c51278d375ebad9aa2bf9bde66cab32260d6b4)) diff --git a/docs/regressions-msmarco-passage-unicoil-tilde-expansion.md b/docs/regressions/regressions-msmarco-passage-unicoil-tilde-expansion.md similarity index 92% rename from docs/regressions-msmarco-passage-unicoil-tilde-expansion.md rename to docs/regressions/regressions-msmarco-passage-unicoil-tilde-expansion.md index 298fa3e22c..7bf5573009 100644 --- a/docs/regressions-msmarco-passage-unicoil-tilde-expansion.md +++ b/docs/regressions/regressions-msmarco-passage-unicoil-tilde-expansion.md @@ -7,8 +7,8 @@ The uniCOIL+TILDE model is described in the following paper: > Shengyao Zhuang and Guido Zuccon. 
[Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion.](https://arxiv.org/pdf/2108.08513) _arXiv:2108.08513_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil-tilde-expansion.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-unicoil-tilde-expansion.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-unicoil-tilde-expansion/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. 
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -128,9 +128,9 @@ QueriesRanked: 6980 This corresponds to the effectiveness reported in the paper. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template) and run `bin/build.sh` to rebuild the documentation. + Results reproduced by [@MXueguang](https://github.com/MXueguang) on 2021-09-14 (commit [`a05fc52`](https://github.com/castorini/anserini/commit/a05fc5215a6d9de77bd5f4b8f874f608442024a3)) + Results reproduced by [@jmmackenzie](https://github.com/jmmackenzie) on 2021-10-15 (commit [`52b76f6`](https://github.com/castorini/anserini/commit/52b76f63b163036e8fad1a6e1b10b431b4ddd06c)) diff --git a/docs/regressions-msmarco-passage-unicoil.md b/docs/regressions/regressions-msmarco-passage-unicoil.md similarity index 92% rename from docs/regressions-msmarco-passage-unicoil.md rename to docs/regressions/regressions-msmarco-passage-unicoil.md index 1426b5982b..a7a41b374a 100644 --- a/docs/regressions-msmarco-passage-unicoil.md +++ b/docs/regressions/regressions-msmarco-passage-unicoil.md @@ -7,8 +7,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-unicoil.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -63,12 +63,12 @@ The path `/path/to/msmarco-passage-unicoil/` should point to the corpus download The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -128,9 +128,9 @@ QueriesRanked: 6980 This corresponds to the effectiveness reported in the paper and also the run named "uniCOIL-d2q" on the official MS MARCO Passage Ranking Leaderboard, submitted 2021/09/22. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2021-06-28 (commit [`1550683`](https://github.com/castorini/anserini/commit/1550683e41cefe89b7e67c0a5f0e147bc70dfcda)) + Results reproduced by [@JMMackenzie](https://github.com/JMMackenzie) on 2021-07-02 (commit [`e4c5127`](https://github.com/castorini/anserini/commit/e4c51278d375ebad9aa2bf9bde66cab32260d6b4)) diff --git a/docs/regressions-msmarco-passage-wp.md b/docs/regressions/regressions-msmarco-passage-wp.md similarity index 91% rename from docs/regressions-msmarco-passage-wp.md rename to docs/regressions/regressions-msmarco-passage-wp.md index f2040a2903..e2c1472ebd 100644 --- a/docs/regressions-msmarco-passage-wp.md +++ b/docs/regressions/regressions-msmarco-passage-wp.md @@ -6,8 +6,8 @@ This page documents regression experiments on the [MS MARCO passage ranking task Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-wp.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage-wp.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage-wp.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,12 +31,12 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: diff --git a/docs/regressions-msmarco-passage.md b/docs/regressions/regressions-msmarco-passage.md similarity index 93% rename from docs/regressions-msmarco-passage.md rename to docs/regressions/regressions-msmarco-passage.md index cd52851103..1f50e4b2b0 100644 --- a/docs/regressions-msmarco-passage.md +++ b/docs/regressions/regressions-msmarco-passage.md @@ -3,10 +3,10 @@ **Models**: various bag-of-words approaches This page documents regression experiments on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), which is integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-passage.md). 
-The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-passage.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,14 +29,14 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](../../docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](../../docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -87,7 +87,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](../../docs/experiments-msmarco-passage.md). To generate runs corresponding to the submissions on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/), follow the instructions below: diff --git a/docs/regressions-msmarco-v2-doc-d2q-t5.md b/docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md similarity index 92% rename from docs/regressions-msmarco-v2-doc-d2q-t5.md rename to docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md index 0f173a9e5e..14860aaf5b 100644 --- a/docs/regressions-msmarco-v2-doc-d2q-t5.md +++ b/docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md @@ -5,8 +5,8 @@ This page describes regression experiments for document ranking on the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we expand the document corpus with doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-d2q-t5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-d2q-t5.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,9 +29,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-doc-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-v2-doc-segmented-d2q-t5.md b/docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md similarity index 92% rename from docs/regressions-msmarco-v2-doc-segmented-d2q-t5.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md index 32d4105339..c53e80a500 100644 --- a/docs/regressions-msmarco-v2-doc-segmented-d2q-t5.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md @@ -5,8 +5,8 @@ This page describes regression experiments for document ranking _on the segmented version_ of the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we expand the segmented document corpus with doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented-d2q-t5.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,9 +29,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-doc-segmented-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md similarity index 93% rename from docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md index c95696943b..cbff8cf47d 100644 --- a/docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md @@ -15,8 +15,8 @@ This regression captures the latter title/segment encoding, which for clarity we The segment-only encoding results are deprecated and kept around primarily for archival purposes and ablation experiments. You probably don't want to use them. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-0shot-v2.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-0shot-v2.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -77,7 +77,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-0shot-v2/` should point to t The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -129,8 +129,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.9122 | | [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.9172 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md similarity index 93% rename from docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md index f452c417ae..ded8d8a8e2 100644 --- a/docs/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot.md @@ -14,8 +14,8 @@ Initially, we fed only the segment text, but later we realized that prepending t This regression captures segment-only encoding and is kept around primarily for archival purposes; you probably don't want to use this one unless you're running ablation experiments. The version that uses title/segment encoding can be found [here](regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -76,7 +76,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-0shot/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -128,8 +128,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.9056 | | [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.9097 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md similarity index 93% rename from docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md index bfade983a4..6413a4bbab 100644 --- a/docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md @@ -15,8 +15,8 @@ This regression captures the latter title/segment encoding, which for clarity we The segment-only encoding results are deprecated and kept around primarily for archival purposes and ablation experiments. You probably don't want to use them. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -77,7 +77,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2/` should poin The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -129,8 +129,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8987 | | [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8995 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md similarity index 93% rename from docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md index 283946c05d..0ad1cc7d80 100644 --- a/docs/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot.md @@ -14,8 +14,8 @@ Initially, we fed only the segment text, but later we realized that prepending t This regression captures segment-only encoding and is kept around primarily for archival purposes; you probably don't want to use this one unless you're running ablation experiments. The version that uses title/segment encoding can be found [here](regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-noexp-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented-unicoil-noexp-0shot.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -76,7 +76,7 @@ The path `/path/to/msmarco-v2-doc-segmented-unicoil-noexp-0shot/` should point t The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -128,8 +128,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8854 | | [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8899 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-doc-segmented.md b/docs/regressions/regressions-msmarco-v2-doc-segmented.md similarity index 92% rename from docs/regressions-msmarco-v2-doc-segmented.md rename to docs/regressions/regressions-msmarco-v2-doc-segmented.md index 6f1215471d..c5cda8da5e 100644 --- a/docs/regressions-msmarco-v2-doc-segmented.md +++ b/docs/regressions/regressions-msmarco-v2-doc-segmented.md @@ -4,10 +4,10 @@ This page describes regression experiments for document ranking _on the segmented version_ of the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc-segmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc-segmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,9 +30,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-doc-segmented/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-v2-doc.md b/docs/regressions/regressions-msmarco-v2-doc.md similarity index 91% rename from docs/regressions-msmarco-v2-doc.md rename to docs/regressions/regressions-msmarco-v2-doc.md index d27e21b683..a390f2835c 100644 --- a/docs/regressions-msmarco-v2-doc.md +++ b/docs/regressions/regressions-msmarco-v2-doc.md @@ -4,10 +4,10 @@ This page describes regression experiments for document ranking on the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-doc.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-doc.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-doc.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,9 +30,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-doc/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-v2-passage-augmented-d2q-t5.md b/docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md similarity index 94% rename from docs/regressions-msmarco-v2-passage-augmented-d2q-t5.md rename to docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md index 14a744f4ca..bb6e446dc6 100644 --- a/docs/regressions-msmarco-v2-passage-augmented-d2q-t5.md +++ b/docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md @@ -5,8 +5,8 @@ This page describes regression experiments for passage ranking _on the augmented version_ of the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we expand the augmented passage corpus with doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-augmented-d2q-t5.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-augmented-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-v2-passage-augmented-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-v2-passage-augmented.md b/docs/regressions/regressions-msmarco-v2-passage-augmented.md similarity index 91% rename from docs/regressions-msmarco-v2-passage-augmented.md rename to docs/regressions/regressions-msmarco-v2-passage-augmented.md index 4298abc87f..3d8e29bb40 100644 --- a/docs/regressions-msmarco-v2-passage-augmented.md +++ b/docs/regressions/regressions-msmarco-v2-passage-augmented.md @@ -4,10 +4,10 @@ This page describes regression experiments for passage ranking _on the augmented version_ of the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. 
-For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-augmented.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-augmented.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,9 +30,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-passage-augmented/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval diff --git a/docs/regressions-msmarco-v2-passage-d2q-t5.md b/docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md similarity index 94% rename from docs/regressions-msmarco-v2-passage-d2q-t5.md rename to docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md index 5fa46ae857..01c6ea8426 100644 --- a/docs/regressions-msmarco-v2-passage-d2q-t5.md +++ b/docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md @@ -5,8 +5,8 @@ This page describes regression experiments for passage ranking on the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we expand the passage corpus with doc2query-T5. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-d2q-t5.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-d2q-t5.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,7 +30,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/msmarco-v2-passage-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-msmarco-v2-passage-splade-pp-ed.md b/docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md similarity index 92% rename from docs/regressions-msmarco-v2-passage-splade-pp-ed.md rename to docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md index 2ec4343786..f3905dafac 100644 --- a/docs/regressions-msmarco-v2-passage-splade-pp-ed.md +++ b/docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md @@ -9,8 +9,8 @@ The model can be described in the following paper: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-splade-pp-ed.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-splade-pp-ed.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -64,7 +64,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-ed/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -116,6 +116,6 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Passage: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8220 | | [MS MARCO V2 Passage: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8124 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-msmarco-v2-passage-splade-pp-sd.md b/docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md similarity index 92% rename from docs/regressions-msmarco-v2-passage-splade-pp-sd.md rename to docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md index ccaf8e687b..783d5d88d9 100644 --- a/docs/regressions-msmarco-v2-passage-splade-pp-sd.md +++ b/docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md @@ -9,8 +9,8 @@ The model can be described in the following paper: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-splade-pp-sd.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-splade-pp-sd.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -64,7 +64,7 @@ The path `/path/to/msmarco-v2-passage-splade-pp-sd/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -116,6 +116,6 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Passage: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8270 | | [MS MARCO V2 Passage: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.8234 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-msmarco-v2-passage-unicoil-0shot.md b/docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md similarity index 93% rename from docs/regressions-msmarco-v2-passage-unicoil-0shot.md rename to docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md index 6cca015fc2..7432ec1665 100644 --- a/docs/regressions-msmarco-v2-passage-unicoil-0shot.md +++ b/docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md @@ -9,8 +9,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-unicoil-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-unicoil-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -71,7 +71,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-0shot/` should point to the corpus The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -123,8 +123,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Passage: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.7616 | | [MS MARCO V2 Passage: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.7671 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md b/docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md similarity index 93% rename from docs/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md rename to docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md index 7edc61a228..892e41f2ef 100644 --- a/docs/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md +++ b/docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md @@ -9,8 +9,8 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage-unicoil-noexp-0shot.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage-unicoil-noexp-0shot.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -71,7 +71,7 @@ The path `/path/to/msmarco-v2-passage-unicoil-noexp-0shot/` should point to the The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -123,8 +123,8 @@ With the above commands, you should be able to reproduce the following results: | [MS MARCO V2 Passage: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.7010 | | [MS MARCO V2 Passage: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html) | 0.7114 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template) and run `bin/build.sh` to rebuild the documentation. 
+ Results reproduced by [@lintool](https://github.com/lintool) on 2022-06-06 (commit [`236b386`](https://github.com/castorini/anserini/commit/236b386ddc11d292b4b736162b59488a02236d6c)) diff --git a/docs/regressions-msmarco-v2-passage.md b/docs/regressions/regressions-msmarco-v2-passage.md similarity index 91% rename from docs/regressions-msmarco-v2-passage.md rename to docs/regressions/regressions-msmarco-v2-passage.md index 0f6e91e4bf..441070a898 100644 --- a/docs/regressions-msmarco-v2-passage.md +++ b/docs/regressions/regressions-msmarco-v2-passage.md @@ -4,10 +4,10 @@ This page describes regression experiments for passage ranking on the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](../../docs/experiments-msmarco-v2.md). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-v2-passage.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-v2-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2-passage.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -30,9 +30,9 @@ target/appassembler/bin/IndexCollection \ ``` The directory `/path/to/msmarco-v2-passage/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](../../docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-neuclir22-fa-dt-splade.md b/docs/regressions/regressions-neuclir22-fa-dt-splade.md similarity index 96% rename from docs/regressions-neuclir22-fa-dt-splade.md rename to docs/regressions/regressions-neuclir22-fa-dt-splade.md index 362c7bbcb4..8399a4daff 100644 --- a/docs/regressions-neuclir22-fa-dt-splade.md +++ b/docs/regressions/regressions-neuclir22-fa-dt-splade.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Persian into English (corpus provided by the organizers) + Model: [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil) -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-fa-dt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-fa-dt-splade.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil), i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-fa-en-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -179,7 +179,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Persian): desc (original English queries)](https://neuclir.github.io/) | 0.8796 | 0.8061 | 0.8735 | | [NeuCLIR 2022 (Persian): desc+title (original English queries)](https://neuclir.github.io/) | 0.8860 | 0.7948 | 0.8703 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-fa-dt.md b/docs/regressions/regressions-neuclir22-fa-dt.md similarity index 96% rename from docs/regressions-neuclir22-fa-dt.md rename to docs/regressions/regressions-neuclir22-fa-dt.md index 75fa3d1441..f56df242a9 100644 --- a/docs/regressions-neuclir22-fa-dt.md +++ b/docs/regressions/regressions-neuclir22-fa-dt.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Persian into English (corpus provided by the organizers) + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-fa-dt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-fa-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-fa-dt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-fa-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-fa-en & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -170,7 +170,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Persian): desc (original English queries)](https://neuclir.github.io/) | 0.6319 | 0.7663 | 0.7638 | | [NeuCLIR 2022 (Persian): desc+title (original English queries)](https://neuclir.github.io/) | 0.7652 | 0.8180 | 0.8248 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-fa-dt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-fa-dt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-neuclir22-fa-qt-splade.md b/docs/regressions/regressions-neuclir22-fa-qt-splade.md similarity index 97% rename from docs/regressions-neuclir22-fa-qt-splade.md rename to docs/regressions/regressions-neuclir22-fa-qt-splade.md index e9823eca5f..e6ca212a44 100644 --- a/docs/regressions-neuclir22-fa-qt-splade.md +++ b/docs/regressions/regressions-neuclir22-fa-qt-splade.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Persian corpus + Model: SPLADE NeuCLIR22 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-fa-qt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-fa-qt-splade.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with SPLADE NeuCLIR22, i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-fa-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -281,7 +281,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Persian): desc (machine-translated queries)](https://neuclir.github.io/) | 0.8172 | 0.8172 | 0.8117 | | [NeuCLIR 2022 (Persian): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.8437 | 0.8437 | 0.8350 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-fa-qt.md b/docs/regressions/regressions-neuclir22-fa-qt.md similarity index 97% rename from docs/regressions-neuclir22-fa-qt.md rename to docs/regressions/regressions-neuclir22-fa-qt.md index db94480dbd..3f26b5c701 100644 --- a/docs/regressions-neuclir22-fa-qt.md +++ b/docs/regressions/regressions-neuclir22-fa-qt.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Persian corpus + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-fa-qt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-fa-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-fa-qt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-fa-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-fa & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -272,7 +272,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Persian): desc (machine-translated queries)](https://neuclir.github.io/) | 0.6815 | 0.5606 | 0.7033 | | [NeuCLIR 2022 (Persian): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.7424 | 0.6264 | 0.7829 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-fa-qt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-fa-qt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-neuclir22-ru-dt-splade.md b/docs/regressions/regressions-neuclir22-ru-dt-splade.md similarity index 96% rename from docs/regressions-neuclir22-ru-dt-splade.md rename to docs/regressions/regressions-neuclir22-ru-dt-splade.md index a1ca4efb3d..a38f7a151e 100644 --- a/docs/regressions-neuclir22-ru-dt-splade.md +++ b/docs/regressions/regressions-neuclir22-ru-dt-splade.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Russian into English (corpus provided by the organizers) + Model: [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil) -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-ru-dt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-ru-dt-splade.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil), i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-ru-en-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -179,7 +179,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Russian): desc (original English queries)](https://neuclir.github.io/) | 0.8376 | 0.7529 | 0.8238 | | [NeuCLIR 2022 (Russian): desc+title (original English queries)](https://neuclir.github.io/) | 0.8513 | 0.7704 | 0.8544 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-ru-dt.md b/docs/regressions/regressions-neuclir22-ru-dt.md similarity index 96% rename from docs/regressions-neuclir22-ru-dt.md rename to docs/regressions/regressions-neuclir22-ru-dt.md index be023e62b2..4818e05d39 100644 --- a/docs/regressions-neuclir22-ru-dt.md +++ b/docs/regressions/regressions-neuclir22-ru-dt.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Russian into English (corpus provided by the organizers) + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-ru-dt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-ru-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-ru-dt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-ru-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-ru-en & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -170,7 +170,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Russian): desc (original English queries)](https://neuclir.github.io/) | 0.5780 | 0.6772 | 0.6780 | | [NeuCLIR 2022 (Russian): desc+title (original English queries)](https://neuclir.github.io/) | 0.7255 | 0.7658 | 0.7798 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-ru-dt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-ru-dt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-neuclir22-ru-qt-splade.md b/docs/regressions/regressions-neuclir22-ru-qt-splade.md similarity index 97% rename from docs/regressions-neuclir22-ru-qt-splade.md rename to docs/regressions/regressions-neuclir22-ru-qt-splade.md index 7b9593dc1b..9a5995e06a 100644 --- a/docs/regressions-neuclir22-ru-qt-splade.md +++ b/docs/regressions/regressions-neuclir22-ru-qt-splade.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Russian corpus + Model: SPLADE NeuCLIR22 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-ru-qt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-ru-qt-splade.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with SPLADE NeuCLIR22, i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-ru-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -281,7 +281,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Russian): desc (machine-translated queries)](https://neuclir.github.io/) | 0.7150 | 0.7150 | 0.7090 | | [NeuCLIR 2022 (Russian): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.7669 | 0.7669 | 0.7590 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-ru-qt.md b/docs/regressions/regressions-neuclir22-ru-qt.md similarity index 97% rename from docs/regressions-neuclir22-ru-qt.md rename to docs/regressions/regressions-neuclir22-ru-qt.md index 85e2890914..2038738267 100644 --- a/docs/regressions-neuclir22-ru-qt.md +++ b/docs/regressions/regressions-neuclir22-ru-qt.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Russian corpus + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-ru-qt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-ru-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-ru-qt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-ru-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-ru & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -272,7 +272,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Russian): desc (machine-translated queries)](https://neuclir.github.io/) | 0.6210 | 0.5536 | 0.7136 | | [NeuCLIR 2022 (Russian): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.7373 | 0.6271 | 0.7959 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-ru-qt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-ru-qt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-neuclir22-zh-dt-splade.md b/docs/regressions/regressions-neuclir22-zh-dt-splade.md similarity index 96% rename from docs/regressions-neuclir22-zh-dt-splade.md rename to docs/regressions/regressions-neuclir22-zh-dt-splade.md index 53f1e27cb9..aa3cf8bae9 100644 --- a/docs/regressions-neuclir22-zh-dt-splade.md +++ b/docs/regressions/regressions-neuclir22-zh-dt-splade.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Chinese into English (corpus provided by the organizers) + Model: [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil) -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-zh-dt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-zh-dt-splade.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil), i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-zh-en-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -179,7 +179,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Chinese): desc (original English queries)](https://neuclir.github.io/) | 0.7597 | 0.6969 | 0.7623 | | [NeuCLIR 2022 (Chinese): desc+title (original English queries)](https://neuclir.github.io/) | 0.7922 | 0.7481 | 0.8067 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-zh-dt.md b/docs/regressions/regressions-neuclir22-zh-dt.md similarity index 96% rename from docs/regressions-neuclir22-zh-dt.md rename to docs/regressions/regressions-neuclir22-zh-dt.md index 49551eea38..9805975e1f 100644 --- a/docs/regressions-neuclir22-zh-dt.md +++ b/docs/regressions/regressions-neuclir22-zh-dt.md @@ -6,8 +6,8 @@ This page presents **document translation** regression experiments for the [TREC + Documents: Machine-translated documents from Chinese into English (corpus provided by the organizers) + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-zh-dt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-zh-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-zh-dt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-zh-dt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-zh-en & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -170,7 +170,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Chinese): desc (original English queries)](https://neuclir.github.io/) | 0.6639 | 0.7519 | 0.7404 | | [NeuCLIR 2022 (Chinese): desc+title (original English queries)](https://neuclir.github.io/) | 0.7567 | 0.7959 | 0.8011 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-zh-dt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-zh-dt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-neuclir22-zh-qt-splade.md b/docs/regressions/regressions-neuclir22-zh-qt-splade.md similarity index 97% rename from docs/regressions-neuclir22-zh-qt-splade.md rename to docs/regressions/regressions-neuclir22-zh-qt-splade.md index bc6b181277..5523cec46f 100644 --- a/docs/regressions-neuclir22-zh-qt-splade.md +++ b/docs/regressions/regressions-neuclir22-zh-qt-splade.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Chinese corpus + Model: SPLADE NeuCLIR22 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-zh-qt-splade.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-zh-qt-splade.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. We make available a version of the corpus that has already been encoded with SPLADE NeuCLIR22, i.e., we performed model inference on every document and stored the output sparse vectors. Thus, no neural inference is required to reproduce these experiments; see instructions below. @@ -49,7 +49,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-zh-splade & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -281,7 +281,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Chinese): desc (machine-translated queries)](https://neuclir.github.io/) | 0.5919 | 0.5919 | 0.6096 | | [NeuCLIR 2022 (Chinese): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.6312 | 0.6312 | 0.6535 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-neuclir22-zh-qt.md b/docs/regressions/regressions-neuclir22-zh-qt.md similarity index 97% rename from docs/regressions-neuclir22-zh-qt.md rename to docs/regressions/regressions-neuclir22-zh-qt.md index 6688198e4f..1da2ca80d0 100644 --- a/docs/regressions-neuclir22-zh-qt.md +++ b/docs/regressions/regressions-neuclir22-zh-qt.md @@ -6,8 +6,8 @@ This page presents **query translation** regression experiments for the [TREC 20 + Documents: Original Chinese corpus + Model: BM25 -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/neuclir22-zh-qt.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/neuclir22-zh-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/neuclir22-zh-qt.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/neuclir22-zh-qt.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -40,7 +40,7 @@ target/appassembler/bin/IndexCollection \ >& logs/log.neuclir22-zh & ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -272,7 +272,7 @@ With the above commands, you should be able to reproduce the following results: | [NeuCLIR 2022 (Chinese): desc (machine-translated queries)](https://neuclir.github.io/) | 0.2989 | 0.2462 | 0.3748 | | [NeuCLIR 2022 (Chinese): desc+title (machine-translated queries)](https://neuclir.github.io/) | 0.4028 | 0.2746 | 0.4341 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/neuclir22-zh-qt.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/neuclir22-zh-qt.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-ntcir8-zh.md b/docs/regressions/regressions-ntcir8-zh.md similarity index 93% rename from docs/regressions-ntcir8-zh.md rename to docs/regressions/regressions-ntcir8-zh.md index 595a058cb9..69bf7cfc2b 100644 --- a/docs/regressions-ntcir8-zh.md +++ b/docs/regressions/regressions-ntcir8-zh.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for [NTCIR-8 ACLIA (IR4QA subtask), monolingual Chinese topics](http://research.nii.ac.jp/ntcir/ntcir-ws8/ws-en.html). The description of the document collection can be found in the [NTCIR-8 data page](http://research.nii.ac.jp/ntcir/permission/ntcir-8/perm-en-ACLIA.html). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/ntcir8-zh.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/ntcir8-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/ntcir8-zh.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/ntcir8-zh.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,7 +31,7 @@ We build the index directly from the raw LDC data: the directory `/path/to/ntcir8-zh/` should point to the directory `data/xin_cmn/` from LDC2007T38. In that directory, there should be 48 gzipped files matching the pattern `xin_cmn_200[2-5]*`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-robust05.md b/docs/regressions/regressions-robust05.md similarity index 95% rename from docs/regressions-robust05.md rename to docs/regressions/regressions-robust05.md index 86473a38dd..80769da8b2 100644 --- a/docs/regressions-robust05.md +++ b/docs/regressions/regressions-robust05.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the TREC 2005 Robust Track, which uses the [AQUAINT collection](https://tac.nist.gov//data/data_desc.html#AQUAINT). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/robust05.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/robust05.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/robust05.yaml). 
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/robust05.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/aquaint/` should be the root directory of the [AQUAINT collection](https://tac.nist.gov//data/data_desc.html#AQUAINT); under subdirectory `disk1/` there should be `NYT/` and under subdirectory `disk2/` there should be `APW/` and `XIE/`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-trec02-ar.md b/docs/regressions/regressions-trec02-ar.md similarity index 93% rename from docs/regressions-trec02-ar.md rename to docs/regressions/regressions-trec02-ar.md index 1d46111150..485e993256 100644 --- a/docs/regressions-trec02-ar.md +++ b/docs/regressions/regressions-trec02-ar.md @@ -3,8 +3,8 @@ This page documents BM25 regression experiments for monolingual Arabic document retrieval as part of the [TREC 2002 CLIR Track](https://trec.nist.gov/pubs/trec11/t11_proceedings.html). The description of the document collection can be found on the [TREC data page](https://trec.nist.gov/data/docs_noneng.html). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/trec02-ar.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/trec02-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/trec02-ar.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/trec02-ar.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -31,7 +31,7 @@ Inside the LDC2007T38 distribution, there should be a directory named `transcrip The path above `/path/to/trec02-ar/` should point to this `transcripts/` directory. The collection contains 383,872 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/regressions-wiki-all-6-3-tamber-bm25.md b/docs/regressions/regressions-wiki-all-6-3-tamber-bm25.md similarity index 94% rename from docs/regressions-wiki-all-6-3-tamber-bm25.md rename to docs/regressions/regressions-wiki-all-6-3-tamber-bm25.md index 8c74b96439..a2409dbf24 100644 --- a/docs/regressions-wiki-all-6-3-tamber-bm25.md +++ b/docs/regressions/regressions-wiki-all-6-3-tamber-bm25.md @@ -7,8 +7,8 @@ The exact configuration here is the 6/3 sentence sliding window corpus described > Manveer Singh Tamber, Ronak Pradeep, and Jimmy Lin. [Pre-Processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering.](https://link.springer.com/chapter/10.1007/978-3-031-28241-6_11) _Proceedings of the 45th European Conference on Information Retrieval (ECIR 2023), Part III_, pages 163–176, April 2023, Dublin, Ireland. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/wiki-all-6-3-tamber-bm25.yaml). 
-Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/wiki-all-6-3-tamber-bm25.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -32,7 +32,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/wiki-all-6-3-tamber/`should be a directory containing the wiki-all-6-3-tamber passages collection retrieved from [here](https://huggingface.co/datasets/castorini/odqa-wiki-corpora). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval @@ -157,6 +157,6 @@ With the above commands, you should be able to reproduce the following results: | [DPR: CuratedTREC Test](https://github.com/facebookresearch/DPR) | 0.9135 | | [EfficientQA: Natural Questions Test](https://efficientqa.github.io/) | 0.8166 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions-wikipedia-dpr-100w-bm25.md b/docs/regressions/regressions-wikipedia-dpr-100w-bm25.md similarity index 93% rename from docs/regressions-wikipedia-dpr-100w-bm25.md rename to docs/regressions/regressions-wikipedia-dpr-100w-bm25.md index 56533bb76b..d67511f275 100644 --- a/docs/regressions-wikipedia-dpr-100w-bm25.md +++ b/docs/regressions/regressions-wikipedia-dpr-100w-bm25.md @@ -4,8 +4,8 @@ This page documents QA regression experiments on the `wikipedia-dpr-100w` corpus, which is integrated into Anserini's regression testing framework. -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/wikipedia-dpr-100w-bm25.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/wikipedia-dpr-100w-bm25.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -29,7 +29,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/wikipedia-dpr-100w/`should be a directory containing the wikipedia-dpr-100w passages collection retrieved from [here](https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). 
## Retrieval @@ -154,6 +154,6 @@ With the above commands, you should be able to reproduce the following results: | [DPR: CuratedTREC Test](https://github.com/facebookresearch/DPR) | 0.8991 | | [EfficientQA: Natural Questions Test](https://efficientqa.github.io/) | 0.7922 | -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](../../docs/reproducibility.md) -To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template) and run `bin/build.sh` to rebuild the documentation. +To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions-wt10g.md b/docs/regressions/regressions-wt10g.md similarity index 95% rename from docs/regressions-wt10g.md rename to docs/regressions/regressions-wt10g.md index 917294f8dd..a98253d304 100644 --- a/docs/regressions-wt10g.md +++ b/docs/regressions/regressions-wt10g.md @@ -3,8 +3,8 @@ **Models**: various bag-of-words approaches This page describes regressions for the TREC-9 Web Track and the TREC 2001 Web Track, which uses the [Wt10g collection](http://ir.dcs.gla.ac.uk/test_collections/wt10g.html). -The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/wt10g.yaml). -Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/wt10g.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. +The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/wt10g.yaml). +Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/wt10g.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: @@ -28,7 +28,7 @@ target/appassembler/bin/IndexCollection \ The directory `/path/to/wt10g/` should be the root directory of the [Wt10g collection](http://ir.dcs.gla.ac.uk/test_collections/wt10g.html), containing a bunch of subdirectories, `WTX001` to `WTX104`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](../../docs/common-indexing-options.md). ## Retrieval diff --git a/docs/solrini.md b/docs/solrini.md index 2e1fab050a..be227c0491 100644 --- a/docs/solrini.md +++ b/docs/solrini.md @@ -43,7 +43,7 @@ pushd src/main/resources/solr && ./solr.sh ../../../../solrini localhost:9983 && Solr should now be available at [http://localhost:8983/](http://localhost:8983/) for browsing. The Solr index schema can also be modified using the [Schema API](https://lucene.apache.org/solr/guide/8_3/schema-api.html). This is useful for specifying field types and other properties including multiValued fields. -Schemas for setting up specific Solr index schemas can be found in the [src/main/resources/solr/schemas/](../src/main/resources/solr/schemas/) folder. +Schemas for setting up specific Solr index schemas can be found in the `src/main/resources/solr/schemas/` folder. To set the schema, we can make a request to the Schema API: ```bash @@ -61,7 +61,7 @@ Indexing into Solr is similar indexing to disk with Lucene, with a few added par Most notably, we replace the `-index` parameter (which specifies the Lucene index path on disk) with Solr parameters. Alternatively, Solr can also be configured to read pre-built Lucene indexes, since Solr uses Lucene indexes under the hood (more details below). -We'll index [Robust04](regressions-disk45.md) as an example. +We'll index Robust04 as an example.
First, create the `robust04` collection in Solr: ```bash @@ -109,9 +109,9 @@ P_30 all 0.3102 Solrini has also been verified to work with following collections as well: -+ [TREC Washington Post Corpus](regressions-core18.md) -+ [MS MARCO passage ranking task](experiments-msmarco-passage.md) -+ [MS MARCO document ranking task](regressions-msmarco-doc.md) ++ TREC Washington Post Corpus ++ MS MARCO passage ranking task ++ MS MARCO document ranking task See `run_solr_regression.py` regression script for more details. @@ -119,7 +119,7 @@ See `run_solr_regression.py` regression script for more details. It is possible for Solr to read pre-built Lucene indexes. To achieve this, some housekeeping is required to "install" the pre-built indexes. -The following uses [Robust04](regressions-disk45.md) as an example. +The following uses Robust04 as an example. Let's assume the pre-built index is stored at `indexes/lucene-index.disk45/`. First, a Solr collection must be created to house the index. @@ -168,7 +168,7 @@ You can confirm that everything works by performing a retrieval run and checking ## Solr integration test We have an end-to-end integration testing script `run_solr_regression.py`. -See example usage for [Robust04](regressions-disk45.md) below: +See example usage for Robust04 below: ```bash # Check if Solr server is on @@ -196,21 +196,21 @@ To run end-to-end, issue the following command: python src/main/python/run_solr_regression.py --regression robust04 --input /path/to/disk45 ``` -The regression script has been verified to work for [`robust04`](regressions-disk45.md), [`core18`](regressions-core18.md), [`msmarco-passage`](experiments-msmarco-passage.md), [`msmarco-doc`](regressions-msmarco-doc.md). +The regression script has been verified to work for `robust04`, `core18`, `msmarco-passage`, `msmarco-doc`. 
## Reproduction Log[*](reproducibility.md) -+ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`1882d84`](https://github.com/castorini/anserini/commit/1882d84236b13cd4673d2d8fa91003438eea2d82)) for both [Washington Post](regressions-core18.md) and [Robust04](regressions-disk45.md) -+ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-28 (commit [`a79cb62`](https://github.com/castorini/anserini/commit/a79cb62a57a059113a6c3b1523b582b89dccf0a1)) for both [Washington Post](regressions-core18.md) and [Robust04](regressions-disk45.md) -+ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-02-12 (commit [`eff7755`](https://github.com/castorini/anserini/commit/eff7755a611bd20ee1d63ac0167f5c8f38cd3074)) for [Washington Post `core18`](regressions-core18.md), [Robust04 `robust04`](regressions-disk45.md), and [MS Marco Passage `msmarco-passage`](regressions-msmarco-passage.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) -+ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`31d843a`](https://github.com/castorini/anserini/commit/31d843a6073bfd7eff7e326f543e3f11845df7fa)) for [MS Marco Passage `msmarco-passage`](regressions-msmarco-passage.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) -+ Results reproduced by [@shaneding](https://github.com/shaneding) on 2020-05-26 (commit [`bed8ead`](https://github.com/castorini/anserini/commit/bed8eadad5f2ba859a2ddd2801db4aaeb3c81485)) for [MS Marco Passage `msmarco-passage`](regressions-msmarco-passage.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) -+ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage 
`msmarco-passage`](regressions-msmarco-passage.md) -+ Results reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS Marco Passage `msmarco-passage`](regressions-msmarco-passage.md) and [MS Marco Document `msmarco-doc`](regressions-msmarco-doc.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) -+ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04 `robust04`](regressions-disk45.md), [Washington Post `core18`](regressions-core18.md), and [MS Marco Passage `msmarco-passage`](regressions-msmarco-passage.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) ++ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`1882d84`](https://github.com/castorini/anserini/commit/1882d84236b13cd4673d2d8fa91003438eea2d82)) for both Washington Post and Robust04 ++ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-28 (commit [`a79cb62`](https://github.com/castorini/anserini/commit/a79cb62a57a059113a6c3b1523b582b89dccf0a1)) for both Washington Post and Robust04 ++ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-02-12 (commit [`eff7755`](https://github.com/castorini/anserini/commit/eff7755a611bd20ee1d63ac0167f5c8f38cd3074)) for Washington Post `core18`, Robust04 `robust04`, and MS Marco Passage `msmarco-passage` using end-to-end `run_solr_regression` ++ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`31d843a`](https://github.com/castorini/anserini/commit/31d843a6073bfd7eff7e326f543e3f11845df7fa)) for MS Marco Passage `msmarco-passage` using end-to-end `run_solr_regression` ++ Results reproduced by 
[@shaneding](https://github.com/shaneding) on 2020-05-26 (commit [`bed8ead`](https://github.com/castorini/anserini/commit/bed8eadad5f2ba859a2ddd2801db4aaeb3c81485)) for MS Marco Passage `msmarco-passage` using end-to-end `run_solr_regression` ++ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for MS MARCO Passage `msmarco-passage` ++ Results reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for MS Marco Passage `msmarco-passage` and MS Marco Document `msmarco-doc` using end-to-end `run_solr_regression` ++ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for Robust04 `robust04`, Washington Post `core18`, and MS Marco Passage `msmarco-passage` using end-to-end `run_solr_regression` + Results reproduced by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755b`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections -+ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-10 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md) -+ Results reproduced by [@tyao-t](https://github.com/tyao-t) on 2021-01-13 (commit [`a62aca0`](https://github.com/castorini/anserini/commit/a62aca06c1603617207c1c148133de0f90f24738)) for [MS MARCO Passage](regressions-msmarco-passage.md) and [MS MARCO Document](regressions-msmarco-doc.md) -+ Results reproduced by [@d1shs0ap](https://github.com/d1shs0ap) on 2022-01-21 (commit 
[`a81299e`](https://github.com/castorini/anserini/commit/a81299e59eff24512d635e0d49fba6e373286469)) for [MS MARCO Document](regressions-msmarco-doc.md) using end-to-end [`run_solr_regression`](../src/main/python/run_solr_regression.py) ++ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-10 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for MS MARCO Passage ++ Results reproduced by [@tyao-t](https://github.com/tyao-t) on 2021-01-13 (commit [`a62aca0`](https://github.com/castorini/anserini/commit/a62aca06c1603617207c1c148133de0f90f24738)) for MS MARCO Passage and MS MARCO Document ++ Results reproduced by [@d1shs0ap](https://github.com/d1shs0ap) on 2022-01-21 (commit [`a81299e`](https://github.com/castorini/anserini/commit/a81299e59eff24512d635e0d49fba6e373286469)) for MS MARCO Document using end-to-end `run_solr_regression` + Results reproduced by [@lintool](https://github.com/lintool) on 2022-03-21 (commit [`3d1fc34`](https://github.com/castorini/anserini/commit/3d1fc3457b993832b4682c0482b26d8271d02ec6)) for all collections + Results reproduced by [@lintool](https://github.com/lintool) on 2022-07-31 (commit [`2a0cb16`](https://github.com/castorini/anserini/commit/2a0cb16829b347e38801b9972b349de498dadf03)) (v0.14.4) for all collections diff --git a/src/main/resources/docgen/templates/backgroundlinking18.template b/src/main/resources/docgen/templates/backgroundlinking18.template index d58f681fe5..082ce1adad 100644 --- a/src/main/resources/docgen/templates/backgroundlinking18.template +++ b/src/main/resources/docgen/templates/backgroundlinking18.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/backgroundlinking19.template b/src/main/resources/docgen/templates/backgroundlinking19.template index d5e10b6b2b..0af310ea9e 100644 --- a/src/main/resources/docgen/templates/backgroundlinking19.template +++ b/src/main/resources/docgen/templates/backgroundlinking19.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/backgroundlinking20.template b/src/main/resources/docgen/templates/backgroundlinking20.template index 9d223b80b6..cf646f8bb0 100644 --- a/src/main/resources/docgen/templates/backgroundlinking20.template +++ b/src/main/resources/docgen/templates/backgroundlinking20.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template index d713a10f1a..883051e316 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template index 30f24a9279..f3aaa385c1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template index 438360ed85..39960620bb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template index d7366de0df..d4208e6fd1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template index e98b150125..7037b1f942 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template index 7de3834403..39ddd86af9 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template index a41769a73c..b9f006423f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template index 715e3b18f3..5fef099fd1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template index d8f226e613..de909c9b56 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template index 6247033676..0c94dd1119 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template index 5dfc3e59cb..20dd483e87 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template index 2c94e552ef..be8a6b93a5 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template index 5685e66426..f63334f946 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template index 5600da8202..1a4b9dfb6b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,416,593 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template
index e1d2923ebc..63392a37c4 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template
index 0b14d3d472..7afd0c0997 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template
index c0abcf0d14..1bcc0b4d5b 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template
index 2ea2564808..f7c9f91a46 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template
index 8505e8373f..4e041cf4fe 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template
index d4d7be04e6..06c911ee81 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template
index 51cddee3b3..17ce96ceac 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template
index 91430df497..b5d668bd79 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template
index 6b4acba975..48a5cec922 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template
index d5600c2190..1bb59ceb7f 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template
index a231b80726..9e7e8a10d3 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template
index f3b078974e..9da82eca4e 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template
index 73e44f194e..62f3571740 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template
index 3e626d9a72..eadd63ed0a 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template
index 738df5c7fd..5799245b9a 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template
index 83fcc135cf..7a2c93a5dc 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template
index b2b3d585f8..74a064e7ef 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template
index 9f65da8dbe..4af81819a5 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template
index 6352c2d352..0da6842d13 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template
index e9f7e19d14..cfcc97b704 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template
index 7f9ecd4bcf..914a75d34e 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template
index 72fb7a53ce..18bdc0ee66 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template
index cb1f1afcef..c8a339245a 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template
index ebd0e3756e..877dedffa3 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template
index 9a3f2d579b..92149b9094 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template
index f7f8ce5d59..d05e3564b2 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template
index 4b1e34d89c..3947ea722f 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template
index 41e4214176..f9748e622d 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template
index 417b04c3ef..9666a36cd5 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template
index 7aaa5ad799..4109ff05eb 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template
index 44a3648366..26162ad1e8 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template
index e211a90513..f62e6a081e 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template
index 94c8d39721..3009b3f7d1 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template
index ccaf0a726b..e32ac8549d 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template
index 9fef2b61c3..e3410352d9 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,674 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

@@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template
index 858bf7cab4..ce1afbbe08 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template
@@ -24,7 +24,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template
index 2758b6dc65..f0e174111c 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template
index 5608c400b6..5c573679c8 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template
@@ -20,7 +20,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template
index 371240d231..7dc9744562 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template
@@ -21,7 +21,7 @@ Typical indexing command:
 ${index_cmds}
 ```

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template
index b7fbadf47a..092965098a 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-distil-cocodenser-medium.template
@@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template index 11dd5cd2e1..778e00ca96 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template index 15428584a5..a68904b738 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template index a6ebec95ed..4dcf8ba60b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template index 8175f2948e..19ab0d4e30 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template index b80e96d492..1d6919bc85 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template index f45991369d..9c5028d31b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template index ce025e8a62..c66ef9dce0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template index 76256b355d..1ce06d3a4a 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template index 36497b2794..6e7cf0159f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template index f6b4b67736..d1334fecfe 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template index 81febbd917..634dcf61b1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template index 67bcc38d4a..d6c7758792 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template index 2058501390..2b7314d8ac 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template index 47ad9d9284..837c404983 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template index f20d12d26e..ac472c375b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template index 619bfa1b24..71f2ae3f82 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template index 63145335e1..994dbec854 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template index 43a4bc9623..958e6a7924 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template index 2dff1137a1..b8441a9645 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template index ff8c88bbe0..94f808e389 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template index 89c277eb4d..efab4a17b6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template index b13b5e8802..aad2b01b0b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template index 9c2a80d06a..6d9205e9bb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template index eac52addc4..31d34ae37e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template index 60b8a66cd9..40631444b5 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 4,635,922 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template index f164ecebec..dfc1d3a4b6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template index 6e5c17d97d..d100b296d8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template index 177f228ce9..acf1e67ba9 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template index 629c36ed64..a992c4c57b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template index 162abfeae2..4fd426b843 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,416,568 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template index 8263ea8af7..66c4f4aa1d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template index acf3e05ca4..107c7e3098 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template index 95deb6222f..feabcf7b4b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template index 2f95ee3bf0..4de7b44928 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template index bbf5415f5b..8693ce2319 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 57,638 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template index c4bb2ae241..b229c52abe 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template index 41e1f4d69e..e0212b8744 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template index 243b8e110c..e472e9505d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template index 06800e244f..26dbb9f239 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template index 540e80b872..451a1ac0cf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,233,329 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template index 16a3451ac8..be6255d1c8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template index 54af9971bf..11c28bef39 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template index 70f22e7cfe..66691933d8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template index 274cc36c19..4be8ecc62f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template index 8a8298d11b..c8775b6121 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 3,633 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template index d6bd57b121..20df61e2dc 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template index f06a9f629a..dc717315b1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template index 72c930d330..04931a531c 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template index c9d9403fa6..c4f216ccab 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template index 3d1466aae1..e6a85885f2 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 2,681,468 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template index a47343d65f..380c2ef906 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template index d94a37d6af..9d6596be37 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template index 90a9a9c69c..63c47a6c4e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template index 03e4e961f6..2e15e24b53 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template index 88565e8c02..2874ede0f2 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 522,931 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template index 7387a003b6..8265204a85 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template index 853eb43b62..abb4a63490 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template index 8ab4b276f3..cfbc174bf8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template index edc5eb011e..a88056a5f8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template index fdd4c58fce..7362dc7aaa 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template index 02b54def24..b2dfd4d70d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template index 5fe68c98c5..95654f7668 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template index 7530c66dff..8ed5127328 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template index 0a48b9d7f7..49b9c74755 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template index e7370a0500..c7b30922b1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 25,657 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template index bbbaf85776..b9088751bb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template index 701bbc0150..081e5612b5 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template index 665d5ef882..6e305bb225 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template index a1c2504918..4091cb6f6b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template index cfb4c3d0f6..9719bf7c89 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 5,183 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template index 50957734ee..c969cc5f61 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template index 3ccfc3733c..f50084cd57 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template index a003535e43..9fbc26ae08 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template index 85413f0758..6e573b0182 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template index 09129a1c78..f77bb5b90c 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template index 6d8f8662a2..3608965d09 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template index 2e69bb5f4b..c185080d4e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template index e6e43e051a..468bf8fcbb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template index b8bfd68181..1ccb937725 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template index 106073e8d2..631e25c296 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 171,332 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template index c401c77e79..3e98f0f3cf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template index 5d80f9e54a..89f3318e28 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template index 6a19dcaf34..d57cd78a5b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template index ee4188e807..9493d27c20 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template index 4888b659f4..81aed8ce9c 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,674 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template index b361db6d63..20fd128a0e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template index 07c8c54450..83f73aa5f0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template index 5c36b439eb..0113063702 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template @@ -20,7 +20,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template index 5e0b93b684..a45f59cf63 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template @@ -21,7 +21,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template index 7c271103bc..02cc88970d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.template @@ -53,7 +53,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 382,545 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -78,6 +78,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template index 8b6cb165e9..b7e549b52a 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template @@ -24,7 +24,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/car17v1.5.template b/src/main/resources/docgen/templates/car17v1.5.template index 00f3a2e82a..74dcdb66a4 100644 --- a/src/main/resources/docgen/templates/car17v1.5.template +++ b/src/main/resources/docgen/templates/car17v1.5.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/car17v2.0-doc2query.template b/src/main/resources/docgen/templates/car17v2.0-doc2query.template index fe8ecea956..880ee910a4 100644 --- a/src/main/resources/docgen/templates/car17v2.0-doc2query.template +++ b/src/main/resources/docgen/templates/car17v2.0-doc2query.template @@ -7,7 +7,7 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret > Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho. 
[Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) _arxiv:1904.08375_ These experiments are integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-doc2query.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -26,9 +26,9 @@ Typical indexing command: ${index_cmds} ``` -The directory `/path/to/car17v2.0-doc2query` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described in [this page](experiments-doc2query.md). +The directory `/path/to/car17v2.0-doc2query` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described in [this page](${root_path}/docs/experiments-doc2query.md). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/car17v2.0.template b/src/main/resources/docgen/templates/car17v2.0.template index 48a821dfd5..5d12cba88e 100644 --- a/src/main/resources/docgen/templates/car17v2.0.template +++ b/src/main/resources/docgen/templates/car17v2.0.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/clef06-fr.template b/src/main/resources/docgen/templates/clef06-fr.template index 32ee51ca1a..cc0eb6e1aa 100644 --- a/src/main/resources/docgen/templates/clef06-fr.template +++ b/src/main/resources/docgen/templates/clef06-fr.template @@ -24,7 +24,7 @@ The collection comprises news articles from ATS (SDA) and Le Monde totaling 177, Since the original distribution is in a format that's slightly different from standard TREC collections, we used a [preprocessing script](../src/main/python/clir/document_preprocess.py) to convert the collection into Anserini's JSON line format (we also applied a bit of light data cleaning using a script that has been lost; if you have problems reproducing our results, get in touch directly). The directory `/path/to/clef06-fr/` should point to the location of the processed collection. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/core17.template b/src/main/resources/docgen/templates/core17.template index d4fe88da50..064722dce3 100644 --- a/src/main/resources/docgen/templates/core17.template +++ b/src/main/resources/docgen/templates/core17.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/nyt_corpus/` should be the root directory of the [New York Times Annotated Corpus](https://catalog.ldc.upenn.edu/LDC2008T19), i.e., `ls /path/to/nyt_corpus/` should bring up a bunch of subdirectories, `1987` to `2007`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -51,7 +51,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) + Results reproduced by [@tteofili](https://github.com/tteofili) on 2019-01-27 (commit [`951090`](https://github.com/castorini/Anserini/commit/951090b66230040f037dde46534d896416467337)) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2019-09-07 (commit [`61f6f20`](https://github.com/castorini/anserini/commit/61f6f20ff6872484966ea1badcdcdcebf1eea852)) diff --git a/src/main/resources/docgen/templates/core18.template b/src/main/resources/docgen/templates/core18.template index e72e1a5712..f71ae690c9 100644 --- a/src/main/resources/docgen/templates/core18.template +++ b/src/main/resources/docgen/templates/core18.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/` should bring up a single JSON file. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -51,7 +51,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) + Results reproduced by [@andrewyates](https://github.com/andrewyates) on 2018-11-30 (commit [`c1aac5`](https://github.com/castorini/Anserini/commit/c1aac5e353e2ab77db3e7106cb4c017a09ce0fe9)) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2019-09-07 (commit [`61f6f20`](https://github.com/castorini/anserini/commit/61f6f20ff6872484966ea1badcdcdcebf1eea852)) diff --git a/src/main/resources/docgen/templates/cw09b.template b/src/main/resources/docgen/templates/cw09b.template index e61c7474ef..ecde687acb 100644 --- a/src/main/resources/docgen/templates/cw09b.template +++ b/src/main/resources/docgen/templates/cw09b.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/ClueWeb09b` should be the root directory of the [ClueWeb09 (Category B) collection](http://lemurproject.org/clueweb09.php/), i.e., `ls /path/to/ClueWeb09b` should bring up a bunch of subdirectories, `en0000` to `enwp03`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/cw12.template b/src/main/resources/docgen/templates/cw12.template index 85c11b4670..c7f7c1c551 100644 --- a/src/main/resources/docgen/templates/cw12.template +++ b/src/main/resources/docgen/templates/cw12.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/cw12/` should be the root directory of the (full) [ClueWeb12 collection](http://lemurproject.org/clueweb12.php/), i.e., `/path/to/cw12/` should contain `Disk1`, `Disk2`, `Disk3`, `Disk4`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/cw12b13.template b/src/main/resources/docgen/templates/cw12b13.template index d2424b1c54..fed72fb428 100644 --- a/src/main/resources/docgen/templates/cw12b13.template +++ b/src/main/resources/docgen/templates/cw12b13.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/cw12-b13/` should be the root directory of the [ClueWeb12-B13 collection](http://lemurproject.org/clueweb12/ClueWeb12-CreateB13.php), i.e., `/path/to/cw12-b13/` should bring up a bunch of subdirectories, `ClueWeb12_00` to `ClueWeb12_18`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -52,6 +52,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) * Results reproduced by [@matthew-z](https://github.com/matthew-z) on 2019-04-14 (commit [`abaa4c8`](https://github.com/castorini/Anserini/commit/abaa4c8e7cb50e8e4a3677377716f609b7859538))[*](https://github.com/castorini/Anserini/pull/590)[!](https://github.com/castorini/Anserini/issues/592) diff --git a/src/main/resources/docgen/templates/disk12.template b/src/main/resources/docgen/templates/disk12.template index 6abeb259a8..3a76f85d60 100644 --- a/src/main/resources/docgen/templates/disk12.template +++ b/src/main/resources/docgen/templates/disk12.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/disk12/` should be the root directory of [TIPSTER Disks 1 & 2](https://catalog.ldc.upenn.edu/LDC93T3A), i.e., `ls /path/to/disk12/` should bring up subdirectories like `doe`, `wsj`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/disk45.template b/src/main/resources/docgen/templates/disk45.template index 43b1bb421b..883c5030bf 100644 --- a/src/main/resources/docgen/templates/disk45.template +++ b/src/main/resources/docgen/templates/disk45.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/disk45/` should be the root directory of [TREC Disks 4 & 5](https://trec.nist.gov/data/cd45/index.html); inside each there should be subdirectories like `ft`, `fr94`. Note that Anserini ignores the `cr` folder when indexing, which is the standard configuration. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -53,9 +53,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) - -(Prior to the addition of TREC 7/8 topics) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) + Results reproduced by [@chriskamphuis](https://github.com/chriskamphuis) on 2018-12-18 (commit [`a15235`](https://github.com/castorini/Anserini/commit/a152359435ac6ae694b39f561343bba5eed8fdc9)) + Results reproduced by [@kelvin-jiang](https://github.com/kelvin-jiang) on 2019-09-08 (commit [`a1892ae`](https://github.com/castorini/anserini/commit/a1892aec726efe55111a7bc501ab0914afab3a30)) diff --git a/src/main/resources/docgen/templates/dl19-doc-ca.template b/src/main/resources/docgen/templates/dl19-doc-ca.template index eb3ed5c07d..3ab737b7a0 100644 --- a/src/main/resources/docgen/templates/dl19-doc-ca.template +++ b/src/main/resources/docgen/templates/dl19-doc-ca.template @@ -25,9 +25,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template index f03798c1d8..78f6b9206d 100644 --- a/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -17,7 +17,7 @@ All four conditions are described in detail [here](https://github.com/castorini/ The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -36,9 +36,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-hgf-wp.template b/src/main/resources/docgen/templates/dl19-doc-hgf-wp.template index 3e468441f0..1378198c7a 100644 --- a/src/main/resources/docgen/templates/dl19-doc-hgf-wp.template +++ b/src/main/resources/docgen/templates/dl19-doc-hgf-wp.template @@ -26,9 +26,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-ca.template b/src/main/resources/docgen/templates/dl19-doc-segmented-ca.template index 96fdd097fd..cab96c7ea3 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-ca.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-ca.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing + **Expansion Condition:** none @@ -15,7 +15,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -34,9 +34,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template index 6fb1f5a6a9..0fe910c1fa 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -18,7 +18,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -37,9 +37,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template b/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template index 12cb2f15f7..b0bd744270 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil-noexp.template @@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -104,7 +104,7 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template b/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template index bcf48dbe91..90221cdb3e 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template @@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -104,7 +104,7 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-wp.template b/src/main/resources/docgen/templates/dl19-doc-segmented-wp.template index 93c84a04bd..100502da87 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-wp.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-wp.template @@ -27,9 +27,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented.template b/src/main/resources/docgen/templates/dl19-doc-segmented.template index f68f1975c4..0c7f246e77 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -18,7 +18,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -37,9 +37,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc-wp.template b/src/main/resources/docgen/templates/dl19-doc-wp.template index 1664751134..2131ab90b3 100644 --- a/src/main/resources/docgen/templates/dl19-doc-wp.template +++ b/src/main/resources/docgen/templates/dl19-doc-wp.template @@ -26,9 +26,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-doc.template b/src/main/resources/docgen/templates/dl19-doc.template index c116a7cc60..bf6d7c5841 100644 --- a/src/main/resources/docgen/templates/dl19-doc.template +++ b/src/main/resources/docgen/templates/dl19-doc.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -17,7 +17,7 @@ All four conditions are described in detail [here](https://github.com/castorini/ The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -36,9 +36,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -68,7 +68,7 @@ Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](${root_path}/docs/experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. 
diff --git a/src/main/resources/docgen/templates/dl19-passage-bm25-b8.template b/src/main/resources/docgen/templates/dl19-passage-bm25-b8.template index 355484f0e9..49fbaa552a 100644 --- a/src/main/resources/docgen/templates/dl19-passage-bm25-b8.template +++ b/src/main/resources/docgen/templates/dl19-passage-bm25-b8.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -51,7 +51,7 @@ ${index_cmds} The directory `/path/to/${corpus}/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -77,7 +77,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-passage-ca.template b/src/main/resources/docgen/templates/dl19-passage-ca.template index a0869ff6e5..b65d3adfea 100644 --- a/src/main/resources/docgen/templates/dl19-passage-ca.template +++ b/src/main/resources/docgen/templates/dl19-passage-ca.template @@ -6,7 +6,7 @@ This page describes baseline experiments, integrated into Anserini's regression Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -27,7 +27,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template b/src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template index 65aede534a..29d0c5211c 100644 --- a/src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template @@ -6,7 +6,7 @@ This page describes document expansion experiments, integrated into Anserini's r These experiments take advantage of [docTTTTTquery](http://doc2query.ai/) (also called doc2query-T5) expansions. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection. [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -57,7 +57,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md). + The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned via grid search to optimize recall@1000 directly _on the expanded passages_ using the MS MARCO passage sparse judgments (in 2020/12). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl19-passage-hgf-wp.template b/src/main/resources/docgen/templates/dl19-passage-hgf-wp.template index d0c61e3237..b61c045e58 100644 --- a/src/main/resources/docgen/templates/dl19-passage-hgf-wp.template +++ b/src/main/resources/docgen/templates/dl19-passage-hgf-wp.template @@ -7,7 +7,7 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template index 9d339301e6..af8ab6f9bf 100644 --- a/src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/dl19-passage-splade-distil-cocodenser-medium.template @@ -6,7 +6,7 @@ This page describes regression experiments, integrated into Anserini's regressio The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). 
The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -58,7 +58,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -88,7 +88,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template b/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template index d7c142ec2b..d4c23ff335 100644 --- a/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template +++ b/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed-onnx.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template index 421158889c..f114a0bda5 100644 --- a/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/dl19-passage-splade-pp-ed.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template b/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template index 99a3680cda..c31dff9734 100644 --- a/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template +++ b/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd-onnx.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template index 02d52a99b4..e333bc57ce 100644 --- a/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/dl19-passage-splade-pp-sd.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template b/src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template index 0572e40703..76924745c2 100644 --- a/src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/dl19-passage-unicoil-noexp.template @@ -11,7 +11,7 @@ The experiments on this page are not actually reported in the paper. Here, a variant model without expansion is used. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -63,7 +63,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -93,7 +93,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-passage-unicoil.template b/src/main/resources/docgen/templates/dl19-passage-unicoil.template index 44452634bd..655c4c85ef 100644 --- a/src/main/resources/docgen/templates/dl19-passage-unicoil.template +++ b/src/main/resources/docgen/templates/dl19-passage-unicoil.template @@ -11,7 +11,7 @@ The experiments on this page are not actually reported in the paper. However, the model is the same. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
@@ -63,7 +63,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -93,7 +93,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl19-passage-wp.template b/src/main/resources/docgen/templates/dl19-passage-wp.template index 860367cdd8..aee9726b71 100644 --- a/src/main/resources/docgen/templates/dl19-passage-wp.template +++ b/src/main/resources/docgen/templates/dl19-passage-wp.template @@ -7,7 +7,7 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl19-passage.template b/src/main/resources/docgen/templates/dl19-passage.template index e35903dd06..ebd5b6e56b 100644 --- a/src/main/resources/docgen/templates/dl19-passage.template +++ b/src/main/resources/docgen/templates/dl19-passage.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](${root_path}/docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. 
diff --git a/src/main/resources/docgen/templates/dl20-doc-ca.template b/src/main/resources/docgen/templates/dl20-doc-ca.template index f70d0b516a..3cc5633fa5 100644 --- a/src/main/resources/docgen/templates/dl20-doc-ca.template +++ b/src/main/resources/docgen/templates/dl20-doc-ca.template @@ -25,9 +25,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template index bb19ba7679..576088b82f 100644 --- a/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -17,7 +17,7 @@ All four conditions are described in detail [here](https://github.com/castorini/ The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -36,9 +36,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-hgf-wp.template b/src/main/resources/docgen/templates/dl20-doc-hgf-wp.template index 95bcb5b66a..420826d54c 100644 --- a/src/main/resources/docgen/templates/dl20-doc-hgf-wp.template +++ b/src/main/resources/docgen/templates/dl20-doc-hgf-wp.template @@ -26,9 +26,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-ca.template b/src/main/resources/docgen/templates/dl20-doc-segmented-ca.template index 35e9279b7f..429bd48a5d 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented-ca.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented-ca.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). 
+ **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing + **Expansion Condition:** none @@ -15,7 +15,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -34,9 +34,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template index feca1f26cb..c40a74467f 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -18,7 +18,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. 
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -37,9 +37,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template b/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template index 552521c66e..69d1641af2 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil-noexp.template @@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -104,7 +104,7 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template b/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template index dfe208c0b2..07b803ef30 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented-unicoil.template @@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 20,545,677 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -104,7 +104,7 @@ The reasonable settings are: However, for these topics, we get the same effectiveness results; that is, the tie-breaking affects do not manifest in different scores. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-wp.template b/src/main/resources/docgen/templates/dl20-doc-segmented-wp.template index 23abaed0ee..397facfa2f 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented-wp.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented-wp.template @@ -27,9 +27,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented.template b/src/main/resources/docgen/templates/dl20-doc-segmented.template index a8955fc7d5..18265cff90 100644 --- a/src/main/resources/docgen/templates/dl20-doc-segmented.template +++ b/src/main/resources/docgen/templates/dl20-doc-segmented.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -18,7 +18,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -37,9 +37,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc-wp.template b/src/main/resources/docgen/templates/dl20-doc-wp.template index 189c69e720..59e69ac8e6 100644 --- a/src/main/resources/docgen/templates/dl20-doc-wp.template +++ b/src/main/resources/docgen/templates/dl20-doc-wp.template @@ -26,9 +26,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-doc.template b/src/main/resources/docgen/templates/dl20-doc.template index 144ebccebd..9bd39d1099 100644 --- a/src/main/resources/docgen/templates/dl20-doc.template +++ b/src/main/resources/docgen/templates/dl20-doc.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md). +For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md). 
Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -17,7 +17,7 @@ All four conditions are described in detail [here](https://github.com/castorini/ The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. -Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md). +Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md). As a result, we have had to rebuild all our regressions from the raw corpus. These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason. @@ -36,9 +36,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -68,7 +68,7 @@ Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. 
-+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](${root_path}/docs/experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl20-passage-bm25-b8.template b/src/main/resources/docgen/templates/dl20-passage-bm25-b8.template index 333e935627..146a7ff25a 100644 --- a/src/main/resources/docgen/templates/dl20-passage-bm25-b8.template +++ b/src/main/resources/docgen/templates/dl20-passage-bm25-b8.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
@@ -51,7 +51,7 @@ ${index_cmds} The directory `/path/to/${corpus}/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -77,7 +77,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-ca.template b/src/main/resources/docgen/templates/dl20-passage-ca.template index d422dba92b..66b10a0e58 100644 --- a/src/main/resources/docgen/templates/dl20-passage-ca.template +++ b/src/main/resources/docgen/templates/dl20-passage-ca.template @@ -6,7 +6,7 @@ This page describes baseline experiments, integrated into Anserini's regression Here we are using `CompositeAnalyzer` which combines **Lucene tokenization** with **WordPiece tokenization** (i.e., from BERT) using the following tokenizer from HuggingFace [`bert-base-uncased`](https://huggingface.co/bert-base-uncased). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -27,7 +27,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template b/src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template index b99b2b4384..442ed90b7e 100644 --- a/src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl20-passage-docTTTTTquery.template @@ -6,7 +6,7 @@ This page describes document expansion experiments, integrated into Anserini's r These experiments take advantage of [docTTTTTquery](http://doc2query.ai/) (also called doc2query-T5) expansions. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
@@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection. [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -57,7 +57,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_ using the MS MARCO passage sparse judgments, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md). + The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned via grid search to optimize recall@1000 directly _on the expanded passages_ using the MS MARCO passage sparse judgments (in 2020/12). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl20-passage-hgf-wp.template b/src/main/resources/docgen/templates/dl20-passage-hgf-wp.template index 1043930fa2..9cad488aae 100644 --- a/src/main/resources/docgen/templates/dl20-passage-hgf-wp.template +++ b/src/main/resources/docgen/templates/dl20-passage-hgf-wp.template @@ -7,7 +7,7 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template index 5a00106642..64bc52e43a 100644 --- a/src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template +++ b/src/main/resources/docgen/templates/dl20-passage-splade-distil-cocodenser-medium.template @@ -6,7 +6,7 @@ This page describes regression experiments, integrated into Anserini's regressio The SPLADE-distil CoCodenser Medium model is open-sourced by [Naver Labs Europe](https://europe.naverlabs.com/research/machine-learning-and-optimization/splade-models). 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -58,7 +58,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -88,7 +88,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). 
-## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template b/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template index 5fa28d601d..c23ed48b80 100644 --- a/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template +++ b/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed-onnx.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. 
Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template index 77344b165b..0b0d99ad45 100644 --- a/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/dl20-passage-splade-pp-ed.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template b/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template index c92d160bdb..bfe07d8b6c 100644 --- a/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template +++ b/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd-onnx.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using ONNX to perform query encoding on the fly. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template index b17cdce1d0..85d45b208d 100644 --- a/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/dl20-passage-splade-pp-sd.template @@ -9,7 +9,7 @@ This page describes regression experiments, integrated into Anserini's regressio In these experiments, we are using pre-encoded queries (i.e., cached results of query encoding). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -60,7 +60,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADE-distil CoCodenser Medium tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -90,7 +90,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2003.07820). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template b/src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template index 7fd5ae38e4..f2453d80e1 100644 --- a/src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/dl20-passage-unicoil-noexp.template @@ -11,7 +11,7 @@ The experiments on this page are not actually reported in the paper. Here, a variant model without expansion is used. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. @@ -63,7 +63,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -93,7 +93,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2102.07662). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-unicoil.template b/src/main/resources/docgen/templates/dl20-passage-unicoil.template index d48b1b0c3b..116a3ab558 100644 --- a/src/main/resources/docgen/templates/dl20-passage-unicoil.template +++ b/src/main/resources/docgen/templates/dl20-passage-unicoil.template @@ -11,7 +11,7 @@ The experiments on this page are not actually reported in the paper. However, the model is the same. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
@@ -63,7 +63,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -93,7 +93,7 @@ Note that retrieval metrics are computed to depth 1000 hits per query (as oppose Also, for computing nDCG, remember that we keep qrels of _all_ relevance grades, whereas for other metrics (e.g., AP), relevance grade 1 is considered not relevant (i.e., use the `-l 2` option in `trec_eval`). The experimental results reported here are directly comparable to the results reported in the [track overview paper](https://arxiv.org/abs/2102.07662). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl20-passage-wp.template b/src/main/resources/docgen/templates/dl20-passage-wp.template index 0b46e06cbf..2ca24d54f8 100644 --- a/src/main/resources/docgen/templates/dl20-passage-wp.template +++ b/src/main/resources/docgen/templates/dl20-passage-wp.template @@ -7,7 +7,7 @@ Here we are using **WordPiece tokenization** (i.e., from BERT). In general, effectiveness is lower than with "standard" Lucene tokenization for two reasons: (1) we're losing stemming, and (2) some terms are chopped into less meaningful subwords. 
Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). +For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -28,7 +28,7 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl20-passage.template b/src/main/resources/docgen/templates/dl20-passage.template index 59085361fb..503d17b6b8 100644 --- a/src/main/resources/docgen/templates/dl20-passage.template +++ b/src/main/resources/docgen/templates/dl20-passage.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2020.html). Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO passage collection, refer to [this page](experiments-msmarco-passage.md). 
+For additional instructions on working with MS MARCO passage collection, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](${root_path}/docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned using the MS MARCO passage sparse judgments, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md). Settings tuned on the MS MARCO passage sparse judgments _may not_ work well on the TREC dense judgments. 
diff --git a/src/main/resources/docgen/templates/dl21-doc-d2q-t5.template b/src/main/resources/docgen/templates/dl21-doc-d2q-t5.template
index 6eecffc298..93a4dec9ee 100644
--- a/src/main/resources/docgen/templates/dl21-doc-d2q-t5.template
+++ b/src/main/resources/docgen/templates/dl21-doc-d2q-t5.template
@@ -5,7 +5,7 @@
 
 This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document collection (with doc2query-T5 expansions).
 Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast).
-For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md).
+For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md).
 
 Note that there are four different bag-of-words regression conditions for this task, and this page describes the following:
 
@@ -30,9 +30,9 @@ ${index_cmds}
 ```
 
 The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus.
-See [this page](experiments-msmarco-v2.md) for additional details.
+See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details.
 
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template b/src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template index 255d87f140..d52639d0e2 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented-d2q-t5.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document collection (with doc2query-T5 expansions). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -30,9 +30,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template index 9ecc949766..a06143a1d8 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot-v2.template @@ -16,7 +16,7 @@ The segment-only encoding results are deprecated and kept around primarily for a You probably don't want to use them. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -74,7 +74,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -103,7 +103,7 @@ ${effectiveness} This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template index 9aa1161dc8..d09feabafd 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-0shot.template @@ -15,7 +15,7 @@ This regression captures segment-only encoding and is kept around primarily for The version that uses title/segment encoding can be found [here](regressions-dl21-doc-segmented-unicoil-0shot-v2.md). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -73,7 +73,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,414 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -102,7 +102,7 @@ ${effectiveness} This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template index 1a92562766..bd16ae7553 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot-v2.template @@ -16,7 +16,7 @@ The segment-only encoding results are deprecated and kept around primarily for a You probably don't want to use them. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). 
+For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -74,7 +74,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -103,7 +103,7 @@ ${effectiveness} This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template index 97beebe3d7..a4cc008e37 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented-unicoil-noexp-0shot.template @@ -15,7 +15,7 @@ This regression captures segment-only encoding and is kept around primarily for The version that uses title/segment encoding can be found [here](regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md). Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -73,7 +73,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -102,7 +102,7 @@ ${effectiveness} This run roughly corresponds to run `p_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-doc-segmented.template b/src/main/resources/docgen/templates/dl21-doc-segmented.template index eddcef1d98..10db81c38d 100644 --- a/src/main/resources/docgen/templates/dl21-doc-segmented.template +++ b/src/main/resources/docgen/templates/dl21-doc-segmented.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document collection. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -30,9 +30,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. 
-See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-doc.template b/src/main/resources/docgen/templates/dl21-doc.template index 486d91b555..5a11b170eb 100644 --- a/src/main/resources/docgen/templates/dl21-doc.template +++ b/src/main/resources/docgen/templates/dl21-doc.template @@ -5,7 +5,7 @@ This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document collection. Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with MS MARCO V2 document collection, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with MS MARCO V2 document collection, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). Note that there are four different bag-of-words regression conditions for this task, and this page describes the following: @@ -30,9 +30,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template b/src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template index 924548b589..a402ae6c6c 100644 --- a/src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template +++ b/src/main/resources/docgen/templates/dl21-passage-augmented-d2q-t5.template @@ -5,7 +5,7 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-passage-augmented.template b/src/main/resources/docgen/templates/dl21-passage-augmented.template index e1727bf002..f3b2fc40f7 100644 --- a/src/main/resources/docgen/templates/dl21-passage-augmented.template +++ b/src/main/resources/docgen/templates/dl21-passage-augmented.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-passage-d2q-t5.template b/src/main/resources/docgen/templates/dl21-passage-d2q-t5.template index b6a40144dd..1e12c54da1 100644 --- a/src/main/resources/docgen/templates/dl21-passage-d2q-t5.template +++ b/src/main/resources/docgen/templates/dl21-passage-d2q-t5.template @@ -5,7 +5,7 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template index 3d1592b742..a41f15c6e8 100644 --- a/src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/dl21-passage-splade-pp-ed.template @@ -8,7 +8,7 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-EnsembleDistil](https: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -59,7 +59,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -85,6 +85,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template index bd64ec32fa..f1e516ebfd 100644 --- a/src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/dl21-passage-splade-pp-sd.template @@ -8,7 +8,7 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-SelfDistil](https://hu > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). 
-For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -59,7 +59,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -85,6 +85,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template b/src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template index 1b551dfe13..8b17d4244b 100644 --- a/src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template +++ b/src/main/resources/docgen/templates/dl21-passage-unicoil-0shot.template @@ -10,7 +10,7 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. 
[A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -68,7 +68,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -97,7 +97,7 @@ ${effectiveness} This run roughly corresponds to run `d_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. 
-## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template b/src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template index 7457ef88d5..2207898675 100644 --- a/src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template +++ b/src/main/resources/docgen/templates/dl21-passage-unicoil-noexp-0shot.template @@ -10,7 +10,7 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -68,7 +68,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. 
Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -97,7 +97,7 @@ ${effectiveness} This run roughly corresponds to run `d_unicoil0` submitted to the TREC 2021 Deep Learning Track under the "baseline" group. The difference is that here we are using pre-encoded queries, whereas the official submission performed query encoding on the fly. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl21-passage.template b/src/main/resources/docgen/templates/dl21-passage.template index ffe280a3e3..275e7a4b75 100644 --- a/src/main/resources/docgen/templates/dl21-passage.template +++ b/src/main/resources/docgen/templates/dl21-passage.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template b/src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template index 60db1cdc59..edaa756ca6 100644 --- a/src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template +++ b/src/main/resources/docgen/templates/dl22-passage-augmented-d2q-t5.template @@ -5,7 +5,7 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage-augmented.template b/src/main/resources/docgen/templates/dl22-passage-augmented.template index a056820ce8..989128882d 100644 --- a/src/main/resources/docgen/templates/dl22-passage-augmented.template +++ b/src/main/resources/docgen/templates/dl22-passage-augmented.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the _augmented version_ of the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage-d2q-t5.template b/src/main/resources/docgen/templates/dl22-passage-d2q-t5.template index c92b400da2..fcceb6e16e 100644 --- a/src/main/resources/docgen/templates/dl22-passage-d2q-t5.template +++ b/src/main/resources/docgen/templates/dl22-passage-d2q-t5.template @@ -5,7 +5,7 @@ This page describes document expansion experiments (with doc2query-T5), integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template index a98ecb36b6..650463e008 100644 --- a/src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/dl22-passage-splade-pp-ed.template @@ -8,7 +8,7 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-EnsembleDistil](https: > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. [From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). 
The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -59,7 +59,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -88,6 +88,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template index f08ada810e..6ef31c429e 100644 --- a/src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/dl22-passage-splade-pp-sd.template @@ -8,7 +8,7 @@ Here, we cover experiments with the [SPLADE++ CoCondenser-SelfDistil](https://hu > Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 
[From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.](https://dl.acm.org/doi/10.1145/3477495.3531857) _Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval_, pages 2353–2359. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -59,7 +59,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -88,6 +88,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template b/src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template index 365c880e9e..fd02859b3d 100644 --- a/src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template +++ b/src/main/resources/docgen/templates/dl22-passage-unicoil-0shot.template @@ -10,7 +10,7 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -68,7 +68,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template b/src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template index ae6cc78be7..88d0a5550b 100644 --- a/src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template +++ b/src/main/resources/docgen/templates/dl22-passage-unicoil-noexp-0shot.template @@ -10,7 +10,7 @@ The uniCOIL model is described in the following paper: > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
@@ -68,7 +68,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/dl22-passage.template b/src/main/resources/docgen/templates/dl22-passage.template index 4e9cab80b9..efaaabec7a 100644 --- a/src/main/resources/docgen/templates/dl22-passage.template +++ b/src/main/resources/docgen/templates/dl22-passage.template @@ -5,7 +5,7 @@ This page describes baseline experiments, integrated into Anserini's regression testing framework, on the [TREC 2022 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2022.html) using the MS MARCO V2 Passage Corpus. Note that the NIST relevance judgments provide far more relevant passages per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast). -For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](experiments-msmarco-v2.md). +For additional instructions on working with the MS MARCO V2 Passage Corpus, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
@@ -25,9 +25,9 @@ ${index_cmds} ``` The value of `-input` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/fever.template b/src/main/resources/docgen/templates/fever.template index 5dfeb5f27e..4d4f2ba86f 100644 --- a/src/main/resources/docgen/templates/fever.template +++ b/src/main/resources/docgen/templates/fever.template @@ -19,9 +19,9 @@ Typical indexing command: ${index_cmds} ``` -The directory `/path/to/fever` should be a directory containing the expanded document collection; see [this link](../docs/experiments-fever.md) for how to prepare this collection. +The directory `/path/to/fever` should be a directory containing the expanded document collection; see [this page](${root_path}/docs/experiments-fever.md) for how to prepare this collection. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/fire12-bn.template b/src/main/resources/docgen/templates/fire12-bn.template index e2ecdb476b..aa602f0a0f 100644 --- a/src/main/resources/docgen/templates/fire12-bn.template +++ b/src/main/resources/docgen/templates/fire12-bn.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/fire12-bn/` should be a directory containing the collection, containing `bn_ABP` and `bn_BDNews24` directories. There should be 500,122 documents in total. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/fire12-en.template b/src/main/resources/docgen/templates/fire12-en.template index af02820dc8..eb8f89422b 100644 --- a/src/main/resources/docgen/templates/fire12-en.template +++ b/src/main/resources/docgen/templates/fire12-en.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/fire12-en/` should be a directory containing the collection, containing `en_BDNews24` and `en_TheTelegraph_2001-2010` directories. There should be 392,577 documents in total. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/fire12-hi.template b/src/main/resources/docgen/templates/fire12-hi.template index b33eb4e088..d22a2aff3b 100644 --- a/src/main/resources/docgen/templates/fire12-hi.template +++ b/src/main/resources/docgen/templates/fire12-hi.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/fire12-hi/` should be a directory containing the collection, containing `hi_AmarUjala` and `hi_NavbharatTimes` directories. There should be 331,599 documents in total. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/gov2.template b/src/main/resources/docgen/templates/gov2.template index 709010bca9..10dee79956 100644 --- a/src/main/resources/docgen/templates/gov2.template +++ b/src/main/resources/docgen/templates/gov2.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/gov2/` should be the root directory of the [Gov2 collection](http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm), i.e., `ls /path/to/gov2/` should bring up a bunch of subdirectories, `GX000` to `GX272`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template b/src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template index 975e09ab3e..f0adee18f6 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-fa-en.template @@ -35,7 +35,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -59,7 +59,7 @@ ${effectiveness} The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-fa.template b/src/main/resources/docgen/templates/hc4-neuclir22-fa.template index d418ca7c04..c42ca02df6 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-fa.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-fa.template @@ -35,7 +35,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -59,7 +59,7 @@ ${effectiveness} The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template b/src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template index 7bb014245d..4d097c2214 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-ru-en.template @@ -36,7 +36,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -58,7 +58,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-ru.template b/src/main/resources/docgen/templates/hc4-neuclir22-ru.template index 35502100ff..f2115aaf48 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-ru.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-ru.template @@ -36,7 +36,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -58,7 +58,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template b/src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template index 80f574e90f..b5c890ee1b 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-zh-en.template @@ -35,7 +35,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -57,7 +57,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-neuclir22-zh.template b/src/main/resources/docgen/templates/hc4-neuclir22-zh.template index 160e7c2001..017ce40f63 100644 --- a/src/main/resources/docgen/templates/hc4-neuclir22-zh.template +++ b/src/main/resources/docgen/templates/hc4-neuclir22-zh.template @@ -35,7 +35,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -57,7 +57,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-v1.0-fa.template b/src/main/resources/docgen/templates/hc4-v1.0-fa.template index f3a2d508bf..316a41e5fa 100644 --- a/src/main/resources/docgen/templates/hc4-v1.0-fa.template +++ b/src/main/resources/docgen/templates/hc4-v1.0-fa.template @@ -34,7 +34,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -58,7 +58,7 @@ ${effectiveness} The above results reproduce the BM25 title queries run in Table 2 of [this paper](https://arxiv.org/pdf/2201.08471.pdf). -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-v1.0-ru.template b/src/main/resources/docgen/templates/hc4-v1.0-ru.template index ebea0c5813..4c0ce23fa8 100644 --- a/src/main/resources/docgen/templates/hc4-v1.0-ru.template +++ b/src/main/resources/docgen/templates/hc4-v1.0-ru.template @@ -35,7 +35,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -57,7 +57,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/hc4-v1.0-zh.template b/src/main/resources/docgen/templates/hc4-v1.0-zh.template index fa5362a207..c2a8c5b963 100644 --- a/src/main/resources/docgen/templates/hc4-v1.0-zh.template +++ b/src/main/resources/docgen/templates/hc4-v1.0-zh.template @@ -34,7 +34,7 @@ ${index_cmds} ``` See [this page](https://github.com/hltcoe/HC4) for more details about the HC4 corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/mb11.template b/src/main/resources/docgen/templates/mb11.template index 545ceb2cde..e7f2aa5fe7 100644 --- a/src/main/resources/docgen/templates/mb11.template +++ b/src/main/resources/docgen/templates/mb11.template @@ -30,7 +30,7 @@ More available indexing options: * `-tweet.maxId`: the max tweet Id for indexing. Tweet Ids that are larger (when being parsed to Long type) than this value will NOT be indexed, default `LONG.MAX_VALUE` * `-tweet.deletedIdsFile`: a file that contains deleted tweetIds, one per line. these tweeets won't be indexed -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mb13.template b/src/main/resources/docgen/templates/mb13.template index b6710c7d72..3968dd2b46 100644 --- a/src/main/resources/docgen/templates/mb13.template +++ b/src/main/resources/docgen/templates/mb13.template @@ -30,7 +30,7 @@ More available indexing options: * `-tweet.maxId`: the max tweet Id for indexing. Tweet Ids that are larger (when being parsed to Long type) than this value will NOT be indexed, default `LONG.MAX_VALUE` * `-tweet.deletedIdsFile`: a file that contains deleted tweetIds, one per line. 
these tweeets won't be indexed -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template index b7dffd6cac..2a06d8dd53 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ar-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ar.template b/src/main/resources/docgen/templates/miracl-v1.0-ar.template index c7c529063f..40fe5491f0 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ar.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ar.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template index 0531aef243..fb6509a287 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-bn-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-bn.template b/src/main/resources/docgen/templates/miracl-v1.0-bn.template index cac34e75f9..d229fbac82 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-bn.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-bn.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-en-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-en-aca.template index 8c6259bd23..192d40d4a8 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-en-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-en-aca.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-en.template b/src/main/resources/docgen/templates/miracl-v1.0-en.template index 7ed9013749..e1c6932357 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-en.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-en.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-es-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-es-aca.template index e7868d8e32..2d71a2b57f 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-es-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-es-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-es.template b/src/main/resources/docgen/templates/miracl-v1.0-es.template index f26df26747..96fb15e4dc 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-es.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-es.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template index 744bc1d0f9..da20fce2ab 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fa-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fa.template b/src/main/resources/docgen/templates/miracl-v1.0-fa.template index 6a8bf66cda..601427600f 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fa.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fa.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template index c2b5c9fce5..0d5185bf6a 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fi-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fi.template b/src/main/resources/docgen/templates/miracl-v1.0-fi.template index 35b7121a3e..04b96cee10 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fi.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fi.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template index b7dffd6cac..2a06d8dd53 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fr-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-fr.template b/src/main/resources/docgen/templates/miracl-v1.0-fr.template index c7c529063f..40fe5491f0 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-fr.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-fr.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template index 6e6afa3b34..97105dd710 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-hi-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-hi.template b/src/main/resources/docgen/templates/miracl-v1.0-hi.template index f95573dc59..79386aae55 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-hi.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-hi.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-id-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-id-aca.template index 91d4b26ec2..681486b117 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-id-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-id-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-id.template b/src/main/resources/docgen/templates/miracl-v1.0-id.template index 6acce89df0..12a4a89f8e 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-id.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-id.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template index 492dea7040..adc6ca1d83 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ja-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ja.template b/src/main/resources/docgen/templates/miracl-v1.0-ja.template index 02fc972864..64b66c1f72 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ja.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ja.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template index f69e0013e3..f1c5168e9f 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ko-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ko.template b/src/main/resources/docgen/templates/miracl-v1.0-ko.template index d1ebe98a5d..0b6938403e 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ko.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ko.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template index 02fda8c829..8d2969cd8d 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ru-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-ru.template b/src/main/resources/docgen/templates/miracl-v1.0-ru.template index a019feba58..610c23b9a3 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-ru.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-ru.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template index 9f772c5d0e..72db90bac0 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-sw-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-sw.template b/src/main/resources/docgen/templates/miracl-v1.0-sw.template index 2990d6cc6f..a2cd95870d 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-sw.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-sw.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-te-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-te-aca.template index 6c90a2bb47..29ea6c0f03 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-te-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-te-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-te.template b/src/main/resources/docgen/templates/miracl-v1.0-te.template index e4d8a76406..d3584e77a7 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-te.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-te.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-th-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-th-aca.template index 9e7ec4ddf8..9112f7d0c7 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-th-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-th-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-th.template b/src/main/resources/docgen/templates/miracl-v1.0-th.template index f67fd50fba..ee7992e916 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-th.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-th.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template b/src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template index 43ffb79e33..dc5894b56f 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-zh-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/miracl-v1.0-zh.template b/src/main/resources/docgen/templates/miracl-v1.0-zh.template index 2c43b06679..60cf82b030 100644 --- a/src/main/resources/docgen/templates/miracl-v1.0-zh.template +++ b/src/main/resources/docgen/templates/miracl-v1.0-zh.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/project-miracl/miracl) for more details about the MIRACL corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template index c53af95678..6694999f81 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ar-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ar.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ar.template index fa80666165..ec8848405a 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ar.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ar.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template index e1c3642772..80a5e70336 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-bn-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-bn.template b/src/main/resources/docgen/templates/mrtydi-v1.1-bn.template index 9677595de5..d7b36cedac 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-bn.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-bn.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template index 8790cad20d..e88e50c037 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-en-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-en.template b/src/main/resources/docgen/templates/mrtydi-v1.1-en.template index d208829706..460e8c4018 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-en.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-en.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template index 0c488e917a..ffb810b1a8 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-fi-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-fi.template b/src/main/resources/docgen/templates/mrtydi-v1.1-fi.template index e301cd60da..d77c711f11 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-fi.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-fi.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template index 4f954d8e2d..910e45e320 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-id-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-id.template b/src/main/resources/docgen/templates/mrtydi-v1.1-id.template index c060e83b43..112657e2f0 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-id.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-id.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template index 09477b0dbf..9e8e64b521 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ja-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ja.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ja.template index 71cffe5188..92ef3b1bee 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ja.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ja.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template index ede258a0ef..8b3be0df95 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ko-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ko.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ko.template index 25e7b6fd41..0476eae101 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ko.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ko.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template index 9839eae0bc..c74774bf90 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ru-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-ru.template b/src/main/resources/docgen/templates/mrtydi-v1.1-ru.template index 1f35d1e5de..8f5c88a6bb 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-ru.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-ru.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. 
-For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template index 43c1c3ae6d..e1a29e10ed 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-sw-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-sw.template b/src/main/resources/docgen/templates/mrtydi-v1.1-sw.template index 23e5188a95..0f5a347c41 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-sw.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-sw.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template index 8ed6507873..62687ef93f 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-te-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-te.template b/src/main/resources/docgen/templates/mrtydi-v1.1-te.template index 5e1ad9f204..526784a759 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-te.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-te.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template b/src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template index d4caf4e4a7..62fde64591 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-th-aca.template @@ -22,7 +22,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). 
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/mrtydi-v1.1-th.template b/src/main/resources/docgen/templates/mrtydi-v1.1-th.template index cdced36234..1ec7de0c2c 100644 --- a/src/main/resources/docgen/templates/mrtydi-v1.1-th.template +++ b/src/main/resources/docgen/templates/mrtydi-v1.1-th.template @@ -20,7 +20,7 @@ ${index_cmds} ``` See [this page](https://github.com/castorini/mr.tydi) for more details about the Mr. TyDi corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-doc-ca.template b/src/main/resources/docgen/templates/msmarco-doc-ca.template index b0c9969630..f40ab7c836 100644 --- a/src/main/resources/docgen/templates/msmarco-doc-ca.template +++ b/src/main/resources/docgen/templates/msmarco-doc-ca.template @@ -23,9 +23,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format. -See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. +See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
 ## Retrieval
diff --git a/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
index e4cf4bde09..44fce7a32e 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
@@ -13,7 +13,7 @@ All four conditions are described in detail [here](https://github.com/castorini/
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -32,9 +32,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc-docTTTTTquery/` should be a directory containing the expanded document corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -67,7 +67,7 @@ Explanation of settings:
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](${root_path}/docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 ## Additional Implementation Details
diff --git a/src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template b/src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template
index 6922153d5a..30f768558a 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-hgf-wp.template
@@ -26,7 +26,7 @@ ${index_cmds}
 The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template
index 0046e3bffd..e4c5f399ae 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-ca.template
@@ -5,7 +5,7 @@
 This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2020 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2020.html).
 Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast).
-For additional instructions on working with MS MARCO document collection, refer to [this page](experiments-msmarco-doc.md).
+For additional instructions on working with MS MARCO document collection, refer to [this page](${root_path}/docs/experiments-msmarco-doc.md).
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** none
@@ -15,7 +15,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -34,9 +34,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
index 77842980c1..fd49394d41 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
@@ -14,7 +14,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -33,9 +33,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc-segmented-docTTTTTquery/` should be a directory containing the expanded segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -69,7 +69,7 @@ In these runs, we are retrieving the top 1000 hits for each query and using `tre
 Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](${root_path}/docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 The MaxP passage retrieval functionality is available in `SearchCollection`.
 To generate an MS MARCO submission with the BM25 default parameters, corresponding to "BM25 (default)" above:
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template
index a50bc50587..9282c7f12b 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil-noexp.template
@@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 20,545,677 documents.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -97,7 +97,7 @@ Because of tie-breaking effects, we get slightly different results:
 | `-hits 10000 -selectMaxPassage.hits 100` | 0.3409 | 0.3409 | 0.8639 | - | 0.3410112121151749 |
 | `-hits 1000 -selectMaxPassage.hits 100` | 0.3409 | 0.3409 | 0.8639 | - | 0.3410112121151749 |
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template
index fa174b39c8..1224106908 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-unicoil.template
@@ -61,7 +61,7 @@ The directory `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 20,545,677 documents.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -128,7 +128,7 @@ Because of tie-breaking effects, we get slightly different results:
 | `-hits 10000 -selectMaxPassage.hits 100` | 0.3531 | 0.3531 | 0.8860 | - | 0.352997702662614 |
 | `-hits 1000 -selectMaxPassage.hits 100` | 0.3531 | 0.3531 | 0.8860 | - | 0.352997702662614 |
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template
index 7ba14e14eb..648bae0a3c 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-wp.template
@@ -25,9 +25,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented.template b/src/main/resources/docgen/templates/msmarco-doc-segmented.template
index 8919074056..7b70ac5fe8 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented.template
@@ -14,7 +14,7 @@ In the passage (i.e., segment) indexing condition, we select the score of the hi
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -33,9 +33,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc-segmented/` should be a directory containing the segmented corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -69,7 +69,7 @@ In these runs, we are retrieving the top 1000 hits for each query and using `tre
 Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](${root_path}/docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 The MaxP passage retrieval functionality is available in `SearchCollection`.
 To generate an MS MARCO submission with the BM25 default parameters, corresponding to "BM25 (default)" above:
diff --git a/src/main/resources/docgen/templates/msmarco-doc-wp.template b/src/main/resources/docgen/templates/msmarco-doc-wp.template
index 357b16c6d2..7ff2a08dc2 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-wp.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-wp.template
@@ -24,9 +24,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
diff --git a/src/main/resources/docgen/templates/msmarco-doc.template b/src/main/resources/docgen/templates/msmarco-doc.template
index 7f22ef014c..96a124bd1a 100644
--- a/src/main/resources/docgen/templates/msmarco-doc.template
+++ b/src/main/resources/docgen/templates/msmarco-doc.template
@@ -13,7 +13,7 @@ All four conditions are described in detail [here](https://github.com/castorini/
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
-Note that in November 2021 we discovered issues in our regression tests, documented [here](experiments-msmarco-doc-doc2query-details.md).
+Note that in November 2021 we discovered issues in our regression tests, documented [here](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md).
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
@@ -32,9 +32,9 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-doc/` should be a directory containing the document corpus in Anserini's jsonl format.
-See [this page](experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
+See [this page](${root_path}/docs/experiments-msmarco-doc-doc2query-details.md) for how to prepare the corpus.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
@@ -63,14 +63,14 @@ Explanation of settings:
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
 + The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs.
-+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details.
++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](${root_path}/docs/experiments-msmarco-doc.md) additional details.
-See [this page](experiments-msmarco-doc.md) for more details on tuning.
+See [this page](${root_path}/docs/experiments-msmarco-doc.md) for more details on tuning.
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
 Beware, an official MS MARCO document ranking task leaderboard submission comprises only 100 hits per query.
-See [this page](experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
+See [this page](${root_path}/docs/experiments-msmarco-doc-leaderboard.md) for details on Anserini baseline runs that were submitted to the official leaderboard.
 ## Additional Implementation Details
diff --git a/src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template b/src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template
index 4e07c6e189..db81ed980c 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-bm25-b8.template
@@ -3,7 +3,7 @@
 **Models**: BM25 with quantized weights (8 bits)
 This page documents regression experiments on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), which is integrated into Anserini's regression testing framework.
-For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage.md).
+For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md).
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -49,12 +49,12 @@ ${index_cmds}
 The directory `/path/to/${corpus}/` should be a directory containing `jsonl` files containing quantized BM25 vectors for every document
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:
 ${effectiveness}
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-passage-ca.template b/src/main/resources/docgen/templates/msmarco-passage-ca.template
index b15908fb0c..44d00ee84a 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-ca.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-ca.template
@@ -24,12 +24,12 @@ ${index_cmds}
 The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
diff --git a/src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template b/src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template
index da6b0aa424..973ee00bee 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-cos-dpr-distil.template
@@ -56,12 +56,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 Upon completion, we should have an index with 8,841,823 documents.
-
+
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows using HNSW indexes:
@@ -84,7 +84,7 @@ ${effectiveness}
 Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
 Nevertheless, scores are generally stable to the third digit after the decimal point.
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template b/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template
index c4848780e3..0ca788e934 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template
@@ -56,12 +56,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens.
 Upon completion, we should have an index with 8,841,823 documents.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
@@ -105,7 +105,7 @@ QueriesRanked: 6980
 The final evaluation metric is very close to the one reported in the paper (0.326).
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template b/src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template
index 8fc97927e3..3328f9413a 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-distill-splade-max.template
@@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens.
 Upon completion, we should have an index with 8,841,823 documents.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
@@ -106,7 +106,7 @@ QueriesRanked: 6980
 This corresponds to the effectiveness reported in the paper.
-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)
 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-passage-doc2query.template b/src/main/resources/docgen/templates/msmarco-passage-doc2query.template
index ee34ae6a70..8686a6d106 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-doc2query.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-doc2query.template
@@ -7,7 +7,7 @@ This page documents regression experiments on the [MS MARCO passage ranking task
 > Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. [Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) arXiv:1904.08375, 2019.
 These experiments are integrated into Anserini's regression testing framework.
-For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md).
+For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-doc2query.md).
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -27,14 +27,14 @@ ${index_cmds}
 ```
 The directory `/path/to/msmarco-passage-doc2query` should be a directory containing `jsonl` files containing the expanded passage collection.
-[This page](experiments-doc2query.md) explains how to perform this data preparation.
+[This page](${root_path}/docs/experiments-doc2query.md) explains how to perform this data preparation.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
@@ -57,7 +57,7 @@ ${effectiveness}
 Explanation of settings:
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](experiments-msmarco-passage.md).
++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md).
 ## Additional Implementation Details
diff --git a/src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template b/src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template
index 239f7eeec3..05d946037d 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-docTTTTTquery.template
@@ -28,12 +28,12 @@ ${index_cmds}
 The directory `/path/to/msmarco-passage-docTTTTTquery` should be a directory containing `jsonl` files containing the expanded passage collection.
 [Instructions in the docTTTTTquery repo](http://doc2query.ai/) explain how to perform this data preparation.
-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
 ## Retrieval
 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
 After indexing has completed, you should be able to perform retrieval as follows:
@@ -56,7 +56,7 @@ ${effectiveness}
 Explanation of settings:
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](experiments-msmarco-passage.md).
++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, tuned on _on the original passages_, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md).
+ The setting "tuned2" refers to `k1=2.18`, `b=0.86`, tuned to optimize for recall@1000 directly _on the expanded passages_ (in 2020/12); this is the configuration reported in the Lin et al. (SIGIR 2021) Pyserini paper. ## Additional Implementation Details diff --git a/src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template b/src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template index 7d0f406b00..c31fcf3c96 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template +++ b/src/main/resources/docgen/templates/msmarco-passage-hgf-wp.template @@ -26,12 +26,12 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
 After indexing has completed, you should be able to perform retrieval as follows:

diff --git a/src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template b/src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template
index 25c4505e6b..2bc3de57ca 100644
--- a/src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template
+++ b/src/main/resources/docgen/templates/msmarco-passage-splade-distil-cocodenser-medium.template
@@ -55,12 +55,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
 Upon completion, we should have an index with 8,841,823 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

 Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.
+The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details.
After indexing has completed, you should be able to perform retrieval as follows: @@ -80,7 +80,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template index f08d3f4b44..939ea0fe5d 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template +++ b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed-onnx.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -82,7 +82,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template index f3bb706a7a..c4ddea3366 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-ed.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -82,7 +82,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template index 44df0f0ea8..ddcf05e9f8 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template +++ b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd-onnx.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -82,7 +82,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template index f215dc741a..171a7318dc 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/msmarco-passage-splade-pp-sd.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -82,7 +82,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template b/src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template index 03ee48f139..9d82d8faa3 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/msmarco-passage-unicoil-noexp.template @@ -60,12 +60,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -85,7 +85,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template b/src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template index f37a90b242..4f47cd9926 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template +++ b/src/main/resources/docgen/templates/msmarco-passage-unicoil-tilde-expansion.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the SPLADEv2 tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -106,7 +106,7 @@ QueriesRanked: 6980 This corresponds to the effectiveness reported in the paper. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-unicoil.template b/src/main/resources/docgen/templates/msmarco-passage-unicoil.template index a2f29bdca6..5026217ba6 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-unicoil.template +++ b/src/main/resources/docgen/templates/msmarco-passage-unicoil.template @@ -57,12 +57,12 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: @@ -106,7 +106,7 @@ QueriesRanked: 6980 This corresponds to the effectiveness reported in the paper and also the run named "uniCOIL-d2q" on the official MS MARCO Passage Ranking Leaderboard, submitted 2021/09/22. -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-passage-wp.template b/src/main/resources/docgen/templates/msmarco-passage-wp.template index 1de0811989..0a8ac1162f 100644 --- a/src/main/resources/docgen/templates/msmarco-passage-wp.template +++ b/src/main/resources/docgen/templates/msmarco-passage-wp.template @@ -25,12 +25,12 @@ ${index_cmds} The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. -The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. 
After indexing has completed, you should be able to perform retrieval as follows: diff --git a/src/main/resources/docgen/templates/msmarco-passage.template b/src/main/resources/docgen/templates/msmarco-passage.template index 5fe34092f1..d4d4462afd 100644 --- a/src/main/resources/docgen/templates/msmarco-passage.template +++ b/src/main/resources/docgen/templates/msmarco-passage.template @@ -3,7 +3,7 @@ **Models**: various bag-of-words approaches This page documents regression experiments on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), which is integrated into Anserini's regression testing framework. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-passage.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -23,14 +23,14 @@ ${index_cmds} ``` The directory `/path/to/msmarco-passage/` should be a directory containing `jsonl` files converted from the official passage collection, which is in `tsv` format. -[This page](experiments-msmarco-passage.md) explains how to perform this conversion. +[This page](${root_path}/docs/experiments-msmarco-passage.md) explains how to perform this conversion. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule. 
-The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. +The regression experiments here evaluate on the 6980 dev set questions; see [this page](${root_path}/docs/experiments-msmarco-passage.md) for more details. After indexing has completed, you should be able to perform retrieval as follows: @@ -53,7 +53,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](${root_path}/docs/experiments-msmarco-passage.md). To generate runs corresponding to the submissions on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/), follow the instructions below: diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template b/src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template index 4e933168e8..5ead2c938f 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template +++ b/src/main/resources/docgen/templates/msmarco-v2-doc-d2q-t5.template @@ -23,9 +23,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-v2-doc-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
 ## Retrieval

diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template
index c17fd4868d..f0bacf58e2 100644
--- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template
+++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-d2q-t5.template
@@ -23,9 +23,9 @@ ${index_cmds}
 ```

 The directory `/path/to/msmarco-v2-doc-segmented-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus.
-See [this page](experiments-msmarco-v2.md) for additional details.
+See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template
index 54036106ca..99f14ca1e4 100644
--- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template
+++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot-v2.template
@@ -71,7 +71,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 124,131,414 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -95,7 +95,7 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.

diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template
index 0f7b8944ef..bcd17cf23a 100644
--- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template
+++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-0shot.template
@@ -70,7 +70,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 124,131,414 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -94,7 +94,7 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template
index 1b28c3ad42..de17b54f0d 100644
--- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template
+++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.template
@@ -71,7 +71,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
 The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens.
 Upon completion, we should have an index with 124,131,404 documents.

-For additional details, see explanation of [common indexing options](common-indexing-options.md).
+For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

 ## Retrieval

@@ -95,7 +95,7 @@ With the above commands, you should be able to reproduce the following results:

 ${effectiveness}

-## Reproduction Log[*](reproducibility.md)
+## Reproduction Log[*](${root_path}/docs/reproducibility.md)

 To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.

diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template
index 74b7b377ac..10e280f4ac 100644
--- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template
+++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented-unicoil-noexp-0shot.template
@@ -70,7 +70,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above.
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 124,131,404 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -94,7 +94,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template index 56289b8b71..898bed064d 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template +++ b/src/main/resources/docgen/templates/msmarco-v2-doc-segmented.template @@ -4,7 +4,7 @@ This page describes regression experiments for document ranking _on the segmented version_ of the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). 
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -24,9 +24,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-v2-doc-segmented/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-v2-doc.template b/src/main/resources/docgen/templates/msmarco-v2-doc.template index bfdc5d1d33..7e93ba8067 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-doc.template +++ b/src/main/resources/docgen/templates/msmarco-v2-doc.template @@ -4,7 +4,7 @@ This page describes regression experiments for document ranking on the MS MARCO (V2) document corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -24,9 +24,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-v2-doc/` should be a directory containing the compressed `jsonl` files that comprise the corpus. 
-See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template b/src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template index a9a3957c13..75b856e6c0 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-augmented-d2q-t5.template @@ -24,7 +24,7 @@ ${index_cmds} The directory `/path/to/msmarco-v2-passage-augmented-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template b/src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template index f81ee17c8b..98c17fc971 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-augmented.template @@ -4,7 +4,7 @@ This page describes regression experiments for passage ranking _on the augmented version_ of the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). 
+For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -24,9 +24,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-v2-passage-augmented/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template b/src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template index f61f3d0429..a237d4517b 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-d2q-t5.template @@ -24,7 +24,7 @@ ${index_cmds} The directory `/path/to/msmarco-v2-passage-d2q-t5/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template b/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template index 5306211fd3..a3b54b1aab 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-ed.template @@ -58,7 +58,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -82,6 +82,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template b/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template index 8ad0b74923..8c2b8345d8 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-splade-pp-sd.template @@ -58,7 +58,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. 
The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doc lengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens. Upon completion, we should have an index with 8,841,823 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -82,6 +82,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template b/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template index c880ece60e..18e3f9409f 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-0shot.template @@ -65,7 +65,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval @@ -89,7 +89,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template b/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template index 436d22ac64..4d05b36c72 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage-unicoil-noexp-0shot.template @@ -65,7 +65,7 @@ The path `/path/to/${corpus}/` should point to the corpus downloaded above. The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the uniCOIL tokens. Upon completion, we should have an index with 138,364,198 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -89,7 +89,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/msmarco-v2-passage.template b/src/main/resources/docgen/templates/msmarco-v2-passage.template index 28dfb26a9f..1f6d6cbddd 100644 --- a/src/main/resources/docgen/templates/msmarco-v2-passage.template +++ b/src/main/resources/docgen/templates/msmarco-v2-passage.template @@ -4,7 +4,7 @@ This page describes regression experiments for passage ranking on the MS MARCO V2 Passage Corpus using the dev queries, which is integrated into Anserini's regression testing framework. Here, we cover bag-of-words baselines. -For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-v2.md). +For more complete instructions on how to run end-to-end experiments, refer to [this page](${root_path}/docs/experiments-msmarco-v2.md). The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. @@ -24,9 +24,9 @@ ${index_cmds} ``` The directory `/path/to/msmarco-v2-passage/` should be a directory containing the compressed `jsonl` files that comprise the corpus. -See [this page](experiments-msmarco-v2.md) for additional details. +See [this page](${root_path}/docs/experiments-msmarco-v2.md) for additional details. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template b/src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template index 5a90e3f6e7..6c4f746ebd 100644 --- a/src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-fa-dt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-fa-dt.template b/src/main/resources/docgen/templates/neuclir22-fa-dt.template index 21122d80b0..455cc82d61 100644 --- a/src/main/resources/docgen/templates/neuclir22-fa-dt.template +++ b/src/main/resources/docgen/templates/neuclir22-fa-dt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template b/src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template index 6949f2afe3..c508727b44 100644 --- a/src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-fa-qt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-fa-qt.template b/src/main/resources/docgen/templates/neuclir22-fa-qt.template index 2d78467ac7..c4efb1d78b 100644 --- a/src/main/resources/docgen/templates/neuclir22-fa-qt.template +++ b/src/main/resources/docgen/templates/neuclir22-fa-qt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template b/src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template index bb82224f90..65038c2f64 100644 --- a/src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-ru-dt.template b/src/main/resources/docgen/templates/neuclir22-ru-dt.template index 2f53c96cf7..75bfe856de 100644 --- a/src/main/resources/docgen/templates/neuclir22-ru-dt.template +++ b/src/main/resources/docgen/templates/neuclir22-ru-dt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template b/src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template index ccf7e74a3f..7ca7e23eef 100644 --- a/src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-ru-qt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-ru-qt.template b/src/main/resources/docgen/templates/neuclir22-ru-qt.template index e4f44c78c2..20c640d410 100644 --- a/src/main/resources/docgen/templates/neuclir22-ru-qt.template +++ b/src/main/resources/docgen/templates/neuclir22-ru-qt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template b/src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template index cb566c96b6..495c80e558 100644 --- a/src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-zh-dt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-zh-dt.template b/src/main/resources/docgen/templates/neuclir22-zh-dt.template index 724c3fff43..4b05065a32 100644 --- a/src/main/resources/docgen/templates/neuclir22-zh-dt.template +++ b/src/main/resources/docgen/templates/neuclir22-zh-dt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template b/src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template index 3f63587daf..91a6fa6244 100644 --- a/src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template +++ b/src/main/resources/docgen/templates/neuclir22-zh-qt-splade.template @@ -43,7 +43,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -65,7 +65,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/neuclir22-zh-qt.template b/src/main/resources/docgen/templates/neuclir22-zh-qt.template index f026246420..c469e616c3 100644 --- a/src/main/resources/docgen/templates/neuclir22-zh-qt.template +++ b/src/main/resources/docgen/templates/neuclir22-zh-qt.template @@ -34,7 +34,7 @@ Typical indexing command: ${index_cmds} ``` -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,7 +56,7 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/ntcir8-zh.template b/src/main/resources/docgen/templates/ntcir8-zh.template index 4ad88a24a7..75bd6b95c0 100644 --- a/src/main/resources/docgen/templates/ntcir8-zh.template +++ b/src/main/resources/docgen/templates/ntcir8-zh.template @@ -25,7 +25,7 @@ We build the index directly from the raw LDC data: the directory `/path/to/ntcir8-zh/` should point to the directory `data/xin_cmn/` from LDC2007T38. In that directory, there should be 48 gzipped files matching the pattern `xin_cmn_200[2-5]*`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/robust05.template b/src/main/resources/docgen/templates/robust05.template index ae5168a1b6..be44540c6e 100644 --- a/src/main/resources/docgen/templates/robust05.template +++ b/src/main/resources/docgen/templates/robust05.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/aquaint/` should be the root directory of the [AQUAINT collection](https://tac.nist.gov//data/data_desc.html#AQUAINT); under subdirectory `disk1/` there should be `NYT/` and under subdirectory `disk2/` there should be `APW/` and `XIE/`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
## Retrieval diff --git a/src/main/resources/docgen/templates/trec02-ar.template b/src/main/resources/docgen/templates/trec02-ar.template index 079cbffec7..a6b1ff0d70 100644 --- a/src/main/resources/docgen/templates/trec02-ar.template +++ b/src/main/resources/docgen/templates/trec02-ar.template @@ -25,7 +25,7 @@ Inside the LDC2007T38 distribution, there should be a directory named `transcrip The path above `/path/to/trec02-ar/` should point to this `transcripts/` directory. The collection contains 383,872 documents. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval diff --git a/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template b/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template index da4845ed50..b29254c88e 100644 --- a/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template +++ b/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template @@ -26,7 +26,7 @@ ${index_cmds} The directory `/path/to/${corpus}/`should be a directory containing the wiki-all-6-3-tamber passages collection retrieved from [here](https://huggingface.co/datasets/castorini/odqa-wiki-corpora). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -56,6 +56,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template b/src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template index f1aaa64768..4773d231f1 100644 --- a/src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template +++ b/src/main/resources/docgen/templates/wikipedia-dpr-100w-bm25.template @@ -23,7 +23,7 @@ ${index_cmds} The directory `/path/to/${corpus}/`should be a directory containing the wikipedia-dpr-100w passages collection retrieved from [here](https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz). -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). ## Retrieval @@ -53,6 +53,6 @@ With the above commands, you should be able to reproduce the following results: ${effectiveness} -## Reproduction Log[*](reproducibility.md) +## Reproduction Log[*](${root_path}/docs/reproducibility.md) To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/wt10g.template b/src/main/resources/docgen/templates/wt10g.template index a7c226ce03..25b7d8946d 100644 --- a/src/main/resources/docgen/templates/wt10g.template +++ b/src/main/resources/docgen/templates/wt10g.template @@ -22,7 +22,7 @@ ${index_cmds} The directory `/path/to/wt10g/` should be the root directory of the [Wt10g collection](http://ir.dcs.gla.ac.uk/test_collections/wt10g.html), containing a bunch of subdirectories, `WTX001` to `WTX104`. -For additional details, see explanation of [common indexing options](common-indexing-options.md). +For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md). 
 ## Retrieval
diff --git a/src/test/java/io/anserini/doc/GenerateRegressionDocsTest.java b/src/test/java/io/anserini/doc/GenerateRegressionDocsTest.java
index f5c6ea24cf..4a46af5c24 100755
--- a/src/test/java/io/anserini/doc/GenerateRegressionDocsTest.java
+++ b/src/test/java/io/anserini/doc/GenerateRegressionDocsTest.java
@@ -46,8 +46,9 @@ public void main() throws Exception {
     String download_corpus = data.getDownload_corpus();
     Map valuesMap = new HashMap<>();
-    valuesMap.put("yaml", String.format("../src/main/resources/regression/%s.yaml", testName));
-    valuesMap.put("template", String.format("../src/main/resources/docgen/templates/%s.template", testName));
+    valuesMap.put("root_path", "../..");
+    valuesMap.put("yaml", String.format("../../src/main/resources/regression/%s.yaml", testName));
+    valuesMap.put("template", String.format("../../src/main/resources/docgen/templates/%s.template", testName));
     valuesMap.put("test_name", testName);
     valuesMap.put("corpus", corpus);
     valuesMap.put("download_url", data.getDownload_url());
@@ -66,7 +67,7 @@ public void main() throws Exception {
     scanner.close();
     String resolvedString = sub.replace(text);
-    FileUtils.writeStringToFile(new File(String.format("docs/regressions-%s.md", testName)),
+    FileUtils.writeStringToFile(new File(String.format("docs/regressions/regressions-%s.md", testName)),
         resolvedString, "UTF-8");
   }
 }
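
To illustrate why the templates now link through `${root_path}`: generated pages move from `docs/` to `docs/regressions/`, so a link that used to be a sibling path must climb two levels and re-enter `docs/`. The sketch below is a JDK-only stand-in, not the code used by `GenerateRegressionDocsTest`; the class name `RootPathSketch`, the `resolve` helper, and the regex-based substitution are hypothetical, and the diff does not show how the real `sub` object is constructed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RootPathSketch {
  // Matches placeholders of the form ${name}, as they appear in the templates.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z_]+)\\}");

  // Hypothetical stand-in for the substitutor in the test: replaces each ${name}
  // with its value from the map and leaves unknown placeholders untouched.
  static String resolve(String text, Map<String, String> values) {
    Matcher m = PLACEHOLDER.matcher(text);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = values.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> values = new HashMap<>();
    values.put("root_path", "../.."); // same value the test puts into valuesMap

    String templateLine =
        "see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).";
    System.out.println(resolve(templateLine, values));

    // Sanity check: from docs/regressions/, the resolved link should land back in docs/.
    Path pageDir = Paths.get("docs", "regressions");
    Path target = pageDir.resolve("../../docs/common-indexing-options.md").normalize();
    System.out.println(target);
  }
}
```

Keeping the prefix in a single `root_path` value means a future relocation of the generated pages would need a change in one place in the test rather than in every template.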