# Anserini Regressions: NeuCLIR22 — Persian (Query Translation)
This page presents **query translation** regression experiments for the [TREC 2022 NeuCLIR Track](https://neuclir.github.io/), Persian, with the following configuration:
+ Queries: Translated from English into Persian
+ Documents: Original Persian corpus
+ Model: SPLADE NeuCLIR22
The exact configurations for these regressions are stored in [this YAML file](${yaml}).
This page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify it directly; modify the template instead.
We make available a version of the corpus that has already been encoded with SPLADE NeuCLIR22, i.e., we performed model inference on every document and stored the output sparse vectors.
Thus, no neural inference is required to reproduce these experiments; see instructions below.
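To make "stored sparse vectors" concrete, here is a minimal sketch of how a pre-encoded document could be scored against an encoded query. It assumes each vector is a token-to-weight mapping and that the score is the dot product over shared tokens. The function name, tokens, and weights are hypothetical, and this is not Anserini's actual implementation.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller vector so lookups go into the larger one.
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)


# Toy vectors with made-up tokens and weights, for illustration only.
query = {"tok_a": 2, "tok_b": 1}
doc = {"tok_a": 3, "tok_c": 5}
print(sparse_dot(query, doc))  # only tok_a is shared: 2 * 3 = 6
```

Because the document-side weights are already stored, scoring of this kind needs only lookups and multiplications at search time.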
From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
```
## Corpus Download
Download the corpus and unpack into `collections/`:
```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/neuclir22-fa-splade.tar -P collections/
tar xvf collections/neuclir22-fa-splade.tar -C collections/
```
To confirm the download, check that `neuclir22-fa-splade.tar` is 4.0 GB and has MD5 checksum `10fddf0b2a132b9514767bed87ca2693`.
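The checksum comparison can also be scripted. The sketch below uses only Python's standard `hashlib` and assumes the tarball was saved to `collections/` as in the download command above. The helper names `md5_of_file` and `verify_corpus` are hypothetical and not part of Anserini.

```python
import hashlib

# Checksum stated on this page for neuclir22-fa-splade.tar.
EXPECTED_MD5 = "10fddf0b2a132b9514767bed87ca2693"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for the multi-gigabyte tarball.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_corpus(path="collections/neuclir22-fa-splade.tar"):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `verify_corpus()` from the repository root should return `True` for an intact download; a `False` result suggests the file is truncated or corrupted and should be fetched again.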
With the corpus downloaded, the following command will perform the remaining steps below:
```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name} \
--corpus-path collections/${corpus}
```
## Indexing
Typical indexing command:
```
${index_cmds}
```
For additional details, see the explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).
## Retrieval
After indexing has completed, you should be able to perform retrieval as follows:
```
${ranking_cmds}
```
Evaluation can be performed using `trec_eval`:
```
${eval_cmds}
```
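To compare scores programmatically, the summary rows that `trec_eval` prints can be collected into a dictionary. This is a minimal sketch that assumes the usual three-column output (measure, query id, value) and keeps only the `all` rows. The function name `parse_trec_eval` is hypothetical and not part of Anserini or `trec_eval`.

```python
def parse_trec_eval(output):
    """Parse trec_eval text output into {measure: value} for the 'all' rows."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Expect exactly three columns: measure, query id, value.
        if len(parts) != 3 or parts[1] != "all":
            continue
        try:
            scores[parts[0]] = float(parts[2])
        except ValueError:
            # Non-numeric summary fields (such as the run id) are skipped.
            continue
    return scores
```

Pass the captured standard output of an evaluation run to `parse_trec_eval` and compare the resulting values against the table in the Effectiveness section.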
## Effectiveness
With the above commands, you should be able to reproduce the following results:
${effectiveness}
## Reproduction Log[*](${root_path}/docs/reproducibility.md)
To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.