Solrini: Anserini Integration with Solr

This page documents code for replicating results from the following paper:

We provide instructions for setting up a single-node SolrCloud instance running locally and indexing into it from Anserini. Instructions for setting up SolrCloud clusters can be found by searching the web.

Setting up a Single-Node SolrCloud Instance

From the Solr archives, download the Solr (non -src) version that matches Anserini's Lucene version.

Extract the archive:

mkdir solrini && tar -zxvf solr*.tgz -C solrini --strip-components=1

Start Solr:

solrini/bin/solr start -c -m 8G

Adjust the memory allocation (i.e., -m 8G) as appropriate for your machine.

Run the Solr bootstrap script to copy the Anserini JAR into Solr's classpath and upload the configsets to Solr's internal ZooKeeper:

pushd src/main/resources/solr && ./solr.sh ../../../../solrini localhost:9983 && popd

Solr should now be available at http://localhost:8983/ for browsing.
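Before indexing, it can help to confirm from the command line that the node is actually up. Running solrini/bin/solr status is one way; the sketch below is another, probing the Collections API with curl. The function names and the SOLR_HOST and SOLR_PORT variables are illustrative and not part of Anserini, and the defaults assume the single-node setup described above.

```shell
#!/bin/sh
# Illustrative defaults for the local single-node instance (assumptions).
SOLR_HOST="${SOLR_HOST:-localhost}"
SOLR_PORT="${SOLR_PORT:-8983}"

# Print the Collections API URL for the given action (e.g., CLUSTERSTATUS).
solr_collections_url() {
  printf 'http://%s:%s/solr/admin/collections?action=%s&wt=json\n' \
    "$SOLR_HOST" "$SOLR_PORT" "$1"
}

# Return 0 if Solr answers a CLUSTERSTATUS request, non-zero otherwise.
solr_is_up() {
  curl -sf "$(solr_collections_url CLUSTERSTATUS)" > /dev/null
}

# Example:
#   solr_is_up && echo "Solr is reachable" || echo "Solr is not responding"
```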

Indexing into SolrCloud from Anserini

We can use Anserini as a common "frontend" for indexing into SolrCloud, thus supporting the same range of test collections that Anserini already handles when building local Lucene indexes directly. Indexing into Solr is similar to indexing to disk with Lucene, with a few added parameters. Most notably, we replace the -index parameter (which specifies the Lucene index path on disk) with Solr parameters.
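To make the parameter swap concrete, here is a minimal sketch of a helper that prints the index-target flags for either mode. The function name index_target_args is hypothetical and not part of Anserini; the flags themselves (-index, -solr, -solr.index, -solr.zkUrl) are the ones used on this page.

```shell
#!/bin/sh
# Print the index-target flags for IndexCollection on a single line.
#   index_target_args lucene <index-path>
#   index_target_args solr   <collection> <zk-url>
index_target_args() {
  case "$1" in
    lucene) printf '%s %s\n' -index "$2" ;;
    solr)   printf '%s %s %s %s %s\n' -solr -solr.index "$2" -solr.zkUrl "$3" ;;
    *)      echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Example (the unquoted expansion relies on word splitting, so it would
# break on paths or names that contain spaces):
#   sh target/appassembler/bin/IndexCollection -collection TrecCollection \
#     -generator JsoupGenerator -threads 8 -input /path/to/robust04 \
#     $(index_target_args solr robust04 localhost:9983) \
#     -storePositions -storeDocvectors -storeRawDocs
```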

We'll index robust04 as an example:

Create the robust04 collection in Solr:

solrini/bin/solr create -n anserini -c robust04

Run the Solr indexing command for robust04:

sh target/appassembler/bin/IndexCollection -collection TrecCollection -generator JsoupGenerator \
  -threads 8 -input /path/to/robust04 \
  -solr -solr.index robust04 -solr.zkUrl localhost:9983 \
  -storePositions -storeDocvectors -storeRawDocs

Make sure /path/to/robust04 is updated with the appropriate path.

Once indexing has completed, you should be able to query robust04 from the Solr query interface.
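The same check can be made from the command line through Solr's /select handler. The sketch below assumes the default local port 8983 and that documents expose fields named id and contents (Anserini's usual field names, but verify them against the uploaded configset); the solr_select function and the CURL override variable are illustrative only.

```shell
#!/bin/sh
# Query a collection through Solr's /select handler.
#   $1 = collection name
#   $2 = query string in Solr syntax
#   $3 = number of rows to return (default 5)
# Setting CURL to another command (e.g., echo) shows the request arguments
# instead of sending them.
solr_select() {
  "${CURL:-curl}" -sG "http://localhost:8983/solr/$1/select" \
    --data-urlencode "q=$2" \
    --data-urlencode "fl=id,score" \
    --data-urlencode "rows=${3:-5}"
}

# Example (the query terms are arbitrary):
#   solr_select robust04 'contents:(hubble telescope)' 10
```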

You can also run the following command to replicate Anserini BM25 retrieval:

sh target/appassembler/bin/SearchSolrCollection -topicreader Trec \
  -solr.index robust04 -solr.zkUrl localhost:9983 \
  -topics src/main/resources/topics-and-qrels/topics.robust04.301-450.601-700.txt \
  -output run.solr.robust04.bm25.topics.robust04.301-450.601-700.txt
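Before evaluating, it may be worth sanity-checking the run file. The sketch below assumes the standard six-column TREC run format (topic id, Q0, document id, rank, score, run tag); the check_trec_run function is a hypothetical helper, not something shipped with Anserini or trec_eval.

```shell
#!/bin/sh
# Verify that every line of a TREC run file has six whitespace-separated
# fields, an integer rank (field 4), and a numeric score (field 5).
# Prints each malformed line and returns non-zero if any are found.
check_trec_run() {
  awk '
    NF != 6 ||
    $4 !~ /^[0-9]+$/ ||
    $5 !~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ {
      printf "line %d malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example:
#   check_trec_run run.solr.robust04.bm25.topics.robust04.301-450.601-700.txt \
#     && echo "run file looks well-formed"
```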

Evaluation can be performed using trec_eval:

eval/trec_eval.9.0.4/trec_eval -m map -m P.30 src/main/resources/topics-and-qrels/qrels.robust2004.txt run.solr.robust04.bm25.topics.robust04.301-450.601-700.txt

Other collections can be indexed by substituting the appropriate parameters; see each collection's experiment docs.