An Impact Ordered Query Processor.
IOQP is loosely based on the JASS and JASSv2 search systems.
IOQP makes use of open-source Rust code from the faster-graph-bisection library.
If you use this code in your own work or research, please consider citing our work:
@inproceedings{mpg22-desires,
title = {IOQP: A simple Impact-Ordered Query Processor written in Rust},
author = {J. Mackenzie and M. Petri and L. Gallagher},
booktitle = {Proc. DESIRES},
year = {2022},
pages = {22--34},
}
You can build the code using cargo:
cargo build --release
Use the following scripts to run the Gov2 and MS MARCO experiments from the paper:
./script/download-data.sh
./script/build-indexes.sh
./script/run-queries.sh
The run files are located in data/gov2/runs
and data/msmarco/runs
. Timing
results can be found in data/log
.
Here we give an example of the multi-threaded throughput experiments for Gov2. In the following we first start the server to host an index and listen for incoming requests. Then we run the load generator to simulate a workload of incoming queries.
- Starting the server
$ ./target/release/serve --max-blocking-threads 16 --index data/gov2/indexes/bp-gov2.8.ioqp.idx
2022-08-17T01:19:18.406794Z INFO serve: args = Args { index: "data/gov2/indexes/bp-gov2.8.ioqp.idx", port: 3000, max_blocking_threads: 16 }
2022-08-17T01:19:18.406844Z INFO serve: loading index from file data/gov2/indexes/bp-gov2.8.ioqp.idx
2022-08-17T01:21:22.277424Z INFO serve: start http endpoint at 0.0.0.0:3000
- Run the load generator. In this example we are using exhaustive processing with 10 incoming queries per second.
$ ./target/release/load_gen --k 1000 --mode fraction-1 --queries data/gov2/queries/mqt.queries --tps 10
2022-08-17T01:29:59.276151Z INFO load_gen: read queries = 59986
2022-08-17T01:34:59.322531Z INFO load_gen: ======= Server Time =======
2022-08-17T01:34:59.322544Z INFO load_gen: # of samples: 2009
2022-08-17T01:34:59.322548Z INFO load_gen: 50'th percntl.: 30500µs
2022-08-17T01:34:59.322553Z INFO load_gen: 90'th percntl.: 69395µs
2022-08-17T01:34:59.322558Z INFO load_gen: 99'th percntl.: 116047µs
2022-08-17T01:34:59.322562Z INFO load_gen: 99.9'th percntl.: 149946µs
2022-08-17T01:34:59.322565Z INFO load_gen: max.: 164049µs
2022-08-17T01:34:59.322578Z INFO load_gen: mean time: 36120.8µs
2022-08-17T01:34:59.322648Z INFO load_gen: ======= User Time =======
2022-08-17T01:34:59.322651Z INFO load_gen: # of samples: 2009
2022-08-17T01:34:59.322655Z INFO load_gen: 50'th percntl.: 31352µs
2022-08-17T01:34:59.322660Z INFO load_gen: 90'th percntl.: 70331µs
2022-08-17T01:34:59.322663Z INFO load_gen: 99'th percntl.: 116714µs
2022-08-17T01:34:59.322666Z INFO load_gen: 99.9'th percntl.: 150643µs
2022-08-17T01:34:59.322669Z INFO load_gen: max.: 164848µs
2022-08-17T01:34:59.322673Z INFO load_gen: mean time: 36953.0µs
You will require a unique URL with a password to get the data due to the old data storage platform being decommissioned. Please create a github issue or contact Joel directly: [email protected]
Use the script/build-indexes.sh
to build the indexes from the paper.
Index a CIFF file and perform quantization:
./target/release/create \
--input data/gov2/ciff/bp-gov2.ciff \
--output data/gov2/indexes/bp-gov2.8.ioqp.idx \
--quantize \
--quant-bits 8 \
--bm25-k1 0.9 \
--bm25-b 0.4
Index a CIFF file that is already quantized:
./target/release/create \
--input data/msmarco/ciff/bp-spladev2.ciff \
--output data/msmarco/indexes/bp-spladev2.ioqp.idx
Use the script/run-queries.sh
to run the queries from the paper.
Query processing with exhaustive mode:
./target/release/query \
--index data/gov2/bp-gov2.8.ioqp.idx \
--queries data/gov2/queries/gov2.queries \
--output data/gov2/run/gov2.run \
--k 1000 \
--mode fraction-1 \
--warmup
Query processing with fixed budget:
./target/release/query \
--index data/gov2/bp-gov2.8.ioqp.idx \
--queries data/gov2/queries/gov2.queries \
--output data/gov2/run/gov2.run \
--k 1000 \
--mode fixed-10000 \
--warmup
Query processing with query term weights:
./target/release/query \
--index data/msmarco/bp-spladev2.ioqp.idx \
--queries data/msmarco/queries/spladev2.dev.query \
--output data/msmarco/run/spladev2.run \
--k 1000 \
--weighted