[doc] Added a benchmark report on sift #374

Merged 7 commits on Dec 26, 2023.
`docs/benchmark.md`: 31 additions, 28 deletions
# Benchmark

Infinity provides Python scripts for benchmarking its vector search on the SIFT1M and GIST1M datasets.

## Build and start Infinity

You have two options for building Infinity. Choose the option that best fits your needs:

- [Build Infinity using Docker](../README.md)
- [Build from source](./build_from_source.md)

## Download the benchmark datasets

Download the benchmark datasets with `wget`:

```sh
# Download the SIFT1M benchmark dataset.
# …
wget ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz

```

Alternatively, download the datasets manually from [http://corpus-texmex.irisa.fr/](http://corpus-texmex.irisa.fr/). Then extract the archives and move the files into the repository's `test/data/benchmark` directory:

```sh
# Extract the SIFT1M archive and move its files into test/data/benchmark/sift_1m.
tar -zxvf sift.tar.gz
mv sift/sift_base.fvecs test/data/benchmark/sift_1m/sift_base.fvecs
mv sift/sift_query.fvecs test/data/benchmark/sift_1m/sift_query.fvecs
mv sift/sift_groundtruth.ivecs test/data/benchmark/sift_1m/sift_groundtruth.ivecs

# Extract the GIST1M archive and move its files into test/data/benchmark/gist_1m.
tar -zxvf gist.tar.gz
mv gist/gist_base.fvecs test/data/benchmark/gist_1m/gist_base.fvecs
mv gist/gist_query.fvecs test/data/benchmark/gist_1m/gist_query.fvecs
# …
python setup.py bdist_wheel
pip install dist/infinity_sdk-0.1.0.dev1-py3-none-any.whl
```
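
To sanity-check the moved files before importing them, you can use the minimal Python sketch below. It is not part of Infinity's scripts. It assumes the TEXMEX vector format, in which each record is a little-endian `int32` dimension followed by that many 4-byte values (`float32` in `.fvecs`, `int32` in `.ivecs`). The helpers `vecs_shape` and `read_vecs` are hypothetical names introduced here.

```python
import os
import struct


def vecs_shape(path):
    """Return (count, dim) for a TEXMEX .fvecs or .ivecs file.

    Assumes every record has the same dimension as the first one, so each
    record is 4 + 4 * dim bytes (an int32 header plus dim 4-byte values).
    """
    with open(path, "rb") as f:
        (dim,) = struct.unpack("<i", f.read(4))
    record_size = 4 + 4 * dim
    size = os.path.getsize(path)
    if size % record_size:
        raise ValueError(f"{path}: {size} bytes is not a multiple of {record_size}")
    return size // record_size, dim


def read_vecs(path, fmt="f", limit=None):
    """Read up to `limit` vectors; fmt="f" for .fvecs (float32), "i" for .ivecs (int32)."""
    vectors = []
    with open(path, "rb") as f:
        while limit is None or len(vectors) < limit:
            header = f.read(4)
            if not header:  # end of file
                break
            (dim,) = struct.unpack("<i", header)
            vectors.append(list(struct.unpack(f"<{dim}{fmt}", f.read(4 * dim))))
    return vectors


if __name__ == "__main__":
    data_dir = "test/data/benchmark/sift_1m"
    files = (
        ("sift_base.fvecs", "f"),
        ("sift_query.fvecs", "f"),
        ("sift_groundtruth.ivecs", "i"),
    )
    for name, fmt in files:
        path = os.path.join(data_dir, name)
        if os.path.exists(path):
            # Print the shape and the first few values of the first vector.
            print(name, vecs_shape(path), read_vecs(path, fmt, limit=1)[0][:5])
```

For the standard SIFT1M files, `vecs_shape` should report 1,000,000 base vectors and 10,000 queries of dimension 128, plus 10,000 ground-truth rows of 100 neighbor IDs. `vecs_shape` reads only the first header and the file size, so it stays fast on the 1M-vector base file. `read_vecs` builds Python lists, so use it with `limit`.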

## Import the benchmark datasets

```sh
cd benchmark

# Options:
#   -h, --help            show this help message and exit
#   -d DATA_SET, --data DATA_SET
#                         Dataset to import, e.g. sift_1m or gist_1m.

python remote_benchmark_import.py -d sift_1m
python remote_benchmark_import.py -d gist_1m
```

## Run the benchmark

```sh
# Options:
#   -h, --help            show this help message and exit
#   -t THREADS, --threads THREADS
#   -r ROUNDS, --rounds ROUNDS
#   -d DATA_SET, --data DATA_SET

# THREADS is the number of threads the benchmark uses.
# ROUNDS is the number of times the benchmark runs; the reported result is the average over all rounds.

# Benchmark the SIFT1M dataset with one thread for one round.
python remote_benchmark.py -t 1 -r 1 -d sift_1m

# Benchmark the GIST1M dataset with one thread for one round.
python remote_benchmark.py -t 1 -r 1 -d gist_1m
```
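
This section does not describe the script's internals. The following is therefore only a hypothetical Python sketch of how `-t`/`--threads` and `-r`/`--rounds` relate as described above: each round sends every query through a pool of `threads` workers, and the result is the average over all rounds. `run_rounds` and `fake_search` are illustrative names, not part of `remote_benchmark.py`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_rounds(queries, search, threads=1, rounds=1):
    """Hypothetical harness mirroring -t/--threads and -r/--rounds.

    Runs every query through `search` once per round with a pool of
    `threads` workers and returns (average seconds per round, QPS).
    """
    if threads < 1 or rounds < 1:
        raise ValueError("threads and rounds must be >= 1")
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # list() waits for every search to finish before the timer stops;
            # the measured time also includes pool startup and shutdown.
            list(pool.map(search, queries))
        durations.append(time.perf_counter() - start)
    avg = sum(durations) / rounds
    return avg, len(queries) / avg


def fake_search(query):
    """Stand-in for a real query against Infinity; sleeps to simulate latency."""
    time.sleep(0.0005)
    return query


if __name__ == "__main__":
    avg, qps = run_rounds(list(range(1000)), fake_search, threads=4, rounds=2)
    print(f"average {avg:.3f} s per round, {qps:,.0f} QPS")
```

A thread pool suits a client that mostly waits on network responses, because Python threads release the GIL while blocked on I/O. CPU-bound client work would not scale the same way.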

## SIFT1M benchmark report

- **Hardware**: Intel Core i5-12500H (16 threads), 16 GB RAM
- **Operating system**: Ubuntu 22.04
- **Dataset**: SIFT1M; **top-k**: 100; **recall**: 97%+
- **QPS**: 10,305
- **P99 latency**: < 0.4 ms
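
The report gives recall at top-k = 100 and a P99 latency, but it does not state how the scripts compute either. The following Python sketch therefore uses two common definitions as assumptions:

- **recall@k**: the fraction of each query's true top-k neighbors (for example, from `sift_groundtruth.ivecs`) that appear in the returned top-k, averaged over queries.
- **P99**: the nearest-rank 99th percentile of per-query latencies.

`recall_at_k` and `percentile` are hypothetical helpers.

```python
import math


def recall_at_k(results, groundtruth, k=100):
    """Mean fraction of each query's true top-k IDs found in its returned top-k."""
    if not results or len(results) != len(groundtruth):
        raise ValueError("need one non-empty ground-truth row per result row")
    hits = 0
    for found, truth in zip(results, groundtruth):
        # Only the first k IDs on each side count toward recall@k.
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(results))


def percentile(latencies, pct=99.0):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)
    # 1-based nearest rank is ceil(pct * n / 100).
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    print(recall_at_k([[1, 2, 3], [4, 5, 9]], [[1, 2, 3], [4, 5, 6]], k=3))  # 0.8333333333333334
    latencies_ms = [0.21, 0.25, 0.31, 0.38, 0.29]
    print(percentile(latencies_ms))  # 0.38
```

Nearest-rank is only one of several percentile definitions; interpolating methods can give slightly different P99 values on the same samples.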