[WIP] Evaluation on AMD 16-Core CPU Bare Metal via Latitude.sh Hardware Cloud #306

Status: Open — wants to merge 62 commits into base: main

Commits (62)
7105c49
fleshing out outline of the latitude readme
sourcesync Aug 27, 2024
83f3a17
fix links
sourcesync Aug 27, 2024
1ef3d49
fix typos and refactor sections
sourcesync Aug 27, 2024
f37301f
refactor readme; add external files and links
sourcesync Aug 28, 2024
37aa5f9
fix some typos
sourcesync Aug 28, 2024
04ad340
fix some typos
sourcesync Aug 28, 2024
f9792ef
fix title
sourcesync Aug 28, 2024
5b63e39
fix typo
sourcesync Aug 28, 2024
e40ce97
filled out credits section
sourcesync Aug 28, 2024
178c93f
completing hw inventory
sourcesync Aug 28, 2024
e2c2d43
fix typos
sourcesync Aug 28, 2024
b54eb5b
adding latitude logo
sourcesync Aug 28, 2024
5b05b77
change instance text; adding latitude logo
sourcesync Aug 28, 2024
e216c5e
changing logo to src tag
sourcesync Aug 28, 2024
89a4f6d
changing logo bg
sourcesync Aug 28, 2024
605aa3e
trying to change bg color at logo area
sourcesync Aug 28, 2024
995a264
remove rect w/wbg
sourcesync Aug 28, 2024
67c3017
refactor markdown
sourcesync Aug 28, 2024
cf0c31d
refactor hw inventory files
sourcesync Aug 28, 2024
ee23751
fix typo
sourcesync Aug 28, 2024
c8bb4d9
update credits
sourcesync Aug 28, 2024
b285905
fix typo
sourcesync Aug 28, 2024
dc997d5
starting markdown ranking table
sourcesync Aug 28, 2024
d0d94a7
adding zilliz; re-export data csv
sourcesync Aug 28, 2024
e8c33e8
zilliz sparse; re-export csv
sourcesync Aug 29, 2024
1acae16
adding results table and run scripts
sourcesync Aug 29, 2024
d1d5094
fix link path
sourcesync Aug 29, 2024
d3f652d
refactor instance name; re-run to get error outputs
sourcesync Aug 29, 2024
f57a913
rename data export
sourcesync Aug 29, 2024
1b3bb52
error dir cleanup ; fixup nb and rerun
sourcesync Aug 29, 2024
2488265
file cleanup
sourcesync Aug 29, 2024
1dc024e
rm invalid err file
sourcesync Aug 29, 2024
4db0380
rm invalid err files
sourcesync Aug 29, 2024
d44d37f
remove all err files to prep for redo
sourcesync Aug 29, 2024
da28506
cleanup and re-render markdown
sourcesync Aug 29, 2024
a17e963
add algo run instr to markdown
sourcesync Aug 29, 2024
e59d955
add nb link to markdown
sourcesync Aug 29, 2024
bf9c477
update notebook link in markdown
sourcesync Aug 29, 2024
26fb36e
reran error algos
sourcesync Aug 30, 2024
8b17954
re-render nb
sourcesync Aug 30, 2024
19dfb00
re-render nb to get error links
sourcesync Aug 30, 2024
1d3e769
fix nb and re-render
sourcesync Aug 30, 2024
d8f3e11
add ranking sanity check cell
sourcesync Aug 30, 2024
9095fb6
moved ranking to top of markdown
sourcesync Aug 30, 2024
5d4a762
added extra line breaks at top
sourcesync Aug 30, 2024
595fd8f
re-render nb
sourcesync Aug 30, 2024
b042555
add 'qualifying' note
sourcesync Aug 30, 2024
44521a8
fix language and re-render nb
sourcesync Aug 30, 2024
2e4d15f
fix typo
sourcesync Aug 30, 2024
8549d8d
fix language
sourcesync Aug 30, 2024
227cf9a
fix language;re-render nb
sourcesync Aug 30, 2024
3d0d3c9
cleanup jupyter checkpoints dir
sourcesync Aug 30, 2024
ed5c9aa
Change ScaNN submission to pin version and use all available cores.
arron2003 Sep 13, 2024
f46c421
Merge branch 'main' into gw/latitude_m4_metal_medium
sourcesync Sep 13, 2024
2374291
Merge branch 'arron2003_main' into gw/latitude_m4_metal_medium
sourcesync Sep 13, 2024
eeb92ba
merged scann PR https://github.com/arron2003/big-ann-benchmarks/tree/…
sourcesync Sep 13, 2024
acb7d74
Merge branch 'gw/latitude_m4_metal_medium' of github.com:harsha-simha…
sourcesync Sep 13, 2024
7f5b1cc
adding algo ignore for hanns temporarily; rerender markdown for new r…
sourcesync Sep 13, 2024
e5125a6
run all nb cells; rerender markdown and rankings
sourcesync Sep 13, 2024
23eb1a9
removing algos in IGNORE_ALGOS config; rerender markdown
sourcesync Sep 13, 2024
c66f803
need to update paretto graph since we reran scann
sourcesync Sep 14, 2024
691fba6
Merge branch 'gw/latitude_m4_metal_medium' of github.com:harsha-simha…
sourcesync Sep 14, 2024
320 changes: 320 additions & 0 deletions neurips23/latitude-m4-metal-medium.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions neurips23/latitude/.gitignore
@@ -0,0 +1 @@
.ipynb_checkpoints/
142 changes: 142 additions & 0 deletions neurips23/latitude/_latitude-m4-metal-medium.md
@@ -0,0 +1,142 @@

# Eval On AMD 3GHz/16-Core + 125GB RAM + NVMe SSD (Bare Metal)

## Table Of Contents

- [Introduction](#introduction)
- [Results](#results)
- [Hardware Inventory](#hardware-inventory)
- [How To Reproduce](#how-to-reproduce)
- [Disclaimers And Credits](#disclaimers-and-credits)

## Introduction

The NeurIPS 2023 Practical Vector Search Challenge evaluated participating algorithms on Azure and EC2 CPU-based hardware instances.

To broaden the evaluation beyond those platforms, we are also running the benchmarks on other generally available hardware configurations.

The results shown here were produced on the following hardware:
* AMD EPYC 9124 16-Core 3GHz processor
* 125GB RAM
* 440GB NVMe SSD
* Bare-metal "m4-metal-medium" instance provided by [Latitude](https://www.latitude.sh/)

## Results

The calculated rankings are shown at the top.

Notes:
* Evaluations were run in late August 2024.
* In each track, qualifying algorithms are ranked by the highest *qps* achieved at *recall/ap* >= 0.9.
* All participating algorithms are shown for each track, but only qualifying algorithms are ranked.
* Each algorithm entry links to the build and run commands used (or to the disqualifying errors, if any).
* Pareto graphs for each track are shown below.

### Track: Filter

![Filter](latitude/filter.png)

### Track: Sparse

![Sparse](latitude/sparse.png)

### Track: OOD

![OOD](latitude/ood.png)

### Track: Streaming

TODO

### Data Export

The full data export CSV file can be found [here](latitude/data_export_m4-metal-medium.csv).

## Hardware Inventory

* Via [*lshw*](latitude/m4-metal-medium-lshw.txt)
* Via [*hwinfo*](latitude/m4-metal-medium-hwinfo.txt)
* Via [*procinfo*](latitude/m4-metal-medium-procinfo.txt)
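
For reference, inventory files like the ones above can be captured directly on the instance with the stock Linux tools of the same names. A minimal sketch (package names assume Ubuntu/Debian; exact flags and output formatting may differ from the linked files):

```
# install the inventory tools
sudo apt-get install -y lshw hwinfo procinfo

# capture the reports
sudo lshw > m4-metal-medium-lshw.txt
sudo hwinfo > m4-metal-medium-hwinfo.txt
procinfo > m4-metal-medium-procinfo.txt
```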

## How To Reproduce

This section describes the steps needed to reproduce the results shown above from scratch.

### System Preparation

* Sign up for or sign into your Latitude account
* Provision an "m4-metal-medium" instance with at least 100GB of NVMe SSD, running Ubuntu 20.04.6 LTS
* SSH into the instance
* Update the package index via ```sudo apt-get update```
* Install Anaconda for Linux (a sketch is shown after the commands below)
* Run the following commands:
```
git clone git@github.com:harsha-simhadri/big-ann-benchmarks.git
cd big-ann-benchmarks
conda create -n bigann-latitude-m4-metal-medium python=3.10
conda activate bigann-latitude-m4-metal-medium
python -m pip install -r requirements_py3.10.txt
```
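
The "install Anaconda for Linux" step above is left generic on purpose. As a minimal sketch, a non-interactive Miniconda install (which provides the same `conda` tooling; the installer URL and flags here are assumptions, not a record of the exact installer used for these runs) looks like:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"
"$HOME/miniconda3/bin/conda" init bash && exec bash   # enable the conda command in new shells
```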

### Sparse Track

Prepare the track dataset by running the following command in the top-level directory of the repository:
```
python create_dataset.py --dataset sparse-full
```

See the [latitude/commands](latitude/commands) directory for individual algorithm scripts.
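
The exact per-algorithm invocations are in the scripts linked above. As a representative sketch of the standard harness workflow (the algorithm name `linscan` is just an example entry; substitute any algorithm listed in the commands directory):

```
# build the Docker image for one sparse-track algorithm
python install.py --neurips23track sparse --algorithm linscan

# run that algorithm against the sparse-full dataset
python run.py --neurips23track sparse --algorithm linscan --dataset sparse-full
```

The same `install.py`/`run.py` pattern applies to the filter, OOD, and streaming tracks below, with the track name, algorithm, and dataset swapped accordingly.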

### Filter Track

Prepare the track dataset by running the following command in the top-level directory of the repository:
```
python create_dataset.py --dataset yfcc-10M
```

See the [latitude/commands](latitude/commands) directory for individual algorithm scripts.

### OOD Track

Prepare the track dataset by running the following command in the top-level directory of the repository:
```
python create_dataset.py --dataset text2image-10M
```
See the [latitude/commands](latitude/commands) directory for individual algorithm scripts.

### Streaming Track

Prepare the track dataset by running the following command in the top-level directory of the repository:
```
python create_dataset.py --dataset msturing-30M-clustered
python -m benchmark.streaming.download_gt --runbook_file neurips23/streaming/final_runbook.yaml --dataset msturing-30M-clustered
```

See the [latitude/commands](latitude/commands) directory for individual algorithm scripts.
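
Streaming runs additionally take a runbook. A representative invocation is sketched below; the `--runbook_path` flag name and the `diskann` baseline are assumptions based on the benchmark harness, so check the linked scripts (or `python run.py --help`) for the exact form:

```
python run.py --neurips23track streaming --algorithm diskann --dataset msturing-30M-clustered \
    --runbook_path neurips23/streaming/final_runbook.yaml
```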

### Analysis

To extract the data as CSV:
```
sudo chmod ugo+rw -R ./results/ # recursively add read/write permissions to directories and files under the results directory.
python data_export.py --recompute --output neurips23/latitude/data_export_m4-metal-medium.csv
```

To plot individual tracks:
```
python plot.py --neurips23track sparse --output neurips23/latitude/sparse.png --raw --recompute --dataset sparse-full
python plot.py --neurips23track filter --output neurips23/latitude/filter.png --raw --recompute --dataset yfcc-10M
python plot.py --neurips23track ood --output neurips23/latitude/ood.png --raw --recompute --dataset text2image-10M
# TODO: streaming track
```

To render the ranking table, see this [notebook](latitude/analysis.ipynb).
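
For reference, the core of the ranking rule described in the Results section can be sketched in a few lines of pandas. Column names such as `track`, `algorithm`, `qps`, and `recall/ap` are assumptions about the exported CSV; the notebook above is the authoritative implementation:

```
import pandas as pd

df = pd.read_csv("neurips23/latitude/data_export_m4-metal-medium.csv")

# keep only qualifying runs, then take each algorithm's best qps
# and rank from fastest to slowest within each track
qualifying = df[df["recall/ap"] >= 0.9]
ranking = (
    qualifying.groupby(["track", "algorithm"])["qps"]
    .max()
    .reset_index()
    .sort_values(["track", "qps"], ascending=[True, False])
)
print(ranking)
```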

## Disclaimers And Credits

* The hardware systems were graciously donated by [Latitude](https://www.latitude.sh/).
* None of the NeurIPS 2021/2023 organizers is an employee of, or otherwise affiliated with, Latitude.
* [George Williams](https://github.com/sourcesync), an organizer of both the NeurIPS 2021 and NeurIPS 2023 competitions, ran the evaluations described above.
* Our main contact at Latitude is [Victor Chiea]([email protected]), to whom we were introduced by [Harald Carlens]([email protected]) of [MLContests](https://mlcontests.com/).
* The Latitude logo used for sponsorship attribution is shown below (note: it has a transparent background):
<img src="latitude/latitude_logo.png" height="100px">