[NeurIPS 2023 OOD Track] Submission from Team UTokyo #197

Merged
merged 9 commits into from
Oct 30, 2023

maronuu
Contributor

@maronuu maronuu commented Oct 29, 2023

This is a submission from our team UTokyo.

Our algorithm is epsearch.

If anything is wrong, I would appreciate a quick response.

Additionally, I would like to see the result output to check the performance gap, because I don't have access to the same VM as yours.

@maronuu maronuu changed the title [NeurIPS'23] Submission from Team UTokyo [NeurIPS 2023 OOD Track] Submission from Team UTokyo Oct 29, 2023
@harsha-simhadri
Owner

Thanks for your submission. I was unable to install the code. Could you please help?

10.33 Get:47 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libwebsocketpp-dev amd64 0.8.2-4 [119 kB]
10.43 Err:46 http://security.ubuntu.com/ubuntu jammy-updates/main amd64 libssl-dev amd64 3.0.2-0ubuntu1.10
10.43   404  Not Found [IP: 185.125.190.39 80]
10.80 Get:48 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libcpprest-dev amd64 2.10.18-1build2 [158 kB]
10.99 E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl-dev_3.0.2-0ubuntu1.10_amd64.deb  404  Not Found [IP: 185.125.190.39 80]
10.99 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
10.99 Fetched 66.6 MB in 10s (6780 kB/s)
------
Dockerfile:7
--------------------
   5 |     RUN add-apt-repository -y ppa:git-core/ppa
   6 |     RUN apt update
   7 | >>> RUN DEBIAN_FRONTEND=noninteractive apt install -y git make g++ libaio-dev libgoogle-perftools-dev libunwind-dev clang-format libboost-dev libboost-program-options-dev libcpprest-dev python3.10
   8 |     # mkl
   9 |     RUN DEBIAN_FRONTEND=noninteractive apt install -y intel-mkl
--------------------
ERROR: failed to solve: process "/bin/sh -c DEBIAN_FRONTEND=noninteractive apt install -y git make g++ libaio-dev libgoogle-perftools-dev libunwind-dev clang-format libboost-dev libboost-program-options-dev libcpprest-dev python3.10" did not complete successfully: exit code: 100


Install Status:
{'neurips23-ood-epsearch': 'fail'}

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

It seems to be a network issue, since it shows a 404.
The CI run executed about 7 hours ago succeeded; here is the CI log:

https://github.com/maronuu/big-ann-benchmarks/actions/runs/6685604607/job/18164076292#step:4:2379

Could you please retry after clearing the Docker build cache, or after rebooting the VM?
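An additional possibility, not confirmed in this thread: a 404 for one specific package version (`libssl-dev_3.0.2-0ubuntu1.10` above) is also what a stale package index produces. The Dockerfile runs `apt update` and `apt install` as separate `RUN` steps, so Docker may reuse a cached `apt update` layer that still lists versions the mirror no longer serves, which would also explain why clearing the build cache helps. Docker's Dockerfile best-practices guide recommends combining `apt-get update` and `apt-get install` in one `RUN` for this reason. The Python sketch below flags the split pattern; `find_split_apt_update` and `iter_run_steps` are hypothetical helpers, not part of big-ann-benchmarks.

```python
import re
from typing import Iterator, List, Tuple


def _apt_cmd(sub: str):
    # Matches e.g. "apt update" or "apt-get -y install", but not "add-apt-repository".
    return re.compile(rf"\bapt(?:-get)?\s+(?:-\S+\s+)*{sub}\b")


UPDATE_RE = _apt_cmd("update")
INSTALL_RE = _apt_cmd("install")


def iter_run_steps(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (first line number, command) for each RUN instruction,
    joining backslash-continued lines into one command."""
    buf: List[str] = []
    start = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if start is None:
            if not stripped.upper().startswith("RUN "):
                continue
            start = lineno
        if stripped.endswith("\\"):
            buf.append(stripped[:-1])
            continue
        buf.append(stripped)
        yield start, " ".join(buf)
        buf, start = [], None
    if start is not None:  # file ended inside a continuation
        yield start, " ".join(buf)


def find_split_apt_update(dockerfile_text: str) -> List[int]:
    """Return line numbers of RUN steps that refresh the apt index
    without installing anything in the same step (and thus the same layer)."""
    return [
        start
        for start, cmd in iter_run_steps(dockerfile_text)
        if UPDATE_RE.search(cmd) and not INSTALL_RE.search(cmd)
    ]


# Excerpt of the Dockerfile lines shown in the build log (package list shortened).
EXCERPT = """\
RUN add-apt-repository -y ppa:git-core/ppa
RUN apt update
RUN DEBIAN_FRONTEND=noninteractive apt install -y git make g++
# mkl
RUN DEBIAN_FRONTEND=noninteractive apt install -y intel-mkl
"""

print(find_split_apt_update(EXCERPT))  # → [2]
```

On the excerpt it reports line 2 (line 6 of the real Dockerfile). The usual fix is a single step such as `RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y ...`, so the index and the install always come from the same layer.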

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

@harsha-simhadri

@harsha-simhadri
Owner

Yes, I am trying from a different machine. Will update soon.

@harsha-simhadri
Owner

I am curious why your docker script is so long -- faiss, diskann, hnsw installs and then some. Can this be pruned?

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

Sorry, but all of them (faiss, hnswlib, and diskann, the last built from our modified source) are used in our solution.
I will try to reduce the dependencies and look into the Dockerfile, but I would appreciate it if you could continue the evaluation.
Thank you for your understanding.

@harsha-simhadri
Owner

I am able to install and run now, but I see the following crash:

2023-10-30 06:00:22,465 - annb.da2885bf00d6 - INFO - Time taken for save: 70.039s.
2023-10-30 06:00:23,168 - annb.da2885bf00d6 - INFO - DiskANN index built at /home/app/data/indices/ood/diskann/Text2Image1B-10000000/R60_L500_alpha1.0/R60_L500_alpha1.0
2023-10-30 06:00:23,168 - annb.da2885bf00d6 - INFO - DiskANN index built in 6738.572 s
2023-10-30 06:00:23,168 - annb.da2885bf00d6 - INFO - Loading index..
2023-10-30 06:00:23,172 - annb.da2885bf00d6 - INFO - /opt/miniconda3/envs/utokyo/lib/python3.10/site-packages/diskannpy/_common.py:250: UserWarning: The number of vectors in the saved index exceeds the max_vectors parameter. max_vectors is being adjusted to accommodate the dataset, but any insertions will fail.
2023-10-30 06:00:23,172 - annb.da2885bf00d6 - INFO -   warnings.warn(
2023-10-30 06:00:23,172 - annb.da2885bf00d6 - INFO - /opt/miniconda3/envs/utokyo/lib/python3.10/site-packages/diskannpy/_common.py:256: UserWarning: The number of vectors in the saved index equals max_vectors parameter. Any insertions will fail.
2023-10-30 06:00:23,172 - annb.da2885bf00d6 - INFO -   warnings.warn(
2023-10-30 06:00:23,196 - annb.da2885bf00d6 - INFO - Inner product: Using AVX2 implementation AVXDistanceInnerProductFloat
2023-10-30 06:00:24,934 - annb.da2885bf00d6 - INFO - /tmp/tmpiy6kl6l0: line 3:    15 Killed                  python3 -u run_algorithm.py --dataset text2image-10M --algorithm epsearch --module neurips23.ood.epsearch.diskann-in-mem-ep-hnsw --constructor epdiskann --runs 5 --count 10 --neurips23track ood '["ip", {"R": 60, "L": 500, "alpha": 1.0, "n_ep_candidates": 16384, "buildthreads": 8, "ep_train": "id", "M": 32, "efConstruction": 200}]' '[{"Ls": 70, "T": 8, "efSearch": 32}]' '[{"Ls": 70, "T": 8, "efSearch": 128}]' '[{"Ls": 80, "T": 8, "efSearch": 32}]' '[{"Ls": 80, "T": 8, "efSearch": 128}]' '[{"Ls": 85, "T": 8, "efSearch": 32}]' '[{"Ls": 90, "T": 8, "efSearch": 32}]' '[{"Ls": 95, "T": 8, "efSearch": 32}]' '[{"Ls": 100, "T": 8, "efSearch": 32}]' '[{"Ls": 105, "T": 8, "efSearch": 32}]' '[{"Ls": 110, "T": 8, "efSearch": 32}]'
2023-10-30 06:00:24,968 - annb.da2885bf00d6 - INFO - ERROR conda.cli.main_run:execute(49): `conda run python3 -u run_algorithm.py --dataset text2image-10M --algorithm epsearch --module neurips23.ood.epsearch.diskann-in-mem-ep-hnsw --constructor epdiskann --runs 5 --count 10 --neurips23track ood ["ip", {"R": 60, "L": 500, "alpha": 1.0, "n_ep_candidates": 16384, "buildthreads": 8, "ep_train": "id", "M": 32, "efConstruction": 200}] [{"Ls": 70, "T": 8, "efSearch": 32}] [{"Ls": 70, "T": 8, "efSearch": 128}] [{"Ls": 80, "T": 8, "efSearch": 32}] [{"Ls": 80, "T": 8, "efSearch": 128}] [{"Ls": 85, "T": 8, "efSearch": 32}] [{"Ls": 90, "T": 8, "efSearch": 32}] [{"Ls": 95, "T": 8, "efSearch": 32}] [{"Ls": 100, "T": 8, "efSearch": 32}] [{"Ls": 105, "T": 8, "efSearch": 32}] [{"Ls": 110, "T": 8, "efSearch": 32}]` failed. (See above for error)
2023-10-30 06:00:26,702 - annb.da2885bf00d6 - ERROR - ['ip', {'R': 60, 'L': 500, 'alpha': 1.0, 'n_ep_candidates': 16384, 'buildthreads': 8, 'ep_train': 'id', 'M': 32, 'efConstruction': 200}]
Trying to instantiate neurips23.ood.epsearch.diskann-in-mem-ep-hnsw.epdiskann(['ip', {'R': 60, 'L': 500, 'alpha': 1.0, 'n_ep_candidates': 16384, 'buildthreads': 8, 'ep_train': 'id', 'M': 32, 'efConstruction': 200}])
Running epsearch on text2image-10M
buildthreads: 8
buildthreads: 8 set to faiss
data/text2image1B/base.1B.fbin.crop_nb_10000000
xb shape: (10000000, 200)
Building Entry Point Searcher..
ep index not found at /home/app/data/indices/ood/diskann/Text2Image1B-10000000/mydiskann-ep-hnsw_n_ep_candidates16384_M32_efConstruction200/ep_R60_L500_alpha1.0.index, building...
candidate ids not found, building...
training kmeans
training kmeans done
centroids: (16384, 200)
delete kmeans instance
build raw index
build raw index done
candidate ids: (16384,)
[9192670 6146820 7294570 ...  854926 7241299 1113765]
saving candidate ids...
build search index
candidate vecs: (16384, 200)
adding items to hnsw index
build search index done
write index...
write index done
EP Searcher built in 1290.920 s
Building DiskANN index..
DiskANN index does not exist at /home/app/data/indices/ood/diskann/Text2Image1B-10000000/R60_L500_alpha1.0/R60_L500_alpha1.0, building...
Inner product: Using AVX2 implementation AVXDistanceInnerProductFloat
Using only first 10000000 from file..
Starting index build with 10000000 points...
99% of index build completed.Starting final cleanup..done. Link time: 5354.67s
Index built with degree: max:60  avg:34.2944  min:1  count(deg<2):167
Not saving tags as they are not enabled.
Time taken for save: 70.039s.
DiskANN index built at /home/app/data/indices/ood/diskann/Text2Image1B-10000000/R60_L500_alpha1.0/R60_L500_alpha1.0
DiskANN index built in 6738.572 s
Loading index..
/opt/miniconda3/envs/utokyo/lib/python3.10/site-packages/diskannpy/_common.py:250: UserWarning: The number of vectors in the saved index exceeds the max_vectors parameter. max_vectors is being adjusted to accommodate the dataset, but any insertions will fail.
  warnings.warn(
/opt/miniconda3/envs/utokyo/lib/python3.10/site-packages/diskannpy/_common.py:256: UserWarning: The number of vectors in the saved index equals max_vectors parameter. Any insertions will fail.
  warnings.warn(
Inner product: Using AVX2 implementation AVXDistanceInnerProductFloat
/tmp/tmpiy6kl6l0: line 3:    15 Killed                  python3 -u run_algorithm.py --dataset text2image-10M --algorithm epsearch --module neurips23.ood.epsearch.diskann-in-mem-ep-hnsw --constructor epdiskann --runs 5 --count 10 --neurips23track ood '["ip", {"R": 60, "L": 500, "alpha": 1.0, "n_ep_candidates": 16384, "buildthreads": 8, "ep_train": "id", "M": 32, "efConstruction": 200}]' '[{"Ls": 70, "T": 8, "efSearch": 32}]' '[{"Ls": 70, "T": 8, "efSearch": 128}]' '[{"Ls": 80, "T": 8, "efSearch": 32}]' '[{"Ls": 80, "T": 8, "efSearch": 128}]' '[{"Ls": 85, "T": 8, "efSearch": 32}]' '[{"Ls": 90, "T": 8, "efSearch": 32}]' '[{"Ls": 95, "T": 8, "efSearch": 32}]' '[{"Ls": 100, "T": 8, "efSearch": 32}]' '[{"Ls": 105, "T": 8, "efSearch": 32}]' '[{"Ls": 110, "T": 8, "efSearch": 32}]'
ERROR conda.cli.main_run:execute(49): `conda run python3 -u run_algorithm.py --dataset text2image-10M --algorithm epsearch --module neurips23.ood.epsearch.diskann-in-mem-ep-hnsw --constructor epdiskann --runs 5 --count 10 --neurips23track ood ["ip", {"R": 60, "L": 500, "alpha": 1.0, "n_ep_candidates": 16384, "buildthreads": 8, "ep_train": "id", "M": 32, "efConstruction": 200}] [{"Ls": 70, "T": 8, "efSearch": 32}] [{"Ls": 70, "T": 8, "efSearch": 128}] [{"Ls": 80, "T": 8, "efSearch": 32}] [{"Ls": 80, "T": 8, "efSearch": 128}] [{"Ls": 85, "T": 8, "efSearch": 32}] [{"Ls": 90, "T": 8, "efSearch": 32}] [{"Ls": 95, "T": 8, "efSearch": 32}] [{"Ls": 100, "T": 8, "efSearch": 32}] [{"Ls": 105, "T": 8, "efSearch": 32}] [{"Ls": 110, "T": 8, "efSearch": 32}]` failed. (See above for error)
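
Read together, the log messages above outline how the entry-point (EP) searcher appears to be built: k-means with `n_ep_candidates` (16384) centroids, a "raw index" that maps each centroid to a base vector (`candidate ids: (16384,)`), and an HNSW index (`M`, `efConstruction`) over those candidate vectors, presumably used to choose where the DiskANN search (`Ls`) starts. The pure-Python sketch below is only an interpretation of those messages, not the team's code: it swaps in plain Lloyd's k-means and brute-force search for faiss and HNSW, assumes the k-means runs on the base vectors, and uses hypothetical function names.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means (a stand-in for the faiss k-means in the log)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: sq_l2(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ep_candidates(base: List[Vector], n_ep_candidates: int) -> List[int]:
    """One candidate id per centroid: the base vector closest to it in L2
    (the metric of the 'raw index' is an assumption)."""
    centroids = kmeans(base, n_ep_candidates)
    return [min(range(len(base)), key=lambda i: sq_l2(base[i], c)) for c in centroids]


def pick_entry_point(query: Vector, base: List[Vector], candidate_ids: List[int]) -> int:
    """Brute-force inner-product search over the candidates
    (a stand-in for the HNSW search controlled by efSearch)."""
    return max(candidate_ids, key=lambda i: dot(base[i], query))


# Toy data: two well-separated clusters in 2-D.
base = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
cands = build_ep_candidates(base, n_ep_candidates=2)
print(sorted(cands))                              # → [0, 3]
print(pick_entry_point((1.0, 1.0), base, cands))  # → 3
```

The design point this illustrates is that only a small, representative subset (16384 of 10M vectors in the run above) needs a separate index, so choosing a query-dependent starting point stays cheap relative to the main graph search.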

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

@harsha-simhadri
Thank you for the response!

Could you please re-run the script? Since index building seems to have succeeded, it will load the built index.

I encountered the same kind of issue when running the baseline diskann.
When build and load are executed consecutively in one run, this kind of crash can occur.
On my VM (16GB RAM), re-running the script after the build completes, so that it starts by loading the built index, is successful.
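If the `Killed` message comes from the kernel's out-of-memory killer (likely on a memory-constrained VM, though the thread does not confirm it), the workaround above amounts to running the build and the load in separate processes. A minimal sketch of automating that, assuming a POSIX system: the build runs in a child process, so memory it held is returned to the OS before the parent loads the index. It mirrors the "index not found at ..., building..." checks in the log, but `build_if_missing` and `build_fn` are hypothetical and not taken from `run_algorithm.py`.

```python
import multiprocessing as mp
import os
from typing import Callable


def build_if_missing(index_path: str, build_fn: Callable[[str], None]) -> bool:
    """Build the index in a child process unless it already exists on disk.

    Returns True if a build ran, False if an existing index was reused.
    Because the build runs in its own process, all memory it allocated is
    released when that process exits, before the caller loads the index.
    """
    if os.path.exists(index_path):
        print(f"index found at {index_path}, skipping build", flush=True)
        return False
    print(f"index not found at {index_path}, building...", flush=True)
    ctx = mp.get_context("fork")  # POSIX only; "fork" avoids pickling build_fn
    proc = ctx.Process(target=build_fn, args=(index_path,))
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        # A negative exit code means the child died from a signal, e.g. -9 for SIGKILL.
        raise RuntimeError(f"index build failed (exit code {proc.exitcode})")
    return True
```

A caller would invoke `build_if_missing(index_path, build_fn)` and then load the index from `index_path` in the parent process as usual. If the child is killed, its `exitcode` is negative, so the failure surfaces as an exception instead of a crash partway through loading.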

@harsha-simhadri
Owner

OK, will do.
@magdalendobson also mentioned the problem with diskann. Will fix later.

@harsha-simhadri
Owner

@maronuu Do these results agree with your experiments?

epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 70, 'T': 8, 'efSearch': 32}))",text2image-10M,10,7573.446042503093,0.0,20.606487035751343,9923132.0,0,0,ood,0.8635550000000001
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 70, 'T': 8, 'efSearch': 128}))",text2image-10M,10,6640.106372050111,0.0,20.606487035751343,9923132.0,0,0,ood,0.86555
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 80, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6937.091639136287,0.0,20.606487035751343,9923132.0,0,0,ood,0.877434
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 80, 'T': 8, 'efSearch': 128}))",text2image-10M,10,6196.065618049459,0.0,20.606487035751343,9923132.0,0,0,ood,0.8793979999999999
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 85, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6734.157169006247,0.0,20.606487035751343,9923132.0,0,0,ood,0.883373
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 90, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6481.696715359318,0.0,20.606487035751343,9923132.0,0,0,ood,0.888749
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 95, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6272.184337292608,0.0,20.606487035751343,9923132.0,0,0,ood,0.8935280000000001
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 100, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6052.100101117866,0.0,20.606487035751343,9923132.0,0,0,ood,0.897959
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 105, 'T': 8, 'efSearch': 32}))",text2image-10M,10,5876.9827056185695,0.0,20.606487035751343,9923132.0,0,0,ood,0.901895
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 110, 'T': 8, 'efSearch': 32}))",text2image-10M,10,5693.436734613872,0.0,20.606487035751343,9923132.0,0,0,ood,0.9056050000000001
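
One way to compare configurations like these is to pick the highest-throughput row whose recall clears a target. The sketch below assumes, from the values alone, that the fifth column is throughput (QPS) and the last column is recall@10; the thread does not name the columns, and the 0.9 target is only an illustrative choice. `best_config_at_recall` is a hypothetical helper.

```python
import csv
import io
from typing import Optional, Tuple

# Column positions are assumptions inferred from the values above:
QPS_COL = 4      # e.g. 7573.44... (assumed to be queries per second)
RECALL_COL = -1  # e.g. 0.8635... (assumed to be recall@10)


def best_config_at_recall(csv_text: str, min_recall: float = 0.9) -> Optional[Tuple[str, float, float]]:
    """Return (parameters, qps, recall) of the fastest row with recall >= min_recall,
    or None if no row reaches the target."""
    best = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        params, qps, recall = row[1], float(row[QPS_COL]), float(row[RECALL_COL])
        if recall >= min_recall and (best is None or qps > best[1]):
            best = (params, qps, recall)
    return best


# The last three rows from the results above, copied verbatim.
RESULTS = """\
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 100, 'T': 8, 'efSearch': 32}))",text2image-10M,10,6052.100101117866,0.0,20.606487035751343,9923132.0,0,0,ood,0.897959
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 105, 'T': 8, 'efSearch': 32}))",text2image-10M,10,5876.9827056185695,0.0,20.606487035751343,9923132.0,0,0,ood,0.901895
epsearch,"diskann(('R60_L500_alpha1.0', {'Ls': 110, 'T': 8, 'efSearch': 32}))",text2image-10M,10,5693.436734613872,0.0,20.606487035751343,9923132.0,0,0,ood,0.9056050000000001
"""

params, qps, recall = best_config_at_recall(RESULTS)
print(params)  # → diskann(('R60_L500_alpha1.0', {'Ls': 105, 'T': 8, 'efSearch': 32}))
```

With these rows, `Ls: 100` just misses the target (0.897959), and `Ls: 105` is faster than `Ls: 110`, so it is selected.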

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

The trend is not the same as in my experiments, but the results seem reasonable.
I will try other parameter sets and submit them.
Thank you for the evaluation! You can merge this PR.

@maronuu
Contributor Author

maronuu commented Oct 30, 2023

@harsha-simhadri
Sorry, one question:
Did you use the private query set or public one in this evaluation?

@harsha-simhadri
Owner

Public query.

@harsha-simhadri harsha-simhadri merged commit c4847d2 into harsha-simhadri:main Oct 30, 2023
matchyc pushed a commit to matchyc/big-ann-benchmarks that referenced this pull request Oct 31, 2023
…#197)

* init

* update params for random-xs

* update param

* fix param type

* fix constructor name

* fix random-xs config

* add efsearch to random-xs

* fix