
[NeurIPS 2023 Filter Track] rubignn #213

Open
wants to merge 17 commits into
base: main


@ciuji ciuji commented Oct 30, 2023

This is Team RuBignn's submission to the filter track.

Our submission has to be run with the custom setup; please refer to the README file for the detailed process. We provide scripts that run the Docker container and generate the hdf5 results in the /results folder. A prebuilt index is also available on Azure Blob Storage.

Owner

@harsha-simhadri harsha-simhadri left a comment


Could you please add a CI test entry?

@maumueller
Collaborator

maumueller commented Nov 2, 2023

Hi @ciuji. Thank you for your submission. It would have been better if you had reached out to discuss your custom docker format. We mentioned certain properties of this setup here: https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/neurips23/README.md#custom_setup. I cannot see that you adhered to these.

Nevertheless, I gave the index building a try, and I received the following output:

$ bash docker_run_container_build.sh
transfer label file :/home/app/data/yfcc100M/base.metadata.10M.spmat to /home/app/index_file/label_file_base_yfcc10m_filter.txt
Matrix size: (10000000, 200386), non-zeros elements: 108210476
cover labels string to int
Done
0
Used Virtual Memory: 9026699264
Used Physical Memory: 9026699264
1000000
Used Virtual Memory: 9331453952
Used Physical Memory: 9331453952
2000000
Used Virtual Memory: 9624080384
Used Physical Memory: 9624080384
3000000
Used Virtual Memory: 9914900480
Used Physical Memory: 9914900480
4000000
Used Virtual Memory: 10205720576
Used Physical Memory: 10205720576
5000000
Used Virtual Memory: 10623242240
Used Physical Memory: 10623242240
6000000
Used Virtual Memory: 10786844672
Used Physical Memory: 10786844672
7000000
Used Virtual Memory: 11170304000
Used Physical Memory: 11170304000
8000000
Used Virtual Memory: 11562020864
Used Physical Memory: 11562020864
9000000
Used Virtual Memory: 12364808192
Used Physical Memory: 12364808192
Identified 200363 distinct label(s) for 10000000 points
File Size: 1920000008
B1
B2
terminate called without an active exception

(The line breaks were lost when copy-pasting from the VM.)

@maumueller
Collaborator

It seems that the build succeeded on another machine (although physical memory usage is extremely close to the machine limit; I wonder if that is what killed the first process).


$ bash docker_run_container_build.sh
transfer label file :/home/app/data/yfcc100M/base.metadata.10M.spmat to /home/app/index_file/label_file_base_yfcc10m_filter.txt
Matrix size: (10000000, 200386), non-zeros elements: 108210476
cover labels string to int
Done
0
Used Virtual Memory: 7707553792
Used Physical Memory: 7707553792
1000000
Used Virtual Memory: 8012308480
Used Physical Memory: 8012308480
2000000
Used Virtual Memory: 8303386624
Used Physical Memory: 8303386624
3000000
Used Virtual Memory: 8595238912
Used Physical Memory: 8595238912
4000000
Used Virtual Memory: 8886575104
Used Physical Memory: 8886575104
5000000
Used Virtual Memory: 9304870912
Used Physical Memory: 9304870912
6000000
Used Virtual Memory: 9468989440
Used Physical Memory: 9468989440
7000000
Used Virtual Memory: 9854513152
Used Physical Memory: 9854513152
8000000
Used Virtual Memory: 10244423680
Used Physical Memory: 10244423680
9000000
Used Virtual Memory: 11046694912
Used Physical Memory: 11046694912
Identified 200363 distinct label(s) for 10000000 points

A
File Size: 1920000008
B1
B2
generated 200363 label-specific vector files for index building in time 104.237

B
Generating indices per label...

Done. Generated per-label indices in 2407.01 seconds

Used Virtual Memory: 16286466048
Used Physical Memory: 16286466048
C
Used Virtual Memory: 15003127808
Used Physical Memory: 15003127808
stitched graph generated in memory in 997.189 seconds
stitched_graph_size: 2874834996
Used Virtual Memory: 15003127808
Used Physical Memory: 15003127808
Stitched graph written in 27.6037 seconds
Stitched graph average degree: 70.8709
Stitched graph max degree: 6395

Used Virtual Memory: 16325402624
Used Physical Memory: 16325402624
D
Used Virtual Memory: 16308019200
Used Physical Memory: 16308019200
D1
D2
Passed, empty build_params while creating index config
Passed, empty search_params while creating index config
Loading
From graph header, expected_file_size: 2874834996, _max_observed_degree: 6395, _start: 0, file_frozen_pts: 0
Loading vamana graph /home/app/index_file/yfcc_R10_L70_SR96_stitched_index_label_full.............done. Index has 10000000 nodes and 708708743 out-edges, _start is set to 0
Identified 200363 distinct label(s)
Num frozen points:0 _nd: 10000000 _start: 0 size(_location_to_tag): 0 size(_tag_to_location):0 Max points: 10000000
parsing labels
Prune time : 178141ms
Index built with degree: max:96  avg:57.6919  min:0  count(deg<2):792
Not saving tags as they are not enabled.
Time taken for save: 27.548s.
pruning performed in 239.255 seconds

Used Virtual Memory: 16312610816
Used Physical Memory: 16312610816
pruned/stitched graph generated in 3940.98 seconds

Running the search then fails with:


$ bash docker_run_container_search.sh
docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /home/ubuntu/built_index/index_file_96_R10L70.
See 'docker run --help'.

The available files are:

ubuntu@ip-172-31-95-171:~/big-ann-benchmarks/neurips23/filter/rubignn$ ls -lh /home/ubuntu/built_index/index_file_docker_build/
label_file_base_yfcc10m_filter.txt
yfcc_R10_L70_SR96_stitched_index_label
yfcc_R10_L70_SR96_stitched_index_label.data
yfcc_R10_L70_SR96_stitched_index_label_full
yfcc_R10_L70_SR96_stitched_index_label_full.data
yfcc_R10_L70_SR96_stitched_index_label_full_labels_to_medoids.txt
yfcc_R10_L70_SR96_stitched_index_label_full_labels.txt
yfcc_R10_L70_SR96_stitched_index_label_full_universal_label.txt
yfcc_R10_L70_SR96_stitched_index_label_label_formatted.txt
yfcc_R10_L70_SR96_stitched_index_label_labels_map.txt
yfcc_R10_L70_SR96_stitched_index_label_labels_to_medoids.txt
yfcc_R10_L70_SR96_stitched_index_label_labels.txt
yfcc_R10_L70_SR96_stitched_index_label_universal_label.txt

@maumueller maumueller self-assigned this Nov 2, 2023
@ciuji
Author

ciuji commented Nov 2, 2023

I really appreciate the re-try!

Our docker_run_container_search.sh sets the index file path to the downloaded pre-built index folder. If you are running the search on a newly built index, a different path has to be bound.

Below is the script for searching with the path /home/ubuntu/built_index/index_file_docker_build/:

CONTEST_REPO_PATH=/home/ubuntu/big-ann-benchmarks #path to big-ann-benchmarks directory
INDEX_FILE_PATH=/home/ubuntu/built_index #path to index_file directory

docker container run -it  --mount type=bind,src=$CONTEST_REPO_PATH/results,dst=/home/app/results --mount type=bind,src=$INDEX_FILE_PATH/index_file_docker_build,dst=/home/app/index_file --read-only --mount type=bind,src=$CONTEST_REPO_PATH/data,dst=/home/app/data  neurips23-filter-rubignn  /bin/bash -c 'mkdir -p /home/app/results/neurips23/filter/yfcc-10M/10/rubignn && 
cd /home/app/ru-bignn-23/build &&
./apps/search_contest --index_path_prefix /home/app/index_file/yfcc_R10_L70_SR96_stitched_index_label --query_file /home/app/data/yfcc100M/query.public.100K.u8bin --search_list 80 90 95 100 105 110 120 130 --query_filters_file /home/app/data/yfcc100M/query.metadata.public.100K.spmat --result_path_prefix /home/app/results/neurips23/filter/yfcc-10M/10/rubignn/rubignn --runs 5 &&
python3 ../contest-scripts/output_bin_to_hdf5.py /home/app/results/neurips23/filter/yfcc-10M/10/rubignn/rubignn_search_metadata.txt /home/app'
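
The earlier "bind source path does not exist" error could be caught before launching the container with a small pre-flight check. The following is only a sketch: the helper name `missing_bind_sources` is hypothetical and not part of the submission, and it assumes every bind source is a directory, as in the script above.

```python
import os
import tempfile


def missing_bind_sources(sources):
    """Return the bind-mount source paths that are not existing directories.

    Hypothetical helper: docker refuses to start when a `--mount type=bind`
    source is missing, so checking first gives a clearer message.
    """
    return [path for path in sources if not os.path.isdir(path)]


# Demonstration with a temporary directory standing in for the index root.
with tempfile.TemporaryDirectory() as index_root:
    built = os.path.join(index_root, "index_file_docker_build")
    os.mkdir(built)  # the folder produced by the build script
    prebuilt = os.path.join(index_root, "index_file_96_R10L70")  # never created
    demo_missing = missing_bind_sources([built, prebuilt])
    print("missing:", [os.path.basename(p) for p in demo_missing])
```

In practice the list would hold the results, index, and data directories that the search script binds into the container.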

@ciuji
Author

ciuji commented Nov 2, 2023

We are still working on binding our code to the contest framework. We are very sorry that we could not finish that before October 30th, so we submitted a version with the custom setup to meet the deadline. We really appreciate your attempts.

We tested several parameter settings, and the submitted one worked on our AWS c5.2xlarge machine (16G memory limit). We are not sure why the first attempt failed, because the build should not reach its memory peak at that step.
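
To put the memory figures from the two build logs side by side, the "Used Physical Memory" lines can be converted from bytes to GiB. This is a minimal sketch; the function name `peak_physical_gib` is made up for illustration, and the sample values are copied from the logs in this thread.

```python
import re

GIB = 1024 ** 3  # bytes per GiB


def peak_physical_gib(log_text):
    """Return the largest 'Used Physical Memory' value in the log, in GiB."""
    values = [int(v) for v in re.findall(r"Used Physical Memory:\s*(\d+)", log_text)]
    if not values:
        raise ValueError("no 'Used Physical Memory' lines found")
    return max(values) / GIB


# Last reading before the first run terminated.
failed_log = "Used Physical Memory: 12364808192\n"

# Selected readings from the later, successful run.
successful_log = """\
Used Physical Memory: 11046694912
Used Physical Memory: 16286466048
Used Physical Memory: 16325402624
Used Physical Memory: 16312610816
"""

failed_peak = peak_physical_gib(failed_log)
successful_peak = peak_physical_gib(successful_log)
print(f"failed run, last reading: {failed_peak:.2f} GiB")
print(f"successful run, peak:     {successful_peak:.2f} GiB")
```

With these values the failed run stopped at roughly 11.5 GiB, while the successful run peaked at roughly 15.2 GiB, which is consistent with the observation that the first failure happened well before the peak.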

@maumueller
Collaborator

This worked and I got the following results:

rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L80 }))",yfcc-10M,10,4638.5958636341875,0.0,-1.0,-1.0,0,0,filter,0.8825379999999999
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L90 }))",yfcc-10M,10,4258.813006636381,0.0,-1.0,-1.0,0,0,filter,0.889352
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L95 }))",yfcc-10M,10,4081.0791254720834,0.0,-1.0,-1.0,0,0,filter,0.892375
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L100 }))",yfcc-10M,10,3891.093582512741,0.0,-1.0,-1.0,0,0,filter,0.895308
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L105 }))",yfcc-10M,10,3814.746451732662,0.0,-1.0,-1.0,0,0,filter,0.898113
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L110 }))",yfcc-10M,10,3653.427801233799,0.0,-1.0,-1.0,0,0,filter,0.900574
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L120 }))",yfcc-10M,10,3424.101235059619,0.0,-1.0,-1.0,0,0,filter,0.905161
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L130 }))",yfcc-10M,10,3235.251926155763,0.0,-1.0,-1.0,0,0,filter,0.9092230000000001
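
The rows above can be read into a small table to compare settings. The sketch below assumes that the fifth field is queries per second and the last field is recall; the thread does not label the columns, so treat that mapping as an assumption. The helper names are hypothetical.

```python
import csv
import io
import re

ROWS = """\
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L80 }))",yfcc-10M,10,4638.5958636341875,0.0,-1.0,-1.0,0,0,filter,0.8825379999999999
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L90 }))",yfcc-10M,10,4258.813006636381,0.0,-1.0,-1.0,0,0,filter,0.889352
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L95 }))",yfcc-10M,10,4081.0791254720834,0.0,-1.0,-1.0,0,0,filter,0.892375
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L100 }))",yfcc-10M,10,3891.093582512741,0.0,-1.0,-1.0,0,0,filter,0.895308
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L105 }))",yfcc-10M,10,3814.746451732662,0.0,-1.0,-1.0,0,0,filter,0.898113
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L110 }))",yfcc-10M,10,3653.427801233799,0.0,-1.0,-1.0,0,0,filter,0.900574
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L120 }))",yfcc-10M,10,3424.101235059619,0.0,-1.0,-1.0,0,0,filter,0.905161
rubignn,"rubignn(('R10_L70_SR96_', {'search_list': L130 }))",yfcc-10M,10,3235.251926155763,0.0,-1.0,-1.0,0,0,filter,0.9092230000000001
"""


def parse_results(text):
    """Return (search_list, qps, recall) tuples; column meaning is assumed."""
    results = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        match = re.search(r"'search_list':\s*L(\d+)", fields[1])
        results.append((int(match.group(1)), float(fields[4]), float(fields[-1])))
    return results


def fastest_at_recall(results, min_recall):
    """Return the row with the highest assumed QPS among rows meeting min_recall."""
    eligible = [row for row in results if row[2] >= min_recall]
    return max(eligible, key=lambda row: row[1]) if eligible else None


results = parse_results(ROWS)
best = fastest_at_recall(results, 0.9)
print(f"search_list={best[0]} qps={best[1]:.1f} recall={best[2]:.4f}")
```

Under the stated column assumption, the fastest setting that still reaches a recall of 0.9 is the one with search_list 110.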

@maumueller
Collaborator

maumueller commented Nov 3, 2023

@ciuji As mentioned above, please add a CI entry for your submission. Although it works outside of the framework, we would like to see an entry for random-filter-s.

@ciuji
Author

ciuji commented Nov 3, 2023

Thanks for the evaluation! We have updated our code and committed it. It now contains the entry for random-filter-s and integrates our command-line script into the contest evaluation framework.

@ciuji ciuji requested a review from harsha-simhadri November 4, 2023 01:55