
add hanns OOD solution #304

Merged 1 commit on Aug 20, 2024

Conversation

AndrewHYu
Contributor

Our OOD track solution consists of a Vamana index, a multi-scale spatial clustering index, and a layout-optimized quantization acceleration index.
Retrieval proceeds from coarse to fine. First, the Vamana index is used to quickly find the nearest clusters. Then, within these clusters, the quantization-accelerated index is used for fast distance comparisons to identify coarsely ranked candidates. Finally, SIMD instructions are used to re-rank these candidates, and the final results are returned.
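
The coarse-to-fine flow described above can be illustrated with a toy sketch. This is not the Hanns implementation: a brute-force scan over centroids stands in for the Vamana graph, simple int8 scalar quantization stands in for the layout-optimized quantization index, plain Python stands in for SIMD, and squared Euclidean distance is used purely for illustration. All function and parameter names (`build_index`, `search`, `n_probe`, `n_reorder`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))

def build_index(data, n_clusters, seed=0, n_iter=5):
    """Toy index: a few k-means steps plus int8-style codes for every vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_clusters)]
    for _ in range(n_iter):
        assign = [nearest_centroid(v, centroids) for v in data]
        for c in range(n_clusters):
            pts = [v for v, a in zip(data, assign) if a == c]
            if pts:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    # Final assignment against the updated centroids.
    assign = [nearest_centroid(v, centroids) for v in data]
    members = [[i for i, a in enumerate(assign) if a == c]
               for c in range(n_clusters)]
    # Scalar quantization: map every coordinate into the range [-127, 127].
    scale = max(abs(x) for v in data for x in v) / 127.0 or 1.0
    codes = [[round(x / scale) for x in v] for v in data]
    return centroids, members, codes, scale

def search(query, data, index, n_probe, n_reorder, k):
    centroids, members, codes, scale = index
    # Stage 1 (coarse): choose the n_probe nearest clusters.
    # Brute force here; the submission uses a Vamana graph for this step.
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))[:n_probe]
    candidates = [i for c in probe for i in members[c]]
    # Stage 2: rank candidates by approximate distance on quantized codes.
    q_code = [round(x / scale) for x in query]
    shortlist = sorted(candidates,
                       key=lambda i: sq_dist(q_code, codes[i]))[:n_reorder]
    # Stage 3 (fine): re-rank the shortlist with exact distances.
    return sorted(shortlist, key=lambda i: sq_dist(query, data[i]))[:k]
```

In this sketch, `n_probe` and `n_reorder` play the role of recall/throughput knobs: probing every cluster and re-ranking every candidate degenerates into exact search, while smaller values trade recall for speed.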

text2image-10M
https://github.com/AndrewHYu/Hanns

@magdalendobson
Collaborator

Thanks for your contribution. I am evaluating it now and will get back to you on how it goes!

@magdalendobson
Collaborator

I ran with the downloaded index and got the following results:

hanns,"hanns,tree=27/40000,reorder=111",text2image-10M,10,53085.03723024023,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.8774520000000001
hanns,"hanns,tree=27/40000,reorder=130",text2image-10M,10,51222.16584203003,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.882422
hanns,"hanns,tree=32/40000,reorder=140",text2image-10M,10,46858.49102240073,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.8944110000000001
hanns,"hanns,tree=32/40000,reorder=150",text2image-10M,10,46771.317990241405,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.896185
hanns,"hanns,tree=34/40000,reorder=150",text2image-10M,10,45381.62378698972,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.899572
hanns,"hanns,tree=34/40000,reorder=155",text2image-10M,10,45685.10712457384,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.900311
hanns,"hanns,tree=36/40000,reorder=150",text2image-10M,10,44630.44910364101,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.9026080000000001
hanns,"hanns,tree=37/40000,reorder=145",text2image-10M,10,44957.96616927795,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.9031560000000001
hanns,"hanns,tree=38/40000,reorder=140",text2image-10M,10,44787.13982548163,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.903562
hanns,"hanns,tree=42/40000,reorder=160",text2image-10M,10,41713.34961169815,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.911723

These seem to agree with your posted figure. I am now running without the downloaded index. By the way, your index-building code seems to download a file called config.pb even when the download is disabled. On inspection, it looks like it contains only parameters, but can you confirm that it doesn't contain any pre-computed index information?
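
For readers who want to compare rows like these programmatically, here is a small sketch. It assumes, without a header row to confirm it, that the second field is the parameter string, the fifth field is throughput in queries per second, and the last field is recall; the 0.9 recall cutoff and the helper name `best_config` are also assumptions. The sample rows are a subset copied from the results above.

```python
import csv
import io

# A subset of the result rows, copied verbatim from the comment.
ROWS = """\
hanns,"hanns,tree=34/40000,reorder=150",text2image-10M,10,45381.62378698972,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.899572
hanns,"hanns,tree=34/40000,reorder=155",text2image-10M,10,45685.10712457384,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.900311
hanns,"hanns,tree=36/40000,reorder=150",text2image-10M,10,44630.44910364101,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.9026080000000001
hanns,"hanns,tree=42/40000,reorder=160",text2image-10M,10,41713.34961169815,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.911723
"""

def best_config(csv_text, min_recall=0.9):
    """Return (params, qps, recall) for the fastest row meeting min_recall."""
    best = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Assumed column layout: params at index 1, QPS at index 4, recall last.
        params, qps, recall = row[1], float(row[4]), float(row[-1])
        if recall >= min_recall and (best is None or qps > best[1]):
            best = (params, qps, recall)
    return best

params, qps, recall = best_config(ROWS)
print(f"{params}: recall={recall:.4f}, qps={qps:.0f}")
```

The `csv` module is used rather than `str.split(",")` because the parameter field is quoted and itself contains commas.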

@AndrewHYu
Contributor Author

Yes, config.pb contains only search parameters.

@magdalendobson
Collaborator

I was able to build the index from scratch and confirm that it builds within the time and memory limits. I got the following results:
2: hanns,tree=34/40000,reorder=150 0.899 46754.467
4: hanns,tree=42/40000,reorder=160 0.911 42331.832
6: hanns,tree=32/40000,reorder=140 0.894 48426.372
9: hanns,tree=36/40000,reorder=150 0.902 45949.345
12: hanns,tree=32/40000,reorder=150 0.895 47507.687
14: hanns,tree=27/40000,reorder=111 0.877 54890.437
15: hanns,tree=27/40000,reorder=130 0.882 53006.511
16: hanns,tree=38/40000,reorder=140 0.903 45162.090
20: hanns,tree=34/40000,reorder=155 0.899 46443.915
21: hanns,tree=37/40000,reorder=145 0.902 45388.283

These agree with the results you shared and with those I found using the pre-computed index. I will approve the merge and speak with the other admins about updating our official results. Great entry!

@magdalendobson merged commit bf0eece into harsha-simhadri:main on Aug 20, 2024
25 of 38 checks passed
@arron2003
Contributor

Hi @AndrewHYu
Thanks for submitting.

I wonder if you can clarify the relationship between your submission and ScaNN? It looks like your submission loads a ScaNN index:
https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/neurips23/ood/hanns/hanns.py#L23-L46

The config.pb file is also identical to that of the ScaNN submission:

diff <(curl https://hanns.obs.ap-southeast-1.myhuaweicloud.com/v2/config.pb) <(curl https://storage.googleapis.com/scann/big-ann-2023/ood/scann_config.pb)

@magdalendobson FYI.

@AndrewHYu
Contributor Author

Hi @arron2003
Thanks for your reminder and contributions.
We used ScaNN's clustering method and found that it contains many excellent designs that improve performance and accuracy. Because some configuration items are reused, we use the config.pb file directly. We will update the README with details.

@harsha-simhadri
Owner

harsha-simhadri commented Sep 12, 2024

@AndrewHYu Could you please share your name, affiliation and any collaborators on this code?
