This repository provides code to generate several datasets for benchmarking and evaluating similarity search on high-dimensional vectors produced by recent deep learning models. The available datasets are:
Please see the details of each dataset in the respective README files.
[1] Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed indices. In: Proceedings of the VLDB Endowment, 16, 11, 3433-3446. (2023)
[2] Aguerrebere, C.; Hildebrand, M.; Bhati, I.; Willke, T.; Tepper, M.: Locally-adaptive Quantization for Streaming Vector Search. (2024) [arxiv]
[3] Tepper, M.; Bhati, I.; Aguerrebere, C.; Hildebrand, M.; Willke, T.: LeanVec: Search your vectors faster by making them fit. arXiv preprint arXiv:2312.16335 (2024)