Is your feature request related to a problem? Please describe.
As cuVS algorithms get more pre-filtering support, we need to be able to benchmark this functionality and compare the algorithms.
Describe the solution you'd like
Add a search-only option to the benchmark executable, `--filter_ratio x`, where `x` is a float between 0 and 1 giving the proportion of the records in the index that pass the filter. The default is 1, which keeps the legacy behavior (no filtering).
Modify the dataset class: add an extra bitset field sized to the number of records in the dataset, and allow generating it (or loading it from a file?). When generating, maybe also expose the random seed as a parameter?
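A minimal sketch of what generating such a bitset could look like on the host, without any raft dependency. The names `filter_bitset` and `generate_filter`, the 32-bit word layout, and the per-record Bernoulli draw are all assumptions made for illustration, not existing cuVS benchmark code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side bitset: one bit per dataset record, 1 = record passes the filter.
struct filter_bitset {
  std::size_t n_records = 0;
  std::vector<std::uint32_t> words;  // 32 records per word (assumed layout)

  bool test(std::size_t i) const { return (words[i / 32] >> (i % 32)) & 1u; }

  // Number of records that pass the filter.
  std::size_t count() const
  {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n_records; ++i) { c += test(i) ? 1 : 0; }
    return c;
  }
};

// Each record passes independently with probability filter_ratio, so the realized
// proportion only approximates filter_ratio. A fixed seed makes the filter reproducible.
inline filter_bitset generate_filter(std::size_t n_records, double filter_ratio, std::uint64_t seed)
{
  assert(filter_ratio >= 0.0 && filter_ratio <= 1.0);
  filter_bitset f;
  f.n_records = n_records;
  f.words.assign((n_records + 31) / 32, 0u);
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution pass(filter_ratio);
  for (std::size_t i = 0; i < n_records; ++i) {
    if (pass(rng)) { f.words[i / 32] |= (std::uint32_t{1} << (i % 32)); }
  }
  return f;
}
```

Because the draw is random, `count()` gives the actual number of passing records, which is the value the recall calculation would need rather than `filter_ratio * n_records`.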
Note 1: this way, the filter is set up once per whole benchmark; this ensures low overheads and a fair comparison.
Note 2: we cannot use raft's bitset here, because the common benchmark headers don't depend on raft.
Pass the bitset filter to the algorithms. I think the easiest way would be to add a new API function `set_filter(bitset ptr)`, similar to `set_search_parameters`. It would be called once per benchmark loop, and only if the filter ratio is lower than `1`.
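Roughly, the call pattern described above might look like the sketch below. Everything here is hypothetical: `algo_interface` is a stand-in for the benchmark's common algorithm interface, the `set_filter` signature (raw word pointer plus record count) is one possible choice, and `maybe_set_filter` and `example_algo` exist only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the benchmark's common algorithm interface.
struct algo_interface {
  virtual ~algo_interface() = default;
  // Assumed signature: a non-owning pointer to the bitset words and the number of records.
  virtual void set_filter(const std::uint32_t* bitset_words, std::size_t n_records) = 0;
};

// Called once per benchmark loop. The filter is passed only when filter_ratio < 1,
// so the default (no filtering) keeps the legacy code path. Returns true if it was set.
inline bool maybe_set_filter(algo_interface& algo,
                             double filter_ratio,
                             const std::uint32_t* bitset_words,
                             std::size_t n_records)
{
  if (filter_ratio >= 1.0) { return false; }
  assert(bitset_words != nullptr);
  algo.set_filter(bitset_words, n_records);
  return true;
}

// Illustrative implementation that only records what it was given.
struct example_algo : algo_interface {
  const std::uint32_t* filter = nullptr;
  std::size_t filter_size     = 0;

  void set_filter(const std::uint32_t* bitset_words, std::size_t n_records) override
  {
    filter      = bitset_words;
    filter_size = n_records;
  }
};
```

Passing a plain pointer keeps the common headers free of raft types; each algorithm wrapper would be responsible for converting it into whatever filter object it needs.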
Adapt the ground truth calculation. I think the easiest way here would be to replace the total_count in the calculation of the recall with the number of non-filtered items. This will be noisy (when the count is low) and will not be entirely correct (the recall is averaged across threads/loops), but it's better than nothing.
Note: when `k` is smaller than the `max_k` available in the ground truth file, there's a low-effort way to improve the quality: consider not the first `k` values in the ground truth, but the first `k` non-filtered values in the ground truth.
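The enhancement in the note could be sketched as follows. The helper names (`passes`, `filtered_ground_truth`, `count_matches`), the 32-bit word layout of the bitset, and the use of `std::vector` for a ground truth row are assumptions for illustration only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if record `id` passes the filter (bit set to 1); assumes 32 records per word.
inline bool passes(const std::vector<std::uint32_t>& bitset_words, std::uint32_t id)
{
  assert(id / 32 < bitset_words.size());
  return (bitset_words[id / 32] >> (id % 32)) & 1u;
}

// Walk one ground truth row (max_k ids, nearest first) and keep the first k ids that
// pass the filter. May return fewer than k if the row runs out of passing ids.
inline std::vector<std::uint32_t> filtered_ground_truth(
  const std::vector<std::uint32_t>& gt_row,
  const std::vector<std::uint32_t>& bitset_words,
  std::size_t k)
{
  std::vector<std::uint32_t> out;
  for (std::uint32_t id : gt_row) {
    if (out.size() == k) { break; }
    if (passes(bitset_words, id)) { out.push_back(id); }
  }
  return out;
}

struct recall_counts {
  std::size_t matched;   // returned ids that appear in the filtered ground truth
  std::size_t expected;  // size of the filtered ground truth (at most k)
};

// Count matches for one query; assumes `found` contains no duplicate ids.
inline recall_counts count_matches(const std::vector<std::uint32_t>& found,
                                   const std::vector<std::uint32_t>& gt_row,
                                   const std::vector<std::uint32_t>& bitset_words,
                                   std::size_t k)
{
  auto expected       = filtered_ground_truth(gt_row, bitset_words, k);
  std::size_t matched = 0;
  for (std::size_t i = 0; i < std::min(k, found.size()); ++i) {
    if (std::find(expected.begin(), expected.end(), found[i]) != expected.end()) { ++matched; }
  }
  return {matched, expected.size()};
}
```

Summing `matched` and `expected` over all queries and dividing gives a recall whose denominator reflects how many passing neighbors the ground truth row actually contains, which matters when fewer than `k` of the `max_k` entries pass the filter.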