[FEA] Add filtering option to benchmarks #479

achirkin · 2024-11-20T07:36:47Z

Is your feature request related to a problem? Please describe.
As cuVS algorithms gain more pre-filtering support, we need to be able to benchmark this functionality and compare the algorithms with filtering enabled.

Describe the solution you'd like

Add a search-only option to the benchmark executable, --filter_ratio x, where x is a float between 0 and 1 giving the proportion of records in the index that pass the filter. The default is 1, which keeps the legacy behavior (no filtering).
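A minimal sketch of how the flag could be read and validated, assuming a hypothetical helper `parse_filter_ratio` (the benchmark's actual argument parser is not part of this issue). It accepts both `--filter_ratio 0.1` and `--filter_ratio=0.1` and falls back to 1.0, the legacy no-filtering behavior, when the flag is absent.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: extracts --filter_ratio from the command line.
// Accepts "--filter_ratio 0.1" and "--filter_ratio=0.1".
// Returns 1.0 (no filtering) when the flag is absent.
double parse_filter_ratio(int argc, char** argv)
{
  const std::string flag = "--filter_ratio";
  double ratio           = 1.0;
  for (int i = 1; i < argc; i++) {
    const std::string arg = argv[i];
    std::string value;
    if (arg == flag) {
      if (i + 1 >= argc) { throw std::invalid_argument("--filter_ratio expects a value"); }
      value = argv[++i];
    } else if (arg.rfind(flag + "=", 0) == 0) {  // starts with "--filter_ratio="
      value = arg.substr(flag.size() + 1);
    } else {
      continue;
    }
    ratio = std::stod(value);  // throws std::invalid_argument if not a number
  }
  // The negated form also rejects NaN.
  if (!(ratio >= 0.0 && ratio <= 1.0)) {
    throw std::invalid_argument("--filter_ratio must be within [0, 1]");
  }
  return ratio;
}
```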
Modify the dataset class: add an extra bitset field with one bit per dataset row, and allow generating it (or loading it from a file?). When generating, maybe also expose the random seed as a parameter?
- Note 1: this way, the filter is set up once for the whole benchmark, which keeps overhead low and the comparison fair.
- Note 2: we cannot use raft's bitset here, because the common benchmark headers don't depend on raft.
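A rough sketch of the dataset-side bitset, assuming a hypothetical raft-free `filter_bitset` class that packs one bit per row into 32-bit words (bit set means the row passes the filter); the word type and bit order are assumptions, not raft's layout. The hypothetical generator `generate_filter` marks exactly round(filter_ratio * n_rows) rows using selection sampling (Knuth's Algorithm S), so it needs no index array even for very large datasets, and the explicit seed makes runs reproducible.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical minimal bitset owned by the dataset; no dependency on raft.
// Bit i == 1 means row i passes the filter (may be returned by search).
class filter_bitset {
 public:
  explicit filter_bitset(std::size_t n_rows) : n_rows_(n_rows), words_((n_rows + 31) / 32, 0u) {}

  void set(std::size_t i) { words_[i / 32] |= (std::uint32_t{1} << (i % 32)); }
  bool test(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1u; }

  std::size_t size() const { return n_rows_; }
  const std::uint32_t* data() const { return words_.data(); }  // what set_filter would receive

  std::size_t count() const
  {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n_rows_; i++) { c += test(i) ? 1 : 0; }
    return c;
  }

 private:
  std::size_t n_rows_;
  std::vector<std::uint32_t> words_;
};

// Marks exactly round(filter_ratio * n_rows) rows as passing, uniformly at random.
// Assumes filter_ratio has already been validated to lie in [0, 1].
filter_bitset generate_filter(std::size_t n_rows, double filter_ratio, std::uint64_t seed)
{
  filter_bitset bits(n_rows);
  const auto n_pass =
    static_cast<std::size_t>(std::llround(filter_ratio * static_cast<double>(n_rows)));
  std::mt19937_64 rng(seed);
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  std::size_t selected = 0;
  for (std::size_t i = 0; i < n_rows && selected < n_pass; i++) {
    // Select row i with probability (rows still needed) / (rows remaining);
    // this becomes 1 when every remaining row is needed, so the count is exact.
    const double p = static_cast<double>(n_pass - selected) / static_cast<double>(n_rows - i);
    if (uni(rng) < p) {
      bits.set(i);
      selected++;
    }
  }
  return bits;
}
```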
Pass the bitset filter to the algorithms. I think the easiest way would be to add a new API function, set_filter(bitset ptr), similar to set_search_parameters. It would be called once per benchmark loop, and only if the filter ratio is lower than 1.
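One possible shape for the proposed `set_filter` hook, sketched on a hypothetical, heavily simplified interface: `ann_algo`, `filtered_algo`, and `setup_search` are illustrative names, not the benchmark's real classes. The pointer is non-owning because the dataset keeps the bitset alive for the whole benchmark.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical slice of the benchmark-side algorithm interface.
class ann_algo {
 public:
  virtual ~ann_algo() = default;
  // Called once per benchmark loop, like the search-parameter setter.
  // Algorithms without pre-filtering support keep this default and fail loudly.
  virtual void set_filter(const std::uint32_t* /*bitset*/)
  {
    throw std::runtime_error("filtering is not supported by this algorithm");
  }
};

// An algorithm that supports filtering just remembers the bitset for search().
class filtered_algo : public ann_algo {
 public:
  void set_filter(const std::uint32_t* bitset) override { filter_ = bitset; }
  const std::uint32_t* filter() const { return filter_; }

 private:
  const std::uint32_t* filter_ = nullptr;  // non-owning; the dataset owns the bitset
};

// Benchmark-loop setup: only pass the filter when filtering is actually requested.
void setup_search(ann_algo& algo, const std::uint32_t* bitset, double filter_ratio)
{
  if (filter_ratio < 1.0) { algo.set_filter(bitset); }
}
```

Throwing by default, rather than silently ignoring the bitset, is a deliberate choice in this sketch: an algorithm that ignored the filter would report misleading recall for filtered runs.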
Adapt the ground-truth calculation. I think the easiest way here would be to replace the total_count in the recall calculation with the number of non-filtered items. This will be noisy (when the count is low) and not entirely correct (the recall is averaged across threads/loops), but it's better than nothing.
- Note: when k is smaller than the max_k available in the ground-truth file, the quality can be improved with relatively little effort: instead of the first k values in the ground truth, consider the first k non-filtered values.
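Both recall variants can be sketched in one function, assuming the ground truth is stored row-major as `n_queries x max_k` ids sorted by distance and the search results as `n_queries x k` ids. `filtered_recall` and its `passes` callback are hypothetical. With `scan_full_gt = false` it follows the simple approach (only the first k ground-truth ids; the denominator counts how many of them pass), and with `scan_full_gt = true` it follows the enhancement from the note (the first k passing ids within max_k). Matching is by id only; distance ties are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Returns true if dataset row `id` passes the filter (e.g. a bitset lookup).
using filter_fn = std::function<bool(std::uint32_t)>;

// gt:     n_queries x max_k ground-truth ids (computed without filtering), row-major.
// result: n_queries x k ids returned by the filtered search, row-major.
// scan_full_gt == false: consider only the first k ground-truth ids per query.
// scan_full_gt == true:  consider the first k passing ids within the first max_k.
// Returns NaN if no ground-truth id passes the filter for any query.
double filtered_recall(const std::vector<std::uint32_t>& gt, std::size_t max_k,
                       const std::vector<std::uint32_t>& result, std::size_t k,
                       std::size_t n_queries, const filter_fn& passes, bool scan_full_gt)
{
  std::size_t matches     = 0;
  std::size_t expected    = 0;  // replaces the unfiltered total_count
  const std::size_t limit = scan_full_gt ? max_k : std::min(k, max_k);
  for (std::size_t q = 0; q < n_queries; q++) {
    // Ground-truth ids this query is expected to find under the filter.
    std::unordered_set<std::uint32_t> truth;
    for (std::size_t j = 0; j < limit && truth.size() < k; j++) {
      const std::uint32_t id = gt[q * max_k + j];
      if (passes(id)) { truth.insert(id); }
    }
    expected += truth.size();
    // Count hits; erase on match so a duplicated result id is not counted twice.
    for (std::size_t j = 0; j < k; j++) {
      if (truth.erase(result[q * k + j]) > 0) { matches++; }
    }
  }
  if (expected == 0) { return std::numeric_limits<double>::quiet_NaN(); }
  return static_cast<double>(matches) / static_cast<double>(expected);
}
```

For example, with ground truth `{0, 1, 2, 3, 4, 5}`, k = 2, a filter that keeps only even ids, and results `{0, 4}`, the simple variant sees only id 0 as expected and reports 1.0, while the enhanced variant expects `{0, 2}` and reports 0.5.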


achirkin added the feature request New feature or request label Nov 20, 2024

achirkin added this to VS/ML/DM Primitives Release Board Nov 20, 2024

achirkin moved this to Todo in VS/ML/DM Primitives Release Board Nov 20, 2024

achirkin mentioned this issue Nov 20, 2024

[Feat] CAGRA filtering with BFKNN when sparsity matching threshold #378 (Open)



Comments

achirkin commented Nov 20, 2024