top-K of Streaming #174

Closed
nk2014yj opened this issue Oct 3, 2023 · 8 comments

Comments

@nk2014yj
Contributor

nk2014yj commented Oct 3, 2023

I have two questions.

  1. The call passes top-1 (k = 1) to run_task of StreamingRunner; could you please confirm this value is intended? Thanks. (See the sketch below.)

    run_task(algo, ds, distance, 1, run_count, search_type, private_query, runbook)

  2. Will the number of private queries remain consistent? It will affect the overall time consumption.
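
For reference, a minimal sketch (hypothetical, not the repo's actual code) of what I mean; the argument order is taken from the call quoted above, and TOP_K is just an illustrative name:

    # Hypothetical sketch: naming the hard-coded fourth argument makes the intent explicit.
    TOP_K = 1  # the value under discussion; should streaming search use k > 1?
    run_task(algo, ds, distance, TOP_K, run_count, search_type, private_query, runbook)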

@harsha-simhadri
Owner

Does #177 answer this question?

@nk2014yj
Contributor Author

nk2014yj commented Oct 18, 2023

Our code has already been developed and tested. At the time, since 16 GB of memory was available, we only marked deleted samples.
The 30-million-point dataset consumes about 12 GB of memory, so this has no impact on our algorithm.
With the memory reduced to 8 GB, we would have to modify the algorithm, but we have no manpower to follow up for the time being.
If the memory was lowered to 8 GB to prevent preloading all the data, we can review our code together to confirm that it does not keep the full dataset in memory. Would it be possible to remove the memory limit?

@harsha-simhadri
Owner

harsha-simhadri commented Oct 19, 2023

The intent of the track, as we shared in the competition announcement, is to develop algorithms that can update and compact the index instead of merely marking samples as deleted. We had indicated that the final runbook might be different in order to test this aspect. Would it be possible to change the configuration of your algorithm to make it work with limited memory?

We could also have designed a runbook with 60M points, 16 GB of RAM, and a 2-hour limit. We decided to lower the time limit and the number of points for faster experimentation.

@nk2014yj
Contributor Author

In August, it was stated that 16 GB of RAM would be available for the streaming track. We were only notified of the change to 8 GB in October, which was too late, because we had already finished development.

@nk2014yj
Contributor Author

This is not a problem of algorithm performance, but a problem of the rules being changed at short notice.

@nk2014yj
Contributor Author

In early October, during online communication, there was no mention of modifying the memory requirements.

@nk2014yj
Contributor Author

In the email after the online communication meeting, there was no mention of modifying the memory requirements.

@harsha-simhadri
Owner

harsha-simhadri commented Oct 30, 2023

@nk2014yj I acknowledge my mistake in not warning explicitly about the memory limit in prior communications. I apologize for that and will be careful with my communications in the future.

However, please let me share our intent for this track. We want to encourage algorithms that can adapt to a long stream of updates, including deletions, as the word "streaming" indicates in the algorithms world. Marking an index with tombstones is a good starting point, but we want to think about algorithms that also compact and adapt the index over time to the active set of points. I think we have communicated this intent since the competition announcement. We also stated that the final dataset and runbook were going to be different from the initial choices. If the evaluation used 16 GB of memory with a runbook of 800-byte vectors + a 30M stream of updates, or 400-byte vectors + a 60M stream of updates (with many deletions), it would have been no different qualitatively from using 8 GB of memory and 400-byte vectors + a 30M stream, right? Would you have considered a 60M update stream + a 16 GB limit outside the scope of the rules? The intent of the final runbook was to test whether submissions were actually cleaning up the index or just marking tombstones.
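
To make the distinction concrete, here is a minimal, hypothetical sketch (not any submission's or the framework's actual code) of the two delete strategies, assuming a flat in-memory index; the class and method names are illustrative only:

    # Hypothetical sketch only: contrasts tombstone-marking with real compaction.
    class TombstoneIndex:
        """Marks deleted ids but never frees their vectors, so memory grows with
        the full stream (e.g. roughly 12 GB raw for 30M x 400-byte points)."""
        def __init__(self):
            self.vectors = {}       # id -> vector, never removed
            self.deleted = set()    # tombstones consulted at query time

        def insert(self, ids, vecs):
            for i, v in zip(ids, vecs):
                self.vectors[i] = v

        def delete(self, ids):
            self.deleted.update(ids)        # cheap, but memory is not reclaimed


    class CompactingIndex:
        """Physically removes deleted vectors, so memory tracks the active set;
        this is what the lower memory limit is meant to exercise."""
        def __init__(self):
            self.vectors = {}

        def insert(self, ids, vecs):
            for i, v in zip(ids, vecs):
                self.vectors[i] = v

        def delete(self, ids):
            for i in ids:
                self.vectors.pop(i, None)   # reclaim memory immediately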

That said, I realize that you have spent time developing what you have. Please submit your algorithm if you are able to, and I will run it with 16 GB and publish it on the leaderboard with a note that it requires more than 8 GB. I am also happy to merge, after the competition's deadline, any update you might have that works with lower memory, and to update the leaderboard accordingly.

nk2014yj closed this as completed Nov 8, 2023