top-K of Streaming #174

Closed
nk2014yj opened this issue Oct 3, 2023 · 8 comments

Comments

@nk2014yj
Contributor

nk2014yj commented Oct 3, 2023

I have two questions.

  1. The call passes top-1 (k = 1) to run_task of StreamingRunner; could you please confirm this value is intended? Thanks. (See the sketch below.)

    run_task(algo, ds, distance, 1, run_count, search_type, private_query, runbook)

  2. Will the number of private queries remain consistent? It will affect the overall time consumption.
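
For reference, a minimal sketch (hypothetical, not the repo's actual code) of what I mean; the argument order is taken from the call quoted above, and TOP_K is just an illustrative name:

    # Hypothetical sketch: naming the hard-coded fourth argument makes the intent explicit.
    TOP_K = 1  # the value under discussion; should streaming search use k > 1?
    run_task(algo, ds, distance, TOP_K, run_count, search_type, private_query, runbook)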

@harsha-simhadri
Owner

Does #177 answer this question?

@nk2014yj
Contributor Author

nk2014yj commented Oct 18, 2023

Our code has already been developed and tested. At the time, since 16 GB of memory was available, we only marked deleted samples.
The 30-million-point dataset consumes about 12 GB of memory, so this has no impact on our algorithm.
With the memory reduced to 8 GB, we would have to modify the algorithm, but we have no manpower to follow up for the time being.
If the memory was lowered to 8 GB to prevent preloading all the data, we can review our code together to confirm that it does not keep the full dataset in memory. Would it be possible to remove the memory limit?

@harsha-simhadri
Owner

harsha-simhadri commented Oct 19, 2023

The intent of the track, as we shared in the competition announcement, is to develop algorithms that can update and compact the index instead of merely marking samples as deleted. We had indicated that the final runbook might be different in order to test this aspect. Would it be possible to change the configuration of your algorithm to make it work with limited memory?

We could also have designed a runbook with 60M points, 16 GB of RAM, and a 2-hour limit. We decided to lower the time limit and the number of points for faster experimentation.

@nk2014yj
Contributor Author

In August, it was stated that 16 GB of RAM would be available for the streaming track. We were only notified of the change to 8 GB in October, which was too late, because we had already finished development.

@nk2014yj
Contributor Author

This is not a problem of algorithm performance, but a problem of the rules being changed at short notice.

@nk2014yj
Contributor Author

In early October, during online communication, there was no mention of modifying the memory requirements.

@nk2014yj
Contributor Author

In the email after the online communication meeting, there was no mention of modifying the memory requirements.

@harsha-simhadri
Owner

harsha-simhadri commented Oct 30, 2023

@nk2014yj I acknowledge my mistake in not warning explicitly about the memory limit in prior communications. I apologize for that and will be careful with my communications in the future.

However, please let me share our intent for this track. We want to encourage algorithms that can adapt to a long stream of updates, including deletions, as the word "streaming" indicates in the algorithms world. Marking an index with tombstones is a good starting point, but we want to think about algorithms that also compact and adapt the index over time to the active set of points. I think we have communicated this intent since the competition announcement. We also stated that the final dataset and runbook were going to be different from the initial choices. If the evaluation used 16 GB of memory with a runbook of 800-byte vectors + a 30M stream of updates, or 400-byte vectors + a 60M stream of updates (with many deletions), it would have been no different qualitatively from using 8 GB of memory and 400-byte vectors + a 30M stream, right? Would you have considered a 60M update stream + a 16 GB limit outside the scope of the rules? The intent of the final runbook was to test whether submissions were actually cleaning up the index or just marking tombstones.
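
To make the distinction concrete, here is a minimal, hypothetical sketch (not any submission's or the framework's actual code) of the two delete strategies, assuming a flat in-memory index; the class and method names are illustrative only:

    # Hypothetical sketch only: contrasts tombstone-marking with real compaction.
    class TombstoneIndex:
        """Marks deleted ids but never frees their vectors, so memory grows with
        the full stream (e.g. roughly 12 GB raw for 30M x 400-byte points)."""
        def __init__(self):
            self.vectors = {}       # id -> vector, never removed
            self.deleted = set()    # tombstones consulted at query time

        def insert(self, ids, vecs):
            for i, v in zip(ids, vecs):
                self.vectors[i] = v

        def delete(self, ids):
            self.deleted.update(ids)        # cheap, but memory is not reclaimed


    class CompactingIndex:
        """Physically removes deleted vectors, so memory tracks the active set;
        this is what the lower memory limit is meant to exercise."""
        def __init__(self):
            self.vectors = {}

        def insert(self, ids, vecs):
            for i, v in zip(ids, vecs):
                self.vectors[i] = v

        def delete(self, ids):
            for i in ids:
                self.vectors.pop(i, None)   # reclaim memory immediately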

That said, I realize that you have spent time developing what you have. Please submit your algorithm if you are able to, and I will run it with 16 GB and publish it on the leaderboard with a note that it requires more than 8 GB. I am also happy to merge, after the competition's deadline, any update you might have that works with lower memory, and to update the leaderboard accordingly.

nk2014yj closed this as completed Nov 8, 2023