[FEATURE] Add Train Model KNN Workload #332

Closed
2 tasks
finnroblin opened this issue Jun 19, 2024 · 3 comments · Fixed by #333
Labels
enhancement New feature or request

Comments

@finnroblin
Contributor

finnroblin commented Jun 19, 2024

Is your feature request related to a problem?

Customers may want to benchmark approximate k-NN search algorithms that require a training step. For example, the k-NN plugin with the FAISS engine and IVF method requires a training step that clusters the database vectors. A search then compares the query against the cluster centroids and scans only the vectors in the nearest clusters, rather than the entire database.
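
To illustrate why IVF needs a training step before it can serve searches, here is a toy pure-Python sketch. It is not the FAISS implementation: the function names are hypothetical, and the clustering is a plain k-means with a fixed iteration count. The parameter names `nlist` (number of clusters) and `nprobes` (clusters scanned per query) are borrowed from the IVF method's terminology.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda i: squared_l2(vector, centroids[i]))


def train_centroids(vectors, nlist, iterations=10, seed=0):
    """Training step: plain k-means that learns `nlist` centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    for _ in range(iterations):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [
            [sum(col) / len(bucket) for col in zip(*bucket)] if bucket else centroids[i]
            for i, bucket in enumerate(buckets)
        ]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Indexing step: assign every vector id to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return lists


def search(query, vectors, centroids, lists, k, nprobes):
    """Search step: scan only the `nprobes` clusters whose centroids are closest."""
    probe = sorted(range(len(centroids)), key=lambda i: squared_l2(query, centroids[i]))[:nprobes]
    candidates = [idx for c in probe for idx in lists[c]]
    return sorted(candidates, key=lambda idx: squared_l2(query, vectors[idx]))[:k]


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    cents = train_centroids(data, nlist=2)
    inv = build_inverted_lists(data, cents)
    print(search([10.2, 10.1], data, cents, inv, k=2, nprobes=1))
```

With `nprobes=1` the query is compared against two centroids and then only three of the six vectors, which is the saving the training step buys.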

No existing workload supports this use case, and OSB has no operation type that calls the k-NN training API.
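
For reference, the request that such an operation type would need to issue might look roughly like the sketch below. It only builds the method, path, and body; it sends nothing. The endpoint and body fields follow the k-NN plugin's Train API as I understand it and should be checked against the plugin documentation for your version. The helper name, model ID, index name, field name, and parameter values are all hypothetical.

```python
import json


def build_train_request(model_id, training_index, training_field, dimension,
                        nlist=128, nprobes=8):
    """Build (method, path, body) for a k-NN model training call.

    Assumption: the endpoint shape and field names mirror the k-NN plugin's
    Train API; verify them against the docs before relying on this.
    """
    path = f"/_plugins/_knn/models/{model_id}/_train"
    body = {
        "training_index": training_index,  # index that holds the training vectors
        "training_field": training_field,  # knn_vector field to read them from
        "dimension": dimension,
        "description": "IVF model trained for benchmarking",
        "method": {
            "name": "ivf",
            "engine": "faiss",
            "space_type": "l2",
            "parameters": {"nlist": nlist, "nprobes": nprobes},
        },
    }
    return "POST", path, body


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    method, path, body = build_train_request("benchmark-model", "train-index", "train-field", 128)
    print(method, path)
    print(json.dumps(body, indent=2))
```

Training runs asynchronously in the plugin, so a benchmark operation would presumably also need to poll the model until it reports that it has been created before starting the search phase.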

What solution would you like?

Add a workload that benchmarks both training a model (such as FAISS IVF) and searching with it. This workload would require code additions in the OpenSearch Benchmark repo to support the initial training operation.

Do you have any additional context?

There is a benchmarking procedure for training in the k-NN plugin repo. However, an automated workload in the opensearch-benchmark-workloads repository would be a better customer experience. A workload already exists for the approximate k-NN methods that do not require training, such as HNSW.

Subtasks:

@gkamat
Collaborator

gkamat commented Jun 20, 2024

@VijayanB perhaps you can comment on this? Thanks.

@VijayanB
Member

VijayanB commented Jun 24, 2024

@gkamat This feature has not been added to OSB yet. We are still using OSB from k-NN to execute this operation. As part of this task, we can deprecate the operation in k-NN and move one step closer to using this repo as the single repo for all vector search benchmarks.

@IanHoang
Collaborator

Both PRs have been merged into mainline in respective repos.


4 participants