Add reindex ITs #446
Conversation
Signed-off-by: zane-neo <[email protected]>
Codecov Report
@@            Coverage Diff            @@
##               main     #446   +/-   ##
=========================================
  Coverage     84.37%   84.37%
  Complexity      498      498
=========================================
  Files            40       40
  Lines          1491     1491
  Branches        228      228
=========================================
  Hits           1258     1258
  Misses          133      133
  Partials        100      100
Signed-off-by: zane-neo <[email protected]>
@zane-neo can we add reindex ITs for the other processors too?
@zane-neo any update on this PR?
The issue in ml-commons is not specific to an individual processor; it's more of a generic issue (user info not restored after the listener ran), so I thought covering it with one or two processors is enough.
I don't agree with this. It would be good to have reindex ITs for at least all the processors that work with local models, like splade and text embedding. The text + image processor works on a remote model, so we cannot integrate it here.
This PR already includes the splade and text embedding processors.
Using the text_image_embedding processor with only a text field should work with a local model. The processor uses mlClient to call the model, and the model type is abstracted by the client.
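For illustration, a minimal sketch (not code from this PR) of registering a text-only text_image_embedding pipeline via the low-level REST client. It assumes the call is made from an IT class extending OpenSearchRestTestCase (so client() and org.opensearch.client.Request are available) and that the processor accepts its documented model_id, embedding, and field_map parameters; the pipeline name, model id, and field name are placeholders.

```java
// Hypothetical helper inside an IT class extending OpenSearchRestTestCase.
// Only the "text" key of field_map is set, so the processor generates
// embeddings from text alone via the locally deployed model.
private void createTextOnlyTextImagePipeline(String modelId) throws Exception {
    Request pipeline = new Request("PUT", "/_ingest/pipeline/text-image-pipeline");
    pipeline.setJsonEntity("{\n"
        + "  \"processors\": [{\n"
        + "    \"text_image_embedding\": {\n"
        + "      \"model_id\": \"" + modelId + "\",\n"
        + "      \"embedding\": \"vector_embedding\",\n"
        + "      \"field_map\": { \"text\": \"passage_text\" }\n"
        + "    }\n"
        + "  }]\n"
        + "}");
    client().performRequest(pipeline);
}
```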
It's a text embedding case, right?
I think it's not. The embeddings are text-only, that's correct, but the processor is text_image. That processor can work with text, image, and text + image. If we want to test that reindexing works with all processors, it would not be correct to test with text_embedding and assume that the text_image_embedding processor is also covered.
The current tests cover local models, which proves the correctness of the ml-commons code; no need to add other cases, IMO.
This is not the sole purpose of these tests. If the purpose were only the correctness of the ML code, the tests should live in MLCommons. We want to cover as many use cases as possible in which an ingest processor can be used, and reindexing is one of those use cases. I know we started this PR to make sure the bug we got in #386 can be caught in the future, but that is not the sole purpose of the tests.
There are quite a lot of different flows/features in OpenSearch, e.g. shrink index, split index, restore index, etc. Do we have a way to cover all of these when implementing a new feature, to ensure they're not breaking? Is covering only reindex enough?
Given the issue at hand, we should start with reindex and then add the cases you mentioned after this. We can cut a GitHub issue to make sure we cover those cases too.
Is there already a generic approach to address this? Something that can be used across all plugins, so it doesn't need to be implemented in each plugin separately?
As per my understanding, there is no such thing. The main point I was trying to make is that you are adding ITs only for the sparse and text embedding processors. There are other processors in the plugin too, so it would be better if you can add reindex ITs for those processors as well.
I will do more testing on the open-source side to reproduce this issue, and will reopen this or create a new PR. Closing this for now.
Description
Add reindex ITs in neural-search. A hedged sketch of what such a test could look like is shown below.
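The following is a minimal sketch, assuming a test class based on OpenSearchRestTestCase and the text_embedding processor's documented model_id/field_map parameters; the pipeline name, index names, and model id are illustrative placeholders, not the fixtures actually added in this PR.

```java
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.test.rest.OpenSearchRestTestCase;

public class ReindexWithTextEmbeddingIT extends OpenSearchRestTestCase {

    public void testReindexAppliesIngestPipeline() throws Exception {
        // Create an ingest pipeline running the text_embedding processor
        // against a locally deployed model (model id is a placeholder).
        Request pipeline = new Request("PUT", "/_ingest/pipeline/text-embedding-pipeline");
        pipeline.setJsonEntity("{\n"
            + "  \"processors\": [{\n"
            + "    \"text_embedding\": {\n"
            + "      \"model_id\": \"<local-model-id>\",\n"
            + "      \"field_map\": { \"passage_text\": \"passage_embedding\" }\n"
            + "    }\n"
            + "  }]\n"
            + "}");
        client().performRequest(pipeline);

        // Target index uses the pipeline as its default ingest pipeline.
        Request target = new Request("PUT", "/target-index");
        target.setJsonEntity("{ \"settings\": { \"index.default_pipeline\": \"text-embedding-pipeline\" } }");
        client().performRequest(target);

        // Seed the source index with a plain document (no embedding yet).
        Request doc = new Request("PUT", "/source-index/_doc/1?refresh=true");
        doc.setJsonEntity("{ \"passage_text\": \"hello world\" }");
        client().performRequest(doc);

        // Reindex source into target; the pipeline should generate embeddings.
        Request reindex = new Request("POST", "/_reindex?refresh=true");
        reindex.setJsonEntity("{ \"source\": { \"index\": \"source-index\" }, \"dest\": { \"index\": \"target-index\" } }");
        Response response = client().performRequest(reindex);
        assertEquals(200, response.getStatusLine().getStatusCode());

        // A complete test would fetch target-index/_doc/1 and assert that
        // passage_embedding is present and non-empty after the reindex.
    }
}
```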
Issues Resolved
Issue: #386
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.