-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for radial search in exact search #2174
Add support for radial search in exact search #2174
Conversation
This depends on #2136 |
cbd28c4
to
a2936f3
Compare
I am skipping changelog. While merging back to main will update changelog. |
e0ab310
to
0b33a1b
Compare
*/ | ||
private boolean isExactSearchRequire(final LeafReaderContext context, final int filterIdsCount, final int annResultCount) { | ||
if (annResultCount == 0 && isMissingNativeEngineFiles(context)) { | ||
log.debug("Perform exact search after approximate search since no native engine files are available"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case unexpected so we use debug level log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is in path of query, recommendation is to avoid info.
dc81e8f
to
230f286
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn't seem to be updates in unit tests, will it be present in another iteration?
src/main/java/org/opensearch/knn/index/query/ExactSearcher.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/ExactSearcher.java
Outdated
Show resolved
Hide resolved
When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]>
6c47860
to
abc6e69
Compare
Added unit test |
@SneakyThrows | ||
public void testRadialSearch_whenNoEngineFiles_thenPerformExactSearch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought I see we are using this test to check ExactSearch for radial search and avoided the Unit test for ExactSearch with radial search threshold. See if you want to mock ExactSearcher search class and all its invocation to make it proper Unit test. Since exact search is not getting added more and more extra logic do you see we should have separate unit test class for ExactSearcher class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I can create Github issue for refactor. I see two option, 1) Keep this PR as it is and refactor as part of GH issue since there are more than 3 test cases that uses exact search, or 2) I can only do it for radial search, as part of GH issue others can be moved later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the option 2. Lets do atleast for this test case and create a GH issue too for the test refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated this PR. Will create GH issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
7466091
to
efa7cb3
Compare
Signed-off-by: Vijayan Balasubramanian <[email protected]>
efa7cb3
to
e51ae11
Compare
@@ -71,15 +106,17 @@ private Map<Integer, Float> scoreAllDocs(KNNIterator iterator) throws IOExceptio | |||
return docToScore; | |||
} | |||
|
|||
private Map<Integer, Float> searchTopK(KNNIterator iterator, int k) throws IOException { | |||
private Map<Integer, Float> searchTopCandidates(KNNIterator iterator, int limit, @NonNull Predicate<Float> filterScore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why k variable is renamed to limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method will be called by radialSearch as well. Hence, renamed it from K to be more generic.
96663fd
into
opensearch-project:feature/build-vector-ds-greedily
* Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
…t search when there are no engine files (#2188) * Introduce new setting to configure when to build graph during segment creation (#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add integration test for binary vector values (#2142) Signed-off-by: Vijayan Balasubramanian <[email protected]> * Allow build graph greedily for quantization scenarios (#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add exact search if no native engine files are available (#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add support for radial search in exact search (#2174) * Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
…t search when there are no engine files (#2188) * Introduce new setting to configure when to build graph during segment creation (#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add integration test for binary vector values (#2142) Signed-off-by: Vijayan Balasubramanian <[email protected]> * Allow build graph greedily for quantization scenarios (#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add exact search if no native engine files are available (#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add support for radial search in exact search (#2174) * Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> (cherry picked from commit 5a56829)
…nd perform exact search when there are no engine files (#2201) * Add support to build vector data structures greedily and perform exact search when there are no engine files (#2188) * Introduce new setting to configure when to build graph during segment creation (#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add integration test for binary vector values (#2142) Signed-off-by: Vijayan Balasubramanian <[email protected]> * Allow build graph greedily for quantization scenarios (#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add exact search if no native engine files are available (#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add support for radial search in exact search (#2174) * Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> (cherry picked from commit 5a56829) * Fix compilation issue due to package error Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> Co-authored-by: Vijayan Balasubramanian <[email protected]>
Description
When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method
is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines.
Related Issues
Part of #1942
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.