Sub-iterators of ConjunctionDISI are not on the same document #13616
Comments
Thanks for filing this issue, @ChapterSevenSeeds. I am able to reproduce it with the steps you mentioned. I will take a stab at it and post updates as I get time.
Thanks, @expani. Removing the untriaged label.
The match rule … gets converted into an empty (no-match) query because the custom analyzer removes stop words. The final IntervalsSource created for the IntervalQuery is a DisjunctionIntervalsSource and looks like this: …
OpenSearch represents the IntervalsIterator for the empty rule with its own copy of NO_INTERVALS. The intervals iterator for the rule … is based on PostingsEnum, which represents an unpositioned docId using -1. When these two iterators are combined in a conjunction, their current docIDs differ, which causes the reported error to be thrown. Lucene also represents an IntervalsIterator for … I made the same changes to the … The diff is: …
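To make the failure mode concrete, here is a dependency-free Java sketch of the docID contract and of a same-document check modelled on the one that raises this error message. The interface and class names are hypothetical stand-ins rather than Lucene classes. Because this thread does not show which docID OpenSearch's copy of NO_INTERVALS reports before iteration, the NO_MORE_DOCS value returned by EagerlyExhaustedIterator is only an assumption chosen to produce a mismatch.

```java
import java.util.List;

/**
 * Dependency-free model of the "docID() is -1 until the iterator is advanced"
 * contract and of a same-document check performed before intersecting
 * sub-iterators. All names are hypothetical stand-ins, not Lucene classes.
 */
final class ConjunctionCheckSketch {

    /** Mirrors the value of DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Simplified iterator contract: docID() is -1 before the first nextDoc() call. */
    interface DocIterator {
        int docID();
        int nextDoc();
    }

    /** Postings-like iterator over sorted doc IDs; starts unpositioned at -1. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int index = -1;
        private int doc = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        @Override public int docID() { return doc; }

        @Override public int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }
    }

    /** Hypothetical empty iterator that reports itself as exhausted before iteration starts. */
    static final class EagerlyExhaustedIterator implements DocIterator {
        @Override public int docID() { return NO_MORE_DOCS; }
        @Override public int nextDoc() { return NO_MORE_DOCS; }
    }

    /** Empty iterator that honours the contract: -1 until nextDoc() is called. */
    static final class ContractEmptyIterator implements DocIterator {
        private int doc = -1;
        @Override public int docID() { return doc; }
        @Override public int nextDoc() { return doc = NO_MORE_DOCS; }
    }

    /** Throws if the sub-iterators do not all report the same current docID. */
    static void checkSameDocument(List<DocIterator> iterators) {
        int curDoc = iterators.get(0).docID();
        for (DocIterator it : iterators) {
            if (it.docID() != curDoc) {
                throw new IllegalArgumentException(
                    "Sub-iterators of ConjunctionDISI are not on the same document!");
            }
        }
    }

    public static void main(String[] args) {
        // Both iterators start at -1, so the check passes.
        checkSameDocument(List.of(new ArrayIterator(3, 7), new ContractEmptyIterator()));
        System.out.println("contract-conforming pair: ok");

        // -1 versus NO_MORE_DOCS: the check fails with the error from this issue.
        try {
            checkSameDocument(List.of(new ArrayIterator(3, 7), new EagerlyExhaustedIterator()));
        } catch (IllegalArgumentException e) {
            System.out.println("mismatched pair: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that every sub-iterator, including one that can never match, has to report the same starting docID (-1 when unpositioned) for the check to pass.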
We should make Lucene's NO_INTERVALS public and use it in OpenSearch instead of maintaining a copy, to avoid such issues in the future.
@expani, thank you for root-causing this issue. I completely agree on reusing the …
@jainankitk Yes, I opened a PR for this in Lucene.
Thank you for the Lucene PR!
This problem often shows up in patent search. I opened apache/lucene#13388. There are more issues, but making that constant public is wrong; there should be a method that creates an empty interval with a reason.
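A rough sketch of the factory-with-a-reason idea, as opposed to exposing a shared constant. Everything here (Source, NoMatchSource, emptySource) is hypothetical and only illustrates the API shape being argued for; it is not Lucene's implementation.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: an empty source created through a factory that
 * records why it is empty, instead of a shared public singleton.
 */
final class EmptySourceSketch {

    /** Minimal stand-in for an intervals source. */
    interface Source {
        boolean canMatch();
    }

    /** Empty source that remembers the reason it was created. */
    static final class NoMatchSource implements Source {
        private final String reason;

        NoMatchSource(String reason) {
            this.reason = Objects.requireNonNull(reason, "reason");
        }

        @Override public boolean canMatch() { return false; }

        String reason() { return reason; }

        @Override public String toString() { return "NoMatch(" + reason + ")"; }
    }

    /** Hypothetical factory: each call site states why it produced an empty source. */
    static Source emptySource(String reason) {
        return new NoMatchSource(reason);
    }

    public static void main(String[] args) {
        Source s = emptySource("all query terms were removed by the analyzer");
        System.out.println(s + " canMatch=" + s.canMatch());
    }
}
```

Compared with a public singleton, a factory lets each caller record why the source is empty, and that reason surfaces in toString() output when debugging a query that unexpectedly matches nothing.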
P.S.: I am inclined to revert the above PR; it does not fit Lucene's API patterns.
P.S.: A workaround that can already be used in OpenSearch code is the following factory method: … It requires at least one match from an empty list of intervals, so it rewrites to a …
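Why an at-least-one requirement over an empty list behaves as "no match" can be shown with a doc-level model that ignores positions. AtLeastSketch.atLeast below is a hypothetical stand-in, not Lucene's Intervals.atLeast; it only demonstrates the counting rule.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Doc-level model of an "at least N of these sources" combinator. Each source
 * is reduced to the set of doc IDs it matches; positions are ignored.
 * Hypothetical illustration only, not Lucene's implementation.
 */
final class AtLeastSketch {

    static Set<Integer> atLeast(int minShouldMatch, List<Set<Integer>> sources) {
        // Fewer sources than required matches means nothing can ever match,
        // so atLeast(1) over an empty list is equivalent to "no match".
        if (minShouldMatch > sources.size()) {
            return Set.of();
        }
        // Count how many sources match each document.
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> source : sources) {
            for (int doc : source) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        // Keep documents matched by at least minShouldMatch sources, in doc order.
        Set<Integer> result = new TreeSet<>();
        counts.forEach((doc, count) -> {
            if (count >= minShouldMatch) {
                result.add(doc);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        System.out.println(atLeast(1, List.of()));                           // []
        System.out.println(atLeast(2, List.of(Set.of(1, 2), Set.of(2, 3)))); // [2]
    }
}
```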
Agree on putting this behind an API. Returning a …
I left a comment about that in the PR. I agree that we need more tests. So far, atLeast(1) has always worked for me. Lucene always uses null for iterators that do not match anything.
See, for example, how MatchNoDocsQuery handles DocIdSetIterator and ScorerSupplier: https://github.com/apache/lucene/blob/22d50be2eab84c9a75ea65f55fcc4356724faba1/lucene/core/src/java/org/apache/lucene/search/MatchNoDocsQuery.java#L47
Yes, returning null worked for this case in OpenSearch as well, but I added a comment about supporting both styles of iterators.
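A sketch of the null convention discussed above, assuming a simplified iterator interface (all names are hypothetical, not Lucene's): the conjunction returns null as soon as one required clause is null, so no sub-iterators are combined for a clause that cannot match.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "null means matches nothing": a conjunction over
 * sub-iterators short-circuits to null when any clause is null.
 */
final class NullConventionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Simplified iterator: docID() is -1 until nextDoc() is first called. */
    interface DocIterator {
        int docID();
        int nextDoc();
    }

    /** Iterator over a sorted array of doc IDs. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int index = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        @Override public int docID() {
            if (index < 0) {
                return -1;
            }
            return index < docs.length ? docs[index] : NO_MORE_DOCS;
        }

        @Override public int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            return docID();
        }
    }

    /** Drains an iterator into a list of doc IDs. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            docs.add(doc);
        }
        return docs;
    }

    /**
     * Returns null if any clause is null (known to match nothing); otherwise the
     * doc IDs present in every clause. An empty clause list yields an empty list here.
     */
    static List<Integer> conjunction(DocIterator... clauses) {
        for (DocIterator clause : clauses) {
            if (clause == null) {
                return null;
            }
        }
        List<Integer> result = null;
        for (DocIterator clause : clauses) {
            List<Integer> docs = drain(clause);
            if (result == null) {
                result = docs;
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : result;
    }

    public static void main(String[] args) {
        System.out.println(conjunction(new ArrayIterator(1, 4, 9), new ArrayIterator(4, 9, 12))); // [4, 9]
        System.out.println(conjunction(new ArrayIterator(1, 4, 9), null));                       // null
    }
}
```

Returning null separates "known up front to match nothing" from an empty result that had to be computed. An empty, non-null iterator can work too, but only if it follows the same starting-docID contract as the other sub-iterators.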
Describe the bug
When performing an intervals query on a field with a custom mapping and analyzer, I get an IllegalArgumentException with the following reason:
Sub-iterators of ConjunctionDISI are not on the same document!
I am not sure whether the error is caused by our custom mapping or analyzers, or by a bug somewhere. Any insight is greatly appreciated.
Related component
Search
To Reproduce
Expected behavior
I would expect the intervals query to succeed and return the document created in the reproduction steps.
Additional Details
opensearchproject/opensearch:latest
Removing max_gaps from the query or setting it to -1.
Removing the text_general_search analyzer from the mapping.
Removing stop or lowercase from the analyzer filter.
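The analyzer-related items can be illustrated with a toy analysis chain. This Java sketch assumes a whitespace tokenizer, a lowercase filter that runs before the stop filter, and a small lowercase-only stop list. The issue's actual text_general_search mapping and query text are not reproduced here, so the input "The Of" is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy analysis chain: whitespace tokenizer, optional lowercase filter, then an
 * optional stop filter. The stop list and filter order are assumptions.
 */
final class StopWordSketch {

    // Hypothetical lowercase-only stop list.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    static List<String> analyze(String text, boolean lowercase, boolean stop) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token; skip it
            }
            String term = lowercase ? token.toLowerCase(Locale.ROOT) : token;
            if (stop && STOP_WORDS.contains(term)) {
                continue; // dropped by the stop filter
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        String query = "The Of";
        // Full chain: every token is a stop word after lowercasing, so no terms remain.
        System.out.println(analyze(query, true, true));  // []
        // Without the stop filter, the lowercased terms survive.
        System.out.println(analyze(query, true, false)); // [the, of]
        // Without lowercasing, capitalized tokens miss the lowercase stop list.
        System.out.println(analyze(query, false, true)); // [The, Of]
    }
}
```

When every token of a rule is dropped, the rule has no terms left, which is consistent with the comment above that the match rule was converted into an empty (no-match) query.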