-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not pass negative scores into function_score or script_score queries #13627
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -52,8 +52,14 @@ | |||
public class ScriptScoreFunction extends ScoreFunction { | ||||
|
||||
static final class CannedScorer extends Scorable { | ||||
protected int docid; | ||||
protected float score; | ||||
private int docid; | ||||
private float score; | ||||
|
||||
public void score(float subScore) { | ||||
// We check to make sure the script score function never makes a score negative, but we need to make | ||||
// sure the script score function does not receive negative input. | ||||
this.score = Math.max(0.0f, subScore); | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @msfroh don't we try to hide the problem instead of fixing the source of negative scoring? As you mentioned, we do catch that in some cases (and thrown an exception) but it slips in other, why cannot we fix that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was some discussion about this on #7860. Essentially, the scoring logic in
As @ylwu-amzn called out on the issue, any change we make to the Meanwhile, this change just turns a query that was throwing an exception (which is what people have complained about) into one that doesn't. We could possibly wrap the query returned from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @reta -- I spent some more time today trying to prevent the negative score from bubbling up. So far as I can tell, this is the query that's returning the negative scores:
Unfortunately, that just returns a Lucene I believe it's specifically tl;dr: Preventing the negative scores from getting into function/script scoring functions is hard. This I'd like to commit this fix as-is to address the immediate problem. We can address the complicated There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When we enter the
The relevant piece of the explain output is:
That I think I can actually fix it by guaranteeing that So, after saying that fixing the underlying problem is too hard, I think I can fix the underlying problem instead. 😁 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here we go: #13829 It may still be worth merging this PR as an extra layer of protection, but I'd need to update / remove the YAML REST test, because we no longer get a negative score that we force to zero. Alternatively, we could just close this PR, since the underlying problem is (probably) fixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot for looking into this @msfroh (I learn quite a few things from your clear explanation) |
||||
} | ||||
|
||||
@Override | ||||
public int docID() { | ||||
|
@@ -105,7 +111,7 @@ public LeafScoreFunction getLeafScoreFunction(LeafReaderContext ctx) throws IOEx | |||
public double score(int docId, float subQueryScore) throws IOException { | ||||
leafScript.setDocument(docId); | ||||
scorer.docid = docId; | ||||
scorer.score = subQueryScore; | ||||
scorer.score(subQueryScore); | ||||
double result = leafScript.execute(null); | ||||
if (result < 0f) { | ||||
throw new IllegalArgumentException("script score function must not produce negative scores, but got: [" + result + "]"); | ||||
|
@@ -119,7 +125,7 @@ public Explanation explainScore(int docId, Explanation subQueryScore) throws IOE | |||
if (leafScript instanceof ExplainableScoreScript) { | ||||
leafScript.setDocument(docId); | ||||
scorer.docid = docId; | ||||
scorer.score = subQueryScore.getValue().floatValue(); | ||||
scorer.score(subQueryScore.getValue().floatValue()); | ||||
exp = ((ExplainableScoreScript) leafScript).explain(subQueryScore, functionName); | ||||
} else { | ||||
double score = score(docId, subQueryScore.getValue().floatValue()); | ||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
/* | ||
* SPDX-License-Identifier: Apache-2.0 | ||
* | ||
* The OpenSearch Contributors require contributions made to | ||
* this file be licensed under the Apache-2.0 license or a | ||
* compatible open source license. | ||
*/ | ||
|
||
package org.opensearch.index.query; | ||
|
||
import org.apache.lucene.index.LeafReaderContext; | ||
import org.apache.lucene.search.DocIdSetIterator; | ||
import org.apache.lucene.search.Explanation; | ||
import org.apache.lucene.search.IndexSearcher; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.QueryVisitor; | ||
import org.apache.lucene.search.ScoreMode; | ||
import org.apache.lucene.search.Scorer; | ||
import org.apache.lucene.search.Weight; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* Similar to Lucene's BoostQuery, but will accept negative boost values (which is normally wrong, since scores | ||
* should not be negative). Useful for testing that other query types guard against negative input scores. | ||
*/ | ||
public class NegativeBoostQuery extends Query { | ||
private final Query query; | ||
private final float boost; | ||
|
||
public NegativeBoostQuery(Query query, float boost) { | ||
if (boost >= 0) { | ||
throw new IllegalArgumentException("Expected negative boost. Use BoostQuery if boost is non-negative."); | ||
} | ||
this.boost = boost; | ||
this.query = query; | ||
} | ||
|
||
@Override | ||
public String toString(String field) { | ||
StringBuilder builder = new StringBuilder(); | ||
builder.append("("); | ||
builder.append(query.toString(field)); | ||
builder.append(")"); | ||
builder.append("^"); | ||
builder.append(boost); | ||
return builder.toString(); | ||
} | ||
|
||
@Override | ||
public void visit(QueryVisitor visitor) { | ||
query.visit(visitor); | ||
} | ||
|
||
@Override | ||
public boolean equals(Object other) { | ||
return sameClassAs(other) && equalsTo(getClass().cast(other)); | ||
} | ||
|
||
private boolean equalsTo(NegativeBoostQuery other) { | ||
return query.equals(other.query) && Float.floatToIntBits(boost) == Float.floatToIntBits(other.boost); | ||
} | ||
|
||
@Override | ||
public int hashCode() { | ||
int h = classHash(); | ||
h = 31 * h + query.hashCode(); | ||
h = 31 * h + Float.floatToIntBits(boost); | ||
return h; | ||
} | ||
|
||
@Override | ||
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException { | ||
final float negativeBoost = this.boost; | ||
Weight delegate = query.createWeight(searcher, scoreMode, boost); | ||
return new Weight(this) { | ||
@Override | ||
public Explanation explain(LeafReaderContext context, int doc) throws IOException { | ||
return delegate.explain(context, doc); | ||
} | ||
|
||
@Override | ||
public Scorer scorer(LeafReaderContext context) throws IOException { | ||
Scorer delegateScorer = delegate.scorer(context); | ||
return new Scorer(this) { | ||
@Override | ||
public DocIdSetIterator iterator() { | ||
return delegateScorer.iterator(); | ||
} | ||
|
||
@Override | ||
public float getMaxScore(int upTo) throws IOException { | ||
return delegateScorer.getMaxScore(upTo); | ||
} | ||
|
||
@Override | ||
public float score() throws IOException { | ||
return delegateScorer.score() * negativeBoost; | ||
} | ||
|
||
@Override | ||
public int docID() { | ||
return delegateScorer.docID(); | ||
} | ||
}; | ||
} | ||
|
||
@Override | ||
public boolean isCacheable(LeafReaderContext ctx) { | ||
return delegate.isCacheable(ctx); | ||
} | ||
}; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, so the functional score on a document will be at least zero, instead of negative. This will fix the score per document level.
Wondering if this also fix the per field level score is negative? can we run profile api on yaml test and check perFieldSimilarity score is negative or zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic focuses on the input score passed to a function or script, regardless of where that score came from.
Both before and after this change, we have checks that the score returned by a function or script is not negative (and throw an exception if it is). Unfortunately, this meant that a function/script that just passes a score through would throw an exception if the input score is negative.
So, for any case where a function or script gets an illegal negative value as input, this change defensively treats it as zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome! thank you