-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FILLNULL command in PPL (#3032) #3075
Add FILLNULL command in PPL (#3032) #3075
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! Could you add some integration tests? They live here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add an ExplainTest to validate that you get an Eval physical plan when using FILLNULL.
@MaxKsyunz I have added some integration tests. |
@jduo I have added an explain test. |
240d0d6
to
7d17615
Compare
@YANG-DB The docs have been updated to match the docs for Spark. The sample queries are still different, since they are run in this repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanj-bitquill plz fix the keywordsCanBeId
list and we can merge
@YANG-DB I have added it. |
Can we do this?
|
: USING nullableField EQUAL nullReplacement (COMMA nullableField EQUAL nullReplacement)* | ||
; | ||
|
||
nullableField |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure the purpose for nullableField
and nullReplacement
, except for explicit trees. But 🤷 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we could use things like fieldList
instead of nullableField (COMMA nullableField)*
maybe something like:
fillNullWithTheSameValue
: WITH nullReplacement = valueExpression IN nullableFieldList = fieldList
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and
fillNullWithFieldVariousValues
: USING nullReplacementExpression (COMMA nullReplacementExpression)*
;
nullReplacementExpression
: nullableField = fieldExpression EQUAL nullReplacement = valueExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we're using valueExpression
instead of expression
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want an expression that produces a single value. expression
could produce a column such as x + 5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@acarbonetto For you first comment here about using fieldList
, it looks like it makes sense. I took the syntax directly from the parser file in opensearch-spark
. If I make the change you are suggesting here, then the two files will get further out of sync.
There are a few ways forward:
- Take this syntax as is and clean it up when the parser files are unified
- Fix the syntax here and deal with the mismatch later on when unifying the parser files
- Fix the syntax here and create a PR for
opensearch-spark
with a similar change
Do you have any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. @YANG-DB do you have a preference? Should we raise an issue and tag the issue to be fixed in opensearch-project/piped-processing-language#23?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanj-bitquill @acarbonetto
I would prefer going with the 3rd option (Fix the syntax here and create a PR for opensearch-spark with a similar change
)
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here and created a PR for the Spark project.
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Outdated
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/antlr/PPLSyntaxParserTest.java
Outdated
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
Outdated
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
Outdated
Show resolved
Hide resolved
@Test | ||
public void testFillNullVariousValues() { | ||
assertEquals( | ||
"source=t | fillnull using f1 = 0, f2 = -1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds this anonymize the expression?
e.g.
"source=t | fillnull using f1 = ***, f2 = ***",
reference: testAndExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. Using a value of ***
would likely change the data type in this case since it looks like f1
and f2
are int values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The anonymizer should be removing all user-defined values and anonymizing the logger output.
You can look at other examples in this test file for anonymized output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting way off track. Only null values are replaced. This feature is not at all suited for anonymizing data (all non null values would be unchanged). If you want to anonymize data, you could use eval
to create a new field and then fields
to include the new field and ignore the old field.
* Add FILLNULL command in PPL Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
…nymizer.java Formatting fix Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>
…nymizerTest.java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>
….java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>
…Test.java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
5fd3783
to
8559e1c
Compare
@normanj-bitquill LGTM
|
Description
Adds the FILLNULL command for PPL. FILLNULL will replace NULL values in specified fields.
Related Issues
Resolves #3032
Based on this PR for Spark: opensearch-project/opensearch-spark#723
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
An example query using fillnull.