Add FILLNULL command in PPL (#3032) #3075

normanj-bitquill · 2024-10-16T19:55:17Z

Description

Adds the FILLNULL command for PPL. FILLNULL will replace NULL values in specified fields.

Related Issues

Resolves #3032
Based on this PR for Spark: opensearch-project/opensearch-spark#723

Check List

[Y] New functionality includes testing.
[Y] New functionality has been documented.
[Y] New functionality has javadoc added.
[Y] New functionality has a user manual doc added.
[N/A] API changes companion pull request created.
[Y] Commits are signed per the DCO using --signoff.
[Y] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

An example query using fillnull.

    os> source=accounts | fields email, employer | fillnull with '<not found>' in email ;
    fetched rows / total rows = 4/4
    +-----------------------+----------+
    | email                 | employer |
    |-----------------------+----------|
    | amberduke@pyrami.com  | Pyrami   |
    | hattiebond@netagy.com | Netagy   |
    | <not found>           | Quility  |
    | daleadams@boink.com   | null     |
    +-----------------------+----------+
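The behavior in the example above can be sketched in plain Java. This is only an illustration of the semantics (null values in the listed fields are replaced, everything else is untouched); the class name, the `row` helper, and the sample rows are hypothetical and are not code from this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of fillnull semantics; not code from this PR. */
final class FillNullWithSketch {

  /** Models: fillnull with <replacement> in <field>, <field>, ... */
  static List<Map<String, Object>> fillNullWith(
      List<Map<String, Object>> rows, Object replacement, List<String> fields) {
    List<Map<String, Object>> result = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Map<String, Object> copy = new LinkedHashMap<>(row);
      for (String field : fields) {
        // Only null (or absent) values in the listed fields are replaced.
        if (copy.get(field) == null) {
          copy.put(field, replacement);
        }
      }
      result.add(copy);
    }
    return result;
  }

  /** Builds a two-column row; LinkedHashMap is used because it permits null values. */
  static Map<String, Object> row(Object email, Object employer) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("email", email);
    r.put("employer", employer);
    return r;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows =
        List.of(row("a@example.com", "Pyrami"), row(null, "Quility"), row("b@example.com", null));
    // employer is not listed, so its null value is left untouched.
    for (Map<String, Object> r : fillNullWith(rows, "<not found>", List.of("email"))) {
      System.out.println(r);
    }
  }
}
```

As in the query output, the row with a null `employer` keeps that null because only `email` was named.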

MaxKsyunz

Thank you for the PR! Could you add some integration tests? They live here.

jduo

I think we should add an ExplainTest to validate that you get an Eval physical plan when using FILLNULL.

normanj-bitquill · 2024-10-21T16:16:11Z

@MaxKsyunz I have added some integration tests.

normanj-bitquill · 2024-10-21T16:16:27Z

@jduo I have added an explain test.

docs/user/ppl/cmd/fillnull.rst

normanj-bitquill · 2024-11-27T23:54:06Z

> @normanj-bitquill please let's align the docs with the existing PPL fillnull doc in Spark

@YANG-DB The docs have been updated to match the docs for Spark. The sample queries are still different, since they are run in this repo.

ppl/src/main/antlr/OpenSearchPPLParser.g4

YANG-DB

@normanj-bitquill plz fix the keywordsCanBeId list and we can merge

normanj-bitquill · 2024-11-28T17:26:15Z

> @normanj-bitquill plz fix the keywordsCanBeId list and we can merge

@YANG-DB I have added it.

acarbonetto · 2024-12-03T18:12:34Z

Can we do this?

Public documentation issue/PR created.

acarbonetto · 2024-12-03T18:19:20Z

ppl/src/main/antlr/OpenSearchPPLParser.g4

+   : USING nullableField EQUAL nullReplacement (COMMA nullableField EQUAL nullReplacement)*
+   ;
+
+nullableField


Not sure about the purpose of nullableField and nullReplacement, except for making the parse trees explicit. But 🤷.

Then we could use things like fieldList instead of nullableField (COMMA nullableField)*.
Maybe something like:

    fillNullWithTheSameValue
       : WITH nullReplacement = valueExpression IN nullableFieldList = fieldList
       ;

and

    fillNullWithFieldVariousValues
       : USING nullReplacementExpression (COMMA nullReplacementExpression)*
       ;

    nullReplacementExpression
       : nullableField = fieldExpression EQUAL nullReplacement = valueExpression
       ;

Is there a reason we're using valueExpression instead of expression?

We want an expression that produces a single value. expression could produce a column such as x + 5.

@acarbonetto Your first comment here about using fieldList makes sense. I took the syntax directly from the parser file in opensearch-spark. If I make the change you are suggesting, the two files will get further out of sync.

There are a few ways forward:

1. Take this syntax as is and clean it up when the parser files are unified.
2. Fix the syntax here and deal with the mismatch later on when unifying the parser files.
3. Fix the syntax here and create a PR for opensearch-spark with a similar change.

Do you have any preference?

True. @YANG-DB do you have a preference? Should we raise an issue and tag the issue to be fixed in opensearch-project/piped-processing-language#23?

@normanj-bitquill @acarbonetto
I would prefer going with the 3rd option (Fix the syntax here and create a PR for opensearch-spark with a similar change)
thanks

Updated here and created a PR for the Spark project.

opensearch-project/opensearch-spark#968
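The two command forms discussed in this thread (`WITH <value> IN <fields>` and `USING <field> = <value>, ...`) can both be read as a list of (field, replacement) pairs. The sketch below shows that reading in plain Java. It is an assumption made for illustration, not the AST built in this PR; the class and method names are hypothetical, and the string splitting is a naive stand-in for the ANTLR parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; the real command is parsed by the ANTLR grammar. */
final class FillNullFormsSketch {

  /** fillnull with <replacement> in <fields>: every field gets the same replacement. */
  static List<Map.Entry<String, String>> sameValue(String replacement, List<String> fields) {
    List<Map.Entry<String, String>> pairs = new ArrayList<>();
    for (String field : fields) {
      pairs.add(Map.entry(field, replacement));
    }
    return pairs;
  }

  /** fillnull using f1 = v1, f2 = v2: each field carries its own replacement. */
  static List<Map.Entry<String, String>> variousValues(String assignments) {
    List<Map.Entry<String, String>> pairs = new ArrayList<>();
    // Naive split: assumes replacement values never contain a comma.
    for (String assignment : assignments.split(",")) {
      String[] parts = assignment.split("=", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("expected <field> = <value>: " + assignment);
      }
      pairs.add(Map.entry(parts[0].trim(), parts[1].trim()));
    }
    return pairs;
  }

  public static void main(String[] args) {
    System.out.println(sameValue("0", List.of("f1", "f2")));  // [f1=0, f2=0]
    System.out.println(variousValues("f1 = 0, f2 = -1"));     // [f1=0, f2=-1]
  }
}
```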

acarbonetto · 2024-12-03T18:34:59Z

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java

+  @Test
+  public void testFillNullVariousValues() {
+    assertEquals(
+        "source=t | fillnull using f1 = 0, f2 = -1",


Shouldn't this anonymize the expression?
e.g.

"source=t | fillnull using f1 = ***, f2 = ***",

reference: testAndExpression

I'm not sure what you mean here. Using a value of *** would likely change the data type in this case since it looks like f1 and f2 are int values.

The anonymizer should be removing all user-defined values and anonymizing the logger output.
You can look at other examples in this test file for anonymized output.

This is getting way off track. Only null values are replaced. This feature is not at all suited for anonymizing data (all non-null values would be unchanged). If you want to anonymize data, you could use eval to create a new field and then fields to include the new field and ignore the old field.
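For context, the reviewer's request in this thread is about masking literal values in the query text that gets logged, not about the data fillnull returns. A minimal sketch of that kind of masking follows. It is a regex-based stand-in written for illustration and makes no claim about how PPLQueryDataAnonymizer is implemented; the class name, method name, and pattern are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for query-text masking; not the project's anonymizer. */
final class LiteralMaskSketch {

  // Matches "= <literal>" where the literal is a single-quoted string or a signed number.
  private static final Pattern ASSIGNED_LITERAL =
      Pattern.compile("=\\s*('[^']*'|-?\\d+(?:\\.\\d+)?)");

  /** Replaces each assigned literal with *** so user-supplied values do not reach the log. */
  static String maskAssignedLiterals(String query) {
    return ASSIGNED_LITERAL.matcher(query).replaceAll("= ***");
  }

  public static void main(String[] args) {
    // Prints: source=t | fillnull using f1 = ***, f2 = ***
    System.out.println(maskAssignedLiterals("source=t | fillnull using f1 = 0, f2 = -1"));
  }
}
```

Field names and the `source=t` clause are left alone because the text after those `=` signs is neither a quoted string nor a number.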

YANG-DB · 2024-12-04T21:18:48Z

@normanj-bitquill LGTM

@acarbonetto @Yury-Fridlyand @MaxKsyunz: can we approve and merge?
thanks

MaxKsyunz reviewed Oct 18, 2024

YANG-DB previously approved these changes Oct 18, 2024

jduo suggested changes Oct 18, 2024

normanj-bitquill dismissed YANG-DB’s stale review via a1f584f October 21, 2024 16:15

currantw reviewed Oct 24, 2024

normanj-bitquill force-pushed the opensearch-3032 branch 2 times, most recently from 240d0d6 to 7d17615 Compare October 30, 2024 22:52

YANG-DB reviewed Nov 28, 2024

YANG-DB previously approved these changes Nov 28, 2024

normanj-bitquill dismissed YANG-DB’s stale review via a8fbd43 November 28, 2024 17:25

YANG-DB previously approved these changes Dec 3, 2024

acarbonetto reviewed Dec 3, 2024

normanj-bitquill mentioned this pull request Dec 3, 2024

[DOC] Add FILLNULL PPL documentation opensearch-project/documentation-website#8867 (Open, 4 tasks)

normanj-bitquill dismissed YANG-DB’s stale review via 2542b90 December 3, 2024 18:57

YANG-DB previously approved these changes Dec 3, 2024

normanj-bitquill and others added 11 commits December 3, 2024 15:52

Add FILLNULL command in PPL (opensearch-project#3032)

bac4725

* Add FILLNULL command in PPL Signed-off-by: Norman Jordan <[email protected]>

Added some integration tests for fillnull

4be00ba

Signed-off-by: Norman Jordan <[email protected]>

Updated the multi field integration test

913177f

Signed-off-by: Norman Jordan <[email protected]>

Added some more tests

104445a

Signed-off-by: Norman Jordan <[email protected]>

Updated fillnull doc to match doc for Spark

6df47b6

Signed-off-by: Norman Jordan <[email protected]>

Added fillnull to the keywordsCanBeId list

00fe9be

Signed-off-by: Norman Jordan <[email protected]>

Update ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAno…

18ccf38

…nymizer.java Formatting fix Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>

Update ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAno…

430bf78

…nymizerTest.java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>

Update ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest…

3ac44f0

….java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>

Update ppl/src/test/java/org/opensearch/sql/ppl/antlr/PPLSyntaxParser…

0f04940

…Test.java Fix formatting Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: normanj-bitquill <[email protected]>

Simplified the grammar for FILLNULL

8559e1c

Signed-off-by: Norman Jordan <[email protected]>

normanj-bitquill dismissed YANG-DB’s stale review via 8559e1c December 4, 2024 17:43

normanj-bitquill force-pushed the opensearch-3032 branch from 5fd3783 to 8559e1c Compare December 4, 2024 17:43

normanj-bitquill mentioned this pull request Dec 4, 2024

Updated grammar for FILLNULL to match grammar in SQL project opensearch-project/opensearch-spark#968 (Merged, 5 tasks)

YANG-DB approved these changes Dec 4, 2024

acarbonetto approved these changes Dec 5, 2024

acarbonetto merged commit b6846ce into opensearch-project:main Dec 5, 2024
13 of 15 checks passed

acarbonetto deleted the opensearch-3032 branch December 5, 2024 18:00
