Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New trendline ppl command (WMA) #872

Merged
merged 12 commits into from
Nov 14, 2024

Conversation

andy-k-improving
Copy link
Contributor

@andy-k-improving andy-k-improving commented Nov 5, 2024

Description

Introduce a new variant (WMA) for existing trendline ppl command, by compositing a logical plan similar to the following with function nth_value( ) to calculate the WMA value by perform event look behind.

-- +- 'Project ['name, 'salary, 
-- (((('nth_value('salary, 3) windowspecdefinition('age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$())) * 3) + 
-- ('nth_value('salary, 2) windowspecdefinition('age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$())) * 2)) + 
-- ('nth_value('salary, 1) windowspecdefinition('age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$())) * 1)) / 6) AS WMA#708]
   -- +- 'UnresolvedRelation [employees], [], false

Some high level code changes:

  • Update developer README to include selected set of Unit || Integration test
  • Update example for WMA command usage
  • Update CatelystQueryPlanVisotor related classes to provide sort option argument into TrendLine processing logic, as sort field is mandatory for WMA calculation.
  • Update TrendlineCatalystUtils.java to have a new code path for WMA selection and associated calculation logic.

Related Issues

Prior implement for SMA formula: #833

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Test plan:

Despite the existing unit test / integration test, the feature can also be tested manually, by first inserting a simple table, then run PPL trend line command against the table to calculate WMA value.

# Produce the artifact
sbt clean sparkPPLCosmetic/publishM2

# Start Spark with the plugin
bin/spark-sql --jars "/ABSOLUTE_PATH_TO_ARTIFACT/opensearch-spark-ppl_2.12-0.6.0-SNAPSHOT.jar" \
--conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions"  \
--conf "spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog" \
--conf "spark.hadoop.hive.cli.print.header=true"

# Insert test table and data
CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales------", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales------", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');

# Execute WMA with basic option:

source=employees | trendline sort age wma(2, salary);

name	dept	salary	age	con	salary_trendline
Paul	Engineering	29000	23	test	NULL
Chloe	Engineering	23000	25	test	25000.0
Jane	Marketing	29000	28	test	27000.0
Fred	Engineering	21000	28	test	23666.666666666668
Alex	Sales------	30000	33	test	27000.0
Tom	Engineering	23000	33	test	25333.333333333332
Lisa	Sales------	10000	35	test	14333.333333333334
Jeff	Marketing	35000	38	test	26666.666666666668
Evan	Sales------	32000	38	test	33000.0


# Execute WMA with alias:

source=employees | trendline sort age wma(2, salary) as CUSTOM_NAME

name	dept	salary	age	con	CUSTOM_NAME
Paul	Engineering	29000	23	test	NULL
Chloe	Engineering	23000	25	test	25000.0
Jane	Marketing	29000	28	test	27000.0
Fred	Engineering	21000	28	test	23666.666666666668
Alex	Sales------	30000	33	test	27000.0
Tom	Engineering	23000	33	test	25333.333333333332
Lisa	Sales------	10000	35	test	14333.333333333334
Jeff	Marketing	35000	38	test	26666.666666666668
Evan	Sales------	32000	38	test	33000.0


# Execute WMA with multiple calculations:

source=employees | trendline sort age wma(2, salary) as WMA_2 wma(3, salary) as WMA_3;


name	dept	salary	age	con	WMA_2	WMA_3
Paul	Engineering	29000	23	test	NULL	NULL
Chloe	Engineering	23000	25	test	25000.0	NULL
Jane	Marketing	29000	28	test	27000.0	27000.0
Fred	Engineering	21000	28	test	23666.666666666668	24000.0
Alex	Sales------	30000	33	test	27000.0	26833.333333333332
Tom	Engineering	23000	33	test	25333.333333333332	25000.0
Lisa	Sales------	10000	35	test	14333.333333333334	17666.666666666668
Jeff	Marketing	35000	38	test	26666.666666666668	24666.666666666668
Evan	Sales------	32000	38	test	33000.0	29333.333333333332
Time taken: 0.466 seconds, Fetched 9 row(s)


@andy-k-improving andy-k-improving changed the title New trend line ppl command (WMA) New trendline ppl command (WMA) Nov 5, 2024
@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support 0.6 labels Nov 5, 2024
Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andy-k-improving
please add the relevant documentation references including examples

@YANG-DB
Copy link
Member

YANG-DB commented Nov 6, 2024

@andy-k-improving please add DCO (sign-off)

@andy-k-improving
Copy link
Contributor Author

@andy-k-improving please add DCO (sign-off)

Done.

@andy-k-improving
Copy link
Contributor Author

@andy-k-improving

please add the relevant documentation references including examples

Done.

@andy-k-improving
Copy link
Contributor Author

@YANG-DB I have updated the example and documentation, would you mind to have another look?

Thanks,

@andy-k-improving
Copy link
Contributor Author

Hi @dai-chen , @LantaoJin and @salyh , would you guys mind to have look on this, any feedback would be appreciated :)

Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andy-k-improving this looks good - only a few questions commented

DEVELOPER_GUIDE.md Show resolved Hide resolved
@@ -283,6 +283,7 @@ public enum BuiltinFunctionName {
WILDCARDQUERY(FunctionName.of("wildcardquery")),
WILDCARD_QUERY(FunctionName.of("wildcard_query")),

NTH_VALUE(FunctionName.of("nth_value")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is for PPL builtin function, not Spark. If we want to add a new PPL builtin function, NTH_VALUE should be added in Lexer and Parser, and user doc too. Or remove this line to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it, for now I'm good with using nth_value String Literal directly as only WMA using this.

@LantaoJin
Copy link
Member

The CI failure caused by #903 is not related but blocks your whole testing process. @andy-k-improving could you merge the latest code from main?

Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
@LantaoJin LantaoJin merged commit 439cf3e into opensearch-project:main Nov 14, 2024
4 checks passed
@LantaoJin
Copy link
Member

Thanks for your contribution. Merging to main.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 14, 2024
* WMA implementation

Signed-off-by: Andy Kwok <[email protected]>

* Update test cases

Signed-off-by: Andy Kwok <[email protected]>

* Update tests

Signed-off-by: Andy Kwok <[email protected]>

* Refactor code

Signed-off-by: Andy Kwok <[email protected]>

* Addres comments

Signed-off-by: Andy Kwok <[email protected]>

* Update doc

Signed-off-by: Andy Kwok <[email protected]>

* Update example

Signed-off-by: Andy Kwok <[email protected]>

* Update readme

Signed-off-by: Andy Kwok <[email protected]>

* Update scalafmt

Signed-off-by: Andy Kwok <[email protected]>

* Update grammar rule

Signed-off-by: Andy Kwok <[email protected]>

* Address review comments

Signed-off-by: Andy Kwok <[email protected]>

* Address review comments

Signed-off-by: Andy Kwok <[email protected]>

---------

Signed-off-by: Andy Kwok <[email protected]>
(cherry picked from commit 439cf3e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
YANG-DB pushed a commit that referenced this pull request Nov 14, 2024
* WMA implementation



* Update test cases



* Update tests



* Refactor code



* Addres comments



* Update doc



* Update example



* Update readme



* Update scalafmt



* Update grammar rule



* Address review comments



* Address review comments



---------


(cherry picked from commit 439cf3e)

Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* WMA implementation

Signed-off-by: Andy Kwok <[email protected]>

* Update test cases

Signed-off-by: Andy Kwok <[email protected]>

* Update tests

Signed-off-by: Andy Kwok <[email protected]>

* Refactor code

Signed-off-by: Andy Kwok <[email protected]>

* Addres comments

Signed-off-by: Andy Kwok <[email protected]>

* Update doc

Signed-off-by: Andy Kwok <[email protected]>

* Update example

Signed-off-by: Andy Kwok <[email protected]>

* Update readme

Signed-off-by: Andy Kwok <[email protected]>

* Update scalafmt

Signed-off-by: Andy Kwok <[email protected]>

* Update grammar rule

Signed-off-by: Andy Kwok <[email protected]>

* Address review comments

Signed-off-by: Andy Kwok <[email protected]>

* Address review comments

Signed-off-by: Andy Kwok <[email protected]>

---------

Signed-off-by: Andy Kwok <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 backport 0.6 Lang:PPL Pipe Processing Language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants