Add trendline command #748

kt-eliatra · 2024-10-07T07:51:47Z

Description

This PR implements the trendline PPL command.

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

YANG-DB · 2024-10-07T15:53:30Z

@kt-eliatra
please add the needed documentation in the following locations:

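* [ppl planning](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang/planning)

Afterward - add documentation and samples:

* [ppl commands list](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang)
* [ppl functions list](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang/functions)
* [ppl examples doc](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/PPL-Example-Commands.md)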

kt-eliatra · 2024-10-10T11:32:44Z

> @kt-eliatra please add the needed documentation in the following locations:
>
> * [ppl planning](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang/planning)
>
> Afterward - add documentation and samples:
>
> * [ppl commands list](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang)
> * [ppl functions list](https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang/functions)
> * [ppl examples doc](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/PPL-Example-Commands.md)

@YANG-DB
Sure, I'll do that. In the meantime, I have a few questions:

1. Do we assume that the data is already sorted, e.g. by a preceding sort command, or should the trendline command syntax be extended to support sorting?

2. Should we calculate SMA/WMA when the number of data points in the current window is less than the number of data points requested by the user?

Given the following data:

| YEAR | SALES |
|------|-------|
| 2024 | 5.0   |
| 2023 | 6.5   |
| 2022 | 7.0   |
| 2021 | 4.3   |
| 2020 | 3.6   |
| 2019 | 9.0   |

and number of data points = 3:

Years 2024-2021

SMA(2024) = (5.0+6.5+7.0)/3 = 6.166666667
SMA(2023) = (6.5+7.0+4.3)/3 = 5.933333333
SMA(2022) = (7.0+4.3+3.6)/3 = 4.966666667
SMA(2021) = (4.3+3.6+9.0)/3 = 5.633333333

Years 2020-2019

SMA(2020) = NULL, since there are only 2 data points (3.6 and 9.0)?
SMA(2019) = NULL, since there is only 1 data point (9.0)?
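
To make the arithmetic above easy to check, here is a minimal Python sketch of that windowing. It assumes the window for each row is the row itself plus the next `n - 1` rows in the listed order (2024 down to 2019), and that rows with too few data points yield `None`, which is one of the options being asked about. The function name `sma_forward` is made up for this illustration and is not part of the PR.

```python
from typing import List, Optional


def sma_forward(values: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average over the current value and the next n - 1 values.

    Returns None for positions where fewer than n values are available.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        window = values[i:i + n]
        if len(window) < n:
            # Not enough data points left in the window.
            result.append(None)
        else:
            result.append(sum(window) / n)
    return result


# SALES in the listed order: 2024, 2023, 2022, 2021, 2020, 2019
sales = [5.0, 6.5, 7.0, 4.3, 3.6, 9.0]
for year, sma in zip(range(2024, 2018, -1), sma_forward(sales, 3)):
    print(year, None if sma is None else round(sma, 9))
```

The first four rows reproduce the hand-calculated values, and 2020 and 2019 come out as `None`.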

YANG-DB · 2024-10-10T17:08:39Z

hi

> Do we assume that the data is already sorted?

IMO we can't assume that, so please allow trendline to facilitate this functionality.

> Should we calculate sma/wma when number of data-points in current window is less ...

IMO we need to return a response stating that there are insufficient data points to calculate the trendline based on the requested number.

YANG-DB · 2024-10-11T01:09:56Z

@kt-eliatra can you please merge and resolve the conflicts?
thanks

kt-eliatra · 2024-10-11T12:12:46Z

@YANG-DB

> Do we assume that the data is already sorted? - IMO we can't assume that, so please allow trendline to facilitate this functionality

My proposal for the syntax of the trendline command:
- without sorting
  - `... | trendline sma(2, price) wma(3, x)`
  - `... | trendline wma(4, price) sma(5, x)`
- using sorting
  - `... | trendline sort - date sma(2, price) wma(3, x)`
  - `... | trendline sort + date wma(4, price) sma(5, x)`

ANTLR grammar:
```
trendlineCommand
   : TRENDLINE (SORT sortField)? trendlineClause (trendlineClause)*
   ;

trendlineClause
   : trendlineType LT_PRTHS numberOfDataPoints = integerLiteral COMMA field = fieldExpression RT_PRTHS AS alias = fieldExpression
   ;

trendlineType
   : SMA
   | WMA
   ;
```
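
As a rough way to see which command strings the rule above would accept, here is a small regex-based Python sketch. It is only an approximation: field names are simplified to `\w+`, the sort direction is treated as an optional `+` or `-`, and `parse_trendline` is a hypothetical helper, not code from this PR. Note that, as written, the `trendlineClause` rule makes `AS alias` mandatory, so the sketch rejects clauses without an alias.

```python
import re

# Simplified stand-ins for the grammar tokens; field names are limited to \w+.
HEAD = re.compile(
    r"\s*trendline\s+(?:sort\s+(?P<direction>[+-])?\s*(?P<sort_field>\w+)\s+)?",
    re.IGNORECASE,
)
CLAUSE = re.compile(
    r"(?P<type>sma|wma)\s*\(\s*(?P<points>\d+)\s*,\s*(?P<field>\w+)\s*\)"
    r"\s+as\s+(?P<alias>\w+)\s*",
    re.IGNORECASE,
)


def parse_trendline(command: str) -> dict:
    """Parse 'trendline [sort [+|-] field] clause+' into a plain dict."""
    head = HEAD.match(command)
    if head is None:
        raise ValueError("expected 'trendline'")
    pos = head.end()
    clauses = []
    while pos < len(command):
        clause = CLAUSE.match(command, pos)
        if clause is None:
            raise ValueError(f"invalid trendline clause at offset {pos}")
        clauses.append({
            "type": clause.group("type").lower(),
            "points": int(clause.group("points")),
            "field": clause.group("field"),
            "alias": clause.group("alias"),
        })
        pos = clause.end()
    if not clauses:
        raise ValueError("at least one trendline clause is required")
    sort = None
    if head.group("sort_field"):
        sort = {
            "field": head.group("sort_field"),
            "descending": head.group("direction") == "-",
        }
    return {"sort": sort, "clauses": clauses}


print(parse_trendline("trendline sort - date sma(2, price) as price_sma"))
```

Under these assumptions, `trendline sma(2, price)` without `as <alias>` raises `ValueError`, which may be worth keeping in mind when comparing the rule with the alias-free examples in the proposal.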

> Should we calculate sma/wma when number of data-points in current window is less ... - IMO we need to return a response stating that there are insufficient data points to calculate the trendline based on the requested number

A query like

select 
  case when (count(1) over (order by year desc rows between 1 preceding and current row)) < 2
    then
      "Insufficient data point to calculate the trendline" 
    else
      (avg(sales) over (order by year desc rows between 1 preceding and current row)) end as sma_result 
from sales;

causes the calculated average to be cast to a string. Wouldn't it be better to return null in such a case?
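
For comparison, a minimal Python sketch of the null-returning alternative. It mirrors the window in the query above under these assumptions: rows are ordered by year descending, the window is the current row plus the `n - 1` preceding rows, and `None` stands in for SQL `NULL`. The name `sma_trailing` and the `(year, sales)` tuple layout are invented for this illustration.

```python
from typing import List, Optional, Tuple


def sma_trailing(
    rows: List[Tuple[int, float]], n: int, descending: bool = True
) -> List[Tuple[int, Optional[float]]]:
    """Sort rows by key, then average each row with its n - 1 preceding rows.

    Rows that have fewer than n values in their window get None, so the
    result column stays numeric instead of being mixed with a message string.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    ordered = sorted(rows, key=lambda row: row[0], reverse=descending)
    result: List[Tuple[int, Optional[float]]] = []
    for i, (key, _) in enumerate(ordered):
        if i + 1 < n:
            # Fewer than n rows seen so far: no average can be computed.
            result.append((key, None))
        else:
            window = [value for _, value in ordered[i - n + 1:i + 1]]
            result.append((key, sum(window) / n))
    return result


# Input order does not matter because the function sorts first.
sales = [(2019, 9.0), (2021, 4.3), (2024, 5.0), (2020, 3.6), (2023, 6.5), (2022, 7.0)]
for year, sma in sma_trailing(sales, 2):
    print(year, sma)
```

With `n = 2`, only the first row in sort order (2024) has no result, and every other row is a plain float.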

YANG-DB · 2024-10-24T01:39:19Z

@kt-eliatra thanks for the detailed review - I confirm, please continue

> Wouldn't it be better to return null in such a case?

Yes, it does make sense

Signed-off-by: Kacper Trochimiak <[email protected]>

lukasz-soszynski-eliatra · 2024-10-29T18:08:04Z

WMA extracted to PR eliatra#7

salyh · 2024-10-29T21:12:21Z

Superseded by #833 and eliatra#7

This PR should be closed in favor of the two mentioned above.

kt-eliatra force-pushed the trendline-command branch from c6a9fb8 to 6f97f70 Compare October 7, 2024 08:06

YANG-DB added 0.5 Lang:PPL Pipe Processing Language support backport 0.5-nexus labels Oct 7, 2024

seankao-az added 0.6 and removed 0.5 backport 0.5-nexus labels Oct 7, 2024

kt-eliatra force-pushed the trendline-command branch from 2835eee to 714bfcf Compare October 9, 2024 07:09

kt-eliatra force-pushed the trendline-command branch from 714bfcf to 96ed7a5 Compare October 11, 2024 04:49

YANG-DB mentioned this pull request Oct 24, 2024

Add trendline PPL command opensearch-project/sql#3071

Open

7 tasks

kt-eliatra force-pushed the trendline-command branch from 2b8007a to 7575906 Compare October 29, 2024 12:32

lukasz-soszynski-eliatra force-pushed the trendline-command branch from 938d619 to 86ff84e Compare October 29, 2024 17:19

kt-eliatra added 6 commits October 29, 2024 18:32

WIP trendline command

a0eb392

Signed-off-by: Kacper Trochimiak <[email protected]>

wip

46be459

Signed-off-by: Kacper Trochimiak <[email protected]>

trendline supports sorting

007f8af

Signed-off-by: Kacper Trochimiak <[email protected]>

run scalafmtAll

07aa10c

Signed-off-by: Kacper Trochimiak <[email protected]>

return null when there are too few data points

ffe0581

Signed-off-by: Kacper Trochimiak <[email protected]>

sbt scalafmtAll

2d2c0f7

Signed-off-by: Kacper Trochimiak <[email protected]>

lukasz-soszynski-eliatra force-pushed the trendline-command branch from 86ff84e to 2d2c0f7 Compare October 29, 2024 17:42

kt-eliatra closed this Oct 30, 2024
