[RFC] Support Percentile in PPL #2670
Comments
I prefer option 2.
Do you propose adding another aggregator? Currently, the SQL aggregation framework executes the query plan on the coordinating node (if it cannot be pushed down to OpenSearch). In the future, we want to leverage the Spark aggregation framework instead of reinventing it.
Is it possible to implement it in OpenSearch Core, so that SQL/PPL can leverage it?
Is your feature request related to a problem?
This RFC is a part of this issue: #44
Currently, PPL cannot answer questions that require a percentile function. For example
returns
Unsupported aggregation function
A SQL query with the percentiles function works as below because it falls back to the legacy engine, which does not support PPL. The current JOIN syntax behaves the same way.
PERCENTILE is a common aggregate function that is needed in many visualization use cases with PPL. The same feature has been requested by the community: opendistro-for-elasticsearch/sql#1093
What solution would you like?
Syntax in PPL
Option 1 (defined in the current OpenSearchPPLParser.g4, but not implemented in code)
Option 2 (more readable and widely used in OLAP engines)
Syntax in SQL
Basic
ANSI SQL (experimental)
adding OVER ([PARTITION BY expression])
For example, mainstream databases support percentile_cont and percentile_disc:
PostgreSQL
https://www.postgresql.org/docs/9.4/functions-aggregate.html
Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html
Snowflake
https://docs.snowflake.com/en/sql-reference/functions/percentile_disc.html
Spark
https://issues.apache.org/jira/browse/SPARK-37691
Solution
To align with the current percentiles(aggField) implementation in the legacy engine and the percentiles aggregation in OpenSearch core, percentile will use the t-digest construction algorithm, which is an approximate calculation.
For percentile_cont and percentile_disc, we could use org.apache.commons.math3:Percentile instead. For example, we could set EstimationType R_1 for percentile_disc and R_7 for percentile_cont. These two R_x quantile algorithms are very popular and are used in Spark, PostgreSQL, Excel, etc. Reference: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
What alternatives have you considered?
No, because percentiles in the legacy SQL engine does not work in PPL.
Do you have any additional context?
Add any other context or screenshots about the feature request here.
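To make the difference between the two estimation types named in the Solution section concrete, here is a minimal Python sketch of the R_1 and R_7 quantile definitions from the referenced R documentation. It is an illustration only, not the Apache Commons Math implementation and not the proposed plugin code. The function names quantile_r1 and quantile_r7 are hypothetical, and the fraction p is assumed to be in the range 0 to 1, as in SQL percentile_cont and percentile_disc.

```python
import math
from typing import Sequence


def quantile_r1(values: Sequence[float], p: float) -> float:
    """R_1 (inverse empirical CDF): always returns an actual input value.

    Hypothetical helper illustrating percentile_disc-style behaviour.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if p == 0.0:
        return ordered[0]
    # Smallest 1-based rank whose cumulative fraction is >= p.
    rank = math.ceil(len(ordered) * p)
    return ordered[rank - 1]


def quantile_r7(values: Sequence[float], p: float) -> float:
    """R_7 (linear interpolation between closest ranks).

    Hypothetical helper illustrating percentile_cont-style behaviour.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # Fractional 0-based position of the requested quantile.
    h = (len(ordered) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


data = [10.0, 20.0, 30.0, 40.0]
print(quantile_r1(data, 0.5))  # 20.0, an existing value
print(quantile_r7(data, 0.5))  # 25.0, interpolated between 20.0 and 30.0
```

Both functions are exact and need the full sorted input, unlike the approximate t-digest approach proposed for percentile, which is why the two families may return slightly different results on the same data.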