Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percentile Aggregate Functions #13

Open
bjovanovic84 opened this issue Feb 26, 2020 · 4 comments
Open

Percentile Aggregate Functions #13

bjovanovic84 opened this issue Feb 26, 2020 · 4 comments

Comments

@bjovanovic84
Copy link

Is your feature request related to a problem? Please describe.
It would be very useful to add a percentile aggregation function, in addition to the standard min, max, avg. When performing analysis it's often a requirement to report p95, p99 values, it's not easy to do so right now.

Describe the solution you'd like
Adding something like [MIN|MAX]_PCT(<percentile>, <column>) would be ideal, to be used like MAX_PCT(99, value), MIN_PCT(20, value) etc.

@larskoole
Copy link

Definitely something we are also looking forward too. We use percentile for so many of our statistics and charts. This is the only thing that is stopping us from using Rockset.

@benhannel
Copy link

You can emulate this functionality with the NTH_VALUE window function.

@luketn
Copy link

luketn commented Jun 7, 2022

This worked for me:

SELECT
    v.attributeId, 
    MIN(v.value) "min",
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value))*0.25) twenty_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value))*0.5) median,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value))*0.75) seventy_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value))*0.95) ninety_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value))*0.99) ninety_ninth,
    MAX(v.value) "max"
FROM
    commons.attributeValues v
group by v.attributeId

@benhannel can you see any issue with that approach? Would there be any way to avoid the repetition of:

ARRAY_AGG(v.value)

throughout the query?

@benhannel
Copy link

Yes, the approach you have there will work. You can also use a subquery to avoid some duplication of ARRAY_SORT(ARRAY_AGG(v.value)). There are also some window functions which could be useful to you:
https://rockset.com/docs/window-functions/#ntile

It's worth noting that any mechanism to compute an exact percentile value will take memory which is linear in the number of input rows. Max and standard deviation are much cheaper to compute, if sufficient for your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants