Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ordered-set aggregation functions (WITHIN GROUP) #576

Closed
wants to merge 4 commits into from

Conversation

shane-circuithub
Copy link
Contributor

This commit adds support for ordered-set aggregation functions such as mode() which use the WITHIN GROUP (ORDER BY _) syntax. However, it doesn't actually expose any public facing API for this, basically because I don't know yet what that API should be for Opaleye. There are a lot of subtle restrictions on the kinds of arguments you can pass to the ordered-set aggregation functions (e.g., percentile_disc() can take either a constant expression or one of the columns from the query we're aggregating over, but only if it's part of the GROUP BY clause) that would be hard to capture in the types that Opaleye currently uses.

What this commit does is add just enough internals for me to able to experiment with a limited API in Rel8 for ordered-set aggregation functions that might be subject to change in the future. But I don't want to add something half-baked to Opaleye's public facing API just yet.

This means that expressions contained in `ORDER BY` clauses of aggregation functions are now also renamed by `extractAggregateFields`.
This commit adds support for ordered-set aggregation functions such as `mode()` which use the `WITHIN GROUP (ORDER BY _)` syntax. However, it doesn't actually expose any public facing API for this, basically because I don't know yet what that API should be for Opaleye. There are a lot of subtle restrictions on the kinds of arguments you can pass to the ordered-set aggregation functions (e.g., `percentile_disc()` can take either a constant expression or one of the columns from the query we're aggregating over, but only if it's part of the `GROUP BY` clause) that would be hard to capture in the types that Opaleye currently uses.

What this commit does is add just enough internals for me to able to experiment with a limited API in Rel8 for ordered-set aggregation functions that might be subject to change in the future. But I don't want to add something half-baked to Opaleye's public facing API just yet.
@shane-circuithub
Copy link
Contributor Author

This is based on the refactoring in #575, but no longer depends on the fixes proposed in #577 or #578.

@tomjaguarpaw
Copy link
Owner

Yup, all good. Likewise split up a bit and merged separately.


Due to the splitting tool all the commits get attributed to me. Sorry about that. I try to credit you on all the commits, e.g. 1367e70. If you feel strongly about this please let me know and I'll see what I can do to fix the tool.

@shane-circuithub
Copy link
Contributor Author

No worries at all, I don't care about the attribution, just happy to have this merged so I can play around with it! Thanks for being so responsive in the last week with all the PRs.

@tomjaguarpaw
Copy link
Owner

You're welcome. Thanks for the great PRs!

tomjaguarpaw added a commit that referenced this pull request Jan 21, 2024
It's redundant because we only need to rebind PrimExprs that came from
a lateral subquery.  As explained at [1], rather than carefully
analysing which PrimExprs came from a lateral subquery we can just
rebind everything.  A previous commit [2] changed things so that
everything mentioned in aggregator order expressions is rebound.
Instead we probably should have done what this commit does, that is,
added an Unpackspec constraint to aggregate.

Fixes #587

This is a simpler approach to resolving the issues discussed at

* #585
* #578

This still suffers from the problem described at:

#578 (comment)

i.e. if we find a way of duplicating field names, such as

    O.asc (\x -> snd x O..++ snd x)

then we can still create crashing queries.  The benefit of this
comment is that there is a way of generating non-crashing queries!

[1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114

[2] d848317, part of
    #576
tomjaguarpaw added a commit that referenced this pull request Jan 26, 2024
It's redundant because we only need to rebind PrimExprs that came from
a lateral subquery.  As explained at [1], rather than carefully
analysing which PrimExprs came from a lateral subquery we can just
rebind everything.  A previous commit [2] changed things so that
everything mentioned in aggregator order expressions is rebound.
Instead we probably should have done what this commit does, that is,
added an Unpackspec constraint to aggregate.

Fixes #587

This is a simpler approach to resolving the issues discussed at

* #585
* #578

This still suffers from the problem described at:

#578 (comment)

i.e. if we find a way of duplicating field names, such as

    O.asc (\x -> snd x O..++ snd x)

then we can still create crashing queries.  The benefit of this
comment is that there is a way of generating non-crashing queries!

[1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114

[2] d848317, part of
    #576
tomjaguarpaw added a commit that referenced this pull request Feb 6, 2024
It's redundant because we only need to rebind PrimExprs that came from
a lateral subquery.  As explained at [1], rather than carefully
analysing which PrimExprs came from a lateral subquery we can just
rebind everything.  A previous commit [2] changed things so that
everything mentioned in aggregator order expressions is rebound.
Instead we probably should have done what this commit does, that is,
added an Unpackspec constraint to aggregate.

Fixes #587

This is a simpler approach to resolving the issues discussed at

* #585
* #578

This still suffers from the problem described at:

#578 (comment)

i.e. if we find a way of duplicating field names, such as

    O.asc (\x -> snd x O..++ snd x)

then we can still create crashing queries.  The benefit of this
comment is that there is a way of generating non-crashing queries!

[1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114

[2] d848317, part of
    #576
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants