-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite only references in aggregation, not all values #585
Rewrite only references in aggregation, not all values #585
Conversation
Thanks. This post contains a lot of repeated bits. I guess that's copy-paste error? If so, could you tidy it up? |
efb6549
to
9183036
Compare
Apologies for that, it should be fixed now. |
9183036
to
9d7fd96
Compare
Thanks. I'm very short on time and energy but I'd like to get this to a stage where you are at least unblocked for what you need to do.
When you say "take on" do you mean that the implementation of the changes to Opaleye (described in #578 (comment)) are too big to take on, or once such changes to Opaleye are made, the changes required on the rel8 side are too big? I'm not particularly worried about the former. I think I can probably do it myself given a free weekend afternoon. On the other hand, I don't know if the new API will be useable by rel8 at all, in the way you want to use it. |
Oh, and I incorporated your tests as "pending" into the test suite. Can you rebase your branch (or maybe all of them)? |
To recap, PostgreSQL does not allow lateral references in aggregation functions, e.g., the following SQL is invalid: ```sql SELECT result FROM ( VALUES (1), (2), (3) ) _(x), LATERAL ( SELECT sum(x) AS result ) __ ``` It fails with the error message `ERROR: aggregate functions are not allowed in FROM clause of their own query level`. Opaleye works around this limitation by rewriting the above query as follows: ```sql SELECT result FROM ( VALUES (1), (2), (3) ) _(x), LATERAL ( SELECT sum(inner1) AS result FROM ( SELECT x AS inner1 ) _ ) __ ``` The current implementation of this rewriting rewrites all arguments of aggregation functions regardless of whether they contain (potentially lateral) references or not. However, the same effect could be achieved if we only rewrote references and not all expressions. That's what this PR does. This change is beneficial when using set aggregation functions such as `percentile_cont`. With the current rewriting scheme, Opaleye will generate: ```sql SELECT result FROM ( VALUES (1), (2), (3) ) _(x), LATERAL ( SELECT percentile_cont(inner1) WITHIN GROUP (ORDER BY inner2) AS result FROM ( SELECT 0.5 AS inner1, x AS inner2 ) _ ) __ ``` Which fails with the error message: ``` ERROR: column "_.inner1" must appear in the GROUP BY clause or be used in an aggregate function LINE 12: percentile_cont(inner1) WITHIN GROUP (ORDER BY inner2) ^ DETAIL: Direct arguments of an ordered-set aggregate must use only grouped columns. ``` After the change in this PR, this instead becomes: ```sql SELECT result FROM ( VALUES (1), (2), (3) ) _(x), LATERAL ( SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY inner1) AS result FROM ( SELECT x AS inner1 ) _ ) __ ``` Which works.
9d7fd96
to
43670ff
Compare
Yeah, I just mean on the Rel8 side of things. I actually don't care so much about the ability to use I've rebased this PR. |
OK, I'll have a closer look at the weekend. |
After refreshing my memory on this issue, the biggest objection I have is that it's still very easy to write crashing queries after this PR. The proposed change feels too invasive for a partial solution. By way of alternative, I suggest excluding the |
I've just tested this out. It did require some changes to Rel8 and to CircuitHub code that uses Rel8, but they were relatively tolerable. I would have preferred not to incur a |
Yeah, it's unfortunate to pick up that extra constraint but it does seem like it plays a genuine role. The API is still not safe ( |
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
In #576 I added preliminary support for ordered set aggregation functions. However, in the vast majority of cases this generates invalid SQL as things stand.
This PR contains two commits: the first adds a failing test demonstrating the problem, the second one fixes the problem.
To recap, recall PostgreSQL does not allow lateral references in aggregation functions, e.g., the following SQL is invalid:
It fails with the error message
ERROR: aggregate functions are not allowed in FROM clause of their own query level
. Opaleye works around this limitation by rewriting the above query as follows:The current implementation of this rewriting rewrites all arguments of aggregation functions regardless of whether they contain (potentially lateral) references or not. However, the same effect could be achieved if we only rewrote references and not all expressions. That's what this PR does.
This change is beneficial when using set aggregation functions such as
percentile_cont
. With the current rewriting scheme, Opaleye will generate:Which fails with the error message:
After the change in this PR, this instead becomes:
Which works.
Incidentally this is the exactly the same fix as the one proposed in #578, which I understand you consider a less than ideal fix for the problem it purports to solve, but I think the API changes that would result from your proposed solution in the comments (although it's correct) are bigger than I want to take on right now. In any case, this is the problem that I am eager to solve at this time, so would you consider this fix as a solution to the problem described above, even if you're not satisfied that it truly solves #578? It shouldn't break any existing code as far as I can see.