Actual query to database is missing 'group by' #98

Open
jhmitchel opened this issue Jan 18, 2018 · 3 comments

Comments

@jhmitchel

A query of mine that uses 'group by' on a column loses the 'group by' clause in the query Verdict generates and sends to the database. This causes the aggregations in the case statements to be computed over the entire table rather than per group. The query was run on Verdict on top of Impala.

Here is the query and debug output:
verdict:Impala> select case when count(*) < 50 then 1 else 0 end as sparse, case when count(*) > 50 then 50 else count(*) end as num_sampled from orders group by user_id;
DEBUG 2018-01-17 20:42:08,331 - [VerdictStatement] execute() called with: select case when count(*) < 50 then 1 else 0 end as sparse, case when count(*) > 50 then 50 else count(*) end as num_sampled from orders group by user_id
DEBUG 2018-01-17 20:42:08,331 - [VerdictJDBCContext] An input query:
DEBUG 2018-01-17 20:42:08,331 - [VerdictJDBCContext] select case when count(*) < 50 then 1 else 0 end as sparse, case when count(*) > 50 then 50 else count(*) end as num_sampled from orders group by user_id
DEBUG 2018-01-17 20:42:08,414 - [Class] [0] A query type: SELECT
DEBUG 2018-01-17 20:42:08,427 - [DbmsImpala] About to run: describe instacart1g.orders
DEBUG 2018-01-17 20:42:08,428 - [DbmsImpala] A new statement id: 1545827753
DEBUG 2018-01-17 20:42:08,481 - [ApproxProjectedRelation] A query to db: SELECT (CASE WHEN count(*) < 50 THEN 1 ELSE 0 END) AS sparse, (CASE WHEN count(*) > 50 THEN 50 ELSE count(*) END) AS num_sampled FROM instacart1g.orders AS vt5
DEBUG 2018-01-17 20:42:08,481 - [DbmsImpala] About to run: SELECT (CASE WHEN count(*) < 50 THEN 1 ELSE 0 END) AS sparse, (CASE WHEN count(*) > 50 THEN 50 ELSE count(*) END) AS num_sampled FROM instacart1g.orders AS vt5
DEBUG 2018-01-17 20:42:08,481 - [DbmsImpala] A new statement id: 851912430
DEBUG 2018-01-17 20:42:08,963 - [VerdictJDBCContext] The query execution finished.
DEBUG 2018-01-17 20:42:08,964 - [VerdictStatement] Internal statement set to 851912430
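
To make the effect of the dropped 'group by' concrete, here is a minimal sketch that runs the same select list with and without the clause. It uses Python's built-in sqlite3 module as a stand-in for Impala, and the toy `orders` table (columns `order_id`, `user_id`) and the `run_queries` helper are assumptions made for illustration, not part of Verdict or the instacart1g schema.

```python
import sqlite3


def run_queries(rows):
    """Run the select list from this issue with and without GROUP BY.

    rows: iterable of (order_id, user_id) tuples for a toy orders table.
    Returns (grouped_result, ungrouped_result).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    select_list = (
        "SELECT CASE WHEN count(*) < 50 THEN 1 ELSE 0 END AS sparse, "
        "CASE WHEN count(*) > 50 THEN 50 ELSE count(*) END AS num_sampled "
        "FROM orders"
    )

    # Intended query: one output row per user_id.
    # ORDER BY is added only to make the output order deterministic.
    grouped = conn.execute(
        select_list + " GROUP BY user_id ORDER BY user_id"
    ).fetchall()

    # Shape of the query that was actually sent: no GROUP BY, so count(*)
    # is computed over the whole table and a single row comes back.
    ungrouped = conn.execute(select_list).fetchall()

    conn.close()
    return grouped, ungrouped


if __name__ == "__main__":
    # User 1 has 3 orders, user 2 has 60 orders (63 rows in total).
    rows = [(i, 1) for i in range(3)] + [(i, 2) for i in range(3, 63)]
    grouped, ungrouped = run_queries(rows)
    print(grouped)    # [(1, 3), (0, 50)]
    print(ungrouped)  # [(0, 50)]
```

With the clause present, each user gets its own `sparse` and `num_sampled` values. Without it, the result collapses to one row computed from the total row count, which matches the behaviour described above.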

@barzan
Contributor

barzan commented Jan 18, 2018

Thanks, James. Can you find the bug and push a fix ASAP? Thanks.

@jhmitchel
Author

jhmitchel commented Jan 19, 2018 via email

@pyongjoo
Member

pyongjoo commented Feb 2, 2018

James mentioned this error happens when a select list does not include aggregations.
