-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10908: [Rust][DataFusion] Update relevant tpch-queries with BETWEEN #8906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Codecov Report
@@ Coverage Diff @@
## master #8906 +/- ##
=======================================
Coverage 75.48% 75.48%
=======================================
Files 181 181
Lines 41649 41649
=======================================
Hits 31439 31439
Misses 10210 10210 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR @seddonm1 .
Between
is currently implemented as <=
and >=
, while the TPCH is implemented as <
and >
. They are semantically different (specially in types such as dates).
I do not know which one is right, but this introduces a (non-obvious) semantic change.
@jorgecarleitao I think it should have been using |
@@ -400,7 +400,7 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic | |||
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') | |||
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') | |||
) | |||
and l_shipdate > date '1995-01-01' and l_shipdate < date '1996-12-31' | |||
and l_shipdate between date '1995-01-01' and date '1996-12-31' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use the TPC-H Query validation section to resolve the use of <
or <=
etc by running the query against SF1 data and comparing the result
According to the spec, the expected answer is 123141078.23
at SF1
http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.2.pdf
page 38
REVENUE
123141078.23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea, I agree this would be very valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working with Andy to increase the testing provided by the TPC-H queries here: #8760 (comment)
The plan is to verify results against the correct results when run in test mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When implementing BETWEEN I went hunting for the ANSI SQL spec but the best I could do is find the Postgres implementation which confirms it is definitely INCLUSIVE.
https://www.postgresql.org/docs/13/functions-comparison.html
The BETWEEN predicate simplifies range tests:
a BETWEEN x AND y
is equivalent to
a >= x AND a <= y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I went through the spec, and there is indeed no definition of between
. As such, I am changing my review as neutral and will leave the experts take the decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the TPCH spec to confirm this query is closer to the actual query text of the spec. 👍
I filed a separate subtask to track this work https://issues.apache.org/jira/browse/ARROW-10817 as I think that is Arrow project standard (not to reuse tickets). |
…WEEN @Dandandan Fixes per apache#8892 (comment) Closes apache#8906 from seddonm1/update-tpch-queries Authored-by: Mike Seddon <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8906 +/- ##
=======================================
Coverage 75.48% 75.48%
=======================================
Files 181 181
Lines 41649 41649
=======================================
Hits 31439 31439
Misses 10210 10210 ☔ View full report in Codecov by Sentry. |
@Dandandan
Fixes per #8892 (comment)