Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10908: [Rust][DataFusion] Update relevant tpch-queries with BETWEEN #8906

Closed
wants to merge 1 commit into from

Conversation

seddonm1
Copy link
Contributor

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@codecov-io
Copy link

Codecov Report

Merging #8906 (737b001) into master (5353c28) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8906   +/-   ##
=======================================
  Coverage   75.48%   75.48%           
=======================================
  Files         181      181           
  Lines       41649    41649           
=======================================
  Hits        31439    31439           
  Misses      10210    10210           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2c732d...737b001. Read the comment docs.

@github-actions
Copy link

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR @seddonm1 .

Between is currently implemented as <= and >=, while the TPCH is implemented as < and >. They are semantically different (specially in types such as dates).

I do not know which one is right, but this introduces a (non-obvious) semantic change.

@Dandandan
Copy link
Contributor

@jorgecarleitao I think it should have been using >= and <= before instead of being replaced in the benchmarks by > and <. So the change now fixes the behavior?

@@ -400,7 +400,7 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate > date '1995-01-01' and l_shipdate < date '1996-12-31'
and l_shipdate between date '1995-01-01' and date '1996-12-31'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the TPC-H Query validation section to resolve the use of < or <= etc by running the query against SF1 data and comparing the result

According to the spec, the expected answer is 123141078.23 at SF1

http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.2.pdf
page 38

REVENUE
123141078.23

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea, I agree this would be very valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working with Andy to increase the testing provided by the TPC-H queries here: #8760 (comment)

The plan is to verify results against the correct results when run in test mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing BETWEEN I went hunting for the ANSI SQL spec but the best I could do is find the Postgres implementation which confirms it is definitely INCLUSIVE.

https://www.postgresql.org/docs/13/functions-comparison.html

The BETWEEN predicate simplifies range tests:

a BETWEEN x AND y
is equivalent to

a >= x AND a <= y

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I went through the spec, and there is indeed no definition of between. As such, I am changing my review as neutral and will leave the experts take the decision.

jorgecarleitao
jorgecarleitao previously approved these changes Dec 14, 2020
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the TPCH spec to confirm this query is closer to the actual query text of the spec. 👍

@alamb
Copy link
Contributor

alamb commented Dec 14, 2020

I filed a separate subtask to track this work https://issues.apache.org/jira/browse/ARROW-10817 as I think that is Arrow project standard (not to reuse tickets).

@alamb alamb changed the title ARROW-10817: [Rust][DataFusion] Update relevant tpch-queries with BETWEEN ARROW-10908: [Rust][DataFusion] Update relevant tpch-queries with BETWEEN Dec 14, 2020
@alamb alamb closed this in 8df91c9 Dec 14, 2020
@seddonm1 seddonm1 deleted the update-tpch-queries branch January 4, 2021 22:47
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…WEEN

@Dandandan
Fixes per apache#8892 (comment)

Closes apache#8906 from seddonm1/update-tpch-queries

Authored-by: Mike Seddon <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.48%. Comparing base (5353c28) to head (737b001).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8906   +/-   ##
=======================================
  Coverage   75.48%   75.48%           
=======================================
  Files         181      181           
  Lines       41649    41649           
=======================================
  Hits        31439    31439           
  Misses      10210    10210           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants