Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DATE_TRUNC Optimizer #14385

Draft
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

ashishjayamohan
Copy link
Contributor

@ashishjayamohan ashishjayamohan commented Nov 5, 2024

@codecov-commenter
Copy link

codecov-commenter commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 0% with 98 lines in your changes missing coverage. Please review.

Project coverage is 34.56%. Comparing base (59551e4) to head (8b58e9e).
Report is 1376 commits behind head on master.

Files with missing lines Patch % Lines
...optimizer/filter/TimePredicateFilterOptimizer.java 0.00% 97 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (8b58e9e). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (59551e4) HEAD (8b58e9e)
skip-bytebuffers-false 7 6
unittests 5 3
unittests1 2 0
java-11 5 4
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #14385       +/-   ##
=============================================
- Coverage     61.75%   34.56%   -27.19%     
- Complexity      207      779      +572     
=============================================
  Files          2436     2673      +237     
  Lines        133233   146836    +13603     
  Branches      20636    22512     +1876     
=============================================
- Hits          82274    50757    -31517     
- Misses        44911    91971    +47060     
+ Partials       6048     4108     -1940     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 34.54% <0.00%> (-27.17%) ⬇️
java-21 34.56% <0.00%> (-27.07%) ⬇️
skip-bytebuffers-false 34.55% <0.00%> (-27.20%) ⬇️
skip-bytebuffers-true 34.54% <0.00%> (+6.81%) ⬆️
temurin 34.56% <0.00%> (-27.19%) ⬇️
unittests 34.56% <0.00%> (-27.19%) ⬇️
unittests1 ?
unittests2 34.56% <0.00%> (+6.83%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@ashishjayamohan ashishjayamohan changed the title [draft] Optimize DATETRUNC in Predicate Add DATE_TRUNC Optimizer Nov 6, 2024
@ashishjayamohan ashishjayamohan marked this pull request as ready for review November 6, 2024 00:59
@ashishjayamohan
Copy link
Contributor Author

@jadami10 Made that optimizer enhancement here. Let me know if anything looks off!

Copy link
Contributor

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first, thank you for doing this!

took an initial look and left some nit comments, some testing ideas, and some ideas about stuff that looks like it might break.

I haven't reviewed the actual algorithm yet


@Test
public void testDateTruncOptimizer() {
testDateTrunc("datetrunc('DAY', col) < 1620777600000", new Range("0", true, "1620777600000", false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we need a test with an INT instead of long? I believe that's also a valid time type in pinot
  2. can we have test with week or month truncation
  3. can we have a test case where it's not a support function. as in IN to make sure nothing is erroring
  4. also a test case where the time granularity is unsupported? I'm not sure if calcite will catch that before the optimizer, but we do use DAY as an input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jadami10. Thanks for this insight! I've been working on this for the past couple days (specifically on the time zone test). I've introduced several time zone tests and my implementation seems to work only for some time zone usages.
If you're willing, would you be able to write up a quick draft of what the algorithm should look like to convert the date_trunc function with time zones to a range query (essentially, the floor and ceiling inverse of date trunc). I think it would be beneficial to hear it from another perspective to find what I'm missing.

@ashishjayamohan ashishjayamohan marked this pull request as draft November 12, 2024 00:43
Copy link
Contributor

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think your approach to convert the date trunc predicates to floor/ceiling sounds good. I left some comments where I'm confused about why the implementation drifts for similar functions

upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
upperInclusive = true;
upperMillis = dateTruncCeil(operands);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this recomputed here?

lowerMillis = Long.MIN_VALUE;
upperInclusive = false;
upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we checking this here but not in in GREATER_THAN?

break;
case GREATER_THAN_OR_EQUAL:
operands.set(1, getExpression(getLongValue(filterOperands.get(1)), new DateTimeFormatSpec("TIMESTAMP")));
lowerMillis = dateTruncFloor(operands);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be ceil?

// testDateTrunc("datetrunc('DAY', col, 'DAYS', 'CET', 'MILLISECONDS') = 39193714800000",
// new Range("453631", true, "453631", true));
testDateTrunc("datetrunc('DAY', col, 'MILLISECONDS', 'UTC', 'DAYS') = 453630",
new Range("39193632000000", true, "39193718399999", true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks strange. 39193632000000 is like 1000 years from now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time Pruner should support queries where we're operating on the time column itself
4 participants