Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various Decimal Type Fixes #2326

Merged
merged 33 commits into from
Feb 16, 2024
Merged

Various Decimal Type Fixes #2326

merged 33 commits into from
Feb 16, 2024

Conversation

jycor
Copy link
Contributor

@jycor jycor commented Feb 9, 2024

This PR has a variety of fixes to have arithmetic operations (especially those involving decimals) behave more like MySQL.

The logic for the Type() method for Arthmetic and Div is simpler, and better tested.

When comparing Decimal results from division operations, MySQL has an internal Scale that is different than the Scale used when returning Decimal results for display.

Here's a matrix displaying the resulting scale:
(Ex: 1 / 3 = 0.333333333; scale 0 div scale 0 should return scale 9)
image

Additionally, this PR adds test for arithmetic operations over Datetime and Year types. There are still a some problems dealing with precision and parsing there...

Note: I believe the division optimization where float division is preferred over decimal division for internal calculations may be causing problems. More testing is needed to see if it's possible to enable this without causing inaccuracies/precision loss.

There are microbenchmarks measuring the performance of div expression, and it turns out these changes actually greatly improve the runtime.

Correctness: dolthub/dolt#7484

Fixes

@jycor jycor marked this pull request as draft February 9, 2024 23:29
@jycor jycor marked this pull request as ready for review February 13, 2024 23:22
@jycor jycor changed the title James/in Various Decimal Type Fixes Feb 13, 2024
Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have a couple of files left to dig into, but wanted to go ahead and get a first batch of comments out for ya. Overall, looking very good, just a few small things that are worth cleaning up.

enginetest/queries/queries.go Outdated Show resolved Hide resolved
sql/expression/div.go Outdated Show resolved Hide resolved
sql/expression/div.go Outdated Show resolved Hide resolved
Comment on lines 137 to 139
// TODO: need to use the precision stored in datetimeType; something like
// return MustCreateDatetimeType(sqltypes.Datetime, ...)
return types.Datetime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this affect any results? Looks like we only use the Datetime.precision value when converting to a time type, but I couldn't think of a quick arithmetic op example that would trigger this. I did notice the plan changes for tpcc dropped the precision, so seems like casting those values to times would have different behavior before/after.

Seems like we're close enough here that it'd be nice to finish this one if we can. Otherwise, it is safer (less of a behavior change) to continue returning types.DatetimeMaxPrecision?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat affects results; more specifically it impacts the result type.

When performing arithmetic with DateTime, the precision determines if the result will be a Decimal or Integer.
Arithmetic involving a DateTime(0) should return an Integer.
Otherwise, arithmetic involving DateTime(x), returns a Decimal(<prec>, x).

After reimplementing the Arithmetic.Type() and Div.Type(), the tests with STR_TO_DATE broke because they were returning the wrong type.
We started returning Decimal instead of Integers because of the DateTime Precision.

{
	Query: "SELECT STR_TO_DATE('01,5,2013 09:30:17','%d,%m,%Y %h:%i:%s') - (STR_TO_DATE('01,5,2013 09:30:17','%d,%m,%Y %h:%i:%s') - INTERVAL 1 SECOND)",
	Expected: []sql.Row{{int64(1)}},
},

This also lead to the discovery of this broken query:

{
	// TODO:  need to properly handle datetime precision
	Query:    `SELECT STR_TO_DATE('01,5,2013 09:30:17','%d,%m,%Y %h:%i:%s %f') - (STR_TO_DATE('01,5,2013 09:30:17','%d,%m,%Y %h:%i:%s') - INTERVAL 1 SECOND)`,
	Expected: []sql.Row{{int64(1)}},
},

sql/expression/comparison.go Outdated Show resolved Hide resolved
Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. You did an awesome job simplifying the division expression evaluation logic. 🙌

Assuming the perf testing we talked about goes as expected, I think this one is good to go! I didn't wanna approve and jinx us though. 😜

A couple todos/loose ends that would be nice to clean up in future PRs (mod scale, int div, datetime precision), but since all the existing tests pass for those, this seems like a fine milestone to get merged in.

Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the benchmarks you added and ran them on my machine for this branch and main and the results look awesome. Seems like generally around two orders of magnitude better perf for division operations. Awesome, awesome work. I think you're right that the unnecessary atomic var use was one of the big culprits.

Really great job with all of this! Thanks for all the extra diligence with the perf testing.

@jycor jycor merged commit d69d7a2 into main Feb 16, 2024
7 checks passed
@jycor jycor deleted the james/in branch February 16, 2024 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants