-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Basic implementation of JNumber conversions #42
Conversation
There is now a constructedFlag which tracks how a JNumber is created
* Also removed JNumber double constructor for Scala.js
Codecov Report
@@ Coverage Diff @@
## master #42 +/- ##
==========================================
- Coverage 44.68% 16.29% -28.4%
==========================================
Files 5 5
Lines 687 798 +111
Branches 133 229 +96
==========================================
- Hits 307 130 -177
- Misses 380 668 +288
Continue to review full report at Codecov.
|
private[ast] def toDouble(value: String): Option[Double] = { | ||
try { | ||
val asDouble = value.toDouble | ||
if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't helpful functionality. Not only is it slow, it also isn't what one typically wants. If there is underflow or overflow, that is a problem; otherwise, the Double is only an approximation to the decimal value anyway, so you already know not to rely on it. Furthermore, I doubt that letting BigDecimal do the comparison is as fast as converting the double back to a string and letting the string comparison function have a go at it.
There are really three cases here:
- Number is too big to represent. Return
None
. - Number is not zero but too close to zero to represent. Return
None
. - Number can be represented, but possibly inexactly. Return
Some(value.toDouble)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well actually, this solution (which is taken from argonaut) is consistent with how the other methods behave.
- It will detect underflow or overflow, because the
BigDecimal
constructor will fail - It will also detect approximation issues, because when you construct a
BigDecimal
out of a raw string value it will not approximate the value (it uses real precision or it fails) and then we compare this to thevalue.toString
)
Number can be represented, but possibly inexactly. Return Some(value.toDouble)
Actually in this case I believe we should still return None
, else its not consistent with the other toInt
/toBigInt
/toBigDecimal
functions, which do not return inexact results. Either we do not loss any data (precision, rounding or inexact representations) or we accept some loss of data. If you do accept this, and make this argument for Double
, then you can argue that JNumber("1234.01").toInt
should return Some(1234)
(since its also an approximation).
Either we need to be consistent, or we don't. This is the same reason why .toFloat
isn't there.
In regards to performance, if there is a more performant implementation I would love to use it. I am not sure if I have time to implement it correctly myself, considering that this is probably the last outstanding issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eed3si9n @travisbrown @sirthias Do you guys have a comment on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too fussy about Double
. I won't even be surprised if Double
just always returns a value like Circe does (uses infinity when it overflows, and zero for under?). Just write in the Scaladoc that if you use Double
accuracy is best-effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that (1) We don't need to be consistent; (2) toFloat may as well be there (just don't wander through a Double
on the way); and (3) if toDouble
does validate loss of precision, I would never use toDouble
. For example, suppose I get JSON serialized from Python:
>>> "%.18f" % (1.0/7)
'0.142857142857142849'
And now we try to import it:
scala> BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble)
res10: Boolean = false
This is really not what I want a library to do for me: make it hard to access data due to irrelevant inaccuracies in the representation when going from decimal to binary. It shouldn't be "best practice" to use .value.toDouble
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are mixing concerns here, the point of .toDouble
is to get a valid double out of a JNumber
without losing any data (like any of the other .to
functions), its not designed to be used for comparisons. For example, I am not seeing how
BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble)
Is relevant, the issue here is that testing for equality with Double
's is problematic and you always need to have a range when testing for quality with a Double
.
The point of toDouble
specifically on a JNumber
is saying is that "if we lose data due to precision or otherwise, it will return None
". Nothing is preventing a user from going value.toDouble
if they don't care about losing data, just as nothing is preventing a user from going value.toInt
(or any other variant). Honestly, if a user is aware of the issues with using Double
then they are going to know what JNumber("0.142857142857142849").value.toDouble
is going to do and the risk (i.e. risk of losing data due to accuracy)
Equality of Double
's is an orthogonal concern here, and honestly in this area no language gets it really right when comparing Double
s for equality. In regards to best practice, the only valid best practice I have heard is "don't use doubles unless you really need performance" (and this is why clojure and other lisps has ratio type in standard library by default, because the amount of problems typical users trip into when dealing with doubles was enough to justify it). The fact that using BigDecimal
in Java is painful due to no operator overloading doesn't help, but that is somewhat of an offtopic issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what "without losing any data" means, at least not with this implementation.
scala> BigDecimal.exact(0.1)
res11: scala.math.BigDecimal =
0.1000000000000000055511151231257827021181583404541015625
That's what the Double
value actually is for the thing we call 0.1
but then try to represent as IEEE 754 binary64.
I think there are three justifiable positions, depending on how much work the library is supposed to do
- Double conversion never fails; you get the "closest"
Double
possible even if it might be 0 or +- infinity. Plus side: super simple. Minus side: clearly unrepresentable stuff gets through. - Double conversion fails when clearly wrong ("1e-1000" or "1e1000" for instance) but otherwise gives the best finite nonzero approximation (or zero if it's truly zero). (TBD: policy for bad mismatches with subnormal numbers.) Plus: helpful to avoid the worst errors, matches compiler treatment of constants (if 1e-1000 -> Some(0.0)). Minus: have to think about errors, doesn't catch all of them.
- Double conversion succeeds when the string is any valid rounding of a binary64 value encoded fully as decimals; otherwise it fails. So 0.1, 0.100000000000000006, 0.1000000000000000056, etc. are all fine. Plus: catches everything that is reasonably an error. Minus: catches a lot of things that seem as reasonable as anything else that Double does; very slow and lots of work to implement.
In general, Double
always means "might have lost some data". Trying to save people from data loss in this particular case seems both difficult and almost-pointless. If 0.3 - (0.1+0.2)
in Double
s isn't 0
, and it's not, we have bigger problems than supposing that "0.11111111111111111" in JSON has a sensible Double
representation. (It fails the "valid rounding" test.)
private[ast] def toDouble(value: String): Option[Double] = { | ||
try { | ||
val asDouble = value.toDouble | ||
if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for 2.10
|
||
if (!negativeEFlag) { | ||
result *= radix | ||
if (result < limit + digit && maxLenCheck) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't actually catch overflow, will it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my basic tests it does, try the following
toInt("234223")
toInt("234223e0")
toInt("-234223")
toInt("23422300.00e-1")
toInt("23422300.00e-2")
toInt("23422300.00e-3")
toInt("2147483647")
toInt("2147483648")
toInt("214748364700e-2")
toInt("214748364700.00e-2")
toInt("214748364700e43")
toInt("214748364700e0")
toInt("2147483647e0”)
toInt("2147483647e-0”)
Specifically toInt("2147483648")
will return None
(its overflowing by Integer.MAX_VALUE
+1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, I thought result
was an Int
! Yes, this works, but it's incredibly slow. In my library I have two things that serve a similar purpose, Jsonal's Double parsing and Grok's optional int parsing. Both are forty to fifty times faster than this (for 10 digit ints).
This is fine as a working placeholder, but not suitable for production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ichoran What would you suggest as a suitable replacement for BigInt
. I realise that its slow, but it will also work for any amount of zeroes (up until memory will run out actually). I guess its a question of how correct you want to be?
Or maybe I can adapt it by using result: Int
for the common case, and then if it over/under flows switch it to a BigInt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just count zeros with an Int
. You aren't going to have more than two billion because they won't fit in the string anyway, and likewise if the exponent isn't in that range you're not going to have a representable integer.
Then you need to detect when you're in danger of overflow and divide it into three cases based on leading digit: overflow impossible; overflow certain; and overflow possible. When overflow is possible and actually happens, the sign of result
will flip. So you look for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantage of doing it this way (with no larger-representation number) is that, first, JS will be fast (since it likes 32 bit integers); and second, the algorithm for Long will be essentially unchanged (just the key values will be different), just like now.
Bumped versions
# Conflicts: # build.sbt # project/plugins.sbt
BigDecimal looks fine to me, but you didn't do anything complicated there. It's the integer ones that have more potential for error. |
Yeah I have a test case for these, will add them in over the next few days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found def toFloat: Double
.
Do we need Float
types at all?
} | ||
|
||
@inline private[ast] def toDouble(value: String): Double = | ||
value.toDouble |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
private[ast] def toBigDecimal(value: String): Option[BigDecimal] = { | ||
try { | ||
Some(BigDecimal(value, MathContext.UNLIMITED)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense to so we can roundtrip.
|
||
def toDouble: Double = scalajson.ast.toDouble(value) | ||
|
||
def toFloat: Double = scalajson.ast.toFloat(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed
@eed3si9n The |
This is a very simple implementation of JNumber conversions (#37).
@Ichoran This version throws exceptions, but only for
toBigDecimal
andtoDouble
(because it also usesBigDecimal
underneath).BigDecimal
(unlike.toDouble
) will throw an exception if the value is too large. Apart from reimplementing theBigDecimal
parser to return anOption[BigDecimal]
rather than throwing an exception, I am unaware of another solution.BigDecimal
is used intoDouble
to because it usesBigDecimal
to see if we have lost precision, i.e.I am not sure if there is a better way to detect if
toDouble
has lost precision due to conversion, but for now it worksTODO
.toInt
andtoLong
demonstrates that it appears to work, but some more comprehensive tests/benchmarks are needed)Current issues
There is a lot of code duplication in
.toInt
and.toLong
. Although I can generalize this, there are a couple of problems2.10
JNumber
to be aReal
and we can call it a day. This means we can't really create a parameterized function over "Number"We also don't support
.toFloat
, unless anyone is aware of a sane way to do this I don't think we will support it