[WIP] Basic implementation of JNumber conversions #42

mdedetrich · 2017-07-26T20:44:17Z

This is a very simple implementation of JNumber conversions (#37).

@Ichoran This version throws exceptions, but only for toBigDecimal and toDouble (because it also uses BigDecimal underneath). BigDecimal (unlike .toDouble) will throw an exception if the value is too large. Apart from reimplementing the BigDecimal parser to return an Option[BigDecimal] rather than throwing an exception, I am unaware of another solution.

BigDecimal is used in toDouble to because it uses BigDecimal to see if we have lost precision, i.e.

  private[ast] def toDouble(value: String): Option[Double] = {
    try {
      val asDouble = value.toDouble
      if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))
        Some(asDouble)
      else
        None
    } catch {
      case _: java.lang.NumberFormatException => None
    }
  }

I am not sure if there is a better way to detect if toDouble has lost precision due to conversion, but for now it works

TODO

Write tests (some basic tests of the .toInt and toLong demonstrates that it appears to work, but some more comprehensive tests/benchmarks are needed)
Write benchmarks

Current issues

There is a lot of code duplication in .toInt and .toLong. Although I can generalize this, there are a couple of problems

I don't want to involve macros, especially since this project is also supporting Scala 2.10
Scala really doesn't have a generic number type (unlike Spire for example https://github.com/non/spire). And if we are going to use something like Spire for a generic number type, we may as well change the constructor of JNumber to be a Real and we can call it a day. This means we can't really create a parameterized function over "Number"

We also don't support .toFloat, unless anyone is aware of a sane way to do this I don't think we will support it

There is now a constructedFlag which tracks how a JNumber is created

* Also removed JNumber double constructor for Scala.js

codecov-io · 2017-07-26T20:53:38Z

Codecov Report

Merging #42 into master will decrease coverage by 28.39%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
- Coverage   44.68%   16.29%   -28.4%     
==========================================
  Files           5        5              
  Lines         687      798     +111     
  Branches      133      229      +96     
==========================================
- Hits          307      130     -177     
- Misses        380      668     +288

Impacted Files	Coverage Δ
...s/src/main/scala/scalajson/ast/unsafe/JValue.scala	`0% <0%> (ø)`	⬆️
jvm/src/main/scala/scalajson/ast/JValue.scala	`0% <0%> (-73.4%)`	⬇️
js/src/main/scala/scalajson/ast/JValue.scala	`0% <0%> (ø)`	⬆️
shared/src/main/scala/scalajson/ast/package.scala	`27.54% <0%> (-30.24%)`	⬇️
...m/src/main/scala/scalajson/ast/unsafe/JValue.scala	`0% <0%> (-59.51%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd5637b...1d391aa. Read the comment docs.

Ichoran · 2017-07-26T21:08:50Z

shared/src/main/scala-2.10/scalajson.ast/package.scala

+  private[ast] def toDouble(value: String): Option[Double] = {
+    try {
+      val asDouble = value.toDouble
+      if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))


This isn't helpful functionality. Not only is it slow, it also isn't what one typically wants. If there is underflow or overflow, that is a problem; otherwise, the Double is only an approximation to the decimal value anyway, so you already know not to rely on it. Furthermore, I doubt that letting BigDecimal do the comparison is as fast as converting the double back to a string and letting the string comparison function have a go at it.
There are really three cases here:

Number is too big to represent. Return None.

Number is not zero but too close to zero to represent. Return None.

Number can be represented, but possibly inexactly. Return Some(value.toDouble)

Well actually, this solution (which is taken from argonaut) is consistent with how the other methods behave.

It will detect underflow or overflow, because the BigDecimal constructor will fail

It will also detect approximation issues, because when you construct a BigDecimal out of a raw string value it will not approximate the value (it uses real precision or it fails) and then we compare this to the value.toString)

Number can be represented, but possibly inexactly. Return Some(value.toDouble)

Actually in this case I believe we should still return None, else its not consistent with the other toInt/toBigInt/toBigDecimal functions, which do not return inexact results. Either we do not loss any data (precision, rounding or inexact representations) or we accept some loss of data. If you do accept this, and make this argument for Double, then you can argue that JNumber("1234.01").toInt should return Some(1234) (since its also an approximation).

Either we need to be consistent, or we don't. This is the same reason why .toFloat isn't there.

In regards to performance, if there is a more performant implementation I would love to use it. I am not sure if I have time to implement it correctly myself, considering that this is probably the last outstanding issue

@eed3si9n @travisbrown @sirthias Do you guys have a comment on this?

I am not too fussy about Double. I won't even be surprised if Double just always returns a value like Circe does (uses infinity when it overflows, and zero for under?). Just write in the Scaladoc that if you use Double accuracy is best-effort.

I'd argue that (1) We don't need to be consistent; (2) toFloat may as well be there (just don't wander through a Double on the way); and (3) if toDouble does validate loss of precision, I would never use toDouble. For example, suppose I get JSON serialized from Python:

>>> "%.18f" % (1.0/7) '0.142857142857142849'

And now we try to import it:

scala> BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble) res10: Boolean = false

This is really not what I want a library to do for me: make it hard to access data due to irrelevant inaccuracies in the representation when going from decimal to binary. It shouldn't be "best practice" to use .value.toDouble.

I think we are mixing concerns here, the point of .toDouble is to get a valid double out of a JNumber without losing any data (like any of the other .to functions), its not designed to be used for comparisons. For example, I am not seeing how

BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble)

Is relevant, the issue here is that testing for equality with Double's is problematic and you always need to have a range when testing for quality with a Double.

The point of toDouble specifically on a JNumber is saying is that "if we lose data due to precision or otherwise, it will return None". Nothing is preventing a user from going value.toDouble if they don't care about losing data, just as nothing is preventing a user from going value.toInt (or any other variant). Honestly, if a user is aware of the issues with using Double then they are going to know what JNumber("0.142857142857142849").value.toDouble is going to do and the risk (i.e. risk of losing data due to accuracy)

Equality of Double's is an orthogonal concern here, and honestly in this area no language gets it really right when comparing Doubles for equality. In regards to best practice, the only valid best practice I have heard is "don't use doubles unless you really need performance" (and this is why clojure and other lisps has ratio type in standard library by default, because the amount of problems typical users trip into when dealing with doubles was enough to justify it). The fact that using BigDecimal in Java is painful due to no operator overloading doesn't help, but that is somewhat of an offtopic issue.

I don't know what "without losing any data" means, at least not with this implementation.

scala> BigDecimal.exact(0.1) res11: scala.math.BigDecimal = 0.1000000000000000055511151231257827021181583404541015625

That's what the Double value actually is for the thing we call 0.1 but then try to represent as IEEE 754 binary64.

I think there are three justifiable positions, depending on how much work the library is supposed to do

Double conversion never fails; you get the "closest" Double possible even if it might be 0 or +- infinity. Plus side: super simple. Minus side: clearly unrepresentable stuff gets through.

Double conversion fails when clearly wrong ("1e-1000" or "1e1000" for instance) but otherwise gives the best finite nonzero approximation (or zero if it's truly zero). (TBD: policy for bad mismatches with subnormal numbers.) Plus: helpful to avoid the worst errors, matches compiler treatment of constants (if 1e-1000 -> Some(0.0)). Minus: have to think about errors, doesn't catch all of them.

Double conversion succeeds when the string is any valid rounding of a binary64 value encoded fully as decimals; otherwise it fails. So 0.1, 0.100000000000000006, 0.1000000000000000056, etc. are all fine. Plus: catches everything that is reasonably an error. Minus: catches a lot of things that seem as reasonable as anything else that Double does; very slow and lots of work to implement.

In general, Double always means "might have lost some data". Trying to save people from data loss in this particular case seems both difficult and almost-pointless. If 0.3 - (0.1+0.2) in Doubles isn't 0, and it's not, we have bigger problems than supposing that "0.11111111111111111" in JSON has a sensible Double representation. (It fails the "valid rounding" test.)

Ichoran · 2017-07-26T21:09:42Z

shared/src/main/scala/scalajson/ast/package.scala

+  private[ast] def toDouble(value: String): Option[Double] = {
+    try {
+      val asDouble = value.toDouble
+      if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))


Same comment as for 2.10

Ichoran · 2017-07-26T21:15:19Z

shared/src/main/scala-2.10/scalajson.ast/package.scala

+
+          if (!negativeEFlag) {
+            result *= radix
+            if (result < limit + digit && maxLenCheck)


This won't actually catch overflow, will it?

According to my basic tests it does, try the following

toInt("234223") toInt("234223e0") toInt("-234223") toInt("23422300.00e-1") toInt("23422300.00e-2") toInt("23422300.00e-3") toInt("2147483647") toInt("2147483648") toInt("214748364700e-2") toInt("214748364700.00e-2") toInt("214748364700e43") toInt("214748364700e0") toInt("2147483647e0”) toInt("2147483647e-0”)

Specifically toInt("2147483648") will return None (its overflowing by Integer.MAX_VALUE +1)

Oh, wait, I thought result was an Int! Yes, this works, but it's incredibly slow. In my library I have two things that serve a similar purpose, Jsonal's Double parsing and Grok's optional int parsing. Both are forty to fifty times faster than this (for 10 digit ints).

This is fine as a working placeholder, but not suitable for production.

@Ichoran What would you suggest as a suitable replacement for BigInt. I realise that its slow, but it will also work for any amount of zeroes (up until memory will run out actually). I guess its a question of how correct you want to be?

Or maybe I can adapt it by using result: Int for the common case, and then if it over/under flows switch it to a BigInt?

Just count zeros with an Int. You aren't going to have more than two billion because they won't fit in the string anyway, and likewise if the exponent isn't in that range you're not going to have a representable integer.
Then you need to detect when you're in danger of overflow and divide it into three cases based on leading digit: overflow impossible; overflow certain; and overflow possible. When overflow is possible and actually happens, the sign of result will flip. So you look for that.

The advantage of doing it this way (with no larger-representation number) is that, first, JS will be fast (since it likes 32 bit integers); and second, the algorithm for Long will be essentially unchanged (just the key values will be different), just like now.

Bumped versions

# Conflicts: # build.sbt # project/plugins.sbt

mdedetrich · 2017-12-04T22:06:07Z

@Ichoran @eed3si9n I have just revised the JNumber conversions as instructed. In particular, does the conversion to BigDecimal make sense?

If all is good then I will add some tests and merge into master.

Ichoran · 2017-12-04T22:38:23Z

BigDecimal looks fine to me, but you didn't do anything complicated there. It's the integer ones that have more potential for error.

mdedetrich · 2017-12-04T22:39:07Z

BigDecimal looks fine to me, but you didn't do anything complicated there. It's the integer ones that have more potential for error.

Yeah I have a test case for these, will add them in over the next few days

eed3si9n

Found def toFloat: Double.

Do we need Float types at all?

eed3si9n · 2017-12-09T04:21:16Z

shared/src/main/scala/scalajson/ast/package.scala

+  }
+
+  @inline private[ast] def toDouble(value: String): Double =
+    value.toDouble


eed3si9n · 2017-12-09T04:22:47Z

shared/src/main/scala/scalajson/ast/package.scala

+
+  private[ast] def toBigDecimal(value: String): Option[BigDecimal] = {
+    try {
+      Some(BigDecimal(value, MathContext.UNLIMITED))


I think this makes sense to so we can roundtrip.

eed3si9n · 2017-12-09T04:23:21Z

jvm/src/main/scala-2.10/scalajson.ast/unsafe/JValue.scala

+
+  def toDouble: Double = scalajson.ast.toDouble(value)
+
+  def toFloat: Double = scalajson.ast.toFloat(value)


Thanks! Fixed

mdedetrich · 2017-12-09T18:38:31Z

@eed3si9n The toFloat functions are there for completeness. There are issues when working with Float (which are well documented) but at least these functions behave correctly as expected

mdedetrich added 6 commits July 11, 2017 22:33

Added in various JNumber conversions #37

7db4283

There is now a constructedFlag which tracks how a JNumber is created

Fixed up some of the checks #37

8c5b1be

Fixed equals and hashcode #37

065fcad

Removed unused import

403d9b3

#37 Basic implementation of JNumber conversions

fcaa8d1

* Also removed JNumber double constructor for Scala.js

#37 Fixed up precision

a6f610b

Ichoran reviewed Jul 26, 2017

View reviewed changes

eed3si9n mentioned this pull request Jul 27, 2017

Adds JNumber conversions #41

Closed

mdedetrich added 6 commits August 1, 2017 19:16

#37 Improved performance

efabb1f

Bumped versions

#37 Fixed performance problem with BigInt

f755a21

#37 Added benchmarks

f73d435

#37 Fix benchmark code

acf9e14

Merge branch 'master' into jNumberConversions2

302f1dc

# Conflicts: # build.sbt # project/plugins.sbt

#37 Changed the JNumber conversion methods to follow semantics of types

f7539ad

mdedetrich changed the title ~~Basic implementation of JNumber conversions~~ [WIP] Basic implementation of JNumber conversions Dec 4, 2017

eed3si9n suggested changes Dec 9, 2017

View reviewed changes

#37 Typo in one of the return types for toFloat

1d391aa

mdedetrich merged commit 6262c4e into master Dec 11, 2017

mdedetrich deleted the jNumberConversions2 branch December 11, 2017 22:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Basic implementation of JNumber conversions #42

[WIP] Basic implementation of JNumber conversions #42

mdedetrich commented Jul 26, 2017 •

edited

Loading

codecov-io commented Jul 26, 2017 •

edited

Loading

Ichoran Jul 26, 2017

mdedetrich Jul 27, 2017 •

edited

Loading

mdedetrich Jul 27, 2017

eed3si9n Jul 27, 2017

Ichoran Jul 27, 2017

mdedetrich Jul 27, 2017 •

edited

Loading

Ichoran Jul 27, 2017 •

edited

Loading

Ichoran Jul 26, 2017

Ichoran Jul 26, 2017

mdedetrich Jul 27, 2017 •

edited

Loading

Ichoran Jul 27, 2017

mdedetrich Jul 27, 2017

Ichoran Jul 27, 2017

Ichoran Jul 27, 2017

mdedetrich commented Dec 4, 2017 •

edited

Loading

Ichoran commented Dec 4, 2017

mdedetrich commented Dec 4, 2017

eed3si9n left a comment

eed3si9n Dec 9, 2017

eed3si9n Dec 9, 2017

eed3si9n Dec 9, 2017

mdedetrich Dec 9, 2017

mdedetrich commented Dec 9, 2017


		def toDouble: Double = scalajson.ast.toDouble(value)

		def toFloat: Double = scalajson.ast.toFloat(value)

[WIP] Basic implementation of JNumber conversions #42

[WIP] Basic implementation of JNumber conversions #42

Conversation

mdedetrich commented Jul 26, 2017 • edited Loading

TODO

Current issues

codecov-io commented Jul 26, 2017 • edited Loading

Codecov Report

Choose a reason for hiding this comment

mdedetrich Jul 27, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdedetrich Jul 27, 2017 • edited Loading

Choose a reason for hiding this comment

Ichoran Jul 27, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdedetrich Jul 27, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdedetrich commented Dec 4, 2017 • edited Loading

Ichoran commented Dec 4, 2017

mdedetrich commented Dec 4, 2017

eed3si9n left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdedetrich commented Dec 9, 2017

mdedetrich commented Jul 26, 2017 •

edited

Loading

codecov-io commented Jul 26, 2017 •

edited

Loading

mdedetrich Jul 27, 2017 •

edited

Loading

mdedetrich Jul 27, 2017 •

edited

Loading

Ichoran Jul 27, 2017 •

edited

Loading

mdedetrich Jul 27, 2017 •

edited

Loading

mdedetrich commented Dec 4, 2017 •

edited

Loading