Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Basic implementation of JNumber conversions #42

Merged
merged 13 commits into from
Dec 11, 2017

Conversation

mdedetrich
Copy link
Owner

@mdedetrich mdedetrich commented Jul 26, 2017

This is a very simple implementation of JNumber conversions (#37).

@Ichoran This version throws exceptions, but only for toBigDecimal and toDouble (because it also uses BigDecimal underneath). BigDecimal (unlike .toDouble) will throw an exception if the value is too large. Apart from reimplementing the BigDecimal parser to return an Option[BigDecimal] rather than throwing an exception, I am unaware of another solution.

BigDecimal is used in toDouble to because it uses BigDecimal to see if we have lost precision, i.e.

  private[ast] def toDouble(value: String): Option[Double] = {
    try {
      val asDouble = value.toDouble
      if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))
        Some(asDouble)
      else
        None
    } catch {
      case _: java.lang.NumberFormatException => None
    }
  }

I am not sure if there is a better way to detect if toDouble has lost precision due to conversion, but for now it works

TODO

  • Write tests (some basic tests of the .toInt and toLong demonstrates that it appears to work, but some more comprehensive tests/benchmarks are needed)
  • Write benchmarks

Current issues

There is a lot of code duplication in .toInt and .toLong. Although I can generalize this, there are a couple of problems

  1. I don't want to involve macros, especially since this project is also supporting Scala 2.10
  2. Scala really doesn't have a generic number type (unlike Spire for example https://github.com/non/spire). And if we are going to use something like Spire for a generic number type, we may as well change the constructor of JNumber to be a Real and we can call it a day. This means we can't really create a parameterized function over "Number"

We also don't support .toFloat, unless anyone is aware of a sane way to do this I don't think we will support it

@codecov-io
Copy link

codecov-io commented Jul 26, 2017

Codecov Report

Merging #42 into master will decrease coverage by 28.39%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
- Coverage   44.68%   16.29%   -28.4%     
==========================================
  Files           5        5              
  Lines         687      798     +111     
  Branches      133      229      +96     
==========================================
- Hits          307      130     -177     
- Misses        380      668     +288
Impacted Files Coverage Δ
...s/src/main/scala/scalajson/ast/unsafe/JValue.scala 0% <0%> (ø) ⬆️
jvm/src/main/scala/scalajson/ast/JValue.scala 0% <0%> (-73.4%) ⬇️
js/src/main/scala/scalajson/ast/JValue.scala 0% <0%> (ø) ⬆️
shared/src/main/scala/scalajson/ast/package.scala 27.54% <0%> (-30.24%) ⬇️
...m/src/main/scala/scalajson/ast/unsafe/JValue.scala 0% <0%> (-59.51%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd5637b...1d391aa. Read the comment docs.

private[ast] def toDouble(value: String): Option[Double] = {
try {
val asDouble = value.toDouble
if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't helpful functionality. Not only is it slow, it also isn't what one typically wants. If there is underflow or overflow, that is a problem; otherwise, the Double is only an approximation to the decimal value anyway, so you already know not to rely on it. Furthermore, I doubt that letting BigDecimal do the comparison is as fast as converting the double back to a string and letting the string comparison function have a go at it.
There are really three cases here:

  1. Number is too big to represent. Return None.
  2. Number is not zero but too close to zero to represent. Return None.
  3. Number can be represented, but possibly inexactly. Return Some(value.toDouble)

Copy link
Owner Author

@mdedetrich mdedetrich Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually, this solution (which is taken from argonaut) is consistent with how the other methods behave.

  1. It will detect underflow or overflow, because the BigDecimal constructor will fail
  2. It will also detect approximation issues, because when you construct a BigDecimal out of a raw string value it will not approximate the value (it uses real precision or it fails) and then we compare this to the value.toString)

Number can be represented, but possibly inexactly. Return Some(value.toDouble)

Actually in this case I believe we should still return None, else its not consistent with the other toInt/toBigInt/toBigDecimal functions, which do not return inexact results. Either we do not loss any data (precision, rounding or inexact representations) or we accept some loss of data. If you do accept this, and make this argument for Double, then you can argue that JNumber("1234.01").toInt should return Some(1234) (since its also an approximation).

Either we need to be consistent, or we don't. This is the same reason why .toFloat isn't there.

In regards to performance, if there is a more performant implementation I would love to use it. I am not sure if I have time to implement it correctly myself, considering that this is probably the last outstanding issue

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eed3si9n @travisbrown @sirthias Do you guys have a comment on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too fussy about Double. I won't even be surprised if Double just always returns a value like Circe does (uses infinity when it overflows, and zero for under?). Just write in the Scaladoc that if you use Double accuracy is best-effort.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that (1) We don't need to be consistent; (2) toFloat may as well be there (just don't wander through a Double on the way); and (3) if toDouble does validate loss of precision, I would never use toDouble. For example, suppose I get JSON serialized from Python:

>>> "%.18f" % (1.0/7)
'0.142857142857142849'

And now we try to import it:

scala> BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble)
res10: Boolean = false

This is really not what I want a library to do for me: make it hard to access data due to irrelevant inaccuracies in the representation when going from decimal to binary. It shouldn't be "best practice" to use .value.toDouble.

Copy link
Owner Author

@mdedetrich mdedetrich Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are mixing concerns here, the point of .toDouble is to get a valid double out of a JNumber without losing any data (like any of the other .to functions), its not designed to be used for comparisons. For example, I am not seeing how

BigDecimal("0.142857142857142849") == BigDecimal("0.142857142857142849".toDouble)

Is relevant, the issue here is that testing for equality with Double's is problematic and you always need to have a range when testing for quality with a Double.

The point of toDouble specifically on a JNumber is saying is that "if we lose data due to precision or otherwise, it will return None". Nothing is preventing a user from going value.toDouble if they don't care about losing data, just as nothing is preventing a user from going value.toInt (or any other variant). Honestly, if a user is aware of the issues with using Double then they are going to know what JNumber("0.142857142857142849").value.toDouble is going to do and the risk (i.e. risk of losing data due to accuracy)

Equality of Double's is an orthogonal concern here, and honestly in this area no language gets it really right when comparing Doubles for equality. In regards to best practice, the only valid best practice I have heard is "don't use doubles unless you really need performance" (and this is why clojure and other lisps has ratio type in standard library by default, because the amount of problems typical users trip into when dealing with doubles was enough to justify it). The fact that using BigDecimal in Java is painful due to no operator overloading doesn't help, but that is somewhat of an offtopic issue.

Copy link

@Ichoran Ichoran Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what "without losing any data" means, at least not with this implementation.

scala> BigDecimal.exact(0.1)
res11: scala.math.BigDecimal =
0.1000000000000000055511151231257827021181583404541015625

That's what the Double value actually is for the thing we call 0.1 but then try to represent as IEEE 754 binary64.

I think there are three justifiable positions, depending on how much work the library is supposed to do

  1. Double conversion never fails; you get the "closest" Double possible even if it might be 0 or +- infinity. Plus side: super simple. Minus side: clearly unrepresentable stuff gets through.
  2. Double conversion fails when clearly wrong ("1e-1000" or "1e1000" for instance) but otherwise gives the best finite nonzero approximation (or zero if it's truly zero). (TBD: policy for bad mismatches with subnormal numbers.) Plus: helpful to avoid the worst errors, matches compiler treatment of constants (if 1e-1000 -> Some(0.0)). Minus: have to think about errors, doesn't catch all of them.
  3. Double conversion succeeds when the string is any valid rounding of a binary64 value encoded fully as decimals; otherwise it fails. So 0.1, 0.100000000000000006, 0.1000000000000000056, etc. are all fine. Plus: catches everything that is reasonably an error. Minus: catches a lot of things that seem as reasonable as anything else that Double does; very slow and lots of work to implement.

In general, Double always means "might have lost some data". Trying to save people from data loss in this particular case seems both difficult and almost-pointless. If 0.3 - (0.1+0.2) in Doubles isn't 0, and it's not, we have bigger problems than supposing that "0.11111111111111111" in JSON has a sensible Double representation. (It fails the "valid rounding" test.)

private[ast] def toDouble(value: String): Option[Double] = {
try {
val asDouble = value.toDouble
if (BigDecimal(value, MathContext.UNLIMITED) == BigDecimal(asDouble, MathContext.UNLIMITED))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for 2.10


if (!negativeEFlag) {
result *= radix
if (result < limit + digit && maxLenCheck)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't actually catch overflow, will it?

Copy link
Owner Author

@mdedetrich mdedetrich Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my basic tests it does, try the following

toInt("234223")
toInt("234223e0")
toInt("-234223")
toInt("23422300.00e-1")
toInt("23422300.00e-2")
toInt("23422300.00e-3")
toInt("2147483647")
toInt("2147483648")
toInt("214748364700e-2")
toInt("214748364700.00e-2")
toInt("214748364700e43")
toInt("214748364700e0")
toInt("2147483647e0”)
toInt("2147483647e-0”)

Specifically toInt("2147483648") will return None (its overflowing by Integer.MAX_VALUE +1)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, I thought result was an Int! Yes, this works, but it's incredibly slow. In my library I have two things that serve a similar purpose, Jsonal's Double parsing and Grok's optional int parsing. Both are forty to fifty times faster than this (for 10 digit ints).

This is fine as a working placeholder, but not suitable for production.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ichoran What would you suggest as a suitable replacement for BigInt. I realise that its slow, but it will also work for any amount of zeroes (up until memory will run out actually). I guess its a question of how correct you want to be?

Or maybe I can adapt it by using result: Int for the common case, and then if it over/under flows switch it to a BigInt?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just count zeros with an Int. You aren't going to have more than two billion because they won't fit in the string anyway, and likewise if the exponent isn't in that range you're not going to have a representable integer.
Then you need to detect when you're in danger of overflow and divide it into three cases based on leading digit: overflow impossible; overflow certain; and overflow possible. When overflow is possible and actually happens, the sign of result will flip. So you look for that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of doing it this way (with no larger-representation number) is that, first, JS will be fast (since it likes 32 bit integers); and second, the algorithm for Long will be essentially unchanged (just the key values will be different), just like now.

@eed3si9n eed3si9n mentioned this pull request Jul 27, 2017
@mdedetrich
Copy link
Owner Author

mdedetrich commented Dec 4, 2017

@Ichoran @eed3si9n I have just revised the JNumber conversions as instructed. In particular, does the conversion to BigDecimal make sense?

If all is good then I will add some tests and merge into master.

@mdedetrich mdedetrich changed the title Basic implementation of JNumber conversions [WIP] Basic implementation of JNumber conversions Dec 4, 2017
@Ichoran
Copy link

Ichoran commented Dec 4, 2017

BigDecimal looks fine to me, but you didn't do anything complicated there. It's the integer ones that have more potential for error.

@mdedetrich
Copy link
Owner Author

BigDecimal looks fine to me, but you didn't do anything complicated there. It's the integer ones that have more potential for error.

Yeah I have a test case for these, will add them in over the next few days

Copy link
Contributor

@eed3si9n eed3si9n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found def toFloat: Double.

Do we need Float types at all?

}

@inline private[ast] def toDouble(value: String): Double =
value.toDouble
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


private[ast] def toBigDecimal(value: String): Option[BigDecimal] = {
try {
Some(BigDecimal(value, MathContext.UNLIMITED))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense to so we can roundtrip.


def toDouble: Double = scalajson.ast.toDouble(value)

def toFloat: Double = scalajson.ast.toFloat(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed

@mdedetrich
Copy link
Owner Author

@eed3si9n The toFloat functions are there for completeness. There are issues when working with Float (which are well documented) but at least these functions behave correctly as expected

@mdedetrich mdedetrich merged commit 6262c4e into master Dec 11, 2017
@mdedetrich mdedetrich deleted the jNumberConversions2 branch December 11, 2017 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants