Add possibility to convert String to Option[Int], Option[Long], Option[Boolean], etc #431
TBH I don't think this deserves a place in the standard library. There are lots of things that "would be nice" to have, but we should place the bar much higher than "nice to have", or we're going to end up again with zillions of stuff nobody uses like |
On the one hand, I agree: it's always nice to have the minimal amount of code, but on the other hand...
With this PR there is no maintenance cost, because the amount of code is extremely small and I cannot imagine how it could change. Even if the parsing logic changes, these helpers will stay the same. The same applies to evolution slowdown: this code will not require any evolution. Regarding complexity cost, I think it's compensated by the frequency of usage. These SO questions: link1, link2, link3, link4 have over 70 thousand views together. And that's only from people who didn't know the answer by heart; I googled it once but used it dozens of times after that. So I have to argue that the chance of these being classified as "stuff nobody uses" is rather small. |
Given that these are implemented by actually converting the string and checking whether that worked, it's not clear to me what the major use case is. Presumably you actually want the value out of the string in most cases, in which case using these methods just ends up converting the string twice. |
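To make the double-conversion concern concrete, here is a small sketch contrasting check-then-convert with a single Option-returning conversion. `DoubleParseSketch`, `isInt` and the two `doubledOrZero*` functions are hypothetical names for illustration, and both variants still use `Try` (that is, exceptions) under the hood:

```scala
import scala.util.Try

object DoubleParseSketch {
  // Hypothetical check-only helper, as proposed in the issue: parses, then discards the value
  def isInt(s: String): Boolean = Try(s.toInt).isSuccess

  // Check-then-convert: on a valid input the string is parsed twice
  def doubledOrZeroTwoPasses(s: String): Int =
    if (isInt(s)) s.toInt * 2 else 0

  // Option-returning conversion: the string is parsed exactly once and the value is kept
  def doubledOrZeroOnePass(s: String): Int =
    Try(s.toInt).toOption.map(_ * 2).getOrElse(0)
}
```

On a valid input the first variant runs `s.toInt` twice; the second runs it once.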
It's hard for me to come up with a particular use case, as it's been a while since I needed it, but something like this seems probable:
And even if you parse it n times, readability increases. |
You can do something like:
```scala
import scala.util.Try

// Extractor that matches strings which parse as an Int
object IntishString {
  def unapply(str: String): Option[Int] = Try(str.toInt).toOption
}

/* ... */
str match {
  case IntishString(intval) => /* ... */
  /* ... */
}
```
for that use case, which I think is somewhat more idiomatic and avoids double-parsing. I'd agree with Sébastien that these are a bit niche for the stdlib; having |
You really don't want to use try/catch to detect numbers in strings. It's incredibly slow (like 500x slower than inspecting the characters). |
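For comparison with try/catch, a minimal sketch of detecting and parsing an `Int` by inspecting characters directly. It assumes only ASCII `'0'` to `'9'` count as digits (unlike `java.lang.Integer.parseInt`, which also accepts other Unicode digits), and `FastIntParse.parseIntOption` is a hypothetical name, not a standard-library API. No exception is thrown for any input:

```scala
object FastIntParse {
  // Parses an optionally signed run of ASCII digits into an Int, returning None instead of throwing.
  // Assumption: only '0'..'9' are accepted as digits.
  def parseIntOption(s: String): Option[Int] = {
    if (s == null || s.isEmpty) return None
    val negative = s.charAt(0) == '-'
    val start = if (negative || s.charAt(0) == '+') 1 else 0
    if (start == s.length) return None // a lone sign is not a number
    // Largest magnitude allowed: 2^31 for negatives, 2^31 - 1 for positives
    val limit: Long = if (negative) -(Int.MinValue.toLong) else Int.MaxValue.toLong
    var acc = 0L
    var i = start
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < '0' || c > '9') return None
      acc = acc * 10 + (c - '0')
      if (acc > limit) return None // out of Int range: stop before acc can grow further
      i += 1
    }
    Some(if (negative) (-acc).toInt else acc.toInt)
  }
}
```

Accumulating in a `Long` lets the loop stop as soon as the magnitude exceeds the `Int` range, so overflow needs no exception either.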
I would love to see that discussion, if anyone knows where it is. I also think being able to get |
I don't remember where the discussion was, but having Adding methods that return an option, like |
Oh, I absolutely understand why changing its behaviour would break everything. Having other methods (or using an implicit to say whether it returns an |
I support adding |
I see we've got a discussion here :) Regarding custom extractors, sure, users can do that as well as many other things; it's mostly a matter of convenience. I think modifying the behaviour of I was considering If there will be no cons of |
The concern is, as @Ichoran mentioned, that using |
I was thinking about a less trivial implementation. |
Check out mdedetrich/scalajson#37 and related code. |
this is scala/bug#16 (yup, a double-digit bug number, the only one still open) |
Semantically, scala/bug#16 doesn't really address the cost of exceptions, and I think it is legitimate to want an implementation which returns There was a thread on contributors which brought up the problem of exceptions for |
IMO a method checking whether a string is an int/byte/whatever is a useless feature. If you want to be sure it's an The only option for a safe and fast |
Let me know if the following is correct: currently almost all the conversions are based on Java implementations like
Point 3 made me ask myself "how is this handled in scala-js and scala-native?", and as far as I can see both of these projects implement Anyway, I implemented a very basic benchmark here. It's probably highly inaccurate, but it may give some intuition anyway. Here are the results:
So according to this benchmark, the reimplementation gives us a 2.5x speedup. Should we try to do that? |
It's difficult to benchmark exceptions, because creation and unwinding depend on the stack depth. Benchmarking the "random" case isn't ideal either IMO. Separately benching the success and failure cases is IMO much better. The reimplementation and benchmarks I did can be found at https://github.com/martijnhoekstra/numparsers/blob/master/src/main/scala/StringConversions.scala and an abbreviated benchmark result overview at https://docs.google.com/spreadsheets/d/1TohVfP_KDkEvVV2fvRMXYvSYTYNkntpkGdP-NSVcikg/edit#gid=0 The short of it is that parsing to I still believe these reimplementations show reasonable performance, while handling the edge cases the same as |
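Following the suggestion to bench success and failure cases separately, a rough timing sketch using `System.nanoTime`. It is only an illustration: it has no real warm-up control, forking or statistical analysis, so a harness such as JMH is the better tool for numbers worth quoting. `ParseTiming`, `measure` and the input arrays are hypothetical:

```scala
import scala.util.Try

object ParseTiming {
  final case class Timing(avgNanos: Double, checksum: Long)

  // Runs `parse` over every input `rounds` times and reports the average cost per call.
  // The checksum consumes each result so the work is less likely to be optimised away.
  private def run(inputs: Array[String], rounds: Int, parse: String => Option[Int]): Timing = {
    var checksum = 0L
    val start = System.nanoTime()
    for (_ <- 0 until rounds; s <- inputs) checksum += parse(s).getOrElse(-1)
    val elapsed = System.nanoTime() - start
    Timing(elapsed.toDouble / (rounds.toLong * inputs.length), checksum)
  }

  // One untimed pass gives the JIT a chance to compile `parse`; only the second pass is reported.
  def measure(inputs: Array[String], rounds: Int)(parse: String => Option[Int]): Timing = {
    run(inputs, rounds, parse)
    run(inputs, rounds, parse)
  }

  // Baseline: exception-based conversion
  val viaTry: String => Option[Int] = s => Try(s.toInt).toOption

  // Success and failure inputs are kept apart so they can be timed separately
  val successInputs: Array[String] = Array("1", "-42", "123456", "2147483647")
  val failureInputs: Array[String] = Array("", "x", "12a", "2147483648")
}
```

For example, `ParseTiming.measure(ParseTiming.successInputs, 100000)(ParseTiming.viaTry).avgNanos` and the same call with `failureInputs` give separate per-call averages for the two cases; any other `String => Option[Int]` parser can be passed in place of `viaTry`.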
For reference, this is a good blog post about the costs of exceptions: https://shipilev.net/blog/2014/exceptional-performance/ |
Borrowing scala-native or scala-js's implementations (and possibly tweaking them to return |
Borrowing from Scala.js (resp. Scala Native) is not possible, because the implementations of those functions rely on primitive JS (resp. C) functions. |
production system brought to its knees by the expensiveness of |
I'd forgotten about this until that blog post jogged my memory, but in 2011 I sped up a major component of a production system 10x simply by replacing exception-catching number-parsing code with hand-rolled code that handled the commonest cases directly. (To my own astonishment at the time at how big the speedup was.) It really is a performance trap. |
So, as far as I understand, there is a consensus on having
If someone wants to pick this up and we want to keep the discussion record, I can give full access to my fork to anyone. I'm aware this is probably not the highest standard of contribution, but probably better than nothing :) |
No, you cannot legally copy the implem from Java. The Java implem is GPL and Scala's codebase is BSD. I said there was nothing to be copied in Scala.js' or Scala Native's implem. I most definitely did not say you should copy it from Java's implementation instead. |
(@sjrd is replying to a comment I posted and then immediately deleted when I realized I had just repeated Krever's "Can we just copy the code from legal PoV" question) |
@sjrd I never claimed you said that :) I just assumed it is the easiest way to go. I'm not a lawyer, but even now this code is not literally copied. It was translated from Java to Scala, and I introduced parameterized result handling and other changes, so it's far from a direct copy. Also, if it helps, I can state under oath that I manually retyped every character and only took inspiration from the Java impl. Also, I don't think it's in a state in which it could be considered ready for merge, so I think we can postpone this aspect of the discussion. I assume we would like to make it more "scalaish". |
Unfortunately, that's not how the GPL works. Even "taking inspiration" and "retyping every character" is considered a derivative work, and must therefore be released under the GPL as well. In general, the only safe assumption is not to even look at any GPL code when contributing to a BSD- or MIT-licensed codebase, such as Scala. Any reimplementation must be completely clean-room. |
@Krever - That isn't good enough either. You have to do a clean reimplementation. Typing the code over again, in a different language, with a few changes to the API is practically the definition of a "derivative work", and the GPL covers all derivative works (that is practically the point of the GPL). You can find BSD-or-compatible-licensed implementations of integer parsing in ScalaJSON https://github.com/mdedetrich/scalajson, and in Jsonal in my "KSE" library, among other places. |
My mistake: I did some research and this does, in fact, seem to be a derivative work. I still think there is no way to prove it once the refactoring is completed, but the way in which it was created probably still makes it a derivative work. Maybe I will find some time to rewrite this or use a BSD-compatible implementation, but I can't say for sure. Still, if anyone has any comments regarding how this should be implemented, now is probably a good moment for them. |
I removed the copyrighted code to take away the risk of being inspired by it. Signatures and implementation of derived parsers are my work completely. |
@Krever - Three other comments. (1) Unless you have benchmarked your code to be faster than the built-in exception-throwing flavor, you should not change from the original versions of |
@Krever I bet you didn't expect this might turn out to be so complicated :-) |
I've looked at the implementations in scala-native, and the implementations of |
Note that those implementations currently suffer from a bug in Character.digit, but that wouldn't bother the JVM implementation that has the correct behavior.
On Sun, Mar 18, 2018, 07:13 Nth wrote:
I've looked at the implementations in scala-native, and the implementations of Long.parseLong <https://github.com/scala-native/scala-native/blob/v0.3.6/javalib/src/main/scala/java/lang/Long.scala#L303-L348>, Integer.parseInt <https://github.com/scala-native/scala-native/blob/v0.3.6/javalib/src/main/scala/java/lang/Integer.scala#L289-L336>, Short.parseShort <https://github.com/scala-native/scala-native/blob/v0.3.6/javalib/src/main/scala/java/lang/Short.scala#L211-L217> and Byte.parseByte <https://github.com/scala-native/scala-native/blob/v0.3.6/javalib/src/main/scala/java/lang/Byte.scala#L211-L217> seem pretty portable (though those for Double.parseDouble and Float.parseFloat do not).
|
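As an illustration of a radix-aware parser in the same spirit (exception-free, with digits recognised by `Character.digit`), here is an independently written sketch for `Long`. It detects overflow with unsigned arithmetic via `java.lang.Long.divideUnsigned` and `java.lang.Long.compareUnsigned` (Java 8+), so `Long.MinValue` parses without special-casing. `RadixLongParse.parseLongOption` is a hypothetical name:

```scala
object RadixLongParse {
  // Parses an optionally signed string in the given radix (2..36) into a Long,
  // returning None instead of throwing. Digit recognition is delegated to Character.digit.
  def parseLongOption(s: String, radix: Int = 10): Option[Long] = {
    require(radix >= Character.MIN_RADIX && radix <= Character.MAX_RADIX, "radix out of range")
    if (s == null || s.isEmpty) return None
    val negative = s.charAt(0) == '-'
    val start = if (negative || s.charAt(0) == '+') 1 else 0
    if (start == s.length) return None // a lone sign is not a number
    // Largest allowed magnitude, read as an unsigned 64-bit value:
    // 2^63 for negatives (the bit pattern of Long.MinValue), 2^63 - 1 otherwise.
    val maxMagnitude = if (negative) Long.MinValue else Long.MaxValue
    var magnitude = 0L // treated as unsigned throughout
    var i = start
    while (i < s.length) {
      val d = Character.digit(s.charAt(i), radix)
      if (d < 0) return None
      // magnitude * radix + d must not exceed maxMagnitude (unsigned comparison)
      val bound = java.lang.Long.divideUnsigned(maxMagnitude - d, radix.toLong)
      if (java.lang.Long.compareUnsigned(magnitude, bound) > 0) return None
      magnitude = magnitude * radix + d
      i += 1
    }
    // Negating the bit pattern of 2^63 yields Long.MinValue, as required
    Some(if (negative) -magnitude else magnitude)
  }
}
```

Because digit recognition goes through `Character.digit`, the sketch's handling of non-ASCII digits is only as correct as the platform's `Character.digit`, which is exactly the concern raised above for scala-native.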
I've changed the issue title to reflect the move away from returning |
scala/scala#6538 follows up on this |
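Scala 2.13 and later ship `toIntOption`, `toLongOption`, `toBooleanOption` and related methods on `StringOps`, which return `None` instead of throwing. A short usage sketch, assuming Scala 2.13+ (the object name is only for illustration):

```scala
// Requires Scala 2.13 or later, where StringOps provides the *Option conversions
object StdlibOptionConversions {
  val parsed: Option[Int] = "42".toIntOption                // Some(42)
  val notANumber: Option[Int] = "4x2".toIntOption           // None
  val tooBigForInt: Option[Int] = "9999999999".toIntOption  // None (out of Int range)
  val fitsInLong: Option[Long] = "9999999999".toLongOption  // Some(9999999999L)
  val flag: Option[Boolean] = "true".toBooleanOption        // Some(true)
}
```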
@Krever thank you very much for getting the ball rolling and pushing it as far as you did 👏 |
I wish I had more time for this, but I'm extremely happy that someone picked this up. Thanks @martijnhoekstra! @SethTisue I think we can close it now. |
I thought it might be worth considering adding several methods that verify whether a given string is a valid int/long/etc. On multiple occasions, I wrote something like this: `Try(s.toInt).isSuccess`, which is both inconvenient and suboptimal.
collections or collections-contrib
In general, I will be happy to apply any changes needed; just let me know.
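The request can be sketched as extension methods for pre-2.13 Scala. This is a hypothetical shape (`StringConversionSyntax`, `toIntOpt`, `toLongOpt`, `toBooleanOpt` and `isInt` are illustrative names, not an existing API). The numeric conversions still go through `Try`, so they keep the exception cost discussed in this thread, and `isInt` is derived from the Option-returning variant rather than implemented separately:

```scala
import scala.util.Try

object StringConversionSyntax {
  // Hypothetical extension methods in the spirit of this proposal
  implicit class StringConversionOps(private val s: String) extends AnyVal {
    // Exception-based under the hood: fine for convenience, not for hot paths
    def toIntOpt: Option[Int] = Try(s.toInt).toOption
    def toLongOpt: Option[Long] = Try(s.toLong).toOption

    // Case-insensitive "true"/"false", decided without throwing
    def toBooleanOpt: Option[Boolean] =
      if (s == null) None
      else s.toLowerCase match {
        case "true"  => Some(true)
        case "false" => Some(false)
        case _       => None
      }

    // The check is defined via the Option-returning conversion, so it parses only once
    def isInt: Boolean = toIntOpt.isDefined
  }
}
```

With `import StringConversionSyntax._` in scope, `"42".toIntOpt` is `Some(42)` and `"yes".toBooleanOpt` is `None`.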