Regular expressions #14
Comments
agreed bob
|
it's a good suggestion, we should take it. bob
|
I would argue that REs are outside the scope of the Basis Library. One of the design principles of the Basis Library is to focus on features that are ubiquitous (e.g., options and lists), require compiler support (e.g., arrays and numerics), or require OS/runtime support. This is why the Basis doesn't include sets, maps, hash tables, or regular expressions.
I think that the problem is really one of library distribution. As the OP notes, there are existing RE libraries for SML, but they are not universally available. Furthermore, there is lots of other useful code that we have packaged up in the SML/NJ Library, but that is only available on SML/NJ and MLton. Smackage may be the right solution to this problem, but I haven't tried using it yet. (It also does not appear to support Moscow ML or Poly ML at this time.)
With respect to REs specifically, this is a very complicated design space. There is the issue of syntax and features, but there is also a large space of different implementation strategies with different performance characteristics (NFA with backtracking, NFA simulation, NFA->DFA translation, partial derivatives, derivatives, cached NFA->DFA, etc.). What requirements would we place on a standardized Basis Library specification?
|
Let’s separate the end-user issue from the implementor issue. From an end-user point of view, there ought to exist (one or more?) portable regular expression libraries for writing code. People these days rely a lot on regular expressions being available, and use them in quite a lot of production code. This fact can be traced back to Dave MacQueen, who was responsible for regular expression search being in Emacs! I think we can agree that users should have easy access to such a library or libraries.
From an implementor point of view, there is the question of which library. Agreed, probably not in the thing called the Standard Basis, but then where? Smackage is probably the right answer, and it would be great to get help in ensuring that it supports all the relevant compilers. It was a quick effort to provide needed functionality that has long been sorely lacking.
And then there is the question of exactly which regular expression library. Well, if modules are good for anything, it is exactly in abstracting from this. There ought to be as many implementations of the same signature as anyone thinks are important to have, the more the better. Unlike crappier languages we absolutely do not have to commit to any one implementation of regular expression matching!
Bob
|
I agree completely with your third paragraph. The RE library that I (and others) implemented in the SML/NJ library is designed to use the module system to support multiple front ends and multiple back ends. While it works reasonably well, it could use an overhaul and modernization.
|
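To make the module-system point in the comments above concrete, here is a minimal sketch in Standard ML. Everything in it is hypothetical: the `re` datatype, the `MATCHER` signature, and the structure names are invented for illustration and are not the SML/NJ Library's actual interface. It shows one signature with two back ends (backtracking, and Brzozowski derivatives, two of the strategies listed earlier in the thread) and a client functor that works with either.

```sml
(* Hypothetical abstract syntax shared by every back end. *)
datatype re
  = Empty            (* matches no string *)
  | Eps              (* matches only the empty string *)
  | Chr of char
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* Hypothetical signature; not taken from any existing library. *)
signature MATCHER =
sig
  val matches : re -> string -> bool   (* whole-string match *)
end

(* Back end 1: backtracking, written with continuations. *)
structure Backtrack :> MATCHER =
struct
  (* m r cs k: match some prefix of cs against r, then hand the rest to k *)
  fun m Empty _ _ = false
    | m Eps cs k = k cs
    | m (Chr c) (c' :: cs) k = (c = c') andalso k cs
    | m (Chr _) [] _ = false
    | m (Seq (a, b)) cs k = m a cs (fn cs' => m b cs' k)
    | m (Alt (a, b)) cs k = m a cs k orelse m b cs k
    | m (Star a) cs k =
        k cs orelse
        (* require progress so that a nullable body cannot loop forever *)
        m a cs (fn cs' => length cs' < length cs andalso m (Star a) cs' k)

  fun matches r s = m r (String.explode s) List.null
end

(* Back end 2: Brzozowski derivatives. *)
structure Deriv :> MATCHER =
struct
  fun nullable Empty = false
    | nullable Eps = true
    | nullable (Chr _) = false
    | nullable (Seq (a, b)) = nullable a andalso nullable b
    | nullable (Alt (a, b)) = nullable a orelse nullable b
    | nullable (Star _) = true

  (* d c r matches s exactly when r matches c followed by s *)
  fun d _ Empty = Empty
    | d _ Eps = Empty
    | d c (Chr c') = if c = c' then Eps else Empty
    | d c (Seq (a, b)) =
        if nullable a then Alt (Seq (d c a, b), d c b) else Seq (d c a, b)
    | d c (Alt (a, b)) = Alt (d c a, d c b)
    | d c (Star a) = Seq (d c a, Star a)

  fun matches r s =
    nullable (List.foldl (fn (c, r') => d c r') r (String.explode s))
end

(* Client code depends only on the signature, never on a back end. *)
functor Client (M : MATCHER) =
struct
  (* the pattern (ab)*c, built directly as abstract syntax *)
  val pattern = Seq (Star (Seq (Chr #"a", Chr #"b")), Chr #"c")
  fun check s = M.matches pattern s
end

structure C1 = Client (Backtrack)
structure C2 = Client (Deriv)
val bothAccept = C1.check "ababc" andalso C2.check "ababc"   (* true *)
```

A front end (POSIX, AWK, or another concrete syntax) would be a separate module that parses strings into `re`, so syntax and matching strategy can vary independently. A real signature would also need to expose match positions and submatches, which this sketch leaves out.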
This is a feature request for adding regular expressions to the basic library of Successor-ML.
I assume the need for them and their usefulness are obvious; if not, I can elaborate upon request.
Some SML implementations already have them.
Moscow ML has a POSIX 1003.2 variant:
http://mosml.org/mosmllib/Regex.html
SML/NJ has a variant with multiple syntaxes, currently AWK syntax:
http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html
However, the lack of regular expressions in the Basis Library has dire consequences for everyday programming:
- Availability: many implementations don't have them, e.g. Poly/ML;
- Portability: a program written to use one implementation's regular expressions won't run on another's.
Which syntax is chosen is, in my opinion, insignificant, but the multi-syntax approach seems more versatile.
Bonus points for direct support for matching a word boundary (\b, or \< and \> in most syntaxes), a very useful feature in my experience, which both cited implementations currently lack.
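For readers unfamiliar with the feature, the following is a small sketch of what a word-boundary test means, assuming the common Perl-style definition in which word characters are letters, digits, and underscore. The function names are hypothetical and do not come from any of the libraries cited above.

```sml
(* Word characters under the assumed Perl-style definition. *)
fun isWordChar c = Char.isAlphaNum c orelse c = #"_"

(* True when position i of s (0 <= i <= size s) has a word character on
   exactly one side; string edges count as non-word characters. *)
fun atWordBoundary (s, i) =
  let
    val left = i > 0 andalso isWordChar (String.sub (s, i - 1))
    val right = i < String.size s andalso isWordChar (String.sub (s, i))
  in
    left <> right
  end
```

With this definition, `atWordBoundary ("foo bar", 3)` holds (between `o` and the space) while `atWordBoundary ("foo bar", 1)` does not. A matcher that supports `\b` needs this kind of zero-width check at the current position, which is why it cannot be expressed with the ordinary character-consuming operators alone.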