Regular expressions #14

Open
Ud71p opened this issue Jun 4, 2016 · 5 comments
Comments

Ud71p commented Jun 4, 2016

This is a feature request for adding regular expressions to the basic library of Successor-ML.

I assume the need for them and their usefulness are obvious; if not, I can elaborate on request.

Some SML implementations already have them.
Moscow ML has a POSIX 1003.2 variant:
http://mosml.org/mosmllib/Regex.html
SML/NJ has a variant with multiple syntaxes, currently AWK syntax:
http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html

However, the lack of regular expressions in the Basis Library has serious consequences for everyday programming:
Availability - many implementations don't have them, e.g. Poly/ML;
Portability - a program written against one implementation's regular expression library won't run on another implementation.

Which syntax is chosen matters little in my opinion, but the multi-syntax approach seems more versatile.

Bonus points for direct support for matching a word boundary (\b, or \< and \>, in most syntaxes), a very useful feature in my experience, which both cited implementations currently lack.

@RobertHarper
Contributor

agreed

bob

@RobertHarper
Contributor

it's a good suggestion, we should take it.

bob

@JohnReppy
Contributor

I would argue that REs are outside the scope of the Basis Library. One of the design principles of the Basis Library is to focus on features that are ubiquitous (e.g., options and lists), require compiler support (e.g., arrays and numerics), or require OS/runtime support. This is why the Basis doesn't include sets, maps, hash tables, or regular expressions.

I think that the problem is really one of library distribution. As the OP notes, there are existing RE libraries for SML, but they are not universally available. Furthermore, there is lots of other useful code that we have packaged up in the SML/NJ Library, but that is only available on SML/NJ and MLton. Smackage (https://github.com/standardml/smackage) may be the right solution to this problem, but I haven't tried using it yet. (It also does not appear to support Moscow ML or Poly ML at this time.)

With respect to REs specifically, this is a very complicated design space. There is the issue of syntax and features, but there is also a large space of different implementation strategies with different performance characteristics (NFA with backtracking, NFA simulation, NFA->DFA translation, partial derivatives, derivatives, cached NFA->DFA, etc.). What requirements would we place on a standardized Basis Library specification?
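
To make one point in that design space concrete, here is a minimal sketch of the derivative-based strategy in Standard ML. The `re` datatype and the names `nullable`, `deriv`, and `matches` are illustrative assumptions, not the API of the SML/NJ Library or of any other existing library. The sketch also skips the simplification of derivatives that a practical implementation would likely need to keep the intermediate expressions small.

```sml
(* Abstract syntax of regular expressions over characters.
   Hypothetical datatype, written only for this sketch. *)
datatype re
  = Empty              (* matches no string at all *)
  | Eps                (* matches only the empty string *)
  | Chr of char
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* nullable r is true exactly when r matches the empty string *)
fun nullable Empty = false
  | nullable Eps = true
  | nullable (Chr _) = false
  | nullable (Seq (r, s)) = nullable r andalso nullable s
  | nullable (Alt (r, s)) = nullable r orelse nullable s
  | nullable (Star _) = true

(* deriv c r matches a string w exactly when r matches c followed by w *)
fun deriv _ Empty = Empty
  | deriv _ Eps = Empty
  | deriv c (Chr d) = if c = d then Eps else Empty
  | deriv c (Seq (r, s)) =
      let
        val left = Seq (deriv c r, s)
      in
        if nullable r then Alt (left, deriv c s) else left
      end
  | deriv c (Alt (r, s)) = Alt (deriv c r, deriv c s)
  | deriv c (Star r) = Seq (deriv c r, Star r)

(* Take the derivative by each character in turn, then test for nullability. *)
fun matches r s =
  nullable (List.foldl (fn (c, r') => deriv c r') r (String.explode s))
```

For example, with `val r = Seq (Star (Alt (Chr #"a", Chr #"b")), Chr #"c")`, the call `matches r "abbac"` evaluates to `true` and `matches r "abca"` to `false`. A backtracking or automaton-based matcher would accept the same strings but with different time and space behaviour, which is the trade-off a standardized specification would have to take a position on.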

@RobertHarper
Contributor

Let’s separate the end-user issue from the implementor issue.

From an end-user point of view, there ought to exist (one or more?) portable regular expression libraries for writing code. People these days rely a lot on regular expressions being available, and use them in quite a lot of production code. This fact can be traced back to Dave MacQueen, who was responsible for regular expression search being in Emacs! I think we can agree that users should have easy access to such a library or libraries.

From an implementor point of view, there is the question of which library. Agreed, probably not in the thing called the Standard Basis, but then where? Smackage is probably the right answer, and it would be great to get help in ensuring that it supports all the relevant compilers. It was a quick effort to provide needed functionality that has long been sorely lacking.

And then there is the question of exactly which regular expression library. Well, if modules are good for anything, it is exactly in abstracting from this. There ought to be as many implementations of the same signature as anyone thinks are important to have, the more the better. Unlike in crappier languages, we absolutely do not have to commit to any one implementation of regular expression matching!

Bob
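
As a rough illustration of that last point, the sketch below puts two matchers behind one signature and writes client code as a functor over it. Everything here is hypothetical: the `re` datatype, the `MATCHER` signature, and the structures `Backtrack`, `Split`, and `Grep` are names invented for this example and do not correspond to any existing library.

```sml
(* Shared abstract syntax; a hypothetical datatype for this sketch. *)
datatype re = Eps | Chr of char | Seq of re * re | Alt of re * re | Star of re

(* The one signature that every backend implements. *)
signature MATCHER =
sig
  val matches : re -> string -> bool
end

(* Backend 1: backtracking with an explicit continuation. *)
structure Backtrack :> MATCHER =
struct
  fun go Eps cs k = k cs
    | go (Chr _) [] _ = false
    | go (Chr c) (d :: cs) k = c = d andalso k cs
    | go (Seq (r, s)) cs k = go r cs (fn cs' => go s cs' k)
    | go (Alt (r, s)) cs k = go r cs k orelse go s cs k
    | go (Star r) cs k =
        (* the length check forces progress, so a nullable body cannot loop *)
        k cs orelse
        go r cs (fn cs' => length cs' < length cs andalso go (Star r) cs' k)

  fun matches r s = go r (String.explode s) List.null
end

(* Backend 2: try every way of splitting the input; simple but slow. *)
structure Split :> MATCHER =
struct
  (* all ways to cut a list into a prefix and a suffix *)
  fun splits [] = [([], [])]
    | splits (x :: xs) =
        ([], x :: xs) :: map (fn (p, q) => (x :: p, q)) (splits xs)

  fun m Eps cs = null cs
    | m (Chr c) cs = (cs = [c])
    | m (Seq (r, s)) cs =
        List.exists (fn (p, q) => m r p andalso m s q) (splits cs)
    | m (Alt (r, s)) cs = m r cs orelse m s cs
    | m (Star r) cs =
        null cs orelse
        List.exists
          (fn (p, q) => not (null p) andalso m r p andalso m (Star r) q)
          (splits cs)

  fun matches r s = m r (String.explode s)
end

(* Client code depends only on the signature, never on a particular backend. *)
functor Grep (M : MATCHER) =
struct
  fun grep r lines = List.filter (M.matches r) lines
end

structure G1 = Grep (Backtrack)
structure G2 = Grep (Split)
```

With `val r = Seq (Star (Chr #"a"), Chr #"b")`, both `G1.grep r ["b", "ba", "aab", ""]` and `G2.grep r ["b", "ba", "aab", ""]` return `["b", "aab"]`. Swapping backends is a one-line change at the functor application, and a real library would presumably also expose a concrete-syntax front end behind a similar signature.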

@JohnReppy
Contributor

I agree completely with your third paragraph. The RE library that I (and others) implemented in the SML/NJ Library is designed to use the module system to support multiple front ends and multiple back ends. While it works reasonably well, it could use an overhaul and modernization.
