Add error recovery mode? #56

deltaidea · 2017-07-19T14:44:12Z

In my language, lines with invalid stuff are considered comments. I know, insane, but I'd like to support that if possible. Currently, { error: true } is extremely greedy: it treats everything from the first error to the end of the input as a single token.

I propose an optional error tolerance mode that can be enabled with { error: true, recover: true }:

if can't parse a token starting at the current position:
- if not already in recovery mode, save current position as recovery starting point
- increment position (skip current character)
if can parse a token && recovery starting point exists:
- return a token with name of error token type, line and col of recovery starting point
- delete recovery starting point (so we return current valid token next time)
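
The steps above can be sketched in plain JavaScript, without touching moo itself. This is only an illustration under stated assumptions: `lexWithRecovery`, the `rules` object shape, and the `error` token type are hypothetical names, not moo API, and the sketch records an `offset` rather than the line and col mentioned above. It also pushes the error token and the following valid token in one pass instead of returning them from successive calls.

```javascript
// Hypothetical stand-alone sketch; none of these names are part of moo.
function lexWithRecovery(input, rules) {
  // Compile each rule as a sticky regex so it can only match at lastIndex.
  const compiled = Object.entries(rules).map(([type, re]) => ({
    type,
    re: new RegExp(re.source, 'y'),
  }));
  const tokens = [];
  let pos = 0;
  let recoveryStart = -1; // -1 means "not in recovery mode"

  // Emit the pending error token (if any) covering recoveryStart..end.
  const flushError = (end) => {
    if (recoveryStart !== -1) {
      tokens.push({
        type: 'error',
        value: input.slice(recoveryStart, end),
        offset: recoveryStart,
      });
      recoveryStart = -1;
    }
  };

  while (pos < input.length) {
    let matched = null;
    for (const { type, re } of compiled) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        matched = { type, value: m[0], offset: pos };
        break; // first rule that matches wins
      }
    }
    if (matched) {
      flushError(pos); // close the error span before the valid token
      tokens.push(matched);
      pos += matched.value.length;
    } else {
      if (recoveryStart === -1) recoveryStart = pos; // enter recovery mode
      pos += 1; // skip the current character
    }
  }
  flushError(input.length); // input ended while still in recovery mode
  return tokens;
}

const tokens = lexWithRecovery('foo $$ bar', { id: /\w+/, ws: /\s+/ });
console.log(tokens.map((t) => t.type).join(' '));
```

Because recovery advances one character at a time and retries every rule, the worst case is slower than a single combined regex; that trade-off is part of what is being proposed here.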


nathan · 2017-07-19T15:17:34Z

moo.compile({
  id: /\w+/,
  ws: {match: /\s+/, lineBreaks: true},
  // … rules rules rules …
  ignore: /.+/, // skip to eol
})

If you meant the entire line gets ignored, not just the trailing lexically invalid part, that's a job for the parser, not the lexer, because there are almost always sequences of lexically valid tokens that are not syntactic. (For example, + - is a sequence of JS tokens that is not syntactic.)
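
As a rough illustration of handling this after the lexer rather than inside it, the sketch below filters a token stream and drops every token on a line that contains an error token. It is only a sketch under assumptions: `dropInvalidLines` is a hypothetical helper, the tokens are plain objects with `type` and `line` fields, no token spans more than one line, and the catch-all rule is assumed to be named `error`. It only covers lexically invalid lines; lines that tokenize cleanly but are not syntactic would still have to be rejected by the parser.

```javascript
// Hypothetical post-lexer filter; not part of moo.
function dropInvalidLines(tokens) {
  // Collect the line numbers that contain at least one error token.
  const badLines = new Set(
    tokens.filter((t) => t.type === 'error').map((t) => t.line)
  );
  // Keep only tokens from lines that lexed cleanly.
  return tokens.filter((t) => !badLines.has(t.line));
}

const sample = [
  { type: 'id', value: 'foo', line: 1 },
  { type: 'id', value: 'bar', line: 2 },
  { type: 'error', value: '$$', line: 2 },
  { type: 'id', value: 'baz', line: 3 },
];
console.log(dropInvalidLines(sample).map((t) => t.value).join(' '));
```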

deltaidea · 2017-07-19T15:35:56Z

I tried that approach:

compile({
  ...
  lCurly: '{',
  rCurly: '}',
  invalid: /.+/
})

rCurly gets moved to the list of keywords matched by invalid. Then the whole line } // comment gets parsed as invalid, and its text doesn't match rCurly.
I could do /[^{}]+/, but it gets very messy with negative lookaheads for tokens like #define. It's easy to forget to add a new token to the invalid regexp, and hard to debug the consequences.
I'd obviously prefer a general solution upstream.

I'm willing to try and implement this in a PR if you guys think it's a good idea.

deltaidea · 2017-07-19T15:52:01Z

You made me think of a much simpler way to implement it:

let errorRe = /(?:(?!<re>).)+/my // <re> is `lexer.re`, i.e. all the valid stuff.

When a valid token can't be parsed, errorRe.exec(input) matches everything right up to the next valid token.
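
A self-contained sketch of that idea follows. The `rules` object and the `matchError` helper are hypothetical stand-ins for illustration; in moo, the combined pattern would come from the compiled lexer rather than being rebuilt like this. The sketch also assumes the rule patterns contain no flags or backreferences that would break when joined into one alternation.

```javascript
// Hypothetical rule set standing in for the lexer's valid tokens.
const rules = { lCurly: /\{/, rCurly: /\}/, id: /\w+/, ws: /[ \t]+/ };

// Join every valid-token pattern into one alternation; this plays the role of <re>.
const validSource = Object.values(rules)
  .map((re) => `(?:${re.source})`)
  .join('|');

// Consume one character at a time for as long as no valid token starts there.
// `.` does not match newlines, so an error match never crosses a line break.
const errorRe = new RegExp(`(?:(?!${validSource}).)+`, 'y');

// Return the run of invalid text starting exactly at `pos`, or null if a
// valid token (or a newline, or the end of input) is found at that position.
function matchError(input, pos) {
  errorRe.lastIndex = pos;
  const m = errorRe.exec(input);
  return m ? m[0] : null;
}

console.log(JSON.stringify(matchError('@@# }', 0)));
```

The negative lookahead is re-evaluated at every character, so long runs of invalid text cost one attempt at the full alternation per character.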

nathan · 2017-07-19T16:36:40Z

I still don't think that error recovery at the level of lexical analysis is what you want. Could you provide some examples from your language?

tjvr · 2017-07-19T19:09:49Z

the whole line } // comment gets parsed as invalid which doesn't match rCurly.

Argh. I knew making keyword handling implicit would be a bad idea. Perhaps this is another reason to make keyword handling explicit (#53).

tjvr · 2017-07-21T20:04:16Z

Here are some suggestions:

We recommend not implementing error recovery as part of your lexer (as @nathan says).
Since keyword handling is now explicit (Explicit keywords #57), your excerpt from above should now behave as expected.
If you really want to do this, I think the right place is another library on top of moo; not in the moo core itself. Sorry! :-)

tjvr closed this as completed Jul 21, 2017

tjvr added the question label Jul 21, 2017
