Add error recovery mode? #56

Closed
deltaidea opened this issue Jul 19, 2017 · 6 comments

@deltaidea
Contributor

deltaidea commented Jul 19, 2017

In my language, lines with invalid stuff are considered comments. I know, insane, but I'd like to support that if possible. Currently, { error: true } is extremely greedy and considers everything from the first error to be a single token.

I propose an optional error tolerance mode that can be enabled with { error: true, recover: true }:

  • if a token can't be parsed starting at the current position:
    • if not already in recovery mode, save current position as recovery starting point
    • increment position (skip current character)
  • if can parse a token && recovery starting point exists:
    • return a token with name of error token type, line and col of recovery starting point
    • delete recovery starting point (so we return current valid token next time)
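
The steps above can be sketched as a small standalone tokenizer. This is only an illustration of the proposed behaviour, not moo's implementation: `tokenizeWithRecovery`, the `rules` object shape, and the `offset` field are hypothetical names, and the sketch records a character offset rather than line and col to stay short.

```javascript
// Hypothetical standalone sketch of the proposed recovery mode; not moo's API.
function tokenizeWithRecovery(input, rules, errorType = 'error') {
  // One sticky regex per rule so matching always starts exactly at `pos`.
  // Assumption: rule flags other than sticky are not needed here.
  const compiled = Object.entries(rules).map(([type, re]) => ({
    type,
    re: new RegExp(re.source, 'y'),
  }));

  // Try each rule at `pos`; return the first non-empty match, else null.
  function matchAt(pos) {
    for (const {type, re} of compiled) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0) return {type, value: m[0], offset: pos};
    }
    return null;
  }

  const tokens = [];
  let pos = 0;
  let recoveryStart = null; // set while skipping unlexable characters

  while (pos < input.length) {
    const token = matchAt(pos);
    if (token === null) {
      // Can't lex here: remember where the bad run began, skip one character.
      if (recoveryStart === null) recoveryStart = pos;
      pos += 1;
      continue;
    }
    if (recoveryStart !== null) {
      // A valid token follows a bad run: emit the run as one error token first.
      tokens.push({type: errorType, value: input.slice(recoveryStart, pos), offset: recoveryStart});
      recoveryStart = null;
    }
    tokens.push(token);
    pos += token.value.length;
  }

  // Flush a bad run that reaches the end of the input.
  if (recoveryStart !== null) {
    tokens.push({type: errorType, value: input.slice(recoveryStart), offset: recoveryStart});
  }
  return tokens;
}

const tokens = tokenizeWithRecovery('foo @@ bar', {id: /\w+/, ws: /\s+/});
console.log(tokens.map(t => t.type).join(', ')); // id, ws, error, ws, id
```

Unlike the greedy behaviour described above, the error token here covers only `@@`, and lexing resumes at the following whitespace.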
@nathan
Collaborator

nathan commented Jul 19, 2017

moo.compile({
  id: /\w+/,
  ws: {match: /\s+/, lineBreaks: true},
  // … rules rules rules …
  ignore: /.+/, // skip to eol
})

If you meant the entire line gets ignored, not just the trailing lexically invalid part, that's a job for the parser, not the lexer, because there are almost always sequences of lexically valid tokens that are not syntactic. (For example, + - is a sequence of JS tokens that is not syntactic.)
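
As a rough illustration of handling this after lexing, a pass over the token stream could discard every token on a line that contains an invalid token. This is a sketch under assumptions: `dropInvalidLines` is a hypothetical helper, the tokens are plain objects with `type` and `line` fields, and it only catches lexically invalid lines. Lines that lex cleanly but are not syntactic (such as `+ -`) would still need a real parser.

```javascript
// Hypothetical post-lexing pass; assumes each token carries `type` and `line`.
function dropInvalidLines(tokens, invalidType = 'invalid') {
  // Collect the line numbers that contain at least one invalid token.
  const badLines = new Set(
    tokens.filter(t => t.type === invalidType).map(t => t.line)
  );
  // Keep only tokens that start on a line with no invalid token.
  return tokens.filter(t => !badLines.has(t.line));
}

const sample = [
  {type: 'id', value: 'foo', line: 1},
  {type: 'rCurly', value: '}', line: 2},
  {type: 'invalid', value: '@@', line: 2},
  {type: 'id', value: 'bar', line: 3},
];
const kept = dropInvalidLines(sample);
console.log(kept.map(t => t.value).join(' ')); // foo bar
```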

@deltaidea
Contributor Author

deltaidea commented Jul 19, 2017

I tried that approach:

compile({
  ...
  lCurly: '{',
  rCurly: '}',
  invalid: /.+/
})

rCurly gets moved to the list of keywords matched by invalid. Then the whole line } // comment gets parsed as invalid which doesn't match rCurly.
I could do /[^{}]+/ but it gets very messy with negative lookaheads for tokens like #define. It's easy to forget to add a new token to the invalid regexp, and hard to debug the consequences.
I'd obviously prefer a general solution upstream.

I'm willing to try and implement this in a PR if you guys think it's a good idea.

@deltaidea
Contributor Author

You made me think of a much simpler way to implement it:

let errorRe = /(?:(?!<re>).)+/my // <re> is `lexer.re`, i.e. all the valid stuff.

When the lexer can't parse a valid token, errorRe.exec(input) matches everything right up to the next one.
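
A minimal sketch of that idea without moo, assuming the combined regex of valid tokens is available as a plain `RegExp`. Here `validRe` is a stand-in for `lexer.re`, and `buildErrorRe` is a hypothetical helper:

```javascript
// Hypothetical helper: wraps the valid-token regex in a negative lookahead,
// so the result consumes characters until a valid token could start.
function buildErrorRe(validRe) {
  return new RegExp('(?:(?!' + validRe.source + ').)+', 'my');
}

// Stand-in for the lexer's combined regex: identifiers, whitespace, braces.
const validRe = /\w+|\s+|[{}]/;
const errorRe = buildErrorRe(validRe);

const input = '@@# } // comment';
errorRe.lastIndex = 0; // sticky flag: the match must begin exactly here
const match = errorRe.exec(input);

console.log(JSON.stringify(match[0])); // "@@#"
console.log(errorRe.lastIndex);        // 3, where the next valid token begins
```

Because `.` does not match newlines, an error run built this way never extends past the end of a line.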

@nathan
Collaborator

nathan commented Jul 19, 2017

I still don't think that error recovery at the level of lexical analysis is what you want. Could you provide some examples from your language?

@tjvr
Collaborator

tjvr commented Jul 19, 2017

> the whole line } // comment gets parsed as invalid which doesn't match rCurly.

Argh. I knew making keyword handling implicit would be a bad idea. Perhaps this is another reason to make keyword handling explicit (#53).

@tjvr
Collaborator

tjvr commented Jul 21, 2017

Here are some suggestions:

  1. We recommend not implementing error recovery as part of your lexer (as @nathan says).
  2. Since keyword handling is now explicit (Explicit keywords #57), your excerpt from above should now behave as expected.
  3. If you really want to do this, I think the right place is another library on top of moo; not in the moo core itself. Sorry! :-)

@tjvr tjvr closed this as completed Jul 21, 2017
@tjvr tjvr added the question label Jul 21, 2017