-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error recovery #289
base: master
Are you sure you want to change the base?
Add error recovery #289
Conversation
It is quite big improvement and I think it will be better to split it on several commits since otherwise it will be difficult to review. I suppose to split this PR to following commits:
Maybe some steps can be split even more, but that is a required minimum as I think. |
Thanks @Mingun , I'll split up the PR into separate commits as you suggest. I've fixed the bug in trace mode that was spotted by the automated tests (oops!). I just noticed the discussion at #276, which I hadn't seen before I submitted this PR (I did most of the work a few months ago). Here are some thoughts against each of @kevinmehall's headings:
|
a5aeae7
to
bcc4456
Compare
Changes the public return type of parsers, so clients can access parse result information directly. This is a breaking change, but clients can easily adapt by invoking `.into_result()` to obtain the original return type. This commit contains a manually-updated grammar.rs. The next commit contains the bootstrapped one.
Previous commit exposes the parser return type, but used a manually-updated grammar.rs. This commit autogenerates the bootstrapped grammar.rs from grammar.rustpeg.
Adds a new expression `error!{"message" e}` which reports an error and attempts to recover it with `e`. Also `error_if!{}` and `error_unless!{}`. Maintains and returns a list of errors which have been recovered. This enables the parser to report multiple errors, which is useful in contexts like IDEs and compilers where the user may find it easier to be presented with all errors rather than just the first. This commit contains a manually-updated grammar.rs, sufficient only for bootstrapping but with no support for parsing the new expression types. The next commit contains the bootstrapped one.
Previous commit adds error recovery, but used a manually-updated grammar.rs. This commit autogenerates the bootstrapped grammar.rs from grammar.rustpeg.
bcc4456
to
9b99771
Compare
(apologies for the noise) This is now split into five commits - @Mingun hopefully this is easier to review!
This is now ready for review. Thanks. |
Yes, some kind of error recovery is definitely in-scope for rust-peg and I'm glad to see someone working on this. Error recovery is potentially a big addition and I want to make sure we've explored the design space and get it right, so this is going to take some discussion and iteration. Some initial feedback: SyntaxIt seems that
I would lean towards re-using the existing control flow operators unless there's some advantage of duplicating them. Collecting multiple errorsI'm wary of anything that returns data out of a PEG expression except via its return value, because this doesn't play well with backtracking. For example,
The Medeiros paper doesn't say much about how errors should be collected. You could imagine reverting recorded errors on backtracking, but this seems like it would have a large performance penalty to implement. This is why on #276 I proposed returning the errors as part of the AST -- you have to construct a placeholder AST node anyway, so why not make it carry the error? That also leaves the error type up to the user, instead of requiring If you really want to collect a list of errors without dealing with backtracking, you can always pass in a mutable Error vs FailureWhat constructs I also question whether the behavior of preventing further backtracking and aborting the parse is desirable here if the goal is to build an AST that is as complete as possible. You could imagine wanting layered error recovery, e.g. attempting recovery within the rule for expressions, but if that fails, the calling rule tries to recover at the following statement, and if nothing else, you replace the whole function with an error node but continue with the rest of the file. I do support being more consistent on "error" vs "failure" terminology in the docs and code, though! I want to re-read the papers more closely, because I feel like I'm missing something there -- the most useful piece from the first paper seems to be e.g. the strategy of the So I'm not sure where that leaves this PR. It's been interesting to try out, at least. If you have a grammar that you've used this on, I'd like to see it. |
Thanks, really appreciate the detailed response, and definitely keen to iterate. This is a hobby project for me so I apologise in advance if it takes me a while to get back to you. This was a fairly straightforward implementation of the paper, without taking a lot of time to consider whether it was the best approach. You are correct re I see your point about reporting errors in the AST, but having them on the side makes sense in terms of the way a compiler or IDE expects to see them - when I compile a Rust program I am presented with a linear list of errors, and in order to generate wiggly lines in an IDE I have to present a list of source ranges. This could be extracted from the AST, but that adds an extra stage which feels artificial. You are totally right that The best example I have at the moment is Thanks again. I'll get back to you with a better example of how this can be used, and I'll ponder more broadly whether there's a better approach. |
Looking at the current mechanism, error recovery is not moved to the lowest precedence alt, which can result in ambiguous matches. So that for example if we have a rule constant and expression , for a gplang, if we match a integer constant as show below, only integer literals are matched. rule integer() -> Literal
= val:$(['0'..°'9') ....
/ error!{"nan" 0}
pub rule literal() -> Literal
= integer
/ floating
/ string
/ error!{"expected a constant value" 0} If we look at ANTLR, the parser first tries all possible alternatives, then exports the parse trace to a error handler. ANTLR would produce a state machine like so:
(note that all error recovery is done at the highest level of the tree in the state machine)
which is obviously invalid. |
Bump |
I want to try this branch, can we fix the merge conflict? |
Sorry, I'm unlikely to progress on this - I've moved on to other things. If anyone else is interested feel free - and reach out to me if I can be of help. |
When using a parser in the context of an IDE or a compiler, it is desirable to be able to recover from errors, returning a best-efforts parse anyway and reporting all errors rather than just the first. This PR extends
rust-peg
with support for this, based on Syntax Error Recovery in Parsing Expression Grammars and Automatic Syntax Error Reporting and Recovery in Parsing Expression Grammars.@kevinmehall I'm not sure if multiple errors is something you want to support, and I'm not 100% confident I've chosen the best syntax, so I would be grateful for your feedback.
Adding error recovery necessarily introduces a breaking change - the public return type of a parser has to be able to include multiple errors. To ease upgrade, the new type has an
.into_result()
method which returns the original result type.There are three expressions for raising an error:
error!{ "message" e2 }
is the simplest form. It reports an error with the given message at the current location, then attempts to recover by parsinge2
instead.error_if!{ e1 | "message" e2 }
allows the error to be reported at an earlier location. It attempts to parsee1
; if it succeeds it reports an error with the given message at the start ofe1
, then attempts to recover by parsinge2
instead.error_unless!{ e1 | "message" e2 }
is shorthand for a choice expression. It is useful in order to report an error when an expression fails to match. It attempts to parsee1
; if it fails it reports an error with the given message at the current location, then attempts to recover by parsinge2
instead.The result of the parser (
ParseResults
) contains the final value, failure, or error, along with a list of all errors we recovered from during the parse.Full documentation is provided in src/lib.rs.
Changes
RecoverIfExpr
andRecoverUnlessExpr
(the simpleerror!
form is sugar forerror_if!
with an emptye1
)..into_result()
.recovery.rs
(the example from the paper) andarithmetic_recovery.rs
(a worked example of how to use the power of the new features).RuleResult
now has a third alternative (beyondMatched
andFailed
):Error
. This is handled largely following the rules from the papers cited, although a few improvements and optimizations have been added as noted in the comment in src/lib.rs.ErrorState
gains a set oferrors
recovered from so far.into_parse_error
is nowinto_failure
, since it creates a failure not an error, andreparsing_on_error
becomesreparsing_on_failure
.Let me know if you have any questions!