Ideas for error handling in IDE / language server applications #276

kevinmehall (Maintainer) · 2021-11-21T19:09:32Z

Instead of failing outright, a language-server parser should produce errors as part of its return value, so you can have error nodes mixed into an otherwise-successful syntax tree. In PEG terms, that means using the / operator to allow fallback to an error recovery path that skips over invalid input until it sees a character or keyword that would allow subsequent rules to succeed, and have this recovery path produce an error node in the syntax tree. You can skip characters/tokens until a specific recovery set with either [^ until]* (where until is a Rust pattern matching a single character/token) or (!until [_])* (where until is a PEG expression, for arbitrary lookahead).

You have to be a little strategic about where you do this -- not too deep in low-level rules like number(), because you may want normal backtracking to try other rules when one fails, and not in rules that are too high-level either, because then you'd discard too much valid code surrounding the error.
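To make the recovery idea concrete, here is a minimal hand-written sketch in plain Rust rather than rust-peg grammar syntax. It assumes a toy language of `;`-separated lowercase identifiers; the names `Stmt`, `ident`, `recover` and `parse` are made up for illustration. The normal path is tried first, and the fallback plays the role of `(!";" [_])*` followed by an action that builds an error node.

```rust
/// One node per statement; errors are ordinary nodes, not a failed parse.
#[derive(Debug, PartialEq)]
enum Stmt {
    Ident(String),
    /// Byte span of the input that recovery skipped over.
    Error { start: usize, end: usize },
}

/// True if `pos` is at the end of the input or at a `;`.
fn at_boundary(input: &str, pos: usize) -> bool {
    pos == input.len() || input.as_bytes()[pos] == b';'
}

/// Normal path: a non-empty run of lowercase letters that ends at a boundary.
fn ident(input: &str, pos: usize) -> Option<(usize, Stmt)> {
    let len = input[pos..]
        .bytes()
        .take_while(|b| b.is_ascii_lowercase())
        .count();
    let end = pos + len;
    if len > 0 && at_boundary(input, end) {
        Some((end, Stmt::Ident(input[pos..end].to_string())))
    } else {
        None
    }
}

/// Recovery path: skip to the next `;` (or end of input) and emit an error node.
fn recover(input: &str, pos: usize) -> (usize, Stmt) {
    let end = input[pos..].find(';').map_or(input.len(), |i| pos + i);
    (end, Stmt::Error { start: pos, end })
}

/// Statement-level recovery: one bad statement does not discard its neighbours.
fn parse(input: &str) -> Vec<Stmt> {
    let mut out = Vec::new();
    let mut pos = 0;
    loop {
        // Ordered choice: try the normal rule, fall back to recovery.
        let (next, stmt) = ident(input, pos).unwrap_or_else(|| recover(input, pos));
        out.push(stmt);
        if next >= input.len() {
            break;
        }
        pos = next + 1; // step over the `;`
    }
    out
}
```

Calling `parse("ab;c1d;ef")` yields an identifier node, an error node covering bytes 3..6, and another identifier node. Recovery sits at the statement level here: `ident` itself stays fallible so it can still take part in ordinary backtracking, and the error span stays small.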

That's possible today, and I'd love to hear how it goes if you try this out.

Going beyond that, here are some thoughts on what could be possible:

Compile-time checking that the parser can't fail:

If you go the route of bundling errors into the "successful" return value, you want to be sure you've caught all the cases where the parser might fail. This is something the parser generator could check for you. Maybe a #[total] or #[infallible] annotation on a rule would raise a compile-time error if any fallible subexpression doesn't have an infallible alternative (e.g. the error recovery path that produces the error node). Maybe on a pub rule this would also make the generated function return the value directly, without wrapping it in Result.
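One way to picture such a check is as a conservative analysis over the grammar's expression tree. The sketch below is not rust-peg's internal representation; `Expr` and `is_infallible` are hypothetical names, and rule references, predicates and action code are left out to keep it short.

```rust
/// A tiny model of PEG expressions, just enough to reason about failure.
enum Expr {
    /// Matches a literal string; fails on any other input.
    Literal(&'static str),
    /// `[_]`: matches any single element, but fails at end of input.
    Any,
    /// Consumes nothing and always succeeds, e.g. an action that only
    /// builds an error node.
    Empty,
    /// `a b c`: fails if any element fails.
    Seq(Vec<Expr>),
    /// `a / b / c`: fails only if every alternative fails.
    Choice(Vec<Expr>),
    /// `e*`: zero repetitions is always an acceptable match.
    Star(Box<Expr>),
    /// `e?`: absence is always an acceptable match.
    Opt(Box<Expr>),
}

/// Conservative check: `true` means the expression can never fail.
fn is_infallible(e: &Expr) -> bool {
    match e {
        // Only the empty literal matches on every input.
        Expr::Literal(s) => s.is_empty(),
        Expr::Any => false,
        Expr::Empty => true,
        Expr::Seq(items) => items.iter().all(is_infallible),
        Expr::Choice(alts) => alts.iter().any(is_infallible),
        Expr::Star(_) | Expr::Opt(_) => true,
    }
}
```

Under this model, `"a" / "b"` is fallible, while `"a" / ([_]* <error node>)` is infallible because its last alternative always succeeds. A real annotation would also have to follow rule references and handle recursion.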

Error localization:

The error info you get from rust-peg comes from reparsing with additional error position tracking turned on to find the maximal position the parser reached and the set of tokens that failed there that could have allowed it to move further. Right now, that only runs when the top-level rule fails, so you'd lose that when you handle errors within the grammar. If your error recovery paths are very granular, maybe that's fine -- you'd get an error node with a small span that would point closely to the error. But if you want to have error recovery skip big chunks of input, you might want to be able to put a more precise error position and set of expected tokens into that error node for later use. You could imagine an operator, say error_pos!(<expr>) that runs the same reparsing algorithm but only on a specific subexpression. Need to figure out exactly how this would pass the collected error information into the error recovery expression's code block.
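The bookkeeping behind "maximal position reached plus the set of expected tokens" can be sketched in a few lines. This is a hand-written illustration, not rust-peg's actual implementation: `FailureTracker`, `literal` and `call` are hypothetical names, and the tracker is always on here, whereas rust-peg only enables tracking during the reparse.

```rust
use std::collections::BTreeSet;

/// Records the furthest position at which a match attempt failed,
/// together with everything that was expected at that position.
#[derive(Debug, Default)]
struct FailureTracker {
    max_pos: usize,
    expected: BTreeSet<&'static str>,
}

impl FailureTracker {
    fn mark_failure(&mut self, pos: usize, expected: &'static str) {
        if pos > self.max_pos {
            // A failure further along supersedes earlier ones.
            self.max_pos = pos;
            self.expected.clear();
        }
        if pos == self.max_pos {
            self.expected.insert(expected);
        }
    }
}

/// Try to match `lit` at `pos`, reporting a miss to the tracker.
fn literal(
    input: &str,
    pos: usize,
    lit: &'static str,
    tracker: &mut FailureTracker,
) -> Option<usize> {
    if input[pos..].starts_with(lit) {
        Some(pos + lit.len())
    } else {
        tracker.mark_failure(pos, lit);
        None
    }
}

/// Toy rule: call = "f" "(" ("x" / "y") ")"
fn call(input: &str, pos: usize, tracker: &mut FailureTracker) -> Option<usize> {
    let pos = literal(input, pos, "f", tracker)?;
    let pos = literal(input, pos, "(", tracker)?;
    let pos = literal(input, pos, "x", tracker)
        .or_else(|| literal(input, pos, "y", tracker))?;
    literal(input, pos, ")", tracker)
}
```

Running `call("f(z)", 0, &mut tracker)` fails and leaves `max_pos == 2` with `"x"` and `"y"` in the expected set. An operator like the imagined error_pos!(<expr>) would amount to running a fresh tracker over just that subexpression and handing the result to the recovery path, so it can be stored in the error node.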

Labeled failures:

A paper by Sérgio Medeiros and Fabio Mascarenhas suggests an extension of PEG that would allow control of which choice operator is used to catch a particular failure. I haven't fully digested what the benefit of this would be for the kinds of parsers I've worked with in rust-peg, or what it would take to implement.

Lossless/homogeneous/untyped/concrete syntax tree:

Right now, rust-peg is geared towards producing a strongly-typed abstract syntax tree, meaning that there are distinct Rust types for different syntactic elements. You explicitly specify what is extracted when parsing and collected into the return value, so semantically-insignificant elements like whitespace are not normally represented in the resulting AST. Many LSP implementations go the opposite route and produce a loosely-typed tree structure containing only token kinds, spans, and hierarchical information, in a syntax-rule-agnostic and potentially even language-agnostic data type.

In many ways, this seems like a big loss to me -- you write the whole grammar in one place, defining what makes up each grammar rule, but that doesn't get represented in the type system until a subsequent pass duplicates much of the structure of the parser to build a type-safe API on top of the untyped tree.

I'm not sure if this one is in-scope for rust-peg, but one way I've thought of supporting it is extending the existing tracing functionality into something that would call back into a user-provided implementation of a tracing trait. You'd write a grammar without any action code, and without use of the return values, and construct a concrete syntax tree from the calls to the tracing callback.
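A rough sketch of what such a callback interface might look like, and how a concrete syntax tree could be assembled from it, follows. Nothing here is an existing rust-peg API: the `Tracer` trait, its method names, `CstNode` and `CstBuilder` are all assumptions made for illustration.

```rust
/// Hypothetical callback interface that generated parser code could call.
trait Tracer {
    /// Called when the parser starts attempting `rule` at `pos`.
    fn enter_rule(&mut self, rule: &'static str, pos: usize);
    /// Called when the attempt finishes; `end` is `None` if the rule failed.
    fn exit_rule(&mut self, rule: &'static str, end: Option<usize>);
}

/// Homogeneous tree node: just a kind, a span, and children.
#[derive(Debug, PartialEq)]
struct CstNode {
    kind: &'static str,
    start: usize,
    end: usize,
    children: Vec<CstNode>,
}

#[derive(Default)]
struct CstBuilder {
    /// Nodes for rules that have been entered but not yet exited.
    stack: Vec<CstNode>,
    /// Completed top-level nodes.
    roots: Vec<CstNode>,
}

impl Tracer for CstBuilder {
    fn enter_rule(&mut self, rule: &'static str, pos: usize) {
        self.stack.push(CstNode {
            kind: rule,
            start: pos,
            end: pos,
            children: Vec::new(),
        });
    }

    fn exit_rule(&mut self, _rule: &'static str, end: Option<usize>) {
        let mut node = self
            .stack
            .pop()
            .expect("exit_rule called without a matching enter_rule");
        // A failed attempt is dropped together with its children,
        // which mirrors the parser backtracking out of the rule.
        if let Some(end) = end {
            node.end = end;
            match self.stack.last_mut() {
                Some(parent) => parent.children.push(node),
                None => self.roots.push(node),
            }
        }
    }
}
```

Whitespace and other insignificant elements end up in the tree simply because their rules were entered and exited successfully. One gap in this sketch: it only discards children when a whole rule fails, so a real version would also need a hook for backtracking between alternatives inside a single rule.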

neunenak · 2021-11-21T20:49:53Z


It does strike me as awkward that rust-peg does have a notion of tracing, but only via print statements that require a tool like pegviz to handle; building something more programmatically available on top of that printing code seems like a natural (and hopefully straightforward-to-implement) extension.


Mingun · 2021-11-22T05:44:24Z


I think compile-time checking that the parser can't fail would be a great addition. You may also consider a special syntax for creating error nodes in the AST, as I outlined in a sibling project. Something that looks like a macro call would probably look natural, because it would not introduce alien syntax:

// Sketch of proposed syntax: `catch!` does not exist in rust-peg today.
enum Node {
  Ident(String),
  Paren(Box<Node>),
  Error(&'static str),
}
parser! {
  pub grammar parser() for str {
    pub rule expr() -> Node = paren() / ident();
    rule ident() -> Node = id:$(['a'..='z' | 'A'..='Z']+) { Node::Ident(id.to_string()) };
    rule paren() -> Node
      = "("
        expr:catch!(expr(), Node::Error("expected expression after `(`"))
             catch!(")",    Node::Error("missing `)`"))
      { Node::Paren(Box::new(expr)) };
  }
}


zsol · 2022-01-04T09:14:46Z


FYI there's another (newer) paper on labeled failures by the same authors (https://arxiv.org/pdf/1905.02145.pdf). It discusses how to automatically label some grammars, which would be a significant advantage over the manual approach described at the top of the OP.


kw217 · 2022-02-16T20:07:58Z


I hadn't noticed this discussion earlier, but I have attempted to implement the earlier paper (not the automatic labelling) in #289.

