Better error reporting. #467

Kesanov · 2020-07-18T20:23:15Z

I played with the online editor and wasn't very happy with the error messages.

It seems to report only the name of the root rule: expected bar, while I would expect to see a rather more descriptive message:

foo = { "["~digit+~"]" }
bar = { "list"~"="~foo }

last~(a) <== bar expected "list"
list~(a) <== bar expected "="
list=(a) <== foo expected "["
list=[a] <== digit expected '0'..'9'

The text was updated successfully, but these errors were encountered:

plusvic · 2022-05-20T15:56:22Z

I'm experiencing the same issue with error messages. The message always seems to report the top level rule, even if part of the rule is matched correctly and the actual error is at some other place. For example, I was using the following grammar for testing:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = { (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

condition = { "true" | "false" }

rule = {  "rule" ~ identifier ~ "{" ~ "condition:" ~ condition ~ "}" }

And then tried the following input:

rule test { foo: true }

The problem with that input is that the grammar is expecting condition instead of foo, so I would expect an error like:

 --> 1:13
  |
1 | rule test { foo: true }
  |             ^---
  |
  = expected condition

But instead I got:

 --> 1:1
  |
1 | rule test { foo: true }
  | ^---
  |
  = expected rule

This error is very misleading, the first few tokens of the rule are actually correct, and the failure comes when the parser finds foo instead of condition, and that's what the error should say.

Follow-up:
Interestingly enough, I get the expected error if I change my grammar to:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = {
  (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* 
}

condition = { "condition" }
expression = { "true" | "false" }

rule = { 
	"rule" ~ identifier ~ "{" ~ 
       condition ~ ":" ~
         expression ~
    "}" 
}

So, it looks like the errors are always expecting <something>, where <something> is the name of the production rule that failed to match, but it doesn't report errors for unexpected tokens within a production rule.

EmirVildanov · 2024-01-09T10:40:38Z

Hi!
Developing sbroad (library for distributed SQL queries execution) we've faced the same problem.
We use pest for parsing SQL, but sometimes we get error messages that are hard to link with the real syntax problem.

Some examples:

I leave the link to our simplified SQL grammar, but it seems to be redundant for examples understanding.

Query: create table t_name(col_name int null).
Note: in our grammar we suppose to see the primary key next to the columns definition that must be separeted with comma.
Error message:
```
 --> 1:27
  |
1 | create table bla(name int null)
  |                           ^---
  |
  = expected NotFlag
```
Explanation: after succesfully parsing "int" pest enters optional ColumnDefIsNull rule, tracks optional NotFlag as farthest parsed with error rule and stores it in pos_attempts (from which we later build the error message).
Query: select 1 from a group b.
Note: we expect to see error message about incomplete "by" token.
Error message:
```
 --> 1:17
  |
1 | select 1 from a group b
  |                 ^---
  |
  = expected EOI, Join, GroupByBlock, or HavingBlock
```
Explanation: after successful parsing of "group" token, pest doesn't track it anywhere and counts GroupByBlock (or HavingBlock) rule starting at "group" keyword to be the farthest rule that we've tried to parse but errored.

It seems to me that the main problem with errors is that we don't track tokens that our grammar expects to see at an farthest errored position and that library user can't interact with the sequence of rules that were applied to parse the input.

My proposal is to tinker logic in the parser_state.rs to track the following info inside ParserState:

Vector of expected tokens
Stacks of rule calls

Later we can can pass these info into ParsingError and use on the user side in order to form custom error messages based on expected rules.

Please see proposed changes at MR.

viridia · 2024-12-10T21:28:23Z

I believe this issue could be solved by the introduction of a "cut" or "trap" feature in Pest, as described here: #934

In a PEG parser, a failure in a rule isn't necessarily fatal - it just means that the parser will attempt to try a different alternative, popping up a level if need be. If no alternatives can be found that match, then the whole parse fails. Unfortunately, this means that a failure in a deeply nested rule won't be considered a failure of that specific rule, but rather a failure of the grammar as a whole - therefore you get an error at the top level, which isn't much help to the user.

The solution is a way to tell the parser to "commit" to the current alternative. If you see a pattern like "if (", you know it's an if-statement, there's no need to check alternate rules. Even if the input tokens afterwards are garbage, like "if (%!__", we know this is a bad if-statement, not some other kind of statement.

In prolog, this operation is called "cut", and it's a token that is placed within the grammar that means "once you see this token, commit to this branch and don't try any other branches. If subsequently you see an error, then the error is here, not in some parent rule.

tomtau added enhancement pest labels Jul 14, 2022

EmirVildanov mentioned this issue Jan 9, 2024

feat: support better error reporting #965

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better error reporting. #467

Better error reporting. #467

Kesanov commented Jul 18, 2020 •

edited

Loading

plusvic commented May 20, 2022 •

edited

Loading

EmirVildanov commented Jan 9, 2024

viridia commented Dec 10, 2024

Better error reporting. #467

Better error reporting. #467

Comments

Kesanov commented Jul 18, 2020 • edited Loading

plusvic commented May 20, 2022 • edited Loading

EmirVildanov commented Jan 9, 2024

viridia commented Dec 10, 2024

Kesanov commented Jul 18, 2020 •

edited

Loading

plusvic commented May 20, 2022 •

edited

Loading