Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error reporting. #467

Open
Kesanov opened this issue Jul 18, 2020 · 3 comments
Open

Better error reporting. #467

Kesanov opened this issue Jul 18, 2020 · 3 comments

Comments

@Kesanov
Copy link

Kesanov commented Jul 18, 2020

I played with the online editor and wasn't very happy with the error messages.

It seems to report only the name of the root rule: expected bar, while I would expect to see a rather more descriptive message:

foo = { "["~digit+~"]" }
bar = { "list"~"="~foo }
last~(a) <== bar expected "list"
list~(a) <== bar expected "="
list=(a) <== foo expected "["
list=[a] <== digit expected '0'..'9'
@plusvic
Copy link

plusvic commented May 20, 2022

I'm experiencing the same issue with error messages. The message always seems to report the top level rule, even if part of the rule is matched correctly and the actual error is at some other place. For example, I was using the following grammar for testing:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = { (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

condition = { "true" | "false" }

rule = {  "rule" ~ identifier ~ "{" ~ "condition:" ~ condition ~ "}" }

And then tried the following input:

rule test { foo: true }

The problem with that input is that the grammar is expecting condition instead of foo, so I would expect an error like:

 --> 1:13
  |
1 | rule test { foo: true }
  |             ^---
  |
  = expected condition

But instead I got:

 --> 1:1
  |
1 | rule test { foo: true }
  | ^---
  |
  = expected rule

This error is very misleading, the first few tokens of the rule are actually correct, and the failure comes when the parser finds foo instead of condition, and that's what the error should say.

Follow-up:
Interestingly enough, I get the expected error if I change my grammar to:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = {
  (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* 
}

condition = { "condition" }
expression = { "true" | "false" }

rule = { 
	"rule" ~ identifier ~ "{" ~ 
       condition ~ ":" ~
         expression ~
    "}" 
}

So, it looks like the errors are always expecting <something>, where <something> is the name of the production rule that failed to match, but it doesn't report errors for unexpected tokens within a production rule.

@EmirVildanov
Copy link
Contributor

Hi!
Developing sbroad (library for distributed SQL queries execution) we've faced the same problem.
We use pest for parsing SQL, but sometimes we get error messages that are hard to link with the real syntax problem.

Some examples:

I leave the link to our simplified SQL grammar, but it seems to be redundant for examples understanding.

  • Query: create table t_name(col_name int null).
    Note: in our grammar we suppose to see the primary key next to the columns definition that must be separeted with comma.
    Error message:
     --> 1:27
      |
    1 | create table bla(name int null)
      |                           ^---
      |
      = expected NotFlag
    
    Explanation: after succesfully parsing "int" pest enters optional ColumnDefIsNull rule, tracks optional NotFlag as farthest parsed with error rule and stores it in pos_attempts (from which we later build the error message).
  • Query: select 1 from a group b.
    Note: we expect to see error message about incomplete "by" token.
    Error message:
     --> 1:17
      |
    1 | select 1 from a group b
      |                 ^---
      |
      = expected EOI, Join, GroupByBlock, or HavingBlock
    
    Explanation: after successful parsing of "group" token, pest doesn't track it anywhere and counts GroupByBlock (or HavingBlock) rule starting at "group" keyword to be the farthest rule that we've tried to parse but errored.

It seems to me that the main problem with errors is that we don't track tokens that our grammar expects to see at an farthest errored position and that library user can't interact with the sequence of rules that were applied to parse the input.

My proposal is to tinker logic in the parser_state.rs to track the following info inside ParserState:

  1. Vector of expected tokens
  2. Stacks of rule calls

Later we can can pass these info into ParsingError and use on the user side in order to form custom error messages based on expected rules.

Please see proposed changes at MR.

@viridia
Copy link

viridia commented Dec 10, 2024

I believe this issue could be solved by the introduction of a "cut" or "trap" feature in Pest, as described here: #934

In a PEG parser, a failure in a rule isn't necessarily fatal - it just means that the parser will attempt to try a different alternative, popping up a level if need be. If no alternatives can be found that match, then the whole parse fails. Unfortunately, this means that a failure in a deeply nested rule won't be considered a failure of that specific rule, but rather a failure of the grammar as a whole - therefore you get an error at the top level, which isn't much help to the user.

The solution is a way to tell the parser to "commit" to the current alternative. If you see a pattern like "if (", you know it's an if-statement, there's no need to check alternate rules. Even if the input tokens afterwards are garbage, like "if (%!__", we know this is a bad if-statement, not some other kind of statement.

In prolog, this operation is called "cut", and it's a token that is placed within the grammar that means "once you see this token, commit to this branch and don't try any other branches. If subsequently you see an error, then the error is here, not in some parent rule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants