
This issue was moved to a discussion.


Parsing always fails when explicitly using whitespace, with implicit whitespaces enabled #337

Closed
hardliner66 opened this issue Nov 14, 2018 · 1 comment

@hardliner66

I want to parse two words separated by at least one space. Given the following rules:

WHITESPACE = { " " }
left = {"left"}
right = {"right"}
file = {left ~ WHITESPACE+ ~ right}

parsing always fails.

This is because the implicit whitespace rule is inserted automatically at each ~ and greedily consumes all of the spaces, so the explicit WHITESPACE+ that follows it has nothing left to match.
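To make the failure concrete, here is a minimal hand-written PEG sketch in Rust. It is not pest's generated code, and the helper names are hypothetical; it only assumes that a non-atomic rule behaves roughly like left ~ WHITESPACE* ~ WHITESPACE+ ~ WHITESPACE* ~ right once the implicit trivia is inserted at each ~.

```rust
// Minimal PEG sketch (hypothetical helpers, not pest's generated code):
// each parser takes the input and a byte offset and returns the offset
// just past the match, or None on failure.

/// Match the literal `s` at `pos`.
fn lit(input: &str, pos: usize, s: &str) -> Option<usize> {
    if input[pos..].starts_with(s) {
        Some(pos + s.len())
    } else {
        None
    }
}

/// WHITESPACE = { " " }
fn ws(input: &str, pos: usize) -> Option<usize> {
    lit(input, pos, " ")
}

/// The implicit trivia inserted at every `~` of a non-atomic rule (WHITESPACE*).
/// Greedy: it consumes every space it can and never gives one back.
fn skip(input: &str, mut pos: usize) -> usize {
    while let Some(next) = ws(input, pos) {
        pos = next;
    }
    pos
}

/// Explicit WHITESPACE+: one space, then as many more as possible.
fn ws_plus(input: &str, pos: usize) -> Option<usize> {
    let first = ws(input, pos)?;
    Some(skip(input, first))
}

/// file = { left ~ WHITESPACE+ ~ right } with the implicit skips spelled out.
fn file_non_atomic(input: &str) -> Option<usize> {
    let p = lit(input, 0, "left")?;
    let p = skip(input, p); // implicit WHITESPACE* eats every space...
    let p = ws_plus(input, p)?; // ...so the explicit WHITESPACE+ always fails here
    let p = skip(input, p);
    lit(input, p, "right")
}
```

For any input, file_non_atomic returns None: after "left", skip has already consumed the spaces, and a PEG repetition never hands characters back to a later element of the sequence.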

Maybe it would be possible to turn that around and first try to match the explicit rule and only if that fails, try to match the implicit rule.

@CAD97
Contributor

CAD97 commented Nov 14, 2018

The "correct" solution is to mark the file rule as "compound atomic" with $ (file = ${ left ~ WHITESPACE+ ~ right }), which opts out of automatic trivia insertion. #333 is the main tracking issue for design work that should make this more obvious in the future.
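A compound-atomic rule simply has no implicit WHITESPACE* at its ~ operators, so the explicit WHITESPACE+ sees the spaces itself. The following Rust sketch models that behaviour by hand; the helper names are hypothetical and it does not call pest.

```rust
// Minimal PEG sketch (hypothetical helpers, not pest's generated code):
// each parser returns the byte offset just past its match, or None.

/// Match the literal `s` at `pos`.
fn lit(input: &str, pos: usize, s: &str) -> Option<usize> {
    if input[pos..].starts_with(s) {
        Some(pos + s.len())
    } else {
        None
    }
}

/// WHITESPACE+ with WHITESPACE = { " " }: one space, then as many more as possible.
fn ws_plus(input: &str, pos: usize) -> Option<usize> {
    let mut p = lit(input, pos, " ")?;
    while let Some(next) = lit(input, p, " ") {
        p = next;
    }
    Some(p)
}

/// file = ${ left ~ WHITESPACE+ ~ right }
/// Compound-atomic: nothing is inserted between the elements of the sequence.
fn file_compound_atomic(input: &str) -> Option<usize> {
    let p = lit(input, 0, "left")?;
    let p = ws_plus(input, p)?;
    lit(input, p, "right")
}
```

Here "left right" and "left   right" both match, while "leftright" is rejected because WHITESPACE+ requires at least one space.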

This is a consequence of how PEG works, and of the assumptions made about how trivia behaves in a language.

A PEG-defined parser is greedy. Once a repetition or a choice has succeeded, that decision is final: the parser never goes back to try a shorter repetition or a different alternative. (In complicated cases, it has helped me to think of writing the grammar as writing a parser, not as writing a traditional grammar.)

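The other solution is for your keywords to encode that they can't be followed by more identifier characters (i.e. you encode the fact that a traditional lexer is greedy). Then no explicit whitespace is needed: left ~ right with implicit whitespace accepts "left right" but rejects "leftright", because "left" is followed by an identifier character.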

For example, using some of the syntax proposed in #333 for clarity:

identifier = { pest::unicode::XID_Start - pest::unicode::XID_Continue-* }
keyword(word) = _{ word - !pest::unicode::XID_Continue }
left = { keyword("left") }
right = { keyword("right") }
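The keyword idea can be sketched by hand in Rust. This assumes that the proposed "-" means a sequence with no implicit trivia, and it approximates pest::unicode::XID_Continue with char::is_alphanumeric() plus '_' (the standard library has no XID tables), so it is an illustration rather than an exact model. The file rule here is a hypothetical left ~ right with implicit whitespace.

```rust
/// Rough stand-in for XID_Continue (assumption: alphanumerics plus '_').
fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// keyword(word): match `word`, then a negative lookahead (which consumes
/// nothing) requiring that no identifier character follows it.
fn keyword(input: &str, pos: usize, word: &str) -> Option<usize> {
    if !input[pos..].starts_with(word) {
        return None;
    }
    let end = pos + word.len();
    match input[end..].chars().next() {
        Some(c) if is_ident_continue(c) => None,
        _ => Some(end),
    }
}

/// Implicit WHITESPACE* with WHITESPACE = { " " }.
fn skip(input: &str, mut pos: usize) -> usize {
    while input[pos..].starts_with(' ') {
        pos += 1;
    }
    pos
}

/// Hypothetical file = { left ~ right }: non-atomic, so implicit whitespace
/// sits between the keywords and no explicit WHITESPACE+ is needed.
fn file(input: &str) -> Option<usize> {
    let p = keyword(input, 0, "left")?;
    let p = skip(input, p);
    keyword(input, p, "right")
}
```

The lookahead is what enforces separation: "leftright" and "lefty right" fail at the first keyword, while any number of spaces is absorbed by the implicit whitespace.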

If you want to get really fancy and also prevent identifiers from being keywords (unchecked pest 2.0 syntax this time):

left = @{ "left" ~ !ASCII_ALPHANUMERIC }
right = @{ "right" ~ !ASCII_ALPHANUMERIC }
any_kw = _{ left | right }
identifier = @{ !any_kw ~ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }
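A hand-written Rust sketch of the same idea may help show why "leftover" is still an identifier while "left" is not: the keyword rules only match when no ASCII letter or digit follows, and !any_kw is a lookahead that consumes nothing. The function names are hypothetical; this models the grammar above rather than calling pest.

```rust
/// A keyword rule like left = @{ "left" ~ !<ASCII letter or digit> }.
fn kw(input: &str, pos: usize, word: &str) -> Option<usize> {
    if !input[pos..].starts_with(word) {
        return None;
    }
    let end = pos + word.len();
    match input.as_bytes().get(end) {
        Some(b) if b.is_ascii_alphanumeric() => None,
        _ => Some(end),
    }
}

/// any_kw = _{ left | right }: ordered choice, first success wins.
fn any_kw(input: &str, pos: usize) -> Option<usize> {
    kw(input, pos, "left").or_else(|| kw(input, pos, "right"))
}

/// identifier: not a keyword, then one ASCII letter, then ASCII letters/digits.
fn identifier(input: &str, pos: usize) -> Option<usize> {
    // Negative lookahead: fail if a whole keyword starts here, consume nothing.
    if any_kw(input, pos).is_some() {
        return None;
    }
    let bytes = input.as_bytes();
    match bytes.get(pos) {
        Some(b) if b.is_ascii_alphabetic() => {}
        _ => return None,
    }
    let mut end = pos + 1;
    while bytes.get(end).map_or(false, |b| b.is_ascii_alphanumeric()) {
        end += 1;
    }
    Some(end)
}
```

With this, identifier("left", 0) and identifier("right", 0) return None, while identifier("leftover", 0) returns Some(8), since "left" followed by 'o' is not a keyword match.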
