The "correct" solution is to mark the file rule as "compound atomic" with $, which opts out of automatic trivia insertion. #333 is the main rallying point for design work that should make this more obvious in the future.
This follows from how PEG works and from the assumptions made about how trivia behaves in a language.
A PEG-defined parser is greedy. Once a repetition has consumed input, or an ordered choice has picked an alternative that succeeded, that decision is final: the parser never backtracks into it to try a different split. (In complicated cases, it has helped me to think of writing the grammar as writing a parser, not as writing a traditional grammar.)
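To make the "no backtracking into a repetition" point concrete, here is a minimal Python model of PEG combinators. It is not pest's implementation; `lit`, `star` and `seq` are hypothetical helpers written only for this illustration. Each parser takes `(text, pos)` and returns the new position or `None`. The rule `{ "a"* ~ "a" }` can never match, because `"a"*` eats every `a`, whereas a backtracking regex engine happily gives one back.

```python
import re

# Hypothetical PEG-style combinators: parser(text, pos) -> new pos, or None on failure.

def lit(s):
    """Match the literal string s at pos."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def star(p):
    """Greedy zero-or-more: take as many matches as possible and never give any back."""
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or on an empty match
                return pos
            pos = nxt
    return parse

def seq(*ps):
    """Sequence: each parser must match in order; any failure fails the whole sequence."""
    def parse(text, pos):
        for p in ps:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

# PEG rule:  as_then_a = { "a"* ~ "a" }
as_then_a = seq(star(lit("a")), lit("a"))

print(as_then_a("aaa", 0))                     # None: "a"* consumed all three
print(re.fullmatch(r"a*a", "aaa") is not None)  # True: the regex engine backtracks
```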
The other solution is for your keywords to encode that they can't be followed by more identifier-like characters. (That is, you encode the fact that a traditional lexer is greedy.)
Here is an example, using some #333-style syntax for clarity:
identifier = { pest::unicode::XID_Start - pest::unicode::XID_Continue-* }
keyword(word) = _{ word - !pest::unicode::XID_Continue }
left = { keyword("left") }
right = { keyword("right") }
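The key piece of `keyword(word)` is the negative lookahead `!XID_Continue`: it peeks at the next character without consuming it. A rough Python model of that behavior is sketched below. The helper names are hypothetical, and `is_xid_continue` only approximates Unicode XID_Continue by reusing Python's identifier rules (which are defined in terms of XID_Start/XID_Continue, after NFKC normalization).

```python
def is_xid_continue(ch):
    # Approximation: "_" + ch is a valid Python identifier roughly when ch
    # is an XID_Continue character.
    return ("_" + ch).isidentifier()

def keyword(word):
    """keyword(word) = word, then !XID_Continue (lookahead, nothing consumed)."""
    def parse(text, pos):
        if not text.startswith(word, pos):
            return None
        end = pos + len(word)
        # Negative lookahead: fail if an identifier character follows the word.
        if end < len(text) and is_xid_continue(text[end]):
            return None
        return end
    return parse

left = keyword("left")
print(left("left x", 0))    # 4
print(left("leftover", 0))  # None: "o" would continue the identifier
print(left("left", 0))      # 4: end of input also satisfies the lookahead
```

Because the lookahead consumes nothing, `left` still ends at position 4 on `"left x"`, leaving the space for whatever trivia handling follows.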
If you want to get really fancy, you can also prevent identifiers from being keywords (unchecked pest 2.0 syntax this time):
left = @{ "left" ~ !ASCII_ALPHANUM }
right = @{ "right" ~ !ASCII_ALPHANUM }
any_kw = _{ left | right }
identifier = @{ !any_kw ~ ASCII_ALPHA ~ ASCII_ALPHANUM* }
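The effect of `!any_kw` can be checked with a small Python model of those four rules. This is a hand translation for illustration, not code generated by pest, and the function names simply mirror the rule names. Note that `lefty` is still a valid identifier, because `left` itself refuses to match when an alphanumeric character follows.

```python
KEYWORDS = ("left", "right")

def is_ascii_alnum(ch):
    return ch.isascii() and ch.isalnum()

def kw(word, text, pos):
    """left = @{ "left" ~ !ASCII_ALPHANUM } (and likewise for right)."""
    end = pos + len(word)
    if not text.startswith(word, pos):
        return None
    if end < len(text) and is_ascii_alnum(text[end]):
        return None  # lookahead failed: the word continues
    return end

def any_kw(text, pos):
    """any_kw = _{ left | right }: ordered choice, first success wins."""
    for word in KEYWORDS:
        end = kw(word, text, pos)
        if end is not None:
            return end
    return None

def identifier(text, pos):
    """identifier = @{ !any_kw ~ ASCII_ALPHA ~ ASCII_ALPHANUM* }"""
    if any_kw(text, pos) is not None:  # negative lookahead: reject bare keywords
        return None
    if pos >= len(text) or not (text[pos].isascii() and text[pos].isalpha()):
        return None
    end = pos + 1
    while end < len(text) and is_ascii_alnum(text[end]):
        end += 1
    return end

print(identifier("left", 0))   # None: a bare keyword
print(identifier("lefty", 0))  # 5
print(identifier("x1 y", 0))   # 2
```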
I want to parse something with at least one space in between. Given the following rule:
parsing always fails.
This is because the implicit whitespace rule is auto-inserted and consumes all the whitespace before the explicit whitespace rule gets a chance to match.
Maybe it would be possible to turn that around: first try to match the explicit rule, and only if that fails, try the implicit one.
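The failure mode can be reproduced with a small Python model. The original rule is not shown above, so this assumes a hypothetical grammar with `WHITESPACE = _{ " " }` and `pair = { "a" ~ " "+ ~ "b" }`. A non-atomic `~` is modelled as inserting `WHITESPACE*` between elements; the compound-atomic `${ ... }` variant inserts nothing. For simplicity `" "+` is modelled without the trivia pest would also insert between its repetitions, which does not change the outcome here.

```python
def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def star(p):
    """Greedy zero-or-more with no backtracking."""
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def seq(*ps, trivia=None):
    """Sequence; if trivia is given, insert trivia* between consecutive elements."""
    def parse(text, pos):
        for i, p in enumerate(ps):
            if i > 0 and trivia is not None:
                pos = star(trivia)(text, pos)  # implicit WHITESPACE*
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

WHITESPACE = lit(" ")                     # hypothetical: WHITESPACE = _{ " " }
spaces = seq(lit(" "), star(lit(" ")))    # " "+

pair = seq(lit("a"), spaces, lit("b"), trivia=WHITESPACE)  # pair = { ... }
pair_atomic = seq(lit("a"), spaces, lit("b"))              # pair = ${ ... }

print(pair("a  b", 0))         # None: WHITESPACE* already ate both spaces
print(pair_atomic("a  b", 0))  # 4
```

Since the implicit `WHITESPACE*` runs first and never gives input back, `" "+` always sees `b`; the compound-atomic version skips that insertion, so the explicit rule sees the spaces.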