Parsing always fails when explicitly using whitespace, with implicit whitespaces enabled #337

hardliner66 · 2018-11-14T20:11:11Z

I want to parse two tokens with at least one space in between. Given the following rules:

WHITESPACE = { " " }
left = {"left"}
right = {"right"}
file = {left ~ WHITESPACE+ ~ right}

parsing always fails.

This is because the implicit whitespace rule is automatically inserted and consumes all the spaces before the explicit WHITESPACE+ gets a chance to match.

Maybe it would be possible to turn that around and first try to match the explicit rule and only if that fails, try to match the implicit rule.
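To illustrate the failure described above: the rule behaves roughly as if the implicit skip had been written out by hand. This is a simplified sketch, not the exact expansion pest generates. The rule name `file_expanded` is made up for the example, and `$` is used only so that pest does not insert a second layer of implicit whitespace on top of the hand-written one.

```pest
WHITESPACE = { " " }
left = { "left" }
right = { "right" }

// Hypothetical hand-written approximation of `left ~ WHITESPACE+ ~ right`.
// The first `WHITESPACE*` stands in for the implicit skip after `left`.
// It consumes every space, so the explicit `WHITESPACE+` finds none and fails.
file_expanded = ${ left ~ WHITESPACE* ~ WHITESPACE+ ~ WHITESPACE* ~ right }
```

Because the repetition never gives back what it has consumed, no input can satisfy this rule, whether or not it contains spaces.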

CAD97 · 2018-11-14T21:19:04Z

The "correct" solution is to mark the file rule as "compound atomic" with $, which opts out of automatic trivia insertion. #333 holds the main rallying point around design work that will make this more obvious in the future.

This is a property of how PEG works, and the assumptions made about how trivia behaves in a language.

A PEG-defined parser is greedy. Once a sub-expression has matched, that decision is final: the parser never goes back and retries it with a shorter match just because a later part of the sequence failed. (In complicated cases, I've found it helps to think of writing the grammar as writing a parser, not as writing a traditional grammar.)
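The compound-atomic fix mentioned above could look something like the following. This is a minimal sketch assuming pest 2.x syntax; it reuses the rules from the original report and changes only the modifier on `file`.

```pest
WHITESPACE = { " " }
left = { "left" }
right = { "right" }

// `$` makes `file` compound-atomic: no implicit WHITESPACE is inserted
// around `~` or inside `+`, so the explicit `WHITESPACE+` is the only
// thing that consumes spaces between the two keywords.
file = ${ left ~ WHITESPACE+ ~ right }
```

With this version, input such as `left right` should match while `leftright` should not, since at least one explicit space is now required.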

The other solution is for your keywords to encode that they can't be followed by more identifier-like characters. (i.e. you encode the fact that a traditional lexer is greedy.)

Using some #333-styled syntax for clarity, an example:

identifier = { pest::unicode::XID_Start - pest::unicode::XID_Continue-* }
keyword(word) = _{ word - !pest::unicode::XID_Continue }
left = { keyword("left") }
right = { keyword("right") }

If you want to get really fancy, you can also prevent identifiers from being keywords (unchecked pest 2.0 syntax this time):

left = @{ "left" ~ !ASCII_ALPHANUMERIC }
right = @{ "right" ~ !ASCII_ALPHANUMERIC }
any_kw = _{ left | right }
identifier = @{ !any_kw ~ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }

grncdr mentioned this issue Dec 26, 2018

pest language evolution #333

Open

agausmann mentioned this issue May 3, 2019

RFC: Make white-space handling less confusing / more consistent with the introduction of an "adjacent selector": , #271

Open

tomtau converted this issue into discussion #653 Jul 14, 2022
