Allow combination of _ and @ for silent rules that disallow whitespace #118

Closed
jsdw opened this issue Jul 15, 2017 · 7 comments

Comments

@jsdw

jsdw commented Jul 15, 2017

First off, I have found pest a pleasure to use, so thank you for this library!

I have run into times when it would be nice to be able to declare a rule as not accepting whitespace but also being silent, i.e. using something like "_@" rather than just "_" or "@" to prefix a rule.

A particular example I think I have run into (but it may very well be my lack of experience) is comments that end on a newline. If my grammar is like:

whitespace = _{ ([" "] | ["\n"] | ["\t"] | ["\r"])+ }
comment = _{ ["//"] ~ (!["\n"] ~ any)* ~ ["\n"] }

I want to be able to specify that the whitespace rule does not get applied between each ~ in the comment rule, to ensure that the whitespace rule doesn't swallow the newline before the rule can be applied to end the comment. I also don't want to hear anything about comments in my queue.
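To make the failure mode concrete, here is a toy Rust model of the comment rule above. It is only a sketch, not pest's implementation: `skip_ws`, `match_comment`, and the `implicit_ws` flag are hypothetical names, and the placement of the whitespace skip (at each `~` and between repetitions) is a simplifying assumption.

```rust
/// Skip the characters the `whitespace` rule accepts; returns the new offset.
fn skip_ws(input: &str, mut pos: usize) -> usize {
    while let Some(c) = input[pos..].chars().next() {
        if matches!(c, ' ' | '\n' | '\t' | '\r') {
            pos += c.len_utf8();
        } else {
            break;
        }
    }
    pos
}

/// Toy matcher for `["//"] ~ (!["\n"] ~ any)* ~ ["\n"]`.
/// Returns the end offset of the comment, or None if the rule fails.
/// `implicit_ws = true` models a rule with implicit whitespace skipping
/// (assumption: a simplified stand-in for pest's real behaviour).
fn match_comment(input: &str, implicit_ws: bool) -> Option<usize> {
    // Whitespace skip applied at each `~` and between repetitions.
    let gap = |pos: usize| if implicit_ws { skip_ws(input, pos) } else { pos };

    if !input.starts_with("//") {
        return None;
    }
    let mut pos = 2;

    // (!["\n"] ~ any)*
    loop {
        let p = gap(pos);
        match input[p..].chars().next() {
            Some(c) if c != '\n' => pos = p + c.len_utf8(),
            _ => break, // iteration failed: keep `pos` from before the skip
        }
    }

    // ~ ["\n"]
    let p = gap(pos);
    if input[p..].starts_with('\n') {
        Some(p + 1)
    } else {
        None
    }
}
```

In this model, `match_comment("// note\nlet x", false)` returns `Some(8)`, ending right after the newline. With `implicit_ws` set to `true` the same input returns `None`, because the skip consumes the newline and the repetition runs on into the next line.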

Regardless of whether that is a problem, it would be nice in general to be able to opt out of whitespace and be silent about a rule.

@dragostis
Contributor

Really nice to hear about that. Version 1.0 is just around the corner with a lot of improvements, including the fact that comments and whitespace are atomic by default.

Apart from these two special cases, I see no use case for a rule that should be both silent and atomic, but I'm open to suggestions.

@jsdw
Author

jsdw commented Jul 17, 2017

Thanks for the quick response; that's very exciting to hear!

A case I ran into (but it wasn't a big deal, honestly) is when I wanted to parse a variable name in my language, knowing that all variables are prefixed by "$". I did something like:

variable = @{ ["$"] ~ variable_name }
variable_name = { (['a'..'z'] | ['A'..'Z'] | ["_"])+  }

And in the process macro, I have no need to see the leading "$", so I am just skipping over variable rules and handling variable_names. I might therefore have chosen to make variable _@ in this case.
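A rough Rust sketch of what a silent and atomic `variable` rule would mean for the token queue. The `Token` struct, `match_variable`, and the `silent` flag are hypothetical and only model the idea; they are not pest's API.

```rust
/// One entry in a toy token queue: rule name plus byte span.
#[derive(Debug, PartialEq)]
struct Token {
    rule: &'static str,
    start: usize,
    end: usize,
}

/// Toy matcher for `variable = { ["$"] ~ variable_name }`, matched atomically
/// (no whitespace allowed between "$" and the name). `silent` controls whether
/// `variable` itself is recorded; `variable_name` is always recorded.
fn match_variable(input: &str, silent: bool) -> Option<Vec<Token>> {
    if !input.starts_with('$') {
        return None;
    }
    // variable_name = (['a'..'z'] | ['A'..'Z'] | ["_"])+
    let len = input[1..]
        .chars()
        .take_while(|c| c.is_ascii_alphabetic() || *c == '_')
        .count(); // every accepted char is ASCII, so count == byte length
    if len == 0 {
        return None;
    }
    let end = 1 + len;
    let mut queue = Vec::new();
    if !silent {
        queue.push(Token { rule: "variable", start: 0, end });
    }
    queue.push(Token { rule: "variable_name", start: 1, end });
    Some(queue)
}
```

With `silent` set to `true`, `match_variable("$foo_bar + 1", true)` yields a single `variable_name` token spanning bytes 1 to 8, while `"$ x"` is still rejected because the match is atomic.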

If I run into any more compelling use cases I'll let you know - intuitively _ and @ feel somewhat orthogonal and so I expect there are times when using both would help reduce clutter in the token queue.

@jsdw
Author

jsdw commented Aug 11, 2017

Since my first comment, I've run into a couple more use cases:

  1. I want to allow either a semicolon or a newline in some places, but my whitespace rule eats newlines. One approach that doesn't mean I have to modify my grammar might be:
rule_we_care_about_line = _@{ rule_we_care_about ~ ([";"] | newline) }
newline = _@{ ([" "] | ["\t"])* ~ ["\n"] }
rule_we_care_about = !@{ ... }

This (admittedly it looks messy; I'd love to hear an alternative) would allow ; or \n to end a rule, without adding any unnecessary tokens to my queue. The only alternative I'm aware of is not allowing newlines in my whitespace rule (which I may well do) and supporting them manually in places.
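The terminator part of that idea can be sketched in plain Rust. This is a toy model with a hypothetical `match_terminator` helper, assuming no implicit whitespace skipping happens inside it (that is, the rule is atomic):

```rust
/// Toy matcher for `([";"] | newline)` where
/// `newline = ([" "] | ["\t"])* ~ ["\n"]`.
/// Returns the number of bytes consumed, or None if neither alternative matches.
fn match_terminator(input: &str) -> Option<usize> {
    if input.starts_with(';') {
        return Some(1);
    }
    // Skip horizontal whitespace only, then require a newline.
    let trimmed = input.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let blanks = input.len() - trimmed.len();
    if trimmed.starts_with('\n') {
        Some(blanks + 1)
    } else {
        None
    }
}
```

Here `match_terminator(";")` returns `Some(1)` and `match_terminator("  \t\nnext")` returns `Some(4)`, while `match_terminator("  x")` returns `None` because no newline follows the blanks.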

jsdw closed this as completed Aug 11, 2017
jsdw reopened this Aug 11, 2017
@dragostis
Contributor

In this particular case, rule_we_care_about can just as well be silent if it doesn't have to be atomic (which should be the case, I guess).

I'm hesitant about this because I feel it makes the grammar a bit harder to wrap your mind around when you first start using it.

@CAD97
Contributor

CAD97 commented Sep 13, 2018

This will become useful in more cases with #261; consider the case of a Keyword(Word) rule:

identifier = @{ ident_start ~ ident_continue* }
Keyword(Word) = _${ Word ~ !ident_continue }

The desired semantics here are that the Keyword parse node would not appear, but the Word node would (if it is not also silent). I don't think these semantics are possible to achieve otherwise.
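As a minimal sketch of the matching behaviour (not of pest's parse-node handling), the `Word ~ !ident_continue` check can be modelled in Rust. `match_keyword` is a hypothetical helper, and the definition of `is_ident_continue` is an assumption, since the thread does not spell out `ident_continue`:

```rust
/// Assumed definition for illustration: identifier-continue characters.
fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Toy matcher for `Keyword(Word) = Word ~ !ident_continue`, matched atomically.
/// Returns the length of the keyword on success.
fn match_keyword(input: &str, word: &str) -> Option<usize> {
    let rest = input.strip_prefix(word)?;
    match rest.chars().next() {
        // The negative lookahead fails: the word is only a prefix of an identifier.
        Some(c) if is_ident_continue(c) => None,
        _ => Some(word.len()),
    }
}
```

Under these assumptions, `match_keyword("let x", "let")` returns `Some(3)`, while `match_keyword("letter", "let")` returns `None`.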

@dragostis
Contributor

Will close this in favor of #271.

@singalen

singalen commented May 24, 2021

Can we please reopen this one, given that #271 didn't go anywhere in 2 years?
I hope you have the energy at least for this small thing.
Without it, for example, indent-sensitive grammars like in #328 are filled with useless "indent" rule pairs, which makes working with pest-ast a real pain.
