-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removal of zero-width matches in generated tree-sitter grammar files, preliminary implementation #475
base: master
Are you sure you want to change the base?
Conversation
Thanks, @chaserhkj! I gave the What would be the way to use this backend? Could you write some backend documentation at https://github.com/BNFC/bnfc/blob/master/docs/user_guide.rst ?
Also, we need some CHANGELOG entry. |
So it seems that the fix here is not complete, e.g. with CF file
and the
|
Sorry I wasn't able to look at this again in the past few months and I kinda forgot where I was. I think I at least implemented the simplest cases, but I might be wrong. I might not have the immediate time to work on this, but I'll see if I could get back to this (and the documentations) in one week or two. |
Tree-sitter has a known limitation of not handling symbols that can match zero-width (a.k.a. "empty") strings. (Documented here) The only exception seems to be the start symbol.
However, in BNFC, zero-width matching symbols are supported and used in many examples and test cases. (e.g. The list macros in BNFC) It would be impractical to not support them in the tree-sitter backend.
It is possible to eliminate all zero-width matching symbols from a grammar: First, all symbols that could match zero-width need to be identified. Then these symbols can be converted to valid tree-sitter symbols by removing the RHS empty branches from its rules. Then all rules of the entire grammar need to be processed, wrapping all of the references to the aforementioned symbol with
optional(..)
. Finally since the addition ofoptional(..)
can populate zero-width-matching to other symbols, this needs to be done repeatedly until converging on a fixed point or the zero-width-matching is populated to the start symbol.This PR implements a preliminary version of the above-described algorithm. The converging process is not done yet and currently, it only handles simple cases of zero-width-matching where there is only one layer of matching. It should be sufficient enough for most cases like the list-related internal macros provided in BNFC.