Replies: 3 comments 12 replies
-
Interesting, I also had this problem when writing tests. For example, I wanted to parse just an expression, and I had to create a rule for it. What if instead there was a flag when creating the parse tree? Oh, I see that was choice 2. Yeah, I like choice 2, and I think it should be per parse, not at parser generation time.
-
I think this is definitely the correct solution to the problem. I ran into it some time ago when I had to test rules other than the root rule and was forced to write code that added EOF tokens artificially. It was dirty. Also, I'd say the "consuming up to EOF" mode should be the default.
-
Hmm... I don't think I like these options. Also, I've often used the ability to parse the first part of a file with a rule. Adding EOF is just not a burden for me, but apparently it should be better documented haha!!
-
There was an interesting bug found a week ago in the Scala grammar. The grammar results in a parser that apparently parses input fine, but yields an incomplete parse tree, and doesn't produce any error messages. This is because the parser stops on an error, and returns a parse tree up to the point where the parse worked.
Why does the parser do this? Because the start rule does not end with the EOF symbol. (I call a start rule that does end with EOF an "EOF start rule".) When the start rule is changed from
compilationUnit : ('package' qualId)* topStatSeq ;
to
compilationUnit : (('package' qualId)* topStatSeq) EOF ;
an error is raised. The grammar has errors, which were not found because the parse was valid on partial input.
What other grammars in grammars-v4 do not have an EOF start rule? Most. When I add an EOF start rule, a dozen grammars that used to "work" no longer do.
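The effect of the missing EOF can be reproduced in miniature. The sketch below is a hand-written toy parser in Python, not ANTLR-generated code, and every name in it (`ToyParser`, `unit`, `unit_eof`, `try_parse`) is made up for illustration. It only mirrors the idea: a rule without EOF returns as soon as its loop cannot continue, while the EOF-terminated variant turns leftover input into an error.

```python
import re


def tokenize(text):
    # Identifiers or single non-space characters, followed by an EOF marker.
    return re.findall(r"[A-Za-z_]\w*|\S", text) + ["<EOF>"]


class ToyParser:
    """Toy stand-in for a generated parser (hypothetical, not the ANTLR API)."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def unit(self):
        # unit : (ID ';')* ;
        # No EOF: the loop exits at the first token that cannot start
        # another statement, and the rule returns what it has so far.
        names = []
        while self.peek().isidentifier():
            names.append(self.peek())
            self.pos += 1
            if self.peek() != ";":
                raise SyntaxError(f"expected ';' at token {self.pos}")
            self.pos += 1
        return names

    def unit_eof(self):
        # unit_eof : unit EOF ;
        # Leftover input is now reported instead of being ignored.
        names = self.unit()
        if self.peek() != "<EOF>":
            raise SyntaxError(f"unexpected {self.peek()!r} at token {self.pos}")
        return names


def try_parse(text, with_eof):
    parser = ToyParser(text)
    try:
        return parser.unit_eof() if with_eof else parser.unit()
    except SyntaxError as err:
        return f"error: {err}"


print(try_parse("a; b; ?? c;", with_eof=False))  # silently returns a partial result
print(try_parse("a; b; ?? c;", with_eof=True))   # reports the leftover input
```

On the input `a; b; ?? c;` the rule without EOF returns only the first two statements and says nothing about the rest, which is the same shape of failure as the incomplete Scala parse tree.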
What is my point? Requiring explicit EOF start rules is a problem because developers will forget to add them. Adjusting the grammars in grammars-v4 is also burdensome, as much as it was for the "symbol conflicts" resolution and "case insensitive lexing" that were recently fixed in Antlr 4.10. My PR for EOF start rules in grammars-v4 already changes 158 files, and it still does not fix the dozen or so grammars with errors.
There are two solutions to force the parser to read the input to completion:
1. Force the developer to include an EOF start rule. Update all grammars in the grammars-v4 repository to have an EOF-terminated start rule. I.e., if the original start rule given in pom.xml is
   start : s1 s2 ... sn ;
   then either change the rule to
   start : (s1 s2 ... sn) EOF ;
   or keep it and introduce a new rule that references the old start rule symbol:
   new_start : start EOF ;
   The problem with this method is that people sometimes start the parse at a rule other than the one listed in pom.xml, yet still expect the parse to consume the entire input. For example, they parse expressions using the C++ grammar.
   It's possible to use automated means (via Trash) to edit the grammar to have an EOF-terminated start rule. This may work for CI builds, but developers will still need to modify the grammar for their own needs. Since many developers are inexperienced, they may not augment the grammar with an EOF start rule.
   It's likely possible to test whether the grammar requires an EOF start rule by checking whether a tail of the start rule's RHS can derive empty (start : s1 s2 ... sm sn so ... sz ; and sm ... sz =>+ empty).
2. Provide a new switch in the Antlr runtime for parsing in one of two ways: one in which the input must be consumed to the end of file, and a second in which a parse of partial input is accepted.
   This would mean a change to the runtime, but most developers would get what they expect with no modifications to the grammar.
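The derive-empty test mentioned above can be sketched roughly as follows. This is a minimal Python sketch under assumptions, not something ANTLR or Trash provides: the grammar is given as plain BNF in a dict (no `*` or `?` operators), any symbol without a rule is treated as a terminal, and the helper names `nullable_symbols` and `nullable_tails` are hypothetical. It computes which nonterminals can derive empty and then reports, for each alternative of the start rule, the longest tail made only of such symbols.

```python
def nullable_symbols(rules):
    """Return the nonterminals that can derive the empty string.

    `rules` maps a nonterminal to a list of alternatives, each a list of
    symbols. Symbols that are not keys of `rules` count as terminals.
    """
    nullable = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for name, alternatives in rules.items():
            if name in nullable:
                continue
            # An alternative is nullable when all of its symbols are
            # (an empty alternative trivially qualifies).
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(name)
                changed = True
    return nullable


def nullable_tails(rules, start):
    """For each alternative of `start`, the longest tail that can derive empty."""
    nullable = nullable_symbols(rules)
    tails = []
    for alt in rules[start]:
        tail = []
        for sym in reversed(alt):
            if sym not in nullable:
                break
            tail.insert(0, sym)
        tails.append(tail)
    return tails


# Made-up toy grammar, loosely shaped like "optional header, then items".
toy_rules = {
    "start": [["header", "items"]],
    "header": [[], ["PACKAGE", "ID", "header"]],
    "items": [[], ["ID", "items"]],
    "start_eof": [["start", "EOF"]],
}

print(nullable_tails(toy_rules, "start"))      # non-empty tail: flagged
print(nullable_tails(toy_rules, "start_eof"))  # EOF is a terminal: empty tail
```

For the toy grammar, `start` is flagged because its whole right-hand side can derive empty, while `start_eof` is not, since a terminal such as EOF can never derive empty. Whether a non-empty tail is the right criterion for real grammars is exactly the open question in the list above.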
Any thoughts?