Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#3744): Human-Readable Message for Prohibited Comment #3802

Conversation

volodya-lombrozo
Copy link
Member

In this PR I've added supportive grammar rule that is used to catch prohibited comments and report them to a user:

prohibitedComment
    : comment
    ;

Closes: #3744.

@github-actions github-actions bot added the core label Jan 10, 2025
@volodya-lombrozo volodya-lombrozo marked this pull request as draft January 10, 2025 13:26
@volodya-lombrozo volodya-lombrozo marked this pull request as ready for review January 10, 2025 14:02
@volodya-lombrozo
Copy link
Member Author

@maxonfjvipon Could you have a look, please?

@maxonfjvipon
Copy link
Member

@volodya-lombrozo it's a very interesting idea, but I'm not sure if it's "right" to put wrong scenarios to grammar to detect errors. It seems that grammar is supposed to contain only "possible" scenarios. Did you see anywhere else such approach when obviously incorrect scenarios are put into grammar to detect syntax errors?

@volodya-lombrozo
Copy link
Member Author

volodya-lombrozo commented Jan 10, 2025

@maxonfjvipon I share your concern. Actually, using this approach was a last resort. I didn’t want to use it initially.
As for examples, yes, there are plenty of them:

  1. The Definitive ANTLR 4 Reference 2nd Edition: Section 9.4 Error Alternatives
  2. The real example is the Groovy lang. They use this technique a lot (at least in Lexer). For example for UNEXPECTED_CHAR
  3. Also some sites mention this approach as well

@yegor256
Copy link
Member

@volodya-lombrozo maybe we should use a two-phases compilation pipeline for the .g4 file. First, we take the original .g4 file and inject such additional constructs into it, producing new (very large) .g4 file. Then, we let ANTLR compile this large file. WDYT?

@volodya-lombrozo
Copy link
Member Author

@volodya-lombrozo maybe we should use a two-phases compilation pipeline for the .g4 file. First, we take the original .g4 file and inject such additional constructs into it, producing new (very large) .g4 file. Then, we let ANTLR compile this large file. WDYT?

@yegor256 I believe this solution would be overly complicated, to be honest. Originally, the problem with error handling was related to the EO ANTLR grammar. The grammar is created in such a way that if one of the comments is prohibited (by the grammar), then almost the entire AST becomes incorrect (see the picture below).

antlr4_parse_tree_1

The perfect solution would be to rework the grammar in a way to avoid such situations. But it might require more dramatic changes that I would like to avoid.

@maxonfjvipon
Copy link
Member

@volodya-lombrozo could you please describe how would you change the grammar so it would be more convenient for catching error? Maybe I could figure it out

@yegor256
Copy link
Member

I believe this solution would be overly complicated, to be honest.

@volodya-lombrozo I do like this PR, but I don't understand how many of such PRs we will have in the future, with similar grammar tricks. If it's only one, that's fine. However, if there will be others, maybe we should think about something less intrusive -- something that doesn't pollute our .g4 file with error-specific rules. WDYT?

@volodya-lombrozo
Copy link
Member Author

@volodya-lombrozo could you please describe how would you change the grammar so it would be more convenient for catching error? Maybe I could figure it out

@maxonfjvipon I can't say for now, it requires some additional deep investigation. ANTLR can't recognise either object, or objects sub-rule (object EOL?)* and immediately throws an error. It seems, that it can't decide whether object is a master or a slave:

object
    : master
    | slave
    ;

All the available recovery strategies don't help either.
As I understand, the idea here is to make ANTLR recognise deeper nodes. In order to achieve this we need to reduce collisions between rules. I guess.

@volodya-lombrozo
Copy link
Member Author

I believe this solution would be overly complicated, to be honest.

@volodya-lombrozo I do like this PR, but I don't understand how many of such PRs we will have in the future, with similar grammar tricks. If it's only one, that's fine. However, if there will be others, maybe we should think about something less intrusive -- something that doesn't pollute our .g4 file with error-specific rules. WDYT?

@yegor256 I can't agree more here. However, I can't say for now how much of such PR's we will have in the future. I need to move forward and try to solve similar issues:

#3745
#3746

If they will require same "tricks", it will mean that we need to solve this problem somehow else. And we will be able to discuss it in the next PR.
What do you think?

@yegor256
Copy link
Member

@volodya-lombrozo
Copy link
Member Author

We solved this issue by using different approach. See #3818

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

comment-in-method.yaml:24-26: The error message doesn't...
3 participants