TokenMatcher parse errors are not exceptions #153

ehuelsmann · 2023-08-17T18:18:08Z

The problem

Before 9902c7d, Ruby, Java and JavaScript would return the same error multiple times in a single parse invocation. This wasn't in line with the Go implementation: In Go, if there was an error with, e..g., a tag line, that error would be reported exactly once.

The cause

The token matcher returns a boolean value indicating whether the token is of the requested type (match_*). If the token matcher determines a token is of a specific type, it needs to return true. At the same time, while the type matches, there may be an issue parsing the token content. This happens where the line is a tagline with invalid tags or language selection tokens with non-existing languages. In order for the parser to be able to report the error after the parse, the token matcher adds any errors to the parser context before returning.

There lies the problem: a token may be given to the token matcher multiple times before it is finally accepted by the parser and added to the AST (notably due to lookahead functionality). Each time the problematic token is parsed, the error is added to the parser context by the token matcher, regardless of whether the token will end up in the AST (meaning that it's unclear if the error actually matters).

The proposed solution

In PR #152, the Perl code changes the interface contract of the token matcher: it will return 2 values. First the acceptance boolean and second a parser error if it occurred. Subsequently, the lookahead code is designed to look only at the acceptance boolean and the AST building functions match_token_at_* receive the error, adding it to the parser's error list when the token is accepted into the AST.

The design is better this way because exceptions should be used to indicate exceptional (unexpected) situations and return values should be used for "regular business". Parse errors - although being errors - are really in the "regular business" of a parser, meaning that the error should be returned from the token matcher to the parser which can deal with it as part of its regular processing.

This issue can be closed when

This item was created after discussion during the community zoom meeting on 17 Aug 2023.

The text was updated successfully, but these errors were encountered:

mpkorstanje mentioned this issue Apr 16, 2024

Add a mechanism to report syntax errors without exceptions #246

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TokenMatcher parse errors are not exceptions #153

TokenMatcher parse errors are not exceptions #153

ehuelsmann commented Aug 17, 2023

TokenMatcher parse errors are not exceptions #153

TokenMatcher parse errors are not exceptions #153

Comments

ehuelsmann commented Aug 17, 2023

The problem

The cause

The proposed solution

This issue can be closed when