-
-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support better error reporting #965
Conversation
WalkthroughThe changes encompass the introduction of an SQL parsing module with advanced error handling capabilities. These improvements enhance error diagnostics for SQL parsing, boosting the reliability and resilience of the parser. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (5)
- grammars/src/grammars/sql.pest (1 hunks)
- grammars/src/lib.rs (1 hunks)
- pest/src/error.rs (21 hunks)
- pest/src/lib.rs (1 hunks)
- pest/src/parser_state.rs (12 hunks)
Additional comments: 31
grammars/src/grammars/sql.pest (1)
- 1-249: The new SQL grammar file
sql.pest
is comprehensive and covers a wide range of SQL commands and syntax elements. It is well-structured and follows the conventions of thepest
parser. Ensure that all grammar rules are tested thoroughly to validate their correctness and that they cover all intended SQL syntax variations.pest/src/lib.rs (1)
- 352-352: The change to make the
parser_state
module public is a significant alteration to the library's accessibility. This should be carefully considered and documented, explaining why the module needs to be public and how it should be used by consumers of the library.grammars/src/lib.rs (2)
53-59: The addition of the
sql
module with theSqlParser
struct is consistent with the PR's objective to enhance error reporting capabilities for SQL parsing. Ensure that the grammar filegrammars/sql.pest
is correctly placed and contains the expected grammar definitions.253-361: The modified
sql
test function now includes comprehensive error handling and reporting, which aligns with the PR's goal of improving error reporting. The test function seems to cover various error scenarios and provides custom error messages. It's important to ensure that all new error handling paths are covered by tests to verify their correctness.pest/src/error.rs (15)
21-21: The import of
ParseAttempts
is correctly placed following Rust's convention of listing imports alphabetically.52-53: The addition of the
parse_attempts
field to theParsingError
variant is consistent with the PR's objective to enhance error reporting. Ensure that theParseAttempts
type is properly defined and that its usage here is consistent with its intended purpose.111-118: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [100-115]
The example provided in the documentation comment is helpful for understanding how to create a
ParsingError
with the newparse_attempts
field. However, it's important to ensure that the documentation is kept up-to-date if theParseAttempts
API changes.
- 159-166: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [147-163]
Similar to the previous comment, the example for creating an
Error
from aSpan
is clear and demonstrates the use of the new field. It's good practice to include such examples in the documentation.
- 225-232: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [214-229]
The method
with_path
is well-documented with an example. This method is not directly related to the new changes but is part of the public API, so it's important to ensure that it functions correctly with the newparse_attempts
field.
- 257-264: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [247-261]
The method
path
is a getter for the privatepath
field. It's good to see that encapsulation is being maintained. This method is unchanged and should continue to work as expected with the new changes.
- 299-306: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [289-303]
The
renamed_rules
method allows for customization of error messages by renaming rules. This is a useful feature for users who want to provide more context-specific error messages. It's important to verify that this method interacts correctly with the newparse_attempts
field.
540-540: The
message
method in theErrorVariant
enum has been updated to handle the newparse_attempts
field. It's crucial to ensure that the error messages generated are informative and accurate.582-582: The test case
display_parsing_error_mixed
has been updated to include the initialization of theparse_attempts
field. It's important to ensure that all test cases are updated to reflect the new changes and that they pass successfully.609-609: The test case
display_parsing_error_positives
is also updated to includeparse_attempts
. Consistency in test cases is key to maintaining a robust testing suite.636-636: The test case
display_parsing_error_negatives
includes the new field as well. It's good to see comprehensive testing of different error scenarios.663-663: The test case
display_parsing_error_unknown
is updated, which is essential for ensuring that the new field does not affect the handling of unknown errors.855-855: The
mapped_parsing_error
test case demonstrates the renaming of rules in an error message. This test should verify that theparse_attempts
field does not interfere with the renaming functionality.883-883: The
error_with_path
test case is updated to includeparse_attempts
. It's important to ensure that the path information is correctly displayed alongside the new error details.911-911: The
underline_with_tabs
test case checks the visual representation of errors with tabs. This is a good detail to test, as whitespace can often cause issues in formatting.pest/src/parser_state.rs (12)
15-19: The addition of
core::fmt::Debug
andstd::prelude::rust_2021::String
to the imports suggests that debugging and string manipulation features are being used in the new code. Ensure that these imports are indeed utilized within the file to avoid unnecessary imports.128-137: Constants
CALL_STACK_INITIAL_CAPACITY
,EXPECTED_TOKENS_INITIAL_CAPACITY
, andCALL_STACK_CHILDREN_THRESHOLD
have been defined with specific values. It's important to ensure that these values have been chosen based on empirical evidence or reasonable defaults and that they are documented to explain their purpose.139-146: The
ParseAttempt
enum has been introduced with variantsRule
andToken
. This is a good use of Rust's enum to represent different parsing attempts. Ensure that the enum is used consistently throughout the code and that all possible parsing attempts are covered.148-161: The
RulesCallStack
struct is well-defined with clear documentation. It's important to ensure that the logic for compressing several entities into a single rule (try_add_new_stack_rule
) is correctly implemented and tested.173-186: The
ParseAttempts
struct is designed to track parsing attempts and expected tokens at the maximum position. Ensure that the logic for adding new tokens and rules (try_add_new_token
,try_add_new_stack_rule
) is robust and handles all edge cases.277-292: The
parse_attempts
field has been added to theParserState
struct to replaceattempt_pos
andmaybe_attempts
. This change should be carefully reviewed to ensure that it integrates well with the existing parsing logic and that it accurately tracks parsing attempts and errors.368-376: New methods
get_parse_attempts
andget_error_position
have been added toParserState
. These methods should be checked to ensure they return accurate and useful information for error reporting.477-506: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [457-486]
The
rule
method has been modified to include logic for tracking parsing attempts. It's crucial to verify that the logic correctly updatesparse_attempts
and that it doesn't introduce any regressions or bugs in the parsing process.
494-502: The closure
try_add_rule_to_stack
is used to add rules to the parsing attempts. Ensure that this logic is correct and that it doesn't inadvertently overwrite or lose important information about parsing attempts.854-871: The
match_string
method has been updated to interact with the newparse_attempts
structure. It's important to ensure that the method correctly updates the parsing attempts and expected tokens based on the lookahead status and the match result.901-911: Similarly, the
match_insensitive
method has been updated. Verify that the case-insensitive matching logic is sound and that it interacts withparse_attempts
as expected.125-264: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [15-911]
Overall, the changes to
parser_state.rs
introduce significant enhancements to error tracking and parsing attempts. The new types and methods are well-documented and appear to be integrated correctly. However, given the complexity of these changes, it is recommended to perform thorough testing to ensure that all new logic behaves as expected and does not introduce regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the SQL grammar looks ok, maybe it can be added in a separate PR?
the better error reporting change is good, but it'd be a semver-breaking change... not sure if there's a way to add it in a semver-compatible way?
if not, maybe we can put it under a feature-guard?
Update: it's been a long time since my last comment and I'm sorry. I currently try to optimize SQL expressions parsing using Pratt parser. After finishing this task, I hope to return to current MR. P.S. Thanks for the comments! |
@tomtau, I am currently tinkering changes I've previously suggested in this MR. The main stumbling block was that these changes were breaking semver. May you please consult me if solution below is more applicable and whether it's semver compatible? Previously semver was broken because I've added new I've taken into account your comment about not making |
yes, I guess that could work |
Thanks for reply! I'll update this PR with latest changes in the coming days |
4c5d291
to
8d1b2fb
Compare
8d1b2fb
to
19b3b5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files ignored due to path filters (1)
pest/Cargo.toml
is excluded by:!**/*.toml
Files selected for processing (3)
- grammars/src/lib.rs (1 hunks)
- pest/src/error.rs (11 hunks)
- pest/src/parser_state.rs (14 hunks)
Files skipped from review as they are similar to previous changes (2)
- grammars/src/lib.rs
- pest/src/error.rs
Additional comments: 7
pest/src/parser_state.rs (7)
- 143-161: The
ParseAttempt
enum and its implementation provide a clear distinction between rule calls and token elements, which is crucial for accurate error reporting. However, consider adding more detailed documentation for theget_rule
method to explain its return value and potential use cases in error handling.Consider enhancing the documentation for the
get_rule
method to provide more context on its usage and return value, especially in error handling scenarios.
- 163-179: The
RulesCallStack
struct is well-designed to track the sequence of rule calls leading to a parsing attempt. The structure is simple and focused, which aids in maintaining clarity in error reporting. The use ofParseAttempt
for thedeepest
field is a good choice, as it allows tracking both rule and token errors.The design and implementation of
RulesCallStack
are clear and effective for its intended purpose.
- 182-198: The
ParsingToken
enum and itsDisplay
implementation are well-thought-out, providing a flexible way to represent different types of tokens encountered during parsing. The inclusion of both sensitive and insensitive tokens, as well as ranges and built-in rules, covers a wide range of parsing scenarios. TheDisplay
implementation is concise and should aid in generating informative error messages.The
ParsingToken
enum and itsDisplay
implementation are comprehensive and well-designed, effectively supporting the generation of informative error messages.
- 266-308: The method
try_add_new_stack_rule
introduces logic to manage the call stacks based on parsing progress and the number of children rules. The approach to handling token call stacks and collapsing long stacks for rules with many failed children is thoughtful, aiming to keep error reporting manageable and focused. However, the complexity of this method warrants careful testing to ensure it behaves as expected in various parsing scenarios.Ensure comprehensive tests cover the
try_add_new_stack_rule
method, particularly focusing on scenarios with multiple children rules and token call stacks. This is crucial to verify the method's behavior in complex parsing situations.
- 310-359: The method
try_add_new_token
effectively handles the addition of new tokens to the expected or unexpected lists based on parsing progress and lookahead status. The logic for managing themax_position
and clearing previous tokens when a new maximum position is found is crucial for accurate error reporting. However, the method's complexity and critical role in error reporting suggest the need for thorough testing.It's recommended to thoroughly test the
try_add_new_token
method, especially focusing on scenarios involving lookahead and parsing progress updates. This will help ensure the method accurately tracks expected and unexpected tokens in various parsing contexts.Verification successful
The review comment accurately captures the complexity and critical role of the
try_add_new_token
method in error reporting within the parser's context. It emphasizes the importance of thorough testing, especially in scenarios involving lookahead and parsing progress updates, which is justified based on the method's implementation and its impact on the parser's behavior. Testing this method thoroughly is essential to ensure accurate tracking of expected and unexpected tokens in various parsing contexts.* 371-376: The implementation of the `Default` trait for `ParseAttempts` is straightforward and utilizes the `new` method for initialization. This ensures consistency in how `ParseAttempts` instances are created, whether through `new` or `default`.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Suggestion for a shell script is skipped in this case. Manual review and testing are recommended.Length of output: 0
The use of the
new
method within theDefault
implementation forParseAttempts
ensures consistent initialization, which is a good practice.
- 410-424: The documentation for the
parse_attempts
field inParserState
provides a clear explanation of its purpose in tracking parsing attempts and related rules at the maximum parsed position. This field is essential for the enhanced error reporting feature, allowing for detailed analysis of parsing failures.The documentation and implementation of the
parse_attempts
field inParserState
are well-executed, supporting the goal of improved error reporting.
6300615
to
a7badd7
Compare
…and (un)expected tokens) gathering
a7badd7
to
f3d1147
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (3)
- grammars/src/lib.rs (1 hunks)
- pest/src/error.rs (11 hunks)
- pest/src/parser_state.rs (14 hunks)
Files skipped from review as they are similar to previous changes (2)
- grammars/src/lib.rs
- pest/src/error.rs
Additional comments: 10
pest/src/parser_state.rs (10)
- 143-152: The
ParseAttempt
enum is well-designed to differentiate between rule calls and token elements that errored during parsing. This distinction is crucial for accurate error reporting and debugging.- 163-178: The
RulesCallStack
structure effectively captures the sequence of rule calls leading to a parsing attempt. Including both the deepest error cause and its parent rule provides a clear path for tracing parsing errors.- 182-198: The
ParsingToken
enum comprehensively represents the different types of tokens that can be encountered during parsing. The inclusion of both sensitive and insensitive tokens, as well as ranges and built-in rules, ensures flexibility in error reporting.- 201-218: The
ParseAttempts
structure is a key addition for tracking all parsing attempts made at the maximum position. This approach to error hinting, focusing on rules that succeeded the farthest, is innovative and likely to improve error diagnostics significantly.- 266-308: The method
try_add_new_stack_rule
intelligently manages the addition of new rule calls to the parsing attempts, considering the maximum position and the number of children rules. This careful handling ensures that the error reporting remains manageable and focused.- 310-359: The method
try_add_new_token
for adding new tokens to the expected or unexpected lists based on the lookahead state and position is well-implemented. It ensures that only relevant tokens are tracked, enhancing the accuracy of error messages.- 361-368: The
nullify_expected_tokens
method effectively resets the state when a new maximum position is reached during parsing. This approach ensures that only the most relevant parsing attempts are considered for error reporting.- 410-424: The addition of the
parse_attempts
field toParserState
is a significant enhancement. It allows for detailed tracking of parsing attempts, aligning with the PR's objective to improve error reporting by storing expected tokens and rules stack calls.- 626-638: The logic within the
try_add_rule_to_stack
closure effectively adds new rules to the parsing attempts stack, considering the atomicity of the parsing state. This careful consideration ensures that the error tracking mechanism is accurately maintained throughout the parsing process.- 934-953: The
handle_token_parse_result
method efficiently tracks (un)expected tokens based on the parsing result and lookahead state. This method is central to the enhanced error reporting mechanism, ensuring that relevant tokens are recorded for improved diagnostics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I left a few comments and highlighted a few nits: some small bits of code that seem not covered by tests (they don't seem critical, maybe that other case with for call_stack in group {
(lines 475-484 in error.rs) could be worth testing?) and potentially some future-proofing (but may be unnecessary).
Anyway, I think it's ok to be merged, unless you think that future-proofing would make sense (extra tests could be addressed in a different PR)
ParsingToken::Sensitive { token } => is_whitespace(token.clone()), | ||
ParsingToken::Insensitive { token } => is_whitespace(token.clone()), | ||
ParsingToken::Range { .. } => false, | ||
ParsingToken::BuiltInRule => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems this variant isn't tested
pub fn parse_attempts(&self) -> Option<ParseAttempts<R>> { | ||
self.parse_attempts.clone() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't tested, but it's a trivial method (but maybe it could be used in some assertion)
let attempts = if let Some(ref parse_attempts) = self.parse_attempts { | ||
parse_attempts.clone() | ||
} else { | ||
return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this case isn't tested, but it's not an important one
contains_meaningful_info = true; | ||
message | ||
} else { | ||
String::from("[Unknown parent rule]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untested (but likely unimportant)
} | ||
if !contains_meaningful_info { | ||
// Have to remove useless line for unknown parent rule. | ||
help_lines.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untested (but likely unimportant)
for call_stack in group { | ||
// Note that `deepest` rule may be `None`. E.g. in case it corresponds | ||
// to WHITESPACE expected token which has no parent rule (on the top level | ||
// parsing). | ||
if let Some(r) = call_stack.deepest.get_rule() { | ||
let helper_message = rule_to_message(r); | ||
if let Some(helper_message) = helper_message { | ||
help_lines.push(format!("{spacing}help: {helper_message}")); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untested?
pub fn get_rule(&self) -> Option<&R> { | ||
match self { | ||
ParseAttempt::Rule(r) => Some(r), | ||
ParseAttempt::Token => None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this variant isn't tested (but not an important one, i assume?)
Sensitive { token: String }, | ||
Insensitive { token: String }, | ||
Range { start: char, end: char }, | ||
BuiltInRule, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this record which builtin rule?
/// at the farthest input position. | ||
/// The intuition is such rules will be most likely the query user initially wanted to write. | ||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] | ||
pub struct ParseAttempts<R> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking whether any of these new structs/enum should be marked with https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute for future-proofing 🤔 parser_state is a private module, so they wouldn't be exposed, but ParserState
is re-exported, so I assume it'd also make the ones that are accessible via accessors public from the semver-perspective?
Motivation
See the issue and the comment.
Currently in case of parsing fail we sometimes get not very informative error messages (see issue comment for examples). This PR mostly introduces
parser_state.rs
anderror.rs
changes that allow to:Main changes
Currently in case parent rule contains a sequence of rules/chars/strings (sensetive and insensetive) we only track the farthest rule. Pest store it and it's position in
pos_attempts
andattempt_pos
but ignores information about successful or unsuccessful parse of strings and chars.This MR mostly introduces additional logic inside of
rule
,match_stirng
andmatch_insensetive
function calls in order to support tracking of expected tokens.Moreover logic (i'd say fake) of tracking rule calls is added. Imagine parsing have failed on position x of input string. E.g. we will have rule_1, rule_2 and rule_3 failed on the equally far position. Current MR also introduces logic of tracking most top parent of those rules. The idea is that with such (parent, deepest_failed_rule) pair we could generate more informative messages.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Enhancements
Documentation