docs: document fields in InternalParser
plusvic committed Jul 3, 2024
1 parent ff67a3b commit bbefd3c
Showing 1 changed file with 73 additions and 2 deletions.
75 changes: 73 additions & 2 deletions parser-ng/src/parser/mod.rs
@@ -61,12 +61,83 @@ impl<'src> Parser<'src> {
/// Internal implementation of the parser. The [`Parser`] type is only a
/// wrapper around this type.
struct InternalParser<'src> {
/// Stream from which the parser consumes the input tokens.
tokens: TokenStream<'src>,

/// Stream where the parser puts the events that make up the resulting CST.
output: SyntaxStream,

/// If true, the parser is in the "failure" state. The parser enters the "failure"
/// state when some syntax rule expects a token that doesn't match the
/// next token in the input.
failed: bool,

/// How deep the parser is into "optional" branches of the grammar. An
/// optional branch is one that can fail without the whole production
/// rule failing. For instance, in `A := B? C` the parser can fail while
/// parsing `B`, but this failure is acceptable because `B` is optional.
/// Less obvious cases of optional branches are present in alternatives
/// and the "zero or more" operation (examples: `(A|B)`, `A*`).
opt_depth: usize,

/// Errors found during parsing that haven't been sent to the `output`
/// stream yet.
///
/// When the parser expects a token, and that token is not the next one
/// in the input, it produces an error like `expecting "foo", found "bar"`.
/// However, these errors are not sent immediately to the `output` stream
/// because some of the errors may occur while parsing optional code, or while
/// parsing some branch in an alternation. For instance, in the grammar
/// rule `A := (B | C)`, if the parser finds an error while parsing `B`,
/// but `C` succeeds, then `A` is successful and the error found while
/// parsing `B` is not reported.
///
/// On the other hand, if both `B` and `C` produce errors, then `A` has
/// failed, but only one of the two errors is reported. The error that
/// gets reported is the one that advanced furthest in the source code
/// (i.e., the one with the largest span start). This approach tends to
/// produce more meaningful errors.
///
/// The items in the vector are error messages accompanied by the span in
/// the source code where the error occurred.
pending_errors: Vec<(String, Span)>,

/// Hash map where keys are positions within the source code, and values
/// are a list of tokens that were expected to match at that position.
///
/// This hash map plays a crucial role in error reporting during parsing.
/// Consider the following grammar rule:
///
/// `A := a? b`
///
/// Here, the optional token `a` must be followed by the token `b`. This
/// can be represented (conceptually, not actual code) as:
///
/// ```text
/// self.start(A)
/// .opt(|p| p.expect(a))
/// .expect(b)
/// .end()
/// ```
///
/// If we attempt to parse the sequence `cb`, it will fail at `c` because
/// the rule matches only `ab` and `b`. The error message should be:
///
/// "expecting `a` or `b`, found `c`"
///
/// This error is generated by the `expect(b)` statement. However, the
/// `expect` function only knows about the `b` token. So, how do we know
/// that both `a` and `b` are valid tokens at the position where `c` was
/// found?
///
/// This is where the `expected_tokens` hash map comes into play. We know
/// that `a` is also a valid alternative because the `expect(a)` inside the
/// `opt` was tried and failed. The parser doesn't fail at that point
/// because `a` is optional, but it records that `a` was expected at the
/// position of `c`. When `expect(b)` fails later, the parser looks up
/// any other tokens (besides `b`) that were expected to match at that
/// position and produces a comprehensive error message.
expected_tokens: HashMap<usize, Vec<&'static str>>,
}
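
The selection rule described for `pending_errors` (report only the error with the largest span start) can be sketched in isolation. This is a minimal illustration and not the parser's actual code: the `Span` struct and the `take_furthest_error` helper are hypothetical stand-ins, and the tie-breaking behavior when two errors start at the same offset is an assumption the doc comment does not specify.

```rust
/// Hypothetical stand-in for the parser's `Span` type: a byte range in the
/// source code. The real type may look different.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Empties `pending` and returns the error whose span starts furthest into
/// the source code, or `None` if there were no pending errors.
///
/// Assumption: on ties, `max_by_key` keeps the last of the maximal items.
fn take_furthest_error(pending: &mut Vec<(String, Span)>) -> Option<(String, Span)> {
    pending.drain(..).max_by_key(|(_, span)| span.start)
}

fn main() {
    // For `A := (B | C)`: both branches failed, `B` at offset 4 and `C` at
    // offset 9. Only the error from `C` should be reported.
    let mut pending = vec![
        ("expecting `foo`, found `bar`".to_string(), Span { start: 4, end: 7 }),
        ("expecting `)`, found `;`".to_string(), Span { start: 9, end: 10 }),
    ];

    let reported = take_furthest_error(&mut pending);

    assert_eq!(
        reported,
        Some(("expecting `)`, found `;`".to_string(), Span { start: 9, end: 10 }))
    );
    // All pending errors were consumed, including the one that was dropped.
    assert!(pending.is_empty());
}
```

Draining the vector reflects the idea that once one error is reported for a failed rule, the remaining candidates are discarded; whether the real parser clears `pending_errors` at this point is also an assumption.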

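The interplay between `opt_depth`, `failed` and `expected_tokens` in the `A := a? b` example can be made concrete with a small model. The sketch below is an assumption-laden illustration, not the real `InternalParser`: `MiniParser` is a hypothetical type, tokens are plain string slices, the map is keyed by token index instead of source position, and errors are pushed directly to a vector instead of going through `pending_errors`.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified model of the parser state.
struct MiniParser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
    failed: bool,
    opt_depth: usize,
    expected_tokens: HashMap<usize, Vec<&'static str>>,
    errors: Vec<String>,
}

impl<'a> MiniParser<'a> {
    fn new(tokens: Vec<&'a str>) -> Self {
        Self {
            tokens,
            pos: 0,
            failed: false,
            opt_depth: 0,
            expected_tokens: HashMap::new(),
            errors: Vec::new(),
        }
    }

    /// Consumes the next token if it equals `tok`. Otherwise records `tok`
    /// as expected at the current position and enters the failure state.
    /// An error is produced only outside of optional branches.
    fn expect(&mut self, tok: &'static str) -> &mut Self {
        if self.failed {
            return self;
        }
        let found = self.tokens.get(self.pos).copied();
        if found == Some(tok) {
            self.pos += 1;
            return self;
        }
        let expected = self.expected_tokens.entry(self.pos).or_default();
        if !expected.contains(&tok) {
            expected.push(tok);
        }
        // Every token tried at this position so far, including those tried
        // inside optional branches that failed silently.
        let list = expected
            .iter()
            .map(|t| format!("`{t}`"))
            .collect::<Vec<_>>()
            .join(" or ");
        self.failed = true;
        if self.opt_depth == 0 {
            let found = found.unwrap_or("end of input");
            self.errors.push(format!("expecting {list}, found `{found}`"));
        }
        self
    }

    /// Runs `f` as an optional branch: a failure inside it is forgiven and
    /// the parser rewinds to where the branch started.
    fn opt(&mut self, f: impl FnOnce(&mut Self) -> &mut Self) -> &mut Self {
        if self.failed {
            return self;
        }
        let start = self.pos;
        self.opt_depth += 1;
        f(&mut *self);
        self.opt_depth -= 1;
        if self.failed {
            self.failed = false;
            self.pos = start;
        }
        self
    }
}

fn main() {
    // Parse `cb` with the rule `A := a? b`.
    let mut p = MiniParser::new(vec!["c", "b"]);
    p.opt(|p| p.expect("a")).expect("b");

    assert!(p.failed);
    assert_eq!(p.errors, vec!["expecting `a` or `b`, found `c`"]);
}
```

The failed `expect("a")` inside `opt` leaves no error behind, but its entry in `expected_tokens` survives the rewind, which is what lets the later `expect("b")` mention both tokens.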
impl<'src> From<Tokenizer<'src>> for InternalParser<'src> {