Skip to content

Commit

Permalink
perf: improve the parser's peformance.
Browse files Browse the repository at this point in the history
Formatting error messages every time `unexpected_token_error` was called was a major bottleneck. Many of these error messages are not actually used at all, now the formatting of the error message is deferred to the moment that this error is actually needed.
  • Loading branch information
plusvic committed Jul 8, 2024
1 parent 1c38cad commit 259f1b4
Show file tree
Hide file tree
Showing 17 changed files with 86 additions and 114 deletions.
160 changes: 66 additions & 94 deletions parser-ng/src/parser/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -81,38 +81,8 @@ struct InternalParser<'src> {
/// and the "zero or more" operation (examples: `(A|B)`, `A*`).
opt_depth: usize,

/// Errors found during parsing that haven't been sent to `ready_errors`
/// yet.
///
/// When the parser expects a token, and that tokens is not the next one
/// in input, it produces an error like `expecting "foo", found "bar"`.
/// However, these errors are not sent immediately to `ready_errors`
/// stream because some the errors may occur while parsing optional code,
/// or while parsing some branch in an alternation. For instance, in the
/// grammar rule `A := (B | C)`, if the parser finds an error while parsing
/// `B`, but `C` succeeds, then `A` is successful and the error found while
/// parsing `B` is not reported.
///
/// In the other hand, if both `B` and `C` produce errors, then `A` has
/// failed, but only one of the two errors is reported. The error that
/// gets reported is the one that advanced more in the source code (i.e:
/// the one with the largest span start). This approach tends to produce
/// more meaningful errors.
///
/// The items in the vector are error messages accompanied by the span in
/// the source code where the error occurred.
pending_errors: Vec<(String, Span)>,

/// Errors go from `pending_errors` to `ready_errors` before they are
/// finally pushed to the `output` stream. This extra step has the purpose
/// of removing duplicate messages for the same code span. In certain cases
/// the parser can produce two different error messages for the same span,
/// but this map guarantees that only the first error is taken into account
/// and that any further error for the same span is ignored.
ready_errors: IndexMap<Span, String>,

/// Hash map where keys are positions within the source code, and values
/// are a list of tokens that were expected to match at that position.
/// Hash map where keys are spans within the source code, and values
/// are a list of tokens that were expected to match at that span.
///
/// This hash map plays a crucial role in error reporting during parsing.
/// Consider the following grammar rule:
Expand All @@ -139,14 +109,22 @@ struct InternalParser<'src> {
/// that both `a` and `b` are valid tokens at the position where `c` was
/// found?
///
/// This is where the `expected_tokens` hash map comes into play. We know
/// that `a` is also a valid alternative because the `expect(a)` inside the
/// `opt` was tried and failed. The parser doesn't fail at that point
/// because `a` is optional, but it records that `a` was expected at the
/// position of `c`. When `expect(b)` fails later, the parser looks up
/// This is where the `expected_token_errors` hash map comes into play. We
/// know that `a` is also a valid alternative because the `expect(a)`
/// inside the `opt` was tried and failed. The parser doesn't fail at that
/// point because `a` is optional, but it records that `a` was expected at
/// the position of `c`. When `expect(b)` fails later, the parser looks up
/// any other token (besides `b`) that were expected to match at the
/// position and produces a comprehensive error message.
expected_tokens: HashMap<usize, IndexSet<&'static str>>,
expected_token_errors: HashMap<Span, IndexSet<&'static str>>,

/// Errors that are not yet sent to the `output` stream. The purpose of
/// this map is removing duplicate messages for the same code span. In
/// certain cases the parser can produce two different error messages for
/// the same span, but this map guarantees that only the first error is
/// taken into account and that any further error for the same span is
/// ignored.
pending_errors: IndexMap<Span, String>,
}

impl<'src> From<Tokenizer<'src>> for InternalParser<'src> {
Expand All @@ -155,9 +133,8 @@ impl<'src> From<Tokenizer<'src>> for InternalParser<'src> {
Self {
tokens: TokenStream::new(tokenizer),
output: SyntaxStream::new(),
pending_errors: Vec::new(),
ready_errors: IndexMap::new(),
expected_tokens: HashMap::new(),
pending_errors: IndexMap::new(),
expected_token_errors: HashMap::new(),
opt_depth: 0,
failed: false,
}
Expand Down Expand Up @@ -323,7 +300,7 @@ impl<'src> InternalParser<'src> {
self
}

fn sync(&mut self, recovery_set: &TokenSet) -> &mut Self {
fn sync(&mut self, recovery_set: &'static TokenSet) -> &mut Self {
self.trivia();
match self.peek() {
None => return self,
Expand All @@ -332,13 +309,7 @@ impl<'src> InternalParser<'src> {
return self;
} else {
let span = token.span();
let token_str = token.description();
self.unexpected_token_error(
token_str,
span,
recovery_set,
None,
);
self.unexpected_token_error(span, recovery_set, None);
}
}
}
Expand Down Expand Up @@ -419,7 +390,10 @@ impl<'src> InternalParser<'src> {
/// ```
///
/// Notice how the error is now more localized.
fn recover_and_sync(&mut self, recovery_set: &TokenSet) -> &mut Self {
fn recover_and_sync(
&mut self,
recovery_set: &'static TokenSet,
) -> &mut Self {
self.recover();
/*if let Some(t) = self.peek_non_ws() {
if recovery_set.contains(t) {
Expand Down Expand Up @@ -460,15 +434,15 @@ impl<'src> InternalParser<'src> {
/// # Panics
///
/// If `expected_tokens` is empty.
fn expect(&mut self, expected_tokens: &TokenSet) -> &mut Self {
fn expect(&mut self, expected_tokens: &'static TokenSet) -> &mut Self {
self.expect_d(expected_tokens, None)
}

/// Like [`InternalParser::expect`], but allows specifying a custom
/// description for the expected tokens.
fn expect_d(
&mut self,
expected_tokens: &TokenSet,
expected_tokens: &'static TokenSet,
description: Option<&'static str>,
) -> &mut Self {
assert!(!expected_tokens.is_empty());
Expand All @@ -483,9 +457,7 @@ impl<'src> InternalParser<'src> {
let t = expected_tokens.contains(token);
if t.is_none() {
let span = token.span();
let token_str = token.description();
self.unexpected_token_error(
token_str,
span,
expected_tokens,
description,
Expand Down Expand Up @@ -567,7 +539,7 @@ impl<'src> InternalParser<'src> {
}

/// Like [`InternalParser::expect`], but optional.
fn opt_expect(&mut self, expected_tokens: &TokenSet) -> &mut Self {
fn opt_expect(&mut self, expected_tokens: &'static TokenSet) -> &mut Self {
self.opt(|p| p.expect(expected_tokens))
}

Expand All @@ -593,7 +565,7 @@ impl<'src> InternalParser<'src> {
///
fn if_found<P>(
&mut self,
expected_tokens: &TokenSet,
expected_tokens: &'static TokenSet,
parser: P,
) -> &mut Self
where
Expand All @@ -610,11 +582,14 @@ impl<'src> InternalParser<'src> {
parser(self);
} else {
let span = token.span();
let tokens =
self.expected_tokens.entry(span.start()).or_default();
tokens.extend(
expected_tokens.token_ids().map(|t| t.description()),
);
self.expected_token_errors
.entry(span)
.or_default()
.extend(
expected_tokens
.token_ids()
.map(|t| t.description()),
);
}
}
}
Expand Down Expand Up @@ -689,59 +664,56 @@ impl<'src> InternalParser<'src> {
}

fn flush_errors(&mut self) {
self.expected_tokens.clear();
self.pending_errors.clear();
for (span, error) in self.ready_errors.drain(0..) {
self.expected_token_errors.clear();
for (span, error) in self.pending_errors.drain(0..) {
self.output.push_error(error, span);
}
}

fn unexpected_token_error(
&mut self,
token_str: &str,
span: Span,
expected_tokens: &TokenSet,
expected_tokens: &'static TokenSet,
description: Option<&'static str>,
) {
let tokens = self.expected_tokens.entry(span.start()).or_default();
let tokens =
self.expected_token_errors.entry(span.clone()).or_default();

if let Some(description) = description {
tokens.insert(description);
} else {
tokens
.extend(expected_tokens.token_ids().map(|t| t.description()));
tokens.extend(
expected_tokens.token_ids().map(|token| token.description()),
);
}

let (last, all_except_last) = tokens.as_slice().split_last().unwrap();

let error_msg = if all_except_last.is_empty() {
format!("expecting {last}, found {}", token_str)
} else {
format!(
"expecting {} or {last}, found {}",
itertools::join(all_except_last.iter(), ", "),
token_str,
)
};

self.pending_errors.push((error_msg, span));

if self.opt_depth == 0 {
// Find the pending error starting at the largest offset. If several
// errors start at the same offset, the last one is used (this is
// guaranteed by the `max_by_key` function). `self.pending_errors`
// is left empty.
if let Some((error, span)) = self
.pending_errors
.drain(0..)
.max_by_key(|(_, span)| span.start())
// From all the unexpected token errors, use the one at the largest
// offset. If several errors start at the same offset, the last one
// is used. `self.expected_tokens` is left empty.
if let Some((span, tokens)) = self
.expected_token_errors
.drain()
.max_by_key(|(span, _)| span.start())
{
match self.ready_errors.entry(span) {
match self.pending_errors.entry(span) {
Entry::Occupied(_) => {
// already present, don't replace.
}
Entry::Vacant(v) => {
v.insert(error);
Entry::Vacant(entry) => {
let (last, all_except_last) =
tokens.as_slice().split_last().unwrap();

let error_msg = if all_except_last.is_empty() {
format!("expecting {last}")
} else {
format!(
"expecting {} or {last}",
itertools::join(all_except_last.iter(), ", "),
)
};

entry.insert(error_msg);
}
}
}
Expand Down
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/basic-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [11..12]: expecting `meta`, `strings` or `condition`, found `{`
- [11..12]: expecting `meta`, `strings` or `condition`
4 changes: 2 additions & 2 deletions parser-ng/src/parser/tests/testdata/hex-patterns-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -79,5 +79,5 @@ [email protected]
[email protected] "}"

ERRORS:
- [40..41]: expecting BYTE or `(`, found `)`
- [100..101]: expecting BYTE or `(`, found `}`
- [40..41]: expecting BYTE or `(`
- [100..101]: expecting BYTE or `(`
4 changes: 2 additions & 2 deletions parser-ng/src/parser/tests/testdata/hex-patterns-error-2.out
Original file line number Diff line number Diff line change
Expand Up @@ -73,5 +73,5 @@ [email protected]
[email protected] "}"

ERRORS:
- [33..34]: expecting `[`, BYTE, `(` or `}`, found INTEGER
- [93..94]: expecting `[`, BYTE, `(` or `}`, found identifier
- [33..34]: expecting `[`, BYTE, `(` or `}`
- [93..94]: expecting `[`, BYTE, `(` or `}`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/meta-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 +23,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [14..17]: expecting `meta`, `strings` or `condition`, found identifier
- [14..17]: expecting `meta`, `strings` or `condition`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/meta-error-2.out
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [30..39]: expecting `=`, found `condition`
- [30..39]: expecting `=`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/meta-error-3.out
Original file line number Diff line number Diff line change
Expand Up @@ -30,4 +30,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [32..41]: expecting `true`, `false`, INTEGER, FLOAT or STRING, found `condition`
- [32..41]: expecting `true`, `false`, INTEGER, FLOAT or STRING
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/meta-error-4.out
Original file line number Diff line number Diff line change
Expand Up @@ -40,4 +40,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [30..34]: expecting `true`, `false`, INTEGER, FLOAT or STRING, found `meta`
- [30..34]: expecting `true`, `false`, INTEGER, FLOAT or STRING
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/meta-error-5.out
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [30..33]: expecting `true`, `false`, INTEGER, FLOAT or STRING, found identifier
- [30..33]: expecting `true`, `false`, INTEGER, FLOAT or STRING
4 changes: 2 additions & 2 deletions parser-ng/src/parser/tests/testdata/meta-error-6.out
Original file line number Diff line number Diff line change
Expand Up @@ -53,5 +53,5 @@ [email protected]
[email protected] "}"

ERRORS:
- [32..39]: expecting `true`, `false`, INTEGER, FLOAT or STRING, found `strings`
- [44..47]: expecting `:`, found identifier
- [32..39]: expecting `true`, `false`, INTEGER, FLOAT or STRING
- [44..47]: expecting `:`
4 changes: 2 additions & 2 deletions parser-ng/src/parser/tests/testdata/pattern-mods-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -75,5 +75,5 @@ [email protected]
[email protected] "}"

ERRORS:
- [36..39]: expecting pattern modifier, pattern identifier or `condition`, found identifier
- [107..108]: expecting STRING, found `)`
- [36..39]: expecting pattern modifier, pattern identifier or `condition`
- [107..108]: expecting STRING
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/patterns-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -35,4 +35,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [37..46]: expecting `[`, BYTE, `(` or `}`, found `condition`
- [37..46]: expecting `[`, BYTE, `(` or `}`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/patterns-error-2.out
Original file line number Diff line number Diff line change
Expand Up @@ -42,4 +42,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [50..59]: expecting `[`, BYTE, `(` or `}`, found `condition`
- [50..59]: expecting `[`, BYTE, `(` or `}`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/patterns-error-3.out
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [47..50]: expecting STRING, regexp or `{`, found identifier
- [47..50]: expecting STRING, regexp or `{`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/rule-tags-error-1.out
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [12..13]: expecting identifier, found `{`
- [12..13]: expecting identifier
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/rule-tags-error-2.out
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [10..11]: expecting `:` or `{`, found `=`
- [10..11]: expecting `:` or `{`
2 changes: 1 addition & 1 deletion parser-ng/src/parser/tests/testdata/rule-tags-error-3.out
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,4 @@ [email protected]
[email protected] "}"

ERRORS:
- [12..13]: expecting identifier, found `=`
- [12..13]: expecting identifier

0 comments on commit 259f1b4

Please sign in to comment.