Speed up codespell:ignore check by skipping the regex in most cases
The changes to provide a public API came at a performance cost
of about 1% of runtime. There is no trivial way to reduce this any
further without undermining the API we are building. However, we can
pull performance-related shenanigans to compensate for the cost
introduced.

The codespell codebase unsurprisingly spends the vast majority of its
runtime in various regex-related code such as `search` and `finditer`.

The best way to optimize runtime spent in regexes is to not run a regex
in the first place, since the regex engine has a rather steep overhead
over regular string primitives (that is the cost of flexibility). If
the regex rarely matches and there is a very easy static substring
that can be used to rule out the match, then you can speed up the code
by using `substring in string` as a conditional to skip the
regex. This is assuming the regex is used enough for the performance
to matter.
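
A rough way to check whether such a guard pays off is to time both
variants on input that mostly does not match. The following is a
minimal sketch and not part of this commit: the tag, the pattern, the
function names, and the sample lines are all hypothetical, and the
timings will vary with the machine and the input.

```python
import re
import timeit

# Hypothetical tag and pattern, chosen only for illustration.
TAG = "marker:skip"
PATTERN = re.compile(rf"#\s?{TAG}\b")


def has_marker_regex_only(line: str) -> bool:
    """Always run the regex."""
    return PATTERN.search(line) is not None


def has_marker_guarded(line: str) -> bool:
    """Rule out most lines with a cheap substring test before the regex."""
    if TAG not in line:
        return False
    return PATTERN.search(line) is not None


# Made-up sample input: mostly lines that cannot match.
lines = ["x = compute(a, b)  # ordinary comment"] * 1000 + ["y = 1  # marker:skip"]

# Both variants must agree on every line; only the cost differs.
assert [has_marker_regex_only(l) for l in lines] == [has_marker_guarded(l) for l in lines]

if __name__ == "__main__":
    for fn in (has_marker_regex_only, has_marker_guarded):
        seconds = timeit.timeit(lambda: [fn(l) for l in lines], number=200)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

If the regex matched on most lines, the guard would add a second scan
of each line without saving anything, which is why the rarity of
matches matters.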

An obvious choice here falls on the `codespell:ignore` regex, because
it has a very distinctive substring in the form of `codespell:ignore`,
whose absence rules out almost all lines that will not match.
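
The guard is safe because the tag appears literally in the pattern, so
any line the regex matches necessarily contains it. A small sketch of
that check, using the tag and pattern from the diff; the helper
`search_guarded` and the sample lines are made up for illustration, and
the helper only returns the raw match rather than reproducing the
guarded method:

```python
import re

# Tag and pattern as they appear in the diff.
_codespell_ignore_tag = "codespell:ignore"
_inline_ignore_regex = re.compile(
    rf"[^\w\s]\s?{_codespell_ignore_tag}\b(\s+(?P<words>[\w,]*))?"
)


def search_guarded(line: str):
    """Hypothetical helper: skip the regex when the tag is absent."""
    if _codespell_ignore_tag not in line:
        return None
    return _inline_ignore_regex.search(line)


# Made-up sample lines covering matches and near misses.
samples = [
    "x = 1",
    "teh = 1  # codespell:ignore",
    "teh = 1  # codespell:ignore teh,abandonned",
    "codespell:ignore at line start, no comment char",
    "# codespell:ignored",
]

for line in samples:
    plain = _inline_ignore_regex.search(line)
    guarded = search_guarded(line)
    # The guard never changes the outcome; it only avoids the regex call.
    assert (plain is None) == (guarded is None)
    if plain is not None:
        assert plain.group("words") == guarded.group("words")
```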

With this little trick, runtime goes from ~5.6s to ~4.9s on the corpus
mentioned in codespell-project#3419.
nthykier committed May 17, 2024
1 parent 8aa4077 commit af32200
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion codespell_lib/spellchecker.py
@@ -108,7 +108,8 @@

_builtin_default_as_tuple = tuple(_builtin_default.split(","))

-_inline_ignore_regex = re.compile(r"[^\w\s]\s?codespell:ignore\b(\s+(?P<words>[\w,]*))?")
+_codespell_ignore_tag = 'codespell:ignore'
+_inline_ignore_regex = re.compile(fr"[^\w\s]\s?{_codespell_ignore_tag}\b(\s+(?P<words>[\w,]*))?")


class LineTokenizer(Protocol[T]):
@@ -188,6 +189,8 @@ def __init__(
self.load_builtin_dictionaries(builtin_dictionaries)

     def _parse_inline_ignore(self, line: str) -> Optional[FrozenSet[str]]:
+        if _codespell_ignore_tag not in line:
+            return frozenset()
         inline_ignore_match = _inline_ignore_regex.search(line)
         if inline_ignore_match:
             words = frozenset(