Performance improvements #7

Open: benbarbersmith wants to merge 3 commits into main

benbarbersmith
Contributor

A bundle of a few minor performance improvements:

  • Improves the performance of the `isProfane` method by returning true after any bad word is found, rather than matching all bad words in the string.
  • Improves performance of `clean` by 56x by replacing use of `e.replaceAll(RegExp('.'), '*')` with `''.padLeft(e.length, '*')`.
  • Uses a set literal rather than a list literal for the bad words.

See commit messages for more detail.

Improves the performance of the isProfane method by returning true after any bad word is found, rather than matching all bad words in the string.
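
A minimal sketch of the early-return idea, not the package's actual code: the word list, the whitespace splitting, and the lower-casing are all assumptions made for illustration. `Iterable.any` stops iterating at the first match, which is the behaviour the commit describes.

```dart
// Hypothetical word list; the package's real list is much longer.
const badWords = {'darn', 'heck'};

/// Returns true as soon as one bad word is found, without
/// checking the remaining words in [input].
bool isProfane(String input) {
  // Assumption: words are separated by whitespace.
  final words = input.toLowerCase().split(RegExp(r'\s+'));
  // `any` short-circuits on the first element that matches.
  return words.any(badWords.contains);
}

void main() {
  print(isProfane('well heck that hurt')); // true
  print(isProfane('hello world')); // false
}
```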
Improves performance of clean by replacing use of:

`e.replaceAll(RegExp('.'), '*')`

with:

`''.padLeft(e.length, '*')`
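
As an illustration of why the two expressions are interchangeable for single words, here is a small sketch. The function names `maskWithRegExp` and `maskWithPadLeft` are hypothetical and only wrap the two expressions above. One caveat worth keeping in mind: `.` in a Dart `RegExp` does not match line terminators by default, so the two differ for input that contains a newline.

```dart
/// Masks [e] by replacing every character the pattern `.` matches.
String maskWithRegExp(String e) => e.replaceAll(RegExp('.'), '*');

/// Masks [e] by building a string of `*` with the same length.
String maskWithPadLeft(String e) => ''.padLeft(e.length, '*');

void main() {
  print(maskWithRegExp('example')); // *******
  print(maskWithPadLeft('example')); // *******

  // The results differ only when the input contains a line terminator:
  // the RegExp version leaves the newline in place, padLeft masks it.
  print(maskWithRegExp('a\nb') == maskWithPadLeft('a\nb')); // false
}
```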

I benchmarked a few different methods before settling on this implementation:

1. `padLeft`, i.e. `''.padLeft(e.length, '*')`
2. Dart's built-in string multiplication, i.e. `'*' * e.length`
3. The existing implementation

For the benchmark, I masked a string of 10,000,000 characters. The results were conclusive:

```
$ dart bad_words/benchmark/filter_benchmark.dart
PadLeft(RunTime): 131323.25 us.
StrMultiply(RunTime): 306138.85714285716 us.
RegExp(RunTime): 7393837.0 us.
```

The `padLeft` method ran around 56x faster than the previous `RegExp`-based implementation (about 131 ms versus 7.39 s) and roughly 2.3x faster than string multiplication.
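
For readers who want to try a comparison like this themselves, the following is a rough sketch using `Stopwatch` from `dart:core`. It is not the benchmark file from this PR. The helper `timeMicros` is hypothetical, the input is smaller than the one above, and each case runs only once, so the numbers will be noisy and will vary by machine.

```dart
/// Runs [body] once and returns the elapsed time in microseconds.
int timeMicros(void Function() body) {
  final watch = Stopwatch()..start();
  body();
  watch.stop();
  return watch.elapsedMicroseconds;
}

void main() {
  // Assumption: a 1,000,000-character input keeps the run short.
  final input = 'a' * 1000000;

  final results = <String, int>{
    'PadLeft': timeMicros(() => ''.padLeft(input.length, '*')),
    'StrMultiply': timeMicros(() => '*' * input.length),
    'RegExp': timeMicros(() => input.replaceAll(RegExp('.'), '*')),
  };

  results.forEach((name, micros) => print('$name: $micros us'));
}
```

A single timed run is only a sanity check; repeated runs with a warm-up phase would give more trustworthy figures.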
Improve performance by using a set literal rather than list literal for the bad words. Making this change also required removal of some duplicate values.
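
A short sketch of the difference between the two literals, using made-up entries rather than the package's real list. `List.contains` scans the elements one by one, while the default `Set` is hash-based, so membership checks do not slow down as the list of words grows. A list also accepts repeated entries silently, which is why the duplicates only surfaced after the conversion.

```dart
// Hypothetical entries for illustration only.
const wordList = ['darn', 'heck', 'darn']; // a list allows duplicates
const wordSet = {'darn', 'heck'}; // a set holds each entry once

void main() {
  print(wordList.contains('heck')); // true, found by linear scan
  print(wordSet.contains('heck')); // true, found by hash lookup

  // Converting the list to a set drops the repeated entry.
  print(wordList.length); // 3
  print(wordList.toSet().length); // 2
}
```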
@TylerSustare
Owner

@benbarbersmith sorry for being MIA. I really like these improvements, and I'm strongly inclined to merge.
The only question I have is: why are some of the "bad word" entries deleted as part of the PR?

@benbarbersmith
Contributor Author

Several of the words in the list were repeated.

(Once I converted the list to a set literal, dartanalyzer showed linter warnings that several entries in the set were not unique. I removed the duplicates.)
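
For anyone who wants to confirm which entries were repeated without relying on the analyzer, a sketch along these lines would work. The helper `findDuplicates` is hypothetical and not part of the package; it relies on `Set.add` returning `false` when the value is already present.

```dart
/// Returns the entries that occur more than once in [words].
Set<String> findDuplicates(Iterable<String> words) {
  final seen = <String>{};
  final duplicates = <String>{};
  for (final word in words) {
    // `add` returns false if `word` was already in `seen`.
    if (!seen.add(word)) {
      duplicates.add(word);
    }
  }
  return duplicates;
}

void main() {
  // Made-up entries standing in for the real word list.
  print(findDuplicates(['darn', 'heck', 'darn'])); // {darn}
}
```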
