Consider using ahocorasick-rs instead of pyahocorasick #9825

Closed
nijel opened this issue Aug 29, 2023 · 3 comments · Fixed by #9861
@nijel
Member

nijel commented Aug 29, 2023

Describe the problem

https://pypi.org/project/ahocorasick-rs/ seems to be a faster alternative to pyahocorasick.

Describe the solution you'd like

It would be useful to benchmark it against the Weblate use case and switch to it if it outperforms pyahocorasick.

Describe alternatives you've considered

No response

Screenshots

No response

Additional context

The ahocorasick-rs README states: "That being said, I've seen ahocorasick_rs run 1.5× to 7× as fast as pyahocorasick, depending on the options used."
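
Since the quoted 1.5× to 7× range depends on options and workload, the ratio has to be measured on the actual data. A minimal stdlib-only sketch of how such a ratio could be computed; `speedup`, `slow`, and `fast` are hypothetical names used for illustration, not part of Weblate or either library:

```python
import timeit


def speedup(baseline, candidate, number=1000, repeat=5):
    """Return how many times faster `candidate` runs than `baseline`.

    Both arguments are zero-argument callables. The minimum over several
    repeats is used because it is the measurement least disturbed by
    other activity on the machine.
    """
    base = min(timeit.repeat(baseline, number=number, repeat=repeat))
    cand = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return base / cand


# Hypothetical stand-ins for the two search calls being compared.
def slow():
    return sum(range(10_000))


def fast():
    return sum(range(100))


ratio = speedup(slow, fast)
print(f"candidate is {ratio:.1f}x as fast as baseline")
```

In a real comparison the two callables would wrap the pyahocorasick and ahocorasick_rs lookups on the same input string.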

@nijel nijel added hacktoberfest This is suitable for Hacktoberfest. Don’t try to spam. help wanted Extra attention is needed. good first issue Opportunity for newcoming contributors. labels Aug 29, 2023
@github-actions

This issue seems to be a good fit for new contributors. You are welcome to contribute to Weblate! Don't hesitate to ask any questions you have while implementing this.

You can learn about how to get started in our contributors documentation.

@nijel nijel added this to the 5.1 milestone Aug 29, 2023
@nijel
Member Author

nijel commented Aug 29, 2023

I ran a very basic benchmark, and switching seems reasonable:

from weblate.checks.data import IGNORE_WORDS
import ahocorasick
import ahocorasick_rs
import timeit


def build_py():
    automaton = ahocorasick.Automaton()
    for term in IGNORE_WORDS:
        automaton.add_word(term, term)
    automaton.make_automaton()
    return automaton


def build_rs():
    return ahocorasick_rs.AhoCorasick(
        IGNORE_WORDS,
        implementation=ahocorasick_rs.Implementation.ContiguousNFA,
        store_patterns=False,
    )


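print("Build")
# The statement must call the builder ("build_py()"); the bare name "build_py"
# would only time a name lookup. number=100 keeps the run short, because
# timeit defaults to 1,000,000 iterations.
print(timeit.timeit("build_py()", globals={"build_py": build_py}, number=100))
print(timeit.timeit("build_rs()", globals={"build_rs": build_rs}, number=100))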

ac_py = build_py()
ac_rs = build_rs()

print("Find")
print(
    timeit.timeit(
        "list(ac_py.iter('Please enter the correct username and password.'))",
        globals={"ac_py": ac_py},
    )
)

print(
    timeit.timeit(
        "ac_rs.find_matches_as_indexes('Please enter the correct username and password.')",
        globals={"ac_rs": ac_rs},
    )
)
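
One thing worth checking in a timeit-based benchmark like this one is whether the statement string actually calls the function under test: a bare name is only looked up, while a name followed by parentheses is executed. A minimal stdlib-only sketch of the difference; `build` is a hypothetical stand-in for the automaton builders:

```python
import timeit

calls = 0


def build():
    """Hypothetical stand-in for an expensive builder such as build_py()."""
    global calls
    calls += 1
    return sorted(range(1000))


# Bare name: the statement evaluates the name and never runs the function.
lookup_time = timeit.timeit("build", globals={"build": build}, number=1000)
calls_after_lookup = calls

# With parentheses: the function runs once per iteration.
call_time = timeit.timeit("build()", globals={"build": build}, number=1000)
calls_after_call = calls

print("calls after timing 'build':  ", calls_after_lookup)
print("calls after timing 'build()':", calls_after_call)
```

Counting invocations this way is a cheap sanity check that the number reported for a build step reflects real work.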

nijel added a commit to nijel/weblate that referenced this issue Sep 5, 2023
@nijel nijel self-assigned this Sep 5, 2023
@nijel nijel modified the milestones: 5.1, 5.0.1 Sep 5, 2023
@nijel nijel added the enhancement Adding or requesting a new feature. label Sep 5, 2023
nijel added a commit to nijel/weblate that referenced this issue Sep 5, 2023
nijel added a commit that referenced this issue Sep 5, 2023
This delivers a better performance.

Fixes #9825
@github-actions

github-actions bot commented Sep 5, 2023

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
