search: introduce a new hyperscan-backed searchdef type: HyperscanSearchDef #6

Draft
wants to merge 2 commits into
base: main
from


mustafakemalgilor
Contributor

Searchkit currently uses Python's re module, which is not known for "blow your socks off" pattern-scanning performance, so there is an opportunity for optimization by simply swapping the regex engine.

Hyperscan is a highly optimized, performant regex engine that is typically used in high-throughput network packet inspection systems (e.g. DPI, IDS/IPS systems) for pattern recognition. The work that searchkit does aligns well with hyperscan's strengths, so it would be beneficial for searchkit to allow downstream users to leverage hyperscan, especially for searching large files.

This patch introduces a hyperscan-backed SearchDef type which can be used as a drop-in replacement for the existing SearchDef type. The patch also adds hyperscan as a dependency and moves searchkit tests to a base class so the tests can be used for testing both SearchDef and HyperscanSearchDef at the same time.
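For readers unfamiliar with the shared-test-base pattern mentioned above, here is a minimal sketch of how one set of test cases can be run against two interchangeable backends. It is an illustration only, not the code in this patch: `ReSearchDef`, `FakeHyperscanSearchDef`, `SearchDefTestsBase`, the `run` method, and `run_shared_tests` are all hypothetical names, and the "hyperscan" class is a placeholder that reuses `re` so the sketch runs without the hyperscan dependency.

```python
import re
import unittest


class ReSearchDef:
    """Hypothetical stand-in for an re-backed search definition."""

    def __init__(self, pattern):
        self._regex = re.compile(pattern)

    def run(self, line):
        """Return the matched text, or None if the line does not match."""
        match = self._regex.search(line)
        return match.group(0) if match else None


class FakeHyperscanSearchDef(ReSearchDef):
    """Placeholder for a hyperscan-backed type (assumption: same interface).

    A real implementation would delegate matching to hyperscan; this one
    inherits the re-based behaviour so the example stays self-contained.
    """


class SearchDefTestsBase:
    """Shared test cases.

    Deliberately not a TestCase subclass, so test runners do not collect
    it on its own; concrete subclasses set `searchdef_cls`.
    """

    searchdef_cls = None

    def test_match(self):
        sd = self.searchdef_cls(r"error \d+")
        self.assertEqual(sd.run("got error 42 here"), "error 42")

    def test_no_match(self):
        sd = self.searchdef_cls(r"error \d+")
        self.assertIsNone(sd.run("all good"))


class TestReSearchDef(SearchDefTestsBase, unittest.TestCase):
    searchdef_cls = ReSearchDef


class TestFakeHyperscanSearchDef(SearchDefTestsBase, unittest.TestCase):
    searchdef_cls = FakeHyperscanSearchDef


def run_shared_tests():
    """Run the shared cases against both backends and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestReSearchDef, TestFakeHyperscanSearchDef):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each concrete test class only picks the backend, so any behavioural difference between the two types surfaces as a failure in the same test case rather than going unnoticed in a separate, diverging test file.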

@mustafakemalgilor mustafakemalgilor force-pushed the enhancement/hyperscan-search-def branch 2 times, most recently from cfb0251 to 2219a4d Compare May 10, 2023 11:20
@mustafakemalgilor mustafakemalgilor force-pushed the enhancement/hyperscan-search-def branch 3 times, most recently from cbbf7b7 to 0d4a67e Compare May 17, 2023 12:56
@mustafakemalgilor mustafakemalgilor force-pushed the enhancement/hyperscan-search-def branch 2 times, most recently from 9ab3709 to d38829d Compare May 31, 2023 07:01
@mustafakemalgilor mustafakemalgilor force-pushed the enhancement/hyperscan-search-def branch 12 times, most recently from 05444e3 to 85396fc Compare May 7, 2024 14:48
@mustafakemalgilor mustafakemalgilor force-pushed the enhancement/hyperscan-search-def branch 2 times, most recently from 6598877 to b5fdd62 Compare May 16, 2024 12:40
…archDef`

Signed-off-by: Mustafa Kemal Gilor <[email protected]>
Signed-off-by: Mustafa Kemal Gilor <[email protected]>
@mustafakemalgilor mustafakemalgilor marked this pull request as draft July 30, 2024 12:01