
Feature request: Add support for embedded NUL bytes in PCRE patterns #468

Open
VictorSCushmanFastly opened this issue May 20, 2024 · 1 comment

Comments

@VictorSCushmanFastly

pcre2_compile successfully compiles patterns containing embedded NUL bytes when it is given an explicit length argument rather than PCRE2_ZERO_TERMINATED (e.g. for length-counted binary strings).

These same patterns do not compile with libfsm, where re_comp currently returns RE_EXEOF.

This behavior can be tested from the command line with:

$ echo -ne 'a\x00b' | re -l c -k pair -r pcre -y /dev/stdin
/dev/stdin:1: Syntax error: expected EOF

or by invoking re_comp with a custom byte-string iterator that does not return EOF when \0 is encountered in an input pattern.
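
For illustration, here is a minimal sketch of such an iterator. It assumes the character callback has the shape `int (*)(void *opaque)`. The names `byte_cursor` and `byte_cursor_getc` are hypothetical and not part of libfsm. The iterator is driven by an explicit length, so a 0x00 byte is returned as the value 0 and EOF is only reported once the length is exhausted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cursor over a length-counted byte string (not a libfsm type). */
struct byte_cursor {
	const unsigned char *p; /* next byte to return */
	size_t remaining;       /* bytes left before EOF */
};

/*
 * Character callback; the int (*)(void *) shape is an assumption about
 * what re_comp accepts. Bytes are returned as values in 0..255, so an
 * embedded NUL comes back as 0 rather than ending the pattern. EOF is
 * returned only when the explicit length runs out.
 */
static int
byte_cursor_getc(void *opaque)
{
	struct byte_cursor *c = opaque;

	if (c->remaining == 0) {
		return EOF;
	}

	c->remaining--;
	return *c->p++;
}
```

Passing `byte_cursor_getc` together with a `struct byte_cursor` initialised to `{ pattern, pattern_length }` as the opaque pointer should reproduce the same error as the command above.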

It would be nice to have a way to compile byte strings with embedded NUL bytes, either by matching PCRE2's behavior verbatim or via an additional fsm_options flag indicating that binary strings are accepted in PCRE patterns.

@katef
Owner

katef commented May 20, 2024

Current behaviour introduced in 6b1a769

We'd expose this as a compile-time flag for libre's API, and conditionally map \0 to TOK_CHAR in the terminal extraction section for sid.
