Providing Byte-Offsets for Every Match #46

fabianovasi · 2023-10-10T22:51:53Z

Feature Request
Description:

Hello,
I've noticed that when matches overlap, a byte-offset is reported only for the beginning of the matched part. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte-offsets obtained with the -o -b flags. I suggest adding a new option, or modifying an existing one, so that users can obtain a byte-offset for every match, even when matches overlap.
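The following model illustrates the discrepancy. It is hypothetical and is not hypergrep's actual implementation. It assumes that the engine sees every overlapping match as a (start, end) span, which is what --count-matches suggests, and that the -o -b output merges overlapping spans and prints only the start of each merged region. The helper name `merge_overlapping` is made up for this sketch.

```python
def merge_overlapping(spans):
    """Merge overlapping (start, end) spans; `end` is exclusive.

    Hypothetical model of how overlapping matches could collapse into one
    printed region; not taken from hypergrep's source.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous region: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Assumed per-match spans for "\p{N}{2}" on "012a34": "01", "12" and "34".
spans = [(0, 2), (1, 3), (4, 6)]
regions = merge_overlapping(spans)

print(len(spans))                       # 3 matches counted
print([start for start, _ in regions])  # [0, 4]: only two offsets remain after merging
```

In this model, three matches become two printed regions ("012" starting at 0 and "34" starting at 4). That is consistent with the count and offset mismatch described above.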

Reporting byte-offsets for all overlapping matches directly would streamline workflows that need an offset for every match.

Steps to Reproduce:

Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Search a.txt with this pattern:

hg -e "\p{N}{2}" -b -o a.txt

Result:

The --count-matches flag reports 3 matches. It would be useful to also obtain all three byte-offsets, one per match (0, 1, and 4 in this example).
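To pin down the expected result independently of hg, the following sketch uses Python's standard `re` module to list one byte-offset per overlapping match. It assumes ASCII input, so `[0-9]` stands in for `\p{N}`. Python's `re` does not support `\p{N}`, and `\p{N}` also covers non-ASCII numerals. The function name `overlapping_offsets` is hypothetical.

```python
import re

# A zero-width lookahead matches at every position where two ASCII digits begin,
# so overlapping occurrences are all reported. [0-9] approximates \p{N} for ASCII only.
OVERLAPPING_PAIR = re.compile(rb"(?=[0-9]{2})")


def overlapping_offsets(data: bytes) -> list[int]:
    """Return the byte-offset of every (possibly overlapping) two-digit match."""
    return [m.start() for m in OVERLAPPING_PAIR.finditer(data)]


print(overlapping_offsets(b"012a34"))  # [0, 1, 4]

# For contrast, a plain (non-overlapping) scan resumes after each match:
print([m.start() for m in re.finditer(rb"[0-9]{2}", b"012a34")])  # [0, 4]
```

The lookahead trick is a common way to emulate overlapping matches in backtracking engines. It only illustrates the desired offsets and makes no claim about how hg or Hyperscan would implement the option.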

Thank you for considering this feature request. I appreciate your work on enabling regex pattern searches with Hyperscan.

Note: I edited this issue after realizing that the matching mechanism works with a sliding window.


fabianovasi changed the title from "Incomplete Match Information for Repeated Patterns" to "Providing Byte-Offsets for Every Match" on Oct 24, 2023.
