Hello,
I've noticed that when matches overlap, a byte offset is reported only for the start of the combined matched region, not for each individual match. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte offsets returned with the -o -b flags. I suggest adding a new option, or modifying the existing ones, so that users can obtain a byte offset for every match, even when matches overlap.
Reporting every byte offset directly, including those of overlapping matches, would streamline workflows that need the position of each match.
Steps to Reproduce:
Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Use the regular expression to search for matches in a.txt:
hg -e "\p{N}{2}" -b -o a.txt
Result:
The number of matches obtained with the --count-matches flag is 3. It would be nice to also be able to obtain three byte offsets (0, 1, and 4 in this example).
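To make the expected output concrete, here is a minimal Python sketch of the requested behaviour. It is not how the tool or Hyperscan works internally; it only imitates the result with Python's `re` module and a zero-width lookahead. The function names are made up for this illustration, and `\d` stands in for `\p{N}` because `re` does not support `\p{...}` classes (the two agree on the ASCII sample above).

```python
import re


def overlapping_byte_offsets(text: str, pattern: str) -> list[int]:
    """Return the UTF-8 byte offset of every position where `pattern` matches,
    including positions whose matches overlap an earlier match."""
    # A zero-width lookahead consumes no input, so the scan can report a
    # match at every starting position instead of skipping past each match.
    lookahead = re.compile(f"(?=(?:{pattern}))")
    # Convert character positions to byte offsets in the UTF-8 encoding.
    return [len(text[: m.start()].encode("utf-8")) for m in lookahead.finditer(text)]


def non_overlapping_byte_offsets(text: str, pattern: str) -> list[int]:
    """Return byte offsets of ordinary left-to-right, non-overlapping matches."""
    return [len(text[: m.start()].encode("utf-8")) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    sample = "012a34"
    digits = r"\d{2}"  # stand-in for \p{N}{2}
    print(overlapping_byte_offsets(sample, digits))      # [0, 1, 4]
    print(non_overlapping_byte_offsets(sample, digits))  # [0, 4]
```

Note that the lookahead reports at most one match per starting position. That is enough for a fixed-length pattern like this one, but a variable-length pattern can have several matches starting at the same offset, which this sketch would not distinguish.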
Thank you for considering this feature request. I appreciate your work on enabling regex pattern searches with Hyperscan.
Note: I edited this issue after realizing that the matching mechanism works with a sliding window.
fabianovasi changed the title from "Incomplete Match Information for Repeated Patterns" to "Providing Byte-Offsets for Every Match" on Oct 24, 2023