I'm sometimes parsing log files, textified PDFs, scanned docs or other things not designed to be parsed. One of the reasons I like TatSu for this is you can be sure you really understood the format within a section and can occasionally explain what you're doing to a non-programmer. In contrast, when I do the same with regular expressions, I sometimes find myself silently skipping bits (and it's very hard to read!). Such formats often have fixed numbers of repetitions, and it's interesting to know whether one's assumption about the number of repetitions always holds.
Also, one sometimes gets cases with exactly a repetitions, followed by up to b repetitions, followed by c repetitions, where each group is of a different kind. That is possibly a harder case to manage.
rule = {int}{4} {int}{2,4} {int}{2} ;
Of course I can just measure the list length in semantics, but I feel this is more properly part of the grammar. So this is low priority.
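The length check in semantics mentioned above could look roughly like the sketch below. It is plain Python with no TatSu import. The names `check_count`, `CountError` and `RuleSemantics` are hypothetical, and it assumes the semantic action for `rule` receives the three groups as three separate lists.

```python
class CountError(ValueError):
    """Raised when a group has an unexpected number of repetitions."""


def check_count(items, lo, hi=None, label="group"):
    """Return items unchanged if lo <= len(items) <= hi, else raise CountError.

    With hi omitted, the count must be exactly lo.
    """
    hi = lo if hi is None else hi
    if not lo <= len(items) <= hi:
        raise CountError(f"{label}: expected {lo}..{hi} items, got {len(items)}")
    return items


class RuleSemantics:
    """Hypothetical semantics object; the method name mirrors the rule name."""

    def rule(self, ast):
        # Assumption: ast is a sequence of three lists, one per group.
        first, second, third = ast
        check_count(first, 4, label="first")
        check_count(second, 2, 4, label="second")
        check_count(third, 2, label="third")
        return ast
```

In an actual TatSu parser it may be preferable to raise TatSu's `FailedSemantics` so the failure is reported as a parse error; the sketch uses a plain `ValueError` subclass only to stay self-contained.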
The syntax would have to be different, non regex-like, because TatSu already defines {} (and also () and []). There's already a lot of syntax around {}.
Perhaps it could be:
rule = {int}<4> {int}<2,4> {int}<2> ;
I think that TatSu only allows * after {}, so the new syntax could also be:
rule = int*4 int*2-4 (int string)*2 ;
We need to review the current syntax to choose a new one that makes the intention clear and doesn't collide with current semantics.
We should probably first provide an implementation, and decide about the syntax after.
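Independent of whatever syntax is chosen, the behaviour of a bounded repetition might be sketched as below. This is not TatSu's implementation, only a hand-written illustration; `parse_int` and `repeat` are hypothetical names, and the repetition is assumed to be greedy up to the upper bound.

```python
import re

_INT = re.compile(r"\d+\s*")


def parse_int(text, pos):
    """Match one integer plus trailing whitespace; return (value, new_pos) or None."""
    m = _INT.match(text, pos)
    if m is None:
        return None
    return int(m.group().strip()), m.end()


def repeat(parser, lo, hi):
    """Build a parser that applies `parser` greedily, between lo and hi times."""
    def run(text, pos):
        values = []
        while len(values) < hi:
            result = parser(text, pos)
            if result is None:
                break
            value, pos = result
            values.append(value)
        if len(values) < lo:
            return None  # too few repetitions: the whole group fails
        return values, pos
    return run


two_to_four = repeat(parse_int, 2, 4)
exactly_four = repeat(parse_int, 4, 4)
```

Here `two_to_four("1 2 3 4 5", 0)` stops after the fourth integer and leaves the fifth for whatever follows, while `exactly_four("1 2 3", 0)` fails because only three integers are available.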
I just spent half an hour trying to find out what other syntaxes do and the only one I could find was 're'! To be fair, it's probably the only repetition qualifier most of your users know. And I understand your reason for rejecting it.
It may be necessary to constrain it so that a sequence of repetition qualifiers can only include one range. So:
rule = int*4 int*2:4 int*2:5 int*3
might not be allowed, or it might be formally defined whether the LHS or the RHS range is greedy.
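To make the ambiguity concrete, the sketch below enumerates every way a run of identical items can be divided among consecutive groups with the bounds from that rule. The function name `splits` and the choice of 13 items are illustrative assumptions, not anything from TatSu.

```python
from itertools import product


def splits(total, bounds):
    """List every way to distribute `total` items over consecutive groups.

    `bounds` is a list of inclusive (min, max) pairs, one per group.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [counts for counts in product(*ranges) if sum(counts) == total]


# int*4 int*2:4 int*2:5 int*3
bounds = [(4, 4), (2, 4), (2, 5), (3, 3)]
ways = splits(13, bounds)

# A left-greedy reading gives earlier groups as many items as possible,
# a right-greedy reading gives later groups as many as possible.
left_greedy = max(ways)
right_greedy = min(ways)
```

With 13 integers there are three valid divisions, so the result depends on which of the two ranges is treated as greedy. With a single range in the sequence there would be at most one.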
Did you notice I experimented with a colon in 2:4? I thought it had a more Pythonic flavour, though repetition isn't much like a slice. Of the two you offer, I mostly like the latter, but the '-' sign grated a little because my mind needs it to be subtraction. Too bad the ellipsis isn't on a standard keyboard.
Could you support:
rule = {expression}{7} ;
or
rule = {expression}{2,5} ;
Example from the re syntax:
https://docs.python.org/3/library/re.html#regular-expression-syntax, search for "Repetition qualifiers"
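For comparison, here is a small sketch of how the `re` qualifiers `{m}` and `{m,n}` behave in Python. The patterns and the helper `matches` are illustrative only and assume integers separated by single spaces.

```python
import re

# One integer followed by exactly three more: four integers in total.
EXACTLY_FOUR = re.compile(r"\d+(?: \d+){3}")
# One integer followed by one to three more: two to four integers in total.
TWO_TO_FOUR = re.compile(r"\d+(?: \d+){1,3}")


def matches(pattern, text):
    """Return True only if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

Using `fullmatch` rather than `search` means extra or missing items are rejected instead of being silently skipped.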