Fix corner case regex colorization #678

tristanlatr · 2023-01-23T21:53:30Z

Trying to fix #668

codecov · 2023-01-23T22:14:53Z

Codecov Report

Patch coverage: 90.00% and project coverage change: -0.02 ⚠️

Comparison is base (0636f0d) 92.49% compared to head (88d31d6) 92.48%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #678      +/-   ##
==========================================
- Coverage   92.49%   92.48%   -0.02%     
==========================================
  Files          47       47              
  Lines        8103     8131      +28     
  Branches     1935     1940       +5     
==========================================
+ Hits         7495     7520      +25     
- Misses        354      357       +3     
  Partials      254      254

Impacted Files	Coverage Δ
pydoctor/epydoc/markup/_pyval_repr.py	`92.82% <90.00%> (-0.16%)`	⬇️


tristanlatr · 2023-05-29T09:32:07Z

pydoctor/epydoc/sre_parse36.py

-        # branch (the compiler may optimize this even more)
-        subpatternappend((IN, [item[0] for item in items]))
-        return subpattern
+    # pydoctor: remove all optimizations for round-tripping issues


Do we need to edit sre_parse at all?
I’m definitely not sure of what I’m doing when editing this file…

We might indeed need this to remove some optimizations. We might also want to add support for the latest Python regex features. So there are good reasons to have our own fork of sre_parse.

Is there any regex guru among the twisted contributors who could give me some feedback on how to remove all optimizations from our sre_parse fork and ensure we don’t break the logic? @adiroiban

@adiroiban @glyph can you help me with that?

@tristanlatr what do you need here exactly? I'm not super familiar with this code, and I am not sure what the relevant optimizations are. What would give you confidence the logic isn't broken, beyond the existing test suite?

Ok so basically there are several issues here. We vendor the sre_parse module from the Python 3.6 standard library. First, why the 3.6 version? Because from 3.7 onwards non-capturing groups are optimized such that the SubPattern instances cannot be converted back to the same string. Then, I suspect there are other optimizations that prevent some regexes from round-tripping; the change above is one of them. Lastly, some regex features have been added recently, in Python 3.11 I believe, so we need to add support for those in our sre_parse module to provide complete colorizing for regexes.
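To illustrate the kind of round-tripping problem described above, here is a minimal sketch using the standard library parser rather than pydoctor's vendored sre_parse36. It assumes the stock parser still applies the branch-to-character-set optimization touched by the diff above; the helper name `parse_tree` is made up for this example.

```python
try:
    # Python 3.11+ keeps the parser in a private submodule of `re`.
    from re import _parser as sre_parse
except ImportError:
    # Older interpreters expose it as the top-level `sre_parse` module.
    import sre_parse


def parse_tree(pattern):
    """Return the parsed items of `pattern` as a plain list (hypothetical helper)."""
    return list(sre_parse.parse(pattern).data)


alternation = parse_tree(r'a|b')
char_class = parse_tree(r'[ab]')

# Both spellings collapse to a single IN node, so the original
# source text cannot be recovered from the parse tree alone.
same_tree = alternation == char_class
first_op = alternation[0][0].name
print(same_tree, first_op)
```

A colorizer that rebuilds the regex from the parse tree cannot tell whether the author wrote `a|b` or `[ab]`, which is why removing this optimization matters for colorizing even though matching behaviour is unaffected.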

What would give you confidence the logic isn't broken, beyond the existing test suite?

It's not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the round trip is impossible in some cases).
And no, we just have our test suite to check that.

It's not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the round trip is impossible in some cases).

gotcha, this makes a bit more sense. My plate is honestly kind of full right now, so I don't know how much time I can dedicate to this. Maybe ping the mailing list to see if anyone is interested? Sustainable open source infrastructure is hard :(.

tristanlatr · 2023-05-29T09:33:05Z

pydoctor/test/epydoc/test_pyval_repr.py

    assert color_re(r'^<(?P<descr>.*)>$') == """r<span class="rst-variable-quote">'</span>^&lt;<span class="rst-re-group">(?P&lt;</span><span class="rst-re-ref">descr</span><span class="rst-re-group">&gt;</span>.<span class="rst-re-op">*</span><span class="rst-re-group">)</span>&gt;$<span class="rst-variable-quote">'</span>"""

+@pytest.mark.xfail
+def test_re_named_groups_weird() -> None:
+    # This regex triggers some weird behaviour: it adds the &crarr; element at the end where it should not be...


This test should be fixed.

tristanlatr · 2023-05-29T09:34:25Z

pydoctor/epydoc/markup/_pyval_repr.py

+    linebreakok:bool = attr.ib(default=False, init=False)
+    warnings: List[str] = attr.ib(factory=list)
+
+    # state linked to regex colorization


These state trackers should be moved into the state object.

tristanlatr added 5 commits January 23, 2023 16:52

Trying to fix #668

2d33e16

fix bugs

ccf0ccd

Move up state.mark() call and adjust tests

82d732e

fix annotation

8eaebff

Remove surrogates from bytes

d877429

tristanlatr and others added 10 commits January 23, 2023 17:23

use cast() to fix mypy

bbc1009

Remove optimizations in sre_parse36

c83d90a

Try to be more smart regarding the regex literals.

f3cdabb

Fix else branch

f521ce3

fix error raising

49e8807

Give None default value to expect_failure

76bd2db

Try removing early exception raising

dd9e297

Try fixing linewrap issue...

8ff8220

Try really hard to get regex colorization right

bb750af

Merge branch 'master' into 668-fix-regex-colorizing

0ed194f

tristanlatr marked this pull request as ready for review January 31, 2023 05:18

tristanlatr added 2 commits February 1, 2023 10:20

Merge branch 'master' into 668-fix-regex-colorizing

0fc48d0

Merge branch 'master' into 668-fix-regex-colorizing

f44d3b9

tristanlatr commented May 29, 2023

Merge branch 'master' into 668-fix-regex-colorizing

88d31d6

tristanlatr May 29, 2023

tristanlatr May 29, 2023

tristanlatr Jun 7, 2023

tristanlatr Jan 14, 2024

glyph Jan 16, 2024

tristanlatr Jan 16, 2024

tristanlatr Jan 16, 2024

glyph Jan 17, 2024

tristanlatr May 29, 2023

tristanlatr May 29, 2023
