Keep re.findall usage pattern? #4

tovrstra · 2019-04-10T15:22:40Z

@agalera I noticed (because of a packaging issue) that pywildcard differs from Python's fnmatch in a second way: the regular expression it produces can be used with re.findall:

import re
import pywildcard
regex = pywildcard.translate('example/**')
re.findall(regex, "\n".join(urls))
# return ['example/l1/l2/test3-1.py',
#         'example/l1/test2-1.py',
#         'example/l1/test2-2.py',
#         'example/l1/l2/l3/test4-1.py']

This usage pattern with re.findall does not work with Python's fnmatch. Is this difference intentional or just historical? In the latter case, I would suggest making pywildcard more similar to fnmatch again, so that the following would work:

import pywildcard
import fnmatch
assert pywildcard.translate('example/**') == fnmatch.translate('example/*')

agalera · 2019-04-10T16:11:30Z

It can be changed so that it works the same way.

Yes, it is intentional. The reason for this change is explained here:
https://bugs.python.org/issue25734

I'm copying the text here:

https://hg.python.org/cpython/file/tip/Lib/fnmatch.py
Reviewing the fnmatch code, I noticed that with the generated regular expression, re.findall returns everything in a single first result.

l97:
res = res + '.*'
to:
res = res + '.*?'

l100:
return res + '\Z(?ms)'
to:
return res + '$(?ms)'

example test:

import re
import fnmatch

urls = ['example/l1/l2/test3-1.py',
        'example/l1/test2-1.py',
        'example/l1/test2-2.py',
        'example/l1/l2/l3/test4-1.py']

regex = fnmatch.translate('example/*')
# 'example\\/.*\\Z(?ms)'
re.findall(regex, "\n".join(urls))
# return ['example/l1/l2/test3-1.py\nexample/l1/test2-1.py\nexample/l1/test2-2.py\nexample/l1/l2/l3/test4-1.py']

# suggested change 
re.findall('example\\/.*?$(?ms)', "\n".join(urls))
# return ['example/l1/l2/test3-1.py',
#         'example/l1/test2-1.py',
#         'example/l1/test2-2.py',
#         'example/l1/l2/l3/test4-1.py']
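
For reference, here is a sketch of the same comparison that should also run on current Python versions, where an inline flag group such as (?ms) is only accepted at the start of a pattern. The flags are therefore passed as an argument instead. The two pattern strings are written by hand to mirror the variants discussed above; they are an assumption for illustration, not the literal output of fnmatch.translate or pywildcard.translate.

```python
import re

urls = ['example/l1/l2/test3-1.py',
        'example/l1/test2-1.py',
        'example/l1/test2-2.py',
        'example/l1/l2/l3/test4-1.py']
text = "\n".join(urls)

# Equivalent of the trailing (?ms): '$' matches at each line end,
# and '.' also matches newline characters.
flags = re.MULTILINE | re.DOTALL

# Greedy '.*' anchored with '\Z': '.*' runs across the newlines to the
# end of the string, so findall yields one match holding every line.
greedy = re.findall(r'example/.*\Z', text, flags)

# Lazy '.*?' anchored with '$': each match stops at the first line end,
# so findall yields one match per line.
lazy = re.findall(r'example/.*?$', text, flags)

print(len(greedy), len(lazy))
```

With the greedy form the whole joined string comes back as a single element; with the lazy form the result is the original list of URLs.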

tovrstra · 2019-04-10T21:56:56Z

OK, I see it is intentional. I'd like to fully understand everything and so I'm sorry for asking possibly silly questions. What is the advantage of using

regex = pywildcard.translate('example/**')
lst = re.findall(regex, urls_multiline)

instead of using the following?

lst = pywildcard.filter('example/**', urls_multiline.split('\n'))

E.g. is there a large performance difference?
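
To make the comparison concrete, here is a small sketch of one behavioural difference between the two approaches. It does not measure performance. It uses the standard library's fnmatch.filter as a stand-in for pywildcard.filter (an assumption; note that the stdlib signature is filter(names, pattern)), and a hand-written lazy pattern in place of the translated regex.

```python
import fnmatch
import re

# One line merely contains 'example/' somewhere in the middle.
names = ['other/example/x.py', 'example/a.py']
text = "\n".join(names)

# findall searches the joined text without anchoring at line starts,
# so it also returns the tail of the first line.
found = re.findall(r'example/.*?$', text, re.MULTILINE | re.DOTALL)

# filter matches each name as a whole against the pattern,
# so only names that start with 'example/' are kept.
kept = fnmatch.filter(names, 'example/*')

print(found)
print(kept)
```

So the findall form can return fragments of lines, while the filter form only returns complete entries that match the pattern from start to end.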
