Keep re.findall usage pattern? #4

Open
tovrstra opened this issue Apr 10, 2019 · 2 comments

Comments

@tovrstra
Collaborator

@agalera I noticed (because of a packaging issue) that pywildcard differs from Python's fnmatch in a second way, namely in how the resulting regular expression can be used with re.findall:

import re
import pywildcard

# urls is a list of path strings, e.g. the four paths shown in the result below
regex = pywildcard.translate('example/**')
re.findall(regex, "\n".join(urls))
# return ['example/l1/l2/test3-1.py',
#         'example/l1/test2-1.py',
#         'example/l1/test2-2.py',
#         'example/l1/l2/l3/test4-1.py']

This usage pattern with re.findall does not work with Python's fnmatch. Is this difference intentional or just historical? In the latter case, I would suggest making pywildcard more similar to fnmatch again, such that the following would work:

import pywildcard
import fnmatch
assert pywildcard.translate('example/**') == fnmatch.translate('example/*')
@agalera
Owner

agalera commented Apr 10, 2019

It can be changed so that it works the same.

Yes, it is intentional. The reason for this change is explained here:
https://bugs.python.org/issue25734

I leave the text here:

https://hg.python.org/cpython/file/tip/Lib/fnmatch.py
Reviewing the fnmatch code, I noticed that with the generated regular expression, everything is returned in the first result.

l97:
res = res + '.*'
to:
res = res + '.*?'

l100:
return res + '\Z(?ms)'
to:
return res + '$(?ms)'

example test:

import re
import fnmatch

urls = ['example/l1/l2/test3-1.py',
        'example/l1/test2-1.py',
        'example/l1/test2-2.py',
        'example/l1/l2/l3/test4-1.py']

regex = fnmatch.translate('example/*')
# 'example\\/.*\\Z(?ms)'
re.findall(regex, "\n".join(urls))
# return ['example/l1/l2/test3-1.py\nexample/l1/test2-1.py\nexample/l1/test2-2.py\nexample/l1/l2/l3/test4-1.py']
# suggested change 
re.findall('example\\/.*?$(?ms)', "\n".join(urls))
# return ['example/l1/l2/test3-1.py',
#         'example/l1/test2-1.py',
#         'example/l1/test2-2.py',
#         'example/l1/l2/l3/test4-1.py']
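
For readers on a recent Python, here is a minimal sketch of the same comparison. It is not taken from the bug report: the two patterns are written by hand rather than produced by fnmatch.translate (whose output format has changed across Python versions), and the flags are passed as arguments because newer Python versions reject an inline (?ms) at the end of a pattern.

```python
import re

urls = [
    "example/l1/l2/test3-1.py",
    "example/l1/test2-1.py",
    "example/l1/test2-2.py",
    "example/l1/l2/l3/test4-1.py",
]
text = "\n".join(urls)

# Same meaning as the inline (?ms) used in the thread.
flags = re.MULTILINE | re.DOTALL

# Greedy '.*' anchored with \Z: with DOTALL, '.' also matches newlines,
# so the single match runs to the very end of the joined string.
greedy = re.findall(r"example/.*\Z", text, flags)

# Lazy '.*?' anchored with '$': with MULTILINE, '$' also matches just
# before each newline, so every line becomes its own match.
lazy = re.findall(r"example/.*?$", text, flags)

print(len(greedy), len(lazy))
```

The greedy form yields one result containing the whole joined string, while the lazy form yields one result per line, which is the behavior the suggested change aims for.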

@tovrstra
Collaborator Author

OK, I see it is intentional. I'd like to fully understand everything and so I'm sorry for asking possibly silly questions. What is the advantage of using

regex = pywildcard.translate('example/**')
lst = re.findall(regex, urls_multiline)

instead of using the following?

lst = pywildcard.filter('example/**', urls_multiline.split('\n'))

E.g. is there a large performance difference?
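
One way to look into this is to run both approaches on the same input and time them. The sketch below is only an illustration under stated assumptions: it uses the standard library's fnmatch.filter as a stand-in for pywildcard.filter, and a hand-written line-anchored pattern in place of the output of pywildcard.translate. The extra 'other/readme.txt' entry is made up to show that both approaches filter it out.

```python
import fnmatch
import re
import timeit

urls = [
    "example/l1/l2/test3-1.py",
    "example/l1/test2-1.py",
    "other/readme.txt",  # hypothetical non-matching entry
    "example/l1/l2/l3/test4-1.py",
]
urls_multiline = "\n".join(urls)

# Hand-written stand-in for a translated pattern: '^' and '$' are
# line anchors under MULTILINE, and '.*?' stops at the first line end.
line_regex = re.compile(r"^example/.*?$", re.MULTILINE | re.DOTALL)


def via_findall():
    """One regex scan over the newline-joined string."""
    return line_regex.findall(urls_multiline)


def via_filter():
    """Split into lines first, then match each line separately."""
    return fnmatch.filter(urls_multiline.split("\n"), "example/*")


if __name__ == "__main__":
    # Timings depend on the machine and the input size, so none are quoted here.
    for fn in (via_findall, via_filter):
        print(fn.__name__, timeit.timeit(fn, number=10_000))
```

Both functions return the same three paths for this input. The findall approach assumes that no path contains a newline, whereas the filter approach works on any list of strings.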
