-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle quoted external CSS URLs in privacy plugin #7651
Conversation
Same as #7650 (comment), need a little more input. |
Ah nice catch, missed that one! Will adapt the PR. Here's a reproduction zip with all 3 quote styles: 9.5.42-css-quoted-urls-missing-from-privacy-plugin.zip With the current version, only the unquoted external URL gets fetched. Let me know if these 2 repro zips are clear enough @squidfunk 🙇 Edit: pushed the new regex and also updated the regex101 link above. Should be ready for another round 🏓 |
7bca798
to
c4105b0
Compare
Thanks! I believe we need to allow spaces before and after the URL, as according to syntax level 3, specifically how URL tokens are parsed. There are two ways how URLs are handled: if they contain a string, they're just considered to be normal function tokens, which definitely demands for whitespace after and before. If they contain a verbatim URL, I believe we need to support whitespace as well. url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\) This now works correctly with the following strings, albeit it consumes the trailing whitespace on the verbatim version, which should not be a problem though: /* correct */
url("https://example.com/images/myImg.jpg");
url('https://example.com/images/myImg.jpg');
url( 'https://example.com/images/myImg.jpg' );
url( "https://example.com/images/myImg.jpg" );
url(https://example.com/images/myImg.jpg);
url( https://example.com/images/myImg.jpg );
/* mismatching */
url('https://example.com/images/myImg.jpg);
url("https://example.com/images/myImg.jpg);
url('https://example.com/images/myImg.jpg");
url(https://example.com/images/myImg.jpg');
url(https://example.com/images/myImg.jpg");
/* non-http links */
url("data:image/jpg;base64,iRxVB0…");
url(myImg.jpg);
url(#IDofSVGpath); PS: How can I share on regex101? I'm too stupid to find the share button 😅 |
c4105b0
to
e3268f7
Compare
Perfect, thanks! Added the new pattern. Hehe, Ctrl+S should save it and give you a modal with the link: https://regex101.com/r/LVJJfK/1https://regex101.com/r/LVJJfK/1. Not sure if it requires starting over when receiving a link though. I also just checked and as you say at least urlparse seems to be happy with trailing whitespace: >>> from urllib.parse import urlparse
>>> urlparse("https://example.com/images/myImg.jpg ")
ParseResult(scheme='https', netloc='example.com', path='/images/myImg.jpg ', params='', query='', fragment='') |
Perfect, thanks for investigating! I think this is safe to merge then 🤟 |
Released as part of 9.5.43. |
This one is very similar to #7650, but keeping it separate as it covers different files. I was wondering why our external woff/woff2 files weren't being downloaded and noticed this issue.
The back-reference helps get the exact URL and to avoid more false positives on invalid quoting, though there are still some - but still better than just expecting quotes on either side IMO. Also, this is probably caught by CSS linters and build system anyway.
I had to introduce named groups to handle this, hence the slight change in logic as
findall
doesn't support named groups. But IMO this might also help in the future if more edge cases show up.🛠️ with ❤️ at Siemens