Handle quoted external CSS URLs in privacy plugin #7651

nejch · 2024-10-29T13:10:03Z

This one is very similar to #7650, but keeping it separate as it covers different files. I was wondering why our external woff/woff2 files weren't being downloaded and noticed this issue.

CSS map regex test: https://regex101.com/r/UG2Bv0/1
JS map regex test: https://regex101.com/r/zPRrTp/1

The back-reference helps capture the exact URL and avoids some false positives on invalid quoting. A few remain, but IMO that's still better than just expecting quotes on either side. Also, invalid quoting is probably caught by CSS linters and the build system anyway.

I had to introduce named groups to handle this, hence the slight change in logic: findall only returns plain tuples of groups, so the named group can't be read from its results. But IMO this might also help in the future if more edge cases show up.
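
To illustrate the findall point, here is a minimal sketch. The pattern and the sample CSS are simplified stand-ins chosen for this example, not the plugin's actual code. With more than one group, `re.findall` returns a tuple of all groups per match, while `re.finditer` yields match objects from which the named group can be read directly.

```python
import re

# Simplified stand-in pattern (an assumption for illustration): group 1
# captures the optional quote, the named group "url" captures the URL.
pattern = re.compile(r"""url\((["']?)(?P<url>https?://[^)'"]+)\1\)""")

css = """
@font-face { src: url("https://example.com/a.woff2"); }
@font-face { src: url(https://example.com/b.woff); }
"""

# findall returns one tuple per match, containing every group in order.
tuples = pattern.findall(css)

# finditer yields match objects, so the URL can be picked out by name.
urls = [m.group("url") for m in pattern.finditer(css)]
```

With `findall`, the caller would have to know the position of the URL inside each tuple; with `finditer`, the group name keeps working even if more groups are added later.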

🛠️ with ❤️ at Siemens

squidfunk · 2024-10-29T15:01:34Z

Same as #7650 (comment), need a little more input.

squidfunk · 2024-10-30T09:17:23Z

Looking at the CSS regex, in your example, cases that should not match still do:

Moving the ? quantifier into the capturing group will allow the back reference to accept an empty match:

url\(([\"']?)(?P<url>\s*http?[^)'\"]+)\1\)
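
A small sketch of why the position of the `?` matters in Python's `re` module. The `outside` variant is an assumption made up for contrast and is not necessarily the pattern the PR started from; the `inside` variant is the one suggested above. In Python, a back-reference to a group that did not participate in the match fails, whereas a group that matched an empty string can be back-referenced without trouble.

```python
import re

# Quantifier outside the group (hypothetical contrast case): for an
# unquoted URL the group does not participate, so \1 cannot match.
outside = re.compile(r"""url\((["'])?(?P<url>\s*http?[^)'"]+)\1\)""")

# Quantifier inside the group: the group always participates and may
# capture an empty string, which \1 then matches trivially.
inside = re.compile(r"""url\((["']?)(?P<url>\s*http?[^)'"]+)\1\)""")

unquoted = "url(https://example.com/font.woff2)"
quoted = 'url("https://example.com/font.woff2")'

outside_unquoted = outside.search(unquoted)  # None in Python's re
inside_unquoted = inside.search(unquoted)    # matches, quote group is ""
inside_quoted = inside.search(quoted)        # matches, quote group is '"'
```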

nejch · 2024-10-30T11:49:05Z

> Looking at the CSS regex, in your example, cases that should not match still do:
> ...
> Moving the ? quantifier into the capturing group will allow the back reference to accept an empty match:
> ...
> url\(([\"']?)(?P<url>\s*http?[^)'\"]+)\1\)

Ah nice catch, missed that one! Will adapt the PR.

Here's a reproduction zip with all 3 quote styles:

9.5.42-css-quoted-urls-missing-from-privacy-plugin.zip

With the current version, only the unquoted external URL gets fetched.

Let me know if these 2 repro zips are clear enough @squidfunk 🙇

Edit: pushed the new regex and also updated the regex101 link above. Should be ready for another round 🏓

squidfunk · 2024-10-30T12:51:19Z

Thanks! I believe we need to allow spaces before and after the URL, according to CSS Syntax Level 3, specifically how URL tokens are parsed. URLs are handled in two ways: if they contain a string, they're just treated as normal function tokens, which definitely calls for allowing whitespace before and after the string. If they contain a verbatim (unquoted) URL, I believe we need to support whitespace as well.

url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)

This now works correctly with the following strings, although it consumes the trailing whitespace in the verbatim version, which should not be a problem:

/* correct */
url("https://example.com/images/myImg.jpg");
url('https://example.com/images/myImg.jpg');
url(  'https://example.com/images/myImg.jpg'  );
url(  "https://example.com/images/myImg.jpg"  );
url(https://example.com/images/myImg.jpg);
url(   https://example.com/images/myImg.jpg   );


/* mismatching */
url('https://example.com/images/myImg.jpg);
url("https://example.com/images/myImg.jpg);
url('https://example.com/images/myImg.jpg");
url(https://example.com/images/myImg.jpg');
url(https://example.com/images/myImg.jpg");

/* non-http links */
url("data:image/jpg;base64,iRxVB0…");
url(myImg.jpg);
url(#IDofSVGpath);
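
The list above can be checked mechanically. The following is a minimal sketch using Python's `re` module with the pattern exactly as posted; the variable names are made up for this example and are not the plugin's.

```python
import re

# Pattern as posted in the thread, including the \s* around the quotes.
CSS_URL = re.compile(r"""url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)""")

should_match = [
    'url("https://example.com/images/myImg.jpg");',
    "url('https://example.com/images/myImg.jpg');",
    "url(  'https://example.com/images/myImg.jpg'  );",
    'url(  "https://example.com/images/myImg.jpg"  );',
    "url(https://example.com/images/myImg.jpg);",
    "url(   https://example.com/images/myImg.jpg   );",
]
should_not_match = [
    # mismatching quotes
    "url('https://example.com/images/myImg.jpg);",
    'url("https://example.com/images/myImg.jpg);',
    "url('https://example.com/images/myImg.jpg\");",
    "url(https://example.com/images/myImg.jpg');",
    'url(https://example.com/images/myImg.jpg");',
    # non-http links
    'url("data:image/jpg;base64,iRxVB0…");',
    "url(myImg.jpg);",
    "url(#IDofSVGpath);",
]

# Captured URLs for the valid cases; the last one keeps its trailing
# spaces because the unquoted character class also accepts whitespace.
matched = [CSS_URL.search(s).group("url") for s in should_match]

# True for every string the pattern correctly refuses to match.
rejected = [CSS_URL.search(s) is None for s in should_not_match]
```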

PS: How can I share on regex101? I'm too stupid to find the share button 😅

nejch · 2024-10-30T13:28:40Z

Perfect, thanks! Added the new pattern.

Hehe, Ctrl+S should save it and give you a modal with the link: https://regex101.com/r/LVJJfK/1. Not sure if it requires starting over when receiving a link though.

I also just checked and as you say at least urlparse seems to be happy with trailing whitespace:

>>> from urllib.parse import urlparse
>>> urlparse("https://example.com/images/myImg.jpg   ")
ParseResult(scheme='https', netloc='example.com', path='/images/myImg.jpg   ', params='', query='', fragment='')

squidfunk · 2024-10-30T14:18:19Z

Perfect, thanks for investigating! I think this is safe to merge then 🤟

squidfunk · 2024-10-31T11:22:13Z

Released as part of 9.5.43.

nejch marked this pull request as ready for review October 29, 2024 13:12

nejch force-pushed the fix/css-url-quoted branch 2 times, most recently from 7bca798 to c4105b0 Compare October 30, 2024 12:18

Handle quoted external CSS URLs in privacy plugin

e3268f7

nejch force-pushed the fix/css-url-quoted branch from c4105b0 to e3268f7 Compare October 30, 2024 13:25

squidfunk merged commit 4918a10 into squidfunk:master Oct 30, 2024
4 checks passed

nejch deleted the fix/css-url-quoted branch October 30, 2024 14:22
