-
-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGRESSION: Remove new inline JS from ZIMs produced by dev (at least 1.14.0) to comply with restrictive CSPs #2096
Comments
I'm familiar with CSP and I understand why inline scripts need to be removed. My feeling is that this entire area was overlooked in the transition from mobile-endpoint to mobile-html. However, my reading of the code and the README for "pcs", which is the service provided by these scripts, is that they are mostly for theming and image queueing/lazy loading. That is, I don't think this is the script which unhides the sections (#2074). Agree it should be fixed anyways, but I wonder if we could ask upstream to let us opt out of the inline scripts, rather than "forcibly" removing them at the time of scraping. If we don't do that, we will always be playing catch up if new instances are introduced. |
My inclination is to just strip all |
@audiodude It's definitely Regarding your plan of removing all inline scripts, unfortunately at least in this case, it would result in invisible Sections. You can tell what's broken by opening But that is something to deal with in #2074 (by stripping Removing all scripts could also affect lazy loading of images, unless mwOffliner handles lazy loading itself. I guess we don't want two competing systems doing that! There may also be other dynamic functions on other pages you probably don't want to strip. NB I'm not against running any JS at all from the ZIM, it's just not feasible to have basic display of content on all pages depending on running an inline script! |
Thanks for the detailed response, I missed the code in I have #2097 out (and waiting for your review!) to fix #2074. There is no problem with lazy loading of images, mwoffliner already resolves all images, downloads them ahead of time, and embeds them in the ZIM.
I'm not sure I understand this. Are you saying that |
It's for the Chrome extension, which can run JS but blocks inline scripts as per its CSP, when running locally in Service Worker mode... For all I know this is the only example of inline scripts with such a disastrous effect of they cannot be run, so maybe the point will be moot... |
@audiodude I already reviewed #2097 which looks great. |
@Jaifroid I'm having trouble reproducing. When I load a ZIM in the browser extension that should have the problem and view source with CTRL-U, I don't see any <script> tags at all. I assume this is because the extension is modifying the document source? Or maybe Chrome is stripping the tags? What I'd like to do is scrape an article or two with 1.14-dev with the bug, and again with my fix, and compare. |
@audiodude Don't use Ctrl-U to view. Use DevTools (Ctrl-Shift-I, or F12). Here's a step-by-step:
|
Thanks for the extensively detailed instructions! I guess the reason it doesn't work with "view source" is because the article is dynamically loaded into an iframe? |
Possibly, tbh I never use view source, since much more extensive tools are available in DevTools! Quite likely the HTML in the iframe is not shown in view source, because the browser would assume it's a Cross Origin site. In our case, it's Same Origin because we insert that content. |
I imagine the iframe source isn't shown in "view source" because it's loaded after the page loads, but nbd. |
This is a regression of #1578 and its fix may depend on a fix to #2074. However, it is useful to revisit this issue, as a lot of work was done in the past to eradicate inline JS from mwOffliner ZIMs. The issue is that where a restrictive CSP is imposed that bans inline JS as opposed to attached JS (a real-life example of this is in Chrome extensions, so this negatively affects Kiwix JS Browser Extension), no inline JS can be run (as opposed to JS in attached scripts).
In the latest ZIMs scraped from the mobile-html endpoint using dev 1.14.0 (at least), e.g. in
wikipedia_bm_all_nopic_2024-10.zim
, I believe the functionality to unhide hidden sections depends on the following piece of inline JS:As this is blocked by the CSP directive enforced in Chromium extensions, it seriously affects the usability of such ZIMs in Chromium extensions running in ServiceWorkerLocal mode.
This is a regression because, as mentioned, we did a lot of work previously to remove inline JS from all Kiwix scrapers (see the list here kiwix/kiwix-js#865), so we should be very careful not to re-introduce them in development. If it is in fact needed, then the script containing the desired function to run should attach itself internally to an appropriate browser event, and not be run from the HTML.
The fix to this issue may simply be to fix #2074 to ensure all sections are open by default. Then I believe it won't be necessary to run the inline JS here, unless it is doing other things that are desirable (though they should not be essential to be able to read the ZIM in browsers that don't run any JS from the ZIM).
The text was updated successfully, but these errors were encountered: