Capturing as HTZ failed #263

Open
LawnchairLarry opened this issue Jan 18, 2022 · 10 comments
Labels
question Further information is requested

Comments

@LawnchairLarry

...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&forceview=1 ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Furl%2Fview.php%3Fid%3D80115 ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&forceview=1 ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Furl%2Fview.php%3Fid%3D135815 ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Ffolder%2Fview.php%3Fid%3D127753 ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmessage%2Findex.php ...
Rebuilding links...
Saving data...
Fatal error: Bug : can't construct the Blob.

@danny0838
Owner

Can you provide the source URL of the web page and your capture options (from the advanced mode of the "save as" dialog, or by exporting the options)?

@LawnchairLarry
Author

Screenshot 2022-01-18 20 44 54
Screenshot 2022-01-18 20 45 06
Screenshot 2022-01-18 20 45 11
Screenshot 2022-01-18 20 45 27
Screenshot 2022-01-18 20 57 40

@danny0838
Owner

What is the source URL of the web page?

@LawnchairLarry
Author

Well, isn't that what's written in my first post? "https://moodle.htwsaar.de/"
It's also in the last picture, in the "Included URLs for capturing linked pages" rule, except that there is a backslash before the dot (it was put there automatically when I clicked on the rule).

Thanks for your fast answer. Can I provide any more information to help?

@danny0838
Owner

danny0838 commented Jan 19, 2022

No, your first post starts with ... and I cannot see the source URL of the web page you have captured.

I have also tested https://moodle.htwsaar.de/, and the result is normal and contains only a few pages, which is very different from yours:

Capturing (document) [32] https://moodle.htwsaar.de/ ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5036 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5050 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5051 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=100 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=347 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=2153 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=4615 ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=de ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=en ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=fr ...
Capturing linked page (1) https://moodle.htwsaar.de/login/index.php ...
Capturing linked page (1) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F ...
Capturing linked page (1) https://moodle.htwsaar.de/admin/tool/policy/view.php?versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F ...
Capturing linked page (1) https://moodle.htwsaar.de/?cookie-policy ...
Capturing linked page (2) https://moodle.htwsaar.de/auth/shibboleth/index.php ...
Capturing linked page (2) https://moodle.htwsaar.de/login/forgot_password.php ...
Capturing linked page (2) https://moodle.htwsaar.de/ ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Flogin%2Findex.php ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/index.php ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fadmin%2Ftool%2Fpolicy%2Fview.php%3Fpolicyid%3D3%26amp%3Bversionid%3D3%26amp%3Breturnurl%3Dhttps%253A%252F%252Fmoodle.htwsaar.de%252F%26amp%3Bbehalfid%26amp%3Bmanage%26amp%3Bnumpolicy%26amp%3Btotalpolicies ...
Rebuilding links...
Saving data...
Saved to "**********\WebScrapBook\data\20220119122127837.htz"
Done.

Did you really attempt to capture https://moodle.htwsaar.de/? If not, please provide the original URL of the web page you attempted to capture.

Please also provide the name and version of your OS, browser, and WebScrapBook.

If you have other extensions installed, please also try disabling all of them (preferably restarting the browser afterwards) and performing a capture with the same options.

Please also try a capture with the same options except depth = 1.

Please also try a capture with the same options except Save captured data as: Folder.
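
To illustrate why the depth setting changes the number of captured pages so much, here is a minimal sketch of a depth-limited, breadth-first walk over links. It is not WebScrapBook's actual implementation; the function name `pages_within_depth` and the toy link graph are hypothetical and only mirror the "(1)" and "(2)" depth markers in the log above.

```python
from collections import deque

def pages_within_depth(links, start, max_depth):
    """Return {url: depth} for every page reachable within max_depth link hops."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Pages already at the depth limit are captured, but their links are not followed.
        if depth_of[url] == max_depth:
            continue
        for target in links.get(url, []):
            if target not in depth_of:  # capture each page only once
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of

# Hypothetical link graph, loosely modelled on the paths seen in the log.
toy_links = {
    "/": ["/course/view.php?id=100", "/login/index.php"],
    "/login/index.php": ["/login/forgot_password.php", "/"],
    "/course/view.php?id=100": ["/mod/folder/view.php?id=1"],
    "/mod/folder/view.php?id=1": ["/mod/url/view.php?id=2"],
}

for depth in (1, 2, 3):
    print(depth, len(pages_within_depth(toy_links, "/", depth)))
```

In this toy graph the page count grows from 3 to 5 to 6 as the depth goes from 1 to 3; on a real site with many links per page the growth is much steeper, which is why a depth = 1 capture is a useful small test case.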

@LawnchairLarry
Author

Ah, I think I know what you mean. The page I wanted to capture is https://moodle.htwsaar.de/course/view.php?id=719
(It's a C++ tutorial on our school's Moodle platform. Of course it is behind a login, but that doesn't seem to be a problem as long as I am logged in.)
I wanted to download the complete course, including all the .pdf, .txt, .cpp files and so on, to have it offline,
so I guess I have to go to depth 3 or more to get it all.
Because that takes a lot of time and I don't need to capture links that lead out of the course, I included the /^https://moodle\.htwsaar\.de// rule, as you can see in the last picture.
Is that all right?
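
As a rough illustration of what such an include rule admits, the sketch below tests the same pattern against a few URLs. It uses Python's `re` module rather than the extension's own JavaScript regular expressions, and the helper name `is_included` and the last two candidate URLs are hypothetical.

```python
import re

# Same pattern as the include rule, without the surrounding /.../ delimiters.
# The backslashes make each dot match a literal "." instead of any character.
INCLUDE_RULE = re.compile(r"^https://moodle\.htwsaar\.de/")

def is_included(url: str) -> bool:
    """Return True if the URL starts with the Moodle site prefix."""
    return INCLUDE_RULE.search(url) is not None

candidates = [
    "https://moodle.htwsaar.de/course/view.php?id=719",     # the course page
    "https://moodle.htwsaar.de/message/index.php?lang=de",  # same site, outside the course
    "https://www.htwsaar.de/",                              # different host (illustrative)
    "https://moodlexhtwsaar.de/",                           # hypothetical look-alike host
]

for url in candidates:
    print(is_included(url), url)
```

The first two URLs match and the last two do not. Note that the rule admits every page on the Moodle host, not only pages belonging to the course, so a deep capture can wander across the whole platform.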

My OS is Win10, actual Firefox browser and actuel WebScrapBook

@danny0838
Owner

> Ah, I think I know what you mean. The page I wanted to capture is https://moodle.htwsaar.de/course/view.php?id=719 (It's a C++ tutorial on our school's Moodle platform. Of course it is behind a login, but that doesn't seem to be a problem as long as I am logged in.) I wanted to download the complete course, including all the .pdf, .txt, .cpp files and so on, to have it offline, so I guess I have to go to depth 3 or more to get it all. Because that takes a lot of time and I don't need to capture links that lead out of the course, I included the /^https://moodle\.htwsaar\.de// rule, as you can see in the last picture. Is that all right?

So far, your configuration doesn't appear to be wrong. Unfortunately, we cannot reproduce the problem, so we need more information from you for further investigation.

> My OS is Windows 10, with the current Firefox and the current WebScrapBook.

Please provide the version of your Firefox and WebScrapBook.

Please complete the tests mentioned above and report whether the same issue persists in each case: with all other extensions disabled, with depth set to 1, and with save to folder.

@LawnchairLarry
Author

LawnchairLarry commented Jan 19, 2022

Firefox 96.0.1 (64-Bit) and WebScrapBook 1.1.0

The test is running with "Save captured data as: Folder".

I hope I can still browse the downloaded site as usual if I save it this way?

@LawnchairLarry
Author

LawnchairLarry commented Jan 20, 2022

DownloadLog.txt

Well, that looks much, much better. Except for a few .pdf files that it could not save because there is a 'Ü' in "Übungsstunden", it probably works very well. I have attached the full log. It would be great if you could add support for special characters like Ää, Üü and Öö.
In fact, it loaded not only that course but the whole Moodle platform, because I used the "same site" argument rather than the "same directory" argument, which was necessary because some of the needed files are outside that directory.

I can work with this for now, thank you so much! You get full stars for your work. Is there a way to buy you a coffee? ;)

@danny0838
Owner

Thank you for the report. Unfortunately, we still can't locate the cause of the zip failure. It may be due to some content on the website that is login-protected, and we need you to perform the other tests ("all other extensions disabled" and "depth set to 1") to confirm it. You can also check whether this issue can be reproduced on another, public website.

As for the saving error, it seems that the source website has provided a header with incorrectly encoded characters in the filename, which are treated as control characters. There is also a bug in WSB that causes some control characters not to be stripped out correctly, which leads to the saving error; this will be fixed in the next release.
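
One plausible way such a filename ends up containing a control character is sketched below, under the assumption (not confirmed in this thread) that the UTF-8 bytes of the name were decoded as Latin-1. The helper names are hypothetical, and the stripping step is only a generic illustration, not WSB's actual code.

```python
import re

# C0 controls (U+0000-U+001F), DEL (U+007F) and C1 controls (U+0080-U+009F).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")

def misdecode_utf8_as_latin1(name: str) -> str:
    """Simulate a UTF-8 encoded filename being read as Latin-1 (assumed failure mode)."""
    return name.encode("utf-8").decode("latin-1")

def strip_control_chars(name: str) -> str:
    """Remove control characters that are not valid in a saved filename."""
    return CONTROL_CHARS.sub("", name)

# "Ü" is the bytes C3 9C in UTF-8; read as Latin-1 that becomes "Ã" plus U+009C,
# and U+009C is a C1 control character.
garbled = misdecode_utf8_as_latin1("Übungsstunden.pdf")
print(repr(garbled))                       # 'Ã\x9cbungsstunden.pdf'
print(repr(strip_control_chars(garbled)))  # 'Ãbungsstunden.pdf'
```

Under this assumption the lowercase umlauts ä, ö and ü would be garbled but would not produce control characters, whereas the uppercase Ä, Ö and Ü would, which would be consistent with the failures being tied to the 'Ü'.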

danny0838 added the "question" (Further information is requested) label on Feb 19, 2022