Feature Request : Add "upload warc file to archive.org " feature #88

Open
kanihal opened this issue Aug 20, 2017 · 3 comments

Comments

@kanihal

kanihal commented Aug 20, 2017

Add an "upload WARC file to archive.org" feature to the Electron version of WAIL (it may need the secret token from the user's archive.org account, which is required for bulk upload) so that the Wayback Machine (web.archive.org) can index the snapshot of the site.

@machawk1

@kanihal As far as I know, even if a WARC file is uploaded to archive.org, it won't be ingested by the globally accessible Wayback Machine at archive.org unless it comes from a "privileged" account like Archive-Team's. The feature you mentioned could still be accomplished, i.e., the WARC generated by WAIL could be uploaded to archive.org, but the contents held within the WARC will not be replayable through the expected means.

@kanihal
Author

kanihal commented Aug 21, 2017

From Archive Team FAQ - http://archiveteam.org/index.php?title=Frequently_Asked_Questions

To ensure content integrity, items with WARC files must have the mediatype set to “web” 
and be under the Archive Team collection in order for it to be ingested by the Wayback Machine.
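
For illustration, here is a minimal sketch of how an upload request carrying that mediatype could be assembled. It assumes archive.org's S3-like upload endpoint (s3.us.archive.org), its `LOW access:secret` authorization scheme, and its `x-archive-meta-*` metadata headers. The item identifier, file name, and keys are hypothetical placeholders, and the sketch only builds the request rather than sending it. Going by the FAQ text above, setting the mediatype alone would not be enough for Wayback ingestion, since the item would also have to be in the Archive Team collection.

```python
from urllib.parse import quote


def build_warc_upload(identifier, filename, access_key, secret_key):
    """Return (url, headers) for a PUT that would upload one WARC file.

    Assumption: archive.org's S3-like API. Nothing here is WAIL's own code.
    """
    url = "https://s3.us.archive.org/{}/{}".format(quote(identifier), quote(filename))
    headers = {
        # Keys come from the uploader's archive.org account (the "secret token").
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask the service to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # The mediatype the Archive Team FAQ says WARC items must carry.
        "x-archive-meta-mediatype": "web",
    }
    return url, headers


# Hypothetical identifier, file name, and credentials, for illustration only.
url, headers = build_warc_upload("example-wail-crawl", "crawl.warc.gz", "ACCESS", "SECRET")
print(url)
```

The WARC bytes would be sent as the body of the PUT; that step is left out here so the sketch has no side effects.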

@machawk1
1. How do you get a privileged account from archive.org? Is it even possible now?
2. Do we need to send a request to Archive Team to get my WARC into their ArchiveTeam collection?
3. I find that there is an option to make the Wayback Machine save your content if you send the URL that you want to save to http://web.archive.org/save/

 wget http://web.archive.org/save/<url>

Do bulk 'save' requests for all the URLs that we get from crawling work? Don't they have some limit per IP or something?
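
A minimal sketch of what such a bulk submission could look like, assuming the endpoint accepts the target URL appended to http://web.archive.org/save/ as in the wget line above. The function names and the five-second delay are hypothetical choices for illustration, not documented archive.org limits, and whether the service tolerates requests at this rate is exactly the open question.

```python
import time
import urllib.request

SAVE_PREFIX = "http://web.archive.org/save/"


def save_request_url(url):
    """Build the save request URL by appending the target URL to the prefix."""
    return SAVE_PREFIX + url


def fetch(request_url):
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(request_url) as response:
        return response.status


def submit_all(urls, delay_seconds=5.0, send=fetch, sleep=time.sleep):
    """Submit each URL in turn, pausing between consecutive requests.

    `send` and `sleep` are injectable so the loop can be tried without
    touching the network. The delay is an assumed courtesy value.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        results.append((url, send(save_request_url(url))))
    return results
```

Calling `submit_all(crawled_urls)` with the default `send` would issue real requests, so a dry run with a stub `send` is a safer first step.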

@machawk1

@kanihal I think this privileged access is just that, i.e., limited to those with the credentials or to Archive-Team tools like Warrior. If anyone were able to upload a WARC for ingestion by Wayback, the content in the WARC may have been manipulated beforehand or may lack some other form of integrity, so I have my doubts as to whether they take external WARC contributions without some form of vetting.

Sending a URI is different from sending a WARC, particularly with the capabilities of the preservation tool contained within the Electron version of WAIL. Further, submitting a URI on archive.org does not give a user access to the generated WARC file. I believe archive.org's "Save Page Now" is meant more for one-offs and not bulk preservation.

archive.org's features are outside of the scope of WAIL.
