A collection of scripts for wget and yawbdl.
- Save sites by domain, saving as much content as possible.
- Automatically zip downloaded sites (see the "Notes" section).
- Save by lists (gently).
Install wget, dig, whois.
Install Yet Another WayBack DownLoader.
Create a folder for your downloaded sites on a drive with enough free space.
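Before running the scripts, it may help to confirm that the tools named above are actually on the PATH. The following is a minimal sketch, not part of the collection; `missing_tools` is a hypothetical helper name.

```shell
# Hypothetical helper: print every tool from the argument list
# that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the install steps above.
missing=$(missing_tools wget dig whois)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
else
  echo "All required tools are installed."
fi
```

The check only looks for the three command-line tools; yawbdl is not covered because its executable name depends on how it was installed.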
Install WSL (requires about 2 GB of disk space), Cygwin, Git Bash, or another tool that provides Bash on Windows.
Then follow the steps above.
Also, there are alternative ways to download web sites: Teleport Pro, PageNest.
Open a terminal in the folder and run ./save or another script with arguments.
./save example.com
./save example.com http://www.example.com/page.htm
./save example.com http://www.example.com/subdir --include-directories=/subdir
./save example.com http://www.example.com --exclude-directories=/trash --header="Accept: text/html"
./save example.com http://www.example.com --wait=15 --random-wait
./save example.com http://www.example.com --reject="morehead.php3*"
./save forum.example.com http://forum.example.com --reject-regex="members.php|search.php"
./save example.com http://www.example.com --exclude-domains=ftp.example.com
./save example.com http://www.example.com/keyword-article.htm --accept-regex=keyword
export SAVESITES_DOMAINS=otherdomain1.com,otherdomain2.com ; ./save example.com
./save_gently example.com
./save_by_list example.txt
./save_archived example.com
./save_archived_by_list example.txt
./save_archivedmap portal.com
./save_maps_by_list portals.txt
./save_collection
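To preview what saving by list would do, the list can be turned into individual commands first. This is only a sketch: the list format (one domain per line, blank lines and lines starting with `#` skipped) is an assumption, `plan_from_list` is a hypothetical helper, and it does not reflect how `save_by_list` is actually implemented.

```shell
# Hypothetical helper: print one ./save_gently command per domain in a list file.
# Assumed list format: one domain per line; blank lines and lines
# starting with '#' are skipped. Carriage returns from Windows editors are removed.
plan_from_list() {
  tr -d '\r' < "$1" | while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    printf './save_gently %s\n' "$line"
  done
}
```

Running `plan_from_list example.txt` prints the commands without executing them, so the list can be reviewed before a long download starts.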
Each site may contain thousands or even tens of thousands of files, which is why the scripts zip downloaded sites. Zipping also avoids potential issues with filenames that some file systems do not support.
Mount zips: use Zipster on macOS, avfs on Linux, or a tool like WinMount on Windows. That way you don't have to unpack the zip files to work with the sites.
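For a site folder that was downloaded some other way, the zipping step can be reproduced by hand. The sketch below assumes the `zip` command-line tool is installed; `zip_site` is a hypothetical helper and may differ from what the scripts do internally.

```shell
# Hypothetical helper: pack a downloaded site folder into <folder>.zip.
# The original folder is left in place.
zip_site() {
  site_dir=$1
  [ -d "$site_dir" ] || { echo "No such folder: $site_dir" >&2; return 1; }
  # -r recurses into subfolders, -q keeps the output quiet.
  zip -r -q "$site_dir.zip" "$site_dir"
}
```

After `zip_site example.com`, the archive `example.com.zip` sits next to the folder, and `unzip -l example.com.zip` lists its contents without unpacking.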
For sites behind Cloudflare, visit the site in a browser first, then pass the generated cookie as a parameter.
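One way to pass the cookie is as a request header. The sketch below assumes that the cookie in question is Cloudflare's `cf_clearance` and that `./save` forwards extra options to wget, as the `--header` usage example suggests; `cookie_header` and the placeholder value are hypothetical.

```shell
# Hypothetical helper: build a wget --header option from a Cloudflare
# clearance cookie value copied out of the browser.
cookie_header() {
  printf '%s%s\n' '--header=Cookie: cf_clearance=' "$1"
}

# Placeholder; paste the real cookie value from the browser here.
CF_COOKIE='PASTE_COOKIE_VALUE_HERE'

# Uncomment to run (assumes ./save passes the option through to wget):
# ./save example.com http://www.example.com "$(cookie_header "$CF_COOKIE")"
```

Cloudflare may also tie the cookie to the browser's User-Agent, in which case the same string would need to be passed with wget's `--user-agent` option.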
Sites saved from the Wayback Machine are intended for restoration, not for browsing. Use save_archivedmap for portals.
Check downloaded sites.
Version for Windows Shell, maybe with a simple GUI.
Autotest sites after download.
Indicate the date when an existing site was downloaded; maybe resave it (preserving the old copy) if that was long ago.
Save from the Wayback Machine with parameters (e.g., only specified URL).
Save site maps from the Web (using the wget --spider flag), seeing something like this.
Allow resuming incomplete downloads (back them up first).
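As an illustration of the `wget --spider` idea in the list above, the URLs visited during a crawl can be pulled out of wget's log. This is a rough sketch, not a planned implementation: it assumes wget's default verbose output, where each request line has the form `--DATE TIME--  URL`, and `urls_from_wget_log` is a hypothetical helper.

```shell
# Hypothetical helper: extract unique URLs from verbose wget output read on stdin,
# assuming request lines of the form "--DATE TIME--  URL".
urls_from_wget_log() {
  grep '^--' | awk '{ print $3 }' | sort -u
}

# Uncomment to crawl without saving files (wget writes its log to stderr):
# wget --spider --recursive --level=2 http://www.example.com 2>&1 | urls_from_wget_log
```

The `--level=2` depth is an arbitrary choice for the example; a deeper crawl gives a fuller map at the cost of more requests.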