diff --git a/.flake8 b/.flake8
new file mode 100644
index 0000000..0f6deaa
--- /dev/null
+++ b/.flake8
@@ -0,0 +1,89 @@
+# vim: set syntax=dosini:
+[flake8]
+exclude = .*,__pycache__
+max-line-length = 120
+
+# B001 Do not use bare `except:`
+# C408 Unnecessary dict call - rewrite as a literal
+# D,DAR: this project has incomplete documentation
+# E221 multiple spaces before operator
+# E303 too many blank lines
+# E722 do not use bare 'except'
+# E741 ambiguous variable name
+# E800 Found commented out code
+# N801 class name 'open_outfile' should use CapWords convention
+# N806 variable in function should be lowercase
+# P101 format string does contain unindexed parameters
+# S101 Use of assert detected
+# S324 Use of weak MD5 hash for security. Consider usedforsecurity=False
+# S406 Using escape to parse untrusted XML data is known to be vulnerable to XML attacks
+# S410 Using lxml to parse untrusted XML data is known to be vulnerable to XML attacks
+# WPS100 Found wrong module name
+# WPS110 Found wrong variable name
+# WPS111 Found too short name
+# WPS113 Found same alias import
+# WPS114 Found underscored number name pattern
+# WPS115 Found upper-case constant in a class
+# WPS120 Found regular name with trailing underscore
+# WPS2XX: Complexity
+# WPS300 Found local folder import
+# WPS301 Found dotted raw import: http.client
+# WPS305 Found `f` string
+# WPS306 Found class without a base class
+# WPS309 Found reversed compare order
+# WPS316 Found context manager with too many assignments
+# WPS317 Found incorrect multi-line parameters
+# WPS318 Found extra indentation
+# WPS319 Found bracket in wrong position
+# WPS322 Found incorrect multi-line string
+# WPS323 Found `%` string formatting
+# WPS326 Found implicit string concatenation
+# WPS329 Found useless `except` case
+# WPS330 Found unnecessary operator
+# WPS336 Found explicit string concatenation
+# WPS337 Found multiline conditions
+# WPS347 Found vague import that may cause confusion
+# WPS360 Found an unnecessary use of a raw string
+# WPS361 Found an inconsistently structured comprehension
+# WPS414 Found incorrect unpacking target
+# WPS420 Found wrong keyword
+# WPS421 Found wrong function call
+# WPS429 Found multiple assign targets
+# WPS430 Found nested function
+# WPS431 Found nested class
+# WPS432 Found magic number
+# WPS433 Found nested import
+# WPS437 Found protected attribute usage
+# WPS440 Found block variables overlap
+# WPS440 Found block variables overlap
+# WPS441 Found control variable used after block
+# WPS442 Found outer scope names shadowing
+# WPS457 Found an infinite while loop
+# WPS458 Found imports collision: argparse
+# WPS460 Found single element destructuring
+# WPS462 Wrong multiline string usage
+# WPS463 Found a getter without a return value
+# WPS473 Found too many empty lines in `def`: 6 > 5
+# WPS501 Found `finally` in `try` block without `except`
+# WPS504 Found negated condition
+# WPS505 Found nested `try` block
+# WPS508 Found incorrect `not` with compare usage
+# WPS509 Found incorrectly nested ternary
+# WPS510 Found `in` used with a non-set container
+# WPS515 Found `open()` used without a context manager
+# WPS516 Found `type()` used to compare types
+# WPS519 Found implicit `sum()` call
+# WPS529 Found implicit `.get()` dict usage
+# WPS531 Found simplifiable returning `if` condition in a function
+# WPS602 Found using `@staticmethod`
+# WPS604 Found incorrect node inside `class` body
+# WPS605 Found method without arguments
+# WPS608 Found incorrect `super()` call
+# WPS609 Found direct magic attribute usage
+# WPS613 Found incorrect `super()` call context
+# WPS615 Found unpythonic getter or setter
+extend-ignore = B001,C408,D,DAR,E221,E303,E722,E741,E800,N801,N806,P101,S101,S324,S406,S410,WPS100,WPS110,WPS111,WPS113,WPS114,WPS115,WPS120,WPS2,WPS300,WPS301,WPS305,WPS306,WPS309,WPS316,WPS317,WPS318,WPS319,WPS322,WPS323,WPS326,WPS329,WPS330,WPS336,WPS337,WPS347,WPS360,WPS361,WPS414,WPS420,WPS421,WPS429,WPS430,WPS431,WPS432,WPS433,WPS437,WPS440,WPS440,WPS441,WPS442,WPS457,WPS458,WPS460,WPS462,WPS463,WPS473,WPS501,WPS504,WPS505,WPS508,WPS509,WPS510,WPS515,WPS516,WPS519,WPS529,WPS531,WPS602,WPS604,WPS605,WPS608,WPS609,WPS613,WPS615
+
+# E131 continuation line unaligned for hanging indent
+per-file-ignores =
+     tumblr_backup/is_reblog.py: E131
diff --git a/.gitignore b/.gitignore
index fce19e4..edcb236 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,5 @@
-settings.py
+/*.egg-info
+/.*/
+/dist/
+*.pyc
+__pycache__/
diff --git a/README.md b/README.md
index 4d4fe95..ffc1be4 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,353 @@
-# tumblr-utils
+# tumblr-backup
 
-This is a collection of utilities dealing with Tumblr blogs.
+### About this fork
 
-- `tumble.py` creates new posts from RSS or Atom feeds
-- `tumblr_backup.py` makes a local backup of posts and images
-- `mail_export.py` mails tagged links to a recipient list
+This is a fork of bbolli's
+[tumblr-utils](https://github.com/bbolli/tumblr-utils), with a focus on
+tumblr\_backup.py. It adds Python 3 compatibility, various bug fixes, a few
+enhancements to normal operation, support for dashboard-only blogs, and several
+other features - see the output of `tumblr-backup --help` for the full list of
+options.
 
-These scripts are or have been useful to me over the years.
+---
 
-More documentation can be found in each script's docstring or in
-[tumblr_backup.md](https://github.com/bbolli/tumblr-utils/blob/master/tumblr_backup.md).
+## 0. Description
 
-The utilities run under Python 2.7.
+tumblr-backup is a script that backs up your [Tumblr](http://tumblr.com) blog
+locally.
 
-### Notice
+The backup includes all images, both from inline text and from photo posts. An
+index links to monthly pages, which contain all the posts from the respective
+month with links to single post pages. Command line options select which posts
+to back up and set the output format. Audio and video files can also be
+saved.
 
-On 2015-06-04, I made the v2 API the default on the master branch. The former
-master branch using the v1 API is still available on Github as `api-v1`, but
-will no longer be updated. The one feature that's only available with the old
-API is the option to backup password-protected blogs. There's no way to pass
-a password in Tumblr's v2 API.
+By default, all posts of a blog are backed up in minimally styled HTML5.
 
-### License
+You can see an example of its output [on my home page](http://drbeat.li/tumblr).
 
-[GPL3](http://www.gnu.org/licenses/gpl-3.0.txt).
+
+## 1. Installation
+
+1. `pip install tumblr-backup`
+2. Create an "app" at https://www.tumblr.com/oauth/apps. Follow the instructions
+   there; most values entered don't matter.
+3. `tumblr-backup --set-api-key API_KEY`, where API\_KEY is the OAuth Consumer
+   Token from the app created in the previous step.
+4. Run `tumblr-backup blog-name` as often as you like, manually or from a cron
+   job.
+
+There are several optional dependencies that enable additional features:
+
+1. To back up audio and video, install `tumblr-backup[video]`, or you can
+   manually install either yt-dlp or youtube\_dl. If you need HTTP cookies to
+   download, use an appropriate browser plugin to extract the cookie(s) into a
+   file and use option `--cookiefile=file`. See
+   [issue 132](https://github.com/bbolli/tumblr-utils/issues/132).
+2. To enable EXIF tagging, install `tumblr-backup[exif]`, or you can manually
+   install py3exiv2.
+3. To back up notes with the `--save-notes` option, install
+   `tumblr-backup[notes]`, or you can manually install beautifulsoup4 and lxml.
+4. To use the `-F`/`--filter` option to filter the downloaded posts with arbitrary
+   rules based on their metadata, install `tumblr-backup[jq]`. Alternatively,
+   you can manually install the [jq](https://github.com/mwilliamson/jq.py)
+   module.
+5. To install tumblr-backup with all optional features available, use
+   `pip install tumblr-backup[all]`.
+
+
+## 2. Usage
+
+### Synopsis
+
+    tumblr-backup [options] blog-name ...
+
+### Options
+
+```
+positional arguments:
+  blogs
+
+options:
+  -h, --help            show this help message and exit
+  -O OUTDIR, --outdir OUTDIR
+                        set the output directory (default: blog-name)
+  -D, --dirs            save each post in its own folder
+  -q, --quiet           suppress progress messages
+  -i, --incremental     incremental backup mode
+  -l, --likes           save a blog's likes, not its posts
+  -k, --skip-images     do not save images; link to Tumblr instead
+  --save-video          save all video files
+  --save-video-tumblr   save only Tumblr video files
+  --save-audio          save audio files
+  --save-notes          save a list of notes for each post
+  --copy-notes          copy the notes list from a previous archive (inverse:
+                        --no-copy-notes)
+  --notes-limit COUNT   limit requested notes to COUNT, per-post
+  --cookiefile COOKIEFILE
+                        cookie file for youtube-dl, --save-notes, and svc API
+  -j, --json            save the original JSON source
+  -b, --blosxom         save the posts in blosxom format
+  -r, --reverse-month   reverse the post order in the monthly archives
+  -R, --reverse-index   reverse the index file order
+  --tag-index           also create an archive per tag
+  -a HOUR, --auto HOUR  do a full backup at HOUR hours, otherwise do an
+                        incremental backup (useful for cron jobs)
+  -n COUNT, --count COUNT
+                        save only COUNT posts
+  -s SKIP, --skip SKIP  skip the first SKIP posts
+  -p PERIOD, --period PERIOD
+                        limit the backup to PERIOD ('y', 'm', 'd',
+                        YYYY[MM[DD]][Z], or START,END)
+  -N COUNT, --posts-per-page COUNT
+                        set the number of posts per monthly page, 0 for
+                        unlimited
+  -Q REQUEST, --request REQUEST
+                        save posts matching the request
+                        TYPE:TAG:TAG:…,TYPE:TAG:…,…. TYPE can be text, quote,
+                        link, answer, video, audio, photo, chat or any; TAGs
+                        can be omitted or a colon-separated list. Example: -Q
+                        any:personal,quote,photo:me:self
+  -t REQUEST, --tags REQUEST
+                        save only posts tagged TAGS (comma-separated values;
+                        case-insensitive)
+  -T REQUEST, --type REQUEST
+                        save only posts of type TYPE (comma-separated values
+                        from text, quote, link, answer, video, audio, photo,
+                        chat)
+  -F FILTER, --filter FILTER
+                        save posts matching a jq filter (needs jq module)
+  --no-reblog           don't save reblogged posts
+  --only-reblog         save only reblogged posts
+  -I FMT, --image-names FMT
+                        image filename format ('o'=original, 'i'=<post-id>,
+                        'bi'=<blog-name>_<post-id>)
+  -e KW, --exif KW      add EXIF keyword tags to each picture (comma-separated
+                        values; '-' to remove all tags, '' to add no extra
+                        tags)
+  -S, --no-ssl-verify   ignore SSL verification errors
+  --prev-archives DIRS  comma-separated list of directories (one per blog)
+                        containing previous blog archives
+  --no-post-clobber     Do not re-download existing posts
+  --no-server-timestamps
+                        don't set local timestamps from HTTP headers
+  --hostdirs            Generate host-prefixed directories for media
+  --user-agent USER_AGENT
+                        User agent string to use with HTTP requests
+  --skip-dns-check      Skip DNS checks for internet access
+  --threads THREADS     number of threads to use for post retrieval
+  --continue            Continue an incomplete first backup
+  --ignore-diffopt      Force backup over an incomplete archive with different
+                        options
+  --no-get              Don't retrieve files not found in --prev-archives
+  --reuse-json          Reuse the API responses saved with --json (implies
+                        --copy-notes)
+  --internet-archive    Fall back to the Internet Archive for Tumblr media 403
+                        and 404 responses
+  --media-list          Save post media URLs to media.json
+  --id-file FILE        file containing a list of post IDs to save, one per
+                        line
+  --json-info           Just print some info for each blog, don't make a
+                        backup
+```
+
+### Arguments
+
+_blog-name_: The name of the blog to back up.
+
+If your blog is under `.tumblr.com`, you can give just the first domain name
+part; if your blog is under your own domain, give the whole domain name. You
+can give more than one _blog-name_ to back up multiple blogs in one go.
+
+The default blog name(s) can be changed by copying `settings.py.example` to
+`settings.py` and adding the name(s) to the `DEFAULT_BLOGS` list.
+
+### Environment variables
+
+`LC_ALL`, `LC_TIME`, `LANG`: These variables, in decreasing importance,
+determine the locale for month names and the date/time format.
+
+### Exit code
+
+The exit code is 0 if at least one post has been backed up, 1 if no post has
+been backed up, 2 on invocation errors, 3 if the backup was interrupted, or 4
+on HTTP errors.
+
+
+## 3. Operation
+
+By default, tumblr-backup backs up all posts in HTML format.
+
+The generated directory structure looks like this:
+
+    ./ - the current directory
+        <outdir>/ - your blog backup
+            index.html - table of contents with links to the monthly pages
+            backup.css - the default backup style sheet
+            custom.css - the user's style sheet (optional)
+            override.css - the user's style sheet override (optional)
+            archive/
+                <yyyy-mm-pnn>.html - the monthly pages
+                …
+            posts/
+                <id>.html - the single post pages
+                …
+            media/
+                <image.ext> - image files
+                <audio>.mp3 - audio files
+                <video>.mp4 - video files
+                …
+            json/
+                <id>.json - the original JSON posts
+                …
+            tags/
+                index.html - the index of all tag indices
+                <tag>/index.html - the index for <tag>
+                    archive/
+                        <yyyy-mm-pnn>.html - the monthly pages for <tag>
+            theme/
+                avatar.<ext> - the blog’s avatar
+                style.css - the blog’s style sheet
+
+The default `outdir` is the `blog-name`.
+
+If option `-D` is used, one folder per post is generated, and the post's
+images are saved in the same folder. The monthly archive is also stored in a
+folder per month. This results in the same URL structure as on the Tumblr page.
+
+The directories look like this:
+
+    ./ - the current directory
+        <outdir>/ - your blog backup
+            index.html - table of contents with links to the monthly pages
+            backup.css - the default backup style sheet
+            custom.css - the user's style sheet (optional)
+            override.css - the user's style sheet override (optional)
+            archive/
+                <yyyy-mm-pnn>/
+                    index.html - the monthly page
+                …
+            posts/
+                <id>/
+                    index.html - the single post page
+                    <image.ext> - the image file(s) for this post
+                    <audio>.mp3 - audio files
+                    <video>.mp4 - video files
+                    …
+                …
+            json/
+                <id>.json - the original JSON posts
+                …
+            theme/
+                avatar.<ext> - the blog’s avatar
+                style.css - the blog’s style sheet
+
+The modification time of the single post pages is set to the post’s timestamp.
+tumblr-backup applies a simple style to the saved pages. All generated pages are
+[HTML5](http://html5.org).
+
+The index pages are recreated from scratch after every backup, based on the
+existing single post pages. Normally, the index and monthly pages are in
+reverse chronological order, i.e. more recent entries on top. The options `-R`
+and `-r` can be used to reverse the order.
+
+Option `--tag-index` creates a tag index for each tag used in the posts.
+It can be reached through the "Tag index" link in the main index.
+
+If you want to use a custom CSS file, call it `custom.css`, put it in the backup
+folder and do a complete backup. Without a custom CSS file, tumblr-backup saves
+a default style sheet in `backup.css`. The blog's style sheet itself is always
+saved in `theme/style.css`.
+
+If you want to override just a few default styles, create the file
+`override.css` in the backup folder. This file is included automatically by the
+default style sheet. You may have to mark your overriding styles with
+`!important` to make them stick because `override.css` is imported first in the
+style sheet.
+
+Tumblr saves some image files without extension. This probably saves a few
+billion bytes in their database. tumblr-backup restores the image extensions. If
+an image is already backed up, it is not downloaded again. If an image is
+re-uploaded/edited, the old image is kept in the backup, but no post links to
+it. The format of the image file names can be selected with the `-I` option.
+
+Note that saved inline images (from non-photo posts) keep their
+name. This means that only the first image with any given name will be saved;
+the others with the same name will point to the first one.
+
+The download of images can be disabled with option `-k`. In this case, the
+image URLs will point to the original location.
+
+With option `-e`, IPTC keyword tags can be added to image files. There are
+three possibilities:
+
+1. `-e kw1,kw2` adds the post's tags plus `kw1` and `kw2` as keywords
+2. `-e ''` adds just the post's tags
+3. `-e -` removes all keywords from the image
+
+In incremental backup mode, tumblr-backup saves only posts that have higher IDs
+than the highest ID saved locally. Note that posts that are edited after being
+backed up are not backed up again with this option.
+
+In JSON backup mode, the original JSON source returned by the Tumblr API is
+saved under the `json/` folder in addition to the HTML format.
+
+Automatic archive mode `-a` is designed to be used from an hourly cron script.
+It normally makes an incremental backup except if the current hour is the one
+given as argument. In this case, tumblr-backup will make a full backup. An
+example invocation is `tumblr-backup -qa4` to do a full backup at 4 in the
+morning. This option obviates the need for shell script logic to determine what
+options to pass. If you don't want cron to send a mail when no new posts have been
+backed up, use this crontab entry:
+
+    0 * * * * tumblr-backup -qa4 <blog-name> || test $? -eq 1
+
+This changes the exit code 1 to 0.
+
+In Blosxom format mode, the posts generated are saved in a format suitable for
+re-publishing in [Blosxom](http://www.blosxom.com) with the [Meta
+plugin](http://www.blosxom.com/plugins/meta/meta.htm). Images are not
+downloaded; instead, the image links point back to the original image on
+Tumblr. The posts are saved in the current folder with a `.txt` extension. The
+index is not updated.
+
+In order to limit the set of backed up posts, use the `-n` and `-s` options. The
+most recent post is always number 0, so the option `-n 200` would select the 200
+most recent posts. Calling `tumblr-backup -n 100 -s 200` would skip the 200 most
+recent posts and back up the next 100. `-n 1` is the fastest way to rebuild the
+index pages.
+
+The option `-T` limits the backup to posts of the given type. `-t` saves only
+posts with the given tags. `-Q` combines both: it accepts comma-separated
+requests of the form `TYPE:TAG1:TAG2:…`, where the tags for each post type can
+be different. Omitting the TAGs is allowed; this saves posts of this type with
+any or no tags. Example: `-Q any:personal,quote,photo:me:self` saves all posts
+tagged 'personal', all quotes, and photos tagged 'me' or 'self' or 'personal'
+(because of the `any` request).
+
+The option `--no-reblog` suppresses the backup of reposts of other blogs'
+posts.
+
+If you combine `-n`, `-s`, `-i`, `-p`, `-t`, `-T`, `-Q` and `--no-reblog`, only
+posts matching all criteria will be backed up.
+
+All options use only public Tumblr APIs, so you can use the program to back up
+blogs that you don’t own.
+
+tumblr-backup is developed and tested on Linux and OS X. If you want to run it
+under Windows, I suggest trying the excellent [Cygwin](http://cygwin.com)
+environment.
+
+
+## 4. Changelog
+
+See [here](https://github.com/cebtenzzre/tumblr-utils/commits).
+
+
+## 5. Acknowledgments
+
+- [bdoms](https://github.com/bdoms/tumblr_backup) for the initial implementation
+- [WyohKnott](https://github.com/WyohKnott) for numerous bug reports and patches
+- [Tumblr](https://www.tumblr.com) for their discontinued backup tool whose
+  output was the inspiration for the styling applied in `tumblr_backup`.
+- [Beat Bolli](https://github.com/bbolli/tumblr-utils)
diff --git a/mail_export.py b/mail_export.py
deleted file mode 100755
index 10f2f90..0000000
--- a/mail_export.py
+++ /dev/null
@@ -1,140 +0,0 @@
-#!/usr/bin/env python
-
-"""E-mails a user's recent Tumblr links to recipients.
-
-I use this to automate the distribution of interesting links to
-my geek-buddies mailing list.
-
-- tag your links with a special tag
-- run this script with your tumblr blog name, this tag and the recipient(s)
-  as arguments every hour or so
-
-The script needs write permissions in /var/local to save the ID of the
-most recently mailed link. This ID is saved independently per user and tag.
-"""
-
-import os
-import re
-import smtplib
-import textwrap
-import urllib
-import urlparse
-from email.mime.text import MIMEText
-try:
-    import json
-except ImportError:
-    # Python 2.5 and earlier need this package
-    import simplejson as json
-
-
-# configuration
-SMTP_SERVER = 'localhost'
-SENDER = 'bbolli@ewanet.ch'
-
-
-class TumblrToMail:
-
-    def __init__(self, user, tag, recipients):
-        self.user = self.domain = user
-        if '.' not in self.domain:
-            self.domain += '.tumblr.com'
-        self.tag = tag
-        self.recipients = recipients
-        self.db_file = os.path.expanduser('~/.config/tumblr_mail.latest')
-        self.db_key = (user, tag)
-        try:
-            self.db = eval(open(self.db_file).read(), {}, {})
-        except:
-            self.db = {}
-        self.latest = self.db.get(self.db_key, 0)
-        self.lw = textwrap.TextWrapper(initial_indent='* ', subsequent_indent='  ',
-            break_long_words=False, break_on_hyphens=False
-        )
-        self.tw = textwrap.TextWrapper(initial_indent='  ', subsequent_indent='  ')
-
-    def __del__(self):
-        if self.latest:
-            self.db[self.db_key] = self.latest
-            open(self.db_file, 'w').write(repr(self.db))
-
-    def get_links(self):
-        url = 'http://%s/api/read/json?type=link&filter=text' % self.domain
-        posts = urllib.urlopen(url).read()
-        posts = re.sub(r'^.*?(\{.*\});*$', r'\1', posts)   # extract the JSON structure
-        try:
-            posts = json.loads(posts)
-        except ValueError:
-            print posts
-            return []
-        return [
-            p for p in posts['posts']
-            if int(p['id']) > self.latest and self.tag in p.get('tags', [])
-        ]
-
-    def make_mail(self, link):
-        url = list(urlparse.urlsplit(link['link-url']))
-        url[2] = urllib.quote(url[2])
-        mail = self.lw.fill(u'%s: %s' % (link['link-text'], urlparse.urlunsplit(url)))
-        desc = link['link-description']
-        if desc:
-            mail += '\n\n' + self.tw.fill(desc)
-        return mail
-
-    def run(self, options):
-        links = self.get_links()
-        if not links:
-            return
-
-        body = ('\n\n'.join(self.make_mail(l) for l in links)).strip() + """
-
--- 
-http://%s
-""" % self.domain
-
-        self.latest = max(int(l['id']) for l in links) if not options.dry_run else None
-
-        if not self.recipients and not options.full:
-            print body
-            return
-
-        msg = MIMEText(body.encode('utf-8'))
-        msg.set_charset('utf-8')
-        msg['Subject'] = "Interesting links" if len(links) > 1 else links[0]['link-text']
-        msg['From'] = '%s (%s)' % (SENDER, self.user)
-        if self.recipients:
-            msg['To'] = ', '.join(self.recipients)
-
-        if options.full:
-            print msg.as_string()
-            return
-
-        smtp = smtplib.SMTP(SMTP_SERVER)
-        smtp.sendmail(SENDER, self.recipients, msg.as_string())
-        smtp.quit()
-
-
-def main():
-    import optparse
-    parser = optparse.OptionParser("Usage: %prog [options] blog-name tag [recipient ...]",
-        description="Sends an email generated from tagged link posts.",
-        epilog="Without recipients, prints the mail body to stdout."
-    )
-    parser.add_option('-d', '--dry-run', action='store_true',
-        help="don't save which link was sent last"
-    )
-    parser.add_option('-f', '--full', action='store_true',
-        help="print the full mail with headers to stdout"
-    )
-    options, args = parser.parse_args()
-    try:
-        user = args[0]
-        tag = args[1]
-        recipients = args[2:]
-    except IndexError:
-        parser.error("blog-name and tag are required arguments.")
-
-    TumblrToMail(user, tag, recipients).run(options)
-
-
-if __name__ == '__main__':
-    main()
diff --git a/oauth.py b/oauth.py
deleted file mode 100644
index 7cabc29..0000000
--- a/oauth.py
+++ /dev/null
@@ -1,68 +0,0 @@
-import sys
-import urlparse
-import oauth2 as oauth
-import urllib
-
-consumer_key = sys.argv[1]
-consumer_secret = sys.argv[2]
-
-request_token_url = 'http://www.tumblr.com/oauth/request_token'
-access_token_url = 'http://www.tumblr.com/oauth/access_token'
-authorize_url = 'http://www.tumblr.com/oauth/authorize'
-
-consumer = oauth.Consumer(consumer_key, consumer_secret)
-client = oauth.Client(consumer)
-
-# Step 1: Get a request token. This is a temporary token that is used for
-# having the user authorize an access token and to sign the request to obtain
-# said access token.
-
-resp, content = client.request(request_token_url, "POST",
-    body=urllib.urlencode({"oauth_callback": "http://www.tumblr.com/omgwtf"})
-)
-if resp['status'] != '200':
-    raise Exception("Invalid response %s." % resp['status'])
-
-request_token = dict(urlparse.parse_qsl(content))
-
-print "Request Token:"
-print "    - oauth_token        = %s" % request_token['oauth_token']
-print "    - oauth_token_secret = %s\n" % request_token['oauth_token_secret']
-
-# Step 2: Redirect to the provider. Since this is a CLI script we do not
-# redirect. In a web application you would redirect the user to the URL
-# below.
-
-print "Go to the following link in your browser:"
-print "%s?%s\n" % (authorize_url, urllib.urlencode({
-    "oauth_token": request_token['oauth_token'],
-    "oauth_callback": 'http://localhost/doctorstrange'
-}))
-
-# After the user has granted access to you, the consumer, the provider will
-# redirect you to whatever URL you have told them to redirect to. You can
-# usually define this in the oauth_callback argument as well.
-accepted = 'n'
-while accepted.lower() == 'n':
-    accepted = raw_input('Have you authorized me? (y/n) ')
-    oauth_verifier = raw_input('What is the OAuth Verifier? ')
-
-# Step 3: Once the consumer has redirected the user back to the oauth_callback
-# URL you can request the access token the user has approved. You use the
-# request token to sign this request. After this is done you throw away the
-# request token and use the access token returned. You should store this
-# access token somewhere safe, like a database, for future use.
-token = oauth.Token(
-    request_token['oauth_token'], request_token['oauth_token_secret']
-)
-token.set_verifier(oauth_verifier)
-client = oauth.Client(consumer, token)
-
-resp, content = client.request(access_token_url, "POST")
-access_token = dict(urlparse.parse_qsl(content))
-
-print "Access Token:"
-print "    - oauth_token        = %s" % access_token['oauth_token']
-print "    - oauth_token_secret = %s" % access_token['oauth_token_secret']
-print
-print "You may now access protected resources using the access tokens above.\n"
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..5c55d3b
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,91 @@
+[project]
+name = "tumblr-backup"
+version = "1.0.3"
+description = "An advanced tool for backing up Tumblr blogs."
+readme = "README.md"
+requires-python = ">=3.8"
+classifiers = [
+    "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3 :: Only",
+    "Programming Language :: Python :: 3.8",
+    "Programming Language :: Python :: 3.9",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+]
+dependencies = [
+    "filetype~=1.2",
+    "platformdirs~=4.2",
+    "requests~=2.31",
+    "urllib3~=2.2",
+]
+
+[project.optional-dependencies]
+exif = ["py3exiv2~=0.12"]
+jq = ["jq~=1.6"]
+notes = ["beautifulsoup4~=4.12", "lxml~=5.1"]
+video = ["yt_dlp>=2023.12.30"]
+all = ["tumblr-backup[exif,jq,notes,video]"]
+dev = [
+    "tumblr-backup[all]",
+
+    # dev tools
+    "flake8~=7.0",
+    "mypy~=1.9",
+    "pytype>=2024.2.27",
+    "wemake-python-styleguide~=0.18",
+
+    # type stubs and other optional modules
+    "lxml-stubs~=0.5",
+    "pysocks~=1.7",
+    "types-beautifulsoup4~=4.12",
+    "types-requests~=2.31",
+    "youtube_dl>=2021.12.17",
+]
+
+[project.urls]
+"Homepage" = "https://github.com/cebtenzzre/tumblr-utils"
+"Bug Reports" = "https://github.com/cebtenzzre/tumblr-utils/issues"
+"Source" = "https://github.com/cebtenzzre/tumblr-utils"
+
+[project.scripts]
+tumblr-backup = "tumblr_backup.main:main"
+tb-login = "tumblr_backup.login:main"
+
+[build-system]
+requires = ["setuptools>=61.0.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[tool.mypy]
+files = 'tumblr_backup'
+pretty = true
+strict = true
+warn_unused_ignores = false
+allow_untyped_calls = true
+warn_return_any = false
+allow_subclassing_any = true
+allow_untyped_defs = true
+allow_incomplete_defs = true
+disable_error_code = ['import-untyped']
+
+[tool.pytype]
+inputs = ['tumblr_backup']
+jobs = 'auto'
+bind_decorated_methods = true
+none_is_not_bool = true
+overriding_renamed_parameter_count_checks = true
+strict_none_binding = true
+precise_return = true
+# protocols:
+# - https://github.com/google/pytype/issues/1423
+# - https://github.com/google/pytype/issues/1424
+# strict_import: https://github.com/google/pytype/issues/1444
+# strict_parameter_checks: https://github.com/google/pytype/issues/365
+strict_primitive_comparisons = true
+# strict_undefined_checks: too many false positives
+
+[tool.isort]
+src_paths = ['tumblr_backup']
+line_length = 120
+combine_as_imports = true
diff --git a/settings.py.example b/settings.py.example
deleted file mode 100644
index 74e7691..0000000
--- a/settings.py.example
+++ /dev/null
@@ -1,3 +0,0 @@
-DEFAULT_BLOGS = [
-    'example',
-]
diff --git a/tumble.py b/tumble.py
deleted file mode 100755
index 827c910..0000000
--- a/tumble.py
+++ /dev/null
@@ -1,167 +0,0 @@
-#!/usr/bin/env python3
-
-"""Read a feed from stdin and post its entries to tumblr.
-
-Options:
-    -b sub-blog         Post to a sub-blog of your account.
-    -c cred-file        The name of the credentials file.
-    -e post-id          Edit the existing post with the given ID.
-                        This only looks at the first entry of the feed.
-    -d                  Debug mode: print the raw post data instead
-                        of posting it to tumblr.com.
-"""
-
-"""Authorization is handled via OAuth. Prepare the file ~/.config/tumblr
-with 5 lines:
-
-    - your default blog name
-    - the consumer key
-    - the consumer secret
-    - the access token
-    - the access secret
-
-You get these values by registering for a new application on the tumblr
-developer site and running the included oauth.py with the consumer key and
-secret as arguments to get an access token and secret.
-
-Non-standard Python dependencies:
-    - simplejson (http://pypi.python.org/pypi/simplejson/; for Python <= 2.6)
-    - oauth2 (http://pypi.python.org/pypi/oauth2/)
-    - httplib2 (http://pypi.python.org/pypi/httplib2/)
-"""
-
-import sys
-import os
-import getopt
-from urllib.parse import urlencode
-from datetime import datetime
-from calendar import timegm
-import json
-
-import oauth2 as oauth
-import feedparser
-
-URL_FMT = 'http://api.tumblr.com/v2/blog/%s/post'
-CONFIG = '~/.config/tumblr'
-
-
-class Tumble:
-
-    def __init__(self):
-        self.blog = self.consumer_token = self.consumer_secret = \
-        self.access_token = self.access_secret = None
-        self.post_id = None
-        self.debug = False
-
-    def set_credentials(self, cred_file):
-        (
-            self.blog,
-            self.consumer_token, self.consumer_secret,
-            self.access_token, self.access_secret
-        ) = (s.strip() for s in open(cred_file))
-
-    def tumble(self, feed):
-        feed = feedparser.parse(feed)
-        if self.post_id:
-            return [self.post(feed.entries[0])]
-        else:
-            return [self.post(e) for e in feed.entries]
-
-    def post(self, entry):
-        # the first enclosure determines the media type
-        enc = entry.get('enclosures', [])
-        if enc:
-            enc = enc[0]
-        if enc and enc.type.startswith('image/'):
-            data = {
-                'type': 'photo', 'source': enc.href,
-                'caption': entry.title, 'link': entry.link
-            }
-        elif enc and enc.type.startswith('audio/'):
-            data = {
-                'type': 'audio', 'caption': entry.title, 'external-url': enc.href
-            }
-        elif 'link' in entry and entry.link:
-            data = {'type': 'link', 'url': entry.link, 'title': entry.title}
-            if 'content' in entry:
-                data['description'] = entry.content[0].value
-            elif 'summary' in entry:
-                data['description'] = entry.summary
-        elif 'content' in entry:
-            data = {'type': 'text', 'title': entry.title, 'body': entry.content[0].value}
-        elif 'summary' in entry:
-            data = {'type': 'text', 'title': entry.title, 'body': entry.summary}
-        else:
-            return 'unknown', entry
-        if 'tags' in entry:
-            data['tags'] = ','.join('"%s"' % t.term for t in entry.tags)
-        for d in ('published_parsed', 'updated_parsed'):
-            if d in entry:
-                pub = datetime.fromtimestamp(timegm(entry.get(d)))
-                data['date'] = pub.isoformat(' ')
-                break
-
-        if '.' not in self.blog:
-            self.blog += '.tumblr.com'
-        url = URL_FMT % self.blog
-        if self.post_id:
-            data['id'] = self.post_id
-            op = 'edit'
-            url += '/edit'
-        else:
-            op = 'post'
-        if self.debug:
-            return dict(url=url, entry=entry, data=data)
-
-        for k in data:
-            if type(data[k]) is str:
-                data[k] = data[k].encode('utf-8')
-
-        # do the OAuth thing
-        consumer = oauth.Consumer(self.consumer_token, self.consumer_secret)
-        token = oauth.Token(self.access_token, self.access_secret)
-        client = oauth.Client(consumer, token)
-        try:
-            headers, resp = client.request(url, method='POST', body=urlencode(data))
-            resp = json.loads(resp)
-        except ValueError as e:
-            return 'error', 'json', resp
-        except EnvironmentError as e:
-            return 'error', str(e)
-        if resp['meta']['status'] in (200, 201):
-            return op, str(resp['response']['id'])
-        else:
-            return 'error', headers, resp
-
-if __name__ == '__main__':
-    t = Tumble()
-    try:
-        opts, args = getopt.getopt(sys.argv[1:], 'hb:c:e:d')
-    except getopt.GetoptError:
-        print("Usage: %s [-b blog-name] [-c cred-file] [-e post-id] [-d]" %
-            sys.argv[0].split(os.sep)[-1])
-        sys.exit(1)
-    for o, v in opts:
-        if o == '-h':
-            print(__doc__.strip())
-            sys.exit(0)
-        if o == '-b':
-            t.blog = v
-        elif o == '-c':
-            CONFIG = v
-        elif o == '-e':
-            t.post_id = v
-        elif o == '-d':
-            t.debug = True
-    try:
-        t.set_credentials(os.path.expanduser(CONFIG))
-    except EnvironmentError:
-        sys.stderr.write('Credentials file %s not found or not readable\n' % CONFIG)
-        sys.exit(1)
-    result = t.tumble(sys.stdin.buffer)  # read stdin in binary mode
-    if result:
-        import pprint
-        pprint.pprint(result)
-        if not t.debug and 'error' in [r[0] for r in result]:
-            sys.exit(2)
-    sys.exit(0)
diff --git a/tumblr_backup.md b/tumblr_backup.md
deleted file mode 100644
index 1ac6eb1..0000000
--- a/tumblr_backup.md
+++ /dev/null
@@ -1,312 +0,0 @@
-## 0. Description
-
-`tumblr_backup.py` is a script that backs up your [Tumblr](http://tumblr.com)
-blog locally.
-
-The backup includes all images, both from inline text and from photo posts. An index links to
-monthly pages, which contain all the posts from the respective month with links
-to single post pages. Command line options select which posts to backup and set
-the output format. The audio and video files can also be saved.
-
-By default, all posts of a blog are backed up in minimally styled HTML5.
-
-You can see an example of its output [on my home page](http://drbeat.li/tumblr).
-
-
-## 1. Installation
-
-1. Download and unzip
-   [tumblr-utils.zip](https://github.com/bbolli/tumblr-utils/zipball/master)
-   or clone the Github repo from `git://github.com/bbolli/tumblr-utils.git`.
-2. Copy or symlink `tumblr_backup.py` to a directory on your `$PATH` like
-   `~/bin` or `/usr/local/bin`.
-3. Get your personal Tumblr API key. Before June 2020, the author's API key
-   was distributed with the source code; then Tumblr denied access using
-   this key. Now, each user needs to get their own key at
-   <https://www.tumblr.com/oauth/apps>. Follow the instructions there; most
-   values entered don't matter. The API key must then be copied between the
-   single quotes in the source code at around line 105 (the line starts with
-   `API_KEY = `).
-4. Run `tumblr_backup.py` _blog-name_ as often as you like manually or from a
-   cron job. The recommendation is to do an hourly incremental backup and a
-   daily complete one.
-
-There are two optional dependencies that enable additional features:
-
-1. To backup audio and video, install [youtube-dl](https://rg3.github.io/youtube-dl/).
-   If you need HTTP cookies to download, use an appropriate browser plugin to
-   extract the cookie(s) into a file and use option `--cookiefile=file`. See
-   [issue 132](https://github.com/bbolli/tumblr-utils/issues/132).
-2. To enable EXIF tagging, install [pyexiv2](https://github.com/escaped/pyexiv2).
-
-The fastest option to install these packages is via the package manager of
-your operating system (apt-get, synaptic, yum, brew, etc). If this is not
-feasible, download, build and install from the links above.
-
-
-## 2. Usage
-
-### Synopsis
-
-    tumblr_backup.py [options] blog-name ...
-
-### Options
-
-    -h, --help            show this help message and exit
-    -O OUTDIR, --outdir=OUTDIR
-                          set the output directory (default: blog-name)
-    -D, --dirs            save each post in its own folder
-    -q, --quiet           suppress progress messages
-    -i, --incremental     incremental backup mode
-    -l, --likes           save a blog's likes, not its posts
-    -j, --json            save the original JSON source
-    -k, --skip-images     do not save images; link to Tumblr instead
-    --save-video          save all video files
-    --save-video-tumblr   save only Tumblr video files
-    --save-audio          save audio files
-    --cookiefile=FILE     cookie file for youtube-dl
-    -b, --blosxom         save the posts in blosxom format
-    -r, --reverse-month   reverse the post order in the monthly archives
-    -R, --reverse-index   reverse the index file order
-    --tag-index           also create an archive per tag
-    -a HOUR, --auto=HOUR  do a full backup at HOUR hours, otherwise do an
-                          incremental backup (useful for cron jobs)
-    -n COUNT, --count=COUNT
-                          save only COUNT posts
-    -s SKIP, --skip=SKIP  skip the first SKIP posts
-    -p PERIOD, --period=PERIOD
-                          limit the backup to PERIOD:
-                            'y': the current year
-                            'm': the current month
-                            'd': the current day (i.e. today ;-)
-                            YYYY: the given year
-                            YYYY-MM: the given month
-                            YYYY-MM-DD: the given day
-    -N COUNT, --posts-per-page=COUNT
-                          set the number of posts per monthly page
-    -Q REQUEST, --request=REQUEST
-                          save posts matching the request
-                          TYPE:TAG:TAG:…,TYPE:TAG:…,…. TYPE can be text, quote,
-                          link, answer, video, audio, photo, chat or any; TAGs
-                          can be omitted or a colon-separated list. Example:
-                          -Q any:personal,quote,photo:me:self
-    -t TAGS, --tags=TAGS  save only posts tagged TAGS (comma-separated values;
-                          case-insensitive)
-    -T TYPE, --type=TYPE  save only posts of type TYPE (comma-separated values;
-                          from text, quote, link, answer, video, audio, photo,
-                          chat)
-    --no-reblog           don't save reblogged posts
-    -I FMT, --image-names=FMT
-                          image filename format ('o'=original, 'i'=<post-id>,
-                          'bi'=<blog-name>_<post-id>)
-    -e KW, --exif=KW      add EXIF keyword tags to each picture (comma-separated
-                          values; '-' to remove all tags, '' to add no extra
-                          tags)
-    -S, --no-ssl-verify   ignore SSL verification errors
-
-### Arguments
-
-_blog-name_: The name of the blog to backup.
-
-If your blog is under `.tumblr.com`, you can give just the first domain name
-part; if your blog is under your own domain, give the whole domain name. You
-can give more than one _blog-name_ to backup multiple blogs in one go.
-
-The default blog name(s) can be changed by copying `settings.py.example` to
-`settings.py` and adding the name(s) to the `DEFAULT_BLOGS` list.
-
-### Environment variables
-
-`LC_ALL`, `LC_TIME`, `LANG`: These variables, in decreasing importance,
-determine the locale for month names and the date/time format.
-
-### Exit code
-
-The exit code is 0 if at least one post has been backed up, 1 if no post has
-been backed up, 2 on invocation errors, 3 if the backup was interrupted, or 4
-on HTTP errors.
-
-
-## 3. Operation
-
-By default, `tumblr_backup` backs up all posts in HTML format.
-
-The generated directory structure looks like this:
-
-    ./ - the current directory
-        <outdir>/ - your blog backup
-            index.html - table of contents with links to the monthly pages
-            backup.css - the default backup style sheet
-            custom.css - the user's style sheet (optional)
-            override.css - the user's style sheet override (optional)
-            archive/
-                <yyyy-mm-pnn>.html - the monthly pages
-                …
-            posts/
-                <id>.html - the single post pages
-                …
-            media/
-                <image.ext> - image files
-                <audio>.mp3 - audio files
-                <video>.mp4 - video files
-                …
-            json/
-                <id>.json - the original JSON posts
-                …
-            tags/
-                index.html - the index of all tag indices
-                <tag>/index.html - the index for <tag>
-                    archive/
-                        <yyyy-mm-pnn>.html - the monthly pages for <tag>
-            theme/
-                avatar.<ext> - the blog’s avatar
-                style.css - the blog’s style sheet
-
-The default `outdir` is the `blog-name`.
-
-If option `-D` is used, one folder per post is generated, and the post's
-images are saved in the same folder. The monthly archive is also stored in a
-folder per month. This results in the same URL structure as on the Tumblr page.
-
-The directories look like this:
-
-    ./ - the current directory
-        <outdir>/ - your blog backup
-            index.html - table of contents with links to the monthly pages
-            backup.css - the default backup style sheet
-            custom.css - the user's style sheet (optional)
-            override.css - the user's style sheet override (optional)
-            archive/
-                <yyyy-mm-pnn>/
-                    index.html - the monthly page
-                …
-            posts/
-                <id>/
-                    index.html - the single post page
-                    <image.ext> - the image file(s) for this post
-                    <audio>.mp3 - audio files
-                    <video>.mp4 - video files
-                    …
-                …
-            json/
-                <id>.json - the original JSON posts
-                …
-            theme/
-                avatar.<ext> - the blog’s avatar
-                style.css - the blog’s style sheet
-
-The modification time of the single post pages is set to the post’s timestamp.
-`tumblr_backup` applies a simple style to the saved pages. All generated pages
-are [HTML5](http://html5.org).
-
-The index pages are recreated from scratch after every backup, based on the
-existing single post pages. Normally, the index and monthly pages are in
-reverse chronological order, i.e. more recent entries on top. The options `-R`
-and `-r` can be used to reverse the order.
-
-Option `--tag-index` creates a tag index for each tag used in the posts.
-It can be reached through the "Tag index" link in the main index.
-
-If you want to use a custom CSS file, call it `custom.css`, put it in the
-backup folder and do a complete backup. Without a custom CSS file,
-`tumblr_backup` saves a default style sheet in `backup.css`. The blog's style
-sheet itself is always saved in `theme/style.css`.
-
-If you want to override just a few default styles, create the file
-`override.css` in the backup folder. This file is included automatically by the
-default style sheet. You may have to mark your overriding styles with
-`!important` to make them stick because `override.css` is imported first in the
-style sheet.
-
-Tumblr saves some image files without extension. This probably saves a few
-billion bytes in their database. `tumblr_backup` restores the image extensions.
-If an image is already backed up, it is not downloaded again. If an image is
-re-uploaded/edited, the old image is kept in the backup, but no post links to
-it. The format of the image file names can be selected with the `-I` option.
-
-It must be noted that saved inline images (from non-photo posts) keep their
-name. This means that only the first image with any given name will be saved;
-the others with the same name will point to the first one.
-
-The download of images can be disabled with option `-k`. In this case, the
-image URLs will point to the original location.
-
-With option `-e`, IPTC keyword tags can be added to image files. There are
-three possibilities:
-
-1. `-e kw1,kw2` adds the post's tags plus `kw1` and `kw2` as keywords
-2. `-e ''` adds just the post's tags
-3. `-e -` removes all keywords from the image
-
-In incremental backup mode, `tumblr_backup` saves only posts that have higher
-ids than the highest id saved locally. Note that posts that are edited after
-being backed up are not backed up again with this option.
-
-In JSON backup mode, the original JSON source returned by the Tumblr API is saved
-under the `json/` folder in addition to the HTML format.
-
-Automatic archive mode `-a` is designed to be used from an hourly cron script.
-It normally makes an incremental backup except if the current hour is the one
-given as argument. In this case, `tumblr_backup` will make a full backup. An
-example invocation is `tumblr_backup.py -qa4` to do a full backup at 4 in the
-morning. This option obviates the need for shell script logic to determine what
-options to pass. If you don't want cron to send a mail if no new posts have
-been backed up, use this crontab entry:
-
-    0 * * * * tumblr_backup -qa4 <blog-name> || test $? -eq 1
-
-This changes the exit code 1 to 0.
-
-In Blosxom format mode, the posts generated are saved in a format suitable for
-re-publishing in [Blosxom](http://www.blosxom.com) with the [Meta
-plugin](http://www.blosxom.com/plugins/meta/meta.htm). Images are not
-downloaded; instead, the image links point back to the original image on
-Tumblr. The posts are saved in the current folder with a `.txt` extension. The
-index is not updated.
-
-In order to limit the set of backed up posts, use the `-n` and `-s` options.
-The most recent post is always number 0, so the option `-n 200` would select
-the 200 most recent posts. Calling `tumblr_backup -n 100 -s 200` would skip the
-200 most recent posts and backup the next 100. `-n 1` is the fastest way to
-rebuild the index pages.
-
-The option `-T` limits the backup to posts of the given type. `-t` saves only
-posts with the given tags. `-Q` combines both: it accepts comma-separated
-requests of the form `TYPE:TAG1:TAG2:…`, where the tags for each post type can
-be different. Omitting the TAGs is allowed; this saves posts of this type with
-any or no tags. Example: `-Q any:personal,quote,photo:me:self` saves all posts
-tagged 'personal', all quotes, and photos tagged 'me' or 'self' or 'personal'
-(because of the `any` request).
-
-The option `--no-reblog` suppresses the backup of reposts of other blogs'
-posts.
-
-If you combine `-n`, `-s`, `-i`, `-p`, `-t`, `-T`, `-Q` and `--no-reblog`, only
-posts matching all criteria will be backed up.
-
-All options use only public Tumblr APIs, so you can use the program to backup
-blogs that you don’t own.
-
-`tumblr_backup` is developed and tested on Linux and OS X. If you want to run
-it under Windows, I suggest trying the excellent [Cygwin](http://cygwin.com)
-environment.
-
-
-## 4. Changelog
-
-See [here](https://github.com/bbolli/tumblr-utils/commits/master/tumblr_backup.py).
-There are no formal releases so check back often!
-
-
-## 5. Acknowledgments
-
-- [bdoms](https://github.com/bdoms/tumblr_backup) for the initial implementation
-- [WyohKnott](https://github.com/WyohKnott) for numerous bug reports and patches
-- [Tumblr](https://www.tumblr.com) for their discontinued backup tool whose
-  output was the inspiration for the styling applied in `tumblr_backup`.
-
-
-## 6. Author
-
-Beat Bolli `<me+tumblr-utils@drbeat.li>`,
-[http://drbeat.li/py/](http://drbeat.li/py/)
diff --git a/tumblr_backup.py b/tumblr_backup.py
deleted file mode 100755
index 8e3cf76..0000000
--- a/tumblr_backup.py
+++ /dev/null
@@ -1,1231 +0,0 @@
-#!/usr/bin/env python
-# encoding: utf-8
-
-# standard Python library imports
-from __future__ import with_statement
-import codecs
-from collections import defaultdict
-from datetime import datetime
-import errno
-from glob import glob
-import hashlib
-from httplib import HTTPException
-import imghdr
-try:
-    import json
-except ImportError:
-    import simplejson as json
-import locale
-import os
-from os.path import join, split, splitext
-import Queue
-import re
-import ssl
-import sys
-import threading
-import time
-import urllib
-import urllib2
-import urlparse
-from xml.sax.saxutils import escape
-
-try:
-    from settings import DEFAULT_BLOGS
-except ImportError:
-    DEFAULT_BLOGS = []
-
-# extra optional packages
-try:
-    import pyexiv2
-except ImportError:
-    pyexiv2 = None
-try:
-    import youtube_dl
-    from youtube_dl.utils import sanitize_filename
-except ImportError:
-    youtube_dl = None
-
-# Format of displayed tags
-TAG_FMT = '#%s'
-
-# Format of tag link URLs; set to None to suppress the links.
-# Named placeholders that will be replaced: domain, tag
-TAGLINK_FMT = 'http://%(domain)s/tagged/%(tag)s'
-
-# exit codes
-EXIT_SUCCESS    = 0
-EXIT_NOPOSTS    = 1
-# EXIT_OPTPARSE = 2 -- returned by module optparse
-EXIT_INTERRUPT  = 3
-EXIT_ERRORS     = 4
-
-# add another JPEG recognizer
-# see http://www.garykessler.net/library/file_sigs.html
-def test_jpg(h, f):
-    if h[:3] == '\xFF\xD8\xFF' and h[3] in "\xDB\xE0\xE1\xE2\xE3":
-        return 'jpg'
-
-imghdr.tests.append(test_jpg)
-
-# variable directory names, will be set in TumblrBackup.backup()
-save_folder = ''
-media_folder = ''
-
-# constant names
-root_folder = os.getcwdu()
-post_dir = 'posts'
-json_dir = 'json'
-media_dir = 'media'
-archive_dir = 'archive'
-theme_dir = 'theme'
-save_dir = '../'
-backup_css = 'backup.css'
-custom_css = 'custom.css'
-avatar_base = 'avatar'
-dir_index = 'index.html'
-tag_index_dir = 'tags'
-
-blog_name = ''
-post_ext = '.html'
-have_custom_css = False
-
-POST_TYPES = (
-    'text', 'quote', 'link', 'answer', 'video', 'audio', 'photo', 'chat'
-)
-POST_TYPES_SET = frozenset(POST_TYPES)
-TYPE_ANY = 'any'
-TAG_ANY = '__all__'
-
-MAX_POSTS = 50
-
-HTTP_TIMEOUT = 90
-HTTP_CHUNK_SIZE = 1024 * 1024
-
-# get your own API key at https://www.tumblr.com/oauth/apps
-API_KEY = ''
-
-# ensure the right date/time format
-try:
-    locale.setlocale(locale.LC_TIME, '')
-except locale.Error:
-    pass
-encoding = 'utf-8'
-time_encoding = locale.getlocale(locale.LC_TIME)[1] or encoding
-
-
-have_ssl_ctx = sys.version_info >= (2, 7, 9)
-if have_ssl_ctx:
-    ssl_ctx = ssl.create_default_context()
-    def urlopen(url):
-        return urllib2.urlopen(url, timeout=HTTP_TIMEOUT, context=ssl_ctx)
-else:
-    def urlopen(url):
-        return urllib2.urlopen(url, timeout=HTTP_TIMEOUT)
-
-
-def log(account, s):
-    if not options.quiet:
-        if account:
-            sys.stdout.write('%s: ' % account)
-        sys.stdout.write(s[:-1] + ' ' * 20 + s[-1:])
-        sys.stdout.flush()
-
-
-def mkdir(dir, recursive=False):
-    if not os.path.exists(dir):
-        try:
-            if recursive:
-                os.makedirs(dir)
-            else:
-                os.mkdir(dir)
-        except OSError as e:
-            if e.errno != errno.EEXIST:
-                raise
-
-
-def path_to(*parts):
-    return join(save_folder, *parts)
-
-
-def open_file(open_fn, parts):
-    if len(parts) > 1:
-        mkdir(path_to(*parts[:-1]), (len(parts) > 2))
-    return open_fn(path_to(*parts))
-
-
-def open_text(*parts):
-    return open_file(
-        lambda f: codecs.open(f, 'w', encoding, 'xmlcharrefreplace'), parts
-    )
-
-
-def open_media(*parts):
-    return open_file(lambda f: open(f, 'wb'), parts)
-
-
-def strftime(format, t=None):
-    if t is None:
-        t = time.localtime()
-    return time.strftime(format, t).decode(time_encoding)
-
-
-def get_api_url(account):
-    """construct the tumblr API URL"""
-    global blog_name
-    blog_name = account
-    if '.' not in account:
-        blog_name += '.tumblr.com'
-    return 'https://api.tumblr.com/v2/blog/%s/%s' % (
-        blog_name, 'likes' if options.likes else 'posts'
-    )
-
-
-def set_period():
-    """Prepare the period start and end timestamps"""
-    i = 0
-    tm = [int(options.period[:4]), 1, 1, 0, 0, 0, 0, 0, -1]
-    if len(options.period) >= 6:
-        i = 1
-        tm[1] = int(options.period[4:6])
-    if len(options.period) == 8:
-        i = 2
-        tm[2] = int(options.period[6:8])
-    options.p_start = time.mktime(tm)
-    tm[i] += 1
-    options.p_stop = time.mktime(tm)
-
-
-def apiparse(base, count, start=0):
-    params = {'api_key': API_KEY, 'limit': count, 'reblog_info': 'true'}
-    if start > 0:
-        params['offset'] = start
-    url = base + '?' + urllib.urlencode(params)
-    for _ in range(10):
-        try:
-            resp = urlopen(url)
-            data = resp.read()
-        except (EnvironmentError, HTTPException) as e:
-            sys.stderr.write("%s getting %s\n" % (e, url))
-            continue
-        if resp.info().gettype() == 'application/json':
-            break
-        sys.stderr.write("Unexpected Content-Type: '%s'\n" % resp.info().gettype())
-        return None
-    else:
-        return None
-    try:
-        doc = json.loads(data)
-    except ValueError as e:
-        sys.stderr.write('%s: %s\n%d %s %s\n%r\n' % (
-            e.__class__.__name__, e, resp.getcode(), resp.msg, resp.info().gettype(), data
-        ))
-        return None
-    return doc if doc.get('meta', {}).get('status', 0) == 200 else None
-
-
-def add_exif(image_name, tags):
-    try:
-        metadata = pyexiv2.ImageMetadata(image_name)
-        metadata.read()
-    except EnvironmentError:
-        sys.stderr.write("Error reading metadata for image %s\n" % image_name)
-        return
-    KW_KEY = 'Iptc.Application2.Keywords'
-    if '-' in options.exif:     # remove all tags
-        if KW_KEY in metadata.iptc_keys:
-            del metadata[KW_KEY]
-    else:                       # add tags
-        if KW_KEY in metadata.iptc_keys:
-            tags |= set(metadata[KW_KEY].value)
-        tags = list(tag.strip().lower() for tag in tags | options.exif if tag)
-        metadata[KW_KEY] = pyexiv2.IptcTag(KW_KEY, tags)
-    try:
-        metadata.write()
-    except EnvironmentError:
-        sys.stderr.write("Writing metadata failed for tags: %s in: %s\n" % (tags, image_name))
-
-
-def save_style():
-    with open_text(backup_css) as css:
-        css.write('''\
-@import url("override.css");
-
-body { width: 720px; margin: 0 auto; }
-body > footer { padding: 1em 0; }
-header > img { float: right; }
-img { max-width: 720px; }
-blockquote { margin-left: 0; border-left: 8px #999 solid; padding: 0 24px; }
-.archive h1, .subtitle, article { padding-bottom: 0.75em; border-bottom: 1px #ccc dotted; }
-article[class^="liked-"] { background-color: #f0f0f8; }
-.post a.llink { display: none; }
-header a, footer a { text-decoration: none; }
-footer, article footer a { font-size: small; color: #999; }
-''')
-
-
-def get_avatar():
-    try:
-        resp = urlopen('http://api.tumblr.com/v2/blog/%s/avatar' % blog_name)
-        avatar_data = resp.read()
-    except (EnvironmentError, HTTPException):
-        return
-    avatar_file = avatar_base + '.' + imghdr.what(None, avatar_data[:32])
-    with open_media(theme_dir, avatar_file) as f:
-        f.write(avatar_data)
-
-
-def get_style():
-    """Get the blog's CSS by brute-forcing it from the home page.
-    The v2 API has no method for getting the style directly.
-    See https://groups.google.com/d/msg/tumblr-api/f-rRH6gOb6w/sAXZIeYx5AUJ"""
-    try:
-        resp = urlopen('http://%s/' % blog_name)
-        page_data = resp.read()
-    except (EnvironmentError, HTTPException):
-        return
-    for match in re.findall(r'(?s)<style type=.text/css.>(.*?)</style>', page_data):
-        css = match.strip().decode(encoding, 'replace')
-        if not '\n' in css:
-            continue
-        css = css.replace('\r', '').replace('\n    ', '\n')
-        with open_text(theme_dir, 'style.css') as f:
-            f.write(css + '\n')
-        return
-
-
-class Index:
-
-    def __init__(self, blog, body_class='index'):
-        self.blog = blog
-        self.body_class = body_class
-        self.index = defaultdict(lambda: defaultdict(list))
-
-    def add_post(self, post):
-        self.index[post.tm.tm_year][post.tm.tm_mon].append(post)
-        return self
-
-    def save_index(self, index_dir='.', title=None):
-        self.archives = sorted(((y, m) for y in self.index for m in self.index[y]),
-            reverse=options.reverse_month
-        )
-        subtitle = self.blog.title if title else self.blog.subtitle
-        title = title or self.blog.title
-        with open_text(index_dir, dir_index) as idx:
-            idx.write(self.blog.header(title, self.body_class, subtitle, True))
-            if options.tag_index and self.body_class == 'index':
-                idx.write('<p><a href=%s/%s>Tag index</a></p>\n' % (
-                    tag_index_dir, dir_index
-                ))
-            for year in sorted(self.index.keys(), reverse=options.reverse_index):
-                self.save_year(idx, index_dir, year)
-            idx.write(u'<footer><p>Generated on %s by <a href=https://github.com/'
-                'bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>\n' % strftime('%x %X')
-            )
-
-    def save_year(self, idx, index_dir, year):
-        idx.write('<h3>%s</h3>\n<ul>\n' % year)
-        for month in sorted(self.index[year].keys(), reverse=options.reverse_index):
-            tm = time.localtime(time.mktime([year, month, 3, 0, 0, 0, 0, 0, -1]))
-            month_name = self.save_month(index_dir, year, month, tm)
-            idx.write(u'    <li><a href=%s/%s title="%d post(s)">%s</a></li>\n' % (
-                archive_dir, month_name, len(self.index[year][month]),
-                strftime('%B', tm)
-            ))
-        idx.write('</ul>\n\n')
-
-    def save_month(self, index_dir, year, month, tm):
-        posts = sorted(self.index[year][month], key=lambda x: x.date, reverse=options.reverse_month)
-        posts_month = len(posts)
-        posts_page = options.posts_per_page if options.posts_per_page >= 1 else posts_month
-
-        def pages_per_month(y, m):
-            posts = len(self.index[y][m])
-            return posts / posts_page + bool(posts % posts_page)
-
-        def next_month(inc):
-            i = self.archives.index((year, month))
-            i += inc
-            if i < 0 or i >= len(self.archives):
-                return 0, 0
-            return self.archives[i]
-
-        FILE_FMT = '%d-%02d-p%s'
-        pages_month = pages_per_month(year, month)
-        for page, start in enumerate(range(0, posts_month, posts_page), start=1):
-
-            archive = [self.blog.header(strftime('%B %Y', tm), body_class='archive')]
-            archive.extend(p.get_post() for p in posts[start:start + posts_page])
-
-            file_name = FILE_FMT % (year, month, page)
-            if options.dirs:
-                base = save_dir + archive_dir + '/'
-                suffix = '/'
-                arch = open_text(index_dir, archive_dir, file_name, dir_index)
-                file_name += suffix
-            else:
-                base = ''
-                suffix = post_ext
-                file_name += suffix
-                arch = open_text(index_dir, archive_dir, file_name)
-
-            if page > 1:
-                pp = FILE_FMT % (year, month, page - 1)
-            else:
-                py, pm = next_month(-1)
-                pp = FILE_FMT % (py, pm, pages_per_month(py, pm)) if py else ''
-                first_file = file_name
-
-            if page < pages_month:
-                np = FILE_FMT % (year, month, page + 1)
-            else:
-                ny, nm = next_month(+1)
-                np = FILE_FMT % (ny, nm, 1) if ny else ''
-
-            archive.append(self.blog.footer(base, pp, np, suffix))
-
-            arch.write('\n'.join(archive))
-
-        return first_file
-
-
-class Indices:
-
-    def __init__(self, blog):
-        self.blog = blog
-        self.main_index = Index(blog)
-        self.tags = defaultdict(lambda: Index(blog, 'tag-archive'))
-
-    def build_index(self):
-        filter = join('*', dir_index) if options.dirs else '*' + post_ext
-        self.all_posts = map(LocalPost, glob(path_to(post_dir, filter)))
-        for post in self.all_posts:
-            self.main_index.add_post(post)
-            if options.tag_index:
-                for tag, name in post.tags:
-                    self.tags[tag].add_post(post).name = name
-
-    def save_index(self):
-        self.main_index.save_index()
-        if options.tag_index:
-            self.save_tag_index()
-
-    def save_tag_index(self):
-        global save_dir
-        save_dir = '../../../'
-        mkdir(path_to(tag_index_dir))
-        self.fixup_media_links()
-        tag_index = [self.blog.header('Tag index', 'tag-index', self.blog.title, True), '<ul>']
-        for tag, index in sorted(self.tags.items(), key=lambda kv: kv[1].name):
-            digest = hashlib.md5(tag).hexdigest()
-            index.save_index(tag_index_dir + os.sep + digest,
-                u"Tag ‛%s’" % index.name
-            )
-            tag_index.append(u'    <li><a href=%s/%s>%s</a></li>' % (
-                digest, dir_index, escape(index.name)
-            ))
-        tag_index.extend(['</ul>', ''])
-        with open_text(tag_index_dir, dir_index) as f:
-            f.write(u'\n'.join(tag_index))
-
-    def fixup_media_links(self):
-        """Fixup all media links which now have to be two folders lower."""
-        shallow_media = '../' + media_dir
-        deep_media = save_dir + media_dir
-        for p in self.all_posts:
-            p.post = p.post.replace(shallow_media, deep_media)
-
-
-class TumblrBackup:
-
-    def __init__(self):
-        self.errors = False
-        self.total_count = 0
-
-    def exit_code(self):
-        if self.errors:
-            return EXIT_ERRORS
-        if self.total_count == 0:
-            return EXIT_NOPOSTS
-        return EXIT_SUCCESS
-
-    def header(self, title='', body_class='', subtitle='', avatar=False):
-        root_rel = {
-            'index': '', 'tag-index': '../', 'tag-archive': '../../'
-        }.get(body_class, save_dir)
-        css_rel = root_rel + (custom_css if have_custom_css else backup_css)
-        if body_class:
-            body_class = ' class=' + body_class
-        h = u'''<!DOCTYPE html>
-
-<meta charset=%s>
-<title>%s</title>
-<link rel=stylesheet href=%s>
-
-<body%s>
-
-<header>
-''' % (encoding, self.title, css_rel, body_class)
-        if avatar:
-            f = glob(path_to(theme_dir, avatar_base + '.*'))
-            if f:
-                h += '<img src=%s%s/%s alt=Avatar>\n' % (root_rel, theme_dir, split(f[0])[1])
-        if title:
-            h += u'<h1>%s</h1>\n' % title
-        if subtitle:
-            h += u'<p class=subtitle>%s</p>\n' % subtitle
-        h += '</header>\n'
-        return h
-
-    def footer(self, base, previous_page, next_page, suffix):
-        f = '<footer><nav>'
-        f += '<a href=%s%s rel=index>Index</a>\n' % (save_dir, dir_index)
-        if previous_page:
-            f += '| <a href=%s%s%s rel=prev>Previous</a>\n' % (base, previous_page, suffix)
-        if next_page:
-            f += '| <a href=%s%s%s rel=next>Next</a>\n' % (base, next_page, suffix)
-        f += '</nav></footer>\n'
-        return f
-
-    def backup(self, account):
-        """makes single files and an index for every post on a public Tumblr blog account"""
-
-        base = get_api_url(account)
-
-        # make sure there are folders to save in
-        global save_folder, media_folder, post_ext, post_dir, save_dir, have_custom_css
-        if options.blosxom:
-            save_folder = root_folder
-            post_ext = '.txt'
-            post_dir = os.curdir
-            post_class = BlosxomPost
-        else:
-            save_folder = join(root_folder, options.outdir or account)
-            media_folder = path_to(media_dir)
-            if options.dirs:
-                post_ext = ''
-                save_dir = '../../'
-                mkdir(path_to(post_dir), True)
-            else:
-                mkdir(save_folder, True)
-            post_class = TumblrPost
-            have_custom_css = os.access(path_to(custom_css), os.R_OK)
-
-        self.post_count = 0
-
-        # get the highest post id already saved
-        ident_max = None
-        if options.incremental:
-            try:
-                ident_max = max(
-                    long(splitext(split(f)[1])[0])
-                    for f in glob(path_to(post_dir, '*' + post_ext))
-                )
-                log(account, "Backing up posts after %d\r" % ident_max)
-            except ValueError:  # max() arg is an empty sequence
-                pass
-        else:
-            log(account, "Getting basic information\r")
-
-        # start by calling the API with just a single post
-        soup = apiparse(base, 1)
-        if not soup:
-            self.errors = True
-            return
-
-        # collect all the meta information
-        resp = soup['response']
-        if options.likes:
-            _get_content = lambda soup: soup['response']['liked_posts']
-            blog = {}
-            count_estimate = resp['liked_count']
-        else:
-            _get_content = lambda soup: soup['response']['posts']
-            blog = resp['blog']
-            count_estimate = blog['posts']
-        self.title = escape(blog.get('title', account))
-        self.subtitle = blog.get('description', '')
-
-        # use the meta information to create a HTML header
-        TumblrPost.post_header = self.header(body_class='post')
-
-        # returns whether any posts from this batch were saved
-        def _backup(posts):
-            for p in sorted(posts, key=lambda x: x['id'], reverse=True):
-                post = post_class(p)
-                if ident_max and long(post.ident) <= ident_max:
-                    return False
-                if options.count and self.post_count >= options.count:
-                    return False
-                if options.period:
-                    if post.date >= options.p_stop:
-                        continue
-                    if post.date < options.p_start:
-                        return False
-                if options.request:
-                    if post.typ not in options.request:
-                        continue
-                    tags = options.request[post.typ]
-                    if not (TAG_ANY in tags or tags & post.tags_lower):
-                        continue
-                if options.no_reblog:
-                    if 'reblogged_from_name' in p or 'reblogged_root_name' in p:
-                        if 'trail' in p and not p['trail']:
-                            continue
-                        elif 'trail' in p and 'is_current_item' not in p['trail'][-1]:
-                            continue
-                    elif 'trail' in p and p['trail'] and 'is_current_item' not in p['trail'][-1]:
-                        continue
-                backup_pool.add_work(post.save_content)
-                self.post_count += 1
-            return True
-
-        # start the thread pool
-        backup_pool = ThreadPool()
-        try:
-            # Get the JSON entries from the API, which we can only do for MAX_POSTS posts at once.
-            # Posts "arrive" in reverse chronological order. Post #0 is the most recent one.
-            i = options.skip
-            while True:
-                # find the upper bound
-                log(account, "Getting posts %d to %d (of %d expected)\r" % (i, i + MAX_POSTS - 1, count_estimate))
-
-                soup = apiparse(base, MAX_POSTS, i)
-                if soup is None:
-                    i += 1 # try skipping a post
-                    self.errors = True
-                    continue
-
-                posts = _get_content(soup)
-                # `_backup(posts)` can be empty even when `posts` is not if we don't backup reblogged posts
-                if not posts or not _backup(posts):
-                    log(account, "done\r")
-                    break
-
-                i += MAX_POSTS
-        except:
-            # ensure proper thread pool termination
-            backup_pool.cancel()
-            raise
-
-        # wait until all posts have been saved
-        backup_pool.wait()
-
-        # postprocessing
-        if not options.blosxom and self.post_count:
-            get_avatar()
-            get_style()
-            if not have_custom_css:
-                save_style()
-            ix = Indices(self)
-            ix.build_index()
-            ix.save_index()
-
-        log(account, "%d posts backed up\n" % self.post_count)
-        self.total_count += self.post_count
-
-
-class TumblrPost:
-
-    post_header = ''    # set by TumblrBackup.backup()
-
-    def __init__(self, post):
-        self.content = ''
-        self.post = post
-        self.json_content = json.dumps(post, sort_keys=True, indent=4, separators=(',', ': '))
-        self.creator = post['blog_name']
-        self.ident = str(post['id'])
-        self.url = post['post_url']
-        self.shorturl = post['short_url']
-        self.typ = str(post['type'])
-        self.date = post['timestamp']
-        self.isodate = datetime.utcfromtimestamp(self.date).isoformat() + 'Z'
-        self.tm = time.localtime(self.date)
-        self.title = ''
-        self.tags = post['tags']
-        self.note_count = post.get('note_count', 0)
-        self.reblogged_from = post.get('reblogged_from_url')
-        self.reblogged_root = post.get('reblogged_root_url')
-        self.source_title = post.get('source_title', '')
-        self.source_url = post.get('source_url', '')
-        if options.request:
-            self.tags_lower = set(t.lower() for t in self.tags)
-        self.file_name = join(self.ident, dir_index) if options.dirs else self.ident + post_ext
-        self.llink = self.ident if options.dirs else self.file_name
-
-    def save_content(self):
-        """generates the content for this post"""
-        post = self.post
-        content = []
-
-        def append(s, fmt=u'%s'):
-            content.append(fmt % s)
-
-        def get_try(elt):
-            return post.get(elt) or ''
-
-        def append_try(elt, fmt=u'%s'):
-            elt = get_try(elt)
-            if elt:
-                if options.save_images:
-                    elt = re.sub(r'''(?i)(<img\s(?:[^>]*\s)?src\s*=\s*["'])(.*?)(["'][^>]*>)''',
-                        self.get_inline_image, elt
-                    )
-                if options.save_video or options.save_video_tumblr:
-                    # Handle video element poster attribute
-                    elt = re.sub(r'''(?i)(<video\s(?:[^>]*\s)?poster\s*=\s*["'])(.*?)(["'][^>]*>)''',
-                        self.get_inline_video_poster, elt
-                    )
-                    # Handle video element's source sub-element's src attribute
-                    elt = re.sub(r'''(?i)(<source\s(?:[^>]*\s)?src\s*=\s*["'])(.*?)(["'][^>]*>)''',
-                        self.get_inline_video, elt
-                    )
-                append(elt, fmt)
-
-        self.media_dir = join(post_dir, self.ident) if options.dirs else media_dir
-        self.media_url = save_dir + self.media_dir
-        self.media_folder = path_to(self.media_dir)
-
-        if self.typ == 'text':
-            self.title = get_try('title')
-            append_try('body')
-
-        elif self.typ == 'photo':
-            url = get_try('link_url')
-            is_photoset = len(post['photos']) > 1
-            for offset, p in enumerate(post['photos'], start=1):
-                o = p['alt_sizes'][0] if 'alt_sizes' in p else p['original_size']
-                src = o['url']
-                if options.save_images:
-                    src = self.get_image_url(src, offset if is_photoset else 0)
-                append(escape(src), u'<img alt="" src="%s">')
-                if url:
-                    content[-1] = u'<a href="%s">%s</a>' % (escape(url), content[-1])
-                content[-1] = '<p>' + content[-1] + '</p>'
-                if p['caption']:
-                    append(p['caption'], u'<p>%s</p>')
-            append_try('caption')
-
-        elif self.typ == 'link':
-            url = post['url']
-            self.title = u'<a href="%s">%s</a>' % (escape(url), post['title'] or url)
-            append_try('description')
-
-        elif self.typ == 'quote':
-            append(post['text'], u'<blockquote><p>%s</p></blockquote>')
-            append_try('source', u'<p>%s</p>')
-
-        elif self.typ == 'video':
-            src = ''
-            if (options.save_video or options.save_video_tumblr) \
-            and post['video_type'] == 'tumblr':
-                src = self.get_media_url(post['video_url'], '.mp4')
-            elif options.save_video:
-                src = self.get_youtube_url(self.url)
-                if not src:
-                    sys.stdout.write(u'Unable to download video in post #%s%-50s\n' %
-                        (self.ident, ' ')
-                    )
-            if src:
-                append(u'<p><video controls><source src="%s" type=video/mp4>%s<br>\n<a href="%s">%s</a></video></p>' % (
-                    src, "Your browser does not support the video element.", src, "Video file"
-                ))
-            else:
-                append(post['player'][-1]['embed_code'])
-            append_try('caption')
-
-        elif self.typ == 'audio':
-            src = ''
-            if options.save_audio:
-                audio_url = get_try('audio_url') or get_try('audio_source_url')
-                if post['audio_type'] == 'tumblr':
-                    if audio_url.startswith('https://a.tumblr.com/'):
-                        src = self.get_media_url(audio_url, '.mp3')
-                    elif audio_url.startswith('https://www.tumblr.com/audio_file/'):
-                        audio_url = u'https://a.tumblr.com/%so1.mp3' % audio_url.split('/')[-1]
-                        src = self.get_media_url(audio_url, '.mp3')
-                elif post['audio_type'] == 'soundcloud':
-                    src = self.get_media_url(audio_url, '.mp3')
-            if src:
-                append(u'<p><audio controls><source src="%s" type=audio/mpeg>%s<br>\n<a href="%s">%s</a></audio></p>' % (
-                    src, "Your browser does not support the audio element.", src, "Audio file"
-                ))
-            else:
-                append(post['player'])
-            append_try('caption')
-
-        elif self.typ == 'answer':
-            self.title = post['question']
-            append_try('answer')
-
-        elif self.typ == 'chat':
-            self.title = get_try('title')
-            append(
-                u'<br>\n'.join('%(label)s %(phrase)s' % d for d in post['dialogue']),
-                u'<p>%s</p>'
-            )
-
-        else:
-            sys.stderr.write(
-                u"Unknown post type '%s' in post #%s%-50s\n" % (self.typ, self.ident, ' ')
-            )
-            append(escape(self.json_content), u'<pre>%s</pre>')
-
-        self.content = '\n'.join(content)
-
-        # fix wrongly nested HTML elements
-        for p in ('<p>(<(%s)>)', '(</(%s)>)</p>'):
-            self.content = re.sub(p % 'p|ol|iframe[^>]*', r'\1', self.content)
-
-        self.save_post()
-
-    def get_youtube_url(self, youtube_url):
-        # determine the media file name
-        filetmpl = u'%(id)s_%(uploader_id)s_%(title)s.%(ext)s'
-        ydl_options = {
-            'outtmpl': join(self.media_folder, filetmpl),
-            'quiet': True,
-            'restrictfilenames': True,
-            'noplaylist': True,
-            'continuedl': True,
-            'nooverwrites': True,
-            'retries': 3000,
-            'fragment_retries': 3000,
-            'ignoreerrors': True
-        }
-        if options.cookiefile:
-            ydl_options['cookiefile'] = options.cookiefile
-        ydl = youtube_dl.YoutubeDL(ydl_options)
-        ydl.add_default_info_extractors()
-        try:
-            result = ydl.extract_info(youtube_url, download=False)
-            media_filename = sanitize_filename(filetmpl % result['entries'][0], restricted=True)
-        except:
-            return ''
-
-        # check if a file with this name already exists
-        if not os.path.isfile(media_filename):
-            try:
-                ydl.extract_info(youtube_url, download=True)
-            except:
-                return ''
-        return u'%s/%s' % (self.media_url, split(media_filename)[1])
-
-    def get_media_url(self, media_url, extension):
-        if not media_url:
-            return ''
-        media_filename = self.get_filename(media_url)
-        media_filename = os.path.splitext(media_filename)[0] + extension
-        saved_name = self.download_media(media_url, media_filename)
-        if saved_name is not None:
-            media_filename = u'%s/%s' % (self.media_url, saved_name)
-        return media_filename
-
-    def get_image_url(self, image_url, offset):
-        """Saves an image if not saved yet. Returns the new URL or
-        the original URL in case of download errors."""
-        image_filename = self.get_filename(image_url, '_o%s' % offset if offset else '')
-        saved_name = self.download_media(image_url, image_filename)
-        if saved_name is not None:
-            if options.exif and saved_name.endswith('.jpg'):
-                add_exif(join(self.media_folder, saved_name), set(self.tags))
-            image_url = u'%s/%s' % (self.media_url, saved_name)
-        return image_url
-
-    @staticmethod
-    def maxsize_image_url(image_url):
-        if ".tumblr.com/" not in image_url or image_url.endswith('.gif'):
-            return image_url
-        # change the image resolution to 1280
-        return re.sub(r'_\d{2,4}(\.\w+)$', r'_1280\1', image_url)
-
-    def get_inline_image(self, match):
-        """Saves an inline image if not saved yet. Returns the new <img> tag or
-        the original one in case of download errors."""
-        image_url = match.group(2)
-        if image_url.startswith('//'):
-            image_url = 'http:' + image_url
-        image_url = self.maxsize_image_url(image_url)
-        path = urlparse.urlparse(image_url).path
-        image_filename = path.split('/')[-1]
-        if not image_filename or not image_url.startswith('http'):
-            return match.group(0)
-        saved_name = self.download_media(image_url, image_filename)
-        if saved_name is None:
-            return match.group(0)
-        return u'%s%s/%s%s' % (match.group(1), self.media_url,
-            saved_name, match.group(3)
-        )
-
-    def get_inline_video_poster(self, match):
-        """Saves an inline video poster if not saved yet. Returns the new
-        <video> tag or the original one in case of download errors."""
-        poster_url = match.group(2)
-        if poster_url.startswith('//'):
-            poster_url = 'http:' + poster_url
-        path = urlparse.urlparse(poster_url).path
-        poster_filename = path.split('/')[-1]
-        if not poster_filename or not poster_url.startswith('http'):
-            return match.group(0)
-        saved_name = self.download_media(poster_url, poster_filename)
-        if saved_name is None:
-            return match.group(0)
-        # get rid of autoplay and muted attributes to align with normal video
-        # download behaviour
-        return (u'%s%s/%s%s' % (match.group(1), self.media_url,
-            saved_name, match.group(3)
-        )).replace('autoplay="autoplay"', '').replace('muted="muted"', '')
-
-    def get_inline_video(self, match):
-        """Saves an inline video if not saved yet. Returns the new <video> tag
-        or the original one in case of download errors."""
-        video_url = match.group(2)
-        if video_url.startswith('//'):
-            video_url = 'http:' + video_url
-        path = urlparse.urlparse(video_url).path
-        video_filename = path.split('/')[-1]
-        if not video_filename or not video_url.startswith('http'):
-            return match.group(0)
-        saved_name = None
-        if '.tumblr.com' in video_url:
-            saved_name = self.get_media_url(video_url, '.mp4')
-        elif options.save_video:
-            saved_name = self.get_youtube_url(video_url)
-        if saved_name is None:
-            return match.group(0)
-        return u'%s%s%s' % (match.group(1), saved_name, match.group(3))
-
-    def get_filename(self, url, offset=''):
-        """Determine the image file name depending on options.image_names"""
-        if options.image_names == 'i':
-            return self.ident + offset
-        elif options.image_names == 'bi':
-            return account + '_' + self.ident + offset
-        else:
-            # delete characters not allowed under Windows
-            return re.sub(r'[:<>"/\\|*?]', '', url.split('/')[-1])
-
-    def download_media(self, url, filename):
-        # check if a file with this name already exists
-        known_extension = '.' in filename[-5:]
-        image_glob = glob(path_to(self.media_dir,
-            filename + ('' if known_extension else '.*')
-        ))
-        if image_glob:
-            return split(image_glob[0])[1]
-        # download the media data
-        try:
-            resp = urlopen(url)
-            with open_media(self.media_dir, filename) as dest:
-                data = resp.read(HTTP_CHUNK_SIZE)
-                hdr = data[:32]     # save the first few bytes
-                while data:
-                    dest.write(data)
-                    data = resp.read(HTTP_CHUNK_SIZE)
-        except (EnvironmentError, ValueError, HTTPException) as e:
-            sys.stderr.write('%s downloading %s\n' % (e, url))
-            try:
-                os.unlink(path_to(self.media_dir, filename))
-            except OSError as e:
-                if e.errno != errno.ENOENT:
-                    raise
-            return None
-        # determine the file type if it's unknown
-        if not known_extension:
-            image_type = imghdr.what(None, hdr)
-            if image_type:
-                oldname = path_to(self.media_dir, filename)
-                filename += '.' + image_type.replace('jpeg', 'jpg')
-                os.rename(oldname, path_to(self.media_dir, filename))
-        return filename
-
-    def get_post(self):
-        """returns this post in HTML"""
-        typ = ('liked-' if options.likes else '') + self.typ
-        post = self.post_header + u'<article class=%s id=p-%s>\n' % (typ, self.ident)
-        post += u'<header>\n'
-        if options.likes:
-            post += u'<p><a href=\"http://{0}.tumblr.com/\" class=\"tumblr_blog\">{0}</a>:</p>\n'.format(self.creator)
-        post += u'<p><time datetime=%s>%s</time>\n' % (self.isodate, strftime('%x %X', self.tm))
-        post += u'<a class=llink href=%s%s/%s>¶</a>\n' % (save_dir, post_dir, self.llink)
-        post += u'<a href=%s>●</a>\n' % self.shorturl
-        if self.reblogged_from and self.reblogged_from != self.reblogged_root:
-            post += u'<a href=%s>⬀</a>\n' % self.reblogged_from
-        if self.reblogged_root:
-            post += u'<a href=%s>⬈</a>\n' % self.reblogged_root
-        post += '</header>\n'
-        if self.title:
-            post += u'<h2>%s</h2>\n' % self.title
-        post += self.content
-        foot = []
-        if self.tags:
-            foot.append(u''.join(self.tag_link(t) for t in self.tags))
-        if self.note_count:
-            foot.append(u'%d note%s' % (self.note_count, 's'[self.note_count == 1:]))
-        if self.source_title and self.source_url:
-            foot.append(u'<a title=Source href=%s>%s</a>' %
-                (self.source_url, self.source_title)
-            )
-        if foot:
-            post += u'\n<footer>%s</footer>' % u' — '.join(foot)
-        post += '\n</article>\n'
-        return post
-
-    @staticmethod
-    def tag_link(tag):
-        tag_disp = escape(TAG_FMT % tag)
-        if not TAGLINK_FMT:
-            return tag_disp + ' '
-        url = TAGLINK_FMT % {'domain': blog_name, 'tag': urllib.quote(tag.encode('utf-8'))}
-        return u'<a href=%s>%s</a>\n' % (url, tag_disp)
-
-    def save_post(self):
-        """saves this post locally"""
-        if options.dirs:
-            f = open_text(post_dir, self.ident, dir_index)
-        else:
-            f = open_text(post_dir, self.file_name)
-        with f:
-            f.write(self.get_post())
-        os.utime(f.stream.name, (self.date, self.date))  # XXX: is f.stream.name portable?
-        if options.json:
-            with open_text(json_dir, self.ident + '.json') as f:
-                f.write(self.json_content)
-
-
-class BlosxomPost(TumblrPost):
-
-    def get_image_url(self, image_url, offset):
-        return image_url
-
-    def get_post(self):
-        """returns this post as a Blosxom post"""
-        post = self.title + '\nmeta-id: p-' + self.ident + '\nmeta-url: ' + self.url
-        if self.tags:
-            post += '\nmeta-tags: ' + ' '.join(t.replace(' ', '+') for t in self.tags)
-        post += '\n\n' + self.content
-        return post
-
-
-class LocalPost:
-
-    def __init__(self, post_file):
-        with codecs.open(post_file, 'r', encoding) as f:
-            post = f.read()
-        # extract all URL-encoded tags
-        self.tags = []
-        footer_pos = post.find('<footer>')
-        if footer_pos > 0:
-            self.tags = re.findall(r'(?m)<a.+?/tagged/(.+?)>#(.+?)</a>', post[footer_pos:])
-        # remove header and footer
-        lines = post.split('\n')
-        while lines and '<article ' not in lines[0]:
-            del lines[0]
-        while lines and '</article>' not in lines[-1]:
-            del lines[-1]
-        self.post = '\n'.join(lines)
-        parts = post_file.split(os.sep)
-        if parts[-1] == dir_index:  # .../<post_id>/index.html
-            self.file_name = os.sep.join(parts[-2:])
-            self.ident = parts[-2]
-        else:
-            self.file_name = parts[-1]
-            self.ident = splitext(self.file_name)[0]
-        self.date = os.stat(post_file).st_mtime
-        self.tm = time.localtime(self.date)
-
-    def get_post(self):
-        return self.post
-
-
-class ThreadPool:
-
-    def __init__(self, thread_count=20, max_queue=1000):
-        self.queue = Queue.Queue(max_queue)
-        self.quit = threading.Event()
-        self.abort = threading.Event()
-        self.threads = [threading.Thread(target=self.handler) for _ in range(thread_count)]
-        for t in self.threads:
-            t.start()
-
-    def add_work(self, work):
-        self.queue.put(work)
-
-    def wait(self):
-        self.quit.set()
-        self.queue.join()
-
-    def cancel(self):
-        self.abort.set()
-        for i, t in enumerate(self.threads, start=1):
-            log('', "\rStopping threads %s%s\r" %
-                (' ' * i, '.' * (len(self.threads) - i))
-            )
-            t.join()
-
-    def handler(self):
-        while not self.abort.is_set():
-            try:
-                work = self.queue.get(True, 0.1)
-            except Queue.Empty:
-                if self.quit.is_set():
-                    break
-            else:
-                if self.quit.is_set() and self.queue.qsize() % MAX_POSTS == 0:
-                    log(account, "%d remaining posts to save\r" % self.queue.qsize())
-                try:
-                    work()
-                finally:
-                    self.queue.task_done()
-
-
-if __name__ == '__main__':
-    import optparse
-
-    def csv_callback(option, opt, value, parser):
-        setattr(parser.values, option.dest, set(value.split(',')))
-
-    def tags_callback(option, opt, value, parser):
-        request_callback(option, opt, TYPE_ANY + ':' + value.replace(',', ':'), parser)
-
-    def request_callback(option, opt, value, parser):
-        request = parser.values.request or {}
-        for req in value.lower().split(','):
-            parts = req.strip().split(':')
-            typ = parts.pop(0)
-            if typ != TYPE_ANY and typ not in POST_TYPES:
-                parser.error("%s: invalid post type '%s'" % (opt, typ))
-            for typ in POST_TYPES if typ == TYPE_ANY else (typ,):
-                if parts:
-                    request[typ] = request.get(typ, set()).union(parts)
-                else:
-                    request[typ] = set([TAG_ANY])
-        parser.values.request = request
-
-    parser = optparse.OptionParser("Usage: %prog [options] blog-name ...",
-        description="Makes a local backup of Tumblr blogs."
-    )
-    parser.add_option('-O', '--outdir', help="set the output directory"
-        " (default: blog-name)"
-    )
-    parser.add_option('-D', '--dirs', action='store_true',
-        help="save each post in its own folder"
-    )
-    parser.add_option('-q', '--quiet', action='store_true',
-        help="suppress progress messages"
-    )
-    parser.add_option('-i', '--incremental', action='store_true',
-        help="incremental backup mode"
-    )
-    parser.add_option('-l', '--likes', action='store_true',
-        dest='likes', help="save a blog's likes, not its posts"
-    )
-    parser.add_option('-k', '--skip-images', action='store_false', default=True,
-        dest='save_images', help="do not save images; link to Tumblr instead"
-    )
-    parser.add_option('--save-video', action='store_true', help="save all video files")
-    parser.add_option('--save-video-tumblr', action='store_true', help="save only Tumblr video files")
-    parser.add_option('--save-audio', action='store_true', help="save audio files")
-    parser.add_option('--cookiefile', help="cookie file for youtube-dl")
-    parser.add_option('-j', '--json', action='store_true',
-        help="save the original JSON source"
-    )
-    parser.add_option('-b', '--blosxom', action='store_true',
-        help="save the posts in blosxom format"
-    )
-    parser.add_option('-r', '--reverse-month', action='store_false', default=True,
-        help="reverse the post order in the monthly archives"
-    )
-    parser.add_option('-R', '--reverse-index', action='store_false', default=True,
-        help="reverse the index file order"
-    )
-    parser.add_option('--tag-index', action='store_true',
-        help="also create an archive per tag"
-    )
-    parser.add_option('-a', '--auto', type='int', metavar="HOUR",
-        help="do a full backup at HOUR hours, otherwise do an incremental backup"
-        " (useful for cron jobs)"
-    )
-    parser.add_option('-n', '--count', type='int', default=0,
-        help="save only COUNT posts"
-    )
-    parser.add_option('-s', '--skip', type='int', default=0,
-        help="skip the first SKIP posts"
-    )
-    parser.add_option('-p', '--period', help="limit the backup to PERIOD"
-        " ('y', 'm', 'd' or YYYY[MM[DD]])"
-    )
-    parser.add_option('-N', '--posts-per-page', type='int', default=50,
-        metavar='COUNT', help="set the number of posts per monthly page, "
-        "0 for unlimited"
-    )
-    parser.add_option('-Q', '--request', type='string', action='callback',
-        callback=request_callback, help="save posts matching the request"
-        u" TYPE:TAG:TAG:…,TYPE:TAG:…,…. TYPE can be %s or %s; TAGs can be"
-        " omitted or a colon-separated list. Example: -Q %s:personal,quote"
-        ",photo:me:self" % (', '.join(POST_TYPES), TYPE_ANY, TYPE_ANY)
-    )
-    parser.add_option('-t', '--tags', type='string', action='callback',
-        callback=tags_callback, help="save only posts tagged TAGS (comma-separated values;"
-        " case-insensitive)"
-    )
-    parser.add_option('-T', '--type', type='string', action='callback',
-        callback=request_callback, help="save only posts of type TYPE"
-        " (comma-separated values from %s)" % ', '.join(POST_TYPES)
-    )
-    parser.add_option('--no-reblog', action='store_true', help="don't save reblogged posts")
-    parser.add_option('-I', '--image-names', type='choice', choices=('o', 'i', 'bi'),
-        default='o', metavar='FMT',
-        help="image filename format ('o'=original, 'i'=<post-id>, 'bi'=<blog-name>_<post-id>)"
-    )
-    parser.add_option('-e', '--exif', type='string', action='callback',
-        callback=csv_callback, default=set(), metavar='KW',
-        help="add EXIF keyword tags to each picture (comma-separated values;"
-        " '-' to remove all tags, '' to add no extra tags)"
-    )
-    parser.add_option('-S', '--no-ssl-verify', action='store_true',
-        help="ignore SSL verification errors"
-    )
-    options, args = parser.parse_args()
-
-    if options.auto is not None and options.auto != time.localtime().tm_hour:
-        options.incremental = True
-    if options.period:
-        try:
-            pformat = {'y': '%Y', 'm': '%Y%m', 'd': '%Y%m%d'}[options.period]
-            options.period = time.strftime(pformat)
-        except KeyError:
-            options.period = options.period.replace('-', '')
-            if not re.match(r'^\d{4}(\d\d)?(\d\d)?$', options.period):
-                parser.error("Period must be 'y', 'm', 'd' or YYYY[MM[DD]]")
-        set_period()
-    if have_ssl_ctx and options.no_ssl_verify:
-        ssl_ctx = ssl._create_unverified_context()
-        # Otherwise, it's an old Python version without SSL verification,
-        # so this is the default.
-
-    args = args or DEFAULT_BLOGS
-    if not args:
-        parser.error("Missing blog-name")
-    if options.outdir and len(args) > 1:
-        parser.error("-O can only be used for a single blog-name")
-    if options.dirs and options.tag_index:
-        parser.error("-D cannot be used with --tag-index")
-    if options.exif and not pyexiv2:
-        parser.error("--exif: module 'pyexiv2' is not installed")
-    if options.save_video and not youtube_dl:
-        parser.error("--save-video: module 'youtube_dl' is not installed")
-
-    if not API_KEY:
-        sys.stderr.write('''\
-Missing API_KEY; please get your own API key at
-https://www.tumblr.com/oauth/apps\n''')
-        sys.exit(1)
-
-    tb = TumblrBackup()
-    try:
-        for account in args:
-            tb.backup(account)
-    except KeyboardInterrupt:
-        sys.exit(EXIT_INTERRUPT)
-
-    sys.exit(tb.exit_code())
diff --git a/tumblr_backup/__init__.py b/tumblr_backup/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tumblr_backup/is_reblog.py b/tumblr_backup/is_reblog.py
new file mode 100644
index 0000000..3c3815e
--- /dev/null
+++ b/tumblr_backup/is_reblog.py
@@ -0,0 +1,139 @@
+from __future__ import annotations
+
+import re
+from typing import Any, Callable
+
+
+def _check_posted_note(doc: dict[str, Any]) -> bool:
+    notes = doc.get('notes')
+    if not (notes and isinstance(notes, list)):
+        return False  # no notes available
+
+    n = notes[-1]
+    return bool(
+        n['type'] == 'posted'
+        and n['timestamp'] < doc['timestamp']  # sometimes a later reblog is credited
+        and n['blog_uuid'] != doc['blog']['uuid'],
+    )
+
+
+def _check_content(doc: dict[str, Any], pred: Callable[[str], bool], name: str) -> bool:
+    reblog_info = doc.get('reblog', {})
+
+    if doc.get('is_submission') and not reblog_info.get('tree_html'):
+        return False  # prone to false positives
+    if 'post_html' in doc:
+        return False  # post_html is messy and we have root_id anyway
+
+    # reason: quote source content
+    if 'source' in doc:
+        return name == 'via' and pred(doc['source'])  # this key is more specific
+
+    # reason: comment content
+    return bool(
+        reblog_info
+        and (name == 'via' or not reblog_info['tree_html'])
+        and pred(reblog_info['comment']),
+    )
+
+
+BQ_RE = re.compile(
+    r'('
+      r'<(?!a[ >])[^<>]+>'
+      r'|'
+      r'(?![^>\n\s][^\S\n]*<a[ >])[^<>]'
+    r')*'
+    r'<a('
+      r' class="(?P<classes>[^"]*)"'
+      r'|'
+      r' href="https?://('
+        r'(?P<blogco>tmblr\.co/[a-zA-Z0-9_]+/?)'
+        r'|'
+        r'www\.tumblr\.com/dashboard/blog/(?P<bname0>[a-zA-Z0-9-]+)/[0-9]+/?'
+        r'|'
+        r'(?P<priv>www\.tumblr\.com/blog/private_[0-9]+\?[0-9]+)'
+        r'|'
+          r'('
+            r'(www|(?P<bname1>[a-zA-Z0-9-]+))\.tumblr.com'
+            r'|'
+            r'[^/"]+'
+          r')'
+          r'('
+            r'(?P<blogpost>/post/[0-9]+(/[^/"]*)?)'
+            r'|'
+            r'/[^"]*'  # poster-editable
+          r')?'
+      r')"'
+      r'|'
+      r' [^\s</>"' "'" r'=]+(="[^"]*"|\b)'
+    r')*'
+    r'>'
+      r'[^<>]*'  # poster-editable
+    r'</a>:'
+    r'(?![^\S\n]*[^<\s])',
+)
+BQ_RE2 = re.compile(r'(<p>)+[a-z0-9-]+:</p>\n*<blockquote>')
+
+
+def bqpred(c: str) -> bool:
+    if 'replied to your' in c:
+        return False
+    if BQ_RE2.match(c):
+        return True
+    m = BQ_RE.match(c)
+    if not m:
+        return False
+    return bool(
+        'tumblr_blog' in (m.group('classes') or '').split(' ')
+        or m.group('blogpost') or m.group('priv') or m.group('bname0')
+        or ((m.group('blogco') or m.group('bname1')) and re.search(r'<blockquote[ >]', c)),
+    )
+
+
+def post_is_reblog(doc: dict[str, Any]) -> bool:
+    # reason: reblogged_from_id
+    # true for 84.9% of posts, 99.7% of reblogs
+    if 'reblogged_from_id' in doc:
+        return True
+
+    # reason: root_id
+    # false for all svc reblogs (let's say 14.3% of posts)
+    # true for 0.3% of remaining reblogs
+    root = doc.get('root_id')
+    if root:
+        return int(root) != int(doc['id'])
+
+    trail = doc.get('trail')
+    if trail:
+        # reason: trail first post ID
+        # true for 95.6% of remaining reblogs
+        if int(trail[0]['post']['id']) != int(doc['id']):
+            return True
+
+        # reason: missing trail root
+        # true for 7.9% of remaining reblogs (and cheap)
+        if not any(p.get('is_root_item') for p in trail):
+            return True
+
+    # true for 96.9% of remaining reblogs
+    def viapred(c: str) -> bool:
+        return bool(re.search(r'\(via <a (class="tumblr_blog" |href="https?://[^/]+/?"[ >])', c))
+    if _check_content(doc, viapred, 'via'):
+        return True
+
+    # reason: posted note
+    # true for 36.4% of remaining reblogs (and cheap)
+    if _check_posted_note(doc):
+        return True
+
+    # reason: non-empty tree_html
+    # true for 14.3% of remaining reblogs (and cheap)
+    reblog_info = doc.get('reblog', {})
+    if reblog_info.get('tree_html') and ' replied to your ' not in reblog_info['tree_html']:
+        return True
+
+    # true for all (known) remaining reblogs
+    if _check_content(doc, bqpred, 'blockquote'):
+        return True
+
+    return False  # probably not a reblog
diff --git a/tumblr_backup/login.py b/tumblr_backup/login.py
new file mode 100644
index 0000000..4cb3c22
--- /dev/null
+++ b/tumblr_backup/login.py
@@ -0,0 +1,71 @@
+# Credit to johanneszab for the C# implementation in TumblThree.
+# Credit to MrEldritch for the initial Python port.
+# Cleaned up and split off by Cebtenzzre.
+
+"""
+This script uses Tumblr's internal SVC API to access a hidden or explicit blog,
+and retrieves JSON in a format very similar (but not quite identical) to that
+of the normal API.
+"""
+
+from __future__ import annotations
+
+import re
+import sys
+from getpass import getpass
+from http.cookiejar import MozillaCookieJar
+
+import requests
+
+
+def get_api_token(session):
+    r = session.get('https://www.tumblr.com/login')
+    if r.status_code != 200:
+        raise ValueError('Response has non-200 status: HTTP {} {}'.format(r.status_code, r.reason))
+    # https://stackoverflow.com/a/1732454
+    match = re.search(r'"API_TOKEN":"([^"]+)"', r.text)
+    if not match:
+        raise ValueError('Could not find API token in Tumblr response')
+    return match.group(1)
+
+
+def tumblr_login(session, login, password):
+    api_token = get_api_token(session)
+
+    headers = {
+        'Authorization': 'Bearer {}'.format(api_token),
+        'Origin': 'https://www.tumblr.com',
+        'Referer': 'https://www.tumblr.com/login',
+    }
+    request_body = {
+        'grant_type': 'password',
+        'username': login,
+        'password': password,
+    }
+    r = session.post('https://www.tumblr.com/api/v2/oauth2/token', headers=headers, json=request_body)
+    if r.status_code != 200:
+        raise ValueError('Response has non-200 status: HTTP {} {}'.format(r.status_code, r.reason))
+
+    # We now have the necessary cookies loaded into our session.
+
+
+def main():
+    cookiefile, = sys.argv[1:]
+
+    print('Enter the credentials for your Tumblr account.')
+    login = input('Email: ')
+    password = getpass()
+
+    # Create a requests session with cookies
+    session = requests.Session()
+    session.cookies = MozillaCookieJar(cookiefile)  # type: ignore[assignment]
+    session.headers['User-Agent'] = (
+        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 '
+        'Safari/537.36'
+    )
+
+    # Log into Tumblr
+    tumblr_login(session, login, password)
+
+    # Save the cookies
+    session.cookies.save(ignore_discard=True)  # type: ignore[attr-defined]
diff --git a/tumblr_backup/main.py b/tumblr_backup/main.py
new file mode 100644
index 0000000..fccbf80
--- /dev/null
+++ b/tumblr_backup/main.py
@@ -0,0 +1,2402 @@
+# builtin modules
+from __future__ import annotations
+
+import argparse
+import calendar
+import contextlib
+import errno
+import hashlib
+import http.client
+import itertools
+import json
+import locale
+import multiprocessing
+import os
+import re
+import shutil
+import signal
+import sys
+import textwrap
+import threading
+import time
+import traceback
+from argparse import Namespace
+from collections import defaultdict
+from dataclasses import dataclass
+from datetime import datetime, timedelta
+from multiprocessing.queues import SimpleQueue
+from os.path import join, split, splitext
+from pathlib import Path
+from posixpath import basename as urlbasename, join as urlpathjoin, splitext as urlsplitext
+from tempfile import NamedTemporaryFile
+from types import ModuleType
+from typing import TYPE_CHECKING, Any, Callable, ContextManager, Iterable, Iterator, Literal, TextIO, cast
+from urllib.parse import quote, urlencode, urlparse
+from xml.sax.saxutils import escape
+
+# third-party modules
+import filetype
+import platformdirs
+import requests
+
+# internal modules
+from .is_reblog import post_is_reblog
+from .util import (AsyncCallable, ConnectionFile, LockedQueue, LogLevel, MultiCondition, copyfile, enospc, fdatasync,
+                   fsync, have_module, is_dns_working, make_requests_session, no_internet, opendir, to_bytes)
+from .wget import HTTP_TIMEOUT, HTTPError, Retry, WGError, WgetRetrieveWrapper, setup_wget, touch, urlopen
+
+if TYPE_CHECKING:
+    from bs4 import Tag
+    from typing_extensions import TypeAlias
+else:
+    Tag = None
+
+JSONDict: TypeAlias = 'dict[str, Any]'
+
+# extra optional packages
+try:
+    import pyexiv2
+except ImportError:
+    if not TYPE_CHECKING:
+        pyexiv2 = None
+
+try:
+    import jq  # type: ignore[import-not-found]
+except ImportError:
+    if not TYPE_CHECKING:
+        jq = None
+
+# Imported later if needed
+ytdl_module: ModuleType | None = None
+
+# Format of displayed tags
+TAG_FMT = '#{}'  # noqa: P103
+
+# Format of tag link URLs; set to None to suppress the links.
+# Named placeholders that will be replaced: domain, tag
+TAGLINK_FMT = 'https://{domain}/tagged/{tag}'
+
+# exit codes
+EXIT_SUCCESS    = 0
+EXIT_FAILURE    = 1
+# EXIT_ARGPARSE = 2 -- returned by argparse
+EXIT_INTERRUPT  = 3
+EXIT_ERRORS     = 4
+EXIT_NOPOSTS    = 5
+
+# variable directory names, will be set in TumblrBackup.backup()
+save_folder = ''
+media_folder = ''
+
+# constant names
+root_folder = os.getcwd()
+post_dir = 'posts'
+json_dir = 'json'
+media_dir = 'media'
+archive_dir = 'archive'
+theme_dir = 'theme'
+save_dir = '..'
+backup_css = 'backup.css'
+custom_css = 'custom.css'
+avatar_base = 'avatar'
+dir_index = 'index.html'
+tag_index_dir = 'tags'
+
+blog_name = ''
+post_ext = '.html'
+have_custom_css = False
+
+POST_TYPES = ('text', 'quote', 'link', 'answer', 'video', 'audio', 'photo', 'chat')
+TYPE_ANY = 'any'
+TAG_ANY = '__all__'
+
+MAX_POSTS = 50
+REM_POST_INC = 10
+
+# Always retry on 503 or 504, but never on connect or 429, the latter handled specially
+HTTP_RETRY = Retry(3, connect=False, status_forcelist=frozenset((503, 504)))
+HTTP_RETRY.RETRY_AFTER_STATUS_CODES = frozenset((413,))  # type: ignore[misc]
+
+# ensure the right date/time format
+try:
+    locale.setlocale(locale.LC_TIME, '')
+except locale.Error:
+    pass
+FILE_ENCODING = 'utf-8'
+
+PREV_MUST_MATCH_OPTIONS = ('likes', 'blosxom')
+MEDIA_PATH_OPTIONS = ('dirs', 'hostdirs', 'image_names')
+MUST_MATCH_OPTIONS = PREV_MUST_MATCH_OPTIONS + MEDIA_PATH_OPTIONS
+BACKUP_CHANGING_OPTIONS = (
+    'save_images', 'save_video', 'save_video_tumblr', 'save_audio', 'save_notes', 'copy_notes', 'notes_limit', 'json',
+    'count', 'skip', 'period', 'request', 'filter', 'no_reblog', 'only_reblog', 'exif', 'prev_archives',
+    'use_server_timestamps', 'user_agent', 'no_get', 'internet_archive', 'media_list', 'idents',
+)
+
+wget_retrieve: WgetRetrieveWrapper | None = None
+main_thread_lock = threading.RLock()
+multicond = MultiCondition(main_thread_lock)
+disable_note_scraper: set[str] = set()
+disablens_lock = threading.Lock()
+downloading_media: set[str] = set()
+downloading_media_cond = threading.Condition()
+
+
+def load_bs4(reason):
+    sys.modules['soupsieve'] = ()  # type: ignore[assignment]
+    try:
+        from bs4 import BeautifulSoup
+    except ImportError:
+        raise RuntimeError("Cannot {} without module 'bs4'".format(reason))
+    try:
+        import lxml  # noqa: F401
+    except ImportError:
+        raise RuntimeError("Cannot {} without module 'lxml'".format(reason))
+    return BeautifulSoup
+
+
+class Logger:
+    def __init__(self, quiet=False, file=sys.stdout):
+        self.quiet = quiet
+        self.file = file
+        self.lock = threading.Lock()
+        self.backup_account: str | None = None
+        self.status_msg: str | None = None
+
+    def log(self, level: LogLevel, msg: str, account: bool = False) -> None:
+        if self.quiet and level < LogLevel.WARN:
+            return
+        with self.lock:
+            for line in msg.splitlines(True):
+                self._print(line, account)
+            if self.status_msg:
+                self._print(self.status_msg, account=True)
+            self.file.flush()
+
+    def info(self, msg, account=False):
+        self.log(LogLevel.INFO, msg, account)
+
+    def warn(self, msg, account=False):
+        self.log(LogLevel.WARN, msg, account)
+
+    def error(self, msg, account=False):
+        self.log(LogLevel.ERROR, msg, account)
+
+    def status(self, msg):
+        self.status_msg = msg
+        self.log(LogLevel.INFO, '')
+
+    def _print(self, msg, account=False):
+        if account:  # Optional account prefix
+            msg = '{}: {}'.format(self.backup_account, msg)
+
+        # Separate terminator
+        it = (i for i, c in enumerate(reversed(msg)) if c not in '\r\n')
+        try:
+            idx = len(msg) - next(it)
+        except StopIteration:
+            idx = 0
+        msg, term = msg[:idx], msg[idx:]
+
+        pad = ' ' * (80 - len(msg))  # Pad to 80 chars
+        print(msg + pad + term, end='', file=self.file)
+
+
+logger = Logger()
+
+
+def mkdir(dir_, recursive=False):
+    if not os.path.exists(dir_):
+        try:
+            if recursive:
+                os.makedirs(dir_)
+            else:
+                os.mkdir(dir_)
+        except FileExistsError:
+            pass  # ignored
+
+
+def path_to(*parts):
+    return join(save_folder, *parts)
+
+
+def open_file(open_fn, parts):
+    mkdir(path_to(*parts[:-1]), recursive=True)
+    return open_fn(path_to(*parts))
+
+
+class open_outfile:
+    def __init__(self, mode, *parts, **kwargs):
+        self._dest_path = open_file(lambda f: f, parts)
+        dest_dirname, dest_basename = split(self._dest_path)
+
+        self._partf = NamedTemporaryFile(mode, prefix='.{}.'.format(dest_basename), dir=dest_dirname, delete=False)
+        # NB: open by name so name attribute is accurate
+        self._f = open(self._partf.name, mode, **kwargs)
+
+    def __enter__(self):
+        return self._f
+
+    def __exit__(self, exc_type, exc_value, tb):
+        partf = self._partf
+        self._f.close()
+
+        if exc_type is not None:
+            # roll back on exception; do not write partial files
+            partf.close()
+            os.unlink(partf.name)
+            return
+
+        # NamedTemporaryFile is created 0600, set mode to the usual 0644
+        if os.name == 'posix':
+            os.fchmod(partf.fileno(), 0o644)
+        else:
+            os.chmod(partf.name, 0o644)
+
+        # Flush buffers and sync the inode
+        partf.flush()
+        fsync(partf)
+        partf.close()
+
+        # Move to final destination
+        os.replace(partf.name, self._dest_path)
+
+
+@contextlib.contextmanager
+def open_text(*parts, mode='w') -> Iterator[TextIO]:
+    assert 'b' not in mode
+    with open_outfile(mode, *parts, encoding=FILE_ENCODING, errors='xmlcharrefreplace') as f:
+        yield f
+
+
+def strftime(fmt, t=None):
+    if t is None:
+        t = time.localtime()
+    return time.strftime(fmt, t)
+
+
+def get_api_url(account: str, likes: bool) -> str:
+    """Construct the Tumblr API URL."""
+    global blog_name
+    blog_name = account
+    if any(c in account for c in '/\\') or account in ('.', '..'):
+        raise ValueError(f'Invalid blog name: {account!r}')
+    if '.' not in account:
+        blog_name += '.tumblr.com'
+    return f'https://api.tumblr.com/v2/blog/{blog_name}/{"likes" if likes else "posts"}'
+
+
+def parse_period_date(period):
+    """Prepare the period start and end timestamps"""
+    timefn: Callable[[Any], float] = time.mktime
+    # UTC marker
+    if period[-1] == 'Z':
+        period = period[:-1]
+        timefn = calendar.timegm
+
+    i = 0
+    tm = [int(period[:4]), 1, 1, 0, 0, 0, 0, 0, -1]
+    if len(period) >= 6:
+        i = 1
+        tm[1] = int(period[4:6])
+    if len(period) == 8:
+        i = 2
+        tm[2] = int(period[6:8])
+
+    def mktime(tml):
+        tmt: Any = tuple(tml)
+        return timefn(tmt)
+
+    p_start = int(mktime(tm))
+    tm[i] += 1
+    p_stop = int(mktime(tm))
+    return [p_start, p_stop]
+
+
+def get_posts_key(likes: bool) -> str:
+    return 'liked_posts' if likes else 'posts'
+
+
+class ApiParser:
+    TRY_LIMIT = 2
+    session: requests.Session | None = None
+    api_key: str | None = None
+
+    def __init__(self, base: str, account: str, options: Namespace):
+        self.base = base
+        self.account = account
+        self.options = options
+        self.prev_resps: list[str] | None = None
+        self.dashboard_only_blog: bool | None = None
+        self._prev_iter: Iterator[JSONDict] | None = None
+        self._last_mode: str | None = None
+        self._last_offset: int | None = None
+
+    @classmethod
+    def setup(
+        cls, api_key: str, no_ssl_verify: bool, user_agent: str, cookiefile: str | os.PathLike[str],
+    ) -> None:
+        cls.api_key = api_key
+        cls.session = make_requests_session(
+            requests.Session, HTTP_RETRY, HTTP_TIMEOUT,
+            not no_ssl_verify, user_agent, cookiefile,
+        )
+
+    def read_archive(self, prev_archive):
+        if self.options.reuse_json:
+            prev_archive = save_folder
+        elif prev_archive is None:
+            return True
+
+        def read_resp(path):
+            with open(path, encoding=FILE_ENCODING) as jf:
+                return json.load(jf)
+
+        if self.options.likes:
+            logger.warn('Reading liked timestamps from saved responses (may take a while)\n', account=True)
+
+        if self.options.idents is None:
+            respfiles: Iterable[str] = (
+                e.path for e in os.scandir(join(prev_archive, 'json'))
+                if e.name.endswith('.json') and e.is_file()
+            )
+        else:
+            respfiles = []
+            for ident in self.options.idents:
+                resp = join(prev_archive, 'json', str(ident) + '.json')
+                if not os.path.isfile(resp):
+                    logger.error("post '{}' not found\n".format(ident), account=True)
+                    return False
+                respfiles.append(resp)
+
+        self.prev_resps = sorted(
+            respfiles,
+            key=lambda p: (
+                read_resp(p)['liked_timestamp'] if self.options.likes
+                else int(os.path.basename(p)[:-5])
+            ),
+            reverse=True,
+        )
+        return True
+
+    def get_initial(self) -> JSONDict | None:
+        if self.prev_resps is not None:
+            try:
+                first_post = next(self._iter_prev())
+            except StopIteration:
+                return None
+            r = {get_posts_key(self.options.likes): [first_post], 'blog': first_post['blog'].copy()}
+            if self.options.likes:
+                r['liked_count'] = len(self.prev_resps)
+            else:
+                r['blog']['posts'] = len(self.prev_resps)
+            return r
+
+        resp = self.apiparse(1)
+        if self.dashboard_only_blog and resp and resp['posts']:
+            # svc API doesn't return blog info, steal it from the first post
+            resp['blog'] = resp['posts'][0]['blog']
+        return resp
+
+    def apiparse(self, count, start=0, before=None, ident=None) -> JSONDict | None:
+        assert self.api_key is not None
+
+        if self.prev_resps is not None:
+            if self._prev_iter is None:
+                self._prev_iter = self._iter_prev()
+            if ident is not None:
+                assert self._last_mode in (None, 'ident')
+                self._last_mode = 'ident'
+                # idents are pre-filtered
+                try:
+                    posts = [next(self._prev_iter)]
+                except StopIteration:
+                    return None
+            else:
+                it = self._prev_iter
+                if before is not None:
+                    assert self._last_mode in (None, 'before')
+                    assert self._last_offset is None or before < self._last_offset
+                    self._last_mode = 'before'
+                    self._last_offset = before
+                    it = itertools.dropwhile(
+                        lambda p: p['liked_timestamp' if self.options.likes else 'timestamp'] >= before,
+                        it,
+                    )
+                else:
+                    assert self._last_mode in (None, 'offset')
+                    assert start == (0 if self._last_offset is None else self._last_offset + MAX_POSTS)
+                    self._last_mode = 'offset'
+                    self._last_offset = start
+                posts = list(itertools.islice(it, None, count))
+            return {get_posts_key(self.options.likes): posts}
+
+        if self.dashboard_only_blog:
+            base = 'https://www.tumblr.com/svc/indash_blog'
+            params = {'tumblelog_name_or_id': self.account, 'post_id': '', 'limit': count,
+                      'should_bypass_safemode': 'true', 'should_bypass_tagfiltering': 'true'}
+            headers: dict[str, str] | None = {
+                'Referer': 'https://www.tumblr.com/dashboard/blog/' + self.account,
+                'X-Requested-With': 'XMLHttpRequest',
+            }
+        else:
+            base = self.base
+            params = {'api_key': self.api_key, 'limit': count, 'reblog_info': 'true'}
+            headers = None
+        if ident is not None:
+            params['post_id' if self.dashboard_only_blog else 'id'] = ident
+        elif before is not None and not self.dashboard_only_blog:
+            params['before'] = before
+        elif start > 0:
+            params['offset'] = start
+
+        try:
+            doc, status, reason = self._get_resp(base, params, headers)
+        except (OSError, HTTPError) as e:
+            logger.error('URL is {}?{}\n[FATAL] Error retrieving API response: {!r}\n'.format(
+                base, urlencode(params), e,
+            ))
+            return None
+
+        if not 200 <= status < 300:
+            # Detect dashboard-only blogs by the error codes
+            if status == 404 and self.dashboard_only_blog is None and not (doc is None or self.options.likes):
+                errors = doc.get('errors', ())
+                if len(errors) == 1 and errors[0].get('code') == 4012:
+                    self.dashboard_only_blog = True
+                    logger.info('Found dashboard-only blog, trying svc API\n', account=True)
+                    return self.apiparse(count, start)  # Recurse once
+            if status == 403 and self.options.likes:
+                logger.error('HTTP 403: Most likely {} does not have public likes.\n'.format(self.account))
+                return None
+            logger.error('URL is {}?{}\n[FATAL] {} API response: HTTP {} {}\n{}'.format(
+                base, urlencode(params),
+                'Error retrieving' if doc is None else 'Non-OK',
+                status, reason,
+                '' if doc is None else '{}\n'.format(doc),
+            ))
+            if status == 401 and self.dashboard_only_blog:
+                logger.error("This is a dashboard-only blog, so you probably don't have the right cookies.{}\n".format(
+                    '' if self.options.cookiefile else ' Try --cookiefile.',
+                ))
+            return None
+        if doc is None:
+            return None  # OK status but invalid JSON
+
+        if self.dashboard_only_blog:
+            with disablens_lock:
+                if self.account not in disable_note_scraper:
+                    disable_note_scraper.add(self.account)
+                    logger.info('[Note Scraper] Dashboard-only blog - scraping disabled for {}\n'.format(self.account))
+        elif self.dashboard_only_blog is None:
+            # If the first API request succeeds, it's a public blog
+            self.dashboard_only_blog = False
+
+        return doc.get('response')
+
+    def _iter_prev(self) -> Iterator[JSONDict]:
+        assert self.prev_resps is not None
+        for path in self.prev_resps:
+            with open(path, encoding=FILE_ENCODING) as f:
+                try:
+                    yield json.load(f)
+                except ValueError as e:
+                    f.seek(0)
+                    logger.error('{}: {}\n{!r}\n'.format(e.__class__.__name__, e, f.read()))
+
+    def _get_resp(self, base, params, headers):
+        assert self.session is not None
+        try_count = 0
+        while True:
+            try:
+                with self.session.get(base, params=params, headers=headers) as resp:
+                    try_count += 1
+                    doc = None
+                    ctype = resp.headers.get('Content-Type')
+                    if not (200 <= resp.status_code < 300 or 400 <= resp.status_code < 500):
+                        pass  # Server error, will not attempt to read body
+                    elif ctype and ctype.split(';', 1)[0].strip() != 'application/json':
+                        logger.error("Unexpected Content-Type: '{}'\n".format(ctype))
+                    else:
+                        try:
+                            doc = resp.json()
+                        except ValueError as e:
+                            logger.error('{}: {}\n{} {} {}\n{!r}\n'.format(
+                                e.__class__.__name__, e, resp.status_code, resp.reason, ctype,
+                                resp.content.decode('utf-8'),
+                            ))
+                    status = resp.status_code if doc is None else doc['meta']['status']
+                    if status == 429 and try_count < self.TRY_LIMIT and self._ratelimit_sleep(resp.headers):
+                        continue
+                    return doc, status, resp.reason if doc is None else http.client.responses.get(status, '(unknown)')
+            except HTTPError:
+                if not is_dns_working(timeout=5, check=self.options.use_dns_check):
+                    no_internet.signal()
+                    continue
+                raise
+
+    @staticmethod
+    def _ratelimit_sleep(headers):
+        # Daily ratelimit
+        if headers.get('X-Ratelimit-Perday-Remaining') == '0':
+            reset = headers.get('X-Ratelimit-Perday-Reset')
+            try:
+                freset = float(reset)  # pytype: disable=wrong-arg-types
+            except (TypeError, ValueError):
+                logger.error(f'Expected numerical X-Ratelimit-Perday-Reset, got {reset!r}\n')
+                msg = 'sometime tomorrow'
+            else:
+                treset = datetime.now() + timedelta(seconds=freset)
+                msg = 'at {}'.format(treset.ctime())
+            raise RuntimeError('{}: Daily API ratelimit exceeded. Resume with --continue after reset {}.\n'.format(
+                logger.backup_account, msg,
+            ))
+
+        # Hourly ratelimit
+        reset = headers.get('X-Ratelimit-Perhour-Reset')
+        if reset is None:
+            return False
+
+        try:
+            sleep_dur = float(reset)
+        except ValueError:
+            logger.error("Expected numerical X-Ratelimit-Perhour-Reset, got '{}'\n".format(reset), account=True)
+            return False
+
+        hours, remainder = divmod(abs(sleep_dur), 3600)
+        minutes, seconds = divmod(remainder, 60)
+        sleep_dur_str = ' '.join(str(int(t[0])) + t[1] for t in ((hours, 'h'), (minutes, 'm'), (seconds, 's')) if t[0])
+
+        if sleep_dur < 0:
+            logger.warn('Warning: X-Ratelimit-Perhour-Reset is {} in the past\n'.format(sleep_dur_str), account=True)
+            return True
+        if sleep_dur > 3600:
+            treset = datetime.now() + timedelta(seconds=sleep_dur)
+            raise RuntimeError('{}: Refusing to sleep for {}. Resume with --continue at {}.'.format(
+                logger.backup_account, sleep_dur_str, treset.ctime(),
+            ))
+
+        logger.warn('Hit hourly ratelimit, sleeping for {} as requested\n'.format(sleep_dur_str), account=True)
+        time.sleep(sleep_dur + 1)  # +1 to be sure we're past the reset
+        return True
+
+
+def add_exif(image_name: str, tags: set[str], exif: set[str]) -> None:
+    assert pyexiv2 is not None
+    try:
+        metadata = pyexiv2.ImageMetadata(image_name)
+        metadata.read()
+    except OSError as e:
+        logger.error('Error reading metadata for image {!r}: {!r}\n'.format(image_name, e))
+        return
+    KW_KEY = 'Iptc.Application2.Keywords'
+    if '-' in exif:  # remove all tags
+        if KW_KEY in metadata.iptc_keys:
+            del metadata[KW_KEY]
+    else:  # add tags
+        if KW_KEY in metadata.iptc_keys:
+            tags |= set(metadata[KW_KEY].value)
+        taglist = [tag.strip().lower() for tag in tags | exif if tag]
+        metadata[KW_KEY] = pyexiv2.IptcTag(KW_KEY, taglist)
+    try:
+        metadata.write()
+    except OSError as e:
+        logger.error('Writing metadata failed for tags {} in {!r}: {!r}\n'.format(tags, image_name, e))
+
+
+def save_style():
+    with open_text(backup_css) as css:
+        css.write(textwrap.dedent("""\
+            @import url("override.css");
+
+            body { width: 720px; margin: 0 auto; }
+            body > footer { padding: 1em 0; }
+            header > img { float: right; }
+            img { max-width: 720px; }
+            blockquote { margin-left: 0; border-left: 8px #999 solid; padding: 0 24px; }
+            .archive h1, .subtitle, article { padding-bottom: 0.75em; border-bottom: 1px #ccc dotted; }
+            article[class^="liked-"] { background-color: #f0f0f8; }
+            .post a.llink { display: none; }
+            header a, footer a { text-decoration: none; }
+            footer, article footer a { font-size: small; color: #999; }
+        """))
+
+
+def find_files(path, match=None):
+    try:
+        it = os.scandir(path)
+    except FileNotFoundError:
+        return  # ignore nonexistent dir
+    with it:
+        yield from (e.path for e in it if match is None or match(e.name))
+
+
+def find_post_files(dirs: bool) -> Iterator[str]:
+    path = path_to(post_dir)
+    if not dirs:
+        yield from find_files(path, lambda n: n.endswith(post_ext))
+        return
+
+    indexes = (join(e, dir_index) for e in find_files(path))
+    yield from filter(os.path.exists, indexes)
+
+
+def match_avatar(name):
+    return name.startswith(avatar_base + '.')
+
+
+def get_avatar(prev_archive: str | os.PathLike[str] | None, no_get: bool) -> None:
+    if prev_archive is not None:
+        # Copy old avatar, if present
+        avatar_matches = find_files(join(prev_archive, theme_dir), match_avatar)
+        src = next(avatar_matches, None)
+        if src is not None:
+            path_parts = (theme_dir, split(src)[-1])
+            cpy_res = maybe_copy_media(prev_archive, path_parts)
+            if cpy_res:
+                return  # We got the avatar
+    if no_get:
+        return  # Don't download the avatar
+
+    url = 'https://api.tumblr.com/v2/blog/%s/avatar' % blog_name
+    avatar_dest = avatar_fpath = open_file(lambda f: f, (theme_dir, avatar_base))
+
+    # Check for an existing avatar
+    avatar_matches = find_files(path_to(theme_dir), match_avatar)
+    if next(avatar_matches, None) is not None:
+        return  # Do not clobber
+
+    def adj_bn(old_bn, f):
+        # Give it an extension
+        kind = filetype.guess(f)
+        if kind:
+            return avatar_fpath + '.' + kind.extension
+        return avatar_fpath
+
+    # Download the image
+    assert wget_retrieve is not None
+    try:
+        wget_retrieve(url, avatar_dest, adjust_basename=adj_bn)
+    except WGError as e:
+        e.log()
+
+
+def get_style(prev_archive: str | os.PathLike[str] | None, no_get: bool, use_dns_check: bool) -> None:
+    """Get the blog's CSS by brute-forcing it from the home page.
+    The v2 API has no method for getting the style directly.
+    See https://groups.google.com/d/msg/tumblr-api/f-rRH6gOb6w/sAXZIeYx5AUJ"""
+    if prev_archive is not None:
+        # Copy old style, if present
+        path_parts = (theme_dir, 'style.css')
+        cpy_res = maybe_copy_media(prev_archive, path_parts)
+        if cpy_res:
+            return  # We got the style
+    if no_get:
+        return  # Don't download the style
+
+    url = 'https://%s/' % blog_name
+    try:
+        resp = urlopen(url, use_dns_check=use_dns_check)
+        page_data = resp.data
+    except HTTPError as e:
+        logger.error('URL is {}\nError retrieving style: {}\n'.format(url, e))
+        return
+    for match in re.findall(br'(?s)<style type=.text/css.>(.*?)</style>', page_data):
+        css = match.strip().decode('utf-8', errors='replace')
+        if '\n' not in css:
+            continue
+        css = css.replace('\r', '').replace('\n    ', '\n')
+        with open_text(theme_dir, 'style.css') as f:
+            f.write(css + '\n')
+        return
+
+
+# Copy media file, if present in prev_archive
+def maybe_copy_media(prev_archive, path_parts, pa_path_parts=None):
+    if prev_archive is None:
+        return False  # No previous archive to copy from
+    if pa_path_parts is None:
+        pa_path_parts = path_parts  # Default
+
+    srcpath = join(prev_archive, *pa_path_parts)
+    dstpath = open_file(lambda f: f, path_parts)
+
+    try:
+        os.stat(srcpath)
+    except FileNotFoundError:
+        return False  # Source does not exist
+
+    try:
+        os.stat(dstpath)
+    except FileNotFoundError:
+        pass  # Destination does not exist yet
+    else:
+        return True  # Don't overwrite
+
+    with open_outfile('wb', *path_parts) as dstf:
+        copyfile(srcpath, dstf.name)
+        shutil.copystat(srcpath, dstf.name)
+
+    return True  # Copied
+
+
+def check_optional_modules(options: Namespace) -> None:
+    if options.exif:
+        if pyexiv2 is None:
+            raise RuntimeError("--exif: module 'pyexiv2' is not installed")
+        if not hasattr(pyexiv2, 'ImageMetadata'):
+            raise RuntimeError("--exif: module 'pyexiv2' is missing features, perhaps you need 'py3exiv2'?")
+    if options.filter is not None and jq is None:
+        raise RuntimeError("--filter: module 'jq' is not installed")
+    if options.save_notes or options.copy_notes:
+        load_bs4('save notes' if options.save_notes else 'copy notes')
+    if options.save_video and not (have_module('yt_dlp') or have_module('youtube_dl')):
+        raise RuntimeError("--save-video: neither module 'yt_dlp' nor 'youtube_dl' is installed")
+
+
+
+def import_youtube_dl():
+    global ytdl_module
+    if ytdl_module is not None:
+        return ytdl_module
+
+    try:
+        import yt_dlp
+    except ImportError:
+        pass
+    else:
+        ytdl_module = yt_dlp
+        return ytdl_module  # noqa: WPS331
+
+    import youtube_dl
+
+    ytdl_module = youtube_dl
+    return ytdl_module  # noqa: WPS331
+
+
+class Index:
+    index: defaultdict[int, defaultdict[int, list[LocalPost]]]
+
+    def __init__(
+        self, blog: TumblrBackup, posts_per_page: int, dirs: bool, reverse_month: bool, reverse_index: bool,
+        tag_index: bool, body_class: str = 'index',
+    ):
+        self.blog = blog
+        self.posts_per_page = posts_per_page
+        self.dirs_option = dirs
+        self.reverse_month = reverse_month
+        self.reverse_index = reverse_index
+        self.tag_index = tag_index
+        self.body_class = body_class
+        self.index = defaultdict(lambda: defaultdict(list))
+
+    def add_post(self, post):
+        self.index[post.tm.tm_year][post.tm.tm_mon].append(post)
+
+    def save_index(self, index_dir='.', title=None):
+        archives = sorted(((y, m) for y in self.index for m in self.index[y]),
+                          reverse=self.reverse_month)
+        subtitle = self.blog.title if title else self.blog.subtitle
+        title = title or self.blog.title
+        with open_text(index_dir, dir_index) as idx:
+            idx.write(self.blog.header(title, self.body_class, subtitle, avatar=True))
+            if self.tag_index and self.body_class == 'index':
+                idx.write('<p><a href={}>Tag index</a></p>\n'.format(
+                    urlpathjoin(tag_index_dir, dir_index),
+                ))
+            for year in sorted(self.index.keys(), reverse=self.reverse_index):
+                self.save_year(idx, archives, index_dir, year)
+            idx.write(
+                f'<footer><p>Generated on {strftime("%x %X")} by <a href=https://github.com/'
+                f'bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>\n',
+            )
+
+    def save_year(self, idx, archives, index_dir, year):
+        idx.write('<h3>%s</h3>\n<ul>\n' % year)
+        for month in sorted(self.index[year].keys(), reverse=self.reverse_index):
+            tm = time.localtime(time.mktime((year, month, 3, 0, 0, 0, 0, 0, -1)))
+            month_name = self.save_month(archives, index_dir, year, month, tm)
+            idx.write('    <li><a href={} title="{} post(s)">{}</a></li>\n'.format(
+                urlpathjoin(archive_dir, month_name), len(self.index[year][month]), strftime('%B', tm),
+            ))
+        idx.write('</ul>\n\n')
+
+    def save_month(self, archives, index_dir, year, month, tm):
+        posts = sorted(self.index[year][month], key=lambda x: x.date, reverse=self.reverse_month)
+        posts_month = len(posts)
+        posts_page = self.posts_per_page if self.posts_per_page >= 1 else posts_month
+
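+        def pages_per_month(y, m):
+            if self.posts_per_page < 1:
+                return 1  # unpaginated: posts_page is this month's post count, so don't apply it to other months
+            posts_m = len(self.index[y][m])
+            return posts_m // posts_page + bool(posts_m % posts_page)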
+
+        def next_month(inc):
+            i = archives.index((year, month))
+            i += inc
+            if 0 <= i < len(archives):
+                return archives[i]
+            return 0, 0
+
+        FILE_FMT = '%d-%02d-p%s%s'
+        pages_month = pages_per_month(year, month)
+        first_file: str | None = None
+        for page, start in enumerate(range(0, posts_month, posts_page), start=1):
+
+            archive = [self.blog.header(strftime('%B %Y', tm), body_class='archive')]
+            archive.extend(p.get_post(self.body_class == 'tag-archive') for p in posts[start:start + posts_page])
+
+            suffix = '/' if self.dirs_option else post_ext
+            file_name = FILE_FMT % (year, month, page, suffix)
+            if self.dirs_option:
+                base = urlpathjoin(save_dir, archive_dir)
+                arch = open_text(index_dir, archive_dir, file_name, dir_index)
+            else:
+                base = ''
+                arch = open_text(index_dir, archive_dir, file_name)
+
+            if page > 1:
+                pp = FILE_FMT % (year, month, page - 1, suffix)
+            else:
+                py, pm = next_month(-1)
+                pp = FILE_FMT % (py, pm, pages_per_month(py, pm), suffix) if py else ''
+                first_file = file_name
+
+            if page < pages_month:
+                np = FILE_FMT % (year, month, page + 1, suffix)
+            else:
+                ny, nm = next_month(+1)
+                np = FILE_FMT % (ny, nm, 1, suffix) if ny else ''
+
+            archive.append(self.blog.footer(base, pp, np))
+
+            with arch as archf:
+                archf.write('\n'.join(archive))
+
+        assert first_file is not None
+        return first_file
+
+
+class TagIndex(Index):
+    def __init__(
+        self, name: str, blog: TumblrBackup, posts_per_page: int, dirs: bool, reverse_month: bool, reverse_index: bool,
+        tag_index: bool,
+    ):
+        super().__init__(blog, posts_per_page, dirs=dirs, reverse_month=reverse_month, reverse_index=reverse_index,
+                         tag_index=tag_index, body_class='tag-archive')
+        self.name = name
+
+
+class Indices:
+    def __init__(
+        self, blog: TumblrBackup, posts_per_page: int, dirs: bool, reverse_month: bool, reverse_index: bool,
+        tag_index: bool,
+    ):
+        self.blog = blog
+        self.posts_per_page = posts_per_page
+        self.dirs_option = dirs
+        self.reverse_month = reverse_month
+        self.reverse_index = reverse_index
+        self.tag_index = tag_index
+        self.main_index = Index(blog, posts_per_page, dirs=dirs, reverse_month=reverse_month,
+                                reverse_index=reverse_index, tag_index=tag_index)
+        self.tags: dict[str, TagIndex] = {}
+
+    def build_index(self):
+        posts = (LocalPost(p, self.tag_index) for p in find_post_files(self.dirs_option))
+        for post in posts:
+            self.main_index.add_post(post)
+            if self.tag_index:
+                for tag, name in post.tags:
+                    if tag not in self.tags:
+                        self.tags[tag] = TagIndex(
+                            name, self.blog, self.posts_per_page, dirs=self.dirs_option,
+                            reverse_month=self.reverse_month, reverse_index=self.reverse_index,
+                            tag_index=self.tag_index,
+                        )
+                    self.tags[tag].name = name
+                    self.tags[tag].add_post(post)
+
+    def save_index(self):
+        self.main_index.save_index()
+        if self.tag_index:
+            self.save_tag_index()
+
+    def save_tag_index(self):
+        global save_dir
+        save_dir = '../../..'
+        mkdir(path_to(tag_index_dir))
+        tag_index = [self.blog.header('Tag index', 'tag-index', self.blog.title, avatar=True), '<ul>']
+        for tag, index in sorted(self.tags.items(), key=lambda kv: kv[1].name):
+            digest = hashlib.md5(to_bytes(tag)).hexdigest()
+            index.save_index(tag_index_dir + os.sep + digest, f'Tag ‛{index.name}’')
+            tag_index.append('    <li><a href={}>{}</a></li>'.format(
+                urlpathjoin(digest, dir_index), escape(index.name),
+            ))
+        tag_index.extend(['</ul>', ''])
+        with open_text(tag_index_dir, dir_index) as f:
+            f.write('\n'.join(tag_index))
+
+
+class TumblrBackup:
+    def __init__(self, options: Namespace, orig_options: dict[str, Any], get_arg_default: Callable[[str], Any]):
+        self.options = options
+        self.orig_options = orig_options
+        self.get_arg_default = get_arg_default
+        self.failed_blogs: list[str] = []
+        self.postfail_blogs: list[str] = []
+        self.total_count = 0
+        self.post_count = 0
+        self.filter_skipped = 0
+        self.title: str | None = None
+        self.subtitle: str | None = None
+        self.pa_options: JSONDict | None = None
+        self.media_list_file: TextIO | None = None
+        self.mlf_seen: set[int] = set()
+        self.mlf_lock = threading.Lock()
+
+    def exit_code(self):
+        if self.failed_blogs or self.postfail_blogs:
+            return EXIT_ERRORS
+        if self.total_count == 0 and not self.options.json_info:
+            return EXIT_NOPOSTS
+        return EXIT_SUCCESS
+
+    def header(self, title='', body_class='', subtitle='', avatar=False):
+        root_rel = {
+            'index': '', 'tag-index': '..', 'tag-archive': '../..',
+        }.get(body_class, save_dir)
+        css_rel = urlpathjoin(root_rel, custom_css if have_custom_css else backup_css)
+        if body_class:
+            body_class = ' class=' + body_class
+        h = textwrap.dedent("""\
+            <!DOCTYPE html>
+
+            <meta charset=%s>
+            <title>%s</title>
+            <link rel=stylesheet href=%s>
+
+            <body%s>
+
+            <header>
+            """ % (FILE_ENCODING, self.title, css_rel, body_class),
+        )
+        if avatar:
+            avatar_matches = find_files(path_to(theme_dir), match_avatar)
+            avatar_path = next(avatar_matches, None)
+            if avatar_path is not None:
+                h += '<img src={} alt=Avatar>\n'.format(urlpathjoin(root_rel, theme_dir, split(avatar_path)[1]))
+        if title:
+            h += '<h1>%s</h1>\n' % title
+        if subtitle:
+            h += '<p class=subtitle>%s</p>\n' % subtitle
+        h += '</header>\n'
+        return h
+
+    @staticmethod
+    def footer(base, previous_page, next_page):
+        f = '<footer><nav>'
+        f += '<a href={} rel=index>Index</a>\n'.format(urlpathjoin(save_dir, dir_index))
+        if previous_page:
+            f += '| <a href={} rel=prev>Previous</a>\n'.format(urlpathjoin(base, previous_page))
+        if next_page:
+            f += '| <a href={} rel=next>Next</a>\n'.format(urlpathjoin(base, next_page))
+        f += '</nav></footer>\n'
+        return f
+
+    @staticmethod
+    def get_post_timestamp(post, bs4_class):
+        if TYPE_CHECKING:
+            from bs4 import BeautifulSoup
+        else:
+            BeautifulSoup = bs4_class
+
+        with open(post, encoding=FILE_ENCODING) as pf:
+            soup = BeautifulSoup(pf, 'lxml')
+        postdate = cast(str, cast(Tag, soup.find('time'))['datetime'])
+        # The 'Z' suffix means UTC; a naive .timestamp() would assume local time, so subtract the epoch instead
+        return (datetime.strptime(postdate, '%Y-%m-%dT%H:%M:%SZ') - datetime(1970, 1, 1)) // timedelta(seconds=1)
+
+    def process_existing_backup(self, account, prev_archive):
+        complete_backup = os.path.exists(path_to('.complete'))
+        try:
+            with open(path_to('.first_run_options'), encoding=FILE_ENCODING) as f:
+                first_run_options = json.load(f)
+        except FileNotFoundError:
+            first_run_options = None
+
+        @dataclass(frozen=True)
+        class Options:
+            fro: dict[str, Any]
+            orig: dict[str, Any]
+            def differs(self, opt): return opt not in self.fro or self.orig[opt] != self.fro[opt]
+            def first(self, opts): return {opt: self.fro.get(opt, '<not present>') for opt in opts}
+            def this(self, opts): return {opt: self.orig[opt] for opt in opts}
+
+        # These options must always match
+        backdiff_nondef = None
+        if first_run_options is not None:
+            opts = Options(first_run_options, self.orig_options)
+            mustmatchdiff = tuple(filter(opts.differs, MUST_MATCH_OPTIONS))
+            if mustmatchdiff:
+                raise RuntimeError('{}: The script was given {} but the existing backup was made with {}'.format(
+                    account, opts.this(mustmatchdiff), opts.first(mustmatchdiff)))
+
+            backdiff = tuple(filter(opts.differs, BACKUP_CHANGING_OPTIONS))
+            if complete_backup:
+                # Complete archives may be added to with different options
+                if (
+                    self.options.resume
+                    and first_run_options.get('count') is None
+                    and (self.orig_options['period'] or [0, 0])[0] >= (first_run_options.get('period') or [0, 0])[0]
+                ):
+                    raise RuntimeError('{}: Cannot continue complete backup that was not stopped early with --count or '
+                                       '--period'.format(account))
+            elif self.options.resume:
+                backdiff_nondef = tuple(opt for opt in backdiff if self.orig_options[opt] != self.get_arg_default(opt))
+                if backdiff_nondef and not self.options.ignore_diffopt:
+                    raise RuntimeError('{}: The script was given {} but the existing backup was made with {}. You may '
+                                       'skip this check with --ignore-diffopt.'.format(
+                                            account, opts.this(backdiff_nondef), opts.first(backdiff_nondef)))
+            elif not backdiff:
+                raise RuntimeError('{}: Found incomplete archive, try --continue'.format(account))
+            elif not self.options.ignore_diffopt:
+                raise RuntimeError('{}: Refusing to make a different backup (with {} instead of {}) over an incomplete '
+                                   'archive. Delete the old backup to start fresh, or skip this check with '
+                                   '--ignore-diffopt (optionally with --continue).'.format(
+                                       account, opts.this(backdiff), opts.first(backdiff)))
+
+        pa_options = None
+        if prev_archive is not None:
+            try:
+                with open(join(prev_archive, '.first_run_options'), encoding=FILE_ENCODING) as f:
+                    pa_options = json.load(f)
+            except FileNotFoundError:
+                pa_options = None
+
+            # These options must always match
+            if pa_options is not None:
+                pa_opts = Options(pa_options, self.orig_options)
+                mustmatchdiff = tuple(filter(pa_opts.differs, PREV_MUST_MATCH_OPTIONS))
+                if mustmatchdiff:
+                    raise RuntimeError('{}: The script was given {} but the previous archive was made with {}'.format(
+                        account, pa_opts.this(mustmatchdiff), pa_opts.first(mustmatchdiff)))
+
+        oldest_tstamp = None
+        if self.options.resume or not complete_backup:
+            # Read every post to find the oldest timestamp already saved
+            post_glob = list(find_post_files(self.options.dirs))
+            if not self.options.resume:
+                pass  # No timestamp needed but may want to know if posts are present
+            elif not post_glob:
+                raise RuntimeError('{}: Cannot continue empty backup'.format(account))
+            else:
+                logger.warn('Found incomplete backup.\n', account=True)
+                BeautifulSoup = load_bs4('continue incomplete backup')
+                if self.options.likes:
+                    logger.warn('Finding oldest liked post (may take a while)\n', account=True)
+                    oldest_tstamp = min(self.get_post_timestamp(post, BeautifulSoup) for post in post_glob)
+                else:
+                    post_min = min(post_glob, key=lambda f: int(splitext(split(f)[1])[0]))
+                    oldest_tstamp = self.get_post_timestamp(post_min, BeautifulSoup)
+                logger.info(
+                    'Backing up posts before timestamp={} ({})\n'.format(oldest_tstamp, time.ctime(oldest_tstamp)),
+                    account=True,
+                )
+
+        write_fro = False
+        if backdiff_nondef is not None:
+            # Load saved options, unless they were overridden with --ignore-diffopt
+            for opt in BACKUP_CHANGING_OPTIONS:
+                if opt not in backdiff_nondef and opt in first_run_options:
+                    setattr(self.options, opt, first_run_options[opt])
+        else:
+            # Load original options
+            for opt in BACKUP_CHANGING_OPTIONS:
+                setattr(self.options, opt, self.orig_options[opt])
+            if first_run_options is None and not (complete_backup or post_glob):
+                # Presumably this is the initial backup of this blog
+                write_fro = True
+
+        if pa_options is None and prev_archive is not None:
+            # Fallback assumptions
+            logger.warn('Warning: Unknown media path options for previous archive, assuming they match ours\n',
+                        account=True)
+            pa_options = {opt: getattr(self.options, opt) for opt in MEDIA_PATH_OPTIONS}
+
+        return oldest_tstamp, pa_options, write_fro
+
+    def record_media(self, ident: int, urls: set[str]) -> None:
+        with self.mlf_lock:
+            if self.media_list_file is not None and ident not in self.mlf_seen:
+                json.dump(dict(post=ident, media=sorted(urls)), self.media_list_file, separators=(',', ':'))
+                self.media_list_file.write('\n')
+                self.mlf_seen.add(ident)
+
+    def backup(self, account, prev_archive):
+        """Make single files and an index for every post on a public Tumblr blog account."""
+
+        base = get_api_url(account, likes=self.options.likes)
+
+        # make sure there are folders to save in
+        global save_folder, media_folder, post_ext, post_dir, save_dir, have_custom_css
+        if self.options.json_info:
+            pass  # Not going to save anything
+        elif self.options.blosxom:
+            save_folder = root_folder
+            post_ext = '.txt'
+            post_dir = os.curdir
+            post_class: type[TumblrPost] = BlosxomPost
+        else:
+            save_folder = join(root_folder, self.options.outdir or account)
+            media_folder = path_to(media_dir)
+            if self.options.dirs:
+                post_ext = ''
+                save_dir = '../..'
+            post_class = TumblrPost
+            have_custom_css = os.access(path_to(custom_css), os.R_OK)
+
+        self.post_count = 0
+        self.filter_skipped = 0
+
+        oldest_tstamp, self.pa_options, write_fro = self.process_existing_backup(account, prev_archive)
+        check_optional_modules(self.options)
+
+        if self.options.idents:
+            # Normalize idents
+            self.options.idents.sort(reverse=True)
+
+        if self.options.incremental or self.options.resume:
+            post_glob = list(find_post_files(self.options.dirs))
+
+        ident_max = None
+        if self.options.incremental and post_glob:
+            if self.options.likes:
+                # Read every post to find the newest timestamp already saved
+                logger.warn('Finding newest liked post (may take a while)\n', account=True)
+                BeautifulSoup = load_bs4('backup likes incrementally')
+                ident_max = max(self.get_post_timestamp(post, BeautifulSoup) for post in post_glob)
+                logger.info('Backing up posts after timestamp={} ({})\n'.format(ident_max, time.ctime(ident_max)),
+                            account=True)
+            else:
+                # Get the highest post id already saved
+                ident_max = max(int(splitext(split(f)[1])[0]) for f in post_glob)
+                logger.info('Backing up posts after id={}\n'.format(ident_max), account=True)
+
+        if self.options.resume:
+            # Update skip and count based on where we left off
+            self.options.skip = 0
+            self.post_count = len(post_glob)
+
+        logger.status('Getting basic information\r')
+
+        api_parser = ApiParser(base, account, self.options)
+        if not api_parser.read_archive(prev_archive):
+            self.failed_blogs.append(account)
+            return
+        resp = api_parser.get_initial()
+        if not resp:
+            self.failed_blogs.append(account)
+            return
+
+        # collect all the meta information
+        if self.options.likes:
+            if not resp.get('blog', {}).get('share_likes', True):
+                logger.error('{} does not have public likes\n'.format(account))
+                self.failed_blogs.append(account)
+                return
+            posts_key = 'liked_posts'
+            blog = {}
+            count_estimate = resp['liked_count']
+        else:
+            posts_key = 'posts'
+            blog = resp.get('blog', {})
+            count_estimate = blog.get('posts')
+        self.title = escape(blog.get('title', account))
+        self.subtitle = blog.get('description', '')
+
+        if self.options.json_info:
+            posts = resp[posts_key]
+            info = {'uuid': blog.get('uuid'),
+                    'post_count': count_estimate,
+                    'last_post_ts': posts[0]['timestamp'] if posts else None}
+            json.dump(info, sys.stdout)
+            return
+
+        if write_fro:
+            # Blog directory gets created here
+            with open_text('.first_run_options') as f:
+                json.dump(self.orig_options, f)
+                f.write('\n')
+
+        def build_index():
+            logger.status('Getting avatar and style\r')
+            get_avatar(prev_archive, no_get=self.options.no_get)
+            get_style(prev_archive, no_get=self.options.no_get, use_dns_check=self.options.use_dns_check)
+            if not have_custom_css:
+                save_style()
+            logger.status('Building index\r')
+            ix = Indices(
+                self, self.options.posts_per_page, dirs=self.options.dirs, reverse_month=self.options.reverse_month,
+                reverse_index=self.options.reverse_index, tag_index=self.options.tag_index,
+            )
+            ix.build_index()
+            ix.save_index()
+
+            if not (account in self.failed_blogs or os.path.exists(path_to('.complete'))):
+                # Make .complete file
+                sf: int | None
+                if os.name == 'posix':  # Opening directories and fdatasync are POSIX features
+                    sf = opendir(save_folder, os.O_RDONLY)
+                else:
+                    sf = None
+                try:
+                    if sf is not None:
+                        fdatasync(sf)
+                    with open(open_file(lambda f: f, ('.complete',)), 'wb') as f:
+                        fsync(f)
+                    if sf is not None:
+                        fdatasync(sf)
+                finally:
+                    if sf is not None:
+                        os.close(sf)
+
+        if not self.options.blosxom and self.options.count == 0:
+            build_index()
+            return
+
+        # use the meta information to create an HTML header
+        TumblrPost.post_header = self.header(body_class='post')
+
+        jq_filter = request_sets = None
+        if self.options.filter is not None:
+            assert jq is not None
+            jq_filter = jq.compile(self.options.filter)
+        if self.options.request is not None:
+            request_sets = {typ: set(tags) for typ, tags in self.options.request.items()}
+
+        # start the thread pool
+        backup_pool = ThreadPool(self.options.threads)
+
+        before = self.options.period[1] if self.options.period else None
+        if oldest_tstamp is not None:
+            before = oldest_tstamp if before is None else min(before, oldest_tstamp)
+        if before is not None and api_parser.dashboard_only_blog:
+            logger.warn('Warning: skipping posts on a dashboard-only blog is slow\n', account=True)
+
+        # returns (keep_going, oldest_date): keep_going is False once a stop condition is hit
+        def _backup(posts):
+            def sort_key(x): return x['liked_timestamp'] if self.options.likes else int(x['id'])
+            oldest_date = None
+            for p in sorted(posts, key=sort_key, reverse=True):
+                no_internet.check()
+                enospc.check()
+                post = post_class(p, self.options, account, prev_archive, self.pa_options, self.record_media)
+                oldest_date = post.date
+                if before is not None and post.date >= before:
+                    if api_parser.dashboard_only_blog:
+                        continue  # cannot request 'before' with the svc API
+                    raise RuntimeError('Found post with date ({}) newer than before param ({})'.format(
+                        post.date, before))
+                if ident_max is None:
+                    pass  # No limit
+                elif (p['liked_timestamp'] if self.options.likes else int(post.ident)) <= ident_max:
+                    logger.info('Stopping backup: Incremental backup complete\n', account=True)
+                    return False, oldest_date
+                if self.options.period and post.date < self.options.period[0]:
+                    logger.info('Stopping backup: Reached end of period\n', account=True)
+                    return False, oldest_date
+                if next_ident is not None and int(post.ident) != next_ident:
+                    logger.error("post '{}' not found\n".format(next_ident), account=True)
+                    return False, oldest_date
+                if request_sets:
+                    if post.typ not in request_sets:
+                        continue
+                    tags = request_sets[post.typ]
+                    if not (TAG_ANY in tags or tags & {t.lower() for t in post.tags}):
+                        continue
+                if self.options.no_reblog and post_is_reblog(p):
+                    continue
+                if self.options.only_reblog and not post_is_reblog(p):
+                    continue
+                if jq_filter:
+                    try:
+                        matches = jq_filter.input(p).first()
+                    except StopIteration:
+                        matches = False
+                    if not matches:
+                        self.filter_skipped += 1
+                        continue
+                if os.path.exists(path_to(*post.get_path())) and self.options.no_post_clobber:
+                    continue  # Post exists and no-clobber enabled
+
+                with multicond:
+                    while backup_pool.queue.qsize() >= backup_pool.queue.maxsize:
+                        no_internet.check(release=True)
+                        enospc.check(release=True)
+                        # All conditions false, wait for a change
+                        multicond.wait((backup_pool.queue.not_full, no_internet.cond, enospc.cond))
+                    backup_pool.add_work(post.save_post)
+
+                self.post_count += 1
+                if self.options.count and self.post_count >= self.options.count:
+                    logger.info('Stopping backup: Reached limit of {} posts\n'.format(self.options.count), account=True)
+                    return False, oldest_date
+            return True, oldest_date
+
+        api_thread = AsyncCallable(main_thread_lock, api_parser.apiparse, 'API Thread')
+
+        next_ident: int | None = None
+        if self.options.idents is not None:
+            remaining_idents = self.options.idents.copy()
+            count_estimate = len(remaining_idents)
+
+        mlf: ContextManager[TextIO] | None
+        if self.options.media_list:
+            mlf = open_text('media.json', mode='r+')
+            self.media_list_file = mlf.__enter__()
+            self.mlf_seen.clear()
+            for line in self.media_list_file:
+                doc = json.loads(line)
+                self.mlf_seen.add(doc['post'])
+        else:
+            mlf = None
+
+        try:
+            # Get the JSON entries from the API, which we can only do for MAX_POSTS posts at once.
+            # Posts "arrive" in reverse chronological order. Post #0 is the most recent one.
+            i = self.options.skip
+
+            while True:
+                # find the upper bound
+                logger.status('Getting {}posts {} to {}{}\r'.format(
+                    'liked ' if self.options.likes else '', i, i + MAX_POSTS - 1,
+                    '' if count_estimate is None else ' (of {} expected)'.format(count_estimate),
+                ))
+
+                if self.options.idents is not None:
+                    try:
+                        next_ident = remaining_idents.pop(0)
+                    except IndexError:
+                        # if the last requested post does not get backed up we end up here
+                        logger.info('Stopping backup: End of requested posts\n', account=True)
+                        break
+
+                with multicond:
+                    api_thread.put(MAX_POSTS, i, before, next_ident)
+
+                    while not api_thread.response.qsize():
+                        no_internet.check(release=True)
+                        enospc.check(release=True)
+                        # All conditions false, wait for a change
+                        multicond.wait((api_thread.response.not_empty, no_internet.cond, enospc.cond))
+
+                    resp = api_thread.get(block=False)
+
+                if resp is None:
+                    self.failed_blogs.append(account)
+                    break
+
+                posts = resp[posts_key]
+                if not posts:
+                    logger.info('Backup complete: Found empty set of posts\n', account=True)
+                    break
+
+                res, oldest_date = _backup(posts)
+                if not res:
+                    break
+
+                if self.options.likes and prev_archive is None:
+                    next_ = resp['_links'].get('next')
+                    if next_ is None:
+                        logger.info('Backup complete: Found end of likes\n', account=True)
+                        break
+                    before = int(next_['query_params']['before'])
+                elif before is not None and not api_parser.dashboard_only_blog:
+                    assert oldest_date <= before
+                    if oldest_date == before:
+                        oldest_date -= 1
+                    before = oldest_date
+
+                if self.options.idents is None:
+                    i += MAX_POSTS
+                else:
+                    i += 1
+
+            api_thread.quit()
+            backup_pool.wait()  # wait until all posts have been saved
+        except:
+            api_thread.quit()
+            backup_pool.cancel()  # ensure proper thread pool termination
+            raise
+        finally:
+            if mlf is not None:
+                mlf.__exit__(*sys.exc_info())
+                self.media_list_file = None
+
+        if backup_pool.errors:
+            self.postfail_blogs.append(account)
+
+        # postprocessing
+        if not self.options.blosxom and self.post_count:
+            build_index()
+
+        logger.status(None)
+        skipped_msg = (', {} did not match filter'.format(self.filter_skipped)) if self.filter_skipped else ''
+        logger.warn(
+            '{} {}posts backed up{}\n'.format(self.post_count, 'liked ' if self.options.likes else '', skipped_msg),
+            account=True,
+        )
+        self.total_count += self.post_count
+
+
+class TumblrPost:
+    post_header = ''  # set by TumblrBackup.backup()
+
+    def __init__(
+        self,
+        post: JSONDict,
+        options: Namespace,
+        backup_account: str,
+        prev_archive: str | None,
+        pa_options: JSONDict | None,
+        record_media: Callable[[int, set[str]], None],
+    ) -> None:
+        self.post = post
+        self.options = options
+        self.backup_account = backup_account
+        self.prev_archive = prev_archive
+        self.pa_options = pa_options
+        self.record_media = record_media
+        self.post_media: set[str] = set()
+        self.creator = post.get('blog_name') or post['tumblelog']
+        self.ident = str(post['id'])
+        self.url = post['post_url']
+        self.shorturl = post['short_url']
+        self.typ = str(post['type'])
+        self.date: float = post['liked_timestamp' if options.likes else 'timestamp']
+        self.isodate = datetime.utcfromtimestamp(self.date).isoformat() + 'Z'
+        self.tm = time.localtime(self.date)
+        self.title = ''
+        self.tags: list[str] = post['tags']
+        self.note_count = post.get('note_count')
+        if self.note_count is None:
+            self.note_count = post.get('notes', {}).get('count')
+        if self.note_count is None:
+            self.note_count = 0
+        self.reblogged_from = post.get('reblogged_from_url')
+        self.reblogged_root = post.get('reblogged_root_url')
+        self.source_title = post.get('source_title', '')
+        self.source_url = post.get('source_url', '')
+        self.file_name = join(self.ident, dir_index) if options.dirs else self.ident + post_ext
+        self.llink = self.ident if options.dirs else self.file_name
+        self.media_dir = join(post_dir, self.ident) if options.dirs else media_dir
+        self.media_url = urlpathjoin(save_dir, self.media_dir)
+        self.media_folder = path_to(self.media_dir)
+
+    def get_content(self):
+        """generates the content for this post"""
+        post = self.post
+        content = []
+        self.post_media.clear()
+
+        def append(s, fmt='%s'):
+            content.append(fmt % s)
+
+        def get_try(elt) -> Any | Literal['']:
+            return post.get(elt, '')
+
+        def append_try(elt, fmt='%s'):
+            elt = get_try(elt)
+            if elt:
+                if self.options.save_images:
+                    elt = re.sub(r"""(?i)(<img\s(?:[^>]*\s)?src\s*=\s*["'])(.*?)(["'][^>]*>)""",
+                                 self.get_inline_image, elt)
+                if self.options.save_video or self.options.save_video_tumblr:
+                    # Handle video element poster attribute
+                    elt = re.sub(r"""(?i)(<video\s(?:[^>]*\s)?poster\s*=\s*["'])(.*?)(["'][^>]*>)""",
+                                 self.get_inline_video_poster, elt)
+                    # Handle video element's source sub-element's src attribute
+                    elt = re.sub(r"""(?i)(<source\s(?:[^>]*\s)?src\s*=\s*["'])(.*?)(["'][^>]*>)""",
+                                 self.get_inline_video, elt)
+                append(elt, fmt)
+
+        if self.typ == 'text':
+            self.title = get_try('title')
+            append_try('body')
+
+        elif self.typ == 'photo':
+            url = get_try('link_url')
+            is_photoset = len(post['photos']) > 1
+            for offset, p in enumerate(post['photos'], start=1):
+                o = p['alt_sizes'][0] if 'alt_sizes' in p else p['original_size']
+                src = o['url']
+                if self.options.save_images:
+                    src = self.get_image_url(src, offset if is_photoset else 0)
+                append(escape(src), '<img alt="" src="%s">')
+                if url:
+                    content[-1] = '<a href="%s">%s</a>' % (escape(url), content[-1])
+                content[-1] = '<p>' + content[-1] + '</p>'
+                if p['caption']:
+                    append(p['caption'], '<p>%s</p>')
+            append_try('caption')
+
+        elif self.typ == 'link':
+            url = post['url']
+            self.title = '<a href="%s">%s</a>' % (escape(url), post['title'] or url)
+            append_try('description')
+
+        elif self.typ == 'quote':
+            append(post['text'], '<blockquote><p>%s</p></blockquote>')
+            append_try('source', '<p>%s</p>')
+
+        elif self.typ == 'video':
+            src = ''
+            if (
+                (self.options.save_video or self.options.save_video_tumblr)
+                and post['video_type'] == 'tumblr'
+            ):
+                src = self.get_media_url(post['video_url'], '.mp4')
+            elif self.options.save_video:
+                src = self.get_youtube_url(self.url)
+                if not src:
+                    logger.warn('Unable to download video in post #{}\n'.format(self.ident))
+            if src:
+                append('<p><video controls><source src="%s" type=video/mp4>%s<br>\n<a href="%s">%s</a></video></p>' % (
+                    src, 'Your browser does not support the video element.', src, 'Video file',
+                ))
+            else:
+                player = get_try('player')
+                if player:
+                    append(player[-1]['embed_code'])
+                else:
+                    append_try('video_url')
+            append_try('caption')
+
+        elif self.typ == 'audio':
+            def make_player(src):
+                append(textwrap.dedent(
+                    f'<p><audio controls><source src="{src}" type=audio/mpeg>'
+                    f'Your browser does not support the audio element.<br>\n<a href="{src}">Audio file</a></audio></p>',
+                ))
+
+            src = None
+            audio_url = get_try('audio_url') or get_try('audio_source_url')
+            if self.options.save_audio:
+                if post['audio_type'] == 'tumblr':
+                    if audio_url.startswith('https://a.tumblr.com/'):
+                        src = self.get_media_url(audio_url, '.mp3')
+                    elif audio_url.startswith('https://www.tumblr.com/audio_file/'):
+                        audio_url = 'https://a.tumblr.com/{}o1.mp3'.format(urlbasename(urlparse(audio_url).path))
+                        src = self.get_media_url(audio_url, '.mp3')
+                elif post['audio_type'] == 'soundcloud':
+                    src = self.get_media_url(audio_url, '.mp3')
+            player = get_try('player')
+            if src:
+                make_player(src)
+            elif player:
+                append(player)
+            elif audio_url:
+                make_player(audio_url)
+            append_try('caption')
+
+        elif self.typ == 'answer':
+            self.title = post['question']
+            append_try('answer')
+
+        elif self.typ == 'chat':
+            self.title = get_try('title')
+            append(
+                '<br>\n'.join('%(label)s %(phrase)s' % d for d in post['dialogue']),
+                '<p>%s</p>',
+            )
+
+        else:
+            logger.warn("Unknown post type '{}' in post #{}\n".format(self.typ, self.ident))
+            append(escape(self.get_json_content()), '<pre>%s</pre>')
+
+        # Write URLs to media.json
+        self.record_media(int(self.ident), self.post_media)
+
+        content_str = '\n'.join(content)
+
+        # fix wrongly nested HTML elements
+        for p in ('<p>(<({})>)', '(</({})>)</p>'):  # noqa: P103
+            content_str = re.sub(p.format('p|ol|iframe[^>]*'), r'\1', content_str)
+
+        return content_str
+
+    def get_youtube_url(self, youtube_url):
+        # determine the media file name
+        filetmpl = '%(id)s_%(uploader_id)s_%(title)s.%(ext)s'
+        ydl_options = {
+            'outtmpl': join(self.media_folder, filetmpl),
+            'quiet': True,
+            'restrictfilenames': True,
+            'noplaylist': True,
+            'continuedl': True,
+            'nooverwrites': True,
+            'retries': 3000,
+            'fragment_retries': 3000,
+            'ignoreerrors': True,
+        }
+        if self.options.cookiefile is not None:
+            ydl_options['cookiefile'] = self.options.cookiefile
+
+        if TYPE_CHECKING:
+            import youtube_dl
+        else:
+            youtube_dl = import_youtube_dl()
+
+        ydl = youtube_dl.YoutubeDL(ydl_options)
+        ydl.add_default_info_extractors()
+        try:
+            result = ydl.extract_info(youtube_url, download=False)
+            media_filename = youtube_dl.utils.sanitize_filename(filetmpl % result['entries'][0], restricted=True)
+        except Exception:
+            return ''
+
+        # check if a file with this name already exists
+        if not os.path.isfile(join(self.media_folder, media_filename)):
+            try:
+                ydl.extract_info(youtube_url, download=True)
+            except Exception:
+                return ''
+        return urlpathjoin(self.media_url, split(media_filename)[1])
+
+    def get_media_url(self, media_url, extension):
+        if not media_url:
+            return ''
+        saved_name = self.download_media(media_url, extension=extension)
+        if saved_name is not None:
+            return urlpathjoin(self.media_url, saved_name)
+        return media_url
+
+    def get_image_url(self, image_url, offset):
+        """Saves an image if not saved yet. Returns the new URL or
+        the original URL in case of download errors."""
+        saved_name = self.download_media(image_url, offset='_o%s' % offset if offset else '')
+        if saved_name is not None:
+            if self.options.exif and saved_name.endswith('.jpg'):
+                add_exif(join(self.media_folder, saved_name), set(self.tags), self.options.exif)
+            return urlpathjoin(self.media_url, saved_name)
+        return image_url
+
+    @staticmethod
+    def maxsize_image_url(image_url):
+        if '.tumblr.com/' not in image_url or image_url.endswith('.gif'):
+            return image_url
+        # change the image resolution to 1280
+        return re.sub(r'_\d{2,4}(\.\w+)$', r'_1280\1', image_url)
+
+    def get_inline_image(self, match):
+        """Saves an inline image if not saved yet. Returns the new <img> tag or
+        the original one in case of download errors."""
+        image_url, image_filename = self._parse_url_match(match, transform=self.maxsize_image_url)
+        if not image_filename or not image_url.startswith('http'):
+            return match.group(0)
+        saved_name = self.download_media(image_url, filename=image_filename)
+        if saved_name is None:
+            return match.group(0)
+        return match.group(1) + self.media_url + '/' + saved_name + match.group(3)
+
+    def get_inline_video_poster(self, match):
+        """Saves an inline video poster if not saved yet. Returns the new
+        <video> tag or the original one in case of download errors."""
+        poster_url, poster_filename = self._parse_url_match(match)
+        if not poster_filename or not poster_url.startswith('http'):
+            return match.group(0)
+        saved_name = self.download_media(poster_url, filename=poster_filename)
+        if saved_name is None:
+            return match.group(0)
+        # get rid of autoplay and muted attributes to align with normal video
+        # download behaviour
+        el = '%s%s/%s%s' % (match.group(1), self.media_url, saved_name, match.group(3))
+        return el.replace('autoplay="autoplay"', '').replace('muted="muted"', '')
+
+    def get_inline_video(self, match):
+        """Saves an inline video if not saved yet. Returns the new <video> tag
+        or the original one in case of download errors."""
+        video_url, video_filename = self._parse_url_match(match)
+        if not video_filename or not video_url.startswith('http'):
+            return match.group(0)
+        saved_name = None
+        if '.tumblr.com' in video_url:
+            saved_name = self.get_media_url(video_url, '.mp4')
+        elif self.options.save_video:
+            saved_name = self.get_youtube_url(video_url)
+        if not saved_name:  # None: not attempted; '': download failed
+            return match.group(0)
+        return '%s%s%s' % (match.group(1), saved_name, match.group(3))
+
+    def get_filename(self, parsed_url, image_names, offset=''):
+        """Determine the image file name depending on image_names"""
+        fname = urlbasename(parsed_url.path)
+        ext = urlsplitext(fname)[1]
+        if parsed_url.query:
+            # Insert the query string to avoid ambiguity for certain URLs (e.g. SoundCloud embeds).
+            query_sep = '@' if os.name == 'nt' else '?'
+            if ext:
+                # ext already includes the leading dot (see the returns below)
+                fname = fname[:-len(ext)] + query_sep + parsed_url.query + ext
+            else:
+                fname = fname + query_sep + parsed_url.query
+        if image_names == 'i':
+            return self.ident + offset + ext
+        if image_names == 'bi':
+            return self.backup_account + '_' + self.ident + offset + ext
+        # delete characters not allowed under Windows
+        return re.sub(r'[:<>"/\\|*?]', '', fname) if os.name == 'nt' else fname
+
+    def download_media(self, url, filename=None, offset='', extension=None):
+        parsed_url = urlparse(url, 'http')
+        hostname = parsed_url.hostname
+        if parsed_url.scheme not in ('http', 'https') or not hostname:
+            return None  # This URL does not follow our basic assumptions
+
+        # Make a sane directory to represent the host
+        try:
+            hostname = hostname.encode('idna').decode('ascii')
+        except UnicodeError:
+            pass
+        if hostname in ('.', '..'):
+            hostname = hostname.replace('.', '%2E')
+        if parsed_url.port not in (None, (80 if parsed_url.scheme == 'http' else 443)):
+            hostname += '{}{}'.format('+' if os.name == 'nt' else ':', parsed_url.port)
+
+        def get_path(media_dir, image_names, hostdirs):
+            if filename is not None:
+                fname = filename
+            else:
+                fname = self.get_filename(parsed_url, image_names, offset)
+                if extension is not None:
+                    fname = splitext(fname)[0] + extension
+            return media_dir, *((hostname,) if hostdirs else ()), fname
+
+        path_parts = get_path(self.media_dir, self.options.image_names, self.options.hostdirs)
+        media_path = path_to(*path_parts)
+
+        # prevent racing of existence check and download
+        with downloading_media_cond:
+            while media_path in downloading_media:
+                downloading_media_cond.wait()
+            downloading_media.add(media_path)
+
+        try:
+            return self._download_media_inner(url, get_path, path_parts, media_path)
+        finally:
+            with downloading_media_cond:
+                downloading_media.remove(media_path)
+                downloading_media_cond.notify_all()
+
+    def get_post(self):
+        """returns this post in HTML"""
+        typ = ('liked-' if self.options.likes else '') + self.typ
+        post = self.post_header + '<article class=%s id=p-%s>\n' % (typ, self.ident)
+        post += '<header>\n'
+        if self.options.likes:
+            post += '<p><a href="https://{0}.tumblr.com/" class="tumblr_blog">{0}</a>:</p>\n'.format(self.creator)
+        post += '<p><time datetime=%s>%s</time>\n' % (self.isodate, strftime('%x %X', self.tm))
+        post += '<a class=llink href={}>¶</a>\n'.format(urlpathjoin(save_dir, post_dir, self.llink))
+        post += '<a href=%s>●</a>\n' % self.shorturl
+        if self.reblogged_from and self.reblogged_from != self.reblogged_root:
+            post += '<a href=%s>⬀</a>\n' % self.reblogged_from
+        if self.reblogged_root:
+            post += '<a href=%s>⬈</a>\n' % self.reblogged_root
+        post += '</header>\n'
+        content = self.get_content()
+        if self.title:
+            post += '<h2>%s</h2>\n' % self.title
+        post += content
+        foot = []
+        if self.tags:
+            foot.append(''.join(self.tag_link(t) for t in self.tags))
+        if self.source_title and self.source_url:
+            foot.append(f'<a title=Source href={self.source_url}>{self.source_title}</a>')
+
+        notes_html = ''
+
+        if self.options.save_notes or self.options.copy_notes:
+            if TYPE_CHECKING:
+                from bs4 import BeautifulSoup
+            else:
+                BeautifulSoup = load_bs4('save notes' if self.options.save_notes else 'copy notes')
+
+        if self.options.copy_notes:
+            # Copy notes from prev_archive (or here)
+            prev_archive = save_folder if self.options.reuse_json else self.prev_archive
+            assert prev_archive is not None
+            try:
+                with open(join(prev_archive, post_dir, self.ident + post_ext)) as post_file:
+                    soup = BeautifulSoup(post_file, 'lxml')
+            except FileNotFoundError:
+                pass  # skip
+            else:
+                notes = cast('Tag | None', soup.find('ol', class_='notes'))
+                if notes is not None:
+                    notes_html = ''.join([n.prettify() for n in notes.find_all('li')])
+
+        if self.options.save_notes and self.backup_account not in disable_note_scraper and not notes_html.strip():
+            from . import note_scraper
+
+            # Scrape and save notes
+            while True:
+                ns_stdout_rd, ns_stdout_wr = multiprocessing.Pipe(duplex=False)
+                ns_msg_queue: SimpleQueue[tuple[LogLevel, str]] = multiprocessing.SimpleQueue()
+                try:
+                    args = (
+                        ns_stdout_wr, ns_msg_queue, self.url, self.ident, self.options.no_ssl_verify,
+                        self.options.user_agent, self.options.cookiefile, self.options.notes_limit,
+                        self.options.use_dns_check,
+                    )
+                    process = multiprocessing.Process(target=note_scraper.main, args=args)
+                    process.start()
+                except:
+                    ns_stdout_rd.close()
+                    ns_msg_queue._reader.close()  # type: ignore[attr-defined]
+                    raise
+                finally:
+                    ns_stdout_wr.close()
+                    ns_msg_queue._writer.close()  # type: ignore[attr-defined]
+
+                try:
+                    try:
+                        while True:
+                            level, msg = ns_msg_queue.get()
+                            logger.log(level, msg)
+                    except EOFError:
+                        pass  # Exit loop
+                    finally:
+                        ns_msg_queue.close()  # type: ignore[attr-defined]
+
+                    with ConnectionFile(ns_stdout_rd) as stdout:
+                        notes_html = stdout.read()
+
+                    process.join()
+                except:
+                    process.terminate()
+                    process.join()
+                    raise
+
+                if process.exitcode == 2:  # EXIT_SAFE_MODE
+                    # Safe mode is blocking us, disable note scraping for this blog
+                    notes_html = ''
+                    with disablens_lock:
+                        # Check if another thread already set this
+                        if self.backup_account not in disable_note_scraper:
+                            disable_note_scraper.add(self.backup_account)
+                            logger.info(
+                                f'[Note Scraper] Blocked by safe mode - scraping disabled for {self.backup_account}\n',
+                            )
+                elif process.exitcode == 3:  # EXIT_NO_INTERNET
+                    no_internet.signal()
+                    continue
+                break
+
+        notes_str = '{} note{}'.format(self.note_count, 's'[self.note_count == 1:])
+        if notes_html.strip():
+            foot.append('<details><summary>{}</summary>\n'.format(notes_str))
+            foot.append('<ol class="notes">')
+            foot.append(notes_html)
+            foot.append('</ol></details>')
+        else:
+            foot.append(notes_str)
+
+        if foot:
+            post += '\n<footer>{}</footer>'.format('\n'.join(foot))
+        post += '\n</article>\n'
+        return post
+
+    @staticmethod
+    def tag_link(tag):
+        tag_disp = escape(TAG_FMT.format(tag))
+        if not TAGLINK_FMT:
+            return tag_disp + ' '
+        url = TAGLINK_FMT.format(domain=blog_name, tag=quote(to_bytes(tag)))
+        return '<a href=%s>%s</a>\n' % (url, tag_disp)
+
+    def get_path(self):
+        return (post_dir, self.ident, dir_index) if self.options.dirs else (post_dir, self.file_name)
+
+    def save_post(self):
+        """saves this post locally"""
+        if self.options.json and not self.options.reuse_json:
+            with open_text(json_dir, self.ident + '.json') as f:
+                f.write(self.get_json_content())
+        path_parts = self.get_path()
+        try:
+            with open_text(*path_parts) as f:
+                f.write(self.get_post())
+            os.utime(path_to(*path_parts), (self.date, self.date))
+        except Exception:
+            logger.error('Caught exception while saving post {}:\n{}'.format(self.ident, traceback.format_exc()))
+            return False
+        return True
+
+    def get_json_content(self):
+        return json.dumps(self.post, sort_keys=True, indent=4, separators=(',', ': '))
+
+    def _download_media_inner(self, url, get_path, path_parts, media_path):
+        self.post_media.add(url)
+
+        if self.prev_archive is None:
+            cpy_res = False
+        else:
+            assert self.pa_options is not None
+            pa_path_parts = get_path(
+                join(post_dir, self.ident) if self.pa_options['dirs'] else media_dir,
+                self.pa_options['image_names'], self.pa_options['hostdirs'],
+            )
+            cpy_res = maybe_copy_media(self.prev_archive, path_parts, pa_path_parts)
+        file_exists = os.path.exists(media_path)
+        if not (cpy_res or file_exists):
+            if self.options.no_get:
+                return None
+            # We don't have the media and we want it
+            assert wget_retrieve is not None
+            dstpath = open_file(lambda f: f, path_parts)
+            try:
+                wget_retrieve(url, dstpath, post_id=self.ident, post_timestamp=self.post['timestamp'])
+            except WGError as e:
+                e.log()
+                return None
+        if file_exists:
+            try:
+                st = os.stat(media_path)
+            except FileNotFoundError:
+                pass  # skip
+            else:
+                if st.st_mtime > self.post['timestamp']:
+                    touch(media_path, self.post['timestamp'])
+
+        return path_parts[-1]
+
+    @staticmethod
+    def _parse_url_match(match, transform=None):
+        url = match.group(2)
+        if url.startswith('//'):
+            url = 'https:' + url
+        if transform is not None:
+            url = transform(url)
+        filename = urlbasename(urlparse(url).path)
+        return url, filename
+
+
+class BlosxomPost(TumblrPost):
+    def get_image_url(self, image_url, offset):
+        return image_url
+
+    def get_post(self):
+        """returns this post as a Blosxom post"""
+        post = self.title + '\nmeta-id: p-' + self.ident + '\nmeta-url: ' + self.url
+        if self.tags:
+            post += '\nmeta-tags: ' + ' '.join(t.replace(' ', '+') for t in self.tags)
+        post += '\n\n' + self.get_content()
+        return post
+
+
+class LocalPost:
+    def __init__(self, post_file: str, tag_index: bool):
+        self.post_file = post_file
+        if tag_index:
+            with open(post_file, encoding=FILE_ENCODING) as f:
+                post = f.read()
+            # extract all URL-encoded tags
+            self.tags: list[tuple[str, str]] = []
+            footer_pos = post.find('<footer>')
+            if footer_pos > 0:
+                self.tags = re.findall(r'<a.+?/tagged/(.+?)>#(.+?)</a>', post[footer_pos:])
+        parts = post_file.split(os.sep)
+        if parts[-1] == dir_index:  # .../<post_id>/index.html
+            self.file_name = join(*parts[-2:])
+            self.ident = parts[-2]
+        else:
+            self.file_name = parts[-1]
+            self.ident = splitext(self.file_name)[0]
+        self.date: float = os.stat(post_file).st_mtime
+        self.tm = time.localtime(self.date)
+
+    def get_post(self, in_tag_index):
+        with open(self.post_file, encoding=FILE_ENCODING) as f:
+            post = f.read()
+        # remove header and footer
+        lines = post.split('\n')
+        while lines and '<article ' not in lines[0]:
+            del lines[0]
+        while lines and '</article>' not in lines[-1]:
+            del lines[-1]
+        post = '\n'.join(lines)
+        if in_tag_index:
+            # fixup all media links which now have to be two folders lower
+            shallow_media = urlpathjoin('..', media_dir)
+            deep_media = urlpathjoin(save_dir, media_dir)
+            post = post.replace(shallow_media, deep_media)
+        return post
+
+
+class ThreadPool:
+    queue: LockedQueue[Callable[[], None]]
+
+    def __init__(self, threads: int, max_queue: int = 1000):
+        self.queue = LockedQueue(main_thread_lock, max_queue)
+        self.quit = threading.Condition(main_thread_lock)
+        self.quit_flag = False
+        self.abort_flag = False
+        self.errors = False
+        self.threads = [threading.Thread(target=self.handler) for _ in range(threads)]
+        for t in self.threads:
+            t.start()
+
+    def add_work(self, *args, **kwargs):
+        self.queue.put(*args, **kwargs)
+
+    def wait(self):
+        with multicond:
+            self._print_remaining(self.queue.qsize())
+            self.quit_flag = True
+            self.quit.notify_all()
+            while self.queue.unfinished_tasks:
+                no_internet.check(release=True)
+                enospc.check(release=True)
+                # All conditions false, wait for a change
+                multicond.wait((self.queue.all_tasks_done, no_internet.cond, enospc.cond))
+
+    def cancel(self):
+        with main_thread_lock:
+            self.abort_flag = True
+            self.quit.notify_all()
+            no_internet.destroy()
+            enospc.destroy()
+
+        for i, t in enumerate(self.threads, start=1):
+            logger.status('Stopping threads {}{}\r'.format(' ' * i, '.' * (len(self.threads) - i)))
+            t.join()
+
+        logger.info('Backup canceled.\n')
+
+        with main_thread_lock:
+            self.queue.queue.clear()
+            self.queue.all_tasks_done.notify_all()
+
+    def handler(self):
+        def wait_for_work():
+            while not self.abort_flag:
+                if self.queue.qsize():
+                    return True
+                elif self.quit_flag:
+                    break
+                # All conditions false, wait for a change
+                multicond.wait((self.queue.not_empty, self.quit))
+            return False
+
+        while True:
+            with multicond:
+                if not wait_for_work():
+                    break
+                work = self.queue.get(block=False)
+                qsize = self.queue.qsize()
+                if self.quit_flag and qsize % REM_POST_INC == 0:
+                    self._print_remaining(qsize)
+
+            try:
+                while True:
+                    try:
+                        success = work()
+                        break
+                    except OSError as e:
+                        if e.errno == errno.ENOSPC:
+                            enospc.signal()
+                            continue
+                        raise
+            finally:
+                self.queue.task_done()
+            if not success:
+                self.errors = True
+
+    @staticmethod
+    def _print_remaining(qsize):
+        if qsize:
+            logger.status('{} remaining posts to save\r'.format(qsize))
+        else:
+            logger.status('Waiting for worker threads to finish\r')
+
+
+def main():
+    global wget_retrieve
+
+    # The default of 'fork' can cause deadlocks, even on Linux
+    # See https://bugs.python.org/issue40399
+    if 'forkserver' in multiprocessing.get_all_start_methods():
+        multiprocessing.set_start_method('forkserver')  # Fastest safe option, if supported
+    else:
+        multiprocessing.set_start_method('spawn')  # Slow but safe
+
+    # Raises SystemExit to terminate gracefully
+    def handle_term_signal(signum, frame):
+        if sys.is_finalizing():
+            return  # Not a good time to exit
+        sys.exit(1)
+    signal.signal(signal.SIGTERM, handle_term_signal)
+    if hasattr(signal, 'SIGHUP'):
+        signal.signal(signal.SIGHUP, handle_term_signal)
+
+
+    config_dir = platformdirs.user_config_dir('tumblr-backup', roaming=True, ensure_exists=True)
+    config_file = Path(config_dir) / 'config.json'
+
+    if '--set-api-key' in sys.argv[1:]:
+        # special argument parsing
+        opt, *args = sys.argv[1:]
+        if opt != '--set-api-key' or len(args) != 1:
+            print(f'{Path(sys.argv[0]).name}: invalid usage', file=sys.stderr)
+            return 1
+        api_key, = args
+
+        try:
+            fd = os.open(config_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
+            file_exists = False
+        except FileExistsError:
+            fd = os.open(config_file, os.O_RDWR, 0o644)
+            file_exists = True
+
+        with open(fd, 'r+') as f:
+            cfg = json.load(f) if file_exists else {}
+            cfg['oauth_consumer_key'] = api_key
+            f.seek(0)
+            f.truncate()
+            json.dump(cfg, f, indent=4)
+            f.write('\n')
+        return 0
+
+
+    no_internet.setup(main_thread_lock)
+    enospc.setup(main_thread_lock)
+
+    class CSVCallback(argparse.Action):
+        def __call__(self, parser, namespace, values, option_string=None):
+            setattr(namespace, self.dest, list(values.split(',')))
+
+    class RequestCallback(argparse.Action):
+        def __call__(self, parser, namespace, values, option_string=None):
+            request = getattr(namespace, self.dest) or {}
+            for req in values.lower().split(','):
+                parts = req.strip().split(':')
+                typ = parts.pop(0)
+                if typ != TYPE_ANY and typ not in POST_TYPES:
+                    parser.error("{}: invalid post type '{}'".format(option_string, typ))
+                types = POST_TYPES if typ == TYPE_ANY else (typ,)
+                for typ in types:
+                    if not parts:
+                        request[typ] = [TAG_ANY]
+                        continue
+                    if typ not in request:
+                        request[typ] = []
+                    request[typ].extend(parts)
+            setattr(namespace, self.dest, request)
+
+    class TagsCallback(RequestCallback):
+        def __call__(self, parser, namespace, values, option_string=None):
+            super().__call__(
+                parser, namespace, TYPE_ANY + ':' + values.replace(',', ':'), option_string,
+            )
+
+    class PeriodCallback(argparse.Action):
+        def __call__(self, parser, namespace, values, option_string=None):
+            try:
+                pformat = {'y': '%Y', 'm': '%Y%m', 'd': '%Y%m%d'}[values]
+            except KeyError:
+                periods = values.replace('-', '').split(',')
+                if not all(re.match(r'\d{4}(\d\d)?(\d\d)?Z?$', p) for p in periods):
+                    parser.error("Period must be 'y', 'm', 'd' or YYYY[MM[DD]][Z]")
+                if not (1 <= len(periods) < 3):
+                    parser.error('Period must have either one year/month/day or a start and end')
+                prange = parse_period_date(periods.pop(0))
+                if periods:
+                    prange[1] = parse_period_date(periods.pop(0))[0]
+            else:
+                period = time.strftime(pformat)
+                prange = parse_period_date(period)
+            setattr(namespace, self.dest, prange)
+
+    class IdFileCallback(argparse.Action):
+        def __call__(self, parser, namespace, values, option_string=None):
+            with open(values) as f:
+                lines = (l.rstrip('\n') for l in f)
+                setattr(namespace, self.dest, sorted(
+                    (int(line) for line in lines if line), reverse=True,
+                ))
+
+    parser = argparse.ArgumentParser(usage='%(prog)s [options] blog-name ...',
+                                     description='Makes a local backup of Tumblr blogs.')
+    postexist_group = parser.add_mutually_exclusive_group()
+    reblog_group = parser.add_mutually_exclusive_group()
+    parser.add_argument('-O', '--outdir', help='set the output directory (default: blog-name)')
+    parser.add_argument('-D', '--dirs', action='store_true', help='save each post in its own folder')
+    parser.add_argument('-q', '--quiet', action='store_true', help='suppress progress messages')
+    postexist_group.add_argument('-i', '--incremental', action='store_true', help='incremental backup mode')
+    parser.add_argument('-l', '--likes', action='store_true', help="save a blog's likes, not its posts")
+    parser.add_argument('-k', '--skip-images', action='store_false', dest='save_images',
+                        help='do not save images; link to Tumblr instead')
+    parser.add_argument('--save-video', action='store_true', help='save all video files')
+    parser.add_argument('--save-video-tumblr', action='store_true', help='save only Tumblr video files')
+    parser.add_argument('--save-audio', action='store_true', help='save audio files')
+    parser.add_argument('--save-notes', action='store_true', help='save a list of notes for each post')
+    parser.add_argument('--copy-notes', action='store_true', default=None,
+                        help='copy the notes list from a previous archive (inverse: --no-copy-notes)')
+    parser.add_argument('--no-copy-notes', action='store_false', default=None, dest='copy_notes',
+                        help=argparse.SUPPRESS)
+    parser.add_argument('--notes-limit', type=int, metavar='COUNT', help='limit requested notes to COUNT, per-post')
+    parser.add_argument('--cookiefile', help='cookie file for youtube-dl, --save-notes, and svc API')
+    parser.add_argument('-j', '--json', action='store_true', help='save the original JSON source')
+    parser.add_argument('-b', '--blosxom', action='store_true', help='save the posts in blosxom format')
+    parser.add_argument('-r', '--reverse-month', action='store_false',
+                        help='reverse the post order in the monthly archives')
+    parser.add_argument('-R', '--reverse-index', action='store_false', help='reverse the index file order')
+    parser.add_argument('--tag-index', action='store_true', help='also create an archive per tag')
+    postexist_group.add_argument('-a', '--auto', type=int, metavar='HOUR',
+                                 help='do a full backup at HOUR hours, otherwise do an incremental backup'
+                                      ' (useful for cron jobs)')
+    parser.add_argument('-n', '--count', type=int, help='save only COUNT posts')
+    parser.add_argument('-s', '--skip', type=int, default=0, help='skip the first SKIP posts')
+    parser.add_argument('-p', '--period', action=PeriodCallback,
+                        help="limit the backup to PERIOD ('y', 'm', 'd', YYYY[MM[DD]][Z], or START,END)")
+    parser.add_argument('-N', '--posts-per-page', type=int, default=50, metavar='COUNT',
+                        help='set the number of posts per monthly page, 0 for unlimited')
+    parser.add_argument('-Q', '--request', action=RequestCallback,
+                        help=f'save posts matching the request TYPE:TAG:TAG:…,TYPE:TAG:…,…. '
+                             f'TYPE can be {", ".join(POST_TYPES)} or {TYPE_ANY}; TAGs can be omitted or a '
+                             f'colon-separated list. Example: -Q {TYPE_ANY}:personal,quote,photo:me:self')
+    parser.add_argument('-t', '--tags', action=TagsCallback, dest='request',
+                        help='save only posts tagged TAGS (comma-separated values; case-insensitive)')
+    parser.add_argument('-T', '--type', action=RequestCallback, dest='request',
+                        help=f'save only posts of type TYPE (comma-separated values from {", ".join(POST_TYPES)})')
+    parser.add_argument('-F', '--filter', help='save posts matching a jq filter (needs jq module)')
+    reblog_group.add_argument('--no-reblog', action='store_true', help="don't save reblogged posts")
+    reblog_group.add_argument('--only-reblog', action='store_true', help='save only reblogged posts')
+    parser.add_argument('-I', '--image-names', choices=('o', 'i', 'bi'), default='o', metavar='FMT',
+                        help="image filename format ('o'=original, 'i'=<post-id>, 'bi'=<blog-name>_<post-id>)")
+    parser.add_argument('-e', '--exif', action=CSVCallback, default=[], metavar='KW',
+                        help='add EXIF keyword tags to each picture'
+                             " (comma-separated values; '-' to remove all tags, '' to add no extra tags)")
+    parser.add_argument('-S', '--no-ssl-verify', action='store_true', help='ignore SSL verification errors')
+    parser.add_argument('--prev-archives', action=CSVCallback, default=[], metavar='DIRS',
+                        help='comma-separated list of directories (one per blog) containing previous blog archives')
+    parser.add_argument('--no-post-clobber', action='store_true', help='do not re-download existing posts')
+    parser.add_argument('--no-server-timestamps', action='store_false', dest='use_server_timestamps',
+                        help="don't set local timestamps from HTTP headers")
+    parser.add_argument('--hostdirs', action='store_true', help='generate host-prefixed directories for media')
+    parser.add_argument('--user-agent', help='user agent string to use with HTTP requests')
+    parser.add_argument('--skip-dns-check', action='store_false', dest='use_dns_check',
+                        help='skip DNS checks for internet access')
+    parser.add_argument('--threads', type=int, default=20, help='number of threads to use for post retrieval')
+    postexist_group.add_argument('--continue', action='store_true', dest='resume',
+                                 help='continue an incomplete first backup')
+    parser.add_argument('--ignore-diffopt', action='store_true',
+                        help='force backup over an incomplete archive with different options')
+    parser.add_argument('--no-get', action='store_true', help="don't retrieve files not found in --prev-archives")
+    postexist_group.add_argument('--reuse-json', action='store_true',
+                                 help='reuse the API responses saved with --json (implies --copy-notes)')
+    parser.add_argument('--internet-archive', action='store_true',
+                        help='fall back to the Internet Archive for Tumblr media 403 and 404 responses')
+    parser.add_argument('--media-list', action='store_true', help='save post media URLs to media.json')
+    parser.add_argument('--id-file', action=IdFileCallback, dest='idents', metavar='FILE',
+                        help='file containing a list of post IDs to save, one per line')
+    parser.add_argument('--json-info', action='store_true',
+                        help="just print some info for each blog, don't make a backup")
+    parser.add_argument('blogs', nargs='*')
+    options = parser.parse_args()
+
+    blogs = options.blogs
+    if not blogs:
+        parser.error('Missing blog-name')
+
+    logger.quiet = options.quiet
+    if options.json_info:
+        options.quiet = True
+        logger.file = sys.stderr
+
+    if options.auto is not None and options.auto != time.localtime().tm_hour:
+        options.incremental = True
+    if options.resume or options.incremental:
+        # Do not clobber or count posts that were already backed up
+        options.no_post_clobber = True
+    if options.count is not None and options.count < 0:
+        parser.error('--count: count must not be negative')
+    if options.count == 0 and (options.incremental or options.auto is not None):
+        parser.error('--count 0 conflicts with --incremental and --auto')
+    if options.skip < 0:
+        parser.error('--skip: skip must not be negative')
+    if options.posts_per_page < 0:
+        parser.error('--posts-per-page: posts per page must not be negative')
+    if options.outdir and len(blogs) > 1:
+        parser.error('-O can only be used for a single blog-name')
+    if options.dirs and options.tag_index:
+        parser.error('-D cannot be used with --tag-index')
+    if options.cookiefile is not None and not os.access(options.cookiefile, os.R_OK):
+        parser.error('--cookiefile: file cannot be read')
+    if options.notes_limit is not None:
+        if not options.save_notes:
+            parser.error('--notes-limit requires --save-notes')
+        if options.notes_limit < 1:
+            parser.error('--notes-limit: value must be at least 1')
+    if options.prev_archives and options.reuse_json:
+        parser.error('--prev-archives and --reuse-json are mutually exclusive')
+    if options.prev_archives:
+        if len(options.prev_archives) != len(blogs):
+            parser.error('--prev-archives: expected {} directories, got {}'.format(
+                len(blogs), len(options.prev_archives),
+            ))
+        for blog, pa in zip(blogs, options.prev_archives):
+            if not os.access(pa, os.R_OK | os.X_OK):
+                parser.error("--prev-archives: directory '{}' cannot be read".format(pa))
+            blogdir = os.curdir if options.blosxom else (options.outdir or blog)
+            if os.path.realpath(pa) == os.path.realpath(blogdir):
+                parser.error("--prev-archives: Directory '{}' is also being written to. Use --reuse-json instead if "
+                             "you want this, or specify --outdir if you don't.".format(pa))
+    if options.threads < 1:
+        parser.error('--threads: must use at least one thread')
+    if options.no_get and not (options.prev_archives or options.reuse_json):
+        parser.error('--no-get makes no sense without --prev-archives or --reuse-json')
+    if options.no_get and options.save_notes:
+        logger.warn('Warning: --save-notes uses HTTP regardless of --no-get\n')
+    if options.copy_notes and not (options.prev_archives or options.reuse_json):
+        parser.error('--copy-notes requires --prev-archives or --reuse-json')
+    if options.idents is not None and options.likes:
+        parser.error('--id-file not implemented for likes')
+    if options.copy_notes is None:
+        # Default to True if we may regenerate posts
+        options.copy_notes = options.reuse_json and not options.no_post_clobber
+
+    # NB: this is done after setting implied options
+    orig_options = vars(options).copy()
+
+    check_optional_modules(options)
+
+    try:
+        with open(config_file) as f:
+            api_key = json.load(f)['oauth_consumer_key']
+    except (FileNotFoundError, KeyError):
+        msg = f"""\
+            API key not set. To use tumblr-backup:
+            1. Go to https://www.tumblr.com/oauth/apps and create an app if you don't have one already.
+            2. Copy the "OAuth Consumer Key" from the app you created.
+            3. Run `{Path(sys.argv[0]).name} --set-api-key API_KEY`, where API_KEY is the key that you just copied."""
+        print(textwrap.dedent(msg), file=sys.stderr)
+        return 1
+
+    wget_retrieve = WgetRetrieveWrapper(logger.log, options)
+    setup_wget(not options.no_ssl_verify, options.user_agent)
+
+    ApiParser.setup(api_key, options.no_ssl_verify, options.user_agent, options.cookiefile)
+    tb = TumblrBackup(options, orig_options, parser.get_default)
+    try:
+        for i, account in enumerate(blogs):
+            logger.backup_account = account
+            tb.backup(account, options.prev_archives[i] if options.prev_archives else None)
+    except KeyboardInterrupt:
+        return EXIT_INTERRUPT
+
+    if tb.failed_blogs:
+        logger.warn('Failed to back up {}\n'.format(', '.join(tb.failed_blogs)))
+    if tb.postfail_blogs:
+        logger.warn('One or more posts failed to save for {}\n'.format(', '.join(tb.postfail_blogs)))
+    return tb.exit_code()
diff --git a/tumblr_backup/note_scraper.py b/tumblr_backup/note_scraper.py
new file mode 100644
index 0000000..91c261f
--- /dev/null
+++ b/tumblr_backup/note_scraper.py
@@ -0,0 +1,252 @@
+from __future__ import annotations
+
+import re
+import sys
+import time
+import traceback
+import warnings
+from datetime import datetime
+from multiprocessing.queues import SimpleQueue
+from typing import cast
+from urllib.parse import parse_qs, quote, urlencode, urljoin, urlparse, urlsplit, urlunsplit
+
+import requests
+from bs4 import BeautifulSoup, Tag
+from requests.exceptions import RequestException
+from urllib3 import Retry, Timeout
+from urllib3.exceptions import HTTPError, InsecureRequestWarning
+
+from .util import ConnectionFile, LogLevel, is_dns_working, make_requests_session, setup_urllib3_ssl, to_bytes
+
+setup_urllib3_ssl()
+
+EXIT_SUCCESS = 0
+EXIT_SAFE_MODE = 2
+EXIT_NO_INTERNET = 3
+
+HTTP_TIMEOUT = Timeout(90)
+# Always retry on 503 or 504, but never on connect errors or 429; the latter is handled specially
+HTTP_RETRY = Retry(3, connect=False, status_forcelist=frozenset((503, 504)))
+HTTP_RETRY.RETRY_AFTER_STATUS_CODES = frozenset((413,))  # type: ignore[misc]
+
+# Globals
+post_url = None
+ident = None
+msg_queue: SimpleQueue[tuple[int, str]] | None = None
+
+
+def log(level, url, msg):
+    assert msg_queue is not None
+    url_msg = ", URL '{}'".format(url) if url != post_url else ''
+    # see https://github.com/google/pytype/issues/1344#issuecomment-1553500779
+    msg_queue.put(  # pytype: disable=attribute-error
+        (level, '[Note Scraper] Post {}{}: {}\n'.format(ident, url_msg, msg)),
+    )
+
+
+class WebCrawler:
+    # Python 2.x urllib.always_safe is private in Python 3.x; its content is copied here
+    _ALWAYS_SAFE_BYTES = (b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+                          b'abcdefghijklmnopqrstuvwxyz'
+                          b'0123456789' b'_.-')
+
+    _reserved = b';/?:@&=+$|,#'  # RFC 3986 (Generic Syntax)
+    _unreserved_marks = b"-_.!~*'()"  # RFC 3986 sec 2.3
+    _safe_chars = _ALWAYS_SAFE_BYTES + b'%' + _reserved + _unreserved_marks
+
+    TRY_LIMIT = 10  # max attempts when ratelimited
+
+    def __init__(self, noverify, user_agent, cookiefile, notes_limit):
+        self.notes_limit = notes_limit
+        self.lasturl = None
+        self.session = make_requests_session(
+            requests.Session, HTTP_RETRY, HTTP_TIMEOUT, not noverify, user_agent, cookiefile,
+        )
+        self.original_post_seen = False
+
+    @classmethod
+    def quote_unsafe(cls, string):
+        return quote(to_bytes(string), cls._safe_chars)
+
+    # Based on w3lib.safe_url_string
+    @classmethod
+    def iri_to_uri(cls, iri):
+        parts = urlsplit(iri)
+
+        # IDNA encoding can fail for too long labels (>63 characters) or missing labels (e.g. http://.example.com)
+        try:
+            netloc = parts.netloc.encode('idna').decode('ascii')
+        except UnicodeError:
+            netloc = parts.netloc
+
+        return urlunsplit((
+            parts.scheme,
+            netloc.rstrip(':'),
+            *(cls.quote_unsafe(getattr(parts, p)) for p in ('path', 'query', 'fragment')),
+        ))
+
+    def ratelimit_sleep(self, status_code, headers):
+        if status_code == 420:  # 'Enhance Your Calm' has no suggested delay
+            log(LogLevel.WARN, self.lasturl, 'Rate limited, sleeping for one minute')
+            time.sleep(60)
+            return True
+
+        reset = headers.get('X-Rate-Limit-Reset')
+        if reset is None:
+            return False
+
+        # there's a comma, but both numbers seem to be the same for now
+        if ',' in reset:
+            reset, tmp = reset.split(',', 1)
+            assert reset == tmp
+
+        try:
+            reset_time = int(reset)
+        except ValueError:
+            log(LogLevel.ERROR, self.lasturl, "Expected integer X-Rate-Limit-Reset, got '{}'".format(reset))
+            return False
+
+        # This header is apparently a unix timestamp
+        sleep_dur = (datetime.fromtimestamp(reset_time) - datetime.now()).total_seconds()
+
+        if sleep_dur < 0:
+            log(LogLevel.WARN, self.lasturl, 'Warning: X-Rate-Limit-Reset is {} seconds in the past'.format(-sleep_dur))
+            return True
+        if sleep_dur > 3600:
+            log(LogLevel.ERROR, self.lasturl,
+                'Refusing to sleep for {} minutes, giving up'.format(round(sleep_dur / 60)))
+            return False
+
+        log(LogLevel.WARN, self.lasturl, 'Rate limited, sleeping for {:.2f} seconds as requested'.format(sleep_dur))
+        time.sleep(sleep_dur)
+        return True
+
+    def urlopen(self, iri):
+        self.lasturl = iri
+        uri = self.iri_to_uri(iri)
+
+        try_count = 0
+        while True:
+            with self.session.get(uri) as resp:
+                try_count += 1
+                parsed_uri = urlparse(resp.url)
+                if (
+                    re.match(r'(www\.)?tumblr\.com', parsed_uri.netloc)
+                    and re.match(r'/safe-mode$|/[a-z0-9-]+/[0-9]+(/|$)', parsed_uri.path)
+                ):
+                    sys.exit(EXIT_SAFE_MODE)
+                if (
+                    resp.status_code in (420, 429) and try_count < self.TRY_LIMIT
+                    and self.ratelimit_sleep(resp.status_code, resp.headers)
+                ):
+                    continue
+                if 200 <= resp.status_code < 300:
+                    return resp.content.decode('utf-8', errors='ignore')
+                log(LogLevel.WARN, iri, 'Unexpected response status: HTTP {} {}{}'.format(
+                    resp.status_code, resp.reason,
+                    '' if resp.status_code == 404 else '\nHeaders: {}'.format(resp.headers),
+                ))
+                return None
+
+    @staticmethod
+    def get_more_link(soup, base, notes_url):
+        global ident
+        element = cast(Tag, soup.find('a', class_='more_notes_link'))
+        if not element:
+            return None
+        onclick = element.get_attribute_list('onclick')[0]
+        if not onclick:
+            log(LogLevel.WARN, notes_url, 'No onclick attribute, probably a dashboard-only blog')
+            return None
+        match_ = re.search(r";tumblrReq\.open\('GET','([^']+)'", onclick)
+        if not match_:
+            log(LogLevel.ERROR, notes_url, 'tumblrReq regex failed, did Tumblr update?')
+            return None
+        url = urljoin(base, match_.group(1))
+        spl = urlsplit(url)
+        query = parse_qs(spl.query)
+        try:
+            del query['large']
+        except KeyError:
+            pass
+        return urlunsplit(spl._replace(query=urlencode(query, doseq=True)))
+
+    def append_notes(self, soup, notes_list, notes_url):
+        notes_ol = cast(Tag, soup.find('ol', class_='notes'))
+        if notes_ol is None:
+            log(LogLevel.WARN, notes_url, 'Response HTML does not have a notes list')
+            return False
+        notes = notes_ol.find_all('li')
+        for note in reversed(notes):
+            classes = note.get('class', [])
+            if 'more_notes_link_container' in classes:
+                continue  # skip more notes link
+            if 'original_post' in classes:
+                if self.original_post_seen:
+                    continue  # only show original post once
+                self.original_post_seen = True
+            notes_list.append(note.prettify())
+        return True
+
+    def get_notes(self, post_url):
+        parsed_uri = urlparse(post_url)
+        base = '{uri.scheme}://{uri.netloc}'.format(uri=parsed_uri)
+
+        notes_10k = 0
+        notes_list: list[str] = []
+
+        notes_url = post_url
+        while True:
+            resp_str = self.urlopen(notes_url)
+            if resp_str is None:
+                break
+
+            soup = BeautifulSoup(resp_str, 'lxml')
+            if not self.append_notes(soup, notes_list, notes_url):
+                break
+
+            old_notes_url, notes_url = notes_url, self.get_more_link(soup, base, notes_url)
+            if (not notes_url) or notes_url == old_notes_url:
+                break
+
+            if len(notes_list) > (notes_10k + 1) * 10000:
+                notes_10k += 1
+                log(LogLevel.INFO, notes_url, 'Note: {} notes retrieved so far'.format(notes_10k * 10000))
+            if self.notes_limit is not None and len(notes_list) > self.notes_limit:
+                log(LogLevel.WARN, notes_url, 'Warning: Reached notes limit, stopping early.')
+                break
+
+        return ''.join(notes_list)
+
+
+def main(stdout_conn, msg_queue_, post_url_, ident_, noverify, user_agent, cookiefile, notes_limit, use_dns_check):
+    global post_url, ident, msg_queue
+    msg_queue, post_url, ident = msg_queue_, post_url_, ident_
+
+    assert msg_queue is not None
+    msg_queue._reader.close()  # type: ignore[attr-defined]
+
+    if noverify:
+        # Hide the InsecureRequestWarning from urllib3
+        warnings.filterwarnings('ignore', category=InsecureRequestWarning)
+
+    try:
+        crawler = WebCrawler(noverify, user_agent, cookiefile, notes_limit)
+
+        try:
+            notes = crawler.get_notes(post_url)
+        except KeyboardInterrupt:
+            sys.exit()  # Ignore these so they don't propagate into the parent
+        except (HTTPError, RequestException) as e:
+            if not is_dns_working(timeout=5, check=use_dns_check):
+                sys.exit(EXIT_NO_INTERNET)
+            log(LogLevel.ERROR, crawler.lasturl, e)
+            sys.exit()
+        except Exception:
+            log(LogLevel.ERROR, crawler.lasturl, 'Caught an exception\n{}'.format(traceback.format_exc()))
+            sys.exit()
+    finally:
+        msg_queue._writer.close()  # type: ignore[attr-defined]
+
+    with ConnectionFile(stdout_conn, 'w') as stdout:
+        print(notes, end='', file=stdout)
diff --git a/tumblr_backup/util.py b/tumblr_backup/util.py
new file mode 100644
index 0000000..a00a599
--- /dev/null
+++ b/tumblr_backup/util.py
@@ -0,0 +1,466 @@
+from __future__ import annotations
+
+import errno
+import os
+import queue
+import shutil
+import socket
+import sys
+import threading
+import time
+import warnings
+from abc import ABC, abstractmethod
+from collections import deque
+from enum import Enum
+from functools import total_ordering
+from http.cookiejar import MozillaCookieJar
+from importlib.machinery import PathFinder
+from typing import TYPE_CHECKING, Any, Deque, Generic, TypeVar
+
+from urllib3.exceptions import DependencyWarning
+
+if sys.platform == 'darwin':
+    import fcntl
+
+if TYPE_CHECKING:
+    import requests
+    from typing_extensions import TypeAlias
+    swt_base = requests.Session
+
+
+def to_bytes(string, encoding='utf-8', errors='strict'):
+    if isinstance(string, bytes):
+        return string
+    return string.encode(encoding, errors)
+
+
+class FakeGenericMeta(type):
+    def __getitem__(cls, item):
+        return cls
+
+
+if TYPE_CHECKING:
+    T = TypeVar('T')
+
+    class GenericQueue(queue.Queue[T], Generic[T]):
+        pass
+else:
+    T = None
+
+    class GenericQueue(queue.Queue, metaclass=FakeGenericMeta):
+        pass
+
+
+class LockedQueue(GenericQueue[T]):
+    def __init__(self, lock, maxsize=0):
+        super().__init__(maxsize)
+        self.mutex = lock
+        self.not_empty = threading.Condition(lock)
+        self.not_full = threading.Condition(lock)
+        self.all_tasks_done = threading.Condition(lock)
+
+
+class ConnectionFile:
+    def __init__(self, conn, *args, **kwargs):
+        kwargs.setdefault('closefd', False)
+        self.conn = conn
+        self.file = open(conn.fileno(), *args, **kwargs)
+
+    def __enter__(self):
+        return self.file.__enter__()
+
+    def __exit__(self, *excinfo):
+        self.file.__exit__(*excinfo)
+        self.conn.close()
+
+
+KNOWN_GOOD_NAMESERVER = '8.8.8.8'
+# DNS query for 'A' record of 'google.com'.
+# Generated using python -c "import dnslib; print(bytes(dnslib.DNSRecord.question('google.com').pack()))"
+DNS_QUERY = b'\xf1\xe1\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'
+
+
+def is_dns_working(timeout=None, check=True):
+    if not check:
+        return True  # assume internet is OK
+
+    try:
+        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
+            if timeout is not None:
+                sock.settimeout(timeout)
+            sock.sendto(DNS_QUERY, (KNOWN_GOOD_NAMESERVER, 53))
+            sock.recvfrom(512)  # a too-small buffer can raise on some platforms (WSAEMSGSIZE on Windows)
+    except OSError:
+        return False
+
+    return True
+
+
+class WaitOnMainThread(ABC):
+    def __init__(self):
+        self.cond: threading.Condition | None = None
+        self.flag: bool | None = False
+
+    def setup(self, lock=None):
+        self.cond = threading.Condition(lock)
+
+    def signal(self):
+        assert self.cond is not None
+        if isinstance(threading.current_thread(), threading._MainThread):  # type: ignore[attr-defined]
+            self._do_wait()
+            return
+
+        with self.cond:
+            if self.flag is None:
+                sys.exit(1)
+            self.flag = True
+            self.cond.wait()
+            if self.flag is None:
+                sys.exit(1)
+
+    # Call on main thread when signaled or idle. If the lock is held, pass release=True.
+    def check(self, release=False):
+        assert self.cond is not None
+        if self.flag is False:
+            return
+
+        if release:
+            saved_state = lock_release_save(self.cond)
+            try:
+                self._do_wait()
+            finally:
+                lock_acquire_restore(self.cond, saved_state)
+        else:
+            self._do_wait()
+
+        with self.cond:
+            self.flag = False
+            self.cond.notify_all()
+
+    # Call on main thread to prevent threads from blocking in signal()
+    def destroy(self):
+        assert self.cond is not None
+        if self.flag is None:
+            return
+
+        with self.cond:
+            self.flag = None  # Cause all waiters to exit
+            self.cond.notify_all()
+
+    def _do_wait(self):
+        assert self.cond is not None
+        if self.flag is None:
+            raise RuntimeError('Broken WaitOnMainThread cannot be reused')
+
+        try:
+            self._wait()
+        except:
+            with self.cond:
+                self.flag = None  # Waiting never completed
+                self.cond.notify_all()
+            raise
+
+    @staticmethod
+    @abstractmethod
+    def _wait():
+        raise NotImplementedError
+
+
+class NoInternet(WaitOnMainThread):
+    @staticmethod
+    def _wait():
+        # Having no internet is a temporary system error
+        # Wait 30 seconds at first, then exponential backoff up to 15 minutes
+        print('DNS probe finished: No internet. Waiting...', file=sys.stderr)
+        sleep_time = 30
+        while True:
+            time.sleep(sleep_time)
+            if is_dns_working():
+                break
+            sleep_time = min(sleep_time * 2, 900)
+
+
+class Enospc(WaitOnMainThread):
+    @staticmethod
+    def _wait():
+        if not os.isatty(sys.stdin.fileno()):
+            # Pausing or consuming input does no good during unattended execution.
+            # We have no hope of recovering, so raise an uncaught exception.
+            raise RuntimeError(OSError(errno.ENOSPC, os.strerror(errno.ENOSPC)))
+        print('Error: No space left on device. Press Enter to try again...', file=sys.stderr)
+        input()
+
+
+no_internet = NoInternet()
+enospc = Enospc()
+
+
+# Set up ssl for urllib3. This should be called before using urllib3 or importing requests.
+def setup_urllib3_ssl():
+    # Don't complain about missing SOCKS dependencies
+    warnings.filterwarnings('ignore', category=DependencyWarning)
+
+    try:
+        import ssl
+    except ImportError:
+        return  # Can't do anything without this module
+
+    have_sni = getattr(ssl, 'HAS_SNI', False)
+
+    # Inject SecureTransport on macOS if the linked OpenSSL is too old to handle TLSv1.2 or doesn't support SNI
+    if sys.platform == 'darwin' and (ssl.OPENSSL_VERSION_NUMBER < 0x1000100F or not have_sni):
+        try:
+            from urllib3.contrib import securetransport
+        except (ImportError, OSError) as e:
+            print('Warning: Failed to inject SecureTransport: {!r}'.format(e), file=sys.stderr)
+        else:
+            securetransport.inject_into_urllib3()
+            have_sni = True  # SNI always works
+
+    # Inject PyOpenSSL if the linked OpenSSL has no SNI
+    if not have_sni:
+        try:
+            from urllib3.contrib import pyopenssl
+            pyopenssl.inject_into_urllib3()
+        except ImportError as e:
+            print('Warning: Failed to inject pyOpenSSL: {!r}'.format(e), file=sys.stderr)
+        else:
+            have_sni = True  # SNI always works
+
+
+def make_requests_session(session_type, retry, timeout, verify, user_agent, cookiefile):
+    if TYPE_CHECKING:
+        global swt_base
+    else:
+        swt_base = session_type  # type: ignore
+
+    class SessionWithTimeout(swt_base):
+        def request(self, method, url, *args, **kwargs):
+            kwargs.setdefault('timeout', timeout)
+            return super().request(method, url, *args, **kwargs)
+
+    session = SessionWithTimeout()
+    session.verify = verify
+    if user_agent is not None:
+        session.headers['User-Agent'] = user_agent
+    for adapter in session.adapters.values():
+        adapter.max_retries = retry
+    if cookiefile is not None:
+        cookies = MozillaCookieJar(cookiefile)
+        cookies.load()
+
+        # Session cookies are denoted by the `expires` field being set to either an empty string or 0.
+        # MozillaCookieJar only recognizes the former (see https://bugs.python.org/issue17164).
+        for cookie in cookies:
+            if cookie.expires == 0:
+                cookie.expires = None
+                cookie.discard = True
+
+        session.cookies = cookies  # type: ignore[assignment]
+    return session
+
+
+@total_ordering
+class LogLevel(Enum):
+    INFO = 0
+    WARN = 1
+    ERROR = 2
+
+    def __lt__(self, other):
+        if type(self) is type(other):
+            return self.value < other.value
+        return NotImplemented
+
+
+def fsync(fd):
+    if sys.platform == 'darwin':
+        # Apple's fsync does not flush the drive write cache
+        try:
+            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
+        except OSError:
+            pass  # fall back to fsync
+        else:
+            return
+    os.fsync(fd)
+
+
+def fdatasync(fd):
+    if hasattr(os, 'fdatasync'):
+        return os.fdatasync(fd)
+    fsync(fd)
+
+
+# Minimal implementation of a sum of mutable sequences
+class MultiSeqProxy:
+    def __init__(self, subseqs):
+        self.subseqs = subseqs
+
+    def append(self, value):
+        for sub in self.subseqs:
+            sub.append((value, self.subseqs))
+
+    def remove(self, value):
+        for sub in self.subseqs:
+            sub.remove((value, self.subseqs))
+
+
+# Hooks into methods used by threading.Condition.notify
+class NotifierWaiters(Deque[Any]):
+    def __iter__(self):
+        return (value[0] for value in super(NotifierWaiters, self).__iter__())
+
+    def __getitem__(self, index):
+        item = super().__getitem__(index)
+        return deque(v[0] for v in item) if isinstance(index, slice) else item[0]  # pytype: disable=not-callable
+
+    def remove(self, value):
+        try:
+            match = next(x for x in super(NotifierWaiters, self).__iter__() if x[0] == value)
+        except StopIteration:
+            raise ValueError('deque.remove(x): x not in deque') from None
+        for ref in match[1]:
+            try:
+                super(NotifierWaiters, ref).remove(match)  # Remove waiter from known location
+            except ValueError:
+                raise RuntimeError('Unexpected missing waiter!')
+
+
+# Supports waiting on multiple threading.Condition objects simultaneously
+class MultiCondition(threading.Condition):
+    def __init__(self, lock):  # noqa: WPS612
+        super().__init__(lock)
+
+    def wait(self, children, timeout=None):  # pytype: disable=signature-mismatch
+        assert len(frozenset(id(c) for c in children)) == len(children), 'Children must be unique'
+        assert all(c._lock is self._lock for c in children), 'All locks must be the same'  # type: ignore[attr-defined]
+
+        # Modify children so their notify methods do cleanup
+        for child in children:
+            if not isinstance(child._waiters, NotifierWaiters):
+                child._waiters = NotifierWaiters(
+                    ((w, (child._waiters,)) for w in child._waiters),
+                )
+        self._waiters = MultiSeqProxy(tuple(c._waiters for c in children))
+
+        super().wait(timeout)
+
+    def notify(self, n=1):
+        raise NotImplementedError
+
+    def notify_all(self):
+        raise NotImplementedError
+
+    notifyAll = notify_all  # noqa: N815
+
+
+def lock_is_owned(lock):
+    try:
+        return lock._is_owned()
+    except AttributeError:
+        if lock.acquire(0):
+            lock.release()
+            return False
+        return True
+
+
+def lock_release_save(lock):
+    try:
+        return lock._release_save()  # pytype: disable=attribute-error
+    except AttributeError:
+        lock.release()  # No state to save
+        return None
+
+
+def lock_acquire_restore(lock, state):
+    try:
+        lock._acquire_restore(state)  # pytype: disable=attribute-error
+    except AttributeError:
+        lock.acquire()  # Ignore saved state
+
+
+ACParams: TypeAlias = 'tuple[tuple[Any, ...], dict[str, Any]]'  # (args, kwargs)
+
+
+class AsyncCallable:
+    request: LockedQueue[ACParams | None]
+    response: LockedQueue[Any]
+
+    def __init__(self, lock, fun, name=None):
+        self.lock = lock
+        self.fun = fun
+        self.request = LockedQueue(lock, maxsize=1)
+        self.response = LockedQueue(lock, maxsize=1)
+        self.quit_flag = False
+        self.thread = threading.Thread(target=self.run_thread, name=name, daemon=True)
+        self.thread.start()
+
+    def run_thread(self):
+        while not self.quit_flag:
+            request = self.request.get()
+            if request is None:
+                break  # quit sentinel
+            args, kwargs = request
+            response = self.fun(*args, **kwargs)
+            self.response.put(response)
+
+    def put(self, *args, **kwargs):
+        self.request.put((args, kwargs))
+
+    def get(self, *args, **kwargs):
+        return self.response.get(*args, **kwargs)
+
+    def quit(self):
+        self.quit_flag = True
+        # Make sure the thread wakes up
+        try:
+            self.request.put(None, block=False)
+        except queue.Full:
+            pass
+        self.thread.join()
+
+
+def opendir(dir_, flags):
+    try:
+        flags |= os.O_DIRECTORY
+    except AttributeError:
+        dir_ += os.path.sep  # Fallback, some systems don't support O_DIRECTORY
+    return os.open(dir_, flags)
+
+
+def try_unlink(path):
+    try:
+        os.unlink(path)
+    except FileNotFoundError:
+        pass  # ignored
+
+
+def _copy_file_range(src, dst):
+    if not hasattr(os, 'copy_file_range'):
+        return False
+
+    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
+        infd, outfd = fsrc.fileno(), fdst.fileno()
+        blocksize = max(os.fstat(infd).st_size, 2 ** 23)  # min 8MiB
+        if sys.maxsize < 2 ** 32:  # 32-bit architecture
+            blocksize = min(blocksize, 2 ** 30)  # max 1GiB
+
+        try:
+            while True:
+                bytes_copied = os.copy_file_range(infd, outfd, blocksize)  # type: ignore[attr-defined]
+                if not bytes_copied:
+                    return True  # EOF
+        except OSError as e:
+            if e.errno == errno.EXDEV:
+                return False  # Different devices (pre Linux 5.3)
+            e.filename, e.filename2 = src, dst
+            raise
+
+
+def copyfile(src, dst):
+    if _copy_file_range(src, dst):
+        return dst
+    return shutil.copyfile(src, dst)
+
+
+def have_module(name):
+    return PathFinder.find_spec(name) is not None
diff --git a/tumblr_backup/wget.py b/tumblr_backup/wget.py
new file mode 100644
index 0000000..bfc3982
--- /dev/null
+++ b/tumblr_backup/wget.py
@@ -0,0 +1,869 @@
+from __future__ import annotations
+
+import errno
+import functools
+import itertools
+import os
+import time
+import traceback
+import warnings
+from argparse import Namespace
+from collections import OrderedDict
+from email.utils import mktime_tz, parsedate_tz
+from enum import Enum
+from http.client import HTTPConnection as _HTTPConnection, ResponseNotReady
+from tempfile import NamedTemporaryFile
+from typing import Any, BinaryIO, Callable, Dict, Optional, Set
+from urllib.parse import urljoin, urlsplit
+
+from urllib3 import (BaseHTTPResponse, HTTPConnectionPool, HTTPHeaderDict, HTTPResponse, HTTPSConnectionPool,
+                     PoolManager, Retry as Retry, Timeout, make_headers)
+from urllib3.connection import HTTPConnection, HTTPSConnection, _url_from_connection  # noqa: WPS450
+from urllib3.exceptions import (ConnectTimeoutError, HeaderParsingError, HTTPError as HTTPError, InsecureRequestWarning,
+                                MaxRetryError, PoolError)
+from urllib3.util.response import assert_header_parsing
+
+from .util import LogLevel, enospc, fsync, is_dns_working, no_internet, opendir, setup_urllib3_ssl, try_unlink
+
+setup_urllib3_ssl()
+
+HTTP_TIMEOUT = Timeout(90)
+# Always retry on 503 or 504, but never on connect, which is handled specially
+# Also retry on 500 and 502 since Tumblr servers have temporary failures
+HTTP_RETRY = Retry(3, connect=False, status_forcelist=frozenset((500, 502, 503, 504)))
+HTTP_RETRY.RETRY_AFTER_STATUS_CODES = frozenset((413, 429))  # type: ignore[misc]
+HTTP_CHUNK_SIZE = 1024 * 1024
+
+base_headers = make_headers(keep_alive=True, accept_encoding=True)
+
+
+# Document type flags
+RETROKF = 0x2             # retrieval was OK
+
+
+# Error statuses
+class UErr(Enum):
+    RETRUNNEEDED = 0
+    RETRINCOMPLETE = 1
+    RETRFINISHED = 2
+
+
+class HttpStat:
+    current_url: Optional[Any]
+    contlen: Optional[int]
+    last_modified: Optional[str]
+    remote_time: Optional[int]
+    dest_dir: Optional[int]
+    part_file: Optional[BinaryIO]
+    remote_encoding: Optional[str]
+    enc_is_identity: Optional[bool]
+    decoder: Optional[object]
+    _make_part_file: Optional[Callable[[], BinaryIO]]
+
+    def __init__(self):
+        self.current_url = None      # the most recent redirect, otherwise the initial url
+        self.bytes_read = 0          # received length
+        self.bytes_written = 0       # written length
+        self.contlen = None          # expected length
+        self.restval = 0             # the restart value
+        self.last_modified = None    # Last-Modified header
+        self.remote_time = None      # remote time-stamp
+        self.statcode = 0            # status code
+        self.dest_dir = None         # handle to the directory containing part_file
+        self.part_file = None        # handle to local file used for in-progress download
+        self.remote_encoding = None  # the encoding of the remote file
+        self.enc_is_identity = None  # whether the remote encoding is identity
+        self.decoder = None          # saved decoder from the HTTPResponse
+        self._make_part_file = None  # part_file supplier
+
+    def set_part_file_supplier(self, value):
+        self._make_part_file = value
+
+    def init_part_file(self):
+        if self._make_part_file is not None:
+            self.part_file = self._make_part_file()
+            self._make_part_file = None
+
+
+class WGHTTPResponse(HTTPResponse):
+    REDIRECT_STATUSES = [300] + HTTPResponse.REDIRECT_STATUSES
+
+    def __init__(
+        self, body='', headers=None, status=0, version=0, reason=None, preload_content=True, decode_content=True,
+        original_response=None, pool=None, connection=None, msg=None, retries=None, enforce_content_length=False,
+        request_method=None, request_url=None, auto_close=True,
+    ):
+        # Copy original Content-Length for _init_length
+        if not isinstance(headers, HTTPHeaderDict):
+            headers = HTTPHeaderDict(headers)
+        if 'Content-Length' not in headers and 'X-Archive-Orig-Content-Length' in headers:
+            headers['Content-Length'] = headers['X-Archive-Orig-Content-Length']
+
+        self.bytes_to_skip = 0
+        self.last_read_length = 0
+        super().__init__(
+            body, headers, status, version, reason, preload_content, decode_content, original_response, pool,
+            connection, msg, retries, enforce_content_length, request_method, request_url, auto_close,
+        )
+
+    # Make decoder public for saving and restoring the decoder state
+    @property
+    def decoder(self):
+        return self._decoder  # pytype: disable=attribute-error
+
+    @decoder.setter
+    def decoder(self, value):
+        self._decoder = value
+
+    # Make _init_length publicly usable because its implementation is nice
+    def get_content_length(self, meth):
+        return self._init_length(meth)  # type: ignore[attr-defined]
+
+    def _init_decoder(self) -> None:
+        self.last_read_length = 0
+        super()._init_decoder()
+
+    # Wrap _decode to do some extra processing of the content-encoded entity data.
+    def _decode(self, data, decode_content, flush_decoder):
+        # Skip any data we don't need
+        data_len = len(data)
+        if self.bytes_to_skip >= data_len:
+            data = b''
+            self.bytes_to_skip -= data_len
+        elif self.bytes_to_skip > 0:
+            data = data[self.bytes_to_skip:]
+            self.bytes_to_skip = 0
+
+        self.last_read_length += len(data)  # Count only non-skipped data
+        if not data:
+            data = b''
+            if flush_decoder:
+                data += self._flush_decoder()
+            return data
+        return super()._decode(data, decode_content, flush_decoder)  # type: ignore[misc]
+
+
+class WGHTTPConnection(HTTPConnection):
+    def getresponse(self) -> WGHTTPResponse:  # type: ignore[override]
+        # Raise the same error as http.client.HTTPConnection
+        if self._response_options is None:
+            raise ResponseNotReady()
+
+        # Reset this attribute so it can be used again.
+        resp_options = self._response_options
+        self._response_options = None
+
+        # Since the connection's timeout value may have been updated
+        # we need to set the timeout on the socket.
+        self.sock.settimeout(self.timeout)
+
+        # Get the response from http.client.HTTPConnection
+        httplib_response = _HTTPConnection.getresponse(self)
+
+        try:
+            assert_header_parsing(httplib_response.msg)
+        except (HeaderParsingError, TypeError) as hpe:
+            print('Failed to parse headers (url={}): {}'.format(
+                _url_from_connection(self, resp_options.request_url), hpe,
+            ))
+            traceback.print_exc()
+
+        headers = HTTPHeaderDict(httplib_response.msg.items())
+
+        return WGHTTPResponse(
+            body=httplib_response,
+            headers=headers,
+            status=httplib_response.status,
+            version=httplib_response.version,
+            reason=httplib_response.reason,
+            preload_content=resp_options.preload_content,
+            decode_content=resp_options.decode_content,
+            original_response=httplib_response,
+            enforce_content_length=resp_options.enforce_content_length,
+            request_method=resp_options.request_method,
+            request_url=resp_options.request_url,
+        )
+
+
+class WGHTTPSConnection(WGHTTPConnection, HTTPSConnection):
+    pass
+
+
+class WGHTTPConnectionPool(HTTPConnectionPool):
+    ConnectionCls = WGHTTPConnection
+
+    def __init__(self, host, port=None, *args, **kwargs):
+        norm_host = normalized_host(self.scheme, host, port)
+        cfh_url = kwargs.pop('cfh_url', None)
+        if norm_host in unreachable_hosts:
+            raise WGUnreachableHostError(None, cfh_url, 'Host {} is ignored.'.format(norm_host))
+        super().__init__(host, port, *args, **kwargs)
+
+
+class WGHTTPSConnectionPool(HTTPSConnectionPool):
+    ConnectionCls = WGHTTPSConnection
+
+    def __init__(self, host, port=None, *args, **kwargs):
+        norm_host = normalized_host(self.scheme, host, port)
+        cfh_url = kwargs.pop('cfh_url', None)
+        if norm_host in unreachable_hosts:
+            raise WGUnreachableHostError(None, cfh_url, 'Host {} is ignored.'.format(norm_host))
+        super().__init__(host, port, *args, **kwargs)
+
+
+class WGPoolManager(PoolManager):
+    def __init__(self, num_pools=10, headers=None, **connection_pool_kw):
+        super().__init__(num_pools, headers, **connection_pool_kw)
+        self.cfh_url = None
+        self.pool_classes_by_scheme = {'http': WGHTTPConnectionPool, 'https': WGHTTPSConnectionPool}
+
+    def connection_from_url(self, url, pool_kwargs=None):
+        try:
+            self.cfh_url = url
+            return super().connection_from_url(url, pool_kwargs)  # type: ignore[call-arg]
+        finally:
+            self.cfh_url = None
+
+    # the urllib3 stubs lie about this method's signature
+    def urlopen(self, method, url, redirect=True, **kw):  # pytype: disable=signature-mismatch
+        try:
+            self.cfh_url = url
+            return super().urlopen(method, url, redirect, **kw)
+        finally:
+            self.cfh_url = None
+
+    def _new_pool(self, scheme, host, port, request_context=None):
+        if request_context is None:
+            request_context = self.connection_pool_kw.copy()
+        request_context['cfh_url'] = self.cfh_url
+        return super()._new_pool(scheme, host, port, request_context)  # type: ignore[misc]
+
+
+poolman = WGPoolManager(maxsize=20, timeout=HTTP_TIMEOUT)
+
+
+class Logger:
+    def __init__(self, original_url, post_id, log):
+        self.original_url = original_url
+        self.post_id = post_id
+        self.log_cb = log
+
+    def info(self, url, msg):
+        self._log_info(LogLevel.INFO, url, msg)
+
+    def warn(self, url, msg):
+        self._log_info(LogLevel.WARN, url, msg)
+
+    def error(self, url, msg, info):
+        qmsg = '[wget] Error retrieving media\n'
+        qmsg += '  {}\n'.format(msg)
+        if self.post_id is not None:
+            info['Post'] = self.post_id
+
+        url_key = 'URL' if url == self.original_url else 'Original URL'
+        info[url_key] = self.original_url
+        if url != self.original_url:
+            info['Redirect URL'] = url
+
+        for k, v in info.items():
+            qmsg += '  {}: {}\n'.format(k, v)
+
+        self.log_cb(LogLevel.WARN, qmsg)  # wget errors can still be silenced
+
+    def _log_info(self, level, url, msg):
+        qmsg = '[wget] {}\n'.format(msg)
+        qmsg += '  URL{}: {}\n'.format(
+            '' if url == self.original_url else ' (redirect)',
+            url,
+        )
+        self.log_cb(level, qmsg)
+
+
+def gethttp(url, hstat, doctype, logger, retry_counter, use_dns_check):
+    if hstat.current_url is not None:
+        url = hstat.current_url  # The most recent location is cached
+
+    hstat.bytes_read = 0
+    hstat.contlen = None
+    hstat.remote_time = None
+
+    # Initialize the request
+    request_headers = {}
+    if hstat.restval:
+        request_headers['Range'] = 'bytes={}-'.format(hstat.restval)
+
+    doctype &= ~RETROKF
+
+    resp = urlopen(url, use_dns_check, request_headers, preload_content=False, enforce_content_length=False)
+    url = hstat.current_url = urljoin(url, resp.geturl())
+
+    try:
+        err, doctype = process_response(url, hstat, doctype, logger, retry_counter, resp)
+    finally:
+        resp.release_conn()
+
+    return err, doctype
+
+
+def process_response(url, hstat, doctype, logger, retry_counter, resp):
+    # RFC 7233 section 4.1 paragraph 6:
+    # "A server MUST NOT generate a multipart response to a request for a single range [...]"
+    conttype = resp.headers.get('Content-Type')
+    if conttype is not None and conttype.lower().split(';', 1)[0].strip() == 'multipart/byteranges':
+        raise WGBadResponseError(logger, url, 'Server sent multipart response, but multiple ranges were not requested')
+
+    contlen = resp.get_content_length('GET')
+
+    crange_header = resp.headers.get('Content-Range')
+    crange_parsed = parse_content_range(crange_header)
+    if crange_parsed is not None:
+        first_bytep, last_bytep, _ = crange_parsed
+        contrange = first_bytep
+        contlen = last_bytep - first_bytep + 1
+    else:
+        contrange = 0
+
+    hstat.last_modified = resp.headers.get('Last-Modified')
+    if hstat.last_modified is None:
+        hstat.last_modified = resp.headers.get('X-Archive-Orig-Last-Modified')
+
+    if hstat.last_modified is None:
+        hstat.remote_time = None
+    else:
+        lmtuple = parsedate_tz(hstat.last_modified)
+        hstat.remote_time = None if lmtuple is None else mktime_tz(lmtuple)
+
+    remote_encoding = resp.headers.get('Content-Encoding')
+
+    def norm_enc(enc):
+        return None if enc is None else tuple(e.strip() for e in enc.split(','))
+
+    if hstat.restval > 0 and norm_enc(hstat.remote_encoding) != norm_enc(remote_encoding):
+        # Retry without restart
+        hstat.restval = 0
+        retry_counter.increment(url, hstat, 'Inconsistent Content-Encoding, must start over')
+        return UErr.RETRINCOMPLETE, doctype
+
+    hstat.remote_encoding = remote_encoding
+    hstat.enc_is_identity = remote_encoding in (None, '') or all(
+        enc.strip() == 'identity' for enc in remote_encoding.split(',')
+    )
+
+    # In some cases, httplib returns a status of _UNKNOWN
+    try:
+        hstat.statcode = int(resp.status)
+    except ValueError:
+        hstat.statcode = 0
+
+    # HTTP 2XX means the retrieval was OK,
+    # except for HTTP 207 Multi-Status
+    if 200 <= hstat.statcode < 300 and hstat.statcode != 207:
+        doctype |= RETROKF
+
+    # HTTP 204 No Content
+    if hstat.statcode == 204:
+        hstat.bytes_read = hstat.restval = 0
+        return UErr.RETRFINISHED, doctype
+
+    # HTTP 420 Enhance Your Calm
+    if hstat.statcode == 420:
+        retry_counter.increment(url, hstat, 'Rate limited (HTTP 420 Enhance Your Calm)', 60)
+        logger.info(url, 'Rate limited, sleeping for one minute')
+        return UErr.RETRINCOMPLETE, doctype
+
+    if not (doctype & RETROKF):
+        e = WGWrongCodeError(logger, url, hstat.statcode, resp.reason, resp.headers)
+        # Cloudflare-specific errors
+        # 521 Web Server Is Down
+        # 522 Connection Timed Out
+        # 523 Origin Is Unreachable
+        # 525 SSL Handshake Failed
+        # 526 Invalid SSL Certificate
+        if resp.headers.get('Server') == 'cloudflare' and hstat.statcode in (521, 522, 523, 525, 526):
+            # Origin is unreachable - condemn it and don't retry
+            hostname = normalized_host_from_url(url)
+            unreachable_hosts.add(hostname)
+            msg = 'Error connecting to origin of host {}. From now on it will be ignored.'.format(hostname)
+            raise WGUnreachableHostError(logger, url, msg, e)
+        raise e
+
+    shrunk = False
+    if hstat.statcode == 416:
+        shrunk = True  # HTTP 416 Range Not Satisfiable
+    elif hstat.statcode != 200 or contlen == 0:
+        pass  # Only verify contlen if 200 OK (NOT 206 Partial Content) and contlen is nonzero
+    elif contlen is not None and contrange == 0 and hstat.restval >= contlen:
+        shrunk = True  # Got the whole content but it is known to be shorter than the restart point
+
+    if shrunk:
+        # NB: Unlike wget, we will retry because restarts are expected to succeed (we do not support '-c')
+        # The remote file has shrunk, retry without restart
+        hstat.restval = 0
+        retry_counter.increment(url, hstat, 'Resume with Range failed, must start over')
+        return UErr.RETRINCOMPLETE, doctype
+
+    # The Range request was misunderstood. Bail out.
+    # Unlike wget, we bail hard with no retry, because this indicates a broken or unreasonable server.
+    if contrange not in (0, hstat.restval):
+        raise WGRangeError(
+            logger, url,
+            f'Server provided unexpected Content-Range: Requested {hstat.restval}, got {contrange}',
+        )
+    # HTTP 206 Partial Content
+    if hstat.statcode == 206 and hstat.restval > 0 and contrange == 0:
+        if crange_header is None:
+            crange_status = 'not provided'
+        elif crange_parsed is None:
+            crange_status = 'invalid'
+        else:  # contrange explicitly zero
+            crange_status = 'zero'
+        raise WGRangeError(logger, url, 'Requested a Range and server sent HTTP 206 Partial Content, '
+                           'but Content-Range is {}!'.format(crange_status))
+
+    hstat.contlen = contlen
+    if hstat.contlen is not None:
+        hstat.contlen += contrange
+
+    if not (doctype & RETROKF):
+        hstat.bytes_read = hstat.restval = 0
+        return UErr.RETRFINISHED, doctype
+
+    if hstat.restval > 0 and contrange == 0:
+        # If the server ignored our range request, skip the first RESTVAL bytes of the body.
+        resp.bytes_to_skip = hstat.restval
+    else:
+        resp.bytes_to_skip = 0
+
+    hstat.bytes_read = hstat.restval
+
+    assert resp.decoder is None
+    if hstat.restval > 0:
+        resp.decoder = hstat.decoder  # Resume the previous decoder state -- Content-Encoding is weird
+
+    hstat.init_part_file()  # We're about to write to part_file, make sure it exists
+    assert hstat.part_file is not None
+
+    try:
+        for chunk in resp.stream(HTTP_CHUNK_SIZE, decode_content=True):
+            hstat.bytes_read += resp.last_read_length
+            if not chunk:  # May be possible if not resp.chunked due to implementation of _decode
+                continue
+            hstat.part_file.write(chunk)
+    except MaxRetryError:
+        raise
+    except (HTTPError, OSError) as e:
+        is_read_error = isinstance(e, HTTPError)
+        length_known = hstat.contlen is not None and (is_read_error or hstat.enc_is_identity)
+        logger.warn(url, '{} error at byte {}{}'.format(
+            'Read' if is_read_error else 'Write',
+            hstat.bytes_read if is_read_error else hstat.bytes_written,
+            '/{}'.format(hstat.contlen) if length_known else '',
+        ))
+
+        if hstat.bytes_read == hstat.restval:
+            raise  # No data read
+        if isinstance(e, OSError) and e.errno == errno.ENOSPC:
+            raise  # Handled specially in outer except block
+        if not retry_counter.should_retry():
+            raise  # This won't be retried
+
+        # Grab the decoder state for next time
+        if resp.decoder is not None:
+            hstat.decoder = resp.decoder
+
+        # We were able to read at least _some_ body data from the server. Keep trying.
+        raise  # Jump to outer except block
+
+    hstat.decoder = None
+    return UErr.RETRFINISHED, doctype
+
+
+def parse_crange_num(hdrc, ci, postchar):
+    if not hdrc[ci].isdigit():
+        raise ValueError('parse error')
+    num = 0
+    while hdrc[ci].isdigit():
+        num = 10 * num + int(hdrc[ci])
+        ci += 1
+    if hdrc[ci] != postchar:
+        raise ValueError('parse error')
+    ci += 1
+    return ci, num
+
+
+def parse_content_range(hdr):
+    if hdr is None:
+        return None
+
+    # Ancient versions of Netscape proxy server don't send the "bytes" specifier
+    if hdr.startswith('bytes'):
+        hdr = hdr[5:]
+        # JavaWebServer/1.1.1 sends "bytes: x-y/z"
+        if hdr.startswith(':'):
+            hdr = hdr[1:]
+        hdr = hdr.lstrip()
+        if not hdr:
+            return None
+
+    ci = 0
+    # Final string is a sentinel, equivalent to a null terminator
+    hdrc = tuple(itertools.chain(hdr, ('',)))
+
+    try:
+        ci, first_bytep = parse_crange_num(hdrc, ci, '-')
+        ci, last_bytep = parse_crange_num(hdrc, ci, '/')
+    except ValueError:
+        return None
+
+    if hdrc[ci] == '*':
+        entity_length = None
+    else:
+        num_ = 0
+        while hdrc[ci].isdigit():
+            num_ = 10 * num_ + int(hdrc[ci])
+            ci += 1
+        entity_length = num_
+
+    # A byte-content-range-spec whose last-byte-pos value is less than its first-byte-pos value, or whose entity-length
+    # value is less than or equal to its last-byte-pos value, is invalid.
+    if last_bytep < first_bytep or (entity_length is not None and entity_length <= last_bytep):
+        return None
+
+    return first_bytep, last_bytep, entity_length
+
+
+def touch(fl, mtime, dir_fd=None):
+    atime = time.time()
+    if os.utime in os.supports_dir_fd and dir_fd is not None:
+        os.utime(os.path.basename(fl), (atime, mtime), dir_fd=dir_fd)
+    else:
+        os.utime(fl, (atime, mtime))
+
+
+class WGError(Exception):
+    def __init__(self, logger, url, msg, cause=None, info=None):
+        self.logger = logger
+        self.url = url
+        self.msg = msg
+        self.cause = cause
+        self.info = info
+
+    def log(self):
+        info = OrderedDict()
+        if self.cause is not None:
+            info['Caused by'] = repr(self.cause)
+        if self.info is not None:
+            info.update(self.info)
+        self.logger.error(self.url, self.msg, info)
+
+    def __str__(self):
+        return repr(self)
+
+
+class WGMaxRetryError(WGError):
+    pass
+
+
+class WGUnreachableHostError(WGError):
+    pass
+
+
+class WGBadProtocolError(WGError):
+    pass
+
+
+class WGBadResponseError(WGError):
+    pass
+
+
+class WGWrongCodeError(WGBadResponseError):
+    def __init__(self, logger, url, statcode, statmsg, headers):
+        msg = 'Unexpected response status: HTTP {} {}'.format(statcode, statmsg)
+        info = OrderedDict()
+        if statcode not in (403, 404):
+            info['Headers'] = headers
+        super().__init__(logger, url, msg, info=info)
+        self.statcode = statcode
+        self.statmsg = statmsg
+
+
+class WGRangeError(WGBadResponseError):
+    pass
+
+
+unreachable_hosts: Set[str] = set()
+
+
+class RetryCounter:
+    TRY_LIMIT = 20
+    MAX_RETRY_WAIT = 10
+
+    def __init__(self, logger):
+        self.logger = logger
+        self.count = 0
+
+    def reset(self):
+        self.count = 0
+
+    def should_retry(self):
+        return self.TRY_LIMIT is None or self.count < self.TRY_LIMIT
+
+    def increment(self, url, hstat, cause, sleep_dur=None):
+        self.count += 1
+        status = 'incomplete' if hstat.bytes_read > hstat.restval else 'failed'
+        msg = 'because of {} retrieval: {}'.format(status, cause)
+        if not self.should_retry():
+            raise WGMaxRetryError(
+                self.logger, url,
+                'Retrieval {} after {} tries.'.format(status, self.TRY_LIMIT),
+                cause,
+            )
+        trylim = '' if self.TRY_LIMIT is None else '/{}'.format(self.TRY_LIMIT)
+        self.logger.info(url, 'Retrying ({}{}) {}'.format(self.count, trylim, msg))
+
+        if sleep_dur is None:
+            sleep_dur = min(self.count, self.MAX_RETRY_WAIT)
+        time.sleep(sleep_dur)
+
+
+def normalized_host_from_url(url):
+    split = urlsplit(url, 'http')
+    hostname = split.hostname
+    port = split.port
+    if port is None:
+        port = 80 if split.scheme == 'http' else 443
+    return '{}:{}'.format(hostname, port)
+
+
+def normalized_host(scheme, host, port):
+    if port is None:
+        port = 80 if scheme == 'http' else 443
+    return '{}:{}'.format(host, port)
+
+
+def _retrieve_loop(
+    hstat: HttpStat,
+    url: str,
+    dest_file: str,
+    post_id: Optional[str],
+    post_timestamp: Optional[float],
+    adjust_basename: Optional[Callable[[str, BinaryIO], str]],
+    log: Callable[[LogLevel, str], None],
+    use_dns_check: bool,
+    use_internet_archive: bool,
+    use_server_timestamps: bool,
+) -> None:
+    logger = Logger(url, post_id, log)
+
+    if urlsplit(url).scheme not in ('http', 'https'):
+        raise WGBadProtocolError(logger, url, 'Non-HTTP(S) protocols are not implemented.')
+
+    hostname = normalized_host_from_url(url)
+    if hostname in unreachable_hosts:
+        raise WGUnreachableHostError(logger, url, 'Host {} is ignored.'.format(hostname))
+
+    doctype = 0
+    dest_dirname, dest_basename = os.path.split(dest_file)
+
+    if os.name == 'posix':  # Opening directories is a POSIX feature
+        hstat.dest_dir = opendir(dest_dirname, os.O_RDONLY)
+    hstat.set_part_file_supplier(functools.partial(
+        lambda pfx, dir_: NamedTemporaryFile('wb', prefix=pfx, dir=dir_, delete=False),
+        '.{}.'.format(dest_basename), dest_dirname,
+    ))
+
+    # THE loop
+
+    using_internet_archive = False
+    ia_fallback_cause: Optional[WGWrongCodeError] = None
+    orig_url = url
+    orig_doctype = doctype
+    retry_counter = RetryCounter(logger)
+    while True:
+        # Behave as if force_full_retrieve is always enabled
+        hstat.restval = hstat.bytes_read
+
+        try:
+            err, doctype = gethttp(url, hstat, doctype, logger, retry_counter, use_dns_check)
+        except MaxRetryError as e:
+            raise WGMaxRetryError(logger, url, 'urllib3 reached a retry limit.', e)
+        except HTTPError as e:
+            if isinstance(e, ConnectTimeoutError):
+                # Host is unreachable (incl ETIMEDOUT, EHOSTUNREACH, and EAI_NONAME) - condemn it and don't retry
+                conn = e.pool if isinstance(e, PoolError) else e.args[0]
+                hostname = normalized_host(None, conn.host, conn.port)
+                unreachable_hosts.add(hostname)
+                msg = 'Error connecting to host {}. From now on it will be ignored.'.format(hostname)
+                raise WGUnreachableHostError(logger, url, msg, e)
+
+            retry_counter.increment(url, hstat, repr(e))
+            continue
+        except OSError as e:
+            if e.errno != errno.ENOSPC:
+                raise
+
+            # Being low on disk space is a temporary system error, don't count against the server
+            enospc.signal()
+            continue
+        except WGUnreachableHostError as e:
+            # Set the logger for unreachable host errors raised from WGHTTP(S)ConnectionPool
+            if e.logger is None:
+                e.logger = logger
+            raise
+        except WGWrongCodeError as e:
+            if (
+                use_internet_archive
+                and not using_internet_archive
+                and hstat.statcode in (403, 404)
+                and urlsplit(orig_url).netloc.endswith('.tumblr.com')  # type: ignore[arg-type]
+            ):
+                using_internet_archive = True
+                traceback.clear_frames(e.__traceback__)  # prevent reference cycle
+                ia_fallback_cause = e
+                url = 'https://web.archive.org/web/0/{}'.format(orig_url)  # type: ignore[assignment,str-bytes-safe]
+                doctype = orig_doctype
+                retry_counter.reset()
+                continue
+            if using_internet_archive and hstat.statcode == 404:
+                # Not available at the Internet Archive, report the original error
+                assert ia_fallback_cause is not None
+                raise ia_fallback_cause from None
+            raise
+        finally:
+            if hstat.current_url is not None:
+                url = hstat.current_url
+
+        if err == UErr.RETRINCOMPLETE:
+            continue  # Non-fatal error, try again
+        if err == UErr.RETRUNNEEDED:
+            return
+        assert err == UErr.RETRFINISHED
+
+        if hstat.contlen is not None and hstat.bytes_read < hstat.contlen:
+            # We lost the connection too soon
+            retry_counter.increment(url, hstat, 'Server closed connection before Content-Length was reached.')
+            continue
+
+        # We shouldn't have read more than Content-Length bytes
+        assert hstat.contlen in (None, hstat.bytes_read)
+
+        if using_internet_archive:
+            assert ia_fallback_cause is not None
+            c = ia_fallback_cause
+            logger.info(
+                orig_url, 'Downloaded from Internet Archive due to HTTP Error {} {}'.format(c.statcode, c.statmsg),
+            )
+
+        # Normal return path - we wrote a local file
+        assert hstat.part_file is not None
+        pfname = hstat.part_file.name
+
+        # NamedTemporaryFile is created 0600, set mode to the usual 0644
+        if os.name == 'posix':
+            os.fchmod(hstat.part_file.fileno(), 0o644)
+        else:
+            os.chmod(hstat.part_file.name, 0o644)
+
+        if use_server_timestamps and hstat.remote_time is None:
+            status = 'missing' if hstat.last_modified is None else f'invalid: {hstat.last_modified}'
+            logger.warn(url, f'Warning: Last-Modified header is {status}')
+
+        # Flush the userspace buffer now, so a later write can't update mtime after we set it
+        hstat.part_file.flush()
+
+        # Set the timestamp on the local file
+        if (
+            use_server_timestamps
+            and (hstat.remote_time is not None or post_timestamp is not None)
+            and hstat.contlen in (None, hstat.bytes_read)
+        ):
+            if hstat.remote_time is None:
+                tstamp = post_timestamp
+            elif post_timestamp is None:
+                tstamp = hstat.remote_time
+            else:
+                tstamp = min(hstat.remote_time, post_timestamp)
+            touch(pfname, tstamp, dir_fd=hstat.dest_dir)
+
+        # Adjust the new name
+        if adjust_basename is None:
+            new_dest_basename = dest_basename
+        else:
+            # Give adjust_basename a read-only file handle
+            pf = open(hstat.part_file.fileno(), 'rb', closefd=False)
+            new_dest_basename = adjust_basename(dest_basename, pf)
+
+        # Sync the inode
+        fsync(hstat.part_file)
+        try:
+            hstat.part_file.close()
+        finally:
+            hstat.part_file = None
+
+        # Move to final destination
+        new_dest = os.path.join(dest_dirname, new_dest_basename)
+        if os.rename not in os.supports_dir_fd:
+            os.replace(pfname, new_dest)
+        else:
+            os.replace(os.path.basename(pfname), new_dest_basename,
+                       src_dir_fd=hstat.dest_dir, dst_dir_fd=hstat.dest_dir)
+
+        return
+
+
+def setup_wget(ssl_verify, user_agent):
+    if not ssl_verify:
+        # Hide the InsecureRequestWarning from urllib3
+        warnings.filterwarnings('ignore', category=InsecureRequestWarning)
+    poolman.connection_pool_kw['cert_reqs'] = 'CERT_REQUIRED' if ssl_verify else 'CERT_NONE'
+    if user_agent is not None:
+        base_headers['User-Agent'] = user_agent
+
+
+# This is a simple urllib3-based urlopen function.
+def urlopen(url, use_dns_check: bool, headers: Optional[Dict[str, str]] = None, **kwargs) -> BaseHTTPResponse:
+    req_headers = base_headers.copy()
+    if headers is not None:
+        req_headers.update(headers)
+
+    while True:
+        try:
+            return poolman.request('GET', url, headers=req_headers, retries=HTTP_RETRY, **kwargs)
+        except HTTPError:
+            if is_dns_working(timeout=5, check=use_dns_check):
+                raise
+            # Having no internet is a temporary system error
+            no_internet.signal()
+
+
+# This functor is the primary API of this module.
+class WgetRetrieveWrapper:
+    def __init__(self, log: Callable[[LogLevel, str], None], options: Namespace):
+        self.log = log
+        self.options = options
+
+    def __call__(self, url, file, post_id=None, post_timestamp=None, adjust_basename=None):
+        hstat = HttpStat()
+        try:
+            _retrieve_loop(
+                hstat, url, file, post_id, post_timestamp, adjust_basename, self.log,
+                use_dns_check=self.options.use_dns_check, use_internet_archive=self.options.internet_archive,
+                use_server_timestamps=self.options.use_server_timestamps,
+            )
+        finally:
+            if hstat.dest_dir is not None:
+                os.close(hstat.dest_dir)
+                hstat.dest_dir = None
+            # part_file may still be around if we didn't move it
+            if hstat.part_file is not None:
+                self._close_part(hstat)
+
+        return hstat
+
+    @staticmethod
+    def _close_part(hstat):
+        try:
+            hstat.part_file.close()
+            try_unlink(hstat.part_file.name)
+        finally:
+            hstat.part_file = None
diff --git a/tumblr_backup_for_beginners.md b/tumblr_backup_for_beginners.md
deleted file mode 100644
index 2282ae2..0000000
--- a/tumblr_backup_for_beginners.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# Tumblr Backup 101
-
-This guide is for 100% programming/coding newbies to use this Tumblr Backup service.
-
-If you prefer watching a video instead of reading, see [this video tutorial](https://www.youtube.com/watch?v=mwG9bzL0E_4) by [@Sheepykin](https://github.com/Sheepykin). Thanks a lot!
-
-## Why 101?
-Tumblr does not have an export service, and all the easily downloadable/online ones are not very good. I strongly believe that any service you use should make it easy to back up your words and work.
-
-This program backs up your Tumblr onto your computer, and saves it on your hard drive. It ends up looking like [this](http://drbeat.li/tumblr). This program is excellent and easy to use - but also a but intimidating if you have never used command line programmes before.
-
-Don't panic! I'm going to walk you through step by step.
-
-## Getting Started
-This guide is for Windows users. 
-
-### Step 1: Install Python
-1. The program we are going to run is called tumblr_backup.py. It is a **python file**. This means it is a file written in the programming language Python.
-
-2. Just like you need a program like Word to view a_document.doc, or Paint to view a_picture.jpeg, you need to download Python to make this program work.
-
-3. Go to the [Python website](https://www.python.org/downloads/release/python-2712/). We are downloading v.2 of Python because this program is designed to work with v2
-
-4. Download the file called **Windows x86 MSI installer**
-
-5. Install it by double clicking.
-
-6. You've installed Python! You can now run Python programs, and if you want to learn to code, you can also use this installation to practice your coding.
-
-### Step 2: Download tumblr_backup
-
-1. Download and unzip this file: [tumblr-utils.zip](https://github.com/bbolli/tumblr-utils/zipball/master)
-
-2. Unzip the file somewhere easy to find, say in your Downloads folder. Remember the folder where you extracted the ZIP file!
-
-### Step 3: Add the download folder to the PATH
-
-This step is optional. It's only needed if you intend to start the script from anywhere on your PC. You don't need to do it if you follow the rest of this guide.
-
-1. We are now going to add this file to your PATH. What is the PATH? It essentially tells the computer where to find programs that you call from the command prompt.
-
-2. First, you need to find the complete path of the folder your download is in. This is the folder name you remembered in step 2.2. Mine looks like:
-`C:\Users\Unmutual\Downloads\bbolli-tumblr-utils-3a37fe6\bbolli-tumblr-utils-3a37fe6`
-(Yours will be different. The word Unmutual is my username; and you may have saved your file in a different place)
-
-3. Open up Control Panel. Search for Advanced System Settings. Click the link reading Environment Variables. 
-
-4. Scroll down the variables until you find one reading "Path". Click it. Click edit. If there is nothing in the box, simply paste in the url. If there is something in the box add a semi-colon to the end of the line. Then, paste in the url. (the semi-colon tells the computer to treat the two things as different, not interpret it as one big thing)
-
-(I learnt how to do this from [this page](https://www.java.com/en/download/help/path.xml), which gives lots of options for different windows systems. Check the link if my description isn't working for you.)
-
-### Step 4. Use the Command Prompt
-
-1. The command line is the bit of the computer which makes you feel like you're in the Matrix. Once you get used to the command line, you will become fucking addicted to it - I promise. This is because the command line is like seeing the puppeteer beind the puppet show. You will feel powerful. You will feel like the computer is yours to control, not this arcane box, but *your* computer which you can use to do pretty much anything.
-
-2. To find the command prompt, go to your system search and type in "Command Prompt". Click it.
-
-3. Your next step is to navigate the prompt to the file tumblr_backup.py. There are better guides out there than this for using the command prompt. I am going to explain, but feel free to google for one with pictures.
-
-4. On the left hand side of the screen is your current folder. For me, it reads `C:\Users\Unmutual>`, and then there is a blinky cursor. 
-
-5. Type `cd Downloads` and then press enter. Your screen now reads `C:\Users\Unmutual\Downloads>` (with your name, in place of the word "Unmutual"). "cd" stands for "change directory". You have gone one directory down! This is equivalent to just double clicking on the downloads folder. If you go wrong, typing `cd ..` will go up one directory again (back to C:\Users\Unmutual>). Have a play around and do some cackling. If you simply type "dir" it will give you a list of all the files in that directory.
-
-6. Once you're done pretending to be in the Matrix, navigate to the folder the file tumblr_backup.py is in. For me, this is:
-`cd C:\Users\Unmutual\Downloads\bbolli-tumblr-utils-3a37fe6`, or, from the Downloads folder, just `cd bbolli-tumblr-utils-3a37fe6`.
-
-### Step 5. Run!
-
-1. Plug in your laptop charger, and make sure you have a stable internet connection, and that the laptop won't auto shutdown, sleep or screensaver. This program will run for a while and it's a faff to restart.
-
-2. Where the blinky cursor is, type `tumblr_backup.py yourtumblrname`. The first bit tells the Windows to run Python, the second bit tells Python to run the backup script, and the third bit - yourtumblrname - tells the backup script which tumblr to download. For example, you may type
-
-    tumblr_backup.py discoinferno
-
-If you are tumblr user @ discoinferno. 
-
-(You can use this to backup any tumblr, including someone else's, but I think that's a tad shady)
-
-3. Your command prompt will start spitting letters and phrases onto the screen. Leave it to it! You can do other stuff while you wait, just leave the black command prompt box open and running.
-
-### Appendix A. How tumblr_backup works
-
-tumblr_backup grabs 50 posts at a time and downloads them onto your hard drive. In the same folder as the program tumblr_backup.py, it will create a folder with the name of your blog. it downloads everything into the folder. 
-
-Once you're done, you can open the folder and find the document called "index.html". Right click index.html, and choose "Open With Firefox" - or whatever internet browser you use.
-
-### Appendix B. Flags, options etc.
-
-In Step 4, you use the command line to tell the program to run.
-
-You type in the name of the program, and then your username.
-
-You can also add "flags" which give the program special running instructions.
-
-You put flags **between** the program name and username - for example:
-
-    tumblr_backup.py -t DOGS discoinferno
-
-Would only backup pages marked "dogs". You can see the whole list of flags in tumblr_backup.md.
-
-They are useful for, example - using `-T text` to only download your text posts, or `-p 2018` to only download this year's posts.
-
-You can get a list of all supported options with `tumblr_backup.py --help`.
-
-###### TODO
-
-1. how to restart the process
-2. more detail
-3. probably links and pictures
-
-
-
-