Merge pull request #10 from prof79/m3u8
Important video fixes
prof79 authored Jan 27, 2024
2 parents 53bd7e1 + 21a5db9 commit 44b8c40
Showing 14 changed files with 572 additions and 159 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -170,3 +170,5 @@ config.ini
*.bak
logo*.txt
dummy.*
# Linux/macOS binary
fansly-downloader-ng
29 changes: 25 additions & 4 deletions README.md
@@ -115,11 +115,32 @@ This is a rewrite/refactoring of [Avnsx](https://github.com/Avnsx)'s original [F

## 📰 What's New (Release Notes)

### v0.7.10 2024-01-05
### v0.8.0 2024-01-27

Binary release fixing the [missing media downloads issue #3](../../issues/3). Thanks to all participants!
Also fixes a statistics message counting bug.
Summary release for v0.7.7-v0.7.9, no code changes in this one.
Video Fix Edition

This version fixes some grave bugs regarding video downloading:

* Ludicrous memory usage: whole MPEG-4 files were buffered in RAM, using up to several gigabytes ([#8](../../issues/8))
* Manual re-muxing of MPEG streams, which a) caused incompatibilities with certain media ([#9](../../issues/9)) and b) could also lead to malformed MPEG-4 files
* Hashing video files is tricky; it broke due to the fix for [#9](../../issues/9) but was bound to break unnoticed in the future anyway, like a time bomb

As a side effect, existing files will be re-hashed and now have a `_hash1_` part instead of `_hash_`. The front of the file name remains the same. Sorry for the inconvenience. I also have plans for a new (opt-in) shorter naming scheme, probably using a checksum, but that's a story for another day.

Along the way I also fixed a configuration file issue where timeline settings were not honored, as well as a file-rename bug.

Long read:

Video files are actually split into chunks of several MPEG-TS streams in varying resolutions, and a web video player can decide what to load (adaptive streaming, DASH, whatever the technology and naming). Such info is commonly provided in playlists using a text format called `M3U8`. So to get an MPEG-4 out of this you need to take the playlist with the highest resolution, fetch all of its MPEG-TS streams and merge them into an MPEG-4 file. This should be done by software written by video experts who know the standards, not by hand; Avnsx, for whatever reason, decided not only to re-mux the streams on-the-fly in RAM but also to fix DTS packet sequences by hand. People with some tech knowledge can see what could (and did) go wrong with this and how I might feel about that.
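
As a rough sketch of the first two steps - picking the highest-resolution variant from the `M3U8` master playlist and fetching its MPEG-TS segments to disk - this is the general idea (illustrative code only, not the downloader's actual implementation; names and helpers are made up):

```python
# Minimal sketch: pick the highest-resolution variant of an M3U8 master
# playlist and download its MPEG-TS segments to disk (not into RAM).
from pathlib import Path
from urllib.parse import urljoin

import requests


def best_variant_url(master_url: str) -> str:
    """Return the variant playlist URL with the highest advertised RESOLUTION."""
    lines = requests.get(master_url, timeout=30).text.splitlines()
    variants = []  # (pixel count, variant URL)

    for info, uri in zip(lines, lines[1:]):
        if info.startswith('#EXT-X-STREAM-INF') and 'RESOLUTION=' in info:
            resolution = info.split('RESOLUTION=')[1].split(',')[0]
            width, height = (int(value) for value in resolution.split('x'))
            variants.append((width * height, urljoin(master_url, uri)))

    return max(variants)[1]


def download_segments(variant_url: str, target_dir: Path) -> list[Path]:
    """Fetch every .ts segment of a variant playlist onto disk."""
    lines = requests.get(variant_url, timeout=30).text.splitlines()
    segment_files = []

    for index, uri in enumerate(line for line in lines if line and not line.startswith('#')):
        segment_file = target_dir / f'segment_{index:05}.ts'

        with requests.get(urljoin(variant_url, uri), stream=True, timeout=30) as response:
            with open(segment_file, 'wb') as file:
                for chunk in response.iter_content(chunk_size=1 << 20):
                    file.write(chunk)

        segment_files.append(segment_file)

    return segment_files
```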

First, all streams (`.ts`) are now downloaded to disk instead of being buffered in RAM. Second, regarding concatenation/merging, a web search usually ends up at the go-to tool for manipulating audio and video files - `ffmpeg`. Thus I ended up using `pyffmpeg`, which is platform-independent and downloads an appropriate `ffmpeg` binary to help with re-encoding tasks. The lib misses some fixes regarding Linux support - but I could easily launch `ffmpeg` with appropriate arguments by hand. I then use the concat demuxer of `ffmpeg` with a concat file (that gets deleted afterwards) to properly merge all streams into an MPEG-4 file using copy-encoding, with proper timing info and no artifacts (unless the original already had problems). This results in a structurally clean `.mp4`.
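
The merging step boils down to something like the following (a simplified sketch assuming an `ffmpeg` binary on `PATH`; the real code resolves the binary via `pyffmpeg`):

```python
# Sketch: merge downloaded .ts segments into one MP4 via ffmpeg's concat
# demuxer with stream copy ("copy-encoding"), i.e. without re-encoding.
import subprocess
from pathlib import Path


def concat_to_mp4(segment_files: list[Path], output_file: Path) -> None:
    concat_file = output_file.parent / (output_file.stem + '.concat.txt')

    # One "file '...'" line per segment, in playback order.
    concat_file.write_text(
        '\n'.join(f"file '{segment.as_posix()}'" for segment in segment_files)
    )

    try:
        subprocess.run(
            [
                'ffmpeg',
                '-f', 'concat',      # use the concat demuxer ...
                '-safe', '0',        # ... and allow arbitrary segment paths
                '-i', str(concat_file),
                '-c', 'copy',        # copy audio/video streams, no re-encoding
                str(output_file),
            ],
            check=True,
        )
    finally:
        # The concat list is only a temporary helper file.
        concat_file.unlink(missing_ok=True)
```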

Merging (concatenating) to a proper MPEG-4 file makes the file look totally different at first glance. Two videos downloaded with the old and the new method differ in file size and in metadata like bitrate and duration, although they are essentially the same content-wise. What is more, I also discovered that all `libav*`-based software like `ffmpeg` and `PyAV` writes the framework's version number into the user metadata portion of the `.mp4`. That's the time bomb I referred to: upgrade to a new library version and files that would otherwise be the same suddenly differ.

Using some online articles about the essentials of the MPEG-4 format, I devised a new hashing method for `.mp4` files: I exclude the so-called `moov` and `mdat` boxes (or atoms), which essentially include all the varying header data/metadata like bitrate, duration and so on, and which also carry user data (`udta`) with the `Lavf` version as a sub-part. I'm no MPEG-4 expert at all, so hopefully I haven't missed something essential here - but from my tests this works beautifully. The bytes of the audio/video content itself are the same, so they hash the same 🙂.
However, since there is no way to distinguish old-style from new-style hashed files, I had to introduce a marker acting like a version number, `_hash1_`, and re-hash all existing old-version files on program launch, including images. Although image hashing has not changed, treating images differently here would only have led to a buggy, unintelligible mess.
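
In code, the box-walking idea looks roughly like this (a sketch only - the exclusion set and the handling of box sizes are simplified and this is not the project's exact implementation):

```python
# Sketch: hash an MP4 by walking its top-level boxes (atoms) and skipping
# an exclusion set, so that varying metadata does not change the hash.
import hashlib
from pathlib import Path

EXCLUDED_BOXES = {b'moov', b'mdat'}  # boxes left out of the hash, per the description above


def hash_mp4_boxes(file: Path) -> str:
    file_size = file.stat().st_size
    sha = hashlib.sha256()

    with open(file, 'rb') as mp4:
        offset = 0

        while offset + 8 <= file_size:
            mp4.seek(offset)
            header = mp4.read(8)

            size = int.from_bytes(header[0:4], 'big')
            box_type = header[4:8]

            if size == 1:
                # A 64-bit "largesize" follows the type field (common for huge mdat boxes).
                size = int.from_bytes(mp4.read(8), 'big')
            elif size == 0:
                # A zero size means the box extends to the end of the file.
                size = file_size - offset

            if size < 8:
                break  # malformed box; stop rather than loop forever

            if box_type not in EXCLUDED_BOXES:
                mp4.seek(offset)
                sha.update(mp4.read(size))

            offset += size

    return sha.hexdigest()
```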

Obviously, if a creator re-encoded existing material, the file will be totally different from a binary perspective - even though it may visually check out the same as a previous release. Catching that would require something like a "perceptual hash" - but I still have doubts about that tech being too vague - and thus missing content. Therefore, after testing, I might remove pHashing from images in the future.
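
For reference, a perceptual-hash comparison of two images with the `imagehash` library looks roughly like this (illustrative file names; whether and how exactly the downloader applies pHashing may differ):

```python
# Sketch: compare two images by perceptual hash; a small Hamming distance
# means "visually (almost) the same picture", even if the bytes differ
# after a re-encode.
import imagehash
from PIL import Image

hash_old = imagehash.phash(Image.open('release_2023.jpg'))
hash_new = imagehash.phash(Image.open('release_2024.jpg'))

# 0 = identical perceptual hash; small values = visually near-identical.
if hash_old - hash_new <= 4:
    print('Probably the same picture, just re-encoded.')
```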

For more details and history see: **[Release Notes](ReleaseNotes.md)**

27 changes: 27 additions & 0 deletions ReleaseNotes.md
@@ -2,6 +2,33 @@

## 🗒️ Release Notes

### v0.8.0 2024-01-27

Video Fix Edition

This version fixes some grave bugs regarding video downloading:

* Ludicrous memory usage: whole MPEG-4 files were buffered in RAM, using up to several gigabytes ([#8](../../issues/8))
* Manual re-muxing of MPEG streams, which a) caused incompatibilities with certain media ([#9](../../issues/9)) and b) could also lead to malformed MPEG-4 files
* Hashing video files is tricky; it broke due to the fix for [#9](../../issues/9) but was bound to break unnoticed in the future anyway, like a time bomb

As a side effect, existing files will be re-hashed and now have a `_hash1_` part instead of `_hash_`. The front of the file name remains the same. Sorry for the inconvenience. I also have plans for a new (opt-in) shorter naming scheme, probably using a checksum, but that's a story for another day.

Along the way I also fixed a configuration file issue where timeline settings were not honored, as well as a file-rename bug.

Long read:

Video files are actually split into chunks of several MPEG-TS streams in varying resolutions, and a web video player can decide what to load (adaptive streaming, DASH, whatever the technology and naming). Such info is commonly provided in playlists using a text format called `M3U8`. So to get an MPEG-4 out of this you need to take the playlist with the highest resolution, fetch all of its MPEG-TS streams and merge them into an MPEG-4 file. This should be done by software written by video experts who know the standards, not by hand; Avnsx, for whatever reason, decided not only to re-mux the streams on-the-fly in RAM but also to fix DTS packet sequences by hand. People with some tech knowledge can see what could (and did) go wrong with this and how I might feel about that.

First, all streams (`.ts`) are now downloaded to disk instead of being buffered in RAM. Second, regarding concatenation/merging, a web search usually ends up at the go-to tool for manipulating audio and video files - `ffmpeg`. Thus I ended up using `pyffmpeg`, which is platform-independent and downloads an appropriate `ffmpeg` binary to help with re-encoding tasks. The lib misses some fixes regarding Linux support - but I could easily launch `ffmpeg` with appropriate arguments by hand. I then use the concat demuxer of `ffmpeg` with a concat file (that gets deleted afterwards) to properly merge all streams into an MPEG-4 file using copy-encoding, with proper timing info and no artifacts (unless the original already had problems). This results in a structurally clean `.mp4`.

Merging (concatenating) to a proper MPEG-4 file makes the file look totally different at first glance. Two videos downloaded with the old and the new method differ in file size and in metadata like bitrate and duration, although they are essentially the same content-wise. What is more, I also discovered that all `libav*`-based software like `ffmpeg` and `PyAV` writes the framework's version number into the user metadata portion of the `.mp4`. That's the time bomb I referred to: upgrade to a new library version and files that would otherwise be the same suddenly differ.

Using some online articles about the essentials of the MPEG-4 format, I devised a new hashing method for `.mp4` files: I exclude the so-called `moov` and `mdat` boxes (or atoms), which essentially include all the varying header data/metadata like bitrate, duration and so on, and which also carry user data (`udta`) with the `Lavf` version as a sub-part. I'm no MPEG-4 expert at all, so hopefully I haven't missed something essential here - but from my tests this works beautifully. The bytes of the audio/video content itself are the same, so they hash the same 🙂.
However, since there is no way to distinguish old-style from new-style hashed files, I had to introduce a marker acting like a version number, `_hash1_`, and re-hash all existing old-version files on program launch, including images. Although image hashing has not changed, treating images differently here would only have led to a buggy, unintelligible mess.

Obviously, if a creator re-encoded existing material, the file will be totally different from a binary perspective - even though it may visually check out the same as a previous release. Catching that would require something like a "perceptual hash" - but I still have doubts about that tech being too vague - and thus missing content. Therefore, after testing, I might remove pHashing from images in the future.

### v0.7.10 2024-01-05

Binary release fixing the [missing media downloads issue #3](../../issues/3). Thanks to all participants!
8 changes: 6 additions & 2 deletions config/args.py
@@ -218,7 +218,7 @@ def parse_args() -> argparse.Namespace:
parser.add_argument(
'-tr', '--timeline-retries',
required=False,
default=1,
default=None,
type=int,
dest='timeline_retries',
help="Number of retries on empty timelines. Defaults to 1. "
@@ -229,7 +229,7 @@
parser.add_argument(
'-td', '--timeline-delay-seconds',
required=False,
default=60,
default=None,
type=int,
dest='timeline_delay_seconds',
help="Number of seconds to wait before retrying empty timelines. "
@@ -460,6 +460,10 @@ def map_args_to_config(args: argparse.Namespace, config: FanslyConfig) -> None:
check_attr(attr_name, attr_name)
arg_attribute = getattr(args, attr_name)

if arg_attribute is None:
# No arg given, keep default or config.ini value
continue

int_value = 0

try:
10 changes: 10 additions & 0 deletions config/config.py
@@ -213,6 +213,7 @@ def load_config(config: FanslyConfig) -> None:
metadata_handling = config._parser.get(options_section, 'metadata_handling', fallback='Advanced')
config.metadata_handling = MetadataHandling(metadata_handling.lower())

# Booleans
config.download_media_previews = config._parser.getboolean(options_section, 'download_media_previews', fallback=True)
config.open_folder_when_finished = config._parser.getboolean(options_section, 'open_folder_when_finished', fallback=True)
config.separate_messages = config._parser.getboolean(options_section, 'separate_messages', fallback=True)
@@ -222,6 +223,12 @@
config.interactive = config._parser.getboolean(options_section, 'interactive', fallback=True)
config.prompt_on_exit = config._parser.getboolean(options_section, 'prompt_on_exit', fallback=True)

# Numbers
config.timeline_retries = config._parser.getint(options_section, 'timeline_retries', fallback=1)
config.timeline_delay_seconds = config._parser.getint(options_section, 'timeline_delay_seconds', fallback=60)

#region Renamed Options

# I renamed this to "use_duplicate_threshold" but retain older config.ini compatibility
# True, False -> boolean
if config._parser.has_option(options_section, 'utilise_duplicate_threshold'):
@@ -231,6 +238,7 @@
else:
config.use_duplicate_threshold = config._parser.getboolean(options_section, 'use_duplicate_threshold', fallback=False)

# Renamed this to "use_folder_suffix"
# True, False -> boolean
if config._parser.has_option(options_section, 'use_suffix'):
config.use_folder_suffix = config._parser.getboolean(options_section, 'use_suffix', fallback=True)
@@ -240,6 +248,8 @@
config.use_folder_suffix = config._parser.getboolean(options_section, 'use_folder_suffix', fallback=True)

#endregion

#endregion

# Safe to save! :-)
save_config_or_raise(config)