RAM Usage issue with m3u8 videos #8
Comments
Hi! The M3U8 code is largely from the original codebase; regrettably I am not an expert in M3U8 and stream processing in general, and in Python in particular. All TS streams are plucked from the M3U8, downloaded and then re-muxed (demuxed/muxed). I don't know if memory usage already explodes during the downloads or only during the re-muxing process. I see Avnsx properly uses a streamed web request, which saves memory, but all streams are downloaded almost at once using a thread pool. For the re-muxing I currently lack the knowledge. I can do some more research and try to debug it, but without a concrete sample where this can be observed it is even more difficult. I also had to postpone some important private stuff due to #3, so I beg your patience.
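For reference, a minimal sketch (not the actual project code) of the pattern described above: each segment is fetched with a streamed request, but the chunks still end up in a memory buffer, and a thread pool downloads many segments almost at once. The names (`download_segment`, `segment_urls`) and chunk sizes are assumptions.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import requests


def download_segment(url: str) -> bytes:
    """Stream a single .ts segment, but buffer it entirely in RAM."""
    buffer = io.BytesIO()
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=64 * 1024):
            buffer.write(chunk)  # chunks are streamed, but still collected in memory
    return buffer.getvalue()


def download_all(segment_urls: list[str]) -> list[bytes]:
    # A thread pool fetches many segments concurrently, so peak memory
    # is roughly the sum of all segment sizes.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(download_segment, segment_urls))
```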
Now I see it, you are absolutely right - although the downloads are streamed/chunked, everything goes into a memory buffer, and all .ts file contents are then collected/merged - also in memory! It will take quite some effort to re-write the code to use temporary files on disk instead without breaking anything in the process. This may take 14 days or so according to my schedule; if I get a free spot maybe earlier, but no promises.
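A hedged sketch of what such a rewrite could look like - segments streamed straight to temporary files and merged on disk, so peak RAM stays around one chunk rather than the whole video. All names and the exact flow are assumptions, not the eventual implementation.

```python
import shutil
import tempfile
from pathlib import Path

import requests


def download_segment_to_disk(url: str, directory: Path, index: int) -> Path:
    """Stream a single .ts segment directly into a file on disk."""
    target = directory / f"segment_{index:05d}.ts"
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(target, "wb") as file:
            for chunk in response.iter_content(chunk_size=64 * 1024):
                file.write(chunk)
    return target


def merge_segments(segment_files: list[Path], output: Path) -> None:
    """Concatenate segment files on disk without loading them into RAM."""
    with open(output, "wb") as merged:
        for segment in segment_files:
            with open(segment, "rb") as part:
                shutil.copyfileobj(part, merged, length=64 * 1024)


def download_video(segment_urls: list[str], output: Path) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        files = [
            download_segment_to_disk(url, tmp_dir, i)
            for i, url in enumerate(segment_urls)
        ]
        merge_segments(files, output)
```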
That would explain why RAM usage is about double the size of the file. That's roughly how I guessed it was working from a quick look at the code. No worries though, I'm not in a rush - it works, slowly, but it works. I was just sharing my discovery 😄 I run it fully headless and noticed the issue when my monitoring started alerting me about all my services going down lol
Thanks a lot for sharing - I neither know any 4K creators nor would I have noticed on my gaming PC 😂 (shame on me). Glad this is just a private NAS and nothing critical 😂
Writing a little scraper for another site I learned more about M3U8, MPEG-TS and ffmpeg, and I plan on moving to this new way of downloading and processing. This might, however, break de-duplication for existing videos and re-download them. I also hope this will package properly. Stay tuned.
Hi, you might try this version, but note the warning - try with a different folder or back up your existing creator(s). Though I have some ideas, I do not yet have a solution for the de-duplication thing, as the files essentially become different files when ffmpeg merges them properly. I'm not sure whether this will work in your Docker container. It didn't work on WSL on a mounted project directory, but I didn't try on native Linux. There might be an issue regarding pyffmpeg and quoting as mentioned on their GitHub, but the error message is different. https://github.com/prof79/fansly-downloader-ng/releases/tag/ondemand
Very quick - and I wanted to write that I might have dispelled my Linux doubts in a few days :D Actually there are two ways to do such concat files - relative or absolute - and absolute would require an unsafe flag. Relative paths are resolved relative to where the concat file is located, so that should be no problem in this case, as I also specify the list file name in a fully-qualified manner. I rather suspect, but could not yet test it due to hashing headaches, that … I'm under the impression that …
I'm curious about this - do you know of any doc I could read about that? I never knew absolute paths were unsafe.
Could be a relatively quick and easy fix, yes.
Yeah, and it gives more control like exception handling - I can also be fast and should probably have started out as a psychic 😂 Try this: https://github.com/prof79/fansly-downloader-ng/releases/tag/ondemand - 176c42f 😁
Regarding your question:
https://trac.ffmpeg.org/wiki/Concatenate
https://ffmpeg.org/ffmpeg-formats.html#concat-1 -> see 3.5.2
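To illustrate what those docs describe, here is a rough Python/subprocess sketch of the concat demuxer with a relative-path list file; the commented-out `-safe 0` line is where absolute (or otherwise "unsafe") paths would need the flag. Paths and function names are made up, and this is not necessarily how the release above invokes ffmpeg.

```python
import subprocess
from pathlib import Path


def concat_segments(segment_files: list[Path], output: Path) -> None:
    # Assume all segments live in the same directory; the list file goes there
    # too, because entries are resolved relative to the list file's location.
    work_dir = segment_files[0].parent
    list_file = work_dir / "concat_list.txt"

    lines = [f"file '{segment.name}'" for segment in segment_files]
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")

    subprocess.run(
        [
            "ffmpeg",
            "-f", "concat",
            # "-safe", "0",  # only needed for absolute or otherwise "unsafe" paths
            "-i", str(list_file),
            "-c", "copy",    # re-mux without re-encoding
            str(output),
        ],
        check=True,
    )
```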
I've also implemented a new selective MP4 hashing algorithm that ignores stupid lavf version info or re-muxing "artifacts" like deviating bitrates and stuff in the header, although the track data is identical to the old manual method. Having an opt-in, more succinct file naming scheme, probably using a CRC, is also on my personal wishlist. But I still don't know whether using pHash for images is currently a beneficial or detrimental thing ...
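As an illustration of the selective-hashing idea (an assumption about the approach, not the project's actual algorithm): one could hash only the top-level 'mdat' payload of an MP4 and skip the header/metadata boxes that re-muxing tends to touch, so encoder tags and bitrate hints no longer change the hash.

```python
import hashlib
import struct
from pathlib import Path


def selective_mp4_hash(path: Path) -> str:
    """Hash only the media payload ('mdat' boxes), skipping metadata boxes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            header_size = 8
            if size == 1:
                # 64-bit "largesize" follows the type field
                size = struct.unpack(">Q", f.read(8))[0]
                header_size = 16
            elif size == 0:
                # Box extends to the end of the file
                size = header_size + _remaining(f)
            payload = size - header_size
            if box_type == b"mdat":
                # Hash the payload in chunks to keep memory usage flat
                left = payload
                while left > 0:
                    chunk = f.read(min(left, 1024 * 1024))
                    if not chunk:
                        break
                    digest.update(chunk)
                    left -= len(chunk)
            else:
                f.seek(payload, 1)  # skip header/metadata boxes entirely
    return digest.hexdigest()


def _remaining(f) -> int:
    """Bytes left from the current position to the end of the file."""
    pos = f.tell()
    f.seek(0, 2)
    end = f.tell()
    f.seek(pos)
    return end - pos
```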
It seems to work fine. I lifted the Docker limits and I'm trying a full scrape of the creator that raised the issue. Happy NAS!
Oh, I see! Never thought of that.
Awesome! 😁🙏 We can leave this open if you want to do some more testing - you could close it yourself, or I could close it with the next main branch release tomorrow or later; I also need to write up some explanation/release notes, but not today ...
Blue is the RAM. Yes, the most I've seen used so far is around 900 MB, but most of the time it's around 180-200 MB. From what I see, all the big 4K vids are downloaded with no issue. I guess we can close.
Well, tbh I've not monitored/checked what RAM usage the ffmpeg binary contributes during a merge, but sounds OK I guess. Cache is cache 😁 - all the stuff from disk that is used by the NAS OS/services/Docker and identified as potentially required often is proactively loaded/stored in RAM - aka cached - since RAM is just so much faster than even the fastest SSD can be. What is more, this also ensures good use of your RAM instead of it sitting largely empty all the time 😁 Also, stuff written back to disk may get buffered (cached) in RAM to speed things up and cut the disks some slack. But cache can be freed/shrunk by the OS as needed.
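As a small illustration of that last point (assuming the third-party psutil package on Linux), the page cache shows up as reclaimable memory rather than memory that is truly used up:

```python
import psutil

# On Linux, memory held by the page cache still counts as "available",
# because the kernel can reclaim it whenever a process needs the RAM.
mem = psutil.virtual_memory()
print(f"total:     {mem.total / 2**30:.1f} GiB")
print(f"used:      {mem.used / 2**30:.1f} GiB")
print(f"cached:    {mem.cached / 2**30:.1f} GiB (reclaimable page cache)")
print(f"available: {mem.available / 2**30:.1f} GiB (free + reclaimable)")
```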
Hi!
I dockerized your fork and run it on a NAS with 8 GB of RAM.
A creator I follow posts 15-20 minute 4K videos (1.5 to 3 GB file size) that make the scraper's RAM usage explode and saturate all 8 GB of RAM on the NAS. At most, the scraper was using close to 6 GB.
As you can see in the screenshot above, the NAS starts aggressively killing everything to get RAM back (all containers and NAS services).
I was able to work around the issue by limiting RAM usage with Docker limits, but the scraper runs super slowly because of it.
I'm under the impression that m3u8 videos are fully downloaded into RAM before being written to a file - is that right?
Would this be something that could be mitigated ?