This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

Rewrite/Refactoring #157

Open. Wants to merge 20 commits into base: master.



@prof79 prof79 commented Aug 30, 2023

Hi there,

this is the rewrite/refactoring I promised. It took a week of my vacation.

It's now modularized and avoids code duplication where possible.

A configuration object, representing config.ini, and a download state (per creator) are threaded through the whole application. The config object also handles synchronization to config.ini on save.
Thus multi-user scraping also becomes a reality.

There is also no need to pre-supply a config.ini anymore - if it is missing a blank one will be generated automatically; missing options are added with their defaults automatically (on save).
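As a rough illustration of that behavior (not the actual code from this PR), a loader built on the standard `configparser` module could fill in missing options like this. The option names and default values below are hypothetical placeholders.

```python
import configparser
from pathlib import Path

# Hypothetical option names and defaults, for illustration only.
DEFAULTS = {
    "Options": {
        "download_directory": "Local_directory",
        "use_suffix": "True",
    },
}


def load_config(path: Path) -> configparser.ConfigParser:
    """Read config.ini if present, then fill in any missing options."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped silently

    for section, options in DEFAULTS.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in options.items():
            if not parser.has_option(section, key):
                parser.set(section, key, value)
    return parser


def save_config(parser: configparser.ConfigParser, path: Path) -> None:
    """Write the configuration, including added defaults, back to disk."""
    with path.open("w", encoding="utf-8") as handle:
        parser.write(handle)
```

Calling `load_config()` on a non-existent path yields a parser holding only the defaults, and `save_config()` then creates the file.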

You can fully operate it via command-line now (see fansly_downloader.py -h).

It also cleanly parses weird input like:

fansly_downloader.py -u here ,goes , nothing, @of,Course

which should have been, according to OS parsing semantics:

fansly_downloader.py -u here goes nothing @of Course

but will still be parsed as:

['here', 'goes', 'nothing', 'of', 'course']

User lists will be split by comma or space otherwise.
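A minimal sketch of how such normalization might work, assuming the arguments arrive as the list of strings the shell produces. The function name `parse_user_names` is made up for this example and is not taken from the PR.

```python
import re


def parse_user_names(args: list[str]) -> list[str]:
    """Normalize -u arguments into a clean list of lower-case names."""
    names = []
    for arg in args:
        # Split each shell argument on commas and whitespace.
        for part in re.split(r"[,\s]+", arg):
            # Drop a leading "@" and ignore empty fragments.
            name = part.strip().lstrip("@").lower()
            if name:
                names.append(name)
    return names


# argv pieces as the shell would deliver them for:
#   -u here ,goes , nothing, @of,Course
print(parse_user_names(["here", ",goes", ",", "nothing,", "@of,Course"]))
# prints ['here', 'goes', 'nothing', 'of', 'course']
```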

The program can now also be operated non-interactively, for example in a scheduled task, skipping input() prompts and sleep()ing instead.
There is also a log file now helping such headless scenarios.
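The idea can be sketched as a single helper that either waits for the user or sleeps. The name `pause`, its parameters and the five-second default are assumptions for illustration, not the PR's actual API.

```python
import time


def pause(message: str, interactive: bool, delay_seconds: float = 5.0) -> None:
    """Wait for the user when interactive, otherwise just sleep."""
    if interactive:
        # Blocks until the user presses <ENTER>.
        input(message)
    else:
        # Headless run: show the message, wait a moment and carry on.
        print(message)
        time.sleep(delay_seconds)
```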

I also introduced an option to omit the _fansly suffix because it felt redundant.

I took a few creators and compared the directory output with that of your 0.4.1, and that checked out. I tried several things but certainly could not test each and every variation and situation; testing on different platforms in particular fell short.
I tried to refine the update logic but saw no way of testing it at all.

There are certainly things I forgot to mention, but see RewriteNotes.md for some things I noted along the way.

-- Markus

Owner

Avnsx commented Sep 2, 2023

Thank you for submitting this pull request!

I've had a brief look at it, but unfortunately, during the same time, I was also making changes to the master branch of Fansly Downloader for version 0.4.2. These changes include:

  • Implementing metadata handling for the most common file formats.
  • Adding two new requirements, pyexiv2 and mutagen.
  • Removing the custom exit() function since the current versions of pyinstaller seem to work without it.
  • Introducing del_redudant_pyinstaller_files(), designed to delete the old MEI folders that pyinstaller creates every time the executable version of Fansly Downloader is launched but never cleans up.
  • Adjusting for the recent rate-limiting update by Fansly.

It would be great if you could incorporate these recent changes into your fork & pull request as well.

Furthermore, I'll be going on a month-long vacation, so I won't be able to review or interact with this pull request for a while. However, upon my return, I will thoroughly test these changes and may create a dev branch to fine-tune them before merging them into the master branch.

From my initial observation, I'm striving to maintain a clean initial file structure. When someone opens the repository, it can be overwhelming to see numerous folders and files. I suggest organizing them into subfolders, similar to what I was doing previously with the 'utils' folder, and possibly creating additional subfolders below it for better categorization. Regarding the .gitignore, while it's valuable for development purposes, I feel it might be somewhat unnecessary overall.

Edit: Actually I am not sure if I can even stay with these metadata changes as pyexiv2 requires exiv2.dll so I can't even package it with pyinstaller without it erroring out ...

Author

prof79 commented Sep 3, 2023

The changes are there with one exception: I'm a fan of clean code and encapsulation, and MetadataManager() offers neither. You know that pyexiv2 has file size limits of 1 GiB/2 GiB and do nothing about it - a typical case for MetadataManager() to raise an exception on files that are too large, return False or whatever.
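To illustrate the kind of guard meant here, a size check could run before any metadata is written. Everything in this sketch is hypothetical (the exception, the function name and the chosen 1 GiB ceiling); it does not touch pyexiv2 itself.

```python
from pathlib import Path

# Assumed ceiling; the comment above mentions pyexiv2 limits of 1 GiB/2 GiB.
MAX_METADATA_FILE_SIZE = 1 * 1024**3


class FileTooLargeError(Exception):
    """Raised when a file exceeds the size the metadata backend can handle."""


def ensure_taggable(path: Path, limit: int = MAX_METADATA_FILE_SIZE) -> None:
    """Raise FileTooLargeError instead of failing later inside the backend."""
    size = path.stat().st_size
    if size > limit:
        raise FileTooLargeError(f"{path} is {size} bytes; limit is {limit}")
```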

Also I do not incorporate changes when I don't know where the journey is heading and what the intended use is - does this mean that file names get shorter then/no id/hash info? Will they still be unique? What about existing repositories of users on their drives/NAS? Will existing files stay the same? Will their names be converted? Will the users be able to opt-in to file name conversion? There is enough multimedia software like image libraries which remember last used files, catalog files and so on.

Therefore I added the MetadataManager class but did not do anything with it in the file hashing code as the path is not clear and also stuff would need to be rewritten for cleanliness and safety.

Btw your rate-limiting changes worked for a couple of hours, or a day, but now the second timeline request for a creator comes back empty when it shouldn't be. You can prove this using Postman or any other tool: a request that yields posts will be empty when repeated a few secs later.

Owner

Avnsx commented Nov 3, 2023

@prof79 Never mind about that 0.4.2 version, I am honestly too lazy to find out how to fix the metadata stuff so that it would work with macOS and also be packageable with pyinstaller.

I would appreciate it a lot if you could revert your changes to this version of the rewrite back to how it was with version 0.4.1 (before I added the metadata stuff); otherwise just let me know and I'll do it myself and finally create a dev branch. I think I kinda found the ambition to work on this open source project again.

Author

prof79 commented Nov 10, 2023

Well, I had only added the metadata class, but tbh I never added the metadata code itself. I have not had the time to learn properly about EXIF and 3rd party tagging and the potential implications, so, in doubt about the proper way forward, I refrained from doing it.

@Avnsx Avnsx mentioned this pull request Nov 23, 2023

Joly0 commented Dec 9, 2023

Hey @prof79, just so you know, I've been trying your fork for quite some time now and I noticed this error keeps coming up:

Info | 12:27 || Downloading video '2023-11-23_at_04-10_id_583908461062922240.m3u8'
ERROR | 12:27 || Unexpected error during Timeline download:
Traceback (most recent call last):
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/common.py", line 89, in process_download_accessible_media
    download_media(config, state, accessible_media)
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/media.py", line 110, in download_media
    file_downloaded = download_m3u8(config, m3u8_url=media_item.download_url, save_path=file_save_path)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/m3u8.py", line 101, in download_m3u8
    audio_stream = input_container.streams.audio[0]
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range


Press <ENTER> to attempt to continue ...

This appears constantly and I have to press enter every time to continue. The fork also feels slow. It could be due to this bug, I am not sure, but compared to the main repo it feels slower.

Author

prof79 commented Dec 9, 2023

Hey @Joly0 , thanks - you should probably switch to my Fansly Downloader NG which I've been using for months myself now:

https://github.com/prof79/fansly-downloader-ng

This bug comes from sloppy coding in the original code, which assumes that m3u8s always have both video and audio, which is not the case. This seems to be a media piece without audio. The issue has already cropped up and been fixed over there: prof79/fansly-downloader-ng#2
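For illustration, the `IndexError` from the traceback can be avoided by checking for an empty stream tuple before indexing it. This is only a sketch of the general guard, not necessarily how NG fixed it; the helper `first_stream` is hypothetical and a `SimpleNamespace` stands in for the real `input_container.streams` object.

```python
from types import SimpleNamespace


def first_stream(streams):
    """Return the first stream of a kind, or None when there is none."""
    return streams[0] if len(streams) > 0 else None


# Stand-in for `input_container.streams` from the traceback:
# one video stream, no audio.
streams = SimpleNamespace(video=("video0",), audio=())

video_stream = first_stream(streams.video)
audio_stream = first_stream(streams.audio)  # None instead of IndexError

if audio_stream is None:
    print("No audio stream; only video will be processed")
```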

Regarding speed, NG has built-in random delays of less than one sec per media item and 2 to 4 secs per timeline page to avoid triggering rate-limiting measures on the Fansly servers. There was a time when the Fansly servers forced you to wait around a full minute between fetching timelines or you would just get empty results. This last delay, however, can be configured in NG using --timeline-delay-seconds (and --timeline-retries) on the command line or as timeline_delay_seconds/timeline_retries under Options in the .ini. You may also like to run it with -ni to have no interactive prompts at all except the "finished" one.
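Roughly, the delay-and-retry behavior described above could look like the following. This is a sketch under assumptions, not NG's actual code: the function name, the injectable `fetch` and `sleep` callables and the default values are invented here, and only the parameter names mirror the options mentioned.

```python
import random
import time
from typing import Callable


def fetch_timeline_page(
    fetch: Callable[[], list],
    timeline_retries: int = 1,
    timeline_delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Fetch one timeline page, retrying after a delay if it comes back empty."""
    # Small random pause before every page request (2 to 4 seconds).
    sleep(random.uniform(2, 4))
    posts = fetch()

    attempts = 0
    while not posts and attempts < timeline_retries:
        # An empty page may be rate limiting: wait, then ask again.
        sleep(timeline_delay_seconds)
        posts = fetch()
        attempts += 1
    return posts
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.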

Apart from that it's difficult to judge and compare speeds starting with the definition of what is slow and what is not.


Joly0 commented Dec 9, 2023

Thanks @prof79, I'll give it a try
