Audio files and attached files don't get downloaded in some cases #52

Open
sumeetweb opened this issue Nov 4, 2023 · 19 comments

@sumeetweb
Owner

Verification Required:

Script doesn't download audio files for contentable type: audio

It also doesn't download attached files from pages where you have text and then attached files at the bottom of the page.
contentable type: HtmlItem with attached files.

@sumeetweb sumeetweb self-assigned this Nov 4, 2023
@thereals

Hi Sumeet,
firstly, thanks for this awesome piece of software!
I can confirm that it doesn't download pages of type "Audio", the ones shown with a speaker symbol in front of them.
Like here:
audio_example

Can you try to fix this?

@sumeetweb
Owner Author

sumeetweb commented Dec 18, 2023

Hi there. I don't have the API response for type Audio. Please send one sample API response.

Update: I have added audio file downloads to the fix-lesson-dl branch. Please test it out.

https://github.com/sumeetweb/Thinki-Downloader/tree/fix-lesson-dl

@thereals

thereals commented Dec 18, 2023 via email

@sumeetweb
Owner Author

sumeetweb commented Dec 19, 2023 via email

@thereals

thereals commented Dec 19, 2023 via email

@sumeetweb
Owner Author

The issue was that the file name was being set to null before the file was saved.
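
A guard against this kind of bug can be sketched as below. This is only an illustration in Python, not the project's actual fix: `safe_filename` and its `default` parameter are hypothetical names, and falling back to the last segment of the download URL is an assumed strategy.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def safe_filename(name, url, default="download.bin"):
    """Return a usable file name, falling back to the URL path when name is empty."""
    candidate = (name or "").strip()
    if not candidate:
        # Assumed fallback: the last path segment of the download URL.
        candidate = posixpath.basename(unquote(urlparse(url).path))
    # Replace characters that are invalid on common file systems.
    candidate = re.sub(r'[\\/:*?"<>|]+', "_", candidate).strip(" .")
    # If nothing usable is left, use the placeholder default.
    return candidate or default
```

Calling `safe_filename(None, url)` just before the file is written means a null name never reaches the save step.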

@thereals

thereals commented Dec 19, 2023 via email

@sumeetweb
Owner Author

Hi there. Good to know it worked. The above seems to be the course JSON file. Can you send a sample HtmlItem response that has the embedded player? I will debug.

@thereals

thereals commented Dec 20, 2023 via email

@sumeetweb
Owner Author

sumeetweb commented Dec 20, 2023 via email

@thereals

thereals commented Dec 20, 2023 via email

@thereals

thereals commented Dec 22, 2023 via email

@thereals

thereals commented Dec 22, 2023 via email

@sumeetweb
Owner Author

> Hi Sumeet, do you think it's technically feasible to read out the three parameters request_url, set-cookie and X-Thinkific-Client-Date via JavaScript or another language? It's very cumbersome to do it manually for 50+ courses with just 5 videos each.

Hi there. I tried this for the cookie and the date and was able to fetch both, but for the course link I didn't find any API that returns the list of enrollments with the course slug. I need to check one more time.
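
One possible way to avoid copying the values by hand is to read them from a HAR file exported from the browser's network tab. The sketch below is hedged: `extract_request_params` is a hypothetical helper, it assumes the cookie is sent in the `cookie` request header, and it assumes the relevant request is the first one carrying an `X-Thinkific-Client-Date` header. It does not solve the enrollment-list problem mentioned above.

```python
def extract_request_params(har, header_name="x-thinkific-client-date"):
    """Return (request_url, cookie, client_date) from the first matching HAR entry, or None."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        # HAR stores headers as a list of {"name": ..., "value": ...} objects.
        headers = {h["name"].lower(): h["value"] for h in request.get("headers", [])}
        if header_name in headers:
            # Assumption: the cookie travels in the "cookie" request header.
            return request.get("url"), headers.get("cookie"), headers[header_name]
    return None
```

With a saved export, `extract_request_params(json.load(open("course.har")))` would return the three values for one course, where `course.har` is a placeholder file name.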

@sumeetweb
Owner Author

> Hi Sumeet, I have now encountered another download issue. One Thinkific teaching site also has quiz pages. For these, the HTML including questions and answers is downloaded, but the videos embedded in the pages are not.

Thanks. I will integrate these.

@sumeetweb
Owner Author

Hi @thereals. Can you please test whether the wistia-iframe-dl branch fixes Soundslice video downloads and the quiz issue?

https://github.com/sumeetweb/Thinki-Downloader/tree/wistia-iframe-dl

@thereals

thereals commented Jan 1, 2024

Hi @sumeetweb,
I tried to test it, but I kept getting an error that the Docker image is already in use by the main container. After some research I found that the container name in compose.yaml was the same as the main one, so I added "-wistia" to the container name. Then it worked.

  • Quiz issue
    I tested the new branch on quiz pages. It now downloads all the videos from the Wistia frame.
    However, if I open the downloaded HTML, the frames are empty. I have to go to the folder and open the corresponding video myself. Would it be possible to download the HTML with the Wistia videos embedded instead? This already works for embedded audio files: the sound files are not downloaded separately but are embedded into the downloaded HTML, which makes for a much better user experience. It would be great to be able to select this behaviour in the .env file, for example embed_video = "Yes" and embed_audio = "Yes" to embed the videos and audio files. With "No", they would not be embedded but downloaded into the subfolder instead.

  • Another html-item issue
    I also tested this new branch on other lesson pages and found that some embedded videos are not being downloaded. I think I know the reason: these videos are embedded with a different iframe URL, like "iframe src="//fast.wistia.net/embed/iframe/c**** ". The videos that do get downloaded have no "fast" prefix.

  • Soundslice video issue
    Here the videos within a Soundslice iframe are detected correctly and are downloaded as MP4 files into subfolders. However, they all have 0 bytes and do not play.
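
The embed_video / embed_audio switches requested in the first point could be read roughly as follows. This is a sketch in Python under assumptions: `env_flag` is a hypothetical helper, the two setting names come from the request above and do not exist in the project, and the list of accepted "truthy" spellings is a guess.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret a Yes/No style setting such as embed_video = "Yes"."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # Tolerate surrounding quotes and mixed case, e.g. "Yes", 'yes', YES.
    return raw.strip().strip("\"'").lower() in {"yes", "true", "1", "on"}
```

A downloader could then branch on `env_flag("embed_video")` to decide between embedding the media in the HTML and saving it to the subfolder.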

@sumeetweb
Owner Author

It has been a long time, about a year, on this issue. I now understand why OSS is hard for people with jobs.

I am planning to refactor the whole thing in Python and create a separate module for each third-party service to download from.

For the quiz pages, this requires a custom HTML parser that replaces the iframe with the video link if video downloads are enabled; otherwise it will just add https:// in front of the iframe source.
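
The rewrite described here might look like the sketch below. It is only an illustration: `rewrite_iframes`, `local_videos` (a mapping from iframe source to downloaded file path) and `downloads_enabled` are hypothetical names, and it uses a regular expression where a real implementation would probably want a proper HTML parser.

```python
import html
import re

# Matches an <iframe ...> tag with a quoted src attribute, plus its closing tag if present.
IFRAME_RE = re.compile(
    r'<iframe\b[^>]*?\bsrc=(["\'])(?P<src>.*?)\1[^>]*>\s*(?:</iframe>)?',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_iframes(page_html, local_videos, downloads_enabled):
    """Swap iframes for local <video> tags, or make protocol-relative sources absolute."""
    def replace(match):
        src = match.group("src")
        if downloads_enabled and src in local_videos:
            # Point the page at the downloaded file instead of the remote player.
            path = html.escape(local_videos[src], quote=True)
            return '<video controls src="{}"></video>'.format(path)
        if src.startswith("//"):
            # Protocol-relative source: prefix https: so the saved page works offline from disk.
            return match.group(0).replace(src, "https:" + src, 1)
        return match.group(0)

    return IFRAME_RE.sub(replace, page_html)
```

Iframes whose source is not in `local_videos` are left in place, so a page with mixed content still renders the remote players.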

The HTML issue was fixed in the wistia-iframe-dl branch, but I included the network protocol (i.e. https://) in the regex, and that caused the issue.
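
A pattern that treats the protocol and the "fast." prefix as optional could be sketched like this. The host list (`wistia.net` and `wistia.com`) and the character set allowed in the video id are assumptions, not taken from the branch, and the ids in any example input are placeholders.

```python
import re

# Protocol ("https:") and the "fast." subdomain are both optional.
WISTIA_IFRAME_RE = re.compile(
    r'(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/iframe/(?P<id>[A-Za-z0-9]+)',
    re.IGNORECASE,
)


def find_wistia_ids(page_html):
    """Return the Wistia video ids found in iframe sources, in document order."""
    return [m.group("id") for m in WISTIA_IFRAME_RE.finditer(page_html)]
```

Because the scheme is optional, a protocol-relative source such as `//fast.wistia.net/embed/iframe/<id>` is matched the same way as a full `https://` URL.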

From what I remember, Soundslice will require ffmpeg to combine the video parts and then join them with the audio.
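
If that recollection is right, the two ffmpeg steps could be assembled as below. This is a sketch only: it builds the command lines without running them, the function names are hypothetical, and whether Soundslice actually serves video in parts with separate audio is unverified. It uses ffmpeg's concat demuxer for joining and re-encodes the audio to AAC when muxing.

```python
def build_concat_list(part_paths):
    """Text for ffmpeg's concat demuxer: one "file '<path>'" line per part."""
    lines = []
    for path in part_paths:
        # Single quotes inside a path are escaped as '\'' in concat list files.
        lines.append("file '{}'".format(path.replace("'", "'\\''")))
    return "\n".join(lines) + "\n"


def concat_command(list_file, output):
    """Join the listed parts without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]


def mux_command(video, audio, output):
    """Combine one video stream and one audio stream into a single file."""
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy", "-c:a", "aac", "-shortest", output,
    ]
```

The returned lists can be passed to `subprocess.run(..., check=True)` once the list file from `build_concat_list` has been written to disk.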
