Can't get spotify_dl to continue downloading after 7k downloads #359
Comments
I can't say much, given that there's no error whatsoever. After you run into the problem, can you try running the spotify_dl command outside of the script? Maybe Spotify is rate limiting you.
I have a similar problem, so I came here to see whether it had already been discussed, but I can't find anything else. It's not much of a problem for me, but it's happening.
Is there some kind of setting for limiting Spotify API calls, for example to 1 (or 1.5) seconds between calls? It would be slower, but it would avoid that problem. Or is this script maybe requesting 100 playlist items (tracks) at once? The limit is 50 per API call (it's in Spotify's API documentation: to get the playlist items you can use "Range: 0 - 50" in one call). That's just something that came to mind as a possible cause of hitting the "API limit reached" problem so fast. The best bet is to get 50 items at a time with a 1 second wait between calls until it has them all. That way it won't reach the limit when repeated requests for playlist items happen in a short amount of time.

I haven't really looked into the code yet, so I have no idea how you are getting the playlist data (artist - track, ...), but I think something related to that is making too many requests to Spotify. With big playlists like the OP's (or even mine), it makes sense that this would happen if no wait time between requests is set in the script. And Spotify is probably the service with the harshest per-second API limits.

I know because I was recently making a script that gets radio station data (what was playing) for more than 100 stations and automatically builds Spotify playlists with songs from those stations (it reruns 10 minutes after it finishes, so I always get new tracks), so I had a lot of fun implementing API restrictions. I was doing it with the help of ChatGPT, though, so I can't really code, other than being able to read Python and understand it (probably because of years and years of Kodi use). I can change little things or tell the AI what to do in a way it will understand, and it does it. AI's great, but I still can't create code without it.

When a 429 error appears, Spotify gives a Retry-After value in the response header with the number of seconds you have to wait. I could never get my scripts to read that and wait that long, so the 1 second wait between API calls is what helped when getting the playlists and the playlist data.

Anyway, I just wanted to pitch the idea of giving Spotify's API limits a little more thought for big playlists with a huge amount of data to process, because Spotify is much harsher with limiting: after you get a 429 and retry, you wait longer. Every time you retry, you wait longer. So if you keep retrying, like I did yesterday, the keys get blocked for a long time and it's faster to just change them. If you can read the Retry-After header when the 429 first appears, you can wait that amount of time before retrying. If you can't, a 1 or 1.5 second wait between API calls should be enough.

Today I seem to have 0 errors, so the rest of the songs continued just fine. In subsequent runs I don't expect any errors, since there won't be as many new tracks because I'm using -w. To me, the bigger problem is that Spotify rate limit.
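The Retry-After handling described in the comment above could look roughly like the sketch below. It is not code from spotify_dl or spotipy: `fetch` is a hypothetical callable returning `(status_code, headers, body)`, and the helper names, the default wait, and the attempt limit are all assumptions made for illustration.

```python
import time


def retry_after_seconds(headers, default=1.0):
    """Return the Retry-After value in seconds, or `default` if it is missing or unparseable."""
    # Header names are case-insensitive, so normalise the keys before the lookup.
    lowered = {str(key).lower(): value for key, value in headers.items()}
    try:
        return max(float(lowered.get("retry-after")), 0.0)
    except (TypeError, ValueError):
        return default


def call_with_retry(fetch, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or the attempts run out.

    `fetch` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait exactly as long as the server asked before trying again.
            sleep(retry_after_seconds(headers, default_wait))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the waiting logic testable without real delays. The sketch only handles a Retry-After value given in seconds, which is what the comment describes.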
Almost... this is the error that happens :)
And now I'm stuck on getting the playlist in the new run: if I re-run it, it won't fetch the tracks. I reached the API limit. I'd have to change the keys (or wait, but I don't know for how long). Something was happening too fast for the Spotify API.
OK, I found spotify.py and did this:
It's much slower now, but it should hopefully work better with the API restrictions. I can probably lower the wait even further; I'll see what works. There could be a setting for that somewhere, somehow :) Something like -sl 0.5 (as in "spotify limit", with the wait time in seconds). I changed this too:
So currently, with time.sleep(0.7) and 50 tracks per API call (instead of the 100 it fetches by default), I was able to process all the playlist items. It took a while, but it didn't stop. I'll see if I can lower time.sleep in future runs. I suspect that fetching 100 items at once may have counted as 2 API calls each time, which could have contributed to accumulating too many API calls in a 30 second period.
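As a rough illustration of the change described above (50 items per request with a short pause between requests), here is a minimal sketch. It is not the actual edit made to spotify.py, which is not shown in the thread. `fetch_page(offset, limit)` is a hypothetical callable, and the 0.7 second delay is simply the value mentioned in the comment.

```python
import time


def fetch_all_items(fetch_page, page_size=50, delay=0.7, sleep=time.sleep):
    """Collect every item by requesting `page_size` items at a time.

    `fetch_page(offset, limit)` is a hypothetical callable that returns a list
    of items; a page shorter than `page_size` signals the end of the playlist.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            return items
        offset += page_size
        sleep(delay)  # pause before asking for the next page
```

With spotipy, `fetch_page` might wrap `sp.playlist_items(playlist_id, limit=limit, offset=offset)["items"]`, although whether spotify_dl is wired that way is an assumption here.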
By now I understand that we make an API call not only for the playlist items (which I still think should be fetched 50 per page) but also an additional call for each track's information. There are currently no limits in spotify.py, so it depends only on the spotipy defaults, as far as I understand. I just have no idea how to debug that download error. It's always the same thing.
In the end, this is what worked best. Since it crashes on this playlist every 15-20 minutes while downloading tracks, I can't make the playlist processing any faster and still be able to fetch the playlist again after a crash without being restricted by the Spotify API. The playlist has exactly 3226 songs, and I need to fetch them at least twice in an hour. That's a lot of API calls in one hour. What I did in the end was wait 15 seconds after each batch of 50 tracks is processed. So now I get 50 tracks, they are processed immediately (blazing fast), then I wait 15 seconds, then it gets the next 50, and so on. That way I can fetch all 3226 tracks again and again and again. I'm at 1761/3226 downloaded files now, and I think I can just make a bat to run it again automatically when it crashes and go to sleep. Hopefully I will wake up tomorrow with it still running, but with all the tracks downloaded :)
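The "run it again automatically when it crashes" idea can also be sketched in Python instead of a bat file. This is a minimal sketch under assumptions: it treats a non-zero exit code as a crash, and it assumes a rerun skips files that were already downloaded. The function name, run limit, and pause are all illustrative.

```python
import subprocess
import time


def run_until_success(command, max_runs=20, pause=30.0, sleep=time.sleep):
    """Re-run `command` until it exits with code 0 or `max_runs` is reached.

    Returns the number of runs it took, or None if every run failed.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return run
        if run < max_runs:
            sleep(pause)  # give the API some breathing room before restarting
    return None
```

The command would be passed as a list, for example `["spotify_dl", "-l", "<playlist url>", "-o", "<output dir>"]`, with the placeholders filled in by the user.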
Hi @ray2301, thanks for the detailed write-ups. I never thought I'd see playlists with tens of thousands of tracks, so no thought went into batch sizes or (exponential) backoffs when fetching the data. I'm not even sure what Spotify's default rate limit is.
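For reference, an exponential backoff schedule of the kind mentioned above could be generated as in the sketch below. The base delay, growth factor, cap, and optional jitter are illustrative assumptions, not values taken from spotify_dl, spotipy, or Spotify's documentation.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, jitter=0.0, rng=random.random):
    """Yield wait times base, base*factor, base*factor**2, ... limited to `cap`.

    `jitter` adds up to that fraction of extra random delay on top of each
    value, so a jittered delay can slightly exceed `cap`.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap) * (1.0 + jitter * rng())
        delay *= factor
```

A caller would sleep for each yielded value between retries and give up once the generator is exhausted.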
Oh, I think nobody knows. It's calculated from how many API calls you make in a 30-second period, so nobody really understands how it works. I did a lot of research on this, but you just can't know; you can only estimate how many API calls a specific task will make in a 30-second period and keep that number down. But if you retrieve data the way their API documentation describes (some things can be fetched in batches, with a maximum number of items that counts as 1 API call), you can make it work.

If you start making too many API calls, you get a 429 error, and the response headers should tell you how many seconds you have to wait. If you ignore the 429 and keep retrying, the wait time starts to increase, and if you're still retrying it will keep increasing forever :) When your script can read the header and implement the wait logic based on it, it can pause on a 429 error and wait before retrying. Since I could never read the "retry-after" header, I had to restrict it manually.

So the script I mentioned, the one that gets radio data from about 100 radio stations and updates my playlists (named after those stations) with new tracks, makes 2 API calls per second (it's restricted to 0.5 seconds per API call) and I never get restricted. The script runs 24/7. I found that works as expected with my flow: it just runs in loops (with a 10 minute wait before it fetches the data again for all the stations; it skips existing files) for one full month, until the playlists get cleaned automatically on the 1st of the month, when the cleaning script runs and deletes the songs from the playlists, and it all starts over. So each month I have a playlist for each radio station and I can see how many songs they actually play in a month (since duplicates are not added), and then it all gets reset on the 1st and starts again. But the script never stops running (it's a bat with many scripts that execute specific things on specific dates).

You should be able to make more than 2 API calls per second, but you need to find the correct back-off strategy. I mean, if you're going to think about it in the future. You don't have to, I'm just sharing my experience :) I'm actually having fun with this even though it can be exhausting, but I have a lot of time in my life now, so it's good for the brain :)

There was nothing I could do with your script in the end, because it was doing everything so fast and I couldn't pinpoint why or where. So I found this, which didn't really work (and still doesn't put the year and album track number in the filenames, and I can't work out how to get more data for the tags), and I started doing what I know how to do: fixing the little things I could and implementing fuzzy matching, so that tracks are at least chosen with the correct data and the smallest length difference among the best possible results. I made a mess of the normal downloader (so it doesn't work), but the precise mode works perfectly now and the results I'm getting are pretty much perfect (it even has cover art, it's just missing the year and track number in the tags, since I'm an idiot and can't understand how to get those two).

I don't know if you're interested in such an approach for your script, but have a look here to see what I mean. In the end I did find what I needed and made it work, but I made it just for me. I'm not someone who can maintain things and always pinpoint an error all by myself, so I'm leaving you this just so you can see the idea I had: find the correct track on YouTube based on the closest length to Spotify's length plus fuzzy matching, and get the best possible audio quality in the downloaded files by using specific boosters and filters. Maybe it'll come in handy if you continue to work on this. I would love to see you implement some backoffs for Spotify and some API restrictions so this flow can work even when you want to download a playlist of 3000 files :) Anyway, I'm leaving you with my ideas, and if they can make something better, good.

Now, the main reason that one works while yours runs into 429 errors on Spotify is that the script I ended up using gets 100 playlist items from Spotify, goes through them, and then downloads them from YouTube, so some time passes after the first API calls. Then it gets the next 100 items from Spotify and downloads them, and so on.
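The matching idea from the comment above (closest length to the Spotify track, combined with fuzzy title matching) might be sketched as follows. This is not the commenter's code, which is not included in the thread: the function name, the thresholds, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions.

```python
from difflib import SequenceMatcher


def pick_best_match(target_title, target_seconds, candidates,
                    min_similarity=0.6, max_length_diff=10):
    """Choose the candidate closest to the target track.

    `candidates` is a list of (title, duration_seconds) tuples.
    Returns the best tuple, or None if nothing passes both thresholds.
    """
    best, best_key = None, None
    for title, seconds in candidates:
        similarity = SequenceMatcher(None, target_title.lower(), title.lower()).ratio()
        length_diff = abs(seconds - target_seconds)
        if similarity < min_similarity or length_diff > max_length_diff:
            continue  # too different in title or in length
        # Prefer the smallest length difference, then the highest similarity.
        key = (length_diff, -similarity)
        if best_key is None or key < best_key:
            best, best_key = (title, seconds), key
    return best
```

Ranking by length difference first mirrors the description in the comment. A real downloader would probably also weigh the artist name and the audio quality.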
Describe the bug
I'm trying to run the downloader within some python code to pull a large number of mp3s for use in a data science project. The output that I'm seeing is the following:
My code looks like this:
It was working before, but after about 7k songs it stopped. Even running the command in a terminal doesn't seem to work. I'm not sure whether I got rate limited or something else is going on. Any guidance would be appreciated.
To Reproduce
spotify_dl -l https://open.spotify.com/track/0BRjO6ga9RKCKjfDqeFgWV -o ../data/mp3s/<track_id> (where the track_id is a Spotify track ID)
Expected behavior
I expected the track to download.