Investigate Youtube selector quota barriers #131

Open
breezykermo opened this issue Jan 22, 2020 · 2 comments
Labels
bug (Something isn't working), priority (Needs to be fixed urgently)

Comments

@breezykermo
Member

It's currently unclear exactly how the YouTube selector exhausts Google Cloud quotas. We assumed that using the YouTube Data API v3 (which the YouTube selector uses under the hood) would incur a quota cost of 1 unit per search.

With the daily parameter set to false, this should use only 1 quota unit per search, yet the selector often hits the limit (10k queries) within just a few minutes, even with a sleep in the code to space the requests out.

This could be due to a number of things: certain queries require paging via page tokens, or perhaps each search returns more metadata than just the video ID. Needs further investigation and a fix.
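For comparison, the YouTube Data API v3 documentation lists a quota cost of 100 units for each `search.list` call, not 1, and every additional page requested with a `pageToken` is a separate call. The sketch below gives a rough estimate under those costs. The function names are hypothetical, and it assumes (not verified against the selector's code) that `daily=True` issues one search per day in the range while `daily=False` issues a single search over the whole range. It ignores any other endpoints the selector may call.

```python
# Rough quota estimate for YouTube Data API v3 search.list usage.
# Hypothetical helpers; not part of mtriage.

SEARCH_LIST_COST = 100        # documented quota units per search.list call
DEFAULT_DAILY_QUOTA = 10_000  # default daily quota for a Cloud project


def estimate_search_units(num_queries: int, pages_per_search: int,
                          days_in_range: int, daily: bool) -> int:
    """Units consumed if each result page is one search.list call.

    Assumes daily=True means one search per day in the range and
    daily=False means one search covering the whole range.
    """
    searches_per_query = days_in_range if daily else 1
    calls = num_queries * searches_per_query * pages_per_search
    return calls * SEARCH_LIST_COST


def searches_before_exhaustion(quota: int = DEFAULT_DAILY_QUOTA) -> int:
    """How many search.list calls fit in the given quota."""
    return quota // SEARCH_LIST_COST
```

Under these assumptions the default 10,000-unit daily quota covers only 100 `search.list` calls, so a handful of queries that each page through many result pages, or a daily split over a long date range, would use it up quickly.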

@breezykermo added the bug and priority labels Jan 22, 2020
@breezykermo
Member Author

breezykermo commented Jan 28, 2020

Note: we should also make clear in the selector documentation what the quota limit is and how the selector exhausts it. Ash's message from Discord:

"I tracked the number of pages I was pulling from, the number of videos I had fetched, and the quota cost I incurred. I had 5 search queries, range was across the past 3 years, set daily to false, set the limit on the number of pages to 50 (though most of the time, there were max 15 pages worth of data) and ended up with about 500-600 videos for each query. The quota usage (queries per day) this incurred was 7446. This is larger than the number of videos and much larger than the number of pages but it didn't hit the 10k limit so that's good.
Also, the other change I made was adding a sleep counter between successive queries so as to not breach the other quota limit: queries per 100 seconds per user. Perhaps a future version of mtriage could also have some sort of throttling functionality so users don't hit this limit."
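Ash's suggestion of built-in throttling could take the form of a sliding-window limiter that blocks before each request until it fits within the per-100-seconds limit. This is a minimal sketch assuming a single-threaded selector; `SlidingWindowThrottle` and whatever `max_calls` value is passed to it are hypothetical, and the real limit should be read from the project's quota page in the Google Cloud console.

```python
import time
from collections import deque
from typing import Callable, Deque

# Hypothetical helper; not part of mtriage.


class SlidingWindowThrottle:
    """Blocks until another call fits within max_calls per window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float = 100.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Call immediately before each API request."""
        while True:
            now = self._clock()
            # Drop timestamps that have fallen out of the window.
            while self._calls and now - self._calls[0] >= self.window_seconds:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.window_seconds - (now - self._calls[0]))
```

Calling `throttle.wait()` right before each `search.list` request would replace a fixed sleep, pausing only when the window is actually full. The `clock` and `sleep` parameters are injectable so the limiter can be tested without real waiting.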

@breezykermo
Member Author

We should also note that results are always subject to YouTube's search algorithm, which determines what does and does not appear. Exhaustive searches using mtriage at scale could be flagged, which could in turn alter the results.

@breezykermo breezykermo changed the title Investigate Youtube selector quota barriers Investigate Youtube selector quota barriers, and fix auth Feb 15, 2020
@breezykermo breezykermo changed the title Investigate Youtube selector quota barriers, and fix auth Investigate Youtube selector quota barriers Mar 16, 2020