[DRAFT]: Make auto updates rate-limit aware #22994

Draft
wants to merge 1 commit into
base: master

Conversation

TomJo2000
Copy link
Member

@TomJo2000 TomJo2000 commented Jan 21, 2025

This PR aims to make the auto-update workflow rate-limit aware, in order to avoid future shadowbans of @termuxbot2 for just doing its job.

In its current state it only implements basic debugging functionality by printing the .resources.core table of the rate-limit query API response, which contains the limit (int), used (int), remaining (int) and reset (Unix epoch) keys.
[screenshot: rate-limit debug output]
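For reference, a minimal sketch of that query and of extracting the .resources.core fields. The endpoint and field names come from the GitHub REST API; the response below is canned so the sketch runs offline, and a real script would parse it with jq rather than sed:

```shell
# Sketch: query GET /rate_limit and pull out the .resources.core fields.
# The live call would be (needs GITHUB_TOKEN):
#   resp="$(curl -fsS -H "Authorization: Bearer $GITHUB_TOKEN" \
#       -H "Accept: application/vnd.github+json" \
#       https://api.github.com/rate_limit)"
# Canned response, so the sketch runs without a token or network:
resp='{"resources":{"core":{"limit":5000,"used":820,"remaining":4180,"reset":1737500000}}}'

# sed is enough for this flat one-line example; use jq in a real script.
remaining="$(printf '%s' "$resp" | sed -n 's/.*"remaining":\([0-9]*\).*/\1/p')"
reset="$(printf '%s' "$resp" | sed -n 's/.*"reset":\([0-9]*\).*/\1/p')"

echo "core: remaining=$remaining, resets at epoch $reset"
```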

For local testing I have generated a fine-grained access token for GitHub, which I am sourcing from gh.env in the root directory of the git repo.
(This token only has read permission for the rate-limit API endpoint; I may need to replace it for further testing.)

# file: gh.env
export GITHUB_TOKEN='<your_github_token_here>'

The token is obviously passed in as part of the environment in CI, so this file exists only for local testing.
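The local fallback could be sketched like this. It is demonstrated in a temp directory with a dummy token so nothing real is involved; the only assumption is a gh.env shaped like the snippet above:

```shell
# Demo in a temp dir: create a gh.env like the one above, then source it
# only when GITHUB_TOKEN is not already set (in CI the token comes from
# the workflow environment, so the file is never needed there).
tmp="$(mktemp -d)"
cd "$tmp" || exit 1
printf "export GITHUB_TOKEN='dummy_local_token'\n" > gh.env

unset GITHUB_TOKEN
if [ -z "${GITHUB_TOKEN:-}" ] && [ -f ./gh.env ]; then
    # shellcheck disable=SC1091
    . ./gh.env
fi
echo "GITHUB_TOKEN is ${GITHUB_TOKEN:+set}"
```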
Speaking of which, my local testing setup consists of running:

./scripts/run-docker.sh ./scripts/bin/update-packages '@all'

Which simulates a full auto-update run.
(It also eats through the rate limit quite fast, which is why I'm opening this PR now: I need to wait an hour before I can do more testing.)

@twaik
Copy link
Member

twaik commented Jan 21, 2025

  1. I'm pretty sure we should not spam the logs with rate-limit status. Most of the rate-limit-consuming work happens in _fetch_and_cache_tags, which fires a lot of simultaneous requests to GitHub (to save some CI time by avoiding requesting tags one by one), and not every package invokes the GitHub REST API. I see at least two ways to optimize it:
    a. Moving the rate-limit check into termux_github_api_get_tag.
    b. Invoking the rate-limit check every 50 packages.
  2. termux_repology_api_get_latest_version creates the repology metadata file on its first invocation and simply reuses it afterwards. We could do something similar for rate limits: cache the used field of the GitHub rate-limit reply and decrement it on every GitHub request. The counter would then be used to report null (or the current version of the package) once ~50 requests remain. AFAIK null makes the tag checker report the current version as latest without throwing errors.
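Suggestion 2 could be sketched roughly like this. The function and file names are hypothetical stand-ins, not the real helpers from this repo, and the initial budget is hard-coded so the sketch runs offline:

```shell
# Cache a rate-limit budget, decrement it on every GitHub request, and
# report "null" once ~50 requests remain, so the tag checker falls back
# to the current version instead of erroring out.
RATELIMIT_CACHE="$(mktemp)"

_ratelimit_init() {
    # Real script: seed this from GET /rate_limit (.resources.core.remaining).
    [ -s "$RATELIMIT_CACHE" ] || echo 4180 > "$RATELIMIT_CACHE"
}

_ratelimit_take() {
    local remaining
    remaining="$(cat "$RATELIMIT_CACHE")"
    [ "$remaining" -le 50 ] && return 1   # budget exhausted
    echo $((remaining - 1)) > "$RATELIMIT_CACHE"
}

get_latest_tag() {   # hypothetical stand-in for termux_github_api_get_tag
    if _ratelimit_take; then
        echo "v1.2.3"   # placeholder for the real API response
    else
        echo "null"     # checker then treats the current version as latest
    fi
}

_ratelimit_init
get_latest_tag
```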

@TomJo2000
Copy link
Member Author

I know we don't want to spam the logs. I'm not suggesting this is ready to merge in its current state; it's an initial draft.

If you have any code suggestions, feel free to put them in here or push them to my branch; or I can move the branch over to this repository if that's easier in terms of workflow.

@twaik
Copy link
Member

twaik commented Jan 21, 2025

I can try this a bit later, currently I should go to sleep.

@TomJo2000
Copy link
Member Author

Yeah I'm about to as well, I just wanted to push this before I do.

@TomJo2000
Copy link
Member Author

Alright, I did a full run now that the limit has reset for me, and it looks like a full auto-update cycle takes ~820 API calls.

@TomJo2000
Copy link
Member Author

TomJo2000 commented Jan 21, 2025

That limit is shared globally between all CI runs, and as such also resets globally for all CI runs.
We could probably make it idle, or reschedule for after the next reset, if there isn't enough API limit remaining in the current time window.

(That limit being 5000 per hour)
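The idle-until-reset idea might look like this. The values are illustrative; a real run would take remaining and reset from the rate-limit response discussed above:

```shell
# If no budget remains, sleep until the reported reset epoch (plus a
# one-second margin) instead of failing mid-run.
remaining=0
reset=$(( $(date +%s) + 1 ))   # pretend the window resets in 1 second

if [ "$remaining" -eq 0 ]; then
    now="$(date +%s)"
    wait_s=$(( reset - now + 1 ))
    if [ "$wait_s" -gt 0 ]; then
        echo "rate limit exhausted, sleeping ${wait_s}s until reset"
        sleep "$wait_s"
    fi
fi
echo "resuming auto-update run"
```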

@@ -183,12 +186,12 @@ _fetch_and_cache_tags() {
echo "build run_all: phony ${__PACKAGES[@]}"
echo "default run_all"
}

local LATEST_TAGS="$(\
F="$(declare -p TERMUX_SCRIPTDIR GITHUB_TOKEN TERMUX_REPOLOGY_DATA_FILE); $(declare -f __main__ | sed 's/__main__ ()//')" \
Copy link
Member


I think the main reason we are getting rate limited is that we aren't slowing down our API requests. See https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#about-secondary-rate-limits; maybe we should actually fix that rather than using such workarounds.
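Per that documentation, secondary limits are triggered by concurrency and burst rate, so the fix would be pacing requests rather than firing them all at once. A minimal sketch, with the actual request stubbed out so it runs offline (a real version would also honor the retry-after header on 403/429 responses):

```shell
# Serialize GitHub requests with a minimum gap between them, per the
# secondary-rate-limit guidance linked above.
MIN_GAP=1          # seconds between requests
stamp="$(mktemp)"  # records the time of the last request

throttled_request() {
    local now last
    last="$(cat "$stamp")"; last="${last:-0}"
    now="$(date +%s)"
    if [ $((now - last)) -lt "$MIN_GAP" ]; then
        sleep "$MIN_GAP"
    fi
    date +%s > "$stamp"
    echo "requested tags for: $1"   # placeholder for the real curl call
}

t0="$(date +%s)"
throttled_request termux/termux-packages
throttled_request termux/termux-app
t1="$(date +%s)"
echo "elapsed: $((t1 - t0))s"
```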
