-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make bot-share proportional to "Awaiting PR" #2612
Comments
Yeah I've messed with this a lot over the years. I've definitely made things proportionate to waiting prs and that was a problem for other reasons. (The issue is that new migrations take over and block things if you make the share proportional to the number of waiting prs.) Also, the bot ramps up migrations automatically so if you've just merged or restarted one, you might wait for about six to ten hours to see what happens. I'm honestly reluctant to adjust things now but open to ideas. Most of the time nobody notices which is basically the point. |
One more comment is that we used to not retry old migrations very much and then we got complaints that once something was fixed, the bot didn't get back to it fast enough. It's kind of a crap shoot no matter what. |
Yeah, I get that optimizing for all cases is very tricky. Which is why I'm so adamant to at least not waste the scarce cycles we have on stuff that will 100% fail (#2613). |
One thing that would IMO be quite an easy win would be to open PRs for "not solvable" feedstocks after a given time or number of retries. That way, it gets out of the bot retry queue, and also gets it to the attention of the maintainers (rather than having to go dig into the status page entrails). To me that'd be a double-win. The same reasoning can probably be applied to "bot error" as well. |
Yes this last thing we should do for sure. There are some edge cases around bot automerge but those are easy to handle. |
Just realized that if we do that, it automatically also solves #2613 for free 🙃 |
Just noticed the following in the most recent bot-bot run:
I don't know how the bot shares are calculated exactly, but that to me seems really wasteful, and also not very responsive to new migrations. In particular, snappy120 is an old migration where only one feedstock doesn't have a PR yet ("Not solvable" for tensorflow), and yet it receives about 2/3rds of the resources of the numpy2 migration, where we have 300 PRs awaiting action.
Also, old migrations like python312 constantly hogging 20% of our resources despite very little movement seems like a waste to me as well. I'm not sure how easy it is to square all circles here (e.g. maybe need to include migrator age), but if we were using a score per migration à la1:
to determine bot-share, then we'd be IMO:
Within a migration I wouldn't want to deprioritize the error cases though, IMO the choice within a migrator should be driven by the total number of children - what I want to avoid is that an early failure of a big blocker doesn't get retried until all the other small-fry feedstocks get through the queue.
CC @beckermr
Footnotes
|{...}|
is the cardinality of the set, i.e. how many feedstocks per status ↩The text was updated successfully, but these errors were encountered: