Pulsar's CI Roadmap #685
Great summary!

macOS: The easiest solution for macOS is to just provide Intel binaries, and have Apple Silicon users take whatever performance hit there may be from running x86 binaries under Rosetta 2 translation. [EDIT: Not great, but free. So we can take this option (with its inherent cost/performance trade-off) and keep it in mind going forward.]

ARM Linux: Since ARM Linux is fairly cheap on Cirrus, we could continue building those on Cirrus. This could be an "also-supported" but not primary tier of binary release support, to give fair warning that, as the only thing left on Cirrus, and with fewer end-users, we might not immediately notice if it stops working, and it will be somewhat of a chore to maintain a second CI config for few end-users. Or... maybe it will be so easy to keep doing ARM Linux on Cirrus that we just don't think twice about it and it remains "tier-1" support. Time and experience will tell...

Cirrus Pricing: Also, I need to confirm the math I did pricing out Cirrus. They said something about much-reduced effective pricing for macOS and Windows, but I suspect the graph I consulted may not have been adjusted for those pricing changes that were said to have just started taking effect as of August 1.
EDIT AGAIN: Okay, looks like this is indeed the updated pricing. I found a Web Archive snapshot of the pricing page from July, the month just before the change took effect August 1: https://web.archive.org/web/20230726163213/https://cirrus-ci.org/pricing/#compute-credits

Cirrus CI Compute Credit pricing BEFORE (July 2023):
Cirrus CI Compute Credit pricing AFTER (August 2023):
Best,
I think we can mitigate this a bit: it's certainly too onerous to ask somebody to build a new Silicon binary with each rolling release, but it can at least be part of the release checklist to have one of our Apple Silicon–havers build Pulsar manually once a month for the regular releases. That said, I think that if Cirrus can give us a straight answer on the SSL stuff, or we can figure out a way not to have our CI need to download `ripgrep`, this becomes much more workable. We could even throttle Silicon usage on Cirrus to run less often than with every PR — maybe once a day, or only when PRs land, or something. I feel like this would be less bad than most of the options we've been discussing.
How many Apple Silicon macOS runs we could do in 50 credits-equivalent on Cirrus

If we use Cirrus strictly for Apple Silicon macOS builds, we can get up to 833.333[...] minutes of macOS builds per month within the free 50 credits. Our typical Apple Silicon task run lengths (and how many of those we could do) are:
How many real-world Apple Silicon CI run minutes can we do per month? Showing my work:

The essential numbers this is using, with sources:
The math:
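In outline, presumably (inferring the per-CPU-minute rate from the 0.06 credits-per-minute figure derived later in this thread, not a number from the original):

```
macOS tasks run on 4 CPUs (non-configurable for public repos/free usage)
inferred rate: 0.015 credits per macOS CPU-minute
  4 CPUs × 0.015 credits/CPU-minute = 0.06 credits per real-world minute

50 free credits ÷ 0.06 credits/minute = 833.333... real-world macOS minutes/month
```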
Table of our typical Apple Silicon macOS task durations:
Our typical Apple Silicon task run lengths (and how many of those we could do, given the 833.33 real-world minutes of free macOS runs per month) are:
So, I estimate we could do anywhere from 31-58 macOS runs per month (including re-runs for failures, wasted or mistakenly scheduled runs, etc.), and that ~30 is a safer, conservative estimate.

Conclusion: That means we could run one just about daily, but not necessarily one literally every day, keeping in mind that this would leave no run-time for any ARM Linux builds, separate "Regular" version-number builds, or failed/mistakenly scheduled/otherwise "wasted" runs. So, IMO, we should look at ~30 and say we can't do it more often than every couple of days, every few days, something like that at most. And note that our ARM Linux usage could be quite a bit less, though I haven't run all the math on that yet.

P.S. (edit to add): Fun fact, the "Apple Silicon" builds take (really roughly) half the time of the Intel ones. I suppose emulating x86 ain't cheap. SO: that means the Apple Silicon builds represent only about a third of our existing macOS usage on Cirrus. Which, IMO, is fantastic, given that we can get Intel macOS runs elsewhere.
With @DeeDeeG's amazing breakdown of the pricing, and the consideration of how often we could hypothetically be packaging builds for both Silicon and ARM, I thought it'd be a good idea to take a look at what sort of impact limiting the number of rolling releases could have. Since rolling releases are served by our `download` microservice, we have download statistics to work from.

Rolling Release Download count by Platform over the past 30 days
This means out of a total of 5,513 rolling release downloads over the last 30 days:
So all of this is to say: it seems that we should under no circumstances consider stopping all Apple Silicon rolling release builds. (Also, I wouldn't suggest we stop Apple Silicon regular releases either, considering how large a share of our macOS downloads Apple Silicon makes up.)

As for ARM Linux, while I would hate to stop those rolling releases, and it's cheaper to keep producing them, I could see the argument being made to discontinue them, although I don't think we will have to. But I just thought I'd provide some insight into what kind of impact changes to the rolling release schedule might pose for our users.

Although, I do think a smart move would be to not generate these binaries for every PR, and instead generate them only, say, on commits to the `master` branch.
For the record, the macOS builds have been unaffected by the SSL errors, IIRC. (That's been ARM Linux and Windows.) The ripgrep thing could affect us on any CI, but GitHub Actions having its own dedicated API token makes it simpler to keep those requests "fed" with API access. Still, a team member with low usage against the GitHub API could make a permissionless, non-expiring token and we'd basically be done with ripgrep issues, like 98+% of the time, knock on wood.
As sad as it would be to think of ARM Linux as a "lower-priority" platform, it is our least populous category of users. That said, if we are smart and think ahead, I think we can leave enough CPU-hours of Cirrus for ARM Linux, especially since each CPU-minute of ARM Linux is a literal fraction of the cost of macOS, and we can scale down the CPUs to either 1 or 2 for max efficiency, if needed.

Side note about the "greedy" config option for Cirrus:

(Also, tiny note: if using Cirrus, we should turn the "greedy" option on. They only bill you for the CPU-minutes you specify you need, but extra free CPUs in the data center, if available, can speed up your run at no extra credit cost to you, making the duration of the run shorter, and maybe reducing your billable CPU-minutes, since the minimum (i.e. billable) CPUs you requested, times run duration, goes down. This config option only applies to Linux and Windows, though, and their allocation of ARM Linux CPUs seems to be tighter than the x86 ones, so it might make no difference... Hmm. Just a thought.)

I wanna do the math on how many credits ARM Linux takes, cuz I suspect it's peanuts. The SSL thing will make it comparatively annoying and risky to run, though, so we are going to have to make sure re-runs of ARM Linux don't eat up the credits and compromise our ability to get Apple Silicon macOS binaries out there. IMO.
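For reference, a minimal sketch of what turning that on might look like in a `.cirrus.yml` task; the image, CPU, and memory values here are illustrative assumptions, not our actual config:

```yaml
task:
  name: build-arm-linux
  arm_container:
    image: node:16-slim   # hypothetical base image
    cpu: 2                # the billable CPUs we request
    memory: 8G
    greedy: true          # may borrow spare data-center CPUs at no extra credit cost
  build_script: yarn build
```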
So it seems, as of now, the general consensus is to just use CirrusCI to generate our ARM binaries, possibly only generating them on commits to the master branch, which should only occur after a PR is merged; that really is the point of these binaries. That does mean we won't be running our visual tests on them, but that doesn't worry me too much, as I'd be surprised (but knock on wood) to find an error only affecting ARM.

As for Apple Silicon, I'm still worried about running it on CirrusCI, just because the cost is so high if we do go over the free limit. Although, if the math works out like stated above, do we think we could get away with a rolling release for Apple Silicon on Cirrus once every two days? Running it at midnight every other day? That way, even assuming every run is as long as our longest recorded run, we won't go above our free usage limits, and it still allows some free runs, say, for manually invoking a run for our regular releases. This again would mean we no longer generate them for PRs, or even on every commit to master. That does make testing slightly more difficult, but that may just have to be the price we pay for these more unique and uncommon platforms.
I say we don't give Cirrus billing info, such that it just turns off the CI spigot if we go over, rather than charging us. (Unless we wanna support them at, say, a fixed $10-a-month tier or something with an upper bound, just out of appreciation and support for what they're doing, etc... EDIT: We would discuss this before doing it, not just offhand since I mentioned it, obviously.)

And yeah, I agree with the direction things seem to be going: toward running some ARM macOS and some ARM Linux on Cirrus.

TODO: I wanna know how much the typical ARM Linux + Apple Silicon macOS build would cost. Then calculate how many of those we can safely get away with, and, using a conservative estimate, fly safely within the limits. Keeping in mind we have to play it really safe, or we miss out on being able to build a Regular release in a timely manner. EDIT to add: And the variability with the SSL errors means, to play it safe, we have to take the most conservative approach to (at least estimating and planning for) those re-runs. Maybe we just don't re-run most of the time for Rolling, since it's so low-priority and there'll be another in a few days?

Math for the ARM Linux runs pending after I eat something, I guess. (Unless someone gets to it before me; I showed my work, and anyone with admin for this repo can open the edit view for my comment to copy-paste the markdown table and just work from there.)

EDIT ALSO TO ADD: I suppose we need to rewrite the download microservice to return GitHub Releases binaries for those, if we get that set up. Cuz they won't be fetchable from Cirrus, at this rate.
How many ARM Linux runs we could do in 50 credits-equivalent on Cirrus

If we used Cirrus strictly for [ARM] Linux builds, we could get up to 8333.333[...] minutes of Linux builds per month within the free 50 credits. (Not a typo, by the way; that's fully 10x the amount of real-world macOS minutes we could get with the same 50-credit-equivalent allowance, if I'm doing the math right.) Our typical ARM Linux task run lengths (and how many of those we could do) are:
How many real-world ARM Linux run minutes can we do per month? Showing my work:

The essential numbers this is using, with sources:
The math:
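Presumably this mirrored the macOS calculation (the per-CPU-minute rate here is my inference from the 0.006 credits-per-minute figure derived later, not a number from the original):

```
ARM Linux tasks run on 2 CPUs (our requested minimum)
inferred rate: 0.003 credits per Linux CPU-minute
  2 CPUs × 0.003 credits/CPU-minute = 0.006 credits per real-world minute

50 free credits ÷ 0.006 credits/minute = 8333.333... real-world Linux minutes/month
```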
Table of our typical ARM Linux task durations:
Our typical ARM Linux task run lengths (and how many of those we could do, given the 8333.33 real-world minutes of free Linux runs per month) are:
Discussion (in summary: not super useful as-is, but combined with the macOS data it can tell us how much of each we can do, combined; that math coming soon?):

We are definitely looking into running Apple Silicon macOS builds as a first priority, and seeing about how much ARM Linux we can do in parallel. As such, these numbers aren't very useful on their own. (We wouldn't actually consider using Cirrus only for ARM Linux builds at the moment.) But it's part of the way there. One more round of math and I can see how much our runs cost in units of credit-equivalent cost: a decimal amount of credits for a given build consisting of the "ARM Linux task" usage plus the "Apple Silicon macOS task" usage.

MORE MATH INCOMING, as I get to it; hopefully I'm going to be able to work on it right now, if no interruptions from outside stuff.
Costs for "ARM Linux" + "Apple Silicon macOS" combined, on Cirrus, and how many builds of both we can do each month

Math: How many credits per minute, for ARM Linux and Apple Silicon macOS task execution, respectively? Showing my work:

Using the calculations above, we can derive how many credits-equivalent it costs per minute of ARM Linux task execution, and how many credits-equivalent it costs per minute of Apple Silicon macOS task execution.

How many credits-equivalent per minute of ARM Linux execution? (Assuming our requirement of 2 CPUs minimum): 50 credits ÷ 8333.3333[...] minutes we can run of real-world ARM Linux tasks in total per 50 credits = 0.006 credits per minute of ARM Linux task execution.

How many credits-equivalent per minute of Apple Silicon macOS execution? (Assuming non-configurable 4 CPUs for public repos/free usage): 50 credits ÷ 833.3333[...] minutes we can run of real-world Apple Silicon macOS tasks in total per 50 credits = 0.06 credits per minute of Apple Silicon macOS task execution.

Answer:
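Restating those two rates, and combining them into a per-build cost formula (one ARM Linux task plus one Apple Silicon macOS task per build):

```
ARM Linux:           0.006 credits per minute of task execution
Apple Silicon macOS: 0.06  credits per minute of task execution

cost per build   ≈ 0.06 × (macOS minutes) + 0.006 × (ARM Linux minutes)
builds per month ≈ 50 credits ÷ cost per build
```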
Table of credit-equivalent usage/costs for the 12 recent successful builds I collected data for (Apple Silicon macOS + ARM Linux tasks):
How many credits do we typically use for these builds, and how many of these builds could we do, given one Apple Silicon macOS task and one ARM Linux task (no re-runs, no wasted or unintended runs, no "do-overs")?
Projecting alternate scenarios, with either more or less frequent ARM Linux builds, to see if that can help us tailor our credit budget, given the higher priority placed on macOS builds being completed:

Double the ARM Linux usage (very rough simulation of having to re-run every ARM Linux task one time, on account of SSL errors):

Table:
So, when doubling ARM Linux cost (rough simulation of re-running every ARM Linux task once for SSL errors, worst-case scenario-ish):
You could say this knocks three to seven builds off the number we can comfortably predict we can run each month. I feel it's good to lean conservative here, especially for the first month -- once you're out of free credits, you're out of them. No more free builds; we would have to add some credits and start paying. (Although if we are close to the free 50-credit-equivalent usage, it wouldn't be a ton of cost.)

Half the ARM Linux usage (very rough simulation of just running relatively fewer ARM Linux builds, so as to save credits):

Table:
So, when halving ARM Linux cost (rough simulation of just running relatively fewer ARM Linux builds, so as to save credits):
You could say that if we run ARM Linux tasks half as often as macOS tasks, that could save us... enough credits for one to four more macOS builds. Not really worth it, you could argue. But I suppose it can save us a rather small amount of credits in a pinch.

If we go further than "half as often", and instead run almost zero ARM Linux builds -- such as only offering ARM Linux builds for the Regular releases -- it could save us enough credits for (asymptotically approaching, that is to say almost) three to ten more macOS builds per month, depending on how optimistically you project it out, based on my data analyses above. (Which is all pretty rough and approximate; you can't predict the future, only try to infer trends from the past.)
@DeeDeeG Amazing write-up on all of this, and it's really good to be able to land on a final number based on our current trends. From the above, it seems we should plan for a maximum credit cost per build, which still works for the earlier idea of running once every other day. That should give us about 15 builds per month, which leaves a very healthy buffer for the retries needed for something like a Regular release, or some other circumstance that'd cause us to run these more than once every other day. So seriously, thanks for all this work!

I suppose this leaves one thing left to do: act on these plans. As it looks now, here's the full list of everything we must do (borrowed from DeeDeeG on Discord):
This is quite a bit to tackle in the few short days we have, but luckily we already have a head start. If anybody would like to take ownership of any of the following tasks, please feel free to comment here, and let's see if we can accomplish this all in time to ensure the end users of rolling releases for these platforms never notice anything changed.
I found the docs for scheduled (cron-style) Cirrus runs: https://cirrus-ci.org/guide/writing-tasks/#cron-builds
So I'll work toward that; it should be one of the easier parts of all this. Also, disabling the non-ARM tasks in the Cirrus config.
Ahh, thanks for finding this. Some parts of those docs led me to believe that having to check for certain items, such as PR labels, would still count toward our usage, but you seem to be correct that it won't. So yes, working out the cron job we want to run for these builds is a must, and we should probably build in the functionality of publishing these artifacts.

Speaking of publishing rolling release artifacts: I know we've discussed this before, and I've looked at Atom's old nightly repo, but it seems the nightly repo just consisted of tags with source-code tars. Would you think uploading the binaries to a tag of the rolling release build number is the best method to do this (see the sketch below)? Or do we just add them to a repository itself, with each new binary for each platform replacing the old one, so that there's only ever a single binary for each platform? (The latter option above would make the `download` microservice's job simpler.)
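For the tag-based option, a rough sketch of what the publishing step could look like as a GitHub Actions step using the GitHub CLI; the repository name, tag format, secret name, and paths here are all hypothetical, not settled decisions:

```yaml
- name: Publish rolling release binaries
  env:
    GH_TOKEN: ${{ secrets.ROLLING_RELEASE_TOKEN }}  # hypothetical secret name
  run: |
    # Create a release on a tag named after the rolling build number,
    # attaching every binary we just built as release assets.
    gh release create "rolling-${ROLLING_VERSION}" ./binaries/* \
      --repo pulsar-edit/pulsar-rolling-releases \
      --title "Rolling ${ROLLING_VERSION}" \
      --notes "Automated rolling release"
```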
A proposed Cron schedule for Cirrus
I was looking into this. I propose "15:00 UTC, Mon/Wed/Fri every week"; as a cron expression, see the sketch below. (This handy thing linked from the Cirrus docs helped pick/format the expression: https://www.freeformatter.com/cron-expression-generator-quartz.html)

Thoughts on/rationale as to why this cron expression:

The constraints I had in mind were "should run twice or three times a week" (meaning at most 14 times a month, which should be within our budget per the above cost estimates), and "should happen on weekdays, at a time when most people on the team are likely to be awake", so that if a build goes out and there's any issue, we can act on it, rather than us all being asleep/busy when it goes out. (Also, I wanted something predictable: not something that acts drastically differently for a shorter month like February, or that skips around days of the week like "every three days" would.)

Monday/Wednesday/Friday takes care of "three times a week" and "always on weekdays" (while being quite predictable and orderly). Then there's still the time of day... So I looked at time zones for the Americas plus the UK, and plugged those time zones into an international meeting planner web app: https://www.timeanddate.com/worldclock/meetingtime.html?iso=20230828&p1=137&p2=179&p3=233&p4=136

15:00 UTC is during "business hours" for all four time zones I looked at, and if I figured this correctly, it is the only time of day that's business hours during both Daylight Time and Standard Time for all those time zones, maybe. So it should remain "business hours" all year, not just in Daylight Time, as it is in three out of those four time zones right now, IIUC.
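For the record, a Quartz-style expression for "15:00 UTC on Mon/Wed/Fri" per the generator linked above would presumably look like the following (fields: seconds, minutes, hours, day-of-month, month, day-of-week); worth double-checking against whatever format Cirrus actually expects in its cron settings:

```
0 0 15 ? * MON,WED,FRI
```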
I've now gone ahead and updated my GitHub Actions PR for the following:
Posting this checklist for locking down Cirrus from the Discord, and then really going to sleep. Something like:
And ideally we have GitHub Actions in place to replace all the stuff Cirrus _was_ doing for most PRs and branch pushes, which will make the transition of most things off of Cirrus feel complete and whole, not just disabled.
@DeeDeeG As for point 4 here, that may not be necessary? Since if we only allow runs by people who have write permission already, then I'd argue they should be allowed to easily test their changes, and I would hope they are aware of the effects changing the config will have. If we really want to, though, I believe we can disallow Cirrus from running if a certain file or path has been changed, including its own config.
@confused-Techie I suppose it was to save us from ourselves, but you do have a point that it would really be team members using the credits. We would have to ensure every team member with write access is as serious about preserving those credits as we are, having been neck-deep in the pricing data like this; otherwise, I think that pitfall should have a lid installed over it that we can temporarily remove if we need to tweak CI. Anyone with write access should be able to log into Cirrus to remove the restriction... Better safe than out of credits, IMO? Open to more feedback on it, but that's my leaning.

EDIT to add: there would already need to be some intervention to get CI tweaks to run, since PRs won't be running on Cirrus at all anymore. So, hmm. Just layers of defense-in-depth against exhausting our credits. Maybe it's one too many layers and just makes things cumbersome. I'll let this one percolate in my head a bit... This is a niche and mostly hypothetical concern, though, given the plans to disable PR runs in our Cirrus setup soon.
Alright, time to put an update out on progress here, since at this point there are quite a few moving parts that seem to be about to converge (which is a good thing).

What is needed to actually switch:

Let's first lay out our plan of action:
With all that said, it seems we are nearly complete, the last things to do being to limit Cirrus runs, get the Apple signing secrets into GitHub, and finally test. Please let me know if any further clarification is needed, but to further highlight the last tasks that need to be done:
I have added the codesigning secrets to GHA now >w< |
Seems the last thing to do is to limit our Cirrus runs; then we can finally test how things look.
I've also gone ahead and added limits to when the Cirrus script will run, using: only_if: $CIRRUS_CRON != "" || $CIRRUS_TAG == "regular_release" This should mean that the Cirrus scripts will only run if triggered by a cron job (which, again, will fire every two days), or if the run was triggered by a git tag named `regular_release`.

We will just need to create this tag and begin using it for all the regular-release builds. Only members with write access can push tags, so we don't have to worry about drive-by contributors using it, but we must all understand that use of this tag will trigger Cirrus, and if we aren't careful it will use our very limited Cirrus credits; a lack of credits could hold back rolling releases, or even regular releases of ARM and Silicon binaries, if misused.
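For context, a minimal sketch of how such a guard sits on a task in `.cirrus.yml`; the task name, instance image, and build script are illustrative assumptions:

```yaml
task:
  name: build-apple-silicon
  # Run only for cron-triggered builds, or builds of the regular_release tag.
  only_if: $CIRRUS_CRON != "" || $CIRRUS_TAG == "regular_release"
  macos_instance:
    image: ghcr.io/cirruslabs/macos-ventura-base:latest  # hypothetical image
  build_script: yarn build
```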
One more to do:
Have you checked for existing feature requests?
Summary
As some of you may know, CirrusCI (which is the service we use to build all of our Pulsar binaries, including rolling and regular releases) will soon be slashing the features of their free tier significantly. While we aren't totally opposed to paying for their services, the issue is that, with our current usage, once these changes go into effect at the end of the month, it'll cost the Pulsar org ~$350 USD per month.
This is more than it costs to run the backend each month, by a pretty large margin, making it untenable for us to reasonably charge the Open Collective account for this. So we have to find some other way to create our binaries.
While there have been several ideas of what we can do floating around:
There has been significant discussion about this over on Discord, and it seems a popular solution is moving as much as possible to GitHub Actions.
And as a do-ocracy, I've elected to attempt this transition over on #682.
This transition is working pretty well: we're able to run visual tests with just as much success as we previously had on CirrusCI, we get cleaner retry capabilities, and of course we're able to build binaries just as we would expect.
There are still several kinks to work out, though, so to track what we know of, I'll compile them in this issue to ensure proper traceability of problems and solutions:
Platform Support
GitHub Actions only supports `windows`, `macos`, and `ubuntu` runners, meaning we have no native way to build Linux ARM binaries or Apple Silicon binaries. (Although it may be worth mentioning that Apple Silicon is on the GitHub Actions roadmap.) So we need to figure out support for:
Possible Platform Support Solutions
We could decide to still build only these binaries on CirrusCI, especially since CirrusCI supports GitHub Actions runners on all their supported platforms, which means it would integrate very nicely into our new build pipeline. But it's important to mention that, with their pricing changes, last month (a lower-usage month for us) would still have cost us ~$305 USD for Apple platforms. Sure, a large share of that would be removed since we wouldn't be building Intel binaries there, but that's still a possible cost of ~$115 USD just for Apple Silicon builds every month on CirrusCI.
So very likely, another solution must be found.
One possible solution that's been discussed: considering the cost of Apple Silicon cloud environments, it may be worth purchasing a low-spec, second-hand Apple Silicon machine. While pricey at around ~$400 USD, the purchase would start paying for itself after only a few months, and we could use the machine as a Pulsar-dedicated GitHub self-hosted runner. The same may be possible with a Linux ARM machine.
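For what it's worth, pointing a workflow at such a machine would be straightforward; a sketch using GitHub's default self-hosted runner labels (the job name and build steps are illustrative assumptions):

```yaml
jobs:
  build-macos-arm64:
    # Routes the job to our own Apple Silicon box registered as a runner.
    runs-on: [self-hosted, macOS, ARM64]
    steps:
      - uses: actions/checkout@v3
      - run: yarn install
      - run: yarn build
```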
Regular Release
This likely won't be as much of an issue, since it's totally possible to download binaries manually from GitHub Actions and upload them to our release, which is what we have been doing this whole time. Although it would be worth testing whether `electron-builder` is actually capable of automatically uploading binaries to a GitHub draft release, if one is available; that could actually mean this saves us more time in the long run.
Rolling Release

This will be a big issue.
Currently, binaries produced by GitHub Actions are wrapped up in a zip file, meaning there is no easy way to provide them as downloads to users. This likely means we will have to fully rewrite the `download` microservice that provides downloads for the rolling release, as well as find a completely new system and methodology for providing and storing our rolling releases. We could take some inspiration from the way Atom handled their nightly releases, but otherwise there are still several unanswered questions here.

Summary of issues presented above:
What benefits does this feature provide?
A CI platform to build binaries for Pulsar.
Any alternatives?
Many, that is the purpose of this issue.
Other examples:
No response