Fix CI #389
Conversation
@timothystewart6 I have just seen this - and I am not convinced that this would actually improve CI in its current form:
Thanks @sleiner, I am coming to the same conclusion too. This was more of a debugging experiment to track down build errors. Our error seems to be a VirtualBox error.
This PR will not be merged; I am just working through some of my suspicions, and I have a few more to chase down. VirtualBox Error
From the error message that you posted:
That's a trippy error message if I've ever seen one 😅
Hmm, accessing the GUI will not be possible, but maybe we can get the actual VirtualBox logs out?
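If GUI access is off the table, one option might be to upload the VirtualBox logs as a workflow artifact when a job fails. A minimal sketch, assuming the default location where VirtualBox keeps per-VM logs on the runner:

```yaml
# Sketch: failure-only step that collects VirtualBox VM logs as an artifact.
# The log path is an assumption about the runner's default layout.
- name: Collect VirtualBox logs
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: virtualbox-logs
    path: "~/VirtualBox VMs/**/Logs/*.log"
    if-no-files-found: ignore
```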
@timothystewart6 btw if you are testing runner types, it might be worthwhile to check out
@sleiner good eye! I only saw
Ah,
In principle, porting this to Ubuntu is trivial (since Molecule, Vagrant and VirtualBox all work on Ubuntu too). In GitHub Actions we are using macOS, however, because at the time these were the only GitHub-hosted runners with support for nested virtualization. In the meantime, GitHub seems to have added KVM support to large runners (i.e., the ones that are not free to use). So as far as I see, among the free GitHub-hosted runners, macOS is the only viable option.
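For illustration, a small diagnostic step along these lines could confirm whether a given runner actually exposes KVM before committing to a full Molecule run (a sketch only, not part of this PR):

```yaml
# Sketch: check for the KVM device on the runner to see whether nested
# virtualization is actually available there.
- name: Check for KVM support
  run: |
    if [ -e /dev/kvm ]; then
      echo "KVM device present:"
      ls -l /dev/kvm
    else
      echo "No /dev/kvm on this runner"
    fi
```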
Yeah, I just read through all of those threads. It seems like enabling it on macOS was an oversight. I wonder if that's why Vagrant isn't installed on macos-13, maybe they "fixed the bug" 😆. I'll keep plowing through this and figure it out!
Hey! First of all, let me say that I am not familiar with Molecule, but I tried to run the test workflow on my fork and it looks like this is facing some resource starvation. One of the steps ran without an issue (see the linked run). Are we provisioning several VMs at the same time? If so, could we split those workloads (at least just to test)? It could also be the cache, since it only worked the first time. Hope that helps!
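One way to split those workloads would be to run each Molecule scenario as its own matrix job instead of provisioning every VM in a single job. A rough sketch, with hypothetical scenario names:

```yaml
# Sketch: one matrix job per Molecule scenario so each job provisions
# only its own VMs. Scenario names are placeholders; dependency setup
# steps are omitted.
jobs:
  molecule:
    runs-on: macos-12
    strategy:
      fail-fast: false
      matrix:
        scenario: [default, ipv6, single_node]  # hypothetical names
    steps:
      - uses: actions/checkout@v4
      - name: Run Molecule for one scenario
        run: molecule test --scenario-name ${{ matrix.scenario }}
```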
According to this, "It's getting a triple fault error, which possibly means disk corruption." So another vote for the cache.
6643d0b to edf0c9e (compare)
I think I found the answer: we are running out of resources during CI. Reducing the allocation seemed to fix it on the first attempt. I have a feeling that sometimes we would get lucky and land on a macOS machine that had enough resources to process our CI; most other times it would not. See this similar issue (it's not just us). After that change, it worked on the first try.
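For context, the kind of change involved is lowering the per-VM allocation in the scenario's molecule.yml for the Vagrant driver, roughly like this (box names and values are illustrative, and the exact keys depend on the molecule-vagrant plugin version in use):

```yaml
# Sketch: smaller per-VM allocation so all VMs fit on a constrained runner.
platforms:
  - name: control1
    box: generic/ubuntu2204   # assumed box
    memory: 2048              # reduced allocation
    cpus: 2
  - name: node1
    box: generic/ubuntu2204
    memory: 2048
    cpus: 2
```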
I am most likely going to add some additional changes to this job.
After working on this for about 2 days straight, I think I finally understand what's going on. It turns out nested virtualization was only working on some runners by luck; it is only supported on their large runners, which is a paid Teams feature. I will look into the cost, and I will also consider running self-hosted runners. CI for this is pretty important, but I am also not sure I can afford GitHub Team when I am a solo developer.
I signed up for Teams, converted to Ubuntu, and provisioned a large runner, and I am still seeing the same thing. That thread has so many conflicting comments that it's hard to know what's right. Someone also mentioned after the update to
Good news, my self-hosted runners are working! I am going to clean up this PR before merging, and I will also update the summary.
Just reporting back: we now have 3 self-hosted runners, so jobs can run concurrently.
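For reference, pointing a job at those runners is mostly a matter of the runs-on target; with three runners registered, matrix jobs can then run in parallel. A minimal sketch (the label is whatever was assigned when the runners were registered):

```yaml
# Sketch: target the self-hosted runner pool instead of a GitHub-hosted image.
jobs:
  molecule:
    runs-on: self-hosted   # or [self-hosted, <label>] for a specific pool
    steps:
      - uses: actions/checkout@v4
      - name: Run Molecule
        run: molecule test
```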
Proposed Changes
I don't even know where to begin 😅
Moved CI from the macos-12 image to macos-13, and ultimately to self-hosted ubuntu runners (not macOS).

I also used this PR to rework the CI steps and caching. While troubleshooting this CI job I found that it was better to cache things up front in a pre step, rather than letting each step decide; downstream steps only restore the cache. It also seemed like a race condition that all 3 Molecule tests could download all VMs and then save them all. Caching the VMs up front and making the downstream steps download-only ensures that there's no unnecessary save later.
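A rough sketch of that pattern, where a prepare job populates the cache once and the test jobs only restore it (paths, keys, and box names are assumptions, not the exact workflow in this PR):

```yaml
# Sketch: cache Vagrant boxes up front, then restore-only in downstream jobs
# so the Molecule jobs never race to save the same cache.
jobs:
  prepare:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Cache Vagrant boxes (saves if missing)
        uses: actions/cache@v4
        with:
          path: ~/.vagrant.d/boxes
          key: vagrant-boxes-${{ hashFiles('**/molecule.yml') }}
      - name: Pre-download boxes
        run: vagrant box add generic/ubuntu2204 --provider virtualbox || true  # assumed box

  molecule:
    needs: prepare
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Restore Vagrant boxes (restore-only, never saves)
        uses: actions/cache/restore@v4
        with:
          path: ~/.vagrant.d/boxes
          key: vagrant-boxes-${{ hashFiles('**/molecule.yml') }}
      - name: Run Molecule
        run: molecule test
```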
All of this should work. I am going to merge it shortly after the tests pass (again) so that we can get caught up on PRs.
Checklist
- site.yml playbook
- reset.yml playbook