s390x arches got slower in the last few days (like a lot) #3060
Should we report this to IBM Cloud support?
Yes, please. To give some broader context: we are using COPR to test the GCC 14 rebase with the Mass Prebuild tooling. We use COPR on FESCo's recommendation, because it is considered as fast as Koji and as conformant in its builds. If the s390x builds take longer, that blocks us from getting the results we need to improve GCC 14 for all Fedora arches ahead of the Fedora 40 mass rebuild on 2024-01-17. We could exclude s390x, but we don't want to. Adding @fweimer-rh for awareness.
Sorry for the inconvenience. There seems to have been a problem with the allocation API (I'm not sure about the details); I killed some of the old builders, and things seem to allocate fine now. I'll try to keep this monitored :-/ Can you confirm whether this "performance" problem is still happening? These IBM Cloud builders have always been slower than those for the other architectures, but according to @fberat's report above, they now seem unusably slow. I don't observe this right now, though.
I tested a tar build that took about 14 minutes on the s390x builder, mostly in the disk-intensive tests:
Copr uses tmpfs for chroots, so when that is combined with a memory-intensive task, we might overflow into swap extensively (see the hardware profile of the s390x builder). Could this be the issue?
The instances are cz2-2x4 + 160 Volumes for SWAP. The quota sponsored by IBM allows us to spawn 18 such machines in parallel.
@praiskup I'm trying it out right now; I've started 2 builds. A longer build is still ongoing, so we'll see how it goes today: https://copr.fedorainfracloud.org/coprs/fberat/gcc-14_gnat/build/6765971/
Thank you for testing, but the build eventually failed with a core dump :-(
Yes, but that's fine, since it fails with the same core dump on all platforms :D As for the second build, it hadn't completed after 5 hours, so I'd say the builders are still quite slow.
Do you have some build of
We discussed this "off list", and it appears the earlier problems were caused by the resource allocation hiccup, which is now resolved. The builder performance is "expected", the
JFTR, to mitigate the frequent slowdowns of giant builds, we upgraded the machines to have more RAM and decreased the quota from "up to 18" to "up to 12" machines (to keep the same $ budget). We believe this should let @fberat rebuild all the Fedora packages much faster. But it seems people are already complaining that we don't have enough s390x builders: https://matrix.to/#/#buildsys:fedoraproject.org
Just a quick update: the new setup of "up to 12" s390x machines with more memory seems to work well enough. The queue grows with "many small builds", but Copr eventually handles it, and the "bigger builds" simply scale better, so we don't plan to revert the change. I bumped the thread with the IBM folks, asking whether we could implement a more "boosted" approach to processing the queue.
We got approval to bump the builders back up to 18 while staying with the memory-optimized instances, and on top of that to start 2 high-performance builders, so I created #3086.
Done, so closing.
The following build took less than 1 hour on all arches except s390x, where it took more than 7 hours. That's unusual behaviour for s390x.
https://copr.fedorainfracloud.org/coprs/fberat/gcc-14_gnat/build/6743516/
Overall, for the past few days (maybe since last Sunday, actually), it has been harder to get the s390x builder to start, and when it does, the build seems quite slow.
Note that a GCC build started on Friday took about the same time as on the other arches (about 30 hours), which suggests that if there is a problem, it is fairly recent.
Can you please have a look?