Bump to Fedora 40 #3785
Didn't test this at all. Let's see what CI says.
openshift/release PR: openshift/release#51370
(Testing locally as well in parallel now.) Let's also push a release and add a Quay.io tag before merging this.
agree. Ideally we build the next
Prow needs openshift/release#51370.
/retest
lgtm
/test ci/prow/images
@travier: The specified target(s) for
/test images
CoreOS CI hanging at the
Seems related to virtio-serial writes from the guest side sometimes hanging for some reason. (I.e. writes to
OK, latest commit seems to have fixed it! Looked a bit through
since we have to run CI again maybe let's update:
OK weird, debugging in the pod, it looks like Prow is still hitting the same hanging issue that I thought Anyway, this now sounds like possibly some bug when combining virtio-serial and stdio. I think I'll just rework this to use a regular serial device instead of virtio-serial since that's obviously way more battle-tested.
d20b066 to f124fa9
OK, ran out of cycles trying to debug this. I've ended up having to essentially revert 4eb19f4, which is unfortunate. But at least it passes CI in both Prow and CoreOS CI.
The problem with this is that it doesn't work on all arches. E.g. on aarch64, adding another
Have some work to try to create a minimal/self-contained reproducer to file a bug, but it's proving trickier than expected.
Some of our upstream CIs (ostree, rpm-ostree) require cosa and FCOS to be on the same release. Ideally we'd fix that but there are details there and we want to move cosa anyway.
This is more or less a revert of 4eb19f4. It seems like QEMU v8.2.2 (in Fedora 40) is hitting issues when combining virtio-serial ports and the stdio character device. When the guest writes to the virtio-serial port, it sometimes hangs. We can look at reverting this patch if it works again in a future version.
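To make the two setups in that commit message concrete, here is a rough sketch of the QEMU argument shapes involved. It is not taken from cmdlib.sh: the function names and the `cmdout` chardev id/port name are hypothetical, and the file-backed variant is only an assumption about how a `tail -F` based approach could be wired up.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names below are hypothetical, not from cmdlib.sh.

# Variant A: a virtio-serial port whose backend is QEMU's own stdio.
# This is the combination the commit message reports as hanging on QEMU v8.2.2.
virtio_serial_stdio_args() {
    printf '%s\n' \
        -chardev "stdio,id=cmdout" \
        -device "virtio-serial" \
        -device "virtserialport,chardev=cmdout,name=cmdout"
}

# Variant B (assumption): the same port backed by a host file instead,
# which a separate `tail -F` process can then follow.
virtio_serial_file_args() {
    local out_file=$1
    printf '%s\n' \
        -chardev "file,id=cmdout,path=${out_file}" \
        -device "virtio-serial" \
        -device "virtserialport,chardev=cmdout,name=cmdout"
}
```

Each function prints one argument per line, so a caller could collect them with `mapfile -t args < <(virtio_serial_file_args "$f")` and append `"${args[@]}"` to its QEMU command line.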
Since CI already passed on this, let's just merge it in to unbreak CI and get to any other fallout faster.
@@ -842,6 +845,9 @@ EOF
fi
rc="$(cat "${rc_file}")"

# cleanup tail before nuking dir containing file it's following
kill "$tail_pid"
I think there's a potential race here where `tail` could be killed before it finished actually printing all the output, even though qemu already exited. A simple fix is to just e.g. `sleep 1` or whatever but ughhh. Really wish we could go back to the virtio-serial approach.
And indeed: openshift/os#1498 (comment)
I can't reproduce this locally, but I have a suspicion that `tail` can exit too quickly in some circumstances, causing truncated output:

openshift/os#1498 (comment)
coreos#3785 (comment)

Rather than having an unconditional `sleep`, let's make it easier to test that theory by having an env var we can use to make it optional. Then we'll test that in CI.

Mid-term, I'd like to revert 79b15c8 soon so we can go back to virtio-serial which is just so much cleaner.
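A minimal sketch of what an env-var-gated delay before killing `tail` could look like. This is not the actual cmdlib.sh change: the function name and the `COSA_TAIL_GRACE_SECS` variable are made up for illustration, and the command passed in is a stand-in for QEMU.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; COSA_TAIL_GRACE_SECS is an invented variable name.

run_with_followed_output() {
    local log_file=$1; shift
    : > "${log_file}"

    # Stream the log file to stdout while the command runs.
    tail -F "${log_file}" 2>/dev/null &
    local tail_pid=$!

    # Run the command (stand-in for QEMU) with its output going to the file.
    local rc=0
    "$@" >> "${log_file}" 2>&1 || rc=$?

    # Optional grace period so tail can print the last lines; default is none.
    local grace=${COSA_TAIL_GRACE_SECS:-0}
    if [ "${grace}" != 0 ]; then
        sleep "${grace}"
    fi

    # Clean up tail before the file it is following gets removed.
    kill "${tail_pid}" 2>/dev/null || true
    wait "${tail_pid}" 2>/dev/null || true
    return "${rc}"
}
```

With the variable unset, behavior matches the unconditional `kill`, so a CI job could set it in one lane only and compare whether truncated output still shows up.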
This is a follow-up to 79b15c8 ("cmdlib.sh: go back to using `tail -F` for command output") which was subsequently reverted.

To summarize, it seems like in QEMU v8.2 (in f40), the guest would sometimes hang when writing over virtio-serial if the device is hooked up to QEMU's stdio. In testing, removing the `<&-` hack to close QEMU's stdin fixed it for CoreOS CI but not Prow: coreos#3785 (comment)

I think I've narrowed it down to CoreOS CI (i.e. Jenkins) allocating a tty and Prow not. When stdin is not a tty, QEMU would immediately get EOF if it tries to read anything. I'm not sure exactly what happens, but I think the virtio-serial hang is linked to this (even though there's no userspace code in the guest trying to read from the virtio-serial port).

Work around this by explicitly feeding `/dev/zero` to QEMU's stdin.
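The difference between a stdin that hits EOF right away and one fed from `/dev/zero` can be seen with a small stand-in for QEMU. This is only a sketch: `probe_stdin` and `with_zero_stdin` are hypothetical helpers, and `/dev/null` is used here as an assumed example of a non-tty stdin that returns EOF immediately.

```shell
#!/usr/bin/env bash
# Illustrative only: probe_stdin stands in for a process (such as QEMU)
# that attempts to read from its stdin.

probe_stdin() {
    # Try to read a single byte from stdin and report whether one arrived.
    local n
    n=$(head -c 1 2>/dev/null | wc -c | tr -d ' ')
    if [ "${n}" -eq 1 ]; then
        echo "got-byte"
    else
        echo "eof"
    fi
}

# Run a command with an endless stream of NUL bytes as stdin,
# rather than closing stdin with `<&-`.
with_zero_stdin() {
    "$@" </dev/zero
}

probe_stdin </dev/null        # reads hit EOF immediately
with_zero_stdin probe_stdin   # reads always have data available
```

The wrapper leaves the command's stdout and stderr untouched, so it would not interfere with a chardev that uses stdio for output.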
This is a follow-up to 79b15c8 ("cmdlib.sh: go back to using `tail -F` for command output") which was subsequently reverted.

To summarize, it seems like in QEMU v8.2 (in f40), the guest would sometimes hang when writing over virtio-serial if the device is hooked up to QEMU's stdio. In testing, removing the `<&-` hack to close QEMU's stdin fixed it for CoreOS CI but not Prow: coreos#3785 (comment)

I think I've narrowed it down to CoreOS CI (i.e. Jenkins) allocating a tty and Prow not. When stdin is not a tty, QEMU would immediately get EOF if it tries to read anything. I'm not sure exactly what happens, but I think the virtio-serial hang is linked to this (even though there's no userspace code in the guest trying to read from the virtio-serial port).

Work around this by explicitly feeding `/dev/zero` to QEMU's stdin.

(cherry picked from commit bb60451)