CI: podman stop: timeout #23685
Comments
There is no panic, see #23551. This is an intentionally sent SIGABRT on timeout; see the red error message. If it doesn't say panic or SIGSEGV, it is not a panic but rather us sending the signal (see
This seems to be the relevant hang: it is failing to unmount the netns file (there is an infinite loop in there waiting for the netns file to be unmounted by the kernel, as we called unmount with MNT_DETACH).
FYI: I run ginkgo with
Fine with me. I opened the issue because some flakes are easy to keep track of (when they all fail in the same test) and some are not (non-unique failure message across multiple tests).
@edsantiago Another case in https://api.cirrus-ci.com/v1/artifact/task/6443976285224960/html/int-podman-rawhide-rootless-host-sqlite.log.html but, more importantly, as the process was killed the netns file was leaked, so the new netns leak check seems to work as expected, I would say.
Ok maybe we are getting somewhere here, QE hit this in RHEL 9.5 testing and I have access to that VM.
This doesn't make any sense to me: mount says it is mounted, umount says it is not mounted, and rm is failing with EBUSY (which is why podman keeps looping until we remove the file successfully or a different error is hit; I am starting to regret that I didn't add a max timeout before we give up trying to remove the file).
Weird. In case it helps, here's the list so far. All rootless, and all in the same test:
strace umount says EINVAL
The netns dir has special logic to bind mount itself and make itself shared. This code here didn't, which led to a catastrophic bug during netns unmounting: we were unable to unmount the netns because the mount got duplicated and had the wrong parent mount. This caused us to loop forever trying to remove the file. Fixes https://issues.redhat.com/browse/RHEL-59620 Fixes containers#23685 Signed-off-by: Paul Holzinger <[email protected]>
The netns dir has special logic to bind mount itself and make itself shared. This code here didn't, which led to a catastrophic bug during netns unmounting: we were unable to unmount the netns because the mount got duplicated and had the wrong parent mount. This caused us to loop forever trying to remove the file. Fixes https://issues.redhat.com/browse/RHEL-59620 Fixes containers#23685 Fixes https://issues.redhat.com/browse/RHEL-59703 (backport) Signed-off-by: Paul Holzinger <[email protected]>
Seen yesterday in a v5.3 PR, debian root. Is it possible that this fix didn't get backported to v5.3?
No, that is most likely something else. It is the remote hang, so we do not see the server stack trace, which would be the interesting thing here. I really should get #23631 into a good state so we can capture this info on all tests, as this has been needed several times by now.
One-off. I think I've seen this one before, but have no way to find it because I didn't file it.
In debian rootless, and not in any of my test PRs