CI: system tests: something is eating volumes #23913
Seen on my laptop:
That second one looks like a TOCTOU in the volume ls code, similar to the ones Paul fixed recently around containers and pods. Probably unrelated to the first.
A friendly reminder that this issue had no activity for 30 days.
Yesterday, f40 aarch64 root
Well, we have the cleanup tracer now, so we can see all podman commands that were executed, and I am still baffled. I am ignoring everything before the logged …
Thanks for the confirmation. There are very few …
I have no idea. There is no easy explanation for me to blame the tests, so I have to assume we might have a bug in podman itself where unrelated things get deleted.
Debug for containers#23913. I thought: if we have no idea which process is nuking the volume then we need to figure this out. As there is no reproducer we can (ab)use the cleanup tracer. Simply trace all unlink syscalls to see which process deletes our special named volume. Given the volume name is used as a path on the fs and is deleted on volume rm, we should know exactly which process deleted it the next time, hopefully.

Signed-off-by: Paul Holzinger <[email protected]>
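For context, a generic way to do that kind of tracing (a sketch only, not the actual cleanup-tracer change from the commit above) is a bpftrace one-liner that logs every unlink/unlinkat call system-wide and filters the output for the volume name; it assumes bpftrace is installed and is run as root, and the volume name below is a placeholder:

```bash
# Sketch only: print pid, command name, and path for every unlink()/unlinkat(),
# then grep for the specially named test volume ("my-magic-volume" is a placeholder).
sudo bpftrace -e '
  tracepoint:syscalls:sys_enter_unlink,
  tracepoint:syscalls:sys_enter_unlinkat
  {
    printf("%d %s %s\n", pid, comm, str(args->pathname));
  }' | grep --line-buffered my-magic-volume
```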
Ok, this is not what I expected: no match for my magic …
Hm. If we assume that the volume is still there, and (please oh please) that there are no db integrity issues, could it be a bug in the query logic?
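For what it's worth, a quick manual spot-check of that assumption could compare the on-disk directory, a direct lookup, and the ls output. A rough sketch; the paths assume default rootful storage, and the volume name is a placeholder:

```bash
# Rough sketch: does the volume still exist on disk and in the DB when "volume ls" misses it?
vol=my-magic-volume                                        # placeholder name
ls -ld "/var/lib/containers/storage/volumes/$vol"          # directory still on disk?
podman volume inspect "$vol" >/dev/null && echo "direct lookup OK"
podman volume ls --format '{{.Name}}' | grep -Fxq "$vol" || echo "missing from volume ls"
```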
Maybe? So far only sqlite is seen, but we have too few hits on that flake to exclude boltdb IMO.
If volume ls was called while another volume was removed at the right time, it could have failed with "no such volume" as we did not ignore such errors during listing. As we list things and this one no longer exists, the correct thing is to ignore the error and continue, like we do with containers, pods, etc...

This was pretty easy to reproduce with these two commands running in different terminals:

while :; do bin/podman volume create test && bin/podman volume rm test || break; done
while :; do bin/podman volume ls || break; done

I have a slight feeling that this might solve containers#23913 but I am not too sure, so I am not adding a Fixes here.

Signed-off-by: Paul Holzinger <[email protected]>
(cherry picked from commit 9a0c0b2)
Signed-off-by: Paul Holzinger <[email protected]>
I haven't seen this once since #24479, so I assume that was it.
Seeing flakes in the 600 podman shell completion test when run in parallel:

The trick is finding the culprit. There are very few run_podman volume rm commands in system tests, it's easy to check them individually, and I don't see any way for any of them to be responsible. Journal shows nothing useful.
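As an aside, enumerating those call sites is a one-liner, assuming the usual repo layout with the system tests under test/system/:

```bash
# List every "volume rm" in the system tests so they can be audited individually.
grep -rn 'volume rm' test/system/
```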