Parsing layer blob: Broken pipe #657
@cgwalters, the problem came up again in MicroShift CI (see this log as an example).
@cgwalters, our CI is in great pain because of this issue; is there a known workaround?
Not currently, other than retrying; we should probably add some defaults to do that automatically...
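Purely as an illustration of that retry workaround (the image, the rebase command, and the retry count below are placeholders, not anything from this issue), a CI wrapper could look roughly like this:

```sh
# Hypothetical retry wrapper for the failing step; adjust to whatever your CI
# actually runs (rpm-ostree rebase, bootc switch, ostree container image pull, ...).
IMAGE=quay.io/example/my-bootc-image:latest
for attempt in 1 2 3; do
    rpm-ostree rebase "ostree-unverified-registry:${IMAGE}" && break
    echo "rebase failed (attempt ${attempt}), retrying..." >&2
    sleep 10
done
```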
@cgwalters, this problem is causing a lot of grief in MicroShift CI. It shows up all over, either when booting VMs or even when running tests that pull a new layer. Is there a chance we add the retry workaround sooner rather than later in CentOS 9, and also backport it to RHEL 9.4? Once implemented, we would immediately provide feedback on whether it made a difference.
I can "easily" reproduce this problem on a local VM by running the following command to alternate between the images (new -> original -> new -> etc.) without rebooting. Is there any additional debug information I can provide to give a clue as to why this is happening?
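For illustration only, an alternation loop of that shape might look like the sketch below; the image references and the choice of rpm-ostree rebase are assumptions rather than the exact command used here:

```sh
# Hypothetical reproduction loop; both image references are placeholders.
NEW=quay.io/example/microshift-bootc:new
ORIG=quay.io/example/microshift-bootc:original
while true; do
    rpm-ostree rebase "ostree-unverified-registry:${NEW}"  || break
    rpm-ostree rebase "ostree-unverified-registry:${ORIG}" || break
done
```

The loop stops as soon as the broken-pipe error (or any other failure) shows up.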
On the registry server side, I'm seeing the following messages in the log when the problem occurs. Looks like the issue is on the client.
I've been trying to reproduce this today without much success. I managed to do it one time by switching back and forth, and then I started poking at things with
As far as tweaking timing goes, one thing to try is using e.g. strace fault injection to add random delays (especially to read and write), something like:
(Fine-tune the injection step and delay as desired.)
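For concreteness, one such invocation might look like the sketch below, assuming strace's fault-injection options; the 10ms delay, the 3+5 step, and the command being traced are all placeholders to adjust:

```sh
# Delay the 3rd read/write and every 5th one after that by 10ms
# (delay_enter is in microseconds), following child processes as well.
strace -f -o /dev/null \
  -e trace=read,write \
  -e inject=read,write:delay_enter=10000:when=3+5 \
  ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc
```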
I let it run for a bit more on Wednesday, no luck. Yesterday I took the opportunity to rebuild my dev setup with a fresh podman-machine and the latest podman-bootc from git. Then I let it run all day trying to reproduce. More precisely, what I did was:
Couldn't get it to reproduce; I ran it for about 6 hours like that. Then I ran it again, except removing all of the
At this point it's driving me absolutely mad that I was able to reproduce it the one time within like 5 minutes of trying, and then never again 😠
@jeckersb, I used the following commands to reproduce the error and I'm attaching the
The error was the "usual" one.
@ggiguash thanks, definitely a step in the right direction! Do you have the kernel version and the version of skopeo? Or alternatively, if you still have the system, just run
But at the very least, from that we can see that it's skopeo getting the EPIPE.
Another possibly related CI failure:
@jeckersb, here's the stack after installing the debug symbols for skopeo. I'm also uploading a perf archive, which you can
That's a much more interesting error... it's almost like we somehow swapped data streams? I think the one we're generally seeing ends like:
which makes me suspect that there's some other underlying error being masked in this case.
The one time I got it to reproduce, I got:
Which I guess is only notable because it's not a remote error, so it's failing on the "client" side. My plan of attack for tomorrow is to figure out the right perf incantation to get a stack trace when we drop a pipe in Rust, and then see if that sheds any light on things.
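One candidate incantation, sketched under the assumption that tracing close() with call graphs is enough to catch the pipe being dropped (not necessarily what was actually used):

```sh
# Record every close() with DWARF call graphs while the pull runs; the pull
# command here is just a stand-in for whatever reproduces the failure.
perf record -e syscalls:sys_enter_close --call-graph dwarf -- \
  ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc
perf script | less
```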
To aid debugging ostreedev/ostree-rs-ext#657 Signed-off-by: Colin Walters <[email protected]>
I've been poking at this for a while now, so I'm overdue for at least some sort of update on where this stands. I realized this is reproducible via just doing a
I let this run overnight last night. The stream9 VM failed after 66 iterations and the f40 VM failed after 102 iterations. Not particularly conclusive either way.

However, today I noticed that the f40 VM was out of date (notably running kernel-6.8.5-301.fc40.x86_64, whereas the latest is kernel-6.11.3-200.fc40.x86_64), so I made sure both hosts were updated (stream9 only had non-meaningful changes) and rebooted into the latest to re-run the tests. The stream9 VM failed after 57 iterations. As of writing this, the f40 VM is still going at 90 iterations. We'll see how that turns out. Ideally I would prefer to use Fedora for reproducing because the

(Edit: f40 made it 102 iterations (coincidentally the same as last time) before it failed with a registry error unrelated to the broken pipe; restarting it...)

(Later edit: ran everything again; stream9 made 53 iterations and f40 76 iterations before both hit the broken pipe.)
To aid debugging ostreedev/ostree-rs-ext#657 This will be especially handy with containers/skopeo#2439 Signed-off-by: Colin Walters <[email protected]>
I finally have a consistent reproducer for this, using only
Run as root:

```sh
# Put the data on tmpfs to minimize I/O delay (thanks @cgwalters for the nudge)
cd /tmp
# Export the image to an oci-dir on tmpfs. If we instead read from the host
# containers-storage (or a registry), the problem no longer reproduces.
podman save --format=oci-dir -o fedora-bootc quay.io/fedora/fedora-bootc:40
# Create a cgroup limited to one CPU and put our shell into it
echo +cpuset >> /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/onecpu
echo 1 > /sys/fs/cgroup/onecpu/cpuset.cpus
echo $$ > /sys/fs/cgroup/onecpu/cgroup.procs
# Try to pull and fail!
ostree --repo=repo init
ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc
```
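To reset for another attempt, or to undo the CPU pinning afterwards, a sketch along these lines (assuming the same paths and cgroup name as above) should do:

```sh
# Retry from scratch: a fresh repo forces the layers to be fetched again.
rm -rf repo
ostree --repo=repo init
ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc

# Undo the pinning: move the shell back to the root cgroup and remove ours.
echo $$ > /sys/fs/cgroup/cgroup.procs
rmdir /sys/fs/cgroup/onecpu
```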
Some observations about where and how it breaks:
Worth noting that this is the specific image (didn't realize until now that it was two weeks old):
Layers history for this image:
With the extra debug output from the patches to containers-image-proxy-rs and skopeo that @cgwalters did:
Nope, still fails like that (I had to modify your patch slightly with only one
More details on the timing here! Running like so...
gets us...
Which shows where the
I have a theory that there's a different underlying error that's being masked here.
Can you try with this?
No difference, still just fails with: |
Run under gdb, |
Can you capture in gdb the point where we call |
Should be here. Note that once I add a breakpoint on drop and have it dump the stack, it slows things down enough that the problem no longer reproduces. But I suspect the
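For reference, a non-interactive way to dump a backtrace at such a breakpoint and keep going could look roughly like the script below; the drop_in_place symbol is purely a placeholder for whichever type owns the pipe reader in this build, the follow-exec setting is an assumption about how `ostree container` dispatches, and (as noted above) the extra work at each hit may itself perturb the timing enough to hide the race:

```sh
# Hypothetical gdb driver script; the breakpoint symbol is a placeholder.
cat > /tmp/pipe-drop.gdb <<'EOF'
set follow-exec-mode new
set breakpoint pending on
break 'core::ptr::drop_in_place<SomePipeReaderType>'
commands
  silent
  bt
  continue
end
run
EOF
gdb -q -batch -x /tmp/pipe-drop.gdb \
  --args ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc
```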
Ohh right, in the current code it's the tar stream that owns the reader; does this help?
Does not help. Another thing worth noting is that I tried forcing tokio, tokio-util, and tokio-stream to the latest versions, just to make sure there wasn't some random upstream fix there, but that also did not help.
For reference, instead of directly writing to cgroups I was able to reproduce with
So a core loop would look like this:
The first command flushes the refs for the layers in the existing repo.
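The exact commands aren't captured above, so purely as a sketch (the ref prefix, the image source, and the use of taskset for single-CPU pinning are all assumptions), a core loop of that shape could be:

```sh
# Hypothetical core loop: drop the container refs so the layers are fetched
# again, then re-pull pinned to one CPU, until the broken pipe shows up.
while true; do
    ostree --repo=repo refs --delete ostree/container
    taskset -c 0 ostree container image pull --quiet repo \
        ostree-unverified-image:oci:fedora-bootc || break
done
```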
This took a crazy long time to debug but after lots of false starts I think this is right. Basically what's going on is we have async tasks that are talking over a `pipe()` inside our own process. We must not close the read side of the pipe until the writer is done. I believe this is dependent on tokio task scheduling order, and it's way easier to reproduce when pinned to a single CPU. Closes: ostreedev#657 Signed-off-by: Colin Walters <[email protected]>
OK, the thing I'm thinking we should try now:
(There, and in the other place too.)
I have a tentative patch that moves the decompressor bits inline; it needs a bit of cleanup just for readability's sake and then I will post a draft. I think there's a bug either in my new implementation, or my new implementation is uncovering some pre-existing bug. On current main, when running the tests I occasionally get:
With my patch, I still occasionally get that same error. However, I also now occasionally get this error:
Which is similar to but distinct from the first. Will need to investigate a bit more.
Yeah, sadly we've always had this bug: #664
I guess I should dig more into #664 as part of this, then. I'd like to understand better what's going on there so I can at least be somewhat confident I'm not making things worse.
This (with cautious optimism) appears to fix ostreedev#657. Signed-off-by: John Eckersberg <[email protected]>
@jeckersb, can you explain whether the current fix is supposed to resolve this error, or do we also need to wait for #664?
This should be sufficient to fix that error. #664 is similar, but it is a separate problem.
Hopefully soon! We just released 1.1.2, so now it's just a matter of getting that pushed through the CentOS Stream processes. @jmarrero, I'd like to run through the CentOS release process with you so I can take some of the burden off of you 😄
Actually, for those two specific errors that you posted, it looks like the error happened in
This one is like my enemy! I have a tracker for it over here too: coreos/rpm-ostree#4567
Discoveries so far:
More generally it's definitely a race condition; I can sometimes reproduce this by doing

```sh
ostree refs --delete ostree/container
```

and then re-running the rebase.

Also of note: kola defaults to a uniprocessor VM, which I think is more likely to expose this race.

I'm quite certain it has something to do with the scheduling of us closing the pipe vs calling FinishPipe.