ZFS send hangs sometimes #16731
Comments
I'm facing a somewhat similar issue. It occurs on only one of 7 servers that all run the same OS (FreeBSD 14.1, OpenZFS 2.2.4) and use the same (home-made) zfs replication software. It is basically a series of The Killing the piped There are no related I'd be glad to help investigate the issue, but don't know where to look. |
@vedadkajtaz thanks! I have to correct myself, zfs-2.2.4 also contains the suspected patch. If you would be able to test openzfs without commit 6bdc725, that would help. I am running 3 boxes now with that reverted, only for a few days, without hanging zfs send. But, I definitely would need more time to declare this as a possible cause. |
@vedadkajtaz did you have a chance to build zfs userspace with 6bdc725 reverted? Then, that will need some time, but according to your experiments, in a week you'll be able to report some results. |
Hi, I haven't done anything yet regarding this, possibly/likely next week, sorry. |
@nabijaczleweli I can report that reverting the mentioned commit caused no zfs send issues on 3 FreeBSD based NAS servers for more than a week now. Can you have a look at the commit? |
It looked sound back then so it looks sound now. No-one seems to have posted a strace (or backtrace) that would indicate where these hang, that commit basically doesn't touch the actually-sending-stuff thread at all, and all the setup is deterministic AFAICT. This bug hasn't left "oh i see this sometimes". I can't evaluate data you're withholding. |
I have a hung process (with stock binary, FreeBSD 14.1, OpenZFS 2.2.4) right now. There is no
Not super helpful without debugging symbols, but it's obviously stuck in |
Would it be a terrible bother to take a backtrace, with symbols, of all the threads, so we don't have to guess what's happening? Attaching the strace-equivalent should in general be easier and tell you in which syscall each thread is stuck, but I don't really know if FreeBSD possesses this ability. |
I'll rebuild (stock, ie. |
@nabijaczleweli unfortunately, I can only add that since I have been running my servers with the mentioned patch reverted, I have not seen any hung zfs processes. |
System information
Describe the problem you're observing
TrueNAS is using zettarepl to replicate zfs datasets to remote sites. During a cycle, sometimes, rarely, zfs send hangs. The symptom is that zfs send sits idle and sends nothing to its output. I've applied a workaround: a simple pipe command that reads the output of zfs send and passes the data through. This command reports that no output has been received from zfs send for minutes, and then it kills zfs send. It also reports that usually only a few thousand bytes have been sent by zfs send at that point, not more. Simply killing zfs send solves the problem: on the next cycle it will usually send the snapshots completely, without errors.
Must note here that the zfs used by TrueNAS contains this PR. I suspect that 6bdc725 may be the source of my issue.
Usually, I get send errors once every week or two and cannot reproduce them on demand, but I will now try without this patch and see the difference.
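For anyone who wants to apply a similar workaround, here is a minimal sketch of a pass-through watchdog of the kind described above. It is not the script actually used on these servers; the function name `relay_with_watchdog`, the 300 second default and the return values are assumptions made for illustration. It relies on `select` working on pipes, so it is meant for Unix-like systems such as FreeBSD or Linux.

```python
import os
import select
import subprocess


def relay_with_watchdog(cmd, out, idle_timeout=300.0, chunk_size=1 << 16):
    """Run cmd, copy its stdout to `out`, and kill it if it stays silent.

    Hypothetical helper: returns (bytes_relayed, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    total = 0
    timed_out = False
    try:
        while True:
            # Wait until the child has output, or until the idle timeout expires.
            ready, _, _ = select.select([fd], [], [], idle_timeout)
            if not ready:
                # No output for idle_timeout seconds: treat the child as hung.
                timed_out = True
                proc.kill()
                break
            chunk = os.read(fd, chunk_size)
            if not chunk:
                # EOF: the child closed its stdout, normally because it exited.
                break
            out.write(chunk)
            total += len(chunk)
    finally:
        proc.stdout.close()
        proc.wait()
    return total, timed_out
```

In a replication pipeline this would be called with the zfs send command line as `cmd` and `sys.stdout.buffer` as `out`, and the returned byte count can be logged to confirm the "only a few thousand bytes" observation. Note that the write to `out` is blocking, so this sketch only detects a silent sender, not a stalled receiver.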
Describe how to reproduce the problem
Unfortunately, cannot reproduce.
Include any warning/errors/backtraces from the system logs