Replies: 23 comments 10 replies
-
The pool is corrupted with errors and can't be repaired. How many times does this need to occur before it is called a bug?
Small database files.
No.
I didn't corrupt my own data. Are you asking if I can reproduce it more than once?
Every month, but this pool is new. A scrub was run today: scan: scrub repaired 0B in 0h0m with 0 errors on Mon Dec 9 10:19:57 2019
The disks are new (Amazon Black Friday new).
I don't use SMART.
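For anyone triaging a report like this, these are roughly the commands whose output usually gets requested. This is only a sketch; the pool name tank and the device /dev/sda are placeholders for the actual pool and disks.

```sh
# Pool health, last scrub result, and the list of files flagged as damaged.
zpool status -v tank

# Recent ZFS error events.
zpool events -v | tail -n 50

# Kernel-level disk/controller errors, if any.
dmesg | grep -iE 'ata|scsi|i/o error'

# Drive health via smartmontools (SMART is not being monitored here).
smartctl -a /dev/sda
```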
-
Can you please check the After checking
-
You may need to run:
-
@behlendorf Is this type of error always related to
Does this comment mean that these errors are more likely to happen on USB-connected drives? If a snapshot of the pool was made before the error incident happened, will restoring the snapshot automatically "clear" the errors?
Update: The scrub finished with:
Now the metadata errors are gone. I am still not entirely sure what caused them.
-
I have seen this issue as well. In my case the files in question are discovered as corrupt during a zfs send, and previous scrubs did not see them. I have seen at least one of these corrupt-file issues disappear after/during a scrub. I'm not seeing anything related to the backing drives in dmesg at all.
Additionally, I've seen corrupt files become corrupt metadata, as the OP has. It happened when I renamed the dataset with those files, restored it from a replica, and then destroyed the renamed version with the corruption. The corrupted-metadata entries stayed in the error list until, I presume, the background destroy finished.
linux amd64, debian buster, debian backports, kernel 5.4.0-0.bpo.2-amd64, zfs-dkms 0.8.2-3~bpo10+1
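A rough sketch of the rename/restore/destroy sequence described above, with hypothetical names (tank/data for the damaged dataset, backup/data@latest for the replica snapshot); a real restore would typically use an incremental or replication stream:

```sh
# Keep the damaged dataset around under a new name.
zfs rename tank/data tank/data-corrupt

# Restore the dataset from a replica snapshot held on another pool.
zfs send backup/data@latest | zfs receive tank/data

# Drop the damaged copy; space is reclaimed by a background (async) destroy,
# during which its entries can still appear in the zpool status error list.
zfs destroy -r tank/data-corrupt
```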
-
This saved my pool. I got similar errors to @opili892's when I played around with vdev device names while doing zfs send/receive. Stupid me. The pool ended up with this status:
The second(!) scrub fixed it! I would never have tried a second scrub if I hadn't read @richardelling's comment. I always thought that what one scrub cannot fix, another scrub cannot fix either. Is this behaviour documented somewhere?
-
@mabod: from what I know, you don't need two scrubs to fix the issue; rather, the two scrubs are required to clear the issue from the error log shown by zpool status.
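In practice that means a sequence like the following (a minimal sketch; tank is a placeholder pool name, and the underlying problem must already be fixed before the scrubs are run):

```sh
zpool scrub tank        # first scrub: re-verifies the data and rebuilds the error log
zpool status tank       # wait until this reports the scrub as finished
zpool scrub tank        # second scrub: entries left over from the previous log are dropped
zpool status -v tank    # the "Permanent errors" list should now be empty
```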
-
OK. Does that mean that I always need to run a scrub twice to get rid of any(!) error messages? What is the technical reason for not clearing the status right away when the issue is fixed?
-
The error list is simply a log of the previous pool state. I am not sure if all errors behave in this manner, though.
-
Seems dead simple to me; imagine this scenario:
1. You start a scrub on a large pool, and it runs for days.
2. Along the way it finds (and repairs) some errors.
3. You only come back to check on the pool after the scrub has finished.
If the first run of the scrub cleared the errors as it went, there would be no errors shown at all when you come back to check on it (except for a brief moment during the scrub itself).
-
I am not sure I understand your explanation, or which of the three points actually explains the zpool status I saw. To recap what happened:
1.) before any scrub, the status was showing two errors
2.) after(!) the first scrub the status was showing the same two errors
3.) only after the second scrub were the errors gone
How does this behavior fit your explanation?
-
> On May 26, 2020, at 4:28 AM, Matthias Bodenbinder wrote:
> ok. Does that mean that I always need to run scrub twice to get rid of any(!) error messages? What is the technical reason for not clearing the status right away when the issue is fixed.
Scrubs can take a long time (weeks), so the status shows the results of the current scrub and the previous scrub.
-- richard
-
A double scrub does not fix the status message for me; the "Permanent errors have been detected in the following files:" message is still there.
-
I'm running zfs-2.0.
-
@microcai please post the output of:
-
The output of:
-
Unsure what happened to mine. I did an Ubuntu update and rebooted, only to see I/O errors pop up, and now I have a ton of corrupt files and folders. Ubuntu did keep waiting for the ZFS pool to unmount and kept bumping up the wait timer.
-
I just got some metadata errors too. This two-scrub thing is quite confusing, in particular because if the status shows a previous state, the display should note which is the previous and which is the current. I seriously doubt that the status legend would be left out intentionally, and if it wasn't intentional, how has it never been fixed all these years? Here's mine for good measure: it happened while putting some large 100G files on the disk. All I was doing was copying files, but it's a new disk, so it's possibly faulty. Note that no scrub has yet been performed. I assume I can delete those files, scrub it (maybe twice), and then fill it up again.
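A minimal sketch of the plan just described, assuming a pool named tank and that the damaged entries really are plain files listed by zpool status -v (the path is a placeholder):

```sh
zpool status -v tank             # note the files listed under "Permanent errors"
rm /tank/path/to/damaged-file    # delete (or restore from backup) each listed file
zpool scrub tank                 # first scrub
# wait for it to complete, then scrub again so stale entries age out of the log
zpool scrub tank
zpool status -v tank
```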
-
Yes, it's a single drive with no important data on it. My understanding is that scrubs on a single drive have a hope of repairing some metadata, but that's about it. Metadata has two copies, I believe, so it was a little surprising, but not unlikely I guess. Though the example shows the same symptom, in that there are no read/write/checksum errors associated with the drive.
-
It is unacceptable that one has to run a scrub twice to clear the error message... Every scrub takes a lot of time; it can basically read through the entire set of disks. The second one is very moronic, but not only that: if one reads the endurance specifications of modern hard drives, they specify a maximum quantity of bytes that can be read every year before the drive can start to fail. This type of operation increases, totally unnecessarily, the risk of disk failure... this is totally unacceptable. It should be labeled as a critical bug.
-
I don't want to be the guy on the sidelines complaining, but as a simple user of OpenZFS I would just like to mention that the nature of these send/recv corruptions that require double scrubs is hurting the great image ZFS has built over the years. Honestly, I would suggest pulling encryption from the code (just disable the relevant commands) or making it clearer which scenarios are safe to use.
I have just migrated my home systems away from ZFS encryption. I had to scrub every 4-5 days, with a scrub taking a day to complete. That is nearly 50% of the time (indeed doing double scrubs every time). This is not up to the ZFS standards that people have come to expect.
I think OpenZFS has one of the most vibrant, positive, respectful and productive communities I have ever seen in open source, and I understand there is a lack of knowledge of the code. I feel this encryption was a poisoned present: if parties contribute something as innovative as this was, they should commit to properly supporting their code for at least a few years. Now, 3 years in, there is hardly any progress on it, it seems. I would pull it before real accidents happen.
-
I just encountered this issue too. For any devs interested in reproducing the error, you can do it by putting a ZFS pool on an iSCSI device and then rebooting the iSCSI host while writing to ZFS. This is what happened to me, and rather than waiting until the iSCSI host came back online and retrying the pending writes, ZFS just failed them.
The URL given for the error assumes the problems are with files, not metadata, so it doesn't really explain what to do. I'm trying a scrub, but it will take a long time. I am only using this disk as a
I wonder if there's a way to tell ZFS not to give up so quickly on I/O failures and to keep retrying indefinitely, so that the writes can complete successfully when the iSCSI host comes back online?
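A rough sketch of the reproduction described above. Everything here is a placeholder: /dev/sdX stands for the iSCSI-backed block device as seen by the initiator, itank for the pool.

```sh
# Create a single-device pool on the iSCSI-backed disk.
zpool create itank /dev/sdX

# Start a sustained write to the pool...
dd if=/dev/urandom of=/itank/testfile bs=1M count=50000 &

# ...then reboot the iSCSI target host while the write is in flight.
# Once the target is back, check the damage:
zpool status -v itank
```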
-
Bump on this thread. I'm running into a similar problem (see #15166). I'll do another scrub, but I'm not sure I see the point of it beyond just clearing error messages. I also did a "zpool clear data".
-
My pool has errors despite having redundant disks, but a scrub didn't repair them. Why?
  pool: nas
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0h0m with 0 errors on Mon Dec 9 10:19:57 2019
config:
errors: Permanent errors have been detected in the following files:
nas is an Ubuntu system with ZFS version 0.7.5-1ubuntu16.6.
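For what it's worth, the usual follow-up for the ZFS-8000-8A state shown above is something like this (a sketch only; whether the listed files can simply be restored or deleted depends on the pool):

```sh
zpool status -v nas   # -v lists the individual files/objects with permanent errors

# After restoring or deleting the affected files, clear the error counters and
# scrub again (possibly twice, per the discussion above) so stale entries
# drop out of the error log.
zpool clear nas
zpool scrub nas
```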