Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support single files scrubbing #16018

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Mar 21, 2024

Motivation and Context

Currently, we have various scrubbing approaches, including scrubbing an entire pool or scrubbing saved errors. However, the ability to target specific data for scrubbing can be particularly beneficial. For instance, one simple application of this is scrubbing a single file rather than the entire pool. One of the simplest examples is to scrub a single file instead of the whole pool. This scrubbing could be desired for files written prior to disk/controller errors or for older files that warrant inspection. For a large pools limiting the actual data that we have scrub can be quite usefull.

Description

This introduces the functionality of scrubbing data from files. Originally, the errloglist structure was utilized for tracking error blocks identified during scrub operations. Same structure can be used record blocks from specific files. By generating these records we can simbply scurb it. This allows scrubbing a single file, leveraging existing infrastructure.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

How Has This Been Tested?

A new test was added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@oshogbo oshogbo force-pushed the oshogbo/scrub_single_file branch 3 times, most recently from 80cb074 to adca54a Compare March 21, 2024 20:05
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 27, 2024
@tonyhutter
Copy link
Contributor

  1. The two test cases are nearly identical except one makes a 10MB file and the other makes a 1GB file. Could you combine the tests?
  2. It would be desirable to have a test case where you scramble a file and they verify the file scrub actually repairs it.
  3. What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

@oshogbo
Copy link
Contributor Author

oshogbo commented Mar 29, 2024

The two test cases are nearly identical except one makes a 10MB file and the other makes a 1GB file. Could you combine the tests?

Yes, I can do that.

It would be desirable to have a test case where you scramble a file and they verify the file scrub actually repairs it.

Ok, I can try to add such.

What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

The file will be added to the error list; however, the scrub will fail, as you can't start another scrub.

man/man8/zfs-scrub.8 Outdated Show resolved Hide resolved
@tonyhutter
Copy link
Contributor

What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

The file will be added to the error list; however, the scrub will fail, as you can't start another scrub.

Just so I'm understanding correctly - if I do:

zfs scrub /tank/huge_file
zpool scrub tank

...the zpool scrub will fail and the zfs scrub will continue, with /tank/huge_file being put in the error list? Is that correct? If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

And conversely, if I do:

zpool scrub tank
zfs scrub /tank/huge_file

The zfs scrub will fail?

If this is correct, then we should print an error in both cases saying something to the effect of "I can't do that, there's a scrub in progress". We should mention this behavior in the man pages as well.

Some other scenarios that come to mind:

  1. What is the behavior if you do a zpool scrub, pause the scrub, then scrub a file? Does the "paused state" get canceled when you scrub the file, or can you resume the zpool scrub after scrubbing the file?

  2. If you start a zfs scrub is there any way to cancel the scrub? Will zpool scrub -s cancel it?

@oshogbo
Copy link
Contributor Author

oshogbo commented Apr 24, 2024

...the zpool scrub will fail and the zfs scrub will continue, with /tank/huge_file being put in the error list? Is that correct? If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

Yes, that is correct. You will see the file on error list.

zpool scrub tank
zfs scrub /tank/huge_file

The zfs scrub will fail?

Yes, as there is already a running scrub.

If this is correct, then we should print an error in both cases saying something to the effect of "I can't do that, there's a scrub in progress". We should mention this behavior in the man pages as well.

I have added that only one scrub can work.

$ sudo ./zfs scrub /ztest1/t ; sudo zpool scrub ztest1
cannot scrub ztest1: currently resilvering

What is the behavior if you do a zpool scrub, pause the scrub, then scrub a file? Does the "paused state" get canceled when you scrub the file, or can you resume the zpool scrub after scrubbing the file?

So currently the file will be added to the error list and the scrub will be still paused.

If you start a zfs scrub is there any way to cancel the scrub? Will zpool scrub -s cancel it?

Yes, its a normal scrub it behaves the same.

This introduces the functionality of scrubbing data from
files. Originally, the `errloglist` structure was utilized
for tracking error blocks identified during scrub operations.
Same structure can be used record blocks from specific files.
By generating these records we can simbply scurb it.
This allows scrubbing a single file, leveraging existing
infrastructure.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Mariusz Zaborski <[email protected]>
Comment on lines +64 to +67
log_must zinject -a -t data -C 0,1 -e io $mntpnt/file

log_must zfs scrub $mntpnt/file
log_must is_pool_without_errors $TESTPOOL true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test corrupts the first two DVAs of $mntpnt/file then I would expect scrub to say "I repaired data" rather than "everything was fine". Does this check that scrub repaired data here?

Comment on lines +1976 to +1977

function is_pool_without_errors #pool <verbose>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this return true if there were no errors, or does it return true if there were errors but they were all repaired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there are no more known errors.

@tonyhutter
Copy link
Contributor

If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

Yes, that is correct. You will see the file on error list.

Hmm.. I'm worried that the user will think their files are corrupted when they really arn't. Is there a way around that?

Also, minor typo in commit message:

By generating these records we can simbply scurb it.

Lastly, are you planning to add delegation support for zfs scrub? If so, I could see some potential issues there.

@oshogbo
Copy link
Contributor Author

oshogbo commented May 10, 2024

Hmm.. I'm worried that the user will think their files are corrupted when they really arn't. Is there a way around that?

We could either create a separate list for such files or add some flags that indicate that file was added manually (however I think this would require the changes on disk format).

Lastly, are you planning to add delegation support for zfs scrub? If so, I could see some potential issues there.

The user who has access to the file can scrub it. So I didn't think about doing a delegation.

@tonyhutter
Copy link
Contributor

Hmm.. I'm worried that the user will think their files are corrupted when they really arn't. Is there a way around that?

We could either create a separate list for such files or add some flags that indicate that file was added manually (however I think this would require the changes on disk format).

Unfortunately, I think we would have to do this. We don't want the user to see files flagged in zpool status -v while they are being scrubed.


The user who has access to the file can scrub it. So I didn't think about doing a delegation.

This has some unfortunate side effects.

The unprivileged user would be able to start a scrub on their files, but they wouldn't be able to pause or cancel the scrub, which seems like something they should be able to do.

An unprivilaged user could also do:

while [ 1 ] ; do zfs scrub /mydataset/myfile ; fi

... and prevent anyone else from doing a scrub of anything (including the admin's "monthly scrub cron job").

I get that it makes sense to make this a zfs command since it operates on files. However, it is this weird gray zone area where it's a file-level scrub, but it affects the entire pool. Would it make sense to make per-file scrubbing part of zpool scrub? So like: zpool scrub /mydataset/myfile? Or is there a specific use case you're thinking of where it would be useful for unprivilaged users to do zfs scrub? I don't have a specific opinion on this, just discussing pros and cons.

@oshogbo
Copy link
Contributor Author

oshogbo commented May 10, 2024

Unfortunately, I think we would have to do this. We don't want the user to see files flagged in zpool status -v while they are being scrubed.

I will try to figure out something.

I get that it makes sense to make this a zfs command since it operates on files. However, it is this weird gray zone area where it's a file-level scrub, but it affects the entire pool. Would it make sense to make per-file scrubbing part of zpool scrub? So like: zpool scrub /mydataset/myfile? Or is there a specific use case you're thinking of where it would be useful for unprivilaged users to do zfs scrub? I don't have a specific opinion on this, just discussing pros and cons.

No I don't have. However, conceptually I like that this scrub is separated from the "global scrub". Maybe we can simply do a zfs scrub a part of zpool scrub permissions set? So if you can run zpool scrub, you can also run zfs scrub.

@jumbi77
Copy link
Contributor

jumbi77 commented Jun 10, 2024

Can I ask the progress of this? Scrubbing a file (or even a dataset, see other PR) would be a pretty great feature since subbing really big pools is time consuming.

If this got landed, it is imaginable that Klara, Inc. or Wasabi Technology, Inc. also sponsoring to finish the dataset-wide scrubbing?

In any case much thanks.

@oshogbo
Copy link
Contributor Author

oshogbo commented Jun 10, 2024

@jumbi77 I have switched to another task (the scrub range), although the plan is to add it sooner then later. Probably with a second errlist for the files that have been ordered manually.

@jumbi77
Copy link
Contributor

jumbi77 commented Sep 20, 2024

@jumbi77 I have switched to another task (the scrub range), although the plan is to add it sooner then later. Probably with a second errlist for the files that have been ordered manually.

@oshogbo Can I may ask the status of this? Just sharing my thoughts that for many users a file-based and also dataset-based ( #7257 ) would be awesome for many zfs users. In any case much thanks for the work!

@behlendorf behlendorf self-requested a review September 21, 2024 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants