Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zpool import progress indicator #8646

Closed
wants to merge 2 commits into from

Conversation

ofaaland
Copy link
Contributor

@ofaaland ofaaland commented Apr 20, 2019

Motivation and Context

When an import requires a long MMP activity check, or when the user
requests pool recovery, the import make take a long time. The user may
not know why, or be able to tell whether the import is progressing or is
hung.

Description

A previous commit added a kstat which lists all imports currently
being processed by the
kernel (currently only one at a time is possible, but the kstat allows
for more than one). The kstat is at
/proc/spl/kstat/zfs/import_progress.

In the zpool utility, create an additional thread which periodically
checks the contents of this kstat. If the import does not finish
quickly, it reports the following on stderr:

  • Name and guid of pool being imported
  • User-friendly version of spa_load_state
  • spa_load_max_txg

If one or more of these values change, the thread prints a new record.

A new record will be printed to stderr with a maximum frequency of one
per second so as not to spam the user. As a result the printed output
may reflect only some of the import states transitioned through.

Sample output:

Pool tank1 (guid 4591991398949870326):
Checking for a remote import.
Check will complete by Tue Apr 16 08:43:58 2019

Pool tank1 (guid 4591991398949870326):
Checking for a remote import.
Check will complete by Tue Apr 16 08:44:43 2019
Recovering Pool max txg 745

How Has This Been Tested?

Hand testing with and without multihost check triggered, and with and without zpool -FX -T arguments.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@ofaaland ofaaland added the Status: Work in Progress Not yet ready for general review label Apr 20, 2019
@richardelling
Copy link
Contributor

The idea is good. But I’m not sure about stderr. It seems more consistent to have a zpool import —verbose option and print to stdout.

Perhaps some of the other automation and appliance builders can comment on how this might affect the automation and reporting?

@ofaaland
Copy link
Contributor Author

@richardelling thanks for the feedback.

@codecov
Copy link

codecov bot commented Apr 21, 2019

Codecov Report

Merging #8646 (6555737) into master (161ed82) will increase coverage by 4.49%.
The diff coverage is 9.09%.

❗ Current head 6555737 differs from pull request most recent head c7b9aa8. Consider uploading reports for the commit c7b9aa8 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8646      +/-   ##
==========================================
+ Coverage   75.17%   79.67%   +4.49%     
==========================================
  Files         402      394       -8     
  Lines      128071   124719    -3352     
==========================================
+ Hits        96283    99372    +3089     
+ Misses      31788    25347    -6441     
Flag Coverage Δ
kernel 80.39% <ø> (+1.63%) ⬆️
user 65.68% <9.09%> (+18.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/zpool/zpool_main.c 80.12% <9.09%> (-0.70%) ⬇️
include/os/linux/zfs/sys/trace_acl.h 33.33% <0.00%> (-33.34%) ⬇️
cmd/mount_zfs/mount_zfs.c 42.85% <0.00%> (-20.91%) ⬇️
cmd/zed/zed_file.c 33.80% <0.00%> (-8.42%) ⬇️
cmd/zed/zed_exec.c 82.92% <0.00%> (-7.49%) ⬇️
cmd/zvol_id/zvol_id_main.c 76.31% <0.00%> (-5.27%) ⬇️
lib/libzfs/os/linux/libzfs_mount_os.c 68.86% <0.00%> (-4.97%) ⬇️
module/zfs/vdev_missing.c 60.00% <0.00%> (-3.64%) ⬇️
module/zfs/abd.c 91.61% <0.00%> (-3.36%) ⬇️
cmd/raidz_test/raidz_test.c 79.47% <0.00%> (-3.15%) ⬇️
... and 259 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec64fdb...c7b9aa8. Read the comment docs.

@ofaaland ofaaland mentioned this pull request Apr 30, 2019
12 tasks
@ofaaland ofaaland force-pushed the b-import-progress-thread branch 4 times, most recently from d47cd40 to 6555737 Compare August 17, 2020 06:48
@ahrens
Copy link
Member

ahrens commented Jun 4, 2021

@ofaaland This seems like a nice enhancement to zpool import. I see that it's marked "work in progress". What additional work needs to be done to get it ready to review and integrate?

@ahrens ahrens added the Type: Feature Feature request or new feature label Jun 4, 2021
@ofaaland
Copy link
Contributor Author

ofaaland commented Jun 5, 2021 via email

@ofaaland
Copy link
Contributor Author

ofaaland commented Jun 5, 2021

@ofaaland This seems like a nice enhancement to zpool import. I see that it's marked "work in progress". What additional work needs to be done to get it ready to review and integrate?

I confirmed this PR is just for the utility changes. The rebase was fairly clean so it might still work fine (haven't tested yet). I haven't written tests for the utility changes yet. I'll work on that - it would be nice to get this done.

@ofaaland ofaaland force-pushed the b-import-progress-thread branch 3 times, most recently from a6584d0 to 986d678 Compare June 7, 2021 04:44
@ofaaland ofaaland force-pushed the b-import-progress-thread branch 4 times, most recently from 290864e to ddcc1c1 Compare July 21, 2021 16:26
@ofaaland ofaaland added Status: Design Review Needed Architecture or design is under discussion and removed Status: Work in Progress Not yet ready for general review labels Jul 21, 2021
@ofaaland
Copy link
Contributor Author

ofaaland commented Jul 21, 2021

Hi, before reviewing this for detail stuff, I have a question.

I've altered the existing MMP tests to use the output of zpool import -v so that the the multihost related output has some test coverage. I don't currently have any test coverage for the TXG rewind portion of the output because I don't see an existing way to make the import slow other than forcing a MMP activity check. As a consequence, the import would complete (or fail) before my TXG rewind related output would appear.

Options I've come up with are:

  1. Create a new test that sets up an import which requires an MMP check and does a TXG rewind. Then at least I can confirm the rewind-related text appears.
  2. Create a mechanism to artificially slow down the import, with a sleep() somewhere in zutil_import.c triggered by the presence of a special environment variable
  3. Create a mechanism to artificially slow down the import, in the kernel, triggered by an undocumented module option or a new zinject mechanism.

I don't really think options 2 or 3 really add much, although the 3 would allow the test to check which TXG numbers reported are reasonable.

Do you have any thoughts? Thanks

@ofaaland ofaaland added Status: Feedback requested More information is requested and removed Status: Design Review Needed Architecture or design is under discussion labels Jul 22, 2021
@ofaaland ofaaland changed the title zpool import progress indicator [WIP] zpool import progress indicator Oct 9, 2021
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
@ofaaland ofaaland force-pushed the b-import-progress-thread branch 8 times, most recently from e44440b to e930c99 Compare October 20, 2021 06:04
@ofaaland
Copy link
Contributor Author

@freqlabs @nabijaczleweli @asomers I have a linux/freebsd compat question and see you've contributed several freebsd-related patches.

My patch needs to read a kstat. On linux that's done with something like fd = open("/proc/spl/kstat/<kstat-name>", O_RDONLY); read(fd, buf, bufsize); .

Googling around it looks like kstats are accessible on freebsd systems under /dev/kstat/ and there's a family of functions kstat_open(), kstat_read(), etc. for that purpose. I looked quite a bit and find no place in the existing ZFS code where a C function reads a kstat to use as a model. What's the right way? Is there already a mechanism for abstracting this in the ZFS code that I overlooked? Thank you.

@asomers
Copy link
Contributor

asomers commented Oct 21, 2021

@freqlabs @nabijaczleweli @asomers I have a linux/freebsd compat question and see you've contributed several freebsd-related patches.

My patch needs to read a kstat. On linux that's done with something like fd = open("/proc/spl/kstat/<kstat-name>", O_RDONLY); read(fd, buf, bufsize); .

Googling around it looks like kstats are accessible on freebsd systems under /dev/kstat/ and there's a family of functions kstat_open(), kstat_read(), etc. for that purpose. I looked quite a bit and find no place in the existing ZFS code where a C function reads a kstat to use as a model. What's the right way? Is there already a mechanism for abstracting this in the ZFS code that I overlooked? Thank you.

There is no /dev/kstat, kstat_open, kstat_read on FreeBSD. Instead, use the sysctl interface. From sh, do something like sysctl kstat.zfs.zroot.dataset.objset-0x7923.reads. Or from C, use sysctlbyname(3).

@ofaaland
Copy link
Contributor Author

There is no /dev/kstat, kstat_open, kstat_read on FreeBSD. Instead, use the sysctl interface. From sh, do something like sysctl kstat.zfs.zroot.dataset.objset-0x7923.reads. Or from C, use sysctlbyname(3).

Awesome, thank you

@ofaaland ofaaland force-pushed the b-import-progress-thread branch 3 times, most recently from b1a689e to 1df3d3a Compare October 22, 2021 06:24
On linux, kstats are fetched with open(),read(),close() or similar.
On freebsd, they are fetched with sysctl.

Create compat functions that do the right thing for the platform.
Currently handles only kstats which are directly below "zfs".

Signed-off-by: Olaf Faaland <[email protected]>
When an import requires a long MMP activity check, or when the user
requests pool recovery, the import make take a long time.  The user may
not know why, or be able to tell whether the import is progressing or is
hung.

Add a new option to "zpool import", "-v" (for verbose).

When the -v option is used, create an additional thread which
periodically checks the contents of kstat/zfs/import_progress.  If the
import does not finish quickly, it reports the following on stdout:
* Name and guid of pool being imported
* User-friendly version of spa_load_state
* Expected time import will complete
* the max txg if recovering the pool

If one or more of these values change, the thread prints a new record.

A new record will be printed to stdout with a maximum frequency of one
per second so as not to spam the user.  As a result the printed output
may reflect only some of the import states transitioned through.

Use import kstat to check for Multihost activity check in relevant
tests, instead of using import duration, which works poorly when testing
on a slow machine.

Sample output:

	Pool tank1 (guid 4591991398949870326):
	Checking for a remote import.
	Check will complete by Tue Apr 16 08:43:58 2019

	Pool tank1 (guid 4591991398949870326):
	Checking for a remote import.
	Check will complete by Tue Apr 16 08:44:43 2019
	Recovering Pool max txg 745

Signed-off-by: Olaf Faaland <[email protected]>
@ghost
Copy link

ghost commented Oct 25, 2021

The sysctls on FreeBSD are typed, they're not all text like the kstat files on Linux. That throws a bit of a monkey wrench into the kstat_read interface. It will work for the import_progress kstat because we happened to reuse most of what Linux was doing for that instead of having separate properly typed fields, but more generally we don't expose kstats as text. Briefly looking over kstat_read(3) and kstat(3kstat) manuals for illumos, it seems to be a similar situation there. Linux is the oddball in exposing kstats as strings of text.
The kstats are also organized differently on FreeBSD, so import_progress is in kstat.zfs.misc.import_progress rather than kstat.zfs.import_progress, among other things. Here's the structure and types of kstat.zfs sysctl nodes on a system with two pools "storage" and "system-nvme" imported:
https://gist.github.com/freqlabs/0039bd94c498b17c39031a9311e25928

@ofaaland
Copy link
Contributor Author

Thanks, @freqlabs

@ghost
Copy link

ghost commented Nov 9, 2021

To move this forward I think it makes sense to abandon the idea of a general kstat_read library function and make it something like read_import_progress so it's clear this function isn't a general purpose kstat reader. A real implementation of kstat_read is tricky enough to be outside the scope of this work IMO.

@ofaaland
Copy link
Contributor Author

ofaaland commented Nov 9, 2021

@freqlabs,

it makes sense to abandon the idea of a general kstat_read library function and make it something like read_import_progress so it's clear this function isn't a general purpose kstat reader

I agree, I'll do that. Thanks for the information.

@amotin
Copy link
Member

amotin commented Oct 31, 2024

This is mostly replaced by more detailed #15539 & Co.

@amotin amotin closed this Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants