Detaching 1/N resilvering disks caused remaining N-1 resilver to instantly succeed without completing #780

Open
josephvusich opened this issue Dec 12, 2020 · 0 comments

TL;DR: I attached a third disk to every mirror VDEV. The new disk attached to the special VDEV was clearly bad (write errors), so I detached it. The resilver on the remaining new disks then instantly "completed" without error, even though it should have continued for hours. A manual scrub confirmed the problem: it found millions of CKSUM errors on the disks that had been incorrectly marked as resilvered.

A more detailed walkthrough follows. Note that mirror-0 and mirror-2 are HDDs, while the special mirror-1 consists of SSDs.

OS/ZFS version

$ zfs version
zfs-1.9.4-0
zfs-kmod-1.9.4-0

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.15.6
BuildVersion:   19G2021

Initial pool layout

	NAME           STATE     READ WRITE CKSUM
	tank           ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    media-0-0  ONLINE       0     0     0
	    media-0-1  ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    media-2-0  ONLINE       0     0     0
	    media-2-1  ONLINE       0     0     0
	special	
	  mirror-1     ONLINE       0     0     0
	    media-1-0  ONLINE       0     0     0
	    media-1-1  ONLINE       0     0     0

Attaching a third disk to each mirror

zpool attach tank media-1-0 /dev/disk8
zpool attach tank media-0-0 /dev/disk9
zpool attach tank media-2-0 /dev/disk10

One bad disk identified during resilver

(resilvering)

	NAME           STATE     READ WRITE CKSUM
	tank           ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    media-0-0  ONLINE       0     0     0
	    media-0-1  ONLINE       0     0     0
	    disk9      ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    media-2-0  ONLINE       0     0     0
	    media-2-1  ONLINE       0     0     0
	    disk10     ONLINE       0     0     0
	special	
	  mirror-1     ONLINE       0     0     0
	    media-1-0  ONLINE       0     0     0
	    media-1-1  ONLINE       0     0     0
	    disk8      ONLINE       0 4.08M   326

Detach bad disk

zpool detach tank /dev/disk8

ZFS stops the resilver for the remaining disks without error

(no resilver in progress)

	NAME           STATE     READ WRITE CKSUM
	tank           ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    media-0-0  ONLINE       0     0     0
	    media-0-1  ONLINE       0     0     0
	    disk9      ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    media-2-0  ONLINE       0     0     0
	    media-2-1  ONLINE       0     0     0
	    disk10     ONLINE       0     0     0
	special	
	  mirror-1     ONLINE       0     0     0
	    media-1-0  ONLINE       0     0     0
	    media-1-1  ONLINE       0     0     0
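
For anyone trying to reproduce this, it may help to record the scan state immediately before and after the `zpool detach`. The following is a minimal sketch, not part of the original report: `scan_state` is a hypothetical helper name, and it assumes the `scan:` line of `zpool status` uses the usual upstream wording ("resilver in progress", "scrub in progress"), which the summaries above only paraphrase.

```shell
# Classify the scan state from `zpool status` text supplied on stdin.
# Assumption: the status output contains a line beginning with "scan:".
scan_state() {
    awk '
        $1 == "scan:" {
            if ($2 == "resilver" && $3 == "in")   state = "resilvering"
            else if ($2 == "scrub" && $3 == "in") state = "scrubbing"
            else                                  state = "idle"
        }
        # No "scan:" line seen at all: report that rather than guessing.
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage against a live pool (pool name taken from this report):
#   zpool status tank | scan_state
```

If the behavior described here reproduces, the helper would report `resilvering` before the detach and `idle` right after it, long before the remaining disks could plausibly have finished.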

Start scrub

zpool scrub tank

The resilver was clearly not finished

(scrub in progress)

	NAME           STATE     READ WRITE CKSUM
	tank           ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    media-0-0  ONLINE       0     0     0
	    media-0-1  ONLINE       0     0     0
	    disk9      ONLINE       0     0 3.08M
	  mirror-2     ONLINE       0     0     0
	    media-2-0  ONLINE       0     0     0
	    media-2-1  ONLINE       0     0     0
	    disk10     ONLINE       0     0 2.75M
	special	
	  mirror-1     ONLINE       0     0     0
	    media-1-0  ONLINE       0     0     0
	    media-1-1  ONLINE       0     0     0
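
The per-device error columns above can be checked mechanically. This is a rough sketch under stated assumptions: `nonzero_errors` is a hypothetical helper name, and it assumes device rows follow the `NAME STATE READ WRITE CKSUM` layout shown in this report, with a state drawn from the standard vdev states.

```shell
# Print every device row from `zpool status`-style text on stdin whose
# READ, WRITE, or CKSUM column is nonzero.
nonzero_errors() {
    awk '
        # Device rows: NAME STATE READ WRITE CKSUM (optionally followed by a note).
        # Matching on a known state skips the header and lines such as "errors: ...".
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
            if ($3 != "0" || $4 != "0" || $5 != "0")
                printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage against a live pool (pool name taken from this report):
#   zpool status tank | nonzero_errors
```

Run against the scrub output above, this would list only `disk9` and `disk10`, the two disks whose resilver was cut short.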

@josephvusich changed the title from "Detach of 1/N resilvering disks caused remaining N-1 resilver to instantly succeed without completing" to "Detaching 1/N resilvering disks caused remaining N-1 resilver to instantly succeed without completing" on Dec 12, 2020