Never boots, saying rpool is in use from other system #2195

Closed
jgoerzen opened this issue Mar 20, 2014 · 19 comments

@jgoerzen

On every boot on this system, I get this message:

FAIL: zpool import -c /etc/zfs/zpool.cache -N rpool . Retrying...
FAIL: zpool import -N -d /dev/disk/by-id rpool . Retrying...

Command: zpool import -N -d /dev/disk/by-id rpool
Message: cannot import 'rpool': pool may be in use from other system
use '-f' to import anyway
Error: 1

Manually import the root pool at the command prompt then exit.
Hint: Try: zpool import -R / -N rpool

Running:

zpool import -fR / -N rpool; exit

lets the system boot.

This issue was first reported on the mailing list at https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-discuss/RggMKyj-64A/discussion but no resolution was reached. I am unsure if it is a zfs or zfs-pkg bug. I do not appear to have hostid issues. The disk is never in use by another system.

@ryao
Contributor

ryao commented Mar 20, 2014

The problem is that the hostid for the pool in the initramfs does not match the hostid of the actual pool. Run zpool set cachefile= rpool and then rebuild your initramfs. That should clear this problem. If it does not, the initramfs software needs modification to store the system hostid correctly. You could work around it by using the spl.spl_hostid kernel command-line parameter to override it, should your initramfs software support that.
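
For concreteness, that sequence would look roughly like this on Debian (update-initramfs is the generator mentioned later in this thread; other distributions use dracut or genkernel):

    zpool set cachefile= rpool    # reset cachefile to the default so /etc/zfs/zpool.cache is regenerated
    update-initramfs -u -k all    # rebuild the initramfs so it ships the fresh cachefile

The kernel command-line override would be something like spl.spl_hostid=0x12345678, where the value here is only a placeholder.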

@jgoerzen
Author

I've rebuilt the initramfs numerous times. The hostid command from within the initramfs, the SPL message in dmesg from mere seconds after boot, and the hostid command on the running system all show the same hostid as well.
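
For reference, those checks (plus the pool-label check suggested later in this thread) can be spelled out as follows; the disk path is illustrative:

    hostid                                        # hostid of the running system
    dmesg | grep -i hostid                        # the SPL hostid message from early boot
    zdb -l /dev/disk/by-id/<disk> | grep hostid   # hostid stamped on the pool's labels

All of these should agree; a mismatch is what makes zpool import demand -f.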

zpool set cachefile= rpool, by itself, does nothing. If I then rebuild the initramfs, things progress a little further. However, it still doesn't boot, with this message:

FAIL: zpool import -c /etc/zfs/zpool.cache -N rpool . Retrying...
FAIL: zpool import -N -d /dev/disk/by-id rpool . Retrying...

Command: zpool import -N -d /dev/disk/by-id rpool
Message: cannot import 'rpool': a pool with that name is already created/imported,
and no additional pools with that name were found
Error: 1

followed by the same hints.

This time, I just have to run exit to boot.

One other note: something keeps changing cachefile back to none on this pool; it does not stay at the empty string.

@jgoerzen
Author

@FransUrbo You might also be interested in this discussion.

@ryao
Contributor

ryao commented Mar 20, 2014

The command I provided should regenerate zpool.cache. Which distribution and initramfs generator are you using? It looks like the software is not able to handle verbatim import from the cachefile. In that case, making an empty cachefile would work. I believe that I solved this problem in Gentoo by modifying genkernel to autodetect verbatim import from the cachefile and skip that step.
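
As a sketch, shipping an empty cachefile could look like this (whether the initramfs scripts then skip the cachefile import depends on the generator):

    zpool set cachefile=none rpool   # stop ZFS from rewriting the cachefile
    : > /etc/zfs/zpool.cache         # truncate the cachefile to zero bytes
    update-initramfs -u -k all       # rebuild so the initramfs carries the empty file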

@jgoerzen
Author

This is Debian with its default initramfs generator (hence the @FransUrbo cc).

It was working fine with the 0.6.2 support; the 0.6.3 dailies have shown this breakage.

@jgoerzen
Author

How could I go about debugging hostid issues?

@FransUrbo
Contributor

In the current released version, the initrd would first try to import the pool without using force. If that failed, it would then try a forced import.

In my dailies, I've removed that (because it's considered, if not 'evil', at least 'bad practice'). This unfortunately means that more and more people are reporting import failures. It only affects people who have their root on ZFS.

The reason for this is that the pool isn't exported properly when the system shuts down. The new init scripts in the dailies do try to do this correctly (and that code is sound!). Unfortunately, since one is booting from the filesystem/pool, it cannot be exported (because it is in use by the very script that tries to export it!).

Technically, this is not a problem, because the filesystem is remounted read-only a few moments earlier. It only becomes a problem at the next import (it will be reported as 'in use by another system', because the pool wasn't exported properly).
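
Conceptually, the failing ordering looks like this (a sketch of the sequence described above, not the literal init script):

    zpool export rpool      # attempted at shutdown; fails because rpool still backs the running root
    mount -o remount,ro /   # what effectively happens instead: root goes read-only, pool stays imported
    # result: no clean export, so the next boot reports "pool may be in use from other system"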

I have no real, good way of solving this, unfortunately :(. @behlendorf mentioned somewhere (a couple of years ago) that the hostid 'stuff' will eventually be removed (because it no longer serves a purpose, if I remember correctly). Then this might go away.

But until then, adding the 'zfsforce=yes' option on the kernel command line will help. It is not the good and proper way, but it will work. And if you ONLY use your pool on one single computer with only one OS (as opposed to importing it on multiple computers with many different operating systems), there won't be any problem.
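
On a GRUB-based setup that typically means editing /etc/default/grub and regenerating the config; the boot=zfs value below is only a placeholder for whatever options are already present:

    GRUB_CMDLINE_LINUX="boot=zfs zfsforce=yes"   # in /etc/default/grub
    update-grub                                  # regenerate grub.cfg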

@behlendorf behlendorf added this to the 0.6.5 milestone Mar 21, 2014
@behlendorf behlendorf added the Bug label Mar 21, 2014
@behlendorf
Contributor

Yes, I'd like to remove the existing hostid implementation in favor of proper multi-mount protection. That work is described in #745 and should resolve this issue; however, we haven't yet scheduled anyone to do it.

@jgoerzen
Author

Thanks everyone for your help.

The way I see it, there are at least these three open questions:

  1. Why does zpool import fail without -f, given that everything I can see suggests that the hostid matches everywhere?

  2. Why does zpool set cachefile= work around that first problem?

  3. Why does the system claim the pool is already imported after setting cachefile=?

@FransUrbo, have you seen that third problem anywhere before? I can readily duplicate all of these.

@agijsberts

Are there any updates on this issue?

This still seems to be relevant with release 0.6.3: I only manage to boot from ZFS without errors (the same ones jgoerzen reported) if I build the initramfs without zpool.cache (and with zfsforce=1). It even gives the error when I write an explicit /etc/hostid and also include that file in the initramfs. Like jgoerzen, I double-checked that this hostid matches the pool's hostid and the hostid used by SPL during boot.

@behlendorf
Contributor

@agijsberts For reasons like this, in 0.6.3 we've set the default hostid to 0, which disables the hostid check. What you're going to want to do is make sure the hostid for your system gets set to 0 on boot by removing your /etc/hostid file. Then force import the pool and export it. At this point your pool should no longer contain a specific hostid and will no longer perform this check. You can verify that's the case by running zdb -l <device> on any of the disks and making sure there is no hostid entry listed.
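
Spelled out as commands (the disk path is illustrative; a root pool would need to be exported from an environment where it is not busy, e.g. a rescue system):

    rm /etc/hostid                                # system hostid falls back to 0
    zpool import -f rpool                         # one last forced import
    zpool export rpool                            # a clean export clears the hostid from the labels
    zdb -l /dev/disk/by-id/<disk> | grep hostid   # should now print nothing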

If you need to run a fail over configuration in the future you'll need to explicitly create /etc/hostid files to enable this support. See openzfs/spl#224.

@agijsberts

@behlendorf Thanks for your suggestions; they sound like the (default) setup I had originally. To be sure, I removed /etc/hostid and rebuilt the initramfs. After export/import, the pools no longer have any hostid attached and SPL reports hostid=00000000. Unfortunately, I still get the following error (the same as the second one reported by jgoerzen):

FAIL: zpool import -c /etc/zfs/zpool.cache -N zroot -f. Retrying...
FAIL: zpool import -N -d /dev/disk/by-id zroot -f . Retrying...

Command: zpool import -N -d /dev/disk/by-id zroot -f
Message: cannot import 'zroot': a pool with that name is already created/imported,
and no additional pools with that name were found
Error: 1

At this point the pools are actually mounted and I can resume system boot simply by exiting the emergency shell (CTRL-D), so to my untrained eye it appears that it tries to import the pool twice.

So far the only ways I have found to avoid this error are either (1) to build the initramfs without zpool.cache or (2) to explicitly set /etc/hostid. It might be a user error somewhere, but I'm drawing a blank as to what it could be.

@StephanieSunshine

I just did an install this morning using Linux Mint Debian (MATE) 64-bit rolling release with a ZFS (0.6.3) root, and I'm experiencing this problem as well. I have tried removing /etc/hostid, adding zfsforce=1, and zpool set cachefile= as ryao had suggested, and nothing is working. Every boot I'm forced to type zpool import -f -N rpool ; exit to get it started.

I tried zdb -l | grep hostid and I see nothing. I did notice that hostname was set to '(none)'; could this be a problem?

Anyone else have any other suggestions?

@agijsberts

@FuzzySunshine This refers to a different problem than the one discussed here, but it seems you are trying to boot from the ZFS root dataset: did you try removing the trailing slash from the cmdline in grub.cfg? (See zfsonlinux/grub#15.) Also make sure to try rebuilding the initramfs without zpool.cache.

@StephanieSunshine

@agijsberts Thank you for replying :)

I looked at the bug you linked and I don't think it applies, because I ended up writing my own grub.cfg line: " linux /vmlinuz-3.11-2-amd64 bootfs=rpool/ROOT/debian-live-1 boot=zfs ro ". I did try deleting zpool.cache and recreating the initramfs (update-initramfs -u -k all), with no luck. I did manage to figure out that when the boot does bork, if I force import and then export rpool and then reboot instead of exiting, the next boot completes just fine without any interaction.

Any suggestions?

@agijsberts

@FuzzySunshine You're right, in your case the cmdline issue does not apply. Make sure, though, to include zfsforce=1 in the cmdline; this is absolutely required (see FransUrbo's comment above).

I'm not a ZFS developer, so unfortunately I can merely suggest the things that helped in my case. As a temporary solution, you can also try writing /etc/hostid explicitly. Then export+import your pools, double-check with zdb that the hostid has been set (iirc zdb converts the bytes to decimal), and recreate the initramfs. ZFS is moving away from this reliance on hostid, but at least this might help you right now (it worked in my case). If it doesn't, you might want to move the issue to the mailing list, where more people might see it.
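
A sketch of that procedure: glibc reads /etc/hostid as four native-endian bytes, so on a little-endian machine a hostid of 0xdeadbeef (an arbitrary example value) is written least-significant byte first:

    printf '\xef\xbe\xad\xde' > /etc/hostid   # hostid is now 0xdeadbeef
    hostid                                    # verify: prints deadbeef
    zpool export zroot && zpool import zroot  # re-stamp the pool (from an environment where it is not busy)
    update-initramfs -u -k all                # carry the same /etc/hostid into the initramfs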

@l1k
Contributor

l1k commented Oct 6, 2014

Likely fixed by #2766 if Dracut is used.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Oct 7, 2014
Make use of Dracut's ability to restore the initramfs on shutdown and
pivot to it, allowing for a clean unmount and export of the ZFS root.
No need to force-import on every reboot anymore.

Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2195
Issue openzfs#2476
Issue openzfs#2498
Issue openzfs#2556
Issue openzfs#2563
Issue openzfs#2575
Issue openzfs#2600
Issue openzfs#2755
Issue openzfs#2766
@behlendorf
Contributor

The combination of d94fd5f and 07a3312, which are now in master, should resolve this issue.

@behlendorf behlendorf modified the milestones: 0.6.4, 0.6.5 Oct 31, 2014
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
@behlendorf
Contributor

Closing, this was fixed in master.
