
GRUB fails to resolve canonical path to device, uses invalid partition and fails to detect zfs #5

Open
seletskiy opened this issue Dec 31, 2013 · 53 comments


@seletskiy

Let's suppose the following scenario:

I want to create a zpool that uses the entire device /dev/sda:

# ls -al /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001   
lrwxrwxrwx 1 root root 9 Dec 31 15:46 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 -> ../../sda

I'm using the by-id path to the disk while creating the zpool:

# zpool create zroot /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001
# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

    NAME                         STATE     READ WRITE CKSUM
    zroot                        ONLINE       0     0     0
      ata-QEMU_HARDDISK_QM00001  ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
zroot   110K  3.91G    30K  /zroot

zroot is mounted at /zroot.

So, trying to grub-probe:

# grub-probe /zroot
grub-probe: error: failed to get canonical path of `/dev/ata-QEMU_HARDDISK_QM00001'.

Wut? So, GRUB detected that /zroot is ZFS (otherwise, how did it know about the ata-QEMU stuff?), but it fails to resolve the correct path to the device.

OK, let's try to fix it in a dirty way:

# ln -s /dev/sda /dev/ata-QEMU_HARDDISK_QM00001
# grub-probe /zroot
grub-probe: error: unknown filesystem.
# grub-probe -vv /zroot
grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:319: opening the device `/dev/sda' in open_device()
grub-core/fs/zfs/zfs.c:1183: label ok 0
grub-core/fs/zfs/zfs.c:1183: label ok 1
...
grub-core/kern/fs.c:78: zfs detection failed.

Didn't work. Hmmm... Looks like grub-probe tries to read from /dev/sda.
Take a look:

# zdb -l /dev/sda 
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
...

 # fdisk -l /dev/sda
Device           Start          End Size Type
/dev/sda1         2048      8370175   4G Solaris /usr & Apple ZFS
/dev/sda9      8370176      8386559   8M Solaris reserved 1

# zdb -l /dev/sda1

--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 4
    pool_guid: 16263322471539432696
    hostname: 'archiso'
    top_guid: 9383719665143350581
    guid: 9383719665143350581
...

Indeed!

# ln -sf /dev/sda1 /dev/ata-QEMU_HARDDISK_QM00001
# grub-probe /zroot
zfs

After that series of hacks I can do grub-install and it finishes successfully.

So, my conclusion is that GRUB uses an incorrect device path when the zpool is created over an entire device.

@FransUrbo
Contributor

On Dec 31, 2013, at 5:48 PM, Stanislav Seletskiy wrote:

grub-probe /zroot

grub-probe: error: failed to get canonical path of `/dev/ata-QEMU_HARDDISK_QM00001'.
Wut? So, GRUB detected that /zroot is ZFS (otherwise, how did it know about the ata-QEMU stuff?), but it fails to resolve the correct path to the device.

Better question still: Why is it interested in that device in the first place?

The man page for grub-probe says:

Probe device information for a given path

But for a ZFS filesystem, this should be the dataset, not a physical device in the pool!

I'm specifying the device to write to on the grub-install command line, so it shouldn't even care about
that device...

PS. I got the same error with the latest grub. Ever since I started trying to use a ZFS root in April, I've been having trouble with grub (mostly grub-probe). I had to resort to a hacked grub-probe (a shell script with hardcoded values).
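
For illustration, such a wrapper might look roughly like this - a hypothetical sketch, not the actual script; the device and label answers are made-up values:

#!/bin/sh
# Hypothetical hardcoded stand-in for grub-probe: answer the queries
# grub-mkconfig makes for a known ZFS root with fixed values.
case "$*" in
  *--target=fs_label*) echo zroot ;;      # pool/filesystem label
  *--target=device*)   echo /dev/sda1 ;;  # device backing the root pool
  *--target=fs*)       echo zfs ;;        # filesystem type
  *)                   exit 1 ;;
esac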

@seletskiy
Author

Better question still: Why is it interested in that device in the first place?

Truth! I was experimenting with a single-vdev configuration, where this approach to accessing a device might be somehow doable, but in a striped or raidz configuration it would be utterly wrong...

@0x54726F6E

I can confirm this is still an issue with Debian Wheezy, with grub-mkconfig 2.01-22debian1+zfs3~wheezy.

Running update-grub with set -x/+x set in grub-probe and /etc/grub.d/00_header shows that the problem is that grub-probe runs on /dev/sda1 to probe for the filesystem, which is "unknown" to grub-probe. Some purging and reinstalling later, the error is still:

grub-probe: error: failed to get canonical path of `udev'. So no booting from a ZFS pool for now.
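
For anyone who wants to reproduce that tracing: grub-mkconfig is itself a shell script, so it can be run under sh -x without editing anything (a sketch; paths as on a standard Debian layout):

# Trace the config generator and watch which devices grub-probe is handed
sh -x /usr/sbin/grub-mkconfig -o /dev/null 2>&1 | grep grub-probe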

@Logos01

Logos01 commented May 1, 2014

I am also experiencing failed grub installations. Ubuntu 14.04 with the latest packages (as of 2014-04-30). I am also receiving the "grub-probe: error: failed to get canonical path of 'udev'." error.

@maci0

maci0 commented May 9, 2014

I have a similar problem:
grub2-probe fails when 'zpool' is not in the PATH.

sh-4.2# /usr/sbin/grub2-probe --target=device /
/usr/sbin/grub2-probe: error: failed to get canonical path of ‘ZoL-2316/ROOT/rhel7’.
sh-4.2# export PATH=$PATH:/usr/sbin
sh-4.2# /usr/sbin/grub2-probe --target=device /
/dev/nbd0p2

@erpadmin

I use ryao's method of full-disk ZFS, which is a bit different from your approach (shown later).

Anyway, this is not an issue for me on raring (more specifically Mint 15), but the error the OP reported cropped up when testing out saucy (Mint 16). I had to use raring's zfs-grub PPA, as there doesn't seem to be a version specific to saucy.

Anyway here's some output on the new system:

# grub-probe -V
grub-probe (GRUB) 2.00-19ubuntu2.1

# grub-probe /
grub-probe: error: failed to get canonical path of /dev/ata-ST31500341AS_9VS40QN8.

# grub-probe --target=device /
grub-probe: error: failed to get canonical path of /dev/ata-ST31500341AS_9VS40QN8.

# /tmp/grub-probe -V
/tmp/grub-probe (GRUB) 2.00-13ubuntu3+zfs3~raring

# /tmp/grub-probe /
/tmp/grub-probe: error: unknown filesystem

# /tmp/grub-probe --target=device /
/dev/sdc1

On the older system grub-probe works fine:

olivia ~ # zpool status
  pool: pool_one
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on software that does not support
    feature flags.
  scan: resilvered 102G in 0h42m with 0 errors on Sun Oct 28 13:51:14 2012
config:

    NAME        STATE     READ WRITE CKSUM
    pool_one    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sda     ONLINE       0     0     0

errors: No known data errors
olivia ~ # grub-probe /
zfs
olivia ~ # gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 976773168 sectors, 465.8 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 91F0CCEC-AC16-F746-883C-909DF568C93B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 976773134
Partitions will be aligned on 16-sector boundaries
Total free space is 29 sectors (14.5 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       976756735   465.8 GiB   BF01  zfs
   2              48            2047   1000.0 KiB  EF02  BIOS boot partition
   9       976756736       976773119   8.0 MiB     BF07  

@dasjoe

dasjoe commented May 29, 2014

As a workaround I have udev create the required /dev/ata-* symlinks:

# cat > /etc/udev/rules.d/70-zfs-grub-fix.rules << 'EOF'
KERNEL=="sd*[a-z]1|cciss*[a-z]1", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="$env{ID_BUS}-$env{ID_SERIAL}"
EOF
# udevadm trigger
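
To verify the rule took effect, the new symlinks should show up directly under /dev (device names as in the original report; the exact set depends on your ID_BUS):

# udevadm settle
# ls -l /dev/ata-* /dev/scsi-* 2>/dev/null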

@Logos01

Logos01 commented May 29, 2014

This workaround doesn't correct the problem for me.


@storepeter

The rule above did not work for me; here is a rule which works for me:

KERNEL=="sd*[0-9]", IMPORT{parent}=="ID_*", ENV{ID_FS_TYPE}=="zfs_member", SYMLINK+="$env{ID_BUS}-$env{ID_SERIAL}-part%n"

@nelsoncs

nelsoncs commented Jul 4, 2014

14.04 broke for me at an update to kernel 3.13.0-30. Originally I had used the raring zfs-grub version, but it now failed. After removing zfs-grub and going back to the trusty grub version (see https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/050qD_mvMAg), I was having the problem with the incorrect /dev/.....

So, thank you very much for this fix. I am back in business.

@maci0

maci0 commented Jul 28, 2014

Recently ran into the same error when using whole devices (on RHEL 7).

@seletskiy
Author

Does anyone know if this repo is even alive? It's been ignored for about a year...

@maci0

maci0 commented Jul 29, 2014

With GPT disks, maybe using /dev/disk/by-partuuid helps?

@maci0

maci0 commented Jul 29, 2014

OK, it didn't help...

Also, when using /dev/sd* and not /dev/disk/by-id, the same problem occurs.

When using /dev/disk/by-id and @dasjoe's udev rule it partly works, but fails when the udev rule is not included in the initramfs image.

@danielkza

+1, I'm running into this trying a 14.04 install.

@FransUrbo
Contributor

So in conclusion:

  1. Grub doesn't check /dev/disk/by-* (it should)
  2. There's a workaround, which is ugly (if you ask me :) and only works for some

This seems to be true for upstream grub as well. The latest version as of now is 2.02~beta2. I've been trying to build that, but several tests fail. I'll keep trying, and once I can successfully build it, I'll start figuring out how to get Grub to look in /dev/disk/by-*.

@danielkza

Yeah, it's quite a strange situation, because the GRUB scripts do detect that the root is a ZFS one by falling back to stat, but then obviously fail to retrieve the pool name. Right now I'm editing /etc/grub.d/10_linux and retrieving the pool and bootfs by grep-ing zfs mount, but it doesn't seem reliable.
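
A sketch of the kind of extraction meant here (fragile, as noted; it assumes zfs mount lists the boot dataset with mountpoint /):

# The dataset mounted at / is the bootfs; the pool is its first path component
bootfs=$(zfs mount | awk '$2 == "/" { print $1 }')
rpool=${bootfs%%/*}
echo "root=ZFS=$bootfs (pool: $rpool)"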

Also, after manually creating symlinks in /dev as others pointed out, I still get a failed ZFS detection. I don't have complete logs right now (I will post them later), but I suspect that, as some folks on the Arch Linux forum figured out, GRUB's ZFS support is missing features (upstream bug).

@dasjoe

dasjoe commented Aug 12, 2014

I just realized why my workaround sometimes doesn't work: it assumes ZFS is using whole disks rather than partitions, with sdX1 being the zpool member.
A combination of both udev rules like this should work, as long as each disk contains at most one partition which is part of a zpool:

# cat > /etc/udev/rules.d/70-zfs-grub-fix.rules << 'EOF'
ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*", ENV{ID_FS_TYPE}=="zfs_member", SYMLINK+="$env{ID_BUS}-$env{ID_SERIAL} $env{ID_BUS}-$env{ID_SERIAL}-part%n"
EOF
# udevadm trigger

It would break when multiple partitions of a disk are used in zpools and you don't boot off the last (queried) zpool, as udev would overwrite $env{ID_BUS}-$env{ID_SERIAL} with that disk's last queried partition.

@danielkza

GRUB is actually using zpool status to grab the pool components and simply prepending /dev/ to them if they don't start with a slash. But ZoL only prints the basename of the device, leading to wrong paths being used (check grub-core/osdep/unix/getroot.c:309 in GRUB 2.02-beta2).

GRUB actually already has logic higher up the call chain to look for a device in /dev recursively, so it would probably be enough not to prepend anything to the device name and let it bubble up. But it made me wonder: is GRUB the one making the wrong assumption, or is ZoL non-standard in printing only the basename of devices instead of full paths relative to /dev?
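
The mismatch is easy to see from the shell, reusing the devices from the original report (illustrative):

# What ZoL prints for the vdev (basename only):
# zpool status zroot | grep ata-
      ata-QEMU_HARDDISK_QM00001  ONLINE       0     0     0
# getroot.c turns that into /dev/ata-QEMU_HARDDISK_QM00001, which does not exist;
# the node it should have found is one level down:
# readlink -f /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part1
/dev/sda1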

@FransUrbo
Contributor

GRUB is actually using zpool status to grab the pool components and simply prepending /dev/ to them if they don't start with a slash themselves.

Good catch! That is helpful; we could do something with this...
Do BSD and Solaris print the full device name, the name relative to /dev, or just the basename?

(EDIT) Yes they do [print the name relative to /dev].

@FransUrbo
Contributor

freenas# zpool status
  pool: share
 state: ONLINE
  scan: scrub repaired 0 in 59h49m with 0 errors on Tue Aug 12 20:50:19 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        share                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/3c61e7dd-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0
            gptid/3cbc0167-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0
            gptid/3d1601a7-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0
            gptid/3d71bb26-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0
            gptid/3dc9af53-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0
            gptid/3e230c8c-03a8-11e4-9ca5-003048928b1c  ONLINE       0     0     0

errors: No known data errors
freenas# ls -l /dev/gptid/3c61e7dd-03a8-11e4-9ca5-003048928b1c
crw-r-----  1 root  operator  0x8b Jul  5 20:11 /dev/gptid/3c61e7dd-03a8-11e4-9ca5-003048928b1c

@tilgovi

tilgovi commented Aug 16, 2014

The last comment from @danielkza fixed the issue for me.

@danielkza

Did you patch GRUB yourself? I resorted to using the GRUB and zfs-initramfs versions from Wheezy on Trusty. So far they seem to work alright.

@tilgovi

tilgovi commented Aug 16, 2014

Sorry... I meant @dasjoe. The udev rule worked for me. I just applied it manually and will watch every upgrade like a hawk.

@danielkza

The udev rule should work if your pool isn't too new: otherwise, even if you fix GRUB looking at the wrong device, it'd probably fail when checking the contained filesystem due to missing disk format features, as I mentioned before.

@buc0

buc0 commented Aug 19, 2014

I ran into this after upgrading from 12.04 to 14.04. A bit of a nasty shock.

Adding symbolic links got grub-probe to identify the device, but it still wasn't able to identify the filesystem as ZFS.

But, based on the information here, I was able to get it to work with the help of the following short Perl script:

#!/usr/bin/perl
# Wrapper around the real zpool binary: rewrites bare by-id vdev names
# in the output into paths relative to /dev, which is what GRUB expects.

my $ipch;

# Fork; the child's stdout is piped back to the parent via $ipch.
my $cpid = open $ipch, '-|';

if( $cpid ) {
    # Parent: prefix bare scsi-/ata-/wwn- names with disk/by-id/.
    while( my $line = <$ipch> ) {
        $line =~ s{^(\s+)((scsi|ata|wwn)-.*)}{$1disk/by-id/$2};
        print $line;
    }
}
elsif( $cpid == 0 ) {
    # Child: run the real binary, keeping 'zpool' as argv[0].
    exec { '/sbin/zpool_true' } 'zpool', @ARGV;
}
else {
    die;
}

I moved the original /sbin/zpool to /sbin/zpool_true and then installed that script as /sbin/zpool.
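
The swap itself would be something like this (a sketch; the wrapper filename is hypothetical, and note @ghost's initramfs caveat further down):

# mv /sbin/zpool /sbin/zpool_true
# install -m 0755 zpool-wrapper.pl /sbin/zpool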

@tronar

tronar commented Aug 25, 2014

Just in case someone is using WWNs instead of scsi-xxx, this udev rule does it for me:

ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*", ENV{ID_FS_TYPE}=="zfs_member", SYMLINK+="wwn-$env{ID_WWN_WITH_EXTENSION}-part%n"

@FransUrbo
Contributor

It's nice that people are posting workarounds, but it would be nicer if someone could take some time to look at the GRUB code and see if they could find a way to fix this once and for all...

I'm unfortunately very short on time (and the fact that it doesn't happen to me probably isn't helping :). I just have too many more important issues to deal with to look at this any closer. I spent almost a weekend on it a couple of weeks ago, but didn't get any closer.

@danielkza has some ideas on this (see #5 (comment)); it just needs to be taken to the next step...

@danielkza

Even writing a patch for GRUB would only fix half the problem: if you create a new pool with ZFS 0.6.3 you will hit another one. The right action would be to build the ZoL GRUB version for Ubuntu Trusty too: right now upstream is not good enough.

@danielkza

There is another issue which seems related to GRUB missing some pool features. Even if it can successfully detect the correct devices to probe, it may not detect ZFS on them:

http://savannah.gnu.org/bugs/?42861

I agree that writing the patches and hoping they get upstreamed would be the ideal situation, but right now ZFS root is completely broken on Trusty without those workarounds, and possibly even with them. The only solution I have found is using the Debian GRUB packages instead. And I have no idea how long the GRUB folks would take to get the fixes in.

@FransUrbo
Contributor

http://savannah.gnu.org/bugs/?42861

I see you have a fix. Good; if/when we have the core issue fixed, we can include these in our package(s) as well while waiting for upstream to accept them.
I agree that writing the patches and hoping they get upstreamed would be the ideal situation, but right now ZFS root is completely broken on Trusty without those workarounds

It's broken because everyone is spending time finding workarounds instead of working on the real issue...

If this is perfectly ok with everyone, by all means. Continue with the workarounds. If you're not interested in getting a package that actually works, then who am I to argue - I have other things to do anyway...

@danielkza

I don't have any particular knowledge of the GRUB code, nor much free time to work on the issue either. It does seem quite simple to fix though, so I'll see what I can do about it, but I can't commit to any time frame.

@nelsoncs

Don't know if this is helpful or not, but the current grub for trusty (not zfs-grub) has been working for me so far, with links from /dev/disk/by-id names positioned at /dev (i.e. /dev/ata-ST3000DM001-1CH166_Z1F4B9FN-part3 -> sda3). So far no problem with mirrored disks and GPT partitions on a non-EFI board.


@FransUrbo
Contributor

with links from /dev/disk/by-id names positioned at /dev

Yes, that's the whole point. That's a workaround, which I think is very ugly! The ... idea/hope is that we could get it fixed properly so none of that crap is needed...

@Der-Jan

Der-Jan commented Aug 26, 2014

I don't see a problem with upstream - for me, enabling libzfs did the trick:
https://github.com/Der-Jan/grub-zfs
https://launchpad.net/~der-jan/+archive/ubuntu/grub

@danielkza

libzfs will never be enabled by default in any distribution, statically linked or not. And it shouldn't really be necessary, since GRUB went all the way to implement its own support. If using the default distribution packages is a goal, all the small bugs need to be fixed.

@FransUrbo
Contributor

If using the default distribution packages is a goal all the small bugs need to be fixed.

That is "only" the end goal... While we wait for them, we use, build and distribute our own package(s).

@tronar

tronar commented Aug 26, 2014

I somehow feel guilty for starting some kind of argument here, but I'm missing something.
It would seem that the issue is that "zpool status" shows bare device names rather than /dev-relative ones, and yet you keep talking about fixing GRUB.
I'm in no position to fix either, but to me the point is who defines the policy. I.e., why is it that GRUB should look in /dev/disk/by-*?

@danielkza

There is no guarantee, from any of the ZFS implementations, about how device names will appear in zpool status. It isn't even meant to have machine-parseable output (although it should probably have an option to do so). GRUB is relying on an undocumented assumption. The policy doesn't actually exist, but GRUB is assuming it does!

Considering there is even code to traverse the /dev tree recursively to find devices already, using it seems to me a strictly superior solution: it covers all the cases the naive method does and more, and severely reduces the chances of any other implementation breaking things yet again.

@FransUrbo
Contributor

It isn't even meant to have machine-parseable output (although it should probably have an option to do so).

Good idea. We have machine-parseable output in other places, but not "status".

I have a script where I could really have used it. Instead, I had to write 146 lines of bash code to make sure I caught all the possibilities...

I've created openzfs/zfs#2626 where this can be discussed. I'm volunteering to do it even :).

Considering there is even code to traverse the /dev tree recursively to find devices already, using it seems to me a strictly superior solution

It's also the correct UNIX way - "do not assume" (my favorite quote is something like "If need is the mother of invention(s), then assumptions is the father of all fuckups" :).

@ghost

ghost commented Nov 23, 2014

@buc0: moving the original /sbin/zpool to /sbin/zpool_true and then creating a script as /sbin/zpool is quite dangerous, as the next update-initramfs will include the zpool script in the initramdisk instead of the original zpool binary, and you won't be able to import any pool while in the initramfs. I guess you can alter /usr/share/initramfs-tools/scripts/zfs to account for it.

@Darael

Darael commented Dec 1, 2014

As of today, this bug has manifested in Debian Jessie.

@allentiak

Same here.
I even converted an installed system to a ZFS pool using a separate ext4 /boot partition, in the hope that GRUB would not complain... to no avail.

gdisk -l /dev/sda

GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 625142448 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): E8F8D1FE-07D5-374B-B577-22A5A46029B6
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048          411647    200.0 MiB   8300
   2          411648       625141759    297.9 GiB   BF00

zpool list

NAME    SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
zonda   296G  6.14G  290G   2%  1.00x  ONLINE  -

zfs list

NAME                   USED  AVAIL  REFER  MOUNTPOINT
zonda                 12.5G   279G    32K  /zonda
zonda/home              61K   279G    31K  /zonda/home
zonda/home/allentiak    30K   279G    30K  /zonda/home/allentiak
zonda/rootfs          6.14G   279G  6.14G  /zonda/rootfs
zonda/swap            6.38G   285G    20K  -

I get the same "canonical path" error in the chroot (details below)...

/dev/sda1 (ext4) /boot
/zonda/rootfs (zfs) /
/zonda/home (zfs) /home
/zonda/home/allentiak (zfs) /home/allentiak

@ianhinder

I think I am running into this on Ubuntu 16.04.1. I have a ZFS root on an LVM volume, and grub-probe is not finding it. zpool status gives the pool device as <vg>-<lv> rather than /dev/mapper/<vg>-<lv> or /dev/<vg>/<lv>, and I get

$ sudo grub-probe /
grub-probe: error: failed to get canonical path of `/dev/<vg>-<lv>'.

Creating a symlink from /dev/mapper/<vg>-<lv> to /dev/<vg>-<lv> works around the problem for me (I added it to my /etc/rc.local rather than creating a udev rule), but it would be better if grub-probe used its recursive search code in this case.
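
The line in question is just this (placeholders as above):

# appended to /etc/rc.local, before any final 'exit 0'
ln -sf /dev/mapper/<vg>-<lv> /dev/<vg>-<lv>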

@megajocke

megajocke commented Mar 28, 2017

For what it's worth, I get similar behavior on Debian Jessie (installed roughly according to the guide, but with a LUKS-encrypted root) if I use the grub-pc package from jessie-backports (2017-03-28), but it works if I use the version from Debian testing instead.

I thought at first that I wouldn't need the newer grub version from testing because I was going to put /boot on ext4, but I found out the grub package is involved in figuring out the root=ZFS= kernel parameter. So I needed the testing version anyway.

The versions were like this:

# Before upgrade (does not work)
root@debnastest:/etc/grub.d# /.zfs/snapshot/seems-to-work-now/usr/sbin/grub-probe --version
/.zfs/snapshot/seems-to-work-now/usr/sbin/grub-probe (GRUB) 2.02~beta2-22+deb8u1

# After upgrade (works)
root@debnastest:~# grub-probe --version
grub-probe (GRUB) 2.02~beta3-5

root@debnastest:/etc/grub.d# apt-cache policy grub-pc
grub-pc:
  Installed: 2.02~beta3-5
  Candidate: 2.02~beta3-5
  Version table:
 *** 2.02~beta3-5 0
        600 http://ftp.debian.org/debian/ testing/main amd64 Packages
        100 /var/lib/dpkg/status
     2.02~beta2-22+deb8u1 0
        700 http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
        700 http://security.debian.org/ jessie/updates/main amd64 Packages

With grub-pc from jessie-backports grub-probe is broken:

root@debnastest:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 996K in 0h0m with 0 errors on Sat Mar 25 12:42:14 2017
config:

        NAME               STATE     READ WRITE CKSUM
        rpool              ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            crypt-rpool-a  ONLINE       0     0     0
            crypt-rpool-b  ONLINE       0     0     0

errors: No known data errors
root@debnastest:~# grub-probe /boot
ext2
root@debnastest:~# grub-probe /
grub-probe: error: failed to get canonical path of `/dev/crypt-rpool-a'.
root@debnastest:~# ln -s /dev/mapper/crypt-rpool-* /dev/
root@debnastest:~# grub-probe /
grub-probe: error: unknown filesystem.

The first problem is that it looks for the devices in the wrong place. Symlinking them causes another unknown filesystem error in this version of grub from jessie-backports.

After upgrading to grub-pc from Debian testing it works even without the symlinks:

root@debnastest:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 996K in 0h0m with 0 errors on Sat Mar 25 12:42:14 2017
config:

        NAME               STATE     READ WRITE CKSUM
        rpool              ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            crypt-rpool-a  ONLINE       0     0     0
            crypt-rpool-b  ONLINE       0     0     0

errors: No known data errors
root@debnastest:~# grub-probe --version
grub-probe (GRUB) 2.02~beta3-5
root@debnastest:~# grub-probe /
zfs
root@debnastest:~# ls -l /dev/crypt*
ls: cannot access /dev/crypt*: No such file or directory

The command used by /etc/grub.d/10_linux to find out the pool name for the root also started working when upgrading to grub from testing:

root@debnastest:/etc/grub.d# grub-probe --device /dev/mapper/crypt-rpool-a --target=fs_label                  
rpool

-edit-: here is what the partitioning looks like:

root@debnastest:~# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 31457280 sectors, 15.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): C9A4B367-0338-4234-AF67-0A75E82ABF5C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 31457246
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1         1667072        31440861   14.2 GiB    BF01  rpool-b
   2              34            4095   2.0 MiB     EF02  bios-boot
   3            4096          618495   300.0 MiB   EF00  efi-boot
   4          618496         1667071   512.0 MiB   FD00  boot-b
   9        31440862        31457246   8.0 MiB     8301  Linux reserved

root@debnastest:~# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 31457280 sectors, 15.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): F6113851-6D14-4C39-AE43-D14B2B36DD13
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 31457246
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1         1667072        31440861   14.2 GiB    BF01  rpool-a
   2              34            4095   2.0 MiB     EF02  bios-boot
   3            4096          618495   300.0 MiB   EF00  efi-boot
   4          618496         1667071   512.0 MiB   FD00  boot-a
   9        31440862        31457246   8.0 MiB     8301  Linux reserved

I boot using the BIOS method with GPT partitioning, but have prepared the efi partitions for future use (unformatted). I'm testing this in VirtualBox as a candidate for a home server install. /boot is on ext4 on mdadm RAID1 on the boot-{a,b} partitions.

mdadm and cryptsetup:

root@debnastest:~# cat /etc/crypttab 
# <target name> <source device>         <key file>      <options>
crypt-rpool-a /dev/disk/by-partlabel/rpool-a none luks,initramfs
crypt-rpool-b /dev/disk/by-partlabel/rpool-b none luks,initramfs

root@debnastest:~# cat /etc/mdadm/mdadm.conf 
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.2 UUID=985a044f:1e43f780:7df469cf:48b6eca2 name=debian:0

# This configuration was auto-generated on Thu, 23 Mar 2017 19:18:37 +0100 by mkconf

fstab:

root@debnastest:~# cat /etc/fstab 
rpool/ROOT/debian / zfs defaults 0 1
/dev/md/0 /boot ext4 defaults 0 2

@sehe

sehe commented Apr 9, 2017

I too had this problem on Ubuntu 16.04.2 LTS, when trying to create a bootable USB using grub-install manually.

I worked around it by creating the symlinks in /dev/

+1 for #2626

@ChristianUlbrich
Copy link

FWIW, I had a similar problem on the latest Ubuntu 16.04 when chrooting into a ZFS root (to clone a running system to ZFS on root), and symlinks helped as well.

@tijszwinkels

This answer on askubuntu explains the problem (and the solution) very well.

In my case, the symlink trick resulted in an unbootable system. The following fix works for me:

export ZPOOL_VDEV_NAME_PATH=YES

This causes zpool status to output full device paths instead of shortened names, which is what GRUB uses to determine where to find the block devices.
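
The effect is easy to check; illustrative output, reusing the whole-disk pool from the original report (the stored path for a whole-disk vdev carries a -part1 suffix):

# zpool status zroot | grep ata-
      ata-QEMU_HARDDISK_QM00001  ONLINE       0     0     0
# ZPOOL_VDEV_NAME_PATH=YES zpool status zroot | grep ata-
      /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part1  ONLINE       0     0     0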

@Rovanion

Rovanion commented Jan 5, 2018

Is it possible to add export ZPOOL_VDEV_NAME_PATH=YES to every run of unattended upgrades so that it can update grub properly?

@sehe

sehe commented Jan 7, 2018

@Rovanion have you tried adding it to something like /etc/environment or /etc/profile?

@Rovanion

Adding it to /etc/environment helped with interactive sessions. We'll see if it works for unattended upgrades.

Vaelatern pushed a commit to void-linux/void-packages that referenced this issue Sep 29, 2019
Not setting this results in the output of 'zpool status' being
misparsed.

See zfsonlinux/grub#5

Closes: #14801 [via git-merge-pr]
atweiden pushed a commit to atweiden/voidpkgs that referenced this issue Sep 30, 2019
Not setting this results in the output of 'zpool status' being misparsed.

See zfsonlinux/grub#5
@isopix

isopix commented May 29, 2021

I'm planning to move my system to ZFS and wonder if all these workarounds are still needed with GRUB 2.06-rc1?
