Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs_util crashes on restart #776

Open
jawbroken opened this issue Sep 22, 2020 · 7 comments
Open

zfs_util crashes on restart #776

jawbroken opened this issue Sep 22, 2020 · 7 comments

Comments

@jawbroken
Copy link

jawbroken commented Sep 22, 2020

I have OpenZFS_on_OS_X_1.9.4 2020-02-29 installed on macOS Catalina 10.15.6. I'm getting crashes in zfs_util on every restart. They always seem to come in pairs, as follows.

Example pair:

zfs_util_2020-09-22-232650_fileserver.crash

Process:               zfs_util [622]
Path:                  /Library/Filesystems/zfs.fs/Contents/Resources/zfs_util
Identifier:            zfs_util
Version:               0
Code Type:             X86-64 (Native)
Parent Process:        diskarbitrationd [195]
Responsible:           diskarbitrationd [195]
User ID:               0

Date/Time:             2020-09-22 23:26:50.074 +1000
OS Version:            Mac OS X 10.15.6 (19G2021)
Report Version:        12
Anonymous UUID:        77D92FC9-1CD8-9252-E068-3F13655407BE


Time Awake Since Boot: 55 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [622]

VM Regions Near 0:
--> 
    __TEXT                 000000010c369000-000000010c53e000 [ 1876K] r-x/r-x SM=COW  /Library/Filesystems/zfs.fs/Contents/Resources/zfs_util

Application Specific Information:
dyld2 mode

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	0x00007fff69164e52 _platform_strlen + 18
1   libsystem_platform.dylib      	0x00007fff69165718 _platform_strlcpy + 28
2   libsystem_c.dylib             	0x00007fff6903b279 __strlcpy_chk + 30
3   zfs_util                      	0x000000010c36b403 main + 2139
4   libdyld.dylib                 	0x00007fff68f6ecc9 start + 1

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000080  rcx: 0x0000000000000000  rdx: 0x0000000000000000
  rdi: 0x0000000000000000  rsi: 0x0000000000000000  rbp: 0x00007ffee3895b30  rsp: 0x00007ffee3895b30
   r8: 0x000000000000065d   r9: 0x0000000000000022  r10: 0x00007ffee3896d90  r11: 0x00007fff6903b25b
  r12: 0x0000000000000080  r13: 0x0000000000000006  r14: 0x00007fa90740a340  r15: 0x0000000000000000
  rip: 0x00007fff69164e52  rfl: 0x0000000000010246  cr2: 0x0000000000000000
  
Logical CPU:     1
Error Code:      0x00000004 (no mapping for user data read)
Trap Number:     14

zfs_util_2020-09-22-232739_tbfileserver.crash

Process:               zfs_util [821]
Path:                  /Library/Filesystems/zfs.fs/Contents/Resources/zfs_util
Identifier:            zfs_util
Version:               0
Code Type:             X86-64 (Native)
Parent Process:        diskarbitrationd [195]
Responsible:           diskarbitrationd [195]
User ID:               0

Date/Time:             2020-09-22 23:27:14.596 +1000
OS Version:            Mac OS X 10.15.6 (19G2021)
Report Version:        12
Anonymous UUID:        77D92FC9-1CD8-9252-E068-3F13655407BE


Time Awake Since Boot: 80 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [821]

VM Regions Near 0:
--> 
    __TEXT                 0000000106275000-000000010644a000 [ 1876K] r-x/r-x SM=COW  /Library/Filesystems/zfs.fs/Contents/Resources/zfs_util

Application Specific Information:
dyld2 mode

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	0x00007fff69164e52 _platform_strlen + 18
1   libsystem_platform.dylib      	0x00007fff69165718 _platform_strlcpy + 28
2   libsystem_c.dylib             	0x00007fff6903b279 __strlcpy_chk + 30
3   zfs_util                      	0x0000000106277403 main + 2139
4   libdyld.dylib                 	0x00007fff68f6ecc9 start + 1

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000080  rcx: 0x0000000000000000  rdx: 0x0000000000000000
  rdi: 0x0000000000000000  rsi: 0x0000000000000000  rbp: 0x00007ffee9989b30  rsp: 0x00007ffee9989b30
   r8: 0x0000000000000043   r9: 0x0000000000000024  r10: 0x00007ffee998ad90  r11: 0x00007fff6903b25b
  r12: 0x0000000000000080  r13: 0x0000000000000007  r14: 0x00007f8a64504080  r15: 0x0000000000000000
  rip: 0x00007fff69164e52  rfl: 0x0000000000010246  cr2: 0x0000000000000000
  
Logical CPU:     2
Error Code:      0x00000004 (no mapping for user data read)
Trap Number:     14

I only see two uses of strlcpy in zfsutil, here and here. The latter is perhaps more supicious, being introduced in this commit from April 2019 and mentioning diskarbitrationd, etc.

@jawbroken jawbroken changed the title Frequent zfs_util crashes zfs_util crashes on restart Sep 22, 2020
@lundman
Copy link
Contributor

lundman commented Sep 23, 2020

Seems related to:

# lldb /Library/Filesystems/zfs.fs/Contents/Resources/zfs_util

(lldb) image lookup --verbose --address main+2139
      Address: zfs_util[0x0000000100002403] (zfs_util.__TEXT.__text + 4419)
      Summary: zfs_util`main + 2139 [inlined] zfs_probe_iokit + 40 at zfs_util.c:733
               zfs_util`main + 2099 at zfs_util.c:733
       Module: file = "/Library/Filesystems/zfs.fs/Contents/Resources/zfs_util", arch = "x86_64"
  CompileUnit: id = {0x00000000}, file = "/Users/lundman/Developer/catalina/zfs/cmd/zfs_util/zfs_util.c", language = "c99"
     Function: id = {0x100000aec}, name = "main", range = [0x0000000100001ba8-0x0000000100002458)
     FuncType: id = {0x100000aec}, byte-size = 0, decl = zfs_util.c:609, compiler_type = "int (int, char **)"
       Blocks: id = {0x100000aec}, range = [0x100001ba8-0x100002458)
               id = {0x100000c86}, ranges = [0x100001e75-0x100001e96)[0x100001fe5-0x10000200f)[0x100002110-0x1000021f9)[0x1000023db-0x100002438), name = "zfs_probe_iokit", decl = zfs_util.c:418
    LineEntry: [0x0000000100002403-0x000000010000240b): /Users/lundman/Developer/catalina/zfs/cmd/zfs_util/zfs_util.c:475:3
       Symbol: id = {0x000009e8}, range = [0x0000000100001ba8-0x0000000100002484), name="main"
     Variable: id = {0x100000c92}, name = "devpath", type = "const char *", location = , decl = zfs_util.c:418
     Variable: id = {0x100000c97}, name = "args", type = "probe_args_t *", location = , decl = zfs_util.c:418
     Variable: id = {0x100000c9c}, name = "result", type = "int", location = DW_OP_consts(0x-1) , decl = zfs_util.c:424
     Variable: id = {0x100000ca5}, name = "cfstr", type = "CFStringRef", location = r15, decl = zfs_util.c:423
     Variable: id = {0x100000cae}, name = "service", type = "io_service_t", location = r14, decl = zfs_util.c:422
     Variable: id = {0x100000b26}, name = "blockdevice", type = "char [1024]", location = DW_OP_fbreg(-3200), decl = zfs_util.c:612
     Variable: id = {0x100000b36}, name = "rawdevice", type = "char [1024]", location = DW_OP_fbreg(-2176), decl = zfs_util.c:613
     Variable: id = {0x100000b46}, name = "volname", type = "char [1024]", location = DW_OP_fbreg(-4368), decl = zfs_util.c:626
     Variable: id = {0x100000b56}, name = "pool_name", type = "char *", location = rbx, decl = zfs_util.c:627
     Variable: id = {0x100000b86}, name = "what", type = "char", location = , decl = zfs_util.c:614
     Variable: id = {0x100000bb2}, name = "i", type = "int", location = r13, decl = zfs_util.c:623
     Variable: id = {0x100000bc2}, name = "sb", type = "stat", location = [rbp-3344], decl = zfs_util.c:618
     Variable: id = {0x100000bf2}, name = "len", type = "int", location = DW_OP_consts(0x256) , decl = zfs_util.c:623
     Variable: id = {0x100000c12}, name = "attr", type = "attrlist", location = , decl = zfs_util.c:624
     Variable: id = {0x100000c1e}, name = "nameBuf", type = "attrNameBuf", location = , decl = zfs_util.c:625
                cfstr = IORegistryEntryCreateCFProperty(service,
                        CFSTR("ZFS Dataset"), kCFAllocatorDefault, 0);
                if (cfstr != NULL) {
                        (void) strlcpy(args->pool_name,
                                CFStringGetCStringPtr(cfstr,
                                        kCFStringEncodingMacRoman),
                                sizeof (io_name_t));

Makes me wonder what string is in your "ZFS Dataset". Ie, can you run:
# ioreg -al
and search for "ZFS Dataset". My test pool194 has:

   <key>ZFS Dataset</key>
                            <string>pool194</string>

@jawbroken
Copy link
Author

I have two pools, so I get two results that look benign:

<key>ZFS Dataset</key>
<string>lstorage</string>
…
<key>ZFS Dataset</key>
<string>tbstorage</string>

@rejsmont
Copy link

I am experiencing exactly the same crashes (same symbols), and additionally, random reboots upon high load when the ZFS pool is imported. I did not find any kernel panic dumps related to these reboots. Not sure if it's related, if you think it's not, I can submit a separate issue.

# uname -a
Darwin SFLab-MacBookPro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64

# zfs --version
zfs-1.9.3-0
zfs-kmod-1.9.3-0

Cheers!

@lundman
Copy link
Contributor

lundman commented Oct 14, 2020

I tightened up that code and hopefully it will do better. It will be in the v2.0 version, but you could extract the zfs_util and give it a try.

@jawbroken
Copy link
Author

jawbroken commented Oct 14, 2020

I am experiencing exactly the same crashes (same symbols), and additionally, random reboots upon high load when the ZFS pool is imported. I did not find any kernel panic dumps related to these reboots. Not sure if it's related, if you think it's not, I can submit a separate issue.

I was experiencing similar random reboots which is what led me to look into and report this issue, but I'm not sure the two things are related (I didn't initially realise the zfs_util crashes were happening after restart). I was also seeing mds crashes, but I'm not sure whether that's relevant either. So I have disabled all mds activity using mdutil, which also meant having to unmount my HFS+ Time Machine drive (you can't disable mds on a Time Machine drive), and haven't had a crash in the last 22 days. When I find time I'll start re-enabling things and remounting that drive and try to isolate the issue, because it's unclear if it's ZFS related at all.

Edit: Though this part of the 1.9.4 changelog made me suspect that the mds crashes could theoretically be related: “Reserved inode numbers avoided (Catalina, spotlight might need to reindex)”

@eejd
Copy link

eejd commented Dec 21, 2020

I have been experiencing what appears to be the same kernel panic caused by what appears to be the same zfs_util bad memory access. Do you need more logs? Should we try using the 2.0 RC already or is there a branch with this fix backported?

@jawbroken
Copy link
Author

jawbroken commented Dec 22, 2020

Are you sure the kernel panics are being caused by zfs_util? I originally had similar suspicions until i realised the zfs_util crashes were happening after reboot, not before, and therefore appear to be relatively benign. See “Time Awake Since Boot: 55 seconds” in my above logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants