Crash after shutdown #17

warped-rudi · 2014-09-15T09:22:46Z

When shutting down the system people experience a crash like this:

    [ 4535.825460] reboot: Power down
    [ 4535.832833] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
    [ 4535.832833]
    [ 4539.724102] ------------[ cut here ]------------
    [ 4539.728768] WARNING: CPU: 0 PID: 1 at kernel/time/tick-broadcast.c:604 tick_handle_oneshot_broadcast+0x1d8/0x200()
    [ 4539.739125] Modules linked in: bnep bluetooth 6lowpan_iphc brcmfmac brcmutil cfg80211 rfkill joydev ir_lirc_codec c
    [ 4539.765959] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.14-cubox-i #1
    [ 4539.773234] [<80015478>] (unwind_backtrace) from [<80011620>] (show_stack+0x10/0x14)
    [ 4539.781010] [<80011620>] (show_stack) from [<806ea118>] (dump_stack+0x84/0x94)
    [ 4539.788269] [<806ea118>] (dump_stack) from [<8002b964>] (warn_slowpath_common+0x70/0x8c)
    [ 4539.796387] [<8002b964>] (warn_slowpath_common) from [<8002ba1c>] (warn_slowpath_null+0x1c/0x24)
    [ 4539.805200] [<8002ba1c>] (warn_slowpath_null) from [<8007eed4>] (tick_handle_oneshot_broadcast+0x1d8/0x200)
    [ 4539.814974] [<8007eed4>] (tick_handle_oneshot_broadcast) from [<800221b8>] (mxc_timer_interrupt+0x2c/0x34)
    [ 4539.824665] [<800221b8>] (mxc_timer_interrupt) from [<8006c33c>] (handle_irq_event_percpu+0x54/0x180)
    [ 4539.833911] [<8006c33c>] (handle_irq_event_percpu) from [<8006c4ac>] (handle_irq_event+0x44/0x64)
    [ 4539.842807] [<8006c4ac>] (handle_irq_event) from [<8006f430>] (handle_fasteoi_irq+0x80/0x158)
    [ 4539.851354] [<8006f430>] (handle_fasteoi_irq) from [<8006bb04>] (generic_handle_irq+0x2c/0x3c)
    [ 4539.859991] [<8006bb04>] (generic_handle_irq) from [<8000eb80>] (handle_IRQ+0x40/0x90)
    [ 4539.867929] [<8000eb80>] (handle_IRQ) from [<800084e8>] (gic_handle_irq+0x2c/0x5c)
    [ 4539.875528] [<800084e8>] (gic_handle_irq) from [<806efb40>] (__irq_svc+0x40/0x50)
    [ 4539.883022] Exception stack(0xdc093dc0 to 0xdc093e08)
    [ 4539.888095] 3dc0: 0001ae04 ffffffff 00000182 802b3d88 00000057 80a336a4 00000ed8 00000fa0
    [ 4539.896291] 3de0: 00000000 80a601f0 00000000 dc0a0000 00000000 dc093e08 806e6e40 802b3db8
    [ 4539.904480] 3e00: 200f0013 ffffffff
    [ 4539.908000] [<806efb40>] (__irq_svc) from [<802b3db8>] (__loop_delay+0x0/0x10)
    [ 4539.915236] ---[ end trace 2755275968036b0d ]---

It does no harm, since the file system has already been closed. I did not yet find out the root cause, but it only happens, when the kernel receives LINUX_REBOOT_CMD_POWER_OFF. It does not happen on LINUX_REBOOT_CMD_HALT. Looking at the code I stumbled over commit d3e1746 (ENGR00243120). This add a default power off handler, which causes the kernel to call machine_power_off() instead of machine_halt(). The main difference between the two is, that the latter does not return while the first one does. The crash happens after the return and looking here and here we are not the only people having problems with this.
The question is how to fix it. The power off handler introduced by ENGR00243120 seems to have no effect on most iMX boards. So I don't think it should be installed by default. Maybe this could be controlled by a *.dts entry. Furthermore I don't really understand the difference in behavior between machine_halt() and machine_power_off(). I would have expected both to behave identically in case the power off function does not what it is supposed to do. Also I'm wondering if the ARM has something like Intel's HLT instruction to replace the busy loop in order to reduce power consumption in this state.

The text was updated successfully, but these errors were encountered:

linux4kix · 2014-09-15T09:43:16Z

I have accepted that upstream patch because as you know the Cubox-i and
hummingboard do not power off. I need to hook into machine_power_off so I
can then put the machine in the safest, lowest power mode possible while
still on.

The crash is trivial and happens because local irqs have been disabled but
not all drivers have disabled their interrupts properly. It is on my list
but not a priority because we get far enough into the halt that nothing bad
can become of it.

On Mon, Sep 15, 2014 at 11:22 AM, Rudi Ihle [email protected]
wrote:

When shutting down the system people experience a crash like this:
[ 4535.825460] reboot: Power down
[ 4535.832833] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
[ 4535.832833]
[ 4539.724102] ------------[ cut here ]------------
[ 4539.728768] WARNING: CPU: 0 PID: 1 at kernel/time/tick-broadcast.c:604 tick_handle_oneshot_broadcast+0x1d8/0x200()
[ 4539.739125] Modules linked in: bnep bluetooth 6lowpan_iphc brcmfmac brcmutil cfg80211 rfkill joydev ir_lirc_codec c
[ 4539.765959] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.14-cubox-i #1
[ 4539.773234] [<80015478>] (unwind_backtrace) from [<80011620>] (show_stack+0x10/0x14)
[ 4539.781010] [<80011620>] (show_stack) from [<806ea118>] (dump_stack+0x84/0x94)
[ 4539.788269] [<806ea118>] (dump_stack) from [<8002b964>] (warn_slowpath_common+0x70/0x8c)
[ 4539.796387] [<8002b964>] (warn_slowpath_common) from [<8002ba1c>] (warn_slowpath_null+0x1c/0x24)
[ 4539.805200] [<8002ba1c>] (warn_slowpath_null) from [<8007eed4>] (tick_handle_oneshot_broadcast+0x1d8/0x200)
[ 4539.814974] [<8007eed4>] (tick_handle_oneshot_broadcast) from [<800221b8>] (mxc_timer_interrupt+0x2c/0x34)
[ 4539.824665] [<800221b8>] (mxc_timer_interrupt) from [<8006c33c>] (handle_irq_event_percpu+0x54/0x180)
[ 4539.833911] [<8006c33c>] (handle_irq_event_percpu) from [<8006c4ac>] (handle_irq_event+0x44/0x64)
[ 4539.842807] [<8006c4ac>] (handle_irq_event) from [<8006f430>] (handle_fasteoi_irq+0x80/0x158)
[ 4539.851354] [<8006f430>] (handle_fasteoi_irq) from [<8006bb04>] (generic_handle_irq+0x2c/0x3c)
[ 4539.859991] [<8006bb04>] (generic_handle_irq) from [<8000eb80>] (handle_IRQ+0x40/0x90)
[ 4539.867929] [<8000eb80>] (handle_IRQ) from [<800084e8>] (gic_handle_irq+0x2c/0x5c)
[ 4539.875528] [<800084e8>] (gic_handle_irq) from [<806efb40>] (__irq_svc+0x40/0x50)
[ 4539.883022] Exception stack(0xdc093dc0 to 0xdc093e08)
[ 4539.888095] 3dc0: 0001ae04 ffffffff 00000182 802b3d88 00000057 80a336a4 00000ed8 00000fa0
[ 4539.896291] 3de0: 00000000 80a601f0 00000000 dc0a0000 00000000 dc093e08 806e6e40 802b3db8
[ 4539.904480] 3e00: 200f0013 ffffffff
[ 4539.908000] [<806efb40>] (__irq_svc) from [<802b3db8>] (__loop_delay+0x0/0x10)
[ 4539.915236] ---[ end trace 2755275968036b0d ]---
It does no harm, since the file system has already been closed. I did not
yet find out the root cause, but it only happens, when the kernel receives
LINUX_REBOOT_CMD_POWER_OFF. It does not happen on LINUX_REBOOT_CMD_HALT.
Looking at the code I stumbled over commit d3e1746
d3e1746
(ENGR00243120). This add a default power off handler, which causes the
kernel to call machine_power_off()
https://github.com/linux4kix/linux-linaro-stable-mx6/blob/linux-linaro-lsk-v3.14-mx6/arch/arm/kernel/process.c#L215
instead of machine_halt()
https://github.com/linux4kix/linux-linaro-stable-mx6/blob/linux-linaro-lsk-v3.14-mx6/arch/arm/kernel/process.c#L200.
The main difference between the two is, that the latter does not return
while the first one does. The crash happens after the return and looking
here
http://lists.infradead.org/pip%20ermail/linux-arm-kernel/2012-August/114864.html
and here https://lkml.org/lkml/2014/3/24/299 we are not the only people
having problems with this.
The question is how to fix it. The power off handler introduced by
ENGR00243120 seems to have no effect on most iMX boards. So I don't think
it should be installed by default. Maybe this could be controlled by a
*.dts entry. Furthermore I don't really understand the difference in
behavior between machine_halt() and machine_power_off(). I would have
expected both to behave identically in case the power off function does not
what it is supposed to do. Also I'm wondering if the ARM has something like
Intel's HLT instruction to replace the busy loop in order to reduce power
consumption in this state.

—
Reply to this email directly or view it on GitHub
#17.

warped-rudi · 2014-09-20T14:05:57Z

The oops can probably be avoided by stopping the GPT. But to me it appears that when machine_power_off() or machine_halt() return to their callers we do get this 'Attempted to kill init' panic.

BTW, here is a slightly improved version of the patch. Do you want a PR?

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: linux4kix#1 linux4kix#2 linux4kix#3 linux4kix#4 linux4kix#5 linux4kix#6 linux4kix#7 OK [ 0.644008] smpboot: Booting Node 1, Processors: linux4kix#8 linux4kix#9 linux4kix#10 linux4kix#11 linux4kix#12 linux4kix#13 linux4kix#14 linux4kix#15 OK [ 1.245006] smpboot: Booting Node 2, Processors: linux4kix#16 linux4kix#17 linux4kix#18 linux4kix#19 linux4kix#20 linux4kix#21 linux4kix#22 linux4kix#23 OK [ 1.864005] smpboot: Booting Node 3, Processors: linux4kix#24 linux4kix#25 linux4kix#26 linux4kix#27 linux4kix#28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: linux4kix#1 linux4kix#2 linux4kix#3 linux4kix#4 linux4kix#5 linux4kix#6 linux4kix#7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: linux4kix#1 linux4kix#2 linux4kix#3 linux4kix#4 linux4kix#5 linux4kix#6 linux4kix#7 [ 0.603005] .... node linux4kix#1, CPUs: linux4kix#8 linux4kix#9 linux4kix#10 linux4kix#11 linux4kix#12 linux4kix#13 linux4kix#14 linux4kix#15 [ 1.200005] .... node linux4kix#2, CPUs: linux4kix#16 linux4kix#17 linux4kix#18 linux4kix#19 linux4kix#20 linux4kix#21 linux4kix#22 linux4kix#23 [ 1.796005] .... node linux4kix#3, CPUs: linux4kix#24 linux4kix#25 linux4kix#26 linux4kix#27 linux4kix#28 #29 #30 #31 [ 2.393005] .... node linux4kix#4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node linux4kix#5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node linux4kix#6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node linux4kix#7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node linux4kix#8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node linux4kix#9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node linux4kix#10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node linux4kix#11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node linux4kix#12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node linux4kix#13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node linux4kix#14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node linux4kix#15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

BugLink: http://bugs.launchpad.net/bugs/1235977 The profile introspection seq file has a locking bug when policy is viewed from a virtual root (task in a policy namespace), introspection from the real root is not affected. The test for root while (parent) { is correct for the real root, but incorrect for tasks in a policy namespace. This allows the task to walk backup the policy tree past its virtual root causing it to be unlocked before the virtual root should be in the p_stop fn. This results in the following lockdep back trace: [ 78.479744] [ BUG: bad unlock balance detected! ] [ 78.479792] 3.11.0-11-generic linux4kix#17 Not tainted [ 78.479838] ------------------------------------- [ 78.479885] grep/2223 is trying to release lock (&ns->lock) at: [ 78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480002] but there are no more locks to release! [ 78.480037] [ 78.480037] other info that might help us debug this: [ 78.480037] 1 lock held by grep/2223: [ 78.480037] #0: (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0 [ 78.480037] [ 78.480037] stack backtrace: [ 78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic linux4kix#17 [ 78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 78.480037] ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190 [ 78.480037] ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190 [ 78.480037] ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7 [ 78.480037] Call Trace: [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff817b97ef>] dump_stack+0x54/0x74 [ 78.480037] [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300 [ 78.480037] [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5d5c>] lock_release+0xac/0x310 [ 78.480037] [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180 [ 78.480037] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff81376c91>] p_stop+0x51/0x90 [ 78.480037] [<ffffffff81211408>] seq_read+0x288/0x3d0 [ 78.480037] [<ffffffff811e9d9e>] vfs_read+0x9e/0x170 [ 78.480037] [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0 [ 78.480037] [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f Signed-off-by: John Johansen <[email protected]> Signed-off-by: James Morris <[email protected]>

This patch makes a copy of the 'template_fmt' function argument so that the latter will not be modified by strsep(), which does the splitting by replacing the given separator with '\0'. IMA: No TPM chip found, activating TPM-bypass! Unable to handle kernel pointer dereference at virtual kernel address 0000000000842000 Oops: 0004 [linux4kix#1] SMP Modules linked in: CPU: 3 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc2-00098-g3ce1217d6cd5 linux4kix#17 task: 000000003ffa0000 ti: 000000003ff84000 task.ti: 000000003ff84000 Krnl PSW : 0704e00180000000 000000000044bf88 (strsep+0x7c/0xa0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3 Krnl GPRS: 000000000000007c 000000000000007c 000000003ff87d90 0000000000821fd8 0000000000000000 000000000000007c 0000000000aa37e0 0000000000aa9008 0000000000000051 0000000000a114d8 0000000100000002 0000000000842bde 0000000000842bdf 00000000006f97f0 000000000040062c 000000003ff87cf0 Krnl Code: 000000000044bf7c: a7f4000a brc 15,44bf90 000000000044bf80: b90200cc ltgr %r12,%r12 #000000000044bf84: a7840006 brc 8,44bf90 >000000000044bf88: 9200c000 mvi 0(%r12),0 000000000044bf8c: 41c0c001 la %r12,1(%r12) 000000000044bf90: e3c020000024 stg %r12,0(%r2) 000000000044bf96: b904002b lgr %r2,%r11 000000000044bf9a: ebbcf0700004 lmg %r11,%r12,112(%r15) Call Trace: ([<00000000004005fe>] ima_init_template+0xa2/0x1bc) [<0000000000a7c896>] ima_init+0x7a/0xa8 [<0000000000a7c938>] init_ima+0x24/0x40 [<00000000001000e8>] do_one_initcall+0x68/0x128 [<0000000000a4eb56>] kernel_init_freeable+0x20a/0x2b4 [<00000000006a1ff4>] kernel_init+0x30/0x178 [<00000000006b69fe>] kernel_thread_starter+0x6/0xc [<00000000006b69f8>] kernel_thread_starter+0x0/0xc Last Breaking-Event-Address: [<000000000044bf42>] strsep+0x36/0xa0 Fixes commit: adf53a7 ima: new templates management mechanism Changelog v1: - make template_fmt 'const char *' (reported-by James Morris) - fix kstrdup memory leak (reported-by James Morris) Reported-by: Heiko Carstens <[email protected]> Signed-off-by: Roberto Sassu <[email protected]> Signed-off-by: Mimi Zohar <[email protected]> Tested-by: Heiko Carstens <[email protected]>

…-mx6 PWM and fec initialization cleanup

…sk-v3.14-mx6" This reverts commit ee05b81, reversing changes made to 828f1bb.

tomlohave pushed a commit to tomlohave/linux-linaro-stable-mx6 that referenced this issue Feb 21, 2015

Merge pull request linux4kix#17 from linux4kix/linux-linaro-lsk-v3.14…

ee05b81

…-mx6 PWM and fec initialization cleanup

tomlohave pushed a commit to tomlohave/linux-linaro-stable-mx6 that referenced this issue Feb 21, 2015

Revert "Merge pull request linux4kix#17 from linux4kix/linux-linaro-l…

9f1e33a

…sk-v3.14-mx6" This reverts commit ee05b81, reversing changes made to 828f1bb.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash after shutdown #17

Crash after shutdown #17

warped-rudi commented Sep 15, 2014

linux4kix commented Sep 15, 2014

warped-rudi commented Sep 20, 2014

Crash after shutdown #17

Crash after shutdown #17

Comments

warped-rudi commented Sep 15, 2014

linux4kix commented Sep 15, 2014

warped-rudi commented Sep 20, 2014