Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash after shutdown #17

Open
warped-rudi opened this issue Sep 15, 2014 · 2 comments
Open

Crash after shutdown #17

warped-rudi opened this issue Sep 15, 2014 · 2 comments

Comments

@warped-rudi
Copy link
Contributor

When shutting down the system people experience a crash like this:

    [ 4535.825460] reboot: Power down
    [ 4535.832833] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
    [ 4535.832833]
    [ 4539.724102] ------------[ cut here ]------------
    [ 4539.728768] WARNING: CPU: 0 PID: 1 at kernel/time/tick-broadcast.c:604 tick_handle_oneshot_broadcast+0x1d8/0x200()
    [ 4539.739125] Modules linked in: bnep bluetooth 6lowpan_iphc brcmfmac brcmutil cfg80211 rfkill joydev ir_lirc_codec c
    [ 4539.765959] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.14-cubox-i #1
    [ 4539.773234] [<80015478>] (unwind_backtrace) from [<80011620>] (show_stack+0x10/0x14)
    [ 4539.781010] [<80011620>] (show_stack) from [<806ea118>] (dump_stack+0x84/0x94)
    [ 4539.788269] [<806ea118>] (dump_stack) from [<8002b964>] (warn_slowpath_common+0x70/0x8c)
    [ 4539.796387] [<8002b964>] (warn_slowpath_common) from [<8002ba1c>] (warn_slowpath_null+0x1c/0x24)
    [ 4539.805200] [<8002ba1c>] (warn_slowpath_null) from [<8007eed4>] (tick_handle_oneshot_broadcast+0x1d8/0x200)
    [ 4539.814974] [<8007eed4>] (tick_handle_oneshot_broadcast) from [<800221b8>] (mxc_timer_interrupt+0x2c/0x34)
    [ 4539.824665] [<800221b8>] (mxc_timer_interrupt) from [<8006c33c>] (handle_irq_event_percpu+0x54/0x180)
    [ 4539.833911] [<8006c33c>] (handle_irq_event_percpu) from [<8006c4ac>] (handle_irq_event+0x44/0x64)
    [ 4539.842807] [<8006c4ac>] (handle_irq_event) from [<8006f430>] (handle_fasteoi_irq+0x80/0x158)
    [ 4539.851354] [<8006f430>] (handle_fasteoi_irq) from [<8006bb04>] (generic_handle_irq+0x2c/0x3c)
    [ 4539.859991] [<8006bb04>] (generic_handle_irq) from [<8000eb80>] (handle_IRQ+0x40/0x90)
    [ 4539.867929] [<8000eb80>] (handle_IRQ) from [<800084e8>] (gic_handle_irq+0x2c/0x5c)
    [ 4539.875528] [<800084e8>] (gic_handle_irq) from [<806efb40>] (__irq_svc+0x40/0x50)
    [ 4539.883022] Exception stack(0xdc093dc0 to 0xdc093e08)
    [ 4539.888095] 3dc0: 0001ae04 ffffffff 00000182 802b3d88 00000057 80a336a4 00000ed8 00000fa0
    [ 4539.896291] 3de0: 00000000 80a601f0 00000000 dc0a0000 00000000 dc093e08 806e6e40 802b3db8
    [ 4539.904480] 3e00: 200f0013 ffffffff
    [ 4539.908000] [<806efb40>] (__irq_svc) from [<802b3db8>] (__loop_delay+0x0/0x10)
    [ 4539.915236] ---[ end trace 2755275968036b0d ]---

It does no harm, since the file system has already been closed. I did not yet find out the root cause, but it only happens, when the kernel receives LINUX_REBOOT_CMD_POWER_OFF. It does not happen on LINUX_REBOOT_CMD_HALT. Looking at the code I stumbled over commit d3e1746 (ENGR00243120). This add a default power off handler, which causes the kernel to call machine_power_off() instead of machine_halt(). The main difference between the two is, that the latter does not return while the first one does. The crash happens after the return and looking here and here we are not the only people having problems with this.
The question is how to fix it. The power off handler introduced by ENGR00243120 seems to have no effect on most iMX boards. So I don't think it should be installed by default. Maybe this could be controlled by a *.dts entry. Furthermore I don't really understand the difference in behavior between machine_halt() and machine_power_off(). I would have expected both to behave identically in case the power off function does not what it is supposed to do. Also I'm wondering if the ARM has something like Intel's HLT instruction to replace the busy loop in order to reduce power consumption in this state.

@linux4kix
Copy link
Owner

I have accepted that upstream patch because as you know the Cubox-i and
hummingboard do not power off. I need to hook into machine_power_off so I
can then put the machine in the safest, lowest power mode possible while
still on.

The crash is trivial and happens because local irqs have been disabled but
not all drivers have disabled their interrupts properly. It is on my list
but not a priority because we get far enough into the halt that nothing bad
can become of it.

On Mon, Sep 15, 2014 at 11:22 AM, Rudi Ihle [email protected]
wrote:

When shutting down the system people experience a crash like this:

[ 4535.825460] reboot: Power down
[ 4535.832833] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
[ 4535.832833]
[ 4539.724102] ------------[ cut here ]------------
[ 4539.728768] WARNING: CPU: 0 PID: 1 at kernel/time/tick-broadcast.c:604 tick_handle_oneshot_broadcast+0x1d8/0x200()
[ 4539.739125] Modules linked in: bnep bluetooth 6lowpan_iphc brcmfmac brcmutil cfg80211 rfkill joydev ir_lirc_codec c
[ 4539.765959] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.14-cubox-i #1
[ 4539.773234] [<80015478>] (unwind_backtrace) from [<80011620>] (show_stack+0x10/0x14)
[ 4539.781010] [<80011620>] (show_stack) from [<806ea118>] (dump_stack+0x84/0x94)
[ 4539.788269] [<806ea118>] (dump_stack) from [<8002b964>] (warn_slowpath_common+0x70/0x8c)
[ 4539.796387] [<8002b964>] (warn_slowpath_common) from [<8002ba1c>] (warn_slowpath_null+0x1c/0x24)
[ 4539.805200] [<8002ba1c>] (warn_slowpath_null) from [<8007eed4>] (tick_handle_oneshot_broadcast+0x1d8/0x200)
[ 4539.814974] [<8007eed4>] (tick_handle_oneshot_broadcast) from [<800221b8>] (mxc_timer_interrupt+0x2c/0x34)
[ 4539.824665] [<800221b8>] (mxc_timer_interrupt) from [<8006c33c>] (handle_irq_event_percpu+0x54/0x180)
[ 4539.833911] [<8006c33c>] (handle_irq_event_percpu) from [<8006c4ac>] (handle_irq_event+0x44/0x64)
[ 4539.842807] [<8006c4ac>] (handle_irq_event) from [<8006f430>] (handle_fasteoi_irq+0x80/0x158)
[ 4539.851354] [<8006f430>] (handle_fasteoi_irq) from [<8006bb04>] (generic_handle_irq+0x2c/0x3c)
[ 4539.859991] [<8006bb04>] (generic_handle_irq) from [<8000eb80>] (handle_IRQ+0x40/0x90)
[ 4539.867929] [<8000eb80>] (handle_IRQ) from [<800084e8>] (gic_handle_irq+0x2c/0x5c)
[ 4539.875528] [<800084e8>] (gic_handle_irq) from [<806efb40>] (__irq_svc+0x40/0x50)
[ 4539.883022] Exception stack(0xdc093dc0 to 0xdc093e08)
[ 4539.888095] 3dc0: 0001ae04 ffffffff 00000182 802b3d88 00000057 80a336a4 00000ed8 00000fa0
[ 4539.896291] 3de0: 00000000 80a601f0 00000000 dc0a0000 00000000 dc093e08 806e6e40 802b3db8
[ 4539.904480] 3e00: 200f0013 ffffffff
[ 4539.908000] [<806efb40>] (__irq_svc) from [<802b3db8>] (__loop_delay+0x0/0x10)
[ 4539.915236] ---[ end trace 2755275968036b0d ]---

It does no harm, since the file system has already been closed. I did not
yet find out the root cause, but it only happens, when the kernel receives
LINUX_REBOOT_CMD_POWER_OFF. It does not happen on LINUX_REBOOT_CMD_HALT.
Looking at the code I stumbled over commit d3e1746
d3e1746
(ENGR00243120). This add a default power off handler, which causes the
kernel to call machine_power_off()
https://github.com/linux4kix/linux-linaro-stable-mx6/blob/linux-linaro-lsk-v3.14-mx6/arch/arm/kernel/process.c#L215
instead of machine_halt()
https://github.com/linux4kix/linux-linaro-stable-mx6/blob/linux-linaro-lsk-v3.14-mx6/arch/arm/kernel/process.c#L200.
The main difference between the two is, that the latter does not return
while the first one does. The crash happens after the return and looking
here
http://lists.infradead.org/pip%20ermail/linux-arm-kernel/2012-August/114864.html
and here https://lkml.org/lkml/2014/3/24/299 we are not the only people
having problems with this.
The question is how to fix it. The power off handler introduced by
ENGR00243120 seems to have no effect on most iMX boards. So I don't think
it should be installed by default. Maybe this could be controlled by a
*.dts entry. Furthermore I don't really understand the difference in
behavior between machine_halt() and machine_power_off(). I would have
expected both to behave identically in case the power off function does not
what it is supposed to do. Also I'm wondering if the ARM has something like
Intel's HLT instruction to replace the busy loop in order to reduce power
consumption in this state.


Reply to this email directly or view it on GitHub
#17.

@warped-rudi
Copy link
Contributor Author

The oops can probably be avoided by stopping the GPT. But to me it appears that when machine_power_off() or machine_halt() return to their callers we do get this 'Attempted to kill init' panic.

BTW, here is a slightly improved version of the patch. Do you want a PR?

wolfgar pushed a commit to wolfgar/linux-linaro-stable-mx6 that referenced this issue Sep 21, 2014
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      linux4kix#1  linux4kix#2  linux4kix#3  linux4kix#4  linux4kix#5  linux4kix#6  linux4kix#7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  linux4kix#8  linux4kix#9 linux4kix#10 linux4kix#11 linux4kix#12 linux4kix#13 linux4kix#14 linux4kix#15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: linux4kix#16 linux4kix#17 linux4kix#18 linux4kix#19 linux4kix#20 linux4kix#21 linux4kix#22 linux4kix#23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: linux4kix#24 linux4kix#25 linux4kix#26 linux4kix#27 linux4kix#28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    linux4kix#1 linux4kix#2 linux4kix#3 linux4kix#4 linux4kix#5 linux4kix#6 linux4kix#7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Libin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
wolfgar pushed a commit to wolfgar/linux-linaro-stable-mx6 that referenced this issue Sep 21, 2014
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          linux4kix#1   linux4kix#2   linux4kix#3   linux4kix#4   linux4kix#5   linux4kix#6   linux4kix#7
[    0.603005] .... node   linux4kix#1, CPUs:     linux4kix#8   linux4kix#9  linux4kix#10  linux4kix#11  linux4kix#12  linux4kix#13  linux4kix#14  linux4kix#15
[    1.200005] .... node   linux4kix#2, CPUs:    linux4kix#16  linux4kix#17  linux4kix#18  linux4kix#19  linux4kix#20  linux4kix#21  linux4kix#22  linux4kix#23
[    1.796005] .... node   linux4kix#3, CPUs:    linux4kix#24  linux4kix#25  linux4kix#26  linux4kix#27  linux4kix#28  #29  #30  #31
[    2.393005] .... node   linux4kix#4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   linux4kix#5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   linux4kix#6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   linux4kix#7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   linux4kix#8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   linux4kix#9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  linux4kix#10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  linux4kix#11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  linux4kix#12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  linux4kix#13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  linux4kix#14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  linux4kix#15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
wolfgar pushed a commit to wolfgar/linux-linaro-stable-mx6 that referenced this issue Sep 21, 2014
BugLink: http://bugs.launchpad.net/bugs/1235977

The profile introspection seq file has a locking bug when policy is viewed
from a virtual root (task in a policy namespace), introspection from the
real root is not affected.

The test for root
    while (parent) {
is correct for the real root, but incorrect for tasks in a policy namespace.
This allows the task to walk backup the policy tree past its virtual root
causing it to be unlocked before the virtual root should be in the p_stop
fn.

This results in the following lockdep back trace:
[   78.479744] [ BUG: bad unlock balance detected! ]
[   78.479792] 3.11.0-11-generic linux4kix#17 Not tainted
[   78.479838] -------------------------------------
[   78.479885] grep/2223 is trying to release lock (&ns->lock) at:
[   78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[   78.480002] but there are no more locks to release!
[   78.480037]
[   78.480037] other info that might help us debug this:
[   78.480037] 1 lock held by grep/2223:
[   78.480037]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0
[   78.480037]
[   78.480037] stack backtrace:
[   78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic linux4kix#17
[   78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   78.480037]  ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190
[   78.480037]  ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190
[   78.480037]  ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7
[   78.480037] Call Trace:
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff817b97ef>] dump_stack+0x54/0x74
[   78.480037]  [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300
[   78.480037]  [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff810e5d5c>] lock_release+0xac/0x310
[   78.480037]  [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180
[   78.480037]  [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff81376c91>] p_stop+0x51/0x90
[   78.480037]  [<ffffffff81211408>] seq_read+0x288/0x3d0
[   78.480037]  [<ffffffff811e9d9e>] vfs_read+0x9e/0x170
[   78.480037]  [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0
[   78.480037]  [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: John Johansen <[email protected]>
Signed-off-by: James Morris <[email protected]>
wolfgar pushed a commit to wolfgar/linux-linaro-stable-mx6 that referenced this issue Sep 21, 2014
This patch makes a copy of the 'template_fmt' function argument so that
the latter will not be modified by strsep(), which does the splitting by
replacing the given separator with '\0'.

 IMA: No TPM chip found, activating TPM-bypass!
 Unable to handle kernel pointer dereference at virtual kernel address 0000000000842000
 Oops: 0004 [linux4kix#1] SMP
 Modules linked in:
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc2-00098-g3ce1217d6cd5 linux4kix#17
 task: 000000003ffa0000 ti: 000000003ff84000 task.ti: 000000003ff84000
 Krnl PSW : 0704e00180000000 000000000044bf88 (strsep+0x7c/0xa0)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
 Krnl GPRS: 000000000000007c 000000000000007c 000000003ff87d90 0000000000821fd8
            0000000000000000 000000000000007c 0000000000aa37e0 0000000000aa9008
            0000000000000051 0000000000a114d8 0000000100000002 0000000000842bde
            0000000000842bdf 00000000006f97f0 000000000040062c 000000003ff87cf0
 Krnl Code: 000000000044bf7c: a7f4000a           brc     15,44bf90
            000000000044bf80: b90200cc           ltgr    %r12,%r12
           #000000000044bf84: a7840006           brc     8,44bf90
           >000000000044bf88: 9200c000           mvi     0(%r12),0
            000000000044bf8c: 41c0c001           la      %r12,1(%r12)
            000000000044bf90: e3c020000024       stg     %r12,0(%r2)
            000000000044bf96: b904002b           lgr     %r2,%r11
            000000000044bf9a: ebbcf0700004       lmg     %r11,%r12,112(%r15)
 Call Trace:
 ([<00000000004005fe>] ima_init_template+0xa2/0x1bc)
  [<0000000000a7c896>] ima_init+0x7a/0xa8
  [<0000000000a7c938>] init_ima+0x24/0x40
  [<00000000001000e8>] do_one_initcall+0x68/0x128
  [<0000000000a4eb56>] kernel_init_freeable+0x20a/0x2b4
  [<00000000006a1ff4>] kernel_init+0x30/0x178
  [<00000000006b69fe>] kernel_thread_starter+0x6/0xc
  [<00000000006b69f8>] kernel_thread_starter+0x0/0xc
 Last Breaking-Event-Address:
  [<000000000044bf42>] strsep+0x36/0xa0

Fixes commit: adf53a7 ima: new templates management mechanism

Changelog v1:
- make template_fmt 'const char *' (reported-by James Morris)
- fix kstrdup memory leak (reported-by James Morris)

Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
tomlohave pushed a commit to tomlohave/linux-linaro-stable-mx6 that referenced this issue Feb 21, 2015
tomlohave pushed a commit to tomlohave/linux-linaro-stable-mx6 that referenced this issue Feb 21, 2015
…sk-v3.14-mx6"

This reverts commit ee05b81, reversing
changes made to 828f1bb.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants