-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Boot frozen at lpddr4 swffc start #2
Comments
@JoeSandom I also face the same behavior on Hummingboard Pulse. What was your solution / workaround? Simply remove the 58e9698 commit ? |
jnettlet
pushed a commit
that referenced
this issue
Jan 7, 2021
…d ops [ Upstream commit 52719fc ] There are three ways pseries_suspend_begin() can be reached: 1. When "mem" is written to /sys/power/state: kobj_attr_store() -> state_store() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This never works because there is no way to supply a valid stream id using this interface, and H_VASI_STATE is called with a stream id of zero. So this call path is useless at best. 2. When a stream id is written to /sys/devices/system/power/hibernate. pseries_suspend_begin() is polled directly from store_hibernate() until the stream is in the "Suspending" state (i.e. the platform is ready for the OS to suspend execution): dev_attr_store() -> store_hibernate() -> pseries_suspend_begin() 3. When a stream id is written to /sys/devices/system/power/hibernate (continued). After #2, pseries_suspend_begin() is called once again from the pm core: dev_attr_store() -> store_hibernate() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This is redundant because the VASI suspend state is already known to be Suspending. The begin() callback of platform_suspend_ops is optional, so we can simply remove that assignment with no loss of function. Fixes: 32d8ad4 ("powerpc/pseries: Partition hibernation support") Signed-off-by: Nathan Lynch <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jan 7, 2021
[ Upstream commit 4a9d81c ] If the elem is deleted during be iterated on it, the iteration process will fall into an endless loop. kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [nfsd:17137] PID: 17137 TASK: ffff8818d93c0000 CPU: 4 COMMAND: "nfsd" [exception RIP: __state_in_grace+76] RIP: ffffffffc00e817c RSP: ffff8818d3aefc98 RFLAGS: 00000246 RAX: ffff881dc0c38298 RBX: ffffffff81b03580 RCX: ffff881dc02c9f50 RDX: ffff881e3fce8500 RSI: 0000000000000001 RDI: ffffffff81b03580 RBP: ffff8818d3aefca0 R8: 0000000000000020 R9: ffff8818d3aefd40 R10: ffff88017fc03800 R11: ffff8818e83933c0 R12: ffff8818d3aefd40 R13: 0000000000000000 R14: ffff8818e8391068 R15: ffff8818fa6e4000 CS: 0010 SS: 0018 #0 [ffff8818d3aefc98] opens_in_grace at ffffffffc00e81e3 [grace] #1 [ffff8818d3aefca8] nfs4_preprocess_stateid_op at ffffffffc02a3e6c [nfsd] #2 [ffff8818d3aefd18] nfsd4_write at ffffffffc028ed5b [nfsd] #3 [ffff8818d3aefd80] nfsd4_proc_compound at ffffffffc0290a0d [nfsd] #4 [ffff8818d3aefdd0] nfsd_dispatch at ffffffffc027b800 [nfsd] #5 [ffff8818d3aefe08] svc_process_common at ffffffffc02017f3 [sunrpc] #6 [ffff8818d3aefe70] svc_process at ffffffffc0201ce3 [sunrpc] #7 [ffff8818d3aefe98] nfsd at ffffffffc027b117 [nfsd] #8 [ffff8818d3aefec8] kthread at ffffffff810b88c1 #9 [ffff8818d3aeff50] ret_from_fork at ffffffff816d1607 The troublemake elem: crash> lock_manager ffff881dc0c38298 struct lock_manager { list = { next = 0xffff881dc0c38298, prev = 0xffff881dc0c38298 }, block_opens = false } Fixes: c87fb4a ("lockd: NLM grace period shouldn't block NFSv4 opens") Signed-off-by: Cheng Lin <[email protected]> Signed-off-by: Yi Wang <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jan 25, 2021
[ Upstream commit d9e4498 ] Like other tunneling interfaces, the bareudp doesn't need TXLOCK. So, It is good to set the NETIF_F_LLTX flag to improve performance and to avoid lockdep's false-positive warning. Test commands: ip netns add A ip netns add B ip link add veth0 netns A type veth peer name veth1 netns B ip netns exec A ip link set veth0 up ip netns exec A ip a a 10.0.0.1/24 dev veth0 ip netns exec B ip link set veth1 up ip netns exec B ip a a 10.0.0.2/24 dev veth1 for i in {2..1} do let A=$i-1 ip netns exec A ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec A ip link set bareudp$i up ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \ dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i ip netns exec B ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec B ip link set bareudp$i up ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \ dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i done ip netns exec A ping 10.0.2.2 Splat looks like: [ 96.992803][ T822] ============================================ [ 96.993954][ T822] WARNING: possible recursive locking detected [ 96.995102][ T822] 5.10.0+ #819 Not tainted [ 96.995927][ T822] -------------------------------------------- [ 96.997091][ T822] ping/822 is trying to acquire lock: [ 96.998083][ T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 96.999813][ T822] [ 96.999813][ T822] but task is already holding lock: [ 97.001192][ T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.002908][ T822] [ 97.002908][ T822] other info that might help us debug this: [ 97.004401][ T822] Possible unsafe locking scenario: [ 97.004401][ T822] [ 97.005784][ T822] CPU0 [ 97.006407][ T822] ---- [ 97.007010][ T822] lock(_xmit_NONE#2); [ 97.007779][ T822] lock(_xmit_NONE#2); [ 97.008550][ T822] [ 97.008550][ T822] *** DEADLOCK *** [ 97.008550][ T822] [ 97.010057][ T822] May be due to missing lock nesting notation [ 97.010057][ T822] [ 97.011594][ T822] 7 locks held by ping/822: [ 97.012426][ T822] #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00 [ 97.014191][ T822] #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.016045][ T822] #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.017897][ T822] #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.019684][ T822] #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp] [ 97.021573][ T822] #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.023424][ T822] #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.025259][ T822] [ 97.025259][ T822] stack backtrace: [ 97.026349][ T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819 [ 97.027609][ T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 97.029407][ T822] Call Trace: [ 97.030015][ T822] dump_stack+0x99/0xcb [ 97.030783][ T822] __lock_acquire.cold.77+0x149/0x3a9 [ 97.031773][ T822] ? stack_trace_save+0x81/0xa0 [ 97.032661][ T822] ? register_lock_class+0x1910/0x1910 [ 97.033673][ T822] ? register_lock_class+0x1910/0x1910 [ 97.034679][ T822] ? rcu_read_lock_sched_held+0x91/0xc0 [ 97.035697][ T822] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 97.036690][ T822] lock_acquire+0x1b2/0x730 [ 97.037515][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.038466][ T822] ? check_flags+0x50/0x50 [ 97.039277][ T822] ? netif_skb_features+0x296/0x9c0 [ 97.040226][ T822] ? validate_xmit_skb+0x29/0xb10 [ 97.041151][ T822] _raw_spin_lock+0x30/0x70 [ 97.041977][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.042927][ T822] __dev_queue_xmit+0x1f52/0x2960 [ 97.043852][ T822] ? netdev_core_pick_tx+0x290/0x290 [ 97.044824][ T822] ? mark_held_locks+0xb7/0x120 [ 97.045712][ T822] ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 [ 97.046824][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.047771][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.048710][ T822] ? trace_hardirqs_on+0x41/0x120 [ 97.049626][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.050556][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.051509][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.052443][ T822] ? check_chain_key+0x244/0x5f0 [ 97.053352][ T822] ? rcu_read_lock_bh_held+0x56/0xa0 [ 97.054317][ T822] ? ip_finish_output2+0x6ea/0x2020 [ 97.055263][ T822] ? pneigh_lookup+0x410/0x410 [ 97.056135][ T822] ip_finish_output2+0x6ea/0x2020 [ ... ] Acked-by: Guillaume Nault <[email protected]> Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Taehee Yoo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jan 25, 2021
commit 3a21777 upstream. We had kernel panic, it is caused by unload module and last close confirmation. call trace: [1196029.743127] free_sess+0x15/0x50 [rtrs_client] [1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client] [1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client] [1196029.743130] close_rtrs+0x25/0x50 [rnbd_client] [1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client] [1196029.743132] __x64_sys_delete_module+0x190/0x260 And in the crashdump confirmation kworker is also running. PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2" #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891 #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98 #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938 #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7 #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client] #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client] #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client] #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client] #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client] The problem is both code path try to close same session, which lead to panic. To fix it, just skip the sess if the refcount already drop to 0. Fixes: f7a7a5c ("block/rnbd: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jan 25, 2021
[ Upstream commit b774134 ] The buffer list can have zero skb as following path: tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so we need to check the list before casting an &sk_buff. Fault report: [] tipc: Bulk publication failure [] general protection fault, probably for non-canonical [#1] PREEMPT [...] [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf] [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2 [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011 [] RIP: 0010:tipc_link_xmit+0xc1/0x2180 [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...] [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202 [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8 [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148 [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018 [] FS: 0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...] [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0 Fixes: af9b028 ("tipc: make media xmit call outside node spinlock context") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Le <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Feb 1, 2021
commit fb28610 upstream. While testing the error paths of relocation I hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc6+ #217 Not tainted ------------------------------------------------------ mount/779 is trying to acquire lock: ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340 but task is already holding lock: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-root-00){++++}-{3:3}: down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x27/0x100 btrfs_read_lock_root_node+0x31/0x40 btrfs_search_slot+0x462/0x8f0 btrfs_update_root+0x55/0x2b0 btrfs_drop_snapshot+0x398/0x750 clean_dirty_subvols+0xdf/0x120 btrfs_recover_relocation+0x534/0x5a0 btrfs_start_pre_rw_mount+0xcb/0x170 open_ctree+0x151f/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (sb_internal#2){.+.+}-{0:0}: start_transaction+0x444/0x700 insert_balance_item.isra.0+0x37/0x320 btrfs_balance+0x354/0xf40 btrfs_ioctl_balance+0x2cf/0x380 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}: __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 __mutex_lock+0x7e/0x7b0 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-root-00); lock(sb_internal#2); lock(btrfs-root-00); lock(&fs_info->balance_mutex); *** DEADLOCK *** 2 locks held by mount/779: #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 stack backtrace: CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 ? trace_call_bpf+0x139/0x260 __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 ? btrfs_recover_balance+0x2f0/0x340 __mutex_lock+0x7e/0x7b0 ? btrfs_recover_balance+0x2f0/0x340 ? btrfs_recover_balance+0x2f0/0x340 ? rcu_read_lock_sched_held+0x3f/0x80 ? kmem_cache_alloc_trace+0x2c4/0x2f0 ? btrfs_get_64+0x5e/0x100 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea ? rcu_read_lock_sched_held+0x3f/0x80 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? __kmalloc_track_caller+0x2f2/0x320 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 ? capable+0x3a/0x60 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is straightforward to fix, simply release the path before we setup the balance_ctl. CC: [email protected] # 4.4+ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Feb 1, 2021
commit cf9d052 upstream. In Linux, if a driver does disable_irq() and later does enable_irq() on its interrupt, I believe it's expecting these properties: * If an interrupt was pending when the driver disabled then it will still be pending after the driver re-enables. * If an edge-triggered interrupt comes in while an interrupt is disabled it should assert when the interrupt is re-enabled. If you think that the above sounds a lot like the disable_irq() and enable_irq() are supposed to be masking/unmasking the interrupt instead of disabling/enabling it then you've made an astute observation. Specifically when talking about interrupts, "mask" usually means to stop posting interrupts but keep tracking them and "disable" means to fully shut off interrupt detection. It's unfortunate that this is so confusing, but presumably this is all the way it is for historical reasons. Perhaps more confusing than the above is that, even though clients of IRQs themselves don't have a way to request mask/unmask vs. disable/enable calls, IRQ chips themselves can implement both. ...and yet more confusing is that if an IRQ chip implements disable/enable then they will be called when a client driver calls disable_irq() / enable_irq(). It does feel like some of the above could be cleared up. However, without any other core interrupt changes it should be clear that when an IRQ chip gets a request to "disable" an IRQ that it has to treat it like a mask of that IRQ. In any case, after that long interlude you can see that the "unmask and clear" can break things. Maulik tried to fix it so that we no longer did "unmask and clear" in commit 71266d9 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback"), but it only handled the PDC case and it had problems (it caused sc7180-trogdor devices to fail to suspend). Let's fix. >From my understanding the source of the phantom interrupt in the were these two things: 1. One that could have been introduced in msm_gpio_irq_set_type() (only for the non-PDC case). 2. Edges could have been detected when a GPIO was muxed away. Fixing case #1 is easy. We can just add a clear in msm_gpio_irq_set_type(). Fixing case #2 is harder. Let's use a concrete example. In sc7180-trogdor.dtsi we configure the uart3 to have two pinctrl states, sleep and default, and mux between the two during runtime PM and system suspend (see geni_se_resources_{on,off}() for more details). The difference between the sleep and default state is that the RX pin is muxed to a GPIO during sleep and muxed to the UART otherwise. As per Qualcomm, when we mux the pin over to the UART function the PDC (or the non-PDC interrupt detection logic) is still watching it / latching edges. These edges don't cause interrupts because the current code masks the interrupt unless we're entering suspend. However, as soon as we enter suspend we unmask the interrupt and it's counted as a wakeup. Let's deal with the problem like this: * When we mux away, we'll mask our interrupt. This isn't necessary in the above case since the client already masked us, but it's a good idea in general. * When we mux back will clear any interrupts and unmask. Fixes: 4b7618f ("pinctrl: qcom: Add irq_enable callback for msm gpio") Fixes: 71266d9 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback") Signed-off-by: Douglas Anderson <[email protected]> Reviewed-by: Maulik Shah <[email protected]> Tested-by: Maulik Shah <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/20210114191601.v7.4.I7cf3019783720feb57b958c95c2b684940264cd1@changeid Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
commit 2e99ded upstream. Similar to commit 165ae7a ("igb: Report speed and duplex as unknown when device is runtime suspended"), if we try to read speed and duplex sysfs while the device is runtime suspended, igc will complain and stops working: [ 123.449883] igc 0000:03:00.0 enp3s0: PCIe link lost, device now detached [ 123.450052] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 123.450056] #PF: supervisor read access in kernel mode [ 123.450058] #PF: error_code(0x0000) - not-present page [ 123.450059] PGD 0 P4D 0 [ 123.450064] Oops: 0000 [#1] SMP NOPTI [ 123.450068] CPU: 0 PID: 2525 Comm: udevadm Tainted: G U W OE 5.10.0-1002-oem #2+rkl2-Ubuntu [ 123.450078] RIP: 0010:igc_rd32+0x1c/0x90 [igc] [ 123.450080] Code: c0 5d c3 b8 fd ff ff ff c3 0f 1f 44 00 00 0f 1f 44 00 00 55 89 f0 48 89 e5 41 56 41 55 41 54 49 89 c4 53 48 8b 57 08 48 01 d0 <44> 8b 28 41 83 fd ff 74 0c 5b 44 89 e8 41 5c 41 5d 4 [ 123.450083] RSP: 0018:ffffb0d100d6fcc0 EFLAGS: 00010202 [ 123.450085] RAX: 0000000000000008 RBX: ffffb0d100d6fd30 RCX: 0000000000000000 [ 123.450087] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff945a12716c10 [ 123.450089] RBP: ffffb0d100d6fce0 R08: ffff945a12716550 R09: ffff945a09874000 [ 123.450090] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008 [ 123.450092] R13: ffff945a12716000 R14: ffff945a037da280 R15: ffff945a037da290 [ 123.450094] FS: 00007f3b34c868c0(0000) GS:ffff945b89200000(0000) knlGS:0000000000000000 [ 123.450096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.450098] CR2: 0000000000000008 CR3: 00000001144de006 CR4: 0000000000770ef0 [ 123.450100] PKRU: 55555554 [ 123.450101] Call Trace: [ 123.450111] igc_ethtool_get_link_ksettings+0xd6/0x1b0 [igc] [ 123.450118] __ethtool_get_link_ksettings+0x71/0xb0 [ 123.450123] duplex_show+0x74/0xc0 [ 123.450129] dev_attr_show+0x1d/0x40 [ 123.450134] sysfs_kf_seq_show+0xa1/0x100 [ 123.450137] kernfs_seq_show+0x27/0x30 [ 123.450142] seq_read+0xb7/0x400 [ 123.450148] ? common_file_perm+0x72/0x170 [ 123.450151] kernfs_fop_read+0x35/0x1b0 [ 123.450155] vfs_read+0xb5/0x1b0 [ 123.450157] ksys_read+0x67/0xe0 [ 123.450160] __x64_sys_read+0x1a/0x20 [ 123.450164] do_syscall_64+0x38/0x90 [ 123.450168] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 123.450170] RIP: 0033:0x7f3b351fe142 [ 123.450173] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 123.450174] RSP: 002b:00007fffef2ec138 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 123.450177] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b351fe142 [ 123.450179] RDX: 0000000000001001 RSI: 00005644c047f070 RDI: 0000000000000003 [ 123.450180] RBP: 00007fffef2ec340 R08: 00005644c047f070 R09: 00007f3b352d9320 [ 123.450182] R10: 00005644c047c010 R11: 0000000000000246 R12: 00005644c047cbf0 [ 123.450184] R13: 00005644c047e6d0 R14: 0000000000000003 R15: 00007fffef2ec140 [ 123.450189] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg bnep toshiba_acpi industrialio toshiba_haps hp_accel lis3lv02d btusb btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds nls_iso8859_1 snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_hda_codec_hdmi snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core ath10k_pci snd_hwdep intel_rapl_msr intel_rapl_common ath10k_core soundwire_bus snd_soc_core x86_pkg_temp_thermal ath intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine mac80211 snd_pcm coretemp snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_intel cfg80211 snd_seq snd_seq_device snd_timer mei_hdcp kvm libarc4 snd crct10dif_pclmul ghash_clmulni_intel aesni_intel mei_me dell_wmi [ 123.450266] dell_smbios soundcore sparse_keymap dcdbas crypto_simd cryptd mei dell_uart_backlight glue_helper ee1004 wmi_bmof intel_wmi_thunderbolt dell_wmi_descriptor mac_hid efi_pstore acpi_pad acpi_tad intel_cstate sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul rc_core drm intel_lpss_pci i2c_i801 ahci igc intel_lpss i2c_smbus idma64 xhci_pci libahci virt_dma xhci_pci_renesas wmi video pinctrl_tigerlake [ 123.450335] CR2: 0000000000000008 [ 123.450338] ---[ end trace 9f731e38b53c35cc ]--- The more generic approach will be wrap get_link_ksettings() with begin() and complete() callbacks, and calls runtime resume and runtime suspend routine respectively. However, igc is like igb, runtime resume routine uses rtnl_lock() which upper ethtool layer also uses. So to prevent a deadlock on rtnl, take a different approach, use pm_runtime_suspended() to avoid reading register while device is runtime suspended. Fixes: 8c5ad0d ("igc: Add ethtool support") Signed-off-by: Kai-Heng Feng <[email protected]> Acked-by: Sasha Neftin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
commit 3f618ab upstream. When building with KASAN and LKDTM, clang may implictly generate an asan.module_ctor function in the LKDTM rodata object. The Makefile moves the lkdtm_rodata_do_nothing() function into .rodata by renaming the file's .text section to .rodata, and consequently also moves the ctor function into .rodata, leading to a boot time crash (splat below) when the ctor is invoked by do_ctors(). Let's prevent this by marking the function as noinstr rather than notrace, and renaming the file's .noinstr.text to .rodata. Marking the function as noinstr will prevent tracing and kprobes, and will inhibit any undesireable compiler instrumentation. The ctor function (if any) will be placed in .text and will work correctly. Example splat before this patch is applied: [ 0.916359] Unable to handle kernel execute from non-executable memory at virtual address ffffa0006b60f5ac [ 0.922088] Mem abort info: [ 0.922828] ESR = 0x8600000e [ 0.923635] EC = 0x21: IABT (current EL), IL = 32 bits [ 0.925036] SET = 0, FnV = 0 [ 0.925838] EA = 0, S1PTW = 0 [ 0.926714] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000427b3000 [ 0.928489] [ffffa0006b60f5ac] pgd=000000023ffff003, p4d=000000023ffff003, pud=000000023fffe003, pmd=0068000042000f01 [ 0.931330] Internal error: Oops: 8600000e [#1] PREEMPT SMP [ 0.932806] Modules linked in: [ 0.933617] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7 #2 [ 0.935620] Hardware name: linux,dummy-virt (DT) [ 0.936924] pstate: 40400005 (nZcv daif +PAN -UAO -TCO BTYPE=--) [ 0.938609] pc : asan.module_ctor+0x0/0x14 [ 0.939759] lr : do_basic_setup+0x4c/0x70 [ 0.940889] sp : ffff27b600177e30 [ 0.941815] x29: ffff27b600177e30 x28: 0000000000000000 [ 0.943306] x27: 0000000000000000 x26: 0000000000000000 [ 0.944803] x25: 0000000000000000 x24: 0000000000000000 [ 0.946289] x23: 0000000000000001 x22: 0000000000000000 [ 0.947777] x21: ffffa0006bf4a890 x20: ffffa0006befb6c0 [ 0.949271] x19: ffffa0006bef9358 x18: 0000000000000068 [ 0.950756] x17: fffffffffffffff8 x16: 0000000000000000 [ 0.952246] x15: 0000000000000000 x14: 0000000000000000 [ 0.953734] x13: 00000000838a16d5 x12: 0000000000000001 [ 0.955223] x11: ffff94000da74041 x10: dfffa00000000000 [ 0.956715] x9 : 0000000000000000 x8 : ffffa0006b60f5ac [ 0.958199] x7 : f9f9f9f9f9f9f9f9 x6 : 000000000000003f [ 0.959683] x5 : 0000000000000040 x4 : 0000000000000000 [ 0.961178] x3 : ffffa0006bdc15a0 x2 : 0000000000000005 [ 0.962662] x1 : 00000000000000f9 x0 : ffffa0006bef9350 [ 0.964155] Call trace: [ 0.964844] asan.module_ctor+0x0/0x14 [ 0.965895] kernel_init_freeable+0x158/0x198 [ 0.967115] kernel_init+0x14/0x19c [ 0.968104] ret_from_fork+0x10/0x30 [ 0.969110] Code: 00000003 00000000 00000000 00000000 (00000000) [ 0.970815] ---[ end trace b5339784e20d015c ]--- Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Kees Cook <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Stephen Boyd <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we end up triggering an assertion and crashing. Example reproducer: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT # Do a direct IO write with O_DSYNC into a non-aligned range... xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar umount $MNT When running the reproducer an assertion fails and produces the following trace: [ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467 [ 2418.403745] ------------[ cut here ]------------ [ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286! [ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI [ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G D 5.10.15-btrfs-next-87 #1 [ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs] [ 2418.407835] Code: e6 48 c7 (...) [ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246 [ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000 [ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff [ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000 [ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000 [ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000 [ 2418.412713] FS: 00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000 [ 2418.413326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0 [ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2418.415669] Call Trace: [ 2418.416254] btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs] [ 2418.416812] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] [ 2418.417380] btrfs_buffered_write+0x1b0/0x7f0 [btrfs] [ 2418.418315] btrfs_file_write_iter+0x2a9/0x770 [btrfs] [ 2418.418920] new_sync_write+0x11f/0x1c0 [ 2418.419430] vfs_write+0x2bb/0x3b0 [ 2418.419972] __x64_sys_pwrite64+0x90/0xc0 [ 2418.420486] do_syscall_64+0x33/0x80 [ 2418.420979] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2418.421486] RIP: 0033:0x7f54fda0b986 [ 2418.421981] Code: 48 c7 c0 (...) [ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 [ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986 [ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003 [ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400 [ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff [ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000 [ 2418.426148] Modules linked in: btrfs blake2b_generic (...) [ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]--- 1) At btrfs_file_write_iter() we set current->journal_info to BTRFS_DIO_SYNC_STUB; 2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO(); 3) We can't do the direct IO write because it starts at a non-aligned offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from check_direct_IO() which does the alignment check), but we leave current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it at btrfs_dio_iomap_begin(), because we assume we always get there; 4) Then at __btrfs_direct_write() we see that the attempt to do the direct IO write was not successful, 0 bytes written, so we fallback to a buffered write by calling btrfs_buffered_write(); 5) There we call btrfs_check_data_free_space() which in turn calls btrfs_alloc_data_chunk_ondemand() and that calls btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA; 6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion: int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes, enum btrfs_reserve_flush_enum flush) { struct btrfs_space_info *data_sinfo = fs_info->data_sinfo; int ret; ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA || flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE); ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA); (...) So fix that by setting the journal to NULL whenever check_direct_IO() returns a failure. This bug only affects 5.10 kernels, and the regression was introduced in 5.10-rc1 by commit 0eb7929 ("btrfs: dio iomap DSYNC workaround"). The bug does not exist in 5.11 kernels due to commit ecfdc08 ("btrfs: remove dio iomap DSYNC workaround"), which depends on a large patchset that went into the merge window for 5.11. So this is a fix only for 5.10.x stable kernels, as there are people hitting this bug. Fixes: 0eb7929 ("btrfs: dio iomap DSYNC workaround") CC: [email protected] # 5.10 (and only 5.10) Acked-by: David Sterba <[email protected]> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605 Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
[ Upstream commit eaba3b2 ] Unprivileged user can crash kernel by using DRM_IOCTL_NOUVEAU_CHANNEL_ALLOC ioctl. This was reported by trinity[1] fuzzer. [ 71.073906] nouveau 0000:01:00.0: crashme[1329]: channel failed to initialise, -17 [ 71.081730] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 71.088928] #PF: supervisor read access in kernel mode [ 71.094059] #PF: error_code(0x0000) - not-present page [ 71.099189] PGD 119590067 P4D 119590067 PUD 1054f5067 PMD 0 [ 71.104842] Oops: 0000 [#1] SMP NOPTI [ 71.108498] CPU: 2 PID: 1329 Comm: crashme Not tainted 5.8.0-rc6+ #2 [ 71.114993] Hardware name: AMD Pike/Pike, BIOS RPK1506A 09/03/2014 [ 71.121213] RIP: 0010:nouveau_abi16_ioctl_channel_alloc+0x108/0x380 [nouveau] [ 71.128339] Code: 48 89 9d f0 00 00 00 41 8b 4c 24 04 41 8b 14 24 45 31 c0 4c 8d 4b 10 48 89 ee 4c 89 f7 e8 10 11 00 00 85 c0 75 78 48 8b 43 10 <8b> 90 a0 00 00 00 41 89 54 24 08 80 7d 3d 05 0f 86 bb 01 00 00 41 [ 71.147074] RSP: 0018:ffffb4a1809cfd38 EFLAGS: 00010246 [ 71.152526] RAX: 0000000000000000 RBX: ffff98cedbaa1d20 RCX: 00000000000003bf [ 71.159651] RDX: 00000000000003be RSI: 0000000000000000 RDI: 0000000000030160 [ 71.166774] RBP: ffff98cee776de00 R08: ffffdc0144198a08 R09: ffff98ceeefd4000 [ 71.173901] R10: ffff98cee7e81780 R11: 0000000000000001 R12: ffffb4a1809cfe08 [ 71.181214] R13: ffff98cee776d000 R14: ffff98cec519e000 R15: ffff98cee776def0 [ 71.188339] FS: 00007fd926250500(0000) GS:ffff98ceeac80000(0000) knlGS:0000000000000000 [ 71.196418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.202155] CR2: 00000000000000a0 CR3: 0000000106622000 CR4: 00000000000406e0 [ 71.209297] Call Trace: [ 71.211777] ? nouveau_abi16_ioctl_getparam+0x1f0/0x1f0 [nouveau] [ 71.218053] drm_ioctl_kernel+0xac/0xf0 [drm] [ 71.222421] drm_ioctl+0x211/0x3c0 [drm] [ 71.226379] ? nouveau_abi16_ioctl_getparam+0x1f0/0x1f0 [nouveau] [ 71.232500] nouveau_drm_ioctl+0x57/0xb0 [nouveau] [ 71.237285] ksys_ioctl+0x86/0xc0 [ 71.240595] __x64_sys_ioctl+0x16/0x20 [ 71.244340] do_syscall_64+0x4c/0x90 [ 71.248110] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 71.253162] RIP: 0033:0x7fd925d4b88b [ 71.256731] Code: Bad RIP value. [ 71.259955] RSP: 002b:00007ffc743592d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 71.267514] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd925d4b88b [ 71.274637] RDX: 0000000000601080 RSI: 00000000c0586442 RDI: 0000000000000003 [ 71.281986] RBP: 00007ffc74359340 R08: 00007fd926016ce0 R09: 00007fd926016ce0 [ 71.289111] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000400620 [ 71.296235] R13: 00007ffc74359420 R14: 0000000000000000 R15: 0000000000000000 [ 71.303361] Modules linked in: rfkill sunrpc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep kvm_amd snd_seq ccp snd_seq_device snd_pcm kvm snd_timer snd irqbypass soundcore sp5100_tco pcspkr crct10dif_pclmul crc32_pclmul ghash_clmulni_intel wmi_bmof joydev i2c_piix4 fam15h_power k10temp acpi_cpufreq ip_tables xfs libcrc32c sd_mod t10_pi sg nouveau video mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm broadcom bcm_phy_lib ata_generic ahci drm e1000 crc32c_intel libahci serio_raw tg3 libata firewire_ohci firewire_core wmi crc_itu_t dm_mirror dm_region_hash dm_log dm_mod [ 71.365269] CR2: 00000000000000a0 simplified reproducer ---------------------------------8<---------------------------------------- /* * gcc -o crashme crashme.c * ./crashme /dev/dri/renderD128 */ struct drm_nouveau_channel_alloc { uint32_t fb_ctxdma_handle; uint32_t tt_ctxdma_handle; int channel; uint32_t pushbuf_domains; /* Notifier memory */ uint32_t notifier_handle; /* DRM-enforced subchannel assignments */ struct { uint32_t handle; uint32_t grclass; } subchan[8]; uint32_t nr_subchan; }; static struct drm_nouveau_channel_alloc channel; int main(int argc, char *argv[]) { int fd; int rv; if (argc != 2) die("usage: %s <dev>", 0, argv[0]); if ((fd = open(argv[1], O_RDONLY)) == -1) die("open %s", errno, argv[1]); if (ioctl(fd, DRM_IOCTL_NOUVEAU_CHANNEL_ALLOC, &channel) == -1 && errno == EACCES) die("ioctl %s", errno, argv[1]); close(fd); printf("PASS\n"); return 0; } ---------------------------------8<---------------------------------------- [1] https://github.com/kernelslacker/trinity Fixes: eeaf06a ("drm/nouveau/svm: initial support for shared virtual memory") Signed-off-by: Frantisek Hrbata <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Ben Skeggs <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
[ Upstream commit 661f385 ] During connection setup, the application may choose to zero-size inbound and outbound READ queues, as well as the Receive queue. This patch fixes handling of zero-sized queues, but not prevents it. Kamal Heib says in an initial error report: When running the blktests over siw the following shift-out-of-bounds is reported, this is happening because the passed IRD or ORD from the ulp could be zero which will lead to unexpected behavior when calling roundup_pow_of_two(), fix that by blocking zero values of ORD or IRD. UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 20 PID: 3957 Comm: kworker/u64:13 Tainted: G S 5.10.0-rc6 #2 Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x99/0xcb ubsan_epilogue+0x5/0x40 __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3 ? down_write+0x183/0x3d0 siw_qp_modify.cold.8+0x2d/0x32 [siw] ? __local_bh_enable_ip+0xa5/0xf0 siw_accept+0x906/0x1b60 [siw] ? xa_load+0x147/0x1f0 ? siw_connect+0x17a0/0x17a0 [siw] ? lock_downgrade+0x700/0x700 ? siw_get_base_qp+0x1c2/0x340 [siw] ? _raw_spin_unlock_irqrestore+0x39/0x40 iw_cm_accept+0x1f4/0x430 [iw_cm] rdma_accept+0x3fa/0xb10 [rdma_cm] ? check_flush_dependency+0x410/0x410 ? cma_rep_recv+0x570/0x570 [rdma_cm] nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma] ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma] ? lock_release+0x56e/0xcc0 ? lock_downgrade+0x700/0x700 ? lock_downgrade+0x700/0x700 ? __xa_alloc_cyclic+0xef/0x350 ? __xa_alloc+0x2d0/0x2d0 ? rdma_restrack_add+0xbe/0x2c0 [ib_core] ? __ww_mutex_die+0x190/0x190 cma_cm_event_handler+0xf2/0x500 [rdma_cm] iw_conn_req_handler+0x910/0xcb0 [rdma_cm] ? _raw_spin_unlock_irqrestore+0x39/0x40 ? trace_hardirqs_on+0x1c/0x150 ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm] ? __kasan_kmalloc.constprop.7+0xc1/0xd0 cm_work_handler+0x121c/0x17a0 [iw_cm] ? iw_cm_reject+0x190/0x190 [iw_cm] ? trace_hardirqs_on+0x1c/0x150 process_one_work+0x8fb/0x16c0 ? pwq_dec_nr_in_flight+0x320/0x320 worker_thread+0x87/0xb40 ? __kthread_parkme+0xd1/0x1a0 ? process_one_work+0x16c0/0x16c0 kthread+0x35f/0x430 ? kthread_mod_delayed_work+0x180/0x180 ret_from_fork+0x22/0x30 Fixes: a531975 ("rdma/siw: main include file") Fixes: f29dd55 ("rdma/siw: queue pair methods") Fixes: 8b6a361 ("rdma/siw: receive path") Fixes: b9be6f1 ("rdma/siw: transmit path") Fixes: 303ae1c ("rdma/siw: application interface") Link: https://lore.kernel.org/r/[email protected] Reported-by: Kamal Heib <[email protected]> Reported-by: Yi Zhang <[email protected]> Reported-by: kernel test robot <[email protected]> Signed-off-by: Bernard Metzler <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
[ Upstream commit 429fa96 ] The size of tx_valid_cpus was calculated under the assumption that the numa nodes identifiers are continuous, which is not the case in all archs as this could lead to the following panic when trying to access an invalid tx_valid_cpus index, avoid the following panic by using nr_node_ids instead of num_online_nodes() to allocate the tx_valid_cpus size. Kernel attempted to read user page (8) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000008 Faulting instruction address: 0xc0080000081b4a90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: siw(+) rfkill rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm sunrpc ib_umad rdma_cm ib_cm iw_cm i40iw ib_uverbs ib_core i40e ses enclosure scsi_transport_sas ipmi_powernv ibmpowernv at24 ofpart ipmi_devintf regmap_i2c ipmi_msghandler powernv_flash uio_pdrv_genirq uio mtd opal_prd zram ip_tables xfs libcrc32c sd_mod t10_pi ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto aacraid drm_panel_orientation_quirks dm_mod CPU: 40 PID: 3279 Comm: modprobe Tainted: G W X --------- --- 5.11.0-0.rc4.129.eln108.ppc64le #2 NIP: c0080000081b4a90 LR: c0080000081b4a2c CTR: c0000000007ce1c0 REGS: c000000027fa77b0 TRAP: 0300 Tainted: G W X --------- --- (5.11.0-0.rc4.129.eln108.ppc64le) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 44224882 XER: 00000000 CFAR: c0000000007ce200 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0 GPR00: c0080000081b4a2c c000000027fa7a50 c0080000081c3900 0000000000000040 GPR04: c000000002023080 c000000012e1c300 000020072ad70000 0000000000000001 GPR08: c000000001726068 0000000000000008 0000000000000008 c0080000081b5758 GPR12: c0000000007ce1c0 c0000007fffc3000 00000001590b1e40 0000000000000000 GPR16: 0000000000000000 0000000000000001 000000011ad68fc8 00007fffcc09c5c8 GPR20: 0000000000000008 0000000000000000 00000001590b2850 00000001590b1d30 GPR24: 0000000000043d68 000000011ad67a80 000000011ad67a80 0000000000100000 GPR28: c000000012e1c300 c0000000020271c8 0000000000000001 c0080000081bf608 NIP [c0080000081b4a90] siw_init_cpulist+0x194/0x214 [siw] LR [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw] Call Trace: [c000000027fa7a50] [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw] (unreliable) [c000000027fa7a90] [c0080000081b4e68] siw_init_module+0x40/0x2a0 [siw] [c000000027fa7b30] [c0000000000124f4] do_one_initcall+0x84/0x2e0 [c000000027fa7c00] [c000000000267ffc] do_init_module+0x7c/0x350 [c000000027fa7c90] [c00000000026a180] __do_sys_init_module+0x210/0x250 [c000000027fa7db0] [c0000000000387e4] system_call_exception+0x134/0x230 [c000000027fa7e10] [c00000000000d660] system_call_common+0xf0/0x27c Instruction dump: 40810044 3d420000 e8bf0000 e88a82d0 3d420000 e90a82c8 792a1f24 7cc4302a 7d2642aa 79291f24 7d25482a 7d295214 <7d4048a8> 7d4a3b78 7d4049ad 40c2fff4 Fixes: bdcf26b ("rdma/siw: network and RDMA core interface") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Bernard Metzler <[email protected]> Tested-by: Yi Zhang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Mar 14, 2021
commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jul 16, 2021
Pach series "mm: thp: use generic THP migration for NUMA hinting fault", v3. When the THP NUMA fault support was added THP migration was not supported yet. So the ad hoc THP migration was implemented in NUMA fault handling. Since v4.14 THP migration has been supported so it doesn't make too much sense to still keep another THP migration implementation rather than using the generic migration code. It is definitely a maintenance burden to keep two THP migration implementation for different code paths and it is more error prone. Using the generic THP migration implementation allows us remove the duplicate code and some hacks needed by the old ad hoc implementation. A quick grep shows x86_64, PowerPC (book3s), ARM64 ans S390 support both THP and NUMA balancing. The most of them support THP migration except for S390. Zi Yan tried to add THP migration support for S390 before but it was not accepted due to the design of S390 PMD. For the discussion, please see: https://lkml.org/lkml/2018/4/27/953. Per the discussion with Gerald Schaefer in v1 it is acceptible to skip huge PMD for S390 for now. I saw there were some hacks about gup from git history, but I didn't figure out if they have been removed or not since I just found FOLL_NUMA code in the current gup implementation and they seems useful. Patch #1 ~ #2 are preparation patches. Patch #3 is the real meat. Patch #4 ~ #6 keep consistent counters and behaviors with before. Patch #7 skips change huge PMD to prot_none if thp migration is not supported. Test ---- Did some tests to measure the latency of do_huge_pmd_numa_page. The test VM has 80 vcpus and 64G memory. The test would create 2 processes to consume 128G memory together which would incur memory pressure to cause THP splits. And it also creates 80 processes to hog cpu, and the memory consumer processes are bound to different nodes periodically in order to increase NUMA faults. The below test script is used: echo 3 > /proc/sys/vm/drop_caches # Run stress-ng for 24 hours ./stress-ng/stress-ng --vm 2 --vm-bytes 64G --timeout 24h & PID=$! ./stress-ng/stress-ng --cpu $NR_CPUS --timeout 24h & # Wait for vm stressors forked sleep 5 PID_1=`pgrep -P $PID | awk 'NR == 1'` PID_2=`pgrep -P $PID | awk 'NR == 2'` JOB1=`pgrep -P $PID_1` JOB2=`pgrep -P $PID_2` # Bind load jobs to different nodes periodically to force generate # cross node memory access while [ -d "/proc/$PID" ] do taskset -apc 8 $JOB1 taskset -apc 8 $JOB2 sleep 300 taskset -apc 58 $JOB1 taskset -apc 58 $JOB2 sleep 300 done With the above test the histogram of latency of do_huge_pmd_numa_page is as shown below. Since the number of do_huge_pmd_numa_page varies drastically for each run (should be due to scheduler), so I converted the raw number to percentage. patched base @us[stress-ng]: [0] 3.57% 0.16% [1] 55.68% 18.36% [2, 4) 10.46% 40.44% [4, 8) 7.26% 17.82% [8, 16) 21.12% 13.41% [16, 32) 1.06% 4.27% [32, 64) 0.56% 4.07% [64, 128) 0.16% 0.35% [128, 256) < 0.1% < 0.1% [256, 512) < 0.1% < 0.1% [512, 1K) < 0.1% < 0.1% [1K, 2K) < 0.1% < 0.1% [2K, 4K) < 0.1% < 0.1% [4K, 8K) < 0.1% < 0.1% [8K, 16K) < 0.1% < 0.1% [16K, 32K) < 0.1% < 0.1% [32K, 64K) < 0.1% < 0.1% Per the result, patched kernel is even slightly better than the base kernel. I think this is because the lock contention against THP split is less than base kernel due to the refactor. To exclude the affect from THP split, I also did test w/o memory pressure. No obvious regression is spotted. The below is the test result *w/o* memory pressure. patched base @us[stress-ng]: [0] 7.97% 18.4% [1] 69.63% 58.24% [2, 4) 4.18% 2.63% [4, 8) 0.22% 0.17% [8, 16) 1.03% 0.92% [16, 32) 0.14% < 0.1% [32, 64) < 0.1% < 0.1% [64, 128) < 0.1% < 0.1% [128, 256) < 0.1% < 0.1% [256, 512) 0.45% 1.19% [512, 1K) 15.45% 17.27% [1K, 2K) < 0.1% < 0.1% [2K, 4K) < 0.1% < 0.1% [4K, 8K) < 0.1% < 0.1% [8K, 16K) 0.86% 0.88% [16K, 32K) < 0.1% 0.15% [32K, 64K) < 0.1% < 0.1% [64K, 128K) < 0.1% < 0.1% [128K, 256K) < 0.1% < 0.1% The series also survived a series of tests that exercise NUMA balancing migrations by Mel. This patch (of 7): Add orig_pmd to struct vm_fault so the "orig_pmd" parameter used by huge page fault could be removed, just like its PTE counterpart does. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Zi Yan <[email protected]> Cc: Huang Ying <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jul 16, 2021
Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2. Excessive details on MADV_POPULATE_(READ|WRITE) can be found in patch #2. This patch (of 5): Let's make the variable names in the function declaration match the variable names used in the definition. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Peter Xu <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Ram Pai <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jul 16, 2021
ASan reports a memory leak caused by evlist not being deleted on exit in perf-report, perf-script and perf-data. The problem is caused by evlist->session not being deleted, which is allocated in perf_session__read_header, called in perf_session__new if perf_data is in read mode. In case of write mode, the session->evlist is filled by the caller. This patch solves the problem by calling evlist__delete in perf_session__delete if perf_data is in read mode. Changes in v2: - call evlist__delete from within perf_session__delete v1: https://lore.kernel.org/lkml/[email protected]/ ASan report follows: $ ./perf script report flamegraph ================================================================= ==227640==ERROR: LeakSanitizer: detected memory leaks <SNIP unrelated> Indirect leak of 2704 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x7f999e in evlist__new /home/user/linux/tools/perf/util/evlist.c:77:26 #3 0x8ad938 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3797:20 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 568 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x80ce88 in evsel__new_idx /home/user/linux/tools/perf/util/evsel.c:268:24 #3 0x8aed93 in evsel__new /home/user/linux/tools/perf/util/evsel.h:210:9 #4 0x8ae07e in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3853:11 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 264 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbe3e70 in xyarray__new /home/user/linux/tools/lib/perf/xyarray.c:10:23 #3 0xbd7754 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:361:21 #4 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbd77e0 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:365:14 #3 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 7 byte(s) in 1 object(s) allocated from: #0 0x4b8207 in strdup (/home/user/linux/tools/perf/perf+0x4b8207) #1 0x8b4459 in evlist__set_event_name /home/user/linux/tools/perf/util/header.c:2292:16 #2 0x89d862 in process_event_desc /home/user/linux/tools/perf/util/header.c:2313:3 #3 0x8af319 in perf_file_section__process /home/user/linux/tools/perf/util/header.c:3651:9 #4 0x8aa6e9 in perf_header__process_sections /home/user/linux/tools/perf/util/header.c:3427:9 #5 0x8ae3e7 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3886:2 #6 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #7 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #8 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #9 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #10 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #11 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #12 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #13 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) SUMMARY: AddressSanitizer: 3728 byte(s) leaked in 7 allocation(s). Signed-off-by: Riccardo Mancini <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jul 16, 2021
On trogdor devices I see the following lockdep splat when stopping youtube with lockdep enabled in the kernel. ====================================================== WARNING: possible circular locking dependency detected 5.13.0-rc2 #71 Not tainted ------------------------------------------------------ ThreadPoolSingl/3969 is trying to acquire lock: ffffff80d4d5c080 (&inst->lock#3){+.+.}-{3:3}, at: vdec_buf_cleanup+0x3c/0x17c [venus_dec] but task is already holding lock: ffffff80d3c3c4f8 (&q->mmap_lock){+.+.}-{3:3}, at: vb2_core_reqbufs+0xe4/0x390 [videobuf2_common] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (&q->mmap_lock){+.+.}-{3:3}: __mutex_lock_common+0xcc/0xb88 mutex_lock_nested+0x5c/0x68 vb2_mmap+0xf4/0x290 [videobuf2_common] v4l2_m2m_fop_mmap+0x44/0x50 [v4l2_mem2mem] v4l2_mmap+0x5c/0xa4 mmap_region+0x310/0x5a4 do_mmap+0x348/0x43c vm_mmap_pgoff+0xfc/0x178 ksys_mmap_pgoff+0x84/0xfc __arm64_compat_sys_aarch32_mmap2+0x2c/0x38 invoke_syscall+0x54/0x110 el0_svc_common+0x88/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 -> #4 (&mm->mmap_lock){++++}-{3:3}: __might_fault+0x60/0x88 filldir64+0x124/0x3a0 dcache_readdir+0x7c/0x1ec iterate_dir+0xc4/0x184 __arm64_sys_getdents64+0x78/0x170 invoke_syscall+0x54/0x110 el0_svc_common+0xa8/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 -> #3 (&sb->s_type->i_mutex_key#3){++++}-{3:3}: down_write+0x94/0x1f4 start_creating+0xb0/0x174 debugfs_create_dir+0x28/0x138 opp_debug_register+0x88/0xc0 _add_opp_dev+0x84/0x9c _add_opp_table_indexed+0x16c/0x310 _of_add_table_indexed+0x70/0xb5c dev_pm_opp_of_add_table_indexed+0x20/0x2c of_genpd_add_provider_onecell+0xc4/0x1c8 rpmhpd_probe+0x21c/0x278 platform_probe+0xb4/0xd4 really_probe+0x140/0x35c driver_probe_device+0x90/0xcc __device_attach_driver+0xa4/0xc0 bus_for_each_drv+0x8c/0xd8 __device_attach+0xc4/0x150 device_initial_probe+0x20/0x2c bus_probe_device+0x40/0xa4 device_add+0x22c/0x3fc of_device_add+0x44/0x54 of_platform_device_create_pdata+0xb0/0xf4 of_platform_bus_create+0x1d0/0x350 of_platform_populate+0x80/0xd4 devm_of_platform_populate+0x64/0xb0 rpmh_rsc_probe+0x378/0x3dc platform_probe+0xb4/0xd4 really_probe+0x140/0x35c driver_probe_device+0x90/0xcc __device_attach_driver+0xa4/0xc0 bus_for_each_drv+0x8c/0xd8 __device_attach+0xc4/0x150 device_initial_probe+0x20/0x2c bus_probe_device+0x40/0xa4 device_add+0x22c/0x3fc of_device_add+0x44/0x54 of_platform_device_create_pdata+0xb0/0xf4 of_platform_bus_create+0x1d0/0x350 of_platform_bus_create+0x21c/0x350 of_platform_populate+0x80/0xd4 of_platform_default_populate_init+0xb8/0xd4 do_one_initcall+0x1b4/0x400 do_initcall_level+0xa8/0xc8 do_initcalls+0x5c/0x9c do_basic_setup+0x2c/0x38 kernel_init_freeable+0x1a4/0x1ec kernel_init+0x20/0x118 ret_from_fork+0x10/0x30 -> #2 (gpd_list_lock){+.+.}-{3:3}: __mutex_lock_common+0xcc/0xb88 mutex_lock_nested+0x5c/0x68 __genpd_dev_pm_attach+0x70/0x18c genpd_dev_pm_attach_by_id+0xe4/0x158 genpd_dev_pm_attach_by_name+0x48/0x60 dev_pm_domain_attach_by_name+0x2c/0x38 dev_pm_opp_attach_genpd+0xac/0x160 vcodec_domains_get+0x94/0x14c [venus_core] core_get_v4+0x150/0x188 [venus_core] venus_probe+0x138/0x444 [venus_core] platform_probe+0xb4/0xd4 really_probe+0x140/0x35c driver_probe_device+0x90/0xcc device_driver_attach+0x58/0x7c __driver_attach+0xc8/0xe0 bus_for_each_dev+0x88/0xd4 driver_attach+0x30/0x3c bus_add_driver+0x10c/0x1e0 driver_register+0x70/0x108 __platform_driver_register+0x30/0x3c 0xffffffde113e1044 do_one_initcall+0x1b4/0x400 do_init_module+0x64/0x1fc load_module+0x17f4/0x1958 __arm64_sys_finit_module+0xb4/0xf0 invoke_syscall+0x54/0x110 el0_svc_common+0x88/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 -> #1 (&opp_table->genpd_virt_dev_lock){+.+.}-{3:3}: __mutex_lock_common+0xcc/0xb88 mutex_lock_nested+0x5c/0x68 _set_required_opps+0x74/0x120 _set_opp+0x94/0x37c dev_pm_opp_set_rate+0xa0/0x194 core_clks_set_rate+0x28/0x58 [venus_core] load_scale_v4+0x228/0x2b4 [venus_core] session_process_buf+0x160/0x198 [venus_core] venus_helper_vb2_buf_queue+0xcc/0x130 [venus_core] vdec_vb2_buf_queue+0xc4/0x140 [venus_dec] __enqueue_in_driver+0x164/0x188 [videobuf2_common] vb2_core_qbuf+0x13c/0x47c [videobuf2_common] vb2_qbuf+0x88/0xec [videobuf2_v4l2] v4l2_m2m_qbuf+0x84/0x15c [v4l2_mem2mem] v4l2_m2m_ioctl_qbuf+0x24/0x30 [v4l2_mem2mem] v4l_qbuf+0x54/0x68 __video_do_ioctl+0x2bc/0x3bc video_usercopy+0x558/0xb04 video_ioctl2+0x24/0x30 v4l2_ioctl+0x58/0x68 v4l2_compat_ioctl32+0x84/0xa0 __arm64_compat_sys_ioctl+0x12c/0x140 invoke_syscall+0x54/0x110 el0_svc_common+0x88/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 -> #0 (&inst->lock#3){+.+.}-{3:3}: __lock_acquire+0x248c/0x2d6c lock_acquire+0x240/0x314 __mutex_lock_common+0xcc/0xb88 mutex_lock_nested+0x5c/0x68 vdec_buf_cleanup+0x3c/0x17c [venus_dec] __vb2_queue_free+0x98/0x204 [videobuf2_common] vb2_core_reqbufs+0x14c/0x390 [videobuf2_common] vb2_reqbufs+0x58/0x74 [videobuf2_v4l2] v4l2_m2m_reqbufs+0x58/0x90 [v4l2_mem2mem] v4l2_m2m_ioctl_reqbufs+0x24/0x30 [v4l2_mem2mem] v4l_reqbufs+0x58/0x6c __video_do_ioctl+0x2bc/0x3bc video_usercopy+0x558/0xb04 video_ioctl2+0x24/0x30 v4l2_ioctl+0x58/0x68 v4l2_compat_ioctl32+0x84/0xa0 __arm64_compat_sys_ioctl+0x12c/0x140 invoke_syscall+0x54/0x110 el0_svc_common+0x88/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 other info that might help us debug this: Chain exists of: &inst->lock#3 --> &mm->mmap_lock --> &q->mmap_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->mmap_lock); lock(&mm->mmap_lock); lock(&q->mmap_lock); lock(&inst->lock#3); *** DEADLOCK *** 1 lock held by ThreadPoolSingl/3969: #0: ffffff80d3c3c4f8 (&q->mmap_lock){+.+.}-{3:3}, at: vb2_core_reqbufs+0xe4/0x390 [videobuf2_common] stack backtrace: CPU: 2 PID: 3969 Comm: ThreadPoolSingl Not tainted 5.13.0-rc2 #71 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) Call trace: dump_backtrace+0x0/0x1b4 show_stack+0x24/0x30 dump_stack+0xe0/0x15c print_circular_bug+0x32c/0x388 check_noncircular+0x138/0x140 __lock_acquire+0x248c/0x2d6c lock_acquire+0x240/0x314 __mutex_lock_common+0xcc/0xb88 mutex_lock_nested+0x5c/0x68 vdec_buf_cleanup+0x3c/0x17c [venus_dec] __vb2_queue_free+0x98/0x204 [videobuf2_common] vb2_core_reqbufs+0x14c/0x390 [videobuf2_common] vb2_reqbufs+0x58/0x74 [videobuf2_v4l2] v4l2_m2m_reqbufs+0x58/0x90 [v4l2_mem2mem] v4l2_m2m_ioctl_reqbufs+0x24/0x30 [v4l2_mem2mem] v4l_reqbufs+0x58/0x6c __video_do_ioctl+0x2bc/0x3bc video_usercopy+0x558/0xb04 video_ioctl2+0x24/0x30 v4l2_ioctl+0x58/0x68 v4l2_compat_ioctl32+0x84/0xa0 __arm64_compat_sys_ioctl+0x12c/0x140 invoke_syscall+0x54/0x110 el0_svc_common+0x88/0xf0 do_el0_svc_compat+0x28/0x34 el0_svc_compat+0x24/0x34 el0_sync_compat_handler+0xc0/0xf0 el0_sync_compat+0x19c/0x1c0 The 'gpd_list_lock' is nominally named as such to protect the 'gpd_list' from concurrent access and mutation. Unfortunately, holding that mutex around various OPP framework calls leads to lockdep splats because now we're doing various operations in OPP core such as registering with debugfs while holding the list lock. We don't need to hold any list mutex while we're calling into OPP, so let's shrink the locking area of the 'gpd_list_lock' so that lockdep isn't triggered. This also helps reduce contention on this lock, which probably doesn't matter much but at least is nice to have. Cc: Len Brown <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Cc: Viresh Kumar <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Jul 16, 2021
ASan reports a heap-buffer-overflow in elf_sec__is_text when using perf-top. The bug is caused by the fact that secstrs is built from runtime_ss, while shdr is built from syms_ss if shdr.sh_type != SHT_NOBITS. Therefore, they point to two different ELF files. This patch renames secstrs to secstrs_run and adds secstrs_sym, so that the correct secstrs is chosen depending on shdr.sh_type. $ ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 ./perf top ================================================================= ==363148==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300009add6 at pc 0x00000049875c bp 0x7f4f56446440 sp 0x7f4f56445bf0 READ of size 1 at 0x61300009add6 thread T6 #0 0x49875b in StrstrCheck(void*, char*, char const*, char const*) (/home/user/linux/tools/perf/perf+0x49875b) #1 0x4d13a2 in strstr (/home/user/linux/tools/perf/perf+0x4d13a2) #2 0xacae36 in elf_sec__is_text /home/user/linux/tools/perf/util/symbol-elf.c:176:9 #3 0xac3ec9 in elf_sec__filter /home/user/linux/tools/perf/util/symbol-elf.c:187:9 #4 0xac2c3d in dso__load_sym /home/user/linux/tools/perf/util/symbol-elf.c:1254:20 #5 0x883981 in dso__load /home/user/linux/tools/perf/util/symbol.c:1897:9 #6 0x8e6248 in map__load /home/user/linux/tools/perf/util/map.c:332:7 #7 0x8e66e5 in map__find_symbol /home/user/linux/tools/perf/util/map.c:366:6 #8 0x7f8278 in machine__resolve /home/user/linux/tools/perf/util/event.c:707:13 #9 0x5f3d1a in perf_event__process_sample /home/user/linux/tools/perf/builtin-top.c:773:6 #10 0x5f30e4 in deliver_event /home/user/linux/tools/perf/builtin-top.c:1197:3 #11 0x908a72 in do_flush /home/user/linux/tools/perf/util/ordered-events.c:244:9 #12 0x905fae in __ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:323:8 #13 0x9058db in ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:341:9 #14 0x5f19b1 in process_thread /home/user/linux/tools/perf/builtin-top.c:1109:7 #15 0x7f4f6a21a298 in start_thread /usr/src/debug/glibc-2.33-16.fc34.x86_64/nptl/pthread_create.c:481:8 #16 0x7f4f697d0352 in clone ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 0x61300009add6 is located 10 bytes to the right of 332-byte region [0x61300009ac80,0x61300009adcc) allocated by thread T6 here: #0 0x4f3f7f in malloc (/home/user/linux/tools/perf/perf+0x4f3f7f) #1 0x7f4f6a0a88d9 (/lib64/libelf.so.1+0xa8d9) Thread T6 created by T0 here: #0 0x464856 in pthread_create (/home/user/linux/tools/perf/perf+0x464856) #1 0x5f06e0 in __cmd_top /home/user/linux/tools/perf/builtin-top.c:1309:6 #2 0x5ef19f in cmd_top /home/user/linux/tools/perf/builtin-top.c:1762:11 #3 0x7b28c0 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #4 0x7b119f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #5 0x7b2423 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #6 0x7b0c19 in main /home/user/linux/tools/perf/perf.c:539:3 #7 0x7f4f696f7b74 in __libc_start_main /usr/src/debug/glibc-2.33-16.fc34.x86_64/csu/../csu/libc-start.c:332:16 SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/user/linux/tools/perf/perf+0x49875b) in StrstrCheck(void*, char*, char const*, char const*) Shadow bytes around the buggy address: 0x0c268000b560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c268000b5b0: 00 00 00 00 00 00 00 00 00 04[fa]fa fa fa fa fa 0x0c268000b5c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 0x0c268000b5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5f0: 07 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==363148==ABORTING Suggested-by: Jiri Slaby <[email protected]> Signed-off-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Remi Bernon <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit 8ef9dc0 ] We got the following lockdep splat while running fstests (specifically btrfs/003 and btrfs/020 in a row) with the new rc. This was uncovered by 87579e9 ("loop: use worker per cgroup instead of kworker") which converted loop to using workqueues, which comes with lockdep annotations that don't exist with kworkers. The lockdep splat is as follows: WARNING: possible circular locking dependency detected 5.14.0-rc2-custom+ #34 Not tainted ------------------------------------------------------ losetup/156417 is trying to acquire lock: ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600 but task is already holding lock: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (&lo->lo_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 lo_open+0x28/0x60 [loop] blkdev_get_whole+0x28/0xf0 blkdev_get_by_dev.part.0+0x168/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x163/0x3a0 path_openat+0x74d/0xa40 do_filp_open+0x9c/0x140 do_sys_openat2+0xb1/0x170 __x64_sys_openat+0x54/0x90 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #4 (&disk->open_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 blkdev_get_by_dev.part.0+0xd1/0x3c0 blkdev_get_by_path+0xc0/0xd0 btrfs_scan_one_device+0x52/0x1f0 [btrfs] btrfs_control_ioctl+0xac/0x170 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #3 (uuid_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 btrfs_rm_device+0x48/0x6a0 [btrfs] btrfs_ioctl+0x2d1c/0x3110 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #2 (sb_writers#11){.+.+}-{0:0}: lo_write_bvec+0x112/0x290 [loop] loop_process_work+0x25f/0xcb0 [loop] process_one_work+0x28f/0x5d0 worker_thread+0x55/0x3c0 kthread+0x140/0x170 ret_from_fork+0x22/0x30 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: process_one_work+0x266/0x5d0 worker_thread+0x55/0x3c0 kthread+0x140/0x170 ret_from_fork+0x22/0x30 -> #0 ((wq_completion)loop0){+.+.}-{0:0}: __lock_acquire+0x1130/0x1dc0 lock_acquire+0xf5/0x320 flush_workqueue+0xae/0x600 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x650 [loop] lo_ioctl+0x29d/0x780 [loop] block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0); *** DEADLOCK *** 1 lock held by losetup/156417: #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop] stack backtrace: CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x57/0x72 check_noncircular+0x10a/0x120 __lock_acquire+0x1130/0x1dc0 lock_acquire+0xf5/0x320 ? flush_workqueue+0x84/0x600 flush_workqueue+0xae/0x600 ? flush_workqueue+0x84/0x600 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x650 [loop] lo_ioctl+0x29d/0x780 [loop] ? __lock_acquire+0x3a0/0x1dc0 ? update_dl_rq_load_avg+0x152/0x360 ? lock_is_held_type+0xa5/0x120 ? find_held_lock.constprop.0+0x2b/0x80 block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f645884de6b Usually the uuid_mutex exists to protect the fs_devices that map together all of the devices that match a specific uuid. In rm_device we're messing with the uuid of a device, so it makes sense to protect that here. However in doing that it pulls in a whole host of lockdep dependencies, as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus we end up with the dependency chain under the uuid_mutex being added under the normal sb write dependency chain, which causes problems with loop devices. We don't need the uuid mutex here however. If we call btrfs_scan_one_device() before we scratch the super block we will find the fs_devices and not find the device itself and return EBUSY because the fs_devices is open. If we call it after the scratch happens it will not appear to be a valid btrfs file system. We do not need to worry about other fs_devices modifying operations here because we're protected by the exclusive operations locking. So drop the uuid_mutex here in order to fix the lockdep splat. A more detailed explanation from the discussion: We are worried about rm and scan racing with each other, before this change we'll zero the device out under the UUID mutex so when scan does run it'll make sure that it can go through the whole device scan thing without rm messing with us. We aren't worried if the scratch happens first, because the result is we don't think this is a btrfs device and we bail out. The only case we are concerned with is we scratch _after_ scan is able to read the superblock and gets a seemingly valid super block, so lets consider this case. Scan will call device_list_add() with the device we're removing. We'll call find_fsid_with_metadata_uuid() and get our fs_devices for this UUID. At this point we lock the fs_devices->device_list_mutex. This is what protects us in this case, but we have two cases here. 1. We aren't to the device removal part of the RM. We found our device, and device name matches our path, we go down and we set total_devices to our super number of devices, which doesn't affect anything because we haven't done the remove yet. 2. We are past the device removal part, which is protected by the device_list_mutex. Scan doesn't find the device, it goes down and does the if (fs_devices->opened) return -EBUSY; check and we bail out. Nothing about this situation is ideal, but the lockdep splat is real, and the fix is safe, tho admittedly a bit scary looking. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> [ copy more from the discussion ] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
…eam state [ Upstream commit b7b1d02 ] The internal stream state sets the timeout to 120 seconds 2 seconds after the creation of the flow, attach this internal stream state to the IPS_ASSURED flag for consistent event reporting. Before this patch: [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] Note IPS_ASSURED for the flow not yet in the internal stream state. after this update: [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 120 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] Before this patch, short-lived UDP flows never entered IPS_ASSURED, so they were already candidate flow to be deleted by early_drop under stress. Before this patch, IPS_ASSURED is set on regardless the internal stream state, attach this internal stream state to IPS_ASSURED. packet #1 (original direction) enters NEW state packet #2 (reply direction) enters ESTABLISHED state, sets on IPS_SEEN_REPLY paclet #3 (any direction) sets on IPS_ASSURED (if 2 seconds since the creation has passed by). Reported-by: Maciej Żenczykowski <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit 64ea2d0 ] The change of devlink_alloc() to accept device makes sure that device is fully initialized and device_register() does nothing except allowing users to use that devlink instance. Such change ensures that no user input will be usable till that point and it eliminates the need to worry about internal locking as long as devlink_register is called last since all accesses to the devlink are during initialization. This change fixes the following lockdep warning. ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #27 Not tainted ------------------------------------------------------ devlink/265 is trying to acquire lock: ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core] but task is already holding lock: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (devlink_mutex){+.+.}-{3:3}: __mutex_lock+0x149/0x1310 devlink_register+0xe7/0x280 mlx5_devlink_register+0x118/0x480 [mlx5_core] mlx5_init_one+0x34b/0x440 [mlx5_core] probe_one+0x480/0x6e0 [mlx5_core] pci_device_probe+0x2a0/0x4a0 really_probe+0x1cb/0xba0 __driver_probe_device+0x18f/0x470 driver_probe_device+0x49/0x120 __driver_attach+0x1ce/0x400 bus_for_each_dev+0x11e/0x1a0 bus_add_driver+0x309/0x570 driver_register+0x20f/0x390 0xffffffffa04a0062 do_one_initcall+0xd5/0x400 do_init_module+0x1c8/0x760 load_module+0x7d9d/0xa4b0 __do_sys_finit_module+0x118/0x1a0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2999/0x5a40 lock_acquire+0x1a9/0x4a0 __mutex_lock+0x149/0x1310 mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] devlink_reload+0x1f2/0x640 devlink_nl_cmd_reload+0x6c3/0x10d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x27f/0x4a0 netlink_rcv_skb+0x11e/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6fb/0xbe0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(devlink_mutex); lock(&dev->intf_state_mutex); lock(devlink_mutex); lock(&dev->intf_state_mutex); *** DEADLOCK *** 3 locks held by devlink/265: #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0 #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 stack backtrace: CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0x268/0x310 ? print_circular_bug+0x460/0x460 ? __kernel_text_address+0xe/0x30 ? alloc_chain_hlocks+0x1e6/0x5a0 __lock_acquire+0x2999/0x5a40 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? add_lock_to_list.constprop.0+0x6c/0x530 lock_acquire+0x1a9/0x4a0 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_release+0x6c0/0x6c0 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? lock_is_held_type+0x98/0x110 __mutex_lock+0x149/0x1310 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_is_held_type+0x98/0x110 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? find_held_lock+0x2d/0x110 ? mutex_lock_io_nested+0x1160/0x1160 ? mlx5_lag_is_active+0x72/0x90 [mlx5_core] ? lock_downgrade+0x6d0/0x6d0 ? do_raw_spin_lock+0x12e/0x270 ? rwlock_bug.part.0+0x90/0x90 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] ? netlink_broadcast_filtered+0x308/0xac0 ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core] ? __build_skb_around+0x110/0x2b0 ? __alloc_skb+0x113/0x2b0 devlink_reload+0x1f2/0x640 ? devlink_unregister+0x1e0/0x1e0 ? security_capable+0x51/0x90 devlink_nl_cmd_reload+0x6c3/0x10d0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? devlink_nl_pre_doit+0x72/0x8d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? __lock_acquire+0x15e2/0x5a40 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? mutex_lock_io_nested+0x1160/0x1160 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x4a0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11e/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x930/0x930 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x750/0x750 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6fb/0xbe0 ? netlink_unicast+0x700/0x700 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? do_sys_openat2+0x10a/0x370 ? down_write_nested+0x150/0x150 ? do_user_addr_fault+0x215/0xd50 ? __x64_sys_openat+0x11f/0x1d0 ? __x64_sys_open+0x1a0/0x1a0 __x64_sys_sendto+0xdc/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f50b50b6b3a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440 mlx5_core 0000:00:09.0: firmware version: 4.8.9999 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link) mlx5_core 0000:00:09.0 eth1: Link up Fixes: a6f3b62 ("net/mlx5: Move devlink registration before interfaces load") Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
commit 2aa3660 upstream. It is generally unsafe to call put_device() with dpm_list_mtx held, because the given device's release routine may carry out an action depending on that lock which then may deadlock, so modify the system-wide suspend and resume of devices to always drop dpm_list_mtx before calling put_device() (and adjust white space somewhat while at it). For instance, this prevents the following splat from showing up in the kernel log after a system resume in certain configurations: [ 3290.969514] ====================================================== [ 3290.969517] WARNING: possible circular locking dependency detected [ 3290.969519] 5.15.0+ #2420 Tainted: G S [ 3290.969523] ------------------------------------------------------ [ 3290.969525] systemd-sleep/4553 is trying to acquire lock: [ 3290.969529] ffff888117ab1138 ((wq_completion)hci0#2){+.+.}-{0:0}, at: flush_workqueue+0x87/0x4a0 [ 3290.969554] but task is already holding lock: [ 3290.969556] ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.969571] which lock already depends on the new lock. [ 3290.969573] the existing dependency chain (in reverse order) is: [ 3290.969575] -> #3 (dpm_list_mtx){+.+.}-{3:3}: [ 3290.969583] __mutex_lock+0x9d/0xa30 [ 3290.969591] device_pm_add+0x2e/0xe0 [ 3290.969597] device_add+0x4d5/0x8f0 [ 3290.969605] hci_conn_add_sysfs+0x43/0xb0 [bluetooth] [ 3290.969689] hci_conn_complete_evt.isra.71+0x124/0x750 [bluetooth] [ 3290.969747] hci_event_packet+0xd6c/0x28a0 [bluetooth] [ 3290.969798] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969842] process_one_work+0x2aa/0x650 [ 3290.969851] worker_thread+0x39/0x400 [ 3290.969859] kthread+0x142/0x170 [ 3290.969865] ret_from_fork+0x22/0x30 [ 3290.969872] -> #2 (&hdev->lock){+.+.}-{3:3}: [ 3290.969881] __mutex_lock+0x9d/0xa30 [ 3290.969887] hci_event_packet+0xba/0x28a0 [bluetooth] [ 3290.969935] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969978] process_one_work+0x2aa/0x650 [ 3290.969985] worker_thread+0x39/0x400 [ 3290.969993] kthread+0x142/0x170 [ 3290.969999] ret_from_fork+0x22/0x30 [ 3290.970004] -> #1 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}: [ 3290.970013] process_one_work+0x27d/0x650 [ 3290.970020] worker_thread+0x39/0x400 [ 3290.970028] kthread+0x142/0x170 [ 3290.970033] ret_from_fork+0x22/0x30 [ 3290.970038] -> #0 ((wq_completion)hci0#2){+.+.}-{0:0}: [ 3290.970047] __lock_acquire+0x15cb/0x1b50 [ 3290.970054] lock_acquire+0x26c/0x300 [ 3290.970059] flush_workqueue+0xae/0x4a0 [ 3290.970066] drain_workqueue+0xa1/0x130 [ 3290.970073] destroy_workqueue+0x34/0x1f0 [ 3290.970081] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970130] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970195] device_release+0x33/0x90 [ 3290.970201] kobject_release+0x63/0x160 [ 3290.970211] dpm_resume+0x164/0x3e0 [ 3290.970215] dpm_resume_end+0xd/0x20 [ 3290.970220] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970229] pm_suspend+0x26b/0x310 [ 3290.970236] state_store+0x42/0x90 [ 3290.970243] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970251] new_sync_write+0x125/0x1c0 [ 3290.970257] vfs_write+0x360/0x3c0 [ 3290.970263] ksys_write+0xa7/0xe0 [ 3290.970269] do_syscall_64+0x3a/0x80 [ 3290.970276] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970284] other info that might help us debug this: [ 3290.970285] Chain exists of: (wq_completion)hci0#2 --> &hdev->lock --> dpm_list_mtx [ 3290.970297] Possible unsafe locking scenario: [ 3290.970299] CPU0 CPU1 [ 3290.970300] ---- ---- [ 3290.970302] lock(dpm_list_mtx); [ 3290.970306] lock(&hdev->lock); [ 3290.970310] lock(dpm_list_mtx); [ 3290.970314] lock((wq_completion)hci0#2); [ 3290.970319] *** DEADLOCK *** [ 3290.970321] 7 locks held by systemd-sleep/4553: [ 3290.970325] #0: ffff888103bcd448 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xa7/0xe0 [ 3290.970341] #1: ffff888115a14488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x103/0x1b0 [ 3290.970355] #2: ffff888100f719e0 (kn->active#233){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1b0 [ 3290.970369] #3: ffffffff82661048 (autosleep_lock){+.+.}-{3:3}, at: state_store+0x12/0x90 [ 3290.970384] #4: ffffffff82658ac8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9f/0x310 [ 3290.970399] #5: ffffffff827f2a48 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x4c/0x80 [ 3290.970416] #6: ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.970428] stack backtrace: [ 3290.970431] CPU: 3 PID: 4553 Comm: systemd-sleep Tainted: G S 5.15.0+ #2420 [ 3290.970438] Hardware name: Dell Inc. XPS 13 9380/0RYJWW, BIOS 1.5.0 06/03/2019 [ 3290.970441] Call Trace: [ 3290.970446] dump_stack_lvl+0x44/0x57 [ 3290.970454] check_noncircular+0x105/0x120 [ 3290.970468] ? __lock_acquire+0x15cb/0x1b50 [ 3290.970474] __lock_acquire+0x15cb/0x1b50 [ 3290.970487] lock_acquire+0x26c/0x300 [ 3290.970493] ? flush_workqueue+0x87/0x4a0 [ 3290.970503] ? __raw_spin_lock_init+0x3b/0x60 [ 3290.970510] ? lockdep_init_map_type+0x58/0x240 [ 3290.970519] flush_workqueue+0xae/0x4a0 [ 3290.970526] ? flush_workqueue+0x87/0x4a0 [ 3290.970544] ? drain_workqueue+0xa1/0x130 [ 3290.970552] drain_workqueue+0xa1/0x130 [ 3290.970561] destroy_workqueue+0x34/0x1f0 [ 3290.970572] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970624] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970687] device_release+0x33/0x90 [ 3290.970695] kobject_release+0x63/0x160 [ 3290.970705] dpm_resume+0x164/0x3e0 [ 3290.970710] ? dpm_resume_early+0x251/0x3b0 [ 3290.970718] dpm_resume_end+0xd/0x20 [ 3290.970723] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970737] pm_suspend+0x26b/0x310 [ 3290.970746] state_store+0x42/0x90 [ 3290.970755] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970764] new_sync_write+0x125/0x1c0 [ 3290.970777] vfs_write+0x360/0x3c0 [ 3290.970785] ksys_write+0xa7/0xe0 [ 3290.970794] do_syscall_64+0x3a/0x80 [ 3290.970803] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970811] RIP: 0033:0x7f41b1328164 [ 3290.970819] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 4a d2 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [ 3290.970824] RSP: 002b:00007ffe6ae21b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3290.970831] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f41b1328164 [ 3290.970836] RDX: 0000000000000004 RSI: 000055965e651070 RDI: 0000000000000004 [ 3290.970839] RBP: 000055965e651070 R08: 000055965e64f390 R09: 00007f41b1e3d1c0 [ 3290.970843] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000004 [ 3290.970846] R13: 0000000000000001 R14: 000055965e64f2b0 R15: 0000000000000004 Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit 54659ca ] when turning off a connection, lockdep complains with the following warning (a modprobe has been done but the same happens with a disconnection from NetworkManager, it's enough to trigger a cfg80211_disconnect call): [ 682.855867] ====================================================== [ 682.855877] WARNING: possible circular locking dependency detected [ 682.855887] 5.14.0-rc6+ #16 Tainted: G C OE [ 682.855898] ------------------------------------------------------ [ 682.855906] modprobe/1770 is trying to acquire lock: [ 682.855916] ffffb6d000332b00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_free_stainfo+0x52/0x4a0 [r8723bs] [ 682.856073] but task is already holding lock: [ 682.856081] ffffb6d0003336a8 (&pstapriv->sta_hash_lock){+.-.}-{2:2}, at: rtw_free_assoc_resources+0x48/0x110 [r8723bs] [ 682.856207] which lock already depends on the new lock. [ 682.856215] the existing dependency chain (in reverse order) is: [ 682.856223] -> #1 (&pstapriv->sta_hash_lock){+.-.}-{2:2}: [ 682.856247] _raw_spin_lock_bh+0x34/0x40 [ 682.856265] rtw_get_stainfo+0x9a/0x110 [r8723bs] [ 682.856389] rtw_xmit_classifier+0x27/0x130 [r8723bs] [ 682.856515] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs] [ 682.856642] rtl8723bs_hal_xmit+0x3b/0xb0 [r8723bs] [ 682.856752] rtw_xmit+0x4ef/0x890 [r8723bs] [ 682.856879] _rtw_xmit_entry+0xba/0x350 [r8723bs] [ 682.856981] dev_hard_start_xmit+0xee/0x320 [ 682.856999] sch_direct_xmit+0x8c/0x330 [ 682.857014] __dev_queue_xmit+0xba5/0xf00 [ 682.857030] packet_sendmsg+0x981/0x1b80 [ 682.857047] sock_sendmsg+0x5b/0x60 [ 682.857060] __sys_sendto+0xf1/0x160 [ 682.857073] __x64_sys_sendto+0x24/0x30 [ 682.857087] do_syscall_64+0x3a/0x80 [ 682.857102] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 682.857117] -> #0 (&pxmitpriv->lock){+.-.}-{2:2}: [ 682.857142] __lock_acquire+0xfd9/0x1b50 [ 682.857158] lock_acquire+0xb4/0x2c0 [ 682.857172] _raw_spin_lock_bh+0x34/0x40 [ 682.857185] rtw_free_stainfo+0x52/0x4a0 [r8723bs] [ 682.857308] rtw_free_assoc_resources+0x53/0x110 [r8723bs] [ 682.857415] cfg80211_rtw_disconnect+0x4b/0x70 [r8723bs] [ 682.857522] cfg80211_disconnect+0x12e/0x2f0 [cfg80211] [ 682.857759] cfg80211_leave+0x2b/0x40 [cfg80211] [ 682.857961] cfg80211_netdev_notifier_call+0xa9/0x560 [cfg80211] [ 682.858163] raw_notifier_call_chain+0x41/0x50 [ 682.858180] __dev_close_many+0x62/0x100 [ 682.858195] dev_close_many+0x7d/0x120 [ 682.858209] unregister_netdevice_many+0x416/0x680 [ 682.858225] unregister_netdevice_queue+0xab/0xf0 [ 682.858240] unregister_netdev+0x18/0x20 [ 682.858255] rtw_unregister_netdevs+0x28/0x40 [r8723bs] [ 682.858360] rtw_dev_remove+0x24/0xd0 [r8723bs] [ 682.858463] sdio_bus_remove+0x31/0xd0 [mmc_core] [ 682.858532] device_release_driver_internal+0xf7/0x1d0 [ 682.858550] driver_detach+0x47/0x90 [ 682.858564] bus_remove_driver+0x77/0xd0 [ 682.858579] rtw_drv_halt+0xc/0x678 [r8723bs] [ 682.858685] __x64_sys_delete_module+0x13f/0x250 [ 682.858699] do_syscall_64+0x3a/0x80 [ 682.858715] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 682.858729] other info that might help us debug this: [ 682.858737] Possible unsafe locking scenario: [ 682.858744] CPU0 CPU1 [ 682.858751] ---- ---- [ 682.858758] lock(&pstapriv->sta_hash_lock); [ 682.858772] lock(&pxmitpriv->lock); [ 682.858786] lock(&pstapriv->sta_hash_lock); [ 682.858799] lock(&pxmitpriv->lock); [ 682.858812] *** DEADLOCK *** [ 682.858820] 5 locks held by modprobe/1770: [ 682.858831] #0: ffff8d870697d980 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x1a/0x1d0 [ 682.858869] #1: ffffffffbdbbf1c8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20 [ 682.858906] #2: ffff8d87054ee5e8 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: cfg80211_netdev_notifier_call+0x9e/0x560 [cfg80211] [ 682.859131] #3: ffff8d870f2bc8f0 (&wdev->mtx){+.+.}-{3:3}, at: cfg80211_leave+0x20/0x40 [cfg80211] [ 682.859354] #4: ffffb6d0003336a8 (&pstapriv->sta_hash_lock){+.-.}-{2:2}, at: rtw_free_assoc_resources+0x48/0x110 [r8723bs] [ 682.859482] stack backtrace: [ 682.859491] CPU: 1 PID: 1770 Comm: modprobe Tainted: G C OE 5.14.0-rc6+ #16 [ 682.859507] Hardware name: LENOVO 80NR/Madrid, BIOS DACN25WW 08/20/2015 [ 682.859517] Call Trace: [ 682.859531] dump_stack_lvl+0x56/0x6f [ 682.859551] check_noncircular+0xdb/0xf0 [ 682.859579] __lock_acquire+0xfd9/0x1b50 [ 682.859606] lock_acquire+0xb4/0x2c0 [ 682.859623] ? rtw_free_stainfo+0x52/0x4a0 [r8723bs] [ 682.859752] ? mark_held_locks+0x48/0x70 [ 682.859769] ? rtw_free_stainfo+0x4a/0x4a0 [r8723bs] [ 682.859898] _raw_spin_lock_bh+0x34/0x40 [ 682.859914] ? rtw_free_stainfo+0x52/0x4a0 [r8723bs] [ 682.860039] rtw_free_stainfo+0x52/0x4a0 [r8723bs] [ 682.860171] rtw_free_assoc_resources+0x53/0x110 [r8723bs] [ 682.860286] cfg80211_rtw_disconnect+0x4b/0x70 [r8723bs] [ 682.860397] cfg80211_disconnect+0x12e/0x2f0 [cfg80211] [ 682.860629] cfg80211_leave+0x2b/0x40 [cfg80211] [ 682.860836] cfg80211_netdev_notifier_call+0xa9/0x560 [cfg80211] [ 682.861048] ? __lock_acquire+0x4dc/0x1b50 [ 682.861070] ? lock_is_held_type+0xa8/0x110 [ 682.861089] ? lock_is_held_type+0xa8/0x110 [ 682.861104] ? find_held_lock+0x2d/0x90 [ 682.861120] ? packet_notifier+0x173/0x300 [ 682.861141] ? lock_release+0xb3/0x250 [ 682.861160] ? packet_notifier+0x192/0x300 [ 682.861184] raw_notifier_call_chain+0x41/0x50 [ 682.861205] __dev_close_many+0x62/0x100 [ 682.861224] dev_close_many+0x7d/0x120 [ 682.861245] unregister_netdevice_many+0x416/0x680 [ 682.861264] ? find_held_lock+0x2d/0x90 [ 682.861284] unregister_netdevice_queue+0xab/0xf0 [ 682.861306] unregister_netdev+0x18/0x20 [ 682.861325] rtw_unregister_netdevs+0x28/0x40 [r8723bs] [ 682.861434] rtw_dev_remove+0x24/0xd0 [r8723bs] [ 682.861542] sdio_bus_remove+0x31/0xd0 [mmc_core] [ 682.861615] device_release_driver_internal+0xf7/0x1d0 [ 682.861637] driver_detach+0x47/0x90 [ 682.861656] bus_remove_driver+0x77/0xd0 [ 682.861674] rtw_drv_halt+0xc/0x678 [r8723bs] [ 682.861782] __x64_sys_delete_module+0x13f/0x250 [ 682.861801] ? lockdep_hardirqs_on_prepare+0xf3/0x170 [ 682.861817] ? syscall_enter_from_user_mode+0x20/0x70 [ 682.861836] do_syscall_64+0x3a/0x80 [ 682.861855] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 682.861873] RIP: 0033:0x7f6dbe85400b [ 682.861890] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48 [ 682.861906] RSP: 002b:00007ffe7a82f538 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 682.861923] RAX: ffffffffffffffda RBX: 000055a64693bd20 RCX: 00007f6dbe85400b [ 682.861935] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a64693bd88 [ 682.861946] RBP: 000055a64693bd20 R08: 0000000000000000 R09: 0000000000000000 [ 682.861957] R10: 00007f6dbe8c7ac0 R11: 0000000000000206 R12: 000055a64693bd88 [ 682.861967] R13: 0000000000000000 R14: 000055a64693bd88 R15: 00007ffe7a831848 This happens because when we enqueue a frame for transmission we do it under xmit_priv lock, then calling rtw_get_stainfo (needed for enqueuing) takes sta_hash_lock and this leads to the following lock dependency: xmit_priv->lock -> sta_hash_lock Turning off a connection will bring to call rtw_free_assoc_resources which will set up the inverse dependency: sta_hash_lock -> xmit_priv_lock This could lead to a deadlock as lockdep complains. Fix it by removing the xmit_priv->lock around rtw_xmitframe_enqueue call inside rtl8723bs_hal_xmit and put it in a smaller critical section inside rtw_xmit_classifier, the only place where xmit_priv data are actually accessed. Replace spin_{lock,unlock}_bh(pxmitpriv->lock) in other tx paths leading to rtw_xmitframe_enqueue call with spin_{lock,unlock}_bh(psta->sleep_q.lock) - it's not clear why accessing a sleep_q was protected by a spinlock on xmitpriv->lock. This way is avoided the same faulty lock nesting order. Extra changes in v2 by Hans de Goede: -Lift the taking of the struct __queue.lock spinlock out of rtw_free_xmitframe_queue() into the callers this allows also protecting a bunch of related state in rtw_free_stainfo(): -Protect psta->sleepq_len on rtw_free_xmitframe_queue(&psta->sleep_q); -Protect struct tx_servq.tx_pending and tx_servq.qcnt when calling rtw_free_xmitframe_queue(&tx_servq.sta_pending) -This also allows moving the spin_lock_bh(&pxmitpriv->lock); to below the sleep_q free-ing code, avoiding another ABBA locking issue CC: Larry Finger <[email protected]> Co-developed-by: Hans de Goede <[email protected]> Tested-on: Lenovo Ideapad MiiX 300-10IBY Signed-off-by: Fabio Aiuto <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit bdc1bbd ] The assoc_timer takes the pmlmepriv->lock and various functions which take the pmlmepriv->scanned_queue.lock first take the pmlmepriv->lock, this means that we cannot have code which waits for the timer (timer_del_sync) while holding the pmlmepriv->scanned_queue.lock to avoid a triangle deadlock: [ 363.139361] ====================================================== [ 363.139377] WARNING: possible circular locking dependency detected [ 363.139396] 5.15.0-rc1+ #470 Tainted: G C E [ 363.139413] ------------------------------------------------------ [ 363.139424] RTW_CMD_THREAD/2466 is trying to acquire lock: [ 363.139441] ffffbacd00699038 (&pmlmepriv->lock){+.-.}-{2:2}, at: _rtw_join_timeout_handler+0x3c/0x160 [r8723bs] [ 363.139598] but task is already holding lock: [ 363.139610] ffffbacd00128ea0 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x260 [ 363.139673] which lock already depends on the new lock. [ 363.139684] the existing dependency chain (in reverse order) is: [ 363.139696] -> #2 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}: [ 363.139734] del_timer_sync+0x59/0x100 [ 363.139762] rtw_joinbss_event_prehandle+0x342/0x640 [r8723bs] [ 363.139870] report_join_res+0xdf/0x110 [r8723bs] [ 363.139980] OnAssocRsp+0x17a/0x200 [r8723bs] [ 363.140092] rtw_recv_entry+0x190/0x1120 [r8723bs] [ 363.140209] rtl8723b_process_phy_info+0x3f9/0x750 [r8723bs] [ 363.140318] tasklet_action_common.constprop.0+0xe8/0x110 [ 363.140345] __do_softirq+0xde/0x485 [ 363.140372] __irq_exit_rcu+0xd0/0x100 [ 363.140393] irq_exit_rcu+0xa/0x20 [ 363.140413] common_interrupt+0x83/0xa0 [ 363.140440] asm_common_interrupt+0x1e/0x40 [ 363.140463] finish_task_switch.isra.0+0x157/0x3d0 [ 363.140492] __schedule+0x447/0x1880 [ 363.140516] schedule+0x59/0xc0 [ 363.140537] smpboot_thread_fn+0x161/0x1c0 [ 363.140565] kthread+0x143/0x160 [ 363.140585] ret_from_fork+0x22/0x30 [ 363.140614] -> #1 (&pmlmepriv->scanned_queue.lock){+.-.}-{2:2}: [ 363.140653] _raw_spin_lock_bh+0x34/0x40 [ 363.140675] rtw_free_network_queue+0x31/0x80 [r8723bs] [ 363.140776] rtw_sitesurvey_cmd+0x79/0x1e0 [r8723bs] [ 363.140869] rtw_cfg80211_surveydone_event_callback+0x3cf/0x470 [r8723bs] [ 363.140973] rdev_scan+0x42/0x1a0 [cfg80211] [ 363.141307] nl80211_trigger_scan+0x566/0x660 [cfg80211] [ 363.141635] genl_family_rcv_msg_doit+0xcd/0x110 [ 363.141661] genl_rcv_msg+0xce/0x1c0 [ 363.141680] netlink_rcv_skb+0x50/0xf0 [ 363.141699] genl_rcv+0x24/0x40 [ 363.141717] netlink_unicast+0x16d/0x230 [ 363.141736] netlink_sendmsg+0x22b/0x450 [ 363.141755] sock_sendmsg+0x5e/0x60 [ 363.141781] ____sys_sendmsg+0x22f/0x270 [ 363.141803] ___sys_sendmsg+0x81/0xc0 [ 363.141828] __sys_sendmsg+0x49/0x80 [ 363.141851] do_syscall_64+0x3b/0x90 [ 363.141873] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 363.141895] -> #0 (&pmlmepriv->lock){+.-.}-{2:2}: [ 363.141930] __lock_acquire+0x1158/0x1de0 [ 363.141954] lock_acquire+0xb5/0x2b0 [ 363.141974] _raw_spin_lock_bh+0x34/0x40 [ 363.141993] _rtw_join_timeout_handler+0x3c/0x160 [r8723bs] [ 363.142097] call_timer_fn+0x94/0x260 [ 363.142122] __run_timers.part.0+0x1bf/0x290 [ 363.142147] run_timer_softirq+0x26/0x50 [ 363.142171] __do_softirq+0xde/0x485 [ 363.142193] __irq_exit_rcu+0xd0/0x100 [ 363.142215] irq_exit_rcu+0xa/0x20 [ 363.142235] sysvec_apic_timer_interrupt+0x72/0x90 [ 363.142260] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 363.142283] __module_address.part.0+0x0/0xd0 [ 363.142309] is_module_address+0x25/0x40 [ 363.142334] static_obj+0x4f/0x60 [ 363.142361] lockdep_init_map_type+0x47/0x220 [ 363.142382] __init_swait_queue_head+0x45/0x60 [ 363.142408] mmc_wait_for_req+0x4a/0xc0 [mmc_core] [ 363.142504] mmc_wait_for_cmd+0x55/0x70 [mmc_core] [ 363.142592] mmc_io_rw_direct+0x75/0xe0 [mmc_core] [ 363.142691] sdio_writeb+0x2e/0x50 [mmc_core] [ 363.142788] _sd_cmd52_write+0x62/0x80 [r8723bs] [ 363.142885] sd_cmd52_write+0x6c/0xb0 [r8723bs] [ 363.142981] rtl8723bs_set_hal_ops+0x982/0x9b0 [r8723bs] [ 363.143089] rtw_write16+0x1e/0x30 [r8723bs] [ 363.143184] SetHwReg8723B+0xcc9/0xd30 [r8723bs] [ 363.143294] mlmeext_joinbss_event_callback+0x17a/0x1a0 [r8723bs] [ 363.143405] rtw_joinbss_event_callback+0x11/0x20 [r8723bs] [ 363.143507] mlme_evt_hdl+0x4d/0x70 [r8723bs] [ 363.143620] rtw_cmd_thread+0x168/0x3c0 [r8723bs] [ 363.143712] kthread+0x143/0x160 [ 363.143732] ret_from_fork+0x22/0x30 [ 363.143757] other info that might help us debug this: [ 363.143768] Chain exists of: &pmlmepriv->lock --> &pmlmepriv->scanned_queue.lock --> (&pmlmepriv->assoc_timer) [ 363.143809] Possible unsafe locking scenario: [ 363.143819] CPU0 CPU1 [ 363.143831] ---- ---- [ 363.143841] lock((&pmlmepriv->assoc_timer)); [ 363.143862] lock(&pmlmepriv->scanned_queue.lock); [ 363.143882] lock((&pmlmepriv->assoc_timer)); [ 363.143902] lock(&pmlmepriv->lock); [ 363.143921] *** DEADLOCK *** Make rtw_joinbss_event_prehandle() release the scanned_queue.lock before it deletes the timer to avoid this (it is still holding pmlmepriv->lock protecting against racing the timer). Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit f347c26 ] The following issue was observed running syzkaller: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:377 [inline] BUG: KASAN: slab-out-of-bounds in sg_copy_buffer+0x150/0x1c0 lib/scatterlist.c:831 Read of size 2132 at addr ffff8880aea95dc8 by task syz-executor.0/9815 CPU: 0 PID: 9815 Comm: syz-executor.0 Not tainted 4.19.202-00874-gfc0fe04215a9 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe4/0x14a lib/dump_stack.c:118 print_address_description+0x73/0x280 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:352 [inline] kasan_report+0x272/0x370 mm/kasan/report.c:410 memcpy+0x1f/0x50 mm/kasan/kasan.c:302 memcpy include/linux/string.h:377 [inline] sg_copy_buffer+0x150/0x1c0 lib/scatterlist.c:831 fill_from_dev_buffer+0x14f/0x340 drivers/scsi/scsi_debug.c:1021 resp_report_tgtpgs+0x5aa/0x770 drivers/scsi/scsi_debug.c:1772 schedule_resp+0x464/0x12f0 drivers/scsi/scsi_debug.c:4429 scsi_debug_queuecommand+0x467/0x1390 drivers/scsi/scsi_debug.c:5835 scsi_dispatch_cmd+0x3fc/0x9b0 drivers/scsi/scsi_lib.c:1896 scsi_request_fn+0x1042/0x1810 drivers/scsi/scsi_lib.c:2034 __blk_run_queue_uncond block/blk-core.c:464 [inline] __blk_run_queue+0x1a4/0x380 block/blk-core.c:484 blk_execute_rq_nowait+0x1c2/0x2d0 block/blk-exec.c:78 sg_common_write.isra.19+0xd74/0x1dc0 drivers/scsi/sg.c:847 sg_write.part.23+0x6e0/0xd00 drivers/scsi/sg.c:716 sg_write+0x64/0xa0 drivers/scsi/sg.c:622 __vfs_write+0xed/0x690 fs/read_write.c:485 kill_bdev:block_device:00000000e138492c vfs_write+0x184/0x4c0 fs/read_write.c:549 ksys_write+0x107/0x240 fs/read_write.c:599 do_syscall_64+0xc2/0x560 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe We get 'alen' from command its type is int. If userspace passes a large length we will get a negative 'alen'. Switch n, alen, and rlen to u32. Link: https://lore.kernel.org/r/[email protected] Acked-by: Douglas Gilbert <[email protected]> Signed-off-by: Ye Bin <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 10, 2021
[ Upstream commit 4d9380e ] Often some test cases like btrfs/161 trigger lockdep splats that complain about possible unsafe lock scenario due to the fact that during mount, when reading the chunk tree we end up calling blkdev_get_by_path() while holding a read lock on a leaf of the chunk tree. That produces a lockdep splat like the following: [ 3653.683975] ====================================================== [ 3653.685148] WARNING: possible circular locking dependency detected [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted [ 3653.687239] ------------------------------------------------------ [ 3653.688400] mount/447465 is trying to acquire lock: [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.691054] but task is already holding lock: [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.693978] which lock already depends on the new lock. [ 3653.695510] the existing dependency chain (in reverse order) is: [ 3653.696915] -> #3 (btrfs-chunk-00){++++}-{3:3}: [ 3653.698053] down_read_nested+0x4b/0x140 [ 3653.698893] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.699988] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 3653.701205] btrfs_search_slot+0x537/0xc00 [btrfs] [ 3653.702234] btrfs_insert_empty_items+0x32/0x70 [btrfs] [ 3653.703332] btrfs_init_new_device+0x563/0x15b0 [btrfs] [ 3653.704439] btrfs_ioctl+0x2110/0x3530 [btrfs] [ 3653.705405] __x64_sys_ioctl+0x83/0xb0 [ 3653.706215] do_syscall_64+0x3b/0xc0 [ 3653.706990] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.708040] -> #2 (sb_internal#2){.+.+}-{0:0}: [ 3653.708994] lock_release+0x13d/0x4a0 [ 3653.709533] up_write+0x18/0x160 [ 3653.710017] btrfs_sync_file+0x3f3/0x5b0 [btrfs] [ 3653.710699] __loop_update_dio+0xbd/0x170 [loop] [ 3653.711360] lo_ioctl+0x3b1/0x8a0 [loop] [ 3653.711929] block_ioctl+0x48/0x50 [ 3653.712442] __x64_sys_ioctl+0x83/0xb0 [ 3653.712991] do_syscall_64+0x3b/0xc0 [ 3653.713519] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.714233] -> #1 (&lo->lo_mutex){+.+.}-{3:3}: [ 3653.715026] __mutex_lock+0x92/0x900 [ 3653.715648] lo_open+0x28/0x60 [loop] [ 3653.716275] blkdev_get_whole+0x28/0x90 [ 3653.716867] blkdev_get_by_dev.part.0+0x142/0x320 [ 3653.717537] blkdev_open+0x5e/0xa0 [ 3653.718043] do_dentry_open+0x163/0x390 [ 3653.718604] path_openat+0x3f0/0xa80 [ 3653.719128] do_filp_open+0xa9/0x150 [ 3653.719652] do_sys_openat2+0x97/0x160 [ 3653.720197] __x64_sys_openat+0x54/0x90 [ 3653.720766] do_syscall_64+0x3b/0xc0 [ 3653.721285] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.721986] -> #0 (&disk->open_mutex){+.+.}-{3:3}: [ 3653.722775] __lock_acquire+0x130e/0x2210 [ 3653.723348] lock_acquire+0xd7/0x310 [ 3653.723867] __mutex_lock+0x92/0x900 [ 3653.724394] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.725041] blkdev_get_by_path+0xb8/0xd0 [ 3653.725614] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.726332] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.726999] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.727739] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.728384] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.729130] legacy_get_tree+0x30/0x50 [ 3653.729676] vfs_get_tree+0x28/0xc0 [ 3653.730192] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.730800] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.731427] legacy_get_tree+0x30/0x50 [ 3653.731970] vfs_get_tree+0x28/0xc0 [ 3653.732486] path_mount+0x2d4/0xbe0 [ 3653.732997] __x64_sys_mount+0x103/0x140 [ 3653.733560] do_syscall_64+0x3b/0xc0 [ 3653.734080] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.734782] other info that might help us debug this: [ 3653.735784] Chain exists of: &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00 [ 3653.737123] Possible unsafe locking scenario: [ 3653.737865] CPU0 CPU1 [ 3653.738435] ---- ---- [ 3653.739007] lock(btrfs-chunk-00); [ 3653.739449] lock(sb_internal#2); [ 3653.740193] lock(btrfs-chunk-00); [ 3653.740955] lock(&disk->open_mutex); [ 3653.741431] *** DEADLOCK *** [ 3653.742176] 3 locks held by mount/447465: [ 3653.742739] #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0 [ 3653.744114] #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs] [ 3653.745563] #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.747066] stack backtrace: [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1 [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 3653.750592] Call Trace: [ 3653.750967] dump_stack_lvl+0x57/0x72 [ 3653.751526] check_noncircular+0xf3/0x110 [ 3653.752136] ? stack_trace_save+0x4b/0x70 [ 3653.752748] __lock_acquire+0x130e/0x2210 [ 3653.753356] lock_acquire+0xd7/0x310 [ 3653.753898] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.754596] ? lock_is_held_type+0xe8/0x140 [ 3653.755125] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.755729] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.756338] __mutex_lock+0x92/0x900 [ 3653.756794] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.757400] ? do_raw_spin_unlock+0x4b/0xa0 [ 3653.757930] ? _raw_spin_unlock+0x29/0x40 [ 3653.758437] ? bd_prepare_to_claim+0x129/0x150 [ 3653.758999] ? trace_module_get+0x2b/0xd0 [ 3653.759508] ? try_module_get.part.0+0x50/0x80 [ 3653.760072] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.760661] ? devcgroup_check_permission+0xc1/0x1f0 [ 3653.761288] blkdev_get_by_path+0xb8/0xd0 [ 3653.761797] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.762454] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.763055] ? clone_fs_devices+0x8f/0x170 [btrfs] [ 3653.763689] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.764370] ? kvm_sched_clock_read+0x14/0x40 [ 3653.764922] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.765493] ? super_setup_bdi_name+0x79/0xd0 [ 3653.766043] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.766780] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.767488] ? kfree+0x1f2/0x3c0 [ 3653.767979] legacy_get_tree+0x30/0x50 [ 3653.768548] vfs_get_tree+0x28/0xc0 [ 3653.769076] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.769718] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.770381] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.771086] ? kfree+0x1f2/0x3c0 [ 3653.771574] legacy_get_tree+0x30/0x50 [ 3653.772136] vfs_get_tree+0x28/0xc0 [ 3653.772673] path_mount+0x2d4/0xbe0 [ 3653.773201] __x64_sys_mount+0x103/0x140 [ 3653.773793] do_syscall_64+0x3b/0xc0 [ 3653.774333] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.775094] RIP: 0033:0x7f648bc45aaa This happens because through btrfs_read_chunk_tree(), which is called only during mount, ends up acquiring the mutex open_mutex of a block device while holding a read lock on a leaf of the chunk tree while other paths need to acquire other locks before locking extent buffers of the chunk tree. Since at mount time when we call btrfs_read_chunk_tree() we know that we don't have other tasks running in parallel and modifying the chunk tree, we can simply skip locking of chunk tree extent buffers. So do that and move the assertion that checks the fs is not yet mounted to the top block of btrfs_read_chunk_tree(), with a comment before doing it. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 20, 2021
commit c56c963 upstream. In line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a CPP area structure. But in line 807 (#2), when the cache is allocated failed, this CPP area structure is not freed, which will result in memory leak. We can fix it by freeing the CPP area when the cache is allocated failed (#2). 792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size) 793 { 794 struct nfp_cpp_area_cache *cache; 795 struct nfp_cpp_area *area; 800 area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0), 801 0, size); // #1: allocates and initializes 802 if (!area) 803 return -ENOMEM; 805 cache = kzalloc(sizeof(*cache), GFP_KERNEL); 806 if (!cache) 807 return -ENOMEM; // #2: missing free 817 return 0; 818 } Fixes: 4cb584e ("nfp: add CPP access core") Signed-off-by: Jianglei Nie <[email protected]> Acked-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Dec 20, 2021
…_lock [ Upstream commit 598ad0b ] Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218f ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <[email protected]> Tested-by: Jeff Layton <[email protected]> cc: Jan Kara <[email protected]> cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
commit 1016fc0 upstream. A recent commit expanding the scope of the udc_lock mutex in the gadget core managed to cause an obscure and slightly bizarre lockdep violation. In abbreviated form: ====================================================== WARNING: possible circular locking dependency detected 5.19.0-rc7+ #12510 Not tainted ------------------------------------------------------ udevadm/312 is trying to acquire lock: ffff80000aae1058 (udc_lock){+.+.}-{3:3}, at: usb_udc_uevent+0x54/0xe0 but task is already holding lock: ffff000002277548 (kn->active#4){++++}-{0:0}, at: kernfs_seq_start+0x34/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (kn->active#4){++++}-{0:0}: lock_acquire+0x68/0x84 __kernfs_remove+0x268/0x380 kernfs_remove_by_name_ns+0x58/0xac sysfs_remove_file_ns+0x18/0x24 device_del+0x15c/0x440 -> #2 (device_links_lock){+.+.}-{3:3}: lock_acquire+0x68/0x84 __mutex_lock+0x9c/0x430 mutex_lock_nested+0x38/0x64 device_link_remove+0x3c/0xa0 _regulator_put.part.0+0x168/0x190 regulator_put+0x3c/0x54 devm_regulator_release+0x14/0x20 -> #1 (regulator_list_mutex){+.+.}-{3:3}: lock_acquire+0x68/0x84 __mutex_lock+0x9c/0x430 mutex_lock_nested+0x38/0x64 regulator_lock_dependent+0x54/0x284 regulator_enable+0x34/0x80 phy_power_on+0x24/0x130 __dwc2_lowlevel_hw_enable+0x100/0x130 dwc2_lowlevel_hw_enable+0x18/0x40 dwc2_hsotg_udc_start+0x6c/0x2f0 gadget_bind_driver+0x124/0x1f4 -> #0 (udc_lock){+.+.}-{3:3}: __lock_acquire+0x1298/0x20cc lock_acquire.part.0+0xe0/0x230 lock_acquire+0x68/0x84 __mutex_lock+0x9c/0x430 mutex_lock_nested+0x38/0x64 usb_udc_uevent+0x54/0xe0 Evidently this was caused by the scope of udc_mutex being too large. The mutex is only meant to protect udc->driver along with a few other things. As far as I can tell, there's no reason for the mutex to be held while the gadget core calls a gadget driver's ->bind or ->unbind routine, or while a UDC is being started or stopped. (This accounts for link #1 in the chain above, where the mutex is held while the dwc2_hsotg_udc is started as part of driver probing.) Gadget drivers' ->disconnect callbacks are problematic. Even though usb_gadget_disconnect() will now acquire the udc_mutex, there's a window in usb_gadget_bind_driver() between the times when the mutex is released and the ->bind callback is invoked. If a disconnect occurred during that window, we could call the driver's ->disconnect routine before its ->bind routine. To prevent this from happening, it will be necessary to prevent a UDC from connecting while it has no gadget driver. This should be done already but it doesn't seem to be; currently usb_gadget_connect() has no check for this. Such a check will have to be added later. Some degree of mutual exclusion is required in soft_connect_store(), which can dereference udc->driver at arbitrary times since it is a sysfs callback. The solution here is to acquire the gadget's device lock rather than the udc_mutex. Since the driver core guarantees that the device lock is always held during driver binding and unbinding, this will make the accesses in soft_connect_store() mutually exclusive with any changes to udc->driver. Lastly, it turns out there is one place which should hold the udc_mutex but currently does not: The function_show() routine needs protection while it dereferences udc->driver. The missing lock and unlock calls are added. Link: https://lore.kernel.org/all/[email protected]/ Fixes: 2191c00 ("USB: gadget: Fix use-after-free Read in usb_udc_uevent()") Cc: Felipe Balbi <[email protected]> Cc: [email protected] Reported-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Alan Stern <[email protected]> Link: https://lore.kernel.org/r/YwkfhdxA/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
commit 902e02e upstream. Syzkaller reports the following problem: BUG: sleeping function called from invalid context at kernel/printk/printk.c:2347 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1105, name: syz-executor423 3 locks held by syz-executor423/1105: #0: ffff8881468b9098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x90 drivers/tty/tty_ldisc.c:266 #1: ffff8881468b9130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: tty_write_lock drivers/tty/tty_io.c:952 [inline] #1: ffff8881468b9130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: do_tty_write drivers/tty/tty_io.c:975 [inline] #1: ffff8881468b9130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: file_tty_write.constprop.0+0x2a8/0x8e0 drivers/tty/tty_io.c:1118 #2: ffff88801b06c398 (&gsm->tx_lock){....}-{2:2}, at: gsmld_write+0x5e/0x150 drivers/tty/n_gsm.c:2717 irq event stamp: 3482 hardirqs last enabled at (3481): [<ffffffff81d13343>] __get_reqs_available+0x143/0x2f0 fs/aio.c:946 hardirqs last disabled at (3482): [<ffffffff87d39722>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline] hardirqs last disabled at (3482): [<ffffffff87d39722>] _raw_spin_lock_irqsave+0x52/0x60 kernel/locking/spinlock.c:159 softirqs last enabled at (3408): [<ffffffff87e01002>] asm_call_irq_on_stack+0x12/0x20 softirqs last disabled at (3401): [<ffffffff87e01002>] asm_call_irq_on_stack+0x12/0x20 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 2 PID: 1105 Comm: syz-executor423 Not tainted 5.10.137-syzkaller #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:118 ___might_sleep.cold+0x1e8/0x22e kernel/sched/core.c:7304 console_lock+0x19/0x80 kernel/printk/printk.c:2347 do_con_write+0x113/0x1de0 drivers/tty/vt/vt.c:2909 con_write+0x22/0xc0 drivers/tty/vt/vt.c:3296 gsmld_write+0xd0/0x150 drivers/tty/n_gsm.c:2720 do_tty_write drivers/tty/tty_io.c:1028 [inline] file_tty_write.constprop.0+0x502/0x8e0 drivers/tty/tty_io.c:1118 call_write_iter include/linux/fs.h:1903 [inline] aio_write+0x355/0x7b0 fs/aio.c:1580 __io_submit_one fs/aio.c:1952 [inline] io_submit_one+0xf45/0x1a90 fs/aio.c:1999 __do_sys_io_submit fs/aio.c:2058 [inline] __se_sys_io_submit fs/aio.c:2028 [inline] __x64_sys_io_submit+0x18c/0x2f0 fs/aio.c:2028 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x61/0xc6 The problem happens in the following control flow: gsmld_write(...) spin_lock_irqsave(&gsm->tx_lock, flags) // taken a spinlock on TX data con_write(...) do_con_write(...) console_lock() might_sleep() // -> bug As far as console_lock() might sleep it should not be called with spinlock held. The patch replaces tx_lock spinlock with mutex in order to avoid the problem. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 32dd59f ("tty: n_gsm: fix race condition in gsmld_write()") Cc: stable <[email protected]> Signed-off-by: Fedor Pchelkin <[email protected]> Signed-off-by: Alexey Khoroshilov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
[ Upstream commit ac56a0b ] Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
[ Upstream commit fb8396a ] The driver incorrectly frees client instance and subsequent i40e module removal leads to kernel crash. Reproducer: 1. Do ethtool offline test followed immediately by another one host# ethtool -t eth0 offline; ethtool -t eth0 offline 2. Remove recursively irdma module that also removes i40e module host# modprobe -r irdma Result: [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110 [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2 [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01 [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1 [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 8687.768755] #PF: supervisor read access in kernel mode [ 8687.773895] #PF: error_code(0x0000) - not-present page [ 8687.779034] PGD 0 P4D 0 [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2 [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019 [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e] [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202 [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000 [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000 [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000 [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0 [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008 [ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000 [ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0 [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8687.905572] PKRU: 55555554 [ 8687.908286] Call Trace: [ 8687.910737] <TASK> [ 8687.912843] i40e_remove+0x2c0/0x330 [i40e] [ 8687.917040] pci_device_remove+0x33/0xa0 [ 8687.920962] device_release_driver_internal+0x1aa/0x230 [ 8687.926188] driver_detach+0x44/0x90 [ 8687.929770] bus_remove_driver+0x55/0xe0 [ 8687.933693] pci_unregister_driver+0x2a/0xb0 [ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e] Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this failure is indicated back to i40e_client_subtask() that calls i40e_client_del_instance() to free client instance referenced by pf->cinst and sets this pointer to NULL. During the module removal i40e_remove() calls i40e_lan_del_device() that dereferences pf->cinst that is NULL -> crash. Do not remove client instance when client open callbacks fails and just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to take care about this situation (when netdev is up and client is NOT opened) in i40e_notify_client_of_netdev_close() and calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set. Fixes: 0ef2d5a ("i40e: KISS the client interface") Signed-off-by: Ivan Vecera <[email protected]> Tested-by: Helena Anna Dubel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
[ Upstream commit 84a5358 ] The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <[email protected]> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853d ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
[ Upstream commit 59bcdb5 ] Using two different types of workoads, it was observed that guc_update_engine_gt_clks was being called too frequently and/or causing a CPU-to-lmem bandwidth hit over PCIE. Details on the workloads and numbers are in the notes below. Background: At the moment, guc_update_engine_gt_clks can be invoked via one of 3 ways. #1 and #2 are infrequent under normal operating conditions: 1.When a predefined "ping_delay" timer expires so that GuC- busyness can sample the GTPM clock counter to ensure it doesn't miss a wrap-around of the 32-bits of the HW counter. (The ping_delay is calculated based on 1/8th the time taken for the counter go from 0x0 to 0xffffffff based on the GT frequency. This comes to about once every 28 seconds at a GT frequency of 19.2Mhz). 2.In preparation for a gt reset. 3.In response to __gt_park events (as the gt power management puts the gt into a lower power state when there is no work being done). Root-cause: For both the workloads described farther below, it was observed that when user space calls IOCTLs that unparks the gt momentarily and repeats such calls many times in quick succession, it triggers calling guc_update_engine_gt_clks as many times. However, the primary purpose of guc_update_engine_gt_clks is to ensure we don't miss the wraparound while the counter is ticking. Thus, the solution is to ensure we skip that check if gt_park is calling this function earlier than necessary. Solution: Snapshot jiffies when we do actually update the busyness stats. Then get the new jiffies every time intel_guc_busyness_park is called and bail if we are being called too soon. Use half of the ping_delay as a safe threshold. NOTE1: Workload1: IGTs' gem_create was modified to create a file handle, allocate memory with sizes that range from a min of 4K to the max supported (in power of two step-sizes). Its maps, modifies and reads back the memory. Allocations and modification is repeated until total memory allocation reaches the max. Then the file handle is closed. With this workload, guc_update_engine_gt_clks was called over 188 thousand times in the span of 15 seconds while this test ran three times. With this patch, the number of calls reduced to 14. NOTE2: Workload2: 30 transcode sessions are created in quick succession. While these sessions are created, pcm-iio tool was used to measure I/O read operation bandwidth consumption sampled at 100 milisecond intervals over the course of 20 seconds. The total bandwidth consumed over 20 seconds without this patch was measured at average at 311KBps per sample. With this patch, the number went down to about 175Kbps which is about a 43% savings. Signed-off-by: Alan Previn <[email protected]> Reviewed-by: Umesh Nerlige Ramappa <[email protected]> Acked-by: Tvrtko Ursulin <[email protected]> Signed-off-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Stable-dep-of: aee5ae7 ("drm/i915/guc: Cancel GuC engine busyness worker synchronously") Signed-off-by: Sasha Levin <[email protected]>
jnettlet
pushed a commit
that referenced
this issue
Sep 29, 2022
[ Upstream commit 23c6191 ] In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: f9f5301 ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <[email protected]> Tested-by: Helena Anna Dubel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi,
The most recent commits have caused the IMX8MQ to fail to boot (58e9698).
The point at which boot freezes is as follows;
[ 0.688134] imx-sdma 30bd0000.sdma: no iram assigned, using external mem [ 0.695559] imx-sdma 30bd0000.sdma: Falling back to user helper [ 0.696391] imx-sdma 302c0000.sdma: no iram assigned, using external mem lpddr4 swffc start
The text was updated successfully, but these errors were encountered: