Relocation fixes #1427

josefbacik · 2024-10-03T15:39:30Z

No description provided.

…rg/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes, 15 of which are cc:stable. Mostly MM, no identifiable theme. And a few nilfs2 fixups" * tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area mailmap: update entry for Jan Kuliga codetag: debug: mark codetags for poisoned page as empty mm/memcontrol: respect zswap.writeback setting from parent cg too scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum Revert "mm: skip CMA pages when they are not available" maple_tree: remove rcu_read_lock() from mt_validate() kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook nilfs2: fix state management in error path of log writing function nilfs2: fix missing cleanup on rollforward recovery error nilfs2: protect references to superblock parameters exposed in sysfs userfaultfd: don't BUG_ON() if khugepaged yanks our page table userfaultfd: fix checks for huge PMDs mm: vmalloc: ensure vmap_block is initialised before adding to queue selftests: mm: fix build errors on armhf

…/kernel/git/deller/parisc-linux Pull parisc architecture fix from Helge Deller: - Fix boot issue where boot memory is marked read-only too early * tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Delay write-protection until mark_rodata_ro() call

Fixes this missed case: xe 0000:00:02.0: [drm] Missing outer runtime PM protection WARNING: CPU: 99 PID: 1455 at drivers/gpu/drm/xe/xe_pm.c:564 xe_pm_runtime_get_noresume+0x48/0x60 [xe] Call Trace: <TASK> ? show_regs+0x67/0x70 ? __warn+0x94/0x1b0 ? xe_pm_runtime_get_noresume+0x48/0x60 [xe] ? report_bug+0x1b7/0x1d0 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? xe_pm_runtime_get_noresume+0x48/0x60 [xe] xe_device_declare_wedged+0x91/0x280 [xe] gt_reset_worker+0xa2/0x250 [xe] v2: Also move get and get the right Fixes tag (Himal, Brost) Fixes: fb74b20 ("drm/xe: Introduce a simple wedged state") Cc: Himal Prasad Ghimiray <[email protected]> Cc: Matthew Brost <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Reviewed-by: Himal Prasad Ghimiray <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit bc947d9) Signed-off-by: Rodrigo Vivi <[email protected]>

…t/rmk/linux Pull ARM fix from Russell King: - Fix a build issue with older binutils with LD dead code elimination disabled * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION

Ole reported that event->mmap_mutex is strictly insufficient to serialize the AUX buffer, add a per RB mutex to fully serialize it. Note that in the lock order comment the perf_event::mmap_mutex order was already wrong, that is, it nesting under mmap_lock is not new with this patch. Fixes: 45bfb2e ("perf: Add AUX area to ring buffer for raw data streams") Reported-by: Ole <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>

Suspend fbdev sooner, and disable user access before suspending to prevent some races. I've noticed this when comparing xe suspend to i915's. Matches the following commits from i915: 24b412b ("drm/i915: Disable intel HPD poll after DRM poll init/enable") 1ef28d8 ("drm/i915: Suspend the framebuffer console earlier during system suspend") bd738d8 ("drm/i915: Prevent modesets during driver init/shutdown") Thanks to Imre for pointing me to those commits. Driver shutdown is currently missing, but I have some idea how to implement it next. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: Imre Deak <[email protected]> Reviewed-by: Uma Shankar <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maarten Lankhorst,,, <[email protected]> (cherry picked from commit 492be2a) Signed-off-by: Rodrigo Vivi <[email protected]>

Enable/Disable user access only during system suspend/resume. This should not happen during runtime s/r v2: rebased Reviewed-by: Arun R Murthy <[email protected]> Signed-off-by: Imre Deak <[email protected]> Signed-off-by: Vinod Govindapillai <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit a64e7e5) Signed-off-by: Rodrigo Vivi <[email protected]>

Fix circular locking dependency on runtime suspend. <4> [74.952215] ====================================================== <4> [74.952217] WARNING: possible circular locking dependency detected <4> [74.952219] 6.10.0-rc7-xe #1 Not tainted <4> [74.952221] ------------------------------------------------------ <4> [74.952223] kworker/7:1/82 is trying to acquire lock: <4> [74.952226] ffff888120548488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_modeset_lock_all+0x40/0x1e0 [drm] <4> [74.952260] but task is already holding lock: <4> [74.952262] ffffffffa0ae59c0 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}, at: xe_pm_runtime_suspend+0x2f/0x340 [xe] <4> [74.952322] which lock already depends on the new lock. The commit 'b1d90a86 ("drm/xe: Use the encoder suspend helper also used by the i915 driver")' didn't do anything wrong. It actually fixed a critical bug, because the encoder_suspend was never getting actually called because it was returning if (has_display(xe)) instead of if (!has_display(xe)). However, this ended up introducing the encoder suspend calls in the runtime routines as well, causing the circular locking dependency. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2304 Fixes: b1d90a8 ("drm/xe: Use the encoder suspend helper also used by the i915 driver") Cc: Imre Deak <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit 8da1944) Signed-off-by: Rodrigo Vivi <[email protected]>

…kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "Two netfs fixes for this merge window: - Ensure that fscache_cookie_lru_timer is deleted when the fscache module is removed to prevent UAF - Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range(). Before, it used truncate_inode_pages_partial(), which causes copy_file_range() to fail on cifs" * tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()

Pull smb server fixes from Steve French: - Fix crash in session setup - Fix locking bug - Improve access bounds checking * tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Unlock on in ksmbd_tcp_set_interfaces() ksmbd: unset the binding mark of a reused connection smb: Annotate struct xattr_smb_acl with __counted_by()

…rnel/git/kdave/linux Pull btrfs fixes from David Sterba: - followup fix for direct io and fsync under some conditions, reported by QEMU users - fix a potential leak when disabling quotas while some extent tracking work can still happen - in zoned mode handle unexpected change of zone write pointer in RAID1-like block groups, turn the zones to read-only * tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race between direct IO write and fsync when using same fd btrfs: zoned: handle broken write pointer on zones btrfs: qgroup: don't use extent changeset when not needed

If the length of the name string is 1 and the value of name[0] is NULL byte, an OOB vulnerability occurs in btf_name_valid_section() and the return value is true, so the invalid name passes the check. To solve this, you need to check if the first position is NULL byte and if the first character is printable. Suggested-by: Eduard Zingerman <[email protected]> Fixes: bd70a8f ("bpf: Allow all printable characters in BTF DATASEC names") Signed-off-by: Jeongjun Park <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Eduard Zingerman <[email protected]>
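
The check described above can be sketched in user space as follows. This is a minimal illustration, not the kernel's btf_name_valid_section(): the function name section_name_valid() and the length cap SKETCH_NAME_MAX are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical length cap for this sketch; not taken from the kernel source. */
#define SKETCH_NAME_MAX 512

/* Return true only for a non-empty name made entirely of printable characters. */
bool section_name_valid(const char *name)
{
	size_t i;

	assert(name != NULL);

	/* Reject the empty string up front: a lone NUL byte is not a valid name. */
	if (name[0] == '\0')
		return false;

	for (i = 0; i < SKETCH_NAME_MAX && name[i] != '\0'; i++) {
		if (!isprint((unsigned char)name[i]))
			return false;
	}

	/* Valid only if the terminator was reached within the cap. */
	return name[i] == '\0';
}
```

With the early return in place, a name whose first byte is NUL is rejected before any other position is inspected.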

…/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - hp-wmi-sensors: Check if WMI event data exists before accessing it - ltc2991: fix register bits defines * tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (hp-wmi-sensors) Check if WMI event data exists hwmon: ltc2991: fix register bits defines

….org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "A number of small fixes for the late cycle: - Two more build fixes on 32-bit archs - Fixed a segfault during perf test - Fixed spinlock/rwlock accounting bug in perf lock contention" * tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf daemon: Fix the build on more 32-bit architectures perf python: include "util/sample.h" perf lock contention: Fix spinlock and rwlock accounting perf test pmu: Set uninitialized PMU alias to null

Add selftest for cases where btf_name_valid_section() does not properly check for certain types of names. Suggested-by: Eduard Zingerman <[email protected]> Signed-off-by: Jeongjun Park <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Eduard Zingerman <[email protected]>

…id_section' Jeongjun Park says: ==================== bpf: fix incorrect name check pass logic in btf_name_valid_section This patch was written to fix an issue where btf_name_valid_section() would not properly check names with certain conditions and would throw an OOB vuln. And selftest was added to verify this patch. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

Pull bcachefs fixes from Kent Overstreet: - Fix a typo in the rebalance accounting changes - BCH_SB_MEMBER_INVALID: small on disk format feature which will be needed for full erasure coding support; this is only the minimum so that 6.11 can handle future versions without barfing. * tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs: bcachefs: BCH_SB_MEMBER_INVALID bcachefs: fix rebalance accounting

Bareudp devices update their stats concurrently. Therefore they need proper atomic increments. Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/04b7b9d0b480158eb3ab4366ec80aa2ab7e41fcb.1725031794.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
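
As a rough user-space analogy for the atomic increments mentioned above, the sketch below uses C11 atomics. The struct and function names are hypothetical, and the kernel patch uses its own stats helpers rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device counters; field names are illustrative only. */
struct tunnel_stats {
	atomic_uint_fast64_t rx_dropped;
	atomic_uint_fast64_t tx_errors;
};

/* Atomically bump a counter so concurrent updaters never lose an increment. */
void stats_inc(atomic_uint_fast64_t *counter)
{
	assert(counter != NULL);
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Read the current value of a counter. */
uint64_t stats_read(atomic_uint_fast64_t *counter)
{
	assert(counter != NULL);
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Relaxed ordering is chosen here because a statistics counter only needs each increment to be indivisible; it does not need to order other memory accesses.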

We observed a null-ptr-deref in fou_gro_receive() while shutting down a host. [0] The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol in struct fou. When fou_release() is called due to netns dismantle or explicit tunnel teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data. Then, the tunnel socket is destroyed after a single RCU grace period. So, in-flight udp4_gro_receive() could find the socket and execute the FOU GRO handler, where sk->sk_user_data could be NULL. Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL checks in FOU GRO handlers. [0]: BUG: kernel NULL pointer dereference, address: 0000000000000008 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 80000001032f4067 P4D 80000001032f4067 PUD 103240067 PMD 0 SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.216-204.855.amzn2.x86_64 #1 Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017 RIP: 0010:fou_gro_receive (net/ipv4/fou.c:233) [fou] Code: 41 5f c3 cc cc cc cc e8 e7 2e 69 f4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 49 89 f8 41 54 48 89 f7 48 89 d6 49 8b 80 88 02 00 00 <0f> b6 48 08 0f b7 42 4a 66 25 fd fd 80 cc 02 66 89 42 4a 0f b6 42 RSP: 0018:ffffa330c0003d08 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff93d9e3a6b900 RCX: 0000000000000010 RDX: ffff93d9e3a6b900 RSI: ffff93d9e3a6b900 RDI: ffff93dac2e24d08 RBP: ffff93d9e3a6b900 R08: ffff93dacbce6400 R09: 0000000000000002 R10: 0000000000000000 R11: ffffffffb5f369b0 R12: ffff93dacbce6400 R13: ffff93dac2e24d08 R14: 0000000000000000 R15: ffffffffb4edd1c0 FS: 0000000000000000(0000) GS:ffff93daee800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000102140001 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? 
__die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) ? no_context (arch/x86/mm/fault.c:752) ? exc_page_fault (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 arch/x86/mm/fault.c:1435 arch/x86/mm/fault.c:1483) ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:571) ? fou_gro_receive (net/ipv4/fou.c:233) [fou] udp_gro_receive (include/linux/netdevice.h:2552 net/ipv4/udp_offload.c:559) udp4_gro_receive (net/ipv4/udp_offload.c:604) inet_gro_receive (net/ipv4/af_inet.c:1549 (discriminator 7)) dev_gro_receive (net/core/dev.c:6035 (discriminator 4)) napi_gro_receive (net/core/dev.c:6170) ena_clean_rx_irq (drivers/amazon/net/ena/ena_netdev.c:1558) [ena] ena_io_poll (drivers/amazon/net/ena/ena_netdev.c:1742) [ena] napi_poll (net/core/dev.c:6847) net_rx_action (net/core/dev.c:6917) __do_softirq (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200 include/trace/events/irq.h:142 kernel/softirq.c:299) asm_call_irq_on_stack (arch/x86/entry/entry_64.S:809) </IRQ> do_softirq_own_stack (arch/x86/include/asm/irq_stack.h:27 arch/x86/include/asm/irq_stack.h:77 arch/x86/kernel/irq_64.c:77) irq_exit_rcu (kernel/softirq.c:393 kernel/softirq.c:423 kernel/softirq.c:435) common_interrupt (arch/x86/kernel/irq.c:239) asm_common_interrupt (arch/x86/include/asm/idtentry.h:626) RIP: 0010:acpi_idle_do_entry (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 drivers/acpi/processor_idle.c:114 drivers/acpi/processor_idle.c:575) Code: 8b 15 d1 3c c4 02 ed c3 cc cc cc cc 65 48 8b 04 25 40 ef 01 00 48 8b 00 a8 08 75 eb 0f 1f 44 00 00 0f 00 2d d5 09 55 00 fb f4 <fa> c3 cc cc cc cc e9 be fc ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 RSP: 0018:ffffffffb5603e58 EFLAGS: 00000246 RAX: 0000000000004000 RBX: ffff93dac0929c00 RCX: ffff93daee833900 RDX: ffff93daee800000 RSI: ffff93daee87dc00 RDI: ffff93daee87dc64 RBP: 0000000000000001 R08: ffffffffb5e7b6c0 R09: 0000000000000044 R10: ffff93daee831b04 R11: 00000000000001cd R12: 
0000000000000001 R13: ffffffffb5e7b740 R14: 0000000000000001 R15: 0000000000000000 ? sched_clock_cpu (kernel/sched/clock.c:371) acpi_idle_enter (drivers/acpi/processor_idle.c:712 (discriminator 3)) cpuidle_enter_state (drivers/cpuidle/cpuidle.c:237) cpuidle_enter (drivers/cpuidle/cpuidle.c:353) cpuidle_idle_call (kernel/sched/idle.c:158 kernel/sched/idle.c:239) do_idle (kernel/sched/idle.c:302) cpu_startup_entry (kernel/sched/idle.c:395 (discriminator 1)) start_kernel (init/main.c:1048) secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:310) Modules linked in: udp_diag tcp_diag inet_diag nft_nat ipip tunnel4 dummy fou ip_tunnel nft_masq nft_chain_nat nf_nat wireguard nft_ct curve25519_x86_64 libcurve25519_generic nf_conntrack libchacha20poly1305 nf_defrag_ipv6 nf_defrag_ipv4 nft_objref chacha_x86_64 nft_counter nf_tables nfnetlink poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper mousedev psmouse button ena ptp pps_core crc32c_intel CR2: 0000000000000008 Fixes: d92283e ("fou: change to use UDP socket GRO") Reported-by: Alphonse Kurian <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

generic_ocp_write() requires the "size" parameter to be 4-byte aligned. Therefore, writing the bp fails if mac->bp_num is odd. Align the size to 4 to fix it. This may write an extra bp, but rtl8152_is_fw_mac_ok() makes sure the value is 0 for any bp whose index is more than mac->bp_num, so the firmware is not affected. In addition, check the return value of generic_ocp_write() to make sure everything is correct. Fixes: e5c266a ("r8152: set bp in bulk") Signed-off-by: Hayes Wang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
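
The size rounding can be illustrated with a small sketch. It assumes each bp entry is a 16-bit value, which is consistent with an odd bp_num producing an unaligned size but is not stated in the message; align4() and bp_write_size() are hypothetical helper names, not driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of 4 (4 is a power of two). */
size_t align4(size_t size)
{
	return (size + 3u) & ~(size_t)3u;
}

/*
 * Byte count to write for bp_num entries.
 * Assumption for this sketch: each entry is a 16-bit value.
 */
size_t bp_write_size(unsigned int bp_num)
{
	size_t raw = (size_t)bp_num * sizeof(uint16_t);
	size_t aligned = align4(raw);

	assert(aligned % 4 == 0);
	return aligned;
}
```

Under that assumption, three entries occupy 6 bytes and the write is padded to 8, which is the "extra bp" the message refers to.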

When userspace wants to take over a fdb entry by setting it as EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add(). If the bridge updates the entry later because its port changed, we clear the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER flag set. If userspace then wants to take over the entry again, br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the update. Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless if this was a user fdb entry or not. Fixes: 710ae72 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
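
The flag handling described above can be modelled in a few lines. This is a simplified sketch of the described behaviour, not the bridge code: the flag values and the two function names are made up for the example, and the return value stands for "the entry was modified".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are not the kernel's. */
#define FDB_ADDED_BY_USER       (1u << 0)
#define FDB_ADDED_BY_EXT_LEARN  (1u << 1)

/* Behaviour described as buggy: a leftover USER flag blocks the takeover. */
bool ext_learn_add_old(unsigned int *flags)
{
	assert(flags != NULL);
	if (*flags & FDB_ADDED_BY_EXT_LEARN)
		return false;	/* already external: nothing to change */
	if (*flags & FDB_ADDED_BY_USER)
		return false;	/* update silently ignored */
	*flags |= FDB_ADDED_BY_EXT_LEARN | FDB_ADDED_BY_USER;
	return true;
}

/* Behaviour after the fix: EXT_LEARN is set regardless of the USER flag. */
bool ext_learn_add_new(unsigned int *flags)
{
	bool modified;

	assert(flags != NULL);
	modified = !(*flags & FDB_ADDED_BY_EXT_LEARN);
	*flags |= FDB_ADDED_BY_EXT_LEARN | FDB_ADDED_BY_USER;
	return modified;
}
```

Starting from an entry that carries only the USER flag (the state left behind after the bridge updates the port), the old variant leaves it untouched while the new one marks it as externally learned again.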

axienet_dma_err_handler can race with axienet_stop in the following manner: CPU 1 CPU 2 ====================== ================== axienet_stop() napi_disable() axienet_dma_stop() axienet_dma_err_handler() napi_disable() axienet_dma_stop() axienet_dma_start() napi_enable() cancel_work_sync() free_irq() Fix this by setting a flag in axienet_stop telling axienet_dma_err_handler not to bother doing anything. I chose not to use disable_work_sync to allow for easier backporting. Signed-off-by: Sean Anderson <[email protected]> Fixes: 8a3b7a2 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.11 Hopefully final fixes for v6.11 and this time only fixes to ath11k driver. We need to revert hibernation support due to reported regressions and we have a fix for kernel crash introduced in v6.11-rc1. * tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: MAINTAINERS: wifi: cw1200: add net-cw1200.h Revert "wifi: ath11k: support hibernation" Revert "wifi: ath11k: restore country code during resume" wifi: ath11k: fix NULL pointer dereference in ath11k_mac_get_eirp_power() ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…t/tnguy/net-queue Tony Nguyen says: ==================== ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba says: PF reset can be triggered asynchronously, by tx_timeout or by a user. With some unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and modify XDP rings at the same time, causing system crash. The first patch factors out rtnl-locked code from VSI rebuild code to avoid deadlock. The following changes lock rebuild and .ndo_bpf() critical sections with an internal mutex as well and provide complementary fixes. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: do not bring the VSI up, if it was down before the XDP setup ice: remove ICE_CFG_BUSY locking from AF_XDP code ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset ice: check for XDP rings instead of bpf program when unconfiguring ice: protect XDP configuration with a mutex ice: move netif_queue_set_napi to rtnl-protected sections ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

We were allowing any users to create a high priority group without any permission checks. As a result, this was allowing possible denial of service. We now only allow the DRM master or users with the CAP_SYS_NICE capability to set higher priorities than PANTHOR_GROUP_PRIORITY_MEDIUM. As the sole user of that uAPI lives in Mesa and hardcode a value of MEDIUM [1], this should be safe to do. Additionally, as those checks are performed at the ioctl level, panthor_group_create now only check for priority level validity. [1]https://gitlab.freedesktop.org/mesa/mesa/-/blob/f390835074bdf162a63deb0311d1a6de527f9f89/src/gallium/drivers/panfrost/pan_csf.c#L1038 Signed-off-by: Mary Guillemard <[email protected]> Fixes: de85488 ("drm/panthor: Add the scheduler logical block") Cc: [email protected] Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
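
The permission rule can be sketched as a small predicate. The enum, its values, and may_use_priority() are hypothetical names for this illustration; in the driver the two booleans would come from the DRM master check and a CAP_SYS_NICE capability check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priority levels for this sketch, ordered low to high. */
enum group_priority {
	GROUP_PRIORITY_LOW,
	GROUP_PRIORITY_MEDIUM,
	GROUP_PRIORITY_HIGH,
	GROUP_PRIORITY_COUNT
};

/* Return true if the caller may create a group at the requested priority. */
bool may_use_priority(enum group_priority prio, bool is_drm_master,
		      bool has_cap_sys_nice)
{
	/* Range validity is a separate concern from permission. */
	assert(prio < GROUP_PRIORITY_COUNT);

	/* Anything up to MEDIUM is open to every user. */
	if (prio <= GROUP_PRIORITY_MEDIUM)
		return true;

	/* Higher priorities need the DRM master or CAP_SYS_NICE. */
	return is_drm_master || has_cap_sys_nice;
}
```

An unprivileged caller asking for MEDIUM, as the message says Mesa does, is unaffected by the new check.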

The WL-355608-A8 is a 3.5" 640x480@60Hz RGB LCD display from an unknown OEM used in a number of handheld gaming devices made by Anbernic. Previously committed using the OEM serial without a vendor prefix, however following subsequent discussion the preference is to use the integrating device vendor and name where the OEM is unknown. There are 4 RG35XX series devices from Anbernic based on an Allwinner H700 SoC using this panel, with the -Plus variant introduced first. Therefore the -Plus is used as the fallback for the subsequent -H, -2024, and -SP devices. Alter the filename and compatible string to reflect the convention. Fixes: 45b888a ("dt-bindings: display: panel: Add WL-355608-A8 panel") Signed-off-by: Ryan Walklin <[email protected]> Acked-by: Rob Herring (Arm) <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

As per the previous dt-binding commit, update the WL-355608-A8 panel compatible to reflect the integrating device vendor and name as the panel OEM is unknown. Fixes: 62ea2ee ("drm: panel: nv3052c: Add WL-355608-A8 panel") Signed-off-by: Ryan Walklin <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

In the off-chance that waiting for the firmware to signal its booted status timed out in the fast reset path, one must flush the cache lines for the entire FW VM address space before reloading the regions, otherwise stale values eventually lead to a scheduler job timeout. Fixes: 647810e ("drm/panthor: Add the MMU/VM logical block") Cc: [email protected] Signed-off-by: Adrián Larumbe <[email protected]> Acked-by: Liviu Dudau <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Document what was discussed multiple times on list and various virtual / in-person conversations. guard() being okay in functions <= 20 LoC is a bit of my own invention. If the function is trivial it should be fine, but feel free to disagree :) We'll obviously revisit this guidance as time passes and we and other subsystems get more experience. Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Deferred I/O requires struct page for framebuffer memory, which is not guaranteed for all DMA ranges. We thus only install deferred I/O if we have a framebuffer that requires it. A reported bug affected the ipu-v3 and pl111 drivers, which have video memory in either Normal or HighMem zones [ 0.000000] Zone ranges: [ 0.000000] Normal [mem 0x0000000010000000-0x000000003fffffff] [ 0.000000] HighMem [mem 0x0000000040000000-0x000000004fffffff] where deferred I/O only works correctly with HighMem. See the Closes tags for bug reports. v2: - test if screen_buffer supports deferred I/O (Sima) Signed-off-by: Thomas Zimmermann <[email protected]> Fixes: 808a40b ("drm/fbdev-dma: Implement damage handling and deferred I/O") Reported-by: Alexander Stein <[email protected]> Closes: https://lore.kernel.org/all/23636953.6Emhk5qWAg@steina-w/ Reported-by: Linus Walleij <[email protected]> Closes: https://lore.kernel.org/dri-devel/CACRpkdb+hb9AGavbWpY-=uQQ0apY9en_tWJioPKf_fAbXMP4Hg@mail.gmail.com/ Tested-by: Alexander Stein <[email protected]> Tested-by: Linus Walleij <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Javier Martinez Canillas <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Reviewed-by: Simona Vetter <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

[PROBLEM] It is very common for udev to trigger device scan, and every time a mounted btrfs device got re-scan from different soft links, we will get some of unnecessary device path updates, this is especially common for LVM based storage: # lvs scratch1 test -wi-ao---- 10.00g scratch2 test -wi-a----- 10.00g scratch3 test -wi-a----- 10.00g scratch4 test -wi-a----- 10.00g scratch5 test -wi-a----- 10.00g test test -wi-a----- 10.00g # mkfs.btrfs -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg -c [ 205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154) [ 205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 [ 205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm [ 205.713856] BTRFS info (device dm-4): using free-space-tree [ 205.722324] BTRFS info (device dm-4): checking UUID tree So far so good, but even if we just touched any soft link of "dm-4", we will get quite some unnecessary device path updates. # touch /dev/mapper/test-scratch1 # dmesg -c [ 469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221) [ 469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221) Such device path rename is unnecessary and can lead to random path change due to the udev race. [CAUSE] Inside device_list_add(), we are using a very primitive way checking if the device has changed, strcmp(). Which can never handle links well, no matter if it's hard or soft links. So every different link of the same device will be treated as a different device, causing the unnecessary device path update. [FIX] Introduce a helper, is_same_device(), and use path_equal() to properly detect the same block device. So that the different soft links won't trigger the rename race. 
Reviewed-by: Filipe Manana <[email protected]> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
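
To show why a string comparison cannot recognise two links to the same device, here is a user-space analogy. It is not the kernel's is_same_device() or path_equal(): same_file() is a hypothetical helper that compares what two paths resolve to by device and inode number.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths resolve to the same file, 0 if they do not,
 * and -1 if either path cannot be examined.
 */
int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);

	/* stat() follows symbolic links, so aliases reach the same inode. */
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;

	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, "/" and "/." differ as strings but resolve to the same inode, just as /dev/mapper/test-scratch1 and /dev/dm-4 name one block device in the log above.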

[PROBLEM]
Currently btrfs accepts any file path for its device, resulting in some weird situations:

    # ./mount_by_fd /dev/test/scratch1 /mnt/btrfs/

The program has the following source code:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mount.h>

    int main(int argc, char *argv[])
    {
        /* Open the device, then mount it through its /proc/self/fd alias. */
        int fd = open(argv[1], O_RDWR);
        char path[256];

        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        return mount(path, argv[2], "btrfs", 0, NULL);
    }

Then we can have the following weird device path:

    BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)

Normally it's not a big deal, and later udev can trigger a device path rename. But if udev didn't trigger, the device path "/proc/self/fd/3" will show up in mtab.

[CAUSE]
The filename "/proc/self/fd/3" refers to the opened file descriptor 3. In the above case, it's exactly the device we want to open, that is, it points to "/dev/test/scratch1", which is another symlink pointing to "/dev/dm-2". Inside the kernel we resolve the mount source using LOOKUP_FOLLOW, which follows the symbolic link and grabs the proper block device. But inside btrfs we also save the filename into btrfs_device::name, and utilize that member to report our mount source, which leads to the above situation.

[FIX]
Instead of unconditionally trusting the path, check if the original file (not following the symbolic link) is inside "/dev/". If not, manually look up the path to its final destination, and use that as our device path. This allows us to still use symbolic links, like "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs with an LVM2 setup. And for really weird names, like the above case, we resolve it to "/dev/dm-2" instead.

Reviewed-by: Filipe Manana <[email protected]> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
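
The idea behind the fix can be approximated in user space with realpath(). This is only a sketch under stated assumptions: device_report_path() is a hypothetical name, and a textual "/dev/" prefix test stands in for the kernel's check of where the original file lives.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write the path to report for a device into out (len bytes).
 * Returns 0 on success, -1 on failure.
 */
int device_report_path(const char *path, char *out, size_t len)
{
	char resolved[PATH_MAX];
	const char *chosen = path;
	int n;

	assert(path != NULL && out != NULL);

	/* Paths under /dev/ are kept as given, symbolic links included. */
	if (strncmp(path, "/dev/", 5) != 0) {
		/* Anything else is resolved to its final destination. */
		if (realpath(path, resolved) == NULL)
			return -1;
		chosen = resolved;
	}

	n = snprintf(out, len, "%s", chosen);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

A name such as /dev/mapper/test-scratch1 is passed through untouched, while an alias outside /dev/ is replaced by whatever it finally points to.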

Remove the duplicated transaction joining, block reserve setting and raid extent inserting in btrfs_finish_ordered_extent(). While at it, also abort the transaction in case inserting a RAID stripe-tree entry fails. Suggested-by: Naohiro Aota <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Signed-off-by: David Sterba <[email protected]>

…s enabled When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(), if we fail to insert the qgroup record we don't error out, we ignore it. In fact we treat it as if there was no error and there was already an existing record - we don't distinguish between the cases where btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already existed and we can free the given record, and the case where it returns a negative error value, meaning the insertion into the xarray that is used to track records failed. Effectively we end up ignoring that we are lacking qgroup record in the dirty extents xarray, resulting in incorrect qgroup accounting. Fix this by checking for errors and return them to the callers. Fixes: 3cce39a ("btrfs: qgroup: use xarray to track dirty extents in transaction") Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
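
The three-way return convention described above can be sketched like this. handle_trace_result() is a hypothetical helper, not a btrfs function, and cleanup on the error path is deliberately left out of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Interpret the result of inserting a record:
 * 0 = inserted, 1 = a record already existed, negative = error.
 * Returns 0 or the negative error; *free_record tells the caller
 * whether the record it passed in turned out to be a duplicate.
 */
int handle_trace_result(int ret, bool *free_record)
{
	assert(free_record != NULL);
	*free_record = false;

	if (ret < 0)
		return ret;		/* propagate the failure to the caller */
	if (ret > 0)
		*free_record = true;	/* duplicate: the given record is unused */
	return 0;
}
```

The point of the separation is that a negative value no longer falls into the same branch as "already existed".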

We are using the logical address ("bytenr") of an extent as the key for qgroup records in the dirty extents xarray. This is a problem because xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit platform the address of any extent starting at or beyond 4G is truncated, which is far too low a limit as virtually everyone is using storage with more than 4G of space. This means a "bytenr" of 4G gets truncated to 0, and so do 8G and 16G for example, resulting in incorrect qgroup accounting. Fix this by using sector numbers as keys instead, that is, using keys that match the logical address right shifted by fs_info->sectorsize_bits, which is what we do for the fs_info->buffer_radix that tracks extent buffers (radix trees also use an "unsigned long" type for keys). This also makes the index space denser, which helps optimize the xarray (as mentioned in Documentation/core-api/xarray.rst). Fixes: 3cce39a ("btrfs: qgroup: use xarray to track dirty extents in transaction") Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
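
The truncation is easy to reproduce with fixed-width types. In this sketch uint32_t models "unsigned long" on a 32-bit platform; the function names are hypothetical, and a sectorsize_bits value of 12 (4K sectors) in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t models "unsigned long" on a 32-bit platform. */
typedef uint32_t key32_t;

/* Old scheme: the byte address itself is the key, so high bits are lost. */
key32_t key_from_bytenr(uint64_t bytenr)
{
	return (key32_t)bytenr;
}

/* New scheme: the sector number is the key. */
key32_t key_from_sector(uint64_t bytenr, unsigned int sectorsize_bits)
{
	assert(sectorsize_bits < 64);
	return (key32_t)(bytenr >> sectorsize_bits);
}
```

With 4K sectors, byte addresses 4G and 8G both collapse to key 0 under the old scheme, but map to the distinct keys 1 << 20 and 2 << 20 under the new one.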

While running checkpatch against a patch that modifies the btrfs_qgroup_extent event class, it complained about using a comma instead of a semicolon: $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch WARNING: Possible comma where semicolon could be used torvalds#215: FILE: include/trace/events/btrfs.h:1720: + __entry->bytenr = bytenr, __entry->num_bytes = rec->num_bytes; total: 0 errors, 1 warnings, 184 lines checked So replace the comma with a semicolon to silence checkpatch and possibly other tools. It also makes the code consistent with the rest. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

…ecord Now that we track qgroup extent records in an xarray we don't need to have a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get it from the index of the record in the xarray. So remove the field and grab the bytenr from either the index key or any other place where it's available (delayed refs). This reduces the size of struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning that we now can store 128 instances of this structure instead of 102 per 4K page. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
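The per-page figures follow from integer division of the page size by the structure size. A minimal check, where `PAGE_SIZE_4K` and `records_per_page()` are names introduced only for this sketch:

```c
#include <assert.h>

/* 4K page size, as stated in the commit message. */
#define PAGE_SIZE_4K 4096u

/* Number of whole records of the given size that fit in one 4K page. */
static unsigned int records_per_page(unsigned int record_size)
{
	return PAGE_SIZE_4K / record_size;
}
```

4096 / 40 rounds down to 102 and 4096 / 32 is exactly 128, which matches the numbers quoted above.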

…_post() Instead of extracting fs_info from the transaction multiple times, store it in a local variable and use it. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

…extent() There's no need to hold the delayed refs spinlock when calling btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since it doesn't change anything in delayed refs and it only changes the xarray used to track qgroup extent records, which is protected by the xarray's lock. Holding the lock is only adding unnecessary lock contention with other tasks that actually need to take the lock to add/remove/change delayed references. So remove the locking. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

…xtent() Instead of dereferencing the delayed refs from the transaction multiple times, store it early in a local variable and then always use the variable. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

The qgroup record was allocated with kzalloc(), so it's pointless to set its old_roots member to NULL. Remove the assignment. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

…ecreased During an incremental send we may end up sending an invalid clone operation, for the last extent of a file which ends at an unaligned offset that matches the final i_size of the file in the send snapshot, in case the file had its initial size (the size in the parent snapshot) decreased in the send snapshot. In this case the destination will fail to apply the clone operation because its end offset is not sector size aligned and it ends before the current size of the file. Sending the truncate operation always happens when we finish processing an inode, after we process all its extents (and xattrs, names, etc). So fix this by ensuring the file has a valid size before we send a clone operation for an unaligned extent that ends at the final i_size of the file. The size we truncate to matches the start offset of the clone range but it could be any value between that start offset and the final size of the file since the clone operation will expand the i_size if the current size is smaller than the end offset. The start offset of the range was chosen because it's always sector size aligned and avoids a truncation into the middle of a page, which results in dirtying the page due to filling part of it with zeroes and then making the clone operation at the receiver trigger IO. The following test reproduces the issue: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a file with a size of 256K + 5 bytes, having two extents, one # with a size of 128K and another one with a size of 128K + 5 bytes. last_ext_size=$((128 * 1024 + 5)) xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \ -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \ $MNT/foo # Another file which we will later clone foo into, but initially with # a larger size than foo. xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap1 # Now resize bar and clone foo into it. 
xfs_io -c "truncate 0" \ -c "reflink $MNT/foo" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap2 rm -f /tmp/send-full /tmp/send-inc btrfs send -f /tmp/send-full $MNT/snap1 btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2 umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT btrfs receive -f /tmp/send-full $MNT btrfs receive -f /tmp/send-inc $MNT umount $MNT Running it before this patch: $ ./test.sh (...) At subvol snap1 At snapshot snap2 ERROR: failed to clone extents to bar: Invalid argument A test case for fstests will be sent soon. Reported-by: Ben Millwood <[email protected]> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/ Fixes: 46a6e10 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size") Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

…acntion [BUG] Syzbot reported a NULL pointer dereference with the following crash: FAULT_INJECTION: forcing a failure. start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676 prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642 relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678 ... BTRFS info (device loop0): balance: ended with status: -12 Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667] RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926 Call Trace: <TASK> commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496 btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430 del_balance_item fs/btrfs/volumes.c:3678 [inline] reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742 btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ---[ end trace 0000000000000000 ]--- [CAUSE] The allocation failure happens at the start_transaction() inside prepare_to_relocate(), and during the error handling we call unset_reloc_control(), which sets fs_info->reloc_ctl to NULL. Then we continue the error path cleanup in btrfs_balance() by calling reset_balance_state() which will call del_balance_item() to fully delete the balance item in the root tree. However during the small window between set_reloc_control() and unset_reloc_control(), we can have a subvolume tree update that creates a reloc_root for that subvolume. Then we go into the final btrfs_commit_transaction() of del_balance_item(), and into btrfs_update_reloc_root() inside commit_fs_roots(). 
That function checks if fs_info->reloc_ctl is in the merge_reloc_tree stage, but since fs_info->reloc_ctl is NULL, it results in a NULL pointer dereference. [FIX] Just add an extra check on fs_info->reloc_ctl inside btrfs_update_reloc_root(), before checking fs_info->reloc_ctl->merge_reloc_tree. That DEAD_RELOC_TREE handling is to prevent further modification to the reloc tree during the merge stage, but since there is no reloc_ctl at all, we do not need to bother with that. Reported-by: [email protected] Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
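The shape of the fix can be sketched with simplified stand-in types. The two structs below are not the kernel's definitions, and `in_merge_stage()` is a hypothetical helper; only the field names reloc_ctl and merge_reloc_tree come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustrative only). */
struct reloc_control {
	bool merge_reloc_tree;
};

struct fs_info {
	struct reloc_control *reloc_ctl;
};

/*
 * Check the pointer before reading through it. With no reloc control
 * at all there is no merge stage, so the answer is simply "false".
 */
static bool in_merge_stage(const struct fs_info *fs_info)
{
	const struct reloc_control *rc = fs_info->reloc_ctl;

	return rc != NULL && rc->merge_reloc_tree;
}
```

Reading the pointer once into a local and testing it first means the NULL case, which can occur after unset_reloc_control(), never reaches the field access.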

The variable stop_loop was originally introduced in commit 625f1c8 ("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in commit 3b080b2 ("Btrfs: scrub raid56 stripes in the right way"). However, in a later commit 18d30ab ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code that modified stop_loop was removed, making the variable redundant. Currently, stop_loop is only initialized with 0 and is never used or modified within the scrub_stripe() function. As a result, this patch removes the stop_loop variable to clean up the code and eliminate unnecessary redundancy. This change has no impact on functionality, as stop_loop was never utilized in any meaningful way in the final version of the code. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Riyan Dhiman <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

…umount During unmount, at close_ctree(), we have the following steps in this order: 1) Park the cleaner kthread - this doesn't destroy the kthread, it basically halts its execution (wake ups against it work but do nothing); 2) We stop the cleaner kthread - this results in freeing the respective struct task_struct; 3) We call btrfs_stop_all_workers() which waits for any jobs running in all the work queues and then frees the work queues. Syzbot reported a case where a fixup worker resulted in a crash when doing a delayed iput on its inode while attempting to wake up the cleaner at btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread was already freed. This can happen during unmount because we don't wait for any fixup workers still running before we call kthread_stop() against the cleaner kthread, which stops it and frees all its resources. Fix this by waiting for any fixup workers at close_ctree() before we call kthread_stop() against the cleaner and run pending delayed iputs. 
The stack traces reported by syzbot were the following: BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: btrfs-fixup btrfs_work_helper Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline] try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154 btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842 btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 2: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:319 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 
kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Freed by task 61: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:2343 [inline] slab_free mm/slub.c:4580 [inline] kmem_cache_free+0x1a2/0x420 mm/slub.c:4682 put_task_struct include/linux/sched/task.h:144 [inline] delayed_put_task_struct+0x125/0x300 kernel/exit.c:228 rcu_do_batch kernel/rcu/tree.c:2567 [inline] rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541 __call_rcu_common kernel/rcu/tree.c:3086 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190 context_switch kernel/sched/core.c:5318 [inline] __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675 schedule_idle+0x56/0x90 kernel/sched/core.c:6793 do_idle+0x56a/0x5d0 kernel/sched/idle.c:354 cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424 start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314 
common_startup_64+0x13e/0x147 The buggy address belongs to the object at ffff8880272a8000 which belongs to the cache task_struct of size 7424 The buggy address is located 2584 bytes inside of freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_slab_page+0x6a/0x120 mm/slub.c:2413 allocate_slab+0x5a/0x2f0 mm/slub.c:2579 new_slab mm/slub.c:2632 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819 __slab_alloc+0x58/0xa0 mm/slub.c:3909 __slab_alloc_node mm/slub.c:3962 [inline] slab_alloc_node mm/slub.c:4123 [inline] kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 
kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 page last free pid 5230 tgid 5230 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638 discard_slab mm/slub.c:2678 [inline] __put_partials+0xeb/0x130 mm/slub.c:3146 put_cpu_partial+0x17c/0x250 mm/slub.c:3221 __slab_free+0x2ea/0x3d0 mm/slub.c:4450 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142 getname_flags+0xb7/0x540 fs/namei.c:139 do_sys_openat2+0xd2/0x1d0 fs/open.c:1409 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Memory state around the buggy address: ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: [email protected] Link: 
https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

This macro is no longer used after the "btrfs: Cleaned up folio->page conversion" series patch [1] was applied, so remove it. [1]: https://patchwork.kernel.org/project/linux-btrfs/cover/[email protected]/ Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Youling Tang <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

Fix some confusing spelling errors that were currently identified, the details are as follows: block-group.c: 2800: uncompressible ==> incompressible extent-tree.c: 3131: EXTEMT ==> EXTENT extent_io.c: 3124: utlizing ==> utilizing extent_map.c: 1323: ealier ==> earlier extent_map.c: 1325: possiblity ==> possibility fiemap.c: 189: emmitted ==> emitted fiemap.c: 197: emmitted ==> emitted fiemap.c: 203: emmitted ==> emitted transaction.h: 36: trasaction ==> transaction volumes.c: 5312: filesysmte ==> filesystem zoned.c: 1977: trasnsaction ==> transaction Signed-off-by: Shen Lichuan <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

Disable ratelimiting for btrfs_printk when CONFIG_BTRFS_DEBUG is enabled. This allows for more verbose output which is often needed by functions like btrfs_dump_space_info(). Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Leo Martins <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

Add first stash of very basic self tests for the RAID stripe-tree. More test cases will follow exercising the tree. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

…one info At btrfs_load_zone_info() we have an error path that is dereferencing the name of a device which is an RCU string but we are not holding an RCU read lock, which is incorrect. Fix this by using btrfs_err_in_rcu() instead of btrfs_err(). The problem is there since commit 08e11a3 ("btrfs: zoned: load zone's allocation offset"), back then at btrfs_load_block_group_zone_info() but then later on that code was factored out into the helper btrfs_load_zone_info() by commit 09a4672 ("btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info"). Fixes: 08e11a3 ("btrfs: zoned: load zone's allocation offset") Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Naohiro Aota <[email protected]> Signed-off-by: Filipe Manana <[email protected]>

This BUG_ON is meant to catch backref cache problems, but these can arise from either bugs in the backref cache or corruption in the extent tree. Fix it to be a proper error and change it to an ASSERT() so that developers notice problems. Signed-off-by: Josef Bacik <[email protected]>

Now that we're not updating the backref cache when we switch transids we can remove the changed list. We're going to keep the new_bytenr field because it serves as a good sanity check for the backref cache and relocation, and can prevent us from making extent tree corruption worse. Signed-off-by: Josef Bacik <[email protected]>

Add a comment for this field so we know what it is used for. Previously we used it to update the backref cache, so people may mistakenly think it is useless, but in fact it exists to make sure the backref cache makes sense. Signed-off-by: Josef Bacik <[email protected]>

We have this set up as a loop, but in reality we will never walk back up the backref tree; if we do then it's a bug. Get rid of the loop and handle the case where we have node->new_bytenr set at all. Previously the check was only if node->new_bytenr != root->node->start, but if it did then we would hit the WARN_ON() and walk back up the tree. Instead we want to just freak out if ->new_bytenr is set, and then do the normal updating of the node for the reloc root and carry on. Signed-off-by: Josef Bacik <[email protected]>

Since we no longer maintain backref cache across transactions, and this is only called when we're creating the reloc root for a newly created snapshot in the transaction critical section, we will end up doing a bunch of work that will just get thrown away when we start the transaction in the relocation loop. Delete this code as it no longer does anything for us. Signed-off-by: Josef Bacik <[email protected]>

We already determine the owner for any blocks we find when we're relocating, and for cowonly blocks (and the data reloc tree) we cow down to the block and call it good enough. However we still build a whole backref tree for them, even though we're not going to use it, and then just don't put these blocks in the cache. Rework the code to check if the block belongs to a cowonly root or the data reloc root, and then just cow down to the block, skipping the backref cache generation. Signed-off-by: Josef Bacik <[email protected]>

Now that we handle relocation for non-shareable roots without using the backref cache, remove the ->cowonly field from the backref nodes and update the handling to throw an ASSERT()/error. Signed-off-by: Josef Bacik <[email protected]>

We rely on finding all our nodes on the various lists in the backref cache, when they are all also in the rbtree. Instead just search through the rbtree and free everything. Signed-off-by: Josef Bacik <[email protected]>
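The idea of releasing everything by walking the tree, rather than by draining side lists, can be sketched as follows. A plain unbalanced binary search tree stands in for the kernel rbtree here, and `struct node`, `insert()` and `release_all()` are hypothetical names for this illustration, not the backref cache API.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a cache node keyed by bytenr (illustrative only). */
struct node {
	struct node *left;
	struct node *right;
	unsigned long long bytenr;
};

/* Insert a key into the tree; returns the (possibly new) subtree root. */
static struct node *insert(struct node *root, unsigned long long bytenr)
{
	if (!root) {
		struct node *n = calloc(1, sizeof(*n));

		if (n)
			n->bytenr = bytenr;
		return n;
	}
	if (bytenr < root->bytenr)
		root->left = insert(root->left, bytenr);
	else if (bytenr > root->bytenr)
		root->right = insert(root->right, bytenr);
	return root;
}

/*
 * Post-order walk: free both children before the parent, so no freed
 * node is ever dereferenced. Returns the number of nodes released.
 */
static size_t release_all(struct node *root)
{
	size_t freed;

	if (!root)
		return 0;
	freed = release_all(root->left);
	freed += release_all(root->right);
	free(root);
	return freed + 1;
}
```

Because every node is reachable from the tree root, a single walk is enough and no per-list bookkeeping is needed for cleanup.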

Before we were keeping all of our nodes on various lists in order to make sure everything got cleaned up correctly. We used node->lowest to indicate that node->lower was linked into the cache->leaves list. Now that we do cleanup based on the rb tree both the list and the flag are useless, so delete them both. Signed-off-by: Josef Bacik <[email protected]>

We don't ever look at this list, remove it. Signed-off-by: Josef Bacik <[email protected]>

torvalds and others added 30 commits September 4, 2024 08:37

adam900710 and others added 30 commits October 3, 2024 19:53

btrfs: tests: add selftests for RAID stripe-tree

e367807

Add first stash of very basic self tests for the RAID stripe-tree. More test cases will follow exercising the tree. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

btrfs: simplify btrfs_backref_release_cache

4f7e20a

We rely on finding all our nodes on the various lists in the backref cache, when they are all also in the rbtree. Instead just search through the rbtree and free everything. Signed-off-by: Josef Bacik <[email protected]>

btrfs: remove detached list from btrfs_backref_cache

cc3a43f

We don't ever look at this list, remove it. Signed-off-by: Josef Bacik <[email protected]>
